Merge branch 'cassandra-3.0' into cassandra-3.9
diff --git a/.gitignore b/.gitignore
index acaa51a..f5b1ce1 100644
--- a/.gitignore
+++ b/.gitignore
@@ -71,3 +71,7 @@
 lib/jsr223/jython/cachedir
 lib/jsr223/scala/*.jar
 
+/.ant-targets-build.xml
+
+# Generated files from the documentation
+doc/source/configuration/cassandra_config_file.rst
diff --git a/CHANGES.txt b/CHANGES.txt
index c2d7a4d..e7d9066 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,4 +1,7 @@
-3.0.9
+3.9
+ * cqlsh: Fix handling of $$-escaped strings (CASSANDRA-12189)
+ * Fix SSL JMX requiring truststore containing server cert (CASSANDRA-12109)
+Merged from 3.0:
  * Rerun ReplicationAwareTokenAllocatorTest on failure to avoid flakiness (CASSANDRA-12277)
  * Exception when computing read-repair for range tombstones (CASSANDRA-12263)
  * Lost counter writes in compact table and static columns (CASSANDRA-12219)
@@ -10,6 +13,44 @@
    to connect with too low of a protocol version (CASSANDRA-11464)
  * NullPointerExpception when reading/compacting table (CASSANDRA-11988)
  * Fix problem with undeleteable rows on upgrade to new sstable format (CASSANDRA-12144)
+Merged from 2.2:
+ * Wait for tracing events before returning response and query at same consistency level client side (CASSANDRA-11465)
+ * cqlsh copyutil should get host metadata by connected address (CASSANDRA-11979)
+ * Fixed cqlshlib.test.remove_test_db (CASSANDRA-12214)
+Merged from 2.1:
+ * cannot use cql since upgrading python to 2.7.11+ (CASSANDRA-11850)
+ * Allow STCS-in-L0 compactions to reduce scope with LCS (CASSANDRA-12040)
+
+
+3.8
+ * Fix hdr logging for single operation workloads (CASSANDRA-12145)
+ * Fix SASI PREFIX search in CONTAINS mode with partial terms (CASSANDRA-12073)
+ * Increase size of flushExecutor thread pool (CASSANDRA-12071)
+ * Partial revert of CASSANDRA-11971, cannot recycle buffer in SP.sendMessagesToNonlocalDC (CASSANDRA-11950)
+ * Upgrade netty to 4.0.39 (CASSANDRA-12032, CASSANDRA-12034)
+ * Improve details in compaction log message (CASSANDRA-12080)
+ * Allow unset values in CQLSSTableWriter (CASSANDRA-11911)
+ * Chunk cache to request compressor-compatible buffers if pool space is exhausted (CASSANDRA-11993)
+ * Remove DatabaseDescriptor dependencies from SequentialWriter (CASSANDRA-11579)
+ * Move skip_stop_words filter before stemming (CASSANDRA-12078)
+ * Support seek() in EncryptedFileSegmentInputStream (CASSANDRA-11957)
+ * SSTable tools mishandling LocalPartitioner (CASSANDRA-12002)
+ * When SEPWorker assigned work, set thread name to match pool (CASSANDRA-11966)
+ * Add cross-DC latency metrics (CASSANDRA-11596)
+ * Allow terms in selection clause (CASSANDRA-10783)
+ * Add bind variables to trace (CASSANDRA-11719)
+ * Switch counter shards' clock to timestamps (CASSANDRA-9811)
+ * Introduce HdrHistogram and response/service/wait separation to stress tool (CASSANDRA-11853)
+ * entry-weighers in QueryProcessor should respect partitionKeyBindIndexes field (CASSANDRA-11718)
+ * Support older ant versions (CASSANDRA-11807)
+ * Estimate compressed on disk size when deciding if sstable size limit reached (CASSANDRA-11623)
+ * cassandra-stress profiles should support case sensitive schemas (CASSANDRA-11546)
+ * Remove DatabaseDescriptor dependency from FileUtils (CASSANDRA-11578)
+ * Faster streaming (CASSANDRA-9766)
+ * Add prepared query parameter to trace for "Execute CQL3 prepared query" session (CASSANDRA-11425)
+ * Add repaired percentage metric (CASSANDRA-11503)
+ * Add Change-Data-Capture (CASSANDRA-8844)
+Merged from 3.0:
  * Fix paging logic for deleted partitions with static columns (CASSANDRA-12107)
  * Wait until the message is being send to decide which serializer must be used (CASSANDRA-11393)
  * Fix migration of static thrift column names with non-text comparators (CASSANDRA-12147)
@@ -23,34 +64,23 @@
    those static columns in query results (CASSANDRA-12123)
  * Avoid digest mismatch with empty but static rows (CASSANDRA-12090)
  * Fix EOF exception when altering column type (CASSANDRA-11820)
-Merged from 2.2:
- * Wait for tracing events before returning response and query at same consistency level client side (CASSANDRA-11465)
- * cqlsh copyutil should get host metadata by connected address (CASSANDRA-11979)
- * Fixed cqlshlib.test.remove_test_db (CASSANDRA-12214)
- * Synchronize ThriftServer::stop() (CASSANDRA-12105)
- * Use dedicated thread for JMX notifications (CASSANDRA-12146)
- * Improve streaming synchronization and fault tolerance (CASSANDRA-11414)
- * MemoryUtil.getShort() should return an unsigned short also for architectures not supporting unaligned memory accesses (CASSANDRA-11973)
-Merged from 2.1:
- * Allow STCS-in-L0 compactions to reduce scope with LCS (CASSANDRA-12040)
- * cannot use cql since upgrading python to 2.7.11+ (CASSANDRA-11850)
- * Fix filtering on clustering columns when 2i is used (CASSANDRA-11907)
-
-
-3.0.8
- * Fix potential race in schema during new table creation (CASSANDRA-12083)
  * cqlsh: fix error handling in rare COPY FROM failure scenario (CASSANDRA-12070)
  * Disable autocompaction during drain (CASSANDRA-11878)
  * Add a metrics timer to MemtablePool and use it to track time spent blocked on memory in MemtableAllocator (CASSANDRA-11327)
  * Fix upgrading schema with super columns with non-text subcomparators (CASSANDRA-12023)
  * Add TimeWindowCompactionStrategy (CASSANDRA-9666)
 Merged from 2.2:
+ * Synchronize ThriftServer::stop() (CASSANDRA-12105)
+ * Use dedicated thread for JMX notifications (CASSANDRA-12146)
+ * Improve streaming synchronization and fault tolerance (CASSANDRA-11414)
+ * MemoryUtil.getShort() should return an unsigned short also for architectures not supporting unaligned memory accesses (CASSANDRA-11973)
  * Allow nodetool info to run with readonly JMX access (CASSANDRA-11755)
  * Validate bloom_filter_fp_chance against lowest supported
    value when the table is created (CASSANDRA-11920)
  * Don't send erroneous NEW_NODE notifications on restart (CASSANDRA-11038)
  * StorageService shutdown hook should use a volatile variable (CASSANDRA-11984)
 Merged from 2.1:
+ * Fix filtering on clustering columns when 2i is used (CASSANDRA-11907)
  * Avoid stalling paxos when the paxos state expires (CASSANDRA-12043)
  * Remove finished incoming streaming connections from MessagingService (CASSANDRA-11854)
  * Don't try to get sstables for non-repairing column families (CASSANDRA-12077)
@@ -64,11 +94,15 @@
  * cqlsh COPY FROM: shutdown parent cluster after forking, to avoid corrupting SSL connections (CASSANDRA-11749)
 
 
-3.0.7
+3.7
+ * Support multiple folders for user defined compaction tasks (CASSANDRA-11765)
+ * Fix race in CompactionStrategyManager's pause/resume (CASSANDRA-11922)
+Merged from 3.0:
  * Fix legacy serialization of Thrift-generated non-compound range tombstones
    when communicating with 2.x nodes (CASSANDRA-11930)
  * Fix Directories instantiations where CFS.initialDirectories should be used (CASSANDRA-11849)
  * Avoid referencing DatabaseDescriptor in AbstractType (CASSANDRA-11912)
+ * Don't use static dataDirectories field in Directories instances (CASSANDRA-11647)
  * Fix sstables not being protected from removal during index build (CASSANDRA-11905)
  * cqlsh: Suppress stack trace from Read/WriteFailures (CASSANDRA-11032)
  * Remove unneeded code to repair index summaries that have
@@ -82,17 +116,18 @@
  * Update Java Driver (CASSANDRA-11615)
 Merged from 2.2:
  * Persist local metadata earlier in startup sequence (CASSANDRA-11742)
- * Run CommitLog tests with different compression settings (CASSANDRA-9039)
  * cqlsh: fix tab completion for case-sensitive identifiers (CASSANDRA-11664)
  * Avoid showing estimated key as -1 in tablestats (CASSANDRA-11587)
  * Fix possible race condition in CommitLog.recover (CASSANDRA-11743)
  * Enable client encryption in sstableloader with cli options (CASSANDRA-11708)
  * Possible memory leak in NIODataInputStream (CASSANDRA-11867)
  * Add seconds to cqlsh tracing session duration (CASSANDRA-11753)
+ * Fix commit log replay after out-of-order flush completion (CASSANDRA-9669)
  * Prohibit Reversed Counter type as part of the PK (CASSANDRA-9395)
+ * cqlsh: correctly handle non-ascii chars in error messages (CASSANDRA-11626)
 Merged from 2.1:
+ * Run CommitLog tests with different compression settings (CASSANDRA-9039)
  * cqlsh: apply current keyspace to source command (CASSANDRA-11152)
- * Backport CASSANDRA-11578 (CASSANDRA-11750)
  * Clear out parent repair session if repair coordinator dies (CASSANDRA-11824)
  * Set default streaming_socket_timeout_in_ms to 24 hours (CASSANDRA-11840)
  * Do not consider local node a valid source during replace (CASSANDRA-11848)
@@ -100,7 +135,77 @@
  * Avoid holding SSTableReaders for duration of incremental repair (CASSANDRA-11739)
 
 
-3.0.6
+3.6
+ * Correctly migrate schema for frozen UDTs during 2.x -> 3.x upgrades
+   (does not affect any released versions) (CASSANDRA-11613)
+ * Allow server startup if JMX is configured directly (CASSANDRA-11725)
+ * Prevent direct memory OOM on buffer pool allocations (CASSANDRA-11710)
+ * Enhanced Compaction Logging (CASSANDRA-10805)
+ * Make prepared statement cache size configurable (CASSANDRA-11555)
+ * Integrated JMX authentication and authorization (CASSANDRA-10091)
+ * Add units to stress output (CASSANDRA-11352)
+ * Fix PER PARTITION LIMIT for single and multi partitions queries (CASSANDRA-11603)
+ * Add uncompressed chunk cache for RandomAccessReader (CASSANDRA-5863)
+ * Clarify ClusteringPrefix hierarchy (CASSANDRA-11213)
+ * Always perform collision check before joining ring (CASSANDRA-10134)
+ * SSTableWriter output discrepancy (CASSANDRA-11646)
+ * Fix potential timeout in NativeTransportService.testConcurrentDestroys (CASSANDRA-10756)
+ * Support large partitions on the 3.0 sstable format (CASSANDRA-11206,11763)
+ * Add support to rebuild from specific range (CASSANDRA-10406)
+ * Optimize the overlapping lookup by calculating all the
+   bounds in advance (CASSANDRA-11571)
+ * Support json/yaml output in nodetool tablestats (CASSANDRA-5977)
+ * (stress) Add datacenter option to -node options (CASSANDRA-11591)
+ * Fix handling of empty slices (CASSANDRA-11513)
+ * Make number of cores used by cqlsh COPY visible to testing code (CASSANDRA-11437)
+ * Allow filtering on clustering columns for queries without secondary indexes (CASSANDRA-11310)
+ * Refactor Restriction hierarchy (CASSANDRA-11354)
+ * Eliminate allocations in R/W path (CASSANDRA-11421)
+ * Update Netty to 4.0.36 (CASSANDRA-11567)
+ * Fix PER PARTITION LIMIT for queries requiring post-query ordering (CASSANDRA-11556)
+ * Allow instantiation of UDTs and tuples in UDFs (CASSANDRA-10818)
+ * Support UDT in CQLSSTableWriter (CASSANDRA-10624)
+ * Support for non-frozen user-defined types, updating
+   individual fields of user-defined types (CASSANDRA-7423)
+ * Make LZ4 compression level configurable (CASSANDRA-11051)
+ * Allow per-partition LIMIT clause in CQL (CASSANDRA-7017)
+ * Make custom filtering more extensible with UserExpression (CASSANDRA-11295)
+ * Improve field-checking and error reporting in cassandra.yaml (CASSANDRA-10649)
+ * Print CAS stats in nodetool proxyhistograms (CASSANDRA-11507)
+ * More user friendly error when providing an invalid token to nodetool (CASSANDRA-9348)
+ * Add static column support to SASI index (CASSANDRA-11183)
+ * Support EQ/PREFIX queries in SASI CONTAINS mode without tokenization (CASSANDRA-11434)
+ * Support LIKE operator in prepared statements (CASSANDRA-11456)
+ * Add a command to see if a Materialized View has finished building (CASSANDRA-9967)
+ * Log endpoint and port associated with streaming operation (CASSANDRA-8777)
+ * Print sensible units for all log messages (CASSANDRA-9692)
+ * Upgrade Netty to version 4.0.34 (CASSANDRA-11096)
+ * Break the CQL grammar into separate Parser and Lexer (CASSANDRA-11372)
+ * Compress only inter-dc traffic by default (CASSANDRA-8888)
+ * Add metrics to track write amplification (CASSANDRA-11420)
+ * cassandra-stress: cannot handle "value-less" tables (CASSANDRA-7739)
+ * Add/drop multiple columns in one ALTER TABLE statement (CASSANDRA-10411)
+ * Add require_endpoint_verification opt for internode encryption (CASSANDRA-9220)
+ * Add auto import java.util for UDF code block (CASSANDRA-11392)
+ * Add --hex-format option to nodetool getsstables (CASSANDRA-11337)
+ * sstablemetadata should print sstable min/max token (CASSANDRA-7159)
+ * Do not wrap CassandraException in TriggerExecutor (CASSANDRA-9421)
+ * COPY TO should have higher double precision (CASSANDRA-11255)
+ * Stress should exit with non-zero status after failure (CASSANDRA-10340)
+ * Add client to cqlsh SHOW_SESSION (CASSANDRA-8958)
+ * Fix nodetool tablestats keyspace level metrics (CASSANDRA-11226)
+ * Store repair options in parent_repair_history (CASSANDRA-11244)
+ * Print current leveling in sstableofflinerelevel (CASSANDRA-9588)
+ * Change repair message for keyspaces with RF 1 (CASSANDRA-11203)
+ * Remove hard-coded SSL cipher suites and protocols (CASSANDRA-10508)
+ * Improve concurrency in CompactionStrategyManager (CASSANDRA-10099)
+ * (cqlsh) interpret CQL type for formatting blobs (CASSANDRA-11274)
+ * Refuse to start and print txn log information in case of disk
+   corruption (CASSANDRA-10112)
+ * Resolve some eclipse-warnings (CASSANDRA-11086)
+ * (cqlsh) Show static columns in a different color (CASSANDRA-11059)
+ * Allow to remove TTLs on table with default_time_to_live (CASSANDRA-11207)
+Merged from 3.0:
  * Disallow creating view with a static column (CASSANDRA-11602)
  * Reduce the amount of object allocations caused by the getFunctions methods (CASSANDRA-11593)
  * Potential error replaying commitlog with smallint/tinyint/date/time types (CASSANDRA-11618)
@@ -121,8 +226,6 @@
    header is received (CASSANDRA-11464)
  * Validate that num_tokens and initial_token are consistent with one another (CASSANDRA-10120)
 Merged from 2.2:
- * Fix commit log replay after out-of-order flush completion (CASSANDRA-9669)
- * cqlsh: correctly handle non-ascii chars in error messages (CASSANDRA-11626)
  * Exit JVM if JMX server fails to startup (CASSANDRA-11540)
  * Produce a heap dump when exiting on OOM (CASSANDRA-9861)
  * Restore ability to filter on clustering columns when using a 2i (CASSANDRA-11510)
@@ -150,7 +253,12 @@
  * Validate levels when building LeveledScanner to avoid overlaps with orphaned sstables (CASSANDRA-9935)
 
 
-3.0.5
+3.5
+ * StaticTokenTreeBuilder should respect possibility of duplicate tokens (CASSANDRA-11525)
+ * Correctly fix potential assertion error during compaction (CASSANDRA-11353)
+ * Avoid index segment stitching in RAM which lead to OOM on big SSTable files (CASSANDRA-11383)
+ * Fix clustering and row filters for LIKE queries on clustering columns (CASSANDRA-11397)
+Merged from 3.0:
  * Fix rare NPE on schema upgrade from 2.x to 3.x (CASSANDRA-10943)
  * Improve backoff policy for cqlsh COPY FROM (CASSANDRA-11320)
  * Improve IF NOT EXISTS check in CREATE INDEX (CASSANDRA-11131)
@@ -158,7 +266,7 @@
  * Enable SO_REUSEADDR for JMX RMI server sockets (CASSANDRA-11093)
  * Allocate merkletrees with the correct size (CASSANDRA-11390)
  * Support streaming pre-3.0 sstables (CASSANDRA-10990)
- * Add backpressure to compressed commit log (CASSANDRA-10971)
+ * Add backpressure to compressed or encrypted commit log (CASSANDRA-10971)
  * SSTableExport supports secondary index tables (CASSANDRA-11330)
  * Fix sstabledump to include missing info in debug output (CASSANDRA-11321)
  * Establish and implement canonical bulk reading workload(s) (CASSANDRA-10331)
@@ -179,19 +287,49 @@
  * Fix bloom filter sizing with LCS (CASSANDRA-11344)
  * (cqlsh) Fix error when result is 0 rows with EXPAND ON (CASSANDRA-11092)
  * Add missing newline at end of bin/cqlsh (CASSANDRA-11325)
- * Fix AE in nodetool cfstats (backport CASSANDRA-10859) (CASSANDRA-11297)
  * Unresolved hostname leads to replace being ignored (CASSANDRA-11210)
  * Only log yaml config once, at startup (CASSANDRA-11217)
  * Reference leak with parallel repairs on the same table (CASSANDRA-11215)
 Merged from 2.1:
  * Add a -j parameter to scrub/cleanup/upgradesstables to state how
    many threads to use (CASSANDRA-11179)
- * Backport CASSANDRA-10679 (CASSANDRA-9598)
- * InvalidateKeys should have a weak ref to key cache (CASSANDRA-11176)
  * COPY FROM on large datasets: fix progress report and debug performance (CASSANDRA-11053)
+ * InvalidateKeys should have a weak ref to key cache (CASSANDRA-11176)
 
-3.0.4
- * Preserve order for preferred SSL cipher suites (CASSANDRA-11164)
+
+3.4
+ * (cqlsh) add cqlshrc option to always connect using ssl (CASSANDRA-10458)
+ * Cleanup a few resource warnings (CASSANDRA-11085)
+ * Allow custom tracing implementations (CASSANDRA-10392)
+ * Extract LoaderOptions to be able to be used from outside (CASSANDRA-10637)
+ * fix OnDiskIndexTest to properly treat empty ranges (CASSANDRA-11205)
+ * fix TrackerTest to handle new notifications (CASSANDRA-11178)
+ * add SASI validation for partitioner and complex columns (CASSANDRA-11169)
+ * Add caching of encrypted credentials in PasswordAuthenticator (CASSANDRA-7715)
+ * fix SASI memtable switching on flush (CASSANDRA-11159)
+ * Remove duplicate offline compaction tracking (CASSANDRA-11148)
+ * fix EQ semantics of analyzed SASI indexes (CASSANDRA-11130)
+ * Support long name output for nodetool commands (CASSANDRA-7950)
+ * Encrypted hints (CASSANDRA-11040)
+ * SASI index options validation (CASSANDRA-11136)
+ * Optimize disk seek using min/max column name meta data when the LIMIT clause is used
+   (CASSANDRA-8180)
+ * Add LIKE support to CQL3 (CASSANDRA-11067)
+ * Generic Java UDF types (CASSANDRA-10819)
+ * cqlsh: Include sub-second precision in timestamps by default (CASSANDRA-10428)
+ * Set javac encoding to utf-8 (CASSANDRA-11077)
+ * Integrate SASI index into Cassandra (CASSANDRA-10661)
+ * Add --skip-flush option to nodetool snapshot
+ * Skip values for non-queried columns (CASSANDRA-10657)
+ * Add support for secondary indexes on static columns (CASSANDRA-8103)
+ * CommitLogUpgradeTestMaker creates broken commit logs (CASSANDRA-11051)
+ * Add metric for number of dropped mutations (CASSANDRA-10866)
+ * Simplify row cache invalidation code (CASSANDRA-10396)
+ * Support user-defined compaction through nodetool (CASSANDRA-10660)
+ * Stripe view locks by key and table ID to reduce contention (CASSANDRA-10981)
+ * Add nodetool gettimeout and settimeout commands (CASSANDRA-10953)
+ * Add 3.0 metadata to sstablemetadata output (CASSANDRA-10838)
+Merged from 3.0:
  * MV should only query complex columns included in the view (CASSANDRA-11069)
  * Failed aggregate creation breaks server permanently (CASSANDRA-11064)
  * Add sstabledump tool (CASSANDRA-7464)
@@ -213,6 +351,7 @@
    properly (CASSANDRA-11050)
  * Fix NPE when using forceRepairRangeAsync without DC (CASSANDRA-11239)
 Merged from 2.2:
+ * Preserve order for preferred SSL cipher suites (CASSANDRA-11164)
  * Range.compareTo() violates the contract of Comparable (CASSANDRA-11216)
  * Avoid NPE when serializing ErrorMessage with null message (CASSANDRA-11167)
  * Replacing an aggregate with a new version doesn't reset INITCOND (CASSANDRA-10840)
@@ -231,6 +370,7 @@
  * (cqlsh) Support utf-8/cp65001 encoding on Windows (CASSANDRA-11030)
  * Fix paging on DISTINCT queries repeats result when first row in partition changes
    (CASSANDRA-10010)
+ * (cqlsh) Support timezone conversion using pytz (CASSANDRA-10397)
  * cqlsh: change default encoding to UTF-8 (CASSANDRA-11124)
 Merged from 2.1:
  * Checking if an unlogged batch is local is inefficient (CASSANDRA-11529)
@@ -248,11 +388,14 @@
  * Gossiper#isEnabled is not thread safe (CASSANDRA-11116)
  * Avoid major compaction mixing repaired and unrepaired sstables in DTCS (CASSANDRA-11113)
  * Make it clear what DTCS timestamp_resolution is used for (CASSANDRA-11041)
- * (cqlsh) Support timezone conversion using pytz (CASSANDRA-10397)
  * (cqlsh) Display milliseconds when datetime overflows (CASSANDRA-10625)
 
 
-3.0.3
+3.3
+ * Avoid infinite loop if owned range is smaller than number of
+   data dirs (CASSANDRA-11034)
+ * Avoid bootstrap hanging when existing nodes have no data to stream (CASSANDRA-11010)
+Merged from 3.0:
  * Remove double initialization of newly added tables (CASSANDRA-11027)
  * Filter keys searcher results by target range (CASSANDRA-11104)
  * Fix deserialization of legacy read commands (CASSANDRA-11087)
@@ -273,24 +416,17 @@
  * Remove checksum files after replaying hints (CASSANDRA-10947)
  * Support passing base table metadata to custom 2i validation (CASSANDRA-10924)
  * Ensure stale index entries are purged during reads (CASSANDRA-11013)
+ * (cqlsh) Also apply --connect-timeout to control connection
+   timeout (CASSANDRA-10959)
  * Fix AssertionError when removing from list using UPDATE (CASSANDRA-10954)
  * Fix UnsupportedOperationException when reading old sstable with range
    tombstone (CASSANDRA-10743)
  * MV should use the maximum timestamp of the primary key (CASSANDRA-10910)
  * Fix potential assertion error during compaction (CASSANDRA-10944)
- * Fix counting of received sstables in streaming (CASSANDRA-10949)
- * Implement hints compression (CASSANDRA-9428)
- * Fix potential assertion error when reading static columns (CASSANDRA-10903)
- * Avoid NoSuchElementException when executing empty batch (CASSANDRA-10711)
- * Avoid building PartitionUpdate in toString (CASSANDRA-10897)
- * Reduce heap spent when receiving many SSTables (CASSANDRA-10797)
- * Add back support for 3rd party auth providers to bulk loader (CASSANDRA-10873)
- * Eliminate the dependency on jgrapht for UDT resolution (CASSANDRA-10653)
- * (Hadoop) Close Clusters and Sessions in Hadoop Input/Output classes (CASSANDRA-10837)
- * Fix sstableloader not working with upper case keyspace name (CASSANDRA-10806)
 Merged from 2.2:
  * maxPurgeableTimestamp needs to check memtables too (CASSANDRA-9949)
  * Apply change to compaction throughput in real time (CASSANDRA-10025)
+ * (cqlsh) encode input correctly when saving history
  * Fix potential NPE on ORDER BY queries with IN (CASSANDRA-10955)
  * Start L0 STCS-compactions even if there is a L0 -> L1 compaction
    going (CASSANDRA-10979)
@@ -298,22 +434,11 @@
  * Avoid NPE when performing sstable tasks (scrub etc.) (CASSANDRA-10980)
  * Make sure client gets tombstone overwhelmed warning (CASSANDRA-9465)
  * Fix error streaming section more than 2GB (CASSANDRA-10961)
- * (cqlsh) Also apply --connect-timeout to control connection
-   timeout (CASSANDRA-10959)
  * Histogram buckets exposed in jmx are sorted incorrectly (CASSANDRA-10975)
  * Enable GC logging by default (CASSANDRA-10140)
  * Optimize pending range computation (CASSANDRA-9258)
  * Skip commit log and saved cache directories in SSTable version startup check (CASSANDRA-10902)
  * drop/alter user should be case sensitive (CASSANDRA-10817)
- * jemalloc detection fails due to quoting issues in regexv (CASSANDRA-10946)
- * (cqlsh) show correct column names for empty result sets (CASSANDRA-9813)
- * Add new types to Stress (CASSANDRA-9556)
- * Add property to allow listening on broadcast interface (CASSANDRA-9748)
- * Fix regression in split size on CqlInputFormat (CASSANDRA-10835)
- * Better handling of SSL connection errors inter-node (CASSANDRA-10816)
- * Disable reloading of GossipingPropertyFileSnitch (CASSANDRA-9474)
- * Verify tables in pseudo-system keyspaces at startup (CASSANDRA-10761)
- * (cqlsh) encode input correctly when saving history
 Merged from 2.1:
  * test_bulk_round_trip_blogposts is failing occasionally (CASSANDRA-10938)
  * Fix isJoined return true only after becoming cluster member (CASANDRA-11007)
@@ -329,6 +454,55 @@
  * Retry sending gossip syn multiple times during shadow round (CASSANDRA-8072)
  * Fix pending range calculation during moves (CASSANDRA-10887)
  * Sane default (200Mbps) for inter-DC streaming througput (CASSANDRA-8708)
+
+
+
+3.2
+ * Make sure tokens don't exist in several data directories (CASSANDRA-6696)
+ * Add requireAuthorization method to IAuthorizer (CASSANDRA-10852)
+ * Move static JVM options to conf/jvm.options file (CASSANDRA-10494)
+ * Fix CassandraVersion to accept x.y version string (CASSANDRA-10931)
+ * Add forceUserDefinedCleanup to allow more flexible cleanup (CASSANDRA-10708)
+ * (cqlsh) allow setting TTL with COPY (CASSANDRA-9494)
+ * Fix counting of received sstables in streaming (CASSANDRA-10949)
+ * Implement hints compression (CASSANDRA-9428)
+ * Fix potential assertion error when reading static columns (CASSANDRA-10903)
+ * Fix EstimatedHistogram creation in nodetool tablehistograms (CASSANDRA-10859)
+ * Establish bootstrap stream sessions sequentially (CASSANDRA-6992)
+ * Sort compactionhistory output by timestamp (CASSANDRA-10464)
+ * More efficient BTree removal (CASSANDRA-9991)
+ * Make tablehistograms accept the same syntax as tablestats (CASSANDRA-10149)
+ * Group pending compactions based on table (CASSANDRA-10718)
+ * Add compressor name in sstablemetadata output (CASSANDRA-9879)
+ * Fix type casting for counter columns (CASSANDRA-10824)
+ * Prevent running Cassandra as root (CASSANDRA-8142)
+ * bound maximum in-flight commit log replay mutation bytes to 64 megabytes (CASSANDRA-8639)
+ * Normalize all scripts (CASSANDRA-10679)
+ * Make compression ratio much more accurate (CASSANDRA-10225)
+ * Optimize building of Clustering object when only one is created (CASSANDRA-10409)
+ * Make index building pluggable (CASSANDRA-10681)
+ * Add sstable flush observer (CASSANDRA-10678)
+ * Improve NTS endpoints calculation (CASSANDRA-10200)
+ * Improve performance of the folderSize function (CASSANDRA-10677)
+ * Add support for type casting in selection clause (CASSANDRA-10310)
+ * Added graphing option to cassandra-stress (CASSANDRA-7918)
+ * Abort in-progress queries that time out (CASSANDRA-7392)
+ * Add transparent data encryption core classes (CASSANDRA-9945)
+Merged from 3.0:
+ * Better handling of SSL connection errors inter-node (CASSANDRA-10816)
+ * Avoid NoSuchElementException when executing empty batch (CASSANDRA-10711)
+ * Avoid building PartitionUpdate in toString (CASSANDRA-10897)
+ * Reduce heap spent when receiving many SSTables (CASSANDRA-10797)
+ * Add back support for 3rd party auth providers to bulk loader (CASSANDRA-10873)
+ * Eliminate the dependency on jgrapht for UDT resolution (CASSANDRA-10653)
+ * (Hadoop) Close Clusters and Sessions in Hadoop Input/Output classes (CASSANDRA-10837)
+ * Fix sstableloader not working with upper case keyspace name (CASSANDRA-10806)
+Merged from 2.2:
+ * jemalloc detection fails due to quoting issues in regexv (CASSANDRA-10946)
+ * (cqlsh) show correct column names for empty result sets (CASSANDRA-9813)
+ * Add new types to Stress (CASSANDRA-9556)
+ * Add property to allow listening on broadcast interface (CASSANDRA-9748)
+Merged from 2.1:
  * Match cassandra-loader options in COPY FROM (CASSANDRA-9303)
  * Fix binding to any address in CqlBulkRecordWriter (CASSANDRA-9309)
  * cqlsh fails to decode utf-8 characters for text typed columns (CASSANDRA-10875)
@@ -341,12 +515,14 @@
  * Allow cancellation of index summary redistribution (CASSANDRA-8805)
 
 
-3.0.2
- * Fix upgrade data loss due to range tombstone deleting more data than then should
-   (CASSANDRA-10822)
+3.1.1
+Merged from 3.0:
+  * Fix upgrade data loss due to range tombstone deleting more data than it should
+    (CASSANDRA-10822)
 
 
-3.0.1
+3.1
+Merged from 3.0:
  * Avoid MV race during node decommission (CASSANDRA-10674)
  * Disable reloading of GossipingPropertyFileSnitch (CASSANDRA-9474)
  * Handle single-column deletions correction in materialized views
diff --git a/NEWS.txt b/NEWS.txt
index 56dd6ea..7418f3a 100644
--- a/NEWS.txt
+++ b/NEWS.txt
@@ -13,16 +13,58 @@
 'sstableloader' tool. You can upgrade the file format of your snapshots
 using the provided 'sstableupgrade' tool.
 
-3.0.8
-=====
+
+3.8
+===
+
+New features
+------------
+   - Shared pool threads are now named according to the stage they are executing
+     tasks for. Thread names mentioned in traced queries change accordingly.
+   - A new cassandra-stress option, "-rate fixed={number}/s", has been added. It enforces
+     a scheduled rate of operations per second over time, so that stress can accurately
+     account for coordinated omission from the stress process.
+   - The cassandra-stress "-rate limit=" option has been renamed to "-rate throttle="
+   - HDR histograms have been added to stress runs; their output can be saved to disk using
+     the "-log hdrfile=" option. The histogram includes response/service/wait times when used
+     with the fixed or throttle rate options (see the example at the end of this section).
+     The histogram file can be plotted at
+     http://hdrhistogram.github.io/HdrHistogram/plotFiles.html
+   - TimeWindowCompactionStrategy has been added. This has proven to be a better approach
+     to time series compaction and new tables should use this instead of DTCS. See
+     CASSANDRA-9666 for details.
+   - Change-Data-Capture is now available. See cassandra.yaml for cdc-specific flags and
+     a brief explanation of on-disk locations for archived data in CommitLog form. CDC can
+     be enabled per table via ALTER TABLE ... WITH cdc=true.
+     Upon flush, CommitLogSegments containing data for CDC-enabled tables are moved to
+     the data/cdc_raw directory until removed by the user, and writes to CDC-enabled tables
+     will be rejected with a WriteTimeoutException once cdc_total_space_in_mb is reached
+     between unflushed CommitLogSegments and cdc_raw.
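+     For example, assuming an existing table ks.t (the keyspace and table names here are
+     placeholders), CDC can be enabled with:
+       ALTER TABLE ks.t WITH cdc=true;
+     Likewise, an illustrative cassandra-stress invocation combining the fixed rate and hdr
+     logging options described above (the workload, operation count and output file name are
+     placeholders) would be:
+       cassandra-stress write n=1000000 -rate fixed=5000/s -log hdrfile=stress.hdr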
 
 Upgrading
 ---------
-   - Nothing specific to this release, but please see previous versions upgrading section,
-     especially if you are upgrading from 2.2.
+    - The name "json" and "distinct" are not valid anymore a user-defined function
+      names (they are still valid as column name however). In the unlikely case where
+      you had defined functions with such names, you will need to recreate
+      those under a different name, change your code to use the new names and
+      drop the old versions, and this _before_ upgrade (see CASSANDRA-10783 for more
+      details).
+    - Due to changes in schema migration handling and the storage format after 3.0, you will
+      see error messages such as:
+         "java.lang.RuntimeException: Unknown column cdc during deserialization"
+      in your system logs on a mixed-version cluster during upgrades. This error message
+      is harmless and due to the 3.8 nodes having cdc added to their schema tables while
+      the <3.8 nodes do not. This message should cease once all nodes are upgraded to 3.8.
+      As always, refrain from schema changes during cluster upgrades.
 
-3.0.7
-=====
+Deprecation
+-----------
+   - DateTieredCompactionStrategy has been deprecated - new tables should use
+     TimeWindowCompactionStrategy. Note that migrating an existing DTCS-table to TWCS might
+     cause increased compaction load for a while after the migration so make sure you run
+     tests before migrating. Read CASSANDRA-9666 for background on this.
+
+3.7
+===
 
 Upgrading
 ---------
@@ -32,64 +74,92 @@
      value of native_transport_max_frame_size_in_mb. SSTables will be considered corrupt if
      they contain values whose size exceeds this limit. See CASSANDRA-9530 for more details.
 
+
+3.6
+=====
+
+New features
+------------
+   - JMX connections can now use the same auth mechanisms as CQL clients. New options
+     in cassandra-env.(sh|ps1) enable JMX authentication and authorization to be delegated
+     to the IAuthenticator and IAuthorizer configured in cassandra.yaml. The default settings
+     still only expose JMX locally, and use the JVM's own security mechanisms when remote
+     connections are permitted. For more details on how to enable the new options, see the
+     comments in cassandra-env.sh. A new class of IResource, JMXResource, is provided for
+     the purposes of GRANT/REVOKE via CQL. See CASSANDRA-10091 for more details.
+     Also, directly setting JMX remote port via the com.sun.management.jmxremote.port system
+     property at startup is deprecated. See CASSANDRA-11725 for more details.
+   - JSON timestamps are now in UTC and contain the timezone information; see CASSANDRA-11137 for more details.
+   - Collision checks are performed when joining the token ring, regardless of whether
+     the node should bootstrap. Additionally, replace_address can legitimately be used
+     without bootstrapping to help with recovery of nodes with partially failed disks.
+     See CASSANDRA-10134 for more details.
+   - Key cache will only hold indexed entries up to the size configured by
+     column_index_cache_size_in_kb in cassandra.yaml in memory. Larger indexed entries
+     will never go into memory. See CASSANDRA-11206 for more details.
+   - For tables having a default_time_to_live, specifying a TTL of 0 will remove the TTL
+     from the inserted or updated values (see the example at the end of this section).
+   - Startup is now aborted if corrupted transaction log files are found. The details
+     of the affected log files are now logged, allowing the operator to decide how
+     to resolve the situation.
+   - Filtering expressions are made more pluggable and can be added programmatically via
+     a QueryHandler implementation. See CASSANDRA-11295 for more details.
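+     As an example of the default_time_to_live item above (ks.t and its columns are
+     placeholders), the following write stores its values with no TTL even though the table
+     defines a default_time_to_live:
+       INSERT INTO ks.t (k, v) VALUES (1, 'x') USING TTL 0;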
+
+
+3.4
+=====
+
+New features
+------------
+    - Internal authentication now supports caching of encrypted credentials.
+      Reference cassandra.yaml:credentials_validity_in_ms
+    - Remote configuration of auth caches via JMX can be disabled using the
+      system property cassandra.disable_auth_caches_remote_configuration
+    - The sstabledump tool has been added as the 3.0 version of the former sstable2json
+      tool. It only supports v3.0+ SSTables. See the tool's help for more detail.
+
+Upgrading
+---------
+    - Nothing specific to 3.4 but please see previous versions upgrading section,
+      especially if you are upgrading from 2.2.
+
 Deprecation
 -----------
-   - DateTieredCompactionStrategy has been deprecated - new tables should use
-     TimeWindowCompactionStrategy. Note that migrating an existing DTCS-table to TWCS might
-     cause increased compaction load for a while after the migration so make sure you run
-     tests before migrating. Read CASSANDRA-9666 for background on this.
+    - The mbean interfaces org.apache.cassandra.auth.PermissionsCacheMBean and
+      org.apache.cassandra.auth.RolesCacheMBean are deprecated in favor of
+      org.apache.cassandra.auth.AuthCacheMBean. This generalized interface is
+      common across all caches in the auth subsystem. The specific mbean interfaces
+      for each individual cache will be removed in a subsequent major version.
+
+
+3.2
+===
 
 New features
 ------------
-   - TimeWindowCompactionStrategy has been added. This has proven to be a better approach
-     to time series compaction and new tables should use this instead of DTCS. See
-     CASSANDRA-9666 for details.
-
-3.0.6
-=====
-
-New features
-------------
-   - JSON timestamps are now in UTC and contain the timezone information, see
-     CASSANDRA-11137 for more details.
-
-3.0.5
-=====
-
-Upgrading
----------
-   - Nothing specific to this release, but please see previous versions upgrading section,
-     especially if you are upgrading from 2.2.
-
-3.0.4
-=====
-
-New features
-------------
-   - sstabledump tool is added to be 3.0 version of former sstable2json. The tool only
-     supports v3.0+ SSTables. See tool's help for more detail.
-
-Upgrading
----------
-   - Nothing specific to this release, but please see previous versions upgrading section,
-     especially if you are upgrading from 2.2.
-
-
-3.0.3
-=====
-
-New features
-------------
+   - We now make sure that a token does not exist in several data directories. This
+     means that we run one compaction strategy per data_file_directory and we use
+     one thread per directory to flush. Use nodetool relocatesstables to make sure your
+     tokens are in the correct place, or just wait and compaction will handle it. See
+     CASSANDRA-6696 for more details.
+   - In-flight commit log replay mutation bytes are now bounded to a maximum of 64 megabytes,
+     tunable via cassandra.commitlog_max_outstanding_replay_bytes.
+   - Support for type casting has been added to the selection clause.
    - Hinted handoff now supports compression. Reference cassandra.yaml:hints_compression.
      Note: hints compression is currently disabled by default.
 
 Upgrading
 ---------
-    - Nothing specific to 3.0.3 but please see previous versions upgrading section,
-      especially if you are upgrading from 2.2.
+   - The compression ratio metrics computation has been modified to be more accurate.
+   - Running Cassandra as root is prevented by default.
+   - JVM options are moved from cassandra-env.(sh|ps1) to the jvm.options file.
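+     As an illustration of the jvm.options format, each non-comment line holds a single JVM
+     flag (the flags shown here are placeholders):
+       -ea
+       -XX:+HeapDumpOnOutOfMemoryError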
+
+Deprecation
+-----------
+   - The Thrift API is deprecated and will be removed in Cassandra 4.0.
 
 
-3.0.1
+3.1
 =====
 
 Upgrading
@@ -313,8 +383,8 @@
      the legacy tables, so clients experience no disruption. Issuing DCL
      statements during an upgrade is not supported.
      Once all nodes are upgraded, an operator with superuser privileges should
-     drop the legacy tables, system_auth.users, system_auth.credentials and 
-     system_auth.permissions. Doing so will prompt Cassandra to switch over to 
+     drop the legacy tables, system_auth.users, system_auth.credentials and
+     system_auth.permissions. Doing so will prompt Cassandra to switch over to
      the new tables without requiring any further intervention.
      While the legacy tables are present a restarted node will re-run the data
      conversion and report the outcome so that operators can verify that it is
@@ -483,8 +553,8 @@
     - cqlsh will now display timestamps with a UTC timezone. Previously,
       timestamps were displayed with the local timezone.
     - Commit log files are no longer recycled by default, due to negative
-      performance implications. This can be enabled again with the 
-      commitlog_segment_recycling option in your cassandra.yaml 
+      performance implications. This can be enabled again with the
+      commitlog_segment_recycling option in your cassandra.yaml
     - JMX methods set/getCompactionStrategyClass have been deprecated, use
       set/getCompactionParameters/set/getCompactionParametersJson instead
 
@@ -613,7 +683,7 @@
 Upgrading
 ---------
    - commitlog_sync_batch_window_in_ms behavior has changed from the
-     maximum time to wait between fsync to the minimum time.  We are 
+     maximum time to wait between fsync to the minimum time.  We are
      working on making this more user-friendly (see CASSANDRA-9533) but in the
      meantime, this means 2.1 needs a much smaller batch window to keep
      writer threads from starving.  The suggested default is now 2ms.
diff --git a/NOTICE.txt b/NOTICE.txt
index a20994f..1c552fc 100644
--- a/NOTICE.txt
+++ b/NOTICE.txt
@@ -83,3 +83,6 @@
 ASM
 (http://asm.ow2.org/)
 Copyright (c) 2000-2011 INRIA, France Telecom
+
+HdrHistogram
+(http://hdrhistogram.org)
diff --git a/bin/cassandra b/bin/cassandra
index c968c35..3206fdc 100755
--- a/bin/cassandra
+++ b/bin/cassandra
@@ -211,7 +211,7 @@
 }
 
 # Parse any command line options.
-args=`getopt vfhp:bD:H:E: "$@"`
+args=`getopt vRfhp:bD:H:E: "$@"`
 eval set -- "$args"
 
 classname="org.apache.cassandra.service.CassandraDaemon"
@@ -234,6 +234,10 @@
             "$JAVA" -cp "$CLASSPATH" "-Dlogback.configurationFile=logback-tools.xml" org.apache.cassandra.tools.GetVersion
             exit 0
         ;;
+        -R)
+            allow_root="yes"
+            shift
+        ;;
         -D)
             properties="$properties -D$2"
             shift 2
@@ -248,15 +252,27 @@
         ;;
         --)
             shift
+            if [ "x$*" != "x" ] ; then
+                echo "Error parsing arguments! Unknown argument \"$*\"" >&2
+                exit 1
+            fi
             break
         ;;
         *)
-            echo "Error parsing arguments!" >&2
+            echo "Error parsing arguments! Unknown argument \"$1\"" >&2
             exit 1
         ;;
     esac
 done
 
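+# Refuse to start as the root user or group unless the -R option was given (see CASSANDRA-8142).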
+if [ "x$allow_root" != "xyes" ] ; then
+    if [ "`id -u`" = "0" ] || [ "`id -g`" = "0" ] ; then
+        echo "Running Cassandra as root user or group is not recommended - please start Cassandra using a different system user."
+        echo "If you really want to force running Cassandra as root, use -R command line option."
+        exit 1
+    fi
+fi
+
 # see CASSANDRA-7254
 "$JAVA" -cp "$CLASSPATH" $JVM_OPTS 2>&1 | grep -q 'Error: Exception thrown by the agent : java.lang.NullPointerException'
 if [ $? -ne "1" ]; then 
diff --git a/bin/cqlsh.py b/bin/cqlsh.py
index 1ce7cfc..ce85449 100644
--- a/bin/cqlsh.py
+++ b/bin/cqlsh.py
@@ -51,6 +51,10 @@
 if sys.version_info[0] != 2 or sys.version_info[1] != 7:
     sys.exit("\nCQL Shell supports only Python 2.7\n")
 
+# see CASSANDRA-10428
+if platform.python_implementation().startswith('Jython'):
+    sys.exit("\nCQL Shell does not run on Jython\n")
+
 UTF8 = 'utf-8'
 CP65001 = 'cp65001'  # Win utf-8 variant
 
@@ -70,7 +74,7 @@
 CQL_LIB_PREFIX = 'cassandra-driver-internal-only-'
 
 CASSANDRA_PATH = os.path.join(os.path.dirname(os.path.realpath(__file__)), '..')
-CASSANDRA_CQL_HTML_FALLBACK = 'https://cassandra.apache.org/doc/cql3/CQL-3.0.html'
+CASSANDRA_CQL_HTML_FALLBACK = 'https://cassandra.apache.org/doc/cql3/CQL-3.2.html'
 
 # default location of local CQL.html
 if os.path.exists(CASSANDRA_PATH + '/doc/cql3/CQL.html'):
@@ -163,22 +167,23 @@
 from cqlshlib import cql3handling, cqlhandling, pylexotron, sslhandling
 from cqlshlib.copyutil import ExportTask, ImportTask
 from cqlshlib.displaying import (ANSI_RESET, BLUE, COLUMN_NAME_COLORS, CYAN,
-                                 RED, FormattedValue, colorme)
+                                 RED, WHITE, FormattedValue, colorme)
 from cqlshlib.formatting import (DEFAULT_DATE_FORMAT, DEFAULT_NANOTIME_FORMAT,
-                                 DEFAULT_TIMESTAMP_FORMAT, DateTimeFormat,
-                                 format_by_type, format_value_utype,
-                                 formatter_for)
+                                 DEFAULT_TIMESTAMP_FORMAT, CqlType, DateTimeFormat,
+                                 format_by_type, formatter_for)
 from cqlshlib.tracing import print_trace, print_trace_session
 from cqlshlib.util import get_file_encoding_bomsize, trim_if_present
 
 DEFAULT_HOST = '127.0.0.1'
 DEFAULT_PORT = 9042
-DEFAULT_CQLVER = '3.4.0'
+DEFAULT_SSL = False
+DEFAULT_CQLVER = '3.4.2'
 DEFAULT_PROTOCOL_VERSION = 4
 DEFAULT_CONNECT_TIMEOUT_SECONDS = 5
 DEFAULT_REQUEST_TIMEOUT_SECONDS = 10
 
 DEFAULT_FLOAT_PRECISION = 5
+DEFAULT_DOUBLE_PRECISION = 5
 DEFAULT_MAX_TRACE_WAIT = 10
 
 if readline is not None and readline.__doc__ is not None and 'libedit' in readline.__doc__:
@@ -473,8 +478,9 @@
                        'MAXATTEMPTS', 'REPORTFREQUENCY', 'DECIMALSEP', 'THOUSANDSSEP', 'BOOLSTYLE',
                        'NUMPROCESSES', 'CONFIGFILE', 'RATEFILE']
 COPY_FROM_OPTIONS = ['CHUNKSIZE', 'INGESTRATE', 'MAXBATCHSIZE', 'MINBATCHSIZE', 'MAXROWS',
-                     'SKIPROWS', 'SKIPCOLS', 'MAXPARSEERRORS', 'MAXINSERTERRORS', 'ERRFILE', 'PREPAREDSTATEMENTS']
-COPY_TO_OPTIONS = ['ENCODING', 'PAGESIZE', 'PAGETIMEOUT', 'BEGINTOKEN', 'ENDTOKEN', 'MAXOUTPUTSIZE', 'MAXREQUESTS']
+                     'SKIPROWS', 'SKIPCOLS', 'MAXPARSEERRORS', 'MAXINSERTERRORS', 'ERRFILE', 'PREPAREDSTATEMENTS', 'TTL']
+COPY_TO_OPTIONS = ['ENCODING', 'PAGESIZE', 'PAGETIMEOUT', 'BEGINTOKEN', 'ENDTOKEN', 'MAXOUTPUTSIZE', 'MAXREQUESTS',
+                   'FLOATPRECISION', 'DOUBLEPRECISION']
 
 
 @cqlsh_syntax_completer('copyOption', 'optnames')
@@ -571,14 +577,14 @@
     return ver, vertuple
 
 
-def format_value(val, output_encoding, addcolor=False, date_time_format=None,
+def format_value(val, cqltype, encoding, addcolor=False, date_time_format=None,
                  float_precision=None, colormap=None, nullval=None):
     if isinstance(val, DecodeError):
         if addcolor:
             return colorme(repr(val.thebytes), colormap, 'error')
         else:
             return FormattedValue(repr(val.thebytes))
-    return format_by_type(type(val), val, output_encoding, colormap=colormap,
+    return format_by_type(val, cqltype=cqltype, encoding=encoding, colormap=colormap,
                           addcolor=addcolor, nullval=nullval, date_time_format=date_time_format,
                           float_precision=float_precision)
 
@@ -595,30 +601,6 @@
 
 
 def insert_driver_hooks():
-    extend_cql_deserialization()
-    auto_format_udts()
-
-
-def extend_cql_deserialization():
-    """
-    The python driver returns BLOBs as string, but we expect them as bytearrays
-    the implementation of cassandra.cqltypes.BytesType.deserialize.
-
-    The deserializers package exists only when the driver has been compiled with cython extensions and
-    cassandra.deserializers.DesBytesType replaces cassandra.cqltypes.BytesType.deserialize.
-
-    DesBytesTypeByteArray is a fast deserializer that converts blobs into bytearrays but it was
-    only introduced recently (3.1.0). If it is available we use it, otherwise we remove
-    cassandra.deserializers.DesBytesType so that we fall back onto cassandra.cqltypes.BytesType.deserialize
-    just like in the case where no cython extensions are present.
-    """
-    if hasattr(cassandra, 'deserializers'):
-        if hasattr(cassandra.deserializers, 'DesBytesTypeByteArray'):
-            cassandra.deserializers.DesBytesType = cassandra.deserializers.DesBytesTypeByteArray
-        else:
-            del cassandra.deserializers.DesBytesType
-
-    cassandra.cqltypes.BytesType.deserialize = staticmethod(lambda byts, protocol_version: bytearray(byts))
 
     class DateOverFlowWarning(RuntimeWarning):
         pass
@@ -629,7 +611,8 @@
         try:
             return datetime_from_timestamp(timestamp_ms / 1000.0)
         except OverflowError:
-            warnings.warn(DateOverFlowWarning("Some timestamps are larger than Python datetime can represent. Timestamps are displayed in milliseconds from epoch."))
+            warnings.warn(DateOverFlowWarning("Some timestamps are larger than Python datetime can represent. "
+                                              "Timestamps are displayed in milliseconds from epoch."))
             return timestamp_ms
 
     cassandra.cqltypes.DateType.deserialize = staticmethod(deserialize_date_fallback_int)
@@ -641,27 +624,6 @@
     cassandra.cqltypes.CassandraType.support_empty_values = True
 
 
-def auto_format_udts():
-    # when we see a new user defined type, set up the shell formatting for it
-    udt_apply_params = cassandra.cqltypes.UserType.apply_parameters
-
-    def new_apply_params(cls, *args, **kwargs):
-        udt_class = udt_apply_params(*args, **kwargs)
-        formatter_for(udt_class.typename)(format_value_utype)
-        return udt_class
-
-    cassandra.cqltypes.UserType.udt_apply_parameters = classmethod(new_apply_params)
-
-    make_udt_class = cassandra.cqltypes.UserType.make_udt_class
-
-    def new_make_udt_class(cls, *args, **kwargs):
-        udt_class = make_udt_class(*args, **kwargs)
-        formatter_for(udt_class.tuple_type.__name__)(format_value_utype)
-        return udt_class
-
-    cassandra.cqltypes.UserType.make_udt_class = classmethod(new_make_udt_class)
-
-
 class FrozenType(cassandra.cqltypes._ParameterizedType):
     """
     Needed until the bundled python driver adds FrozenType.
@@ -706,6 +668,7 @@
                  display_timestamp_format=DEFAULT_TIMESTAMP_FORMAT,
                  display_date_format=DEFAULT_DATE_FORMAT,
                  display_float_precision=DEFAULT_FLOAT_PRECISION,
+                 display_double_precision=DEFAULT_DOUBLE_PRECISION,
                  display_timezone=None,
                  max_trace_wait=DEFAULT_MAX_TRACE_WAIT,
                  ssl=False,
@@ -755,6 +718,7 @@
         self.display_date_format = display_date_format
 
         self.display_float_precision = display_float_precision
+        self.display_double_precision = display_double_precision
 
         self.display_timezone = display_timezone
 
@@ -765,10 +729,6 @@
 
         self.current_keyspace = keyspace
 
-        self.display_timestamp_format = display_timestamp_format
-        self.display_nanotime_format = display_nanotime_format
-        self.display_date_format = display_date_format
-
         self.max_trace_wait = max_trace_wait
         self.session.max_trace_wait = max_trace_wait
 
@@ -822,20 +782,22 @@
     def cqlver_atleast(self, major, minor=0, patch=0):
         return self.cql_ver_tuple[:3] >= (major, minor, patch)
 
-    def myformat_value(self, val, **kwargs):
+    def myformat_value(self, val, cqltype=None, **kwargs):
         if isinstance(val, DecodeError):
             self.decoding_errors.append(val)
         try:
             dtformats = DateTimeFormat(timestamp_format=self.display_timestamp_format,
                                        date_format=self.display_date_format, nanotime_format=self.display_nanotime_format,
                                        timezone=self.display_timezone)
-            return format_value(val, self.output_codec.name,
+            precision = self.display_double_precision if cqltype is not None and cqltype.type_name == 'double' \
+                else self.display_float_precision
+            return format_value(val, cqltype=cqltype, encoding=self.output_codec.name,
                                 addcolor=self.color, date_time_format=dtformats,
-                                float_precision=self.display_float_precision, **kwargs)
+                                float_precision=precision, **kwargs)
         except Exception, e:
             err = FormatError(val, e)
             self.decoding_errors.append(err)
-            return format_value(err, self.output_codec.name, addcolor=self.color)
+            return format_value(err, cqltype=cqltype, encoding=self.output_codec.name, addcolor=self.color)
 
     def myformat_colname(self, name, table_meta=None):
         column_colors = COLUMN_NAME_COLORS.copy()
@@ -845,6 +807,8 @@
                 column_colors.default_factory = lambda: RED
             elif name in [col.name for col in table_meta.clustering_key]:
                 column_colors.default_factory = lambda: CYAN
+            elif name in table_meta.columns and table_meta.columns[name].is_static:
+                column_colors.default_factory = lambda: WHITE
         return self.myformat_value(name, colormap=column_colors)
 
     def report_connection(self):
@@ -921,8 +885,7 @@
         except KeyError:
             raise UserTypeNotFound("User type %r not found" % typename)
 
-        return [(field_name, field_type.cql_parameterized_type())
-                for field_name, field_type in zip(user_type.field_names, user_type.field_types)]
+        return zip(user_type.field_names, user_type.field_types)
 
     def get_userfunction_names(self, ksname=None):
         if ksname is None:
@@ -1369,7 +1332,14 @@
             # print header only
             self.print_formatted_result(formatted_names, None)
             return
-        formatted_values = [map(self.myformat_value, row.values()) for row in rows]
+
+        cql_types = []
+        if table_meta:
+            ks_meta = self.conn.metadata.keyspaces[table_meta.keyspace_name]
+            cql_types = [CqlType(table_meta.columns[c].cql_type, ks_meta)
+                         if c in table_meta.columns else None for c in column_names]
+
+        formatted_values = [map(self.myformat_value, row.values(), cql_types) for row in rows]
 
         if self.expand_enabled:
             self.print_formatted_result_vertically(formatted_names, formatted_values)
@@ -1648,7 +1618,6 @@
         except KeyError:
             raise UserTypeNotFound("User type %r not found" % typename)
         print usertype.export_as_string()
-        print
 
     def describe_cluster(self):
         print '\nCluster: %s' % self.get_cluster_name()
@@ -1884,6 +1853,7 @@
                                     have to compile every batch statement. For large and oversized clusters
                                     this will result in a faster import but for smaller clusters it may generate
                                     timeouts.
+          TTL=3600                - the time to live in seconds; by default data will not expire
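+
+          For example (the keyspace, table and file names here are placeholders):
+            COPY ks.t FROM 'data.csv' WITH TTL=3600;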
 
         Available COPY TO options and defaults:
 
@@ -1896,6 +1866,8 @@
           MAXOUTPUTSIZE='-1'       - the maximum size of the output file measured in number of lines,
                                      beyond this maximum the output file will be split into segments,
                                      -1 means unlimited.
+          FLOATPRECISION=5         - the number of digits displayed after the decimal point for cql float values
+          DOUBLEPRECISION=12       - the number of digits displayed after the decimal point for cql double values
 
         When entering CSV data on STDIN, you can use the sequence "\."
         on a line by itself to end the data input.
@@ -2002,6 +1974,7 @@
                          display_date_format=self.display_date_format,
                          display_nanotime_format=self.display_nanotime_format,
                          display_float_precision=self.display_float_precision,
+                         display_double_precision=self.display_double_precision,
                          max_trace_wait=self.max_trace_wait)
         subshell.cmdloop()
         f.close()
@@ -2462,6 +2435,8 @@
                                                     DEFAULT_DATE_FORMAT)
     optvalues.float_precision = option_with_default(configs.getint, 'ui', 'float_precision',
                                                     DEFAULT_FLOAT_PRECISION)
+    optvalues.double_precision = option_with_default(configs.getint, 'ui', 'double_precision',
+                                                     DEFAULT_DOUBLE_PRECISION)
     optvalues.field_size_limit = option_with_default(configs.getint, 'csv', 'field_size_limit', csv.field_size_limit())
     optvalues.max_trace_wait = option_with_default(configs.getfloat, 'tracing', 'max_trace_wait',
                                                    DEFAULT_MAX_TRACE_WAIT)
@@ -2469,7 +2444,7 @@
 
     optvalues.debug = False
     optvalues.file = None
-    optvalues.ssl = False
+    optvalues.ssl = option_with_default(configs.getboolean, 'connection', 'ssl', DEFAULT_SSL)
     optvalues.encoding = option_with_default(configs.get, 'ui', 'encoding', UTF8)
 
     optvalues.tty = option_with_default(configs.getboolean, 'ui', 'tty', sys.stdin.isatty())
@@ -2584,6 +2559,7 @@
         sys.stderr.write("Using CQL driver: %s\n" % (cassandra,))
         sys.stderr.write("Using connect timeout: %s seconds\n" % (options.connect_timeout,))
         sys.stderr.write("Using '%s' encoding\n" % (options.encoding,))
+        sys.stderr.write("Using ssl: %s\n" % (options.ssl,))
 
     # create timezone based on settings, environment or auto-detection
     timezone = None
@@ -2631,6 +2607,7 @@
                       display_nanotime_format=options.nanotime_format,
                       display_date_format=options.date_format,
                       display_float_precision=options.float_precision,
+                      display_double_precision=options.double_precision,
                       display_timezone=timezone,
                       max_trace_wait=options.max_trace_wait,
                       ssl=options.ssl,
diff --git a/build.xml b/build.xml
index 19675e4..e13fdc0 100644
--- a/build.xml
+++ b/build.xml
@@ -25,7 +25,7 @@
     <property name="debuglevel" value="source,lines,vars"/>
 
     <!-- default version and SCM information -->
-    <property name="base.version" value="3.0.8"/>
+    <property name="base.version" value="3.9"/>
     <property name="scm.connection" value="scm:git://git.apache.org/cassandra.git"/>
     <property name="scm.developerConnection" value="scm:git://git.apache.org/cassandra.git"/>
     <property name="scm.url" value="http://git-wip-us.apache.org/repos/asf?p=cassandra.git;a=tree"/>
@@ -34,6 +34,7 @@
     <property name="basedir" value="."/>
     <property name="build.src" value="${basedir}/src"/>
     <property name="build.src.java" value="${basedir}/src/java"/>
+    <property name="build.src.antlr" value="${basedir}/src/antlr"/>
     <property name="build.src.jdkoverride" value="${basedir}/src/jdkoverride" />
     <property name="build.src.resources" value="${basedir}/src/resources"/>
     <property name="build.src.gen-java" value="${basedir}/src/gen-java"/>
@@ -66,6 +67,8 @@
     <property name="test.microbench.src" value="${test.dir}/microbench"/>
     <property name="dist.dir" value="${build.dir}/dist"/>
     <property name="tmp.dir" value="${java.io.tmpdir}"/>
+
+    <property name="doc.dir" value="${basedir}/doc"/>
 	
     <property name="source.version" value="1.8"/>
     <property name="target.version" value="1.8"/>
@@ -211,12 +214,15 @@
     -->
     <target name="check-gen-cql3-grammar">
         <uptodate property="cql3current"
-                srcfile="${build.src.java}/org/apache/cassandra/cql3/Cql.g"
-                targetfile="${build.src.gen-java}/org/apache/cassandra/cql3/Cql.tokens"/>
+                targetfile="${build.src.gen-java}/org/apache/cassandra/cql3/Cql.tokens">
+            <srcfiles dir="${build.src.antlr}">
+                <include name="*.g"/>
+            </srcfiles>
+        </uptodate>
     </target>
  
     <target name="gen-cql3-grammar" depends="check-gen-cql3-grammar" unless="cql3current">
-      <echo>Building Grammar ${build.src.java}/org/apache/cassandra/cql3/Cql.g  ...</echo>
+      <echo>Building Grammar ${build.src.antlr}/Cql.g  ...</echo>
       <java classname="org.antlr.Tool"
             classpath="${build.dir.lib}/jars/antlr-3.5.2.jar;${build.lib}/antlr-runtime-3.5.2.jar;${build.lib}/ST4-4.0.8.jar"
             fork="true"
@@ -224,7 +230,7 @@
          <jvmarg value="-Xmx512M" />
          <arg value="-Xconversiontimeout" />
          <arg value="10000" />
-         <arg value="${build.src.java}/org/apache/cassandra/cql3/Cql.g" />
+         <arg value="${build.src.antlr}/Cql.g" />
          <arg value="-fo" />
          <arg value="${build.src.gen-java}/org/apache/cassandra/cql3/" />
          <arg value="-Xmaxinlinedfastates"/>
@@ -245,6 +251,26 @@
         </wikitext-to-html>
     </target>
 
+    <target name="gen-doc" depends="maven-ant-tasks-init" description="Generate documentation">
+        <exec executable="make" osfamily="unix" dir="${doc.dir}">
+            <arg value="html"/>
+        </exec>
+        <exec executable="cmd" osfamily="dos" dir="${doc.dir}">
+            <arg value="/c"/>
+            <arg value="make.bat"/>
+            <arg value="html"/>
+        </exec>
+    </target>
+
+    <!--
+        Generates Java sources for tokenization support from jflex
+        grammar files
+    -->
+    <target name="generate-jflex-java" description="Generate Java from jflex grammar">
+        <taskdef classname="jflex.anttask.JFlexTask" classpath="${build.lib}/jflex-1.6.0.jar" name="jflex" />
+        <jflex file="${build.src.java}/org/apache/cassandra/index/sasi/analyzer/StandardTokenizerImpl.jflex" destdir="${build.src.gen-java}/" />
+    </target>
+
     <!--
        Fetch Maven Ant Tasks and Cassandra's dependencies
        These targets are intentionally free of dependencies so that they
@@ -337,6 +363,7 @@
           <dependency groupId="net.jpountz.lz4" artifactId="lz4" version="1.3.0"/>
           <dependency groupId="com.ning" artifactId="compress-lzf" version="0.8.4"/>
           <dependency groupId="com.google.guava" artifactId="guava" version="18.0"/>
+          <dependency groupId="org.hdrhistogram" artifactId="HdrHistogram" version="2.1.9"/>
           <dependency groupId="commons-cli" artifactId="commons-cli" version="1.1"/>
           <dependency groupId="commons-codec" artifactId="commons-codec" version="1.2"/>
           <dependency groupId="org.apache.commons" artifactId="commons-lang3" version="3.1"/>
@@ -361,6 +388,7 @@
 	
           <dependency groupId="com.thinkaurelius.thrift" artifactId="thrift-server" version="0.3.7">
             <exclusion groupId="org.slf4j" artifactId="slf4j-log4j12"/>
+            <exclusion groupId="junit" artifactId="junit"/>
           </dependency>
           <dependency groupId="org.yaml" artifactId="snakeyaml" version="1.11"/>
           <dependency groupId="org.apache.thrift" artifactId="libthrift" version="0.9.2">
@@ -398,10 +426,15 @@
           <dependency groupId="com.addthis.metrics" artifactId="reporter-config3" version="3.0.0" />
           <dependency groupId="org.mindrot" artifactId="jbcrypt" version="0.3m" />
           <dependency groupId="io.airlift" artifactId="airline" version="0.6" />
-          <dependency groupId="io.netty" artifactId="netty-all" version="4.0.23.Final" />
+          <dependency groupId="io.netty" artifactId="netty-all" version="4.0.39.Final" />
           <dependency groupId="com.google.code.findbugs" artifactId="jsr305" version="2.0.2" />
           <dependency groupId="com.clearspring.analytics" artifactId="stream" version="2.5.2" />
-          <dependency groupId="com.datastax.cassandra" artifactId="cassandra-driver-core" version="3.0.1" classifier="shaded" />
+          <dependency groupId="com.datastax.cassandra" artifactId="cassandra-driver-core" version="3.0.1" classifier="shaded">
+            <exclusion groupId="io.netty" artifactId="netty-buffer"/>
+            <exclusion groupId="io.netty" artifactId="netty-codec"/>
+            <exclusion groupId="io.netty" artifactId="netty-handler"/>
+            <exclusion groupId="io.netty" artifactId="netty-transport"/>
+          </dependency>
           <dependency groupId="org.eclipse.jdt.core.compiler" artifactId="ecj" version="4.4.2" />
           <dependency groupId="org.caffinitas.ohc" artifactId="ohc-core" version="0.4.3" />
           <dependency groupId="org.caffinitas.ohc" artifactId="ohc-core-j8" version="0.4.3" />
@@ -410,7 +443,12 @@
           	<exclusion groupId="log4j" artifactId="log4j"/>
           </dependency>
           <dependency groupId="joda-time" artifactId="joda-time" version="2.4" />
-
+          <dependency groupId="com.carrotsearch" artifactId="hppc" version="0.5.4" />
+          <dependency groupId="de.jflex" artifactId="jflex" version="1.6.0" />
+          <dependency groupId="net.mintern" artifactId="primitive" version="1.0" />
+          <dependency groupId="com.github.rholder" artifactId="snowball-stemmer" version="1.3.0.581.1" />
+          <dependency groupId="com.googlecode.concurrent-trees" artifactId="concurrent-trees" version="2.4.0" />
+	  <dependency groupId="com.github.ben-manes.caffeine" artifactId="caffeine" version="2.2.6" />
         </dependencyManagement>
         <developer id="alakshman" name="Avinash Lakshman"/>
         <developer id="aleksey" name="Aleksey Yeschenko"/>
@@ -454,7 +492,12 @@
       	<dependency groupId="org.apache.hadoop" artifactId="hadoop-minicluster"/>
       	<dependency groupId="com.google.code.findbugs" artifactId="jsr305"/>
         <dependency groupId="org.antlr" artifactId="antlr"/>
-        <dependency groupId="com.datastax.cassandra" artifactId="cassandra-driver-core" classifier="shaded"/>
+        <dependency groupId="com.datastax.cassandra" artifactId="cassandra-driver-core" classifier="shaded">
+          <exclusion groupId="io.netty" artifactId="netty-buffer"/>
+          <exclusion groupId="io.netty" artifactId="netty-codec"/>
+          <exclusion groupId="io.netty" artifactId="netty-handler"/>
+          <exclusion groupId="io.netty" artifactId="netty-transport"/>
+        </dependency>
         <dependency groupId="org.eclipse.jdt.core.compiler" artifactId="ecj"/>
         <dependency groupId="org.caffinitas.ohc" artifactId="ohc-core" version="0.4.3" />
         <dependency groupId="org.caffinitas.ohc" artifactId="ohc-core-j8" version="0.4.3" />
@@ -471,7 +514,13 @@
                 artifactId="cassandra-parent"
                 version="${version}"/>
         <dependency groupId="junit" artifactId="junit"/>
-        <dependency groupId="com.datastax.cassandra" artifactId="cassandra-driver-core" classifier="shaded"/>
+        <dependency groupId="com.datastax.cassandra" artifactId="cassandra-driver-core" classifier="shaded">
+          <exclusion groupId="io.netty" artifactId="netty-buffer"/>
+          <exclusion groupId="io.netty" artifactId="netty-codec"/>
+          <exclusion groupId="io.netty" artifactId="netty-handler"/>
+          <exclusion groupId="io.netty" artifactId="netty-transport"/>
+        </dependency>
+        <dependency groupId="io.netty" artifactId="netty-all"/>
         <dependency groupId="org.eclipse.jdt.core.compiler" artifactId="ecj"/>
         <dependency groupId="org.caffinitas.ohc" artifactId="ohc-core"/>
         <dependency groupId="org.openjdk.jmh" artifactId="jmh-core"/>
@@ -546,7 +595,12 @@
         <dependency groupId="org.apache.hadoop" artifactId="hadoop-minicluster" optional="true"/>
 
         <!-- don't need the Java Driver to run, but if you use the hadoop stuff or UDFs -->
-        <dependency groupId="com.datastax.cassandra" artifactId="cassandra-driver-core" classifier="shaded" optional="true"/>
+        <dependency groupId="com.datastax.cassandra" artifactId="cassandra-driver-core" classifier="shaded" optional="true">
+          <exclusion groupId="io.netty" artifactId="netty-buffer"/>
+          <exclusion groupId="io.netty" artifactId="netty-codec"/>
+          <exclusion groupId="io.netty" artifactId="netty-handler"/>
+          <exclusion groupId="io.netty" artifactId="netty-transport"/>
+        </dependency>
 
         <!-- don't need jna to run, but nice to have -->
         <dependency groupId="net.java.dev.jna" artifactId="jna"/>
@@ -559,6 +613,7 @@
         <dependency groupId="org.fusesource" artifactId="sigar"/>
         <dependency groupId="org.eclipse.jdt.core.compiler" artifactId="ecj"/>
         <dependency groupId="org.caffinitas.ohc" artifactId="ohc-core"/>
+	<dependency groupId="com.github.ben-manes.caffeine" artifactId="caffeine" />
       </artifact:pom>
       <artifact:pom id="thrift-pom"
                     artifactId="cassandra-thrift"
@@ -573,6 +628,12 @@
         <dependency groupId="org.slf4j" artifactId="log4j-over-slf4j"/>
         <dependency groupId="org.slf4j" artifactId="jcl-over-slf4j"/>
         <dependency groupId="org.apache.thrift" artifactId="libthrift"/>
+        <dependency groupId="com.carrotsearch" artifactId="hppc" version="0.5.4" />
+        <dependency groupId="de.jflex" artifactId="jflex" version="1.6.0" />
+        <dependency groupId="net.mintern" artifactId="primitive" version="1.0" />
+        <dependency groupId="com.github.rholder" artifactId="snowball-stemmer" version="1.3.0.581.1" />
+        <dependency groupId="com.googlecode.concurrent-trees" artifactId="concurrent-trees" version="2.4.0" />
+
       </artifact:pom>
       <artifact:pom id="clientutil-pom"
                     artifactId="cassandra-clientutil"
@@ -739,19 +800,19 @@
         depends="maven-ant-tasks-retrieve-build,build-project" description="Compile Cassandra classes"/>
     <target name="codecoverage" depends="jacoco-run,jacoco-report" description="Create code coverage report"/>
 
-    <target depends="init,gen-cql3-grammar,generate-cql-html"
+    <target depends="init,gen-cql3-grammar,generate-cql-html,generate-jflex-java"
             name="build-project">
         <echo message="${ant.project.name}: ${ant.file}"/>
         <!-- Order matters! -->
         <javac fork="true"
-               debug="true" debuglevel="${debuglevel}"
+               debug="true" debuglevel="${debuglevel}" encoding="utf-8"
                destdir="${build.classes.thrift}" includeantruntime="false" source="${source.version}" target="${target.version}"
                memorymaximumsize="512M">
             <src path="${interface.thrift.dir}/gen-java"/>
             <classpath refid="cassandra.classpath"/>
         </javac>
         <javac fork="true"
-               debug="true" debuglevel="${debuglevel}"
+               debug="true" debuglevel="${debuglevel}" encoding="utf-8"
                destdir="${build.classes.main}" includeantruntime="false" source="${source.version}" target="${target.version}"
                memorymaximumsize="512M">
             <src path="${build.src.java}"/>
@@ -777,7 +838,7 @@
     </path>
     <target name="stress-build" depends="build" description="build stress tool">
     	<mkdir dir="${stress.build.classes}" />
-        <javac debug="true" debuglevel="${debuglevel}" destdir="${stress.build.classes}" includeantruntime="true" source="${source.version}" target="${target.version}">
+        <javac compiler="modern" debug="true" debuglevel="${debuglevel}" encoding="utf-8" destdir="${stress.build.classes}" includeantruntime="true" source="${source.version}" target="${target.version}">
             <src path="${stress.build.src}" />
             <classpath>
                 <path refid="cassandra.classes" />
@@ -788,6 +849,9 @@
                 </path>
             </classpath>
         </javac>
+        <copy todir="${stress.build.classes}">
+            <fileset dir="${stress.build.src}/resources" />
+        </copy>
     </target>
 
 	<target name="_write-poms" depends="maven-declare-dependencies">
@@ -958,7 +1022,7 @@
     </target>
 
     <!-- creates release tarballs -->
-    <target name="artifacts" depends="jar,javadoc"
+    <target name="artifacts" depends="jar,javadoc,gen-doc"
             description="Create Cassandra release artifacts">
       <mkdir dir="${dist.dir}"/>
       <!-- fix the control linefeed so that builds on windows works on linux -->
@@ -978,9 +1042,15 @@
       </copy>
       <copy todir="${dist.dir}/doc">
         <fileset dir="doc">
-          <exclude name="cql3/CQL.textile"/>
+          <include name="cql3/CQL.html" />
+          <include name="cql3/CQL.css" />
+          <include name="SASI.md" />
         </fileset>
       </copy>
+      <copy todir="${dist.dir}/doc/html">
+        <fileset dir="doc" />
+        <globmapper from="build/html/*" to="*"/>
+      </copy>
       <copy todir="${dist.dir}/bin">
         <fileset dir="bin"/>
       </copy>
@@ -1107,12 +1177,14 @@
 
   <target name="build-test" depends="build" description="Compile test classes">
     <javac
+     compiler="modern"
      debug="true"
      debuglevel="${debuglevel}"
      destdir="${test.classes}"
      includeantruntime="false"
      source="${source.version}" 
-     target="${target.version}">
+     target="${target.version}"
+     encoding="utf-8">
      <classpath>
         <path refid="cassandra.classpath"/>
      </classpath>
@@ -1188,9 +1260,20 @@
             <filelist dir="@{inputdir}" files="@{filelist}"/>
         </batchtest>
       </junit>
-      <delete quiet="true" failonerror="false" dir="${build.test.dir}/cassandra/commitlog:@{poffset}"/>
-      <delete quiet="true" failonerror="false" dir="${build.test.dir}/cassandra/data:@{poffset}"/>
-      <delete quiet="true" failonerror="false" dir="${build.test.dir}/cassandra/saved_caches:@{poffset}"/>
+
+      <condition property="fileSep" value=";">
+        <os family="windows"/>
+      </condition>
+      <condition property="fileSep" else=":">
+        <isset property="fileSep"/>
+      </condition>
+      <fail unless="fileSep">Failed to set File Separator. This shouldn't happen.</fail>
+
+      <delete quiet="true" failonerror="false" dir="${build.test.dir}/cassandra/commitlog${fileSep}@{poffset}"/>
+      <delete quiet="true" failonerror="false" dir="${build.test.dir}/cassandra/cdc_raw${fileSep}@{poffset}"/>
+      <delete quiet="true" failonerror="false" dir="${build.test.dir}/cassandra/data${fileSep}@{poffset}"/>
+      <delete quiet="true" failonerror="false" dir="${build.test.dir}/cassandra/saved_caches${fileSep}@{poffset}"/>
+      <delete quiet="true" failonerror="false" dir="${build.test.dir}/cassandra/hints${fileSep}@{poffset}"/>
     </sequential>
   </macrodef>
 
@@ -1285,6 +1368,25 @@
     </sequential>
   </macrodef>
 
+  <macrodef name="testlist-cdc">
+    <attribute name="test.file.list" />
+    <attribute name="testlist.offset" />
+    <sequential>
+      <property name="cdc_yaml" value="${build.test.dir}/cassandra.cdc.yaml"/>
+      <testmacrohelper inputdir="${test.unit.src}" filelist="@{test.file.list}" poffset="@{testlist.offset}"
+                       exclude="**/*.java" timeout="${test.timeout}" testtag="cdc">
+        <jvmarg value="-Dlegacy-sstable-root=${test.data}/legacy-sstables"/>
+        <jvmarg value="-Dinvalid-legacy-sstable-root=${test.data}/invalid-legacy-sstables"/>
+        <jvmarg value="-Dmigration-sstable-root=${test.data}/migration-sstables"/>
+        <jvmarg value="-Dcassandra.ring_delay_ms=1000"/>
+        <jvmarg value="-Dcassandra.tolerate_sstable_size=true"/>
+        <jvmarg value="-Dcassandra.config=file:///${cdc_yaml}"/>
+        <jvmarg value="-Dcassandra.skip_sync=true" />
+        <jvmarg value="-Dcassandra.config.loader=org.apache.cassandra.OffsetAwareConfigurationLoader"/>
+      </testmacrohelper>
+    </sequential>
+  </macrodef>
+
   <!--
     Run named ant task with jacoco, such as "ant jacoco-run -Dtaskname=test"
     the target run must enable the jacoco agent if usejacoco is 'yes' -->
@@ -1325,13 +1427,26 @@
     <testparallel testdelegate="testlist-compression" />
   </target>
 
+  <target name="test-cdc" depends="build-test" description="Execute unit tests with change-data-capture enabled">
+    <property name="cdc_yaml" value="${build.test.dir}/cassandra.cdc.yaml"/>
+    <concat destfile="${cdc_yaml}">
+      <fileset file="${test.conf}/cassandra.yaml"/>
+      <fileset file="${test.conf}/cdc.yaml"/>
+    </concat>
+    <path id="all-test-classes-path">
+      <fileset dir="${test.unit.src}" includes="**/${test.name}.java" />
+    </path>
+    <property name="all-test-classes" refid="all-test-classes-path"/>
+    <testparallel testdelegate="testlist-cdc" />
+  </target>
+
   <target name="msg-ser-gen-test" depends="build-test" description="Generates message serializations">
     <testmacro inputdir="${test.unit.src}"
         timeout="${test.timeout}" filter="**/SerializationsTest.java">
       <jvmarg value="-Dcassandra.test-serialization-writes=True"/>
     </testmacro>
   </target>
-  
+
   <target name="msg-ser-test" depends="build-test" description="Tests message serializations">
       <testmacro inputdir="${test.unit.src}" timeout="${test.timeout}"
                filter="**/SerializationsTest.java"/>
diff --git a/conf/cassandra-env.ps1 b/conf/cassandra-env.ps1
index f037b56..9e2f50d 100644
--- a/conf/cassandra-env.ps1
+++ b/conf/cassandra-env.ps1
@@ -436,10 +436,6 @@
         exit
     }
 
-    # enable assertions.  disabling this in production will give a modest
-    # performance benefit (around 5%).
-    $env:JVM_OPTS = "$env:JVM_OPTS -ea"
-
     # Specifies the default port over which Cassandra will be available for
     # JMX connections.
     $JMX_PORT="7199"
@@ -447,50 +443,11 @@
     # store in env to check if it's avail in verification
     $env:JMX_PORT=$JMX_PORT
 
-    # enable thread priorities, primarily so we can give periodic tasks
-    # a lower priority to avoid interfering with client workload
-    $env:JVM_OPTS="$env:JVM_OPTS -XX:+UseThreadPriorities"
-    # allows lowering thread priority without being root on linux - probably
-    # not necessary on Windows but doesn't harm anything.
-    # see http://tech.stolsvik.com/2010/01/linux-java-thread-priorities-workar
-    $env:JVM_OPTS="$env:JVM_OPTS -XX:ThreadPriorityPolicy=42"
-
-    $env:JVM_OPTS="$env:JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"
-
-    # Per-thread stack size.
-    $env:JVM_OPTS="$env:JVM_OPTS -Xss256k"
-
-    # Larger interned string table, for gossip's benefit (CASSANDRA-6410)
-    $env:JVM_OPTS="$env:JVM_OPTS -XX:StringTableSize=1000003"
-
-    # Make sure all memory is faulted and zeroed on startup.
-    # This helps prevent soft faults in containers and makes
-    # transparent hugepage allocation more effective.
-    #$env:JVM_OPTS="$env:JVM_OPTS -XX:+AlwaysPreTouch"
-
-    # Biased locking does not benefit Cassandra.
-    $env:JVM_OPTS="$env:JVM_OPTS -XX:-UseBiasedLocking"
-
-    # Enable thread-local allocation blocks and allow the JVM to automatically
-    # resize them at runtime.
-    $env:JVM_OPTS="$env:JVM_OPTS -XX:+UseTLAB -XX:+ResizeTLAB"
-
-    # http://www.evanjones.ca/jvm-mmap-pause.html
-    $env:JVM_OPTS="$env:JVM_OPTS -XX:+PerfDisableSharedMem"
-
     # Configure the following for JEMallocAllocator and if jemalloc is not available in the system
     # library path.
     # set LD_LIBRARY_PATH=<JEMALLOC_HOME>/lib/
     # $env:JVM_OPTS="$env:JVM_OPTS -Djava.library.path=<JEMALLOC_HOME>/lib/"
 
-    # uncomment to have Cassandra JVM listen for remote debuggers/profilers on port 1414
-    # $env:JVM_OPTS="$env:JVM_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=1414"
-
-    # Prefer binding to IPv4 network intefaces (when net.ipv6.bindv6only=1). See
-    # http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6342561 (short version:
-    # comment out this entry to enable IPv6 support).
-    $env:JVM_OPTS="$env:JVM_OPTS -Djava.net.preferIPv4Stack=true"
-
     # jmx: metrics and administration interface
     #
     # add this if you're having trouble connecting:
@@ -506,12 +463,37 @@
     # with authentication and ssl enabled. See https://wiki.apache.org/cassandra/JmxSecurity
     #
     #$env:JVM_OPTS="$env:JVM_OPTS -Dcom.sun.management.jmxremote.port=$JMX_PORT"
-    #$env:JVM_OPTS="$env:JVM_OPTS -Dcom.sun.management.jmxremote.ssl=false"
+    #$env:JVM_OPTS="$env:JVM_OPTS -Dcom.sun.management.jmxremote.rmi.port=$JMX_PORT"
+    #
+    # JMX SSL options
+    #$env:JVM_OPTS="$env:JVM_OPTS -Dcom.sun.management.jmxremote.ssl=true"
+    #$env:JVM_OPTS="$env:JVM_OPTS -Dcom.sun.management.jmxremote.ssl.need.client.auth=true"
+    #$env:JVM_OPTS="$env:JVM_OPTS -Dcom.sun.management.jmxremote.ssl.enabled.protocols=<enabled-protocols>"
+    #$env:JVM_OPTS="$env:JVM_OPTS -Dcom.sun.management.jmxremote.ssl.enabled.cipher.suites=<enabled-cipher-suites>"
+    #$env:JVM_OPTS="$env:JVM_OPTS -Djavax.net.ssl.keyStore=C:/keystore"
+    #$env:JVM_OPTS="$env:JVM_OPTS -Djavax.net.ssl.keyStorePassword=<keystore-password>"
+    #$env:JVM_OPTS="$env:JVM_OPTS -Djavax.net.ssl.trustStore=C:/truststore"
+    #$env:JVM_OPTS="$env:JVM_OPTS -Djavax.net.ssl.trustStorePassword=<truststore-password>"
+    #
+    # JMX auth options
     #$env:JVM_OPTS="$env:JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=true"
+    ## Basic file based authn & authz
     #$env:JVM_OPTS="$env:JVM_OPTS -Dcom.sun.management.jmxremote.password.file=C:/jmxremote.password"
-    $env:JVM_OPTS="$env:JVM_OPTS -Dcassandra.jmx.local.port=$JMX_PORT -XX:+DisableExplicitGC"
+    #$env:JVM_OPTS="$env:JVM_OPTS -Dcom.sun.management.jmxremote.access.file=C:/jmxremote.access"
+
+    ## Custom auth settings which can be used as alternatives to JMX's out of the box auth utilities.
+    ## JAAS login modules can be used for authentication by uncommenting these two properties.
+    ## Cassandra ships with a LoginModule implementation - org.apache.cassandra.auth.CassandraLoginModule -
+    ## which delegates to the IAuthenticator configured in cassandra.yaml
+    #$env:JVM_OPTS="$env:JVM_OPTS -Dcassandra.jmx.remote.login.config=CassandraLogin"
+    #$env:JVM_OPTS="$env:JVM_OPTS -Djava.security.auth.login.config=C:/cassandra-jaas.config"
+
+    ## Cassandra also ships with a helper for delegating JMX authz calls to the configured IAuthorizer,
+    ## uncomment this to use it. Requires one of the two authentication options to be enabled
+    #$env:JVM_OPTS="$env:JVM_OPTS -Dcassandra.jmx.authorizer=org.apache.cassandra.auth.jmx.AuthorizationProxy"
+
+    # Default JMX setup, bound to local loopback address only
+    $env:JVM_OPTS="$env:JVM_OPTS -Dcassandra.jmx.local.port=$JMX_PORT"
 
     $env:JVM_OPTS="$env:JVM_OPTS $env:JVM_EXTRA_OPTS"
-
-    #$env:JVM_OPTS="$env:JVM_OPTS -XX:+UnlockCommercialFeatures -XX:+FlightRecorder"
}
diff --git a/conf/cassandra-env.sh b/conf/cassandra-env.sh
index 44fe110..5a02f79 100644
--- a/conf/cassandra-env.sh
+++ b/conf/cassandra-env.sh
@@ -88,7 +88,7 @@
 
 # Determine the sort of JVM we'll be running on.
 java_ver_output=`"${JAVA:-java}" -version 2>&1`
-jvmver=`echo "$java_ver_output" | grep '[openjdk|java] version' | awk -F'"' 'NR==1 {print $2}'`
+jvmver=`echo "$java_ver_output" | grep '[openjdk|java] version' | awk -F'"' 'NR==1 {print $2}' | cut -d\- -f1`
 JVM_VERSION=${jvmver%_*}
 JVM_PATCH_VERSION=${jvmver#*_}
 
@@ -203,62 +203,17 @@
     JVM_OPTS="$JVM_OPTS -XX:+UseCondCardMark"
 fi
 
-# enable assertions.  disabling this in production will give a modest
-# performance benefit (around 5%).
-JVM_OPTS="$JVM_OPTS -ea"
-
-# Per-thread stack size.
-JVM_OPTS="$JVM_OPTS -Xss256k"
-
-# Make sure all memory is faulted and zeroed on startup.
-# This helps prevent soft faults in containers and makes
-# transparent hugepage allocation more effective.
-JVM_OPTS="$JVM_OPTS -XX:+AlwaysPreTouch"
-
-# Biased locking does not benefit Cassandra.
-JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"
-
-# Larger interned string table, for gossip's benefit (CASSANDRA-6410)
-JVM_OPTS="$JVM_OPTS -XX:StringTableSize=1000003"
-
-# Enable thread-local allocation blocks and allow the JVM to automatically
-# resize them at runtime.
-JVM_OPTS="$JVM_OPTS -XX:+UseTLAB -XX:+ResizeTLAB"
-
-# http://www.evanjones.ca/jvm-mmap-pause.html
-JVM_OPTS="$JVM_OPTS -XX:+PerfDisableSharedMem"
-
 # provides hints to the JIT compiler
 JVM_OPTS="$JVM_OPTS -XX:CompileCommandFile=$CASSANDRA_CONF/hotspot_compiler"
 
 # add the jamm javaagent
 JVM_OPTS="$JVM_OPTS -javaagent:$CASSANDRA_HOME/lib/jamm-0.3.0.jar"
 
-# enable thread priorities, primarily so we can give periodic tasks
-# a lower priority to avoid interfering with client workload
-JVM_OPTS="$JVM_OPTS -XX:+UseThreadPriorities"
-# allows lowering thread priority without being root.  see
-# http://tech.stolsvik.com/2010/01/linux-java-thread-priorities-workaround.html
-JVM_OPTS="$JVM_OPTS -XX:ThreadPriorityPolicy=42"
-
 # set jvm HeapDumpPath with CASSANDRA_HEAPDUMP_DIR
-JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"
 if [ "x$CASSANDRA_HEAPDUMP_DIR" != "x" ]; then
     JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=$CASSANDRA_HEAPDUMP_DIR/cassandra-`date +%s`-pid$$.hprof"
 fi
 
-# uncomment to have Cassandra JVM listen for remote debuggers/profilers on port 1414
-# JVM_OPTS="$JVM_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=1414"
-
-# uncomment to have Cassandra JVM log internal method compilation (developers only)
-# JVM_OPTS="$JVM_OPTS -XX:+UnlockDiagnosticVMOptions -XX:+LogCompilation"
-# JVM_OPTS="$JVM_OPTS -XX:+UnlockCommercialFeatures -XX:+FlightRecorder"
-
-# Prefer binding to IPv4 network intefaces (when net.ipv6.bindv6only=1). See
-# http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6342561 (short version:
-# comment out this entry to enable IPv6 support).
-JVM_OPTS="$JVM_OPTS -Djava.net.preferIPv4Stack=true"
-
 # jmx: metrics and administration interface
 #
 # add this if you're having trouble connecting:
@@ -283,23 +238,45 @@
 JMX_PORT="7199"
 
 if [ "$LOCAL_JMX" = "yes" ]; then
-  JVM_OPTS="$JVM_OPTS -Dcassandra.jmx.local.port=$JMX_PORT -XX:+DisableExplicitGC"
+  JVM_OPTS="$JVM_OPTS -Dcassandra.jmx.local.port=$JMX_PORT"
+  JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=false"
 else
-  JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.port=$JMX_PORT"
+  JVM_OPTS="$JVM_OPTS -Dcassandra.jmx.remote.port=$JMX_PORT"
+  # if ssl is enabled the same port cannot be used for both jmx and rmi so either
+  # pick another value for this property or comment out to use a random port (though see CASSANDRA-7087 for origins)
   JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.rmi.port=$JMX_PORT"
-  JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl=false"
+
+  # turn on JMX authentication. See below for further options
   JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.authenticate=true"
-  JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password"
-#  JVM_OPTS="$JVM_OPTS -Djavax.net.ssl.keyStore=/path/to/keystore"
-#  JVM_OPTS="$JVM_OPTS -Djavax.net.ssl.keyStorePassword=<keystore-password>"
-#  JVM_OPTS="$JVM_OPTS -Djavax.net.ssl.trustStore=/path/to/truststore"
-#  JVM_OPTS="$JVM_OPTS -Djavax.net.ssl.trustStorePassword=<truststore-password>"
-#  JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl.need.client.auth=true"
-#  JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.registry.ssl=true"
-#  JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl.enabled.protocols=<enabled-protocols>"
-#  JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl.enabled.cipher.suites=<enabled-cipher-suites>"
+
+  # jmx ssl options
+  #JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl=true"
+  #JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl.need.client.auth=true"
+  #JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl.enabled.protocols=<enabled-protocols>"
+  #JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl.enabled.cipher.suites=<enabled-cipher-suites>"
+  #JVM_OPTS="$JVM_OPTS -Djavax.net.ssl.keyStore=/path/to/keystore"
+  #JVM_OPTS="$JVM_OPTS -Djavax.net.ssl.keyStorePassword=<keystore-password>"
+  #JVM_OPTS="$JVM_OPTS -Djavax.net.ssl.trustStore=/path/to/truststore"
+  #JVM_OPTS="$JVM_OPTS -Djavax.net.ssl.trustStorePassword=<truststore-password>"
 fi
 
+# jmx authentication and authorization options. By default, auth is only
+# activated for remote connections but they can also be enabled for local only JMX
+## Basic file based authn & authz
+JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password"
+#JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.access.file=/etc/cassandra/jmxremote.access"
+## Custom auth settings which can be used as alternatives to JMX's out of the box auth utilities.
+## JAAS login modules can be used for authentication by uncommenting these two properties.
+## Cassandra ships with a LoginModule implementation - org.apache.cassandra.auth.CassandraLoginModule -
+## which delegates to the IAuthenticator configured in cassandra.yaml. See the sample JAAS configuration
+## file cassandra-jaas.config
+#JVM_OPTS="$JVM_OPTS -Dcassandra.jmx.remote.login.config=CassandraLogin"
+#JVM_OPTS="$JVM_OPTS -Djava.security.auth.login.config=$CASSANDRA_HOME/conf/cassandra-jaas.config"
+
+## Cassandra also ships with a helper for delegating JMX authz calls to the configured IAuthorizer,
+## uncomment this to use it. Requires one of the two authentication options to be enabled
+#JVM_OPTS="$JVM_OPTS -Dcassandra.jmx.authorizer=org.apache.cassandra.auth.jmx.AuthorizationProxy"
+
 # To use mx4j, an HTML interface for JMX, add mx4j-tools.jar to the lib/
 # directory.
 # See http://wiki.apache.org/cassandra/Operations#Monitoring_with_MX4J
diff --git a/conf/cassandra-jaas.config b/conf/cassandra-jaas.config
new file mode 100644
index 0000000..f3a9bf7
--- /dev/null
+++ b/conf/cassandra-jaas.config
@@ -0,0 +1,4 @@
+// Delegates authentication to Cassandra's configured IAuthenticator
+CassandraLogin {
+  org.apache.cassandra.auth.CassandraLoginModule REQUIRED;
+};
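
Read together with the cassandra-env changes above, switching remote JMX authentication from the bundled password file to the JAAS login module is, as a hedged sketch, a matter of commenting out the password.file property and uncommenting the two JAAS properties (paths assume the default layout):

    #JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password"
    JVM_OPTS="$JVM_OPTS -Dcassandra.jmx.remote.login.config=CassandraLogin"
    JVM_OPTS="$JVM_OPTS -Djava.security.auth.login.config=$CASSANDRA_HOME/conf/cassandra-jaas.config"

Here CassandraLogin is the login configuration entry defined in conf/cassandra-jaas.config above.
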
diff --git a/conf/cassandra.yaml b/conf/cassandra.yaml
index 085b68e..aaabc2b 100644
--- a/conf/cassandra.yaml
+++ b/conf/cassandra.yaml
@@ -35,20 +35,22 @@
 # Only supported with the Murmur3Partitioner.
 # allocate_tokens_for_keyspace: KEYSPACE
 
-# initial_token allows you to specify tokens manually.  While you can use # it with
+# initial_token allows you to specify tokens manually.  While you can use it with
 # vnodes (num_tokens > 1, above) -- in which case you should provide a 
-# comma-separated list -- it's primarily used when adding nodes # to legacy clusters 
+# comma-separated list -- it's primarily used when adding nodes to legacy clusters 
 # that do not have vnodes enabled.
 # initial_token:
 
 # See http://wiki.apache.org/cassandra/HintedHandoff
 # May either be "true" or "false" to enable globally
 hinted_handoff_enabled: true
+
 # When hinted_handoff_enabled is true, a black list of data centers that will not
 # perform hinted handoff
-#hinted_handoff_disabled_datacenters:
+# hinted_handoff_disabled_datacenters:
 #    - DC1
 #    - DC2
+
 # this defines the maximum amount of time a dead host will have hints
 # generated.  After it has been dead this long, new hints for it will not be
 # created until it has been seen alive and gone down again.
@@ -120,11 +122,11 @@
 #   increase system_auth keyspace replication factor if you use this role manager.
 role_manager: CassandraRoleManager
 
-# Validity period for roles cache (fetching permissions can be an
-# expensive operation depending on the authorizer). Granted roles are cached for
-# authenticated sessions in AuthenticatedUser and after the period specified
-# here, become eligible for (async) reload.
-# Defaults to 2000, set to 0 to disable.
+# Validity period for roles cache (fetching granted roles can be an expensive
+# operation depending on the role manager, CassandraRoleManager is one example)
+# Granted roles are cached for authenticated sessions in AuthenticatedUser and
+# after the period specified here, become eligible for (async) reload.
+# Defaults to 2000, set to 0 to disable caching entirely.
 # Will be disabled automatically for AllowAllAuthenticator.
 roles_validity_in_ms: 2000
 
@@ -134,7 +136,7 @@
 # completes. If roles_validity_in_ms is non-zero, then this must be
 # also.
 # Defaults to the same value as roles_validity_in_ms.
-# roles_update_interval_in_ms: 1000
+# roles_update_interval_in_ms: 2000
 
 # Validity period for permissions cache (fetching permissions can be an
 # expensive operation depending on the authorizer, CassandraAuthorizer is
@@ -148,7 +150,26 @@
 # completes. If permissions_validity_in_ms is non-zero, then this must be
 # also.
 # Defaults to the same value as permissions_validity_in_ms.
-# permissions_update_interval_in_ms: 1000
+# permissions_update_interval_in_ms: 2000
+
+# Validity period for credentials cache. This cache is tightly coupled to
+# the provided PasswordAuthenticator implementation of IAuthenticator. If
+# another IAuthenticator implementation is configured, this cache will not
+# be automatically used and so the following settings will have no effect.
+# Please note, credentials are cached in their encrypted form, so while
+# activating this cache may reduce the number of queries made to the
+# underlying table, it may not bring a significant reduction in the
+# latency of individual authentication attempts.
+# Defaults to 2000, set to 0 to disable credentials caching.
+credentials_validity_in_ms: 2000
+
+# Refresh interval for credentials cache (if enabled).
+# After this interval, cache entries become eligible for refresh. Upon next
+# access, an async reload is scheduled and the old value returned until it
+# completes. If credentials_validity_in_ms is non-zero, then this must be
+# also.
+# Defaults to the same value as credentials_validity_in_ms.
+# credentials_update_interval_in_ms: 2000
 
 # The partitioner is responsible for distributing groups of rows (by
 # partition key) across nodes in the cluster.  You should leave this
@@ -174,28 +195,85 @@
 # If not set, the default directory is $CASSANDRA_HOME/data/commitlog.
 # commitlog_directory: /var/lib/cassandra/commitlog
 
-# policy for data disk failures:
-# die: shut down gossip and client transports and kill the JVM for any fs errors or
-#      single-sstable errors, so the node can be replaced.
-# stop_paranoid: shut down gossip and client transports even for single-sstable errors,
-#                kill the JVM for errors during startup.
-# stop: shut down gossip and client transports, leaving the node effectively dead, but
-#       can still be inspected via JMX, kill the JVM for errors during startup.
-# best_effort: stop using the failed disk and respond to requests based on
-#              remaining available sstables.  This means you WILL see obsolete
-#              data at CL.ONE!
-# ignore: ignore fatal errors and let requests fail, as in pre-1.2 Cassandra
+# Enable / disable CDC functionality on a per-node basis. This modifies the logic used
+# for write path allocation rejection (standard: never reject. cdc: reject Mutation
+# containing a CDC-enabled table if at space limit in cdc_raw_directory).
+cdc_enabled: false
+
+# CommitLogSegments are moved to this directory on flush if cdc_enabled: true and the
+# segment contains mutations for a CDC-enabled table. This should be placed on a
+# separate spindle than the data directories. If not set, the default directory is
+# $CASSANDRA_HOME/data/cdc_raw.
+# cdc_raw_directory: /var/lib/cassandra/cdc_raw
+
+# Policy for data disk failures:
+#
+# die
+#   shut down gossip and client transports and kill the JVM for any fs errors or
+#   single-sstable errors, so the node can be replaced.
+#
+# stop_paranoid
+#   shut down gossip and client transports even for single-sstable errors,
+#   kill the JVM for errors during startup.
+#
+# stop
+#   shut down gossip and client transports, leaving the node effectively dead, but
+#   can still be inspected via JMX, kill the JVM for errors during startup.
+#
+# best_effort
+#    stop using the failed disk and respond to requests based on
+#    remaining available sstables.  This means you WILL see obsolete
+#    data at CL.ONE!
+#
+# ignore
+#    ignore fatal errors and let requests fail, as in pre-1.2 Cassandra
 disk_failure_policy: stop
 
-# policy for commit disk failures:
-# die: shut down gossip and Thrift and kill the JVM, so the node can be replaced.
-# stop: shut down gossip and Thrift, leaving the node effectively dead, but
-#       can still be inspected via JMX.
-# stop_commit: shutdown the commit log, letting writes collect but
-#              continuing to service reads, as in pre-2.0.5 Cassandra
-# ignore: ignore fatal errors and let the batches fail
+# Policy for commit disk failures:
+#
+# die
+#   shut down gossip and Thrift and kill the JVM, so the node can be replaced.
+#
+# stop
+#   shut down gossip and Thrift, leaving the node effectively dead, but
+#   can still be inspected via JMX.
+#
+# stop_commit
+#   shutdown the commit log, letting writes collect but
+#   continuing to service reads, as in pre-2.0.5 Cassandra
+#
+# ignore
+#   ignore fatal errors and let the batches fail
 commit_failure_policy: stop
 
+# Maximum size of the native protocol prepared statement cache
+#
+# Valid values are either "auto" (omitting the value) or a value greater 0.
+#
+# Note that specifying too large a value will result in long running GCs and possibly
+# out-of-memory errors. Keep the value at a small fraction of the heap.
+#
+# If you constantly see "prepared statements discarded in the last minute because
+# cache limit reached" messages, the first step is to investigate the root cause
+# of these messages and check whether prepared statements are used correctly -
+# i.e. use bind markers for variable parts.
+#
+# Only change the default value if you really have more prepared statements than
+# fit in the cache. In most cases it is not necessary to change this value.
+# Constantly re-preparing statements is a performance penalty.
+#
+# Default value ("auto") is 1/256th of the heap or 10MB, whichever is greater
+prepared_statements_cache_size_mb:
+
+# Maximum size of the Thrift prepared statement cache
+#
+# If you do not use Thrift at all, it is safe to leave this value at "auto".
+#
+# See description of 'prepared_statements_cache_size_mb' above for more information.
+#
+# Default value ("auto") is 1/256th of the heap or 10MB, whichever is greater
+thrift_prepared_statements_cache_size_mb:
+
 # Maximum size of the key cache in memory.
 #
 # Each key cache hit saves 1 seek and each row cache hit saves 2 seeks at the
@@ -225,11 +303,14 @@
 # Disabled by default, meaning all keys are going to be saved
 # key_cache_keys_to_save: 100
 
-# Row cache implementation class name.
-# Available implementations:
-#   org.apache.cassandra.cache.OHCProvider                Fully off-heap row cache implementation (default).
-#   org.apache.cassandra.cache.SerializingCacheProvider   This is the row cache implementation availabile
-#                                                         in previous releases of Cassandra.
+# Row cache implementation class name. Available implementations:
+#
+# org.apache.cassandra.cache.OHCProvider
+#   Fully off-heap row cache implementation (default).
+#
+# org.apache.cassandra.cache.SerializingCacheProvider
+#   This is the row cache implementation available
+#   in previous releases of Cassandra.
 # row_cache_class_name: org.apache.cassandra.cache.OHCProvider
 
 # Maximum size of the row cache in memory.
@@ -324,7 +405,7 @@
 # Compression to apply to the commit log. If omitted, the commit log
 # will be written uncompressed.  LZ4, Snappy, and Deflate compressors
 # are supported.
-#commitlog_compression:
+# commitlog_compression:
 #   - class_name: LZ4Compressor
 #     parameters:
 #         -
@@ -361,9 +442,14 @@
 # be limited by the less of concurrent reads or concurrent writes.
 concurrent_materialized_view_writes: 32
 
-# Maximum memory to use for pooling sstable buffers. Defaults to the smaller
-# of 1/4 of heap or 512MB. This pool is allocated off-heap, so is in addition
-# to the memory allocated for heap. Memory is only allocated as needed.
+# Maximum memory to use for sstable chunk cache and buffer pooling.
+# 32MB of this is reserved for pooling buffers, the rest is used as a
+# cache that holds uncompressed sstable chunks.
+# Defaults to the smaller of 1/4 of heap or 512MB. This pool is allocated off-heap,
+# so is in addition to the memory allocated for heap. The cache also has on-heap
+# overhead which is roughly 128 bytes per chunk (i.e. 0.2% of the reserved size
+# if the default 64k chunk size is used).
+# Memory is only allocated when needed.
 # file_cache_size_in_mb: 512
 
 # Flag indicating whether to allocate on or off heap when the sstable buffer
@@ -396,8 +482,15 @@
 
 # Specify the way Cassandra allocates and manages memtable memory.
 # Options are:
-#   heap_buffers:    on heap nio buffers
-#   offheap_buffers: off heap (direct) nio buffers
+#
+# heap_buffers
+#   on heap nio buffers
+#
+# offheap_buffers
+#   off heap (direct) nio buffers
+#
+# offheap_objects
+#    off heap objects
 memtable_allocation_type: heap_buffers
 
 # Total space to use for commit logs on disk.
@@ -413,14 +506,28 @@
 
 # This sets the amount of memtable flush writer threads.  These will
 # be blocked by disk io, and each one will hold a memtable in memory
-# while blocked. 
+# while blocked.
 #
-# memtable_flush_writers defaults to the smaller of (number of disks,
-# number of cores), with a minimum of 2 and a maximum of 8.
-# 
-# If your data directories are backed by SSD, you should increase this
-# to the number of cores.
-#memtable_flush_writers: 8
+# memtable_flush_writers defaults to one per data_file_directory.
+#
+# If your data directories are backed by SSD, you can increase this, but
+# avoid having memtable_flush_writers * data_file_directories > number of cores
+#memtable_flush_writers: 1
+
+# Total space to use for change-data-capture logs on disk.
+#
+# If space gets above this value, Cassandra will throw WriteTimeoutException
+# on Mutations including tables with CDC enabled. A CDCCompactor is responsible
+# for parsing the raw CDC logs and deleting them when parsing is completed.
+#
+# The default value is the min of 4096 mb and 1/8th of the total space
+# of the drive where cdc_raw_directory resides.
+# cdc_total_space_in_mb: 4096
+
+# When we hit our cdc_raw limit and the CDCCompactor is either running behind
+# or experiencing backpressure, we check at the following interval to see if any
+# new space for cdc-tracked tables has been made available. Default to 250ms
+# cdc_free_space_check_interval_ms: 250
 
 # A fixed memory pool size in MB for for SSTable index summaries. If left
 # empty, this will default to 5% of the heap size. If the memory usage of
@@ -456,8 +563,7 @@
 # Address or interface to bind to and tell other Cassandra nodes to connect to.
 # You _must_ change this if you want multiple nodes to be able to communicate!
 #
-# Set listen_address OR listen_interface, not both. Interfaces must correspond
-# to a single address, IP aliasing is not supported.
+# Set listen_address OR listen_interface, not both.
 #
 # Leaving it blank leaves it up to InetAddress.getLocalHost(). This
 # will always do the Right Thing _if_ the node is properly configured
@@ -466,12 +572,16 @@
 #
 # Setting listen_address to 0.0.0.0 is always wrong.
 #
+listen_address: localhost
+
+# Set listen_address OR listen_interface, not both. Interfaces must correspond
+# to a single address, IP aliasing is not supported.
+# listen_interface: eth0
+
 # If you choose to specify the interface by name and the interface has an ipv4 and an ipv6 address
 # you can specify which should be chosen using listen_interface_prefer_ipv6. If false the first ipv4
 # address will be used. If true the first ipv6 address will be used. Defaults to false preferring
 # ipv4. If there is only one address it will be selected regardless of ipv4/ipv6.
-listen_address: localhost
-# listen_interface: eth0
 # listen_interface_prefer_ipv6: false
 
 # Address to broadcast to other Cassandra nodes
@@ -530,8 +640,7 @@
 # The address or interface to bind the Thrift RPC service and native transport
 # server to.
 #
-# Set rpc_address OR rpc_interface, not both. Interfaces must correspond
-# to a single address, IP aliasing is not supported.
+# Set rpc_address OR rpc_interface, not both.
 #
 # Leaving rpc_address blank has the same effect as on listen_address
 # (i.e. it will be based on the configured hostname of the node).
@@ -540,13 +649,16 @@
 # set broadcast_rpc_address to a value other than 0.0.0.0.
 #
 # For security reasons, you should not expose this port to the internet.  Firewall it if needed.
-#
+rpc_address: localhost
+
+# Set rpc_address OR rpc_interface, not both. Interfaces must correspond
+# to a single address, IP aliasing is not supported.
+# rpc_interface: eth1
+
 # If you choose to specify the interface by name and the interface has an ipv4 and an ipv6 address
 # you can specify which should be chosen using rpc_interface_prefer_ipv6. If false the first ipv4
 # address will be used. If true the first ipv6 address will be used. Defaults to false preferring
 # ipv4. If there is only one address it will be selected regardless of ipv4/ipv6.
-rpc_address: localhost
-# rpc_interface: eth1
 # rpc_interface_prefer_ipv6: false
 
 # port for Thrift to listen for clients on
@@ -563,16 +675,18 @@
 
 # Cassandra provides two out-of-the-box options for the RPC Server:
 #
-# sync  -> One thread per thrift connection. For a very large number of clients, memory
-#          will be your limiting factor. On a 64 bit JVM, 180KB is the minimum stack size
-#          per thread, and that will correspond to your use of virtual memory (but physical memory
-#          may be limited depending on use of stack space).
+# sync
+#   One thread per thrift connection. For a very large number of clients, memory
+#   will be your limiting factor. On a 64 bit JVM, 180KB is the minimum stack size
+#   per thread, and that will correspond to your use of virtual memory (but physical memory
+#   may be limited depending on use of stack space).
 #
-# hsha  -> Stands for "half synchronous, half asynchronous." All thrift clients are handled
-#          asynchronously using a small number of threads that does not vary with the amount
-#          of thrift clients (and thus scales well to many clients). The rpc requests are still
-#          synchronous (one thread per active request). If hsha is selected then it is essential
-#          that rpc_max_threads is changed from the default value of unlimited.
+# hsha
+#   Stands for "half synchronous, half asynchronous." All thrift clients are handled
+#   asynchronously using a small number of threads that does not vary with the amount
+#   of thrift clients (and thus scales well to many clients). The rpc requests are still
+#   synchronous (one thread per active request). If hsha is selected then it is essential
+#   that rpc_max_threads is changed from the default value of unlimited.
 #
 # The default is sync because on Windows hsha is about 30% slower.  On Linux,
 # sync/hsha performance is about the same, with hsha of course using less memory.
@@ -601,13 +715,17 @@
 # Uncomment to set socket buffer size for internode communication
 # Note that when setting this, the buffer size is limited by net.core.wmem_max
 # and when not setting it it is defined by net.ipv4.tcp_wmem
-# See:
+# See also:
 # /proc/sys/net/core/wmem_max
 # /proc/sys/net/core/rmem_max
 # /proc/sys/net/ipv4/tcp_wmem
 # /proc/sys/net/ipv4/tcp_wmem
-# and: man tcp
+# and 'man tcp'
 # internode_send_buff_size_in_bytes:
+
+# Uncomment to set socket buffer size for internode communication
+# Note that when setting this, the buffer size is limited by net.core.wmem_max
+# and when not setting it it is defined by net.ipv4.tcp_wmem
 # internode_recv_buff_size_in_bytes:
 
 # Frame size for thrift (maximum message length).
@@ -631,39 +749,26 @@
 # lose data on truncation or drop.
 auto_snapshot: true
 
-# When executing a scan, within or across a partition, we need to keep the
-# tombstones seen in memory so we can return them to the coordinator, which
-# will use them to make sure other replicas also know about the deleted rows.
-# With workloads that generate a lot of tombstones, this can cause performance
-# problems and even exaust the server heap.
-# (http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets)
-# Adjust the thresholds here if you understand the dangers and want to
-# scan more tombstones anyway.  These thresholds may also be adjusted at runtime
-# using the StorageService mbean.
-tombstone_warn_threshold: 1000
-tombstone_failure_threshold: 100000
-
 # Granularity of the collation index of rows within a partition.
 # Increase if your rows are large, or if you have a very large
 # number of rows per partition.  The competing goals are these:
-#   1) a smaller granularity means more index entries are generated
-#      and looking up rows withing the partition by collation column
-#      is faster
-#   2) but, Cassandra will keep the collation index in memory for hot
-#      rows (as part of the key cache), so a larger granularity means
-#      you can cache more hot rows
+#
+# - a smaller granularity means more index entries are generated
+#   and looking up rows within the partition by collation column
+#   is faster
+# - but, Cassandra will keep the collation index in memory for hot
+#   rows (as part of the key cache), so a larger granularity means
+#   you can cache more hot rows
 column_index_size_in_kb: 64
 
-
-# Log WARN on any batch size exceeding this value. 5kb per batch by default.
-# Caution should be taken on increasing the size of this threshold as it can lead to node instability.
-batch_size_warn_threshold_in_kb: 5
-
-# Fail any batch exceeding this value. 50kb (10x warn threshold) by default.
-batch_size_fail_threshold_in_kb: 50
-
-# Log WARN on any batches not of type LOGGED than span across more partitions than this limit
-unlogged_batch_across_partitions_warn_threshold: 10
+# Per sstable indexed key cache entries (the collation index in memory
+# mentioned above) exceeding this size will not be held on heap.
+# This means that only partition information is held on heap and the
+# index entries are read from disk.
+#
+# Note that this size refers to the size of the
+# serialized index information and not the size of the partition.
+column_index_cache_size_in_kb: 2
 
 # Number of simultaneous compactions to allow, NOT including
 # validation "compactions" for anti-entropy repair.  Simultaneous
@@ -689,9 +794,6 @@
 # of compaction, including validation compaction.
 compaction_throughput_mb_per_sec: 16
 
-# Log a warning when compacting partitions larger than this value
-compaction_large_partition_warning_threshold_mb: 100
-
 # When compacting, the replacement sstable(s) can be opened before they
 # are completely written, and used in place of the prior sstables for
 # any range that has been written. This helps to smoothly transfer reads 
@@ -754,6 +856,7 @@
 
 # endpoint_snitch -- Set this to a class that implements
 # IEndpointSnitch.  The snitch has two functions:
+#
 # - it teaches Cassandra enough about your network topology to route
 #   requests efficiently
 # - it allows Cassandra to spread replicas around your cluster to avoid
@@ -772,34 +875,40 @@
 # under Ec2Snitch (which will locate them in a new "datacenter") and
 # decommissioning the old ones.
 #
-# Out of the box, Cassandra provides
-#  - SimpleSnitch:
+# Out of the box, Cassandra provides:
+#
+# SimpleSnitch:
 #    Treats Strategy order as proximity. This can improve cache
 #    locality when disabling read repair.  Only appropriate for
 #    single-datacenter deployments.
-#  - GossipingPropertyFileSnitch
+#
+# GossipingPropertyFileSnitch
 #    This should be your go-to snitch for production use.  The rack
 #    and datacenter for the local node are defined in
 #    cassandra-rackdc.properties and propagated to other nodes via
 #    gossip.  If cassandra-topology.properties exists, it is used as a
 #    fallback, allowing migration from the PropertyFileSnitch.
-#  - PropertyFileSnitch:
+#
+# PropertyFileSnitch:
 #    Proximity is determined by rack and data center, which are
 #    explicitly configured in cassandra-topology.properties.
-#  - Ec2Snitch:
+#
+# Ec2Snitch:
 #    Appropriate for EC2 deployments in a single Region. Loads Region
 #    and Availability Zone information from the EC2 API. The Region is
 #    treated as the datacenter, and the Availability Zone as the rack.
 #    Only private IPs are used, so this will not work across multiple
 #    Regions.
-#  - Ec2MultiRegionSnitch:
+#
+# Ec2MultiRegionSnitch:
 #    Uses public IPs as broadcast_address to allow cross-region
 #    connectivity.  (Thus, you should set seed addresses to the public
 #    IP as well.) You will need to open the storage_port or
 #    ssl_storage_port on the public IP firewall.  (For intra-Region
 #    traffic, Cassandra will switch to the private IP after
 #    establishing a connection.)
-#  - RackInferringSnitch:
+#
+# RackInferringSnitch:
 #    Proximity is determined by rack and data center, which are
 #    assumed to correspond to the 3rd and 2nd octet of each node's IP
 #    address, respectively.  Unless this happens to match your
@@ -839,20 +948,26 @@
 request_scheduler: org.apache.cassandra.scheduler.NoScheduler
 
 # Scheduler Options vary based on the type of scheduler
-# NoScheduler - Has no options
+#
+# NoScheduler
+#   Has no options
+#
 # RoundRobin
-#  - throttle_limit -- The throttle_limit is the number of in-flight
-#                      requests per client.  Requests beyond 
-#                      that limit are queued up until
-#                      running requests can complete.
-#                      The value of 80 here is twice the number of
-#                      concurrent_reads + concurrent_writes.
-#  - default_weight -- default_weight is optional and allows for
-#                      overriding the default which is 1.
-#  - weights -- Weights are optional and will default to 1 or the
-#               overridden default_weight. The weight translates into how
-#               many requests are handled during each turn of the
-#               RoundRobin, based on the scheduler id.
+#   throttle_limit
+#     The throttle_limit is the number of in-flight
+#     requests per client.  Requests beyond 
+#     that limit are queued up until
+#     running requests can complete.
+#     The value of 80 here is twice the number of
+#     concurrent_reads + concurrent_writes.
+#   default_weight
+#     default_weight is optional and allows for
+#     overriding the default which is 1.
+#   weights
+#     Weights are optional and will default to 1 or the
+#     overridden default_weight. The weight translates into how
+#     many requests are handled during each turn of the
+#     RoundRobin, based on the scheduler id.
 #
 # request_scheduler_options:
 #    throttle_limit: 80
@@ -866,11 +981,15 @@
 # request_scheduler_id: keyspace
 
 # Enable or disable inter-node encryption
-# Default settings are TLS v1, RSA 1024-bit keys (it is imperative that
-# users generate their own keys) TLS_RSA_WITH_AES_128_CBC_SHA as the cipher
-# suite for authentication, key exchange and encryption of the actual data transfers.
-# Use the DHE/ECDHE ciphers if running in FIPS 140 compliant mode.
-# NOTE: No custom encryption options are enabled at the moment
+# JVM defaults for supported SSL socket protocols and cipher suites can
+# be replaced using custom encryption options. This is not recommended
+# unless you have policies in place that dictate certain settings, or
+# need to disable vulnerable ciphers or protocols in case the JVM cannot
+# be updated.
+# FIPS compliant settings can be configured at JVM level and should not
+# involve changing encryption settings here:
+# https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/FIPS.html
+# *NOTE* No custom encryption options are enabled at the moment
 # The available internode options are : all, none, dc, rack
 #
 # If set to dc cassandra will encrypt the traffic between the DCs
@@ -892,6 +1011,7 @@
     # store_type: JKS
     # cipher_suites: [TLS_RSA_WITH_AES_128_CBC_SHA,TLS_RSA_WITH_AES_256_CBC_SHA,TLS_DHE_RSA_WITH_AES_128_CBC_SHA,TLS_DHE_RSA_WITH_AES_256_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA,TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA]
     # require_client_auth: false
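+    # Verify the remote endpoint's hostname/IP against its certificate (hostname verification)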
+    # require_endpoint_verification: false
 
 # enable or disable client/server encryption.
 client_encryption_options:
@@ -912,10 +1032,17 @@
 
 # internode_compression controls whether traffic between nodes is
 # compressed.
-# can be:  all  - all traffic is compressed
-#          dc   - traffic between different datacenters is compressed
-#          none - nothing is compressed.
-internode_compression: all
+# Can be:
+#
+# all
+#   all traffic is compressed
+#
+# dc
+#   traffic between different datacenters is compressed
+#
+# none
+#   nothing is compressed.
+internode_compression: dc
 
 # Enable or disable tcp_nodelay for inter-dc communication.
 # Disabling it will result in larger (but fewer) network packets being sent,
@@ -931,12 +1058,8 @@
 # This threshold can be adjusted to minimize logging if necessary
 # gc_log_threshold_in_ms: 200
 
-# GC Pauses greater than gc_warn_threshold_in_ms will be logged at WARN level
 # If unset, all GC Pauses greater than gc_log_threshold_in_ms will log at
 # INFO level
-# Adjust the threshold based on your application throughput requirement
-gc_warn_threshold_in_ms: 1000
-
 # UDFs (user defined functions) are disabled by default.
 # As of Cassandra 3.0 there is a sandbox in place that should prevent execution of evil code.
 enable_user_defined_functions: false
@@ -954,6 +1077,69 @@
 # setting.
 windows_timer_interval: 1
 
+
+# Enables encrypting data at-rest (on disk). Different key providers can be plugged in, but the default reads from
+# a JCE-style keystore. A single keystore can hold multiple keys, but the one referenced by
+# the "key_alias" is the only key that will be used for encrypt opertaions; previously used keys
+# can still (and should!) be in the keystore and will be used on decrypt operations
+# (to handle the case of key rotation).
+#
+# It is strongly recommended to download and install Java Cryptography Extension (JCE)
+# Unlimited Strength Jurisdiction Policy Files for your version of the JDK.
+# (current link: http://www.oracle.com/technetwork/java/javase/downloads/jce8-download-2133166.html)
+#
+# Currently, only the following file types are supported for transparent data encryption, although
+# more are coming in future cassandra releases: commitlog, hints
+transparent_data_encryption_options:
+    enabled: false
+    chunk_length_kb: 64
+    cipher: AES/CBC/PKCS5Padding
+    key_alias: testing:1
+    # CBC IV length for AES needs to be 16 bytes (which is also the default size)
+    # iv_length: 16
+    key_provider: 
+      - class_name: org.apache.cassandra.security.JKSKeyProvider
+        parameters: 
+          - keystore: conf/.keystore
+            keystore_password: cassandra
+            store_type: JCEKS
+            key_password: cassandra
+
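+# As an illustrative sketch (not part of the shipped configuration), a JCEKS
+# keystore holding an AES key under the "testing:1" alias above could be
+# created with the JDK keytool, for example:
+#
+#   keytool -genseckey -keyalg AES -keysize 128 -alias testing:1 \
+#           -keystore conf/.keystore -storetype JCEKS \
+#           -storepass cassandra -keypass cassandra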
+
+#####################
+# SAFETY THRESHOLDS #
+#####################
+
+# When executing a scan, within or across a partition, we need to keep the
+# tombstones seen in memory so we can return them to the coordinator, which
+# will use them to make sure other replicas also know about the deleted rows.
+# With workloads that generate a lot of tombstones, this can cause performance
+# problems and even exhaust the server heap.
+# (http://www.datastax.com/dev/blog/cassandra-anti-patterns-queues-and-queue-like-datasets)
+# Adjust the thresholds here if you understand the dangers and want to
+# scan more tombstones anyway.  These thresholds may also be adjusted at runtime
+# using the StorageService mbean.
+tombstone_warn_threshold: 1000
+tombstone_failure_threshold: 100000
+
+# Log WARN on any batch size exceeding this value. 5kb per batch by default.
+# Caution should be taken on increasing the size of this threshold as it can lead to node instability.
+batch_size_warn_threshold_in_kb: 5
+
+# Fail any batch exceeding this value. 50kb (10x warn threshold) by default.
+batch_size_fail_threshold_in_kb: 50
+
+# Log WARN on any batches not of type LOGGED that span across more partitions than this limit
+unlogged_batch_across_partitions_warn_threshold: 10
+
+# Log a warning when compacting partitions larger than this value
+compaction_large_partition_warning_threshold_mb: 100
+
+# GC Pauses greater than gc_warn_threshold_in_ms will be logged at WARN level
+# Adjust the threshold based on your application throughput requirement
+# By default, Cassandra logs GC Pauses greater than 200 ms at INFO level
+gc_warn_threshold_in_ms: 1000
+
 # Maximum size of any value in SSTables. Safety measure to detect SSTable corruption
 # early. Any value size larger than this threshold will result into marking an SSTable
 # as corrupted.
diff --git a/conf/cqlshrc.sample b/conf/cqlshrc.sample
index cb02b04..0bf926f 100644
--- a/conf/cqlshrc.sample
+++ b/conf/cqlshrc.sample
@@ -35,9 +35,10 @@
 ;; Display timezone
 ;timezone = Etc/UTC
 
-;; The number of digits displayed after the decimal point
+;; The number of digits displayed after the decimal point for single and double precision numbers
 ;; (note that increasing this to large numbers can result in unusual values)
-; float_precision = 5
+;float_precision = 5
+;double_precision = 12
 
 ;; Used for automatic completion and suggestions
 ; completekey = tab
@@ -76,6 +77,9 @@
 ;; The port to connect to (9042 is the native protocol default)
 port = 9042
 
+;; Always connect using SSL - false by default
+; ssl = true
+
 ;; A timeout in seconds for opening new connections
 ; timeout = 10
 
diff --git a/conf/jvm.options b/conf/jvm.options
index a7b3bd8..692d06b 100644
--- a/conf/jvm.options
+++ b/conf/jvm.options
@@ -8,6 +8,138 @@
 # - dynamic flags will be appended to these on cassandra-env              #
 ###########################################################################
 
+######################
+# STARTUP PARAMETERS #
+######################
+
+# Uncomment any of the following properties to enable specific startup parameters
+
+# In a multi-instance deployment, multiple Cassandra instances will independently assume that all
+# CPU processors are available to them. This setting allows you to specify a smaller set of processors
+# and perhaps have affinity.
+#-Dcassandra.available_processors=number_of_processors
+
+# The directory location of the cassandra.yaml file.
+#-Dcassandra.config=directory
+
+# Sets the initial partitioner token for a node the first time the node is started.
+#-Dcassandra.initial_token=token
+
+# Set to false to start Cassandra on a node but not have the node join the cluster.
+#-Dcassandra.join_ring=true|false
+
+# Set to false to clear all gossip state for the node on restart. Use when you have changed node
+# information in cassandra.yaml (such as listen_address).
+#-Dcassandra.load_ring_state=true|false
+
+# Enable pluggable metrics reporter. See Pluggable metrics reporting in Cassandra 2.0.2.
+#-Dcassandra.metricsReporterConfigFile=file
+
+# Set the port on which the CQL native transport listens for clients. (Default: 9042)
+#-Dcassandra.native_transport_port=port
+
+# Overrides the partitioner. (Default: org.apache.cassandra.dht.Murmur3Partitioner)
+#-Dcassandra.partitioner=partitioner
+
+# To replace a node that has died, restart a new node in its place specifying the address of the
+# dead node. The new node must not have any data in its data directory, that is, it must be in the
+# same state as before bootstrapping.
+#-Dcassandra.replace_address=listen_address or broadcast_address of dead node
+
+# Allow restoring specific tables from an archived commit log.
+#-Dcassandra.replayList=table
+
+# Allows overriding of the default RING_DELAY (1000ms), which is the amount of time a node waits
+# before joining the ring.
+#-Dcassandra.ring_delay_ms=ms
+
+# Set the port for the Thrift RPC service, which is used for client connections. (Default: 9160)
+#-Dcassandra.rpc_port=port
+
+# Set the SSL port for encrypted communication. (Default: 7001)
+#-Dcassandra.ssl_storage_port=port
+
+# Enable or disable the native transport server. See start_native_transport in cassandra.yaml.
+#-Dcassandra.start_native_transport=true|false
+
+# Enable or disable the Thrift RPC server. (Default: true)
+#-Dcassandra.start_rpc=true|false
+
+# Set the port for inter-node communication. (Default: 7000)
+#-Dcassandra.storage_port=port
+
+# Set the default location for the trigger JARs. (Default: conf/triggers)
+#-Dcassandra.triggers_dir=directory
+
+# For testing new compaction and compression strategies. It allows you to experiment with different
+# strategies and benchmark write performance differences without affecting the production workload. 
+#-Dcassandra.write_survey=true
+
+# To disable configuration via JMX of auth caches (such as those for credentials, permissions and
+# roles). This will mean those config options can only be set (persistently) in cassandra.yaml
+# and will require a restart for new values to take effect.
+#-Dcassandra.disable_auth_caches_remote_configuration=true
+
+########################
+# GENERAL JVM SETTINGS #
+########################
+
+# enable assertions.  disabling this in production will give a modest
+# performance benefit (around 5%).
+-ea
+
+# enable thread priorities, primarily so we can give periodic tasks
+# a lower priority to avoid interfering with client workload
+-XX:+UseThreadPriorities
+
+# allows lowering thread priority without being root on linux - probably
+# not necessary on Windows but doesn't harm anything.
+# see http://tech.stolsvik.com/2010/01/linux-java-thread-priorities-workar
+-XX:ThreadPriorityPolicy=42
+
+# Enable heap-dump if there's an OOM
+-XX:+HeapDumpOnOutOfMemoryError
+
+# Per-thread stack size.
+-Xss256k
+
+# Larger interned string table, for gossip's benefit (CASSANDRA-6410)
+-XX:StringTableSize=1000003
+
+# Make sure all memory is faulted and zeroed on startup.
+# This helps prevent soft faults in containers and makes
+# transparent hugepage allocation more effective.
+-XX:+AlwaysPreTouch
+
+# Disable biased locking as it does not benefit Cassandra.
+-XX:-UseBiasedLocking
+
+# Enable thread-local allocation blocks and allow the JVM to automatically
+# resize them at runtime.
+-XX:+UseTLAB
+-XX:+ResizeTLAB
+
+# http://www.evanjones.ca/jvm-mmap-pause.html
+-XX:+PerfDisableSharedMem
+
+# Prefer binding to IPv4 network interfaces (when net.ipv6.bindv6only=1). See
+# http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6342561 (short version:
+# comment out this entry to enable IPv6 support).
+-Djava.net.preferIPv4Stack=true
+
+### Debug options
+
+# uncomment to enable flight recorder
+#-XX:+UnlockCommercialFeatures
+#-XX:+FlightRecorder
+
+# uncomment to have Cassandra JVM listen for remote debuggers/profilers on port 1414
+#-agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=1414
+
+# uncomment to have Cassandra JVM log internal method compilation (developers only)
+#-XX:+UnlockDiagnosticVMOptions
+#-XX:+LogCompilation
+
 #################
 # HEAP SETTINGS #
 #################
diff --git a/debian/changelog b/debian/changelog
index 69b7cf9..e071140 100644
--- a/debian/changelog
+++ b/debian/changelog
@@ -1,44 +1,32 @@
-cassandra (3.0.8) unstable; urgency=medium
-
-  * New release 
-
- -- Jake Luciani <jake@apache.org>  Tue, 28 Jun 2016 20:12:22 -0400
-
-cassandra (3.0.7) unstable; urgency=medium
+cassandra (3.8) unstable; urgency=medium
 
   * New release
 
- -- Jake Luciani <jake@apache.org>  Mon, 06 Jun 2016 14:27:28 -0400
+ -- Jake Luciani <jake@apache.org>  Mon, 27 Jun 2016 20:40:40 -0400
 
-cassandra (3.0.6) unstable; urgency=medium
+cassandra (3.6) unstable; urgency=medium
 
   * New release
 
- -- Jake Luciani <jake@apache.org>  Tue, 03 May 2016 09:29:31 -0400
+ -- Jake Luciani <jake@apache.org>  Tue, 03 May 2016 09:12:18 -0400
 
-cassandra (3.0.5) unstable; urgency=medium
+cassandra (3.4) unstable; urgency=medium
 
   * New release
 
- -- Jake Luciani <jake@apache.org>  Sat, 02 Apr 2016 07:57:16 -0400
+ -- Jake Luciani <jake@apache.org>  Mon, 29 Feb 2016 10:39:33 -0500
 
-cassandra (3.0.4) unstable; urgency=medium
+cassandra (3.2) unstable; urgency=medium
 
   * New release
 
- -- Jake Luciani <jake@apache.org>  Mon, 29 Feb 2016 10:36:33 -0500
+ -- Jake Luciani <jake@apache.org>  Tue, 05 Jan 2016 10:24:04 -0500
 
-cassandra (3.0.3) unstable; urgency=medium
-
-  * New release 
-
- -- Jake Luciani <jake@apache.org>  Wed, 03 Feb 2016 08:54:57 -0500
-
-cassandra (3.0.1) unstable; urgency=medium
+cassandra (3.1) unstable; urgency=medium
 
   * New release
 
- -- Jake Luciani <jake@apache.org>  Fri, 04 Dec 2015 15:56:02 -0500
+ -- Jake Luciani <jake@apache.org>  Fri, 04 Dec 2015 16:00:04 -0500
 
 cassandra (3.0.0) unstable; urgency=medium
 
diff --git a/doc/Makefile b/doc/Makefile
new file mode 100644
index 0000000..9a736cc
--- /dev/null
+++ b/doc/Makefile
@@ -0,0 +1,256 @@
+# Makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line.
+SPHINXOPTS    =
+SPHINXBUILD   = sphinx-build
+PAPER         =
+BUILDDIR      = build
+
+# Internal variables.
+PAPEROPT_a4     = -D latex_paper_size=a4
+PAPEROPT_letter = -D latex_paper_size=letter
+ALLSPHINXOPTS   = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source
+# the i18n builder cannot share the environment and doctrees with the others
+I18NSPHINXOPTS  = $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) source
+
+YAML_DOC_INPUT=../conf/cassandra.yaml
+YAML_DOC_OUTPUT=source/configuration/cassandra_config_file.rst
+
+MAKE_CASSANDRA_YAML = python convert_yaml_to_rst.py $(YAML_DOC_INPUT) $(YAML_DOC_OUTPUT)
+
+.PHONY: help
+help:
+	@echo "Please use \`make <target>' where <target> is one of"
+	@echo "  html       to make standalone HTML files"
+	@echo "  dirhtml    to make HTML files named index.html in directories"
+	@echo "  singlehtml to make a single large HTML file"
+	@echo "  pickle     to make pickle files"
+	@echo "  json       to make JSON files"
+	@echo "  htmlhelp   to make HTML files and a HTML help project"
+	@echo "  qthelp     to make HTML files and a qthelp project"
+	@echo "  applehelp  to make an Apple Help Book"
+	@echo "  devhelp    to make HTML files and a Devhelp project"
+	@echo "  epub       to make an epub"
+	@echo "  epub3      to make an epub3"
+	@echo "  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
+	@echo "  latexpdf   to make LaTeX files and run them through pdflatex"
+	@echo "  latexpdfja to make LaTeX files and run them through platex/dvipdfmx"
+	@echo "  text       to make text files"
+	@echo "  man        to make manual pages"
+	@echo "  texinfo    to make Texinfo files"
+	@echo "  info       to make Texinfo files and run them through makeinfo"
+	@echo "  gettext    to make PO message catalogs"
+	@echo "  changes    to make an overview of all changed/added/deprecated items"
+	@echo "  xml        to make Docutils-native XML files"
+	@echo "  pseudoxml  to make pseudoxml-XML files for display purposes"
+	@echo "  linkcheck  to check all external links for integrity"
+	@echo "  doctest    to run all doctests embedded in the documentation (if enabled)"
+	@echo "  coverage   to run coverage check of the documentation (if enabled)"
+	@echo "  dummy      to check syntax errors of document sources"
+
+.PHONY: clean
+clean:
+	rm -rf $(BUILDDIR)/*
+	rm $(YAML_DOC_OUTPUT)
+
+.PHONY: html
+html:
+	$(MAKE_CASSANDRA_YAML)
+	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
+	@echo
+	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
+
+.PHONY: dirhtml
+dirhtml:
+	$(MAKE_CASSANDRA_YAML)
+	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
+	@echo
+	@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
+
+.PHONY: singlehtml
+singlehtml:
+	$(MAKE_CASSANDRA_YAML)
+	$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
+	@echo
+	@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
+
+.PHONY: pickle
+pickle:
+	$(MAKE_CASSANDRA_YAML)
+	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
+	@echo
+	@echo "Build finished; now you can process the pickle files."
+
+.PHONY: json
+json:
+	$(MAKE_CASSANDRA_YAML)
+	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
+	@echo
+	@echo "Build finished; now you can process the JSON files."
+
+.PHONY: htmlhelp
+htmlhelp:
+	$(MAKE_CASSANDRA_YAML)
+	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
+	@echo
+	@echo "Build finished; now you can run HTML Help Workshop with the" \
+	      ".hhp project file in $(BUILDDIR)/htmlhelp."
+
+.PHONY: qthelp
+qthelp:
+	$(MAKE_CASSANDRA_YAML)
+	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
+	@echo
+	@echo "Build finished; now you can run "qcollectiongenerator" with the" \
+	      ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
+	@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/ApacheCassandraDocumentation.qhcp"
+	@echo "To view the help file:"
+	@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/ApacheCassandraDocumentation.qhc"
+
+.PHONY: applehelp
+applehelp:
+	$(MAKE_CASSANDRA_YAML)
+	$(SPHINXBUILD) -b applehelp $(ALLSPHINXOPTS) $(BUILDDIR)/applehelp
+	@echo
+	@echo "Build finished. The help book is in $(BUILDDIR)/applehelp."
+	@echo "N.B. You won't be able to view it unless you put it in" \
+	      "~/Library/Documentation/Help or install it in your application" \
+	      "bundle."
+
+.PHONY: devhelp
+devhelp:
+	$(MAKE_CASSANDRA_YAML)
+	$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
+	@echo
+	@echo "Build finished."
+	@echo "To view the help file:"
+	@echo "# mkdir -p $$HOME/.local/share/devhelp/ApacheCassandraDocumentation"
+	@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/ApacheCassandraDocumentation"
+	@echo "# devhelp"
+
+.PHONY: epub
+epub:
+	$(MAKE_CASSANDRA_YAML)
+	$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
+	@echo
+	@echo "Build finished. The epub file is in $(BUILDDIR)/epub."
+
+.PHONY: epub3
+epub3:
+	$(MAKE_CASSANDRA_YAML)
+	$(SPHINXBUILD) -b epub3 $(ALLSPHINXOPTS) $(BUILDDIR)/epub3
+	@echo
+	@echo "Build finished. The epub3 file is in $(BUILDDIR)/epub3."
+
+.PHONY: latex
+latex:
+	$(MAKE_CASSANDRA_YAML)
+	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
+	@echo
+	@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
+	@echo "Run \`make' in that directory to run these through (pdf)latex" \
+	      "(use \`make latexpdf' here to do that automatically)."
+
+.PHONY: latexpdf
+latexpdf:
+	$(MAKE_CASSANDRA_YAML)
+	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
+	@echo "Running LaTeX files through pdflatex..."
+	$(MAKE) -C $(BUILDDIR)/latex all-pdf
+	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
+
+.PHONY: latexpdfja
+latexpdfja:
+	$(MAKE_CASSANDRA_YAML)
+	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
+	@echo "Running LaTeX files through platex and dvipdfmx..."
+	$(MAKE) -C $(BUILDDIR)/latex all-pdf-ja
+	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
+
+.PHONY: text
+text:
+	$(MAKE_CASSANDRA_YAML)
+	$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
+	@echo
+	@echo "Build finished. The text files are in $(BUILDDIR)/text."
+
+.PHONY: man
+man:
+	$(MAKE_CASSANDRA_YAML)
+	$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
+	@echo
+	@echo "Build finished. The manual pages are in $(BUILDDIR)/man."
+
+.PHONY: texinfo
+texinfo:
+	$(MAKE_CASSANDRA_YAML)
+	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
+	@echo
+	@echo "Build finished. The Texinfo files are in $(BUILDDIR)/texinfo."
+	@echo "Run \`make' in that directory to run these through makeinfo" \
+	      "(use \`make info' here to do that automatically)."
+
+.PHONY: info
+info:
+	$(MAKE_CASSANDRA_YAML)
+	$(SPHINXBUILD) -b texinfo $(ALLSPHINXOPTS) $(BUILDDIR)/texinfo
+	@echo "Running Texinfo files through makeinfo..."
+	make -C $(BUILDDIR)/texinfo info
+	@echo "makeinfo finished; the Info files are in $(BUILDDIR)/texinfo."
+
+.PHONY: gettext
+gettext:
+	$(MAKE_CASSANDRA_YAML)
+	$(SPHINXBUILD) -b gettext $(I18NSPHINXOPTS) $(BUILDDIR)/locale
+	@echo
+	@echo "Build finished. The message catalogs are in $(BUILDDIR)/locale."
+
+.PHONY: changes
+changes:
+	$(MAKE_CASSANDRA_YAML)
+	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
+	@echo
+	@echo "The overview file is in $(BUILDDIR)/changes."
+
+.PHONY: linkcheck
+linkcheck:
+	$(MAKE_CASSANDRA_YAML)
+	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
+	@echo
+	@echo "Link check complete; look for any errors in the above output " \
+	      "or in $(BUILDDIR)/linkcheck/output.txt."
+
+.PHONY: doctest
+doctest:
+	$(MAKE_CASSANDRA_YAML)
+	$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
+	@echo "Testing of doctests in the sources finished, look at the " \
+	      "results in $(BUILDDIR)/doctest/output.txt."
+
+.PHONY: coverage
+coverage:
+	$(MAKE_CASSANDRA_YAML)
+	$(SPHINXBUILD) -b coverage $(ALLSPHINXOPTS) $(BUILDDIR)/coverage
+	@echo "Testing of coverage in the sources finished, look at the " \
+	      "results in $(BUILDDIR)/coverage/python.txt."
+
+.PHONY: xml
+xml:
+	$(MAKE_CASSANDRA_YAML)
+	$(SPHINXBUILD) -b xml $(ALLSPHINXOPTS) $(BUILDDIR)/xml
+	@echo
+	@echo "Build finished. The XML files are in $(BUILDDIR)/xml."
+
+.PHONY: pseudoxml
+pseudoxml:
+	$(MAKE_CASSANDRA_YAML)
+	$(SPHINXBUILD) -b pseudoxml $(ALLSPHINXOPTS) $(BUILDDIR)/pseudoxml
+	@echo
+	@echo "Build finished. The pseudo-XML files are in $(BUILDDIR)/pseudoxml."
+
+.PHONY: dummy
+dummy:
+	$(MAKE_CASSANDRA_YAML)
+	$(SPHINXBUILD) -b dummy $(ALLSPHINXOPTS) $(BUILDDIR)/dummy
+	@echo
+	@echo "Build finished. Dummy builder generates no files."
diff --git a/doc/README.md b/doc/README.md
new file mode 100644
index 0000000..9ba47a1
--- /dev/null
+++ b/doc/README.md
@@ -0,0 +1,31 @@
+Apache Cassandra documentation directory
+========================================
+
+This directory contains the documentation maintained in-tree for Apache
+Cassandra. This directory contains the following documents:
+- The source of the official Cassandra documentation, in the `source/`
+  subdirectory. See below for more details on how to edit/build that
+  documentation.
+- The specification(s) for the supported versions of native transport protocol.
+- Additional documentation on the SASI implementation (`SASI.md`). TODO: we
+  should probably move the first half of that documentation to the general
+  documentation, and the implementation explanation parts into the wiki.
+
+
+Official documentation
+----------------------
+
+The source for the official documentation for Apache Cassandra can be found in
+the `source` subdirectory. The documentation uses [sphinx](http://www.sphinx-doc.org/)
+and is thus written in [reStructuredText](http://docutils.sourceforge.net/rst.html).
+
+To build the HTML documentation, you will need to first install sphinx and the
+[sphinx ReadTheDocs theme](https://pypi.python.org/pypi/sphinx_rtd_theme), which
+on unix you can do with:
+```
+pip install sphinx sphinx_rtd_theme
+```
+
+The documentation can then be built from this directory by calling `make html`
+(or `make.bat html` on windows). Alternatively, the top-level `ant gen-doc`
+target can be used.
diff --git a/doc/SASI.md b/doc/SASI.md
new file mode 100644
index 0000000..a4762c9
--- /dev/null
+++ b/doc/SASI.md
@@ -0,0 +1,778 @@
+# SASIIndex
+
+[`SASIIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/SASIIndex.java),
+or "SASI" for short, is an implementation of Cassandra's
+`Index` interface that can be used as an alternative to the
+existing implementations. SASI's indexing and querying improve on
+existing implementations by being tailored specifically to Cassandra's
+needs. SASI has superior performance in cases where queries would
+previously require filtering. In achieving this performance, SASI aims
+to be significantly less resource intensive than existing
+implementations, in memory, disk, and CPU usage. In addition, SASI
+supports prefix and contains queries on strings (similar to SQL's
+`LIKE = "foo*"` or `LIKE = "*foo*"`).
+
+The following goes on to describe how to get up and running with SASI,
+demonstrates usage with examples, and provides some details on its
+implementation.
+
+## Using SASI
+
+The examples below walk through creating a table and indexes on its
+columns, and performing queries on some inserted data. The patchset in
+this repository includes support for the Thrift and CQL3 interfaces.
+
+The examples below assume the `demo` keyspace has been created and is
+in use.
+
+```
+cqlsh> CREATE KEYSPACE demo WITH replication = {
+   ... 'class': 'SimpleStrategy',
+   ... 'replication_factor': '1'
+   ... };
+cqlsh> USE demo;
+```
+
+All examples are performed on the `sasi` table:
+
+```
+cqlsh:demo> CREATE TABLE sasi (id uuid, first_name text, last_name text,
+        ... age int, height int, created_at bigint, primary key (id));
+```
+
+#### Creating Indexes
+
+To create SASI indexes, use CQL's `CREATE CUSTOM INDEX` statement:
+
+```
+cqlsh:demo> CREATE CUSTOM INDEX ON sasi (first_name) USING 'org.apache.cassandra.index.sasi.SASIIndex'
+        ... WITH OPTIONS = {
+        ... 'analyzer_class':
+        ...   'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer',
+        ... 'case_sensitive': 'false'
+        ... };
+
+cqlsh:demo> CREATE CUSTOM INDEX ON sasi (last_name) USING 'org.apache.cassandra.index.sasi.SASIIndex'
+        ... WITH OPTIONS = {'mode': 'CONTAINS'};
+
+cqlsh:demo> CREATE CUSTOM INDEX ON sasi (age) USING 'org.apache.cassandra.index.sasi.SASIIndex';
+
+cqlsh:demo> CREATE CUSTOM INDEX ON sasi (created_at) USING 'org.apache.cassandra.index.sasi.SASIIndex'
+        ...  WITH OPTIONS = {'mode': 'SPARSE'};
+```
+
+The indexes created have some options specified that customize their
+behaviour and, potentially, their performance. The index on `first_name` is
+case-insensitive. The analyzers are discussed more in a subsequent
+example. The `NonTokenizingAnalyzer` performs no analysis on the
+text. Each index has a mode: `PREFIX`, `CONTAINS`, or `SPARSE`, the
+first being the default. The `last_name` index is created with the
+mode `CONTAINS` which matches terms on suffixes instead of prefix
+only. Examples of this are available below and more detail can be
+found in the section on
+[OnDiskIndex](#ondiskindexbuilder). The
+`created_at` column is created with its mode set to `SPARSE`, which is
+meant to improve performance of querying large, dense number ranges
+like timestamps for data inserted every millisecond. Details of the
+`SPARSE` implementation can also be found in the section on the
+[OnDiskIndex](#ondiskindexbuilder). The `age`
+index is created with the default `PREFIX` mode and no
+case-sensitivity or text analysis options are specified since the
+field is numeric.
+
+After inserting the following data and performing a `nodetool flush`,
+SASI's index flushes to disk can be seen in Cassandra's logs
+-- although the direct call to flush is not required (see
+[IndexMemtable](#indexmemtable) for more details).
+
+```
+cqlsh:demo> INSERT INTO sasi (id, first_name, last_name, age, height, created_at)
+        ... VALUES (556ebd54-cbe5-4b75-9aae-bf2a31a24500, 'Pavel', 'Yaskevich', 27, 181, 1442959315018);
+
+cqlsh:demo> INSERT INTO sasi (id, first_name, last_name, age, height, created_at)
+        ... VALUES (5770382a-c56f-4f3f-b755-450e24d55217, 'Jordan', 'West', 26, 173, 1442959315019);
+
+cqlsh:demo> INSERT INTO sasi (id, first_name, last_name, age, height, created_at)
+        ... VALUES (96053844-45c3-4f15-b1b7-b02c441d3ee1, 'Mikhail', 'Stepura', 36, 173, 1442959315020);
+
+cqlsh:demo> INSERT INTO sasi (id, first_name, last_name, age, height, created_at)
+        ... VALUES (f5dfcabe-de96-4148-9b80-a1c41ed276b4, 'Michael', 'Kjellman', 26, 180, 1442959315021);
+
+cqlsh:demo> INSERT INTO sasi (id, first_name, last_name, age, height, created_at)
+        ... VALUES (2970da43-e070-41a8-8bcb-35df7a0e608a, 'Johnny', 'Zhang', 32, 175, 1442959315022);
+
+cqlsh:demo> INSERT INTO sasi (id, first_name, last_name, age, height, created_at)
+        ... VALUES (6b757016-631d-4fdb-ac62-40b127ccfbc7, 'Jason', 'Brown', 40, 182, 1442959315023);
+
+cqlsh:demo> INSERT INTO sasi (id, first_name, last_name, age, height, created_at)
+        ... VALUES (8f909e8a-008e-49dd-8d43-1b0df348ed44, 'Vijay', 'Parthasarathy', 34, 183, 1442959315024);
+
+cqlsh:demo> SELECT first_name, last_name, age, height, created_at FROM sasi;
+
+ first_name | last_name     | age | height | created_at
+------------+---------------+-----+--------+---------------
+    Michael |      Kjellman |  26 |    180 | 1442959315021
+    Mikhail |       Stepura |  36 |    173 | 1442959315020
+      Jason |         Brown |  40 |    182 | 1442959315023
+      Pavel |     Yaskevich |  27 |    181 | 1442959315018
+      Vijay | Parthasarathy |  34 |    183 | 1442959315024
+     Jordan |          West |  26 |    173 | 1442959315019
+     Johnny |         Zhang |  32 |    175 | 1442959315022
+
+(7 rows)
+```
+
+#### Equality & Prefix Queries
+
+SASI supports all queries already supported by CQL, including the LIKE statement
+for PREFIX, CONTAINS and SUFFIX searches.
+
+```
+cqlsh:demo> SELECT first_name, last_name, age, height, created_at FROM sasi
+        ... WHERE first_name = 'Pavel';
+
+  first_name | last_name | age | height | created_at
+-------------+-----------+-----+--------+---------------
+       Pavel | Yaskevich |  27 |    181 | 1442959315018
+
+(1 rows)
+```
+
+```
+cqlsh:demo> SELECT first_name, last_name, age, height, created_at FROM sasi
+       ... WHERE first_name = 'pavel';
+
+  first_name | last_name | age | height | created_at
+-------------+-----------+-----+--------+---------------
+       Pavel | Yaskevich |  27 |    181 | 1442959315018
+
+(1 rows)
+```
+
+```
+cqlsh:demo> SELECT first_name, last_name, age, height, created_at FROM sasi
+        ... WHERE first_name LIKE 'M%';
+
+ first_name | last_name | age | height | created_at
+------------+-----------+-----+--------+---------------
+    Michael |  Kjellman |  26 |    180 | 1442959315021
+    Mikhail |   Stepura |  36 |    173 | 1442959315020
+
+(2 rows)
+```
+
+Of course, the case of the query does not matter for the `first_name`
+column because of the options provided at index creation time.
+
+```
+cqlsh:demo> SELECT first_name, last_name, age, height, created_at FROM sasi
+        ... WHERE first_name LIKE 'm%';
+
+ first_name | last_name | age | height | created_at
+------------+-----------+-----+--------+---------------
+    Michael |  Kjellman |  26 |    180 | 1442959315021
+    Mikhail |   Stepura |  36 |    173 | 1442959315020
+
+(2 rows)
+```
+
+#### Compound Queries
+
+SASI supports queries with multiple predicates. However, due to the
+nature of the default indexing implementation, CQL requires the user
+to specify `ALLOW FILTERING` to opt-in to the potential performance
+pitfalls of such a query. With SASI, while the requirement to include
+`ALLOW FILTERING` remains (to reduce modifications to the grammar), the
+performance pitfalls do not exist because filtering is not
+performed. Details on how SASI joins data from multiple predicates are
+available below in the
+[Implementation Details](#implementation-details)
+section.
+
+```
+cqlsh:demo> SELECT first_name, last_name, age, height, created_at FROM sasi
+        ... WHERE first_name LIKE 'M%' and age < 30 ALLOW FILTERING;
+
+ first_name | last_name | age | height | created_at
+------------+-----------+-----+--------+---------------
+    Michael |  Kjellman |  26 |    180 | 1442959315021
+
+(1 rows)
+```
+
+#### Suffix Queries
+
+The next example demonstrates `CONTAINS` mode on the `last_name`
+column. By using this mode, predicates can search for any strings
+containing the search string as a sub-string. In this case, the strings
+containing "a" or "an".
+
+```
+cqlsh:demo> SELECT * FROM sasi WHERE last_name LIKE '%a%';
+
+ id                                   | age | created_at    | first_name | height | last_name
+--------------------------------------+-----+---------------+------------+--------+---------------
+ f5dfcabe-de96-4148-9b80-a1c41ed276b4 |  26 | 1442959315021 |    Michael |    180 |      Kjellman
+ 96053844-45c3-4f15-b1b7-b02c441d3ee1 |  36 | 1442959315020 |    Mikhail |    173 |       Stepura
+ 556ebd54-cbe5-4b75-9aae-bf2a31a24500 |  27 | 1442959315018 |      Pavel |    181 |     Yaskevich
+ 8f909e8a-008e-49dd-8d43-1b0df348ed44 |  34 | 1442959315024 |      Vijay |    183 | Parthasarathy
+ 2970da43-e070-41a8-8bcb-35df7a0e608a |  32 | 1442959315022 |     Johnny |    175 |         Zhang
+
+(5 rows)
+
+cqlsh:demo> SELECT * FROM sasi WHERE last_name LIKE '%an%';
+
+ id                                   | age | created_at    | first_name | height | last_name
+--------------------------------------+-----+---------------+------------+--------+-----------
+ f5dfcabe-de96-4148-9b80-a1c41ed276b4 |  26 | 1442959315021 |    Michael |    180 |  Kjellman
+ 2970da43-e070-41a8-8bcb-35df7a0e608a |  32 | 1442959315022 |     Johnny |    175 |     Zhang
+
+(2 rows)
+```
+
+#### Expressions on Non-Indexed Columns
+
+SASI also supports filtering on non-indexed columns like `height`. The
+expression can only narrow down an existing query using `AND`.
+
+```
+cqlsh:demo> SELECT * FROM sasi WHERE last_name LIKE '%a%' AND height >= 175 ALLOW FILTERING;
+
+ id                                   | age | created_at    | first_name | height | last_name
+--------------------------------------+-----+---------------+------------+--------+---------------
+ f5dfcabe-de96-4148-9b80-a1c41ed276b4 |  26 | 1442959315021 |    Michael |    180 |      Kjellman
+ 556ebd54-cbe5-4b75-9aae-bf2a31a24500 |  27 | 1442959315018 |      Pavel |    181 |     Yaskevich
+ 8f909e8a-008e-49dd-8d43-1b0df348ed44 |  34 | 1442959315024 |      Vijay |    183 | Parthasarathy
+ 2970da43-e070-41a8-8bcb-35df7a0e608a |  32 | 1442959315022 |     Johnny |    175 |         Zhang
+
+(4 rows)
+```
+
+#### Text Analysis (Tokenization and Stemming)
+
+Lastly, to demonstrate text analysis an additional column is needed on
+the table. Its definition, index, and statements to update rows are shown below.
+
+```
+cqlsh:demo> ALTER TABLE sasi ADD bio text;
+cqlsh:demo> CREATE CUSTOM INDEX ON sasi (bio) USING 'org.apache.cassandra.index.sasi.SASIIndex'
+        ... WITH OPTIONS = {
+        ... 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer',
+        ... 'tokenization_enable_stemming': 'true',
+        ... 'analyzed': 'true',
+        ... 'tokenization_normalize_lowercase': 'true',
+        ... 'tokenization_locale': 'en'
+        ... };
+cqlsh:demo> UPDATE sasi SET bio = 'Software Engineer, who likes distributed systems, doesnt like to argue.' WHERE id = 5770382a-c56f-4f3f-b755-450e24d55217;
+cqlsh:demo> UPDATE sasi SET bio = 'Software Engineer, works on the freight distribution at nights and likes arguing' WHERE id = 556ebd54-cbe5-4b75-9aae-bf2a31a24500;
+cqlsh:demo> SELECT * FROM sasi;
+
+ id                                   | age | bio                                                                              | created_at    | first_name | height | last_name
+--------------------------------------+-----+----------------------------------------------------------------------------------+---------------+------------+--------+---------------
+ f5dfcabe-de96-4148-9b80-a1c41ed276b4 |  26 |                                                                             null | 1442959315021 |    Michael |    180 |      Kjellman
+ 96053844-45c3-4f15-b1b7-b02c441d3ee1 |  36 |                                                                             null | 1442959315020 |    Mikhail |    173 |       Stepura
+ 6b757016-631d-4fdb-ac62-40b127ccfbc7 |  40 |                                                                             null | 1442959315023 |      Jason |    182 |         Brown
+ 556ebd54-cbe5-4b75-9aae-bf2a31a24500 |  27 | Software Engineer, works on the freight distribution at nights and likes arguing | 1442959315018 |      Pavel |    181 |     Yaskevich
+ 8f909e8a-008e-49dd-8d43-1b0df348ed44 |  34 |                                                                             null | 1442959315024 |      Vijay |    183 | Parthasarathy
+ 5770382a-c56f-4f3f-b755-450e24d55217 |  26 |          Software Engineer, who likes distributed systems, doesnt like to argue. | 1442959315019 |     Jordan |    173 |          West
+ 2970da43-e070-41a8-8bcb-35df7a0e608a |  32 |                                                                             null | 1442959315022 |     Johnny |    175 |         Zhang
+
+(7 rows)
+```
+
+Index terms and query search strings are stemmed for the `bio` column
+because it was configured to use the
+[`StandardAnalyzer`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/analyzer/StandardAnalyzer.java)
+and `analyzed` is set to `true`. The
+`tokenization_normalize_lowercase` is similar to the `case_sensitive`
+property but for the
+[`StandardAnalyzer`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/analyzer/StandardAnalyzer.java). These
+queries demonstrate the stemming applied by [`StandardAnalyzer`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/analyzer/StandardAnalyzer.java).
+
+```
+cqlsh:demo> SELECT * FROM sasi WHERE bio LIKE 'distributing';
+
+ id                                   | age | bio                                                                              | created_at    | first_name | height | last_name
+--------------------------------------+-----+----------------------------------------------------------------------------------+---------------+------------+--------+-----------
+ 556ebd54-cbe5-4b75-9aae-bf2a31a24500 |  27 | Software Engineer, works on the freight distribution at nights and likes arguing | 1442959315018 |      Pavel |    181 | Yaskevich
+ 5770382a-c56f-4f3f-b755-450e24d55217 |  26 |          Software Engineer, who likes distributed systems, doesnt like to argue. | 1442959315019 |     Jordan |    173 |      West
+
+(2 rows)
+
+cqlsh:demo> SELECT * FROM sasi WHERE bio LIKE 'they argued';
+
+ id                                   | age | bio                                                                              | created_at    | first_name | height | last_name
+--------------------------------------+-----+----------------------------------------------------------------------------------+---------------+------------+--------+-----------
+ 556ebd54-cbe5-4b75-9aae-bf2a31a24500 |  27 | Software Engineer, works on the freight distribution at nights and likes arguing | 1442959315018 |      Pavel |    181 | Yaskevich
+ 5770382a-c56f-4f3f-b755-450e24d55217 |  26 |          Software Engineer, who likes distributed systems, doesnt like to argue. | 1442959315019 |     Jordan |    173 |      West
+
+(2 rows)
+
+cqlsh:demo> SELECT * FROM sasi WHERE bio LIKE 'working at the company';
+
+ id                                   | age | bio                                                                              | created_at    | first_name | height | last_name
+--------------------------------------+-----+----------------------------------------------------------------------------------+---------------+------------+--------+-----------
+ 556ebd54-cbe5-4b75-9aae-bf2a31a24500 |  27 | Software Engineer, works on the freight distribution at nights and likes arguing | 1442959315018 |      Pavel |    181 | Yaskevich
+
+(1 rows)
+
+cqlsh:demo> SELECT * FROM sasi WHERE bio LIKE 'soft eng';
+
+ id                                   | age | bio                                                                              | created_at    | first_name | height | last_name
+--------------------------------------+-----+----------------------------------------------------------------------------------+---------------+------------+--------+-----------
+ 556ebd54-cbe5-4b75-9aae-bf2a31a24500 |  27 | Software Engineer, works on the freight distribution at nights and likes arguing | 1442959315018 |      Pavel |    181 | Yaskevich
+ 5770382a-c56f-4f3f-b755-450e24d55217 |  26 |          Software Engineer, who likes distributed systems, doesnt like to argue. | 1442959315019 |     Jordan |    173 |      West
+
+(2 rows)
+```
+
+## Implementation Details
+
+While SASI, on the surface, is simply an implementation of the
+`Index` interface, at its core there are several data
+structures and algorithms used to satisfy it. These are described
+here. Additionally, the changes internal to Cassandra to support SASI's
+integration are described.
+
+The `Index` interface divides responsibility of the
+implementer into two parts: Indexing and Querying. Further, Cassandra
+makes it possible to divide those responsibilities into the memory and
+disk components. SASI takes advantage of Cassandra's write-once,
+immutable, ordered data model to build indexes along with the flushing
+of the memtable to disk -- this is the origin of the name "SSTable
+Attached Secondary Index".
+
+The SASI index data structures are built in memory as the SSTable is
+being written and they are flushed to disk before the writing of the
+SSTable completes. The writing of each index file only requires
+sequential writes to disk. In some cases, partial flushes are
+performed, and later stitched back together, to reduce memory
+usage. These data structures are optimized for this use case.
+
+Taking advantage of Cassandra's ordered data model, at query time,
+candidate indexes are narrowed down for searching to minimize the amount
+of work done. Searching is then performed using an efficient method
+that streams data off disk as needed.
+
+### Indexing
+
+Per SSTable, SASI writes an index file for each indexed column. The
+data for these files is built in memory using the
+[`OnDiskIndexBuilder`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/OnDiskIndexBuilder.java). Once
+flushed to disk, the data is read using the
+[`OnDiskIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/OnDiskIndex.java)
+class. These are composed of bytes representing indexed terms,
+organized for efficient writing or searching respectively. The keys
+and values they hold represent tokens and positions in an SSTable and
+these are stored per-indexed term in
+[`TokenTreeBuilder`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/TokenTreeBuilder.java)s
+for writing, and
+[`TokenTree`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/TokenTree.java)s
+for querying. These index files are memory mapped after being written
+to disk, for quicker access. For indexing data in the memtable SASI
+uses its
+[`IndexMemtable`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/memory/IndexMemtable.java)
+class.
+
+#### OnDiskIndex(Builder)
+
+Each
+[`OnDiskIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/OnDiskIndex.java)
+is an instance of a modified
+[Suffix Array](https://en.wikipedia.org/wiki/Suffix_array) data
+structure. The
+[`OnDiskIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/OnDiskIndex.java)
+is composed of page-size blocks of sorted terms and pointers to the
+terms' associated data, as well as the data itself, stored also in one
+or more page-sized blocks. The
+[`OnDiskIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/OnDiskIndex.java)
+is structured as a tree of arrays, where each level describes the
+terms in the level below, the final level being the terms
+themselves. The `PointerLevel`s and their `PointerBlock`s contain
+terms and pointers to other blocks that *end* with those terms. The
+`DataLevel`, the final level, and its `DataBlock`s contain terms and
+point to the data itself, contained in [`TokenTree`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/TokenTree.java)s.
+
+The terms written to the
+[`OnDiskIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/OnDiskIndex.java)
+vary depending on its "mode": either `PREFIX`, `CONTAINS`, or
+`SPARSE`. In the `PREFIX` and `SPARSE` cases, terms' exact values are
+written exactly once per `OnDiskIndex`. For example, in a `PREFIX` index
+with terms `Jason`, `Jordan`, and `Pavel`, all three will be included in
+the index. A `CONTAINS` index writes additional terms for each suffix of
+each term recursively. Continuing with the example, a `CONTAINS` index
+storing the previous terms would also store `ason`, `ordan`, `avel`,
+`son`, `rdan`, `vel`, etc. This allows for queries on the suffix of
+strings. The `SPARSE` mode differs from `PREFIX` in that for every 64
+blocks of terms a
+[`TokenTree`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/TokenTree.java)
+is built merging all the
+[`TokenTree`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/TokenTree.java)s
+for each term into a single one. This copy of the data is used for
+efficient iteration of large ranges of e.g. timestamps. The index
+"mode" is configurable per column at index creation time.
+
+#### TokenTree(Builder)
+
+The
+[`TokenTree`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/TokenTree.java)
+is an implementation of the well-known
+[B+-tree](https://en.wikipedia.org/wiki/B%2B_tree) that has been
+modified to optimize for its use-case. In particular, it has been
+optimized to associate tokens, longs, with a set of positions in an
+SSTable, also longs. Allowing a set of long values accommodates
+the possibility of a hash collision in the token, but the data
+structure is optimized for the unlikely possibility of such a
+collision.
+
+To optimize for its write-once environment the
+[`TokenTreeBuilder`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/TokenTreeBuilder.java)
+completely loads its interior nodes as the tree is built and it uses
+the well-known algorithm optimized for bulk-loading the data
+structure.
+
+[`TokenTree`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/TokenTree.java)s provide the means to iterate over the tokens, and file
+positions, that match a given term, and to skip forward in that
+iteration, an operation used heavily at query time.
+
+#### IndexMemtable
+
+The
+[`IndexMemtable`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/memory/IndexMemtable.java)
+handles indexing the in-memory data held in the memtable. The
+[`IndexMemtable`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/memory/IndexMemtable.java)
+in turn manages either a
+[`TrieMemIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/memory/TrieMemIndex.java)
+or a
+[`SkipListMemIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/memory/SkipListMemIndex.java)
+per-column. The choice of which index type is used is data
+dependent. The
+[`TrieMemIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/memory/TrieMemIndex.java)
+is used for literal types. `AsciiType` and `UTF8Type` are literal
+types by default but any column can be configured as a literal type
+using the `is_literal` option at index creation time. For non-literal
+types the
+[`SkipListMemIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/memory/SkipListMemIndex.java)
+is used. The
+[`TrieMemIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/memory/TrieMemIndex.java)
+is an implementation that can efficiently support prefix queries on
+character-like data. The
+[`SkipListMemIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/memory/SkipListMemIndex.java),
+conversely, is better suited for Cassandra's other data types like
+numbers.
+
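+As a syntax sketch only (this index is not created elsewhere in this document),
+forcing a column to be treated as a literal type would look like:
+
+```
+cqlsh:demo> CREATE CUSTOM INDEX ON sasi (height) USING 'org.apache.cassandra.index.sasi.SASIIndex'
+        ... WITH OPTIONS = {'is_literal': 'true'};
+```
+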
+The
+[`TrieMemIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/memory/TrieMemIndex.java)
+is built using either the `ConcurrentRadixTree` or
+`ConcurrentSuffixTree` from the `com.googlecode.concurrenttrees`
+package. The choice between the two is based on the indexing mode:
+the `ConcurrentRadixTree` for `PREFIX` (and other non-`CONTAINS`) modes, and the `ConcurrentSuffixTree` for `CONTAINS` mode.
+
+The
+[`SkipListMemIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/memory/SkipListMemIndex.java)
+is built on top of `java.util.concurrent.ConcurrentSkipListSet`.
+
+### Querying
+
+Responsible for converting the internal `IndexExpression`
+representation into SASI's
+[`Operation`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java)
+and
+[`Expression`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Expression.java)
+tree, optimizing the tree to reduce the amount of work done, and
+driving the query itself, the
+[`QueryPlan`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java)
+is the workhorse of SASI's querying implementation. To efficiently
+perform union and intersection operations SASI provides several
+iterators similar to Cassandra's `MergeIterator` but tailored
+specifically for SASI's use, and with more features. The
+[`RangeUnionIterator`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeUnionIterator.java),
+like its name suggests, performs set union over sets of tokens/keys
+matching the query, only reading as much data as it needs from each
+set to satisfy the query. The
+[`RangeIntersectionIterator`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java),
+similar to its counterpart, performs set intersection over its data.
+
+#### QueryPlan
+
+The
+[`QueryPlan`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java)
+instantiated per search query is at the core of SASI's querying
+implementation. Its work can be divided in two stages: analysis and
+execution.
+
+During the analysis phase,
+[`QueryPlan`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java)
+converts from Cassandra's internal representation of
+`IndexExpression`s, which has also been modified to support encoding
+queries that contain ORs and groupings of expressions using
+parentheses (see the
+[Cassandra Internal Changes](#cassandra-internal-changes)
+section below for more details). This process produces a tree of
+[`Operation`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java)s, which in turn may contain [`Expression`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Expression.java)s, all of which
+provide an alternative, more efficient, representation of the query.
+
+During execution the
+[`QueryPlan`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java)
+uses the `DecoratedKey`-generating iterator created from the
+[`Operation`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java) tree. These keys are read from disk and a final check to
+ensure they satisfy the query is made, once again using the
+[`Operation`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java) tree. At the point the desired amount of matching data has
+been found, or there is no more matching data, the result set is
+returned to the coordinator through the existing internal components.
+
+The number of queries (total/failed/timed-out), and their latencies,
+are maintained per-table/column family.
+
+SASI also supports concurrently iterating terms for the same index
+across SSTables. The concurrency factor is controlled by the
+`cassandra.search_concurrency_factor` system property. The default is
+`1`.
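+
+As a sketch, the property could be passed to the JVM at startup (for example
+via `jvm.options`); the value `2` below is only illustrative:
+
+```
+-Dcassandra.search_concurrency_factor=2
+```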
+
+##### QueryController
+
+Each
+[`QueryPlan`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java)
+references a
+[`QueryController`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryController.java)
+used throughout the execution phase. The
+[`QueryController`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryController.java)
+has two responsibilities: to manage and ensure the proper cleanup of
+resources (indexes), and to strictly enforce the time bound for the query,
+specified by the user via the range slice timeout. All indexes are
+accessed via the
+[`QueryController`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryController.java)
+so that they can be safely released by it later. The
+[`QueryController`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryController.java)'s
+`checkpoint` function is called in specific places in the execution
+path to ensure the time-bound is enforced.
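+
+The range slice timeout referenced above is the coordinator's range request
+timeout; as a sketch, it is configured in `cassandra.yaml` (the value shown is
+only illustrative):
+
+```
+range_request_timeout_in_ms: 10000
+```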
+
+##### QueryPlan Optimizations
+
+While in the analysis phase, the
+[`QueryPlan`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java)
+performs several potential optimizations to the query. The goal of
+these optimizations is to reduce the amount of work performed during
+the execution phase.
+
+The simplest optimization performed is compacting multiple expressions
+joined by logical intersection (`AND`) into a single [`Operation`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java) with
+three or more [`Expression`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Expression.java)s. For example, the query `WHERE age < 100 AND
+fname = 'p*' AND fname != 'pa*' AND age > 21` would,
+without modification, have the following tree:
+
+                          ┌───────┐
+                 ┌────────│  AND  │──────┐
+                 │        └───────┘      │
+                 ▼                       ▼
+              ┌───────┐             ┌──────────┐
+        ┌─────│  AND  │─────┐       │age < 100 │
+        │     └───────┘     │       └──────────┘
+        ▼                   ▼
+    ┌──────────┐          ┌───────┐
+    │ fname=p* │        ┌─│  AND  │───┐
+    └──────────┘        │ └───────┘   │
+                        ▼             ▼
+                    ┌──────────┐  ┌──────────┐
+                    │fname!=pa*│  │ age > 21 │
+                    └──────────┘  └──────────┘
+
+[`QueryPlan`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java)
+will remove the redundant right branch whose root is the final `AND`
+and has leaves `fname != pa*` and `age > 21`. These [`Expression`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Expression.java)s will
+be compacted into the parent `AND`, a safe operation due to `AND`
+being associative and commutative. The resulting tree looks like the
+following:
+
+                                  ┌───────┐
+                         ┌────────│  AND  │──────┐
+                         │        └───────┘      │
+                         ▼                       ▼
+                      ┌───────┐             ┌──────────┐
+          ┌───────────│  AND  │────────┐    │age < 100 │
+          │           └───────┘        │    └──────────┘
+          ▼               │            ▼
+    ┌──────────┐          │      ┌──────────┐
+    │ fname=p* │          ▼      │ age > 21 │
+    └──────────┘    ┌──────────┐ └──────────┘
+                    │fname!=pa*│
+                    └──────────┘
+
+When excluding results from the result set, using `!=`, the
+[`QueryPlan`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java)
+determines the best method for handling it. For range queries, for
+example, it may be optimal to divide the range into multiple parts
+with a hole for the exclusion. For string queries, such as this one,
+however, it is more efficient to simply note which data to skip, or
+exclude, while scanning the index. Following this optimization the
+tree looks like this:
+
+                                   ┌───────┐
+                          ┌────────│  AND  │──────┐
+                          │        └───────┘      │
+                          ▼                       ▼
+                       ┌───────┐             ┌──────────┐
+               ┌───────│  AND  │────────┐    │age < 100 │
+               │       └───────┘        │    └──────────┘
+               ▼                        ▼
+        ┌──────────────────┐         ┌──────────┐
+        │     fname=p*     │         │ age > 21 │
+        │ exclusions=[pa*] │         └──────────┘
+        └──────────────────┘
+
+The last type of optimization applied, for this query, is to merge
+range expressions across branches of the tree -- without modifying the
+meaning of the query, of course. In this case, because the query
+contains only `AND`s, the `age` expressions can be collapsed. Along
+with this optimization, the initial collapsing of unneeded `AND`s can
+also be applied once more, resulting in the final tree used to execute
+the query:
+
+                            ┌───────┐
+                     ┌──────│  AND  │───────┐
+                     │      └───────┘       │
+                     ▼                      ▼
+           ┌──────────────────┐    ┌────────────────┐
+           │     fname=p*     │    │ 21 < age < 100 │
+           │ exclusions=[pa*] │    └────────────────┘
+           └──────────────────┘
+
+#### Operations and Expressions
+
+As discussed, the
+[`QueryPlan`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java)
+optimizes a tree represented by
+[`Operation`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java)s
+as interior nodes, and
+[`Expression`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Expression.java)s
+as leaves. The
+[`Operation`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java)
+class, more specifically, can have zero, one, or two
+[`Operation`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java)s
+as children and an unlimited number of expressions. The iterators used
+to perform the queries, discussed below in the
+"Range(Union|Intersection)Iterator" section, implement the necessary
+logic to merge results transparently regardless of the
+[`Operation`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java)'s
+children.
+
+Besides participating in the optimizations performed by the
+[`QueryPlan`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java),
+[`Operation`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java)
+is also responsible for taking a row that has been returned by the
+query and performing a final validation that it does in fact match. This
+`satisfiedBy` operation is performed recursively from the root of the
+[`Operation`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java)
+tree for a given query. These checks are performed directly on the
+data in a given row. For more details on how `satisfiedBy` works see
+the documentation
+[in the code](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/Operation.java#L87-L123).
+
+#### Range(Union|Intersection)Iterator
+
+The abstract `RangeIterator` class provides a unified interface over
+the two main operations performed by SASI at various layers in the
+execution path: set intersection and union. These operations are
+performed in an iterated, or "streaming", fashion to prevent unneeded
+reads of elements from either set. In both the intersection and union
+cases the algorithms take advantage of the data being pre-sorted using
+the same sort order, e.g. term or token order.
+
+The
+[`RangeUnionIterator`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeUnionIterator.java)
+performs the "Merge-Join" portion of the
+[Sort-Merge-Join](https://en.wikipedia.org/wiki/Sort-merge_join)
+algorithm, with the properties of an outer-join, or union. It is
+implemented with several optimizations to improve its performance over
+a large number of iterators -- sets to union. Specifically, the
+iterator exploits the likely case of the data having many sub-groups
+of overlapping ranges and the unlikely case that all ranges will
+overlap each other. For more details see the
+[javadoc](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeUnionIterator.java#L9-L21).
+
+The
+[`RangeIntersectionIterator`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java)
+itself is not a subclass of `RangeIterator`. It is a container for
+several classes, one of which, `AbstractIntersectionIterator`,
+sub-classes `RangeIterator`. SASI supports two methods of performing
+the intersection operation, and the ability to be adaptive in choosing
+between them based on some properties of the data.
+
+`BounceIntersectionIterator`, and the `BOUNCE` strategy, works like
+the
+[`RangeUnionIterator`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeUnionIterator.java)
+in that it performs a "Merge-Join"; however, its nature is similar to
+an inner-join, where like values are merged by a data-specific merge
+function (e.g. merging two tokens in a list to look up in an SSTable
+later). See the
+[javadoc](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java#L88-L101)
+for more details on its implementation.
+
+`LookupIntersectionIterator`, and the `LOOKUP` strategy, performs a
+different operation, more similar to a lookup in an associative data
+structure, or "hash lookup" in database terminology. Once again,
+details on the implementation can be found in the
+[javadoc](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java#L199-L208).
+
+The choice between the two iterators, or the `ADAPTIVE` strategy, is
+based upon the ratio of data set sizes of the minimum and maximum
+range of the sets being intersected. If the number of elements in the
+minimum range divided by the number of elements in the maximum range
+is less than or equal to `0.01`, then the `ADAPTIVE` strategy chooses
+the `LookupIntersectionIterator`; otherwise the
+`BounceIntersectionIterator` is chosen.
+
+### The SASIIndex Class
+
+The above components are glued together by the
+[`SASIIndex`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/SASIIndex.java)
+class, which implements `Index` and is instantiated once per table
+containing SASI indexes. It manages all indexes for a table
+via the
+[`sasi.conf.DataTracker`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/conf/DataTracker.java)
+and
+[`sasi.conf.view.View`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/conf/view/View.java)
+components, controls writing of all indexes for an SSTable via its
+[`PerSSTableIndexWriter`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/disk/PerSSTableIndexWriter.java), and initiates searches with
+`Searcher`. These classes glue the previously
+mentioned indexing components together with Cassandra's SSTable
+life-cycle, ensuring indexes are written not only when Memtables flush
+but also as SSTables are compacted. For querying, the
+`Searcher` does little but defer to
+[`QueryPlan`](https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java)
+and update e.g. latency metrics exposed by SASI.
+
+### Cassandra Internal Changes
+
+To support the above changes and integrate them into Cassandra a few
+minor internal changes were made to Cassandra itself. These are
+described here.
+
+#### SSTable Write Life-cycle Notifications
+
+The `SSTableFlushObserver` is an observer pattern-like interface,
+whose sub-classes can register to be notified about events in the
+life-cycle of writing out an SSTable. Sub-classes can be notified when a
+flush begins and ends, as well as before each row and each column is
+written. SASI's `PerSSTableIndexWriter`,
+discussed above, is the only current subclass.
+
+### Limitations and Caveats
+
+The following are items that can be addressed in future updates but are not
+available in this repository or are not currently implemented.
+
+* The cluster must be configured to use a partitioner that produces
+  `LongToken`s, e.g. `Murmur3Partitioner`. Other partitioners that
+  don't produce `LongToken`s, e.g. `ByteOrderedPartitioner` and `RandomPartitioner`,
+  will not work with SASI.
+* Not Equals and OR support have been removed in this release while
+  changes are made to Cassandra itself to support them.
+
+### Contributors
+
+* [Pavel Yaskevich](https://github.com/xedin)
+* [Jordan West](https://github.com/jrwest)
+* [Michael Kjellman](https://github.com/mkjellman)
+* [Jason Brown](https://github.com/jasobrown)
+* [Mikhail Stepura](https://github.com/mishail)
diff --git a/doc/convert_yaml_to_rst.py b/doc/convert_yaml_to_rst.py
new file mode 100644
index 0000000..fee6d8c
--- /dev/null
+++ b/doc/convert_yaml_to_rst.py
@@ -0,0 +1,144 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""
+A script to convert cassandra.yaml into ReStructuredText for
+the online documentation.
+
+Usage:
+
+    convert_yaml_to_rst.py conf/cassandra.yaml docs/source/conf.rst
+"""
+
+import sys
+import re
+
+# Detects options, whether commented or uncommented.
+# Group 1 will be non-empty if the option is commented out.
+# Group 2 will contain the option name.
+# Group 3 will contain the default value, if one exists.
+option_re = re.compile(r"^(# ?)?([a-z0-9_]+): ?([^/].*)")
+
+# Detects normal comment lines.
+commented_re = re.compile(r"^# ?(.*)")
+
+# A set of option names that have complex values (i.e. lists or dicts).
+# This list is hardcoded because there did not seem to be another
+# good way to reliably detect this case, especially considering
+# that these can be commented out (making it useless to use a yaml parser).
+COMPLEX_OPTIONS = (
+    'seed_provider',
+    'request_scheduler_options',
+    'data_file_directories',
+    'commitlog_compression',
+    'hints_compression',
+    'server_encryption_options',
+    'client_encryption_options',
+    'transparent_data_encryption_options',
+    'hinted_handoff_disabled_datacenters'
+)
+
+
+def convert(yaml_file, dest_file):
+    with open(yaml_file, 'r') as f:
+        # Trim off the boilerplate header
+        lines = f.readlines()[7:]
+
+    with open(dest_file, 'w') as outfile:
+        outfile.write("Cassandra Configuration File\n")
+        outfile.write("============================\n")
+
+        # since comments precede an option, this holds all of the comment
+        # lines we've seen since the last option
+        comments_since_last_option = []
+        line_iter = iter(lines)
+        while True:
+            try:
+                line = next(line_iter)
+            except StopIteration:
+                break
+
+            match = option_re.match(line)
+            if match:
+                option_name = match.group(2)
+                is_commented = bool(match.group(1))
+
+                is_complex = option_name in COMPLEX_OPTIONS
+                complex_option = read_complex_option(line_iter) if is_complex else None
+
+                write_section_header(option_name, outfile)
+                write_comments(comments_since_last_option, is_commented, outfile)
+                if is_complex:
+                    write_complex_option(complex_option, outfile)
+                else:
+                    maybe_write_default_value(match, outfile)
+                comments_since_last_option = []
+            else:
+                comment_match = commented_re.match(line)
+                if comment_match:
+                    comments_since_last_option.append(comment_match.group(1))
+                elif line == "\n":
+                    comments_since_last_option.append('')
+
+
+def write_section_header(option_name, outfile):
+    outfile.write("\n")
+    outfile.write("``%s``\n" % (option_name,))
+    outfile.write("-" * (len(option_name) + 4) + "\n")
+
+
+def write_comments(comment_lines, is_commented, outfile):
+    if is_commented:
+        outfile.write("*This option is commented out by default.*\n")
+
+    for comment in comment_lines:
+        # Skip the "SAFETY THRESHOLDS" section banner rather than copying it into the rst output.
+        if "SAFETY THRESHOLDS" not in comment:
+            outfile.write(comment + "\n")
+
+
+def maybe_write_default_value(option_match, outfile):
+    default_value = option_match.group(3)
+    if default_value and default_value != "\n":
+        outfile.write("\n*Default Value:* %s\n" % (default_value,))
+
+
+def read_complex_option(line_iter):
+    option_lines = []
+    try:
+        while True:
+            line = next(line_iter)
+            if line == '\n':
+                return option_lines
+            else:
+                option_lines.append(line)
+    except StopIteration:
+        return option_lines
+
+
+def write_complex_option(lines, outfile):
+    outfile.write("\n*Default Value (complex option)*::\n\n")
+    for line in lines:
+        outfile.write((" " * 4) + line)
+
+
+if __name__ == '__main__':
+    if len(sys.argv) != 3:
+        print >> sys.stderr, "Usage: %s <yaml source file> <rst dest file>" % (sys.argv[0],)
+        sys.exit(1)
+
+    yaml_file = sys.argv[1]
+    dest_file = sys.argv[2]
+    convert(yaml_file, dest_file)
diff --git a/doc/cql3/CQL.textile b/doc/cql3/CQL.textile
index 2a37452..e2fee84 100644
--- a/doc/cql3/CQL.textile
+++ b/doc/cql3/CQL.textile
@@ -1,6 +1,6 @@
 <link rel="StyleSheet" href="CQL.css" type="text/css" media="screen">
 
-h1. Cassandra Query Language (CQL) v3.4.0
+h1. Cassandra Query Language (CQL) v3.4.2
 
 
 
@@ -399,7 +399,9 @@
 
 <instruction> ::= ALTER <identifier> TYPE <type>
                 | ADD   <identifier> <type>
+                | ADD   ( <identifier> <type> ( , <identifier> <type> )* )
                 | DROP  <identifier>
+                | DROP  ( <identifier> ( , <identifier> )* )
                 | WITH  <option> ( AND <option> )*
 p. 
 __Sample:__
@@ -418,11 +420,33 @@
 The @ALTER@ statement is used to manipulate table definitions. It allows for adding new columns, dropping existing ones, changing the type of existing columns, or updating the table options. As with table creation, @ALTER COLUMNFAMILY@ is allowed as an alias for @ALTER TABLE@.
 
 The @<tablename>@ is the table name optionally preceded by the keyspace name.  The @<instruction>@ defines the alteration to perform:
-* @ALTER@: Update the type of a given defined column. Note that the type of the "clustering columns":#createTablepartitionClustering cannot be modified as it induces the on-disk ordering of rows. Columns on which a "secondary index":#createIndexStmt is defined have the same restriction. Other columns are free from those restrictions (no validation of existing data is performed), but it is usually a bad idea to change the type to a non-compatible one, unless no data have been inserted for that column yet, as this could confuse CQL drivers/tools.
+* @ALTER@: Update the type of a given defined column. Note that the type of the "clustering columns":#createTablepartitionClustering can be modified only in very limited cases, as it induces the on-disk ordering of rows. Columns on which a "secondary index":#createIndexStmt is defined have the same restriction. To change the type of any other column, the column must already exist in the table definition and its current type must be compatible with the new type. No validation of existing data is performed. The compatibility table is available below.
 * @ADD@: Adds a new column to the table. The @<identifier>@ for the new column must not conflict with an existing column. Moreover, columns cannot be added to tables defined with the @COMPACT STORAGE@ option.
 * @DROP@: Removes a column from the table. Dropped columns will immediately become unavailable in the queries and will not be included in compacted sstables in the future. If a column is readded, queries won't return values written before the column was last dropped. It is assumed that timestamps represent actual time, so if this is not your case, you should NOT readd previously dropped columns. Columns can't be dropped from tables defined with the @COMPACT STORAGE@ option.
 * @WITH@: Allows to update the options of the table. The "supported @<option>@":#createTableOptions (and syntax) are the same as for the @CREATE TABLE@ statement except that @COMPACT STORAGE@ is not supported. Note that setting any @compaction@ sub-options has the effect of erasing all previous @compaction@ options, so you  need to re-specify all the sub-options if you want to keep them. The same note applies to the set of @compression@ sub-options.
 
+h4. CQL type compatibility:
+
+CQL data types may only be converted as shown in the following table.
+
+|_. Existing type(s)|_. May be altered to|
+|timestamp|bigint|
+|ascii, bigint, boolean, date, decimal, double, float, inet, int, smallint, text, time, timestamp, timeuuid, tinyint, uuid, varchar, varint|blob|
+|int|date|
+|ascii, varchar|text|
+|bigint|time|
+|bigint|timestamp|
+|timeuuid|uuid|
+|ascii, text|varchar|
+|bigint, int, timestamp|varint|
+
+Clustering columns have stricter requirements; only the conversions below are allowed.
+
+|_. Existing type(s)|_. May be altered to|
+|ascii, text, varchar|blob|
+|ascii, varchar|text|
+|ascii, text|varchar|
+
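+For illustration, the multi-column @ADD@ and @DROP@ forms introduced above can be used as follows (the table and column names are hypothetical):
+
+bc(sample). 
+ALTER TABLE users ADD ( middle_name text, nickname text );
+ALTER TABLE users DROP ( middle_name, nickname );
+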
 h3(#dropTableStmt). DROP TABLE
 
 __Syntax:__
@@ -829,10 +853,7 @@
 
 <names-list> ::= '(' <identifier> ( ',' <identifier> )* ')'
 
-<value-list> ::= '(' <term-or-literal> ( ',' <term-or-literal> )* ')'
-
-<term-or-literal> ::= <term>
-                    | <collection-literal>
+<value-list> ::= '(' <term> ( ',' <term> )* ')'
 
 <option> ::= TIMESTAMP <integer>
            | TTL <integer>
@@ -871,13 +892,17 @@
                | <identifier> '=' <identifier> ('+' | '-') (<int-term> | <set-literal> | <list-literal>)
                | <identifier> '=' <identifier> '+' <map-literal>
                | <identifier> '[' <term> ']' '=' <term>
+               | <identifier> '.' <field> '=' <term>
 
 <condition> ::= <identifier> <op> <term>
-              | <identifier> IN (<variable> | '(' ( <term> ( ',' <term> )* )? ')')
+              | <identifier> IN <in-values>
               | <identifier> '[' <term> ']' <op> <term>
-              | <identifier> '[' <term> ']' IN <term>
+              | <identifier> '[' <term> ']' IN <in-values>
+              | <identifier> '.' <field> <op> <term>
+              | <identifier> '.' <field> IN <in-values>
 
 <op> ::= '<' | '<=' | '=' | '!=' | '>=' | '>'
+<in-values> ::= (<variable> | '(' ( <term> ( ',' <term> )* )? ')')
 
 <where-clause> ::= <relation> ( AND <relation> )*
 
@@ -914,11 +939,13 @@
 
 The @id = id + <collection-literal>@ and @id[value1] = value2@ forms of @<assignment>@ are for collections. Please refer to the "relevant section":#collections for more details.
 
+The @id.field = <term>@ form of @<assignment>@ is for setting the value of a single field of a non-frozen user-defined type.
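+
+For example, assuming a hypothetical table @users@ with a non-frozen user-defined type column @addr@ that has a @city@ field, a single field could be updated as follows:
+
+bc(sample). 
+UPDATE users SET addr.city = 'Austin' WHERE userid = 1;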
+
 h4(#updateOptions). @<options>@
 
 The @UPDATE@ and @INSERT@ statements support the following options:
 * @TIMESTAMP@: sets the timestamp for the operation. If not specified, the coordinator will use the current time (in microseconds) at the start of statement execution as the timestamp. This is usually a suitable default.
-* @TTL@: specifies an optional Time To Live (in seconds) for the inserted values. If set, the inserted values are automatically removed from the database after the specified time. Note that the TTL concerns the inserted values, not the columns themselves. This means that any subsequent update of the column will also reset the TTL (to whatever TTL is specified in that update). By default, values never expire. A TTL of 0 or a negative value is equivalent to no TTL.
+* @TTL@: specifies an optional Time To Live (in seconds) for the inserted values. If set, the inserted values are automatically removed from the database after the specified time. Note that the TTL concerns the inserted values, not the columns themselves. This means that any subsequent update of the column will also reset the TTL (to whatever TTL is specified in that update). By default, values never expire. A TTL of 0 is equivalent to no TTL. If the table has a default_time_to_live, a TTL of 0 will remove the TTL for the inserted or updated values.
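+
+For example, assuming a hypothetical table @t@ defined with a @default_time_to_live@, the following statement removes the TTL from the updated value:
+
+bc(sample). 
+UPDATE t USING TTL 0 SET v = 'some value' WHERE k = 1;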
 
 
 h3(#deleteStmt). DELETE
@@ -932,7 +959,9 @@
                   WHERE <where-clause>
                   ( IF ( EXISTS | ( <condition> ( AND <condition> )*) ) )?
 
-<selection> ::= <identifier> ( '[' <term> ']' )?
+<selection> ::= <identifier>
+              | <identifier> '[' <term> ']'
+              | <identifier> '.' <field>
 
 <where-clause> ::= <relation> ( AND <relation> )*
 
@@ -944,11 +973,14 @@
              | '(' <identifier> (',' <identifier>)* ')' IN <variable>
 
 <op> ::= '=' | '<' | '>' | '<=' | '>='
+<in-values> ::= (<variable> | '(' ( <term> ( ',' <term> )* )? ')')
 
 <condition> ::= <identifier> (<op> | '!=') <term>
-              | <identifier> IN (<variable> | '(' ( <term> ( ',' <term> )* )? ')')
+              | <identifier> IN <in-values>
               | <identifier> '[' <term> ']' (<op> | '!=') <term>
-              | <identifier> '[' <term> ']' IN <term>
+              | <identifier> '[' <term> ']' IN <in-values>
+              | <identifier> '.' <field> (<op> | '!=') <term>
+              | <identifier> '.' <field> IN <in-values>
 
 p. 
 __Sample:__
@@ -958,7 +990,7 @@
 
 DELETE phone FROM Users WHERE userid IN (C73DE1D3-AF08-40F3-B124-3FF3E5109F22, B70DE1D0-9908-4AE3-BE34-5573E5B09F14);
 p. 
-The @DELETE@ statement deletes columns and rows. If column names are provided directly after the @DELETE@ keyword, only those columns are deleted from the row indicated by the @<where-clause>@ (the @id[value]@ syntax in @<selection>@ is for collection, please refer to the "collection section":#collections for more details).  Otherwise, whole rows are removed. The @<where-clause>@ specifies which rows are to be deleted.  Multiple rows may be deleted with one statement by using an @IN@ clause.  A range of rows may be deleted using an inequality operator (such as @>=@).
+The @DELETE@ statement deletes columns and rows. If column names are provided directly after the @DELETE@ keyword, only those columns are deleted from the row indicated by the @<where-clause>@.  The @id[value]@ syntax in @<selection>@ is for non-frozen collections (please refer to the "collection section":#collections for more details).  The @id.field@ syntax is for deleting individual fields of non-frozen user-defined type columns.  Otherwise, whole rows are removed. The @<where-clause>@ specifies which rows are to be deleted.  Multiple rows may be deleted with one statement by using an @IN@ clause.  A range of rows may be deleted using an inequality operator (such as @>=@).
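+
+For example, assuming the same hypothetical @users@ table with a non-frozen user-defined type column @addr@, a single field of that column could be deleted as follows:
+
+bc(sample). 
+DELETE addr.city FROM users WHERE userid = 1;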
 
 @DELETE@ supports the @TIMESTAMP@ option with the same semantics as the "@UPDATE@":#updateStmt statement.
 
@@ -1030,18 +1062,21 @@
                   FROM <tablename>
                   ( WHERE <where-clause> )?
                   ( ORDER BY <order-by> )?
+                  ( PER PARTITION LIMIT <integer> )?
                   ( LIMIT <integer> )?
                   ( ALLOW FILTERING )?
 
 <select-clause> ::= DISTINCT? <selection-list>
-                  | COUNT '(' ( '*' | '1' ) ')' (AS <identifier>)?
 
 <selection-list> ::= <selector> (AS <identifier>)? ( ',' <selector> (AS <identifier>)? )*
                    | '*'
 
 <selector> ::= <identifier>
+             | <term>
              | WRITETIME '(' <identifier> ')'
+             | COUNT '(' '*' ')'
              | TTL '(' <identifier> ')'
+             | CAST '(' <selector> AS <type> ')'
              | <function> '(' (<selector> (',' <selector>)*)? ')'
 
 <where-clause> ::= <relation> ( AND <relation> )*
@@ -1083,7 +1118,7 @@
 
 The @<select-clause>@ determines which columns needs to be queried and returned in the result-set. It consists of either the comma-separated list of <selector> or the wildcard character (@*@) to select all the columns defined for the table.
 
-A @<selector>@ is either a column name to retrieve or a @<function>@ of one or more @<term>@s. The function allowed are the same as for @<term>@ and are described in the "function section":#functions. In addition to these generic functions, the @WRITETIME@ (resp. @TTL@) function allows to select the timestamp of when the column was inserted (resp. the time to live (in seconds) for the column (or null if the column has no expiration set)).
+A @<selector>@ is either a column name to retrieve or a @<function>@ of one or more @<term>@s. The functions allowed are the same as for @<term>@ and are described in the "function section":#functions. In addition to these generic functions, the @WRITETIME@ (resp. @TTL@) function allows to select the timestamp of when the column was inserted (resp. the time to live (in seconds) for the column (or null if the column has no expiration set)) and the "@CAST@":#castFun function can be used to convert one data type to another.
 
 Any @<selector>@ can be aliased using @AS@ keyword (see examples). Please note that @<where-clause>@ and @<order-by>@ clause should refer to the columns by their original names and not by their aliases.
 
@@ -1147,9 +1182,9 @@
 * if the table has been defined without any specific @CLUSTERING ORDER@, then then allowed orderings are the order induced by the clustering columns and the reverse of that one.
 * otherwise, the orderings allowed are the order of the @CLUSTERING ORDER@ option and the reversed one.
 
-h4(#selectLimit). @LIMIT@
+h4(#selectLimit). @LIMIT@ and @PER PARTITION LIMIT@
 
-The @LIMIT@ option to a @SELECT@ statement limits the number of rows returned by a query.
+The @LIMIT@ option to a @SELECT@ statement limits the number of rows returned by a query, while the @PER PARTITION LIMIT@ option limits the number of rows returned for a given partition by the query. Note that both types of limit can be used in the same statement.
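+
+For example, the following query (table name is hypothetical) returns at most 2 rows per partition and at most 100 rows overall:
+
+bc(sample). 
+SELECT * FROM events PER PARTITION LIMIT 2 LIMIT 100;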
 
 h4(#selectAllowFiltering). @ALLOW FILTERING@
 
@@ -1440,6 +1475,7 @@
 * The hierarchy of Data resources, Keyspaces and Tables has the structure @ALL KEYSPACES@ -> @KEYSPACE@ -> @TABLE@
 * Function resources have the structure @ALL FUNCTIONS@ -> @KEYSPACE@ -> @FUNCTION@
 * Resources representing roles have the structure @ALL ROLES@ -> @ROLE@
+* Resources representing JMX ObjectNames, which map to sets of MBeans/MXBeans, have the structure @ALL MBEANS@ -> @MBEAN@
 
 Permissions can be granted at any level of these hierarchies and they flow downwards. So granting a permission on a resource higher up the chain automatically grants that same permission on all resources lower down. For example, granting @SELECT@ on a @KEYSPACE@ automatically grants it on all @TABLES@ in that @KEYSPACE@. Likewise, granting a permission on @ALL FUNCTIONS@ grants it on every defined function, regardless of which keyspace it is scoped in. It is also possible to grant permissions on all functions scoped to a particular keyspace. 
 
@@ -1455,7 +1491,7 @@
 * @DESCRIBE@
 * @EXECUTE@
 
-Not all permissions are applicable to every type of resource. For instance, @EXECUTE@ is only relevant in the context of functions; granting @EXECUTE@ on a resource representing a table is nonsensical. Attempting to @GRANT@ a permission on resource to which it cannot be applied results in an error response. The following illustrates which permissions can be granted on which types of resource, and which statements are enabled by that permission.
+Not all permissions are applicable to every type of resource. For instance, @EXECUTE@ is only relevant in the context of functions or mbeans; granting @EXECUTE@ on a resource representing a table is nonsensical. Attempting to @GRANT@ a permission on a resource to which it cannot be applied results in an error response. The following illustrates which permissions can be granted on which types of resource, and which statements are enabled by that permission.
 
 |_. permission |_. resource                   |_. operations        |
 | @CREATE@     | @ALL KEYSPACES@              |@CREATE KEYSPACE@ ==<br>== @CREATE TABLE@ in any keyspace|
@@ -1482,9 +1518,15 @@
 | @SELECT@     | @ALL KEYSPACES@              |@SELECT@ on any table|
 | @SELECT@     | @KEYSPACE@                   |@SELECT@ on any table in keyspace|
 | @SELECT@     | @TABLE@                      |@SELECT@ on specified table|
+| @SELECT@     | @ALL MBEANS@                 |Call getter methods on any mbean|
+| @SELECT@     | @MBEANS@                     |Call getter methods on any mbean matching a wildcard pattern|
+| @SELECT@     | @MBEAN@                      |Call getter methods on named mbean|
 | @MODIFY@     | @ALL KEYSPACES@              |@INSERT@ on any table ==<br>== @UPDATE@ on any table ==<br>== @DELETE@ on any table ==<br>== @TRUNCATE@ on any table|
-| @MODIFY@     | @KEYSPACE@                  |@INSERT@ on any table in keyspace ==<br>== @UPDATE@ on any table in keyspace ==<br>  == @DELETE@ on any table in keyspace ==<br>== @TRUNCATE@ on any table in keyspace
+| @MODIFY@     | @KEYSPACE@                   |@INSERT@ on any table in keyspace ==<br>== @UPDATE@ on any table in keyspace ==<br>  == @DELETE@ on any table in keyspace ==<br>== @TRUNCATE@ on any table in keyspace
 | @MODIFY@     | @TABLE@                      |@INSERT@ ==<br>== @UPDATE@ ==<br>== @DELETE@ ==<br>== @TRUNCATE@|
+| @MODIFY@     | @ALL MBEANS@                 |Call setter methods on any mbean|
+| @MODIFY@     | @MBEANS@                     |Call setter methods on any mbean matching a wildcard pattern|
+| @MODIFY@     | @MBEAN@                      |Call setter methods on named mbean|
 | @AUTHORIZE@  | @ALL KEYSPACES@              |@GRANT PERMISSION@ on any table ==<br>== @REVOKE PERMISSION@ on any table|
 | @AUTHORIZE@  | @KEYSPACE@                   |@GRANT PERMISSION@ on table in keyspace ==<br>== @REVOKE PERMISSION@ on table in keyspace|
 | @AUTHORIZE@  | @TABLE@                      |@GRANT PERMISSION@ ==<br>== @REVOKE PERMISSION@ |
@@ -1492,12 +1534,21 @@
 | @AUTHORIZE@  | @ALL FUNCTIONS IN KEYSPACE@  |@GRANT PERMISSION@ in keyspace ==<br>== @REVOKE PERMISSION@ in keyspace|
 | @AUTHORIZE@  | @ALL FUNCTIONS IN KEYSPACE@  |@GRANT PERMISSION@ in keyspace ==<br>== @REVOKE PERMISSION@ in keyspace|
 | @AUTHORIZE@  | @FUNCTION@                   |@GRANT PERMISSION@ ==<br>== @REVOKE PERMISSION@|
+| @AUTHORIZE@  | @ALL MBEANS@                 |@GRANT PERMISSION@ on any mbean ==<br>== @REVOKE PERMISSION@ on any mbean|
+| @AUTHORIZE@  | @MBEANS@                     |@GRANT PERMISSION@ on any mbean matching a wildcard pattern ==<br>== @REVOKE PERMISSION@ on any mbean matching a wildcard pattern|
+| @AUTHORIZE@  | @MBEAN@                      |@GRANT PERMISSION@ on named mbean ==<br>== @REVOKE PERMISSION@ on named mbean|
 | @AUTHORIZE@  | @ALL ROLES@                  |@GRANT ROLE@ grant any role ==<br>== @REVOKE ROLE@ revoke any role|
 | @AUTHORIZE@  | @ROLES@                      |@GRANT ROLE@ grant role ==<br>== @REVOKE ROLE@ revoke role|
 | @DESCRIBE@   | @ALL ROLES@                  |@LIST ROLES@ all roles or only roles granted to another, specified role|
+| @DESCRIBE@   | @ALL MBEANS@                 |Retrieve metadata about any mbean from the platform's MBeanServer|
+| @DESCRIBE@   | @MBEANS@                     |Retrieve metadata about any mbean matching a wildcard pattern from the platform's MBeanServer|
+| @DESCRIBE@   | @MBEAN@                      |Retrieve metadata about a named mbean from the platform's MBeanServer|
 | @EXECUTE@    | @ALL FUNCTIONS@              |@SELECT@, @INSERT@, @UPDATE@ using any function ==<br>== use of any function in @CREATE AGGREGATE@|
 | @EXECUTE@    | @ALL FUNCTIONS IN KEYSPACE@  |@SELECT@, @INSERT@, @UPDATE@ using any function in keyspace ==<br>== use of any function in keyspace in @CREATE AGGREGATE@|
 | @EXECUTE@    | @FUNCTION@                   |@SELECT@, @INSERT@, @UPDATE@ using function ==<br>== use of function in @CREATE AGGREGATE@|
+| @EXECUTE@    | @ALL MBEANS@                 |Execute operations on any mbean|
+| @EXECUTE@    | @MBEANS@                     |Execute operations on any mbean matching a wildcard pattern|
+| @EXECUTE@    | @MBEAN@                      |Execute operations on named mbean|
 
 
 h3(#grantPermissionsStmt). GRANT PERMISSION
@@ -1516,6 +1567,8 @@
              | ROLE <identifier>
              | ALL FUNCTIONS ( IN KEYSPACE <identifier> )?
              | FUNCTION <functionname>
+             | ALL MBEANS
+             | ( MBEAN | MBEANS ) <objectName>
 p. 
 
 __Sample:__ 
@@ -1570,7 +1623,9 @@
              | ROLE <identifier>
              | ALL FUNCTIONS ( IN KEYSPACE <identifier> )?
              | FUNCTION <functionname>
-p. 
+             | ALL MBEANS
+             | ( MBEAN | MBEANS ) <objectName>
+p.
 
 __Sample:__ 
 
@@ -1598,7 +1653,9 @@
              | ROLE <identifier>
              | ALL FUNCTIONS ( IN KEYSPACE <identifier> )?
              | FUNCTION <functionname>
-p. 
+             | ALL MBEANS
+             | ( MBEAN | MBEANS ) <objectName>
+p.
 
 __Sample:__
 
@@ -1619,7 +1676,9 @@
 
 h2(#types). Data Types
 
-CQL supports a rich set of data types for columns defined in a table, including collection types. On top of those native and collection types, users can also provide custom types (through a JAVA class extending @AbstractType@ loadable by Cassandra). The syntax of types is thus:
+CQL supports a rich set of data types for columns defined in a table, including collection types. On top of those native
+and collection types, users can also provide custom types (through a JAVA class extending @AbstractType@ loadable by
+Cassandra). The syntax of types is thus:
 
 bc(syntax).. 
 <type> ::= <native-type>
@@ -1866,6 +1925,37 @@
 
 CQL3 distinguishes between built-in functions (so called 'native functions') and "user-defined functions":#udfs.  CQL3 includes several native functions, described below:
 
+h3(#castFun). Cast
+
+The @cast@ function can be used to convert one native datatype to another.
+
+The following table describes the conversions supported by the @cast@ function. Cassandra will silently ignore any cast converting a datatype into its own datatype.
+
+|_. from    |_. to   |
+|@ascii@   |@text@, @varchar@                                                                                    |
+|@bigint@   |@tinyint@, @smallint@, @int@, @float@, @double@, @decimal@, @varint@, @text@, @varchar@             |
+|@boolean@  |@text@, @varchar@                                                                                   |
+|@counter@  |@tinyint@, @smallint@, @int@, @bigint@, @float@, @double@, @decimal@, @varint@, @text@, @varchar@   |
+|@date@      |@timestamp@                                                                                        |
+|@decimal@  |@tinyint@, @smallint@, @int@, @bigint@, @float@, @double@, @varint@, @text@, @varchar@              |
+|@double@   |@tinyint@, @smallint@, @int@, @bigint@, @float@, @decimal@, @varint@, @text@, @varchar@             |
+|@float@     |@tinyint@, @smallint@, @int@, @bigint@, @double@, @decimal@, @varint@, @text@, @varchar@           |
+|@inet@      |@text@, @varchar@                                                                                  |
+|@int@       |@tinyint@, @smallint@, @bigint@, @float@, @double@, @decimal@, @varint@, @text@, @varchar@         |
+|@smallint@ |@tinyint@, @int@, @bigint@, @float@, @double@, @decimal@, @varint@, @text@, @varchar@               |
+|@time@      |@text@, @varchar@                                                                                  |
+|@timestamp@|@date@, @text@, @varchar@                                                                           |
+|@timeuuid@ |@timestamp@, @date@, @text@, @varchar@                                                              |
+|@tinyint@  |@tinyint@, @smallint@, @int@, @bigint@, @float@, @double@, @decimal@, @varint@, @text@, @varchar@   |
+|@uuid@      |@text@, @varchar@                                                                                  |
+|@varint@   |@tinyint@, @smallint@, @int@, @bigint@, @float@, @double@, @decimal@, @text@, @varchar@             |
+
+
+The conversions rely strictly on Java's semantics. For example, the double value 1 will be converted to the text value '1.0'.
+
+bc(sample). 
+SELECT avg(cast(count as double)) FROM myTable
+
 h3(#tokenFun). Token
 
 The @token@ function allows to compute the token for a given partition key. The exact signature of the token function depends on the table concerned and of the partitioner used by the cluster.
@@ -2016,6 +2106,49 @@
 
 User-defined functions can be used in "@SELECT@":#selectStmt, "@INSERT@":#insertStmt and "@UPDATE@":#updateStmt statements.
 
+The implicitly available @udfContext@ field (or binding for script UDFs) provides the necessary functionality to create new UDT and tuple values.
+
+bc(sample). 
+CREATE TYPE custom_type (txt text, i int);
+CREATE FUNCTION fct_using_udt ( somearg int )
+  RETURNS NULL ON NULL INPUT
+  RETURNS custom_type
+  LANGUAGE java
+  AS $$
+    UDTValue udt = udfContext.newReturnUDTValue();
+    udt.setString("txt", "some string");
+    udt.setInt("i", 42);
+    return udt;
+  $$;
+
+The definition of the @UDFContext@ interface can be found in the Apache Cassandra source code for @org.apache.cassandra.cql3.functions.UDFContext@.
+
+bc(sample). 
+public interface UDFContext
+{
+    UDTValue newArgUDTValue(String argName);
+    UDTValue newArgUDTValue(int argNum);
+    UDTValue newReturnUDTValue();
+    UDTValue newUDTValue(String udtName);
+    TupleValue newArgTupleValue(String argName);
+    TupleValue newArgTupleValue(int argNum);
+    TupleValue newReturnTupleValue();
+    TupleValue newTupleValue(String cqlDefinition);
+}
+
+Java UDFs already have some imports for common interfaces and classes defined. Please note that these convenience imports are not available for script UDFs. These imports are:
+
+bc(sample). 
+import java.nio.ByteBuffer;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import org.apache.cassandra.cql3.functions.UDFContext;
+import com.datastax.driver.core.TypeCodec;
+import com.datastax.driver.core.TupleValue;
+import com.datastax.driver.core.UDTValue;
+
 See "@CREATE FUNCTION@":#createFunctionStmt and "@DROP FUNCTION@":#dropFunctionStmt.
 
 h2(#udas). User-Defined Aggregates
@@ -2280,6 +2413,18 @@
 
 The following describes the changes in each version of CQL.
 
+h3. 3.4.2
+
+* "@INSERT/UPDATE options@":#updateOptions for tables having a default_time_to_live specifying a TTL of 0 will remove the TTL from the inserted or updated values
+* "@ALTER TABLE@":#alterTableStmt @ADD@ and @DROP@ now allow mutiple columns to be added/removed
+* New "@PER PARTITION LIMIT@":#selectLimit option (see "CASSANDRA-7017":https://issues.apache.org/jira/browse/CASSANDRA-7017).
+* "User-defined functions":#udfs can now instantiate @UDTValue@ and @TupleValue@ instances via the new @UDFContext@ interface (see "CASSANDRA-10818":https://issues.apache.org/jira/browse/CASSANDRA-10818).
+* "User-defined types"#createTypeStmt may now be stored in a non-frozen form, allowing individual fields to be updated and deleted in "@UPDATE@ statements":#updateStmt and "@DELETE@ statements":#deleteStmt, respectively. ("CASSANDRA-7423":https://issues.apache.org/jira/browse/CASSANDRA-7423)
+
+h3. 3.4.1
+
+* Adds @CAST@ functions. See "@Cast@":#castFun.
+
 h3. 3.4.0
 
 * Support for "materialized views":#createMVStmt
diff --git a/doc/make.bat b/doc/make.bat
new file mode 100644
index 0000000..93671fc
--- /dev/null
+++ b/doc/make.bat
@@ -0,0 +1,281 @@
+@ECHO OFF

+

+REM Command file for Sphinx documentation

+

+if "%SPHINXBUILD%" == "" (

+	set SPHINXBUILD=sphinx-build

+)

+set BUILDDIR=build

+set ALLSPHINXOPTS=-d %BUILDDIR%/doctrees %SPHINXOPTS% .

+set I18NSPHINXOPTS=%SPHINXOPTS% .

+if NOT "%PAPER%" == "" (

+	set ALLSPHINXOPTS=-D latex_paper_size=%PAPER% %ALLSPHINXOPTS%

+	set I18NSPHINXOPTS=-D latex_paper_size=%PAPER% %I18NSPHINXOPTS%

+)

+

+if "%1" == "" goto help

+

+if "%1" == "help" (

+	:help

+	echo.Please use `make ^<target^>` where ^<target^> is one of

+	echo.  html       to make standalone HTML files

+	echo.  dirhtml    to make HTML files named index.html in directories

+	echo.  singlehtml to make a single large HTML file

+	echo.  pickle     to make pickle files

+	echo.  json       to make JSON files

+	echo.  htmlhelp   to make HTML files and a HTML help project

+	echo.  qthelp     to make HTML files and a qthelp project

+	echo.  devhelp    to make HTML files and a Devhelp project

+	echo.  epub       to make an epub

+	echo.  epub3      to make an epub3

+	echo.  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter

+	echo.  text       to make text files

+	echo.  man        to make manual pages

+	echo.  texinfo    to make Texinfo files

+	echo.  gettext    to make PO message catalogs

+	echo.  changes    to make an overview over all changed/added/deprecated items

+	echo.  xml        to make Docutils-native XML files

+	echo.  pseudoxml  to make pseudoxml-XML files for display purposes

+	echo.  linkcheck  to check all external links for integrity

+	echo.  doctest    to run all doctests embedded in the documentation if enabled

+	echo.  coverage   to run coverage check of the documentation if enabled

+	echo.  dummy      to check syntax errors of document sources

+	goto end

+)

+

+if "%1" == "clean" (

+	for /d %%i in (%BUILDDIR%\*) do rmdir /q /s %%i

+	del /q /s %BUILDDIR%\*

+	goto end

+)

+

+

+REM Check if sphinx-build is available and fallback to Python version if any

+%SPHINXBUILD% 1>NUL 2>NUL

+if errorlevel 9009 goto sphinx_python

+goto sphinx_ok

+

+:sphinx_python

+

+set SPHINXBUILD=python -m sphinx.__init__

+%SPHINXBUILD% 2> nul

+if errorlevel 9009 (

+	echo.

+	echo.The 'sphinx-build' command was not found. Make sure you have Sphinx

+	echo.installed, then set the SPHINXBUILD environment variable to point

+	echo.to the full path of the 'sphinx-build' executable. Alternatively you

+	echo.may add the Sphinx directory to PATH.

+	echo.

+	echo.If you don't have Sphinx installed, grab it from

+	echo.http://sphinx-doc.org/

+	exit /b 1

+)

+

+:sphinx_ok

+

+

+if "%1" == "html" (

+	%SPHINXBUILD% -b html %ALLSPHINXOPTS% %BUILDDIR%/html

+	if errorlevel 1 exit /b 1

+	echo.

+	echo.Build finished. The HTML pages are in %BUILDDIR%/html.

+	goto end

+)

+

+if "%1" == "dirhtml" (

+	%SPHINXBUILD% -b dirhtml %ALLSPHINXOPTS% %BUILDDIR%/dirhtml

+	if errorlevel 1 exit /b 1

+	echo.

+	echo.Build finished. The HTML pages are in %BUILDDIR%/dirhtml.

+	goto end

+)

+

+if "%1" == "singlehtml" (

+	%SPHINXBUILD% -b singlehtml %ALLSPHINXOPTS% %BUILDDIR%/singlehtml

+	if errorlevel 1 exit /b 1

+	echo.

+	echo.Build finished. The HTML pages are in %BUILDDIR%/singlehtml.

+	goto end

+)

+

+if "%1" == "pickle" (

+	%SPHINXBUILD% -b pickle %ALLSPHINXOPTS% %BUILDDIR%/pickle

+	if errorlevel 1 exit /b 1

+	echo.

+	echo.Build finished; now you can process the pickle files.

+	goto end

+)

+

+if "%1" == "json" (

+	%SPHINXBUILD% -b json %ALLSPHINXOPTS% %BUILDDIR%/json

+	if errorlevel 1 exit /b 1

+	echo.

+	echo.Build finished; now you can process the JSON files.

+	goto end

+)

+

+if "%1" == "htmlhelp" (

+	%SPHINXBUILD% -b htmlhelp %ALLSPHINXOPTS% %BUILDDIR%/htmlhelp

+	if errorlevel 1 exit /b 1

+	echo.

+	echo.Build finished; now you can run HTML Help Workshop with the ^

+.hhp project file in %BUILDDIR%/htmlhelp.

+	goto end

+)

+

+if "%1" == "qthelp" (

+	%SPHINXBUILD% -b qthelp %ALLSPHINXOPTS% %BUILDDIR%/qthelp

+	if errorlevel 1 exit /b 1

+	echo.

+	echo.Build finished; now you can run "qcollectiongenerator" with the ^

+.qhcp project file in %BUILDDIR%/qthelp, like this:

+	echo.^> qcollectiongenerator %BUILDDIR%\qthelp\Foo.qhcp

+	echo.To view the help file:

+	echo.^> assistant -collectionFile %BUILDDIR%\qthelp\Foo.ghc

+	goto end

+)

+

+if "%1" == "devhelp" (

+	%SPHINXBUILD% -b devhelp %ALLSPHINXOPTS% %BUILDDIR%/devhelp

+	if errorlevel 1 exit /b 1

+	echo.

+	echo.Build finished.

+	goto end

+)

+

+if "%1" == "epub" (

+	%SPHINXBUILD% -b epub %ALLSPHINXOPTS% %BUILDDIR%/epub

+	if errorlevel 1 exit /b 1

+	echo.

+	echo.Build finished. The epub file is in %BUILDDIR%/epub.

+	goto end

+)

+

+if "%1" == "epub3" (

+	%SPHINXBUILD% -b epub3 %ALLSPHINXOPTS% %BUILDDIR%/epub3

+	if errorlevel 1 exit /b 1

+	echo.

+	echo.Build finished. The epub3 file is in %BUILDDIR%/epub3.

+	goto end

+)

+

+if "%1" == "latex" (

+	%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex

+	if errorlevel 1 exit /b 1

+	echo.

+	echo.Build finished; the LaTeX files are in %BUILDDIR%/latex.

+	goto end

+)

+

+if "%1" == "latexpdf" (

+	%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex

+	cd %BUILDDIR%/latex

+	make all-pdf

+	cd %~dp0

+	echo.

+	echo.Build finished; the PDF files are in %BUILDDIR%/latex.

+	goto end

+)

+

+if "%1" == "latexpdfja" (

+	%SPHINXBUILD% -b latex %ALLSPHINXOPTS% %BUILDDIR%/latex

+	cd %BUILDDIR%/latex

+	make all-pdf-ja

+	cd %~dp0

+	echo.

+	echo.Build finished; the PDF files are in %BUILDDIR%/latex.

+	goto end

+)

+

+if "%1" == "text" (

+	%SPHINXBUILD% -b text %ALLSPHINXOPTS% %BUILDDIR%/text

+	if errorlevel 1 exit /b 1

+	echo.

+	echo.Build finished. The text files are in %BUILDDIR%/text.

+	goto end

+)

+

+if "%1" == "man" (

+	%SPHINXBUILD% -b man %ALLSPHINXOPTS% %BUILDDIR%/man

+	if errorlevel 1 exit /b 1

+	echo.

+	echo.Build finished. The manual pages are in %BUILDDIR%/man.

+	goto end

+)

+

+if "%1" == "texinfo" (

+	%SPHINXBUILD% -b texinfo %ALLSPHINXOPTS% %BUILDDIR%/texinfo

+	if errorlevel 1 exit /b 1

+	echo.

+	echo.Build finished. The Texinfo files are in %BUILDDIR%/texinfo.

+	goto end

+)

+

+if "%1" == "gettext" (

+	%SPHINXBUILD% -b gettext %I18NSPHINXOPTS% %BUILDDIR%/locale

+	if errorlevel 1 exit /b 1

+	echo.

+	echo.Build finished. The message catalogs are in %BUILDDIR%/locale.

+	goto end

+)

+

+if "%1" == "changes" (

+	%SPHINXBUILD% -b changes %ALLSPHINXOPTS% %BUILDDIR%/changes

+	if errorlevel 1 exit /b 1

+	echo.

+	echo.The overview file is in %BUILDDIR%/changes.

+	goto end

+)

+

+if "%1" == "linkcheck" (

+	%SPHINXBUILD% -b linkcheck %ALLSPHINXOPTS% %BUILDDIR%/linkcheck

+	if errorlevel 1 exit /b 1

+	echo.

+	echo.Link check complete; look for any errors in the above output ^

+or in %BUILDDIR%/linkcheck/output.txt.

+	goto end

+)

+

+if "%1" == "doctest" (

+	%SPHINXBUILD% -b doctest %ALLSPHINXOPTS% %BUILDDIR%/doctest

+	if errorlevel 1 exit /b 1

+	echo.

+	echo.Testing of doctests in the sources finished, look at the ^

+results in %BUILDDIR%/doctest/output.txt.

+	goto end

+)

+

+if "%1" == "coverage" (

+	%SPHINXBUILD% -b coverage %ALLSPHINXOPTS% %BUILDDIR%/coverage

+	if errorlevel 1 exit /b 1

+	echo.

+	echo.Testing of coverage in the sources finished, look at the ^

+results in %BUILDDIR%/coverage/python.txt.

+	goto end

+)

+

+if "%1" == "xml" (

+	%SPHINXBUILD% -b xml %ALLSPHINXOPTS% %BUILDDIR%/xml

+	if errorlevel 1 exit /b 1

+	echo.

+	echo.Build finished. The XML files are in %BUILDDIR%/xml.

+	goto end

+)

+

+if "%1" == "pseudoxml" (

+	%SPHINXBUILD% -b pseudoxml %ALLSPHINXOPTS% %BUILDDIR%/pseudoxml

+	if errorlevel 1 exit /b 1

+	echo.

+	echo.Build finished. The pseudo-XML files are in %BUILDDIR%/pseudoxml.

+	goto end

+)

+

+if "%1" == "dummy" (

+	%SPHINXBUILD% -b dummy %ALLSPHINXOPTS% %BUILDDIR%/dummy

+	if errorlevel 1 exit /b 1

+	echo.

+	echo.Build finished. Dummy builder generates no files.

+	goto end

+)

+

+:end

diff --git a/doc/source/_static/extra.css b/doc/source/_static/extra.css
new file mode 100644
index 0000000..b55515e
--- /dev/null
+++ b/doc/source/_static/extra.css
@@ -0,0 +1,43 @@
+div:not(.highlight) > pre {
+    background: #fff;
+    border: 1px solid #e1e4e5;
+    color: #404040;
+    margin: 1px 0 24px 0;
+    overflow-x: auto;
+    padding: 12px 12px;
+    font-size: 12px;
+}
+
+a.reference.internal code.literal {
+    border: none;
+    font-size: 12px;
+    color: #2980B9;
+    padding: 0;
+    background: none;
+}
+
+a.reference.internal:visited code.literal {
+    color: #9B59B6;
+    padding: 0;
+    background: none;
+}
+
+
+/* override table width restrictions */
+.wy-table-responsive table td, .wy-table-responsive table th {
+    white-space: normal;
+}
+
+.wy-table-responsive {
+    margin-bottom: 24px;
+    max-width: 100%;
+    overflow: visible;
+}
+
+table.contentstable {
+    margin: 0;
+}
+
+td.rightcolumn {
+    padding-left: 30px;
+}
diff --git a/doc/source/_templates/indexcontent.html b/doc/source/_templates/indexcontent.html
new file mode 100644
index 0000000..a71a7e9
--- /dev/null
+++ b/doc/source/_templates/indexcontent.html
@@ -0,0 +1,33 @@
+{% extends "defindex.html" %}
+{% block tables %}
+<p><strong>{% trans %}Main documentation parts:{% endtrans %}</strong></p>
+  <table class="contentstable" align="center"><tr>
+    <td width="50%">
+      <p class="biglink"><a class="biglink" href="{{ pathto("getting_started/index") }}">{% trans %}Getting started{% endtrans %}</a><br/>
+         <span class="linkdescr">{% trans %}Newbie friendly starting point{% endtrans %}</span></p>
+      <p class="biglink"><a class="biglink" href="{{ pathto("architecture/index") }}">{% trans %}Cassandra Architecture{% endtrans %}</a><br/>
+         <span class="linkdescr">{% trans %}Cassandra's big picture{% endtrans %}</span></p>
+      <p class="biglink"><a class="biglink" href="{{ pathto("data_modeling/index") }}">{% trans %}Data Modeling{% endtrans %}</a><br/>
+         <span class="linkdescr">{% trans %}Or how to make square pegs fit round holes{% endtrans %}</span></p>
+      <p class="biglink"><a class="biglink" href="{{ pathto("cql/index") }}">{% trans %}Cassandra Query Language{% endtrans %}</a><br/>
+         <span class="linkdescr">{% trans %}CQL reference documentation{% endtrans %}</span></p>
+      <p class="biglink"><a class="biglink" href="{{ pathto("configuration/index") }}">{% trans %}Configuration{% endtrans %}</a><br/>
+         <span class="linkdescr">{% trans %}Cassandra's handles and knobs{% endtrans %}</span></p>
+    </td><td width="50%" class="rightcolumn">
+      <p class="biglink"><a class="biglink" href="{{ pathto("operating/index") }}">{% trans %}Operating Cassandra{% endtrans %}</a><br/>
+         <span class="linkdescr">{% trans %}The operator's corner{% endtrans %}</span></p>
+      <p class="biglink"><a class="biglink" href="{{ pathto("tooling/index") }}">{% trans %}Cassandra's Tools{% endtrans %}</a><br/>
+         <span class="linkdescr">{% trans %}cqlsh, nodetool, ...{% endtrans %}</span></p>
+      <p class="biglink"><a class="biglink" href="{{ pathto("troubleshooting/index") }}">{% trans %}Troubleshooting{% endtrans %}</a><br/>
+         <span class="linkdescr">{% trans %}What to look for when you have a problem{% endtrans %}</span></p>
+      <p class="biglink"><a class="biglink" href="{{ pathto("faq/index") }}">{% trans %}FAQs{% endtrans %}</a><br/>
+         <span class="linkdescr">{% trans %}Frequently Asked Questions (with answers!){% endtrans %}</span></p>
+    </td></tr>
+  </table>
+
+<p><strong>{% trans %}Meta information:{% endtrans %}</strong></p>
+
+<p class="biglink"><a class="biglink" href="{{ pathto("bugs") }}">{% trans %}Reporting bugs{% endtrans %}</a></p>
+<p class="biglink"><a class="biglink" href="{{ pathto("contactus") }}">{% trans %}Contact us{% endtrans %}</a></p>
+
+{% endblock %}
diff --git a/doc/source/architecture/dynamo.rst b/doc/source/architecture/dynamo.rst
new file mode 100644
index 0000000..d146471
--- /dev/null
+++ b/doc/source/architecture/dynamo.rst
@@ -0,0 +1,137 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+Dynamo
+------
+
+Gossip
+^^^^^^
+
+.. todo:: todo
+
+Failure Detection
+^^^^^^^^^^^^^^^^^
+
+.. todo:: todo
+
+Token Ring/Ranges
+^^^^^^^^^^^^^^^^^
+
+.. todo:: todo
+
+.. _replication-strategy:
+
+Replication
+^^^^^^^^^^^
+
+The replication strategy of a keyspace determines which nodes are replicas for a given token range. The two main
+replication strategies are :ref:`simple-strategy` and :ref:`network-topology-strategy`.
+
+.. _simple-strategy:
+
+SimpleStrategy
+~~~~~~~~~~~~~~
+
+SimpleStrategy allows a single integer ``replication_factor`` to be defined. This determines the number of nodes that
+should contain a copy of each row.  For example, if ``replication_factor`` is 3, then three different nodes should store
+a copy of each row.
+
+SimpleStrategy treats all nodes identically, ignoring any configured datacenters or racks.  To determine the replicas
+for a token range, Cassandra iterates through the tokens in the ring, starting with the token range of interest.  For
+each token, it checks whether the owning node has been added to the set of replicas, and if it has not, it is added to
+the set.  This process continues until ``replication_factor`` distinct nodes have been added to the set of replicas.
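+
+For example, a keyspace using ``SimpleStrategy`` with three replicas could be defined as follows (the keyspace
+name is hypothetical)::
+
+    CREATE KEYSPACE test_ks
+        WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};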
+
+.. _network-topology-strategy:
+
+NetworkTopologyStrategy
+~~~~~~~~~~~~~~~~~~~~~~~
+
+NetworkTopologyStrategy allows a replication factor to be specified for each datacenter in the cluster.  Even if your
+cluster only uses a single datacenter, NetworkTopologyStrategy should be preferred over SimpleStrategy to make it easier
+to add new physical or virtual datacenters to the cluster later.
+
+In addition to allowing the replication factor to be specified per-DC, NetworkTopologyStrategy also attempts to choose
+replicas within a datacenter from different racks.  If the number of racks is greater than or equal to the replication
+factor for the DC, each replica will be chosen from a different rack.  Otherwise, each rack will hold at least one
+replica, but some racks may hold more than one. Note that this rack-aware behavior has some potentially `surprising
+implications <https://issues.apache.org/jira/browse/CASSANDRA-3810>`_.  For example, if the racks do not contain an
+equal number of nodes, the data load on the smallest rack may be much higher.  Similarly, if a single node is bootstrapped
+into a new rack, it will be considered a replica for the entire ring.  For this reason, many operators choose to
+configure all nodes on a single "rack".
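+
+For example, a keyspace replicated three times in each of two datacenters might be created as follows (the keyspace and
+datacenter names are illustrative; datacenter names must match those reported by the snitch)::
+
+    CREATE KEYSPACE demo_keyspace
+        WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};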
+
+Tunable Consistency
+^^^^^^^^^^^^^^^^^^^
+
+Cassandra supports a per-operation tradeoff between consistency and availability through *Consistency Levels*.
+Essentially, an operation's consistency level specifies how many of the replicas need to respond to the coordinator in
+order to consider the operation a success.
+
+The following consistency levels are available:
+
+``ONE``
+  Only a single replica must respond.
+
+``TWO``
+  Two replicas must respond.
+
+``THREE``
+  Three replicas must respond.
+
+``QUORUM``
+  A majority (n/2 + 1) of the replicas must respond.
+
+``ALL``
+  All of the replicas must respond.
+
+``LOCAL_QUORUM``
+  A majority of the replicas in the local datacenter (whichever datacenter the coordinator is in) must respond.
+
+``EACH_QUORUM``
+  A majority of the replicas in each datacenter must respond.
+
+``LOCAL_ONE``
+  Only a single replica must respond.  In a multi-datacenter cluster, this also guarantees that read requests are not
+  sent to replicas in a remote datacenter.
+
+``ANY``
+  A single replica may respond, or the coordinator may store a hint. If a hint is stored, the coordinator will later
+  attempt to replay the hint and deliver the mutation to the replicas.  This consistency level is only accepted for
+  write operations.
+
+Write operations are always sent to all replicas, regardless of consistency level. The consistency level simply
+controls how many responses the coordinator waits for before responding to the client.
+
+For read operations, the coordinator generally only issues read commands to enough replicas to satisfy the consistency
+level. There are a couple of exceptions to this:
+
+- Speculative retry may issue a redundant read request to an extra replica if the other replicas have not responded
+  within a specified time window.
+- Based on ``read_repair_chance`` and ``dclocal_read_repair_chance`` (part of a table's schema), read requests may be
+  randomly sent to all replicas in order to repair potentially inconsistent data.
+
+Picking Consistency Levels
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+It is common to pick read and write consistency levels that are high enough to overlap, resulting in "strong"
+consistency.  This is typically expressed as ``W + R > RF``, where ``W`` is the write consistency level, ``R`` is the
+read consistency level, and ``RF`` is the replication factor.  For example, if ``RF = 3``, a ``QUORUM`` request will
+require responses from at least two of the three replicas.  If ``QUORUM`` is used for both writes and reads, at least
+one of the replicas is guaranteed to participate in *both* the write and the read request, which in turn guarantees that
+the latest write will be read. In a multi-datacenter environment, ``LOCAL_QUORUM`` can be used to provide a weaker but
+still useful guarantee: reads are guaranteed to see the latest write from within the same datacenter.
+
+If this type of strong consistency isn't required, lower consistency levels like ``ONE`` may be used to improve
+throughput, latency, and availability.
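+
+As an illustrative sketch, the consistency level for subsequent requests can be set in ``cqlsh`` with the
+``CONSISTENCY`` command (the keyspace and table below are hypothetical)::
+
+    CONSISTENCY QUORUM;
+    -- With RF = 3, this write must be acknowledged by at least 2 replicas ...
+    INSERT INTO demo_keyspace.users (user_id, name) VALUES (uuid(), 'alice');
+    -- ... and this read must be answered by at least 2 replicas, so the two sets overlap (W + R > RF).
+    SELECT name FROM demo_keyspace.users;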
diff --git a/doc/source/architecture/guarantees.rst b/doc/source/architecture/guarantees.rst
new file mode 100644
index 0000000..c0b58d8
--- /dev/null
+++ b/doc/source/architecture/guarantees.rst
@@ -0,0 +1,20 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+Guarantees
+----------
+
+.. todo:: todo
diff --git a/doc/source/architecture/index.rst b/doc/source/architecture/index.rst
new file mode 100644
index 0000000..58eda13
--- /dev/null
+++ b/doc/source/architecture/index.rst
@@ -0,0 +1,29 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+Architecture
+============
+
+This section describes the general architecture of Apache Cassandra.
+
+.. toctree::
+   :maxdepth: 2
+
+   overview
+   dynamo
+   storage_engine
+   guarantees
+
diff --git a/doc/source/architecture/overview.rst b/doc/source/architecture/overview.rst
new file mode 100644
index 0000000..005b15b
--- /dev/null
+++ b/doc/source/architecture/overview.rst
@@ -0,0 +1,20 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+Overview
+--------
+
+.. todo:: todo
diff --git a/doc/source/architecture/storage_engine.rst b/doc/source/architecture/storage_engine.rst
new file mode 100644
index 0000000..e4114e5
--- /dev/null
+++ b/doc/source/architecture/storage_engine.rst
@@ -0,0 +1,82 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+Storage Engine
+--------------
+
+.. _commit-log:
+
+CommitLog
+^^^^^^^^^
+
+.. todo:: todo
+
+.. _memtables:
+
+Memtables
+^^^^^^^^^
+
+Memtables are in-memory structures where Cassandra buffers writes.  In general, there is one active memtable per table.
+Eventually, memtables are flushed onto disk and become immutable `SSTables`_.  This can be triggered in several
+ways:
+
+- The memory usage of the memtables exceeds the configured threshold  (see ``memtable_cleanup_threshold``)
+- The :ref:`commit-log` approaches its maximum size, and forces memtable flushes in order to allow commitlog segments to
+  be freed
+
+Memtables may be stored entirely on-heap or partially off-heap, depending on ``memtable_allocation_type``.
+
+SSTables
+^^^^^^^^
+
+SSTables are the immutable data files that Cassandra uses for persisting data on disk.
+
+As SSTables are flushed to disk from :ref:`memtables` or are streamed from other nodes, Cassandra triggers compactions
+which combine multiple SSTables into one.  Once the new SSTable has been written, the old SSTables can be removed.
+
+Each SSTable is comprised of multiple components stored in separate files:
+
+``Data.db``
+  The actual data, i.e. the contents of rows.
+
+``Index.db``
+  An index from partition keys to positions in the ``Data.db`` file.  For wide partitions, this may also include an
+  index to rows within a partition.
+
+``Summary.db``
+  A sampling of (by default) every 128th entry in the ``Index.db`` file.
+
+``Filter.db``
+  A Bloom Filter of the partition keys in the SSTable.
+
+``CompressionInfo.db``
+  Metadata about the offsets and lengths of compression chunks in the ``Data.db`` file.
+
+``Statistics.db``
+  Stores metadata about the SSTable, including information about timestamps, tombstones, clustering keys, compaction,
+  repair, compression, TTLs, and more.
+
+``Digest.crc32``
+  A CRC-32 digest of the ``Data.db`` file.
+
+``TOC.txt``
+  A plain text list of the component files for the SSTable.
+
+Within the ``Data.db`` file, rows are organized by partition.  These partitions are sorted in token order (i.e. by a
+hash of the partition key when the default partitioner, ``Murmur3Partitioner``, is used).  Within a partition, rows are
+stored in the order of their clustering keys.
+
+SSTables can be optionally compressed using block-based compression.
diff --git a/doc/source/bugs.rst b/doc/source/bugs.rst
new file mode 100644
index 0000000..90efb14
--- /dev/null
+++ b/doc/source/bugs.rst
@@ -0,0 +1,31 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+Reporting bugs and contributing
+===============================
+
+If you encounter a problem with Cassandra, the first places to ask for help are the :ref:`user mailing list
+<mailing-lists>` and the ``#cassandra`` :ref:`IRC channel <irc-channels>`.
+
+If, after having asked for help, you suspect that you have found a bug in Cassandra, you should report it by opening a
+ticket through the `Apache Cassandra JIRA <https://issues.apache.org/jira/browse/CASSANDRA>`__. Please provide as many
+details as you can about your problem, and don't forget to indicate which version of Cassandra you are running and in
+which environment.
+
+If you would like to contribute, please check `the section on contributing
+<https://wiki.apache.org/cassandra/HowToContribute>`__ on the Cassandra wiki. Please note that the source of this
+documentation is part of the Cassandra git repository and hence contributions to the documentation should follow the
+same path.
diff --git a/doc/source/conf.py b/doc/source/conf.py
new file mode 100644
index 0000000..9caf188
--- /dev/null
+++ b/doc/source/conf.py
@@ -0,0 +1,432 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#     http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+#
+# Apache Cassandra Documentation documentation build configuration file
+#
+# This file is execfile()d with the current directory set to its containing
+# dir.
+import re
+
+# Finds out the version (so we don't have to manually edit that file every
+# time we change the version)
+cassandra_build_file = '../../build.xml'
+with open(cassandra_build_file) as f:
+    m = re.search("name=\"base\.version\" value=\"([^\"]+)\"", f.read())
+    if not m or m.lastindex != 1:
+        raise RuntimeError("Problem finding version in build.xml file, this shouldn't happen.")
+    cassandra_version = m.group(1)
+
+
+
+# -- General configuration ------------------------------------------------
+
+# If your documentation needs a minimal Sphinx version, state it here.
+# needs_sphinx = '1.0'
+
+# Add any Sphinx extension module names here, as strings. They can be
+# extensions coming with Sphinx (named 'sphinx.ext.*') or your custom
+# ones.
+extensions = [
+    'sphinx.ext.todo',
+    'sphinx.ext.mathjax',
+    'sphinx.ext.ifconfig',
+]
+
+# Add any paths that contain templates here, relative to this directory.
+templates_path = ['_templates']
+
+# The suffix(es) of source filenames.
+# You can specify multiple suffix as a list of string:
+source_suffix = ['.rst']
+
+# The encoding of source files.
+#
+# source_encoding = 'utf-8-sig'
+
+# The master toctree document.
+master_doc = 'index'
+
+# General information about the project.
+project = u'Apache Cassandra'
+copyright = u'2016, The Apache Cassandra team'
+author = u'The Apache Cassandra team'
+
+# The version info for the project you're documenting, acts as replacement for
+# |version| and |release|, also used in various other places throughout the
+# built documents.
+version = cassandra_version
+
+# The language for content autogenerated by Sphinx. Refer to documentation
+# for a list of supported languages.
+#
+# This is also used if you do content translation via gettext catalogs.
+# Usually you set "language" from the command line for these cases.
+language = None
+
+# There are two options for replacing |today|: either, you set today to some
+# non-false value, then it is used:
+#
+# today = ''
+#
+# Else, today_fmt is used as the format for a strftime call.
+#
+# today_fmt = '%B %d, %Y'
+
+# List of patterns, relative to source directory, that match files and
+# directories to ignore when looking for source files.
+# This patterns also effect to html_static_path and html_extra_path
+exclude_patterns = []
+
+# The reST default role (used for this markup: `text`) to use for all
+# documents.
+#
+# default_role = None
+
+# If true, '()' will be appended to :func: etc. cross-reference text.
+#
+# add_function_parentheses = True
+
+# If true, the current module name will be prepended to all description
+# unit titles (such as .. function::).
+#
+# add_module_names = True
+
+# If true, sectionauthor and moduleauthor directives will be shown in the
+# output. They are ignored by default.
+#
+# show_authors = False
+
+# The name of the Pygments (syntax highlighting) style to use.
+pygments_style = 'sphinx'
+
+# A list of ignored prefixes for module index sorting.
+# modindex_common_prefix = []
+
+# If true, keep warnings as "system message" paragraphs in the built documents.
+# keep_warnings = False
+
+# If true, `todo` and `todoList` produce output, else they produce nothing.
+todo_include_todos = True
+
+
+# -- Options for HTML output ----------------------------------------------
+
+# The theme to use for HTML and HTML Help pages.  See the documentation for
+# a list of builtin themes.
+#
+html_theme = 'sphinx_rtd_theme'
+
+html_context = { 'extra_css_files': [ '_static/extra.css' ] }
+
+# Theme options are theme-specific and customize the look and feel of a theme
+# further.  For a list of options available for each theme, see the
+# documentation.
+#
+# html_theme_options = {}
+
+# Add any paths that contain custom themes here, relative to this directory.
+#html_theme_path = ['.']
+
+# The name for this set of Sphinx documents.
+# "<project> v<release> documentation" by default.
+#
+html_title = u'Apache Cassandra Documentation v%s' % version
+
+# A shorter title for the navigation bar.  Default is the same as html_title.
+#
+# html_short_title = None
+
+# The name of an image file (relative to this directory) to place at the top
+# of the sidebar.
+#
+# html_logo = None
+
+# The name of an image file (relative to this directory) to use as a favicon of
+# the docs.  This file should be a Windows icon file (.ico) being 16x16 or 32x32
+# pixels large.
+#
+# html_favicon = None
+
+# Add any paths that contain custom static files (such as style sheets) here,
+# relative to this directory. They are copied after the builtin static files,
+# so a file named "default.css" will overwrite the builtin "default.css".
+html_static_path = ['_static']
+
+# Add any extra paths that contain custom files (such as robots.txt or
+# .htaccess) here, relative to this directory. These files are copied
+# directly to the root of the documentation.
+#
+# html_extra_path = []
+
+# If not None, a 'Last updated on:' timestamp is inserted at every page
+# bottom, using the given strftime format.
+# The empty string is equivalent to '%b %d, %Y'.
+#
+# html_last_updated_fmt = None
+
+# If true, SmartyPants will be used to convert quotes and dashes to
+# typographically correct entities.
+#
+# html_use_smartypants = True
+
+# Custom sidebar templates, maps document names to template names.
+#
+# html_sidebars = {}
+
+# Additional templates that should be rendered to pages, maps page names to
+# template names.
+#
+html_additional_pages = {
+        'index': 'indexcontent.html'
+}
+
+# If false, no module index is generated.
+#
+# html_domain_indices = True
+
+# If false, no index is generated.
+#
+# html_use_index = True
+
+# If true, the index is split into individual pages for each letter.
+#
+# html_split_index = False
+
+# If true, links to the reST sources are added to the pages.
+#
+# html_show_sourcelink = True
+
+# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
+#
+# html_show_sphinx = True
+
+# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
+#
+# html_show_copyright = True
+
+# If true, an OpenSearch description file will be output, and all pages will
+# contain a <link> tag referring to it.  The value of this option must be the
+# base URL from which the finished HTML is served.
+#
+# html_use_opensearch = ''
+
+# This is the file name suffix for HTML files (e.g. ".xhtml").
+# html_file_suffix = None
+
+# Language to be used for generating the HTML full-text search index.
+# Sphinx supports the following languages:
+#   'da', 'de', 'en', 'es', 'fi', 'fr', 'hu', 'it', 'ja'
+#   'nl', 'no', 'pt', 'ro', 'ru', 'sv', 'tr', 'zh'
+#
+# html_search_language = 'en'
+
+# A dictionary with options for the search language support, empty by default.
+# 'ja' uses this config value.
+# 'zh' user can custom change `jieba` dictionary path.
+#
+# html_search_options = {'type': 'default'}
+
+# The name of a javascript file (relative to the configuration directory) that
+# implements a search results scorer. If empty, the default will be used.
+#
+# html_search_scorer = 'scorer.js'
+
+# Output file base name for HTML help builder.
+htmlhelp_basename = 'ApacheCassandraDocumentationdoc'
+
+# -- Options for LaTeX output ---------------------------------------------
+
+latex_elements = {
+     # The paper size ('letterpaper' or 'a4paper').
+     #
+     # 'papersize': 'letterpaper',
+
+     # The font size ('10pt', '11pt' or '12pt').
+     #
+     # 'pointsize': '10pt',
+
+     # Additional stuff for the LaTeX preamble.
+     #
+     # 'preamble': '',
+
+     # Latex figure (float) alignment
+     #
+     # 'figure_align': 'htbp',
+}
+
+# Grouping the document tree into LaTeX files. List of tuples
+# (source start file, target name, title,
+#  author, documentclass [howto, manual, or own class]).
+latex_documents = [
+    (master_doc, 'ApacheCassandra.tex', u'Apache Cassandra Documentation',
+     u'The Apache Cassandra team', 'manual'),
+]
+
+# The name of an image file (relative to this directory) to place at the top of
+# the title page.
+#
+# latex_logo = None
+
+# For "manual" documents, if this is true, then toplevel headings are parts,
+# not chapters.
+#
+# latex_use_parts = False
+
+# If true, show page references after internal links.
+#
+# latex_show_pagerefs = False
+
+# If true, show URL addresses after external links.
+#
+# latex_show_urls = False
+
+# Documents to append as an appendix to all manuals.
+#
+# latex_appendices = []
+
+# If false, no module index is generated.
+#
+# latex_domain_indices = True
+
+
+# -- Options for manual page output ---------------------------------------
+
+# One entry per manual page. List of tuples
+# (source start file, name, description, authors, manual section).
+man_pages = [
+    (master_doc, 'apachecassandra', u'Apache Cassandra Documentation',
+     [author], 1)
+]
+
+# If true, show URL addresses after external links.
+#
+# man_show_urls = False
+
+
+# -- Options for Texinfo output -------------------------------------------
+
+# Grouping the document tree into Texinfo files. List of tuples
+# (source start file, target name, title, author,
+#  dir menu entry, description, category)
+texinfo_documents = [
+    (master_doc, 'ApacheCassandra', u'Apache Cassandra Documentation',
+     author, 'ApacheCassandraDocumentation', 'One line description of project.',
+     'Miscellaneous'),
+]
+
+# Documents to append as an appendix to all manuals.
+#
+# texinfo_appendices = []
+
+# If false, no module index is generated.
+#
+# texinfo_domain_indices = True
+
+# How to display URL addresses: 'footnote', 'no', or 'inline'.
+#
+# texinfo_show_urls = 'footnote'
+
+# If true, do not generate a @detailmenu in the "Top" node's menu.
+#
+# texinfo_no_detailmenu = False
+
+
+# -- Options for Epub output ----------------------------------------------
+
+# Bibliographic Dublin Core info.
+epub_title = project
+epub_author = author
+epub_publisher = author
+epub_copyright = copyright
+
+# The basename for the epub file. It defaults to the project name.
+# epub_basename = project
+
+# The HTML theme for the epub output. Since the default themes are not
+# optimized for small screen space, using the same theme for HTML and epub
+# output is usually not wise. This defaults to 'epub', a theme designed to save
+# visual space.
+#
+# epub_theme = 'epub'
+
+# The language of the text. It defaults to the language option
+# or 'en' if the language is not set.
+#
+# epub_language = ''
+
+# The scheme of the identifier. Typical schemes are ISBN or URL.
+# epub_scheme = ''
+
+# The unique identifier of the text. This can be a ISBN number
+# or the project homepage.
+#
+# epub_identifier = ''
+
+# A unique identification for the text.
+#
+# epub_uid = ''
+
+# A tuple containing the cover image and cover page html template filenames.
+#
+# epub_cover = ()
+
+# A sequence of (type, uri, title) tuples for the guide element of content.opf.
+#
+# epub_guide = ()
+
+# HTML files that should be inserted before the pages created by sphinx.
+# The format is a list of tuples containing the path and title.
+#
+# epub_pre_files = []
+
+# HTML files that should be inserted after the pages created by sphinx.
+# The format is a list of tuples containing the path and title.
+#
+# epub_post_files = []
+
+# A list of files that should not be packed into the epub file.
+epub_exclude_files = ['search.html']
+
+# The depth of the table of contents in toc.ncx.
+#
+# epub_tocdepth = 3
+
+# Allow duplicate toc entries.
+#
+# epub_tocdup = True
+
+# Choose between 'default' and 'includehidden'.
+#
+# epub_tocscope = 'default'
+
+# Fix unsupported image types using the Pillow.
+#
+# epub_fix_images = False
+
+# Scale large images.
+#
+# epub_max_image_width = 0
+
+# How to display URL addresses: 'footnote', 'no', or 'inline'.
+#
+# epub_show_urls = 'inline'
+
+# If false, no index is generated.
+#
+# epub_use_index = True
diff --git a/doc/source/configuration/index.rst b/doc/source/configuration/index.rst
new file mode 100644
index 0000000..f774fda
--- /dev/null
+++ b/doc/source/configuration/index.rst
@@ -0,0 +1,25 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+Configuring Cassandra
+=====================
+
+This section describes how to configure Apache Cassandra.
+
+.. toctree::
+   :maxdepth: 1
+
+   cassandra_config_file
diff --git a/doc/source/contactus.rst b/doc/source/contactus.rst
new file mode 100644
index 0000000..8d0f5dd
--- /dev/null
+++ b/doc/source/contactus.rst
@@ -0,0 +1,53 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+Contact us
+==========
+
+You can get in touch with the Cassandra community either via the mailing lists or the freenode IRC channels.
+
+.. _mailing-lists:
+
+Mailing lists
+-------------
+
+The following mailing lists are available:
+
+- `Users <http://www.mail-archive.com/user@cassandra.apache.org/>`__ – General discussion list for users - `Subscribe
+  <user-subscribe@cassandra.apache.org>`__
+- `Developers <http://www.mail-archive.com/dev@cassandra.apache.org/>`__ – Development related discussion - `Subscribe
+  <dev-subscribe@cassandra.apache.org>`__
+- `Commits <http://www.mail-archive.com/commits@cassandra.apache.org/>`__ – Notifications of commits to the source
+  repository -
+  `Subscribe <commits-subscribe@cassandra.apache.org>`__
+- `Client Libraries <http://www.mail-archive.com/client-dev@cassandra.apache.org/>`__ – Discussion related to the
+  development of idiomatic client APIs - `Subscribe <client-dev-subscribe@cassandra.apache.org>`__
+
+Subscribe by sending an email to the email address in the Subscribe links above. Follow the instructions in the welcome
+email to confirm your subscription. Make sure to keep the welcome email as it contains instructions on how to
+unsubscribe.
+
+.. _irc-channels:
+
+IRC
+---
+
+To chat with developers or users in real-time, join our channels on `IRC freenode <http://webchat.freenode.net/>`__. The
+following channels are available:
+
+- ``#cassandra`` - for user questions and general discussions.
+- ``#cassandra-dev`` - strictly for questions or discussions related to Cassandra development.
+- ``#cassandra-builds`` - results of automated test builds.
+
diff --git a/doc/source/cql/appendices.rst b/doc/source/cql/appendices.rst
new file mode 100644
index 0000000..c4bb839
--- /dev/null
+++ b/doc/source/cql/appendices.rst
@@ -0,0 +1,308 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: sql
+
+Appendices
+----------
+
+.. _appendix-A:
+
+Appendix A: CQL Keywords
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+CQL distinguishes between *reserved* and *non-reserved* keywords.
+Reserved keywords cannot be used as identifiers; they are truly reserved
+for the language (but one can enclose a reserved keyword in
+double-quotes to use it as an identifier). Non-reserved keywords only
+have a specific meaning in certain contexts but can be used as
+identifiers otherwise. The only *raison d’être* of these non-reserved
+keywords is convenience: some keywords are non-reserved when it was
+always easy for the parser to decide whether they were used as keywords
+or not.
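+
+For instance (the table below is purely illustrative), the reserved keyword ``order`` can only be used as a column name
+if it is double-quoted::
+
+    CREATE TABLE purchases (
+        id uuid PRIMARY KEY,
+        "order" text   -- ORDER is reserved, so it must be double-quoted to be used as an identifier
+    );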
+
++--------------------+-------------+
+| Keyword            | Reserved?   |
++====================+=============+
+| ``ADD``            | yes         |
++--------------------+-------------+
+| ``AGGREGATE``      | no          |
++--------------------+-------------+
+| ``ALL``            | no          |
++--------------------+-------------+
+| ``ALLOW``          | yes         |
++--------------------+-------------+
+| ``ALTER``          | yes         |
++--------------------+-------------+
+| ``AND``            | yes         |
++--------------------+-------------+
+| ``APPLY``          | yes         |
++--------------------+-------------+
+| ``AS``             | no          |
++--------------------+-------------+
+| ``ASC``            | yes         |
++--------------------+-------------+
+| ``ASCII``          | no          |
++--------------------+-------------+
+| ``AUTHORIZE``      | yes         |
++--------------------+-------------+
+| ``BATCH``          | yes         |
++--------------------+-------------+
+| ``BEGIN``          | yes         |
++--------------------+-------------+
+| ``BIGINT``         | no          |
++--------------------+-------------+
+| ``BLOB``           | no          |
++--------------------+-------------+
+| ``BOOLEAN``        | no          |
++--------------------+-------------+
+| ``BY``             | yes         |
++--------------------+-------------+
+| ``CALLED``         | no          |
++--------------------+-------------+
+| ``CLUSTERING``     | no          |
++--------------------+-------------+
+| ``COLUMNFAMILY``   | yes         |
++--------------------+-------------+
+| ``COMPACT``        | no          |
++--------------------+-------------+
+| ``CONTAINS``       | no          |
++--------------------+-------------+
+| ``COUNT``          | no          |
++--------------------+-------------+
+| ``COUNTER``        | no          |
++--------------------+-------------+
+| ``CREATE``         | yes         |
++--------------------+-------------+
+| ``CUSTOM``         | no          |
++--------------------+-------------+
+| ``DATE``           | no          |
++--------------------+-------------+
+| ``DECIMAL``        | no          |
++--------------------+-------------+
+| ``DELETE``         | yes         |
++--------------------+-------------+
+| ``DESC``           | yes         |
++--------------------+-------------+
+| ``DESCRIBE``       | yes         |
++--------------------+-------------+
+| ``DISTINCT``       | no          |
++--------------------+-------------+
+| ``DOUBLE``         | no          |
++--------------------+-------------+
+| ``DROP``           | yes         |
++--------------------+-------------+
+| ``ENTRIES``        | yes         |
++--------------------+-------------+
+| ``EXECUTE``        | yes         |
++--------------------+-------------+
+| ``EXISTS``         | no          |
++--------------------+-------------+
+| ``FILTERING``      | no          |
++--------------------+-------------+
+| ``FINALFUNC``      | no          |
++--------------------+-------------+
+| ``FLOAT``          | no          |
++--------------------+-------------+
+| ``FROM``           | yes         |
++--------------------+-------------+
+| ``FROZEN``         | no          |
++--------------------+-------------+
+| ``FULL``           | yes         |
++--------------------+-------------+
+| ``FUNCTION``       | no          |
++--------------------+-------------+
+| ``FUNCTIONS``      | no          |
++--------------------+-------------+
+| ``GRANT``          | yes         |
++--------------------+-------------+
+| ``IF``             | yes         |
++--------------------+-------------+
+| ``IN``             | yes         |
++--------------------+-------------+
+| ``INDEX``          | yes         |
++--------------------+-------------+
+| ``INET``           | no          |
++--------------------+-------------+
+| ``INFINITY``       | yes         |
++--------------------+-------------+
+| ``INITCOND``       | no          |
++--------------------+-------------+
+| ``INPUT``          | no          |
++--------------------+-------------+
+| ``INSERT``         | yes         |
++--------------------+-------------+
+| ``INT``            | no          |
++--------------------+-------------+
+| ``INTO``           | yes         |
++--------------------+-------------+
+| ``JSON``           | no          |
++--------------------+-------------+
+| ``KEY``            | no          |
++--------------------+-------------+
+| ``KEYS``           | no          |
++--------------------+-------------+
+| ``KEYSPACE``       | yes         |
++--------------------+-------------+
+| ``KEYSPACES``      | no          |
++--------------------+-------------+
+| ``LANGUAGE``       | no          |
++--------------------+-------------+
+| ``LIMIT``          | yes         |
++--------------------+-------------+
+| ``LIST``           | no          |
++--------------------+-------------+
+| ``LOGIN``          | no          |
++--------------------+-------------+
+| ``MAP``            | no          |
++--------------------+-------------+
+| ``MODIFY``         | yes         |
++--------------------+-------------+
+| ``NAN``            | yes         |
++--------------------+-------------+
+| ``NOLOGIN``        | no          |
++--------------------+-------------+
+| ``NORECURSIVE``    | yes         |
++--------------------+-------------+
+| ``NOSUPERUSER``    | no          |
++--------------------+-------------+
+| ``NOT``            | yes         |
++--------------------+-------------+
+| ``NULL``           | yes         |
++--------------------+-------------+
+| ``OF``             | yes         |
++--------------------+-------------+
+| ``ON``             | yes         |
++--------------------+-------------+
+| ``OPTIONS``        | no          |
++--------------------+-------------+
+| ``OR``             | yes         |
++--------------------+-------------+
+| ``ORDER``          | yes         |
++--------------------+-------------+
+| ``PASSWORD``       | no          |
++--------------------+-------------+
+| ``PERMISSION``     | no          |
++--------------------+-------------+
+| ``PERMISSIONS``    | no          |
++--------------------+-------------+
+| ``PRIMARY``        | yes         |
++--------------------+-------------+
+| ``RENAME``         | yes         |
++--------------------+-------------+
+| ``REPLACE``        | yes         |
++--------------------+-------------+
+| ``RETURNS``        | no          |
++--------------------+-------------+
+| ``REVOKE``         | yes         |
++--------------------+-------------+
+| ``ROLE``           | no          |
++--------------------+-------------+
+| ``ROLES``          | no          |
++--------------------+-------------+
+| ``SCHEMA``         | yes         |
++--------------------+-------------+
+| ``SELECT``         | yes         |
++--------------------+-------------+
+| ``SET``            | yes         |
++--------------------+-------------+
+| ``SFUNC``          | no          |
++--------------------+-------------+
+| ``SMALLINT``       | no          |
++--------------------+-------------+
+| ``STATIC``         | no          |
++--------------------+-------------+
+| ``STORAGE``        | no          |
++--------------------+-------------+
+| ``STYPE``          | no          |
++--------------------+-------------+
+| ``SUPERUSER``      | no          |
++--------------------+-------------+
+| ``TABLE``          | yes         |
++--------------------+-------------+
+| ``TEXT``           | no          |
++--------------------+-------------+
+| ``TIME``           | no          |
++--------------------+-------------+
+| ``TIMESTAMP``      | no          |
++--------------------+-------------+
+| ``TIMEUUID``       | no          |
++--------------------+-------------+
+| ``TINYINT``        | no          |
++--------------------+-------------+
+| ``TO``             | yes         |
++--------------------+-------------+
+| ``TOKEN``          | yes         |
++--------------------+-------------+
+| ``TRIGGER``        | no          |
++--------------------+-------------+
+| ``TRUNCATE``       | yes         |
++--------------------+-------------+
+| ``TTL``            | no          |
++--------------------+-------------+
+| ``TUPLE``          | no          |
++--------------------+-------------+
+| ``TYPE``           | no          |
++--------------------+-------------+
+| ``UNLOGGED``       | yes         |
++--------------------+-------------+
+| ``UPDATE``         | yes         |
++--------------------+-------------+
+| ``USE``            | yes         |
++--------------------+-------------+
+| ``USER``           | no          |
++--------------------+-------------+
+| ``USERS``          | no          |
++--------------------+-------------+
+| ``USING``          | yes         |
++--------------------+-------------+
+| ``UUID``           | no          |
++--------------------+-------------+
+| ``VALUES``         | no          |
++--------------------+-------------+
+| ``VARCHAR``        | no          |
++--------------------+-------------+
+| ``VARINT``         | no          |
++--------------------+-------------+
+| ``WHERE``          | yes         |
++--------------------+-------------+
+| ``WITH``           | yes         |
++--------------------+-------------+
+| ``WRITETIME``      | no          |
++--------------------+-------------+
+
+Appendix B: CQL Reserved Types
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The following type names are not currently used by CQL, but are reserved
+for potential future use. User-defined types may not use reserved type
+names as their name.
+
++-----------------+
+| type            |
++=================+
+| ``bitstring``   |
++-----------------+
+| ``byte``        |
++-----------------+
+| ``complex``     |
++-----------------+
+| ``enum``        |
++-----------------+
+| ``interval``    |
++-----------------+
+| ``macaddr``     |
++-----------------+
diff --git a/doc/source/cql/changes.rst b/doc/source/cql/changes.rst
new file mode 100644
index 0000000..263df13
--- /dev/null
+++ b/doc/source/cql/changes.rst
@@ -0,0 +1,189 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: sql
+
+Changes
+-------
+
+The following describes the changes in each version of CQL.
+
+3.4.2
+^^^^^
+
+- If a table has a non zero ``default_time_to_live``, then explicitly specifying a TTL of 0 in an ``INSERT`` or
+  ``UPDATE`` statement will result in the new writes not having any expiration (that is, an explicit TTL of 0 cancels
+  the ``default_time_to_live``). This wasn't the case before and the ``default_time_to_live`` was applied even though a
+  TTL had been explicitly set.
+- ``ALTER TABLE`` ``ADD`` and ``DROP`` now allow multiple columns to be added/removed.
+- New ``PER PARTITION LIMIT`` option for ``SELECT`` statements (see `CASSANDRA-7017
+  <https://issues.apache.org/jira/browse/CASSANDRA-7017>`__ and the example after this list).
+- :ref:`User-defined functions <cql-functions>` can now instantiate ``UDTValue`` and ``TupleValue`` instances via the
+  new ``UDFContext`` interface (see `CASSANDRA-10818 <https://issues.apache.org/jira/browse/CASSANDRA-10818>`__).
+- :ref:`User-defined types <udts>` may now be stored in a non-frozen form, allowing individual fields to be updated and
+  deleted in ``UPDATE`` statements and ``DELETE`` statements, respectively (`CASSANDRA-7423
+  <https://issues.apache.org/jira/browse/CASSANDRA-7423>`__).
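+
+For instance, the new ``PER PARTITION LIMIT`` option mentioned above can be used as follows (the table name is
+illustrative)::
+
+    -- Return at most 2 rows from each partition of the events table
+    SELECT * FROM events PER PARTITION LIMIT 2;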
+
+3.4.1
+^^^^^
+
+- Adds ``CAST`` functions.
+
+3.4.0
+^^^^^
+
+- Support for :ref:`materialized views <materialized-views>`.
+- ``DELETE`` support for inequality expressions and ``IN`` restrictions on any primary key columns.
+- ``UPDATE`` support for ``IN`` restrictions on any primary key columns.
+
+3.3.1
+^^^^^
+
+- The syntax ``TRUNCATE TABLE X`` is now accepted as an alias for ``TRUNCATE X``.
+
+3.3.0
+^^^^^
+
+- :ref:`User-defined functions and aggregates <cql-functions>` are now supported.
+- Allows double-dollar enclosed string literals as an alternative to single-quote enclosed strings (see the example
+  after this list).
+- Introduces Roles to supersede user-based authentication and access control.
+- New ``date``, ``time``, ``tinyint`` and ``smallint`` :ref:`data types <data-types>` have been added.
+- :ref:`JSON support <cql-json>` has been added.
+- Adds new time conversion functions and deprecates ``dateOf`` and ``unixTimestampOf``.
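+
+As an example of the double-dollar string syntax mentioned above (the table and column are illustrative), single quotes
+inside the literal no longer need to be escaped::
+
+    INSERT INTO quotes (id, body) VALUES (1, $$A string with 'single quotes' that don't need escaping$$);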
+
+3.2.0
+^^^^^
+
+- :ref:`User-defined types <udts>` supported.
+- ``CREATE INDEX`` now supports indexing collection columns, including indexing the keys of map collections through the
+  ``keys()`` function.
+- Indexes on collections may be queried using the new ``CONTAINS`` and ``CONTAINS KEY`` operators.
+- :ref:`Tuple types <tuples>` were added to hold fixed-length sets of typed positional fields.
+- ``DROP INDEX`` now supports optionally specifying a keyspace.
+
+3.1.7
+^^^^^
+
+- ``SELECT`` statements now support selecting multiple rows in a single partition using an ``IN`` clause on combinations
+  of clustering columns.
+- ``IF NOT EXISTS`` and ``IF EXISTS`` syntax is now supported by ``CREATE USER`` and ``DROP USER`` statements,
+  respectively.
+
+3.1.6
+^^^^^
+
+- A new ``uuid()`` method has been added.
+- Support for ``DELETE ... IF EXISTS`` syntax.
+
+3.1.5
+^^^^^
+
+- It is now possible to group clustering columns in a relation, see :ref:`WHERE <where-clause>` clauses.
+- Added support for :ref:`static columns <static-columns>`.
+
+3.1.4
+^^^^^
+
+- ``CREATE INDEX`` now allows specifying options when creating CUSTOM indexes.
+
+3.1.3
+^^^^^
+
+- Millisecond precision formats have been added to the :ref:`timestamp <timestamps>` parser.
+
+3.1.2
+^^^^^
+
+- ``NaN`` and ``Infinity`` have been added as valid float constants. They are now reserved keywords. In the unlikely
+  case you were using them as a column identifier (or keyspace/table one), you will now need to double quote them.
+
+3.1.1
+^^^^^
+
+- ``SELECT`` statement now allows listing the partition keys (using the ``DISTINCT`` modifier). See `CASSANDRA-4536
+  <https://issues.apache.org/jira/browse/CASSANDRA-4536>`__.
+- The syntax ``c IN ?`` is now supported in ``WHERE`` clauses. In that case, the value expected for the bind variable
+  will be a list of whatever type ``c`` is.
+- It is now possible to use named bind variables (using ``:name`` instead of ``?``).
+
+3.1.0
+^^^^^
+
+- ``ALTER TABLE`` ``DROP`` option added.
+- ``SELECT`` statement now supports aliases in select clause. Aliases in WHERE and ORDER BY clauses are not supported.
+- ``CREATE`` statements for ``KEYSPACE``, ``TABLE`` and ``INDEX`` now support an ``IF NOT EXISTS`` condition.
+  Similarly, ``DROP`` statements support an ``IF EXISTS`` condition.
+- ``INSERT`` statements optionally support an ``IF NOT EXISTS`` condition and ``UPDATE`` supports ``IF`` conditions.
+
+3.0.5
+^^^^^
+
+- ``SELECT``, ``UPDATE``, and ``DELETE`` statements now allow empty ``IN`` relations (see `CASSANDRA-5626
+  <https://issues.apache.org/jira/browse/CASSANDRA-5626>`__).
+
+3.0.4
+^^^^^
+
+- Updated the syntax for custom :ref:`secondary indexes <secondary-indexes>`.
+- Non-equality conditions on the partition key are now never supported, even for ordering partitioners, as this was not
+  correct (the order was **not** the one of the type of the partition key). Instead, the ``token`` method should always
+  be used for range queries on the partition key (see :ref:`WHERE clauses <where-clause>`).
+
+3.0.3
+^^^^^
+
+- Support for custom :ref:`secondary indexes <secondary-indexes>` has been added.
+
+3.0.2
+^^^^^
+
+- Type validation for the :ref:`constants <constants>` has been fixed. For instance, the implementation used to allow
+  ``'2'`` as a valid value for an ``int`` column (interpreting it as the equivalent of ``2``), or ``42`` as a valid
+  ``blob`` value (in which case ``42`` was interpreted as a hexadecimal representation of the blob). This is no longer
+  the case, and type validation of constants is now stricter. See the :ref:`data types <data-types>` section for details
+  on which constants are allowed for which types.
+- The type validation fix of the previous point has led to the introduction of blob constants to allow the input of
+  blobs. Do note that while the input of blobs as string constants is still supported by this version (to allow a
+  smoother transition to blob constants), it is now deprecated and will be removed by a future version. If you were
+  using strings as blobs, you should thus update your client code ASAP to switch to blob constants.
+- A number of functions to convert native types to blobs have also been introduced. Furthermore, the token function is
+  now also allowed in select clauses. See the :ref:`section on functions <cql-functions>` for details.
+
+3.0.1
+^^^^^
+
+- Date strings (and timestamps) are no longer accepted as valid ``timeuuid`` values. Doing so was a bug in the sense
+  that date strings are not valid ``timeuuid`` values, and it was thus resulting in `confusing behaviors
+  <https://issues.apache.org/jira/browse/CASSANDRA-4936>`__. However, the following new methods have been added to help
+  working with ``timeuuid``: ``now``, ``minTimeuuid``, ``maxTimeuuid``,
+  ``dateOf`` and ``unixTimestampOf``.
+- Float constants now support the exponent notation. In other words, ``4.2E10`` is now a valid floating point value.
+
+Versioning
+^^^^^^^^^^
+
+Versioning of the CQL language adheres to the `Semantic Versioning <http://semver.org>`__ guidelines. Versions take the
+form X.Y.Z where X, Y, and Z are integer values representing major, minor, and patch level respectively. There is no
+correlation between Cassandra release versions and the CQL language version.
+
+========= =============================================================================================================
+ version   description
+========= =============================================================================================================
+ Major     The major version *must* be bumped when backward incompatible changes are introduced. This should rarely
+           occur.
+ Minor     Minor version increments occur when new, but backward compatible, functionality is introduced.
+ Patch     The patch version is incremented when bugs are fixed.
+========= =============================================================================================================
diff --git a/doc/source/cql/ddl.rst b/doc/source/cql/ddl.rst
new file mode 100644
index 0000000..7f3431a
--- /dev/null
+++ b/doc/source/cql/ddl.rst
@@ -0,0 +1,677 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: sql
+
+.. _data-definition:
+
+Data Definition
+---------------
+
+CQL stores data in *tables*, whose schema defines the layout of the data in the table, and those tables are grouped in
+*keyspaces*. A keyspace defines a number of options that apply to all the tables it contains, the most prominent of
+which is the :ref:`replication strategy <replication-strategy>` used by the keyspace. It is generally encouraged to use
+one keyspace per *application*, and thus many clusters may define only one keyspace.
+
+This section describes the statements used to create, modify, and remove those keyspaces and tables.
+
+Common definitions
+^^^^^^^^^^^^^^^^^^
+
+The names of the keyspaces and tables are defined by the following grammar:
+
+.. productionlist::
+   keyspace_name: `name`
+   table_name: [ `keyspace_name` '.' ] `name`
+   name: `unquoted_name` | `quoted_name`
+   unquoted_name: re('[a-zA-Z_0-9]{1, 48}')
+   quoted_name: '"' `unquoted_name` '"'
+
+Both keyspace and table names should be comprised of only alphanumeric characters, cannot be empty, and are limited in
+size to 48 characters (that limit exists mostly to prevent filenames (which may include the keyspace and table name)
+from going over the limits of certain file systems). By default, keyspace and table names are case insensitive
+(``myTable`` is equivalent to ``mytable``) but case sensitivity can be forced by using double-quotes (``"myTable"`` is
+different from ``mytable``).
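+
+For instance, assuming a hypothetical table was created with the unquoted name ``myTable``, the first two statements
+below refer to that same table, while the third refers to a different, case-sensitive one::
+
+    SELECT * FROM myTable;    -- same table as mytable
+    SELECT * FROM mytable;    -- same table as myTable
+    SELECT * FROM "myTable";  -- a different table, whose name is case sensitive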
+
+Further, a table is always part of a keyspace and a table name can be provided fully-qualified by the keyspace it is
+part of. If it is not fully-qualified, the table is assumed to be in the *current* keyspace (see :ref:`USE statement
+<use-statement>`).
+
+Further, the valid names for columns are simply defined as:
+
+.. productionlist::
+   column_name: `identifier`
+
+We also define the notion of statement options for use in the following section:
+
+.. productionlist::
+   options: `option` ( AND `option` )*
+   option: `identifier` '=' ( `identifier` | `constant` | `map_literal` )
+
+.. _create-keyspace-statement:
+
+CREATE KEYSPACE
+^^^^^^^^^^^^^^^
+
+A keyspace is created using a ``CREATE KEYSPACE`` statement:
+
+.. productionlist::
+   create_keyspace_statement: CREATE KEYSPACE [ IF NOT EXISTS ] `keyspace_name` WITH `options`
+
+For instance::
+
+    CREATE KEYSPACE Excelsior
+               WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3};
+
+    CREATE KEYSPACE Excalibur
+               WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1' : 1, 'DC2' : 3}
+                AND durable_writes = false;
+
+
+The supported ``options`` are:
+
+=================== ========== =========== ========= ===================================================================
+name                 kind       mandatory   default   description
+=================== ========== =========== ========= ===================================================================
+``replication``      *map*      yes                   The replication strategy and options to use for the keyspace (see
+                                                      details below).
+``durable_writes``   *simple*   no          true      Whether to use the commit log for updates on this keyspace
+                                                      (disable this option at your own risk!).
+=================== ========== =========== ========= ===================================================================
+
+The ``replication`` property is mandatory and must at least contain the ``'class'`` sub-option, which defines the
+:ref:`replication strategy <replication-strategy>` class to use. The rest of the sub-options depend on which replication
+strategy is used. By default, Cassandra supports the following ``'class'`` values:
+
+- ``'SimpleStrategy'``: A simple strategy that defines a replication factor for the whole cluster. The only sub-option
+  supported is ``'replication_factor'``, which defines that replication factor and is mandatory.
+- ``'NetworkTopologyStrategy'``: A replication strategy that allows setting the replication factor independently for
+  each data-center. The rest of the sub-options are key-value pairs where a key is a data-center name and its value is
+  the associated replication factor.
+
+Attempting to create a keyspace that already exists will return an error unless the ``IF NOT EXISTS`` option is used. If
+it is used, the statement will be a no-op if the keyspace already exists.
+
+.. _use-statement:
+
+USE
+^^^
+
+The ``USE`` statement allows changing the *current* keyspace (for the *connection* on which it is executed). A number
+of objects in CQL are bound to a keyspace (tables, user-defined types, functions, ...) and the current keyspace is the
+default keyspace used when those objects are referred to without a fully-qualified name (that is, without being prefixed
+by a keyspace name). A ``USE`` statement simply takes the keyspace to use as current as argument:
+
+.. productionlist::
+   use_statement: USE `keyspace_name`
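+
+For instance, a minimal example making the ``Excelsior`` keyspace created above the current keyspace::
+
+    USE Excelsior;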
+
+.. _alter-keyspace-statement:
+
+ALTER KEYSPACE
+^^^^^^^^^^^^^^
+
+An ``ALTER KEYSPACE`` statement allows modifying the options of a keyspace:
+
+.. productionlist::
+   alter_keyspace_statement: ALTER KEYSPACE `keyspace_name` WITH `options`
+
+For instance::
+
+    ALTER KEYSPACE Excelsior
+              WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 4};
+
+The supported options are the same as for :ref:`creating a keyspace <create-keyspace-statement>`.
+
+.. _drop-keyspace-statement:
+
+DROP KEYSPACE
+^^^^^^^^^^^^^
+
+Dropping a keyspace can be done using the ``DROP KEYSPACE`` statement:
+
+.. productionlist::
+   drop_keyspace_statement: DROP KEYSPACE [ IF EXISTS ] `keyspace_name`
+
+For instance::
+
+    DROP KEYSPACE Excelsior;
+
+Dropping a keyspace results in the immediate, irreversible removal of that keyspace, including all the tables,
+user-defined types and functions in it, and all the data contained in those tables.
+
+If the keyspace does not exist, the statement will return an error, unless ``IF EXISTS`` is used in which case the
+operation is a no-op.
+
+.. _create-table-statement:
+
+CREATE TABLE
+^^^^^^^^^^^^
+
+Creating a new table uses the ``CREATE TABLE`` statement:
+
+.. productionlist::
+   create_table_statement: CREATE TABLE [ IF NOT EXISTS ] `table_name`
+                         : '('
+                         :     `column_definition`
+                         :     ( ',' `column_definition` )*
+                         :     [ ',' PRIMARY KEY '(' `primary_key` ')' ]
+                         : ')' [ WITH `table_options` ]
+   column_definition: `column_name` `cql_type` [ STATIC ] [ PRIMARY KEY]
+   primary_key: `partition_key` [ ',' `clustering_columns` ]
+   partition_key: `column_name`
+                : | '(' `column_name` ( ',' `column_name` )* ')'
+   clustering_columns: `column_name` ( ',' `column_name` )*
+   table_options: COMPACT STORAGE [ AND `table_options` ]
+                   : | CLUSTERING ORDER BY '(' `clustering_order` ')' [ AND `table_options` ]
+                   : | `options`
+   clustering_order: `column_name` (ASC | DESC) ( ',' `column_name` (ASC | DESC) )*
+
+For instance::
+
+    CREATE TABLE monkeySpecies (
+        species text PRIMARY KEY,
+        common_name text,
+        population varint,
+        average_size int
+    ) WITH comment='Important biological records'
+       AND read_repair_chance = 1.0;
+
+    CREATE TABLE timeline (
+        userid uuid,
+        posted_month int,
+        posted_time uuid,
+        body text,
+        posted_by text,
+        PRIMARY KEY (userid, posted_month, posted_time)
+    ) WITH compaction = { 'class' : 'LeveledCompactionStrategy' };
+
+    CREATE TABLE loads (
+        machine inet,
+        cpu int,
+        mtime timeuuid,
+        load float,
+        PRIMARY KEY ((machine, cpu), mtime)
+    ) WITH CLUSTERING ORDER BY (mtime DESC);
+
+A CQL table has a name and is composed of a set of *rows*. Creating a table amounts to defining which :ref:`columns
+<column-definition>` the rows will be composed of, which of those columns compose the :ref:`primary key <primary-key>`,
+as well as optional :ref:`options <create-table-options>` for the table.
+
+Attempting to create an already existing table will return an error unless the ``IF NOT EXISTS`` directive is used. If
+it is used, the statement will be a no-op if the table already exists.
+
+
+.. _column-definition:
+
+Column definitions
+~~~~~~~~~~~~~~~~~~
+
+Every row in a CQL table has a set of predefined columns defined at the time of the table creation (or added later
+using an :ref:`alter statement<alter-table-statement>`).
+
+A :token:`column_definition` is primarily comprised of the name of the column defined and its :ref:`type <data-types>`,
+which restricts which values are accepted for that column. Additionally, a column definition can have the following
+modifiers:
+
+``STATIC``
+    it declares the column as being a :ref:`static column <static-columns>`.
+
+``PRIMARY KEY``
+    it declares the column as being the sole component of the :ref:`primary key <primary-key>` of the table.
+
+.. _static-columns:
+
+Static columns
+``````````````
+Some columns can be declared as ``STATIC`` in a table definition. A column that is static will be “shared” by all the
+rows belonging to the same partition (having the same :ref:`partition key <partition-key>`). For instance::
+
+    CREATE TABLE t (
+        pk int,
+        t int,
+        v text,
+        s text static,
+        PRIMARY KEY (pk, t)
+    );
+
+    INSERT INTO t (pk, t, v, s) VALUES (0, 0, 'val0', 'static0');
+    INSERT INTO t (pk, t, v, s) VALUES (0, 1, 'val1', 'static1');
+
+    SELECT * FROM t;
+       pk | t | v      | s
+      ----+---+--------+-----------
+       0  | 0 | 'val0' | 'static1'
+       0  | 1 | 'val1' | 'static1'
+
+As can be seen, the ``s`` value is the same (``static1``) for both of the rows in the partition (the partition key in
+that example being ``pk``, both rows are in that same partition): the 2nd insertion has overridden the value for ``s``.
+
+The use of static columns has the following restrictions:
+
+- tables with the ``COMPACT STORAGE`` option (see below) cannot use them.
+- a table without clustering columns cannot have static columns (in a table without clustering columns, every partition
+  has only one row, and so every column is inherently static).
+- only non ``PRIMARY KEY`` columns can be static.
+
+.. _primary-key:
+
+The Primary key
+~~~~~~~~~~~~~~~
+
+Within a table, a row is uniquely identified by its ``PRIMARY KEY``, and hence all tables **must** define a PRIMARY KEY
+(and only one). A ``PRIMARY KEY`` definition is composed of one or more of the columns defined in the table.
+Syntactically, the primary key is defined with the keywords ``PRIMARY KEY`` followed by a comma-separated list of the
+column names composing it within parentheses, but if the primary key has only one column, one can alternatively follow
+that column definition by the ``PRIMARY KEY`` keywords. The order of the columns in the primary key definition matters.
+
+A CQL primary key is composed of 2 parts:
+
+- the :ref:`partition key <partition-key>` part. It is the first component of the primary key definition. It can be a
+  single column or, using additional parentheses, can be multiple columns. A table always has at least a partition key,
+  the smallest possible table definition is::
+
+      CREATE TABLE t (k text PRIMARY KEY);
+
+- the :ref:`clustering columns <clustering-columns>`. Those are the columns after the first component of the primary key
+  definition, and the order of those columns define the *clustering order*.
+
+Some examples of primary key definitions are:
+
+- ``PRIMARY KEY (a)``: ``a`` is the partition key and there are no clustering columns.
+- ``PRIMARY KEY (a, b, c)`` : ``a`` is the partition key and ``b`` and ``c`` are the clustering columns.
+- ``PRIMARY KEY ((a, b), c)`` : ``a`` and ``b`` compose the partition key (this is often called a *composite* partition
+  key) and ``c`` is the clustering column.
+
+
+.. _partition-key:
+
+The partition key
+`````````````````
+
+Within a table, CQL defines the notion of a *partition*. A partition is simply the set of rows that share the same value
+for their partition key. Note that if the partition key is composed of multiple columns, then rows belong to the same
+partition only if they have the same values for all those partition key columns. So for instance, given the following table
+definition and content::
+
+    CREATE TABLE t (
+        a int,
+        b int,
+        c int,
+        d int,
+        PRIMARY KEY ((a, b), c, d)
+    );
+
+    SELECT * FROM t;
+       a | b | c | d
+      ---+---+---+---
+       0 | 0 | 0 | 0    // row 1
+       0 | 0 | 1 | 1    // row 2
+       0 | 1 | 2 | 2    // row 3
+       0 | 1 | 3 | 3    // row 4
+       1 | 1 | 4 | 4    // row 5
+
+``row 1`` and ``row 2`` are in the same partition, ``row 3`` and ``row 4`` are also in the same partition (but a
+different one) and ``row 5`` is in yet another partition.
+
+Note that a table always has a partition key, and that if the table has no :ref:`clustering columns
+<clustering-columns>`, then every partition of that table is only comprised of a single row (since the primary key
+uniquely identifies rows and the primary key is equal to the partition key if there are no clustering columns).
+
+The most important property of partitions is that all the rows belonging to the same partition are guaranteed to be
+stored on the same set of replica nodes. In other words, the partition key of a table defines which of the rows will be
+localized together in the cluster, and it is thus important to choose your partition key wisely so that rows that need
+to be fetched together are in the same partition (so that querying those rows together requires contacting a minimum of
+nodes).
+
+Please note however that there is a flip-side to this guarantee: as all rows sharing a partition key are guaranteed to
+be stored on the same set of replica nodes, a partition key that groups too much data can create a hotspot.
+
+Another useful property of a partition is that when writing data, all the updates belonging to a single partition are
+done *atomically* and in *isolation*, which is not the case across partitions.
+
+The proper choice of the partition key and clustering columns for a table is probably one of the most important aspects
+of data modeling in Cassandra, and it largely impacts which queries can be performed and how efficient they are.
+
+
+.. _clustering-columns:
+
+The clustering columns
+``````````````````````
+
+The clustering columns of a table define the clustering order for the partitions of that table. For a given
+:ref:`partition <partition-key>`, all the rows are physically ordered inside Cassandra by that clustering order. For
+instance, given::
+
+    CREATE TABLE t (
+        a int,
+        b int,
+        c int,
+        PRIMARY KEY (a, b)
+    );
+
+    SELECT * FROM t;
+       a | b | c
+      ---+---+---
+       0 | 0 | 4     // row 1
+       0 | 1 | 9     // row 2
+       0 | 2 | 2     // row 3
+       0 | 3 | 3     // row 4
+
+then the rows (which all belong to the same partition) are all stored internally in the order of the values of their
+``b`` column (the order they are displayed above). So where the partition key of the table allows grouping rows on the
+same replica set, the clustering columns control how those rows are stored on the replica. That sorting makes the
+retrieval of a range of rows within a partition (for instance, in the example above, ``SELECT * FROM t WHERE a = 0 AND b
+> 1 and b <= 3``) very efficient.
+
+
+.. _create-table-options:
+
+Table options
+~~~~~~~~~~~~~
+
+A CQL table has a number of options that can be set at creation (and, for most of them, :ref:`altered
+<alter-table-statement>` later). These options are specified after the ``WITH`` keyword.
+
+Amongst those options, two important ones cannot be changed after creation and influence which queries can be done
+against the table: the ``COMPACT STORAGE`` option and the ``CLUSTERING ORDER`` option. Those, as well as the other
+options of a table, are described in the following sections.
+
+.. _compact-tables:
+
+Compact tables
+``````````````
+
+.. warning:: Since Cassandra 3.0, compact tables have the exact same layout internally as non-compact ones (for the
+   same schema obviously), and declaring a table compact **only** creates artificial limitations on the table definition
+   and usage that are necessary to ensure backward compatibility with the deprecated Thrift API. And as ``COMPACT
+   STORAGE`` cannot, as of Cassandra |version|, be removed, it is strongly discouraged to create new tables with the
+   ``COMPACT STORAGE`` option.
+
+A *compact* table is one defined with the ``COMPACT STORAGE`` option. This option is mainly targeted towards backward
+compatibility for definitions created before CQL version 3 (see `www.datastax.com/dev/blog/thrift-to-cql3
+<http://www.datastax.com/dev/blog/thrift-to-cql3>`__ for more details) and shouldn't be used for new tables. Declaring a
+table with this option creates limitations for the table which are largely arbitrary but necessary for backward
+compatibility with the (deprecated) Thrift API. Amongst those limitations:
+
+- a compact table cannot use collections nor static columns.
+- if a compact table has at least one clustering column, then it must have *exactly* one column outside of the primary
+  key ones. In particular, this implies that you cannot add or remove columns after creation.
+- a compact table is limited in the indexes it can create, and no materialized view can be created on it.
+
+.. _clustering-order:
+
+Reversing the clustering order
+``````````````````````````````
+
+The clustering order of a table is defined by the :ref:`clustering columns <clustering-columns>` of that table. By
+default, that ordering is based on the natural order of those clustering columns, but the ``CLUSTERING ORDER`` option
+allows changing that clustering order to use the *reverse* natural order for some (potentially all) of the columns.
+
+The ``CLUSTERING ORDER`` option takes the comma-separated list of the clustering columns, each with an ``ASC`` (for
+*ascending*, i.e. the natural order) or ``DESC`` (for *descending*, i.e. the reverse natural order) modifier. Note in
+particular that the default (if the ``CLUSTERING ORDER`` option is not used) is strictly equivalent to using the option
+with all clustering columns using the ``ASC`` modifier.
+
+Note that this option is basically a hint for the storage engine to change the order in which it stores the rows, but it
+has 3 visible consequences:
+
+#. it limits which ``ORDER BY`` clauses are allowed for :ref:`selects <select-statement>` on that table. You can only
+   order results by the clustering order or the reverse clustering order. Meaning that if a table has 2 clustering
+   columns ``a`` and ``b`` and you defined ``WITH CLUSTERING ORDER (a DESC, b ASC)``, then in queries you will be
+   allowed to use ``ORDER BY (a DESC, b ASC)`` and (reverse clustering order) ``ORDER BY (a ASC, b DESC)`` but **not**
+   ``ORDER BY (a ASC, b ASC)`` (nor ``ORDER BY (a DESC, b DESC)``). See the example after this list.
+#. it also changes the default order of results when queried (if no ``ORDER BY`` is provided). Results are always
+   returned in clustering order (within a partition).
+#. it has a small performance impact on some queries as queries in reverse clustering order are slower than those in
+   forward clustering order. In practice, this means that if you plan on querying mostly in the reverse natural order
+   of your columns (which is common with time series for instance where you often want data from the newest to the
+   oldest), it is an optimization to declare a descending clustering order.
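+
+For instance, a minimal sketch of the first point, using a hypothetical table::
+
+    CREATE TABLE tsdata (
+        id int,
+        a int,
+        b int,
+        v text,
+        PRIMARY KEY (id, a, b)
+    ) WITH CLUSTERING ORDER BY (a DESC, b ASC);
+
+    // Allowed: the clustering order and its exact reverse
+    SELECT * FROM tsdata WHERE id = 0 ORDER BY a DESC, b ASC;
+    SELECT * FROM tsdata WHERE id = 0 ORDER BY a ASC, b DESC;
+
+    // Rejected: neither the clustering order nor its reverse
+    // SELECT * FROM tsdata WHERE id = 0 ORDER BY a ASC, b ASC;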
+
+.. _create-table-general-options:
+
+Other table options
+```````````````````
+
+.. todo:: review (misses cdc if nothing else) and link to proper categories when appropriate (compaction for instance)
+
+A table supports the following options:
+
++--------------------------------+----------+-------------+-----------------------------------------------------------+
+| option                         | kind     | default     | description                                               |
++================================+==========+=============+===========================================================+
+| ``comment``                    | *simple* | none        | A free-form, human-readable comment.                      |
++--------------------------------+----------+-------------+-----------------------------------------------------------+
+| ``read_repair_chance``         | *simple* | 0.1         | The probability with which to query extra nodes (e.g.     |
+|                                |          |             | more nodes than required by the consistency level) for    |
+|                                |          |             | the purpose of read repairs.                              |
++--------------------------------+----------+-------------+-----------------------------------------------------------+
+| ``dclocal_read_repair_chance`` | *simple* | 0           | The probability with which to query extra nodes (e.g.     |
+|                                |          |             | more nodes than required by the consistency level)        |
+|                                |          |             | belonging to the same data center as the read             |
+|                                |          |             | coordinator for the purpose of read repairs.              |
++--------------------------------+----------+-------------+-----------------------------------------------------------+
+| ``gc_grace_seconds``           | *simple* | 864000      | Time to wait before garbage collecting tombstones         |
+|                                |          |             | (deletion markers).                                       |
++--------------------------------+----------+-------------+-----------------------------------------------------------+
+| ``bloom_filter_fp_chance``     | *simple* | 0.00075     | The target probability of false positive of the sstable   |
+|                                |          |             | bloom filters. Said bloom filters will be sized to provide|
+|                                |          |             | the provided probability (thus lowering this value impacts|
+|                                |          |             | the size of bloom filters in-memory and on-disk).         |
++--------------------------------+----------+-------------+-----------------------------------------------------------+
+| ``default_time_to_live``       | *simple* | 0           | The default expiration time (“TTL”) in seconds for a      |
+|                                |          |             | table.                                                    |
++--------------------------------+----------+-------------+-----------------------------------------------------------+
+| ``compaction``                 | *map*    | *see below* | :ref:`Compaction options <cql-compaction-options>`.       |
++--------------------------------+----------+-------------+-----------------------------------------------------------+
+| ``compression``                | *map*    | *see below* | :ref:`Compression options <cql-compression-options>`.     |
++--------------------------------+----------+-------------+-----------------------------------------------------------+
+| ``caching``                    | *map*    | *see below* | :ref:`Caching options <cql-caching-options>`.             |
++--------------------------------+----------+-------------+-----------------------------------------------------------+
+
+.. _cql-compaction-options:
+
+Compaction options
+##################
+
+The ``compaction`` options must at least define the ``'class'`` sub-option, which defines the compaction strategy class
+to use. The default supported classes are ``'SizeTieredCompactionStrategy'`` (:ref:`STCS <STCS>`),
+``'LeveledCompactionStrategy'`` (:ref:`LCS <LCS>`) and ``'TimeWindowCompactionStrategy'`` (:ref:`TWCS <TWCS>`) (the
+``'DateTieredCompactionStrategy'`` is also supported but is deprecated and ``'TimeWindowCompactionStrategy'`` should be
+preferred instead). A custom strategy can be provided by specifying the full class name as a :ref:`string constant
+<constants>`.
+
+All default strategies support a number of :ref:`common options <compaction-options>`, as well as options specific to
+the strategy chosen (see the section corresponding to your strategy for details: :ref:`STCS <stcs-options>`, :ref:`LCS
+<lcs-options>` and :ref:`TWCS <TWCS>`).
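+
+For instance, a sketch of creating a table with :ref:`LCS <LCS>` and one of its strategy-specific sub-options (the
+table itself is only illustrative)::
+
+    CREATE TABLE readings (
+        sensor_id int,
+        reading_time timestamp,
+        value double,
+        PRIMARY KEY (sensor_id, reading_time)
+    ) WITH compaction = { 'class' : 'LeveledCompactionStrategy',
+                          'sstable_size_in_mb' : 256 };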
+
+.. _cql-compression-options:
+
+Compression options
+###################
+
+The ``compression`` options define if and how the sstables of the table are compressed. The following sub-options are
+available:
+
+========================= =============== =============================================================================
+ Option                    Default         Description
+========================= =============== =============================================================================
+ ``class``                 LZ4Compressor   The compression algorithm to use. The default compressors are:
+                                           LZ4Compressor, SnappyCompressor and DeflateCompressor. Use
+                                           ``'enabled' : false`` to disable compression. A custom compressor can be
+                                           provided by specifying the full class name as a
+                                           :ref:`string constant <constants>`.
+ ``enabled``               true            Enable/disable sstable compression.
+ ``chunk_length_in_kb``    64KB            On disk SSTables are compressed by block (to allow random reads). This
+                                           defines the size (in KB) of said block. Bigger values may improve the
+                                           compression rate, but increase the minimum size of data to be read from disk
+                                           for a read.
+ ``crc_check_chance``      1.0             When compression is enabled, each compressed block includes a checksum of
+                                           that block for the purpose of detecting disk bitrot and avoiding the
+                                           propagation of corruption to other replicas. This option defines the
+                                           probability with which those checksums are checked during read. By default
+                                           they are always checked. Set to 0 to disable checksum checking and to 0.5 for
+                                           instance to check them every other read.
+========================= =============== =============================================================================
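+
+For instance, a sketch using the sub-options above (the table itself is only illustrative)::
+
+    // LZ4 with a smaller block size, e.g. for read-heavy workloads
+    CREATE TABLE entries (
+        id int PRIMARY KEY,
+        payload blob
+    ) WITH compression = { 'class' : 'LZ4Compressor', 'chunk_length_in_kb' : 4 };
+
+    // Disable compression entirely
+    ALTER TABLE entries WITH compression = { 'enabled' : false };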
+
+.. _cql-caching-options:
+
+Caching options
+###############
+
+The ``caching`` options allow configuring both the *key cache* and the *row cache* for the table. The following
+sub-options are available:
+
+======================== ========= ====================================================================================
+ Option                   Default   Description
+======================== ========= ====================================================================================
+ ``keys``                 ALL       Whether to cache keys (“key cache”) for this table. Valid values are: ``ALL`` and
+                                    ``NONE``.
+ ``rows_per_partition``   NONE      The amount of rows to cache per partition (“row cache”). If an integer ``n`` is
+                                    specified, the first ``n`` queried rows of a partition will be cached. Other
+                                    possible options are ``ALL``, to cache all rows of a queried partition, or ``NONE``
+                                    to disable row caching.
+======================== ========= ====================================================================================
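+
+For instance, a sketch caching all keys and the first 10 rows of each queried partition (the table itself is only
+illustrative)::
+
+    CREATE TABLE sessions (
+        userid uuid,
+        session_start timeuuid,
+        data text,
+        PRIMARY KEY (userid, session_start)
+    ) WITH caching = { 'keys' : 'ALL', 'rows_per_partition' : 10 };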
+
+Other considerations:
+#####################
+
+- Adding new columns (see ``ALTER TABLE`` below) is a constant time operation. There is thus no need to try to
+  anticipate future usage when creating a table.
+
+.. _alter-table-statement:
+
+ALTER TABLE
+^^^^^^^^^^^
+
+Altering an existing table uses the ``ALTER TABLE`` statement:
+
+.. productionlist::
+   alter_table_statement: ALTER TABLE `table_name` `alter_table_instruction`
+   alter_table_instruction: ALTER `column_name` TYPE `cql_type`
+                          : | ADD `column_name` `cql_type` ( ',' `column_name` `cql_type` )*
+                          : | DROP `column_name` ( `column_name` )*
+                          : | WITH `options`
+
+For instance::
+
+    ALTER TABLE addamsFamily ALTER lastKnownLocation TYPE uuid;
+
+    ALTER TABLE addamsFamily ADD gravesite varchar;
+
+    ALTER TABLE addamsFamily
+           WITH comment = 'A most excellent and useful table'
+           AND read_repair_chance = 0.2;
+
+The ``ALTER TABLE`` statement can:
+
+- Change the type of one of the columns in the table (through the ``ALTER`` instruction). Note that the type of a column
+  cannot be changed arbitrarily. The change of type should be such that any value of the previous type should be a valid
+  value of the new type. Further, for :ref:`clustering columns <clustering-columns>` and columns on which a secondary
+  index is defined, the new type must sort values in the same way the previous type does. See the :ref:`type
+  compatibility table <alter-table-type-compatibility>` below for details on which type changes are accepted.
+- Add new column(s) to the table (through the ``ADD`` instruction). Note that the primary key of a table cannot be
+  changed and thus newly added columns will, by extension, never be part of the primary key. Also note that :ref:`compact
+  tables <compact-tables>` have restrictions regarding column addition. Note that this is a constant-time operation (in
+  the amount of data the cluster contains).
+- Remove column(s) from the table. This drops both the column and all its content, but note that while the column
+  becomes immediately unavailable, its content is only removed lazily during compaction. Please also see the warnings
+  below. Due to lazy removal, the alteration itself is a constant-time operation (in the amount of data removed or
+  contained in the cluster).
+- Change some of the table options (through the ``WITH`` instruction). The :ref:`supported options
+  <create-table-options>` are the same as when creating a table (except for ``COMPACT STORAGE`` and ``CLUSTERING
+  ORDER``, which cannot be changed after creation). Note that setting any ``compaction`` sub-options has the effect of
+  erasing all previous ``compaction`` options, so you need to re-specify all the sub-options if you want to keep them
+  (see the example after this list). The same note applies to the set of ``compression`` sub-options.
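+
+For instance, a sketch of re-specifying the full ``compaction`` map when changing a single sub-option (the values shown
+are only illustrative)::
+
+    // All previously set compaction sub-options are replaced by this map,
+    // so every sub-option to keep must be repeated
+    ALTER TABLE addamsFamily
+           WITH compaction = { 'class' : 'SizeTieredCompactionStrategy',
+                               'min_threshold' : 6 };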
+
+.. warning:: Dropping a column assumes that the timestamps used for the values of this column are "real" timestamps in
+   microseconds. Using "real" timestamps in microseconds is the default and is **strongly** recommended but as
+   Cassandra allows the client to provide any timestamp on any table it is theoretically possible to use another
+   convention. Please be aware that if you do so, dropping a column will not work correctly.
+
+.. warning:: Once a column is dropped, it is allowed to re-add a column with the same name as the dropped one
+   **unless** the type of the dropped column was a (non-frozen) collection (due to an internal technical limitation).
+
+.. _alter-table-type-compatibility:
+
+CQL type compatibility:
+~~~~~~~~~~~~~~~~~~~~~~~
+
+CQL data types may be converted only as shown in the following table.
+
++-------------------------------------------------------+--------------------+
+| Existing type                                         | Can be altered to: |
++=======================================================+====================+
+| timestamp                                             | bigint             |
++-------------------------------------------------------+--------------------+
+| ascii, bigint, boolean, date, decimal, double, float, | blob               |
+| inet, int, smallint, text, time, timestamp, timeuuid, |                    |
+| tinyint, uuid, varchar, varint                        |                    |
++-------------------------------------------------------+--------------------+
+| int                                                   | date               |
++-------------------------------------------------------+--------------------+
+| ascii, varchar                                        | text               |
++-------------------------------------------------------+--------------------+
+| bigint                                                | time               |
++-------------------------------------------------------+--------------------+
+| bigint                                                | timestamp          |
++-------------------------------------------------------+--------------------+
+| timeuuid                                              | uuid               |
++-------------------------------------------------------+--------------------+
+| ascii, text                                           | varchar            |
++-------------------------------------------------------+--------------------+
+| bigint, int, timestamp                                | varint             |
++-------------------------------------------------------+--------------------+
+
+Clustering columns have stricter requirements; only the following conversions are allowed:
+
++------------------------+----------------------+
+| Existing type          | Can be altered to    |
++========================+======================+
+| ascii, text, varchar   | blob                 |
++------------------------+----------------------+
+| ascii, varchar         | text                 |
++------------------------+----------------------+
+| ascii, text            | varchar              |
++------------------------+----------------------+
+
+.. _drop-table-statement:
+
+DROP TABLE
+^^^^^^^^^^
+
+Dropping a table uses the ``DROP TABLE`` statement:
+
+.. productionlist::
+   drop_table_statement: DROP TABLE [ IF EXISTS ] `table_name`
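+
+For instance::
+
+    DROP TABLE monkeySpecies;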
+
+Dropping a table results in the immediate, irreversible removal of the table, including all data it contains.
+
+If the table does not exist, the statement will return an error, unless ``IF EXISTS`` is used in which case the
+operation is a no-op.
+
+.. _truncate-statement:
+
+TRUNCATE
+^^^^^^^^
+
+A table can be truncated using the ``TRUNCATE`` statement:
+
+.. productionlist::
+   truncate_statement: TRUNCATE [ TABLE ] `table_name`
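+
+For instance, the two following statements are equivalent::
+
+    TRUNCATE monkeySpecies;
+    TRUNCATE TABLE monkeySpecies;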
+
+Note that ``TRUNCATE TABLE foo`` is allowed for consistency with other DDL statements but as tables are currently the
+only object that can be truncated, the ``TABLE`` keyword can be omitted.
+
+Truncating a table permanently removes all existing data from the table, but without removing the table itself.
diff --git a/doc/source/cql/definitions.rst b/doc/source/cql/definitions.rst
new file mode 100644
index 0000000..6c3b522
--- /dev/null
+++ b/doc/source/cql/definitions.rst
@@ -0,0 +1,230 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. _UUID: https://en.wikipedia.org/wiki/Universally_unique_identifier
+
+Definitions
+-----------
+
+.. _conventions:
+
+Conventions
+^^^^^^^^^^^
+
+To aid in specifying the CQL syntax, we will use the following conventions in this document:
+
+- Language rules will be given in an informal `BNF variant
+  <http://en.wikipedia.org/wiki/Backus%E2%80%93Naur_Form#Variants>`_ notation. In particular, we'll use square brackets
+  (``[ item ]``) for optional items, ``*`` and ``+`` for repeated items (where ``+`` implies at least one).
+- The grammar will also use the following convention for convenience: non-terminal terms will be lowercase (and link to
+  their definitions) while terminal keywords will be provided in "all caps". Note however that keywords are
+  :ref:`identifiers` and are thus case insensitive in practice. We will also define some early constructions using
+  regexp, which we'll indicate with ``re(<some regular expression>)``.
+- The grammar is provided for documentation purposes and leaves some minor details out.  For instance, the comma on the
+  last column definition in a ``CREATE TABLE`` statement is optional but supported if present even though the grammar in
+  this document suggests otherwise. Also, not everything accepted by the grammar is necessarily valid CQL.
+- References to keywords or pieces of CQL code in running text will be shown in a ``fixed-width font``.
+
+
+.. _identifiers:
+
+Identifiers and keywords
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+The CQL language uses *identifiers* (or *names*) to identify tables, columns and other objects. An identifier is a token
+matching the regular expression ``[a-zA-Z][a-zA-Z0-9_]*``.
+
+A number of such identifiers, like ``SELECT`` or ``WITH``, are *keywords*. They have a fixed meaning for the language
+and most are reserved. The list of those keywords can be found in :ref:`appendix-A`.
+
+Identifiers and (unquoted) keywords are case insensitive. Thus ``SELECT`` is the same as ``select`` or ``sElEcT``, and
+``myId`` is the same as ``myid`` or ``MYID``. A convention often used (in particular by the samples of this
+documentation) is to use upper case for keywords and lower case for other identifiers.
+
+There is a second kind of identifier called a *quoted identifier*, defined by enclosing an arbitrary sequence of
+characters (non empty) in double-quotes (``"``). Quoted identifiers are never keywords. Thus ``"select"`` is not a
+reserved keyword and can be used to refer to a column (note that using this is particularly ill-advised), while ``select``
+would raise a parsing error. Also, contrary to unquoted identifiers and keywords, quoted identifiers are case
+sensitive (``"My Quoted Id"`` is *different* from ``"my quoted id"``). A fully lowercase quoted identifier that matches
+``[a-zA-Z][a-zA-Z0-9_]*`` is however *equivalent* to the unquoted identifier obtained by removing the double-quote (so
+``"myid"`` is equivalent to ``myid`` and to ``myId`` but different from ``"myId"``).  Inside a quoted identifier, the
+double-quote character can be repeated to escape it, so ``"foo "" bar"`` is a valid identifier.
+
+.. note:: *quoted identifiers* allow declaring columns with arbitrary names, and those can sometimes clash with
+   specific names used by the server. For instance, when using conditional update, the server will respond with a
+   result-set containing a special result named ``"[applied]"``. If you’ve declared a column with such a name, this
+   could potentially confuse some tools and should be avoided. In general, unquoted identifiers should be preferred but
+   if you use quoted identifiers, it is strongly advised to avoid any name enclosed by square brackets (like
+   ``"[applied]"``) and any name that looks like a function call (like ``"f(x)"``).
+
+More formally, we have:
+
+.. productionlist::
+   identifier: `unquoted_identifier` | `quoted_identifier`
+   unquoted_identifier: re('[a-zA-Z][a-zA-Z0-9_]*')
+   quoted_identifier: '"' (any character where " can appear if doubled)+ '"'
+
+.. _constants:
+
+Constants
+^^^^^^^^^
+
+CQL defines the following kinds of *constants*:
+
+.. productionlist::
+   constant: `string` | `integer` | `float` | `boolean` | `uuid` | `blob` | NULL
+   string: '\'' (any character where ' can appear if doubled)+ '\''
+         : '$$' (any character other than '$$') '$$'
+   integer: re('-?[0-9]+')
+   float: re('-?[0-9]+(\.[0-9]*)?([eE][+-]?[0-9]+)?') | NAN | INFINITY
+   boolean: TRUE | FALSE
+   uuid: `hex`{8}-`hex`{4}-`hex`{4}-`hex`{4}-`hex`{12}
+   hex: re("[0-9a-fA-F]")
+   blob: '0' ('x' | 'X') `hex`+
+
+In other words:
+
+- A string constant is an arbitrary sequence of characters enclosed by single-quotes (``'``). A single-quote
+  can be included by repeating it, e.g. ``'It''s raining today'``. Those are not to be confused with quoted
+  :ref:`identifiers` that use double-quotes. Alternatively, a string can be defined by enclosing the arbitrary sequence
+  of characters by two dollar characters, in which case single-quotes can be used without escaping (``$$It's raining
+  today$$``). That latter form is often used when defining :ref:`user-defined functions <udfs>` to avoid having to
+  escape single-quote characters in the function body (as they are more likely to occur than ``$$``).
+- Integer, float and boolean constants are defined as expected. Note however that float allows the special ``NaN`` and
+  ``Infinity`` constants.
+- CQL supports UUID_ constants.
+- Blob content is provided in hexadecimal and prefixed by ``0x``.
+- The special ``NULL`` constant denotes the absence of value.
+
+For how these constants are typed, see the :ref:`data-types` section.
+
+Terms
+^^^^^
+
+CQL has the notion of a *term*, which denotes the kind of values that CQL supports. Terms are defined by:
+
+.. productionlist::
+   term: `constant` | `literal` | `function_call` | `type_hint` | `bind_marker`
+   literal: `collection_literal` | `udt_literal` | `tuple_literal`
+   function_call: `identifier` '(' [ `term` (',' `term`)* ] ')'
+   type_hint: '(' `cql_type` ')' `term`
+   bind_marker: '?' | ':' `identifier`
+
+A term is thus one of:
+
+- A :ref:`constant <constants>`.
+- A literal for either :ref:`a collection <collections>`, :ref:`a user-defined type <udts>` or :ref:`a tuple <tuples>`
+  (see the linked sections for details).
+- A function call: see :ref:`the section on functions <cql-functions>` for details on which :ref:`native function
+  <native-functions>` exists and how to define your own :ref:`user-defined ones <udfs>`.
+- A *type hint*: see the :ref:`related section <type-hints>` for details.
+- A bind marker, which denotes a variable to be bound at execution time. See the section on :ref:`prepared-statements`
+  for details. A bind marker can be either anonymous (``?``) or named (``:some_name``). The latter form provides a more
+  convenient way to refer to the variable for binding it and should generally be preferred.
+
+
+Comments
+^^^^^^^^
+
+A comment in CQL is a line beginning with either a double dash (``--``) or a double slash (``//``).
+
+Multi-line comments are also supported through enclosure within ``/*`` and ``*/`` (but nesting is not supported).
+
+::
+
+    -- This is a comment
+    // This is a comment too
+    /* This is
+       a multi-line comment */
+
+Statements
+^^^^^^^^^^
+
+CQL consists of statements that can be divided into the following categories:
+
+- :ref:`data-definition` statements, to define and change how the data is stored (keyspaces and tables).
+- :ref:`data-manipulation` statements, for selecting, inserting and deleting data.
+- :ref:`secondary-indexes` statements.
+- :ref:`materialized-views` statements.
+- :ref:`cql-roles` statements.
+- :ref:`cql-permissions` statements.
+- :ref:`User-Defined Functions <udfs>` statements.
+- :ref:`udts` statements.
+- :ref:`cql-triggers` statements.
+
+All the statements are listed below and are described in the rest of this documentation (see links above):
+
+.. productionlist::
+   cql_statement: `statement` [ ';' ]
+   statement: `ddl_statement`
+            : | `dml_statement`
+            : | `secondary_index_statement`
+            : | `materialized_view_statement`
+            : | `role_or_permission_statement`
+            : | `udf_statement`
+            : | `udt_statement`
+            : | `trigger_statement`
+   ddl_statement: `use_statement`
+                : | `create_keyspace_statement`
+                : | `alter_keyspace_statement`
+                : | `drop_keyspace_statement`
+                : | `create_table_statement`
+                : | `alter_table_statement`
+                : | `drop_table_statement`
+                : | `truncate_statement`
+   dml_statement: `select_statement`
+                : | `insert_statement`
+                : | `update_statement`
+                : | `delete_statement`
+                : | `batch_statement`
+   secondary_index_statement: `create_index_statement`
+                            : | `drop_index_statement`
+   materialized_view_statement: `create_materialized_view_statement`
+                              : | `drop_materialized_view_statement`
+   role_or_permission_statement: `create_role_statement`
+                               : | `alter_role_statement`
+                               : | `drop_role_statement`
+                               : | `grant_role_statement`
+                               : | `revoke_role_statement`
+                               : | `list_roles_statement`
+                               : | `grant_permission_statement`
+                               : | `revoke_permission_statement`
+                               : | `list_permissions_statement`
+                               : | `create_user_statement`
+                               : | `alter_user_statement`
+                               : | `drop_user_statement`
+                               : | `list_users_statement`
+   udf_statement: `create_function_statement`
+                : | `drop_function_statement`
+                : | `create_aggregate_statement`
+                : | `drop_aggregate_statement`
+   udt_statement: `create_type_statement`
+                : | `alter_type_statement`
+                : | `drop_type_statement`
+   trigger_statement: `create_trigger_statement`
+                    : | `drop_trigger_statement`
+
+.. _prepared-statements:
+
+Prepared Statements
+^^^^^^^^^^^^^^^^^^^
+
+CQL supports *prepared statements*. Prepared statements are an optimization that allows parsing a query only once but
+executing it multiple times with different concrete values.
+
+Any statement that uses at least one bind marker (see :token:`bind_marker`) will need to be *prepared*. After which the
+statement can be *executed* by providing concrete values for each of its markers. The exact details of how a statement
+is prepared and then executed depend on the CQL driver used and you should refer to your driver documentation.
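+
+For instance, the following statements use an anonymous and a named bind marker respectively; they would typically be
+prepared once and then executed many times with different values (the table and column names are only illustrative)::
+
+    SELECT * FROM users WHERE userid = ?;
+    INSERT INTO users (userid, name) VALUES (:id, :name);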
diff --git a/doc/source/cql/dml.rst b/doc/source/cql/dml.rst
new file mode 100644
index 0000000..989c0ca
--- /dev/null
+++ b/doc/source/cql/dml.rst
@@ -0,0 +1,499 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: sql
+
+.. _data-manipulation:
+
+Data Manipulation
+-----------------
+
+This section describes the statements supported by CQL to insert, update, delete and query data.
+
+.. _select-statement:
+
+SELECT
+^^^^^^
+
+Querying data from tables is done using a ``SELECT`` statement:
+
+.. productionlist::
+   select_statement: SELECT [ JSON | DISTINCT ] ( `select_clause` | '*' )
+                   : FROM `table_name`
+                   : [ WHERE `where_clause` ]
+                   : [ ORDER BY `ordering_clause` ]
+                   : [ PER PARTITION LIMIT (`integer` | `bind_marker`) ]
+                   : [ LIMIT (`integer` | `bind_marker`) ]
+                   : [ ALLOW FILTERING ]
+   select_clause: `selector` [ AS `identifier` ] ( ',' `selector` [ AS `identifier` ] )*
+   selector: `column_name`
+           : | `term`
+           : | CAST '(' `selector` AS `cql_type` ')'
+           : | `function_name` '(' [ `selector` ( ',' `selector` )* ] ')'
+           : | COUNT '(' '*' ')'
+   where_clause: `relation` ( AND `relation` )*
+   relation: `column_name` `operator` `term`
+           : | '(' `column_name` ( ',' `column_name` )* ')' `operator` `tuple_literal`
+           : | TOKEN '(' `column_name` ( ',' `column_name` )* ')' `operator` `term`
+   operator: '=' | '<' | '>' | '<=' | '>=' | '!=' | IN | CONTAINS | CONTAINS KEY
+   ordering_clause: `column_name` [ ASC | DESC ] ( ',' `column_name` [ ASC | DESC ] )*
+
+For instance::
+
+    SELECT name, occupation FROM users WHERE userid IN (199, 200, 207);
+    SELECT JSON name, occupation FROM users WHERE userid = 199;
+    SELECT name AS user_name, occupation AS user_occupation FROM users;
+
+    SELECT time, value
+    FROM events
+    WHERE event_type = 'myEvent'
+      AND time > '2011-02-03'
+      AND time <= '2012-01-01'
+
+    SELECT COUNT (*) AS user_count FROM users;
+
+The ``SELECT`` statement reads one or more columns for one or more rows in a table. It returns a result-set of the rows
+matching the request, where each row contains the values for the selection corresponding to the query. Additionally,
+:ref:`functions <cql-functions>` including :ref:`aggregation <aggregate-functions>` ones can be applied to the result.
+
+A ``SELECT`` statement contains at least a :ref:`selection clause <selection-clause>` and the name of the table on which
+the selection is on (note that CQL does **not** support joins or sub-queries and thus a select statement only applies to
+a single table). In most cases, a select will also have a :ref:`where clause <where-clause>` and it can optionally have
+additional clauses to :ref:`order <ordering-clause>` or :ref:`limit <limit-clause>` the results. Lastly, :ref:`queries
+that require filtering <allow-filtering>` can be allowed if the ``ALLOW FILTERING`` flag is provided.
+
+.. _selection-clause:
+
+Selection clause
+~~~~~~~~~~~~~~~~
+
+The :token:`select_clause` determines which columns need to be queried and returned in the result-set, as well as any
+transformations to apply to this result before returning. It consists of a comma-separated list of *selectors* or,
+alternatively, of the wildcard character (``*``) to select all the columns defined in the table.
+
+Selectors
+`````````
+
+A :token:`selector` can be one of:
+
+- A column name of the table selected, to retrieve the values for that column.
+- A term, which is usually used nested inside other selectors like functions (if a term is selected directly, then the
+  corresponding column of the result-set will simply have the value of this term for every row returned).
+- A casting, which allows converting a nested selector to a (compatible) type.
+- A function call, where the arguments are selectors themselves. See the section on :ref:`functions <cql-functions>` for
+  more details.
+- The special call ``COUNT(*)`` to the :ref:`COUNT function <count-function>`, which counts all non-null results.
+
+Aliases
+```````
+
+Every *top-level* selector can also be aliased (using ``AS``). If so, the name of the corresponding column in the result
+set will be that of the alias. For instance::
+
+    // Without alias
+    SELECT intAsBlob(4) FROM t;
+
+    //  intAsBlob(4)
+    // --------------
+    //  0x00000004
+
+    // With alias
+    SELECT intAsBlob(4) AS four FROM t;
+
+    //  four
+    // ------------
+    //  0x00000004
+
+.. note:: Currently, aliases aren't recognized anywhere else in the statement where they are used (not in the ``WHERE``
+   clause, not in the ``ORDER BY`` clause, ...). You must use the original column name instead.
+
+
+``WRITETIME`` and ``TTL`` functions
+```````````````````````````````````
+
+Selection supports two special functions (that aren't allowed anywhere else): ``WRITETIME`` and ``TTL``. Both functions
+take only one argument and that argument *must* be a column name (so for instance ``TTL(3)`` is invalid).
+
+Those functions allow retrieving meta-information that is stored internally for each column, namely:
+
+- the timestamp of the value of the column for ``WRITETIME``.
+- the remaining time to live (in seconds) for the value of the column if it is set to expire (and ``null`` otherwise).
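+
+For instance, re-using the ``events`` table from the earlier example (assuming ``event_type`` and ``time`` form its
+primary key)::
+
+    SELECT WRITETIME(value), TTL(value)
+    FROM events
+    WHERE event_type = 'myEvent' AND time = '2011-02-03';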
+
+.. _where-clause:
+
+The ``WHERE`` clause
+~~~~~~~~~~~~~~~~~~~~
+
+The ``WHERE`` clause specifies which rows must be queried. It is composed of relations on the columns that are part of
+the ``PRIMARY KEY`` and/or have a :ref:`secondary index <secondary-indexes>` defined on them.
+
+Not all relations are allowed in a query. For instance, non-equal relations (where ``IN`` is considered as an equal
+relation) on a partition key are not supported (but see the use of the ``TOKEN`` function below to do non-equal queries
+on the partition key). Moreover, for a given partition key, the clustering columns induce an ordering of rows and
+relations on them are restricted to the relations that allow selecting a **contiguous** (for the ordering) set of rows.
+For instance, given::
+
+    CREATE TABLE posts (
+        userid text,
+        blog_title text,
+        posted_at timestamp,
+        entry_title text,
+        content text,
+        category int,
+        PRIMARY KEY (userid, blog_title, posted_at)
+    )
+
+The following query is allowed::
+
+    SELECT entry_title, content FROM posts
+     WHERE userid = 'john doe'
+       AND blog_title='John''s Blog'
+       AND posted_at >= '2012-01-01' AND posted_at < '2012-01-31'
+
+But the following one is not, as it does not select a contiguous set of rows (and we suppose no secondary indexes are
+set)::
+
+    // Needs a blog_title to be set to select ranges of posted_at
+    SELECT entry_title, content FROM posts
+     WHERE userid = 'john doe'
+       AND posted_at >= '2012-01-01' AND posted_at < '2012-01-31'
+
+When specifying relations, the ``TOKEN`` function can be used on the ``PARTITION KEY`` column. In that case,
+rows will be selected based on the token of their ``PARTITION KEY`` rather than on the value. Note that the token of a
+key depends on the partitioner in use, and that in particular the RandomPartitioner won't yield a meaningful order. Also
+note that ordering partitioners always order token values by bytes (so even if the partition key is of type int,
+``token(-1) > token(0)`` in particular). Example::
+
+    SELECT * FROM posts
+     WHERE token(userid) > token('tom') AND token(userid) < token('bob')
+
+Moreover, the ``IN`` relation is only allowed on the last column of the partition key and on the last column of the full
+primary key.
+
+It is also possible to “group” ``CLUSTERING COLUMNS`` together in a relation using the tuple notation. For instance::
+
+    SELECT * FROM posts
+     WHERE userid = 'john doe'
+       AND (blog_title, posted_at) > ('John''s Blog', '2012-01-01')
+
+will request all rows that sort after the one having “John's Blog” as ``blog_title`` and '2012-01-01' for ``posted_at``
+in the clustering order. In particular, rows having a ``posted_at <= '2012-01-01'`` will be returned as long as their
+``blog_title > 'John''s Blog'``, which would not be the case for::
+
+    SELECT * FROM posts
+     WHERE userid = 'john doe'
+       AND blog_title > 'John''s Blog'
+       AND posted_at > '2012-01-01'
+
+The tuple notation may also be used for ``IN`` clauses on clustering columns::
+
+    SELECT * FROM posts
+     WHERE userid = 'john doe'
+       AND (blog_title, posted_at) IN (('John''s Blog', '2012-01-01'), ('Extreme Chess', '2014-06-01'))
+
+The ``CONTAINS`` operator may only be used on collection columns (lists, sets, and maps). In the case of maps,
+``CONTAINS`` applies to the map values. The ``CONTAINS KEY`` operator may only be used on map columns and applies to the
+map keys.
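+
+For instance, a minimal sketch assuming a hypothetical table with a set and a map column (and the appropriate secondary
+indexes)::
+
+    CREATE TABLE products (
+        id int PRIMARY KEY,
+        tags set<text>,
+        props map<text, int>
+    );
+    CREATE INDEX ON products (tags);
+    CREATE INDEX ON products (KEYS(props));
+
+    // Matches rows whose 'tags' set contains 'sale'
+    SELECT * FROM products WHERE tags CONTAINS 'sale';
+    // Matches rows whose 'props' map contains the key 'weight'
+    SELECT * FROM products WHERE props CONTAINS KEY 'weight';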
+
+.. _ordering-clause:
+
+Ordering results
+~~~~~~~~~~~~~~~~
+
+The ``ORDER BY`` clause allows selecting the order of the returned results. It takes as argument a list of column names
+along with the order for the column (``ASC`` for ascending and ``DESC`` for descending, omitting the order being
+equivalent to ``ASC``). Currently the possible orderings are limited by the :ref:`clustering order <clustering-order>`
+defined on the table:
+
+- if the table has been defined without any specific ``CLUSTERING ORDER``, then the allowed orderings are the order
+  induced by the clustering columns and the reverse of that one.
+- otherwise, the orderings allowed are the order of the ``CLUSTERING ORDER`` option and the reversed one.
+
+.. _limit-clause:
+
+Limiting results
+~~~~~~~~~~~~~~~~
+
+The ``LIMIT`` option to a ``SELECT`` statement limits the number of rows returned by a query, while the ``PER PARTITION
+LIMIT`` option limits the number of rows returned for a given partition by the query. Note that both types of limit can
+be used in the same statement.
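+
+For instance, re-using the ``posts`` table defined above::
+
+    // At most 3 posts per user, and at most 10 rows overall
+    SELECT * FROM posts PER PARTITION LIMIT 3 LIMIT 10;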
+
+.. _allow-filtering:
+
+Allowing filtering
+~~~~~~~~~~~~~~~~~~
+
+By default, CQL only allows select queries that don't involve “filtering” server side, i.e. queries where we know that
+all (live) records read will be returned (maybe partly) in the result set. The reasoning is that those “non filtering”
+queries have predictable performance in the sense that they will execute in a time that is proportional to the amount of
+data **returned** by the query (which can be controlled through ``LIMIT``).
+
+The ``ALLOW FILTERING`` option explicitly allows (some) queries that require filtering. Please note that a
+query using ``ALLOW FILTERING`` may thus have unpredictable performance (for the definition above), i.e. even a query
+that selects a handful of records **may** exhibit performance that depends on the total amount of data stored in the
+cluster.
+
+For instance, considering the following table holding user profiles with their year of birth (with a secondary index on
+it) and country of residence::
+
+    CREATE TABLE users (
+        username text PRIMARY KEY,
+        firstname text,
+        lastname text,
+        birth_year int,
+        country text
+    )
+
+    CREATE INDEX ON users(birth_year);
+
+Then the following queries are valid::
+
+    SELECT * FROM users;
+    SELECT * FROM users WHERE birth_year = 1981;
+
+because in both cases, Cassandra guarantees that the performance of these queries will be proportional to the amount of
+data returned. In particular, if no users are born in 1981, then the second query performance will not depend on the
+number of user profiles stored in the database (not directly at least: due to secondary index implementation
+considerations, this query may still depend on the number of nodes in the cluster, which indirectly depends on the
+amount of data stored. Nevertheless, the number of nodes will always be multiple orders of magnitude lower than the
+number of user profiles stored). Of course, both queries may return very large result sets in practice, but the amount
+of data returned can always be controlled by adding a ``LIMIT``.
+
+However, the following query will be rejected::
+
+    SELECT * FROM users WHERE birth_year = 1981 AND country = 'FR';
+
+because Cassandra cannot guarantee that it won't have to scan a large amount of data even if the result of those queries
+is small. Typically, it will scan all the index entries for users born in 1981 even if only a handful are actually from
+France. However, if you “know what you are doing”, you can force the execution of this query by using ``ALLOW
+FILTERING`` and so the following query is valid::
+
+    SELECT * FROM users WHERE birth_year = 1981 AND country = 'FR' ALLOW FILTERING;
+
+.. _insert-statement:
+
+INSERT
+^^^^^^
+
+Inserting data for a row is done using an ``INSERT`` statement:
+
+.. productionlist::
+   insert_statement: INSERT INTO `table_name` ( `names_values` | `json_clause` )
+                   : [ IF NOT EXISTS ]
+                   : [ USING `update_parameter` ( AND `update_parameter` )* ]
+   names_values: `names` VALUES `tuple_literal`
+   json_clause: JSON `string`
+   names: '(' `column_name` ( ',' `column_name` )* ')'
+
+For instance::
+
+    INSERT INTO NerdMovies (movie, director, main_actor, year)
+                    VALUES ('Serenity', 'Joss Whedon', 'Nathan Fillion', 2005)
+          USING TTL 86400;
+
+    INSERT INTO NerdMovies JSON '{"movie": "Serenity",
+                                  "director": "Joss Whedon",
+                                  "year": 2005}';
+
+The ``INSERT`` statement writes one or more columns for a given row in a table. Note that since a row is identified by
+its ``PRIMARY KEY``, at least the columns composing it must be specified. The list of columns to insert to must be
+supplied when using the ``VALUES`` syntax. When using the ``JSON`` syntax, they are optional. See the
+section on :ref:`JSON support <cql-json>` for more detail.
+
+Note that unlike in SQL, ``INSERT`` does not check the prior existence of the row by default: the row is created if none
+existed before, and updated otherwise. Furthermore, there is no means to know whether a creation or an update happened.
+
+It is however possible to use the ``IF NOT EXISTS`` condition to only insert if the row does not exist prior to the
+insertion. But please note that using ``IF NOT EXISTS`` will incur a non-negligible performance cost (internally, Paxos
+will be used) so this should be used sparingly.
+
+All updates for an ``INSERT`` are applied atomically and in isolation.
+
+Please refer to the :ref:`UPDATE <update-parameters>` section for information on the :token:`update_parameter`.
+
+Also note that ``INSERT`` does not support counters, while ``UPDATE`` does.
+
+.. _update-statement:
+
+UPDATE
+^^^^^^
+
+Updating a row is done using an ``UPDATE`` statement:
+
+.. productionlist::
+   update_statement: UPDATE `table_name`
+                   : [ USING `update_parameter` ( AND `update_parameter` )* ]
+                   : SET `assignment` ( ',' `assignment` )*
+                   : WHERE `where_clause`
+                   : [ IF ( EXISTS | `condition` ( AND `condition` )*) ]
+   update_parameter: ( TIMESTAMP | TTL ) ( `integer` | `bind_marker` )
+   assignment: `simple_selection` '=' `term`
+             :| `column_name` '=' `column_name` ( '+' | '-' ) `term`
+             :| `column_name` '=' `list_literal` '+' `column_name`
+   simple_selection: `column_name`
+                   :| `column_name` '[' `term` ']'
+                   :| `column_name` '.' `field_name`
+   condition: `simple_selection` `operator` `term`
+
+For instance::
+
+    UPDATE NerdMovies USING TTL 400
+       SET director   = 'Joss Whedon',
+           main_actor = 'Nathan Fillion',
+           year       = 2005
+     WHERE movie = 'Serenity';
+
+    UPDATE UserActions
+       SET total = total + 2
+       WHERE user = B70DE1D0-9908-4AE3-BE34-5573E5B09F14
+         AND action = 'click';
+
+The ``UPDATE`` statement writes one or more columns for a given row in a table. The :token:`where_clause` is used to
+select the row to update and must include all columns composing the ``PRIMARY KEY``. Non primary key columns are then
+set using the ``SET`` keyword.
+
+Note that unlike in SQL, ``UPDATE`` does not check the prior existence of the row by default (except through ``IF``, see
+below): the row is created if none existed before, and updated otherwise. Furthermore, there are no means to know
+whether a creation or update occurred.
+
+It is however possible to use conditions on some columns through ``IF``, in which case the row will not be updated
+unless the conditions are met. But, please note that using ``IF`` conditions will incur a non-negligible performance
+cost (internally, Paxos will be used) so this should be used sparingly.
+
+In an ``UPDATE`` statement, all updates within the same partition key are applied atomically and in isolation.
+
+Regarding the :token:`assignment`:
+
+- ``c = c + 3`` is used to increment/decrement counters. The column name after the '=' sign **must** be the same as
+  the one before the '=' sign. Note that increments/decrements are only allowed on counters, and are the *only* update
+  operations allowed on counters. See the section on :ref:`counters <counters>` for details.
+- ``id = id + <some-collection>`` and ``id[value1] = value2`` are for collections, see the :ref:`relevant section
+  <collections>` for details.
+- ``id.field = 3`` is for setting the value of a field on a non-frozen user-defined type. See the :ref:`relevant section
+  <udts>` for details.
+
+.. _update-parameters:
+
+Update parameters
+~~~~~~~~~~~~~~~~~
+
+The ``UPDATE``, ``INSERT`` (and ``DELETE`` and ``BATCH`` for the ``TIMESTAMP``) statements support the following
+parameters:
+
+- ``TIMESTAMP``: sets the timestamp for the operation. If not specified, the coordinator will use the current time (in
+  microseconds) at the start of statement execution as the timestamp. This is usually a suitable default.
+- ``TTL``: specifies an optional Time To Live (in seconds) for the inserted values. If set, the inserted values are
+  automatically removed from the database after the specified time. Note that the TTL concerns the inserted values, not
+  the columns themselves. This means that any subsequent update of the column will also reset the TTL (to whatever TTL
+  is specified in that update). By default, values never expire. A TTL of 0 is equivalent to no TTL. If the table has a
+  default_time_to_live, a TTL of 0 will remove the TTL for the inserted or updated values.
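+
+For instance, the two parameters can be combined; reusing the ``NerdMovies`` table from the examples above (the
+timestamp and TTL values below are arbitrary and purely illustrative)::
+
+    // Write with an explicit timestamp (in microseconds) and a 2-day TTL
+    UPDATE NerdMovies USING TIMESTAMP 1442880000000000 AND TTL 172800
+       SET main_actor = 'Nathan Fillion'
+     WHERE movie = 'Serenity';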
+
+.. _delete_statement:
+
+DELETE
+^^^^^^
+
+Deleting rows or parts of rows uses the ``DELETE`` statement:
+
+.. productionlist::
+   delete_statement: DELETE [ `simple_selection` ( ',' `simple_selection` )* ]
+                   : FROM `table_name`
+                   : [ USING `update_parameter` ( AND `update_parameter` )* ]
+                   : WHERE `where_clause`
+                   : [ IF ( EXISTS | `condition` ( AND `condition` )*) ]
+
+For instance::
+
+    DELETE FROM NerdMovies USING TIMESTAMP 1240003134
+     WHERE movie = 'Serenity';
+
+    DELETE phone FROM Users
+     WHERE userid IN (C73DE1D3-AF08-40F3-B124-3FF3E5109F22, B70DE1D0-9908-4AE3-BE34-5573E5B09F14);
+
+The ``DELETE`` statement deletes columns and rows. If column names are provided directly after the ``DELETE`` keyword,
+only those columns are deleted from the row indicated by the ``WHERE`` clause. Otherwise, whole rows are removed.
+
+The ``WHERE`` clause specifies which rows are to be deleted. Multiple rows may be deleted with one statement by using an
+``IN`` operator. A range of rows may be deleted using an inequality operator (such as ``>=``).
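+
+For instance, a range deletion within a single partition might look like the following (assuming a hypothetical
+``events`` table with ``PRIMARY KEY (k, c)``)::
+
+    // Deletes all rows of partition 'key1' whose clustering column c is in [0, 10)
+    DELETE FROM events WHERE k = 'key1' AND c >= 0 AND c < 10;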
+
+``DELETE`` supports the ``TIMESTAMP`` option with the same semantics as in :ref:`updates <update-parameters>`.
+
+In a ``DELETE`` statement, all deletions within the same partition key are applied atomically and in isolation.
+
+A ``DELETE`` operation can be conditional through the use of an ``IF`` clause, similar to ``UPDATE`` and ``INSERT``
+statements. However, as with ``INSERT`` and ``UPDATE`` statements, this will incur a non-negligible performance cost
+(internally, Paxos will be used) and so should be used sparingly.
+
+.. _batch_statement:
+
+BATCH
+^^^^^
+
+Multiple ``INSERT``, ``UPDATE`` and ``DELETE`` can be executed in a single statement by grouping them through a
+``BATCH`` statement:
+
+.. productionlist::
+   batch_statement: BEGIN [ UNLOGGED | COUNTER ] BATCH
+                   : [ USING `update_parameter` ( AND `update_parameter` )* ]
+                   : `modification_statement` ( ';' `modification_statement` )*
+                   : APPLY BATCH
+   modification_statement: `insert_statement` | `update_statement` | `delete_statement`
+
+For instance::
+
+    BEGIN BATCH
+       INSERT INTO users (userid, password, name) VALUES ('user2', 'ch@ngem3b', 'second user');
+       UPDATE users SET password = 'ps22dhds' WHERE userid = 'user3';
+       INSERT INTO users (userid, password) VALUES ('user4', 'ch@ngem3c');
+       DELETE name FROM users WHERE userid = 'user1';
+    APPLY BATCH;
+
+The ``BATCH`` statement groups multiple modification statements (insertions/updates and deletions) into a single
+statement. It serves several purposes:
+
+- It saves network round-trips between the client and the server (and sometimes between the server coordinator and the
+  replicas) when batching multiple updates.
+- All updates in a ``BATCH`` belonging to a given partition key are performed in isolation.
+- By default, all operations in the batch are performed as *logged*, to ensure all mutations eventually complete (or
+  none will). See the notes on :ref:`UNLOGGED batches <unlogged-batches>` for more details.
+
+Note that:
+
+- ``BATCH`` statements may only contain ``UPDATE``, ``INSERT`` and ``DELETE`` statements (not other batches for instance).
+- Batches are *not* a full analogue for SQL transactions.
+- If a timestamp is not specified for each operation, then all operations will be applied with the same timestamp
+  (either one generated automatically, or the timestamp provided at the batch level). Due to Cassandra's conflict
+  resolution procedure in the case of `timestamp ties <http://wiki.apache.org/cassandra/FAQ#clocktie>`__, operations may
+  be applied in an order that is different from the order they are listed in the ``BATCH`` statement. To force a
+  particular operation ordering, you must specify per-operation timestamps.
+
+.. _unlogged-batches:
+
+``UNLOGGED`` batches
+~~~~~~~~~~~~~~~~~~~~
+
+By default, Cassandra uses a batch log to ensure all operations in a batch eventually complete or none will (note
+however that operations are only isolated within a single partition).
+
+There is a performance penalty for batch atomicity when a batch spans multiple partitions. If you do not want to incur
+this penalty, you can tell Cassandra to skip the batchlog with the ``UNLOGGED`` option. If the ``UNLOGGED`` option is
+used, a failed batch might leave the batch only partly applied.
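+
+For instance (reusing the ``users`` table from the ``BATCH`` example above)::
+
+    BEGIN UNLOGGED BATCH
+       INSERT INTO users (userid, password) VALUES ('user5', 'ch@ngem3d');
+       INSERT INTO users (userid, password) VALUES ('user6', 'ch@ngem3e');
+    APPLY BATCH;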
+
+``COUNTER`` batches
+~~~~~~~~~~~~~~~~~~~
+
+Use the ``COUNTER`` option for batched counter updates. Unlike other
+updates in Cassandra, counter updates are not idempotent.
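+
+For instance (reusing the ``UserActions`` counter table from the ``UPDATE`` examples above)::
+
+    BEGIN COUNTER BATCH
+       UPDATE UserActions SET total = total + 2 WHERE user = B70DE1D0-9908-4AE3-BE34-5573E5B09F14 AND action = 'click';
+       UPDATE UserActions SET total = total + 1 WHERE user = B70DE1D0-9908-4AE3-BE34-5573E5B09F14 AND action = 'view';
+    APPLY BATCH;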
diff --git a/doc/source/cql/functions.rst b/doc/source/cql/functions.rst
new file mode 100644
index 0000000..efcdf32
--- /dev/null
+++ b/doc/source/cql/functions.rst
@@ -0,0 +1,553 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: sql
+
+.. _cql-functions:
+
+.. Need some intro for UDF and native functions in general and point those to it.
+.. _udfs:
+.. _native-functions:
+
+Functions
+---------
+
+CQL supports 2 main categories of functions:
+
+- the :ref:`scalar functions <scalar-functions>`, which simply take a number of values and produce an output from them.
+- the :ref:`aggregate functions <aggregate-functions>`, which are used to aggregate multiple rows results from a
+  ``SELECT`` statement.
+
+In both cases, CQL provides a number of native "hard-coded" functions as well as the ability to create new user-defined
+functions.
+
+.. note:: The use of user-defined functions is disabled by default for security concerns (even when
+   enabled, the execution of user-defined functions is sandboxed and a "rogue" function should not be allowed to do
+   evil, but no sandbox is perfect so using user-defined functions is opt-in). See the ``enable_user_defined_functions``
+   setting in ``cassandra.yaml`` to enable them.
+
+.. _scalar-functions:
+
+Scalar functions
+^^^^^^^^^^^^^^^^
+
+.. _scalar-native-functions:
+
+Native functions
+~~~~~~~~~~~~~~~~
+
+Cast
+````
+
+The ``cast`` function can be used to convert one native datatype to another.
+
+The following table describes the conversions supported by the ``cast`` function. Cassandra will silently ignore any
+cast converting a datatype into its own datatype.
+
+=============== =======================================================================================================
+ From            To
+=============== =======================================================================================================
+ ``ascii``       ``text``, ``varchar``
+ ``bigint``      ``tinyint``, ``smallint``, ``int``, ``float``, ``double``, ``decimal``, ``varint``, ``text``,
+                 ``varchar``
+ ``boolean``     ``text``, ``varchar``
+ ``counter``     ``tinyint``, ``smallint``, ``int``, ``bigint``, ``float``, ``double``, ``decimal``, ``varint``,
+                 ``text``, ``varchar``
+ ``date``        ``timestamp``
+ ``decimal``     ``tinyint``, ``smallint``, ``int``, ``bigint``, ``float``, ``double``, ``varint``, ``text``,
+                 ``varchar``
+ ``double``      ``tinyint``, ``smallint``, ``int``, ``bigint``, ``float``, ``decimal``, ``varint``, ``text``,
+                 ``varchar``
+ ``float``       ``tinyint``, ``smallint``, ``int``, ``bigint``, ``double``, ``decimal``, ``varint``, ``text``,
+                 ``varchar``
+ ``inet``        ``text``, ``varchar``
+ ``int``         ``tinyint``, ``smallint``, ``bigint``, ``float``, ``double``, ``decimal``, ``varint``, ``text``,
+                 ``varchar``
+ ``smallint``    ``tinyint``, ``int``, ``bigint``, ``float``, ``double``, ``decimal``, ``varint``, ``text``,
+                 ``varchar``
+ ``time``        ``text``, ``varchar``
+ ``timestamp``   ``date``, ``text``, ``varchar``
+ ``timeuuid``    ``timestamp``, ``date``, ``text``, ``varchar``
+ ``tinyint``     ``tinyint``, ``smallint``, ``int``, ``bigint``, ``float``, ``double``, ``decimal``, ``varint``,
+                 ``text``, ``varchar``
+ ``uuid``        ``text``, ``varchar``
+ ``varint``      ``tinyint``, ``smallint``, ``int``, ``bigint``, ``float``, ``double``, ``decimal``, ``text``,
+                 ``varchar``
+=============== =======================================================================================================
+
+The conversions rely strictly on Java's semantics; for example, the double value 1 will be converted to the text value
+'1.0'. A typical use of ``cast`` is::
+
+    SELECT avg(cast(count as double)) FROM myTable
+
+Token
+`````
+
+The ``token`` function computes the token for a given partition key. The exact signature of the token function
+depends on the table concerned and on the partitioner used by the cluster.
+
+The types of the arguments of ``token`` depend on the types of the partition key columns. The return type depends on
+the partitioner in use:
+
+- For Murmur3Partitioner, the return type is ``bigint``.
+- For RandomPartitioner, the return type is ``varint``.
+- For ByteOrderedPartitioner, the return type is ``blob``.
+
+For instance, in a cluster using the default Murmur3Partitioner, if a table is defined by::
+
+    CREATE TABLE users (
+        userid text PRIMARY KEY,
+        username text,
+    )
+
+then the ``token`` function will take a single argument of type ``text`` (in that case, the partition key is ``userid``
+(there are no clustering columns so the partition key is the same as the primary key)), and the return type will be
+``bigint``.
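+
+With such a table, the ``token`` function can for instance be used to page through partitions in token order (the user
+names below are purely illustrative)::
+
+    SELECT * FROM users WHERE token(userid) > token('tom') AND token(userid) < token('bob');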
+
+Uuid
+````
+The ``uuid`` function takes no parameters and generates a random type 4 uuid suitable for use in ``INSERT`` or
+``UPDATE`` statements.
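+
+For instance, assuming a hypothetical ``entries`` table with an ``id uuid`` primary key and a ``body text`` column::
+
+    INSERT INTO entries (id, body) VALUES (uuid(), 'Hello');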
+
+.. _timeuuid-functions:
+
+Timeuuid functions
+``````````````````
+
+``now``
+#######
+
+The ``now`` function takes no arguments and generates, on the coordinator node, a new unique timeuuid (at the time the
+statement using it is executed). Note that this method is useful for insertion but is largely nonsensical in
+``WHERE`` clauses. For instance, a query of the form::
+
+    SELECT * FROM myTable WHERE t = now()
+
+will never return any result by design, since the value returned by ``now()`` is guaranteed to be unique.
+
+``minTimeuuid`` and ``maxTimeuuid``
+###################################
+
+The ``minTimeuuid`` (resp. ``maxTimeuuid``) function takes a ``timestamp`` value ``t`` (which can be `either a timestamp
+or a date string <timestamps>`) and returns a *fake* ``timeuuid`` corresponding to the *smallest* (resp. *biggest*)
+possible ``timeuuid`` having ``t`` as its timestamp. So for instance::
+
+    SELECT * FROM myTable
+     WHERE t > maxTimeuuid('2013-01-01 00:05+0000')
+       AND t < minTimeuuid('2013-02-02 10:00+0000')
+
+will select all rows where the ``timeuuid`` column ``t`` is strictly older than ``'2013-01-01 00:05+0000'`` but strictly
+younger than ``'2013-02-02 10:00+0000'``. Please note that ``t >= maxTimeuuid('2013-01-01 00:05+0000')`` would still
+*not* select a ``timeuuid`` generated exactly at '2013-01-01 00:05+0000' and is essentially equivalent to ``t >
+maxTimeuuid('2013-01-01 00:05+0000')``.
+
+.. note:: We call the values generated by ``minTimeuuid`` and ``maxTimeuuid`` *fake* UUIDs because they do not respect
+   the time-based UUID generation process specified by `RFC 4122 <http://www.ietf.org/rfc/rfc4122.txt>`__. In
+   particular, the values returned by these two methods are not unique. This means you should only use those methods
+   for querying (as in the example above). Inserting the result of those methods is almost certainly *a bad idea*.
+
+Time conversion functions
+`````````````````````````
+
+A number of functions are provided to “convert” a ``timeuuid``, a ``timestamp`` or a ``date`` into another ``native``
+type.
+
+===================== =============== ===================================================================
+ Function name         Input type      Description
+===================== =============== ===================================================================
+ ``toDate``            ``timeuuid``    Converts the ``timeuuid`` argument into a ``date`` type
+ ``toDate``            ``timestamp``   Converts the ``timestamp`` argument into a ``date`` type
+ ``toTimestamp``       ``timeuuid``    Converts the ``timeuuid`` argument into a ``timestamp`` type
+ ``toTimestamp``       ``date``        Converts the ``date`` argument into a ``timestamp`` type
+ ``toUnixTimestamp``   ``timeuuid``    Converts the ``timeuuid`` argument into a ``bigInt`` raw value
+ ``toUnixTimestamp``   ``timestamp``   Converts the ``timestamp`` argument into a ``bigInt`` raw value
+ ``toUnixTimestamp``   ``date``        Converts the ``date`` argument into a ``bigInt`` raw value
+ ``dateOf``            ``timeuuid``    Similar to ``toTimestamp(timeuuid)`` (DEPRECATED)
+ ``unixTimestampOf``   ``timeuuid``    Similar to ``toUnixTimestamp(timeuuid)`` (DEPRECATED)
+===================== =============== ===================================================================
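+
+For instance, reusing the ``myTable`` table (with its ``timeuuid`` column ``t``) from the examples above::
+
+    SELECT toDate(t), toTimestamp(t), toUnixTimestamp(t) FROM myTable;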
+
+Blob conversion functions
+`````````````````````````
+A number of functions are provided to “convert” the native types into binary data (``blob``). For every
+``<native-type>`` ``type`` supported by CQL (a notable exception is ``blob``, for obvious reasons), the function
+``typeAsBlob`` takes an argument of type ``type`` and returns it as a ``blob``. Conversely, the function ``blobAsType``
+takes a ``blob`` argument and converts it back to a value of type ``type``. So for instance, ``bigintAsBlob(3)`` is
+``0x0000000000000003`` and ``blobAsBigint(0x0000000000000003)`` is ``3``.
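+
+For instance, assuming a hypothetical ``bios`` table with an ``id int`` primary key and a ``photo blob`` column::
+
+    INSERT INTO bios (id, photo) VALUES (1, textAsBlob('placeholder text'));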
+
+.. _user-defined-scalar-functions:
+
+User-defined functions
+~~~~~~~~~~~~~~~~~~~~~~
+
+User-defined functions allow execution of user-provided code in Cassandra. By default, Cassandra supports defining
+functions in *Java* and *JavaScript*. Support for other JSR 223 compliant scripting languages (such as Python, Ruby, and
+Scala) can be added by adding a JAR to the classpath.
+
+UDFs are part of the Cassandra schema. As such, they are automatically propagated to all nodes in the cluster.
+
+UDFs can be *overloaded* - i.e. multiple UDFs with different argument types but the same function name. Example::
+
+    CREATE FUNCTION sample ( arg int ) ...;
+    CREATE FUNCTION sample ( arg text ) ...;
+
+User-defined functions are susceptible to all of the normal problems with the chosen programming language. Accordingly,
+implementations should be safe against null pointer exceptions, illegal arguments, or any other potential source of
+exceptions. An exception during function execution will result in the entire statement failing.
+
+It is valid to use *complex* types like collections, tuple types and user-defined types as argument and return types.
+Tuple types and user-defined types are handled by the conversion functions of the DataStax Java Driver. Please see the
+documentation of the Java Driver for details on handling tuple types and user-defined types.
+
+Arguments for functions can be literals or terms. Prepared statement placeholders can be used, too.
+
+Note that you can use the double dollar-sign (``$$``) string syntax to enclose the UDF source code. For example::
+
+    CREATE FUNCTION some_function ( arg int )
+        RETURNS NULL ON NULL INPUT
+        RETURNS int
+        LANGUAGE java
+        AS $$ return arg; $$;
+
+    SELECT some_function(column) FROM atable ...;
+    UPDATE atable SET col = some_function(?) ...;
+
+    CREATE TYPE custom_type (txt text, i int);
+    CREATE FUNCTION fct_using_udt ( udtarg frozen<custom_type> )
+        RETURNS NULL ON NULL INPUT
+        RETURNS text
+        LANGUAGE java
+        AS $$ return udtarg.getString("txt"); $$;
+
+User-defined functions can be used in ``SELECT``, ``INSERT`` and ``UPDATE`` statements.
+
+The implicitly available ``udfContext`` field (or binding for script UDFs) provides the necessary functionality to
+create new UDT and tuple values::
+
+    CREATE TYPE custom_type (txt text, i int);
+    CREATE FUNCTION fct_using_udt ( somearg int )
+        RETURNS NULL ON NULL INPUT
+        RETURNS custom_type
+        LANGUAGE java
+        AS $$
+            UDTValue udt = udfContext.newReturnUDTValue();
+            udt.setString("txt", "some string");
+            udt.setInt("i", 42);
+            return udt;
+        $$;
+
+The definition of the ``UDFContext`` interface can be found in the Apache Cassandra source code for
+``org.apache.cassandra.cql3.functions.UDFContext``.
+
+.. code-block:: java
+
+    public interface UDFContext
+    {
+        UDTValue newArgUDTValue(String argName);
+        UDTValue newArgUDTValue(int argNum);
+        UDTValue newReturnUDTValue();
+        UDTValue newUDTValue(String udtName);
+        TupleValue newArgTupleValue(String argName);
+        TupleValue newArgTupleValue(int argNum);
+        TupleValue newReturnTupleValue();
+        TupleValue newTupleValue(String cqlDefinition);
+    }
+
+Java UDFs already have some imports for common interfaces and classes defined. These imports are:
+
+.. code-block:: java
+
+    import java.nio.ByteBuffer;
+    import java.util.List;
+    import java.util.Map;
+    import java.util.Set;
+    import org.apache.cassandra.cql3.functions.UDFContext;
+    import com.datastax.driver.core.TypeCodec;
+    import com.datastax.driver.core.TupleValue;
+    import com.datastax.driver.core.UDTValue;
+
+Please note that these convenience imports are not available for script UDFs.
+
+.. _create-function-statement:
+
+CREATE FUNCTION
+```````````````
+
+Creating a new user-defined function uses the ``CREATE FUNCTION`` statement:
+
+.. productionlist::
+   create_function_statement: CREATE [ OR REPLACE ] FUNCTION [ IF NOT EXISTS ]
+                            :     `function_name` '(' `arguments_declaration` ')'
+                            :     [ CALLED | RETURNS NULL ] ON NULL INPUT
+                            :     RETURNS `cql_type`
+                            :     LANGUAGE `identifier`
+                            :     AS `string`
+   arguments_declaration: `identifier` `cql_type` ( ',' `identifier` `cql_type` )*
+
+For instance::
+
+    CREATE OR REPLACE FUNCTION somefunction(somearg int, anotherarg text, complexarg frozen<someUDT>, listarg list<bigint>)
+        RETURNS NULL ON NULL INPUT
+        RETURNS text
+        LANGUAGE java
+        AS $$
+            // some Java code
+        $$;
+
+    CREATE FUNCTION IF NOT EXISTS akeyspace.fname(someArg int)
+        CALLED ON NULL INPUT
+        RETURNS text
+        LANGUAGE java
+        AS $$
+            // some Java code
+        $$;
+
+``CREATE FUNCTION`` with the optional ``OR REPLACE`` keywords either creates a function or replaces an existing one with
+the same signature. A ``CREATE FUNCTION`` without ``OR REPLACE`` fails if a function with the same signature already
+exists.
+
+If the optional ``IF NOT EXISTS`` keywords are used, the function will
+only be created if another function with the same signature does not
+exist.
+
+``OR REPLACE`` and ``IF NOT EXISTS`` cannot be used together.
+
+Behavior on invocation with ``null`` values must be defined for each
+function. There are two options:
+
+#. ``RETURNS NULL ON NULL INPUT`` declares that the function will always
+   return ``null`` if any of the input arguments is ``null``.
+#. ``CALLED ON NULL INPUT`` declares that the function will always be
+   executed.
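+
+As a purely illustrative sketch (the function names and bodies below are hypothetical), the difference between the two
+options can be seen with two variants of the same function::
+
+    // Never invoked with a null argument: returns null as soon as any input is null
+    CREATE FUNCTION shout_strict ( input text )
+        RETURNS NULL ON NULL INPUT
+        RETURNS text
+        LANGUAGE java
+        AS $$ return input.toUpperCase(); $$;
+
+    // Always invoked, so the body must handle null itself
+    CREATE FUNCTION shout_lenient ( input text )
+        CALLED ON NULL INPUT
+        RETURNS text
+        LANGUAGE java
+        AS $$ return input == null ? "<no input>" : input.toUpperCase(); $$;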
+
+Function Signature
+##################
+
+Signatures are used to distinguish individual functions. The signature consists of:
+
+#. The fully qualified function name - i.e *keyspace* plus *function-name*
+#. The concatenated list of all argument types
+
+Note that keyspace names, function names and argument types are subject to the default naming conventions and
+case-sensitivity rules.
+
+Functions belong to a keyspace. If no keyspace is specified in ``<function-name>``, the current keyspace is used (i.e.
+the keyspace specified using the ``USE`` statement). It is not possible to create a user-defined function in one of the
+system keyspaces.
+
+.. _drop-function-statement:
+
+DROP FUNCTION
+`````````````
+
+Dropping a function uses the ``DROP FUNCTION`` statement:
+
+.. productionlist::
+   drop_function_statement: DROP FUNCTION [ IF EXISTS ] `function_name` [ '(' `arguments_signature` ')' ]
+   arguments_signature: `cql_type` ( ',' `cql_type` )*
+
+For instance::
+
+    DROP FUNCTION myfunction;
+    DROP FUNCTION mykeyspace.afunction;
+    DROP FUNCTION afunction ( int );
+    DROP FUNCTION afunction ( text );
+
+You must specify the argument types (:token:`arguments_signature`) of the function to drop if there are multiple
+functions with the same name but a different signature (overloaded functions).
+
+``DROP FUNCTION`` with the optional ``IF EXISTS`` keywords drops a function if it exists, but does not throw an error if
+it doesn't exist.
+
+.. _aggregate-functions:
+
+Aggregate functions
+^^^^^^^^^^^^^^^^^^^
+
+Aggregate functions work on a set of rows. They receive values for each row and return one value for the whole set.
+
+If ``normal`` columns, ``scalar functions``, ``UDT`` fields, ``writetime`` or ``ttl`` are selected together with
+aggregate functions, the values returned for them will be the ones of the first row matching the query.
+
+Native aggregates
+~~~~~~~~~~~~~~~~~
+
+.. _count-function:
+
+Count
+`````
+
+The ``count`` function can be used to count the rows returned by a query. Example::
+
+    SELECT COUNT (*) FROM plays;
+    SELECT COUNT (1) FROM plays;
+
+It can also be used to count the non-null values of a given column::
+
+    SELECT COUNT (scores) FROM plays;
+
+Max and Min
+```````````
+
+The ``max`` and ``min`` functions can be used to compute the maximum and the minimum value returned by a query for a
+given column. For instance::
+
+    SELECT MIN (players), MAX (players) FROM plays WHERE game = 'quake';
+
+Sum
+```
+
+The ``sum`` function can be used to sum up all the values returned by a query for a given column. For instance::
+
+    SELECT SUM (players) FROM plays;
+
+Avg
+```
+
+The ``avg`` function can be used to compute the average of all the values returned by a query for a given column. For
+instance::
+
+    SELECT AVG (players) FROM plays;
+
+.. _user-defined-aggregates-functions:
+
+User-Defined Aggregates
+~~~~~~~~~~~~~~~~~~~~~~~
+
+User-defined aggregates allow the creation of custom aggregate functions. Common examples of aggregate functions are
+*count*, *min*, and *max*.
+
+Each aggregate requires an *initial state* (``INITCOND``, which defaults to ``null``) of type ``STYPE``. The first
+argument of the state function must have type ``STYPE``. The remaining arguments of the state function must match the
+types of the user-defined aggregate arguments. The state function is called once for each row, and the value returned by
+the state function becomes the new state. After all rows are processed, the optional ``FINALFUNC`` is executed with the
+last state value as its argument.
+
+``STYPE`` is mandatory in order to be able to distinguish possibly overloaded versions of the state and/or final
+function (since the overload can appear after creation of the aggregate).
+
+User-defined aggregates can be used in ``SELECT`` statements.
+
+A complete working example for user-defined aggregates (assuming that a keyspace has been selected using the ``USE``
+statement)::
+
+    CREATE OR REPLACE FUNCTION averageState(state tuple<int,bigint>, val int)
+        CALLED ON NULL INPUT
+        RETURNS tuple<int,bigint>
+        LANGUAGE java
+        AS '
+            if (val != null) {
+                state.setInt(0, state.getInt(0)+1);
+                state.setLong(1, state.getLong(1)+val.intValue());
+            }
+            return state;
+        ';
+
+    CREATE OR REPLACE FUNCTION averageFinal (state tuple<int,bigint>)
+        CALLED ON NULL INPUT
+        RETURNS double
+        LANGUAGE java
+        AS '
+            double r = 0;
+            if (state.getInt(0) == 0) return null;
+            r = state.getLong(1);
+            r /= state.getInt(0);
+            return Double.valueOf(r);
+        ';
+
+    CREATE OR REPLACE AGGREGATE average(int)
+        SFUNC averageState
+        STYPE tuple<int,bigint>
+        FINALFUNC averageFinal
+        INITCOND (0, 0);
+
+    CREATE TABLE atable (
+        pk int PRIMARY KEY,
+        val int
+    );
+
+    INSERT INTO atable (pk, val) VALUES (1,1);
+    INSERT INTO atable (pk, val) VALUES (2,2);
+    INSERT INTO atable (pk, val) VALUES (3,3);
+    INSERT INTO atable (pk, val) VALUES (4,4);
+
+    SELECT average(val) FROM atable;
+
+.. _create-aggregate-statement:
+
+CREATE AGGREGATE
+````````````````
+
+Creating (or replacing) a user-defined aggregate function uses the ``CREATE AGGREGATE`` statement:
+
+.. productionlist::
+   create_aggregate_statement: CREATE [ OR REPLACE ] AGGREGATE [ IF NOT EXISTS ]
+                             :     `function_name` '(' `arguments_signature` ')'
+                             :     SFUNC `function_name`
+                             :     STYPE `cql_type`
+                             :     [ FINALFUNC `function_name` ]
+                             :     [ INITCOND `term` ]
+
+See above for a complete example.
+
+``CREATE AGGREGATE`` with the optional ``OR REPLACE`` keywords either creates an aggregate or replaces an existing one
+with the same signature. A ``CREATE AGGREGATE`` without ``OR REPLACE`` fails if an aggregate with the same signature
+already exists.
+
+``CREATE AGGREGATE`` with the optional ``IF NOT EXISTS`` keywords creates an aggregate only if it does not already
+exist.
+
+``OR REPLACE`` and ``IF NOT EXISTS`` cannot be used together.
+
+``STYPE`` defines the type of the state value and must be specified.
+
+The optional ``INITCOND`` defines the initial state value for the aggregate. It defaults to ``null``. A non-\ ``null``
+``INITCOND`` must be specified for state functions that are declared with ``RETURNS NULL ON NULL INPUT``.
+
+``SFUNC`` references an existing function to be used as the state modifying function. The type of first argument of the
+state function must match ``STYPE``. The remaining argument types of the state function must match the argument types of
+the aggregate function. State is not updated for state functions declared with ``RETURNS NULL ON NULL INPUT`` and called
+with ``null``.
+
+The optional ``FINALFUNC`` is called just before the aggregate result is returned. It must take only one argument with
+type ``STYPE``. The return type of the ``FINALFUNC`` may be a different type. A final function declared with ``RETURNS
+NULL ON NULL INPUT`` means that the aggregate's return value will be ``null``, if the last state is ``null``.
+
+If no ``FINALFUNC`` is defined, the overall return type of the aggregate function is ``STYPE``. If a ``FINALFUNC`` is
+defined, it is the return type of that function.
+
+.. _drop-aggregate-statement:
+
+DROP AGGREGATE
+``````````````
+
+Dropping a user-defined aggregate function uses the ``DROP AGGREGATE`` statement:
+
+.. productionlist::
+   drop_aggregate_statement: DROP AGGREGATE [ IF EXISTS ] `function_name` [ '(' `arguments_signature` ')' ]
+
+For instance::
+
+    DROP AGGREGATE myAggregate;
+    DROP AGGREGATE myKeyspace.anAggregate;
+    DROP AGGREGATE someAggregate ( int );
+    DROP AGGREGATE someAggregate ( text );
+
+The ``DROP AGGREGATE`` statement removes an aggregate created using ``CREATE AGGREGATE``. You must specify the argument
+types of the aggregate to drop if there are multiple aggregates with the same name but a different signature (overloaded
+aggregates).
+
+``DROP AGGREGATE`` with the optional ``IF EXISTS`` keywords drops an aggregate if it exists, and does nothing if a
+function with the signature does not exist.
diff --git a/doc/source/cql/index.rst b/doc/source/cql/index.rst
new file mode 100644
index 0000000..00d90e4
--- /dev/null
+++ b/doc/source/cql/index.rst
@@ -0,0 +1,47 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. _cql:
+
+The Cassandra Query Language (CQL)
+==================================
+
+This document describes the Cassandra Query Language (CQL) [#]_. Note that this document describes the latest version of
+the language. However, the `changes <#changes>`_ section provides the diff between the different versions of CQL.
+
+CQL offers a model close to SQL in the sense that data is put in *tables* containing *rows* of *columns*. For
+that reason, when used in this document, these terms (tables, rows and columns) have the same definition as they have
+in SQL. But please note that as such, they do **not** refer to the concept of rows and columns found in the deprecated
+thrift API (and earlier versions 1 and 2 of CQL).
+
+.. toctree::
+   :maxdepth: 2
+
+   definitions
+   types
+   ddl
+   dml
+   indexes
+   mvs
+   security
+   functions
+   json
+   triggers
+   appendices
+   changes
+
+.. [#] Technically, this document describes CQL version 3, which is not backward compatible with CQL versions 1 and 2
+   (which have been deprecated and removed) and differs from them in numerous ways.
diff --git a/doc/source/cql/indexes.rst b/doc/source/cql/indexes.rst
new file mode 100644
index 0000000..fbe5827
--- /dev/null
+++ b/doc/source/cql/indexes.rst
@@ -0,0 +1,83 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: sql
+
+.. _secondary-indexes:
+
+Secondary Indexes
+-----------------
+
+CQL supports creating secondary indexes on tables, allowing queries on the table to use those indexes. A secondary index
+is identified by a name defined by:
+
+.. productionlist::
+   index_name: re('[a-zA-Z_0-9]+')
+
+
+
+.. _create-index-statement:
+
+CREATE INDEX
+^^^^^^^^^^^^
+
+Creating a secondary index on a table uses the ``CREATE INDEX`` statement:
+
+.. productionlist::
+   create_index_statement: CREATE [ CUSTOM ] INDEX [ IF NOT EXISTS ] [ `index_name` ]
+                         :     ON `table_name` '(' `index_identifier` ')'
+                         :     [ USING `string` [ WITH OPTIONS = `map_literal` ] ]
+   index_identifier: `column_name`
+                   :| ( KEYS | VALUES | ENTRIES | FULL ) '(' `column_name` ')'
+
+For instance::
+
+    CREATE INDEX userIndex ON NerdMovies (user);
+    CREATE INDEX ON Mutants (abilityId);
+    CREATE INDEX ON users (keys(favs));
+    CREATE CUSTOM INDEX ON users (email) USING 'path.to.the.IndexClass';
+    CREATE CUSTOM INDEX ON users (email) USING 'path.to.the.IndexClass' WITH OPTIONS = {'storage': '/mnt/ssd/indexes/'};
+
+The ``CREATE INDEX`` statement is used to create a new (automatic) secondary index for a given (existing) column in a
+given table. A name for the index itself can be specified before the ``ON`` keyword, if desired. If data already exists
+for the column, it will be indexed asynchronously. After the index is created, new data for the column is indexed
+automatically at insertion time.
+
+Attempting to create an already existing index will return an error unless the ``IF NOT EXISTS`` option is used. If it
+is used, the statement will be a no-op if the index already exists.
+
+Indexes on Map Keys
+~~~~~~~~~~~~~~~~~~~
+
+When creating an index on a :ref:`map <maps>`, you may index either the keys or the values. If the column identifier is
+placed within the ``keys()`` function, the index will be on the map keys, allowing you to use ``CONTAINS KEY`` in
+``WHERE`` clauses. Otherwise, the index will be on the map values.
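+
+For instance, with the ``keys(favs)`` index from the example above (and assuming ``favs`` is a ``map<text, text>``
+column of the ``users`` table), a query such as the following becomes possible::
+
+    SELECT * FROM users WHERE favs CONTAINS KEY 'food';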
+
+.. _drop-index-statement:
+
+DROP INDEX
+^^^^^^^^^^
+
+Dropping a secondary index uses the ``DROP INDEX`` statement:
+
+.. productionlist::
+   drop_index_statement: DROP INDEX [ IF EXISTS ] `index_name`
+
+The ``DROP INDEX`` statement is used to drop an existing secondary index. The argument of the statement is the index
+name, which may optionally specify the keyspace of the index.
+
+If the index does not exist, the statement will return an error, unless ``IF EXISTS`` is used in which case the
+operation is a no-op.
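+
+For instance, reusing the ``userIndex`` index created above (the keyspace-qualified form is shown purely for
+illustration)::
+
+    DROP INDEX userIndex;
+    DROP INDEX userkeyspace.userIndex;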
diff --git a/doc/source/cql/json.rst b/doc/source/cql/json.rst
new file mode 100644
index 0000000..6482fd6
--- /dev/null
+++ b/doc/source/cql/json.rst
@@ -0,0 +1,112 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: sql
+
+.. _cql-json:
+
+JSON Support
+------------
+
+Cassandra 2.2 introduces JSON support to :ref:`SELECT <select-statement>` and :ref:`INSERT <insert-statement>`
+statements. This support does not fundamentally alter the CQL API (for example, the schema is still enforced); it simply
+provides a convenient way to work with JSON documents.
+
+SELECT JSON
+^^^^^^^^^^^
+
+With ``SELECT`` statements, the ``JSON`` keyword can be used to return each row as a single ``JSON`` encoded map. The
+remainder of the ``SELECT`` statement behavior is the same.
+
+The result map keys are the same as the column names in a normal result set. For example, a statement like ``SELECT JSON
+a, ttl(b) FROM ...`` would result in a map with keys ``"a"`` and ``"ttl(b)"``. However, there is one notable exception:
+for symmetry with ``INSERT JSON`` behavior, case-sensitive column names with upper-case letters will be surrounded with
+double quotes. For example, ``SELECT JSON myColumn FROM ...`` would result in a map key ``"\"myColumn\""`` (note the
+escaped quotes).
+
+The map values will be ``JSON``-encoded representations (as described below) of the result set values.
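+
+For instance, assuming a hypothetical ``users`` table with an ``id int`` primary key and an ``age int`` column::
+
+    SELECT JSON id, age FROM users;
+
+    // each row comes back as a single JSON-encoded map, for example:
+    //   {"id": 1, "age": 42}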
+
+INSERT JSON
+^^^^^^^^^^^
+
+With ``INSERT`` statements, the new ``JSON`` keyword can be used to enable inserting a ``JSON`` encoded map as a single
+row. The format of the ``JSON`` map should generally match that returned by a ``SELECT JSON`` statement on the same
+table. In particular, case-sensitive column names should be surrounded with double quotes. For example, to insert into a
+table with two columns named "myKey" and "value", you would do the following::
+
+    INSERT INTO mytable JSON '{ "\"myKey\"": 0, "value": 0}'
+
+Any columns which are omitted from the ``JSON`` map will be defaulted to a ``NULL`` value (which will result in a
+tombstone being created).
+
+JSON Encoding of Cassandra Data Types
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Where possible, Cassandra will represent and accept data types in their native ``JSON`` representation. Cassandra will
+also accept string representations matching the CQL literal format for all single-field types. For example, floats,
+ints, UUIDs, and dates can be represented by CQL literal strings. However, compound types, such as collections, tuples,
+and user-defined types must be represented by native ``JSON`` collections (maps and lists) or a JSON-encoded string
+representation of the collection.
+
+The following table describes the encodings that Cassandra will accept in ``INSERT JSON`` values (and ``fromJson()``
+arguments) as well as the format Cassandra will use when returning data for ``SELECT JSON`` statements (and
+``fromJson()``):
+
+=============== ======================== =============== ==============================================================
+ Type            Formats accepted         Return format   Notes
+=============== ======================== =============== ==============================================================
+ ``ascii``       string                   string          Uses JSON's ``\u`` character escape
+ ``bigint``      integer, string          integer         String must be valid 64 bit integer
+ ``blob``        string                   string          String should be 0x followed by an even number of hex digits
+ ``boolean``     boolean, string          boolean         String must be "true" or "false"
+ ``date``        string                   string          Date in format ``YYYY-MM-DD``, timezone UTC
+ ``decimal``     integer, float, string   float           May exceed 32 or 64-bit IEEE-754 floating point precision in
+                                                          client-side decoder
+ ``double``      integer, float, string   float           String must be valid integer or float
+ ``float``       integer, float, string   float           String must be valid integer or float
+ ``inet``        string                   string          IPv4 or IPv6 address
+ ``int``         integer, string          integer         String must be valid 32 bit integer
+ ``list``        list, string             list            Uses JSON's native list representation
+ ``map``         map, string              map             Uses JSON's native map representation
+ ``smallint``    integer, string          integer         String must be valid 16 bit integer
+ ``set``         list, string             list            Uses JSON's native list representation
+ ``text``        string                   string          Uses JSON's ``\u`` character escape
+ ``time``        string                   string          Time of day in format ``HH-MM-SS[.fffffffff]``
+ ``timestamp``   integer, string          string          A timestamp. String constants allow inputting :ref:`timestamps
+                                                          as dates <timestamps>`. Timestamps with format ``YYYY-MM-DD
+                                                          HH:MM:SS.SSS`` are returned.
+ ``timeuuid``    string                   string          Type 1 UUID. See :token:`constant` for the UUID format
+ ``tinyint``     integer, string          integer         String must be valid 8 bit integer
+ ``tuple``       list, string             list            Uses JSON's native list representation
+ ``UDT``         map, string              map             Uses JSON's native map representation with field names as keys
+ ``uuid``        string                   string          See :token:`constant` for the UUID format
+ ``varchar``     string                   string          Uses JSON's ``\u`` character escape
+ ``varint``      integer, string          integer         Variable length; may overflow 32 or 64 bit integers in
+                                                          client-side decoder
+=============== ======================== =============== ==============================================================
+
+The fromJson() Function
+^^^^^^^^^^^^^^^^^^^^^^^
+
+The ``fromJson()`` function may be used similarly to ``INSERT JSON``, but for a single column value. It may only be used
+in the ``VALUES`` clause of an ``INSERT`` statement or as one of the column values in an ``UPDATE``, ``DELETE``, or
+``SELECT`` statement. For example, it cannot be used in the selection clause of a ``SELECT`` statement.
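+
+For instance, assuming a hypothetical ``scores`` table with an ``id int`` primary key and a ``tags map<text, int>``
+column::
+
+    INSERT INTO scores (id, tags) VALUES (1, fromJson('{"quality": 5, "speed": 3}'));
+
+    UPDATE scores SET tags = fromJson('{"quality": 4}') WHERE id = 1;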
+
+The toJson() Function
+^^^^^^^^^^^^^^^^^^^^^
+
+The ``toJson()`` function may be used similarly to ``SELECT JSON``, but for a single column value. It may only be used
+in the selection clause of a ``SELECT`` statement.
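+
+For instance, reusing the hypothetical ``scores`` table from the previous example::
+
+    SELECT id, toJson(tags) FROM scores;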
diff --git a/doc/source/cql/mvs.rst b/doc/source/cql/mvs.rst
new file mode 100644
index 0000000..84c18e0
--- /dev/null
+++ b/doc/source/cql/mvs.rst
@@ -0,0 +1,166 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: sql
+
+.. _materialized-views:
+
+Materialized Views
+------------------
+
+Materialized views names are defined by:
+
+.. productionlist::
+   view_name: re('[a-zA-Z_0-9]+')
+
+
+.. _create-materialized-view-statement:
+
+CREATE MATERIALIZED VIEW
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+You can create a materialized view on a table using a ``CREATE MATERIALIZED VIEW`` statement:
+
+.. productionlist::
+   create_materialized_view_statement: CREATE MATERIALIZED VIEW [ IF NOT EXISTS ] `view_name` AS
+                                     :     `select_statement`
+                                     :     PRIMARY KEY '(' `primary_key` ')'
+                                     :     WITH `table_options`
+
+For instance::
+
+    CREATE MATERIALIZED VIEW monkeySpecies_by_population AS
+        SELECT * FROM monkeySpecies
+        WHERE population IS NOT NULL AND species IS NOT NULL
+        PRIMARY KEY (population, species)
+        WITH comment='Allow query by population instead of species';
+
+The ``CREATE MATERIALIZED VIEW`` statement creates a new materialized view. Each such view is a set of *rows* which
+corresponds to rows which are present in the underlying, or base, table specified in the ``SELECT`` statement. A
+materialized view cannot be directly updated, but updates to the base table will cause corresponding updates in the
+view.
+
+Creating a materialized view has 3 main parts:
+
+- The :ref:`select statement <mv-select>` that restrict the data included in the view.
+- The :ref:`primary key <mv-primary-key>` definition for the view.
+- The :ref:`options <mv-options>` for the view.
+
+Attempting to create an already existing materialized view will return an error unless the ``IF NOT EXISTS`` option is
+used. If it is used, the statement will be a no-op if the materialized view already exists.
+
+.. _mv-select:
+
+MV select statement
+```````````````````
+
+The select statement of a materialized view creation defines which rows of the base table are included in the view. That
+statement is limited in a number of ways:
+
+- the :ref:`selection <selection-clause>` is limited to selecting only columns of the base table. In other
+  words, you can't use any function (aggregate or not), casting, term, etc. Aliases are also not supported. You can
+  however use `*` as a shortcut of selecting all columns. Further, :ref:`static columns <static-columns>` cannot be
+  included in a materialized view (which means ``SELECT *`` isn't allowed if the base table has static columns).
+- the ``WHERE`` clause has the following restrictions:
+
+  - it cannot include any :token:`bind_marker`.
+  - the columns that are not part of the *base table* primary key can only be restricted by an ``IS NOT NULL``
+    restriction. No other restriction is allowed.
+  - as the columns that are part of the *view* primary key cannot be null, they must always be at least restricted by an
+    ``IS NOT NULL`` restriction (or any other restriction, but they must have one).
+
+- it cannot have an :ref:`ordering clause <ordering-clause>`, a :ref:`limit <limit-clause>`, or :ref:`ALLOW
+  FILTERING <allow-filtering>`.
+
+.. _mv-primary-key:
+
+MV primary key
+``````````````
+
+A view must have a primary key and that primary key must conform to the following restrictions:
+
+- it must contain all the primary key columns of the base table. This ensures that every row of the view corresponds to
+  exactly one row of the base table.
+- it can only contain a single column that is not a primary key column in the base table.
+
+So for instance, given the following base table definition::
+
+    CREATE TABLE t (
+        k int,
+        c1 int,
+        c2 int,
+        v1 int,
+        v2 int,
+        PRIMARY KEY (k, c1, c2)
+    )
+
+then the following view definitions are allowed::
+
+    CREATE MATERIALIZED VIEW mv1 AS
+        SELECT * FROM t WHERE k IS NOT NULL AND c1 IS NOT NULL AND c2 IS NOT NULL
+        PRIMARY KEY (c1, k, c2)
+
+    CREATE MATERIALIZED VIEW mv1 AS
+        SELECT * FROM t WHERE k IS NOT NULL AND c1 IS NOT NULL AND c2 IS NOT NULL
+        PRIMARY KEY (v1, k, c1, c2)
+
+but the following ones are **not** allowed::
+
+    // Error: cannot include both v1 and v2 in the primary key as both are not in the base table primary key
+    CREATE MATERIALIZED VIEW mv1 AS
+        SELECT * FROM t WHERE k IS NOT NULL AND c1 IS NOT NULL AND c2 IS NOT NULL AND v1 IS NOT NULL
+        PRIMARY KEY (v1, v2, k, c1, c2)
+
+    // Error: must include k in the primary key as it's a base table primary key column
+    CREATE MATERIALIZED VIEW mv1 AS
+        SELECT * FROM t WHERE c1 IS NOT NULL AND c2 IS NOT NULL
+        PRIMARY KEY (c1, c2)
+
+
+.. _mv-options:
+
+MV options
+``````````
+
+A materialized view is internally implemented by a table and as such, creating a MV allows the :ref:`same options as
+creating a table <create-table-options>`.
+
+
+.. _alter-materialized-view-statement:
+
+ALTER MATERIALIZED VIEW
+^^^^^^^^^^^^^^^^^^^^^^^
+
+After creation, you can alter the options of a materialized view using the ``ALTER MATERIALIZED VIEW`` statement:
+
+.. productionlist::
+   alter_materialized_view_statement: ALTER MATERIALIZED VIEW `view_name` WITH `table_options`
+
+The options that can be updated are the same as at creation time and thus the :ref:`same as for tables
+<create-table-options>`.
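+
+For instance, updating the comment of the view created in the example above::
+
+    ALTER MATERIALIZED VIEW monkeySpecies_by_population WITH comment = 'Allow query by population instead of species';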
+
+.. _drop-materialized-view-statement:
+
+DROP MATERIALIZED VIEW
+^^^^^^^^^^^^^^^^^^^^^^
+
+Dropping a materialized view uses the ``DROP MATERIALIZED VIEW`` statement:
+
+.. productionlist::
+   drop_materialized_view_statement: DROP MATERIALIZED VIEW [ IF EXISTS ] `view_name`;
+
+If the materialized view does not exist, the statement will return an error, unless ``IF EXISTS`` is used in which case
+the operation is a no-op.
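+
+For instance, dropping the view created in the example above::
+
+    DROP MATERIALIZED VIEW monkeySpecies_by_population;
+    DROP MATERIALIZED VIEW IF EXISTS monkeySpecies_by_population;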
diff --git a/doc/source/cql/security.rst b/doc/source/cql/security.rst
new file mode 100644
index 0000000..aa65383
--- /dev/null
+++ b/doc/source/cql/security.rst
@@ -0,0 +1,497 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: sql
+
+.. _cql-security:
+
+Security
+--------
+
+.. _cql-roles:
+
+Database Roles
+^^^^^^^^^^^^^^
+
+.. _create-role-statement:
+
+CREATE ROLE
+~~~~~~~~~~~
+
+Creating a role uses the ``CREATE ROLE`` statement:
+
+.. productionlist::
+   create_role_statement: CREATE ROLE [ IF NOT EXISTS ] `role_name`
+                        :     [ WITH `role_options` ]
+   role_options: `role_option` ( AND `role_option` )*
+   role_option: PASSWORD '=' `string`
+              :| LOGIN '=' `boolean`
+              :| SUPERUSER '=' `boolean`
+              :| OPTIONS '=' `map_literal`
+
+For instance::
+
+    CREATE ROLE new_role;
+    CREATE ROLE alice WITH PASSWORD = 'password_a' AND LOGIN = true;
+    CREATE ROLE bob WITH PASSWORD = 'password_b' AND LOGIN = true AND SUPERUSER = true;
+    CREATE ROLE carlos WITH OPTIONS = { 'custom_option1' : 'option1_value', 'custom_option2' : 99 };
+
+By default roles do not possess ``LOGIN`` privileges or ``SUPERUSER`` status.
+
+:ref:`Permissions <cql-permissions>` on database resources are granted to roles; types of resources include keyspaces,
+tables, functions and roles themselves. Roles may be granted to other roles to create hierarchical permissions
+structures; in these hierarchies, permissions and ``SUPERUSER`` status are inherited, but the ``LOGIN`` privilege is
+not.
+
+If a role has the ``LOGIN`` privilege, clients may identify as that role when connecting. For the duration of that
+connection, the client will acquire any roles and privileges granted to that role.
+
+Only a client with the ``CREATE`` permission on the database roles resource may issue ``CREATE ROLE`` requests (see
+the :ref:`relevant section <cql-permissions>` below), unless the client is a ``SUPERUSER``. Role management in Cassandra
+is pluggable and custom implementations may support only a subset of the listed options.
+
+Role names should be quoted if they contain non-alphanumeric characters.
+
+.. _setting-credentials-for-internal-authentication:
+
+Setting credentials for internal authentication
+```````````````````````````````````````````````
+
+Use the ``WITH PASSWORD`` clause to set a password for internal authentication, enclosing the password in single
+quotation marks.
+
+If internal authentication has not been set up or the role does not have ``LOGIN`` privileges, the ``WITH PASSWORD``
+clause is not necessary.
+
+Creating a role conditionally
+`````````````````````````````
+
+Attempting to create an existing role results in an invalid query condition unless the ``IF NOT EXISTS`` option is used.
+If the option is used and the role exists, the statement is a no-op::
+
+    CREATE ROLE other_role;
+    CREATE ROLE IF NOT EXISTS other_role;
+
+
+.. _alter-role-statement:
+
+ALTER ROLE
+~~~~~~~~~~
+
+Altering the options of a role uses the ``ALTER ROLE`` statement:
+
+.. productionlist::
+   alter_role_statement: ALTER ROLE `role_name` WITH `role_options`
+
+For instance::
+
+    ALTER ROLE bob WITH PASSWORD = 'PASSWORD_B' AND SUPERUSER = false;
+
+Conditions on executing ``ALTER ROLE`` statements:
+
+-  A client must have ``SUPERUSER`` status to alter the ``SUPERUSER`` status of another role
+-  A client cannot alter the ``SUPERUSER`` status of any role it currently holds
+-  A client can only modify certain properties of the role with which it identified at login (e.g. ``PASSWORD``)
+-  To modify properties of a role, the client must be granted ``ALTER`` :ref:`permission <cql-permissions>` on that role
+
+.. _drop-role-statement:
+
+DROP ROLE
+~~~~~~~~~
+
+Dropping a role uses the ``DROP ROLE`` statement:
+
+.. productionlist::
+   drop_role_statement: DROP ROLE [ IF EXISTS ] `role_name`
+
+``DROP ROLE`` requires the client to have ``DROP`` :ref:`permission <cql-permissions>` on the role in question. In
+addition, a client may not ``DROP`` the role with which it identified at login. Finally, only a client with ``SUPERUSER``
+status may ``DROP`` another ``SUPERUSER`` role.
+
+Attempting to drop a role which does not exist results in an invalid query condition unless the ``IF EXISTS`` option is
+used. If the option is used and the role does not exist the statement is a no-op.
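+
+As an illustrative example, reusing the roles created in the ``CREATE ROLE`` examples above::
+
+    DROP ROLE new_role;
+    DROP ROLE IF EXISTS other_role;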
+
+.. _grant-role-statement:
+
+GRANT ROLE
+~~~~~~~~~~
+
+Granting a role to another uses the ``GRANT ROLE`` statement:
+
+.. productionlist::
+   grant_role_statement: GRANT `role_name` TO `role_name`
+
+For instance::
+
+    GRANT report_writer TO alice;
+
+This statement grants the ``report_writer`` role to ``alice``. Any permissions granted to ``report_writer`` are also
+acquired by ``alice``.
+
+Roles are modelled as a directed acyclic graph, so circular grants are not permitted. The following examples result in
+error conditions::
+
+    GRANT role_a TO role_b;
+    GRANT role_b TO role_a;
+
+    GRANT role_a TO role_b;
+    GRANT role_b TO role_c;
+    GRANT role_c TO role_a;
+
+.. _revoke-role-statement:
+
+REVOKE ROLE
+~~~~~~~~~~~
+
+Revoking a role uses the ``REVOKE ROLE`` statement:
+
+.. productionlist::
+   revoke_role_statement: REVOKE `role_name` FROM `role_name`
+
+For instance::
+
+    REVOKE report_writer FROM alice;
+
+This statement revokes the ``report_writer`` role from ``alice``. Any permissions that ``alice`` has acquired via the
+``report_writer`` role are also revoked.
+
+.. _list-roles-statement:
+
+LIST ROLES
+~~~~~~~~~~
+
+All the known roles (in the system or granted to a specific role) can be listed using the ``LIST ROLES`` statement:
+
+.. productionlist::
+   list_roles_statement: LIST ROLES [ OF `role_name` ] [ NORECURSIVE ]
+
+For instance::
+
+    LIST ROLES;
+
+returns all known roles in the system; this requires ``DESCRIBE`` permission on the database roles resource. And::
+
+    LIST ROLES OF alice;
+
+enumerates all roles granted to ``alice``, including those transitively acquired. But::
+
+    LIST ROLES OF bob NORECURSIVE;
+
+lists all roles directly granted to ``bob`` without including any of the transitively acquired ones.
+
+Users
+^^^^^
+
+Prior to the introduction of roles in Cassandra 2.2, authentication and authorization were based around the concept of a
+``USER``. For backward compatibility, the legacy syntax has been preserved with ``USER`` centric statements becoming
+synonyms for the ``ROLE`` based equivalents. In other words, creating/updating a user is just a different syntax for
+creating/updating a role.
+
+.. _create-user-statement:
+
+CREATE USER
+~~~~~~~~~~~
+
+Creating a user uses the ``CREATE USER`` statement:
+
+.. productionlist::
+   create_user_statement: CREATE USER [ IF NOT EXISTS ] `role_name` [ WITH PASSWORD `string` ] [ `user_option` ]
+   user_option: SUPERUSER | NOSUPERUSER
+
+For instance::
+
+    CREATE USER alice WITH PASSWORD 'password_a' SUPERUSER;
+    CREATE USER bob WITH PASSWORD 'password_b' NOSUPERUSER;
+
+``CREATE USER`` is equivalent to ``CREATE ROLE`` where the ``LOGIN`` option is ``true``. So, the following pairs of
+statements are equivalent::
+
+    CREATE USER alice WITH PASSWORD 'password_a' SUPERUSER;
+    CREATE ROLE alice WITH PASSWORD = 'password_a' AND LOGIN = true AND SUPERUSER = true;
+
+    CREATE USER IF NOT EXISTS alice WITH PASSWORD 'password_a' SUPERUSER;
+    CREATE ROLE IF NOT EXISTS alice WITH PASSWORD = 'password_a' AND LOGIN = true AND SUPERUSER = true;
+
+    CREATE USER alice WITH PASSWORD 'password_a' NOSUPERUSER;
+    CREATE ROLE alice WITH PASSWORD = 'password_a' AND LOGIN = true AND SUPERUSER = false;
+
+    CREATE USER alice WITH PASSWORD 'password_a' NOSUPERUSER;
+    CREATE ROLE alice WITH PASSWORD = 'password_a' AND LOGIN = true;
+
+    CREATE USER alice WITH PASSWORD 'password_a';
+    CREATE ROLE alice WITH PASSWORD = 'password_a' AND LOGIN = true;
+
+.. _alter-user-statement:
+
+ALTER USER
+~~~~~~~~~~
+
+Altering the options of a user uses the ``ALTER USER`` statement:
+
+.. productionlist::
+   alter_user_statement: ALTER USER `role_name` [ WITH PASSWORD `string` ] [ `user_option` ]
+
+For instance::
+
+    ALTER USER alice WITH PASSWORD 'PASSWORD_A';
+    ALTER USER bob SUPERUSER;
+
+.. _drop-user-statement:
+
+DROP USER
+~~~~~~~~~
+
+Dropping a user uses the ``DROP USER`` statement:
+
+.. productionlist::
+   drop_user_statement: DROP USER [ IF EXISTS ] `role_name`
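+
+For instance, dropping the example users created above might look like::
+
+    DROP USER alice;
+    DROP USER IF EXISTS bob;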
+
+.. _list-users-statement:
+
+LIST USERS
+~~~~~~~~~~
+
+Existing users can be listed using the ``LIST USERS`` statement:
+
+.. productionlist::
+   list_users_statement: LIST USERS
+
+Note that this statement is equivalent to::
+
+    LIST ROLES;
+
+but only roles with the ``LOGIN`` privilege are included in the output.
+
+Data Control
+^^^^^^^^^^^^
+
+.. _cql-permissions:
+
+Permissions
+~~~~~~~~~~~
+
+Permissions on resources are granted to roles; there are several different types of resources in Cassandra and each type
+is modelled hierarchically:
+
+- The hierarchy of Data resources, Keyspaces and Tables has the structure ``ALL KEYSPACES`` -> ``KEYSPACE`` ->
+  ``TABLE``.
+- Function resources have the structure ``ALL FUNCTIONS`` -> ``KEYSPACE`` -> ``FUNCTION``
+- Resources representing roles have the structure ``ALL ROLES`` -> ``ROLE``
+- Resources representing JMX ObjectNames, which map to sets of MBeans/MXBeans, have the structure ``ALL MBEANS`` ->
+  ``MBEAN``
+
+Permissions can be granted at any level of these hierarchies and they flow downwards. So granting a permission on a
+resource higher up the chain automatically grants that same permission on all resources lower down. For example,
+granting ``SELECT`` on a ``KEYSPACE`` automatically grants it on all ``TABLES`` in that ``KEYSPACE``. Likewise, granting
+a permission on ``ALL FUNCTIONS`` grants it on every defined function, regardless of which keyspace it is scoped in. It
+is also possible to grant permissions on all functions scoped to a particular keyspace.
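+
+As an illustrative sketch (``keyspace1`` and ``data_reader`` are a hypothetical keyspace and role), a single grant high
+in the hierarchy covers every resource beneath it::
+
+    -- grants SELECT on every existing and future table in keyspace1
+    GRANT SELECT ON KEYSPACE keyspace1 TO data_reader;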
+
+Modifications to permissions are visible to existing client sessions; that is, connections need not be re-established
+following permissions changes.
+
+The full set of available permissions is:
+
+- ``CREATE``
+- ``ALTER``
+- ``DROP``
+- ``SELECT``
+- ``MODIFY``
+- ``AUTHORIZE``
+- ``DESCRIBE``
+- ``EXECUTE``
+
+Not all permissions are applicable to every type of resource. For instance, ``EXECUTE`` is only relevant in the context
+of functions or mbeans; granting ``EXECUTE`` on a resource representing a table is nonsensical. Attempting to ``GRANT``
+a permission on a resource to which it cannot be applied results in an error response. The following illustrates which
+permissions can be granted on which types of resource, and which statements are enabled by that permission.
+
+=============== =============================== =======================================================================
+ Permission      Resource                        Operations
+=============== =============================== =======================================================================
+ ``CREATE``      ``ALL KEYSPACES``               ``CREATE KEYSPACE`` and ``CREATE TABLE`` in any keyspace
+ ``CREATE``      ``KEYSPACE``                    ``CREATE TABLE`` in specified keyspace
+ ``CREATE``      ``ALL FUNCTIONS``               ``CREATE FUNCTION`` in any keyspace and ``CREATE AGGREGATE`` in any
+                                                 keyspace
+ ``CREATE``      ``ALL FUNCTIONS IN KEYSPACE``   ``CREATE FUNCTION`` and ``CREATE AGGREGATE`` in specified keyspace
+ ``CREATE``      ``ALL ROLES``                   ``CREATE ROLE``
+ ``ALTER``       ``ALL KEYSPACES``               ``ALTER KEYSPACE`` and ``ALTER TABLE`` in any keyspace
+ ``ALTER``       ``KEYSPACE``                    ``ALTER KEYSPACE`` and ``ALTER TABLE`` in specified keyspace
+ ``ALTER``       ``TABLE``                       ``ALTER TABLE``
+ ``ALTER``       ``ALL FUNCTIONS``               ``CREATE FUNCTION`` and ``CREATE AGGREGATE``: replacing any existing
+ ``ALTER``       ``ALL FUNCTIONS IN KEYSPACE``   ``CREATE FUNCTION`` and ``CREATE AGGREGATE``: replacing existing in
+                                                 specified keyspace
+ ``ALTER``       ``FUNCTION``                    ``CREATE FUNCTION`` and ``CREATE AGGREGATE``: replacing existing
+ ``ALTER``       ``ALL ROLES``                   ``ALTER ROLE`` on any role
+ ``ALTER``       ``ROLE``                        ``ALTER ROLE``
+ ``DROP``        ``ALL KEYSPACES``               ``DROP KEYSPACE`` and ``DROP TABLE`` in any keyspace
+ ``DROP``        ``KEYSPACE``                    ``DROP TABLE`` in specified keyspace
+ ``DROP``        ``TABLE``                       ``DROP TABLE``
+ ``DROP``        ``ALL FUNCTIONS``               ``DROP FUNCTION`` and ``DROP AGGREGATE`` in any keyspace
+ ``DROP``        ``ALL FUNCTIONS IN KEYSPACE``   ``DROP FUNCTION`` and ``DROP AGGREGATE`` in specified keyspace
+ ``DROP``        ``FUNCTION``                    ``DROP FUNCTION``
+ ``DROP``        ``ALL ROLES``                   ``DROP ROLE`` on any role
+ ``DROP``        ``ROLE``                        ``DROP ROLE``
+ ``SELECT``      ``ALL KEYSPACES``               ``SELECT`` on any table
+ ``SELECT``      ``KEYSPACE``                    ``SELECT`` on any table in specified keyspace
+ ``SELECT``      ``TABLE``                       ``SELECT`` on specified table
+ ``SELECT``      ``ALL MBEANS``                  Call getter methods on any mbean
+ ``SELECT``      ``MBEANS``                      Call getter methods on any mbean matching a wildcard pattern
+ ``SELECT``      ``MBEAN``                       Call getter methods on named mbean
+ ``MODIFY``      ``ALL KEYSPACES``               ``INSERT``, ``UPDATE``, ``DELETE`` and ``TRUNCATE`` on any table
+ ``MODIFY``      ``KEYSPACE``                    ``INSERT``, ``UPDATE``, ``DELETE`` and ``TRUNCATE`` on any table in
+                                                 specified keyspace
+ ``MODIFY``      ``TABLE``                       ``INSERT``, ``UPDATE``, ``DELETE`` and ``TRUNCATE`` on specified table
+ ``MODIFY``      ``ALL MBEANS``                  Call setter methods on any mbean
+ ``MODIFY``      ``MBEANS``                      Call setter methods on any mbean matching a wildcard pattern
+ ``MODIFY``      ``MBEAN``                       Call setter methods on named mbean
+ ``AUTHORIZE``   ``ALL KEYSPACES``               ``GRANT PERMISSION`` and ``REVOKE PERMISSION`` on any table
+ ``AUTHORIZE``   ``KEYSPACE``                    ``GRANT PERMISSION`` and ``REVOKE PERMISSION`` on any table in
+                                                 specified keyspace
+ ``AUTHORIZE``   ``TABLE``                       ``GRANT PERMISSION`` and ``REVOKE PERMISSION`` on specified table
+ ``AUTHORIZE``   ``ALL FUNCTIONS``               ``GRANT PERMISSION`` and ``REVOKE PERMISSION`` on any function
+ ``AUTHORIZE``   ``ALL FUNCTIONS IN KEYSPACE``   ``GRANT PERMISSION`` and ``REVOKE PERMISSION`` in specified keyspace
+ ``AUTHORIZE``   ``FUNCTION``                    ``GRANT PERMISSION`` and ``REVOKE PERMISSION`` on specified function
+ ``AUTHORIZE``   ``ALL MBEANS``                  ``GRANT PERMISSION`` and ``REVOKE PERMISSION`` on any mbean
+ ``AUTHORIZE``   ``MBEANS``                      ``GRANT PERMISSION`` and ``REVOKE PERMISSION`` on any mbean matching
+                                                 a wildcard pattern
+ ``AUTHORIZE``   ``MBEAN``                       ``GRANT PERMISSION`` and ``REVOKE PERMISSION`` on named mbean
+ ``AUTHORIZE``   ``ALL ROLES``                   ``GRANT ROLE`` and ``REVOKE ROLE`` on any role
+ ``AUTHORIZE``   ``ROLES``                       ``GRANT ROLE`` and ``REVOKE ROLE`` on specified roles
+ ``DESCRIBE``    ``ALL ROLES``                   ``LIST ROLES`` on all roles or only roles granted to another,
+                                                 specified role
+ ``DESCRIBE``    ``ALL MBEANS``                  Retrieve metadata about any mbean from the platform's MBeanServer
+ ``DESCRIBE``    ``MBEANS``                      Retrieve metadata about any mbean matching a wildcard pattern from the
+                                                 platform's MBeanServer
+ ``DESCRIBE``    ``MBEAN``                       Retrieve metadata about a named mbean from the platform's MBeanServer
+ ``EXECUTE``     ``ALL FUNCTIONS``               ``SELECT``, ``INSERT`` and ``UPDATE`` using any function, and use of
+                                                 any function in ``CREATE AGGREGATE``
+ ``EXECUTE``     ``ALL FUNCTIONS IN KEYSPACE``   ``SELECT``, ``INSERT`` and ``UPDATE`` using any function in specified
+                                                 keyspace and use of any function in keyspace in ``CREATE AGGREGATE``
+ ``EXECUTE``     ``FUNCTION``                    ``SELECT``, ``INSERT`` and ``UPDATE`` using specified function and use
+                                                 of the function in ``CREATE AGGREGATE``
+ ``EXECUTE``     ``ALL MBEANS``                  Execute operations on any mbean
+ ``EXECUTE``     ``MBEANS``                      Execute operations on any mbean matching a wildcard pattern
+ ``EXECUTE``     ``MBEAN``                       Execute operations on named mbean
+=============== =============================== =======================================================================
+
+.. _grant-permission-statement:
+
+GRANT PERMISSION
+~~~~~~~~~~~~~~~~
+
+Granting a permission uses the ``GRANT PERMISSION`` statement:
+
+.. productionlist::
+   grant_permission_statement: GRANT `permissions` ON `resource` TO `role_name`
+   permissions: ALL [ PERMISSIONS ] | `permission` [ PERMISSION ]
+   permission: CREATE | ALTER | DROP | SELECT | MODIFY | AUTHORIZE | DESCRIBE | EXECUTE
+   resource: ALL KEYSPACES
+           :| KEYSPACE `keyspace_name`
+           :| [ TABLE ] `table_name`
+           :| ALL ROLES
+           :| ROLE `role_name`
+           :| ALL FUNCTIONS [ IN KEYSPACE `keyspace_name` ]
+           :| FUNCTION `function_name` '(' [ `cql_type` ( ',' `cql_type` )* ] ')'
+           :| ALL MBEANS
+           :| ( MBEAN | MBEANS ) `string`
+
+For instance::
+
+    GRANT SELECT ON ALL KEYSPACES TO data_reader;
+
+This gives any user with the role ``data_reader`` permission to execute ``SELECT`` statements on any table across all
+keyspaces::
+
+    GRANT MODIFY ON KEYSPACE keyspace1 TO data_writer;
+
+This gives any user with the role ``data_writer`` permission to perform ``INSERT``, ``UPDATE``, ``DELETE`` and
+``TRUNCATE`` queries on all tables in the ``keyspace1`` keyspace::
+
+    GRANT DROP ON keyspace1.table1 TO schema_owner;
+
+This gives any user with the ``schema_owner`` role permissions to ``DROP`` ``keyspace1.table1``::
+
+    GRANT EXECUTE ON FUNCTION keyspace1.user_function( int ) TO report_writer;
+
+This grants any user with the ``report_writer`` role permission to execute ``SELECT``, ``INSERT`` and ``UPDATE`` queries
+which use the function ``keyspace1.user_function( int )``::
+
+    GRANT DESCRIBE ON ALL ROLES TO role_admin;
+
+This grants any user with the ``role_admin`` role permission to view any and all roles in the system with a ``LIST
+ROLES`` statement.
+
+.. _grant-all:
+
+GRANT ALL
+`````````
+
+When the ``GRANT ALL`` form is used, the appropriate set of permissions is determined automatically based on the target
+resource.
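+
+For instance, the following illustrative statement grants every permission applicable to a table (``ALTER``, ``DROP``,
+``SELECT``, ``MODIFY`` and ``AUTHORIZE`` per the table above) on the hypothetical table ``keyspace1.table1``::
+
+    GRANT ALL PERMISSIONS ON keyspace1.table1 TO schema_owner;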
+
+Automatic Granting
+``````````````````
+
+When a resource is created, via a ``CREATE KEYSPACE``, ``CREATE TABLE``, ``CREATE FUNCTION``, ``CREATE AGGREGATE`` or
+``CREATE ROLE`` statement, the creator (the role the database user who issues the statement is identified as) is
+automatically granted all applicable permissions on the new resource.
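+
+For example, if a user identified as the ``schema_owner`` role creates the hypothetical table below, ``schema_owner`` is
+automatically granted ``ALTER``, ``DROP``, ``SELECT``, ``MODIFY`` and ``AUTHORIZE`` on it::
+
+    CREATE TABLE keyspace1.table2 (id int PRIMARY KEY, value text);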
+
+.. _revoke-permission-statement:
+
+REVOKE PERMISSION
+~~~~~~~~~~~~~~~~~
+
+Revoking a permission from a role uses the ``REVOKE PERMISSION`` statement:
+
+.. productionlist::
+   revoke_permission_statement: REVOKE `permissions` ON `resource` FROM `role_name`
+
+For instance::
+
+    REVOKE SELECT ON ALL KEYSPACES FROM data_reader;
+    REVOKE MODIFY ON KEYSPACE keyspace1 FROM data_writer;
+    REVOKE DROP ON keyspace1.table1 FROM schema_owner;
+    REVOKE EXECUTE ON FUNCTION keyspace1.user_function( int ) FROM report_writer;
+    REVOKE DESCRIBE ON ALL ROLES FROM role_admin;
+
+.. _list-permissions-statement:
+
+LIST PERMISSIONS
+~~~~~~~~~~~~~~~~
+
+Listing granted permissions uses the ``LIST PERMISSIONS`` statement:
+
+.. productionlist::
+   list_permissions_statement: LIST `permissions` [ ON `resource` ] [ OF `role_name` [ NORECURSIVE ] ]
+
+For instance::
+
+    LIST ALL PERMISSIONS OF alice;
+
+Show all permissions granted to ``alice``, including those acquired transitively from any other roles::
+
+    LIST ALL PERMISSIONS ON keyspace1.table1 OF bob;
+
+Show all permissions on ``keyspace1.table1`` granted to ``bob``, including those acquired transitively from any other
+roles. This also includes any permissions higher up the resource hierarchy which can be applied to ``keyspace1.table1``.
+For example, should ``bob`` have ``ALTER`` permission on ``keyspace1``, that would be included in the results of this
+query. Adding the ``NORECURSIVE`` switch restricts the results to only those permissions which were directly granted to
+``bob`` or one of ``bob``'s roles::
+
+    LIST SELECT PERMISSIONS OF carlos;
+
+Show any permissions granted to ``carlos`` or any of ``carlos``'s roles, limited to ``SELECT`` permissions on any
+resource.
diff --git a/doc/source/cql/triggers.rst b/doc/source/cql/triggers.rst
new file mode 100644
index 0000000..3bba72d
--- /dev/null
+++ b/doc/source/cql/triggers.rst
@@ -0,0 +1,63 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: sql
+
+.. _cql-triggers:
+
+Triggers
+--------
+
+Triggers are identified by a name defined by:
+
+.. productionlist::
+   trigger_name: `identifier`
+
+
+.. _create-trigger-statement:
+
+CREATE TRIGGER
+^^^^^^^^^^^^^^
+
+Creating a new trigger uses the ``CREATE TRIGGER`` statement:
+
+.. productionlist::
+   create_trigger_statement: CREATE TRIGGER [ IF NOT EXISTS ] `trigger_name`
+                           :     ON `table_name`
+                           :     USING `string`
+
+For instance::
+
+    CREATE TRIGGER myTrigger ON myTable USING 'org.apache.cassandra.triggers.InvertedIndex';
+
+The actual logic that makes up the trigger can be written in any Java (JVM) language and exists outside the database.
+You place the trigger code in the ``lib/triggers`` subdirectory of the Cassandra installation directory; it is loaded at
+cluster startup and exists on every node that participates in the cluster. The trigger defined on a table fires before a
+requested DML statement occurs, which ensures the atomicity of the transaction.
+
+.. _drop-trigger-statement:
+
+DROP TRIGGER
+^^^^^^^^^^^^
+
+Dropping a trigger uses the ``DROP TRIGGER`` statement:
+
+.. productionlist::
+   drop_trigger_statement: DROP TRIGGER [ IF EXISTS ] `trigger_name` ON `table_name`
+
+For instance::
+
+    DROP TRIGGER myTrigger ON myTable;
diff --git a/doc/source/cql/types.rst b/doc/source/cql/types.rst
new file mode 100644
index 0000000..80cf864
--- /dev/null
+++ b/doc/source/cql/types.rst
@@ -0,0 +1,518 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: sql
+
+.. _UUID: https://en.wikipedia.org/wiki/Universally_unique_identifier
+
+.. _data-types:
+
+Data Types
+----------
+
+CQL is a typed language and supports a rich set of data types, including :ref:`native types <native-types>`,
+:ref:`collection types <collections>`, :ref:`user-defined types <udts>`, :ref:`tuple types <tuples>` and :ref:`custom
+types <custom-types>`:
+
+.. productionlist::
+   cql_type: `native_type` | `collection_type` | `user_defined_type` | `tuple_type` | `custom_type`
+
+
+.. _native-types:
+
+Native Types
+^^^^^^^^^^^^
+
+The native types supported by CQL are:
+
+.. productionlist::
+   native_type: ASCII
+              : | BIGINT
+              : | BLOB
+              : | BOOLEAN
+              : | COUNTER
+              : | DATE
+              : | DECIMAL
+              : | DOUBLE
+              : | FLOAT
+              : | INET
+              : | INT
+              : | SMALLINT
+              : | TEXT
+              : | TIME
+              : | TIMESTAMP
+              : | TIMEUUID
+              : | TINYINT
+              : | UUID
+              : | VARCHAR
+              : | VARINT
+
+The following table gives additional information on the native data types, and on which kinds of :ref:`constants
+<constants>` each type supports:
+
+=============== ===================== ==================================================================================
+ type            constants supported   description
+=============== ===================== ==================================================================================
+ ``ascii``       :token:`string`       ASCII character string
+ ``bigint``      :token:`integer`      64-bit signed long
+ ``blob``        :token:`blob`         Arbitrary bytes (no validation)
+ ``boolean``     :token:`boolean`      Either ``true`` or ``false``
+ ``counter``     :token:`integer`      Counter column (64-bit signed value). See :ref:`counters` for details
+ ``date``        :token:`integer`,     A date (with no corresponding time value). See :ref:`dates` below for details
+                 :token:`string`
+ ``decimal``     :token:`integer`,     Variable-precision decimal
+                 :token:`float`
+ ``double``      :token:`integer`,     64-bit IEEE-754 floating point
+                 :token:`float`
+ ``float``       :token:`integer`,     32-bit IEEE-754 floating point
+                 :token:`float`
+ ``inet``        :token:`string`       An IP address, either IPv4 (4 bytes long) or IPv6 (16 bytes long). Note that
+                                       there is no ``inet`` constant; IP addresses should be input as strings
+ ``int``         :token:`integer`      32-bit signed int
+ ``smallint``    :token:`integer`      16-bit signed int
+ ``text``        :token:`string`       UTF8 encoded string
+ ``time``        :token:`integer`,     A time (with no corresponding date value) with nanosecond precision. See
+                 :token:`string`       :ref:`times` below for details
+ ``timestamp``   :token:`integer`,     A timestamp (date and time) with millisecond precision. See :ref:`timestamps`
+                 :token:`string`       below for details
+ ``timeuuid``    :token:`uuid`         Version 1 UUID_, generally used as a “conflict-free” timestamp. Also see
+                                       :ref:`timeuuid-functions`
+ ``tinyint``     :token:`integer`      8-bit signed int
+ ``uuid``        :token:`uuid`         A UUID_ (of any version)
+ ``varchar``     :token:`string`       UTF8 encoded string
+ ``varint``      :token:`integer`      Arbitrary-precision integer
+=============== ===================== ==================================================================================
+
+.. _counters:
+
+Counters
+~~~~~~~~
+
+The ``counter`` type is used to define *counter columns*. A counter column is a column whose value is a 64-bit signed
+integer and on which 2 operations are supported: incrementing and decrementing (see the :ref:`UPDATE statement
+<update-statement>` for syntax). Note that the value of a counter cannot be set: a counter does not exist until first
+incremented/decremented, and that first increment/decrement is made as if the prior value was 0.
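+
+As an illustrative sketch (table and column names are hypothetical)::
+
+    CREATE TABLE page_views (
+        page text PRIMARY KEY,
+        views counter
+    );
+
+    -- the counter is created on first update, as if its prior value was 0
+    UPDATE page_views SET views = views + 1 WHERE page = '/home';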
+
+.. _counter-limitations:
+
+Counters have a number of important limitations:
+
+- They cannot be used for columns that are part of the ``PRIMARY KEY`` of a table.
+- A table that contains a counter can only contain counters. In other words, either all the columns of a table outside
+  the ``PRIMARY KEY`` have the ``counter`` type, or none of them have it.
+- Counters do not support :ref:`expiration <ttls>`.
+- The deletion of counters is supported, but is only guaranteed to work the first time you delete a counter. In other
+  words, you should not re-update a counter that you have deleted (if you do, proper behavior is not guaranteed).
+- Counter updates are, by nature, not `idempotent <https://en.wikipedia.org/wiki/Idempotence>`__. An important
+  consequence is that if a counter update fails unexpectedly (timeout or loss of connection to the coordinator node),
+  the client has no way to know if the update has been applied or not. In particular, replaying the update may or may
+  not lead to an over count.
+
+.. _timestamps:
+
+Working with timestamps
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Values of the ``timestamp`` type are encoded as 64-bit signed integers representing a number of milliseconds since the
+standard base time known as `the epoch <https://en.wikipedia.org/wiki/Unix_time>`__: January 1 1970 at 00:00:00 GMT.
+
+Timestamps can be input in CQL either using their value as an :token:`integer`, or using a :token:`string` that
+represents an `ISO 8601 <https://en.wikipedia.org/wiki/ISO_8601>`__ date. For instance, all of the values below are
+valid ``timestamp`` values for Mar 2, 2011, at 04:05:00 AM, GMT:
+
+- ``1299038700000``
+- ``'2011-03-02 04:05+0000'``
+- ``'2011-03-02 04:05:00+0000'``
+- ``'2011-03-02 04:05:00.000+0000'``
+- ``'2011-03-02T04:05+0000'``
+- ``'2011-03-02T04:05:00+0000'``
+- ``'2011-03-02T04:05:00.000+0000'``
+
+The ``+0000`` above is an RFC 822 4-digit time zone specification; ``+0000`` refers to GMT. US Pacific Standard Time is
+``-0800``. The time zone may be omitted if desired (``'2011-02-03 04:05:00'``), and if so, the date will be interpreted
+as being in the time zone under which the coordinating Cassandra node is configured. There are however difficulties
+inherent in relying on the time zone configuration being as expected, so it is recommended that the time zone always be
+specified for timestamps when feasible.
+
+The time of day may also be omitted (``'2011-02-03'`` or ``'2011-02-03+0000'``), in which case the time of day will
+default to 00:00:00 in the specified or default time zone. However, if only the date part is relevant, consider using
+the :ref:`date <dates>` type.
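+
+As an illustrative sketch (the ``events`` table is hypothetical), the integer and string forms above denote the same
+value::
+
+    CREATE TABLE events (id int PRIMARY KEY, created_at timestamp);
+
+    INSERT INTO events (id, created_at) VALUES (1, 1299038700000);
+    INSERT INTO events (id, created_at) VALUES (2, '2011-03-02 04:05:00+0000');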
+
+.. _dates:
+
+Working with dates
+^^^^^^^^^^^^^^^^^^
+
+Values of the ``date`` type are encoded as 32-bit unsigned integers representing a number of days with “the epoch” at
+the center of the range (2^31). The epoch is January 1st, 1970.
+
+As for :ref:`timestamp <timestamps>`, a date can be input either as an :token:`integer` or using a date
+:token:`string`. In the latter case, the format should be ``yyyy-mm-dd`` (so ``'2011-02-03'`` for instance).
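+
+For instance (the table is hypothetical)::
+
+    CREATE TABLE reminders (label text PRIMARY KEY, due date);
+
+    INSERT INTO reminders (label, due) VALUES ('tax return', '2011-02-03');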
+
+.. _times:
+
+Working with times
+^^^^^^^^^^^^^^^^^^
+
+Values of the ``time`` type are encoded as 64-bit signed integers representing the number of nanoseconds since midnight.
+
+As for :ref:`timestamp <timestamps>`, a time can be input either as an :token:`integer` or using a :token:`string`
+representing the time. In the latter case, the format should be ``hh:mm:ss[.fffffffff]`` (where the sub-second precision
+is optional and, if provided, can be less than nanosecond precision). So for instance, the following are valid inputs for a
+time:
+
+-  ``'08:12:54'``
+-  ``'08:12:54.123'``
+-  ``'08:12:54.123456'``
+-  ``'08:12:54.123456789'``
+
+
+.. _collections:
+
+Collections
+^^^^^^^^^^^
+
+CQL supports 3 kinds of collections: :ref:`maps`, :ref:`sets` and :ref:`lists`. The types of those collections are defined
+by:
+
+.. productionlist::
+   collection_type: MAP '<' `cql_type` ',' `cql_type` '>'
+                  : | SET '<' `cql_type` '>'
+                  : | LIST '<' `cql_type` '>'
+
+and their values can be input using collection literals:
+
+.. productionlist::
+   collection_literal: `map_literal` | `set_literal` | `list_literal`
+   map_literal: '{' [ `term` ':' `term` (',' `term` ':' `term`)* ] '}'
+   set_literal: '{' [ `term` (',' `term`)* ] '}'
+   list_literal: '[' [ `term` (',' `term`)* ] ']'
+
+Note however that neither :token:`bind_marker` nor ``NULL`` are supported inside collection literals.
+
+Noteworthy characteristics
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Collections are meant for storing/denormalizing relatively small amounts of data. They work well for things like “the
+phone numbers of a given user”, “labels applied to an email”, etc. But when items are expected to grow unbounded (“all
+messages sent by a user”, “events registered by a sensor”...), then collections are not appropriate and a specific table
+(with clustering columns) should be used. Concretely, (non-frozen) collections have the following noteworthy
+characteristics and limitations:
+
+- Individual collections are not indexed internally, which means that even to access a single element of a collection,
+  the whole collection has to be read (and reading one is not paged internally).
+- While insertion operations on sets and maps never incur a read-before-write internally, some operations on lists do.
+  Further, some list operations are not idempotent by nature (see the section on :ref:`lists <lists>` below for
+  details), making their retry in case of timeout problematic. It is thus advised to prefer sets over lists when
+  possible.
+
+Please note that while some of those limitations may or may not be removed/improved upon in the future, it is an
+anti-pattern to use a (single) collection to store large amounts of data.
+
+.. _maps:
+
+Maps
+~~~~
+
+A ``map`` is a (sorted) set of key-value pairs, where keys are unique and the map is sorted by its keys. You can define
+and insert a map with::
+
+    CREATE TABLE users (
+        id text PRIMARY KEY,
+        name text,
+        favs map<text, text> // A map of text keys, and text values
+    );
+
+    INSERT INTO users (id, name, favs)
+               VALUES ('jsmith', 'John Smith', { 'fruit' : 'Apple', 'band' : 'Beatles' });
+
+    // Replace the existing map entirely.
+    UPDATE users SET favs = { 'fruit' : 'Banana' } WHERE id = 'jsmith';
+
+Further, maps support:
+
+- Updating or inserting one or more elements::
+
+    UPDATE users SET favs['author'] = 'Ed Poe' WHERE id = 'jsmith';
+    UPDATE users SET favs = favs + { 'movie' : 'Cassablanca', 'band' : 'ZZ Top' } WHERE id = 'jsmith';
+
+- Removing one or more elements (if an element doesn't exist, removing it is a no-op but no error is thrown)::
+
+    DELETE favs['author'] FROM users WHERE id = 'jsmith';
+    UPDATE users SET favs = favs - { 'movie', 'band'} WHERE id = 'jsmith';
+
+  Note that for removing multiple elements in a ``map``, you remove from it a ``set`` of keys.
+
+Lastly, TTLs are allowed for both ``INSERT`` and ``UPDATE``, but in both cases the TTL set only applies to the newly
+inserted/updated elements. In other words::
+
+    UPDATE users USING TTL 10 SET favs['color'] = 'green' WHERE id = 'jsmith';
+
+will only apply the TTL to the ``{ 'color' : 'green' }`` record, the rest of the map remaining unaffected.
+
+
+.. _sets:
+
+Sets
+~~~~
+
+A ``set`` is a (sorted) collection of unique values. You can define and insert a set with::
+
+    CREATE TABLE images (
+        name text PRIMARY KEY,
+        owner text,
+        tags set<text> // A set of text values
+    );
+
+    INSERT INTO images (name, owner, tags)
+                VALUES ('cat.jpg', 'jsmith', { 'pet', 'cute' });
+
+    // Replace the existing set entirely
+    UPDATE images SET tags = { 'kitten', 'cat', 'lol' } WHERE name = 'cat.jpg';
+
+Further, sets support:
+
+- Adding one or multiple elements (as this is a set, inserting an already existing element is a no-op)::
+
+    UPDATE images SET tags = tags + { 'gray', 'cuddly' } WHERE name = 'cat.jpg';
+
+- Removing one or multiple elements (if an element doesn't exist, removing it is a no-op but no error is thrown)::
+
+    UPDATE images SET tags = tags - { 'cat' } WHERE name = 'cat.jpg';
+
+Lastly, as for :ref:`maps <maps>`, TTLs, if used, only apply to the newly inserted values.
+
+.. _lists:
+
+Lists
+~~~~~
+
+.. note:: As mentioned above and further discussed at the end of this section, lists have limitations and specific
+   performance considerations that you should take into account before using them. In general, if you can use a
+   :ref:`set <sets>` instead of a list, always prefer a set.
+
+A ``list`` is a (sorted) collection of non-unique values where elements are ordered by their position in the list. You
+can define and insert a list with::
+
+    CREATE TABLE plays (
+        id text PRIMARY KEY,
+        game text,
+        players int,
+        scores list<int> // A list of integers
+    )
+
+    INSERT INTO plays (id, game, players, scores)
+               VALUES ('123-afde', 'quake', 3, [17, 4, 2]);
+
+    // Replace the existing list entirely
+    UPDATE plays SET scores = [ 3, 9, 4] WHERE id = '123-afde';
+
+Further, lists support:
+
+- Appending and prepending values to a list::
+
+    UPDATE plays SET players = 5, scores = scores + [ 14, 21 ] WHERE id = '123-afde';
+    UPDATE plays SET players = 6, scores = [ 3 ] + scores WHERE id = '123-afde';
+
+- Setting the value at a particular position in the list. This implies that the list has a pre-existing element for that
+  position or an error will be thrown that the list is too small::
+
+    UPDATE plays SET scores[1] = 7 WHERE id = '123-afde';
+
+- Removing an element by its position in the list. This implies that the list has a pre-existing element for that position
+  or an error will be thrown that the list is too small. Further, as the operation removes an element from the list, the
+  list size will be diminished by 1, shifting the position of all the elements following the one deleted::
+
+    DELETE scores[1] FROM plays WHERE id = '123-afde';
+
+- Deleting *all* the occurrences of particular values in the list (if a particular element doesn't occur at all in the
+  list, it is simply ignored and no error is thrown)::
+
+    UPDATE plays SET scores = scores - [ 12, 21 ] WHERE id = '123-afde';
+
+.. warning:: The append and prepend operations are not idempotent by nature. So in particular, if one of these operations
+   times out, then retrying the operation is not safe and it may (or may not) lead to appending/prepending the value
+   twice.
+
+.. warning:: Setting and removing an element by position and removing occurrences of particular values incur an internal
+   *read-before-write*. They will thus run more slowly and take more resources than usual updates (with the exception
+   of conditional writes, which have their own cost).
+
+Lastly, as for :ref:`maps <maps>`, TTLs, when used, only apply to the newly inserted values.
+
+.. _udts:
+
+User-Defined Types
+^^^^^^^^^^^^^^^^^^
+
+CQL supports the definition of user-defined types (UDT for short). Such a type can be created, modified and removed using
+the :token:`create_type_statement`, :token:`alter_type_statement` and :token:`drop_type_statement` described below. But
+once created, a UDT is simply referred to by its name:
+
+.. productionlist::
+   user_defined_type: `udt_name`
+   udt_name: [ `keyspace_name` '.' ] `identifier`
+
+
+Creating a UDT
+~~~~~~~~~~~~~~
+
+Creating a new user-defined type is done using a ``CREATE TYPE`` statement defined by:
+
+.. productionlist::
+   create_type_statement: CREATE TYPE [ IF NOT EXISTS ] `udt_name`
+                        :     '(' `field_definition` ( ',' `field_definition` )* ')'
+   field_definition: `identifier` `cql_type`
+
+A UDT has a name (used to declare columns of that type) and is a set of named and typed fields. Fields can be of any
+type, including collections or other UDTs. For instance::
+
+    CREATE TYPE phone (
+        country_code int,
+        number text,
+    )
+
+    CREATE TYPE address (
+        street text,
+        city text,
+        zip int,
+        phones map<text, phone>
+    )
+
+    CREATE TABLE user (
+        name text PRIMARY KEY,
+        addresses map<text, frozen<address>>
+    )
+
+Note that:
+
+- Attempting to create an already existing type will result in an error unless the ``IF NOT EXISTS`` option is used. If
+  it is used, the statement will be a no-op if the type already exists.
+- A type is intrinsically bound to the keyspace in which it is created, and can only be used in that keyspace. At
+  creation, if the type name is prefixed by a keyspace name, it is created in that keyspace. Otherwise, it is created in
+  the current keyspace.
+- As of Cassandra |version|, UDTs have to be frozen in most cases, hence the ``frozen<address>`` in the table definition
+  above. Please see the section on :ref:`frozen <frozen>` for more details.
+
+UDT literals
+~~~~~~~~~~~~
+
+Once a user-defined type has been created, values can be input using a UDT literal:
+
+.. productionlist::
+   udt_literal: '{' `identifier` ':' `term` ( ',' `identifier` ':' `term` )* '}'
+
+In other words, a UDT literal is like a :ref:`map <maps>` literal but its keys are the names of the fields of the type.
+For instance, one could insert into the table defined in the previous section using::
+
+    INSERT INTO user (name, addresses)
+              VALUES ('z3 Pr3z1den7', {
+                  'home' : {
+                      street: '1600 Pennsylvania Ave NW',
+                      city: 'Washington',
+                      zip: 20500,
+                      phones: { 'cell' : { country_code: 1, number: '202 456-1111' },
+                                'landline' : { country_code: 1, number: '...' } }
+                  },
+                  'work' : {
+                      street: '1600 Pennsylvania Ave NW',
+                      city: 'Washington',
+                      zip: 20500,
+                      phones: { 'fax' : { country_code: 1, number: '...' } }
+                  }
+              })
+
+To be valid, a UDT literal should only include fields defined by the type it is a literal of, but it can omit some fields
+(in which case those will be ``null``).
+
+Altering a UDT
+~~~~~~~~~~~~~~
+
+An existing user-defined type can be modified using an ``ALTER TYPE`` statement:
+
+.. productionlist::
+   alter_type_statement: ALTER TYPE `udt_name` `alter_type_modification`
+   alter_type_modification: ALTER `identifier` TYPE `cql_type`
+                          : | ADD `field_definition`
+                          : | RENAME `identifier` TO `identifier` ( `identifier` TO `identifier` )*
+
+You can:
+
+- modify the type of a particular field (``ALTER TYPE address ALTER zip TYPE bigint``). The restrictions for such a
+  change are the same as when :ref:`altering the type of a column <alter-table-statement>`.
+- add a new field to the type (``ALTER TYPE address ADD country text``). That new field will be ``null`` for any values
+  of the type created before the addition.
+- rename the fields of the type (``ALTER TYPE address RENAME zip TO zipcode``).
+
+Dropping a UDT
+~~~~~~~~~~~~~~
+
+You can drop an existing user-defined type using a ``DROP TYPE`` statement:
+
+.. productionlist::
+   drop_type_statement: DROP TYPE [ IF EXISTS ] `udt_name`
+
+Dropping a type results in the immediate, irreversible removal of that type. However, attempting to drop a type that is
+still in use by another type, table or function will result in an error.
+
+If the type dropped does not exist, an error will be returned unless ``IF EXISTS`` is used, in which case the operation
+is a no-op.
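+
+For instance, dropping the ``address`` type from the earlier example (which only succeeds once no table, type or
+function still uses it)::
+
+    DROP TYPE IF EXISTS address;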
+
+.. _tuples:
+
+Tuples
+^^^^^^
+
+CQL also supports tuples and tuple types (where the elements can be of different types). Functionally, tuples can be
+thought of as anonymous UDTs with anonymous fields. Tuple types and tuple literals are defined by:
+
+.. productionlist::
+   tuple_type: TUPLE '<' `cql_type` ( ',' `cql_type` )* '>'
+   tuple_literal: '(' `term` ( ',' `term` )* ')'
+
+and can be used as follows::
+
+    CREATE TABLE durations (
+        event text,
+        duration tuple<int, text>,
+    )
+
+    INSERT INTO durations (event, duration) VALUES ('ev1', (3, 'hours'));
+
+Unlike other "composed" types (collections and UDTs), a tuple is always :ref:`frozen <frozen>` (without the need for the
+``frozen`` keyword) and it is not possible to update only some elements of a tuple (without updating the whole tuple).
+Also, a tuple literal should always have the same number of values as declared in the type it is a tuple of (some of
+those values can be null, but they need to be explicitly declared as such).
+
+.. _custom-types:
+
+Custom Types
+^^^^^^^^^^^^
+
+.. note:: Custom types exist mostly for backward compatibility purposes and their usage is discouraged. Their usage is
+   complex, not user friendly, and the other provided types, particularly :ref:`user-defined types <udts>`, should almost
+   always be enough.
+
+A custom type is defined by:
+
+.. productionlist::
+   custom_type: `string`
+
+A custom type is a :token:`string` that contains the name of a Java class that extends the server side ``AbstractType``
+class and that can be loaded by Cassandra (it should thus be in the ``CLASSPATH`` of every node running Cassandra). That
+class defines what values are valid for the type and how values of that type sort when used for a clustering column. For
+any other purpose, a value of a custom type is the same as that of a ``blob``, and can in particular be input using the
+:token:`blob` literal syntax.
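+
+As a purely illustrative sketch (``com.example.MyType`` is a made-up class name; a real one must extend
+``AbstractType`` and be on the classpath of every node)::
+
+    CREATE TABLE custom_example (
+        id int PRIMARY KEY,
+        payload 'com.example.MyType'
+    );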
diff --git a/doc/source/data_modeling/index.rst b/doc/source/data_modeling/index.rst
new file mode 100644
index 0000000..dde031a
--- /dev/null
+++ b/doc/source/data_modeling/index.rst
@@ -0,0 +1,20 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+Data Modeling
+=============
+
+.. todo:: TODO
diff --git a/doc/source/faq/index.rst b/doc/source/faq/index.rst
new file mode 100644
index 0000000..4ac0be4
--- /dev/null
+++ b/doc/source/faq/index.rst
@@ -0,0 +1,20 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+Frequently Asked Questions
+==========================
+
+.. TODO: todo
diff --git a/doc/source/getting_started/configuring.rst b/doc/source/getting_started/configuring.rst
new file mode 100644
index 0000000..27fac78
--- /dev/null
+++ b/doc/source/getting_started/configuring.rst
@@ -0,0 +1,67 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+Configuring Cassandra
+---------------------
+
+For running Cassandra on a single node, the steps above are enough; you don't really need to change any configuration.
+However, when you deploy a cluster of nodes, or use clients that are not on the same host, then there are some
+parameters that must be changed.
+
+The Cassandra configuration files can be found in the ``conf`` directory of tarballs. For packages, the configuration
+files will be located in ``/etc/cassandra``.
+
+Main runtime properties
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Most of the configuration in Cassandra is done via yaml properties that can be set in ``cassandra.yaml``. At a minimum you
+should consider setting the following properties:
+
+- ``cluster_name``: the name of your cluster.
+- ``seeds``: a comma separated list of the IP addresses of your cluster seeds.
+- ``storage_port``: you don't necessarily need to change this but make sure that there are no firewalls blocking this
+  port.
+- ``listen_address``: the IP address of your node. This is what allows other nodes to communicate with this node, so it
+  is important that you change it. Alternatively, you can set ``listen_interface`` to tell Cassandra which interface to
+  use, and consequently which address to use. Set only one, not both.
+- ``native_transport_port``: as for ``storage_port``, make sure this port is not blocked by firewalls as clients will
+  communicate with Cassandra on this port.
+
+Changing the location of directories
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The following yaml properties control the location of directories:
+
+- ``data_file_directories``: one or more directories where data files are located.
+- ``commitlog_directory``: the directory where commitlog files are located.
+- ``saved_caches_directory``: the directory where saved caches are located.
+- ``hints_directory``: the directory where hints are located.
+
+For performance reasons, if you have multiple disks, consider putting commitlog and data files on different disks.
+
+Environment variables
+^^^^^^^^^^^^^^^^^^^^^
+
+JVM-level settings such as heap size can be set in ``cassandra-env.sh``.  You can add any additional JVM command line
+argument to the ``JVM_OPTS`` environment variable; when Cassandra starts these arguments will be passed to the JVM.
+
+Logging
+^^^^^^^
+
+The logger in use is logback. You can change logging properties by editing ``logback.xml``. By default it will log at
+INFO level into a file called ``system.log`` and at DEBUG level into a file called ``debug.log``. When running in the
+foreground, it will also log at INFO level to the console.
+
diff --git a/doc/source/getting_started/drivers.rst b/doc/source/getting_started/drivers.rst
new file mode 100644
index 0000000..baec823
--- /dev/null
+++ b/doc/source/getting_started/drivers.rst
@@ -0,0 +1,107 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. _client-drivers:
+
+Client drivers
+--------------
+
+Here are known Cassandra client drivers organized by language. Before choosing a driver, you should verify the Cassandra
+version and functionality supported by a specific driver.
+
+Java
+^^^^
+
+- `Achilles <http://achilles.archinnov.info/>`__
+- `Astyanax <https://github.com/Netflix/astyanax/wiki/Getting-Started>`__
+- `Casser <https://github.com/noorq/casser>`__
+- `Datastax Java driver <https://github.com/datastax/java-driver>`__
+- `Kundera <https://github.com/impetus-opensource/Kundera>`__
+- `PlayORM <https://github.com/deanhiller/playorm>`__
+
+Python
+^^^^^^
+
+- `Datastax Python driver <https://github.com/datastax/python-driver>`__
+
+Ruby
+^^^^
+
+- `Datastax Ruby driver <https://github.com/datastax/ruby-driver>`__
+
+C# / .NET
+^^^^^^^^^
+
+- `Cassandra Sharp <https://github.com/pchalamet/cassandra-sharp>`__
+- `Datastax C# driver <https://github.com/datastax/csharp-driver>`__
+- `Fluent Cassandra <https://github.com/managedfusion/fluentcassandra>`__
+
+Nodejs
+^^^^^^
+
+- `Datastax Nodejs driver <https://github.com/datastax/nodejs-driver>`__
+- `Node-Cassandra-CQL <https://github.com/jorgebay/node-cassandra-cql>`__
+
+PHP
+^^^
+
+- `CQL \| PHP <http://code.google.com/a/apache-extras.org/p/cassandra-pdo>`__
+- `Datastax PHP driver <https://github.com/datastax/php-driver/>`__
+- `PHP-Cassandra <https://github.com/aparkhomenko/php-cassandra>`__
+- `PHP Library for Cassandra <http://evseevnn.github.io/php-cassandra-binary/>`__
+
+C++
+^^^
+
+- `Datastax C++ driver <https://github.com/datastax/cpp-driver>`__
+- `libQTCassandra <http://sourceforge.net/projects/libqtcassandra>`__
+
+Scala
+^^^^^
+
+- `Datastax Spark connector <https://github.com/datastax/spark-cassandra-connector>`__
+- `Phantom <https://github.com/newzly/phantom>`__
+- `Quill <https://github.com/getquill/quill>`__
+
+Clojure
+^^^^^^^
+
+- `Alia <https://github.com/mpenet/alia>`__
+- `Cassaforte <https://github.com/clojurewerkz/cassaforte>`__
+- `Hayt <https://github.com/mpenet/hayt>`__
+
+Erlang
+^^^^^^
+
+- `CQerl <https://github.com/matehat/cqerl>`__
+- `Erlcass <https://github.com/silviucpp/erlcass>`__
+
+Go
+^^
+
+- `CQLc <http://relops.com/cqlc/>`__
+- `Gocassa <https://github.com/hailocab/gocassa>`__
+- `GoCQL <https://github.com/gocql/gocql>`__
+
+Haskell
+^^^^^^^
+
+- `Cassy <https://github.com/ozataman/cassy>`__
+
+Rust
+^^^^
+
+- `Rust CQL <https://github.com/neich/rust-cql>`__
diff --git a/doc/source/getting_started/index.rst b/doc/source/getting_started/index.rst
new file mode 100644
index 0000000..4ca9c4d
--- /dev/null
+++ b/doc/source/getting_started/index.rst
@@ -0,0 +1,33 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: none
+
+Getting Started
+===============
+
+This section covers how to get started using Apache Cassandra and should be the first thing to read if you are new to
+Cassandra.
+
+.. toctree::
+   :maxdepth: 2
+
+   installing
+   configuring
+   querying
+   drivers
+
+
diff --git a/doc/source/getting_started/installing.rst b/doc/source/getting_started/installing.rst
new file mode 100644
index 0000000..ad0a1e8
--- /dev/null
+++ b/doc/source/getting_started/installing.rst
@@ -0,0 +1,99 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+Installing Cassandra
+--------------------
+
+Prerequisites
+^^^^^^^^^^^^^
+
+- The latest version of Java 8, either the `Oracle Java Standard Edition 8
+  <http://www.oracle.com/technetwork/java/javase/downloads/index.html>`__ or `OpenJDK 8 <http://openjdk.java.net/>`__. To
+  verify that you have the correct version of java installed, type ``java -version``.
+
+- For using cqlsh, the latest version of `Python 2.7 <https://www.python.org/downloads/>`__. To verify that you have
+  the correct version of Python installed, type ``python --version``.
+
+Installation from binary tarball files
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- Download the latest stable release from the `Apache Cassandra downloads website <http://cassandra.apache.org/download/>`__.
+
+- Untar the file somewhere, for example:
+
+::
+
+    tar -xvf apache-cassandra-3.6-bin.tar.gz
+
+The files will be extracted into ``apache-cassandra-3.6``; substitute 3.6 with the release number that you
+have downloaded.
+
+- Optionally add ``apache-cassandra-3.6/bin`` to your path.
+- Start Cassandra in the foreground by invoking ``bin/cassandra -f`` from the command line. Press "Control-C" to stop
+  Cassandra. Start Cassandra in the background by invoking ``bin/cassandra`` from the command line. Invoke ``kill pid``
+  or ``pkill -f CassandraDaemon`` to stop Cassandra, where pid is the Cassandra process id, which you can find for
+  example by invoking ``pgrep -f CassandraDaemon``.
+- Verify that Cassandra is running by invoking ``bin/nodetool status`` from the command line.
+- Configuration files are located in the ``conf`` sub-directory.
+- Since Cassandra 2.1, log and data directories are located in the ``logs`` and ``data`` sub-directories respectively.
+  Older versions defaulted to ``/var/log/cassandra`` and ``/var/lib/cassandra``. With those older defaults, it is
+  necessary to either start Cassandra with root privileges or change ``conf/cassandra.yaml`` to use directories owned by
+  the current user, as explained below in the section on changing the location of directories.
+
+Installation from Debian packages
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+- Add the Apache repository of Cassandra to ``/etc/apt/sources.list.d/cassandra.sources.list``, for example for version
+  3.6:
+
+::
+
+    echo "deb http://www.apache.org/dist/cassandra/debian 36x main" | sudo tee -a /etc/apt/sources.list.d/cassandra.sources.list
+
+- Update the repositories:
+
+::
+
+    sudo apt-get update
+
+- If you encounter this error:
+
+::
+
+    GPG error: http://www.apache.org 36x InRelease: The following signatures couldn't be verified because the public key is not available: NO_PUBKEY 749D6EEC0353B12C
+
+Then add the public key 749D6EEC0353B12C as follows:
+
+::
+
+    gpg --keyserver pgp.mit.edu --recv-keys 749D6EEC0353B12C
+    gpg --export --armor 749D6EEC0353B12C | sudo apt-key add -
+
+and repeat ``sudo apt-get update``. The actual key may be different; you get it from the error message itself. For a
+full list of Apache contributors' public keys, you can refer to `this link <https://www.apache.org/dist/cassandra/KEYS>`__.
+
+- Install Cassandra:
+
+::
+
+    sudo apt-get install cassandra
+
+- You can start Cassandra with ``sudo service cassandra start`` and stop it with ``sudo service cassandra stop``.
+  However, normally the service will start automatically. For this reason be sure to stop it if you need to make any
+  configuration changes.
+- Verify that Cassandra is running by invoking ``nodetool status`` from the command line.
+- The default location of configuration files is ``/etc/cassandra``.
+- The default location of log and data directories is ``/var/log/cassandra/`` and ``/var/lib/cassandra``.
diff --git a/doc/source/getting_started/querying.rst b/doc/source/getting_started/querying.rst
new file mode 100644
index 0000000..55b162b
--- /dev/null
+++ b/doc/source/getting_started/querying.rst
@@ -0,0 +1,52 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+Inserting and querying
+----------------------
+
+The API to Cassandra is :ref:`CQL <cql>`, the Cassandra Query Language. To use CQL, you will need to connect to the
+cluster, which can be done:
+
+- either using cqlsh,
+- or through a client driver for Cassandra.
+
+CQLSH
+^^^^^
+
+cqlsh is a command line shell for interacting with Cassandra through CQL. It is shipped with every Cassandra package,
+and can be found in the bin/ directory alongside the cassandra executable. It connects to the single node specified on
+the command line. For example::
+
+    $ bin/cqlsh localhost
+    Connected to Test Cluster at localhost:9042.
+    [cqlsh 5.0.1 | Cassandra 3.8 | CQL spec 3.4.2 | Native protocol v4]
+    Use HELP for help.
+    cqlsh> SELECT cluster_name, listen_address FROM system.local;
+
+     cluster_name | listen_address
+    --------------+----------------
+     Test Cluster |      127.0.0.1
+
+    (1 rows)
+    cqlsh>
+
+See the :ref:`cqlsh section <cqlsh>` for full documentation.
+
+Client drivers
+^^^^^^^^^^^^^^
+
+Many client drivers are provided by the community, and a list of known drivers is given in :ref:`the next section
+<client-drivers>`. You should refer to the documentation of each driver for more information on how to use it.
diff --git a/doc/source/index.rst b/doc/source/index.rst
new file mode 100644
index 0000000..ec27f5a
--- /dev/null
+++ b/doc/source/index.rst
@@ -0,0 +1,40 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+Welcome to Apache Cassandra's documentation!
+============================================
+
+This is the official documentation for `Apache Cassandra <http://cassandra.apache.org>`__ |version|.  If you would like
+to contribute to this documentation, you are welcome to do so by submitting your contribution like any other patch
+following `these instructions <https://wiki.apache.org/cassandra/HowToContribute>`__.
+
+Contents:
+
+.. toctree::
+   :maxdepth: 2
+
+   getting_started/index
+   architecture/index
+   data_modeling/index
+   cql/index
+   configuration/index
+   operating/index
+   tools/index
+   troubleshooting/index
+   faq/index
+
+   bugs
+   contactus
diff --git a/doc/source/operating/backups.rst b/doc/source/operating/backups.rst
new file mode 100644
index 0000000..c071e83
--- /dev/null
+++ b/doc/source/operating/backups.rst
@@ -0,0 +1,22 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: none
+
+Backups
+=======
+
+.. todo:: TODO
diff --git a/doc/source/operating/bloom_filters.rst b/doc/source/operating/bloom_filters.rst
new file mode 100644
index 0000000..0b37c18
--- /dev/null
+++ b/doc/source/operating/bloom_filters.rst
@@ -0,0 +1,65 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: none
+
+Bloom Filters
+-------------
+
+In the read path, Cassandra merges data on disk (in SSTables) with data in RAM (in memtables). To avoid checking every
+SSTable data file for the partition being requested, Cassandra employs a data structure known as a bloom filter.
+
+A bloom filter is a probabilistic data structure that allows Cassandra to determine one of two possible states:
+
+- The data definitely does not exist in the given file, or
+- The data probably exists in the given file.
+
+While bloom filters cannot guarantee that the data exists in a given SSTable, they can be made more accurate by
+allowing them to consume more RAM. Operators have the opportunity to tune this behavior per table by adjusting
+``bloom_filter_fp_chance`` to a float between 0 and 1.
+
+The default value for ``bloom_filter_fp_chance`` is 0.1 for tables using LeveledCompactionStrategy and 0.01 for all
+other cases.
+
+Bloom filters are stored in RAM, but off-heap, so operators should not consider bloom filters when selecting
+the maximum heap size.  As accuracy improves (as the ``bloom_filter_fp_chance`` gets closer to 0), memory usage
+increases non-linearly - the bloom filter for ``bloom_filter_fp_chance = 0.01`` will require about three times as much
+memory as the same table with ``bloom_filter_fp_chance = 0.1``.
+
+Typical values for ``bloom_filter_fp_chance`` are between 0.01 (1%) and 0.1 (10%) false-positive chance, where
+Cassandra may scan an SSTable for a row, only to find that it does not exist on the disk. The parameter should be tuned
+by use case:
+
+- Users with more RAM and slower disks may benefit from setting the ``bloom_filter_fp_chance`` to a numerically lower
+  number (such as 0.01) to avoid excess IO operations
+- Users with less RAM, more dense nodes, or very fast disks may tolerate a higher ``bloom_filter_fp_chance`` in order to
+  save RAM at the expense of excess IO operations
+- In workloads that rarely read, or that only perform reads by scanning the entire data set (such as analytics
+  workloads), setting the ``bloom_filter_fp_chance`` to a much higher number is acceptable.
+
+Changing
+^^^^^^^^
+
+The bloom filter false positive chance is visible in the ``DESCRIBE TABLE`` output as the field
+``bloom_filter_fp_chance``. Operators can change the value with an ``ALTER TABLE`` statement:
+::
+
+    ALTER TABLE keyspace.table WITH bloom_filter_fp_chance=0.01
+
+Operators should be aware, however, that this change is not immediate: the bloom filter is calculated when the file is
+written, and persisted on disk as the Filter component of the SSTable. Upon issuing an ``ALTER TABLE`` statement, new
+files on disk will be written with the new ``bloom_filter_fp_chance``, but existing sstables will not be modified until
+they are compacted - if an operator needs a change to ``bloom_filter_fp_chance`` to take effect, they can trigger an
+SSTable rewrite using ``nodetool scrub`` or ``nodetool upgradesstables -a``, both of which will rebuild the sstables on
+disk, regenerating the bloom filters in the process.
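+
+For example, to force such a rewrite of every sstable of a hypothetical table (the keyspace and table names are
+placeholders)::
+
+    nodetool upgradesstables -a keyspace table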
diff --git a/doc/source/operating/cdc.rst b/doc/source/operating/cdc.rst
new file mode 100644
index 0000000..192f62a
--- /dev/null
+++ b/doc/source/operating/cdc.rst
@@ -0,0 +1,89 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: none
+
+Change Data Capture
+-------------------
+
+Overview
+^^^^^^^^
+
+Change data capture (CDC) provides a mechanism to flag specific tables for archival as well as rejecting writes to those
+tables once a configurable size-on-disk for the combined flushed and unflushed CDC-log is reached. An operator can
+enable CDC on a table by setting the table property ``cdc=true`` (either when :ref:`creating the table
+<create-table-statement>` or :ref:`altering it <alter-table-statement>`), after which any CommitLogSegments containing
+data for a CDC-enabled table are moved to the directory specified in ``cassandra.yaml`` on segment discard. A threshold
+of total disk space allowed is specified in the yaml; once it is reached, newly allocated CommitLogSegments will not
+allow CDC data until a consumer parses and removes data from the destination archival directory.
+
+Configuration
+^^^^^^^^^^^^^
+
+Enabling or disabling CDC on a table
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+CDC is enabled or disabled through the ``cdc`` table property, for instance::
+
+    CREATE TABLE foo (a int, b text, PRIMARY KEY(a)) WITH cdc=true;
+
+    ALTER TABLE foo WITH cdc=true;
+
+    ALTER TABLE foo WITH cdc=false;
+
+cassandra.yaml parameters
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The following ``cassandra.yaml`` options are available for CDC:
+
+``cdc_enabled`` (default: false)
+   Enable or disable CDC operations node-wide.
+``cdc_raw_directory`` (default: ``$CASSANDRA_HOME/data/cdc_raw``)
+   Destination for CommitLogSegments to be moved after all corresponding memtables are flushed.
+``cdc_free_space_in_mb`` (default: min of 4096 and 1/8th of the volume space)
+   Calculated as sum of all active CommitLogSegments that permit CDC + all flushed CDC segments in
+   ``cdc_raw_directory``.
+``cdc_free_space_check_interval_ms`` (default: 250)
+   When at capacity, we limit the frequency with which we re-calculate the space taken up by ``cdc_raw_directory`` to
+   prevent burning CPU cycles unnecessarily. Default is to check 4 times per second.
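+
+Put together, a ``cassandra.yaml`` snippet enabling CDC might look as follows (the directory path is only an
+illustration, and the remaining parameters described above can be set in the same way)::
+
+    cdc_enabled: true
+    cdc_raw_directory: /var/lib/cassandra/cdc_raw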
+
+.. _reading-commitlogsegments:
+
+Reading CommitLogSegments
+^^^^^^^^^^^^^^^^^^^^^^^^^
+This implementation included a refactor of CommitLogReplayer into `CommitLogReader.java
+<https://github.com/apache/cassandra/blob/e31e216234c6b57a531cae607e0355666007deb2/src/java/org/apache/cassandra/db/commitlog/CommitLogReader.java>`__.
+Usage is `fairly straightforward
+<https://github.com/apache/cassandra/blob/e31e216234c6b57a531cae607e0355666007deb2/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java#L132-L140>`__
+with a `variety of signatures
+<https://github.com/apache/cassandra/blob/e31e216234c6b57a531cae607e0355666007deb2/src/java/org/apache/cassandra/db/commitlog/CommitLogReader.java#L71-L103>`__
+available for use. In order to handle mutations read from disk, implement `CommitLogReadHandler
+<https://github.com/apache/cassandra/blob/e31e216234c6b57a531cae607e0355666007deb2/src/java/org/apache/cassandra/db/commitlog/CommitLogReadHandler.java>`__.
+
+Warnings
+^^^^^^^^
+
+**Do not enable CDC without some kind of consumption process in place.**
+
+The initial implementation of Change Data Capture does not include a parser (see :ref:`reading-commitlogsegments` above),
+so if CDC is enabled on a node and then on a table, the space allotted by ``cdc_free_space_in_mb`` will fill up and
+writes to CDC-enabled tables will then be rejected unless some consumption process is in place.
+
+Further Reading
+^^^^^^^^^^^^^^^
+
+- `Design doc <https://docs.google.com/document/d/1ZxCWYkeZTquxsvf5hdPc0fiUnUHna8POvgt6TIzML4Y/edit>`__
+- `JIRA ticket <https://issues.apache.org/jira/browse/CASSANDRA-8844>`__
diff --git a/doc/source/operating/compaction.rst b/doc/source/operating/compaction.rst
new file mode 100644
index 0000000..8d70a41
--- /dev/null
+++ b/doc/source/operating/compaction.rst
@@ -0,0 +1,432 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: none
+
+.. _compaction:
+
+Compaction
+----------
+
+Types of compaction
+^^^^^^^^^^^^^^^^^^^
+
+The concept of compaction is used for different kinds of operations in Cassandra; the common thing about these
+operations is that they take one or more sstables and output new sstables. The types of compactions are:
+
+Minor compaction
+    triggered automatically in Cassandra.
+Major compaction
+    a user executes a compaction over all sstables on the node.
+User defined compaction
+    a user triggers a compaction on a given set of sstables.
+Scrub
+    try to fix any broken sstables. This can actually remove valid data if that data is corrupted; if that happens you
+    will need to run a full repair on the node.
+Upgradesstables
+    upgrade sstables to the latest version. Run this after upgrading to a new major version.
+Cleanup
+    remove any ranges this node does not own anymore, typically triggered on neighbouring nodes after a node has been
+    bootstrapped since that node will take ownership of some ranges from those nodes.
+Secondary index rebuild
+    rebuild the secondary indexes on the node.
+Anticompaction
+    after repair the ranges that were actually repaired are split out of the sstables that existed when repair started.
+
+When is a minor compaction triggered?
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+#. When an sstable is added to the node through flushing/streaming etc.
+#. When autocompaction is enabled after being disabled (``nodetool enableautocompaction``)
+#. When compaction adds new sstables.
+#. Every 5 minutes, when the periodic check for new minor compactions runs.
+
+Merging sstables
+^^^^^^^^^^^^^^^^
+
+Compaction is about merging sstables; since partitions in sstables are sorted based on the hash of the partition key,
+it is possible to merge separate sstables efficiently. The content of each partition is also sorted, so each partition
+can be merged efficiently.
+
+Tombstones and Garbage Collection (GC) Grace
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Why Tombstones
+~~~~~~~~~~~~~~
+
+When a delete request is received by Cassandra it does not actually remove the data from the underlying store. Instead
+it writes a special piece of data known as a tombstone. The Tombstone represents the delete and causes all values which
+occurred before the tombstone to not appear in queries to the database. This approach is used instead of removing values
+because of the distributed nature of Cassandra.
+
+Deletes without tombstones
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Imagine a three node cluster which has the value [A] replicated to every node.::
+
+    [A], [A], [A]
+
+If one of the nodes fails and our delete operation only removes existing values, we can end up with a cluster that
+looks like::
+
+    [], [], [A]
+
+Then a repair operation would replace the value of [A] back onto the two
+nodes which are missing the value.::
+
+    [A], [A], [A]
+
+This would cause our data to be resurrected even though it had been
+deleted.
+
+Deletes with Tombstones
+~~~~~~~~~~~~~~~~~~~~~~~
+
+Starting again with a three node cluster which has the value [A] replicated to every node.::
+
+    [A], [A], [A]
+
+If instead of removing data we add a tombstone record, our single node failure situation will look like this.::
+
+    [A, Tombstone[A]], [A, Tombstone[A]], [A]
+
+Now when we issue a repair the Tombstone will be copied to the replica, rather than the deleted data being
+resurrected.::
+
+    [A, Tombstone[A]], [A, Tombstone[A]], [A, Tombstone[A]]
+
+Our repair operation will correctly put the state of the system to what we expect with the record [A] marked as deleted
+on all nodes. This does mean we will end up accruing tombstones, which would permanently consume disk space. To avoid
+keeping tombstones forever, we have a parameter known as ``gc_grace_seconds`` for every table in Cassandra.
+
+The gc_grace_seconds parameter and Tombstone Removal
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The table level ``gc_grace_seconds`` parameter controls how long Cassandra will retain tombstones through compaction
+events before finally removing them. This duration should directly reflect the amount of time a user expects to allow
+before recovering a failed node. After ``gc_grace_seconds`` has expired the tombstone may be removed (meaning there will
+no longer be any record that a certain piece of data was deleted), but as a tombstone can live in one sstable and the
+data it covers in another, a compaction must also include both sstables for a tombstone to be removed. More precisely,
+to be able to drop an actual tombstone the following needs to be true:
+
+- The tombstone must be older than ``gc_grace_seconds``
+- If partition X contains the tombstone, the sstable containing that partition plus all sstables containing data older
+  than the tombstone for X must be included in the same compaction. We don't need to care about an sstable if we can
+  guarantee that all data in that sstable is newer than the tombstone, since a tombstone that is older than the data
+  cannot shadow that data.
+- If the option ``only_purge_repaired_tombstones`` is enabled, tombstones are only removed if the data has also been
+  repaired.
+
+If a node remains down or disconnected for longer than ``gc_grace_seconds`` its deleted data will be repaired back to
+the other nodes and re-appear in the cluster. This is basically the same as in the "Deletes without Tombstones" section.
+Note that tombstones will not be removed until a compaction event even if ``gc_grace_seconds`` has elapsed.
+
+The default value for ``gc_grace_seconds`` is 864000, which is equivalent to 10 days. This can be set when creating or
+altering a table using ``WITH gc_grace_seconds``, for example as sketched below.
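+
+For instance, to lower the grace period to one day on a hypothetical table::
+
+    ALTER TABLE keyspace.table WITH gc_grace_seconds = 86400;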
+
+TTL
+^^^
+
+Data in Cassandra can have an additional property called time to live - this is used to automatically drop data that has
+expired once the time is reached. Once the TTL has expired the data is converted to a tombstone which stays around for
+at least ``gc_grace_seconds``. Note that if you mix data with TTL and data without TTL (or just different TTL lengths),
+Cassandra will have a hard time dropping the tombstones created, since the partition might span many sstables and not
+all of them are compacted at once.
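+
+As an illustration (the table, columns and values are hypothetical), a TTL can be set per write or as a table default::
+
+    INSERT INTO keyspace.table (id, value) VALUES (1, 'foo') USING TTL 86400;
+
+    ALTER TABLE keyspace.table WITH default_time_to_live = 86400;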
+
+Fully expired sstables
+^^^^^^^^^^^^^^^^^^^^^^
+
+If an sstable contains only tombstones and it is guaranteed that the sstable is not shadowing data in any other sstable,
+compaction can drop that sstable. If you see sstables with only tombstones (note that TTL'd data is considered a
+tombstone once the time to live has expired) but they are not being dropped by compaction, it is likely that other
+sstables contain older data. There is a tool called ``sstableexpiredblockers`` that will list which sstables are
+``TimeWindowCompactionStrategy`` (and the deprecated ``DateTieredCompactionStrategy``).
+
+Repaired/unrepaired data
+^^^^^^^^^^^^^^^^^^^^^^^^
+
+With incremental repairs Cassandra must keep track of what data is repaired and what data is unrepaired. With
+anticompaction, data is split into repaired and unrepaired sstables. To avoid mixing up the data again,
+separate compaction strategy instances are run on the two sets of data, each instance only knowing about either the
+repaired or the unrepaired sstables. This means that if you only run incremental repair once and then never again, you
+might have very old data in the repaired sstables that block compaction from dropping tombstones in the unrepaired
+(probably newer) sstables.
+
+Data directories
+^^^^^^^^^^^^^^^^
+
+Since tombstones and data can live in different sstables it is important to realize that losing an sstable might lead to
+data becoming live again - the most common way of losing sstables is to have a hard drive break down. To avoid making
+data live, tombstones and actual data are always kept in the same data directory. This way, if a disk is lost, all
+versions of a partition are lost and no data can get undeleted. To achieve this, a compaction strategy instance is run
+per data directory in addition to the compaction strategy instances containing repaired/unrepaired data; this means
+that if you have 4 data directories there will be 8 compaction strategy instances running. This has a few more benefits
+than just avoiding data getting undeleted:
+
+- It is possible to run more compactions in parallel - leveled compaction will have several totally separate levelings
+  and each one can run compactions independently from the others.
+- Users can backup and restore a single data directory.
+- Note though that currently all data directories are considered equal, so if you have a tiny disk and a big disk
+  backing two data directories, the big one will be limited by the small one. One workaround is to create
+  more data directories backed by the big disk.
+
+Single sstable tombstone compaction
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+When an sstable is written, a histogram of the tombstone expiry times is created and is used to find sstables with very
+many tombstones, so that a single sstable compaction can be run on such an sstable in the hope of being able to drop
+tombstones in it. Before starting this, it is also checked how likely it is that any tombstones will actually be
+droppable, based on how much this sstable overlaps with other sstables. To avoid most of these checks the compaction
+option ``unchecked_tombstone_compaction`` can be enabled.
+
+.. _compaction-options:
+
+Common options
+^^^^^^^^^^^^^^
+
+There are a number of common options for all the compaction strategies:
+
+``enabled`` (default: true)
+    Whether minor compactions should run. Note that you can have 'enabled': true as a compaction option and then do
+    'nodetool enableautocompaction' to start running compactions.
+``tombstone_threshold`` (default: 0.2)
+    How much of the sstable should be tombstones for us to consider doing a single sstable compaction of that sstable.
+``tombstone_compaction_interval`` (default: 86400s (1 day))
+    Since it might not be possible to drop any tombstones when doing a single sstable compaction we need to make sure
+    that one sstable is not constantly getting recompacted - this option states how often we should try for a given
+    sstable. 
+``log_all`` (default: false)
+    New detailed compaction logging, see :ref:`below <detailed-compaction-logging>`.
+``unchecked_tombstone_compaction`` (default: false)
+    The single sstable compaction has quite strict checks for whether it should be started, this option disables those
+    checks and for some usecases this might be needed.  Note that this does not change anything for the actual
+    compaction, tombstones are only dropped if it is safe to do so - it might just rewrite an sstable without being able
+    to drop any tombstones.
+``only_purge_repaired_tombstones`` (default: false)
+    Option to enable the extra safety of making sure that tombstones are only dropped if the data has been repaired.
+``min_threshold`` (default: 4)
+    Lower limit of number of sstables before a compaction is triggered. Not used for ``LeveledCompactionStrategy``.
+``max_threshold`` (default: 32)
+    Upper limit of number of sstables before a compaction is triggered. Not used for ``LeveledCompactionStrategy``.
+
+Further, see the section on each strategy for specific additional options.
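+
+As an illustration (the table name is a placeholder), common options are passed in the same ``compaction`` map as the
+strategy class itself::
+
+    ALTER TABLE keyspace.table
+        WITH compaction = {'class': 'SizeTieredCompactionStrategy', 'min_threshold': 6, 'tombstone_threshold': 0.3};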
+
+Compaction nodetool commands
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The :ref:`nodetool <nodetool>` utility provides a number of commands related to compaction:
+
+``enableautocompaction``
+    Enable compaction.
+``disableautocompaction``
+    Disable compaction.
+``setcompactionthroughput``
+    How fast compaction should run at most - defaults to 16MB/s, but note that it is likely not possible to reach this
+    throughput.
+``compactionstats``
+    Statistics about current and pending compactions.
+``compactionhistory``
+    List details about the last compactions.
+``setcompactionthreshold``
+    Set the min/max sstable count for when to trigger compaction, defaults to 4/32.
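+
+For example, an operator might inspect current compaction activity and then cap the compaction throughput at 64 MB/s::
+
+    nodetool compactionstats
+    nodetool setcompactionthroughput 64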
+
+Switching the compaction strategy and options using JMX
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+It is possible to switch compaction strategies and their options on just a single node using JMX; this is a great way
+to experiment with settings without affecting the whole cluster. The mbean is::
+
+    org.apache.cassandra.db:type=ColumnFamilies,keyspace=<keyspace_name>,columnfamily=<table_name>
+
+and the attribute to change is ``CompactionParameters`` or ``CompactionParametersJson`` if you use jconsole or jmc. The
+syntax for the json version is the same as you would use in an :ref:`ALTER TABLE <alter-table-statement>` statement -
+for example::
+
+    { 'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 123 }
+
+The setting is kept until someone executes an :ref:`ALTER TABLE <alter-table-statement>` that touches the compaction
+settings or restarts the node.
+
+.. _detailed-compaction-logging:
+
+More detailed compaction logging
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+Enable with the compaction option ``log_all`` and a more detailed compaction log file will be produced in your log
+directory.
+
+.. _STCS:
+
+Size Tiered Compaction Strategy
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The basic idea of ``SizeTieredCompactionStrategy`` (STCS) is to merge sstables of approximately the same size. All
+sstables are put in different buckets depending on their size. An sstable is added to a bucket if the size of the
+sstable is within ``bucket_low`` and ``bucket_high`` of the current average size of the sstables already in the bucket.
+This will create several buckets, and the most interesting of those buckets will be compacted. The most interesting one
+is decided by figuring out which bucket's sstables take the most reads.
+
+Major compaction
+~~~~~~~~~~~~~~~~
+
+When running a major compaction with STCS you will end up with two sstables per data directory (one for repaired data
+and one for unrepaired data). There is also an option (-s) to do a major compaction that splits the output into several
+sstables. The sizes of the sstables are approximately 50%, 25%, 12.5%... of the total size.
+
+.. _stcs-options:
+
+STCS options
+~~~~~~~~~~~~
+
+``min_sstable_size`` (default: 50MB)
+    Sstables smaller than this are put in the same bucket.
+``bucket_low`` (default: 0.5)
+    How much smaller than the average size of a bucket an sstable should be before not being included in the bucket. That
+    is, if ``bucket_low * avg_bucket_size < sstable_size`` (and the ``bucket_high`` condition holds, see below), then
+    the sstable is added to the bucket.
+``bucket_high`` (default: 1.5)
+    How much bigger than the average size of a bucket an sstable should be before not being included in the bucket. That
+    is, if ``sstable_size < bucket_high * avg_bucket_size`` (and the ``bucket_low`` condition holds, see above), then
+    the sstable is added to the bucket.
+
+Defragmentation
+~~~~~~~~~~~~~~~
+
+Defragmentation is done when many sstables are touched during a read.  The result of the read is put into the memtable
+so that the next read will not have to touch as many sstables. This can cause writes on a read-only cluster.
+
+.. _LCS:
+
+Leveled Compaction Strategy
+^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The idea of ``LeveledCompactionStrategy`` (LCS) is that all sstables are put into different levels where we guarantee
+that no overlapping sstables are in the same level. By overlapping we mean that the first/last tokens of a single
+sstable never overlap with those of other sstables. This means that for a SELECT we will only have to look for the
+partition key in a single sstable per level. Each level is 10x the size of the previous one and each sstable is 160MB
+by default. L0
+is where sstables are streamed/flushed - no overlap guarantees are given here.
+
+When picking compaction candidates we have to make sure that the compaction does not create overlap in the target level.
+This is done by always including all overlapping sstables in the next level. For example if we select an sstable in L3,
+we need to guarantee that we pick all overlapping sstables in L4 and make sure that no currently ongoing compactions
+will create overlap if we start that compaction. We can start many parallel compactions in a level if we guarantee that
+we won't create overlap. For L0 -> L1 compactions we almost always need to include all L1 sstables since most L0 sstables
+cover the full range. We also can't compact all L0 sstables with all L1 sstables in a single compaction since that can
+use too much memory.
+
+When deciding which level to compact, LCS checks the higher levels first (with LCS, a "higher" level is one with a higher
+number, L0 being the lowest one) and, if the level is behind, a compaction will be started in that level.
+
+Major compaction
+~~~~~~~~~~~~~~~~
+
+It is possible to do a major compaction with LCS - it will currently start by filling out L1 and then, once L1 is full,
+it continues with L2 etc. This is suboptimal and will change to create all the sstables in a high level instead, see
+CASSANDRA-11817.
+
+Bootstrapping
+~~~~~~~~~~~~~
+
+During bootstrap sstables are streamed from other nodes. The level of the remote sstable is kept to avoid many
+compactions after the bootstrap is done. During bootstrap the new node also takes writes while it is streaming the data
+from a remote node - these writes are flushed to L0 like all other writes and to avoid those sstables blocking the
+remote sstables from going to the correct level, we only do STCS in L0 until the bootstrap is done.
+
+STCS in L0
+~~~~~~~~~~
+
+If LCS gets very many L0 sstables, reads are going to hit all (or most) of the L0 sstables since they are likely to be
+overlapping. To more quickly remedy this, LCS does STCS compactions in L0 if there are more than 32 sstables there. This
+should improve read performance more quickly compared to letting LCS do its L0 -> L1 compactions. If you keep getting
+too many sstables in L0 it is likely that LCS is not the best fit for your workload and STCS could work out better.
+
+Starved sstables
+~~~~~~~~~~~~~~~~
+
+If a node ends up with a leveling where there are a few very high level sstables that are not getting compacted, they
+might make it impossible for lower levels to drop tombstones etc. For example, if there are sstables in L6 but there is
+only enough data to actually get an L4 on the node, the leftover sstables in L6 will get starved and not compacted. This
+can happen if a user changes ``sstable_size_in_mb`` from 5MB to 160MB, for example. To avoid this, LCS tries to include
+those starved high level sstables in other compactions if there have been 25 compaction rounds where the highest level
+has not been involved.
+
+.. _lcs-options:
+
+LCS options
+~~~~~~~~~~~
+
+``sstable_size_in_mb`` (default: 160MB)
+    The target compressed (if using compression) sstable size - the sstables can end up being larger if there are very
+    large partitions on the node.
+
+LCS also supports the ``cassandra.disable_stcs_in_l0`` startup option (``-Dcassandra.disable_stcs_in_l0=true``) to avoid
+doing STCS in L0.
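+
+As a sketch (the table name is a placeholder), LCS and its ``sstable_size_in_mb`` option can be set with
+``ALTER TABLE``::
+
+    ALTER TABLE keyspace.table
+        WITH compaction = {'class': 'LeveledCompactionStrategy', 'sstable_size_in_mb': 160};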
+
+.. _TWCS:
+
+Time Window Compaction Strategy
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+``TimeWindowCompactionStrategy`` (TWCS) is designed specifically for workloads where it's beneficial to have data on
+disk grouped by the timestamp of the data, a common goal when the workload is time-series in nature or when all data is
+written with a TTL. In an expiring/TTL workload, the contents of an entire SSTable likely expire at approximately the
+same time, allowing them to be dropped completely, and space reclaimed much more reliably than when using
+``SizeTieredCompactionStrategy`` or ``LeveledCompactionStrategy``. The basic concept is that
+``TimeWindowCompactionStrategy`` will create one sstable per window, where a window is simply calculated
+as the combination of two primary options:
+
+``compaction_window_unit`` (default: DAYS)
+    A Java TimeUnit (MINUTES, HOURS, or DAYS).
+``compaction_window_size`` (default: 1)
+    The number of units that make up a window.
+
+Taken together, the operator can specify windows of virtually any size, and ``TimeWindowCompactionStrategy`` will work to
+create a single sstable for writes within that window. For efficiency during writing, the newest window will be
+compacted using ``SizeTieredCompactionStrategy``.
+
+Ideally, operators should select a ``compaction_window_unit`` and ``compaction_window_size`` pair that produces
+approximately 20-30 windows - if writing with a 90 day TTL, for example, a 3 day window would be a reasonable choice
+(``'compaction_window_unit':'DAYS','compaction_window_size':3``).
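+
+For example (the table definition is hypothetical), such a time-series table could be created with::
+
+    CREATE TABLE keyspace.table (id int, time timestamp, value text, PRIMARY KEY (id, time))
+        WITH compaction = {'class': 'TimeWindowCompactionStrategy',
+                           'compaction_window_unit': 'DAYS',
+                           'compaction_window_size': 3};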
+
+TimeWindowCompactionStrategy Operational Concerns
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The primary motivation for TWCS is to separate data on disk by timestamp and to allow fully expired SSTables to drop
+more efficiently. One potential way this optimal behavior can be subverted is if data is written to SSTables out of
+order, with new data and old data in the same SSTable. Out of order data can appear in two ways:
+
+- If the user mixes old data and new data in the traditional write path, the data will be comingled in the memtables
+  and flushed into the same SSTable, where it will remain comingled.
+- If the user's read requests for old data cause read repairs that pull old data into the current memtable, that data
+  will be comingled and flushed into the same SSTable.
+
+While TWCS tries to minimize the impact of comingled data, users should attempt to avoid this behavior.  Specifically,
+users should avoid queries that explicitly set the timestamp via CQL ``USING TIMESTAMP``. Additionally, users should run
+frequent repairs (which streams data in such a way that it does not become comingled), and disable background read
+repair by setting the table's ``read_repair_chance`` and ``dclocal_read_repair_chance`` to 0.
+
+Changing TimeWindowCompactionStrategy Options
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Operators wishing to enable ``TimeWindowCompactionStrategy`` on existing data should consider running a major compaction
+first, placing all existing data into a single (old) window. Subsequent newer writes will then create typical SSTables
+as expected.
+
+Operators wishing to change ``compaction_window_unit`` or ``compaction_window_size`` can do so, but may trigger
+additional compactions as adjacent windows are joined together. If the window size is decreased (for example, from 24
+hours to 12 hours), then the existing SSTables will not be modified - TWCS cannot split existing SSTables into multiple
+windows.
diff --git a/doc/source/operating/compression.rst b/doc/source/operating/compression.rst
new file mode 100644
index 0000000..5876214
--- /dev/null
+++ b/doc/source/operating/compression.rst
@@ -0,0 +1,94 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: none
+
+Compression
+-----------
+
+Cassandra offers operators the ability to configure compression on a per-table basis. Compression reduces the size of
+data on disk by compressing each SSTable in chunks of a user-configurable size (``chunk_length_in_kb``). Because Cassandra
+SSTables are immutable, the CPU cost of compressing is only necessary when the SSTable is written - subsequent updates
+to data will land in different SSTables, so Cassandra will not need to decompress, overwrite, and recompress data when
+UPDATE commands are issued. On reads, Cassandra will locate the relevant compressed chunks on disk, decompress the full
+chunk, and then proceed with the remainder of the read path (merging data from disks and memtables, read repair, and so
+on).
+
+Configuring Compression
+^^^^^^^^^^^^^^^^^^^^^^^
+
+Compression is configured on a per-table basis as an optional argument to ``CREATE TABLE`` or ``ALTER TABLE``. By
+default, three options are relevant:
+
+- ``class`` specifies the compression class - Cassandra provides three classes (``LZ4Compressor``,
+  ``SnappyCompressor``, and ``DeflateCompressor``). The default is ``LZ4Compressor``.
+- ``chunk_length_in_kb`` specifies the number of kilobytes of data per compression chunk. The default is 64KB.
+- ``crc_check_chance`` determines how likely Cassandra is to verify the checksum on each compression chunk during
+  reads. The default is 1.0.
+
+Users can set compression using the following syntax:
+
+::
+
+    CREATE TABLE keyspace.table (id int PRIMARY KEY) WITH compression = {'class': 'LZ4Compressor'};
+
+Or
+
+::
+
+    ALTER TABLE keyspace.table WITH compression = {'class': 'SnappyCompressor', 'chunk_length_in_kb': 128, 'crc_check_chance': 0.5};
+
+Once enabled, compression can be disabled with ``ALTER TABLE`` setting ``enabled`` to ``false``:
+
+::
+
+    ALTER TABLE keyspace.table WITH compression = {'enabled':'false'};
+
+Operators should be aware, however, that changing compression is not immediate. The data is compressed when the SSTable
+is written, and as SSTables are immutable, the compression will not be modified until the table is compacted. Upon
+issuing a change to the compression options via ``ALTER TABLE``, the existing SSTables will not be modified until they
+are compacted - if an operator needs compression changes to take effect immediately, the operator can trigger an SSTable
+rewrite using ``nodetool scrub`` or ``nodetool upgradesstables -a``, both of which will rebuild the SSTables on disk,
+re-compressing the data in the process.
+
+Benefits and Uses
+^^^^^^^^^^^^^^^^^
+
+Compression's primary benefit is that it reduces the amount of data written to disk. Not only does the reduced size save
+in storage requirements, it often increases read and write throughput, as the CPU time spent compressing data is less
+than the time it would take to read or write the larger volume of uncompressed data from disk.
+
+Compression is most useful in tables comprised of many rows, where the rows are similar in nature. Tables containing
+similar text columns (such as repeated JSON blobs) often compress very well.
+
+Operational Impact
+^^^^^^^^^^^^^^^^^^
+
+- Compression metadata is stored off-heap and scales with data on disk.  This often requires 1-3GB of off-heap RAM per
+  terabyte of data on disk, though the exact usage varies with ``chunk_length_in_kb`` and compression ratios.
+
+- Streaming operations involve compressing and decompressing data on compressed tables - in some code paths (such as
+  non-vnode bootstrap), the CPU overhead of compression can be a limiting factor.
+
+- The compression path checksums data to ensure correctness - while the traditional Cassandra read path does not have a
+  way to ensure correctness of data on disk, compressed tables allow the user to set ``crc_check_chance`` (a float from
+  0.0 to 1.0) to allow Cassandra to probabilistically validate chunks on read to verify bits on disk are not corrupt.
+
+Advanced Use
+^^^^^^^^^^^^
+
+Advanced users can provide their own compression class by implementing the interface at
+``org.apache.cassandra.io.compress.ICompressor``.
diff --git a/doc/source/operating/hardware.rst b/doc/source/operating/hardware.rst
new file mode 100644
index 0000000..ad3aa8d
--- /dev/null
+++ b/doc/source/operating/hardware.rst
@@ -0,0 +1,87 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+Hardware Choices
+----------------
+
+Like most databases, Cassandra throughput improves with more CPU cores, more RAM, and faster disks. While Cassandra can
+be made to run on small servers for testing or development environments (including Raspberry Pis), a minimal production
+server requires at least 2 cores, and at least 8GB of RAM. Typical production servers have 8 or more cores and at least
+32GB of RAM.
+
+CPU
+^^^
+Cassandra is highly concurrent, handling many simultaneous requests (both read and write) using multiple threads running
+on as many CPU cores as possible. The Cassandra write path tends to be heavily optimized (writing to the commitlog and
+then inserting the data into the memtable), so writes, in particular, tend to be CPU bound. Consequently, adding
+additional CPU cores often increases throughput of both reads and writes.
+
+Memory
+^^^^^^
+Cassandra runs within a Java VM, which will pre-allocate a fixed size heap (Java's ``-Xmx`` parameter). In addition to
+the heap, Cassandra will use significant amounts of RAM off-heap for compression metadata, bloom filters, row, key, and
+counter caches, and an in-process page cache. Finally, Cassandra will take advantage of the operating system's page
+cache, storing recently accessed portions of files in RAM for rapid re-use.
+
+For optimal performance, operators should benchmark and tune their clusters based on their individual workload. However,
+basic guidelines suggest:
+
+-  ECC RAM should always be used, as Cassandra has few internal safeguards to protect against bit level corruption
+-  The Cassandra heap should be no less than 2GB, and no more than 50% of your system RAM
+-  Heaps smaller than 12GB should consider ParNew/ConcurrentMarkSweep garbage collection
+-  Heaps larger than 12GB should consider G1GC
+
+Disks
+^^^^^
+Cassandra persists data to disk for two very different purposes. The first is to the commitlog when a new write is made
+so that it can be replayed after a crash or system shutdown. The second is to the data directory when thresholds are
+exceeded and memtables are flushed to disk as SSTables.
+
+Commitlogs receive every write made to a Cassandra node and have the potential to block client operations, but they are
+only ever read on node start-up. SSTable (data file) writes on the other hand occur asynchronously, but are read to
+satisfy client look-ups. SSTables are also periodically merged and rewritten in a process called compaction.  The data
+held in the commitlog directory is data that has not been permanently saved to the SSTable data directories - it will be
+periodically purged once it is flushed to the SSTable data files.
+
+Cassandra performs very well on both spinning hard drives and solid state disks. In both cases, Cassandra's sorted
+immutable SSTables allow for linear reads, few seeks, and few overwrites, maximizing throughput for HDDs and lifespan of
+SSDs by avoiding write amplification. However, when using spinning disks, it's important that the commitlog
+(``commitlog_directory``) be on one physical disk (not simply a partition, but a physical disk), and the data files
+(``data_file_directories``) be set to a separate physical disk. By separating the commitlog from the data directory,
+writes can benefit from sequential appends to the commitlog without having to seek around the platter as reads request
+data from various SSTables on disk.
+
+In most cases, Cassandra is designed to provide redundancy via multiple independent, inexpensive servers. For this
+reason, using NFS or a SAN for data directories is an antipattern and should typically be avoided.  Similarly, servers
+with multiple disks are often better served by using RAID0 or JBOD than RAID1 or RAID5 - replication provided by
+Cassandra obsoletes the need for replication at the disk layer, so it's typically recommended that operators take
+advantage of the additional throughput of RAID0 rather than protecting against failures with RAID1 or RAID5.
+
+Common Cloud Choices
+^^^^^^^^^^^^^^^^^^^^
+
+Many large users of Cassandra run in various clouds, including AWS, Azure, and GCE - Cassandra will happily run in any
+of these environments. Users should choose similar hardware to what would be needed in physical space. In EC2, popular
+options include:
+
+- m1.xlarge instances, which provide 1.6TB of local ephemeral spinning storage and sufficient RAM to run moderate
+  workloads
+- i2 instances, which provide both a high RAM:CPU ratio and local ephemeral SSDs
+- m4.2xlarge / c4.4xlarge instances, which provide modern CPUs, enhanced networking and work well with EBS GP2 (SSD)
+  storage
+
+Generally, disk and network performance increases with instance size and generation, so newer generations of instances
+and larger instance types within each family often perform better than their smaller or older alternatives.
diff --git a/doc/source/operating/hints.rst b/doc/source/operating/hints.rst
new file mode 100644
index 0000000..f79f18a
--- /dev/null
+++ b/doc/source/operating/hints.rst
@@ -0,0 +1,22 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: none
+
+Hints
+-----
+
+.. todo:: todo
diff --git a/doc/source/operating/index.rst b/doc/source/operating/index.rst
new file mode 100644
index 0000000..6fc27c8
--- /dev/null
+++ b/doc/source/operating/index.rst
@@ -0,0 +1,38 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: none
+
+Operating Cassandra
+===================
+
+.. toctree::
+   :maxdepth: 2
+
+   snitch
+   topo_changes
+   repair
+   read_repair
+   hints
+   compaction
+   bloom_filters
+   compression
+   cdc
+   backups
+   metrics
+   security
+   hardware
+
diff --git a/doc/source/operating/metrics.rst b/doc/source/operating/metrics.rst
new file mode 100644
index 0000000..5884cad
--- /dev/null
+++ b/doc/source/operating/metrics.rst
@@ -0,0 +1,619 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: none
+
+Monitoring
+----------
+
+Metrics in Cassandra are managed using the `Dropwizard Metrics <http://metrics.dropwizard.io>`__ library. These metrics
+can be queried via JMX or pushed to external monitoring systems using a number of `built in
+<http://metrics.dropwizard.io/3.1.0/getting-started/#other-reporting>`__ and `third party
+<http://metrics.dropwizard.io/3.1.0/manual/third-party/>`__ reporter plugins.
+
+Metrics are collected for a single node. It's up to the operator to use an external monitoring system to aggregate them.
+
+Metric Types
+^^^^^^^^^^^^
+All metrics reported by Cassandra fit into one of the following types.
+
+``Gauge``
+    An instantaneous measurement of a value.
+
+``Counter``
+    A gauge for an ``AtomicLong`` instance. Typically this is consumed by monitoring the change since the last call to
+    see if there is a large increase compared to the norm.
+
+``Histogram``
+    Measures the statistical distribution of values in a stream of data.
+
+    In addition to minimum, maximum, mean, etc., it also measures median, 75th, 90th, 95th, 98th, 99th, and 99.9th
+    percentiles.
+
+``Timer``
+    Measures both the rate that a particular piece of code is called and the histogram of its duration.
+
+``Latency``
+    Special type that tracks latency (in microseconds) with a ``Timer`` plus a ``Counter`` that tracks the total latency
+    accrued since starting. The latter is useful if you track the change in total latency since the last check. Each
+    metric name of this type will have 'Latency' and 'TotalLatency' appended to it.
+
+``Meter``
+    A meter metric which measures mean throughput and one-, five-, and fifteen-minute exponentially-weighted moving
+    average throughputs.
+
+Table Metrics
+^^^^^^^^^^^^^
+
+Each table in Cassandra has metrics responsible for tracking its state and performance.
+
+The metric names are all appended with the specific ``Keyspace`` and ``Table`` name.
+
+Reported name format:
+
+**Metric Name**
+    ``org.apache.cassandra.metrics.Table.{{MetricName}}.{{Keyspace}}.{{Table}}``
+
+**JMX MBean**
+    ``org.apache.cassandra.metrics:type=Table keyspace={{Keyspace}} scope={{Table}} name={{MetricName}}``
+
+.. NOTE::
+    There is a special table called '``all``' without a keyspace. This represents the aggregation of metrics across
+    **all** tables and keyspaces on the node.
+
+
+======================================= ============== ===========
+Name                                    Type           Description
+======================================= ============== ===========
+MemtableOnHeapSize                      Gauge<Long>    Total amount of data stored in the memtable that resides **on**-heap, including column related overhead and partitions overwritten.
+MemtableOffHeapSize                     Gauge<Long>    Total amount of data stored in the memtable that resides **off**-heap, including column related overhead and partitions overwritten.
+MemtableLiveDataSize                    Gauge<Long>    Total amount of live data stored in the memtable, excluding any data structure overhead.
+AllMemtablesOnHeapSize                  Gauge<Long>    Total amount of data stored in the memtables (2i and pending flush memtables included) that resides **on**-heap.
+AllMemtablesOffHeapSize                 Gauge<Long>    Total amount of data stored in the memtables (2i and pending flush memtables included) that resides **off**-heap.
+AllMemtablesLiveDataSize                Gauge<Long>    Total amount of live data stored in the memtables (2i and pending flush memtables included) that resides off-heap, excluding any data structure overhead.
+MemtableColumnsCount                    Gauge<Long>    Total number of columns present in the memtable.
+MemtableSwitchCount                     Counter        Number of times flush has resulted in the memtable being switched out.
+CompressionRatio                        Gauge<Double>  Current compression ratio for all SSTables.
+EstimatedPartitionSizeHistogram         Gauge<long[]>  Histogram of estimated partition size (in bytes).
+EstimatedPartitionCount                 Gauge<Long>    Approximate number of keys in table.
+EstimatedColumnCountHistogram           Gauge<long[]>  Histogram of estimated number of columns.
+SSTablesPerReadHistogram                Histogram      Histogram of the number of sstable data files accessed per read.
+ReadLatency                             Latency        Local read latency for this table.
+RangeLatency                            Latency        Local range scan latency for this table.
+WriteLatency                            Latency        Local write latency for this table.
+CoordinatorReadLatency                  Timer          Coordinator read latency for this table.
+CoordinatorScanLatency                  Timer          Coordinator range scan latency for this table.
+PendingFlushes                          Counter        Estimated number of flush tasks pending for this table.
+BytesFlushed                            Counter        Total number of bytes flushed since server [re]start.
+CompactionBytesWritten                  Counter        Total number of bytes written by compaction since server [re]start.
+PendingCompactions                      Gauge<Integer> Estimate of number of pending compactions for this table.
+LiveSSTableCount                        Gauge<Integer> Number of SSTables on disk for this table.
+LiveDiskSpaceUsed                       Counter        Disk space used by SSTables belonging to this table (in bytes).
+TotalDiskSpaceUsed                      Counter        Total disk space used by SSTables belonging to this table, including obsolete ones waiting to be GC'd.
+MinPartitionSize                        Gauge<Long>    Size of the smallest compacted partition (in bytes).
+MaxPartitionSize                        Gauge<Long>    Size of the largest compacted partition (in bytes).
+MeanPartitionSize                       Gauge<Long>    Size of the average compacted partition (in bytes).
+BloomFilterFalsePositives               Gauge<Long>    Number of false positives on table's bloom filter.
+BloomFilterFalseRatio                   Gauge<Double>  False positive ratio of table's bloom filter.
+BloomFilterDiskSpaceUsed                Gauge<Long>    Disk space used by bloom filter (in bytes).
+BloomFilterOffHeapMemoryUsed            Gauge<Long>    Off-heap memory used by bloom filter.
+IndexSummaryOffHeapMemoryUsed           Gauge<Long>    Off-heap memory used by index summary.
+CompressionMetadataOffHeapMemoryUsed    Gauge<Long>    Off-heap memory used by compression meta data.
+KeyCacheHitRate                         Gauge<Double>  Key cache hit rate for this table.
+TombstoneScannedHistogram               Histogram      Histogram of tombstones scanned in queries on this table.
+LiveScannedHistogram                    Histogram      Histogram of live cells scanned in queries on this table.
+ColUpdateTimeDeltaHistogram             Histogram      Histogram of column update time delta on this table.
+ViewLockAcquireTime                     Timer          Time taken acquiring a partition lock for materialized view updates on this table.
+ViewReadTime                            Timer          Time taken during the local read of a materialized view update.
+TrueSnapshotsSize                       Gauge<Long>    Disk space used by snapshots of this table including all SSTable components.
+RowCacheHitOutOfRange                   Counter        Number of table row cache hits that do not satisfy the query filter, thus went to disk.
+RowCacheHit                             Counter        Number of table row cache hits.
+RowCacheMiss                            Counter        Number of table row cache misses.
+CasPrepare                              Latency        Latency of paxos prepare round.
+CasPropose                              Latency        Latency of paxos propose round.
+CasCommit                               Latency        Latency of paxos commit round.
+PercentRepaired                         Gauge<Double>  Percent of table data that is repaired on disk.
+SpeculativeRetries                      Counter        Number of times speculative retries were sent for this table.
+WaitingOnFreeMemtableSpace              Histogram      Histogram of time spent waiting for free memtable space, either on- or off-heap.
+DroppedMutations                        Counter        Number of dropped mutations on this table.
+======================================= ============== ===========
+
+Keyspace Metrics
+^^^^^^^^^^^^^^^^
+Each keyspace in Cassandra has metrics responsible for tracking its state and performance.
+
+These metrics are the same as the ``Table Metrics`` above, only they are aggregated at the Keyspace level.
+
+Reported name format:
+
+**Metric Name**
+    ``org.apache.cassandra.metrics.keyspace.{{MetricName}}.{{Keyspace}}``
+
+**JMX MBean**
+    ``org.apache.cassandra.metrics:type=Keyspace scope={{Keyspace}} name={{MetricName}}``
+
+ThreadPool Metrics
+^^^^^^^^^^^^^^^^^^
+
+Cassandra splits work of a particular type into its own thread pool.  This provides back-pressure and asynchrony for
+requests on a node.  It's important to monitor the state of these thread pools since they can tell you how saturated a
+node is.
+
+The metric names are all appended with the specific ``ThreadPool`` name.  The thread pools are also categorized under a
+specific type.
+
+Reported name format:
+
+**Metric Name**
+    ``org.apache.cassandra.metrics.ThreadPools.{{MetricName}}.{{Path}}.{{ThreadPoolName}}``
+
+**JMX MBean**
+    ``org.apache.cassandra.metrics:type=ThreadPools scope={{ThreadPoolName}} type={{Type}} name={{MetricName}}``
+
+===================== ============== ===========
+Name                  Type           Description
+===================== ============== ===========
+ActiveTasks           Gauge<Integer> Number of tasks being actively worked on by this pool.
+PendingTasks          Gauge<Integer> Number of tasks queued up on this pool.
+CompletedTasks        Counter        Number of tasks completed.
+TotalBlockedTasks     Counter        Number of tasks that were blocked due to queue saturation.
+CurrentlyBlockedTask  Counter        Number of tasks that are currently blocked due to queue saturation but on retry will become unblocked.
+MaxPoolSize           Gauge<Integer> The maximum number of threads in this pool.
+===================== ============== ===========
+
+The following thread pools can be monitored.
+
+============================ ============== ===========
+Name                         Type           Description
+============================ ============== ===========
+Native-Transport-Requests    transport      Handles client CQL requests
+CounterMutationStage         request        Responsible for counter writes
+ViewMutationStage            request        Responsible for materialized view writes
+MutationStage                request        Responsible for all other writes
+ReadRepairStage              request        ReadRepair happens on this thread pool
+ReadStage                    request        Local reads run on this thread pool
+RequestResponseStage         request        Coordinator requests to the cluster run on this thread pool
+AntiEntropyStage             internal       Builds merkle tree for repairs
+CacheCleanupExecutor         internal       Cache maintenance performed on this thread pool
+CompactionExecutor           internal       Compactions are run on these threads
+GossipStage                  internal       Handles gossip requests
+HintsDispatcher              internal       Performs hinted handoff
+InternalResponseStage        internal       Responsible for intra-cluster callbacks
+MemtableFlushWriter          internal       Writes memtables to disk
+MemtablePostFlush            internal       Cleans up commit log after memtable is written to disk
+MemtableReclaimMemory        internal       Memtable recycling
+MigrationStage               internal       Runs schema migrations
+MiscStage                    internal       Miscellaneous tasks run here
+PendingRangeCalculator       internal       Calculates token range
+PerDiskMemtableFlushWriter_0 internal       Responsible for flushing memtable contents to a specific data directory (there is one of these per disk, numbered 0-N)
+Sampler                      internal       Responsible for re-sampling the index summaries of SSTables
+SecondaryIndexManagement     internal       Performs updates to secondary indexes
+ValidationExecutor           internal       Performs validation compaction or scrubbing
+============================ ============== ===========
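+
+The same per-pool numbers can also be checked quickly from the command line; for example::
+
+    nodetool tpstats
+
+prints the active, pending, completed and blocked task counts for each of the pools listed above.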
+
+.. |nbsp| unicode:: 0xA0 .. nonbreaking space
+
+Client Request Metrics
+^^^^^^^^^^^^^^^^^^^^^^
+
+Client requests have their own set of metrics that encapsulate the work happening at the coordinator level.
+
+Different types of client requests are broken down by ``RequestType``.
+
+Reported name format:
+
+**Metric Name**
+    ``org.apache.cassandra.metrics.ClientRequest.{{MetricName}}.{{RequestType}}``
+
+**JMX MBean**
+    ``org.apache.cassandra.metrics:type=ClientRequest scope={{RequestType}} name={{MetricName}}``
+
+
+:RequestType: CASRead
+:Description: Metrics related to transactional read requests.
+:Metrics:
+    ===================== ============== =============================================================
+    Name                  Type           Description
+    ===================== ============== =============================================================
+    Timeouts              Counter        Number of timeouts encountered.
+    Failures              Counter        Number of transaction failures encountered.
+    |nbsp|                Latency        Transaction read latency.
+    Unavailables          Counter        Number of unavailable exceptions encountered.
+    UnfinishedCommit      Counter        Number of transactions that were committed on read.
+    ConditionNotMet       Counter        Number of transactions whose preconditions did not match the current values.
+    ContentionHistogram   Histogram      Histogram of how many contended reads were encountered.
+    ===================== ============== =============================================================
+
+:RequestType: CASWrite
+:Description: Metrics related to transactional write requests.
+:Metrics:
+    ===================== ============== =============================================================
+    Name                  Type           Description
+    ===================== ============== =============================================================
+    Timeouts              Counter        Number of timeouts encountered.
+    Failures              Counter        Number of transaction failures encountered.
+    |nbsp|                Latency        Transaction write latency.
+    UnfinishedCommit      Counter        Number of transactions that were committed on write.
+    ConditionNotMet       Counter        Number of transactions whose preconditions did not match the current values.
+    ContentionHistogram   Histogram      Histogram of how many contended writes were encountered.
+    ===================== ============== =============================================================
+
+
+:RequestType: Read
+:Description: Metrics related to standard read requests.
+:Metrics:
+    ===================== ============== =============================================================
+    Name                  Type           Description
+    ===================== ============== =============================================================
+    Timeouts              Counter        Number of timeouts encountered.
+    Failures              Counter        Number of read failures encountered.
+    |nbsp|                Latency        Read latency.
+    Unavailables          Counter        Number of unavailable exceptions encountered.
+    ===================== ============== =============================================================
+
+:RequestType: RangeSlice
+:Description: Metrics related to token range read requests.
+:Metrics:
+    ===================== ============== =============================================================
+    Name                  Type           Description
+    ===================== ============== =============================================================
+    Timeouts              Counter        Number of timeouts encountered.
+    Failures              Counter        Number of range query failures encountered.
+    |nbsp|                Latency        Range query latency.
+    Unavailables          Counter        Number of unavailable exceptions encountered.
+    ===================== ============== =============================================================
+
+:RequestType: Write
+:Description: Metrics related to regular write requests.
+:Metrics:
+    ===================== ============== =============================================================
+    Name                  Type           Description
+    ===================== ============== =============================================================
+    Timeouts              Counter        Number of timeouts encountered.
+    Failures              Counter        Number of write failures encountered.
+    |nbsp|                Latency        Write latency.
+    Unavailables          Counter        Number of unavailable exceptions encountered.
+    ===================== ============== =============================================================
+
+
+:RequestType: ViewWrite
+:Description: Metrics related to materialized view writes.
+:Metrics:
+    ===================== ============== =============================================================
+    Name                  Type           Description
+    ===================== ============== =============================================================
+    Timeouts              Counter        Number of timeouts encountered.
+    Failures              Counter        Number of view write failures encountered.
+    Unavailables          Counter        Number of unavailable exceptions encountered.
+    ViewReplicasAttempted Counter        Total number of attempted view replica writes.
+    ViewReplicasSuccess   Counter        Total number of succeeded view replica writes.
+    ViewPendingMutations  Gauge<Long>    ViewReplicasAttempted - ViewReplicasSuccess.
+    ViewWriteLatency      Timer          Time between when mutation is applied to base table and when CL.ONE is achieved on view.
+    ===================== ============== =============================================================
+
+Cache Metrics
+^^^^^^^^^^^^^
+
+Cassandra caches have metrics to track their effectiveness, though the ``Table Metrics`` might be more useful in practice.
+
+Reported name format:
+
+**Metric Name**
+    ``org.apache.cassandra.metrics.Cache.{{MetricName}}.{{CacheName}}``
+
+**JMX MBean**
+    ``org.apache.cassandra.metrics:type=Cache scope={{CacheName}} name={{MetricName}}``
+
+========================== ============== ===========
+Name                       Type           Description
+========================== ============== ===========
+Capacity                   Gauge<Long>    Cache capacity in bytes.
+Entries                    Gauge<Integer> Total number of cache entries.
+FifteenMinuteCacheHitRate  Gauge<Double>  15m cache hit rate.
+FiveMinuteCacheHitRate     Gauge<Double>  5m cache hit rate.
+OneMinuteCacheHitRate      Gauge<Double>  1m cache hit rate.
+HitRate                    Gauge<Double>  All time cache hit rate.
+Hits                       Meter          Total number of cache hits.
+Misses                     Meter          Total number of cache misses.
+MissLatency                Timer          Latency of misses.
+Requests                   Gauge<Long>    Total number of cache requests.
+Size                       Gauge<Long>    Total size of occupied cache, in bytes.
+========================== ============== ===========
+
+The following caches are covered:
+
+============================ ===========
+Name                         Description
+============================ ===========
+CounterCache                 Keeps hot counters in memory for performance.
+ChunkCache                   In-process uncompressed page cache.
+KeyCache                     Cache for partition to sstable offsets.
+RowCache                     Cache for rows kept in memory.
+============================ ===========
+
+.. NOTE::
+    Misses and MissLatency are only defined for the ChunkCache.
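+
+As a quick check from the command line, a summary of the key and row cache sizes and hit rates (among other node-level
+information) is also printed by::
+
+    nodetool info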
+
+CQL Metrics
+^^^^^^^^^^^
+
+Metrics specific to CQL prepared statement caching.
+
+Reported name format:
+
+**Metric Name**
+    ``org.apache.cassandra.metrics.CQL.{{MetricName}}``
+
+**JMX MBean**
+    ``org.apache.cassandra.metrics:type=CQL name={{MetricName}}``
+
+========================== ============== ===========
+Name                       Type           Description
+========================== ============== ===========
+PreparedStatementsCount    Gauge<Integer> Number of cached prepared statements.
+PreparedStatementsEvicted  Counter        Number of prepared statements evicted from the prepared statement cache.
+PreparedStatementsExecuted Counter        Number of prepared statements executed.
+RegularStatementsExecuted  Counter        Number of **non** prepared statements executed.
+PreparedStatementsRatio    Gauge<Double>  Percentage of statements that are prepared vs unprepared.
+========================== ============== ===========
+
+
+DroppedMessage Metrics
+^^^^^^^^^^^^^^^^^^^^^^
+
+Metrics specific to tracking dropped messages for different types of requests.
+Dropped writes are stored and retried by ``Hinted Handoff``.
+
+Reported name format:
+
+**Metric Name**
+    ``org.apache.cassandra.metrics.DroppedMessages.{{MetricName}}.{{Type}}``
+
+**JMX MBean**
+    ``org.apache.cassandra.metrics:type=DroppedMessage scope={{Type}} name={{MetricName}}``
+
+========================== ============== ===========
+Name                       Type           Description
+========================== ============== ===========
+CrossNodeDroppedLatency    Timer          The dropped latency across nodes.
+InternalDroppedLatency     Timer          The dropped latency within node.
+Dropped                    Meter          Number of dropped messages.
+========================== ============== ===========
+
+The different types of messages tracked are:
+
+============================ ===========
+Name                         Description
+============================ ===========
+BATCH_STORE                  Batchlog write
+BATCH_REMOVE                 Batchlog cleanup (after successfully applied)
+COUNTER_MUTATION             Counter writes
+HINT                         Hint replay
+MUTATION                     Regular writes
+READ                         Regular reads
+READ_REPAIR                  Read repair
+PAGED_SLICE                  Paged read
+RANGE_SLICE                  Token range read
+REQUEST_RESPONSE             RPC Callbacks
+_TRACE                       Tracing writes
+============================ ===========
+
+Streaming Metrics
+^^^^^^^^^^^^^^^^^
+
+Metrics reported during ``Streaming`` operations, such as repair, bootstrap, rebuild.
+
+These metrics are specific to a peer endpoint, with the source node being the node you are pulling the metrics from.
+
+Reported name format:
+
+**Metric Name**
+    ``org.apache.cassandra.metrics.Streaming.{{MetricName}}.{{PeerIP}}``
+
+**JMX MBean**
+    ``org.apache.cassandra.metrics:type=Streaming scope={{PeerIP}} name={{MetricName}}``
+
+========================== ============== ===========
+Name                       Type           Description
+========================== ============== ===========
+IncomingBytes              Counter        Number of bytes streamed to this node from the peer.
+OutgoingBytes              Counter        Number of bytes streamed to the peer endpoint from this node.
+========================== ============== ===========
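+
+Progress of in-flight streams (for example during a repair or bootstrap) can also be inspected from the command line
+with::
+
+    nodetool netstats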
+
+
+Compaction Metrics
+^^^^^^^^^^^^^^^^^^
+
+Metrics specific to ``Compaction`` work.
+
+Reported name format:
+
+**Metric Name**
+    ``org.apache.cassandra.metrics.Compaction.{{MetricName}}``
+
+**JMX MBean**
+    ``org.apache.cassandra.metrics:type=Compaction name={{MetricName}}``
+
+========================== ======================================== ===============================================
+Name                       Type                                     Description
+========================== ======================================== ===============================================
+BytesCompacted             Counter                                  Total number of bytes compacted since server [re]start.
+PendingTasks               Gauge<Integer>                           Estimated number of compactions remaining to perform.
+CompletedTasks             Gauge<Long>                              Number of completed compactions since server [re]start.
+TotalCompactionsCompleted  Meter                                    Throughput of completed compactions since server [re]start.
+PendingTasksByTableName    Gauge<Map<String, Map<String, Integer>>> Estimated number of compactions remaining to perform, grouped by keyspace and then table name. This info is also kept in ``Table Metrics``.
+========================== ======================================== ===============================================
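+
+The pending and active compaction counts reported here can also be checked from the command line with::
+
+    nodetool compactionstats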
+
+CommitLog Metrics
+^^^^^^^^^^^^^^^^^
+
+Metrics specific to the ``CommitLog``
+
+Reported name format:
+
+**Metric Name**
+    ``org.apache.cassandra.metrics.CommitLog.{{MetricName}}``
+
+**JMX MBean**
+    ``org.apache.cassandra.metrics:type=CommitLog name={{MetricName}}``
+
+========================== ============== ===========
+Name                       Type           Description
+========================== ============== ===========
+CompletedTasks             Gauge<Long>    Total number of commit log messages written since [re]start.
+PendingTasks               Gauge<Long>    Number of commit log messages written but yet to be fsync'd.
+TotalCommitLogSize         Gauge<Long>    Current size, in bytes, used by all the commit log segments.
+WaitingOnSegmentAllocation Timer          Time spent waiting for a CommitLogSegment to be allocated - under normal conditions this should be zero.
+WaitingOnCommit            Timer          The time spent waiting on CL fsync; for Periodic this only occurs when the sync is lagging its sync interval.
+========================== ============== ===========
+
+Storage Metrics
+^^^^^^^^^^^^^^^
+
+Metrics specific to the storage engine.
+
+Reported name format:
+
+**Metric Name**
+    ``org.apache.cassandra.metrics.Storage.{{MetricName}}``
+
+**JMX MBean**
+    ``org.apache.cassandra.metrics:type=Storage name={{MetricName}}``
+
+========================== ============== ===========
+Name                       Type           Description
+========================== ============== ===========
+Exceptions                 Counter        Number of internal exceptions caught. Under normal conditions this should be zero.
+Load                       Counter        Size, in bytes, of the on-disk data this node manages.
+TotalHints                 Counter        Number of hint messages written to this node since [re]start. Includes one entry for each host to be hinted per hint.
+TotalHintsInProgress       Counter        Number of hints currently attempting to be sent.
+========================== ============== ===========
+
+HintedHandoff Metrics
+^^^^^^^^^^^^^^^^^^^^^
+
+Metrics specific to Hinted Handoff.  There are also some metrics related to hints tracked in ``Storage Metrics``.
+
+These metrics include the peer endpoint **in the metric name**.
+
+Reported name format:
+
+**Metric Name**
+    ``org.apache.cassandra.metrics.HintedHandOffManager.{{MetricName}}``
+
+**JMX MBean**
+    ``org.apache.cassandra.metrics:type=HintedHandOffManager name={{MetricName}}``
+
+=========================== ============== ===========
+Name                        Type           Description
+=========================== ============== ===========
+Hints_created-{{PeerIP}}    Counter        Number of hints on disk for this peer.
+Hints_not_stored-{{PeerIP}} Counter        Number of hints not stored for this peer, due to being down past the configured hint window.
+=========================== ============== ===========
+
+SSTable Index Metrics
+^^^^^^^^^^^^^^^^^^^^^
+
+Metrics specific to the SSTable index metadata.
+
+Reported name format:
+
+**Metric Name**
+    ``org.apache.cassandra.metrics.Index.{{MetricName}}.RowIndexEntry``
+
+**JMX MBean**
+    ``org.apache.cassandra.metrics:type=Index scope=RowIndexEntry name={{MetricName}}``
+
+=========================== ============== ===========
+Name                        Type           Description
+=========================== ============== ===========
+IndexedEntrySize            Histogram      Histogram of the on-heap size, in bytes, of the index across all SSTables.
+IndexInfoCount              Histogram      Histogram of the number of on-heap index entries managed across all SSTables.
+IndexInfoGets               Histogram      Histogram of the number of index seeks performed per SSTable.
+=========================== ============== ===========
+
+BufferPool Metrics
+^^^^^^^^^^^^^^^^^^
+
+Metrics specific to the internal recycled buffer pool Cassandra manages.  This pool is meant to keep allocations and GC
+lower by recycling on and off heap buffers.
+
+Reported name format:
+
+**Metric Name**
+    ``org.apache.cassandra.metrics.BufferPool.{{MetricName}}``
+
+**JMX MBean**
+    ``org.apache.cassandra.metrics:type=BufferPool name={{MetricName}}``
+
+=========================== ============== ===========
+Name                        Type           Description
+=========================== ============== ===========
+Size                        Gauge<Long>    Size, in bytes, of the managed buffer pool.
+Misses                      Meter          The rate of misses in the pool. The higher this is, the more allocations incurred.
+=========================== ============== ===========
+
+
+Client Metrics
+^^^^^^^^^^^^^^
+
+Metrics specific to client management.
+
+Reported name format:
+
+**Metric Name**
+    ``org.apache.cassandra.metrics.Client.{{MetricName}}``
+
+**JMX MBean**
+    ``org.apache.cassandra.metrics:type=Client name={{MetricName}}``
+
+=========================== ============== ===========
+Name                        Type           Description
+=========================== ============== ===========
+connectedNativeClients      Counter        Number of clients connected to this node's native protocol server.
+connectedThriftClients      Counter        Number of clients connected to this node's Thrift protocol server.
+=========================== ============== ===========
+
+JMX
+^^^
+
+Any JMX-based client can access metrics from Cassandra.
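+
+For example, assuming the default JMX port of 7199 and no remote JMX authentication, the JDK's ``jconsole`` tool can be
+pointed directly at a node to browse the ``org.apache.cassandra.metrics`` MBeans::
+
+    jconsole localhost:7199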
+
+If you wish to access JMX metrics over HTTP it's possible to download `Mx4jTool <http://mx4j.sourceforge.net/>`__ and
+place ``mx4j-tools.jar`` into the classpath.  On startup you will see in the log::
+
+    HttpAdaptor version 3.0.2 started on port 8081
+
+To choose a different port (8081 is the default) or a different listen address (0.0.0.0 is not the default) edit
+``conf/cassandra-env.sh`` and uncomment::
+
+    #MX4J_ADDRESS="-Dmx4jaddress=0.0.0.0"
+
+    #MX4J_PORT="-Dmx4jport=8081"
+
+
+Metric Reporters
+^^^^^^^^^^^^^^^^
+
+As mentioned at the top of this section on monitoring, Cassandra metrics can be exported to a number of monitoring
+systems via `built in <http://metrics.dropwizard.io/3.1.0/getting-started/#other-reporting>`__ and `third party
+<http://metrics.dropwizard.io/3.1.0/manual/third-party/>`__ reporter plugins.
+
+The configuration of these plugins is managed by the `metrics reporter config project
+<https://github.com/addthis/metrics-reporter-config>`__. There is a sample configuration file located at
+``conf/metrics-reporter-config-sample.yaml``.
+
+Once configured, you simply start Cassandra with the flag
+``-Dcassandra.metricsReporterConfigFile=metrics-reporter-config.yaml``. The specified .yaml file plus any 3rd party
+reporter jars must all be in Cassandra's classpath.
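+
+As an illustration only (the sample file is the authoritative reference for the supported reporters and their exact
+options), a hypothetical ``metrics-reporter-config.yaml`` pushing all metrics to a Graphite server might look roughly
+like::
+
+    graphite:
+      -
+        period: 60
+        timeunit: 'SECONDS'
+        prefix: 'cassandra-node1'
+        hosts:
+          - host: 'graphite.example.com'
+            port: 2003
+        predicate:
+          color: 'white'
+          useQualifiedName: true
+          patterns:
+            - '.*'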
diff --git a/doc/source/operating/read_repair.rst b/doc/source/operating/read_repair.rst
new file mode 100644
index 0000000..0e52bf5
--- /dev/null
+++ b/doc/source/operating/read_repair.rst
@@ -0,0 +1,22 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: none
+
+Read repair
+-----------
+
+.. todo:: todo
diff --git a/doc/source/operating/repair.rst b/doc/source/operating/repair.rst
new file mode 100644
index 0000000..97d8ce8
--- /dev/null
+++ b/doc/source/operating/repair.rst
@@ -0,0 +1,22 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: none
+
+Repair
+------
+
+.. todo:: todo
diff --git a/doc/source/operating/security.rst b/doc/source/operating/security.rst
new file mode 100644
index 0000000..dfcd9e6
--- /dev/null
+++ b/doc/source/operating/security.rst
@@ -0,0 +1,410 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: none
+
+Security
+--------
+
+There are three main components to the security features provided by Cassandra:
+
+- TLS/SSL encryption for client and inter-node communication
+- Client authentication
+- Authorization
+
+TLS/SSL Encryption
+^^^^^^^^^^^^^^^^^^
+Cassandra provides secure communication between a client machine and a database cluster and between nodes within a
+cluster. Enabling encryption ensures that data in flight is not compromised and is transferred securely. The options for
+client-to-node and node-to-node encryption are managed separately and may be configured independently.
+
+In both cases, the JVM defaults for supported protocols and cipher suites are used when encryption is enabled. These can
+be overridden using the settings in ``cassandra.yaml``, but this is not recommended unless there are policies in place
+which dictate certain settings or a need to disable vulnerable ciphers or protocols in cases where the JVM cannot be
+updated.
+
+FIPS compliant settings can be configured at the JVM level and should not involve changing encryption settings in
+cassandra.yaml. See `the java document on FIPS <https://docs.oracle.com/javase/8/docs/technotes/guides/security/jsse/FIPS.html>`__
+for more details.
+
+For information on generating the keystore and truststore files used in SSL communications, see the
+`java documentation on creating keystores <http://download.oracle.com/javase/6/docs/technotes/guides/security/jsse/JSSERefGuide.html#CreateKeystore>`__
+
+Inter-node Encryption
+~~~~~~~~~~~~~~~~~~~~~
+
+The settings for managing inter-node encryption are found in ``cassandra.yaml`` in the ``server_encryption_options``
+section. To enable inter-node encryption, change the ``internode_encryption`` setting from its default value of ``none``
+to one of ``rack``, ``dc`` or ``all``.
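+
+For example, a minimal ``server_encryption_options`` sketch (the keystore/truststore paths and passwords are
+placeholders) might look like::
+
+    server_encryption_options:
+        internode_encryption: dc
+        keystore: conf/.keystore
+        keystore_password: <keystore password>
+        truststore: conf/.truststore
+        truststore_password: <truststore password>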
+
+Client to Node Encryption
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The settings for managing client to node encryption are found in ``cassandra.yaml`` in the ``client_encryption_options``
+section. There are two primary toggles here for enabling encryption, ``enabled`` and ``optional``.
+
+- If neither is set to ``true``, client connections are entirely unencrypted.
+- If ``enabled`` is set to ``true`` and ``optional`` is set to ``false``, all client connections must be secured.
+- If both options are set to ``true``, both encrypted and unencrypted connections are supported using the same port.
+  Client connections using encryption with this configuration will be automatically detected and handled by the server.
+
+As an alternative to the ``optional`` setting, separate ports can also be configured for secure and unsecure connections
+where operational requirements demand it. To do so, set ``optional`` to false and use the ``native_transport_port_ssl``
+setting in ``cassandra.yaml`` to specify the port to be used for secure client communication.
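+
+For example, a sketch of ``client_encryption_options`` enforcing encryption on the standard port (the keystore path and
+password are placeholders) might look like::
+
+    client_encryption_options:
+        enabled: true
+        optional: false
+        keystore: conf/.keystore
+        keystore_password: <keystore password>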
+
+.. _operation-roles:
+
+Roles
+^^^^^
+
+Cassandra uses database roles, which may represent either a single user or a group of users, in both authentication and
+permissions management. Role management is an extension point in Cassandra and may be configured using the
+``role_manager`` setting in ``cassandra.yaml``. The default setting uses ``CassandraRoleManager``, an implementation
+which stores role information in the tables of the ``system_auth`` keyspace.
+
+See also the :ref:`CQL documentation on roles <cql-roles>`.
+
+Authentication
+^^^^^^^^^^^^^^
+
+Authentication is pluggable in Cassandra and is configured using the ``authenticator`` setting in ``cassandra.yaml``.
+Cassandra ships with two options included in the default distribution.
+
+By default, Cassandra is configured with ``AllowAllAuthenticator`` which performs no authentication checks and therefore
+requires no credentials. It is used to disable authentication completely. Note that authentication is a necessary
+condition of Cassandra's permissions subsystem, so if authentication is disabled, effectively so are permissions.
+
+The default distribution also includes ``PasswordAuthenticator``, which stores encrypted credentials in a system table.
+This can be used to enable simple username/password authentication.
+
+.. _password-authentication:
+
+Enabling Password Authentication
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Before enabling client authentication on the cluster, client applications should be pre-configured with their intended
+credentials. When a connection is initiated, the server will only ask for credentials once authentication is
+enabled, so setting up the client side config in advance is safe. In contrast, as soon as a server has authentication
+enabled, any connection attempt without proper credentials will be rejected which may cause availability problems for
+client applications. Once clients are set up and ready for authentication to be enabled, follow this procedure to enable
+it on the cluster.
+
+Pick a single node in the cluster on which to perform the initial configuration. Ideally, no clients should connect
+to this node during the setup process, so you may want to remove it from client config, block it at the network level
+or possibly add a new temporary node to the cluster for this purpose. On that node, perform the following steps:
+
+1. Open a ``cqlsh`` session and change the replication factor of the ``system_auth`` keyspace. By default, this keyspace
+   uses ``SimpleStrategy`` and a ``replication_factor`` of 1. It is recommended to change this for any
+   non-trivial deployment to ensure that should nodes become unavailable, login is still possible. Best practice is to
+   configure a replication factor of 3 to 5 per-DC.
+
+::
+
+    ALTER KEYSPACE system_auth WITH replication = {'class': 'NetworkTopologyStrategy', 'DC1': 3, 'DC2': 3};
+
+2. Edit ``cassandra.yaml`` to change the ``authenticator`` option like so:
+
+::
+
+    authenticator: PasswordAuthenticator
+
+3. Restart the node.
+
+4. Open a new ``cqlsh`` session using the credentials of the default superuser:
+
+::
+
+    cqlsh -u cassandra -p cassandra
+
+5. During login, the credentials for the default superuser are read with a consistency level of ``QUORUM``, whereas
+   those for all other users (including superusers) are read at ``LOCAL_ONE``. In the interests of performance and
+   availability, as well as security, operators should create another superuser and disable the default one. This step
+   is optional, but highly recommended. While logged in as the default superuser, create another superuser role which
+   can be used to bootstrap further configuration.
+
+::
+
+    # create a new superuser
+    CREATE ROLE dba WITH SUPERUSER = true AND LOGIN = true AND PASSWORD = 'super';
+
+6. Start a new cqlsh session, this time logging in as the new superuser, and disable the default superuser.
+
+::
+
+    ALTER ROLE cassandra WITH SUPERUSER = false AND LOGIN = false;
+
+7. Finally, set up the roles and credentials for your application users with :ref:`CREATE ROLE <create-role-statement>`
+   statements.
+
+At the end of these steps, that single node is configured to use password authentication. To roll this out across the
+cluster, repeat steps 2 and 3 on each node in the cluster. Once all nodes have been restarted, authentication will be
+fully enabled throughout the cluster.
+
+Note that using ``PasswordAuthenticator`` also requires the use of :ref:`CassandraRoleManager <operation-roles>`.
+
+See also: :ref:`setting-credentials-for-internal-authentication`, :ref:`CREATE ROLE <create-role-statement>`,
+:ref:`ALTER ROLE <alter-role-statement>`, :ref:`ALTER KEYSPACE <alter-keyspace-statement>` and :ref:`GRANT PERMISSION
+<grant-permission-statement>`.
+
+Authorization
+^^^^^^^^^^^^^
+
+Authorization is pluggable in Cassandra and is configured using the ``authorizer`` setting in ``cassandra.yaml``.
+Cassandra ships with two options included in the default distribution.
+
+By default, Cassandra is configured with ``AllowAllAuthorizer`` which performs no checking and so effectively grants all
+permissions to all roles. This must be used if ``AllowAllAuthenticator`` is the configured authenticator.
+
+The default distribution also includes ``CassandraAuthorizer``, which implements full permissions management
+functionality and stores its data in Cassandra system tables.
+
+Enabling Internal Authorization
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Permissions are modelled as a whitelist, with the default assumption that a given role has no access to any database
+resources. The implication of this is that once authorization is enabled on a node, all requests will be rejected until
+the required permissions have been granted. For this reason, it is strongly recommended to perform the initial setup on
+a node which is not processing client requests.
+
+The following assumes that authentication has already been enabled via the process outlined in
+:ref:`password-authentication`. Perform these steps to enable internal authorization across the cluster:
+
+1. On the selected node, edit ``cassandra.yaml`` to change the ``authorizer`` option like so:
+
+::
+
+    authorizer: CassandraAuthorizer
+
+2. Restart the node.
+
+3. Open a new ``cqlsh`` session using the credentials of a role with superuser credentials:
+
+::
+
+    cqlsh -u dba -p super
+
+4. Configure the appropriate access privileges for your clients using :ref:`GRANT PERMISSION <grant-permission-statement>`
+   statements. On the other nodes, until configuration is updated and the node restarted, this will have no effect so
+   disruption to clients is avoided.
+
+::
+
+    GRANT SELECT ON ks.t1 TO db_user;
+
+5. Once all the necessary permissions have been granted, repeat steps 1 and 2 for each node in turn. As each node
+   restarts and clients reconnect, the enforcement of the granted permissions will begin.
+
+See also: :ref:`GRANT PERMISSION <grant-permission-statement>`, :ref:`GRANT ALL <grant-all>` and :ref:`REVOKE PERMISSION
+<revoke-permission-statement>`
+
+Caching
+^^^^^^^
+
+Enabling authentication and authorization places additional load on the cluster by frequently reading from the
+``system_auth`` tables. Furthermore, these reads are in the critical paths of many client operations, and so have the
+potential to severely impact quality of service. To mitigate this, auth data such as credentials, permissions and role
+details are cached for a configurable period. The caching can be configured (and even disabled) from ``cassandra.yaml``
+or using a JMX client. The JMX interface also supports invalidation of the various caches, but any changes made via JMX
+are not persistent and will be re-read from ``cassandra.yaml`` when the node is restarted.
+
+Each cache has 3 options which can be set:
+
+Validity Period
+    Controls the expiration of cache entries. After this period, entries are invalidated and removed from the cache.
+Refresh Rate
+    Controls the rate at which background reads are performed to pick up any changes to the underlying data. While these
+    async refreshes are performed, caches will continue to serve (possibly) stale data. Typically, this will be set to a
+    shorter time than the validity period.
+Max Entries
+    Controls the upper bound on cache size.
+
+The naming for these options in ``cassandra.yaml`` follows the convention:
+
+* ``<type>_validity_in_ms``
+* ``<type>_update_interval_in_ms``
+* ``<type>_cache_max_entries``
+
+Where ``<type>`` is one of ``credentials``, ``permissions``, or ``roles``.
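+
+For example, a hypothetical roles cache configuration in ``cassandra.yaml`` following this convention (the values shown
+are illustrative, not recommendations) might be::
+
+    roles_validity_in_ms: 2000
+    roles_update_interval_in_ms: 1000
+    roles_cache_max_entries: 1000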
+
+As mentioned, these are also exposed via JMX in the mbeans under the ``org.apache.cassandra.auth`` domain.
+
+JMX access
+^^^^^^^^^^
+
+Access control for JMX clients is configured separately from that for CQL. For both authentication and authorization,
+two providers are available: the first is based on standard JMX security and the second integrates more closely with
+Cassandra's own auth subsystem.
+
+The default settings for Cassandra make JMX accessible only from localhost. To enable remote JMX connections, edit
+``cassandra-env.sh`` (or ``cassandra-env.ps1`` on Windows) to change the ``LOCAL_JMX`` setting to ``no``. Under the
+standard configuration, when remote JMX connections are enabled, :ref:`standard JMX authentication <standard-jmx-auth>`
+is also switched on.
+
+Note that by default, local-only connections are not subject to authentication, but this can be enabled.
+
+If enabling remote connections, it is recommended to also use :ref:`SSL <jmx-with-ssl>` connections.
+
+Finally, after enabling auth and/or SSL, ensure that tools which use JMX, such as :ref:`nodetool <nodetool>`, are
+correctly configured and working as expected.
+
+.. _standard-jmx-auth:
+
+Standard JMX Auth
+~~~~~~~~~~~~~~~~~
+
+Users permitted to connect to the JMX server are specified in a simple text file. The location of this file is set in
+``cassandra-env.sh`` by the line:
+
+::
+
+    JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password"
+
+Edit the password file to add username/password pairs:
+
+::
+
+    jmx_user jmx_password
+
+Secure the credentials file so that only the user running the Cassandra process can read it:
+
+::
+
+    $ chown cassandra:cassandra /etc/cassandra/jmxremote.password
+    $ chmod 400 /etc/cassandra/jmxremote.password
+
+Optionally, enable access control to limit the scope of what defined users can do via JMX. Note that this is a fairly
+blunt instrument in this context as most operational tools in Cassandra require full read/write access. To configure a
+simple access file, uncomment this line in ``cassandra-env.sh``:
+
+::
+
+    #JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.access.file=/etc/cassandra/jmxremote.access"
+
+Then edit the access file to grant your JMX user readwrite permission:
+
+::
+
+    jmx_user readwrite
+
+Cassandra must be restarted to pick up the new settings.
+
+See also: `Using File-Based Password Authentication In JMX
+<http://docs.oracle.com/javase/7/docs/technotes/guides/management/agent.html#gdenv>`__
+
+
+Cassandra Integrated Auth
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+An alternative to the out-of-the-box JMX auth is to use Cassandra's own authentication and/or authorization providers
+for JMX clients. This is potentially more flexible and secure, but it comes with one major caveat: it is not available
+until `after` a node has joined the ring, because the auth subsystem is not fully configured until that point. However,
+it is often critical for monitoring purposes to have JMX access, particularly during bootstrap. So it is recommended,
+where possible, to use local-only JMX auth during bootstrap and then, if remote connectivity is required, to switch to
+integrated auth once the node has joined the ring and initial setup is complete.
+
+With this option, the same database roles used for CQL authentication can be used to control access to JMX, so updates
+can be managed centrally using just ``cqlsh``. Furthermore, fine grained control over exactly which operations are
+permitted on particular MBeans can be achieved via :ref:`GRANT PERMISSION <grant-permission-statement>`.
+
+To enable integrated authentication, edit ``cassandra-env.sh`` to uncomment these lines:
+
+::
+
+    #JVM_OPTS="$JVM_OPTS -Dcassandra.jmx.remote.login.config=CassandraLogin"
+    #JVM_OPTS="$JVM_OPTS -Djava.security.auth.login.config=$CASSANDRA_HOME/conf/cassandra-jaas.config"
+
+And disable the JMX standard auth by commenting this line:
+
+::
+
+    JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.password.file=/etc/cassandra/jmxremote.password"
+
+To enable integrated authorization, uncomment this line:
+
+::
+
+    #JVM_OPTS="$JVM_OPTS -Dcassandra.jmx.authorizer=org.apache.cassandra.auth.jmx.AuthorizationProxy"
+
+Check standard access control is off by ensuring this line is commented out:
+
+::
+
+   #JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.access.file=/etc/cassandra/jmxremote.access"
+
+With integrated authentication and authorization enabled, operators can define specific roles and grant them access to
+the particular JMX resources that they need. For example, a role with the necessary permissions to use tools such as
+jconsole or jmc in read-only mode would be defined as:
+
+::
+
+    CREATE ROLE jmx WITH LOGIN = false;
+    GRANT SELECT ON ALL MBEANS TO jmx;
+    GRANT DESCRIBE ON ALL MBEANS TO jmx;
+    GRANT EXECUTE ON MBEAN 'java.lang:type=Threading' TO jmx;
+    GRANT EXECUTE ON MBEAN 'com.sun.management:type=HotSpotDiagnostic' TO jmx;
+
+    # Grant the jmx role to one with login permissions so that it can access the JMX tooling
+    CREATE ROLE ks_user WITH PASSWORD = 'password' AND LOGIN = true AND SUPERUSER = false;
+    GRANT jmx TO ks_user;
+
+Fine grained access control to individual MBeans is also supported:
+
+::
+
+    GRANT EXECUTE ON MBEAN 'org.apache.cassandra.db:type=Tables,keyspace=test_keyspace,table=t1' TO ks_user;
+    GRANT EXECUTE ON MBEAN 'org.apache.cassandra.db:type=Tables,keyspace=test_keyspace,table=*' TO ks_owner;
+
+This permits the ``ks_user`` role to invoke methods on the MBean representing a single table in ``test_keyspace``, while
+granting the same permission for all table level MBeans in that keyspace to the ``ks_owner`` role.
+
+Adding/removing roles and granting/revoking of permissions is handled dynamically once the initial setup is complete, so
+no further restarts are required if permissions are altered.
+
+See also: :ref:`Permissions <cql-permissions>`.
+
+.. _jmx-with-ssl:
+
+JMX With SSL
+~~~~~~~~~~~~
+
+JMX SSL configuration is controlled by a number of system properties, some of which are optional. To turn on SSL, edit
+the relevant lines in ``cassandra-env.sh`` (or ``cassandra-env.ps1`` on Windows) to uncomment and set the values of these
+properties as required:
+
+``com.sun.management.jmxremote.ssl``
+    set to true to enable SSL
+``com.sun.management.jmxremote.ssl.need.client.auth``
+    set to true to enable validation of client certificates
+``com.sun.management.jmxremote.registry.ssl``
+    enables SSL sockets for the RMI registry from which clients obtain the JMX connector stub
+``com.sun.management.jmxremote.ssl.enabled.protocols``
+    by default, the protocols supported by the JVM will be used, override with a comma-separated list. Note that this is
+    not usually necessary and using the defaults is the preferred option.
+``com.sun.management.jmxremote.ssl.enabled.cipher.suites``
+    by default, the cipher suites supported by the JVM will be used, override with a comma-separated list. Note that
+    this is not usually necessary and using the defaults is the preferred option.
+``javax.net.ssl.keyStore``
+    set the path on the local filesystem of the keystore containing server private keys and public certificates
+``javax.net.ssl.keyStorePassword``
+    set the password of the keystore file
+``javax.net.ssl.trustStore``
+    if validation of client certificates is required, use this property to specify the path of the truststore containing
+    the public certificates of trusted clients
+``javax.net.ssl.trustStorePassword``
+    set the password of the truststore file
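+
+For example, a hypothetical ``cassandra-env.sh`` fragment enabling SSL for JMX with client certificate validation (the
+paths and passwords are placeholders) might look like::
+
+    JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl=true"
+    JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.ssl.need.client.auth=true"
+    JVM_OPTS="$JVM_OPTS -Dcom.sun.management.jmxremote.registry.ssl=true"
+    JVM_OPTS="$JVM_OPTS -Djavax.net.ssl.keyStore=/path/to/keystore"
+    JVM_OPTS="$JVM_OPTS -Djavax.net.ssl.keyStorePassword=<keystore-password>"
+    JVM_OPTS="$JVM_OPTS -Djavax.net.ssl.trustStore=/path/to/truststore"
+    JVM_OPTS="$JVM_OPTS -Djavax.net.ssl.trustStorePassword=<truststore-password>"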
+
+See also: `Oracle Java7 Docs <http://docs.oracle.com/javase/7/docs/technotes/guides/management/agent.html#gdemv>`__,
+`Monitor Java with JMX <https://www.lullabot.com/articles/monitor-java-with-jmx>`__
diff --git a/doc/source/operating/snitch.rst b/doc/source/operating/snitch.rst
new file mode 100644
index 0000000..faea0b3
--- /dev/null
+++ b/doc/source/operating/snitch.rst
@@ -0,0 +1,78 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: none
+
+Snitch
+------
+
+In Cassandra, the snitch has two functions:
+
+- it teaches Cassandra enough about your network topology to route requests efficiently.
+- it allows Cassandra to spread replicas around your cluster to avoid correlated failures. It does this by grouping
+  machines into "datacenters" and "racks."  Cassandra will do its best not to have more than one replica on the same
+  "rack" (which may not actually be a physical location).
+
+Dynamic snitching
+^^^^^^^^^^^^^^^^^
+
+The dynamic snitch monitors read latencies to avoid reading from hosts that have slowed down. The dynamic snitch is
+configured with the following properties in ``cassandra.yaml`` (typical values are shown in the sketch after this list):
+
+- ``dynamic_snitch``: whether the dynamic snitch should be enabled or disabled.
+- ``dynamic_snitch_update_interval_in_ms``: controls how often to perform the more expensive part of host score
+  calculation.
+- ``dynamic_snitch_reset_interval_in_ms``: controls how often to reset all host scores, allowing a bad host to possibly
+  recover.
+- ``dynamic_snitch_badness_threshold``: if set greater than zero and read_repair_chance is < 1.0, this will allow
+  'pinning' of replicas to hosts in order to increase cache capacity. The badness threshold controls how much worse the
+  pinned host has to be before the dynamic snitch will prefer other replicas over it.  This is expressed as a double
+  which represents a percentage.  Thus, a value of 0.2 means Cassandra would continue to prefer the static snitch values
+  until the pinned host was 20% worse than the fastest.
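+
+A minimal sketch of these settings in ``cassandra.yaml`` (the values shown mirror the usual shipped defaults, but treat
+them as illustrative)::
+
+    dynamic_snitch: true
+    dynamic_snitch_update_interval_in_ms: 100
+    dynamic_snitch_reset_interval_in_ms: 600000
+    dynamic_snitch_badness_threshold: 0.1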
+
+Snitch classes
+^^^^^^^^^^^^^^
+
+The ``endpoint_snitch`` parameter in ``cassandra.yaml`` should be set to the class that implements
+``IEndpointSnitch``, which will be wrapped by the dynamic snitch and decides whether two endpoints are in the same data
+center or on the same rack. Out of the box, Cassandra provides the following snitch implementations (a configuration
+sketch follows this list):
+
+GossipingPropertyFileSnitch
+    This should be your go-to snitch for production use. The rack and datacenter for the local node are defined in
+    ``cassandra-rackdc.properties`` and propagated to other nodes via gossip. If ``cassandra-topology.properties`` exists,
+    it is used as a fallback, allowing migration from the PropertyFileSnitch.
+
+SimpleSnitch
+    Treats Strategy order as proximity. This can improve cache locality when disabling read repair. Only appropriate for
+    single-datacenter deployments.
+
+PropertyFileSnitch
+    Proximity is determined by rack and data center, which are explicitly configured in
+    ``cassandra-topology.properties``.
+
+Ec2Snitch
+    Appropriate for EC2 deployments in a single Region. Loads Region and Availability Zone information from the EC2 API.
+    The Region is treated as the datacenter, and the Availability Zone as the rack. Only private IPs are used, so this
+    will not work across multiple regions.
+
+Ec2MultiRegionSnitch
+    Uses public IPs as broadcast_address to allow cross-region connectivity (thus, you should set seed addresses to the
+    public IP as well). You will need to open the ``storage_port`` or ``ssl_storage_port`` on the public IP firewall
+    (For intra-Region traffic, Cassandra will switch to the private IP after establishing a connection).
+
+RackInferringSnitch
+    Proximity is determined by rack and data center, which are assumed to correspond to the 3rd and 2nd octet of each
+    node's IP address, respectively.  Unless this happens to match your deployment conventions, this is best used as an
+    example of writing a custom Snitch class and is provided in that spirit.
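+
+For example, to use the recommended ``GossipingPropertyFileSnitch``, set in ``cassandra.yaml``::
+
+    endpoint_snitch: GossipingPropertyFileSnitch
+
+and describe the local node's location in ``cassandra-rackdc.properties`` (the datacenter and rack names here are
+placeholders)::
+
+    dc=DC1
+    rack=RAC1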
diff --git a/doc/source/operating/topo_changes.rst b/doc/source/operating/topo_changes.rst
new file mode 100644
index 0000000..9d6a2ba
--- /dev/null
+++ b/doc/source/operating/topo_changes.rst
@@ -0,0 +1,122 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. highlight:: none
+
+Adding, replacing, moving and removing nodes
+--------------------------------------------
+
+Bootstrap
+^^^^^^^^^
+
+Adding new nodes is called "bootstrapping". The ``num_tokens`` parameter defines the number of virtual nodes
+(tokens) the joining node will be assigned during bootstrap. The tokens define the sections of the ring (token ranges)
+the node will become responsible for.
+
+Token allocation
+~~~~~~~~~~~~~~~~
+
+With the default token allocation algorithm the new node will pick ``num_tokens`` random tokens to become responsible
+for. Since tokens are distributed randomly, load distribution improves with a higher number of virtual nodes, but it
+also increases token management overhead. The default of 256 virtual nodes should provide a reasonable load balance
+with acceptable overhead.
+
+On 3.0+ a new token allocation algorithm was introduced to allocate tokens based on the load of existing virtual nodes
+for a given keyspace, and thus yield an improved load distribution with a lower number of tokens. To use this approach,
+the new node must be started with the JVM option ``-Dcassandra.allocate_tokens_for_keyspace=<keyspace>``, where
+``<keyspace>`` is the keyspace from which the algorithm can find the load information to optimize token assignment for.
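+
+As a sketch, a joining node using this algorithm might combine a lower ``num_tokens`` with the JVM flag (the keyspace
+name and token count below are illustrative; supply the flag through whatever mechanism your deployment uses for JVM
+options, e.g. ``jvm.options``)::
+
+    # cassandra.yaml on the joining node
+    num_tokens: 16
+
+    # JVM option
+    -Dcassandra.allocate_tokens_for_keyspace=my_keyspace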
+
+Manual token assignment
+"""""""""""""""""""""""
+
+You may specify a comma-separated list of tokens manually with the ``initial_token`` ``cassandra.yaml`` parameter, and
+if that is specified Cassandra will skip the token allocation process. This may be useful when doing token assignment
+with an external tool or when restoring a node with its previous tokens.
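+
+For example, a node restored with its four previous tokens might use settings like these (token values are
+illustrative)::
+
+    num_tokens: 4
+    initial_token: -9223372036854775808,-4611686018427387904,0,4611686018427387904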
+
+Range streaming
+~~~~~~~~~~~~~~~~
+
+After the tokens are allocated, the joining node will pick current replicas of the token ranges it will become
+responsible for to stream data from. By default it will stream from the primary replica of each token range in order to
+guarantee data in the new node will be consistent with the current state.
+
+In the case of any unavailable replica, the consistent bootstrap process will fail. To override this behavior and
+potentially miss data from an unavailable replica, set the JVM flag ``-Dcassandra.consistent.rangemovement=false``.
+
+Resuming failed/hung bootstrap
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+On 2.2+, if the bootstrap process fails, it's possible to resume bootstrap from the previous saved state by calling
+``nodetool bootstrap resume``. If for some reason the bootstrap hangs or stalls, it may also be resumed by simply
+restarting the node. In order to clean up bootstrap state and start fresh, you may set the JVM startup flag
+``-Dcassandra.reset_bootstrap_progress=true``.
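+
+For example (the reset flag is a JVM startup option, supplied however your deployment passes JVM flags)::
+
+    # resume a failed bootstrap on the joining node
+    nodetool bootstrap resume
+
+    # or, to discard saved bootstrap state on the next start
+    -Dcassandra.reset_bootstrap_progress=true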
+
+On lower versions, when the bootstrap process fails it is recommended to wipe the node (remove all the data) and
+restart the bootstrap process.
+
+Manual bootstrapping
+~~~~~~~~~~~~~~~~~~~~
+
+It's possible to skip the bootstrapping process entirely and join the ring straight away by setting the hidden parameter
+``auto_bootstrap: false``. This may be useful when restoring a node from a backup or creating a new data-center.
+
+Removing nodes
+^^^^^^^^^^^^^^
+
+You can take a node out of the cluster with ``nodetool decommission`` (run on the live node being removed) or with
+``nodetool removenode`` (run from any other machine) to remove a dead one. This will assign the ranges the old node
+was responsible for to other nodes, and replicate the appropriate data there. If decommission is used, the data will
+stream from the decommissioned node. If removenode is used, the data will stream from the remaining replicas.
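+
+As a sketch (the host ID below is made up; ``nodetool status`` lists the real ones)::
+
+    # on the node that is leaving the cluster
+    nodetool decommission
+
+    # from any live node, to remove a dead node by host ID
+    nodetool removenode 192d3b47-1b2c-4f5a-9c6d-2b1f4a7e9c01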
+
+No data is removed automatically from the node being decommissioned, so if you want to put the node back into service at
+a different token on the ring, it should be removed manually.
+
+Moving nodes
+^^^^^^^^^^^^
+
+When ``num_tokens: 1`` it's possible to move the node position in the ring with ``nodetool move``. Moving is both a
+convenience over and more efficient than decommission + bootstrap. After moving a node, ``nodetool cleanup`` should be
+run to remove any unnecessary data.
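+
+For example, to move a single-token node to a new token and then clean up (the token value is illustrative)::
+
+    nodetool move 4611686018427387904
+    nodetool cleanup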
+
+Replacing a dead node
+^^^^^^^^^^^^^^^^^^^^^
+
+In order to replace a dead node, start Cassandra with the JVM startup flag
+``-Dcassandra.replace_address_first_boot=<dead_node_ip>``. Once this property is enabled the node starts in a hibernate
+state, during which all the other nodes will see this node as down.
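+
+A sketch of the flag as it might be passed when starting the replacement node (the address is illustrative)::
+
+    -Dcassandra.replace_address_first_boot=192.0.2.10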
+
+The replacing node will now start to bootstrap the data from the rest of the nodes in the cluster. The main difference
+from normal bootstrapping of a new node is that this new node will not accept any writes during this phase.
+
+Once the bootstrapping is complete the node will be marked "UP"; we rely on hinted handoff to make this node
+consistent (since writes are not accepted during the bootstrap).
+
+.. Note:: If the replacement process takes longer than ``max_hint_window_in_ms`` you **MUST** run repair to make the
+   replaced node consistent again, since it missed ongoing writes during bootstrapping.
+
+Monitoring progress
+^^^^^^^^^^^^^^^^^^^
+
+Bootstrap, replace, move and remove progress can be monitored using ``nodetool netstats`` which will show the progress
+of the streaming operations.
+
+Cleanup data after range movements
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+As a safety measure, Cassandra does not automatically remove data from nodes that "lose" part of their token range due
+to a range movement operation (bootstrap, move, replace). Run ``nodetool cleanup`` on the nodes that lost ranges to the
+joining node when you are satisfied the new node is up and working. If you do not do this the old data will still be
+counted against the load on that node.
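+
+For example, to clean up a single keyspace on a node that lost ranges (the keyspace name is illustrative)::
+
+    nodetool cleanup my_keyspace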
diff --git a/doc/source/tools/cqlsh.rst b/doc/source/tools/cqlsh.rst
new file mode 100644
index 0000000..45e2db8
--- /dev/null
+++ b/doc/source/tools/cqlsh.rst
@@ -0,0 +1,455 @@
+.. highlight:: none
+
+.. _cqlsh:
+
+cqlsh: the CQL shell
+--------------------
+
+cqlsh is a command line shell for interacting with Cassandra through CQL (the Cassandra Query Language).  It is shipped
+with every Cassandra package, and can be found in the bin/ directory alongside the cassandra executable.  cqlsh utilizes
+the Python native protocol driver, and connects to the single node specified on the command line.
+
+
+Compatibility
+^^^^^^^^^^^^^
+
+cqlsh is compatible with Python 2.7.
+
+In general, a given version of cqlsh is only guaranteed to work with the version of Cassandra that it was released with.
+In some cases, cqlsh may work with older or newer versions of Cassandra, but this is not officially supported.
+
+
+Optional Dependencies
+^^^^^^^^^^^^^^^^^^^^^
+
+cqlsh ships with all essential dependencies.  However, there are some optional dependencies that can be installed to
+improve the capabilities of cqlsh.
+
+pytz
+~~~~
+
+By default, cqlsh displays all timestamps with a UTC timezone.  To support display of timestamps with another timezone,
+the `pytz <http://pytz.sourceforge.net/>`__ library must be installed.  See the ``timezone`` option in cqlshrc_ for
+specifying a timezone to use.
+
+cython
+~~~~~~
+
+The performance of cqlsh's ``COPY`` operations can be improved by installing `cython <http://cython.org/>`__.  This will
+compile the python modules that are central to the performance of ``COPY``.
+
+cqlshrc
+^^^^^^^
+
+The ``cqlshrc`` file holds configuration options for cqlsh.  By default this is in the user's home directory at
+``~/.cassandra/cqlshrc``, but a custom location can be specified with the ``--cqlshrc`` option.
+
+Example config values and documentation can be found in the ``conf/cqlshrc.sample`` file of a tarball installation.  You
+can also view the latest version of `cqlshrc online <https://github.com/apache/cassandra/blob/trunk/conf/cqlshrc.sample>`__.
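+
+A minimal sketch of what a ``cqlshrc`` might contain (the sections and values here are illustrative; consult
+``conf/cqlshrc.sample`` for the authoritative option names)::
+
+    [connection]
+    hostname = 127.0.0.1
+    port = 9042
+
+    [ui]
+    timezone = Etc/UTC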
+
+
+Command Line Options
+^^^^^^^^^^^^^^^^^^^^
+
+Usage:
+
+``cqlsh [options] [host [port]]``
+
+Options:
+
+``-C`` ``--color``
+  Force color output
+
+``--no-color``
+  Disable color output
+
+``--browser``
+  Specify the browser to use for displaying cqlsh help.  This can be one of the `supported browser names
+  <https://docs.python.org/2/library/webbrowser.html>`__ (e.g. ``firefox``) or a browser path followed by ``%s`` (e.g.
+  ``/usr/bin/google-chrome-stable %s``).
+
+``--ssl``
+  Use SSL when connecting to Cassandra
+
+``-u`` ``--user``
+  Username to authenticate against Cassandra with
+
+``-p`` ``--password``
+  Password to authenticate against Cassandra with, should
+  be used in conjunction with ``--user``
+
+``-k`` ``--keyspace``
+  Keyspace to authenticate to, should be used in conjunction
+  with ``--user``
+
+``-f`` ``--file``
+  Execute commands from the given file, then exit
+
+``--debug``
+  Print additional debugging information
+
+``--encoding``
+  Specify a non-default encoding for output (defaults to UTF-8)
+
+``--cqlshrc``
+  Specify a non-default location for the ``cqlshrc`` file
+
+``-e`` ``--execute``
+  Execute the given statement, then exit
+
+``--connect-timeout``
+  Specify the connection timeout in seconds (defaults to 2s)
+
+``--request-timeout``
+  Specify the request timeout in seconds (defaults to 10s)
+
+``-t`` ``--tty``
+  Force tty mode (command prompt)
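+
+For example, an authenticated connection to a remote node that runs a single statement and exits (the host, credentials
+and statement are illustrative)::
+
+    cqlsh -u myuser -p mypassword -e "SELECT release_version FROM system.local" 10.0.0.5 9042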
+
+
+Special Commands
+^^^^^^^^^^^^^^^^
+
+In addition to supporting regular CQL statements, cqlsh also supports a number of special commands that are not part of
+CQL.  These are detailed below.
+
+``CONSISTENCY``
+~~~~~~~~~~~~~~~
+
+`Usage`: ``CONSISTENCY <consistency level>``
+
+Sets the consistency level for operations to follow.  Valid arguments include:
+
+- ``ANY``
+- ``ONE``
+- ``TWO``
+- ``THREE``
+- ``QUORUM``
+- ``ALL``
+- ``LOCAL_QUORUM``
+- ``LOCAL_ONE``
+- ``SERIAL``
+- ``LOCAL_SERIAL``
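+
+Example (the level shown is just one valid choice)::
+
+    cqlsh> CONSISTENCY LOCAL_QUORUM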
+
+``SERIAL CONSISTENCY``
+~~~~~~~~~~~~~~~~~~~~~~
+
+`Usage`: ``SERIAL CONSISTENCY <consistency level>``
+
+Sets the serial consistency level for operations to follow.  Valid arguments include:
+
+- ``SERIAL``
+- ``LOCAL_SERIAL``
+
+The serial consistency level is only used by conditional updates (``INSERT``, ``UPDATE`` and ``DELETE`` with an ``IF``
+condition). For those, the serial consistency level defines the consistency level of the serial phase (or “paxos” phase)
+while the normal consistency level defines the consistency for the “learn” phase, i.e. what type of reads will be
+guaranteed to see the update right away. For example, if a conditional write has a consistency level of ``QUORUM`` (and
+is successful), then a ``QUORUM`` read is guaranteed to see that write. But if the regular consistency level of that
+write is ``ANY``, then only a read with a consistency level of ``SERIAL`` is guaranteed to see it (even a read with
+consistency ``ALL`` is not guaranteed to be enough).
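+
+A sketch of how the two levels interact for a conditional update (keyspace, table and values are made up)::
+
+    cqlsh> CONSISTENCY QUORUM
+    cqlsh> SERIAL CONSISTENCY LOCAL_SERIAL
+    cqlsh> UPDATE my_ks.users SET email = 'new@example.com' WHERE id = 1 IF email = 'old@example.com';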
+
+``SHOW VERSION``
+~~~~~~~~~~~~~~~~
+Prints the cqlsh, Cassandra, CQL, and native protocol versions in use.  Example::
+
+    cqlsh> SHOW VERSION
+    [cqlsh 5.0.1 | Cassandra 3.8 | CQL spec 3.4.2 | Native protocol v4]
+
+``SHOW HOST``
+~~~~~~~~~~~~~
+
+Prints the IP address and port of the Cassandra node that cqlsh is connected to in addition to the cluster name.
+Example::
+
+    cqlsh> SHOW HOST
+    Connected to Prod_Cluster at 192.0.0.1:9042.
+
+``SHOW SESSION``
+~~~~~~~~~~~~~~~~
+
+Pretty prints a specific tracing session.
+
+`Usage`: ``SHOW SESSION <session id>``
+
+Example usage::
+
+    cqlsh> SHOW SESSION 95ac6470-327e-11e6-beca-dfb660d92ad8
+
+    Tracing session: 95ac6470-327e-11e6-beca-dfb660d92ad8
+
+     activity                                                  | timestamp                  | source    | source_elapsed | client
+    -----------------------------------------------------------+----------------------------+-----------+----------------+-----------
+                                            Execute CQL3 query | 2016-06-14 17:23:13.979000 | 127.0.0.1 |              0 | 127.0.0.1
+     Parsing SELECT * FROM system.local; [SharedPool-Worker-1] | 2016-06-14 17:23:13.982000 | 127.0.0.1 |           3843 | 127.0.0.1
+    ...
+
+
+``SOURCE``
+~~~~~~~~~~
+
+Reads the contents of a file and executes each line as a CQL statement or special cqlsh command.
+
+`Usage`: ``SOURCE <string filename>``
+
+Example usage::
+
+    cqlsh> SOURCE '/home/thobbs/commands.cql'
+
+``CAPTURE``
+~~~~~~~~~~~
+
+Begins capturing command output and appending it to a specified file.  Output will not be shown at the console while it
+is captured.
+
+`Usage`::
+
+    CAPTURE '<file>';
+    CAPTURE OFF;
+    CAPTURE;
+
+That is, the path to the file to be appended to must be given inside a string literal. The path is interpreted relative
+to the current working directory. The tilde shorthand notation (``'~/mydir'``) is supported for referring to ``$HOME``.
+
+Only query result output is captured. Errors and output from cqlsh-only commands will still be shown in the cqlsh
+session.
+
+To stop capturing output and show it in the cqlsh session again, use ``CAPTURE OFF``.
+
+To inspect the current capture configuration, use ``CAPTURE`` with no arguments.
+
+``HELP``
+~~~~~~~~
+
+Gives information about cqlsh commands. To see available topics, enter ``HELP`` without any arguments. To see help on a
+topic, use ``HELP <topic>``.  Also see the ``--browser`` argument for controlling what browser is used to display help.
+
+``TRACING``
+~~~~~~~~~~~
+
+Enables or disables tracing for queries.  When tracing is enabled, once a query completes, a trace of the events during
+the query will be printed.
+
+`Usage`::
+
+    TRACING ON
+    TRACING OFF
+
+``PAGING``
+~~~~~~~~~~
+
+Enables paging, disables paging, or sets the page size for read queries.  When paging is enabled, only one page of data
+will be fetched at a time and a prompt will appear to fetch the next page.  Generally, it's a good idea to leave paging
+enabled in an interactive session to avoid fetching and printing large amounts of data at once.
+
+`Usage`::
+
+    PAGING ON
+    PAGING OFF
+    PAGING <page size in rows>
+
+``EXPAND``
+~~~~~~~~~~
+
+Enables or disables vertical printing of rows.  Enabling ``EXPAND`` is useful when many columns are fetched, or the
+contents of a single column are large.
+
+`Usage`::
+
+    EXPAND ON
+    EXPAND OFF
+
+``LOGIN``
+~~~~~~~~~
+
+Authenticate as a specified Cassandra user for the current session.
+
+`Usage`::
+
+    LOGIN <username> [<password>]
+
+``EXIT``
+~~~~~~~~~
+
+Ends the current session and terminates the cqlsh process.
+
+`Usage`::
+
+    EXIT
+    QUIT
+
+``CLEAR``
+~~~~~~~~~
+
+Clears the console.
+
+`Usage`::
+
+    CLEAR
+    CLS
+
+``DESCRIBE``
+~~~~~~~~~~~~
+
+Prints a description (typically a series of DDL statements) of a schema element or the cluster.  This is useful for
+dumping all or portions of the schema.
+
+`Usage`::
+
+    DESCRIBE CLUSTER
+    DESCRIBE SCHEMA
+    DESCRIBE KEYSPACES
+    DESCRIBE KEYSPACE <keyspace name>
+    DESCRIBE TABLES
+    DESCRIBE TABLE <table name>
+    DESCRIBE INDEX <index name>
+    DESCRIBE MATERIALIZED VIEW <view name>
+    DESCRIBE TYPES
+    DESCRIBE TYPE <type name>
+    DESCRIBE FUNCTIONS
+    DESCRIBE FUNCTION <function name>
+    DESCRIBE AGGREGATES
+    DESCRIBE AGGREGATE <aggregate function name>
+
+In any of the commands, ``DESC`` may be used in place of ``DESCRIBE``.
+
+The ``DESCRIBE CLUSTER`` command prints the cluster name and partitioner::
+
+    cqlsh> DESCRIBE CLUSTER
+
+    Cluster: Test Cluster
+    Partitioner: Murmur3Partitioner
+
+The ``DESCRIBE SCHEMA`` command prints the DDL statements needed to recreate the entire schema.  This is especially
+useful for dumping the schema in order to clone a cluster or restore from a backup.
+
+``COPY TO``
+~~~~~~~~~~~
+
+Copies data from a table to a CSV file.
+
+`Usage`::
+
+    COPY <table name> [(<column>, ...)] TO <file name> WITH <copy option> [AND <copy option> ...]
+
+If no columns are specified, all columns from the table will be copied to the CSV file.  A subset of columns to copy may
+be specified by adding a comma-separated list of column names surrounded by parentheses after the table name.
+
+
+The ``<file name>`` should be a string literal (with single quotes) representing a path to the destination file.  This
+can also be the special value ``STDOUT`` (without single quotes) to print the CSV to stdout.
+
+See :ref:`shared-copy-options` for options that apply to both ``COPY TO`` and ``COPY FROM``.
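+
+Example (the keyspace, table, columns and file name are illustrative)::
+
+    cqlsh> COPY my_ks.users (id, email) TO 'users.csv' WITH HEADER = true AND PAGESIZE = 100;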
+
+Options for ``COPY TO``
+```````````````````````
+
+``MAXREQUESTS``
+  The maximum number of token ranges to fetch simultaneously. Defaults to 6.
+
+``PAGESIZE``
+  The number of rows to fetch in a single page. Defaults to 1000.
+
+``PAGETIMEOUT``
+  By default the page timeout is 10 seconds per 1000 entries
+  in the page size or 10 seconds if pagesize is smaller.
+
+``BEGINTOKEN``, ``ENDTOKEN``
+  Token range to export.  Defaults to exporting the full ring.
+
+``MAXOUTPUTSIZE``
+  The maximum size of the output file measured in number of lines;
+  beyond this maximum the output file will be split into segments.
+  -1 means unlimited, and is the default.
+
+``ENCODING``
+  The encoding used for characters. Defaults to ``utf8``.
+
+``COPY FROM``
+~~~~~~~~~~~~~
+Copies data from a CSV file to a table.
+
+`Usage`::
+
+    COPY <table name> [(<column>, ...)] FROM <file name> WITH <copy option> [AND <copy option> ...]
+
+If no columns are specified, all columns from the CSV file will be copied to the table.  A subset
+of columns to copy may be specified by adding a comma-separated list of column names surrounded
+by parentheses after the table name.
+
+The ``<file name>`` should be a string literal (with single quotes) representing a path to the
+source file.  This can also be the special value ``STDIN`` (without single quotes) to read the
+CSV data from stdin.
+
+See :ref:`shared-copy-options` for options that apply to both ``COPY TO`` and ``COPY FROM``.
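+
+Example (the keyspace, table and file name are illustrative)::
+
+    cqlsh> COPY my_ks.users (id, email) FROM 'users.csv' WITH HEADER = true AND INGESTRATE = 50000;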
+
+Options for ``COPY FROM``
+`````````````````````````
+
+``INGESTRATE``
+  The maximum number of rows to process per second. Defaults to 100000.
+
+``MAXROWS``
+  The maximum number of rows to import. -1 means unlimited, and is the default.
+
+``SKIPROWS``
+  A number of initial rows to skip.  Defaults to 0.
+
+``SKIPCOLS``
+  A comma-separated list of column names to ignore.  By default, no columns are skipped.
+
+``MAXPARSEERRORS``
+  The maximum global number of parsing errors to ignore. -1 means unlimited, and is the default.
+
+``MAXINSERTERRORS``
+  The maximum global number of insert errors to ignore. -1 means unlimited.  The default is 1000.
+
+``ERRFILE``
+  A file to store all rows that could not be imported, by default this is ``import_<ks>_<table>.err`` where ``<ks>`` is
+  your keyspace and ``<table>`` is your table name.
+
+``MAXBATCHSIZE``
+  The max number of rows inserted in a single batch. Defaults to 20.
+
+``MINBATCHSIZE``
+  The min number of rows inserted in a single batch. Defaults to 2.
+
+``CHUNKSIZE``
+  The number of rows that are passed to child worker processes from the main process at a time. Defaults to 1000.
+
+.. _shared-copy-options:
+
+Shared COPY Options
+```````````````````
+
+Options that are common to both ``COPY TO`` and ``COPY FROM``.
+
+``NULLVAL``
+  The string placeholder for null values.  Defaults to ``null``.
+
+``HEADER``
+  For ``COPY TO``, controls whether the first line in the CSV output file will contain the column names.  For
+  ``COPY FROM``, specifies whether the first line in the CSV input file contains column names.  Defaults to ``false``.
+
+``DECIMALSEP``
+  The character that is used as the decimal point separator.  Defaults to ``.``.
+
+``THOUSANDSSEP``
+  The character that is used to separate thousands. Defaults to the empty string.
+
+``BOOLSTYLE``
+  The string literal format for boolean values.  Defaults to ``True,False``.
+
+``NUMPROCESSES``
+  The number of child worker processes to create for ``COPY`` tasks.  Defaults to a max of 4 for ``COPY FROM`` and 16
+  for ``COPY TO``.  However, at most (num_cores - 1) processes will be created.
+
+``MAXATTEMPTS``
+  The maximum number of failed attempts to fetch a range of data (when using ``COPY TO``) or insert a chunk of data
+  (when using ``COPY FROM``) before giving up. Defaults to 5.
+
+``REPORTFREQUENCY``
+  How often status updates are refreshed, in seconds.  Defaults to 0.25.
+
+``RATEFILE``
+  An optional file to output rate statistics to.  By default, statistics are not output to a file.
diff --git a/doc/source/tools/index.rst b/doc/source/tools/index.rst
new file mode 100644
index 0000000..5a5e4d5
--- /dev/null
+++ b/doc/source/tools/index.rst
@@ -0,0 +1,26 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+Cassandra Tools
+===============
+
+This section describes the command line tools provided with Apache Cassandra.
+
+.. toctree::
+   :maxdepth: 1
+
+   cqlsh
+   nodetool
diff --git a/doc/source/tools/nodetool.rst b/doc/source/tools/nodetool.rst
new file mode 100644
index 0000000..e373031
--- /dev/null
+++ b/doc/source/tools/nodetool.rst
@@ -0,0 +1,22 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+.. _nodetool:
+
+Nodetool
+--------
+
+.. todo:: Try to autogenerate this from Nodetool’s help.
diff --git a/doc/source/troubleshooting/index.rst b/doc/source/troubleshooting/index.rst
new file mode 100644
index 0000000..2e5cf10
--- /dev/null
+++ b/doc/source/troubleshooting/index.rst
@@ -0,0 +1,20 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..     http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing, software
+.. distributed under the License is distributed on an "AS IS" BASIS,
+.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+.. See the License for the specific language governing permissions and
+.. limitations under the License.
+
+Troubleshooting
+===============
+
+.. TODO: todo
diff --git a/ide/idea-iml-file.xml b/ide/idea-iml-file.xml
index e6c4aee..f14fe2e 100644
--- a/ide/idea-iml-file.xml
+++ b/ide/idea-iml-file.xml
@@ -33,6 +33,7 @@
             <sourceFolder url="file://$MODULE_DIR$/test/microbench" isTestSource="true" />
             <sourceFolder url="file://$MODULE_DIR$/test/burn" isTestSource="true" />
             <sourceFolder url="file://$MODULE_DIR$/test/resources" type="java-test-resource" />
+            <sourceFolder url="file://$MODULE_DIR$/test/conf" type="java-test-resource" />
             <excludeFolder url="file://$MODULE_DIR$/.idea" />
             <excludeFolder url="file://$MODULE_DIR$/.settings" />
             <excludeFolder url="file://$MODULE_DIR$/build" />
diff --git a/lib/HdrHistogram-2.1.9.jar b/lib/HdrHistogram-2.1.9.jar
new file mode 100644
index 0000000..efa2637
--- /dev/null
+++ b/lib/HdrHistogram-2.1.9.jar
Binary files differ
diff --git a/lib/caffeine-2.2.6.jar b/lib/caffeine-2.2.6.jar
new file mode 100644
index 0000000..74b91bc
--- /dev/null
+++ b/lib/caffeine-2.2.6.jar
Binary files differ
diff --git a/lib/concurrent-trees-2.4.0.jar b/lib/concurrent-trees-2.4.0.jar
new file mode 100644
index 0000000..9c488fe
--- /dev/null
+++ b/lib/concurrent-trees-2.4.0.jar
Binary files differ
diff --git a/lib/hppc-0.5.4.jar b/lib/hppc-0.5.4.jar
new file mode 100644
index 0000000..d84b83b
--- /dev/null
+++ b/lib/hppc-0.5.4.jar
Binary files differ
diff --git a/lib/jflex-1.6.0.jar b/lib/jflex-1.6.0.jar
new file mode 100644
index 0000000..550e446
--- /dev/null
+++ b/lib/jflex-1.6.0.jar
Binary files differ
diff --git a/lib/licenses/netty-all-4.0.23.Final.txt b/lib/licenses/caffeine-2.2.6.txt
similarity index 100%
copy from lib/licenses/netty-all-4.0.23.Final.txt
copy to lib/licenses/caffeine-2.2.6.txt
diff --git a/lib/licenses/concurrent-trees-2.4.0.txt b/lib/licenses/concurrent-trees-2.4.0.txt
new file mode 100644
index 0000000..50086f8
--- /dev/null
+++ b/lib/licenses/concurrent-trees-2.4.0.txt
@@ -0,0 +1,201 @@
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+
+   APPENDIX: How to apply the Apache License to your work.
+
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!) The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+
+   Copyright [yyyy] [name of copyright owner]
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
\ No newline at end of file
diff --git a/lib/licenses/hdrhistogram-2.1.9.txt b/lib/licenses/hdrhistogram-2.1.9.txt
new file mode 100644
index 0000000..9b4e66e
--- /dev/null
+++ b/lib/licenses/hdrhistogram-2.1.9.txt
@@ -0,0 +1,41 @@
+The code in this repository code was Written by Gil Tene, Michael Barker,
+and Matt Warren, and released to the public domain, as explained at
+http://creativecommons.org/publicdomain/zero/1.0/
+
+For users of this code who wish to consume it under the "BSD" license
+rather than under the public domain or CC0 contribution text mentioned
+above, the code found under this directory is *also* provided under the
+following license (commonly referred to as the BSD 2-Clause License). This
+license does not detract from the above stated release of the code into
+the public domain, and simply represents an additional license granted by
+the Author.
+
+-----------------------------------------------------------------------------
+** Beginning of "BSD 2-Clause License" text. **
+
+ Copyright (c) 2012, 2013, 2014 Gil Tene
+ Copyright (c) 2014 Michael Barker
+ Copyright (c) 2014 Matt Warren
+ All rights reserved.
+
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions are met:
+
+ 1. Redistributions of source code must retain the above copyright notice,
+    this list of conditions and the following disclaimer.
+
+ 2. Redistributions in binary form must reproduce the above copyright notice,
+    this list of conditions and the following disclaimer in the documentation
+    and/or other materials provided with the distribution.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
+ AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
+ IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
+ ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE
+ LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
+ CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
+ SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
+ INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
+ CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
+ ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF
+ THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/lib/licenses/netty-all-4.0.23.Final.txt b/lib/licenses/hppc-0.5.4.txt
similarity index 100%
copy from lib/licenses/netty-all-4.0.23.Final.txt
copy to lib/licenses/hppc-0.5.4.txt
diff --git a/lib/licenses/jflex-1.6.0.txt b/lib/licenses/jflex-1.6.0.txt
new file mode 100644
index 0000000..50086f8
--- /dev/null
+++ b/lib/licenses/jflex-1.6.0.txt
@@ -0,0 +1,201 @@
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+
+   APPENDIX: How to apply the Apache License to your work.
+
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!) The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+
+   Copyright [yyyy] [name of copyright owner]
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
\ No newline at end of file
diff --git a/lib/licenses/netty-all-4.0.23.Final.txt b/lib/licenses/netty-all-4.0.39.Final.txt
similarity index 100%
rename from lib/licenses/netty-all-4.0.23.Final.txt
rename to lib/licenses/netty-all-4.0.39.Final.txt
diff --git a/lib/licenses/primitive-1.0.txt b/lib/licenses/primitive-1.0.txt
new file mode 100644
index 0000000..50086f8
--- /dev/null
+++ b/lib/licenses/primitive-1.0.txt
@@ -0,0 +1,201 @@
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+
+   APPENDIX: How to apply the Apache License to your work.
+
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!) The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+
+   Copyright [yyyy] [name of copyright owner]
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
\ No newline at end of file
diff --git a/lib/licenses/snowball-stemmer-1.3.0.581.1.txt b/lib/licenses/snowball-stemmer-1.3.0.581.1.txt
new file mode 100644
index 0000000..50086f8
--- /dev/null
+++ b/lib/licenses/snowball-stemmer-1.3.0.581.1.txt
@@ -0,0 +1,201 @@
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+
+   APPENDIX: How to apply the Apache License to your work.
+
+      To apply the Apache License to your work, attach the following
+      boilerplate notice, with the fields enclosed by brackets "[]"
+      replaced with your own identifying information. (Don't include
+      the brackets!) The text should be enclosed in the appropriate
+      comment syntax for the file format. We also recommend that a
+      file or class name and description of purpose be included on the
+      same "printed page" as the copyright notice for easier
+      identification within third-party archives.
+
+   Copyright [yyyy] [name of copyright owner]
+
+   Licensed under the Apache License, Version 2.0 (the "License");
+   you may not use this file except in compliance with the License.
+   You may obtain a copy of the License at
+
+       http://www.apache.org/licenses/LICENSE-2.0
+
+   Unless required by applicable law or agreed to in writing, software
+   distributed under the License is distributed on an "AS IS" BASIS,
+   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+   See the License for the specific language governing permissions and
+   limitations under the License.
\ No newline at end of file
diff --git a/lib/netty-all-4.0.23.Final.jar b/lib/netty-all-4.0.23.Final.jar
deleted file mode 100644
index 0555a16..0000000
--- a/lib/netty-all-4.0.23.Final.jar
+++ /dev/null
Binary files differ
diff --git a/lib/netty-all-4.0.39.Final.jar b/lib/netty-all-4.0.39.Final.jar
new file mode 100644
index 0000000..3f5b3e6
--- /dev/null
+++ b/lib/netty-all-4.0.39.Final.jar
Binary files differ
diff --git a/lib/primitive-1.0.jar b/lib/primitive-1.0.jar
new file mode 100644
index 0000000..288daa0
--- /dev/null
+++ b/lib/primitive-1.0.jar
Binary files differ
diff --git a/lib/snowball-stemmer-1.3.0.581.1.jar b/lib/snowball-stemmer-1.3.0.581.1.jar
new file mode 100644
index 0000000..92189b9
--- /dev/null
+++ b/lib/snowball-stemmer-1.3.0.581.1.jar
Binary files differ
diff --git a/pylib/cqlshlib/copyutil.py b/pylib/cqlshlib/copyutil.py
index 5c1196b..d0524fe 100644
--- a/pylib/cqlshlib/copyutil.py
+++ b/pylib/cqlshlib/copyutil.py
@@ -52,7 +52,7 @@
 
 from cql3handling import CqlRuleSet
 from displaying import NO_COLOR_MAP
-from formatting import format_value_default, DateTimeFormat, EMPTY, get_formatter
+from formatting import format_value_default, CqlType, DateTimeFormat, EMPTY, get_formatter
 from sslhandling import ssl_settings
 
 PROFILE_ON = False
@@ -309,8 +309,10 @@
         copy_options['pagetimeout'] = int(opts.pop('pagetimeout', max(10, 10 * (copy_options['pagesize'] / 1000))))
         copy_options['maxattempts'] = int(opts.pop('maxattempts', 5))
         copy_options['dtformats'] = DateTimeFormat(opts.pop('datetimeformat', shell.display_timestamp_format),
-                                                   shell.display_date_format, shell.display_nanotime_format)
-        copy_options['float_precision'] = shell.display_float_precision
+                                                   shell.display_date_format, shell.display_nanotime_format,
+                                                   milliseconds_only=True)
+        copy_options['floatprecision'] = int(opts.pop('floatprecision', '5'))
+        copy_options['doubleprecision'] = int(opts.pop('doubleprecision', '12'))
         copy_options['chunksize'] = int(opts.pop('chunksize', 5000))
         copy_options['ingestrate'] = int(opts.pop('ingestrate', 100000))
         copy_options['maxbatchsize'] = int(opts.pop('maxbatchsize', 20))
@@ -332,6 +334,7 @@
         copy_options['ratefile'] = safe_normpath(opts.pop('ratefile', ''))
         copy_options['maxoutputsize'] = int(opts.pop('maxoutputsize', '-1'))
         copy_options['preparedstatements'] = bool(opts.pop('preparedstatements', 'true').lower() == 'true')
+        copy_options['ttl'] = int(opts.pop('ttl', -1))
 
         # Hidden properties, they do not appear in the documentation but can be set in config files
         # or on the cmd line but w/o completion
@@ -371,8 +374,11 @@
         """
         try:
             num_cores_for_testing = os.environ.get('CQLSH_COPY_TEST_NUM_CORES', '')
-            return int(num_cores_for_testing) if num_cores_for_testing else mp.cpu_count()
+            ret = int(num_cores_for_testing) if num_cores_for_testing else mp.cpu_count()
+            printdebugmsg("Detected %d core(s)" % (ret,))
+            return ret
         except NotImplementedError:
+            printdebugmsg("Failed to detect number of cores, returning 1")
             return 1
 
     @staticmethod
@@ -1464,7 +1470,8 @@
     def __init__(self, params):
         ChildProcess.__init__(self, params=params, target=self.run)
         options = params['options']
-        self.float_precision = options.copy['float_precision']
+        self.float_precision = options.copy['floatprecision']
+        self.double_precision = options.copy['doubleprecision']
         self.nullval = options.copy['nullval']
         self.max_requests = options.copy['maxrequests']
 
@@ -1588,12 +1595,17 @@
         return session
 
     def attach_callbacks(self, token_range, future, session):
+        metadata = session.cluster.metadata
+        ks_meta = metadata.keyspaces[self.ks]
+        table_meta = ks_meta.tables[self.table]
+        cql_types = [CqlType(table_meta.columns[c].cql_type, ks_meta) for c in self.columns]
+
         def result_callback(rows):
             if future.has_more_pages:
                 future.start_fetching_next_page()
-                self.write_rows_to_csv(token_range, rows)
+                self.write_rows_to_csv(token_range, rows, cql_types)
             else:
-                self.write_rows_to_csv(token_range, rows)
+                self.write_rows_to_csv(token_range, rows, cql_types)
                 self.send((None, None))
                 session.complete_request()
 
@@ -1603,7 +1615,7 @@
 
         future.add_callbacks(callback=result_callback, errback=err_callback)
 
-    def write_rows_to_csv(self, token_range, rows):
+    def write_rows_to_csv(self, token_range, rows, cql_types):
         if not rows:
             return  # no rows in this range
 
@@ -1612,7 +1624,7 @@
             writer = csv.writer(output, **self.options.dialect)
 
             for row in rows:
-                writer.writerow(map(self.format_value, row))
+                writer.writerow(map(self.format_value, row, cql_types))
 
             data = (output.getvalue(), len(rows))
             self.send((token_range, data))
@@ -1621,18 +1633,21 @@
         except Exception, e:
             self.report_error(e, token_range)
 
-    def format_value(self, val):
+    def format_value(self, val, cqltype):
         if val is None or val == EMPTY:
             return format_value_default(self.nullval, colormap=NO_COLOR_MAP)
 
-        ctype = type(val)
-        formatter = self.formatters.get(ctype, None)
+        formatter = self.formatters.get(cqltype, None)
         if not formatter:
-            formatter = get_formatter(ctype)
-            self.formatters[ctype] = formatter
+            formatter = get_formatter(val, cqltype)
+            self.formatters[cqltype] = formatter
 
-        return formatter(val, encoding=self.encoding, colormap=NO_COLOR_MAP, date_time_format=self.date_time_format,
-                         float_precision=self.float_precision, nullval=self.nullval, quote=False,
+        if not hasattr(cqltype, 'precision'):
+            cqltype.precision = self.double_precision if cqltype.type_name == 'double' else self.float_precision
+
+        return formatter(val, cqltype=cqltype,
+                         encoding=self.encoding, colormap=NO_COLOR_MAP, date_time_format=self.date_time_format,
+                         float_precision=cqltype.precision, nullval=self.nullval, quote=False,
                          decimal_sep=self.decimal_sep, thousands_sep=self.thousands_sep,
                          boolean_styles=self.boolean_styles)
 
@@ -1842,9 +1857,9 @@
 
             return ret
 
-        # this should match all possible CQL datetime formats
+        # this should match all possible CQL and CQLSH datetime formats
         p = re.compile("(\d{4})\-(\d{2})\-(\d{2})\s?(?:'T')?" +  # YYYY-MM-DD[( |'T')]
-                       "(?:(\d{2}):(\d{2})(?::(\d{2}))?)?" +  # [HH:MM[:SS]]
+                       "(?:(\d{2}):(\d{2})(?::(\d{2})(?:\.(\d{1,6}))?))?" +  # [HH:MM[:SS[.NNNNNN]]]
                        "(?:([+\-])(\d{2}):?(\d{2}))?")  # [(+|-)HH[:]MM]]
 
         def convert_datetime(val, **_):
@@ -1871,13 +1886,16 @@
                                     int(m.group(6)) if m.group(6) else 0,  # second
                                     0, 1, -1))  # day of week, day of year, dst-flag
 
-            if m.group(7):
-                offset = (int(m.group(8)) * 3600 + int(m.group(9)) * 60) * int(m.group(7) + '1')
+            # convert sub-seconds (a number between 1 and 6 digits) to milliseconds
+            milliseconds = 0 if not m.group(7) else int(m.group(7)) * pow(10, 3 - len(m.group(7)))
+
+            if m.group(8):
+                offset = (int(m.group(9)) * 3600 + int(m.group(10)) * 60) * int(m.group(8) + '1')
             else:
                 offset = -time.timezone
 
             # scale seconds to millis for the raw value
-            return (timegm(tval) + offset) * 1e3
+            return ((timegm(tval) + offset) * 1e3) + milliseconds
 
         def convert_date(v, **_):
             return Date(v)
@@ -2150,6 +2168,7 @@
         self.min_batch_size = options.copy['minbatchsize']
         self.max_batch_size = options.copy['maxbatchsize']
         self.use_prepared_statements = options.copy['preparedstatements']
+        self.ttl = options.copy['ttl']
         self.max_inflight_messages = options.copy['maxinflightmessages']
         self.max_backoff_attempts = options.copy['maxbackoffattempts']
 
@@ -2216,7 +2235,8 @@
                                                             protect_name(self.table),
                                                             ', '.join(protect_names(self.valid_columns),),
                                                             ', '.join(['?' for _ in self.valid_columns]))
-
+            if self.ttl >= 0:
+                query += 'USING TTL %s' % (self.ttl,)
             query = self.session.prepare(query)
             query.consistency_level = self.consistency_level
             prepared_statement = query
@@ -2225,6 +2245,8 @@
             query = 'INSERT INTO %s.%s (%s) VALUES (%%s)' % (protect_name(self.ks),
                                                              protect_name(self.table),
                                                              ', '.join(protect_names(self.valid_columns),))
+            if self.ttl >= 0:
+                query += 'USING TTL %s' % (self.ttl,)
             make_statement = self.wrap_make_statement(self.make_non_prepared_batch_statement)
 
         conv = ImportConversion(self, table_meta, prepared_statement)
@@ -2281,25 +2303,20 @@
         return make_statement_with_failures if self.test_failures else make_statement
 
     def make_counter_batch_statement(self, query, conv, batch, replicas):
-        def make_full_query(r):
+        statement = BatchStatement(batch_type=BatchType.COUNTER, consistency_level=self.consistency_level)
+        statement.replicas = replicas
+        statement.keyspace = self.ks
+        for row in batch['rows']:
             where_clause = []
             set_clause = []
-            for i, value in enumerate(r):
+            for i, value in enumerate(row):
                 if i in conv.primary_key_indexes:
                     where_clause.append("%s=%s" % (self.valid_columns[i], value))
                 else:
                     set_clause.append("%s=%s+%s" % (self.valid_columns[i], self.valid_columns[i], value))
-            return query % (','.join(set_clause), ' AND '.join(where_clause))
 
-        if len(batch['rows']) == 1:
-            statement = SimpleStatement(make_full_query(batch['rows'][0]), consistency_level=self.consistency_level)
-        else:
-            statement = BatchStatement(batch_type=BatchType.COUNTER, consistency_level=self.consistency_level)
-            for row in batch['rows']:
-                statement.add(make_full_query(row))
-
-        statement.replicas = replicas
-        statement.keyspace = self.ks
+            full_query_text = query % (','.join(set_clause), ' AND '.join(where_clause))
+            statement.add(full_query_text)
         return statement
 
     def make_prepared_batch_statement(self, query, _, batch, replicas):
@@ -2313,25 +2330,17 @@
         We could optimize further by removing bound_statements altogether but we'd have to duplicate much
         more driver's code (BoundStatement.bind()).
         """
-        if len(batch['rows']) == 1:
-            statement = query.bind(batch['rows'][0])
-        else:
-            statement = BatchStatement(batch_type=BatchType.UNLOGGED, consistency_level=self.consistency_level)
-            statement._statements_and_parameters = [(True, query.query_id, query.bind(r).values) for r in batch['rows']]
-
+        statement = BatchStatement(batch_type=BatchType.UNLOGGED, consistency_level=self.consistency_level)
         statement.replicas = replicas
         statement.keyspace = self.ks
+        statement._statements_and_parameters = [(True, query.query_id, query.bind(r).values) for r in batch['rows']]
         return statement
 
     def make_non_prepared_batch_statement(self, query, _, batch, replicas):
-        if len(batch['rows']) == 1:
-            statement = SimpleStatement(query % (','.join(batch['rows'][0]),), consistency_level=self.consistency_level)
-        else:
-            statement = BatchStatement(batch_type=BatchType.UNLOGGED, consistency_level=self.consistency_level)
-            statement._statements_and_parameters = [(False, query % (','.join(r),), ()) for r in batch['rows']]
-
+        statement = BatchStatement(batch_type=BatchType.UNLOGGED, consistency_level=self.consistency_level)
         statement.replicas = replicas
         statement.keyspace = self.ks
+        statement._statements_and_parameters = [(False, query % (','.join(r),), ()) for r in batch['rows']]
         return statement
 
     def convert_rows(self, conv, chunk):
diff --git a/pylib/cqlshlib/cql3handling.py b/pylib/cqlshlib/cql3handling.py
index 9008514..c01e441 100644
--- a/pylib/cqlshlib/cql3handling.py
+++ b/pylib/cqlshlib/cql3handling.py
@@ -51,6 +51,7 @@
         ('default_time_to_live', None),
         ('speculative_retry', None),
         ('memtable_flush_period_in_ms', None),
+        ('cdc', None)
     )
 
     columnfamily_layout_map_options = (
@@ -141,7 +142,7 @@
 <stringLiteral> ::= <quotedStringLiteral>
                   | <pgStringLiteral> ;
 <quotedStringLiteral> ::= /'([^']|'')*'/ ;
-<pgStringLiteral> ::= /\$\$(?:(?!\$\$)|[^$])*\$\$/;
+<pgStringLiteral> ::= /\$\$(?:(?!\$\$).)*\$\$/;
 <quotedName> ::=    /"([^"]|"")*"/ ;
 <float> ::=         /-?[0-9]+\.[0-9]+/ ;
 <uuid> ::=          /[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}/ ;
@@ -160,7 +161,7 @@
             | "false"
             ;
 
-<unclosedPgString>::= /\$\$(?:(?!\$\$)|[^$])*/ ;
+<unclosedPgString>::= /\$\$(?:(?!\$\$).)*/ ;
 <unclosedString>  ::= /'([^']|'')*/ ;
 <unclosedName>    ::= /"([^"]|"")*/ ;
 <unclosedComment> ::= /[/][*].*$/ ;
@@ -478,6 +479,8 @@
     if this_opt in ('min_compaction_threshold', 'max_compaction_threshold',
                     'gc_grace_seconds', 'min_index_interval', 'max_index_interval'):
         return [Hint('<integer>')]
+    if this_opt in ('cdc',):
+        return [Hint('<true|false>')]
     return [Hint('<option_value>')]
 
 
@@ -565,13 +568,13 @@
 
 
 @completer_for('nonSystemKeyspaceName', 'ksname')
-def ks_name_completer(ctxt, cass):
+def non_system_ks_name_completer(ctxt, cass):
     ksnames = [n for n in cass.get_keyspace_names() if n not in SYSTEM_KEYSPACES]
     return map(maybe_escape_name, ksnames)
 
 
 @completer_for('alterableKeyspaceName', 'ksname')
-def ks_name_completer(ctxt, cass):
+def alterable_ks_name_completer(ctxt, cass):
     ksnames = [n for n in cass.get_keyspace_names() if n not in NONALTERBALE_KEYSPACES]
     return map(maybe_escape_name, ksnames)
 
@@ -700,7 +703,9 @@
              | "WRITETIME" "(" [colname]=<cident> ")"
              | "TTL" "(" [colname]=<cident> ")"
              | "COUNT" "(" star=( "*" | "1" ) ")"
+             | "CAST" "(" <selector> "AS" <storageType> ")"
              | <functionName> <selectionFunctionArguments>
+             | <term>
              ;
 <selectionFunctionArguments> ::= "(" ( <selector> ( "," <selector> )* )? ")"
                           ;
@@ -876,7 +881,6 @@
 
 @completer_for('insertStatement', 'valcomma')
 def insert_valcomma_completer(ctxt, cass):
-    layout = get_table_meta(ctxt, cass)
     numcols = len(ctxt.get_binding('colname', ()))
     numvals = len(ctxt.get_binding('newval', ()))
     if numcols > numvals:
@@ -900,21 +904,27 @@
                         ( "IF" ( "EXISTS" | <conditions> ))?
                     ;
 <assignment> ::= updatecol=<cident>
-                    ( "=" update_rhs=( <term> | <cident> )
+                    (( "=" update_rhs=( <term> | <cident> )
                                 ( counterop=( "+" | "-" ) inc=<wholenumber>
-                                | listadder="+" listcol=<cident> )?
-                    | indexbracket="[" <term> "]" "=" <term> )
+                                | listadder="+" listcol=<cident> )? )
+                    | ( indexbracket="[" <term> "]" "=" <term> )
+                    | ( udt_field_dot="." udt_field=<identifier> "=" <term> ))
                ;
 <conditions> ::=  <condition> ( "AND" <condition> )*
                ;
-<condition> ::= <cident> ( "[" <term> "]" )? (("=" | "<" | ">" | "<=" | ">=" | "!=") <term>
-                                             | "IN" "(" <term> ( "," <term> )* ")")
+<condition_op_and_rhs> ::= (("=" | "<" | ">" | "<=" | ">=" | "!=") <term>)
+                           | ("IN" "(" <term> ( "," <term> )* ")" )
+                         ;
+<condition> ::= conditioncol=<cident>
+                    ( (( indexbracket="[" <term> "]" )
+                      |( udt_field_dot="." udt_field=<identifier> )) )?
+                    <condition_op_and_rhs>
               ;
 '''
 
 
 @completer_for('updateStatement', 'updateopt')
-def insert_option_completer(ctxt, cass):
+def update_option_completer(ctxt, cass):
     opts = set('TIMESTAMP TTL'.split())
     for opt in ctxt.get_binding('updateopt', ()):
         opts.discard(opt.split()[0])
@@ -983,6 +993,62 @@
         return ['[']
     return []
 
+
+@completer_for('assignment', 'udt_field_dot')
+def update_udt_field_dot_completer(ctxt, cass):
+    layout = get_table_meta(ctxt, cass)
+    curcol = dequote_name(ctxt.get_binding('updatecol', ''))
+    return ["."] if _is_usertype(layout, curcol) else []
+
+
+@completer_for('assignment', 'udt_field')
+def assignment_udt_field_completer(ctxt, cass):
+    layout = get_table_meta(ctxt, cass)
+    curcol = dequote_name(ctxt.get_binding('updatecol', ''))
+    return _usertype_fields(ctxt, cass, layout, curcol)
+
+
+def _is_usertype(layout, curcol):
+    coltype = layout.columns[curcol].cql_type
+    return coltype not in simple_cql_types and coltype not in ('map', 'set', 'list')
+
+
+def _usertype_fields(ctxt, cass, layout, curcol):
+    if not _is_usertype(layout, curcol):
+        return []
+
+    coltype = layout.columns[curcol].cql_type
+    ks = ctxt.get_binding('ksname', None)
+    if ks is not None:
+        ks = dequote_name(ks)
+    user_type = cass.get_usertype_layout(ks, coltype)
+    return [field_name for (field_name, field_type) in user_type]
+
+
+@completer_for('condition', 'indexbracket')
+def condition_indexbracket_completer(ctxt, cass):
+    layout = get_table_meta(ctxt, cass)
+    curcol = dequote_name(ctxt.get_binding('conditioncol', ''))
+    coltype = layout.columns[curcol].cql_type
+    if coltype in ('map', 'list'):
+        return ['[']
+    return []
+
+
+@completer_for('condition', 'udt_field_dot')
+def condition_udt_field_dot_completer(ctxt, cass):
+    layout = get_table_meta(ctxt, cass)
+    curcol = dequote_name(ctxt.get_binding('conditioncol', ''))
+    return ["."] if _is_usertype(layout, curcol) else []
+
+
+@completer_for('condition', 'udt_field')
+def condition_udt_field_completer(ctxt, cass):
+    layout = get_table_meta(ctxt, cass)
+    curcol = dequote_name(ctxt.get_binding('conditioncol', ''))
+    return _usertype_fields(ctxt, cass, layout, curcol)
+
+
 syntax_rules += r'''
 <deleteStatement> ::= "DELETE" ( <deleteSelector> ( "," <deleteSelector> )* )?
                         "FROM" cf=<columnFamilyName>
@@ -990,7 +1056,9 @@
                         "WHERE" <whereClause>
                         ( "IF" ( "EXISTS" | <conditions> ) )?
                     ;
-<deleteSelector> ::= delcol=<cident> ( memberbracket="[" memberselector=<term> "]" )?
+<deleteSelector> ::= delcol=<cident>
+                     ( ( "[" <term> "]" )
+                     | ( "." <identifier> ) )?
                    ;
 <deleteOption> ::= "TIMESTAMP" <wholenumber>
                  ;
@@ -1010,6 +1078,7 @@
     layout = get_table_meta(ctxt, cass)
     return map(maybe_escape_name, regular_column_names(layout))
 
+
 syntax_rules += r'''
 <batchStatement> ::= "BEGIN" ( "UNLOGGED" | "COUNTER" )? "BATCH"
                         ( "USING" [batchopt]=<usingOption>
@@ -1059,7 +1128,7 @@
                                 ;
 
 <cfamProperty> ::= <property>
-                 | "COMPACT" "STORAGE"
+                 | "COMPACT" "STORAGE" "CDC"
                  | "CLUSTERING" "ORDER" "BY" "(" <cfamOrdering>
                                                  ( "," <cfamOrdering> )* ")"
                  ;
@@ -1402,6 +1471,7 @@
 <resource> ::= <dataResource>
              | <roleResource>
              | <functionResource>
+             | <jmxResource>
              ;
 
 <dataResource> ::= ( "ALL" "KEYSPACES" )
@@ -1420,6 +1490,11 @@
                            ")" )
                        )
                      ;
+
+<jmxResource> ::= ( "ALL" "MBEANS")
+                | ( ( "MBEAN" | "MBEANS" ) <stringLiteral> )
+                ;
+
 '''
 
 
@@ -1471,7 +1546,7 @@
 
 
 @completer_for('dropTriggerStatement', 'triggername')
-def alter_type_field_completer(ctxt, cass):
+def drop_trigger_completer(ctxt, cass):
     names = get_trigger_names(ctxt, cass)
     return map(maybe_escape_name, names)
 
diff --git a/pylib/cqlshlib/formatting.py b/pylib/cqlshlib/formatting.py
index dcd08da..5364c18 100644
--- a/pylib/cqlshlib/formatting.py
+++ b/pylib/cqlshlib/formatting.py
@@ -16,7 +16,9 @@
 
 import binascii
 import calendar
+import datetime
 import math
+import os
 import re
 import sys
 import platform
@@ -59,7 +61,7 @@
 empty_colormap = defaultdict(lambda: '')
 
 
-def format_by_type(cqltype, val, encoding, colormap=None, addcolor=False,
+def format_by_type(val, cqltype, encoding, colormap=None, addcolor=False,
                    nullval=None, date_time_format=None, float_precision=None,
                    decimal_sep=None, thousands_sep=None, boolean_styles=None):
     if nullval is None:
@@ -74,7 +76,7 @@
         date_time_format = DateTimeFormat()
     if float_precision is None:
         float_precision = default_float_precision
-    return format_value(cqltype, val, encoding=encoding, colormap=colormap,
+    return format_value(val, cqltype=cqltype, encoding=encoding, colormap=colormap,
                         date_time_format=date_time_format, float_precision=float_precision,
                         nullval=nullval, decimal_sep=decimal_sep, thousands_sep=thousands_sep,
                         boolean_styles=boolean_styles)
@@ -99,20 +101,102 @@
 
 DEFAULT_NANOTIME_FORMAT = '%H:%M:%S.%N'
 DEFAULT_DATE_FORMAT = '%Y-%m-%d'
-DEFAULT_TIMESTAMP_FORMAT = '%Y-%m-%d %H:%M:%S%z'
 
-if platform.system() == 'Windows':
-    DEFAULT_TIME_FORMAT = '%Y-%m-%d %H:%M:%S %Z'
+DEFAULT_TIMESTAMP_FORMAT = os.environ.get('CQLSH_DEFAULT_TIMESTAMP_FORMAT', '')
+if not DEFAULT_TIMESTAMP_FORMAT:
+    DEFAULT_TIMESTAMP_FORMAT = '%Y-%m-%d %H:%M:%S.%f%z'
 
 
-class DateTimeFormat():
+class DateTimeFormat:
 
     def __init__(self, timestamp_format=DEFAULT_TIMESTAMP_FORMAT, date_format=DEFAULT_DATE_FORMAT,
-                 nanotime_format=DEFAULT_NANOTIME_FORMAT, timezone=None):
+                 nanotime_format=DEFAULT_NANOTIME_FORMAT, timezone=None, milliseconds_only=False):
         self.timestamp_format = timestamp_format
         self.date_format = date_format
         self.nanotime_format = nanotime_format
         self.timezone = timezone
+        self.milliseconds_only = milliseconds_only  # the microseconds part, .NNNNNN, will be rounded to .NNN
+
+
+class CqlType(object):
+    """
+    A class for converting a string into a cql type name that can match a formatter
+    and a list of its sub-types, if any.
+    """
+    pattern = re.compile('^([^<]*)<(.*)>$')  # *<*>
+
+    def __init__(self, typestring, ksmeta=None):
+        self.type_name, self.sub_types, self.formatter = self.parse(typestring, ksmeta)
+
+    def __str__(self):
+        return "%s%s" % (self.type_name, self.sub_types or '')
+
+    __repr__ = __str__
+
+    def get_n_sub_types(self, num):
+        """
+        Return the sub-types if the requested number matches the length of the sub-types (tuples)
+        or the first sub-type times the number requested if the length of the sub-types is one (list, set),
+        otherwise raise an exception
+        """
+        if len(self.sub_types) == num:
+            return self.sub_types
+        elif len(self.sub_types) == 1:
+            return [self.sub_types[0]] * num
+        else:
+            raise Exception("Unexpected number of subtypes %d - %s" % (num, self.sub_types))
+
+    def parse(self, typestring, ksmeta):
+        """
+        Parse the typestring by looking at this pattern: *<*>. If there is no match then the type
+        is either a simple type or a user type, otherwise it must be a composite type
+        for which we need to look-up the sub-types. For user types the sub types can be extracted
+        from the keyspace metadata.
+        """
+        while True:
+            m = self.pattern.match(typestring)
+            if not m:  # no match, either a simple or a user type
+                name = typestring
+                if ksmeta and name in ksmeta.user_types:  # a user type, look at ks meta for sub types
+                    sub_types = [CqlType(t, ksmeta) for t in ksmeta.user_types[name].field_types]
+                    return name, sub_types, format_value_utype
+                else:
+                    return name, [], self._get_formatter(name)
+            else:
+                if m.group(1) == 'frozen':  # ignore frozen<>
+                    typestring = m.group(2)
+                    continue
+
+                name = m.group(1)  # a composite type, parse sub types
+                return name, self.parse_sub_types(m.group(2), ksmeta), self._get_formatter(name)
+
+    @staticmethod
+    def _get_formatter(name):
+        return _formatters.get(name.lower())
+
+    @staticmethod
+    def parse_sub_types(val, ksmeta):
+        """
+        Split val into sub-strings separated by commas but only if not within a <> pair
+        Return a list of CqlType instances where each instance is initialized with the sub-strings
+        that were found.
+        """
+        last = 0
+        level = 0
+        ret = []
+        for i, c in enumerate(val):
+            if c == '<':
+                level += 1
+            elif c == '>':
+                level -= 1
+            elif c == ',' and level == 0:
+                ret.append(val[last:i].strip())
+                last = i + 1
+
+        if last < len(val) - 1:
+            ret.append(val[last:].strip())
+
+        return [CqlType(r, ksmeta) for r in ret]
 
 
 def format_value_default(val, colormap, **_):
@@ -126,20 +210,24 @@
 _formatters = {}
 
 
-def format_value(type, val, **kwargs):
+def format_value(val, cqltype, **kwargs):
     if val == EMPTY:
         return format_value_default('', **kwargs)
-    formatter = _formatters.get(type.__name__, format_value_default)
-    return formatter(val, **kwargs)
+
+    formatter = get_formatter(val, cqltype)
+    return formatter(val, cqltype=cqltype, **kwargs)
 
 
-def get_formatter(type):
-    return _formatters.get(type.__name__, format_value_default)
+def get_formatter(val, cqltype):
+    if cqltype and cqltype.formatter:
+        return cqltype.formatter
+
+    return _formatters.get(type(val).__name__.lower(), format_value_default)
 
 
 def formatter_for(typname):
     def registrator(f):
-        _formatters[typname] = f
+        _formatters[typname.lower()] = f
         return f
     return registrator
 
@@ -149,6 +237,7 @@
     bval = '0x' + binascii.hexlify(val)
     return colorme(bval, colormap, 'blob')
 formatter_for('buffer')(format_value_blob)
+formatter_for('blob')(format_value_blob)
 
 
 def format_python_formatted_type(val, colormap, color, quote=False):
@@ -169,6 +258,8 @@
 def format_value_uuid(val, colormap, **_):
     return format_python_formatted_type(val, colormap, 'uuid')
 
+formatter_for('timeuuid')(format_value_uuid)
+
 
 @formatter_for('inet')
 def formatter_value_inet(val, colormap, quote=False, **_):
@@ -181,6 +272,8 @@
         val = boolean_styles[0] if val else boolean_styles[1]
     return format_python_formatted_type(val, colormap, 'boolean')
 
+formatter_for('boolean')(format_value_boolean)
+
 
 def format_floating_point_type(val, colormap, float_precision, decimal_sep=None, thousands_sep=None, **_):
     if math.isnan(val):
@@ -209,6 +302,7 @@
     return colorme(bval, colormap, 'float')
 
 formatter_for('float')(format_floating_point_type)
+formatter_for('double')(format_floating_point_type)
 
 
 def format_integer_type(val, colormap, thousands_sep=None, **_):
@@ -232,22 +326,53 @@
 
 formatter_for('long')(format_integer_type)
 formatter_for('int')(format_integer_type)
+formatter_for('bigint')(format_integer_type)
+formatter_for('varint')(format_integer_type)
 
 
 @formatter_for('datetime')
 def format_value_timestamp(val, colormap, date_time_format, quote=False, **_):
-    bval = strftime(date_time_format.timestamp_format, calendar.timegm(val.utctimetuple()), timezone=date_time_format.timezone)
+    if isinstance(val, datetime.datetime):
+        bval = strftime(date_time_format.timestamp_format,
+                        calendar.timegm(val.utctimetuple()),
+                        microseconds=val.microsecond,
+                        timezone=date_time_format.timezone)
+        if date_time_format.milliseconds_only:
+            bval = round_microseconds(bval)
+    else:
+        bval = str(val)
+
     if quote:
         bval = "'%s'" % bval
     return colorme(bval, colormap, 'timestamp')
 
+formatter_for('timestamp')(format_value_timestamp)
 
-def strftime(time_format, seconds, timezone=None):
-    ret_dt = datetime_from_timestamp(seconds).replace(tzinfo=UTC())
+
+def strftime(time_format, seconds, microseconds=0, timezone=None):
+    ret_dt = datetime_from_timestamp(seconds) + datetime.timedelta(microseconds=microseconds)
+    ret_dt = ret_dt.replace(tzinfo=UTC())
     if timezone:
         ret_dt = ret_dt.astimezone(timezone)
     return ret_dt.strftime(time_format)
 
+microseconds_regex = re.compile("(.*)(?:\.(\d{1,6}))(.*)")
+
+
+def round_microseconds(val):
+    """
+    For COPY TO, we need to round microsecond to milliseconds because server side
+    TimestampSerializer.dateStringPatterns only parses milliseconds. If we keep microseconds,
+    users may try to import with COPY FROM a file generated with COPY TO and have problems if
+    prepared statements are disabled, see CASSANDRA-11631.
+    """
+    m = microseconds_regex.match(val)
+    if not m:
+        return val
+
+    milliseconds = int(m.group(2)) * pow(10, 3 - len(m.group(2)))
+    return '%s.%03d%s' % (m.group(1), milliseconds, '' if not m.group(3) else m.group(3))
+
 
 @formatter_for('Date')
 def format_value_date(val, colormap, **_):
@@ -273,16 +398,18 @@
 
 # name alias
 formatter_for('unicode')(format_value_text)
+formatter_for('text')(format_value_text)
+formatter_for('ascii')(format_value_text)
 
 
-def format_simple_collection(val, lbracket, rbracket, encoding,
+def format_simple_collection(val, cqltype, lbracket, rbracket, encoding,
                              colormap, date_time_format, float_precision, nullval,
                              decimal_sep, thousands_sep, boolean_styles):
-    subs = [format_value(type(sval), sval, encoding=encoding, colormap=colormap,
+    subs = [format_value(sval, cqltype=stype, encoding=encoding, colormap=colormap,
                          date_time_format=date_time_format, float_precision=float_precision,
                          nullval=nullval, quote=True, decimal_sep=decimal_sep,
                          thousands_sep=thousands_sep, boolean_styles=boolean_styles)
-            for sval in val]
+            for sval, stype in zip(val, cqltype.get_n_sub_types(len(val)))]
     bval = lbracket + ', '.join(get_str(sval) for sval in subs) + rbracket
     if colormap is NO_COLOR_MAP:
         return bval
@@ -295,25 +422,25 @@
 
 
 @formatter_for('list')
-def format_value_list(val, encoding, colormap, date_time_format, float_precision, nullval,
+def format_value_list(val, cqltype, encoding, colormap, date_time_format, float_precision, nullval,
                       decimal_sep, thousands_sep, boolean_styles, **_):
-    return format_simple_collection(val, '[', ']', encoding, colormap,
+    return format_simple_collection(val, cqltype, '[', ']', encoding, colormap,
                                     date_time_format, float_precision, nullval,
                                     decimal_sep, thousands_sep, boolean_styles)
 
 
 @formatter_for('tuple')
-def format_value_tuple(val, encoding, colormap, date_time_format, float_precision, nullval,
+def format_value_tuple(val, cqltype, encoding, colormap, date_time_format, float_precision, nullval,
                        decimal_sep, thousands_sep, boolean_styles, **_):
-    return format_simple_collection(val, '(', ')', encoding, colormap,
+    return format_simple_collection(val, cqltype, '(', ')', encoding, colormap,
                                     date_time_format, float_precision, nullval,
                                     decimal_sep, thousands_sep, boolean_styles)
 
 
 @formatter_for('set')
-def format_value_set(val, encoding, colormap, date_time_format, float_precision, nullval,
+def format_value_set(val, cqltype, encoding, colormap, date_time_format, float_precision, nullval,
                      decimal_sep, thousands_sep, boolean_styles, **_):
-    return format_simple_collection(sorted(val), '{', '}', encoding, colormap,
+    return format_simple_collection(sorted(val), cqltype, '{', '}', encoding, colormap,
                                     date_time_format, float_precision, nullval,
                                     decimal_sep, thousands_sep, boolean_styles)
 formatter_for('frozenset')(format_value_set)
@@ -322,15 +449,15 @@
 
 
 @formatter_for('dict')
-def format_value_map(val, encoding, colormap, date_time_format, float_precision, nullval,
+def format_value_map(val, cqltype, encoding, colormap, date_time_format, float_precision, nullval,
                      decimal_sep, thousands_sep, boolean_styles, **_):
-    def subformat(v):
-        return format_value(type(v), v, encoding=encoding, colormap=colormap,
+    def subformat(v, t):
+        return format_value(v, cqltype=t, encoding=encoding, colormap=colormap,
                             date_time_format=date_time_format, float_precision=float_precision,
                             nullval=nullval, quote=True, decimal_sep=decimal_sep,
                             thousands_sep=thousands_sep, boolean_styles=boolean_styles)
 
-    subs = [(subformat(k), subformat(v)) for (k, v) in sorted(val.items())]
+    subs = [(subformat(k, cqltype.sub_types[0]), subformat(v, cqltype.sub_types[1])) for (k, v) in sorted(val.items())]
     bval = '{' + ', '.join(get_str(k) + ': ' + get_str(v) for (k, v) in subs) + '}'
     if colormap is NO_COLOR_MAP:
         return bval
@@ -345,14 +472,15 @@
 formatter_for('OrderedDict')(format_value_map)
 formatter_for('OrderedMap')(format_value_map)
 formatter_for('OrderedMapSerializedKey')(format_value_map)
+formatter_for('map')(format_value_map)
 
 
-def format_value_utype(val, encoding, colormap, date_time_format, float_precision, nullval,
+def format_value_utype(val, cqltype, encoding, colormap, date_time_format, float_precision, nullval,
                        decimal_sep, thousands_sep, boolean_styles, **_):
-    def format_field_value(v):
+    def format_field_value(v, t):
         if v is None:
             return colorme(nullval, colormap, 'error')
-        return format_value(type(v), v, encoding=encoding, colormap=colormap,
+        return format_value(v, cqltype=t, encoding=encoding, colormap=colormap,
                             date_time_format=date_time_format, float_precision=float_precision,
                             nullval=nullval, quote=True, decimal_sep=decimal_sep,
                             thousands_sep=thousands_sep, boolean_styles=boolean_styles)
@@ -360,7 +488,8 @@
     def format_field_name(name):
         return format_value_text(name, encoding=encoding, colormap=colormap, quote=False)
 
-    subs = [(format_field_name(k), format_field_value(v)) for (k, v) in val._asdict().items()]
+    subs = [(format_field_name(k), format_field_value(v, t)) for ((k, v), t) in zip(val._asdict().items(),
+                                                                                    cqltype.sub_types)]
     bval = '{' + ', '.join(get_str(k) + ': ' + get_str(v) for (k, v) in subs) + '}'
     if colormap is NO_COLOR_MAP:
         return bval
diff --git a/pylib/cqlshlib/test/run_cqlsh.py b/pylib/cqlshlib/test/run_cqlsh.py
index b011df4..fa010fe 100644
--- a/pylib/cqlshlib/test/run_cqlsh.py
+++ b/pylib/cqlshlib/test/run_cqlsh.py
@@ -189,6 +189,8 @@
                    flags=0, ptty_timeout=None):
         if not isinstance(until, re._pattern_type):
             until = re.compile(until, flags)
+
+        cqlshlog.debug("Searching for %r" % (until.pattern,))
         got = self.readbuf
         self.readbuf = ''
         with timing_out(timeout):
diff --git a/pylib/cqlshlib/test/test_cqlsh_completion.py b/pylib/cqlshlib/test/test_cqlsh_completion.py
index e736ea7..21eb088 100644
--- a/pylib/cqlshlib/test/test_cqlsh_completion.py
+++ b/pylib/cqlshlib/test/test_cqlsh_completion.py
@@ -367,12 +367,12 @@
                             choices=['EXISTS', '<quotedName>', '<identifier>'])
 
         self.trycompletions("UPDATE empty_table SET lonelycol = 'eggs' WHERE TOKEN(lonelykey) <= TOKEN(13) IF EXISTS ",
-                            choices=['>=', '!=', '<=', 'IN', '[', ';', '=', '<', '>'])
+                            choices=['>=', '!=', '<=', 'IN', '[', ';', '=', '<', '>', '.'])
 
     def test_complete_in_delete(self):
         self.trycompletions('DELETE F', choices=['FROM', '<identifier>', '<quotedName>'])
 
-        self.trycompletions('DELETE a ', choices=['FROM', '[', ','])
+        self.trycompletions('DELETE a ', choices=['FROM', '[', '.', ','])
         self.trycompletions('DELETE a [',
                             choices=['<wholenumber>', 'false', '-', '<uuid>',
                                      '<pgStringLiteral>', '<float>', 'TOKEN',
@@ -449,7 +449,7 @@
                             choices=['EXISTS', '<identifier>', '<quotedName>'])
         self.trycompletions(('DELETE FROM twenty_rows_composite_table USING TIMESTAMP 0 WHERE '
                              'TOKEN(a) >= TOKEN(0) IF b '),
-                            choices=['>=', '!=', '<=', 'IN', '[', '=', '<', '>'])
+                            choices=['>=', '!=', '<=', 'IN', '=', '<', '>'])
         self.trycompletions(('DELETE FROM twenty_rows_composite_table USING TIMESTAMP 0 WHERE '
                              'TOKEN(a) >= TOKEN(0) IF b < 0 '),
                             choices=['AND', ';'])
@@ -595,7 +595,7 @@
                                      'memtable_flush_period_in_ms',
                                      'read_repair_chance', 'CLUSTERING',
                                      'COMPACT', 'caching', 'comment',
-                                     'min_index_interval', 'speculative_retry'])
+                                     'min_index_interval', 'speculative_retry', 'cdc'])
         self.trycompletions(prefix + ' new_table (col_a int PRIMARY KEY) WITH ',
                             choices=['bloom_filter_fp_chance', 'compaction',
                                      'compression',
@@ -605,7 +605,7 @@
                                      'memtable_flush_period_in_ms',
                                      'read_repair_chance', 'CLUSTERING',
                                      'COMPACT', 'caching', 'comment',
-                                     'min_index_interval', 'speculative_retry'])
+                                     'min_index_interval', 'speculative_retry', 'cdc'])
         self.trycompletions(prefix + ' new_table (col_a int PRIMARY KEY) WITH bloom_filter_fp_chance ',
                             immediate='= ')
         self.trycompletions(prefix + ' new_table (col_a int PRIMARY KEY) WITH bloom_filter_fp_chance = ',
@@ -653,7 +653,7 @@
                                      'memtable_flush_period_in_ms',
                                      'read_repair_chance', 'CLUSTERING',
                                      'COMPACT', 'caching', 'comment',
-                                     'min_index_interval', 'speculative_retry'])
+                                     'min_index_interval', 'speculative_retry', 'cdc'])
         self.trycompletions(prefix + " new_table (col_a int PRIMARY KEY) WITH compaction = "
                             + "{'class': 'DateTieredCompactionStrategy', '",
                             choices=['base_time_seconds', 'max_sstable_age_days',
@@ -669,7 +669,6 @@
                                      'enabled', 'unchecked_tombstone_compaction',
                                      'only_purge_repaired_tombstones'])
 
-
     def test_complete_in_create_columnfamily(self):
         self.trycompletions('CREATE C', choices=['COLUMNFAMILY', 'CUSTOM'])
         self.trycompletions('CREATE CO', immediate='LUMNFAMILY ')
diff --git a/pylib/cqlshlib/test/test_cqlsh_output.py b/pylib/cqlshlib/test/test_cqlsh_output.py
index d905095..8dba651 100644
--- a/pylib/cqlshlib/test/test_cqlsh_output.py
+++ b/pylib/cqlshlib/test/test_cqlsh_output.py
@@ -360,10 +360,10 @@
             ('''select timestampcol from has_all_types where num = 0;''', """
              timestampcol
              MMMMMMMMMMMM
-            --------------------------
+            ---------------------------------
 
-             2012-05-14 12:53:20+0000
-             GGGGGGGGGGGGGGGGGGGGGGGG
+             2012-05-14 12:53:20.000000+0000
+             GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
 
 
             (1 rows)
@@ -376,10 +376,10 @@
                 ('''select timestampcol from has_all_types where num = 0;''', """
                  timestampcol
                  MMMMMMMMMMMM
-                --------------------------
+                ---------------------------------
 
-                 2012-05-14 09:53:20-0300
-                 GGGGGGGGGGGGGGGGGGGGGGGG
+                 2012-05-14 09:53:20.000000-0300
+                 GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
 
 
                 (1 rows)
@@ -620,6 +620,7 @@
                 varintcol varint
             ) WITH bloom_filter_fp_chance = 0.01
                 AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
+                AND cdc = false
                 AND comment = ''
                 AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
                 AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
diff --git a/pylib/cqlshlib/tracing.py b/pylib/cqlshlib/tracing.py
index cea3568..26e228b 100644
--- a/pylib/cqlshlib/tracing.py
+++ b/pylib/cqlshlib/tracing.py
@@ -16,6 +16,7 @@
 
 from cqlshlib.displaying import MAGENTA
 from datetime import datetime, timedelta
+from formatting import CqlType
 import time
 from cassandra.query import QueryTrace, TraceUnavailable
 
@@ -42,7 +43,7 @@
     if not rows:
         shell.printerr("No rows for session %s found." % (trace.trace_id,))
         return
-    names = ['activity', 'timestamp', 'source', 'source_elapsed']
+    names = ['activity', 'timestamp', 'source', 'source_elapsed', 'client']
 
     formatted_names = map(shell.myformat_colname, names)
     formatted_values = [map(shell.myformat_value, row) for row in rows]
@@ -59,18 +60,19 @@
     if not trace.events:
         return []
 
-    rows = [[trace.request_type, str(datetime_from_utc_to_local(trace.started_at)), trace.coordinator, 0]]
+    rows = [[trace.request_type, str(datetime_from_utc_to_local(trace.started_at)), trace.coordinator, 0, trace.client]]
 
     # append main rows (from events table).
     for event in trace.events:
         rows.append(["%s [%s]" % (event.description, event.thread_name),
                      str(datetime_from_utc_to_local(event.datetime)),
                      event.source,
-                     total_micro_seconds(event.source_elapsed)])
+                     total_micro_seconds(event.source_elapsed),
+                     trace.client])
     # append footer row (from sessions table).
     if trace.duration:
         finished_at = (datetime_from_utc_to_local(trace.started_at) + trace.duration)
-        rows.append(['Request complete', str(finished_at), trace.coordinator, total_micro_seconds(trace.duration)])
+        rows.append(['Request complete', str(finished_at), trace.coordinator, total_micro_seconds(trace.duration), trace.client])
     else:
         finished_at = trace.duration = "--"
 
diff --git a/src/antlr/Cql.g b/src/antlr/Cql.g
new file mode 100644
index 0000000..61bdc43
--- /dev/null
+++ b/src/antlr/Cql.g
@@ -0,0 +1,122 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+grammar Cql;
+
+options {
+    language = Java;
+}
+
+import Parser,Lexer;
+
+@header {
+    package org.apache.cassandra.cql3;
+
+    import java.util.ArrayList;
+    import java.util.Arrays;
+    import java.util.Collections;
+    import java.util.EnumSet;
+    import java.util.HashSet;
+    import java.util.HashMap;
+    import java.util.LinkedHashMap;
+    import java.util.List;
+    import java.util.Map;
+    import java.util.Set;
+
+    import org.apache.cassandra.auth.*;
+    import org.apache.cassandra.config.ColumnDefinition;
+    import org.apache.cassandra.cql3.*;
+    import org.apache.cassandra.cql3.restrictions.CustomIndexExpression;
+    import org.apache.cassandra.cql3.statements.*;
+    import org.apache.cassandra.cql3.selection.*;
+    import org.apache.cassandra.cql3.functions.*;
+    import org.apache.cassandra.db.marshal.CollectionType;
+    import org.apache.cassandra.exceptions.ConfigurationException;
+    import org.apache.cassandra.exceptions.InvalidRequestException;
+    import org.apache.cassandra.exceptions.SyntaxException;
+    import org.apache.cassandra.utils.Pair;
+}
+
+@members {
+    public void addErrorListener(ErrorListener listener)
+    {
+        gParser.addErrorListener(listener);
+    }
+
+    public void removeErrorListener(ErrorListener listener)
+    {
+        gParser.removeErrorListener(listener);
+    }
+
+    public void displayRecognitionError(String[] tokenNames, RecognitionException e)
+    {
+        gParser.displayRecognitionError(tokenNames, e);
+    }
+
+    protected void addRecognitionError(String msg)
+    {
+        gParser.addRecognitionError(msg);
+    }
+}
+
+@lexer::header {
+    package org.apache.cassandra.cql3;
+
+    import org.apache.cassandra.exceptions.SyntaxException;
+}
+
+@lexer::members {
+    List<Token> tokens = new ArrayList<Token>();
+
+    public void emit(Token token)
+    {
+        state.token = token;
+        tokens.add(token);
+    }
+
+    public Token nextToken()
+    {
+        super.nextToken();
+        if (tokens.size() == 0)
+            return new CommonToken(Token.EOF);
+        return tokens.remove(0);
+    }
+
+    private final List<ErrorListener> listeners = new ArrayList<ErrorListener>();
+
+    public void addErrorListener(ErrorListener listener)
+    {
+        this.listeners.add(listener);
+    }
+
+    public void removeErrorListener(ErrorListener listener)
+    {
+        this.listeners.remove(listener);
+    }
+
+    public void displayRecognitionError(String[] tokenNames, RecognitionException e)
+    {
+        for (int i = 0, m = listeners.size(); i < m; i++)
+            listeners.get(i).syntaxError(this, tokenNames, e);
+    }
+}
+
+query returns [ParsedStatement stmnt]
+    : st=cqlStatement (';')* EOF { $stmnt = st; }
+    ;
diff --git a/src/antlr/Lexer.g b/src/antlr/Lexer.g
new file mode 100644
index 0000000..16b2ac4
--- /dev/null
+++ b/src/antlr/Lexer.g
@@ -0,0 +1,323 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+lexer grammar Lexer;
+
+@lexer::members {
+    List<Token> tokens = new ArrayList<Token>();
+
+    public void emit(Token token)
+    {
+        state.token = token;
+        tokens.add(token);
+    }
+
+    public Token nextToken()
+    {
+        super.nextToken();
+        if (tokens.size() == 0)
+            return new CommonToken(Token.EOF);
+        return tokens.remove(0);
+    }
+
+    private final List<ErrorListener> listeners = new ArrayList<ErrorListener>();
+
+    public void addErrorListener(ErrorListener listener)
+    {
+        this.listeners.add(listener);
+    }
+
+    public void removeErrorListener(ErrorListener listener)
+    {
+        this.listeners.remove(listener);
+    }
+
+    public void displayRecognitionError(String[] tokenNames, RecognitionException e)
+    {
+        for (int i = 0, m = listeners.size(); i < m; i++)
+            listeners.get(i).syntaxError(this, tokenNames, e);
+    }
+}
+
+// Case-insensitive keywords
+K_SELECT:      S E L E C T;
+K_FROM:        F R O M;
+K_AS:          A S;
+K_WHERE:       W H E R E;
+K_AND:         A N D;
+K_KEY:         K E Y;
+K_KEYS:        K E Y S;
+K_ENTRIES:     E N T R I E S;
+K_FULL:        F U L L;
+K_INSERT:      I N S E R T;
+K_UPDATE:      U P D A T E;
+K_WITH:        W I T H;
+K_LIMIT:       L I M I T;
+K_PER:         P E R;
+K_PARTITION:   P A R T I T I O N;
+K_USING:       U S I N G;
+K_USE:         U S E;
+K_DISTINCT:    D I S T I N C T;
+K_COUNT:       C O U N T;
+K_SET:         S E T;
+K_BEGIN:       B E G I N;
+K_UNLOGGED:    U N L O G G E D;
+K_BATCH:       B A T C H;
+K_APPLY:       A P P L Y;
+K_TRUNCATE:    T R U N C A T E;
+K_DELETE:      D E L E T E;
+K_IN:          I N;
+K_CREATE:      C R E A T E;
+K_KEYSPACE:    ( K E Y S P A C E
+                 | S C H E M A );
+K_KEYSPACES:   K E Y S P A C E S;
+K_COLUMNFAMILY:( C O L U M N F A M I L Y
+                 | T A B L E );
+K_MATERIALIZED:M A T E R I A L I Z E D;
+K_VIEW:        V I E W;
+K_INDEX:       I N D E X;
+K_CUSTOM:      C U S T O M;
+K_ON:          O N;
+K_TO:          T O;
+K_DROP:        D R O P;
+K_PRIMARY:     P R I M A R Y;
+K_INTO:        I N T O;
+K_VALUES:      V A L U E S;
+K_TIMESTAMP:   T I M E S T A M P;
+K_TTL:         T T L;
+K_CAST:        C A S T;
+K_ALTER:       A L T E R;
+K_RENAME:      R E N A M E;
+K_ADD:         A D D;
+K_TYPE:        T Y P E;
+K_COMPACT:     C O M P A C T;
+K_STORAGE:     S T O R A G E;
+K_ORDER:       O R D E R;
+K_BY:          B Y;
+K_ASC:         A S C;
+K_DESC:        D E S C;
+K_ALLOW:       A L L O W;
+K_FILTERING:   F I L T E R I N G;
+K_IF:          I F;
+K_IS:          I S;
+K_CONTAINS:    C O N T A I N S;
+
+K_GRANT:       G R A N T;
+K_ALL:         A L L;
+K_PERMISSION:  P E R M I S S I O N;
+K_PERMISSIONS: P E R M I S S I O N S;
+K_OF:          O F;
+K_REVOKE:      R E V O K E;
+K_MODIFY:      M O D I F Y;
+K_AUTHORIZE:   A U T H O R I Z E;
+K_DESCRIBE:    D E S C R I B E;
+K_EXECUTE:     E X E C U T E;
+K_NORECURSIVE: N O R E C U R S I V E;
+K_MBEAN:       M B E A N;
+K_MBEANS:      M B E A N S;
+
+K_USER:        U S E R;
+K_USERS:       U S E R S;
+K_ROLE:        R O L E;
+K_ROLES:       R O L E S;
+K_SUPERUSER:   S U P E R U S E R;
+K_NOSUPERUSER: N O S U P E R U S E R;
+K_PASSWORD:    P A S S W O R D;
+K_LOGIN:       L O G I N;
+K_NOLOGIN:     N O L O G I N;
+K_OPTIONS:     O P T I O N S;
+
+K_CLUSTERING:  C L U S T E R I N G;
+K_ASCII:       A S C I I;
+K_BIGINT:      B I G I N T;
+K_BLOB:        B L O B;
+K_BOOLEAN:     B O O L E A N;
+K_COUNTER:     C O U N T E R;
+K_DECIMAL:     D E C I M A L;
+K_DOUBLE:      D O U B L E;
+K_FLOAT:       F L O A T;
+K_INET:        I N E T;
+K_INT:         I N T;
+K_SMALLINT:    S M A L L I N T;
+K_TINYINT:     T I N Y I N T;
+K_TEXT:        T E X T;
+K_UUID:        U U I D;
+K_VARCHAR:     V A R C H A R;
+K_VARINT:      V A R I N T;
+K_TIMEUUID:    T I M E U U I D;
+K_TOKEN:       T O K E N;
+K_WRITETIME:   W R I T E T I M E;
+K_DATE:        D A T E;
+K_TIME:        T I M E;
+
+K_NULL:        N U L L;
+K_NOT:         N O T;
+K_EXISTS:      E X I S T S;
+
+K_MAP:         M A P;
+K_LIST:        L I S T;
+K_NAN:         N A N;
+K_INFINITY:    I N F I N I T Y;
+K_TUPLE:       T U P L E;
+
+K_TRIGGER:     T R I G G E R;
+K_STATIC:      S T A T I C;
+K_FROZEN:      F R O Z E N;
+
+K_FUNCTION:    F U N C T I O N;
+K_FUNCTIONS:   F U N C T I O N S;
+K_AGGREGATE:   A G G R E G A T E;
+K_SFUNC:       S F U N C;
+K_STYPE:       S T Y P E;
+K_FINALFUNC:   F I N A L F U N C;
+K_INITCOND:    I N I T C O N D;
+K_RETURNS:     R E T U R N S;
+K_CALLED:      C A L L E D;
+K_INPUT:       I N P U T;
+K_LANGUAGE:    L A N G U A G E;
+K_OR:          O R;
+K_REPLACE:     R E P L A C E;
+
+K_JSON:        J S O N;
+K_LIKE:        L I K E;
+
+// Case-insensitive alpha characters
+fragment A: ('a'|'A');
+fragment B: ('b'|'B');
+fragment C: ('c'|'C');
+fragment D: ('d'|'D');
+fragment E: ('e'|'E');
+fragment F: ('f'|'F');
+fragment G: ('g'|'G');
+fragment H: ('h'|'H');
+fragment I: ('i'|'I');
+fragment J: ('j'|'J');
+fragment K: ('k'|'K');
+fragment L: ('l'|'L');
+fragment M: ('m'|'M');
+fragment N: ('n'|'N');
+fragment O: ('o'|'O');
+fragment P: ('p'|'P');
+fragment Q: ('q'|'Q');
+fragment R: ('r'|'R');
+fragment S: ('s'|'S');
+fragment T: ('t'|'T');
+fragment U: ('u'|'U');
+fragment V: ('v'|'V');
+fragment W: ('w'|'W');
+fragment X: ('x'|'X');
+fragment Y: ('y'|'Y');
+fragment Z: ('z'|'Z');
+
+STRING_LITERAL
+    @init{
+        StringBuilder txt = new StringBuilder(); // temporary to build pg-style-string
+    }
+    @after{ setText(txt.toString()); }
+    :
+      /* pg-style string literal */
+      (
+        '\$' '\$'
+        ( /* collect all input until '$$' is reached again */
+          {  (input.size() - input.index() > 1)
+               && !"$$".equals(input.substring(input.index(), input.index() + 1)) }?
+             => c=. { txt.appendCodePoint(c); }
+        )*
+        '\$' '\$'
+      )
+      |
+      /* conventional quoted string literal */
+      (
+        '\'' (c=~('\'') { txt.appendCodePoint(c);} | '\'' '\'' { txt.appendCodePoint('\''); })* '\''
+      )
+    ;
+
+QUOTED_NAME
+    @init{ StringBuilder b = new StringBuilder(); }
+    @after{ setText(b.toString()); }
+    : '\"' (c=~('\"') { b.appendCodePoint(c); } | '\"' '\"' { b.appendCodePoint('\"'); })+ '\"'
+    ;
+
+fragment DIGIT
+    : '0'..'9'
+    ;
+
+fragment LETTER
+    : ('A'..'Z' | 'a'..'z')
+    ;
+
+fragment HEX
+    : ('A'..'F' | 'a'..'f' | '0'..'9')
+    ;
+
+fragment EXPONENT
+    : E ('+' | '-')? DIGIT+
+    ;
+
+INTEGER
+    : '-'? DIGIT+
+    ;
+
+QMARK
+    : '?'
+    ;
+
+/*
+ * Normally a lexer only emits one token at a time, but ours is tricked out
+ * to support multiple (see @lexer::members near the top of the grammar).
+ */
+FLOAT
+    : INTEGER EXPONENT
+    | INTEGER '.' DIGIT* EXPONENT?
+    ;
+
+/*
+ * This has to be before IDENT so it takes precedence over it.
+ */
+BOOLEAN
+    : T R U E | F A L S E
+    ;
+
+IDENT
+    : LETTER (LETTER | DIGIT | '_')*
+    ;
+
+HEXNUMBER
+    : '0' X HEX*
+    ;
+
+UUID
+    : HEX HEX HEX HEX HEX HEX HEX HEX '-'
+      HEX HEX HEX HEX '-'
+      HEX HEX HEX HEX '-'
+      HEX HEX HEX HEX '-'
+      HEX HEX HEX HEX HEX HEX HEX HEX HEX HEX HEX HEX
+    ;
+
+WS
+    : (' ' | '\t' | '\n' | '\r')+ { $channel = HIDDEN; }
+    ;
+
+COMMENT
+    : ('--' | '//') .* ('\n'|'\r') { $channel = HIDDEN; }
+    ;
+
+MULTILINE_COMMENT
+    : '/*' .* '*/' { $channel = HIDDEN; }
+    ;
diff --git a/src/java/org/apache/cassandra/cql3/Cql.g b/src/antlr/Parser.g
similarity index 81%
rename from src/java/org/apache/cassandra/cql3/Cql.g
rename to src/antlr/Parser.g
index 453a03d..f61f464 100644
--- a/src/java/org/apache/cassandra/cql3/Cql.g
+++ b/src/antlr/Parser.g
@@ -17,42 +17,15 @@
  * under the License.
  */
 
-grammar Cql;
+parser grammar Parser;
 
 options {
     language = Java;
 }
 
-@header {
-    package org.apache.cassandra.cql3;
-
-    import java.util.ArrayList;
-    import java.util.Arrays;
-    import java.util.Collections;
-    import java.util.EnumSet;
-    import java.util.HashSet;
-    import java.util.HashMap;
-    import java.util.LinkedHashMap;
-    import java.util.List;
-    import java.util.Map;
-    import java.util.Set;
-
-    import org.apache.cassandra.auth.*;
-    import org.apache.cassandra.cql3.*;
-    import org.apache.cassandra.cql3.restrictions.CustomIndexExpression;
-    import org.apache.cassandra.cql3.statements.*;
-    import org.apache.cassandra.cql3.selection.*;
-    import org.apache.cassandra.cql3.functions.*;
-    import org.apache.cassandra.db.marshal.CollectionType;
-    import org.apache.cassandra.exceptions.ConfigurationException;
-    import org.apache.cassandra.exceptions.InvalidRequestException;
-    import org.apache.cassandra.exceptions.SyntaxException;
-    import org.apache.cassandra.utils.Pair;
-}
-
 @members {
     private final List<ErrorListener> listeners = new ArrayList<ErrorListener>();
-    private final List<ColumnIdentifier> bindVariables = new ArrayList<ColumnIdentifier>();
+    protected final List<ColumnIdentifier> bindVariables = new ArrayList<ColumnIdentifier>();
 
     public static final Set<String> reservedTypeNames = new HashSet<String>()
     {{
@@ -116,7 +89,7 @@
             listeners.get(i).syntaxError(this, tokenNames, e);
     }
 
-    private void addRecognitionError(String msg)
+    protected void addRecognitionError(String msg)
     {
         for (int i = 0, m = listeners.size(); i < m; i++)
             listeners.get(i).syntaxError(this, msg);
@@ -127,7 +100,7 @@
         if (map == null || map.entries == null || map.entries.isEmpty())
             return Collections.<String, String>emptyMap();
 
-        Map<String, String> res = new HashMap<String, String>(map.entries.size());
+        Map<String, String> res = new HashMap<>(map.entries.size());
 
         for (Pair<Term.Raw, Term.Raw> entry : map.entries)
         {
@@ -160,9 +133,9 @@
         return res;
     }
 
-    public void addRawUpdate(List<Pair<ColumnIdentifier.Raw, Operation.RawUpdate>> operations, ColumnIdentifier.Raw key, Operation.RawUpdate update)
+    public void addRawUpdate(List<Pair<ColumnDefinition.Raw, Operation.RawUpdate>> operations, ColumnDefinition.Raw key, Operation.RawUpdate update)
     {
-        for (Pair<ColumnIdentifier.Raw, Operation.RawUpdate> p : operations)
+        for (Pair<ColumnDefinition.Raw, Operation.RawUpdate> p : operations)
         {
             if (p.left.equals(key) && !p.right.isCompatibleWith(update))
                 addRecognitionError("Multiple incompatible setting of column " + key);
@@ -182,56 +155,36 @@
 
         return filtered;
     }
-}
 
-@lexer::header {
-    package org.apache.cassandra.cql3;
-
-    import org.apache.cassandra.exceptions.SyntaxException;
-}
-
-@lexer::members {
-    List<Token> tokens = new ArrayList<Token>();
-
-    public void emit(Token token)
+    public String canonicalizeObjectName(String s, boolean enforcePattern)
     {
-        state.token = token;
-        tokens.add(token);
+        // these two conditions are here because technically they are valid
+        // ObjectNames, but we want to restrict their use without adding unnecessary
+        // work to JMXResource construction as that also happens on hotter code paths
+        if ("".equals(s))
+            addRecognitionError("Empty JMX object name supplied");
+
+        if ("*:*".equals(s))
+            addRecognitionError("Please use ALL MBEANS instead of wildcard pattern");
+
+        try
+        {
+            javax.management.ObjectName objectName = javax.management.ObjectName.getInstance(s);
+            if (enforcePattern && !objectName.isPattern())
+                addRecognitionError("Plural form used, but non-pattern JMX object name specified (" + s + ")");
+            return objectName.getCanonicalName();
+        }
+        catch (javax.management.MalformedObjectNameException e)
+        {
+          addRecognitionError(s + " is not a valid JMX object name");
+          return s;
+        }
     }
 
-    public Token nextToken()
-    {
-        super.nextToken();
-        if (tokens.size() == 0)
-            return new CommonToken(Token.EOF);
-        return tokens.remove(0);
-    }
-
-    private final List<ErrorListener> listeners = new ArrayList<ErrorListener>();
-
-    public void addErrorListener(ErrorListener listener)
-    {
-        this.listeners.add(listener);
-    }
-
-    public void removeErrorListener(ErrorListener listener)
-    {
-        this.listeners.remove(listener);
-    }
-
-    public void displayRecognitionError(String[] tokenNames, RecognitionException e)
-    {
-        for (int i = 0, m = listeners.size(); i < m; i++)
-            listeners.get(i).syntaxError(this, tokenNames, e);
-    }
 }
 
 /** STATEMENTS **/
 
-query returns [ParsedStatement stmnt]
-    : st=cqlStatement (';')* EOF { $stmnt = st; }
-    ;
-
 cqlStatement returns [ParsedStatement stmt]
     @after{ if (stmt != null) stmt.setBoundVariables(bindVariables); }
     : st1= selectStatement                 { $stmt = st1; }
@@ -293,16 +246,18 @@
     @init {
         boolean isDistinct = false;
         Term.Raw limit = null;
-        Map<ColumnIdentifier.Raw, Boolean> orderings = new LinkedHashMap<ColumnIdentifier.Raw, Boolean>();
+        Term.Raw perPartitionLimit = null;
+        Map<ColumnDefinition.Raw, Boolean> orderings = new LinkedHashMap<>();
         boolean allowFiltering = false;
         boolean isJson = false;
     }
-    : K_SELECT 
+    : K_SELECT
       ( K_JSON { isJson = true; } )?
       ( ( K_DISTINCT { isDistinct = true; } )? sclause=selectClause )
       K_FROM cf=columnFamilyName
       ( K_WHERE wclause=whereClause )?
       ( K_ORDER K_BY orderByClause[orderings] ( ',' orderByClause[orderings] )* )?
+      ( K_PER K_PARTITION K_LIMIT rows=intValue { perPartitionLimit = rows; } )?
       ( K_LIMIT rows=intValue { limit = rows; } )?
       ( K_ALLOW K_FILTERING  { allowFiltering = true; } )?
       {
@@ -311,7 +266,7 @@
                                                                              allowFiltering,
                                                                              isJson);
           WhereClause where = wclause == null ? WhereClause.empty() : wclause.build();
-          $expr = new SelectStatement.RawStatement(cf, params, sclause, where, limit);
+          $expr = new SelectStatement.RawStatement(cf, params, sclause, where, limit, perPartitionLimit);
       }
     ;
 
@@ -325,14 +280,21 @@
     : us=unaliasedSelector (K_AS c=noncol_ident { alias = c; })? { $s = new RawSelector(us, alias); }
     ;
 
+/*
+ * A single selection. The core of it is selecting a column, but we also allow any term and function, as well as
+ * sub-element selection for UDT.
+ */
 unaliasedSelector returns [Selectable.Raw s]
     @init { Selectable.Raw tmp = null; }
     :  ( c=cident                                  { tmp = c; }
-       | K_COUNT '(' countArgument ')'             { tmp = new Selectable.WithFunction.Raw(FunctionName.nativeFunction("countRows"), Collections.<Selectable.Raw>emptyList());}
+       | v=value                                   { tmp = new Selectable.WithTerm.Raw(v); }
+       | '(' ct=comparatorType ')' v=value         { tmp = new Selectable.WithTerm.Raw(new TypeCast(ct, v)); }
+       | K_COUNT '(' '\*' ')'                      { tmp = Selectable.WithFunction.Raw.newCountRowsFunction(); }
        | K_WRITETIME '(' c=cident ')'              { tmp = new Selectable.WritetimeOrTTL.Raw(c, true); }
        | K_TTL       '(' c=cident ')'              { tmp = new Selectable.WritetimeOrTTL.Raw(c, false); }
+       | K_CAST      '(' sn=unaliasedSelector K_AS t=native_type ')' {tmp = new Selectable.WithCast.Raw(sn, t);}
        | f=functionName args=selectionFunctionArgs { tmp = new Selectable.WithFunction.Raw(f, args); }
-       ) ( '.' fi=cident { tmp = new Selectable.WithFieldSelection.Raw(tmp, fi); } )* { $s = tmp; }
+       ) ( '.' fi=fident { tmp = new Selectable.WithFieldSelection.Raw(tmp, fi); } )* { $s = tmp; }
     ;
 
 selectionFunctionArgs returns [List<Selectable.Raw> a]
@@ -342,11 +304,6 @@
       ')' { $a = args; }
     ;
 
-countArgument
-    : '\*'
-    | i=INTEGER { if (!i.getText().equals("1")) addRecognitionError("Only COUNT(1) is supported, got COUNT(" + i.getText() + ")");}
-    ;
-
 whereClause returns [WhereClause.Builder clause]
     @init{ $clause = new WhereClause.Builder(); }
     : relationOrExpression[$clause] (K_AND relationOrExpression[$clause])*
@@ -362,7 +319,7 @@
     : 'expr(' idxName[name] ',' t=term ')' { clause.add(new CustomIndexExpression(name, t));}
     ;
 
-orderByClause[Map<ColumnIdentifier.Raw, Boolean> orderings]
+orderByClause[Map<ColumnDefinition.Raw, Boolean> orderings]
     @init{
         boolean reversed = false;
     }
@@ -384,8 +341,8 @@
 normalInsertStatement [CFName cf] returns [UpdateStatement.ParsedInsert expr]
     @init {
         Attributes.Raw attrs = new Attributes.Raw();
-        List<ColumnIdentifier.Raw> columnNames  = new ArrayList<ColumnIdentifier.Raw>();
-        List<Term.Raw> values = new ArrayList<Term.Raw>();
+        List<ColumnDefinition.Raw> columnNames  = new ArrayList<>();
+        List<Term.Raw> values = new ArrayList<>();
         boolean ifNotExists = false;
     }
     : '(' c1=cident { columnNames.add(c1); }  ( ',' cn=cident { columnNames.add(cn); } )* ')'
@@ -437,7 +394,7 @@
 updateStatement returns [UpdateStatement.ParsedUpdate expr]
     @init {
         Attributes.Raw attrs = new Attributes.Raw();
-        List<Pair<ColumnIdentifier.Raw, Operation.RawUpdate>> operations = new ArrayList<Pair<ColumnIdentifier.Raw, Operation.RawUpdate>>();
+        List<Pair<ColumnDefinition.Raw, Operation.RawUpdate>> operations = new ArrayList<>();
         boolean ifExists = false;
     }
     : K_UPDATE cf=columnFamilyName
@@ -450,13 +407,13 @@
                                                   attrs,
                                                   operations,
                                                   wclause.build(),
-                                                  conditions == null ? Collections.<Pair<ColumnIdentifier.Raw, ColumnCondition.Raw>>emptyList() : conditions,
+                                                  conditions == null ? Collections.<Pair<ColumnDefinition.Raw, ColumnCondition.Raw>>emptyList() : conditions,
                                                   ifExists);
      }
     ;
 
-updateConditions returns [List<Pair<ColumnIdentifier.Raw, ColumnCondition.Raw>> conditions]
-    @init { conditions = new ArrayList<Pair<ColumnIdentifier.Raw, ColumnCondition.Raw>>(); }
+updateConditions returns [List<Pair<ColumnDefinition.Raw, ColumnCondition.Raw>> conditions]
+    @init { conditions = new ArrayList<Pair<ColumnDefinition.Raw, ColumnCondition.Raw>>(); }
     : columnCondition[conditions] ( K_AND columnCondition[conditions] )*
     ;
 
@@ -484,7 +441,7 @@
                                             attrs,
                                             columnDeletions,
                                             wclause.build(),
-                                            conditions == null ? Collections.<Pair<ColumnIdentifier.Raw, ColumnCondition.Raw>>emptyList() : conditions,
+                                            conditions == null ? Collections.<Pair<ColumnDefinition.Raw, ColumnCondition.Raw>>emptyList() : conditions,
                                             ifExists);
       }
     ;
@@ -498,6 +455,7 @@
 deleteOp returns [Operation.RawDeletion op]
     : c=cident                { $op = new Operation.ColumnDeletion(c); }
     | c=cident '[' t=term ']' { $op = new Operation.ElementDeletion(c, t); }
+    | c=cident '.' field=fident { $op = new Operation.FieldDeletion(c, field); }
     ;
 
 usingClauseDelete[Attributes.Raw attrs]
@@ -716,7 +674,7 @@
     ;
 
 typeColumns[CreateTypeStatement expr]
-    : k=noncol_ident v=comparatorType { $expr.addDefinition(k, v); }
+    : k=fident v=comparatorType { $expr.addDefinition(k, v); }
     ;
 
 
@@ -757,8 +715,8 @@
 createMaterializedViewStatement returns [CreateViewStatement expr]
     @init {
         boolean ifNotExists = false;
-        List<ColumnIdentifier.Raw> partitionKeys = new ArrayList<>();
-        List<ColumnIdentifier.Raw> compositeKeys = new ArrayList<>();
+        List<ColumnDefinition.Raw> partitionKeys = new ArrayList<>();
+        List<ColumnDefinition.Raw> compositeKeys = new ArrayList<>();
     }
     : K_CREATE K_MATERIALIZED K_VIEW (K_IF K_NOT K_EXISTS { ifNotExists = true; })? cf=columnFamilyName K_AS
         K_SELECT sclause=selectClause K_FROM basecf=columnFamilyName
@@ -783,7 +741,7 @@
     }
     : K_CREATE K_TRIGGER (K_IF K_NOT K_EXISTS { ifNotExists = true; } )? (name=cident)
         K_ON cf=columnFamilyName K_USING cls=STRING_LITERAL
-      { $expr = new CreateTriggerStatement(cf, name.toString(), $cls.text, ifNotExists); }
+      { $expr = new CreateTriggerStatement(cf, name.rawText(), $cls.text, ifNotExists); }
     ;
 
 /**
@@ -792,7 +750,7 @@
 dropTriggerStatement returns [DropTriggerStatement expr]
      @init { boolean ifExists = false; }
     : K_DROP K_TRIGGER (K_IF K_EXISTS { ifExists = true; } )? (name=cident) K_ON cf=columnFamilyName
-      { $expr = new DropTriggerStatement(cf, name.toString(), ifExists); }
+      { $expr = new DropTriggerStatement(cf, name.rawText(), ifExists); }
     ;
 
 /**
@@ -804,11 +762,10 @@
         K_WITH properties[attrs] { $expr = new AlterKeyspaceStatement(ks, attrs); }
     ;
 
-
 /**
  * ALTER COLUMN FAMILY <CF> ALTER <column> TYPE <newtype>;
- * ALTER COLUMN FAMILY <CF> ADD <column> <newtype>;
- * ALTER COLUMN FAMILY <CF> DROP <column>;
+ * ALTER COLUMN FAMILY <CF> ADD <column> <newtype>; | ALTER COLUMN FAMILY <CF> ADD (<column> <newtype>, <column1> <newtype1>, ... <column n> <newtype n>)
+ * ALTER COLUMN FAMILY <CF> DROP <column>; | ALTER COLUMN FAMILY <CF> DROP (<column>, <column1>, ... <column n>)
  * ALTER COLUMN FAMILY <CF> WITH <property> = <value>;
  * ALTER COLUMN FAMILY <CF> RENAME <column> TO <column>;
  */
@@ -816,20 +773,32 @@
     @init {
         AlterTableStatement.Type type = null;
         TableAttributes attrs = new TableAttributes();
-        Map<ColumnIdentifier.Raw, ColumnIdentifier.Raw> renames = new HashMap<ColumnIdentifier.Raw, ColumnIdentifier.Raw>();
-        boolean isStatic = false;
+        Map<ColumnDefinition.Raw, ColumnDefinition.Raw> renames = new HashMap<ColumnDefinition.Raw, ColumnDefinition.Raw>();
+        List<AlterTableStatementColumn> colNameList = new ArrayList<AlterTableStatementColumn>();
     }
     : K_ALTER K_COLUMNFAMILY cf=columnFamilyName
-          ( K_ALTER id=cident K_TYPE v=comparatorType { type = AlterTableStatement.Type.ALTER; }
-          | K_ADD   id=cident v=comparatorType ({ isStatic=true; } K_STATIC)? { type = AlterTableStatement.Type.ADD; }
-          | K_DROP  id=cident                         { type = AlterTableStatement.Type.DROP; }
+          ( K_ALTER id=cident  K_TYPE v=comparatorType  { type = AlterTableStatement.Type.ALTER; } { colNameList.add(new AlterTableStatementColumn(id,v)); }
+          | K_ADD  (        (id=cident   v=comparatorType   b1=cfisStatic { colNameList.add(new AlterTableStatementColumn(id,v,b1)); })
+                     | ('('  id1=cident  v1=comparatorType  b1=cfisStatic { colNameList.add(new AlterTableStatementColumn(id1,v1,b1)); }
+                       ( ',' idn=cident  vn=comparatorType  bn=cfisStatic { colNameList.add(new AlterTableStatementColumn(idn,vn,bn)); } )* ')' ) ) { type = AlterTableStatement.Type.ADD; }
+          | K_DROP (         id=cident  { colNameList.add(new AlterTableStatementColumn(id)); }
+                     | ('('  id1=cident { colNameList.add(new AlterTableStatementColumn(id1)); }
+                       ( ',' idn=cident { colNameList.add(new AlterTableStatementColumn(idn)); } )* ')') ) { type = AlterTableStatement.Type.DROP; }
           | K_WITH  properties[attrs]                 { type = AlterTableStatement.Type.OPTS; }
           | K_RENAME                                  { type = AlterTableStatement.Type.RENAME; }
                id1=cident K_TO toId1=cident { renames.put(id1, toId1); }
                ( K_AND idn=cident K_TO toIdn=cident { renames.put(idn, toIdn); } )*
           )
     {
-        $expr = new AlterTableStatement(cf, type, id, v, attrs, renames, isStatic);
+        $expr = new AlterTableStatement(cf, type, colNameList, attrs, renames);
+    }
+    ;
+
+cfisStatic returns [boolean isStaticColumn]
+    @init{
+        boolean isStatic = false;
+    }
+    : (K_STATIC { isStatic=true; })? { $isStaticColumn = isStatic;
     }
     ;
 
@@ -843,7 +812,7 @@
         $expr = new AlterViewStatement(name, attrs);
     }
     ;
-    
+
 
 /**
  * ALTER TYPE <name> ALTER <field> TYPE <newtype>;
@@ -852,12 +821,12 @@
  */
 alterTypeStatement returns [AlterTypeStatement expr]
     : K_ALTER K_TYPE name=userTypeName
-          ( K_ALTER f=noncol_ident K_TYPE v=comparatorType { $expr = AlterTypeStatement.alter(name, f, v); }
-          | K_ADD   f=noncol_ident v=comparatorType        { $expr = AlterTypeStatement.addition(name, f, v); }
+          ( K_ALTER f=fident K_TYPE v=comparatorType { $expr = AlterTypeStatement.alter(name, f, v); }
+          | K_ADD   f=fident v=comparatorType        { $expr = AlterTypeStatement.addition(name, f, v); }
           | K_RENAME
-               { Map<ColumnIdentifier, ColumnIdentifier> renames = new HashMap<ColumnIdentifier, ColumnIdentifier>(); }
-                 id1=noncol_ident K_TO toId1=noncol_ident { renames.put(id1, toId1); }
-                 ( K_AND idn=noncol_ident K_TO toIdn=noncol_ident { renames.put(idn, toIdn); } )*
+               { Map<FieldIdentifier, FieldIdentifier> renames = new HashMap<>(); }
+                 id1=fident K_TO toId1=fident { renames.put(id1, toId1); }
+                 ( K_AND idn=fident K_TO toIdn=fident { renames.put(idn, toIdn); } )*
                { $expr = AlterTypeStatement.renames(name, renames); }
           )
     ;
@@ -988,6 +957,7 @@
     : d=dataResource { $res = $d.res; }
     | r=roleResource { $res = $r.res; }
     | f=functionResource { $res = $f.res; }
+    | j=jmxResource { $res = $j.res; }
     ;
 
 dataResource returns [DataResource res]
@@ -997,6 +967,14 @@
       { $res = DataResource.table($cf.name.getKeyspace(), $cf.name.getColumnFamily()); }
     ;
 
+jmxResource returns [JMXResource res]
+    : K_ALL K_MBEANS { $res = JMXResource.root(); }
+    // when a bean name (or pattern) is supplied, validate that it's a legal ObjectName
+    // also, just to be picky, if the "MBEANS" form is used, only allow pattern-style names
+    | K_MBEAN mbean { $res = JMXResource.mbean(canonicalizeObjectName($mbean.text, false)); }
+    | K_MBEANS mbean { $res = JMXResource.mbean(canonicalizeObjectName($mbean.text, true)); }
+    ;
+
 roleResource returns [RoleResource res]
     : K_ALL K_ROLES { $res = RoleResource.root(); }
     | K_ROLE role = userOrRoleName { $res = RoleResource.role($role.name.getName()); }
@@ -1166,10 +1144,10 @@
 // Column Identifiers.  These need to be treated differently from other
 // identifiers because the underlying comparator is not necessarily text. See
 // CASSANDRA-8178 for details.
-cident returns [ColumnIdentifier.Raw id]
-    : t=IDENT              { $id = new ColumnIdentifier.Literal($t.text, false); }
-    | t=QUOTED_NAME        { $id = new ColumnIdentifier.Literal($t.text, true); }
-    | k=unreserved_keyword { $id = new ColumnIdentifier.Literal(k, false); }
+cident returns [ColumnDefinition.Raw id]
+    : t=IDENT              { $id = ColumnDefinition.Raw.forUnquoted($t.text); }
+    | t=QUOTED_NAME        { $id = ColumnDefinition.Raw.forQuoted($t.text); }
+    | k=unreserved_keyword { $id = ColumnDefinition.Raw.forUnquoted(k); }
     ;
 
 // Column identifiers where the comparator is known to be text
@@ -1179,6 +1157,12 @@
     | k=unreserved_keyword { $id = ColumnIdentifier.getInterned(k, false); }
     ;
 
+fident returns [FieldIdentifier id]
+    : t=IDENT              { $id = FieldIdentifier.forUnquoted($t.text); }
+    | t=QUOTED_NAME        { $id = FieldIdentifier.forQuoted($t.text); }
+    | k=unreserved_keyword { $id = FieldIdentifier.forUnquoted(k); }
+    ;
+
 // Identifiers that do not refer to columns
 noncol_ident returns [ColumnIdentifier id]
     : t=IDENT              { $id = new ColumnIdentifier($t.text, false); }
@@ -1276,10 +1260,10 @@
     ;
 
 usertypeLiteral returns [UserTypes.Literal ut]
-    @init{ Map<ColumnIdentifier, Term.Raw> m = new HashMap<ColumnIdentifier, Term.Raw>(); }
+    @init{ Map<FieldIdentifier, Term.Raw> m = new HashMap<>(); }
     @after{ $ut = new UserTypes.Literal(m); }
     // We don't allow empty literals because that conflicts with sets/maps and is currently useless since we don't allow empty user types
-    : '{' k1=noncol_ident ':' v1=term { m.put(k1, v1); } ( ',' kn=noncol_ident ':' vn=term { m.put(kn, vn); } )* '}'
+    : '{' k1=fident ':' v1=term { m.put(k1, v1); } ( ',' kn=fident ':' vn=term { m.put(kn, vn); } )* '}'
     ;
 
 tupleLiteral returns [Tuples.Literal tt]
@@ -1333,16 +1317,17 @@
     | '(' c=comparatorType ')' t=term  { $term = new TypeCast(c, t); }
     ;
 
-columnOperation[List<Pair<ColumnIdentifier.Raw, Operation.RawUpdate>> operations]
+columnOperation[List<Pair<ColumnDefinition.Raw, Operation.RawUpdate>> operations]
     : key=cident columnOperationDifferentiator[operations, key]
     ;
 
-columnOperationDifferentiator[List<Pair<ColumnIdentifier.Raw, Operation.RawUpdate>> operations, ColumnIdentifier.Raw key]
+columnOperationDifferentiator[List<Pair<ColumnDefinition.Raw, Operation.RawUpdate>> operations, ColumnDefinition.Raw key]
     : '=' normalColumnOperation[operations, key]
-    | '[' k=term ']' specializedColumnOperation[operations, key, k]
+    | '[' k=term ']' collectionColumnOperation[operations, key, k]
+    | '.' field=fident udtColumnOperation[operations, key, field]
     ;
 
-normalColumnOperation[List<Pair<ColumnIdentifier.Raw, Operation.RawUpdate>> operations, ColumnIdentifier.Raw key]
+normalColumnOperation[List<Pair<ColumnDefinition.Raw, Operation.RawUpdate>> operations, ColumnDefinition.Raw key]
     : t=term ('+' c=cident )?
       {
           if (c == null)
@@ -1372,14 +1357,21 @@
       }
     ;
 
-specializedColumnOperation[List<Pair<ColumnIdentifier.Raw, Operation.RawUpdate>> operations, ColumnIdentifier.Raw key, Term.Raw k]
+collectionColumnOperation[List<Pair<ColumnDefinition.Raw, Operation.RawUpdate>> operations, ColumnDefinition.Raw key, Term.Raw k]
     : '=' t=term
       {
           addRawUpdate(operations, key, new Operation.SetElement(k, t));
       }
     ;
 
-columnCondition[List<Pair<ColumnIdentifier.Raw, ColumnCondition.Raw>> conditions]
+udtColumnOperation[List<Pair<ColumnDefinition.Raw, Operation.RawUpdate>> operations, ColumnDefinition.Raw key, FieldIdentifier field]
+    : '=' t=term
+      {
+          addRawUpdate(operations, key, new Operation.SetField(field, t));
+      }
+    ;
+
+columnCondition[List<Pair<ColumnDefinition.Raw, ColumnCondition.Raw>> conditions]
     // Note: we'll reject duplicates later
     : key=cident
         ( op=relationType t=term { conditions.add(Pair.create(key, ColumnCondition.Raw.simpleCondition(t, op))); }
@@ -1394,6 +1386,13 @@
                 | marker=inMarker { conditions.add(Pair.create(key, ColumnCondition.Raw.collectionInCondition(element, marker))); }
                 )
             )
+        | '.' field=fident
+            ( op=relationType t=term { conditions.add(Pair.create(key, ColumnCondition.Raw.udtFieldCondition(t, field, op))); }
+            | K_IN
+                ( values=singleColumnInValues { conditions.add(Pair.create(key, ColumnCondition.Raw.udtFieldInCondition(field, values))); }
+                | marker=inMarker { conditions.add(Pair.create(key, ColumnCondition.Raw.udtFieldInCondition(field, marker))); }
+                )
+            )
         )
     ;
 
@@ -1422,6 +1421,7 @@
 
 relation[WhereClause.Builder clauses]
     : name=cident type=relationType t=term { $clauses.add(new SingleColumnRelation(name, type, t)); }
+    | name=cident K_LIKE t=term { $clauses.add(new SingleColumnRelation(name, Operator.LIKE, t)); }
     | name=cident K_IS K_NOT K_NULL { $clauses.add(new SingleColumnRelation(name, Operator.IS_NOT, Constants.NULL_LITERAL)); }
     | K_TOKEN l=tupleOfIdentifiers type=relationType t=term
         { $clauses.add(new TokenRelation(l, type, t)); }
@@ -1460,8 +1460,8 @@
     | ':' name=noncol_ident { $marker = newINBindVariables(name); }
     ;
 
-tupleOfIdentifiers returns [List<ColumnIdentifier.Raw> ids]
-    @init { $ids = new ArrayList<ColumnIdentifier.Raw>(); }
+tupleOfIdentifiers returns [List<ColumnDefinition.Raw> ids]
+    @init { $ids = new ArrayList<ColumnDefinition.Raw>(); }
     : '(' n1=cident { $ids.add(n1); } (',' ni=cident { $ids.add(ni); })* ')'
     ;
 
@@ -1563,6 +1563,10 @@
     | QUOTED_NAME { addRecognitionError("Quoted strings are are not supported for user names and USER is deprecated, please use ROLE");}
     ;
 
+mbean
+    : STRING_LITERAL
+    ;
+
 // Basically the same as cident, but we need to exlude existing CQL3 types
 // (which for some reason are not reserved otherwise)
 non_type_ident returns [ColumnIdentifier id]
@@ -1574,7 +1578,7 @@
 
 unreserved_keyword returns [String str]
     : u=unreserved_function_keyword     { $str = u; }
-    | k=(K_TTL | K_COUNT | K_WRITETIME | K_KEY) { $str = $k.text; }
+    | k=(K_TTL | K_COUNT | K_WRITETIME | K_KEY | K_CAST | K_JSON | K_DISTINCT) { $str = $k.text; }
     ;
 
 unreserved_function_keyword returns [String str]
@@ -1610,7 +1614,6 @@
         | K_EXISTS
         | K_CUSTOM
         | K_TRIGGER
-        | K_DISTINCT
         | K_CONTAINS
         | K_STATIC
         | K_FROZEN
@@ -1624,269 +1627,10 @@
         | K_INITCOND
         | K_RETURNS
         | K_LANGUAGE
-        | K_JSON
         | K_CALLED
         | K_INPUT
+        | K_LIKE
+        | K_PER
+        | K_PARTITION
         ) { $str = $k.text; }
     ;
-
-// Case-insensitive keywords
-K_SELECT:      S E L E C T;
-K_FROM:        F R O M;
-K_AS:          A S;
-K_WHERE:       W H E R E;
-K_AND:         A N D;
-K_KEY:         K E Y;
-K_KEYS:        K E Y S;
-K_ENTRIES:     E N T R I E S;
-K_FULL:        F U L L;
-K_INSERT:      I N S E R T;
-K_UPDATE:      U P D A T E;
-K_WITH:        W I T H;
-K_LIMIT:       L I M I T;
-K_USING:       U S I N G;
-K_USE:         U S E;
-K_DISTINCT:    D I S T I N C T;
-K_COUNT:       C O U N T;
-K_SET:         S E T;
-K_BEGIN:       B E G I N;
-K_UNLOGGED:    U N L O G G E D;
-K_BATCH:       B A T C H;
-K_APPLY:       A P P L Y;
-K_TRUNCATE:    T R U N C A T E;
-K_DELETE:      D E L E T E;
-K_IN:          I N;
-K_CREATE:      C R E A T E;
-K_KEYSPACE:    ( K E Y S P A C E
-                 | S C H E M A );
-K_KEYSPACES:   K E Y S P A C E S;
-K_COLUMNFAMILY:( C O L U M N F A M I L Y
-                 | T A B L E );
-K_MATERIALIZED:M A T E R I A L I Z E D;
-K_VIEW:        V I E W;
-K_INDEX:       I N D E X;
-K_CUSTOM:      C U S T O M;
-K_ON:          O N;
-K_TO:          T O;
-K_DROP:        D R O P;
-K_PRIMARY:     P R I M A R Y;
-K_INTO:        I N T O;
-K_VALUES:      V A L U E S;
-K_TIMESTAMP:   T I M E S T A M P;
-K_TTL:         T T L;
-K_ALTER:       A L T E R;
-K_RENAME:      R E N A M E;
-K_ADD:         A D D;
-K_TYPE:        T Y P E;
-K_COMPACT:     C O M P A C T;
-K_STORAGE:     S T O R A G E;
-K_ORDER:       O R D E R;
-K_BY:          B Y;
-K_ASC:         A S C;
-K_DESC:        D E S C;
-K_ALLOW:       A L L O W;
-K_FILTERING:   F I L T E R I N G;
-K_IF:          I F;
-K_IS:          I S;
-K_CONTAINS:    C O N T A I N S;
-
-K_GRANT:       G R A N T;
-K_ALL:         A L L;
-K_PERMISSION:  P E R M I S S I O N;
-K_PERMISSIONS: P E R M I S S I O N S;
-K_OF:          O F;
-K_REVOKE:      R E V O K E;
-K_MODIFY:      M O D I F Y;
-K_AUTHORIZE:   A U T H O R I Z E;
-K_DESCRIBE:    D E S C R I B E;
-K_EXECUTE:     E X E C U T E;
-K_NORECURSIVE: N O R E C U R S I V E;
-
-K_USER:        U S E R;
-K_USERS:       U S E R S;
-K_ROLE:        R O L E;
-K_ROLES:       R O L E S;
-K_SUPERUSER:   S U P E R U S E R;
-K_NOSUPERUSER: N O S U P E R U S E R;
-K_PASSWORD:    P A S S W O R D;
-K_LOGIN:       L O G I N;
-K_NOLOGIN:     N O L O G I N;
-K_OPTIONS:     O P T I O N S;
-
-K_CLUSTERING:  C L U S T E R I N G;
-K_ASCII:       A S C I I;
-K_BIGINT:      B I G I N T;
-K_BLOB:        B L O B;
-K_BOOLEAN:     B O O L E A N;
-K_COUNTER:     C O U N T E R;
-K_DECIMAL:     D E C I M A L;
-K_DOUBLE:      D O U B L E;
-K_FLOAT:       F L O A T;
-K_INET:        I N E T;
-K_INT:         I N T;
-K_SMALLINT:    S M A L L I N T;
-K_TINYINT:     T I N Y I N T;
-K_TEXT:        T E X T;
-K_UUID:        U U I D;
-K_VARCHAR:     V A R C H A R;
-K_VARINT:      V A R I N T;
-K_TIMEUUID:    T I M E U U I D;
-K_TOKEN:       T O K E N;
-K_WRITETIME:   W R I T E T I M E;
-K_DATE:        D A T E;
-K_TIME:        T I M E;
-
-K_NULL:        N U L L;
-K_NOT:         N O T;
-K_EXISTS:      E X I S T S;
-
-K_MAP:         M A P;
-K_LIST:        L I S T;
-K_NAN:         N A N;
-K_INFINITY:    I N F I N I T Y;
-K_TUPLE:       T U P L E;
-
-K_TRIGGER:     T R I G G E R;
-K_STATIC:      S T A T I C;
-K_FROZEN:      F R O Z E N;
-
-K_FUNCTION:    F U N C T I O N;
-K_FUNCTIONS:   F U N C T I O N S;
-K_AGGREGATE:   A G G R E G A T E;
-K_SFUNC:       S F U N C;
-K_STYPE:       S T Y P E;
-K_FINALFUNC:   F I N A L F U N C;
-K_INITCOND:    I N I T C O N D;
-K_RETURNS:     R E T U R N S;
-K_CALLED:      C A L L E D;
-K_INPUT:       I N P U T;
-K_LANGUAGE:    L A N G U A G E;
-K_OR:          O R;
-K_REPLACE:     R E P L A C E;
-
-K_JSON:        J S O N;
-
-// Case-insensitive alpha characters
-fragment A: ('a'|'A');
-fragment B: ('b'|'B');
-fragment C: ('c'|'C');
-fragment D: ('d'|'D');
-fragment E: ('e'|'E');
-fragment F: ('f'|'F');
-fragment G: ('g'|'G');
-fragment H: ('h'|'H');
-fragment I: ('i'|'I');
-fragment J: ('j'|'J');
-fragment K: ('k'|'K');
-fragment L: ('l'|'L');
-fragment M: ('m'|'M');
-fragment N: ('n'|'N');
-fragment O: ('o'|'O');
-fragment P: ('p'|'P');
-fragment Q: ('q'|'Q');
-fragment R: ('r'|'R');
-fragment S: ('s'|'S');
-fragment T: ('t'|'T');
-fragment U: ('u'|'U');
-fragment V: ('v'|'V');
-fragment W: ('w'|'W');
-fragment X: ('x'|'X');
-fragment Y: ('y'|'Y');
-fragment Z: ('z'|'Z');
-
-STRING_LITERAL
-    @init{
-        StringBuilder txt = new StringBuilder(); // temporary to build pg-style-string
-    }
-    @after{ setText(txt.toString()); }
-    :
-      /* pg-style string literal */
-      (
-        '\$' '\$'
-        ( /* collect all input until '$$' is reached again */
-          {  (input.size() - input.index() > 1)
-               && !"$$".equals(input.substring(input.index(), input.index() + 1)) }?
-             => c=. { txt.appendCodePoint(c); }
-        )*
-        '\$' '\$'
-      )
-      |
-      /* conventional quoted string literal */
-      (
-        '\'' (c=~('\'') { txt.appendCodePoint(c);} | '\'' '\'' { txt.appendCodePoint('\''); })* '\''
-      )
-    ;
-
-QUOTED_NAME
-    @init{ StringBuilder b = new StringBuilder(); }
-    @after{ setText(b.toString()); }
-    : '\"' (c=~('\"') { b.appendCodePoint(c); } | '\"' '\"' { b.appendCodePoint('\"'); })+ '\"'
-    ;
-
-fragment DIGIT
-    : '0'..'9'
-    ;
-
-fragment LETTER
-    : ('A'..'Z' | 'a'..'z')
-    ;
-
-fragment HEX
-    : ('A'..'F' | 'a'..'f' | '0'..'9')
-    ;
-
-fragment EXPONENT
-    : E ('+' | '-')? DIGIT+
-    ;
-
-INTEGER
-    : '-'? DIGIT+
-    ;
-
-QMARK
-    : '?'
-    ;
-
-/*
- * Normally a lexer only emits one token at a time, but ours is tricked out
- * to support multiple (see @lexer::members near the top of the grammar).
- */
-FLOAT
-    : INTEGER EXPONENT
-    | INTEGER '.' DIGIT* EXPONENT?
-    ;
-
-/*
- * This has to be before IDENT so it takes precendence over it.
- */
-BOOLEAN
-    : T R U E | F A L S E
-    ;
-
-IDENT
-    : LETTER (LETTER | DIGIT | '_')*
-    ;
-
-HEXNUMBER
-    : '0' X HEX*
-    ;
-
-UUID
-    : HEX HEX HEX HEX HEX HEX HEX HEX '-'
-      HEX HEX HEX HEX '-'
-      HEX HEX HEX HEX '-'
-      HEX HEX HEX HEX '-'
-      HEX HEX HEX HEX HEX HEX HEX HEX HEX HEX HEX HEX
-    ;
-
-WS
-    : (' ' | '\t' | '\n' | '\r')+ { $channel = HIDDEN; }
-    ;
-
-COMMENT
-    : ('--' | '//') .* ('\n'|'\r') { $channel = HIDDEN; }
-    ;
-
-MULTILINE_COMMENT
-    : '/*' .* '*/' { $channel = HIDDEN; }
-    ;
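Editor's note on the grammar split above: Cql.g is now a thin composite grammar that imports Parser.g and Lexer.g, so the package, the generated class names, and the top-level `query` rule (returning a ParsedStatement) are meant to stay as before. Below is a minimal, illustrative sketch of driving the generated classes with the ANTLR 3 runtime; it assumes the build still emits CqlLexer/CqlParser under org.apache.cassandra.cql3 and is not part of this patch.

    import org.antlr.runtime.ANTLRStringStream;
    import org.antlr.runtime.CommonTokenStream;
    import org.antlr.runtime.RecognitionException;

    import org.apache.cassandra.cql3.CqlLexer;
    import org.apache.cassandra.cql3.CqlParser;
    import org.apache.cassandra.cql3.statements.ParsedStatement;

    public class ParseExample
    {
        public static void main(String[] args) throws RecognitionException
        {
            // Exercises one of the new productions (PER PARTITION LIMIT) added in Parser.g.
            String cql = "SELECT * FROM ks.tbl PER PARTITION LIMIT 2 LIMIT 10;";

            // Lex, then parse: the lexer feeds a token stream and the composite
            // grammar's 'query' entry rule delegates to the rules now in Parser.g.
            CqlLexer lexer = new CqlLexer(new ANTLRStringStream(cql));
            CqlParser parser = new CqlParser(new CommonTokenStream(lexer));

            ParsedStatement stmt = parser.query();
            System.out.println(stmt.getClass().getSimpleName());
        }
    }

In practice a caller would also register an ErrorListener via the addErrorListener hooks shown in the grammar, since recognition errors are reported through listeners rather than thrown directly.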
diff --git a/src/java/org/apache/cassandra/auth/AllowAllAuthorizer.java b/src/java/org/apache/cassandra/auth/AllowAllAuthorizer.java
index bc6fee4..3b40979 100644
--- a/src/java/org/apache/cassandra/auth/AllowAllAuthorizer.java
+++ b/src/java/org/apache/cassandra/auth/AllowAllAuthorizer.java
@@ -22,6 +22,12 @@
 
 public class AllowAllAuthorizer implements IAuthorizer
 {
+    @Override
+    public boolean requireAuthorization()
+    {
+        return false;
+    }
+
     public Set<Permission> authorize(AuthenticatedUser user, IResource resource)
     {
         return resource.applicablePermissions();
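Editor's note: the new requireAuthorization() hook lets callers skip permission lookups entirely when the configured authorizer (such as AllowAllAuthorizer above) never restricts access. The helper below is a hedged sketch of such a call site, not part of this patch; it only leans on hooks visible elsewhere in the codebase (DatabaseDescriptor.getAuthorizer(), IResource.applicablePermissions()).

    import java.util.Set;

    import org.apache.cassandra.auth.AuthenticatedUser;
    import org.apache.cassandra.auth.IAuthorizer;
    import org.apache.cassandra.auth.IResource;
    import org.apache.cassandra.auth.Permission;
    import org.apache.cassandra.config.DatabaseDescriptor;

    public final class PermissionLookup
    {
        private PermissionLookup() {}

        // Hypothetical helper: when requireAuthorization() is false (AllowAllAuthorizer),
        // return the resource's full permission set without consulting or caching anything;
        // otherwise fall back to the authorizer's own authorize() lookup.
        public static Set<Permission> permissionsFor(AuthenticatedUser user, IResource resource)
        {
            IAuthorizer authorizer = DatabaseDescriptor.getAuthorizer();
            return authorizer.requireAuthorization()
                 ? authorizer.authorize(user, resource)
                 : resource.applicablePermissions();
        }
    }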
diff --git a/src/java/org/apache/cassandra/auth/AuthCache.java b/src/java/org/apache/cassandra/auth/AuthCache.java
new file mode 100644
index 0000000..0d2a01e
--- /dev/null
+++ b/src/java/org/apache/cassandra/auth/AuthCache.java
@@ -0,0 +1,207 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.auth;
+
+import java.lang.management.ManagementFactory;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.ThreadPoolExecutor;
+import java.util.concurrent.TimeUnit;
+import java.util.function.Consumer;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import javax.management.MBeanServer;
+import javax.management.MalformedObjectNameException;
+import javax.management.ObjectName;
+
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.util.concurrent.ListenableFuture;
+import com.google.common.util.concurrent.ListenableFutureTask;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor;
+
+public class AuthCache<K, V> implements AuthCacheMBean
+{
+    private static final Logger logger = LoggerFactory.getLogger(AuthCache.class);
+
+    private static final String MBEAN_NAME_BASE = "org.apache.cassandra.auth:type=";
+
+    private volatile LoadingCache<K, V> cache;
+    private ThreadPoolExecutor cacheRefreshExecutor;
+
+    private final String name;
+    private final Consumer<Integer> setValidityDelegate;
+    private final Supplier<Integer> getValidityDelegate;
+    private final Consumer<Integer> setUpdateIntervalDelegate;
+    private final Supplier<Integer> getUpdateIntervalDelegate;
+    private final Consumer<Integer> setMaxEntriesDelegate;
+    private final Supplier<Integer> getMaxEntriesDelegate;
+    private final Function<K, V> loadFunction;
+    private final Supplier<Boolean> enableCache;
+
+    protected AuthCache(String name,
+                        Consumer<Integer> setValidityDelegate,
+                        Supplier<Integer> getValidityDelegate,
+                        Consumer<Integer> setUpdateIntervalDelegate,
+                        Supplier<Integer> getUpdateIntervalDelegate,
+                        Consumer<Integer> setMaxEntriesDelegate,
+                        Supplier<Integer> getMaxEntriesDelegate,
+                        Function<K, V> loadFunction,
+                        Supplier<Boolean> enableCache)
+    {
+        this.name = name;
+        this.setValidityDelegate = setValidityDelegate;
+        this.getValidityDelegate = getValidityDelegate;
+        this.setUpdateIntervalDelegate = setUpdateIntervalDelegate;
+        this.getUpdateIntervalDelegate = getUpdateIntervalDelegate;
+        this.setMaxEntriesDelegate = setMaxEntriesDelegate;
+        this.getMaxEntriesDelegate = getMaxEntriesDelegate;
+        this.loadFunction = loadFunction;
+        this.enableCache = enableCache;
+        init();
+    }
+
+    protected void init()
+    {
+        this.cacheRefreshExecutor = new DebuggableThreadPoolExecutor(name + "Refresh", Thread.NORM_PRIORITY);
+        this.cache = initCache(null);
+        try
+        {
+            MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
+            mbs.registerMBean(this, getObjectName());
+        }
+        catch (Exception e)
+        {
+            throw new RuntimeException(e);
+        }
+    }
+
+    protected ObjectName getObjectName() throws MalformedObjectNameException
+    {
+        return new ObjectName(MBEAN_NAME_BASE + name);
+    }
+
+    public V get(K k) throws ExecutionException
+    {
+        if (cache == null)
+            return loadFunction.apply(k);
+
+        return cache.get(k);
+    }
+
+    public void invalidate()
+    {
+        cache = initCache(null);
+    }
+
+    public void invalidate(K k)
+    {
+        if (cache != null)
+            cache.invalidate(k);
+    }
+
+    public void setValidity(int validityPeriod)
+    {
+        if (Boolean.getBoolean("cassandra.disable_auth_caches_remote_configuration"))
+            throw new UnsupportedOperationException("Remote configuration of auth caches is disabled");
+
+        setValidityDelegate.accept(validityPeriod);
+        cache = initCache(cache);
+    }
+
+    public int getValidity()
+    {
+        return getValidityDelegate.get();
+    }
+
+    public void setUpdateInterval(int updateInterval)
+    {
+        if (Boolean.getBoolean("cassandra.disable_auth_caches_remote_configuration"))
+            throw new UnsupportedOperationException("Remote configuration of auth caches is disabled");
+
+        setUpdateIntervalDelegate.accept(updateInterval);
+        cache = initCache(cache);
+    }
+
+    public int getUpdateInterval()
+    {
+        return getUpdateIntervalDelegate.get();
+    }
+
+    public void setMaxEntries(int maxEntries)
+    {
+        if (Boolean.getBoolean("cassandra.disable_auth_caches_remote_configuration"))
+            throw new UnsupportedOperationException("Remote configuration of auth caches is disabled");
+
+        setMaxEntriesDelegate.accept(maxEntries);
+        cache = initCache(cache);
+    }
+
+    public int getMaxEntries()
+    {
+        return getMaxEntriesDelegate.get();
+    }
+
+    private LoadingCache<K, V> initCache(LoadingCache<K, V> existing)
+    {
+        if (!enableCache.get())
+            return null;
+
+        if (getValidity() <= 0)
+            return null;
+
+        logger.info("(Re)initializing {} (validity period/update interval/max entries) ({}/{}/{})",
+                    name, getValidity(), getUpdateInterval(), getMaxEntries());
+
+        LoadingCache<K, V> newcache = CacheBuilder.newBuilder()
+                           .refreshAfterWrite(getUpdateInterval(), TimeUnit.MILLISECONDS)
+                           .expireAfterWrite(getValidity(), TimeUnit.MILLISECONDS)
+                           .maximumSize(getMaxEntries())
+                           .build(new CacheLoader<K, V>()
+                           {
+                               public V load(K k)
+                               {
+                                   return loadFunction.apply(k);
+                               }
+
+                               public ListenableFuture<V> reload(final K k, final V oldV)
+                               {
+                                   ListenableFutureTask<V> task = ListenableFutureTask.create(() -> {
+                                       try
+                                       {
+                                           return loadFunction.apply(k);
+                                       }
+                                       catch (Exception e)
+                                       {
+                                           logger.trace("Error performing async refresh of auth data in {}", name, e);
+                                           throw e;
+                                       }
+                                   });
+                                   cacheRefreshExecutor.execute(task);
+                                   return task;
+                               }
+                           });
+        if (existing != null)
+            newcache.putAll(existing.asMap());
+        return newcache;
+    }
+}
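
For illustration, a minimal sketch of how a subclass is expected to wire these constructor delegates; the ExampleCache class and its AtomicInteger-backed settings are hypothetical stand-ins for the DatabaseDescriptor properties that the real caches later in this patch delegate to.

    package org.apache.cassandra.auth;

    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.atomic.AtomicInteger;

    public class ExampleCache extends AuthCache<String, String>
    {
        // Hypothetical stand-ins for the DatabaseDescriptor getters/setters a real cache delegates to
        private static final AtomicInteger VALIDITY_MS = new AtomicInteger(2000);
        private static final AtomicInteger UPDATE_INTERVAL_MS = new AtomicInteger(2000);
        private static final AtomicInteger MAX_ENTRIES = new AtomicInteger(1000);

        public ExampleCache()
        {
            super("ExampleCache",            // registers org.apache.cassandra.auth:type=ExampleCache
                  VALIDITY_MS::set,
                  VALIDITY_MS::get,
                  UPDATE_INTERVAL_MS::set,
                  UPDATE_INTERVAL_MS::get,
                  MAX_ENTRIES::set,
                  MAX_ENTRIES::get,
                  key -> "value-for-" + key, // loadFunction: fetch the value from the backing store
                  () -> true);               // enableCache: caching is switched on
        }

        public String lookup(String key) throws ExecutionException
        {
            return get(key);                 // served from the LoadingCache, or loaded directly when disabled
        }
    }
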
diff --git a/src/java/org/apache/cassandra/auth/AuthCacheMBean.java b/src/java/org/apache/cassandra/auth/AuthCacheMBean.java
new file mode 100644
index 0000000..43fb88e
--- /dev/null
+++ b/src/java/org/apache/cassandra/auth/AuthCacheMBean.java
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.auth;
+
+public interface AuthCacheMBean
+{
+    public void invalidate();
+
+    public void setValidity(int validityPeriod);
+
+    public int getValidity();
+
+    public void setUpdateInterval(int updateInterval);
+
+    public int getUpdateInterval();
+
+    public void setMaxEntries(int maxEntries);
+
+    public int getMaxEntries();
+}
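
A hedged sketch of driving one of these cache MBeans over remote JMX; the host and port are placeholders (7199 is only the default JMX port), and the object name follows the MBEAN_NAME_BASE pattern registered in AuthCache.init().

    import javax.management.JMX;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    import org.apache.cassandra.auth.AuthCacheMBean;

    public class AuthCacheJmxClient
    {
        public static void main(String[] args) throws Exception
        {
            JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url))
            {
                MBeanServerConnection connection = connector.getMBeanServerConnection();
                ObjectName name = new ObjectName("org.apache.cassandra.auth:type=PermissionsCache");
                AuthCacheMBean cache = JMX.newMBeanProxy(connection, name, AuthCacheMBean.class);

                System.out.println(cache.getValidity());   // current validity period in ms
                cache.setValidity(10000);                  // throws if remote configuration is disabled
                cache.invalidate();                        // rebuild the cache, dropping all entries
            }
        }
    }
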
diff --git a/src/java/org/apache/cassandra/auth/CassandraAuthorizer.java b/src/java/org/apache/cassandra/auth/CassandraAuthorizer.java
index 37e01fc..619ecf8 100644
--- a/src/java/org/apache/cassandra/auth/CassandraAuthorizer.java
+++ b/src/java/org/apache/cassandra/auth/CassandraAuthorizer.java
@@ -41,6 +41,16 @@
 import org.apache.cassandra.serializers.SetSerializer;
 import org.apache.cassandra.serializers.UTF8Serializer;
 import org.apache.cassandra.service.ClientState;
+import java.lang.management.ManagementFactory;
+
+import javax.management.MBeanServer;
+import javax.management.MalformedObjectNameException;
+import javax.management.ObjectName;
+
+import org.apache.cassandra.cql3.QueryOptions;
+import org.apache.cassandra.cql3.QueryProcessor;
+import org.apache.cassandra.cql3.UntypedResultSet;
+import org.apache.cassandra.cql3.UntypedResultSet.Row;
 import org.apache.cassandra.service.QueryState;
 import org.apache.cassandra.transport.messages.ResultMessage;
 import org.apache.cassandra.utils.ByteBufferUtil;
@@ -269,7 +279,7 @@
                                        RoleResource grantee)
     throws RequestValidationException, RequestExecutionException
     {
-        if (!performer.isSuper() && !performer.getRoles().contains(grantee))
+        if (!(performer.isSuper() || performer.isSystem()) && !performer.getRoles().contains(grantee))
             throw new UnauthorizedException(String.format("You are not authorized to view %s's permissions",
                                                           grantee == null ? "everyone" : grantee.getRoleName()));
 
diff --git a/src/java/org/apache/cassandra/auth/CassandraLoginModule.java b/src/java/org/apache/cassandra/auth/CassandraLoginModule.java
new file mode 100644
index 0000000..2ccf962
--- /dev/null
+++ b/src/java/org/apache/cassandra/auth/CassandraLoginModule.java
@@ -0,0 +1,257 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.auth;
+
+import java.io.IOException;
+import java.util.HashMap;
+import java.util.Map;
+import javax.security.auth.Subject;
+import javax.security.auth.callback.*;
+import javax.security.auth.login.FailedLoginException;
+import javax.security.auth.login.LoginException;
+import javax.security.auth.spi.LoginModule;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.exceptions.AuthenticationException;
+import org.apache.cassandra.service.StorageService;
+
+/**
+ * LoginModule which authenticates a user against the Cassandra database using
+ * the internal authentication mechanism.
+ */
+public class CassandraLoginModule implements LoginModule
+{
+    private static final Logger logger = LoggerFactory.getLogger(CassandraLoginModule.class);
+
+    // initial state
+    private Subject subject;
+    private CallbackHandler callbackHandler;
+
+    // the authentication status
+    private boolean succeeded = false;
+    private boolean commitSucceeded = false;
+
+    // username and password
+    private String username;
+    private char[] password;
+
+    private CassandraPrincipal principal;
+
+    /**
+     * Initialize this {@code LoginModule}.
+     *
+     * @param subject the {@code Subject} to be authenticated
+     * @param callbackHandler a {@code CallbackHandler} for communicating
+     *        with the end user (prompting for user names and passwords, for example)
+     * @param sharedState shared {@code LoginModule} state. This param is unused.
+     * @param options options specified in the login {@code Configuration} for this particular
+     *        {@code LoginModule}. This param is unused.
+     */
+    @Override
+    public void initialize(Subject subject,
+                           CallbackHandler callbackHandler,
+                           Map<java.lang.String, ?> sharedState,
+                           Map<java.lang.String, ?> options)
+    {
+        this.subject = subject;
+        this.callbackHandler = callbackHandler;
+    }
+
+    /**
+     * Authenticate the user, obtaining credentials from the CallbackHandler
+     * supplied in {@code initialize}. As long as the configured
+     * {@code IAuthenticator} supports the optional
+     * {@code legacyAuthenticate} method, it can be used here.
+     *
+     * @return true in all cases since this {@code LoginModule}
+     *         should not be ignored.
+     * @exception FailedLoginException if the authentication fails.
+     * @exception LoginException if this {@code LoginModule} is unable to
+     *            perform the authentication.
+     */
+    @Override
+    public boolean login() throws LoginException
+    {
+        // prompt for a user name and password
+        if (callbackHandler == null)
+        {
+            logger.info("No CallbackHandler available for authentication");
+            throw new LoginException("Authentication failed");
+        }
+
+        NameCallback nc = new NameCallback("username: ");
+        PasswordCallback pc = new PasswordCallback("password: ", false);
+        try
+        {
+            callbackHandler.handle(new Callback[]{nc, pc});
+            username = nc.getName();
+            char[] tmpPassword = pc.getPassword();
+            if (tmpPassword == null)
+                tmpPassword = new char[0];
+            password = new char[tmpPassword.length];
+            System.arraycopy(tmpPassword, 0, password, 0, tmpPassword.length);
+            pc.clearPassword();
+        }
+        catch (IOException | UnsupportedCallbackException e)
+        {
+            logger.info("Unexpected exception processing authentication callbacks", e);
+            throw new LoginException("Authentication failed");
+        }
+
+        // verify the credentials
+        try
+        {
+            authenticate();
+        }
+        catch (AuthenticationException e)
+        {
+            // authentication failed -- clean up
+            succeeded = false;
+            cleanUpInternalState();
+            throw new FailedLoginException(e.getMessage());
+        }
+
+        succeeded = true;
+        return true;
+    }
+
+    private void authenticate()
+    {
+        if (!StorageService.instance.isAuthSetupComplete())
+            throw new AuthenticationException("Cannot login as server authentication setup is not yet completed");
+
+        IAuthenticator authenticator = DatabaseDescriptor.getAuthenticator();
+        Map<String, String> credentials = new HashMap<>();
+        credentials.put(PasswordAuthenticator.USERNAME_KEY, username);
+        credentials.put(PasswordAuthenticator.PASSWORD_KEY, String.valueOf(password));
+        AuthenticatedUser user = authenticator.legacyAuthenticate(credentials);
+        // Only actual users should be allowed to authenticate for JMX
+        if (user.isAnonymous() || user.isSystem())
+            throw new AuthenticationException("Invalid user");
+
+        // The LOGIN privilege is required to authenticate - c.f. ClientState::login
+        if (!DatabaseDescriptor.getRoleManager().canLogin(user.getPrimaryRole()))
+            throw new AuthenticationException(user.getName() + " is not permitted to log in");
+    }
+
+    /**
+     * This method is called if the LoginContext's overall authentication succeeded
+     * (the relevant REQUIRED, REQUISITE, SUFFICIENT and OPTIONAL LoginModules
+     * succeeded).
+     *
+     * If this LoginModule's own authentication attempt succeeded (checked by
+     * retrieving the private state saved by the {@code login} method),
+     * then this method associates a {@code CassandraPrincipal}
+     * with the {@code Subject}.
+     * If this LoginModule's own authentication attempt failed, then this
+     * method removes any state that was originally saved.
+     *
+     * @return true if this LoginModule's own login and commit attempts succeeded, false otherwise.
+     * @exception LoginException if the commit fails.
+     */
+    @Override
+    public boolean commit() throws LoginException
+    {
+        if (!succeeded)
+        {
+            return false;
+        }
+        else
+        {
+            // add a Principal (authenticated identity)
+            // to the Subject
+            principal = new CassandraPrincipal(username);
+            if (!subject.getPrincipals().contains(principal))
+                subject.getPrincipals().add(principal);
+
+            cleanUpInternalState();
+            commitSucceeded = true;
+            return true;
+        }
+    }
+
+    /**
+     * This method is called if the LoginContext's overall authentication failed
+     * (i.e. the relevant REQUIRED, REQUISITE, SUFFICIENT and OPTIONAL LoginModules
+     * did not succeed).
+     *
+     * If this LoginModule's own authentication attempt succeeded (checked by
+     * retrieving the private state saved by the {@code login} and
+     * {@code commit} methods), then this method cleans up any state that
+     * was originally saved.
+     *
+     * @return false if this LoginModule's own login and/or commit attempts failed, true otherwise.
+     * @throws LoginException if the abort fails.
+     */
+    @Override
+    public boolean abort() throws LoginException
+    {
+        if (!succeeded)
+        {
+            return false;
+        }
+        else if (!commitSucceeded)
+        {
+            // login succeeded but overall authentication failed
+            succeeded = false;
+            cleanUpInternalState();
+            principal = null;
+        }
+        else
+        {
+            // overall authentication succeeded and commit succeeded,
+            // but someone else's commit failed
+            logout();
+        }
+        return true;
+    }
+
+    /**
+     * Logout the user.
+     *
+     * This method removes the principal that was added by the
+     * {@code commit} method.
+     *
+     * @return true in all cases since this {@code LoginModule}
+     *         should not be ignored.
+     * @throws LoginException if the logout fails.
+     */
+    @Override
+    public boolean logout() throws LoginException
+    {
+        subject.getPrincipals().remove(principal);
+        succeeded = false;
+        cleanUpInternalState();
+        principal = null;
+        return true;
+    }
+
+    private void cleanUpInternalState()
+    {
+        username = null;
+        if (password != null)
+        {
+            for (int i = 0; i < password.length; i++)
+                password[i] = ' ';
+            password = null;
+        }
+    }
+}
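
For illustration, a sketch of exercising the module through standard JAAS; the CassandraLogin entry name and the credentials are illustrative and must match the deployment's JAAS configuration.

    // Assumes a JAAS configuration file (passed via -Djava.security.auth.login.config=...)
    // containing an entry along these lines:
    //
    //     CassandraLogin {
    //         org.apache.cassandra.auth.CassandraLoginModule REQUIRED;
    //     };

    import javax.security.auth.callback.Callback;
    import javax.security.auth.callback.CallbackHandler;
    import javax.security.auth.callback.NameCallback;
    import javax.security.auth.callback.PasswordCallback;
    import javax.security.auth.login.LoginContext;
    import javax.security.auth.login.LoginException;

    public class CassandraJaasLogin
    {
        public static void main(String[] args) throws LoginException
        {
            // Answers the NameCallback/PasswordCallback issued by CassandraLoginModule.login()
            CallbackHandler handler = callbacks -> {
                for (Callback callback : callbacks)
                {
                    if (callback instanceof NameCallback)
                        ((NameCallback) callback).setName("cassandra");
                    else if (callback instanceof PasswordCallback)
                        ((PasswordCallback) callback).setPassword("cassandra".toCharArray());
                }
            };

            LoginContext context = new LoginContext("CassandraLogin", handler);
            context.login();                                           // login() + commit() on success
            System.out.println(context.getSubject().getPrincipals()); // contains a CassandraPrincipal
            context.logout();
        }
    }
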
diff --git a/src/java/org/apache/cassandra/auth/CassandraPrincipal.java b/src/java/org/apache/cassandra/auth/CassandraPrincipal.java
new file mode 100644
index 0000000..41de802
--- /dev/null
+++ b/src/java/org/apache/cassandra/auth/CassandraPrincipal.java
@@ -0,0 +1,134 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.auth;
+
+import java.io.Serializable;
+import java.security.Principal;
+
+/**
+ * <p> This class implements the <code>Principal</code> interface
+ * and represents a user.
+ *
+ * <p> Principals such as this <code>CassandraPrincipal</code>
+ * may be associated with a particular <code>Subject</code>
+ * to augment that <code>Subject</code> with an additional
+ * identity.  Refer to the <code>Subject</code> class for more information
+ * on how to achieve this.  Authorization decisions can then be based upon
+ * the Principals associated with a <code>Subject</code>.
+ *
+ * @see java.security.Principal
+ * @see javax.security.auth.Subject
+ */
+public class CassandraPrincipal implements Principal, Serializable
+{
+
+    /**
+     *
+     */
+    private static final long serialVersionUID = 1L;
+    private final String name;
+
+    /**
+     * Create a CassandraPrincipal with a username.
+     *
+     * <p>
+     *
+     * @param name the username for this user.
+     *
+     * @exception NullPointerException if the <code>name</code>
+     *                  is <code>null</code>.
+     */
+    public CassandraPrincipal(String name)
+    {
+        if (name == null)
+            throw new NullPointerException("illegal null input");
+
+        this.name = name;
+    }
+
+    /**
+     * Return the username for this <code>CassandraPrincipal</code>.
+     *
+     * <p>
+     *
+     * @return the username for this <code>CassandraPrincipal</code>
+     */
+    @Override
+    public String getName()
+    {
+        return name;
+    }
+
+    /**
+     * Return a string representation of this <code>CassandraPrincipal</code>.
+     *
+     * <p>
+     *
+     * @return a string representation of this <code>CassandraPrincipal</code>.
+     */
+    @Override
+    public String toString()
+    {
+        return ("CassandraPrincipal:  " + name);
+    }
+
+    /**
+     * Compares the specified Object with this <code>CassandraPrincipal</code>
+     * for equality.  Returns true if the given object is also a
+     * <code>CassandraPrincipal</code> and the two CassandraPrincipals
+     * have the same username.
+     *
+     * <p>
+     *
+     * @param o Object to be compared for equality with this
+     *          <code>CassandraPrincipal</code>.
+     *
+     * @return true if the specified Object is equal to this
+     *          <code>CassandraPrincipal</code>.
+     */
+    @Override
+    public boolean equals(Object o)
+    {
+        if (o == null)
+            return false;
+
+        if (this == o)
+            return true;
+
+        if (!(o instanceof CassandraPrincipal))
+            return false;
+        CassandraPrincipal that = (CassandraPrincipal) o;
+
+        if (this.getName().equals(that.getName()))
+            return true;
+        return false;
+    }
+
+    /**
+     * Return a hash code for this <code>CassandraPrincipal</code>.
+     *
+     * <p>
+     *
+     * @return a hash code for this <code>CassandraPrincipal</code>.
+     */
+    @Override
+    public int hashCode()
+    {
+        return name.hashCode();
+    }
+}
diff --git a/src/java/org/apache/cassandra/auth/CassandraRoleManager.java b/src/java/org/apache/cassandra/auth/CassandraRoleManager.java
index b34b648..826e89d 100644
--- a/src/java/org/apache/cassandra/auth/CassandraRoleManager.java
+++ b/src/java/org/apache/cassandra/auth/CassandraRoleManager.java
@@ -64,7 +64,7 @@
  * in CREATE/ALTER ROLE statements.
  *
  * Such a configuration could be implemented using a custom IRoleManager that
- * extends CassandraRoleManager and which includes Option.PASSWORD in the Set<Option>
+ * extends CassandraRoleManager and which includes Option.PASSWORD in the {@code Set<Option>}
  * returned from supportedOptions/alterableOptions. Any additional processing
  * of the password itself (such as storing it in an alternative location) would
  * be added in overridden createRole and alterRole implementations.
diff --git a/src/java/org/apache/cassandra/auth/DataResource.java b/src/java/org/apache/cassandra/auth/DataResource.java
index f64ed93..0aa24db 100644
--- a/src/java/org/apache/cassandra/auth/DataResource.java
+++ b/src/java/org/apache/cassandra/auth/DataResource.java
@@ -54,31 +54,22 @@
                                                                                             Permission.MODIFY,
                                                                                             Permission.AUTHORIZE);
     private static final String ROOT_NAME = "data";
-    private static final DataResource ROOT_RESOURCE = new DataResource();
+    private static final DataResource ROOT_RESOURCE = new DataResource(Level.ROOT, null, null);
 
     private final Level level;
     private final String keyspace;
     private final String table;
 
-    private DataResource()
-    {
-        level = Level.ROOT;
-        keyspace = null;
-        table = null;
-    }
+    // memoized hash code, since DataResource is immutable and frequently used as a hash map key
+    private final transient int hash;
 
-    private DataResource(String keyspace)
+    private DataResource(Level level, String keyspace, String table)
     {
-        level = Level.KEYSPACE;
-        this.keyspace = keyspace;
-        table = null;
-    }
-
-    private DataResource(String keyspace, String table)
-    {
-        level = Level.TABLE;
+        this.level = level;
         this.keyspace = keyspace;
         this.table = table;
+
+        this.hash = Objects.hashCode(level, keyspace, table);
     }
 
     /**
@@ -97,7 +88,7 @@
      */
     public static DataResource keyspace(String keyspace)
     {
-        return new DataResource(keyspace);
+        return new DataResource(Level.KEYSPACE, keyspace, null);
     }
 
     /**
@@ -109,7 +100,7 @@
      */
     public static DataResource table(String keyspace, String table)
     {
-        return new DataResource(keyspace, table);
+        return new DataResource(Level.TABLE, keyspace, table);
     }
 
     /**
@@ -272,6 +263,6 @@
     @Override
     public int hashCode()
     {
-        return Objects.hashCode(level, keyspace, table);
+        return hash;
     }
 }
diff --git a/src/java/org/apache/cassandra/auth/IAuthorizer.java b/src/java/org/apache/cassandra/auth/IAuthorizer.java
index 01c05af..a023e3e 100644
--- a/src/java/org/apache/cassandra/auth/IAuthorizer.java
+++ b/src/java/org/apache/cassandra/auth/IAuthorizer.java
@@ -29,6 +29,15 @@
 public interface IAuthorizer
 {
     /**
+     * Whether or not the authorizer will attempt authorization.
+     * If false the authorizer will not be called for authorization of resources.
+     */
+    default boolean requireAuthorization()
+    {
+        return true;
+    }
+
+    /**
      * Returns a set of permissions of a user on a resource.
      * Since Roles were introduced in version 2.2, Cassandra does not distinguish in any
      * meaningful way between users and roles. A role may or may not have login privileges
diff --git a/src/java/org/apache/cassandra/auth/IRoleManager.java b/src/java/org/apache/cassandra/auth/IRoleManager.java
index 5afc7f3..b27681d 100644
--- a/src/java/org/apache/cassandra/auth/IRoleManager.java
+++ b/src/java/org/apache/cassandra/auth/IRoleManager.java
@@ -170,7 +170,7 @@
 
     /**
      * Where an implementation supports OPTIONS in CREATE and ALTER operations
-     * this method should return the Map<String, String> representing the custom
+     * this method should return the {@code Map<String, String>} representing the custom
      * options associated with the role, as supplied to CREATE or ALTER.
      * It should never return null; if the implementation does not support
      * OPTIONS or if none were supplied then it should return an empty map.
diff --git a/src/java/org/apache/cassandra/auth/JMXResource.java b/src/java/org/apache/cassandra/auth/JMXResource.java
new file mode 100644
index 0000000..cb0ac41
--- /dev/null
+++ b/src/java/org/apache/cassandra/auth/JMXResource.java
@@ -0,0 +1,183 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.auth;
+
+import java.lang.management.ManagementFactory;
+import java.util.Set;
+import javax.management.MBeanServer;
+import javax.management.MalformedObjectNameException;
+import javax.management.ObjectName;
+
+import com.google.common.base.Objects;
+import com.google.common.collect.Sets;
+import org.apache.commons.lang3.StringUtils;
+
+public class JMXResource implements IResource
+{
+    enum Level
+    {
+        ROOT, MBEAN
+    }
+
+    private static final String ROOT_NAME = "mbean";
+    private static final JMXResource ROOT_RESOURCE = new JMXResource();
+    private final Level level;
+    private final String name;
+
+    // permissions which may be granted on MBeans
+    private static final Set<Permission> JMX_PERMISSIONS = Sets.immutableEnumSet(Permission.AUTHORIZE,
+                                                                                 Permission.DESCRIBE,
+                                                                                 Permission.EXECUTE,
+                                                                                 Permission.MODIFY,
+                                                                                 Permission.SELECT);
+
+    private JMXResource()
+    {
+        level = Level.ROOT;
+        name = null;
+    }
+
+    private JMXResource(String name)
+    {
+        this.name = name;
+        level = Level.MBEAN;
+    }
+
+    public static JMXResource mbean(String name)
+    {
+        return new JMXResource(name);
+    }
+
+    /**
+     * Parses a JMX resource name into a JMXResource instance.
+     *
+     * @param name Name of the JMX resource.
+     * @return JMXResource instance matching the name.
+     */
+    public static JMXResource fromName(String name)
+    {
+        String[] parts = StringUtils.split(name, '/');
+
+        if (!parts[0].equals(ROOT_NAME) || parts.length > 2)
+            throw new IllegalArgumentException(String.format("%s is not a valid JMX resource name", name));
+
+        if (parts.length == 1)
+            return root();
+
+        return mbean(parts[1]);
+    }
+
+    @Override
+    public String getName()
+    {
+        if (level == Level.ROOT)
+            return ROOT_NAME;
+        else if (level == Level.MBEAN)
+            return String.format("%s/%s", ROOT_NAME, name);
+        throw new AssertionError();
+    }
+
+    /**
+     * @return for a non-root resource, the short form of the resource name which represents an ObjectName
+     * (which may be of the pattern or exact kind). i.e. not the full "root/name" version returned by getName().
+     * Throws IllegalStateException if called on the root-level resource.
+     */
+    public String getObjectName()
+    {
+        if (level == Level.ROOT)
+            throw new IllegalStateException(String.format("%s JMX resource has no object name", level));
+        return name;
+    }
+
+    /**
+     * @return the root-level resource.
+     */
+    public static JMXResource root()
+    {
+        return ROOT_RESOURCE;
+    }
+
+    @Override
+    public IResource getParent()
+    {
+        if (level == Level.MBEAN)
+            return root();
+        throw new IllegalStateException("Root-level resource can't have a parent");
+    }
+
+    /**
+     * @return Whether or not the resource has a parent in the hierarchy.
+     */
+    @Override
+    public boolean hasParent()
+    {
+        return !level.equals(Level.ROOT);
+    }
+
+    @Override
+    public boolean exists()
+    {
+        if (!hasParent())
+            return true;
+        MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
+        try
+        {
+            return !(mbs.queryNames(new ObjectName(name), null).isEmpty());
+        }
+        catch (MalformedObjectNameException e)
+        {
+            return false;
+        }
+        catch (NullPointerException e)
+        {
+            return false;
+        }
+    }
+
+    @Override
+    public Set<Permission> applicablePermissions()
+    {
+        return JMX_PERMISSIONS;
+    }
+
+    @Override
+    public String toString()
+    {
+        return level == Level.ROOT ? "<all mbeans>" : String.format("<mbean %s>", name);
+    }
+
+    @Override
+    public boolean equals(Object o)
+    {
+        if (this == o)
+            return true;
+
+        if (!(o instanceof JMXResource))
+            return false;
+
+        JMXResource j = (JMXResource) o;
+
+        return Objects.equal(level, j.level) && Objects.equal(name, j.name);
+    }
+
+    @Override
+    public int hashCode()
+    {
+        return Objects.hashCode(level, name);
+    }
+}
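
A small sketch of the resource-name round trip defined above; the wildcard ObjectName used here is only an example.

    import org.apache.cassandra.auth.JMXResource;

    public class JMXResourceExample
    {
        public static void main(String[] args)
        {
            JMXResource all = JMXResource.root();
            JMXResource threadPools = JMXResource.mbean("org.apache.cassandra.internal:type=*");

            System.out.println(all.getName());                // "mbean"
            System.out.println(threadPools.getName());        // "mbean/org.apache.cassandra.internal:type=*"
            System.out.println(threadPools.getObjectName());  // the bare ObjectName pattern
            System.out.println(threadPools.getParent());      // "<all mbeans>"

            // fromName() reverses getName(), which is how Resources.fromName() resolves "mbean/..." names
            System.out.println(JMXResource.fromName(threadPools.getName()).equals(threadPools)); // true
        }
    }
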
diff --git a/src/java/org/apache/cassandra/auth/PasswordAuthenticator.java b/src/java/org/apache/cassandra/auth/PasswordAuthenticator.java
index 0482199..3714523 100644
--- a/src/java/org/apache/cassandra/auth/PasswordAuthenticator.java
+++ b/src/java/org/apache/cassandra/auth/PasswordAuthenticator.java
@@ -22,18 +22,23 @@
 import java.util.Arrays;
 import java.util.Map;
 import java.util.Set;
+import java.util.concurrent.ExecutionException;
 
 import com.google.common.collect.ImmutableSet;
 import com.google.common.collect.Lists;
+import com.google.common.util.concurrent.UncheckedExecutionException;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
+import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.config.Schema;
 import org.apache.cassandra.cql3.QueryOptions;
 import org.apache.cassandra.cql3.QueryProcessor;
 import org.apache.cassandra.cql3.UntypedResultSet;
 import org.apache.cassandra.cql3.statements.SelectStatement;
-import org.apache.cassandra.exceptions.*;
+import org.apache.cassandra.exceptions.AuthenticationException;
+import org.apache.cassandra.exceptions.ConfigurationException;
+import org.apache.cassandra.exceptions.RequestExecutionException;
 import org.apache.cassandra.service.ClientState;
 import org.apache.cassandra.service.QueryState;
 import org.apache.cassandra.transport.messages.ResultMessage;
@@ -68,6 +73,8 @@
     public static final String LEGACY_CREDENTIALS_TABLE = "credentials";
     private SelectStatement legacyAuthenticateStatement;
 
+    private CredentialsCache cache;
+
     // No anonymous access.
     public boolean requireAuthentication()
     {
@@ -78,17 +85,60 @@
     {
         try
         {
+            String hash = cache.get(username);
+            if (!BCrypt.checkpw(password, hash))
+                throw new AuthenticationException("Username and/or password are incorrect");
+
+            return new AuthenticatedUser(username);
+        }
+        catch (ExecutionException | UncheckedExecutionException e)
+        {
+            // the credentials were somehow invalid - either a non-existent role, or one without a defined password
+            if (e.getCause() instanceof NoSuchCredentialsException)
+                throw new AuthenticationException("Username and/or password are incorrect");
+
+            // an unanticipated exception occurred while querying the credentials table
+            if (e.getCause() instanceof RequestExecutionException)
+            {
+                logger.trace("Error performing internal authentication", e);
+                throw new AuthenticationException(e.getMessage());
+            }
+
+            throw new RuntimeException(e);
+        }
+    }
+
+    private String queryHashedPassword(String username) throws NoSuchCredentialsException
+    {
+        try
+        {
             // If the legacy users table exists try to verify credentials there. This is to handle the case
             // where the cluster is being upgraded and so is running with mixed versions of the authn tables
             SelectStatement authenticationStatement = Schema.instance.getCFMetaData(AuthKeyspace.NAME, LEGACY_CREDENTIALS_TABLE) == null
                                                     ? authenticateStatement
                                                     : legacyAuthenticateStatement;
-            return doAuthenticate(username, password, authenticationStatement);
+
+            ResultMessage.Rows rows =
+                authenticationStatement.execute(QueryState.forInternalCalls(),
+                                                QueryOptions.forInternalCalls(consistencyForRole(username),
+                                                                              Lists.newArrayList(ByteBufferUtil.bytes(username))));
+
+            // If either a non-existent role name was supplied, or no credentials
+            // were found for that role we don't want to cache the result so we throw
+            // a specific, but unchecked, exception to keep LoadingCache happy.
+            if (rows.result.isEmpty())
+                throw new NoSuchCredentialsException();
+
+            UntypedResultSet result = UntypedResultSet.create(rows.result);
+            if (!result.one().has(SALTED_HASH))
+                throw new NoSuchCredentialsException();
+
+            return result.one().getString(SALTED_HASH);
         }
         catch (RequestExecutionException e)
         {
             logger.trace("Error performing internal authentication", e);
-            throw new AuthenticationException(e.toString());
+            throw e;
         }
     }
 
@@ -118,6 +168,8 @@
                                   LEGACY_CREDENTIALS_TABLE);
             legacyAuthenticateStatement = prepare(query);
         }
+
+        cache = new CredentialsCache(this);
     }
 
     public AuthenticatedUser legacyAuthenticate(Map<String, String> credentials) throws AuthenticationException
@@ -138,21 +190,7 @@
         return new PlainTextSaslAuthenticator();
     }
 
-    private AuthenticatedUser doAuthenticate(String username, String password, SelectStatement authenticationStatement)
-    throws RequestExecutionException, AuthenticationException
-    {
-        ResultMessage.Rows rows = authenticationStatement.execute(QueryState.forInternalCalls(),
-                                                                  QueryOptions.forInternalCalls(consistencyForRole(username),
-                                                                                                Lists.newArrayList(ByteBufferUtil.bytes(username))));
-        UntypedResultSet result = UntypedResultSet.create(rows.result);
-
-        if ((result.isEmpty() || !result.one().has(SALTED_HASH)) || !BCrypt.checkpw(password, result.one().getString(SALTED_HASH)))
-            throw new AuthenticationException("Username and/or password are incorrect");
-
-        return new AuthenticatedUser(username);
-    }
-
-    private SelectStatement prepare(String query)
+    private static SelectStatement prepare(String query)
     {
         return (SelectStatement) QueryProcessor.getStatement(query, ClientState.forInternalCalls()).statement;
     }
@@ -191,9 +229,8 @@
          * a user being authorized to act on behalf of another with this IAuthenticator).
          *
          * @param bytes encoded credentials string sent by the client
-         * @return map containing the username/password pairs in the form an IAuthenticator
-         * would expect
-         * @throws javax.security.sasl.SaslException
+         * @throws org.apache.cassandra.exceptions.AuthenticationException if either the
+         *         authnId or password is null
          */
         private void decodeCredentials(byte[] bytes) throws AuthenticationException
         {
@@ -213,13 +250,45 @@
                 }
             }
 
-            if (user == null)
-                throw new AuthenticationException("Authentication ID must not be null");
             if (pass == null)
                 throw new AuthenticationException("Password must not be null");
+            if (user == null)
+                throw new AuthenticationException("Authentication ID must not be null");
 
             username = new String(user, StandardCharsets.UTF_8);
             password = new String(pass, StandardCharsets.UTF_8);
         }
     }
+
+    private static class CredentialsCache extends AuthCache<String, String> implements CredentialsCacheMBean
+    {
+        private CredentialsCache(PasswordAuthenticator authenticator)
+        {
+            super("CredentialsCache",
+                  DatabaseDescriptor::setCredentialsValidity,
+                  DatabaseDescriptor::getCredentialsValidity,
+                  DatabaseDescriptor::setCredentialsUpdateInterval,
+                  DatabaseDescriptor::getCredentialsUpdateInterval,
+                  DatabaseDescriptor::setCredentialsCacheMaxEntries,
+                  DatabaseDescriptor::getCredentialsCacheMaxEntries,
+                  authenticator::queryHashedPassword,
+                  () -> true);
+        }
+
+        public void invalidateCredentials(String roleName)
+        {
+            invalidate(roleName);
+        }
+    }
+
+    public static interface CredentialsCacheMBean extends AuthCacheMBean
+    {
+        public void invalidateCredentials(String roleName);
+    }
+
+    // Just a marker so we can identify that invalid credentials were the
+    // cause of a loading exception from the cache
+    private static final class NoSuchCredentialsException extends RuntimeException
+    {
+    }
 }
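
A standalone sketch of the flow authenticate() now follows: the salted hash is served from a loading cache keyed by role name, and the BCrypt comparison runs on every attempt. The in-memory map stands in for the system_auth lookup done by queryHashedPassword(), the timings are arbitrary, and the org.mindrot.jbcrypt dependency is assumed to match the BCrypt implementation PasswordAuthenticator itself uses.

    import java.util.Map;
    import java.util.concurrent.ExecutionException;
    import java.util.concurrent.TimeUnit;

    import com.google.common.cache.CacheBuilder;
    import com.google.common.cache.CacheLoader;
    import com.google.common.cache.LoadingCache;
    import com.google.common.collect.ImmutableMap;
    import org.mindrot.jbcrypt.BCrypt;

    public class CredentialsCacheSketch
    {
        // Stand-in for the roles/credentials table queried by queryHashedPassword()
        private static final Map<String, String> SALTED_HASHES =
            ImmutableMap.of("alice", BCrypt.hashpw("s3cret", BCrypt.gensalt()));

        private static final LoadingCache<String, String> CACHE =
            CacheBuilder.newBuilder()
                        .expireAfterWrite(2000, TimeUnit.MILLISECONDS)
                        .build(new CacheLoader<String, String>()
                        {
                            public String load(String roleName)
                            {
                                String hash = SALTED_HASHES.get(roleName);
                                if (hash == null)                     // analogous to NoSuchCredentialsException
                                    throw new RuntimeException("unknown role or no password set");
                                return hash;
                            }
                        });

        static boolean authenticate(String username, String password) throws ExecutionException
        {
            // The hash comes from the cache; the expensive bcrypt check still runs per attempt
            return BCrypt.checkpw(password, CACHE.get(username));
        }

        public static void main(String[] args) throws ExecutionException
        {
            System.out.println(authenticate("alice", "s3cret")); // true
            System.out.println(authenticate("alice", "wrong"));  // false
        }
    }
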
diff --git a/src/java/org/apache/cassandra/auth/PermissionsCache.java b/src/java/org/apache/cassandra/auth/PermissionsCache.java
index 8746b36..875c473 100644
--- a/src/java/org/apache/cassandra/auth/PermissionsCache.java
+++ b/src/java/org/apache/cassandra/auth/PermissionsCache.java
@@ -17,137 +17,36 @@
  */
 package org.apache.cassandra.auth;
 
-import java.lang.management.ManagementFactory;
 import java.util.Set;
-import java.util.concurrent.*;
+import java.util.concurrent.ExecutionException;
 
 import org.apache.cassandra.config.DatabaseDescriptor;
-import com.google.common.cache.CacheBuilder;
-import com.google.common.cache.CacheLoader;
-import com.google.common.cache.LoadingCache;
-import com.google.common.util.concurrent.ListenableFuture;
-import com.google.common.util.concurrent.ListenableFutureTask;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor;
 import org.apache.cassandra.utils.Pair;
 
-import javax.management.MBeanServer;
-import javax.management.ObjectName;
-
-public class PermissionsCache implements PermissionsCacheMBean
+public class PermissionsCache extends AuthCache<Pair<AuthenticatedUser, IResource>, Set<Permission>> implements PermissionsCacheMBean
 {
-    private static final Logger logger = LoggerFactory.getLogger(PermissionsCache.class);
-
-    private final String MBEAN_NAME = "org.apache.cassandra.auth:type=PermissionsCache";
-
-    private final ThreadPoolExecutor cacheRefreshExecutor = new DebuggableThreadPoolExecutor("PermissionsCacheRefresh",
-                                                                                             Thread.NORM_PRIORITY);
-    private final IAuthorizer authorizer;
-    private volatile LoadingCache<Pair<AuthenticatedUser, IResource>, Set<Permission>> cache;
-
     public PermissionsCache(IAuthorizer authorizer)
     {
-        this.authorizer = authorizer;
-        this.cache = initCache(null);
-        try
-        {
-            MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
-            mbs.registerMBean(this, new ObjectName(MBEAN_NAME));
-        }
-        catch (Exception e)
-        {
-            throw new RuntimeException(e);
-        }
+        super("PermissionsCache",
+              DatabaseDescriptor::setPermissionsValidity,
+              DatabaseDescriptor::getPermissionsValidity,
+              DatabaseDescriptor::setPermissionsUpdateInterval,
+              DatabaseDescriptor::getPermissionsUpdateInterval,
+              DatabaseDescriptor::setPermissionsCacheMaxEntries,
+              DatabaseDescriptor::getPermissionsCacheMaxEntries,
+              (p) -> authorizer.authorize(p.left, p.right),
+              () -> DatabaseDescriptor.getAuthorizer().requireAuthorization());
     }
 
     public Set<Permission> getPermissions(AuthenticatedUser user, IResource resource)
     {
-        if (cache == null)
-            return authorizer.authorize(user, resource);
-
         try
         {
-            return cache.get(Pair.create(user, resource));
+            return get(Pair.create(user, resource));
         }
         catch (ExecutionException e)
         {
             throw new RuntimeException(e);
         }
     }
-
-    public void invalidate()
-    {
-        cache = initCache(null);
-    }
-
-    public void setValidity(int validityPeriod)
-    {
-        DatabaseDescriptor.setPermissionsValidity(validityPeriod);
-        cache = initCache(cache);
-    }
-
-    public int getValidity()
-    {
-        return DatabaseDescriptor.getPermissionsValidity();
-    }
-
-    public void setUpdateInterval(int updateInterval)
-    {
-        DatabaseDescriptor.setPermissionsUpdateInterval(updateInterval);
-        cache = initCache(cache);
-    }
-
-    public int getUpdateInterval()
-    {
-        return DatabaseDescriptor.getPermissionsUpdateInterval();
-    }
-
-    private LoadingCache<Pair<AuthenticatedUser, IResource>, Set<Permission>> initCache(
-                                                             LoadingCache<Pair<AuthenticatedUser, IResource>, Set<Permission>> existing)
-    {
-        if (authorizer instanceof AllowAllAuthorizer)
-            return null;
-
-        if (DatabaseDescriptor.getPermissionsValidity() <= 0)
-            return null;
-
-        LoadingCache<Pair<AuthenticatedUser, IResource>, Set<Permission>> newcache = CacheBuilder.newBuilder()
-                           .refreshAfterWrite(DatabaseDescriptor.getPermissionsUpdateInterval(), TimeUnit.MILLISECONDS)
-                           .expireAfterWrite(DatabaseDescriptor.getPermissionsValidity(), TimeUnit.MILLISECONDS)
-                           .maximumSize(DatabaseDescriptor.getPermissionsCacheMaxEntries())
-                           .build(new CacheLoader<Pair<AuthenticatedUser, IResource>, Set<Permission>>()
-                           {
-                               public Set<Permission> load(Pair<AuthenticatedUser, IResource> userResource)
-                               {
-                                   return authorizer.authorize(userResource.left, userResource.right);
-                               }
-
-                               public ListenableFuture<Set<Permission>> reload(final Pair<AuthenticatedUser, IResource> userResource,
-                                                                               final Set<Permission> oldValue)
-                               {
-                                   ListenableFutureTask<Set<Permission>> task = ListenableFutureTask.create(new Callable<Set<Permission>>()
-                                   {
-                                       public Set<Permission>call() throws Exception
-                                       {
-                                           try
-                                           {
-                                               return authorizer.authorize(userResource.left, userResource.right);
-                                           }
-                                           catch (Exception e)
-                                           {
-                                               logger.trace("Error performing async refresh of user permissions", e);
-                                               throw e;
-                                           }
-                                       }
-                                   });
-                                   cacheRefreshExecutor.execute(task);
-                                   return task;
-                               }
-                           });
-        if (existing != null)
-            newcache.putAll(existing.asMap());
-        return newcache;
-    }
 }
diff --git a/src/java/org/apache/cassandra/auth/PermissionsCacheMBean.java b/src/java/org/apache/cassandra/auth/PermissionsCacheMBean.java
index d07c98f..d370d06 100644
--- a/src/java/org/apache/cassandra/auth/PermissionsCacheMBean.java
+++ b/src/java/org/apache/cassandra/auth/PermissionsCacheMBean.java
@@ -17,15 +17,10 @@
  */
 package org.apache.cassandra.auth;
 
-public interface PermissionsCacheMBean
+/**
+ * Retained since CASSANDRA-7715 for backwards compatibility of MBean interface
+ * classes. This should be removed in the next major version (4.0)
+ */
+public interface PermissionsCacheMBean extends AuthCacheMBean
 {
-    public void invalidate();
-
-    public void setValidity(int validityPeriod);
-
-    public int getValidity();
-
-    public void setUpdateInterval(int updateInterval);
-
-    public int getUpdateInterval();
 }
diff --git a/src/java/org/apache/cassandra/auth/Resources.java b/src/java/org/apache/cassandra/auth/Resources.java
index ebcfc16..653cd46 100644
--- a/src/java/org/apache/cassandra/auth/Resources.java
+++ b/src/java/org/apache/cassandra/auth/Resources.java
@@ -58,6 +58,8 @@
             return DataResource.fromName(name);
         else if (name.startsWith(FunctionResource.root().getName()))
             return FunctionResource.fromName(name);
+        else if (name.startsWith(JMXResource.root().getName()))
+            return JMXResource.fromName(name);
         else
             throw new IllegalArgumentException(String.format("Name %s is not valid for any resource type", name));
     }
diff --git a/src/java/org/apache/cassandra/auth/RoleOptions.java b/src/java/org/apache/cassandra/auth/RoleOptions.java
index 9609ff3..1205d34 100644
--- a/src/java/org/apache/cassandra/auth/RoleOptions.java
+++ b/src/java/org/apache/cassandra/auth/RoleOptions.java
@@ -90,7 +90,7 @@
     }
 
     /**
-     * Return a Map<String, String> representing custom options
+     * Return a {@code Map<String, String>} representing custom options
      * It is the responsiblity of IRoleManager implementations which support
      * IRoleManager.Option.OPTION to handle type checking and conversion of these
      * values, if present
diff --git a/src/java/org/apache/cassandra/auth/Roles.java b/src/java/org/apache/cassandra/auth/Roles.java
index da6804b..854d1f2 100644
--- a/src/java/org/apache/cassandra/auth/Roles.java
+++ b/src/java/org/apache/cassandra/auth/Roles.java
@@ -28,7 +28,7 @@
     /**
      * Get all roles granted to the supplied Role, including both directly granted
      * and inherited roles.
-     * The returned roles may be cached if roles_validity_in_ms > 0
+     * The returned roles may be cached if {@code roles_validity_in_ms > 0}
      *
      * @param primaryRole the Role
      * @return set of all granted Roles for the primary Role
diff --git a/src/java/org/apache/cassandra/auth/RolesCache.java b/src/java/org/apache/cassandra/auth/RolesCache.java
index 2694173..8b9c322 100644
--- a/src/java/org/apache/cassandra/auth/RolesCache.java
+++ b/src/java/org/apache/cassandra/auth/RolesCache.java
@@ -17,135 +17,35 @@
  */
 package org.apache.cassandra.auth;
 
-import java.lang.management.ManagementFactory;
 import java.util.Set;
-import java.util.concurrent.*;
+import java.util.concurrent.ExecutionException;
 
-import com.google.common.cache.CacheBuilder;
-import com.google.common.cache.CacheLoader;
-import com.google.common.cache.LoadingCache;
-import com.google.common.util.concurrent.ListenableFuture;
-import com.google.common.util.concurrent.ListenableFutureTask;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import org.apache.cassandra.concurrent.DebuggableThreadPoolExecutor;
 import org.apache.cassandra.config.DatabaseDescriptor;
 
-import javax.management.MBeanServer;
-import javax.management.ObjectName;
-
-public class RolesCache implements RolesCacheMBean
+public class RolesCache extends AuthCache<RoleResource, Set<RoleResource>> implements RolesCacheMBean
 {
-    private static final Logger logger = LoggerFactory.getLogger(RolesCache.class);
-
-    private final String MBEAN_NAME = "org.apache.cassandra.auth:type=RolesCache";
-    private final ThreadPoolExecutor cacheRefreshExecutor = new DebuggableThreadPoolExecutor("RolesCacheRefresh",
-                                                                                             Thread.NORM_PRIORITY);
-    private final IRoleManager roleManager;
-    private volatile LoadingCache<RoleResource, Set<RoleResource>> cache;
-
     public RolesCache(IRoleManager roleManager)
     {
-        this.roleManager = roleManager;
-        this.cache = initCache(null);
-        try
-        {
-            MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
-            mbs.registerMBean(this, new ObjectName(MBEAN_NAME));
-        }
-        catch (Exception e)
-        {
-            throw new RuntimeException(e);
-        }
+        super("RolesCache",
+              DatabaseDescriptor::setRolesValidity,
+              DatabaseDescriptor::getRolesValidity,
+              DatabaseDescriptor::setRolesUpdateInterval,
+              DatabaseDescriptor::getRolesUpdateInterval,
+              DatabaseDescriptor::setRolesCacheMaxEntries,
+              DatabaseDescriptor::getRolesCacheMaxEntries,
+              (r) -> roleManager.getRoles(r, true),
+              () -> DatabaseDescriptor.getAuthenticator().requireAuthentication());
     }
 
     public Set<RoleResource> getRoles(RoleResource role)
     {
-        if (cache == null)
-            return roleManager.getRoles(role, true);
-
         try
         {
-            return cache.get(role);
+            return get(role);
         }
         catch (ExecutionException e)
         {
             throw new RuntimeException(e);
         }
     }
-
-    public void invalidate()
-    {
-        cache = initCache(null);
-    }
-
-    public void setValidity(int validityPeriod)
-    {
-        DatabaseDescriptor.setRolesValidity(validityPeriod);
-        cache = initCache(cache);
-    }
-
-    public int getValidity()
-    {
-        return DatabaseDescriptor.getRolesValidity();
-    }
-
-    public void setUpdateInterval(int updateInterval)
-    {
-        DatabaseDescriptor.setRolesUpdateInterval(updateInterval);
-        cache = initCache(cache);
-    }
-
-    public int getUpdateInterval()
-    {
-        return DatabaseDescriptor.getRolesUpdateInterval();
-    }
-
-
-    private LoadingCache<RoleResource, Set<RoleResource>> initCache(LoadingCache<RoleResource, Set<RoleResource>> existing)
-    {
-        if (!DatabaseDescriptor.getAuthenticator().requireAuthentication())
-            return null;
-
-        if (DatabaseDescriptor.getRolesValidity() <= 0)
-            return null;
-
-        LoadingCache<RoleResource, Set<RoleResource>> newcache = CacheBuilder.newBuilder()
-                .refreshAfterWrite(DatabaseDescriptor.getRolesUpdateInterval(), TimeUnit.MILLISECONDS)
-                .expireAfterWrite(DatabaseDescriptor.getRolesValidity(), TimeUnit.MILLISECONDS)
-                .maximumSize(DatabaseDescriptor.getRolesCacheMaxEntries())
-                .build(new CacheLoader<RoleResource, Set<RoleResource>>()
-                {
-                    public Set<RoleResource> load(RoleResource primaryRole)
-                    {
-                        return roleManager.getRoles(primaryRole, true);
-                    }
-
-                    public ListenableFuture<Set<RoleResource>> reload(final RoleResource primaryRole,
-                                                                      final Set<RoleResource> oldValue)
-                    {
-                        ListenableFutureTask<Set<RoleResource>> task;
-                        task = ListenableFutureTask.create(new Callable<Set<RoleResource>>()
-                        {
-                            public Set<RoleResource> call() throws Exception
-                            {
-                                try
-                                {
-                                    return roleManager.getRoles(primaryRole, true);
-                                } catch (Exception e)
-                                {
-                                    logger.trace("Error performing async refresh of user roles", e);
-                                    throw e;
-                                }
-                            }
-                        });
-                        cacheRefreshExecutor.execute(task);
-                        return task;
-                    }
-                });
-        if (existing != null)
-            newcache.putAll(existing.asMap());
-        return newcache;
-    }
 }
diff --git a/src/java/org/apache/cassandra/auth/RolesCacheMBean.java b/src/java/org/apache/cassandra/auth/RolesCacheMBean.java
index cf270e6..06482d7 100644
--- a/src/java/org/apache/cassandra/auth/RolesCacheMBean.java
+++ b/src/java/org/apache/cassandra/auth/RolesCacheMBean.java
@@ -17,15 +17,10 @@
  */
 package org.apache.cassandra.auth;
 
-public interface RolesCacheMBean
+/**
+ * Retained since CASSANDRA-7715 for backwards compatibility of MBean interface
+ * classes. This should be removed in the next major version (4.0)
+ */
+public interface RolesCacheMBean extends AuthCacheMBean
 {
-    public void invalidate();
-
-    public void setValidity(int validityPeriod);
-
-    public int getValidity();
-
-    public void setUpdateInterval(int updateInterval);
-
-    public int getUpdateInterval();
-}
\ No newline at end of file
+}
diff --git a/src/java/org/apache/cassandra/auth/jmx/AuthenticationProxy.java b/src/java/org/apache/cassandra/auth/jmx/AuthenticationProxy.java
new file mode 100644
index 0000000..0c13e3b
--- /dev/null
+++ b/src/java/org/apache/cassandra/auth/jmx/AuthenticationProxy.java
@@ -0,0 +1,157 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.auth.jmx;
+
+import java.security.AccessController;
+import java.security.PrivilegedAction;
+import javax.management.remote.JMXAuthenticator;
+import javax.security.auth.Subject;
+import javax.security.auth.callback.*;
+import javax.security.auth.login.LoginContext;
+import javax.security.auth.login.LoginException;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.cassandra.exceptions.ConfigurationException;
+
+/**
+ * An alternative to the JAAS based implementation of JMXAuthenticator provided
+ * by the JDK (JMXPluggableAuthenticator).
+ *
+ * Authentication is performed via delegation to a LoginModule. The JAAS login
+ * config is specified by passing its identifier in a custom system property:
+ *     cassandra.jmx.remote.login.config
+ *
+ * The location of the JAAS configuration file containing that config is
+ * specified in the standard way, using the java.security.auth.login.config
+ * system property.
+ *
+ * If authentication is successful then a Subject containing one or more
+ * Principals is returned. This Subject may then be used during authorization
+ * if JMX authorization is enabled.
+ */
+public final class AuthenticationProxy implements JMXAuthenticator
+{
+    private static Logger logger = LoggerFactory.getLogger(AuthenticationProxy.class);
+
+    // Identifier of JAAS configuration to be used for subject authentication
+    private final String loginConfigName;
+
+    /**
+     * Creates an instance of {@code AuthenticationProxy} bound to the named JAAS
+     * login configuration, which is used to construct a {@link LoginContext} per connection.
+     *
+     * @param loginConfigName name of the specific JAAS login configuration to
+     *                        use when authenticating JMX connections
+     * @throws SecurityException if the authentication mechanism cannot be
+     *         initialized.
+     */
+    public AuthenticationProxy(String loginConfigName)
+    {
+        if (loginConfigName == null)
+            throw new ConfigurationException("JAAS login configuration missing for JMX authenticator setup");
+
+        this.loginConfigName = loginConfigName;
+    }
+
+    /**
+     * Perform authentication of the client opening the {@code MBeanServerConnection}.
+     *
+     * @param credentials optionally these credentials may be supplied by the JMX user.
+     *                    Out of the box, the JDK's {@code RMIServerImpl} is capable
+     *                    of supplying a two element String[], containing username and password.
+     *                    If present, these credentials will be made available to configured
+     *                    {@code LoginModule}s via {@code JMXCallbackHandler}.
+     *
+     * @return the authenticated subject containing any {@code Principal}s added by
+     *         the {@code LoginModule}s
+     *
+     * @throws SecurityException if the server cannot authenticate the user
+     *         with the provided credentials.
+     */
+    public Subject authenticate(Object credentials)
+    {
+        // The credentials object is expected to be a string array holding the subject's
+        // username & password. Those values are made accessible to LoginModules via the
+        // JMXCallbackHandler.
+        JMXCallbackHandler callbackHandler = new JMXCallbackHandler(credentials);
+        try
+        {
+            LoginContext loginContext = new LoginContext(loginConfigName, callbackHandler);
+            loginContext.login();
+            final Subject subject = loginContext.getSubject();
+            if (!subject.isReadOnly())
+            {
+                AccessController.doPrivileged((PrivilegedAction<Void>) () -> {
+                    subject.setReadOnly();
+                    return null;
+                });
+            }
+
+            return subject;
+        }
+        catch (LoginException e)
+        {
+            logger.trace("Authentication exception", e);
+            throw new SecurityException("Authentication error", e);
+        }
+    }
+
+    /**
+     * This callback handler supplies the username and password (which was
+     * optionally supplied by the JMX user) to the JAAS login module performing
+     * the authentication, should it require them. No interactive user
+     * prompting is necessary because the credentials are already available to
+     * this class (via its enclosing class).
+     */
+    private static final class JMXCallbackHandler implements CallbackHandler
+    {
+        private char[] username;
+        private char[] password;
+        private JMXCallbackHandler(Object credentials)
+        {
+            // if username/password credentials were supplied, store them in
+            // the relevant variables to make them accessible to LoginModules
+            // via JMXCallbackHandler
+            if (credentials instanceof String[])
+            {
+                String[] strings = (String[]) credentials;
+                if (strings[0] != null)
+                    username = strings[0].toCharArray();
+                if (strings[1] != null)
+                    password = strings[1].toCharArray();
+            }
+        }
+
+        public void handle(Callback[] callbacks) throws UnsupportedCallbackException
+        {
+            for (int i = 0; i < callbacks.length; i++)
+            {
+                if (callbacks[i] instanceof NameCallback)
+                    ((NameCallback)callbacks[i]).setName(username == null ? null : new String(username));
+                else if (callbacks[i] instanceof PasswordCallback)
+                    ((PasswordCallback)callbacks[i]).setPassword(password == null ? null : password);
+                else
+                    throw new UnsupportedCallbackException(callbacks[i], "Unrecognized Callback: " + callbacks[i].getClass().getName());
+            }
+        }
+    }
+}
+
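
For illustration, a minimal sketch of how such an authenticator might be wired to a JMX connector server. The login configuration name ("CassandraLogin"), the config file path and the service URL below are hypothetical; only the two system property names come from the class above.

    import java.lang.management.ManagementFactory;
    import java.util.HashMap;
    import java.util.Map;
    import javax.management.MBeanServer;
    import javax.management.remote.JMXConnectorServer;
    import javax.management.remote.JMXConnectorServerFactory;
    import javax.management.remote.JMXServiceURL;

    import org.apache.cassandra.auth.jmx.AuthenticationProxy;

    public class JmxAuthSketch
    {
        public static void main(String[] args) throws Exception
        {
            // Hypothetical paths/names; the JAAS file would contain an entry called "CassandraLogin"
            System.setProperty("java.security.auth.login.config", "/path/to/cassandra-jaas.config");
            System.setProperty("cassandra.jmx.remote.login.config", "CassandraLogin");

            // Delegate credential checks to the configured JAAS LoginModule via the proxy
            Map<String, Object> env = new HashMap<>();
            env.put(JMXConnectorServer.AUTHENTICATOR,
                    new AuthenticationProxy(System.getProperty("cassandra.jmx.remote.login.config")));

            MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
            JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
            JMXConnectorServer server = JMXConnectorServerFactory.newJMXConnectorServer(url, env, mbs);
            server.start();
        }
    }
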
diff --git a/src/java/org/apache/cassandra/auth/jmx/AuthorizationProxy.java b/src/java/org/apache/cassandra/auth/jmx/AuthorizationProxy.java
new file mode 100644
index 0000000..3bac1f6
--- /dev/null
+++ b/src/java/org/apache/cassandra/auth/jmx/AuthorizationProxy.java
@@ -0,0 +1,512 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.auth.jmx;
+
+import java.lang.reflect.*;
+import java.security.AccessControlContext;
+import java.security.AccessController;
+import java.security.Principal;
+import java.util.Set;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import javax.management.MBeanServer;
+import javax.management.MalformedObjectNameException;
+import javax.management.ObjectName;
+import javax.management.remote.MBeanServerForwarder;
+import javax.security.auth.Subject;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Throwables;
+import com.google.common.collect.ImmutableSet;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.cassandra.auth.*;
+import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.exceptions.ConfigurationException;
+import org.apache.cassandra.service.StorageService;
+import org.apache.cassandra.utils.FBUtilities;
+
+/**
+ * Provides a proxy interface to the platform's MBeanServer instance to perform
+ * role-based authorization on method invocation.
+ *
+ * When used in conjunction with a suitable JMXAuthenticator, which attaches a CassandraPrincipal
+ * to authenticated Subjects, this class uses the configured IAuthorizer to verify that the
+ * subject has the required permissions to execute methods on the MBeanServer and the MBeans it
+ * manages.
+ *
+ * Because an ObjectName may contain wildcards, meaning it represents a set of individual MBeans,
+ * JMX resources don't fit well with the hierarchical approach modelled by other IResource
+ * implementations and utilised by ClientState::ensureHasPermission etc. To enable grants to use
+ * pattern-type ObjectNames, this class performs its own custom matching and filtering of resources
+ * rather than pushing that down to the configured IAuthorizer. To that end, during authorization
+ * it pulls back all permissions for the active subject, filtering them to retain only grants on
+ * JMXResources. It then uses ObjectName::apply to assert whether the target MBeans are wholly
+ * represented by the resources with permissions. This means that it cannot use the PermissionsCache
+ * as IAuthorizer can, so it manages its own cache locally.
+ *
+ * Methods are split into two categories: those which are invoked on the MBeanServer itself
+ * and those which apply to MBean instances. In truth this is somewhat artificial, as *all*
+ * invocations are performed on the MBeanServer instance; the distinction is made here between
+ * those methods which take an ObjectName as their first argument and those which do not.
+ * Invoking a method of the former type, e.g. MBeanServer::getAttribute(ObjectName name, String attribute),
+ * implies that the caller is concerned with a specific MBean. Conversely, invoking a method such as
+ * MBeanServer::getDomains is primarily a function of the MBeanServer itself. This class makes
+ * such a distinction in order to identify which JMXResource the subject requires permissions on.
+ *
+ * Certain operations are never allowed for users; these are recorded in a blacklist so that we
+ * can short-circuit the authorization process if one is attempted by a remote subject.
+ *
+ */
+public class AuthorizationProxy implements InvocationHandler
+{
+    private static final Logger logger = LoggerFactory.getLogger(AuthorizationProxy.class);
+
+    /*
+     A whitelist of permitted methods on the MBeanServer interface which *do not* take an ObjectName
+     as their first argument. These methods can be thought of as relating to the MBeanServer itself,
+     rather than to the MBeans it manages. All of the whitelisted methods are essentially descriptive,
+     hence they require the Subject to have the DESCRIBE permission on the root JMX resource.
+     */
+    private static final Set<String> MBEAN_SERVER_METHOD_WHITELIST = ImmutableSet.of("getDefaultDomain",
+                                                                                     "getDomains",
+                                                                                     "getMBeanCount",
+                                                                                     "hashCode",
+                                                                                     "queryMBeans",
+                                                                                     "queryNames",
+                                                                                     "toString");
+
+    /*
+     A blacklist of method names which are never permitted to be executed by a remote user,
+     regardless of privileges they may be granted.
+     */
+    private static final Set<String> METHOD_BLACKLIST = ImmutableSet.of("createMBean",
+                                                                        "deserialize",
+                                                                        "getClassLoader",
+                                                                        "getClassLoaderFor",
+                                                                        "instantiate",
+                                                                        "registerMBean",
+                                                                        "unregisterMBean");
+
+    private static final JMXPermissionsCache permissionsCache = new JMXPermissionsCache();
+    private MBeanServer mbs;
+
+    /*
+     Used to check whether the Role associated with the authenticated Subject has superuser
+     status. By default, just delegates to Roles::hasSuperuserStatus, but can be overridden for testing.
+     */
+    protected Function<RoleResource, Boolean> isSuperuser = Roles::hasSuperuserStatus;
+
+    /*
+     Used to retrieve the set of all permissions granted to a given role. By default, this fetches
+     the permissions from the local cache, which in turn loads them from the configured IAuthorizer
+     but can be overridden for testing.
+     */
+    protected Function<RoleResource, Set<PermissionDetails>> getPermissions = permissionsCache::get;
+
+    /*
+     Used to decide whether authorization is enabled or not. Usually this depends on the configured
+     IAuthorizer, but can be overridden for testing.
+     */
+    protected Supplier<Boolean> isAuthzRequired = () -> DatabaseDescriptor.getAuthorizer().requireAuthorization();
+
+    /*
+     Used to find matching MBeans when the invocation target is a pattern type ObjectName.
+     Defaults to querying the MBeanServer but can be overridden for testing. See checkPattern for usage.
+     */
+    protected Function<ObjectName, Set<ObjectName>> queryNames = (name) -> mbs.queryNames(name, null);
+
+    /*
+     Used to determine whether auth setup has completed so we know whether to expect the IAuthorizer
+     to be ready. Can be overridden for testing.
+     */
+    protected Supplier<Boolean> isAuthSetupComplete = () -> StorageService.instance.isAuthSetupComplete();
+
+    @Override
+    public Object invoke(Object proxy, Method method, Object[] args)
+            throws Throwable
+    {
+        String methodName = method.getName();
+
+        if ("getMBeanServer".equals(methodName))
+            throw new SecurityException("Access denied");
+
+        // Retrieve Subject from current AccessControlContext
+        AccessControlContext acc = AccessController.getContext();
+        Subject subject = Subject.getSubject(acc);
+
+        // Allow setMBeanServer iff performed on behalf of the connector server itself
+        if (("setMBeanServer").equals(methodName))
+        {
+            if (subject != null)
+                throw new SecurityException("Access denied");
+
+            if (args[0] == null)
+                throw new IllegalArgumentException("Null MBeanServer");
+
+            if (mbs != null)
+                throw new IllegalArgumentException("MBeanServer already initialized");
+
+            mbs = (MBeanServer) args[0];
+            return null;
+        }
+
+        if (authorize(subject, methodName, args))
+            return invoke(method, args);
+
+        throw new SecurityException("Access Denied");
+    }
+
+    /**
+     * Performs the actual authorization of an identified subject to execute a remote method invocation.
+     * @param subject The principal making the execution request. A null value represents a local invocation
+     *                from the JMX connector itself
+     * @param methodName Name of the method being invoked
+     * @param args Array containing the invocation arguments. If the first element is an ObjectName instance, for
+     *             authz purposes we consider this an invocation of an MBean method, otherwise it is treated
+     *             as an invocation of a method on the MBeanServer.
+     */
+    @VisibleForTesting
+    boolean authorize(Subject subject, String methodName, Object[] args)
+    {
+        logger.trace("Authorizing JMX method invocation {} for {}",
+                     methodName,
+                     subject == null ? "" : subject.toString().replaceAll("\\n", " "));
+
+        if (!isAuthSetupComplete.get())
+        {
+            logger.trace("Auth setup is not complete, refusing access");
+            return false;
+        }
+
+        // Permissive authorization is enabled
+        if (!isAuthzRequired.get())
+            return true;
+
+        // Allow operations performed locally on behalf of the connector server itself
+        if (subject == null)
+            return true;
+
+        // Restrict access to certain methods by any remote user
+        if (METHOD_BLACKLIST.contains(methodName))
+        {
+            logger.trace("Access denied to blacklisted method {}", methodName);
+            return false;
+        }
+
+        // Reject if the user has not authenticated
+        Set<Principal> principals = subject.getPrincipals();
+        if (principals == null || principals.isEmpty())
+            return false;
+
+        // Currently, we assume that the first Principal returned from the Subject
+        // is the one to use for authorization. It would be good to make this more
+        // robust, but we have no control over which Principals a given LoginModule
+        // might choose to associate with the Subject following successful authentication
+        RoleResource userResource = RoleResource.role(principals.iterator().next().getName());
+        // A role with superuser status can do anything
+        if (isSuperuser.apply(userResource))
+            return true;
+
+        // The method being invoked may be a method on an MBean, or it could belong
+        // to the MBeanServer itself
+        if (args != null && args[0] instanceof ObjectName)
+            return authorizeMBeanMethod(userResource, methodName, args);
+        else
+            return authorizeMBeanServerMethod(userResource, methodName);
+    }
+
+    /**
+     * Authorize execution of a method on the MBeanServer which does not take an MBean ObjectName
+     * as its first argument. The whitelisted methods that meet this criterion are generally
+     * descriptive methods concerned with the MBeanServer itself, rather than with any particular
+     * set of MBeans managed by the server and so we check the DESCRIBE permission on the root
+     * JMXResource (representing the MBeanServer)
+     *
+     * @param subject the role of the authenticated subject
+     * @param methodName name of the MBeanServer method being invoked
+     *
+     * @return true if the method is whitelisted and the subject has the DESCRIBE
+     *         permission on the root JMXResource; false otherwise
+     */
+    private boolean authorizeMBeanServerMethod(RoleResource subject, String methodName)
+    {
+        logger.trace("JMX invocation of {} on MBeanServer requires permission {}", methodName, Permission.DESCRIBE);
+        return (MBEAN_SERVER_METHOD_WHITELIST.contains(methodName) &&
+            hasPermission(subject, Permission.DESCRIBE, JMXResource.root()));
+    }
+
+    /**
+     * Authorize execution of a method on an MBean (or set of MBeans) which may be
+     * managed by the MBeanServer. Note that this also includes the queryMBeans and queryNames
+     * methods of MBeanServer as those both take an ObjectName (possibly a pattern containing
+     * wildcards) as their first argument. Both of those methods also accept null arguments,
+     * in which case they are handled by authorizeMBeanServerMethod.
+     *
+     * @param role the role of the authenticated subject
+     * @param methodName name of the MBean method being invoked
+     * @param args the invocation arguments; the first element is the target ObjectName
+     *
+     * @return true if the subject has been granted the required permission on JMXResources
+     *         covering the target MBean(s); false otherwise
+     */
+    private boolean authorizeMBeanMethod(RoleResource role, String methodName, Object[] args)
+    {
+        ObjectName targetBean = (ObjectName)args[0];
+
+        // work out which permission we need to execute the method being called on the mbean
+        Permission requiredPermission = getRequiredPermission(methodName);
+        if (null == requiredPermission)
+            return false;
+
+        logger.trace("JMX invocation of {} on {} requires permission {}", methodName, targetBean, requiredPermission);
+
+        // find any JMXResources upon which the authenticated subject has been granted the
+        // required permission. We'll do ObjectName-specific filtering & matching of resources later
+        Set<JMXResource> permittedResources = getPermittedResources(role, requiredPermission);
+
+        if (permittedResources.isEmpty())
+            return false;
+
+        // finally, check the JMXResource from the grants to see if we have either
+        // an exact match or a wildcard match for the target resource, whichever is
+        // applicable
+        return targetBean.isPattern()
+                ? checkPattern(targetBean, permittedResources)
+                : checkExact(targetBean, permittedResources);
+    }
+
+    /**
+     * Get any grants of the required permission for the authenticated subject, regardless
+     * of the resource the permission applies to as we'll do the filtering & matching in
+     * the calling method
+     * @param subject
+     * @param required
+     * @return the set of JMXResources upon which the subject has been granted the required permission
+     */
+    private Set<JMXResource> getPermittedResources(RoleResource subject, Permission required)
+    {
+        return getPermissions.apply(subject)
+               .stream()
+               .filter(details -> details.permission == required)
+               .map(details -> (JMXResource)details.resource)
+               .collect(Collectors.toSet());
+    }
+
+    /**
+     * Check whether a required permission has been granted to the authenticated subject on a specific resource
+     * @param subject
+     * @param permission
+     * @param resource
+     * @return true if the Subject has been granted the required permission on the specified resource; false otherwise
+     */
+    private boolean hasPermission(RoleResource subject, Permission permission, JMXResource resource)
+    {
+        return getPermissions.apply(subject)
+               .stream()
+               .anyMatch(details -> details.permission == permission && details.resource.equals(resource));
+    }
+
+    /**
+     * Given a set of JMXResources upon which the Subject has been granted a particular permission,
+     * check whether any match the pattern-type ObjectName representing the target of the method
+     * invocation. At this point, we are sure that whatever the required permission, the Subject
+     * has definitely been granted it against this set of JMXResources. The job of this method is
+     * only to verify that the target of the invocation is covered by the members of the set.
+     *
+     * @param target
+     * @param permittedResources
+     * @return true if all registered beans which match the target can also be matched by the
+     *         JMXResources the subject has been granted permissions on; false otherwise
+     */
+    private boolean checkPattern(ObjectName target, Set<JMXResource> permittedResources)
+    {
+        // if the required permission was granted on the root JMX resource, then we're done
+        if (permittedResources.contains(JMXResource.root()))
+            return true;
+
+        // Get the full set of beans which match the target pattern
+        Set<ObjectName> targetNames = queryNames.apply(target);
+
+        // Iterate over the resources the permission has been granted on. Some of these may
+        // be patterns, so query the server to retrieve the full list of matching names and
+        // remove those from the target set. Once the target set is empty (i.e. all required
+        // matches have been satisfied), the requirement is met.
+        // If there are still unsatisfied targets after all the JMXResources have been processed,
+        // there are insufficient grants to permit the operation.
+        for (JMXResource resource : permittedResources)
+        {
+            try
+            {
+                Set<ObjectName> matchingNames = queryNames.apply(ObjectName.getInstance(resource.getObjectName()));
+                targetNames.removeAll(matchingNames);
+                if (targetNames.isEmpty())
+                    return true;
+            }
+            catch (MalformedObjectNameException e)
+            {
+                logger.warn("Permissions for JMX resource contains invalid ObjectName {}", resource.getObjectName());
+            }
+        }
+
+        logger.trace("Subject does not have sufficient permissions on all MBeans matching the target pattern {}", target);
+        return false;
+    }
+
+    /**
+     * Given a set of JMXResources upon which the Subject has been granted a particular permission,
+     * check whether any match the ObjectName representing the target of the method invocation.
+     * At this point, we are sure that whatever the required permission, the Subject has definitely
+     * been granted it against this set of JMXResources. The job of this method is only to verify
+     * that the target of the invocation is matched by a member of the set.
+     *
+     * @param target
+     * @param permittedResources
+     * @return true if at least one of the permitted resources matches the target; false otherwise
+     */
+    private boolean checkExact(ObjectName target, Set<JMXResource> permittedResources)
+    {
+        // if the required permission was granted on the root JMX resource, then we're done
+        if (permittedResources.contains(JMXResource.root()))
+            return true;
+
+        for (JMXResource resource : permittedResources)
+        {
+            try
+            {
+                if (ObjectName.getInstance(resource.getObjectName()).apply(target))
+                    return true;
+            }
+            catch (MalformedObjectNameException e)
+            {
+                logger.warn("Permissions for JMX resource contains invalid ObjectName {}", resource.getObjectName());
+            }
+        }
+
+        logger.trace("Subject does not have sufficient permissions on target MBean {}", target);
+        return false;
+    }
+
+    /**
+     * Mapping between method names and the permission required to invoke them. Note, these
+     * names refer to methods on MBean instances invoked via the MBeanServer.
+     * @param methodName
+     * @return the permission required to invoke the named method, or null if the method does not map to any permission
+     */
+    private static Permission getRequiredPermission(String methodName)
+    {
+        switch (methodName)
+        {
+            case "getAttribute":
+            case "getAttributes":
+                return Permission.SELECT;
+            case "setAttribute":
+            case "setAttributes":
+                return Permission.MODIFY;
+            case "invoke":
+                return Permission.EXECUTE;
+            case "getInstanceOf":
+            case "getMBeanInfo":
+            case "hashCode":
+            case "isInstanceOf":
+            case "isRegistered":
+            case "queryMBeans":
+            case "queryNames":
+                return Permission.DESCRIBE;
+            default:
+                logger.debug("Access denied, method name {} does not map to any defined permission", methodName);
+                return null;
+        }
+    }
+
+    /**
+     * Invoke a method on the MBeanServer instance. This is called when authorization is not required (because
+     * AllowAllAuthorizer is configured, or because the invocation is being performed by the JMXConnector
+     * itself rather than by a connected client), and also when a call from an authenticated subject
+     * has been successfully authorized
+     *
+     * @param method
+     * @param args
+     * @return the result of the method invocation
+     * @throws Throwable
+     */
+    private Object invoke(Method method, Object[] args) throws Throwable
+    {
+        try
+        {
+            return method.invoke(mbs, args);
+        }
+        catch (InvocationTargetException e) //Catch any exception that might have been thrown by the mbeans
+        {
+            Throwable t = e.getCause(); //Throw the exception that nodetool etc expects
+            throw t;
+        }
+    }
+
+    /**
+     * Query the configured IAuthorizer for the set of all permissions granted on JMXResources to a specific subject
+     * @param subject
+     * @return All permissions granted to the specified subject (including those transitively inherited from
+     *         any roles the subject has been granted), filtered to include only permissions granted on
+     *         JMXResources
+     */
+    private static Set<PermissionDetails> loadPermissions(RoleResource subject)
+    {
+        // get all permissions for the specified subject. We'll cache them as it's likely
+        // we'll receive multiple lookups for the same subject (but for different resources
+        // and permissions) in quick succession
+        return DatabaseDescriptor.getAuthorizer().list(AuthenticatedUser.SYSTEM_USER, Permission.ALL, null, subject)
+                                                 .stream()
+                                                 .filter(details -> details.resource instanceof JMXResource)
+                                                 .collect(Collectors.toSet());
+    }
+
+    private static final class JMXPermissionsCache extends AuthCache<RoleResource, Set<PermissionDetails>>
+    {
+        protected JMXPermissionsCache()
+        {
+            super("JMXPermissionsCache",
+                  DatabaseDescriptor::setPermissionsValidity,
+                  DatabaseDescriptor::getPermissionsValidity,
+                  DatabaseDescriptor::setPermissionsUpdateInterval,
+                  DatabaseDescriptor::getPermissionsUpdateInterval,
+                  DatabaseDescriptor::setPermissionsCacheMaxEntries,
+                  DatabaseDescriptor::getPermissionsCacheMaxEntries,
+                  AuthorizationProxy::loadPermissions,
+                  () -> true);
+        }
+
+        public Set<PermissionDetails> get(RoleResource roleResource)
+        {
+            try
+            {
+                return super.get(roleResource);
+            }
+            catch (Exception e)
+            {
+                // because the outer class uses this method as Function<RoleResource, Set<PermissionDetails>>,
+                // which can be overridden for testing, it cannot throw checked exceptions. So here we simply
+                // use guava's propagation helper.
+                throw Throwables.propagate(e);
+            }
+        }
+    }
+}
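
To make the wildcard handling described in the class comment concrete, here is a standalone sketch of the matching primitive used by checkExact and checkPattern: ObjectName::apply tests whether a granted (possibly pattern-type) name covers the concrete target of an invocation. The object names below are illustrative only.

    import javax.management.MalformedObjectNameException;
    import javax.management.ObjectName;

    public class ObjectNameMatchSketch
    {
        public static void main(String[] args) throws MalformedObjectNameException
        {
            // A grant recorded against a pattern-type JMXResource (illustrative name)
            ObjectName granted = ObjectName.getInstance("org.apache.cassandra.db:type=Tables,*");

            // The concrete MBean targeted by the JMX invocation
            ObjectName target = ObjectName.getInstance("org.apache.cassandra.db:type=Tables,keyspace=ks1,table=t1");

            // apply() is true when the (pattern) name on the left matches the concrete name on the right
            System.out.println(granted.apply(target));   // true
            System.out.println(target.apply(granted));   // false: apply() never matches a pattern argument
        }
    }
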
diff --git a/src/java/org/apache/cassandra/cache/AutoSavingCache.java b/src/java/org/apache/cassandra/cache/AutoSavingCache.java
index e39dcf1..cb2ad8a 100644
--- a/src/java/org/apache/cassandra/cache/AutoSavingCache.java
+++ b/src/java/org/apache/cassandra/cache/AutoSavingCache.java
@@ -77,11 +77,18 @@
      * a minor version letter.
      *
      * Sticking with "d" is fine for 3.0 since it has never been released or used by another version
+     *
+     * "e" introduced with CASSANDRA-11206, omits IndexInfo from key-cache, stores offset into index-file
      */
-    private static final String CURRENT_VERSION = "d";
+    private static final String CURRENT_VERSION = "e";
 
     private static volatile IStreamFactory streamFactory = new IStreamFactory()
     {
+        private final SequentialWriterOption writerOption = SequentialWriterOption.newBuilder()
+                                                                    .trickleFsync(DatabaseDescriptor.getTrickleFsync())
+                                                                    .trickleFsyncByteInterval(DatabaseDescriptor.getTrickleFsyncIntervalInKb() * 1024)
+                                                                    .finishOnClose(true).build();
+
         public InputStream getInputStream(File dataPath, File crcPath) throws IOException
         {
             return new ChecksummedRandomAccessReader.Builder(dataPath, crcPath).build();
@@ -89,7 +96,7 @@
 
         public OutputStream getOutputStream(File dataPath, File crcPath)
         {
-            return SequentialWriter.open(dataPath, crcPath).finishOnClose();
+            return new ChecksummedSequentialWriter(dataPath, crcPath, null, writerOption);
         }
     };
 
@@ -152,7 +159,7 @@
         ListenableFuture<Integer> cacheLoad = es.submit(new Callable<Integer>()
         {
             @Override
-            public Integer call() throws Exception
+            public Integer call()
             {
                 return loadSaved();
             }
diff --git a/src/java/org/apache/cassandra/cache/CacheSize.java b/src/java/org/apache/cassandra/cache/CacheSize.java
new file mode 100644
index 0000000..71365bb
--- /dev/null
+++ b/src/java/org/apache/cassandra/cache/CacheSize.java
@@ -0,0 +1,34 @@
+/*
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+package org.apache.cassandra.cache;
+
+public interface CacheSize
+{
+
+    long capacity();
+
+    void setCapacity(long capacity);
+
+    int size();
+
+    long weightedSize();
+
+}
diff --git a/src/java/org/apache/cassandra/cache/ChunkCache.java b/src/java/org/apache/cassandra/cache/ChunkCache.java
new file mode 100644
index 0000000..e6296bd
--- /dev/null
+++ b/src/java/org/apache/cassandra/cache/ChunkCache.java
@@ -0,0 +1,322 @@
+/*
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+package org.apache.cassandra.cache;
+
+import java.nio.ByteBuffer;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Throwables;
+import com.google.common.collect.Iterables;
+import com.google.common.util.concurrent.MoreExecutors;
+
+import com.github.benmanes.caffeine.cache.*;
+import com.codahale.metrics.Timer;
+import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.io.compress.BufferType;
+import org.apache.cassandra.io.sstable.CorruptSSTableException;
+import org.apache.cassandra.io.util.*;
+import org.apache.cassandra.metrics.CacheMissMetrics;
+import org.apache.cassandra.utils.memory.BufferPool;
+
+public class ChunkCache 
+        implements CacheLoader<ChunkCache.Key, ChunkCache.Buffer>, RemovalListener<ChunkCache.Key, ChunkCache.Buffer>, CacheSize
+{
+    public static final int RESERVED_POOL_SPACE_IN_MB = 32;
+    public static final long cacheSize = 1024L * 1024L * Math.max(0, DatabaseDescriptor.getFileCacheSizeInMB() - RESERVED_POOL_SPACE_IN_MB);
+
+    private static boolean enabled = cacheSize > 0;
+    public static final ChunkCache instance = enabled ? new ChunkCache() : null;
+
+    private final LoadingCache<Key, Buffer> cache;
+    public final CacheMissMetrics metrics;
+
+    static class Key
+    {
+        final ChunkReader file;
+        final String path;
+        final long position;
+
+        public Key(ChunkReader file, long position)
+        {
+            super();
+            this.file = file;
+            this.position = position;
+            this.path = file.channel().filePath();
+        }
+
+        public int hashCode()
+        {
+            final int prime = 31;
+            int result = 1;
+            result = prime * result + path.hashCode();
+            result = prime * result + file.getClass().hashCode();
+            result = prime * result + Long.hashCode(position);
+            return result;
+        }
+
+        public boolean equals(Object obj)
+        {
+            if (this == obj)
+                return true;
+            if (obj == null)
+                return false;
+
+            Key other = (Key) obj;
+            return (position == other.position)
+                    && file.getClass() == other.file.getClass()
+                    && path.equals(other.path);
+        }
+    }
+
+    static class Buffer implements Rebufferer.BufferHolder
+    {
+        private final ByteBuffer buffer;
+        private final long offset;
+        private final AtomicInteger references;
+
+        public Buffer(ByteBuffer buffer, long offset)
+        {
+            this.buffer = buffer;
+            this.offset = offset;
+            references = new AtomicInteger(1);  // start referenced.
+        }
+
+        Buffer reference()
+        {
+            int refCount;
+            do
+            {
+                refCount = references.get();
+                if (refCount == 0)
+                    // Buffer was released before we managed to reference it. 
+                    return null;
+            } while (!references.compareAndSet(refCount, refCount + 1));
+
+            return this;
+        }
+
+        @Override
+        public ByteBuffer buffer()
+        {
+            assert references.get() > 0;
+            return buffer.duplicate();
+        }
+
+        @Override
+        public long offset()
+        {
+            return offset;
+        }
+
+        @Override
+        public void release()
+        {
+            if (references.decrementAndGet() == 0)
+                BufferPool.put(buffer);
+        }
+    }
+
+    public ChunkCache()
+    {
+        cache = Caffeine.newBuilder()
+                .maximumWeight(cacheSize)
+                .executor(MoreExecutors.directExecutor())
+                .weigher((key, buffer) -> ((Buffer) buffer).buffer.capacity())
+                .removalListener(this)
+                .build(this);
+        metrics = new CacheMissMetrics("ChunkCache", this);
+    }
+
+    @Override
+    public Buffer load(Key key) throws Exception
+    {
+        ChunkReader rebufferer = key.file;
+        metrics.misses.mark();
+        try (Timer.Context ctx = metrics.missLatency.time())
+        {
+            ByteBuffer buffer = BufferPool.get(key.file.chunkSize(), key.file.preferredBufferType());
+            assert buffer != null;
+            rebufferer.readChunk(key.position, buffer);
+            return new Buffer(buffer, key.position);
+        }
+    }
+
+    @Override
+    public void onRemoval(Key key, Buffer buffer, RemovalCause cause)
+    {
+        buffer.release();
+    }
+
+    public void close()
+    {
+        cache.invalidateAll();
+    }
+
+    public RebuffererFactory wrap(ChunkReader file)
+    {
+        return new CachingRebufferer(file);
+    }
+
+    public static RebuffererFactory maybeWrap(ChunkReader file)
+    {
+        if (!enabled)
+            return file;
+
+        return instance.wrap(file);
+    }
+
+    public void invalidatePosition(SegmentedFile dfile, long position)
+    {
+        if (!(dfile.rebuffererFactory() instanceof CachingRebufferer))
+            return;
+
+        ((CachingRebufferer) dfile.rebuffererFactory()).invalidate(position);
+    }
+
+    public void invalidateFile(String fileName)
+    {
+        cache.invalidateAll(Iterables.filter(cache.asMap().keySet(), x -> x.path.equals(fileName)));
+    }
+
+    @VisibleForTesting
+    public void enable(boolean enabled)
+    {
+        ChunkCache.enabled = enabled;
+        cache.invalidateAll();
+        metrics.reset();
+    }
+
+    // TODO: Invalidate caches for obsoleted/MOVED_START tables?
+
+    /**
+     * Rebufferer providing cached chunks where data is obtained from the specified ChunkReader.
+     * Thread-safe. One instance per SegmentedFile, created by ChunkCache.maybeWrap if the cache is enabled.
+     */
+    class CachingRebufferer implements Rebufferer, RebuffererFactory
+    {
+        private final ChunkReader source;
+        final long alignmentMask;
+
+        public CachingRebufferer(ChunkReader file)
+        {
+            source = file;
+            int chunkSize = file.chunkSize();
+            assert Integer.bitCount(chunkSize) == 1;    // Must be power of two
+            alignmentMask = -chunkSize;
+        }
+
+        @Override
+        public Buffer rebuffer(long position)
+        {
+            try
+            {
+                metrics.requests.mark();
+                long pageAlignedPos = position & alignmentMask;
+                Buffer buf;
+                do
+                    buf = cache.get(new Key(source, pageAlignedPos)).reference();
+                while (buf == null);
+
+                return buf;
+            }
+            catch (Throwable t)
+            {
+                Throwables.propagateIfInstanceOf(t.getCause(), CorruptSSTableException.class);
+                throw Throwables.propagate(t);
+            }
+        }
+
+        public void invalidate(long position)
+        {
+            long pageAlignedPos = position & alignmentMask;
+            cache.invalidate(new Key(source, pageAlignedPos));
+        }
+
+        @Override
+        public Rebufferer instantiateRebufferer()
+        {
+            return this;
+        }
+
+        @Override
+        public void close()
+        {
+            source.close();
+        }
+
+        @Override
+        public void closeReader()
+        {
+            // Instance is shared among readers. Nothing to release.
+        }
+
+        @Override
+        public ChannelProxy channel()
+        {
+            return source.channel();
+        }
+
+        @Override
+        public long fileLength()
+        {
+            return source.fileLength();
+        }
+
+        @Override
+        public double getCrcCheckChance()
+        {
+            return source.getCrcCheckChance();
+        }
+
+        @Override
+        public String toString()
+        {
+            return "CachingRebufferer:" + source.toString();
+        }
+    }
+
+    @Override
+    public long capacity()
+    {
+        return cacheSize;
+    }
+
+    @Override
+    public void setCapacity(long capacity)
+    {
+        throw new UnsupportedOperationException("Chunk cache size cannot be changed.");
+    }
+
+    @Override
+    public int size()
+    {
+        return cache.asMap().size();
+    }
+
+    @Override
+    public long weightedSize()
+    {
+        return cache.policy().eviction()
+                .map(policy -> policy.weightedSize().orElseGet(cache::estimatedSize))
+                .orElseGet(cache::estimatedSize);
+    }
+}
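
A small sketch of the page-alignment arithmetic used by CachingRebufferer above: because the chunk size is a power of two, its negation is a mask that rounds any file position down to the containing chunk boundary, which is what makes the (reader, aligned position) pair a stable cache key. The numbers are arbitrary.

    public class ChunkAlignmentSketch
    {
        public static void main(String[] args)
        {
            int chunkSize = 65536;                          // must be a power of two
            assert Integer.bitCount(chunkSize) == 1;
            long alignmentMask = -chunkSize;                // ...FFFF0000 for 64KiB chunks

            long position = 1_234_567L;                     // arbitrary read offset within the file
            long pageAlignedPos = position & alignmentMask; // 1179648, i.e. 18 * 65536

            System.out.println(pageAlignedPos);
            System.out.println(position - pageAlignedPos);  // offset of the read within the cached chunk
        }
    }
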
diff --git a/src/java/org/apache/cassandra/cache/ICache.java b/src/java/org/apache/cassandra/cache/ICache.java
index 37b55cd..7ca6b2e 100644
--- a/src/java/org/apache/cassandra/cache/ICache.java
+++ b/src/java/org/apache/cassandra/cache/ICache.java
@@ -24,12 +24,8 @@
  * and does not require put or remove to return values, which lets SerializingCache
  * be more efficient by avoiding deserialize except on get.
  */
-public interface ICache<K, V>
+public interface ICache<K, V> extends CacheSize
 {
-    public long capacity();
-
-    public void setCapacity(long capacity);
-
     public void put(K key, V value);
 
     public boolean putIfAbsent(K key, V value);
@@ -40,10 +36,6 @@
 
     public void remove(K key);
 
-    public int size();
-
-    public long weightedSize();
-
     public void clear();
 
     public Iterator<K> keyIterator();
diff --git a/src/java/org/apache/cassandra/cache/SerializingCache.java b/src/java/org/apache/cassandra/cache/SerializingCache.java
index 3651a0c..0ece686 100644
--- a/src/java/org/apache/cassandra/cache/SerializingCache.java
+++ b/src/java/org/apache/cassandra/cache/SerializingCache.java
@@ -31,6 +31,7 @@
 import org.apache.cassandra.io.util.MemoryInputStream;
 import org.apache.cassandra.io.util.MemoryOutputStream;
 import org.apache.cassandra.io.util.WrappedDataOutputStreamPlus;
+import org.apache.cassandra.utils.FBUtilities;
 
 /**
  * Serializes cache values off-heap.
@@ -99,7 +100,7 @@
     {
         long serializedSize = serializer.serializedSize(value);
         if (serializedSize > Integer.MAX_VALUE)
-            throw new IllegalArgumentException("Unable to allocate " + serializedSize + " bytes");
+            throw new IllegalArgumentException(String.format("Unable to allocate %s", FBUtilities.prettyPrintMemory(serializedSize)));
 
         RefCountedMemory freeableMemory;
         try
diff --git a/src/java/org/apache/cassandra/concurrent/ExecutorLocal.java b/src/java/org/apache/cassandra/concurrent/ExecutorLocal.java
index 47826f3..367dc7c 100644
--- a/src/java/org/apache/cassandra/concurrent/ExecutorLocal.java
+++ b/src/java/org/apache/cassandra/concurrent/ExecutorLocal.java
@@ -27,7 +27,7 @@
     ExecutorLocal[] all = { Tracing.instance, ClientWarn.instance };
 
     /**
-     * This is called when scheduling the task, and also before calling {@link ExecutorLocal#set(T)} when running on a
+     * This is called when scheduling the task, and also before calling {@link #set(Object)} when running on a
      * executor thread.
      *
      * @return The thread-local value that we want to copy across executor boundaries; may be null if not set.
diff --git a/src/java/org/apache/cassandra/concurrent/NamedThreadFactory.java b/src/java/org/apache/cassandra/concurrent/NamedThreadFactory.java
index 33c80d5..85edf74 100644
--- a/src/java/org/apache/cassandra/concurrent/NamedThreadFactory.java
+++ b/src/java/org/apache/cassandra/concurrent/NamedThreadFactory.java
@@ -20,6 +20,8 @@
 import java.util.concurrent.ThreadFactory;
 import java.util.concurrent.atomic.AtomicInteger;
 
+import io.netty.util.concurrent.FastThreadLocalThread;
+
 /**
  * This class is an implementation of the <i>ThreadFactory</i> interface. This
  * is useful to give Java threads meaningful names which is useful when using
@@ -55,7 +57,7 @@
     public Thread newThread(Runnable runnable)
     {
         String name = id + ":" + n.getAndIncrement();
-        Thread thread = new Thread(threadGroup, runnable, name);
+        Thread thread = new FastThreadLocalThread(threadGroup, runnable, name);
         thread.setPriority(priority);
         thread.setDaemon(true);
         if (contextClassLoader != null)
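
The point of creating FastThreadLocalThread instances here is that code running on such threads can use Netty's FastThreadLocal, which resolves values through an indexed slot rather than the JDK's hash-probed ThreadLocalMap. A minimal sketch (the class and field names are made up):

    import io.netty.util.concurrent.FastThreadLocal;
    import io.netty.util.concurrent.FastThreadLocalThread;

    public class FastThreadLocalSketch
    {
        // Accessed via an indexed slot when the calling thread is a FastThreadLocalThread
        private static final FastThreadLocal<StringBuilder> BUFFER = new FastThreadLocal<StringBuilder>()
        {
            @Override
            protected StringBuilder initialValue()
            {
                return new StringBuilder();
            }
        };

        public static void main(String[] args) throws InterruptedException
        {
            Thread t = new FastThreadLocalThread(() ->
            {
                StringBuilder sb = BUFFER.get();   // fast path: no ThreadLocalMap probing
                sb.append("hello");
                System.out.println(sb);
            }, "example-worker");
            t.start();
            t.join();
        }
    }
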
diff --git a/src/java/org/apache/cassandra/concurrent/SEPExecutor.java b/src/java/org/apache/cassandra/concurrent/SEPExecutor.java
index 8b12b82..c87614b 100644
--- a/src/java/org/apache/cassandra/concurrent/SEPExecutor.java
+++ b/src/java/org/apache/cassandra/concurrent/SEPExecutor.java
@@ -34,6 +34,7 @@
     private final SharedExecutorPool pool;
 
     public final int maxWorkers;
+    public final String name;
     private final int maxTasksQueued;
     private final SEPMetrics metrics;
 
@@ -55,6 +56,7 @@
     SEPExecutor(SharedExecutorPool pool, int maxWorkers, int maxTasksQueued, String jmxPath, String name)
     {
         this.pool = pool;
+        this.name = name;
         this.maxWorkers = maxWorkers;
         this.maxTasksQueued = maxTasksQueued;
         this.permits.set(combine(0, maxWorkers));
diff --git a/src/java/org/apache/cassandra/concurrent/SEPWorker.java b/src/java/org/apache/cassandra/concurrent/SEPWorker.java
index 3b3e7ad..b3f817a 100644
--- a/src/java/org/apache/cassandra/concurrent/SEPWorker.java
+++ b/src/java/org/apache/cassandra/concurrent/SEPWorker.java
@@ -24,11 +24,13 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
+import io.netty.util.concurrent.FastThreadLocalThread;
 import org.apache.cassandra.utils.JVMStabilityInspector;
 
 final class SEPWorker extends AtomicReference<SEPWorker.Work> implements Runnable
 {
     private static final Logger logger = LoggerFactory.getLogger(SEPWorker.class);
+    private static final boolean SET_THREAD_NAME = Boolean.parseBoolean(System.getProperty("cassandra.set_sep_thread_name", "true"));
 
     final Long workerId;
     final Thread thread;
@@ -45,7 +47,7 @@
     {
         this.pool = pool;
         this.workerId = workerId;
-        thread = new Thread(this, pool.poolName + "-Worker-" + workerId);
+        thread = new FastThreadLocalThread(this, pool.poolName + "-Worker-" + workerId);
         thread.setDaemon(true);
         set(initialState);
         thread.start();
@@ -88,6 +90,8 @@
                 assigned = get().assigned;
                 if (assigned == null)
                     continue;
+                if (SET_THREAD_NAME)
+                    Thread.currentThread().setName(assigned.name + "-" + workerId);
                 task = assigned.tasks.poll();
 
                 // if we do have tasks assigned, nobody will change our state so we can simply set it to WORKING
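
Because the renaming above is gated on a system property that defaults to true, an operator who prefers the static poolName + "-Worker-" + workerId names assigned at construction could presumably disable it at startup with:

    -Dcassandra.set_sep_thread_name=false
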
diff --git a/src/java/org/apache/cassandra/concurrent/ScheduledExecutors.java b/src/java/org/apache/cassandra/concurrent/ScheduledExecutors.java
index 5935669..35469cc 100644
--- a/src/java/org/apache/cassandra/concurrent/ScheduledExecutors.java
+++ b/src/java/org/apache/cassandra/concurrent/ScheduledExecutors.java
@@ -23,6 +23,11 @@
 public class ScheduledExecutors
 {
     /**
+     * This pool is used for periodic fast (sub-microsecond) tasks.
+     */
+    public static final DebuggableScheduledThreadPoolExecutor scheduledFastTasks = new DebuggableScheduledThreadPoolExecutor("ScheduledFastTasks");
+
+    /**
      * This pool is used for periodic short (sub-second) tasks.
      */
      public static final DebuggableScheduledThreadPoolExecutor scheduledTasks = new DebuggableScheduledThreadPoolExecutor("ScheduledTasks");
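
As a usage sketch only (the task and interval are invented), work could be placed on the new pool like on any other ScheduledThreadPoolExecutor:

    import java.util.concurrent.TimeUnit;

    import org.apache.cassandra.concurrent.ScheduledExecutors;

    public class FastTaskSketch
    {
        public static void main(String[] args)
        {
            // Hypothetical lightweight recurring check; DebuggableScheduledThreadPoolExecutor
            // extends ScheduledThreadPoolExecutor, so the standard scheduling methods apply.
            ScheduledExecutors.scheduledFastTasks.scheduleWithFixedDelay(
                () -> System.nanoTime(), 0, 10, TimeUnit.MILLISECONDS);
        }
    }
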
diff --git a/src/java/org/apache/cassandra/concurrent/StageManager.java b/src/java/org/apache/cassandra/concurrent/StageManager.java
index a201e78..64abf00 100644
--- a/src/java/org/apache/cassandra/concurrent/StageManager.java
+++ b/src/java/org/apache/cassandra/concurrent/StageManager.java
@@ -112,13 +112,7 @@
         }
     }
 
-    public final static Runnable NO_OP_TASK = new Runnable()
-    {
-        public void run()
-        {
-
-        }
-    };
+    public final static Runnable NO_OP_TASK = () -> {};
 
     /**
      * A TPE that disallows submit so that we don't need to worry about unwrapping exceptions on the
diff --git a/src/java/org/apache/cassandra/config/CFMetaData.java b/src/java/org/apache/cassandra/config/CFMetaData.java
index 5678ada..4de4f7b 100644
--- a/src/java/org/apache/cassandra/config/CFMetaData.java
+++ b/src/java/org/apache/cassandra/config/CFMetaData.java
@@ -24,6 +24,7 @@
 import java.util.*;
 import java.util.concurrent.ThreadLocalRandom;
 import java.util.concurrent.TimeUnit;
+import java.util.regex.Pattern;
 import java.util.stream.Collectors;
 
 import com.google.common.annotations.VisibleForTesting;
@@ -38,6 +39,7 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
+import org.apache.cassandra.auth.DataResource;
 import org.apache.cassandra.cql3.ColumnIdentifier;
 import org.apache.cassandra.cql3.QueryProcessor;
 import org.apache.cassandra.cql3.statements.CFStatement;
@@ -65,6 +67,8 @@
         SUPER, COUNTER, DENSE, COMPOUND
     }
 
+    private static final Pattern PATTERN_WORD_CHARS = Pattern.compile("\\w+");
+
     private static final Logger logger = LoggerFactory.getLogger(CFMetaData.class);
 
     public static final Serializer serializer = new Serializer();
@@ -115,6 +119,8 @@
     // for those tables in practice).
     private volatile ColumnDefinition compactValueColumn;
 
+    public final DataResource resource;
+
     /*
      * All of these methods will go away once CFMetaData becomes completely immutable.
      */
@@ -280,11 +286,15 @@
         // A compact table should always have a clustering
         assert isCQLTable() || !clusteringColumns.isEmpty() : String.format("For table %s.%s, isDense=%b, isCompound=%b, clustering=%s", ksName, cfName, isDense, isCompound, clusteringColumns);
 
+        // All tables should have a partition key
+        assert !partitionKeyColumns.isEmpty() : String.format("Have no partition keys for table %s.%s", ksName, cfName);
+
         this.partitionKeyColumns = partitionKeyColumns;
         this.clusteringColumns = clusteringColumns;
         this.partitionColumns = partitionColumns;
 
         this.serializers = new Serializers(this);
+        this.resource = DataResource.table(ksName, cfName);
         rebuild();
     }
 
@@ -414,7 +424,7 @@
 
     /**
      * Generates deterministic UUID from keyspace/columnfamily name pair.
-     * This is used to generate the same UUID for C* version < 2.1
+     * This is used to generate the same UUID for {@code C* version < 2.1}
      *
      * Since 2.1, this is only used for system columnfamilies and tests.
      */
@@ -793,7 +803,7 @@
         className = className.contains(".") ? className : "org.apache.cassandra.db.compaction." + className;
         Class<AbstractCompactionStrategy> strategyClass = FBUtilities.classForName(className, "compaction strategy");
         if (!AbstractCompactionStrategy.class.isAssignableFrom(strategyClass))
-            throw new ConfigurationException(String.format("Specified compaction strategy class (%s) is not derived from AbstractReplicationStrategy", className));
+            throw new ConfigurationException(String.format("Specified compaction strategy class (%s) is not derived from AbstractCompactionStrategy", className));
 
         return strategyClass;
     }
@@ -830,9 +840,9 @@
         return columnMetadata.get(name);
     }
 
-    public static boolean isNameValid(String name)
-    {
-        return name != null && !name.isEmpty() && name.length() <= Schema.NAME_LENGTH && name.matches("\\w+");
+    public static boolean isNameValid(String name)
+    {
+        return name != null && !name.isEmpty() && name.length() <= Schema.NAME_LENGTH && PATTERN_WORD_CHARS.matcher(name).matches();
     }
 
     public CFMetaData validate() throws ConfigurationException
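
The precompiled pattern introduced above simply avoids recompiling the regular expression on every isNameValid call; a minimal comparison (the sample name is arbitrary):

    import java.util.regex.Pattern;

    public class NameValidationSketch
    {
        private static final Pattern WORD_CHARS = Pattern.compile("\\w+");

        public static void main(String[] args)
        {
            String name = "my_table1";

            boolean recompiled = name.matches("\\w+");              // compiles the pattern on every call
            boolean reused = WORD_CHARS.matcher(name).matches();    // reuses the pattern compiled once

            System.out.println(recompiled + " " + reused);          // true true
        }
    }
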
diff --git a/src/java/org/apache/cassandra/config/ColumnDefinition.java b/src/java/org/apache/cassandra/config/ColumnDefinition.java
index 6bcc2e0..713d684 100644
--- a/src/java/org/apache/cassandra/config/ColumnDefinition.java
+++ b/src/java/org/apache/cassandra/config/ColumnDefinition.java
@@ -22,15 +22,21 @@
 
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.base.Function;
+import com.google.common.base.MoreObjects;
 import com.google.common.base.Objects;
 import com.google.common.collect.Collections2;
 
 import org.apache.cassandra.cql3.*;
+import org.apache.cassandra.cql3.selection.Selectable;
+import org.apache.cassandra.cql3.selection.Selector;
+import org.apache.cassandra.cql3.selection.SimpleSelector;
 import org.apache.cassandra.db.rows.*;
 import org.apache.cassandra.db.marshal.*;
+import org.apache.cassandra.exceptions.InvalidRequestException;
 import org.apache.cassandra.serializers.MarshalException;
+import org.apache.cassandra.utils.ByteBufferUtil;
 
-public class ColumnDefinition extends ColumnSpecification implements Comparable<ColumnDefinition>
+public class ColumnDefinition extends ColumnSpecification implements Selectable, Comparable<ColumnDefinition>
 {
     public static final Comparator<Object> asymmetricColumnDataComparator =
         (a, b) -> ((ColumnData) a).column().compareTo((ColumnDefinition) b);
@@ -81,6 +87,8 @@
     private final Comparator<Object> asymmetricCellPathComparator;
     private final Comparator<? super Cell> cellComparator;
 
+    private int hash;
+
     /**
      * These objects are compared frequently, so we encode several of their comparison components
      * into a single long value so that this can be done efficiently
@@ -164,10 +172,13 @@
 
     private static Comparator<CellPath> makeCellPathComparator(Kind kind, AbstractType<?> type)
     {
-        if (kind.isPrimaryKeyKind() || !type.isCollection() || !type.isMultiCell())
+        if (kind.isPrimaryKeyKind() || !type.isMultiCell())
             return null;
 
-        CollectionType collection = (CollectionType) type;
+        AbstractType<?> nameComparator = type.isCollection()
+                                       ? ((CollectionType) type).nameComparator()
+                                       : ((UserType) type).nameComparator();
+
 
         return new Comparator<CellPath>()
         {
@@ -184,7 +195,7 @@
 
                 // This will get more complicated once we have non-frozen UDT and nested collections
                 assert path1.size() == 1 && path2.size() == 1;
-                return collection.nameComparator().compare(path1.get(0), path2.get(0));
+                return nameComparator.compare(path1.get(0), path2.get(0));
             }
         };
     }
@@ -259,18 +270,36 @@
     @Override
     public int hashCode()
     {
-        return Objects.hashCode(ksName, cfName, name, type, kind, position);
+        // This achieves the same as Objects.hashCode, but avoids the object array allocation
+        // (which features significantly in the allocation profile) and caches the result.
+        int result = hash;
+        if(result == 0)
+        {
+            result = 31 + (ksName == null ? 0 : ksName.hashCode());
+            result = 31 * result + (cfName == null ? 0 : cfName.hashCode());
+            result = 31 * result + (name == null ? 0 : name.hashCode());
+            result = 31 * result + (type == null ? 0 : type.hashCode());
+            result = 31 * result + (kind == null ? 0 : kind.hashCode());
+            result = 31 * result + position;
+            hash = result;
+        }
+        return result;
     }
 
     @Override
     public String toString()
     {
-        return Objects.toStringHelper(this)
-                      .add("name", name)
-                      .add("type", type)
-                      .add("kind", kind)
-                      .add("position", position)
-                      .toString();
+        return name.toString();
+    }
+
+    public String debugString()
+    {
+        return MoreObjects.toStringHelper(this)
+                          .add("name", name)
+                          .add("type", type)
+                          .add("kind", kind)
+                          .add("position", position)
+                          .toString();
     }
 
     public boolean isPrimaryKeyColumn()
@@ -365,8 +394,11 @@
         if (!isComplex())
             throw new MarshalException("Only complex cells should have a cell path");
 
-        assert type instanceof CollectionType;
-        ((CollectionType)type).nameComparator().validate(path.get(0));
+        assert type.isMultiCell();
+        if (type.isCollection())
+            ((CollectionType)type).nameComparator().validate(path.get(0));
+        else
+            ((UserType)type).nameComparator().validate(path.get(0));
     }
 
     public static String toCQLString(Iterable<ColumnDefinition> defs)
@@ -398,4 +430,193 @@
              ? ((CollectionType)type).valueComparator()
              : type;
     }
+
+    public Selector.Factory newSelectorFactory(CFMetaData cfm, AbstractType<?> expectedType, List<ColumnDefinition> defs, VariableSpecifications boundNames) throws InvalidRequestException
+    {
+        return SimpleSelector.newFactory(this, addAndGetIndex(this, defs));
+    }
+
+    public AbstractType<?> getExactTypeIfKnown(String keyspace)
+    {
+        return type;
+    }
+
+    /**
+     * Because Thrift-created tables may have a non-text comparator, we cannot determine the proper 'key' until
+     * we know the comparator. ColumnDefinition.Raw is a placeholder that can be converted to a real ColumnIdentifier
+     * once the comparator is known with prepare(). This should only be used with identifiers that are actual
+     * column names. See CASSANDRA-8178 for more background.
+     */
+    public static abstract class Raw extends Selectable.Raw
+    {
+        /**
+         * Creates a {@code ColumnDefinition.Raw} from an unquoted identifier string.
+         */
+        public static Raw forUnquoted(String text)
+        {
+            return new Literal(text, false);
+        }
+
+        /**
+         * Creates a {@code ColumnDefinition.Raw} from a quoted identifier string.
+         */
+        public static Raw forQuoted(String text)
+        {
+            return new Literal(text, true);
+        }
+
+        /**
+         * Creates a {@code ColumnDefinition.Raw} from a pre-existing {@code ColumnDefinition}
+         * (useful in the rare cases where we already have the column but need
+         * a {@code ColumnDefinition.Raw} for typing purposes).
+         */
+        public static Raw forColumn(ColumnDefinition column)
+        {
+            return new ForColumn(column);
+        }
+
+        /**
+         * Get the identifier corresponding to this raw column, without assuming this is an
+         * existing column (unlike {@link #prepare}).
+         */
+        public abstract ColumnIdentifier getIdentifier(CFMetaData cfm);
+
+        public abstract String rawText();
+
+        @Override
+        public abstract ColumnDefinition prepare(CFMetaData cfm);
+
+        @Override
+        public boolean processesSelection()
+        {
+            return false;
+        }
+
+        @Override
+        public final int hashCode()
+        {
+            return toString().hashCode();
+        }
+
+        @Override
+        public final boolean equals(Object o)
+        {
+            if(!(o instanceof Raw))
+                return false;
+
+            Raw that = (Raw)o;
+            return this.toString().equals(that.toString());
+        }
+
+        private static class Literal extends Raw
+        {
+            private final String text;
+
+            public Literal(String rawText, boolean keepCase)
+            {
+                this.text =  keepCase ? rawText : rawText.toLowerCase(Locale.US);
+            }
+
+            public ColumnIdentifier getIdentifier(CFMetaData cfm)
+            {
+                if (!cfm.isStaticCompactTable())
+                    return ColumnIdentifier.getInterned(text, true);
+
+                AbstractType<?> thriftColumnNameType = cfm.thriftColumnNameType();
+                if (thriftColumnNameType instanceof UTF8Type)
+                    return ColumnIdentifier.getInterned(text, true);
+
+                // We have a Thrift-created table with a non-text comparator. Check if we have a matching column, otherwise assume we should use
+                // thriftColumnNameType
+                ByteBuffer bufferName = ByteBufferUtil.bytes(text);
+                for (ColumnDefinition def : cfm.allColumns())
+                {
+                    if (def.name.bytes.equals(bufferName))
+                        return def.name;
+                }
+                return ColumnIdentifier.getInterned(thriftColumnNameType.fromString(text), text);
+            }
+
+            public ColumnDefinition prepare(CFMetaData cfm)
+            {
+                if (!cfm.isStaticCompactTable())
+                    return find(cfm);
+
+                AbstractType<?> thriftColumnNameType = cfm.thriftColumnNameType();
+                if (thriftColumnNameType instanceof UTF8Type)
+                    return find(cfm);
+
+                // We have a Thrift-created table with a non-text comparator. Check if we have a matching column, otherwise assume we should use
+                // thriftColumnNameType
+                ByteBuffer bufferName = ByteBufferUtil.bytes(text);
+                for (ColumnDefinition def : cfm.allColumns())
+                {
+                    if (def.name.bytes.equals(bufferName))
+                        return def;
+                }
+                return find(thriftColumnNameType.fromString(text), cfm);
+            }
+
+            private ColumnDefinition find(CFMetaData cfm)
+            {
+                return find(ByteBufferUtil.bytes(text), cfm);
+            }
+
+            private ColumnDefinition find(ByteBuffer id, CFMetaData cfm)
+            {
+                ColumnDefinition def = cfm.getColumnDefinition(id);
+                if (def == null)
+                    throw new InvalidRequestException(String.format("Undefined column name %s", toString()));
+                return def;
+            }
+
+            public String rawText()
+            {
+                return text;
+            }
+
+            @Override
+            public String toString()
+            {
+                return ColumnIdentifier.maybeQuote(text);
+            }
+        }
+
+        // Use internally in the rare case where we need a ColumnDefinition.Raw for type-checking but
+        // actually already have the column itself.
+        private static class ForColumn extends Raw
+        {
+            private final ColumnDefinition column;
+
+            private ForColumn(ColumnDefinition column)
+            {
+                this.column = column;
+            }
+
+            public ColumnIdentifier getIdentifier(CFMetaData cfm)
+            {
+                return column.name;
+            }
+
+            public ColumnDefinition prepare(CFMetaData cfm)
+            {
+                assert cfm.getColumnDefinition(column.name) != null; // Sanity check that we're not doing something crazy
+                return column;
+            }
+
+            public String rawText()
+            {
+                return column.name.toString();
+            }
+
+            @Override
+            public String toString()
+            {
+                return column.name.toCQLString();
+            }
+        }
+    }
+
+
+
 }
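
The hashCode() change above is the classic lazily cached hash idiom: compute the value once into a plain int field and reuse it, accepting the benign race on first use. A stripped-down sketch of the pattern with illustrative field names, not the real ColumnDefinition:

    // Illustrative stand-in, not the actual ColumnDefinition class.
    public final class CachedHashExample
    {
        private final String ksName;
        private final String name;
        // 0 means "not yet computed"; the benign race is the same trick String.hashCode uses.
        private int hash;

        public CachedHashExample(String ksName, String name)
        {
            this.ksName = ksName;
            this.name = name;
        }

        @Override
        public int hashCode()
        {
            int result = hash;
            if (result == 0)
            {
                result = 31 + (ksName == null ? 0 : ksName.hashCode());
                result = 31 * result + (name == null ? 0 : name.hashCode());
                hash = result;
            }
            return result;
        }
    }
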
diff --git a/src/java/org/apache/cassandra/config/Config.java b/src/java/org/apache/cassandra/config/Config.java
index e6c56cb..a4b42b7 100644
--- a/src/java/org/apache/cassandra/config/Config.java
+++ b/src/java/org/apache/cassandra/config/Config.java
@@ -55,11 +55,14 @@
     public String authorizer;
     public String role_manager;
     public volatile int permissions_validity_in_ms = 2000;
-    public int permissions_cache_max_entries = 1000;
+    public volatile int permissions_cache_max_entries = 1000;
     public volatile int permissions_update_interval_in_ms = -1;
     public volatile int roles_validity_in_ms = 2000;
-    public int roles_cache_max_entries = 1000;
+    public volatile int roles_cache_max_entries = 1000;
     public volatile int roles_update_interval_in_ms = -1;
+    public volatile int credentials_validity_in_ms = 2000;
+    public volatile int credentials_cache_max_entries = 1000;
+    public volatile int credentials_update_interval_in_ms = -1;
 
     /* Hashing strategy Random or OPHF */
     public String partitioner;
@@ -110,7 +113,7 @@
     @Deprecated
     public Integer concurrent_replicates = null;
 
-    public Integer memtable_flush_writers = null;
+    public Integer memtable_flush_writers = 1;
     public Integer memtable_heap_space_in_mb;
     public Integer memtable_offheap_space_in_mb;
     public Float memtable_cleanup_threshold = null;
@@ -165,6 +168,7 @@
 
     /* if the size of columns or super-columns are more than this, indexing will kick in */
     public Integer column_index_size_in_kb = 64;
+    public Integer column_index_cache_size_in_kb = 2;
     public volatile int batch_size_warn_threshold_in_kb = 5;
     public volatile int batch_size_fail_threshold_in_kb = 50;
     public Integer unlogged_batch_across_partitions_warn_threshold = 10;
@@ -191,9 +195,16 @@
     public int commitlog_segment_size_in_mb = 32;
     public ParameterizedClass commitlog_compression;
     public int commitlog_max_compression_buffers_in_pool = 3;
+    public TransparentDataEncryptionOptions transparent_data_encryption_options = new TransparentDataEncryptionOptions();
 
     public Integer max_mutation_size_in_kb;
 
+    // Change-data-capture logs
+    public Boolean cdc_enabled = false;
+    public String cdc_raw_directory;
+    public Integer cdc_total_space_in_mb;
+    public Integer cdc_free_space_check_interval_ms = 250;
+
     @Deprecated
     public int commitlog_periodic_queue_size = -1;
 
@@ -244,7 +255,7 @@
 
     private static boolean isClientMode = false;
 
-    public Integer file_cache_size_in_mb = 512;
+    public Integer file_cache_size_in_mb;
 
     public boolean buffer_pool_use_heap_if_exhausted = true;
 
@@ -291,6 +302,17 @@
 
     public int windows_timer_interval = 0;
 
+    /**
+     * Size of the CQL prepared statements cache in MB.
+     * Defaults to 1/256th of the heap size or 10MB, whichever is greater.
+     */
+    public Long prepared_statements_cache_size_mb = null;
+    /**
+     * Size of the Thrift prepared statements cache in MB.
+     * Defaults to 1/256th of the heap size or 10MB, whichever is greater.
+     */
+    public Long thrift_prepared_statements_cache_size_mb = null;
+
     public boolean enable_user_defined_functions = false;
     public boolean enable_scripted_user_defined_functions = false;
     /**
@@ -439,6 +461,6 @@
             configMap.put(name, value);
         }
 
-        logger.info("Node configuration:[" + Joiner.on("; ").join(configMap.entrySet()) + "]");
+        logger.info("Node configuration:[{}]", Joiner.on("; ").join(configMap.entrySet()));
     }
 }
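
Several cache-size fields above gain the volatile modifier because they can now be changed at runtime (see the matching setters added to DatabaseDescriptor below) while request threads read them. A small sketch of that visibility pattern, with illustrative names rather than the actual Config fields:

    public final class HotSettableConfig
    {
        // volatile so a value changed at runtime (e.g. through a JMX operation) becomes
        // visible to request threads reading it, without extra locking.
        private volatile int rolesCacheMaxEntries = 1000;

        public int getRolesCacheMaxEntries()
        {
            return rolesCacheMaxEntries;
        }

        public void setRolesCacheMaxEntries(int maxEntries)
        {
            rolesCacheMaxEntries = maxEntries;
        }
    }
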
diff --git a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
index f0ec5d0..05d80a5 100644
--- a/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
+++ b/src/java/org/apache/cassandra/config/DatabaseDescriptor.java
@@ -51,10 +51,12 @@
 import org.apache.cassandra.net.MessagingService;
 import org.apache.cassandra.scheduler.IRequestScheduler;
 import org.apache.cassandra.scheduler.NoScheduler;
+import org.apache.cassandra.security.EncryptionContext;
 import org.apache.cassandra.service.CacheService;
 import org.apache.cassandra.thrift.ThriftServer;
 import org.apache.cassandra.utils.FBUtilities;
 import org.apache.cassandra.utils.memory.*;
+import org.apache.commons.lang3.StringUtils;
 
 public class DatabaseDescriptor
 {
@@ -94,12 +96,16 @@
     private static RequestSchedulerId requestSchedulerId;
     private static RequestSchedulerOptions requestSchedulerOptions;
 
+    private static long preparedStatementsCacheSizeInMB;
+    private static long thriftPreparedStatementsCacheSizeInMB;
+
     private static long keyCacheSizeInMB;
     private static long counterCacheSizeInMB;
     private static long indexSummaryCapacityInMB;
 
     private static String localDC;
     private static Comparator<InetAddress> localComparator;
+    private static EncryptionContext encryptionContext;
     private static boolean hasLoggedConfig;
 
     public static void forceStaticInitialization() {}
@@ -327,11 +333,24 @@
         if (conf.authenticator != null)
             authenticator = FBUtilities.newAuthenticator(conf.authenticator);
 
+        // the configuration options regarding credentials caching are only guaranteed to
+        // work with PasswordAuthenticator, so log a message if some other authenticator
+        // is in use and non-default values are detected
+        if (!(authenticator instanceof PasswordAuthenticator)
+            && (conf.credentials_update_interval_in_ms != -1
+                || conf.credentials_validity_in_ms != 2000
+                || conf.credentials_cache_max_entries != 1000))
+        {
+            logger.info("Configuration options credentials_update_interval_in_ms, credentials_validity_in_ms and " +
+                        "credentials_cache_max_entries may not be applicable for the configured authenticator ({})",
+                        authenticator.getClass().getName());
+        }
+
         if (conf.authorizer != null)
             authorizer = FBUtilities.newAuthorizer(conf.authorizer);
 
-        if (authenticator instanceof AllowAllAuthenticator && !(authorizer instanceof AllowAllAuthorizer))
-            throw new ConfigurationException("AllowAllAuthenticator can't be used with " +  conf.authorizer, false);
+        if (!authenticator.requireAuthentication() && authorizer.requireAuthorization())
+            throw new ConfigurationException(conf.authenticator + " can't be used with " +  conf.authorizer, false);
 
         if (conf.role_manager != null)
             roleManager = FBUtilities.newRoleManager(conf.role_manager);
@@ -517,6 +536,14 @@
             conf.hints_directory += File.separator + "hints";
         }
 
+        if (conf.cdc_raw_directory == null)
+        {
+            conf.cdc_raw_directory = System.getProperty("cassandra.storagedir", null);
+            if (conf.cdc_raw_directory == null)
+                throw new ConfigurationException("cdc_raw_directory is missing and -Dcassandra.storagedir is not set", false);
+            conf.cdc_raw_directory += File.separator + "cdc_raw";
+        }
+
         if (conf.commitlog_total_space_in_mb == null)
         {
             int preferredSize = 8192;
@@ -544,6 +571,38 @@
             }
         }
 
+        if (conf.cdc_total_space_in_mb == null)
+        {
+            int preferredSize = 4096;
+            int minSize = 0;
+            try
+            {
+                // use 1/8th of available space: half of the 1/4 of the volume chosen for the commitlog in the discussion on #10013 and #10199
+                minSize = Ints.checkedCast((guessFileStore(conf.cdc_raw_directory).getTotalSpace() / 1048576) / 8);
+            }
+            catch (IOException e)
+            {
+                logger.debug("Error checking disk space", e);
+                throw new ConfigurationException(String.format("Unable to check disk space available to %s. Perhaps the Cassandra user does not have the necessary permissions",
+                                                               conf.cdc_raw_directory), e);
+            }
+            if (minSize < preferredSize)
+            {
+                logger.warn("Small cdc volume detected at {}; setting cdc_total_space_in_mb to {}.  You can override this in cassandra.yaml",
+                            conf.cdc_raw_directory, minSize);
+                conf.cdc_total_space_in_mb = minSize;
+            }
+            else
+            {
+                conf.cdc_total_space_in_mb = preferredSize;
+            }
+        }
+
+        if (conf.cdc_enabled == true)
+        {
+            logger.info("cdc_enabled is true. Starting casssandra node with Change-Data-Capture enabled.");
+        }
+
         if (conf.saved_caches_directory == null)
         {
             conf.saved_caches_directory = System.getProperty("cassandra.storagedir", null);
@@ -582,8 +641,8 @@
             }
         }
         if (dataFreeBytes < 64L * 1024 * 1048576) // 64 GB
-            logger.warn("Only {} MB free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots",
-                        dataFreeBytes / 1048576);
+            logger.warn("Only {} free across all data volumes. Consider adding more capacity to your cluster or removing obsolete snapshots",
+                        FBUtilities.prettyPrintMemory(dataFreeBytes));
 
 
         if (conf.commitlog_directory.equals(conf.saved_caches_directory))
@@ -593,9 +652,6 @@
         if (conf.hints_directory.equals(conf.saved_caches_directory))
             throw new ConfigurationException("saved_caches_directory must not be the same as the hints_directory", false);
 
-        if (conf.memtable_flush_writers == null)
-            conf.memtable_flush_writers = Math.min(8, Math.max(2, Math.min(FBUtilities.getAvailableProcessors(), conf.data_file_directories.length)));
-
         if (conf.memtable_flush_writers < 1)
             throw new ConfigurationException("memtable_flush_writers must be at least 1, but was " + conf.memtable_flush_writers, false);
 
@@ -633,6 +689,38 @@
 
         try
         {
+            // if prepared_statements_cache_size_mb option was set to "auto" then size of the cache should be "max(1/256 of Heap (in MB), 10MB)"
+            preparedStatementsCacheSizeInMB = (conf.prepared_statements_cache_size_mb == null)
+                                              ? Math.max(10, (int) (Runtime.getRuntime().maxMemory() / 1024 / 1024 / 256))
+                                              : conf.prepared_statements_cache_size_mb;
+
+            if (preparedStatementsCacheSizeInMB <= 0)
+                throw new NumberFormatException(); // to escape duplicating error message
+        }
+        catch (NumberFormatException e)
+        {
+            throw new ConfigurationException("prepared_statements_cache_size_mb option was set incorrectly to '"
+                                             + conf.prepared_statements_cache_size_mb + "', supported values are <integer> >= 0.", false);
+        }
+
+        try
+        {
+            // if thrift_prepared_statements_cache_size_mb option was set to "auto" then size of the cache should be "max(1/256 of Heap (in MB), 10MB)"
+            thriftPreparedStatementsCacheSizeInMB = (conf.thrift_prepared_statements_cache_size_mb == null)
+                                                    ? Math.max(10, (int) (Runtime.getRuntime().maxMemory() / 1024 / 1024 / 256))
+                                                    : conf.thrift_prepared_statements_cache_size_mb;
+
+            if (thriftPreparedStatementsCacheSizeInMB <= 0)
+                throw new NumberFormatException(); // to escape duplicating error message
+        }
+        catch (NumberFormatException e)
+        {
+            throw new ConfigurationException("thrift_prepared_statements_cache_size_mb option was set incorrectly to '"
+                                             + conf.thrift_prepared_statements_cache_size_mb + "', supported values are <integer> >= 0.", false);
+        }
+
+        try
+        {
             // if key_cache_size_in_mb option was set to "auto" then size of the cache should be "min(5% of Heap (in MB), 100MB)
             keyCacheSizeInMB = (conf.key_cache_size_in_mb == null)
                 ? Math.min(Math.max(1, (int) (Runtime.getRuntime().totalMemory() * 0.05 / 1024 / 1024)), 100)
@@ -705,6 +793,10 @@
         if (conf.user_defined_function_fail_timeout < conf.user_defined_function_warn_timeout)
             throw new ConfigurationException("user_defined_function_warn_timeout must less than user_defined_function_fail_timeout", false);
 
+        // always attempt to load the cipher factory, as we could be in the situation where the user has disabled encryption,
+        // but has existing commitlogs and sstables on disk that are still encrypted (and still need to be read)
+        encryptionContext = new EncryptionContext(config.transparent_data_encryption_options);
+
         if (conf.max_mutation_size_in_kb == null)
             conf.max_mutation_size_in_kb = conf.commitlog_segment_size_in_mb * 1024 / 2;
         else if (conf.commitlog_segment_size_in_mb * 1024 < 2 * conf.max_mutation_size_in_kb)
@@ -774,11 +866,6 @@
         conf.permissions_validity_in_ms = timeout;
     }
 
-    public static int getPermissionsCacheMaxEntries()
-    {
-        return conf.permissions_cache_max_entries;
-    }
-
     public static int getPermissionsUpdateInterval()
     {
         return conf.permissions_update_interval_in_ms == -1
@@ -786,6 +873,21 @@
              : conf.permissions_update_interval_in_ms;
     }
 
+    public static void setPermissionsUpdateInterval(int updateInterval)
+    {
+        conf.permissions_update_interval_in_ms = updateInterval;
+    }
+
+    public static int getPermissionsCacheMaxEntries()
+    {
+        return conf.permissions_cache_max_entries;
+    }
+
+    public static int setPermissionsCacheMaxEntries(int maxEntries)
+    {
+        return conf.permissions_cache_max_entries = maxEntries;
+    }
+
     public static int getRolesValidity()
     {
         return conf.roles_validity_in_ms;
@@ -796,11 +898,6 @@
         conf.roles_validity_in_ms = validity;
     }
 
-    public static int getRolesCacheMaxEntries()
-    {
-        return conf.roles_cache_max_entries;
-    }
-
     public static int getRolesUpdateInterval()
     {
         return conf.roles_update_interval_in_ms == -1
@@ -813,9 +910,46 @@
         conf.roles_update_interval_in_ms = interval;
     }
 
-    public static void setPermissionsUpdateInterval(int updateInterval)
+    public static int getRolesCacheMaxEntries()
     {
-        conf.permissions_update_interval_in_ms = updateInterval;
+        return conf.roles_cache_max_entries;
+    }
+
+    public static int setRolesCacheMaxEntries(int maxEntries)
+    {
+        return conf.roles_cache_max_entries = maxEntries;
+    }
+
+    public static int getCredentialsValidity()
+    {
+        return conf.credentials_validity_in_ms;
+    }
+
+    public static void setCredentialsValidity(int timeout)
+    {
+        conf.credentials_validity_in_ms = timeout;
+    }
+
+    public static int getCredentialsUpdateInterval()
+    {
+        return conf.credentials_update_interval_in_ms == -1
+               ? conf.credentials_validity_in_ms
+               : conf.credentials_update_interval_in_ms;
+    }
+
+    public static void setCredentialsUpdateInterval(int updateInterval)
+    {
+        conf.credentials_update_interval_in_ms = updateInterval;
+    }
+
+    public static int getCredentialsCacheMaxEntries()
+    {
+        return conf.credentials_cache_max_entries;
+    }
+
+    public static int setCredentialsCacheMaxEntries(int maxEntries)
+    {
+        return conf.credentials_cache_max_entries = maxEntries;
     }
 
     public static int getThriftFramedTransportSize()
@@ -857,6 +991,13 @@
             if (conf.saved_caches_directory == null)
                 throw new ConfigurationException("saved_caches_directory must be specified", false);
             FileUtils.createDirectory(conf.saved_caches_directory);
+
+            if (conf.cdc_enabled)
+            {
+                if (conf.cdc_raw_directory == null)
+                    throw new ConfigurationException("cdc_raw_directory must be specified", false);
+                FileUtils.createDirectory(conf.cdc_raw_directory);
+            }
         }
         catch (ConfigurationException e)
         {
@@ -915,6 +1056,23 @@
         return conf.column_index_size_in_kb * 1024;
     }
 
+    @VisibleForTesting
+    public static void setColumnIndexSize(int val)
+    {
+        conf.column_index_size_in_kb = val;
+    }
+
+    public static int getColumnIndexCacheSize()
+    {
+        return conf.column_index_cache_size_in_kb * 1024;
+    }
+
+    @VisibleForTesting
+    public static void setColumnIndexCacheSize(int val)
+    {
+        conf.column_index_cache_size_in_kb = val;
+    }
+
     public static int getBatchSizeWarnThreshold()
     {
         return conf.batch_size_warn_threshold_in_kb * 1024;
@@ -959,8 +1117,8 @@
     {
         List<String> tokens = new ArrayList<String>();
         if (tokenString != null)
-            for (String token : tokenString.split(","))
-                tokens.add(token.replaceAll("^\\s+", "").replaceAll("\\s+$", ""));
+            for (String token : StringUtils.split(tokenString, ','))
+                tokens.add(token.trim());
         return tokens;
     }
 
@@ -981,7 +1139,7 @@
         }
         catch (UnknownHostException e)
         {
-            throw new RuntimeException("Replacement ost name could not be resolved or scope_id was specified for a global IPv6 address", e);
+            throw new RuntimeException("Replacement host name could not be resolved or scope_id was specified for a global IPv6 address", e);
         }
     }
 
@@ -1249,6 +1407,12 @@
         return conf.commitlog_directory;
     }
 
+    @VisibleForTesting
+    public static void setCommitLogLocation(String value)
+    {
+        conf.commitlog_directory = value;
+    }
+
     public static ParameterizedClass getCommitLogCompression()
     {
         return conf.commitlog_compression;
@@ -1259,6 +1423,11 @@
         conf.commitlog_compression = compressor;
     }
 
+    /**
+     * Maximum number of buffers in the compression pool. The default value is 3; it should not be set lower than that
+     * (one segment in compression, one written to, one in reserve). Delays in compression may cause the log to use
+     * more, depending on how soon the sync policy stops all writing threads.
+     */
     public static int getCommitLogMaxCompressionBuffersInPool()
     {
         return conf.commitlog_max_compression_buffers_in_pool;
@@ -1689,6 +1858,13 @@
 
     public static int getFileCacheSizeInMB()
     {
+        if (conf.file_cache_size_in_mb == null)
+        {
+            // In client mode the value is not set.
+            assert Config.isClientMode();
+            return 0;
+        }
+
         return conf.file_cache_size_in_mb;
     }
 
@@ -1839,6 +2015,11 @@
         conf.counter_cache_keys_to_save = counterCacheKeysToSave;
     }
 
+    public static void setStreamingSocketTimeout(int value)
+    {
+        conf.streaming_socket_timeout_in_ms = value;
+    }
+
     public static int getStreamingSocketTimeout()
     {
         return conf.streaming_socket_timeout_in_ms;
@@ -1887,8 +2068,7 @@
                 }
                 return new SlabPool(heapLimit, offHeapLimit, conf.memtable_cleanup_threshold, new ColumnFamilyStore.FlushLargestColumnFamily());
             case offheap_objects:
-                throw new ConfigurationException("offheap_objects are not available in 3.0. They should be re-introduced in a future release, see https://issues.apache.org/jira/browse/CASSANDRA-9472 for details");
-                // return new NativePool(heapLimit, offHeapLimit, conf.memtable_cleanup_threshold, new ColumnFamilyStore.FlushLargestColumnFamily());
+                return new NativePool(heapLimit, offHeapLimit, conf.memtable_cleanup_threshold, new ColumnFamilyStore.FlushLargestColumnFamily());
             default:
                 throw new AssertionError();
         }
@@ -1940,6 +2120,16 @@
         return conf.windows_timer_interval;
     }
 
+    public static long getPreparedStatementsCacheSizeMB()
+    {
+        return preparedStatementsCacheSizeInMB;
+    }
+
+    public static long getThriftPreparedStatementsCacheSizeMB()
+    {
+        return thriftPreparedStatementsCacheSizeInMB;
+    }
+
     public static boolean enableUserDefinedFunctions()
     {
         return conf.enable_user_defined_functions;
@@ -1995,9 +2185,50 @@
         return conf.gc_log_threshold_in_ms;
     }
 
+    public static EncryptionContext getEncryptionContext()
+    {
+        return encryptionContext;
+    }
+    
     public static long getGCWarnThreshold()
     {
         return conf.gc_warn_threshold_in_ms;
     }
 
+    public static boolean isCDCEnabled()
+    {
+        return conf.cdc_enabled;
+    }
+
+    public static String getCDCLogLocation()
+    {
+        return conf.cdc_raw_directory;
+    }
+
+    public static Integer getCDCSpaceInMB()
+    {
+        return conf.cdc_total_space_in_mb;
+    }
+
+    @VisibleForTesting
+    public static void setCDCSpaceInMB(Integer input)
+    {
+        conf.cdc_total_space_in_mb = input;
+    }
+
+    public static Integer getCDCDiskCheckInterval()
+    {
+        return conf.cdc_free_space_check_interval_ms;
+    }
+
+    @VisibleForTesting
+    public static void setEncryptionContext(EncryptionContext ec)
+    {
+        encryptionContext = ec;
+    }
+
+    public static int searchConcurrencyFactor()
+    {
+        return Integer.valueOf(System.getProperty("cassandra.search_concurrency_factor", "1"));
+    }
 }
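
The new prepared-statements cache options default to 1/256th of the max heap with a 10 MB floor when left unset. A standalone sketch of that sizing rule, using the same arithmetic as the code above:

    public final class CacheSizing
    {
        // Default used when prepared_statements_cache_size_mb is left unset ("auto"):
        // 1/256th of the JVM max heap, but never less than 10 MB.
        public static long defaultPreparedStatementsCacheSizeMb()
        {
            long heapMb = Runtime.getRuntime().maxMemory() / 1024 / 1024;
            return Math.max(10, heapMb / 256);
        }

        public static void main(String[] args)
        {
            // On a 4 GB heap this prints 16; on anything up to ~2.5 GB it prints 10.
            System.out.println(defaultPreparedStatementsCacheSizeMb());
        }
    }
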
diff --git a/src/java/org/apache/cassandra/config/EncryptionOptions.java b/src/java/org/apache/cassandra/config/EncryptionOptions.java
index 31f8b4a..d662871 100644
--- a/src/java/org/apache/cassandra/config/EncryptionOptions.java
+++ b/src/java/org/apache/cassandra/config/EncryptionOptions.java
@@ -17,21 +17,20 @@
  */
 package org.apache.cassandra.config;
 
+import javax.net.ssl.SSLSocketFactory;
+
 public abstract class EncryptionOptions
 {
     public String keystore = "conf/.keystore";
     public String keystore_password = "cassandra";
     public String truststore = "conf/.truststore";
     public String truststore_password = "cassandra";
-    public String[] cipher_suites = {
-        "TLS_RSA_WITH_AES_128_CBC_SHA", "TLS_RSA_WITH_AES_256_CBC_SHA",
-        "TLS_DHE_RSA_WITH_AES_128_CBC_SHA", "TLS_DHE_RSA_WITH_AES_256_CBC_SHA",
-        "TLS_ECDHE_RSA_WITH_AES_128_CBC_SHA", "TLS_ECDHE_RSA_WITH_AES_256_CBC_SHA" 
-    };
+    public String[] cipher_suites = ((SSLSocketFactory)SSLSocketFactory.getDefault()).getDefaultCipherSuites();
     public String protocol = "TLS";
     public String algorithm = "SunX509";
     public String store_type = "JKS";
     public boolean require_client_auth = false;
+    public boolean require_endpoint_verification = false;
 
     public static class ClientEncryptionOptions extends EncryptionOptions
     {
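
The cipher_suites default now comes from the JVM's own SSL socket factory instead of a hard-coded list, so it tracks whatever the installed provider enables. A small sketch that prints that list:

    import javax.net.ssl.SSLSocketFactory;

    public final class DefaultCipherSuites
    {
        public static void main(String[] args)
        {
            // Same call the patch uses for cipher_suites: the suites the JVM's default
            // SSL socket factory would enable for a plain SSLSocket.
            String[] suites = ((SSLSocketFactory) SSLSocketFactory.getDefault()).getDefaultCipherSuites();
            for (String suite : suites)
                System.out.println(suite);
        }
    }
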
diff --git a/src/java/org/apache/cassandra/config/TransparentDataEncryptionOptions.java b/src/java/org/apache/cassandra/config/TransparentDataEncryptionOptions.java
new file mode 100644
index 0000000..4ad0305
--- /dev/null
+++ b/src/java/org/apache/cassandra/config/TransparentDataEncryptionOptions.java
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.config;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Objects;
+
+public class TransparentDataEncryptionOptions
+{
+    public boolean enabled = false;
+    public int chunk_length_kb = 64;
+    public String cipher = "AES/CBC/PKCS5Padding";
+    public String key_alias;
+    public int iv_length = 16;
+
+    public ParameterizedClass key_provider;
+
+    public TransparentDataEncryptionOptions()
+    {   }
+
+    public TransparentDataEncryptionOptions(boolean enabled)
+    {
+        this.enabled = enabled;
+    }
+
+    public TransparentDataEncryptionOptions(String cipher, String keyAlias, ParameterizedClass keyProvider)
+    {
+        this(true, cipher, keyAlias, keyProvider);
+    }
+
+    public TransparentDataEncryptionOptions(boolean enabled, String cipher, String keyAlias, ParameterizedClass keyProvider)
+    {
+        this.enabled = enabled;
+        this.cipher = cipher;
+        key_alias = keyAlias;
+        key_provider = keyProvider;
+    }
+
+    public String get(String key)
+    {
+        return key_provider.parameters.get(key);
+    }
+
+    @VisibleForTesting
+    public void remove(String key)
+    {
+        key_provider.parameters.remove(key);
+    }
+
+    public boolean equals(Object o)
+    {
+        return o instanceof TransparentDataEncryptionOptions && equals((TransparentDataEncryptionOptions) o);
+    }
+
+    public boolean equals(TransparentDataEncryptionOptions other)
+    {
+        // not sure if this is a great equals() impl....
+        return Objects.equal(cipher, other.cipher) &&
+               Objects.equal(key_alias, other.key_alias);
+    }
+}
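
A hedged usage sketch of the new options class, roughly as a test might build it programmatically; the key provider class and the parameter names passed to it are assumptions for illustration, not values fixed by this patch:

    import com.google.common.collect.ImmutableMap;
    import org.apache.cassandra.config.ParameterizedClass;
    import org.apache.cassandra.config.TransparentDataEncryptionOptions;

    public final class TdeOptionsExample
    {
        public static TransparentDataEncryptionOptions sampleOptions()
        {
            // Key provider class and parameter names below are illustrative assumptions.
            ParameterizedClass keyProvider = new ParameterizedClass(
                "org.apache.cassandra.security.JKSKeyProvider",
                ImmutableMap.of("keystore", "conf/.keystore",
                                "keystore_password", "cassandra",
                                "key_password", "cassandra"));
            return new TransparentDataEncryptionOptions(true, "AES/CBC/PKCS5Padding", "testing:1", keyProvider);
        }
    }
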
diff --git a/src/java/org/apache/cassandra/config/ViewDefinition.java b/src/java/org/apache/cassandra/config/ViewDefinition.java
index 5300f56..33cc175 100644
--- a/src/java/org/apache/cassandra/config/ViewDefinition.java
+++ b/src/java/org/apache/cassandra/config/ViewDefinition.java
@@ -128,8 +128,10 @@
     }
 
     /**
-     * Replace the column {@param from} with {@param to} in this materialized view definition's partition,
+     * Replace the column 'from' with 'to' in this materialized view definition's partition,
      * clustering, or included columns.
+     * @param from the existing column 
+     * @param to the new column 
      */
     public void renameColumn(ColumnIdentifier from, ColumnIdentifier to)
     {
@@ -137,8 +139,8 @@
 
         // convert whereClause to Relations, rename ids in Relations, then convert back to whereClause
         List<Relation> relations = whereClauseToRelations(whereClause);
-        ColumnIdentifier.Raw fromRaw = new ColumnIdentifier.Literal(from.toString(), true);
-        ColumnIdentifier.Raw toRaw = new ColumnIdentifier.Literal(to.toString(), true);
+        ColumnDefinition.Raw fromRaw = ColumnDefinition.Raw.forQuoted(from.toString());
+        ColumnDefinition.Raw toRaw = ColumnDefinition.Raw.forQuoted(to.toString());
         List<Relation> newRelations = relations.stream()
                 .map(r -> r.renameIdentifier(fromRaw, toRaw))
                 .collect(Collectors.toList());
diff --git a/src/java/org/apache/cassandra/config/YamlConfigurationLoader.java b/src/java/org/apache/cassandra/config/YamlConfigurationLoader.java
index 435377c..bd5638a 100644
--- a/src/java/org/apache/cassandra/config/YamlConfigurationLoader.java
+++ b/src/java/org/apache/cassandra/config/YamlConfigurationLoader.java
@@ -20,10 +20,11 @@
 import java.beans.IntrospectionException;
 import java.io.ByteArrayInputStream;
 import java.io.File;
-import java.io.InputStream;
 import java.io.IOException;
+import java.io.InputStream;
 import java.net.URL;
 import java.util.HashSet;
+
 import java.util.List;
 import java.util.Map;
 import java.util.Set;
@@ -32,6 +33,9 @@
 import com.google.common.collect.Maps;
 import com.google.common.collect.Sets;
 import com.google.common.io.ByteStreams;
+
+import org.apache.commons.lang3.SystemUtils;
+
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
@@ -112,16 +116,17 @@
             }
 
             Constructor constructor = new CustomConstructor(Config.class);
-            MissingPropertiesChecker propertiesChecker = new MissingPropertiesChecker();
+            PropertiesChecker propertiesChecker = new PropertiesChecker();
             constructor.setPropertyUtils(propertiesChecker);
             Yaml yaml = new Yaml(constructor);
-            Config result = yaml.loadAs(new ByteArrayInputStream(configBytes), Config.class);
+            Config result = loadConfig(yaml, configBytes);
             propertiesChecker.check();
             return result;
         }
         catch (YAMLException e)
         {
-            throw new ConfigurationException("Invalid yaml: " + url, e);
+            throw new ConfigurationException("Invalid yaml: " + url + SystemUtils.LINE_SEPARATOR
+                                             +  " Error: " + e.getMessage(), false);
         }
     }
 
@@ -161,11 +166,25 @@
         }
     }
 
-    private static class MissingPropertiesChecker extends PropertyUtils
+    private Config loadConfig(Yaml yaml, byte[] configBytes)
+    {
+        Config config = yaml.loadAs(new ByteArrayInputStream(configBytes), Config.class);
+        // If the configuration file is empty, yaml will return null. In this case we should use the default
+        // configuration to avoid hitting an NPE at a later stage.
+        return config == null ? new Config() : config;
+    }
+
+    /**
+     * Utility class to check that there are no extra properties and that properties that are not null by default
+     * are not set to null.
+     */
+    private static class PropertiesChecker extends PropertyUtils
     {
         private final Set<String> missingProperties = new HashSet<>();
 
-        public MissingPropertiesChecker()
+        private final Set<String> nullProperties = new HashSet<>();
+
+        public PropertiesChecker()
         {
             setSkipMissingProperties(true);
         }
@@ -173,19 +192,49 @@
         @Override
         public Property getProperty(Class<? extends Object> type, String name) throws IntrospectionException
         {
-            Property result = super.getProperty(type, name);
+            final Property result = super.getProperty(type, name);
+
             if (result instanceof MissingProperty)
             {
                 missingProperties.add(result.getName());
             }
-            return result;
+
+            return new Property(result.getName(), result.getType())
+            {
+                @Override
+                public void set(Object object, Object value) throws Exception
+                {
+                    if (value == null && get(object) != null)
+                    {
+                        nullProperties.add(getName());
+                    }
+                    result.set(object, value);
+                }
+
+                @Override
+                public Class<?>[] getActualTypeArguments()
+                {
+                    return result.getActualTypeArguments();
+                }
+
+                @Override
+                public Object get(Object object)
+                {
+                    return result.get(object);
+                }
+            };
         }
 
         public void check() throws ConfigurationException
         {
+            if (!nullProperties.isEmpty())
+            {
+                throw new ConfigurationException("Invalid yaml. Those properties " + nullProperties + " are not valid", false);
+            }
+
             if (!missingProperties.isEmpty())
             {
-                throw new ConfigurationException("Invalid yaml. Please remove properties " + missingProperties + " from your cassandra.yaml");
+                throw new ConfigurationException("Invalid yaml. Please remove properties " + missingProperties + " from your cassandra.yaml", false);
             }
         }
     }
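
The loadConfig() helper above guards against SnakeYAML returning null for an empty configuration file. A self-contained sketch of that guard, using a stand-in Settings bean rather than the real Config class:

    import java.io.ByteArrayInputStream;
    import org.yaml.snakeyaml.Yaml;

    public final class EmptyYamlGuard
    {
        // Stand-in for Cassandra's Config class; just needs a no-arg constructor and defaults.
        public static class Settings
        {
            public int num_tokens = 1;
        }

        public static Settings load(byte[] configBytes)
        {
            Settings settings = new Yaml().loadAs(new ByteArrayInputStream(configBytes), Settings.class);
            // SnakeYAML returns null for an empty document; fall back to the defaults.
            return settings == null ? new Settings() : settings;
        }

        public static void main(String[] args)
        {
            System.out.println(load(new byte[0]).num_tokens); // prints 1
        }
    }
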
diff --git a/src/java/org/apache/cassandra/cql3/AbstractMarker.java b/src/java/org/apache/cassandra/cql3/AbstractMarker.java
index cd26bd7..3689ed1 100644
--- a/src/java/org/apache/cassandra/cql3/AbstractMarker.java
+++ b/src/java/org/apache/cassandra/cql3/AbstractMarker.java
@@ -21,6 +21,7 @@
 import java.util.List;
 
 import org.apache.cassandra.cql3.functions.Function;
+import org.apache.cassandra.db.marshal.AbstractType;
 import org.apache.cassandra.db.marshal.CollectionType;
 import org.apache.cassandra.db.marshal.ListType;
 import org.apache.cassandra.exceptions.InvalidRequestException;
@@ -67,23 +68,39 @@
 
         public NonTerminal prepare(String keyspace, ColumnSpecification receiver) throws InvalidRequestException
         {
-            if (!(receiver.type instanceof CollectionType))
-                return new Constants.Marker(bindIndex, receiver);
-
-            switch (((CollectionType)receiver.type).kind)
+            if (receiver.type.isCollection())
             {
-                case LIST: return new Lists.Marker(bindIndex, receiver);
-                case SET:  return new Sets.Marker(bindIndex, receiver);
-                case MAP:  return new Maps.Marker(bindIndex, receiver);
+                switch (((CollectionType) receiver.type).kind)
+                {
+                    case LIST:
+                        return new Lists.Marker(bindIndex, receiver);
+                    case SET:
+                        return new Sets.Marker(bindIndex, receiver);
+                    case MAP:
+                        return new Maps.Marker(bindIndex, receiver);
+                    default:
+                        throw new AssertionError();
+                }
             }
-            throw new AssertionError();
+            else if (receiver.type.isUDT())
+            {
+                return new UserTypes.Marker(bindIndex, receiver);
+            }
+
+            return new Constants.Marker(bindIndex, receiver);
         }
 
+        @Override
         public AssignmentTestable.TestResult testAssignment(String keyspace, ColumnSpecification receiver)
         {
             return AssignmentTestable.TestResult.WEAKLY_ASSIGNABLE;
         }
 
+        public AbstractType<?> getExactTypeIfKnown(String keyspace)
+        {
+            return null;
+        }
+
         @Override
         public String getText()
         {
diff --git a/src/java/org/apache/cassandra/cql3/Attributes.java b/src/java/org/apache/cassandra/cql3/Attributes.java
index e1d2522..a443f23 100644
--- a/src/java/org/apache/cassandra/cql3/Attributes.java
+++ b/src/java/org/apache/cassandra/cql3/Attributes.java
@@ -24,6 +24,7 @@
 import com.google.common.collect.Iterables;
 
 import org.apache.cassandra.cql3.functions.Function;
+import org.apache.cassandra.db.LivenessInfo;
 import org.apache.cassandra.db.marshal.Int32Type;
 import org.apache.cassandra.db.marshal.LongType;
 import org.apache.cassandra.exceptions.InvalidRequestException;
@@ -94,17 +95,17 @@
         return LongType.instance.compose(tval);
     }
 
-    public int getTimeToLive(QueryOptions options) throws InvalidRequestException
+    public int getTimeToLive(QueryOptions options, int defaultTimeToLive) throws InvalidRequestException
     {
         if (timeToLive == null)
-            return 0;
+            return defaultTimeToLive;
 
         ByteBuffer tval = timeToLive.bindAndGet(options);
         if (tval == null)
             throw new InvalidRequestException("Invalid null value of TTL");
 
-        if (tval == ByteBufferUtil.UNSET_BYTE_BUFFER) // treat as unlimited
-            return 0;
+        if (tval == ByteBufferUtil.UNSET_BYTE_BUFFER)
+            return defaultTimeToLive;
 
         try
         {
@@ -122,6 +123,9 @@
         if (ttl > MAX_TTL)
             throw new InvalidRequestException(String.format("ttl is too large. requested (%d) maximum (%d)", ttl, MAX_TTL));
 
+        if (defaultTimeToLive != LivenessInfo.NO_TTL && ttl == LivenessInfo.NO_TTL)
+            return LivenessInfo.NO_TTL;
+
         return ttl;
     }
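
getTimeToLive() now threads a defaultTimeToLive through: a missing or unset TTL inherits the table default, while an explicit TTL of 0 cancels it. A standalone sketch of that resolution rule, with illustrative names rather than the actual Attributes API (NO_TTL is assumed to be 0, matching LivenessInfo.NO_TTL):

    public final class TtlResolution
    {
        private static final int NO_TTL = 0;

        public static int resolveTtl(Integer requestedTtl, int defaultTimeToLive)
        {
            // No TTL supplied (or bound to "unset"): fall back to the table's default_time_to_live.
            if (requestedTtl == null)
                return defaultTimeToLive;

            // An explicit "USING TTL 0" cancels the table default rather than inheriting it.
            if (defaultTimeToLive != NO_TTL && requestedTtl == NO_TTL)
                return NO_TTL;

            return requestedTtl;
        }
    }
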
 
diff --git a/src/java/org/apache/cassandra/cql3/CQL3Type.java b/src/java/org/apache/cassandra/cql3/CQL3Type.java
index 95524d9..cf7e18a 100644
--- a/src/java/org/apache/cassandra/cql3/CQL3Type.java
+++ b/src/java/org/apache/cassandra/cql3/CQL3Type.java
@@ -39,7 +39,16 @@
 {
     static final Logger logger = LoggerFactory.getLogger(CQL3Type.class);
 
-    public boolean isCollection();
+    default boolean isCollection()
+    {
+        return false;
+    }
+
+    default boolean isUDT()
+    {
+        return false;
+    }
+
     public AbstractType<?> getType();
 
     /**
@@ -82,11 +91,6 @@
             this.type = type;
         }
 
-        public boolean isCollection()
-        {
-            return false;
-        }
-
         public AbstractType<?> getType()
         {
             return type;
@@ -125,11 +129,6 @@
             this(TypeParser.parse(className));
         }
 
-        public boolean isCollection()
-        {
-            return false;
-        }
-
         public AbstractType<?> getType()
         {
             return type;
@@ -305,9 +304,9 @@
             return new UserDefined(UTF8Type.instance.compose(type.name), type);
         }
 
-        public boolean isCollection()
+        public boolean isUDT()
         {
-            return false;
+            return true;
         }
 
         public AbstractType<?> getType()
@@ -377,7 +376,10 @@
         @Override
         public String toString()
         {
-            return "frozen<" + ColumnIdentifier.maybeQuote(name) + '>';
+            if (type.isMultiCell())
+                return ColumnIdentifier.maybeQuote(name);
+            else
+                return "frozen<" + ColumnIdentifier.maybeQuote(name) + '>';
         }
     }
 
@@ -395,11 +397,6 @@
             return new Tuple(type);
         }
 
-        public boolean isCollection()
-        {
-            return false;
-        }
-
         public AbstractType<?> getType()
         {
             return type;
@@ -485,12 +482,7 @@
     {
         protected boolean frozen = false;
 
-        protected abstract boolean supportsFreezing();
-
-        public boolean isCollection()
-        {
-            return false;
-        }
+        public abstract boolean supportsFreezing();
 
         public boolean isFrozen()
         {
@@ -507,6 +499,11 @@
             return false;
         }
 
+        public boolean isUDT()
+        {
+            return false;
+        }
+
         public String keyspace()
         {
             return null;
@@ -588,7 +585,7 @@
                 return type;
             }
 
-            protected boolean supportsFreezing()
+            public boolean supportsFreezing()
             {
                 return false;
             }
@@ -627,7 +624,7 @@
                 frozen = true;
             }
 
-            protected boolean supportsFreezing()
+            public boolean supportsFreezing()
             {
                 return true;
             }
@@ -652,7 +649,7 @@
                 assert values != null : "Got null values type for a collection";
 
                 if (!frozen && values.supportsFreezing() && !values.frozen)
-                    throw new InvalidRequestException("Non-frozen collections are not allowed inside collections: " + this);
+                    throwNestedNonFrozenError(values);
 
                 // we represent Thrift supercolumns as maps, internally, and we do allow counters in supercolumns. Thus,
                 // for internal type parsing (think schema) we have to make an exception and allow counters as (map) values
@@ -664,22 +661,31 @@
                     if (keys.isCounter())
                         throw new InvalidRequestException("Counters are not allowed inside collections: " + this);
                     if (!frozen && keys.supportsFreezing() && !keys.frozen)
-                        throw new InvalidRequestException("Non-frozen collections are not allowed inside collections: " + this);
+                        throwNestedNonFrozenError(keys);
                 }
 
+                AbstractType<?> valueType = values.prepare(keyspace, udts).getType();
                 switch (kind)
                 {
                     case LIST:
-                        return new Collection(ListType.getInstance(values.prepare(keyspace, udts).getType(), !frozen));
+                        return new Collection(ListType.getInstance(valueType, !frozen));
                     case SET:
-                        return new Collection(SetType.getInstance(values.prepare(keyspace, udts).getType(), !frozen));
+                        return new Collection(SetType.getInstance(valueType, !frozen));
                     case MAP:
                         assert keys != null : "Got null keys type for a collection";
-                        return new Collection(MapType.getInstance(keys.prepare(keyspace, udts).getType(), values.prepare(keyspace, udts).getType(), !frozen));
+                        return new Collection(MapType.getInstance(keys.prepare(keyspace, udts).getType(), valueType, !frozen));
                 }
                 throw new AssertionError();
             }
 
+            private void throwNestedNonFrozenError(Raw innerType)
+            {
+                if (innerType instanceof RawCollection)
+                    throw new InvalidRequestException("Non-frozen collections are not allowed inside collections: " + this);
+                else
+                    throw new InvalidRequestException("Non-frozen UDTs are not allowed inside collections: " + this);
+            }
+
             public boolean referencesUserType(String name)
             {
                 return (keys != null && keys.referencesUserType(name)) || values.referencesUserType(name);
@@ -721,7 +727,7 @@
 
             public boolean canBeNonFrozen()
             {
-                return false;
+                return true;
             }
 
             public CQL3Type prepare(String keyspace, Types udts) throws InvalidRequestException
@@ -744,9 +750,8 @@
                 if (type == null)
                     throw new InvalidRequestException("Unknown type " + name);
 
-                if (!frozen)
-                    throw new InvalidRequestException("Non-frozen User-Defined types are not supported, please use frozen<>");
-
+                if (frozen)
+                    type = type.freeze();
                 return new UserDefined(name.toString(), type);
             }
 
@@ -755,7 +760,12 @@
                 return this.name.getStringTypeName().equals(name);
             }
 
-            protected boolean supportsFreezing()
+            public boolean supportsFreezing()
+            {
+                return true;
+            }
+
+            public boolean isUDT()
             {
                 return true;
             }
@@ -763,7 +773,10 @@
             @Override
             public String toString()
             {
-                return name.toString();
+                if (frozen)
+                    return "frozen<" + name.toString() + '>';
+                else
+                    return name.toString();
             }
         }
 
@@ -776,16 +789,11 @@
                 this.types = types;
             }
 
-            protected boolean supportsFreezing()
+            public boolean supportsFreezing()
             {
                 return true;
             }
 
-            public boolean isCollection()
-            {
-                return false;
-            }
-
             public void freeze() throws InvalidRequestException
             {
                 for (CQL3Type.Raw t : types)
diff --git a/src/java/org/apache/cassandra/cql3/ColumnCondition.java b/src/java/org/apache/cassandra/cql3/ColumnCondition.java
index b13e534..304f8bc 100644
--- a/src/java/org/apache/cassandra/cql3/ColumnCondition.java
+++ b/src/java/org/apache/cassandra/cql3/ColumnCondition.java
@@ -22,6 +22,7 @@
 
 import com.google.common.collect.Iterators;
 
+import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.ColumnDefinition;
 import org.apache.cassandra.cql3.functions.Function;
 import org.apache.cassandra.db.rows.*;
@@ -30,8 +31,6 @@
 import org.apache.cassandra.transport.Server;
 import org.apache.cassandra.utils.ByteBufferUtil;
 
-import static com.google.common.collect.Lists.newArrayList;
-
 /**
  * A CQL3 condition on the value of a column or collection element.  For example, "UPDATE .. IF a = 0".
  */
@@ -42,51 +41,71 @@
     // For collection, when testing the equality of a specific element, null otherwise.
     private final Term collectionElement;
 
+    // For UDT, when testing the equality of a specific field, null otherwise.
+    private final FieldIdentifier field;
+
     private final Term value;  // a single value or a marker for a list of IN values
     private final List<Term> inValues;
 
     public final Operator operator;
 
-    private ColumnCondition(ColumnDefinition column, Term collectionElement, Term value, List<Term> inValues, Operator op)
+    private ColumnCondition(ColumnDefinition column, Term collectionElement, FieldIdentifier field, Term value, List<Term> inValues, Operator op)
     {
         this.column = column;
         this.collectionElement = collectionElement;
+        this.field = field;
         this.value = value;
         this.inValues = inValues;
         this.operator = op;
 
+        assert field == null || collectionElement == null;
         if (operator != Operator.IN)
             assert this.inValues == null;
     }
 
     public static ColumnCondition condition(ColumnDefinition column, Term value, Operator op)
     {
-        return new ColumnCondition(column, null, value, null, op);
+        return new ColumnCondition(column, null, null, value, null, op);
     }
 
     public static ColumnCondition condition(ColumnDefinition column, Term collectionElement, Term value, Operator op)
     {
-        return new ColumnCondition(column, collectionElement, value, null, op);
+        return new ColumnCondition(column, collectionElement, null, value, null, op);
+    }
+
+    public static ColumnCondition condition(ColumnDefinition column, FieldIdentifier udtField, Term value, Operator op)
+    {
+        return new ColumnCondition(column, null, udtField, value, null, op);
     }
 
     public static ColumnCondition inCondition(ColumnDefinition column, List<Term> inValues)
     {
-        return new ColumnCondition(column, null, null, inValues, Operator.IN);
+        return new ColumnCondition(column, null, null, null, inValues, Operator.IN);
     }
 
     public static ColumnCondition inCondition(ColumnDefinition column, Term collectionElement, List<Term> inValues)
     {
-        return new ColumnCondition(column, collectionElement, null, inValues, Operator.IN);
+        return new ColumnCondition(column, collectionElement, null, null, inValues, Operator.IN);
+    }
+
+    public static ColumnCondition inCondition(ColumnDefinition column, FieldIdentifier udtField, List<Term> inValues)
+    {
+        return new ColumnCondition(column, null, udtField, null, inValues, Operator.IN);
     }
 
     public static ColumnCondition inCondition(ColumnDefinition column, Term inMarker)
     {
-        return new ColumnCondition(column, null, inMarker, null, Operator.IN);
+        return new ColumnCondition(column, null, null, inMarker, null, Operator.IN);
     }
 
     public static ColumnCondition inCondition(ColumnDefinition column, Term collectionElement, Term inMarker)
     {
-        return new ColumnCondition(column, collectionElement, inMarker, null, Operator.IN);
+        return new ColumnCondition(column, collectionElement, null, inMarker, null, Operator.IN);
+    }
+
+    public static ColumnCondition inCondition(ColumnDefinition column, FieldIdentifier udtField, Term inMarker)
+    {
+        return new ColumnCondition(column, null, udtField, inMarker, null, Operator.IN);
     }
 
     public void addFunctionsTo(List<Function> functions)
@@ -128,11 +147,19 @@
         boolean isInCondition = operator == Operator.IN;
         if (column.type instanceof CollectionType)
         {
-            if (collectionElement == null)
-                return isInCondition ? new CollectionInBound(this, options) : new CollectionBound(this, options);
-            else
+            if (collectionElement != null)
                 return isInCondition ? new ElementAccessInBound(this, options) : new ElementAccessBound(this, options);
+            else
+                return isInCondition ? new CollectionInBound(this, options) : new CollectionBound(this, options);
         }
+        else if (column.type.isUDT())
+        {
+            if (field != null)
+                return isInCondition ? new UDTFieldAccessInBound(this, options) : new UDTFieldAccessBound(this, options);
+            else
+                return isInCondition ? new UDTInBound(this, options) : new UDTBound(this, options);
+        }
+
         return isInCondition ? new SimpleInBound(this, options) : new SimpleBound(this, options);
     }
 
@@ -213,6 +240,35 @@
         return complexData == null ? Collections.<Cell>emptyIterator() : complexData.iterator();
     }
 
+    private static boolean evaluateComparisonWithOperator(int comparison, Operator operator)
+    {
+        // called when comparison != 0
+        switch (operator)
+        {
+            case EQ:
+                return false;
+            case LT:
+            case LTE:
+                return comparison < 0;
+            case GT:
+            case GTE:
+                return comparison > 0;
+            case NEQ:
+                return true;
+            default:
+                throw new AssertionError();
+        }
+    }
+
+    private static ByteBuffer cellValueAtIndex(Iterator<Cell> iter, int index)
+    {
+        int adv = Iterators.advance(iter, index);
+        if (adv == index && iter.hasNext())
+            return iter.next().value();
+        else
+            return null;
+    }
+
     /**
      * A condition on a single non-collection column. This does not support IN operators (see SimpleInBound).
      */
@@ -223,7 +279,7 @@
         private SimpleBound(ColumnCondition condition, QueryOptions options) throws InvalidRequestException
         {
             super(condition.column, condition.operator);
-            assert !(column.type instanceof CollectionType) && condition.collectionElement == null;
+            assert !(column.type instanceof CollectionType) && condition.field == null;
             assert condition.operator != Operator.IN;
             this.value = condition.value.bindAndGet(options);
         }
@@ -244,7 +300,7 @@
         private SimpleInBound(ColumnCondition condition, QueryOptions options) throws InvalidRequestException
         {
             super(condition.column, condition.operator);
-            assert !(column.type instanceof CollectionType) && condition.collectionElement == null;
+            assert !(column.type instanceof CollectionType) && condition.field == null;
             assert condition.operator == Operator.IN;
             if (condition.inValues == null)
                 this.inValues = ((Lists.Value) condition.value.bind(options)).getElements();
@@ -308,7 +364,7 @@
             ListType listType = (ListType) column.type;
             if (column.type.isMultiCell())
             {
-                ByteBuffer columnValue = getListItem(getCells(row, column), getListIndex(collectionElement));
+                ByteBuffer columnValue = cellValueAtIndex(getCells(row, column), getListIndex(collectionElement));
                 return compareWithOperator(operator, ((ListType)column.type).getElementsType(), value, columnValue);
             }
             else
@@ -327,15 +383,6 @@
             return idx;
         }
 
-        static ByteBuffer getListItem(Iterator<Cell> iter, int index)
-        {
-            int adv = Iterators.advance(iter, index);
-            if (adv == index && iter.hasNext())
-                return iter.next().value();
-            else
-                return null;
-        }
-
         public ByteBuffer getCollectionElementValue()
         {
             return collectionElement;
@@ -368,71 +415,46 @@
             if (collectionElement == null)
                 throw new InvalidRequestException("Invalid null value for " + (column.type instanceof MapType ? "map" : "list") + " element access");
 
+            ByteBuffer cellValue;
+            AbstractType<?> valueType;
             if (column.type instanceof MapType)
             {
                 MapType mapType = (MapType) column.type;
-                AbstractType<?> valueType = mapType.getValuesType();
+                valueType = mapType.getValuesType();
                 if (column.type.isMultiCell())
                 {
-                    Cell item = getCell(row, column, CellPath.create(collectionElement));
-                    for (ByteBuffer value : inValues)
-                    {
-                        if (isSatisfiedByValue(value, item, valueType, Operator.EQ))
-                            return true;
-                    }
-                    return false;
+                    Cell cell = getCell(row, column, CellPath.create(collectionElement));
+                    cellValue = cell == null ? null : cell.value();
                 }
                 else
                 {
                     Cell cell = getCell(row, column);
-                    ByteBuffer mapElementValue = cell == null
-                                               ? null
-                                               : mapType.getSerializer().getSerializedValue(cell.value(), collectionElement, mapType.getKeysType());
-                    for (ByteBuffer value : inValues)
-                    {
-                        if (value == null)
-                        {
-                            if (mapElementValue == null)
-                                return true;
-                            continue;
-                        }
-                        if (valueType.compare(value, mapElementValue) == 0)
-                            return true;
-                    }
-                    return false;
+                    cellValue = cell == null
+                              ? null
+                              : mapType.getSerializer().getSerializedValue(cell.value(), collectionElement, mapType.getKeysType());
+                }
+            }
+            else // ListType
+            {
+                ListType listType = (ListType) column.type;
+                valueType = listType.getElementsType();
+                if (column.type.isMultiCell())
+                {
+                    cellValue = cellValueAtIndex(getCells(row, column), ElementAccessBound.getListIndex(collectionElement));
+                }
+                else
+                {
+                    Cell cell = getCell(row, column);
+                    cellValue = cell == null
+                              ? null
+                              : listType.getSerializer().getElement(cell.value(), ElementAccessBound.getListIndex(collectionElement));
                 }
             }
 
-            ListType listType = (ListType) column.type;
-            AbstractType<?> elementsType = listType.getElementsType();
-            if (column.type.isMultiCell())
+            for (ByteBuffer value : inValues)
             {
-                ByteBuffer columnValue = ElementAccessBound.getListItem(getCells(row, column), ElementAccessBound.getListIndex(collectionElement));
-
-                for (ByteBuffer value : inValues)
-                {
-                    if (compareWithOperator(Operator.EQ, elementsType, value, columnValue))
-                        return true;
-                }
-            }
-            else
-            {
-                Cell cell = getCell(row, column);
-                ByteBuffer listElementValue = cell == null
-                                            ? null
-                                            : listType.getSerializer().getElement(cell.value(), ElementAccessBound.getListIndex(collectionElement));
-
-                for (ByteBuffer value : inValues)
-                {
-                    if (value == null)
-                    {
-                        if (listElementValue == null)
-                            return true;
-                        continue;
-                    }
-                    if (elementsType.compare(value, listElementValue) == 0)
-                        return true;
-                }
+                if (compareWithOperator(Operator.EQ, valueType, value, cellValue))
+                    return true;
             }
             return false;
         }
@@ -536,26 +558,6 @@
             return operator == Operator.EQ || operator == Operator.LTE || operator == Operator.GTE;
         }
 
-        private static boolean evaluateComparisonWithOperator(int comparison, Operator operator)
-        {
-            // called when comparison != 0
-            switch (operator)
-            {
-                case EQ:
-                    return false;
-                case LT:
-                case LTE:
-                    return comparison < 0;
-                case GT:
-                case GTE:
-                    return comparison > 0;
-                case NEQ:
-                    return true;
-                default:
-                    throw new AssertionError();
-            }
-        }
-
         static boolean listAppliesTo(ListType type, Iterator<Cell> iter, List<ByteBuffer> elements, Operator operator)
         {
             return setOrListAppliesTo(type.getElementsType(), iter, elements.iterator(), operator, false);
@@ -688,6 +690,195 @@
         }
     }
 
+    /** A condition on a UDT field. IN operators are not supported here, see UDTFieldAccessInBound. */
+    static class UDTFieldAccessBound extends Bound
+    {
+        public final FieldIdentifier field;
+        public final ByteBuffer value;
+
+        private UDTFieldAccessBound(ColumnCondition condition, QueryOptions options) throws InvalidRequestException
+        {
+            super(condition.column, condition.operator);
+            assert column.type.isUDT() && condition.field != null;
+            assert condition.operator != Operator.IN;
+            this.field = condition.field;
+            this.value = condition.value.bindAndGet(options);
+        }
+
+        public boolean appliesTo(Row row) throws InvalidRequestException
+        {
+            UserType userType = (UserType) column.type;
+            int fieldPosition = userType.fieldPosition(field);
+            assert fieldPosition >= 0;
+
+            ByteBuffer cellValue;
+            if (column.type.isMultiCell())
+            {
+                Cell cell = getCell(row, column, userType.cellPathForField(field));
+                cellValue = cell == null ? null : cell.value();
+            }
+            else
+            {
+                Cell cell = getCell(row, column);
+                cellValue = cell == null
+                          ? null
+                          : userType.split(cell.value())[fieldPosition];
+            }
+            return compareWithOperator(operator, userType.fieldType(fieldPosition), value, cellValue);
+        }
+    }
+
+    /** An IN condition on a UDT field.  For example: IF user.name IN ('a', 'b') */
+    static class UDTFieldAccessInBound extends Bound
+    {
+        public final FieldIdentifier field;
+        public final List<ByteBuffer> inValues;
+
+        private UDTFieldAccessInBound(ColumnCondition condition, QueryOptions options) throws InvalidRequestException
+        {
+            super(condition.column, condition.operator);
+            assert column.type.isUDT() && condition.field != null;
+            this.field = condition.field;
+
+            if (condition.inValues == null)
+                this.inValues = ((Lists.Value) condition.value.bind(options)).getElements();
+            else
+            {
+                this.inValues = new ArrayList<>(condition.inValues.size());
+                for (Term value : condition.inValues)
+                    this.inValues.add(value.bindAndGet(options));
+            }
+        }
+
+        public boolean appliesTo(Row row) throws InvalidRequestException
+        {
+            UserType userType = (UserType) column.type;
+            int fieldPosition = userType.fieldPosition(field);
+            assert fieldPosition >= 0;
+
+            ByteBuffer cellValue;
+            if (column.type.isMultiCell())
+            {
+                Cell cell = getCell(row, column, userType.cellPathForField(field));
+                cellValue = cell == null ? null : cell.value();
+            }
+            else
+            {
+                Cell cell = getCell(row, column);
+                cellValue = cell == null ? null : userType.split(cell.value())[fieldPosition];
+            }
+
+            AbstractType<?> valueType = userType.fieldType(fieldPosition);
+            for (ByteBuffer value : inValues)
+            {
+                if (compareWithOperator(Operator.EQ, valueType, value, cellValue))
+                    return true;
+            }
+            return false;
+        }
+    }
+
+    /** A non-IN condition on an entire UDT.  For example: IF user = {name: 'joe', age: 42}. */
+    static class UDTBound extends Bound
+    {
+        private final ByteBuffer value;
+        private final int protocolVersion;
+
+        private UDTBound(ColumnCondition condition, QueryOptions options) throws InvalidRequestException
+        {
+            super(condition.column, condition.operator);
+            assert column.type.isUDT() && condition.field == null;
+            assert condition.operator != Operator.IN;
+            protocolVersion = options.getProtocolVersion();
+            value = condition.value.bindAndGet(options);
+        }
+
+        public boolean appliesTo(Row row) throws InvalidRequestException
+        {
+            UserType userType = (UserType) column.type;
+            ByteBuffer rowValue;
+            if (userType.isMultiCell())
+            {
+                Iterator<Cell> iter = getCells(row, column);
+                rowValue = iter.hasNext() ? userType.serializeForNativeProtocol(iter, protocolVersion) : null;
+            }
+            else
+            {
+                Cell cell = getCell(row, column);
+                rowValue = cell == null ? null : cell.value();
+            }
+
+            if (value == null)
+            {
+                if (operator == Operator.EQ)
+                    return rowValue == null;
+                else if (operator == Operator.NEQ)
+                    return rowValue != null;
+                else
+                    throw new InvalidRequestException(String.format("Invalid comparison with null for operator \"%s\"", operator));
+            }
+
+            return compareWithOperator(operator, userType, value, rowValue);
+        }
+    }
+
+    /** An IN condition on an entire UDT.  For example: IF user IN ({name: 'joe', age: 42}, {name: 'bob', age: 23}). */
+    public static class UDTInBound extends Bound
+    {
+        private final List<ByteBuffer> inValues;
+        private final int protocolVersion;
+
+        private UDTInBound(ColumnCondition condition, QueryOptions options) throws InvalidRequestException
+        {
+            super(condition.column, condition.operator);
+            assert column.type.isUDT() && condition.field == null;
+            assert condition.operator == Operator.IN;
+            protocolVersion = options.getProtocolVersion();
+            inValues = new ArrayList<>();
+            if (condition.inValues == null)
+            {
+                Lists.Marker inValuesMarker = (Lists.Marker) condition.value;
+                for (ByteBuffer buffer : ((Lists.Value)inValuesMarker.bind(options)).elements)
+                    this.inValues.add(buffer);
+            }
+            else
+            {
+                for (Term value : condition.inValues)
+                    this.inValues.add(value.bindAndGet(options));
+            }
+        }
+
+        public boolean appliesTo(Row row) throws InvalidRequestException
+        {
+            UserType userType = (UserType) column.type;
+            ByteBuffer rowValue;
+            if (userType.isMultiCell())
+            {
+                Iterator<Cell> cells = getCells(row, column);
+                rowValue = cells.hasNext() ? userType.serializeForNativeProtocol(cells, protocolVersion) : null;
+            }
+            else
+            {
+                Cell cell = getCell(row, column);
+                rowValue = cell == null ? null : cell.value();
+            }
+
+            for (ByteBuffer value : inValues)
+            {
+                if (value == null || rowValue == null)
+                {
+                    if (value == rowValue) // both null
+                        return true;
+                }
+                else if (userType.compare(value, rowValue) == 0)
+                {
+                    return true;
+                }
+            }
+            return false;
+        }
+    }
+
     public static class Raw
     {
         private final Term.Raw value;
@@ -697,59 +888,140 @@
         // Can be null, only used with the syntax "IF m[e] = ..." (in which case it's 'e')
         private final Term.Raw collectionElement;
 
+        // Can be null, only used with the syntax "IF udt.field = ..." (in which case it's 'field')
+        private final FieldIdentifier udtField;
+
         private final Operator operator;
 
-        private Raw(Term.Raw value, List<Term.Raw> inValues, AbstractMarker.INRaw inMarker, Term.Raw collectionElement, Operator op)
+        private Raw(Term.Raw value, List<Term.Raw> inValues, AbstractMarker.INRaw inMarker, Term.Raw collectionElement,
+                    FieldIdentifier udtField, Operator op)
         {
             this.value = value;
             this.inValues = inValues;
             this.inMarker = inMarker;
             this.collectionElement = collectionElement;
+            this.udtField = udtField;
             this.operator = op;
         }
 
         /** A condition on a column. For example: "IF col = 'foo'" */
         public static Raw simpleCondition(Term.Raw value, Operator op)
         {
-            return new Raw(value, null, null, null, op);
+            return new Raw(value, null, null, null, null, op);
         }
 
         /** An IN condition on a column. For example: "IF col IN ('foo', 'bar', ...)" */
         public static Raw simpleInCondition(List<Term.Raw> inValues)
         {
-            return new Raw(null, inValues, null, null, Operator.IN);
+            return new Raw(null, inValues, null, null, null, Operator.IN);
         }
 
         /** An IN condition on a column with a single marker. For example: "IF col IN ?" */
         public static Raw simpleInCondition(AbstractMarker.INRaw inMarker)
         {
-            return new Raw(null, null, inMarker, null, Operator.IN);
+            return new Raw(null, null, inMarker, null, null, Operator.IN);
         }
 
         /** A condition on a collection element. For example: "IF col['key'] = 'foo'" */
         public static Raw collectionCondition(Term.Raw value, Term.Raw collectionElement, Operator op)
         {
-            return new Raw(value, null, null, collectionElement, op);
+            return new Raw(value, null, null, collectionElement, null, op);
         }
 
         /** An IN condition on a collection element. For example: "IF col['key'] IN ('foo', 'bar', ...)" */
         public static Raw collectionInCondition(Term.Raw collectionElement, List<Term.Raw> inValues)
         {
-            return new Raw(null, inValues, null, collectionElement, Operator.IN);
+            return new Raw(null, inValues, null, collectionElement, null, Operator.IN);
         }
 
         /** An IN condition on a collection element with a single marker. For example: "IF col['key'] IN ?" */
         public static Raw collectionInCondition(Term.Raw collectionElement, AbstractMarker.INRaw inMarker)
         {
-            return new Raw(null, null, inMarker, collectionElement, Operator.IN);
+            return new Raw(null, null, inMarker, collectionElement, null, Operator.IN);
         }
 
-        public ColumnCondition prepare(String keyspace, ColumnDefinition receiver) throws InvalidRequestException
+        /** A condition on a UDT field. For example: "IF col.field = 'foo'" */
+        public static Raw udtFieldCondition(Term.Raw value, FieldIdentifier udtField, Operator op)
+        {
+            return new Raw(value, null, null, null, udtField, op);
+        }
+
+        /** An IN condition on a collection element. For example: "IF col.field IN ('foo', 'bar', ...)" */
+        public static Raw udtFieldInCondition(FieldIdentifier udtField, List<Term.Raw> inValues)
+        {
+            return new Raw(null, inValues, null, null, udtField, Operator.IN);
+        }
+
+        /** An IN condition on a collection element with a single marker. For example: "IF col.field IN ?" */
+        public static Raw udtFieldInCondition(FieldIdentifier udtField, AbstractMarker.INRaw inMarker)
+        {
+            return new Raw(null, null, inMarker, null, udtField, Operator.IN);
+        }
+
+        public ColumnCondition prepare(String keyspace, ColumnDefinition receiver, CFMetaData cfm) throws InvalidRequestException
         {
             if (receiver.type instanceof CounterColumnType)
                 throw new InvalidRequestException("Conditions on counters are not supported");
 
-            if (collectionElement == null)
+            if (collectionElement != null)
+            {
+                if (!(receiver.type.isCollection()))
+                    throw new InvalidRequestException(String.format("Invalid element access syntax for non-collection column %s", receiver.name));
+
+                ColumnSpecification elementSpec, valueSpec;
+                switch ((((CollectionType) receiver.type).kind))
+                {
+                    case LIST:
+                        elementSpec = Lists.indexSpecOf(receiver);
+                        valueSpec = Lists.valueSpecOf(receiver);
+                        break;
+                    case MAP:
+                        elementSpec = Maps.keySpecOf(receiver);
+                        valueSpec = Maps.valueSpecOf(receiver);
+                        break;
+                    case SET:
+                        throw new InvalidRequestException(String.format("Invalid element access syntax for set column %s", receiver.name));
+                    default:
+                        throw new AssertionError();
+                }
+                if (operator == Operator.IN)
+                {
+                    if (inValues == null)
+                        return ColumnCondition.inCondition(receiver, collectionElement.prepare(keyspace, elementSpec), inMarker.prepare(keyspace, valueSpec));
+                    List<Term> terms = new ArrayList<>(inValues.size());
+                    for (Term.Raw value : inValues)
+                        terms.add(value.prepare(keyspace, valueSpec));
+                    return ColumnCondition.inCondition(receiver, collectionElement.prepare(keyspace, elementSpec), terms);
+                }
+                else
+                {
+                    return ColumnCondition.condition(receiver, collectionElement.prepare(keyspace, elementSpec), value.prepare(keyspace, valueSpec), operator);
+                }
+            }
+            else if (udtField != null)
+            {
+                UserType userType = (UserType) receiver.type;
+                int fieldPosition = userType.fieldPosition(udtField);
+                if (fieldPosition == -1)
+                    throw new InvalidRequestException(String.format("Unknown field %s for column %s", udtField, receiver.name));
+
+                ColumnSpecification fieldReceiver = UserTypes.fieldSpecOf(receiver, fieldPosition);
+                if (operator == Operator.IN)
+                {
+                    if (inValues == null)
+                        return ColumnCondition.inCondition(receiver, udtField, inMarker.prepare(keyspace, fieldReceiver));
+
+                    List<Term> terms = new ArrayList<>(inValues.size());
+                    for (Term.Raw value : inValues)
+                        terms.add(value.prepare(keyspace, fieldReceiver));
+                    return ColumnCondition.inCondition(receiver, udtField, terms);
+                }
+                else
+                {
+                    return ColumnCondition.condition(receiver, udtField, value.prepare(keyspace, fieldReceiver), operator);
+                }
+            }
+            else
             {
                 if (operator == Operator.IN)
                 {
@@ -765,39 +1037,6 @@
                     return ColumnCondition.condition(receiver, value.prepare(keyspace, receiver), operator);
                 }
             }
-
-            if (!(receiver.type.isCollection()))
-                throw new InvalidRequestException(String.format("Invalid element access syntax for non-collection column %s", receiver.name));
-
-            ColumnSpecification elementSpec, valueSpec;
-            switch ((((CollectionType)receiver.type).kind))
-            {
-                case LIST:
-                    elementSpec = Lists.indexSpecOf(receiver);
-                    valueSpec = Lists.valueSpecOf(receiver);
-                    break;
-                case MAP:
-                    elementSpec = Maps.keySpecOf(receiver);
-                    valueSpec = Maps.valueSpecOf(receiver);
-                    break;
-                case SET:
-                    throw new InvalidRequestException(String.format("Invalid element access syntax for set column %s", receiver.name));
-                default:
-                    throw new AssertionError();
-            }
-            if (operator == Operator.IN)
-            {
-                if (inValues == null)
-                    return ColumnCondition.inCondition(receiver, collectionElement.prepare(keyspace, elementSpec), inMarker.prepare(keyspace, valueSpec));
-                List<Term> terms = new ArrayList<>(inValues.size());
-                for (Term.Raw value : inValues)
-                    terms.add(value.prepare(keyspace, valueSpec));
-                return ColumnCondition.inCondition(receiver, collectionElement.prepare(keyspace, elementSpec), terms);
-            }
-            else
-            {
-                return ColumnCondition.condition(receiver, collectionElement.prepare(keyspace, elementSpec), value.prepare(keyspace, valueSpec), operator);
-            }
         }
     }
 }
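
The ColumnCondition changes above add lightweight-transaction conditions on user-defined types: a single field of a non-frozen UDT can be tested ("IF udt_col.field = ..."), as can the whole UDT value, in both comparison and IN forms. The sketch below is illustrative only and not part of the patch; it simply pairs the new public factory methods with the CQL shapes they represent. The column definition and prepared terms are assumed to come from the caller, and the helper class name is hypothetical.

import java.util.List;

import org.apache.cassandra.config.ColumnDefinition;
import org.apache.cassandra.cql3.ColumnCondition;
import org.apache.cassandra.cql3.FieldIdentifier;
import org.apache.cassandra.cql3.Operator;
import org.apache.cassandra.cql3.Term;

// Hypothetical helper, for illustration only.
final class UdtConditionExamples
{
    // IF udt_col.name = <value>            -> bound as UDTFieldAccessBound
    static ColumnCondition fieldEquals(ColumnDefinition udtColumn, Term value)
    {
        return ColumnCondition.condition(udtColumn, FieldIdentifier.forUnquoted("name"), value, Operator.EQ);
    }

    // IF udt_col.name IN (<v1>, <v2>, ...) -> bound as UDTFieldAccessInBound
    static ColumnCondition fieldIn(ColumnDefinition udtColumn, List<Term> inValues)
    {
        return ColumnCondition.inCondition(udtColumn, FieldIdentifier.forUnquoted("name"), inValues);
    }

    // IF udt_col = {name: ..., ...}        -> bound as UDTBound
    static ColumnCondition wholeUdtEquals(ColumnDefinition udtColumn, Term value)
    {
        return ColumnCondition.condition(udtColumn, value, Operator.EQ);
    }
}
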
diff --git a/src/java/org/apache/cassandra/cql3/ColumnIdentifier.java b/src/java/org/apache/cassandra/cql3/ColumnIdentifier.java
index 93734e9..467c672 100644
--- a/src/java/org/apache/cassandra/cql3/ColumnIdentifier.java
+++ b/src/java/org/apache/cassandra/cql3/ColumnIdentifier.java
@@ -24,14 +24,13 @@
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
 
+import com.google.common.annotations.VisibleForTesting;
 import com.google.common.collect.MapMaker;
 
 import org.apache.cassandra.cache.IMeasurableMemory;
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.ColumnDefinition;
 import org.apache.cassandra.cql3.selection.Selectable;
-import org.apache.cassandra.cql3.selection.Selector;
-import org.apache.cassandra.cql3.selection.SimpleSelector;
 import org.apache.cassandra.db.marshal.AbstractType;
 import org.apache.cassandra.exceptions.InvalidRequestException;
 import org.apache.cassandra.db.marshal.UTF8Type;
@@ -43,10 +42,11 @@
  * Represents an identifer for a CQL column definition.
  * TODO : should support light-weight mode without text representation for when not interned
  */
-public class ColumnIdentifier extends Selectable implements IMeasurableMemory, Comparable<ColumnIdentifier>
+public class ColumnIdentifier implements IMeasurableMemory, Comparable<ColumnIdentifier>
 {
     private static final Pattern PATTERN_DOUBLE_QUOTE = Pattern.compile("\"", Pattern.LITERAL);
-
+    private static final String ESCAPED_DOUBLE_QUOTE = Matcher.quoteReplacement("\"\"");
+
     public final ByteBuffer bytes;
     private final String text;
     /**
@@ -155,7 +155,7 @@
 
     /**
      * Returns a string representation of the identifier that is safe to use directly in CQL queries.
-     * In necessary, the string will be double-quoted, and any quotes inside the string will be escaped.
+     * If necessary, the string will be double-quoted, and any quotes inside the string will be escaped.
      */
     public String toCQLString()
     {
@@ -181,15 +181,6 @@
         return interned ? this : new ColumnIdentifier(allocator.clone(bytes), text, false);
     }
 
-    public Selector.Factory newSelectorFactory(CFMetaData cfm, List<ColumnDefinition> defs) throws InvalidRequestException
-    {
-        ColumnDefinition def = cfm.getColumnDefinition(this);
-        if (def == null)
-            throw new InvalidRequestException(String.format("Undefined name %s in selection clause", this));
-
-        return SimpleSelector.newFactory(def, addAndGetIndex(def, defs));
-    }
-
     public int compareTo(ColumnIdentifier that)
     {
         int c = Long.compare(this.prefixComparison, that.prefixComparison);
@@ -200,138 +191,11 @@
         return ByteBufferUtil.compareUnsigned(this.bytes, that.bytes);
     }
 
-    /**
-     * Because Thrift-created tables may have a non-text comparator, we cannot determine the proper 'key' until
-     * we know the comparator. ColumnIdentifier.Raw is a placeholder that can be converted to a real ColumnIdentifier
-     * once the comparator is known with prepare(). This should only be used with identifiers that are actual
-     * column names. See CASSANDRA-8178 for more background.
-     */
-    public static interface Raw extends Selectable.Raw
-    {
-
-        public ColumnIdentifier prepare(CFMetaData cfm);
-
-        /**
-         * Returns a string representation of the identifier that is safe to use directly in CQL queries.
-         * In necessary, the string will be double-quoted, and any quotes inside the string will be escaped.
-         */
-        public String toCQLString();
-    }
-
-    public static class Literal implements Raw
-    {
-        private final String rawText;
-        private final String text;
-
-        public Literal(String rawText, boolean keepCase)
-        {
-            this.rawText = rawText;
-            this.text =  keepCase ? rawText : rawText.toLowerCase(Locale.US);
-        }
-
-        public ColumnIdentifier prepare(CFMetaData cfm)
-        {
-            if (!cfm.isStaticCompactTable())
-                return getInterned(text, true);
-
-            AbstractType<?> thriftColumnNameType = cfm.thriftColumnNameType();
-            if (thriftColumnNameType instanceof UTF8Type)
-                return getInterned(text, true);
-
-            // We have a Thrift-created table with a non-text comparator. Check if we have a match column, otherwise assume we should use
-            // thriftColumnNameType
-            ByteBuffer bufferName = ByteBufferUtil.bytes(text);
-            for (ColumnDefinition def : cfm.allColumns())
-            {
-                if (def.name.bytes.equals(bufferName))
-                    return def.name;
-            }
-            return getInterned(thriftColumnNameType.fromString(rawText), text);
-        }
-
-        public boolean processesSelection()
-        {
-            return false;
-        }
-
-        @Override
-        public final int hashCode()
-        {
-            return text.hashCode();
-        }
-
-        @Override
-        public final boolean equals(Object o)
-        {
-            if(!(o instanceof Literal))
-                return false;
-
-            Literal that = (Literal) o;
-            return text.equals(that.text);
-        }
-
-        @Override
-        public String toString()
-        {
-            return text;
-        }
-
-        public String toCQLString()
-        {
-            return maybeQuote(text);
-        }
-    }
-
-    public static class ColumnIdentifierValue implements Raw
-    {
-        private final ColumnIdentifier identifier;
-
-        public ColumnIdentifierValue(ColumnIdentifier identifier)
-        {
-            this.identifier = identifier;
-        }
-
-        public ColumnIdentifier prepare(CFMetaData cfm)
-        {
-            return identifier;
-        }
-
-        public boolean processesSelection()
-        {
-            return false;
-        }
-
-        @Override
-        public final int hashCode()
-        {
-            return identifier.hashCode();
-        }
-
-        @Override
-        public final boolean equals(Object o)
-        {
-            if(!(o instanceof ColumnIdentifierValue))
-                return false;
-            ColumnIdentifierValue that = (ColumnIdentifierValue) o;
-            return identifier.equals(that.identifier);
-        }
-
-        @Override
-        public String toString()
-        {
-            return identifier.toString();
-        }
-
-        public String toCQLString()
-        {
-            return maybeQuote(identifier.text);
-        }
-    }
-
-    static String maybeQuote(String text)
+    @VisibleForTesting
+    public static String maybeQuote(String text)
     {
         if (UNQUOTED_IDENTIFIER.matcher(text).matches())
             return text;
-        return '"' + PATTERN_DOUBLE_QUOTE.matcher(text).replaceAll(Matcher.quoteReplacement("\"\"")) + '"';
+        return '"' + PATTERN_DOUBLE_QUOTE.matcher(text).replaceAll(ESCAPED_DOUBLE_QUOTE) + '"';
     }
 }
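
With maybeQuote now public (and marked @VisibleForTesting), its behaviour can be exercised directly. The following minimal sketch is not part of the patch and assumes UNQUOTED_IDENTIFIER, which is not shown in this hunk, matches plain lower-case identifiers; the class name is hypothetical.

import org.apache.cassandra.cql3.ColumnIdentifier;

// Minimal sketch, assuming UNQUOTED_IDENTIFIER matches plain lower-case identifiers.
public class MaybeQuoteExample
{
    public static void main(String[] args)
    {
        System.out.println(ColumnIdentifier.maybeQuote("simple_name")); // simple_name (no quoting needed)
        System.out.println(ColumnIdentifier.maybeQuote("MixedCase"));   // "MixedCase"
        System.out.println(ColumnIdentifier.maybeQuote("odd\"name"));   // "odd""name" (inner quote doubled)
    }
}
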
diff --git a/src/java/org/apache/cassandra/cql3/ColumnSpecification.java b/src/java/org/apache/cassandra/cql3/ColumnSpecification.java
index e64f5f9..8cf869b 100644
--- a/src/java/org/apache/cassandra/cql3/ColumnSpecification.java
+++ b/src/java/org/apache/cassandra/cql3/ColumnSpecification.java
@@ -17,6 +17,7 @@
  */
 package org.apache.cassandra.cql3;
 
+import com.google.common.base.MoreObjects;
 import com.google.common.base.Objects;
 
 import org.apache.cassandra.db.marshal.AbstractType;
@@ -96,9 +97,9 @@
     @Override
     public String toString()
     {
-        return Objects.toStringHelper(this)
-                      .add("name", name)
-                      .add("type", type)
-                      .toString();
+        return MoreObjects.toStringHelper(this)
+                          .add("name", name)
+                          .add("type", type)
+                          .toString();
     }
 }
diff --git a/src/java/org/apache/cassandra/cql3/Constants.java b/src/java/org/apache/cassandra/cql3/Constants.java
index a2bacdf..913ea97 100644
--- a/src/java/org/apache/cassandra/cql3/Constants.java
+++ b/src/java/org/apache/cassandra/cql3/Constants.java
@@ -23,11 +23,7 @@
 import org.slf4j.LoggerFactory;
 import org.apache.cassandra.config.ColumnDefinition;
 import org.apache.cassandra.db.*;
-import org.apache.cassandra.db.marshal.AbstractType;
-import org.apache.cassandra.db.marshal.BytesType;
-import org.apache.cassandra.db.marshal.CounterColumnType;
-import org.apache.cassandra.db.marshal.LongType;
-import org.apache.cassandra.db.marshal.ReversedType;
+import org.apache.cassandra.db.marshal.*;
 import org.apache.cassandra.exceptions.InvalidRequestException;
 import org.apache.cassandra.serializers.MarshalException;
 import org.apache.cassandra.utils.ByteBufferUtil;
@@ -41,7 +37,7 @@
 
     public enum Type
     {
-        STRING, INTEGER, UUID, FLOAT, DATE, TIME, BOOLEAN, HEX;
+        STRING, INTEGER, UUID, FLOAT, BOOLEAN, HEX;
     }
 
     public static final Value UNSET_VALUE = new Value(ByteBufferUtil.UNSET_BYTE_BUFFER);
@@ -67,6 +63,11 @@
         {
             return "NULL";
         }
+
+        public AbstractType<?> getExactTypeIfKnown(String keyspace)
+        {
+            return null;
+        }
     }
 
     public static final NullLiteral NULL_LITERAL = new NullLiteral();
@@ -159,10 +160,11 @@
             }
         }
 
+        @Override
         public AssignmentTestable.TestResult testAssignment(String keyspace, ColumnSpecification receiver)
         {
             CQL3Type receiverType = receiver.type.asCQL3Type();
-            if (receiverType.isCollection())
+            if (receiverType.isCollection() || receiverType.isUDT())
                 return AssignmentTestable.TestResult.NOT_ASSIGNABLE;
 
             if (!(receiverType instanceof CQL3Type.Native))
@@ -238,6 +240,16 @@
             return AssignmentTestable.TestResult.NOT_ASSIGNABLE;
         }
 
+        public AbstractType<?> getExactTypeIfKnown(String keyspace)
+        {
+            // Most constants are valid for more than one type (the extreme example being integer constants, which can
+            // be used for any numerical type, including date, time, ...) so they don't have an exact type. And in fact,
+            // for better or worse, any literal is valid for custom types, so we can never claim an exact type.
+            // But really, the reason it's fine to return null here is that getExactTypeIfKnown is only used to
+            // implement testAssignment() in Selectable, and that method is overridden above.
+            return null;
+        }
+
         public String getRawText()
         {
             return text;
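
The comment above explains why constants never report an exact type. The sketch below is illustrative only; it assumes Constants.Literal.integer(String) is the grammar-level factory for integer constants (that factory is not shown in this hunk), and the class name is hypothetical.

import org.apache.cassandra.cql3.Constants;
import org.apache.cassandra.db.marshal.AbstractType;

// Illustration of the comment above; Constants.Literal.integer(String) is assumed here.
public class ConstantTypeExample
{
    public static void main(String[] args)
    {
        AbstractType<?> exact = Constants.Literal.integer("42").getExactTypeIfKnown("ks");
        System.out.println(exact); // null: 42 is assignable to int, bigint, varint, ... so no single exact type
    }
}
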
diff --git a/src/java/org/apache/cassandra/cql3/FieldIdentifier.java b/src/java/org/apache/cassandra/cql3/FieldIdentifier.java
new file mode 100644
index 0000000..5e0601c
--- /dev/null
+++ b/src/java/org/apache/cassandra/cql3/FieldIdentifier.java
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.cql3;
+
+import java.util.Locale;
+import java.nio.ByteBuffer;
+
+import org.apache.cassandra.db.marshal.UTF8Type;
+import org.apache.cassandra.exceptions.SyntaxException;
+import org.apache.cassandra.utils.ByteBufferUtil;
+import org.apache.cassandra.serializers.MarshalException;
+
+/**
+ * Identifies a field in a UDT.
+ */
+public class FieldIdentifier
+{
+    public final ByteBuffer bytes;
+
+    public FieldIdentifier(ByteBuffer bytes)
+    {
+        this.bytes = bytes;
+    }
+
+    /**
+     * Creates a {@code FieldIdentifier} from an unquoted identifier string.
+     */
+    public static FieldIdentifier forUnquoted(String text)
+    {
+        return new FieldIdentifier(convert(text.toLowerCase(Locale.US)));
+    }
+
+    /**
+     * Creates a {@code FieldIdentifier} from a quoted identifier string.
+     */
+    public static FieldIdentifier forQuoted(String text)
+    {
+        return new FieldIdentifier(convert(text));
+    }
+
+    /**
+     * Creates a {@code FieldIdentifier} from an internal string.
+     */
+    public static FieldIdentifier forInternalString(String text)
+    {
+        // If we store a field internally, we consider it as quoted, i.e. we preserve
+        // whatever case the text has.
+        return forQuoted(text);
+    }
+
+    private static ByteBuffer convert(String text)
+    {
+        try
+        {
+            return UTF8Type.instance.decompose(text);
+        }
+        catch (MarshalException e)
+        {
+            throw new SyntaxException(String.format("For field name %s: %s", text, e.getMessage()));
+        }
+    }
+
+    @Override
+    public String toString()
+    {
+        return UTF8Type.instance.compose(bytes);
+    }
+
+    @Override
+    public final int hashCode()
+    {
+        return bytes.hashCode();
+    }
+
+    @Override
+    public final boolean equals(Object o)
+    {
+        if(!(o instanceof FieldIdentifier))
+            return false;
+        FieldIdentifier that = (FieldIdentifier)o;
+        return this.bytes.equals(that.bytes);
+    }
+}
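
A minimal sketch (not part of the patch) of the case handling in the new FieldIdentifier class: unquoted names are folded to lower case, quoted names keep their case, and equality is on the underlying bytes. The class name is hypothetical.

import org.apache.cassandra.cql3.FieldIdentifier;

// Minimal sketch of FieldIdentifier case folding.
public class FieldIdentifierExample
{
    public static void main(String[] args)
    {
        FieldIdentifier unquoted = FieldIdentifier.forUnquoted("FirstName");
        FieldIdentifier quoted   = FieldIdentifier.forQuoted("FirstName");

        System.out.println(unquoted);                // firstname
        System.out.println(quoted);                  // FirstName
        System.out.println(unquoted.equals(quoted)); // false: the underlying UTF-8 bytes differ
    }
}
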
diff --git a/src/java/org/apache/cassandra/cql3/Json.java b/src/java/org/apache/cassandra/cql3/Json.java
index ab02fb6..298cde7 100644
--- a/src/java/org/apache/cassandra/cql3/Json.java
+++ b/src/java/org/apache/cassandra/cql3/Json.java
@@ -24,6 +24,7 @@
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.ColumnDefinition;
 import org.apache.cassandra.cql3.functions.Function;
+import org.apache.cassandra.db.marshal.AbstractType;
 import org.apache.cassandra.db.marshal.UTF8Type;
 import org.apache.cassandra.exceptions.InvalidRequestException;
 import org.apache.cassandra.serializers.MarshalException;
@@ -180,6 +181,11 @@
             return TestResult.NOT_ASSIGNABLE;
         }
 
+        public AbstractType<?> getExactTypeIfKnown(String keyspace)
+        {
+            return null;
+        }
+
         public String getText()
         {
             return term.toString();
@@ -212,6 +218,11 @@
             return TestResult.WEAKLY_ASSIGNABLE;
         }
 
+        public AbstractType<?> getExactTypeIfKnown(String keyspace)
+        {
+            return null;
+        }
+
         public String getText()
         {
             return marker.toString();
diff --git a/src/java/org/apache/cassandra/cql3/Lists.java b/src/java/org/apache/cassandra/cql3/Lists.java
index 559cf3f..ad0af6d 100644
--- a/src/java/org/apache/cassandra/cql3/Lists.java
+++ b/src/java/org/apache/cassandra/cql3/Lists.java
@@ -29,6 +29,7 @@
 import org.apache.cassandra.cql3.functions.Function;
 import org.apache.cassandra.db.*;
 import org.apache.cassandra.db.rows.*;
+import org.apache.cassandra.db.marshal.AbstractType;
 import org.apache.cassandra.db.marshal.Int32Type;
 import org.apache.cassandra.db.marshal.ListType;
 import org.apache.cassandra.exceptions.InvalidRequestException;
@@ -113,6 +114,18 @@
             return AssignmentTestable.TestResult.testAll(keyspace, valueSpec, elements);
         }
 
+        @Override
+        public AbstractType<?> getExactTypeIfKnown(String keyspace)
+        {
+            for (Term.Raw term : elements)
+            {
+                AbstractType<?> type = term.getExactTypeIfKnown(keyspace);
+                if (type != null)
+                    return ListType.getInstance(type, false);
+            }
+            return null;
+        }
+
         public String getText()
         {
             return elements.stream().map(Term.Raw::getText).collect(Collectors.joining(", ", "[", "]"));
@@ -358,13 +371,9 @@
 
             CellPath elementPath = existingRow.getComplexColumnData(column).getCellByIndex(idx).path();
             if (value == null)
-            {
                 params.addTombstone(column, elementPath);
-            }
             else if (value != ByteBufferUtil.UNSET_BYTE_BUFFER)
-            {
                 params.addCell(column, elementPath, value);
-            }
         }
     }
 
diff --git a/src/java/org/apache/cassandra/cql3/Maps.java b/src/java/org/apache/cassandra/cql3/Maps.java
index 4772369..952bff0 100644
--- a/src/java/org/apache/cassandra/cql3/Maps.java
+++ b/src/java/org/apache/cassandra/cql3/Maps.java
@@ -27,7 +27,7 @@
 import org.apache.cassandra.cql3.functions.Function;
 import org.apache.cassandra.db.DecoratedKey;
 import org.apache.cassandra.db.rows.*;
-import org.apache.cassandra.db.marshal.MapType;
+import org.apache.cassandra.db.marshal.*;
 import org.apache.cassandra.exceptions.InvalidRequestException;
 import org.apache.cassandra.serializers.CollectionSerializer;
 import org.apache.cassandra.serializers.MarshalException;
@@ -127,6 +127,23 @@
             return res;
         }
 
+        @Override
+        public AbstractType<?> getExactTypeIfKnown(String keyspace)
+        {
+            AbstractType<?> keyType = null;
+            AbstractType<?> valueType = null;
+            for (Pair<Term.Raw, Term.Raw> entry : entries)
+            {
+                if (keyType == null)
+                    keyType = entry.left.getExactTypeIfKnown(keyspace);
+                if (valueType == null)
+                    valueType = entry.right.getExactTypeIfKnown(keyspace);
+                if (keyType != null && valueType != null)
+                    return MapType.getInstance(keyType, valueType, false);
+            }
+            return null;
+        }
+
         public String getText()
         {
             return entries.stream()
diff --git a/src/java/org/apache/cassandra/cql3/MultiColumnRelation.java b/src/java/org/apache/cassandra/cql3/MultiColumnRelation.java
index 143106d..01f2a12 100644
--- a/src/java/org/apache/cassandra/cql3/MultiColumnRelation.java
+++ b/src/java/org/apache/cassandra/cql3/MultiColumnRelation.java
@@ -46,7 +46,7 @@
  */
 public class MultiColumnRelation extends Relation
 {
-    private final List<ColumnIdentifier.Raw> entities;
+    private final List<ColumnDefinition.Raw> entities;
 
     /** A Tuples.Literal or Tuples.Raw marker */
     private final Term.MultiColumnRaw valuesOrMarker;
@@ -56,7 +56,7 @@
 
     private final Tuples.INRaw inMarker;
 
-    private MultiColumnRelation(List<ColumnIdentifier.Raw> entities, Operator relationType, Term.MultiColumnRaw valuesOrMarker, List<? extends Term.MultiColumnRaw> inValues, Tuples.INRaw inMarker)
+    private MultiColumnRelation(List<ColumnDefinition.Raw> entities, Operator relationType, Term.MultiColumnRaw valuesOrMarker, List<? extends Term.MultiColumnRaw> inValues, Tuples.INRaw inMarker)
     {
         this.entities = entities;
         this.relationType = relationType;
@@ -76,7 +76,7 @@
      * @param valuesOrMarker a Tuples.Literal instance or a Tuples.Raw marker
      * @return a new <code>MultiColumnRelation</code> instance
      */
-    public static MultiColumnRelation createNonInRelation(List<ColumnIdentifier.Raw> entities, Operator relationType, Term.MultiColumnRaw valuesOrMarker)
+    public static MultiColumnRelation createNonInRelation(List<ColumnDefinition.Raw> entities, Operator relationType, Term.MultiColumnRaw valuesOrMarker)
     {
         assert relationType != Operator.IN;
         return new MultiColumnRelation(entities, relationType, valuesOrMarker, null, null);
@@ -89,7 +89,7 @@
      * @param inValues a list of Tuples.Literal instances or a Tuples.Raw markers
      * @return a new <code>MultiColumnRelation</code> instance
      */
-    public static MultiColumnRelation createInRelation(List<ColumnIdentifier.Raw> entities, List<? extends Term.MultiColumnRaw> inValues)
+    public static MultiColumnRelation createInRelation(List<ColumnDefinition.Raw> entities, List<? extends Term.MultiColumnRaw> inValues)
     {
         return new MultiColumnRelation(entities, Operator.IN, null, inValues, null);
     }
@@ -101,12 +101,12 @@
      * @param inMarker a single IN marker
      * @return a new <code>MultiColumnRelation</code> instance
      */
-    public static MultiColumnRelation createSingleMarkerInRelation(List<ColumnIdentifier.Raw> entities, Tuples.INRaw inMarker)
+    public static MultiColumnRelation createSingleMarkerInRelation(List<ColumnDefinition.Raw> entities, Tuples.INRaw inMarker)
     {
         return new MultiColumnRelation(entities, Operator.IN, null, null, inMarker);
     }
 
-    public List<ColumnIdentifier.Raw> getEntities()
+    public List<ColumnDefinition.Raw> getEntities()
     {
         return entities;
     }
@@ -183,6 +183,12 @@
     }
 
     @Override
+    protected Restriction newLikeRestriction(CFMetaData cfm, VariableSpecifications boundNames, Operator operator) throws InvalidRequestException
+    {
+        throw invalidRequest("%s cannot be used for multi-column relations", operator());
+    }
+
+    @Override
     protected Term toTerm(List<? extends ColumnSpecification> receivers,
                           Raw raw,
                           String keyspace,
@@ -197,9 +203,9 @@
     {
         List<ColumnDefinition> names = new ArrayList<>(getEntities().size());
         int previousPosition = -1;
-        for (ColumnIdentifier.Raw raw : getEntities())
+        for (ColumnDefinition.Raw raw : getEntities())
         {
-            ColumnDefinition def = toColumnDefinition(cfm, raw);
+            ColumnDefinition def = raw.prepare(cfm);
             checkTrue(def.isClusteringColumn(), "Multi-column relations can only be applied to clustering columns but was applied to: %s", def.name);
             checkFalse(names.contains(def), "Column \"%s\" appeared twice in a relation: %s", def.name, this);
 
@@ -213,12 +219,12 @@
         return names;
     }
 
-    public Relation renameIdentifier(ColumnIdentifier.Raw from, ColumnIdentifier.Raw to)
+    public Relation renameIdentifier(ColumnDefinition.Raw from, ColumnDefinition.Raw to)
     {
         if (!entities.contains(from))
             return this;
 
-        List<ColumnIdentifier.Raw> newEntities = entities.stream().map(e -> e.equals(from) ? to : e).collect(Collectors.toList());
+        List<ColumnDefinition.Raw> newEntities = entities.stream().map(e -> e.equals(from) ? to : e).collect(Collectors.toList());
         return new MultiColumnRelation(newEntities, operator(), valuesOrMarker, inValues, inMarker);
     }
 
diff --git a/src/java/org/apache/cassandra/cql3/Operation.java b/src/java/org/apache/cassandra/cql3/Operation.java
index ecd37c4..7d7d7b3 100644
--- a/src/java/org/apache/cassandra/cql3/Operation.java
+++ b/src/java/org/apache/cassandra/cql3/Operation.java
@@ -17,8 +17,10 @@
  */
 package org.apache.cassandra.cql3;
 
+import java.nio.ByteBuffer;
 import java.util.List;
 
+import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.ColumnDefinition;
 import org.apache.cassandra.cql3.functions.Function;
 import org.apache.cassandra.db.DecoratedKey;
@@ -108,12 +110,10 @@
          * It returns an Operation which can be though as post-preparation well-typed
          * Operation.
          *
-         * @param receiver the "column" this operation applies to. Note that
-         * contrarly to the method of same name in Term.Raw, the receiver should always
-         * be a true column.
+         * @param receiver the column this operation applies to.
          * @return the prepared update operation.
          */
-        public Operation prepare(String keyspace, ColumnDefinition receiver) throws InvalidRequestException;
+        public Operation prepare(CFMetaData cfm, ColumnDefinition receiver) throws InvalidRequestException;
 
         /**
          * @return whether this operation can be applied alongside the {@code
@@ -134,7 +134,7 @@
         /**
          * The name of the column affected by this delete operation.
          */
-        public ColumnIdentifier.Raw affectedColumn();
+        public ColumnDefinition.Raw affectedColumn();
 
         /**
          * This method validates the operation (i.e. validate it is well typed)
@@ -147,7 +147,7 @@
          * @param receiver the "column" this operation applies to.
          * @return the prepared delete operation.
          */
-        public Operation prepare(String keyspace, ColumnDefinition receiver) throws InvalidRequestException;
+        public Operation prepare(String keyspace, ColumnDefinition receiver, CFMetaData cfm) throws InvalidRequestException;
     }
 
     public static class SetValue implements RawUpdate
@@ -159,26 +159,32 @@
             this.value = value;
         }
 
-        public Operation prepare(String keyspace, ColumnDefinition receiver) throws InvalidRequestException
+        public Operation prepare(CFMetaData cfm, ColumnDefinition receiver) throws InvalidRequestException
         {
-            Term v = value.prepare(keyspace, receiver);
+            Term v = value.prepare(cfm.ksName, receiver);
 
             if (receiver.type instanceof CounterColumnType)
                 throw new InvalidRequestException(String.format("Cannot set the value of counter column %s (counters can only be incremented/decremented, not set)", receiver.name));
 
-            if (!(receiver.type.isCollection()))
-                return new Constants.Setter(receiver, v);
-
-            switch (((CollectionType)receiver.type).kind)
+            if (receiver.type.isCollection())
             {
-                case LIST:
-                    return new Lists.Setter(receiver, v);
-                case SET:
-                    return new Sets.Setter(receiver, v);
-                case MAP:
-                    return new Maps.Setter(receiver, v);
+                switch (((CollectionType) receiver.type).kind)
+                {
+                    case LIST:
+                        return new Lists.Setter(receiver, v);
+                    case SET:
+                        return new Sets.Setter(receiver, v);
+                    case MAP:
+                        return new Maps.Setter(receiver, v);
+                    default:
+                        throw new AssertionError();
+                }
             }
-            throw new AssertionError();
+
+            if (receiver.type.isUDT())
+                return new UserTypes.Setter(receiver, v);
+
+            return new Constants.Setter(receiver, v);
         }
 
         protected String toString(ColumnSpecification column)
@@ -205,7 +211,7 @@
             this.value = value;
         }
 
-        public Operation prepare(String keyspace, ColumnDefinition receiver) throws InvalidRequestException
+        public Operation prepare(CFMetaData cfm, ColumnDefinition receiver) throws InvalidRequestException
         {
             if (!(receiver.type instanceof CollectionType))
                 throw new InvalidRequestException(String.format("Invalid operation (%s) for non collection column %s", toString(receiver), receiver.name));
@@ -215,14 +221,14 @@
             switch (((CollectionType)receiver.type).kind)
             {
                 case LIST:
-                    Term idx = selector.prepare(keyspace, Lists.indexSpecOf(receiver));
-                    Term lval = value.prepare(keyspace, Lists.valueSpecOf(receiver));
+                    Term idx = selector.prepare(cfm.ksName, Lists.indexSpecOf(receiver));
+                    Term lval = value.prepare(cfm.ksName, Lists.valueSpecOf(receiver));
                     return new Lists.SetterByIndex(receiver, idx, lval);
                 case SET:
                     throw new InvalidRequestException(String.format("Invalid operation (%s) for set column %s", toString(receiver), receiver.name));
                 case MAP:
-                    Term key = selector.prepare(keyspace, Maps.keySpecOf(receiver));
-                    Term mval = value.prepare(keyspace, Maps.valueSpecOf(receiver));
+                    Term key = selector.prepare(cfm.ksName, Maps.keySpecOf(receiver));
+                    Term mval = value.prepare(cfm.ksName, Maps.valueSpecOf(receiver));
                     return new Maps.SetterByKey(receiver, key, mval);
             }
             throw new AssertionError();
@@ -241,6 +247,46 @@
         }
     }
 
+    public static class SetField implements RawUpdate
+    {
+        private final FieldIdentifier field;
+        private final Term.Raw value;
+
+        public SetField(FieldIdentifier field, Term.Raw value)
+        {
+            this.field = field;
+            this.value = value;
+        }
+
+        public Operation prepare(CFMetaData cfm, ColumnDefinition receiver) throws InvalidRequestException
+        {
+            if (!receiver.type.isUDT())
+                throw new InvalidRequestException(String.format("Invalid operation (%s) for non-UDT column %s", toString(receiver), receiver.name));
+            else if (!receiver.type.isMultiCell())
+                throw new InvalidRequestException(String.format("Invalid operation (%s) for frozen UDT column %s", toString(receiver), receiver.name));
+
+            int fieldPosition = ((UserType) receiver.type).fieldPosition(field);
+            if (fieldPosition == -1)
+                throw new InvalidRequestException(String.format("UDT column %s does not have a field named %s", receiver.name, field));
+
+            Term val = value.prepare(cfm.ksName, UserTypes.fieldSpecOf(receiver, fieldPosition));
+            return new UserTypes.SetterByField(receiver, field, val);
+        }
+
+        protected String toString(ColumnSpecification column)
+        {
+            return String.format("%s.%s = %s", column.name, field, value);
+        }
+
+        public boolean isCompatibleWith(RawUpdate other)
+        {
+            if (other instanceof SetField)
+                return !((SetField) other).field.equals(field);
+            else
+                return !(other instanceof SetValue);
+        }
+    }
+
     public static class Addition implements RawUpdate
     {
         private final Term.Raw value;
@@ -250,9 +296,9 @@
             this.value = value;
         }
 
-        public Operation prepare(String keyspace, ColumnDefinition receiver) throws InvalidRequestException
+        public Operation prepare(CFMetaData cfm, ColumnDefinition receiver) throws InvalidRequestException
         {
-            Term v = value.prepare(keyspace, receiver);
+            Term v = value.prepare(cfm.ksName, receiver);
 
             if (!(receiver.type instanceof CollectionType))
             {
@@ -295,13 +341,13 @@
             this.value = value;
         }
 
-        public Operation prepare(String keyspace, ColumnDefinition receiver) throws InvalidRequestException
+        public Operation prepare(CFMetaData cfm, ColumnDefinition receiver) throws InvalidRequestException
         {
             if (!(receiver.type instanceof CollectionType))
             {
                 if (!(receiver.type instanceof CounterColumnType))
                     throw new InvalidRequestException(String.format("Invalid operation (%s) for non counter column %s", toString(receiver), receiver.name));
-                return new Constants.Substracter(receiver, value.prepare(keyspace, receiver));
+                return new Constants.Substracter(receiver, value.prepare(cfm.ksName, receiver));
             }
             else if (!(receiver.type.isMultiCell()))
                 throw new InvalidRequestException(String.format("Invalid operation (%s) for frozen collection column %s", toString(receiver), receiver.name));
@@ -309,16 +355,16 @@
             switch (((CollectionType)receiver.type).kind)
             {
                 case LIST:
-                    return new Lists.Discarder(receiver, value.prepare(keyspace, receiver));
+                    return new Lists.Discarder(receiver, value.prepare(cfm.ksName, receiver));
                 case SET:
-                    return new Sets.Discarder(receiver, value.prepare(keyspace, receiver));
+                    return new Sets.Discarder(receiver, value.prepare(cfm.ksName, receiver));
                 case MAP:
                     // The value for a map subtraction is actually a set
                     ColumnSpecification vr = new ColumnSpecification(receiver.ksName,
                                                                      receiver.cfName,
                                                                      receiver.name,
                                                                      SetType.getInstance(((MapType)receiver.type).getKeysType(), false));
-                    return new Sets.Discarder(receiver, value.prepare(keyspace, vr));
+                    return new Sets.Discarder(receiver, value.prepare(cfm.ksName, vr));
             }
             throw new AssertionError();
         }
@@ -343,9 +389,9 @@
             this.value = value;
         }
 
-        public Operation prepare(String keyspace, ColumnDefinition receiver) throws InvalidRequestException
+        public Operation prepare(CFMetaData cfm, ColumnDefinition receiver) throws InvalidRequestException
         {
-            Term v = value.prepare(keyspace, receiver);
+            Term v = value.prepare(cfm.ksName, receiver);
 
             if (!(receiver.type instanceof ListType))
                 throw new InvalidRequestException(String.format("Invalid operation (%s) for non list column %s", toString(receiver), receiver.name));
@@ -368,19 +414,19 @@
 
     public static class ColumnDeletion implements RawDeletion
     {
-        private final ColumnIdentifier.Raw id;
+        private final ColumnDefinition.Raw id;
 
-        public ColumnDeletion(ColumnIdentifier.Raw id)
+        public ColumnDeletion(ColumnDefinition.Raw id)
         {
             this.id = id;
         }
 
-        public ColumnIdentifier.Raw affectedColumn()
+        public ColumnDefinition.Raw affectedColumn()
         {
             return id;
         }
 
-        public Operation prepare(String keyspace, ColumnDefinition receiver) throws InvalidRequestException
+        public Operation prepare(String keyspace, ColumnDefinition receiver, CFMetaData cfm) throws InvalidRequestException
         {
             // No validation, deleting a column is always "well typed"
             return new Constants.Deleter(receiver);
@@ -389,21 +435,21 @@
 
     public static class ElementDeletion implements RawDeletion
     {
-        private final ColumnIdentifier.Raw id;
+        private final ColumnDefinition.Raw id;
         private final Term.Raw element;
 
-        public ElementDeletion(ColumnIdentifier.Raw id, Term.Raw element)
+        public ElementDeletion(ColumnDefinition.Raw id, Term.Raw element)
         {
             this.id = id;
             this.element = element;
         }
 
-        public ColumnIdentifier.Raw affectedColumn()
+        public ColumnDefinition.Raw affectedColumn()
         {
             return id;
         }
 
-        public Operation prepare(String keyspace, ColumnDefinition receiver) throws InvalidRequestException
+        public Operation prepare(String keyspace, ColumnDefinition receiver, CFMetaData cfm) throws InvalidRequestException
         {
             if (!(receiver.type.isCollection()))
                 throw new InvalidRequestException(String.format("Invalid deletion operation for non collection column %s", receiver.name));
@@ -425,4 +471,34 @@
             throw new AssertionError();
         }
     }
+
+    public static class FieldDeletion implements RawDeletion
+    {
+        private final ColumnDefinition.Raw id;
+        private final FieldIdentifier field;
+
+        public FieldDeletion(ColumnDefinition.Raw id, FieldIdentifier field)
+        {
+            this.id = id;
+            this.field = field;
+        }
+
+        public ColumnDefinition.Raw affectedColumn()
+        {
+            return id;
+        }
+
+        public Operation prepare(String keyspace, ColumnDefinition receiver, CFMetaData cfm) throws InvalidRequestException
+        {
+            if (!receiver.type.isUDT())
+                throw new InvalidRequestException(String.format("Invalid field deletion operation for non-UDT column %s", receiver.name));
+            else if (!receiver.type.isMultiCell())
+                throw new InvalidRequestException(String.format("Frozen UDT column %s does not support field deletions", receiver.name));
+
+            if (((UserType) receiver.type).fieldPosition(field) == -1)
+                throw new InvalidRequestException(String.format("UDT column %s does not have a field named %s", receiver.name, field));
+
+            return new UserTypes.DeleterByField(receiver, field);
+        }
+    }
 }
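
The new SetField and FieldDeletion operations above let a single non-frozen UDT column be written or deleted one field at a time, and SetField.isCompatibleWith only rejects combinations that either touch the same field twice or mix a whole-value assignment (SetValue) with a per-field one. A minimal standalone sketch of that compatibility rule follows; the class and method names are illustrative only, not the Cassandra API.

// Standalone sketch of the update-compatibility rule for UDT columns: an update is
// described by the field it sets, or by null for a whole-value assignment.
public final class UdtUpdateCompatibility
{
    static boolean compatible(String fieldA, String fieldB)
    {
        if (fieldA == null || fieldB == null)
            return false;                  // a whole-value overwrite conflicts with any other update
        return !fieldA.equals(fieldB);     // per-field updates combine as long as the fields differ
    }

    public static void main(String[] args)
    {
        System.out.println(compatible("city", "zip"));   // true:  SET u.city = ?, u.zip = ?
        System.out.println(compatible("city", "city"));  // false: the same field set twice
        System.out.println(compatible(null, "city"));    // false: SET u = ? combined with SET u.city = ?
    }
}
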
diff --git a/src/java/org/apache/cassandra/cql3/Operator.java b/src/java/org/apache/cassandra/cql3/Operator.java
index 7b28a30..07c92f0 100644
--- a/src/java/org/apache/cassandra/cql3/Operator.java
+++ b/src/java/org/apache/cassandra/cql3/Operator.java
@@ -25,10 +25,8 @@
 import java.util.Map;
 import java.util.Set;
 
-import org.apache.cassandra.db.marshal.AbstractType;
-import org.apache.cassandra.db.marshal.ListType;
-import org.apache.cassandra.db.marshal.MapType;
-import org.apache.cassandra.db.marshal.SetType;
+import org.apache.cassandra.db.marshal.*;
+import org.apache.cassandra.utils.ByteBufferUtil;
 
 public enum Operator
 {
@@ -101,6 +99,46 @@
         {
             return "IS NOT";
         }
+    },
+    LIKE_PREFIX(10)
+    {
+        @Override
+        public String toString()
+        {
+            return "LIKE '<term>%'";
+        }
+    },
+    LIKE_SUFFIX(11)
+    {
+        @Override
+        public String toString()
+        {
+            return "LIKE '%<term>'";
+        }
+    },
+    LIKE_CONTAINS(12)
+    {
+        @Override
+        public String toString()
+        {
+            return "LIKE '%<term>%'";
+        }
+    },
+    LIKE_MATCHES(13)
+    {
+        @Override
+        public String toString()
+        {
+            return "LIKE '<term>'";
+        }
+    },
+    LIKE(14)
+    {
+        @Override
+        public String toString()
+        {
+            return "LIKE";
+        }
     };
 
     /**
@@ -193,8 +231,15 @@
             case CONTAINS_KEY:
                 Map map = (Map) type.getSerializer().deserialize(leftOperand);
                 return map.containsKey(((MapType) type).getKeysType().getSerializer().deserialize(rightOperand));
+            case LIKE_PREFIX:
+                return ByteBufferUtil.startsWith(leftOperand, rightOperand);
+            case LIKE_SUFFIX:
+                return ByteBufferUtil.endsWith(leftOperand, rightOperand);
+            case LIKE_MATCHES:
+            case LIKE_CONTAINS:
+                return ByteBufferUtil.contains(leftOperand, rightOperand);
             default:
-                // we shouldn't get CONTAINS, CONTAINS KEY, or IS NOT here
+                // we shouldn't get LIKE, CONTAINS, CONTAINS KEY, or IS NOT here
                 throw new AssertionError();
         }
     }
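
The four bounded LIKE operators introduced above are evaluated in isSatisfiedBy by comparing the raw bytes of the column value against the raw bytes of the search term: LIKE_PREFIX uses a starts-with test, LIKE_SUFFIX an ends-with test, and LIKE_CONTAINS / LIKE_MATCHES a containment scan. The standalone sketch below shows equivalent checks on plain ByteBuffers; it assumes the ByteBufferUtil helpers referenced in the diff behave this way and is not the Cassandra implementation itself.

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Standalone byte-level prefix/suffix/containment tests over ByteBuffers, mirroring
// what the LIKE_* operators are assumed to delegate to.
public final class LikeByteMatching
{
    static boolean startsWith(ByteBuffer value, ByteBuffer prefix)
    {
        return value.remaining() >= prefix.remaining() && regionMatches(value, 0, prefix);
    }

    static boolean endsWith(ByteBuffer value, ByteBuffer suffix)
    {
        int offset = value.remaining() - suffix.remaining();
        return offset >= 0 && regionMatches(value, offset, suffix);
    }

    static boolean contains(ByteBuffer value, ByteBuffer term)
    {
        for (int offset = 0; offset + term.remaining() <= value.remaining(); offset++)
            if (regionMatches(value, offset, term))
                return true;
        return false;
    }

    // Compares term against value starting at the given offset, without moving either buffer's position.
    private static boolean regionMatches(ByteBuffer value, int offset, ByteBuffer term)
    {
        for (int i = 0; i < term.remaining(); i++)
            if (value.get(value.position() + offset + i) != term.get(term.position() + i))
                return false;
        return true;
    }

    public static void main(String[] args)
    {
        ByteBuffer value = ByteBuffer.wrap("cassandra".getBytes(StandardCharsets.UTF_8));
        ByteBuffer term = ByteBuffer.wrap("sand".getBytes(StandardCharsets.UTF_8));
        System.out.println(contains(value, term));   // true  -> value LIKE '%sand%'
        System.out.println(startsWith(value, term)); // false -> value LIKE 'sand%'
        System.out.println(endsWith(value, term));   // false -> value LIKE '%sand'
    }
}
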
diff --git a/src/java/org/apache/cassandra/cql3/QueryOptions.java b/src/java/org/apache/cassandra/cql3/QueryOptions.java
index 6324911..e6cda89 100644
--- a/src/java/org/apache/cassandra/cql3/QueryOptions.java
+++ b/src/java/org/apache/cassandra/cql3/QueryOptions.java
@@ -152,7 +152,7 @@
         throw new UnsupportedOperationException();
     }
 
-    /**  The pageSize for this query. Will be <= 0 if not relevant for the query.  */
+    /**  The pageSize for this query. Will be {@code <= 0} if not relevant for the query.  */
     public int getPageSize()
     {
         return getSpecificOptions().pageSize;
diff --git a/src/java/org/apache/cassandra/cql3/QueryProcessor.java b/src/java/org/apache/cassandra/cql3/QueryProcessor.java
index af94d3e..222204b 100644
--- a/src/java/org/apache/cassandra/cql3/QueryProcessor.java
+++ b/src/java/org/apache/cassandra/cql3/QueryProcessor.java
@@ -34,9 +34,9 @@
 
 import com.googlecode.concurrentlinkedhashmap.ConcurrentLinkedHashMap;
 import com.googlecode.concurrentlinkedhashmap.EntryWeigher;
-import com.googlecode.concurrentlinkedhashmap.EvictionListener;
 import org.antlr.runtime.*;
 import org.apache.cassandra.concurrent.ScheduledExecutors;
+import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.config.Schema;
 import org.apache.cassandra.cql3.functions.Function;
 import org.apache.cassandra.cql3.functions.FunctionName;
@@ -55,35 +55,14 @@
 import org.apache.cassandra.transport.Server;
 import org.apache.cassandra.transport.messages.ResultMessage;
 import org.apache.cassandra.utils.*;
-import org.github.jamm.MemoryMeter;
 
 public class QueryProcessor implements QueryHandler
 {
-    public static final CassandraVersion CQL_VERSION = new CassandraVersion("3.4.0");
+    public static final CassandraVersion CQL_VERSION = new CassandraVersion("3.4.2");
 
     public static final QueryProcessor instance = new QueryProcessor();
 
     private static final Logger logger = LoggerFactory.getLogger(QueryProcessor.class);
-    private static final MemoryMeter meter = new MemoryMeter().withGuessing(MemoryMeter.Guess.FALLBACK_BEST).ignoreKnownSingletons();
-    private static final long MAX_CACHE_PREPARED_MEMORY = Runtime.getRuntime().maxMemory() / 256;
-
-    private static final EntryWeigher<MD5Digest, ParsedStatement.Prepared> cqlMemoryUsageWeigher = new EntryWeigher<MD5Digest, ParsedStatement.Prepared>()
-    {
-        @Override
-        public int weightOf(MD5Digest key, ParsedStatement.Prepared value)
-        {
-            return Ints.checkedCast(measure(key) + measure(value.statement) + measure(value.boundNames));
-        }
-    };
-
-    private static final EntryWeigher<Integer, ParsedStatement.Prepared> thriftMemoryUsageWeigher = new EntryWeigher<Integer, ParsedStatement.Prepared>()
-    {
-        @Override
-        public int weightOf(Integer key, ParsedStatement.Prepared value)
-        {
-            return Ints.checkedCast(measure(key) + measure(value.statement) + measure(value.boundNames));
-        }
-    };
 
     private static final ConcurrentLinkedHashMap<MD5Digest, ParsedStatement.Prepared> preparedStatements;
     private static final ConcurrentLinkedHashMap<Integer, ParsedStatement.Prepared> thriftPreparedStatements;
@@ -97,45 +76,48 @@
     public static final CQLMetrics metrics = new CQLMetrics();
 
     private static final AtomicInteger lastMinuteEvictionsCount = new AtomicInteger(0);
+    private static final AtomicInteger thriftLastMinuteEvictionsCount = new AtomicInteger(0);
 
     static
     {
         preparedStatements = new ConcurrentLinkedHashMap.Builder<MD5Digest, ParsedStatement.Prepared>()
-                             .maximumWeightedCapacity(MAX_CACHE_PREPARED_MEMORY)
-                             .weigher(cqlMemoryUsageWeigher)
-                             .listener(new EvictionListener<MD5Digest, ParsedStatement.Prepared>()
-                             {
-                                 public void onEviction(MD5Digest md5Digest, ParsedStatement.Prepared prepared)
-                                 {
-                                     metrics.preparedStatementsEvicted.inc();
-                                     lastMinuteEvictionsCount.incrementAndGet();
-                                 }
+                             .maximumWeightedCapacity(capacityToBytes(DatabaseDescriptor.getPreparedStatementsCacheSizeMB()))
+                             .weigher(QueryProcessor::measure)
+                             .listener((md5Digest, prepared) -> {
+                                 metrics.preparedStatementsEvicted.inc();
+                                 lastMinuteEvictionsCount.incrementAndGet();
                              }).build();
 
         thriftPreparedStatements = new ConcurrentLinkedHashMap.Builder<Integer, ParsedStatement.Prepared>()
-                                   .maximumWeightedCapacity(MAX_CACHE_PREPARED_MEMORY)
-                                   .weigher(thriftMemoryUsageWeigher)
-                                   .listener(new EvictionListener<Integer, ParsedStatement.Prepared>()
-                                   {
-                                       public void onEviction(Integer integer, ParsedStatement.Prepared prepared)
-                                       {
-                                           metrics.preparedStatementsEvicted.inc();
-                                           lastMinuteEvictionsCount.incrementAndGet();
-                                       }
+                                   .maximumWeightedCapacity(capacityToBytes(DatabaseDescriptor.getThriftPreparedStatementsCacheSizeMB()))
+                                   .weigher(QueryProcessor::measure)
+                                   .listener((integer, prepared) -> {
+                                       metrics.preparedStatementsEvicted.inc();
+                                       thriftLastMinuteEvictionsCount.incrementAndGet();
                                    })
                                    .build();
 
-        ScheduledExecutors.scheduledTasks.scheduleAtFixedRate(new Runnable()
-        {
-            public void run()
-            {
-                long count = lastMinuteEvictionsCount.getAndSet(0);
-                if (count > 0)
-                    logger.info("{} prepared statements discarded in the last minute because cache limit reached ({} bytes)",
-                                count,
-                                MAX_CACHE_PREPARED_MEMORY);
-            }
+        ScheduledExecutors.scheduledTasks.scheduleAtFixedRate(() -> {
+            long count = lastMinuteEvictionsCount.getAndSet(0);
+            if (count > 0)
+                logger.warn("{} prepared statements discarded in the last minute because cache limit reached ({} MB)",
+                            count,
+                            DatabaseDescriptor.getPreparedStatementsCacheSizeMB());
+            count = thriftLastMinuteEvictionsCount.getAndSet(0);
+            if (count > 0)
+                logger.warn("{} prepared Thrift statements discarded in the last minute because cache limit reached ({} MB)",
+                            count,
+                            DatabaseDescriptor.getThriftPreparedStatementsCacheSizeMB());
         }, 1, 1, TimeUnit.MINUTES);
+
+        logger.info("Initialized prepared statement caches with {} MB (native) and {} MB (Thrift)",
+                    DatabaseDescriptor.getPreparedStatementsCacheSizeMB(),
+                    DatabaseDescriptor.getThriftPreparedStatementsCacheSizeMB());
+    }
+
+    private static long capacityToBytes(long cacheSizeMB)
+    {
+        return cacheSizeMB * 1024 * 1024;
     }
 
     public static int preparedStatementsCount()
@@ -299,6 +281,12 @@
             return null;
     }
 
+    public static UntypedResultSet execute(String query, ConsistencyLevel cl, Object... values)
+    throws RequestExecutionException
+    {
+        return execute(query, cl, internalQueryState(), values);
+    }
+
     public static UntypedResultSet execute(String query, ConsistencyLevel cl, QueryState state, Object... values)
     throws RequestExecutionException
     {
@@ -393,6 +381,7 @@
             return existing;
 
         ParsedStatement.Prepared prepared = getStatement(queryString, clientState);
+        prepared.rawCQLStatement = queryString;
         int boundTerms = prepared.statement.getBoundTerms();
         if (boundTerms > FBUtilities.MAX_UNSIGNED_SHORT)
             throw new InvalidRequestException(String.format("Too many markers(?). %d markers exceed the allowed maximum of %d", boundTerms, FBUtilities.MAX_UNSIGNED_SHORT));
@@ -435,20 +424,26 @@
     {
         // Concatenate the current keyspace so we don't mix prepared statements between keyspace (#5352).
         // (if the keyspace is null, queryString has to have a fully-qualified keyspace so it's fine.
-        long statementSize = measure(prepared.statement);
+        long statementSize = ObjectSizes.measureDeep(prepared.statement);
         // don't execute the statement if it's bigger than the allowed threshold
-        if (statementSize > MAX_CACHE_PREPARED_MEMORY)
-            throw new InvalidRequestException(String.format("Prepared statement of size %d bytes is larger than allowed maximum of %d bytes.",
-                                                            statementSize,
-                                                            MAX_CACHE_PREPARED_MEMORY));
         if (forThrift)
         {
+            if (statementSize > capacityToBytes(DatabaseDescriptor.getThriftPreparedStatementsCacheSizeMB()))
+                throw new InvalidRequestException(String.format("Prepared statement of size %d bytes is larger than allowed maximum of %d MB: %s...",
+                                                                statementSize,
+                                                                DatabaseDescriptor.getThriftPreparedStatementsCacheSizeMB(),
+                                                                queryString.substring(0, Math.min(200, queryString.length()))));
             Integer statementId = computeThriftId(queryString, keyspace);
             thriftPreparedStatements.put(statementId, prepared);
             return ResultMessage.Prepared.forThrift(statementId, prepared.boundNames);
         }
         else
         {
+            if (statementSize > capacityToBytes(DatabaseDescriptor.getPreparedStatementsCacheSizeMB()))
+                throw new InvalidRequestException(String.format("Prepared statement of size %d bytes is larger than allowed maximum of %d MB: %s...",
+                                                                statementSize,
+                                                                DatabaseDescriptor.getPreparedStatementsCacheSizeMB(),
+                                                                queryString.substring(0, Math.min(200, queryString.length()))));
             MD5Digest statementId = computeId(queryString, keyspace);
             preparedStatements.put(statementId, prepared);
             return new ResultMessage.Prepared(statementId, prepared);
@@ -544,9 +539,9 @@
         }
     }
 
-    private static long measure(Object key)
+    private static int measure(Object key, ParsedStatement.Prepared value)
     {
-        return meter.measureDeep(key);
+        return Ints.checkedCast(ObjectSizes.measureDeep(key) + ObjectSizes.measureDeep(value));
     }
 
     /**
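
The prepared-statement caches are now bounded by sizes configured in MB (converted to bytes by capacityToBytes) instead of a fixed fraction of the heap, every entry is weighed by its deep object size, and evictions are counted and reported once a minute. The standalone sketch below mirrors that structure with a weight-bounded, access-ordered map and a periodic eviction report; it uses String.length() as a stand-in for ObjectSizes.measureDeep and is not the Cassandra implementation.

import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Standalone weight-bounded LRU cache with a once-a-minute eviction report.
public final class WeightedCacheSketch
{
    private static long capacityToBytes(long cacheSizeMB)
    {
        return cacheSizeMB * 1024 * 1024;
    }

    private final long capacityBytes;
    private long usedWeight = 0;
    private final AtomicInteger evictionsLastMinute = new AtomicInteger();
    // accessOrder = true so iteration starts at the least recently used entry
    private final LinkedHashMap<String, String> cache = new LinkedHashMap<>(16, 0.75f, true);

    WeightedCacheSketch(long cacheSizeMB)
    {
        this.capacityBytes = capacityToBytes(cacheSizeMB);
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "eviction-reporter");
            t.setDaemon(true);
            return t;
        });
        timer.scheduleAtFixedRate(() -> {
            long count = evictionsLastMinute.getAndSet(0);
            if (count > 0)
                System.out.printf("%d entries discarded in the last minute (limit %d MB)%n", count, cacheSizeMB);
        }, 1, 1, TimeUnit.MINUTES);
    }

    synchronized void put(String key, String value)
    {
        String previous = cache.put(key, value);
        if (previous != null)
            usedWeight -= previous.length();
        usedWeight += value.length();
        // Evict least recently used entries until we are back under the weight limit.
        Iterator<Map.Entry<String, String>> it = cache.entrySet().iterator();
        while (usedWeight > capacityBytes && it.hasNext())
        {
            usedWeight -= it.next().getValue().length();
            it.remove();
            evictionsLastMinute.incrementAndGet();
        }
    }

    public static void main(String[] args)
    {
        WeightedCacheSketch cache = new WeightedCacheSketch(1);     // 1 MB weight limit
        cache.put("q1", new String(new char[800_000]));
        cache.put("q2", new String(new char[800_000]));             // pushes q1 out
        System.out.println(cache.cache.keySet());                   // [q2]
    }
}
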
diff --git a/src/java/org/apache/cassandra/cql3/Relation.java b/src/java/org/apache/cassandra/cql3/Relation.java
index 334464f..097b88e 100644
--- a/src/java/org/apache/cassandra/cql3/Relation.java
+++ b/src/java/org/apache/cassandra/cql3/Relation.java
@@ -25,12 +25,11 @@
 import org.apache.cassandra.cql3.restrictions.Restriction;
 import org.apache.cassandra.cql3.statements.Bound;
 import org.apache.cassandra.exceptions.InvalidRequestException;
-import org.apache.cassandra.exceptions.UnrecognizedEntityException;
 
 import static org.apache.cassandra.cql3.statements.RequestValidations.invalidRequest;
 
-public abstract class Relation {
-
+public abstract class Relation
+{
     protected Operator relationType;
 
     public Operator operator()
@@ -108,6 +107,15 @@
         return relationType == Operator.EQ;
     }
 
+    public final boolean isLIKE()
+    {
+        return relationType == Operator.LIKE_PREFIX
+                || relationType == Operator.LIKE_SUFFIX
+                || relationType == Operator.LIKE_CONTAINS
+                || relationType == Operator.LIKE_MATCHES
+                || relationType == Operator.LIKE;
+    }
+
     /**
      * Checks if the operator of this relation is a <code>Slice</code> (GT, GTE, LTE, LT).
      *
@@ -143,6 +151,12 @@
             case CONTAINS: return newContainsRestriction(cfm, boundNames, false);
             case CONTAINS_KEY: return newContainsRestriction(cfm, boundNames, true);
             case IS_NOT: return newIsNotRestriction(cfm, boundNames);
+            case LIKE_PREFIX:
+            case LIKE_SUFFIX:
+            case LIKE_CONTAINS:
+            case LIKE_MATCHES:
+            case LIKE:
+                return newLikeRestriction(cfm, boundNames, relationType);
             default: throw invalidRequest("Unsupported \"!=\" relation: %s", this);
         }
     }
@@ -200,6 +214,10 @@
     protected abstract Restriction newIsNotRestriction(CFMetaData cfm,
                                                        VariableSpecifications boundNames) throws InvalidRequestException;
 
+    protected abstract Restriction newLikeRestriction(CFMetaData cfm,
+                                                      VariableSpecifications boundNames,
+                                                      Operator operator) throws InvalidRequestException;
+
     /**
      * Converts the specified <code>Raw</code> into a <code>Term</code>.
      * @param receivers the columns to which the values must be associated at
@@ -242,31 +260,11 @@
     }
 
     /**
-     * Converts the specified entity into a column definition.
-     *
-     * @param cfm the column family meta data
-     * @param entity the entity to convert
-     * @return the column definition corresponding to the specified entity
-     * @throws InvalidRequestException if the entity cannot be recognized
-     */
-    protected final ColumnDefinition toColumnDefinition(CFMetaData cfm,
-                                                        ColumnIdentifier.Raw entity) throws InvalidRequestException
-    {
-        ColumnIdentifier identifier = entity.prepare(cfm);
-        ColumnDefinition def = cfm.getColumnDefinition(identifier);
-
-        if (def == null)
-            throw new UnrecognizedEntityException(identifier, this);
-
-        return def;
-    }
-
-    /**
      * Renames an identifier in this Relation, if applicable.
      * @param from the old identifier
      * @param to the new identifier
      * @return this object, if the old identifier is not in the set of entities that this relation covers; otherwise
      *         a new Relation with "from" replaced by "to" is returned.
      */
-    public abstract Relation renameIdentifier(ColumnIdentifier.Raw from, ColumnIdentifier.Raw to);
+    public abstract Relation renameIdentifier(ColumnDefinition.Raw from, ColumnDefinition.Raw to);
 }
diff --git a/src/java/org/apache/cassandra/cql3/ResultSet.java b/src/java/org/apache/cassandra/cql3/ResultSet.java
index bc4daed..9010b20 100644
--- a/src/java/org/apache/cassandra/cql3/ResultSet.java
+++ b/src/java/org/apache/cassandra/cql3/ResultSet.java
@@ -438,16 +438,16 @@
 
         private final EnumSet<Flag> flags;
         public final List<ColumnSpecification> names;
-        private final Short[] partitionKeyBindIndexes;
+        private final short[] partitionKeyBindIndexes;
 
-        public PreparedMetadata(List<ColumnSpecification> names, Short[] partitionKeyBindIndexes)
+        public PreparedMetadata(List<ColumnSpecification> names, short[] partitionKeyBindIndexes)
         {
             this(EnumSet.noneOf(Flag.class), names, partitionKeyBindIndexes);
             if (!names.isEmpty() && ColumnSpecification.allInSameTable(names))
                 flags.add(Flag.GLOBAL_TABLES_SPEC);
         }
 
-        private PreparedMetadata(EnumSet<Flag> flags, List<ColumnSpecification> names, Short[] partitionKeyBindIndexes)
+        private PreparedMetadata(EnumSet<Flag> flags, List<ColumnSpecification> names, short[] partitionKeyBindIndexes)
         {
             this.flags = flags;
             this.names = names;
@@ -506,13 +506,13 @@
 
                 EnumSet<Flag> flags = Flag.deserialize(iflags);
 
-                Short[] partitionKeyBindIndexes = null;
+                short[] partitionKeyBindIndexes = null;
                 if (version >= Server.VERSION_4)
                 {
                     int numPKNames = body.readInt();
                     if (numPKNames > 0)
                     {
-                        partitionKeyBindIndexes = new Short[numPKNames];
+                        partitionKeyBindIndexes = new short[numPKNames];
                         for (int i = 0; i < numPKNames; i++)
                             partitionKeyBindIndexes[i] = body.readShort();
                     }
diff --git a/src/java/org/apache/cassandra/cql3/Sets.java b/src/java/org/apache/cassandra/cql3/Sets.java
index 622bb23..e8617aa 100644
--- a/src/java/org/apache/cassandra/cql3/Sets.java
+++ b/src/java/org/apache/cassandra/cql3/Sets.java
@@ -122,6 +122,18 @@
             return AssignmentTestable.TestResult.testAll(keyspace, valueSpec, elements);
         }
 
+        @Override
+        public AbstractType<?> getExactTypeIfKnown(String keyspace)
+        {
+            for (Term.Raw term : elements)
+            {
+                AbstractType<?> type = term.getExactTypeIfKnown(keyspace);
+                if (type != null)
+                    return SetType.getInstance(type, false);
+            }
+            return null;
+        }
+
         public String getText()
         {
             return elements.stream().map(Term.Raw::getText).collect(Collectors.joining(", ", "{", "}"));
diff --git a/src/java/org/apache/cassandra/cql3/SingleColumnRelation.java b/src/java/org/apache/cassandra/cql3/SingleColumnRelation.java
index 05ba42d..07232d2 100644
--- a/src/java/org/apache/cassandra/cql3/SingleColumnRelation.java
+++ b/src/java/org/apache/cassandra/cql3/SingleColumnRelation.java
@@ -34,20 +34,21 @@
 
 import static org.apache.cassandra.cql3.statements.RequestValidations.checkFalse;
 import static org.apache.cassandra.cql3.statements.RequestValidations.checkTrue;
+import static org.apache.cassandra.cql3.statements.RequestValidations.invalidRequest;
 
 /**
  * Relations encapsulate the relationship between an entity of some kind, and
- * a value (term). For example, <key> > "start" or "colname1" = "somevalue".
+ * a value (term). For example, {@code <key> > "start" or "colname1" = "somevalue"}.
  *
  */
 public final class SingleColumnRelation extends Relation
 {
-    private final ColumnIdentifier.Raw entity;
+    private final ColumnDefinition.Raw entity;
     private final Term.Raw mapKey;
     private final Term.Raw value;
     private final List<Term.Raw> inValues;
 
-    private SingleColumnRelation(ColumnIdentifier.Raw entity, Term.Raw mapKey, Operator type, Term.Raw value, List<Term.Raw> inValues)
+    private SingleColumnRelation(ColumnDefinition.Raw entity, Term.Raw mapKey, Operator type, Term.Raw value, List<Term.Raw> inValues)
     {
         this.entity = entity;
         this.mapKey = mapKey;
@@ -67,7 +68,7 @@
      * @param type the type that describes how this entity relates to the value.
      * @param value the value being compared.
      */
-    public SingleColumnRelation(ColumnIdentifier.Raw entity, Term.Raw mapKey, Operator type, Term.Raw value)
+    public SingleColumnRelation(ColumnDefinition.Raw entity, Term.Raw mapKey, Operator type, Term.Raw value)
     {
         this(entity, mapKey, type, value, null);
     }
@@ -79,7 +80,7 @@
      * @param type the type that describes how this entity relates to the value.
      * @param value the value being compared.
      */
-    public SingleColumnRelation(ColumnIdentifier.Raw entity, Operator type, Term.Raw value)
+    public SingleColumnRelation(ColumnDefinition.Raw entity, Operator type, Term.Raw value)
     {
         this(entity, null, type, value);
     }
@@ -94,12 +95,12 @@
         return inValues;
     }
 
-    public static SingleColumnRelation createInRelation(ColumnIdentifier.Raw entity, List<Term.Raw> inValues)
+    public static SingleColumnRelation createInRelation(ColumnDefinition.Raw entity, List<Term.Raw> inValues)
     {
         return new SingleColumnRelation(entity, null, Operator.IN, null, inValues);
     }
 
-    public ColumnIdentifier.Raw getEntity()
+    public ColumnDefinition.Raw getEntity()
     {
         return entity;
     }
@@ -133,7 +134,7 @@
         }
     }
 
-    public Relation renameIdentifier(ColumnIdentifier.Raw from, ColumnIdentifier.Raw to)
+    public Relation renameIdentifier(ColumnDefinition.Raw from, ColumnDefinition.Raw to)
     {
         return entity.equals(from)
                ? new SingleColumnRelation(to, mapKey, operator(), value, inValues)
@@ -157,7 +158,7 @@
     protected Restriction newEQRestriction(CFMetaData cfm,
                                            VariableSpecifications boundNames) throws InvalidRequestException
     {
-        ColumnDefinition columnDef = toColumnDefinition(cfm, entity);
+        ColumnDefinition columnDef = entity.prepare(cfm);
         if (mapKey == null)
         {
             Term term = toTerm(toReceivers(columnDef, cfm.isDense()), value, cfm.ksName, boundNames);
@@ -173,7 +174,7 @@
     protected Restriction newINRestriction(CFMetaData cfm,
                                            VariableSpecifications boundNames) throws InvalidRequestException
     {
-        ColumnDefinition columnDef = toColumnDefinition(cfm, entity);
+        ColumnDefinition columnDef = entity.prepare(cfm);
         List<? extends ColumnSpecification> receivers = toReceivers(columnDef, cfm.isDense());
         List<Term> terms = toTerms(receivers, inValues, cfm.ksName, boundNames);
         if (terms == null)
@@ -190,7 +191,7 @@
                                               Bound bound,
                                               boolean inclusive) throws InvalidRequestException
     {
-        ColumnDefinition columnDef = toColumnDefinition(cfm, entity);
+        ColumnDefinition columnDef = entity.prepare(cfm);
         Term term = toTerm(toReceivers(columnDef, cfm.isDense()), value, cfm.ksName, boundNames);
         return new SingleColumnRestriction.SliceRestriction(columnDef, bound, inclusive, term);
     }
@@ -200,7 +201,7 @@
                                                  VariableSpecifications boundNames,
                                                  boolean isKey) throws InvalidRequestException
     {
-        ColumnDefinition columnDef = toColumnDefinition(cfm, entity);
+        ColumnDefinition columnDef = entity.prepare(cfm);
         Term term = toTerm(toReceivers(columnDef, cfm.isDense()), value, cfm.ksName, boundNames);
         return new SingleColumnRestriction.ContainsRestriction(columnDef, term, isKey);
     }
@@ -209,12 +210,24 @@
     protected Restriction newIsNotRestriction(CFMetaData cfm,
                                               VariableSpecifications boundNames) throws InvalidRequestException
     {
-        ColumnDefinition columnDef = toColumnDefinition(cfm, entity);
+        ColumnDefinition columnDef = entity.prepare(cfm);
         // currently enforced by the grammar
         assert value == Constants.NULL_LITERAL : "Expected null literal for IS NOT relation: " + this.toString();
         return new SingleColumnRestriction.IsNotNullRestriction(columnDef);
     }
 
+    @Override
+    protected Restriction newLikeRestriction(CFMetaData cfm, VariableSpecifications boundNames, Operator operator) throws InvalidRequestException
+    {
+        if (mapKey != null)
+            throw invalidRequest("%s can't be used with collections.", operator());
+
+        ColumnDefinition columnDef = entity.prepare(cfm);
+        Term term = toTerm(toReceivers(columnDef, cfm.isDense()), value, cfm.ksName, boundNames);
+
+        return new SingleColumnRestriction.LikeRestriction(columnDef, operator, term);
+    }
+
     /**
      * Returns the receivers for this relation.
      * @param columnDef the column definition
@@ -301,6 +314,6 @@
 
     private boolean canHaveOnlyOneValue()
     {
-        return isEQ() || (isIN() && inValues != null && inValues.size() == 1);
+        return isEQ() || isLIKE() || (isIN() && inValues != null && inValues.size() == 1);
     }
 }
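
A LIKE relation is rejected on a map-element selection, and otherwise the shape of the bound pattern decides which of the LIKE_* operators applies: a trailing '%' means a prefix search, a leading '%' a suffix search, both mean contains, and neither means an exact match. That classification is done at bind time inside the LIKE restriction rather than in this class; the sketch below only illustrates the mapping, with hypothetical names.

// Standalone sketch: map the '%' placement in a LIKE pattern onto the operator kinds
// added in Operator.java. Illustrative only; the real classification happens when the
// bound value is inspected by the restriction.
public final class LikePatternKind
{
    enum Kind { PREFIX, SUFFIX, CONTAINS, MATCHES }

    static Kind classify(String pattern)
    {
        boolean leading = pattern.startsWith("%");
        boolean trailing = pattern.endsWith("%") && pattern.length() > 1;
        if (leading && trailing) return Kind.CONTAINS;  // LIKE '%term%'
        if (trailing)            return Kind.PREFIX;    // LIKE 'term%'
        if (leading)             return Kind.SUFFIX;    // LIKE '%term'
        return Kind.MATCHES;                            // LIKE 'term'
    }

    public static void main(String[] args)
    {
        System.out.println(classify("abc%"));   // PREFIX
        System.out.println(classify("%abc"));   // SUFFIX
        System.out.println(classify("%abc%"));  // CONTAINS
        System.out.println(classify("abc"));    // MATCHES
    }
}
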
diff --git a/src/java/org/apache/cassandra/cql3/Term.java b/src/java/org/apache/cassandra/cql3/Term.java
index 5ae9c18..2c2eba6 100644
--- a/src/java/org/apache/cassandra/cql3/Term.java
+++ b/src/java/org/apache/cassandra/cql3/Term.java
@@ -18,11 +18,10 @@
 package org.apache.cassandra.cql3;
 
 import java.nio.ByteBuffer;
-import java.util.Collections;
 import java.util.List;
-import java.util.Set;
 
 import org.apache.cassandra.cql3.functions.Function;
+import org.apache.cassandra.db.marshal.AbstractType;
 import org.apache.cassandra.exceptions.InvalidRequestException;
 
 /**
@@ -100,6 +99,17 @@
          */
         public abstract String getText();
 
+        /**
+         * The type of the {@code term} if it can be inferred.
+         *
+         * @param keyspace the keyspace the statement containing this term operates on.
+         * @return the type of this {@code Term} if it can be inferred, or {@code null}
+         * otherwise. For instance, the type cannot be inferred for a bind marker, and even
+         * for literals the exact type is not inferable, since a literal is valid for many
+         * different types, so this returns {@code null} for them too.
+         */
+        public abstract AbstractType<?> getExactTypeIfKnown(String keyspace);
+
         @Override
         public String toString()
         {
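
getExactTypeIfKnown gives raw terms a best-effort way to report their CQL type before any receiver column is known: bind markers and bare literals answer null, a type cast answers the cast-to type, and a collection literal can answer as soon as one of its elements has a known type (as in the Sets and Tuples changes in this commit). A minimal standalone model of that contract follows; the names below are illustrative, not the Cassandra classes.

// Standalone model of the getExactTypeIfKnown contract: a term reports an exact type
// only when it can be inferred without a receiver column, and null otherwise.
public final class ExactTypeSketch
{
    interface RawTerm
    {
        /** @return the inferred type name, or null when it cannot be determined. */
        String getExactTypeIfKnown();
    }

    /** A bind marker ('?') carries no type information of its own. */
    static final RawTerm BIND_MARKER = () -> null;

    /** A bare literal such as 3 or 'abc' is valid for several types, so it also answers null. */
    static final RawTerm LITERAL = () -> null;

    /** A cast such as (int) ? pins the type down explicitly. */
    static RawTerm cast(String typeName)
    {
        return () -> typeName;
    }

    /** A set literal knows its type as soon as one element does. */
    static RawTerm setLiteral(RawTerm... elements)
    {
        return () -> {
            for (RawTerm element : elements)
            {
                String elementType = element.getExactTypeIfKnown();
                if (elementType != null)
                    return "set<" + elementType + ">";
            }
            return null;
        };
    }

    public static void main(String[] args)
    {
        System.out.println(BIND_MARKER.getExactTypeIfKnown());                       // null
        System.out.println(cast("int").getExactTypeIfKnown());                       // int
        System.out.println(setLiteral(LITERAL, cast("int")).getExactTypeIfKnown());  // set<int>
    }
}
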
diff --git a/src/java/org/apache/cassandra/cql3/TokenRelation.java b/src/java/org/apache/cassandra/cql3/TokenRelation.java
index 2c13b19..42464ef 100644
--- a/src/java/org/apache/cassandra/cql3/TokenRelation.java
+++ b/src/java/org/apache/cassandra/cql3/TokenRelation.java
@@ -47,11 +47,11 @@
  */
 public final class TokenRelation extends Relation
 {
-    private final List<ColumnIdentifier.Raw> entities;
+    private final List<ColumnDefinition.Raw> entities;
 
     private final Term.Raw value;
 
-    public TokenRelation(List<ColumnIdentifier.Raw> entities, Operator type, Term.Raw value)
+    public TokenRelation(List<ColumnDefinition.Raw> entities, Operator type, Term.Raw value)
     {
         this.entities = entities;
         this.relationType = type;
@@ -112,6 +112,12 @@
     }
 
     @Override
+    protected Restriction newLikeRestriction(CFMetaData cfm, VariableSpecifications boundNames, Operator operator) throws InvalidRequestException
+    {
+        throw invalidRequest("%s cannot be used with the token function", operator);
+    }
+
+    @Override
     protected Term toTerm(List<? extends ColumnSpecification> receivers,
                           Raw raw,
                           String keyspace,
@@ -122,12 +128,12 @@
         return term;
     }
 
-    public Relation renameIdentifier(ColumnIdentifier.Raw from, ColumnIdentifier.Raw to)
+    public Relation renameIdentifier(ColumnDefinition.Raw from, ColumnDefinition.Raw to)
     {
         if (!entities.contains(from))
             return this;
 
-        List<ColumnIdentifier.Raw> newEntities = entities.stream().map(e -> e.equals(from) ? to : e).collect(Collectors.toList());
+        List<ColumnDefinition.Raw> newEntities = entities.stream().map(e -> e.equals(from) ? to : e).collect(Collectors.toList());
         return new TokenRelation(newEntities, operator(), value);
     }
 
@@ -146,11 +152,9 @@
      */
     private List<ColumnDefinition> getColumnDefinitions(CFMetaData cfm) throws InvalidRequestException
     {
-        List<ColumnDefinition> columnDefs = new ArrayList<>();
-        for ( ColumnIdentifier.Raw raw : entities)
-        {
-            columnDefs.add(toColumnDefinition(cfm, raw));
-        }
+        List<ColumnDefinition> columnDefs = new ArrayList<>(entities.size());
+        for ( ColumnDefinition.Raw raw : entities)
+            columnDefs.add(raw.prepare(cfm));
         return columnDefs;
     }
 
diff --git a/src/java/org/apache/cassandra/cql3/Tuples.java b/src/java/org/apache/cassandra/cql3/Tuples.java
index ee08efe..ba9ddb6 100644
--- a/src/java/org/apache/cassandra/cql3/Tuples.java
+++ b/src/java/org/apache/cassandra/cql3/Tuples.java
@@ -111,8 +111,10 @@
             for (int i = 0; i < elements.size(); i++)
             {
                 if (i >= tt.size())
+                {
                     throw new InvalidRequestException(String.format("Invalid tuple literal for %s: too many elements. Type %s expects %d but got %d",
-                                                                    receiver.name, tt.asCQL3Type(), tt.size(), elements.size()));
+                            receiver.name, tt.asCQL3Type(), tt.size(), elements.size()));
+                }
 
                 Term.Raw value = elements.get(i);
                 ColumnSpecification spec = componentSpecOf(receiver, i);
@@ -134,6 +136,20 @@
             }
         }
 
+        @Override
+        public AbstractType<?> getExactTypeIfKnown(String keyspace)
+        {
+            List<AbstractType<?>> types = new ArrayList<>(elements.size());
+            for (Term.Raw term : elements)
+            {
+                AbstractType<?> type = term.getExactTypeIfKnown(keyspace);
+                if (type == null)
+                    return null;
+                types.add(type);
+            }
+            return new TupleType(types);
+        }
+
         public String getText()
         {
             return elements.stream().map(Term.Raw::getText).collect(Collectors.joining(", ", "(", ")"));
@@ -154,6 +170,13 @@
 
         public static Value fromSerialized(ByteBuffer bytes, TupleType type)
         {
+            ByteBuffer[] values = type.split(bytes);
+            if (values.length > type.size())
+            {
+                throw new InvalidRequestException(String.format(
+                        "Tuple value contained too many fields (expected %s, got %s)", type.size(), values.length));
+            }
+
             return new Value(type.split(bytes));
         }
 
@@ -199,6 +222,10 @@
 
         private ByteBuffer[] bindInternal(QueryOptions options) throws InvalidRequestException
         {
+            if (elements.size() > type.size())
+                throw new InvalidRequestException(String.format(
+                        "Tuple value contained too many fields (expected %s, got %s)", type.size(), elements.size()));
+
             ByteBuffer[] buffers = new ByteBuffer[elements.size()];
             for (int i = 0; i < elements.size(); i++)
             {
@@ -313,6 +340,11 @@
             return new ColumnSpecification(receivers.get(0).ksName, receivers.get(0).cfName, identifier, type);
         }
 
+        public AbstractType<?> getExactTypeIfKnown(String keyspace)
+        {
+            return null;
+        }
+
         public AbstractMarker prepare(String keyspace, List<? extends ColumnSpecification> receivers) throws InvalidRequestException
         {
             return new Tuples.Marker(bindIndex, makeReceiver(receivers));
@@ -352,6 +384,11 @@
             return new ColumnSpecification(receivers.get(0).ksName, receivers.get(0).cfName, identifier, ListType.getInstance(type, false));
         }
 
+        public AbstractType<?> getExactTypeIfKnown(String keyspace)
+        {
+            return null;
+        }
+
         public AbstractMarker prepare(String keyspace, List<? extends ColumnSpecification> receivers) throws InvalidRequestException
         {
             return new InMarker(bindIndex, makeInReceiver(receivers));
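
Both the serialized form and the delayed (bound) form of a tuple are now validated against the declared arity: a value may carry fewer components than the type declares (missing trailing fields read as null), but carrying more than declared is rejected. A tiny standalone sketch of that guard follows, assuming a 4-byte length prefix per component (negative for null) purely for illustration.

import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Standalone sketch of the arity check applied when a serialized tuple value is split
// into its components: fewer components than declared is fine, more is an error.
public final class TupleAritySketch
{
    static List<ByteBuffer> split(ByteBuffer serialized, int declaredSize)
    {
        ByteBuffer input = serialized.duplicate();
        List<ByteBuffer> components = new ArrayList<>();
        while (input.hasRemaining())
        {
            int length = input.getInt();
            if (length < 0)
            {
                components.add(null);                      // null component
                continue;
            }
            ByteBuffer component = input.slice();
            component.limit(length);
            components.add(component);
            input.position(input.position() + length);     // skip past the component bytes
        }
        if (components.size() > declaredSize)
            throw new IllegalArgumentException(String.format(
                    "Tuple value contained too many fields (expected %d, got %d)", declaredSize, components.size()));
        return components;                                 // may be shorter than declaredSize
    }

    public static void main(String[] args)
    {
        ByteBuffer two = ByteBuffer.allocate(13);
        two.putInt(1).put((byte) 'a').putInt(4).putInt(42);    // two components: 'a' and 42
        two.flip();
        System.out.println(split(two, 3).size());              // 2: fewer fields than declared is accepted
    }
}
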
diff --git a/src/java/org/apache/cassandra/cql3/TypeCast.java b/src/java/org/apache/cassandra/cql3/TypeCast.java
index 890b34f..7b2f306 100644
--- a/src/java/org/apache/cassandra/cql3/TypeCast.java
+++ b/src/java/org/apache/cassandra/cql3/TypeCast.java
@@ -58,6 +58,11 @@
             return AssignmentTestable.TestResult.NOT_ASSIGNABLE;
     }
 
+    public AbstractType<?> getExactTypeIfKnown(String keyspace)
+    {
+        return type.prepare(keyspace).getType();
+    }
+
     public String getText()
     {
         return "(" + type + ")" + term;
diff --git a/src/java/org/apache/cassandra/cql3/UntypedResultSet.java b/src/java/org/apache/cassandra/cql3/UntypedResultSet.java
index ada1e0f..3d70051 100644
--- a/src/java/org/apache/cassandra/cql3/UntypedResultSet.java
+++ b/src/java/org/apache/cassandra/cql3/UntypedResultSet.java
@@ -187,7 +187,8 @@
                         if (pager.isExhausted())
                             return endOfData();
 
-                        try (ReadOrderGroup orderGroup = pager.startOrderGroup(); PartitionIterator iter = pager.fetchPageInternal(pageSize, orderGroup))
+                        try (ReadExecutionController executionController = pager.executionController();
+                             PartitionIterator iter = pager.fetchPageInternal(pageSize, executionController))
                         {
                             currentPage = select.process(iter, nowInSec).rows.iterator();
                         }
@@ -244,7 +245,7 @@
                 {
                     ComplexColumnData complexData = row.getComplexColumnData(def);
                     if (complexData != null)
-                        data.put(def.name.toString(), ((CollectionType)def.type).serializeForNativeProtocol(def, complexData.iterator(), Server.VERSION_3));
+                        data.put(def.name.toString(), ((CollectionType)def.type).serializeForNativeProtocol(complexData.iterator(), Server.VERSION_3));
                 }
             }
 
diff --git a/src/java/org/apache/cassandra/cql3/UpdateParameters.java b/src/java/org/apache/cassandra/cql3/UpdateParameters.java
index 0c58097..d2c01c8 100644
--- a/src/java/org/apache/cassandra/cql3/UpdateParameters.java
+++ b/src/java/org/apache/cassandra/cql3/UpdateParameters.java
@@ -116,7 +116,7 @@
 
     public void addPrimaryKeyLivenessInfo()
     {
-        builder.addPrimaryKeyLivenessInfo(LivenessInfo.create(metadata, timestamp, ttl, nowInSec));
+        builder.addPrimaryKeyLivenessInfo(LivenessInfo.create(timestamp, ttl, nowInSec));
     }
 
     public void addRowDeletion()
@@ -149,7 +149,7 @@
     public void addCell(ColumnDefinition column, CellPath path, ByteBuffer value) throws InvalidRequestException
     {
         Cell cell = ttl == LivenessInfo.NO_TTL
-                  ? BufferCell.live(metadata, column, timestamp, value, path)
+                  ? BufferCell.live(column, timestamp, value, path)
                   : BufferCell.expiring(column, timestamp, ttl, nowInSec, value, path);
         builder.addCell(cell);
     }
@@ -167,7 +167,7 @@
         // shard is due to the merging rules: if a user includes multiple updates to the same counter in a batch, those
         // multiple updates will be merged in the PartitionUpdate *before* they even reach CounterMutation. So we need
         // such update to be added together, and that's what a local shard gives us.
-        builder.addCell(BufferCell.live(metadata, column, timestamp, CounterContext.instance().createLocal(increment)));
+        builder.addCell(BufferCell.live(column, timestamp, CounterContext.instance().createLocal(increment)));
     }
 
     public void setComplexDeletionTime(ColumnDefinition column)
diff --git a/src/java/org/apache/cassandra/cql3/UserTypes.java b/src/java/org/apache/cassandra/cql3/UserTypes.java
index 89cfe4b..41b8eed 100644
--- a/src/java/org/apache/cassandra/cql3/UserTypes.java
+++ b/src/java/org/apache/cassandra/cql3/UserTypes.java
@@ -20,12 +20,16 @@
 import java.nio.ByteBuffer;
 import java.util.*;
 
+import org.apache.cassandra.config.ColumnDefinition;
 import org.apache.cassandra.cql3.functions.Function;
-import org.apache.cassandra.db.marshal.UTF8Type;
-import org.apache.cassandra.db.marshal.UserType;
+import org.apache.cassandra.db.DecoratedKey;
+import org.apache.cassandra.db.marshal.*;
+import org.apache.cassandra.db.rows.CellPath;
 import org.apache.cassandra.exceptions.InvalidRequestException;
 import org.apache.cassandra.utils.ByteBufferUtil;
 
+import static org.apache.cassandra.cql3.Constants.UNSET_VALUE;
+
 /**
  * Static helper methods and classes for user types.
  */
@@ -38,15 +42,15 @@
         UserType ut = (UserType)column.type;
         return new ColumnSpecification(column.ksName,
                                        column.cfName,
-                                       new ColumnIdentifier(column.name + "." + UTF8Type.instance.compose(ut.fieldName(field)), true),
+                                       new ColumnIdentifier(column.name + "." + ut.fieldName(field), true),
                                        ut.fieldType(field));
     }
 
     public static class Literal extends Term.Raw
     {
-        public final Map<ColumnIdentifier, Term.Raw> entries;
+        public final Map<FieldIdentifier, Term.Raw> entries;
 
-        public Literal(Map<ColumnIdentifier, Term.Raw> entries)
+        public Literal(Map<FieldIdentifier, Term.Raw> entries)
         {
             this.entries = entries;
         }
@@ -61,7 +65,7 @@
             int foundValues = 0;
             for (int i = 0; i < ut.size(); i++)
             {
-                ColumnIdentifier field = new ColumnIdentifier(ut.fieldName(i), UTF8Type.instance);
+                FieldIdentifier field = ut.fieldName(i);
                 Term.Raw raw = entries.get(field);
                 if (raw == null)
                     raw = Constants.NULL_LITERAL;
@@ -77,9 +81,11 @@
             if (foundValues != entries.size())
             {
                 // We had some field that are not part of the type
-                for (ColumnIdentifier id : entries.keySet())
-                    if (!ut.fieldNames().contains(id.bytes))
+                for (FieldIdentifier id : entries.keySet())
+                {
+                    if (!ut.fieldNames().contains(id))
                         throw new InvalidRequestException(String.format("Unknown field '%s' in value of user defined type %s", id, ut.getNameAsString()));
+                }
             }
 
             DelayedValue value = new DelayedValue(((UserType)receiver.type), values);
@@ -88,20 +94,23 @@
 
         private void validateAssignableTo(String keyspace, ColumnSpecification receiver) throws InvalidRequestException
         {
-            if (!(receiver.type instanceof UserType))
+            if (!receiver.type.isUDT())
                 throw new InvalidRequestException(String.format("Invalid user type literal for %s of type %s", receiver, receiver.type.asCQL3Type()));
 
             UserType ut = (UserType)receiver.type;
             for (int i = 0; i < ut.size(); i++)
             {
-                ColumnIdentifier field = new ColumnIdentifier(ut.fieldName(i), UTF8Type.instance);
+                FieldIdentifier field = ut.fieldName(i);
                 Term.Raw value = entries.get(field);
                 if (value == null)
                     continue;
 
                 ColumnSpecification fieldSpec = fieldSpecOf(receiver, i);
                 if (!value.testAssignment(keyspace, fieldSpec).isAssignable())
-                    throw new InvalidRequestException(String.format("Invalid user type literal for %s: field %s is not of type %s", receiver, field, fieldSpec.type.asCQL3Type()));
+                {
+                    throw new InvalidRequestException(String.format("Invalid user type literal for %s: field %s is not of type %s",
+                            receiver, field, fieldSpec.type.asCQL3Type()));
+                }
             }
         }
 
@@ -118,14 +127,19 @@
             }
         }
 
+        public AbstractType<?> getExactTypeIfKnown(String keyspace)
+        {
+            return null;
+        }
+
         public String getText()
         {
             StringBuilder sb = new StringBuilder();
             sb.append("{");
-            Iterator<Map.Entry<ColumnIdentifier, Term.Raw>> iter = entries.entrySet().iterator();
+            Iterator<Map.Entry<FieldIdentifier, Term.Raw>> iter = entries.entrySet().iterator();
             while (iter.hasNext())
             {
-                Map.Entry<ColumnIdentifier, Term.Raw> entry = iter.next();
+                Map.Entry<FieldIdentifier, Term.Raw> entry = iter.next();
                 sb.append(entry.getKey()).append(": ").append(entry.getValue().getText());
                 if (iter.hasNext())
                     sb.append(", ");
@@ -135,7 +149,52 @@
         }
     }
 
-    // Same purpose than Lists.DelayedValue, except we do handle bind marker in that case
+    public static class Value extends Term.MultiItemTerminal
+    {
+        private final UserType type;
+        public final ByteBuffer[] elements;
+
+        public Value(UserType type, ByteBuffer[] elements)
+        {
+            this.type = type;
+            this.elements = elements;
+        }
+
+        public static Value fromSerialized(ByteBuffer bytes, UserType type)
+        {
+            ByteBuffer[] values = type.split(bytes);
+            if (values.length > type.size())
+            {
+                throw new InvalidRequestException(String.format(
+                        "UDT value contained too many fields (expected %s, got %s)", type.size(), values.length));
+            }
+
+            return new Value(type, values);
+        }
+
+        public ByteBuffer get(int protocolVersion)
+        {
+            return TupleType.buildValue(elements);
+        }
+
+        public boolean equals(UserType userType, Value v)
+        {
+            if (elements.length != v.elements.length)
+                return false;
+
+            for (int i = 0; i < elements.length; i++)
+                if (userType.fieldType(i).compare(elements[i], v.elements[i]) != 0)
+                    return false;
+
+            return true;
+        }
+
+        public List<ByteBuffer> getElements()
+        {
+            return Arrays.asList(elements);
+        }
+    }
+
     public static class DelayedValue extends Term.NonTerminal
     {
         private final UserType type;
@@ -168,20 +227,27 @@
 
         private ByteBuffer[] bindInternal(QueryOptions options) throws InvalidRequestException
         {
+            if (values.size() > type.size())
+            {
+                throw new InvalidRequestException(String.format(
+                        "UDT value contained too many fields (expected %s, got %s)", type.size(), values.size()));
+            }
+
             ByteBuffer[] buffers = new ByteBuffer[values.size()];
             for (int i = 0; i < type.size(); i++)
             {
                 buffers[i] = values.get(i).bindAndGet(options);
-                // Since A UDT value is always written in its entirety Cassandra can't preserve a pre-existing value by 'not setting' the new value. Reject the query.
-                if (buffers[i] == ByteBufferUtil.UNSET_BYTE_BUFFER)
+                // Since a frozen UDT value is always written in its entirety Cassandra can't preserve a pre-existing
+                // value by 'not setting' the new value. Reject the query.
+                if (!type.isMultiCell() && buffers[i] == ByteBufferUtil.UNSET_BYTE_BUFFER)
                     throw new InvalidRequestException(String.format("Invalid unset value for field '%s' of user defined type %s", type.fieldNameAsString(i), type.getNameAsString()));
             }
             return buffers;
         }
 
-        public Constants.Value bind(QueryOptions options) throws InvalidRequestException
+        public Value bind(QueryOptions options) throws InvalidRequestException
         {
-            return new Constants.Value(bindAndGet(options));
+            return new Value(type, bindInternal(options));
         }
 
         @Override
@@ -190,4 +256,114 @@
             return UserType.buildValue(bindInternal(options));
         }
     }
+
+    public static class Marker extends AbstractMarker
+    {
+        protected Marker(int bindIndex, ColumnSpecification receiver)
+        {
+            super(bindIndex, receiver);
+            assert receiver.type.isUDT();
+        }
+
+        public Terminal bind(QueryOptions options) throws InvalidRequestException
+        {
+            ByteBuffer value = options.getValues().get(bindIndex);
+            if (value == null)
+                return null;
+            if (value == ByteBufferUtil.UNSET_BYTE_BUFFER)
+                return UNSET_VALUE;
+            return Value.fromSerialized(value, (UserType) receiver.type);
+        }
+    }
+
+    public static class Setter extends Operation
+    {
+        public Setter(ColumnDefinition column, Term t)
+        {
+            super(column, t);
+        }
+
+        public void execute(DecoratedKey partitionKey, UpdateParameters params) throws InvalidRequestException
+        {
+            Term.Terminal value = t.bind(params.options);
+            if (value == UNSET_VALUE)
+                return;
+
+            Value userTypeValue = (Value) value;
+            if (column.type.isMultiCell())
+            {
+                // setting a whole UDT at once means we overwrite all cells, so delete existing cells
+                params.setComplexDeletionTimeForOverwrite(column);
+                if (value == null)
+                    return;
+
+                Iterator<FieldIdentifier> fieldNameIter = userTypeValue.type.fieldNames().iterator();
+                for (ByteBuffer buffer : userTypeValue.elements)
+                {
+                    assert fieldNameIter.hasNext();
+                    FieldIdentifier fieldName = fieldNameIter.next();
+                    if (buffer == null)
+                        continue;
+
+                    CellPath fieldPath = userTypeValue.type.cellPathForField(fieldName);
+                    params.addCell(column, fieldPath, buffer);
+                }
+            }
+            else
+            {
+                // for frozen UDTs, we're overwriting the whole cell value
+                if (value == null)
+                    params.addTombstone(column);
+                else
+                    params.addCell(column, value.get(params.options.getProtocolVersion()));
+            }
+        }
+    }
+
+    public static class SetterByField extends Operation
+    {
+        private final FieldIdentifier field;
+
+        public SetterByField(ColumnDefinition column, FieldIdentifier field, Term t)
+        {
+            super(column, t);
+            this.field = field;
+        }
+
+        public void execute(DecoratedKey partitionKey, UpdateParameters params) throws InvalidRequestException
+        {
+            // we should not get here for frozen UDTs
+            assert column.type.isMultiCell() : "Attempted to set an individual field on a frozen UDT";
+
+            Term.Terminal value = t.bind(params.options);
+            if (value == UNSET_VALUE)
+                return;
+
+            CellPath fieldPath = ((UserType) column.type).cellPathForField(field);
+            if (value == null)
+                params.addTombstone(column, fieldPath);
+            else
+                params.addCell(column, fieldPath, value.get(params.options.getProtocolVersion()));
+        }
+    }
+
+    public static class DeleterByField extends Operation
+    {
+        private final FieldIdentifier field;
+
+        public DeleterByField(ColumnDefinition column, FieldIdentifier field)
+        {
+            super(column, null);
+            this.field = field;
+        }
+
+        public void execute(DecoratedKey partitionKey, UpdateParameters params) throws InvalidRequestException
+        {
+            // we should not get here for frozen UDTs
+            assert column.type.isMultiCell() : "Attempted to delete a single field from a frozen UDT";
+
+            CellPath fieldPath = ((UserType) column.type).cellPathForField(field);
+            params.addTombstone(column, fieldPath);
+        }
+    }
 }
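For illustration only, a sketch with hypothetical keyspace, type and table names (not part of the patch): the Setter, SetterByField and DeleterByField operations above correspond to CQL statements along these lines.

    CREATE TYPE ks.address (street text, city text);
    CREATE TABLE ks.users (id int PRIMARY KEY, addr address);   -- non-frozen, i.e. multi-cell UDT column

    UPDATE ks.users SET addr = {street: '1 Main St', city: 'Oslo'} WHERE id = 1;   -- Setter: rewrites every field cell
    UPDATE ks.users SET addr.city = 'Bergen' WHERE id = 1;                         -- SetterByField: writes a single field cell
    DELETE addr.street FROM ks.users WHERE id = 1;                                 -- DeleterByField: tombstones a single field cell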
diff --git a/src/java/org/apache/cassandra/cql3/VariableSpecifications.java b/src/java/org/apache/cassandra/cql3/VariableSpecifications.java
index 5304350..24f71e4 100644
--- a/src/java/org/apache/cassandra/cql3/VariableSpecifications.java
+++ b/src/java/org/apache/cassandra/cql3/VariableSpecifications.java
@@ -63,9 +63,10 @@
      *
      * Callers of this method should ensure that all statements operate on the same table.
      */
-    public Short[] getPartitionKeyBindIndexes(CFMetaData cfm)
+    public short[] getPartitionKeyBindIndexes(CFMetaData cfm)
     {
-        Short[] partitionKeyPositions = new Short[cfm.partitionKeyColumns().size()];
+        short[] partitionKeyPositions = new short[cfm.partitionKeyColumns().size()];
+        boolean[] set = new boolean[partitionKeyPositions.length];
         for (int i = 0; i < targetColumns.length; i++)
         {
             ColumnDefinition targetColumn = targetColumns[i];
@@ -73,14 +74,13 @@
             {
                 assert targetColumn.ksName.equals(cfm.ksName) && targetColumn.cfName.equals(cfm.cfName);
                 partitionKeyPositions[targetColumn.position()] = (short) i;
+                set[targetColumn.position()] = true;
             }
         }
 
-        for (Short bindIndex : partitionKeyPositions)
-        {
-            if (bindIndex == null)
+        for (boolean b : set)
+            if (!b)
                 return null;
-        }
 
         return partitionKeyPositions;
     }
diff --git a/src/java/org/apache/cassandra/cql3/functions/AbstractFunction.java b/src/java/org/apache/cassandra/cql3/functions/AbstractFunction.java
index 0cf11a5..aa7555f 100644
--- a/src/java/org/apache/cassandra/cql3/functions/AbstractFunction.java
+++ b/src/java/org/apache/cassandra/cql3/functions/AbstractFunction.java
@@ -24,6 +24,7 @@
 import org.apache.cassandra.cql3.AssignmentTestable;
 import org.apache.cassandra.cql3.ColumnSpecification;
 import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.commons.lang3.text.StrBuilder;
 
 /**
  * Base class for our native/hardcoded functions.
@@ -89,7 +90,7 @@
         // We should ignore the fact that the receiver type is frozen in our comparison as functions do not support
         // frozen types for return type
         AbstractType<?> returnType = returnType();
-        if (receiver.type.isFrozenCollection())
+        if (receiver.type.isFreezable() && !receiver.type.isMultiCell())
             returnType = returnType.freeze();
 
         if (receiver.type.equals(returnType))
@@ -115,4 +116,13 @@
         sb.append(") -> ").append(returnType.asCQL3Type());
         return sb.toString();
     }
+
+    @Override
+    public String columnName(List<String> columnNames)
+    {
+        return new StrBuilder(name().toString()).append('(')
+                                                .appendWithSeparators(columnNames, ", ")
+                                                .append(')')
+                                                .toString();
+    }
 }
diff --git a/src/java/org/apache/cassandra/cql3/functions/AggregateFcts.java b/src/java/org/apache/cassandra/cql3/functions/AggregateFcts.java
index b2cae50..0c4f2e2 100644
--- a/src/java/org/apache/cassandra/cql3/functions/AggregateFcts.java
+++ b/src/java/org/apache/cassandra/cql3/functions/AggregateFcts.java
@@ -84,17 +84,6 @@
     }
 
     /**
-     * Checks if the specified function is the count rows (e.g. COUNT(*) or COUNT(1)) function.
-     *
-     * @param function the function to check
-     * @return <code>true</code> if the specified function is the count rows one, <code>false</code> otherwise.
-     */
-    public static boolean isCountRows(Function function)
-    {
-        return function == countRowsFunction;
-    }
-
-    /**
      * The function used to count the number of rows of a result set. This function is called when COUNT(*) or COUNT(1)
      * is specified.
      */
@@ -123,6 +112,12 @@
                         }
                     };
                 }
+
+                @Override
+                public String columnName(List<String> columnNames)
+                {
+                    return "count";
+                }
             };
 
     /**
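For illustration only (exact display can vary by client): the columnName overrides decide how result-set columns are labelled, e.g. for a hypothetical table

    SELECT count(*), max(score) FROM ks.players;

the count-rows aggregate reports its column simply as count, while other function calls keep the fn(args) form built by AbstractFunction.columnName, for example system.max(score).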
diff --git a/src/java/org/apache/cassandra/cql3/functions/CastFcts.java b/src/java/org/apache/cassandra/cql3/functions/CastFcts.java
new file mode 100644
index 0000000..b5d3698
--- /dev/null
+++ b/src/java/org/apache/cassandra/cql3/functions/CastFcts.java
@@ -0,0 +1,342 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.cql3.functions;
+
+import java.math.BigDecimal;
+import java.math.BigInteger;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+
+import org.apache.cassandra.cql3.CQL3Type;
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.db.marshal.AsciiType;
+import org.apache.cassandra.db.marshal.BooleanType;
+import org.apache.cassandra.db.marshal.ByteType;
+import org.apache.cassandra.db.marshal.CounterColumnType;
+import org.apache.cassandra.db.marshal.DecimalType;
+import org.apache.cassandra.db.marshal.DoubleType;
+import org.apache.cassandra.db.marshal.FloatType;
+import org.apache.cassandra.db.marshal.InetAddressType;
+import org.apache.cassandra.db.marshal.Int32Type;
+import org.apache.cassandra.db.marshal.IntegerType;
+import org.apache.cassandra.db.marshal.LongType;
+import org.apache.cassandra.db.marshal.ShortType;
+import org.apache.cassandra.db.marshal.SimpleDateType;
+import org.apache.cassandra.db.marshal.TimeType;
+import org.apache.cassandra.db.marshal.TimeUUIDType;
+import org.apache.cassandra.db.marshal.TimestampType;
+import org.apache.cassandra.db.marshal.UTF8Type;
+import org.apache.cassandra.db.marshal.UUIDType;
+import org.apache.commons.lang3.text.WordUtils;
+
+/**
+ * Casting functions.
+ */
+public final class CastFcts
+{
+    private static final String FUNCTION_NAME_PREFIX = "castAs";
+
+    public static Collection<Function> all()
+    {
+        List<Function> functions = new ArrayList<>();
+
+        @SuppressWarnings("unchecked")
+        final AbstractType<? extends Number>[] numericTypes = new AbstractType[] {ByteType.instance,
+                                                                                  ShortType.instance,
+                                                                                  Int32Type.instance,
+                                                                                  LongType.instance,
+                                                                                  FloatType.instance,
+                                                                                  DoubleType.instance,
+                                                                                  DecimalType.instance,
+                                                                                  CounterColumnType.instance,
+                                                                                  IntegerType.instance};
+
+        for (AbstractType<? extends Number> inputType : numericTypes)
+        {
+            addFunctionIfNeeded(functions, inputType, ByteType.instance, Number::byteValue);
+            addFunctionIfNeeded(functions, inputType, ShortType.instance, Number::shortValue);
+            addFunctionIfNeeded(functions, inputType, Int32Type.instance, Number::intValue);
+            addFunctionIfNeeded(functions, inputType, LongType.instance, Number::longValue);
+            addFunctionIfNeeded(functions, inputType, FloatType.instance, Number::floatValue);
+            addFunctionIfNeeded(functions, inputType, DoubleType.instance, Number::doubleValue);
+            addFunctionIfNeeded(functions, inputType, DecimalType.instance, p -> BigDecimal.valueOf(p.doubleValue()));
+            addFunctionIfNeeded(functions, inputType, IntegerType.instance, p -> BigInteger.valueOf(p.longValue()));
+            functions.add(CastAsTextFunction.create(inputType, AsciiType.instance));
+            functions.add(CastAsTextFunction.create(inputType, UTF8Type.instance));
+        }
+
+        functions.add(JavaFunctionWrapper.create(AsciiType.instance, UTF8Type.instance, p -> p));
+
+        functions.add(CastAsTextFunction.create(InetAddressType.instance, AsciiType.instance));
+        functions.add(CastAsTextFunction.create(InetAddressType.instance, UTF8Type.instance));
+
+        functions.add(CastAsTextFunction.create(BooleanType.instance, AsciiType.instance));
+        functions.add(CastAsTextFunction.create(BooleanType.instance, UTF8Type.instance));
+
+        functions.add(CassandraFunctionWrapper.create(TimeUUIDType.instance, SimpleDateType.instance, TimeFcts.timeUuidtoDate));
+        functions.add(CassandraFunctionWrapper.create(TimeUUIDType.instance, TimestampType.instance, TimeFcts.timeUuidToTimestamp));
+        functions.add(CastAsTextFunction.create(TimeUUIDType.instance, AsciiType.instance));
+        functions.add(CastAsTextFunction.create(TimeUUIDType.instance, UTF8Type.instance));
+        functions.add(CassandraFunctionWrapper.create(TimestampType.instance, SimpleDateType.instance, TimeFcts.timestampToDate));
+        functions.add(CastAsTextFunction.create(TimestampType.instance, AsciiType.instance));
+        functions.add(CastAsTextFunction.create(TimestampType.instance, UTF8Type.instance));
+        functions.add(CassandraFunctionWrapper.create(SimpleDateType.instance, TimestampType.instance, TimeFcts.dateToTimestamp));
+        functions.add(CastAsTextFunction.create(SimpleDateType.instance, AsciiType.instance));
+        functions.add(CastAsTextFunction.create(SimpleDateType.instance, UTF8Type.instance));
+        functions.add(CastAsTextFunction.create(TimeType.instance, AsciiType.instance));
+        functions.add(CastAsTextFunction.create(TimeType.instance, UTF8Type.instance));
+
+        functions.add(CastAsTextFunction.create(UUIDType.instance, AsciiType.instance));
+        functions.add(CastAsTextFunction.create(UUIDType.instance, UTF8Type.instance));
+
+        return functions;
+    }
+
+    /**
+     * Creates the name of the cast function used to cast to the specified type.
+     *
+     * @param outputType the output type
+     * @return the name of the cast function used to cast to the specified type
+     */
+    public static String getFunctionName(AbstractType<?> outputType)
+    {
+        return getFunctionName(outputType.asCQL3Type());
+    }
+
+    /**
+     * Creates the name of the cast function used to cast to the specified type.
+     *
+     * @param outputType the output type
+     * @return the name of the cast function used to cast to the specified type
+     */
+    public static String getFunctionName(CQL3Type outputType)
+    {
+        return FUNCTION_NAME_PREFIX + WordUtils.capitalize(toLowerCaseString(outputType));
+    }
+
+    /**
+     * Adds to the list a function converting the input type into the output type if they are not the same.
+     *
+     * @param functions the list to add to
+     * @param inputType the input type
+     * @param outputType the output type
+     * @param converter the function used to convert the input type into the output type
+     */
+    private static <I, O> void addFunctionIfNeeded(List<Function> functions,
+                                                   AbstractType<I> inputType,
+                                                   AbstractType<O> outputType,
+                                                   java.util.function.Function<I, O> converter)
+    {
+        if (!inputType.equals(outputType))
+            functions.add(wrapJavaFunction(inputType, outputType, converter));
+    }
+
+    @SuppressWarnings("unchecked")
+    private static <O, I> Function wrapJavaFunction(AbstractType<I> inputType,
+                                                    AbstractType<O> outputType,
+                                                    java.util.function.Function<I, O> converter)
+    {
+        return inputType.equals(CounterColumnType.instance)
+                ? JavaCounterFunctionWrapper.create(outputType, (java.util.function.Function<Long, O>) converter)
+                : JavaFunctionWrapper.create(inputType, outputType, converter);
+    }
+
+    private static String toLowerCaseString(CQL3Type type)
+    {
+        return type.toString().toLowerCase();
+    }
+
+    /**
+     * Base class for the CAST functions.
+     *
+     * @param <I> the input type
+     * @param <O> the output type
+     */
+    private static abstract class CastFunction<I, O> extends NativeScalarFunction
+    {
+        public CastFunction(AbstractType<I> inputType, AbstractType<O> outputType)
+        {
+            super(getFunctionName(outputType), outputType, inputType);
+        }
+
+        @Override
+        public String columnName(List<String> columnNames)
+        {
+            return String.format("cast(%s as %s)", columnNames.get(0), toLowerCaseString(outputType().asCQL3Type()));
+        }
+
+        @SuppressWarnings("unchecked")
+        protected AbstractType<O> outputType()
+        {
+            return (AbstractType<O>) returnType;
+        }
+
+        @SuppressWarnings("unchecked")
+        protected AbstractType<I> inputType()
+        {
+            return (AbstractType<I>) argTypes.get(0);
+        }
+    }
+
+    /**
+     * <code>CastFunction</code> that implements casting by wrapping a java <code>Function</code>.
+     *
+     * @param <I> the input parameter
+     * @param <O> the output parameter
+     */
+    private static class JavaFunctionWrapper<I, O> extends CastFunction<I, O>
+    {
+        /**
+         * The java function used to convert the input type into the output one.
+         */
+        private final java.util.function.Function<I, O> converter;
+
+        public static <I, O> JavaFunctionWrapper<I, O> create(AbstractType<I> inputType,
+                                                              AbstractType<O> outputType,
+                                                              java.util.function.Function<I, O> converter)
+        {
+            return new JavaFunctionWrapper<I, O>(inputType, outputType, converter);
+        }
+
+        protected JavaFunctionWrapper(AbstractType<I> inputType,
+                                      AbstractType<O> outputType,
+                                      java.util.function.Function<I, O> converter)
+        {
+            super(inputType, outputType);
+            this.converter = converter;
+        }
+
+        public final ByteBuffer execute(int protocolVersion, List<ByteBuffer> parameters)
+        {
+            ByteBuffer bb = parameters.get(0);
+            if (bb == null)
+                return null;
+
+            return outputType().decompose(converter.apply(compose(bb)));
+        }
+
+        protected I compose(ByteBuffer bb)
+        {
+            return inputType().compose(bb);
+        }
+    }
+
+    /**
+     * <code>JavaFunctionWrapper</code> for counter columns.
+     *
+     * <p>Counter columns need to be handled in a special way because their binary representation is converted into
+     * that of a BIGINT before functions are applied.</p>
+     *
+     * @param <O> the output parameter
+     */
+    private static class JavaCounterFunctionWrapper<O> extends JavaFunctionWrapper<Long, O>
+    {
+        public static <O> JavaFunctionWrapper<Long, O> create(AbstractType<O> outputType,
+                                                              java.util.function.Function<Long, O> converter)
+        {
+            return new JavaCounterFunctionWrapper<O>(outputType, converter);
+        }
+
+        protected JavaCounterFunctionWrapper(AbstractType<O> outputType,
+                                            java.util.function.Function<Long, O> converter)
+        {
+            super(CounterColumnType.instance, outputType, converter);
+        }
+
+        protected Long compose(ByteBuffer bb)
+        {
+            return LongType.instance.compose(bb);
+        }
+    }
+
+    /**
+     * <code>CastFunction</code> that implements casting by wrapping an existing <code>NativeScalarFunction</code>.
+     *
+     * @param <I> the input parameter
+     * @param <O> the output parameter
+     */
+    private static final class CassandraFunctionWrapper<I, O> extends CastFunction<I, O>
+    {
+        /**
+         * The native scalar function used to perform the conversion.
+         */
+        private final NativeScalarFunction delegate;
+
+        public static <I, O> CassandraFunctionWrapper<I, O> create(AbstractType<I> inputType,
+                                                                   AbstractType<O> outputType,
+                                                                   NativeScalarFunction delegate)
+        {
+            return new CassandraFunctionWrapper<I, O>(inputType, outputType, delegate);
+        }
+
+        private CassandraFunctionWrapper(AbstractType<I> inputType,
+                                         AbstractType<O> outputType,
+                                         NativeScalarFunction delegate)
+        {
+            super(inputType, outputType);
+            assert delegate.argTypes().size() == 1 && inputType.equals(delegate.argTypes().get(0));
+            assert outputType.equals(delegate.returnType());
+            this.delegate = delegate;
+        }
+
+        public ByteBuffer execute(int protocolVersion, List<ByteBuffer> parameters)
+        {
+            return delegate.execute(protocolVersion, parameters);
+        }
+    }
+
+    /**
+     * <code>CastFunction</code> that can be used to cast a type into ascii or text types.
+     *
+     * @param <I> the input parameter
+     */
+    private static final class CastAsTextFunction<I> extends CastFunction<I, String>
+    {
+
+        public static <I> CastAsTextFunction<I> create(AbstractType<I> inputType,
+                                                       AbstractType<String> outputType)
+        {
+            return new CastAsTextFunction<I>(inputType, outputType);
+        }
+
+        private CastAsTextFunction(AbstractType<I> inputType,
+                                    AbstractType<String> outputType)
+        {
+            super(inputType, outputType);
+        }
+
+        public ByteBuffer execute(int protocolVersion, List<ByteBuffer> parameters)
+        {
+            ByteBuffer bb = parameters.get(0);
+            if (bb == null)
+                return null;
+
+            return outputType().decompose(inputType().getSerializer().toCQLLiteral(bb));
+        }
+    }
+
+    /**
+     * The class must not be instantiated as it contains only static variables.
+     */
+    private CastFcts()
+    {
+    }
+}
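A brief usage sketch (hypothetical table and columns): the functions registered above back the CQL CAST syntax, e.g.

    SELECT CAST(id AS text), CAST(created_at AS date) FROM ks.events;

Numeric-to-numeric casts go through the java.util.function converters (Number::intValue and friends), timestamp/date/timeuuid conversions reuse the existing TimeFcts conversions via CassandraFunctionWrapper, and casts to ascii/text use CastAsTextFunction.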
diff --git a/src/java/org/apache/cassandra/cql3/functions/Function.java b/src/java/org/apache/cassandra/cql3/functions/Function.java
index f93f14b..5d258af 100644
--- a/src/java/org/apache/cassandra/cql3/functions/Function.java
+++ b/src/java/org/apache/cassandra/cql3/functions/Function.java
@@ -47,4 +47,12 @@
     public void addFunctionsTo(List<Function> functions);
 
     public boolean hasReferenceTo(Function function);
+
+    /**
+     * Returns the name of the function to use within a ResultSet.
+     *
+     * @param columnNames the names of the columns used to call the function
+     * @return the name of the function to use within a ResultSet
+     */
+    public String columnName(List<String> columnNames);
 }
diff --git a/src/java/org/apache/cassandra/cql3/functions/FunctionCall.java b/src/java/org/apache/cassandra/cql3/functions/FunctionCall.java
index be3081a..3905c83 100644
--- a/src/java/org/apache/cassandra/cql3/functions/FunctionCall.java
+++ b/src/java/org/apache/cassandra/cql3/functions/FunctionCall.java
@@ -98,16 +98,24 @@
 
     private static Term.Terminal makeTerminal(Function fun, ByteBuffer result, int version) throws InvalidRequestException
     {
-        if (!(fun.returnType() instanceof CollectionType))
-            return new Constants.Value(result);
-
-        switch (((CollectionType)fun.returnType()).kind)
+        if (fun.returnType().isCollection())
         {
-            case LIST: return Lists.Value.fromSerialized(result, (ListType)fun.returnType(), version);
-            case SET:  return Sets.Value.fromSerialized(result, (SetType)fun.returnType(), version);
-            case MAP:  return Maps.Value.fromSerialized(result, (MapType)fun.returnType(), version);
+            switch (((CollectionType) fun.returnType()).kind)
+            {
+                case LIST:
+                    return Lists.Value.fromSerialized(result, (ListType) fun.returnType(), version);
+                case SET:
+                    return Sets.Value.fromSerialized(result, (SetType) fun.returnType(), version);
+                case MAP:
+                    return Maps.Value.fromSerialized(result, (MapType) fun.returnType(), version);
+            }
         }
-        throw new AssertionError();
+        else if (fun.returnType().isUDT())
+        {
+            return UserTypes.Value.fromSerialized(result, (UserType) fun.returnType());
+        }
+
+        return new Constants.Value(result);
     }
 
     public static class Raw extends Term.Raw
@@ -181,6 +189,16 @@
             }
         }
 
+        public AbstractType<?> getExactTypeIfKnown(String keyspace)
+        {
+            // We could implement this, but the method is only used in the selection clause, where FunctionCall is not
+            // used (we use a Selectable.WithFunction instead). And if that method is later used in other places, better to
+            // let that future patch make sure this can be implemented properly (note in particular we don't have access
+            // to the receiver type, which FunctionResolver.get() takes) rather than provide an implementation that may
+            // not work in all cases.
+            throw new UnsupportedOperationException();
+        }
+
         public String getText()
         {
             return name + terms.stream().map(Term.Raw::getText).collect(Collectors.joining(", ", "(", ")"));
diff --git a/src/java/org/apache/cassandra/cql3/functions/FunctionResolver.java b/src/java/org/apache/cassandra/cql3/functions/FunctionResolver.java
index be2daae..9e0b706 100644
--- a/src/java/org/apache/cassandra/cql3/functions/FunctionResolver.java
+++ b/src/java/org/apache/cassandra/cql3/functions/FunctionResolver.java
@@ -126,9 +126,11 @@
             }
         }
 
-        if (compatibles == null || compatibles.isEmpty())
+        if (compatibles == null)
+        {
             throw new InvalidRequestException(String.format("Invalid call to function %s, none of its type signatures match (known type signatures: %s)",
                                                             name, format(candidates)));
+        }
 
         if (compatibles.size() > 1)
             throw new InvalidRequestException(String.format("Ambiguous call to function %s (can be matched by following signatures: %s): use type casts to disambiguate",
diff --git a/src/java/org/apache/cassandra/cql3/functions/JavaBasedUDFunction.java b/src/java/org/apache/cassandra/cql3/functions/JavaBasedUDFunction.java
index 660d494..87f5019 100644
--- a/src/java/org/apache/cassandra/cql3/functions/JavaBasedUDFunction.java
+++ b/src/java/org/apache/cassandra/cql3/functions/JavaBasedUDFunction.java
@@ -35,8 +35,11 @@
 import java.util.concurrent.ExecutorService;
 import java.util.concurrent.ThreadLocalRandom;
 import java.util.concurrent.atomic.AtomicInteger;
+import java.util.regex.Pattern;
 
+import com.google.common.annotations.VisibleForTesting;
 import com.google.common.io.ByteStreams;
+import com.google.common.reflect.TypeToken;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
@@ -57,10 +60,12 @@
 import org.eclipse.jdt.internal.compiler.impl.CompilerOptions;
 import org.eclipse.jdt.internal.compiler.problem.DefaultProblemFactory;
 
-final class JavaBasedUDFunction extends UDFunction
+public final class JavaBasedUDFunction extends UDFunction
 {
     private static final String BASE_PACKAGE = "org.apache.cassandra.cql3.udf.gen";
 
+    private static final Pattern JAVA_LANG_PREFIX = Pattern.compile("\\bjava\\.lang\\.");
+
     static final Logger logger = LoggerFactory.getLogger(JavaBasedUDFunction.class);
 
     private static final AtomicInteger classSequence = new AtomicInteger();
@@ -184,10 +189,10 @@
         super(name, argNames, argTypes, UDHelper.driverTypes(argTypes),
               returnType, UDHelper.driverType(returnType), calledOnNullInput, "java", body);
 
-        // javaParamTypes is just the Java representation for argTypes resp. argCodecs
-        Class<?>[] javaParamTypes = UDHelper.javaTypes(argCodecs, calledOnNullInput);
-        // javaReturnType is just the Java representation for returnType resp. returnCodec
-        Class<?> javaReturnType = UDHelper.asJavaClass(returnCodec);
+        // javaParamTypes is just the Java representation for argTypes resp. argDataTypes
+        TypeToken<?>[] javaParamTypes = UDHelper.typeTokens(argCodecs, calledOnNullInput);
+        // javaReturnType is just the Java representation for returnType resp. returnDataType
+        TypeToken<?> javaReturnType = returnCodec.getJavaType();
 
         // put each UDF in a separate package to prevent cross-UDF code access
         String pkgName = BASE_PACKAGE + '.' + generateClassName(name, 'p');
@@ -244,7 +249,7 @@
         {
             EcjCompilationUnit compilationUnit = new EcjCompilationUnit(javaSource, targetClassName);
 
-            org.eclipse.jdt.internal.compiler.Compiler compiler = new Compiler(compilationUnit,
+            Compiler compiler = new Compiler(compilationUnit,
                                                                                errorHandlingPolicy,
                                                                                compilerOptions,
                                                                                compilationUnit,
@@ -289,17 +294,14 @@
             }
 
             // Verify the UDF bytecode against use of probably dangerous code
-            Set<String> errors = udfByteCodeVerifier.verify(targetClassLoader.classData(targetClassName));
+            Set<String> errors = udfByteCodeVerifier.verify(targetClassName, targetClassLoader.classData(targetClassName));
             String validDeclare = "not allowed method declared: " + executeInternalName + '(';
-            String validCall = "call to " + targetClassName.replace('.', '/') + '.' + executeInternalName + "()";
             for (Iterator<String> i = errors.iterator(); i.hasNext();)
             {
                 String error = i.next();
                 // we generate a random name of the private, internal execute method, which is detected by the byte-code verifier
-                if (error.startsWith(validDeclare) || error.equals(validCall))
-                {
+                if (error.startsWith(validDeclare))
                     i.remove();
-                }
             }
             if (!errors.isEmpty())
                 throw new InvalidRequestException("Java UDF validation failed: " + errors);
@@ -327,9 +329,9 @@
                 if (nonSyntheticMethodCount != 2 || cls.getDeclaredConstructors().length != 1)
                     throw new InvalidRequestException("Check your source to not define additional Java methods or constructors");
                 MethodType methodType = MethodType.methodType(void.class)
-                                                  .appendParameterTypes(TypeCodec.class, TypeCodec[].class);
+                                                  .appendParameterTypes(TypeCodec.class, TypeCodec[].class, UDFContext.class);
                 MethodHandle ctor = MethodHandles.lookup().findConstructor(cls, methodType);
-                this.javaUDF = (JavaUDF) ctor.invokeWithArguments(returnCodec, argCodecs);
+                this.javaUDF = (JavaUDF) ctor.invokeWithArguments(returnCodec, argCodecs, udfContext);
             }
             finally
             {
@@ -341,12 +343,13 @@
             // in case of an ITE, use the cause
             throw new InvalidRequestException(String.format("Could not compile function '%s' from Java source: %s", name, e.getCause()));
         }
-        catch (VirtualMachineError e)
+        catch (InvalidRequestException | VirtualMachineError e)
         {
             throw e;
         }
         catch (Throwable e)
         {
+            logger.error(String.format("Could not compile function '%s' from Java source:%n%s", name, javaSource), e);
             throw new InvalidRequestException(String.format("Could not compile function '%s' from Java source: %s", name, e));
         }
     }
@@ -392,13 +395,14 @@
         return sb.toString();
     }
 
-    private static String javaSourceName(Class<?> type)
+    @VisibleForTesting
+    public static String javaSourceName(TypeToken<?> type)
     {
-        String n = type.getName();
-        return n.startsWith("java.lang.") ? type.getSimpleName() : n;
+        String n = type.toString();
+        return JAVA_LANG_PREFIX.matcher(n).replaceAll("");
     }
 
-    private static String generateArgumentList(Class<?>[] paramTypes, List<ColumnIdentifier> argNames)
+    private static String generateArgumentList(TypeToken<?>[] paramTypes, List<ColumnIdentifier> argNames)
     {
         // initial builder size can just be a guess (prevent temp object allocations)
         StringBuilder code = new StringBuilder(32 * paramTypes.length);
@@ -413,7 +417,7 @@
         return code.toString();
     }
 
-    private static String generateArguments(Class<?>[] paramTypes, List<ColumnIdentifier> argNames)
+    private static String generateArguments(TypeToken<?>[] paramTypes, List<ColumnIdentifier> argNames)
     {
         StringBuilder code = new StringBuilder(64 * paramTypes.length);
         for (int i = 0; i < paramTypes.length; i++)
@@ -433,9 +437,9 @@
         return code.toString();
     }
 
-    private static String composeMethod(Class<?> type)
+    private static String composeMethod(TypeToken<?> type)
     {
-        return (type.isPrimitive()) ? ("super.compose_" + type.getName()) : "super.compose";
+        return (type.isPrimitive()) ? ("super.compose_" + type.getRawType().getName()) : "super.compose";
     }
 
     // Java source UDFs are a very simple compilation task, which allows us to let one class implement
diff --git a/src/java/org/apache/cassandra/cql3/functions/JavaUDF.java b/src/java/org/apache/cassandra/cql3/functions/JavaUDF.java
index fcfd21c..7410f1f 100644
--- a/src/java/org/apache/cassandra/cql3/functions/JavaUDF.java
+++ b/src/java/org/apache/cassandra/cql3/functions/JavaUDF.java
@@ -34,10 +34,13 @@
     private final TypeCodec<Object> returnCodec;
     private final TypeCodec<Object>[] argCodecs;
 
-    protected JavaUDF(TypeCodec<Object> returnCodec, TypeCodec<Object>[] argCodecs)
+    protected final UDFContext udfContext;
+
+    protected JavaUDF(TypeCodec<Object> returnCodec, TypeCodec<Object>[] argCodecs, UDFContext udfContext)
     {
         this.returnCodec = returnCodec;
         this.argCodecs = argCodecs;
+        this.udfContext = udfContext;
     }
 
     protected abstract ByteBuffer executeImpl(int protocolVersion, List<ByteBuffer> params);
diff --git a/src/java/org/apache/cassandra/cql3/functions/ScriptBasedUDFunction.java b/src/java/org/apache/cassandra/cql3/functions/ScriptBasedUDFunction.java
index 8743a20..b524163 100644
--- a/src/java/org/apache/cassandra/cql3/functions/ScriptBasedUDFunction.java
+++ b/src/java/org/apache/cassandra/cql3/functions/ScriptBasedUDFunction.java
@@ -27,6 +27,7 @@
 import java.util.concurrent.ExecutorService;
 import javax.script.*;
 
+import jdk.nashorn.api.scripting.AbstractJSObject;
 import org.apache.cassandra.concurrent.NamedThreadFactory;
 import org.apache.cassandra.cql3.ColumnIdentifier;
 import org.apache.cassandra.db.marshal.AbstractType;
@@ -127,6 +128,7 @@
     }
 
     private final CompiledScript script;
+    private final Object udfContextBinding;
 
     ScriptBasedUDFunction(FunctionName name,
                           List<ColumnIdentifier> argNames,
@@ -155,6 +157,13 @@
             throw new InvalidRequestException(
                                              String.format("Failed to compile function '%s' for language %s: %s", name, language, e));
         }
+
+        // It's not always possible to simply pass a plain Java object as a binding to Nashorn and
+        // let the script execute methods on it.
+        udfContextBinding =
+            ("Oracle Nashorn".equals(((ScriptEngine) scriptEngine).getFactory().getEngineName()))
+                ? new UDFContextWrapper()
+                : udfContext;
     }
 
     protected ExecutorService executor()
@@ -173,6 +182,7 @@
         Bindings bindings = scriptContext.getBindings(ScriptContext.ENGINE_SCOPE);
         for (int i = 0; i < params.length; i++)
             bindings.put(argNames.get(i).toString(), params[i]);
+        bindings.put("udfContext", udfContextBinding);
 
         Object result;
         try
@@ -243,4 +253,68 @@
 
         return decompose(protocolVersion, result);
     }
+
+    private final class UDFContextWrapper extends AbstractJSObject
+    {
+        private final AbstractJSObject fRetUDT;
+        private final AbstractJSObject fArgUDT;
+        private final AbstractJSObject fRetTup;
+        private final AbstractJSObject fArgTup;
+
+        UDFContextWrapper()
+        {
+            fRetUDT = new AbstractJSObject()
+            {
+                public Object call(Object thiz, Object... args)
+                {
+                    return udfContext.newReturnUDTValue();
+                }
+            };
+            fArgUDT = new AbstractJSObject()
+            {
+                public Object call(Object thiz, Object... args)
+                {
+                    if (args[0] instanceof String)
+                        return udfContext.newArgUDTValue((String) args[0]);
+                    if (args[0] instanceof Number)
+                        return udfContext.newArgUDTValue(((Number) args[0]).intValue());
+                    return super.call(thiz, args);
+                }
+            };
+            fRetTup = new AbstractJSObject()
+            {
+                public Object call(Object thiz, Object... args)
+                {
+                    return udfContext.newReturnTupleValue();
+                }
+            };
+            fArgTup = new AbstractJSObject()
+            {
+                public Object call(Object thiz, Object... args)
+                {
+                    if (args[0] instanceof String)
+                        return udfContext.newArgTupleValue((String) args[0]);
+                    if (args[0] instanceof Number)
+                        return udfContext.newArgTupleValue(((Number) args[0]).intValue());
+                    return super.call(thiz, args);
+                }
+            };
+        }
+
+        public Object getMember(String name)
+        {
+            switch(name)
+            {
+                case "newReturnUDTValue":
+                    return fRetUDT;
+                case "newArgUDTValue":
+                    return fArgUDT;
+                case "newReturnTupleValue":
+                    return fRetTup;
+                case "newArgTupleValue":
+                    return fArgTup;
+            }
+            return super.getMember(name);
+        }
+    }
 }
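A sketch with hypothetical names of what the UDFContextWrapper above exposes to Nashorn-based UDFs through the udfContext binding (requires scripted UDFs to be enabled, and assumes a UDT ks.address with text fields street and city):

    CREATE FUNCTION ks.js_make_address(street text, city text)
        RETURNS NULL ON NULL INPUT
        RETURNS address
        LANGUAGE javascript
        AS $$
            var v = udfContext.newReturnUDTValue();
            v.setString("street", street);
            v.setString("city", city);
            v;
        $$;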
diff --git a/src/java/org/apache/cassandra/cql3/functions/TimeFcts.java b/src/java/org/apache/cassandra/cql3/functions/TimeFcts.java
index 93d6d3b..623feba 100644
--- a/src/java/org/apache/cassandra/cql3/functions/TimeFcts.java
+++ b/src/java/org/apache/cassandra/cql3/functions/TimeFcts.java
@@ -67,7 +67,7 @@
             if (bb == null)
                 return null;
 
-            return ByteBuffer.wrap(UUIDGen.decompose(UUIDGen.minTimeUUID(TimestampType.instance.compose(bb).getTime())));
+            return UUIDGen.toByteBuffer(UUIDGen.minTimeUUID(TimestampType.instance.compose(bb).getTime()));
         }
     };
 
@@ -79,7 +79,7 @@
             if (bb == null)
                 return null;
 
-            return ByteBuffer.wrap(UUIDGen.decompose(UUIDGen.maxTimeUUID(TimestampType.instance.compose(bb).getTime())));
+            return UUIDGen.toByteBuffer(UUIDGen.maxTimeUUID(TimestampType.instance.compose(bb).getTime()));
         }
     };
 
@@ -87,7 +87,7 @@
      * Function that convert a value of <code>TIMEUUID</code> into a value of type <code>TIMESTAMP</code>.
      * @deprecated Replaced by the {@link #timeUuidToTimestamp} function
      */
-    public static final Function dateOfFct = new NativeScalarFunction("dateof", TimestampType.instance, TimeUUIDType.instance)
+    public static final NativeScalarFunction dateOfFct = new NativeScalarFunction("dateof", TimestampType.instance, TimeUUIDType.instance)
     {
         private volatile boolean hasLoggedDeprecationWarning;
 
@@ -113,7 +113,7 @@
      * Function that convert a value of type <code>TIMEUUID</code> into an UNIX timestamp.
      * @deprecated Replaced by the {@link #timeUuidToUnixTimestamp} function
      */
-    public static final Function unixTimestampOfFct = new NativeScalarFunction("unixtimestampof", LongType.instance, TimeUUIDType.instance)
+    public static final NativeScalarFunction unixTimestampOfFct = new NativeScalarFunction("unixtimestampof", LongType.instance, TimeUUIDType.instance)
     {
         private volatile boolean hasLoggedDeprecationWarning;
 
@@ -137,7 +137,7 @@
     /**
      * Function that convert a value of <code>TIMEUUID</code> into a value of type <code>DATE</code>.
      */
-    public static final Function timeUuidtoDate = new NativeScalarFunction("todate", SimpleDateType.instance, TimeUUIDType.instance)
+    public static final NativeScalarFunction timeUuidtoDate = new NativeScalarFunction("todate", SimpleDateType.instance, TimeUUIDType.instance)
     {
         public ByteBuffer execute(int protocolVersion, List<ByteBuffer> parameters)
         {
@@ -153,7 +153,7 @@
     /**
      * Function that convert a value of type <code>TIMEUUID</code> into a value of type <code>TIMESTAMP</code>.
      */
-    public static final Function timeUuidToTimestamp = new NativeScalarFunction("totimestamp", TimestampType.instance, TimeUUIDType.instance)
+    public static final NativeScalarFunction timeUuidToTimestamp = new NativeScalarFunction("totimestamp", TimestampType.instance, TimeUUIDType.instance)
     {
         public ByteBuffer execute(int protocolVersion, List<ByteBuffer> parameters)
         {
@@ -169,7 +169,7 @@
     /**
      * Function that convert a value of type <code>TIMEUUID</code> into an UNIX timestamp.
      */
-    public static final Function timeUuidToUnixTimestamp = new NativeScalarFunction("tounixtimestamp", LongType.instance, TimeUUIDType.instance)
+    public static final NativeScalarFunction timeUuidToUnixTimestamp = new NativeScalarFunction("tounixtimestamp", LongType.instance, TimeUUIDType.instance)
     {
         public ByteBuffer execute(int protocolVersion, List<ByteBuffer> parameters)
         {
@@ -184,7 +184,7 @@
     /**
      * Function that convert a value of type <code>TIMESTAMP</code> into an UNIX timestamp.
      */
-    public static final Function timestampToUnixTimestamp = new NativeScalarFunction("tounixtimestamp", LongType.instance, TimestampType.instance)
+    public static final NativeScalarFunction timestampToUnixTimestamp = new NativeScalarFunction("tounixtimestamp", LongType.instance, TimestampType.instance)
     {
         public ByteBuffer execute(int protocolVersion, List<ByteBuffer> parameters)
         {
@@ -200,7 +200,7 @@
    /**
     * Function that convert a value of type <code>TIMESTAMP</code> into a <code>DATE</code>.
     */
-   public static final Function timestampToDate = new NativeScalarFunction("todate", SimpleDateType.instance, TimestampType.instance)
+   public static final NativeScalarFunction timestampToDate = new NativeScalarFunction("todate", SimpleDateType.instance, TimestampType.instance)
    {
        public ByteBuffer execute(int protocolVersion, List<ByteBuffer> parameters)
        {
@@ -216,7 +216,7 @@
    /**
     * Function that convert a value of type <code>TIMESTAMP</code> into a <code>DATE</code>.
     */
-   public static final Function dateToTimestamp = new NativeScalarFunction("totimestamp", TimestampType.instance, SimpleDateType.instance)
+   public static final NativeScalarFunction dateToTimestamp = new NativeScalarFunction("totimestamp", TimestampType.instance, SimpleDateType.instance)
    {
        public ByteBuffer execute(int protocolVersion, List<ByteBuffer> parameters)
        {
@@ -232,7 +232,7 @@
    /**
     * Function that convert a value of type <code>DATE</code> into an UNIX timestamp.
     */
-   public static final Function dateToUnixTimestamp = new NativeScalarFunction("tounixtimestamp", LongType.instance, SimpleDateType.instance)
+   public static final NativeScalarFunction dateToUnixTimestamp = new NativeScalarFunction("tounixtimestamp", LongType.instance, SimpleDateType.instance)
    {
        public ByteBuffer execute(int protocolVersion, List<ByteBuffer> parameters)
        {
diff --git a/src/java/org/apache/cassandra/cql3/functions/UDAggregate.java b/src/java/org/apache/cassandra/cql3/functions/UDAggregate.java
index 96e19de..52b8163 100644
--- a/src/java/org/apache/cassandra/cql3/functions/UDAggregate.java
+++ b/src/java/org/apache/cassandra/cql3/functions/UDAggregate.java
@@ -225,7 +225,7 @@
             && Functions.typesMatch(returnType, that.returnType)
             && Objects.equal(stateFunction, that.stateFunction)
             && Objects.equal(finalFunction, that.finalFunction)
-            && Objects.equal(stateType, that.stateType)
+            && ((stateType == that.stateType) || ((stateType != null) && stateType.equals(that.stateType, true)))  // ignore freezing
             && Objects.equal(initcond, that.initcond);
     }
 
diff --git a/src/java/org/apache/cassandra/cql3/functions/UDFByteCodeVerifier.java b/src/java/org/apache/cassandra/cql3/functions/UDFByteCodeVerifier.java
index 1314af3..cfaa70f 100644
--- a/src/java/org/apache/cassandra/cql3/functions/UDFByteCodeVerifier.java
+++ b/src/java/org/apache/cassandra/cql3/functions/UDFByteCodeVerifier.java
@@ -47,7 +47,7 @@
 
     public static final String JAVA_UDF_NAME = JavaUDF.class.getName().replace('.', '/');
     public static final String OBJECT_NAME = Object.class.getName().replace('.', '/');
-    public static final String CTOR_SIG = "(Lcom/datastax/driver/core/TypeCodec;[Lcom/datastax/driver/core/TypeCodec;)V";
+    public static final String CTOR_SIG = "(Lcom/datastax/driver/core/TypeCodec;[Lcom/datastax/driver/core/TypeCodec;Lorg/apache/cassandra/cql3/functions/UDFContext;)V";
 
     private final Set<String> disallowedClasses = new HashSet<>();
     private final Multimap<String, String> disallowedMethodCalls = HashMultimap.create();
@@ -80,8 +80,9 @@
         return this;
     }
 
-    public Set<String> verify(byte[] bytes)
+    public Set<String> verify(String clsName, byte[] bytes)
     {
+        String clsNameSl = clsName.replace('.', '/');
         Set<String> errors = new TreeSet<>(); // it's a TreeSet for unit tests
         ClassVisitor classVisitor = new ClassVisitor(Opcodes.ASM5)
         {
@@ -134,7 +135,8 @@
 
             public void visitInnerClass(String name, String outerName, String innerName, int access)
             {
-                errors.add("class declared as inner class");
+                if (clsNameSl.equals(outerName)) // outerName might be null, which is true for anonymous inner classes
+                    errors.add("class declared as inner class");
                 super.visitInnerClass(name, outerName, innerName, access);
             }
         };
diff --git a/src/java/org/apache/cassandra/cql3/functions/UDFContext.java b/src/java/org/apache/cassandra/cql3/functions/UDFContext.java
new file mode 100644
index 0000000..4465aec
--- /dev/null
+++ b/src/java/org/apache/cassandra/cql3/functions/UDFContext.java
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.cql3.functions;
+
+import com.datastax.driver.core.TupleValue;
+import com.datastax.driver.core.UDTValue;
+
+/**
+ * Provides context information for a particular user defined function.
+ * Java UDFs can access implementations of this interface using the
+ * {@code udfContext} field, scripted UDFs can get it using the {@code udfContext}
+ * binding.
+ */
+public interface UDFContext
+{
+    /**
+     * Creates a new {@code UDTValue} instance for an argument.
+     *
+     * @param argName name of the argument as declared in the {@code CREATE FUNCTION} statement
+     * @return a new {@code UDTValue} instance
+     * @throws IllegalArgumentException if no argument for the given name exists
+     * @throws IllegalStateException    if the argument is not a UDT
+     */
+    UDTValue newArgUDTValue(String argName);
+
+    /**
+     * Creates a new {@code UDTValue} instance for an argument.
+     *
+     * @param argNum zero-based index of the argument as declared in the {@code CREATE FUNCTION} statement
+     * @return a new {@code UDTValue} instance
+     * @throws ArrayIndexOutOfBoundsException if no argument for the given index exists
+     * @throws IllegalStateException          if the argument is not a UDT
+     */
+    UDTValue newArgUDTValue(int argNum);
+
+    /**
+     * Creates a new {@code UDTValue} instance for the return value.
+     *
+     * @return a new {@code UDTValue} instance
+     * @throws IllegalStateException          if the return type is not a UDT
+     */
+    UDTValue newReturnUDTValue();
+
+    /**
+     * Creates a new {@code UDTValue} instance by name in the same keyspace.
+     *
+     * @param udtName name of the user defined type in the same keyspace as the function
+     * @return a new {@code UDTValue} instance
+     * @throws IllegalArgumentException if no UDT for the given name exists
+     */
+    UDTValue newUDTValue(String udtName);
+
+    /**
+     * Creates a new {@code TupleValue} instance for an argument.
+     *
+     * @param argName name of the argument as declared in the {@code CREATE FUNCTION} statement
+     * @return a new {@code TupleValue} instance
+     * @throws IllegalArgumentException if no argument for the given name exists
+     * @throws IllegalStateException    if the argument is not a tuple
+     */
+    TupleValue newArgTupleValue(String argName);
+
+    /**
+     * Creates a new {@code TupleValue} instance for an argument.
+     *
+     * @param argNum zero-based index of the argument as declared in the {@code CREATE FUNCTION} statement
+     * @return a new {@code TupleValue} instance
+     * @throws ArrayIndexOutOfBoundsException if no argument for the given index exists
+     * @throws IllegalStateException          if the argument is not a tuple
+     */
+    TupleValue newArgTupleValue(int argNum);
+
+    /**
+     * Creates a new {@code TupleValue} instance for the return value.
+     *
+     * @return a new {@code TupleValue} instance
+     * @throws IllegalStateException          if the return type is not a tuple
+     */
+    TupleValue newReturnTupleValue();
+
+    /**
+     * Creates a new {@code TupleValue} instance for the CQL type definition.
+     *
+     * @param cqlDefinition CQL tuple type definition like {@code tuple<int, text, bigint>}
+     * @return a new {@code TupleValue} instance
+     * @throws IllegalStateException          if cqlDefinition type is not a tuple or an invalid type
+     */
+    TupleValue newTupleValue(String cqlDefinition);
+}
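A hedged sketch of the Java-UDF side described in the interface comment, using the udfContext field (keyspace, type and function names are hypothetical, and assume a UDT ks.address with text fields street and city):

    CREATE FUNCTION ks.make_address(street text, city text)
        RETURNS NULL ON NULL INPUT
        RETURNS address
        LANGUAGE java
        AS $$
            UDTValue v = udfContext.newReturnUDTValue();
            v.setString("street", street);
            v.setString("city", city);
            return v;
        $$;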
diff --git a/src/java/org/apache/cassandra/cql3/functions/UDFContextImpl.java b/src/java/org/apache/cassandra/cql3/functions/UDFContextImpl.java
new file mode 100644
index 0000000..00625cd
--- /dev/null
+++ b/src/java/org/apache/cassandra/cql3/functions/UDFContextImpl.java
@@ -0,0 +1,146 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.cql3.functions;
+
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Optional;
+
+import com.datastax.driver.core.DataType;
+import com.datastax.driver.core.TupleType;
+import com.datastax.driver.core.TupleValue;
+import com.datastax.driver.core.TypeCodec;
+import com.datastax.driver.core.UDTValue;
+import com.datastax.driver.core.UserType;
+import org.apache.cassandra.cql3.ColumnIdentifier;
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.schema.CQLTypeParser;
+import org.apache.cassandra.schema.KeyspaceMetadata;
+import org.apache.cassandra.utils.ByteBufferUtil;
+
+/**
+ * Implementation of {@link UDFContext}.
+ */
+public final class UDFContextImpl implements UDFContext
+{
+    private final KeyspaceMetadata keyspaceMetadata;
+    private final Map<String, TypeCodec<Object>> byName = new HashMap<>();
+    private final TypeCodec<Object>[] argCodecs;
+    private final TypeCodec<Object> returnCodec;
+
+    UDFContextImpl(List<ColumnIdentifier> argNames, TypeCodec<Object>[] argCodecs, TypeCodec<Object> returnCodec,
+                   KeyspaceMetadata keyspaceMetadata)
+    {
+        for (int i = 0; i < argNames.size(); i++)
+            byName.put(argNames.get(i).toString(), argCodecs[i]);
+        this.argCodecs = argCodecs;
+        this.returnCodec = returnCodec;
+        this.keyspaceMetadata = keyspaceMetadata;
+    }
+
+    public UDTValue newArgUDTValue(String argName)
+    {
+        return newUDTValue(codecFor(argName));
+    }
+
+    public UDTValue newArgUDTValue(int argNum)
+    {
+        return newUDTValue(codecFor(argNum));
+    }
+
+    public UDTValue newReturnUDTValue()
+    {
+        return newUDTValue(returnCodec);
+    }
+
+    public UDTValue newUDTValue(String udtName)
+    {
+        Optional<org.apache.cassandra.db.marshal.UserType> udtType = keyspaceMetadata.types.get(ByteBufferUtil.bytes(udtName));
+        DataType dataType = UDHelper.driverType(udtType.orElseThrow(
+                () -> new IllegalArgumentException("No UDT named " + udtName + " in keyspace " + keyspaceMetadata.name)
+            ));
+        return newUDTValue(dataType);
+    }
+
+    public TupleValue newArgTupleValue(String argName)
+    {
+        return newTupleValue(codecFor(argName));
+    }
+
+    public TupleValue newArgTupleValue(int argNum)
+    {
+        return newTupleValue(codecFor(argNum));
+    }
+
+    public TupleValue newReturnTupleValue()
+    {
+        return newTupleValue(returnCodec);
+    }
+
+    public TupleValue newTupleValue(String cqlDefinition)
+    {
+        AbstractType<?> abstractType = CQLTypeParser.parse(keyspaceMetadata.name, cqlDefinition, keyspaceMetadata.types);
+        DataType dataType = UDHelper.driverType(abstractType);
+        return newTupleValue(dataType);
+    }
+
+    private TypeCodec<Object> codecFor(int argNum)
+    {
+        if (argNum < 0 || argNum >= argCodecs.length)
+            throw new IllegalArgumentException("Function does not declare an argument with index " + argNum);
+        return argCodecs[argNum];
+    }
+
+    private TypeCodec<Object> codecFor(String argName)
+    {
+        TypeCodec<Object> codec = byName.get(argName);
+        if (codec == null)
+            throw new IllegalArgumentException("Function does not declare an argument named '" + argName + '\'');
+        return codec;
+    }
+
+    private static UDTValue newUDTValue(TypeCodec<Object> codec)
+    {
+        DataType dataType = codec.getCqlType();
+        return newUDTValue(dataType);
+    }
+
+    private static UDTValue newUDTValue(DataType dataType)
+    {
+        if (!(dataType instanceof UserType))
+            throw new IllegalStateException("Function argument is not a UDT but a " + dataType.getName());
+        UserType userType = (UserType) dataType;
+        return userType.newValue();
+    }
+
+    private static TupleValue newTupleValue(TypeCodec<Object> codec)
+    {
+        DataType dataType = codec.getCqlType();
+        return newTupleValue(dataType);
+    }
+
+    private static TupleValue newTupleValue(DataType dataType)
+    {
+        if (!(dataType instanceof TupleType))
+            throw new IllegalStateException("Function argument is not a tuple type but a " + dataType.getName());
+        TupleType tupleType = (TupleType) dataType;
+        return tupleType.newValue();
+    }
+}
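
For orientation, the UDFContext API added above is what user-defined function bodies call to build UDT and tuple values. A minimal, hypothetical CQL sketch of how it might be used from a Java UDF (the keyspace ks, the UDT address, the function name, and the udfContext field being in scope for the Java body are assumptions for illustration, not part of this patch):

    CREATE FUNCTION ks.make_address(street text, city text)
        CALLED ON NULL INPUT
        RETURNS frozen<address>
        LANGUAGE java
        AS $$
            // hypothetical example: build a value of the function's declared UDT return type via the context
            com.datastax.driver.core.UDTValue v = udfContext.newReturnUDTValue();
            v.setString("street", street);
            v.setString("city", city);
            return v;
        $$;

Named lookups work the same way: udfContext.newUDTValue("address") resolves the type from the function's keyspace and throws IllegalArgumentException if no such UDT exists.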
diff --git a/src/java/org/apache/cassandra/cql3/functions/UDFunction.java b/src/java/org/apache/cassandra/cql3/functions/UDFunction.java
index 7b69342..6e8d187 100644
--- a/src/java/org/apache/cassandra/cql3/functions/UDFunction.java
+++ b/src/java/org/apache/cassandra/cql3/functions/UDFunction.java
@@ -75,6 +75,8 @@
     protected final TypeCodec<Object> returnCodec;
     protected final boolean calledOnNullInput;
 
+    protected final UDFContext udfContext;
+
     //
     // Access to classes is controlled via a whitelist and a blacklist.
     //
@@ -108,6 +110,7 @@
     "java/time/",
     "java/util/",
     "org/apache/cassandra/cql3/functions/JavaUDF.class",
+    "org/apache/cassandra/cql3/functions/UDFContext.class",
     "org/apache/cassandra/exceptions/",
     };
     // Only need to blacklist a pattern, if it would otherwise be allowed via whitelistedPatterns
@@ -206,6 +209,9 @@
         this.argCodecs = UDHelper.codecsFor(argDataTypes);
         this.returnCodec = UDHelper.codecFor(returnDataType);
         this.calledOnNullInput = calledOnNullInput;
+        KeyspaceMetadata keyspaceMetadata = Schema.instance.getKSMetaData(name.keyspace);
+        this.udfContext = new UDFContextImpl(argNames, argCodecs, returnCodec,
+                                             keyspaceMetadata);
     }
 
     public static UDFunction create(FunctionName name,
diff --git a/src/java/org/apache/cassandra/cql3/functions/UDHelper.java b/src/java/org/apache/cassandra/cql3/functions/UDHelper.java
index 45c734f..86cb89d 100644
--- a/src/java/org/apache/cassandra/cql3/functions/UDHelper.java
+++ b/src/java/org/apache/cassandra/cql3/functions/UDHelper.java
@@ -23,6 +23,8 @@
 import java.nio.ByteBuffer;
 import java.util.List;
 
+import com.google.common.reflect.TypeToken;
+
 import com.datastax.driver.core.CodecRegistry;
 import com.datastax.driver.core.DataType;
 import com.datastax.driver.core.ProtocolVersion;
@@ -33,7 +35,7 @@
 import org.apache.cassandra.transport.Server;
 
 /**
- * Helper class for User Defined Functions + Aggregates.
+ * Helper class for User Defined Functions, Types and Aggregates.
  */
 public final class UDHelper
 {
@@ -64,7 +66,7 @@
         return codecs;
     }
 
-    static TypeCodec<Object> codecFor(DataType dataType)
+    public static TypeCodec<Object> codecFor(DataType dataType)
     {
         return codecRegistry.codecFor(dataType);
     }
@@ -76,31 +78,32 @@
      * @param calledOnNullInput whether to allow {@code null} as an argument value
      * @return array of same size with UDF arguments
      */
-    public static Class<?>[] javaTypes(TypeCodec<Object>[] dataTypes, boolean calledOnNullInput)
+    public static TypeToken<?>[] typeTokens(TypeCodec<Object>[] dataTypes, boolean calledOnNullInput)
     {
-        Class<?>[] paramTypes = new Class[dataTypes.length];
+        TypeToken<?>[] paramTypes = new TypeToken[dataTypes.length];
         for (int i = 0; i < paramTypes.length; i++)
         {
-            Class<?> clazz = asJavaClass(dataTypes[i]);
+            TypeToken<?> typeToken = dataTypes[i].getJavaType();
             if (!calledOnNullInput)
             {
                 // only care about classes that can be used in a data type
+                Class<?> clazz = typeToken.getRawType();
                 if (clazz == Integer.class)
-                    clazz = int.class;
+                    typeToken = TypeToken.of(int.class);
                 else if (clazz == Long.class)
-                    clazz = long.class;
+                    typeToken = TypeToken.of(long.class);
                 else if (clazz == Byte.class)
-                    clazz = byte.class;
+                    typeToken = TypeToken.of(byte.class);
                 else if (clazz == Short.class)
-                    clazz = short.class;
+                    typeToken = TypeToken.of(short.class);
                 else if (clazz == Float.class)
-                    clazz = float.class;
+                    typeToken = TypeToken.of(float.class);
                 else if (clazz == Double.class)
-                    clazz = double.class;
+                    typeToken = TypeToken.of(double.class);
                 else if (clazz == Boolean.class)
-                    clazz = boolean.class;
+                    typeToken = TypeToken.of(boolean.class);
             }
-            paramTypes[i] = clazz;
+            paramTypes[i] = typeToken;
         }
         return paramTypes;
     }
@@ -126,9 +129,15 @@
     public static DataType driverType(AbstractType abstractType)
     {
         CQL3Type cqlType = abstractType.asCQL3Type();
+        String abstractTypeDef = cqlType.getType().toString();
+        return driverTypeFromAbstractType(abstractTypeDef);
+    }
+
+    public static DataType driverTypeFromAbstractType(String abstractTypeDef)
+    {
         try
         {
-            return (DataType) methodParseOne.invoke(cqlType.getType().toString(),
+            return (DataType) methodParseOne.invoke(abstractTypeDef,
                                                     ProtocolVersion.fromInt(Server.CURRENT_VERSION),
                                                     codecRegistry);
         }
@@ -139,7 +148,7 @@
         }
         catch (Throwable e)
         {
-            throw new RuntimeException("cannot parse driver type " + cqlType.getType().toString(), e);
+            throw new RuntimeException("cannot parse driver type " + abstractTypeDef, e);
         }
     }
 
diff --git a/src/java/org/apache/cassandra/cql3/restrictions/AbstractPrimaryKeyRestrictions.java b/src/java/org/apache/cassandra/cql3/restrictions/AbstractPrimaryKeyRestrictions.java
deleted file mode 100644
index f1b5a50..0000000
--- a/src/java/org/apache/cassandra/cql3/restrictions/AbstractPrimaryKeyRestrictions.java
+++ /dev/null
@@ -1,61 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.cassandra.cql3.restrictions;
-
-import java.nio.ByteBuffer;
-import java.util.*;
-
-import org.apache.cassandra.cql3.QueryOptions;
-import org.apache.cassandra.cql3.statements.Bound;
-import org.apache.cassandra.db.ClusteringPrefix;
-import org.apache.cassandra.db.ClusteringComparator;
-import org.apache.cassandra.exceptions.InvalidRequestException;
-
-/**
- * Base class for <code>PrimaryKeyRestrictions</code>.
- */
-abstract class AbstractPrimaryKeyRestrictions extends AbstractRestriction implements PrimaryKeyRestrictions
-{
-    /**
-     * The composite type.
-     */
-    protected final ClusteringComparator comparator;
-
-    public AbstractPrimaryKeyRestrictions(ClusteringComparator comparator)
-    {
-        this.comparator = comparator;
-    }
-
-    @Override
-    public List<ByteBuffer> bounds(Bound b, QueryOptions options) throws InvalidRequestException
-    {
-        return values(options);
-    }
-
-    @Override
-    public final boolean isEmpty()
-    {
-        return getColumnDefs().isEmpty();
-    }
-
-    @Override
-    public final int size()
-    {
-        return getColumnDefs().size();
-    }
-}
diff --git a/src/java/org/apache/cassandra/cql3/restrictions/AbstractRestriction.java b/src/java/org/apache/cassandra/cql3/restrictions/AbstractRestriction.java
deleted file mode 100644
index df04331..0000000
--- a/src/java/org/apache/cassandra/cql3/restrictions/AbstractRestriction.java
+++ /dev/null
@@ -1,102 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.cassandra.cql3.restrictions;
-
-import org.apache.cassandra.cql3.QueryOptions;
-import org.apache.cassandra.cql3.statements.Bound;
-import org.apache.cassandra.db.MultiCBuilder;
-
-import org.apache.cassandra.config.ColumnDefinition;
-
-/**
- * Base class for <code>Restriction</code>s
- */
-abstract class AbstractRestriction  implements Restriction
-{
-    @Override
-    public  boolean isOnToken()
-    {
-        return false;
-    }
-
-    @Override
-    public boolean isMultiColumn()
-    {
-        return false;
-    }
-
-    @Override
-    public boolean isSlice()
-    {
-        return false;
-    }
-
-    @Override
-    public boolean isEQ()
-    {
-        return false;
-    }
-
-    @Override
-    public boolean isIN()
-    {
-        return false;
-    }
-
-    @Override
-    public boolean isContains()
-    {
-        return false;
-    }
-
-    @Override
-    public boolean isNotNull()
-    {
-        return false;
-    }
-
-    @Override
-    public boolean hasBound(Bound b)
-    {
-        return true;
-    }
-
-    @Override
-    public MultiCBuilder appendBoundTo(MultiCBuilder builder, Bound bound, QueryOptions options)
-    {
-        return appendTo(builder, options);
-    }
-
-    @Override
-    public boolean isInclusive(Bound b)
-    {
-        return true;
-    }
-
-    /**
-     * Reverses the specified bound if the column type is a reversed one.
-     *
-     * @param columnDefinition the column definition
-     * @param bound the bound
-     * @return the bound reversed if the column type was a reversed one or the original bound
-     */
-    protected static Bound reverseBoundIfNeeded(ColumnDefinition columnDefinition, Bound bound)
-    {
-        return columnDefinition.isReversedType() ? bound.reverse() : bound;
-    }
-}
diff --git a/src/java/org/apache/cassandra/cql3/restrictions/ClusteringColumnRestrictions.java b/src/java/org/apache/cassandra/cql3/restrictions/ClusteringColumnRestrictions.java
new file mode 100644
index 0000000..dc349d9
--- /dev/null
+++ b/src/java/org/apache/cassandra/cql3/restrictions/ClusteringColumnRestrictions.java
@@ -0,0 +1,215 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.cql3.restrictions;
+
+import java.util.*;
+
+import org.apache.cassandra.config.CFMetaData;
+import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.cql3.QueryOptions;
+import org.apache.cassandra.cql3.statements.Bound;
+import org.apache.cassandra.db.*;
+import org.apache.cassandra.db.filter.RowFilter;
+import org.apache.cassandra.exceptions.InvalidRequestException;
+import org.apache.cassandra.index.SecondaryIndexManager;
+import org.apache.cassandra.utils.btree.BTreeSet;
+
+import static org.apache.cassandra.cql3.statements.RequestValidations.checkFalse;
+import static org.apache.cassandra.cql3.statements.RequestValidations.invalidRequest;
+
+/**
+ * A set of restrictions on the clustering key.
+ */
+final class ClusteringColumnRestrictions extends RestrictionSetWrapper
+{
+    /**
+     * The clustering comparator.
+     */
+    protected final ClusteringComparator comparator;
+
+    /**
+     * <code>true</code> if filtering is allowed for this restriction, <code>false</code> otherwise
+     */
+    private final boolean allowFiltering;
+
+    public ClusteringColumnRestrictions(CFMetaData cfm)
+    {
+        this(cfm, false);
+    }
+
+    public ClusteringColumnRestrictions(CFMetaData cfm, boolean allowFiltering)
+    {
+        this(cfm.comparator, new RestrictionSet(), allowFiltering);
+    }
+
+    private ClusteringColumnRestrictions(ClusteringComparator comparator,
+                                         RestrictionSet restrictionSet,
+                                         boolean allowFiltering)
+    {
+        super(restrictionSet);
+        this.comparator = comparator;
+        this.allowFiltering = allowFiltering;
+    }
+
+    public ClusteringColumnRestrictions mergeWith(Restriction restriction) throws InvalidRequestException
+    {
+        SingleRestriction newRestriction = (SingleRestriction) restriction;
+        RestrictionSet newRestrictionSet = restrictions.addRestriction(newRestriction);
+
+        if (!isEmpty() && !allowFiltering)
+        {
+            SingleRestriction lastRestriction = restrictions.lastRestriction();
+            ColumnDefinition lastRestrictionStart = lastRestriction.getFirstColumn();
+            ColumnDefinition newRestrictionStart = restriction.getFirstColumn();
+
+            checkFalse(lastRestriction.isSlice() && newRestrictionStart.position() > lastRestrictionStart.position(),
+                       "Clustering column \"%s\" cannot be restricted (preceding column \"%s\" is restricted by a non-EQ relation)",
+                       newRestrictionStart.name,
+                       lastRestrictionStart.name);
+
+            if (newRestrictionStart.position() < lastRestrictionStart.position() && newRestriction.isSlice())
+                throw invalidRequest("PRIMARY KEY column \"%s\" cannot be restricted (preceding column \"%s\" is restricted by a non-EQ relation)",
+                                     restrictions.nextColumn(newRestrictionStart).name,
+                                     newRestrictionStart.name);
+        }
+
+        return new ClusteringColumnRestrictions(this.comparator, newRestrictionSet, allowFiltering);
+    }
+
+    private boolean hasMultiColumnSlice()
+    {
+        for (SingleRestriction restriction : restrictions)
+        {
+            if (restriction.isMultiColumn() && restriction.isSlice())
+                return true;
+        }
+        return false;
+    }
+
+    public NavigableSet<Clustering> valuesAsClustering(QueryOptions options) throws InvalidRequestException
+    {
+        MultiCBuilder builder = MultiCBuilder.create(comparator, hasIN());
+        for (SingleRestriction r : restrictions)
+        {
+            r.appendTo(builder, options);
+            if (builder.hasMissingElements())
+                break;
+        }
+        return builder.build();
+    }
+
+    public NavigableSet<ClusteringBound> boundsAsClustering(Bound bound, QueryOptions options) throws InvalidRequestException
+    {
+        MultiCBuilder builder = MultiCBuilder.create(comparator, hasIN() || hasMultiColumnSlice());
+        int keyPosition = 0;
+
+        for (SingleRestriction r : restrictions)
+        {
+            if (handleInFilter(r, keyPosition))
+                break;
+
+            if (r.isSlice())
+            {
+                r.appendBoundTo(builder, bound, options);
+                return builder.buildBoundForSlice(bound.isStart(),
+                                                  r.isInclusive(bound),
+                                                  r.isInclusive(bound.reverse()),
+                                                  r.getColumnDefs());
+            }
+
+            r.appendBoundTo(builder, bound, options);
+
+            if (builder.hasMissingElements())
+                return BTreeSet.empty(comparator);
+
+            keyPosition = r.getLastColumn().position() + 1;
+        }
+
+        // Everything was an equal (or there was nothing)
+        return builder.buildBound(bound.isStart(), true);
+    }
+
+    /**
+     * Checks if any of the underlying restrictions is a CONTAINS or CONTAINS KEY restriction.
+     *
+     * @return <code>true</code> if any of the underlying restrictions is a CONTAINS or CONTAINS KEY restriction,
+     * <code>false</code> otherwise
+     */
+    public final boolean hasContains()
+    {
+        return restrictions.stream().anyMatch(SingleRestriction::isContains);
+    }
+
+    /**
+     * Checks if any of the underlying restrictions is a slice restriction.
+     *
+     * @return <code>true</code> if any of the underlying restrictions is a slice restriction,
+     * <code>false</code> otherwise
+     */
+    public final boolean hasSlice()
+    {
+        return restrictions.stream().anyMatch(SingleRestriction::isSlice);
+    }
+
+    /**
+     * Checks if the underlying restrictions would require filtering.
+     *
+     * @return <code>true</code> if any underlying restrictions require filtering, <code>false</code>
+     * otherwise
+     */
+    public final boolean needFiltering()
+    {
+        int position = 0;
+
+        for (SingleRestriction restriction : restrictions)
+        {
+            if (handleInFilter(restriction, position))
+                return true;
+
+            if (!restriction.isSlice())
+                position = restriction.getLastColumn().position() + 1;
+        }
+        return hasContains();
+    }
+
+    @Override
+    public void addRowFilterTo(RowFilter filter,
+                               SecondaryIndexManager indexManager,
+                               QueryOptions options) throws InvalidRequestException
+    {
+        int position = 0;
+
+        for (SingleRestriction restriction : restrictions)
+        {
+            // We ignore all the clustering columns that can be handled by slices.
+            if (handleInFilter(restriction, position) || restriction.hasSupportingIndex(indexManager))
+            {
+                restriction.addRowFilterTo(filter, indexManager, options);
+                continue;
+            }
+
+            if (!restriction.isSlice())
+                position = restriction.getLastColumn().position() + 1;
+        }
+    }
+
+    private boolean handleInFilter(SingleRestriction restriction, int index) {
+        return restriction.isContains() || restriction.isLIKE() || index != restriction.getFirstColumn().position();
+    }
+
+}
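
To make the ordering rule enforced in mergeWith above concrete, a hedged CQL illustration (table and values are hypothetical): once a clustering column is restricted by a slice, later clustering columns can only be restricted when filtering is allowed.

    CREATE TABLE ks.events (pk int, c1 int, c2 int, v int, PRIMARY KEY (pk, c1, c2));

    -- accepted: c1 is restricted by EQ before the slice on c2
    SELECT * FROM ks.events WHERE pk = 0 AND c1 = 1 AND c2 > 2;

    -- rejected unless ALLOW FILTERING is specified: the preceding column c1 is restricted by a non-EQ relation
    SELECT * FROM ks.events WHERE pk = 0 AND c1 > 1 AND c2 = 2;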
diff --git a/src/java/org/apache/cassandra/cql3/restrictions/ForwardingPrimaryKeyRestrictions.java b/src/java/org/apache/cassandra/cql3/restrictions/ForwardingPrimaryKeyRestrictions.java
deleted file mode 100644
index 71305b9..0000000
--- a/src/java/org/apache/cassandra/cql3/restrictions/ForwardingPrimaryKeyRestrictions.java
+++ /dev/null
@@ -1,191 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.cassandra.cql3.restrictions;
-
-import java.nio.ByteBuffer;
-import java.util.List;
-import java.util.NavigableSet;
-
-import org.apache.cassandra.config.ColumnDefinition;
-import org.apache.cassandra.cql3.QueryOptions;
-import org.apache.cassandra.cql3.functions.Function;
-import org.apache.cassandra.cql3.statements.Bound;
-import org.apache.cassandra.db.Clustering;
-import org.apache.cassandra.db.MultiCBuilder;
-import org.apache.cassandra.db.Slice;
-import org.apache.cassandra.db.filter.RowFilter;
-import org.apache.cassandra.exceptions.InvalidRequestException;
-import org.apache.cassandra.index.SecondaryIndexManager;
-
-/**
- * A <code>PrimaryKeyRestrictions</code> which forwards all its method calls to another 
- * <code>PrimaryKeyRestrictions</code>. Subclasses should override one or more methods to modify the behavior 
- * of the backing <code>PrimaryKeyRestrictions</code> as desired per the decorator pattern. 
- */
-abstract class ForwardingPrimaryKeyRestrictions implements PrimaryKeyRestrictions
-{
-    /**
-     * Returns the backing delegate instance that methods are forwarded to.
-     * @return the backing delegate instance that methods are forwarded to.
-     */
-    protected abstract PrimaryKeyRestrictions getDelegate();
-
-    @Override
-    public void addFunctionsTo(List<Function> functions)
-    {
-        getDelegate().addFunctionsTo(functions);
-    }
-
-    @Override
-    public List<ColumnDefinition> getColumnDefs()
-    {
-        return getDelegate().getColumnDefs();
-    }
-
-    @Override
-    public ColumnDefinition getFirstColumn()
-    {
-        return getDelegate().getFirstColumn();
-    }
-
-    @Override
-    public ColumnDefinition getLastColumn()
-    {
-        return getDelegate().getLastColumn();
-    }
-
-    @Override
-    public PrimaryKeyRestrictions mergeWith(Restriction restriction) throws InvalidRequestException
-    {
-        return getDelegate().mergeWith(restriction);
-    }
-
-    @Override
-    public boolean hasSupportingIndex(SecondaryIndexManager secondaryIndexManager)
-    {
-        return getDelegate().hasSupportingIndex(secondaryIndexManager);
-    }
-
-    @Override
-    public List<ByteBuffer> values(QueryOptions options) throws InvalidRequestException
-    {
-        return getDelegate().values(options);
-    }
-
-    @Override
-    public MultiCBuilder appendTo(MultiCBuilder builder, QueryOptions options)
-    {
-        return getDelegate().appendTo(builder, options);
-    }
-
-    @Override
-    public NavigableSet<Clustering> valuesAsClustering(QueryOptions options) throws InvalidRequestException
-    {
-        return getDelegate().valuesAsClustering(options);
-    }
-
-    @Override
-    public List<ByteBuffer> bounds(Bound bound, QueryOptions options) throws InvalidRequestException
-    {
-        return getDelegate().bounds(bound, options);
-    }
-
-    @Override
-    public NavigableSet<Slice.Bound> boundsAsClustering(Bound bound, QueryOptions options) throws InvalidRequestException
-    {
-        return getDelegate().boundsAsClustering(bound, options);
-    }
-
-    @Override
-    public MultiCBuilder appendBoundTo(MultiCBuilder builder, Bound bound, QueryOptions options)
-    {
-        return getDelegate().appendBoundTo(builder, bound, options);
-    }
-
-    @Override
-    public boolean isInclusive(Bound bound)
-    {
-        return getDelegate().isInclusive(bound.reverse());
-    }
-
-    @Override
-    public boolean isEmpty()
-    {
-        return getDelegate().isEmpty();
-    }
-
-    @Override
-    public int size()
-    {
-        return getDelegate().size();
-    }
-
-    @Override
-    public boolean isOnToken()
-    {
-        return getDelegate().isOnToken();
-    }
-
-    @Override
-    public boolean isSlice()
-    {
-        return getDelegate().isSlice();
-    }
-
-    @Override
-    public boolean isEQ()
-    {
-        return getDelegate().isEQ();
-    }
-
-    @Override
-    public boolean isIN()
-    {
-        return getDelegate().isIN();
-    }
-
-    @Override
-    public boolean isContains()
-    {
-        return getDelegate().isContains();
-    }
-
-    @Override
-    public boolean isNotNull()
-    {
-        return getDelegate().isNotNull();
-    }
-
-    @Override
-    public boolean isMultiColumn()
-    {
-        return getDelegate().isMultiColumn();
-    }
-
-    @Override
-    public boolean hasBound(Bound b)
-    {
-        return getDelegate().hasBound(b);
-    }
-
-    @Override
-    public void addRowFilterTo(RowFilter filter, SecondaryIndexManager indexManager, QueryOptions options) throws InvalidRequestException
-    {
-        getDelegate().addRowFilterTo(filter, indexManager, options);
-    }
-}
diff --git a/src/java/org/apache/cassandra/cql3/restrictions/MultiColumnRestriction.java b/src/java/org/apache/cassandra/cql3/restrictions/MultiColumnRestriction.java
index 9d33bb1..e5e3bc8 100644
--- a/src/java/org/apache/cassandra/cql3/restrictions/MultiColumnRestriction.java
+++ b/src/java/org/apache/cassandra/cql3/restrictions/MultiColumnRestriction.java
@@ -27,7 +27,6 @@
 import org.apache.cassandra.cql3.statements.Bound;
 import org.apache.cassandra.db.MultiCBuilder;
 import org.apache.cassandra.db.filter.RowFilter;
-import org.apache.cassandra.exceptions.InvalidRequestException;
 import org.apache.cassandra.index.Index;
 import org.apache.cassandra.index.SecondaryIndexManager;
 
@@ -36,7 +35,7 @@
 import static org.apache.cassandra.cql3.statements.RequestValidations.checkTrue;
 import static org.apache.cassandra.cql3.statements.RequestValidations.invalidRequest;
 
-public abstract class MultiColumnRestriction extends AbstractRestriction
+public abstract class MultiColumnRestriction implements SingleRestriction
 {
     /**
      * The columns to which the restriction apply.
@@ -73,7 +72,7 @@
     }
 
     @Override
-    public final Restriction mergeWith(Restriction otherRestriction) throws InvalidRequestException
+    public final SingleRestriction mergeWith(SingleRestriction otherRestriction)
     {
         // We want to allow query like: (b,c) > (?, ?) AND b < ?
         if (!otherRestriction.isMultiColumn()
@@ -85,7 +84,7 @@
         return doMergeWith(otherRestriction);
     }
 
-    protected abstract Restriction doMergeWith(Restriction otherRestriction) throws InvalidRequestException;
+    protected abstract SingleRestriction doMergeWith(SingleRestriction otherRestriction);
 
     /**
      * Returns the names of the columns that are specified within this <code>Restrictions</code> and the other one
@@ -150,7 +149,7 @@
         }
 
         @Override
-        public Restriction doMergeWith(Restriction otherRestriction) throws InvalidRequestException
+        public SingleRestriction doMergeWith(SingleRestriction otherRestriction)
         {
             throw invalidRequest("%s cannot be restricted by more than one relation if it includes an Equal",
                                  getColumnsInCommons(otherRestriction));
@@ -179,7 +178,7 @@
         }
 
         @Override
-        public final void addRowFilterTo(RowFilter filter, SecondaryIndexManager indexMananger, QueryOptions options) throws InvalidRequestException
+        public final void addRowFilterTo(RowFilter filter, SecondaryIndexManager indexManager, QueryOptions options)
         {
             Tuples.Value t = ((Tuples.Value) value.bind(options));
             List<ByteBuffer> values = t.getElements();
@@ -220,7 +219,7 @@
         }
 
         @Override
-        public Restriction doMergeWith(Restriction otherRestriction) throws InvalidRequestException
+        public SingleRestriction doMergeWith(SingleRestriction otherRestriction)
         {
             throw invalidRequest("%s cannot be restricted by more than one relation if it includes a IN",
                                  getColumnsInCommons(otherRestriction));
@@ -238,7 +237,7 @@
         @Override
         public final void addRowFilterTo(RowFilter filter,
                                          SecondaryIndexManager indexManager,
-                                         QueryOptions options) throws InvalidRequestException
+                                         QueryOptions options)
         {
             List<List<ByteBuffer>> splitInValues = splitValues(options);
             checkTrue(splitInValues.size() == 1, "IN restrictions are not supported on indexed columns");
@@ -251,7 +250,7 @@
             }
         }
 
-        protected abstract List<List<ByteBuffer>> splitValues(QueryOptions options) throws InvalidRequestException;
+        protected abstract List<List<ByteBuffer>> splitValues(QueryOptions options);
     }
 
     /**
@@ -281,7 +280,7 @@
         }
 
         @Override
-        protected List<List<ByteBuffer>> splitValues(QueryOptions options) throws InvalidRequestException
+        protected List<List<ByteBuffer>> splitValues(QueryOptions options)
         {
             List<List<ByteBuffer>> buffers = new ArrayList<>(values.size());
             for (Term value : values)
@@ -319,7 +318,7 @@
         }
 
         @Override
-        protected List<List<ByteBuffer>> splitValues(QueryOptions options) throws InvalidRequestException
+        protected List<List<ByteBuffer>> splitValues(QueryOptions options)
         {
             Tuples.InMarker inMarker = (Tuples.InMarker) marker;
             Tuples.InValue inValue = inMarker.bind(options);
@@ -370,7 +369,7 @@
             for (int i = 0, m = columnDefs.size(); i < m; i++)
             {
                 ColumnDefinition column = columnDefs.get(i);
-                Bound b = reverseBoundIfNeeded(column, bound);
+                Bound b = bound.reverseIfNeeded(column);
 
                 // For mixed order columns, we need to create additional slices when 2 columns are in reverse order
                 if (reversed != column.isReversedType())
@@ -444,7 +443,7 @@
         }
 
         @Override
-        public Restriction doMergeWith(Restriction otherRestriction) throws InvalidRequestException
+        public SingleRestriction doMergeWith(SingleRestriction otherRestriction)
         {
             checkTrue(otherRestriction.isSlice(),
                       "Column \"%s\" cannot be restricted by both an equality and an inequality relation",
@@ -475,7 +474,7 @@
         @Override
         public final void addRowFilterTo(RowFilter filter,
                                          SecondaryIndexManager indexManager,
-                                         QueryOptions options) throws InvalidRequestException
+                                         QueryOptions options)
         {
             throw invalidRequest("Multi-column slice restrictions cannot be used for filtering.");
         }
@@ -492,9 +491,8 @@
          * @param b the bound type
          * @param options the query options
          * @return one ByteBuffer per-component in the bound
-         * @throws InvalidRequestException if the components cannot be retrieved
          */
-        private List<ByteBuffer> componentBounds(Bound b, QueryOptions options) throws InvalidRequestException
+        private List<ByteBuffer> componentBounds(Bound b, QueryOptions options)
         {
             if (!slice.hasBound(b))
                 return Collections.emptyList();
@@ -541,7 +539,7 @@
         }
 
         @Override
-        public Restriction doMergeWith(Restriction otherRestriction) throws InvalidRequestException
+        public SingleRestriction doMergeWith(SingleRestriction otherRestriction)
         {
             throw invalidRequest("%s cannot be restricted by a relation if it includes an IS NOT NULL clause",
                                  getColumnsInCommons(otherRestriction));
@@ -563,7 +561,7 @@
         }
 
         @Override
-        public final void addRowFilterTo(RowFilter filter, SecondaryIndexManager indexMananger, QueryOptions options) throws InvalidRequestException
+        public final void addRowFilterTo(RowFilter filter, SecondaryIndexManager indexManager, QueryOptions options)
         {
             throw new UnsupportedOperationException("Secondary indexes do not support IS NOT NULL restrictions");
         }
diff --git a/src/java/org/apache/cassandra/cql3/restrictions/PartitionKeyRestrictions.java b/src/java/org/apache/cassandra/cql3/restrictions/PartitionKeyRestrictions.java
new file mode 100644
index 0000000..10efa9f
--- /dev/null
+++ b/src/java/org/apache/cassandra/cql3/restrictions/PartitionKeyRestrictions.java
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.cql3.restrictions;
+
+import java.nio.ByteBuffer;
+import java.util.List;
+
+import org.apache.cassandra.cql3.QueryOptions;
+import org.apache.cassandra.cql3.statements.Bound;
+
+/**
+ * A set of restrictions on the partition key.
+ *
+ */
+interface PartitionKeyRestrictions extends Restrictions
+{
+    public PartitionKeyRestrictions mergeWith(Restriction restriction);
+
+    public List<ByteBuffer> values(QueryOptions options);
+
+    public List<ByteBuffer> bounds(Bound b, QueryOptions options);
+
+    /**
+     * Checks if the specified bound is set or not.
+     * @param b the bound type
+     * @return <code>true</code> if the specified bound is set, <code>false</code> otherwise
+     */
+    public boolean hasBound(Bound b);
+
+    /**
+     * Checks if the specified bound is inclusive or not.
+     * @param b the bound type
+     * @return <code>true</code> if the specified bound is inclusive, <code>false</code> otherwise
+     */
+    public boolean isInclusive(Bound b);
+}
diff --git a/src/java/org/apache/cassandra/cql3/restrictions/PartitionKeySingleRestrictionSet.java b/src/java/org/apache/cassandra/cql3/restrictions/PartitionKeySingleRestrictionSet.java
new file mode 100644
index 0000000..b96f6da
--- /dev/null
+++ b/src/java/org/apache/cassandra/cql3/restrictions/PartitionKeySingleRestrictionSet.java
@@ -0,0 +1,132 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.cql3.restrictions;
+
+import java.nio.ByteBuffer;
+import java.util.*;
+
+import org.apache.cassandra.config.CFMetaData;
+import org.apache.cassandra.cql3.QueryOptions;
+import org.apache.cassandra.cql3.statements.Bound;
+import org.apache.cassandra.db.ClusteringComparator;
+import org.apache.cassandra.db.ClusteringPrefix;
+import org.apache.cassandra.db.MultiCBuilder;
+import org.apache.cassandra.db.filter.RowFilter;
+import org.apache.cassandra.index.SecondaryIndexManager;
+
+/**
+ * A set of single restrictions on the partition key.
+ * <p>This class can only contain <code>SingleRestriction</code> instances. Token restrictions are handled by the
+ * <code>TokenRestriction</code> class, or by the <code>TokenFilter</code> class if the query contains a mix of token
+ * restrictions and single column restrictions on the partition key.
+ */
+final class PartitionKeySingleRestrictionSet extends RestrictionSetWrapper implements PartitionKeyRestrictions
+{
+    /**
+     * The clustering comparator.
+     */
+    protected final ClusteringComparator comparator;
+
+    public PartitionKeySingleRestrictionSet(ClusteringComparator comparator)
+    {
+        super(new RestrictionSet());
+        this.comparator = comparator;
+    }
+
+    private PartitionKeySingleRestrictionSet(PartitionKeySingleRestrictionSet restrictionSet,
+                                       SingleRestriction restriction)
+    {
+        super(restrictionSet.restrictions.addRestriction(restriction));
+        this.comparator = restrictionSet.comparator;
+    }
+
+    private List<ByteBuffer> toByteBuffers(SortedSet<? extends ClusteringPrefix> clusterings)
+    {
+        List<ByteBuffer> l = new ArrayList<>(clusterings.size());
+        for (ClusteringPrefix clustering : clusterings)
+            l.add(CFMetaData.serializePartitionKey(clustering));
+        return l;
+    }
+
+    @Override
+    public PartitionKeyRestrictions mergeWith(Restriction restriction)
+    {
+        if (restriction.isOnToken())
+        {
+            if (isEmpty())
+                return (PartitionKeyRestrictions) restriction;
+
+            return new TokenFilter(this, (TokenRestriction) restriction);
+        }
+
+        return new PartitionKeySingleRestrictionSet(this, (SingleRestriction) restriction);
+    }
+
+    @Override
+    public List<ByteBuffer> values(QueryOptions options)
+    {
+        MultiCBuilder builder = MultiCBuilder.create(comparator, hasIN());
+        for (SingleRestriction r : restrictions)
+        {
+            r.appendTo(builder, options);
+            if (builder.hasMissingElements())
+                break;
+        }
+        return toByteBuffers(builder.build());
+    }
+
+    @Override
+    public List<ByteBuffer> bounds(Bound bound, QueryOptions options)
+    {
+        MultiCBuilder builder = MultiCBuilder.create(comparator, hasIN());
+        for (SingleRestriction r : restrictions)
+        {
+            r.appendBoundTo(builder, bound, options);
+            if (builder.hasMissingElements())
+                return Collections.emptyList();
+        }
+        return toByteBuffers(builder.buildBound(bound.isStart(), true));
+    }
+
+    @Override
+    public boolean hasBound(Bound b)
+    {
+        if (isEmpty())
+            return false;
+        return restrictions.lastRestriction().hasBound(b);
+    }
+
+    @Override
+    public boolean isInclusive(Bound b)
+    {
+        if (isEmpty())
+            return false;
+        return restrictions.lastRestriction().isInclusive(b);
+    }
+
+    @Override
+    public void addRowFilterTo(RowFilter filter,
+                               SecondaryIndexManager indexManager,
+                               QueryOptions options)
+    {
+        for (SingleRestriction restriction : restrictions)
+        {
+            restriction.addRowFilterTo(filter, indexManager, options);
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/cql3/restrictions/PrimaryKeyRestrictionSet.java b/src/java/org/apache/cassandra/cql3/restrictions/PrimaryKeyRestrictionSet.java
deleted file mode 100644
index a5f4a24..0000000
--- a/src/java/org/apache/cassandra/cql3/restrictions/PrimaryKeyRestrictionSet.java
+++ /dev/null
@@ -1,325 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.cassandra.cql3.restrictions;
-
-import java.nio.ByteBuffer;
-import java.util.*;
-
-import org.apache.cassandra.config.CFMetaData;
-import org.apache.cassandra.config.ColumnDefinition;
-import org.apache.cassandra.cql3.QueryOptions;
-import org.apache.cassandra.cql3.functions.Function;
-import org.apache.cassandra.cql3.statements.Bound;
-import org.apache.cassandra.db.*;
-import org.apache.cassandra.db.filter.RowFilter;
-import org.apache.cassandra.exceptions.InvalidRequestException;
-import org.apache.cassandra.index.SecondaryIndexManager;
-import org.apache.cassandra.utils.btree.BTreeSet;
-
-import static org.apache.cassandra.cql3.statements.RequestValidations.invalidRequest;
-
-/**
- * A set of single column restrictions on a primary key part (partition key or clustering key).
- */
-final class PrimaryKeyRestrictionSet extends AbstractPrimaryKeyRestrictions implements Iterable<Restriction>
-{
-    /**
-     * The restrictions.
-     */
-    private final RestrictionSet restrictions;
-
-    /**
-     * <code>true</code> if the restrictions are corresponding to an EQ, <code>false</code> otherwise.
-     */
-    private boolean eq;
-
-    /**
-     * <code>true</code> if the restrictions are corresponding to an IN, <code>false</code> otherwise.
-     */
-    private boolean in;
-
-    /**
-     * <code>true</code> if the restrictions are corresponding to a Slice, <code>false</code> otherwise.
-     */
-    private boolean slice;
-
-    /**
-     * <code>true</code> if the restrictions are corresponding to a Contains, <code>false</code> otherwise.
-     */
-    private boolean contains;
-
-    /**
-     * <code>true</code> if the restrictions corresponding to a partition key, <code>false</code> if it's clustering columns.
-     */
-    private boolean isPartitionKey;
-
-    public PrimaryKeyRestrictionSet(ClusteringComparator comparator, boolean isPartitionKey)
-    {
-        super(comparator);
-
-        this.restrictions = new RestrictionSet();
-        this.eq = true;
-        this.isPartitionKey = isPartitionKey;
-    }
-
-    private PrimaryKeyRestrictionSet(PrimaryKeyRestrictionSet primaryKeyRestrictions,
-                                     Restriction restriction) throws InvalidRequestException
-    {
-        super(primaryKeyRestrictions.comparator);
-        this.restrictions = primaryKeyRestrictions.restrictions.addRestriction(restriction);
-        this.isPartitionKey = primaryKeyRestrictions.isPartitionKey;
-
-        if (restriction.isSlice() || primaryKeyRestrictions.isSlice())
-            this.slice = true;
-        else if (restriction.isContains() || primaryKeyRestrictions.isContains())
-            this.contains = true;
-        else if (restriction.isIN() || primaryKeyRestrictions.isIN())
-            this.in = true;
-        else
-            this.eq = true;
-    }
-
-    private List<ByteBuffer> toByteBuffers(SortedSet<? extends ClusteringPrefix> clusterings)
-    {
-        // It's currently a tad hard to follow that this is only called for partition key so we should fix that
-        List<ByteBuffer> l = new ArrayList<>(clusterings.size());
-        for (ClusteringPrefix clustering : clusterings)
-            l.add(CFMetaData.serializePartitionKey(clustering));
-        return l;
-    }
-
-    @Override
-    public boolean isSlice()
-    {
-        return slice;
-    }
-
-    @Override
-    public boolean isEQ()
-    {
-        return eq;
-    }
-
-    @Override
-    public boolean isIN()
-    {
-        return in;
-    }
-
-    @Override
-    public boolean isOnToken()
-    {
-        return false;
-    }
-
-    @Override
-    public boolean isContains()
-    {
-        return contains;
-    }
-
-    @Override
-    public boolean isMultiColumn()
-    {
-        return false;
-    }
-
-    @Override
-    public void addFunctionsTo(List<Function> functions)
-    {
-        restrictions.addFunctionsTo(functions);
-    }
-
-    @Override
-    public PrimaryKeyRestrictions mergeWith(Restriction restriction) throws InvalidRequestException
-    {
-        if (restriction.isOnToken())
-        {
-            if (isEmpty())
-                return (PrimaryKeyRestrictions) restriction;
-
-            return new TokenFilter(this, (TokenRestriction) restriction);
-        }
-
-        return new PrimaryKeyRestrictionSet(this, restriction);
-    }
-
-    @Override
-    public NavigableSet<Clustering> valuesAsClustering(QueryOptions options) throws InvalidRequestException
-    {
-        return appendTo(MultiCBuilder.create(comparator), options).build();
-    }
-
-    @Override
-    public MultiCBuilder appendTo(MultiCBuilder builder, QueryOptions options)
-    {
-        for (Restriction r : restrictions)
-        {
-            r.appendTo(builder, options);
-            if (builder.hasMissingElements())
-                break;
-        }
-        return builder;
-    }
-
-    @Override
-    public MultiCBuilder appendBoundTo(MultiCBuilder builder, Bound bound, QueryOptions options)
-    {
-        throw new UnsupportedOperationException();
-    }
-
-    @Override
-    public NavigableSet<Slice.Bound> boundsAsClustering(Bound bound, QueryOptions options) throws InvalidRequestException
-    {
-        MultiCBuilder builder = MultiCBuilder.create(comparator);
-        int keyPosition = 0;
-        for (Restriction r : restrictions)
-        {
-            ColumnDefinition def = r.getFirstColumn();
-
-            if (keyPosition != def.position() || r.isContains())
-                break;
-
-            if (r.isSlice())
-            {
-                r.appendBoundTo(builder, bound, options);
-                return builder.buildBoundForSlice(bound.isStart(),
-                                                  r.isInclusive(bound),
-                                                  r.isInclusive(bound.reverse()),
-                                                  r.getColumnDefs());
-            }
-
-            r.appendBoundTo(builder, bound, options);
-
-            if (builder.hasMissingElements())
-                return BTreeSet.empty(comparator);
-
-            keyPosition = r.getLastColumn().position() + 1;
-        }
-
-        // Everything was an equal (or there was nothing)
-        return builder.buildBound(bound.isStart(), true);
-    }
-
-    @Override
-    public List<ByteBuffer> values(QueryOptions options) throws InvalidRequestException
-    {
-        if (!isPartitionKey)
-            throw new UnsupportedOperationException();
-
-        return toByteBuffers(valuesAsClustering(options));
-    }
-
-    @Override
-    public List<ByteBuffer> bounds(Bound b, QueryOptions options) throws InvalidRequestException
-    {
-        if (!isPartitionKey)
-            throw new UnsupportedOperationException();
-
-        return toByteBuffers(boundsAsClustering(b, options));
-    }
-
-    @Override
-    public boolean hasBound(Bound b)
-    {
-        if (isEmpty())
-            return false;
-        return restrictions.lastRestriction().hasBound(b);
-    }
-
-    @Override
-    public boolean isInclusive(Bound b)
-    {
-        if (isEmpty())
-            return false;
-        return restrictions.lastRestriction().isInclusive(b);
-    }
-
-    @Override
-    public boolean hasSupportingIndex(SecondaryIndexManager indexManager)
-    {
-        return restrictions.hasSupportingIndex(indexManager);
-    }
-
-    @Override
-    public void addRowFilterTo(RowFilter filter,
-                               SecondaryIndexManager indexManager,
-                               QueryOptions options) throws InvalidRequestException
-    {
-        int position = 0;
-
-        for (Restriction restriction : restrictions)
-        {
-            // We ignore all the clustering columns that can be handled by slices.
-            if (isPartitionKey || handleInFilter(restriction, position) || restriction.hasSupportingIndex(indexManager))
-            {
-                restriction.addRowFilterTo(filter, indexManager, options);
-                continue;
-            }
-
-            if (!restriction.isSlice())
-                position = restriction.getLastColumn().position() + 1;
-        }
-    }
-
-    @Override
-    public List<ColumnDefinition> getColumnDefs()
-    {
-        return restrictions.getColumnDefs();
-    }
-
-    @Override
-    public ColumnDefinition getFirstColumn()
-    {
-        return restrictions.firstColumn();
-    }
-
-    @Override
-    public ColumnDefinition getLastColumn()
-    {
-        return restrictions.lastColumn();
-    }
-
-    public final boolean needsFiltering()
-    {
-        // Backported from ClusteringColumnRestrictions from CASSANDRA-11310 for 3.6
-        // As that suggests, this should only be called on clustering column
-        // and not partition key restrictions.
-        int position = 0;
-        for (Restriction restriction : restrictions)
-        {
-            if (handleInFilter(restriction, position))
-                return true;
-
-            if (!restriction.isSlice())
-                position = restriction.getLastColumn().position() + 1;
-        }
-
-        return false;
-    }
-
-    private boolean handleInFilter(Restriction restriction, int index)
-    {
-        return restriction.isContains() || index != restriction.getFirstColumn().position();
-    }
-
-    public Iterator<Restriction> iterator()
-    {
-        return restrictions.iterator();
-    }
-}
diff --git a/src/java/org/apache/cassandra/cql3/restrictions/PrimaryKeyRestrictions.java b/src/java/org/apache/cassandra/cql3/restrictions/PrimaryKeyRestrictions.java
deleted file mode 100644
index 2f9cd7b..0000000
--- a/src/java/org/apache/cassandra/cql3/restrictions/PrimaryKeyRestrictions.java
+++ /dev/null
@@ -1,46 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.cassandra.cql3.restrictions;
-
-import java.nio.ByteBuffer;
-import java.util.List;
-import java.util.NavigableSet;
-
-import org.apache.cassandra.cql3.QueryOptions;
-import org.apache.cassandra.cql3.statements.Bound;
-import org.apache.cassandra.db.Clustering;
-import org.apache.cassandra.db.Slice;
-import org.apache.cassandra.exceptions.InvalidRequestException;
-
-/**
- * A set of restrictions on a primary key part (partition key or clustering key).
- *
- */
-interface PrimaryKeyRestrictions extends Restriction, Restrictions
-{
-    @Override
-    public PrimaryKeyRestrictions mergeWith(Restriction restriction) throws InvalidRequestException;
-
-    public List<ByteBuffer> values(QueryOptions options) throws InvalidRequestException;
-
-    public NavigableSet<Clustering> valuesAsClustering(QueryOptions options) throws InvalidRequestException;
-
-    public List<ByteBuffer> bounds(Bound b, QueryOptions options) throws InvalidRequestException;
-
-    public NavigableSet<Slice.Bound> boundsAsClustering(Bound bound, QueryOptions options) throws InvalidRequestException;
-}
diff --git a/src/java/org/apache/cassandra/cql3/restrictions/Restriction.java b/src/java/org/apache/cassandra/cql3/restrictions/Restriction.java
index 987fd30..fc7f5bc 100644
--- a/src/java/org/apache/cassandra/cql3/restrictions/Restriction.java
+++ b/src/java/org/apache/cassandra/cql3/restrictions/Restriction.java
@@ -22,27 +22,18 @@
 import org.apache.cassandra.config.ColumnDefinition;
 import org.apache.cassandra.cql3.QueryOptions;
 import org.apache.cassandra.cql3.functions.Function;
-import org.apache.cassandra.cql3.statements.Bound;
-import org.apache.cassandra.db.MultiCBuilder;
 import org.apache.cassandra.db.filter.RowFilter;
-import org.apache.cassandra.exceptions.InvalidRequestException;
 import org.apache.cassandra.index.SecondaryIndexManager;
 
 /**
- * A restriction/clause on a column.
- * The goal of this class being to group all conditions for a column in a SELECT.
- *
- * <p>Implementation of this class must be immutable. See {@link #mergeWith(Restriction)} for more explanation.</p>
+ * <p>Implementations of this interface must be immutable.</p>
  */
 public interface Restriction
 {
-    public boolean isOnToken();
-    public boolean isSlice();
-    public boolean isEQ();
-    public boolean isIN();
-    public boolean isContains();
-    public boolean isNotNull();
-    public boolean isMultiColumn();
+    public default boolean isOnToken()
+    {
+        return false;
+    }
 
     /**
      * Returns the definition of the first column.
@@ -70,33 +61,6 @@
     void addFunctionsTo(List<Function> functions);
 
     /**
-     * Checks if the specified bound is set or not.
-     * @param b the bound type
-     * @return <code>true</code> if the specified bound is set, <code>false</code> otherwise
-     */
-    public boolean hasBound(Bound b);
-
-    /**
-     * Checks if the specified bound is inclusive or not.
-     * @param b the bound type
-     * @return <code>true</code> if the specified bound is inclusive, <code>false</code> otherwise
-     */
-    public boolean isInclusive(Bound b);
-
-    /**
-     * Merges this restriction with the specified one.
-     *
-     * <p>Restriction are immutable. Therefore merging two restrictions result in a new one.
-     * The reason behind this choice is that it allow a great flexibility in the way the merging can done while
-     * preventing any side effect.</p>
-     *
-     * @param otherRestriction the restriction to merge into this one
-     * @return the restriction resulting of the merge
-     * @throws InvalidRequestException if the restrictions cannot be merged
-     */
-    public Restriction mergeWith(Restriction otherRestriction) throws InvalidRequestException;
-
-    /**
      * Check if the restriction is on indexed columns.
      *
      * @param indexManager the index manager
@@ -110,29 +74,8 @@
      * @param filter the row filter to add expressions to
      * @param indexManager the secondary index manager
      * @param options the query options
-     * @throws InvalidRequestException if this <code>Restriction</code> cannot be converted into a row filter
      */
     public void addRowFilterTo(RowFilter filter,
                                SecondaryIndexManager indexManager,
-                               QueryOptions options)
-                               throws InvalidRequestException;
-
-    /**
-     * Appends the values of this <code>Restriction</code> to the specified builder.
-     *
-     * @param builder the <code>MultiCBuilder</code> to append to.
-     * @param options the query options
-     * @return the <code>MultiCBuilder</code>
-     */
-    public MultiCBuilder appendTo(MultiCBuilder builder, QueryOptions options);
-
-    /**
-     * Appends the values of the <code>Restriction</code> for the specified bound to the specified builder.
-     *
-     * @param builder the <code>MultiCBuilder</code> to append to.
-     * @param bound the bound
-     * @param options the query options
-     * @return the <code>MultiCBuilder</code>
-     */
-    public MultiCBuilder appendBoundTo(MultiCBuilder builder, Bound bound, QueryOptions options);
+                               QueryOptions options);
 }
diff --git a/src/java/org/apache/cassandra/cql3/restrictions/RestrictionSet.java b/src/java/org/apache/cassandra/cql3/restrictions/RestrictionSet.java
index 9aeea69..2648f62 100644
--- a/src/java/org/apache/cassandra/cql3/restrictions/RestrictionSet.java
+++ b/src/java/org/apache/cassandra/cql3/restrictions/RestrictionSet.java
@@ -18,6 +18,7 @@
 package org.apache.cassandra.cql3.restrictions;
 
 import java.util.*;
+import java.util.stream.Stream;
 
 import org.apache.cassandra.config.ColumnDefinition;
 import org.apache.cassandra.cql3.QueryOptions;
@@ -30,10 +31,9 @@
 /**
  * Sets of column restrictions.
  *
- * <p>This class is immutable in order to be use within {@link PrimaryKeyRestrictionSet} which as
- * an implementation of {@link Restriction} need to be immutable.
+ * <p>This class is immutable.</p>
  */
-final class RestrictionSet implements Restrictions, Iterable<Restriction>
+final class RestrictionSet implements Restrictions, Iterable<SingleRestriction>
 {
     /**
      * The comparator used to sort the <code>Restriction</code>s.
@@ -51,14 +51,14 @@
     /**
      * The restrictions per column.
      */
-    protected final TreeMap<ColumnDefinition, Restriction> restrictions;
+    protected final TreeMap<ColumnDefinition, SingleRestriction> restrictions;
 
     public RestrictionSet()
     {
-        this(new TreeMap<ColumnDefinition, Restriction>(COLUMN_DEFINITION_COMPARATOR));
+        this(new TreeMap<ColumnDefinition, SingleRestriction>(COLUMN_DEFINITION_COMPARATOR));
     }
 
-    private RestrictionSet(TreeMap<ColumnDefinition, Restriction> restrictions)
+    private RestrictionSet(TreeMap<ColumnDefinition, SingleRestriction> restrictions)
     {
         this.restrictions = restrictions;
     }
@@ -76,6 +76,11 @@
         return new ArrayList<>(restrictions.keySet());
     }
 
+    public Stream<SingleRestriction> stream()
+    {
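+        // A multi-column restriction is stored once per restricted column in the TreeMap, so copy the values into a
+        // LinkedHashSet to drop duplicates while preserving column order, mirroring iterator() below.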
+        return new LinkedHashSet<>(restrictions.values()).stream();
+    }
+
     @Override
     public void addFunctionsTo(List<Function> functions)
     {
@@ -94,13 +99,13 @@
     @Override
     public final boolean isEmpty()
     {
-        return getColumnDefs().isEmpty();
+        return restrictions.isEmpty();
     }
 
     @Override
     public final int size()
     {
-        return getColumnDefs().size();
+        return restrictions.size();
     }
 
     /**
@@ -108,21 +113,19 @@
      *
      * @param restriction the restriction to add
      * @return the new set of restrictions
-     * @throws InvalidRequestException if the new restriction cannot be added
      */
-    public RestrictionSet addRestriction(Restriction restriction) throws InvalidRequestException
+    public RestrictionSet addRestriction(SingleRestriction restriction)
     {
         // RestrictionSet is immutable so we need to clone the restrictions map.
-        TreeMap<ColumnDefinition, Restriction> newRestrictions = new TreeMap<>(this.restrictions);
+        TreeMap<ColumnDefinition, SingleRestriction> newRestrictions = new TreeMap<>(this.restrictions);
         return new RestrictionSet(mergeRestrictions(newRestrictions, restriction));
     }
 
-    private TreeMap<ColumnDefinition, Restriction> mergeRestrictions(TreeMap<ColumnDefinition, Restriction> restrictions,
-                                                                     Restriction restriction)
-                                                                     throws InvalidRequestException
+    private TreeMap<ColumnDefinition, SingleRestriction> mergeRestrictions(TreeMap<ColumnDefinition, SingleRestriction> restrictions,
+                                                                           SingleRestriction restriction)
     {
         Collection<ColumnDefinition> columnDefs = restriction.getColumnDefs();
-        Set<Restriction> existingRestrictions = getRestrictions(columnDefs);
+        Set<SingleRestriction> existingRestrictions = getRestrictions(columnDefs);
 
         if (existingRestrictions.isEmpty())
         {
@@ -131,9 +134,9 @@
         }
         else
         {
-            for (Restriction existing : existingRestrictions)
+            for (SingleRestriction existing : existingRestrictions)
             {
-                Restriction newRestriction = mergeRestrictions(existing, restriction);
+                SingleRestriction newRestriction = mergeRestrictions(existing, restriction);
 
                 for (ColumnDefinition columnDef : columnDefs)
                     restrictions.put(columnDef, newRestriction);
@@ -149,12 +152,12 @@
      * @param columnDefs the column definitions
      * @return all the restrictions applied to the specified columns
      */
-    private Set<Restriction> getRestrictions(Collection<ColumnDefinition> columnDefs)
+    private Set<SingleRestriction> getRestrictions(Collection<ColumnDefinition> columnDefs)
     {
-        Set<Restriction> set = new HashSet<>();
+        Set<SingleRestriction> set = new HashSet<>();
         for (ColumnDefinition columnDef : columnDefs)
         {
-            Restriction existing = restrictions.get(columnDef);
+            SingleRestriction existing = restrictions.get(columnDef);
             if (existing != null)
                 set.add(existing);
         }
@@ -183,22 +186,14 @@
         return restrictions.tailMap(columnDef, false).firstKey();
     }
 
-    /**
-     * Returns the definition of the first column.
-     *
-     * @return the definition of the first column.
-     */
-    ColumnDefinition firstColumn()
+    @Override
+    public ColumnDefinition getFirstColumn()
     {
         return isEmpty() ? null : this.restrictions.firstKey();
     }
 
-    /**
-     * Returns the definition of the last column.
-     *
-     * @return the definition of the last column.
-     */
-    ColumnDefinition lastColumn()
+    @Override
+    public ColumnDefinition getLastColumn()
     {
         return isEmpty() ? null : this.restrictions.lastKey();
     }
@@ -208,7 +203,7 @@
      *
      * @return the last restriction.
      */
-    Restriction lastRestriction()
+    SingleRestriction lastRestriction()
     {
         return isEmpty() ? null : this.restrictions.lastEntry().getValue();
     }
@@ -221,8 +216,8 @@
      * @return the merged restriction
      * @throws InvalidRequestException if the two restrictions cannot be merged
      */
-    private static Restriction mergeRestrictions(Restriction restriction,
-                                                 Restriction otherRestriction) throws InvalidRequestException
+    private static SingleRestriction mergeRestrictions(SingleRestriction restriction,
+                                                       SingleRestriction otherRestriction)
     {
         return restriction == null ? otherRestriction
                                    : restriction.mergeWith(otherRestriction);
@@ -237,7 +232,7 @@
     public final boolean hasMultipleContains()
     {
         int numberOfContains = 0;
-        for (Restriction restriction : restrictions.values())
+        for (SingleRestriction restriction : restrictions.values())
         {
             if (restriction.isContains())
             {
@@ -249,8 +244,28 @@
     }
 
     @Override
-    public Iterator<Restriction> iterator()
+    public Iterator<SingleRestriction> iterator()
     {
         return new LinkedHashSet<>(restrictions.values()).iterator();
     }
+
+    /**
+     * Checks if any of the underlying restrictions is an IN restriction.
+     * @return <code>true</code> if any of the underlying restrictions is an IN restriction, <code>false</code> otherwise
+     */
+    public final boolean hasIN()
+    {
+        return stream().anyMatch(SingleRestriction::isIN);
+    }
+
+    /**
+     * Checks if all of the underlying restrictions are EQ or IN restrictions.
+     *
+     * @return <code>true</code> if all of the underlying restrictions are EQ or IN restrictions,
+     * <code>false</code> otherwise
+     */
+    public final boolean hasOnlyEqualityRestrictions()
+    {
+        return stream().allMatch(p -> p.isEQ() || p.isIN());
+    }
 }
diff --git a/src/java/org/apache/cassandra/cql3/restrictions/RestrictionSetWrapper.java b/src/java/org/apache/cassandra/cql3/restrictions/RestrictionSetWrapper.java
new file mode 100644
index 0000000..6ad5fbb
--- /dev/null
+++ b/src/java/org/apache/cassandra/cql3/restrictions/RestrictionSetWrapper.java
@@ -0,0 +1,95 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.cql3.restrictions;
+
+import java.util.List;
+
+import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.cql3.QueryOptions;
+import org.apache.cassandra.cql3.functions.Function;
+import org.apache.cassandra.db.filter.RowFilter;
+import org.apache.cassandra.index.SecondaryIndexManager;
+
+/**
+ * A <code>RestrictionSet</code> wrapper that can be extended to modify the <code>RestrictionSet</code>
+ * behaviour without breaking its immutability. Sub-classes should be immutable.
+ */
+class RestrictionSetWrapper implements Restrictions
+{
+    /**
+     * The wrapped <code>RestrictionSet</code>.
+     */
+    protected final RestrictionSet restrictions;
+
+    public RestrictionSetWrapper(RestrictionSet restrictions)
+    {
+        this.restrictions = restrictions;
+    }
+
+    public void addRowFilterTo(RowFilter filter,
+                               SecondaryIndexManager indexManager,
+                               QueryOptions options)
+    {
+        restrictions.addRowFilterTo(filter, indexManager, options);
+    }
+
+    public List<ColumnDefinition> getColumnDefs()
+    {
+        return restrictions.getColumnDefs();
+    }
+
+    public void addFunctionsTo(List<Function> functions)
+    {
+        restrictions.addFunctionsTo(functions);
+    }
+
+    public boolean isEmpty()
+    {
+        return restrictions.isEmpty();
+    }
+
+    public int size()
+    {
+        return restrictions.size();
+    }
+
+    public boolean hasSupportingIndex(SecondaryIndexManager indexManager)
+    {
+        return restrictions.hasSupportingIndex(indexManager);
+    }
+
+    public ColumnDefinition getFirstColumn()
+    {
+        return restrictions.getFirstColumn();
+    }
+
+    public ColumnDefinition getLastColumn()
+    {
+        return restrictions.getLastColumn();
+    }
+
+    public boolean hasIN()
+    {
+        return restrictions.hasIN();
+    }
+
+    public boolean hasOnlyEqualityRestrictions()
+    {
+        return restrictions.hasOnlyEqualityRestrictions();
+    }
+}
diff --git a/src/java/org/apache/cassandra/cql3/restrictions/Restrictions.java b/src/java/org/apache/cassandra/cql3/restrictions/Restrictions.java
index 5fa3170..7ca82ab 100644
--- a/src/java/org/apache/cassandra/cql3/restrictions/Restrictions.java
+++ b/src/java/org/apache/cassandra/cql3/restrictions/Restrictions.java
@@ -17,53 +17,12 @@
  */
 package org.apache.cassandra.cql3.restrictions;
 
-import java.util.Collection;
-import java.util.List;
-
-import org.apache.cassandra.config.ColumnDefinition;
-import org.apache.cassandra.cql3.QueryOptions;
-import org.apache.cassandra.cql3.functions.Function;
-import org.apache.cassandra.db.filter.RowFilter;
-import org.apache.cassandra.exceptions.InvalidRequestException;
-import org.apache.cassandra.index.SecondaryIndexManager;
-
 /**
  * Sets of restrictions
  */
-interface Restrictions
+public interface Restrictions extends Restriction
 {
     /**
-     * Returns the column definitions in position order.
-     * @return the column definitions in position order.
-     */
-    public Collection<ColumnDefinition> getColumnDefs();
-
-    /**
-     * Adds all functions (native and user-defined) used by any component of the restriction
-     * to the specified list.
-     * @param functions the list to add to
-     */
-    public void addFunctionsTo(List<Function> functions);
-
-    /**
-     * Check if the restriction is on indexed columns.
-     *
-     * @param indexManager the index manager
-     * @return <code>true</code> if the restriction is on indexed columns, <code>false</code>
-     */
-    public boolean hasSupportingIndex(SecondaryIndexManager indexManager);
-
-    /**
-     * Adds to the specified row filter the expressions corresponding to this <code>Restrictions</code>.
-     *
-     * @param filter the row filter to add expressions to
-     * @param indexManager the secondary index manager
-     * @param options the query options
-     * @throws InvalidRequestException if this <code>Restrictions</code> cannot be converted into a row filter
-     */
-    public void addRowFilterTo(RowFilter filter, SecondaryIndexManager indexManager, QueryOptions options) throws InvalidRequestException;
-
-    /**
      * Checks if this <code>PrimaryKeyRestrictionSet</code> is empty or not.
      *
      * @return <code>true</code> if this <code>PrimaryKeyRestrictionSet</code> is empty, <code>false</code> otherwise.
@@ -76,4 +35,18 @@
      * @return the number of columns that have a restriction.
      */
     public int size();
+
+    /**
+     * Checks if any of the underlying restrictions is an IN restriction.
+     * @return <code>true</code> if any of the underlying restrictions is an IN restriction, <code>false</code> otherwise
+     */
+    public boolean hasIN();
+
+    /**
+     * Checks if all of the underlying restrictions are EQ or IN restrictions.
+     *
+     * @return <code>true</code> if all of the underlying restrictions are EQ or IN restrictions,
+     * <code>false</code> otherwise
+     */
+    public boolean hasOnlyEqualityRestrictions();
 }
diff --git a/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java b/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java
index 6296b97..ae883d1 100644
--- a/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java
+++ b/src/java/org/apache/cassandra/cql3/restrictions/SingleColumnRestriction.java
@@ -18,7 +18,9 @@
 package org.apache.cassandra.cql3.restrictions;
 
 import java.nio.ByteBuffer;
-import java.util.*;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
 
 import org.apache.cassandra.config.ColumnDefinition;
 import org.apache.cassandra.cql3.*;
@@ -27,9 +29,10 @@
 import org.apache.cassandra.cql3.statements.Bound;
 import org.apache.cassandra.db.MultiCBuilder;
 import org.apache.cassandra.db.filter.RowFilter;
-import org.apache.cassandra.exceptions.InvalidRequestException;
 import org.apache.cassandra.index.Index;
 import org.apache.cassandra.index.SecondaryIndexManager;
+import org.apache.cassandra.utils.ByteBufferUtil;
+import org.apache.cassandra.utils.Pair;
 
 import static org.apache.cassandra.cql3.statements.RequestValidations.checkBindValueSet;
 import static org.apache.cassandra.cql3.statements.RequestValidations.checkFalse;
@@ -37,7 +40,7 @@
 import static org.apache.cassandra.cql3.statements.RequestValidations.checkTrue;
 import static org.apache.cassandra.cql3.statements.RequestValidations.invalidRequest;
 
-public abstract class SingleColumnRestriction extends AbstractRestriction
+public abstract class SingleColumnRestriction implements SingleRestriction
 {
     /**
      * The definition of the column to which apply the restriction.
@@ -78,7 +81,7 @@
     }
 
     @Override
-    public final Restriction mergeWith(Restriction otherRestriction) throws InvalidRequestException
+    public final SingleRestriction mergeWith(SingleRestriction otherRestriction)
     {
         // We want to allow query like: b > ? AND (b,c) < (?, ?)
         if (otherRestriction.isMultiColumn() && canBeConvertedToMultiColumnRestriction())
@@ -89,7 +92,7 @@
         return doMergeWith(otherRestriction);
     }
 
-    protected abstract Restriction doMergeWith(Restriction otherRestriction) throws InvalidRequestException;
+    protected abstract SingleRestriction doMergeWith(SingleRestriction otherRestriction);
 
     /**
      * Converts this <code>SingleColumnRestriction</code> into a {@link MultiColumnRestriction}
@@ -170,7 +173,7 @@
         }
 
         @Override
-        public Restriction doMergeWith(Restriction otherRestriction) throws InvalidRequestException
+        public SingleRestriction doMergeWith(SingleRestriction otherRestriction)
         {
             throw invalidRequest("%s cannot be restricted by more than one relation if it includes an Equal", columnDef.name);
         }
@@ -196,7 +199,7 @@
         }
 
         @Override
-        public final Restriction doMergeWith(Restriction otherRestriction) throws InvalidRequestException
+        public final SingleRestriction doMergeWith(SingleRestriction otherRestriction)
         {
             throw invalidRequest("%s cannot be restricted by more than one relation if it includes a IN", columnDef.name);
         }
@@ -213,7 +216,7 @@
         @Override
         public void addRowFilterTo(RowFilter filter,
                                    SecondaryIndexManager indexManager,
-                                   QueryOptions options) throws InvalidRequestException
+                                   QueryOptions options)
         {
             List<ByteBuffer> values = getValues(options);
             checkTrue(values.size() == 1, "IN restrictions are not supported on indexed columns");
@@ -227,7 +230,7 @@
             return index.supportsExpression(columnDef, Operator.IN);
         }
 
-        protected abstract List<ByteBuffer> getValues(QueryOptions options) throws InvalidRequestException;
+        protected abstract List<ByteBuffer> getValues(QueryOptions options);
     }
 
     public static class InRestrictionWithValues extends INRestriction
@@ -253,7 +256,7 @@
         }
 
         @Override
-        protected List<ByteBuffer> getValues(QueryOptions options) throws InvalidRequestException
+        protected List<ByteBuffer> getValues(QueryOptions options)
         {
             List<ByteBuffer> buffers = new ArrayList<>(values.size());
             for (Term value : values)
@@ -290,7 +293,7 @@
         }
 
         @Override
-        protected List<ByteBuffer> getValues(QueryOptions options) throws InvalidRequestException
+        protected List<ByteBuffer> getValues(QueryOptions options)
         {
             Terminal term = marker.bind(options);
             checkNotNull(term, "Invalid null value for column %s", columnDef.name);
@@ -349,7 +352,7 @@
         @Override
         public MultiCBuilder appendBoundTo(MultiCBuilder builder, Bound bound, QueryOptions options)
         {
-            Bound b = reverseBoundIfNeeded(getFirstColumn(), bound);
+            Bound b = bound.reverseIfNeeded(getFirstColumn());
 
             if (!hasBound(b))
                 return builder;
@@ -367,7 +370,7 @@
         }
 
         @Override
-        public Restriction doMergeWith(Restriction otherRestriction) throws InvalidRequestException
+        public SingleRestriction doMergeWith(SingleRestriction otherRestriction)
         {
             checkTrue(otherRestriction.isSlice(),
                       "Column \"%s\" cannot be restricted by both an equality and an inequality relation",
@@ -385,7 +388,7 @@
         }
 
         @Override
-        public void addRowFilterTo(RowFilter filter, SecondaryIndexManager indexManager, QueryOptions options) throws InvalidRequestException
+        public void addRowFilterTo(RowFilter filter, SecondaryIndexManager indexManager, QueryOptions options)
         {
             for (Bound b : Bound.values())
                 if (hasBound(b))
@@ -460,7 +463,7 @@
         }
 
         @Override
-        public Restriction doMergeWith(Restriction otherRestriction) throws InvalidRequestException
+        public SingleRestriction doMergeWith(SingleRestriction otherRestriction)
         {
             checkTrue(otherRestriction.isContains(),
                       "Collection column %s can only be restricted by CONTAINS, CONTAINS KEY, or map-entry equality",
@@ -475,7 +478,7 @@
         }
 
         @Override
-        public void addRowFilterTo(RowFilter filter, SecondaryIndexManager indexManager, QueryOptions options) throws InvalidRequestException
+        public void addRowFilterTo(RowFilter filter, SecondaryIndexManager indexManager, QueryOptions options)
         {
             for (ByteBuffer value : bindAndGet(values, options))
                 filter.add(columnDef, Operator.CONTAINS, value);
@@ -560,9 +563,8 @@
          * @param terms the terms
          * @param options the query options
          * @return the value resulting from binding the query options to the specified terms
-         * @throws InvalidRequestException if a problem occurs while binding the query options
          */
-        private static List<ByteBuffer> bindAndGet(List<Term> terms, QueryOptions options) throws InvalidRequestException
+        private static List<ByteBuffer> bindAndGet(List<Term> terms, QueryOptions options)
         {
             List<ByteBuffer> buffers = new ArrayList<>(terms.size());
             for (Term value : terms)
@@ -635,7 +637,7 @@
         }
 
         @Override
-        public Restriction doMergeWith(Restriction otherRestriction) throws InvalidRequestException
+        public SingleRestriction doMergeWith(SingleRestriction otherRestriction)
         {
             throw invalidRequest("%s cannot be restricted by a relation if it includes an IS NOT NULL", columnDef.name);
         }
@@ -646,4 +648,135 @@
             return index.supportsExpression(columnDef, Operator.IS_NOT);
         }
     }
+
+    public static final class LikeRestriction extends SingleColumnRestriction
+    {
+        private static final ByteBuffer LIKE_WILDCARD = ByteBufferUtil.bytes("%");
+        private final Operator operator;
+        private final Term value;
+
+        public LikeRestriction(ColumnDefinition columnDef, Operator operator, Term value)
+        {
+            super(columnDef);
+            this.operator = operator;
+            this.value = value;
+        }
+
+        @Override
+        public void addFunctionsTo(List<Function> functions)
+        {
+            value.addFunctionsTo(functions);
+        }
+
+        @Override
+        public boolean isEQ()
+        {
+            return false;
+        }
+
+        @Override
+        public boolean isLIKE()
+        {
+            return true;
+        }
+
+        @Override
+        public boolean canBeConvertedToMultiColumnRestriction()
+        {
+            return false;
+        }
+
+        @Override
+        MultiColumnRestriction toMultiColumnRestriction()
+        {
+            throw new UnsupportedOperationException();
+        }
+
+        @Override
+        public void addRowFilterTo(RowFilter filter,
+                                   SecondaryIndexManager indexManager,
+                                   QueryOptions options)
+        {
+            Pair<Operator, ByteBuffer> operation = makeSpecific(value.bindAndGet(options));
+
+            // there must be a suitable INDEX for LIKE_XXX expressions
+            RowFilter.SimpleExpression expression = filter.add(columnDef, operation.left, operation.right);
+            indexManager.getBestIndexFor(expression)
+                        .orElseThrow(() -> invalidRequest("%s is only supported on properly indexed columns",
+                                                          expression));
+        }
+
+        @Override
+        public MultiCBuilder appendTo(MultiCBuilder builder, QueryOptions options)
+        {
+            // LIKE can be used with clustering columns, but as it doesn't
+            // represent an actual clustering value, it can't be used in a
+            // clustering filter.
+            throw new UnsupportedOperationException();
+        }
+
+        @Override
+        public String toString()
+        {
+            return operator.toString();
+        }
+
+        @Override
+        public SingleRestriction doMergeWith(SingleRestriction otherRestriction)
+        {
+            throw invalidRequest("%s cannot be restricted by more than one relation if it includes a %s", columnDef.name, operator);
+        }
+
+        @Override
+        protected boolean isSupportedBy(Index index)
+        {
+            return index.supportsExpression(columnDef, operator);
+        }
+
+        /**
+         * As the specific subtype of LIKE (LIKE_PREFIX, LIKE_SUFFIX, LIKE_CONTAINS, LIKE_MATCHES) can only be
+         * determined by examining the value, which in turn can only be known after binding, all LIKE restrictions
+         * are initially created with the generic LIKE operator. This function takes the bound value, trims the
+         * wildcard '%' chars from it and returns a tuple of the inferred operator subtype and the final value.
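+         * <p>Illustrative examples (assuming '%' is used only as a leading and/or trailing wildcard):
+         * "abc%" maps to (LIKE_PREFIX, "abc"), "%abc" to (LIKE_SUFFIX, "abc"),
+         * "%abc%" to (LIKE_CONTAINS, "abc") and "abc" to (LIKE_MATCHES, "abc").</p>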
+         * @param value the bound value for the LIKE operation
+         * @return  Pair containing the inferred LIKE subtype and the value with wildcards removed
+         */
+        private static Pair<Operator, ByteBuffer> makeSpecific(ByteBuffer value)
+        {
+            Operator operator;
+            int beginIndex = value.position();
+            int endIndex = value.limit() - 1;
+            if (ByteBufferUtil.endsWith(value, LIKE_WILDCARD))
+            {
+                if (ByteBufferUtil.startsWith(value, LIKE_WILDCARD))
+                {
+                    operator = Operator.LIKE_CONTAINS;
+                    beginIndex += 1;
+                }
+                else
+                {
+                    operator = Operator.LIKE_PREFIX;
+                }
+            }
+            else if (ByteBufferUtil.startsWith(value, LIKE_WILDCARD))
+            {
+                operator = Operator.LIKE_SUFFIX;
+                beginIndex += 1;
+                endIndex += 1;
+            }
+            else
+            {
+                operator = Operator.LIKE_MATCHES;
+                endIndex += 1;
+            }
+
+            if (endIndex == 0 || beginIndex == endIndex)
+                throw invalidRequest("LIKE value can't be empty.");
+
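+            // duplicate() keeps the caller's buffer untouched; the position/limit of the copy are then narrowed to
+            // exclude the trimmed wildcard byte(s)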
+            ByteBuffer newValue = value.duplicate();
+            newValue.position(beginIndex);
+            newValue.limit(endIndex);
+            return Pair.create(operator, newValue);
+        }
+    }
 }
diff --git a/src/java/org/apache/cassandra/cql3/restrictions/SingleRestriction.java b/src/java/org/apache/cassandra/cql3/restrictions/SingleRestriction.java
new file mode 100644
index 0000000..42b0b4e
--- /dev/null
+++ b/src/java/org/apache/cassandra/cql3/restrictions/SingleRestriction.java
@@ -0,0 +1,117 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.cql3.restrictions;
+
+import org.apache.cassandra.cql3.QueryOptions;
+import org.apache.cassandra.cql3.statements.Bound;
+import org.apache.cassandra.db.MultiCBuilder;
+
+/**
+ * A single restriction/clause on one or multiple columns.
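+ *
+ * <p>The kind predicates (isEQ(), isIN(), isSlice(), isLIKE(), ...) default to <code>false</code> so a concrete
+ * restriction only overrides the ones that apply to it, and appendBoundTo() defaults to delegating to appendTo().</p>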
+ */
+public interface SingleRestriction extends Restriction
+{
+    public default boolean isSlice()
+    {
+        return false;
+    }
+
+    public default boolean isEQ()
+    {
+        return false;
+    }
+
+    public default boolean isLIKE()
+    {
+        return false;
+    }
+
+    public default boolean isIN()
+    {
+        return false;
+    }
+
+    public default boolean isContains()
+    {
+        return false;
+    }
+
+    public default boolean isNotNull()
+    {
+        return false;
+    }
+
+    public default boolean isMultiColumn()
+    {
+        return false;
+    }
+
+    /**
+     * Checks if the specified bound is set or not.
+     * @param b the bound type
+     * @return <code>true</code> if the specified bound is set, <code>false</code> otherwise
+     */
+    public default boolean hasBound(Bound b)
+    {
+        return true;
+    }
+
+    /**
+     * Checks if the specified bound is inclusive or not.
+     * @param b the bound type
+     * @return <code>true</code> if the specified bound is inclusive, <code>false</code> otherwise
+     */
+    public default boolean isInclusive(Bound b)
+    {
+        return true;
+    }
+
+    /**
+     * Merges this restriction with the specified one.
+     *
+     * <p>Restrictions are immutable. Therefore, merging two restrictions results in a new one.
+     * The reason behind this choice is that it allows great flexibility in the way the merging can be done while
+     * preventing any side effects.</p>
+     *
+     * @param otherRestriction the restriction to merge into this one
+     * @return the restriction resulting of the merge
+     */
+    public SingleRestriction mergeWith(SingleRestriction otherRestriction);
+
+    /**
+     * Appends the values of this <code>SingleRestriction</code> to the specified builder.
+     *
+     * @param builder the <code>MultiCBuilder</code> to append to.
+     * @param options the query options
+     * @return the <code>MultiCBuilder</code>
+     */
+    public MultiCBuilder appendTo(MultiCBuilder builder, QueryOptions options);
+
+    /**
+     * Appends the values of the <code>SingleRestriction</code> for the specified bound to the specified builder.
+     *
+     * @param builder the <code>MultiCBuilder</code> to append to.
+     * @param bound the bound
+     * @param options the query options
+     * @return the <code>MultiCBuilder</code>
+     */
+    public default MultiCBuilder appendBoundTo(MultiCBuilder builder, Bound bound, QueryOptions options)
+    {
+        return appendTo(builder, options);
+    }
+}
diff --git a/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java b/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java
index 647d22f..f8a39ee 100644
--- a/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java
+++ b/src/java/org/apache/cassandra/cql3/restrictions/StatementRestrictions.java
@@ -66,12 +66,12 @@
     /**
      * Restrictions on partitioning columns
      */
-    private PrimaryKeyRestrictions partitionKeyRestrictions;
+    private PartitionKeyRestrictions partitionKeyRestrictions;
 
     /**
      * Restrictions on clustering columns
      */
-    private PrimaryKeyRestrictions clusteringColumnsRestrictions;
+    private ClusteringColumnRestrictions clusteringColumnsRestrictions;
 
     /**
      * Restriction on non-primary key columns (i.e. secondary index restrictions)
@@ -83,7 +83,7 @@
     /**
      * The restrictions used to build the row filter
      */
-    private final IndexRestrictions indexRestrictions = new IndexRestrictions();
+    private final IndexRestrictions filterRestrictions = new IndexRestrictions();
 
     /**
      * <code>true</code> if the secondary index need to be queried, <code>false</code> otherwise
@@ -104,15 +104,15 @@
      */
     public static StatementRestrictions empty(StatementType type, CFMetaData cfm)
     {
-        return new StatementRestrictions(type, cfm);
+        return new StatementRestrictions(type, cfm, false);
     }
 
-    private StatementRestrictions(StatementType type, CFMetaData cfm)
+    private StatementRestrictions(StatementType type, CFMetaData cfm, boolean allowFiltering)
     {
         this.type = type;
         this.cfm = cfm;
-        this.partitionKeyRestrictions = new PrimaryKeyRestrictionSet(cfm.getKeyValidatorAsClusteringComparator(), true);
-        this.clusteringColumnsRestrictions = new PrimaryKeyRestrictionSet(cfm.comparator, false);
+        this.partitionKeyRestrictions = new PartitionKeySingleRestrictionSet(cfm.getKeyValidatorAsClusteringComparator());
+        this.clusteringColumnsRestrictions = new ClusteringColumnRestrictions(cfm, allowFiltering);
         this.nonPrimaryKeyRestrictions = new RestrictionSet();
         this.notNullColumns = new HashSet<>();
     }
@@ -122,16 +122,21 @@
                                  WhereClause whereClause,
                                  VariableSpecifications boundNames,
                                  boolean selectsOnlyStaticColumns,
-                                 boolean selectACollection,
+                                 boolean selectsComplexColumn,
                                  boolean allowFiltering,
-                                 boolean forView) throws InvalidRequestException
+                                 boolean forView)
     {
-        this.type = type;
-        this.cfm = cfm;
-        this.partitionKeyRestrictions = new PrimaryKeyRestrictionSet(cfm.getKeyValidatorAsClusteringComparator(), true);
-        this.clusteringColumnsRestrictions = new PrimaryKeyRestrictionSet(cfm.comparator, false);
-        this.nonPrimaryKeyRestrictions = new RestrictionSet();
-        this.notNullColumns = new HashSet<>();
+        this(type, cfm, allowFiltering);
+
+
+        ColumnFamilyStore cfs;
+        SecondaryIndexManager secondaryIndexManager = null;
+
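+        // Resolve the index manager up front (when this statement type can use secondary indices) so the LIKE
+        // handling below can check for a supporting index while the WHERE clause relations are being processed.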
+        if (type.allowUseOfSecondaryIndices())
+        {
+            cfs = Keyspace.open(cfm.ksName).getColumnFamilyStore(cfm.cfName);
+            secondaryIndexManager = cfs.indexManager;
+        }
 
         /*
          * WHERE clause. For a given entity, rules are:
@@ -152,6 +157,17 @@
                 for (ColumnDefinition def : relation.toRestriction(cfm, boundNames).getColumnDefs())
                     this.notNullColumns.add(def);
             }
+            else if (relation.isLIKE())
+            {
+                Restriction restriction = relation.toRestriction(cfm, boundNames);
+
+                if (!type.allowUseOfSecondaryIndices() || !restriction.hasSupportingIndex(secondaryIndexManager))
+                    throw new InvalidRequestException(String.format("LIKE restriction is only supported on properly " +
+                                                                    "indexed columns. %s is not valid.",
+                                                                    relation.toString()));
+
+                addRestriction(restriction);
+            }
             else
             {
                 addRestriction(relation.toRestriction(cfm, boundNames));
@@ -163,14 +179,11 @@
 
         if (type.allowUseOfSecondaryIndices())
         {
-            ColumnFamilyStore cfs = Keyspace.open(cfm.ksName).getColumnFamilyStore(cfm.cfName);
-            SecondaryIndexManager secondaryIndexManager = cfs.indexManager;
-
             if (whereClause.containsCustomExpressions())
                 processCustomIndexExpressions(whereClause.expressions, boundNames, secondaryIndexManager);
 
             hasQueriableClusteringColumnIndex = clusteringColumnsRestrictions.hasSupportingIndex(secondaryIndexManager);
-            hasQueriableIndex = !indexRestrictions.getCustomIndexExpressions().isEmpty()
+            hasQueriableIndex = !filterRestrictions.getCustomIndexExpressions().isEmpty()
                     || hasQueriableClusteringColumnIndex
                     || partitionKeyRestrictions.hasSupportingIndex(secondaryIndexManager)
                     || nonPrimaryKeyRestrictions.hasSupportingIndex(secondaryIndexManager);
@@ -182,7 +195,7 @@
         // Some but not all of the partition key columns have been specified;
         // hence we need turn these restrictions into a row filter.
         if (usesSecondaryIndexing)
-            indexRestrictions.add(partitionKeyRestrictions);
+            filterRestrictions.add(partitionKeyRestrictions);
 
         if (selectsOnlyStaticColumns && hasClusteringColumnsRestriction())
         {
@@ -203,16 +216,18 @@
                 throw invalidRequest("Cannot restrict clustering columns when selecting only static columns");
         }
 
-        processClusteringColumnsRestrictions(hasQueriableIndex, selectsOnlyStaticColumns, selectACollection, forView);
+        processClusteringColumnsRestrictions(hasQueriableIndex,
+                                             selectsOnlyStaticColumns,
+                                             selectsComplexColumn,
+                                             forView,
+                                             allowFiltering);
 
         // Covers indexes on the first clustering column (among others).
         if (isKeyRange && hasQueriableClusteringColumnIndex)
             usesSecondaryIndexing = true;
 
-        usesSecondaryIndexing = usesSecondaryIndexing || clusteringColumnsRestrictions.isContains();
-
-        if (usesSecondaryIndexing)
-            indexRestrictions.add(clusteringColumnsRestrictions);
+        if (usesSecondaryIndexing || clusteringColumnsRestrictions.needFiltering())
+            filterRestrictions.add(clusteringColumnsRestrictions);
 
         // Even if usesSecondaryIndexing is false at this point, we'll still have to use one if
         // there is restrictions not covered by the PK.
@@ -231,7 +246,7 @@
             else if (!allowFiltering)
                 throw invalidRequest(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE);
 
-            indexRestrictions.add(nonPrimaryKeyRestrictions);
+            filterRestrictions.add(nonPrimaryKeyRestrictions);
         }
 
         if (usesSecondaryIndexing)
@@ -240,12 +255,13 @@
 
     private void addRestriction(Restriction restriction)
     {
-        if (restriction.isMultiColumn())
-            clusteringColumnsRestrictions = clusteringColumnsRestrictions.mergeWith(restriction);
-        else if (restriction.isOnToken())
+        ColumnDefinition def = restriction.getFirstColumn();
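+        // Dispatch on the kind of the first restricted column: partition key restrictions (token restrictions are
+        // assumed to report a partition key component as their first column) are merged into partitionKeyRestrictions,
+        // clustering restrictions (including multi-column ones) into clusteringColumnsRestrictions, and the rest are
+        // single-column restrictions on regular columns.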
+        if (def.isPartitionKey())
             partitionKeyRestrictions = partitionKeyRestrictions.mergeWith(restriction);
+        else if (def.isClusteringColumn())
+            clusteringColumnsRestrictions = clusteringColumnsRestrictions.mergeWith(restriction);
         else
-            addSingleColumnRestriction((SingleColumnRestriction) restriction);
+            nonPrimaryKeyRestrictions = nonPrimaryKeyRestrictions.addRestriction((SingleRestriction) restriction);
     }
 
     public void addFunctionsTo(List<Function> functions)
@@ -255,15 +271,10 @@
         nonPrimaryKeyRestrictions.addFunctionsTo(functions);
     }
 
-    private void addSingleColumnRestriction(SingleColumnRestriction restriction)
+    // may be used by QueryHandler implementations
+    public IndexRestrictions getIndexRestrictions()
     {
-        ColumnDefinition def = restriction.columnDef;
-        if (def.isPartitionKey())
-            partitionKeyRestrictions = partitionKeyRestrictions.mergeWith(restriction);
-        else if (def.isClusteringColumn())
-            clusteringColumnsRestrictions = clusteringColumnsRestrictions.mergeWith(restriction);
-        else
-            nonPrimaryKeyRestrictions = nonPrimaryKeyRestrictions.addRestriction(restriction);
+        return filterRestrictions;
     }
 
     /**
@@ -274,7 +285,7 @@
     public Set<ColumnDefinition> nonPKRestrictedColumns(boolean includeNotNullRestrictions)
     {
         Set<ColumnDefinition> columns = new HashSet<>();
-        for (Restrictions r : indexRestrictions.getRestrictions())
+        for (Restrictions r : filterRestrictions.getRestrictions())
         {
             for (ColumnDefinition def : r.getColumnDefs())
                 if (!def.isPrimaryKeyColumn())
@@ -317,14 +328,14 @@
     }
 
     /**
-     * Checks if the restrictions on the partition key is an IN restriction.
+     * Checks if the restrictions on the partition key contain an IN restriction.
      *
-     * @return <code>true</code> the restrictions on the partition key is an IN restriction, <code>false</code>
+     * @return <code>true</code> if the restrictions on the partition key contain an IN restriction, <code>false</code>
      * otherwise.
      */
     public boolean keyIsInRelation()
     {
-        return partitionKeyRestrictions.isIN();
+        return partitionKeyRestrictions.hasIN();
     }
 
     /**
@@ -435,16 +446,15 @@
      * @param hasQueriableIndex <code>true</code> if some of the queried data are indexed, <code>false</code> otherwise
      * @param selectsOnlyStaticColumns <code>true</code> if the selected or modified columns are all statics,
      * <code>false</code> otherwise.
-     * @param selectACollection <code>true</code> if the query should return a collection column
+     * @param selectsComplexColumn <code>true</code> if the query should return a collection column
      */
     private void processClusteringColumnsRestrictions(boolean hasQueriableIndex,
                                                       boolean selectsOnlyStaticColumns,
-                                                      boolean selectACollection,
-                                                      boolean forView) throws InvalidRequestException
+                                                      boolean selectsComplexColumn,
+                                                      boolean forView,
+                                                      boolean allowFiltering)
     {
-        validateClusteringRestrictions(hasQueriableIndex);
-
-        checkFalse(!type.allowClusteringColumnSlices() && clusteringColumnsRestrictions.isSlice(),
+        checkFalse(!type.allowClusteringColumnSlices() && clusteringColumnsRestrictions.hasSlice(),
                    "Slice restrictions are not supported on the clustering columns in %s statements", type);
 
         if (!type.allowClusteringColumnSlices()
@@ -456,74 +466,40 @@
         }
         else
         {
-            checkFalse(clusteringColumnsRestrictions.isIN() && selectACollection,
+            checkFalse(clusteringColumnsRestrictions.hasIN() && selectsComplexColumn,
                        "Cannot restrict clustering columns by IN relations when a collection is selected by the query");
-            checkFalse(clusteringColumnsRestrictions.isContains() && !hasQueriableIndex,
-                       "Cannot restrict clustering columns by a CONTAINS relation without a secondary index");
+            checkFalse(clusteringColumnsRestrictions.hasContains() && !hasQueriableIndex && !allowFiltering,
 
-            if (hasClusteringColumnsRestriction() && clusteringRestrictionsNeedFiltering())
+                       "Clustering columns can only be restricted with CONTAINS with a secondary index or filtering");
+
+            if (hasClusteringColumnsRestriction() && clusteringColumnsRestrictions.needFiltering())
             {
                 if (hasQueriableIndex || forView)
                 {
                     usesSecondaryIndexing = true;
-                    return;
                 }
-
-                List<ColumnDefinition> clusteringColumns = cfm.clusteringColumns();
-                List<ColumnDefinition> restrictedColumns = new LinkedList<>(clusteringColumnsRestrictions.getColumnDefs());
-
-                for (int i = 0, m = restrictedColumns.size(); i < m; i++)
+                else
                 {
-                    ColumnDefinition clusteringColumn = clusteringColumns.get(i);
-                    ColumnDefinition restrictedColumn = restrictedColumns.get(i);
+                    List<ColumnDefinition> clusteringColumns = cfm.clusteringColumns();
+                    List<ColumnDefinition> restrictedColumns = new LinkedList<>(clusteringColumnsRestrictions.getColumnDefs());
 
-                    if (!clusteringColumn.equals(restrictedColumn))
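+                    // Without ALLOW FILTERING each restricted clustering column must be preceded by restrictions on
+                    // all earlier clustering columns: e.g. with clustering columns (c1, c2), restricting c2 while
+                    // leaving c1 unrestricted is rejected below.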
+                    for (int i = 0, m = restrictedColumns.size(); i < m; i++)
                     {
-                        throw invalidRequest(
-                           "PRIMARY KEY column \"%s\" cannot be restricted as preceding column \"%s\" is not restricted",
-                            restrictedColumn.name,
-                            clusteringColumn.name);
+                        ColumnDefinition clusteringColumn = clusteringColumns.get(i);
+                        ColumnDefinition restrictedColumn = restrictedColumns.get(i);
+
+                        if (!clusteringColumn.equals(restrictedColumn) && !allowFiltering)
+                        {
+                            throw invalidRequest("PRIMARY KEY column \"%s\" cannot be restricted as preceding column \"%s\" is not restricted",
+                                                 restrictedColumn.name,
+                                                 clusteringColumn.name);
+                        }
                     }
                 }
             }
+
         }
-    }
 
-    /**
-     * Validates whether or not restrictions are allowed for execution when secondary index is not used.
-     */
-    public final void validateClusteringRestrictions(boolean hasQueriableIndex)
-    {
-        assert clusteringColumnsRestrictions instanceof PrimaryKeyRestrictionSet;
-
-        // If there's a queriable index, filtering will take care of clustering restrictions
-        if (hasQueriableIndex)
-            return;
-
-        Iterator<Restriction> iter = ((PrimaryKeyRestrictionSet) clusteringColumnsRestrictions).iterator();
-        Restriction previousRestriction = null;
-        while (iter.hasNext())
-        {
-            Restriction restriction = iter.next();
-
-            if (previousRestriction != null)
-            {
-                ColumnDefinition lastRestrictionStart = previousRestriction.getFirstColumn();
-                ColumnDefinition newRestrictionStart = restriction.getFirstColumn();
-
-                if (previousRestriction.isSlice() && newRestrictionStart.position() > lastRestrictionStart.position())
-                    throw invalidRequest("Clustering column \"%s\" cannot be restricted (preceding column \"%s\" is restricted by a non-EQ relation)",
-                                         newRestrictionStart.name,
-                                         lastRestrictionStart.name);
-            }
-            previousRestriction = restriction;
-        }
-    }
-
-    public final boolean clusteringRestrictionsNeedFiltering()
-    {
-        assert clusteringColumnsRestrictions instanceof PrimaryKeyRestrictionSet;
-        return ((PrimaryKeyRestrictionSet) clusteringColumnsRestrictions).needsFiltering();
     }
 
     /**
@@ -580,19 +556,19 @@
 
         expression.prepareValue(cfm, expressionType, boundNames);
 
-        indexRestrictions.add(expression);
+        filterRestrictions.add(expression);
     }
 
     public RowFilter getRowFilter(SecondaryIndexManager indexManager, QueryOptions options)
     {
-        if (indexRestrictions.isEmpty())
+        if (filterRestrictions.isEmpty())
             return RowFilter.NONE;
 
         RowFilter filter = RowFilter.create();
-        for (Restrictions restrictions : indexRestrictions.getRestrictions())
+        for (Restrictions restrictions : filterRestrictions.getRestrictions())
             restrictions.addRowFilterTo(filter, indexManager, options);
 
-        for (CustomIndexExpression expression : indexRestrictions.getCustomIndexExpressions())
+        for (CustomIndexExpression expression : filterRestrictions.getCustomIndexExpressions())
             expression.addToRowFilter(filter, cfm, options);
 
         return filter;
@@ -744,24 +720,12 @@
      * @param options the query options
      * @return the bounds (start or end) of the clustering columns
      */
-    public NavigableSet<Slice.Bound> getClusteringColumnsBounds(Bound b, QueryOptions options)
+    public NavigableSet<ClusteringBound> getClusteringColumnsBounds(Bound b, QueryOptions options)
     {
         return clusteringColumnsRestrictions.boundsAsClustering(b, options);
     }
 
     /**
-     * Checks if the bounds (start or end) of the clustering columns are inclusive.
-     *
-     * @param bound the bound type
-     * @return <code>true</code> if the bounds (start or end) of the clustering columns are inclusive,
-     * <code>false</code> otherwise
-     */
-    public boolean areRequestedBoundsInclusive(Bound bound)
-    {
-        return clusteringColumnsRestrictions.isInclusive(bound);
-    }
-
-    /**
      * Checks if the query returns a range of columns.
      *
      * @return <code>true</code> if the query returns a range of columns, <code>false</code> otherwise.
@@ -772,9 +736,9 @@
         // this would mean a 'SELECT *' on a static compact table would query whole partitions, even though we'll only return
         // the static part as far as CQL is concerned. This is thus mostly an optimization to use the query-by-name path).
         int numberOfClusteringColumns = cfm.isStaticCompactTable() ? 0 : cfm.clusteringColumns().size();
-        // it is a range query if it has at least one the column alias for which no relation is defined or is not EQ.
+        // it is a range query if there is at least one clustering column for which no relation is defined or whose relation is not EQ or IN.
         return clusteringColumnsRestrictions.size() < numberOfClusteringColumns
-            || (!clusteringColumnsRestrictions.isEQ() && !clusteringColumnsRestrictions.isIN());
+            || !clusteringColumnsRestrictions.hasOnlyEqualityRestrictions();
     }
 
     /**
@@ -783,8 +747,8 @@
      */
     public boolean needFiltering()
     {
-        int numberOfRestrictions = indexRestrictions.getCustomIndexExpressions().size();
-        for (Restrictions restrictions : indexRestrictions.getRestrictions())
+        int numberOfRestrictions = filterRestrictions.getCustomIndexExpressions().size();
+        for (Restrictions restrictions : filterRestrictions.getRestrictions())
             numberOfRestrictions += restrictions.size();
 
         return numberOfRestrictions > 1
@@ -813,10 +777,10 @@
     public boolean hasAllPKColumnsRestrictedByEqualities()
     {
         return !isPartitionKeyRestrictionsOnToken()
-               && !hasUnrestrictedPartitionKeyComponents()
-               && (partitionKeyRestrictions.isEQ() || partitionKeyRestrictions.isIN())
-               && !hasUnrestrictedClusteringColumns()
-               && (clusteringColumnsRestrictions.isEQ() || clusteringColumnsRestrictions.isIN());
+                && !hasUnrestrictedPartitionKeyComponents()
+                && (partitionKeyRestrictions.hasOnlyEqualityRestrictions())
+                && !hasUnrestrictedClusteringColumns()
+                && (clusteringColumnsRestrictions.hasOnlyEqualityRestrictions());
     }
 
 }
diff --git a/src/java/org/apache/cassandra/cql3/restrictions/TokenFilter.java b/src/java/org/apache/cassandra/cql3/restrictions/TokenFilter.java
index 3258b26..66e0cfb 100644
--- a/src/java/org/apache/cassandra/cql3/restrictions/TokenFilter.java
+++ b/src/java/org/apache/cassandra/cql3/restrictions/TokenFilter.java
@@ -25,12 +25,15 @@
 import com.google.common.collect.Range;
 import com.google.common.collect.RangeSet;
 
+import org.apache.cassandra.config.ColumnDefinition;
 import org.apache.cassandra.cql3.QueryOptions;
+import org.apache.cassandra.cql3.functions.Function;
 import org.apache.cassandra.cql3.statements.Bound;
-import org.apache.cassandra.db.*;
+import org.apache.cassandra.db.filter.RowFilter;
 import org.apache.cassandra.dht.IPartitioner;
 import org.apache.cassandra.dht.Token;
 import org.apache.cassandra.exceptions.InvalidRequestException;
+import org.apache.cassandra.index.SecondaryIndexManager;
 
 import static org.apache.cassandra.cql3.statements.Bound.END;
 import static org.apache.cassandra.cql3.statements.Bound.START;
@@ -38,12 +41,12 @@
 /**
  * <code>Restriction</code> decorator used to merge non-token restriction and token restriction on partition keys.
  */
-final class TokenFilter extends ForwardingPrimaryKeyRestrictions
+final class TokenFilter implements PartitionKeyRestrictions
 {
     /**
      * The decorated restriction
      */
-    private PrimaryKeyRestrictions restrictions;
+    private PartitionKeyRestrictions restrictions;
 
     /**
      * The restriction on the token
@@ -55,10 +58,14 @@
      */
     private final IPartitioner partitioner;
 
-    @Override
-    protected PrimaryKeyRestrictions getDelegate()
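+    // When the query is effectively driven by the token restriction (isOnToken()), the wrapped column restrictions do
+    // not describe how partitions are selected, so the IN / equality-only checks conservatively report false.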
+    public boolean hasIN()
     {
-        return restrictions;
+        return isOnToken() ? false : restrictions.hasIN();
+    }
+
+    public boolean hasOnlyEqualityRestrictions()
+    {
+        return isOnToken() ? false : restrictions.hasOnlyEqualityRestrictions();
     }
 
     @Override
@@ -69,7 +76,7 @@
         return restrictions.size() < tokenRestriction.size();
     }
 
-    public TokenFilter(PrimaryKeyRestrictions restrictions, TokenRestriction tokenRestriction)
+    public TokenFilter(PartitionKeyRestrictions restrictions, TokenRestriction tokenRestriction)
     {
         this.restrictions = restrictions;
         this.tokenRestriction = tokenRestriction;
@@ -83,18 +90,12 @@
     }
 
     @Override
-    public NavigableSet<Clustering> valuesAsClustering(QueryOptions options) throws InvalidRequestException
-    {
-        throw new UnsupportedOperationException();
-    }
-
-    @Override
-    public PrimaryKeyRestrictions mergeWith(Restriction restriction) throws InvalidRequestException
+    public PartitionKeyRestrictions mergeWith(Restriction restriction) throws InvalidRequestException
     {
         if (restriction.isOnToken())
             return new TokenFilter(restrictions, (TokenRestriction) tokenRestriction.mergeWith(restriction));
 
-        return new TokenFilter(super.mergeWith(restriction), tokenRestriction);
+        return new TokenFilter(restrictions.mergeWith(restriction), tokenRestriction);
     }
 
     @Override
@@ -115,12 +116,6 @@
         return tokenRestriction.bounds(bound, options);
     }
 
-    @Override
-    public NavigableSet<Slice.Bound> boundsAsClustering(Bound bound, QueryOptions options) throws InvalidRequestException
-    {
-        return tokenRestriction.boundsAsClustering(bound, options);
-    }
-
     /**
      * Filter the values returned by the restriction.
      *
@@ -233,4 +228,52 @@
     {
         return inclusive ? BoundType.CLOSED : BoundType.OPEN;
     }
+
+    @Override
+    public ColumnDefinition getFirstColumn()
+    {
+        return restrictions.getFirstColumn();
+    }
+
+    @Override
+    public ColumnDefinition getLastColumn()
+    {
+        return restrictions.getLastColumn();
+    }
+
+    @Override
+    public List<ColumnDefinition> getColumnDefs()
+    {
+        return restrictions.getColumnDefs();
+    }
+
+    @Override
+    public void addFunctionsTo(List<Function> functions)
+    {
+        restrictions.addFunctionsTo(functions);
+    }
+
+    @Override
+    public boolean hasSupportingIndex(SecondaryIndexManager indexManager)
+    {
+        return restrictions.hasSupportingIndex(indexManager);
+    }
+
+    @Override
+    public void addRowFilterTo(RowFilter filter, SecondaryIndexManager indexManager, QueryOptions options)
+    {
+        restrictions.addRowFilterTo(filter, indexManager, options);
+    }
+
+    @Override
+    public boolean isEmpty()
+    {
+        return restrictions.isEmpty();
+    }
+
+    @Override
+    public int size()
+    {
+        return restrictions.size();
+    }
 }
diff --git a/src/java/org/apache/cassandra/cql3/restrictions/TokenRestriction.java b/src/java/org/apache/cassandra/cql3/restrictions/TokenRestriction.java
index 14d2cb7..a71c64c 100644
--- a/src/java/org/apache/cassandra/cql3/restrictions/TokenRestriction.java
+++ b/src/java/org/apache/cassandra/cql3/restrictions/TokenRestriction.java
@@ -28,9 +28,6 @@
 import org.apache.cassandra.cql3.Term;
 import org.apache.cassandra.cql3.functions.Function;
 import org.apache.cassandra.cql3.statements.Bound;
-import org.apache.cassandra.db.Clustering;
-import org.apache.cassandra.db.MultiCBuilder;
-import org.apache.cassandra.db.Slice;
 import org.apache.cassandra.db.filter.RowFilter;
 import org.apache.cassandra.exceptions.InvalidRequestException;
 import org.apache.cassandra.index.SecondaryIndexManager;
@@ -40,14 +37,14 @@
 /**
  * <code>Restriction</code> using the token function.
  */
-public abstract class TokenRestriction extends AbstractPrimaryKeyRestrictions
+public abstract class TokenRestriction implements PartitionKeyRestrictions
 {
     /**
      * The definition of the columns to which apply the token restriction.
      */
     protected final List<ColumnDefinition> columnDefs;
 
-    final CFMetaData metadata;
+    protected final CFMetaData metadata;
 
     /**
      * Creates a new <code>TokenRestriction</code> that apply to the specified columns.
@@ -56,11 +53,25 @@
      */
     public TokenRestriction(CFMetaData metadata, List<ColumnDefinition> columnDefs)
     {
-        super(metadata.getKeyValidatorAsClusteringComparator());
         this.columnDefs = columnDefs;
         this.metadata = metadata;
     }
 
+    public boolean isSlice()
+    {
+        return false;
+    }
+
+    public boolean hasIN()
+    {
+        return false;
+    }
+
+    public boolean hasOnlyEqualityRestrictions()
+    {
+        return false;
+    }
+
     @Override
     public  boolean isOnToken()
     {
@@ -98,21 +109,15 @@
     }
 
     @Override
-    public MultiCBuilder appendTo(MultiCBuilder builder, QueryOptions options)
+    public final boolean isEmpty()
     {
-        throw new UnsupportedOperationException();
+        return getColumnDefs().isEmpty();
     }
 
     @Override
-    public NavigableSet<Clustering> valuesAsClustering(QueryOptions options) throws InvalidRequestException
+    public final int size()
     {
-        throw new UnsupportedOperationException();
-    }
-
-    @Override
-    public NavigableSet<Slice.Bound> boundsAsClustering(Bound bound, QueryOptions options) throws InvalidRequestException
-    {
-        throw new UnsupportedOperationException();
+        return getColumnDefs().size();
     }
 
     /**
@@ -126,10 +131,10 @@
     }
 
     @Override
-    public final PrimaryKeyRestrictions mergeWith(Restriction otherRestriction) throws InvalidRequestException
+    public final PartitionKeyRestrictions mergeWith(Restriction otherRestriction) throws InvalidRequestException
     {
         if (!otherRestriction.isOnToken())
-            return new TokenFilter(toPrimaryKeyRestriction(otherRestriction), this);
+            return new TokenFilter(toPartitionKeyRestrictions(otherRestriction), this);
 
         return doMergeWith((TokenRestriction) otherRestriction);
     }
@@ -138,21 +143,21 @@
      * Merges this restriction with the specified <code>TokenRestriction</code>.
      * @param otherRestriction the <code>TokenRestriction</code> to merge with.
      */
-    protected abstract PrimaryKeyRestrictions doMergeWith(TokenRestriction otherRestriction) throws InvalidRequestException;
+    protected abstract PartitionKeyRestrictions doMergeWith(TokenRestriction otherRestriction) throws InvalidRequestException;
 
     /**
-     * Converts the specified restriction into a <code>PrimaryKeyRestrictions</code>.
+     * Converts the specified restriction into a <code>PartitionKeyRestrictions</code>.
      *
      * @param restriction the restriction to convert
-     * @return a <code>PrimaryKeyRestrictions</code>
+     * @return a <code>PartitionKeyRestrictions</code>
      * @throws InvalidRequestException if a problem occurs while converting the restriction
      */
-    private PrimaryKeyRestrictions toPrimaryKeyRestriction(Restriction restriction) throws InvalidRequestException
+    private PartitionKeyRestrictions toPartitionKeyRestrictions(Restriction restriction) throws InvalidRequestException
     {
-        if (restriction instanceof PrimaryKeyRestrictions)
-            return (PrimaryKeyRestrictions) restriction;
+        if (restriction instanceof PartitionKeyRestrictions)
+            return (PartitionKeyRestrictions) restriction;
 
-        return new PrimaryKeyRestrictionSet(comparator, true).mergeWith(restriction);
+        return new PartitionKeySingleRestrictionSet(metadata.getKeyValidatorAsClusteringComparator()).mergeWith(restriction);
     }
 
     public static final class EQRestriction extends TokenRestriction
@@ -166,25 +171,37 @@
         }
 
         @Override
-        public boolean isEQ()
-        {
-            return true;
-        }
-
-        @Override
         public void addFunctionsTo(List<Function> functions)
         {
             value.addFunctionsTo(functions);
         }
 
         @Override
-        protected PrimaryKeyRestrictions doMergeWith(TokenRestriction otherRestriction) throws InvalidRequestException
+        protected PartitionKeyRestrictions doMergeWith(TokenRestriction otherRestriction) throws InvalidRequestException
         {
             throw invalidRequest("%s cannot be restricted by more than one relation if it includes an Equal",
                                  Joiner.on(", ").join(ColumnDefinition.toIdentifiers(columnDefs)));
         }
 
         @Override
+        public List<ByteBuffer> bounds(Bound b, QueryOptions options) throws InvalidRequestException
+        {
+            return values(options);
+        }
+
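+        // A token EQ restriction acts as an inclusive bound on both sides: each bound exists and
+        // collapses to the single restricted value (see bounds() above).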
+        @Override
+        public boolean hasBound(Bound b)
+        {
+            return true;
+        }
+
+        @Override
+        public boolean isInclusive(Bound b)
+        {
+            return true;
+        }
+
+        @Override
         public List<ByteBuffer> values(QueryOptions options) throws InvalidRequestException
         {
             return Collections.singletonList(value.bindAndGet(options));
@@ -238,10 +255,10 @@
         }
 
         @Override
-        protected PrimaryKeyRestrictions doMergeWith(TokenRestriction otherRestriction)
+        protected PartitionKeyRestrictions doMergeWith(TokenRestriction otherRestriction)
         throws InvalidRequestException
         {
-            if (!otherRestriction.isSlice())
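+            // Only another token slice can be merged here; anything else would mix an equality
+            // with an inequality relation on the same columns.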
+            if (!(otherRestriction instanceof SliceRestriction))
                 throw invalidRequest("Columns \"%s\" cannot be restricted by both an equality and an inequality relation",
                                      getColumnNamesAsString());
 
diff --git a/src/java/org/apache/cassandra/cql3/selection/AbstractFunctionSelector.java b/src/java/org/apache/cassandra/cql3/selection/AbstractFunctionSelector.java
index c48b93c..498cf0f 100644
--- a/src/java/org/apache/cassandra/cql3/selection/AbstractFunctionSelector.java
+++ b/src/java/org/apache/cassandra/cql3/selection/AbstractFunctionSelector.java
@@ -22,12 +22,11 @@
 import java.util.List;
 
 import org.apache.commons.lang3.text.StrBuilder;
-
-import org.apache.cassandra.cql3.functions.AggregateFcts;
-
 import org.apache.cassandra.config.ColumnDefinition;
 import org.apache.cassandra.cql3.ColumnSpecification;
+import org.apache.cassandra.cql3.QueryOptions;
 import org.apache.cassandra.cql3.functions.Function;
+import org.apache.cassandra.cql3.statements.RequestValidations;
 import org.apache.cassandra.db.marshal.AbstractType;
 import org.apache.cassandra.exceptions.InvalidRequestException;
 
@@ -39,7 +38,7 @@
      * The list used to pass the function arguments is recycled to avoid the cost of instantiating a new list
      * with each function call.
      */
-    protected final List<ByteBuffer> args;
+    private final List<ByteBuffer> args;
     protected final List<Selector> argSelectors;
 
     public static Factory newFactory(final Function fun, final SelectorFactories factories) throws InvalidRequestException
@@ -54,13 +53,7 @@
         {
             protected String getColumnName()
             {
-                if (AggregateFcts.isCountRows(fun))
-                    return "count";
-
-                return new StrBuilder(fun.name().toString()).append('(')
-                                                            .appendWithSeparators(factories.getColumnNames(), ", ")
-                                                            .append(')')
-                                                            .toString();
+                return fun.columnName(factories.getColumnNames());
             }
 
             protected AbstractType<?> getReturnType()
@@ -89,10 +82,10 @@
                 factories.addFunctionsTo(functions);
             }
 
-            public Selector newInstance() throws InvalidRequestException
+            public Selector newInstance(QueryOptions options) throws InvalidRequestException
             {
-                return fun.isAggregate() ? new AggregateFunctionSelector(fun, factories.newInstances())
-                                         : new ScalarFunctionSelector(fun, factories.newInstances());
+                return fun.isAggregate() ? new AggregateFunctionSelector(fun, factories.newInstances(options))
+                                         : new ScalarFunctionSelector(fun, factories.newInstances(options));
             }
 
             public boolean isWritetimeSelectorFactory()
@@ -119,6 +112,19 @@
         this.args = Arrays.asList(new ByteBuffer[argSelectors.size()]);
     }
 
+    // Sets the given arg value. We should use this rather than setting the args list directly,
+    // so that unset bind values are rejected with a proper validation error.
+    protected void setArg(int i, ByteBuffer value) throws InvalidRequestException
+    {
+        RequestValidations.checkBindValueSet(value, "Invalid unset value for argument in call to function %s", fun.name().name);
+        args.set(i, value);
+    }
+
+    protected List<ByteBuffer> args()
+    {
+        return args;
+    }
+
     public AbstractType<?> getType()
     {
         return fun.returnType();
diff --git a/src/java/org/apache/cassandra/cql3/selection/AggregateFunctionSelector.java b/src/java/org/apache/cassandra/cql3/selection/AggregateFunctionSelector.java
index 27a8294..d768665 100644
--- a/src/java/org/apache/cassandra/cql3/selection/AggregateFunctionSelector.java
+++ b/src/java/org/apache/cassandra/cql3/selection/AggregateFunctionSelector.java
@@ -41,10 +41,10 @@
         {
             Selector s = argSelectors.get(i);
             s.addInput(protocolVersion, rs);
-            args.set(i, s.getOutput(protocolVersion));
+            setArg(i, s.getOutput(protocolVersion));
             s.reset();
         }
-        this.aggregate.addInput(protocolVersion, args);
+        this.aggregate.addInput(protocolVersion, args());
     }
 
     public ByteBuffer getOutput(int protocolVersion) throws InvalidRequestException
diff --git a/src/java/org/apache/cassandra/cql3/selection/FieldSelector.java b/src/java/org/apache/cassandra/cql3/selection/FieldSelector.java
index 63b6cc6..55ff50f 100644
--- a/src/java/org/apache/cassandra/cql3/selection/FieldSelector.java
+++ b/src/java/org/apache/cassandra/cql3/selection/FieldSelector.java
@@ -20,6 +20,7 @@
 import java.nio.ByteBuffer;
 
 import org.apache.cassandra.cql3.ColumnSpecification;
+import org.apache.cassandra.cql3.QueryOptions;
 import org.apache.cassandra.cql3.selection.Selection.ResultSetBuilder;
 import org.apache.cassandra.db.marshal.AbstractType;
 import org.apache.cassandra.db.marshal.UTF8Type;
@@ -38,9 +39,7 @@
         {
             protected String getColumnName()
             {
-                return String.format("%s.%s",
-                                     factory.getColumnName(),
-                                     UTF8Type.instance.getString(type.fieldName(field)));
+                return String.format("%s.%s", factory.getColumnName(), type.fieldName(field));
             }
 
             protected AbstractType<?> getReturnType()
@@ -53,9 +52,9 @@
                 factory.addColumnMapping(mapping, resultsColumn);
             }
 
-            public Selector newInstance() throws InvalidRequestException
+            public Selector newInstance(QueryOptions options) throws InvalidRequestException
             {
-                return new FieldSelector(type, field, factory.newInstance());
+                return new FieldSelector(type, field, factory.newInstance(options));
             }
 
             public boolean isAggregateSelectorFactory()
@@ -65,11 +64,6 @@
         };
     }
 
-    public boolean isAggregate()
-    {
-        return false;
-    }
-
     public void addInput(int protocolVersion, ResultSetBuilder rs) throws InvalidRequestException
     {
         selected.addInput(protocolVersion, rs);
@@ -97,7 +91,7 @@
     @Override
     public String toString()
     {
-        return String.format("%s.%s", selected, UTF8Type.instance.getString(type.fieldName(field)));
+        return String.format("%s.%s", selected, type.fieldName(field));
     }
 
     private FieldSelector(UserType type, int field, Selector selected)
diff --git a/src/java/org/apache/cassandra/cql3/selection/ScalarFunctionSelector.java b/src/java/org/apache/cassandra/cql3/selection/ScalarFunctionSelector.java
index bb56bb8..50175c1 100644
--- a/src/java/org/apache/cassandra/cql3/selection/ScalarFunctionSelector.java
+++ b/src/java/org/apache/cassandra/cql3/selection/ScalarFunctionSelector.java
@@ -54,14 +54,14 @@
         for (int i = 0, m = argSelectors.size(); i < m; i++)
         {
             Selector s = argSelectors.get(i);
-            args.set(i, s.getOutput(protocolVersion));
+            setArg(i, s.getOutput(protocolVersion));
             s.reset();
         }
-        return fun.execute(protocolVersion, args);
+        return fun.execute(protocolVersion, args());
     }
 
     ScalarFunctionSelector(Function fun, List<Selector> argSelectors)
     {
         super((ScalarFunction) fun, argSelectors);
     }
-}
\ No newline at end of file
+}
diff --git a/src/java/org/apache/cassandra/cql3/selection/Selectable.java b/src/java/org/apache/cassandra/cql3/selection/Selectable.java
index 717fe7c..1f1f07b 100644
--- a/src/java/org/apache/cassandra/cql3/selection/Selectable.java
+++ b/src/java/org/apache/cassandra/cql3/selection/Selectable.java
@@ -19,24 +19,42 @@
 package org.apache.cassandra.cql3.selection;
 
 import java.util.ArrayList;
+import java.util.Collections;
 import java.util.List;
+import java.nio.ByteBuffer;
 
 import org.apache.commons.lang3.text.StrBuilder;
-
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.ColumnDefinition;
-import org.apache.cassandra.cql3.ColumnIdentifier;
+import org.apache.cassandra.cql3.*;
 import org.apache.cassandra.cql3.functions.*;
-import org.apache.cassandra.db.marshal.AbstractType;
-import org.apache.cassandra.db.marshal.UserType;
+import org.apache.cassandra.db.marshal.*;
 import org.apache.cassandra.exceptions.InvalidRequestException;
 
-public abstract class Selectable
+public interface Selectable extends AssignmentTestable
 {
-    public abstract Selector.Factory newSelectorFactory(CFMetaData cfm, List<ColumnDefinition> defs)
-            throws InvalidRequestException;
+    public Selector.Factory newSelectorFactory(CFMetaData cfm, AbstractType<?> expectedType, List<ColumnDefinition> defs, VariableSpecifications boundNames);
 
-    protected static int addAndGetIndex(ColumnDefinition def, List<ColumnDefinition> l)
+    /**
+     * The type of the {@code Selectable} if it can be inferred.
+     *
+     * @param keyspace the keyspace of the statement this {@code Selectable} is part of.
+     * @return the type of this {@code Selectable} if inferable, or {@code null}
+     * otherwise (for instance, the type isn't inferable for a bind marker. Even for
+     * literals, the exact type is not inferable since they are valid for many
+     * different types, so this will return {@code null} for them too).
+     */
+    public AbstractType<?> getExactTypeIfKnown(String keyspace);
+
+    // Term.Raw overrides this since some literals can be WEAKLY_ASSIGNABLE
+    default public TestResult testAssignment(String keyspace, ColumnSpecification receiver)
+    {
+        AbstractType<?> type = getExactTypeIfKnown(keyspace);
+        return type == null ? TestResult.NOT_ASSIGNABLE : type.testAssignment(keyspace, receiver);
+    }
+
+    default int addAndGetIndex(ColumnDefinition def, List<ColumnDefinition> l)
     {
         int idx = l.indexOf(def);
         if (idx < 0)
@@ -47,57 +65,160 @@
         return idx;
     }
 
-    public static interface Raw
+    public static abstract class Raw
     {
-        public Selectable prepare(CFMetaData cfm);
+        public abstract Selectable prepare(CFMetaData cfm);
 
         /**
          * Returns true if any processing is performed on the selected column.
          **/
-        public boolean processesSelection();
+        public boolean processesSelection()
+        {
+            // ColumnIdentifier is the only case that returns false and overrides this method
+            return true;
+        }
     }
 
-    public static class WritetimeOrTTL extends Selectable
+    public static class WithTerm implements Selectable
     {
-        public final ColumnIdentifier id;
+        /**
+         * The name given to unnamed bind markers found in a selection. In a selection clause we often don't have a good
+         * name for bind markers; typically, if you have:
+         *   SELECT (int)? FROM foo;
+         * there isn't a good name for that marker. So we give the same name to all the markers. Note that we could try
+         * to differentiate the names by appending an increasing number (so [selection_1], [selection_2], ...),
+         * but that's actually not trivial to do in the current code and it isn't really more helpful: if users want
+         * to bind by position (which they will have to in this case), they can do so at the driver level directly, so
+         * we don't bother.
+         * Note that users should really be using named bind markers if they want to be able to bind by name.
+         */
+        private static final ColumnIdentifier bindMarkerNameInSelection = new ColumnIdentifier("[selection]", true);
+
+        private final Term.Raw rawTerm;
+
+        public WithTerm(Term.Raw rawTerm)
+        {
+            this.rawTerm = rawTerm;
+        }
+
+        @Override
+        public TestResult testAssignment(String keyspace, ColumnSpecification receiver)
+        {
+            return rawTerm.testAssignment(keyspace, receiver);
+        }
+
+        public Selector.Factory newSelectorFactory(CFMetaData cfm, AbstractType<?> expectedType, List<ColumnDefinition> defs, VariableSpecifications boundNames) throws InvalidRequestException
+        {
+            /*
+             * expectedType will be null if we have no constraint on what the type should be. For instance, if this term is a bind marker:
+             *   - it will be null if we do "SELECT ? FROM foo"
+             *   - it won't be null (it will be LongType) if we do "SELECT bigintAsBlob(?) FROM foo" because the function constrains it.
+             *
+             * In the first case, we have to error out: we need to infer the type of the SELECT's result metadata at preparation time, which we can't
+             * do here (users will have to do "SELECT (varint)? FROM foo" for instance).
+             * But in the 2nd case, we're fine and can use the expectedType to "prepare" the bind marker/collect the bound type.
+             *
+             * Further, the term might not be a bind marker, in which case we sometimes can default to some most-general type. For instance, in
+             *   SELECT 3 FROM foo
+             * we'll just default the type to 'varint' as that's the most generic type for the literal '3' (this is mostly for convenience; the query
+             * is not terribly useful in practice, and users can force the type, as in the bind marker case, through "SELECT (int)3 FROM foo").
+             * But note that not all literals can have such a default type. For instance, there is no way to infer the type of a UDT literal in a vacuum,
+             * and so we simply error out if we have something like:
+             *   SELECT { foo: 'bar' } FROM foo
+             *
+             * Lastly, note that if the term is a terminal literal, we don't have to check its compatibility with 'expectedType' as any incompatibility
+             * would have been found at preparation time.
+             */
+            AbstractType<?> type = getExactTypeIfKnown(cfm.ksName);
+            if (type == null)
+            {
+                type = expectedType;
+                if (type == null)
+                    throw new InvalidRequestException("Cannot infer type for term " + this + " in selection clause (try using a cast to force a type)");
+            }
+
+            // The fact that we default the name to "[selection]" unconditionally means that any bind marker in a
+            // selection will have this name. Which isn't terribly helpful, but it's unclear how to provide
+            // something a lot more helpful, and in practice users can bind those markers by position or, even better,
+            // use named bind markers.
+            Term term = rawTerm.prepare(cfm.ksName, new ColumnSpecification(cfm.ksName, cfm.cfName, bindMarkerNameInSelection, type));
+            term.collectMarkerSpecification(boundNames);
+            return TermSelector.newFactory(rawTerm.getText(), term, type);
+        }
+
+        @Override
+        public AbstractType<?> getExactTypeIfKnown(String keyspace)
+        {
+            return rawTerm.getExactTypeIfKnown(keyspace);
+        }
+
+        @Override
+        public String toString()
+        {
+            return rawTerm.toString();
+        }
+
+        public static class Raw extends Selectable.Raw
+        {
+            private final Term.Raw term;
+
+            public Raw(Term.Raw term)
+            {
+                this.term = term;
+            }
+
+            public Selectable prepare(CFMetaData cfm)
+            {
+                return new WithTerm(term);
+            }
+        }
+    }
+
+    public static class WritetimeOrTTL implements Selectable
+    {
+        public final ColumnDefinition column;
         public final boolean isWritetime;
 
-        public WritetimeOrTTL(ColumnIdentifier id, boolean isWritetime)
+        public WritetimeOrTTL(ColumnDefinition column, boolean isWritetime)
         {
-            this.id = id;
+            this.column = column;
             this.isWritetime = isWritetime;
         }
 
         @Override
         public String toString()
         {
-            return (isWritetime ? "writetime" : "ttl") + "(" + id + ")";
+            return (isWritetime ? "writetime" : "ttl") + "(" + column.name + ")";
         }
 
         public Selector.Factory newSelectorFactory(CFMetaData cfm,
-                                                   List<ColumnDefinition> defs) throws InvalidRequestException
+                                                   AbstractType<?> expectedType,
+                                                   List<ColumnDefinition> defs,
+                                                   VariableSpecifications boundNames)
         {
-            ColumnDefinition def = cfm.getColumnDefinition(id);
-            if (def == null)
-                throw new InvalidRequestException(String.format("Undefined name %s in selection clause", id));
-            if (def.isPrimaryKeyColumn())
+            if (column.isPrimaryKeyColumn())
                 throw new InvalidRequestException(
                         String.format("Cannot use selection function %s on PRIMARY KEY part %s",
                                       isWritetime ? "writeTime" : "ttl",
-                                      def.name));
-            if (def.type.isCollection())
+                                      column.name));
+            if (column.type.isCollection())
                 throw new InvalidRequestException(String.format("Cannot use selection function %s on collections",
                                                                 isWritetime ? "writeTime" : "ttl"));
 
-            return WritetimeOrTTLSelector.newFactory(def, addAndGetIndex(def, defs), isWritetime);
+            return WritetimeOrTTLSelector.newFactory(column, addAndGetIndex(column, defs), isWritetime);
         }
 
-        public static class Raw implements Selectable.Raw
+        public AbstractType<?> getExactTypeIfKnown(String keyspace)
         {
-            private final ColumnIdentifier.Raw id;
+            return isWritetime ? LongType.instance : Int32Type.instance;
+        }
+
+        public static class Raw extends Selectable.Raw
+        {
+            private final ColumnDefinition.Raw id;
             private final boolean isWritetime;
 
-            public Raw(ColumnIdentifier.Raw id, boolean isWritetime)
+            public Raw(ColumnDefinition.Raw id, boolean isWritetime)
             {
                 this.id = id;
                 this.isWritetime = isWritetime;
@@ -107,59 +228,42 @@
             {
                 return new WritetimeOrTTL(id.prepare(cfm), isWritetime);
             }
-
-            public boolean processesSelection()
-            {
-                return true;
-            }
         }
     }
 
-    public static class WithFunction extends Selectable
+    public static class WithFunction implements Selectable
     {
-        public final FunctionName functionName;
+        public final Function function;
         public final List<Selectable> args;
 
-        public WithFunction(FunctionName functionName, List<Selectable> args)
+        public WithFunction(Function function, List<Selectable> args)
         {
-            this.functionName = functionName;
+            this.function = function;
             this.args = args;
         }
 
         @Override
         public String toString()
         {
-            return new StrBuilder().append(functionName)
+            return new StrBuilder().append(function.name())
                                    .append("(")
                                    .appendWithSeparators(args, ", ")
                                    .append(")")
                                    .toString();
         }
 
-        public Selector.Factory newSelectorFactory(CFMetaData cfm,
-                                                   List<ColumnDefinition> defs) throws InvalidRequestException
+        public Selector.Factory newSelectorFactory(CFMetaData cfm, AbstractType<?> expectedType, List<ColumnDefinition> defs, VariableSpecifications boundNames)
         {
-            SelectorFactories factories  =
-                    SelectorFactories.createFactoriesAndCollectColumnDefinitions(args, cfm, defs);
-
-            // We need to circumvent the normal function lookup process for toJson() because instances of the function
-            // are not pre-declared (because it can accept any type of argument).
-            Function fun;
-            if (functionName.equalsNativeFunction(ToJsonFct.NAME))
-                fun = ToJsonFct.getInstance(factories.getReturnTypes());
-            else
-                fun = FunctionResolver.get(cfm.ksName, functionName, factories.newInstances(), cfm.ksName, cfm.cfName, null);
-
-            if (fun == null)
-                throw new InvalidRequestException(String.format("Unknown function '%s'", functionName));
-            if (fun.returnType() == null)
-                throw new InvalidRequestException(String.format("Unknown function %s called in selection clause",
-                                                                functionName));
-
-            return AbstractFunctionSelector.newFactory(fun, factories);
+            SelectorFactories factories = SelectorFactories.createFactoriesAndCollectColumnDefinitions(args, function.argTypes(), cfm, defs, boundNames);
+            return AbstractFunctionSelector.newFactory(function, factories);
         }
 
-        public static class Raw implements Selectable.Raw
+        public AbstractType<?> getExactTypeIfKnown(String keyspace)
+        {
+            return function.returnType();
+        }
+
+        public static class Raw extends Selectable.Raw
         {
             private final FunctionName functionName;
             private final List<Selectable.Raw> args;
@@ -170,27 +274,153 @@
                 this.args = args;
             }
 
-            public WithFunction prepare(CFMetaData cfm)
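+            // COUNT(*) is represented as the pre-declared countRowsFunction aggregate with no arguments.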
+            public static Raw newCountRowsFunction()
+            {
+                return new Raw(AggregateFcts.countRowsFunction.name(),
+                               Collections.emptyList());
+            }
+
+            public Selectable prepare(CFMetaData cfm)
             {
                 List<Selectable> preparedArgs = new ArrayList<>(args.size());
                 for (Selectable.Raw arg : args)
                     preparedArgs.add(arg.prepare(cfm));
-                return new WithFunction(functionName, preparedArgs);
-            }
 
-            public boolean processesSelection()
-            {
-                return true;
+                FunctionName name = functionName;
+                // We need to circumvent the normal function lookup process for toJson() because instances of the function
+                // are not pre-declared (because it can accept any type of argument). We also have to wait until we have the
+                // selector factories of the arguments so we can access their final types.
+                if (functionName.equalsNativeFunction(ToJsonFct.NAME))
+                {
+                    return new WithToJSonFunction(preparedArgs);
+                }
+                // Also, COUNT(x) is equivalent to COUNT(*) for any non-null term x (count(x) doesn't care about its argument beyond checking for nullness), and
+                // for backward compatibility we want to support COUNT(1). But we actually have a COUNT(x) method for every existing (simple) input type, so COUNT(1)
+                // would currently throw as ambiguous (since 1 works for any type). So we have to special-case COUNT.
+                else if (functionName.equalsNativeFunction(FunctionName.nativeFunction("count"))
+                        && preparedArgs.size() == 1
+                        && (preparedArgs.get(0) instanceof WithTerm)
+                        && (((WithTerm)preparedArgs.get(0)).rawTerm instanceof Constants.Literal))
+                {
+                    // Note that 'null' isn't a Constants.Literal
+                    name = AggregateFcts.countRowsFunction.name();
+                    preparedArgs = Collections.emptyList();
+                }
+
+                Function fun = FunctionResolver.get(cfm.ksName, name, preparedArgs, cfm.ksName, cfm.cfName, null);
+
+                if (fun == null)
+                    throw new InvalidRequestException(String.format("Unknown function '%s'", functionName));
+
+                if (fun.returnType() == null)
+                    throw new InvalidRequestException(String.format("Unknown function %s called in selection clause", functionName));
+
+                return new WithFunction(fun, preparedArgs);
             }
         }
     }
 
-    public static class WithFieldSelection extends Selectable
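+    // toJson() is special-cased: its instances are not pre-declared (it accepts arguments of any
+    // type), so the argument types are taken from the selector factories at selection time.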
+    public static class WithToJSonFunction implements Selectable
+    {
+        public final List<Selectable> args;
+
+        private WithToJSonFunction(List<Selectable> args)
+        {
+            this.args = args;
+        }
+
+        @Override
+        public String toString()
+        {
+            return new StrBuilder().append(ToJsonFct.NAME)
+                                   .append("(")
+                                   .appendWithSeparators(args, ", ")
+                                   .append(")")
+                                   .toString();
+        }
+
+        public Selector.Factory newSelectorFactory(CFMetaData cfm, AbstractType<?> expectedType, List<ColumnDefinition> defs, VariableSpecifications boundNames)
+        {
+            SelectorFactories factories = SelectorFactories.createFactoriesAndCollectColumnDefinitions(args, null, cfm, defs, boundNames);
+            Function fun = ToJsonFct.getInstance(factories.getReturnTypes());
+            return AbstractFunctionSelector.newFactory(fun, factories);
+        }
+
+        public AbstractType<?> getExactTypeIfKnown(String keyspace)
+        {
+            return UTF8Type.instance;
+        }
+    }
+
+    public static class WithCast implements Selectable
+    {
+        private final CQL3Type type;
+        private final Selectable arg;
+
+        public WithCast(Selectable arg, CQL3Type type)
+        {
+            this.arg = arg;
+            this.type = type;
+        }
+
+        @Override
+        public String toString()
+        {
+            return String.format("cast(%s as %s)", arg, type.toString().toLowerCase());
+        }
+
+        public Selector.Factory newSelectorFactory(CFMetaData cfm, AbstractType<?> expectedType, List<ColumnDefinition> defs, VariableSpecifications boundNames)
+        {
+            List<Selectable> args = Collections.singletonList(arg);
+            SelectorFactories factories = SelectorFactories.createFactoriesAndCollectColumnDefinitions(args, null, cfm, defs, boundNames);
+
+            Selector.Factory factory = factories.get(0);
+
+            // If the user is casting a value to its own type, we simply ignore the cast.
+            if (type.getType().equals(factory.getReturnType()))
+                return factory;
+
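+            // Otherwise, resolve the cast through the corresponding native cast function (see CastFcts).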
+            FunctionName name = FunctionName.nativeFunction(CastFcts.getFunctionName(type));
+            Function fun = FunctionResolver.get(cfm.ksName, name, args, cfm.ksName, cfm.cfName, null);
+
+            if (fun == null)
+            {
+                throw new InvalidRequestException(String.format("%s cannot be cast to %s",
+                                                                defs.get(0).name,
+                                                                type));
+            }
+            return AbstractFunctionSelector.newFactory(fun, factories);
+        }
+
+        public AbstractType<?> getExactTypeIfKnown(String keyspace)
+        {
+            return type.getType();
+        }
+
+        public static class Raw extends Selectable.Raw
+        {
+            private final CQL3Type type;
+            private final Selectable.Raw arg;
+
+            public Raw(Selectable.Raw arg, CQL3Type type)
+            {
+                this.arg = arg;
+                this.type = type;
+            }
+
+            public WithCast prepare(CFMetaData cfm)
+            {
+                return new WithCast(arg.prepare(cfm), type);
+            }
+        }
+    }
+
+    public static class WithFieldSelection implements Selectable
     {
         public final Selectable selected;
-        public final ColumnIdentifier field;
+        public final FieldIdentifier field;
 
-        public WithFieldSelection(Selectable selected, ColumnIdentifier field)
+        public WithFieldSelection(Selectable selected, FieldIdentifier field)
         {
             this.selected = selected;
             this.field = field;
@@ -202,36 +432,49 @@
             return String.format("%s.%s", selected, field);
         }
 
-        public Selector.Factory newSelectorFactory(CFMetaData cfm,
-                                                   List<ColumnDefinition> defs) throws InvalidRequestException
+        public Selector.Factory newSelectorFactory(CFMetaData cfm, AbstractType<?> expectedType, List<ColumnDefinition> defs, VariableSpecifications boundNames)
         {
-            Selector.Factory factory = selected.newSelectorFactory(cfm, defs);
-            AbstractType<?> type = factory.newInstance().getType();
-            if (!(type instanceof UserType))
+            Selector.Factory factory = selected.newSelectorFactory(cfm, null, defs, boundNames);
+            AbstractType<?> type = factory.getColumnSpecification(cfm).type;
+            if (!type.isUDT())
+            {
                 throw new InvalidRequestException(
                         String.format("Invalid field selection: %s of type %s is not a user type",
-                                      selected,
-                                      type.asCQL3Type()));
+                                selected,
+                                type.asCQL3Type()));
+            }
 
             UserType ut = (UserType) type;
-            for (int i = 0; i < ut.size(); i++)
+            int fieldIndex = ut.fieldPosition(field);
+            if (fieldIndex == -1)
             {
-                if (!ut.fieldName(i).equals(field.bytes))
-                    continue;
-                return FieldSelector.newFactory(ut, i, factory);
+                throw new InvalidRequestException(String.format("%s of type %s has no field %s",
+                        selected, type.asCQL3Type(), field));
             }
-            throw new InvalidRequestException(String.format("%s of type %s has no field %s",
-                                                            selected,
-                                                            type.asCQL3Type(),
-                                                            field));
+
+            return FieldSelector.newFactory(ut, fieldIndex, factory);
         }
 
-        public static class Raw implements Selectable.Raw
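+        // The field type can only be inferred when the selected term itself has a known UDT type
+        // that declares the requested field.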
+        public AbstractType<?> getExactTypeIfKnown(String keyspace)
+        {
+            AbstractType<?> selectedType = selected.getExactTypeIfKnown(keyspace);
+            if (selectedType == null || !(selectedType instanceof UserType))
+                return null;
+
+            UserType ut = (UserType) selectedType;
+            int fieldIndex = ut.fieldPosition(field);
+            if (fieldIndex == -1)
+                return null;
+
+            return ut.fieldType(fieldIndex);
+        }
+
+        public static class Raw extends Selectable.Raw
         {
             private final Selectable.Raw selected;
-            private final ColumnIdentifier.Raw field;
+            private final FieldIdentifier field;
 
-            public Raw(Selectable.Raw selected, ColumnIdentifier.Raw field)
+            public Raw(Selectable.Raw selected, FieldIdentifier field)
             {
                 this.selected = selected;
                 this.field = field;
@@ -239,12 +482,7 @@
 
             public WithFieldSelection prepare(CFMetaData cfm)
             {
-                return new WithFieldSelection(selected.prepare(cfm), field.prepare(cfm));
-            }
-
-            public boolean processesSelection()
-            {
-                return true;
+                return new WithFieldSelection(selected.prepare(cfm), field);
             }
         }
     }
diff --git a/src/java/org/apache/cassandra/cql3/selection/Selection.java b/src/java/org/apache/cassandra/cql3/selection/Selection.java
index 8a27314..2a11d27 100644
--- a/src/java/org/apache/cassandra/cql3/selection/Selection.java
+++ b/src/java/org/apache/cassandra/cql3/selection/Selection.java
@@ -20,6 +20,7 @@
 import java.nio.ByteBuffer;
 import java.util.*;
 
+import com.google.common.base.MoreObjects;
 import com.google.common.base.Objects;
 import com.google.common.base.Predicate;
 import com.google.common.collect.Iterables;
@@ -112,14 +113,14 @@
     }
 
     /**
-     * Checks if this selection contains a collection.
+     * Checks if this selection contains a complex column.
      *
-     * @return <code>true</code> if this selection contains a collection, <code>false</code> otherwise.
+     * @return <code>true</code> if this selection contains a multicell collection or UDT, <code>false</code> otherwise.
      */
-    public boolean containsACollection()
+    public boolean containsAComplexColumn()
     {
         for (ColumnDefinition def : getColumns())
-            if (def.type.isCollection() && def.type.isMultiCell())
+            if (def.isComplex())
                 return true;
 
         return false;
@@ -168,12 +169,12 @@
         return false;
     }
 
-    public static Selection fromSelectors(CFMetaData cfm, List<RawSelector> rawSelectors) throws InvalidRequestException
+    public static Selection fromSelectors(CFMetaData cfm, List<RawSelector> rawSelectors, VariableSpecifications boundNames) throws InvalidRequestException
     {
         List<ColumnDefinition> defs = new ArrayList<>();
 
         SelectorFactories factories =
-                SelectorFactories.createFactoriesAndCollectColumnDefinitions(RawSelector.toSelectables(rawSelectors, cfm), cfm, defs);
+                SelectorFactories.createFactoriesAndCollectColumnDefinitions(RawSelector.toSelectables(rawSelectors, cfm), null, cfm, defs, boundNames);
         SelectionColumnMapping mapping = collectColumnMappings(cfm, rawSelectors, factories);
 
         return (processesSelection(rawSelectors) || rawSelectors.size() != defs.size())
@@ -220,7 +221,7 @@
         return selectionColumns;
     }
 
-    protected abstract Selectors newSelectors() throws InvalidRequestException;
+    protected abstract Selectors newSelectors(QueryOptions options) throws InvalidRequestException;
 
     /**
      * @return the list of CQL3 columns value this SelectionClause needs.
@@ -238,9 +239,9 @@
         return columnMapping;
     }
 
-    public ResultSetBuilder resultSetBuilder(boolean isJons) throws InvalidRequestException
+    public ResultSetBuilder resultSetBuilder(QueryOptions options, boolean isJson) throws InvalidRequestException
     {
-        return new ResultSetBuilder(isJons);
+        return new ResultSetBuilder(options, isJson);
     }
 
     public abstract boolean isAggregate();
@@ -248,18 +249,45 @@
     @Override
     public String toString()
     {
-        return Objects.toStringHelper(this)
-                .add("columns", columns)
-                .add("columnMapping", columnMapping)
-                .add("metadata", metadata)
-                .add("collectTimestamps", collectTimestamps)
-                .add("collectTTLs", collectTTLs)
-                .toString();
+        return MoreObjects.toStringHelper(this)
+                          .add("columns", columns)
+                          .add("columnMapping", columnMapping)
+                          .add("metadata", metadata)
+                          .add("collectTimestamps", collectTimestamps)
+                          .add("collectTTLs", collectTTLs)
+                          .toString();
+    }
+
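+    // Serializes a result row into a single JSON-encoded text value; column names that are not
+    // entirely lowercase are wrapped in double quotes so that case-sensitive names round-trip.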
+    public static List<ByteBuffer> rowToJson(List<ByteBuffer> row, int protocolVersion, ResultSet.ResultMetadata metadata)
+    {
+        StringBuilder sb = new StringBuilder("{");
+        for (int i = 0; i < metadata.names.size(); i++)
+        {
+            if (i > 0)
+                sb.append(", ");
+
+            ColumnSpecification spec = metadata.names.get(i);
+            String columnName = spec.name.toString();
+            if (!columnName.equals(columnName.toLowerCase(Locale.US)))
+                columnName = "\"" + columnName + "\"";
+
+            ByteBuffer buffer = row.get(i);
+            sb.append('"');
+            sb.append(Json.quoteAsJsonString(columnName));
+            sb.append("\": ");
+            if (buffer == null)
+                sb.append("null");
+            else
+                sb.append(spec.type.toJSONString(buffer, protocolVersion));
+        }
+        sb.append("}");
+        return Collections.singletonList(UTF8Type.instance.getSerializer().serialize(sb.toString()));
     }
 
     public class ResultSetBuilder
     {
         private final ResultSet resultSet;
+        private final int protocolVersion;
 
         /**
          * As multiple thread can access a <code>Selection</code> instance each <code>ResultSetBuilder</code> will use
@@ -281,10 +309,11 @@
 
         private final boolean isJson;
 
-        private ResultSetBuilder(boolean isJson) throws InvalidRequestException
+        private ResultSetBuilder(QueryOptions options, boolean isJson) throws InvalidRequestException
         {
             this.resultSet = new ResultSet(getResultMetadata(isJson).copy(), new ArrayList<List<ByteBuffer>>());
-            this.selectors = newSelectors();
+            this.protocolVersion = options.getProtocolVersion();
+            this.selectors = newSelectors(options);
             this.timestamps = collectTimestamps ? new long[columns.size()] : null;
             this.ttls = collectTTLs ? new int[columns.size()] : null;
             this.isJson = isJson;
@@ -334,67 +363,41 @@
                  : c.value();
         }
 
-        public void newRow(int protocolVersion) throws InvalidRequestException
+        public void newRow() throws InvalidRequestException
         {
             if (current != null)
             {
                 selectors.addInputRow(protocolVersion, this);
                 if (!selectors.isAggregate())
                 {
-                    resultSet.addRow(getOutputRow(protocolVersion));
+                    resultSet.addRow(getOutputRow());
                     selectors.reset();
                 }
             }
             current = new ArrayList<>(columns.size());
         }
 
-        public ResultSet build(int protocolVersion) throws InvalidRequestException
+        public ResultSet build() throws InvalidRequestException
         {
             if (current != null)
             {
                 selectors.addInputRow(protocolVersion, this);
-                resultSet.addRow(getOutputRow(protocolVersion));
+                resultSet.addRow(getOutputRow());
                 selectors.reset();
                 current = null;
             }
 
             if (resultSet.isEmpty() && selectors.isAggregate())
-                resultSet.addRow(getOutputRow(protocolVersion));
+                resultSet.addRow(getOutputRow());
             return resultSet;
         }
 
-        private List<ByteBuffer> getOutputRow(int protocolVersion)
+        private List<ByteBuffer> getOutputRow()
         {
             List<ByteBuffer> outputRow = selectors.getOutputRow(protocolVersion);
-            return isJson ? rowToJson(outputRow, protocolVersion)
+            return isJson ? rowToJson(outputRow, protocolVersion, metadata)
                           : outputRow;
         }
-
-        private List<ByteBuffer> rowToJson(List<ByteBuffer> row, int protocolVersion)
-        {
-            StringBuilder sb = new StringBuilder("{");
-            for (int i = 0; i < metadata.names.size(); i++)
-            {
-                if (i > 0)
-                    sb.append(", ");
-
-                ColumnSpecification spec = metadata.names.get(i);
-                String columnName = spec.name.toString();
-                if (!columnName.equals(columnName.toLowerCase(Locale.US)))
-                    columnName = "\"" + columnName + "\"";
-
-                ByteBuffer buffer = row.get(i);
-                sb.append('"');
-                sb.append(Json.quoteAsJsonString(columnName));
-                sb.append("\": ");
-                if (buffer == null)
-                    sb.append("null");
-                else
-                    sb.append(spec.type.toJSONString(buffer, protocolVersion));
-            }
-            sb.append("}");
-            return Collections.singletonList(UTF8Type.instance.getSerializer().serialize(sb.toString()));
-        }
     }
 
     private static interface Selectors
@@ -414,7 +417,7 @@
         public void reset();
     }
 
-    // Special cased selection for when no function is used (this save some allocations).
+    // Special cased selection for when only columns are selected.
     private static class SimpleSelection extends Selection
     {
         private final boolean isWildcard;
@@ -449,7 +452,7 @@
             return false;
         }
 
-        protected Selectors newSelectors()
+        protected Selectors newSelectors(QueryOptions options)
         {
             return new Selectors()
             {
@@ -530,11 +533,11 @@
             return factories.doesAggregation();
         }
 
-        protected Selectors newSelectors() throws InvalidRequestException
+        protected Selectors newSelectors(final QueryOptions options) throws InvalidRequestException
         {
             return new Selectors()
             {
-                private final List<Selector> selectors = factories.newInstances();
+                private final List<Selector> selectors = factories.newInstances(options);
 
                 public void reset()
                 {
diff --git a/src/java/org/apache/cassandra/cql3/selection/Selector.java b/src/java/org/apache/cassandra/cql3/selection/Selector.java
index 7249d22..c85dcd1 100644
--- a/src/java/org/apache/cassandra/cql3/selection/Selector.java
+++ b/src/java/org/apache/cassandra/cql3/selection/Selector.java
@@ -21,13 +21,12 @@
 import java.util.List;
 
 import org.apache.cassandra.config.CFMetaData;
-import org.apache.cassandra.cql3.AssignmentTestable;
 import org.apache.cassandra.cql3.ColumnIdentifier;
 import org.apache.cassandra.cql3.ColumnSpecification;
+import org.apache.cassandra.cql3.QueryOptions;
 import org.apache.cassandra.cql3.functions.Function;
 import org.apache.cassandra.cql3.selection.Selection.ResultSetBuilder;
 import org.apache.cassandra.db.marshal.AbstractType;
-import org.apache.cassandra.db.marshal.ReversedType;
 import org.apache.cassandra.exceptions.InvalidRequestException;
 
 /**
@@ -36,7 +35,7 @@
  * <p>Since the introduction of aggregation, <code>Selector</code>s cannot be called anymore by multiple threads 
  * as they have an internal state.</p>
  */
-public abstract class Selector implements AssignmentTestable
+public abstract class Selector
 {
     /**
      * A factory for <code>Selector</code> instances.
@@ -58,16 +57,19 @@
         {
             return new ColumnSpecification(cfm.ksName,
                                            cfm.cfName,
-                                           ColumnIdentifier.getInterned(getColumnName(), true),
+                                           new ColumnIdentifier(getColumnName(), true), // note that the name is not necessarily
+                                                                                        // a true column name so we shouldn't intern it
                                            getReturnType());
         }
 
         /**
          * Creates a new <code>Selector</code> instance.
          *
+         * @param options the options of the query for which the instance is created (some selectors
+         * depend in particular on the bound values).
          * @return a new <code>Selector</code> instance
          */
-        public abstract Selector newInstance() throws InvalidRequestException;
+        public abstract Selector newInstance(QueryOptions options) throws InvalidRequestException;
 
         /**
          * Checks if this factory creates selectors instances that creates aggregates.
@@ -183,24 +185,4 @@
      * Reset the internal state of this <code>Selector</code>.
      */
     public abstract void reset();
-
-    public final AssignmentTestable.TestResult testAssignment(String keyspace, ColumnSpecification receiver)
-    {
-        // We should ignore the fact that the output type is frozen in our comparison as functions do not support
-        // frozen types for arguments
-        AbstractType<?> receiverType = receiver.type;
-        if (getType().isFrozenCollection())
-            receiverType = receiverType.freeze();
-
-        if (getType().isReversed())
-            receiverType = ReversedType.getInstance(receiverType);
-
-        if (receiverType.equals(getType()))
-            return AssignmentTestable.TestResult.EXACT_MATCH;
-
-        if (receiverType.isValueCompatibleWith(getType()))
-            return AssignmentTestable.TestResult.WEAKLY_ASSIGNABLE;
-
-        return AssignmentTestable.TestResult.NOT_ASSIGNABLE;
-    }
 }
diff --git a/src/java/org/apache/cassandra/cql3/selection/SelectorFactories.java b/src/java/org/apache/cassandra/cql3/selection/SelectorFactories.java
index 97a1198..41bf193 100644
--- a/src/java/org/apache/cassandra/cql3/selection/SelectorFactories.java
+++ b/src/java/org/apache/cassandra/cql3/selection/SelectorFactories.java
@@ -23,6 +23,8 @@
 
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.cql3.QueryOptions;
+import org.apache.cassandra.cql3.VariableSpecifications;
 import org.apache.cassandra.cql3.functions.Function;
 import org.apache.cassandra.cql3.selection.Selector.Factory;
 import org.apache.cassandra.db.marshal.AbstractType;
@@ -57,29 +59,40 @@
      * Creates a new <code>SelectorFactories</code> instance and collect the column definitions.
      *
      * @param selectables the <code>Selectable</code>s for which the factories must be created
+     * @param expectedTypes the return types expected for each of the {@code selectables}, if there
+     * are any such expectations, or {@code null} otherwise. This will be {@code null} when called on
+     * the top-level selectables, but may not be for selectables nested within a function, for instance
+     * (as an argument selectable will be expected to be of the type expected by the function).
      * @param cfm the Column Family Definition
      * @param defs the collector parameter for the column definitions
+     * @param boundNames the collector for the specification of bound markers in the selection
      * @return a new <code>SelectorFactories</code> instance
      * @throws InvalidRequestException if a problem occurs while creating the factories
      */
     public static SelectorFactories createFactoriesAndCollectColumnDefinitions(List<Selectable> selectables,
+                                                                               List<AbstractType<?>> expectedTypes,
                                                                                CFMetaData cfm,
-                                                                               List<ColumnDefinition> defs)
+                                                                               List<ColumnDefinition> defs,
+                                                                               VariableSpecifications boundNames)
                                                                                throws InvalidRequestException
     {
-        return new SelectorFactories(selectables, cfm, defs);
+        return new SelectorFactories(selectables, expectedTypes, cfm, defs, boundNames);
     }
 
     private SelectorFactories(List<Selectable> selectables,
+                              List<AbstractType<?>> expectedTypes,
                               CFMetaData cfm,
-                              List<ColumnDefinition> defs)
+                              List<ColumnDefinition> defs,
+                              VariableSpecifications boundNames)
                               throws InvalidRequestException
     {
         factories = new ArrayList<>(selectables.size());
 
-        for (Selectable selectable : selectables)
+        for (int i = 0; i < selectables.size(); i++)
         {
-            Factory factory = selectable.newSelectorFactory(cfm, defs);
+            Selectable selectable = selectables.get(i);
+            AbstractType<?> expectedType = expectedTypes == null ? null : expectedTypes.get(i);
+            Factory factory = selectable.newSelectorFactory(cfm, expectedType, defs, boundNames);
             containsWritetimeFactory |= factory.isWritetimeSelectorFactory();
             containsTTLFactory |= factory.isTTLSelectorFactory();
             if (factory.isAggregateSelectorFactory())
@@ -148,15 +161,15 @@
 
     /**
      * Creates a list of new <code>Selector</code> instances.
+     *
+     * @param options the query options for the query being executed.
      * @return a list of new <code>Selector</code> instances.
      */
-    public List<Selector> newInstances() throws InvalidRequestException
+    public List<Selector> newInstances(QueryOptions options) throws InvalidRequestException
     {
         List<Selector> selectors = new ArrayList<>(factories.size());
         for (Selector.Factory factory : factories)
-        {
-            selectors.add(factory.newInstance());
-        }
+            selectors.add(factory.newInstance(options));
         return selectors;
     }
 
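
Illustrative sketch, not part of the patch: a caller such as Selection would now pass the bound-variable specifications when building the factories and supply the per-execution QueryOptions when instantiating the selectors, so bind markers in a selection clause can be resolved at execution time. The helper method and all of its parameters below are assumed context, not code from this commit.

    static List<Selector> buildSelectors(List<Selectable> selectables,
                                         CFMetaData cfm,
                                         VariableSpecifications boundNames,
                                         QueryOptions options) throws InvalidRequestException
    {
        List<ColumnDefinition> defs = new ArrayList<>();
        // null expected types: this is a top-level selection, not an argument nested in a function
        SelectorFactories factories =
            SelectorFactories.createFactoriesAndCollectColumnDefinitions(selectables, null, cfm, defs, boundNames);
        // selectors are built per execution so each factory can bind values from the options
        return factories.newInstances(options);
    }
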
diff --git a/src/java/org/apache/cassandra/cql3/selection/SimpleSelector.java b/src/java/org/apache/cassandra/cql3/selection/SimpleSelector.java
index e4040fa..e14cd5c 100644
--- a/src/java/org/apache/cassandra/cql3/selection/SimpleSelector.java
+++ b/src/java/org/apache/cassandra/cql3/selection/SimpleSelector.java
@@ -21,6 +21,7 @@
 
 import org.apache.cassandra.config.ColumnDefinition;
 import org.apache.cassandra.cql3.ColumnSpecification;
+import org.apache.cassandra.cql3.QueryOptions;
 import org.apache.cassandra.cql3.selection.Selection.ResultSetBuilder;
 import org.apache.cassandra.db.marshal.AbstractType;
 import org.apache.cassandra.exceptions.InvalidRequestException;
@@ -55,7 +56,7 @@
             }
 
             @Override
-            public Selector newInstance()
+            public Selector newInstance(QueryOptions options)
             {
                 return new SimpleSelector(def.name.toString(), idx, def.type);
             }
diff --git a/src/java/org/apache/cassandra/cql3/selection/TermSelector.java b/src/java/org/apache/cassandra/cql3/selection/TermSelector.java
new file mode 100644
index 0000000..5aa4522
--- /dev/null
+++ b/src/java/org/apache/cassandra/cql3/selection/TermSelector.java
@@ -0,0 +1,92 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.cql3.selection;
+
+import java.nio.ByteBuffer;
+
+import org.apache.cassandra.config.CFMetaData;
+import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.cql3.AssignmentTestable;
+import org.apache.cassandra.cql3.ColumnIdentifier;
+import org.apache.cassandra.cql3.ColumnSpecification;
+import org.apache.cassandra.cql3.QueryOptions;
+import org.apache.cassandra.cql3.Term;
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.exceptions.InvalidRequestException;
+
+/**
+ * Selector representing a simple term (literals or bound variables).
+ * <p>
+ * Note that the term is known not to include function calls, for instance (this is enforced by the parser);
+ * those are dealt with by their own Selector.
+ */
+public class TermSelector extends Selector
+{
+    private final ByteBuffer value;
+    private final AbstractType<?> type;
+
+    public static Factory newFactory(final String name, final Term term, final AbstractType<?> type)
+    {
+        return new Factory()
+        {
+            protected String getColumnName()
+            {
+                return name;
+            }
+
+            protected AbstractType<?> getReturnType()
+            {
+                return type;
+            }
+
+            protected void addColumnMapping(SelectionColumnMapping mapping, ColumnSpecification resultColumn)
+            {
+                mapping.addMapping(resultColumn, (ColumnDefinition) null);
+            }
+
+            public Selector newInstance(QueryOptions options)
+            {
+                return new TermSelector(term.bindAndGet(options), type);
+            }
+        };
+    }
+
+    private TermSelector(ByteBuffer value, AbstractType<?> type)
+    {
+        this.value = value;
+        this.type = type;
+    }
+
+    public void addInput(int protocolVersion, Selection.ResultSetBuilder rs) throws InvalidRequestException
+    {
+    }
+
+    public ByteBuffer getOutput(int protocolVersion) throws InvalidRequestException
+    {
+        return value;
+    }
+
+    public AbstractType<?> getType()
+    {
+        return type;
+    }
+
+    public void reset()
+    {
+    }
+}
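
Illustrative sketch, not part of the patch: the factory binds the term against the per-execution options, and the resulting selector simply echoes those bytes for every row. The variables term, options and protocolVersion, the display name "[selection]" and the int type are all assumed for the example.

    Selector.Factory factory = TermSelector.newFactory("[selection]", term, Int32Type.instance);
    Selector selector = factory.newInstance(options);           // term.bindAndGet(options) happens here
    ByteBuffer constant = selector.getOutput(protocolVersion);  // same value for every row; addInput() is a no-op
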
diff --git a/src/java/org/apache/cassandra/cql3/selection/WritetimeOrTTLSelector.java b/src/java/org/apache/cassandra/cql3/selection/WritetimeOrTTLSelector.java
index 131827f..78380d7 100644
--- a/src/java/org/apache/cassandra/cql3/selection/WritetimeOrTTLSelector.java
+++ b/src/java/org/apache/cassandra/cql3/selection/WritetimeOrTTLSelector.java
@@ -20,6 +20,7 @@
 import java.nio.ByteBuffer;
 
 import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.cql3.QueryOptions;
 import org.apache.cassandra.cql3.ColumnSpecification;
 import org.apache.cassandra.cql3.selection.Selection.ResultSetBuilder;
 import org.apache.cassandra.db.marshal.AbstractType;
@@ -54,7 +55,7 @@
                mapping.addMapping(resultsColumn, def);
             }
 
-            public Selector newInstance()
+            public Selector newInstance(QueryOptions options)
             {
                 return new WritetimeOrTTLSelector(def.name.toString(), idx, isWritetime);
             }
diff --git a/src/java/org/apache/cassandra/cql3/statements/AlterTableStatement.java b/src/java/org/apache/cassandra/cql3/statements/AlterTableStatement.java
index 381971f..afe2776 100644
--- a/src/java/org/apache/cassandra/cql3/statements/AlterTableStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/AlterTableStatement.java
@@ -52,27 +52,21 @@
     }
 
     public final Type oType;
-    public final CQL3Type.Raw validator;
-    public final ColumnIdentifier.Raw rawColumnName;
     private final TableAttributes attrs;
-    private final Map<ColumnIdentifier.Raw, ColumnIdentifier.Raw> renames;
-    private final boolean isStatic; // Only for ALTER ADD
+    private final Map<ColumnDefinition.Raw, ColumnDefinition.Raw> renames;
+    private final List<AlterTableStatementColumn> colNameList;
 
     public AlterTableStatement(CFName name,
                                Type type,
-                               ColumnIdentifier.Raw columnName,
-                               CQL3Type.Raw validator,
+                               List<AlterTableStatementColumn> colDataList,
                                TableAttributes attrs,
-                               Map<ColumnIdentifier.Raw, ColumnIdentifier.Raw> renames,
-                               boolean isStatic)
+                               Map<ColumnDefinition.Raw, ColumnDefinition.Raw> renames)
     {
         super(name);
         this.oType = type;
-        this.rawColumnName = columnName;
-        this.validator = validator; // used only for ADD/ALTER commands
+        this.colNameList = colDataList;
         this.attrs = attrs;
         this.renames = renames;
-        this.isStatic = isStatic;
     }
 
     public void checkAccess(ClientState state) throws UnauthorizedException, InvalidRequestException
@@ -92,15 +86,12 @@
             throw new InvalidRequestException("Cannot use ALTER TABLE on Materialized View");
 
         CFMetaData cfm = meta.copy();
-
-        CQL3Type validator = this.validator == null ? null : this.validator.prepare(keyspace());
         ColumnIdentifier columnName = null;
         ColumnDefinition def = null;
-        if (rawColumnName != null)
-        {
-            columnName = rawColumnName.prepare(cfm);
-            def = cfm.getColumnDefinition(columnName);
-        }
+        CQL3Type.Raw dataType = null;
+        boolean isStatic = false;
+        CQL3Type validator = null;
+        ColumnDefinition.Raw rawColumnName = null;
 
         List<ViewDefinition> viewUpdates = null;
         Iterable<ViewDefinition> views = View.findAll(keyspace(), columnFamily());
@@ -108,81 +99,103 @@
         switch (oType)
         {
             case ADD:
-                assert columnName != null;
-                if (cfm.isDense())
-                    throw new InvalidRequestException("Cannot add new column to a COMPACT STORAGE table");
-
-                if (isStatic)
+                for (AlterTableStatementColumn colData : colNameList)
                 {
-                    if (!cfm.isCompound())
-                        throw new InvalidRequestException("Static columns are not allowed in COMPACT STORAGE tables");
-                    if (cfm.clusteringColumns().isEmpty())
-                        throw new InvalidRequestException("Static columns are only useful (and thus allowed) if the table has at least one clustering column");
-                }
-
-                if (def != null)
-                {
-                    switch (def.kind)
+                    rawColumnName = colData.getColumnName();
+                    if (rawColumnName != null)
                     {
-                        case PARTITION_KEY:
-                        case CLUSTERING:
-                            throw new InvalidRequestException(String.format("Invalid column name %s because it conflicts with a PRIMARY KEY part", columnName));
-                        default:
-                            throw new InvalidRequestException(String.format("Invalid column name %s because it conflicts with an existing column", columnName));
+                        columnName = rawColumnName.getIdentifier(cfm);
+                        def = cfm.getColumnDefinition(columnName);
+                        dataType = colData.getColumnType();
+                        isStatic = colData.getStaticType();
+                        validator = dataType == null ? null : dataType.prepare(keyspace());
                     }
-                }
 
-                // Cannot re-add a dropped counter column. See #7831.
-                if (meta.isCounter() && meta.getDroppedColumns().containsKey(columnName.bytes))
-                    throw new InvalidRequestException(String.format("Cannot re-add previously dropped counter column %s", columnName));
+                    assert columnName != null;
+                    if (cfm.isDense())
+                        throw new InvalidRequestException("Cannot add new column to a COMPACT STORAGE table");
 
-                AbstractType<?> type = validator.getType();
-                if (type.isCollection() && type.isMultiCell())
-                {
-                    if (!cfm.isCompound())
-                        throw new InvalidRequestException("Cannot use non-frozen collections in COMPACT STORAGE tables");
-                    if (cfm.isSuper())
-                        throw new InvalidRequestException("Cannot use non-frozen collections with super column families");
-
-                    // If there used to be a non-frozen collection column with the same name (that has been dropped),
-                    // we could still have some data using the old type, and so we can't allow adding a collection
-                    // with the same name unless the types are compatible (see #6276).
-                    CFMetaData.DroppedColumn dropped = cfm.getDroppedColumns().get(columnName.bytes);
-                    if (dropped != null && dropped.type instanceof CollectionType
-                        && dropped.type.isMultiCell() && !type.isCompatibleWith(dropped.type))
+                    if (isStatic)
                     {
-                        String message =
-                            String.format("Cannot add a collection with the name %s because a collection with the same name"
-                                          + " and a different type (%s) has already been used in the past",
-                                          columnName,
-                                          dropped.type.asCQL3Type());
-                        throw new InvalidRequestException(message);
+                        if (!cfm.isCompound())
+                            throw new InvalidRequestException("Static columns are not allowed in COMPACT STORAGE tables");
+                        if (cfm.clusteringColumns().isEmpty())
+                            throw new InvalidRequestException("Static columns are only useful (and thus allowed) if the table has at least one clustering column");
                     }
-                }
 
-                cfm.addColumnDefinition(isStatic
-                                        ? ColumnDefinition.staticDef(cfm, columnName.bytes, type)
-                                        : ColumnDefinition.regularDef(cfm, columnName.bytes, type));
-
-                // Adding a column to a table which has an include all view requires the column to be added to the view
-                // as well
-                if (!isStatic)
-                {
-                    for (ViewDefinition view : views)
+                    if (def != null)
                     {
-                        if (view.includeAllColumns)
+                        switch (def.kind)
                         {
-                            ViewDefinition viewCopy = view.copy();
-                            viewCopy.metadata.addColumnDefinition(ColumnDefinition.regularDef(viewCopy.metadata, columnName.bytes, type));
-                            if (viewUpdates == null)
-                                viewUpdates = new ArrayList<>();
-                            viewUpdates.add(viewCopy);
+                            case PARTITION_KEY:
+                            case CLUSTERING:
+                                throw new InvalidRequestException(String.format("Invalid column name %s because it conflicts with a PRIMARY KEY part", columnName));
+                            default:
+                                throw new InvalidRequestException(String.format("Invalid column name %s because it conflicts with an existing column", columnName));
+                        }
+                    }
+
+                    // Cannot re-add a dropped counter column. See #7831.
+                    if (meta.isCounter() && meta.getDroppedColumns().containsKey(columnName.bytes))
+                        throw new InvalidRequestException(String.format("Cannot re-add previously dropped counter column %s", columnName));
+
+                    AbstractType<?> type = validator.getType();
+                    if (type.isCollection() && type.isMultiCell())
+                    {
+                        if (!cfm.isCompound())
+                            throw new InvalidRequestException("Cannot use non-frozen collections in COMPACT STORAGE tables");
+                        if (cfm.isSuper())
+                            throw new InvalidRequestException("Cannot use non-frozen collections with super column families");
+
+                        // If there used to be a non-frozen collection column with the same name (that has been dropped),
+                        // we could still have some data using the old type, and so we can't allow adding a collection
+                        // with the same name unless the types are compatible (see #6276).
+                        CFMetaData.DroppedColumn dropped = cfm.getDroppedColumns().get(columnName.bytes);
+                        if (dropped != null && dropped.type instanceof CollectionType
+                            && dropped.type.isMultiCell() && !type.isCompatibleWith(dropped.type))
+                        {
+                            String message =
+                                String.format("Cannot add a collection with the name %s because a collection with the same name"
+                                              + " and a different type (%s) has already been used in the past",
+                                              columnName,
+                                              dropped.type.asCQL3Type());
+                            throw new InvalidRequestException(message);
+                        }
+                    }
+
+                    cfm.addColumnDefinition(isStatic
+                                            ? ColumnDefinition.staticDef(cfm, columnName.bytes, type)
+                                            : ColumnDefinition.regularDef(cfm, columnName.bytes, type));
+
+                    // Adding a column to a table which has an include all view requires the column to be added to the view
+                    // as well
+                    if (!isStatic)
+                    {
+                        for (ViewDefinition view : views)
+                        {
+                            if (view.includeAllColumns)
+                            {
+                                ViewDefinition viewCopy = view.copy();
+                                viewCopy.metadata.addColumnDefinition(ColumnDefinition.regularDef(viewCopy.metadata, columnName.bytes, type));
+                                if (viewUpdates == null)
+                                    viewUpdates = new ArrayList<>();
+                                viewUpdates.add(viewCopy);
+                            }
                         }
                     }
                 }
                 break;
 
             case ALTER:
+                rawColumnName = colNameList.get(0).getColumnName();
+                if (rawColumnName != null)
+                {
+                    columnName = rawColumnName.getIdentifier(cfm);
+                    def = cfm.getColumnDefinition(columnName);
+                    dataType = colNameList.get(0).getColumnType();
+                    validator = dataType == null ? null : dataType.prepare(keyspace());
+                }
+
                 assert columnName != null;
                 if (def == null)
                     throw new InvalidRequestException(String.format("Column %s was not found in table %s", columnName, columnFamily()));
@@ -214,66 +227,76 @@
                 break;
 
             case DROP:
-                assert columnName != null;
-                if (!cfm.isCQLTable())
-                    throw new InvalidRequestException("Cannot drop columns from a non-CQL3 table");
-                if (def == null)
-                    throw new InvalidRequestException(String.format("Column %s was not found in table %s", columnName, columnFamily()));
-
-                switch (def.kind)
+                for (AlterTableStatementColumn colData : colNameList)
                 {
-                    case PARTITION_KEY:
-                    case CLUSTERING:
-                        throw new InvalidRequestException(String.format("Cannot drop PRIMARY KEY part %s", columnName));
-                    case REGULAR:
-                    case STATIC:
-                        ColumnDefinition toDelete = null;
-                        for (ColumnDefinition columnDef : cfm.partitionColumns())
-                        {
-                            if (columnDef.name.equals(columnName))
-                            {
-                                toDelete = columnDef;
-                                break;
-                            }
-                        }
+                    columnName = null;
+                    rawColumnName = colData.getColumnName();
+                    if (rawColumnName != null)
+                    {
+                        columnName = rawColumnName.getIdentifier(cfm);
+                        def = cfm.getColumnDefinition(columnName);
+                    }
+                    assert columnName != null;
+                    if (!cfm.isCQLTable())
+                        throw new InvalidRequestException("Cannot drop columns from a non-CQL3 table");
+                    if (def == null)
+                        throw new InvalidRequestException(String.format("Column %s was not found in table %s", columnName, columnFamily()));
+
+                    switch (def.kind)
+                    {
+                         case PARTITION_KEY:
+                         case CLUSTERING:
+                              throw new InvalidRequestException(String.format("Cannot drop PRIMARY KEY part %s", columnName));
+                         case REGULAR:
+                         case STATIC:
+                              ColumnDefinition toDelete = null;
+                              for (ColumnDefinition columnDef : cfm.partitionColumns())
+                              {
+                                   if (columnDef.name.equals(columnName))
+                                   {
+                                        toDelete = columnDef;
+                                        break;
+                                   }
+                              }
                         assert toDelete != null;
                         cfm.removeColumnDefinition(toDelete);
                         cfm.recordColumnDrop(toDelete);
                         break;
-                }
+                    }
 
-                // If the dropped column is required by any secondary indexes
-                // we reject the operation, as the indexes must be dropped first
-                Indexes allIndexes = cfm.getIndexes();
-                if (!allIndexes.isEmpty())
-                {
-                    ColumnFamilyStore store = Keyspace.openAndGetStore(cfm);
-                    Set<IndexMetadata> dependentIndexes = store.indexManager.getDependentIndexes(def);
-                    if (!dependentIndexes.isEmpty())
-                        throw new InvalidRequestException(String.format("Cannot drop column %s because it has " +
-                                                                        "dependent secondary indexes (%s)",
-                                                                        def,
-                                                                        dependentIndexes.stream()
-                                                                                        .map(i -> i.name)
-                                                                                        .collect(Collectors.joining(","))));
-                }
+                    // If the dropped column is required by any secondary indexes
+                    // we reject the operation, as the indexes must be dropped first
+                    Indexes allIndexes = cfm.getIndexes();
+                    if (!allIndexes.isEmpty())
+                    {
+                        ColumnFamilyStore store = Keyspace.openAndGetStore(cfm);
+                        Set<IndexMetadata> dependentIndexes = store.indexManager.getDependentIndexes(def);
+                        if (!dependentIndexes.isEmpty())
+                            throw new InvalidRequestException(String.format("Cannot drop column %s because it has " +
+                                                                            "dependent secondary indexes (%s)",
+                                                                            def,
+                                                                            dependentIndexes.stream()
+                                                                                            .map(i -> i.name)
+                                                                                            .collect(Collectors.joining(","))));
+                    }
 
-                // If a column is dropped which is included in a view, we don't allow the drop to take place.
-                boolean rejectAlter = false;
-                StringBuilder builder = new StringBuilder();
-                for (ViewDefinition view : views)
-                {
-                    if (!view.includes(columnName)) continue;
+                    // If a column is dropped which is included in a view, we don't allow the drop to take place.
+                    boolean rejectAlter = false;
+                    StringBuilder builder = new StringBuilder();
+                    for (ViewDefinition view : views)
+                    {
+                        if (!view.includes(columnName)) continue;
+                        if (rejectAlter)
+                            builder.append(',');
+                        rejectAlter = true;
+                        builder.append(view.viewName);
+                    }
                     if (rejectAlter)
-                        builder.append(',');
-                    rejectAlter = true;
-                    builder.append(view.viewName);
+                        throw new InvalidRequestException(String.format("Cannot drop column %s, depended on by materialized views (%s.{%s})",
+                                                                        columnName.toString(),
+                                                                        keyspace(),
+                                                                        builder.toString()));
                 }
-                if (rejectAlter)
-                    throw new InvalidRequestException(String.format("Cannot drop column %s, depended on by materialized views (%s.{%s})",
-                                                                    columnName.toString(),
-                                                                    keyspace(),
-                                                                    builder.toString()));
                 break;
             case OPTS:
                 if (attrs == null)
@@ -298,10 +321,10 @@
 
                 break;
             case RENAME:
-                for (Map.Entry<ColumnIdentifier.Raw, ColumnIdentifier.Raw> entry : renames.entrySet())
+                for (Map.Entry<ColumnDefinition.Raw, ColumnDefinition.Raw> entry : renames.entrySet())
                 {
-                    ColumnIdentifier from = entry.getKey().prepare(cfm);
-                    ColumnIdentifier to = entry.getValue().prepare(cfm);
+                    ColumnIdentifier from = entry.getKey().getIdentifier(cfm);
+                    ColumnIdentifier to = entry.getValue().getIdentifier(cfm);
                     cfm.renameColumn(from, to);
 
                     // If the view includes a renamed column, it must be renamed in the view table and the definition.
@@ -310,8 +333,8 @@
                         if (!view.includes(from)) continue;
 
                         ViewDefinition viewCopy = view.copy();
-                        ColumnIdentifier viewFrom = entry.getKey().prepare(viewCopy.metadata);
-                        ColumnIdentifier viewTo = entry.getValue().prepare(viewCopy.metadata);
+                        ColumnIdentifier viewFrom = entry.getKey().getIdentifier(viewCopy.metadata);
+                        ColumnIdentifier viewTo = entry.getValue().getIdentifier(viewCopy.metadata);
                         viewCopy.renameColumn(viewFrom, viewTo);
 
                         if (viewUpdates == null)
@@ -379,12 +402,11 @@
         }
     }
 
+    @Override
     public String toString()
     {
-        return String.format("AlterTableStatement(name=%s, type=%s, column=%s, validator=%s)",
+        return String.format("AlterTableStatement(name=%s, type=%s)",
                              cfName,
-                             oType,
-                             rawColumnName,
-                             validator);
+                             oType);
     }
 }
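
Illustrative sketch, not part of the patch: the statement now carries a list of column changes, so a single ALTER TABLE can add or drop several columns at once (the corresponding grammar change is not in this hunk). The cfName, raw column names and raw types below are assumed to come from the parser.

    List<AlterTableStatementColumn> cols = new ArrayList<>();
    cols.add(new AlterTableStatementColumn(rawNameA, rawIntType));        // e.g. ADD a int
    cols.add(new AlterTableStatementColumn(rawNameB, rawTextType, true)); // e.g. ADD b text static
    AlterTableStatement add = new AlterTableStatement(cfName, AlterTableStatement.Type.ADD,
                                                      cols, null, Collections.emptyMap());
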
diff --git a/src/java/org/apache/cassandra/cql3/statements/AlterTableStatementColumn.java b/src/java/org/apache/cassandra/cql3/statements/AlterTableStatementColumn.java
new file mode 100644
index 0000000..813effe
--- /dev/null
+++ b/src/java/org/apache/cassandra/cql3/statements/AlterTableStatementColumn.java
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.cql3.statements;
+
+import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.cql3.CQL3Type;
+import org.apache.cassandra.cql3.ColumnIdentifier;
+
+public class AlterTableStatementColumn
+{
+    private final CQL3Type.Raw dataType;
+    private final ColumnDefinition.Raw colName;
+    private final Boolean isStatic;
+
+    public AlterTableStatementColumn(ColumnDefinition.Raw colName, CQL3Type.Raw dataType, boolean isStatic)
+    {
+        this.dataType = dataType;
+        this.colName = colName;
+        this.isStatic = isStatic;
+    }
+
+    public AlterTableStatementColumn(ColumnDefinition.Raw colName, CQL3Type.Raw dataType)
+    {
+        this(colName, dataType, false);
+    }
+
+    public AlterTableStatementColumn(ColumnDefinition.Raw colName)
+    {
+        this(colName, null, false);
+    }
+
+    public CQL3Type.Raw getColumnType()
+    {
+        return dataType;
+    }
+
+    public ColumnDefinition.Raw getColumnName()
+    {
+        return colName;
+    }
+
+    public Boolean getStaticType()
+    {
+        return isStatic;
+    }
+}
diff --git a/src/java/org/apache/cassandra/cql3/statements/AlterTypeStatement.java b/src/java/org/apache/cassandra/cql3/statements/AlterTypeStatement.java
index bd23971..64bccf5 100644
--- a/src/java/org/apache/cassandra/cql3/statements/AlterTypeStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/AlterTypeStatement.java
@@ -51,17 +51,17 @@
 
     protected abstract UserType makeUpdatedType(UserType toUpdate, KeyspaceMetadata ksm) throws InvalidRequestException;
 
-    public static AlterTypeStatement addition(UTName name, ColumnIdentifier fieldName, CQL3Type.Raw type)
+    public static AlterTypeStatement addition(UTName name, FieldIdentifier fieldName, CQL3Type.Raw type)
     {
         return new AddOrAlter(name, true, fieldName, type);
     }
 
-    public static AlterTypeStatement alter(UTName name, ColumnIdentifier fieldName, CQL3Type.Raw type)
+    public static AlterTypeStatement alter(UTName name, FieldIdentifier fieldName, CQL3Type.Raw type)
     {
         return new AddOrAlter(name, false, fieldName, type);
     }
 
-    public static AlterTypeStatement renames(UTName name, Map<ColumnIdentifier, ColumnIdentifier> renames)
+    public static AlterTypeStatement renames(UTName name, Map<FieldIdentifier, FieldIdentifier> renames)
     {
         return new Renames(name, renames);
     }
@@ -137,14 +137,6 @@
         return new Event.SchemaChange(Event.SchemaChange.Change.UPDATED, Event.SchemaChange.Target.TYPE, keyspace(), name.getStringTypeName());
     }
 
-    private static int getIdxOfField(UserType type, ColumnIdentifier field)
-    {
-        for (int i = 0; i < type.size(); i++)
-            if (field.bytes.equals(type.fieldName(i)))
-                return i;
-        return -1;
-    }
-
     private boolean updateDefinition(CFMetaData cfm, ColumnDefinition def, String keyspace, ByteBuffer toReplace, UserType updated)
     {
         AbstractType<?> t = updateWith(def.type, keyspace, toReplace, updated);
@@ -166,11 +158,11 @@
 
             // If it's directly the type we've updated, then just use the new one.
             if (keyspace.equals(ut.keyspace) && toReplace.equals(ut.name))
-                return updated;
+                return type.isMultiCell() ? updated : updated.freeze();
 
             // Otherwise, check for nesting
             List<AbstractType<?>> updatedTypes = updateTypes(ut.fieldTypes(), keyspace, toReplace, updated);
-            return updatedTypes == null ? null : new UserType(ut.keyspace, ut.name, new ArrayList<>(ut.fieldNames()), updatedTypes);
+            return updatedTypes == null ? null : new UserType(ut.keyspace, ut.name, new ArrayList<>(ut.fieldNames()), updatedTypes, type.isMultiCell());
         }
         else if (type instanceof TupleType)
         {
@@ -247,10 +239,10 @@
     private static class AddOrAlter extends AlterTypeStatement
     {
         private final boolean isAdd;
-        private final ColumnIdentifier fieldName;
+        private final FieldIdentifier fieldName;
         private final CQL3Type.Raw type;
 
-        public AddOrAlter(UTName name, boolean isAdd, ColumnIdentifier fieldName, CQL3Type.Raw type)
+        public AddOrAlter(UTName name, boolean isAdd, FieldIdentifier fieldName, CQL3Type.Raw type)
         {
             super(name);
             this.isAdd = isAdd;
@@ -260,12 +252,12 @@
 
         private UserType doAdd(UserType toUpdate) throws InvalidRequestException
         {
-            if (getIdxOfField(toUpdate, fieldName) >= 0)
+            if (toUpdate.fieldPosition(fieldName) >= 0)
                 throw new InvalidRequestException(String.format("Cannot add new field %s to type %s: a field of the same name already exists", fieldName, name));
 
-            List<ByteBuffer> newNames = new ArrayList<>(toUpdate.size() + 1);
+            List<FieldIdentifier> newNames = new ArrayList<>(toUpdate.size() + 1);
             newNames.addAll(toUpdate.fieldNames());
-            newNames.add(fieldName.bytes);
+            newNames.add(fieldName);
 
             AbstractType<?> addType = type.prepare(keyspace()).getType();
             if (addType.referencesUserType(toUpdate.getNameAsString()))
@@ -275,14 +267,14 @@
             newTypes.addAll(toUpdate.fieldTypes());
             newTypes.add(addType);
 
-            return new UserType(toUpdate.keyspace, toUpdate.name, newNames, newTypes);
+            return new UserType(toUpdate.keyspace, toUpdate.name, newNames, newTypes, toUpdate.isMultiCell());
         }
 
         private UserType doAlter(UserType toUpdate, KeyspaceMetadata ksm) throws InvalidRequestException
         {
             checkTypeNotUsedByAggregate(ksm);
 
-            int idx = getIdxOfField(toUpdate, fieldName);
+            int idx = toUpdate.fieldPosition(fieldName);
             if (idx < 0)
                 throw new InvalidRequestException(String.format("Unknown field %s in type %s", fieldName, name));
 
@@ -290,11 +282,11 @@
             if (!type.prepare(keyspace()).getType().isCompatibleWith(previous))
                 throw new InvalidRequestException(String.format("Type %s is incompatible with previous type %s of field %s in user type %s", type, previous.asCQL3Type(), fieldName, name));
 
-            List<ByteBuffer> newNames = new ArrayList<>(toUpdate.fieldNames());
+            List<FieldIdentifier> newNames = new ArrayList<>(toUpdate.fieldNames());
             List<AbstractType<?>> newTypes = new ArrayList<>(toUpdate.fieldTypes());
             newTypes.set(idx, type.prepare(keyspace()).getType());
 
-            return new UserType(toUpdate.keyspace, toUpdate.name, newNames, newTypes);
+            return new UserType(toUpdate.keyspace, toUpdate.name, newNames, newTypes, toUpdate.isMultiCell());
         }
 
         protected UserType makeUpdatedType(UserType toUpdate, KeyspaceMetadata ksm) throws InvalidRequestException
@@ -305,9 +297,9 @@
 
     private static class Renames extends AlterTypeStatement
     {
-        private final Map<ColumnIdentifier, ColumnIdentifier> renames;
+        private final Map<FieldIdentifier, FieldIdentifier> renames;
 
-        public Renames(UTName name, Map<ColumnIdentifier, ColumnIdentifier> renames)
+        public Renames(UTName name, Map<FieldIdentifier, FieldIdentifier> renames)
         {
             super(name);
             this.renames = renames;
@@ -317,20 +309,20 @@
         {
             checkTypeNotUsedByAggregate(ksm);
 
-            List<ByteBuffer> newNames = new ArrayList<>(toUpdate.fieldNames());
+            List<FieldIdentifier> newNames = new ArrayList<>(toUpdate.fieldNames());
             List<AbstractType<?>> newTypes = new ArrayList<>(toUpdate.fieldTypes());
 
-            for (Map.Entry<ColumnIdentifier, ColumnIdentifier> entry : renames.entrySet())
+            for (Map.Entry<FieldIdentifier, FieldIdentifier> entry : renames.entrySet())
             {
-                ColumnIdentifier from = entry.getKey();
-                ColumnIdentifier to = entry.getValue();
-                int idx = getIdxOfField(toUpdate, from);
+                FieldIdentifier from = entry.getKey();
+                FieldIdentifier to = entry.getValue();
+                int idx = toUpdate.fieldPosition(from);
                 if (idx < 0)
                     throw new InvalidRequestException(String.format("Unknown field %s in type %s", from, name));
-                newNames.set(idx, to.bytes);
+                newNames.set(idx, to);
             }
 
-            UserType updated = new UserType(toUpdate.keyspace, toUpdate.name, newNames, newTypes);
+            UserType updated = new UserType(toUpdate.keyspace, toUpdate.name, newNames, newTypes, toUpdate.isMultiCell());
             CreateTypeStatement.checkForDuplicateNames(updated);
             return updated;
         }
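
Illustrative sketch, assumed context rather than code from this commit: field lookups now go through UserType.fieldPosition with FieldIdentifier instead of the removed getIdxOfField helper, and every rebuilt type carries its multi-cell (non-frozen) flag forward. toUpdate, from and to are assumed to be the type being altered and the rename pair.

    int idx = toUpdate.fieldPosition(from);                   // < 0 if the field does not exist
    List<FieldIdentifier> names = new ArrayList<>(toUpdate.fieldNames());
    names.set(idx, to);
    UserType renamed = new UserType(toUpdate.keyspace, toUpdate.name,
                                    names, new ArrayList<>(toUpdate.fieldTypes()),
                                    toUpdate.isMultiCell());  // frozen-ness is preserved
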
diff --git a/src/java/org/apache/cassandra/cql3/statements/BatchStatement.java b/src/java/org/apache/cassandra/cql3/statements/BatchStatement.java
index f0aa835..2739c2e 100644
--- a/src/java/org/apache/cassandra/cql3/statements/BatchStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/BatchStatement.java
@@ -21,7 +21,6 @@
 import java.util.*;
 import java.util.concurrent.TimeUnit;
 
-import com.google.common.base.Function;
 import com.google.common.collect.Iterables;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
@@ -38,6 +37,7 @@
 import org.apache.cassandra.service.*;
 import org.apache.cassandra.tracing.Tracing;
 import org.apache.cassandra.transport.messages.ResultMessage;
+import org.apache.cassandra.utils.FBUtilities;
 import org.apache.cassandra.utils.NoSpamLogger;
 import org.apache.cassandra.utils.Pair;
 
@@ -258,57 +258,72 @@
     /**
      * Checks batch size to ensure threshold is met. If not, a warning is logged.
      *
-     * @param cfs ColumnFamilies that will store the batch's mutations.
+     * @param mutations the batch mutations.
      */
-    public static void verifyBatchSize(Iterable<PartitionUpdate> updates) throws InvalidRequestException
+    private static void verifyBatchSize(Collection<? extends IMutation> mutations) throws InvalidRequestException
     {
+        // We only warn for batches spanning multiple mutations (#10876)
+        if (mutations.size() <= 1)
+            return;
+
         long size = 0;
         long warnThreshold = DatabaseDescriptor.getBatchSizeWarnThreshold();
         long failThreshold = DatabaseDescriptor.getBatchSizeFailThreshold();
 
-        for (PartitionUpdate update : updates)
-            size += update.dataSize();
+        for (IMutation mutation : mutations)
+        {
+            for (PartitionUpdate update : mutation.getPartitionUpdates())
+                size += update.dataSize();
+        }
 
         if (size > warnThreshold)
         {
             Set<String> tableNames = new HashSet<>();
-            for (PartitionUpdate update : updates)
-                tableNames.add(String.format("%s.%s", update.metadata().ksName, update.metadata().cfName));
+            for (IMutation mutation : mutations)
+            {
+                for (PartitionUpdate update : mutation.getPartitionUpdates())
+                    tableNames.add(String.format("%s.%s", update.metadata().ksName, update.metadata().cfName));
+            }
 
-            String format = "Batch of prepared statements for {} is of size {}, exceeding specified threshold of {} by {}.{}";
+            String format = "Batch for {} is of size {}, exceeding specified threshold of {} by {}.{}";
             if (size > failThreshold)
             {
-                Tracing.trace(format, tableNames, size, failThreshold, size - failThreshold, " (see batch_size_fail_threshold_in_kb)");
-                logger.error(format, tableNames, size, failThreshold, size - failThreshold, " (see batch_size_fail_threshold_in_kb)");
+                Tracing.trace(format, tableNames, FBUtilities.prettyPrintMemory(size), FBUtilities.prettyPrintMemory(failThreshold),
+                              FBUtilities.prettyPrintMemory(size - failThreshold), " (see batch_size_fail_threshold_in_kb)");
+                logger.error(format, tableNames, FBUtilities.prettyPrintMemory(size), FBUtilities.prettyPrintMemory(failThreshold),
+                             FBUtilities.prettyPrintMemory(size - failThreshold), " (see batch_size_fail_threshold_in_kb)");
                 throw new InvalidRequestException("Batch too large");
             }
             else if (logger.isWarnEnabled())
             {
-                logger.warn(format, tableNames, size, warnThreshold, size - warnThreshold, "");
+                logger.warn(format, tableNames, FBUtilities.prettyPrintMemory(size), FBUtilities.prettyPrintMemory(warnThreshold),
+                            FBUtilities.prettyPrintMemory(size - warnThreshold), "");
             }
             ClientWarn.instance.warn(MessageFormatter.arrayFormat(format, new Object[] {tableNames, size, warnThreshold, size - warnThreshold, ""}).getMessage());
         }
     }
 
-    private void verifyBatchType(Iterable<PartitionUpdate> updates)
+    private void verifyBatchType(Collection<? extends IMutation> mutations)
     {
-        if (!isLogged() && Iterables.size(updates) > 1)
+        if (!isLogged() && mutations.size() > 1)
         {
             Set<DecoratedKey> keySet = new HashSet<>();
             Set<String> tableNames = new HashSet<>();
 
-            for (PartitionUpdate update : updates)
+            for (IMutation mutation : mutations)
             {
-                keySet.add(update.partitionKey());
+                for (PartitionUpdate update : mutation.getPartitionUpdates())
+                {
+                    keySet.add(update.partitionKey());
 
-                tableNames.add(String.format("%s.%s", update.metadata().ksName, update.metadata().cfName));
+                    tableNames.add(String.format("%s.%s", update.metadata().ksName, update.metadata().cfName));
+                }
             }
 
             // CASSANDRA-11529: log only if we have more than a threshold of keys, this was also suggested in the
             // original ticket that introduced this warning, CASSANDRA-9282
             if (keySet.size() > DatabaseDescriptor.getUnloggedBatchAcrossPartitionsWarnThreshold())
             {
-
                 NoSpamLogger.log(logger, NoSpamLogger.Level.WARN, 1, TimeUnit.MINUTES, UNLOGGED_BATCH_WARNING,
                                  keySet.size(), tableNames.size() == 1 ? "" : "s", tableNames);
 
@@ -349,17 +364,8 @@
         if (mutations.isEmpty())
             return;
 
-        // Extract each collection of updates from it's IMutation and then lazily concatenate all of them into a single Iterable.
-        Iterable<PartitionUpdate> updates = Iterables.concat(Iterables.transform(mutations, new Function<IMutation, Collection<PartitionUpdate>>()
-        {
-            public Collection<PartitionUpdate> apply(IMutation im)
-            {
-                return im.getPartitionUpdates();
-            }
-        }));
-
-        verifyBatchSize(updates);
-        verifyBatchType(updates);
+        verifyBatchSize(mutations);
+        verifyBatchType(mutations);
 
         boolean mutateAtomic = (isLogged() && mutations.size() > 1);
         StorageProxy.mutateWithTriggers(mutations, cl, mutateAtomic);
@@ -447,14 +453,7 @@
     private ResultMessage executeInternalWithoutCondition(QueryState queryState, QueryOptions options) throws RequestValidationException, RequestExecutionException
     {
         for (IMutation mutation : getMutations(BatchQueryOptions.withoutPerStatementVariables(options), true, queryState.getTimestamp()))
-        {
-            assert mutation instanceof Mutation || mutation instanceof CounterMutation;
-
-            if (mutation instanceof Mutation)
-                ((Mutation) mutation).apply();
-            else if (mutation instanceof CounterMutation)
-                ((CounterMutation) mutation).apply();
-        }
+            mutation.apply();
         return null;
     }
 
@@ -531,7 +530,7 @@
 
             // Use the CFMetadata of the first statement for partition key bind indexes.  If the statements affect
             // multiple tables, we won't send partition key bind indexes.
-            Short[] partitionKeyBindIndexes = (haveMultipleCFs || batchStatement.statements.isEmpty())? null
+            short[] partitionKeyBindIndexes = (haveMultipleCFs || batchStatement.statements.isEmpty())? null
                                                               : boundNames.getPartitionKeyBindIndexes(batchStatement.statements.get(0).cfm);
 
             return new ParsedStatement.Prepared(batchStatement, boundNames, partitionKeyBindIndexes);
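
Illustrative sketch, not part of the patch: with the new guard, a batch containing a single mutation is never size-checked; only batches spanning several mutations have their partition updates' sizes summed and compared against the warn/fail thresholds, with the sizes in the log now pretty-printed. bigMutation and mutation1..mutation3 are assumed IMutation instances, and the calls are as they would be made from inside BatchStatement (the method is private).

    verifyBatchSize(Collections.singletonList(bigMutation));         // returns immediately: size() <= 1
    verifyBatchSize(Arrays.asList(mutation1, mutation2, mutation3)); // sums update.dataSize() across all
                                                                     // partition updates; warns, or fails with
                                                                     // "Batch too large", using prettyPrintMemory
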
diff --git a/src/java/org/apache/cassandra/cql3/statements/Bound.java b/src/java/org/apache/cassandra/cql3/statements/Bound.java
index 7742642..824743c 100644
--- a/src/java/org/apache/cassandra/cql3/statements/Bound.java
+++ b/src/java/org/apache/cassandra/cql3/statements/Bound.java
@@ -17,6 +17,8 @@
  */
 package org.apache.cassandra.cql3.statements;
 
+import org.apache.cassandra.config.ColumnDefinition;
+
 public enum Bound
 {
     START(0), END(1);
@@ -28,6 +30,17 @@
         this.idx = idx;
     }
 
+    /**
+     * Reverses the bound if the column type is a reversed one.
+     *
+     * @param columnDefinition the column definition
+     * @return the reversed bound if the column type is a reversed one, this bound otherwise
+     */
+    public Bound reverseIfNeeded(ColumnDefinition columnDefinition)
+    {
+        return columnDefinition.isReversedType() ? reverse() : this;
+    }
+
     public Bound reverse()
     {
         return isStart() ? END : START;
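
Illustrative sketch, not part of the patch, where def is an assumed clustering ColumnDefinition: for a column declared WITH CLUSTERING ORDER BY (... DESC), a START bound at the CQL level maps to an END bound on the underlying comparator, and vice versa.

    Bound lower = Bound.START.reverseIfNeeded(def);   // END if def.isReversedType(), otherwise START
    Bound upper = Bound.END.reverseIfNeeded(def);     // START if def.isReversedType(), otherwise END
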
diff --git a/src/java/org/apache/cassandra/cql3/statements/CQL3CasRequest.java b/src/java/org/apache/cassandra/cql3/statements/CQL3CasRequest.java
index 9564005..93844b3 100644
--- a/src/java/org/apache/cassandra/cql3/statements/CQL3CasRequest.java
+++ b/src/java/org/apache/cassandra/cql3/statements/CQL3CasRequest.java
@@ -175,10 +175,6 @@
             upd.applyUpdates(current, update);
 
         Keyspace.openAndGetStore(cfm).indexManager.validate(update);
-
-        if (isBatch)
-            BatchStatement.verifyBatchSize(Collections.singleton(update));
-
         return update;
     }
 
diff --git a/src/java/org/apache/cassandra/cql3/statements/CreateIndexStatement.java b/src/java/org/apache/cassandra/cql3/statements/CreateIndexStatement.java
index 2eebe0d..f899247 100644
--- a/src/java/org/apache/cassandra/cql3/statements/CreateIndexStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/CreateIndexStatement.java
@@ -105,15 +105,6 @@
             if (cfm.isCompactTable() && cd.isPrimaryKeyColumn())
                 throw new InvalidRequestException("Secondary indexes are not supported on PRIMARY KEY columns in COMPACT STORAGE tables");
 
-            // It would be possible to support 2ndary index on static columns (but not without modifications of at least ExtendedFilter and
-            // CompositesIndex) and maybe we should, but that means a query like:
-            //     SELECT * FROM foo WHERE static_column = 'bar'
-            // would pull the full partition every time the static column of partition is 'bar', which sounds like offering a
-            // fair potential for foot-shooting, so I prefer leaving that to a follow up ticket once we have identified cases where
-            // such indexing is actually useful.
-            if (!cfm.isCompactTable() && cd.isStatic())
-                throw new InvalidRequestException("Secondary indexes are not allowed on static columns");
-
             if (cd.kind == ColumnDefinition.Kind.PARTITION_KEY && cfm.getKeyValidatorAsClusteringComparator().size() == 1)
                 throw new InvalidRequestException(String.format("Cannot create secondary index on partition key column %s", target.column));
 
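
Illustrative sketch, assumed schema rather than anything in this hunk: with the check above removed, CREATE INDEX on a static column of a regular (non-compact) table is no longer rejected during validation.

    // assumed table: CREATE TABLE ks.t (pk int, ck int, s text STATIC, v int, PRIMARY KEY (pk, ck))
    // "CREATE INDEX ON ks.t (s)" previously failed with
    // "Secondary indexes are not allowed on static columns"; it now passes this check.
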
diff --git a/src/java/org/apache/cassandra/cql3/statements/CreateKeyspaceStatement.java b/src/java/org/apache/cassandra/cql3/statements/CreateKeyspaceStatement.java
index 3eb0ac9..f88c04f 100644
--- a/src/java/org/apache/cassandra/cql3/statements/CreateKeyspaceStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/CreateKeyspaceStatement.java
@@ -17,6 +17,8 @@
  */
 package org.apache.cassandra.cql3.statements;
 
+import java.util.regex.Pattern;
+
 import org.apache.cassandra.auth.*;
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.config.Schema;
@@ -31,6 +33,8 @@
 /** A <code>CREATE KEYSPACE</code> statement parsed from a CQL query. */
 public class CreateKeyspaceStatement extends SchemaAlteringStatement
 {
+    private static final Pattern PATTERN_WORD_CHARS = Pattern.compile("\\w+");
+
     private final String name;
     private final KeyspaceAttributes attrs;
     private final boolean ifNotExists;
@@ -73,7 +77,7 @@
         ThriftValidation.validateKeyspaceNotSystem(name);
 
         // keyspace name
-        if (!name.matches("\\w+"))
+        if (!PATTERN_WORD_CHARS.matcher(name).matches())
             throw new InvalidRequestException(String.format("\"%s\" is not a valid keyspace name", name));
         if (name.length() > Schema.NAME_LENGTH)
             throw new InvalidRequestException(String.format("Keyspace names shouldn't be more than %s characters long (got \"%s\")", Schema.NAME_LENGTH, name));
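
Illustrative example, assumed inputs: the precompiled pattern matches exactly what the old String.matches("\w+") call did, it just avoids recompiling the regex on every CREATE KEYSPACE.

    PATTERN_WORD_CHARS.matcher("my_keyspace_1").matches(); // true: letters, digits and underscore only
    PATTERN_WORD_CHARS.matcher("my-keyspace").matches();   // false, so the statement is rejected as
                                                           // not a valid keyspace name
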
diff --git a/src/java/org/apache/cassandra/cql3/statements/CreateTableStatement.java b/src/java/org/apache/cassandra/cql3/statements/CreateTableStatement.java
index 04f76d3..08c3a4c 100644
--- a/src/java/org/apache/cassandra/cql3/statements/CreateTableStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/CreateTableStatement.java
@@ -19,7 +19,7 @@
 
 import java.nio.ByteBuffer;
 import java.util.*;
-
+import java.util.regex.Pattern;
 import com.google.common.collect.HashMultiset;
 import com.google.common.collect.Multiset;
 import org.apache.commons.lang3.StringUtils;
@@ -41,10 +41,12 @@
 /** A {@code CREATE TABLE} parsed from a CQL query statement. */
 public class CreateTableStatement extends SchemaAlteringStatement
 {
+    private static final Pattern PATTERN_WORD_CHARS = Pattern.compile("\\w+");
+
     private List<AbstractType<?>> keyTypes;
     private List<AbstractType<?>> clusteringTypes;
 
-    private final Map<ByteBuffer, CollectionType> collections = new HashMap<>();
+    private final Map<ByteBuffer, AbstractType> multicellColumns = new HashMap<>();
 
     private final List<ColumnIdentifier> keyAliases = new ArrayList<>();
     private final List<ColumnIdentifier> columnAliases = new ArrayList<>();
@@ -202,7 +204,7 @@
         public ParsedStatement.Prepared prepare(Types udts) throws RequestValidationException
         {
             // Column family name
-            if (!columnFamily().matches("\\w+"))
+            if (!PATTERN_WORD_CHARS.matcher(columnFamily()).matches())
                 throw new InvalidRequestException(String.format("\"%s\" is not a valid table name (must be alphanumeric character or underscore only: [a-zA-Z_0-9]+)", columnFamily()));
             if (columnFamily().length() > Schema.NAME_LENGTH)
                 throw new InvalidRequestException(String.format("Table names shouldn't be more than %s characters long (got \"%s\")", Schema.NAME_LENGTH, columnFamily()));
@@ -221,10 +223,24 @@
             {
                 ColumnIdentifier id = entry.getKey();
                 CQL3Type pt = entry.getValue().prepare(keyspace(), udts);
-                if (pt.isCollection() && ((CollectionType)pt.getType()).isMultiCell())
-                    stmt.collections.put(id.bytes, (CollectionType)pt.getType());
+                if (pt.getType().isMultiCell())
+                    stmt.multicellColumns.put(id.bytes, pt.getType());
                 if (entry.getValue().isCounter())
                     stmt.hasCounters = true;
+
+                // check for non-frozen UDTs or collections in a non-frozen UDT
+                if (pt.getType().isUDT() && pt.getType().isMultiCell())
+                {
+                    for (AbstractType<?> innerType : ((UserType) pt.getType()).fieldTypes())
+                    {
+                        if (innerType.isMultiCell())
+                        {
+                            assert innerType.isCollection();  // shouldn't get this far with a nested non-frozen UDT
+                            throw new InvalidRequestException("Non-frozen UDTs with nested non-frozen collections are not supported");
+                        }
+                    }
+                }
+
                 stmt.columns.put(id, pt.getType()); // we'll remove what is not a column below
             }
 
@@ -283,8 +299,8 @@
             // For COMPACT STORAGE, we reject any "feature" that we wouldn't be able to translate back to thrift.
             if (useCompactStorage)
             {
-                if (!stmt.collections.isEmpty())
-                    throw new InvalidRequestException("Non-frozen collection types are not supported with COMPACT STORAGE");
+                if (!stmt.multicellColumns.isEmpty())
+                    throw new InvalidRequestException("Non-frozen collections and UDTs are not supported with COMPACT STORAGE");
                 if (!staticColumns.isEmpty())
                     throw new InvalidRequestException("Static columns are not supported in COMPACT STORAGE tables");
 
@@ -348,8 +364,13 @@
             AbstractType type = columns.get(t);
             if (type == null)
                 throw new InvalidRequestException(String.format("Unknown definition %s referenced in PRIMARY KEY", t));
-            if (type.isCollection() && type.isMultiCell())
-                throw new InvalidRequestException(String.format("Invalid collection type for PRIMARY KEY component %s", t));
+            if (type.isMultiCell())
+            {
+                if (type.isCollection())
+                    throw new InvalidRequestException(String.format("Invalid non-frozen collection type for PRIMARY KEY component %s", t));
+                else
+                    throw new InvalidRequestException(String.format("Invalid non-frozen user-defined type for PRIMARY KEY component %s", t));
+            }
 
             columns.remove(t);
             Boolean isReversed = properties.definedOrdering.get(t);
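
Illustrative sketch, assumed schema rather than anything in this hunk: the collections-only bookkeeping is generalized to any multi-cell column, so non-frozen UDT columns now hit the same restrictions as non-frozen collections.

    // assumed UDT: CREATE TYPE ks.addr (street text, city text)
    // "CREATE TABLE ks.t (u addr PRIMARY KEY, v int)"
    //   -> "Invalid non-frozen user-defined type for PRIMARY KEY component u"
    // "CREATE TABLE ks.t (pk int PRIMARY KEY, u addr) WITH COMPACT STORAGE"
    //   -> "Non-frozen collections and UDTs are not supported with COMPACT STORAGE"
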
diff --git a/src/java/org/apache/cassandra/cql3/statements/CreateTypeStatement.java b/src/java/org/apache/cassandra/cql3/statements/CreateTypeStatement.java
index f62b9ea..6f4331b 100644
--- a/src/java/org/apache/cassandra/cql3/statements/CreateTypeStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/CreateTypeStatement.java
@@ -19,6 +19,7 @@
 
 import java.nio.ByteBuffer;
 import java.util.*;
+import java.util.stream.Collectors;
 
 import org.apache.cassandra.auth.Permission;
 import org.apache.cassandra.config.*;
@@ -28,6 +29,7 @@
 import org.apache.cassandra.db.marshal.UserType;
 import org.apache.cassandra.exceptions.*;
 import org.apache.cassandra.schema.KeyspaceMetadata;
+import org.apache.cassandra.schema.Types;
 import org.apache.cassandra.service.ClientState;
 import org.apache.cassandra.service.MigrationManager;
 import org.apache.cassandra.transport.Event;
@@ -35,7 +37,7 @@
 public class CreateTypeStatement extends SchemaAlteringStatement
 {
     private final UTName name;
-    private final List<ColumnIdentifier> columnNames = new ArrayList<>();
+    private final List<FieldIdentifier> columnNames = new ArrayList<>();
     private final List<CQL3Type.Raw> columnTypes = new ArrayList<>();
     private final boolean ifNotExists;
 
@@ -53,7 +55,7 @@
             name.setKeyspace(state.getKeyspace());
     }
 
-    public void addDefinition(ColumnIdentifier name, CQL3Type.Raw type)
+    public void addDefinition(FieldIdentifier name, CQL3Type.Raw type)
     {
         columnNames.add(name);
         columnTypes.add(type);
@@ -74,42 +76,47 @@
             throw new InvalidRequestException(String.format("A user type of name %s already exists", name));
 
         for (CQL3Type.Raw type : columnTypes)
+        {
             if (type.isCounter())
                 throw new InvalidRequestException("A user type cannot contain counters");
+            if (type.isUDT() && !type.isFrozen())
+                throw new InvalidRequestException("A user type cannot contain non-frozen UDTs");
+        }
     }
 
     public static void checkForDuplicateNames(UserType type) throws InvalidRequestException
     {
         for (int i = 0; i < type.size() - 1; i++)
         {
-            ByteBuffer fieldName = type.fieldName(i);
+            FieldIdentifier fieldName = type.fieldName(i);
             for (int j = i+1; j < type.size(); j++)
             {
                 if (fieldName.equals(type.fieldName(j)))
-                    throw new InvalidRequestException(String.format("Duplicate field name %s in type %s",
-                                                                    UTF8Type.instance.getString(fieldName),
-                                                                    UTF8Type.instance.getString(type.name)));
+                    throw new InvalidRequestException(String.format("Duplicate field name %s in type %s", fieldName, type.name));
             }
         }
     }
 
+    public void addToRawBuilder(Types.RawBuilder builder) throws InvalidRequestException
+    {
+        builder.add(name.getStringTypeName(),
+                    columnNames.stream().map(FieldIdentifier::toString).collect(Collectors.toList()),
+                    columnTypes.stream().map(CQL3Type.Raw::toString).collect(Collectors.toList()));
+    }
+
     @Override
     public String keyspace()
     {
         return name.getKeyspace();
     }
 
-    private UserType createType() throws InvalidRequestException
+    public UserType createType() throws InvalidRequestException
     {
-        List<ByteBuffer> names = new ArrayList<>(columnNames.size());
-        for (ColumnIdentifier name : columnNames)
-            names.add(name.bytes);
-
         List<AbstractType<?>> types = new ArrayList<>(columnTypes.size());
         for (CQL3Type.Raw type : columnTypes)
             types.add(type.prepare(keyspace()).getType());
 
-        return new UserType(name.getKeyspace(), name.getUserTypeName(), names, types);
+        return new UserType(name.getKeyspace(), name.getUserTypeName(), columnNames, types, true);
     }
 
     public Event.SchemaChange announceMigration(boolean isLocalOnly) throws InvalidRequestException, ConfigurationException
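
The new loop in validate() above rejects UDTs nested non-frozen inside another UDT. A minimal CQL illustration (hypothetical names):

    CREATE TYPE ks.inner_t (a int);
    CREATE TYPE ks.outer_bad (i inner_t);         -- rejected: "A user type cannot contain non-frozen UDTs"
    CREATE TYPE ks.outer_ok (i frozen<inner_t>);  -- accepted
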
diff --git a/src/java/org/apache/cassandra/cql3/statements/CreateViewStatement.java b/src/java/org/apache/cassandra/cql3/statements/CreateViewStatement.java
index 13e528c..013adbc 100644
--- a/src/java/org/apache/cassandra/cql3/statements/CreateViewStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/CreateViewStatement.java
@@ -51,8 +51,8 @@
     private final CFName baseName;
     private final List<RawSelector> selectClause;
     private final WhereClause whereClause;
-    private final List<ColumnIdentifier.Raw> partitionKeys;
-    private final List<ColumnIdentifier.Raw> clusteringKeys;
+    private final List<ColumnDefinition.Raw> partitionKeys;
+    private final List<ColumnDefinition.Raw> clusteringKeys;
     public final CFProperties properties = new CFProperties();
     private final boolean ifNotExists;
 
@@ -60,8 +60,8 @@
                                CFName baseName,
                                List<RawSelector> selectClause,
                                WhereClause whereClause,
-                               List<ColumnIdentifier.Raw> partitionKeys,
-                               List<ColumnIdentifier.Raw> clusteringKeys,
+                               List<ColumnDefinition.Raw> partitionKeys,
+                               List<ColumnDefinition.Raw> clusteringKeys,
                                boolean ifNotExists)
     {
         super(viewName);
@@ -159,30 +159,26 @@
                 throw new InvalidRequestException("Cannot use function when defining a materialized view");
             if (selectable instanceof Selectable.WritetimeOrTTL.Raw)
                 throw new InvalidRequestException("Cannot use function when defining a materialized view");
-            ColumnIdentifier identifier = (ColumnIdentifier) selectable.prepare(cfm);
             if (selector.alias != null)
-                throw new InvalidRequestException(String.format("Cannot alias column '%s' as '%s' when defining a materialized view", identifier.toString(), selector.alias.toString()));
+                throw new InvalidRequestException("Cannot use alias when defining a materialized view");
 
-            ColumnDefinition cdef = cfm.getColumnDefinition(identifier);
+            Selectable s = selectable.prepare(cfm);
+            if (s instanceof Term.Raw)
+                throw new InvalidRequestException("Cannot use terms in selection when defining a materialized view");
 
-            if (cdef == null)
-                throw new InvalidRequestException("Unknown column name detected in CREATE MATERIALIZED VIEW statement : "+identifier);
-
-            included.add(identifier);
+            ColumnDefinition cdef = (ColumnDefinition)s;
+            included.add(cdef.name);
         }
 
-        Set<ColumnIdentifier.Raw> targetPrimaryKeys = new HashSet<>();
-        for (ColumnIdentifier.Raw identifier : Iterables.concat(partitionKeys, clusteringKeys))
+        Set<ColumnDefinition.Raw> targetPrimaryKeys = new HashSet<>();
+        for (ColumnDefinition.Raw identifier : Iterables.concat(partitionKeys, clusteringKeys))
         {
             if (!targetPrimaryKeys.add(identifier))
                 throw new InvalidRequestException("Duplicate entry found in PRIMARY KEY: "+identifier);
 
-            ColumnDefinition cdef = cfm.getColumnDefinition(identifier.prepare(cfm));
+            ColumnDefinition cdef = identifier.prepare(cfm);
 
-            if (cdef == null)
-                throw new InvalidRequestException("Unknown column name detected in CREATE MATERIALIZED VIEW statement : "+identifier);
-
-            if (cfm.getColumnDefinition(identifier.prepare(cfm)).type.isMultiCell())
+            if (cdef.type.isMultiCell())
                 throw new InvalidRequestException(String.format("Cannot use MultiCell column '%s' in PRIMARY KEY of materialized view", identifier));
 
             if (cdef.isStatic())
@@ -190,9 +186,9 @@
         }
 
         // build the select statement
-        Map<ColumnIdentifier.Raw, Boolean> orderings = Collections.emptyMap();
+        Map<ColumnDefinition.Raw, Boolean> orderings = Collections.emptyMap();
         SelectStatement.Parameters parameters = new SelectStatement.Parameters(orderings, false, true, false);
-        SelectStatement.RawStatement rawSelect = new SelectStatement.RawStatement(baseName, parameters, selectClause, whereClause, null);
+        SelectStatement.RawStatement rawSelect = new SelectStatement.RawStatement(baseName, parameters, selectClause, whereClause, null, null);
 
         ClientState state = ClientState.forInternalCalls();
         state.setKeyspace(keyspace());
@@ -226,10 +222,10 @@
 
         // This is only used as an intermediate state; this is to catch whether multiple non-PK columns are used
         boolean hasNonPKColumn = false;
-        for (ColumnIdentifier.Raw raw : partitionKeys)
+        for (ColumnDefinition.Raw raw : partitionKeys)
             hasNonPKColumn |= getColumnIdentifier(cfm, basePrimaryKeyCols, hasNonPKColumn, raw, targetPartitionKeys, restrictions);
 
-        for (ColumnIdentifier.Raw raw : clusteringKeys)
+        for (ColumnDefinition.Raw raw : clusteringKeys)
             hasNonPKColumn |= getColumnIdentifier(cfm, basePrimaryKeyCols, hasNonPKColumn, raw, targetClusteringColumns, restrictions);
 
         // We need to include all of the primary key columns from the base table in order to make sure that we do not
@@ -250,13 +246,16 @@
                 throw new InvalidRequestException(String.format("Unable to include static column '%s' which would be included by Materialized View SELECT * statement", identifier));
             }
 
-            if (includeDef && !targetClusteringColumns.contains(identifier) && !targetPartitionKeys.contains(identifier))
+            boolean defInTargetPrimaryKey = targetClusteringColumns.contains(identifier)
+                                            || targetPartitionKeys.contains(identifier);
+
+            if (includeDef && !defInTargetPrimaryKey)
             {
                 includedColumns.add(identifier);
             }
             if (!def.isPrimaryKeyColumn()) continue;
 
-            if (!targetClusteringColumns.contains(identifier) && !targetPartitionKeys.contains(identifier))
+            if (!defInTargetPrimaryKey)
             {
                 if (missingClusteringColumns)
                     columnNames.append(',');
@@ -307,25 +306,24 @@
     private static boolean getColumnIdentifier(CFMetaData cfm,
                                                Set<ColumnIdentifier> basePK,
                                                boolean hasNonPKColumn,
-                                               ColumnIdentifier.Raw raw,
+                                               ColumnDefinition.Raw raw,
                                                List<ColumnIdentifier> columns,
                                                StatementRestrictions restrictions)
     {
-        ColumnIdentifier identifier = raw.prepare(cfm);
-        ColumnDefinition def = cfm.getColumnDefinition(identifier);
+        ColumnDefinition def = raw.prepare(cfm);
 
-        boolean isPk = basePK.contains(identifier);
+        boolean isPk = basePK.contains(def.name);
         if (!isPk && hasNonPKColumn)
-            throw new InvalidRequestException(String.format("Cannot include more than one non-primary key column '%s' in materialized view primary key", identifier));
+            throw new InvalidRequestException(String.format("Cannot include more than one non-primary key column '%s' in materialized view primary key", def.name));
 
         // We don't need to include the "IS NOT NULL" filter on a non-composite partition key
         // because we will never allow a single partition key to be NULL
-        boolean isSinglePartitionKey = cfm.getColumnDefinition(identifier).isPartitionKey()
+        boolean isSinglePartitionKey = def.isPartitionKey()
                                        && cfm.partitionKeyColumns().size() == 1;
         if (!isSinglePartitionKey && !restrictions.isRestricted(def))
-            throw new InvalidRequestException(String.format("Primary key column '%s' is required to be filtered by 'IS NOT NULL'", identifier));
+            throw new InvalidRequestException(String.format("Primary key column '%s' is required to be filtered by 'IS NOT NULL'", def.name));
 
-        columns.add(identifier);
+        columns.add(def.name);
         return !isPk;
     }
 }
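
The CreateViewStatement changes are a refactor to ColumnDefinition.Raw; the user-facing restrictions they check are unchanged. A CQL sketch of a view that satisfies them, against a hypothetical schema:

    CREATE TABLE ks.users (id int PRIMARY KEY, email text, tags set<text>);

    CREATE MATERIALIZED VIEW ks.users_by_email AS
        SELECT id, email FROM ks.users
        WHERE email IS NOT NULL AND id IS NOT NULL   -- every view PRIMARY KEY column must be filtered by IS NOT NULL
        PRIMARY KEY (email, id);

    -- Rejected: 'tags' is multi-cell, so it cannot be a view PRIMARY KEY component
    CREATE MATERIALIZED VIEW ks.users_by_tags AS
        SELECT id FROM ks.users
        WHERE tags IS NOT NULL AND id IS NOT NULL
        PRIMARY KEY (tags, id);
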
diff --git a/src/java/org/apache/cassandra/cql3/statements/DeleteStatement.java b/src/java/org/apache/cassandra/cql3/statements/DeleteStatement.java
index 4888b43..26b25de 100644
--- a/src/java/org/apache/cassandra/cql3/statements/DeleteStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/DeleteStatement.java
@@ -122,7 +122,7 @@
                       Attributes.Raw attrs,
                       List<Operation.RawDeletion> deletions,
                       WhereClause whereClause,
-                      List<Pair<ColumnIdentifier.Raw, ColumnCondition.Raw>> conditions,
+                      List<Pair<ColumnDefinition.Raw, ColumnCondition.Raw>> conditions,
                       boolean ifExists)
         {
             super(name, StatementType.DELETE, attrs, conditions, false, ifExists);
@@ -147,7 +147,7 @@
                 // list. However, we support having the value name for coherence with the static/sparse case
                 checkFalse(def.isPrimaryKeyColumn(), "Invalid identifier %s for deletion (should not be a PRIMARY KEY part)", def.name);
 
-                Operation op = deletion.prepare(cfm.ksName, def);
+                Operation op = deletion.prepare(cfm.ksName, def, cfm);
                 op.collectMarkerSpecification(boundNames);
                 operations.add(op);
             }
diff --git a/src/java/org/apache/cassandra/cql3/statements/DropKeyspaceStatement.java b/src/java/org/apache/cassandra/cql3/statements/DropKeyspaceStatement.java
index 513ff1b..a08b193 100644
--- a/src/java/org/apache/cassandra/cql3/statements/DropKeyspaceStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/DropKeyspaceStatement.java
@@ -17,10 +17,10 @@
  */
 package org.apache.cassandra.cql3.statements;
 
+import org.apache.cassandra.auth.Permission;
 import org.apache.cassandra.exceptions.ConfigurationException;
 import org.apache.cassandra.exceptions.InvalidRequestException;
 import org.apache.cassandra.exceptions.RequestValidationException;
-import org.apache.cassandra.auth.Permission;
 import org.apache.cassandra.exceptions.UnauthorizedException;
 import org.apache.cassandra.service.ClientState;
 import org.apache.cassandra.service.MigrationManager;
diff --git a/src/java/org/apache/cassandra/cql3/statements/IndexTarget.java b/src/java/org/apache/cassandra/cql3/statements/IndexTarget.java
index 8cdf2c8..9756a4c 100644
--- a/src/java/org/apache/cassandra/cql3/statements/IndexTarget.java
+++ b/src/java/org/apache/cassandra/cql3/statements/IndexTarget.java
@@ -68,36 +68,36 @@
 
     public static class Raw
     {
-        private final ColumnIdentifier.Raw column;
+        private final ColumnDefinition.Raw column;
         private final Type type;
 
-        private Raw(ColumnIdentifier.Raw column, Type type)
+        private Raw(ColumnDefinition.Raw column, Type type)
         {
             this.column = column;
             this.type = type;
         }
 
-        public static Raw simpleIndexOn(ColumnIdentifier.Raw c)
+        public static Raw simpleIndexOn(ColumnDefinition.Raw c)
         {
             return new Raw(c, Type.SIMPLE);
         }
 
-        public static Raw valuesOf(ColumnIdentifier.Raw c)
+        public static Raw valuesOf(ColumnDefinition.Raw c)
         {
             return new Raw(c, Type.VALUES);
         }
 
-        public static Raw keysOf(ColumnIdentifier.Raw c)
+        public static Raw keysOf(ColumnDefinition.Raw c)
         {
             return new Raw(c, Type.KEYS);
         }
 
-        public static Raw keysAndValuesOf(ColumnIdentifier.Raw c)
+        public static Raw keysAndValuesOf(ColumnDefinition.Raw c)
         {
             return new Raw(c, Type.KEYS_AND_VALUES);
         }
 
-        public static Raw fullCollection(ColumnIdentifier.Raw c)
+        public static Raw fullCollection(ColumnDefinition.Raw c)
         {
             return new Raw(c, Type.FULL);
         }
@@ -109,13 +109,9 @@
             // same syntax as an index on a regular column (i.e. the 'values' in
             // 'CREATE INDEX on table(values(collection));' is optional). So we correct the target type
             // when the target column is a collection & the target type is SIMPLE.
-            ColumnIdentifier colId = column.prepare(cfm);
-            ColumnDefinition columnDef = cfm.getColumnDefinition(colId);
-            if (columnDef == null)
-                throw new InvalidRequestException("No column definition found for column " + colId);
-
+            ColumnDefinition columnDef = column.prepare(cfm);
             Type actualType = (type == Type.SIMPLE && columnDef.type.isCollection()) ? Type.VALUES : type;
-            return new IndexTarget(colId, actualType);
+            return new IndexTarget(columnDef.name, actualType);
         }
     }
 
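
The pre-existing comment in Raw.prepare() notes that the shorthand index syntax on a collection column is normalized to a VALUES target. In CQL terms (hypothetical schema):

    CREATE TABLE ks.t (k int PRIMARY KEY, m map<text, int>);

    CREATE INDEX m_values ON ks.t (m);            -- shorthand, treated as values(m) for the non-frozen map
    CREATE INDEX m_keys ON ks.t (keys(m));        -- keys must be requested explicitly
    CREATE INDEX m_entries ON ks.t (entries(m));  -- likewise for full key/value entries
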
diff --git a/src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java b/src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java
index 01c2ad1..8d85498 100644
--- a/src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/ModificationStatement.java
@@ -27,9 +27,9 @@
 import org.apache.cassandra.auth.Permission;
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.config.ColumnDefinition.Raw;
 import org.apache.cassandra.config.ViewDefinition;
 import org.apache.cassandra.cql3.*;
-import org.apache.cassandra.cql3.ColumnIdentifier.Raw;
 import org.apache.cassandra.cql3.functions.Function;
 import org.apache.cassandra.cql3.restrictions.StatementRestrictions;
 import org.apache.cassandra.cql3.selection.Selection;
@@ -154,6 +154,14 @@
         conditions.addFunctionsTo(functions);
     }
 
+    /*
+     * May be used by QueryHandler implementations
+     */
+    public StatementRestrictions getRestrictions()
+    {
+        return restrictions;
+    }
+
     public abstract void addUpdateForKey(PartitionUpdate update, Clustering clustering, UpdateParameters params);
 
     public abstract void addUpdateForKey(PartitionUpdate update, Slice slice, UpdateParameters params);
@@ -195,26 +203,26 @@
 
     public int getTimeToLive(QueryOptions options) throws InvalidRequestException
     {
-        return attrs.getTimeToLive(options);
+        return attrs.getTimeToLive(options, cfm.params.defaultTimeToLive);
     }
 
     public void checkAccess(ClientState state) throws InvalidRequestException, UnauthorizedException
     {
-        state.hasColumnFamilyAccess(keyspace(), columnFamily(), Permission.MODIFY);
+        state.hasColumnFamilyAccess(cfm, Permission.MODIFY);
 
         // CAS updates can be used to simulate a SELECT query, so should require Permission.SELECT as well.
         if (hasConditions())
-            state.hasColumnFamilyAccess(keyspace(), columnFamily(), Permission.SELECT);
+            state.hasColumnFamilyAccess(cfm, Permission.SELECT);
 
         // MV updates need to get the current state from the table, and might update the views
         // Require Permission.SELECT on the base table, and Permission.MODIFY on the views
         Iterator<ViewDefinition> views = View.findAll(keyspace(), columnFamily()).iterator();
         if (views.hasNext())
         {
-            state.hasColumnFamilyAccess(keyspace(), columnFamily(), Permission.SELECT);
+            state.hasColumnFamilyAccess(cfm, Permission.SELECT);
             do
             {
-                state.hasColumnFamilyAccess(keyspace(), views.next().viewName, Permission.MODIFY);
+                state.hasColumnFamilyAccess(views.next().metadata, Permission.MODIFY);
             } while (views.hasNext());
         }
 
@@ -367,7 +375,8 @@
 
         if (local)
         {
-            try (ReadOrderGroup orderGroup = group.startOrderGroup(); PartitionIterator iter = group.executeInternal(orderGroup))
+            try (ReadExecutionController executionController = group.executionController();
+                 PartitionIterator iter = group.executeInternal(executionController))
             {
                 return asMaterializedMap(iter);
             }
@@ -540,10 +549,10 @@
 
         }
 
-        Selection.ResultSetBuilder builder = selection.resultSetBuilder(false);
+        Selection.ResultSetBuilder builder = selection.resultSetBuilder(options, false);
         SelectStatement.forSelection(cfm, selection).processPartition(partition, options, builder, FBUtilities.nowInSeconds());
 
-        return builder.build(options.getProtocolVersion());
+        return builder.build();
     }
 
     public ResultMessage executeInternal(QueryState queryState, QueryOptions options) throws RequestValidationException, RequestExecutionException
@@ -556,14 +565,7 @@
     public ResultMessage executeInternalWithoutCondition(QueryState queryState, QueryOptions options) throws RequestValidationException, RequestExecutionException
     {
         for (IMutation mutation : getMutations(options, true, queryState.getTimestamp()))
-        {
-            assert mutation instanceof Mutation || mutation instanceof CounterMutation;
-
-            if (mutation instanceof Mutation)
-                ((Mutation) mutation).apply();
-            else if (mutation instanceof CounterMutation)
-                ((CounterMutation) mutation).apply();
-        }
+            mutation.apply();
         return null;
     }
 
@@ -582,7 +584,8 @@
 
         SinglePartitionReadCommand readCommand = request.readCommand(FBUtilities.nowInSeconds());
         FilteredPartition current;
-        try (ReadOrderGroup orderGroup = readCommand.startOrderGroup(); PartitionIterator iter = readCommand.executeInternal(orderGroup))
+        try (ReadExecutionController executionController = readCommand.executionController();
+             PartitionIterator iter = readCommand.executeInternal(executionController))
         {
             current = FilteredPartition.create(PartitionIterators.getOnlyElement(iter, readCommand));
         }
@@ -688,8 +691,8 @@
 
     private Slices createSlice(QueryOptions options)
     {
-        SortedSet<Slice.Bound> startBounds = restrictions.getClusteringColumnsBounds(Bound.START, options);
-        SortedSet<Slice.Bound> endBounds = restrictions.getClusteringColumnsBounds(Bound.END, options);
+        SortedSet<ClusteringBound> startBounds = restrictions.getClusteringColumnsBounds(Bound.START, options);
+        SortedSet<ClusteringBound> endBounds = restrictions.getClusteringColumnsBounds(Bound.END, options);
 
         return toSlices(startBounds, endBounds);
     }
@@ -728,14 +731,14 @@
         return new UpdateParameters(cfm, updatedColumns(), options, getTimestamp(now, options), getTimeToLive(options), lists);
     }
 
-    private Slices toSlices(SortedSet<Slice.Bound> startBounds, SortedSet<Slice.Bound> endBounds)
+    private Slices toSlices(SortedSet<ClusteringBound> startBounds, SortedSet<ClusteringBound> endBounds)
     {
         assert startBounds.size() == endBounds.size();
 
         Slices.Builder builder = new Slices.Builder(cfm.comparator);
 
-        Iterator<Slice.Bound> starts = startBounds.iterator();
-        Iterator<Slice.Bound> ends = endBounds.iterator();
+        Iterator<ClusteringBound> starts = startBounds.iterator();
+        Iterator<ClusteringBound> ends = endBounds.iterator();
 
         while (starts.hasNext())
         {
@@ -753,21 +756,21 @@
     {
         protected final StatementType type;
         private final Attributes.Raw attrs;
-        private final List<Pair<ColumnIdentifier.Raw, ColumnCondition.Raw>> conditions;
+        private final List<Pair<ColumnDefinition.Raw, ColumnCondition.Raw>> conditions;
         private final boolean ifNotExists;
         private final boolean ifExists;
 
         protected Parsed(CFName name,
                          StatementType type,
                          Attributes.Raw attrs,
-                         List<Pair<ColumnIdentifier.Raw, ColumnCondition.Raw>> conditions,
+                         List<Pair<ColumnDefinition.Raw, ColumnCondition.Raw>> conditions,
                          boolean ifNotExists,
                          boolean ifExists)
         {
             super(name);
             this.type = type;
             this.attrs = attrs;
-            this.conditions = conditions == null ? Collections.<Pair<ColumnIdentifier.Raw, ColumnCondition.Raw>>emptyList() : conditions;
+            this.conditions = conditions == null ? Collections.<Pair<ColumnDefinition.Raw, ColumnCondition.Raw>>emptyList() : conditions;
             this.ifNotExists = ifNotExists;
             this.ifExists = ifExists;
         }
@@ -839,16 +842,13 @@
 
             ColumnConditions.Builder builder = ColumnConditions.newBuilder();
 
-            for (Pair<ColumnIdentifier.Raw, ColumnCondition.Raw> entry : conditions)
+            for (Pair<ColumnDefinition.Raw, ColumnCondition.Raw> entry : conditions)
             {
-                ColumnIdentifier id = entry.left.prepare(metadata);
-                ColumnDefinition def = metadata.getColumnDefinition(id);
-                checkNotNull(metadata.getColumnDefinition(id), "Unknown identifier %s in IF conditions", id);
-
-                ColumnCondition condition = entry.right.prepare(keyspace(), def);
+                ColumnDefinition def = entry.left.prepare(metadata);
+                ColumnCondition condition = entry.right.prepare(keyspace(), def, metadata);
                 condition.collectMarkerSpecification(boundNames);
 
-                checkFalse(def.isPrimaryKeyColumn(), "PRIMARY KEY column '%s' cannot have IF conditions", id);
+                checkFalse(def.isPrimaryKeyColumn(), "PRIMARY KEY column '%s' cannot have IF conditions", def.name);
                 builder.add(condition);
             }
             return builder.build();
@@ -891,8 +891,7 @@
          */
         protected static ColumnDefinition getColumnDefinition(CFMetaData cfm, Raw rawId)
         {
-            ColumnIdentifier id = rawId.prepare(cfm);
-            return checkNotNull(cfm.getColumnDefinition(id), "Unknown identifier %s", id);
+            return rawId.prepare(cfm);
         }
     }
 }
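
Among the ModificationStatement changes, getTimeToLive() now receives the table's default_time_to_live, so TTL resolution can fall back to the table option. As a CQL reminder of that option (illustrative; the fallback behavior is as the option is documented, not something this hunk alone shows):

    CREATE TABLE ks.events (id int PRIMARY KEY, v text) WITH default_time_to_live = 86400;

    INSERT INTO ks.events (id, v) VALUES (1, 'a');                -- expires after the table default (one day)
    INSERT INTO ks.events (id, v) VALUES (2, 'b') USING TTL 60;   -- an explicit TTL overrides the default
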
diff --git a/src/java/org/apache/cassandra/cql3/statements/ParsedStatement.java b/src/java/org/apache/cassandra/cql3/statements/ParsedStatement.java
index 4c3f8a9..c7cbb58 100644
--- a/src/java/org/apache/cassandra/cql3/statements/ParsedStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/ParsedStatement.java
@@ -48,18 +48,27 @@
 
     public static class Prepared
     {
+        /**
+         * Contains the CQL statement source if the statement has been "regularly" prepared via
+         * {@link org.apache.cassandra.cql3.QueryProcessor#prepare(java.lang.String, org.apache.cassandra.service.ClientState, boolean)} /
+         * {@link QueryHandler#prepare(java.lang.String, org.apache.cassandra.service.QueryState, java.util.Map)}.
+         * Other usages of this class may or may not contain the CQL statement source.
+         */
+        public String rawCQLStatement;
+
         public final CQLStatement statement;
         public final List<ColumnSpecification> boundNames;
-        public final Short[] partitionKeyBindIndexes;
+        public final short[] partitionKeyBindIndexes;
 
-        protected Prepared(CQLStatement statement, List<ColumnSpecification> boundNames, Short[] partitionKeyBindIndexes)
+        protected Prepared(CQLStatement statement, List<ColumnSpecification> boundNames, short[] partitionKeyBindIndexes)
         {
             this.statement = statement;
             this.boundNames = boundNames;
             this.partitionKeyBindIndexes = partitionKeyBindIndexes;
+            this.rawCQLStatement = "";
         }
 
-        public Prepared(CQLStatement statement, VariableSpecifications names, Short[] partitionKeyBindIndexes)
+        public Prepared(CQLStatement statement, VariableSpecifications names, short[] partitionKeyBindIndexes)
         {
             this(statement, names.getSpecifications(), partitionKeyBindIndexes);
         }
diff --git a/src/java/org/apache/cassandra/cql3/statements/PropertyDefinitions.java b/src/java/org/apache/cassandra/cql3/statements/PropertyDefinitions.java
index 793285b..590910f 100644
--- a/src/java/org/apache/cassandra/cql3/statements/PropertyDefinitions.java
+++ b/src/java/org/apache/cassandra/cql3/statements/PropertyDefinitions.java
@@ -18,6 +18,7 @@
 package org.apache.cassandra.cql3.statements;
 
 import java.util.*;
+import java.util.regex.Pattern;
 
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
@@ -26,6 +27,8 @@
 
 public class PropertyDefinitions
 {
+    private static final Pattern PATTERN_POSITIVE = Pattern.compile("(1|true|yes)");
+    
     protected static final Logger logger = LoggerFactory.getLogger(PropertyDefinitions.class);
 
     protected final Map<String, Object> properties = new HashMap<String, Object>();
@@ -91,7 +94,7 @@
     public Boolean getBoolean(String key, Boolean defaultValue) throws SyntaxException
     {
         String value = getSimple(key);
-        return (value == null) ? defaultValue : value.toLowerCase().matches("(1|true|yes)");
+        return (value == null) ? defaultValue : PATTERN_POSITIVE.matcher(value.toLowerCase()).matches();
     }
 
     // Return a property value, typed as a double
diff --git a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
index aca6146..f2b484e 100644
--- a/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/SelectStatement.java
@@ -20,6 +20,7 @@
 import java.nio.ByteBuffer;
 import java.util.*;
 
+import com.google.common.base.MoreObjects;
 import com.google.common.base.Objects;
 import com.google.common.base.Predicate;
 import com.google.common.collect.Iterables;
@@ -39,6 +40,7 @@
 import org.apache.cassandra.db.marshal.CollectionType;
 import org.apache.cassandra.db.marshal.CompositeType;
 import org.apache.cassandra.db.marshal.Int32Type;
+import org.apache.cassandra.db.marshal.UserType;
 import org.apache.cassandra.db.partitions.PartitionIterator;
 import org.apache.cassandra.db.rows.ComplexColumnData;
 import org.apache.cassandra.db.rows.Row;
@@ -60,6 +62,7 @@
 
 import static org.apache.cassandra.cql3.statements.RequestValidations.checkFalse;
 import static org.apache.cassandra.cql3.statements.RequestValidations.checkNotNull;
+import static org.apache.cassandra.cql3.statements.RequestValidations.checkNull;
 import static org.apache.cassandra.cql3.statements.RequestValidations.checkTrue;
 import static org.apache.cassandra.cql3.statements.RequestValidations.invalidRequest;
 import static org.apache.cassandra.utils.ByteBufferUtil.UNSET_BYTE_BUFFER;
@@ -83,6 +86,7 @@
     public final Parameters parameters;
     private final Selection selection;
     private final Term limit;
+    private final Term perPartitionLimit;
 
     private final StatementRestrictions restrictions;
 
@@ -96,7 +100,7 @@
     private final ColumnFilter queriedColumns;
 
     // Used by forSelection below
-    private static final Parameters defaultParameters = new Parameters(Collections.<ColumnIdentifier.Raw, Boolean>emptyMap(), false, false, false);
+    private static final Parameters defaultParameters = new Parameters(Collections.<ColumnDefinition.Raw, Boolean>emptyMap(), false, false, false);
 
     public SelectStatement(CFMetaData cfm,
                            int boundTerms,
@@ -105,7 +109,8 @@
                            StatementRestrictions restrictions,
                            boolean isReversed,
                            Comparator<List<ByteBuffer>> orderingComparator,
-                           Term limit)
+                           Term limit,
+                           Term perPartitionLimit)
     {
         this.cfm = cfm;
         this.boundTerms = boundTerms;
@@ -115,6 +120,7 @@
         this.orderingComparator = orderingComparator;
         this.parameters = parameters;
         this.limit = limit;
+        this.perPartitionLimit = perPartitionLimit;
         this.queriedColumns = gatherQueriedColumns();
     }
 
@@ -132,6 +138,9 @@
 
         if (limit != null)
             limit.addFunctionsTo(functions);
+
+        if (perPartitionLimit != null)
+            perPartitionLimit.addFunctionsTo(functions);
     }
 
     // Note that the queried columns internally is different from the one selected by the
@@ -172,6 +181,7 @@
                                    StatementRestrictions.empty(StatementType.SELECT, cfm),
                                    false,
                                    null,
+                                   null,
                                    null);
     }
 
@@ -191,11 +201,11 @@
         {
             CFMetaData baseTable = View.findBaseTable(keyspace(), columnFamily());
             if (baseTable != null)
-                state.hasColumnFamilyAccess(keyspace(), baseTable.cfName, Permission.SELECT);
+                state.hasColumnFamilyAccess(baseTable, Permission.SELECT);
         }
         else
         {
-            state.hasColumnFamilyAccess(keyspace(), columnFamily(), Permission.SELECT);
+            state.hasColumnFamilyAccess(cfm, Permission.SELECT);
         }
 
         for (Function function : getFunctions())
@@ -216,7 +226,8 @@
 
         int nowInSec = FBUtilities.nowInSeconds();
         int userLimit = getLimit(options);
-        ReadQuery query = getQuery(options, nowInSec, userLimit);
+        int userPerPartitionLimit = getPerPartitionLimit(options);
+        ReadQuery query = getQuery(options, nowInSec, userLimit, userPerPartitionLimit);
 
         int pageSize = getPageSize(options);
 
@@ -242,12 +253,12 @@
 
     public ReadQuery getQuery(QueryOptions options, int nowInSec) throws RequestValidationException
     {
-        return getQuery(options, nowInSec, getLimit(options));
+        return getQuery(options, nowInSec, getLimit(options), getPerPartitionLimit(options));
     }
 
-    public ReadQuery getQuery(QueryOptions options, int nowInSec, int userLimit) throws RequestValidationException
+    public ReadQuery getQuery(QueryOptions options, int nowInSec, int userLimit, int perPartitionLimit) throws RequestValidationException
     {
-        DataLimits limit = getDataLimits(userLimit);
+        DataLimits limit = getDataLimits(userLimit, perPartitionLimit);
         if (restrictions.isKeyRange() || restrictions.usesSecondaryIndexing())
             return getRangeCommand(options, limit, nowInSec);
 
@@ -276,9 +287,9 @@
             this.pager = pager;
         }
 
-        public static Pager forInternalQuery(QueryPager pager, ReadOrderGroup orderGroup)
+        public static Pager forInternalQuery(QueryPager pager, ReadExecutionController executionController)
         {
-            return new InternalPager(pager, orderGroup);
+            return new InternalPager(pager, executionController);
         }
 
         public static Pager forDistributedQuery(QueryPager pager, ConsistencyLevel consistency, ClientState clientState)
@@ -318,17 +329,17 @@
 
         public static class InternalPager extends Pager
         {
-            private final ReadOrderGroup orderGroup;
+            private final ReadExecutionController executionController;
 
-            private InternalPager(QueryPager pager, ReadOrderGroup orderGroup)
+            private InternalPager(QueryPager pager, ReadExecutionController executionController)
             {
                 super(pager);
-                this.orderGroup = orderGroup;
+                this.executionController = executionController;
             }
 
             public PartitionIterator fetchPage(int pageSize)
             {
-                return pager.fetchPageInternal(pageSize, orderGroup);
+                return pager.fetchPageInternal(pageSize, executionController);
             }
         }
     }
@@ -375,7 +386,7 @@
             ClientWarn.instance.warn("Aggregation query used on multiple partition keys (IN restriction)");
         }
 
-        Selection.ResultSetBuilder result = selection.resultSetBuilder(parameters.isJson);
+        Selection.ResultSetBuilder result = selection.resultSetBuilder(options, parameters.isJson);
         while (!pager.isExhausted())
         {
             try (PartitionIterator iter = pager.fetchPage(pageSize))
@@ -389,7 +400,7 @@
                 }
             }
         }
-        return new ResultMessage.Rows(result.build(options.getProtocolVersion()));
+        return new ResultMessage.Rows(result.build());
     }
 
     private ResultMessage.Rows processResults(PartitionIterator partitions,
@@ -409,14 +420,15 @@
     public ResultMessage.Rows executeInternal(QueryState state, QueryOptions options, int nowInSec) throws RequestExecutionException, RequestValidationException
     {
         int userLimit = getLimit(options);
-        ReadQuery query = getQuery(options, nowInSec, userLimit);
+        int userPerPartitionLimit = getPerPartitionLimit(options);
+        ReadQuery query = getQuery(options, nowInSec, userLimit, userPerPartitionLimit);
         int pageSize = getPageSize(options);
 
-        try (ReadOrderGroup orderGroup = query.startOrderGroup())
+        try (ReadExecutionController executionController = query.executionController())
         {
             if (pageSize <= 0 || query.limits().count() <= pageSize)
             {
-                try (PartitionIterator data = query.executeInternal(orderGroup))
+                try (PartitionIterator data = query.executeInternal(executionController))
                 {
                     return processResults(data, options, nowInSec, userLimit);
                 }
@@ -424,7 +436,7 @@
             else
             {
                 QueryPager pager = query.getPager(options.getPagingState(), options.getProtocolVersion());
-                return execute(Pager.forInternalQuery(pager, orderGroup), options, pageSize, nowInSec, userLimit);
+                return execute(Pager.forInternalQuery(pager, executionController), options, pageSize, nowInSec, userLimit);
             }
         }
     }
@@ -596,27 +608,27 @@
     private Slices makeSlices(QueryOptions options)
     throws InvalidRequestException
     {
-        SortedSet<Slice.Bound> startBounds = restrictions.getClusteringColumnsBounds(Bound.START, options);
-        SortedSet<Slice.Bound> endBounds = restrictions.getClusteringColumnsBounds(Bound.END, options);
+        SortedSet<ClusteringBound> startBounds = restrictions.getClusteringColumnsBounds(Bound.START, options);
+        SortedSet<ClusteringBound> endBounds = restrictions.getClusteringColumnsBounds(Bound.END, options);
         assert startBounds.size() == endBounds.size();
 
         // The case where startBounds == 1 is common enough that it's worth optimizing
         if (startBounds.size() == 1)
         {
-            Slice.Bound start = startBounds.first();
-            Slice.Bound end = endBounds.first();
+            ClusteringBound start = startBounds.first();
+            ClusteringBound end = endBounds.first();
             return cfm.comparator.compare(start, end) > 0
                  ? Slices.NONE
                  : Slices.with(cfm.comparator, Slice.make(start, end));
         }
 
         Slices.Builder builder = new Slices.Builder(cfm.comparator, startBounds.size());
-        Iterator<Slice.Bound> startIter = startBounds.iterator();
-        Iterator<Slice.Bound> endIter = endBounds.iterator();
+        Iterator<ClusteringBound> startIter = startBounds.iterator();
+        Iterator<ClusteringBound> endIter = endBounds.iterator();
         while (startIter.hasNext() && endIter.hasNext())
         {
-            Slice.Bound start = startIter.next();
-            Slice.Bound end = endIter.next();
+            ClusteringBound start = startIter.next();
+            ClusteringBound end = endIter.next();
 
             // Ignore slices that are nonsensical
             if (cfm.comparator.compare(start, end) > 0)
@@ -628,9 +640,10 @@
         return builder.build();
     }
 
-    private DataLimits getDataLimits(int userLimit)
+    private DataLimits getDataLimits(int userLimit, int perPartitionLimit)
     {
         int cqlRowLimit = DataLimits.NO_LIMIT;
+        int cqlPerPartitionLimit = DataLimits.NO_LIMIT;
 
         // If we aggregate, the limit really apply to the number of rows returned to the user, not to what is queried, and
         // since in practice we currently only aggregate at top level (we have no GROUP BY support yet), we'll only ever
@@ -638,24 +651,44 @@
         // Whenever we support GROUP BY, we'll have to add a new DataLimits kind that knows how things are grouped and is thus
         // able to apply the user limit properly.
         // If we do post ordering we need to get all the results sorted before we can trim them.
-        if (!selection.isAggregate() && !needsPostQueryOrdering())
-            cqlRowLimit = userLimit;
-
+        if (!selection.isAggregate())
+        {
+            if (!needsPostQueryOrdering())
+                cqlRowLimit = userLimit;
+            cqlPerPartitionLimit = perPartitionLimit;
+        }
         if (parameters.isDistinct)
             return cqlRowLimit == DataLimits.NO_LIMIT ? DataLimits.DISTINCT_NONE : DataLimits.distinctLimits(cqlRowLimit);
 
-        return cqlRowLimit == DataLimits.NO_LIMIT ? DataLimits.NONE : DataLimits.cqlLimits(cqlRowLimit);
+        return DataLimits.cqlLimits(cqlRowLimit, cqlPerPartitionLimit);
     }
 
     /**
      * Returns the limit specified by the user.
      * May be used by custom QueryHandler implementations
      *
-     * @return the limit specified by the user or <code>DataLimits.NO_LIMIT</code> if no value 
+     * @return the limit specified by the user or <code>DataLimits.NO_LIMIT</code> if no value
      * as been specified.
      */
     public int getLimit(QueryOptions options)
     {
+        return getLimit(limit, options);
+    }
+
+    /**
+     * Returns the per partition limit specified by the user.
+     * May be used by custom QueryHandler implementations
+     *
+     * @return the per partition limit specified by the user or <code>DataLimits.NO_LIMIT</code> if no value
+     * has been specified.
+     */
+    public int getPerPartitionLimit(QueryOptions options)
+    {
+        return getLimit(perPartitionLimit, options);
+    }
+
+    private int getLimit(Term limit, QueryOptions options)
+    {
         int userLimit = DataLimits.NO_LIMIT;
 
         if (limit != null)
@@ -703,7 +736,7 @@
                               int nowInSec,
                               int userLimit) throws InvalidRequestException
     {
-        Selection.ResultSetBuilder result = selection.resultSetBuilder(parameters.isJson);
+        Selection.ResultSetBuilder result = selection.resultSetBuilder(options, parameters.isJson);
         while (partitions.hasNext())
         {
             try (RowIterator partition = partitions.next())
@@ -712,7 +745,7 @@
             }
         }
 
-        ResultSet cqlRows = result.build(options.getProtocolVersion());
+        ResultSet cqlRows = result.build();
 
         orderResults(cqlRows);
 
@@ -743,15 +776,14 @@
         ByteBuffer[] keyComponents = getComponents(cfm, partition.partitionKey());
 
         Row staticRow = partition.staticRow();
-        // If there is no rows, then provided the select was a full partition selection
-        // (i.e. not a 2ndary index search and there was no condition on clustering columns),
+        // If there are no rows, and there's no restriction on clustering/regular columns,
+        // then provided the select was a full partition selection (either by partition key and/or by static column),
         // we want to include static columns and we're done.
         if (!partition.hasNext())
         {
-            if (!staticRow.isEmpty() && (!restrictions.usesSecondaryIndexing() || cfm.isStaticCompactTable())
-                    && !restrictions.hasClusteringColumnsRestriction())
+            if (!staticRow.isEmpty() && (!restrictions.hasClusteringColumnsRestriction() || cfm.isStaticCompactTable()))
             {
-                result.newRow(protocolVersion);
+                result.newRow();
                 for (ColumnDefinition def : selection.getColumns())
                 {
                     switch (def.kind)
@@ -773,7 +805,7 @@
         while (partition.hasNext())
         {
             Row row = partition.next();
-            result.newRow(protocolVersion);
+            result.newRow();
             // Respect selection order
             for (ColumnDefinition def : selection.getColumns())
             {
@@ -800,13 +832,14 @@
     {
         if (def.isComplex())
         {
-            // Collections are the only complex types we have so far
-            assert def.type.isCollection() && def.type.isMultiCell();
+            assert def.type.isMultiCell();
             ComplexColumnData complexData = row.getComplexColumnData(def);
             if (complexData == null)
-                result.add((ByteBuffer)null);
+                result.add(null);
+            else if (def.type.isCollection())
+                result.add(((CollectionType) def.type).serializeForNativeProtocol(complexData.iterator(), protocolVersion));
             else
-                result.add(((CollectionType)def.type).serializeForNativeProtocol(def, complexData.iterator(), protocolVersion));
+                result.add(((UserType) def.type).serializeForNativeProtocol(complexData.iterator(), protocolVersion));
         }
         else
         {
@@ -837,14 +870,20 @@
         public final List<RawSelector> selectClause;
         public final WhereClause whereClause;
         public final Term.Raw limit;
+        public final Term.Raw perPartitionLimit;
 
-        public RawStatement(CFName cfName, Parameters parameters, List<RawSelector> selectClause, WhereClause whereClause, Term.Raw limit)
+        public RawStatement(CFName cfName, Parameters parameters,
+                            List<RawSelector> selectClause,
+                            WhereClause whereClause,
+                            Term.Raw limit,
+                            Term.Raw perPartitionLimit)
         {
             super(cfName);
             this.parameters = parameters;
             this.selectClause = selectClause;
             this.whereClause = whereClause;
             this.limit = limit;
+            this.perPartitionLimit = perPartitionLimit;
         }
 
         public ParsedStatement.Prepared prepare() throws InvalidRequestException
@@ -859,12 +898,18 @@
 
             Selection selection = selectClause.isEmpty()
                                   ? Selection.wildcard(cfm)
-                                  : Selection.fromSelectors(cfm, selectClause);
+                                  : Selection.fromSelectors(cfm, selectClause, boundNames);
 
             StatementRestrictions restrictions = prepareRestrictions(cfm, boundNames, selection, forView);
 
             if (parameters.isDistinct)
+            {
+                checkNull(perPartitionLimit, "PER PARTITION LIMIT is not allowed with SELECT DISTINCT queries");
                 validateDistinctSelection(cfm, selection, restrictions);
+            }
+
+            checkFalse(selection.isAggregate() && perPartitionLimit != null,
+                       "PER PARTITION LIMIT is not allowed with aggregate queries.");
 
             Comparator<List<ByteBuffer>> orderingComparator = null;
             boolean isReversed = false;
@@ -888,7 +933,8 @@
                                                         restrictions,
                                                         isReversed,
                                                         orderingComparator,
-                                                        prepareLimit(boundNames));
+                                                        prepareLimit(boundNames, limit, keyspace(), limitReceiver()),
+                                                        prepareLimit(boundNames, perPartitionLimit, keyspace(), perPartitionLimitReceiver()));
 
             return new ParsedStatement.Prepared(stmt, boundNames, boundNames.getPartitionKeyBindIndexes(cfm));
         }
@@ -907,32 +953,24 @@
                                                           Selection selection,
                                                           boolean forView) throws InvalidRequestException
         {
-            try
-            {
-                return new StatementRestrictions(StatementType.SELECT,
-                                                 cfm,
-                                                 whereClause,
-                                                 boundNames,
-                                                 selection.containsOnlyStaticColumns(),
-                                                 selection.containsACollection(),
-                                                 parameters.allowFiltering,
-                                                 forView);
-            }
-            catch (UnrecognizedEntityException e)
-            {
-                if (containsAlias(e.entity))
-                    throw invalidRequest("Aliases aren't allowed in the where clause ('%s')", e.relation);
-                throw e;
-            }
+            return new StatementRestrictions(StatementType.SELECT,
+                                             cfm,
+                                             whereClause,
+                                             boundNames,
+                                             selection.containsOnlyStaticColumns(),
+                                             selection.containsAComplexColumn(),
+                                             parameters.allowFiltering,
+                                             forView);
         }
 
         /** Returns a Term for the limit or null if no limit is set */
-        private Term prepareLimit(VariableSpecifications boundNames) throws InvalidRequestException
+        private Term prepareLimit(VariableSpecifications boundNames, Term.Raw limit,
+                                  String keyspace, ColumnSpecification limitReceiver) throws InvalidRequestException
         {
             if (limit == null)
                 return null;
 
-            Term prepLimit = limit.prepare(keyspace(), limitReceiver());
+            Term prepLimit = limit.prepare(keyspace, limitReceiver);
             prepLimit.collectMarkerSpecification(boundNames);
             return prepLimit;
         }
@@ -968,12 +1006,6 @@
                           "SELECT DISTINCT queries must request all the partition key columns (missing %s)", def.name);
         }
 
-        private void handleUnrecognizedOrderingColumn(ColumnIdentifier column) throws InvalidRequestException
-        {
-            checkFalse(containsAlias(column), "Aliases are not allowed in order by clause ('%s')", column);
-            checkFalse(true, "Order by on unknown column %s", column);
-        }
-
         private Comparator<List<ByteBuffer>> getOrderingComparator(CFMetaData cfm,
                                                                    Selection selection,
                                                                    StatementRestrictions restrictions)
@@ -987,10 +1019,9 @@
             List<Integer> idToSort = new ArrayList<Integer>();
             List<Comparator<ByteBuffer>> sorters = new ArrayList<Comparator<ByteBuffer>>();
 
-            for (ColumnIdentifier.Raw raw : parameters.orderings.keySet())
+            for (ColumnDefinition.Raw raw : parameters.orderings.keySet())
             {
-                ColumnIdentifier identifier = raw.prepare(cfm);
-                ColumnDefinition orderingColumn = cfm.getColumnDefinition(identifier);
+                ColumnDefinition orderingColumn = raw.prepare(cfm);
                 idToSort.add(orderingIndexes.get(orderingColumn.name));
                 sorters.add(orderingColumn.type);
             }
@@ -1005,12 +1036,9 @@
             // even if we don't
             // ultimately ship them to the client (CASSANDRA-4911).
             Map<ColumnIdentifier, Integer> orderingIndexes = new HashMap<>();
-            for (ColumnIdentifier.Raw raw : parameters.orderings.keySet())
+            for (ColumnDefinition.Raw raw : parameters.orderings.keySet())
             {
-                ColumnIdentifier column = raw.prepare(cfm);
-                final ColumnDefinition def = cfm.getColumnDefinition(column);
-                if (def == null)
-                    handleUnrecognizedOrderingColumn(column);
+                final ColumnDefinition def = raw.prepare(cfm);
                 int index = selection.getResultSetIndex(def);
                 if (index < 0)
                     index = selection.addColumnForOrdering(def);
@@ -1023,17 +1051,13 @@
         {
             Boolean[] reversedMap = new Boolean[cfm.clusteringColumns().size()];
             int i = 0;
-            for (Map.Entry<ColumnIdentifier.Raw, Boolean> entry : parameters.orderings.entrySet())
+            for (Map.Entry<ColumnDefinition.Raw, Boolean> entry : parameters.orderings.entrySet())
             {
-                ColumnIdentifier column = entry.getKey().prepare(cfm);
+                ColumnDefinition def = entry.getKey().prepare(cfm);
                 boolean reversed = entry.getValue();
 
-                ColumnDefinition def = cfm.getColumnDefinition(column);
-                if (def == null)
-                    handleUnrecognizedOrderingColumn(column);
-
                 checkTrue(def.isClusteringColumn(),
-                          "Order by is currently only supported on the clustered columns of the PRIMARY KEY, got %s", column);
+                          "Order by is currently only supported on the clustered columns of the PRIMARY KEY, got %s", def.name);
 
                 checkTrue(i++ == def.position(),
                           "Order by currently only support the ordering of columns following their declared order in the PRIMARY KEY");
@@ -1073,43 +1097,37 @@
             }
         }
 
-        private boolean containsAlias(final ColumnIdentifier name)
-        {
-            return Iterables.any(selectClause, new Predicate<RawSelector>()
-                                               {
-                                                   public boolean apply(RawSelector raw)
-                                                   {
-                                                       return name.equals(raw.alias);
-                                                   }
-                                               });
-        }
-
         private ColumnSpecification limitReceiver()
         {
             return new ColumnSpecification(keyspace(), columnFamily(), new ColumnIdentifier("[limit]", true), Int32Type.instance);
         }
 
+        private ColumnSpecification perPartitionLimitReceiver()
+        {
+            return new ColumnSpecification(keyspace(), columnFamily(), new ColumnIdentifier("[per_partition_limit]", true), Int32Type.instance);
+        }
+
         @Override
         public String toString()
         {
-            return Objects.toStringHelper(this)
-                          .add("name", cfName)
-                          .add("selectClause", selectClause)
-                          .add("whereClause", whereClause)
-                          .add("isDistinct", parameters.isDistinct)
-                          .toString();
+            return MoreObjects.toStringHelper(this)
+                              .add("name", cfName)
+                              .add("selectClause", selectClause)
+                              .add("whereClause", whereClause)
+                              .add("isDistinct", parameters.isDistinct)
+                              .toString();
         }
     }
 
     public static class Parameters
     {
         // Public because CASSANDRA-9858
-        public final Map<ColumnIdentifier.Raw, Boolean> orderings;
+        public final Map<ColumnDefinition.Raw, Boolean> orderings;
         public final boolean isDistinct;
         public final boolean allowFiltering;
         public final boolean isJson;
 
-        public Parameters(Map<ColumnIdentifier.Raw, Boolean> orderings,
+        public Parameters(Map<ColumnDefinition.Raw, Boolean> orderings,
                           boolean isDistinct,
                           boolean allowFiltering,
                           boolean isJson)
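
Most of the SelectStatement changes wire in the new PER PARTITION LIMIT clause. A minimal CQL sketch of its use (hypothetical schema):

    CREATE TABLE ks.readings (sensor int, ts timestamp, value double, PRIMARY KEY (sensor, ts));

    SELECT * FROM ks.readings PER PARTITION LIMIT 3;             -- at most 3 rows per sensor (partition)
    SELECT * FROM ks.readings PER PARTITION LIMIT 3 LIMIT 100;   -- per-partition and global limits can be combined

    -- Rejected, per the checks added in prepare(): not allowed with SELECT DISTINCT or with aggregates
    SELECT DISTINCT sensor FROM ks.readings PER PARTITION LIMIT 1;
    SELECT count(*) FROM ks.readings PER PARTITION LIMIT 1;
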
diff --git a/src/java/org/apache/cassandra/cql3/statements/TableAttributes.java b/src/java/org/apache/cassandra/cql3/statements/TableAttributes.java
index c1a9d54..dee3385 100644
--- a/src/java/org/apache/cassandra/cql3/statements/TableAttributes.java
+++ b/src/java/org/apache/cassandra/cql3/statements/TableAttributes.java
@@ -131,6 +131,9 @@
         if (hasOption(Option.CRC_CHECK_CHANCE))
             builder.crcCheckChance(getDouble(Option.CRC_CHECK_CHANCE));
 
+        if (hasOption(Option.CDC))
+            builder.cdc(getBoolean(Option.CDC.toString(), false));
+
         return builder.build();
     }
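For context, a minimal sketch (not part of the patch; TableParams.builder() and the public cdc field are assumed from the surrounding code) of what the new branch produces when the CDC option is set:

import org.apache.cassandra.schema.TableParams;

public class CdcParamsSketch
{
    public static void main(String[] args)
    {
        // Mirrors builder.cdc(getBoolean(Option.CDC.toString(), false)) above, with the flag forced on.
        TableParams params = TableParams.builder().cdc(true).build();
        System.out.println("cdc enabled: " + params.cdc);
    }
}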
 
diff --git a/src/java/org/apache/cassandra/cql3/statements/TruncateStatement.java b/src/java/org/apache/cassandra/cql3/statements/TruncateStatement.java
index 66b3da0..336091d 100644
--- a/src/java/org/apache/cassandra/cql3/statements/TruncateStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/TruncateStatement.java
@@ -70,7 +70,7 @@
 
             StorageProxy.truncateBlocking(keyspace(), columnFamily());
         }
-        catch (UnavailableException | TimeoutException | IOException e)
+        catch (UnavailableException | TimeoutException e)
         {
             throw new TruncateException(e);
         }
diff --git a/src/java/org/apache/cassandra/cql3/statements/UpdateStatement.java b/src/java/org/apache/cassandra/cql3/statements/UpdateStatement.java
index 6f872d4..3657f94 100644
--- a/src/java/org/apache/cassandra/cql3/statements/UpdateStatement.java
+++ b/src/java/org/apache/cassandra/cql3/statements/UpdateStatement.java
@@ -113,7 +113,7 @@
 
     public static class ParsedInsert extends ModificationStatement.Parsed
     {
-        private final List<ColumnIdentifier.Raw> columnNames;
+        private final List<ColumnDefinition.Raw> columnNames;
         private final List<Term.Raw> columnValues;
 
         /**
@@ -127,7 +127,7 @@
          */
         public ParsedInsert(CFName name,
                             Attributes.Raw attrs,
-                            List<ColumnIdentifier.Raw> columnNames,
+                            List<ColumnDefinition.Raw> columnNames,
                             List<Term.Raw> columnValues,
                             boolean ifNotExists)
         {
@@ -170,13 +170,13 @@
                 }
                 else
                 {
-                    Operation operation = new Operation.SetValue(value).prepare(keyspace(), def);
+                    Operation operation = new Operation.SetValue(value).prepare(cfm, def);
                     operation.collectMarkerSpecification(boundNames);
                     operations.add(operation);
                 }
             }
 
-            boolean applyOnlyToStaticColumns = appliesOnlyToStaticColumns(operations, conditions) && !hasClusteringColumnsSet;
+            boolean applyOnlyToStaticColumns = !hasClusteringColumnsSet && appliesOnlyToStaticColumns(operations, conditions);
 
             StatementRestrictions restrictions = new StatementRestrictions(type,
                                                                            cfm,
@@ -233,19 +233,17 @@
                 Term.Raw raw = prepared.getRawTermForColumn(def);
                 if (def.isPrimaryKeyColumn())
                 {
-                    whereClause.add(new SingleColumnRelation(new ColumnIdentifier.ColumnIdentifierValue(def.name),
-                                                             Operator.EQ,
-                                                             raw));
+                    whereClause.add(new SingleColumnRelation(ColumnDefinition.Raw.forColumn(def), Operator.EQ, raw));
                 }
                 else
                 {
-                    Operation operation = new Operation.SetValue(raw).prepare(keyspace(), def);
+                    Operation operation = new Operation.SetValue(raw).prepare(cfm, def);
                     operation.collectMarkerSpecification(boundNames);
                     operations.add(operation);
                 }
             }
 
-            boolean applyOnlyToStaticColumns = appliesOnlyToStaticColumns(operations, conditions) && !hasClusteringColumnsSet;
+            boolean applyOnlyToStaticColumns = !hasClusteringColumnsSet && appliesOnlyToStaticColumns(operations, conditions);
 
             StatementRestrictions restrictions = new StatementRestrictions(type,
                                                                            cfm,
@@ -269,7 +267,7 @@
     public static class ParsedUpdate extends ModificationStatement.Parsed
     {
         // Provided for an UPDATE
-        private final List<Pair<ColumnIdentifier.Raw, Operation.RawUpdate>> updates;
+        private final List<Pair<ColumnDefinition.Raw, Operation.RawUpdate>> updates;
         private final WhereClause whereClause;
 
         /**
@@ -284,9 +282,9 @@
          * */
         public ParsedUpdate(CFName name,
                             Attributes.Raw attrs,
-                            List<Pair<ColumnIdentifier.Raw, Operation.RawUpdate>> updates,
+                            List<Pair<ColumnDefinition.Raw, Operation.RawUpdate>> updates,
                             WhereClause whereClause,
-                            List<Pair<ColumnIdentifier.Raw, ColumnCondition.Raw>> conditions,
+                            List<Pair<ColumnDefinition.Raw, ColumnCondition.Raw>> conditions,
                             boolean ifExists)
         {
             super(name, StatementType.UPDATE, attrs, conditions, false, ifExists);
@@ -302,13 +300,13 @@
         {
             Operations operations = new Operations(type);
 
-            for (Pair<ColumnIdentifier.Raw, Operation.RawUpdate> entry : updates)
+            for (Pair<ColumnDefinition.Raw, Operation.RawUpdate> entry : updates)
             {
                 ColumnDefinition def = getColumnDefinition(cfm, entry.left);
 
                 checkFalse(def.isPrimaryKeyColumn(), "PRIMARY KEY part %s found in SET part", def.name);
 
-                Operation operation = entry.right.prepare(keyspace(), def);
+                Operation operation = entry.right.prepare(cfm, def);
                 operation.collectMarkerSpecification(boundNames);
                 operations.add(operation);
             }
diff --git a/src/java/org/apache/cassandra/db/AbstractBufferClusteringPrefix.java b/src/java/org/apache/cassandra/db/AbstractBufferClusteringPrefix.java
new file mode 100644
index 0000000..95bc777
--- /dev/null
+++ b/src/java/org/apache/cassandra/db/AbstractBufferClusteringPrefix.java
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.db;
+
+import java.nio.ByteBuffer;
+
+import org.apache.cassandra.utils.ObjectSizes;
+
+public abstract class AbstractBufferClusteringPrefix extends AbstractClusteringPrefix
+{
+    public static final ByteBuffer[] EMPTY_VALUES_ARRAY = new ByteBuffer[0];
+    private static final long EMPTY_SIZE = ObjectSizes.measure(Clustering.make(EMPTY_VALUES_ARRAY));
+
+    protected final Kind kind;
+    protected final ByteBuffer[] values;
+
+    protected AbstractBufferClusteringPrefix(Kind kind, ByteBuffer[] values)
+    {
+        this.kind = kind;
+        this.values = values;
+    }
+
+    public Kind kind()
+    {
+        return kind;
+    }
+
+    public ClusteringPrefix clustering()
+    {
+        return this;
+    }
+
+    public int size()
+    {
+        return values.length;
+    }
+
+    public ByteBuffer get(int i)
+    {
+        return values[i];
+    }
+
+    public ByteBuffer[] getRawValues()
+    {
+        return values;
+    }
+
+    public long unsharedHeapSize()
+    {
+        return EMPTY_SIZE + ObjectSizes.sizeOnHeapOf(values);
+    }
+
+    public long unsharedHeapSizeExcludingData()
+    {
+        return EMPTY_SIZE + ObjectSizes.sizeOnHeapExcludingData(values);
+    }
+}
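A minimal sketch (values are illustrative; Clustering.make is the factory introduced later in this patch) of the accessors this buffer-backed base class now supplies to its subclasses:

import java.nio.ByteBuffer;

import org.apache.cassandra.db.Clustering;
import org.apache.cassandra.utils.ByteBufferUtil;

public class ClusteringAccessSketch
{
    public static void main(String[] args)
    {
        // A clustering with two components, e.g. for a table with two clustering columns.
        Clustering clustering = Clustering.make(ByteBufferUtil.bytes(42), ByteBufferUtil.bytes("row-a"));

        assert clustering.size() == 2;                  // one value per clustering column
        ByteBuffer first = clustering.get(0);           // raw bytes of the first component
        ByteBuffer[] raw = clustering.getRawValues();   // the backing array, not a copy

        System.out.println(raw.length + " components, first has " + first.remaining() + " bytes");
    }
}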
diff --git a/src/java/org/apache/cassandra/db/AbstractClusteringPrefix.java b/src/java/org/apache/cassandra/db/AbstractClusteringPrefix.java
index 2631b46..0b1daf7 100644
--- a/src/java/org/apache/cassandra/db/AbstractClusteringPrefix.java
+++ b/src/java/org/apache/cassandra/db/AbstractClusteringPrefix.java
@@ -22,48 +22,14 @@
 import java.util.Objects;
 
 import org.apache.cassandra.utils.FBUtilities;
-import org.apache.cassandra.utils.ObjectSizes;
 
 public abstract class AbstractClusteringPrefix implements ClusteringPrefix
 {
-    protected static final ByteBuffer[] EMPTY_VALUES_ARRAY = new ByteBuffer[0];
-
-    private static final long EMPTY_SIZE = ObjectSizes.measure(new Clustering(EMPTY_VALUES_ARRAY));
-
-    protected final Kind kind;
-    protected final ByteBuffer[] values;
-
-    protected AbstractClusteringPrefix(Kind kind, ByteBuffer[] values)
-    {
-        this.kind = kind;
-        this.values = values;
-    }
-
-    public Kind kind()
-    {
-        return kind;
-    }
-
     public ClusteringPrefix clustering()
     {
         return this;
     }
 
-    public int size()
-    {
-        return values.length;
-    }
-
-    public ByteBuffer get(int i)
-    {
-        return values[i];
-    }
-
-    public ByteBuffer[] getRawValues()
-    {
-        return values;
-    }
-
     public int dataSize()
     {
         int size = 0;
@@ -86,16 +52,6 @@
         FBUtilities.updateWithByte(digest, kind().ordinal());
     }
 
-    public long unsharedHeapSize()
-    {
-        return EMPTY_SIZE + ObjectSizes.sizeOnHeapOf(values);
-    }
-
-    public long unsharedHeapSizeExcludingData()
-    {
-        return EMPTY_SIZE + ObjectSizes.sizeOnHeapExcludingData(values);
-    }
-
     @Override
     public final int hashCode()
     {
diff --git a/src/java/org/apache/cassandra/db/AbstractReadCommandBuilder.java b/src/java/org/apache/cassandra/db/AbstractReadCommandBuilder.java
index dab22c7..849e684 100644
--- a/src/java/org/apache/cassandra/db/AbstractReadCommandBuilder.java
+++ b/src/java/org/apache/cassandra/db/AbstractReadCommandBuilder.java
@@ -43,8 +43,8 @@
     protected Set<ColumnIdentifier> columns;
     protected final RowFilter filter = RowFilter.create();
 
-    private Slice.Bound lowerClusteringBound;
-    private Slice.Bound upperClusteringBound;
+    private ClusteringBound lowerClusteringBound;
+    private ClusteringBound upperClusteringBound;
 
     private NavigableSet<Clustering> clusterings;
 
@@ -64,28 +64,28 @@
     public AbstractReadCommandBuilder fromIncl(Object... values)
     {
         assert lowerClusteringBound == null && clusterings == null;
-        this.lowerClusteringBound = Slice.Bound.create(cfs.metadata.comparator, true, true, values);
+        this.lowerClusteringBound = ClusteringBound.create(cfs.metadata.comparator, true, true, values);
         return this;
     }
 
     public AbstractReadCommandBuilder fromExcl(Object... values)
     {
         assert lowerClusteringBound == null && clusterings == null;
-        this.lowerClusteringBound = Slice.Bound.create(cfs.metadata.comparator, true, false, values);
+        this.lowerClusteringBound = ClusteringBound.create(cfs.metadata.comparator, true, false, values);
         return this;
     }
 
     public AbstractReadCommandBuilder toIncl(Object... values)
     {
         assert upperClusteringBound == null && clusterings == null;
-        this.upperClusteringBound = Slice.Bound.create(cfs.metadata.comparator, false, true, values);
+        this.upperClusteringBound = ClusteringBound.create(cfs.metadata.comparator, false, true, values);
         return this;
     }
 
     public AbstractReadCommandBuilder toExcl(Object... values)
     {
         assert upperClusteringBound == null && clusterings == null;
-        this.upperClusteringBound = Slice.Bound.create(cfs.metadata.comparator, false, false, values);
+        this.upperClusteringBound = ClusteringBound.create(cfs.metadata.comparator, false, false, values);
         return this;
     }
 
@@ -195,8 +195,8 @@
         }
         else
         {
-            Slice slice = Slice.make(lowerClusteringBound == null ? Slice.Bound.BOTTOM : lowerClusteringBound,
-                                     upperClusteringBound == null ? Slice.Bound.TOP : upperClusteringBound);
+            Slice slice = Slice.make(lowerClusteringBound == null ? ClusteringBound.BOTTOM : lowerClusteringBound,
+                                     upperClusteringBound == null ? ClusteringBound.TOP : upperClusteringBound);
             return new ClusteringIndexSliceFilter(Slices.with(cfs.metadata.comparator, slice), reversed);
         }
     }
diff --git a/src/java/org/apache/cassandra/db/BlacklistedDirectories.java b/src/java/org/apache/cassandra/db/BlacklistedDirectories.java
index f47fd57..3e6332c 100644
--- a/src/java/org/apache/cassandra/db/BlacklistedDirectories.java
+++ b/src/java/org/apache/cassandra/db/BlacklistedDirectories.java
@@ -66,6 +66,16 @@
         return Collections.unmodifiableSet(unwritableDirectories);
     }
 
+    public void markUnreadable(String path)
+    {
+        maybeMarkUnreadable(new File(path));
+    }
+
+    public void markUnwritable(String path)
+    {
+        maybeMarkUnwritable(new File(path));
+    }
+
     /**
      * Adds parent directory of the file (or the file itself, if it is a directory)
      * to the set of unreadable directories.
diff --git a/src/java/org/apache/cassandra/db/BlacklistedDirectoriesMBean.java b/src/java/org/apache/cassandra/db/BlacklistedDirectoriesMBean.java
index 3163b9a..3fb9f39 100644
--- a/src/java/org/apache/cassandra/db/BlacklistedDirectoriesMBean.java
+++ b/src/java/org/apache/cassandra/db/BlacklistedDirectoriesMBean.java
@@ -20,10 +20,13 @@
 import java.io.File;
 import java.util.Set;
 
-public interface BlacklistedDirectoriesMBean {
-
+public interface BlacklistedDirectoriesMBean
+{
     public Set<File> getUnreadableDirectories();
     
     public Set<File> getUnwritableDirectories();
-    
+
+    public void markUnreadable(String path);
+
+    public void markUnwritable(String path);
 }
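A hedged sketch of driving the two new operations over JMX; the object name and port below are assumptions based on Cassandra's usual defaults, not taken from this patch:

import javax.management.JMX;
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

import org.apache.cassandra.db.BlacklistedDirectoriesMBean;

public class MarkUnwritableSketch
{
    public static void main(String[] args) throws Exception
    {
        JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:7199/jmxrmi");
        JMXConnector connector = JMXConnectorFactory.connect(url);
        try
        {
            MBeanServerConnection connection = connector.getMBeanServerConnection();
            // Assumed MBean name; check BlacklistedDirectories for the registered value.
            ObjectName name = new ObjectName("org.apache.cassandra.db:type=BlacklistedDirectories");
            BlacklistedDirectoriesMBean proxy = JMX.newMBeanProxy(connection, name, BlacklistedDirectoriesMBean.class);
            proxy.markUnwritable("/var/lib/cassandra/data");   // stop writing to this data directory
            System.out.println("unwritable: " + proxy.getUnwritableDirectories());
        }
        finally
        {
            connector.close();
        }
    }
}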
diff --git a/src/java/org/apache/cassandra/db/BufferClustering.java b/src/java/org/apache/cassandra/db/BufferClustering.java
new file mode 100644
index 0000000..7c6bb20
--- /dev/null
+++ b/src/java/org/apache/cassandra/db/BufferClustering.java
@@ -0,0 +1,44 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.db;
+
+import java.nio.ByteBuffer;
+
+import org.apache.cassandra.config.CFMetaData;
+import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.utils.memory.AbstractAllocator;
+
+/**
+ * The clustering column values for a row.
+ * <p>
+ * A {@code Clustering} is a {@code ClusteringPrefix} that must always be "complete", i.e. have
+ * as many values as there are clustering columns in the table it is part of. It is the clustering
+ * prefix used by rows.
+ * <p>
+ * Note however that while its size must be equal to the table clustering size, a clustering can have
+ * {@code null} values, and this is mostly for thrift backward compatibility (in practice, if a value is null,
+ * all of the following ones will be too because that's what thrift allows, but it's never assumed by the
+ * code so we could start generally allowing nulls for clustering columns if we wanted to).
+ */
+public class BufferClustering extends AbstractBufferClusteringPrefix implements Clustering
+{
+    BufferClustering(ByteBuffer... values)
+    {
+        super(Kind.CLUSTERING, values);
+    }
+}
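BufferClustering's constructor is package-private, so callers go through Clustering.make. A small sketch (illustrative value; HeapAllocator.instance is an existing allocator in this codebase) of the make/copy pair the reworked interface exposes:

import org.apache.cassandra.db.Clustering;
import org.apache.cassandra.utils.ByteBufferUtil;
import org.apache.cassandra.utils.memory.HeapAllocator;

public class ClusteringCopySketch
{
    public static void main(String[] args)
    {
        Clustering original = Clustering.make(ByteBufferUtil.bytes("some-value"));
        Clustering copy = original.copy(HeapAllocator.instance);   // values are cloned, not shared

        assert copy.size() == original.size();
        System.out.println(copy.size() + " clustering value(s) copied");
    }
}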
diff --git a/src/java/org/apache/cassandra/db/CBuilder.java b/src/java/org/apache/cassandra/db/CBuilder.java
index 94feb93..be56394 100644
--- a/src/java/org/apache/cassandra/db/CBuilder.java
+++ b/src/java/org/apache/cassandra/db/CBuilder.java
@@ -24,7 +24,7 @@
 import org.apache.cassandra.db.marshal.AbstractType;
 
 /**
- * Allows to build ClusteringPrefixes, either Clustering or Slice.Bound.
+ * Allows building ClusteringPrefixes, either Clustering or ClusteringBound.
  */
 public abstract class CBuilder
 {
@@ -60,7 +60,7 @@
             return Clustering.STATIC_CLUSTERING;
         }
 
-        public Slice.Bound buildBound(boolean isStart, boolean isInclusive)
+        public ClusteringBound buildBound(boolean isStart, boolean isInclusive)
         {
             throw new UnsupportedOperationException();
         }
@@ -80,12 +80,12 @@
             throw new UnsupportedOperationException();
         }
 
-        public Slice.Bound buildBoundWith(ByteBuffer value, boolean isStart, boolean isInclusive)
+        public ClusteringBound buildBoundWith(ByteBuffer value, boolean isStart, boolean isInclusive)
         {
             throw new UnsupportedOperationException();
         }
 
-        public Slice.Bound buildBoundWith(List<ByteBuffer> newValues, boolean isStart, boolean isInclusive)
+        public ClusteringBound buildBoundWith(List<ByteBuffer> newValues, boolean isStart, boolean isInclusive)
         {
             throw new UnsupportedOperationException();
         }
@@ -102,12 +102,12 @@
     public abstract CBuilder add(ByteBuffer value);
     public abstract CBuilder add(Object value);
     public abstract Clustering build();
-    public abstract Slice.Bound buildBound(boolean isStart, boolean isInclusive);
+    public abstract ClusteringBound buildBound(boolean isStart, boolean isInclusive);
     public abstract Slice buildSlice();
     public abstract Clustering buildWith(ByteBuffer value);
     public abstract Clustering buildWith(List<ByteBuffer> newValues);
-    public abstract Slice.Bound buildBoundWith(ByteBuffer value, boolean isStart, boolean isInclusive);
-    public abstract Slice.Bound buildBoundWith(List<ByteBuffer> newValues, boolean isStart, boolean isInclusive);
+    public abstract ClusteringBound buildBoundWith(ByteBuffer value, boolean isStart, boolean isInclusive);
+    public abstract ClusteringBound buildBoundWith(List<ByteBuffer> newValues, boolean isStart, boolean isInclusive);
 
     private static class ArrayBackedBuilder extends CBuilder
     {
@@ -162,20 +162,20 @@
             built = true;
 
             // Currently, only dense table can leave some clustering column out (see #7990)
-            return size == 0 ? Clustering.EMPTY : new Clustering(values);
+            return size == 0 ? Clustering.EMPTY : Clustering.make(values);
         }
 
-        public Slice.Bound buildBound(boolean isStart, boolean isInclusive)
+        public ClusteringBound buildBound(boolean isStart, boolean isInclusive)
         {
             // We don't allow to add more element to a builder that has been built so
             // that we don't have to copy values (even though we have to do it in most cases).
             built = true;
 
             if (size == 0)
-                return isStart ? Slice.Bound.BOTTOM : Slice.Bound.TOP;
+                return isStart ? ClusteringBound.BOTTOM : ClusteringBound.TOP;
 
-            return Slice.Bound.create(Slice.Bound.boundKind(isStart, isInclusive),
-                                      size == values.length ? values : Arrays.copyOfRange(values, 0, size));
+            return ClusteringBound.create(ClusteringBound.boundKind(isStart, isInclusive),
+                                size == values.length ? values : Arrays.copyOfRange(values, 0, size));
         }
 
         public Slice buildSlice()
@@ -196,7 +196,7 @@
 
             ByteBuffer[] newValues = Arrays.copyOf(values, type.size());
             newValues[size] = value;
-            return new Clustering(newValues);
+            return Clustering.make(newValues);
         }
 
         public Clustering buildWith(List<ByteBuffer> newValues)
@@ -207,24 +207,24 @@
             for (ByteBuffer value : newValues)
                 buffers[newSize++] = value;
 
-            return new Clustering(buffers);
+            return Clustering.make(buffers);
         }
 
-        public Slice.Bound buildBoundWith(ByteBuffer value, boolean isStart, boolean isInclusive)
+        public ClusteringBound buildBoundWith(ByteBuffer value, boolean isStart, boolean isInclusive)
         {
             ByteBuffer[] newValues = Arrays.copyOf(values, size+1);
             newValues[size] = value;
-            return Slice.Bound.create(Slice.Bound.boundKind(isStart, isInclusive), newValues);
+            return ClusteringBound.create(ClusteringBound.boundKind(isStart, isInclusive), newValues);
         }
 
-        public Slice.Bound buildBoundWith(List<ByteBuffer> newValues, boolean isStart, boolean isInclusive)
+        public ClusteringBound buildBoundWith(List<ByteBuffer> newValues, boolean isStart, boolean isInclusive)
         {
             ByteBuffer[] buffers = Arrays.copyOf(values, size + newValues.size());
             int newSize = size;
             for (ByteBuffer value : newValues)
                 buffers[newSize++] = value;
 
-            return Slice.Bound.create(Slice.Bound.boundKind(isStart, isInclusive), buffers);
+            return ClusteringBound.create(ClusteringBound.boundKind(isStart, isInclusive), buffers);
         }
     }
 }
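A minimal sketch (assuming a single Int32Type clustering column and ClusteringComparator's varargs constructor) of the renamed API: the same builder now yields either a complete Clustering or a ClusteringBound.

import org.apache.cassandra.db.CBuilder;
import org.apache.cassandra.db.Clustering;
import org.apache.cassandra.db.ClusteringBound;
import org.apache.cassandra.db.ClusteringComparator;
import org.apache.cassandra.db.marshal.Int32Type;
import org.apache.cassandra.utils.ByteBufferUtil;

public class CBuilderSketch
{
    public static void main(String[] args)
    {
        ClusteringComparator comparator = new ClusteringComparator(Int32Type.instance);

        // A complete clustering for a row whose single clustering column is 10.
        Clustering row = CBuilder.create(comparator).add(ByteBufferUtil.bytes(10)).build();

        // An inclusive start bound over the same prefix (a builder can only be built once).
        ClusteringBound start = CBuilder.create(comparator).add(ByteBufferUtil.bytes(10)).buildBound(true, true);

        System.out.println(row.size() + " clustering value(s), bound kind: " + start.kind());
    }
}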
diff --git a/src/java/org/apache/cassandra/db/Clustering.java b/src/java/org/apache/cassandra/db/Clustering.java
index a40cc1f..fa38ce1 100644
--- a/src/java/org/apache/cassandra/db/Clustering.java
+++ b/src/java/org/apache/cassandra/db/Clustering.java
@@ -1,53 +1,91 @@
 /*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
+* Licensed to the Apache Software Foundation (ASF) under one
+* or more contributor license agreements.  See the NOTICE file
+* distributed with this work for additional information
+* regarding copyright ownership.  The ASF licenses this file
+* to you under the Apache License, Version 2.0 (the
+* "License"); you may not use this file except in compliance
+* with the License.  You may obtain a copy of the License at
+*
+*    http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing,
+* software distributed under the License is distributed on an
+* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+* KIND, either express or implied.  See the License for the
+* specific language governing permissions and limitations
+* under the License.
+*/
 package org.apache.cassandra.db;
 
 import java.io.IOException;
 import java.nio.ByteBuffer;
-import java.util.*;
+import java.util.List;
 
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.ColumnDefinition;
 import org.apache.cassandra.db.marshal.AbstractType;
-import org.apache.cassandra.io.util.*;
+import org.apache.cassandra.io.util.DataInputBuffer;
+import org.apache.cassandra.io.util.DataInputPlus;
+import org.apache.cassandra.io.util.DataOutputBuffer;
+import org.apache.cassandra.io.util.DataOutputPlus;
 import org.apache.cassandra.utils.memory.AbstractAllocator;
 
-/**
- * The clustering column values for a row.
- * <p>
- * A {@code Clustering} is a {@code ClusteringPrefix} that must always be "complete", i.e. have
- * as many values as there is clustering columns in the table it is part of. It is the clustering
- * prefix used by rows.
- * <p>
- * Note however that while it's size must be equal to the table clustering size, a clustering can have
- * {@code null} values, and this mostly for thrift backward compatibility (in practice, if a value is null,
- * all of the following ones will be too because that's what thrift allows, but it's never assumed by the
- * code so we could start generally allowing nulls for clustering columns if we wanted to).
- */
-public class Clustering extends AbstractClusteringPrefix
+import static org.apache.cassandra.db.AbstractBufferClusteringPrefix.EMPTY_VALUES_ARRAY;
+
+public interface Clustering extends ClusteringPrefix
 {
     public static final Serializer serializer = new Serializer();
 
+    public long unsharedHeapSizeExcludingData();
+
+    public default Clustering copy(AbstractAllocator allocator)
+    {
+        // Important for STATIC_CLUSTERING (but must copy empty native clustering types).
+        if (size() == 0)
+            return kind() == Kind.STATIC_CLUSTERING ? this : new BufferClustering(EMPTY_VALUES_ARRAY);
+
+        ByteBuffer[] newValues = new ByteBuffer[size()];
+        for (int i = 0; i < size(); i++)
+        {
+            ByteBuffer val = get(i);
+            newValues[i] = val == null ? null : allocator.clone(val);
+        }
+        return new BufferClustering(newValues);
+    }
+
+    public default String toString(CFMetaData metadata)
+    {
+        StringBuilder sb = new StringBuilder();
+        for (int i = 0; i < size(); i++)
+        {
+            ColumnDefinition c = metadata.clusteringColumns().get(i);
+            sb.append(i == 0 ? "" : ", ").append(c.name).append('=').append(get(i) == null ? "null" : c.type.getString(get(i)));
+        }
+        return sb.toString();
+    }
+
+    public default String toCQLString(CFMetaData metadata)
+    {
+        StringBuilder sb = new StringBuilder();
+        for (int i = 0; i < size(); i++)
+        {
+            ColumnDefinition c = metadata.clusteringColumns().get(i);
+            sb.append(i == 0 ? "" : ", ").append(c.type.getString(get(i)));
+        }
+        return sb.toString();
+    }
+
+    public static Clustering make(ByteBuffer... values)
+    {
+        return new BufferClustering(values);
+    }
+
     /**
      * The special cased clustering used by all static rows. It is a special case in the
      * sense that it's always empty, no matter how many clustering columns the table has.
      */
-    public static final Clustering STATIC_CLUSTERING = new Clustering(EMPTY_VALUES_ARRAY)
+    public static final Clustering STATIC_CLUSTERING = new BufferClustering(EMPTY_VALUES_ARRAY)
     {
         @Override
         public Kind kind()
@@ -69,7 +107,7 @@
     };
 
     /** Empty clustering for tables having no clustering columns. */
-    public static final Clustering EMPTY = new Clustering(EMPTY_VALUES_ARRAY)
+    public static final Clustering EMPTY = new BufferClustering(EMPTY_VALUES_ARRAY)
     {
         @Override
         public String toString(CFMetaData metadata)
@@ -78,50 +116,6 @@
         }
     };
 
-    public Clustering(ByteBuffer... values)
-    {
-        super(Kind.CLUSTERING, values);
-    }
-
-    public Kind kind()
-    {
-        return Kind.CLUSTERING;
-    }
-
-    public Clustering copy(AbstractAllocator allocator)
-    {
-        // Important for STATIC_CLUSTERING (but no point in being wasteful in general).
-        if (size() == 0)
-            return this;
-
-        ByteBuffer[] newValues = new ByteBuffer[size()];
-        for (int i = 0; i < size(); i++)
-            newValues[i] = values[i] == null ? null : allocator.clone(values[i]);
-        return new Clustering(newValues);
-    }
-
-    public String toString(CFMetaData metadata)
-    {
-        StringBuilder sb = new StringBuilder();
-        for (int i = 0; i < size(); i++)
-        {
-            ColumnDefinition c = metadata.clusteringColumns().get(i);
-            sb.append(i == 0 ? "" : ", ").append(c.name).append('=').append(get(i) == null ? "null" : c.type.getString(get(i)));
-        }
-        return sb.toString();
-    }
-
-    public String toCQLString(CFMetaData metadata)
-    {
-        StringBuilder sb = new StringBuilder();
-        for (int i = 0; i < size(); i++)
-        {
-            ColumnDefinition c = metadata.clusteringColumns().get(i);
-            sb.append(i == 0 ? "" : ", ").append(c.type.getString(get(i)));
-        }
-        return sb.toString();
-    }
-
     /**
      * Serializer for Clustering object.
      * <p>
@@ -155,13 +149,19 @@
             return ClusteringPrefix.serializer.valuesWithoutSizeSerializedSize(clustering, version, types);
         }
 
+        public void skip(DataInputPlus in, int version, List<AbstractType<?>> types) throws IOException
+        {
+            if (!types.isEmpty())
+                ClusteringPrefix.serializer.skipValuesWithoutSize(in, types.size(), version, types);
+        }
+
         public Clustering deserialize(DataInputPlus in, int version, List<AbstractType<?>> types) throws IOException
         {
             if (types.isEmpty())
                 return EMPTY;
 
             ByteBuffer[] values = ClusteringPrefix.serializer.deserializeValuesWithoutSize(in, types.size(), version, types);
-            return new Clustering(values);
+            return new BufferClustering(values);
         }
 
         public Clustering deserialize(ByteBuffer in, int version, List<AbstractType<?>> types)
diff --git a/src/java/org/apache/cassandra/db/ClusteringBound.java b/src/java/org/apache/cassandra/db/ClusteringBound.java
new file mode 100644
index 0000000..c45f7ba
--- /dev/null
+++ b/src/java/org/apache/cassandra/db/ClusteringBound.java
@@ -0,0 +1,171 @@
+/*
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+package org.apache.cassandra.db;
+
+import java.nio.ByteBuffer;
+import java.util.List;
+
+import org.apache.cassandra.utils.memory.AbstractAllocator;
+
+/**
+ * The start or end of a range of clusterings, either inclusive or exclusive.
+ */
+public class ClusteringBound extends ClusteringBoundOrBoundary
+{
+    /** The smallest start bound, i.e. the one that starts before any row. */
+    public static final ClusteringBound BOTTOM = new ClusteringBound(Kind.INCL_START_BOUND, EMPTY_VALUES_ARRAY);
+    /** The biggest end bound, i.e. the one that ends after any row. */
+    public static final ClusteringBound TOP = new ClusteringBound(Kind.INCL_END_BOUND, EMPTY_VALUES_ARRAY);
+
+    protected ClusteringBound(Kind kind, ByteBuffer[] values)
+    {
+        super(kind, values);
+    }
+
+    public static ClusteringBound create(Kind kind, ByteBuffer[] values)
+    {
+        assert !kind.isBoundary();
+        return new ClusteringBound(kind, values);
+    }
+
+    public static Kind boundKind(boolean isStart, boolean isInclusive)
+    {
+        return isStart
+             ? (isInclusive ? Kind.INCL_START_BOUND : Kind.EXCL_START_BOUND)
+             : (isInclusive ? Kind.INCL_END_BOUND : Kind.EXCL_END_BOUND);
+    }
+
+    public static ClusteringBound inclusiveStartOf(ByteBuffer... values)
+    {
+        return create(Kind.INCL_START_BOUND, values);
+    }
+
+    public static ClusteringBound inclusiveEndOf(ByteBuffer... values)
+    {
+        return create(Kind.INCL_END_BOUND, values);
+    }
+
+    public static ClusteringBound exclusiveStartOf(ByteBuffer... values)
+    {
+        return create(Kind.EXCL_START_BOUND, values);
+    }
+
+    public static ClusteringBound exclusiveEndOf(ByteBuffer... values)
+    {
+        return create(Kind.EXCL_END_BOUND, values);
+    }
+
+    public static ClusteringBound inclusiveStartOf(ClusteringPrefix prefix)
+    {
+        ByteBuffer[] values = new ByteBuffer[prefix.size()];
+        for (int i = 0; i < prefix.size(); i++)
+            values[i] = prefix.get(i);
+        return inclusiveStartOf(values);
+    }
+
+    public static ClusteringBound exclusiveStartOf(ClusteringPrefix prefix)
+    {
+        ByteBuffer[] values = new ByteBuffer[prefix.size()];
+        for (int i = 0; i < prefix.size(); i++)
+            values[i] = prefix.get(i);
+        return exclusiveStartOf(values);
+    }
+
+    public static ClusteringBound inclusiveEndOf(ClusteringPrefix prefix)
+    {
+        ByteBuffer[] values = new ByteBuffer[prefix.size()];
+        for (int i = 0; i < prefix.size(); i++)
+            values[i] = prefix.get(i);
+        return inclusiveEndOf(values);
+    }
+
+    public static ClusteringBound create(ClusteringComparator comparator, boolean isStart, boolean isInclusive, Object... values)
+    {
+        CBuilder builder = CBuilder.create(comparator);
+        for (Object val : values)
+        {
+            if (val instanceof ByteBuffer)
+                builder.add((ByteBuffer) val);
+            else
+                builder.add(val);
+        }
+        return builder.buildBound(isStart, isInclusive);
+    }
+
+    @Override
+    public ClusteringBound invert()
+    {
+        return create(kind().invert(), values);
+    }
+
+    public ClusteringBound copy(AbstractAllocator allocator)
+    {
+        return (ClusteringBound) super.copy(allocator);
+    }
+
+    public boolean isStart()
+    {
+        return kind().isStart();
+    }
+
+    public boolean isEnd()
+    {
+        return !isStart();
+    }
+
+    public boolean isInclusive()
+    {
+        return kind == Kind.INCL_START_BOUND || kind == Kind.INCL_END_BOUND;
+    }
+
+    public boolean isExclusive()
+    {
+        return kind == Kind.EXCL_START_BOUND || kind == Kind.EXCL_END_BOUND;
+    }
+
+    // For use by intersects, it's called with the sstable bound opposite to the slice bound
+    // (so if the slice bound is a start, it's called with the max sstable bound)
+    int compareTo(ClusteringComparator comparator, List<ByteBuffer> sstableBound)
+    {
+        for (int i = 0; i < sstableBound.size(); i++)
+        {
+            // Say the slice bound is a start. It means we're in the case where the max
+            // sstable bound is say (1:5) while the slice start is (1). So the start
+            // does start before the sstable end bound (and intersects it). It's the exact
+            // inverse with an end slice bound.
+            if (i >= size())
+                return isStart() ? -1 : 1;
+
+            int cmp = comparator.compareComponent(i, get(i), sstableBound.get(i));
+            if (cmp != 0)
+                return cmp;
+        }
+
+        // Say the slice bound is a start. It means we're in the case where the max
+        // sstable bound is say (1), while the slice start is (1:5). This again means
+        // that the slice starts before the end bound.
+        if (size() > sstableBound.size())
+            return isStart() ? -1 : 1;
+
+        // The slice bound is equal to the sstable bound. The result depends on whether the slice is inclusive or not.
+        return isInclusive() ? 0 : (isStart() ? 1 : -1);
+    }
+}
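A hedged sketch of the new factories (values illustrative; assumes a single UTF8Type clustering column): bounds can be built from raw buffers or, via a comparator, from plain Java values.

import org.apache.cassandra.db.ClusteringBound;
import org.apache.cassandra.db.ClusteringComparator;
import org.apache.cassandra.db.ClusteringPrefix;
import org.apache.cassandra.db.marshal.UTF8Type;
import org.apache.cassandra.utils.ByteBufferUtil;

public class ClusteringBoundSketch
{
    public static void main(String[] args)
    {
        // From raw buffers:
        ClusteringBound fromBuffers = ClusteringBound.inclusiveStartOf(ByteBufferUtil.bytes("a"));

        // From plain values, letting the comparator's types do the conversion:
        ClusteringComparator comparator = new ClusteringComparator(UTF8Type.instance);
        ClusteringBound fromValues = ClusteringBound.create(comparator, false, true, "z"); // inclusive end

        assert fromBuffers.isStart() && fromBuffers.isInclusive();
        assert fromBuffers.kind() == ClusteringPrefix.Kind.INCL_START_BOUND;
        assert fromValues.isEnd();
        System.out.println(fromBuffers.kind() + " / " + fromValues.kind());
    }
}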
diff --git a/src/java/org/apache/cassandra/db/ClusteringBoundOrBoundary.java b/src/java/org/apache/cassandra/db/ClusteringBoundOrBoundary.java
new file mode 100644
index 0000000..7a2cce1
--- /dev/null
+++ b/src/java/org/apache/cassandra/db/ClusteringBoundOrBoundary.java
@@ -0,0 +1,183 @@
+/*
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+package org.apache.cassandra.db;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.List;
+
+import org.apache.cassandra.config.CFMetaData;
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.io.util.DataInputPlus;
+import org.apache.cassandra.io.util.DataOutputPlus;
+import org.apache.cassandra.utils.memory.AbstractAllocator;
+
+/**
+ * This class defines a threshold between ranges of clusterings. It can either be a start or end bound of a range, or
+ * the boundary between two different defined ranges.
+ * <p>
+ * The latter is used for range tombstones for 2 main reasons:
+ *   1) When merging multiple iterators having range tombstones (that are represented by their start and end markers),
+ *      we need to know, when a range is closed on an iterator, whether it is reopened right away. Otherwise, we
+ *      cannot easily produce the markers on the merged iterator without risking breaking its sorting guarantees.
+ *      See this comment for more details: https://goo.gl/yyB5mR.
+ *   2) This saves some storage space.
+ */
+public abstract class ClusteringBoundOrBoundary extends AbstractBufferClusteringPrefix
+{
+    public static final ClusteringBoundOrBoundary.Serializer serializer = new Serializer();
+
+    protected ClusteringBoundOrBoundary(Kind kind, ByteBuffer[] values)
+    {
+        super(kind, values);
+        assert values.length > 0 || !kind.isBoundary();
+    }
+
+    public static ClusteringBoundOrBoundary create(Kind kind, ByteBuffer[] values)
+    {
+        return kind.isBoundary()
+                ? new ClusteringBoundary(kind, values)
+                : new ClusteringBound(kind, values);
+    }
+
+    public boolean isBoundary()
+    {
+        return kind.isBoundary();
+    }
+
+    public boolean isOpen(boolean reversed)
+    {
+        return kind.isOpen(reversed);
+    }
+
+    public boolean isClose(boolean reversed)
+    {
+        return kind.isClose(reversed);
+    }
+
+    public static ClusteringBound inclusiveOpen(boolean reversed, ByteBuffer[] boundValues)
+    {
+        return new ClusteringBound(reversed ? Kind.INCL_END_BOUND : Kind.INCL_START_BOUND, boundValues);
+    }
+
+    public static ClusteringBound exclusiveOpen(boolean reversed, ByteBuffer[] boundValues)
+    {
+        return new ClusteringBound(reversed ? Kind.EXCL_END_BOUND : Kind.EXCL_START_BOUND, boundValues);
+    }
+
+    public static ClusteringBound inclusiveClose(boolean reversed, ByteBuffer[] boundValues)
+    {
+        return new ClusteringBound(reversed ? Kind.INCL_START_BOUND : Kind.INCL_END_BOUND, boundValues);
+    }
+
+    public static ClusteringBound exclusiveClose(boolean reversed, ByteBuffer[] boundValues)
+    {
+        return new ClusteringBound(reversed ? Kind.EXCL_START_BOUND : Kind.EXCL_END_BOUND, boundValues);
+    }
+
+    public static ClusteringBoundary inclusiveCloseExclusiveOpen(boolean reversed, ByteBuffer[] boundValues)
+    {
+        return new ClusteringBoundary(reversed ? Kind.EXCL_END_INCL_START_BOUNDARY : Kind.INCL_END_EXCL_START_BOUNDARY, boundValues);
+    }
+
+    public static ClusteringBoundary exclusiveCloseInclusiveOpen(boolean reversed, ByteBuffer[] boundValues)
+    {
+        return new ClusteringBoundary(reversed ? Kind.INCL_END_EXCL_START_BOUNDARY : Kind.EXCL_END_INCL_START_BOUNDARY, boundValues);
+    }
+
+    public ClusteringBoundOrBoundary copy(AbstractAllocator allocator)
+    {
+        ByteBuffer[] newValues = new ByteBuffer[size()];
+        for (int i = 0; i < size(); i++)
+            newValues[i] = allocator.clone(get(i));
+        return create(kind(), newValues);
+    }
+
+    public String toString(CFMetaData metadata)
+    {
+        return toString(metadata.comparator);
+    }
+
+    public String toString(ClusteringComparator comparator)
+    {
+        StringBuilder sb = new StringBuilder();
+        sb.append(kind()).append('(');
+        for (int i = 0; i < size(); i++)
+        {
+            if (i > 0)
+                sb.append(", ");
+            sb.append(comparator.subtype(i).getString(get(i)));
+        }
+        return sb.append(')').toString();
+    }
+
+    /**
+     * Returns the inverse of the current bound.
+     * <p>
+     * This inverts both start into end (and vice-versa) and inclusive into exclusive (and vice-versa).
+     *
+     * @return the inverse of this bound. For instance, if this bound is an exclusive start, this returns
+     * an inclusive end with the same values.
+     */
+    public abstract ClusteringBoundOrBoundary invert();
+
+    public static class Serializer
+    {
+        public void serialize(ClusteringBoundOrBoundary bound, DataOutputPlus out, int version, List<AbstractType<?>> types) throws IOException
+        {
+            out.writeByte(bound.kind().ordinal());
+            out.writeShort(bound.size());
+            ClusteringPrefix.serializer.serializeValuesWithoutSize(bound, out, version, types);
+        }
+
+        public long serializedSize(ClusteringBoundOrBoundary bound, int version, List<AbstractType<?>> types)
+        {
+            return 1 // kind ordinal
+                 + TypeSizes.sizeof((short)bound.size())
+                 + ClusteringPrefix.serializer.valuesWithoutSizeSerializedSize(bound, version, types);
+        }
+
+        public ClusteringBoundOrBoundary deserialize(DataInputPlus in, int version, List<AbstractType<?>> types) throws IOException
+        {
+            Kind kind = Kind.values()[in.readByte()];
+            return deserializeValues(in, kind, version, types);
+        }
+
+        public void skipValues(DataInputPlus in, Kind kind, int version, List<AbstractType<?>> types) throws IOException
+        {
+            int size = in.readUnsignedShort();
+            if (size == 0)
+                return;
+
+            ClusteringPrefix.serializer.skipValuesWithoutSize(in, size, version, types);
+        }
+
+        public ClusteringBoundOrBoundary deserializeValues(DataInputPlus in, Kind kind, int version, List<AbstractType<?>> types) throws IOException
+        {
+            int size = in.readUnsignedShort();
+            if (size == 0)
+                return kind.isStart() ? ClusteringBound.BOTTOM : ClusteringBound.TOP;
+
+            ByteBuffer[] values = ClusteringPrefix.serializer.deserializeValuesWithoutSize(in, size, version, types);
+            return create(kind, values);
+        }
+    }
+}
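A small sketch of the create() dispatch (illustrative values): a boundary Kind yields a ClusteringBoundary, any other range Kind a ClusteringBound.

import java.nio.ByteBuffer;

import org.apache.cassandra.db.ClusteringBound;
import org.apache.cassandra.db.ClusteringBoundOrBoundary;
import org.apache.cassandra.db.ClusteringBoundary;
import org.apache.cassandra.db.ClusteringPrefix.Kind;
import org.apache.cassandra.utils.ByteBufferUtil;

public class BoundOrBoundarySketch
{
    public static void main(String[] args)
    {
        ByteBuffer[] values = new ByteBuffer[]{ ByteBufferUtil.bytes(5) };

        ClusteringBoundOrBoundary bound = ClusteringBoundOrBoundary.create(Kind.EXCL_END_BOUND, values);
        ClusteringBoundOrBoundary boundary = ClusteringBoundOrBoundary.create(Kind.INCL_END_EXCL_START_BOUNDARY, values);

        assert bound instanceof ClusteringBound && !bound.isBoundary();
        assert boundary instanceof ClusteringBoundary && boundary.isBoundary();
        System.out.println(bound.kind() + " -> bound, " + boundary.kind() + " -> boundary");
    }
}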
diff --git a/src/java/org/apache/cassandra/db/ClusteringBoundary.java b/src/java/org/apache/cassandra/db/ClusteringBoundary.java
new file mode 100644
index 0000000..37b3210
--- /dev/null
+++ b/src/java/org/apache/cassandra/db/ClusteringBoundary.java
@@ -0,0 +1,65 @@
+/*
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+package org.apache.cassandra.db;
+
+import java.nio.ByteBuffer;
+
+import org.apache.cassandra.utils.memory.AbstractAllocator;
+
+/**
+ * The threshold between two different ranges, i.e. a shortcut for the combination of two ClusteringBounds -- one
+ * specifying the end of one of the ranges, and its (implicit) complement specifying the beginning of the other.
+ */
+public class ClusteringBoundary extends ClusteringBoundOrBoundary
+{
+    protected ClusteringBoundary(Kind kind, ByteBuffer[] values)
+    {
+        super(kind, values);
+    }
+
+    public static ClusteringBoundary create(Kind kind, ByteBuffer[] values)
+    {
+        assert kind.isBoundary();
+        return new ClusteringBoundary(kind, values);
+    }
+
+    @Override
+    public ClusteringBoundary invert()
+    {
+        return create(kind().invert(), values);
+    }
+
+    @Override
+    public ClusteringBoundary copy(AbstractAllocator allocator)
+    {
+        return (ClusteringBoundary) super.copy(allocator);
+    }
+
+    public ClusteringBound openBound(boolean reversed)
+    {
+        return ClusteringBound.create(kind.openBoundOfBoundary(reversed), values);
+    }
+
+    public ClusteringBound closeBound(boolean reversed)
+    {
+        return ClusteringBound.create(kind.closeBoundOfBoundary(reversed), values);
+    }
+}
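A hedged sketch (illustrative value, forward iteration) of splitting a boundary into the bound that closes the first range and the bound that opens the next:

import java.nio.ByteBuffer;

import org.apache.cassandra.db.ClusteringBound;
import org.apache.cassandra.db.ClusteringBoundary;
import org.apache.cassandra.db.ClusteringPrefix.Kind;
import org.apache.cassandra.utils.ByteBufferUtil;

public class BoundarySplitSketch
{
    public static void main(String[] args)
    {
        ByteBuffer[] values = new ByteBuffer[]{ ByteBufferUtil.bytes(7) };
        ClusteringBoundary boundary = ClusteringBoundary.create(Kind.INCL_END_EXCL_START_BOUNDARY, values);

        ClusteringBound close = boundary.closeBound(false);   // ends the first range
        ClusteringBound open = boundary.openBound(false);     // starts the next range

        assert close.isEnd() && open.isStart();
        System.out.println(close.kind() + " / " + open.kind());
    }
}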
diff --git a/src/java/org/apache/cassandra/db/ClusteringComparator.java b/src/java/org/apache/cassandra/db/ClusteringComparator.java
index f3411cf..3030b5a 100644
--- a/src/java/org/apache/cassandra/db/ClusteringComparator.java
+++ b/src/java/org/apache/cassandra/db/ClusteringComparator.java
@@ -18,7 +18,6 @@
 package org.apache.cassandra.db;
 
 import java.nio.ByteBuffer;
-import java.util.Arrays;
 import java.util.Comparator;
 import java.util.List;
 import java.util.Objects;
@@ -29,10 +28,8 @@
 import org.apache.cassandra.db.rows.Row;
 import org.apache.cassandra.db.marshal.AbstractType;
 import org.apache.cassandra.serializers.MarshalException;
-import org.apache.cassandra.utils.ByteBufferUtil;
-import org.apache.cassandra.utils.FastByteOperations;
 
-import static org.apache.cassandra.io.sstable.IndexHelper.IndexInfo;
+import org.apache.cassandra.io.sstable.IndexInfo;
 
 /**
  * A comparator of clustering prefixes (or more generally of {@link Clusterable}}.
diff --git a/src/java/org/apache/cassandra/db/ClusteringPrefix.java b/src/java/org/apache/cassandra/db/ClusteringPrefix.java
index 7f7f964..340e237 100644
--- a/src/java/org/apache/cassandra/db/ClusteringPrefix.java
+++ b/src/java/org/apache/cassandra/db/ClusteringPrefix.java
@@ -37,10 +37,10 @@
  * a "kind" that allows us to implement slices with inclusive and exclusive bounds.
  * <p>
  * In practice, {@code ClusteringPrefix} is just the common parts to its 3 main subtype: {@link Clustering} and
- * {@link Slice.Bound}/{@link RangeTombstone.Bound}, where:
+ * {@link ClusteringBound}/{@link ClusteringBoundary}, where:
  *   1) {@code Clustering} represents the clustering values for a row, i.e. the values for it's clustering columns.
- *   2) {@code Slice.Bound} represents a bound (start or end) of a slice (of rows).
- *   3) {@code RangeTombstoneBoundMarker.Bound} represents a range tombstone marker "bound".
+ *   2) {@code ClusteringBound} represents a bound (start or end) of a slice (of rows) or a range tombstone.
+ *   3) {@code ClusteringBoundary} represents the threshold between two adjacent range tombstones.
  * See those classes for more details.
  */
 public interface ClusteringPrefix extends IMeasurableMemory, Clusterable
@@ -51,7 +51,7 @@
      * The kind of clustering prefix this actually is.
      *
      * The kind {@code STATIC_CLUSTERING} is only implemented by {@link Clustering#STATIC_CLUSTERING} and {@code CLUSTERING} is
-     * implemented by the {@link Clustering} class. The rest is used by {@link Slice.Bound} and {@link RangeTombstone.Bound}.
+     * implemented by the {@link Clustering} class. The rest is used by {@link ClusteringBound} and {@link ClusteringBoundary}.
      */
     public enum Kind
     {
@@ -74,7 +74,7 @@
          */
         public final int comparedToClustering;
 
-        private Kind(int comparison, int comparedToClustering)
+        Kind(int comparison, int comparedToClustering)
         {
             this.comparison = comparison;
             this.comparedToClustering = comparedToClustering;
@@ -122,8 +122,9 @@
                 case EXCL_START_BOUND:
                 case EXCL_END_BOUND:
                     return true;
+                default:
+                    return false;
             }
-            return false;
         }
 
         public boolean isBoundary()
@@ -133,8 +134,9 @@
                 case INCL_END_EXCL_START_BOUNDARY:
                 case EXCL_END_INCL_START_BOUNDARY:
                     return true;
+                default:
+                    return false;
             }
-            return false;
         }
 
         public boolean isStart()
@@ -259,10 +261,21 @@
             }
             else
             {
-                RangeTombstone.Bound.serializer.serialize((RangeTombstone.Bound)clustering, out, version, types);
+                ClusteringBoundOrBoundary.serializer.serialize((ClusteringBoundOrBoundary)clustering, out, version, types);
             }
         }
 
+        public void skip(DataInputPlus in, int version, List<AbstractType<?>> types) throws IOException
+        {
+            Kind kind = Kind.values()[in.readByte()];
+            // We shouldn't serialize static clusterings
+            assert kind != Kind.STATIC_CLUSTERING;
+            if (kind == Kind.CLUSTERING)
+                Clustering.serializer.skip(in, version, types);
+            else
+                ClusteringBoundOrBoundary.serializer.skipValues(in, kind, version, types);
+        }
+
         public ClusteringPrefix deserialize(DataInputPlus in, int version, List<AbstractType<?>> types) throws IOException
         {
             Kind kind = Kind.values()[in.readByte()];
@@ -271,7 +284,7 @@
             if (kind == Kind.CLUSTERING)
                 return Clustering.serializer.deserialize(in, version, types);
             else
-                return RangeTombstone.Bound.serializer.deserializeValues(in, kind, version, types);
+                return ClusteringBoundOrBoundary.serializer.deserializeValues(in, kind, version, types);
         }
 
         public long serializedSize(ClusteringPrefix clustering, int version, List<AbstractType<?>> types)
@@ -281,7 +294,7 @@
             if (clustering.kind() == Kind.CLUSTERING)
                 return 1 + Clustering.serializer.serializedSize((Clustering)clustering, version, types);
             else
-                return RangeTombstone.Bound.serializer.serializedSize((RangeTombstone.Bound)clustering, version, types);
+                return ClusteringBoundOrBoundary.serializer.serializedSize((ClusteringBoundOrBoundary)clustering, version, types);
         }
 
         void serializeValuesWithoutSize(ClusteringPrefix clustering, DataOutputPlus out, int version, List<AbstractType<?>> types) throws IOException
@@ -350,6 +363,24 @@
             return values;
         }
 
+        void skipValuesWithoutSize(DataInputPlus in, int size, int version, List<AbstractType<?>> types) throws IOException
+        {
+            // Callers of this method should handle the case where size = 0 (in all cases we want to return a special value anyway).
+            assert size > 0;
+            int offset = 0;
+            while (offset < size)
+            {
+                long header = in.readUnsignedVInt();
+                int limit = Math.min(size, offset + 32);
+                while (offset < limit)
+                {
+                    if (!isNull(header, offset) && !isEmpty(header, offset))
+                         types.get(offset).skipValue(in);
+                    offset++;
+                }
+            }
+        }
+
         /**
          * Whatever the type of a given clustering column is, its value can always be either empty or null. So we at least need to distinguish those
          * 2 values, and because we want to be able to store fixed width values without appending their (fixed) size first, we need a way to encode
@@ -435,9 +466,9 @@
                 this.nextValues = new ByteBuffer[nextSize];
         }
 
-        public int compareNextTo(Slice.Bound bound) throws IOException
+        public int compareNextTo(ClusteringBoundOrBoundary bound) throws IOException
         {
-            if (bound == Slice.Bound.TOP)
+            if (bound == ClusteringBound.TOP)
                 return -1;
 
             for (int i = 0; i < bound.size(); i++)
@@ -489,11 +520,11 @@
                 continue;
         }
 
-        public RangeTombstone.Bound deserializeNextBound() throws IOException
+        public ClusteringBoundOrBoundary deserializeNextBound() throws IOException
         {
             assert !nextIsRow;
             deserializeAll();
-            RangeTombstone.Bound bound = new RangeTombstone.Bound(nextKind, nextValues);
+            ClusteringBoundOrBoundary bound = ClusteringBoundOrBoundary.create(nextKind, nextValues);
             nextValues = null;
             return bound;
         }
@@ -502,7 +533,7 @@
         {
             assert nextIsRow;
             deserializeAll();
-            Clustering clustering = new Clustering(nextValues);
+            Clustering clustering = Clustering.make(nextValues);
             nextValues = null;
             return clustering;
         }
diff --git a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
index 400fd36..9d31b60 100644
--- a/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
+++ b/src/java/org/apache/cassandra/db/ColumnFamilyStore.java
@@ -44,7 +44,7 @@
 import org.apache.cassandra.concurrent.*;
 import org.apache.cassandra.config.*;
 import org.apache.cassandra.db.commitlog.CommitLog;
-import org.apache.cassandra.db.commitlog.ReplayPosition;
+import org.apache.cassandra.db.commitlog.CommitLogPosition;
 import org.apache.cassandra.db.compaction.*;
 import org.apache.cassandra.db.filter.ClusteringIndexFilter;
 import org.apache.cassandra.db.filter.DataLimits;
@@ -56,6 +56,7 @@
 import org.apache.cassandra.dht.*;
 import org.apache.cassandra.dht.Range;
 import org.apache.cassandra.exceptions.ConfigurationException;
+import org.apache.cassandra.exceptions.StartupException;
 import org.apache.cassandra.index.SecondaryIndexManager;
 import org.apache.cassandra.index.internal.CassandraIndex;
 import org.apache.cassandra.index.transactions.UpdateTransaction;
@@ -124,6 +125,14 @@
 
     private static final Logger logger = LoggerFactory.getLogger(ColumnFamilyStore.class);
 
+    /*
+    We keep a pool of threads for each data directory; the size of each pool is memtable_flush_writers.
+    When flushing, we start a Flush runnable in the flushExecutor. Flush calculates how to split the
+    memtable ranges over the existing data directories and creates a FlushRunnable for each of them.
+    The FlushRunnables are executed in the perDiskflushExecutors, and the Flush blocks until all FlushRunnables
+    have finished. By making flushExecutor the same size as each of the perDiskflushExecutors, we make sure we can
+    have that many flushes running at the same time.
+    */
     private static final ExecutorService flushExecutor = new JMXEnabledThreadPoolExecutor(DatabaseDescriptor.getFlushWriters(),
                                                                                           StageManager.KEEPALIVE,
                                                                                           TimeUnit.SECONDS,
@@ -131,6 +140,20 @@
                                                                                           new NamedThreadFactory("MemtableFlushWriter"),
                                                                                           "internal");
 
+    private static final ExecutorService [] perDiskflushExecutors = new ExecutorService[DatabaseDescriptor.getAllDataFileLocations().length];
+    static
+    {
+        for (int i = 0; i < DatabaseDescriptor.getAllDataFileLocations().length; i++)
+        {
+            perDiskflushExecutors[i] = new JMXEnabledThreadPoolExecutor(DatabaseDescriptor.getFlushWriters(),
+                                                                        StageManager.KEEPALIVE,
+                                                                        TimeUnit.SECONDS,
+                                                                        new LinkedBlockingQueue<Runnable>(),
+                                                                        new NamedThreadFactory("PerDiskMemtableFlushWriter_"+i),
+                                                                        "internal");
+        }
+    }
+
     // post-flush executor is single threaded to provide guarantee that any flush Future on a CF will never return until prior flushes have completed
     private static final ExecutorService postFlushExecutor = new JMXEnabledThreadPoolExecutor(1,
                                                                                               StageManager.KEEPALIVE,
@@ -266,7 +289,7 @@
             logger.trace("scheduling flush in {} ms", period);
             WrappedRunnable runnable = new WrappedRunnable()
             {
-                protected void runMayThrow() throws Exception
+                protected void runMayThrow()
                 {
                     synchronized (data)
                     {
@@ -469,7 +492,10 @@
 
     public Directories getDirectories()
     {
-        return directories;
+        // todo, hack since we need to know the data directories when constructing the compaction strategy
+        if (directories != null)
+            return directories;
+        return new Directories(metadata, initialDirectories);
     }
 
     public SSTableMultiWriter createSSTableMultiWriter(Descriptor descriptor, long keyCount, long repairedAt, int sstableLevel, SerializationHeader header, LifecycleTransaction txn)
@@ -480,7 +506,7 @@
 
     public SSTableMultiWriter createSSTableMultiWriter(Descriptor descriptor, long keyCount, long repairedAt, MetadataCollector metadataCollector, SerializationHeader header, LifecycleTransaction txn)
     {
-        return getCompactionStrategyManager().createSSTableMultiWriter(descriptor, keyCount, repairedAt, metadataCollector, header, txn);
+        return getCompactionStrategyManager().createSSTableMultiWriter(descriptor, keyCount, repairedAt, metadataCollector, header, indexManager.listIndexes(), txn);
     }
 
     public boolean supportsEarlyOpen()
@@ -580,7 +606,7 @@
      * Removes unnecessary files from the cf directory at startup: these include temp files, orphans, zero-length files
      * and compacted sstables. Files that cannot be recognized will be ignored.
      */
-    public static void scrubDataDirectories(CFMetaData metadata)
+    public static void scrubDataDirectories(CFMetaData metadata) throws StartupException
     {
         Directories directories = new Directories(metadata, initialDirectories);
         Set<File> cleanedDirectories = new HashSet<>();
@@ -591,7 +617,12 @@
         directories.removeTemporaryDirectories();
 
         logger.trace("Removing temporary or obsoleted files from unfinished operations for table {}", metadata.cfName);
-        LifecycleTransaction.removeUnfinishedLeftovers(metadata);
+        if (!LifecycleTransaction.removeUnfinishedLeftovers(metadata))
+            throw new StartupException(StartupException.ERR_WRONG_DISK_STATE,
+                                       String.format("Cannot remove temporary or obsoleted files for %s.%s due to a problem with transaction " +
+                                                     "log files. Please check records with problems in the log messages above and fix them. " +
+                                                     "Refer to the 3.0 upgrading instructions in NEWS.txt " +
+                                                     "for a description of transaction log files.", metadata.ksName, metadata.cfName));
 
         logger.trace("Further extra check for orphan sstable files for {}", metadata.cfName);
         for (Map.Entry<Descriptor,Set<Component>> sstableFiles : directories.sstableLister(Directories.OnTxnErr.IGNORE).list().entrySet())
@@ -799,7 +830,7 @@
      *
      * @param memtable
      */
-    public ListenableFuture<ReplayPosition> switchMemtableIfCurrent(Memtable memtable)
+    public ListenableFuture<CommitLogPosition> switchMemtableIfCurrent(Memtable memtable)
     {
         synchronized (data)
         {
@@ -816,14 +847,14 @@
      * not complete until the Memtable (and all prior Memtables) have been successfully flushed, and the CL
      * marked clean up to the position owned by the Memtable.
      */
-    public ListenableFuture<ReplayPosition> switchMemtable()
+    public ListenableFuture<CommitLogPosition> switchMemtable()
     {
         synchronized (data)
         {
             logFlush();
             Flush flush = new Flush(false);
             flushExecutor.execute(flush);
-            ListenableFutureTask<ReplayPosition> task = ListenableFutureTask.create(flush.postFlush);
+            ListenableFutureTask<CommitLogPosition> task = ListenableFutureTask.create(flush.postFlush);
             postFlushExecutor.submit(task);
             return task;
         }
@@ -850,8 +881,13 @@
             offHeapTotal += allocator.offHeap().owns();
         }
 
-        logger.debug("Enqueuing flush of {}: {}", name, String.format("%d (%.0f%%) on-heap, %d (%.0f%%) off-heap",
-                                                                     onHeapTotal, onHeapRatio * 100, offHeapTotal, offHeapRatio * 100));
+        logger.debug("Enqueuing flush of {}: {}",
+                     name,
+                     String.format("%s (%.0f%%) on-heap, %s (%.0f%%) off-heap",
+                                   FBUtilities.prettyPrintMemory(onHeapTotal),
+                                   onHeapRatio * 100,
+                                   FBUtilities.prettyPrintMemory(offHeapTotal),
+                                   offHeapRatio * 100));
     }
 
 
@@ -861,7 +897,7 @@
      * @return a Future yielding the commit log position that can be guaranteed to have been successfully written
      *         to sstables for this table once the future completes
      */
-    public ListenableFuture<ReplayPosition> forceFlush()
+    public ListenableFuture<CommitLogPosition> forceFlush()
     {
         synchronized (data)
         {
@@ -880,7 +916,7 @@
      * @return a Future yielding the commit log position that can be guaranteed to have been successfully written
      *         to sstables for this table once the future completes
      */
-    public ListenableFuture<ReplayPosition> forceFlush(ReplayPosition flushIfDirtyBefore)
+    public ListenableFuture<?> forceFlush(CommitLogPosition flushIfDirtyBefore)
     {
         // we don't loop through the remaining memtables since here we only care about commit log dirtiness
         // and this does not vary between a table and its table-backed indexes
@@ -894,24 +930,20 @@
      * @return a Future yielding the commit log position that can be guaranteed to have been successfully written
      *         to sstables for this table once the future completes
      */
-    private ListenableFuture<ReplayPosition> waitForFlushes()
+    private ListenableFuture<CommitLogPosition> waitForFlushes()
     {
         // we grab the current memtable; once any preceding memtables have flushed, we know its
         // commitLogLowerBound has been set (as it is set with the upper bound of the preceding memtable)
         final Memtable current = data.getView().getCurrentMemtable();
-        ListenableFutureTask<ReplayPosition> task = ListenableFutureTask.create(new Callable<ReplayPosition>()
-        {
-            public ReplayPosition call()
-            {
-                logger.debug("forceFlush requested but everything is clean in {}", name);
-                return current.getCommitLogLowerBound();
-            }
+        ListenableFutureTask<CommitLogPosition> task = ListenableFutureTask.create(() -> {
+            logger.debug("forceFlush requested but everything is clean in {}", name);
+            return current.getCommitLogLowerBound();
         });
         postFlushExecutor.execute(task);
         return task;
     }
 
-    public ReplayPosition forceBlockingFlush()
+    public CommitLogPosition forceBlockingFlush()
     {
         return FBUtilities.waitOnFuture(forceFlush());
     }
@@ -920,18 +952,21 @@
      * Both synchronises custom secondary indexes and provides ordering guarantees for futures on switchMemtable/flush
      * etc, which expect to be able to wait until the flush (and all prior flushes) requested have completed.
      */
-    private final class PostFlush implements Callable<ReplayPosition>
+    private final class PostFlush implements Callable<CommitLogPosition>
     {
         final boolean flushSecondaryIndexes;
         final OpOrder.Barrier writeBarrier;
         final CountDownLatch latch = new CountDownLatch(1);
-        volatile FSWriteError flushFailure = null;
-        final ReplayPosition commitLogUpperBound;
+        final CommitLogPosition commitLogUpperBound;
+        volatile Throwable flushFailure = null;
         final List<Memtable> memtables;
         final List<Collection<SSTableReader>> readers;
 
-        private PostFlush(boolean flushSecondaryIndexes, OpOrder.Barrier writeBarrier, ReplayPosition commitLogUpperBound,
-                          List<Memtable> memtables, List<Collection<SSTableReader>> readers)
+        private PostFlush(boolean flushSecondaryIndexes,
+                          OpOrder.Barrier writeBarrier,
+                          CommitLogPosition commitLogUpperBound,
+                          List<Memtable> memtables,
+                          List<Collection<SSTableReader>> readers)
         {
             this.writeBarrier = writeBarrier;
             this.flushSecondaryIndexes = flushSecondaryIndexes;
@@ -940,7 +975,7 @@
             this.readers = readers;
         }
 
-        public ReplayPosition call()
+        public CommitLogPosition call()
         {
             if (discardFlushResults == ColumnFamilyStore.this)
                 return commitLogUpperBound;
@@ -968,6 +1003,8 @@
                 throw new IllegalStateException();
             }
 
+            // Must check commitLogUpperBound != null because Flush may find that all memtables are clean
+            // and so not set a commitLogUpperBound
             // If a flush errored out but the error was ignored, make sure we don't discard the commit log.
             if (flushFailure == null)
             {
@@ -984,7 +1021,7 @@
             metric.pendingFlushes.dec();
 
             if (flushFailure != null)
-                throw flushFailure;
+                throw Throwables.propagate(flushFailure);
 
             return commitLogUpperBound;
         }
@@ -1024,7 +1061,7 @@
             writeBarrier = keyspace.writeOrder.newBarrier();
 
             // submit flushes for the memtable for any indexed sub-cfses, and our own
-            AtomicReference<ReplayPosition> commitLogUpperBound = new AtomicReference<>();
+            AtomicReference<CommitLogPosition> commitLogUpperBound = new AtomicReference<>();
             for (ColumnFamilyStore cfs : concatWithIndexes())
             {
                 // switch all memtables, regardless of their dirty status, setting the barrier
@@ -1042,7 +1079,7 @@
 
             // we then issue the barrier; this lets us wait for all operations started prior to the barrier to complete;
             // since this happens after wiring up the commitLogUpperBound, we also know all operations with earlier
-            // replay positions have also completed, i.e. the memtables are done and ready to flush
+            // commit log segment positions have also completed, i.e. the memtables are done and ready to flush
             writeBarrier.issue();
             postFlush = new PostFlush(!truncate, writeBarrier, commitLogUpperBound.get(), memtables, readers);
         }
@@ -1064,7 +1101,6 @@
                 if (memtable.isClean() || truncate)
                 {
                     memtable.cfs.data.replaceFlushed(memtable, Collections.emptyList());
-                    memtable.cfs.compactionStrategyManager.replaceFlushed(memtable, Collections.emptyList());
                     reclaim(memtable);
                     iter.remove();
                 }
@@ -1076,23 +1112,109 @@
             {
                 for (Memtable memtable : memtables)
                 {
-                    Collection<SSTableReader> readers = memtable.flush();
-                    memtable.cfs.data.replaceFlushed(memtable, readers);
-                    reclaim(memtable);
-                    this.readers.add(readers);
+                    this.readers.add(flushMemtable(memtable));
                 }
             }
-            catch (FSWriteError e)
+            catch (Throwable t)
             {
-                JVMStabilityInspector.inspectThrowable(e);
-                // If we weren't killed, try to continue work but do not allow CommitLog to be discarded.
-                postFlush.flushFailure = e;
+                JVMStabilityInspector.inspectThrowable(t);
+                postFlush.flushFailure = t;
             }
-
             // signal the post-flush we've done our work
             postFlush.latch.countDown();
         }
 
+        public Collection<SSTableReader> flushMemtable(Memtable memtable)
+        {
+            List<Future<SSTableMultiWriter>> futures = new ArrayList<>();
+            long totalBytesOnDisk = 0;
+            long maxBytesOnDisk = 0;
+            long minBytesOnDisk = Long.MAX_VALUE;
+            List<SSTableReader> sstables = new ArrayList<>();
+            try (LifecycleTransaction txn = LifecycleTransaction.offline(OperationType.FLUSH))
+            {
+                List<Memtable.FlushRunnable> flushRunnables = null;
+                List<SSTableMultiWriter> flushResults = null;
+
+                try
+                {
+                    // flush the memtable
+                    flushRunnables = memtable.flushRunnables(txn);
+
+                    for (int i = 0; i < flushRunnables.size(); i++)
+                        futures.add(perDiskflushExecutors[i].submit(flushRunnables.get(i)));
+
+                    flushResults = Lists.newArrayList(FBUtilities.waitOnFutures(futures));
+                }
+                catch (Throwable t)
+                {
+                    t = memtable.abortRunnables(flushRunnables, t);
+                    t = txn.abort(t);
+                    throw Throwables.propagate(t);
+                }
+
+                try
+                {
+                    Iterator<SSTableMultiWriter> writerIterator = flushResults.iterator();
+                    while (writerIterator.hasNext())
+                    {
+                        @SuppressWarnings("resource")
+                        SSTableMultiWriter writer = writerIterator.next();
+                        if (writer.getFilePointer() > 0)
+                        {
+                            writer.setOpenResult(true).prepareToCommit();
+                        }
+                        else
+                        {
+                            maybeFail(writer.abort(null));
+                            writerIterator.remove();
+                        }
+                    }
+                }
+                catch (Throwable t)
+                {
+                    for (SSTableMultiWriter writer : flushResults)
+                        t = writer.abort(t);
+                    t = txn.abort(t);
+                    Throwables.propagate(t);
+                }
+
+                txn.prepareToCommit();
+
+                Throwable accumulate = null;
+                for (SSTableMultiWriter writer : flushResults)
+                    accumulate = writer.commit(accumulate);
+
+                maybeFail(txn.commit(accumulate));
+
+                for (SSTableMultiWriter writer : flushResults)
+                {
+                    Collection<SSTableReader> flushedSSTables = writer.finished();
+                    for (SSTableReader sstable : flushedSSTables)
+                    {
+                        if (sstable != null)
+                        {
+                            sstables.add(sstable);
+                            long size = sstable.bytesOnDisk();
+                            totalBytesOnDisk += size;
+                            maxBytesOnDisk = Math.max(maxBytesOnDisk, size);
+                            minBytesOnDisk = Math.min(minBytesOnDisk, size);
+                        }
+                    }
+                }
+            }
+            memtable.cfs.data.replaceFlushed(memtable, sstables);
+            reclaim(memtable);
+            memtable.cfs.compactionStrategyManager.compactionLogger.flush(sstables);
+            logger.debug("Flushed to {} ({} sstables, {}), biggest {}, smallest {}",
+                         sstables,
+                         sstables.size(),
+                         FBUtilities.prettyPrintMemory(totalBytesOnDisk),
+                         FBUtilities.prettyPrintMemory(maxBytesOnDisk),
+                         FBUtilities.prettyPrintMemory(minBytesOnDisk));
+            return sstables;
+        }
+
         private void reclaim(final Memtable memtable)
         {
             // issue a read barrier for reclaiming the memory, and offload the wait to another thread
@@ -1100,7 +1222,7 @@
             readBarrier.issue();
             reclaimExecutor.execute(new WrappedRunnable()
             {
-                public void runMayThrow() throws InterruptedException, ExecutionException
+                public void runMayThrow()
                 {
                     readBarrier.await();
                     memtable.setDiscarded();
@@ -1110,16 +1232,16 @@
     }
 
     // atomically set the upper bound for the commit log
-    private static void setCommitLogUpperBound(AtomicReference<ReplayPosition> commitLogUpperBound)
+    private static void setCommitLogUpperBound(AtomicReference<CommitLogPosition> commitLogUpperBound)
     {
         // we attempt to set the holder to the current commit log context. at the same time all writes to the memtables are
         // also maintaining this value, so if somebody sneaks ahead of us somehow (should be rare) we simply retry,
         // so that we know all operations prior to the position have not reached it yet
-        ReplayPosition lastReplayPosition;
+        CommitLogPosition lastReplayPosition;
         while (true)
         {
-            lastReplayPosition = new Memtable.LastReplayPosition(CommitLog.instance.getContext());
-            ReplayPosition currentLast = commitLogUpperBound.get();
+            lastReplayPosition = new Memtable.LastCommitLogPosition((CommitLog.instance.getCurrentPosition()));
+            CommitLogPosition currentLast = commitLogUpperBound.get();
             if ((currentLast == null || currentLast.compareTo(lastReplayPosition) <= 0)
                 && commitLogUpperBound.compareAndSet(currentLast, lastReplayPosition))
                 break;
@@ -1133,7 +1255,7 @@
     public void simulateFailedFlush()
     {
         discardFlushResults = this;
-        data.markFlushing(data.switchMemtable(false, new Memtable(new AtomicReference<>(CommitLog.instance.getContext()), this)));
+        data.markFlushing(data.switchMemtable(false, new Memtable(new AtomicReference<>(CommitLog.instance.getCurrentPosition()), this)));
     }
 
     public void resumeFlushing()
@@ -1204,15 +1326,6 @@
         return String.format("%.2f/%.2f", onHeap, offHeap);
     }
 
-    public void maybeUpdateRowCache(DecoratedKey key)
-    {
-        if (!isRowCacheEnabled())
-            return;
-
-        RowCacheKey cacheKey = new RowCacheKey(metadata.ksAndCFName, key);
-        invalidateCachedPartition(cacheKey);
-    }
-
     /**
      * Insert/Update the column family for this key.
      * Caller is responsible for acquiring Keyspace.switchLock
@@ -1220,17 +1333,18 @@
      * param @ key - key for update/insert
      * param @ columnFamily - columnFamily changes
      */
-    public void apply(PartitionUpdate update, UpdateTransaction indexer, OpOrder.Group opGroup, ReplayPosition replayPosition)
+    public void apply(PartitionUpdate update, UpdateTransaction indexer, OpOrder.Group opGroup, CommitLogPosition commitLogPosition)
 
     {
         long start = System.nanoTime();
-        Memtable mt = data.getMemtableFor(opGroup, replayPosition);
         try
         {
+            Memtable mt = data.getMemtableFor(opGroup, commitLogPosition);
             long timeDelta = mt.put(update, indexer, opGroup);
             DecoratedKey key = update.partitionKey();
-            maybeUpdateRowCache(key);
+            invalidateCachedPartition(key);
             metric.samplers.get(Sampler.WRITES).addSample(key.getKey(), key.hashCode(), 1);
+            StorageHook.instance.reportWrite(metadata.cfId, update);
             metric.writeLatency.addNano(System.nanoTime() - start);
             if(timeDelta < Long.MAX_VALUE)
                 metric.colUpdateTimeDeltaHistogram.update(timeDelta);
@@ -1238,10 +1352,9 @@
         catch (RuntimeException e)
         {
             throw new RuntimeException(e.getMessage()
-                                                     + " for ks: "
-                                                     + keyspace.getName() + ", table: " + name, e);
+                                       + " for ks: "
+                                       + keyspace.getName() + ", table: " + name, e);
         }
-
     }
 
     /**
@@ -1459,6 +1572,11 @@
         return CompactionManager.instance.performSSTableRewrite(ColumnFamilyStore.this, excludeCurrentVersion, jobs);
     }
 
+    public CompactionManager.AllSSTableOpStatus relocateSSTables(int jobs) throws ExecutionException, InterruptedException
+    {
+        return CompactionManager.instance.relocateSSTables(this, jobs);
+    }
+
     public void markObsolete(Collection<SSTableReader> sstables, OperationType compactionType)
     {
         assert !sstables.isEmpty();
@@ -1571,7 +1689,13 @@
     // WARNING: this returns the set of LIVE sstables only, which may be only partially written
     public List<String> getSSTablesForKey(String key)
     {
-        DecoratedKey dk = decorateKey(metadata.getKeyValidator().fromString(key));
+        return getSSTablesForKey(key, false);
+    }
+
+    public List<String> getSSTablesForKey(String key, boolean hexFormat)
+    {
+        ByteBuffer keyBuffer = hexFormat ? ByteBufferUtil.hexToBytes(key) : metadata.getKeyValidator().fromString(key);
+        DecoratedKey dk = decorateKey(keyBuffer);
         try (OpOrder.Group op = readOrdering.start())
         {
             List<String> files = new ArrayList<>();
@@ -1598,12 +1722,14 @@
         TabularDataSupport result = new TabularDataSupport(COUNTER_TYPE);
         for (Counter<ByteBuffer> counter : samplerResults.topK)
         {
-            byte[] key = counter.getItem().array();
+            //No need to duplicate the buffer: AbstractSerializer and ByteBufferUtil.bytesToHex
+            //don't modify position or limit
+            ByteBuffer key = counter.getItem();
             result.put(new CompositeDataSupport(COUNTER_COMPOSITE_TYPE, COUNTER_NAMES, new Object[] {
-                    Hex.bytesToHex(key), // raw
+                    ByteBufferUtil.bytesToHex(key), // raw
                     counter.getCount(),  // count
                     counter.getError(),  // error
-                    metadata.getKeyValidator().getString(ByteBuffer.wrap(key)) })); // string
+                    metadata.getKeyValidator().getString(key) })); // string
         }
         return new CompositeDataSupport(SAMPLING_RESULT, SAMPLER_NAMES, new Object[]{
                 samplerResults.cardinality, result});
@@ -1784,16 +1910,31 @@
      */
     public Set<SSTableReader> snapshot(String snapshotName)
     {
-        return snapshot(snapshotName, null, false);
+        return snapshot(snapshotName, false);
+    }
+
+    /**
+     * Take a snapshot of this column family store.
+     *
+     * @param snapshotName the name associated with the snapshot
+     * @param skipFlush Skip blocking flush of memtable
+     */
+    public Set<SSTableReader> snapshot(String snapshotName, boolean skipFlush)
+    {
+        return snapshot(snapshotName, null, false, skipFlush);
     }
 
 
     /**
      * @param ephemeral If this flag is set to true, the snapshot will be cleaned up during next startup
+     * @param skipFlush Skip blocking flush of memtable
      */
-    public Set<SSTableReader> snapshot(String snapshotName, Predicate<SSTableReader> predicate, boolean ephemeral)
+    public Set<SSTableReader> snapshot(String snapshotName, Predicate<SSTableReader> predicate, boolean ephemeral, boolean skipFlush)
     {
-        forceBlockingFlush();
+        if (!skipFlush)
+        {
+            forceBlockingFlush();
+        }
         return snapshotWithoutFlush(snapshotName, predicate, ephemeral);
     }
 
@@ -1901,8 +2042,8 @@
 
     public void invalidateCachedPartition(DecoratedKey key)
     {
-        if (!Schema.instance.hasCF(metadata.ksAndCFName))
-            return; //2i don't cache rows
+        if (!isRowCacheEnabled())
+            return;
 
         invalidateCachedPartition(new RowCacheKey(metadata.ksAndCFName, key));
     }
@@ -2000,7 +2141,7 @@
         // recording the timestamp IN BETWEEN those actions. Any sstables created
         // with this timestamp or greater time, will not be marked for delete.
         //
-        // Bonus complication: since we store replay position in sstable metadata,
+        // Bonus complication: since we store commit log segment position in sstable metadata,
         // truncating those sstables means we will replay any CL segments from the
         // beginning if we restart before they [the CL segments] are discarded for
         // normal reasons post-truncate.  To prevent this, we store truncation
@@ -2008,7 +2149,7 @@
         logger.trace("truncating {}", name);
 
         final long truncatedAt;
-        final ReplayPosition replayAfter;
+        final CommitLogPosition replayAfter;
 
         if (keyspace.getMetadata().params.durableWrites || DatabaseDescriptor.isAutoSnapshot())
         {
@@ -2064,7 +2205,7 @@
     /**
      * Drops current memtable without flushing to disk. This should only be called when truncating a column family which is not durable.
      */
-    public Future<ReplayPosition> dumpMemtable()
+    public Future<CommitLogPosition> dumpMemtable()
     {
         synchronized (data)
         {
@@ -2127,7 +2268,7 @@
     {
         Callable<LifecycleTransaction> callable = new Callable<LifecycleTransaction>()
         {
-            public LifecycleTransaction call() throws Exception
+            public LifecycleTransaction call()
             {
                 assert data.getCompacting().isEmpty() : data.getCompacting();
                 Iterable<SSTableReader> sstables = getPermittedToCompactSSTables();
@@ -2317,7 +2458,7 @@
     public Iterable<ColumnFamilyStore> concatWithIndexes()
     {
         // we return the main CFS first, which we rely on for simplicity in switchMemtable(), for getting the
-        // latest replay position
+        // latest commit log segment position
         return Iterables.concat(Collections.singleton(this), indexManager.getAllIndexColumnFamilyStores());
     }
 
@@ -2328,7 +2469,7 @@
 
     public int getUnleveledSSTables()
     {
-        return this.compactionStrategyManager.getUnleveledSSTables();
+        return compactionStrategyManager.getUnleveledSSTables();
     }
 
     public int[] getSSTableCountPerLevel()
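
The flush path added above fans each memtable out into one FlushRunnable per data directory, runs those runnables on the matching perDiskflushExecutors, and has the coordinating Flush block until every runnable has finished. The following is a minimal, self-contained sketch of that fan-out/fan-in pattern; the directory names, pool size and task payloads are illustrative stand-ins, not the actual Cassandra types.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.Future;

    public class PerDiskFlushSketch
    {
        public static void main(String[] args) throws Exception
        {
            String[] dataDirs = { "/data1", "/data2" };   // assumption: two configured data directories
            int flushWriters = 2;                         // stands in for memtable_flush_writers

            // one executor per data directory, each sized like the coordinating flushExecutor
            ExecutorService[] perDisk = new ExecutorService[dataDirs.length];
            for (int i = 0; i < dataDirs.length; i++)
                perDisk[i] = Executors.newFixedThreadPool(flushWriters);

            // the "Flush": fan one task per directory out to its executor...
            List<Future<String>> futures = new ArrayList<>();
            for (int i = 0; i < dataDirs.length; i++)
            {
                final String dir = dataDirs[i];
                futures.add(perDisk[i].submit(() -> "flushed memtable range to " + dir));
            }

            // ...then block until all of them are done (like FBUtilities.waitOnFutures)
            for (Future<String> f : futures)
                System.out.println(f.get());

            for (ExecutorService executor : perDisk)
                executor.shutdown();
        }
    }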
diff --git a/src/java/org/apache/cassandra/db/ColumnFamilyStoreMBean.java b/src/java/org/apache/cassandra/db/ColumnFamilyStoreMBean.java
index a74316e..4df9f8d 100644
--- a/src/java/org/apache/cassandra/db/ColumnFamilyStoreMBean.java
+++ b/src/java/org/apache/cassandra/db/ColumnFamilyStoreMBean.java
@@ -124,6 +124,14 @@
     public List<String> getSSTablesForKey(String key);
 
     /**
+     * Returns a list of filenames that contain the given key on this node
+     * @param key the partition key to look up
+     * @param hexFormat true if the key is given as a hex-encoded string
+     * @return list of filenames containing the key
+     */
+    public List<String> getSSTablesForKey(String key, boolean hexFormat);
+
+    /**
      * Scan through Keyspace/ColumnFamily's data directory
      * determine which SSTables should be loaded and load them
      */
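
The new two-argument getSSTablesForKey is part of the ColumnFamilyStoreMBean, so it can also be invoked remotely over JMX. Below is a hedged client sketch; the JMX port, the keyspace/table names, the hex key and the ObjectName pattern are assumptions for illustration only.

    import java.util.List;
    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    public class GetSSTablesForKeyClient
    {
        public static void main(String[] args) throws Exception
        {
            // assumption: local node, default JMX port 7199, no authentication
            JMXServiceURL url = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://127.0.0.1:7199/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url))
            {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                // assumption: ObjectName pattern, keyspace and table names are illustrative
                ObjectName name = new ObjectName("org.apache.cassandra.db:type=ColumnFamilies,keyspace=ks1,columnfamily=tbl1");

                // the second argument tells the server the key is a hex string rather than the key type's native format
                Object sstables = mbs.invoke(name,
                                             "getSSTablesForKey",
                                             new Object[]{ "cafebabe", true },
                                             new String[]{ String.class.getName(), boolean.class.getName() });
                System.out.println((List<?>) sstables);
            }
        }
    }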
diff --git a/src/java/org/apache/cassandra/db/ColumnIndex.java b/src/java/org/apache/cassandra/db/ColumnIndex.java
index ede3f79..2e7a2ee 100644
--- a/src/java/org/apache/cassandra/db/ColumnIndex.java
+++ b/src/java/org/apache/cassandra/db/ColumnIndex.java
@@ -15,164 +15,265 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
+
 package org.apache.cassandra.db;
 
 import java.io.IOException;
+import java.nio.ByteBuffer;
 import java.util.*;
 
-import com.google.common.annotations.VisibleForTesting;
+import com.google.common.primitives.Ints;
 
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.db.rows.*;
-import org.apache.cassandra.io.sstable.IndexHelper;
+import org.apache.cassandra.io.ISerializer;
+import org.apache.cassandra.io.sstable.IndexInfo;
+import org.apache.cassandra.io.sstable.format.SSTableFlushObserver;
 import org.apache.cassandra.io.sstable.format.Version;
+import org.apache.cassandra.io.util.DataOutputBuffer;
 import org.apache.cassandra.io.util.SequentialWriter;
 import org.apache.cassandra.utils.ByteBufferUtil;
 
+/**
+ * Column index builder used by {@link org.apache.cassandra.io.sstable.format.big.BigTableWriter}.
+ * For index entries that exceed {@link org.apache.cassandra.config.Config#column_index_cache_size_in_kb},
+ * this uses the same serialization logic as {@link RowIndexEntry}.
+ */
 public class ColumnIndex
 {
-    public final long partitionHeaderLength;
-    public final List<IndexHelper.IndexInfo> columnsIndex;
+    // used once the row-index-entry reaches config column_index_cache_size_in_kb
+    private DataOutputBuffer buffer;
+    // tracks the serialized size of the row-index-entry (unused once the buffer is in use)
+    private int indexSamplesSerializedSize;
+    // used until the row-index-entry reaches config column_index_cache_size_in_kb
+    private final List<IndexInfo> indexSamples = new ArrayList<>();
 
-    private static final ColumnIndex EMPTY = new ColumnIndex(-1, Collections.<IndexHelper.IndexInfo>emptyList());
+    public int columnIndexCount;
+    private int[] indexOffsets;
 
-    private ColumnIndex(long partitionHeaderLength, List<IndexHelper.IndexInfo> columnsIndex)
+    private final SerializationHeader header;
+    private final int version;
+    private final SequentialWriter writer;
+    private long initialPosition;
+    private final ISerializer<IndexInfo> idxSerializer;
+    public long headerLength;
+    private long startPosition;
+
+    private int written;
+    private long previousRowStart;
+
+    private ClusteringPrefix firstClustering;
+    private ClusteringPrefix lastClustering;
+
+    private DeletionTime openMarker;
+
+    private final Collection<SSTableFlushObserver> observers;
+
+    public ColumnIndex(SerializationHeader header,
+                        SequentialWriter writer,
+                        Version version,
+                        Collection<SSTableFlushObserver> observers,
+                        ISerializer<IndexInfo> indexInfoSerializer)
     {
-        assert columnsIndex != null;
-
-        this.partitionHeaderLength = partitionHeaderLength;
-        this.columnsIndex = columnsIndex;
+        this.header = header;
+        this.writer = writer;
+        this.version = version.correspondingMessagingVersion();
+        this.observers = observers;
+        this.idxSerializer = indexInfoSerializer;
     }
 
-    public static ColumnIndex writeAndBuildIndex(UnfilteredRowIterator iterator, SequentialWriter output, SerializationHeader header, Version version) throws IOException
+    public void reset()
     {
-        assert !iterator.isEmpty() && version.storeRows();
-
-        Builder builder = new Builder(iterator, output, header, version.correspondingMessagingVersion());
-        return builder.build();
+        this.initialPosition = writer.position();
+        this.headerLength = -1;
+        this.startPosition = -1;
+        this.previousRowStart = 0;
+        this.columnIndexCount = 0;
+        this.written = 0;
+        this.indexSamplesSerializedSize = 0;
+        this.indexSamples.clear();
+        this.firstClustering = null;
+        this.lastClustering = null;
+        this.openMarker = null;
+        this.buffer = null;
     }
 
-    @VisibleForTesting
-    public static ColumnIndex nothing()
+    public void buildRowIndex(UnfilteredRowIterator iterator) throws IOException
     {
-        return EMPTY;
+        writePartitionHeader(iterator);
+        this.headerLength = writer.position() - initialPosition;
+
+        while (iterator.hasNext())
+            add(iterator.next());
+
+        finish();
     }
 
-    /**
-     * Help to create an index for a column family based on size of columns,
-     * and write said columns to disk.
-     */
-    private static class Builder
+    private void writePartitionHeader(UnfilteredRowIterator iterator) throws IOException
     {
-        private final UnfilteredRowIterator iterator;
-        private final SequentialWriter writer;
-        private final SerializationHeader header;
-        private final int version;
-
-        private final List<IndexHelper.IndexInfo> columnsIndex = new ArrayList<>();
-        private final long initialPosition;
-        private long headerLength = -1;
-
-        private long startPosition = -1;
-
-        private int written;
-        private long previousRowStart;
-
-        private ClusteringPrefix firstClustering;
-        private ClusteringPrefix lastClustering;
-
-        private DeletionTime openMarker;
-
-        public Builder(UnfilteredRowIterator iterator,
-                       SequentialWriter writer,
-                       SerializationHeader header,
-                       int version)
+        ByteBufferUtil.writeWithShortLength(iterator.partitionKey().getKey(), writer);
+        DeletionTime.serializer.serialize(iterator.partitionLevelDeletion(), writer);
+        if (header.hasStatic())
         {
-            this.iterator = iterator;
-            this.writer = writer;
-            this.header = header;
-            this.version = version;
-            this.initialPosition = writer.position();
+            Row staticRow = iterator.staticRow();
+
+            UnfilteredSerializer.serializer.serializeStaticRow(staticRow, header, writer, version);
+            if (!observers.isEmpty())
+                observers.forEach((o) -> o.nextUnfilteredCluster(staticRow));
+        }
+    }
+
+    private long currentPosition()
+    {
+        return writer.position() - initialPosition;
+    }
+
+    public ByteBuffer buffer()
+    {
+        return buffer != null ? buffer.buffer() : null;
+    }
+
+    public List<IndexInfo> indexSamples()
+    {
+        if (indexSamplesSerializedSize + columnIndexCount * TypeSizes.sizeof(0) <= DatabaseDescriptor.getColumnIndexCacheSize())
+        {
+            return indexSamples;
         }
 
-        private void writePartitionHeader(UnfilteredRowIterator iterator) throws IOException
+        return null;
+    }
+
+    public int[] offsets()
+    {
+        return indexOffsets != null
+               ? Arrays.copyOf(indexOffsets, columnIndexCount)
+               : null;
+    }
+
+    private void addIndexBlock() throws IOException
+    {
+        IndexInfo cIndexInfo = new IndexInfo(firstClustering,
+                                             lastClustering,
+                                             startPosition,
+                                             currentPosition() - startPosition,
+                                             openMarker);
+
+        // indexOffsets is used for both shallow (ShallowIndexedEntry) and non-shallow IndexedEntry.
+        // For shallow ones, we need it to serialize the offsets in finish().
+        // For non-shallow ones, the offsets are passed into IndexedEntry, so we don't have to
+        // calculate the offsets again.
+
+        // indexOffsets contains the offsets of the serialized IndexInfo objects.
+        // I.e. indexOffsets[0] is always 0, so we avoid special handling for index #0
+        // and never have to subtract 1 from the index (which could be error-prone).
+        if (indexOffsets == null)
+            indexOffsets = new int[10];
+        else
         {
-            ByteBufferUtil.writeWithShortLength(iterator.partitionKey().getKey(), writer);
-            DeletionTime.serializer.serialize(iterator.partitionLevelDeletion(), writer);
-            if (header.hasStatic())
-                UnfilteredSerializer.serializer.serializeStaticRow(iterator.staticRow(), header, writer, version);
-        }
+            if (columnIndexCount >= indexOffsets.length)
+                indexOffsets = Arrays.copyOf(indexOffsets, indexOffsets.length + 10);
 
-        public ColumnIndex build() throws IOException
-        {
-            writePartitionHeader(iterator);
-            this.headerLength = writer.position() - initialPosition;
-
-            while (iterator.hasNext())
-                add(iterator.next());
-
-            return close();
-        }
-
-        private long currentPosition()
-        {
-            return writer.position() - initialPosition;
-        }
-
-        private void addIndexBlock()
-        {
-            IndexHelper.IndexInfo cIndexInfo = new IndexHelper.IndexInfo(firstClustering,
-                                                                         lastClustering,
-                                                                         startPosition,
-                                                                         currentPosition() - startPosition,
-                                                                         openMarker);
-            columnsIndex.add(cIndexInfo);
-            firstClustering = null;
-        }
-
-        private void add(Unfiltered unfiltered) throws IOException
-        {
-            long pos = currentPosition();
-
-            if (firstClustering == null)
+            //the 0th element is always 0
+            if (columnIndexCount == 0)
             {
-                // Beginning of an index block. Remember the start and position
-                firstClustering = unfiltered.clustering();
-                startPosition = pos;
+                indexOffsets[columnIndexCount] = 0;
             }
-
-            UnfilteredSerializer.serializer.serialize(unfiltered, header, writer, pos - previousRowStart, version);
-            lastClustering = unfiltered.clustering();
-            previousRowStart = pos;
-            ++written;
-
-            if (unfiltered.kind() == Unfiltered.Kind.RANGE_TOMBSTONE_MARKER)
+            else
             {
-                RangeTombstoneMarker marker = (RangeTombstoneMarker)unfiltered;
-                openMarker = marker.isOpen(false) ? marker.openDeletionTime(false) : null;
+                indexOffsets[columnIndexCount] =
+                buffer != null
+                ? Ints.checkedCast(buffer.position())
+                : indexSamplesSerializedSize;
             }
-
-            // if we hit the column index size that we have to index after, go ahead and index it.
-            if (currentPosition() - startPosition >= DatabaseDescriptor.getColumnIndexSize())
-                addIndexBlock();
-
         }
+        columnIndexCount++;
 
-        private ColumnIndex close() throws IOException
+        // First, we collect the IndexInfo objects in an ArrayList until their serialized size reaches Config.column_index_cache_size_in_kb.
+        // When column_index_cache_size_in_kb is reached, we switch to byte-buffer mode.
+        if (buffer == null)
         {
-            UnfilteredSerializer.serializer.writeEndOfPartition(writer);
-
-            // It's possible we add no rows, just a top level deletion
-            if (written == 0)
-                return ColumnIndex.EMPTY;
-
-            // the last column may have fallen on an index boundary already.  if not, index it explicitly.
-            if (firstClustering != null)
-                addIndexBlock();
-
-            // we should always have at least one computed index block, but we only write it out if there is more than that.
-            assert columnsIndex.size() > 0 && headerLength >= 0;
-            return new ColumnIndex(headerLength, columnsIndex);
+            indexSamplesSerializedSize += idxSerializer.serializedSize(cIndexInfo);
+            if (indexSamplesSerializedSize + columnIndexCount * TypeSizes.sizeof(0) > DatabaseDescriptor.getColumnIndexCacheSize())
+            {
+                buffer = new DataOutputBuffer(DatabaseDescriptor.getColumnIndexCacheSize() * 2);
+                for (IndexInfo indexSample : indexSamples)
+                {
+                    idxSerializer.serialize(indexSample, buffer);
+                }
+            }
+            else
+            {
+                indexSamples.add(cIndexInfo);
+            }
         }
+        // don't put an else here: when we just switched to buffer mode above, the current
+        // IndexInfo was not added to indexSamples and still has to be serialized into the buffer
+        if (buffer != null)
+        {
+            idxSerializer.serialize(cIndexInfo, buffer);
+        }
+
+        firstClustering = null;
+    }
+
+    private void add(Unfiltered unfiltered) throws IOException
+    {
+        long pos = currentPosition();
+
+        if (firstClustering == null)
+        {
+            // Beginning of an index block. Remember the start and position
+            firstClustering = unfiltered.clustering();
+            startPosition = pos;
+        }
+
+        UnfilteredSerializer.serializer.serialize(unfiltered, header, writer, pos - previousRowStart, version);
+
+        // notify observers about each new row
+        if (!observers.isEmpty())
+            observers.forEach((o) -> o.nextUnfilteredCluster(unfiltered));
+
+        lastClustering = unfiltered.clustering();
+        previousRowStart = pos;
+        ++written;
+
+        if (unfiltered.kind() == Unfiltered.Kind.RANGE_TOMBSTONE_MARKER)
+        {
+            RangeTombstoneMarker marker = (RangeTombstoneMarker) unfiltered;
+            openMarker = marker.isOpen(false) ? marker.openDeletionTime(false) : null;
+        }
+
+        // if we hit the column index size that we have to index after, go ahead and index it.
+        if (currentPosition() - startPosition >= DatabaseDescriptor.getColumnIndexSize())
+            addIndexBlock();
+    }
+
+    private void finish() throws IOException
+    {
+        UnfilteredSerializer.serializer.writeEndOfPartition(writer);
+
+        // It's possible we add no rows, just a top level deletion
+        if (written == 0)
+            return;
+
+        // the last column may have fallen on an index boundary already.  if not, index it explicitly.
+        if (firstClustering != null)
+            addIndexBlock();
+
+        // If we serialized the IndexInfo objects directly into 'buffer' in the code above,
+        // we have to write their offsets here. The offsets have already been collected
+        // in indexOffsets[]. buffer is != null if the serialized size exceeds Config.column_index_cache_size_in_kb.
+        // In the other case, when buffer==null, the offsets are serialized in RowIndexEntry.IndexedEntry.serialize().
+        if (buffer != null)
+            RowIndexEntry.Serializer.serializeOffsets(buffer, indexOffsets, columnIndexCount);
+
+        // we should always have at least one computed index block, but we only write it out if there is more than that.
+        assert columnIndexCount > 0 && headerLength >= 0;
+    }
+
+    public int indexInfoSerializedSize()
+    {
+        return buffer != null
+               ? buffer.buffer().limit()
+               : indexSamplesSerializedSize + columnIndexCount * TypeSizes.sizeof(0);
     }
 }
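
The rewritten ColumnIndex keeps IndexInfo samples in a plain list until their serialized size (plus one int offset per entry) crosses column_index_cache_size_in_kb; at that point the samples collected so far are re-serialized into a buffer, and every later entry is written straight to it. A stripped-down sketch of that switch-over follows, with plain ints standing in for IndexInfo, a ByteArrayOutputStream standing in for DataOutputBuffer, and an arbitrary byte threshold in place of the real setting.

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;

    public class IndexSampleBufferSketch
    {
        private static final int CACHE_SIZE_BYTES = 32;          // stands in for column_index_cache_size_in_kb

        private final List<Integer> samples = new ArrayList<>(); // stands in for the List<IndexInfo> samples
        private ByteArrayOutputStream buffer;                     // stands in for the DataOutputBuffer
        private int serializedSize;                               // stands in for indexSamplesSerializedSize

        void add(int sample) throws IOException
        {
            if (buffer == null)
            {
                serializedSize += Integer.BYTES;                  // "serialized size" of this sample
                if (serializedSize + (samples.size() + 1) * Integer.BYTES > CACHE_SIZE_BYTES)
                {
                    // threshold crossed: re-serialize everything collected so far into the buffer
                    buffer = new ByteArrayOutputStream();
                    DataOutputStream out = new DataOutputStream(buffer);
                    for (int s : samples)
                        out.writeInt(s);
                }
                else
                {
                    samples.add(sample);
                }
            }
            // intentionally no 'else': if we just switched, the current sample still has to be written
            if (buffer != null)
                new DataOutputStream(buffer).writeInt(sample);
        }

        public static void main(String[] args) throws IOException
        {
            IndexSampleBufferSketch sketch = new IndexSampleBufferSketch();
            for (int i = 0; i < 12; i++)
                sketch.add(i);
            System.out.println("samples kept: " + sketch.samples.size()
                               + ", buffered bytes: " + (sketch.buffer == null ? 0 : sketch.buffer.size()));
        }
    }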
diff --git a/src/java/org/apache/cassandra/db/Columns.java b/src/java/org/apache/cassandra/db/Columns.java
index cad295c..e3c30fa 100644
--- a/src/java/org/apache/cassandra/db/Columns.java
+++ b/src/java/org/apache/cassandra/db/Columns.java
@@ -38,6 +38,7 @@
 import org.apache.cassandra.utils.SearchIterator;
 import org.apache.cassandra.utils.btree.BTree;
 import org.apache.cassandra.utils.btree.BTreeSearchIterator;
+import org.apache.cassandra.utils.btree.BTreeRemoval;
 import org.apache.cassandra.utils.btree.UpdateFunction;
 
 /**
@@ -343,7 +344,7 @@
         if (!contains(column))
             return this;
 
-        Object[] newColumns = BTree.<ColumnDefinition>transformAndFilter(columns, (c) -> c.equals(column) ? null : c);
+        Object[] newColumns = BTreeRemoval.<ColumnDefinition>remove(columns, Comparator.naturalOrder(), column);
         return new Columns(newColumns);
     }
 
diff --git a/src/java/org/apache/cassandra/db/CompactTables.java b/src/java/org/apache/cassandra/db/CompactTables.java
index e31fda3..0d9c5df 100644
--- a/src/java/org/apache/cassandra/db/CompactTables.java
+++ b/src/java/org/apache/cassandra/db/CompactTables.java
@@ -56,6 +56,7 @@
  *
  * As far as thrift is concerned, one exception to this is super column families, which have a different layout. Namely, a super
  * column families is encoded with:
+ * {@code
  *   CREATE TABLE super (
  *      key [key_validation_class],
  *      super_column_name [comparator],
@@ -65,6 +66,7 @@
  *      "" map<[sub_comparator], [default_validation_class]>
  *      PRIMARY KEY (key, super_column_name)
  *   )
+ * }
  * In other words, every super column is encoded by a row. That row has one column for each defined "column_metadata", but it also
  * has a special map column (whose name is the empty string as this is guaranteed to never conflict with a user-defined
  * "column_metadata") which stores the super column "dynamic" sub-columns.
diff --git a/src/java/org/apache/cassandra/db/CounterMutation.java b/src/java/org/apache/cassandra/db/CounterMutation.java
index 8aafa5c..6a07782 100644
--- a/src/java/org/apache/cassandra/db/CounterMutation.java
+++ b/src/java/org/apache/cassandra/db/CounterMutation.java
@@ -110,7 +110,7 @@
      *
      * @return the applied resulting Mutation
      */
-    public Mutation apply() throws WriteTimeoutException
+    public Mutation applyCounterMutation() throws WriteTimeoutException
     {
         Mutation result = new Mutation(getKeyspaceName(), key());
         Keyspace keyspace = Keyspace.open(getKeyspaceName());
@@ -132,6 +132,11 @@
         }
     }
 
+    public void apply()
+    {
+        applyCounterMutation();
+    }
+
     private void grabCounterLocks(Keyspace keyspace, List<Lock> locks) throws WriteTimeoutException
     {
         long startTime = System.nanoTime();
@@ -206,7 +211,7 @@
 
     private void updateWithCurrentValue(PartitionUpdate.CounterMark mark, ClockAndCount currentValue, ColumnFamilyStore cfs)
     {
-        long clock = currentValue.clock + 1L;
+        long clock = Math.max(FBUtilities.timestampMicros(), currentValue.clock + 1L);
         long count = currentValue.count + CounterContext.instance().total(mark.value());
 
         mark.setValue(CounterContext.instance().createGlobal(CounterId.getLocalId(), clock, count));
@@ -250,7 +255,8 @@
         ClusteringIndexNamesFilter filter = new ClusteringIndexNamesFilter(names.build(), false);
         SinglePartitionReadCommand cmd = SinglePartitionReadCommand.create(cfs.metadata, nowInSec, key(), builder.build(), filter);
         PeekingIterator<PartitionUpdate.CounterMark> markIter = Iterators.peekingIterator(marks.iterator());
-        try (OpOrder.Group op = cfs.readOrdering.start(); RowIterator partition = UnfilteredRowIterators.filter(cmd.queryMemtableAndDisk(cfs, op), nowInSec))
+        try (ReadExecutionController controller = cmd.executionController();
+             RowIterator partition = UnfilteredRowIterators.filter(cmd.queryMemtableAndDisk(cfs, controller), nowInSec))
         {
             updateForRow(markIter, partition.staticRow(), cfs);
 
diff --git a/src/java/org/apache/cassandra/db/Directories.java b/src/java/org/apache/cassandra/db/Directories.java
index 01ffd52..cd5b695 100644
--- a/src/java/org/apache/cassandra/db/Directories.java
+++ b/src/java/org/apache/cassandra/db/Directories.java
@@ -23,22 +23,15 @@
 import java.io.FileFilter;
 import java.io.IOError;
 import java.io.IOException;
-import java.nio.file.FileVisitResult;
 import java.nio.file.Files;
 import java.nio.file.Path;
-import java.nio.file.SimpleFileVisitor;
-import java.nio.file.attribute.BasicFileAttributes;
 import java.util.*;
 import java.util.concurrent.ThreadLocalRandom;
-import java.util.concurrent.atomic.AtomicLong;
 import java.util.function.BiFunction;
-import java.util.function.Consumer;
 
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.base.Predicate;
 import com.google.common.collect.ImmutableMap;
-import com.google.common.collect.ImmutableSet;
-import com.google.common.collect.ImmutableSet.Builder;
 import com.google.common.collect.Iterables;
 
 import org.apache.commons.lang3.StringUtils;
@@ -51,6 +44,7 @@
 import org.apache.cassandra.io.FSWriteError;
 import org.apache.cassandra.io.util.FileUtils;
 import org.apache.cassandra.io.sstable.*;
+import org.apache.cassandra.utils.DirectorySizeCalculator;
 import org.apache.cassandra.utils.ByteBufferUtil;
 import org.apache.cassandra.utils.FBUtilities;
 import org.apache.cassandra.utils.Pair;
@@ -291,6 +285,19 @@
         return null;
     }
 
+    public DataDirectory getDataDirectoryForFile(File directory)
+    {
+        if (directory != null)
+        {
+            for (DataDirectory dataDirectory : paths)
+            {
+                if (directory.getAbsolutePath().startsWith(dataDirectory.location.getAbsolutePath()))
+                    return dataDirectory;
+            }
+        }
+        return null;
+    }
+
     public Descriptor find(String filename)
     {
         for (File dir : dataPaths)
@@ -306,7 +313,7 @@
      * which may return any non-blacklisted directory - even a data directory that has no usable space.
      * Do not use this method in production code.
      *
-     * @throws IOError if all directories are blacklisted.
+     * @throws FSWriteError if all directories are blacklisted.
      */
     public File getDirectoryForNewSSTables()
     {
@@ -316,11 +323,14 @@
     /**
      * Returns a non-blacklisted data directory that _currently_ has {@code writeSize} bytes as usable space.
      *
-     * @throws IOError if all directories are blacklisted.
+     * @throws FSWriteError if all directories are blacklisted.
      */
     public File getWriteableLocationAsFile(long writeSize)
     {
-        return getLocationForDisk(getWriteableLocation(writeSize));
+        File location = getLocationForDisk(getWriteableLocation(writeSize));
+        if (location == null)
+            throw new FSWriteError(new IOException("No configured data directory contains enough space to write " + writeSize + " bytes"), "");
+        return location;
     }
 
     /**
@@ -352,9 +362,10 @@
     }
 
     /**
-     * Returns a non-blacklisted data directory that _currently_ has {@code writeSize} bytes as usable space.
+     * Returns a non-blacklisted data directory that _currently_ has {@code writeSize} bytes as usable space, or null if
+     * no directory has enough space left.
      *
-     * @throws IOError if all directories are blacklisted.
+     * @throws FSWriteError if all directories are blacklisted.
      */
     public DataDirectory getWriteableLocation(long writeSize)
     {
@@ -433,7 +444,7 @@
         for (DataDirectory dataDir : paths)
         {
             if (BlacklistedDirectories.isUnwritable(getLocationForDisk(dataDir)))
-                  continue;
+                continue;
             DataDirectoryCandidate candidate = new DataDirectoryCandidate(dataDir);
             // exclude directory if its total writeSize does not fit to data directory
             if (candidate.availableSpace < writeSize)
@@ -443,6 +454,26 @@
         return totalAvailable > expectedTotalWriteSize;
     }
 
+    public DataDirectory[] getWriteableLocations()
+    {
+        List<DataDirectory> nonBlacklistedDirs = new ArrayList<>();
+        for (DataDirectory dir : paths)
+        {
+            if (!BlacklistedDirectories.isUnwritable(dir.location))
+                nonBlacklistedDirs.add(dir);
+        }
+
+        Collections.sort(nonBlacklistedDirs, new Comparator<DataDirectory>()
+        {
+            @Override
+            public int compare(DataDirectory o1, DataDirectory o2)
+            {
+                return o1.location.compareTo(o2.location);
+            }
+        });
+        return nonBlacklistedDirs.toArray(new DataDirectory[nonBlacklistedDirs.size()]);
+    }
+
     public static File getSnapshotDirectory(Descriptor desc, String snapshotName)
     {
         return getSnapshotDirectory(desc.directory, snapshotName);
@@ -777,7 +808,6 @@
         return snapshotSpaceMap;
     }
 
-
     public List<String> listEphemeralSnapshots()
     {
         final List<String> ephemeralSnapshots = new LinkedList<>();
@@ -891,7 +921,7 @@
         if (!input.isDirectory())
             return 0;
 
-        TrueFilesSizeVisitor visitor = new TrueFilesSizeVisitor();
+        SSTableSizeSummer visitor = new SSTableSizeSummer(sstableLister(Directories.OnTxnErr.THROW).listFiles());
         try
         {
             Files.walkFileTree(input.toPath(), visitor);
@@ -975,22 +1005,15 @@
             dataDirectories[i] = new DataDirectory(new File(locations[i]));
     }
     
-    private class TrueFilesSizeVisitor extends SimpleFileVisitor<Path>
+    private class SSTableSizeSummer extends DirectorySizeCalculator
     {
-        private final AtomicLong size = new AtomicLong(0);
-        private final Set<String> visited = newHashSet(); //count each file only once
-        private final Set<String> alive;
-
-        TrueFilesSizeVisitor()
+        SSTableSizeSummer(List<File> files)
         {
-            super();
-            Builder<String> builder = ImmutableSet.builder();
-            for (File file : sstableLister(Directories.OnTxnErr.THROW).listFiles())
-                builder.add(file.getName());
-            alive = builder.build();
+            super(files);
         }
 
-        private boolean isAcceptable(Path file)
+        @Override
+        public boolean isAcceptable(Path file)
         {
             String fileName = file.toFile().getName();
             Pair<Descriptor, Component> pair = SSTable.tryComponentFromFilename(file.getParent().toFile(), fileName);
@@ -1000,27 +1023,5 @@
                     && !visited.contains(fileName)
                     && !alive.contains(fileName);
         }
-
-        @Override
-        public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException
-        {
-            if (isAcceptable(file))
-            {
-                size.addAndGet(attrs.size());
-                visited.add(file.toFile().getName());
-            }
-            return FileVisitResult.CONTINUE;
-        }
-
-        @Override
-        public FileVisitResult visitFileFailed(Path file, IOException exc) throws IOException 
-        {
-            return FileVisitResult.CONTINUE;
-        }
-        
-        public long getAllocatedSize()
-        {
-            return size.get();
-        }
     }
 }
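
getDataDirectoryForFile, added above, resolves which configured data directory owns a given sstable directory by comparing absolute-path prefixes. A minimal sketch of that lookup; the hard-coded paths stand in for the configured data_file_directories.

    import java.io.File;

    public class DataDirectoryLookupSketch
    {
        // assumption: these stand in for the configured data_file_directories
        private static final File[] DATA_DIRS = { new File("/var/lib/cassandra/data1"),
                                                  new File("/var/lib/cassandra/data2") };

        static File dataDirectoryFor(File directory)
        {
            if (directory != null)
            {
                for (File dataDir : DATA_DIRS)
                {
                    // same idea as Directories.getDataDirectoryForFile: absolute-path prefix match
                    if (directory.getAbsolutePath().startsWith(dataDir.getAbsolutePath()))
                        return dataDir;
                }
            }
            return null;
        }

        public static void main(String[] args)
        {
            File sstableDir = new File("/var/lib/cassandra/data2/ks1/tbl1-0123456789abcdef");
            System.out.println(dataDirectoryFor(sstableDir));   // prints /var/lib/cassandra/data2
        }
    }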
diff --git a/src/java/org/apache/cassandra/db/IMutation.java b/src/java/org/apache/cassandra/db/IMutation.java
index aad35c3..c734e16 100644
--- a/src/java/org/apache/cassandra/db/IMutation.java
+++ b/src/java/org/apache/cassandra/db/IMutation.java
@@ -17,7 +17,6 @@
  */
 package org.apache.cassandra.db;
 
-import java.nio.ByteBuffer;
 import java.util.Collection;
 import java.util.UUID;
 
@@ -25,6 +24,7 @@
 
 public interface IMutation
 {
+    public void apply();
     public String getKeyspaceName();
     public Collection<UUID> getColumnFamilyIds();
     public DecoratedKey key();
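
With apply() promoted onto the interface, callers can trigger any IMutation implementation (plain or counter mutations) without downcasting. A hypothetical helper, purely for illustration and not part of this patch:

// Illustrative only: apply a mixed batch of mutations through the interface.
static void applyAll(Iterable<? extends IMutation> mutations)
{
    for (IMutation mutation : mutations)
        mutation.apply();
}
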
diff --git a/src/java/org/apache/cassandra/db/Keyspace.java b/src/java/org/apache/cassandra/db/Keyspace.java
index bcf1d24..6e44308 100644
--- a/src/java/org/apache/cassandra/db/Keyspace.java
+++ b/src/java/org/apache/cassandra/db/Keyspace.java
@@ -26,12 +26,14 @@
 
 import com.google.common.base.Function;
 import com.google.common.collect.Iterables;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 import org.apache.cassandra.concurrent.Stage;
 import org.apache.cassandra.concurrent.StageManager;
 import org.apache.cassandra.config.*;
 import org.apache.cassandra.db.commitlog.CommitLog;
-import org.apache.cassandra.db.commitlog.ReplayPosition;
+import org.apache.cassandra.db.commitlog.CommitLogPosition;
 import org.apache.cassandra.db.compaction.CompactionManager;
 import org.apache.cassandra.db.lifecycle.SSTableSet;
 import org.apache.cassandra.db.partitions.PartitionUpdate;
@@ -47,12 +49,8 @@
 import org.apache.cassandra.schema.KeyspaceMetadata;
 import org.apache.cassandra.service.StorageService;
 import org.apache.cassandra.tracing.Tracing;
-import org.apache.cassandra.utils.ByteBufferUtil;
-import org.apache.cassandra.utils.FBUtilities;
-import org.apache.cassandra.utils.JVMStabilityInspector;
+import org.apache.cassandra.utils.*;
 import org.apache.cassandra.utils.concurrent.OpOrder;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
 
 /**
  * It represents a Keyspace.
@@ -218,9 +216,10 @@
      *
      * @param snapshotName     the tag associated with the name of the snapshot.  This value may not be null
      * @param columnFamilyName the column family to snapshot or all on null
+     * @param skipFlush        if {@code true}, skip the blocking memtable flush before taking the snapshot
      * @throws IOException if the column family doesn't exist
      */
-    public void snapshot(String snapshotName, String columnFamilyName) throws IOException
+    public void snapshot(String snapshotName, String columnFamilyName, boolean skipFlush) throws IOException
     {
         assert snapshotName != null;
         boolean tookSnapShot = false;
@@ -229,7 +228,7 @@
             if (columnFamilyName == null || cfStore.name.equals(columnFamilyName))
             {
                 tookSnapShot = true;
-                cfStore.snapshot(snapshotName);
+                cfStore.snapshot(snapshotName, skipFlush);
             }
         }
 
@@ -238,6 +237,19 @@
     }
 
     /**
+     * Take a snapshot of the specified column family, or of the entire set of column families
+     * if columnFamilyName is null.
+     *
+     * @param snapshotName     the tag associated with the name of the snapshot.  This value may not be null
+     * @param columnFamilyName the column family to snapshot or all on null
+     * @throws IOException if the column family doesn't exist
+     */
+    public void snapshot(String snapshotName, String columnFamilyName) throws IOException
+    {
+        snapshot(snapshotName, columnFamilyName, false);
+    }
+
+    /**
      * @param clientSuppliedName may be null.
      * @return the name of the snapshot
      */
@@ -412,59 +424,73 @@
         if (TEST_FAIL_WRITES && metadata.name.equals(TEST_FAIL_WRITES_KS))
             throw new RuntimeException("Testing write failures");
 
-        Lock lock = null;
+        Lock[] locks = null;
         boolean requiresViewUpdate = updateIndexes && viewManager.updatesAffectView(Collections.singleton(mutation), false);
         final CompletableFuture<?> mark = future == null ? new CompletableFuture<>() : future;
 
         if (requiresViewUpdate)
         {
             mutation.viewLockAcquireStart.compareAndSet(0L, System.currentTimeMillis());
-            lock = ViewManager.acquireLockFor(mutation.key().getKey());
 
-            if (lock == null)
+            // the order of lock acquisition doesn't matter (from a deadlock perspective) because we only use tryLock()
+            Collection<UUID> columnFamilyIds = mutation.getColumnFamilyIds();
+            Iterator<UUID> idIterator = columnFamilyIds.iterator();
+            locks = new Lock[columnFamilyIds.size()];
+
+            for (int i = 0; i < columnFamilyIds.size(); i++)
             {
-                // avoid throwing a WTE during commitlog replay
-                if (!isClReplay && (System.currentTimeMillis() - mutation.createdAt) > DatabaseDescriptor.getWriteRpcTimeout())
+                UUID cfid = idIterator.next();
+                int lockKey = Objects.hash(mutation.key().getKey(), cfid);
+                Lock lock = ViewManager.acquireLockFor(lockKey);
+                if (lock == null)
                 {
-                    logger.trace("Could not acquire lock for {}", ByteBufferUtil.bytesToHex(mutation.key().getKey()));
-                    Tracing.trace("Could not acquire MV lock");
-                    if (future != null)
-                        future.completeExceptionally(new WriteTimeoutException(WriteType.VIEW, ConsistencyLevel.LOCAL_ONE, 0, 1));
+                    // we will either time out or retry, so release all acquired locks
+                    for (int j = 0; j < i; j++)
+                        locks[j].unlock();
+
+                    // avoid throwing a WTE during commitlog replay
+                    if (!isClReplay && (System.currentTimeMillis() - mutation.createdAt) > DatabaseDescriptor.getWriteRpcTimeout())
+                    {
+                        logger.trace("Could not acquire lock for {} and table {}", ByteBufferUtil.bytesToHex(mutation.key().getKey()), columnFamilyStores.get(cfid).name);
+                        Tracing.trace("Could not acquire MV lock");
+                        if (future != null)
+                            future.completeExceptionally(new WriteTimeoutException(WriteType.VIEW, ConsistencyLevel.LOCAL_ONE, 0, 1));
+                        else
+                            throw new WriteTimeoutException(WriteType.VIEW, ConsistencyLevel.LOCAL_ONE, 0, 1);
+                    }
                     else
-                        throw new WriteTimeoutException(WriteType.VIEW, ConsistencyLevel.LOCAL_ONE, 0, 1);
+                    {
+                        // This view update can't happen right now, so rather than keep this thread busy
+                        // we will re-apply ourselves to the queue and try again later
+                        StageManager.getStage(Stage.MUTATION).execute(() ->
+                            apply(mutation, writeCommitLog, true, isClReplay, mark)
+                        );
+
+                        return mark;
+                    }
                 }
                 else
                 {
-                    //This view update can't happen right now. so rather than keep this thread busy
-                    // we will re-apply ourself to the queue and try again later
-                    StageManager.getStage(Stage.MUTATION).execute(() ->
-                        apply(mutation, writeCommitLog, true, isClReplay, mark)
-                    );
-
-                    return mark;
+                    locks[i] = lock;
                 }
             }
-            else
+
+            long acquireTime = System.currentTimeMillis() - mutation.viewLockAcquireStart.get();
+            if (!isClReplay)
             {
-                long acquireTime = System.currentTimeMillis() - mutation.viewLockAcquireStart.get();
-                if (!isClReplay)
-                {
-                    for(UUID cfid : mutation.getColumnFamilyIds())
-                    {
-                        columnFamilyStores.get(cfid).metric.viewLockAcquireTime.update(acquireTime, TimeUnit.MILLISECONDS);
-                    }
-                }
+                for(UUID cfid : columnFamilyIds)
+                    columnFamilyStores.get(cfid).metric.viewLockAcquireTime.update(acquireTime, TimeUnit.MILLISECONDS);
             }
         }
         int nowInSec = FBUtilities.nowInSeconds();
         try (OpOrder.Group opGroup = writeOrder.start())
         {
             // write the mutation to the commitlog and memtables
-            ReplayPosition replayPosition = null;
+            CommitLogPosition commitLogPosition = null;
             if (writeCommitLog)
             {
                 Tracing.trace("Appending to commitlog");
-                replayPosition = CommitLog.instance.add(mutation);
+                commitLogPosition = CommitLog.instance.add(mutation);
             }
 
             for (PartitionUpdate upd : mutation.getPartitionUpdates())
@@ -497,7 +523,7 @@
                 UpdateTransaction indexTransaction = updateIndexes
                                                      ? cfs.indexManager.newUpdateTransaction(upd, opGroup, nowInSec)
                                                      : UpdateTransaction.NO_OP;
-                cfs.apply(upd, indexTransaction, opGroup, replayPosition);
+                cfs.apply(upd, indexTransaction, opGroup, commitLogPosition);
                 if (requiresViewUpdate)
                     baseComplete.set(System.currentTimeMillis());
             }
@@ -506,8 +532,11 @@
         }
         finally
         {
-            if (lock != null)
-                lock.unlock();
+            if (locks != null)
+            {
+                for (Lock lock : locks)
+                    lock.unlock();
+            }
         }
     }
 
@@ -530,9 +559,9 @@
                                                                                       FBUtilities.nowInSeconds(),
                                                                                       key);
 
-        try (OpOrder.Group writeGroup = cfs.keyspace.writeOrder.start();
-             OpOrder.Group readGroup = cfs.readOrdering.start();
-             UnfilteredRowIterator partition = cmd.queryMemtableAndDisk(cfs, readGroup))
+        try (ReadExecutionController controller = cmd.executionController();
+             UnfilteredRowIterator partition = cmd.queryMemtableAndDisk(cfs, controller);
+             OpOrder.Group writeGroup = cfs.keyspace.writeOrder.start())
         {
             cfs.indexManager.indexPartition(partition, writeGroup, indexes, cmd.nowInSec());
         }
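
The materialized-view locking above changes from one lock per partition key to one lock per (partition key, table id) pair, keyed by Objects.hash(mutation.key().getKey(), cfid). Because only tryLock() is used, the acquisition order cannot deadlock; if any lock in the set cannot be taken, everything acquired so far is released and the mutation either times out or is re-queued on the MUTATION stage. A condensed sketch of that pattern, assuming something like Guava's Striped<Lock> behind ViewManager.acquireLockFor (an assumption; the names below are illustrative):

import java.nio.ByteBuffer;
import java.util.Collection;
import java.util.Objects;
import java.util.UUID;
import java.util.concurrent.locks.Lock;

import com.google.common.util.concurrent.Striped;

final class ViewLockSketch
{
    private static final Striped<Lock> LOCKS = Striped.lazyWeakLock(1024);

    // Returns all locks on success, or null if any single lock could not be acquired.
    static Lock[] tryAcquire(ByteBuffer partitionKey, Collection<UUID> tableIds)
    {
        Lock[] locks = new Lock[tableIds.size()];
        int i = 0;
        for (UUID tableId : tableIds)
        {
            Lock lock = LOCKS.get(Objects.hash(partitionKey, tableId));
            if (!lock.tryLock())
            {
                for (int j = 0; j < i; j++)   // release everything acquired so far
                    locks[j].unlock();
                return null;                  // caller re-queues the mutation or times out
            }
            locks[i++] = lock;
        }
        return locks;
    }
}
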
diff --git a/src/java/org/apache/cassandra/db/LegacyLayout.java b/src/java/org/apache/cassandra/db/LegacyLayout.java
index 3feb1f4..b5190ad 100644
--- a/src/java/org/apache/cassandra/db/LegacyLayout.java
+++ b/src/java/org/apache/cassandra/db/LegacyLayout.java
@@ -108,7 +108,7 @@
         if (metadata.isSuper())
         {
             assert superColumnName != null;
-            return decodeForSuperColumn(metadata, new Clustering(superColumnName), cellname);
+            return decodeForSuperColumn(metadata, Clustering.make(superColumnName), cellname);
         }
 
         assert superColumnName == null;
@@ -166,7 +166,7 @@
         {
             // If it's a compact table, it means the column is in fact a "dynamic" one
             if (metadata.isCompactTable())
-                return new LegacyCellName(new Clustering(column), metadata.compactValueColumn(), null);
+                return new LegacyCellName(Clustering.make(column), metadata.compactValueColumn(), null);
 
             if (def == null)
                 throw new UnknownColumnException(metadata, column);
@@ -197,26 +197,26 @@
         List<CompositeType.CompositeComponent> prefix = components.size() <= metadata.comparator.size()
                                                       ? components
                                                       : components.subList(0, metadata.comparator.size());
-        Slice.Bound.Kind boundKind;
+        ClusteringPrefix.Kind boundKind;
         if (isStart)
         {
             if (components.get(components.size() - 1).eoc > 0)
-                boundKind = Slice.Bound.Kind.EXCL_START_BOUND;
+                boundKind = ClusteringPrefix.Kind.EXCL_START_BOUND;
             else
-                boundKind = Slice.Bound.Kind.INCL_START_BOUND;
+                boundKind = ClusteringPrefix.Kind.INCL_START_BOUND;
         }
         else
         {
             if (components.get(components.size() - 1).eoc < 0)
-                boundKind = Slice.Bound.Kind.EXCL_END_BOUND;
+                boundKind = ClusteringPrefix.Kind.EXCL_END_BOUND;
             else
-                boundKind = Slice.Bound.Kind.INCL_END_BOUND;
+                boundKind = ClusteringPrefix.Kind.INCL_END_BOUND;
         }
 
         ByteBuffer[] prefixValues = new ByteBuffer[prefix.size()];
         for (int i = 0; i < prefix.size(); i++)
             prefixValues[i] = prefix.get(i).value;
-        Slice.Bound sb = Slice.Bound.create(boundKind, prefixValues);
+        ClusteringBound sb = ClusteringBound.create(boundKind, prefixValues);
 
         ColumnDefinition collectionName = components.size() == metadata.comparator.size() + 1
                                         ? metadata.getColumnDefinition(components.get(metadata.comparator.size()).value)
@@ -224,9 +224,9 @@
         return new LegacyBound(sb, metadata.isCompound() && CompositeType.isStaticName(bound), collectionName);
     }
 
-    public static ByteBuffer encodeBound(CFMetaData metadata, Slice.Bound bound, boolean isStart)
+    public static ByteBuffer encodeBound(CFMetaData metadata, ClusteringBound bound, boolean isStart)
     {
-        if (bound == Slice.Bound.BOTTOM || bound == Slice.Bound.TOP || metadata.comparator.size() == 0)
+        if (bound == ClusteringBound.BOTTOM || bound == ClusteringBound.TOP || metadata.comparator.size() == 0)
             return ByteBufferUtil.EMPTY_BYTE_BUFFER;
 
         ClusteringPrefix clustering = bound.clustering();
@@ -303,7 +303,7 @@
                                     ? CompositeType.splitName(value)
                                     : Collections.singletonList(value);
 
-        return new Clustering(components.subList(0, Math.min(csize, components.size())).toArray(new ByteBuffer[csize]));
+        return Clustering.make(components.subList(0, Math.min(csize, components.size())).toArray(new ByteBuffer[csize]));
     }
 
     public static ByteBuffer encodeClustering(CFMetaData metadata, ClusteringPrefix clustering)
@@ -699,16 +699,18 @@
 
             protected LegacyCell computeNext()
             {
-                if (currentRow.hasNext())
-                    return currentRow.next();
+                while (true)
+                {
+                    if (currentRow.hasNext())
+                        return currentRow.next();
 
-                if (!iterator.hasNext())
-                    return endOfData();
+                    if (!iterator.hasNext())
+                        return endOfData();
 
-                Pair<LegacyRangeTombstoneList, Iterator<LegacyCell>> row = fromRow(metadata, iterator.next());
-                deletions.addAll(row.left);
-                currentRow = row.right;
-                return computeNext();
+                    Pair<LegacyRangeTombstoneList, Iterator<LegacyCell>> row = fromRow(metadata, iterator.next());
+                    deletions.addAll(row.left);
+                    currentRow = row.right;
+                }
             }
         };
 
@@ -724,8 +726,8 @@
         if (!row.deletion().isLive())
         {
             Clustering clustering = row.clustering();
-            Slice.Bound startBound = Slice.Bound.inclusiveStartOf(clustering);
-            Slice.Bound endBound = Slice.Bound.inclusiveEndOf(clustering);
+            ClusteringBound startBound = ClusteringBound.inclusiveStartOf(clustering);
+            ClusteringBound endBound = ClusteringBound.inclusiveEndOf(clustering);
 
             LegacyBound start = new LegacyLayout.LegacyBound(startBound, false, null);
             LegacyBound end = new LegacyLayout.LegacyBound(endBound, false, null);
@@ -744,8 +746,8 @@
             {
                 Clustering clustering = row.clustering();
 
-                Slice.Bound startBound = Slice.Bound.inclusiveStartOf(clustering);
-                Slice.Bound endBound = Slice.Bound.inclusiveEndOf(clustering);
+                ClusteringBound startBound = ClusteringBound.inclusiveStartOf(clustering);
+                ClusteringBound endBound = ClusteringBound.inclusiveEndOf(clustering);
 
                 LegacyLayout.LegacyBound start = new LegacyLayout.LegacyBound(startBound, col.isStatic(), col);
                 LegacyLayout.LegacyBound end = new LegacyLayout.LegacyBound(endBound, col.isStatic(), col);
@@ -920,9 +922,9 @@
             // we can have collection deletion and we want those to sort properly just before the column they
             // delete, not before the whole row.
             // We also want to special case static so they sort before any non-static. Note in particular that
-            // this special casing is important in the case of one of the Atom being Slice.Bound.BOTTOM: we want
+            // this special casing is important in the case of one of the Atoms being Bound.BOTTOM: we want
             // it to sort after the static as we deal with static first in toUnfilteredAtomIterator and having
-            // Slice.Bound.BOTTOM first would mess that up (note that static deletion is handled through a specific
+            // Bound.BOTTOM first would mess that up (note that static deletion is handled through a specific
             // static tombstone, see LegacyDeletionInfo.add()).
             if (o1.isStatic() != o2.isStatic())
                 return o1.isStatic() ? -1 : 1;
@@ -1169,7 +1171,7 @@
             {
                 // It's the row marker
                 assert !cell.value.hasRemaining();
-                builder.addPrimaryKeyLivenessInfo(LivenessInfo.create(cell.timestamp, cell.ttl, cell.localDeletionTime));
+                builder.addPrimaryKeyLivenessInfo(LivenessInfo.withExpirationTime(cell.timestamp, cell.ttl, cell.localDeletionTime));
             }
             else
             {
@@ -1347,14 +1349,14 @@
 
     public static class LegacyBound
     {
-        public static final LegacyBound BOTTOM = new LegacyBound(Slice.Bound.BOTTOM, false, null);
-        public static final LegacyBound TOP = new LegacyBound(Slice.Bound.TOP, false, null);
+        public static final LegacyBound BOTTOM = new LegacyBound(ClusteringBound.BOTTOM, false, null);
+        public static final LegacyBound TOP = new LegacyBound(ClusteringBound.TOP, false, null);
 
-        public final Slice.Bound bound;
+        public final ClusteringBound bound;
         public final boolean isStatic;
         public final ColumnDefinition collectionName;
 
-        public LegacyBound(Slice.Bound bound, boolean isStatic, ColumnDefinition collectionName)
+        public LegacyBound(ClusteringBound bound, boolean isStatic, ColumnDefinition collectionName)
         {
             this.bound = bound;
             this.isStatic = isStatic;
@@ -1370,7 +1372,7 @@
             ByteBuffer[] values = new ByteBuffer[bound.size()];
             for (int i = 0; i < bound.size(); i++)
                 values[i] = bound.get(i);
-            return new Clustering(values);
+            return Clustering.make(values);
         }
 
         @Override
@@ -1659,7 +1661,7 @@
             deletionInfo.add(topLevel);
         }
 
-        private static Slice.Bound staticBound(CFMetaData metadata, boolean isStart)
+        private static ClusteringBound staticBound(CFMetaData metadata, boolean isStart)
         {
             // In pre-3.0 nodes, static row started by a clustering with all empty values so we
             // preserve that here. Note that in practice, it doesn't really matter since the rest
@@ -1668,8 +1670,8 @@
             for (int i = 0; i < values.length; i++)
                 values[i] = ByteBufferUtil.EMPTY_BYTE_BUFFER;
             return isStart
-                 ? Slice.Bound.inclusiveStartOf(values)
-                 : Slice.Bound.inclusiveEndOf(values);
+                 ? ClusteringBound.inclusiveStartOf(values)
+                 : ClusteringBound.inclusiveEndOf(values);
         }
 
         public void add(CFMetaData metadata, LegacyRangeTombstone tombstone)
@@ -1791,7 +1793,7 @@
 
     /**
      * Almost an entire copy of RangeTombstoneList from C* 2.1.  The main difference is that LegacyBoundComparator
-     * is used in place of Comparator<Composite> (because Composite doesn't exist any more).
+     * is used in place of {@code Comparator<Composite>} (because Composite doesn't exist any more).
      *
      * This class is needed to allow us to convert single-row deletions and complex deletions into range tombstones
      * and properly merge them into the normal set of range tombstones.
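
Two points worth noting in LegacyLayout: the legacy end-of-component byte still decides which kind of bound a decoded slice gets (only the type names change, from Slice.Bound.Kind to ClusteringPrefix.Kind), and computeNext() now loops instead of recursing, presumably so a long run of rows that contribute no cells cannot build up a deep call stack. A compact restatement of the eoc mapping used in decodeBound above (a sketch that simply mirrors those branches):

// eoc > 0 on a start bound means "strictly after" the prefix; eoc < 0 on an end bound
// means "strictly before" it; everything else stays inclusive.
static ClusteringPrefix.Kind boundKind(boolean isStart, int eoc)
{
    if (isStart)
        return eoc > 0 ? ClusteringPrefix.Kind.EXCL_START_BOUND
                       : ClusteringPrefix.Kind.INCL_START_BOUND;
    return eoc < 0 ? ClusteringPrefix.Kind.EXCL_END_BOUND
                   : ClusteringPrefix.Kind.INCL_END_BOUND;
}
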
diff --git a/src/java/org/apache/cassandra/db/LivenessInfo.java b/src/java/org/apache/cassandra/db/LivenessInfo.java
index 411fb9a..71f3101 100644
--- a/src/java/org/apache/cassandra/db/LivenessInfo.java
+++ b/src/java/org/apache/cassandra/db/LivenessInfo.java
@@ -52,12 +52,8 @@
         this.timestamp = timestamp;
     }
 
-    public static LivenessInfo create(CFMetaData metadata, long timestamp, int nowInSec)
+    public static LivenessInfo create(long timestamp, int nowInSec)
     {
-        int defaultTTL = metadata.params.defaultTimeToLive;
-        if (defaultTTL != NO_TTL)
-            return expiring(timestamp, defaultTTL, nowInSec);
-
         return new LivenessInfo(timestamp);
     }
 
@@ -66,16 +62,16 @@
         return new ExpiringLivenessInfo(timestamp, ttl, nowInSec + ttl);
     }
 
-    public static LivenessInfo create(CFMetaData metadata, long timestamp, int ttl, int nowInSec)
+    public static LivenessInfo create(long timestamp, int ttl, int nowInSec)
     {
         return ttl == NO_TTL
-             ? create(metadata, timestamp, nowInSec)
+             ? create(timestamp, nowInSec)
              : expiring(timestamp, ttl, nowInSec);
     }
 
-    // Note that this ctor ignores the default table ttl and takes the expiration time, not the current time.
+    // Note that this ctor takes the expiration time, not the current time.
     // Use when you know that's what you want.
-    public static LivenessInfo create(long timestamp, int ttl, int localExpirationTime)
+    public static LivenessInfo withExpirationTime(long timestamp, int ttl, int localExpirationTime)
     {
         return ttl == NO_TTL ? new LivenessInfo(timestamp) : new ExpiringLivenessInfo(timestamp, ttl, localExpirationTime);
     }
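
The create(long, int, int) overload is renamed to withExpirationTime to make the two three-argument forms harder to confuse: create(timestamp, ttl, nowInSec) computes the local expiration itself as nowInSec + ttl, while withExpirationTime(timestamp, ttl, localExpirationTime) stores the given expiration as-is (and, with the CFMetaData parameter gone, create no longer consults the table's default TTL). A small sketch of equivalent calls, using arbitrary illustrative values and the LivenessInfo class from this file:

long timestamp = 1_470_000_000_000L; // arbitrary write timestamp
int ttl = 600;                       // seconds
int nowInSec = 1_470_000_000;

// Computes the local expiration time itself: nowInSec + ttl.
LivenessInfo fromNow = LivenessInfo.create(timestamp, ttl, nowInSec);

// Takes an already-computed local expiration time, e.g. one read back from an sstable.
LivenessInfo fromDisk = LivenessInfo.withExpirationTime(timestamp, ttl, nowInSec + ttl);
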
diff --git a/src/java/org/apache/cassandra/db/Memtable.java b/src/java/org/apache/cassandra/db/Memtable.java
index 93dc5af..7a46d8a 100644
--- a/src/java/org/apache/cassandra/db/Memtable.java
+++ b/src/java/org/apache/cassandra/db/Memtable.java
@@ -17,7 +17,6 @@
  */
 package org.apache.cassandra.db;
 
-import java.io.File;
 import java.util.*;
 import java.util.concurrent.*;
 import java.util.concurrent.atomic.AtomicBoolean;
@@ -25,6 +24,7 @@
 import java.util.concurrent.atomic.AtomicReference;
 
 import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Throwables;
 
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
@@ -33,8 +33,7 @@
 import org.apache.cassandra.config.ColumnDefinition;
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.db.commitlog.CommitLog;
-import org.apache.cassandra.db.commitlog.ReplayPosition;
-import org.apache.cassandra.db.compaction.OperationType;
+import org.apache.cassandra.db.commitlog.CommitLogPosition;
 import org.apache.cassandra.db.filter.ClusteringIndexFilter;
 import org.apache.cassandra.db.filter.ColumnFilter;
 import org.apache.cassandra.db.lifecycle.LifecycleTransaction;
@@ -45,14 +44,13 @@
 import org.apache.cassandra.dht.Murmur3Partitioner.LongToken;
 import org.apache.cassandra.index.transactions.UpdateTransaction;
 import org.apache.cassandra.io.sstable.Descriptor;
-import org.apache.cassandra.io.sstable.SSTableTxnWriter;
-import org.apache.cassandra.io.sstable.format.SSTableReader;
+import org.apache.cassandra.io.sstable.SSTableMultiWriter;
 import org.apache.cassandra.io.sstable.metadata.MetadataCollector;
-import org.apache.cassandra.io.util.DiskAwareRunnable;
 import org.apache.cassandra.service.ActiveRepairService;
 import org.apache.cassandra.utils.ByteBufferUtil;
 import org.apache.cassandra.utils.FBUtilities;
 import org.apache.cassandra.utils.ObjectSizes;
+import org.apache.cassandra.service.StorageService;
 import org.apache.cassandra.utils.concurrent.OpOrder;
 import org.apache.cassandra.utils.memory.MemtableAllocator;
 import org.apache.cassandra.utils.memory.MemtablePool;
@@ -61,7 +59,7 @@
 {
     private static final Logger logger = LoggerFactory.getLogger(Memtable.class);
 
-    static final MemtablePool MEMORY_POOL = DatabaseDescriptor.getMemtableAllocatorPool();
+    public static final MemtablePool MEMORY_POOL = DatabaseDescriptor.getMemtableAllocatorPool();
     private static final int ROW_OVERHEAD_HEAP_SIZE = estimateRowOverhead(Integer.parseInt(System.getProperty("cassandra.memtable_row_overhead_computation_step", "100000")));
 
     private final MemtableAllocator allocator;
@@ -70,23 +68,24 @@
 
     // the write barrier for directing writes to this memtable during a switch
     private volatile OpOrder.Barrier writeBarrier;
-    // the precise upper bound of ReplayPosition owned by this memtable
-    private volatile AtomicReference<ReplayPosition> commitLogUpperBound;
-    // the precise lower bound of ReplayPosition owned by this memtable; equal to its predecessor's commitLogUpperBound
-    private AtomicReference<ReplayPosition> commitLogLowerBound;
-    // the approximate lower bound by this memtable; must be <= commitLogLowerBound once our predecessor
+    // the precise upper bound of CommitLogPosition owned by this memtable
+    private volatile AtomicReference<CommitLogPosition> commitLogUpperBound;
+    // the precise lower bound of CommitLogPosition owned by this memtable; equal to its predecessor's commitLogUpperBound
+    private AtomicReference<CommitLogPosition> commitLogLowerBound;
+
+    // The approximate lower bound owned by this memtable; must be <= commitLogLowerBound once our predecessor
     // has been finalised, and this is enforced in the ColumnFamilyStore.setCommitLogUpperBound
-    private final ReplayPosition approximateCommitLogLowerBound = CommitLog.instance.getContext();
+    private final CommitLogPosition approximateCommitLogLowerBound = CommitLog.instance.getCurrentPosition();
 
     public int compareTo(Memtable that)
     {
         return this.approximateCommitLogLowerBound.compareTo(that.approximateCommitLogLowerBound);
     }
 
-    public static final class LastReplayPosition extends ReplayPosition
+    public static final class LastCommitLogPosition extends CommitLogPosition
     {
-        public LastReplayPosition(ReplayPosition copy) {
-            super(copy.segment, copy.position);
+        public LastCommitLogPosition(CommitLogPosition copy) {
+            super(copy.segmentId, copy.position);
         }
     }
 
@@ -109,7 +108,7 @@
     private final StatsCollector statsCollector = new StatsCollector();
 
     // only to be used by init(), to setup the very first memtable for the cfs
-    public Memtable(AtomicReference<ReplayPosition> commitLogLowerBound, ColumnFamilyStore cfs)
+    public Memtable(AtomicReference<CommitLogPosition> commitLogLowerBound, ColumnFamilyStore cfs)
     {
         this.cfs = cfs;
         this.commitLogLowerBound = commitLogLowerBound;
@@ -145,10 +144,10 @@
     }
 
     @VisibleForTesting
-    public void setDiscarding(OpOrder.Barrier writeBarrier, AtomicReference<ReplayPosition> lastReplayPosition)
+    public void setDiscarding(OpOrder.Barrier writeBarrier, AtomicReference<CommitLogPosition> commitLogUpperBound)
     {
         assert this.writeBarrier == null;
-        this.commitLogUpperBound = lastReplayPosition;
+        this.commitLogUpperBound = commitLogUpperBound;
         this.writeBarrier = writeBarrier;
         allocator.setDiscarding();
     }
@@ -159,7 +158,7 @@
     }
 
     // decide if this memtable should take the write, or if it should go to the next memtable
-    public boolean accepts(OpOrder.Group opGroup, ReplayPosition replayPosition)
+    public boolean accepts(OpOrder.Group opGroup, CommitLogPosition commitLogPosition)
     {
         // if the barrier hasn't been set yet, then this memtable is still taking ALL writes
         OpOrder.Barrier barrier = this.writeBarrier;
@@ -169,7 +168,7 @@
         if (!barrier.isAfter(opGroup))
             return false;
         // if we aren't durable we are directed only by the barrier
-        if (replayPosition == null)
+        if (commitLogPosition == null)
             return true;
         while (true)
         {
@@ -178,17 +177,17 @@
             // its current value and ours; if it HAS been finalised, we simply accept its judgement
             // this permits us to coordinate a safe boundary, as the boundary choice is made
             // atomically wrt our max() maintenance, so an operation cannot sneak into the past
-            ReplayPosition currentLast = commitLogUpperBound.get();
-            if (currentLast instanceof LastReplayPosition)
-                return currentLast.compareTo(replayPosition) >= 0;
-            if (currentLast != null && currentLast.compareTo(replayPosition) >= 0)
+            CommitLogPosition currentLast = commitLogUpperBound.get();
+            if (currentLast instanceof LastCommitLogPosition)
+                return currentLast.compareTo(commitLogPosition) >= 0;
+            if (currentLast != null && currentLast.compareTo(commitLogPosition) >= 0)
                 return true;
-            if (commitLogUpperBound.compareAndSet(currentLast, replayPosition))
+            if (commitLogUpperBound.compareAndSet(currentLast, commitLogPosition))
                 return true;
         }
     }
 
-    public ReplayPosition getCommitLogLowerBound()
+    public CommitLogPosition getCommitLogLowerBound()
     {
         return commitLogLowerBound.get();
     }
@@ -203,7 +202,7 @@
         return partitions.isEmpty();
     }
 
-    public boolean mayContainDataBefore(ReplayPosition position)
+    public boolean mayContainDataBefore(CommitLogPosition position)
     {
         return approximateCommitLogLowerBound.compareTo(position) < 0;
     }
@@ -221,7 +220,7 @@
      * Should only be called by ColumnFamilyStore.apply via Keyspace.apply, which supplies the appropriate
      * OpOrdering.
      *
-     * replayPosition should only be null if this is a secondary index, in which case it is *expected* to be null
+     * commitLogSegmentPosition should only be null if this is a secondary index, in which case it is *expected* to be null
      */
     long put(PartitionUpdate update, UpdateTransaction indexer, OpOrder.Group opGroup)
     {
@@ -263,6 +262,48 @@
         return partitions.size();
     }
 
+    public List<FlushRunnable> flushRunnables(LifecycleTransaction txn)
+    {
+        List<Range<Token>> localRanges = Range.sort(StorageService.instance.getLocalRanges(cfs.keyspace.getName()));
+
+        if (!cfs.getPartitioner().splitter().isPresent() || localRanges.isEmpty())
+            return Collections.singletonList(new FlushRunnable(txn));
+
+        return createFlushRunnables(localRanges, txn);
+    }
+
+    private List<FlushRunnable> createFlushRunnables(List<Range<Token>> localRanges, LifecycleTransaction txn)
+    {
+        assert cfs.getPartitioner().splitter().isPresent();
+
+        Directories.DataDirectory[] locations = cfs.getDirectories().getWriteableLocations();
+        List<PartitionPosition> boundaries = StorageService.getDiskBoundaries(localRanges, cfs.getPartitioner(), locations);
+        List<FlushRunnable> runnables = new ArrayList<>(boundaries.size());
+        PartitionPosition rangeStart = cfs.getPartitioner().getMinimumToken().minKeyBound();
+        try
+        {
+            for (int i = 0; i < boundaries.size(); i++)
+            {
+                PartitionPosition t = boundaries.get(i);
+                runnables.add(new FlushRunnable(rangeStart, t, locations[i], txn));
+                rangeStart = t;
+            }
+            return runnables;
+        }
+        catch (Throwable e)
+        {
+            throw Throwables.propagate(abortRunnables(runnables, e));
+        }
+    }
+
+    public Throwable abortRunnables(List<FlushRunnable> runnables, Throwable t)
+    {
+        if (runnables != null)
+            for (FlushRunnable runnable : runnables)
+                t = runnable.writer.abort(t);
+        return t;
+    }
+
     public String toString()
     {
         return String.format("Memtable-%s@%s(%s serialized bytes, %s ops, %.0f%%/%.0f%% of on/off-heap limit)",
@@ -315,51 +356,73 @@
         return partitions.get(key);
     }
 
-    public Collection<SSTableReader> flush()
-    {
-        long estimatedSize = estimatedSize();
-        Directories.DataDirectory dataDirectory = cfs.getDirectories().getWriteableLocation(estimatedSize);
-        if (dataDirectory == null)
-            throw new RuntimeException("Insufficient disk space to write " + estimatedSize + " bytes");
-        File sstableDirectory = cfs.getDirectories().getLocationForDisk(dataDirectory);
-        assert sstableDirectory != null : "Flush task is not bound to any disk";
-        return writeSortedContents(sstableDirectory);
-    }
-
     public long getMinTimestamp()
     {
         return minTimestamp;
     }
 
-    private long estimatedSize()
+    class FlushRunnable implements Callable<SSTableMultiWriter>
     {
-        long keySize = 0;
-        for (PartitionPosition key : partitions.keySet())
+        private final long estimatedSize;
+        private final ConcurrentNavigableMap<PartitionPosition, AtomicBTreePartition> toFlush;
+
+        private final boolean isBatchLogTable;
+        private final SSTableMultiWriter writer;
+
+        // keeping these to be able to log what we are actually flushing
+        private final PartitionPosition from;
+        private final PartitionPosition to;
+
+        FlushRunnable(PartitionPosition from, PartitionPosition to, Directories.DataDirectory flushLocation, LifecycleTransaction txn)
         {
-            //  make sure we don't write non-sensical keys
-            assert key instanceof DecoratedKey;
-            keySize += ((DecoratedKey)key).getKey().remaining();
+            this(partitions.subMap(from, to), flushLocation, from, to, txn);
         }
-        return (long) ((keySize // index entries
-                        + keySize // keys in data file
-                        + liveDataSize.get()) // data
-                       * 1.2); // bloom filter and row index overhead
-    }
 
-    private Collection<SSTableReader> writeSortedContents(File sstableDirectory)
-    {
-        boolean isBatchLogTable = cfs.name.equals(SystemKeyspace.BATCHES) && cfs.keyspace.getName().equals(SystemKeyspace.NAME);
-
-        logger.debug("Writing {}", Memtable.this.toString());
-
-        Collection<SSTableReader> ssTables;
-        try (SSTableTxnWriter writer = createFlushWriter(cfs.getSSTablePath(sstableDirectory), columnsCollector.get(), statsCollector.get()))
+        FlushRunnable(LifecycleTransaction txn)
         {
+            this(partitions, null, null, null, txn);
+        }
+
+        FlushRunnable(ConcurrentNavigableMap<PartitionPosition, AtomicBTreePartition> toFlush, Directories.DataDirectory flushLocation, PartitionPosition from, PartitionPosition to, LifecycleTransaction txn)
+        {
+            this.toFlush = toFlush;
+            this.from = from;
+            this.to = to;
+            long keySize = 0;
+            for (PartitionPosition key : toFlush.keySet())
+            {
+                //  make sure we don't write nonsensical keys
+                assert key instanceof DecoratedKey;
+                keySize += ((DecoratedKey) key).getKey().remaining();
+            }
+            estimatedSize = (long) ((keySize // index entries
+                                    + keySize // keys in data file
+                                    + liveDataSize.get()) // data
+                                    * 1.2); // bloom filter and row index overhead
+
+            this.isBatchLogTable = cfs.name.equals(SystemKeyspace.BATCHES) && cfs.keyspace.getName().equals(SystemKeyspace.NAME);
+
+            if (flushLocation == null)
+                writer = createFlushWriter(txn, cfs.getSSTablePath(getDirectories().getWriteableLocationAsFile(estimatedSize)), columnsCollector.get(), statsCollector.get());
+            else
+                writer = createFlushWriter(txn, cfs.getSSTablePath(getDirectories().getLocationForDisk(flushLocation)), columnsCollector.get(), statsCollector.get());
+
+        }
+
+        protected Directories getDirectories()
+        {
+            return cfs.getDirectories();
+        }
+
+        private void writeSortedContents()
+        {
+            logger.debug("Writing {}, flushed range = ({}, {}]", Memtable.this.toString(), from, to);
+
             boolean trackContention = logger.isTraceEnabled();
             int heavilyContendedRowCount = 0;
             // (we can't clear out the map as-we-go to free up memory,
             //  since the memtable is being used for queries in the "pending flush" category)
-            for (AtomicBTreePartition partition : partitions.values())
+            for (AtomicBTreePartition partition : toFlush.values())
             {
                 // Each batchlog partition is a separate entry in the log. And for an entry, we only do 2
                 // operations: 1) we insert the entry and 2) we delete it. Further, BL data is strictly local,
@@ -381,59 +444,39 @@
                 }
             }
 
-            if (writer.getFilePointer() > 0)
-            {
-                logger.debug(String.format("Completed flushing %s (%s) for commitlog position %s",
-                                           writer.getFilename(),
-                                           FBUtilities.prettyPrintMemory(writer.getFilePointer()),
-                                           commitLogUpperBound));
-
-                // sstables should contain non-repaired data.
-                ssTables = writer.finish(true);
-            }
-            else
-            {
-                logger.debug("Completed flushing {}; nothing needed to be retained.  Commitlog position was {}",
-                             writer.getFilename(), commitLogUpperBound);
-                writer.abort();
-                ssTables = Collections.emptyList();
-            }
+            long bytesFlushed = writer.getFilePointer();
+            logger.debug(String.format("Completed flushing %s (%s) for commitlog position %s",
+                                                                              writer.getFilename(),
+                                                                              FBUtilities.prettyPrintMemory(bytesFlushed),
+                                                                              commitLogUpperBound));
+            // Update the metrics
+            cfs.metric.bytesFlushed.inc(bytesFlushed);
 
             if (heavilyContendedRowCount > 0)
-                logger.trace(String.format("High update contention in %d/%d partitions of %s ", heavilyContendedRowCount, partitions.size(), Memtable.this.toString()));
-
-            return ssTables;
+                logger.trace(String.format("High update contention in %d/%d partitions of %s ", heavilyContendedRowCount, toFlush.size(), Memtable.this.toString()));
         }
-    }
 
-    @SuppressWarnings("resource") // log and writer closed by SSTableTxnWriter
-    public SSTableTxnWriter createFlushWriter(String filename,
-                                              PartitionColumns columns,
-                                              EncodingStats stats)
-    {
-        // we operate "offline" here, as we expose the resulting reader consciously when done
-        // (although we may want to modify this behaviour in future, to encapsulate full flush behaviour in LifecycleTransaction)
-        LifecycleTransaction txn = null;
-        try
+        public SSTableMultiWriter createFlushWriter(LifecycleTransaction txn,
+                                                  String filename,
+                                                  PartitionColumns columns,
+                                                  EncodingStats stats)
         {
-            txn = LifecycleTransaction.offline(OperationType.FLUSH);
             MetadataCollector sstableMetadataCollector = new MetadataCollector(cfs.metadata.comparator)
-                                                         .commitLogLowerBound(commitLogLowerBound.get())
-                                                         .commitLogUpperBound(commitLogUpperBound.get());
+                    .commitLogLowerBound(commitLogLowerBound.get())
+                    .commitLogUpperBound(commitLogUpperBound.get());
+            return cfs.createSSTableMultiWriter(Descriptor.fromFilename(filename),
+                                                (long)toFlush.size(),
+                                                ActiveRepairService.UNREPAIRED_SSTABLE,
+                                                sstableMetadataCollector,
+                                                new SerializationHeader(true, cfs.metadata, columns, stats), txn);
 
-            return new SSTableTxnWriter(txn,
-                                        cfs.createSSTableMultiWriter(Descriptor.fromFilename(filename),
-                                                                     (long) partitions.size(),
-                                                                     ActiveRepairService.UNREPAIRED_SSTABLE,
-                                                                     sstableMetadataCollector,
-                                                                     new SerializationHeader(true, cfs.metadata, columns, stats),
-                                                                     txn));
         }
-        catch (Throwable t)
+
+        @Override
+        public SSTableMultiWriter call()
         {
-            if (txn != null)
-                txn.close();
-            throw t;
+            writeSortedContents();
+            return writer;
         }
     }
 
@@ -504,6 +547,7 @@
             assert entry.getKey() instanceof DecoratedKey;
             DecoratedKey key = (DecoratedKey)entry.getKey();
             ClusteringIndexFilter filter = dataRange.clusteringIndexFilter(key);
+
             return filter.getUnfilteredRowIterator(columnFilter, entry.getValue());
         }
     }
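
Flushing a memtable is now split into one FlushRunnable per data directory when the partitioner can split ranges: disk boundaries derived from the node's local ranges slice the sorted partition map with subMap(from, to), so each writer only sees keys destined for its disk, and a failure in any writer aborts them all. The size heuristic is unchanged, just moved into the runnable: estimated bytes = (key bytes counted once for the index, once again for the data file, plus live data size) * 1.2 for bloom filter and row index overhead; for example, 10 MB of keys and 200 MB of live data estimate to roughly (10 + 10 + 200) * 1.2 ≈ 264 MB. A minimal sketch of the boundary-driven split over a sorted map (generic placeholder types, not the project's):

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ConcurrentNavigableMap;

final class BoundarySplitSketch
{
    // Cuts a sorted map into contiguous [start, boundary) slices, one per boundary,
    // mirroring how createFlushRunnables pairs each slice with a data directory.
    static <K, V> List<ConcurrentNavigableMap<K, V>> split(ConcurrentNavigableMap<K, V> map,
                                                           List<K> boundaries,
                                                           K minimum)
    {
        List<ConcurrentNavigableMap<K, V>> slices = new ArrayList<>(boundaries.size());
        K start = minimum; // analogous to getMinimumToken().minKeyBound()
        for (K boundary : boundaries)
        {
            slices.add(map.subMap(start, boundary));
            start = boundary;
        }
        return slices;
    }
}
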
diff --git a/src/java/org/apache/cassandra/db/MultiCBuilder.java b/src/java/org/apache/cassandra/db/MultiCBuilder.java
index 7c77ab0..ae8c26c 100644
--- a/src/java/org/apache/cassandra/db/MultiCBuilder.java
+++ b/src/java/org/apache/cassandra/db/MultiCBuilder.java
@@ -19,6 +19,7 @@
 
 import java.nio.ByteBuffer;
 import java.util.ArrayList;
+import java.util.Arrays;
 import java.util.List;
 import java.util.NavigableSet;
 
@@ -27,46 +28,41 @@
 import org.apache.cassandra.utils.btree.BTreeSet;
 
 /**
- * Builder that allow to build multiple Clustering/Slice.Bound at the same time.
+ * Builder that allows building multiple Clusterings/ClusteringBounds at the same time.
  */
-public class MultiCBuilder
+public abstract class MultiCBuilder
 {
     /**
      * The table comparator.
      */
-    private final ClusteringComparator comparator;
+    protected final ClusteringComparator comparator;
 
     /**
-     * The elements of the clusterings
+     * The number of clustering elements that have been added.
      */
-    private final List<List<ByteBuffer>> elementsList = new ArrayList<>();
-
-    /**
-     * The number of elements that have been added.
-     */
-    private int size;
+    protected int size;
 
     /**
      * <code>true</code> if the clusterings have been build, <code>false</code> otherwise.
      */
-    private boolean built;
+    protected boolean built;
 
     /**
      * <code>true</code> if the clusterings contains some <code>null</code> elements.
      */
-    private boolean containsNull;
+    protected boolean containsNull;
 
     /**
      * <code>true</code> if the composites contains some <code>unset</code> elements.
      */
-    private boolean containsUnset;
+    protected boolean containsUnset;
 
     /**
      * <code>true</code> if some empty collection have been added.
      */
-    private boolean hasMissingElements;
+    protected boolean hasMissingElements;
 
-    private MultiCBuilder(ClusteringComparator comparator)
+    protected MultiCBuilder(ClusteringComparator comparator)
     {
         this.comparator = comparator;
     }
@@ -74,19 +70,11 @@
     /**
      * Creates a new empty {@code MultiCBuilder}.
      */
-    public static MultiCBuilder create(ClusteringComparator comparator)
+    public static MultiCBuilder create(ClusteringComparator comparator, boolean forMultipleValues)
     {
-        return new MultiCBuilder(comparator);
-    }
-
-    /**
-     * Checks if this builder is empty.
-     *
-     * @return <code>true</code> if this builder is empty, <code>false</code> otherwise.
-     */
-    private boolean isEmpty()
-    {
-        return elementsList.isEmpty();
+        return forMultipleValues
+             ? new MultiClusteringBuilder(comparator)
+             : new OneClusteringBuilder(comparator);
     }
 
     /**
@@ -99,25 +87,7 @@
      * @param value the value of the next element
      * @return this <code>MulitCBuilder</code>
      */
-    public MultiCBuilder addElementToAll(ByteBuffer value)
-    {
-        checkUpdateable();
-
-        if (isEmpty())
-            elementsList.add(new ArrayList<ByteBuffer>());
-
-        for (int i = 0, m = elementsList.size(); i < m; i++)
-        {
-            if (value == null)
-                containsNull = true;
-            if (value == ByteBufferUtil.UNSET_BYTE_BUFFER)
-                containsUnset = true;
-
-            elementsList.get(i).add(value);
-        }
-        size++;
-        return this;
-    }
+    public abstract MultiCBuilder addElementToAll(ByteBuffer value);
 
     /**
      * Adds individually each of the specified elements to the end of all of the existing clusterings.
@@ -129,42 +99,7 @@
      * @param values the elements to add
      * @return this <code>CompositeBuilder</code>
      */
-    public MultiCBuilder addEachElementToAll(List<ByteBuffer> values)
-    {
-        checkUpdateable();
-
-        if (isEmpty())
-            elementsList.add(new ArrayList<ByteBuffer>());
-
-        if (values.isEmpty())
-        {
-            hasMissingElements = true;
-        }
-        else
-        {
-            for (int i = 0, m = elementsList.size(); i < m; i++)
-            {
-                List<ByteBuffer> oldComposite = elementsList.remove(0);
-
-                for (int j = 0, n = values.size(); j < n; j++)
-                {
-                    List<ByteBuffer> newComposite = new ArrayList<>(oldComposite);
-                    elementsList.add(newComposite);
-
-                    ByteBuffer value = values.get(j);
-
-                    if (value == null)
-                        containsNull = true;
-                    if (value == ByteBufferUtil.UNSET_BYTE_BUFFER)
-                        containsUnset = true;
-
-                    newComposite.add(values.get(j));
-                }
-            }
-        }
-        size++;
-        return this;
-    }
+    public abstract MultiCBuilder addEachElementToAll(List<ByteBuffer> values);
 
     /**
      * Adds individually each of the specified list of elements to the end of all of the existing composites.
@@ -176,41 +111,12 @@
      * @param values the elements to add
      * @return this <code>CompositeBuilder</code>
      */
-    public MultiCBuilder addAllElementsToAll(List<List<ByteBuffer>> values)
+    public abstract MultiCBuilder addAllElementsToAll(List<List<ByteBuffer>> values);
+
+    protected void checkUpdateable()
     {
-        checkUpdateable();
-
-        if (isEmpty())
-            elementsList.add(new ArrayList<ByteBuffer>());
-
-        if (values.isEmpty())
-        {
-            hasMissingElements = true;
-        }
-        else
-        {
-            for (int i = 0, m = elementsList.size(); i < m; i++)
-            {
-                List<ByteBuffer> oldComposite = elementsList.remove(0);
-
-                for (int j = 0, n = values.size(); j < n; j++)
-                {
-                    List<ByteBuffer> newComposite = new ArrayList<>(oldComposite);
-                    elementsList.add(newComposite);
-
-                    List<ByteBuffer> value = values.get(j);
-
-                    if (value.contains(null))
-                        containsNull = true;
-                    if (value.contains(ByteBufferUtil.UNSET_BYTE_BUFFER))
-                        containsUnset = true;
-
-                    newComposite.addAll(value);
-                }
-            }
-            size += values.get(0).size();
-        }
-        return this;
+        if (!hasRemaining() || built)
+            throw new IllegalStateException("this builder cannot be updated anymore");
     }
 
     /**
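
The factory now returns one of two concrete builders: OneClusteringBuilder keeps a flat ByteBuffer[] when at most one clustering or bound can result, and MultiClusteringBuilder (further down) keeps the old list-of-lists representation for IN-style restrictions that fan out into several clusterings. A hedged usage sketch, assuming a two-column comparator and ByteBufferUtil.bytes(int) for the values (fragment only, not a compilable unit):

// WHERE ck1 = 0 AND ck2 IN (1, 2): forMultipleValues = true because of the IN.
MultiCBuilder builder = MultiCBuilder.create(comparator, true);
builder.addElementToAll(ByteBufferUtil.bytes(0));                      // ck1 = 0 on every clustering
builder.addEachElementToAll(Arrays.asList(ByteBufferUtil.bytes(1),
                                          ByteBufferUtil.bytes(2)));   // fan out for ck2 IN (1, 2)
NavigableSet<Clustering> clusterings = builder.build();                // {(0, 1), (0, 2)}
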
@@ -257,109 +163,30 @@
      *
      * @return the clusterings
      */
-    public NavigableSet<Clustering> build()
-    {
-        built = true;
-
-        if (hasMissingElements)
-            return BTreeSet.empty(comparator);
-
-        CBuilder builder = CBuilder.create(comparator);
-
-        if (elementsList.isEmpty())
-            return BTreeSet.of(builder.comparator(), builder.build());
-
-        BTreeSet.Builder<Clustering> set = BTreeSet.builder(builder.comparator());
-        for (int i = 0, m = elementsList.size(); i < m; i++)
-        {
-            List<ByteBuffer> elements = elementsList.get(i);
-            set.add(builder.buildWith(elements));
-        }
-        return set.build();
-    }
+    public abstract NavigableSet<Clustering> build();
 
     /**
-     * Builds the <code>Slice.Bound</code>s for slice restrictions.
+     * Builds the <code>ClusteringBound</code>s for slice restrictions.
      *
      * @param isStart specify if the bound is a start one
      * @param isInclusive specify if the bound is inclusive or not
      * @param isOtherBoundInclusive specify if the other bound is inclusive or not
      * @param columnDefs the columns of the slice restriction
-     * @return the <code>Slice.Bound</code>s
+     * @return the <code>ClusteringBound</code>s
      */
-    public NavigableSet<Slice.Bound> buildBoundForSlice(boolean isStart,
-                                                        boolean isInclusive,
-                                                        boolean isOtherBoundInclusive,
-                                                        List<ColumnDefinition> columnDefs)
-    {
-        built = true;
+    public abstract NavigableSet<ClusteringBound> buildBoundForSlice(boolean isStart,
+                                                                 boolean isInclusive,
+                                                                 boolean isOtherBoundInclusive,
+                                                                 List<ColumnDefinition> columnDefs);
 
-        if (hasMissingElements)
-            return BTreeSet.empty(comparator);
-
-        CBuilder builder = CBuilder.create(comparator);
-
-        if (elementsList.isEmpty())
-            return BTreeSet.of(comparator, builder.buildBound(isStart, isInclusive));
-
-        // Use a TreeSet to sort and eliminate duplicates
-        BTreeSet.Builder<Slice.Bound> set = BTreeSet.builder(comparator);
-
-        // The first column of the slice might not be the first clustering column (e.g. clustering_0 = ? AND (clustering_1, clustering_2) >= (?, ?)
-        int offset = columnDefs.get(0).position();
-
-        for (int i = 0, m = elementsList.size(); i < m; i++)
-        {
-            List<ByteBuffer> elements = elementsList.get(i);
-
-            // Handle the no bound case
-            if (elements.size() == offset)
-            {
-                set.add(builder.buildBoundWith(elements, isStart, true));
-                continue;
-            }
-
-            // In the case of mixed order columns, we will have some extra slices where the columns change directions.
-            // For example: if we have clustering_0 DESC and clustering_1 ASC a slice like (clustering_0, clustering_1) > (1, 2)
-            // will produce 2 slices: [BOTTOM, 1) and (1.2, 1]
-            // So, the END bound will return 2 bounds with the same values 1
-            ColumnDefinition lastColumn = columnDefs.get(columnDefs.size() - 1);
-            if (elements.size() <= lastColumn.position() && i < m - 1 && elements.equals(elementsList.get(i + 1)))
-            {
-                set.add(builder.buildBoundWith(elements, isStart, false));
-                set.add(builder.buildBoundWith(elementsList.get(i++), isStart, true));
-                continue;
-            }
-
-            // Handle the normal bounds
-            ColumnDefinition column = columnDefs.get(elements.size() - 1 - offset);
-            set.add(builder.buildBoundWith(elements, isStart, column.isReversedType() ? isOtherBoundInclusive : isInclusive));
-        }
-        return set.build();
-    }
-
-    public NavigableSet<Slice.Bound> buildBound(boolean isStart, boolean isInclusive)
-    {
-        built = true;
-
-        if (hasMissingElements)
-            return BTreeSet.empty(comparator);
-
-        CBuilder builder = CBuilder.create(comparator);
-
-        if (elementsList.isEmpty())
-            return BTreeSet.of(comparator, builder.buildBound(isStart, isInclusive));
-
-        // Use a TreeSet to sort and eliminate duplicates
-        BTreeSet.Builder<Slice.Bound> set = BTreeSet.builder(comparator);
-
-        for (int i = 0, m = elementsList.size(); i < m; i++)
-        {
-            List<ByteBuffer> elements = elementsList.get(i);
-            set.add(builder.buildBoundWith(elements, isStart, isInclusive));
-        }
-        return set.build();
-    }
+    /**
+     * Builds the <code>ClusteringBound</code>s
+     *
+     * @param isStart specify if the bound is a start one
+     * @param isInclusive specify if the bound is inclusive or not
+     * @return the <code>ClusteringBound</code>s
+     */
+    public abstract NavigableSet<ClusteringBound> buildBound(boolean isStart, boolean isInclusive);
 
     /**
      * Checks if some elements can still be added to the clusterings.
@@ -371,9 +198,298 @@
         return remainingCount() > 0;
     }
 
-    private void checkUpdateable()
+    /**
+     * Specialization of MultiCBuilder when we know only one clustering/bound is created.
+     */
+    private static class OneClusteringBuilder extends MultiCBuilder
     {
-        if (!hasRemaining() || built)
-            throw new IllegalStateException("this builder cannot be updated anymore");
+        /**
+         * The elements of the clusterings
+         */
+        private final ByteBuffer[] elements;
+
+        public OneClusteringBuilder(ClusteringComparator comparator)
+        {
+            super(comparator);
+            this.elements = new ByteBuffer[comparator.size()];
+        }
+
+        public MultiCBuilder addElementToAll(ByteBuffer value)
+        {
+            checkUpdateable();
+
+            if (value == null)
+                containsNull = true;
+            if (value == ByteBufferUtil.UNSET_BYTE_BUFFER)
+                containsUnset = true;
+
+            elements[size++] = value;
+            return this;
+        }
+
+        public MultiCBuilder addEachElementToAll(List<ByteBuffer> values)
+        {
+            if (values.isEmpty())
+            {
+                hasMissingElements = true;
+                return this;
+            }
+
+            assert values.size() == 1;
+
+            return addElementToAll(values.get(0));
+        }
+
+        public MultiCBuilder addAllElementsToAll(List<List<ByteBuffer>> values)
+        {
+            if (values.isEmpty())
+            {
+                hasMissingElements = true;
+                return this;
+            }
+
+            assert values.size() == 1;
+            return addEachElementToAll(values.get(0));
+        }
+
+        public NavigableSet<Clustering> build()
+        {
+            built = true;
+
+            if (hasMissingElements)
+                return BTreeSet.empty(comparator);
+
+            return BTreeSet.of(comparator, size == 0 ? Clustering.EMPTY : Clustering.make(elements));
+        }
+
+        @Override
+        public NavigableSet<ClusteringBound> buildBoundForSlice(boolean isStart,
+                                                                boolean isInclusive,
+                                                                boolean isOtherBoundInclusive,
+                                                                List<ColumnDefinition> columnDefs)
+        {
+            return buildBound(isStart, columnDefs.get(0).isReversedType() ? isOtherBoundInclusive : isInclusive);
+        }
+
+        public NavigableSet<ClusteringBound> buildBound(boolean isStart, boolean isInclusive)
+        {
+            built = true;
+
+            if (hasMissingElements)
+                return BTreeSet.empty(comparator);
+
+            if (size == 0)
+                return BTreeSet.of(comparator, isStart ? ClusteringBound.BOTTOM : ClusteringBound.TOP);
+
+            ByteBuffer[] newValues = size == elements.length
+                                   ? elements
+                                   : Arrays.copyOf(elements, size);
+
+            return BTreeSet.of(comparator, ClusteringBound.create(ClusteringBound.boundKind(isStart, isInclusive), newValues));
+        }
+    }
+
+    /**
+     * MultiCBuilder implementation that actually supports the creation of multiple clusterings/bounds.
+     */
+    private static class MultiClusteringBuilder extends MultiCBuilder
+    {
+        /**
+         * The elements of the clusterings
+         */
+        private final List<List<ByteBuffer>> elementsList = new ArrayList<>();
+
+        public MultiClusteringBuilder(ClusteringComparator comparator)
+        {
+            super(comparator);
+        }
+
+        public MultiCBuilder addElementToAll(ByteBuffer value)
+        {
+            checkUpdateable();
+
+            if (elementsList.isEmpty())
+                elementsList.add(new ArrayList<ByteBuffer>());
+
+            if (value == null)
+                containsNull = true;
+            else if (value == ByteBufferUtil.UNSET_BYTE_BUFFER)
+                containsUnset = true;
+
+            for (int i = 0, m = elementsList.size(); i < m; i++)
+                elementsList.get(i).add(value);
+
+            size++;
+            return this;
+        }
+
+        public MultiCBuilder addEachElementToAll(List<ByteBuffer> values)
+        {
+            checkUpdateable();
+
+            if (elementsList.isEmpty())
+                elementsList.add(new ArrayList<ByteBuffer>());
+
+            if (values.isEmpty())
+            {
+                hasMissingElements = true;
+            }
+            else
+            {
+                for (int i = 0, m = elementsList.size(); i < m; i++)
+                {
+                    List<ByteBuffer> oldComposite = elementsList.remove(0);
+
+                    for (int j = 0, n = values.size(); j < n; j++)
+                    {
+                        List<ByteBuffer> newComposite = new ArrayList<>(oldComposite);
+                        elementsList.add(newComposite);
+
+                        ByteBuffer value = values.get(j);
+
+                        if (value == null)
+                            containsNull = true;
+                        if (value == ByteBufferUtil.UNSET_BYTE_BUFFER)
+                            containsUnset = true;
+
+                        newComposite.add(values.get(j));
+                    }
+                }
+            }
+            size++;
+            return this;
+        }
+
+        public MultiCBuilder addAllElementsToAll(List<List<ByteBuffer>> values)
+        {
+            checkUpdateable();
+
+            if (elementsList.isEmpty())
+                elementsList.add(new ArrayList<ByteBuffer>());
+
+            if (values.isEmpty())
+            {
+                hasMissingElements = true;
+            }
+            else
+            {
+                for (int i = 0, m = elementsList.size(); i < m; i++)
+                {
+                    List<ByteBuffer> oldComposite = elementsList.remove(0);
+
+                    for (int j = 0, n = values.size(); j < n; j++)
+                    {
+                        List<ByteBuffer> newComposite = new ArrayList<>(oldComposite);
+                        elementsList.add(newComposite);
+
+                        List<ByteBuffer> value = values.get(j);
+
+                        if (value.contains(null))
+                            containsNull = true;
+                        if (value.contains(ByteBufferUtil.UNSET_BYTE_BUFFER))
+                            containsUnset = true;
+
+                        newComposite.addAll(value);
+                    }
+                }
+                size += values.get(0).size();
+            }
+            return this;
+        }
+
+        public NavigableSet<Clustering> build()
+        {
+            built = true;
+
+            if (hasMissingElements)
+                return BTreeSet.empty(comparator);
+
+            CBuilder builder = CBuilder.create(comparator);
+
+            if (elementsList.isEmpty())
+                return BTreeSet.of(builder.comparator(), builder.build());
+
+            BTreeSet.Builder<Clustering> set = BTreeSet.builder(builder.comparator());
+            for (int i = 0, m = elementsList.size(); i < m; i++)
+            {
+                List<ByteBuffer> elements = elementsList.get(i);
+                set.add(builder.buildWith(elements));
+            }
+            return set.build();
+        }
+
+        public NavigableSet<ClusteringBound> buildBoundForSlice(boolean isStart,
+                                                            boolean isInclusive,
+                                                            boolean isOtherBoundInclusive,
+                                                            List<ColumnDefinition> columnDefs)
+        {
+            built = true;
+
+            if (hasMissingElements)
+                return BTreeSet.empty(comparator);
+
+            CBuilder builder = CBuilder.create(comparator);
+
+            if (elementsList.isEmpty())
+                return BTreeSet.of(comparator, builder.buildBound(isStart, isInclusive));
+
+            // Use a TreeSet to sort and eliminate duplicates
+            BTreeSet.Builder<ClusteringBound> set = BTreeSet.builder(comparator);
+
+            // The first column of the slice might not be the first clustering column (e.g. clustering_0 = ? AND (clustering_1, clustering_2) >= (?, ?))
+            int offset = columnDefs.get(0).position();
+
+            for (int i = 0, m = elementsList.size(); i < m; i++)
+            {
+                List<ByteBuffer> elements = elementsList.get(i);
+
+                // Handle the no bound case
+                if (elements.size() == offset)
+                {
+                    set.add(builder.buildBoundWith(elements, isStart, true));
+                    continue;
+                }
+
+                // In the case of mixed order columns, we will have some extra slices where the columns change direction.
+                // For example: if we have clustering_0 DESC and clustering_1 ASC, a slice like (clustering_0, clustering_1) > (1, 2)
+                // will produce 2 slices: [BOTTOM, 1) and ((1, 2), 1].
+                // So the END bound will return 2 bounds with the same value 1.
+                ColumnDefinition lastColumn = columnDefs.get(columnDefs.size() - 1);
+                if (elements.size() <= lastColumn.position() && i < m - 1 && elements.equals(elementsList.get(i + 1)))
+                {
+                    set.add(builder.buildBoundWith(elements, isStart, false));
+                    set.add(builder.buildBoundWith(elementsList.get(i++), isStart, true));
+                    continue;
+                }
+
+                // Handle the normal bounds
+                ColumnDefinition column = columnDefs.get(elements.size() - 1 - offset);
+                set.add(builder.buildBoundWith(elements, isStart, column.isReversedType() ? isOtherBoundInclusive : isInclusive));
+            }
+            return set.build();
+        }
+
+        public NavigableSet<ClusteringBound> buildBound(boolean isStart, boolean isInclusive)
+        {
+            built = true;
+
+            if (hasMissingElements)
+                return BTreeSet.empty(comparator);
+
+            CBuilder builder = CBuilder.create(comparator);
+
+            if (elementsList.isEmpty())
+                return BTreeSet.of(comparator, builder.buildBound(isStart, isInclusive));
+
+            // Use a TreeSet to sort and eliminate duplicates
+            BTreeSet.Builder<ClusteringBound> set = BTreeSet.builder(comparator);
+
+            for (int i = 0, m = elementsList.size(); i < m; i++)
+            {
+                List<ByteBuffer> elements = elementsList.get(i);
+                set.add(builder.buildBoundWith(elements, isStart, isInclusive));
+            }
+            return set.build();
+        }
     }
 }
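
The split above leaves MultiCBuilder as an abstract class with two private specializations: OneClusteringBuilder for the case where only one clustering/bound is produced, and MultiClusteringBuilder when restrictions fan out into several clusterings (e.g. a multi-column IN). As a rough, illustrative sketch of the two paths, assuming a factory along the lines of MultiCBuilder.create(comparator, forMultipleValues) selects the specialization (that factory is not shown in this hunk):

    ClusteringComparator comparator = new ClusteringComparator(Int32Type.instance, UTF8Type.instance);

    // (ck0, ck1) IN ((1, 'a'), (2, 'b')): both clusterings are built in one pass
    MultiCBuilder in = MultiCBuilder.create(comparator, true);              // assumed factory
    in.addAllElementsToAll(Arrays.asList(
        Arrays.asList(ByteBufferUtil.bytes(1), UTF8Type.instance.decompose("a")),
        Arrays.asList(ByteBufferUtil.bytes(2), UTF8Type.instance.decompose("b"))));
    NavigableSet<Clustering> clusterings = in.build();                      // 2 clusterings, sorted and de-duplicated

    // ck0 = 1: a single inclusive start bound for a slice
    MultiCBuilder eq = MultiCBuilder.create(comparator, false);             // assumed factory
    eq.addElementToAll(ByteBufferUtil.bytes(1));
    NavigableSet<ClusteringBound> start = eq.buildBound(true, true);
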
diff --git a/src/java/org/apache/cassandra/db/MutableDeletionInfo.java b/src/java/org/apache/cassandra/db/MutableDeletionInfo.java
index d01b1d1..4a2c455 100644
--- a/src/java/org/apache/cassandra/db/MutableDeletionInfo.java
+++ b/src/java/org/apache/cassandra/db/MutableDeletionInfo.java
@@ -17,6 +17,7 @@
  */
 package org.apache.cassandra.db;
 
+import java.util.Collections;
 import java.util.Iterator;
 
 import com.google.common.base.Objects;
@@ -151,12 +152,12 @@
     // Use sparingly, not the most efficient thing
     public Iterator<RangeTombstone> rangeIterator(boolean reversed)
     {
-        return ranges == null ? Iterators.<RangeTombstone>emptyIterator() : ranges.iterator(reversed);
+        return ranges == null ? Collections.emptyIterator() : ranges.iterator(reversed);
     }
 
     public Iterator<RangeTombstone> rangeIterator(Slice slice, boolean reversed)
     {
-        return ranges == null ? Iterators.<RangeTombstone>emptyIterator() : ranges.iterator(slice, reversed);
+        return ranges == null ? Collections.emptyIterator() : ranges.iterator(slice, reversed);
     }
 
     public RangeTombstone rangeCovering(Clustering name)
@@ -290,8 +291,8 @@
                 DeletionTime openDeletion = openMarker.openDeletionTime(reversed);
                 assert marker.closeDeletionTime(reversed).equals(openDeletion);
 
-                Slice.Bound open = openMarker.openBound(reversed);
-                Slice.Bound close = marker.closeBound(reversed);
+                ClusteringBound open = openMarker.openBound(reversed);
+                ClusteringBound close = marker.closeBound(reversed);
 
                 Slice slice = reversed ? Slice.make(close, open) : Slice.make(open, close);
                 deletion.add(new RangeTombstone(slice, openDeletion), comparator);
diff --git a/src/java/org/apache/cassandra/db/Mutation.java b/src/java/org/apache/cassandra/db/Mutation.java
index c6ad9b8..61e5ee9 100644
--- a/src/java/org/apache/cassandra/db/Mutation.java
+++ b/src/java/org/apache/cassandra/db/Mutation.java
@@ -38,15 +38,12 @@
 import org.apache.cassandra.net.MessageOut;
 import org.apache.cassandra.net.MessagingService;
 import org.apache.cassandra.utils.ByteBufferUtil;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
 
 // TODO convert this to a Builder pattern instead of encouraging M.add directly,
 // which is less-efficient since we have to keep a mutable HashMap around
 public class Mutation implements IMutation
 {
     public static final MutationSerializer serializer = new MutationSerializer();
-    private static final Logger logger = LoggerFactory.getLogger(Mutation.class);
 
     public static final String FORWARD_TO = "FWD_TO";
     public static final String FORWARD_FROM = "FWD_FRM";
@@ -64,6 +61,8 @@
     // keep track of when mutation has started waiting for a MV partition lock
     public final AtomicLong viewLockAcquireStart = new AtomicLong(0);
 
+    private boolean cdcEnabled = false;
+
     public Mutation(String keyspaceName, DecoratedKey key)
     {
         this(keyspaceName, key, new HashMap<>());
@@ -126,10 +125,21 @@
         return modifications.get(cfId);
     }
 
+    /**
+     * Adds a PartitionUpdate to the local set of modifications.
+     * Assumes no update has been added yet for the table this PartitionUpdate impacts.
+     *
+     * @param update the PartitionUpdate to append to the modifications list
+     * @return this mutation
+     * @throws IllegalArgumentException if a PartitionUpdate for a duplicate table is passed as argument
+     */
     public Mutation add(PartitionUpdate update)
     {
         assert update != null;
         assert update.partitionKey().getPartitioner() == key.getPartitioner();
+
+        cdcEnabled |= update.metadata().params.cdc;
+
         PartitionUpdate prev = modifications.put(update.metadata().cfId, update);
         if (prev != null)
             // developer error
@@ -259,6 +269,11 @@
         return gcgs;
     }
 
+    public boolean trackedByCDC()
+    {
+        return cdcEnabled;
+    }
+
     public String toString()
     {
         return toString(false);
@@ -272,7 +287,7 @@
         buff.append(", modifications=[");
         if (shallow)
         {
-            List<String> cfnames = new ArrayList<String>(modifications.size());
+            List<String> cfnames = new ArrayList<>(modifications.size());
             for (UUID cfid : modifications.keySet())
             {
                 CFMetaData cfm = Schema.instance.getCFMetaData(cfid);
@@ -282,7 +297,7 @@
         }
         else
         {
-            buff.append("\n  ").append(StringUtils.join(modifications.values(), "\n  ")).append("\n");
+            buff.append("\n  ").append(StringUtils.join(modifications.values(), "\n  ")).append('\n');
         }
         return buff.append("])").toString();
     }
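
With the CDC flag folded in on every add(), callers can now ask a mutation directly whether any of its updates targets a CDC-enabled table. A minimal sketch of the intended usage (update here stands for any existing PartitionUpdate):

    Mutation mutation = new Mutation(update.metadata().ksName, update.partitionKey());
    mutation.add(update);            // internally: cdcEnabled |= update.metadata().params.cdc
    if (mutation.trackedByCDC())
    {
        // e.g. route the mutation through a CDC-aware commit log segment
    }
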
diff --git a/src/java/org/apache/cassandra/db/NativeClustering.java b/src/java/org/apache/cassandra/db/NativeClustering.java
new file mode 100644
index 0000000..1943b71
--- /dev/null
+++ b/src/java/org/apache/cassandra/db/NativeClustering.java
@@ -0,0 +1,125 @@
+/*
+* Licensed to the Apache Software Foundation (ASF) under one
+* or more contributor license agreements.  See the NOTICE file
+* distributed with this work for additional information
+* regarding copyright ownership.  The ASF licenses this file
+* to you under the Apache License, Version 2.0 (the
+* "License"); you may not use this file except in compliance
+* with the License.  You may obtain a copy of the License at
+*
+*    http://www.apache.org/licenses/LICENSE-2.0
+*
+* Unless required by applicable law or agreed to in writing,
+* software distributed under the License is distributed on an
+* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+* KIND, either express or implied.  See the License for the
+* specific language governing permissions and limitations
+* under the License.
+*/
+package org.apache.cassandra.db;
+
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+
+import org.apache.cassandra.utils.ObjectSizes;
+import org.apache.cassandra.utils.concurrent.OpOrder;
+import org.apache.cassandra.utils.memory.MemoryUtil;
+import org.apache.cassandra.utils.memory.NativeAllocator;
+
+public class NativeClustering extends AbstractClusteringPrefix implements Clustering
+{
+    private static final long EMPTY_SIZE = ObjectSizes.measure(new NativeClustering());
+
+    private final long peer;
+
+    private NativeClustering() { peer = 0; }
+
+    public NativeClustering(NativeAllocator allocator, OpOrder.Group writeOp, Clustering clustering)
+    {
+        int count = clustering.size();
+        int metadataSize = (count * 2) + 4;
+        int dataSize = clustering.dataSize();
+        int bitmapSize = ((count + 7) >>> 3);
+
+        assert count < 64 << 10;
+        assert dataSize < 64 << 10;
+
+        peer = allocator.allocate(metadataSize + dataSize + bitmapSize, writeOp);
+        long bitmapStart = peer + metadataSize;
+        MemoryUtil.setShort(peer, (short) count);
+        MemoryUtil.setShort(peer + (metadataSize - 2), (short) dataSize); // goes at the end of the other offsets
+
+        MemoryUtil.setByte(bitmapStart, bitmapSize, (byte) 0);
+        long dataStart = peer + metadataSize + bitmapSize;
+        int dataOffset = 0;
+        for (int i = 0 ; i < count ; i++)
+        {
+            MemoryUtil.setShort(peer + 2 + i * 2, (short) dataOffset);
+
+            ByteBuffer value = clustering.get(i);
+            if (value == null)
+            {
+                long boffset = bitmapStart + (i >>> 3);
+                int b = MemoryUtil.getByte(boffset);
+                b |= 1 << (i & 7);
+                MemoryUtil.setByte(boffset, (byte) b);
+                continue;
+            }
+
+            assert value.order() == ByteOrder.BIG_ENDIAN;
+
+            int size = value.remaining();
+            MemoryUtil.setBytes(dataStart + dataOffset, value);
+            dataOffset += size;
+        }
+    }
+
+    public Kind kind()
+    {
+        return Kind.CLUSTERING;
+    }
+
+    public int size()
+    {
+        return MemoryUtil.getShort(peer);
+    }
+
+    public ByteBuffer get(int i)
+    {
+        // offset at which we store the dataOffset
+        int size = size();
+        if (i >= size)
+            throw new IndexOutOfBoundsException();
+
+        int metadataSize = (size * 2) + 4;
+        int bitmapSize = ((size + 7) >>> 3);
+        long bitmapStart = peer + metadataSize;
+        int b = MemoryUtil.getByte(bitmapStart + (i >>> 3));
+        if ((b & (1 << (i & 7))) != 0)
+            return null;
+
+        int startOffset = MemoryUtil.getShort(peer + 2 + i * 2);
+        int endOffset = MemoryUtil.getShort(peer + 4 + i * 2);
+        return MemoryUtil.getByteBuffer(bitmapStart + bitmapSize + startOffset,
+                                        endOffset - startOffset,
+                                        ByteOrder.BIG_ENDIAN);
+    }
+
+    public ByteBuffer[] getRawValues()
+    {
+        ByteBuffer[] values = new ByteBuffer[size()];
+        for (int i = 0 ; i < values.length ; i++)
+            values[i] = get(i);
+        return values;
+    }
+
+    public long unsharedHeapSize()
+    {
+        return EMPTY_SIZE;
+    }
+
+    public long unsharedHeapSizeExcludingData()
+    {
+        return EMPTY_SIZE;
+    }
+}
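
For reference, the layout the constructor writes into the single native allocation (and that get(i) reads back) is: a short column count, one short data offset per column, a trailing short holding the total data size, a null bitmap with one bit per column, and finally the concatenated big-endian values. A small sketch of the same size bookkeeping the constructor performs:

    int count        = clustering.size();
    int metadataSize = count * 2 + 4;            // count + per-column offsets + trailing dataSize short
    int bitmapSize   = (count + 7) >>> 3;        // one "is null" bit per column
    int totalSize    = metadataSize + clustering.dataSize() + bitmapSize;
    // both count and dataSize must fit in 16 bits, hence the 64 << 10 assertions above
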
diff --git a/src/java/org/apache/cassandra/db/NativeDecoratedKey.java b/src/java/org/apache/cassandra/db/NativeDecoratedKey.java
index ca874c3..019209e 100644
--- a/src/java/org/apache/cassandra/db/NativeDecoratedKey.java
+++ b/src/java/org/apache/cassandra/db/NativeDecoratedKey.java
@@ -18,9 +18,11 @@
 package org.apache.cassandra.db;
 
 import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
 
 import org.apache.cassandra.dht.Token;
 import org.apache.cassandra.utils.concurrent.OpOrder;
+import org.apache.cassandra.utils.memory.HeapAllocator;
 import org.apache.cassandra.utils.memory.MemoryUtil;
 import org.apache.cassandra.utils.memory.NativeAllocator;
 
@@ -32,6 +34,8 @@
     {
         super(token);
         assert key != null;
+        assert key.order() == ByteOrder.BIG_ENDIAN;
+
         int size = key.remaining();
         this.peer = allocator.allocate(4 + size, writeOp);
         MemoryUtil.setInt(peer, size);
@@ -40,6 +44,6 @@
 
     public ByteBuffer getKey()
     {
-        return MemoryUtil.getByteBuffer(peer + 4, MemoryUtil.getInt(peer));
+        return MemoryUtil.getByteBuffer(peer + 4, MemoryUtil.getInt(peer), ByteOrder.BIG_ENDIAN);
     }
 }
diff --git a/src/java/org/apache/cassandra/db/PartitionRangeReadCommand.java b/src/java/org/apache/cassandra/db/PartitionRangeReadCommand.java
index 99e24c8..9517503 100644
--- a/src/java/org/apache/cassandra/db/PartitionRangeReadCommand.java
+++ b/src/java/org/apache/cassandra/db/PartitionRangeReadCommand.java
@@ -173,7 +173,7 @@
         metric.rangeLatency.addNano(latencyNanos);
     }
 
-    protected UnfilteredPartitionIterator queryStorage(final ColumnFamilyStore cfs, ReadOrderGroup orderGroup)
+    protected UnfilteredPartitionIterator queryStorage(final ColumnFamilyStore cfs, ReadExecutionController executionController)
     {
         ColumnFamilyStore.ViewFragment view = cfs.select(View.selectLive(dataRange().keyRange()));
         Tracing.trace("Executing seq scan across {} sstables for {}", view.sstables.size(), dataRange().keyRange().getString(metadata().getKeyValidator()));
diff --git a/src/java/org/apache/cassandra/db/RangeTombstone.java b/src/java/org/apache/cassandra/db/RangeTombstone.java
index 8af3b97..8e01b8e 100644
--- a/src/java/org/apache/cassandra/db/RangeTombstone.java
+++ b/src/java/org/apache/cassandra/db/RangeTombstone.java
@@ -17,16 +17,9 @@
  */
 package org.apache.cassandra.db;
 
-import java.io.IOException;
-import java.nio.ByteBuffer;
-import java.util.List;
 import java.util.Objects;
 
-import org.apache.cassandra.db.marshal.AbstractType;
 import org.apache.cassandra.db.rows.RangeTombstoneMarker;
-import org.apache.cassandra.io.util.DataInputPlus;
-import org.apache.cassandra.io.util.DataOutputPlus;
-import org.apache.cassandra.utils.memory.AbstractAllocator;
 
 
 /**
@@ -89,129 +82,4 @@
     {
         return Objects.hash(deletedSlice(), deletionTime());
     }
-
-    /**
-     * The bound of a range tombstone.
-     * <p>
-     * This is the same than for a slice but it includes "boundaries" between ranges. A boundary simply condensed
-     * a close and an opening "bound" into a single object. There is 2 main reasons for these "shortcut" boundaries:
-     *   1) When merging multiple iterators having range tombstones (that are represented by their start and end markers),
-     *      we need to know when a range is close on an iterator, if it is reopened right away. Otherwise, we cannot
-     *      easily produce the markers on the merged iterators within risking to fail the sorting guarantees of an
-     *      iterator. See this comment for more details: https://goo.gl/yyB5mR.
-     *   2) This saves some storage space.
-     */
-    public static class Bound extends Slice.Bound
-    {
-        public static final Serializer serializer = new Serializer();
-
-        /** The smallest start bound, i.e. the one that starts before any row. */
-        public static final Bound BOTTOM = new Bound(Kind.INCL_START_BOUND, EMPTY_VALUES_ARRAY);
-        /** The biggest end bound, i.e. the one that ends after any row. */
-        public static final Bound TOP = new Bound(Kind.INCL_END_BOUND, EMPTY_VALUES_ARRAY);
-
-        public Bound(Kind kind, ByteBuffer[] values)
-        {
-            super(kind, values);
-            assert values.length > 0 || !kind.isBoundary();
-        }
-
-        public boolean isBoundary()
-        {
-            return kind.isBoundary();
-        }
-
-        public boolean isOpen(boolean reversed)
-        {
-            return kind.isOpen(reversed);
-        }
-
-        public boolean isClose(boolean reversed)
-        {
-            return kind.isClose(reversed);
-        }
-
-        public static RangeTombstone.Bound inclusiveOpen(boolean reversed, ByteBuffer[] boundValues)
-        {
-            return new Bound(reversed ? Kind.INCL_END_BOUND : Kind.INCL_START_BOUND, boundValues);
-        }
-
-        public static RangeTombstone.Bound exclusiveOpen(boolean reversed, ByteBuffer[] boundValues)
-        {
-            return new Bound(reversed ? Kind.EXCL_END_BOUND : Kind.EXCL_START_BOUND, boundValues);
-        }
-
-        public static RangeTombstone.Bound inclusiveClose(boolean reversed, ByteBuffer[] boundValues)
-        {
-            return new Bound(reversed ? Kind.INCL_START_BOUND : Kind.INCL_END_BOUND, boundValues);
-        }
-
-        public static RangeTombstone.Bound exclusiveClose(boolean reversed, ByteBuffer[] boundValues)
-        {
-            return new Bound(reversed ? Kind.EXCL_START_BOUND : Kind.EXCL_END_BOUND, boundValues);
-        }
-
-        public static RangeTombstone.Bound inclusiveCloseExclusiveOpen(boolean reversed, ByteBuffer[] boundValues)
-        {
-            return new Bound(reversed ? Kind.EXCL_END_INCL_START_BOUNDARY : Kind.INCL_END_EXCL_START_BOUNDARY, boundValues);
-        }
-
-        public static RangeTombstone.Bound exclusiveCloseInclusiveOpen(boolean reversed, ByteBuffer[] boundValues)
-        {
-            return new Bound(reversed ? Kind.INCL_END_EXCL_START_BOUNDARY : Kind.EXCL_END_INCL_START_BOUNDARY, boundValues);
-        }
-
-        public static RangeTombstone.Bound fromSliceBound(Slice.Bound sliceBound)
-        {
-            return new RangeTombstone.Bound(sliceBound.kind(), sliceBound.getRawValues());
-        }
-
-        public RangeTombstone.Bound copy(AbstractAllocator allocator)
-        {
-            ByteBuffer[] newValues = new ByteBuffer[size()];
-            for (int i = 0; i < size(); i++)
-                newValues[i] = allocator.clone(get(i));
-            return new Bound(kind(), newValues);
-        }
-
-        @Override
-        public Bound withNewKind(Kind kind)
-        {
-            return new Bound(kind, values);
-        }
-
-        public static class Serializer
-        {
-            public void serialize(RangeTombstone.Bound bound, DataOutputPlus out, int version, List<AbstractType<?>> types) throws IOException
-            {
-                out.writeByte(bound.kind().ordinal());
-                out.writeShort(bound.size());
-                ClusteringPrefix.serializer.serializeValuesWithoutSize(bound, out, version, types);
-            }
-
-            public long serializedSize(RangeTombstone.Bound bound, int version, List<AbstractType<?>> types)
-            {
-                return 1 // kind ordinal
-                     + TypeSizes.sizeof((short)bound.size())
-                     + ClusteringPrefix.serializer.valuesWithoutSizeSerializedSize(bound, version, types);
-            }
-
-            public RangeTombstone.Bound deserialize(DataInputPlus in, int version, List<AbstractType<?>> types) throws IOException
-            {
-                Kind kind = Kind.values()[in.readByte()];
-                return deserializeValues(in, kind, version, types);
-            }
-
-            public RangeTombstone.Bound deserializeValues(DataInputPlus in, Kind kind, int version,
-                    List<AbstractType<?>> types) throws IOException
-            {
-                int size = in.readUnsignedShort();
-                if (size == 0)
-                    return kind.isStart() ? BOTTOM : TOP;
-
-                ByteBuffer[] values = ClusteringPrefix.serializer.deserializeValuesWithoutSize(in, size, version, types);
-                return new RangeTombstone.Bound(kind, values);
-            }
-        }
-    }
 }
diff --git a/src/java/org/apache/cassandra/db/RangeTombstoneList.java b/src/java/org/apache/cassandra/db/RangeTombstoneList.java
index c67ea33..c60b774 100644
--- a/src/java/org/apache/cassandra/db/RangeTombstoneList.java
+++ b/src/java/org/apache/cassandra/db/RangeTombstoneList.java
@@ -38,7 +38,7 @@
  * A range tombstone has 4 elements: the start and end of the range covered,
  * and the deletion infos (markedAt timestamp and local deletion time). The
  * markedAt timestamp is what defines the priority of 2 overlapping tombstones.
- * That is, given 2 tombstones [0, 10]@t1 and [5, 15]@t2, then if t2 > t1 (and
+ * That is, given 2 tombstones {@code [0, 10]@t1 and [5, 15]@t2, then if t2 > t1} (and
  * are the tombstones markedAt values), the 2nd tombstone takes precedence over
  * the first one on [5, 10]. If such tombstones are added to a RangeTombstoneList,
  * the range tombstone list will store them as [[0, 5]@t1, [5, 15]@t2].
@@ -54,15 +54,15 @@
 
     // Note: we don't want to use a List for the markedAts and delTimes to avoid boxing. We could
     // use a List for starts and ends, but having arrays everywhere is almost simpler.
-    private Slice.Bound[] starts;
-    private Slice.Bound[] ends;
+    private ClusteringBound[] starts;
+    private ClusteringBound[] ends;
     private long[] markedAts;
     private int[] delTimes;
 
     private long boundaryHeapSize;
     private int size;
 
-    private RangeTombstoneList(ClusteringComparator comparator, Slice.Bound[] starts, Slice.Bound[] ends, long[] markedAts, int[] delTimes, long boundaryHeapSize, int size)
+    private RangeTombstoneList(ClusteringComparator comparator, ClusteringBound[] starts, ClusteringBound[] ends, long[] markedAts, int[] delTimes, long boundaryHeapSize, int size)
     {
         assert starts.length == ends.length && starts.length == markedAts.length && starts.length == delTimes.length;
         this.comparator = comparator;
@@ -76,7 +76,7 @@
 
     public RangeTombstoneList(ClusteringComparator comparator, int capacity)
     {
-        this(comparator, new Slice.Bound[capacity], new Slice.Bound[capacity], new long[capacity], new int[capacity], 0, 0);
+        this(comparator, new ClusteringBound[capacity], new ClusteringBound[capacity], new long[capacity], new int[capacity], 0, 0);
     }
 
     public boolean isEmpty()
@@ -107,8 +107,8 @@
     public RangeTombstoneList copy(AbstractAllocator allocator)
     {
         RangeTombstoneList copy =  new RangeTombstoneList(comparator,
-                                                          new Slice.Bound[size],
-                                                          new Slice.Bound[size],
+                                                          new ClusteringBound[size],
+                                                          new ClusteringBound[size],
                                                           Arrays.copyOf(markedAts, size),
                                                           Arrays.copyOf(delTimes, size),
                                                           boundaryHeapSize, size);
@@ -123,12 +123,12 @@
         return copy;
     }
 
-    private static Slice.Bound clone(Slice.Bound bound, AbstractAllocator allocator)
+    private static ClusteringBound clone(ClusteringBound bound, AbstractAllocator allocator)
     {
         ByteBuffer[] values = new ByteBuffer[bound.size()];
         for (int i = 0; i < values.length; i++)
             values[i] = allocator.clone(bound.get(i));
-        return new Slice.Bound(bound.kind(), values);
+        return new ClusteringBound(bound.kind(), values);
     }
 
     public void add(RangeTombstone tombstone)
@@ -145,7 +145,7 @@
      * This method will be faster if the new tombstone sort after all the currently existing ones (this is a common use case),
      * but it doesn't assume it.
      */
-    public void add(Slice.Bound start, Slice.Bound end, long markedAt, int delTime)
+    public void add(ClusteringBound start, ClusteringBound end, long markedAt, int delTime)
     {
         if (isEmpty())
         {
@@ -324,17 +324,17 @@
         return new RangeTombstone(Slice.make(starts[idx], ends[idx]), new DeletionTime(markedAts[idx], delTimes[idx]));
     }
 
-    private RangeTombstone rangeTombstoneWithNewStart(int idx, Slice.Bound newStart)
+    private RangeTombstone rangeTombstoneWithNewStart(int idx, ClusteringBound newStart)
     {
         return new RangeTombstone(Slice.make(newStart, ends[idx]), new DeletionTime(markedAts[idx], delTimes[idx]));
     }
 
-    private RangeTombstone rangeTombstoneWithNewEnd(int idx, Slice.Bound newEnd)
+    private RangeTombstone rangeTombstoneWithNewEnd(int idx, ClusteringBound newEnd)
     {
         return new RangeTombstone(Slice.make(starts[idx], newEnd), new DeletionTime(markedAts[idx], delTimes[idx]));
     }
 
-    private RangeTombstone rangeTombstoneWithNewBounds(int idx, Slice.Bound newStart, Slice.Bound newEnd)
+    private RangeTombstone rangeTombstoneWithNewBounds(int idx, ClusteringBound newStart, ClusteringBound newEnd)
     {
         return new RangeTombstone(Slice.make(newStart, newEnd), new DeletionTime(markedAts[idx], delTimes[idx]));
     }
@@ -380,13 +380,13 @@
 
     private Iterator<RangeTombstone> forwardIterator(final Slice slice)
     {
-        int startIdx = slice.start() == Slice.Bound.BOTTOM ? 0 : searchInternal(slice.start(), 0, size);
+        int startIdx = slice.start() == ClusteringBound.BOTTOM ? 0 : searchInternal(slice.start(), 0, size);
         final int start = startIdx < 0 ? -startIdx-1 : startIdx;
 
         if (start >= size)
             return Collections.emptyIterator();
 
-        int finishIdx = slice.end() == Slice.Bound.TOP ? size - 1 : searchInternal(slice.end(), start, size);
+        int finishIdx = slice.end() == ClusteringBound.TOP ? size - 1 : searchInternal(slice.end(), start, size);
         // if stopIdx is the first range after 'slice.end()' we care only until the previous range
         final int finish = finishIdx < 0 ? -finishIdx-2 : finishIdx;
 
@@ -397,8 +397,8 @@
         {
             // We want to make sure the ranges are strictly included within the queried slice as this
             // makes it easier to combine things when iterating over successive slices.
-            Slice.Bound s = comparator.compare(starts[start], slice.start()) < 0 ? slice.start() : starts[start];
-            Slice.Bound e = comparator.compare(slice.end(), ends[start]) < 0 ? slice.end() : ends[start];
+            ClusteringBound s = comparator.compare(starts[start], slice.start()) < 0 ? slice.start() : starts[start];
+            ClusteringBound e = comparator.compare(slice.end(), ends[start]) < 0 ? slice.end() : ends[start];
             return Iterators.<RangeTombstone>singletonIterator(rangeTombstoneWithNewBounds(start, s, e));
         }
 
@@ -425,14 +425,14 @@
 
     private Iterator<RangeTombstone> reverseIterator(final Slice slice)
     {
-        int startIdx = slice.end() == Slice.Bound.TOP ? size - 1 : searchInternal(slice.end(), 0, size);
+        int startIdx = slice.end() == ClusteringBound.TOP ? size - 1 : searchInternal(slice.end(), 0, size);
         // if startIdx is the first range after 'slice.end()' we care only until the previous range
         final int start = startIdx < 0 ? -startIdx-2 : startIdx;
 
         if (start < 0)
             return Collections.emptyIterator();
 
-        int finishIdx = slice.start() == Slice.Bound.BOTTOM ? 0 : searchInternal(slice.start(), 0, start + 1);  // include same as finish
+        int finishIdx = slice.start() == ClusteringBound.BOTTOM ? 0 : searchInternal(slice.start(), 0, start + 1);  // include same as finish
         // if stopIdx is the first range after 'slice.end()' we care only until the previous range
         final int finish = finishIdx < 0 ? -finishIdx-1 : finishIdx;
 
@@ -443,8 +443,8 @@
         {
             // We want to make sure the ranges are strictly included within the queried slice as this
             // makes it easier to combine things when iterating over successive slices.
-            Slice.Bound s = comparator.compare(starts[start], slice.start()) < 0 ? slice.start() : starts[start];
-            Slice.Bound e = comparator.compare(slice.end(), ends[start]) < 0 ? slice.end() : ends[start];
+            ClusteringBound s = comparator.compare(starts[start], slice.start()) < 0 ? slice.start() : starts[start];
+            ClusteringBound e = comparator.compare(slice.end(), ends[start]) < 0 ? slice.end() : ends[start];
             return Iterators.<RangeTombstone>singletonIterator(rangeTombstoneWithNewBounds(start, s, e));
         }
 
@@ -456,7 +456,6 @@
             {
                 if (idx < 0 || idx < finish)
                     return endOfData();
-
                 // We want to make sure the ranges are strictly included within the queried slice as this
                 // makes it easier to combine things when iterating over successive slices. This means that
                 // for the first and last range we might have to "cut" the range returned.
@@ -528,7 +527,7 @@
      *   - e_i <= s_i+1
      * Basically, ranges are non overlapping and in order.
      */
-    private void insertFrom(int i, Slice.Bound start, Slice.Bound end, long markedAt, int delTime)
+    private void insertFrom(int i, ClusteringBound start, ClusteringBound end, long markedAt, int delTime)
     {
         while (i < size)
         {
@@ -547,7 +546,7 @@
                 // First deal with what might come before the newly added one.
                 if (comparator.compare(starts[i], start) < 0)
                 {
-                    Slice.Bound newEnd = start.invert();
+                    ClusteringBound newEnd = start.invert();
                     if (!Slice.isEmpty(comparator, starts[i], newEnd))
                     {
                         addInternal(i, starts[i], start.invert(), markedAts[i], delTimes[i]);
@@ -595,7 +594,7 @@
                     // one to reflect the not overwritten parts. We're then done.
                     addInternal(i, start, end, markedAt, delTime);
                     i++;
-                    Slice.Bound newStart = end.invert();
+                    ClusteringBound newStart = end.invert();
                     if (!Slice.isEmpty(comparator, newStart, ends[i]))
                     {
                         setInternal(i, newStart, ends[i], markedAts[i], delTimes[i]);
@@ -617,7 +616,7 @@
                         addInternal(i, start, end, markedAt, delTime);
                         return;
                     }
-                    Slice.Bound newEnd = starts[i].invert();
+                    ClusteringBound newEnd = starts[i].invert();
                     if (!Slice.isEmpty(comparator, start, newEnd))
                     {
                         addInternal(i, start, newEnd, markedAt, delTime);
@@ -649,7 +648,7 @@
     /*
      * Adds the new tombstone at index i, growing and/or moving elements to make room for it.
      */
-    private void addInternal(int i, Slice.Bound start, Slice.Bound end, long markedAt, int delTime)
+    private void addInternal(int i, ClusteringBound start, ClusteringBound end, long markedAt, int delTime)
     {
         assert i >= 0;
 
@@ -688,12 +687,12 @@
         delTimes = grow(delTimes, size, newLength, i);
     }
 
-    private static Slice.Bound[] grow(Slice.Bound[] a, int size, int newLength, int i)
+    private static ClusteringBound[] grow(ClusteringBound[] a, int size, int newLength, int i)
     {
         if (i < 0 || i >= size)
             return Arrays.copyOf(a, newLength);
 
-        Slice.Bound[] newA = new Slice.Bound[newLength];
+        ClusteringBound[] newA = new ClusteringBound[newLength];
         System.arraycopy(a, 0, newA, 0, i);
         System.arraycopy(a, i, newA, i+1, size - i);
         return newA;
@@ -738,7 +737,7 @@
         starts[i] = null;
     }
 
-    private void setInternal(int i, Slice.Bound start, Slice.Bound end, long markedAt, int delTime)
+    private void setInternal(int i, ClusteringBound start, ClusteringBound end, long markedAt, int delTime)
     {
         if (starts[i] != null)
             boundaryHeapSize -= starts[i].unsharedHeapSize() + ends[i].unsharedHeapSize();
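
The switch from Slice.Bound to ClusteringBound is mechanical, and the normalization described in the class comment applies unchanged. A hedged sketch of that example against the new signatures, assuming a single int clustering column and the ByteBuffer varargs overloads of inclusiveStartOf/inclusiveEndOf:

    ClusteringComparator comparator = new ClusteringComparator(Int32Type.instance);
    RangeTombstoneList list = new RangeTombstoneList(comparator, 2);
    long t1 = 1, t2 = 2;
    int nowInSec = FBUtilities.nowInSeconds();

    list.add(ClusteringBound.inclusiveStartOf(ByteBufferUtil.bytes(0)),
             ClusteringBound.inclusiveEndOf(ByteBufferUtil.bytes(10)), t1, nowInSec);   // [0, 10]@t1
    list.add(ClusteringBound.inclusiveStartOf(ByteBufferUtil.bytes(5)),
             ClusteringBound.inclusiveEndOf(ByteBufferUtil.bytes(15)), t2, nowInSec);   // [5, 15]@t2, t2 > t1
    // per the class comment, the list now holds [0, 5]@t1 and [5, 15]@t2: the higher markedAt wins the overlap
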
diff --git a/src/java/org/apache/cassandra/db/ReadCommand.java b/src/java/org/apache/cassandra/db/ReadCommand.java
index 36969f8..68c9e3b 100644
--- a/src/java/org/apache/cassandra/db/ReadCommand.java
+++ b/src/java/org/apache/cassandra/db/ReadCommand.java
@@ -28,8 +28,10 @@
 import org.apache.cassandra.config.*;
 import org.apache.cassandra.cql3.Operator;
 import org.apache.cassandra.db.filter.*;
+import org.apache.cassandra.db.monitoring.MonitorableImpl;
 import org.apache.cassandra.db.partitions.*;
 import org.apache.cassandra.db.rows.*;
+import org.apache.cassandra.db.transform.StoppingTransformation;
 import org.apache.cassandra.db.transform.Transformation;
 import org.apache.cassandra.dht.AbstractBounds;
 import org.apache.cassandra.index.Index;
@@ -46,6 +48,7 @@
 import org.apache.cassandra.service.ClientWarn;
 import org.apache.cassandra.tracing.Tracing;
 import org.apache.cassandra.utils.ByteBufferUtil;
+import org.apache.cassandra.utils.FBUtilities;
 import org.apache.cassandra.utils.Pair;
 
 /**
@@ -54,10 +57,10 @@
  * <p>
  * This contains all the informations needed to do a local read.
  */
-public abstract class ReadCommand implements ReadQuery
+public abstract class ReadCommand extends MonitorableImpl implements ReadQuery
 {
+    private static final int TEST_ITERATION_DELAY_MILLIS = Integer.valueOf(System.getProperty("cassandra.test.read_iteration_delay_ms", "0"));
     protected static final Logger logger = LoggerFactory.getLogger(ReadCommand.class);
-
     public static final IVersionedSerializer<ReadCommand> serializer = new Serializer();
 
     // For READ verb: will either dispatch on 'serializer' for 3.0 or 'legacyReadCommandSerializer' for earlier version.
@@ -309,7 +312,7 @@
      */
     public abstract ReadCommand copy();
 
-    protected abstract UnfilteredPartitionIterator queryStorage(ColumnFamilyStore cfs, ReadOrderGroup orderGroup);
+    protected abstract UnfilteredPartitionIterator queryStorage(ColumnFamilyStore cfs, ReadExecutionController executionController);
 
     protected abstract int oldestUnrepairedTombstone();
 
@@ -367,13 +370,13 @@
     /**
      * Executes this command on the local host.
      *
-     * @param orderGroup the operation group spanning this command
+     * @param executionController the execution controller spanning this command
      *
      * @return an iterator over the result of executing this command locally.
      */
     @SuppressWarnings("resource") // The result iterator is closed upon exceptions (we know it's fine to potentially not close the intermediary
                                   // iterators created inside the try as long as we do close the original resultIterator), or by closing the result.
-    public UnfilteredPartitionIterator executeLocally(ReadOrderGroup orderGroup)
+    public UnfilteredPartitionIterator executeLocally(ReadExecutionController executionController)
     {
         long startTimeNanos = System.nanoTime();
 
@@ -391,11 +394,12 @@
         }
 
         UnfilteredPartitionIterator resultIterator = searcher == null
-                                         ? queryStorage(cfs, orderGroup)
-                                         : searcher.search(orderGroup);
+                                         ? queryStorage(cfs, executionController)
+                                         : searcher.search(executionController);
 
         try
         {
+            resultIterator = withStateTracking(resultIterator);
             resultIterator = withMetricsRecording(withoutPurgeableTombstones(resultIterator, cfs), cfs.metric, startTimeNanos);
 
             // If we've used a 2ndary index, we know the result already satisfy the primary expression used, so
@@ -419,14 +423,14 @@
 
     protected abstract void recordLatency(TableMetrics metric, long latencyNanos);
 
-    public PartitionIterator executeInternal(ReadOrderGroup orderGroup)
+    public PartitionIterator executeInternal(ReadExecutionController controller)
     {
-        return UnfilteredPartitionIterators.filter(executeLocally(orderGroup), nowInSec());
+        return UnfilteredPartitionIterators.filter(executeLocally(controller), nowInSec());
     }
 
-    public ReadOrderGroup startOrderGroup()
+    public ReadExecutionController executionController()
     {
-        return ReadOrderGroup.forCommand(this);
+        return ReadExecutionController.forCommand(this);
     }
 
     /**
@@ -515,6 +519,55 @@
         return Transformation.apply(iter, new MetricRecording());
     }
 
+    protected class CheckForAbort extends StoppingTransformation<BaseRowIterator<?>>
+    {
+        protected BaseRowIterator<?> applyToPartition(BaseRowIterator partition)
+        {
+            if (maybeAbort())
+            {
+                partition.close();
+                return null;
+            }
+
+            return partition;
+        }
+
+        protected Row applyToRow(Row row)
+        {
+            return maybeAbort() ? null : row;
+        }
+
+        private boolean maybeAbort()
+        {
+            if (TEST_ITERATION_DELAY_MILLIS > 0)
+                maybeDelayForTesting();
+
+            if (isAborted())
+            {
+                stop();
+                return true;
+            }
+
+            return false;
+        }
+    }
+
+    protected UnfilteredPartitionIterator withStateTracking(UnfilteredPartitionIterator iter)
+    {
+        return Transformation.apply(iter, new CheckForAbort());
+    }
+
+    protected UnfilteredRowIterator withStateTracking(UnfilteredRowIterator iter)
+    {
+        return Transformation.apply(iter, new CheckForAbort());
+    }
+
+    private void maybeDelayForTesting()
+    {
+        if (!metadata.ksName.startsWith("system"))
+            FBUtilities.sleepQuietly(TEST_ITERATION_DELAY_MILLIS);
+    }
+
     /**
      * Creates a message for this command.
      */
@@ -564,6 +617,12 @@
         return sb.toString();
     }
 
+    // Monitorable interface
+    public String name()
+    {
+        return toCQLString();
+    }
+
     private static class Serializer implements IVersionedSerializer<ReadCommand>
     {
         private static int digestFlag(boolean isDigest)
@@ -1016,7 +1075,7 @@
             // slice filter's stop.
             DataRange.Paging pagingRange = (DataRange.Paging) rangeCommand.dataRange();
             Clustering lastReturned = pagingRange.getLastReturned();
-            Slice.Bound newStart = Slice.Bound.exclusiveStartOf(lastReturned);
+            ClusteringBound newStart = ClusteringBound.exclusiveStartOf(lastReturned);
             Slice lastSlice = filter.requestedSlices().get(filter.requestedSlices().size() - 1);
             ByteBufferUtil.writeWithShortLength(LegacyLayout.encodeBound(metadata, newStart, true), out);
             ByteBufferUtil.writeWithShortLength(LegacyLayout.encodeClustering(metadata, lastSlice.end().clustering()), out);
@@ -1332,7 +1391,7 @@
                 // See CASSANDRA-11087.
                 if (metadata.isStaticCompactTable() && cellName.clustering.equals(Clustering.STATIC_CLUSTERING))
                 {
-                    clusterings.add(new Clustering(cellName.column.name.bytes));
+                    clusterings.add(Clustering.make(cellName.column.name.bytes));
                     selectionBuilder.add(metadata.compactValueColumn());
                 }
                 else
@@ -1484,7 +1543,7 @@
         static long serializedStaticSliceSize(CFMetaData metadata)
         {
             // unlike serializeStaticSlice(), but we don't care about reversal for size calculations
-            ByteBuffer sliceStart = LegacyLayout.encodeBound(metadata, Slice.Bound.BOTTOM, false);
+            ByteBuffer sliceStart = LegacyLayout.encodeBound(metadata, ClusteringBound.BOTTOM, false);
             long size = ByteBufferUtil.serializedSizeWithShortLength(sliceStart);
 
             size += TypeSizes.sizeof((short) (metadata.comparator.size() * 3 + 2));
@@ -1512,7 +1571,7 @@
             // slice finish after we've written the static slice start
             if (!isReversed)
             {
-                ByteBuffer sliceStart = LegacyLayout.encodeBound(metadata, Slice.Bound.BOTTOM, false);
+                ByteBuffer sliceStart = LegacyLayout.encodeBound(metadata, ClusteringBound.BOTTOM, false);
                 ByteBufferUtil.writeWithShortLength(sliceStart, out);
             }
 
@@ -1528,7 +1587,7 @@
 
             if (isReversed)
             {
-                ByteBuffer sliceStart = LegacyLayout.encodeBound(metadata, Slice.Bound.BOTTOM, false);
+                ByteBuffer sliceStart = LegacyLayout.encodeBound(metadata, ClusteringBound.BOTTOM, false);
                 ByteBufferUtil.writeWithShortLength(sliceStart, out);
             }
         }
@@ -1650,7 +1709,7 @@
             {
                 Slices.Builder slicesBuilder = new Slices.Builder(metadata.comparator);
                 for (Clustering clustering : requestedRows)
-                    slicesBuilder.add(Slice.Bound.inclusiveStartOf(clustering), Slice.Bound.inclusiveEndOf(clustering));
+                    slicesBuilder.add(ClusteringBound.inclusiveStartOf(clustering), ClusteringBound.inclusiveEndOf(clustering));
                 slices = slicesBuilder.build();
             }
 
diff --git a/src/java/org/apache/cassandra/db/ReadCommandVerbHandler.java b/src/java/org/apache/cassandra/db/ReadCommandVerbHandler.java
index 9cde8dc..e2a9678 100644
--- a/src/java/org/apache/cassandra/db/ReadCommandVerbHandler.java
+++ b/src/java/org/apache/cassandra/db/ReadCommandVerbHandler.java
@@ -41,15 +41,24 @@
         }
 
         ReadCommand command = message.payload;
+        command.setMonitoringTime(message.constructionTime, message.getTimeout());
+
         ReadResponse response;
-        try (ReadOrderGroup opGroup = command.startOrderGroup(); UnfilteredPartitionIterator iterator = command.executeLocally(opGroup))
+        try (ReadExecutionController executionController = command.executionController();
+             UnfilteredPartitionIterator iterator = command.executeLocally(executionController))
         {
             response = command.createResponse(iterator);
         }
 
-        MessageOut<ReadResponse> reply = new MessageOut<>(MessagingService.Verb.REQUEST_RESPONSE, response, serializer());
+        if (!command.complete())
+        {
+            Tracing.trace("Discarding partial response to {} (timed out)", message.from);
+            MessagingService.instance().incrementDroppedMessages(message, System.currentTimeMillis() - message.constructionTime.timestamp);
+            return;
+        }
 
         Tracing.trace("Enqueuing response to {}", message.from);
+        MessageOut<ReadResponse> reply = new MessageOut<>(MessagingService.Verb.REQUEST_RESPONSE, response, serializer());
         MessagingService.instance().sendReply(reply, id, message.from);
     }
 }
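
The verb handler is where the new monitoring pieces meet: setMonitoringTime() opens the abort window, the CheckForAbort transformation inside ReadCommand stops iteration once isAborted() turns true, and complete() decides whether the (possibly partial) response is still worth sending. As a purely hypothetical illustration of the same StoppingTransformation pattern, not part of this patch, a transformation that cuts a row iteration off after a fixed number of rows could look like:

    class StopAfter extends StoppingTransformation<UnfilteredRowIterator>
    {
        private int remaining;
        StopAfter(int limit) { this.remaining = limit; }

        @Override
        protected Row applyToRow(Row row)
        {
            if (remaining-- <= 0)
            {
                stop();            // same mechanism CheckForAbort uses when isAborted() is true
                return null;
            }
            return row;
        }
    }
    // applied just like withStateTracking(): Transformation.apply(iter, new StopAfter(1000));
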
diff --git a/src/java/org/apache/cassandra/db/ReadExecutionController.java b/src/java/org/apache/cassandra/db/ReadExecutionController.java
new file mode 100644
index 0000000..7ddc8df
--- /dev/null
+++ b/src/java/org/apache/cassandra/db/ReadExecutionController.java
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.db;
+
+import org.apache.cassandra.config.CFMetaData;
+import org.apache.cassandra.index.Index;
+import org.apache.cassandra.utils.concurrent.OpOrder;
+
+public class ReadExecutionController implements AutoCloseable
+{
+    // For every read
+    private final OpOrder.Group baseOp;
+    private final CFMetaData baseMetadata; // kept to sanity check that we have taken the op order on the right table
+
+    // For index reads
+    private final ReadExecutionController indexController;
+    private final OpOrder.Group writeOp;
+
+    private ReadExecutionController(OpOrder.Group baseOp, CFMetaData baseMetadata, ReadExecutionController indexController, OpOrder.Group writeOp)
+    {
+        // We can have baseOp == null, but only when empty() is called, in which case the controller will never really be used
+        // (which validForReadOn should ensure). But if it's not null, we should have the proper metadata too.
+        assert (baseOp == null) == (baseMetadata == null);
+        this.baseOp = baseOp;
+        this.baseMetadata = baseMetadata;
+        this.indexController = indexController;
+        this.writeOp = writeOp;
+    }
+
+    public ReadExecutionController indexReadController()
+    {
+        return indexController;
+    }
+
+    public OpOrder.Group writeOpOrderGroup()
+    {
+        return writeOp;
+    }
+
+    public boolean validForReadOn(ColumnFamilyStore cfs)
+    {
+        return baseOp != null && cfs.metadata.cfId.equals(baseMetadata.cfId);
+    }
+
+    public static ReadExecutionController empty()
+    {
+        return new ReadExecutionController(null, null, null, null);
+    }
+
+    /**
+     * Creates an execution controller for the provided command.
+     * <p>
+     * Note: no code should use this method outside of {@link ReadCommand#executionController} (for
+     * consistency's sake); use that latter method if you need an execution controller.
+     *
+     * @param command the command for which to create a controller.
+     * @return the created execution controller, which must always be closed.
+     */
+    @SuppressWarnings("resource") // ops closed during controller close
+    static ReadExecutionController forCommand(ReadCommand command)
+    {
+        ColumnFamilyStore baseCfs = Keyspace.openAndGetStore(command.metadata());
+        ColumnFamilyStore indexCfs = maybeGetIndexCfs(baseCfs, command);
+
+        if (indexCfs == null)
+        {
+            return new ReadExecutionController(baseCfs.readOrdering.start(), baseCfs.metadata, null, null);
+        }
+        else
+        {
+            OpOrder.Group baseOp = null, writeOp = null;
+            ReadExecutionController indexController = null;
+            // OpOrder.start() shouldn't fail, but better safe than sorry.
+            try
+            {
+                baseOp = baseCfs.readOrdering.start();
+                indexController = new ReadExecutionController(indexCfs.readOrdering.start(), indexCfs.metadata, null, null);
+                // TODO: this should perhaps not open and maintain a writeOp for the full duration, but instead only *try* to delete stale entries, without blocking if there's no room
+                // as it stands, we open a writeOp and keep it open for the duration to ensure that should this CF get flushed to make room we don't block the reclamation of any room being made
+                writeOp = Keyspace.writeOrder.start();
+                return new ReadExecutionController(baseOp, baseCfs.metadata, indexController, writeOp);
+            }
+            catch (RuntimeException e)
+            {
+                // Note that we must have writeOp == null since the ReadExecutionController ctor can't fail
+                assert writeOp == null;
+                try
+                {
+                    if (baseOp != null)
+                        baseOp.close();
+                }
+                finally
+                {
+                    if (indexController != null)
+                        indexController.close();
+                }
+                throw e;
+            }
+        }
+    }
+
+    private static ColumnFamilyStore maybeGetIndexCfs(ColumnFamilyStore baseCfs, ReadCommand command)
+    {
+        Index index = command.getIndex(baseCfs);
+        return index == null ? null : index.getBackingTable().orElse(null);
+    }
+
+    public void close()
+    {
+        try
+        {
+            if (baseOp != null)
+                baseOp.close();
+        }
+        finally
+        {
+            if (indexController != null)
+            {
+                try
+                {
+                    indexController.close();
+                }
+                finally
+                {
+                    writeOp.close();
+                }
+            }
+        }
+    }
+}
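
The intended lifecycle mirrors what ReadCommandVerbHandler now does: obtain the controller from the command itself and keep it open, via try-with-resources, for as long as the local read iterates. A minimal sketch of that pattern:

    try (ReadExecutionController controller = command.executionController();
         UnfilteredPartitionIterator iterator = command.executeLocally(controller))
    {
        // consume the iterator; the base (and, for index reads, index/write) op-order groups stay open until close()
    }
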
diff --git a/src/java/org/apache/cassandra/db/ReadOrderGroup.java b/src/java/org/apache/cassandra/db/ReadOrderGroup.java
deleted file mode 100644
index 416a2b8..0000000
--- a/src/java/org/apache/cassandra/db/ReadOrderGroup.java
+++ /dev/null
@@ -1,129 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.cassandra.db;
-
-import org.apache.cassandra.index.Index;
-import org.apache.cassandra.utils.concurrent.OpOrder;
-
-public class ReadOrderGroup implements AutoCloseable
-{
-    // For every reads
-    private final OpOrder.Group baseOp;
-
-    // For index reads
-    private final OpOrder.Group indexOp;
-    private final OpOrder.Group writeOp;
-
-    private ReadOrderGroup(OpOrder.Group baseOp, OpOrder.Group indexOp, OpOrder.Group writeOp)
-    {
-        this.baseOp = baseOp;
-        this.indexOp = indexOp;
-        this.writeOp = writeOp;
-    }
-
-    public OpOrder.Group baseReadOpOrderGroup()
-    {
-        return baseOp;
-    }
-
-    public OpOrder.Group indexReadOpOrderGroup()
-    {
-        return indexOp;
-    }
-
-    public OpOrder.Group writeOpOrderGroup()
-    {
-        return writeOp;
-    }
-
-    public static ReadOrderGroup emptyGroup()
-    {
-        return new ReadOrderGroup(null, null, null);
-    }
-
-    @SuppressWarnings("resource") // ops closed during group close
-    public static ReadOrderGroup forCommand(ReadCommand command)
-    {
-        ColumnFamilyStore baseCfs = Keyspace.openAndGetStore(command.metadata());
-        ColumnFamilyStore indexCfs = maybeGetIndexCfs(baseCfs, command);
-
-        if (indexCfs == null)
-        {
-            return new ReadOrderGroup(baseCfs.readOrdering.start(), null, null);
-        }
-        else
-        {
-            OpOrder.Group baseOp = null, indexOp = null, writeOp = null;
-            // OpOrder.start() shouldn't fail, but better safe than sorry.
-            try
-            {
-                baseOp = baseCfs.readOrdering.start();
-                indexOp = indexCfs.readOrdering.start();
-                // TODO: this should perhaps not open and maintain a writeOp for the full duration, but instead only *try* to delete stale entries, without blocking if there's no room
-                // as it stands, we open a writeOp and keep it open for the duration to ensure that should this CF get flushed to make room we don't block the reclamation of any room being made
-                writeOp = Keyspace.writeOrder.start();
-                return new ReadOrderGroup(baseOp, indexOp, writeOp);
-            }
-            catch (RuntimeException e)
-            {
-                // Note that must have writeOp == null since ReadOrderGroup ctor can't fail
-                assert writeOp == null;
-                try
-                {
-                    if (baseOp != null)
-                        baseOp.close();
-                }
-                finally
-                {
-                    if (indexOp != null)
-                        indexOp.close();
-                }
-                throw e;
-            }
-        }
-    }
-
-    private static ColumnFamilyStore maybeGetIndexCfs(ColumnFamilyStore baseCfs, ReadCommand command)
-    {
-        Index index = command.getIndex(baseCfs);
-        return index == null ? null : index.getBackingTable().orElse(null);
-    }
-
-    public void close()
-    {
-        try
-        {
-            if (baseOp != null)
-                baseOp.close();
-        }
-        finally
-        {
-            if (indexOp != null)
-            {
-                try
-                {
-                    indexOp.close();
-                }
-                finally
-                {
-                    writeOp.close();
-                }
-            }
-        }
-    }
-}
diff --git a/src/java/org/apache/cassandra/db/ReadQuery.java b/src/java/org/apache/cassandra/db/ReadQuery.java
index 178ca7c..ba7b893 100644
--- a/src/java/org/apache/cassandra/db/ReadQuery.java
+++ b/src/java/org/apache/cassandra/db/ReadQuery.java
@@ -35,9 +35,9 @@
 {
     ReadQuery EMPTY = new ReadQuery()
     {
-        public ReadOrderGroup startOrderGroup()
+        public ReadExecutionController executionController()
         {
-            return ReadOrderGroup.emptyGroup();
+            return ReadExecutionController.empty();
         }
 
         public PartitionIterator execute(ConsistencyLevel consistency, ClientState clientState) throws RequestExecutionException
@@ -45,7 +45,7 @@
             return EmptyIterators.partition();
         }
 
-        public PartitionIterator executeInternal(ReadOrderGroup orderGroup)
+        public PartitionIterator executeInternal(ReadExecutionController controller)
         {
             return EmptyIterators.partition();
         }
@@ -86,9 +86,9 @@
      * The returned object <b>must</b> be closed on all paths and it is thus strongly advised to
      * use it in a try-with-resources construction.
      *
-     * @return a newly started order group for this {@code ReadQuery}.
+     * @return a newly started execution controller for this {@code ReadQuery}.
      */
-    public ReadOrderGroup startOrderGroup();
+    public ReadExecutionController executionController();
 
     /**
      * Executes the query at the provided consistency level.
@@ -104,10 +104,10 @@
     /**
      * Execute the query for internal queries (that is, it basically executes the query locally).
      *
-     * @param orderGroup the {@code ReadOrderGroup} protecting the read.
+     * @param controller the {@code ReadExecutionController} protecting the read.
      * @return the result of the query.
      */
-    public PartitionIterator executeInternal(ReadOrderGroup orderGroup);
+    public PartitionIterator executeInternal(ReadExecutionController controller);
 
     /**
      * Returns a pager for the query.
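
A minimal sketch of the internal-query path defined by this interface, assuming a ReadQuery instance named query (the variable and the loop body are illustrative, not part of the patch):

    try (ReadExecutionController controller = query.executionController();
         PartitionIterator partitions = query.executeInternal(controller))
    {
        while (partitions.hasNext())
        {
            try (RowIterator partition = partitions.next())
            {
                // process the rows of this partition ...
            }
        }
    }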
diff --git a/src/java/org/apache/cassandra/db/ReadResponse.java b/src/java/org/apache/cassandra/db/ReadResponse.java
index 2304cb4..05ebd00 100644
--- a/src/java/org/apache/cassandra/db/ReadResponse.java
+++ b/src/java/org/apache/cassandra/db/ReadResponse.java
@@ -284,7 +284,7 @@
                     // inclusive on both ends. If we have exclusive slice ends, we need to filter the results here.
                     UnfilteredRowIterator iterator;
                     if (!command.metadata().isCompound())
-                        iterator = filter.filter(partition.sliceableUnfilteredIterator(command.columnFilter(), filter.isReversed()));
+                        iterator = partition.unfilteredIterator(command.columnFilter(), filter.getSlices(command.metadata()), filter.isReversed());
                     else
                         iterator = partition.unfilteredIterator(command.columnFilter(), Slices.ALL, filter.isReversed());
 
diff --git a/src/java/org/apache/cassandra/db/RowIndexEntry.java b/src/java/org/apache/cassandra/db/RowIndexEntry.java
index 4e2f063..dd1fdb7 100644
--- a/src/java/org/apache/cassandra/db/RowIndexEntry.java
+++ b/src/java/org/apache/cassandra/db/RowIndexEntry.java
@@ -18,26 +18,135 @@
 package org.apache.cassandra.db;
 
 import java.io.IOException;
-import java.util.ArrayList;
+import java.nio.ByteBuffer;
 import java.util.Arrays;
-import java.util.Collections;
 import java.util.List;
 
-import com.google.common.primitives.Ints;
-
+import com.codahale.metrics.Histogram;
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.cache.IMeasurableMemory;
-import org.apache.cassandra.io.sstable.IndexHelper;
+import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.io.ISerializer;
+import org.apache.cassandra.io.sstable.IndexInfo;
 import org.apache.cassandra.io.sstable.format.Version;
 import org.apache.cassandra.io.util.DataInputPlus;
+import org.apache.cassandra.io.util.DataOutputBuffer;
 import org.apache.cassandra.io.util.DataOutputPlus;
-import org.apache.cassandra.io.util.FileUtils;
+import org.apache.cassandra.io.util.FileDataInput;
+import org.apache.cassandra.io.util.SegmentedFile;
+import org.apache.cassandra.io.util.TrackedDataInputPlus;
+import org.apache.cassandra.metrics.DefaultNameFactory;
+import org.apache.cassandra.metrics.MetricNameFactory;
 import org.apache.cassandra.utils.ObjectSizes;
+import org.apache.cassandra.utils.vint.VIntCoding;
+import org.github.jamm.Unmetered;
 
+import static org.apache.cassandra.metrics.CassandraMetricsRegistry.Metrics;
+
+/**
+ * Binary format of {@code RowIndexEntry} is defined as follows:
+ * {@code
+ * (long) position (64 bit long, vint encoded)
+ *  (int) serialized size of data that follows (32 bit int, vint encoded)
+ * -- following for indexed entries only (so serialized size > 0)
+ *  (int) DeletionTime.localDeletionTime
+ * (long) DeletionTime.markedForDeletionAt
+ *  (int) number of IndexInfo objects (32 bit int, vint encoded)
+ *    (*) serialized IndexInfo objects, see below
+ *    (*) offsets of serialized IndexInfo objects, since version "ma" (3.0)
+ *        Each IndexInfo object's offset is relative to the first IndexInfo object.
+ * }
+ * <p>
+ * See {@link IndexInfo} for a description of the serialized format.
+ * </p>
+ *
+ * <p>
+ * For each partition, the layout of the index file looks like this:
+ * </p>
+ * <ol>
+ *     <li>partition key - prefixed with {@code short} length</li>
+ *     <li>serialized {@code RowIndexEntry} objects</li>
+ * </ol>
+ *
+ * <p>
+ *     Generally, we distinguish between index entries that have <i>index
+ *     samples</i> (list of {@link IndexInfo} objects) and those that don't.
+ *     For each <i>portion</i> of data for a single partition in the data file,
+ *     an index sample is created. The size of that <i>portion</i> is defined
+ *     by {@link org.apache.cassandra.config.Config#column_index_size_in_kb}.
+ * </p>
+ * <p>
+ *     Index entries with fewer than two index samples will just store the
+ *     position in the data file.
+ * </p>
+ * <p>
+ *     Note: in the context of index entries, <i>legacy</i> sstables are those sstable formats that
+ *     do <i>not</i> have an offsets table for the index samples ({@link IndexInfo}
+ *     objects), i.e. sstables created by Cassandra versions
+ *     earlier than 3.0.
+ * </p>
+ * <p>
+ *     For index entries with index samples we store the index samples
+ *     ({@link IndexInfo} objects). The bigger the partition, the more
+ *     index samples are created. Since a huge number of index samples
+ *     will "pollute" the heap and cause high GC pressure, Cassandra 3.6
+ *     (CASSANDRA-11206) distinguishes between index entries with an
+ *     "acceptable" number of index samples per partition and those
+ *     with an "enormous" number of index samples. The threshold
+ *     is controlled by the configuration parameter
+ *     {@link org.apache.cassandra.config.Config#column_index_cache_size_in_kb}.
+ *     Index entries with a total serialized size of index samples up to
+ *     {@code column_index_cache_size_in_kb} will be held in an array.
+ *     Index entries exceeding that value will always be accessed from
+ *     disk.
+ * </p>
+ * <p>
+ *     This results in these classes:
+ * </p>
+ * <ul>
+ *     <li>{@link RowIndexEntry} just stores the offset in the data file.</li>
+ *     <li>{@link IndexedEntry} is for index entries with index samples,
+ *     used for both current and legacy sstables, whose serialized index samples do not exceed
+ *     {@link org.apache.cassandra.config.Config#column_index_cache_size_in_kb}.</li>
+ *     <li>{@link ShallowIndexedEntry} is for index entries with index samples
+ *     that exceed {@link org.apache.cassandra.config.Config#column_index_cache_size_in_kb}
+ *     for sstables with an offset table to the index samples.</li>
+ *     <li>{@link LegacyShallowIndexedEntry} is for index entries with index samples
+ *     that exceed {@link org.apache.cassandra.config.Config#column_index_cache_size_in_kb}
+ *     but for legacy sstables.</li>
+ * </ul>
+ * <p>
+ *     Since access to index samples on disk (obviously) requires some file
+ *     reader, that functionality is encapsulated in implementations of
+ *     {@link IndexInfoRetriever}. There is an implementation to access
+ *     index samples of legacy sstables (without the offsets table), and
+ *     an implementation to access sstables with an offsets table.
+ * </p>
+ * <p>
+ *     As of Cassandra 3.x, we still support reading from <i>legacy</i> sstables,
+ *     i.e. sstables created by Cassandra &lt; 3.0 (see {@link org.apache.cassandra.io.sstable.format.big.BigFormat}).
+ * </p>
+ *
+ */
 public class RowIndexEntry<T> implements IMeasurableMemory
 {
     private static final long EMPTY_SIZE = ObjectSizes.measure(new RowIndexEntry(0));
 
+    // constants for type of row-index-entry as serialized for saved-cache
+    static final int CACHE_NOT_INDEXED = 0;
+    static final int CACHE_INDEXED = 1;
+    static final int CACHE_INDEXED_SHALLOW = 2;
+
+    static final Histogram indexEntrySizeHistogram;
+    static final Histogram indexInfoCountHistogram;
+    static final Histogram indexInfoGetsHistogram;
+    static {
+        MetricNameFactory factory = new DefaultNameFactory("Index", "RowIndexEntry");
+        indexEntrySizeHistogram = Metrics.histogram(factory.createMetricName("IndexedEntrySize"), false);
+        indexInfoCountHistogram = Metrics.histogram(factory.createMetricName("IndexInfoCount"), false);
+        indexInfoGetsHistogram = Metrics.histogram(factory.createMetricName("IndexInfoGets"), false);
+    }
+
     public final long position;
 
     public RowIndexEntry(long position)
@@ -45,32 +154,18 @@
         this.position = position;
     }
 
-    protected int promotedSize(IndexHelper.IndexInfo.Serializer idxSerializer)
-    {
-        return 0;
-    }
-
-    public static RowIndexEntry<IndexHelper.IndexInfo> create(long position, DeletionTime deletionTime, ColumnIndex index)
-    {
-        assert index != null;
-        assert deletionTime != null;
-
-        // we only consider the columns summary when determining whether to create an IndexedEntry,
-        // since if there are insufficient columns to be worth indexing we're going to seek to
-        // the beginning of the row anyway, so we might as well read the tombstone there as well.
-        if (index.columnsIndex.size() > 1)
-            return new IndexedEntry(position, deletionTime, index.partitionHeaderLength, index.columnsIndex);
-        else
-            return new RowIndexEntry<>(position);
-    }
-
     /**
      * @return true if this index entry contains the row-level tombstone and column summary.  Otherwise,
      * caller should fetch these from the row header.
      */
     public boolean isIndexed()
     {
-        return !columnsIndex().isEmpty();
+        return columnsIndexCount() > 1;
+    }
+
+    public boolean indexOnHeap()
+    {
+        return false;
     }
 
     public DeletionTime deletionTime()
@@ -79,15 +174,6 @@
     }
 
     /**
-     * @return the offset to the start of the header information for this row.
-     * For some formats this may not be the start of the row.
-     */
-    public long headerOffset()
-    {
-        return 0;
-    }
-
-    /**
      * The length of the row header (partition key, partition deletion and static row).
      * This value is only provided for indexed entries and this method will throw
      * {@code UnsupportedOperationException} if {@code !isIndexed()}.
@@ -97,9 +183,9 @@
         throw new UnsupportedOperationException();
     }
 
-    public List<T> columnsIndex()
+    public int columnsIndexCount()
     {
-        return Collections.emptyList();
+        return 0;
     }
 
     public long unsharedHeapSize()
@@ -107,129 +193,178 @@
         return EMPTY_SIZE;
     }
 
-    public interface IndexSerializer<T>
+    /**
+     * @param dataFilePosition  position of the partition in the {@link org.apache.cassandra.io.sstable.Component.Type#DATA} file
+     * @param indexFilePosition position in the {@link org.apache.cassandra.io.sstable.Component.Type#PRIMARY_INDEX} of the {@link RowIndexEntry}
+     * @param deletionTime      deletion time of {@link RowIndexEntry}
+     * @param headerLength      length of the partition header (partition key, partition deletion and static row)
+     * @param columnIndexCount  number of {@link IndexInfo} entries in the {@link RowIndexEntry}
+     * @param indexedPartSize   serialized size of all serialized {@link IndexInfo} objects and their offsets
+     * @param indexSamples      list of {@link IndexInfo} objects (if the total serialized size is less than {@link org.apache.cassandra.config.Config#column_index_cache_size_in_kb})
+     * @param offsets           offsets of the serialized {@link IndexInfo} objects
+     * @param idxInfoSerializer the {@link IndexInfo} serializer
+     */
+    public static RowIndexEntry<IndexInfo> create(long dataFilePosition, long indexFilePosition,
+                                       DeletionTime deletionTime, long headerLength, int columnIndexCount,
+                                       int indexedPartSize,
+                                       List<IndexInfo> indexSamples, int[] offsets,
+                                       ISerializer<IndexInfo> idxInfoSerializer)
     {
-        void serialize(RowIndexEntry<T> rie, DataOutputPlus out) throws IOException;
-        RowIndexEntry<T> deserialize(DataInputPlus in) throws IOException;
-        int serializedSize(RowIndexEntry<T> rie);
+        // If the "partition building code" in BigTableWriter.append() via ColumnIndex returns a list
+        // of IndexInfo objects, which is the case if the serialized size is less than
+        // Config.column_index_cache_size_in_kb, AND we have more than one IndexInfo object, we
+        // construct an IndexedEntry object. (note: indexSamples.size() and columnIndexCount have the same meaning)
+        if (indexSamples != null && indexSamples.size() > 1)
+            return new IndexedEntry(dataFilePosition, deletionTime, headerLength,
+                                    indexSamples.toArray(new IndexInfo[indexSamples.size()]), offsets,
+                                    indexedPartSize, idxInfoSerializer);
+        // Here we have to decide whether the serialized IndexInfo objects exceed
+        // Config.column_index_cache_size_in_kb (the not-exceeding case is covered above).
+        // Such a "big" indexed-entry is represented as a shallow one.
+        if (columnIndexCount > 1)
+            return new ShallowIndexedEntry(dataFilePosition, indexFilePosition,
+                                           deletionTime, headerLength, columnIndexCount,
+                                           indexedPartSize, idxInfoSerializer);
+        // Last case is that there are no index samples.
+        return new RowIndexEntry<>(dataFilePosition);
     }
 
-    public static class Serializer implements IndexSerializer<IndexHelper.IndexInfo>
+    public IndexInfoRetriever openWithIndex(SegmentedFile indexFile)
     {
-        private final IndexHelper.IndexInfo.Serializer idxSerializer;
+        return null;
+    }
+
+    public interface IndexSerializer<T>
+    {
+        void serialize(RowIndexEntry<T> rie, DataOutputPlus out, ByteBuffer indexInfo) throws IOException;
+        RowIndexEntry<T> deserialize(DataInputPlus in, long indexFilePosition) throws IOException;
+        void serializeForCache(RowIndexEntry<T> rie, DataOutputPlus out) throws IOException;
+        RowIndexEntry<T> deserializeForCache(DataInputPlus in) throws IOException;
+
+        long deserializePositionAndSkip(DataInputPlus in) throws IOException;
+
+        ISerializer<T> indexInfoSerializer();
+    }
+
+    public static final class Serializer implements IndexSerializer<IndexInfo>
+    {
+        private final IndexInfo.Serializer idxInfoSerializer;
         private final Version version;
 
         public Serializer(CFMetaData metadata, Version version, SerializationHeader header)
         {
-            this.idxSerializer = new IndexHelper.IndexInfo.Serializer(metadata, version, header);
+            this.idxInfoSerializer = metadata.serializers().indexInfoSerializer(version, header);
             this.version = version;
         }
 
-        public void serialize(RowIndexEntry<IndexHelper.IndexInfo> rie, DataOutputPlus out) throws IOException
+        public IndexInfo.Serializer indexInfoSerializer()
+        {
+            return idxInfoSerializer;
+        }
+
+        public void serialize(RowIndexEntry<IndexInfo> rie, DataOutputPlus out, ByteBuffer indexInfo) throws IOException
         {
             assert version.storeRows() : "We read old index files but we should never write them";
 
-            out.writeUnsignedVInt(rie.position);
-            out.writeUnsignedVInt(rie.promotedSize(idxSerializer));
+            rie.serialize(out, idxInfoSerializer, indexInfo);
+        }
 
-            if (rie.isIndexed())
+        public void serializeForCache(RowIndexEntry<IndexInfo> rie, DataOutputPlus out) throws IOException
+        {
+            assert version.storeRows();
+
+            rie.serializeForCache(out);
+        }
+
+        public RowIndexEntry<IndexInfo> deserializeForCache(DataInputPlus in) throws IOException
+        {
+            assert version.storeRows();
+
+            long position = in.readUnsignedVInt();
+
+            switch (in.readByte())
             {
-                out.writeUnsignedVInt(rie.headerLength());
-                DeletionTime.serializer.serialize(rie.deletionTime(), out);
-                out.writeUnsignedVInt(rie.columnsIndex().size());
-
-                // Calculate and write the offsets to the IndexInfo objects.
-
-                int[] offsets = new int[rie.columnsIndex().size()];
-
-                if (out.hasPosition())
-                {
-                    // Out is usually a SequentialWriter, so using the file-pointer is fine to generate the offsets.
-                    // A DataOutputBuffer also works.
-                    long start = out.position();
-                    int i = 0;
-                    for (IndexHelper.IndexInfo info : rie.columnsIndex())
-                    {
-                        offsets[i] = i == 0 ? 0 : (int)(out.position() - start);
-                        i++;
-                        idxSerializer.serialize(info, out);
-                    }
-                }
-                else
-                {
-                    // Not sure this branch will ever be needed, but if it is called, it has to calculate the
-                    // serialized sizes instead of simply using the file-pointer.
-                    int i = 0;
-                    int offset = 0;
-                    for (IndexHelper.IndexInfo info : rie.columnsIndex())
-                    {
-                        offsets[i++] = offset;
-                        idxSerializer.serialize(info, out);
-                        offset += idxSerializer.serializedSize(info);
-                    }
-                }
-
-                for (int off : offsets)
-                    out.writeInt(off);
+                case CACHE_NOT_INDEXED:
+                    return new RowIndexEntry<>(position);
+                case CACHE_INDEXED:
+                    return new IndexedEntry(position, in, idxInfoSerializer, version);
+                case CACHE_INDEXED_SHALLOW:
+                    return new ShallowIndexedEntry(position, in, idxInfoSerializer);
+                default:
+                    throw new AssertionError();
             }
         }
 
-        public RowIndexEntry<IndexHelper.IndexInfo> deserialize(DataInputPlus in) throws IOException
+        public static void skipForCache(DataInputPlus in, Version version) throws IOException
+        {
+            assert version.storeRows();
+
+            /* long position = */in.readUnsignedVInt();
+            switch (in.readByte())
+            {
+                case CACHE_NOT_INDEXED:
+                    break;
+                case CACHE_INDEXED:
+                    IndexedEntry.skipForCache(in);
+                    break;
+                case CACHE_INDEXED_SHALLOW:
+                    ShallowIndexedEntry.skipForCache(in);
+                    break;
+                default:
+                    assert false;
+            }
+        }
+
+        public RowIndexEntry<IndexInfo> deserialize(DataInputPlus in, long indexFilePosition) throws IOException
         {
             if (!version.storeRows())
-            {
-                long position = in.readLong();
-
-                int size = in.readInt();
-                if (size > 0)
-                {
-                    DeletionTime deletionTime = DeletionTime.serializer.deserialize(in);
-
-                    int entries = in.readInt();
-                    List<IndexHelper.IndexInfo> columnsIndex = new ArrayList<>(entries);
-
-                    long headerLength = 0L;
-                    for (int i = 0; i < entries; i++)
-                    {
-                        IndexHelper.IndexInfo info = idxSerializer.deserialize(in);
-                        columnsIndex.add(info);
-                        if (i == 0)
-                            headerLength = info.offset;
-                    }
-
-                    return new IndexedEntry(position, deletionTime, headerLength, columnsIndex);
-                }
-                else
-                {
-                    return new RowIndexEntry<>(position);
-                }
-            }
+                return LegacyShallowIndexedEntry.deserialize(in, indexFilePosition, idxInfoSerializer);
 
             long position = in.readUnsignedVInt();
 
             int size = (int)in.readUnsignedVInt();
-            if (size > 0)
-            {
-                long headerLength = in.readUnsignedVInt();
-                DeletionTime deletionTime = DeletionTime.serializer.deserialize(in);
-                int entries = (int)in.readUnsignedVInt();
-                List<IndexHelper.IndexInfo> columnsIndex = new ArrayList<>(entries);
-                for (int i = 0; i < entries; i++)
-                    columnsIndex.add(idxSerializer.deserialize(in));
-
-                in.skipBytesFully(entries * TypeSizes.sizeof(0));
-
-                return new IndexedEntry(position, deletionTime, headerLength, columnsIndex);
-            }
-            else
+            if (size == 0)
             {
                 return new RowIndexEntry<>(position);
             }
+            else
+            {
+                long headerLength = in.readUnsignedVInt();
+                DeletionTime deletionTime = DeletionTime.serializer.deserialize(in);
+                int columnsIndexCount = (int) in.readUnsignedVInt();
+
+                int indexedPartSize = size - serializedSize(deletionTime, headerLength, columnsIndexCount);
+
+                if (size <= DatabaseDescriptor.getColumnIndexCacheSize())
+                {
+                    return new IndexedEntry(position, in, deletionTime, headerLength, columnsIndexCount,
+                                            idxInfoSerializer, version, indexedPartSize);
+                }
+                else
+                {
+                    in.skipBytes(indexedPartSize);
+
+                    return new ShallowIndexedEntry(position,
+                                                   indexFilePosition,
+                                                   deletionTime, headerLength, columnsIndexCount,
+                                                   indexedPartSize, idxInfoSerializer);
+                }
+            }
         }
 
-        // Reads only the data 'position' of the index entry and returns it. Note that this left 'in' in the middle
-        // of reading an entry, so this is only useful if you know what you are doing and in most case 'deserialize'
-        // should be used instead.
+        public long deserializePositionAndSkip(DataInputPlus in) throws IOException
+        {
+            if (!version.storeRows())
+                return LegacyShallowIndexedEntry.deserializePositionAndSkip(in);
+
+            return ShallowIndexedEntry.deserializePositionAndSkip(in);
+        }
+
+        /**
+         * Reads only the data 'position' of the index entry and returns it. Note that this leaves 'in' in the middle
+         * of reading an entry, so this is only useful if you know what you are doing and in most cases 'deserialize'
+         * should be used instead.
+         */
         public static long readPosition(DataInputPlus in, Version version) throws IOException
         {
             return version.storeRows() ? in.readUnsignedVInt() : in.readLong();
@@ -250,51 +385,62 @@
             in.skipBytesFully(size);
         }
 
-        public int serializedSize(RowIndexEntry<IndexHelper.IndexInfo> rie)
+        public static void serializeOffsets(DataOutputBuffer out, int[] indexOffsets, int columnIndexCount) throws IOException
         {
-            assert version.storeRows() : "We read old index files but we should never write them";
-
-            int indexedSize = 0;
-            if (rie.isIndexed())
-            {
-                List<IndexHelper.IndexInfo> index = rie.columnsIndex();
-
-                indexedSize += TypeSizes.sizeofUnsignedVInt(rie.headerLength());
-                indexedSize += DeletionTime.serializer.serializedSize(rie.deletionTime());
-                indexedSize += TypeSizes.sizeofUnsignedVInt(index.size());
-
-                for (IndexHelper.IndexInfo info : index)
-                    indexedSize += idxSerializer.serializedSize(info);
-
-                indexedSize += index.size() * TypeSizes.sizeof(0);
-            }
-
-            return TypeSizes.sizeofUnsignedVInt(rie.position) + TypeSizes.sizeofUnsignedVInt(indexedSize) + indexedSize;
+            for (int i = 0; i < columnIndexCount; i++)
+                out.writeInt(indexOffsets[i]);
         }
     }
 
-    /**
-     * An entry in the row index for a row whose columns are indexed.
-     */
-    private static class IndexedEntry extends RowIndexEntry<IndexHelper.IndexInfo>
+    private static int serializedSize(DeletionTime deletionTime, long headerLength, int columnIndexCount)
     {
-        private final DeletionTime deletionTime;
+        return TypeSizes.sizeofUnsignedVInt(headerLength)
+               + (int) DeletionTime.serializer.serializedSize(deletionTime)
+               + TypeSizes.sizeofUnsignedVInt(columnIndexCount);
+    }
 
-        // The offset in the file when the index entry end
-        private final long headerLength;
-        private final List<IndexHelper.IndexInfo> columnsIndex;
-        private static final long BASE_SIZE =
-                ObjectSizes.measure(new IndexedEntry(0, DeletionTime.LIVE, 0, Arrays.<IndexHelper.IndexInfo>asList(null, null)))
-              + ObjectSizes.measure(new ArrayList<>(1));
+    public void serialize(DataOutputPlus out, IndexInfo.Serializer idxInfoSerializer, ByteBuffer indexInfo) throws IOException
+    {
+        out.writeUnsignedVInt(position);
 
-        private IndexedEntry(long position, DeletionTime deletionTime, long headerLength, List<IndexHelper.IndexInfo> columnsIndex)
+        out.writeUnsignedVInt(0);
+    }
+
+    public void serializeForCache(DataOutputPlus out) throws IOException
+    {
+        out.writeUnsignedVInt(position);
+
+        out.writeByte(CACHE_NOT_INDEXED);
+    }
+
+    private static final class LegacyShallowIndexedEntry extends RowIndexEntry<IndexInfo>
+    {
+        private static final long BASE_SIZE;
+        static
         {
-            super(position);
-            assert deletionTime != null;
-            assert columnsIndex != null && columnsIndex.size() > 1;
+            BASE_SIZE = ObjectSizes.measure(new LegacyShallowIndexedEntry(0, 0, DeletionTime.LIVE, 0, new int[0], null, 0));
+        }
+
+        private final long indexFilePosition;
+        private final int[] offsets;
+        @Unmetered
+        private final IndexInfo.Serializer idxInfoSerializer;
+        private final DeletionTime deletionTime;
+        private final long headerLength;
+        private final int serializedSize;
+
+        private LegacyShallowIndexedEntry(long dataFilePosition, long indexFilePosition,
+                                          DeletionTime deletionTime, long headerLength,
+                                          int[] offsets, IndexInfo.Serializer idxInfoSerializer,
+                                          int serializedSize)
+        {
+            super(dataFilePosition);
             this.deletionTime = deletionTime;
             this.headerLength = headerLength;
-            this.columnsIndex = columnsIndex;
+            this.indexFilePosition = indexFilePosition;
+            this.offsets = offsets;
+            this.idxInfoSerializer = idxInfoSerializer;
+            this.serializedSize = serializedSize;
         }
 
         @Override
@@ -310,36 +456,596 @@
         }
 
         @Override
-        public List<IndexHelper.IndexInfo> columnsIndex()
+        public long unsharedHeapSize()
         {
-            return columnsIndex;
+            return BASE_SIZE + offsets.length * TypeSizes.sizeof(0);
         }
 
         @Override
-        protected int promotedSize(IndexHelper.IndexInfo.Serializer idxSerializer)
+        public int columnsIndexCount()
         {
-            long size = TypeSizes.sizeofUnsignedVInt(headerLength)
-                      + DeletionTime.serializer.serializedSize(deletionTime)
-                      + TypeSizes.sizeofUnsignedVInt(columnsIndex.size()); // number of entries
-            for (IndexHelper.IndexInfo info : columnsIndex)
-                size += idxSerializer.serializedSize(info);
+            return offsets.length;
+        }
 
-            size += columnsIndex.size() * TypeSizes.sizeof(0);
+        @Override
+        public void serialize(DataOutputPlus out, IndexInfo.Serializer idxInfoSerializer, ByteBuffer indexInfo)
+        {
+            throw new UnsupportedOperationException("serializing legacy index entries is not supported");
+        }
 
-            return Ints.checkedCast(size);
+        @Override
+        public void serializeForCache(DataOutputPlus out)
+        {
+            throw new UnsupportedOperationException("serializing legacy index entries is not supported");
+        }
+
+        @Override
+        public IndexInfoRetriever openWithIndex(SegmentedFile indexFile)
+        {
+            int fieldsSize = (int) DeletionTime.serializer.serializedSize(deletionTime)
+                             + TypeSizes.sizeof(0); // columnIndexCount
+            indexEntrySizeHistogram.update(serializedSize);
+            indexInfoCountHistogram.update(offsets.length);
+            return new LegacyIndexInfoRetriever(indexFilePosition +
+                                                TypeSizes.sizeof(0L) + // position
+                                                TypeSizes.sizeof(0) + // indexInfoSize
+                                                fieldsSize,
+                                                offsets, indexFile.createReader(), idxInfoSerializer);
+        }
+
+        public static RowIndexEntry<IndexInfo> deserialize(DataInputPlus in, long indexFilePosition,
+                                                IndexInfo.Serializer idxInfoSerializer) throws IOException
+        {
+            long dataFilePosition = in.readLong();
+
+            int size = in.readInt();
+            if (size == 0)
+            {
+                return new RowIndexEntry<>(dataFilePosition);
+            }
+            else if (size <= DatabaseDescriptor.getColumnIndexCacheSize())
+            {
+                return new IndexedEntry(dataFilePosition, in, idxInfoSerializer);
+            }
+            else
+            {
+                DeletionTime deletionTime = DeletionTime.serializer.deserialize(in);
+
+                // For legacy sstables (i.e. sstables pre-"ma", pre-3.0) we have to scan all serialized IndexInfo
+                // objects to calculate the offsets array. However, it might be possible to deserialize all
+                // IndexInfo objects here - but just skipping them is gentler on the heap/GC.
+
+                int entries = in.readInt();
+                int[] offsets = new int[entries];
+
+                TrackedDataInputPlus tracked = new TrackedDataInputPlus(in);
+                long start = tracked.getBytesRead();
+                long headerLength = 0L;
+                for (int i = 0; i < entries; i++)
+                {
+                    offsets[i] = (int) (tracked.getBytesRead() - start);
+                    if (i == 0)
+                    {
+                        IndexInfo info = idxInfoSerializer.deserialize(tracked);
+                        headerLength = info.offset;
+                    }
+                    else
+                        idxInfoSerializer.skip(tracked);
+                }
+
+                return new LegacyShallowIndexedEntry(dataFilePosition, indexFilePosition, deletionTime, headerLength, offsets, idxInfoSerializer, size);
+            }
+        }
+
+        static long deserializePositionAndSkip(DataInputPlus in) throws IOException
+        {
+            long position = in.readLong();
+
+            int size = in.readInt();
+            if (size > 0)
+                in.skipBytesFully(size);
+
+            return position;
+        }
+    }
+
+    private static final class LegacyIndexInfoRetriever extends FileIndexInfoRetriever
+    {
+        private final int[] offsets;
+
+        private LegacyIndexInfoRetriever(long indexFilePosition, int[] offsets, FileDataInput reader, IndexInfo.Serializer idxInfoSerializer)
+        {
+            super(indexFilePosition, offsets.length, reader, idxInfoSerializer);
+            this.offsets = offsets;
+        }
+
+        IndexInfo fetchIndex(int index) throws IOException
+        {
+            retrievals++;
+
+            // seek to position of IndexInfo
+            indexReader.seek(indexInfoFilePosition + offsets[index]);
+
+            // deserialize IndexInfo
+            return idxInfoSerializer.deserialize(indexReader);
+        }
+    }
+
+    /**
+     * An entry in the row index for a row whose columns are indexed - used for both legacy and current formats.
+     */
+    private static final class IndexedEntry extends RowIndexEntry<IndexInfo>
+    {
+        private static final long BASE_SIZE;
+
+        static
+        {
+            BASE_SIZE = ObjectSizes.measure(new IndexedEntry(0, DeletionTime.LIVE, 0, null, null, 0, null));
+        }
+
+        private final DeletionTime deletionTime;
+        private final long headerLength;
+
+        private final IndexInfo[] columnsIndex;
+        private final int[] offsets;
+        private final int indexedPartSize;
+        @Unmetered
+        private final ISerializer<IndexInfo> idxInfoSerializer;
+
+        private IndexedEntry(long dataFilePosition, DeletionTime deletionTime, long headerLength,
+                             IndexInfo[] columnsIndex, int[] offsets,
+                             int indexedPartSize, ISerializer<IndexInfo> idxInfoSerializer)
+        {
+            super(dataFilePosition);
+
+            this.headerLength = headerLength;
+            this.deletionTime = deletionTime;
+
+            this.columnsIndex = columnsIndex;
+            this.offsets = offsets;
+            this.indexedPartSize = indexedPartSize;
+            this.idxInfoSerializer = idxInfoSerializer;
+        }
+
+        private IndexedEntry(long dataFilePosition, DataInputPlus in,
+                             DeletionTime deletionTime, long headerLength, int columnIndexCount,
+                             IndexInfo.Serializer idxInfoSerializer,
+                             Version version, int indexedPartSize) throws IOException
+        {
+            super(dataFilePosition);
+
+            this.headerLength = headerLength;
+            this.deletionTime = deletionTime;
+            int columnsIndexCount = columnIndexCount;
+
+            this.columnsIndex = new IndexInfo[columnsIndexCount];
+            for (int i = 0; i < columnsIndexCount; i++)
+                this.columnsIndex[i] = idxInfoSerializer.deserialize(in);
+
+            int[] offsets = null;
+            if (version.storeRows())
+            {
+                offsets = new int[this.columnsIndex.length];
+                for (int i = 0; i < offsets.length; i++)
+                    offsets[i] = in.readInt();
+            }
+            this.offsets = offsets;
+
+            this.indexedPartSize = indexedPartSize;
+
+            this.idxInfoSerializer = idxInfoSerializer;
+        }
+
+        /**
+         * Constructor called from {@link Serializer#deserializeForCache(org.apache.cassandra.io.util.DataInputPlus)}.
+         */
+        private IndexedEntry(long dataFilePosition, DataInputPlus in, IndexInfo.Serializer idxInfoSerializer, Version version) throws IOException
+        {
+            super(dataFilePosition);
+
+            this.headerLength = in.readUnsignedVInt();
+            this.deletionTime = DeletionTime.serializer.deserialize(in);
+            int columnsIndexCount = (int) in.readUnsignedVInt();
+
+            TrackedDataInputPlus trackedIn = new TrackedDataInputPlus(in);
+
+            this.columnsIndex = new IndexInfo[columnsIndexCount];
+            for (int i = 0; i < columnsIndexCount; i++)
+                this.columnsIndex[i] = idxInfoSerializer.deserialize(trackedIn);
+
+            this.offsets = null;
+
+            this.indexedPartSize = (int) trackedIn.getBytesRead();
+
+            this.idxInfoSerializer = idxInfoSerializer;
+        }
+
+        /**
+         * Constructor called from {@link LegacyShallowIndexedEntry#deserialize(org.apache.cassandra.io.util.DataInputPlus, long, org.apache.cassandra.io.sstable.IndexInfo.Serializer)}.
+         * Only for legacy sstables.
+         */
+        private IndexedEntry(long dataFilePosition, DataInputPlus in, IndexInfo.Serializer idxInfoSerializer) throws IOException
+        {
+            super(dataFilePosition);
+
+            long headerLength = 0;
+            this.deletionTime = DeletionTime.serializer.deserialize(in);
+            int columnsIndexCount = in.readInt();
+
+            TrackedDataInputPlus trackedIn = new TrackedDataInputPlus(in);
+
+            this.columnsIndex = new IndexInfo[columnsIndexCount];
+            for (int i = 0; i < columnsIndexCount; i++)
+            {
+                this.columnsIndex[i] = idxInfoSerializer.deserialize(trackedIn);
+                if (i == 0)
+                    headerLength = this.columnsIndex[i].offset;
+            }
+            this.headerLength = headerLength;
+
+            this.offsets = null;
+
+            this.indexedPartSize = (int) trackedIn.getBytesRead();
+
+            this.idxInfoSerializer = idxInfoSerializer;
+        }
+
+        @Override
+        public boolean indexOnHeap()
+        {
+            return true;
+        }
+
+        @Override
+        public int columnsIndexCount()
+        {
+            return columnsIndex.length;
+        }
+
+        @Override
+        public DeletionTime deletionTime()
+        {
+            return deletionTime;
+        }
+
+        @Override
+        public long headerLength()
+        {
+            return headerLength;
+        }
+
+        @Override
+        public IndexInfoRetriever openWithIndex(SegmentedFile indexFile)
+        {
+            indexEntrySizeHistogram.update(serializedSize(deletionTime, headerLength, columnsIndex.length) + indexedPartSize);
+            indexInfoCountHistogram.update(columnsIndex.length);
+            return new IndexInfoRetriever()
+            {
+                private int retrievals;
+
+                @Override
+                public IndexInfo columnsIndex(int index)
+                {
+                    retrievals++;
+                    return columnsIndex[index];
+                }
+
+                public void close()
+                {
+                    indexInfoGetsHistogram.update(retrievals);
+                }
+            };
         }
 
         @Override
         public long unsharedHeapSize()
         {
             long entrySize = 0;
-            for (IndexHelper.IndexInfo idx : columnsIndex)
+            for (IndexInfo idx : columnsIndex)
                 entrySize += idx.unsharedHeapSize();
-
             return BASE_SIZE
-                   + entrySize
-                   + deletionTime.unsharedHeapSize()
-                   + ObjectSizes.sizeOfReferenceArray(columnsIndex.size());
+                + entrySize
+                + ObjectSizes.sizeOfReferenceArray(columnsIndex.length);
+        }
+
+        @Override
+        public void serialize(DataOutputPlus out, IndexInfo.Serializer idxInfoSerializer, ByteBuffer indexInfo) throws IOException
+        {
+            assert indexedPartSize != Integer.MIN_VALUE;
+
+            out.writeUnsignedVInt(position);
+
+            out.writeUnsignedVInt(serializedSize(deletionTime, headerLength, columnsIndex.length) + indexedPartSize);
+
+            out.writeUnsignedVInt(headerLength);
+            DeletionTime.serializer.serialize(deletionTime, out);
+            out.writeUnsignedVInt(columnsIndex.length);
+            for (IndexInfo info : columnsIndex)
+                idxInfoSerializer.serialize(info, out);
+            for (int offset : offsets)
+                out.writeInt(offset);
+        }
+
+        @Override
+        public void serializeForCache(DataOutputPlus out) throws IOException
+        {
+            out.writeUnsignedVInt(position);
+            out.writeByte(CACHE_INDEXED);
+
+            out.writeUnsignedVInt(headerLength);
+            DeletionTime.serializer.serialize(deletionTime, out);
+            out.writeUnsignedVInt(columnsIndexCount());
+
+            for (IndexInfo indexInfo : columnsIndex)
+                idxInfoSerializer.serialize(indexInfo, out);
+        }
+
+        static void skipForCache(DataInputPlus in) throws IOException
+        {
+            /*long headerLength =*/in.readUnsignedVInt();
+            /*DeletionTime deletionTime = */DeletionTime.serializer.skip(in);
+            /*int columnsIndexCount = (int)*/in.readUnsignedVInt();
+
+            /*int indexedPartSize = (int)*/in.readUnsignedVInt();
+        }
+    }
+
+    /**
+     * An entry in the row index for a row whose columns are indexed and the {@link IndexInfo} objects
+     * are not read into the key cache.
+     */
+    private static final class ShallowIndexedEntry extends RowIndexEntry<IndexInfo>
+    {
+        private static final long BASE_SIZE;
+
+        static
+        {
+            BASE_SIZE = ObjectSizes.measure(new ShallowIndexedEntry(0, 0, DeletionTime.LIVE, 0, 10, 0, null));
+        }
+
+        private final long indexFilePosition;
+
+        private final DeletionTime deletionTime;
+        private final long headerLength;
+        private final int columnsIndexCount;
+
+        private final int indexedPartSize;
+        private final int offsetsOffset;
+        @Unmetered
+        private final ISerializer<IndexInfo> idxInfoSerializer;
+        private final int fieldsSerializedSize;
+
+        /**
+         * See {@link #create(long, long, DeletionTime, long, int, int, List, int[], ISerializer)} for a description
+         * of the parameters.
+         */
+        private ShallowIndexedEntry(long dataFilePosition, long indexFilePosition,
+                                    DeletionTime deletionTime, long headerLength, int columnIndexCount,
+                                    int indexedPartSize, ISerializer<IndexInfo> idxInfoSerializer)
+        {
+            super(dataFilePosition);
+
+            assert columnIndexCount > 1;
+
+            this.indexFilePosition = indexFilePosition;
+            this.headerLength = headerLength;
+            this.deletionTime = deletionTime;
+            this.columnsIndexCount = columnIndexCount;
+
+            this.indexedPartSize = indexedPartSize;
+            this.idxInfoSerializer = idxInfoSerializer;
+
+            this.fieldsSerializedSize = serializedSize(deletionTime, headerLength, columnIndexCount);
+            this.offsetsOffset = indexedPartSize + fieldsSerializedSize - columnsIndexCount * TypeSizes.sizeof(0);
+        }
+
+        /**
+         * Constructor for key-cache deserialization
+         */
+        private ShallowIndexedEntry(long dataFilePosition, DataInputPlus in, IndexInfo.Serializer idxInfoSerializer) throws IOException
+        {
+            super(dataFilePosition);
+
+            this.indexFilePosition = in.readUnsignedVInt();
+
+            this.headerLength = in.readUnsignedVInt();
+            this.deletionTime = DeletionTime.serializer.deserialize(in);
+            this.columnsIndexCount = (int) in.readUnsignedVInt();
+
+            this.indexedPartSize = (int) in.readUnsignedVInt();
+
+            this.idxInfoSerializer = idxInfoSerializer;
+
+            this.fieldsSerializedSize = serializedSize(deletionTime, headerLength, columnsIndexCount);
+            this.offsetsOffset = indexedPartSize + fieldsSerializedSize - columnsIndexCount * TypeSizes.sizeof(0);
+        }
+
+        @Override
+        public int columnsIndexCount()
+        {
+            return columnsIndexCount;
+        }
+
+        @Override
+        public DeletionTime deletionTime()
+        {
+            return deletionTime;
+        }
+
+        @Override
+        public long headerLength()
+        {
+            return headerLength;
+        }
+
+        @Override
+        public IndexInfoRetriever openWithIndex(SegmentedFile indexFile)
+        {
+            indexEntrySizeHistogram.update(indexedPartSize + fieldsSerializedSize);
+            indexInfoCountHistogram.update(columnsIndexCount);
+            return new ShallowInfoRetriever(indexFilePosition +
+                                            VIntCoding.computeUnsignedVIntSize(position) +
+                                            VIntCoding.computeUnsignedVIntSize(indexedPartSize + fieldsSerializedSize) +
+                                            fieldsSerializedSize,
+                                            offsetsOffset - fieldsSerializedSize,
+                                            columnsIndexCount, indexFile.createReader(), idxInfoSerializer);
+        }
+
+        @Override
+        public long unsharedHeapSize()
+        {
+            return BASE_SIZE;
+        }
+
+        @Override
+        public void serialize(DataOutputPlus out, IndexInfo.Serializer idxInfoSerializer, ByteBuffer indexInfo) throws IOException
+        {
+            out.writeUnsignedVInt(position);
+
+            out.writeUnsignedVInt(fieldsSerializedSize + indexInfo.limit());
+
+            out.writeUnsignedVInt(headerLength);
+            DeletionTime.serializer.serialize(deletionTime, out);
+            out.writeUnsignedVInt(columnsIndexCount);
+
+            out.write(indexInfo);
+        }
+
+        static long deserializePositionAndSkip(DataInputPlus in) throws IOException
+        {
+            long position = in.readUnsignedVInt();
+
+            int size = (int) in.readUnsignedVInt();
+            if (size > 0)
+                in.skipBytesFully(size);
+
+            return position;
+        }
+
+        @Override
+        public void serializeForCache(DataOutputPlus out) throws IOException
+        {
+            out.writeUnsignedVInt(position);
+            out.writeByte(CACHE_INDEXED_SHALLOW);
+
+            out.writeUnsignedVInt(indexFilePosition);
+
+            out.writeUnsignedVInt(headerLength);
+            DeletionTime.serializer.serialize(deletionTime, out);
+            out.writeUnsignedVInt(columnsIndexCount);
+
+            out.writeUnsignedVInt(indexedPartSize);
+        }
+
+        static void skipForCache(DataInputPlus in) throws IOException
+        {
+            /*long indexFilePosition =*/in.readUnsignedVInt();
+
+            /*long headerLength =*/in.readUnsignedVInt();
+            /*DeletionTime deletionTime = */DeletionTime.serializer.skip(in);
+            /*int columnsIndexCount = (int)*/in.readUnsignedVInt();
+
+            /*int indexedPartSize = (int)*/in.readUnsignedVInt();
+        }
+    }
+
+    private static final class ShallowInfoRetriever extends FileIndexInfoRetriever
+    {
+        private final int offsetsOffset;
+
+        private ShallowInfoRetriever(long indexInfoFilePosition, int offsetsOffset, int indexCount,
+                                     FileDataInput indexReader, ISerializer<IndexInfo> idxInfoSerializer)
+        {
+            super(indexInfoFilePosition, indexCount, indexReader, idxInfoSerializer);
+            this.offsetsOffset = offsetsOffset;
+        }
+
+        IndexInfo fetchIndex(int index) throws IOException
+        {
+            assert index >= 0 && index < indexCount;
+
+            retrievals++;
+
+            // seek to position in "offsets to IndexInfo" table
+            indexReader.seek(indexInfoFilePosition + offsetsOffset + index * TypeSizes.sizeof(0));
+
+            // read offset of IndexInfo
+            int indexInfoPos = indexReader.readInt();
+
+            // seek to position of IndexInfo
+            indexReader.seek(indexInfoFilePosition + indexInfoPos);
+
+            // finally, deserialize IndexInfo
+            return idxInfoSerializer.deserialize(indexReader);
+        }
+    }
+
+    /**
+     * Interface to access {@link IndexInfo} objects.
+     */
+    public interface IndexInfoRetriever extends AutoCloseable
+    {
+        IndexInfo columnsIndex(int index) throws IOException;
+
+        void close() throws IOException;
+    }
+
+    /**
+     * Base class to access {@link IndexInfo} objects on disk, keeping already-read
+     * {@link IndexInfo} objects on heap.
+     */
+    private abstract static class FileIndexInfoRetriever implements IndexInfoRetriever
+    {
+        final long indexInfoFilePosition;
+        final int indexCount;
+        final ISerializer<IndexInfo> idxInfoSerializer;
+        final FileDataInput indexReader;
+        int retrievals;
+
+        private IndexInfo[] lastIndexes;
+
+        /**
+         *
+         * @param indexInfoFilePosition offset of first serialized {@link IndexInfo} object
+         * @param indexCount number of {@link IndexInfo} objects
+         * @param indexReader file data input to access the index file, closed by this instance
+         * @param idxInfoSerializer the index serializer to deserialize {@link IndexInfo} objects
+         */
+        FileIndexInfoRetriever(long indexInfoFilePosition, int indexCount, FileDataInput indexReader, ISerializer<IndexInfo> idxInfoSerializer)
+        {
+            this.indexInfoFilePosition = indexInfoFilePosition;
+            this.indexCount = indexCount;
+            this.idxInfoSerializer = idxInfoSerializer;
+            this.indexReader = indexReader;
+        }
+
+        public final IndexInfo columnsIndex(int index) throws IOException
+        {
+            if (lastIndexes != null
+                && lastIndexes.length > index && lastIndexes[index] != null)
+            {
+                // return a previously read/deserialized IndexInfo
+                return lastIndexes[index];
+            }
+
+            if (lastIndexes == null)
+                lastIndexes = new IndexInfo[index + 1];
+            else if (lastIndexes.length <= index)
+                lastIndexes = Arrays.copyOf(lastIndexes, index + 1);
+
+            IndexInfo indexInfo = fetchIndex(index);
+            lastIndexes[index] = indexInfo;
+
+            return indexInfo;
+        }
+
+        abstract IndexInfo fetchIndex(int index) throws IOException;
+
+        public void close() throws IOException
+        {
+            indexReader.close();
+
+            indexInfoGetsHistogram.update(retrievals);
         }
     }
 }
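
A hedged sketch of how the retriever API introduced above is meant to be consumed; 'indexEntry' (a RowIndexEntry<IndexInfo>) and 'indexFile' (the sstable's primary index as a SegmentedFile) are assumptions for illustration and are not defined in this hunk:

    // Inside a method that may throw IOException.
    if (indexEntry.isIndexed())
    {
        try (RowIndexEntry.IndexInfoRetriever retriever = indexEntry.openWithIndex(indexFile))
        {
            for (int i = 0; i < indexEntry.columnsIndexCount(); i++)
            {
                IndexInfo info = retriever.columnsIndex(i); // fetched lazily; file-backed retrievers cache already-read entries on heap
                // use info.offset / info.width to locate the indexed block in the data file ...
            }
        }
    }

For an IndexedEntry the samples come straight from the on-heap array; for a ShallowIndexedEntry or LegacyShallowIndexedEntry each fetch seeks into the index file, which is what the IndexInfoGets histogram above tracks.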
diff --git a/src/java/org/apache/cassandra/db/RowUpdateBuilder.java b/src/java/org/apache/cassandra/db/RowUpdateBuilder.java
index 8ace988..b414eba 100644
--- a/src/java/org/apache/cassandra/db/RowUpdateBuilder.java
+++ b/src/java/org/apache/cassandra/db/RowUpdateBuilder.java
@@ -18,7 +18,6 @@
 package org.apache.cassandra.db;
 
 import java.nio.ByteBuffer;
-import java.util.ArrayList;
 import java.util.List;
 import java.util.Map;
 import java.util.Set;
@@ -27,7 +26,6 @@
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.ColumnDefinition;
 import org.apache.cassandra.db.marshal.SetType;
-import org.apache.cassandra.db.marshal.UTF8Type;
 import org.apache.cassandra.db.rows.*;
 import org.apache.cassandra.db.context.CounterContext;
 import org.apache.cassandra.db.partitions.*;
@@ -87,7 +85,7 @@
 
         // If a CQL table, add the "row marker"
         if (update.metadata().isCQLTable() && useRowMarker)
-            regularBuilder.addPrimaryKeyLivenessInfo(LivenessInfo.create(update.metadata(), timestamp, ttl, localDeletionTime));
+            regularBuilder.addPrimaryKeyLivenessInfo(LivenessInfo.create(timestamp, ttl, localDeletionTime));
     }
 
     private Row.Builder builder()
@@ -278,7 +276,7 @@
     {
         return value == null
              ? BufferCell.tombstone(c, timestamp, localDeletionTime)
-             : (ttl == LivenessInfo.NO_TTL ? BufferCell.live(update.metadata(), c, timestamp, value, path) : BufferCell.expiring(c, timestamp, ttl, localDeletionTime, value, path));
+             : (ttl == LivenessInfo.NO_TTL ? BufferCell.live(c, timestamp, value, path) : BufferCell.expiring(c, timestamp, ttl, localDeletionTime, value, path));
     }
 
     public RowUpdateBuilder add(ColumnDefinition columnDefinition, Object value)
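
For orientation, RowUpdateBuilder is the test/tooling helper for assembling a single-row mutation; a rough usage sketch follows. The keyspace, table and column names are invented for illustration, and the call assumes a schema is already loaded.

// Hypothetical table: CREATE TABLE ks.t (pk text, ck text, val int, PRIMARY KEY (pk, ck))
CFMetaData metadata = Schema.instance.getCFMetaData("ks", "t");

Mutation mutation = new RowUpdateBuilder(metadata, FBUtilities.timestampMicros(), "key1")
                    .clustering("ck1")   // clustering column value
                    .add("val", 42)      // regular column; the CQL row marker is added automatically
                    .build();
mutation.apply();
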
diff --git a/src/java/org/apache/cassandra/db/SerializationHeader.java b/src/java/org/apache/cassandra/db/SerializationHeader.java
index 0fd1281..af2d434 100644
--- a/src/java/org/apache/cassandra/db/SerializationHeader.java
+++ b/src/java/org/apache/cassandra/db/SerializationHeader.java
@@ -29,7 +29,6 @@
 import org.apache.cassandra.db.filter.ColumnFilter;
 import org.apache.cassandra.db.rows.*;
 import org.apache.cassandra.db.marshal.AbstractType;
-import org.apache.cassandra.db.marshal.BytesType;
 import org.apache.cassandra.db.marshal.UTF8Type;
 import org.apache.cassandra.db.marshal.TypeParser;
 import org.apache.cassandra.io.sstable.format.SSTableReader;
@@ -75,25 +74,6 @@
         return new SerializationHeader(true, metadata, metadata.partitionColumns(), EncodingStats.NO_STATS);
     }
 
-    public static SerializationHeader forKeyCache(CFMetaData metadata)
-    {
-        // We don't save type information in the key cache (we could change
-        // that but it's easier right now), so instead we simply use BytesType
-        // for both serialization and deserialization. Note that we also only
-        // serializer clustering prefixes in the key cache, so only the clusteringTypes
-        // really matter.
-        int size = metadata.clusteringColumns().size();
-        List<AbstractType<?>> clusteringTypes = new ArrayList<>(size);
-        for (int i = 0; i < size; i++)
-            clusteringTypes.add(BytesType.instance);
-        return new SerializationHeader(false,
-                                       BytesType.instance,
-                                       clusteringTypes,
-                                       PartitionColumns.NONE,
-                                       EncodingStats.NO_STATS,
-                                       Collections.<ByteBuffer, AbstractType<?>>emptyMap());
-    }
-
     public static SerializationHeader make(CFMetaData metadata, Collection<SSTableReader> sstables)
     {
         // The serialization header has to be computed before the start of compaction (since it's used to write)
diff --git a/src/java/org/apache/cassandra/db/Serializers.java b/src/java/org/apache/cassandra/db/Serializers.java
index 348fda3..d6aac64 100644
--- a/src/java/org/apache/cassandra/db/Serializers.java
+++ b/src/java/org/apache/cassandra/db/Serializers.java
@@ -20,10 +20,15 @@
 import java.io.*;
 import java.nio.ByteBuffer;
 import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
 
 import org.apache.cassandra.config.CFMetaData;
+import org.apache.cassandra.db.marshal.AbstractType;
 import org.apache.cassandra.db.marshal.CompositeType;
 import org.apache.cassandra.io.ISerializer;
+import org.apache.cassandra.io.sstable.IndexInfo;
+import org.apache.cassandra.io.sstable.format.big.BigFormat;
 import org.apache.cassandra.io.util.DataInputPlus;
 import org.apache.cassandra.io.util.DataOutputPlus;
 import org.apache.cassandra.io.sstable.format.Version;
@@ -36,36 +41,66 @@
 {
     private final CFMetaData metadata;
 
+    private Map<Version, IndexInfo.Serializer> otherVersionClusteringSerializers;
+
+    private final IndexInfo.Serializer latestVersionIndexSerializer;
+
     public Serializers(CFMetaData metadata)
     {
         this.metadata = metadata;
+        this.latestVersionIndexSerializer = new IndexInfo.Serializer(BigFormat.latestVersion,
+                                                                     indexEntryClusteringPrefixSerializer(BigFormat.latestVersion, SerializationHeader.makeWithoutStats(metadata)));
+    }
+
+    IndexInfo.Serializer indexInfoSerializer(Version version, SerializationHeader header)
+    {
+        // null header indicates streaming from pre-3.0 sstables
+        if (version.equals(BigFormat.latestVersion) && header != null)
+            return latestVersionIndexSerializer;
+
+        if (otherVersionClusteringSerializers == null)
+            otherVersionClusteringSerializers = new ConcurrentHashMap<>();
+        IndexInfo.Serializer serializer = otherVersionClusteringSerializers.get(version);
+        if (serializer == null)
+        {
+            serializer = new IndexInfo.Serializer(version,
+                                                  indexEntryClusteringPrefixSerializer(version, header));
+            otherVersionClusteringSerializers.put(version, serializer);
+        }
+        return serializer;
     }
 
     // TODO: Once we drop support for old (pre-3.0) sstables, we can drop this method and inline the calls to
-    // ClusteringPrefix.serializer in IndexHelper directly. At which point this whole class probably becomes
+    // ClusteringPrefix.serializer directly. At which point this whole class probably becomes
     // unnecessary (since IndexInfo.Serializer won't depend on the metadata either).
-    public ISerializer<ClusteringPrefix> indexEntryClusteringPrefixSerializer(final Version version, final SerializationHeader header)
+    private ISerializer<ClusteringPrefix> indexEntryClusteringPrefixSerializer(Version version, SerializationHeader header)
     {
         if (!version.storeRows() || header ==  null) //null header indicates streaming from pre-3.0 sstables
         {
             return oldFormatSerializer(version);
         }
 
-        return newFormatSerializer(version, header);
+        return new NewFormatSerializer(version, header.clusteringTypes());
     }
 
-    private ISerializer<ClusteringPrefix> oldFormatSerializer(final Version version)
+    private ISerializer<ClusteringPrefix> oldFormatSerializer(Version version)
     {
         return new ISerializer<ClusteringPrefix>()
         {
-            SerializationHeader newHeader = SerializationHeader.makeWithoutStats(metadata);
+            List<AbstractType<?>> clusteringTypes = SerializationHeader.makeWithoutStats(metadata).clusteringTypes();
 
             public void serialize(ClusteringPrefix clustering, DataOutputPlus out) throws IOException
             {
                 //we deserialize in the old format and serialize in the new format
                 ClusteringPrefix.serializer.serialize(clustering, out,
                                                       version.correspondingMessagingVersion(),
-                                                      newHeader.clusteringTypes());
+                                                      clusteringTypes);
+            }
+
+            @Override
+            public void skip(DataInputPlus in) throws IOException
+            {
+                ByteBufferUtil.skipShortLength(in);
             }
 
             public ClusteringPrefix deserialize(DataInputPlus in) throws IOException
@@ -80,7 +115,7 @@
                     return Clustering.EMPTY;
 
                 if (!metadata.isCompound())
-                    return new Clustering(bb);
+                    return Clustering.make(bb);
 
                 List<ByteBuffer> components = CompositeType.splitName(bb);
                 byte eoc = CompositeType.lastEOC(bb);
@@ -91,48 +126,58 @@
                     if (components.size() > clusteringSize)
                         components = components.subList(0, clusteringSize);
 
-                    return new Clustering(components.toArray(new ByteBuffer[clusteringSize]));
+                    return Clustering.make(components.toArray(new ByteBuffer[clusteringSize]));
                 }
                 else
                 {
                     // It's a range tombstone bound. It is a start since that's the only part we've ever included
                     // in the index entries.
-                    Slice.Bound.Kind boundKind = eoc > 0
-                                                 ? Slice.Bound.Kind.EXCL_START_BOUND
-                                                 : Slice.Bound.Kind.INCL_START_BOUND;
+                    ClusteringPrefix.Kind boundKind = eoc > 0
+                                                 ? ClusteringPrefix.Kind.EXCL_START_BOUND
+                                                 : ClusteringPrefix.Kind.INCL_START_BOUND;
 
-                    return Slice.Bound.create(boundKind, components.toArray(new ByteBuffer[components.size()]));
+                    return ClusteringBound.create(boundKind, components.toArray(new ByteBuffer[components.size()]));
                 }
             }
 
             public long serializedSize(ClusteringPrefix clustering)
             {
                 return ClusteringPrefix.serializer.serializedSize(clustering, version.correspondingMessagingVersion(),
-                                                                  newHeader.clusteringTypes());
+                                                                  clusteringTypes);
             }
         };
     }
 
-
-    private ISerializer<ClusteringPrefix> newFormatSerializer(final Version version, final SerializationHeader header)
+    private static class NewFormatSerializer implements ISerializer<ClusteringPrefix>
     {
-        return new ISerializer<ClusteringPrefix>() //Reading and writing from/to the new sstable format
+        private final Version version;
+        private final List<AbstractType<?>> clusteringTypes;
+
+        NewFormatSerializer(Version version, List<AbstractType<?>> clusteringTypes)
         {
-            public void serialize(ClusteringPrefix clustering, DataOutputPlus out) throws IOException
-            {
-                ClusteringPrefix.serializer.serialize(clustering, out, version.correspondingMessagingVersion(), header.clusteringTypes());
-            }
+            this.version = version;
+            this.clusteringTypes = clusteringTypes;
+        }
 
-            public ClusteringPrefix deserialize(DataInputPlus in) throws IOException
-            {
-                return ClusteringPrefix.serializer.deserialize(in, version.correspondingMessagingVersion(), header.clusteringTypes());
-            }
+        public void serialize(ClusteringPrefix clustering, DataOutputPlus out) throws IOException
+        {
+            ClusteringPrefix.serializer.serialize(clustering, out, version.correspondingMessagingVersion(), clusteringTypes);
+        }
 
-            public long serializedSize(ClusteringPrefix clustering)
-            {
-                return ClusteringPrefix.serializer.serializedSize(clustering, version.correspondingMessagingVersion(), header.clusteringTypes());
-            }
-        };
+        @Override
+        public void skip(DataInputPlus in) throws IOException
+        {
+            ClusteringPrefix.serializer.skip(in, version.correspondingMessagingVersion(), clusteringTypes);
+        }
+
+        public ClusteringPrefix deserialize(DataInputPlus in) throws IOException
+        {
+            return ClusteringPrefix.serializer.deserialize(in, version.correspondingMessagingVersion(), clusteringTypes);
+        }
+
+        public long serializedSize(ClusteringPrefix clustering)
+        {
+            return ClusteringPrefix.serializer.serializedSize(clustering, version.correspondingMessagingVersion(), clusteringTypes);
+        }
     }
-
 }
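
indexInfoSerializer above keeps one IndexInfo.Serializer per older sstable Version, creating the map only when a non-latest version is actually seen, while the common latest-version path bypasses the map entirely. A generic sketch of the same memoization, using computeIfAbsent to fold the check-then-put into a single atomic map operation (illustrative names, not the Cassandra API):

import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Illustrative per-key cache: one serializer instance per version, created on demand.
final class PerVersionCache<V, S>
{
    private final Map<V, S> cache = new ConcurrentHashMap<>();
    private final Function<V, S> factory;

    PerVersionCache(Function<V, S> factory)
    {
        this.factory = factory;
    }

    S serializerFor(V version)
    {
        // computeIfAbsent guarantees a single serializer per version even under concurrent callers,
        // unlike a separate get()/put() pair which may transiently build duplicates.
        return cache.computeIfAbsent(version, factory);
    }
}
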
diff --git a/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java b/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java
index 73eb9bd..855b030 100644
--- a/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java
+++ b/src/java/org/apache/cassandra/db/SinglePartitionReadCommand.java
@@ -36,6 +36,7 @@
 import org.apache.cassandra.db.filter.*;
 import org.apache.cassandra.db.partitions.*;
 import org.apache.cassandra.db.rows.*;
+import org.apache.cassandra.db.transform.Transformation;
 import org.apache.cassandra.exceptions.RequestExecutionException;
 import org.apache.cassandra.io.sstable.format.SSTableReader;
 import org.apache.cassandra.io.util.DataInputPlus;
@@ -54,7 +55,6 @@
 import org.apache.cassandra.utils.SearchIterator;
 import org.apache.cassandra.utils.btree.BTreeSet;
 import org.apache.cassandra.utils.concurrent.OpOrder;
-import org.apache.cassandra.utils.memory.HeapAllocator;
 
 
 /**
@@ -312,11 +312,11 @@
      * @param lastReturned the last row returned by the previous page. The newly created command
      * will only query row that comes after this (in query order). This can be {@code null} if this
      * is the first page.
-     * @param pageSize the size to use for the page to query.
+     * @param limits the limits to use for the page to query.
      *
      * @return the newly create command.
      */
-    public SinglePartitionReadCommand forPaging(Clustering lastReturned, int pageSize)
+    public SinglePartitionReadCommand forPaging(Clustering lastReturned, DataLimits limits)
     {
         // We shouldn't have set digest yet when reaching that point
         assert !isDigestQuery();
@@ -325,7 +325,7 @@
                       nowInSec(),
                       columnFilter(),
                       rowFilter(),
-                      limits().forPaging(pageSize),
+                      limits,
                       partitionKey(),
                       lastReturned == null ? clusteringIndexFilter() : clusteringIndexFilter.forPaging(metadata().comparator, lastReturned, false));
     }
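
As a rough sketch of how a caller drives the paging above (baseCommand and lastRowClustering are assumed to exist; the page-size conversion through limits().forPaging mirrors the previous signature):

int pageSize = 100;

// First page: no previous row, so pass null and the command keeps its own clustering filter.
SinglePartitionReadCommand firstPage =
    baseCommand.forPaging(null, baseCommand.limits().forPaging(pageSize));

// Subsequent pages: resume strictly after the last clustering returned by the previous page.
SinglePartitionReadCommand nextPage =
    baseCommand.forPaging(lastRowClustering, baseCommand.limits().forPaging(pageSize));
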
@@ -351,11 +351,11 @@
     }
 
     @SuppressWarnings("resource") // we close the created iterator through closing the result of this method (and SingletonUnfilteredPartitionIterator ctor cannot fail)
-    protected UnfilteredPartitionIterator queryStorage(final ColumnFamilyStore cfs, ReadOrderGroup orderGroup)
+    protected UnfilteredPartitionIterator queryStorage(final ColumnFamilyStore cfs, ReadExecutionController executionController)
     {
         UnfilteredRowIterator partition = cfs.isRowCacheEnabled()
-                                        ? getThroughCache(cfs, orderGroup.baseReadOpOrderGroup())
-                                        : queryMemtableAndDisk(cfs, orderGroup.baseReadOpOrderGroup());
+                                        ? getThroughCache(cfs, executionController)
+                                        : queryMemtableAndDisk(cfs, executionController);
         return new SingletonUnfilteredPartitionIterator(partition, isForThrift());
     }
 
@@ -368,7 +368,7 @@
      * If the partition is is not cached, we figure out what filter is "biggest", read
      * that from disk, then filter the result and either cache that or return it.
      */
-    private UnfilteredRowIterator getThroughCache(ColumnFamilyStore cfs, OpOrder.Group readOp)
+    private UnfilteredRowIterator getThroughCache(ColumnFamilyStore cfs, ReadExecutionController executionController)
     {
         assert !cfs.isIndex(); // CASSANDRA-5732
         assert cfs.isRowCacheEnabled() : String.format("Row cache is not enabled on table [%s]", cfs.name);
@@ -386,7 +386,7 @@
                 // Some other read is trying to cache the value, just do a normal non-caching read
                 Tracing.trace("Row cache miss (race)");
                 cfs.metric.rowCacheMiss.inc();
-                return queryMemtableAndDisk(cfs, readOp);
+                return queryMemtableAndDisk(cfs, executionController);
             }
 
             CachedPartition cachedPartition = (CachedPartition)cached;
@@ -401,7 +401,7 @@
 
             cfs.metric.rowCacheHitOutOfRange.inc();
             Tracing.trace("Ignoring row cache as cached value could not satisfy query");
-            return queryMemtableAndDisk(cfs, readOp);
+            return queryMemtableAndDisk(cfs, executionController);
         }
 
         cfs.metric.rowCacheMiss.inc();
@@ -426,7 +426,7 @@
             {
                 int rowsToCache = metadata().params.caching.rowsPerPartitionToCache();
                 @SuppressWarnings("resource") // we close on exception or upon closing the result of this method
-                UnfilteredRowIterator iter = SinglePartitionReadCommand.fullPartitionRead(metadata(), nowInSec(), partitionKey()).queryMemtableAndDisk(cfs, readOp);
+                UnfilteredRowIterator iter = SinglePartitionReadCommand.fullPartitionRead(metadata(), nowInSec(), partitionKey()).queryMemtableAndDisk(cfs, executionController);
                 try
                 {
                     // We want to cache only rowsToCache rows
@@ -466,7 +466,7 @@
         }
 
         Tracing.trace("Fetching data but not populating cache as query does not query from the start of the partition");
-        return queryMemtableAndDisk(cfs, readOp);
+        return queryMemtableAndDisk(cfs, executionController);
     }
 
     /**
@@ -481,15 +481,15 @@
      * It is publicly exposed because there are a few places where that is exactly what we want,
      * but it should be used only where you know you don't need those things.
      * <p>
-     * Also note that one must have "started" a {@code OpOrder.Group} on the queried table, and that is
-     * to enforce that that it is required as parameter, even though it's not explicitlly used by the method.
+     * Also note that one must have created a {@code ReadExecutionController} on the queried table; we require it as
+     * a parameter to enforce that fact, even though it's not explicitly used by the method.
      */
-    public UnfilteredRowIterator queryMemtableAndDisk(ColumnFamilyStore cfs, OpOrder.Group readOp)
+    public UnfilteredRowIterator queryMemtableAndDisk(ColumnFamilyStore cfs, ReadExecutionController executionController)
     {
+        assert executionController != null && executionController.validForReadOn(cfs);
         Tracing.trace("Executing single-partition query on {}", cfs.name);
 
-        boolean copyOnHeap = Memtable.MEMORY_POOL.needToCopyOnHeap();
-        return queryMemtableAndDiskInternal(cfs, copyOnHeap);
+        return queryMemtableAndDiskInternal(cfs);
     }
 
     @Override
@@ -498,7 +498,7 @@
         return oldestUnrepairedTombstone;
     }
 
-    private UnfilteredRowIterator queryMemtableAndDiskInternal(ColumnFamilyStore cfs, boolean copyOnHeap)
+    private UnfilteredRowIterator queryMemtableAndDiskInternal(ColumnFamilyStore cfs)
     {
         /*
          * We have 2 main strategies:
@@ -512,13 +512,13 @@
          *      of shards so have the same problem).
          */
         if (clusteringIndexFilter() instanceof ClusteringIndexNamesFilter && queryNeitherCountersNorCollections())
-            return queryMemtableAndSSTablesInTimestampOrder(cfs, copyOnHeap, (ClusteringIndexNamesFilter)clusteringIndexFilter());
+            return queryMemtableAndSSTablesInTimestampOrder(cfs, (ClusteringIndexNamesFilter)clusteringIndexFilter());
 
         Tracing.trace("Acquiring sstable references");
         ColumnFamilyStore.ViewFragment view = cfs.select(View.select(SSTableSet.LIVE, partitionKey()));
-
         List<UnfilteredRowIterator> iterators = new ArrayList<>(Iterables.size(view.memtables) + view.sstables.size());
         ClusteringIndexFilter filter = clusteringIndexFilter();
+        long minTimestamp = Long.MAX_VALUE;
 
         try
         {
@@ -528,13 +528,14 @@
                 if (partition == null)
                     continue;
 
+                minTimestamp = Math.min(minTimestamp, memtable.getMinTimestamp());
+
                 @SuppressWarnings("resource") // 'iter' is added to iterators which is closed on exception, or through the closing of the final merged iterator
                 UnfilteredRowIterator iter = filter.getUnfilteredRowIterator(columnFilter(), partition);
-                @SuppressWarnings("resource") // same as above
-                UnfilteredRowIterator maybeCopied = copyOnHeap ? UnfilteredRowIterators.cloningIterator(iter, HeapAllocator.instance) : iter;
                 oldestUnrepairedTombstone = Math.min(oldestUnrepairedTombstone, partition.stats().minLocalDeletionTime);
-                iterators.add(isForThrift() ? ThriftResultsMerger.maybeWrap(maybeCopied, nowInSec()) : maybeCopied);
+                iterators.add(isForThrift() ? ThriftResultsMerger.maybeWrap(iter, nowInSec()) : iter);
             }
+
             /*
              * We can't eliminate full sstables based on the timestamp of what we've already read like
              * in collectTimeOrderedData, but we still want to eliminate sstables whose maxTimestamp < mostRecentTombstone
@@ -547,16 +548,13 @@
              * In other words, iterating in maxTimestamp order allows us to do our mostRecentPartitionTombstone elimination
              * in one pass, and minimize the number of sstables for which we read a partition tombstone.
              */
-            int sstablesIterated = 0;
             Collections.sort(view.sstables, SSTableReader.maxTimestampComparator);
-            List<SSTableReader> skippedSSTables = null;
             long mostRecentPartitionTombstone = Long.MIN_VALUE;
-            long minTimestamp = Long.MAX_VALUE;
             int nonIntersectingSSTables = 0;
+            List<SSTableReader> skippedSSTablesWithTombstones = null;
 
             for (SSTableReader sstable : view.sstables)
             {
-                minTimestamp = Math.min(minTimestamp, sstable.getMinTimestamp());
                 // if we've already seen a partition tombstone with a timestamp greater
                 // than the most recent update to this sstable, we can skip it
                 if (sstable.getMaxTimestamp() < mostRecentPartitionTombstone)
@@ -565,73 +563,56 @@
                 if (!shouldInclude(sstable))
                 {
                     nonIntersectingSSTables++;
-                    // sstable contains no tombstone if maxLocalDeletionTime == Integer.MAX_VALUE, so we can safely skip those entirely
-                    if (sstable.getSSTableMetadata().maxLocalDeletionTime != Integer.MAX_VALUE)
-                    {
-                        if (skippedSSTables == null)
-                            skippedSSTables = new ArrayList<>();
-                        skippedSSTables.add(sstable);
+                    if (sstable.hasTombstones())
+                    { // if the sstable has tombstones, we need to check after the first pass whether it can be safely skipped
+                        if (skippedSSTablesWithTombstones == null)
+                            skippedSSTablesWithTombstones = new ArrayList<>();
+                        skippedSSTablesWithTombstones.add(sstable);
                     }
                     continue;
                 }
 
-                sstable.incrementReadCount();
-                @SuppressWarnings("resource") // 'iter' is added to iterators which is closed on exception, or through the closing of the final merged iterator
-                UnfilteredRowIterator iter = filter.filter(sstable.iterator(partitionKey(), columnFilter(), filter.isReversed(), isForThrift()));
+                minTimestamp = Math.min(minTimestamp, sstable.getMinTimestamp());
+
+                @SuppressWarnings("resource") // 'iter' is added to iterators which is closed on exception,
+                                              // or through the closing of the final merged iterator
+                UnfilteredRowIteratorWithLowerBound iter = makeIterator(cfs, sstable, true);
                 if (!sstable.isRepaired())
                     oldestUnrepairedTombstone = Math.min(oldestUnrepairedTombstone, sstable.getMinLocalDeletionTime());
 
-                iterators.add(isForThrift() ? ThriftResultsMerger.maybeWrap(iter, nowInSec()) : iter);
-                mostRecentPartitionTombstone = Math.max(mostRecentPartitionTombstone, iter.partitionLevelDeletion().markedForDeleteAt());
-                sstablesIterated++;
+                iterators.add(iter);
+                mostRecentPartitionTombstone = Math.max(mostRecentPartitionTombstone,
+                                                        iter.partitionLevelDeletion().markedForDeleteAt());
             }
 
             int includedDueToTombstones = 0;
-            // Check for partition tombstones in the skipped sstables
-            if (skippedSSTables != null)
+            // Check for sstables with tombstones that are not expired
+            if (skippedSSTablesWithTombstones != null)
             {
-                for (SSTableReader sstable : skippedSSTables)
+                for (SSTableReader sstable : skippedSSTablesWithTombstones)
                 {
                     if (sstable.getMaxTimestamp() <= minTimestamp)
                         continue;
 
-                    sstable.incrementReadCount();
-                    @SuppressWarnings("resource") // 'iter' is either closed right away, or added to iterators which is close on exception, or through the closing of the final merged iterator
-                    UnfilteredRowIterator iter = filter.filter(sstable.iterator(partitionKey(), columnFilter(), filter.isReversed(), isForThrift()));
-                    if (iter.partitionLevelDeletion().markedForDeleteAt() > minTimestamp)
-                    {
-                        iterators.add(iter);
-                        if (!sstable.isRepaired())
-                            oldestUnrepairedTombstone = Math.min(oldestUnrepairedTombstone, sstable.getMinLocalDeletionTime());
-                        includedDueToTombstones++;
-                        sstablesIterated++;
-                    }
-                    else
-                    {
-                        iter.close();
-                    }
+                    @SuppressWarnings("resource") // 'iter' is added to iterators which is close on exception,
+                                                  // or through the closing of the final merged iterator
+                    UnfilteredRowIteratorWithLowerBound iter = makeIterator(cfs, sstable, false);
+                    if (!sstable.isRepaired())
+                        oldestUnrepairedTombstone = Math.min(oldestUnrepairedTombstone, sstable.getMinLocalDeletionTime());
+
+                    iterators.add(iter);
+                    includedDueToTombstones++;
                 }
             }
             if (Tracing.isTracing())
                 Tracing.trace("Skipped {}/{} non-slice-intersecting sstables, included {} due to tombstones",
-                              nonIntersectingSSTables, view.sstables.size(), includedDueToTombstones);
-
-            cfs.metric.updateSSTableIterated(sstablesIterated);
+                               nonIntersectingSSTables, view.sstables.size(), includedDueToTombstones);
 
             if (iterators.isEmpty())
                 return EmptyIterators.unfilteredRow(cfs.metadata, partitionKey(), filter.isReversed());
 
-            Tracing.trace("Merging data from memtables and {} sstables", sstablesIterated);
-
-            @SuppressWarnings("resource") //  Closed through the closing of the result of that method.
-            UnfilteredRowIterator merged = UnfilteredRowIterators.merge(iterators, nowInSec());
-            if (!merged.isEmpty())
-            {
-                DecoratedKey key = merged.partitionKey();
-                cfs.metric.samplers.get(TableMetrics.Sampler.READS).addSample(key.getKey(), key.hashCode(), 1);
-            }
-
-            return merged;
+            StorageHook.instance.reportRead(cfs.metadata.cfId, partitionKey());
+            return withStateTracking(withSSTablesIterated(iterators, cfs.metric));
         }
         catch (RuntimeException | Error e)
         {
@@ -658,6 +639,52 @@
         return clusteringIndexFilter().shouldInclude(sstable);
     }
 
+    private UnfilteredRowIteratorWithLowerBound makeIterator(ColumnFamilyStore cfs, final SSTableReader sstable, boolean applyThriftTransformation)
+    {
+        return StorageHook.instance.makeRowIteratorWithLowerBound(cfs,
+                                                                  partitionKey(),
+                                                                  sstable,
+                                                                  clusteringIndexFilter(),
+                                                                  columnFilter(),
+                                                                  isForThrift(),
+                                                                  nowInSec(),
+                                                                  applyThriftTransformation);
+
+    }
+
+    /**
+     * Return a wrapped iterator that, when closed, will update the sstables-iterated and READ sample metrics.
+     * Note that we cannot use the Transformations framework because it greedily gets the static row, which
+     * would cause all iterators to be initialized and hence all sstables to be accessed.
+     */
+    private UnfilteredRowIterator withSSTablesIterated(List<UnfilteredRowIterator> iterators,
+                                                       TableMetrics metrics)
+    {
+        @SuppressWarnings("resource") //  Closed through the closing of the result of the caller method.
+        UnfilteredRowIterator merged = UnfilteredRowIterators.merge(iterators, nowInSec());
+
+        if (!merged.isEmpty())
+        {
+            DecoratedKey key = merged.partitionKey();
+            metrics.samplers.get(TableMetrics.Sampler.READS).addSample(key.getKey(), key.hashCode(), 1);
+        }
+
+        class UpdateSstablesIterated extends Transformation
+        {
+           public void onPartitionClose()
+           {
+               int sstablesIterated = (int)iterators.stream()
+                                                    .filter(it -> it instanceof LazilyInitializedUnfilteredRowIterator)
+                                                    .filter(it -> ((LazilyInitializedUnfilteredRowIterator)it).initialized())
+                                                    .count();
+
+               metrics.updateSSTableIterated(sstablesIterated);
+               Tracing.trace("Merged data from memtables and {} sstables", sstablesIterated);
+           }
+        };
+        return Transformation.apply(merged, new UpdateSstablesIterated());
+    }
+
     private boolean queryNeitherCountersNorCollections()
     {
         for (ColumnDefinition column : columnFilter().fetchedColumns())
@@ -677,7 +704,7 @@
      * no collection or counters are included).
      * This method assumes the filter is a {@code ClusteringIndexNamesFilter}.
      */
-    private UnfilteredRowIterator queryMemtableAndSSTablesInTimestampOrder(ColumnFamilyStore cfs, boolean copyOnHeap, ClusteringIndexNamesFilter filter)
+    private UnfilteredRowIterator queryMemtableAndSSTablesInTimestampOrder(ColumnFamilyStore cfs, ClusteringIndexNamesFilter filter)
     {
         Tracing.trace("Acquiring sstable references");
         ColumnFamilyStore.ViewFragment view = cfs.select(View.select(SSTableSet.LIVE, partitionKey()));
@@ -696,10 +723,7 @@
                 if (iter.isEmpty())
                     continue;
 
-                UnfilteredRowIterator clonedFilter = copyOnHeap
-                                                   ? UnfilteredRowIterators.cloningIterator(iter, HeapAllocator.instance)
-                                                   : iter;
-                result = add(isForThrift() ? ThriftResultsMerger.maybeWrap(clonedFilter, nowInSec()) : clonedFilter, result, filter, false);
+                result = add(isForThrift() ? ThriftResultsMerger.maybeWrap(iter, nowInSec()) : iter, result, filter, false);
             }
         }
 
@@ -727,12 +751,12 @@
                 // however: if it is set, it impacts everything and must be included. Getting that top-level partition deletion costs us
                 // some seek in general however (unless the partition is indexed and is in the key cache), so we first check if the sstable
                 // has any tombstone at all as a shortcut.
-                if (sstable.getSSTableMetadata().maxLocalDeletionTime == Integer.MAX_VALUE)
-                    continue; // Means no tombstone at all, we can skip that sstable
+                if (!sstable.hasTombstones())
+                    continue; // no tombstone at all, we can skip that sstable
 
                 // We need to get the partition deletion and include it if it's live. In any case though, we're done with that sstable.
                 sstable.incrementReadCount();
-                try (UnfilteredRowIterator iter = sstable.iterator(partitionKey(), columnFilter(), filter.isReversed(), isForThrift()))
+                try (UnfilteredRowIterator iter = StorageHook.instance.makeRowIterator(cfs, sstable, partitionKey(), Slices.ALL, columnFilter(), filter.isReversed(), isForThrift()))
                 {
                     if (iter.partitionLevelDeletion().isLive())
                     {
@@ -745,7 +769,7 @@
 
             Tracing.trace("Merging data from sstable {}", sstable.descriptor.generation);
             sstable.incrementReadCount();
-            try (UnfilteredRowIterator iter = filter.filter(sstable.iterator(partitionKey(), columnFilter(), filter.isReversed(), isForThrift())))
+            try (UnfilteredRowIterator iter = StorageHook.instance.makeRowIterator(cfs, sstable, partitionKey(), filter.getSlices(metadata()), columnFilter(), filter.isReversed(), isForThrift()))
             {
                 if (iter.isEmpty())
                     continue;
@@ -764,6 +788,7 @@
 
         DecoratedKey key = result.partitionKey();
         cfs.metric.samplers.get(TableMetrics.Sampler.READS).addSample(key.getKey(), key.hashCode(), 1);
+        StorageHook.instance.reportRead(cfs.metadata.cfId, partitionKey());
 
         // "hoist up" the requested data into a more recent sstable
         if (sstablesIterated > cfs.getMinimumCompactionThreshold()
@@ -777,19 +802,15 @@
 
             try (UnfilteredRowIterator iter = result.unfilteredIterator(columnFilter(), Slices.ALL, false))
             {
-                final Mutation mutation = new Mutation(PartitionUpdate.fromIterator(iter));
-                StageManager.getStage(Stage.MUTATION).execute(new Runnable()
-                {
-                    public void run()
-                    {
-                        // skipping commitlog and index updates is fine since we're just de-fragmenting existing data
-                        Keyspace.open(mutation.getKeyspaceName()).apply(mutation, false, false);
-                    }
+                final Mutation mutation = new Mutation(PartitionUpdate.fromIterator(iter, columnFilter()));
+                StageManager.getStage(Stage.MUTATION).execute(() -> {
+                    // skipping commitlog and index updates is fine since we're just de-fragmenting existing data
+                    Keyspace.open(mutation.getKeyspaceName()).apply(mutation, false, false);
                 });
             }
         }
 
-        return result.unfilteredIterator(columnFilter(), Slices.ALL, clusteringIndexFilter().isReversed());
+        return withStateTracking(result.unfilteredIterator(columnFilter(), Slices.ALL, clusteringIndexFilter().isReversed()));
     }
 
     private ImmutableBTreePartition add(UnfilteredRowIterator iter, ImmutableBTreePartition result, ClusteringIndexNamesFilter filter, boolean isRepaired)
@@ -946,7 +967,7 @@
 
         public static Group one(SinglePartitionReadCommand command)
         {
-            return new Group(Collections.<SinglePartitionReadCommand>singletonList(command), command.limits());
+            return new Group(Collections.singletonList(command), command.limits());
         }
 
         public PartitionIterator execute(ConsistencyLevel consistency, ClientState clientState) throws RequestExecutionException
@@ -969,18 +990,18 @@
             return commands.get(0).metadata();
         }
 
-        public ReadOrderGroup startOrderGroup()
+        public ReadExecutionController executionController()
         {
             // Note that the only difference between the commands in a group must be the partition key on which
             // they apply. So as far as the ReadExecutionController is concerned, we can use any of the commands to start one.
-            return commands.get(0).startOrderGroup();
+            return commands.get(0).executionController();
         }
 
-        public PartitionIterator executeInternal(ReadOrderGroup orderGroup)
+        public PartitionIterator executeInternal(ReadExecutionController controller)
         {
             List<PartitionIterator> partitions = new ArrayList<>(commands.size());
             for (SinglePartitionReadCommand cmd : commands)
-                partitions.add(cmd.executeInternal(orderGroup));
+                partitions.add(cmd.executeInternal(controller));
 
             // Because we only enforce the limit per command, we need to enforce it globally.
             return limits.filter(PartitionIterators.concat(partitions), nowInSec);
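
The sstable selection above hinges on one invariant: visiting sstables in descending maxTimestamp order means that, once a partition-level tombstone newer than an sstable's maxTimestamp has been seen, that sstable and every remaining one can only contain shadowed data. A self-contained sketch of that pruning loop (illustrative types, not the Cassandra classes):

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class TimestampOrderedPruning
{
    // Minimal stand-in for the per-sstable stats the loop actually needs.
    static final class Table
    {
        final long maxTimestamp;                 // newest write in this sstable
        final long partitionDeletionTimestamp;   // partition tombstone, Long.MIN_VALUE if none

        Table(long maxTimestamp, long partitionDeletionTimestamp)
        {
            this.maxTimestamp = maxTimestamp;
            this.partitionDeletionTimestamp = partitionDeletionTimestamp;
        }
    }

    // Returns the sstables that may still contain data not shadowed by a newer partition tombstone.
    static List<Table> prune(List<Table> sstables)
    {
        sstables.sort(Comparator.comparingLong((Table t) -> t.maxTimestamp).reversed());
        long mostRecentPartitionTombstone = Long.MIN_VALUE;
        List<Table> selected = new ArrayList<>();
        for (Table t : sstables)
        {
            // Everything in this sstable is older than a tombstone we already saw; because the list
            // is sorted newest-first, all remaining sstables are skippable too.
            if (t.maxTimestamp < mostRecentPartitionTombstone)
                break;
            selected.add(t);
            mostRecentPartitionTombstone = Math.max(mostRecentPartitionTombstone, t.partitionDeletionTimestamp);
        }
        return selected;
    }
}
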
diff --git a/src/java/org/apache/cassandra/db/Slice.java b/src/java/org/apache/cassandra/db/Slice.java
index 7fde45e..c3da222 100644
--- a/src/java/org/apache/cassandra/db/Slice.java
+++ b/src/java/org/apache/cassandra/db/Slice.java
@@ -38,10 +38,10 @@
     public static final Serializer serializer = new Serializer();
 
     /** The slice selecting all rows (of a given partition) */
-    public static final Slice ALL = new Slice(Bound.BOTTOM, Bound.TOP)
+    public static final Slice ALL = new Slice(ClusteringBound.BOTTOM, ClusteringBound.TOP)
     {
         @Override
-        public boolean selects(ClusteringComparator comparator, Clustering clustering)
+        public boolean includes(ClusteringComparator comparator, ClusteringPrefix clustering)
         {
             return true;
         }
@@ -59,19 +59,19 @@
         }
     };
 
-    private final Bound start;
-    private final Bound end;
+    private final ClusteringBound start;
+    private final ClusteringBound end;
 
-    private Slice(Bound start, Bound end)
+    private Slice(ClusteringBound start, ClusteringBound end)
     {
         assert start.isStart() && end.isEnd();
         this.start = start;
         this.end = end;
     }
 
-    public static Slice make(Bound start, Bound end)
+    public static Slice make(ClusteringBound start, ClusteringBound end)
     {
-        if (start == Bound.BOTTOM && end == Bound.TOP)
+        if (start == ClusteringBound.BOTTOM && end == ClusteringBound.TOP)
             return ALL;
 
         return new Slice(start, end);
@@ -95,7 +95,7 @@
         // This doesn't give us what we want with the clustering prefix
         assert clustering != Clustering.STATIC_CLUSTERING;
         ByteBuffer[] values = extractValues(clustering);
-        return new Slice(Bound.inclusiveStartOf(values), Bound.inclusiveEndOf(values));
+        return new Slice(ClusteringBound.inclusiveStartOf(values), ClusteringBound.inclusiveEndOf(values));
     }
 
     public static Slice make(Clustering start, Clustering end)
@@ -106,7 +106,7 @@
         ByteBuffer[] startValues = extractValues(start);
         ByteBuffer[] endValues = extractValues(end);
 
-        return new Slice(Bound.inclusiveStartOf(startValues), Bound.inclusiveEndOf(endValues));
+        return new Slice(ClusteringBound.inclusiveStartOf(startValues), ClusteringBound.inclusiveEndOf(endValues));
     }
 
     private static ByteBuffer[] extractValues(ClusteringPrefix clustering)
@@ -117,22 +117,22 @@
         return values;
     }
 
-    public Bound start()
+    public ClusteringBound start()
     {
         return start;
     }
 
-    public Bound end()
+    public ClusteringBound end()
     {
         return end;
     }
 
-    public Bound open(boolean reversed)
+    public ClusteringBound open(boolean reversed)
     {
         return reversed ? end : start;
     }
 
-    public Bound close(boolean reversed)
+    public ClusteringBound close(boolean reversed)
     {
         return reversed ? start : end;
     }
@@ -157,34 +157,21 @@
      * @return whether the slice formed by {@code start} and {@code end} is
      * empty or not.
      */
-    public static boolean isEmpty(ClusteringComparator comparator, Slice.Bound start, Slice.Bound end)
+    public static boolean isEmpty(ClusteringComparator comparator, ClusteringBound start, ClusteringBound end)
     {
         assert start.isStart() && end.isEnd();
         return comparator.compare(end, start) < 0;
     }
 
     /**
-     * Returns whether a given clustering is selected by this slice.
-     *
-     * @param comparator the comparator for the table this is a slice of.
-     * @param clustering the clustering to test inclusion of.
-     *
-     * @return whether {@code clustering} is selected by this slice.
-     */
-    public boolean selects(ClusteringComparator comparator, Clustering clustering)
-    {
-        return comparator.compare(start, clustering) <= 0 && comparator.compare(clustering, end) <= 0;
-    }
-
-    /**
-     * Returns whether a given bound is included in this slice.
+     * Returns whether a given clustering or bound is included in this slice.
      *
      * @param comparator the comparator for the table this is a slice of.
      * @param bound the bound to test inclusion of.
      *
      * @return whether {@code bound} is within the bounds of this slice.
      */
-    public boolean includes(ClusteringComparator comparator, Bound bound)
+    public boolean includes(ClusteringComparator comparator, ClusteringPrefix bound)
     {
         return comparator.compare(start, bound) <= 0 && comparator.compare(bound, end) <= 0;
     }
@@ -218,7 +205,7 @@
                 return this;
 
             ByteBuffer[] values = extractValues(lastReturned);
-            return new Slice(start, inclusive ? Bound.inclusiveEndOf(values) : Bound.exclusiveEndOf(values));
+            return new Slice(start, inclusive ? ClusteringBound.inclusiveEndOf(values) : ClusteringBound.exclusiveEndOf(values));
         }
         else
         {
@@ -231,7 +218,7 @@
                 return this;
 
             ByteBuffer[] values = extractValues(lastReturned);
-            return new Slice(inclusive ? Bound.inclusiveStartOf(values) : Bound.exclusiveStartOf(values), end);
+            return new Slice(inclusive ? ClusteringBound.inclusiveStartOf(values) : ClusteringBound.exclusiveStartOf(values), end);
         }
     }
 
@@ -320,238 +307,21 @@
     {
         public void serialize(Slice slice, DataOutputPlus out, int version, List<AbstractType<?>> types) throws IOException
         {
-            Bound.serializer.serialize(slice.start, out, version, types);
-            Bound.serializer.serialize(slice.end, out, version, types);
+            ClusteringBound.serializer.serialize(slice.start, out, version, types);
+            ClusteringBound.serializer.serialize(slice.end, out, version, types);
         }
 
         public long serializedSize(Slice slice, int version, List<AbstractType<?>> types)
         {
-            return Bound.serializer.serializedSize(slice.start, version, types)
-                 + Bound.serializer.serializedSize(slice.end, version, types);
+            return ClusteringBound.serializer.serializedSize(slice.start, version, types)
+                 + ClusteringBound.serializer.serializedSize(slice.end, version, types);
         }
 
         public Slice deserialize(DataInputPlus in, int version, List<AbstractType<?>> types) throws IOException
         {
-            Bound start = Bound.serializer.deserialize(in, version, types);
-            Bound end = Bound.serializer.deserialize(in, version, types);
+            ClusteringBound start = (ClusteringBound) ClusteringBound.serializer.deserialize(in, version, types);
+            ClusteringBound end = (ClusteringBound) ClusteringBound.serializer.deserialize(in, version, types);
             return new Slice(start, end);
         }
     }
-
-    /**
-     * The bound of a slice.
-     * <p>
-     * This can be either a start or an end bound, and this can be either inclusive or exclusive.
-     */
-    public static class Bound extends AbstractClusteringPrefix
-    {
-        public static final Serializer serializer = new Serializer();
-
-        /**
-         * The smallest and biggest bound. Note that as range tomstone bounds are (special case) of slice bounds,
-         * we want the BOTTOM and TOP to be the same object, but we alias them here because it's cleaner when dealing
-         * with slices to refer to Slice.Bound.BOTTOM and Slice.Bound.TOP.
-         */
-        public static final Bound BOTTOM = RangeTombstone.Bound.BOTTOM;
-        public static final Bound TOP = RangeTombstone.Bound.TOP;
-
-        protected Bound(Kind kind, ByteBuffer[] values)
-        {
-            super(kind, values);
-        }
-
-        public static Bound create(Kind kind, ByteBuffer[] values)
-        {
-            assert !kind.isBoundary();
-            return new Bound(kind, values);
-        }
-
-        public static Kind boundKind(boolean isStart, boolean isInclusive)
-        {
-            return isStart
-                 ? (isInclusive ? Kind.INCL_START_BOUND : Kind.EXCL_START_BOUND)
-                 : (isInclusive ? Kind.INCL_END_BOUND : Kind.EXCL_END_BOUND);
-        }
-
-        public static Bound inclusiveStartOf(ByteBuffer... values)
-        {
-            return create(Kind.INCL_START_BOUND, values);
-        }
-
-        public static Bound inclusiveEndOf(ByteBuffer... values)
-        {
-            return create(Kind.INCL_END_BOUND, values);
-        }
-
-        public static Bound exclusiveStartOf(ByteBuffer... values)
-        {
-            return create(Kind.EXCL_START_BOUND, values);
-        }
-
-        public static Bound exclusiveEndOf(ByteBuffer... values)
-        {
-            return create(Kind.EXCL_END_BOUND, values);
-        }
-
-        public static Bound inclusiveStartOf(ClusteringPrefix prefix)
-        {
-            ByteBuffer[] values = new ByteBuffer[prefix.size()];
-            for (int i = 0; i < prefix.size(); i++)
-                values[i] = prefix.get(i);
-            return inclusiveStartOf(values);
-        }
-
-        public static Bound exclusiveStartOf(ClusteringPrefix prefix)
-        {
-            ByteBuffer[] values = new ByteBuffer[prefix.size()];
-            for (int i = 0; i < prefix.size(); i++)
-                values[i] = prefix.get(i);
-            return exclusiveStartOf(values);
-        }
-
-        public static Bound inclusiveEndOf(ClusteringPrefix prefix)
-        {
-            ByteBuffer[] values = new ByteBuffer[prefix.size()];
-            for (int i = 0; i < prefix.size(); i++)
-                values[i] = prefix.get(i);
-            return inclusiveEndOf(values);
-        }
-
-        public static Bound create(ClusteringComparator comparator, boolean isStart, boolean isInclusive, Object... values)
-        {
-            CBuilder builder = CBuilder.create(comparator);
-            for (Object val : values)
-            {
-                if (val instanceof ByteBuffer)
-                    builder.add((ByteBuffer) val);
-                else
-                    builder.add(val);
-            }
-            return builder.buildBound(isStart, isInclusive);
-        }
-
-        public Bound withNewKind(Kind kind)
-        {
-            assert !kind.isBoundary();
-            return new Bound(kind, values);
-        }
-
-        public boolean isStart()
-        {
-            return kind().isStart();
-        }
-
-        public boolean isEnd()
-        {
-            return !isStart();
-        }
-
-        public boolean isInclusive()
-        {
-            return kind == Kind.INCL_START_BOUND || kind == Kind.INCL_END_BOUND;
-        }
-
-        public boolean isExclusive()
-        {
-            return kind == Kind.EXCL_START_BOUND || kind == Kind.EXCL_END_BOUND;
-        }
-
-        /**
-         * Returns the inverse of the current bound.
-         * <p>
-         * This invert both start into end (and vice-versa) and inclusive into exclusive (and vice-versa).
-         *
-         * @return the invert of this bound. For instance, if this bound is an exlusive start, this return
-         * an inclusive end with the same values.
-         */
-        public Slice.Bound invert()
-        {
-            return withNewKind(kind().invert());
-        }
-
-        // For use by intersects, it's called with the sstable bound opposite to the slice bound
-        // (so if the slice bound is a start, it's call with the max sstable bound)
-        private int compareTo(ClusteringComparator comparator, List<ByteBuffer> sstableBound)
-        {
-            for (int i = 0; i < sstableBound.size(); i++)
-            {
-                // Say the slice bound is a start. It means we're in the case where the max
-                // sstable bound is say (1:5) while the slice start is (1). So the start
-                // does start before the sstable end bound (and intersect it). It's the exact
-                // inverse with a end slice bound.
-                if (i >= size())
-                    return isStart() ? -1 : 1;
-
-                int cmp = comparator.compareComponent(i, get(i), sstableBound.get(i));
-                if (cmp != 0)
-                    return cmp;
-            }
-
-            // Say the slice bound is a start. I means we're in the case where the max
-            // sstable bound is say (1), while the slice start is (1:5). This again means
-            // that the slice start before the end bound.
-            if (size() > sstableBound.size())
-                return isStart() ? -1 : 1;
-
-            // The slice bound is equal to the sstable bound. Results depends on whether the slice is inclusive or not
-            return isInclusive() ? 0 : (isStart() ? 1 : -1);
-        }
-
-        public String toString(CFMetaData metadata)
-        {
-            return toString(metadata.comparator);
-        }
-
-        public String toString(ClusteringComparator comparator)
-        {
-            StringBuilder sb = new StringBuilder();
-            sb.append(kind()).append('(');
-            for (int i = 0; i < size(); i++)
-            {
-                if (i > 0)
-                    sb.append(", ");
-                sb.append(comparator.subtype(i).getString(get(i)));
-            }
-            return sb.append(')').toString();
-        }
-
-        /**
-         * Serializer for slice bounds.
-         * <p>
-         * Contrarily to {@code Clustering}, a slice bound can be a true prefix of the full clustering, so we actually record
-         * its size.
-         */
-        public static class Serializer
-        {
-            public void serialize(Slice.Bound bound, DataOutputPlus out, int version, List<AbstractType<?>> types) throws IOException
-            {
-                out.writeByte(bound.kind().ordinal());
-                out.writeShort(bound.size());
-                ClusteringPrefix.serializer.serializeValuesWithoutSize(bound, out, version, types);
-            }
-
-            public long serializedSize(Slice.Bound bound, int version, List<AbstractType<?>> types)
-            {
-                return 1 // kind ordinal
-                     + TypeSizes.sizeof((short)bound.size())
-                     + ClusteringPrefix.serializer.valuesWithoutSizeSerializedSize(bound, version, types);
-            }
-
-            public Slice.Bound deserialize(DataInputPlus in, int version, List<AbstractType<?>> types) throws IOException
-            {
-                Kind kind = Kind.values()[in.readByte()];
-                return deserializeValues(in, kind, version, types);
-            }
-
-            public Slice.Bound deserializeValues(DataInputPlus in, Kind kind, int version, List<AbstractType<?>> types) throws IOException
-            {
-                int size = in.readUnsignedShort();
-                if (size == 0)
-                    return kind.isStart() ? BOTTOM : TOP;
-
-                ByteBuffer[] values = ClusteringPrefix.serializer.deserializeValuesWithoutSize(in, size, version, types);
-                return Slice.Bound.create(kind, values);
-            }
-        }
-    }
 }
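
A small sketch of how a Slice is built and tested after this change, using the factory methods that moved from Slice.Bound to ClusteringBound and the widened includes() signature; the single int clustering column, its values, and the comparator construction are assumptions made for illustration:

ClusteringComparator comparator = new ClusteringComparator(Int32Type.instance);

ClusteringBound start = ClusteringBound.inclusiveStartOf(Int32Type.instance.decompose(10));
ClusteringBound end   = ClusteringBound.exclusiveEndOf(Int32Type.instance.decompose(20));
Slice slice = Slice.make(start, end);

// includes() now takes any ClusteringPrefix, so concrete clusterings and bounds share one check.
boolean inRange = slice.includes(comparator, Clustering.make(Int32Type.instance.decompose(15)));
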
diff --git a/src/java/org/apache/cassandra/db/Slices.java b/src/java/org/apache/cassandra/db/Slices.java
index bb354a1..f880781 100644
--- a/src/java/org/apache/cassandra/db/Slices.java
+++ b/src/java/org/apache/cassandra/db/Slices.java
@@ -59,7 +59,7 @@
      */
     public static Slices with(ClusteringComparator comparator, Slice slice)
     {
-        if (slice.start() == Slice.Bound.BOTTOM && slice.end() == Slice.Bound.TOP)
+        if (slice.start() == ClusteringBound.BOTTOM && slice.end() == ClusteringBound.TOP)
             return Slices.ALL;
 
         assert comparator.compare(slice.start(), slice.end()) <= 0;
@@ -141,16 +141,6 @@
      */
     public abstract boolean intersects(List<ByteBuffer> minClusteringValues, List<ByteBuffer> maxClusteringValues);
 
-    /**
-     * Given a sliceable row iterator, returns a row iterator that only return rows selected by the slice of
-     * this {@code Slices} object.
-     *
-     * @param iter the sliceable iterator to filter.
-     *
-     * @return an iterator that only returns the rows (or rather Unfiltered) of {@code iter} that are selected by those slices.
-     */
-    public abstract UnfilteredRowIterator makeSliceIterator(SliceableUnfilteredRowIterator iter);
-
     public abstract String toCQLString(CFMetaData metadata);
 
     /**
@@ -196,7 +186,7 @@
             this.slices = new ArrayList<>(initialSize);
         }
 
-        public Builder add(Slice.Bound start, Slice.Bound end)
+        public Builder add(ClusteringBound start, ClusteringBound end)
         {
             return add(Slice.make(start, end));
         }
@@ -345,7 +335,7 @@
             for (int i = 0; i < size; i++)
                 slices[i] = Slice.serializer.deserialize(in, version, metadata.comparator.subtypes());
 
-            if (size == 1 && slices[0].start() == Slice.Bound.BOTTOM && slices[0].end() == Slice.Bound.TOP)
+            if (size == 1 && slices[0].start() == ClusteringBound.BOTTOM && slices[0].end() == ClusteringBound.TOP)
                 return ALL;
 
             return new ArrayBackedSlices(metadata.comparator, slices);
@@ -459,65 +449,6 @@
             return false;
         }
 
-        public UnfilteredRowIterator makeSliceIterator(final SliceableUnfilteredRowIterator iter)
-        {
-            return new WrappingUnfilteredRowIterator(iter)
-            {
-                private int nextSlice = iter.isReverseOrder() ? slices.length - 1 : 0;
-                private Iterator<Unfiltered> currentSliceIterator = Collections.emptyIterator();
-
-                private Unfiltered next;
-
-                @Override
-                public boolean hasNext()
-                {
-                    prepareNext();
-                    return next != null;
-                }
-
-                @Override
-                public Unfiltered next()
-                {
-                    prepareNext();
-                    Unfiltered toReturn = next;
-                    next = null;
-                    return toReturn;
-                }
-
-                private boolean hasMoreSlice()
-                {
-                    return isReverseOrder()
-                         ? nextSlice >= 0
-                         : nextSlice < slices.length;
-                }
-
-                private Slice popNextSlice()
-                {
-                    return slices[isReverseOrder() ? nextSlice-- : nextSlice++];
-                }
-
-                private void prepareNext()
-                {
-                    if (next != null)
-                        return;
-
-                    while (true)
-                    {
-                        if (currentSliceIterator.hasNext())
-                        {
-                            next = currentSliceIterator.next();
-                            return;
-                        }
-
-                        if (!hasMoreSlice())
-                            return;
-
-                        currentSliceIterator = iter.slice(popNextSlice());
-                    }
-                }
-            };
-        }
-
         public Iterator<Slice> iterator()
         {
             return Iterators.forArray(slices);
@@ -722,8 +653,8 @@
 
             public static ComponentOfSlice fromSlice(int component, Slice slice)
             {
-                Slice.Bound start = slice.start();
-                Slice.Bound end = slice.end();
+                ClusteringBound start = slice.start();
+                ClusteringBound end = slice.end();
 
                 if (component >= start.size() && component >= end.size())
                     return null;
@@ -745,7 +676,7 @@
 
             public boolean isEQ()
             {
-                return startValue.equals(endValue);
+                return Objects.equals(startValue, endValue);
             }
         }
     }
@@ -810,11 +741,6 @@
             return true;
         }
 
-        public UnfilteredRowIterator makeSliceIterator(SliceableUnfilteredRowIterator iter)
-        {
-            return iter;
-        }
-
         public Iterator<Slice> iterator()
         {
             return Iterators.singletonIterator(Slice.ALL);
@@ -890,15 +816,9 @@
             return false;
         }
 
-        public UnfilteredRowIterator makeSliceIterator(SliceableUnfilteredRowIterator iter)
-        {
-            return UnfilteredRowIterators.noRowsIterator(iter.metadata(), iter.partitionKey(), iter.staticRow(),
-                                                         iter.partitionLevelDeletion(), iter.isReverseOrder());
-        }
-
         public Iterator<Slice> iterator()
         {
-            return Iterators.emptyIterator();
+            return Collections.emptyIterator();
         }
 
         @Override
diff --git a/src/java/org/apache/cassandra/db/StorageHook.java b/src/java/org/apache/cassandra/db/StorageHook.java
new file mode 100644
index 0000000..0f27adb
--- /dev/null
+++ b/src/java/org/apache/cassandra/db/StorageHook.java
@@ -0,0 +1,86 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.db;
+
+import java.util.UUID;
+
+import org.apache.cassandra.db.filter.ClusteringIndexFilter;
+import org.apache.cassandra.db.filter.ColumnFilter;
+import org.apache.cassandra.db.partitions.PartitionUpdate;
+import org.apache.cassandra.db.rows.UnfilteredRowIterator;
+import org.apache.cassandra.db.rows.UnfilteredRowIteratorWithLowerBound;
+import org.apache.cassandra.io.sstable.format.SSTableReader;
+import org.apache.cassandra.utils.FBUtilities;
+
+public interface StorageHook
+{
+    public static final StorageHook instance = createHook();
+
+    public void reportWrite(UUID cfid, PartitionUpdate partitionUpdate);
+    public void reportRead(UUID cfid, DecoratedKey key);
+    public UnfilteredRowIteratorWithLowerBound makeRowIteratorWithLowerBound(ColumnFamilyStore cfs,
+                                                                      DecoratedKey partitionKey,
+                                                                      SSTableReader sstable,
+                                                                      ClusteringIndexFilter filter,
+                                                                      ColumnFilter selectedColumns,
+                                                                      boolean isForThrift,
+                                                                      int nowInSec,
+                                                                      boolean applyThriftTransformation);
+    public UnfilteredRowIterator makeRowIterator(ColumnFamilyStore cfs,
+                                                 SSTableReader sstable,
+                                                 DecoratedKey key,
+                                                 Slices slices,
+                                                 ColumnFilter selectedColumns,
+                                                 boolean reversed,
+                                                 boolean isForThrift);
+
+    static StorageHook createHook()
+    {
+        String className =  System.getProperty("cassandra.storage_hook");
+        if (className != null)
+        {
+            return FBUtilities.construct(className, StorageHook.class.getSimpleName());
+        }
+        else
+        {
+            return new StorageHook()
+            {
+                public void reportWrite(UUID cfid, PartitionUpdate partitionUpdate) {}
+
+                public void reportRead(UUID cfid, DecoratedKey key) {}
+
+                public UnfilteredRowIteratorWithLowerBound makeRowIteratorWithLowerBound(ColumnFamilyStore cfs, DecoratedKey partitionKey, SSTableReader sstable, ClusteringIndexFilter filter, ColumnFilter selectedColumns, boolean isForThrift, int nowInSec, boolean applyThriftTransformation)
+                {
+                    return new UnfilteredRowIteratorWithLowerBound(partitionKey,
+                                                                   sstable,
+                                                                   filter,
+                                                                   selectedColumns,
+                                                                   isForThrift,
+                                                                   nowInSec,
+                                                                   applyThriftTransformation);
+                }
+
+                public UnfilteredRowIterator makeRowIterator(ColumnFamilyStore cfs, SSTableReader sstable, DecoratedKey key, Slices slices, ColumnFilter selectedColumns, boolean reversed, boolean isForThrift)
+                {
+                    return sstable.iterator(key, slices, selectedColumns, reversed, isForThrift);
+                }
+            };
+        }
+    }
+}
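
StorageHook.instance above is resolved once, at class initialization, from the cassandra.storage_hook system property via FBUtilities.construct, so a custom implementation can be plugged in without patching the tree. Below is a minimal sketch of such a plug-in; the class and package names are hypothetical, and it delegates iterator creation exactly as the anonymous default hook does, only adding counters.

    package com.example.hooks; // hypothetical package, not part of this patch

    import java.util.UUID;
    import java.util.concurrent.atomic.AtomicLong;

    import org.apache.cassandra.db.ColumnFamilyStore;
    import org.apache.cassandra.db.DecoratedKey;
    import org.apache.cassandra.db.Slices;
    import org.apache.cassandra.db.StorageHook;
    import org.apache.cassandra.db.filter.ClusteringIndexFilter;
    import org.apache.cassandra.db.filter.ColumnFilter;
    import org.apache.cassandra.db.partitions.PartitionUpdate;
    import org.apache.cassandra.db.rows.UnfilteredRowIterator;
    import org.apache.cassandra.db.rows.UnfilteredRowIteratorWithLowerBound;
    import org.apache.cassandra.io.sstable.format.SSTableReader;

    // Counts reads and writes while otherwise behaving like the default hook.
    public class CountingStorageHook implements StorageHook
    {
        private final AtomicLong writes = new AtomicLong();
        private final AtomicLong reads = new AtomicLong();

        public void reportWrite(UUID cfid, PartitionUpdate partitionUpdate)
        {
            writes.incrementAndGet();
        }

        public void reportRead(UUID cfid, DecoratedKey key)
        {
            reads.incrementAndGet();
        }

        public UnfilteredRowIteratorWithLowerBound makeRowIteratorWithLowerBound(ColumnFamilyStore cfs, DecoratedKey partitionKey, SSTableReader sstable, ClusteringIndexFilter filter, ColumnFilter selectedColumns, boolean isForThrift, int nowInSec, boolean applyThriftTransformation)
        {
            // Same construction the default hook uses above.
            return new UnfilteredRowIteratorWithLowerBound(partitionKey, sstable, filter, selectedColumns,
                                                           isForThrift, nowInSec, applyThriftTransformation);
        }

        public UnfilteredRowIterator makeRowIterator(ColumnFamilyStore cfs, SSTableReader sstable, DecoratedKey key, Slices slices, ColumnFilter selectedColumns, boolean reversed, boolean isForThrift)
        {
            // Same delegation the default hook uses above.
            return sstable.iterator(key, slices, selectedColumns, reversed, isForThrift);
        }
    }

It would be enabled by starting the node with -Dcassandra.storage_hook=com.example.hooks.CountingStorageHook, assuming a public no-arg constructor for FBUtilities.construct to instantiate.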
diff --git a/src/java/org/apache/cassandra/db/SystemKeyspace.java b/src/java/org/apache/cassandra/db/SystemKeyspace.java
index da96b38..584279d 100644
--- a/src/java/org/apache/cassandra/db/SystemKeyspace.java
+++ b/src/java/org/apache/cassandra/db/SystemKeyspace.java
@@ -41,7 +41,7 @@
 import org.apache.cassandra.cql3.QueryProcessor;
 import org.apache.cassandra.cql3.UntypedResultSet;
 import org.apache.cassandra.cql3.functions.*;
-import org.apache.cassandra.db.commitlog.ReplayPosition;
+import org.apache.cassandra.db.commitlog.CommitLogPosition;
 import org.apache.cassandra.db.compaction.CompactionHistoryTabularData;
 import org.apache.cassandra.db.marshal.*;
 import org.apache.cassandra.db.partitions.PartitionUpdate;
@@ -262,6 +262,7 @@
                 "CREATE TABLE %s ("
                 + "keyspace_name text,"
                 + "view_name text,"
+                + "status_replicated boolean,"
                 + "PRIMARY KEY ((keyspace_name), view_name))");
 
     @Deprecated
@@ -450,10 +451,11 @@
                         .add(TimeFcts.all())
                         .add(BytesConversionFcts.all())
                         .add(AggregateFcts.all())
+                        .add(CastFcts.all())
                         .build();
     }
 
-    private static volatile Map<UUID, Pair<ReplayPosition, Long>> truncationRecords;
+    private static volatile Map<UUID, Pair<CommitLogPosition, Long>> truncationRecords;
 
     public enum BootstrapState
     {
@@ -534,13 +536,23 @@
         return !result.isEmpty();
     }
 
-    public static void setViewBuilt(String keyspaceName, String viewName)
+    public static boolean isViewStatusReplicated(String keyspaceName, String viewName)
     {
-        String req = "INSERT INTO %s.\"%s\" (keyspace_name, view_name) VALUES (?, ?)";
-        executeInternal(String.format(req, NAME, BUILT_VIEWS), keyspaceName, viewName);
-        forceBlockingFlush(BUILT_VIEWS);
+        String req = "SELECT status_replicated FROM %s.\"%s\" WHERE keyspace_name=? AND view_name=?";
+        UntypedResultSet result = executeInternal(String.format(req, NAME, BUILT_VIEWS), keyspaceName, viewName);
+
+        if (result.isEmpty())
+            return false;
+        UntypedResultSet.Row row = result.one();
+        return row.has("status_replicated") && row.getBoolean("status_replicated");
     }
 
+    public static void setViewBuilt(String keyspaceName, String viewName, boolean replicated)
+    {
+        String req = "INSERT INTO %s.\"%s\" (keyspace_name, view_name, status_replicated) VALUES (?, ?, ?)";
+        executeInternal(String.format(req, NAME, BUILT_VIEWS), keyspaceName, viewName, replicated);
+        forceBlockingFlush(BUILT_VIEWS);
+    }
 
     public static void setViewRemoved(String keyspaceName, String viewName)
     {
@@ -566,14 +578,18 @@
         // We flush the view built first, because if we fail now, we'll restart at the last place we checkpointed
         // view build.
         // If we flush the delete first, we'll have to restart from the beginning.
-        // Also, if the build succeeded, but the view build failed, we will be able to skip the view build check
-        // next boot.
-        setViewBuilt(ksname, viewName);
-        forceBlockingFlush(BUILT_VIEWS);
+        // Also, if writing to built_views succeeds but the views_builds_in_progress deletion fails, we will be able
+        // to skip the view build next boot.
+        setViewBuilt(ksname, viewName, false);
         executeInternal(String.format("DELETE FROM system.%s WHERE keyspace_name = ? AND view_name = ?", VIEWS_BUILDS_IN_PROGRESS), ksname, viewName);
         forceBlockingFlush(VIEWS_BUILDS_IN_PROGRESS);
     }
 
+    public static void setViewBuiltReplicated(String ksname, String viewName)
+    {
+        setViewBuilt(ksname, viewName, true);
+    }
+
     public static void updateViewBuildStatus(String ksname, String viewName, Token token)
     {
         String req = "INSERT INTO system.%s (keyspace_name, view_name, last_token) VALUES (?, ?, ?)";
@@ -603,7 +619,7 @@
         return Pair.create(generation, lastKey);
     }
 
-    public static synchronized void saveTruncationRecord(ColumnFamilyStore cfs, long truncatedAt, ReplayPosition position)
+    public static synchronized void saveTruncationRecord(ColumnFamilyStore cfs, long truncatedAt, CommitLogPosition position)
     {
         String req = "UPDATE system.%s SET truncated_at = truncated_at + ? WHERE key = '%s'";
         executeInternal(String.format(req, LOCAL, LOCAL), truncationAsMapEntry(cfs, truncatedAt, position));
@@ -622,44 +638,49 @@
         forceBlockingFlush(LOCAL);
     }
 
-    private static Map<UUID, ByteBuffer> truncationAsMapEntry(ColumnFamilyStore cfs, long truncatedAt, ReplayPosition position)
+    private static Map<UUID, ByteBuffer> truncationAsMapEntry(ColumnFamilyStore cfs, long truncatedAt, CommitLogPosition position)
     {
-        try (DataOutputBuffer out = new DataOutputBuffer())
+        DataOutputBuffer out = null;
+        try (DataOutputBuffer ignored = out = DataOutputBuffer.RECYCLER.get())
         {
-            ReplayPosition.serializer.serialize(position, out);
+            CommitLogPosition.serializer.serialize(position, out);
             out.writeLong(truncatedAt);
-            return singletonMap(cfs.metadata.cfId, ByteBuffer.wrap(out.getData(), 0, out.getLength()));
+            return singletonMap(cfs.metadata.cfId, out.asNewBuffer());
         }
         catch (IOException e)
         {
             throw new RuntimeException(e);
         }
+        finally
+        {
+            out.recycle();
+        }
     }
 
-    public static ReplayPosition getTruncatedPosition(UUID cfId)
+    public static CommitLogPosition getTruncatedPosition(UUID cfId)
     {
-        Pair<ReplayPosition, Long> record = getTruncationRecord(cfId);
+        Pair<CommitLogPosition, Long> record = getTruncationRecord(cfId);
         return record == null ? null : record.left;
     }
 
     public static long getTruncatedAt(UUID cfId)
     {
-        Pair<ReplayPosition, Long> record = getTruncationRecord(cfId);
+        Pair<CommitLogPosition, Long> record = getTruncationRecord(cfId);
         return record == null ? Long.MIN_VALUE : record.right;
     }
 
-    private static synchronized Pair<ReplayPosition, Long> getTruncationRecord(UUID cfId)
+    private static synchronized Pair<CommitLogPosition, Long> getTruncationRecord(UUID cfId)
     {
         if (truncationRecords == null)
             truncationRecords = readTruncationRecords();
         return truncationRecords.get(cfId);
     }
 
-    private static Map<UUID, Pair<ReplayPosition, Long>> readTruncationRecords()
+    private static Map<UUID, Pair<CommitLogPosition, Long>> readTruncationRecords()
     {
         UntypedResultSet rows = executeInternal(String.format("SELECT truncated_at FROM system.%s WHERE key = '%s'", LOCAL, LOCAL));
 
-        Map<UUID, Pair<ReplayPosition, Long>> records = new HashMap<>();
+        Map<UUID, Pair<CommitLogPosition, Long>> records = new HashMap<>();
 
         if (!rows.isEmpty() && rows.one().has("truncated_at"))
         {
@@ -671,11 +692,11 @@
         return records;
     }
 
-    private static Pair<ReplayPosition, Long> truncationRecordFromBlob(ByteBuffer bytes)
+    private static Pair<CommitLogPosition, Long> truncationRecordFromBlob(ByteBuffer bytes)
     {
         try (RebufferingInputStream in = new DataInputBuffer(bytes, true))
         {
-            return Pair.create(ReplayPosition.serializer.deserialize(in), in.available() > 0 ? in.readLong() : Long.MIN_VALUE);
+            return Pair.create(CommitLogPosition.serializer.deserialize(in), in.available() > 0 ? in.readLong() : Long.MIN_VALUE);
         }
         catch (IOException e)
         {
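
The new status_replicated column distinguishes "the view is built locally" from "the built status has also been announced to the rest of the cluster". A hedged sketch of how the new helpers could be combined after a restart; the surrounding logic and announceViewBuiltToPeers are hypothetical, only the SystemKeyspace calls come from this patch (isViewBuilt is the pre-existing check whose tail is visible above).

    // Illustrative only: re-announce a locally built view whose status was never replicated.
    if (SystemKeyspace.isViewBuilt(ksname, viewName) && !SystemKeyspace.isViewStatusReplicated(ksname, viewName))
    {
        announceViewBuiltToPeers(ksname, viewName);               // hypothetical announcement step
        SystemKeyspace.setViewBuiltReplicated(ksname, viewName);  // record that the announcement went out
    }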
diff --git a/src/java/org/apache/cassandra/db/UnfilteredDeserializer.java b/src/java/org/apache/cassandra/db/UnfilteredDeserializer.java
index bf9c2b8..1ab96fa 100644
--- a/src/java/org/apache/cassandra/db/UnfilteredDeserializer.java
+++ b/src/java/org/apache/cassandra/db/UnfilteredDeserializer.java
@@ -81,7 +81,7 @@
      * comparison. Whenever we know what to do with this atom (read it or skip it),
      * readNext or skipNext should be called.
      */
-    public abstract int compareNextTo(Slice.Bound bound) throws IOException;
+    public abstract int compareNextTo(ClusteringBound bound) throws IOException;
 
     /**
      * Returns whether the next atom is a row or not.
@@ -173,7 +173,7 @@
             isReady = true;
         }
 
-        public int compareNextTo(Slice.Bound bound) throws IOException
+        public int compareNextTo(ClusteringBound bound) throws IOException
         {
             if (!isReady)
                 prepareNext();
@@ -202,7 +202,7 @@
             isReady = false;
             if (UnfilteredSerializer.kind(nextFlags) == Unfiltered.Kind.RANGE_TOMBSTONE_MARKER)
             {
-                RangeTombstone.Bound bound = clusteringDeserializer.deserializeNextBound();
+                ClusteringBoundOrBoundary bound = clusteringDeserializer.deserializeNextBound();
                 return UnfilteredSerializer.serializer.deserializeMarkerBody(in, header, bound);
             }
             else
@@ -326,7 +326,7 @@
             return tombstone.isCollectionTombstone() || tombstone.isRowDeletion(metadata);
         }
 
-        public int compareNextTo(Slice.Bound bound) throws IOException
+        public int compareNextTo(ClusteringBound bound) throws IOException
         {
             if (!hasNext())
                 throw new IllegalStateException();
@@ -401,7 +401,7 @@
             {
                 this.grouper = new LegacyLayout.CellGrouper(metadata, helper);
                 this.tombstoneTracker = new TombstoneTracker(partitionDeletion);
-                this.atoms = new AtomIterator(tombstoneTracker);
+                this.atoms = new AtomIterator();
             }
 
             public boolean hasNext()
@@ -432,7 +432,7 @@
                         return false;
                     }
                 }
-                return next != null;
+                return true;
             }
 
             private Unfiltered readRow(LegacyLayout.LegacyAtom first)
@@ -485,13 +485,11 @@
         // the internal state of the iterator so it's cleaner to do it ourselves.
         private class AtomIterator implements PeekingIterator<LegacyLayout.LegacyAtom>
         {
-            private final TombstoneTracker tombstoneTracker;
             private boolean isDone;
             private LegacyLayout.LegacyAtom next;
 
-            private AtomIterator(TombstoneTracker tombstoneTracker)
+            private AtomIterator()
             {
-                this.tombstoneTracker = tombstoneTracker;
             }
 
             public boolean hasNext()
diff --git a/src/java/org/apache/cassandra/db/WindowsFailedSnapshotTracker.java b/src/java/org/apache/cassandra/db/WindowsFailedSnapshotTracker.java
index 7cc7893..134fb11 100644
--- a/src/java/org/apache/cassandra/db/WindowsFailedSnapshotTracker.java
+++ b/src/java/org/apache/cassandra/db/WindowsFailedSnapshotTracker.java
@@ -85,8 +85,7 @@
             }
             catch (IOException e)
             {
-                logger.warn("Failed to open {}. Obsolete snapshots from previous runs will not be deleted.", TODELETEFILE);
-                logger.warn("Exception: " + e);
+                logger.warn("Failed to open {}. Obsolete snapshots from previous runs will not be deleted.", TODELETEFILE, e);
             }
         }
 
diff --git a/src/java/org/apache/cassandra/db/WriteType.java b/src/java/org/apache/cassandra/db/WriteType.java
index fdbe97d..11909e7 100644
--- a/src/java/org/apache/cassandra/db/WriteType.java
+++ b/src/java/org/apache/cassandra/db/WriteType.java
@@ -25,5 +25,6 @@
     COUNTER,
     BATCH_LOG,
     CAS,
-    VIEW;
+    VIEW,
+    CDC;
 }
diff --git a/src/java/org/apache/cassandra/db/columniterator/AbstractSSTableIterator.java b/src/java/org/apache/cassandra/db/columniterator/AbstractSSTableIterator.java
index d57e6bc..005bb2c 100644
--- a/src/java/org/apache/cassandra/db/columniterator/AbstractSSTableIterator.java
+++ b/src/java/org/apache/cassandra/db/columniterator/AbstractSSTableIterator.java
@@ -18,22 +18,23 @@
 package org.apache.cassandra.db.columniterator;
 
 import java.io.IOException;
-import java.util.Collections;
+import java.util.Comparator;
 import java.util.Iterator;
-import java.util.List;
+import java.util.NoSuchElementException;
 
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.db.*;
 import org.apache.cassandra.db.filter.ColumnFilter;
 import org.apache.cassandra.db.rows.*;
+import org.apache.cassandra.io.sstable.IndexInfo;
 import org.apache.cassandra.io.sstable.format.SSTableReader;
 import org.apache.cassandra.io.sstable.CorruptSSTableException;
-import org.apache.cassandra.io.sstable.IndexHelper;
 import org.apache.cassandra.io.util.FileDataInput;
 import org.apache.cassandra.io.util.DataPosition;
+import org.apache.cassandra.io.util.SegmentedFile;
 import org.apache.cassandra.utils.ByteBufferUtil;
 
-abstract class AbstractSSTableIterator implements SliceableUnfilteredRowIterator
+public abstract class AbstractSSTableIterator implements UnfilteredRowIterator
 {
     protected final SSTableReader sstable;
     protected final DecoratedKey key;
@@ -46,20 +47,29 @@
 
     private final boolean isForThrift;
 
+    protected final SegmentedFile ifile;
+
     private boolean isClosed;
 
+    protected final Slices slices;
+    protected int slice;
+
     @SuppressWarnings("resource") // We need this because the analysis is not able to determine that we do close
                                   // file on every path where we created it.
     protected AbstractSSTableIterator(SSTableReader sstable,
                                       FileDataInput file,
                                       DecoratedKey key,
                                       RowIndexEntry indexEntry,
+                                      Slices slices,
                                       ColumnFilter columnFilter,
-                                      boolean isForThrift)
+                                      boolean isForThrift,
+                                      SegmentedFile ifile)
     {
         this.sstable = sstable;
+        this.ifile = ifile;
         this.key = key;
         this.columns = columnFilter;
+        this.slices = slices;
         this.helper = new SerializationHelper(sstable.metadata, sstable.descriptor.version.correspondingMessagingVersion(), SerializationHelper.Flag.LOCAL, columnFilter);
         this.isForThrift = isForThrift;
 
@@ -109,6 +119,9 @@
                     this.reader = needsReader ? createReader(indexEntry, file, shouldCloseFile) : null;
                 }
 
+                if (reader != null && !slices.isEmpty())
+                    reader.setForSlice(slices.get(0));
+
                 if (reader == null && file != null && shouldCloseFile)
                     file.close();
             }
@@ -180,7 +193,13 @@
         }
     }
 
-    protected abstract Reader createReader(RowIndexEntry indexEntry, FileDataInput file, boolean shouldCloseFile);
+    protected abstract Reader createReaderInternal(RowIndexEntry indexEntry, FileDataInput file, boolean shouldCloseFile);
+
+    private Reader createReader(RowIndexEntry indexEntry, FileDataInput file, boolean shouldCloseFile)
+    {
+        return slices.isEmpty() ? new NoRowsReader(file, shouldCloseFile)
+                                : createReaderInternal(indexEntry, file, shouldCloseFile);
+    }
 
     public CFMetaData metadata()
     {
@@ -209,14 +228,24 @@
 
     public EncodingStats stats()
     {
-        // We could return sstable.header.stats(), but this may not be as accurate than the actual sstable stats (see
-        // SerializationHeader.make() for details) so we use the latter instead.
-        return new EncodingStats(sstable.getMinTimestamp(), sstable.getMinLocalDeletionTime(), sstable.getMinTTL());
+        return sstable.stats();
     }
 
     public boolean hasNext()
     {
-        return reader != null && reader.hasNext();
+        while (true)
+        {
+            if (reader == null)
+                return false;
+
+            if (reader.hasNext())
+                return true;
+
+            if (++slice >= slices.size())
+                return false;
+
+            slice(slices.get(slice));
+        }
     }
 
     public Unfiltered next()
@@ -225,15 +254,12 @@
         return reader.next();
     }
 
-    public Iterator<Unfiltered> slice(Slice slice)
+    private void slice(Slice slice)
     {
         try
         {
-            if (reader == null)
-                return Collections.emptyIterator();
-
-            reader.setForSlice(slice);
-            return reader;
+            if (reader != null)
+                reader.setForSlice(slice);
         }
         catch (IOException e)
         {
@@ -388,14 +414,38 @@
         }
     }
 
+    // Reader for when we have Slices.NONE but need to read static row or partition level deletion
+    private class NoRowsReader extends AbstractSSTableIterator.Reader
+    {
+        private NoRowsReader(FileDataInput file, boolean shouldCloseFile)
+        {
+            super(file, shouldCloseFile);
+        }
+
+        public void setForSlice(Slice slice) throws IOException
+        {
+            return;
+        }
+
+        protected boolean hasNextInternal() throws IOException
+        {
+            return false;
+        }
+
+        protected Unfiltered nextInternal() throws IOException
+        {
+            throw new NoSuchElementException();
+        }
+    }
+
     // Used by indexed readers to store where they are in the index.
-    protected static class IndexState
+    public static class IndexState implements AutoCloseable
     {
         private final Reader reader;
         private final ClusteringComparator comparator;
 
         private final RowIndexEntry indexEntry;
-        private final List<IndexHelper.IndexInfo> indexes;
+        private final RowIndexEntry.IndexInfoRetriever indexInfoRetriever;
         private final boolean reversed;
 
         private int currentIndexIdx;
@@ -403,43 +453,43 @@
         // Marks the beginning of the block corresponding to currentIndexIdx.
         private DataPosition mark;
 
-        public IndexState(Reader reader, ClusteringComparator comparator, RowIndexEntry indexEntry, boolean reversed)
+        public IndexState(Reader reader, ClusteringComparator comparator, RowIndexEntry indexEntry, boolean reversed, SegmentedFile indexFile)
         {
             this.reader = reader;
             this.comparator = comparator;
             this.indexEntry = indexEntry;
-            this.indexes = indexEntry.columnsIndex();
+            this.indexInfoRetriever = indexEntry.openWithIndex(indexFile);
             this.reversed = reversed;
-            this.currentIndexIdx = reversed ? indexEntry.columnsIndex().size() : -1;
+            this.currentIndexIdx = reversed ? indexEntry.columnsIndexCount() : -1;
         }
 
         public boolean isDone()
         {
-            return reversed ? currentIndexIdx < 0 : currentIndexIdx >= indexes.size();
+            return reversed ? currentIndexIdx < 0 : currentIndexIdx >= indexEntry.columnsIndexCount();
         }
 
         // Sets the reader to the beginning of blockIdx.
         public void setToBlock(int blockIdx) throws IOException
         {
-            if (blockIdx >= 0 && blockIdx < indexes.size())
+            if (blockIdx >= 0 && blockIdx < indexEntry.columnsIndexCount())
             {
                 reader.seekToPosition(columnOffset(blockIdx));
                 reader.deserializer.clearState();
             }
 
             currentIndexIdx = blockIdx;
-            reader.openMarker = blockIdx > 0 ? indexes.get(blockIdx - 1).endOpenMarker : null;
+            reader.openMarker = blockIdx > 0 ? index(blockIdx - 1).endOpenMarker : null;
             mark = reader.file.mark();
         }
 
-        private long columnOffset(int i)
+        private long columnOffset(int i) throws IOException
         {
-            return indexEntry.position + indexes.get(i).offset;
+            return indexEntry.position + index(i).offset;
         }
 
         public int blocksCount()
         {
-            return indexes.size();
+            return indexEntry.columnsIndexCount();
         }
 
         // Update the block idx based on the current reader position if we're past the current block.
@@ -458,7 +508,7 @@
                 return;
             }
 
-            while (currentIndexIdx + 1 < indexes.size() && isPastCurrentBlock())
+            while (currentIndexIdx + 1 < indexEntry.columnsIndexCount() && isPastCurrentBlock())
             {
                 reader.openMarker = currentIndex().endOpenMarker;
                 ++currentIndexIdx;
@@ -481,7 +531,7 @@
         }
 
         // Check if we've crossed an index boundary (based on the mark on the beginning of the index block).
-        public boolean isPastCurrentBlock()
+        public boolean isPastCurrentBlock() throws IOException
         {
             assert reader.deserializer != null;
             long correction = reader.deserializer.bytesReadForUnconsumedData();
@@ -493,32 +543,92 @@
             return currentIndexIdx;
         }
 
-        public IndexHelper.IndexInfo currentIndex()
+        public IndexInfo currentIndex() throws IOException
         {
             return index(currentIndexIdx);
         }
 
-        public IndexHelper.IndexInfo index(int i)
+        public IndexInfo index(int i) throws IOException
         {
-            return indexes.get(i);
+            return indexInfoRetriever.columnsIndex(i);
         }
 
         // Finds the index of the first block containing the provided bound, starting at the provided index.
         // Will be -1 if the bound is before any block, and blocksCount() if it is after every block.
-        public int findBlockIndex(Slice.Bound bound, int fromIdx)
+        public int findBlockIndex(ClusteringBound bound, int fromIdx) throws IOException
         {
-            if (bound == Slice.Bound.BOTTOM)
+            if (bound == ClusteringBound.BOTTOM)
                 return -1;
-            if (bound == Slice.Bound.TOP)
+            if (bound == ClusteringBound.TOP)
                 return blocksCount();
 
-            return IndexHelper.indexFor(bound, indexes, comparator, reversed, fromIdx);
+            return indexFor(bound, fromIdx);
+        }
+
+        public int indexFor(ClusteringPrefix name, int lastIndex) throws IOException
+        {
+            IndexInfo target = new IndexInfo(name, name, 0, 0, null);
+            /*
+            Take the example from the unit test, and say your index looks like this:
+            [0..5][10..15][20..25]
+            and you look for the slice [13..17].
+
+            When doing forward slice, we are doing a binary search comparing 13 (the start of the query)
+            to the lastName part of the index slot. You'll end up with the "first" slot, going from left to right,
+            that may contain the start.
+
+            When doing a reverse slice, we do the same thing, only using as a start column the end of the query,
+            i.e. 17 in this example, compared to the firstName part of the index slots.  bsearch will give us the
+            first slot where firstName > start ([20..25] here), so we subtract an extra one to get the slot just before.
+            */
+            int startIdx = 0;
+            int endIdx = indexEntry.columnsIndexCount() - 1;
+
+            if (reversed)
+            {
+                if (lastIndex < endIdx)
+                {
+                    endIdx = lastIndex;
+                }
+            }
+            else
+            {
+                if (lastIndex > 0)
+                {
+                    startIdx = lastIndex;
+                }
+            }
+
+            int index = binarySearch(target, comparator.indexComparator(reversed), startIdx, endIdx);
+            return (index < 0 ? -index - (reversed ? 2 : 1) : index);
+        }
+
+        private int binarySearch(IndexInfo key, Comparator<IndexInfo> c, int low, int high) throws IOException
+        {
+            while (low <= high)
+            {
+                int mid = (low + high) >>> 1;
+                IndexInfo midVal = index(mid);
+                int cmp = c.compare(midVal, key);
+
+                if (cmp < 0)
+                    low = mid + 1;
+                else if (cmp > 0)
+                    high = mid - 1;
+                else
+                    return mid;
+            }
+            return -(low + 1);
         }
 
         @Override
         public String toString()
         {
-            return String.format("IndexState(indexSize=%d, currentBlock=%d, reversed=%b)", indexes.size(), currentIndexIdx, reversed);
+            return String.format("IndexState(indexSize=%d, currentBlock=%d, reversed=%b)", indexEntry.columnsIndexCount(), currentIndexIdx, reversed);
+        }
+
+        @Override
+        public void close() throws IOException
+        {
+            indexInfoRetriever.close();
         }
     }
 }
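
The slot-selection rule spelled out in the comment inside indexFor above can be checked in isolation. The following self-contained sketch is not Cassandra code: plain ints stand in for the firstName/lastName of each IndexInfo slot, and it reproduces the forward and reverse cases for slots [0..5][10..15][20..25] with the slice [13..17].

    // Demonstrates the forward/reverse index-slot search described in indexFor.
    public class IndexSlotSearchDemo
    {
        // slots[i] = { firstName, lastName } of index block i
        static final int[][] SLOTS = { { 0, 5 }, { 10, 15 }, { 20, 25 } };

        // Standard binary search returning -(insertionPoint + 1) on a miss, comparing the
        // last name of each slot for forward reads and the first name for reversed reads.
        static int binarySearch(int key, boolean reversed)
        {
            int low = 0, high = SLOTS.length - 1;
            while (low <= high)
            {
                int mid = (low + high) >>> 1;
                int midVal = reversed ? SLOTS[mid][0] : SLOTS[mid][1];
                if (midVal < key)
                    low = mid + 1;
                else if (midVal > key)
                    high = mid - 1;
                else
                    return mid;
            }
            return -(low + 1);
        }

        static int indexFor(int bound, boolean reversed)
        {
            int index = binarySearch(bound, reversed);
            // On a miss, forward searches keep the insertion slot; reversed searches step back one more.
            return index < 0 ? -index - (reversed ? 2 : 1) : index;
        }

        public static void main(String[] args)
        {
            System.out.println(indexFor(13, false)); // forward, slice start 13 -> 1, i.e. [10..15]
            System.out.println(indexFor(17, true));  // reversed, slice end 17  -> 1, i.e. [10..15]
        }
    }

Both calls land on slot 1: the forward search compares the slice start against last names, while the reverse search compares the slice end against first names and steps one slot back on a miss, which is exactly the -index - (reversed ? 2 : 1) adjustment above.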
diff --git a/src/java/org/apache/cassandra/db/columniterator/SSTableIterator.java b/src/java/org/apache/cassandra/db/columniterator/SSTableIterator.java
index 0409310..e4f6700 100644
--- a/src/java/org/apache/cassandra/db/columniterator/SSTableIterator.java
+++ b/src/java/org/apache/cassandra/db/columniterator/SSTableIterator.java
@@ -25,28 +25,26 @@
 import org.apache.cassandra.db.rows.*;
 import org.apache.cassandra.io.sstable.format.SSTableReader;
 import org.apache.cassandra.io.util.FileDataInput;
+import org.apache.cassandra.io.util.SegmentedFile;
 
 /**
  *  A Cell Iterator over SSTable
  */
 public class SSTableIterator extends AbstractSSTableIterator
 {
-    public SSTableIterator(SSTableReader sstable, DecoratedKey key, ColumnFilter columns, boolean isForThrift)
-    {
-        this(sstable, null, key, sstable.getPosition(key, SSTableReader.Operator.EQ), columns, isForThrift);
-    }
-
     public SSTableIterator(SSTableReader sstable,
                            FileDataInput file,
                            DecoratedKey key,
                            RowIndexEntry indexEntry,
+                           Slices slices,
                            ColumnFilter columns,
-                           boolean isForThrift)
+                           boolean isForThrift,
+                           SegmentedFile ifile)
     {
-        super(sstable, file, key, indexEntry, columns, isForThrift);
+        super(sstable, file, key, indexEntry, slices, columns, isForThrift, ifile);
     }
 
-    protected Reader createReader(RowIndexEntry indexEntry, FileDataInput file, boolean shouldCloseFile)
+    protected Reader createReaderInternal(RowIndexEntry indexEntry, FileDataInput file, boolean shouldCloseFile)
     {
         return indexEntry.isIndexed()
              ? new ForwardIndexedReader(indexEntry, file, shouldCloseFile)
@@ -61,9 +59,9 @@
     private class ForwardReader extends Reader
     {
         // The start of the current slice. This will be null as soon as we know we've passed that bound.
-        protected Slice.Bound start;
+        protected ClusteringBound start;
         // The end of the current slice. Will never be null.
-        protected Slice.Bound end = Slice.Bound.TOP;
+        protected ClusteringBound end = ClusteringBound.TOP;
 
         protected Unfiltered next; // the next element to return: this is computed by hasNextInternal().
 
@@ -77,7 +75,7 @@
 
         public void setForSlice(Slice slice) throws IOException
         {
-            start = slice.start() == Slice.Bound.BOTTOM ? null : slice.start();
+            start = slice.start() == ClusteringBound.BOTTOM ? null : slice.start();
             end = slice.end();
 
             sliceDone = false;
@@ -105,7 +103,7 @@
                     updateOpenMarker((RangeTombstoneMarker)deserializer.readNext());
             }
 
-            Slice.Bound sliceStart = start;
+            ClusteringBound sliceStart = start;
             start = null;
 
             // We've reached the beginning of our queried slice. If we have an open marker
@@ -185,11 +183,18 @@
         private ForwardIndexedReader(RowIndexEntry indexEntry, FileDataInput file, boolean shouldCloseFile)
         {
             super(file, shouldCloseFile);
-            this.indexState = new IndexState(this, sstable.metadata.comparator, indexEntry, false);
+            this.indexState = new IndexState(this, sstable.metadata.comparator, indexEntry, false, ifile);
             this.lastBlockIdx = indexState.blocksCount(); // if we never call setForSlice, that's where we want to stop
         }
 
         @Override
+        public void close() throws IOException
+        {
+            super.close();
+            this.indexState.close();
+        }
+
+        @Override
         public void setForSlice(Slice slice) throws IOException
         {
             super.setForSlice(slice);
diff --git a/src/java/org/apache/cassandra/db/columniterator/SSTableReversedIterator.java b/src/java/org/apache/cassandra/db/columniterator/SSTableReversedIterator.java
index 3e49a3a..903fd59 100644
--- a/src/java/org/apache/cassandra/db/columniterator/SSTableReversedIterator.java
+++ b/src/java/org/apache/cassandra/db/columniterator/SSTableReversedIterator.java
@@ -27,6 +27,7 @@
 import org.apache.cassandra.db.rows.*;
 import org.apache.cassandra.io.sstable.format.SSTableReader;
 import org.apache.cassandra.io.util.FileDataInput;
+import org.apache.cassandra.io.util.SegmentedFile;
 import org.apache.cassandra.utils.btree.BTree;
 
 /**
@@ -34,22 +35,19 @@
  */
 public class SSTableReversedIterator extends AbstractSSTableIterator
 {
-    public SSTableReversedIterator(SSTableReader sstable, DecoratedKey key, ColumnFilter columns, boolean isForThrift)
-    {
-        this(sstable, null, key, sstable.getPosition(key, SSTableReader.Operator.EQ), columns, isForThrift);
-    }
-
     public SSTableReversedIterator(SSTableReader sstable,
                                    FileDataInput file,
                                    DecoratedKey key,
                                    RowIndexEntry indexEntry,
+                                   Slices slices,
                                    ColumnFilter columns,
-                                   boolean isForThrift)
+                                   boolean isForThrift,
+                                   SegmentedFile ifile)
     {
-        super(sstable, file, key, indexEntry, columns, isForThrift);
+        super(sstable, file, key, indexEntry, slices, columns, isForThrift, ifile);
     }
 
-    protected Reader createReader(RowIndexEntry indexEntry, FileDataInput file, boolean shouldCloseFile)
+    protected Reader createReaderInternal(RowIndexEntry indexEntry, FileDataInput file, boolean shouldCloseFile)
     {
         return indexEntry.isIndexed()
              ? new ReverseIndexedReader(indexEntry, file, shouldCloseFile)
@@ -136,14 +134,14 @@
             return iterator.next();
         }
 
-        protected boolean stopReadingDisk()
+        protected boolean stopReadingDisk() throws IOException
         {
             return false;
         }
 
         // Reads the unfiltereds from disk and loads them into the reader buffer. It stops reading when either the partition
         // is fully read, or when stopReadingDisk() returns true.
-        protected void loadFromDisk(Slice.Bound start, Slice.Bound end, boolean includeFirst) throws IOException
+        protected void loadFromDisk(ClusteringBound start, ClusteringBound end, boolean includeFirst) throws IOException
         {
             buffer.reset();
 
@@ -165,7 +163,7 @@
             // If we have an open marker, it's either one from what we just skipped (if start != null), or it's from the previous index block.
             if (openMarker != null)
             {
-                RangeTombstone.Bound markerStart = start == null ? RangeTombstone.Bound.BOTTOM : RangeTombstone.Bound.fromSliceBound(start);
+                ClusteringBound markerStart = start == null ? ClusteringBound.BOTTOM : start;
                 buffer.add(new RangeTombstoneBoundMarker(markerStart, openMarker));
             }
 
@@ -188,7 +186,7 @@
             if (openMarker != null)
             {
                 // If we have no end and still an openMarker, this means we're indexed and the marker is closed in a following block.
-                RangeTombstone.Bound markerEnd = end == null ? RangeTombstone.Bound.TOP : RangeTombstone.Bound.fromSliceBound(end);
+                ClusteringBound markerEnd = end == null ? ClusteringBound.TOP : end;
                 buffer.add(new RangeTombstoneBoundMarker(markerEnd, getAndClearOpenMarker()));
             }
 
@@ -208,7 +206,14 @@
         private ReverseIndexedReader(RowIndexEntry indexEntry, FileDataInput file, boolean shouldCloseFile)
         {
             super(file, shouldCloseFile);
-            this.indexState = new IndexState(this, sstable.metadata.comparator, indexEntry, true);
+            this.indexState = new IndexState(this, sstable.metadata.comparator, indexEntry, true, ifile);
+        }
+
+        @Override
+        public void close() throws IOException
+        {
+            super.close();
+            this.indexState.close();
         }
 
         @Override
@@ -308,7 +313,7 @@
         }
 
         @Override
-        protected boolean stopReadingDisk()
+        protected boolean stopReadingDisk() throws IOException
         {
             return indexState.isPastCurrentBlock();
         }
@@ -348,7 +353,7 @@
         public void reset()
         {
             built = null;
-            rowBuilder.reuse();
+            rowBuilder = BTree.builder(metadata.comparator);
             deletionBuilder = MutableDeletionInfo.builder(partitionLevelDeletion, metadata().comparator, false);
         }
 
diff --git a/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentManager.java b/src/java/org/apache/cassandra/db/commitlog/AbstractCommitLogSegmentManager.java
similarity index 78%
rename from src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentManager.java
rename to src/java/org/apache/cassandra/db/commitlog/AbstractCommitLogSegmentManager.java
index 66ad6a3..7ea7439 100644
--- a/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentManager.java
+++ b/src/java/org/apache/cassandra/db/commitlog/AbstractCommitLogSegmentManager.java
@@ -18,52 +18,34 @@
 package org.apache.cassandra.db.commitlog;
 
 import java.io.File;
-import java.util.ArrayList;
-import java.util.Collection;
-import java.util.Collections;
-import java.util.LinkedHashMap;
-import java.util.List;
-import java.util.Map;
-import java.util.UUID;
-import java.util.concurrent.BlockingQueue;
-import java.util.concurrent.ConcurrentLinkedQueue;
-import java.util.concurrent.Future;
-import java.util.concurrent.LinkedBlockingQueue;
-import java.util.concurrent.TimeUnit;
+import java.io.IOException;
+import java.util.*;
+import java.util.concurrent.*;
 import java.util.concurrent.atomic.AtomicLong;
 
-import org.apache.cassandra.config.DatabaseDescriptor;
-import org.apache.cassandra.config.Schema;
-import org.apache.cassandra.db.ColumnFamilyStore;
-import org.apache.cassandra.db.Keyspace;
-import org.apache.cassandra.db.Mutation;
-import org.apache.cassandra.db.commitlog.CommitLogSegment.Allocation;
-import org.apache.cassandra.io.util.FileUtils;
-import org.apache.cassandra.utils.JVMStabilityInspector;
-import org.apache.cassandra.utils.Pair;
-import org.apache.cassandra.utils.WrappedRunnable;
-import org.apache.cassandra.utils.concurrent.WaitQueue;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.collect.Iterables;
-import com.google.common.util.concurrent.Futures;
-import com.google.common.util.concurrent.ListenableFuture;
-import com.google.common.util.concurrent.Runnables;
-import com.google.common.util.concurrent.Uninterruptibles;
+import com.google.common.util.concurrent.*;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.config.Schema;
+import org.apache.cassandra.db.*;
+import org.apache.cassandra.utils.*;
+import org.apache.cassandra.utils.concurrent.WaitQueue;
+
+import static org.apache.cassandra.db.commitlog.CommitLogSegment.Allocation;
 
 /**
  * Performs eager-creation of commit log segments in a background thread. All the
  * public methods are thread safe.
  */
-public class CommitLogSegmentManager
+public abstract class AbstractCommitLogSegmentManager
 {
-    static final Logger logger = LoggerFactory.getLogger(CommitLogSegmentManager.class);
+    static final Logger logger = LoggerFactory.getLogger(AbstractCommitLogSegmentManager.class);
 
-    /**
-     * Queue of work to be done by the manager thread, also used to wake the thread to perform segment allocation.
-     */
+    // Queue of work to be done by the manager thread, also used to wake the thread to perform segment allocation.
     private final BlockingQueue<Runnable> segmentManagementTasks = new LinkedBlockingQueue<>();
 
     /** Segments that are ready to be used. Head of the queue is the one we allocate writes to */
@@ -73,10 +55,12 @@
     private final ConcurrentLinkedQueue<CommitLogSegment> activeSegments = new ConcurrentLinkedQueue<>();
 
     /** The segment we are currently allocating commit log records to */
-    private volatile CommitLogSegment allocatingFrom = null;
+    protected volatile CommitLogSegment allocatingFrom = null;
 
     private final WaitQueue hasAvailableSegments = new WaitQueue();
 
+    final String storageDirectory;
+
     /**
      * Tracks commitlog size, in multiples of the segment size.  We need to do this so we can "promise" size
      * adjustments ahead of actually adding/freeing segments on disk, so that the "evict oldest segment" logic
@@ -92,12 +76,16 @@
     volatile boolean createReserveSegments = false;
 
     private Thread managerThread;
-    private volatile boolean run = true;
-    private final CommitLog commitLog;
+    protected volatile boolean run = true;
+    protected final CommitLog commitLog;
 
-    CommitLogSegmentManager(final CommitLog commitLog)
+    private static final SimpleCachedBufferPool bufferPool =
+        new SimpleCachedBufferPool(DatabaseDescriptor.getCommitLogMaxCompressionBuffersInPool(), DatabaseDescriptor.getCommitLogSegmentSize());
+
+    AbstractCommitLogSegmentManager(final CommitLog commitLog, String storageDirectory)
     {
         this.commitLog = commitLog;
+        this.storageDirectory = storageDirectory;
     }
 
     void start()
@@ -115,11 +103,13 @@
                         if (task == null)
                         {
                             // if we have no more work to do, check if we should create a new segment
-                            if (!atSegmentLimit() && availableSegments.isEmpty() && (activeSegments.isEmpty() || createReserveSegments))
+                            if (!atSegmentLimit() &&
+                                availableSegments.isEmpty() &&
+                                (activeSegments.isEmpty() || createReserveSegments))
                             {
                                 logger.trace("No segments in reserve; creating a fresh one");
                                 // TODO : some error handling in case we fail to create a new segment
-                                availableSegments.add(CommitLogSegment.createSegment(commitLog, () -> wakeManager()));
+                                availableSegments.add(createSegment());
                                 hasAvailableSegments.signalAll();
                             }
 
@@ -141,9 +131,10 @@
                                 flushDataFrom(segmentsToRecycle, false);
                             }
 
+                            // Since we're operating on a "null" allocation task, block here for the next task on the
+                            // queue rather than looping, grabbing another null, and repeating the above work.
                             try
                             {
-                                // wait for new work to be provided
                                 task = segmentManagementTasks.take();
                             }
                             catch (InterruptedException e)
@@ -151,7 +142,6 @@
                                 throw new AssertionError();
                             }
                         }
-
                         task.run();
                     }
                     catch (Throwable t)
@@ -167,9 +157,8 @@
 
             private boolean atSegmentLimit()
             {
-                return CommitLogSegment.usesBufferPool(commitLog) && CompressedSegment.hasReachedPoolLimit();
+                return CommitLogSegment.usesBufferPool(commitLog) && bufferPool.atLimit();
             }
-
         };
 
         run = true;
@@ -178,28 +167,44 @@
         managerThread.start();
     }
 
+
     /**
-     * Reserve space in the current segment for the provided mutation or, if there isn't space available,
-     * create a new segment.
-     *
-     * @return the provided Allocation object
+     * Shut down the CLSM. Used both during testing and during regular shutdown, so needs to stop everything.
      */
-    public Allocation allocate(Mutation mutation, int size)
-    {
-        CommitLogSegment segment = allocatingFrom();
+    public abstract void shutdown();
 
-        Allocation alloc;
-        while ( null == (alloc = segment.allocate(mutation, size)) )
-        {
-            // failed to allocate, so move to a new segment with enough room
-            advanceAllocatingFrom(segment);
-            segment = allocatingFrom;
-        }
+    /**
+     * Reserve space in a commit log segment for the provided mutation. Should either succeed or throw.
+     */
+    public abstract Allocation allocate(Mutation mutation, int size);
 
-        return alloc;
-    }
+    /**
+     * The recovery and replay process replays mutations into memtables and flushes them to disk. Individual CLSMs
+     * decide what to do with those segments on disk after they've been replayed.
+     */
+    abstract void handleReplayedSegment(final File file);
 
-    // simple wrapper to ensure non-null value for allocatingFrom; only necessary on first call
+    /**
+     * Hook to allow segment managers to track state surrounding creation of new segments. Only perform as a task
+     * submitted to the segment manager so it's run on the segment management thread.
+     */
+    abstract CommitLogSegment createSegment();
+
+    /**
+     * Indicates that a segment file has been flushed and is no longer needed. Only perform as a task submitted to the
+     * segment manager so it's run on the segment management thread, or while the segment management thread is shut
+     * down during testing resets.
+     *
+     * @param segment segment to be discarded
+     * @param delete  whether or not the segment is safe to be deleted.
+     */
+    abstract void discard(CommitLogSegment segment, boolean delete);
+
+
+    /**
+     * Grab the current CommitLogSegment we're allocating from. Also serves as a utility method to block while the allocator
+     * is working on initial allocation of a CommitLogSegment.
+     */
     CommitLogSegment allocatingFrom()
     {
         CommitLogSegment r = allocatingFrom;
@@ -212,9 +217,12 @@
     }
 
     /**
-     * Fetches a new segment from the queue, creating a new one if necessary, and activates it
+     * Fetches a new segment from the queue, signaling the management thread to create a new one if necessary, and "activates" it.
+     * Blocks until a new segment is allocated and the thread requesting an advanceAllocatingFrom is signalled.
+     *
+     * WARNING: Assumes segment management thread always succeeds in allocating a new segment or kills the JVM.
      */
-    private void advanceAllocatingFrom(CommitLogSegment old)
+    protected void advanceAllocatingFrom(CommitLogSegment old)
     {
         while (true)
         {
@@ -277,7 +285,7 @@
         }
     }
 
-    private void wakeManager()
+    protected void wakeManager()
     {
         // put a NO-OP on the queue, to trigger management thread (and create a new segment if necessary)
         segmentManagementTasks.add(Runnables.doNothing());
@@ -310,14 +318,16 @@
 
             for (CommitLogSegment segment : activeSegments)
                 for (UUID cfId : droppedCfs)
-                    segment.markClean(cfId, segment.getContext());
+                    segment.markClean(cfId, segment.getCurrentCommitLogPosition());
 
             // now recycle segments that are unused, as we may not have triggered a discardCompletedSegments()
             // if the previous active segment was the only one to recycle (since an active segment isn't
             // necessarily dirty, and we only call dCS after a flush).
             for (CommitLogSegment segment : activeSegments)
+            {
                 if (segment.isUnused())
                     recycleSegment(segment);
+            }
 
             CommitLogSegment first;
             if ((first = activeSegments.peek()) != null && first.id <= last.id)
@@ -325,7 +335,7 @@
         }
         catch (Throwable t)
         {
-            // for now just log the error and return false, indicating that we failed
+            // for now just log the error
             logger.error("Failed waiting for a forced recycle of in-use commit log segments", t);
         }
     }
@@ -350,19 +360,6 @@
     }
 
     /**
-     * Differs from the above because it can work on any file instead of just existing
-     * commit log segments managed by this manager.
-     *
-     * @param file segment file that is no longer in use.
-     */
-    void recycleSegment(final File file)
-    {
-        // (don't decrease managed size, since this was never a "live" segment)
-        logger.trace("(Unopened) segment {} is no longer needed and will be deleted now", file);
-        FileUtils.deleteWithConfirm(file);
-    }
-
-    /**
      * Indicates that a segment file should be deleted.
      *
      * @param segment segment to be discarded
@@ -370,14 +367,7 @@
     private void discardSegment(final CommitLogSegment segment, final boolean deleteFile)
     {
         logger.trace("Segment {} is no longer active and will be deleted {}", segment, deleteFile ? "now" : "by the archive script");
-
-        segmentManagementTasks.add(new Runnable()
-        {
-            public void run()
-            {
-                segment.discard(deleteFile);
-            }
-        });
+        segmentManagementTasks.add(() -> discard(segment, deleteFile));
     }
 
     /**
@@ -436,7 +426,7 @@
     {
         if (segments.isEmpty())
             return Futures.immediateFuture(null);
-        final ReplayPosition maxReplayPosition = segments.get(segments.size() - 1).getContext();
+        final CommitLogPosition maxCommitLogPosition = segments.get(segments.size() - 1).getCurrentCommitLogPosition();
 
         // a map of CfId -> forceFlush() to ensure we only queue one flush per cf
         final Map<UUID, ListenableFuture<?>> flushes = new LinkedHashMap<>();
@@ -451,7 +441,7 @@
                     // even though we remove the schema entry before a final flush when dropping a CF,
                     // it's still possible for a writer to race and finish his append after the flush.
                     logger.trace("Marking clean CF {} that doesn't exist anymore", dirtyCFId);
-                    segment.markClean(dirtyCFId, segment.getContext());
+                    segment.markClean(dirtyCFId, segment.getCurrentCommitLogPosition());
                 }
                 else if (!flushes.containsKey(dirtyCFId))
                 {
@@ -459,7 +449,7 @@
                     final ColumnFamilyStore cfs = Keyspace.open(keyspace).getColumnFamilyStore(dirtyCFId);
                     // can safely call forceFlush here as we will only ever block (briefly) for other attempts to flush,
                     // no deadlock possibility since switchLock removal
-                    flushes.put(dirtyCFId, force ? cfs.forceFlush() : cfs.forceFlush(maxReplayPosition));
+                    flushes.put(dirtyCFId, force ? cfs.forceFlush() : cfs.forceFlush(maxCommitLogPosition));
                 }
             }
         }
@@ -519,11 +509,14 @@
         // waiting completes correctly.
     }
 
-    private static void closeAndDeleteSegmentUnsafe(CommitLogSegment segment, boolean delete)
+    /**
+     * Explicitly for use only during resets in unit testing.
+     */
+    private void closeAndDeleteSegmentUnsafe(CommitLogSegment segment, boolean delete)
     {
         try
         {
-            segment.discard(delete);
+            discard(segment, delete);
         }
         catch (AssertionError ignored)
         {
@@ -532,15 +525,6 @@
     }
 
     /**
-     * Initiates the shutdown process for the management thread.
-     */
-    public void shutdown()
-    {
-        run = false;
-        wakeManager();
-    }
-
-    /**
      * Returns when the management thread terminates.
      */
     public void awaitTermination() throws InterruptedException
@@ -553,7 +537,7 @@
         for (CommitLogSegment segment : availableSegments)
             segment.close();
 
-        CompressedSegment.shutdown();
+        bufferPool.shutdown();
     }
 
     /**
@@ -565,5 +549,34 @@
         return Collections.unmodifiableCollection(activeSegments);
     }
 
+    /**
+     * @return the current CommitLogPosition of the active segment we're allocating from
+     */
+    CommitLogPosition getCurrentPosition()
+    {
+        return allocatingFrom().getCurrentCommitLogPosition();
+    }
+
+    /**
+     * Forces a disk flush on the commit log files that need it.  Blocking.
+     */
+    public void sync(boolean syncAllSegments) throws IOException
+    {
+        CommitLogSegment current = allocatingFrom();
+        for (CommitLogSegment segment : getActiveSegments())
+        {
+            if (!syncAllSegments && segment.id > current.id)
+                return;
+            segment.sync();
+        }
+    }
+
+    /**
+     * Used by compressed and encrypted segments to share a buffer pool across the CLSM.
+     */
+    SimpleCachedBufferPool getBufferPool()
+    {
+        return bufferPool;
+    }
 }
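
For illustration only (not part of the patch): a minimal sketch of the flush/discard protocol the position handling above supports, using the CommitLog entry points introduced in the next file. It assumes the usual CommitLog.instance singleton; the FlushDiscardSketch class and its onFlush helper are hypothetical.

import java.util.UUID;

import org.apache.cassandra.db.commitlog.CommitLog;
import org.apache.cassandra.db.commitlog.CommitLogPosition;

// Hypothetical sketch: capture the commit log position when a memtable flush begins,
// and once the flush has completed durably, hand that position back so segments up to
// it can be marked clean for this table and discarded when no longer needed.
public class FlushDiscardSketch
{
    public static void onFlush(UUID cfId, Runnable flushMemtable)
    {
        CommitLogPosition flushedUpTo = CommitLog.instance.getCurrentPosition();
        flushMemtable.run(); // assumed to block until the memtable is durably on disk
        CommitLog.instance.discardCompletedSegments(cfId, flushedUpTo);
    }
}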
 
diff --git a/src/java/org/apache/cassandra/db/commitlog/AbstractCommitLogService.java b/src/java/org/apache/cassandra/db/commitlog/AbstractCommitLogService.java
index 113d1ba..0ba4f55 100644
--- a/src/java/org/apache/cassandra/db/commitlog/AbstractCommitLogService.java
+++ b/src/java/org/apache/cassandra/db/commitlog/AbstractCommitLogService.java
@@ -29,7 +29,6 @@
 
 public abstract class AbstractCommitLogService
 {
-
     private Thread thread;
     private volatile boolean shutdown = false;
 
@@ -89,7 +88,7 @@
 
                         // sync and signal
                         long syncStarted = System.currentTimeMillis();
-                        //This is a target for Byteman in CommitLogSegmentManagerTest
+                        // This is a target for Byteman in CommitLogSegmentManagerTest
                         commitLog.sync(shutdown);
                         lastSyncedAt = syncStarted;
                         syncComplete.signalAll();
diff --git a/src/java/org/apache/cassandra/db/commitlog/CommitLog.java b/src/java/org/apache/cassandra/db/commitlog/CommitLog.java
index dcdd855..b66221c 100644
--- a/src/java/org/apache/cassandra/db/commitlog/CommitLog.java
+++ b/src/java/org/apache/cassandra/db/commitlog/CommitLog.java
@@ -22,32 +22,35 @@
 import java.nio.ByteBuffer;
 import java.util.*;
 import java.util.zip.CRC32;
-
 import javax.management.MBeanServer;
 import javax.management.ObjectName;
 
 import com.google.common.annotations.VisibleForTesting;
-
+import org.apache.commons.lang3.StringUtils;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-import org.apache.commons.lang3.StringUtils;
-
 import org.apache.cassandra.config.Config;
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.config.ParameterizedClass;
 import org.apache.cassandra.db.*;
+import org.apache.cassandra.exceptions.WriteTimeoutException;
 import org.apache.cassandra.io.FSWriteError;
-import org.apache.cassandra.schema.CompressionParams;
 import org.apache.cassandra.io.compress.ICompressor;
 import org.apache.cassandra.io.util.BufferedDataOutputStreamPlus;
 import org.apache.cassandra.io.util.DataOutputBufferFixed;
+import org.apache.cassandra.io.util.FileUtils;
 import org.apache.cassandra.metrics.CommitLogMetrics;
 import org.apache.cassandra.net.MessagingService;
+import org.apache.cassandra.schema.CompressionParams;
+import org.apache.cassandra.security.EncryptionContext;
 import org.apache.cassandra.service.StorageService;
+import org.apache.cassandra.utils.FBUtilities;
 import org.apache.cassandra.utils.JVMStabilityInspector;
 
-import static org.apache.cassandra.db.commitlog.CommitLogSegment.*;
+import static org.apache.cassandra.db.commitlog.CommitLogSegment.Allocation;
+import static org.apache.cassandra.db.commitlog.CommitLogSegment.CommitLogSegmentFileComparator;
+import static org.apache.cassandra.db.commitlog.CommitLogSegment.ENTRY_OVERHEAD_SIZE;
 import static org.apache.cassandra.utils.FBUtilities.updateChecksum;
 import static org.apache.cassandra.utils.FBUtilities.updateChecksumInt;
 
@@ -63,19 +66,19 @@
 
     // we only permit records HALF the size of a commit log, to ensure we don't spin allocating many mostly
     // empty segments when writing large records
-    private final long MAX_MUTATION_SIZE = DatabaseDescriptor.getMaxMutationSize();
+    final long MAX_MUTATION_SIZE = DatabaseDescriptor.getMaxMutationSize();
 
-    public final CommitLogSegmentManager allocator;
+    public final AbstractCommitLogSegmentManager segmentManager;
+
     public final CommitLogArchiver archiver;
     final CommitLogMetrics metrics;
     final AbstractCommitLogService executor;
 
     volatile Configuration configuration;
-    final public String location;
 
     private static CommitLog construct()
     {
-        CommitLog log = new CommitLog(DatabaseDescriptor.getCommitLogLocation(), CommitLogArchiver.construct());
+        CommitLog log = new CommitLog(CommitLogArchiver.construct());
 
         MBeanServer mbs = ManagementFactory.getPlatformMBeanServer();
         try
@@ -90,10 +93,10 @@
     }
 
     @VisibleForTesting
-    CommitLog(String location, CommitLogArchiver archiver)
+    CommitLog(CommitLogArchiver archiver)
     {
-        this.location = location;
-        this.configuration = new Configuration(DatabaseDescriptor.getCommitLogCompression());
+        this.configuration = new Configuration(DatabaseDescriptor.getCommitLogCompression(),
+                                               DatabaseDescriptor.getEncryptionContext());
         DatabaseDescriptor.createAllDirectories();
 
         this.archiver = archiver;
@@ -103,16 +106,18 @@
                 ? new BatchCommitLogService(this)
                 : new PeriodicCommitLogService(this);
 
-        allocator = new CommitLogSegmentManager(this);
+        segmentManager = DatabaseDescriptor.isCDCEnabled()
+                         ? new CommitLogSegmentManagerCDC(this, DatabaseDescriptor.getCommitLogLocation())
+                         : new CommitLogSegmentManagerStandard(this, DatabaseDescriptor.getCommitLogLocation());
 
         // register metrics
-        metrics.attach(executor, allocator);
+        metrics.attach(executor, segmentManager);
     }
 
     CommitLog start()
     {
         executor.start();
-        allocator.start();
+        segmentManager.start();
         return this;
     }
 
@@ -120,11 +125,12 @@
      * Perform recovery on commit logs located in the directory specified by the config file.
      *
      * @return the number of mutations replayed
+     * @throws IOException
      */
-    public int recover() throws IOException
+    public int recoverSegmentsOnDisk() throws IOException
     {
         // If createReserveSegments is already flipped, the CLSM is running and recovery has already taken place.
-        if (allocator.createReserveSegments)
+        if (segmentManager.createReserveSegments)
             return 0;
 
         FilenameFilter unmanagedFilesFilter = new FilenameFilter()
@@ -133,13 +139,13 @@
             {
                 // we used to try to avoid instantiating commitlog (thus creating an empty segment ready for writes)
                 // until after recover was finished.  this turns out to be fragile; it is less error-prone to go
-                // ahead and allow writes before recover(), and just skip active segments when we do.
+                // ahead and allow writes before recover, and just skip active segments when we do.
                 return CommitLogDescriptor.isValid(name) && CommitLogSegment.shouldReplay(name);
             }
         };
 
-        // submit all existing files in the commit log dir for archiving prior to recovery - CASSANDRA-6904
-        for (File file : new File(DatabaseDescriptor.getCommitLogLocation()).listFiles(unmanagedFilesFilter))
+        // submit all files for this segment manager for archiving prior to recovery - CASSANDRA-6904
+        for (File file : new File(segmentManager.storageDirectory).listFiles(unmanagedFilesFilter))
         {
             archiver.maybeArchive(file.getPath(), file.getName());
             archiver.maybeWaitForArchiving(file.getName());
@@ -148,7 +154,7 @@
         assert archiver.archivePending.isEmpty() : "Not all commit log archive tasks were completed before restore";
         archiver.maybeRestoreArchive();
 
-        File[] files = new File(DatabaseDescriptor.getCommitLogLocation()).listFiles(unmanagedFilesFilter);
+        File[] files = new File(segmentManager.storageDirectory).listFiles(unmanagedFilesFilter);
         int replayed = 0;
         if (files.length == 0)
         {
@@ -158,14 +164,14 @@
         {
             Arrays.sort(files, new CommitLogSegmentFileComparator());
             logger.info("Replaying {}", StringUtils.join(files, ", "));
-            replayed = recover(files);
+            replayed = recoverFiles(files);
             logger.info("Log replay complete, {} replayed mutations", replayed);
 
             for (File f : files)
-                allocator.recycleSegment(f);
+                segmentManager.handleReplayedSegment(f);
         }
 
-        allocator.enableReserveSegmentCreation();
+        segmentManager.enableReserveSegmentCreation();
         return replayed;
     }
 
@@ -175,30 +181,35 @@
      * @param clogs   the list of commit log files to replay
      * @return the number of mutations replayed
      */
-    public int recover(File... clogs) throws IOException
+    public int recoverFiles(File... clogs) throws IOException
     {
-        CommitLogReplayer recovery = CommitLogReplayer.construct(this);
-        recovery.recover(clogs);
-        return recovery.blockForWrites();
+        CommitLogReplayer replayer = CommitLogReplayer.construct(this);
+        replayer.replayFiles(clogs);
+        return replayer.blockForWrites();
+    }
+
+    public void recoverPath(String path) throws IOException
+    {
+        CommitLogReplayer replayer = CommitLogReplayer.construct(this);
+        replayer.replayPath(new File(path), false);
+        replayer.blockForWrites();
     }
 
     /**
-     * Perform recovery on a single commit log.
+     * Perform recovery on a single commit log. Kept with its sub-optimal name due to coupling with the MBean / JMX interface.
      */
     public void recover(String path) throws IOException
     {
-        CommitLogReplayer recovery = CommitLogReplayer.construct(this);
-        recovery.recover(new File(path), false);
-        recovery.blockForWrites();
+        recoverPath(path);
     }
 
     /**
-     * @return a ReplayPosition which, if >= one returned from add(), implies add() was started
+     * @return a CommitLogPosition which, if {@code >=} one returned from add(), implies add() was started
      * (but not necessarily finished) prior to this call
      */
-    public ReplayPosition getContext()
+    public CommitLogPosition getCurrentPosition()
     {
-        return allocator.allocatingFrom().getContext();
+        return segmentManager.getCurrentPosition();
     }
 
     /**
@@ -206,7 +217,7 @@
      */
     public void forceRecycleAllSegments(Iterable<UUID> droppedCfs)
     {
-        allocator.forceRecycleAll(droppedCfs);
+        segmentManager.forceRecycleAll(droppedCfs);
     }
 
     /**
@@ -214,21 +225,15 @@
      */
     public void forceRecycleAllSegments()
     {
-        allocator.forceRecycleAll(Collections.<UUID>emptyList());
+        segmentManager.forceRecycleAll(Collections.<UUID>emptyList());
     }
 
     /**
      * Forces a disk flush on the commit log files that need it.  Blocking.
      */
-    public void sync(boolean syncAllSegments)
+    public void sync(boolean syncAllSegments) throws IOException
     {
-        CommitLogSegment current = allocator.allocatingFrom();
-        for (CommitLogSegment segment : allocator.getActiveSegments())
-        {
-            if (!syncAllSegments && segment.id > current.id)
-                return;
-            segment.sync();
-        }
+        segmentManager.sync(syncAllSegments);
     }
 
     /**
@@ -240,11 +245,12 @@
     }
 
     /**
-     * Add a Mutation to the commit log.
+     * Add a Mutation to the commit log. If CDC is enabled, this can fail.
      *
      * @param mutation the Mutation to add to the log
+     * @throws WriteTimeoutException if the mutation cannot be allocated, e.g. when CDC is enabled and its on-disk space limit has been reached
      */
-    public ReplayPosition add(Mutation mutation)
+    public CommitLogPosition add(Mutation mutation) throws WriteTimeoutException
     {
         assert mutation != null;
 
@@ -253,11 +259,13 @@
         int totalSize = size + ENTRY_OVERHEAD_SIZE;
         if (totalSize > MAX_MUTATION_SIZE)
         {
-            throw new IllegalArgumentException(String.format("Mutation of %s bytes is too large for the maximum size of %s",
-                                                             totalSize, MAX_MUTATION_SIZE));
+            throw new IllegalArgumentException(String.format("Mutation of %s is too large for the maximum size of %s",
+                                                             FBUtilities.prettyPrintMemory(totalSize),
+                                                             FBUtilities.prettyPrintMemory(MAX_MUTATION_SIZE)));
         }
 
-        Allocation alloc = allocator.allocate(mutation, (int) totalSize);
+        Allocation alloc = segmentManager.allocate(mutation, totalSize);
+
         CRC32 checksum = new CRC32();
         final ByteBuffer buffer = alloc.getBuffer();
         try (BufferedDataOutputStreamPlus dos = new DataOutputBufferFixed(buffer))
@@ -282,7 +290,7 @@
         }
 
         executor.finishWriteFor(alloc);
-        return alloc.getReplayPosition();
+        return alloc.getCommitLogPosition();
     }
 
     /**
@@ -290,17 +298,17 @@
      * given. Discards any commit log segments that are no longer used.
      *
      * @param cfId    the column family ID that was flushed
-     * @param context the replay position of the flush
+     * @param context the commit log segment position of the flush
      */
-    public void discardCompletedSegments(final UUID cfId, final ReplayPosition context)
+    public void discardCompletedSegments(final UUID cfId, final CommitLogPosition context)
     {
         logger.trace("discard completed log segments for {}, table {}", context, cfId);
 
         // Go thru the active segment files, which are ordered oldest to newest, marking the
-        // flushed CF as clean, until we reach the segment file containing the ReplayPosition passed
+        // flushed CF as clean, until we reach the segment file containing the CommitLogPosition passed
         // in the arguments. Any segments that become unused after they are marked clean will be
         // recycled or discarded.
-        for (Iterator<CommitLogSegment> iter = allocator.getActiveSegments().iterator(); iter.hasNext();)
+        for (Iterator<CommitLogSegment> iter = segmentManager.getActiveSegments().iterator(); iter.hasNext();)
         {
             CommitLogSegment segment = iter.next();
             segment.markClean(cfId, context);
@@ -308,7 +316,7 @@
             if (segment.isUnused())
             {
                 logger.trace("Commit log segment {} is unused", segment);
-                allocator.recycleSegment(segment);
+                segmentManager.recycleSegment(segment);
             }
             else
             {
@@ -356,8 +364,8 @@
     public List<String> getActiveSegmentNames()
     {
         List<String> segmentNames = new ArrayList<>();
-        for (CommitLogSegment segment : allocator.getActiveSegments())
-            segmentNames.add(segment.getName());
+        for (CommitLogSegment seg : segmentManager.getActiveSegments())
+            segmentNames.add(seg.getName());
         return segmentNames;
     }
 
@@ -370,23 +378,23 @@
     public long getActiveContentSize()
     {
         long size = 0;
-        for (CommitLogSegment segment : allocator.getActiveSegments())
-            size += segment.contentSize();
+        for (CommitLogSegment seg : segmentManager.getActiveSegments())
+            size += seg.contentSize();
         return size;
     }
 
     @Override
     public long getActiveOnDiskSize()
     {
-        return allocator.onDiskSize();
+        return segmentManager.onDiskSize();
     }
 
     @Override
     public Map<String, Double> getActiveSegmentCompressionRatios()
     {
         Map<String, Double> segmentRatios = new TreeMap<>();
-        for (CommitLogSegment segment : allocator.getActiveSegments())
-            segmentRatios.put(segment.getName(), 1.0 * segment.onDiskSize() / segment.contentSize());
+        for (CommitLogSegment seg : segmentManager.getActiveSegments())
+            segmentRatios.put(seg.getName(), 1.0 * seg.onDiskSize() / seg.contentSize());
         return segmentRatios;
     }
 
@@ -397,12 +405,12 @@
     {
         executor.shutdown();
         executor.awaitTermination();
-        allocator.shutdown();
-        allocator.awaitTermination();
+        segmentManager.shutdown();
+        segmentManager.awaitTermination();
     }
 
     /**
-     * FOR TESTING PURPOSES. See CommitLogAllocator.
+     * FOR TESTING PURPOSES
      * @return the number of files recovered
      */
     public int resetUnsafe(boolean deleteSegments) throws IOException
@@ -413,7 +421,15 @@
     }
 
     /**
-     * FOR TESTING PURPOSES. See CommitLogAllocator.
+     * FOR TESTING PURPOSES.
+     */
+    public void resetConfiguration()
+    {
+        configuration = new Configuration(DatabaseDescriptor.getCommitLogCompression(),
+                                          DatabaseDescriptor.getEncryptionContext());
+    }
+
+    /**
+     * FOR TESTING PURPOSES.
      */
     public void stopUnsafe(boolean deleteSegments)
     {
@@ -426,28 +442,24 @@
         {
             throw new RuntimeException(e);
         }
-        allocator.stopUnsafe(deleteSegments);
+        segmentManager.stopUnsafe(deleteSegments);
         CommitLogSegment.resetReplayLimit();
+        if (DatabaseDescriptor.isCDCEnabled() && deleteSegments)
+            for (File f : new File(DatabaseDescriptor.getCDCLogLocation()).listFiles())
+                FileUtils.deleteWithConfirm(f);
+
     }
 
     /**
-     * FOR TESTING PURPOSES.
-     */
-    public void resetConfiguration()
-    {
-        configuration = new Configuration(DatabaseDescriptor.getCommitLogCompression());
-    }
-
-    /**
-     * FOR TESTING PURPOSES.  See CommitLogAllocator
+     * FOR TESTING PURPOSES
      */
     public int restartUnsafe() throws IOException
     {
-        allocator.start();
+        segmentManager.start();
         executor.restartUnsafe();
         try
         {
-            return recover();
+            return recoverSegmentsOnDisk();
         }
         catch (FSWriteError e)
         {
@@ -462,16 +474,6 @@
         }
     }
 
-    /**
-     * Used by tests.
-     *
-     * @return the number of active segments (segments with unflushed data in them)
-     */
-    public int activeSegments()
-    {
-        return allocator.getActiveSegments().size();
-    }
-
     @VisibleForTesting
     public static boolean handleCommitError(String message, Throwable t)
     {
@@ -506,10 +508,16 @@
          */
         private final ICompressor compressor;
 
-        public Configuration(ParameterizedClass compressorClass)
+        /**
+         * The encryption context used to encrypt the segments.
+         */
+        private EncryptionContext encryptionContext;
+
+        public Configuration(ParameterizedClass compressorClass, EncryptionContext encryptionContext)
         {
             this.compressorClass = compressorClass;
             this.compressor = compressorClass != null ? CompressionParams.createCompressor(compressorClass) : null;
+            this.encryptionContext = encryptionContext;
         }
 
         /**
@@ -522,6 +530,15 @@
         }
 
         /**
+         * Checks if the segments must be encrypted.
+         * @return <code>true</code> if the segments must be encrypted, <code>false</code> otherwise.
+         */
+        public boolean useEncryption()
+        {
+            return encryptionContext.isEnabled();
+        }
+
+        /**
          * Returns the compressor used to compress the segments.
          * @return the compressor used to compress the segments
          */
@@ -547,5 +564,14 @@
         {
             return useCompression() ? compressor.getClass().getSimpleName() : "none";
         }
+
+        /**
+         * Returns the encryption context used to encrypt the segments.
+         * @return the encryption context used to encrypt the segments
+         */
+        public EncryptionContext getEncryptionContext()
+        {
+            return encryptionContext;
+        }
     }
 }
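
For illustration only (not part of the patch): a small sketch of how a caller might branch on the reworked Configuration. The SegmentWriteModeSketch class and its enum are hypothetical; only useEncryption() and useCompression() come from the class above, and the nested Configuration is assumed to be accessible as CommitLog.Configuration.

import org.apache.cassandra.db.commitlog.CommitLog;

public class SegmentWriteModeSketch
{
    enum SegmentWriteMode { PLAIN, COMPRESSED, ENCRYPTED }

    // A simplified stand-in for the kind of decision made when choosing how
    // segment contents are written: check encryption first, then compression.
    static SegmentWriteMode chooseWriteMode(CommitLog.Configuration config)
    {
        if (config.useEncryption())
            return SegmentWriteMode.ENCRYPTED;
        if (config.useCompression())
            return SegmentWriteMode.COMPRESSED;
        return SegmentWriteMode.PLAIN;
    }
}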
diff --git a/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java b/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java
index 5547d0e..044f2db 100644
--- a/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java
+++ b/src/java/org/apache/cassandra/db/commitlog/CommitLogArchiver.java
@@ -29,6 +29,8 @@
 import java.util.Properties;
 import java.util.TimeZone;
 import java.util.concurrent.*;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
 
 import org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor;
 import org.apache.cassandra.config.DatabaseDescriptor;
@@ -46,6 +48,10 @@
     private static final Logger logger = LoggerFactory.getLogger(CommitLogArchiver.class);
     public static final SimpleDateFormat format = new SimpleDateFormat("yyyy:MM:dd HH:mm:ss");
     private static final String DELIMITER = ",";
+    private static final Pattern NAME = Pattern.compile("%name");
+    private static final Pattern PATH = Pattern.compile("%path");
+    private static final Pattern FROM = Pattern.compile("%from");
+    private static final Pattern TO = Pattern.compile("%to");
     static
     {
         format.setTimeZone(TimeZone.getTimeZone("GMT"));
@@ -136,8 +142,8 @@
             protected void runMayThrow() throws IOException
             {
                 segment.waitForFinalSync();
-                String command = archiveCommand.replace("%name", segment.getName());
-                command = command.replace("%path", segment.getPath());
+                String command = NAME.matcher(archiveCommand).replaceAll(Matcher.quoteReplacement(segment.getName()));
+                command = PATH.matcher(command).replaceAll(Matcher.quoteReplacement(segment.getPath()));
                 exec(command);
             }
         }));
@@ -158,8 +164,8 @@
         {
             protected void runMayThrow() throws IOException
             {
-                String command = archiveCommand.replace("%name", name);
-                command = command.replace("%path", path);
+                String command = NAME.matcher(archiveCommand).replaceAll(Matcher.quoteReplacement(name));
+                command = PATH.matcher(command).replaceAll(Matcher.quoteReplacement(path));
                 exec(command);
             }
         }));
@@ -209,7 +215,7 @@
             }
             for (File fromFile : files)
             {
-                CommitLogDescriptor fromHeader = CommitLogDescriptor.fromHeader(fromFile);
+                CommitLogDescriptor fromHeader = CommitLogDescriptor.fromHeader(fromFile, DatabaseDescriptor.getEncryptionContext());
                 CommitLogDescriptor fromName = CommitLogDescriptor.isValid(fromFile.getName()) ? CommitLogDescriptor.fromFileName(fromFile.getName()) : null;
                 CommitLogDescriptor descriptor;
                 if (fromHeader == null && fromName == null)
@@ -244,8 +250,8 @@
                     continue;
                 }
 
-                String command = restoreCommand.replace("%from", fromFile.getPath());
-                command = command.replace("%to", toFile.getPath());
+                String command = FROM.matcher(restoreCommand).replaceAll(Matcher.quoteReplacement(fromFile.getPath()));
+                command = TO.matcher(command).replaceAll(Matcher.quoteReplacement(toFile.getPath()));
                 try
                 {
                     exec(command);
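
For illustration only (not part of the patch): the substitution idiom used above, extracted into a standalone sketch. The ArchiveCommandSubstitution class is hypothetical; the point is that the patterns are compiled once and Matcher.quoteReplacement keeps '$' and '\' characters in segment names and paths from being interpreted as group references by replaceAll.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ArchiveCommandSubstitution
{
    private static final Pattern NAME = Pattern.compile("%name");
    private static final Pattern PATH = Pattern.compile("%path");

    // Substitute %name and %path literally, as the archiver above does.
    public static String buildCommand(String archiveCommand, String name, String path)
    {
        String command = NAME.matcher(archiveCommand).replaceAll(Matcher.quoteReplacement(name));
        return PATH.matcher(command).replaceAll(Matcher.quoteReplacement(path));
    }

    public static void main(String[] args)
    {
        System.out.println(buildCommand("/bin/cp %path /backup/%name",
                                        "CommitLog-6-42.log",
                                        "/var/lib/cassandra/commitlog/CommitLog-6-42.log"));
    }
}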
diff --git a/src/java/org/apache/cassandra/db/commitlog/CommitLogDescriptor.java b/src/java/org/apache/cassandra/db/commitlog/CommitLogDescriptor.java
index 6774d39..60c5a39 100644
--- a/src/java/org/apache/cassandra/db/commitlog/CommitLogDescriptor.java
+++ b/src/java/org/apache/cassandra/db/commitlog/CommitLogDescriptor.java
@@ -27,6 +27,7 @@
 import java.io.RandomAccessFile;
 import java.nio.ByteBuffer;
 import java.nio.charset.StandardCharsets;
+import java.util.Collections;
 import java.util.Map;
 import java.util.TreeMap;
 import java.util.regex.Matcher;
@@ -40,6 +41,7 @@
 import org.apache.cassandra.exceptions.ConfigurationException;
 import org.apache.cassandra.io.FSReadError;
 import org.apache.cassandra.net.MessagingService;
+import org.apache.cassandra.security.EncryptionContext;
 import org.json.simple.JSONValue;
 
 import static org.apache.cassandra.utils.FBUtilities.updateChecksumInt;
@@ -51,14 +53,16 @@
     private static final String FILENAME_EXTENSION = ".log";
     // match both legacy and new version of commitlogs Ex: CommitLog-12345.log and CommitLog-4-12345.log.
     private static final Pattern COMMIT_LOG_FILE_PATTERN = Pattern.compile(FILENAME_PREFIX + "((\\d+)(" + SEPARATOR + "\\d+)?)" + FILENAME_EXTENSION);
-    private static final String COMPRESSION_PARAMETERS_KEY = "compressionParameters";
-    private static final String COMPRESSION_CLASS_KEY = "compressionClass";
+
+    static final String COMPRESSION_PARAMETERS_KEY = "compressionParameters";
+    static final String COMPRESSION_CLASS_KEY = "compressionClass";
 
     public static final int VERSION_12 = 2;
     public static final int VERSION_20 = 3;
     public static final int VERSION_21 = 4;
     public static final int VERSION_22 = 5;
     public static final int VERSION_30 = 6;
+
     /**
      * Increment this number if there is a changes in the commit log disc layout or MessagingVersion changes.
      * Note: make sure to handle {@link #getMessagingVersion()}
@@ -69,21 +73,31 @@
     final int version;
     public final long id;
     public final ParameterizedClass compression;
+    private final EncryptionContext encryptionContext;
 
-    public CommitLogDescriptor(int version, long id, ParameterizedClass compression)
+    public CommitLogDescriptor(int version, long id, ParameterizedClass compression, EncryptionContext encryptionContext)
     {
         this.version = version;
         this.id = id;
         this.compression = compression;
+        this.encryptionContext = encryptionContext;
     }
 
-    public CommitLogDescriptor(long id, ParameterizedClass compression)
+    public CommitLogDescriptor(long id, ParameterizedClass compression, EncryptionContext encryptionContext)
     {
-        this(current_version, id, compression);
+        this(current_version, id, compression, encryptionContext);
     }
 
     public static void writeHeader(ByteBuffer out, CommitLogDescriptor descriptor)
     {
+        writeHeader(out, descriptor, Collections.<String, String>emptyMap());
+    }
+
+    /**
+     * @param additionalHeaders Allow segments to pass custom header data
+     */
+    public static void writeHeader(ByteBuffer out, CommitLogDescriptor descriptor, Map<String, String> additionalHeaders)
+    {
         CRC32 crc = new CRC32();
         out.putInt(descriptor.version);
         updateChecksumInt(crc, descriptor.version);
@@ -91,7 +105,7 @@
         updateChecksumInt(crc, (int) (descriptor.id & 0xFFFFFFFFL));
         updateChecksumInt(crc, (int) (descriptor.id >>> 32));
         if (descriptor.version >= VERSION_22) {
-            String parametersString = constructParametersString(descriptor);
+            String parametersString = constructParametersString(descriptor.compression, descriptor.encryptionContext, additionalHeaders);
             byte[] parametersBytes = parametersString.getBytes(StandardCharsets.UTF_8);
             if (parametersBytes.length != (((short) parametersBytes.length) & 0xFFFF))
                 throw new ConfigurationException(String.format("Compression parameters too long, length %d cannot be above 65535.",
@@ -105,24 +119,27 @@
         out.putInt((int) crc.getValue());
     }
 
-    private static String constructParametersString(CommitLogDescriptor descriptor)
+    @VisibleForTesting
+    static String constructParametersString(ParameterizedClass compression, EncryptionContext encryptionContext, Map<String, String> additionalHeaders)
     {
-        Map<String, Object> params = new TreeMap<String, Object>();
-        ParameterizedClass compression = descriptor.compression;
+        Map<String, Object> params = new TreeMap<>();
         if (compression != null)
         {
             params.put(COMPRESSION_PARAMETERS_KEY, compression.parameters);
             params.put(COMPRESSION_CLASS_KEY, compression.class_name);
         }
+        if (encryptionContext != null)
+            params.putAll(encryptionContext.toHeaderParameters());
+        params.putAll(additionalHeaders);
         return JSONValue.toJSONString(params);
     }
 
-    public static CommitLogDescriptor fromHeader(File file)
+    public static CommitLogDescriptor fromHeader(File file, EncryptionContext encryptionContext)
     {
         try (RandomAccessFile raf = new RandomAccessFile(file, "r"))
         {
             assert raf.getFilePointer() == 0;
-            return readHeader(raf);
+            return readHeader(raf, encryptionContext);
         }
         catch (EOFException e)
         {
@@ -134,7 +151,7 @@
         }
     }
 
-    public static CommitLogDescriptor readHeader(DataInput input) throws IOException
+    public static CommitLogDescriptor readHeader(DataInput input, EncryptionContext encryptionContext) throws IOException
     {
         CRC32 checkcrc = new CRC32();
         int version = input.readInt();
@@ -153,16 +170,20 @@
         input.readFully(parametersBytes);
         checkcrc.update(parametersBytes, 0, parametersBytes.length);
         int crc = input.readInt();
+
         if (crc == (int) checkcrc.getValue())
-            return new CommitLogDescriptor(version, id,
-                    parseCompression((Map<?, ?>) JSONValue.parse(new String(parametersBytes, StandardCharsets.UTF_8))));
+        {
+            Map<?, ?> map = (Map<?, ?>) JSONValue.parse(new String(parametersBytes, StandardCharsets.UTF_8));
+            return new CommitLogDescriptor(version, id, parseCompression(map), EncryptionContext.createFromMap(map, encryptionContext));
+        }
         return null;
     }
 
     @SuppressWarnings("unchecked")
-    private static ParameterizedClass parseCompression(Map<?, ?> params)
+    @VisibleForTesting
+    static ParameterizedClass parseCompression(Map<?, ?> params)
     {
-        if (params == null)
+        if (params == null || params.isEmpty())
             return null;
         String className = (String) params.get(COMPRESSION_CLASS_KEY);
         if (className == null)
@@ -182,7 +203,7 @@
             throw new UnsupportedOperationException("Commitlog segment is too old to open; upgrade to 1.2.5+ first");
 
         long id = Long.parseLong(matcher.group(3).split(SEPARATOR)[1]);
-        return new CommitLogDescriptor(Integer.parseInt(matcher.group(2)), id, null);
+        return new CommitLogDescriptor(Integer.parseInt(matcher.group(2)), id, null, new EncryptionContext());
     }
 
     public int getMessagingVersion()
@@ -218,6 +239,11 @@
         return COMMIT_LOG_FILE_PATTERN.matcher(filename).matches();
     }
 
+    public EncryptionContext getEncryptionContext()
+    {
+        return encryptionContext;
+    }
+
     public String toString()
     {
         return "(" + version + "," + id + (compression != null ? "," + compression : "") + ")";
@@ -235,7 +261,7 @@
 
     public boolean equals(CommitLogDescriptor that)
     {
-        return equalsIgnoringCompression(that) && Objects.equal(this.compression, that.compression);
+        return equalsIgnoringCompression(that) && Objects.equal(this.compression, that.compression)
+                && Objects.equal(encryptionContext, that.encryptionContext);
     }
-
 }
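
For illustration only (not part of the patch): a sketch of the JSON parameter string that constructParametersString above produces for the segment header. The compression keys are the constants defined in this file; the "cdc" entry stands in for a hypothetical additional header, and the encryption keys contributed by EncryptionContext.toHeaderParameters() are not spelled out here.

import java.util.Map;
import java.util.TreeMap;

import org.json.simple.JSONValue;

public class DescriptorHeaderSketch
{
    public static void main(String[] args)
    {
        // Compressed, unencrypted segment with one hypothetical additional header.
        Map<String, Object> params = new TreeMap<>();
        params.put("compressionClass", "LZ4Compressor");
        params.put("compressionParameters", new TreeMap<String, String>());
        params.put("cdc", "false");
        System.out.println(JSONValue.toJSONString(params));
        // {"cdc":"false","compressionClass":"LZ4Compressor","compressionParameters":{}}
    }
}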
diff --git a/src/java/org/apache/cassandra/db/commitlog/CommitLogPosition.java b/src/java/org/apache/cassandra/db/commitlog/CommitLogPosition.java
new file mode 100644
index 0000000..84054a4
--- /dev/null
+++ b/src/java/org/apache/cassandra/db/commitlog/CommitLogPosition.java
@@ -0,0 +1,121 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.db.commitlog;
+
+import java.io.IOException;
+import java.util.Comparator;
+
+import org.apache.cassandra.db.TypeSizes;
+import org.apache.cassandra.io.ISerializer;
+import org.apache.cassandra.io.util.DataInputPlus;
+import org.apache.cassandra.io.util.DataOutputPlus;
+
+/**
+ * Contains a segment id and a position for CommitLogSegment identification.
+ * Used for both replay and general CommitLog file reading.
+ */
+public class CommitLogPosition implements Comparable<CommitLogPosition>
+{
+    public static final CommitLogPositionSerializer serializer = new CommitLogPositionSerializer();
+
+    // NONE is used for SSTables that are streamed from other nodes and thus have no relationship
+    // with our local commitlog. The values satisfy the criteria that
+    //  - no real commitlog segment will have the given id
+    //  - it will sort before any real CommitLogPosition, so it will be effectively ignored by getCommitLogPosition
+    public static final CommitLogPosition NONE = new CommitLogPosition(-1, 0);
+
+    public final long segmentId;
+    public final int position;
+
+    public static final Comparator<CommitLogPosition> comparator = new Comparator<CommitLogPosition>()
+    {
+        public int compare(CommitLogPosition o1, CommitLogPosition o2)
+        {
+            if (o1.segmentId != o2.segmentId)
+                return Long.compare(o1.segmentId, o2.segmentId);
+
+            return Integer.compare(o1.position, o2.position);
+        }
+    };
+
+    public CommitLogPosition(long segmentId, int position)
+    {
+        this.segmentId = segmentId;
+        assert position >= 0;
+        this.position = position;
+    }
+
+    public int compareTo(CommitLogPosition other)
+    {
+        return comparator.compare(this, other);
+    }
+
+    @Override
+    public boolean equals(Object o)
+    {
+        if (this == o) return true;
+        if (o == null || getClass() != o.getClass()) return false;
+
+        CommitLogPosition that = (CommitLogPosition) o;
+
+        if (position != that.position) return false;
+        return segmentId == that.segmentId;
+    }
+
+    @Override
+    public int hashCode()
+    {
+        int result = (int) (segmentId ^ (segmentId >>> 32));
+        result = 31 * result + position;
+        return result;
+    }
+
+    @Override
+    public String toString()
+    {
+        return "CommitLogPosition(" +
+               "segmentId=" + segmentId +
+               ", position=" + position +
+               ')';
+    }
+
+    public CommitLogPosition clone()
+    {
+        return new CommitLogPosition(segmentId, position);
+    }
+
+
+    public static class CommitLogPositionSerializer implements ISerializer<CommitLogPosition>
+    {
+        public void serialize(CommitLogPosition clsp, DataOutputPlus out) throws IOException
+        {
+            out.writeLong(clsp.segmentId);
+            out.writeInt(clsp.position);
+        }
+
+        public CommitLogPosition deserialize(DataInputPlus in) throws IOException
+        {
+            return new CommitLogPosition(in.readLong(), in.readInt());
+        }
+
+        public long serializedSize(CommitLogPosition clsp)
+        {
+            return TypeSizes.sizeof(clsp.segmentId) + TypeSizes.sizeof(clsp.position);
+        }
+    }
+}
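
For illustration only (not part of the patch): a usage sketch for the class above. Positions order first by segment id and then by offset, and the nested serializer round-trips them; DataOutputBuffer and DataInputBuffer are assumed to be the usual org.apache.cassandra.io.util buffer helpers.

import org.apache.cassandra.db.commitlog.CommitLogPosition;
import org.apache.cassandra.io.util.DataInputBuffer;
import org.apache.cassandra.io.util.DataOutputBuffer;

public class CommitLogPositionSketch
{
    public static void main(String[] args) throws Exception
    {
        CommitLogPosition earlier = new CommitLogPosition(41L, 1024);
        CommitLogPosition later = new CommitLogPosition(42L, 0);
        assert earlier.compareTo(later) < 0;                    // ordered by segment id first
        assert CommitLogPosition.NONE.compareTo(earlier) < 0;   // NONE sorts before any real position

        try (DataOutputBuffer out = new DataOutputBuffer())
        {
            CommitLogPosition.serializer.serialize(later, out);
            try (DataInputBuffer in = new DataInputBuffer(out.buffer(), false))
            {
                assert CommitLogPosition.serializer.deserialize(in).equals(later);
            }
        }
    }
}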
diff --git a/src/java/org/apache/cassandra/db/commitlog/CommitLogReadHandler.java b/src/java/org/apache/cassandra/db/commitlog/CommitLogReadHandler.java
new file mode 100644
index 0000000..0602147
--- /dev/null
+++ b/src/java/org/apache/cassandra/db/commitlog/CommitLogReadHandler.java
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.db.commitlog;
+
+import java.io.IOException;
+
+import org.apache.cassandra.db.Mutation;
+
+public interface CommitLogReadHandler
+{
+    enum CommitLogReadErrorReason
+    {
+        RECOVERABLE_DESCRIPTOR_ERROR,
+        UNRECOVERABLE_DESCRIPTOR_ERROR,
+        MUTATION_ERROR,
+        UNRECOVERABLE_UNKNOWN_ERROR,
+        EOF
+    }
+
+    class CommitLogReadException extends IOException
+    {
+        public final CommitLogReadErrorReason reason;
+        public final boolean permissible;
+
+        CommitLogReadException(String message, CommitLogReadErrorReason reason, boolean permissible)
+        {
+            super(message);
+            this.reason = reason;
+            this.permissible = permissible;
+        }
+    }
+
+    /**
+     * Handle an error during segment read, signaling whether or not you want the reader to skip the remainder of the
+     * current segment on error.
+     *
+     * @param exception CommitLogReadException with details on the exception state
+     * @return boolean indicating whether to skip the remainder of the current segment (true) or keep reading it (false)
+     * @throws IOException In the event the handler wants forceful termination of all processing, throw IOException.
+     */
+    boolean shouldSkipSegmentOnError(CommitLogReadException exception) throws IOException;
+
+    /**
+     * Called in instances where we cannot recover from a specific error and don't care whether the reader
+     * wants to continue.
+     *
+     * @param exception CommitLogReadException with details on the exception state
+     * @throws IOException In the event the handler wants forceful termination of all processing, throw IOException.
+     */
+    void handleUnrecoverableError(CommitLogReadException exception) throws IOException;
+
+    /**
+     * Process a deserialized mutation
+     *
+     * @param m deserialized mutation
+     * @param size serialized size of the mutation
+     * @param entryLocation filePointer offset inside the CommitLogSegment for the record
+     * @param desc CommitLogDescriptor for mutation being processed
+     */
+    void handleMutation(Mutation m, int size, int entryLocation, CommitLogDescriptor desc);
+}
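
For illustration only (not part of the patch): a minimal sketch, assuming only the interface above, of a handler that counts replayed mutations, skips damaged segments rather than aborting, and rethrows unrecoverable errors. Something like new CommitLogReader().readAllFiles(handler, files) from the next file would be the expected way to drive it.

import java.io.IOException;

import org.apache.cassandra.db.Mutation;
import org.apache.cassandra.db.commitlog.CommitLogDescriptor;
import org.apache.cassandra.db.commitlog.CommitLogReadHandler;

public class CountingReadHandler implements CommitLogReadHandler
{
    private int mutationCount;

    public boolean shouldSkipSegmentOnError(CommitLogReadException exception) throws IOException
    {
        // Returning true asks the reader to skip the remainder of the damaged segment.
        return true;
    }

    public void handleUnrecoverableError(CommitLogReadException exception) throws IOException
    {
        // Rethrowing forces termination of all processing.
        throw exception;
    }

    public void handleMutation(Mutation m, int size, int entryLocation, CommitLogDescriptor desc)
    {
        mutationCount++;
    }

    public int mutationCount()
    {
        return mutationCount;
    }
}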
diff --git a/src/java/org/apache/cassandra/db/commitlog/CommitLogReader.java b/src/java/org/apache/cassandra/db/commitlog/CommitLogReader.java
new file mode 100644
index 0000000..a914cc9
--- /dev/null
+++ b/src/java/org/apache/cassandra/db/commitlog/CommitLogReader.java
@@ -0,0 +1,503 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.db.commitlog;
+
+import java.io.*;
+import java.util.*;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.zip.CRC32;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.apache.commons.lang.StringUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.db.Mutation;
+import org.apache.cassandra.db.UnknownColumnFamilyException;
+import org.apache.cassandra.db.commitlog.CommitLogReadHandler.CommitLogReadErrorReason;
+import org.apache.cassandra.db.commitlog.CommitLogReadHandler.CommitLogReadException;
+import org.apache.cassandra.db.partitions.PartitionUpdate;
+import org.apache.cassandra.db.rows.SerializationHelper;
+import org.apache.cassandra.io.util.ChannelProxy;
+import org.apache.cassandra.io.util.DataInputBuffer;
+import org.apache.cassandra.io.util.FileDataInput;
+import org.apache.cassandra.io.util.RandomAccessReader;
+import org.apache.cassandra.io.util.RebufferingInputStream;
+import org.apache.cassandra.utils.JVMStabilityInspector;
+
+import static org.apache.cassandra.utils.FBUtilities.updateChecksumInt;
+
+public class CommitLogReader
+{
+    private static final Logger logger = LoggerFactory.getLogger(CommitLogReader.class);
+
+    private static final int LEGACY_END_OF_SEGMENT_MARKER = 0;
+
+    @VisibleForTesting
+    public static final int ALL_MUTATIONS = -1;
+    private final CRC32 checksum;
+    private final Map<UUID, AtomicInteger> invalidMutations;
+
+    private byte[] buffer;
+
+    public CommitLogReader()
+    {
+        checksum = new CRC32();
+        invalidMutations = new HashMap<>();
+        buffer = new byte[4096];
+    }
+
+    public Set<Map.Entry<UUID, AtomicInteger>> getInvalidMutations()
+    {
+        return invalidMutations.entrySet();
+    }
+
+    /**
+     * Reads all passed in files with no minimum, no start, and no mutation limit.
+     */
+    public void readAllFiles(CommitLogReadHandler handler, File[] files) throws IOException
+    {
+        readAllFiles(handler, files, CommitLogPosition.NONE);
+    }
+
+    /**
+     * Reads all passed in files with minPosition, no start, and no mutation limit.
+     */
+    public void readAllFiles(CommitLogReadHandler handler, File[] files, CommitLogPosition minPosition) throws IOException
+    {
+        for (int i = 0; i < files.length; i++)
+            readCommitLogSegment(handler, files[i], minPosition, ALL_MUTATIONS, i + 1 == files.length);
+    }
+
+    /**
+     * Reads passed in file fully
+     */
+    public void readCommitLogSegment(CommitLogReadHandler handler, File file, boolean tolerateTruncation) throws IOException
+    {
+        readCommitLogSegment(handler, file, CommitLogPosition.NONE, ALL_MUTATIONS, tolerateTruncation);
+    }
+
+    /**
+     * Reads passed in file fully, up to mutationLimit count
+     */
+    @VisibleForTesting
+    public void readCommitLogSegment(CommitLogReadHandler handler, File file, int mutationLimit, boolean tolerateTruncation) throws IOException
+    {
+        readCommitLogSegment(handler, file, CommitLogPosition.NONE, mutationLimit, tolerateTruncation);
+    }
+
+    /**
+     * Reads mutations from file, handing them off to handler
+     * @param handler Handler that will take action based on deserialized Mutations
+     * @param file CommitLogSegment file to read
+     * @param minPosition Optional minimum CommitLogPosition - all segments with a higher id, or with a matching id and a greater position, will be read
+     * @param mutationLimit Optional limit on # of mutations to replay. Local ALL_MUTATIONS serves as marker to play all.
+     * @param tolerateTruncation Whether or not we should allow truncation of this file or throw if EOF found
+     *
+     * @throws IOException
+     */
+    public void readCommitLogSegment(CommitLogReadHandler handler,
+                                     File file,
+                                     CommitLogPosition minPosition,
+                                     int mutationLimit,
+                                     boolean tolerateTruncation) throws IOException
+    {
+        // just transform from the file name (no reading of headers) to determine version
+        CommitLogDescriptor desc = CommitLogDescriptor.fromFileName(file.getName());
+
+        try (ChannelProxy channel = new ChannelProxy(file);
+            RandomAccessReader reader = RandomAccessReader.open(channel))
+        {
+            if (desc.version < CommitLogDescriptor.VERSION_21)
+            {
+                if (!shouldSkipSegmentId(file, desc, minPosition))
+                {
+                    if (minPosition.segmentId == desc.id)
+                        reader.seek(minPosition.position);
+                    ReadStatusTracker statusTracker = new ReadStatusTracker(mutationLimit, tolerateTruncation);
+                    statusTracker.errorContext = desc.fileName();
+                    readSection(handler, reader, minPosition, (int) reader.length(), statusTracker, desc);
+                }
+                return;
+            }
+
+            final long segmentIdFromFilename = desc.id;
+            try
+            {
+                // The following call can either throw or legitimately return null. For either case, we need to check
+                // desc outside this block and set it to null in the exception case.
+                desc = CommitLogDescriptor.readHeader(reader, DatabaseDescriptor.getEncryptionContext());
+            }
+            catch (Exception e)
+            {
+                desc = null;
+            }
+            if (desc == null)
+            {
+                // don't care about whether or not the handler thinks we can continue. We can't w/out descriptor.
+                handler.handleUnrecoverableError(new CommitLogReadException(
+                    String.format("Could not read commit log descriptor in file %s", file),
+                    CommitLogReadErrorReason.UNRECOVERABLE_DESCRIPTOR_ERROR,
+                    false));
+                return;
+            }
+
+            if (segmentIdFromFilename != desc.id)
+            {
+                if (handler.shouldSkipSegmentOnError(new CommitLogReadException(String.format(
+                    "Segment id mismatch (filename %d, descriptor %d) in file %s", segmentIdFromFilename, desc.id, file),
+                                                                                CommitLogReadErrorReason.RECOVERABLE_DESCRIPTOR_ERROR,
+                                                                                false)))
+                {
+                    return;
+                }
+            }
+
+            if (shouldSkipSegmentId(file, desc, minPosition))
+                return;
+
+            CommitLogSegmentReader segmentReader;
+            try
+            {
+                segmentReader = new CommitLogSegmentReader(handler, desc, reader, tolerateTruncation);
+            }
+            catch(Exception e)
+            {
+                handler.handleUnrecoverableError(new CommitLogReadException(
+                    String.format("Unable to create segment reader for commit log file: %s", e),
+                    CommitLogReadErrorReason.UNRECOVERABLE_UNKNOWN_ERROR,
+                    tolerateTruncation));
+                return;
+            }
+
+            try
+            {
+                ReadStatusTracker statusTracker = new ReadStatusTracker(mutationLimit, tolerateTruncation);
+                for (CommitLogSegmentReader.SyncSegment syncSegment : segmentReader)
+                {
+                    statusTracker.tolerateErrorsInSection &= syncSegment.toleratesErrorsInSection;
+
+                    // Skip segments that are completely behind the desired minPosition
+                    if (desc.id == minPosition.segmentId && syncSegment.endPosition < minPosition.position)
+                        continue;
+
+                    statusTracker.errorContext = String.format("Next section at %d in %s", syncSegment.fileStartPosition, desc.fileName());
+
+                    readSection(handler, syncSegment.input, minPosition, syncSegment.endPosition, statusTracker, desc);
+                    if (!statusTracker.shouldContinue())
+                        break;
+                }
+            }
+            // Unfortunately AbstractIterator cannot throw a checked exception, so we check to see if a RuntimeException
+            // is wrapping an IOException.
+            catch (RuntimeException re)
+            {
+                if (re.getCause() instanceof IOException)
+                    throw (IOException) re.getCause();
+                throw re;
+            }
+            logger.debug("Finished reading {}", file);
+        }
+    }
+
+    /**
+     * Any segment with id >= minPosition.segmentId is a candidate for read.
+     */
+    private boolean shouldSkipSegmentId(File file, CommitLogDescriptor desc, CommitLogPosition minPosition)
+    {
+        logger.debug("Reading {} (CL version {}, messaging version {}, compression {})",
+            file.getPath(),
+            desc.version,
+            desc.getMessagingVersion(),
+            desc.compression);
+
+        if (minPosition.segmentId > desc.id)
+        {
+            logger.trace("Skipping read of fully-flushed {}", file);
+            return true;
+        }
+        return false;
+    }
+
+    /**
+     * Reads a section of a file containing mutations
+     *
+     * @param handler Handler that will take action based on deserialized Mutations
+     * @param reader FileDataInput / logical buffer containing commitlog mutations
+     * @param minPosition CommitLogPosition indicating when we should start actively replaying mutations
+     * @param end logical numeric end of the segment being read
+     * @param statusTracker ReadStatusTracker with current state of mutation count, error state, etc
+     * @param desc Descriptor for CommitLog serialization
+     */
+    private void readSection(CommitLogReadHandler handler,
+                             FileDataInput reader,
+                             CommitLogPosition minPosition,
+                             int end,
+                             ReadStatusTracker statusTracker,
+                             CommitLogDescriptor desc) throws IOException
+    {
+        // seek rather than deserializing mutation-by-mutation to reach the desired minPosition in this SyncSegment
+        if (desc.id == minPosition.segmentId && reader.getFilePointer() < minPosition.position)
+            reader.seek(minPosition.position);
+
+        while (statusTracker.shouldContinue() && reader.getFilePointer() < end && !reader.isEOF())
+        {
+            long mutationStart = reader.getFilePointer();
+            if (logger.isTraceEnabled())
+                logger.trace("Reading mutation at {}", mutationStart);
+
+            long claimedCRC32;
+            int serializedSize;
+            try
+            {
+                // any of the reads may hit EOF
+                serializedSize = reader.readInt();
+                if (serializedSize == LEGACY_END_OF_SEGMENT_MARKER)
+                {
+                    logger.trace("Encountered end of segment marker at {}", reader.getFilePointer());
+                    statusTracker.requestTermination();
+                    return;
+                }
+
+                // Mutation must be at LEAST 10 bytes:
+                //    3 for a non-empty Keyspace
+                //    3 for a Key (including the 2-byte length from writeUTF/writeWithShortLength)
+                //    4 bytes for column count.
+                // This prevents the CRC from being fooled by special-case garbage in the file; see CASSANDRA-2128
+                if (serializedSize < 10)
+                {
+                    if (handler.shouldSkipSegmentOnError(new CommitLogReadException(
+                                                    String.format("Invalid mutation size %d at %d in %s", serializedSize, mutationStart, statusTracker.errorContext),
+                                                    CommitLogReadErrorReason.MUTATION_ERROR,
+                                                    statusTracker.tolerateErrorsInSection)))
+                    {
+                        statusTracker.requestTermination();
+                    }
+                    return;
+                }
+
+                long claimedSizeChecksum = CommitLogFormat.calculateClaimedChecksum(reader, desc.version);
+                checksum.reset();
+                CommitLogFormat.updateChecksum(checksum, serializedSize, desc.version);
+
+                if (checksum.getValue() != claimedSizeChecksum)
+                {
+                    if (handler.shouldSkipSegmentOnError(new CommitLogReadException(
+                                                    String.format("Mutation size checksum failure at %d in %s", mutationStart, statusTracker.errorContext),
+                                                    CommitLogReadErrorReason.MUTATION_ERROR,
+                                                    statusTracker.tolerateErrorsInSection)))
+                    {
+                        statusTracker.requestTermination();
+                    }
+                    return;
+                }
+
+                if (serializedSize > buffer.length)
+                    buffer = new byte[(int) (1.2 * serializedSize)];
+                reader.readFully(buffer, 0, serializedSize);
+
+                claimedCRC32 = CommitLogFormat.calculateClaimedCRC32(reader, desc.version);
+            }
+            catch (EOFException eof)
+            {
+                if (handler.shouldSkipSegmentOnError(new CommitLogReadException(
+                                                String.format("Unexpected end of segment at %d in %s", mutationStart, statusTracker.errorContext),
+                                                CommitLogReadErrorReason.EOF,
+                                                statusTracker.tolerateErrorsInSection)))
+                {
+                    statusTracker.requestTermination();
+                }
+                return;
+            }
+
+            checksum.update(buffer, 0, serializedSize);
+            if (claimedCRC32 != checksum.getValue())
+            {
+                if (handler.shouldSkipSegmentOnError(new CommitLogReadException(
+                                                String.format("Mutation checksum failure at %d in %s", mutationStart, statusTracker.errorContext),
+                                                CommitLogReadErrorReason.MUTATION_ERROR,
+                                                statusTracker.tolerateErrorsInSection)))
+                {
+                    statusTracker.requestTermination();
+                }
+                continue;
+            }
+
+            long mutationPosition = reader.getFilePointer();
+            readMutation(handler, buffer, serializedSize, minPosition, (int)mutationPosition, desc);
+
+            // Only count this as a processed mutation if it is after our min as we suppress reading of mutations that
+            // are before this mark.
+            if (mutationPosition >= minPosition.position)
+                statusTracker.addProcessedMutation();
+        }
+    }
+
+    /**
+     * Deserializes and passes a Mutation to the ICommitLogReadHandler requested
+     *
+     * @param handler Handler that will take action based on deserialized Mutations
+     * @param inputBuffer raw byte array w/Mutation data
+     * @param size serialized size of the mutation
+     * @param minPosition We need to suppress replay of mutations that are before the required minPosition
+     * @param entryLocation filePointer offset of mutation within CommitLogSegment
+     * @param desc CommitLogDescriptor being worked on
+     */
+    @VisibleForTesting
+    protected void readMutation(CommitLogReadHandler handler,
+                                byte[] inputBuffer,
+                                int size,
+                                CommitLogPosition minPosition,
+                                final int entryLocation,
+                                final CommitLogDescriptor desc) throws IOException
+    {
+        // For now, we need to go through the motions of deserializing the mutation to determine its size and move
+        // the file pointer forward accordingly, even if we're behind the requested minPosition within this SyncSegment.
+        boolean shouldReplay = entryLocation > minPosition.position;
+
+        final Mutation mutation;
+        try (RebufferingInputStream bufIn = new DataInputBuffer(inputBuffer, 0, size))
+        {
+            mutation = Mutation.serializer.deserialize(bufIn,
+                                                       desc.getMessagingVersion(),
+                                                       SerializationHelper.Flag.LOCAL);
+            // doublecheck that what we read is [still] valid for the current schema
+            for (PartitionUpdate upd : mutation.getPartitionUpdates())
+                upd.validate();
+        }
+        catch (UnknownColumnFamilyException ex)
+        {
+            if (ex.cfId == null)
+                return;
+            AtomicInteger i = invalidMutations.get(ex.cfId);
+            if (i == null)
+            {
+                i = new AtomicInteger(1);
+                invalidMutations.put(ex.cfId, i);
+            }
+            else
+                i.incrementAndGet();
+            return;
+        }
+        catch (Throwable t)
+        {
+            JVMStabilityInspector.inspectThrowable(t);
+            File f = File.createTempFile("mutation", "dat");
+
+            try (DataOutputStream out = new DataOutputStream(new FileOutputStream(f)))
+            {
+                out.write(inputBuffer, 0, size);
+            }
+
+            // Checksum passed so this error can't be permissible.
+            handler.handleUnrecoverableError(new CommitLogReadException(
+                String.format(
+                    "Unexpected error deserializing mutation; saved to %s.  " +
+                    "This may be caused by replaying a mutation against a table with the same name but incompatible schema.  " +
+                    "Exception follows: %s", f.getAbsolutePath(), t),
+                CommitLogReadErrorReason.MUTATION_ERROR,
+                false));
+            return;
+        }
+
+        if (logger.isTraceEnabled())
+            logger.trace("Read mutation for {}.{}: {}", mutation.getKeyspaceName(), mutation.key(),
+                         "{" + StringUtils.join(mutation.getPartitionUpdates().iterator(), ", ") + "}");
+
+        if (shouldReplay)
+            handler.handleMutation(mutation, size, entryLocation, desc);
+    }
+
+    /**
+     * Helper methods to deal with changing formats of internals of the CommitLog without polluting deserialization code.
+     */
+    private static class CommitLogFormat
+    {
+        public static long calculateClaimedChecksum(FileDataInput input, int commitLogVersion) throws IOException
+        {
+            switch (commitLogVersion)
+            {
+                case CommitLogDescriptor.VERSION_12:
+                case CommitLogDescriptor.VERSION_20:
+                    return input.readLong();
+                // Changed format in 2.1
+                default:
+                    return input.readInt() & 0xffffffffL;
+            }
+        }
+
+        public static void updateChecksum(CRC32 checksum, int serializedSize, int commitLogVersion)
+        {
+            switch (commitLogVersion)
+            {
+                case CommitLogDescriptor.VERSION_12:
+                    checksum.update(serializedSize);
+                    break;
+                // Changed format in 2.0
+                default:
+                    updateChecksumInt(checksum, serializedSize);
+                    break;
+            }
+        }
+
+        public static long calculateClaimedCRC32(FileDataInput input, int commitLogVersion) throws IOException
+        {
+            switch (commitLogVersion)
+            {
+                case CommitLogDescriptor.VERSION_12:
+                case CommitLogDescriptor.VERSION_20:
+                    return input.readLong();
+                // Changed format in 2.1
+                default:
+                    return input.readInt() & 0xffffffffL;
+            }
+        }
+    }
+
+    private class ReadStatusTracker
+    {
+        private int mutationsLeft;
+        public String errorContext = "";
+        public boolean tolerateErrorsInSection;
+        private boolean error;
+
+        public ReadStatusTracker(int mutationLimit, boolean tolerateErrorsInSection)
+        {
+            this.mutationsLeft = mutationLimit;
+            this.tolerateErrorsInSection = tolerateErrorsInSection;
+        }
+
+        public void addProcessedMutation()
+        {
+            if (mutationsLeft == ALL_MUTATIONS)
+                return;
+            --mutationsLeft;
+        }
+
+        public boolean shouldContinue()
+        {
+            return !error && (mutationsLeft != 0 || mutationsLeft == ALL_MUTATIONS);
+        }
+
+        public void requestTermination()
+        {
+            error = true;
+        }
+    }
+}
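
The CommitLogFormat helper above captures a version split: commit log segments written before 2.1 store their claimed checksums as longs, while newer versions store them as unsigned ints (hence the readInt() & 0xffffffffL masking). The sketch below is a minimal, self-contained illustration of the per-mutation framing the reader walks -- size, size checksum, payload, payload checksum -- under that assumption; the class and helper names are illustrative and are not part of this patch.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Illustrative sketch only -- not part of this patch.
public class MutationFrameSketch
{
    static final int LEGACY_END_OF_SEGMENT_MARKER = 0;

    // Same idea as feeding the 4 bytes of an int into the CRC (cf. updateChecksumInt above).
    static void updateChecksumInt(CRC32 crc, int v)
    {
        crc.update(ByteBuffer.allocate(4).putInt(v).array());
    }

    // Returns the payload if both checksums match, or null on a framing/CRC problem.
    static byte[] readFrame(DataInputStream in, boolean checksumsAreLongs) throws IOException
    {
        int size = in.readInt();
        if (size == LEGACY_END_OF_SEGMENT_MARKER || size < 10)
            return null;                                            // end of section or garbage

        long claimedSizeChecksum = checksumsAreLongs ? in.readLong()
                                                     : in.readInt() & 0xffffffffL;
        CRC32 crc = new CRC32();
        updateChecksumInt(crc, size);
        if (crc.getValue() != claimedSizeChecksum)
            return null;                                            // size checksum mismatch

        byte[] payload = new byte[size];
        in.readFully(payload);
        long claimedCRC32 = checksumsAreLongs ? in.readLong()
                                              : in.readInt() & 0xffffffffL;
        crc.update(payload, 0, size);
        return crc.getValue() == claimedCRC32 ? payload : null;     // payload checksum check
    }

    public static void main(String[] args) throws IOException
    {
        byte[] payload = "example mutation bytes".getBytes(StandardCharsets.UTF_8);
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        CRC32 crc = new CRC32();
        out.writeInt(payload.length);
        updateChecksumInt(crc, payload.length);
        out.writeInt((int) crc.getValue());                         // post-2.1 style: unsigned int
        out.write(payload);
        crc.update(payload, 0, payload.length);
        out.writeInt((int) crc.getValue());

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
        System.out.println(readFrame(in, false) != null);           // prints true
    }
}
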
diff --git a/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java b/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java
index f45a47a..c8e597f 100644
--- a/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java
+++ b/src/java/org/apache/cassandra/db/commitlog/CommitLogReplayer.java
@@ -18,102 +18,90 @@
  */
 package org.apache.cassandra.db.commitlog;
 
-import java.io.DataOutputStream;
-import java.io.EOFException;
 import java.io.File;
-import java.io.FileOutputStream;
 import java.io.IOException;
-import java.nio.ByteBuffer;
 import java.util.*;
 import java.util.concurrent.ExecutionException;
 import java.util.concurrent.Future;
 import java.util.concurrent.atomic.AtomicInteger;
-import java.util.zip.CRC32;
 
+import com.google.common.annotations.VisibleForTesting;
 import com.google.common.base.Predicate;
 import com.google.common.base.Throwables;
-import com.google.common.collect.HashMultimap;
-import com.google.common.collect.Iterables;
-import com.google.common.collect.Multimap;
+import com.google.common.collect.*;
 import com.google.common.util.concurrent.Uninterruptibles;
-
 import org.apache.commons.lang3.StringUtils;
+import org.cliffc.high_scale_lib.NonBlockingHashSet;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
+
 import org.apache.cassandra.concurrent.Stage;
 import org.apache.cassandra.concurrent.StageManager;
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.Schema;
 import org.apache.cassandra.db.*;
-import org.apache.cassandra.db.rows.SerializationHelper;
 import org.apache.cassandra.db.partitions.PartitionUpdate;
-import org.apache.cassandra.exceptions.ConfigurationException;
-import org.apache.cassandra.io.util.FileSegmentInputStream;
-import org.apache.cassandra.io.util.RebufferingInputStream;
-import org.apache.cassandra.schema.CompressionParams;
-import org.apache.cassandra.io.compress.ICompressor;
-import org.apache.cassandra.io.util.ChannelProxy;
-import org.apache.cassandra.io.util.DataInputBuffer;
-import org.apache.cassandra.io.util.FileDataInput;
-import org.apache.cassandra.io.util.RandomAccessReader;
+import org.apache.cassandra.io.sstable.format.SSTableReader;
 import org.apache.cassandra.utils.FBUtilities;
-import org.apache.cassandra.utils.JVMStabilityInspector;
 import org.apache.cassandra.utils.WrappedRunnable;
-import org.cliffc.high_scale_lib.NonBlockingHashSet;
 
-import static org.apache.cassandra.utils.FBUtilities.updateChecksumInt;
-
-public class CommitLogReplayer
+public class CommitLogReplayer implements CommitLogReadHandler
 {
+    @VisibleForTesting
+    public static long MAX_OUTSTANDING_REPLAY_BYTES = Long.getLong("cassandra.commitlog_max_outstanding_replay_bytes", 1024 * 1024 * 64);
+    @VisibleForTesting
+    public static MutationInitiator mutationInitiator = new MutationInitiator();
     static final String IGNORE_REPLAY_ERRORS_PROPERTY = "cassandra.commitlog.ignorereplayerrors";
     private static final Logger logger = LoggerFactory.getLogger(CommitLogReplayer.class);
     private static final int MAX_OUTSTANDING_REPLAY_COUNT = Integer.getInteger("cassandra.commitlog_max_outstanding_replay_count", 1024);
-    private static final int LEGACY_END_OF_SEGMENT_MARKER = 0;
 
-    private final Set<Keyspace> keyspacesRecovered;
-    private final List<Future<?>> futures;
-    private final Map<UUID, AtomicInteger> invalidMutations;
+    private final Set<Keyspace> keyspacesReplayed;
+    private final Queue<Future<Integer>> futures;
+
     private final AtomicInteger replayedCount;
-    private final Map<UUID, ReplayPosition.ReplayFilter> cfPersisted;
-    private final ReplayPosition globalPosition;
-    private final CRC32 checksum;
-    private byte[] buffer;
-    private byte[] uncompressedBuffer;
+    private final Map<UUID, ReplayPositionFilter> cfPersisted;
+    private final CommitLogPosition globalPosition;
+
+    // Used to throttle speed of replay of mutations if we pass the max outstanding count
+    private long pendingMutationBytes = 0;
 
     private final ReplayFilter replayFilter;
     private final CommitLogArchiver archiver;
 
-    CommitLogReplayer(CommitLog commitLog, ReplayPosition globalPosition, Map<UUID, ReplayPosition.ReplayFilter> cfPersisted, ReplayFilter replayFilter)
+    @VisibleForTesting
+    protected CommitLogReader commitLogReader;
+
+    CommitLogReplayer(CommitLog commitLog,
+                      CommitLogPosition globalPosition,
+                      Map<UUID, ReplayPositionFilter> cfPersisted,
+                      ReplayFilter replayFilter)
     {
-        this.keyspacesRecovered = new NonBlockingHashSet<Keyspace>();
-        this.futures = new ArrayList<Future<?>>();
-        this.buffer = new byte[4096];
-        this.uncompressedBuffer = new byte[4096];
-        this.invalidMutations = new HashMap<UUID, AtomicInteger>();
+        this.keyspacesReplayed = new NonBlockingHashSet<Keyspace>();
+        this.futures = new ArrayDeque<Future<Integer>>();
         // count the number of replayed mutation. We don't really care about atomicity, but we need it to be a reference.
         this.replayedCount = new AtomicInteger();
-        this.checksum = new CRC32();
         this.cfPersisted = cfPersisted;
         this.globalPosition = globalPosition;
         this.replayFilter = replayFilter;
         this.archiver = commitLog.archiver;
+        this.commitLogReader = new CommitLogReader();
     }
 
     public static CommitLogReplayer construct(CommitLog commitLog)
     {
-        // compute per-CF and global replay positions
-        Map<UUID, ReplayPosition.ReplayFilter> cfPersisted = new HashMap<>();
+        // compute per-CF and global commit log segment positions
+        Map<UUID, ReplayPositionFilter> cfPersisted = new HashMap<>();
         ReplayFilter replayFilter = ReplayFilter.create();
-        ReplayPosition globalPosition = null;
+        CommitLogPosition globalPosition = null;
         for (ColumnFamilyStore cfs : ColumnFamilyStore.all())
         {
             // but, if we've truncated the cf in question, then we need to need to start replay after the truncation
-            ReplayPosition truncatedAt = SystemKeyspace.getTruncatedPosition(cfs.metadata.cfId);
+            CommitLogPosition truncatedAt = SystemKeyspace.getTruncatedPosition(cfs.metadata.cfId);
             if (truncatedAt != null)
             {
-                // Point in time restore is taken to mean that the tables need to be recovered even if they were
+                // Point in time restore is taken to mean that the tables need to be replayed even if they were
                 // deleted at a later point in time. Any truncation record after that point must thus be cleared prior
-                // to recovery (CASSANDRA-9195).
+                // to replay (CASSANDRA-9195).
                 long restoreTime = commitLog.archiver.restorePointInTime;
                 long truncatedTime = SystemKeyspace.getTruncatedAt(cfs.metadata.cfId);
                 if (truncatedTime > restoreTime)
@@ -129,28 +117,35 @@
                 }
             }
 
-            ReplayPosition.ReplayFilter filter = new ReplayPosition.ReplayFilter(cfs.getSSTables(), truncatedAt);
+            ReplayPositionFilter filter = new ReplayPositionFilter(cfs.getSSTables(), truncatedAt);
             if (!filter.isEmpty())
                 cfPersisted.put(cfs.metadata.cfId, filter);
             else
-                globalPosition = ReplayPosition.NONE; // if we have no ranges for this CF, we must replay everything and filter
+                globalPosition = CommitLogPosition.NONE; // if we have no ranges for this CF, we must replay everything and filter
         }
         if (globalPosition == null)
-            globalPosition = ReplayPosition.firstNotCovered(cfPersisted.values());
-        logger.debug("Global replay position is {} from columnfamilies {}", globalPosition, FBUtilities.toString(cfPersisted));
+            globalPosition = firstNotCovered(cfPersisted.values());
+        logger.trace("Global commit log segment position is {} from columnfamilies {}", globalPosition, FBUtilities.toString(cfPersisted));
         return new CommitLogReplayer(commitLog, globalPosition, cfPersisted, replayFilter);
     }
 
-    public void recover(File[] clogs) throws IOException
+    public void replayPath(File file, boolean tolerateTruncation) throws IOException
     {
-        int i;
-        for (i = 0; i < clogs.length; ++i)
-            recover(clogs[i], i + 1 == clogs.length);
+        commitLogReader.readCommitLogSegment(this, file, globalPosition, CommitLogReader.ALL_MUTATIONS, tolerateTruncation);
     }
 
+    public void replayFiles(File[] clogs) throws IOException
+    {
+        commitLogReader.readAllFiles(this, clogs, globalPosition);
+    }
+
+    /**
+     * Flushes all keyspaces associated with this replayer in parallel, blocking until their flushes are complete.
+     * @return the number of mutations replayed
+     */
     public int blockForWrites()
     {
-        for (Map.Entry<UUID, AtomicInteger> entry : invalidMutations.entrySet())
+        for (Map.Entry<UUID, AtomicInteger> entry : commitLogReader.getInvalidMutations())
             logger.warn(String.format("Skipped %d mutations from unknown (probably removed) CF with id %s", entry.getValue().intValue(), entry.getKey()));
 
         // wait for all the writes to finish on the mutation stage
@@ -160,7 +155,9 @@
         // flush replayed keyspaces
         futures.clear();
         boolean flushingSystem = false;
-        for (Keyspace keyspace : keyspacesRecovered)
+
+        List<Future<?>> futures = new ArrayList<Future<?>>();
+        for (Keyspace keyspace : keyspacesReplayed)
         {
             if (keyspace.getName().equals(SystemKeyspace.NAME))
                 flushingSystem = true;
@@ -173,41 +170,139 @@
             futures.add(Keyspace.open(SystemKeyspace.NAME).getColumnFamilyStore(SystemKeyspace.BATCHES).forceFlush());
 
         FBUtilities.waitOnFutures(futures);
+
         return replayedCount.get();
     }
 
-    private int readSyncMarker(CommitLogDescriptor descriptor, int offset, RandomAccessReader reader, boolean tolerateTruncation) throws IOException
+    /*
+     * Wrapper around initiating mutations read from the log to make it possible
+     * to spy on initiated mutations for test
+     */
+    @VisibleForTesting
+    public static class MutationInitiator
     {
-        if (offset > reader.length() - CommitLogSegment.SYNC_MARKER_SIZE)
+        protected Future<Integer> initiateMutation(final Mutation mutation,
+                                                   final long segmentId,
+                                                   final int serializedSize,
+                                                   final int entryLocation,
+                                                   final CommitLogReplayer commitLogReplayer)
         {
-            // There was no room in the segment to write a final header. No data could be present here.
-            return -1;
-        }
-        reader.seek(offset);
-        CRC32 crc = new CRC32();
-        updateChecksumInt(crc, (int) (descriptor.id & 0xFFFFFFFFL));
-        updateChecksumInt(crc, (int) (descriptor.id >>> 32));
-        updateChecksumInt(crc, (int) reader.getPosition());
-        int end = reader.readInt();
-        long filecrc = reader.readInt() & 0xffffffffL;
-        if (crc.getValue() != filecrc)
-        {
-            if (end != 0 || filecrc != 0)
+            Runnable runnable = new WrappedRunnable()
             {
-                handleReplayError(false,
-                                  "Encountered bad header at position %d of commit log %s, with invalid CRC. " +
-                                  "The end of segment marker should be zero.",
-                                  offset, reader.getPath());
-            }
-            return -1;
+                public void runMayThrow()
+                {
+                    if (Schema.instance.getKSMetaData(mutation.getKeyspaceName()) == null)
+                        return;
+                    if (commitLogReplayer.pointInTimeExceeded(mutation))
+                        return;
+
+                    final Keyspace keyspace = Keyspace.open(mutation.getKeyspaceName());
+
+                    // Rebuild the mutation, omitting column families that
+                    //    a) the user has requested that we ignore,
+                    //    b) have already been flushed,
+                    // or c) are part of a cf that was dropped.
+                    // Keep in mind that the cf.name() is suspect. Do everything based on the cfid instead.
+                    Mutation newMutation = null;
+                    for (PartitionUpdate update : commitLogReplayer.replayFilter.filter(mutation))
+                    {
+                        if (Schema.instance.getCF(update.metadata().cfId) == null)
+                            continue; // dropped
+
+                        // replay if current segment is newer than last flushed one or,
+                        // if it is the last known segment, if we are after the commit log segment position
+                        if (commitLogReplayer.shouldReplay(update.metadata().cfId, new CommitLogPosition(segmentId, entryLocation)))
+                        {
+                            if (newMutation == null)
+                                newMutation = new Mutation(mutation.getKeyspaceName(), mutation.key());
+                            newMutation.add(update);
+                            commitLogReplayer.replayedCount.incrementAndGet();
+                        }
+                    }
+                    if (newMutation != null)
+                    {
+                        assert !newMutation.isEmpty();
+
+                        try
+                        {
+                            Uninterruptibles.getUninterruptibly(Keyspace.open(newMutation.getKeyspaceName()).applyFromCommitLog(newMutation));
+                        }
+                        catch (ExecutionException e)
+                        {
+                            throw Throwables.propagate(e.getCause());
+                        }
+
+                        commitLogReplayer.keyspacesReplayed.add(keyspace);
+                    }
+                }
+            };
+            return StageManager.getStage(Stage.MUTATION).submit(runnable, serializedSize);
         }
-        else if (end < offset || end > reader.length())
+    }
+
+    /**
+     * A filter of known safe-to-discard commit log replay positions, based on
+     * the ranges covered by on-disk sstables and on positions prior to the most recent truncation record
+     */
+    public static class ReplayPositionFilter
+    {
+        final NavigableMap<CommitLogPosition, CommitLogPosition> persisted = new TreeMap<>();
+        public ReplayPositionFilter(Iterable<SSTableReader> onDisk, CommitLogPosition truncatedAt)
         {
-            handleReplayError(tolerateTruncation, "Encountered bad header at position %d of commit log %s, with bad position but valid CRC",
-                              offset, reader.getPath());
-            return -1;
+            for (SSTableReader reader : onDisk)
+            {
+                CommitLogPosition start = reader.getSSTableMetadata().commitLogLowerBound;
+                CommitLogPosition end = reader.getSSTableMetadata().commitLogUpperBound;
+                add(persisted, start, end);
+            }
+            if (truncatedAt != null)
+                add(persisted, CommitLogPosition.NONE, truncatedAt);
         }
-        return end;
+
+        private static void add(NavigableMap<CommitLogPosition, CommitLogPosition> ranges, CommitLogPosition start, CommitLogPosition end)
+        {
+            // extend ourselves to cover any ranges we overlap
+            // record directly preceding our end may extend past us, so take the max of our end and its
+            Map.Entry<CommitLogPosition, CommitLogPosition> extend = ranges.floorEntry(end);
+            if (extend != null && extend.getValue().compareTo(end) > 0)
+                end = extend.getValue();
+
+            // record directly preceding our start may extend into us; if it does, we take it as our start
+            extend = ranges.lowerEntry(start);
+            if (extend != null && extend.getValue().compareTo(start) >= 0)
+                start = extend.getKey();
+
+            ranges.subMap(start, end).clear();
+            ranges.put(start, end);
+        }
+
+        public boolean shouldReplay(CommitLogPosition position)
+        {
+            // replay ranges are start exclusive, end inclusive
+            Map.Entry<CommitLogPosition, CommitLogPosition> range = persisted.lowerEntry(position);
+            return range == null || position.compareTo(range.getValue()) > 0;
+        }
+
+        public boolean isEmpty()
+        {
+            return persisted.isEmpty();
+        }
+    }
+
+    public static CommitLogPosition firstNotCovered(Iterable<ReplayPositionFilter> ranges)
+    {
+        CommitLogPosition min = null;
+        for (ReplayPositionFilter map : ranges)
+        {
+            CommitLogPosition first = map.persisted.firstEntry().getValue();
+            if (min == null)
+                min = first;
+            else
+                min = Ordering.natural().min(min, first);
+        }
+        if (min == null)
+            return CommitLogPosition.NONE;
+        return min;
     }
 
     abstract static class ReplayFilter
@@ -225,7 +320,7 @@
             Multimap<String, String> toReplay = HashMultimap.create();
             for (String rawPair : System.getProperty("cassandra.replayList").split(","))
             {
-                String[] pair = rawPair.trim().split("\\.");
+                String[] pair = StringUtils.split(rawPair.trim(), '.');
                 if (pair.length != 2)
                     throw new IllegalArgumentException("Each table to be replayed must be fully qualified with keyspace name, e.g., 'system.peers'");
 
@@ -291,347 +386,12 @@
      *
      * @return true iff replay is necessary
      */
-    private boolean shouldReplay(UUID cfId, ReplayPosition position)
+    private boolean shouldReplay(UUID cfId, CommitLogPosition position)
     {
-        ReplayPosition.ReplayFilter filter = cfPersisted.get(cfId);
+        ReplayPositionFilter filter = cfPersisted.get(cfId);
         return filter == null || filter.shouldReplay(position);
     }
 
-    @SuppressWarnings("resource")
-    public void recover(File file, boolean tolerateTruncation) throws IOException
-    {
-        CommitLogDescriptor desc = CommitLogDescriptor.fromFileName(file.getName());
-        try(ChannelProxy channel = new ChannelProxy(file);
-            RandomAccessReader reader = RandomAccessReader.open(channel))
-        {
-            if (desc.version < CommitLogDescriptor.VERSION_21)
-            {
-                if (logAndCheckIfShouldSkip(file, desc))
-                    return;
-                if (globalPosition.segment == desc.id)
-                    reader.seek(globalPosition.position);
-                replaySyncSection(reader, (int) reader.length(), desc, desc.fileName(), tolerateTruncation);
-                return;
-            }
-
-            final long segmentId = desc.id;
-            try
-            {
-                desc = CommitLogDescriptor.readHeader(reader);
-            }
-            catch (IOException e)
-            {
-                desc = null;
-            }
-            if (desc == null) {
-                handleReplayError(false, "Could not read commit log descriptor in file %s", file);
-                return;
-            }
-            if (segmentId != desc.id)
-            {
-                handleReplayError(false, "Segment id mismatch (filename %d, descriptor %d) in file %s", segmentId, desc.id, file);
-                // continue processing if ignored.
-            }
-
-            if (logAndCheckIfShouldSkip(file, desc))
-                return;
-
-            ICompressor compressor = null;
-            if (desc.compression != null)
-            {
-                try
-                {
-                    compressor = CompressionParams.createCompressor(desc.compression);
-                }
-                catch (ConfigurationException e)
-                {
-                    handleReplayError(false, "Unknown compression: %s", e.getMessage());
-                    return;
-                }
-            }
-
-            assert reader.length() <= Integer.MAX_VALUE;
-            int end = (int) reader.getFilePointer();
-            int replayEnd = end;
-
-            while ((end = readSyncMarker(desc, end, reader, tolerateTruncation)) >= 0)
-            {
-                int replayPos = replayEnd + CommitLogSegment.SYNC_MARKER_SIZE;
-
-                if (logger.isTraceEnabled())
-                    logger.trace("Replaying {} between {} and {}", file, reader.getFilePointer(), end);
-                if (compressor != null)
-                {
-                    int uncompressedLength = reader.readInt();
-                    replayEnd = replayPos + uncompressedLength;
-                }
-                else
-                {
-                    replayEnd = end;
-                }
-
-                if (segmentId == globalPosition.segment && replayEnd < globalPosition.position)
-                    // Skip over flushed section.
-                    continue;
-
-                FileDataInput sectionReader = reader;
-                String errorContext = desc.fileName();
-                // In the uncompressed case the last non-fully-flushed section can be anywhere in the file.
-                boolean tolerateErrorsInSection = tolerateTruncation;
-                if (compressor != null)
-                {
-                    // In the compressed case we know if this is the last section.
-                    tolerateErrorsInSection &= end == reader.length() || end < 0;
-
-                    int start = (int) reader.getFilePointer();
-                    try
-                    {
-                        int compressedLength = end - start;
-                        if (logger.isTraceEnabled())
-                            logger.trace("Decompressing {} between replay positions {} and {}",
-                                         file,
-                                         replayPos,
-                                         replayEnd);
-                        if (compressedLength > buffer.length)
-                            buffer = new byte[(int) (1.2 * compressedLength)];
-                        reader.readFully(buffer, 0, compressedLength);
-                        int uncompressedLength = replayEnd - replayPos;
-                        if (uncompressedLength > uncompressedBuffer.length)
-                            uncompressedBuffer = new byte[(int) (1.2 * uncompressedLength)];
-                        compressedLength = compressor.uncompress(buffer, 0, compressedLength, uncompressedBuffer, 0);
-                        sectionReader = new FileSegmentInputStream(ByteBuffer.wrap(uncompressedBuffer), reader.getPath(), replayPos);
-                        errorContext = "compressed section at " + start + " in " + errorContext;
-                    }
-                    catch (IOException | ArrayIndexOutOfBoundsException e)
-                    {
-                        handleReplayError(tolerateErrorsInSection,
-                                          "Unexpected exception decompressing section at %d: %s",
-                                          start, e);
-                        continue;
-                    }
-                }
-
-                if (!replaySyncSection(sectionReader, replayEnd, desc, errorContext, tolerateErrorsInSection))
-                    break;
-            }
-            logger.debug("Finished reading {}", file);
-        }
-    }
-
-    public boolean logAndCheckIfShouldSkip(File file, CommitLogDescriptor desc)
-    {
-        logger.debug("Replaying {} (CL version {}, messaging version {}, compression {})",
-                    file.getPath(),
-                    desc.version,
-                    desc.getMessagingVersion(),
-                    desc.compression);
-
-        if (globalPosition.segment > desc.id)
-        {
-            logger.trace("skipping replay of fully-flushed {}", file);
-            return true;
-        }
-        return false;
-    }
-
-    /**
-     * Replays a sync section containing a list of mutations.
-     *
-     * @return Whether replay should continue with the next section.
-     */
-    private boolean replaySyncSection(FileDataInput reader, int end, CommitLogDescriptor desc, String errorContext, boolean tolerateErrors) throws IOException
-    {
-         /* read the logs populate Mutation and apply */
-        while (reader.getFilePointer() < end && !reader.isEOF())
-        {
-            long mutationStart = reader.getFilePointer();
-            if (logger.isTraceEnabled())
-                logger.trace("Reading mutation at {}", mutationStart);
-
-            long claimedCRC32;
-            int serializedSize;
-            try
-            {
-                // any of the reads may hit EOF
-                serializedSize = reader.readInt();
-                if (serializedSize == LEGACY_END_OF_SEGMENT_MARKER)
-                {
-                    logger.trace("Encountered end of segment marker at {}", reader.getFilePointer());
-                    return false;
-                }
-
-                // Mutation must be at LEAST 10 bytes:
-                // 3 each for a non-empty Keyspace and Key (including the
-                // 2-byte length from writeUTF/writeWithShortLength) and 4 bytes for column count.
-                // This prevents CRC by being fooled by special-case garbage in the file; see CASSANDRA-2128
-                if (serializedSize < 10)
-                {
-                    handleReplayError(tolerateErrors,
-                                      "Invalid mutation size %d at %d in %s",
-                                      serializedSize, mutationStart, errorContext);
-                    return false;
-                }
-
-                long claimedSizeChecksum;
-                if (desc.version < CommitLogDescriptor.VERSION_21)
-                    claimedSizeChecksum = reader.readLong();
-                else
-                    claimedSizeChecksum = reader.readInt() & 0xffffffffL;
-                checksum.reset();
-                if (desc.version < CommitLogDescriptor.VERSION_20)
-                    checksum.update(serializedSize);
-                else
-                    updateChecksumInt(checksum, serializedSize);
-
-                if (checksum.getValue() != claimedSizeChecksum)
-                {
-                    handleReplayError(tolerateErrors,
-                                      "Mutation size checksum failure at %d in %s",
-                                      mutationStart, errorContext);
-                    return false;
-                }
-                // ok.
-
-                if (serializedSize > buffer.length)
-                    buffer = new byte[(int) (1.2 * serializedSize)];
-                reader.readFully(buffer, 0, serializedSize);
-                if (desc.version < CommitLogDescriptor.VERSION_21)
-                    claimedCRC32 = reader.readLong();
-                else
-                    claimedCRC32 = reader.readInt() & 0xffffffffL;
-            }
-            catch (EOFException eof)
-            {
-                handleReplayError(tolerateErrors,
-                                  "Unexpected end of segment",
-                                  mutationStart, errorContext);
-                return false; // last CL entry didn't get completely written. that's ok.
-            }
-
-            checksum.update(buffer, 0, serializedSize);
-            if (claimedCRC32 != checksum.getValue())
-            {
-                handleReplayError(tolerateErrors,
-                                  "Mutation checksum failure at %d in %s",
-                                  mutationStart, errorContext);
-                continue;
-            }
-            replayMutation(buffer, serializedSize, (int) reader.getFilePointer(), desc);
-        }
-        return true;
-    }
-
-    /**
-     * Deserializes and replays a commit log entry.
-     */
-    void replayMutation(byte[] inputBuffer, int size,
-            final int entryLocation, final CommitLogDescriptor desc) throws IOException
-    {
-
-        final Mutation mutation;
-        try (RebufferingInputStream bufIn = new DataInputBuffer(inputBuffer, 0, size))
-        {
-            mutation = Mutation.serializer.deserialize(bufIn,
-                                                       desc.getMessagingVersion(),
-                                                       SerializationHelper.Flag.LOCAL);
-            // doublecheck that what we read is [still] valid for the current schema
-            for (PartitionUpdate upd : mutation.getPartitionUpdates())
-                upd.validate();
-        }
-        catch (UnknownColumnFamilyException ex)
-        {
-            if (ex.cfId == null)
-                return;
-            AtomicInteger i = invalidMutations.get(ex.cfId);
-            if (i == null)
-            {
-                i = new AtomicInteger(1);
-                invalidMutations.put(ex.cfId, i);
-            }
-            else
-                i.incrementAndGet();
-            return;
-        }
-        catch (Throwable t)
-        {
-            JVMStabilityInspector.inspectThrowable(t);
-            File f = File.createTempFile("mutation", "dat");
-
-            try (DataOutputStream out = new DataOutputStream(new FileOutputStream(f)))
-            {
-                out.write(inputBuffer, 0, size);
-            }
-
-            // Checksum passed so this error can't be permissible.
-            handleReplayError(false,
-                              "Unexpected error deserializing mutation; saved to %s.  " +
-                              "This may be caused by replaying a mutation against a table with the same name but incompatible schema.  " +
-                              "Exception follows: %s",
-                              f.getAbsolutePath(),
-                              t);
-            return;
-        }
-
-        if (logger.isTraceEnabled())
-            logger.trace("replaying mutation for {}.{}: {}", mutation.getKeyspaceName(), mutation.key(), "{" + StringUtils.join(mutation.getPartitionUpdates().iterator(), ", ") + "}");
-
-        Runnable runnable = new WrappedRunnable()
-        {
-            public void runMayThrow()
-            {
-                if (Schema.instance.getKSMetaData(mutation.getKeyspaceName()) == null)
-                    return;
-                if (pointInTimeExceeded(mutation))
-                    return;
-
-                final Keyspace keyspace = Keyspace.open(mutation.getKeyspaceName());
-
-                // Rebuild the mutation, omitting column families that
-                //    a) the user has requested that we ignore,
-                //    b) have already been flushed,
-                // or c) are part of a cf that was dropped.
-                // Keep in mind that the cf.name() is suspect. do every thing based on the cfid instead.
-                Mutation newMutation = null;
-                for (PartitionUpdate update : replayFilter.filter(mutation))
-                {
-                    if (Schema.instance.getCF(update.metadata().cfId) == null)
-                        continue; // dropped
-
-                    // replay if current segment is newer than last flushed one or,
-                    // if it is the last known segment, if we are after the replay position
-                    if (shouldReplay(update.metadata().cfId, new ReplayPosition(desc.id, entryLocation)))
-                    {
-                        if (newMutation == null)
-                            newMutation = new Mutation(mutation.getKeyspaceName(), mutation.key());
-                        newMutation.add(update);
-                        replayedCount.incrementAndGet();
-                    }
-                }
-                if (newMutation != null)
-                {
-                    assert !newMutation.isEmpty();
-
-                    try
-                    {
-                        Uninterruptibles.getUninterruptibly(Keyspace.open(newMutation.getKeyspaceName()).applyFromCommitLog(newMutation));
-                    }
-                    catch (ExecutionException e)
-                    {
-                        throw Throwables.propagate(e.getCause());
-                    }
-
-                    keyspacesRecovered.add(keyspace);
-                }
-            }
-        };
-        futures.add(StageManager.getStage(Stage.MUTATION).submit(runnable));
-        if (futures.size() > MAX_OUTSTANDING_REPLAY_COUNT)
-        {
-            FBUtilities.waitOnFutures(futures);
-            futures.clear();
-        }
-    }
-
     protected boolean pointInTimeExceeded(Mutation fm)
     {
         long restoreTarget = archiver.restorePointInTime;
@@ -644,21 +404,47 @@
         return false;
     }
 
-    static void handleReplayError(boolean permissible, String message, Object... messageArgs) throws IOException
+    public void handleMutation(Mutation m, int size, int entryLocation, CommitLogDescriptor desc)
     {
-        String msg = String.format(message, messageArgs);
-        IOException e = new CommitLogReplayException(msg);
-        if (permissible)
-            logger.error("Ignoring commit log replay error likely due to incomplete flush to disk", e);
+        pendingMutationBytes += size;
+        futures.offer(mutationInitiator.initiateMutation(m,
+                                                         desc.id,
+                                                         size,
+                                                         entryLocation,
+                                                         this));
+        // If there are finished mutations, or too many outstanding bytes/mutations
+        // drain the futures in the queue
+        while (futures.size() > MAX_OUTSTANDING_REPLAY_COUNT
+               || pendingMutationBytes > MAX_OUTSTANDING_REPLAY_BYTES
+               || (!futures.isEmpty() && futures.peek().isDone()))
+        {
+            pendingMutationBytes -= FBUtilities.waitOnFuture(futures.poll());
+        }
+    }
+
+    public boolean shouldSkipSegmentOnError(CommitLogReadException exception) throws IOException
+    {
+        if (exception.permissible)
+            logger.error("Ignoring commit log replay error likely due to incomplete flush to disk", exception);
         else if (Boolean.getBoolean(IGNORE_REPLAY_ERRORS_PROPERTY))
-            logger.error("Ignoring commit log replay error", e);
-        else if (!CommitLog.handleCommitError("Failed commit log replay", e))
+            logger.error("Ignoring commit log replay error", exception);
+        else if (!CommitLog.handleCommitError("Failed commit log replay", exception))
         {
             logger.error("Replay stopped. If you wish to override this error and continue starting the node ignoring " +
                          "commit log replay problems, specify -D" + IGNORE_REPLAY_ERRORS_PROPERTY + "=true " +
                          "on the command line");
-            throw e;
+            throw new CommitLogReplayException(exception.getMessage(), exception);
         }
+        return false;
+    }
+
+    /**
+     * The logic for whether or not we throw on an error is identical for the replayer, whether or not the error is recoverable.
+     */
+    public void handleUnrecoverableError(CommitLogReadException exception) throws IOException
+    {
+        // Don't care about return value, use this simply to throw exception as appropriate.
+        shouldSkipSegmentOnError(exception);
     }
 
     @SuppressWarnings("serial")
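
ReplayPositionFilter above keeps a NavigableMap of coalesced, start-exclusive/end-inclusive position ranges that are already persisted (covered by on-disk sstables or by a truncation record); a mutation is replayed only if its position falls outside every such range. Below is a small stand-alone sketch of the same interval-merge idea, using plain longs in place of CommitLogPositions; the class name is hypothetical and the code is not part of this patch.

import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Illustrative sketch only -- not part of this patch.
public class PersistedRangesSketch
{
    // Ranges are start-exclusive / end-inclusive, keyed by their start position.
    private final NavigableMap<Long, Long> persisted = new TreeMap<>();

    void add(long start, long end)
    {
        // A range starting at or before our end may extend past us; take the larger end.
        Map.Entry<Long, Long> extend = persisted.floorEntry(end);
        if (extend != null && extend.getValue() > end)
            end = extend.getValue();

        // A range starting before our start may reach into us; adopt its start.
        extend = persisted.lowerEntry(start);
        if (extend != null && extend.getValue() >= start)
            start = extend.getKey();

        // Drop every range we now cover and record the merged one.
        persisted.subMap(start, end).clear();
        persisted.put(start, end);
    }

    boolean shouldReplay(long position)
    {
        Map.Entry<Long, Long> range = persisted.lowerEntry(position);
        return range == null || position > range.getValue();
    }

    public static void main(String[] args)
    {
        PersistedRangesSketch filter = new PersistedRangesSketch();
        filter.add(0, 100);                          // e.g. covered by a flushed sstable
        filter.add(80, 150);                         // overlaps: coalesces into (0, 150]
        System.out.println(filter.shouldReplay(120)); // false - already persisted
        System.out.println(filter.shouldReplay(151)); // true  - past the persisted ranges
    }
}
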
diff --git a/src/java/org/apache/cassandra/db/commitlog/CommitLogSegment.java b/src/java/org/apache/cassandra/db/commitlog/CommitLogSegment.java
index 27c05b4..a1158be 100644
--- a/src/java/org/apache/cassandra/db/commitlog/CommitLogSegment.java
+++ b/src/java/org/apache/cassandra/db/commitlog/CommitLogSegment.java
@@ -22,32 +22,22 @@
 import java.nio.ByteBuffer;
 import java.nio.channels.FileChannel;
 import java.nio.file.StandardOpenOption;
-import java.util.ArrayList;
-import java.util.Collection;
-import java.util.Comparator;
-import java.util.Iterator;
-import java.util.List;
-import java.util.Map;
-import java.util.UUID;
+import java.util.*;
 import java.util.concurrent.ConcurrentHashMap;
 import java.util.concurrent.ConcurrentMap;
 import java.util.concurrent.atomic.AtomicInteger;
 import java.util.zip.CRC32;
 
-import com.codahale.metrics.Timer;
-
 import org.cliffc.high_scale_lib.NonBlockingHashMap;
-
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-import org.apache.cassandra.config.CFMetaData;
-import org.apache.cassandra.config.DatabaseDescriptor;
-import org.apache.cassandra.config.Schema;
+import com.codahale.metrics.Timer;
+import org.apache.cassandra.config.*;
 import org.apache.cassandra.db.Mutation;
+import org.apache.cassandra.db.commitlog.CommitLog.Configuration;
 import org.apache.cassandra.db.partitions.PartitionUpdate;
 import org.apache.cassandra.io.FSWriteError;
-import org.apache.cassandra.io.util.FileUtils;
 import org.apache.cassandra.utils.CLibrary;
 import org.apache.cassandra.utils.concurrent.OpOrder;
 import org.apache.cassandra.utils.concurrent.WaitQueue;
@@ -64,6 +54,15 @@
     private static final Logger logger = LoggerFactory.getLogger(CommitLogSegment.class);
 
     private final static long idBase;
+
+    private CDCState cdcState = CDCState.PERMITTED;
+    public enum CDCState {
+        PERMITTED,
+        FORBIDDEN,
+        CONTAINS
+    }
+    Object cdcStateLock = new Object();
+
     private final static AtomicInteger nextId = new AtomicInteger(1);
     private static long replayLimitId;
     static
@@ -113,15 +112,22 @@
     final FileChannel channel;
     final int fd;
 
+    protected final AbstractCommitLogSegmentManager manager;
+
     ByteBuffer buffer;
+    private volatile boolean headerWritten;
 
     final CommitLog commitLog;
     public final CommitLogDescriptor descriptor;
 
-    static CommitLogSegment createSegment(CommitLog commitLog, Runnable onClose)
+    static CommitLogSegment createSegment(CommitLog commitLog, AbstractCommitLogSegmentManager manager, Runnable onClose)
     {
-        return commitLog.configuration.useCompression() ? new CompressedSegment(commitLog, onClose)
-                                                        : new MemoryMappedSegment(commitLog);
+        Configuration config = commitLog.configuration;
+        CommitLogSegment segment = config.useEncryption() ? new EncryptedSegment(commitLog, manager, onClose)
+                                                          : config.useCompression() ? new CompressedSegment(commitLog, manager, onClose)
+                                                                                    : new MemoryMappedSegment(commitLog, manager);
+        segment.writeLogHeader();
+        return segment;
     }
 
     /**
@@ -132,7 +138,8 @@
      */
     static boolean usesBufferPool(CommitLog commitLog)
     {
-        return commitLog.configuration.useCompression();
+        Configuration config = commitLog.configuration;
+        return config.useEncryption() || config.useCompression();
     }
 
     static long getNextId()
@@ -142,15 +149,17 @@
 
     /**
      * Constructs a new segment file.
-     *
-     * @param filePath  if not null, recycles the existing file by renaming it and truncating it to CommitLog.SEGMENT_SIZE.
      */
-    CommitLogSegment(CommitLog commitLog)
+    CommitLogSegment(CommitLog commitLog, AbstractCommitLogSegmentManager manager)
     {
         this.commitLog = commitLog;
+        this.manager = manager;
+
         id = getNextId();
-        descriptor = new CommitLogDescriptor(id, commitLog.configuration.getCompressorClass());
-        logFile = new File(commitLog.location, descriptor.fileName());
+        descriptor = new CommitLogDescriptor(id,
+                                             commitLog.configuration.getCompressorClass(),
+                                             commitLog.configuration.getEncryptionContext());
+        logFile = new File(manager.storageDirectory, descriptor.fileName());
 
         try
         {
@@ -163,11 +172,26 @@
         }
 
         buffer = createBuffer(commitLog);
-        // write the header
-        CommitLogDescriptor.writeHeader(buffer, descriptor);
+    }
+
+    /**
+     * Deferred writing of the commit log header until subclasses have had a chance to initialize
+     */
+    void writeLogHeader()
+    {
+        CommitLogDescriptor.writeHeader(buffer, descriptor, additionalHeaderParameters());
         endOfBuffer = buffer.capacity();
         lastSyncedOffset = buffer.position();
         allocatePosition.set(lastSyncedOffset + SYNC_MARKER_SIZE);
+        headerWritten = true;
+    }
+
+    /**
+     * Provide any additional header data that should be stored in the {@link CommitLogDescriptor}.
+     */
+    protected Map<String, String> additionalHeaderParameters()
+    {
+        return Collections.<String, String>emptyMap();
     }
 
     abstract ByteBuffer createBuffer(CommitLog commitLog);
@@ -274,6 +298,8 @@
      */
     synchronized void sync()
     {
+        if (!headerWritten)
+            throw new IllegalStateException("commit log header has not been written");
         boolean close = false;
         // check we have more work to do
         if (allocatePosition.get() <= lastSyncedOffset + SYNC_MARKER_SIZE)
@@ -304,7 +330,7 @@
         waitForModifications();
         int sectionEnd = close ? endOfBuffer : nextMarker;
 
-        // Perform compression, writing to file and flush.
+        // Possibly perform compression or encryption, writing to file and flush.
         write(startMarker, sectionEnd);
 
         // Signal the sync as complete.
@@ -314,8 +340,20 @@
         syncComplete.signalAll();
     }
 
+    /**
+     * Create a sync marker to delineate sections of the commit log, typically created on each sync of the file.
+     * The sync marker consists of a file pointer to where the next sync marker should be (effectively declaring the length
+     * of this section), as well as a CRC value.
+     *
+     * @param buffer buffer in which to write out the sync marker.
+     * @param offset Offset into the {@code buffer} at which to write the sync marker.
+     * @param filePos The current position in the target file where the sync marker will be written (most likely different from the buffer position).
+     * @param nextMarker The file position of where the next sync marker should be.
+     */
     protected void writeSyncMarker(ByteBuffer buffer, int offset, int filePos, int nextMarker)
     {
+        if (filePos > nextMarker)
+            throw new IllegalArgumentException(String.format("commit log sync marker's current file position %d is greater than next file position %d", filePos, nextMarker));
         CRC32 crc = new CRC32();
         updateChecksumInt(crc, (int) (id & 0xFFFFFFFFL));
         updateChecksumInt(crc, (int) (id >>> 32));
@@ -332,22 +370,11 @@
     }
 
     /**
-     * Completely discards a segment file by deleting it. (Potentially blocking operation)
+     * @return the current CommitLogPosition for this log segment
      */
-    void discard(boolean deleteFile)
+    public CommitLogPosition getCurrentCommitLogPosition()
     {
-        close();
-        if (deleteFile)
-            FileUtils.deleteWithConfirm(logFile);
-        commitLog.allocator.addSize(-onDiskSize());
-    }
-
-    /**
-     * @return the current ReplayPosition for this log segment
-     */
-    public ReplayPosition getContext()
-    {
-        return new ReplayPosition(id, allocatePosition.get());
+        return new CommitLogPosition(id, allocatePosition.get());
     }
 
     /**
@@ -437,13 +464,13 @@
      * @param cfId    the column family ID that is now clean
      * @param context the optional clean offset
      */
-    public synchronized void markClean(UUID cfId, ReplayPosition context)
+    public synchronized void markClean(UUID cfId, CommitLogPosition context)
     {
         if (!cfDirty.containsKey(cfId))
             return;
-        if (context.segment == id)
+        if (context.segmentId == id)
             markClean(cfId, context.position);
-        else if (context.segment > id)
+        else if (context.segmentId > id)
             markClean(cfId, Integer.MAX_VALUE);
     }
 
@@ -528,14 +555,14 @@
     }
 
     /**
-     * Check to see if a certain ReplayPosition is contained by this segment file.
+     * Check to see if a certain CommitLogPosition is contained by this segment file.
      *
-     * @param   context the replay position to be checked
-     * @return  true if the replay position is contained by this segment file.
+     * @param   context the commit log segment position to be checked
+     * @return  true if the commit log segment position is contained by this segment file.
      */
-    public boolean contains(ReplayPosition context)
+    public boolean contains(CommitLogPosition context)
     {
-        return context.segment == id;
+        return context.segmentId == id;
     }
 
     // For debugging, not fast
@@ -573,14 +600,38 @@
         }
     }
 
+    public CDCState getCDCState()
+    {
+        return cdcState;
+    }
+
+    /**
+     * Change the current cdcState on this CommitLogSegment. There are some restrictions on state transitions and this
+     * method is idempotent.
+     */
+    public void setCDCState(CDCState newState)
+    {
+        if (newState == cdcState)
+            return;
+
+        // Also synchronized in CDCSizeTracker.processNewSegment and .processDiscardedSegment
+        synchronized(cdcStateLock)
+        {
+            if (cdcState == CDCState.CONTAINS && newState != CDCState.CONTAINS)
+                throw new IllegalArgumentException("Cannot transition from CONTAINS to any other state.");
+
+            if (cdcState == CDCState.FORBIDDEN && newState != CDCState.PERMITTED)
+                throw new IllegalArgumentException("Only transition from FORBIDDEN to PERMITTED is allowed.");
+
+            cdcState = newState;
+        }
+    }
+
     /**
      * A simple class for tracking information about the portion of a segment that has been allocated to a log write.
-     * The constructor leaves the fields uninitialized for population by CommitlogManager, so that it can be
-     * stack-allocated by escape analysis in CommitLog.add.
      */
-    static class Allocation
+    protected static class Allocation
     {
-
         private final CommitLogSegment segment;
         private final OpOrder.Group appendOp;
         private final int position;
@@ -616,10 +667,9 @@
             segment.waitForSync(position, waitingOnCommit);
         }
 
-        public ReplayPosition getReplayPosition()
+        public CommitLogPosition getCommitLogPosition()
         {
-            return new ReplayPosition(segment.id, buffer.limit());
+            return new CommitLogPosition(segment.id, buffer.limit());
         }
-
     }
 }
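
setCDCState above enforces a small state machine on each segment: CONTAINS is terminal until the segment is discarded, FORBIDDEN may only transition back to PERMITTED, and re-setting the current state is a no-op. The following stand-alone sketch restates those transition rules under that assumption; the class is hypothetical and not part of this patch.

// Illustrative sketch only -- not part of this patch.
public class CdcStateSketch
{
    enum CDCState { PERMITTED, FORBIDDEN, CONTAINS }

    private CDCState state = CDCState.PERMITTED;
    private final Object lock = new Object();

    void setState(CDCState newState)
    {
        if (newState == state)
            return;                      // idempotent: re-setting the current state is a no-op

        synchronized (lock)
        {
            // Once a segment contains CDC data it stays that way until discarded.
            if (state == CDCState.CONTAINS && newState != CDCState.CONTAINS)
                throw new IllegalArgumentException("Cannot transition from CONTAINS to any other state.");

            // A segment rejected for CDC writes can only be re-opened for them.
            if (state == CDCState.FORBIDDEN && newState != CDCState.PERMITTED)
                throw new IllegalArgumentException("Only transition from FORBIDDEN to PERMITTED is allowed.");

            state = newState;
        }
    }

    public static void main(String[] args)
    {
        CdcStateSketch segment = new CdcStateSketch();
        segment.setState(CDCState.FORBIDDEN);   // ok: PERMITTED -> FORBIDDEN
        segment.setState(CDCState.PERMITTED);   // ok: FORBIDDEN -> PERMITTED
        segment.setState(CDCState.CONTAINS);    // ok: PERMITTED -> CONTAINS
        try { segment.setState(CDCState.PERMITTED); }
        catch (IllegalArgumentException expected) { System.out.println(expected.getMessage()); }
    }
}
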
diff --git a/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerCDC.java b/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerCDC.java
new file mode 100644
index 0000000..04beb20
--- /dev/null
+++ b/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerCDC.java
@@ -0,0 +1,302 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.db.commitlog;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.Files;
+import java.util.concurrent.*;
+import java.util.concurrent.atomic.AtomicLong;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.util.concurrent.RateLimiter;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.db.*;
+import org.apache.cassandra.db.commitlog.CommitLogSegment.CDCState;
+import org.apache.cassandra.exceptions.WriteTimeoutException;
+import org.apache.cassandra.io.util.FileUtils;
+import org.apache.cassandra.utils.DirectorySizeCalculator;
+
+public class CommitLogSegmentManagerCDC extends AbstractCommitLogSegmentManager
+{
+    static final Logger logger = LoggerFactory.getLogger(CommitLogSegmentManagerCDC.class);
+    private final CDCSizeTracker cdcSizeTracker;
+
+    public CommitLogSegmentManagerCDC(final CommitLog commitLog, String storageDirectory)
+    {
+        super(commitLog, storageDirectory);
+        cdcSizeTracker = new CDCSizeTracker(this, new File(DatabaseDescriptor.getCDCLogLocation()));
+    }
+
+    @Override
+    void start()
+    {
+        super.start();
+        cdcSizeTracker.start();
+    }
+
+    public void discard(CommitLogSegment segment, boolean delete)
+    {
+        segment.close();
+        addSize(-segment.onDiskSize());
+
+        cdcSizeTracker.processDiscardedSegment(segment);
+
+        if (segment.getCDCState() == CDCState.CONTAINS)
+            FileUtils.renameWithConfirm(segment.logFile.getAbsolutePath(), DatabaseDescriptor.getCDCLogLocation() + File.separator + segment.logFile.getName());
+        else
+        {
+            if (delete)
+                FileUtils.deleteWithConfirm(segment.logFile);
+        }
+    }
+
+    /**
+     * Initiates the shutdown process for the management thread. Also stops the cdc on-disk size calculator executor.
+     */
+    public void shutdown()
+    {
+        run = false;
+        cdcSizeTracker.shutdown();
+        wakeManager();
+    }
+
+    /**
+     * Reserve space in the current segment for the provided mutation or, if there isn't space available,
+     * create a new segment. For CDC mutations, allocation is expected to throw WTE if the segment disallows CDC mutations.
+     *
+     * @param mutation Mutation to allocate in segment manager
+     * @param size total size (overhead + serialized) of mutation
+     * @return the created Allocation object
+     * @throws WriteTimeoutException If segment disallows CDC mutations, we throw WTE
+     */
+    @Override
+    public CommitLogSegment.Allocation allocate(Mutation mutation, int size) throws WriteTimeoutException
+    {
+        CommitLogSegment segment = allocatingFrom();
+        CommitLogSegment.Allocation alloc;
+
+        throwIfForbidden(mutation, segment);
+        while ( null == (alloc = segment.allocate(mutation, size)) )
+        {
+            // Failed to allocate, so move to a new segment with enough room if possible.
+            advanceAllocatingFrom(segment);
+            segment = allocatingFrom;
+
+            throwIfForbidden(mutation, segment);
+        }
+
+        if (mutation.trackedByCDC())
+            segment.setCDCState(CDCState.CONTAINS);
+
+        return alloc;
+    }
+
+    private void throwIfForbidden(Mutation mutation, CommitLogSegment segment) throws WriteTimeoutException
+    {
+        if (mutation.trackedByCDC() && segment.getCDCState() == CDCState.FORBIDDEN)
+        {
+            cdcSizeTracker.submitOverflowSizeRecalculation();
+            throw new WriteTimeoutException(WriteType.CDC, ConsistencyLevel.LOCAL_ONE, 0, 1);
+        }
+    }
+
+    /**
+     * Move files to cdc_raw after replay, since recovery will flush to SSTable and these mutations won't be available
+     * in the CL subsystem otherwise.
+     */
+    void handleReplayedSegment(final File file)
+    {
+        logger.trace("Moving (Unopened) segment {} to cdc_raw directory after replay", file);
+        FileUtils.renameWithConfirm(file.getAbsolutePath(), DatabaseDescriptor.getCDCLogLocation() + File.separator + file.getName());
+        cdcSizeTracker.addFlushedSize(file.length());
+    }
+
+    /**
+     * On segment creation, flag whether the segment should accept CDC mutations based on the total size of currently
+     * allocated unflushed CDC segments and the contents of cdc_raw.
+     */
+    public CommitLogSegment createSegment()
+    {
+        CommitLogSegment segment = CommitLogSegment.createSegment(commitLog, this, () -> wakeManager());
+        cdcSizeTracker.processNewSegment(segment);
+        return segment;
+    }
+
+    /**
+     * Tracks the total disk usage of the CDC subsystem, defined as the sum of the sizes of all unflushed CommitLogSegments
+     * containing CDC data plus all segments archived into cdc_raw.
+     *
+     * Allows atomic increment/decrement of the unflushed size; the flushed size, however, can only be incremented, and a
+     * full directory walk is required to account for any deletions made by the CDC consumer.
+     *
+     * TODO: Linux performs approximately 25% better with the following one-liner instead of this walker:
+     *      Arrays.stream(path.listFiles()).mapToLong(File::length).sum();
+     * However, that approach is 375% slower on Windows. Revisit this and split the logic per OS.
+     */
+    private class CDCSizeTracker extends DirectorySizeCalculator
+    {
+        // Use floating point division so disk check intervals above 1000ms still yield a positive permit rate.
+        private final RateLimiter rateLimiter = RateLimiter.create(1000.0 / DatabaseDescriptor.getCDCDiskCheckInterval());
+        private ExecutorService cdcSizeCalculationExecutor;
+        private CommitLogSegmentManagerCDC segmentManager;
+        private AtomicLong unflushedCDCSize = new AtomicLong(0);
+
+        CDCSizeTracker(CommitLogSegmentManagerCDC segmentManager, File path)
+        {
+            super(path);
+            this.segmentManager = segmentManager;
+        }
+
+        /**
+         * Needed for stop/restart during unit tests
+         */
+        public void start()
+        {
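+            // A single-threaded executor over a SynchronousQueue with DiscardPolicy: at most one size recalculation
+            // runs at a time, and any submission made while one is already in flight is silently dropped.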
+            cdcSizeCalculationExecutor = new ThreadPoolExecutor(1, 1, 1000, TimeUnit.SECONDS, new SynchronousQueue<>(), new ThreadPoolExecutor.DiscardPolicy());
+        }
+
+        /**
+         * Synchronous size recalculation on every segment creation/deletion would lead to long delays in new
+         * segment allocation, and thus long delays in signaling the waiting allocation / writer threads.
+         *
+         * This can be reached either from the segment management thread in AbstractCommitLogSegmentManager or from the
+         * size recalculation executor, so we synchronize on the segment's cdcStateLock to narrow the window in which
+         * the tracked size can drift.
+         *
+         * Reference DirectorySizerBench for more information about the performance of the directory size recalculation.
+         */
+        void processNewSegment(CommitLogSegment segment)
+        {
+            // See synchronization in CommitLogSegment.setCDCState
+            synchronized(segment.cdcStateLock)
+            {
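+                // Forbid CDC writes in this segment if accepting a full segment's worth of CDC data would push the
+                // total CDC disk usage (unflushed segments plus cdc_raw contents) over the configured CDC space limit.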
+                segment.setCDCState(defaultSegmentSize() + totalCDCSizeOnDisk() > allowableCDCBytes()
+                                    ? CDCState.FORBIDDEN
+                                    : CDCState.PERMITTED);
+                if (segment.getCDCState() == CDCState.PERMITTED)
+                    unflushedCDCSize.addAndGet(defaultSegmentSize());
+            }
+
+            // Take this opportunity to kick off a recalc to pick up any consumer file deletion.
+            submitOverflowSizeRecalculation();
+        }
+
+        void processDiscardedSegment(CommitLogSegment segment)
+        {
+            // See synchronization in CommitLogSegment.setCDCState
+            synchronized(segment.cdcStateLock)
+            {
+                // Add to the flushed size before decrementing the unflushed size so there is no window in which the total under-counts.
+                if (segment.getCDCState() == CDCState.CONTAINS)
+                    size.addAndGet(segment.onDiskSize());
+                if (segment.getCDCState() != CDCState.FORBIDDEN)
+                    unflushedCDCSize.addAndGet(-defaultSegmentSize());
+            }
+
+            // Take this opportunity to kick off a recalc to pick up any consumer file deletion.
+            submitOverflowSizeRecalculation();
+        }
+
+        private long allowableCDCBytes()
+        {
+            return (long)DatabaseDescriptor.getCDCSpaceInMB() * 1024 * 1024;
+        }
+
+        public void submitOverflowSizeRecalculation()
+        {
+            try
+            {
+                cdcSizeCalculationExecutor.submit(() -> recalculateOverflowSize());
+            }
+            catch (RejectedExecutionException e)
+            {
+                // Do nothing. A recalculation is already in flight, so this request will be satisfied when it completes.
+            }
+        }
+
+        private void recalculateOverflowSize()
+        {
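+            // Rate-limited re-walk of cdc_raw; if the segment currently being allocated from was marked FORBIDDEN,
+            // re-evaluate it in case the CDC consumer has freed enough space.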
+            rateLimiter.acquire();
+            calculateSize();
+            CommitLogSegment allocatingFrom = segmentManager.allocatingFrom;
+            if (allocatingFrom.getCDCState() == CDCState.FORBIDDEN)
+                processNewSegment(allocatingFrom);
+        }
+
+        private int defaultSegmentSize()
+        {
+            return DatabaseDescriptor.getCommitLogSegmentSize();
+        }
+
+        private void calculateSize()
+        {
+            try
+            {
+                // Since we don't synchronize around either rebuilding our file list or walking the tree and adding to
+                // size, it's possible we could have changes take place underneath us and end up with a slightly incorrect
+                // view of our flushed size by the time the walk completes. Given that runtime is split roughly evenly
+                // between rebuildFileList and walkFileTree (about 50% each), and that the window for this race should
+                // be very small, this is an acceptable trade-off since it will be resolved
+                // on the next segment creation / deletion with a subsequent call to submitOverflowSizeRecalculation.
+                rebuildFileList();
+                Files.walkFileTree(path.toPath(), this);
+            }
+            catch (IOException ie)
+            {
+                CommitLog.instance.handleCommitError("Failed CDC Size Calculation", ie);
+            }
+        }
+
+        private long addFlushedSize(long toAdd)
+        {
+            return size.addAndGet(toAdd);
+        }
+
+        private long totalCDCSizeOnDisk()
+        {
+            return unflushedCDCSize.get() + size.get();
+        }
+
+        public void shutdown()
+        {
+            cdcSizeCalculationExecutor.shutdown();
+        }
+    }
+
+    /**
+     * Only use for testing / validation that size tracker is working. Not for production use.
+     */
+    @VisibleForTesting
+    public long updateCDCTotalSize()
+    {
+        cdcSizeTracker.submitOverflowSizeRecalculation();
+
+        // Give the update time to run
+        try
+        {
+            Thread.sleep(DatabaseDescriptor.getCDCDiskCheckInterval() + 10);
+        }
+        catch (InterruptedException e) {}
+
+        return cdcSizeTracker.totalCDCSizeOnDisk();
+    }
+}
diff --git a/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerStandard.java b/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerStandard.java
new file mode 100644
index 0000000..333077c
--- /dev/null
+++ b/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerStandard.java
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.db.commitlog;
+
+import java.io.File;
+
+import org.apache.cassandra.db.Mutation;
+import org.apache.cassandra.io.util.FileUtils;
+
+public class CommitLogSegmentManagerStandard extends AbstractCommitLogSegmentManager
+{
+    public CommitLogSegmentManagerStandard(final CommitLog commitLog, String storageDirectory)
+    {
+        super(commitLog, storageDirectory);
+    }
+
+    public void discard(CommitLogSegment segment, boolean delete)
+    {
+        segment.close();
+        if (delete)
+            FileUtils.deleteWithConfirm(segment.logFile);
+        addSize(-segment.onDiskSize());
+    }
+
+    /**
+     * Initiates the shutdown process for the management thread.
+     */
+    public void shutdown()
+    {
+        run = false;
+        wakeManager();
+    }
+
+    /**
+     * Reserve space in the current segment for the provided mutation or, if there isn't space available,
+     * create a new segment. allocate() blocks until allocation succeeds, as it waits on a signal in advanceAllocatingFrom.
+     *
+     * @param mutation mutation to allocate space for
+     * @param size total size of mutation (overhead + serialized size)
+     * @return the created Allocation object
+     */
+    public CommitLogSegment.Allocation allocate(Mutation mutation, int size)
+    {
+        CommitLogSegment segment = allocatingFrom();
+
+        CommitLogSegment.Allocation alloc;
+        while ( null == (alloc = segment.allocate(mutation, size)) )
+        {
+            // failed to allocate, so move to a new segment with enough room
+            advanceAllocatingFrom(segment);
+            segment = allocatingFrom;
+        }
+
+        return alloc;
+    }
+
+    /**
+     * Simply delete untracked segment files with the standard manager, as their contents will have been flushed to sstables during recovery.
+     *
+     * @param file segment file that is no longer in use.
+     */
+    void handleReplayedSegment(final File file)
+    {
+        // (don't decrease managed size, since this was never a "live" segment)
+        logger.trace("(Unopened) segment {} is no longer needed and will be deleted now", file);
+        FileUtils.deleteWithConfirm(file);
+    }
+
+    public CommitLogSegment createSegment()
+    {
+        return CommitLogSegment.createSegment(commitLog, this, () -> wakeManager());
+    }
+}
diff --git a/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentReader.java b/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentReader.java
new file mode 100644
index 0000000..b547131
--- /dev/null
+++ b/src/java/org/apache/cassandra/db/commitlog/CommitLogSegmentReader.java
@@ -0,0 +1,366 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.db.commitlog;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Iterator;
+import java.util.zip.CRC32;
+import javax.crypto.Cipher;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.collect.AbstractIterator;
+
+import org.apache.cassandra.db.commitlog.EncryptedFileSegmentInputStream.ChunkProvider;
+import org.apache.cassandra.db.commitlog.CommitLogReadHandler.*;
+import org.apache.cassandra.io.FSReadError;
+import org.apache.cassandra.io.compress.ICompressor;
+import org.apache.cassandra.io.util.FileDataInput;
+import org.apache.cassandra.io.util.FileSegmentInputStream;
+import org.apache.cassandra.io.util.RandomAccessReader;
+import org.apache.cassandra.schema.CompressionParams;
+import org.apache.cassandra.security.EncryptionUtils;
+import org.apache.cassandra.security.EncryptionContext;
+import org.apache.cassandra.utils.ByteBufferUtil;
+
+import static org.apache.cassandra.db.commitlog.CommitLogSegment.SYNC_MARKER_SIZE;
+import static org.apache.cassandra.utils.FBUtilities.updateChecksumInt;
+
+/**
+ * Read each sync section of a commit log, iteratively.
+ */
+public class CommitLogSegmentReader implements Iterable<CommitLogSegmentReader.SyncSegment>
+{
+    private final CommitLogReadHandler handler;
+    private final CommitLogDescriptor descriptor;
+    private final RandomAccessReader reader;
+    private final Segmenter segmenter;
+    private final boolean tolerateTruncation;
+
+    /**
+     * ending position of the current sync section.
+     */
+    protected int end;
+
+    protected CommitLogSegmentReader(CommitLogReadHandler handler,
+                                     CommitLogDescriptor descriptor,
+                                     RandomAccessReader reader,
+                                     boolean tolerateTruncation)
+    {
+        this.handler = handler;
+        this.descriptor = descriptor;
+        this.reader = reader;
+        this.tolerateTruncation = tolerateTruncation;
+
+        end = (int) reader.getFilePointer();
+        if (descriptor.getEncryptionContext().isEnabled())
+            segmenter = new EncryptedSegmenter(descriptor, reader);
+        else if (descriptor.compression != null)
+            segmenter = new CompressedSegmenter(descriptor, reader);
+        else
+            segmenter = new NoOpSegmenter(reader);
+    }
+
+    public Iterator<SyncSegment> iterator()
+    {
+        return new SegmentIterator();
+    }
+
+    protected class SegmentIterator extends AbstractIterator<CommitLogSegmentReader.SyncSegment>
+    {
+        protected SyncSegment computeNext()
+        {
+            while (true)
+            {
+                try
+                {
+                    final int currentStart = end;
+                    end = readSyncMarker(descriptor, currentStart, reader);
+                    if (end == -1)
+                    {
+                        return endOfData();
+                    }
+                    if (end > reader.length())
+                    {
+                        // the CRC was good (meaning it was good when it was written and still looks legit), but the file is truncated now.
+                        // try to grab and use as much of the file as possible, which might be nothing if the end of the file truly is corrupt
+                        end = (int) reader.length();
+                    }
+                    return segmenter.nextSegment(currentStart + SYNC_MARKER_SIZE, end);
+                }
+                catch(CommitLogSegmentReader.SegmentReadException e)
+                {
+                    try
+                    {
+                        handler.handleUnrecoverableError(new CommitLogReadException(
+                                                    e.getMessage(),
+                                                    CommitLogReadErrorReason.UNRECOVERABLE_DESCRIPTOR_ERROR,
+                                                    !e.invalidCrc && tolerateTruncation));
+                    }
+                    catch (IOException ioe)
+                    {
+                        throw new RuntimeException(ioe);
+                    }
+                }
+                catch (IOException e)
+                {
+                    try
+                    {
+                        boolean tolerateErrorsInSection = tolerateTruncation & segmenter.tolerateSegmentErrors(end, reader.length());
+                        // if no exception is thrown, the while loop will continue
+                        handler.handleUnrecoverableError(new CommitLogReadException(
+                                                    e.getMessage(),
+                                                    CommitLogReadErrorReason.UNRECOVERABLE_DESCRIPTOR_ERROR,
+                                                    tolerateErrorsInSection));
+                    }
+                    catch (IOException ioe)
+                    {
+                        throw new RuntimeException(ioe);
+                    }
+                }
+            }
+        }
+    }
+
+    private int readSyncMarker(CommitLogDescriptor descriptor, int offset, RandomAccessReader reader) throws IOException
+    {
+        if (offset > reader.length() - SYNC_MARKER_SIZE)
+        {
+            // There was no room in the segment to write a final header. No data could be present here.
+            return -1;
+        }
+        reader.seek(offset);
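+        // The sync marker CRC covers the low and high 32 bits of the segment id plus the marker's position in the
+        // file, so a marker copied from another position or segment will not validate.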
+        CRC32 crc = new CRC32();
+        updateChecksumInt(crc, (int) (descriptor.id & 0xFFFFFFFFL));
+        updateChecksumInt(crc, (int) (descriptor.id >>> 32));
+        updateChecksumInt(crc, (int) reader.getPosition());
+        final int end = reader.readInt();
+        long filecrc = reader.readInt() & 0xffffffffL;
+        if (crc.getValue() != filecrc)
+        {
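+            // An all-zero marker (end == 0 and crc == 0) means no further sync marker was ever written, i.e. a clean
+            // end of segment; any other mismatch indicates a corrupt header.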
+            if (end != 0 || filecrc != 0)
+            {
+                String msg = String.format("Encountered bad header at position %d of commit log %s, with invalid CRC. " +
+                             "The end of segment marker should be zero.", offset, reader.getPath());
+                throw new SegmentReadException(msg, true);
+            }
+            return -1;
+        }
+        else if (end < offset || end > reader.length())
+        {
+            String msg = String.format("Encountered bad header at position %d of commit log %s, with bad position but valid CRC", offset, reader.getPath());
+            throw new SegmentReadException(msg, false);
+        }
+        return end;
+    }
+
+    public static class SegmentReadException extends IOException
+    {
+        public final boolean invalidCrc;
+
+        public SegmentReadException(String msg, boolean invalidCrc)
+        {
+            super(msg);
+            this.invalidCrc = invalidCrc;
+        }
+    }
+
+    public static class SyncSegment
+    {
+        /** the 'buffer' to replay commit log data from */
+        public final FileDataInput input;
+
+        /** offset in file where this section begins. */
+        public final int fileStartPosition;
+
+        /** offset in file where this section ends. */
+        public final int fileEndPosition;
+
+        /** the logical ending position of the buffer */
+        public final int endPosition;
+
+        public final boolean toleratesErrorsInSection;
+
+        public SyncSegment(FileDataInput input, int fileStartPosition, int fileEndPosition, int endPosition, boolean toleratesErrorsInSection)
+        {
+            this.input = input;
+            this.fileStartPosition = fileStartPosition;
+            this.fileEndPosition = fileEndPosition;
+            this.endPosition = endPosition;
+            this.toleratesErrorsInSection = toleratesErrorsInSection;
+        }
+    }
+
+    /**
+     * Derives the next section of the commit log to be replayed. Section boundaries are derived from the commit log sync markers.
+     */
+    interface Segmenter
+    {
+        /**
+         * Get the next section of the commit log to replay.
+         *
+         * @param startPosition the position in the file to begin reading at
+         * @param nextSectionStartPosition the file position of the beginning of the next section
+         * @return the buffer and its logical end position
+         * @throws IOException if an I/O error occurs while reading the section
+         */
+        SyncSegment nextSegment(int startPosition, int nextSectionStartPosition) throws IOException;
+
+        /**
+         * Determine if we tolerate errors in the current segment.
+         */
+        default boolean tolerateSegmentErrors(int segmentEndPosition, long fileLength)
+        {
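+            // Errors are tolerated only when the section reaches or claims to extend past the end of the file (or the
+            // end position has overflowed), i.e. in the final, possibly truncated section.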
+            return segmentEndPosition >= fileLength || segmentEndPosition < 0;
+        }
+    }
+
+    static class NoOpSegmenter implements Segmenter
+    {
+        private final RandomAccessReader reader;
+
+        public NoOpSegmenter(RandomAccessReader reader)
+        {
+            this.reader = reader;
+        }
+
+        public SyncSegment nextSegment(int startPosition, int nextSectionStartPosition)
+        {
+            reader.seek(startPosition);
+            return new SyncSegment(reader, startPosition, nextSectionStartPosition, nextSectionStartPosition, true);
+        }
+
+        public boolean tolerateSegmentErrors(int end, long length)
+        {
+            return true;
+        }
+    }
+
+    static class CompressedSegmenter implements Segmenter
+    {
+        private final ICompressor compressor;
+        private final RandomAccessReader reader;
+        private byte[] compressedBuffer;
+        private byte[] uncompressedBuffer;
+        private long nextLogicalStart;
+
+        public CompressedSegmenter(CommitLogDescriptor desc, RandomAccessReader reader)
+        {
+            this(CompressionParams.createCompressor(desc.compression), reader);
+        }
+
+        public CompressedSegmenter(ICompressor compressor, RandomAccessReader reader)
+        {
+            this.compressor = compressor;
+            this.reader = reader;
+            compressedBuffer = new byte[0];
+            uncompressedBuffer = new byte[0];
+            nextLogicalStart = reader.getFilePointer();
+        }
+
+        public SyncSegment nextSegment(final int startPosition, final int nextSectionStartPosition) throws IOException
+        {
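+            // Each compressed section begins with the uncompressed (plain text) length, followed by a compressed block
+            // filling the rest of the section; buffers are grown with ~20% headroom to limit reallocations.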
+            reader.seek(startPosition);
+            int uncompressedLength = reader.readInt();
+
+            int compressedLength = nextSectionStartPosition - (int)reader.getPosition();
+            if (compressedLength > compressedBuffer.length)
+                compressedBuffer = new byte[(int) (1.2 * compressedLength)];
+            reader.readFully(compressedBuffer, 0, compressedLength);
+
+            if (uncompressedLength > uncompressedBuffer.length)
+               uncompressedBuffer = new byte[(int) (1.2 * uncompressedLength)];
+            int count = compressor.uncompress(compressedBuffer, 0, compressedLength, uncompressedBuffer, 0);
+            nextLogicalStart += SYNC_MARKER_SIZE;
+            FileDataInput input = new FileSegmentInputStream(ByteBuffer.wrap(uncompressedBuffer, 0, count), reader.getPath(), nextLogicalStart);
+            nextLogicalStart += uncompressedLength;
+            return new SyncSegment(input, startPosition, nextSectionStartPosition, (int)nextLogicalStart, tolerateSegmentErrors(nextSectionStartPosition, reader.length()));
+        }
+    }
+
+    static class EncryptedSegmenter implements Segmenter
+    {
+        private final RandomAccessReader reader;
+        private final ICompressor compressor;
+        private final Cipher cipher;
+
+        /**
+         * the result of the decryption is written into this buffer.
+         */
+        private ByteBuffer decryptedBuffer;
+
+        /**
+         * the result of the decryption is written into this buffer.
+         */
+        private ByteBuffer uncompressedBuffer;
+
+        private final ChunkProvider chunkProvider;
+
+        private long currentSegmentEndPosition;
+        private long nextLogicalStart;
+
+        public EncryptedSegmenter(CommitLogDescriptor descriptor, RandomAccessReader reader)
+        {
+            this(reader, descriptor.getEncryptionContext());
+        }
+
+        @VisibleForTesting
+        EncryptedSegmenter(final RandomAccessReader reader, EncryptionContext encryptionContext)
+        {
+            this.reader = reader;
+            decryptedBuffer = ByteBuffer.allocate(0);
+            compressor = encryptionContext.getCompressor();
+            nextLogicalStart = reader.getFilePointer();
+
+            try
+            {
+                cipher = encryptionContext.getDecryptor();
+            }
+            catch (IOException ioe)
+            {
+                throw new FSReadError(ioe, reader.getPath());
+            }
+
+            chunkProvider = () -> {
+                if (reader.getFilePointer() >= currentSegmentEndPosition)
+                    return ByteBufferUtil.EMPTY_BYTE_BUFFER;
+                try
+                {
+                    decryptedBuffer = EncryptionUtils.decrypt(reader, decryptedBuffer, true, cipher);
+                    uncompressedBuffer = EncryptionUtils.uncompress(decryptedBuffer, uncompressedBuffer, true, compressor);
+                    return uncompressedBuffer;
+                }
+                catch (IOException e)
+                {
+                    throw new FSReadError(e, reader.getPath());
+                }
+            };
+        }
+
+        public SyncSegment nextSegment(int startPosition, int nextSectionStartPosition) throws IOException
+        {
+            int totalPlainTextLength = reader.readInt();
+            currentSegmentEndPosition = nextSectionStartPosition - 1;
+
+            nextLogicalStart += SYNC_MARKER_SIZE;
+            FileDataInput input = new EncryptedFileSegmentInputStream(reader.getPath(), nextLogicalStart, 0, totalPlainTextLength, chunkProvider);
+            nextLogicalStart += totalPlainTextLength;
+            return new SyncSegment(input, startPosition, nextSectionStartPosition, (int)nextLogicalStart, tolerateSegmentErrors(nextSectionStartPosition, reader.length()));
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/db/commitlog/CompressedSegment.java b/src/java/org/apache/cassandra/db/commitlog/CompressedSegment.java
index c73a30a..e44dfdf 100644
--- a/src/java/org/apache/cassandra/db/commitlog/CompressedSegment.java
+++ b/src/java/org/apache/cassandra/db/commitlog/CompressedSegment.java
@@ -17,69 +17,37 @@
  */
 package org.apache.cassandra.db.commitlog;
 
-import java.io.IOException;
 import java.nio.ByteBuffer;
-import java.util.Queue;
-import java.util.concurrent.ConcurrentLinkedQueue;
-import java.util.concurrent.atomic.AtomicInteger;
 
-import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.io.FSWriteError;
 import org.apache.cassandra.io.compress.BufferType;
 import org.apache.cassandra.io.compress.ICompressor;
 import org.apache.cassandra.io.util.FileUtils;
 import org.apache.cassandra.utils.SyncUtil;
 
-/*
+/**
  * Compressed commit log segment. Provides an in-memory buffer for the mutation threads. On sync compresses the written
  * section of the buffer and writes it to the destination channel.
+ *
+ * The format of the compressed commit log is as follows:
+ * - standard commit log header (as written by {@link CommitLogDescriptor#writeHeader(ByteBuffer, CommitLogDescriptor)})
+ * - a series of 'sync segments' that are written every time the commit log is sync()'ed
+ * -- a sync section header, see {@link CommitLogSegment#writeSyncMarker(ByteBuffer, int, int, int)}
+ * -- total plain text length for this section
+ * -- a block of compressed data
  */
-public class CompressedSegment extends CommitLogSegment
+public class CompressedSegment extends FileDirectSegment
 {
-    private static final ThreadLocal<ByteBuffer> compressedBufferHolder = new ThreadLocal<ByteBuffer>() {
-        protected ByteBuffer initialValue()
-        {
-            return ByteBuffer.allocate(0);
-        }
-    };
-    static Queue<ByteBuffer> bufferPool = new ConcurrentLinkedQueue<>();
-
-    /**
-     * The number of buffers in use
-     */
-    private static AtomicInteger usedBuffers = new AtomicInteger(0);
-
-
-    /**
-     * Maximum number of buffers in the compression pool. The default value is 3, it should not be set lower than that
-     * (one segment in compression, one written to, one in reserve); delays in compression may cause the log to use
-     * more, depending on how soon the sync policy stops all writing threads.
-     */
-    static final int MAX_BUFFERPOOL_SIZE = DatabaseDescriptor.getCommitLogMaxCompressionBuffersInPool();
-
     static final int COMPRESSED_MARKER_SIZE = SYNC_MARKER_SIZE + 4;
     final ICompressor compressor;
-    final Runnable onClose;
-
-    volatile long lastWrittenPos = 0;
 
     /**
      * Constructs a new segment file.
      */
-    CompressedSegment(CommitLog commitLog, Runnable onClose)
+    CompressedSegment(CommitLog commitLog, AbstractCommitLogSegmentManager manager, Runnable onClose)
     {
-        super(commitLog);
+        super(commitLog, manager, onClose);
         this.compressor = commitLog.configuration.getCompressor();
-        this.onClose = onClose;
-        try
-        {
-            channel.write((ByteBuffer) buffer.duplicate().flip());
-            commitLog.allocator.addSize(lastWrittenPos = buffer.position());
-        }
-        catch (IOException e)
-        {
-            throw new FSWriteError(e, getPath());
-        }
     }
 
     ByteBuffer allocate(int size)
@@ -89,21 +57,9 @@
 
     ByteBuffer createBuffer(CommitLog commitLog)
     {
-        usedBuffers.incrementAndGet();
-        ByteBuffer buf = bufferPool.poll();
-        if (buf == null)
-        {
-            // this.compressor is not yet set, so we must use the commitLog's one.
-            buf = commitLog.configuration.getCompressor()
-                                         .preferredBufferType()
-                                         .allocate(DatabaseDescriptor.getCommitLogSegmentSize());
-        } else
-            buf.clear();
-        return buf;
+        return manager.getBufferPool().createBuffer(commitLog.configuration.getCompressor().preferredBufferType());
     }
 
-    static long startMillis = System.currentTimeMillis();
-
     @Override
     void write(int startMarker, int nextMarker)
     {
@@ -115,13 +71,13 @@
         try
         {
             int neededBufferSize = compressor.initialCompressedBufferLength(length) + COMPRESSED_MARKER_SIZE;
-            ByteBuffer compressedBuffer = compressedBufferHolder.get();
+            ByteBuffer compressedBuffer = manager.getBufferPool().getThreadLocalReusableBuffer();
             if (compressor.preferredBufferType() != BufferType.typeOf(compressedBuffer) ||
                 compressedBuffer.capacity() < neededBufferSize)
             {
                 FileUtils.clean(compressedBuffer);
                 compressedBuffer = allocate(neededBufferSize);
-                compressedBufferHolder.set(compressedBuffer);
+                manager.getBufferPool().setThreadLocalReusableBuffer(compressedBuffer);
             }
 
             ByteBuffer inputBuffer = buffer.duplicate();
@@ -135,7 +91,7 @@
             // Only one thread can be here at a given time.
             // Protected by synchronization on CommitLogSegment.sync().
             writeSyncMarker(compressedBuffer, 0, (int) channel.position(), (int) channel.position() + compressedBuffer.remaining());
-            commitLog.allocator.addSize(compressedBuffer.limit());
+            manager.addSize(compressedBuffer.limit());
             channel.write(compressedBuffer);
             assert channel.position() - lastWrittenPos == compressedBuffer.limit();
             lastWrittenPos = channel.position();
@@ -148,39 +104,6 @@
     }
 
     @Override
-    protected void internalClose()
-    {
-        usedBuffers.decrementAndGet();
-        try {
-            if (bufferPool.size() < MAX_BUFFERPOOL_SIZE)
-                bufferPool.add(buffer);
-            else
-                FileUtils.clean(buffer);
-            super.internalClose();
-        }
-        finally
-        {
-            onClose.run();
-        }
-    }
-
-    /**
-     * Checks if the number of buffers in use is greater or equals to the maximum number of buffers allowed in the pool.
-     *
-     * @return <code>true</code> if the number of buffers in use is greater or equals to the maximum number of buffers
-     * allowed in the pool, <code>false</code> otherwise.
-     */
-    static boolean hasReachedPoolLimit()
-    {
-        return usedBuffers.get() >= MAX_BUFFERPOOL_SIZE;
-    }
-
-    static void shutdown()
-    {
-        bufferPool.clear();
-    }
-
-    @Override
     public long onDiskSize()
     {
         return lastWrittenPos;
diff --git a/src/java/org/apache/cassandra/db/commitlog/EncryptedFileSegmentInputStream.java b/src/java/org/apache/cassandra/db/commitlog/EncryptedFileSegmentInputStream.java
new file mode 100644
index 0000000..9da3d50
--- /dev/null
+++ b/src/java/org/apache/cassandra/db/commitlog/EncryptedFileSegmentInputStream.java
@@ -0,0 +1,108 @@
+/*
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+package org.apache.cassandra.db.commitlog;
+
+import java.io.DataInput;
+import java.nio.ByteBuffer;
+
+import org.apache.cassandra.io.util.DataPosition;
+import org.apache.cassandra.io.util.FileDataInput;
+import org.apache.cassandra.io.util.FileSegmentInputStream;
+
+/**
+ * Each segment of an encrypted file may contain many encrypted chunks, and each chunk needs to be individually decrypted
+ * to reconstruct the full segment.
+ */
+public class EncryptedFileSegmentInputStream extends FileSegmentInputStream implements FileDataInput, DataInput
+{
+    private final long segmentOffset;
+    private final int expectedLength;
+    private final ChunkProvider chunkProvider;
+
+    /**
+     * Offset representing the decrypted chunks already processed in this segment.
+     */
+    private int totalChunkOffset;
+
+    public EncryptedFileSegmentInputStream(String filePath, long segmentOffset, int position, int expectedLength, ChunkProvider chunkProvider)
+    {
+        super(chunkProvider.nextChunk(), filePath, position);
+        this.segmentOffset = segmentOffset;
+        this.expectedLength = expectedLength;
+        this.chunkProvider = chunkProvider;
+    }
+
+    public interface ChunkProvider
+    {
+        /**
+         * Get the next chunk from the backing provider, if any chunks remain.
+         * @return Next chunk, else null if no more chunks remain.
+         */
+        ByteBuffer nextChunk();
+    }
+
+    public long getFilePointer()
+    {
+        return segmentOffset + totalChunkOffset + buffer.position();
+    }
+
+    public boolean isEOF()
+    {
+        return totalChunkOffset + buffer.position() >= expectedLength;
+    }
+
+    public long bytesRemaining()
+    {
+        return expectedLength - (totalChunkOffset + buffer.position());
+    }
+
+    public void seek(long position)
+    {
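+        // Seeking is forward-only across chunk boundaries: we re-buffer until the target position falls within the
+        // current chunk. A position before the current chunk cannot be reached and fails the check below.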
+        long bufferPos = position - totalChunkOffset - segmentOffset;
+        while (buffer != null && bufferPos > buffer.capacity())
+        {
+            // rebuffer repeatedly until we have reached desired position
+            buffer.position(buffer.limit());
+
+            // increases totalChunkOffset
+            reBuffer();
+            bufferPos = position - totalChunkOffset - segmentOffset;
+        }
+        if (buffer == null || bufferPos < 0 || bufferPos > buffer.capacity())
+            throw new IllegalArgumentException(
+                    String.format("Unable to seek to position %d in %s (%d bytes) in partial mode",
+                            position,
+                            getPath(),
+                            segmentOffset + expectedLength));
+        buffer.position((int) bufferPos);
+    }
+
+    public long bytesPastMark(DataPosition mark)
+    {
+        throw new UnsupportedOperationException();
+    }
+
+    public void reBuffer()
+    {
+        totalChunkOffset += buffer.position();
+        buffer = chunkProvider.nextChunk();
+    }
+}
diff --git a/src/java/org/apache/cassandra/db/commitlog/EncryptedSegment.java b/src/java/org/apache/cassandra/db/commitlog/EncryptedSegment.java
new file mode 100644
index 0000000..e13b20a
--- /dev/null
+++ b/src/java/org/apache/cassandra/db/commitlog/EncryptedSegment.java
@@ -0,0 +1,161 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.db.commitlog;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Map;
+import javax.crypto.Cipher;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.cassandra.io.FSWriteError;
+import org.apache.cassandra.io.compress.BufferType;
+import org.apache.cassandra.io.compress.ICompressor;
+import org.apache.cassandra.security.EncryptionUtils;
+import org.apache.cassandra.security.EncryptionContext;
+import org.apache.cassandra.utils.ByteBufferUtil;
+import org.apache.cassandra.utils.Hex;
+import org.apache.cassandra.utils.SyncUtil;
+
+import static org.apache.cassandra.security.EncryptionUtils.ENCRYPTED_BLOCK_HEADER_SIZE;
+
+/**
+ * Writes encrypted segments to disk. Data is compressed before encryption to (hopefully) reduce the amount of data passed
+ * to the encryption algorithm.
+ *
+ * The format of the encrypted commit log is as follows:
+ * - standard commit log header (as written by {@link CommitLogDescriptor#writeHeader(ByteBuffer, CommitLogDescriptor)})
+ * - a series of 'sync segments' that are written every time the commit log is sync()'ed
+ * -- a sync section header, see {@link CommitLogSegment#writeSyncMarker(ByteBuffer, int, int, int)}
+ * -- total plain text length for this section
+ * -- a series of encrypted data blocks, each of which contains:
+ * --- the length of the encrypted block (cipher text)
+ * --- the length of the unencrypted data (compressed text)
+ * --- the encrypted block, which contains:
+ * ---- the length of the plain text (raw) data
+ * ---- block of compressed data
+ *
+ * Notes:
+ * - "length of the unencrypted data" is different from the length of resulting decrypted buffer as encryption adds padding
+ * to the output buffer, and we need to ignore that padding when processing.
+ */
+public class EncryptedSegment extends FileDirectSegment
+{
+    private static final Logger logger = LoggerFactory.getLogger(EncryptedSegment.class);
+
+    private static final int ENCRYPTED_SECTION_HEADER_SIZE = SYNC_MARKER_SIZE + 4;
+
+    private final EncryptionContext encryptionContext;
+    private final Cipher cipher;
+
+    public EncryptedSegment(CommitLog commitLog, AbstractCommitLogSegmentManager manager, Runnable onClose)
+    {
+        super(commitLog, manager, onClose);
+        this.encryptionContext = commitLog.configuration.getEncryptionContext();
+
+        try
+        {
+            cipher = encryptionContext.getEncryptor();
+        }
+        catch (IOException e)
+        {
+            throw new FSWriteError(e, logFile);
+        }
+        logger.debug("created a new encrypted commit log segment: {}", logFile);
+    }
+
+    protected Map<String, String> additionalHeaderParameters()
+    {
+        Map<String, String> map = encryptionContext.toHeaderParameters();
+        map.put(EncryptionContext.ENCRYPTION_IV, Hex.bytesToHex(cipher.getIV()));
+        return map;
+    }
+
+    ByteBuffer createBuffer(CommitLog commitLog)
+    {
+        // Note: we want to keep the compression buffers on-heap as we need those bytes for encryption,
+        // and we want to avoid copying from off-heap (compression buffer) to on-heap encryption APIs
+        return manager.getBufferPool().createBuffer(BufferType.ON_HEAP);
+    }
+
+    void write(int startMarker, int nextMarker)
+    {
+        int contentStart = startMarker + SYNC_MARKER_SIZE;
+        final int length = nextMarker - contentStart;
+        // The length may be 0 when the segment is being closed.
+        assert length > 0 || length == 0 && !isStillAllocating();
+
+        final ICompressor compressor = encryptionContext.getCompressor();
+        final int blockSize = encryptionContext.getChunkLength();
+        try
+        {
+            ByteBuffer inputBuffer = buffer.duplicate();
+            inputBuffer.limit(contentStart + length).position(contentStart);
+            ByteBuffer buffer = manager.getBufferPool().getThreadLocalReusableBuffer();
+
+            // save space for the sync marker at the beginning of this section
+            final long syncMarkerPosition = lastWrittenPos;
+            channel.position(syncMarkerPosition + ENCRYPTED_SECTION_HEADER_SIZE);
+
+            // loop over the segment data in encryption buffer sized chunks
+            while (contentStart < nextMarker)
+            {
+                int nextBlockSize = nextMarker - blockSize > contentStart ? blockSize : nextMarker - contentStart;
+                ByteBuffer slice = inputBuffer.duplicate();
+                slice.limit(contentStart + nextBlockSize).position(contentStart);
+
+                buffer = EncryptionUtils.compress(slice, buffer, true, compressor);
+
+                // reuse the same buffer for the input and output of the encryption operation
+                buffer = EncryptionUtils.encryptAndWrite(buffer, channel, true, cipher);
+
+                contentStart += nextBlockSize;
+                manager.addSize(buffer.limit() + ENCRYPTED_BLOCK_HEADER_SIZE);
+            }
+
+            lastWrittenPos = channel.position();
+
+            // rewind to the beginning of the section and write out the sync marker,
+            // reusing the one of the existing buffers
+            buffer = ByteBufferUtil.ensureCapacity(buffer, ENCRYPTED_SECTION_HEADER_SIZE, true);
+            writeSyncMarker(buffer, 0, (int) syncMarkerPosition, (int) lastWrittenPos);
+            buffer.putInt(SYNC_MARKER_SIZE, length);
+            buffer.position(0).limit(ENCRYPTED_SECTION_HEADER_SIZE);
+            manager.addSize(buffer.limit());
+
+            channel.position(syncMarkerPosition);
+            channel.write(buffer);
+
+            SyncUtil.force(channel, true);
+
+            if (manager.getBufferPool().getThreadLocalReusableBuffer().capacity() < buffer.capacity())
+                manager.getBufferPool().setThreadLocalReusableBuffer(buffer);
+        }
+        catch (Exception e)
+        {
+            throw new FSWriteError(e, getPath());
+        }
+    }
+
+    public long onDiskSize()
+    {
+        return lastWrittenPos;
+    }
+}
diff --git a/src/java/org/apache/cassandra/db/commitlog/FileDirectSegment.java b/src/java/org/apache/cassandra/db/commitlog/FileDirectSegment.java
new file mode 100644
index 0000000..d4160e4
--- /dev/null
+++ b/src/java/org/apache/cassandra/db/commitlog/FileDirectSegment.java
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.db.commitlog;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+
+import org.apache.cassandra.io.FSWriteError;
+
+/**
+ * Writes to the backing commit log file only on sync, allowing transformations of the mutations,
+ * such as compression or encryption, before writing out to disk.
+ */
+public abstract class FileDirectSegment extends CommitLogSegment
+{
+    volatile long lastWrittenPos = 0;
+    private final Runnable onClose;
+
+    FileDirectSegment(CommitLog commitLog, AbstractCommitLogSegmentManager manager, Runnable onClose)
+    {
+        super(commitLog, manager);
+        this.onClose = onClose;
+    }
+
+    @Override
+    void writeLogHeader()
+    {
+        super.writeLogHeader();
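+        // File-direct segments only hit the channel on sync, so the header written into the in-memory buffer must be
+        // flushed to the channel (and counted toward the managed size) immediately.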
+        try
+        {
+            channel.write((ByteBuffer) buffer.duplicate().flip());
+            manager.addSize(lastWrittenPos = buffer.position());
+        }
+        catch (IOException e)
+        {
+            throw new FSWriteError(e, getPath());
+        }
+    }
+
+    @Override
+    protected void internalClose()
+    {
+        try
+        {
+            manager.getBufferPool().releaseBuffer(buffer);
+            super.internalClose();
+        }
+        finally
+        {
+            onClose.run();
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/db/commitlog/MemoryMappedSegment.java b/src/java/org/apache/cassandra/db/commitlog/MemoryMappedSegment.java
index 3a52e11..2bbd12d 100644
--- a/src/java/org/apache/cassandra/db/commitlog/MemoryMappedSegment.java
+++ b/src/java/org/apache/cassandra/db/commitlog/MemoryMappedSegment.java
@@ -39,12 +39,11 @@
     /**
      * Constructs a new segment file.
      *
-     * @param filePath  if not null, recycles the existing file by renaming it and truncating it to CommitLog.SEGMENT_SIZE.
      * @param commitLog the commit log it will be used with.
      */
-    MemoryMappedSegment(CommitLog commitLog)
+    MemoryMappedSegment(CommitLog commitLog, AbstractCommitLogSegmentManager manager)
     {
-        super(commitLog);
+        super(commitLog, manager);
         // mark the initial sync marker as uninitialised
         int firstSync = buffer.position();
         buffer.putInt(firstSync + 0, 0);
@@ -67,7 +66,7 @@
             {
                 throw new FSWriteError(e, logFile);
             }
-            commitLog.allocator.addSize(DatabaseDescriptor.getCommitLogSegmentSize());
+            manager.addSize(DatabaseDescriptor.getCommitLogSegmentSize());
 
             return channel.map(FileChannel.MapMode.READ_WRITE, 0, DatabaseDescriptor.getCommitLogSegmentSize());
         }
diff --git a/src/java/org/apache/cassandra/db/commitlog/ReplayPosition.java b/src/java/org/apache/cassandra/db/commitlog/ReplayPosition.java
deleted file mode 100644
index 0b21763..0000000
--- a/src/java/org/apache/cassandra/db/commitlog/ReplayPosition.java
+++ /dev/null
@@ -1,178 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.cassandra.db.commitlog;
-
-import java.io.IOException;
-import java.util.Map;
-import java.util.NavigableMap;
-import java.util.TreeMap;
-
-import com.google.common.collect.Ordering;
-
-import org.apache.cassandra.db.TypeSizes;
-import org.apache.cassandra.io.ISerializer;
-import org.apache.cassandra.io.sstable.format.SSTableReader;
-import org.apache.cassandra.io.util.DataInputPlus;
-import org.apache.cassandra.io.util.DataOutputPlus;
-
-public class ReplayPosition implements Comparable<ReplayPosition>
-{
-    public static final ReplayPositionSerializer serializer = new ReplayPositionSerializer();
-
-    // NONE is used for SSTables that are streamed from other nodes and thus have no relationship
-    // with our local commitlog. The values satisfy the criteria that
-    //  - no real commitlog segment will have the given id
-    //  - it will sort before any real replayposition, so it will be effectively ignored by getReplayPosition
-    public static final ReplayPosition NONE = new ReplayPosition(-1, 0);
-
-    public final long segment;
-    public final int position;
-
-    /**
-     * A filter of known safe-to-discard commit log replay positions, based on
-     * the range covered by on disk sstables and those prior to the most recent truncation record
-     */
-    public static class ReplayFilter
-    {
-        final NavigableMap<ReplayPosition, ReplayPosition> persisted = new TreeMap<>();
-        public ReplayFilter(Iterable<SSTableReader> onDisk, ReplayPosition truncatedAt)
-        {
-            for (SSTableReader reader : onDisk)
-            {
-                ReplayPosition start = reader.getSSTableMetadata().commitLogLowerBound;
-                ReplayPosition end = reader.getSSTableMetadata().commitLogUpperBound;
-                add(persisted, start, end);
-            }
-            if (truncatedAt != null)
-                add(persisted, ReplayPosition.NONE, truncatedAt);
-        }
-
-        private static void add(NavigableMap<ReplayPosition, ReplayPosition> ranges, ReplayPosition start, ReplayPosition end)
-        {
-            // extend ourselves to cover any ranges we overlap
-            // record directly preceding our end may extend past us, so take the max of our end and its
-            Map.Entry<ReplayPosition, ReplayPosition> extend = ranges.floorEntry(end);
-            if (extend != null && extend.getValue().compareTo(end) > 0)
-                end = extend.getValue();
-
-            // record directly preceding our start may extend into us; if it does, we take it as our start
-            extend = ranges.lowerEntry(start);
-            if (extend != null && extend.getValue().compareTo(start) >= 0)
-                start = extend.getKey();
-
-            ranges.subMap(start, end).clear();
-            ranges.put(start, end);
-        }
-
-        public boolean shouldReplay(ReplayPosition position)
-        {
-            // replay ranges are start exclusive, end inclusive
-            Map.Entry<ReplayPosition, ReplayPosition> range = persisted.lowerEntry(position);
-            return range == null || position.compareTo(range.getValue()) > 0;
-        }
-
-        public boolean isEmpty()
-        {
-            return persisted.isEmpty();
-        }
-    }
-
-    public static ReplayPosition firstNotCovered(Iterable<ReplayFilter> ranges)
-    {
-        ReplayPosition min = null;
-        for (ReplayFilter map : ranges)
-        {
-            ReplayPosition first = map.persisted.firstEntry().getValue();
-            if (min == null)
-                min = first;
-            else
-                min = Ordering.natural().min(min, first);
-        }
-        if (min == null)
-            return NONE;
-        return min;
-    }
-
-    public ReplayPosition(long segment, int position)
-    {
-        this.segment = segment;
-        assert position >= 0;
-        this.position = position;
-    }
-
-    public int compareTo(ReplayPosition that)
-    {
-        if (this.segment != that.segment)
-            return Long.compare(this.segment, that.segment);
-
-        return Integer.compare(this.position, that.position);
-    }
-
-    @Override
-    public boolean equals(Object o)
-    {
-        if (this == o) return true;
-        if (o == null || getClass() != o.getClass()) return false;
-
-        ReplayPosition that = (ReplayPosition) o;
-
-        if (position != that.position) return false;
-        return segment == that.segment;
-    }
-
-    @Override
-    public int hashCode()
-    {
-        int result = (int) (segment ^ (segment >>> 32));
-        result = 31 * result + position;
-        return result;
-    }
-
-    @Override
-    public String toString()
-    {
-        return "ReplayPosition(" +
-               "segmentId=" + segment +
-               ", position=" + position +
-               ')';
-    }
-
-    public ReplayPosition clone()
-    {
-        return new ReplayPosition(segment, position);
-    }
-
-    public static class ReplayPositionSerializer implements ISerializer<ReplayPosition>
-    {
-        public void serialize(ReplayPosition rp, DataOutputPlus out) throws IOException
-        {
-            out.writeLong(rp.segment);
-            out.writeInt(rp.position);
-        }
-
-        public ReplayPosition deserialize(DataInputPlus in) throws IOException
-        {
-            return new ReplayPosition(in.readLong(), in.readInt());
-        }
-
-        public long serializedSize(ReplayPosition rp)
-        {
-            return TypeSizes.sizeof(rp.segment) + TypeSizes.sizeof(rp.position);
-        }
-    }
-}
diff --git a/src/java/org/apache/cassandra/db/commitlog/SimpleCachedBufferPool.java b/src/java/org/apache/cassandra/db/commitlog/SimpleCachedBufferPool.java
new file mode 100644
index 0000000..1c10c25
--- /dev/null
+++ b/src/java/org/apache/cassandra/db/commitlog/SimpleCachedBufferPool.java
@@ -0,0 +1,118 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.db.commitlog;
+
+import java.nio.ByteBuffer;
+import java.util.Queue;
+import java.util.concurrent.ConcurrentLinkedQueue;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import io.netty.util.concurrent.FastThreadLocal;
+import org.apache.cassandra.io.compress.BufferType;
+import org.apache.cassandra.io.util.FileUtils;
+
+/**
+ * A very simple ByteBuffer pool with a fixed allocation size and a maximum cached buffer count. Allocation may go past
+ * that maximum; any buffer allocated beyond the cached count is freed on release instead of being returned to the pool.
+ *
+ * Also exposes a reusable thread-local ByteBuffer for callers.
+ */
+public class SimpleCachedBufferPool
+{
+    protected static final FastThreadLocal<ByteBuffer> reusableBufferHolder = new FastThreadLocal<ByteBuffer>()
+    {
+        protected ByteBuffer initialValue()
+        {
+            return ByteBuffer.allocate(0);
+        }
+    };
+
+    private Queue<ByteBuffer> bufferPool = new ConcurrentLinkedQueue<>();
+    private AtomicInteger usedBuffers = new AtomicInteger(0);
+
+    /**
+     * Maximum number of buffers in the compression pool. Any buffers above this count that are allocated will be cleaned
+     * upon release rather than held and re-used.
+     */
+    private final int maxBufferPoolSize;
+
+    /**
+     * Size of individual buffer segments on allocation.
+     */
+    private final int bufferSize;
+
+    public SimpleCachedBufferPool(int maxBufferPoolSize, int bufferSize)
+    {
+        this.maxBufferPoolSize = maxBufferPoolSize;
+        this.bufferSize = bufferSize;
+    }
+
+    public ByteBuffer createBuffer(BufferType bufferType)
+    {
+        usedBuffers.incrementAndGet();
+        ByteBuffer buf = bufferPool.poll();
+        if (buf != null)
+        {
+            buf.clear();
+            return buf;
+        }
+        return bufferType.allocate(bufferSize);
+    }
+
+    public ByteBuffer getThreadLocalReusableBuffer()
+    {
+        return reusableBufferHolder.get();
+    }
+
+    public void setThreadLocalReusableBuffer(ByteBuffer buffer)
+    {
+        reusableBufferHolder.set(buffer);
+    }
+
+    public void releaseBuffer(ByteBuffer buffer)
+    {
+        usedBuffers.decrementAndGet();
+
+        if (bufferPool.size() < maxBufferPoolSize)
+            bufferPool.add(buffer);
+        else
+            FileUtils.clean(buffer);
+    }
+
+    public void shutdown()
+    {
+        bufferPool.clear();
+    }
+
+    public boolean atLimit()
+    {
+        return usedBuffers.get() >= maxBufferPoolSize;
+    }
+
+    @Override
+    public String toString()
+    {
+        return new StringBuilder()
+               .append("SimpleBufferPool:")
+               .append(" bufferCount:").append(usedBuffers.get())
+               .append(", bufferSize:").append(maxBufferPoolSize)
+               .append(", buffer size:").append(bufferSize)
+               .toString();
+    }
+}
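
A minimal usage sketch of the new buffer pool (not part of the patch): the wrapper class name and the pool/buffer sizes below are made-up values for illustration; only the SimpleCachedBufferPool API introduced above is assumed.

    import java.nio.ByteBuffer;

    import org.apache.cassandra.db.commitlog.SimpleCachedBufferPool;
    import org.apache.cassandra.io.compress.BufferType;

    public class BufferPoolUsageSketch
    {
        public static void main(String[] args)
        {
            // Cache at most 3 released buffers, each 1 MiB (illustrative sizes).
            SimpleCachedBufferPool pool = new SimpleCachedBufferPool(3, 1 << 20);

            ByteBuffer buffer = pool.createBuffer(BufferType.OFF_HEAP);
            try
            {
                buffer.putLong(42L); // write some payload into the borrowed buffer
            }
            finally
            {
                // Released buffers are cached up to the maximum; extras are cleaned and freed.
                pool.releaseBuffer(buffer);
            }
            pool.shutdown();
        }
    }
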
diff --git a/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java
index 0dce52b..cc9fc23 100644
--- a/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java
+++ b/src/java/org/apache/cassandra/db/compaction/AbstractCompactionStrategy.java
@@ -28,6 +28,7 @@
 import org.apache.cassandra.db.Directories;
 import org.apache.cassandra.db.SerializationHeader;
 import org.apache.cassandra.db.lifecycle.SSTableSet;
+import org.apache.cassandra.index.Index;
 import org.apache.cassandra.io.sstable.Descriptor;
 import org.apache.cassandra.io.sstable.SSTableMultiWriter;
 import org.apache.cassandra.io.sstable.SimpleSSTableMultiWriter;
@@ -62,11 +63,13 @@
     // minimum interval needed to perform tombstone removal compaction in seconds, default 86400 or 1 day.
     protected static final long DEFAULT_TOMBSTONE_COMPACTION_INTERVAL = 86400;
     protected static final boolean DEFAULT_UNCHECKED_TOMBSTONE_COMPACTION_OPTION = false;
+    protected static final boolean DEFAULT_LOG_ALL_OPTION = false;
 
     protected static final String TOMBSTONE_THRESHOLD_OPTION = "tombstone_threshold";
     protected static final String TOMBSTONE_COMPACTION_INTERVAL_OPTION = "tombstone_compaction_interval";
     // disable range overlap check when deciding if an SSTable is a candidate for tombstone compaction (CASSANDRA-6563)
     protected static final String UNCHECKED_TOMBSTONE_COMPACTION_OPTION = "unchecked_tombstone_compaction";
+    protected static final String LOG_ALL_OPTION = "log_all";
     protected static final String COMPACTION_ENABLED = "enabled";
     public static final String ONLY_PURGE_REPAIRED_TOMBSTONES = "only_purge_repaired_tombstones";
 
@@ -77,6 +80,7 @@
     protected long tombstoneCompactionInterval;
     protected boolean uncheckedTombstoneCompaction;
     protected boolean disableTombstoneCompactions = false;
+    protected boolean logAll = true;
 
     private final Directories directories;
 
@@ -109,6 +113,8 @@
             tombstoneCompactionInterval = optionValue == null ? DEFAULT_TOMBSTONE_COMPACTION_INTERVAL : Long.parseLong(optionValue);
             optionValue = options.get(UNCHECKED_TOMBSTONE_COMPACTION_OPTION);
             uncheckedTombstoneCompaction = optionValue == null ? DEFAULT_UNCHECKED_TOMBSTONE_COMPACTION_OPTION : Boolean.parseBoolean(optionValue);
+            optionValue = options.get(LOG_ALL_OPTION);
+            logAll = optionValue == null ? DEFAULT_LOG_ALL_OPTION : Boolean.parseBoolean(optionValue);
             if (!shouldBeEnabled())
                 this.disable();
         }
@@ -316,6 +322,12 @@
 
     public abstract void addSSTable(SSTableReader added);
 
+    public synchronized void addSSTables(Iterable<SSTableReader> added)
+    {
+        for (SSTableReader sstable : added)
+            addSSTable(sstable);
+    }
+
     public abstract void removeSSTable(SSTableReader sstable);
 
     public static class ScannerList implements AutoCloseable
@@ -453,7 +465,16 @@
         if (unchecked != null)
         {
             if (!unchecked.equalsIgnoreCase("true") && !unchecked.equalsIgnoreCase("false"))
-                throw new ConfigurationException(String.format("'%s' should be either 'true' or 'false', not '%s'",UNCHECKED_TOMBSTONE_COMPACTION_OPTION, unchecked));
+                throw new ConfigurationException(String.format("'%s' should be either 'true' or 'false', not '%s'", UNCHECKED_TOMBSTONE_COMPACTION_OPTION, unchecked));
+        }
+
+        String logAll = options.get(LOG_ALL_OPTION);
+        if (logAll != null)
+        {
+            if (!logAll.equalsIgnoreCase("true") && !logAll.equalsIgnoreCase("false"))
+            {
+                throw new ConfigurationException(String.format("'%s' should be either 'true' or 'false', not '%s'", LOG_ALL_OPTION, logAll));
+            }
         }
 
         String compactionEnabled = options.get(COMPACTION_ENABLED);
@@ -464,10 +485,12 @@
                 throw new ConfigurationException(String.format("enabled should either be 'true' or 'false', not %s", compactionEnabled));
             }
         }
+
         Map<String, String> uncheckedOptions = new HashMap<String, String>(options);
         uncheckedOptions.remove(TOMBSTONE_THRESHOLD_OPTION);
         uncheckedOptions.remove(TOMBSTONE_COMPACTION_INTERVAL_OPTION);
         uncheckedOptions.remove(UNCHECKED_TOMBSTONE_COMPACTION_OPTION);
+        uncheckedOptions.remove(LOG_ALL_OPTION);
         uncheckedOptions.remove(COMPACTION_ENABLED);
         uncheckedOptions.remove(ONLY_PURGE_REPAIRED_TOMBSTONES);
         return uncheckedOptions;
@@ -511,9 +534,20 @@
         return groupedSSTables;
     }
 
-    public SSTableMultiWriter createSSTableMultiWriter(Descriptor descriptor, long keyCount, long repairedAt, MetadataCollector meta, SerializationHeader header, LifecycleTransaction txn)
+    public CompactionLogger.Strategy strategyLogger()
     {
-        return SimpleSSTableMultiWriter.create(descriptor, keyCount, repairedAt, cfs.metadata, meta, header, txn);
+        return CompactionLogger.Strategy.none;
+    }
+
+    public SSTableMultiWriter createSSTableMultiWriter(Descriptor descriptor,
+                                                       long keyCount,
+                                                       long repairedAt,
+                                                       MetadataCollector meta,
+                                                       SerializationHeader header,
+                                                       Collection<Index> indexes,
+                                                       LifecycleTransaction txn)
+    {
+        return SimpleSSTableMultiWriter.create(descriptor, keyCount, repairedAt, cfs.metadata, meta, header, indexes, txn);
     }
 
     public boolean supportsEarlyOpen()
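
A hedged sketch (not part of the patch) of how the new log_all option would pass through the generic option validation modified above; it assumes the strategy's public static validateOptions(Map) entry point and uses made-up option values.

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.cassandra.db.compaction.AbstractCompactionStrategy;
    import org.apache.cassandra.exceptions.ConfigurationException;

    public class LogAllOptionSketch
    {
        public static void main(String[] args) throws ConfigurationException
        {
            Map<String, String> options = new HashMap<>();
            options.put("tombstone_threshold", "0.2"); // existing option, illustrative value
            options.put("log_all", "true");            // new option validated above

            // Recognised options (now including log_all) are stripped; whatever remains
            // is unknown and will be rejected by the concrete strategy's validation.
            Map<String, String> unknown = AbstractCompactionStrategy.validateOptions(options);
            System.out.println("unrecognised options: " + unknown);
        }
    }
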
diff --git a/src/java/org/apache/cassandra/db/compaction/CompactionController.java b/src/java/org/apache/cassandra/db/compaction/CompactionController.java
index fbf29e3..e6115ed 100644
--- a/src/java/org/apache/cassandra/db/compaction/CompactionController.java
+++ b/src/java/org/apache/cassandra/db/compaction/CompactionController.java
@@ -211,9 +211,8 @@
         {
             // if we don't have bloom filter(bf_fp_chance=1.0 or filter file is missing),
             // we check index file instead.
-            if (sstable.getBloomFilter() instanceof AlwaysPresentFilter && sstable.getPosition(key, SSTableReader.Operator.EQ, false) != null)
-                min = Math.min(min, sstable.getMinTimestamp());
-            else if (sstable.getBloomFilter().isPresent(key))
+            if ((sstable.getBloomFilter() instanceof AlwaysPresentFilter && sstable.getPosition(key, SSTableReader.Operator.EQ, false) != null)
+                || sstable.getBloomFilter().isPresent(key))
                 min = Math.min(min, sstable.getMinTimestamp());
         }
 
diff --git a/src/java/org/apache/cassandra/db/compaction/CompactionIterator.java b/src/java/org/apache/cassandra/db/compaction/CompactionIterator.java
index d39da2a..0111aec 100644
--- a/src/java/org/apache/cassandra/db/compaction/CompactionIterator.java
+++ b/src/java/org/apache/cassandra/db/compaction/CompactionIterator.java
@@ -63,6 +63,7 @@
 
     private final long totalBytes;
     private long bytesRead;
+    private long totalSourceCQLRows;
 
     /*
      * counters for merged rows.
@@ -136,6 +137,11 @@
         return mergeCounters;
     }
 
+    public long getTotalSourceCQLRows()
+    {
+        return totalSourceCQLRows;
+    }
+
     private UnfilteredPartitionIterators.MergeListener listener()
     {
         return new UnfilteredPartitionIterators.MergeListener()
@@ -287,6 +293,7 @@
         @Override
         protected void updateProgress()
         {
+            totalSourceCQLRows++;
             if ((++compactedUnfiltered) % UNFILTERED_TO_UPDATE_PROGRESS == 0)
                 updateBytesRead();
         }
diff --git a/src/java/org/apache/cassandra/db/compaction/CompactionLogger.java b/src/java/org/apache/cassandra/db/compaction/CompactionLogger.java
new file mode 100644
index 0000000..16a7f2a
--- /dev/null
+++ b/src/java/org/apache/cassandra/db/compaction/CompactionLogger.java
@@ -0,0 +1,342 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.db.compaction;
+
+import java.io.IOException;
+import java.io.OutputStreamWriter;
+import java.nio.file.*;
+import java.util.Collection;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.*;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.function.Consumer;
+import java.util.function.Function;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.cassandra.db.ColumnFamilyStore;
+import org.apache.cassandra.io.sstable.format.SSTableReader;
+import org.apache.cassandra.utils.NoSpamLogger;
+import org.codehaus.jackson.JsonNode;
+import org.codehaus.jackson.node.ArrayNode;
+import org.codehaus.jackson.node.JsonNodeFactory;
+import org.codehaus.jackson.node.ObjectNode;
+
+public class CompactionLogger
+{
+    public interface Strategy
+    {
+        JsonNode sstable(SSTableReader sstable);
+
+        JsonNode options();
+
+        static Strategy none = new Strategy()
+        {
+            public JsonNode sstable(SSTableReader sstable)
+            {
+                return null;
+            }
+
+            public JsonNode options()
+            {
+                return null;
+            }
+        };
+    }
+
+    /**
+     * This will produce the compaction strategy's starting information.
+     */
+    public interface StrategySummary
+    {
+        JsonNode getSummary();
+    }
+
+    /**
+     * An interface abstracting the medium that compaction log statements are written to (a local file by default).
+     */
+    public interface Writer
+    {
+        /**
+         * Used to write out the start-of-log summary, for example when compaction logging is first enabled.
+         * @param statement This should be written out to the medium capturing the logs
+         * @param tag       This is an identifier for a strategy; each strategy should have a distinct Object
+         */
+        void writeStart(JsonNode statement, Object tag);
+
+        /**
+         * @param statement This should be written out to the medium capturing the logs
+         * @param summary   This can be used when a tag is not recognized by this writer; this can be because the file
+         *                  has been rolled, or otherwise the writer had to start over
+         * @param tag       This is an identifier for a strategy; each strategy should have a distinct Object
+         */
+        void write(JsonNode statement, StrategySummary summary, Object tag);
+    }
+
+    private interface CompactionStrategyAndTableFunction
+    {
+        JsonNode apply(AbstractCompactionStrategy strategy, SSTableReader sstable);
+    }
+
+    private static final JsonNodeFactory json = JsonNodeFactory.instance;
+    private static final Logger logger = LoggerFactory.getLogger(CompactionLogger.class);
+    private static final Writer serializer = new CompactionLogSerializer();
+    private final ColumnFamilyStore cfs;
+    private final CompactionStrategyManager csm;
+    private final AtomicInteger identifier = new AtomicInteger(0);
+    private final Map<AbstractCompactionStrategy, String> compactionStrategyMapping = new ConcurrentHashMap<>();
+    private final AtomicBoolean enabled = new AtomicBoolean(false);
+
+    public CompactionLogger(ColumnFamilyStore cfs, CompactionStrategyManager csm)
+    {
+        this.csm = csm;
+        this.cfs = cfs;
+    }
+
+    private void forEach(Consumer<AbstractCompactionStrategy> consumer)
+    {
+        csm.getStrategies()
+           .forEach(l -> l.forEach(consumer));
+    }
+
+    private ArrayNode compactionStrategyMap(Function<AbstractCompactionStrategy, JsonNode> select)
+    {
+        ArrayNode node = json.arrayNode();
+        forEach(acs -> node.add(select.apply(acs)));
+        return node;
+    }
+
+    private ArrayNode sstableMap(Collection<SSTableReader> sstables, CompactionStrategyAndTableFunction csatf)
+    {
+        ArrayNode node = json.arrayNode();
+        sstables.forEach(t -> node.add(csatf.apply(csm.getCompactionStrategyFor(t), t)));
+        return node;
+    }
+
+    private String getId(AbstractCompactionStrategy strategy)
+    {
+        return compactionStrategyMapping.computeIfAbsent(strategy, s -> String.valueOf(identifier.getAndIncrement()));
+    }
+
+    private JsonNode formatSSTables(AbstractCompactionStrategy strategy)
+    {
+        ArrayNode node = json.arrayNode();
+        for (SSTableReader sstable : cfs.getLiveSSTables())
+        {
+            if (csm.getCompactionStrategyFor(sstable) == strategy)
+                node.add(formatSSTable(strategy, sstable));
+        }
+        return node;
+    }
+
+    private JsonNode formatSSTable(AbstractCompactionStrategy strategy, SSTableReader sstable)
+    {
+        ObjectNode node = json.objectNode();
+        node.put("generation", sstable.descriptor.generation);
+        node.put("version", sstable.descriptor.version.getVersion());
+        node.put("size", sstable.onDiskLength());
+        JsonNode logResult = strategy.strategyLogger().sstable(sstable);
+        if (logResult != null)
+            node.put("details", logResult);
+        return node;
+    }
+
+    private JsonNode startStrategy(AbstractCompactionStrategy strategy)
+    {
+        ObjectNode node = json.objectNode();
+        node.put("strategyId", getId(strategy));
+        node.put("type", strategy.getName());
+        node.put("tables", formatSSTables(strategy));
+        node.put("repaired", csm.isRepaired(strategy));
+        List<String> folders = csm.getStrategyFolders(strategy);
+        ArrayNode folderNode = json.arrayNode();
+        for (String folder : folders)
+        {
+            folderNode.add(folder);
+        }
+        node.put("folders", folderNode);
+
+        JsonNode logResult = strategy.strategyLogger().options();
+        if (logResult != null)
+            node.put("options", logResult);
+        return node;
+    }
+
+    private JsonNode shutdownStrategy(AbstractCompactionStrategy strategy)
+    {
+        ObjectNode node = json.objectNode();
+        node.put("strategyId", getId(strategy));
+        return node;
+    }
+
+    private JsonNode describeSSTable(AbstractCompactionStrategy strategy, SSTableReader sstable)
+    {
+        ObjectNode node = json.objectNode();
+        node.put("strategyId", getId(strategy));
+        node.put("table", formatSSTable(strategy, sstable));
+        return node;
+    }
+
+    private void describeStrategy(ObjectNode node)
+    {
+        node.put("keyspace", cfs.keyspace.getName());
+        node.put("table", cfs.getTableName());
+        node.put("time", System.currentTimeMillis());
+    }
+
+    private JsonNode startStrategies()
+    {
+        ObjectNode node = json.objectNode();
+        node.put("type", "enable");
+        describeStrategy(node);
+        node.put("strategies", compactionStrategyMap(this::startStrategy));
+        return node;
+    }
+
+    public void enable()
+    {
+        if (enabled.compareAndSet(false, true))
+        {
+            serializer.writeStart(startStrategies(), this);
+        }
+    }
+
+    public void disable()
+    {
+        if (enabled.compareAndSet(true, false))
+        {
+            ObjectNode node = json.objectNode();
+            node.put("type", "disable");
+            describeStrategy(node);
+            node.put("strategies", compactionStrategyMap(this::shutdownStrategy));
+            serializer.write(node, this::startStrategies, this);
+        }
+    }
+
+    public void flush(Collection<SSTableReader> sstables)
+    {
+        if (enabled.get())
+        {
+            ObjectNode node = json.objectNode();
+            node.put("type", "flush");
+            describeStrategy(node);
+            node.put("tables", sstableMap(sstables, this::describeSSTable));
+            serializer.write(node, this::startStrategies, this);
+        }
+    }
+
+    public void compaction(long startTime, Collection<SSTableReader> input, long endTime, Collection<SSTableReader> output)
+    {
+        if (enabled.get())
+        {
+            ObjectNode node = json.objectNode();
+            node.put("type", "compaction");
+            describeStrategy(node);
+            node.put("start", String.valueOf(startTime));
+            node.put("end", String.valueOf(endTime));
+            node.put("input", sstableMap(input, this::describeSSTable));
+            node.put("output", sstableMap(output, this::describeSSTable));
+            serializer.write(node, this::startStrategies, this);
+        }
+    }
+
+    public void pending(AbstractCompactionStrategy strategy, int remaining)
+    {
+        if (remaining != 0 && enabled.get())
+        {
+            ObjectNode node = json.objectNode();
+            node.put("type", "pending");
+            describeStrategy(node);
+            node.put("strategyId", getId(strategy));
+            node.put("pending", remaining);
+            serializer.write(node, this::startStrategies, this);
+        }
+    }
+
+    private static class CompactionLogSerializer implements Writer
+    {
+        private static final String logDirectory = System.getProperty("cassandra.logdir", ".");
+        private final ExecutorService loggerService = Executors.newFixedThreadPool(1);
+        // This is only accessed on the logger service thread, so it does not need to be thread safe
+        private final Set<Object> rolled = new HashSet<>();
+        private OutputStreamWriter stream;
+
+        private static OutputStreamWriter createStream() throws IOException
+        {
+            int count = 0;
+            Path compactionLog = Paths.get(logDirectory, "compaction.log");
+            if (Files.exists(compactionLog))
+            {
+                Path tryPath = compactionLog;
+                while (Files.exists(tryPath))
+                {
+                    tryPath = Paths.get(logDirectory, String.format("compaction-%d.log", count++));
+                }
+                Files.move(compactionLog, tryPath);
+            }
+
+            return new OutputStreamWriter(Files.newOutputStream(compactionLog, StandardOpenOption.CREATE_NEW, StandardOpenOption.WRITE));
+        }
+
+        private void writeLocal(String toWrite)
+        {
+            try
+            {
+                if (stream == null)
+                    stream = createStream();
+                stream.write(toWrite);
+                stream.flush();
+            }
+            catch (IOException ioe)
+            {
+                // Drop this log statement and report the failure through the regular logger.
+                NoSpamLogger.log(logger, NoSpamLogger.Level.ERROR, 1, TimeUnit.MINUTES,
+                                 "Could not write to the log file: {}", ioe);
+            }
+
+        }
+
+        public void writeStart(JsonNode statement, Object tag)
+        {
+            final String toWrite = statement.toString() + System.lineSeparator();
+            loggerService.execute(() -> {
+                rolled.add(tag);
+                writeLocal(toWrite);
+            });
+        }
+
+        public void write(JsonNode statement, StrategySummary summary, Object tag)
+        {
+            final String toWrite = statement.toString() + System.lineSeparator();
+            loggerService.execute(() -> {
+                if (!rolled.contains(tag))
+                {
+                    writeLocal(summary.getSummary().toString() + System.lineSeparator());
+                    rolled.add(tag);
+                }
+                writeLocal(toWrite);
+            });
+        }
+    }
+}
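
A minimal sketch (not part of the patch) of how a compaction strategy could plug its own details into the JSON written by CompactionLogger: it implements the Strategy interface above, and a strategy would return it from the strategyLogger() hook added to AbstractCompactionStrategy; the class name and the "level"/"example_option" fields are illustrative.

    import org.apache.cassandra.db.compaction.CompactionLogger;
    import org.apache.cassandra.io.sstable.format.SSTableReader;
    import org.codehaus.jackson.JsonNode;
    import org.codehaus.jackson.node.JsonNodeFactory;
    import org.codehaus.jackson.node.ObjectNode;

    public class ExampleStrategyLogger implements CompactionLogger.Strategy
    {
        private static final JsonNodeFactory json = JsonNodeFactory.instance;

        // Per-sstable details; they end up under "details" in compaction.log entries.
        public JsonNode sstable(SSTableReader sstable)
        {
            ObjectNode node = json.objectNode();
            node.put("level", sstable.getSSTableLevel());
            return node;
        }

        // Strategy-level options; they end up under "options" when the strategy is logged.
        public JsonNode options()
        {
            ObjectNode node = json.objectNode();
            node.put("example_option", "example_value");
            return node;
        }
    }
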
diff --git a/src/java/org/apache/cassandra/db/compaction/CompactionManager.java b/src/java/org/apache/cassandra/db/compaction/CompactionManager.java
index 99e0fd5..a653b58 100644
--- a/src/java/org/apache/cassandra/db/compaction/CompactionManager.java
+++ b/src/java/org/apache/cassandra/db/compaction/CompactionManager.java
@@ -22,6 +22,7 @@
 import java.lang.management.ManagementFactory;
 import java.util.*;
 import java.util.concurrent.*;
+import java.util.stream.Collectors;
 import javax.management.MBeanServer;
 import javax.management.ObjectName;
 import javax.management.openmbean.OpenDataException;
@@ -289,7 +290,7 @@
             Iterable<SSTableReader> sstables = compacting != null ? Lists.newArrayList(operation.filterSSTables(compacting)) : Collections.<SSTableReader>emptyList();
             if (Iterables.isEmpty(sstables))
             {
-                logger.info("No sstables for {}.{}", cfs.keyspace.getName(), cfs.name);
+                logger.info("No sstables to {} for {}.{}", operationType.name(), cfs.keyspace.getName(), cfs.name);
                 return AllSSTableOpStatus.SUCCESSFUL;
             }
 
@@ -408,7 +409,7 @@
             }
 
             @Override
-            public void execute(LifecycleTransaction txn) throws IOException
+            public void execute(LifecycleTransaction txn)
             {
                 AbstractCompactionTask task = cfs.getCompactionStrategyManager().getCompactionTask(txn, NO_GC, Long.MAX_VALUE);
                 task.setUserDefined(true);
@@ -454,6 +455,77 @@
         }, jobs, OperationType.CLEANUP);
     }
 
+    public AllSSTableOpStatus relocateSSTables(final ColumnFamilyStore cfs, int jobs) throws ExecutionException, InterruptedException
+    {
+        if (!cfs.getPartitioner().splitter().isPresent())
+        {
+            logger.info("Partitioner does not support splitting");
+            return AllSSTableOpStatus.ABORTED;
+        }
+        final Collection<Range<Token>> r = StorageService.instance.getLocalRanges(cfs.keyspace.getName());
+
+        if (r.isEmpty())
+        {
+            logger.info("Relocate cannot run before a node has joined the ring");
+            return AllSSTableOpStatus.ABORTED;
+        }
+
+        final List<Range<Token>> localRanges = Range.sort(r);
+        final Directories.DataDirectory[] locations = cfs.getDirectories().getWriteableLocations();
+        final List<PartitionPosition> diskBoundaries = StorageService.getDiskBoundaries(localRanges, cfs.getPartitioner(), locations);
+
+        return parallelAllSSTableOperation(cfs, new OneSSTableOperation()
+        {
+            @Override
+            public Iterable<SSTableReader> filterSSTables(LifecycleTransaction transaction)
+            {
+                Set<SSTableReader> originals = Sets.newHashSet(transaction.originals());
+                Set<SSTableReader> needsRelocation = originals.stream().filter(s -> !inCorrectLocation(s)).collect(Collectors.toSet());
+                transaction.cancel(Sets.difference(originals, needsRelocation));
+
+                Map<Integer, List<SSTableReader>> groupedByDisk = needsRelocation.stream().collect(Collectors.groupingBy((s) ->
+                        CompactionStrategyManager.getCompactionStrategyIndex(cfs, cfs.getDirectories(), s)));
+
+                int maxSize = 0;
+                for (List<SSTableReader> diskSSTables : groupedByDisk.values())
+                    maxSize = Math.max(maxSize, diskSSTables.size());
+
+                List<SSTableReader> mixedSSTables = new ArrayList<>();
+
+                for (int i = 0; i < maxSize; i++)
+                    for (List<SSTableReader> diskSSTables : groupedByDisk.values())
+                        if (i < diskSSTables.size())
+                            mixedSSTables.add(diskSSTables.get(i));
+
+                return mixedSSTables;
+            }
+
+            private boolean inCorrectLocation(SSTableReader sstable)
+            {
+                if (!cfs.getPartitioner().splitter().isPresent())
+                    return true;
+                int directoryIndex = CompactionStrategyManager.getCompactionStrategyIndex(cfs, cfs.getDirectories(), sstable);
+                Directories.DataDirectory[] locations = cfs.getDirectories().getWriteableLocations();
+
+                Directories.DataDirectory location = locations[directoryIndex];
+                PartitionPosition diskLast = diskBoundaries.get(directoryIndex);
+                // the location we get from directoryIndex is based on the first key in the sstable
+                // now we need to make sure the last key is less than the boundary as well:
+                return sstable.descriptor.directory.getAbsolutePath().startsWith(location.location.getAbsolutePath()) && sstable.last.compareTo(diskLast) <= 0;
+            }
+
+            @Override
+            public void execute(LifecycleTransaction txn)
+            {
+                logger.debug("Relocating {}", txn.originals());
+                AbstractCompactionTask task = cfs.getCompactionStrategyManager().getCompactionTask(txn, NO_GC, Long.MAX_VALUE);
+                task.setUserDefined(true);
+                task.setCompactionType(OperationType.RELOCATE);
+                task.execute(metrics);
+            }
+        }, jobs, OperationType.RELOCATE);
+    }
+
     public ListenableFuture<?> submitAntiCompaction(final ColumnFamilyStore cfs,
                                           final Collection<Range<Token>> ranges,
                                           final Refs<SSTableReader> sstables,
@@ -601,7 +673,7 @@
                 nonEmptyTasks++;
             Runnable runnable = new WrappedRunnable()
             {
-                protected void runMayThrow() throws IOException
+                protected void runMayThrow()
                 {
                     task.execute(metrics);
                 }
@@ -614,7 +686,7 @@
             futures.add(executor.submit(runnable));
         }
         if (nonEmptyTasks > 1)
-            logger.info("Cannot perform a full major compaction as repaired and unrepaired sstables cannot be compacted together. These two set of sstables will be compacted separately.");
+            logger.info("Major compaction will not result in a single sstable - repaired and unrepaired data is kept separate and compaction runs per data_file_directory.");
         return futures;
     }
 
@@ -644,11 +716,66 @@
         FBUtilities.waitOnFutures(futures);
     }
 
+    public void forceUserDefinedCleanup(String dataFiles)
+    {
+        String[] filenames = dataFiles.split(",");
+        HashMap<ColumnFamilyStore, Descriptor> descriptors = Maps.newHashMap();
+
+        for (String filename : filenames)
+        {
+            // extract keyspace and columnfamily name from filename
+            Descriptor desc = Descriptor.fromFilename(filename.trim());
+            if (Schema.instance.getCFMetaData(desc) == null)
+            {
+                logger.warn("Schema does not exist for file {}. Skipping.", filename);
+                continue;
+            }
+            // group by keyspace/columnfamily
+            ColumnFamilyStore cfs = Keyspace.open(desc.ksname).getColumnFamilyStore(desc.cfname);
+            desc = cfs.getDirectories().find(new File(filename.trim()).getName());
+            if (desc != null)
+                descriptors.put(cfs, desc);
+        }
+
+        for (Map.Entry<ColumnFamilyStore,Descriptor> entry : descriptors.entrySet())
+        {
+            ColumnFamilyStore cfs = entry.getKey();
+            Keyspace keyspace = cfs.keyspace;
+            Collection<Range<Token>> ranges = StorageService.instance.getLocalRanges(keyspace.getName());
+            boolean hasIndexes = cfs.indexManager.hasIndexes();
+            SSTableReader sstable = lookupSSTable(cfs, entry.getValue());
+
+            if (ranges.isEmpty())
+            {
+                logger.error("Cleanup cannot run before a node has joined the ring");
+                return;
+            }
+
+            if (sstable == null)
+            {
+                logger.warn("Will not clean {}, it is not an active sstable", entry.getValue());
+            }
+            else
+            {
+                CleanupStrategy cleanupStrategy = CleanupStrategy.get(cfs, ranges, FBUtilities.nowInSeconds());
+                try (LifecycleTransaction txn = cfs.getTracker().tryModify(sstable, OperationType.CLEANUP))
+                {
+                    doCleanupOne(cfs, txn, cleanupStrategy, ranges, hasIndexes);
+                }
+                catch (IOException e)
+                {
+                    logger.error(String.format("forceUserDefinedCleanup failed: %s", e.getLocalizedMessage()));
+                }
+            }
+        }
+    }
+
     public Future<?> submitUserDefined(final ColumnFamilyStore cfs, final Collection<Descriptor> dataFiles, final int gcBefore)
     {
         Runnable runnable = new WrappedRunnable()
         {
-            protected void runMayThrow() throws IOException
+            protected void runMayThrow()
             {
                 // look up the sstables now that we're on the compaction executor, so we don't try to re-compact
                 // something that was already being compacted earlier.
@@ -673,9 +800,12 @@
                 }
                 else
                 {
-                    AbstractCompactionTask task = cfs.getCompactionStrategyManager().getUserDefinedTask(sstables, gcBefore);
-                    if (task != null)
-                        task.execute(metrics);
+                    List<AbstractCompactionTask> tasks = cfs.getCompactionStrategyManager().getUserDefinedTasks(sstables, gcBefore);
+                    for (AbstractCompactionTask task : tasks)
+                    {
+                        if (task != null)
+                            task.execute(metrics);
+                    }
                 }
             }
         };
@@ -858,13 +988,11 @@
 
         logger.info("Cleaning up {}", sstable);
 
-        File compactionFileLocation = cfs.getDirectories().getWriteableLocationAsFile(cfs.getExpectedCompactedFileSize(txn.originals(), OperationType.CLEANUP));
-        if (compactionFileLocation == null)
-            throw new IOException("disk full");
+        File compactionFileLocation = sstable.descriptor.directory;
 
         List<SSTableReader> finished;
         int nowInSec = FBUtilities.nowInSeconds();
-        try (SSTableRewriter writer = SSTableRewriter.construct(cfs, txn, false, sstable.maxDataAge, false);
+        try (SSTableRewriter writer = SSTableRewriter.construct(cfs, txn, false, sstable.maxDataAge);
              ISSTableScanner scanner = cleanupStrategy.getScanner(sstable, getRateLimiter());
              CompactionController controller = new CompactionController(cfs, txn.originals(), getDefaultGcBefore(cfs, nowInSec));
              CompactionIterator ci = new CompactionIterator(OperationType.CLEANUP, Collections.singletonList(scanner), controller, nowInSec, UUIDGen.getTimeUUID(), metrics))
@@ -895,14 +1023,15 @@
 
         if (!finished.isEmpty())
         {
-            String format = "Cleaned up to %s.  %,d to %,d (~%d%% of original) bytes for %,d keys.  Time: %,dms.";
+            String format = "Cleaned up to %s.  %s to %s (~%d%% of original) for %,d keys.  Time: %,dms.";
             long dTime = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
             long startsize = sstable.onDiskLength();
             long endsize = 0;
             for (SSTableReader newSstable : finished)
                 endsize += newSstable.onDiskLength();
             double ratio = (double) endsize / (double) startsize;
-            logger.info(String.format(format, finished.get(0).getFilename(), startsize, endsize, (int) (ratio * 100), totalkeysWritten, dTime));
+            logger.info(String.format(format, finished.get(0).getFilename(), FBUtilities.prettyPrintMemory(startsize),
+                                      FBUtilities.prettyPrintMemory(endsize), (int) (ratio * 100), totalkeysWritten, dTime));
         }
 
     }
@@ -1004,6 +1133,7 @@
                                     repairedAt,
                                     sstable.getSSTableLevel(),
                                     header,
+                                    cfs.indexManager.listIndexes(),
                                     txn);
     }
 
@@ -1036,6 +1166,7 @@
                                     cfs.metadata,
                                     new MetadataCollector(sstables, cfs.metadata.comparator, minLevel),
                                     SerializationHeader.make(cfs.metadata, sstables),
+                                    cfs.indexManager.listIndexes(),
                                     txn);
     }
 
@@ -1090,7 +1221,7 @@
                 StorageService.instance.forceKeyspaceFlush(cfs.keyspace.getName(), cfs.name);
                 sstables = getSSTablesToValidate(cfs, validator);
                 if (sstables == null)
-                    return; // this means the parent repair session was removed - the repair session failed on another node and we removed i
+                    return; // this means the parent repair session was removed - the repair session failed on another node and we removed it
                 if (validator.gcBefore > 0)
                     gcBefore = validator.gcBefore;
                 else
@@ -1265,8 +1396,8 @@
         int nowInSec = FBUtilities.nowInSeconds();
 
         CompactionStrategyManager strategy = cfs.getCompactionStrategyManager();
-        try (SSTableRewriter repairedSSTableWriter = new SSTableRewriter(anticompactionGroup, groupMaxDataAge, false, false);
-             SSTableRewriter unRepairedSSTableWriter = new SSTableRewriter(anticompactionGroup, groupMaxDataAge, false, false);
+        try (SSTableRewriter repairedSSTableWriter = SSTableRewriter.constructWithoutEarlyOpening(anticompactionGroup, false, groupMaxDataAge);
+             SSTableRewriter unRepairedSSTableWriter = SSTableRewriter.constructWithoutEarlyOpening(anticompactionGroup, false, groupMaxDataAge);
              AbstractCompactionStrategy.ScannerList scanners = strategy.getScanners(anticompactionGroup.originals());
              CompactionController controller = new CompactionController(cfs, sstableAsSet, getDefaultGcBefore(cfs, nowInSec));
              CompactionIterator ci = new CompactionIterator(OperationType.ANTICOMPACTION, scanners.scanners, controller, nowInSec, UUIDGen.getTimeUUID(), metrics))
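
A hedged sketch (not part of the patch) of driving the new relocate operation added above; the class, keyspace and table names are made up, and only the relocateSSTables(ColumnFamilyStore, int) entry point introduced in this hunk is assumed.

    import java.util.concurrent.ExecutionException;

    import org.apache.cassandra.db.ColumnFamilyStore;
    import org.apache.cassandra.db.Keyspace;
    import org.apache.cassandra.db.compaction.CompactionManager;

    public class RelocateSketch
    {
        public static void relocate(String keyspace, String table) throws ExecutionException, InterruptedException
        {
            ColumnFamilyStore cfs = Keyspace.open(keyspace).getColumnFamilyStore(table);

            // Rewrite sstables whose keys fall outside their disk's boundaries, two jobs in parallel.
            // The operation aborts if the partitioner cannot split ranges or the node has not joined the ring.
            CompactionManager.AllSSTableOpStatus status = CompactionManager.instance.relocateSSTables(cfs, 2);
            if (status != CompactionManager.AllSSTableOpStatus.SUCCESSFUL)
                System.out.println("relocation did not complete: " + status);
        }
    }
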
diff --git a/src/java/org/apache/cassandra/db/compaction/CompactionManagerMBean.java b/src/java/org/apache/cassandra/db/compaction/CompactionManagerMBean.java
index d5da0fe..bb67d5f 100644
--- a/src/java/org/apache/cassandra/db/compaction/CompactionManagerMBean.java
+++ b/src/java/org/apache/cassandra/db/compaction/CompactionManagerMBean.java
@@ -45,6 +45,17 @@
     public void forceUserDefinedCompaction(String dataFiles);
 
     /**
+     * Triggers cleanup of user specified sstables.
+     * You can specify files from various keyspaces and columnfamilies;
+     * if you do so, cleanup is performed on each file individually.
+     *
+     * @param dataFiles a comma separated list of sstable files to clean up.
+     *                  Each file must contain the keyspace and columnfamily name in its path (for 2.1+) or in the file name itself.
+     */
+    public void forceUserDefinedCleanup(String dataFiles);
+
+    /**
      * Stop all running compaction-like tasks having the provided {@code type}.
      * @param type the type of compaction to stop. Can be one of:
      *   - COMPACTION
diff --git a/src/java/org/apache/cassandra/db/compaction/CompactionStrategyManager.java b/src/java/org/apache/cassandra/db/compaction/CompactionStrategyManager.java
index 444d43d..b6d31d5 100644
--- a/src/java/org/apache/cassandra/db/compaction/CompactionStrategyManager.java
+++ b/src/java/org/apache/cassandra/db/compaction/CompactionStrategyManager.java
@@ -20,8 +20,14 @@
 
 import java.util.*;
 import java.util.concurrent.Callable;
+import java.util.concurrent.locks.ReentrantReadWriteLock;
+import java.util.stream.Collectors;
+import java.util.stream.Stream;
 
 import com.google.common.collect.Iterables;
+import org.apache.cassandra.index.Index;
+import com.google.common.primitives.Ints;
+
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
@@ -30,6 +36,7 @@
 import org.apache.cassandra.db.Directories;
 import org.apache.cassandra.db.Memtable;
 import org.apache.cassandra.db.SerializationHeader;
+import org.apache.cassandra.db.PartitionPosition;
 import org.apache.cassandra.db.lifecycle.LifecycleTransaction;
 import org.apache.cassandra.db.lifecycle.SSTableSet;
 import org.apache.cassandra.dht.Range;
@@ -42,22 +49,29 @@
 import org.apache.cassandra.notifications.*;
 import org.apache.cassandra.schema.CompactionParams;
 import org.apache.cassandra.service.ActiveRepairService;
+import org.apache.cassandra.service.StorageService;
 
 /**
  * Manages the compaction strategies.
  *
- * Currently has two instances of actual compaction strategies - one for repaired data and one for
+ * Currently has two instances of actual compaction strategies per data directory - one for repaired data and one for
  * unrepaired data. This is done to be able to totally separate the different sets of sstables.
  */
+
 public class CompactionStrategyManager implements INotificationConsumer
 {
     private static final Logger logger = LoggerFactory.getLogger(CompactionStrategyManager.class);
+    public final CompactionLogger compactionLogger;
     private final ColumnFamilyStore cfs;
-    private volatile AbstractCompactionStrategy repaired;
-    private volatile AbstractCompactionStrategy unrepaired;
+    private final List<AbstractCompactionStrategy> repaired = new ArrayList<>();
+    private final List<AbstractCompactionStrategy> unrepaired = new ArrayList<>();
     private volatile boolean enabled = true;
-    public boolean isActive = true;
+    private volatile boolean isActive = true;
     private volatile CompactionParams params;
+    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
+    private final ReentrantReadWriteLock.ReadLock readLock = lock.readLock();
+    private final ReentrantReadWriteLock.WriteLock writeLock = lock.writeLock();
+
     /*
         We keep a copy of the schema compaction parameters here to be able to decide if we
         should update the compaction strategy in maybeReloadCompactionStrategy() due to an ALTER.
@@ -65,15 +79,18 @@
         If a user changes the local compaction strategy and then later ALTERs a compaction parameter,
         we will use the new compaction parameters.
      */
-    private CompactionParams schemaCompactionParams;
+    private volatile CompactionParams schemaCompactionParams;
+    private Directories.DataDirectory[] locations;
 
     public CompactionStrategyManager(ColumnFamilyStore cfs)
     {
         cfs.getTracker().subscribe(this);
         logger.trace("{} subscribed to the data tracker.", this);
         this.cfs = cfs;
+        this.compactionLogger = new CompactionLogger(cfs, this);
         reload(cfs.metadata);
         params = cfs.metadata.params.compaction;
+        locations = getDirectories().getWriteableLocations();
         enabled = params.isEnabled();
     }
 
@@ -83,27 +100,32 @@
      * Returns a task for the compaction strategy that needs it the most (most estimated remaining tasks)
      *
      */
-    public synchronized AbstractCompactionTask getNextBackgroundTask(int gcBefore)
+    public AbstractCompactionTask getNextBackgroundTask(int gcBefore)
     {
-        if (!isEnabled())
-            return null;
-
-        maybeReload(cfs.metadata);
-
-        if (repaired.getEstimatedRemainingTasks() > unrepaired.getEstimatedRemainingTasks())
+        readLock.lock();
+        try
         {
-            AbstractCompactionTask repairedTask = repaired.getNextBackgroundTask(gcBefore);
-            if (repairedTask != null)
-                return repairedTask;
-            return unrepaired.getNextBackgroundTask(gcBefore);
+            if (!isEnabled())
+                return null;
+
+            maybeReload(cfs.metadata);
+            List<AbstractCompactionStrategy> strategies = new ArrayList<>();
+
+            strategies.addAll(repaired);
+            strategies.addAll(unrepaired);
+            Collections.sort(strategies, (o1, o2) -> Ints.compare(o2.getEstimatedRemainingTasks(), o1.getEstimatedRemainingTasks()));
+            for (AbstractCompactionStrategy strategy : strategies)
+            {
+                AbstractCompactionTask task = strategy.getNextBackgroundTask(gcBefore);
+                if (task != null)
+                    return task;
+            }
         }
-        else
+        finally
         {
-            AbstractCompactionTask unrepairedTask = unrepaired.getNextBackgroundTask(gcBefore);
-            if (unrepairedTask != null)
-                return unrepairedTask;
-            return repaired.getNextBackgroundTask(gcBefore);
+            readLock.unlock();
         }
+        return null;
     }
 
     public boolean isEnabled()
@@ -111,9 +133,22 @@
         return enabled && isActive;
     }
 
-    public synchronized void resume()
+    public boolean isActive()
     {
-        isActive = true;
+        return isActive;
+    }
+
+    public void resume()
+    {
+        writeLock.lock();
+        try
+        {
+            isActive = true;
+        }
+        finally
+        {
+            writeLock.unlock();
+        }
     }
 
     /**
@@ -121,51 +156,137 @@
      *
      * Separate call from enable/disable to not have to save the enabled-state externally
       */
-    public synchronized void pause()
+    public void pause()
     {
-        isActive = false;
-    }
+        writeLock.lock();
+        try
+        {
+            isActive = false;
+        }
+        finally
+        {
+            writeLock.unlock();
+        }
 
+    }
 
     private void startup()
     {
-        for (SSTableReader sstable : cfs.getSSTables(SSTableSet.CANONICAL))
+        writeLock.lock();
+        try
         {
-            if (sstable.openReason != SSTableReader.OpenReason.EARLY)
-                getCompactionStrategyFor(sstable).addSSTable(sstable);
+            for (SSTableReader sstable : cfs.getSSTables(SSTableSet.CANONICAL))
+            {
+                if (sstable.openReason != SSTableReader.OpenReason.EARLY)
+                    getCompactionStrategyFor(sstable).addSSTable(sstable);
+            }
+            repaired.forEach(AbstractCompactionStrategy::startup);
+            unrepaired.forEach(AbstractCompactionStrategy::startup);
         }
-        repaired.startup();
-        unrepaired.startup();
+        finally
+        {
+            writeLock.unlock();
+        }
+        if (Stream.concat(repaired.stream(), unrepaired.stream()).anyMatch(cs -> cs.logAll))
+            compactionLogger.enable();
     }
 
     /**
      * return the compaction strategy for the given sstable
      *
-     * returns differently based on the repaired status
+     * returns differently based on the repaired status and which data directory (disk) the sstable belongs to
      * @param sstable
      * @return
      */
-    private AbstractCompactionStrategy getCompactionStrategyFor(SSTableReader sstable)
+    public AbstractCompactionStrategy getCompactionStrategyFor(SSTableReader sstable)
     {
-        if (sstable.isRepaired())
-            return repaired;
-        else
-            return unrepaired;
+        int index = getCompactionStrategyIndex(cfs, getDirectories(), sstable);
+        readLock.lock();
+        try
+        {
+            if (sstable.isRepaired())
+                return repaired.get(index);
+            else
+                return unrepaired.get(index);
+        }
+        finally
+        {
+            readLock.unlock();
+        }
+    }
+
+    /**
+     * Get the index of the compaction strategy instance that should handle the given sstable. If the sstable's
+     * first token falls within a disk boundary, the sstable belongs to that disk's strategy instance.
+     *
+     * If we are upgrading, the first compaction strategy will get most files - we do not care which disk the
+     * sstable is currently on (unless we do not know the local tokens yet). Once we start compacting, we write
+     * sstables to the correct locations and hand them to the correct compaction strategy instance.
+     *
+     * @param cfs
+     * @param locations
+     * @param sstable
+     * @return
+     */
+    public static int getCompactionStrategyIndex(ColumnFamilyStore cfs, Directories locations, SSTableReader sstable)
+    {
+        if (!cfs.getPartitioner().splitter().isPresent())
+            return 0;
+
+        List<PartitionPosition> boundaries = StorageService.getDiskBoundaries(cfs, locations.getWriteableLocations());
+        if (boundaries == null)
+        {
+            Directories.DataDirectory[] directories = locations.getWriteableLocations();
+
+            // try to figure out location based on sstable directory:
+            for (int i = 0; i < directories.length; i++)
+            {
+                Directories.DataDirectory directory = directories[i];
+                if (sstable.descriptor.directory.getAbsolutePath().startsWith(directory.location.getAbsolutePath()))
+                    return i;
+            }
+            return 0;
+        }
+
+        int pos = Collections.binarySearch(boundaries, sstable.first);
+        assert pos < 0; // boundaries are .minkeybound and .maxkeybound so they should never be equal
+        return -pos - 1;
     }
 
     public void shutdown()
     {
-        isActive = false;
-        repaired.shutdown();
-        unrepaired.shutdown();
+        writeLock.lock();
+        try
+        {
+            isActive = false;
+            repaired.forEach(AbstractCompactionStrategy::shutdown);
+            unrepaired.forEach(AbstractCompactionStrategy::shutdown);
+            compactionLogger.disable();
+        }
+        finally
+        {
+            writeLock.unlock();
+        }
     }
 
-    public synchronized void maybeReload(CFMetaData metadata)
+    public void maybeReload(CFMetaData metadata)
     {
         // compare the old schema configuration to the new one, ignore any locally set changes.
-        if (metadata.params.compaction.equals(schemaCompactionParams))
+        if (metadata.params.compaction.equals(schemaCompactionParams) &&
+            Arrays.equals(locations, cfs.getDirectories().getWriteableLocations())) // any drives broken?
             return;
-        reload(metadata);
+
+        writeLock.lock();
+        try
+        {
+            reload(metadata);
+        }
+        finally
+        {
+            writeLock.unlock();
+        }
     }
 
     /**
@@ -174,9 +295,14 @@
      * Called after changing configuration and at startup.
      * @param metadata
      */
-    public synchronized void reload(CFMetaData metadata)
+    private void reload(CFMetaData metadata)
     {
         boolean disabledWithJMX = !enabled && shouldBeEnabled();
+        if (!metadata.params.compaction.equals(schemaCompactionParams))
+            logger.trace("Recreating compaction strategy - compaction parameters changed for {}.{}", cfs.keyspace.getName(), cfs.getTableName());
+        else if (!Arrays.equals(locations, cfs.getDirectories().getWriteableLocations()))
+            logger.trace("Recreating compaction strategy - writeable locations changed for {}.{}", cfs.keyspace.getName(), cfs.getTableName());
+
         setStrategy(metadata.params.compaction);
         schemaCompactionParams = metadata.params.compaction;
 
@@ -193,26 +319,50 @@
 
     public int getUnleveledSSTables()
     {
-        if (repaired instanceof LeveledCompactionStrategy && unrepaired instanceof LeveledCompactionStrategy)
+        readLock.lock();
+        try
         {
-            int count = 0;
-            count += ((LeveledCompactionStrategy)repaired).getLevelSize(0);
-            count += ((LeveledCompactionStrategy)unrepaired).getLevelSize(0);
-            return count;
+            if (repaired.get(0) instanceof LeveledCompactionStrategy && unrepaired.get(0) instanceof LeveledCompactionStrategy)
+            {
+                int count = 0;
+                for (AbstractCompactionStrategy strategy : repaired)
+                    count += ((LeveledCompactionStrategy) strategy).getLevelSize(0);
+                for (AbstractCompactionStrategy strategy : unrepaired)
+                    count += ((LeveledCompactionStrategy) strategy).getLevelSize(0);
+                return count;
+            }
+        }
+        finally
+        {
+            readLock.unlock();
         }
         return 0;
     }
 
-    public synchronized int[] getSSTableCountPerLevel()
+    public int[] getSSTableCountPerLevel()
     {
-        if (repaired instanceof LeveledCompactionStrategy && unrepaired instanceof LeveledCompactionStrategy)
+        readLock.lock();
+        try
         {
-            int [] res = new int[LeveledManifest.MAX_LEVEL_COUNT];
-            int[] repairedCountPerLevel = ((LeveledCompactionStrategy) repaired).getAllLevelSize();
-            res = sumArrays(res, repairedCountPerLevel);
-            int[] unrepairedCountPerLevel = ((LeveledCompactionStrategy) unrepaired).getAllLevelSize();
-            res = sumArrays(res, unrepairedCountPerLevel);
-            return res;
+            if (repaired.get(0) instanceof LeveledCompactionStrategy && unrepaired.get(0) instanceof LeveledCompactionStrategy)
+            {
+                int[] res = new int[LeveledManifest.MAX_LEVEL_COUNT];
+                for (AbstractCompactionStrategy strategy : repaired)
+                {
+                    int[] repairedCountPerLevel = ((LeveledCompactionStrategy) strategy).getAllLevelSize();
+                    res = sumArrays(res, repairedCountPerLevel);
+                }
+                for (AbstractCompactionStrategy strategy : unrepaired)
+                {
+                    int[] unrepairedCountPerLevel = ((LeveledCompactionStrategy) strategy).getAllLevelSize();
+                    res = sumArrays(res, unrepairedCountPerLevel);
+                }
+                return res;
+            }
+        }
+        finally
+        {
+            readLock.unlock();
         }
         return null;
     }
@@ -234,119 +384,205 @@
 
     public boolean shouldDefragment()
     {
-        assert repaired.getClass().equals(unrepaired.getClass());
-        return repaired.shouldDefragment();
+        readLock.lock();
+        try
+        {
+            assert repaired.get(0).getClass().equals(unrepaired.get(0).getClass());
+            return repaired.get(0).shouldDefragment();
+        }
+        finally
+        {
+            readLock.unlock();
+        }
     }
 
     public Directories getDirectories()
     {
-        assert repaired.getClass().equals(unrepaired.getClass());
-        return repaired.getDirectories();
+        readLock.lock();
+        try
+        {
+            assert repaired.get(0).getClass().equals(unrepaired.get(0).getClass());
+            return repaired.get(0).getDirectories();
+        }
+        finally
+        {
+            readLock.unlock();
+        }
     }
 
-    public synchronized void handleNotification(INotification notification, Object sender)
+    private void handleFlushNotification(Iterable<SSTableReader> added)
     {
+        readLock.lock();
+        try
+        {
+            for (SSTableReader sstable : added)
+                getCompactionStrategyFor(sstable).addSSTable(sstable);
+        }
+        finally
+        {
+            readLock.unlock();
+        }
+    }
+
+    private void handleListChangedNotification(Iterable<SSTableReader> added, Iterable<SSTableReader> removed)
+    {
+        // a bit of gymnastics to be able to replace sstables in compaction strategies
+        // we use this to know that a compaction finished and where to start the next compaction in LCS
+        Directories.DataDirectory [] locations = cfs.getDirectories().getWriteableLocations();
+        int locationSize = cfs.getPartitioner().splitter().isPresent() ? locations.length : 1;
+
+        List<Set<SSTableReader>> repairedRemoved = new ArrayList<>(locationSize);
+        List<Set<SSTableReader>> repairedAdded = new ArrayList<>(locationSize);
+        List<Set<SSTableReader>> unrepairedRemoved = new ArrayList<>(locationSize);
+        List<Set<SSTableReader>> unrepairedAdded = new ArrayList<>(locationSize);
+
+        for (int i = 0; i < locationSize; i++)
+        {
+            repairedRemoved.add(new HashSet<>());
+            repairedAdded.add(new HashSet<>());
+            unrepairedRemoved.add(new HashSet<>());
+            unrepairedAdded.add(new HashSet<>());
+        }
+
+        for (SSTableReader sstable : removed)
+        {
+            int i = getCompactionStrategyIndex(cfs, getDirectories(), sstable);
+            if (sstable.isRepaired())
+                repairedRemoved.get(i).add(sstable);
+            else
+                unrepairedRemoved.get(i).add(sstable);
+        }
+        for (SSTableReader sstable : added)
+        {
+            int i = getCompactionStrategyIndex(cfs, getDirectories(), sstable);
+            if (sstable.isRepaired())
+                repairedAdded.get(i).add(sstable);
+            else
+                unrepairedAdded.get(i).add(sstable);
+        }
+        // we need write lock here since we might be moving sstables between strategies
+        writeLock.lock();
+        try
+        {
+            for (int i = 0; i < locationSize; i++)
+            {
+                if (!repairedRemoved.get(i).isEmpty())
+                    repaired.get(i).replaceSSTables(repairedRemoved.get(i), repairedAdded.get(i));
+                else
+                    repaired.get(i).addSSTables(repairedAdded.get(i));
+
+                if (!unrepairedRemoved.get(i).isEmpty())
+                    unrepaired.get(i).replaceSSTables(unrepairedRemoved.get(i), unrepairedAdded.get(i));
+                else
+                    unrepaired.get(i).addSSTables(unrepairedAdded.get(i));
+            }
+        }
+        finally
+        {
+            writeLock.unlock();
+        }
+    }
+
+    private void handleRepairStatusChangedNotification(Iterable<SSTableReader> sstables)
+    {
+        // we need a write lock here since we move sstables from one strategy instance to another
+        writeLock.lock();
+        try
+        {
+            for (SSTableReader sstable : sstables)
+            {
+                int index = getCompactionStrategyIndex(cfs, getDirectories(), sstable);
+                if (sstable.isRepaired())
+                {
+                    unrepaired.get(index).removeSSTable(sstable);
+                    repaired.get(index).addSSTable(sstable);
+                }
+                else
+                {
+                    repaired.get(index).removeSSTable(sstable);
+                    unrepaired.get(index).addSSTable(sstable);
+                }
+            }
+        }
+        finally
+        {
+            writeLock.unlock();
+        }
+    }
+
+    private void handleDeletingNotification(SSTableReader deleted)
+    {
+        readLock.lock();
+        try
+        {
+            getCompactionStrategyFor(deleted).removeSSTable(deleted);
+        }
+        finally
+        {
+            readLock.unlock();
+        }
+    }
+
+    public void handleNotification(INotification notification, Object sender)
+    {
+        maybeReload(cfs.metadata);
         if (notification instanceof SSTableAddedNotification)
         {
-            SSTableAddedNotification flushedNotification = (SSTableAddedNotification) notification;
-            for (SSTableReader sstable : flushedNotification.added)
-            {
-                if (sstable.isRepaired())
-                    repaired.addSSTable(sstable);
-                else
-                    unrepaired.addSSTable(sstable);
-            }
+            handleFlushNotification(((SSTableAddedNotification) notification).added);
         }
         else if (notification instanceof SSTableListChangedNotification)
         {
             SSTableListChangedNotification listChangedNotification = (SSTableListChangedNotification) notification;
-            Set<SSTableReader> repairedRemoved = new HashSet<>();
-            Set<SSTableReader> repairedAdded = new HashSet<>();
-            Set<SSTableReader> unrepairedRemoved = new HashSet<>();
-            Set<SSTableReader> unrepairedAdded = new HashSet<>();
-
-            for (SSTableReader sstable : listChangedNotification.removed)
-            {
-                if (sstable.isRepaired())
-                    repairedRemoved.add(sstable);
-                else
-                    unrepairedRemoved.add(sstable);
-            }
-            for (SSTableReader sstable : listChangedNotification.added)
-            {
-                if (sstable.isRepaired())
-                    repairedAdded.add(sstable);
-                else
-                    unrepairedAdded.add(sstable);
-            }
-            if (!repairedRemoved.isEmpty())
-            {
-                repaired.replaceSSTables(repairedRemoved, repairedAdded);
-            }
-            else
-            {
-                for (SSTableReader sstable : repairedAdded)
-                    repaired.addSSTable(sstable);
-            }
-
-            if (!unrepairedRemoved.isEmpty())
-            {
-                unrepaired.replaceSSTables(unrepairedRemoved, unrepairedAdded);
-            }
-            else
-            {
-                for (SSTableReader sstable : unrepairedAdded)
-                    unrepaired.addSSTable(sstable);
-            }
+            handleListChangedNotification(listChangedNotification.added, listChangedNotification.removed);
         }
         else if (notification instanceof SSTableRepairStatusChanged)
         {
-            for (SSTableReader sstable : ((SSTableRepairStatusChanged) notification).sstable)
-            {
-                if (sstable.isRepaired())
-                {
-                    unrepaired.removeSSTable(sstable);
-                    repaired.addSSTable(sstable);
-                }
-                else
-                {
-                    repaired.removeSSTable(sstable);
-                    unrepaired.addSSTable(sstable);
-                }
-            }
+            handleRepairStatusChangedNotification(((SSTableRepairStatusChanged) notification).sstables);
         }
         else if (notification instanceof SSTableDeletingNotification)
         {
-            SSTableReader sstable = ((SSTableDeletingNotification)notification).deleting;
-            if (sstable.isRepaired())
-                repaired.removeSSTable(sstable);
-            else
-                unrepaired.removeSSTable(sstable);
+            handleDeletingNotification(((SSTableDeletingNotification) notification).deleting);
         }
     }
 
     public void enable()
     {
-        if (repaired != null)
-            repaired.enable();
-        if (unrepaired != null)
-            unrepaired.enable();
-        // enable this last to make sure the strategies are ready to get calls.
-        enabled = true;
+        writeLock.lock();
+        try
+        {
+            if (repaired != null)
+                repaired.forEach(AbstractCompactionStrategy::enable);
+            if (unrepaired != null)
+                unrepaired.forEach(AbstractCompactionStrategy::enable);
+            // enable this last to make sure the strategies are ready to get calls.
+            enabled = true;
+        }
+        finally
+        {
+            writeLock.unlock();
+        }
     }
 
     public void disable()
     {
-        // disable this first avoid asking disabled strategies for compaction tasks
-        enabled = false;
-        if (repaired != null)
-            repaired.disable();
-        if (unrepaired != null)
-            unrepaired.disable();
+        writeLock.lock();
+        try
+        {
+            // disable this first to avoid asking disabled strategies for compaction tasks
+            enabled = false;
+            if (repaired != null)
+                repaired.forEach(AbstractCompactionStrategy::disable);
+            if (unrepaired != null)
+                unrepaired.forEach(AbstractCompactionStrategy::disable);
+        }
+        finally
+        {
+            writeLock.unlock();
+        }
     }
 
     /**
-     * Create ISSTableScanner from the given sstables
+     * Create ISSTableScanners from the given sstables
      *
      * Delegates the call to the compaction strategies to allow LCS to create a scanner
      * @param sstables
@@ -354,89 +590,215 @@
      * @return
      */
     @SuppressWarnings("resource")
-    public synchronized AbstractCompactionStrategy.ScannerList getScanners(Collection<SSTableReader> sstables,  Collection<Range<Token>> ranges)
+    public AbstractCompactionStrategy.ScannerList getScanners(Collection<SSTableReader> sstables,  Collection<Range<Token>> ranges)
     {
-        List<SSTableReader> repairedSSTables = new ArrayList<>();
-        List<SSTableReader> unrepairedSSTables = new ArrayList<>();
+        assert repaired.size() == unrepaired.size();
+        List<Set<SSTableReader>> repairedSSTables = new ArrayList<>();
+        List<Set<SSTableReader>> unrepairedSSTables = new ArrayList<>();
+
+        for (int i = 0; i < repaired.size(); i++)
+        {
+            repairedSSTables.add(new HashSet<>());
+            unrepairedSSTables.add(new HashSet<>());
+        }
+
         for (SSTableReader sstable : sstables)
         {
             if (sstable.isRepaired())
-                repairedSSTables.add(sstable);
+                repairedSSTables.get(getCompactionStrategyIndex(cfs, getDirectories(), sstable)).add(sstable);
             else
-                unrepairedSSTables.add(sstable);
+                unrepairedSSTables.get(getCompactionStrategyIndex(cfs, getDirectories(), sstable)).add(sstable);
         }
 
-        Set<ISSTableScanner> scanners = new HashSet<>(sstables.size());
-        AbstractCompactionStrategy.ScannerList repairedScanners = repaired.getScanners(repairedSSTables, ranges);
-        AbstractCompactionStrategy.ScannerList unrepairedScanners = unrepaired.getScanners(unrepairedSSTables, ranges);
-        scanners.addAll(repairedScanners.scanners);
-        scanners.addAll(unrepairedScanners.scanners);
-        return new AbstractCompactionStrategy.ScannerList(new ArrayList<>(scanners));
+        List<ISSTableScanner> scanners = new ArrayList<>(sstables.size());
+
+        readLock.lock();
+        try
+        {
+            for (int i = 0; i < repairedSSTables.size(); i++)
+            {
+                if (!repairedSSTables.get(i).isEmpty())
+                    scanners.addAll(repaired.get(i).getScanners(repairedSSTables.get(i), ranges).scanners);
+            }
+            for (int i = 0; i < unrepairedSSTables.size(); i++)
+            {
+                if (!unrepairedSSTables.get(i).isEmpty())
+                    scanners.addAll(unrepaired.get(i).getScanners(unrepairedSSTables.get(i), ranges).scanners);
+            }
+
+            return new AbstractCompactionStrategy.ScannerList(scanners);
+        }
+        finally
+        {
+            readLock.unlock();
+        }
     }
 
-    public synchronized AbstractCompactionStrategy.ScannerList getScanners(Collection<SSTableReader> sstables)
+    public AbstractCompactionStrategy.ScannerList getScanners(Collection<SSTableReader> sstables)
     {
         return getScanners(sstables, null);
     }
 
     public Collection<Collection<SSTableReader>> groupSSTablesForAntiCompaction(Collection<SSTableReader> sstablesToGroup)
     {
-        return unrepaired.groupSSTablesForAntiCompaction(sstablesToGroup);
+        readLock.lock();
+        try
+        {
+            Map<Integer, List<SSTableReader>> groups = sstablesToGroup.stream().collect(Collectors.groupingBy((s) -> getCompactionStrategyIndex(cfs, getDirectories(), s)));
+            Collection<Collection<SSTableReader>> anticompactionGroups = new ArrayList<>();
+
+            for (Map.Entry<Integer, List<SSTableReader>> group : groups.entrySet())
+                anticompactionGroups.addAll(unrepaired.get(group.getKey()).groupSSTablesForAntiCompaction(group.getValue()));
+            return anticompactionGroups;
+        }
+        finally
+        {
+            readLock.unlock();
+        }
     }
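
Anti-compaction grouping works in two steps: first bucket the candidate sstables by their compaction strategy index (one index per writable data directory when the partitioner can split ranges), then let each per-disk unrepaired strategy group its own bucket. A self-contained sketch of the Collectors.groupingBy step, where diskIndexOf is a made-up stand-in for getCompactionStrategyIndex(cfs, directories, sstable):

    import java.util.Arrays;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    // Illustrative only: diskIndexOf is hypothetical, standing in for getCompactionStrategyIndex.
    public class GroupByDiskSketch
    {
        static int diskIndexOf(String sstable)
        {
            // pretend the trailing digit of the name encodes the data directory index
            return Character.getNumericValue(sstable.charAt(sstable.length() - 1));
        }

        public static void main(String[] args)
        {
            List<String> sstables = Arrays.asList("ma-17-big-0", "ma-18-big-1", "ma-19-big-0");
            Map<Integer, List<String>> byDisk =
                sstables.stream().collect(Collectors.groupingBy(GroupByDiskSketch::diskIndexOf));
            System.out.println(byDisk); // e.g. {0=[ma-17-big-0, ma-19-big-0], 1=[ma-18-big-1]}
        }
    }
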
 
     public long getMaxSSTableBytes()
     {
-        return unrepaired.getMaxSSTableBytes();
+        readLock.lock();
+        try
+        {
+            return unrepaired.get(0).getMaxSSTableBytes();
+        }
+        finally
+        {
+            readLock.unlock();
+        }
     }
 
     public AbstractCompactionTask getCompactionTask(LifecycleTransaction txn, int gcBefore, long maxSSTableBytes)
     {
+        maybeReload(cfs.metadata);
+        validateForCompaction(txn.originals(), cfs, getDirectories());
         return getCompactionStrategyFor(txn.originals().iterator().next()).getCompactionTask(txn, gcBefore, maxSSTableBytes);
     }
 
+    private static void validateForCompaction(Iterable<SSTableReader> input, ColumnFamilyStore cfs, Directories directories)
+    {
+        SSTableReader firstSSTable = Iterables.getFirst(input, null);
+        assert firstSSTable != null;
+        boolean repaired = firstSSTable.isRepaired();
+        int firstIndex = getCompactionStrategyIndex(cfs, directories, firstSSTable);
+        for (SSTableReader sstable : input)
+        {
+            if (sstable.isRepaired() != repaired)
+                throw new UnsupportedOperationException("You can't mix repaired and unrepaired data in a compaction");
+            if (firstIndex != getCompactionStrategyIndex(cfs, directories, sstable))
+                throw new UnsupportedOperationException("You can't mix sstables from different directories in a compaction");
+        }
+    }
+
     public Collection<AbstractCompactionTask> getMaximalTasks(final int gcBefore, final boolean splitOutput)
     {
+        maybeReload(cfs.metadata);
         // runWithCompactionsDisabled cancels active compactions and disables them, then we are able
         // to make the repaired/unrepaired strategies mark their own sstables as compacting. Once the
         // sstables are marked the compactions are re-enabled
         return cfs.runWithCompactionsDisabled(new Callable<Collection<AbstractCompactionTask>>()
         {
             @Override
-            public Collection<AbstractCompactionTask> call() throws Exception
+            public Collection<AbstractCompactionTask> call()
             {
-                synchronized (CompactionStrategyManager.this)
+                List<AbstractCompactionTask> tasks = new ArrayList<>();
+                readLock.lock();
+                try
                 {
-                    Collection<AbstractCompactionTask> repairedTasks = repaired.getMaximalTask(gcBefore, splitOutput);
-                    Collection<AbstractCompactionTask> unrepairedTasks = unrepaired.getMaximalTask(gcBefore, splitOutput);
-
-                    if (repairedTasks == null && unrepairedTasks == null)
-                        return null;
-
-                    if (repairedTasks == null)
-                        return unrepairedTasks;
-                    if (unrepairedTasks == null)
-                        return repairedTasks;
-
-                    List<AbstractCompactionTask> tasks = new ArrayList<>();
-                    tasks.addAll(repairedTasks);
-                    tasks.addAll(unrepairedTasks);
-                    return tasks;
+                    for (AbstractCompactionStrategy strategy : repaired)
+                    {
+                        Collection<AbstractCompactionTask> task = strategy.getMaximalTask(gcBefore, splitOutput);
+                        if (task != null)
+                            tasks.addAll(task);
+                    }
+                    for (AbstractCompactionStrategy strategy : unrepaired)
+                    {
+                        Collection<AbstractCompactionTask> task = strategy.getMaximalTask(gcBefore, splitOutput);
+                        if (task != null)
+                            tasks.addAll(task);
+                    }
                 }
+                finally
+                {
+                    readLock.unlock();
+                }
+                if (tasks.isEmpty())
+                    return null;
+                return tasks;
             }
         }, false, false);
     }
 
+    /**
+     * Return a list of compaction tasks corresponding to the sstables requested. Split the sstables according
+     * to whether they are repaired or not, and by disk location. Return a task per disk location and repair status
+     * group.
+     *
+     * @param sstables the sstables to compact
+     * @param gcBefore gc grace period, throw away tombstones older than this
+     * @return a list of compaction tasks corresponding to the sstables requested
+     */
+    public List<AbstractCompactionTask> getUserDefinedTasks(Collection<SSTableReader> sstables, int gcBefore)
+    {
+        maybeReload(cfs.metadata);
+        List<AbstractCompactionTask> ret = new ArrayList<>();
+
+        readLock.lock();
+        try
+        {
+            Map<Integer, List<SSTableReader>> repairedSSTables = sstables.stream()
+                                                                         .filter(s -> !s.isMarkedSuspect() && s.isRepaired())
+                                                                         .collect(Collectors.groupingBy((s) -> getCompactionStrategyIndex(cfs, getDirectories(), s)));
+
+            Map<Integer, List<SSTableReader>> unrepairedSSTables = sstables.stream()
+                                                                           .filter(s -> !s.isMarkedSuspect() && !s.isRepaired())
+                                                                           .collect(Collectors.groupingBy((s) -> getCompactionStrategyIndex(cfs, getDirectories(), s)));
+
+
+            for (Map.Entry<Integer, List<SSTableReader>> group : repairedSSTables.entrySet())
+                ret.add(repaired.get(group.getKey()).getUserDefinedTask(group.getValue(), gcBefore));
+
+            for (Map.Entry<Integer, List<SSTableReader>> group : unrepairedSSTables.entrySet())
+                ret.add(unrepaired.get(group.getKey()).getUserDefinedTask(group.getValue(), gcBefore));
+
+            return ret;
+        }
+        finally
+        {
+            readLock.unlock();
+        }
+    }
+
+    /**
+     * @deprecated use {@link #getUserDefinedTasks(Collection, int)} instead.
+     */
+    @Deprecated
     public AbstractCompactionTask getUserDefinedTask(Collection<SSTableReader> sstables, int gcBefore)
     {
-        return getCompactionStrategyFor(sstables.iterator().next()).getUserDefinedTask(sstables, gcBefore);
+        validateForCompaction(sstables, cfs, getDirectories());
+        List<AbstractCompactionTask> tasks = getUserDefinedTasks(sstables, gcBefore);
+        assert tasks.size() == 1;
+        return tasks.get(0);
     }
 
     public int getEstimatedRemainingTasks()
     {
         int tasks = 0;
-        tasks += repaired.getEstimatedRemainingTasks();
-        tasks += unrepaired.getEstimatedRemainingTasks();
+        readLock.lock();
+        try
+        {
 
+            for (AbstractCompactionStrategy strategy : repaired)
+                tasks += strategy.getEstimatedRemainingTasks();
+            for (AbstractCompactionStrategy strategy : unrepaired)
+                tasks += strategy.getEstimatedRemainingTasks();
+        }
+        finally
+        {
+            readLock.unlock();
+        }
         return tasks;
     }
 
@@ -447,33 +809,70 @@
 
     public String getName()
     {
-        return unrepaired.getName();
+        readLock.lock();
+        try
+        {
+            return unrepaired.get(0).getName();
+        }
+        finally
+        {
+            readLock.unlock();
+        }
     }
 
-    public List<AbstractCompactionStrategy> getStrategies()
+    public List<List<AbstractCompactionStrategy>> getStrategies()
     {
-        return Arrays.asList(repaired, unrepaired);
+        readLock.lock();
+        try
+        {
+            return Arrays.asList(repaired, unrepaired);
+        }
+        finally
+        {
+            readLock.unlock();
+        }
     }
 
-    public synchronized void setNewLocalCompactionStrategy(CompactionParams params)
+    public void setNewLocalCompactionStrategy(CompactionParams params)
     {
         logger.info("Switching local compaction strategy from {} to {}", this.params, params);
-        setStrategy(params);
-        if (shouldBeEnabled())
-            enable();
-        else
-            disable();
-        startup();
+        writeLock.lock();
+        try
+        {
+            setStrategy(params);
+            if (shouldBeEnabled())
+                enable();
+            else
+                disable();
+            startup();
+        }
+        finally
+        {
+            writeLock.unlock();
+        }
     }
 
     private void setStrategy(CompactionParams params)
     {
-        if (repaired != null)
-            repaired.shutdown();
-        if (unrepaired != null)
-            unrepaired.shutdown();
-        repaired = CFMetaData.createCompactionStrategyInstance(cfs, params);
-        unrepaired = CFMetaData.createCompactionStrategyInstance(cfs, params);
+        repaired.forEach(AbstractCompactionStrategy::shutdown);
+        unrepaired.forEach(AbstractCompactionStrategy::shutdown);
+        repaired.clear();
+        unrepaired.clear();
+
+        if (cfs.getPartitioner().splitter().isPresent())
+        {
+            locations = cfs.getDirectories().getWriteableLocations();
+            for (int i = 0; i < locations.length; i++)
+            {
+                repaired.add(CFMetaData.createCompactionStrategyInstance(cfs, params));
+                unrepaired.add(CFMetaData.createCompactionStrategyInstance(cfs, params));
+            }
+        }
+        else
+        {
+            repaired.add(CFMetaData.createCompactionStrategyInstance(cfs, params));
+            unrepaired.add(CFMetaData.createCompactionStrategyInstance(cfs, params));
+        }
         this.params = params;
     }
 
@@ -487,20 +886,63 @@
         return Boolean.parseBoolean(params.options().get(AbstractCompactionStrategy.ONLY_PURGE_REPAIRED_TOMBSTONES));
     }
 
-    public SSTableMultiWriter createSSTableMultiWriter(Descriptor descriptor, long keyCount, long repairedAt, MetadataCollector collector, SerializationHeader header, LifecycleTransaction txn)
+    public SSTableMultiWriter createSSTableMultiWriter(Descriptor descriptor,
+                                                       long keyCount,
+                                                       long repairedAt,
+                                                       MetadataCollector collector,
+                                                       SerializationHeader header,
+                                                       Collection<Index> indexes,
+                                                       LifecycleTransaction txn)
     {
-        if (repairedAt == ActiveRepairService.UNREPAIRED_SSTABLE)
+        readLock.lock();
+        try
         {
-            return unrepaired.createSSTableMultiWriter(descriptor, keyCount, repairedAt, collector, header, txn);
+            if (repairedAt == ActiveRepairService.UNREPAIRED_SSTABLE)
+            {
+                return unrepaired.get(0).createSSTableMultiWriter(descriptor, keyCount, repairedAt, collector, header, indexes, txn);
+            }
+            else
+            {
+                return repaired.get(0).createSSTableMultiWriter(descriptor, keyCount, repairedAt, collector, header, indexes, txn);
+            }
         }
-        else
+        finally
         {
-            return repaired.createSSTableMultiWriter(descriptor, keyCount, repairedAt, collector, header, txn);
+            readLock.unlock();
         }
     }
 
+    public boolean isRepaired(AbstractCompactionStrategy strategy)
+    {
+        return repaired.contains(strategy);
+    }
+
+    public List<String> getStrategyFolders(AbstractCompactionStrategy strategy)
+    {
+        Directories.DataDirectory[] locations = cfs.getDirectories().getWriteableLocations();
+        if (cfs.getPartitioner().splitter().isPresent())
+        {
+            int unrepairedIndex = unrepaired.indexOf(strategy);
+            if (unrepairedIndex >= 0)
+            {
+                return Collections.singletonList(locations[unrepairedIndex].location.getAbsolutePath());
+            }
+            int repairedIndex = repaired.indexOf(strategy);
+            if (repairedIndex >= 0)
+            {
+                return Collections.singletonList(locations[repairedIndex].location.getAbsolutePath());
+            }
+        }
+        List<String> folders = new ArrayList<>(locations.length);
+        for (Directories.DataDirectory location : locations)
+        {
+            folders.add(location.location.getAbsolutePath());
+        }
+        return folders;
+    }
+
     public boolean supportsEarlyOpen()
     {
-        return repaired.supportsEarlyOpen();
+        return repaired.get(0).supportsEarlyOpen();
     }
 }
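
Taken together, the changes to this class mean that when the partitioner supports splitting (i.e. JBOD with range-aware data directories), the manager keeps one repaired and one unrepaired strategy instance per writable data directory instead of a single pair, mirroring setStrategy() above. A rough, standalone sketch of that layout; createInstance here is a generic stand-in for CFMetaData.createCompactionStrategyInstance, and the real manager additionally guards both lists with the read/write lock:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Supplier;

    // Illustrative layout only.
    class PerDiskStrategies<S>
    {
        final List<S> repaired = new ArrayList<>();
        final List<S> unrepaired = new ArrayList<>();

        PerDiskStrategies(int writableDataDirectories, boolean partitionerCanSplit, Supplier<S> createInstance)
        {
            // one strategy pair per data directory when ranges can be split per disk, otherwise a single pair
            int instances = partitionerCanSplit ? writableDataDirectories : 1;
            for (int i = 0; i < instances; i++)
            {
                repaired.add(createInstance.get());
                unrepaired.add(createInstance.get());
            }
        }
    }
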
diff --git a/src/java/org/apache/cassandra/db/compaction/CompactionTask.java b/src/java/org/apache/cassandra/db/compaction/CompactionTask.java
index 9a7aa98..86c8a8f 100644
--- a/src/java/org/apache/cassandra/db/compaction/CompactionTask.java
+++ b/src/java/org/apache/cassandra/db/compaction/CompactionTask.java
@@ -49,21 +49,25 @@
 {
     protected static final Logger logger = LoggerFactory.getLogger(CompactionTask.class);
     protected final int gcBefore;
-    protected final boolean offline;
     protected final boolean keepOriginals;
     protected static long totalBytesCompacted = 0;
     private CompactionExecutorStatsCollector collector;
 
     public CompactionTask(ColumnFamilyStore cfs, LifecycleTransaction txn, int gcBefore)
     {
-        this(cfs, txn, gcBefore, false, false);
+        this(cfs, txn, gcBefore, false);
     }
 
+    @Deprecated
     public CompactionTask(ColumnFamilyStore cfs, LifecycleTransaction txn, int gcBefore, boolean offline, boolean keepOriginals)
     {
+        this(cfs, txn, gcBefore, keepOriginals);
+    }
+
+    public CompactionTask(ColumnFamilyStore cfs, LifecycleTransaction txn, int gcBefore, boolean keepOriginals)
+    {
         super(cfs, txn);
         this.gcBefore = gcBefore;
-        this.offline = offline;
         this.keepOriginals = keepOriginals;
     }
 
@@ -84,7 +88,7 @@
         if (partialCompactionsAcceptable() && transaction.originals().size() > 1)
         {
             // Try again w/o the largest one.
-            logger.warn("insufficient space to compact all requested files {}", StringUtils.join(transaction.originals(), ", "));
+            logger.warn("Insufficient space to compact all requested files {}", StringUtils.join(transaction.originals(), ", "));
             // Note that we have removed files that are still marked as compacting.
             // This is suboptimal but ok since the caller will unmark all the sstables at the end.
             SSTableReader removedSSTable = cfs.getMaxSizeFile(transaction.originals());
@@ -146,6 +150,7 @@
         logger.debug("Compacting ({}) {}", taskId, ssTableLoggerMsg);
 
         long start = System.nanoTime();
+        long startTime = System.currentTimeMillis();
         long totalKeysWritten = 0;
         long estimatedKeys = 0;
         try (CompactionController controller = getCompactionController(transaction.originals()))
@@ -155,6 +160,7 @@
             Collection<SSTableReader> newSStables;
 
             long[] mergedRowCounts;
+            long totalSourceCQLRows;
 
             // SSTableScanners need to be closed before markCompactedSSTablesReplaced call as scanners contain references
             // to both ifile and dfile and SSTR will throw deletion errors on Windows if it tries to delete before scanner is closed.
@@ -168,7 +174,7 @@
                     collector.beginCompaction(ci);
                 long lastCheckObsoletion = start;
 
-                if (!controller.cfs.getCompactionStrategyManager().isActive)
+                if (!controller.cfs.getCompactionStrategyManager().isActive())
                     throw new CompactionInterruptedException(ci.getCompactionInfo());
 
                 try (CompactionAwareWriter writer = getCompactionAwareWriter(cfs, getDirectories(), transaction, actuallyCompact))
@@ -198,11 +204,15 @@
                         collector.finishCompaction(ci);
 
                     mergedRowCounts = ci.getMergedRowCounts();
+
+                    totalSourceCQLRows = ci.getTotalSourceCQLRows();
                 }
             }
 
             // log a bunch of statistics about the result and save to system table compaction_history
-            long dTime = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
+
+            long durationInNano = System.nanoTime() - start;
+            long dTime = TimeUnit.NANOSECONDS.toMillis(durationInNano);
             long startsize = SSTableReader.getTotalBytes(transaction.originals());
             long endsize = SSTableReader.getTotalBytes(newSStables);
             double ratio = (double) endsize / (double) startsize;
@@ -211,15 +221,34 @@
             for (SSTableReader reader : newSStables)
                 newSSTableNames.append(reader.descriptor.baseFilename()).append(",");
 
-            double mbps = dTime > 0 ? (double) endsize / (1024 * 1024) / ((double) dTime / 1000) : 0;
             long totalSourceRows = 0;
-            String mergeSummary = updateCompactionHistory(cfs.keyspace.getName(), cfs.getColumnFamilyName(), mergedRowCounts, startsize, endsize);
-            logger.debug(String.format("Compacted (%s) %d sstables to [%s] to level=%d.  %,d bytes to %,d (~%d%% of original) in %,dms = %fMB/s.  %,d total partitions merged to %,d.  Partition merge counts were {%s}",
-                                      taskId, transaction.originals().size(), newSSTableNames.toString(), getLevel(), startsize, endsize, (int) (ratio * 100), dTime, mbps, totalSourceRows, totalKeysWritten, mergeSummary));
-            logger.trace(String.format("CF Total Bytes Compacted: %,d", CompactionTask.addToTotalBytesCompacted(endsize)));
-            logger.trace("Actual #keys: {}, Estimated #keys:{}, Err%: {}", totalKeysWritten, estimatedKeys, ((double)(totalKeysWritten - estimatedKeys)/totalKeysWritten));
+            for (int i = 0; i < mergedRowCounts.length; i++)
+                totalSourceRows += mergedRowCounts[i] * (i + 1);
 
-            if (offline)
+            String mergeSummary = updateCompactionHistory(cfs.keyspace.getName(), cfs.getColumnFamilyName(), mergedRowCounts, startsize, endsize);
+            logger.debug(String.format("Compacted (%s) %d sstables to [%s] to level=%d.  %s to %s (~%d%% of original) in %,dms.  Read Throughput = %s, Write Throughput = %s, Row Throughput = ~%,d/s.  %,d total partitions merged to %,d.  Partition merge counts were {%s}",
+                                      taskId,
+                                      transaction.originals().size(),
+                                      newSSTableNames.toString(),
+                                      getLevel(),
+                                      FBUtilities.prettyPrintMemory(startsize),
+                                      FBUtilities.prettyPrintMemory(endsize),
+                                      (int) (ratio * 100),
+                                      dTime,
+                                      FBUtilities.prettyPrintMemoryPerSecond(startsize, durationInNano),
+                                      FBUtilities.prettyPrintMemoryPerSecond(endsize, durationInNano),
+                                      (int) totalSourceCQLRows / (TimeUnit.NANOSECONDS.toSeconds(durationInNano) + 1),
+                                      totalSourceRows,
+                                      totalKeysWritten,
+                                      mergeSummary));
+            logger.trace(String.format("CF Total Bytes Compacted: %s", FBUtilities.prettyPrintMemory(CompactionTask.addToTotalBytesCompacted(endsize))));
+            logger.trace("Actual #keys: {}, Estimated #keys: {}, Err%: {}", totalKeysWritten, estimatedKeys, ((double)(totalKeysWritten - estimatedKeys)/totalKeysWritten));
+            cfs.getCompactionStrategyManager().compactionLogger.compaction(startTime, transaction.originals(), System.currentTimeMillis(), newSStables);
+
+            // update the metrics
+            cfs.metric.compactionBytesWritten.inc(endsize);
+
+            if (transaction.isOffline())
                 Refs.release(Refs.selfRefs(newSStables));
         }
     }
@@ -230,7 +259,7 @@
                                                           LifecycleTransaction transaction,
                                                           Set<SSTableReader> nonExpiredSSTables)
     {
-        return new DefaultCompactionWriter(cfs, directories, transaction, nonExpiredSSTables, offline, keepOriginals);
+        return new DefaultCompactionWriter(cfs, directories, transaction, nonExpiredSSTables, keepOriginals, getLevel());
     }
 
     public static String updateCompactionHistory(String keyspaceName, String columnFamilyName, long[] mergedRowCounts, long startSize, long endSize)
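
The reworked log line above replaces the single MB/s figure with separate read and write throughput, both derived from the same System.nanoTime() delta that feeds dTime. The arithmetic is just bytes over elapsed seconds; a minimal sketch (ratePerSecond is a hypothetical helper, not the FBUtilities method):

    import java.util.concurrent.TimeUnit;

    // Illustrative only: the rate calculation behind the new throughput fields.
    public class ThroughputSketch
    {
        static String ratePerSecond(long bytes, long durationInNano)
        {
            double seconds = durationInNano / 1_000_000_000.0;
            double mibPerSecond = seconds > 0 ? bytes / seconds / (1024.0 * 1024.0) : 0;
            return String.format("%.3fMiB/s", mibPerSecond);
        }

        public static void main(String[] args)
        {
            long endsize = 256L * 1024 * 1024;                  // pretend the new sstables total 256 MiB
            long durationInNano = TimeUnit.SECONDS.toNanos(8);  // ...written over 8 seconds
            System.out.println(ratePerSecond(endsize, durationInNano)); // e.g. 32.000MiB/s
        }
    }
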
diff --git a/src/java/org/apache/cassandra/db/compaction/DateTieredCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/DateTieredCompactionStrategy.java
index 3e6ae61..cfe0121 100644
--- a/src/java/org/apache/cassandra/db/compaction/DateTieredCompactionStrategy.java
+++ b/src/java/org/apache/cassandra/db/compaction/DateTieredCompactionStrategy.java
@@ -18,6 +18,7 @@
 package org.apache.cassandra.db.compaction;
 
 import java.util.*;
+import java.util.concurrent.TimeUnit;
 
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.base.Predicate;
@@ -32,6 +33,9 @@
 import org.apache.cassandra.io.sstable.format.SSTableReader;
 import org.apache.cassandra.schema.CompactionParams;
 import org.apache.cassandra.utils.Pair;
+import org.codehaus.jackson.JsonNode;
+import org.codehaus.jackson.node.JsonNodeFactory;
+import org.codehaus.jackson.node.ObjectNode;
 
 import static com.google.common.collect.Iterables.filter;
 
@@ -67,7 +71,7 @@
 
     @Override
     @SuppressWarnings("resource")
-    public synchronized AbstractCompactionTask getNextBackgroundTask(int gcBefore)
+    public AbstractCompactionTask getNextBackgroundTask(int gcBefore)
     {
         while (true)
         {
@@ -87,9 +91,9 @@
      * @param gcBefore
      * @return
      */
-    private List<SSTableReader> getNextBackgroundSSTables(final int gcBefore)
+    private synchronized List<SSTableReader> getNextBackgroundSSTables(final int gcBefore)
     {
-        if (Iterables.isEmpty(cfs.getSSTables(SSTableSet.LIVE)))
+        if (sstables.isEmpty())
             return Collections.emptyList();
 
         Set<SSTableReader> uncompacting = ImmutableSet.copyOf(filter(cfs.getUncompactingSSTables(), sstables::contains));
@@ -216,6 +220,7 @@
     {
         sstables.remove(sstable);
     }
+
     /**
      * A target time span used for bucketing SSTables based on timestamps.
      */
@@ -345,6 +350,7 @@
                     n += Math.ceil((double)stcsBucket.size() / cfs.getMaximumCompactionThreshold());
         }
         estimatedRemainingTasks = n;
+        cfs.getCompactionStrategyManager().compactionLogger.pending(this, n);
     }
 
 
@@ -456,6 +462,32 @@
         return uncheckedOptions;
     }
 
+    public CompactionLogger.Strategy strategyLogger()
+    {
+        return new CompactionLogger.Strategy()
+        {
+            public JsonNode sstable(SSTableReader sstable)
+            {
+                ObjectNode node = JsonNodeFactory.instance.objectNode();
+                node.put("min_timestamp", sstable.getMinTimestamp());
+                node.put("max_timestamp", sstable.getMaxTimestamp());
+                return node;
+            }
+
+            public JsonNode options()
+            {
+                ObjectNode node = JsonNodeFactory.instance.objectNode();
+                TimeUnit resolution = DateTieredCompactionStrategy.this.options.timestampResolution;
+                node.put(DateTieredCompactionStrategyOptions.TIMESTAMP_RESOLUTION_KEY,
+                         resolution.toString());
+                node.put(DateTieredCompactionStrategyOptions.BASE_TIME_KEY,
+                         resolution.toSeconds(DateTieredCompactionStrategy.this.options.baseTime));
+                node.put(DateTieredCompactionStrategyOptions.MAX_WINDOW_SIZE_KEY,
+                         resolution.toSeconds(DateTieredCompactionStrategy.this.options.maxWindowSize));
+                return node;
+            }
+        };
+    }
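
strategyLogger() is the hook the new CompactionLogger uses to attach strategy-specific details to each logged sstable; for DTCS that is the min/max timestamp pair, plus the resolution and window options. A tiny standalone example of building and serializing such a node with the same org.codehaus Jackson API (the timestamp values are made up):

    import org.codehaus.jackson.map.ObjectMapper;
    import org.codehaus.jackson.node.JsonNodeFactory;
    import org.codehaus.jackson.node.ObjectNode;

    // Illustrative only; values are invented.
    public class StrategyLoggerNodeSketch
    {
        public static void main(String[] args) throws Exception
        {
            ObjectNode node = JsonNodeFactory.instance.objectNode();
            node.put("min_timestamp", 1469000000000000L); // microseconds, like the default resolution
            node.put("max_timestamp", 1470000000000000L);
            System.out.println(new ObjectMapper().writeValueAsString(node));
            // {"min_timestamp":1469000000000000,"max_timestamp":1470000000000000}
        }
    }
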
+
     public String toString()
     {
         return String.format("DateTieredCompactionStrategy[%s/%s]",
diff --git a/src/java/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyOptions.java b/src/java/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyOptions.java
index 78a0cab..fee9e34 100644
--- a/src/java/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyOptions.java
+++ b/src/java/org/apache/cassandra/db/compaction/DateTieredCompactionStrategyOptions.java
@@ -44,6 +44,7 @@
 
     @Deprecated
     protected final long maxSSTableAge;
+    protected final TimeUnit timestampResolution;
     protected final long baseTime;
     protected final long expiredSSTableCheckFrequency;
     protected final long maxWindowSize;
@@ -51,7 +52,7 @@
     public DateTieredCompactionStrategyOptions(Map<String, String> options)
     {
         String optionValue = options.get(TIMESTAMP_RESOLUTION_KEY);
-        TimeUnit timestampResolution = optionValue == null ? DEFAULT_TIMESTAMP_RESOLUTION : TimeUnit.valueOf(optionValue);
+        timestampResolution = optionValue == null ? DEFAULT_TIMESTAMP_RESOLUTION : TimeUnit.valueOf(optionValue);
         if (timestampResolution != DEFAULT_TIMESTAMP_RESOLUTION)
             logger.warn("Using a non-default timestamp_resolution {} - are you really doing inserts with USING TIMESTAMP <non_microsecond_timestamp> (or driver equivalent)?", timestampResolution.toString());
         optionValue = options.get(MAX_SSTABLE_AGE_KEY);
@@ -68,9 +69,10 @@
     public DateTieredCompactionStrategyOptions()
     {
         maxSSTableAge = Math.round(DEFAULT_MAX_SSTABLE_AGE_DAYS * DEFAULT_TIMESTAMP_RESOLUTION.convert((long) DEFAULT_MAX_SSTABLE_AGE_DAYS, TimeUnit.DAYS));
-        baseTime = DEFAULT_TIMESTAMP_RESOLUTION.convert(DEFAULT_BASE_TIME_SECONDS, TimeUnit.SECONDS);
+        timestampResolution = DEFAULT_TIMESTAMP_RESOLUTION;
+        baseTime = timestampResolution.convert(DEFAULT_BASE_TIME_SECONDS, TimeUnit.SECONDS);
         expiredSSTableCheckFrequency = TimeUnit.MILLISECONDS.convert(DEFAULT_EXPIRED_SSTABLE_CHECK_FREQUENCY_SECONDS, TimeUnit.SECONDS);
-        maxWindowSize = DEFAULT_TIMESTAMP_RESOLUTION.convert(1, TimeUnit.DAYS);
+        maxWindowSize = timestampResolution.convert(1, TimeUnit.DAYS);
     }
 
     public static Map<String, String> validateOptions(Map<String, String> options, Map<String, String> uncheckedOptions) throws  ConfigurationException
diff --git a/src/java/org/apache/cassandra/db/compaction/LeveledCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/LeveledCompactionStrategy.java
index cd74620..b6ad64c 100644
--- a/src/java/org/apache/cassandra/db/compaction/LeveledCompactionStrategy.java
+++ b/src/java/org/apache/cassandra/db/compaction/LeveledCompactionStrategy.java
@@ -38,6 +38,9 @@
 import org.apache.cassandra.io.sstable.ISSTableScanner;
 import org.apache.cassandra.io.sstable.format.SSTableReader;
 import org.apache.cassandra.utils.FBUtilities;
+import org.codehaus.jackson.JsonNode;
+import org.codehaus.jackson.node.JsonNodeFactory;
+import org.codehaus.jackson.node.ObjectNode;
 
 public class LeveledCompactionStrategy extends AbstractCompactionStrategy
 {
@@ -90,7 +93,7 @@
      * (by explicit user request) even when compaction is disabled.
      */
     @SuppressWarnings("resource")
-    public synchronized AbstractCompactionTask getNextBackgroundTask(int gcBefore)
+    public AbstractCompactionTask getNextBackgroundTask(int gcBefore)
     {
         while (true)
         {
@@ -208,7 +211,9 @@
 
     public int getEstimatedRemainingTasks()
     {
-        return manifest.getEstimatedTasks();
+        int n = manifest.getEstimatedTasks();
+        cfs.getCompactionStrategyManager().compactionLogger.pending(this, n);
+        return n;
     }
 
     public long getMaxSSTableBytes()
@@ -444,6 +449,26 @@
         return null;
     }
 
+    public CompactionLogger.Strategy strategyLogger()
+    {
+        return new CompactionLogger.Strategy()
+        {
+            public JsonNode sstable(SSTableReader sstable)
+            {
+                ObjectNode node = JsonNodeFactory.instance.objectNode();
+                node.put("level", sstable.getSSTableLevel());
+                node.put("min_token", sstable.first.getToken().toString());
+                node.put("max_token", sstable.last.getToken().toString());
+                return node;
+            }
+
+            public JsonNode options()
+            {
+                return null;
+            }
+        };
+    }
+
     public static Map<String, String> validateOptions(Map<String, String> options) throws ConfigurationException
     {
         Map<String, String> uncheckedOptions = AbstractCompactionStrategy.validateOptions(options);
diff --git a/src/java/org/apache/cassandra/db/compaction/LeveledCompactionTask.java b/src/java/org/apache/cassandra/db/compaction/LeveledCompactionTask.java
index f8c3521..c633937 100644
--- a/src/java/org/apache/cassandra/db/compaction/LeveledCompactionTask.java
+++ b/src/java/org/apache/cassandra/db/compaction/LeveledCompactionTask.java
@@ -48,8 +48,8 @@
                                                           Set<SSTableReader> nonExpiredSSTables)
     {
         if (majorCompaction)
-            return new MajorLeveledCompactionWriter(cfs, directories, txn, nonExpiredSSTables, maxSSTableBytes, false, false);
-        return new MaxSSTableSizeWriter(cfs, directories, txn, nonExpiredSSTables, maxSSTableBytes, getLevel(), false, false);
+            return new MajorLeveledCompactionWriter(cfs, directories, txn, nonExpiredSSTables, maxSSTableBytes, false);
+        return new MaxSSTableSizeWriter(cfs, directories, txn, nonExpiredSSTables, maxSSTableBytes, getLevel(), false);
     }
 
     @Override
diff --git a/src/java/org/apache/cassandra/db/compaction/LeveledManifest.java b/src/java/org/apache/cassandra/db/compaction/LeveledManifest.java
index 2bfe88f..d7a7d0b 100644
--- a/src/java/org/apache/cassandra/db/compaction/LeveledManifest.java
+++ b/src/java/org/apache/cassandra/db/compaction/LeveledManifest.java
@@ -40,6 +40,7 @@
 import org.apache.cassandra.dht.Range;
 import org.apache.cassandra.dht.Token;
 import org.apache.cassandra.service.StorageService;
+import org.apache.cassandra.utils.FBUtilities;
 import org.apache.cassandra.utils.Pair;
 
 public class LeveledManifest
@@ -353,7 +354,7 @@
             // small in L0.
             return getSTCSInL0CompactionCandidate();
         }
-        return new CompactionCandidate(candidates, getNextLevel(candidates), cfs.getCompactionStrategyManager().getMaxSSTableBytes());
+        return new CompactionCandidate(candidates, getNextLevel(candidates), maxSSTableSizeInBytes);
     }
 
     private CompactionCandidate getSTCSInL0CompactionCandidate()
@@ -470,8 +471,11 @@
             {
                 if (!getLevel(i).isEmpty())
                 {
-                    logger.trace("L{} contains {} SSTables ({} bytes) in {}",
-                                 i, getLevel(i).size(), SSTableReader.getTotalBytes(getLevel(i)), this);
+                    logger.trace("L{} contains {} SSTables ({}) in {}",
+                                 i,
+                                 getLevel(i).size(),
+                                 FBUtilities.prettyPrintMemory(SSTableReader.getTotalBytes(getLevel(i))),
+                                 this);
                 }
             }
         }
@@ -513,25 +517,30 @@
         return overlapping(first, last, others);
     }
 
-    @VisibleForTesting
-    static Set<SSTableReader> overlapping(SSTableReader sstable, Iterable<SSTableReader> others)
+    private static Set<SSTableReader> overlappingWithBounds(SSTableReader sstable, Map<SSTableReader, Bounds<Token>> others)
     {
-        return overlapping(sstable.first.getToken(), sstable.last.getToken(), others);
+        return overlappingWithBounds(sstable.first.getToken(), sstable.last.getToken(), others);
     }
 
     /**
      * @return sstables from @param sstables that contain keys between @param start and @param end, inclusive.
      */
-    private static Set<SSTableReader> overlapping(Token start, Token end, Iterable<SSTableReader> sstables)
+    @VisibleForTesting
+    static Set<SSTableReader> overlapping(Token start, Token end, Iterable<SSTableReader> sstables)
+    {
+        return overlappingWithBounds(start, end, genBounds(sstables));
+    }
+
+    private static Set<SSTableReader> overlappingWithBounds(Token start, Token end, Map<SSTableReader, Bounds<Token>> sstables)
     {
         assert start.compareTo(end) <= 0;
         Set<SSTableReader> overlapped = new HashSet<>();
         Bounds<Token> promotedBounds = new Bounds<Token>(start, end);
-        for (SSTableReader candidate : sstables)
+
+        for (Map.Entry<SSTableReader, Bounds<Token>> pair : sstables.entrySet())
         {
-            Bounds<Token> candidateBounds = new Bounds<Token>(candidate.first.getToken(), candidate.last.getToken());
-            if (candidateBounds.intersects(promotedBounds))
-                overlapped.add(candidate);
+            if (pair.getValue().intersects(promotedBounds))
+                overlapped.add(pair.getKey());
         }
         return overlapped;
     }
@@ -544,6 +553,16 @@
         }
     };
 
+    private static Map<SSTableReader, Bounds<Token>> genBounds(Iterable<SSTableReader> ssTableReaders)
+    {
+        Map<SSTableReader, Bounds<Token>> boundsMap = new HashMap<>();
+        for (SSTableReader sstable : ssTableReaders)
+        {
+            boundsMap.put(sstable, new Bounds<Token>(sstable.first.getToken(), sstable.last.getToken()));
+        }
+        return boundsMap;
+    }
+
     /**
      * @return highest-priority sstables to compact for the given level.
      * If no compactions are possible (because of concurrent compactions or because some sstables are blacklisted
@@ -584,14 +603,14 @@
             // basically screwed, since we expect all or most L0 sstables to overlap with each L1 sstable.
             // So if an L1 sstable is suspect we can't do much besides try anyway and hope for the best.
             Set<SSTableReader> candidates = new HashSet<>();
-            Set<SSTableReader> remaining = new HashSet<>();
-            Iterables.addAll(remaining, Iterables.filter(getLevel(0), Predicates.not(suspectP)));
-            for (SSTableReader sstable : ageSortedSSTables(remaining))
+            Map<SSTableReader, Bounds<Token>> remaining = genBounds(Iterables.filter(getLevel(0), Predicates.not(suspectP)));
+
+            for (SSTableReader sstable : ageSortedSSTables(remaining.keySet()))
             {
                 if (candidates.contains(sstable))
                     continue;
 
-                Sets.SetView<SSTableReader> overlappedL0 = Sets.union(Collections.singleton(sstable), overlapping(sstable, remaining));
+                Sets.SetView<SSTableReader> overlappedL0 = Sets.union(Collections.singleton(sstable), overlappingWithBounds(sstable, remaining));
                 if (!Sets.intersection(overlappedL0, compactingL0).isEmpty())
                     continue;
 
@@ -644,10 +663,11 @@
 
         // look for a non-suspect keyspace to compact with, starting with where we left off last time,
         // and wrapping back to the beginning of the generation if necessary
+        Map<SSTableReader, Bounds<Token>> sstablesNextLevel = genBounds(getLevel(level + 1));
         for (int i = 0; i < getLevel(level).size(); i++)
         {
             SSTableReader sstable = getLevel(level).get((start + i) % getLevel(level).size());
-            Set<SSTableReader> candidates = Sets.union(Collections.singleton(sstable), overlapping(sstable, getLevel(level + 1)));
+            Set<SSTableReader> candidates = Sets.union(Collections.singleton(sstable), overlappingWithBounds(sstable, sstablesNextLevel));
             if (Iterables.any(candidates, suspectP))
                 continue;
             if (Sets.intersection(candidates, compacting).isEmpty())
diff --git a/src/java/org/apache/cassandra/db/compaction/OperationType.java b/src/java/org/apache/cassandra/db/compaction/OperationType.java
index 20e6df2..84a34c9 100644
--- a/src/java/org/apache/cassandra/db/compaction/OperationType.java
+++ b/src/java/org/apache/cassandra/db/compaction/OperationType.java
@@ -37,7 +37,8 @@
     STREAM("Stream"),
     WRITE("Write"),
     VIEW_BUILD("View build"),
-    INDEX_SUMMARY("Index summary redistribution");
+    INDEX_SUMMARY("Index summary redistribution"),
+    RELOCATE("Relocate sstables to correct disk");
 
     public final String type;
     public final String fileName;
diff --git a/src/java/org/apache/cassandra/db/compaction/SSTableSplitter.java b/src/java/org/apache/cassandra/db/compaction/SSTableSplitter.java
index 3655a37..bd2eda2 100644
--- a/src/java/org/apache/cassandra/db/compaction/SSTableSplitter.java
+++ b/src/java/org/apache/cassandra/db/compaction/SSTableSplitter.java
@@ -60,7 +60,7 @@
 
         public SplittingCompactionTask(ColumnFamilyStore cfs, LifecycleTransaction transaction, int sstableSizeInMB)
         {
-            super(cfs, transaction, CompactionManager.NO_GC, true, false);
+            super(cfs, transaction, CompactionManager.NO_GC, false);
             this.sstableSizeInMB = sstableSizeInMB;
 
             if (sstableSizeInMB <= 0)
@@ -79,7 +79,7 @@
                                                               LifecycleTransaction txn,
                                                               Set<SSTableReader> nonExpiredSSTables)
         {
-            return new MaxSSTableSizeWriter(cfs, directories, txn, nonExpiredSSTables, sstableSizeInMB * 1024L * 1024L, 0, true, false);
+            return new MaxSSTableSizeWriter(cfs, directories, txn, nonExpiredSSTables, sstableSizeInMB * 1024L * 1024L, 0, false);
         }
 
         @Override
diff --git a/src/java/org/apache/cassandra/db/compaction/Scrubber.java b/src/java/org/apache/cassandra/db/compaction/Scrubber.java
index 539c4c7..a9cb211 100644
--- a/src/java/org/apache/cassandra/db/compaction/Scrubber.java
+++ b/src/java/org/apache/cassandra/db/compaction/Scrubber.java
@@ -74,7 +74,7 @@
     };
     private final SortedSet<Partition> outOfOrder = new TreeSet<>(partitionComparator);
 
-    public Scrubber(ColumnFamilyStore cfs, LifecycleTransaction transaction, boolean skipCorrupted, boolean checkData) throws IOException
+    public Scrubber(ColumnFamilyStore cfs, LifecycleTransaction transaction, boolean skipCorrupted, boolean checkData)
     {
         this(cfs, transaction, skipCorrupted, new OutputHandler.LogOutput(), checkData);
     }
@@ -84,7 +84,7 @@
                     LifecycleTransaction transaction,
                     boolean skipCorrupted,
                     OutputHandler outputHandler,
-                    boolean checkData) throws IOException
+                    boolean checkData)
     {
         this.cfs = cfs;
         this.transaction = transaction;
@@ -97,11 +97,8 @@
 
         List<SSTableReader> toScrub = Collections.singletonList(sstable);
 
-        // Calculate the expected compacted filesize
-        this.destination = cfs.getDirectories().getWriteableLocationAsFile(cfs.getExpectedCompactedFileSize(toScrub, OperationType.SCRUB));
-        if (destination == null)
-            throw new IOException("disk full");
-
+        int locIndex = CompactionStrategyManager.getCompactionStrategyIndex(cfs, cfs.getDirectories(), sstable);
+        this.destination = cfs.getDirectories().getLocationForDisk(cfs.getDirectories().getWriteableLocations()[locIndex]);
         this.isCommutative = cfs.metadata.isCounter();
 
         boolean hasIndexFile = (new File(sstable.descriptor.filenameFor(Component.PRIMARY_INDEX))).exists();
@@ -143,14 +140,14 @@
     {
         List<SSTableReader> finished = new ArrayList<>();
         boolean completed = false;
-        outputHandler.output(String.format("Scrubbing %s (%s bytes)", sstable, dataFile.length()));
-        try (SSTableRewriter writer = SSTableRewriter.construct(cfs, transaction, false, sstable.maxDataAge, transaction.isOffline()))
+        outputHandler.output(String.format("Scrubbing %s (%s)", sstable, FBUtilities.prettyPrintMemory(dataFile.length())));
+        try (SSTableRewriter writer = SSTableRewriter.construct(cfs, transaction, false, sstable.maxDataAge))
         {
             nextIndexKey = indexAvailable() ? ByteBufferUtil.readWithShortLength(indexFile) : null;
             if (indexAvailable())
             {
                 // throw away variable so we don't have a side effect in the assert
-                long firstRowPositionFromIndex = rowIndexEntrySerializer.deserialize(indexFile).position;
+                long firstRowPositionFromIndex = rowIndexEntrySerializer.deserializePositionAndSkip(indexFile);
                 assert firstRowPositionFromIndex == 0 : firstRowPositionFromIndex;
             }
 
@@ -191,7 +188,7 @@
 
                 // avoid an NPE if key is null
                 String keyName = key == null ? "(unreadable key)" : ByteBufferUtil.bytesToHex(key.getKey());
-                outputHandler.debug(String.format("row %s is %s bytes", keyName, dataSizeFromIndex));
+                outputHandler.debug(String.format("row %s is %s", keyName, FBUtilities.prettyPrintMemory(dataSizeFromIndex)));
 
                 assert currentIndexKey != null || !indexAvailable();
 
@@ -337,7 +334,7 @@
 
             nextRowPositionFromIndex = !indexAvailable()
                     ? dataFile.length()
-                    : rowIndexEntrySerializer.deserialize(indexFile).position;
+                    : rowIndexEntrySerializer.deserializePositionAndSkip(indexFile);
         }
         catch (Throwable th)
         {
@@ -442,7 +439,7 @@
             }
             catch (Exception e)
             {
-                throw new RuntimeException();
+                throw new RuntimeException(e);
             }
         }
     }
diff --git a/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategy.java
index f8a8240..8ef2ac7 100644
--- a/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategy.java
+++ b/src/java/org/apache/cassandra/db/compaction/SizeTieredCompactionStrategy.java
@@ -21,6 +21,7 @@
 import java.util.Map.Entry;
 
 import com.google.common.annotations.VisibleForTesting;
+import com.google.common.collect.ImmutableSet;
 import com.google.common.collect.Iterables;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
@@ -74,7 +75,7 @@
         this.sizeTieredOptions = new SizeTieredCompactionStrategyOptions(options);
     }
 
-    private List<SSTableReader> getNextBackgroundSSTables(final int gcBefore)
+    private synchronized List<SSTableReader> getNextBackgroundSSTables(final int gcBefore)
     {
         // make local copies so they can't be changed out from under us mid-method
         int minThreshold = cfs.getMinimumCompactionThreshold();
@@ -84,7 +85,8 @@
 
         List<List<SSTableReader>> buckets = getBuckets(createSSTableAndLengthPairs(candidates), sizeTieredOptions.bucketHigh, sizeTieredOptions.bucketLow, sizeTieredOptions.minSSTableSize);
         logger.trace("Compaction buckets are {}", buckets);
-        updateEstimatedCompactionsByTasks(buckets);
+        estimatedRemainingTasks = getEstimatedCompactionsByTasks(cfs, buckets);
+        cfs.getCompactionStrategyManager().compactionLogger.pending(this, estimatedRemainingTasks);
         List<SSTableReader> mostInteresting = mostInterestingBucket(buckets, minThreshold, maxThreshold);
         if (!mostInteresting.isEmpty())
             return mostInteresting;
@@ -174,7 +176,7 @@
     }
 
     @SuppressWarnings("resource")
-    public synchronized AbstractCompactionTask getNextBackgroundTask(int gcBefore)
+    public AbstractCompactionTask getNextBackgroundTask(int gcBefore)
     {
         while (true)
         {
@@ -282,15 +284,15 @@
         return new ArrayList<List<T>>(buckets.values());
     }
 
-    private void updateEstimatedCompactionsByTasks(List<List<SSTableReader>> tasks)
+    public static int getEstimatedCompactionsByTasks(ColumnFamilyStore cfs, List<List<SSTableReader>> tasks)
     {
         int n = 0;
-        for (List<SSTableReader> bucket: tasks)
+        for (List<SSTableReader> bucket : tasks)
         {
             if (bucket.size() >= cfs.getMinimumCompactionThreshold())
                 n += Math.ceil((double)bucket.size() / cfs.getMaximumCompactionThreshold());
         }
-        estimatedRemainingTasks = n;
+        return n;
     }
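
The count is per bucket: any bucket below the minimum threshold contributes nothing, and a larger bucket needs ceil(size / max_threshold) compactions to drain. A quick worked example with the default thresholds (min 4, max 32):

    // Illustrative only.
    public class EstimatedTasksSketch
    {
        public static void main(String[] args)
        {
            int minThreshold = 4, maxThreshold = 32, n = 0;
            for (int bucketSize : new int[]{ 3, 10, 40 })
                if (bucketSize >= minThreshold)
                    n += Math.ceil((double) bucketSize / maxThreshold); // 10 -> 1 task, 40 -> 2 tasks
            System.out.println(n); // 3; the bucket of 3 is below min_threshold and is ignored
        }
    }
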
 
     public long getMaxSSTableBytes()
diff --git a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java
index e2ab7dc..55daaa1 100644
--- a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java
+++ b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategy.java
@@ -19,7 +19,6 @@
 package org.apache.cassandra.db.compaction;
 
 import java.util.ArrayList;
-import java.util.Arrays;
 import java.util.Collection;
 import java.util.Collections;
 import java.util.Iterator;
@@ -158,8 +157,6 @@
         List<SSTableReader> mostInteresting = newestBucket(buckets.left,
                                                            cfs.getMinimumCompactionThreshold(),
                                                            cfs.getMaximumCompactionThreshold(),
-                                                           options.sstableWindowUnit,
-                                                           options.sstableWindowSize,
                                                            options.stcsOptions,
                                                            this.highestWindowSeen);
         if (!mostInteresting.isEmpty())
@@ -192,16 +189,16 @@
         switch(windowTimeUnit)
         {
             case MINUTES:
-                lowerTimestamp = timestampInSeconds - ((timestampInSeconds) % (60 * windowTimeSize));
+                lowerTimestamp = timestampInSeconds - ((timestampInSeconds) % (60L * windowTimeSize));
                 upperTimestamp = (lowerTimestamp + (60L * (windowTimeSize - 1L))) + 59L;
                 break;
             case HOURS:
-                lowerTimestamp = timestampInSeconds - ((timestampInSeconds) % (3600 * windowTimeSize));
+                lowerTimestamp = timestampInSeconds - ((timestampInSeconds) % (3600L * windowTimeSize));
                 upperTimestamp = (lowerTimestamp + (3600L * (windowTimeSize - 1L))) + 3599L;
                 break;
             case DAYS:
             default:
-                lowerTimestamp = timestampInSeconds - ((timestampInSeconds) % (86400 * windowTimeSize));
+                lowerTimestamp = timestampInSeconds - ((timestampInSeconds) % (86400L * windowTimeSize));
                 upperTimestamp = (lowerTimestamp + (86400L * (windowTimeSize - 1L))) + 86399L;
                 break;
         }
@@ -239,7 +236,7 @@
                 maxTimestamp = bounds.left;
         }
 
-        logger.trace("buckets {}, max timestamp", buckets, maxTimestamp);
+        logger.trace("buckets {}, max timestamp {}", buckets, maxTimestamp);
         return Pair.create(buckets, maxTimestamp);
     }
 
@@ -267,7 +264,7 @@
      * @return a bucket (list) of sstables to compact.
      */
     @VisibleForTesting
-    static List<SSTableReader> newestBucket(HashMultimap<Long, SSTableReader> buckets, int minThreshold, int maxThreshold, TimeUnit sstableWindowUnit, int sstableWindowSize, SizeTieredCompactionStrategyOptions stcsOptions, long now)
+    static List<SSTableReader> newestBucket(HashMultimap<Long, SSTableReader> buckets, int minThreshold, int maxThreshold, SizeTieredCompactionStrategyOptions stcsOptions, long now)
     {
         // If the current bucket has at least minThreshold SSTables, choose that one.
         // For any other bucket, at least 2 SSTables is enough.
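The 60L / 3600L / 86400L literals above force the modulus to be taken in long arithmetic, so a very large compaction_window_size can no longer overflow int (86400 * size exceeds Integer.MAX_VALUE once size passes roughly 24,855). A self-contained sketch of the same window-bounds computation; the example timestamp is arbitrary:

    import java.util.concurrent.TimeUnit;

    public class TimeWindowBoundsSketch
    {
        // Round a timestamp (in seconds) down/up to the bounds of its compaction window,
        // doing the multiplication in long arithmetic as the patched code now does.
        static long[] windowBounds(TimeUnit unit, int windowSize, long timestampInSeconds)
        {
            long unitSeconds;
            switch (unit)
            {
                case MINUTES: unitSeconds = 60L;    break;
                case HOURS:   unitSeconds = 3600L;  break;
                case DAYS:
                default:      unitSeconds = 86400L; break;
            }
            long lower = timestampInSeconds - (timestampInSeconds % (unitSeconds * windowSize));
            long upper = lower + unitSeconds * windowSize - 1; // same value the patched code builds up
            return new long[]{ lower, upper };
        }

        public static void main(String[] args)
        {
            // 2016-08-01T13:37:42Z (1470058662s) falls in the one-day window [1470009600, 1470095999]
            long[] bounds = windowBounds(TimeUnit.DAYS, 1, 1470058662L);
            System.out.println(bounds[0] + " .. " + bounds[1]);
        }
    }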
diff --git a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java
index bcbdab6..07df606 100644
--- a/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java
+++ b/src/java/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyOptions.java
@@ -114,12 +114,12 @@
             int sstableWindowSize = optionValue == null ? DEFAULT_COMPACTION_WINDOW_SIZE : Integer.parseInt(optionValue);
             if (sstableWindowSize < 1)
             {
-                throw new ConfigurationException(String.format("%s must be greater than 1", DEFAULT_COMPACTION_WINDOW_SIZE, sstableWindowSize));
+                throw new ConfigurationException(String.format("%d must be at least 1 for %s", sstableWindowSize, COMPACTION_WINDOW_SIZE_KEY));
             }
         }
         catch (NumberFormatException e)
         {
-            throw new ConfigurationException(String.format("%s is not a parsable int (base10) for %s", optionValue, DEFAULT_COMPACTION_WINDOW_SIZE), e);
+            throw new ConfigurationException(String.format("%s is not a parsable int (base10) for %s", optionValue, COMPACTION_WINDOW_SIZE_KEY), e);
         }
 
         optionValue = options.get(EXPIRED_SSTABLE_CHECK_FREQUENCY_SECONDS_KEY);
diff --git a/src/java/org/apache/cassandra/db/compaction/Upgrader.java b/src/java/org/apache/cassandra/db/compaction/Upgrader.java
index 822bb85..39d86dd 100644
--- a/src/java/org/apache/cassandra/db/compaction/Upgrader.java
+++ b/src/java/org/apache/cassandra/db/compaction/Upgrader.java
@@ -75,6 +75,7 @@
                                     cfs.metadata,
                                     sstableMetadataCollector,
                                     SerializationHeader.make(cfs.metadata, Sets.newHashSet(sstable)),
+                                    cfs.indexManager.listIndexes(),
                                     transaction);
     }
 
@@ -82,7 +83,7 @@
     {
         outputHandler.output("Upgrading " + sstable);
         int nowInSec = FBUtilities.nowInSeconds();
-        try (SSTableRewriter writer = SSTableRewriter.construct(cfs, transaction, keepOriginals, CompactionTask.getMaxDataAge(transaction.originals()), true);
+        try (SSTableRewriter writer = SSTableRewriter.construct(cfs, transaction, keepOriginals, CompactionTask.getMaxDataAge(transaction.originals()));
              AbstractCompactionStrategy.ScannerList scanners = strategyManager.getScanners(transaction.originals());
              CompactionIterator iter = new CompactionIterator(transaction.opType(), scanners.scanners, controller, nowInSec, UUIDGen.getTimeUUID()))
         {
diff --git a/src/java/org/apache/cassandra/db/compaction/Verifier.java b/src/java/org/apache/cassandra/db/compaction/Verifier.java
index ce04ad3..91c7ad7 100644
--- a/src/java/org/apache/cassandra/db/compaction/Verifier.java
+++ b/src/java/org/apache/cassandra/db/compaction/Verifier.java
@@ -62,12 +62,12 @@
     private final OutputHandler outputHandler;
     private FileDigestValidator validator;
 
-    public Verifier(ColumnFamilyStore cfs, SSTableReader sstable, boolean isOffline) throws IOException
+    public Verifier(ColumnFamilyStore cfs, SSTableReader sstable, boolean isOffline)
     {
         this(cfs, sstable, new OutputHandler.LogOutput(), isOffline);
     }
 
-    public Verifier(ColumnFamilyStore cfs, SSTableReader sstable, OutputHandler outputHandler, boolean isOffline) throws IOException
+    public Verifier(ColumnFamilyStore cfs, SSTableReader sstable, OutputHandler outputHandler, boolean isOffline)
     {
         this.cfs = cfs;
         this.sstable = sstable;
@@ -87,7 +87,7 @@
     {
         long rowStart = 0;
 
-        outputHandler.output(String.format("Verifying %s (%s bytes)", sstable, dataFile.length()));
+        outputHandler.output(String.format("Verifying %s (%s)", sstable, FBUtilities.prettyPrintMemory(dataFile.length())));
         outputHandler.output(String.format("Checking computed hash of %s ", sstable));
 
 
@@ -128,7 +128,7 @@
         {
             ByteBuffer nextIndexKey = ByteBufferUtil.readWithShortLength(indexFile);
             {
-                long firstRowPositionFromIndex = rowIndexEntrySerializer.deserialize(indexFile).position;
+                long firstRowPositionFromIndex = rowIndexEntrySerializer.deserializePositionAndSkip(indexFile);
                 if (firstRowPositionFromIndex != 0)
                     markAndThrow();
             }
@@ -162,7 +162,7 @@
                     nextIndexKey = indexFile.isEOF() ? null : ByteBufferUtil.readWithShortLength(indexFile);
                     nextRowPositionFromIndex = indexFile.isEOF()
                                              ? dataFile.length()
-                                             : rowIndexEntrySerializer.deserialize(indexFile).position;
+                                             : rowIndexEntrySerializer.deserializePositionAndSkip(indexFile);
                 }
                 catch (Throwable th)
                 {
@@ -177,7 +177,7 @@
                 long dataSize = nextRowPositionFromIndex - dataStartFromIndex;
                 // avoid an NPE if key is null
                 String keyName = key == null ? "(unreadable key)" : ByteBufferUtil.bytesToHex(key.getKey());
-                outputHandler.debug(String.format("row %s is %s bytes", keyName, dataSize));
+                outputHandler.debug(String.format("row %s is %s", keyName, FBUtilities.prettyPrintMemory(dataSize)));
 
                 assert currentIndexKey != null || indexFile.isEOF();
 
diff --git a/src/java/org/apache/cassandra/db/compaction/writers/CompactionAwareWriter.java b/src/java/org/apache/cassandra/db/compaction/writers/CompactionAwareWriter.java
index d33d72c..3557022 100644
--- a/src/java/org/apache/cassandra/db/compaction/writers/CompactionAwareWriter.java
+++ b/src/java/org/apache/cassandra/db/compaction/writers/CompactionAwareWriter.java
@@ -18,18 +18,27 @@
 
 package org.apache.cassandra.db.compaction.writers;
 
+import java.io.File;
 import java.util.Collection;
+import java.util.List;
 import java.util.Set;
 
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
 import org.apache.cassandra.db.ColumnFamilyStore;
 import org.apache.cassandra.db.DecoratedKey;
 import org.apache.cassandra.db.Directories;
+import org.apache.cassandra.db.PartitionPosition;
 import org.apache.cassandra.db.rows.UnfilteredRowIterator;
 import org.apache.cassandra.db.compaction.CompactionTask;
 import org.apache.cassandra.db.lifecycle.LifecycleTransaction;
 import org.apache.cassandra.io.sstable.SSTableRewriter;
 import org.apache.cassandra.io.sstable.format.SSTableReader;
+import org.apache.cassandra.utils.FBUtilities;
 import org.apache.cassandra.utils.concurrent.Transactional;
+import org.apache.cassandra.db.compaction.OperationType;
+import org.apache.cassandra.service.StorageService;
 
 
 /**
@@ -38,6 +47,8 @@
  */
 public abstract class CompactionAwareWriter extends Transactional.AbstractTransactional implements Transactional
 {
+    protected static final Logger logger = LoggerFactory.getLogger(CompactionAwareWriter.class);
+
     protected final ColumnFamilyStore cfs;
     protected final Directories directories;
     protected final Set<SSTableReader> nonExpiredSSTables;
@@ -45,10 +56,13 @@
     protected final long maxAge;
     protected final long minRepairedAt;
 
-    protected final LifecycleTransaction txn;
     protected final SSTableRewriter sstableWriter;
-    private boolean isInitialized = false;
+    protected final LifecycleTransaction txn;
+    private final Directories.DataDirectory[] locations;
+    private final List<PartitionPosition> diskBoundaries;
+    private int locationIndex;
 
+    @Deprecated
     public CompactionAwareWriter(ColumnFamilyStore cfs,
                                  Directories directories,
                                  LifecycleTransaction txn,
@@ -56,14 +70,27 @@
                                  boolean offline,
                                  boolean keepOriginals)
     {
+        this(cfs, directories, txn, nonExpiredSSTables, keepOriginals);
+    }
+
+    public CompactionAwareWriter(ColumnFamilyStore cfs,
+                                 Directories directories,
+                                 LifecycleTransaction txn,
+                                 Set<SSTableReader> nonExpiredSSTables,
+                                 boolean keepOriginals)
+    {
         this.cfs = cfs;
         this.directories = directories;
         this.nonExpiredSSTables = nonExpiredSSTables;
-        this.estimatedTotalKeys = SSTableReader.getApproximateKeyCount(nonExpiredSSTables);
-        this.maxAge = CompactionTask.getMaxDataAge(nonExpiredSSTables);
-        this.minRepairedAt = CompactionTask.getMinRepairedAt(nonExpiredSSTables);
         this.txn = txn;
-        this.sstableWriter = SSTableRewriter.construct(cfs, txn, keepOriginals, maxAge, offline);
+
+        estimatedTotalKeys = SSTableReader.getApproximateKeyCount(nonExpiredSSTables);
+        maxAge = CompactionTask.getMaxDataAge(nonExpiredSSTables);
+        sstableWriter = SSTableRewriter.construct(cfs, txn, keepOriginals, maxAge);
+        minRepairedAt = CompactionTask.getMinRepairedAt(nonExpiredSSTables);
+        locations = cfs.getDirectories().getWriteableLocations();
+        diskBoundaries = StorageService.getDiskBoundaries(cfs);
+        locationIndex = -1;
     }
 
     @Override
@@ -103,6 +130,11 @@
         return estimatedTotalKeys;
     }
 
+    /**
+     * Writes a partition in an implementation-specific way.
+     * @param partition the partition to append
+     * @return true if the partition was written, false otherwise
+     */
     public final boolean append(UnfilteredRowIterator partition)
     {
         maybeSwitchWriter(partition.partitionKey());
@@ -124,9 +156,26 @@
      */
     protected void maybeSwitchWriter(DecoratedKey key)
     {
-        if (!isInitialized)
-            switchCompactionLocation(getDirectories().getWriteableLocation(cfs.getExpectedCompactedFileSize(nonExpiredSSTables, txn.opType())));
-        isInitialized = true;
+        if (diskBoundaries == null)
+        {
+            if (locationIndex < 0)
+            {
+                Directories.DataDirectory defaultLocation = getWriteDirectory(nonExpiredSSTables, cfs.getExpectedCompactedFileSize(nonExpiredSSTables, OperationType.UNKNOWN));
+                switchCompactionLocation(defaultLocation);
+                locationIndex = 0;
+            }
+            return;
+        }
+
+        if (locationIndex > -1 && key.compareTo(diskBoundaries.get(locationIndex)) < 0)
+            return;
+
+        int prevIdx = locationIndex;
+        while (locationIndex == -1 || key.compareTo(diskBoundaries.get(locationIndex)) > 0)
+            locationIndex++;
+        if (prevIdx >= 0)
+            logger.debug("Switching write location from {} to {}", locations[prevIdx], locations[locationIndex]);
+        switchCompactionLocation(locations[locationIndex]);
     }
 
     /**
@@ -147,13 +196,41 @@
 
     /**
      * Return a directory where we can expect expectedWriteSize to fit.
+     *
+     * @param sstables the sstables to compact
+     * @return a data directory that we expect to have enough space for the write
      */
-    public Directories.DataDirectory getWriteDirectory(long expectedWriteSize)
+    public Directories.DataDirectory getWriteDirectory(Iterable<SSTableReader> sstables, long estimatedWriteSize)
     {
-        Directories.DataDirectory directory = getDirectories().getWriteableLocation(expectedWriteSize);
-        if (directory == null)
-            throw new RuntimeException("Insufficient disk space to write " + expectedWriteSize + " bytes");
+        File directory = null;
+        for (SSTableReader sstable : sstables)
+        {
+            if (directory == null)
+                directory = sstable.descriptor.directory;
+            if (!directory.equals(sstable.descriptor.directory))
+                logger.trace("All sstables not from the same disk - putting results in {}", directory);
+        }
+        Directories.DataDirectory d = getDirectories().getDataDirectoryForFile(directory);
+        if (d != null)
+        {
+            if (d.getAvailableSpace() < estimatedWriteSize)
+                throw new RuntimeException(String.format("Not enough space to write %s to %s (%s available)",
+                                                         FBUtilities.prettyPrintMemory(estimatedWriteSize),
+                                                         d.location,
+                                                         FBUtilities.prettyPrintMemory(d.getAvailableSpace())));
+            logger.trace("putting compaction results in {}", directory);
+            return d;
+        }
+        d = getDirectories().getWriteableLocation(estimatedWriteSize);
+        if (d == null)
+            throw new RuntimeException(String.format("Not enough disk space to store %s",
+                                                     FBUtilities.prettyPrintMemory(estimatedWriteSize)));
+        return d;
+    }
 
-        return directory;
+    public CompactionAwareWriter setRepairedAt(long repairedAt)
+    {
+        this.sstableWriter.setRepairedAt(repairedAt);
+        return this;
     }
 }
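For context, the new maybeSwitchWriter walks a per-disk list of partition-position boundaries: it remembers which disk it is currently writing to and opens a new writer whenever an incoming key moves past that disk's upper bound. A simplified, self-contained model of that walk (long keys stand in for DecoratedKey/PartitionPosition; the boundary values are invented for the example):

    import java.util.Arrays;
    import java.util.List;

    public class DiskBoundaryWalkSketch
    {
        // diskBoundaries.get(i) is the last position that may go to disk i; this mirrors the
        // index bookkeeping in maybeSwitchWriter, with println standing in for switchCompactionLocation.
        private final List<Long> diskBoundaries;
        private int locationIndex = -1;

        DiskBoundaryWalkSketch(List<Long> diskBoundaries)
        {
            this.diskBoundaries = diskBoundaries;
        }

        void maybeSwitch(long key)
        {
            if (locationIndex > -1 && key < diskBoundaries.get(locationIndex))
                return; // still inside the current disk's range
            while (locationIndex == -1 || key > diskBoundaries.get(locationIndex))
                locationIndex++;
            System.out.println("key " + key + " -> writer on disk " + locationIndex);
        }

        public static void main(String[] args)
        {
            // three disks covering keys up to 100, 200 and Long.MAX_VALUE respectively
            DiskBoundaryWalkSketch writer = new DiskBoundaryWalkSketch(Arrays.asList(100L, 200L, Long.MAX_VALUE));
            for (long key : new long[]{ 5, 90, 150, 420 })
                writer.maybeSwitch(key); // switches on 5 (disk 0), 150 (disk 1) and 420 (disk 2)
        }
    }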
diff --git a/src/java/org/apache/cassandra/db/compaction/writers/DefaultCompactionWriter.java b/src/java/org/apache/cassandra/db/compaction/writers/DefaultCompactionWriter.java
index 8b90224..f8ecd87 100644
--- a/src/java/org/apache/cassandra/db/compaction/writers/DefaultCompactionWriter.java
+++ b/src/java/org/apache/cassandra/db/compaction/writers/DefaultCompactionWriter.java
@@ -39,16 +39,24 @@
 public class DefaultCompactionWriter extends CompactionAwareWriter
 {
     protected static final Logger logger = LoggerFactory.getLogger(DefaultCompactionWriter.class);
+    private final int sstableLevel;
 
     public DefaultCompactionWriter(ColumnFamilyStore cfs, Directories directories, LifecycleTransaction txn, Set<SSTableReader> nonExpiredSSTables)
     {
-        this(cfs, directories, txn, nonExpiredSSTables, false, false);
+        this(cfs, directories, txn, nonExpiredSSTables, false, 0);
+    }
+
+    @Deprecated
+    public DefaultCompactionWriter(ColumnFamilyStore cfs, Directories directories, LifecycleTransaction txn, Set<SSTableReader> nonExpiredSSTables, boolean offline, boolean keepOriginals, int sstableLevel)
+    {
+        this(cfs, directories, txn, nonExpiredSSTables, keepOriginals, sstableLevel);
     }
 
     @SuppressWarnings("resource")
-    public DefaultCompactionWriter(ColumnFamilyStore cfs, Directories directories, LifecycleTransaction txn, Set<SSTableReader> nonExpiredSSTables, boolean offline, boolean keepOriginals)
+    public DefaultCompactionWriter(ColumnFamilyStore cfs, Directories directories, LifecycleTransaction txn, Set<SSTableReader> nonExpiredSSTables, boolean keepOriginals, int sstableLevel)
     {
-        super(cfs, directories, txn, nonExpiredSSTables, offline, keepOriginals);
+        super(cfs, directories, txn, nonExpiredSSTables, keepOriginals);
+        this.sstableLevel = sstableLevel;
     }
 
     @Override
@@ -58,15 +66,16 @@
     }
 
     @Override
-    protected void switchCompactionLocation(Directories.DataDirectory directory)
+    public void switchCompactionLocation(Directories.DataDirectory directory)
     {
         @SuppressWarnings("resource")
         SSTableWriter writer = SSTableWriter.create(Descriptor.fromFilename(cfs.getSSTablePath(getDirectories().getLocationForDisk(directory))),
                                                     estimatedTotalKeys,
                                                     minRepairedAt,
                                                     cfs.metadata,
-                                                    new MetadataCollector(txn.originals(), cfs.metadata.comparator, 0),
+                                                    new MetadataCollector(txn.originals(), cfs.metadata.comparator, sstableLevel),
                                                     SerializationHeader.make(cfs.metadata, nonExpiredSSTables),
+                                                    cfs.indexManager.listIndexes(),
                                                     txn);
         sstableWriter.switchWriter(writer);
     }
diff --git a/src/java/org/apache/cassandra/db/compaction/writers/MajorLeveledCompactionWriter.java b/src/java/org/apache/cassandra/db/compaction/writers/MajorLeveledCompactionWriter.java
index 6d191f8..6cccfcb 100644
--- a/src/java/org/apache/cassandra/db/compaction/writers/MajorLeveledCompactionWriter.java
+++ b/src/java/org/apache/cassandra/db/compaction/writers/MajorLeveledCompactionWriter.java
@@ -17,12 +17,8 @@
  */
 package org.apache.cassandra.db.compaction.writers;
 
-import java.io.File;
 import java.util.Set;
 
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
 import org.apache.cassandra.db.ColumnFamilyStore;
 import org.apache.cassandra.db.Directories;
 import org.apache.cassandra.db.RowIndexEntry;
@@ -37,15 +33,14 @@
 
 public class MajorLeveledCompactionWriter extends CompactionAwareWriter
 {
-    private static final Logger logger = LoggerFactory.getLogger(MajorLeveledCompactionWriter.class);
     private final long maxSSTableSize;
-    private final long expectedWriteSize;
-    private final Set<SSTableReader> allSSTables;
     private int currentLevel = 1;
     private long averageEstimatedKeysPerSSTable;
     private long partitionsWritten = 0;
     private long totalWrittenInLevel = 0;
     private int sstablesWritten = 0;
+    private final long keysPerSSTable;
+    private Directories.DataDirectory sstableDirectory;
 
     public MajorLeveledCompactionWriter(ColumnFamilyStore cfs,
                                         Directories directories,
@@ -53,10 +48,10 @@
                                         Set<SSTableReader> nonExpiredSSTables,
                                         long maxSSTableSize)
     {
-        this(cfs, directories, txn, nonExpiredSSTables, maxSSTableSize, false, false);
+        this(cfs, directories, txn, nonExpiredSSTables, maxSSTableSize, false);
     }
 
-    @SuppressWarnings("resource")
+    @Deprecated
     public MajorLeveledCompactionWriter(ColumnFamilyStore cfs,
                                         Directories directories,
                                         LifecycleTransaction txn,
@@ -65,48 +60,59 @@
                                         boolean offline,
                                         boolean keepOriginals)
     {
-        super(cfs, directories, txn, nonExpiredSSTables, offline, keepOriginals);
+        this(cfs, directories, txn, nonExpiredSSTables, maxSSTableSize, keepOriginals);
+    }
+
+    @SuppressWarnings("resource")
+    public MajorLeveledCompactionWriter(ColumnFamilyStore cfs,
+                                        Directories directories,
+                                        LifecycleTransaction txn,
+                                        Set<SSTableReader> nonExpiredSSTables,
+                                        long maxSSTableSize,
+                                        boolean keepOriginals)
+    {
+        super(cfs, directories, txn, nonExpiredSSTables, keepOriginals);
         this.maxSSTableSize = maxSSTableSize;
-        this.allSSTables = txn.originals();
-        expectedWriteSize = Math.min(maxSSTableSize, cfs.getExpectedCompactedFileSize(nonExpiredSSTables, txn.opType()));
+        long estimatedSSTables = Math.max(1, SSTableReader.getTotalBytes(nonExpiredSSTables) / maxSSTableSize);
+        keysPerSSTable = estimatedTotalKeys / estimatedSSTables;
     }
 
     @Override
     @SuppressWarnings("resource")
     public boolean realAppend(UnfilteredRowIterator partition)
     {
-        long posBefore = sstableWriter.currentWriter().getOnDiskFilePointer();
         RowIndexEntry rie = sstableWriter.append(partition);
-        totalWrittenInLevel += sstableWriter.currentWriter().getOnDiskFilePointer() - posBefore;
         partitionsWritten++;
-        if (sstableWriter.currentWriter().getOnDiskFilePointer() > maxSSTableSize)
+        long totalWrittenInCurrentWriter = sstableWriter.currentWriter().getEstimatedOnDiskBytesWritten();
+        if (totalWrittenInCurrentWriter > maxSSTableSize)
         {
+            totalWrittenInLevel += totalWrittenInCurrentWriter;
             if (totalWrittenInLevel > LeveledManifest.maxBytesForLevel(currentLevel, maxSSTableSize))
             {
                 totalWrittenInLevel = 0;
                 currentLevel++;
             }
-
-            averageEstimatedKeysPerSSTable = Math.round(((double) averageEstimatedKeysPerSSTable * sstablesWritten + partitionsWritten) / (sstablesWritten + 1));
-            switchCompactionLocation(getWriteDirectory(expectedWriteSize));
-            partitionsWritten = 0;
-            sstablesWritten++;
+            switchCompactionLocation(sstableDirectory);
         }
         return rie != null;
 
     }
 
-    public void switchCompactionLocation(Directories.DataDirectory directory)
+    @Override
+    public void switchCompactionLocation(Directories.DataDirectory location)
     {
-        File sstableDirectory = getDirectories().getLocationForDisk(directory);
-        @SuppressWarnings("resource")
-        SSTableWriter writer = SSTableWriter.create(Descriptor.fromFilename(cfs.getSSTablePath(sstableDirectory)),
-                                                    averageEstimatedKeysPerSSTable,
-                                                    minRepairedAt,
-                                                    cfs.metadata,
-                                                    new MetadataCollector(allSSTables, cfs.metadata.comparator, currentLevel),
-                                                    SerializationHeader.make(cfs.metadata, nonExpiredSSTables),
-                                                    txn);
-        sstableWriter.switchWriter(writer);
+        this.sstableDirectory = location;
+        averageEstimatedKeysPerSSTable = Math.round(((double) averageEstimatedKeysPerSSTable * sstablesWritten + partitionsWritten) / (sstablesWritten + 1));
+        sstableWriter.switchWriter(SSTableWriter.create(Descriptor.fromFilename(cfs.getSSTablePath(getDirectories().getLocationForDisk(sstableDirectory))),
+                keysPerSSTable,
+                minRepairedAt,
+                cfs.metadata,
+                new MetadataCollector(txn.originals(), cfs.metadata.comparator, currentLevel),
+                SerializationHeader.make(cfs.metadata, txn.originals()),
+                cfs.indexManager.listIndexes(),
+                txn));
+        partitionsWritten = 0;
+        sstablesWritten = 0;
+
     }
 }
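As a rough picture of the level bookkeeping above: totalWrittenInLevel is only incremented once a writer has filled up, and when it exceeds LeveledManifest.maxBytesForLevel for the current level the writer moves on to the next level. The sketch below uses a simplified maxBytesForLevel (assuming LCS's default fanout of 10) that only approximates the real method, and a made-up 160 MiB sstable size:

    public class LeveledBookkeepingSketch
    {
        // Rough stand-in for LeveledManifest.maxBytesForLevel, assuming the default fanout of 10.
        static long maxBytesForLevel(int level, long maxSSTableSizeBytes)
        {
            return (long) (Math.pow(10, level) * maxSSTableSizeBytes);
        }

        public static void main(String[] args)
        {
            long maxSSTableSize = 160L * 1024 * 1024;
            int currentLevel = 1;
            long totalWrittenInLevel = 0;

            // Pretend a dozen writers fill up one after the other, as realAppend detects
            // via getEstimatedOnDiskBytesWritten() exceeding the size limit.
            for (int sstable = 0; sstable < 12; sstable++)
            {
                totalWrittenInLevel += maxSSTableSize;
                if (totalWrittenInLevel > maxBytesForLevel(currentLevel, maxSSTableSize))
                {
                    totalWrittenInLevel = 0;
                    currentLevel++;
                    System.out.println("sstable " + sstable + " overflowed the level, moving to L" + currentLevel);
                }
            }
        }
    }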
diff --git a/src/java/org/apache/cassandra/db/compaction/writers/MaxSSTableSizeWriter.java b/src/java/org/apache/cassandra/db/compaction/writers/MaxSSTableSizeWriter.java
index b206498..864185e 100644
--- a/src/java/org/apache/cassandra/db/compaction/writers/MaxSSTableSizeWriter.java
+++ b/src/java/org/apache/cassandra/db/compaction/writers/MaxSSTableSizeWriter.java
@@ -33,11 +33,11 @@
 
 public class MaxSSTableSizeWriter extends CompactionAwareWriter
 {
-    private final long expectedWriteSize;
     private final long maxSSTableSize;
     private final int level;
     private final long estimatedSSTables;
     private final Set<SSTableReader> allSSTables;
+    private Directories.DataDirectory sstableDirectory;
 
     public MaxSSTableSizeWriter(ColumnFamilyStore cfs,
                                 Directories directories,
@@ -46,10 +46,10 @@
                                 long maxSSTableSize,
                                 int level)
     {
-        this(cfs, directories, txn, nonExpiredSSTables, maxSSTableSize, level, false, false);
+        this(cfs, directories, txn, nonExpiredSSTables, maxSSTableSize, level, false);
     }
 
-    @SuppressWarnings("resource")
+    @Deprecated
     public MaxSSTableSizeWriter(ColumnFamilyStore cfs,
                                 Directories directories,
                                 LifecycleTransaction txn,
@@ -59,13 +59,23 @@
                                 boolean offline,
                                 boolean keepOriginals)
     {
-        super(cfs, directories, txn, nonExpiredSSTables, offline, keepOriginals);
+        this(cfs, directories, txn, nonExpiredSSTables, maxSSTableSize, level, keepOriginals);
+    }
+
+    public MaxSSTableSizeWriter(ColumnFamilyStore cfs,
+                                Directories directories,
+                                LifecycleTransaction txn,
+                                Set<SSTableReader> nonExpiredSSTables,
+                                long maxSSTableSize,
+                                int level,
+                                boolean keepOriginals)
+    {
+        super(cfs, directories, txn, nonExpiredSSTables, keepOriginals);
         this.allSSTables = txn.originals();
         this.level = level;
         this.maxSSTableSize = maxSSTableSize;
 
         long totalSize = getTotalWriteSize(nonExpiredSSTables, estimatedTotalKeys, cfs, txn.opType());
-        expectedWriteSize = Math.min(maxSSTableSize, totalSize);
         estimatedSSTables = Math.max(1, totalSize / maxSSTableSize);
     }
 
@@ -79,30 +89,34 @@
             estimatedKeysBeforeCompaction += sstable.estimatedKeys();
         estimatedKeysBeforeCompaction = Math.max(1, estimatedKeysBeforeCompaction);
         double estimatedCompactionRatio = (double) estimatedTotalKeys / estimatedKeysBeforeCompaction;
+
         return Math.round(estimatedCompactionRatio * cfs.getExpectedCompactedFileSize(nonExpiredSSTables, compactionType));
     }
 
-    @Override
-    public boolean realAppend(UnfilteredRowIterator partition)
+    protected boolean realAppend(UnfilteredRowIterator partition)
     {
         RowIndexEntry rie = sstableWriter.append(partition);
-        if (sstableWriter.currentWriter().getOnDiskFilePointer() > maxSSTableSize)
-            switchCompactionLocation(getWriteDirectory(expectedWriteSize));
+        if (sstableWriter.currentWriter().getEstimatedOnDiskBytesWritten() > maxSSTableSize)
+        {
+            switchCompactionLocation(sstableDirectory);
+        }
         return rie != null;
     }
 
+    @Override
     public void switchCompactionLocation(Directories.DataDirectory location)
     {
+        sstableDirectory = location;
         @SuppressWarnings("resource")
-        SSTableWriter writer = SSTableWriter.create(Descriptor.fromFilename(cfs.getSSTablePath(getDirectories().getLocationForDisk(location))),
+        SSTableWriter writer = SSTableWriter.create(Descriptor.fromFilename(cfs.getSSTablePath(getDirectories().getLocationForDisk(sstableDirectory))),
                                                     estimatedTotalKeys / estimatedSSTables,
                                                     minRepairedAt,
                                                     cfs.metadata,
                                                     new MetadataCollector(allSSTables, cfs.metadata.comparator, level),
                                                     SerializationHeader.make(cfs.metadata, nonExpiredSSTables),
+                                                    cfs.indexManager.listIndexes(),
                                                     txn);
 
         sstableWriter.switchWriter(writer);
-
     }
 }
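The write-size estimate used above can be read as: scale the expected compacted size of the inputs by the fraction of keys expected to survive the merge, then divide by the per-sstable limit to guess how many output sstables there will be. A small worked example with invented numbers (the variable names below are only stand-ins for getTotalWriteSize and its inputs):

    public class WriteSizeEstimateSketch
    {
        public static void main(String[] args)
        {
            long estimatedTotalKeys = 6_000_000;                     // keys expected after merging duplicates
            long estimatedKeysBeforeCompaction = 10_000_000;         // sum of the per-sstable key estimates
            long expectedCompactedBytes = 40L * 1024 * 1024 * 1024;  // expected compacted size of the inputs
            long maxSSTableSize = 1L * 1024 * 1024 * 1024;           // 1 GiB per output sstable

            double ratio = (double) estimatedTotalKeys / Math.max(1, estimatedKeysBeforeCompaction);
            long totalSize = Math.round(ratio * expectedCompactedBytes);
            long estimatedSSTables = Math.max(1, totalSize / maxSSTableSize);
            // 0.6 * 40 GiB = 24 GiB spread over ~24 sstables of at most 1 GiB each
            System.out.println(totalSize + " bytes across ~" + estimatedSSTables + " sstables");
        }
    }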
diff --git a/src/java/org/apache/cassandra/db/compaction/writers/SplittingSizeTieredCompactionWriter.java b/src/java/org/apache/cassandra/db/compaction/writers/SplittingSizeTieredCompactionWriter.java
index 796391c..46cb891 100644
--- a/src/java/org/apache/cassandra/db/compaction/writers/SplittingSizeTieredCompactionWriter.java
+++ b/src/java/org/apache/cassandra/db/compaction/writers/SplittingSizeTieredCompactionWriter.java
@@ -17,7 +17,6 @@
  */
 package org.apache.cassandra.db.compaction.writers;
 
-import java.io.File;
 import java.util.Arrays;
 import java.util.Set;
 
@@ -51,13 +50,13 @@
     private final Set<SSTableReader> allSSTables;
     private long currentBytesToWrite;
     private int currentRatioIndex = 0;
+    private Directories.DataDirectory location;
 
     public SplittingSizeTieredCompactionWriter(ColumnFamilyStore cfs, Directories directories, LifecycleTransaction txn, Set<SSTableReader> nonExpiredSSTables)
     {
         this(cfs, directories, txn, nonExpiredSSTables, DEFAULT_SMALLEST_SSTABLE_BYTES);
     }
 
-    @SuppressWarnings("resource")
     public SplittingSizeTieredCompactionWriter(ColumnFamilyStore cfs, Directories directories, LifecycleTransaction txn, Set<SSTableReader> nonExpiredSSTables, long smallestSSTable)
     {
         super(cfs, directories, txn, nonExpiredSSTables, false, false);
@@ -82,27 +81,27 @@
             }
         }
         ratios = Arrays.copyOfRange(potentialRatios, 0, noPointIndex);
-        long currentPartitionsToWrite = Math.round(estimatedTotalKeys * ratios[currentRatioIndex]);
         currentBytesToWrite = Math.round(totalSize * ratios[currentRatioIndex]);
-        switchCompactionLocation(getWriteDirectory(currentBytesToWrite));
-        logger.trace("Ratios={}, expectedKeys = {}, totalSize = {}, currentPartitionsToWrite = {}, currentBytesToWrite = {}", ratios, estimatedTotalKeys, totalSize, currentPartitionsToWrite, currentBytesToWrite);
     }
 
     @Override
     public boolean realAppend(UnfilteredRowIterator partition)
     {
         RowIndexEntry rie = sstableWriter.append(partition);
-        if (sstableWriter.currentWriter().getOnDiskFilePointer() > currentBytesToWrite && currentRatioIndex < ratios.length - 1) // if we underestimate how many keys we have, the last sstable might get more than we expect
+        if (sstableWriter.currentWriter().getEstimatedOnDiskBytesWritten() > currentBytesToWrite && currentRatioIndex < ratios.length - 1) // if we underestimate how many keys we have, the last sstable might get more than we expect
         {
             currentRatioIndex++;
             currentBytesToWrite = Math.round(totalSize * ratios[currentRatioIndex]);
-            switchCompactionLocation(getWriteDirectory(Math.round(totalSize * ratios[currentRatioIndex])));
+            switchCompactionLocation(location);
+            logger.debug("Switching writer, currentBytesToWrite = {}", currentBytesToWrite);
         }
         return rie != null;
     }
 
+    @Override
     public void switchCompactionLocation(Directories.DataDirectory location)
     {
+        this.location = location;
         long currentPartitionsToWrite = Math.round(ratios[currentRatioIndex] * estimatedTotalKeys);
         @SuppressWarnings("resource")
         SSTableWriter writer = SSTableWriter.create(Descriptor.fromFilename(cfs.getSSTablePath(getDirectories().getLocationForDisk(location))),
@@ -111,9 +110,9 @@
                                                     cfs.metadata,
                                                     new MetadataCollector(allSSTables, cfs.metadata.comparator, 0),
                                                     SerializationHeader.make(cfs.metadata, nonExpiredSSTables),
+                                                    cfs.indexManager.listIndexes(),
                                                     txn);
         logger.trace("Switching writer, currentPartitionsToWrite = {}", currentPartitionsToWrite);
         sstableWriter.switchWriter(writer);
-
     }
 }
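The realAppend change above switches writers once getEstimatedOnDiskBytesWritten() passes currentBytesToWrite, which comes from a sequence of shrinking ratios. The ratio setup is only partly visible in this hunk; assuming the usual halving sequence (1/2, 1/4, 1/8, ... down to a smallest-sstable cutoff), the per-sstable byte budgets look roughly like this:

    import java.util.ArrayList;
    import java.util.List;

    public class SplittingBudgetSketch
    {
        // Target sstables of exponentially decreasing size: half the data, then a quarter,
        // and so on, stopping once a slice would drop below the smallest-sstable cutoff.
        static List<Long> byteBudgets(long totalSize, long smallestSSTable)
        {
            List<Long> budgets = new ArrayList<>();
            double ratio = 0.5;
            while (Math.round(totalSize * ratio) >= smallestSSTable)
            {
                budgets.add(Math.round(totalSize * ratio));
                ratio /= 2;
            }
            return budgets;
        }

        public static void main(String[] args)
        {
            // 1 GiB of data with a 50 MiB cutoff -> budgets of 512, 256, 128 and 64 MiB
            for (long budget : byteBudgets(1024L * 1024 * 1024, 50L * 1024 * 1024))
                System.out.println(budget / (1024 * 1024) + " MiB");
        }
    }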
diff --git a/src/java/org/apache/cassandra/db/filter/ClusteringIndexFilter.java b/src/java/org/apache/cassandra/db/filter/ClusteringIndexFilter.java
index e3f824f..d1907b1 100644
--- a/src/java/org/apache/cassandra/db/filter/ClusteringIndexFilter.java
+++ b/src/java/org/apache/cassandra/db/filter/ClusteringIndexFilter.java
@@ -128,14 +128,7 @@
      */
     public UnfilteredRowIterator filterNotIndexed(ColumnFilter columnFilter, UnfilteredRowIterator iterator);
 
-    /**
-     * Returns an iterator that only returns the rows of the provided sliceable iterator that this filter selects.
-     *
-     * @param iterator the sliceable iterator for which we should filter rows.
-     *
-     * @return an iterator that only returns the rows (or rather unfiltered) from {@code iterator} that are selected by this filter.
-     */
-    public UnfilteredRowIterator filter(SliceableUnfilteredRowIterator iterator);
+    public Slices getSlices(CFMetaData metadata);
 
     /**
      * Given a partition, returns a row iterator for the rows of this partition that are selected by this filter.
diff --git a/src/java/org/apache/cassandra/db/filter/ClusteringIndexNamesFilter.java b/src/java/org/apache/cassandra/db/filter/ClusteringIndexNamesFilter.java
index a81a7a6..6a010d9 100644
--- a/src/java/org/apache/cassandra/db/filter/ClusteringIndexNamesFilter.java
+++ b/src/java/org/apache/cassandra/db/filter/ClusteringIndexNamesFilter.java
@@ -126,52 +126,12 @@
         return Transformation.apply(iterator, new FilterNotIndexed());
     }
 
-    public UnfilteredRowIterator filter(final SliceableUnfilteredRowIterator iter)
+    public Slices getSlices(CFMetaData metadata)
     {
-        // Please note that this method assumes that rows from 'iter' already have their columns filtered, i.e. that
-        // they only include columns that we select.
-        return new WrappingUnfilteredRowIterator(iter)
-        {
-            private final Iterator<Clustering> clusteringIter = clusteringsInQueryOrder.iterator();
-            private Iterator<Unfiltered> currentClustering;
-            private Unfiltered next;
-
-            @Override
-            public boolean hasNext()
-            {
-                if (next != null)
-                    return true;
-
-                if (currentClustering != null && currentClustering.hasNext())
-                {
-                    next = currentClustering.next();
-                    return true;
-                }
-
-                while (clusteringIter.hasNext())
-                {
-                    Clustering nextClustering = clusteringIter.next();
-                    currentClustering = iter.slice(Slice.make(nextClustering));
-                    if (currentClustering.hasNext())
-                    {
-                        next = currentClustering.next();
-                        return true;
-                    }
-                }
-                return false;
-            }
-
-            @Override
-            public Unfiltered next()
-            {
-                if (next == null && !hasNext())
-                    throw new NoSuchElementException();
-
-                Unfiltered toReturn = next;
-                next = null;
-                return toReturn;
-            }
-        };
+        Slices.Builder builder = new Slices.Builder(metadata.comparator, clusteringsInQueryOrder.size());
+        for (Clustering clustering : clusteringsInQueryOrder)
+            builder.add(Slice.make(clustering));
+        return builder.build();
     }
 
     public UnfilteredRowIterator getUnfilteredRowIterator(final ColumnFilter columnFilter, final Partition partition)
@@ -229,7 +189,7 @@
 
     public String toCQLString(CFMetaData metadata)
     {
-        if (clusterings.isEmpty())
+        if (metadata.clusteringColumns().isEmpty() || clusterings.size() <= 1)
             return "";
 
         StringBuilder sb = new StringBuilder();
diff --git a/src/java/org/apache/cassandra/db/filter/ClusteringIndexSliceFilter.java b/src/java/org/apache/cassandra/db/filter/ClusteringIndexSliceFilter.java
index 7a174ee..ba30dcf 100644
--- a/src/java/org/apache/cassandra/db/filter/ClusteringIndexSliceFilter.java
+++ b/src/java/org/apache/cassandra/db/filter/ClusteringIndexSliceFilter.java
@@ -114,11 +114,9 @@
         return Transformation.apply(iterator, new FilterNotIndexed());
     }
 
-    public UnfilteredRowIterator filter(SliceableUnfilteredRowIterator iterator)
+    public Slices getSlices(CFMetaData metadata)
     {
-        // Please note that this method assumes that rows from 'iter' already have their columns filtered, i.e. that
-        // they only include columns that we select.
-        return slices.makeSliceIterator(iterator);
+        return slices;
     }
 
     public UnfilteredRowIterator getUnfilteredRowIterator(ColumnFilter columnFilter, Partition partition)
diff --git a/src/java/org/apache/cassandra/db/filter/ColumnFilter.java b/src/java/org/apache/cassandra/db/filter/ColumnFilter.java
index 05eade5..9c4c714 100644
--- a/src/java/org/apache/cassandra/db/filter/ColumnFilter.java
+++ b/src/java/org/apache/cassandra/db/filter/ColumnFilter.java
@@ -36,44 +36,57 @@
  * Represents which (non-PK) columns (and optionally which sub-part of a column for complex columns) are selected
  * by a query.
  *
- * In practice, this class cover 2 main cases:
- *   1) most user queries have to internally query all columns, because the CQL semantic requires us to know if
- *      a row is live or not even if it has no values for the columns requested by the user (see #6588for more
- *      details). However, while we need to know for columns if it has live values, we can actually save from
- *      sending the values for those columns that will not be returned to the user.
- *   2) for some internal queries (and for queries using #6588 if we introduce it), we're actually fine only
- *      actually querying some of the columns.
+ * We distinguish 2 sets of columns in practice: the _fetched_ columns, which are the columns that we (may, see
+ * below) need to fetch internally, and the _queried_ columns, which are the columns that the user has selected
+ * in its request.
  *
- * For complex columns, this class allows to be more fine grained than the column by only selection some of the
- * cells of the complex column (either individual cell by path name, or some slice).
+ * The reason for distinguishing those 2 sets is that due to the CQL semantic (see #6588 for more details), we
+ * often need to internally fetch all columns for the queried table, but can still do some optimizations for those
+ * columns that are not directly queried by the user (see #10657 for more details).
+ *
+ * Note that in practice:
+ *   - the _queried_ columns set is always included in the _fetched_ one.
+ *   - whenever those sets are different, we know the _fetched_ set contains all columns for the table, so we
+ *     don't have to record this set, we just keep a pointer to the table metadata. The only set we concretely
+ *     store is thus the _queried_ one.
+ *   - in the special case of a {@code SELECT *} query, we want to query all columns, and _fetched_ == _queried_.
+ *     As this is a common case, we special case it by keeping the _queried_ set {@code null} (and we retrieve
+ *     the columns through the metadata pointer).
+ *
+ * For complex columns, this class optionally allows specifying a subset of the cells to query for each column.
+ * We can either select individual cells by path name, or a slice of them. Note that this is a sub-selection of
+ * _queried_ cells, so if _fetched_ != _queried_, then the cells selected by this sub-selection are considered
+ * queried and the other ones are considered fetched (and if a column has some sub-selection, it must be a queried
+ * column, which is actually enforced by the Builder below).
  */
 public class ColumnFilter
 {
     public static final Serializer serializer = new Serializer();
 
-    // Distinguish between the 2 cases described above: if 'isFetchAll' is true, then all columns will be retrieved
-    // by the query, but the values for column/cells not selected by 'selection' and 'subSelections' will be skipped.
-    // Otherwise, only the column/cells returned by 'selection' and 'subSelections' will be returned at all.
+    // True if _fetched_ is all the columns, in which case metadata must not be null. If false,
+    // then _fetched_ == _queried_ and we only store _queried_.
     private final boolean isFetchAll;
 
     private final CFMetaData metadata; // can be null if !isFetchAll
 
-    private final PartitionColumns selection; // can be null if isFetchAll and we don't want to skip any value
+    private final PartitionColumns queried; // can be null if isFetchAll and _fetched_ == _queried_
     private final SortedSetMultimap<ColumnIdentifier, ColumnSubselection> subSelections; // can be null
 
     private ColumnFilter(boolean isFetchAll,
                          CFMetaData metadata,
-                         PartitionColumns columns,
+                         PartitionColumns queried,
                          SortedSetMultimap<ColumnIdentifier, ColumnSubselection> subSelections)
     {
+        assert !isFetchAll || metadata != null;
+        assert isFetchAll || queried != null;
         this.isFetchAll = isFetchAll;
         this.metadata = metadata;
-        this.selection = columns;
+        this.queried = queried;
         this.subSelections = subSelections;
     }
 
     /**
-     * A selection that includes all columns (and their values).
+     * A filter that includes all columns for the provided table.
      */
     public static ColumnFilter all(CFMetaData metadata)
     {
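To make the fetched/queried distinction above concrete: for a user query such as SELECT v1 FROM t, the filter has to fetch every column (to preserve the CQL liveness semantics described in the class javadoc) while only v1 is queried, which is exactly what the new fetches/fetchedColumnIsQueried pair expresses. A toy, self-contained model of those two predicates (plain strings stand in for ColumnDefinition, and the column names are invented):

    import java.util.Set;

    public class ColumnFilterModel
    {
        // Toy model: with fetchAll set, every column is fetched but only the explicitly
        // queried ones are user-visible; otherwise fetched == queried.
        final boolean fetchAll;
        final Set<String> queried; // may only be null when fetchAll is set, meaning "everything is queried"

        ColumnFilterModel(boolean fetchAll, Set<String> queried)
        {
            this.fetchAll = fetchAll;
            this.queried = queried;
        }

        boolean fetches(String column)
        {
            return fetchAll || queried.contains(column);
        }

        boolean fetchedColumnIsQueried(String column)
        {
            return !fetchAll || queried == null || queried.contains(column);
        }

        public static void main(String[] args)
        {
            // SELECT v1 FROM t: everything is fetched, only v1 is queried
            ColumnFilterModel filter = new ColumnFilterModel(true, Set.of("v1"));
            System.out.println(filter.fetches("v2"));                // true
            System.out.println(filter.fetchedColumnIsQueried("v2")); // false
        }
    }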
@@ -81,7 +94,7 @@
     }
 
     /**
-     * A selection that only fetch the provided columns.
+     * A filter that only fetches/queries the provided columns.
      * <p>
      * Note that this shouldn't be used for CQL queries in general as all columns should be queried to
      * preserve CQL semantic (see class javadoc). This is ok for some internal queries however (and
@@ -93,81 +106,93 @@
     }
 
     /**
-     * The columns that needs to be fetched internally for this selection.
-     * <p>
-     * This is the columns that must be present in the internal rows returned by queries using this selection,
-     * not the columns that are actually queried by the user (see the class javadoc for details).
+     * The columns that needs to be fetched internally for this filter.
      *
-     * @return the column to fetch for this selection.
+     * @return the columns to fetch for this filter.
      */
     public PartitionColumns fetchedColumns()
     {
-        return isFetchAll ? metadata.partitionColumns() : selection;
+        return isFetchAll ? metadata.partitionColumns() : queried;
     }
 
-    public boolean includesAllColumns()
+    /**
+     * The columns actually queried by the user.
+     * <p>
+     * Note that this is in general not all the columns that are fetched internally (see {@link #fetchedColumns}).
+     */
+    public PartitionColumns queriedColumns()
+    {
+        assert queried != null || isFetchAll;
+        return queried == null ? metadata.partitionColumns() : queried;
+    }
+
+    public boolean fetchesAllColumns()
     {
         return isFetchAll;
     }
 
     /**
-     * Whether the provided column is selected by this selection.
+     * Whether _fetched_ == _queried_ for this filter, and therefore whether the {@code fetchedColumnIsQueried}
+     * methods can ever return {@code false} for some column/cell.
      */
-    public boolean includes(ColumnDefinition column)
+    public boolean allFetchedColumnsAreQueried()
     {
-        return isFetchAll || selection.contains(column);
+        return !isFetchAll || (queried == null && subSelections == null);
     }
 
     /**
-     * Whether we can skip the value for the provided selected column.
+     * Whether the provided column is fetched by this filter.
      */
-    public boolean canSkipValue(ColumnDefinition column)
+    public boolean fetches(ColumnDefinition column)
     {
-        // We don't use that currently, see #10655 for more details.
-        return false;
+        return isFetchAll || queried.contains(column);
     }
 
     /**
-     * Whether the provided cell of a complex column is selected by this selection.
+     * Whether the provided column, which is assumed to be _fetched_ by this filter (so the caller must guarantee
+     * that {@code fetches(column) == true}), is also _queried_ by the user.
+     *
+     * !WARNING! please be sure to understand the difference between _fetched_ and _queried_
+     * columns that this class makes before using this method. If unsure, you probably want
+     * to use the {@link #fetches} method.
      */
-    public boolean includes(Cell cell)
+    public boolean fetchedColumnIsQueried(ColumnDefinition column)
     {
-        if (isFetchAll || subSelections == null || !cell.column().isComplex())
+        return !isFetchAll || queried == null || queried.contains(column);
+    }
+
+    /**
+     * Whether the provided complex cell (identified by its column and path), which is assumed to be _fetched_ by
+     * this filter, is also _queried_ by the user.
+     *
+     * !WARNING! please be sure to understand the difference between _fetched_ and _queried_
+     * columns that this class makes before using this method. If unsure, you probably want
+     * to use the {@link #fetches} method.
+     */
+    public boolean fetchedCellIsQueried(ColumnDefinition column, CellPath path)
+    {
+        assert path != null;
+        if (!isFetchAll || subSelections == null)
             return true;
 
-        SortedSet<ColumnSubselection> s = subSelections.get(cell.column().name);
+        SortedSet<ColumnSubselection> s = subSelections.get(column.name);
+        // No subselection for this column means everything is queried
         if (s.isEmpty())
             return true;
 
         for (ColumnSubselection subSel : s)
-            if (subSel.compareInclusionOf(cell.path()) == 0)
+            if (subSel.compareInclusionOf(path) == 0)
                 return true;
 
         return false;
     }
 
     /**
-     * Whether we can skip the value of the cell of a complex column.
-     */
-    public boolean canSkipValue(ColumnDefinition column, CellPath path)
-    {
-        if (!isFetchAll || subSelections == null || !column.isComplex())
-            return false;
-
-        SortedSet<ColumnSubselection> s = subSelections.get(column.name);
-        if (s.isEmpty())
-            return false;
-
-        for (ColumnSubselection subSel : s)
-            if (subSel.compareInclusionOf(path) == 0)
-                return false;
-
-        return true;
-    }
-
-    /**
      * Creates a new {@code Tester} to efficiently test the inclusion of cells of complex column
      * {@code column}.
+     *
+     * @return the created tester or {@code null} if all the cells from the provided column
+     * are queried.
      */
     public Tester newTester(ColumnDefinition column)
     {
@@ -182,8 +207,8 @@
     }
 
     /**
-     * Returns a {@code ColumnFilter}} builder that includes all columns (so the selections
-     * added to the builder are the columns/cells for which we shouldn't skip the values).
+     * Returns a {@code ColumnFilter} builder that fetches all columns (and queries the columns
+     * added to the builder, or everything if no column is added).
      */
     public static Builder allColumnsBuilder(CFMetaData metadata)
     {
@@ -191,8 +216,7 @@
     }
 
     /**
-     * Returns a {@code ColumnFilter}} builder that includes only the columns/cells
-     * added to the builder.
+     * Returns a {@code ColumnFilter} builder that only fetches the columns/cells added to the builder.
      */
     public static Builder selectionBuilder()
     {
@@ -211,17 +235,20 @@
             this.iterator = iterator;
         }
 
-        public boolean includes(CellPath path)
+        public boolean fetches(CellPath path)
         {
-            return isFetchAll || includedBySubselection(path);
+            return isFetchAll || hasSubselection(path);
         }
 
-        public boolean canSkipValue(CellPath path)
+        /**
+         * Must only be called if {@code fetches(path) == true}.
+         */
+        public boolean fetchedCellIsQueried(CellPath path)
         {
-            return isFetchAll && !includedBySubselection(path);
+            return !isFetchAll || hasSubselection(path);
         }
 
-        private boolean includedBySubselection(CellPath path)
+        private boolean hasSubselection(CellPath path)
         {
             while (current != null || iterator.hasNext())
             {
@@ -241,10 +268,22 @@
         }
     }
 
+    /**
+     * A builder for a {@code ColumnFilter} object.
+     *
+     * Note that the columns added to this builder are the _queried_ columns. Whether or not all columns
+     * are _fetched_ depends on which factory method you used to obtain this builder: allColumnsBuilder (all
+     * columns are fetched) or selectionBuilder (only the queried columns are fetched).
+     *
+     * Note that for an allColumnsBuilder, if no queried columns are added, this is interpreted as querying
+     * all columns, not querying none (but if you know you want to query all columns, prefer
+     * {@link ColumnFilter#all}). For selectionBuilder, adding no queried columns means no column will be
+     * fetched (so the resulting filter will fetch {@code PartitionColumns.NONE}).
+     */
     public static class Builder
     {
-        private final CFMetaData metadata;
-        private PartitionColumns.Builder selection;
+        private final CFMetaData metadata; // null if we don't fetch all columns
+        private PartitionColumns.Builder queriedBuilder;
         private List<ColumnSubselection> subSelections;
 
         private Builder(CFMetaData metadata)
@@ -254,17 +293,17 @@
 
         public Builder add(ColumnDefinition c)
         {
-            if (selection == null)
-                selection = PartitionColumns.builder();
-            selection.add(c);
+            if (queriedBuilder == null)
+                queriedBuilder = PartitionColumns.builder();
+            queriedBuilder.add(c);
             return this;
         }
 
         public Builder addAll(Iterable<ColumnDefinition> columns)
         {
-            if (selection == null)
-                selection = PartitionColumns.builder();
-            selection.addAll(columns);
+            if (queriedBuilder == null)
+                queriedBuilder = PartitionColumns.builder();
+            queriedBuilder.addAll(columns);
             return this;
         }
 
@@ -291,11 +330,11 @@
         {
             boolean isFetchAll = metadata != null;
 
-            PartitionColumns selectedColumns = selection == null ? null : selection.build();
-            // It's only ok to have selection == null in ColumnFilter if isFetchAll. So deal with the case of a "selection" builder
+            PartitionColumns queried = queriedBuilder == null ? null : queriedBuilder.build();
+            // It's only ok to have queried == null in ColumnFilter if isFetchAll. So deal with the case of a selectionBuilder
             // with nothing selected (we can at least happen on some backward compatible queries - CASSANDRA-10471).
-            if (!isFetchAll && selectedColumns == null)
-                selectedColumns = PartitionColumns.NONE;
+            if (!isFetchAll && queried == null)
+                queried = PartitionColumns.NONE;
 
             SortedSetMultimap<ColumnIdentifier, ColumnSubselection> s = null;
             if (subSelections != null)
@@ -305,7 +344,7 @@
                     s.put(subSelection.column().name, subSelection);
             }
 
-            return new ColumnFilter(isFetchAll, metadata, selectedColumns, s);
+            return new ColumnFilter(isFetchAll, metadata, queried, s);
         }
     }
 
@@ -315,17 +354,20 @@
         if (isFetchAll)
             return "*";
 
-        if (selection.isEmpty())
+        if (queried.isEmpty())
             return "";
 
-        Iterator<ColumnDefinition> defs = selection.selectOrderIterator();
+        Iterator<ColumnDefinition> defs = queried.selectOrderIterator();
         if (!defs.hasNext())
             return "<none>";
 
         StringBuilder sb = new StringBuilder();
-        appendColumnDef(sb, defs.next());
         while (defs.hasNext())
-            appendColumnDef(sb.append(", "), defs.next());
+        {
+            appendColumnDef(sb, defs.next());
+            if (defs.hasNext())
+                sb.append(", ");
+        }
         return sb.toString();
     }
 
@@ -352,13 +394,13 @@
     public static class Serializer
     {
         private static final int IS_FETCH_ALL_MASK       = 0x01;
-        private static final int HAS_SELECTION_MASK      = 0x02;
+        private static final int HAS_QUERIED_MASK        = 0x02;
         private static final int HAS_SUB_SELECTIONS_MASK = 0x04;
 
         private static int makeHeaderByte(ColumnFilter selection)
         {
             return (selection.isFetchAll ? IS_FETCH_ALL_MASK : 0)
-                 | (selection.selection != null ? HAS_SELECTION_MASK : 0)
+                 | (selection.queried != null ? HAS_QUERIED_MASK : 0)
                  | (selection.subSelections != null ? HAS_SUB_SELECTIONS_MASK : 0);
         }
 
@@ -366,10 +408,10 @@
         {
             out.writeByte(makeHeaderByte(selection));
 
-            if (selection.selection != null)
+            if (selection.queried != null)
             {
-                Columns.serializer.serialize(selection.selection.statics, out);
-                Columns.serializer.serialize(selection.selection.regulars, out);
+                Columns.serializer.serialize(selection.queried.statics, out);
+                Columns.serializer.serialize(selection.queried.regulars, out);
             }
 
             if (selection.subSelections != null)
@@ -384,15 +426,15 @@
         {
             int header = in.readUnsignedByte();
             boolean isFetchAll = (header & IS_FETCH_ALL_MASK) != 0;
-            boolean hasSelection = (header & HAS_SELECTION_MASK) != 0;
+            boolean hasQueried = (header & HAS_QUERIED_MASK) != 0;
             boolean hasSubSelections = (header & HAS_SUB_SELECTIONS_MASK) != 0;
 
-            PartitionColumns selection = null;
-            if (hasSelection)
+            PartitionColumns queried = null;
+            if (hasQueried)
             {
                 Columns statics = Columns.serializer.deserialize(in, metadata);
                 Columns regulars = Columns.serializer.deserialize(in, metadata);
-                selection = new PartitionColumns(statics, regulars);
+                queried = new PartitionColumns(statics, regulars);
             }
 
             SortedSetMultimap<ColumnIdentifier, ColumnSubselection> subSelections = null;
@@ -407,17 +449,17 @@
                 }
             }
 
-            return new ColumnFilter(isFetchAll, isFetchAll ? metadata : null, selection, subSelections);
+            return new ColumnFilter(isFetchAll, isFetchAll ? metadata : null, queried, subSelections);
         }
 
         public long serializedSize(ColumnFilter selection, int version)
         {
             long size = 1; // header byte
 
-            if (selection.selection != null)
+            if (selection.queried != null)
             {
-                size += Columns.serializer.serializedSize(selection.selection.statics);
-                size += Columns.serializer.serializedSize(selection.selection.regulars);
+                size += Columns.serializer.serializedSize(selection.queried.statics);
+                size += Columns.serializer.serializedSize(selection.queried.regulars);
             }
 
             if (selection.subSelections != null)
diff --git a/src/java/org/apache/cassandra/db/filter/DataLimits.java b/src/java/org/apache/cassandra/db/filter/DataLimits.java
index 94f43dc..85cae0c 100644
--- a/src/java/org/apache/cassandra/db/filter/DataLimits.java
+++ b/src/java/org/apache/cassandra/db/filter/DataLimits.java
@@ -72,12 +72,21 @@
 
     public static DataLimits cqlLimits(int cqlRowLimit)
     {
-        return new CQLLimits(cqlRowLimit);
+        return cqlRowLimit == NO_LIMIT ? NONE : new CQLLimits(cqlRowLimit);
     }
 
     public static DataLimits cqlLimits(int cqlRowLimit, int perPartitionLimit)
     {
-        return new CQLLimits(cqlRowLimit, perPartitionLimit);
+        return cqlRowLimit == NO_LIMIT && perPartitionLimit == NO_LIMIT
+             ? NONE
+             : new CQLLimits(cqlRowLimit, perPartitionLimit);
+    }
+
+    private static DataLimits cqlLimits(int cqlRowLimit, int perPartitionLimit, boolean isDistinct)
+    {
+        return cqlRowLimit == NO_LIMIT && perPartitionLimit == NO_LIMIT && !isDistinct
+             ? NONE
+             : new CQLLimits(cqlRowLimit, perPartitionLimit, isDistinct);
     }
 
     public static DataLimits distinctLimits(int cqlRowLimit)
@@ -765,7 +774,7 @@
                     int perPartitionLimit = (int)in.readUnsignedVInt();
                     boolean isDistinct = in.readBoolean();
                     if (kind == Kind.CQL_LIMIT)
-                        return new CQLLimits(rowLimit, perPartitionLimit, isDistinct);
+                        return cqlLimits(rowLimit, perPartitionLimit, isDistinct);
 
                     ByteBuffer lastKey = ByteBufferUtil.readWithVIntLength(in);
                     int lastRemaining = (int)in.readUnsignedVInt();
diff --git a/src/java/org/apache/cassandra/db/filter/RowFilter.java b/src/java/org/apache/cassandra/db/filter/RowFilter.java
index 11cfb87..6626275 100644
--- a/src/java/org/apache/cassandra/db/filter/RowFilter.java
+++ b/src/java/org/apache/cassandra/db/filter/RowFilter.java
@@ -20,8 +20,13 @@
 import java.io.IOException;
 import java.nio.ByteBuffer;
 import java.util.*;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.atomic.AtomicInteger;
 
 import com.google.common.base.Objects;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.ColumnDefinition;
@@ -29,7 +34,8 @@
 import org.apache.cassandra.db.*;
 import org.apache.cassandra.db.context.*;
 import org.apache.cassandra.db.marshal.*;
-import org.apache.cassandra.db.partitions.*;
+import org.apache.cassandra.db.partitions.ImmutableBTreePartition;
+import org.apache.cassandra.db.partitions.UnfilteredPartitionIterator;
 import org.apache.cassandra.db.rows.*;
 import org.apache.cassandra.db.transform.Transformation;
 import org.apache.cassandra.exceptions.InvalidRequestException;
@@ -54,6 +60,8 @@
  */
 public abstract class RowFilter implements Iterable<RowFilter.Expression>
 {
+    private static final Logger logger = LoggerFactory.getLogger(RowFilter.class);
+
     public static final Serializer serializer = new Serializer();
     public static final RowFilter NONE = new CQLFilter(Collections.emptyList());
 
@@ -79,9 +87,11 @@
         return new ThriftFilter(new ArrayList<>(capacity));
     }
 
-    public void add(ColumnDefinition def, Operator op, ByteBuffer value)
+    public SimpleExpression add(ColumnDefinition def, Operator op, ByteBuffer value)
     {
-        add(new SimpleExpression(def, op, value));
+        SimpleExpression expression = new SimpleExpression(def, op, value);
+        add(expression);
+        return expression;
     }
 
     public void addMapEquality(ColumnDefinition def, ByteBuffer key, Operator op, ByteBuffer value)
@@ -106,6 +116,11 @@
         expressions.add(expression);
     }
 
+    public void addUserExpression(UserExpression e)
+    {
+        expressions.add(e);
+    }
+
     public List<Expression> getExpressions()
     {
         return expressions;
@@ -202,6 +217,11 @@
         return withNewExpressions(newExpressions);
     }
 
+    public RowFilter withoutExpressions()
+    {
+        return withNewExpressions(Collections.emptyList());
+    }
+
     protected abstract RowFilter withNewExpressions(List<Expression> expressions);
 
     public boolean isEmpty()
@@ -220,11 +240,11 @@
         if (metadata.isCompound())
         {
             List<ByteBuffer> values = CompositeType.splitName(name);
-            return new Clustering(values.toArray(new ByteBuffer[metadata.comparator.size()]));
+            return Clustering.make(values.toArray(new ByteBuffer[metadata.comparator.size()]));
         }
         else
         {
-            return new Clustering(name);
+            return Clustering.make(name);
         }
     }
 
@@ -263,13 +283,13 @@
                 DecoratedKey pk;
                 public UnfilteredRowIterator applyToPartition(UnfilteredRowIterator partition)
                 {
+                    pk = partition.partitionKey();
+
                     // The filter might be on static columns, so need to check static row first.
                     if (filterStaticColumns && applyToRow(partition.staticRow()) == null)
                         return null;
 
-                    pk = partition.partitionKey();
                     UnfilteredRowIterator iterator = Transformation.apply(partition, this);
-
                     return (filterNonStaticColumns && !iterator.hasNext()) ? null : iterator;
                 }
 
@@ -345,9 +365,9 @@
         private static final Serializer serializer = new Serializer();
 
         // Note: the order of this enum matters, it's used for serialization
-        protected enum Kind { SIMPLE, MAP_EQUALITY, THRIFT_DYN_EXPR, CUSTOM }
+        protected enum Kind { SIMPLE, MAP_EQUALITY, THRIFT_DYN_EXPR, CUSTOM, USER }
 
-        abstract Kind kind();
+        protected abstract Kind kind();
         protected final ColumnDefinition column;
         protected final Operator operator;
         protected final ByteBuffer value;
@@ -364,6 +384,11 @@
             return kind() == Kind.CUSTOM;
         }
 
+        public boolean isUserDefined()
+        {
+            return kind() == Kind.USER;
+        }
+
         public ColumnDefinition column()
         {
             return column;
@@ -486,6 +511,13 @@
                     return;
                 }
 
+                if (expression.kind() == Kind.USER)
+                {
+                    assert version >= MessagingService.VERSION_30;
+                    UserExpression.serialize((UserExpression)expression, out, version);
+                    return;
+                }
+
                 ByteBufferUtil.writeWithShortLength(expression.column.name.bytes, out);
                 expression.operator.writeTo(out);
 
@@ -529,6 +561,11 @@
                                                     IndexMetadata.serializer.deserialize(in, version, metadata),
                                                     ByteBufferUtil.readWithShortLength(in));
                     }
+
+                    if (kind == Kind.USER)
+                    {
+                        return UserExpression.deserialize(in, version, metadata);
+                    }
                 }
 
                 name = ByteBufferUtil.readWithShortLength(in);
@@ -578,8 +615,11 @@
                 // version 3.0+ includes a byte for Kind
                 long size = version >= MessagingService.VERSION_30 ? 1 : 0;
 
-                // custom expressions don't include a column or operator, all other expressions do
-                if (expression.kind() != Kind.CUSTOM)
+                // Custom and user-defined expressions include neither a column nor an
+                // operator, but all other expressions do. Also, custom and user-defined
+                // expressions are 3.0+ only, so the column & operator will always be the
+                // first things written for any pre-3.0 version.
+                if (expression.kind() != Kind.CUSTOM && expression.kind() != Kind.USER)
                     size += ByteBufferUtil.serializedSizeWithShortLength(expression.column().name.bytes)
                             + expression.operator.serializedSize();
 
@@ -602,8 +642,11 @@
                     case CUSTOM:
                         if (version >= MessagingService.VERSION_30)
                             size += IndexMetadata.serializer.serializedSize(((CustomExpression)expression).targetIndex, version)
-                                  + ByteBufferUtil.serializedSizeWithShortLength(expression.value);
+                                   + ByteBufferUtil.serializedSizeWithShortLength(expression.value);
                         break;
+                    case USER:
+                        if (version >= MessagingService.VERSION_30)
+                            size += UserExpression.serializedSize((UserExpression)expression, version);
                 }
                 return size;
             }
@@ -613,9 +656,9 @@
     /**
      * An expression of the form 'column' 'op' 'value'.
      */
-    private static class SimpleExpression extends Expression
+    public static class SimpleExpression extends Expression
     {
-        public SimpleExpression(ColumnDefinition column, Operator operator, ByteBuffer value)
+        SimpleExpression(ColumnDefinition column, Operator operator, ByteBuffer value)
         {
             super(column, operator, value);
         }
@@ -658,6 +701,10 @@
                         }
                     }
                 case NEQ:
+                case LIKE_PREFIX:
+                case LIKE_SUFFIX:
+                case LIKE_CONTAINS:
+                case LIKE_MATCHES:
                     {
                         assert !column.isComplex() : "Only CONTAINS and CONTAINS_KEY are supported for 'complex' types";
                         ByteBuffer foundValue = getValue(metadata, partitionKey, row);
@@ -751,7 +798,7 @@
         }
 
         @Override
-        Kind kind()
+        protected Kind kind()
         {
             return Kind.SIMPLE;
         }
@@ -845,7 +892,7 @@
         }
 
         @Override
-        Kind kind()
+        protected Kind kind()
         {
             return Kind.MAP_EQUALITY;
         }
@@ -893,7 +940,7 @@
         }
 
         @Override
-        Kind kind()
+        protected Kind kind()
         {
             return Kind.THRIFT_DYN_EXPR;
         }
@@ -943,7 +990,7 @@
                                          .customExpressionValueType());
         }
 
-        Kind kind()
+        protected Kind kind()
         {
             return Kind.CUSTOM;
         }
@@ -955,6 +1002,100 @@
         }
     }
 
+    /**
+     * A user defined filtering expression. These may be added to RowFilter programmatically by a
+     * QueryHandler implementation. No concrete implementations are provided and adding custom
+     * implementations to the classpath is a task for operators (needless to say, this is something
+     * of a power user feature). Care must also be taken to register implementations via the static
+     * register method during system startup. An implementation and its corresponding Deserializer
+     * must be registered before sending or receiving any messages containing expressions of that type.
+     * Use of custom filtering expressions in a mixed version cluster should be handled with caution
+     * as the order in which types are registered is significant: if continuity of use during upgrades
+     * is important, new types should be registered last and obsoleted types should still be registered
+     * (or dummy implementations registered in their place) to preserve consistent identifiers across
+     * the cluster.
+     *
+     * During serialization, the identifier for the Deserializer implementation is prepended to the
+     * implementation specific payload. To deserialize, the identifier is read first to obtain the
+     * Deserializer, which then provides the concrete expression instance.
+     */
+    public static abstract class UserExpression extends Expression
+    {
+        private static final DeserializerRegistry deserializers = new DeserializerRegistry();
+        private static final class DeserializerRegistry
+        {
+            private final AtomicInteger counter = new AtomicInteger(0);
+            private final ConcurrentMap<Integer, Deserializer> deserializers = new ConcurrentHashMap<>();
+            private final ConcurrentMap<Class<? extends UserExpression>, Integer> registeredClasses = new ConcurrentHashMap<>();
+
+            public void registerUserExpressionClass(Class<? extends UserExpression> expressionClass,
+                                                    UserExpression.Deserializer deserializer)
+            {
+                int id = registeredClasses.computeIfAbsent(expressionClass, (cls) -> counter.getAndIncrement());
+                deserializers.put(id, deserializer);
+
+                logger.debug("Registered user defined expression type {} and serializer {} with identifier {}",
+                             expressionClass.getName(), deserializer.getClass().getName(), id);
+            }
+
+            public Integer getId(UserExpression expression)
+            {
+                return registeredClasses.get(expression.getClass());
+            }
+
+            public Deserializer getDeserializer(int id)
+            {
+                return deserializers.get(id);
+            }
+        }
+
+        protected static abstract class Deserializer
+        {
+            protected abstract UserExpression deserialize(DataInputPlus in,
+                                                          int version,
+                                                          CFMetaData metadata) throws IOException;
+        }
+
+        public static void register(Class<? extends UserExpression> expressionClass, Deserializer deserializer)
+        {
+            deserializers.registerUserExpressionClass(expressionClass, deserializer);
+        }
+
+        private static UserExpression deserialize(DataInputPlus in, int version, CFMetaData metadata) throws IOException
+        {
+            int id = in.readInt();
+            Deserializer deserializer = deserializers.getDeserializer(id);
+            assert deserializer != null : "No user defined expression type registered with id " + id;
+            return deserializer.deserialize(in, version, metadata);
+        }
+
+        private static void serialize(UserExpression expression, DataOutputPlus out, int version) throws IOException
+        {
+            Integer id = deserializers.getId(expression);
+            assert id != null : "User defined expression type " + expression.getClass().getName() + " is not registered";
+            out.writeInt(id);
+            expression.serialize(out, version);
+        }
+
+        private static long serializedSize(UserExpression expression, int version)
+        {   // 4 bytes for the expression type id
+            return 4 + expression.serializedSize(version);
+        }
+
+        protected UserExpression(ColumnDefinition column, Operator operator, ByteBuffer value)
+        {
+            super(column, operator, value);
+        }
+
+        protected Kind kind()
+        {
+            return Kind.USER;
+        }
+
+        protected abstract void serialize(DataOutputPlus out, int version) throws IOException;
+        protected abstract long serializedSize(int version);
+    }
+
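The javadoc above only describes the UserExpression contract in prose, so a hypothetical sketch of a plugin-side implementation may help. Nothing below is part of this patch: the expression class, its payload layout and the registerAtStartup() hook are invented for illustration, and only register(), Deserializer, addUserExpression(), serialize() and serializedSize() come from the hunks above.

    import java.io.IOException;
    import java.nio.ByteBuffer;

    import org.apache.cassandra.config.CFMetaData;
    import org.apache.cassandra.config.ColumnDefinition;
    import org.apache.cassandra.cql3.Operator;
    import org.apache.cassandra.db.DecoratedKey;
    import org.apache.cassandra.db.filter.RowFilter;
    import org.apache.cassandra.db.rows.Cell;
    import org.apache.cassandra.db.rows.Row;
    import org.apache.cassandra.io.util.DataInputPlus;
    import org.apache.cassandra.io.util.DataOutputPlus;
    import org.apache.cassandra.utils.ByteBufferUtil;

    // Hypothetical plugin-side expression: a QueryHandler could attach it to a read's
    // RowFilter via addUserExpression(). It must be registered once at startup, before any
    // message containing it is sent or received.
    public final class AgeAboveExpression extends RowFilter.UserExpression
    {
        public static void registerAtStartup()
        {
            RowFilter.UserExpression.register(AgeAboveExpression.class, new Deserializer()
            {
                protected RowFilter.UserExpression deserialize(DataInputPlus in, int version, CFMetaData metadata) throws IOException
                {
                    // Mirror of serialize() below: column name first, then the bound value
                    ColumnDefinition column = metadata.getColumnDefinition(ByteBufferUtil.readWithShortLength(in));
                    return new AgeAboveExpression(column, ByteBufferUtil.readWithShortLength(in));
                }
            });
        }

        public AgeAboveExpression(ColumnDefinition column, ByteBuffer value)
        {
            super(column, Operator.GT, value);
        }

        public boolean isSatisfiedBy(CFMetaData metadata, DecoratedKey partitionKey, Row row)
        {
            // Illustration only: compare the stored cell against the bound value using the operator
            Cell cell = row.getCell(column());
            return cell != null && operator.isSatisfiedBy(column().type, cell.value(), value);
        }

        protected void serialize(DataOutputPlus out, int version) throws IOException
        {
            // Implementation-specific payload, written after the registry identifier
            ByteBufferUtil.writeWithShortLength(column().name.bytes, out);
            ByteBufferUtil.writeWithShortLength(value, out);
        }

        protected long serializedSize(int version)
        {
            return ByteBufferUtil.serializedSizeWithShortLength(column().name.bytes)
                 + ByteBufferUtil.serializedSizeWithShortLength(value);
        }

        @Override
        public String toString()
        {
            return String.format("%s > ?", column().name);
        }
    }

On the wire, UserExpression.serialize() writes the registered integer identifier first and then the payload produced above, which is why registration order has to stay stable across a mixed version cluster.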
     public static class Serializer
     {
         public void serialize(RowFilter filter, DataOutputPlus out, int version) throws IOException
diff --git a/src/java/org/apache/cassandra/db/lifecycle/LifecycleTransaction.java b/src/java/org/apache/cassandra/db/lifecycle/LifecycleTransaction.java
index 91515aa..2311143 100644
--- a/src/java/org/apache/cassandra/db/lifecycle/LifecycleTransaction.java
+++ b/src/java/org/apache/cassandra/db/lifecycle/LifecycleTransaction.java
@@ -525,9 +525,9 @@
         log.untrackNew(table);
     }
 
-    public static void removeUnfinishedLeftovers(CFMetaData metadata)
+    public static boolean removeUnfinishedLeftovers(CFMetaData metadata)
     {
-        LogTransaction.removeUnfinishedLeftovers(metadata);
+        return LogTransaction.removeUnfinishedLeftovers(metadata);
     }
 
     /**
diff --git a/src/java/org/apache/cassandra/db/lifecycle/LogAwareFileLister.java b/src/java/org/apache/cassandra/db/lifecycle/LogAwareFileLister.java
index e9072c4..a22f6ef 100644
--- a/src/java/org/apache/cassandra/db/lifecycle/LogAwareFileLister.java
+++ b/src/java/org/apache/cassandra/db/lifecycle/LogAwareFileLister.java
@@ -169,13 +169,12 @@
 
         logger.error("Failed to classify files in {}\n" +
                      "Some old files are missing but the txn log is still there and not completed\n" +
-                     "Files in folder:\n{}\nTxn: {}\n{}",
+                     "Files in folder:\n{}\nTxn: {}",
                      folder,
                      files.isEmpty()
                         ? "\t-"
                         : String.join("\n", files.keySet().stream().map(f -> String.format("\t%s", f)).collect(Collectors.toList())),
-                     txnFile.toString(),
-                     String.join("\n", txnFile.getRecords().stream().map(r -> String.format("\t%s", r)).collect(Collectors.toList())));
+                     txnFile.toString(true));
 
         // some old files are missing and yet the txn is still there and not completed
         // something must be wrong (see comment at the top of LogTransaction requiring txn to be
diff --git a/src/java/org/apache/cassandra/db/lifecycle/LogFile.java b/src/java/org/apache/cassandra/db/lifecycle/LogFile.java
index 6d0c835..201f04d 100644
--- a/src/java/org/apache/cassandra/db/lifecycle/LogFile.java
+++ b/src/java/org/apache/cassandra/db/lifecycle/LogFile.java
@@ -94,9 +94,9 @@
         return new LogFile(operationType, id, logReplicas);
     }
 
-    Throwable syncFolder(Throwable accumulate)
+    Throwable syncDirectory(Throwable accumulate)
     {
-        return replicas.syncFolder(accumulate);
+        return replicas.syncDirectory(accumulate);
     }
 
     OperationType type()
@@ -115,9 +115,9 @@
         {
             deleteFilesForRecordsOfType(committed() ? Type.REMOVE : Type.ADD);
 
-            // we sync the parent folders between contents and log deletion
+            // we sync the parent directories between contents and log deletion
             // to ensure there is a happens before edge between them
-            Throwables.maybeFail(syncFolder(accumulate));
+            Throwables.maybeFail(syncDirectory(accumulate));
 
             accumulate = replicas.delete(accumulate);
         }
@@ -151,7 +151,7 @@
         records.clear();
         if (!replicas.readRecords(records))
         {
-            logger.error("Failed to read records from {}", replicas);
+            logger.error("Failed to read records for transaction log {}", this);
             return false;
         }
 
@@ -164,7 +164,7 @@
         LogRecord failedOn = firstInvalid.get();
         if (getLastRecord() != failedOn)
         {
-            logError(failedOn);
+            setErrorInReplicas(failedOn);
             return false;
         }
 
@@ -172,10 +172,10 @@
         if (records.stream()
                    .filter((r) -> r != failedOn)
                    .filter(LogRecord::isInvalid)
-                   .map(LogFile::logError)
+                   .map(this::setErrorInReplicas)
                    .findFirst().isPresent())
         {
-            logError(failedOn);
+            setErrorInReplicas(failedOn);
             return false;
         }
 
@@ -188,9 +188,9 @@
         return true;
     }
 
-    static LogRecord logError(LogRecord record)
+    LogRecord setErrorInReplicas(LogRecord record)
     {
-        logger.error("{}", record.error());
+        replicas.setErrorInReplicas(record);
         return record;
     }
 
@@ -198,9 +198,8 @@
     {
         if (record.checksum != record.computeChecksum())
         {
-            record.setError(String.format("Invalid checksum for sstable [%s], record [%s]: [%d] should have been [%d]",
+            record.setError(String.format("Invalid checksum for sstable [%s]: [%d] should have been [%d]",
                                           record.fileName(),
-                                          record,
                                           record.checksum,
                                           record.computeChecksum()));
             return;
@@ -218,10 +217,9 @@
         record.status.onDiskRecord = record.withExistingFiles();
         if (record.updateTime != record.status.onDiskRecord.updateTime && record.status.onDiskRecord.numFiles > 0)
         {
-            record.setError(String.format("Unexpected files detected for sstable [%s], " +
-                                          "record [%s]: last update time [%tT] should have been [%tT]",
+            record.setError(String.format("Unexpected files detected for sstable [%s]: " +
+                                          "last update time [%tT] should have been [%tT]",
                                           record.fileName(),
-                                          record,
                                           record.status.onDiskRecord.updateTime,
                                           record.updateTime));
 
@@ -233,11 +231,9 @@
         if (record.type == Type.REMOVE && record.status.onDiskRecord.numFiles < record.numFiles)
         { // if we found a corruption in the last record, then we continue only
           // if the number of files matches exactly for all previous records.
-            record.setError(String.format("Incomplete fileset detected for sstable [%s], record [%s]: " +
-                                          "number of files [%d] should have been [%d]. Treating as unrecoverable " +
-                                          "due to corruption of the final record.",
+            record.setError(String.format("Incomplete fileset detected for sstable [%s]: " +
+                                          "number of files [%d] should have been [%d].",
                                           record.fileName(),
-                                          record.raw,
                                           record.status.onDiskRecord.numFiles,
                                           record.numFiles));
         }
@@ -288,8 +284,9 @@
     {
         assert type == Type.ADD || type == Type.REMOVE;
 
-        File folder = table.descriptor.directory;
-        replicas.maybeCreateReplica(folder, getFileName(folder), records);
+        File directory = table.descriptor.directory;
+        String fileName = StringUtils.join(directory, File.separator, getFileName());
+        replicas.maybeCreateReplica(directory, fileName, records);
         return LogRecord.make(type, table);
     }
 
@@ -382,7 +379,25 @@
     @Override
     public String toString()
     {
-        return replicas.toString();
+        return toString(false);
+    }
+
+    public String toString(boolean showContents)
+    {
+        StringBuilder str = new StringBuilder();
+        str.append('[');
+        str.append(getFileName());
+        str.append(" in ");
+        str.append(replicas.getDirectories());
+        str.append(']');
+        if (showContents)
+        {
+            str.append(System.lineSeparator());
+            str.append("Files and contents follow:");
+            str.append(System.lineSeparator());
+            replicas.printContentsWithAnyErrors(str);
+        }
+        return str.toString();
     }
 
     @VisibleForTesting
@@ -397,21 +412,15 @@
         return replicas.getFilePaths();
     }
 
-    private String getFileName(File folder)
+    private String getFileName()
     {
-        String fileName = StringUtils.join(BigFormat.latestVersion,
-                                           LogFile.SEP,
-                                           "txn",
-                                           LogFile.SEP,
-                                           type.fileName,
-                                           LogFile.SEP,
-                                           id.toString(),
-                                           LogFile.EXT);
-        return StringUtils.join(folder, File.separator, fileName);
-    }
-
-    Collection<LogRecord> getRecords()
-    {
-        return records;
+        return StringUtils.join(BigFormat.latestVersion,
+                                LogFile.SEP,
+                                "txn",
+                                LogFile.SEP,
+                                type.fileName,
+                                LogFile.SEP,
+                                id.toString(),
+                                LogFile.EXT);
     }
 }
diff --git a/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java b/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java
index d7eb774..02f8841 100644
--- a/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java
+++ b/src/java/org/apache/cassandra/db/lifecycle/LogRecord.java
@@ -125,11 +125,13 @@
                                  matcher.group(2),
                                  Long.valueOf(matcher.group(3)),
                                  Integer.valueOf(matcher.group(4)),
-                                 Long.valueOf(matcher.group(5)), line);
+                                 Long.valueOf(matcher.group(5)),
+                                 line);
         }
-        catch (Throwable t)
+        catch (IllegalArgumentException e)
         {
-            return new LogRecord(Type.UNKNOWN, null, 0, 0, 0, line).setError(t);
+            return new LogRecord(Type.UNKNOWN, null, 0, 0, 0, line)
+                   .setError(String.format("Failed to parse line: %s", e.getMessage()));
         }
     }
 
@@ -200,11 +202,6 @@
         }
     }
 
-    LogRecord setError(Throwable t)
-    {
-        return setError(t.getMessage());
-    }
-
     LogRecord setError(String error)
     {
         status.setError(error);
diff --git a/src/java/org/apache/cassandra/db/lifecycle/LogReplica.java b/src/java/org/apache/cassandra/db/lifecycle/LogReplica.java
index 79b9749..da90f88 100644
--- a/src/java/org/apache/cassandra/db/lifecycle/LogReplica.java
+++ b/src/java/org/apache/cassandra/db/lifecycle/LogReplica.java
@@ -19,6 +19,9 @@
 package org.apache.cassandra.db.lifecycle;
 
 import java.io.File;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
 
 import org.apache.cassandra.io.util.FileUtils;
 import org.apache.cassandra.utils.CLibrary;
@@ -26,7 +29,7 @@
 /**
  * Because a column family may have sstables on different disks and disks can
  * be removed, we duplicate log files into many replicas so as to have a file
- * in each folder where sstables exist.
+ * in each directory where sstables exist.
  *
  * Each replica contains the exact same content but we do allow for final
  * partial records in case we crashed after writing to one replica but
@@ -37,11 +40,12 @@
 final class LogReplica
 {
     private final File file;
-    private int folderDescriptor;
+    private int directoryDescriptor;
+    private final Map<String, String> errors = new HashMap<>();
 
-    static LogReplica create(File folder, String fileName)
+    static LogReplica create(File directory, String fileName)
     {
-        return new LogReplica(new File(fileName), CLibrary.tryOpenDirectory(folder.getPath()));
+        return new LogReplica(new File(fileName), CLibrary.tryOpenDirectory(directory.getPath()));
     }
 
     static LogReplica open(File file)
@@ -49,10 +53,10 @@
         return new LogReplica(file, CLibrary.tryOpenDirectory(file.getParentFile().getPath()));
     }
 
-    LogReplica(File file, int folderDescriptor)
+    LogReplica(File file, int directoryDescriptor)
     {
         this.file = file;
-        this.folderDescriptor = folderDescriptor;
+        this.directoryDescriptor = directoryDescriptor;
     }
 
     File file()
@@ -60,27 +64,42 @@
         return file;
     }
 
+    List<String> readLines()
+    {
+        return FileUtils.readLines(file);
+    }
+
+    String getFileName()
+    {
+        return file.getName();
+    }
+
+    String getDirectory()
+    {
+        return file.getParent();
+    }
+
     void append(LogRecord record)
     {
         boolean existed = exists();
         FileUtils.appendAndSync(file, record.toString());
 
         // If the file did not exist before appending the first
-        // line, then sync the folder as well since now it must exist
+        // line, then sync the directory as well since now it must exist
         if (!existed)
-            syncFolder();
+            syncDirectory();
     }
 
-    void syncFolder()
+    void syncDirectory()
     {
-        if (folderDescriptor >= 0)
-            CLibrary.trySync(folderDescriptor);
+        if (directoryDescriptor >= 0)
+            CLibrary.trySync(directoryDescriptor);
     }
 
     void delete()
     {
         LogTransaction.delete(file);
-        syncFolder();
+        syncDirectory();
     }
 
     boolean exists()
@@ -90,10 +109,10 @@
 
     void close()
     {
-        if (folderDescriptor >= 0)
+        if (directoryDescriptor >= 0)
         {
-            CLibrary.tryCloseFD(folderDescriptor);
-            folderDescriptor = -1;
+            CLibrary.tryCloseFD(directoryDescriptor);
+            directoryDescriptor = -1;
         }
     }
 
@@ -102,4 +121,31 @@
     {
         return String.format("[%s] ", file);
     }
+
+    void setError(String line, String error)
+    {
+        errors.put(line, error);
+    }
+
+    void printContentsWithAnyErrors(StringBuilder str)
+    {
+        str.append(file.getPath());
+        str.append(System.lineSeparator());
+        FileUtils.readLines(file).forEach(line -> printLineWithAnyError(str, line));
+    }
+
+    private void printLineWithAnyError(StringBuilder str, String line)
+    {
+        str.append('\t');
+        str.append(line);
+        str.append(System.lineSeparator());
+
+        String error = errors.get(line);
+        if (error != null)
+        {
+            str.append("\t\t***");
+            str.append(error);
+            str.append(System.lineSeparator());
+        }
+    }
 }
diff --git a/src/java/org/apache/cassandra/db/lifecycle/LogReplicaSet.java b/src/java/org/apache/cassandra/db/lifecycle/LogReplicaSet.java
index d9d9213..47a9901 100644
--- a/src/java/org/apache/cassandra/db/lifecycle/LogReplicaSet.java
+++ b/src/java/org/apache/cassandra/db/lifecycle/LogReplicaSet.java
@@ -32,14 +32,14 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
-import org.apache.cassandra.io.util.FileUtils;
 import org.apache.cassandra.utils.Throwables;
 
 /**
  * A set of log replicas. This class mostly iterates over replicas when writing or reading,
  * ensuring consistency among them and hiding replication details from LogFile.
  *
- * @see LogReplica, LogFile
+ * @see LogReplica
+ * @see LogFile
  */
 public class LogReplicaSet
 {
@@ -59,31 +59,31 @@
 
     void addReplica(File file)
     {
-        File folder = file.getParentFile();
-        assert !replicasByFile.containsKey(folder);
-        replicasByFile.put(folder, LogReplica.open(file));
+        File directory = file.getParentFile();
+        assert !replicasByFile.containsKey(directory);
+        replicasByFile.put(directory, LogReplica.open(file));
 
         if (logger.isTraceEnabled())
             logger.trace("Added log file replica {} ", file);
     }
 
-    void maybeCreateReplica(File folder, String fileName, Set<LogRecord> records)
+    void maybeCreateReplica(File directory, String fileName, Set<LogRecord> records)
     {
-        if (replicasByFile.containsKey(folder))
+        if (replicasByFile.containsKey(directory))
             return;
 
-        final LogReplica replica = LogReplica.create(folder, fileName);
+        final LogReplica replica = LogReplica.create(directory, fileName);
 
         records.forEach(replica::append);
-        replicasByFile.put(folder, replica);
+        replicasByFile.put(directory, replica);
 
         if (logger.isTraceEnabled())
             logger.trace("Created new file replica {}", replica);
     }
 
-    Throwable syncFolder(Throwable accumulate)
+    Throwable syncDirectory(Throwable accumulate)
     {
-        return Throwables.perform(accumulate, replicas().stream().map(s -> s::syncFolder));
+        return Throwables.perform(accumulate, replicas().stream().map(s -> s::syncDirectory));
     }
 
     Throwable delete(Throwable accumulate)
@@ -100,15 +100,18 @@
 
     boolean readRecords(Set<LogRecord> records)
     {
-        Map<File, List<String>> linesByReplica = replicas().stream()
-                                                           .map(LogReplica::file)
-                                                           .collect(Collectors.toMap(Function.<File>identity(), FileUtils::readLines));
+        Map<LogReplica, List<String>> linesByReplica = replicas().stream()
+                                                                 .collect(Collectors.toMap(Function.<LogReplica>identity(),
+                                                                                           LogReplica::readLines,
+                                                                                           (k, v) -> {throw new IllegalStateException("Duplicated key: " + k);},
+                                                                                           LinkedHashMap::new));
+
         int maxNumLines = linesByReplica.values().stream().map(List::size).reduce(0, Integer::max);
         for (int i = 0; i < maxNumLines; i++)
         {
             String firstLine = null;
             boolean partial = false;
-            for (Map.Entry<File, List<String>> entry : linesByReplica.entrySet())
+            for (Map.Entry<LogReplica, List<String>> entry : linesByReplica.entrySet())
             {
                 List<String> currentLines = entry.getValue();
                 if (i >= currentLines.size())
@@ -124,9 +127,10 @@
                 if (!isPrefixMatch(firstLine, currentLine))
                 { // not a prefix match
                     logger.error("Mismatched line in file {}: got '{}' expected '{}', giving up",
-                                 entry.getKey().getName(),
+                                 entry.getKey().getFileName(),
                                  currentLine,
                                  firstLine);
+                    entry.getKey().setError(currentLine, String.format("Does not match <%s> in first replica file", firstLine));
                     return false;
                 }
 
@@ -135,7 +139,7 @@
                     if (i == currentLines.size() - 1)
                     { // last record, just set record as invalid and move on
                         logger.warn("Mismatched last line in file {}: '{}' not the same as '{}'",
-                                    entry.getKey().getName(),
+                                    entry.getKey().getFileName(),
                                     currentLine,
                                     firstLine);
 
@@ -147,9 +151,10 @@
                     else
                     {   // mismatched entry file has more lines, giving up
                         logger.error("Mismatched line in file {}: got '{}' expected '{}', giving up",
-                                     entry.getKey().getName(),
+                                     entry.getKey().getFileName(),
                                      currentLine,
                                      firstLine);
+                        entry.getKey().setError(currentLine, String.format("Does not match <%s> in first replica file", firstLine));
                         return false;
                     }
                 }
@@ -159,6 +164,7 @@
             if (records.contains(record))
             { // duplicate records
                 logger.error("Found duplicate record {} for {}, giving up", record, record.fileName());
+                setError(record, "Duplicated record");
                 return false;
             }
 
@@ -170,6 +176,7 @@
             if (record.isFinal() && i != (maxNumLines - 1))
             { // too many final records
                 logger.error("Found too many lines for {}, giving up", record.fileName());
+                setError(record, "This record should have been the last one in all replicas");
                 return false;
             }
         }
@@ -177,6 +184,22 @@
         return true;
     }
 
+    void setError(LogRecord record, String error)
+    {
+        record.setError(error);
+        setErrorInReplicas(record);
+    }
+
+    void setErrorInReplicas(LogRecord record)
+    {
+        replicas().forEach(r -> r.setError(record.raw, record.error()));
+    }
+
+    void printContentsWithAnyErrors(StringBuilder str)
+    {
+        replicas().forEach(r -> r.printContentsWithAnyErrors(str));
+    }
+
     /**
      *  Add the record to all the replicas: if it is a final record then we throw only if we fail to write it
      *  to all, otherwise we throw if we fail to write it to any file, see CASSANDRA-10421 for details
@@ -215,6 +238,11 @@
                : "[-]";
     }
 
+    String getDirectories()
+    {
+        return String.join(", ", replicas().stream().map(LogReplica::getDirectory).collect(Collectors.toList()));
+    }
+
     @VisibleForTesting
     List<File> getFiles()
     {
diff --git a/src/java/org/apache/cassandra/db/lifecycle/LogTransaction.java b/src/java/org/apache/cassandra/db/lifecycle/LogTransaction.java
index b34ca60..bfd9739 100644
--- a/src/java/org/apache/cassandra/db/lifecycle/LogTransaction.java
+++ b/src/java/org/apache/cassandra/db/lifecycle/LogTransaction.java
@@ -24,6 +24,7 @@
 import java.util.*;
 import java.util.concurrent.ConcurrentLinkedQueue;
 import java.util.concurrent.TimeUnit;
+import java.util.function.Predicate;
 
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.util.concurrent.Runnables;
@@ -56,7 +57,7 @@
  * IMPORTANT: The transaction must complete (commit or abort) before any temporary files are deleted, even though the
  * txn log file itself will not be deleted until all tracked files are deleted. This is required by FileLister to ensure
  * a consistent disk state. LifecycleTransaction ensures this requirement, so this class should really never be used
- * outside of LT. @see FileLister.classifyFiles(TransactionData txn)
+ * outside of LT. @see LogAwareFileLister.classifyFiles()
  *
  * A class that tracks sstable files involved in a transaction across sstables:
  * if the transaction succeeds the old files should be deleted and the new ones kept; vice-versa if it fails.
@@ -68,8 +69,7 @@
  *
  * where sstable-2 is a new sstable to be retained if the transaction succeeds and sstable-1 is an old sstable to be
  * removed. CRC is an incremental CRC of the file content up to this point. For old sstable files we also log the
- * last update time of all files for the sstable descriptor and a checksum of vital properties such as update times
- * and file sizes.
+ * last update time of all files for the sstable descriptor and the number of sstable files.
  *
  * Upon commit we add a final line to the log file:
  *
@@ -239,27 +239,29 @@
         public void run()
         {
             if (logger.isTraceEnabled())
-                logger.trace("Removing files for transaction {}", name());
+                logger.trace("Removing files for transaction log {}", data);
 
             if (!data.completed())
             { // this happens if we forget to close a txn and the garbage collector closes it for us
-                logger.error("{} was not completed, trying to abort it now", data);
+                logger.error("Transaction log {} indicates txn was not completed, trying to abort it now", data);
                 Throwable err = Throwables.perform((Throwable)null, data::abort);
                 if (err != null)
-                    logger.error("Failed to abort {}", data, err);
+                    logger.error("Failed to abort transaction log {}", data, err);
             }
 
             Throwable err = data.removeUnfinishedLeftovers(null);
 
             if (err != null)
             {
-                logger.info("Failed deleting files for transaction {}, we'll retry after GC and on on server restart", name(), err);
+                logger.info("Failed deleting files for transaction log {}, we'll retry after GC and on on server restart",
+                            data,
+                            err);
                 failedDeletions.add(this);
             }
             else
             {
                 if (logger.isTraceEnabled())
-                    logger.trace("Closing file transaction {}", name());
+                    logger.trace("Closing transaction log {}", data);
 
                 data.close();
             }
@@ -361,7 +363,7 @@
         }
         catch (Throwable t)
         {
-            logger.error("Failed to complete file transaction {}", id(), t);
+            logger.error("Failed to complete file transaction id {}", id(), t);
             return Throwables.merge(accumulate, t);
         }
     }
@@ -379,31 +381,43 @@
     protected void doPrepare() { }
 
     /**
-     * Called on startup to scan existing folders for any unfinished leftovers of
-     * operations that were ongoing when the process exited. Also called by the standalone
-     * sstableutil tool when the cleanup option is specified, @see StandaloneSSTableUtil.
+     * Removes any leftovers from unfinished transactions as indicated by any transaction log files that
+     * are found in the table directories. This means that any old sstable files for transactions that were committed,
+     * or any new sstable files for transactions that were aborted or still in progress, should be removed *if
+     * it is safe to do so*. Refer to the checks in LogFile.verify for further details on the safety checks
+     * before removing transaction leftovers and refer to the comments at the beginning of this file or in NEWS.txt
+     * for further details on transaction logs.
+     *
+     * This method is called on startup and by the standalone sstableutil tool when the cleanup option is specified,
+     * @see StandaloneSSTableUtil.
+     *
+     * @return true if the leftovers of all transaction logs found were removed, false otherwise.
      *
      */
-    static void removeUnfinishedLeftovers(CFMetaData metadata)
+    static boolean removeUnfinishedLeftovers(CFMetaData metadata)
     {
-        removeUnfinishedLeftovers(new Directories(metadata, ColumnFamilyStore.getInitialDirectories()).getCFDirectories());
+        return removeUnfinishedLeftovers(new Directories(metadata, ColumnFamilyStore.getInitialDirectories()).getCFDirectories());
     }
 
     @VisibleForTesting
-    static void removeUnfinishedLeftovers(List<File> folders)
+    static boolean removeUnfinishedLeftovers(List<File> directories)
     {
         LogFilesByName logFiles = new LogFilesByName();
-        folders.forEach(logFiles::list);
-        logFiles.removeUnfinishedLeftovers();
+        directories.forEach(logFiles::list);
+        return logFiles.removeUnfinishedLeftovers();
     }
 
     private static final class LogFilesByName
     {
+        // This maps a transaction log file name to a list of physical files. A table
+        // can have sstables in multiple directories and a transaction is tracked by identical
+        // transaction log files, one per directory. So for each transaction file name we can
+        // have multiple physical files.
         Map<String, List<File>> files = new HashMap<>();
 
-        void list(File folder)
+        void list(File directory)
         {
-            Arrays.stream(folder.listFiles(LogFile::isLogFile)).forEach(this::add);
+            Arrays.stream(directory.listFiles(LogFile::isLogFile)).forEach(this::add);
         }
 
         void add(File file)
@@ -418,25 +432,35 @@
             filesByName.add(file);
         }
 
-        void removeUnfinishedLeftovers()
+        boolean removeUnfinishedLeftovers()
         {
-            files.forEach(LogFilesByName::removeUnfinishedLeftovers);
+            return files.entrySet()
+                        .stream()
+                        .map(LogFilesByName::removeUnfinishedLeftovers)
+                        .allMatch(Predicate.isEqual(true));
         }
 
-        static void removeUnfinishedLeftovers(String name, List<File> logFiles)
+        static boolean removeUnfinishedLeftovers(Map.Entry<String, List<File>> entry)
         {
-            LogFile txn = LogFile.make(name, logFiles);
+            LogFile txn = LogFile.make(entry.getKey(), entry.getValue());
             try
             {
                 if (txn.verify())
                 {
                     Throwable failure = txn.removeUnfinishedLeftovers(null);
                     if (failure != null)
-                        logger.error("Failed to remove unfinished transaction leftovers for txn {}", txn, failure);
+                    {
+                        logger.error("Failed to remove unfinished transaction leftovers for transaction log {}",
+                                     txn.toString(true), failure);
+                        return false;
+                    }
+
+                    return true;
                 }
                 else
                 {
-                    logger.error("Unexpected disk state: failed to read transaction txn {}", txn);
+                    logger.error("Unexpected disk state: failed to read transaction log {}", txn.toString(true));
+                    return false;
                 }
             }
             finally
diff --git a/src/java/org/apache/cassandra/db/lifecycle/Tracker.java b/src/java/org/apache/cassandra/db/lifecycle/Tracker.java
index c94b88f..b1c706e 100644
--- a/src/java/org/apache/cassandra/db/lifecycle/Tracker.java
+++ b/src/java/org/apache/cassandra/db/lifecycle/Tracker.java
@@ -32,7 +32,7 @@
 import org.apache.cassandra.db.Directories;
 import org.apache.cassandra.db.Memtable;
 import org.apache.cassandra.db.commitlog.CommitLog;
-import org.apache.cassandra.db.commitlog.ReplayPosition;
+import org.apache.cassandra.db.commitlog.CommitLogPosition;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
@@ -199,7 +199,7 @@
     public void reset()
     {
         view.set(new View(
-                         !isDummy() ? ImmutableList.of(new Memtable(new AtomicReference<>(CommitLog.instance.getContext()), cfstore))
+                         !isDummy() ? ImmutableList.of(new Memtable(new AtomicReference<>(CommitLog.instance.getCurrentPosition()), cfstore))
                                     : ImmutableList.<Memtable>of(),
                          ImmutableList.<Memtable>of(),
                          Collections.<SSTableReader, SSTableReader>emptyMap(),
@@ -293,7 +293,7 @@
     /**
      * get the Memtable that the ordered writeOp should be directed to
      */
-    public Memtable getMemtableFor(OpOrder.Group opGroup, ReplayPosition replayPosition)
+    public Memtable getMemtableFor(OpOrder.Group opGroup, CommitLogPosition commitLogPosition)
     {
         // since any new memtables appended to the list after we fetch it will be for operations started
         // after us, we can safely assume that we will always find the memtable that 'accepts' us;
@@ -304,7 +304,7 @@
         // assign operations to a memtable that was retired/queued before we started)
         for (Memtable memtable : view.get().liveMemtables)
         {
-            if (memtable.accepts(opGroup, replayPosition))
+            if (memtable.accepts(opGroup, commitLogPosition))
                 return memtable;
         }
         throw new AssertionError(view.get().liveMemtables.toString());
@@ -323,6 +323,8 @@
         Pair<View, View> result = apply(View.switchMemtable(newMemtable));
         if (truncating)
             notifyRenewed(newMemtable);
+        else
+            notifySwitched(result.left.getCurrentMemtable());
 
         return result.left.getCurrentMemtable();
     }
@@ -332,10 +334,10 @@
         apply(View.markFlushing(memtable));
     }
 
-    public void replaceFlushed(Memtable memtable, Collection<SSTableReader> sstables)
+    public void replaceFlushed(Memtable memtable, Iterable<SSTableReader> sstables)
     {
         assert !isDummy();
-        if (sstables.isEmpty())
+        if (Iterables.isEmpty(sstables))
         {
             // sstable may be null if we flushed batchlog and nothing needed to be retained
             // if it's null, we don't care what state the cfstore is in, we just replace it and continue
@@ -352,6 +354,8 @@
         Throwable fail;
         fail = updateSizeTracking(emptySet(), sstables, null);
 
+        notifyDiscarded(memtable);
+
         maybeFail(fail);
     }
 
@@ -476,16 +480,30 @@
             subscriber.handleNotification(notification, this);
     }
 
-    public void notifyRenewed(Memtable renewed)
+    public void notifyTruncated(long truncatedAt)
     {
-        INotification notification = new MemtableRenewedNotification(renewed);
+        INotification notification = new TruncationNotification(truncatedAt);
         for (INotificationConsumer subscriber : subscribers)
             subscriber.handleNotification(notification, this);
     }
 
-    public void notifyTruncated(long truncatedAt)
+    public void notifyRenewed(Memtable renewed)
     {
-        INotification notification = new TruncationNotification(truncatedAt);
+        notify(new MemtableRenewedNotification(renewed));
+    }
+
+    public void notifySwitched(Memtable previous)
+    {
+        notify(new MemtableSwitchedNotification(previous));
+    }
+
+    public void notifyDiscarded(Memtable discarded)
+    {
+        notify(new MemtableDiscardedNotification(discarded));
+    }
+
+    private void notify(INotification notification)
+    {
         for (INotificationConsumer subscriber : subscribers)
             subscriber.handleNotification(notification, this);
     }
diff --git a/src/java/org/apache/cassandra/db/lifecycle/View.java b/src/java/org/apache/cassandra/db/lifecycle/View.java
index 3fa197f..a5c781d 100644
--- a/src/java/org/apache/cassandra/db/lifecycle/View.java
+++ b/src/java/org/apache/cassandra/db/lifecycle/View.java
@@ -327,7 +327,7 @@
     }
 
     // called after flush: removes memtable from flushingMemtables, and inserts flushed into the live sstable set
-    static Function<View, View> replaceFlushed(final Memtable memtable, final Collection<SSTableReader> flushed)
+    static Function<View, View> replaceFlushed(final Memtable memtable, final Iterable<SSTableReader> flushed)
     {
         return new Function<View, View>()
         {
@@ -336,7 +336,7 @@
                 List<Memtable> flushingMemtables = copyOf(filter(view.flushingMemtables, not(equalTo(memtable))));
                 assert flushingMemtables.size() == view.flushingMemtables.size() - 1;
 
-                if (flushed == null || flushed.isEmpty())
+                if (flushed == null || Iterables.isEmpty(flushed))
                     return new View(view.liveMemtables, flushingMemtables, view.sstablesMap,
                                     view.compactingMap, view.premature, view.intervalTree);
 
diff --git a/src/java/org/apache/cassandra/db/marshal/AbstractCompositeType.java b/src/java/org/apache/cassandra/db/marshal/AbstractCompositeType.java
index b0d6a5d..e0b365f 100644
--- a/src/java/org/apache/cassandra/db/marshal/AbstractCompositeType.java
+++ b/src/java/org/apache/cassandra/db/marshal/AbstractCompositeType.java
@@ -21,6 +21,7 @@
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.List;
+import java.util.regex.Pattern;
 
 import org.apache.cassandra.cql3.Term;
 import org.apache.cassandra.serializers.TypeSerializer;
@@ -107,6 +108,10 @@
         }
         return l.toArray(new ByteBuffer[l.size()]);
     }
+    private static final String COLON = ":";
+    private static final Pattern COLON_PAT = Pattern.compile(COLON);
+    private static final String ESCAPED_COLON = "\\\\:";
+    private static final Pattern ESCAPED_COLON_PAT = Pattern.compile(ESCAPED_COLON);
 
 
     /*
@@ -118,7 +123,7 @@
         if (input.isEmpty())
             return input;
 
-        String res = input.replaceAll(":", "\\\\:");
+        String res = COLON_PAT.matcher(input).replaceAll(ESCAPED_COLON);
         char last = res.charAt(res.length() - 1);
         return last == '\\' || last == '!' ? res + '!' : res;
     }
@@ -132,7 +137,7 @@
         if (input.isEmpty())
             return input;
 
-        String res = input.replaceAll("\\\\:", ":");
+        String res = ESCAPED_COLON_PAT.matcher(input).replaceAll(COLON);
         char last = res.charAt(res.length() - 1);
         return last == '!' ? res.substring(0, res.length() - 1) : res;
     }
diff --git a/src/java/org/apache/cassandra/db/marshal/AbstractType.java b/src/java/org/apache/cassandra/db/marshal/AbstractType.java
index 9c7ac49..2b5503b 100644
--- a/src/java/org/apache/cassandra/db/marshal/AbstractType.java
+++ b/src/java/org/apache/cassandra/db/marshal/AbstractType.java
@@ -30,7 +30,9 @@
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
+import org.apache.cassandra.cql3.AssignmentTestable;
 import org.apache.cassandra.cql3.CQL3Type;
+import org.apache.cassandra.cql3.ColumnSpecification;
 import org.apache.cassandra.cql3.Term;
 import org.apache.cassandra.db.TypeSizes;
 import org.apache.cassandra.exceptions.SyntaxException;
@@ -54,7 +56,7 @@
  * represent a valid ByteBuffer for the type being compared.
  */
 @Unmetered
-public abstract class AbstractType<T> implements Comparator<ByteBuffer>
+public abstract class AbstractType<T> implements Comparator<ByteBuffer>, AssignmentTestable
 {
     private static final Logger logger = LoggerFactory.getLogger(AbstractType.class);
 
@@ -311,17 +313,40 @@
         return false;
     }
 
+    public boolean isUDT()
+    {
+        return false;
+    }
+
     public boolean isMultiCell()
     {
         return false;
     }
 
+    public boolean isFreezable()
+    {
+        return false;
+    }
+
     public AbstractType<?> freeze()
     {
         return this;
     }
 
     /**
+     * Returns an AbstractType instance that is equivalent to this one, but with all nested UDTs and collections
+     * explicitly frozen.
+     *
+     * This is only necessary for {@code 2.x -> 3.x} schema migrations, and can be removed in Cassandra 4.0.
+     *
+     * See CASSANDRA-11609 and CASSANDRA-11613.
+     */
+    public AbstractType<?> freezeNestedMulticellTypes()
+    {
+        return this;
+    }
+
+    /**
      * Returns {@code true} for types where empty should be handled like {@code null} like {@link Int32Type}.
      */
     public boolean isEmptyValueMeaningless()
@@ -434,6 +459,17 @@
         return getClass().getName();
     }
 
+    /**
+     * Checks whether two types are equal. Differences in whether the types are frozen are ignored
+     * when the ignoreFreezing parameter is true.
+     * @param other type to compare
+     * @param ignoreFreezing if true, differences in the types being frozen will be ignored
+     */
+    public boolean equals(Object other, boolean ignoreFreezing)
+    {
+        return this.equals(other);
+    }
+
     public void checkComparable()
     {
         switch (comparisonType)
@@ -442,4 +478,24 @@
                 throw new IllegalArgumentException(this + " cannot be used in comparisons, so cannot be used as a clustering column");
         }
     }
+
+    public final AssignmentTestable.TestResult testAssignment(String keyspace, ColumnSpecification receiver)
+    {
+        // We should ignore the fact that the output type is frozen in our comparison as functions do not support
+        // frozen types for arguments
+        AbstractType<?> receiverType = receiver.type;
+        if (isFreezable() && !isMultiCell())
+            receiverType = receiverType.freeze();
+
+        if (isReversed())
+            receiverType = ReversedType.getInstance(receiverType);
+
+        if (equals(receiverType))
+            return AssignmentTestable.TestResult.EXACT_MATCH;
+
+        if (receiverType.isValueCompatibleWith(this))
+            return AssignmentTestable.TestResult.WEAKLY_ASSIGNABLE;
+
+        return AssignmentTestable.TestResult.NOT_ASSIGNABLE;
+    }
 }
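
A hedged sketch of how the new freezing-aware comparison is meant to be used, assuming the
patched tree is on the classpath; the class name is hypothetical, and the behaviour follows
the CollectionType override further down in this patch.

    import org.apache.cassandra.db.marshal.AbstractType;
    import org.apache.cassandra.db.marshal.Int32Type;
    import org.apache.cassandra.db.marshal.ListType;

    public class FreezingEqualsSketch
    {
        public static void main(String[] args)
        {
            AbstractType<?> multiCell = ListType.getInstance(Int32Type.instance, true); // list<int>
            AbstractType<?> frozen = multiCell.freeze();                                // frozen<list<int>>

            System.out.println(multiCell.equals(frozen));        // false: the two differ in freezing
            System.out.println(multiCell.equals(frozen, true));  // true: freezing differences ignored
        }
    }
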
diff --git a/src/java/org/apache/cassandra/db/marshal/CollectionType.java b/src/java/org/apache/cassandra/db/marshal/CollectionType.java
index d65e3a6..2f5cbb6 100644
--- a/src/java/org/apache/cassandra/db/marshal/CollectionType.java
+++ b/src/java/org/apache/cassandra/db/marshal/CollectionType.java
@@ -22,7 +22,6 @@
 import java.util.List;
 import java.util.Iterator;
 
-import org.apache.cassandra.config.ColumnDefinition;
 import org.apache.cassandra.cql3.CQL3Type;
 import org.apache.cassandra.cql3.ColumnSpecification;
 import org.apache.cassandra.cql3.Lists;
@@ -133,13 +132,19 @@
         return kind == Kind.MAP;
     }
 
+    @Override
+    public boolean isFreezable()
+    {
+        return true;
+    }
+
     // Overridden by maps
     protected int collectionSize(List<ByteBuffer> values)
     {
         return values.size();
     }
 
-    public ByteBuffer serializeForNativeProtocol(ColumnDefinition def, Iterator<Cell> cells, int version)
+    public ByteBuffer serializeForNativeProtocol(Iterator<Cell> cells, int version)
     {
         assert isMultiCell();
         List<ByteBuffer> values = serializedValues(cells);
@@ -204,6 +209,27 @@
     }
 
     @Override
+    public boolean equals(Object o, boolean ignoreFreezing)
+    {
+        if (this == o)
+            return true;
+
+        if (!(o instanceof CollectionType))
+            return false;
+
+        CollectionType other = (CollectionType)o;
+
+        if (kind != other.kind)
+            return false;
+
+        if (!ignoreFreezing && isMultiCell() != other.isMultiCell())
+            return false;
+
+        return nameComparator().equals(other.nameComparator(), ignoreFreezing) &&
+               valueComparator().equals(other.valueComparator(), ignoreFreezing);
+    }
+
+    @Override
     public String toString()
     {
         return this.toString(false);
diff --git a/src/java/org/apache/cassandra/db/marshal/CompositeType.java b/src/java/org/apache/cassandra/db/marshal/CompositeType.java
index d005fd7..8e581b6 100644
--- a/src/java/org/apache/cassandra/db/marshal/CompositeType.java
+++ b/src/java/org/apache/cassandra/db/marshal/CompositeType.java
@@ -384,7 +384,9 @@
         for (ByteBuffer bb : buffers)
         {
             ByteBufferUtil.writeShortLength(out, bb.remaining());
-            out.put(bb.duplicate());
+            int toCopy = bb.remaining();
+            ByteBufferUtil.arrayCopy(bb, bb.position(), out, out.position(), toCopy);
+            out.position(out.position() + toCopy);
             out.put((byte) 0);
         }
         out.flip();
diff --git a/src/java/org/apache/cassandra/db/marshal/ListType.java b/src/java/org/apache/cassandra/db/marshal/ListType.java
index 4480dcb..ed843b1 100644
--- a/src/java/org/apache/cassandra/db/marshal/ListType.java
+++ b/src/java/org/apache/cassandra/db/marshal/ListType.java
@@ -110,6 +110,18 @@
     }
 
     @Override
+    public AbstractType<?> freezeNestedMulticellTypes()
+    {
+        if (!isMultiCell())
+            return this;
+
+        if (elements.isFreezable() && elements.isMultiCell())
+            return getInstance(elements.freeze(), isMultiCell);
+
+        return getInstance(elements.freezeNestedMulticellTypes(), isMultiCell);
+    }
+
+    @Override
     public boolean isMultiCell()
     {
         return isMultiCell;
diff --git a/src/java/org/apache/cassandra/db/marshal/MapType.java b/src/java/org/apache/cassandra/db/marshal/MapType.java
index 425ffc2..d5cf959 100644
--- a/src/java/org/apache/cassandra/db/marshal/MapType.java
+++ b/src/java/org/apache/cassandra/db/marshal/MapType.java
@@ -117,6 +117,23 @@
     }
 
     @Override
+    public AbstractType<?> freezeNestedMulticellTypes()
+    {
+        if (!isMultiCell())
+            return this;
+
+        AbstractType<?> keyType = (keys.isFreezable() && keys.isMultiCell())
+                                ? keys.freeze()
+                                : keys.freezeNestedMulticellTypes();
+
+        AbstractType<?> valueType = (values.isFreezable() && values.isMultiCell())
+                                  ? values.freeze()
+                                  : values.freezeNestedMulticellTypes();
+
+        return getInstance(keyType, valueType, isMultiCell);
+    }
+
+    @Override
     public boolean isCompatibleWithFrozen(CollectionType<?> previous)
     {
         assert !isMultiCell;
diff --git a/src/java/org/apache/cassandra/db/marshal/SetType.java b/src/java/org/apache/cassandra/db/marshal/SetType.java
index 22577b3..98f9f7e 100644
--- a/src/java/org/apache/cassandra/db/marshal/SetType.java
+++ b/src/java/org/apache/cassandra/db/marshal/SetType.java
@@ -105,6 +105,18 @@
     }
 
     @Override
+    public AbstractType<?> freezeNestedMulticellTypes()
+    {
+        if (!isMultiCell())
+            return this;
+
+        if (elements.isFreezable() && elements.isMultiCell())
+            return getInstance(elements.freeze(), isMultiCell);
+
+        return getInstance(elements.freezeNestedMulticellTypes(), isMultiCell);
+    }
+
+    @Override
     public boolean isCompatibleWithFrozen(CollectionType<?> previous)
     {
         assert !isMultiCell;
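
For the 2.x to 3.x migration case these overrides target, the intent is to freeze nested
multi-cell types while leaving the outer collection itself untouched. A hedged sketch,
assuming the patched tree is on the classpath; the nesting below is purely illustrative,
since such types normally only arise from legacy schemas.

    import org.apache.cassandra.db.marshal.AbstractType;
    import org.apache.cassandra.db.marshal.Int32Type;
    import org.apache.cassandra.db.marshal.ListType;
    import org.apache.cassandra.db.marshal.SetType;

    public class FreezeNestedSketch
    {
        public static void main(String[] args)
        {
            AbstractType<?> innerSet = SetType.getInstance(Int32Type.instance, true); // multi-cell set<int>
            AbstractType<?> outerList = ListType.getInstance(innerSet, true);         // multi-cell list of it

            AbstractType<?> migrated = outerList.freezeNestedMulticellTypes();
            System.out.println(migrated.isMultiCell()); // true: the outer list stays multi-cell
            // the element type, however, is now the frozen set<int>, matching the 3.x rule
            // that nested collections and UDTs must be explicitly frozen
        }
    }
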
diff --git a/src/java/org/apache/cassandra/db/marshal/TupleType.java b/src/java/org/apache/cassandra/db/marshal/TupleType.java
index 5486183..eaf6653 100644
--- a/src/java/org/apache/cassandra/db/marshal/TupleType.java
+++ b/src/java/org/apache/cassandra/db/marshal/TupleType.java
@@ -22,11 +22,14 @@
 import java.util.Arrays;
 import java.util.Iterator;
 import java.util.List;
+import java.util.regex.Pattern;
+import java.util.stream.Collectors;
 
 import com.google.common.base.Objects;
 
 import org.apache.cassandra.cql3.*;
 import org.apache.cassandra.exceptions.ConfigurationException;
+import org.apache.cassandra.exceptions.InvalidRequestException;
 import org.apache.cassandra.exceptions.SyntaxException;
 import org.apache.cassandra.serializers.*;
 import org.apache.cassandra.utils.ByteBufferUtil;
@@ -37,14 +40,29 @@
  */
 public class TupleType extends AbstractType<ByteBuffer>
 {
+    private static final String COLON = ":";
+    private static final Pattern COLON_PAT = Pattern.compile(COLON);
+    private static final String ESCAPED_COLON = "\\\\:";
+    private static final Pattern ESCAPED_COLON_PAT = Pattern.compile(ESCAPED_COLON);
+    private static final String AT = "@";
+    private static final Pattern AT_PAT = Pattern.compile(AT);
+    private static final String ESCAPED_AT = "\\\\@";
+    private static final Pattern ESCAPED_AT_PAT = Pattern.compile(ESCAPED_AT);
+    
     protected final List<AbstractType<?>> types;
 
     public TupleType(List<AbstractType<?>> types)
     {
+        this(types, true);
+    }
+
+    protected TupleType(List<AbstractType<?>> types, boolean freezeInner)
+    {
         super(ComparisonType.CUSTOM);
-        for (int i = 0; i < types.size(); i++)
-            types.set(i, types.get(i).freeze());
-        this.types = types;
+        if (freezeInner)
+            this.types = types.stream().map(AbstractType::freeze).collect(Collectors.toList());
+        else
+            this.types = types;
     }
 
     public static TupleType getInstance(TypeParser parser) throws ConfigurationException, SyntaxException
@@ -108,11 +126,22 @@
                 return cmp;
         }
 
-        if (bb1.remaining() == 0)
-            return bb2.remaining() == 0 ? 0 : -1;
+        // handle trailing nulls
+        while (bb1.remaining() > 0)
+        {
+            int size = bb1.getInt();
+            if (size > 0) // non-null
+                return 1;
+        }
 
-        // bb1.remaining() > 0 && bb2.remaining() == 0
-        return 1;
+        while (bb2.remaining() > 0)
+        {
+            int size = bb2.getInt();
+            if (size > 0) // non-null
+                return -1;
+        }
+
+        return 0;
     }
 
     @Override
@@ -161,6 +190,15 @@
             int size = input.getInt();
             components[i] = size < 0 ? null : ByteBufferUtil.readBytes(input, size);
         }
+
+        // error out if we got more values in the tuple/UDT than we expected
+        if (input.hasRemaining())
+        {
+            throw new InvalidRequestException(String.format(
+                    "Expected %s %s for %s column, but got more",
+                    size(), size() == 1 ? "value" : "values", this.asCQL3Type()));
+        }
+
         return components;
     }
 
@@ -190,6 +228,9 @@
     @Override
     public String getString(ByteBuffer value)
     {
+        if (value == null)
+            return "null";
+
         StringBuilder sb = new StringBuilder();
         ByteBuffer input = value.duplicate();
         for (int i = 0; i < size(); i++)
@@ -210,7 +251,9 @@
 
             ByteBuffer field = ByteBufferUtil.readBytes(input, size);
             // We use ':' as delimiter, and @ to represent null, so escape them in the generated string
-            sb.append(type.getString(field).replaceAll(":", "\\\\:").replaceAll("@", "\\\\@"));
+            String fld = COLON_PAT.matcher(type.getString(field)).replaceAll(ESCAPED_COLON);
+            fld = AT_PAT.matcher(fld).replaceAll(ESCAPED_AT);
+            sb.append(fld);
         }
         return sb.toString();
     }
@@ -233,7 +276,9 @@
                 continue;
 
             AbstractType<?> type = type(i);
-            fields[i] = type.fromString(fieldString.replaceAll("\\\\:", ":").replaceAll("\\\\@", "@"));
+            fieldString = ESCAPED_COLON_PAT.matcher(fieldString).replaceAll(COLON);
+            fieldString = ESCAPED_AT_PAT.matcher(fieldString).replaceAll(AT);
+            fields[i] = type.fromString(fieldString);
         }
         return buildValue(fields);
     }
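
A hedged sketch of the string round trip that the escaping above is meant to preserve,
assuming the patched tree is on the classpath; the class name is hypothetical.

    import java.nio.ByteBuffer;
    import java.util.Arrays;

    import org.apache.cassandra.db.marshal.AbstractType;
    import org.apache.cassandra.db.marshal.Int32Type;
    import org.apache.cassandra.db.marshal.TupleType;
    import org.apache.cassandra.db.marshal.UTF8Type;

    public class TupleEscapeSketch
    {
        public static void main(String[] args)
        {
            TupleType tuple = new TupleType(Arrays.<AbstractType<?>>asList(UTF8Type.instance, Int32Type.instance));
            ByteBuffer value = TupleType.buildValue(UTF8Type.instance.decompose("a:b"),
                                                    Int32Type.instance.decompose(42));

            String s = tuple.getString(value);      // the ':' inside the first field comes out escaped as "\:"
            ByteBuffer back = tuple.fromString(s);
            System.out.println(value.equals(back)); // expected: true, since escaping and unescaping are symmetric
        }
    }
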
diff --git a/src/java/org/apache/cassandra/db/marshal/TypeParser.java b/src/java/org/apache/cassandra/db/marshal/TypeParser.java
index 35d15ab..78af800 100644
--- a/src/java/org/apache/cassandra/db/marshal/TypeParser.java
+++ b/src/java/org/apache/cassandra/db/marshal/TypeParser.java
@@ -23,6 +23,7 @@
 import java.nio.ByteBuffer;
 import java.util.*;
 
+import org.apache.cassandra.cql3.FieldIdentifier;
 import org.apache.cassandra.exceptions.*;
 import org.apache.cassandra.utils.ByteBufferUtil;
 import org.apache.cassandra.utils.FBUtilities;
@@ -538,17 +539,17 @@
         return sb.toString();
     }
 
-    public static String stringifyUserTypeParameters(String keyspace, ByteBuffer typeName, List<ByteBuffer> columnNames, List<AbstractType<?>> columnTypes)
+    public static String stringifyUserTypeParameters(String keyspace, ByteBuffer typeName, List<FieldIdentifier> fields,
+                                                     List<AbstractType<?>> columnTypes, boolean ignoreFreezing)
     {
         StringBuilder sb = new StringBuilder();
         sb.append('(').append(keyspace).append(",").append(ByteBufferUtil.bytesToHex(typeName));
 
-        for (int i = 0; i < columnNames.size(); i++)
+        for (int i = 0; i < fields.size(); i++)
         {
             sb.append(',');
-            sb.append(ByteBufferUtil.bytesToHex(columnNames.get(i))).append(":");
-            // omit FrozenType(...) from fields because it is currently implicit
-            sb.append(columnTypes.get(i).toString(true));
+            sb.append(ByteBufferUtil.bytesToHex(fields.get(i).bytes)).append(":");
+            sb.append(columnTypes.get(i).toString(ignoreFreezing));
         }
         sb.append(')');
         return sb.toString();
diff --git a/src/java/org/apache/cassandra/db/marshal/UUIDType.java b/src/java/org/apache/cassandra/db/marshal/UUIDType.java
index acaf27c..9722a52 100644
--- a/src/java/org/apache/cassandra/db/marshal/UUIDType.java
+++ b/src/java/org/apache/cassandra/db/marshal/UUIDType.java
@@ -140,7 +140,7 @@
         {
             try
             {
-                return ByteBuffer.wrap(UUIDGen.decompose(UUID.fromString(source)));
+                return UUIDGen.toByteBuffer(UUID.fromString(source));
             }
             catch (IllegalArgumentException e)
             {
diff --git a/src/java/org/apache/cassandra/db/marshal/UserType.java b/src/java/org/apache/cassandra/db/marshal/UserType.java
index e766190..7803ee2 100644
--- a/src/java/org/apache/cassandra/db/marshal/UserType.java
+++ b/src/java/org/apache/cassandra/db/marshal/UserType.java
@@ -21,15 +21,20 @@
 import java.nio.charset.CharacterCodingException;
 import java.nio.charset.StandardCharsets;
 import java.util.*;
+import java.util.stream.Collectors;
 
 import com.google.common.base.Objects;
 
 import org.apache.cassandra.cql3.*;
+import org.apache.cassandra.db.rows.Cell;
+import org.apache.cassandra.db.rows.CellPath;
 import org.apache.cassandra.exceptions.ConfigurationException;
 import org.apache.cassandra.exceptions.SyntaxException;
 import org.apache.cassandra.serializers.*;
 import org.apache.cassandra.utils.ByteBufferUtil;
 import org.apache.cassandra.utils.Pair;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 /**
  * A user defined type.
@@ -38,30 +43,26 @@
  */
 public class UserType extends TupleType
 {
+    private static final Logger logger = LoggerFactory.getLogger(UserType.class);
+
     public final String keyspace;
     public final ByteBuffer name;
-    private final List<ByteBuffer> fieldNames;
+    private final List<FieldIdentifier> fieldNames;
     private final List<String> stringFieldNames;
+    private final boolean isMultiCell;
 
-    public UserType(String keyspace, ByteBuffer name, List<ByteBuffer> fieldNames, List<AbstractType<?>> fieldTypes)
+    public UserType(String keyspace, ByteBuffer name, List<FieldIdentifier> fieldNames, List<AbstractType<?>> fieldTypes, boolean isMultiCell)
     {
-        super(fieldTypes);
+        super(fieldTypes, false);
         assert fieldNames.size() == fieldTypes.size();
         this.keyspace = keyspace;
         this.name = name;
         this.fieldNames = fieldNames;
         this.stringFieldNames = new ArrayList<>(fieldNames.size());
-        for (ByteBuffer fieldName : fieldNames)
-        {
-            try
-            {
-                stringFieldNames.add(ByteBufferUtil.string(fieldName, StandardCharsets.UTF_8));
-            }
-            catch (CharacterCodingException ex)
-            {
-                throw new AssertionError("Got non-UTF8 field name for user-defined type: " + ByteBufferUtil.bytesToHex(fieldName), ex);
-            }
-        }
+        this.isMultiCell = isMultiCell;
+
+        for (FieldIdentifier fieldName : fieldNames)
+            stringFieldNames.add(fieldName.toString());
     }
 
     public static UserType getInstance(TypeParser parser) throws ConfigurationException, SyntaxException
@@ -69,14 +70,33 @@
         Pair<Pair<String, ByteBuffer>, List<Pair<ByteBuffer, AbstractType>>> params = parser.getUserTypeParameters();
         String keyspace = params.left.left;
         ByteBuffer name = params.left.right;
-        List<ByteBuffer> columnNames = new ArrayList<>(params.right.size());
+        List<FieldIdentifier> columnNames = new ArrayList<>(params.right.size());
         List<AbstractType<?>> columnTypes = new ArrayList<>(params.right.size());
         for (Pair<ByteBuffer, AbstractType> p : params.right)
         {
-            columnNames.add(p.left);
-            columnTypes.add(p.right.freeze());
+            columnNames.add(new FieldIdentifier(p.left));
+            columnTypes.add(p.right);
         }
-        return new UserType(keyspace, name, columnNames, columnTypes);
+
+        return new UserType(keyspace, name, columnNames, columnTypes, true);
+    }
+
+    @Override
+    public boolean isUDT()
+    {
+        return true;
+    }
+
+    @Override
+    public boolean isMultiCell()
+    {
+        return isMultiCell;
+    }
+
+    @Override
+    public boolean isFreezable()
+    {
+        return true;
     }
 
     public AbstractType<?> fieldType(int i)
@@ -89,7 +109,7 @@
         return types;
     }
 
-    public ByteBuffer fieldName(int i)
+    public FieldIdentifier fieldName(int i)
     {
         return fieldNames.get(i);
     }
@@ -99,7 +119,7 @@
         return stringFieldNames.get(i);
     }
 
-    public List<ByteBuffer> fieldNames()
+    public List<FieldIdentifier> fieldNames()
     {
         return fieldNames;
     }
@@ -109,6 +129,47 @@
         return UTF8Type.instance.compose(name);
     }
 
+    public int fieldPosition(FieldIdentifier fieldName)
+    {
+        return fieldNames.indexOf(fieldName);
+    }
+
+    public CellPath cellPathForField(FieldIdentifier fieldName)
+    {
+        // we use the field position instead of the field name to allow for field renaming in ALTER TYPE statements
+        return CellPath.create(ByteBufferUtil.bytes((short)fieldPosition(fieldName)));
+    }
+
+    public ShortType nameComparator()
+    {
+        return ShortType.instance;
+    }
+
+    public ByteBuffer serializeForNativeProtocol(Iterator<Cell> cells, int protocolVersion)
+    {
+        assert isMultiCell;
+
+        ByteBuffer[] components = new ByteBuffer[size()];
+        short fieldPosition = 0;
+        while (cells.hasNext())
+        {
+            Cell cell = cells.next();
+
+            // handle null fields that aren't at the end
+            short fieldPositionOfCell = ByteBufferUtil.toShort(cell.path().get(0));
+            while (fieldPosition < fieldPositionOfCell)
+                components[fieldPosition++] = null;
+
+            components[fieldPosition++] = cell.value();
+        }
+
+        // append trailing nulls for missing cells
+        while (fieldPosition < size())
+            components[fieldPosition++] = null;
+
+        return TupleType.buildValue(components);
+    }
+
     // Note: the only reason we override this is to provide nicer error message, but since that's not that much code...
     @Override
     public void validate(ByteBuffer bytes) throws MarshalException
@@ -217,19 +278,93 @@
     }
 
     @Override
+    public UserType freeze()
+    {
+        if (isMultiCell)
+            return new UserType(keyspace, name, fieldNames, fieldTypes(), false);
+        else
+            return this;
+    }
+
+    @Override
+    public AbstractType<?> freezeNestedMulticellTypes()
+    {
+        if (!isMultiCell())
+            return this;
+
+        // the behavior here doesn't exactly match the method name: we want to freeze everything inside of UDTs
+        List<AbstractType<?>> newTypes = fieldTypes().stream()
+                .map(subtype -> (subtype.isFreezable() && subtype.isMultiCell() ? subtype.freeze() : subtype))
+                .collect(Collectors.toList());
+
+        return new UserType(keyspace, name, fieldNames, newTypes, isMultiCell);
+    }
+
+    @Override
     public int hashCode()
     {
-        return Objects.hashCode(keyspace, name, fieldNames, types);
+        return Objects.hashCode(keyspace, name, fieldNames, types, isMultiCell);
+    }
+
+    @Override
+    public boolean isValueCompatibleWith(AbstractType<?> previous)
+    {
+        if (this == previous)
+            return true;
+
+        if (!(previous instanceof UserType))
+            return false;
+
+        UserType other = (UserType) previous;
+        if (isMultiCell != other.isMultiCell())
+            return false;
+
+        if (!keyspace.equals(other.keyspace))
+            return false;
+
+        Iterator<AbstractType<?>> thisTypeIter = types.iterator();
+        Iterator<AbstractType<?>> previousTypeIter = other.types.iterator();
+        while (thisTypeIter.hasNext() && previousTypeIter.hasNext())
+        {
+            if (!thisTypeIter.next().isCompatibleWith(previousTypeIter.next()))
+                return false;
+        }
+
+        // it's okay for the new type to have additional fields, but not for the old type to have additional fields
+        return !previousTypeIter.hasNext();
     }
 
     @Override
     public boolean equals(Object o)
     {
+        return o instanceof UserType && equals(o, false);
+    }
+
+    @Override
+    public boolean equals(Object o, boolean ignoreFreezing)
+    {
         if(!(o instanceof UserType))
             return false;
 
         UserType that = (UserType)o;
-        return keyspace.equals(that.keyspace) && name.equals(that.name) && fieldNames.equals(that.fieldNames) && types.equals(that.types);
+
+        if (!keyspace.equals(that.keyspace) || !name.equals(that.name) || !fieldNames.equals(that.fieldNames))
+            return false;
+
+        if (!ignoreFreezing && isMultiCell != that.isMultiCell)
+            return false;
+
+        if (this.types.size() != that.types.size())
+            return false;
+
+        Iterator<AbstractType<?>> otherTypeIter = that.types.iterator();
+        for (AbstractType<?> type : types)
+        {
+            if (!type.equals(otherTypeIter.next(), ignoreFreezing))
+                return false;
+        }
+
+        return true;
     }
 
     @Override
@@ -248,6 +383,21 @@
     @Override
     public String toString()
     {
-        return getClass().getName() + TypeParser.stringifyUserTypeParameters(keyspace, name, fieldNames, types);
+        return this.toString(false);
+    }
+
+    @Override
+    public String toString(boolean ignoreFreezing)
+    {
+        boolean includeFrozenType = !ignoreFreezing && !isMultiCell();
+
+        StringBuilder sb = new StringBuilder();
+        if (includeFrozenType)
+            sb.append(FrozenType.class.getName()).append("(");
+        sb.append(getClass().getName());
+        sb.append(TypeParser.stringifyUserTypeParameters(keyspace, name, fieldNames, types, ignoreFreezing || !isMultiCell));
+        if (includeFrozenType)
+            sb.append(")");
+        return sb.toString();
     }
 }
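
In the new multi-cell (non-frozen) UDT representation, each field lives in its own cell and is
addressed by its position, encoded as a short in the cell path, rather than by its name. A hedged
sketch, with a hypothetical keyspace and type name, assuming the patched tree is on the classpath.

    import java.nio.ByteBuffer;
    import java.util.Arrays;
    import java.util.List;

    import org.apache.cassandra.cql3.FieldIdentifier;
    import org.apache.cassandra.db.marshal.AbstractType;
    import org.apache.cassandra.db.marshal.Int32Type;
    import org.apache.cassandra.db.marshal.UTF8Type;
    import org.apache.cassandra.db.marshal.UserType;

    public class UdtCellPathSketch
    {
        public static void main(String[] args)
        {
            List<FieldIdentifier> names = Arrays.asList(new FieldIdentifier(UTF8Type.instance.decompose("street")),
                                                        new FieldIdentifier(UTF8Type.instance.decompose("zip")));
            List<AbstractType<?>> types = Arrays.<AbstractType<?>>asList(UTF8Type.instance, Int32Type.instance);

            UserType address = new UserType("ks", UTF8Type.instance.decompose("address"), names, types, true);

            FieldIdentifier zip = names.get(1);
            System.out.println(address.fieldPosition(zip));          // 1
            ByteBuffer path = address.cellPathForField(zip).get(0);  // the path holds the position as a short
            System.out.println(path.remaining());                    // 2 bytes
        }
    }
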
diff --git a/src/java/org/apache/cassandra/db/monitoring/ApproximateTime.java b/src/java/org/apache/cassandra/db/monitoring/ApproximateTime.java
new file mode 100644
index 0000000..1d57398
--- /dev/null
+++ b/src/java/org/apache/cassandra/db/monitoring/ApproximateTime.java
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.db.monitoring;
+
+import java.util.concurrent.TimeUnit;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.cassandra.concurrent.ScheduledExecutors;
+import org.apache.cassandra.config.Config;
+
+/**
+ * An approximation of System.currentTimeMillis(). It refreshes its time value at
+ * periodic intervals of CHECK_INTERVAL_MS milliseconds (10 milliseconds by default)
+ * and can be used as a faster alternative to System.currentTimeMillis() whenever an
+ * imprecision of a few milliseconds is acceptable.
+ */
+public class ApproximateTime
+{
+    private static final Logger logger = LoggerFactory.getLogger(ApproximateTime.class);
+    private static final int CHECK_INTERVAL_MS = Math.max(5, Integer.valueOf(System.getProperty(Config.PROPERTY_PREFIX + "approximate_time_precision_ms", "10")));
+
+    private static volatile long time = System.currentTimeMillis();
+    static
+    {
+        logger.info("Scheduling approximate time-check task with a precision of {} milliseconds", CHECK_INTERVAL_MS);
+        ScheduledExecutors.scheduledFastTasks.scheduleWithFixedDelay(() -> time = System.currentTimeMillis(),
+                                                                     CHECK_INTERVAL_MS,
+                                                                     CHECK_INTERVAL_MS,
+                                                                     TimeUnit.MILLISECONDS);
+    }
+
+    public static long currentTimeMillis()
+    {
+        return time;
+    }
+
+    public static long precision()
+    {
+        return 2 * CHECK_INTERVAL_MS;
+    }
+
+}
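
The class above trades precision for a cheap volatile read on hot paths. A self-contained,
JDK-only sketch of the same pattern (hypothetical class name, not Cassandra code): a single
scheduled task refreshes a volatile timestamp so callers never hit System.currentTimeMillis()
directly.

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public final class ApproximateClockSketch
    {
        private static final long INTERVAL_MS = 10;
        private static volatile long now = System.currentTimeMillis();

        static
        {
            ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "approx-clock");
                t.setDaemon(true); // don't keep the JVM alive just for the clock
                return t;
            });
            timer.scheduleWithFixedDelay(() -> now = System.currentTimeMillis(),
                                         INTERVAL_MS, INTERVAL_MS, TimeUnit.MILLISECONDS);
        }

        public static long currentTimeMillis()
        {
            return now; // stale by at most roughly 2 * INTERVAL_MS, mirroring precision() above
        }
    }
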
diff --git a/src/java/org/apache/cassandra/db/monitoring/ConstructionTime.java b/src/java/org/apache/cassandra/db/monitoring/ConstructionTime.java
new file mode 100644
index 0000000..d6b6078
--- /dev/null
+++ b/src/java/org/apache/cassandra/db/monitoring/ConstructionTime.java
@@ -0,0 +1,41 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.db.monitoring;
+
+public final class ConstructionTime
+{
+    public final long timestamp;
+    public final boolean isCrossNode;
+
+    public ConstructionTime()
+    {
+        this(ApproximateTime.currentTimeMillis());
+    }
+
+    public ConstructionTime(long timestamp)
+    {
+        this(timestamp, false);
+    }
+
+    public ConstructionTime(long timestamp, boolean isCrossNode)
+    {
+        this.timestamp = timestamp;
+        this.isCrossNode = isCrossNode;
+    }
+}
diff --git a/src/java/org/apache/cassandra/db/monitoring/Monitorable.java b/src/java/org/apache/cassandra/db/monitoring/Monitorable.java
new file mode 100644
index 0000000..202ac87
--- /dev/null
+++ b/src/java/org/apache/cassandra/db/monitoring/Monitorable.java
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.db.monitoring;
+
+public interface Monitorable
+{
+    String name();
+    ConstructionTime constructionTime();
+    long timeout();
+
+    boolean isInProgress();
+    boolean isAborted();
+    boolean isCompleted();
+
+    boolean abort();
+    boolean complete();
+}
diff --git a/src/java/org/apache/cassandra/db/monitoring/MonitorableImpl.java b/src/java/org/apache/cassandra/db/monitoring/MonitorableImpl.java
new file mode 100644
index 0000000..f89f8ad
--- /dev/null
+++ b/src/java/org/apache/cassandra/db/monitoring/MonitorableImpl.java
@@ -0,0 +1,104 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.db.monitoring;
+
+public abstract class MonitorableImpl implements Monitorable
+{
+    private MonitoringState state;
+    private ConstructionTime constructionTime;
+    private long timeout;
+
+    protected MonitorableImpl()
+    {
+        this.state = MonitoringState.IN_PROGRESS;
+    }
+
+    /**
+     * This setter is ugly, but the construction chain to ReadCommand is too complex;
+     * setting these values at construction time would require passing new parameters
+     * to all serializers or specializing the serializers to accept these message properties.
+     */
+    public void setMonitoringTime(ConstructionTime constructionTime, long timeout)
+    {
+        this.constructionTime = constructionTime;
+        this.timeout = timeout;
+    }
+
+    public ConstructionTime constructionTime()
+    {
+        return constructionTime;
+    }
+
+    public long timeout()
+    {
+        return timeout;
+    }
+
+    public boolean isInProgress()
+    {
+        check();
+        return state == MonitoringState.IN_PROGRESS;
+    }
+
+    public boolean isAborted()
+    {
+        check();
+        return state == MonitoringState.ABORTED;
+    }
+
+    public boolean isCompleted()
+    {
+        check();
+        return state == MonitoringState.COMPLETED;
+    }
+
+    public boolean abort()
+    {
+        if (state == MonitoringState.IN_PROGRESS)
+        {
+            if (constructionTime != null)
+                MonitoringTask.addFailedOperation(this, ApproximateTime.currentTimeMillis());
+            state = MonitoringState.ABORTED;
+            return true;
+        }
+
+        return state == MonitoringState.ABORTED;
+    }
+
+    public boolean complete()
+    {
+        if (state == MonitoringState.IN_PROGRESS)
+        {
+            state = MonitoringState.COMPLETED;
+            return true;
+        }
+
+        return state == MonitoringState.COMPLETED;
+    }
+
+    private void check()
+    {
+        if (constructionTime == null || state != MonitoringState.IN_PROGRESS)
+            return;
+
+        long elapsed = ApproximateTime.currentTimeMillis() - constructionTime.timestamp;
+        if (elapsed >= timeout)
+            abort();
+    }
+}
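
A hedged sketch of how an operation plugs into this state machine; DummyOperation is
hypothetical (the real implementors are read commands), and the timings are illustrative.

    import org.apache.cassandra.db.monitoring.ConstructionTime;
    import org.apache.cassandra.db.monitoring.MonitorableImpl;

    public class MonitorableSketch
    {
        static final class DummyOperation extends MonitorableImpl
        {
            public String name() { return "SELECT * FROM ks.tbl"; }
        }

        public static void main(String[] args) throws InterruptedException
        {
            DummyOperation op = new DummyOperation();
            op.setMonitoringTime(new ConstructionTime(), 50); // 50 ms timeout

            Thread.sleep(100);                      // pretend the operation overran its timeout
            System.out.println(op.isInProgress());  // false: check() saw elapsed >= timeout
            System.out.println(op.isAborted());     // true: aborted and reported to MonitoringTask
            System.out.println(op.complete());      // false: an aborted operation cannot complete
        }
    }
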
diff --git a/src/java/org/apache/cassandra/db/monitoring/MonitoringState.java b/src/java/org/apache/cassandra/db/monitoring/MonitoringState.java
new file mode 100644
index 0000000..4fe3cf8
--- /dev/null
+++ b/src/java/org/apache/cassandra/db/monitoring/MonitoringState.java
@@ -0,0 +1,26 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.db.monitoring;
+
+public enum MonitoringState
+{
+    IN_PROGRESS,
+    ABORTED,
+    COMPLETED
+}
diff --git a/src/java/org/apache/cassandra/db/monitoring/MonitoringTask.java b/src/java/org/apache/cassandra/db/monitoring/MonitoringTask.java
new file mode 100644
index 0000000..6d28078
--- /dev/null
+++ b/src/java/org/apache/cassandra/db/monitoring/MonitoringTask.java
@@ -0,0 +1,264 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.db.monitoring;
+
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ArrayBlockingQueue;
+import java.util.concurrent.BlockingQueue;
+import java.util.concurrent.LinkedBlockingQueue;
+import java.util.concurrent.ScheduledFuture;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicLong;
+
+import com.google.common.annotations.VisibleForTesting;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.cassandra.concurrent.ScheduledExecutors;
+import org.apache.cassandra.config.Config;
+
+import static java.lang.System.getProperty;
+
+/**
+ * A task for monitoring in-progress operations (currently only read queries) and aborting them if they time out.
+ * Timed-out operations are also logged; see CASSANDRA-7392.
+ */
+public class MonitoringTask
+{
+    private static final String LINE_SEPARATOR = getProperty("line.separator");
+    private static final Logger logger = LoggerFactory.getLogger(MonitoringTask.class);
+
+    /**
+     * Defines the interval for reporting any operations that have timed out.
+     */
+    private static final int REPORT_INTERVAL_MS = Math.max(0, Integer.valueOf(System.getProperty(Config.PROPERTY_PREFIX + "monitoring_report_interval_ms", "5000")));
+
+    /**
+     * Defines the maximum number of unique timed out queries that will be reported in the logs.
+     * Use a negative number to remove any limit.
+     */
+    private static final int MAX_OPERATIONS = Integer.valueOf(System.getProperty(Config.PROPERTY_PREFIX + "monitoring_max_operations", "50"));
+
+    @VisibleForTesting
+    static MonitoringTask instance = make(REPORT_INTERVAL_MS, MAX_OPERATIONS);
+
+    private final int maxOperations;
+    private final ScheduledFuture<?> reportingTask;
+    private final BlockingQueue<FailedOperation> operationsQueue;
+    private final AtomicLong numDroppedOperations;
+    private long lastLogTime;
+
+    @VisibleForTesting
+    static MonitoringTask make(int reportIntervalMillis, int maxTimedoutOperations)
+    {
+        if (instance != null)
+        {
+            instance.cancel();
+            instance = null;
+        }
+
+        return new MonitoringTask(reportIntervalMillis, maxTimedoutOperations);
+    }
+
+    private MonitoringTask(int reportIntervalMillis, int maxOperations)
+    {
+        this.maxOperations = maxOperations;
+        this.operationsQueue = maxOperations > 0 ? new ArrayBlockingQueue<>(maxOperations) : new LinkedBlockingQueue<>();
+        this.numDroppedOperations = new AtomicLong();
+        this.lastLogTime = ApproximateTime.currentTimeMillis();
+
+        logger.info("Scheduling monitoring task with report interval of {} ms, max operations {}", reportIntervalMillis, maxOperations);
+        this.reportingTask = ScheduledExecutors.scheduledTasks.scheduleWithFixedDelay(() -> logFailedOperations(ApproximateTime.currentTimeMillis()),
+                                                                                     reportIntervalMillis,
+                                                                                     reportIntervalMillis,
+                                                                                     TimeUnit.MILLISECONDS);
+    }
+
+    public void cancel()
+    {
+        reportingTask.cancel(false);
+    }
+
+    public static void addFailedOperation(Monitorable operation, long now)
+    {
+        instance.innerAddFailedOperation(operation, now);
+    }
+
+    private void innerAddFailedOperation(Monitorable operation, long now)
+    {
+        if (maxOperations == 0)
+            return; // logging of failed operations disabled
+
+        if (!operationsQueue.offer(new FailedOperation(operation, now)))
+            numDroppedOperations.incrementAndGet();
+    }
+
+    @VisibleForTesting
+    FailedOperations aggregateFailedOperations()
+    {
+        Map<String, FailedOperation> operations = new HashMap<>();
+
+        FailedOperation failedOperation;
+        while((failedOperation = operationsQueue.poll()) != null)
+        {
+            FailedOperation existing = operations.get(failedOperation.name());
+            if (existing != null)
+                existing.addTimeout(failedOperation);
+            else
+                operations.put(failedOperation.name(), failedOperation);
+        }
+
+        return new FailedOperations(operations, numDroppedOperations.getAndSet(0L));
+    }
+
+    @VisibleForTesting
+    List<String> getFailedOperations()
+    {
+        FailedOperations failedOperations = aggregateFailedOperations();
+        String ret = failedOperations.getLogMessage();
+        lastLogTime = ApproximateTime.currentTimeMillis();
+        return ret.isEmpty() ? Collections.emptyList() : Arrays.asList(ret.split("\n"));
+    }
+
+    @VisibleForTesting
+    void logFailedOperations(long now)
+    {
+        FailedOperations failedOperations = aggregateFailedOperations();
+        if (!failedOperations.isEmpty())
+        {
+            long elapsed = now - lastLogTime;
+            logger.warn("{} operations timed out in the last {} msecs, operation list available at debug log level",
+                        failedOperations.num(),
+                        elapsed);
+
+            if (logger.isDebugEnabled())
+                logger.debug("{} operations timed out in the last {} msecs:{}{}",
+                            failedOperations.num(),
+                            elapsed,
+                            LINE_SEPARATOR,
+                            failedOperations.getLogMessage());
+        }
+
+        lastLogTime = now;
+    }
+
+    private static final class FailedOperations
+    {
+        public final Map<String, FailedOperation> operations;
+        public final long numDropped;
+
+        FailedOperations(Map<String, FailedOperation> operations, long numDropped)
+        {
+            this.operations = operations;
+            this.numDropped = numDropped;
+        }
+
+        public boolean isEmpty()
+        {
+            return operations.isEmpty() && numDropped == 0;
+        }
+
+        public long num()
+        {
+            return operations.size() + numDropped;
+        }
+
+        public String getLogMessage()
+        {
+            if (isEmpty())
+                return "";
+
+            final StringBuilder ret = new StringBuilder();
+            operations.values().forEach(o -> addOperation(ret, o));
+
+            if (numDropped > 0)
+                ret.append(LINE_SEPARATOR)
+                   .append("... (")
+                   .append(numDropped)
+                   .append(" were dropped)");
+
+            return ret.toString();
+        }
+
+        private static void addOperation(StringBuilder ret, FailedOperation operation)
+        {
+            if (ret.length() > 0)
+                ret.append(LINE_SEPARATOR);
+
+            ret.append(operation.getLogMessage());
+        }
+    }
+
+    private final static class FailedOperation
+    {
+        public final Monitorable operation;
+        public int numTimeouts;
+        public long totalTime;
+        public long maxTime;
+        public long minTime;
+        private String name;
+
+        FailedOperation(Monitorable operation, long failedAt)
+        {
+            this.operation = operation;
+            numTimeouts = 1;
+            totalTime = failedAt - operation.constructionTime().timestamp;
+            minTime = totalTime;
+            maxTime = totalTime;
+        }
+
+        public String name()
+        {
+            if (name == null)
+                name = operation.name();
+            return name;
+        }
+
+        void addTimeout(FailedOperation operation)
+        {
+            numTimeouts++;
+            totalTime += operation.totalTime;
+            maxTime = Math.max(maxTime, operation.maxTime);
+            minTime = Math.min(minTime, operation.minTime);
+        }
+
+        public String getLogMessage()
+        {
+            if (numTimeouts == 1)
+                return String.format("%s: total time %d msec - timeout %d %s",
+                                     name(),
+                                     totalTime,
+                                     operation.timeout(),
+                                     operation.constructionTime().isCrossNode ? "msec/cross-node" : "msec");
+            else
+                return String.format("%s (timed out %d times): total time avg/min/max %d/%d/%d msec - timeout %d %s",
+                                     name(),
+                                     numTimeouts,
+                                     totalTime / numTimeouts,
+                                     minTime,
+                                     maxTime,
+                                     operation.timeout(),
+                                     operation.constructionTime().isCrossNode ? "msec/cross-node" : "msec");
+        }
+    }
+}
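
Both knobs above are plain system properties; assuming Config.PROPERTY_PREFIX is the usual
"cassandra." prefix, the report cadence and the cap on distinct logged operations can be tuned
at startup along these lines (values illustrative):

    -Dcassandra.monitoring_report_interval_ms=10000 -Dcassandra.monitoring_max_operations=100

Per the code above, a max-operations value of 0 disables logging of timed-out operations
entirely, while a negative value removes the cap on how many distinct operations are kept.
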
diff --git a/src/java/org/apache/cassandra/db/partitions/AbstractBTreePartition.java b/src/java/org/apache/cassandra/db/partitions/AbstractBTreePartition.java
index 1fa3324..0400402 100644
--- a/src/java/org/apache/cassandra/db/partitions/AbstractBTreePartition.java
+++ b/src/java/org/apache/cassandra/db/partitions/AbstractBTreePartition.java
@@ -183,7 +183,7 @@
         if (slices.size() == 0)
         {
             DeletionTime partitionDeletion = current.deletionInfo.getPartitionDeletion();
-            return UnfilteredRowIterators.noRowsIterator(metadata, partitionKey, staticRow, partitionDeletion, reversed);
+            return UnfilteredRowIterators.noRowsIterator(metadata, partitionKey(), staticRow, partitionDeletion, reversed);
         }
 
         return slices.size() == 1
@@ -193,8 +193,8 @@
 
     private UnfilteredRowIterator sliceIterator(ColumnFilter selection, Slice slice, boolean reversed, Holder current, Row staticRow)
     {
-        Slice.Bound start = slice.start() == Slice.Bound.BOTTOM ? null : slice.start();
-        Slice.Bound end = slice.end() == Slice.Bound.TOP ? null : slice.end();
+        ClusteringBound start = slice.start() == ClusteringBound.BOTTOM ? null : slice.start();
+        ClusteringBound end = slice.end() == ClusteringBound.TOP ? null : slice.end();
         Iterator<Row> rowIter = BTree.slice(current.tree, metadata.comparator, start, true, end, true, desc(reversed));
         Iterator<RangeTombstone> deleteIter = current.deletionInfo.rangeIterator(slice, reversed);
 
@@ -202,9 +202,9 @@
     }
 
     private RowAndDeletionMergeIterator merge(Iterator<Row> rowIter, Iterator<RangeTombstone> deleteIter,
-                                                     ColumnFilter selection, boolean reversed, Holder current, Row staticRow)
+                                              ColumnFilter selection, boolean reversed, Holder current, Row staticRow)
     {
-        return new RowAndDeletionMergeIterator(metadata, partitionKey, current.deletionInfo.getPartitionDeletion(),
+        return new RowAndDeletionMergeIterator(metadata, partitionKey(), current.deletionInfo.getPartitionDeletion(),
                                                selection, staticRow, reversed, current.stats,
                                                rowIter, deleteIter,
                                                canHaveShadowedData());
@@ -215,22 +215,10 @@
         final Holder current;
         final ColumnFilter selection;
 
-        private AbstractIterator(ColumnFilter selection, boolean isReversed)
-        {
-            this(AbstractBTreePartition.this.holder(), selection, isReversed);
-        }
-
-        private AbstractIterator(Holder current, ColumnFilter selection, boolean isReversed)
-        {
-            this(current,
-                 AbstractBTreePartition.this.staticRow(current, selection, false),
-                 selection, isReversed);
-        }
-
         private AbstractIterator(Holder current, Row staticRow, ColumnFilter selection, boolean isReversed)
         {
             super(AbstractBTreePartition.this.metadata,
-                  AbstractBTreePartition.this.partitionKey,
+                  AbstractBTreePartition.this.partitionKey(),
                   current.deletionInfo.getPartitionDeletion(),
                   selection.fetchedColumns(), // non-selected columns will be filtered in subclasses by RowAndDeletionMergeIterator
                                               // it would also be more precise to return the intersection of the selection and current.columns,
@@ -282,40 +270,6 @@
         }
     }
 
-    public class SliceableIterator extends AbstractIterator implements SliceableUnfilteredRowIterator
-    {
-        private Iterator<Unfiltered> iterator;
-
-        protected SliceableIterator(ColumnFilter selection, boolean isReversed)
-        {
-            super(selection, isReversed);
-        }
-
-        protected Unfiltered computeNext()
-        {
-            if (iterator == null)
-                iterator = unfilteredIterator(selection, Slices.ALL, isReverseOrder);
-            if (!iterator.hasNext())
-                return endOfData();
-            return iterator.next();
-        }
-
-        public Iterator<Unfiltered> slice(Slice slice)
-        {
-            return sliceIterator(selection, slice, isReverseOrder, current, staticRow);
-        }
-    }
-
-    public SliceableUnfilteredRowIterator sliceableUnfilteredIterator(ColumnFilter columns, boolean reversed)
-    {
-        return new SliceableIterator(columns, reversed);
-    }
-
-    protected SliceableUnfilteredRowIterator sliceableUnfilteredIterator()
-    {
-        return sliceableUnfilteredIterator(ColumnFilter.all(metadata), false);
-    }
-
     protected static Holder build(UnfilteredRowIterator iterator, int initialRowCapacity)
     {
         CFMetaData metadata = iterator.metadata();
@@ -352,10 +306,7 @@
         BTree.Builder<Row> builder = BTree.builder(metadata.comparator, initialRowCapacity);
         builder.auto(false);
         while (rows.hasNext())
-        {
-            Row row = rows.next();
-            builder.add(row);
-        }
+            builder.add(rows.next());
 
         if (reversed)
             builder.reverse();
diff --git a/src/java/org/apache/cassandra/db/partitions/AtomicBTreePartition.java b/src/java/org/apache/cassandra/db/partitions/AtomicBTreePartition.java
index 2be882e..c7113d4 100644
--- a/src/java/org/apache/cassandra/db/partitions/AtomicBTreePartition.java
+++ b/src/java/org/apache/cassandra/db/partitions/AtomicBTreePartition.java
@@ -19,6 +19,7 @@
 
 import java.nio.ByteBuffer;
 import java.util.ArrayList;
+import java.util.Iterator;
 import java.util.List;
 import java.util.concurrent.atomic.AtomicIntegerFieldUpdater;
 import java.util.concurrent.atomic.AtomicReferenceFieldUpdater;
@@ -26,12 +27,12 @@
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.db.*;
-import org.apache.cassandra.db.rows.EncodingStats;
-import org.apache.cassandra.db.rows.Row;
-import org.apache.cassandra.db.rows.Rows;
+import org.apache.cassandra.db.filter.ColumnFilter;
+import org.apache.cassandra.db.rows.*;
 import org.apache.cassandra.index.transactions.UpdateTransaction;
 import org.apache.cassandra.utils.FBUtilities;
 import org.apache.cassandra.utils.ObjectSizes;
+import org.apache.cassandra.utils.SearchIterator;
 import org.apache.cassandra.utils.btree.BTree;
 import org.apache.cassandra.utils.btree.UpdateFunction;
 import org.apache.cassandra.utils.concurrent.Locks;
@@ -181,7 +182,66 @@
             if (monitorOwned)
                 Locks.monitorExitUnsafe(this);
         }
+    }
 
+    @Override
+    public DeletionInfo deletionInfo()
+    {
+        return allocator.ensureOnHeap().applyToDeletionInfo(super.deletionInfo());
+    }
+
+    @Override
+    public Row staticRow()
+    {
+        return allocator.ensureOnHeap().applyToStatic(super.staticRow());
+    }
+
+    @Override
+    public DecoratedKey partitionKey()
+    {
+        return allocator.ensureOnHeap().applyToPartitionKey(super.partitionKey());
+    }
+
+    @Override
+    public Row getRow(Clustering clustering)
+    {
+        return allocator.ensureOnHeap().applyToRow(super.getRow(clustering));
+    }
+
+    @Override
+    public Row lastRow()
+    {
+        return allocator.ensureOnHeap().applyToRow(super.lastRow());
+    }
+
+    @Override
+    public SearchIterator<Clustering, Row> searchIterator(ColumnFilter columns, boolean reversed)
+    {
+        return allocator.ensureOnHeap().applyToPartition(super.searchIterator(columns, reversed));
+    }
+
+    @Override
+    public UnfilteredRowIterator unfilteredIterator(ColumnFilter selection, Slices slices, boolean reversed)
+    {
+        return allocator.ensureOnHeap().applyToPartition(super.unfilteredIterator(selection, slices, reversed));
+    }
+
+    @Override
+    public UnfilteredRowIterator unfilteredIterator()
+    {
+        return allocator.ensureOnHeap().applyToPartition(super.unfilteredIterator());
+    }
+
+    @Override
+    public UnfilteredRowIterator unfilteredIterator(Holder current, ColumnFilter selection, Slices slices, boolean reversed)
+    {
+        return allocator.ensureOnHeap().applyToPartition(super.unfilteredIterator(current, selection, slices, reversed));
+    }
+
+    @Override
+    public Iterator<Row> iterator()
+    {
+        return allocator.ensureOnHeap().applyToPartition(super.iterator());
     }
 
     public boolean usePessimisticLocking()
diff --git a/src/java/org/apache/cassandra/db/partitions/CachedPartition.java b/src/java/org/apache/cassandra/db/partitions/CachedPartition.java
index 33e6ecc..0cbaba0 100644
--- a/src/java/org/apache/cassandra/db/partitions/CachedPartition.java
+++ b/src/java/org/apache/cassandra/db/partitions/CachedPartition.java
@@ -45,7 +45,7 @@
     /**
      * The number of rows that were live at the time the partition was cached.
      *
-     * See {@link ColumnFamilyStore#isFilterFullyCoveredBy} to see why we need this.
+     * See {@link org.apache.cassandra.db.ColumnFamilyStore#isFilterFullyCoveredBy} to see why we need this.
      *
      * @return the number of rows in this partition that were live at the time the
      * partition was cached (this can be different from the number of live rows now
@@ -58,7 +58,7 @@
      * non-deleted cell.
      *
      * Note that this is generally not a very meaningful number, but this is used by
-     * {@link DataLimits#hasEnoughLiveData} as an optimization.
+     * {@link org.apache.cassandra.db.filter.DataLimits#hasEnoughLiveData} as an optimization.
      *
      * @return the number of row that have at least one non-expiring non-deleted cell.
      */
@@ -86,7 +86,7 @@
      * The number of cells in this cached partition that are neither tombstone nor expiring.
      *
      * Note that this is generally not a very meaningful number, but this is used by
-     * {@link DataLimits#hasEnoughLiveData} as an optimization.
+     * {@link org.apache.cassandra.db.filter.DataLimits#hasEnoughLiveData} as an optimization.
      *
      * @return the number of cells that are neither tombstones nor expiring.
      */
diff --git a/src/java/org/apache/cassandra/db/partitions/FilteredPartition.java b/src/java/org/apache/cassandra/db/partitions/FilteredPartition.java
index 26a947b..70a4678 100644
--- a/src/java/org/apache/cassandra/db/partitions/FilteredPartition.java
+++ b/src/java/org/apache/cassandra/db/partitions/FilteredPartition.java
@@ -65,7 +65,7 @@
 
             public DecoratedKey partitionKey()
             {
-                return partitionKey;
+                return FilteredPartition.this.partitionKey();
             }
 
             public Row staticRow()
diff --git a/src/java/org/apache/cassandra/db/partitions/PartitionUpdate.java b/src/java/org/apache/cassandra/db/partitions/PartitionUpdate.java
index 2a881a3..cfc778f 100644
--- a/src/java/org/apache/cassandra/db/partitions/PartitionUpdate.java
+++ b/src/java/org/apache/cassandra/db/partitions/PartitionUpdate.java
@@ -193,18 +193,36 @@
     /**
      * Turns the given iterator into an update.
      *
+     * @param iterator the iterator to turn into updates.
+     * @param filter the column filter used when querying {@code iterator}. This is used to make
+     * sure we don't include data for which the value has been skipped while reading (as we would
+     * then be writing something incorrect).
+     *
      * Warning: this method does not close the provided iterator; it is up to
      * the caller to close it.
      */
-    public static PartitionUpdate fromIterator(UnfilteredRowIterator iterator)
+    public static PartitionUpdate fromIterator(UnfilteredRowIterator iterator, ColumnFilter filter)
     {
+        iterator = UnfilteredRowIterators.withOnlyQueriedData(iterator, filter);
         Holder holder = build(iterator, 16);
         MutableDeletionInfo deletionInfo = (MutableDeletionInfo) holder.deletionInfo;
         return new PartitionUpdate(iterator.metadata(), iterator.partitionKey(), holder, deletionInfo, false);
     }
 
-    public static PartitionUpdate fromIterator(RowIterator iterator)
+    /**
+     * Turns the given iterator into an update.
+     *
+     * @param iterator the iterator to turn into updates.
+     * @param filter the column filter used when querying {@code iterator}. This is used to make
+     * sure we don't include data for which the value has been skipped while reading (as we would
+     * then be writing something incorrect).
+     *
+     * Warning: this method does not close the provided iterator; it is up to
+     * the caller to close it.
+     */
+    public static PartitionUpdate fromIterator(RowIterator iterator, ColumnFilter filter)
     {
+        iterator = RowIterators.withOnlyQueriedData(iterator, filter);
         MutableDeletionInfo deletionInfo = MutableDeletionInfo.live();
         Holder holder = build(iterator, deletionInfo, true, 16);
         return new PartitionUpdate(iterator.metadata(), iterator.partitionKey(), holder, deletionInfo, false);
@@ -256,7 +274,7 @@
         try (DataOutputBuffer out = new DataOutputBuffer())
         {
             serializer.serialize(update, out, version);
-            return ByteBuffer.wrap(out.getData(), 0, out.getLength());
+            return out.asNewBuffer();
         }
         catch (IOException e)
         {
@@ -296,7 +314,7 @@
 
         int nowInSecs = FBUtilities.nowInSeconds();
         List<UnfilteredRowIterator> asIterators = Lists.transform(updates, AbstractBTreePartition::unfilteredIterator);
-        return fromIterator(UnfilteredRowIterators.merge(asIterators, nowInSecs));
+        return fromIterator(UnfilteredRowIterators.merge(asIterators, nowInSecs), ColumnFilter.all(updates.get(0).metadata()));
     }
 
     /**
@@ -420,13 +438,6 @@
         return super.iterator();
     }
 
-    @Override
-    public SliceableUnfilteredRowIterator sliceableUnfilteredIterator(ColumnFilter columns, boolean reversed)
-    {
-        maybeBuild();
-        return super.sliceableUnfilteredIterator(columns, reversed);
-    }
-
     /**
      * Validates the data contained in this update.
      *
@@ -611,7 +622,7 @@
     {
         public void serialize(PartitionUpdate update, DataOutputPlus out, int version) throws IOException
         {
-            try (UnfilteredRowIterator iter = update.sliceableUnfilteredIterator())
+            try (UnfilteredRowIterator iter = update.unfilteredIterator())
             {
                 assert !iter.isReverseOrder();
 
@@ -694,13 +705,13 @@
             try (UnfilteredRowIterator iterator = LegacyLayout.deserializeLegacyPartition(in, version, flag, key))
             {
                 assert iterator != null; // This is only used in mutation, and mutation have never allowed "null" column families
-                return PartitionUpdate.fromIterator(iterator);
+                return PartitionUpdate.fromIterator(iterator, ColumnFilter.all(iterator.metadata()));
             }
         }
 
         public long serializedSize(PartitionUpdate update, int version)
         {
-            try (UnfilteredRowIterator iter = update.sliceableUnfilteredIterator())
+            try (UnfilteredRowIterator iter = update.unfilteredIterator())
             {
                 if (version < MessagingService.VERSION_30)
                     return LegacyLayout.serializedSizeAsLegacyPartition(null, iter, version);
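
The ColumnFilter that both fromIterator() overloads now require records which columns the user actually queried, as opposed to columns that were merely fetched internally and whose values may have been skipped on read. Below is a minimal standalone model of the hazard this guards against, using plain Java collections and hypothetical column names rather than Cassandra's types:

    import java.util.Collections;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.Set;

    public class OnlyQueriedDataSketch
    {
        public static void main(String[] args)
        {
            // What a read produced: column name -> value. A null value stands in for a value
            // that was skipped while reading because the column was fetched but not queried.
            Map<String, String> readResult = new LinkedHashMap<>();
            readResult.put("a", "1");
            readResult.put("b", null);
            Set<String> queriedColumns = Collections.singleton("a");

            // Writing the raw read result back would persist the bogus skipped value for "b".
            // Restricting the result to queried columns first (the role of withOnlyQueriedData)
            // avoids writing something incorrect.
            Map<String, String> update = new LinkedHashMap<>(readResult);
            update.keySet().retainAll(queriedColumns);
            System.out.println(update); // {a=1}
        }
    }

Call sites in the hunks above that know every column was both fetched and queried simply pass ColumnFilter.all(...) for the new parameter.
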
diff --git a/src/java/org/apache/cassandra/db/partitions/UnfilteredPartitionIterators.java b/src/java/org/apache/cassandra/db/partitions/UnfilteredPartitionIterators.java
index 41b1424..b7f6793 100644
--- a/src/java/org/apache/cassandra/db/partitions/UnfilteredPartitionIterators.java
+++ b/src/java/org/apache/cassandra/db/partitions/UnfilteredPartitionIterators.java
@@ -177,12 +177,6 @@
         {
             private final List<UnfilteredRowIterator> toMerge = new ArrayList<>(iterators.size());
 
-            @Override
-            public boolean trivialReduceIsTrivial()
-            {
-                return false;
-            }
-
             public void reduce(int idx, UnfilteredRowIterator current)
             {
                 toMerge.add(current);
diff --git a/src/java/org/apache/cassandra/db/rows/AbstractCell.java b/src/java/org/apache/cassandra/db/rows/AbstractCell.java
index 7e93c2e..002abe6 100644
--- a/src/java/org/apache/cassandra/db/rows/AbstractCell.java
+++ b/src/java/org/apache/cassandra/db/rows/AbstractCell.java
@@ -17,15 +17,19 @@
  */
 package org.apache.cassandra.db.rows;
 
+import java.nio.ByteBuffer;
 import java.security.MessageDigest;
 import java.util.Objects;
 
 import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.db.DeletionPurger;
+import org.apache.cassandra.db.TypeSizes;
 import org.apache.cassandra.db.context.CounterContext;
 import org.apache.cassandra.db.marshal.AbstractType;
 import org.apache.cassandra.db.marshal.CollectionType;
 import org.apache.cassandra.serializers.MarshalException;
 import org.apache.cassandra.utils.FBUtilities;
+import org.apache.cassandra.utils.memory.AbstractAllocator;
 
 /**
  * Base abstract class for {@code Cell} implementations.
@@ -40,6 +44,81 @@
         super(column);
     }
 
+    public boolean isCounterCell()
+    {
+        return !isTombstone() && column.cellValueType().isCounter();
+    }
+
+    public boolean isLive(int nowInSec)
+    {
+        return localDeletionTime() == NO_DELETION_TIME || (ttl() != NO_TTL && nowInSec < localDeletionTime());
+    }
+
+    public boolean isTombstone()
+    {
+        return localDeletionTime() != NO_DELETION_TIME && ttl() == NO_TTL;
+    }
+
+    public boolean isExpiring()
+    {
+        return ttl() != NO_TTL;
+    }
+
+    public Cell markCounterLocalToBeCleared()
+    {
+        if (!isCounterCell())
+            return this;
+
+        ByteBuffer value = value();
+        ByteBuffer marked = CounterContext.instance().markLocalToBeCleared(value);
+        return marked == value ? this : new BufferCell(column, timestamp(), ttl(), localDeletionTime(), marked, path());
+    }
+
+    public Cell purge(DeletionPurger purger, int nowInSec)
+    {
+        if (!isLive(nowInSec))
+        {
+            if (purger.shouldPurge(timestamp(), localDeletionTime()))
+                return null;
+
+            // We slightly hijack purging to convert expired but not purgeable columns to tombstones. The reason we do that is
+            // that once a column has expired it is equivalent to a tombstone but actually using a tombstone is more compact since
+            // we don't keep the column value. The reason we do it here is that 1) it's somewhat related to dealing with tombstones
+            // so hopefully not too surprising and 2) we want to do this and purging at the same places, so it's simpler/more efficient
+            // to do both here.
+            if (isExpiring())
+            {
+                // Note that as long as the expiring column and the tombstone put together live longer than GC grace seconds,
+                // we'll fulfil our responsibility to repair. See discussion at
+                // http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/repair-compaction-and-tombstone-rows-td7583481.html
+                return BufferCell.tombstone(column, timestamp(), localDeletionTime() - ttl(), path());
+            }
+        }
+        return this;
+    }
+
+    public Cell copy(AbstractAllocator allocator)
+    {
+        CellPath path = path();
+        return new BufferCell(column, timestamp(), ttl(), localDeletionTime(), allocator.clone(value()), path == null ? null : path.copy(allocator));
+    }
+
+    // note: while the cell returned may be different, the value is the same, so if the value is offheap it must be referenced inside a guarded context (or copied)
+    public Cell updateAllTimestamp(long newTimestamp)
+    {
+        return new BufferCell(column, isTombstone() ? newTimestamp - 1 : newTimestamp, ttl(), localDeletionTime(), value(), path());
+    }
+
+    public int dataSize()
+    {
+        CellPath path = path();
+        return TypeSizes.sizeof(timestamp())
+               + TypeSizes.sizeof(ttl())
+               + TypeSizes.sizeof(localDeletionTime())
+               + value().remaining()
+               + (path == null ? 0 : path.dataSize());
+    }
+
     public void digest(MessageDigest digest)
     {
         digest.update(value().duplicate());
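
The purge() logic above turns an expired but not yet purgeable cell into a tombstone dated localDeletionTime() - ttl(). For an expiring cell the local deletion time is the expiration instant, so subtracting the TTL recovers (roughly) the original write time, which is what an explicit tombstone written at that moment would carry. A standalone sketch of that arithmetic with hypothetical values:

    public class ExpiredCellToTombstoneSketch
    {
        public static void main(String[] args)
        {
            int insertionTime = 1_000_000;                // hypothetical write time, seconds since epoch
            int ttl = 3600;                               // one hour TTL
            int localDeletionTime = insertionTime + ttl;  // expiration instant stored on the expiring cell

            // purge() builds BufferCell.tombstone(column, timestamp(), localDeletionTime() - ttl(), path()),
            // i.e. a tombstone dated back to when the expiring cell was written:
            int tombstoneDeletionTime = localDeletionTime - ttl;
            System.out.println(tombstoneDeletionTime == insertionTime); // true
        }
    }
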
diff --git a/src/java/org/apache/cassandra/db/rows/AbstractRangeTombstoneMarker.java b/src/java/org/apache/cassandra/db/rows/AbstractRangeTombstoneMarker.java
index b1ee7ec..153243c 100644
--- a/src/java/org/apache/cassandra/db/rows/AbstractRangeTombstoneMarker.java
+++ b/src/java/org/apache/cassandra/db/rows/AbstractRangeTombstoneMarker.java
@@ -20,18 +20,18 @@
 import java.nio.ByteBuffer;
 
 import org.apache.cassandra.config.CFMetaData;
-import org.apache.cassandra.db.*;
+import org.apache.cassandra.db.ClusteringBoundOrBoundary;
 
-public abstract class AbstractRangeTombstoneMarker implements RangeTombstoneMarker
+public abstract class AbstractRangeTombstoneMarker<B extends ClusteringBoundOrBoundary> implements RangeTombstoneMarker
 {
-    protected final RangeTombstone.Bound bound;
+    protected final B bound;
 
-    protected AbstractRangeTombstoneMarker(RangeTombstone.Bound bound)
+    protected AbstractRangeTombstoneMarker(B bound)
     {
         this.bound = bound;
     }
 
-    public RangeTombstone.Bound clustering()
+    public B clustering()
     {
         return bound;
     }
@@ -58,7 +58,7 @@
 
     public void validateData(CFMetaData metadata)
     {
-        Slice.Bound bound = clustering();
+        ClusteringBoundOrBoundary bound = clustering();
         for (int i = 0; i < bound.size(); i++)
         {
             ByteBuffer value = bound.get(i);
diff --git a/src/java/org/apache/cassandra/db/rows/BTreeRow.java b/src/java/org/apache/cassandra/db/rows/BTreeRow.java
index ea1d9e0..c699634 100644
--- a/src/java/org/apache/cassandra/db/rows/BTreeRow.java
+++ b/src/java/org/apache/cassandra/db/rows/BTreeRow.java
@@ -62,7 +62,11 @@
     // no expiring cells, this will be Integer.MAX_VALUE;
     private final int minLocalDeletionTime;
 
-    private BTreeRow(Clustering clustering, LivenessInfo primaryKeyLivenessInfo, Deletion deletion, Object[] btree, int minLocalDeletionTime)
+    private BTreeRow(Clustering clustering,
+                     LivenessInfo primaryKeyLivenessInfo,
+                     Deletion deletion,
+                     Object[] btree,
+                     int minLocalDeletionTime)
     {
         assert !deletion.isShadowedBy(primaryKeyLivenessInfo);
         this.clustering = clustering;
@@ -78,7 +82,10 @@
     }
 
     // Note that it's often easier/safer to use the sortedBuilder/unsortedBuilder or one of the static creation method below. Only directly useful in a small amount of cases.
-    public static BTreeRow create(Clustering clustering, LivenessInfo primaryKeyLivenessInfo, Deletion deletion, Object[] btree)
+    public static BTreeRow create(Clustering clustering,
+                                  LivenessInfo primaryKeyLivenessInfo,
+                                  Deletion deletion,
+                                  Object[] btree)
     {
         int minDeletionTime = Math.min(minDeletionTime(primaryKeyLivenessInfo), minDeletionTime(deletion.time()));
         if (minDeletionTime != Integer.MIN_VALUE)
@@ -87,6 +94,15 @@
                 minDeletionTime = Math.min(minDeletionTime, minDeletionTime(cd));
         }
 
+        return create(clustering, primaryKeyLivenessInfo, deletion, btree, minDeletionTime);
+    }
+
+    public static BTreeRow create(Clustering clustering,
+                                  LivenessInfo primaryKeyLivenessInfo,
+                                  Deletion deletion,
+                                  Object[] btree,
+                                  int minDeletionTime)
+    {
         return new BTreeRow(clustering, primaryKeyLivenessInfo, deletion, btree, minDeletionTime);
     }
 
@@ -113,7 +129,11 @@
     public static BTreeRow noCellLiveRow(Clustering clustering, LivenessInfo primaryKeyLivenessInfo)
     {
         assert !primaryKeyLivenessInfo.isEmpty();
-        return new BTreeRow(clustering, primaryKeyLivenessInfo, Deletion.LIVE, BTree.empty(), minDeletionTime(primaryKeyLivenessInfo));
+        return new BTreeRow(clustering,
+                            primaryKeyLivenessInfo,
+                            Deletion.LIVE,
+                            BTree.empty(),
+                            minDeletionTime(primaryKeyLivenessInfo));
     }
 
     private static int minDeletionTime(Cell cell)
@@ -237,10 +257,12 @@
     {
         Map<ByteBuffer, CFMetaData.DroppedColumn> droppedColumns = metadata.getDroppedColumns();
 
-        if (filter.includesAllColumns() && (activeDeletion.isLive() || deletion.supersedes(activeDeletion)) && droppedColumns.isEmpty())
+        boolean mayFilterColumns = !filter.fetchesAllColumns() || !filter.allFetchedColumnsAreQueried();
+        boolean mayHaveShadowed = activeDeletion.supersedes(deletion.time());
+
+        if (!mayFilterColumns && !mayHaveShadowed && droppedColumns.isEmpty())
             return this;
 
-        boolean mayHaveShadowed = activeDeletion.supersedes(deletion.time());
 
         LivenessInfo newInfo = primaryKeyLivenessInfo;
         Deletion newDeletion = deletion;
@@ -255,6 +277,8 @@
 
         Columns columns = filter.fetchedColumns().columns(isStatic());
         Predicate<ColumnDefinition> inclusionTester = columns.inOrderInclusionTester();
+        Predicate<ColumnDefinition> queriedByUserTester = filter.queriedColumns().columns(isStatic()).inOrderInclusionTester();
+        final LivenessInfo rowLiveness = newInfo;
         return transformAndFilter(newInfo, newDeletion, (cd) -> {
 
             ColumnDefinition column = cd.column();
@@ -263,11 +287,31 @@
 
             CFMetaData.DroppedColumn dropped = droppedColumns.get(column.name.bytes);
             if (column.isComplex())
-                return ((ComplexColumnData) cd).filter(filter, mayHaveShadowed ? activeDeletion : DeletionTime.LIVE, dropped);
+                return ((ComplexColumnData) cd).filter(filter, mayHaveShadowed ? activeDeletion : DeletionTime.LIVE, dropped, rowLiveness);
 
             Cell cell = (Cell) cd;
-            return (dropped == null || cell.timestamp() > dropped.droppedTime) && !(mayHaveShadowed && activeDeletion.deletes(cell))
-                   ? cell : null;
+            // We include the cell unless it is 1) shadowed, 2) for a dropped column or 3) skippable.
+            // And a cell is skippable if it is for a column that is not queried by the user and its timestamp
+            // is lower than the row timestamp (see #10657 or SerializationHelper.includes() for details).
+            boolean isForDropped = dropped != null && cell.timestamp() <= dropped.droppedTime;
+            boolean isShadowed = mayHaveShadowed && activeDeletion.deletes(cell);
+            boolean isSkippable = !queriedByUserTester.test(column) && cell.timestamp() < rowLiveness.timestamp();
+            return isForDropped || isShadowed || isSkippable ? null : cell;
+        });
+    }
+
+    public Row withOnlyQueriedData(ColumnFilter filter)
+    {
+        if (filter.allFetchedColumnsAreQueried())
+            return this;
+
+        return transformAndFilter(primaryKeyLivenessInfo, deletion, (cd) -> {
+
+            ColumnDefinition column = cd.column();
+            if (column.isComplex())
+                return ((ComplexColumnData)cd).withOnlyQueriedData(filter);
+
+            return filter.fetchedColumnIsQueried(column) ? cd : null;
         });
     }
 
@@ -286,9 +330,10 @@
             if (cd.column().isSimple())
                 return false;
 
-            if (!((ComplexColumnData)cd).complexDeletion().isLive())
+            if (!((ComplexColumnData) cd).complexDeletion().isLive())
                 return true;
         }
+
         return false;
     }
 
@@ -356,7 +401,7 @@
             return null;
 
         int minDeletionTime = minDeletionTime(transformed, info, deletion.time());
-        return new BTreeRow(clustering, info, deletion, transformed, minDeletionTime);
+        return BTreeRow.create(clustering, info, deletion, transformed, minDeletionTime);
     }
 
     public int dataSize()
@@ -593,13 +638,13 @@
                 return new ComplexColumnData(column, btree, deletion);
             }
 
-        };
+        }
         protected Clustering clustering;
         protected LivenessInfo primaryKeyLivenessInfo = LivenessInfo.EMPTY;
         protected Deletion deletion = Deletion.LIVE;
 
         private final boolean isSorted;
-        private final BTree.Builder<Cell> cells;
+        private BTree.Builder<Cell> cells_;
         private final CellResolver resolver;
         private boolean hasComplex = false;
 
@@ -612,10 +657,19 @@
 
         protected Builder(boolean isSorted, int nowInSecs)
         {
-            this.cells = BTree.builder(ColumnData.comparator);
+            cells_ = null;
             resolver = new CellResolver(nowInSecs);
             this.isSorted = isSorted;
-            this.cells.auto(false);
+        }
+
+        private BTree.Builder<Cell> getCells()
+        {
+            if (cells_ == null)
+            {
+                cells_ = BTree.builder(ColumnData.comparator);
+                cells_.auto(false);
+            }
+            return cells_;
         }
 
         public boolean isSorted()
@@ -639,7 +693,7 @@
             this.clustering = null;
             this.primaryKeyLivenessInfo = LivenessInfo.EMPTY;
             this.deletion = Deletion.LIVE;
-            this.cells.reuse();
+            this.cells_ = null;
         }
 
         public void addPrimaryKeyLivenessInfo(LivenessInfo info)
@@ -660,38 +714,38 @@
         public void addCell(Cell cell)
         {
             assert cell.column().isStatic() == (clustering == Clustering.STATIC_CLUSTERING) : "Column is " + cell.column() + ", clustering = " + clustering;
+
             // In practice, only unsorted builder have to deal with shadowed cells, but it doesn't cost us much to deal with it unconditionally in this case
             if (deletion.deletes(cell))
                 return;
 
-            cells.add(cell);
+            getCells().add(cell);
             hasComplex |= cell.column.isComplex();
         }
 
         public void addComplexDeletion(ColumnDefinition column, DeletionTime complexDeletion)
         {
-            cells.add(new ComplexColumnDeletion(column, complexDeletion));
+            getCells().add(new ComplexColumnDeletion(column, complexDeletion));
             hasComplex = true;
         }
 
         public Row build()
         {
             if (!isSorted)
-                cells.sort();
+                getCells().sort();
             // we can avoid resolving if we're sorted and have no complex values
             // (because we'll only have unique simple cells, which are already in their final condition)
             if (!isSorted | hasComplex)
-                cells.resolve(resolver);
-            Object[] btree = cells.build();
+                getCells().resolve(resolver);
+            Object[] btree = getCells().build();
 
             if (deletion.isShadowedBy(primaryKeyLivenessInfo))
                 deletion = Deletion.LIVE;
 
             int minDeletionTime = minDeletionTime(btree, primaryKeyLivenessInfo, deletion.time());
-            Row row = new BTreeRow(clustering, primaryKeyLivenessInfo, deletion, btree, minDeletionTime);
+            Row row = BTreeRow.create(clustering, primaryKeyLivenessInfo, deletion, btree, minDeletionTime);
             reset();
             return row;
         }
-
     }
 }
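
The rewritten BTreeRow.filter() drops a cell for one of three independent reasons: it belongs to a dropped column, it is shadowed by an active deletion, or it is "skippable" because its column was fetched but not queried and the cell is older than the row's primary-key liveness, so its value may have been skipped on read (per the #10657 reference in the comment). A standalone sketch of that decision, with hypothetical booleans and timestamps standing in for the Cassandra objects:

    public class CellFilterSketch
    {
        public static void main(String[] args)
        {
            // Hypothetical inputs, standing in for cell.timestamp(), rowLiveness.timestamp(),
            // the dropped-columns lookup, activeDeletion.deletes(cell) and the queried-columns tester.
            long cellTimestamp = 10;
            long rowTimestamp = 20;
            boolean columnWasDropped = false;
            long droppedTime = 15;
            boolean deletionCoversCell = false;
            boolean columnQueriedByUser = false;

            boolean isForDropped = columnWasDropped && cellTimestamp <= droppedTime;
            boolean isShadowed = deletionCoversCell;
            // Skippable: the column was fetched but not queried, and the cell is older than the
            // row's primary-key liveness, so its value may have been skipped while reading.
            boolean isSkippable = !columnQueriedByUser && cellTimestamp < rowTimestamp;

            System.out.println(isForDropped || isShadowed || isSkippable ? "drop cell" : "keep cell");
        }
    }
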
diff --git a/src/java/org/apache/cassandra/db/rows/BufferCell.java b/src/java/org/apache/cassandra/db/rows/BufferCell.java
index db0ded5..9b31c16 100644
--- a/src/java/org/apache/cassandra/db/rows/BufferCell.java
+++ b/src/java/org/apache/cassandra/db/rows/BufferCell.java
@@ -17,18 +17,12 @@
  */
 package org.apache.cassandra.db.rows;
 
-import java.io.IOException;
 import java.nio.ByteBuffer;
 
-import org.apache.cassandra.config.*;
-import org.apache.cassandra.db.*;
-import org.apache.cassandra.db.context.CounterContext;
+import org.apache.cassandra.config.ColumnDefinition;
 import org.apache.cassandra.db.marshal.ByteType;
-import org.apache.cassandra.io.util.DataInputPlus;
-import org.apache.cassandra.io.util.DataOutputPlus;
 import org.apache.cassandra.utils.ByteBufferUtil;
 import org.apache.cassandra.utils.ObjectSizes;
-import org.apache.cassandra.utils.FBUtilities;
 import org.apache.cassandra.utils.memory.AbstractAllocator;
 
 public class BufferCell extends AbstractCell
@@ -45,6 +39,7 @@
     public BufferCell(ColumnDefinition column, long timestamp, int ttl, int localDeletionTime, ByteBuffer value, CellPath path)
     {
         super(column);
+        assert !column.isPrimaryKeyColumn();
         assert column.isComplex() == (path != null);
         this.timestamp = timestamp;
         this.ttl = ttl;
@@ -53,16 +48,13 @@
         this.path = path;
     }
 
-    public static BufferCell live(CFMetaData metadata, ColumnDefinition column, long timestamp, ByteBuffer value)
+    public static BufferCell live(ColumnDefinition column, long timestamp, ByteBuffer value)
     {
-        return live(metadata, column, timestamp, value, null);
+        return live(column, timestamp, value, null);
     }
 
-    public static BufferCell live(CFMetaData metadata, ColumnDefinition column, long timestamp, ByteBuffer value, CellPath path)
+    public static BufferCell live(ColumnDefinition column, long timestamp, ByteBuffer value, CellPath path)
     {
-        if (metadata.params.defaultTimeToLive != NO_TTL)
-            return expiring(column, timestamp, metadata.params.defaultTimeToLive, FBUtilities.nowInSeconds(), value, path);
-
         return new BufferCell(column, timestamp, NO_TTL, NO_DELETION_TIME, value, path);
     }
 
@@ -87,26 +79,6 @@
         return new BufferCell(column, timestamp, NO_TTL, nowInSec, ByteBufferUtil.EMPTY_BYTE_BUFFER, path);
     }
 
-    public boolean isCounterCell()
-    {
-        return !isTombstone() && column.cellValueType().isCounter();
-    }
-
-    public boolean isLive(int nowInSec)
-    {
-        return localDeletionTime == NO_DELETION_TIME || (ttl != NO_TTL && nowInSec < localDeletionTime);
-    }
-
-    public boolean isTombstone()
-    {
-        return localDeletionTime != NO_DELETION_TIME && ttl == NO_TTL;
-    }
-
-    public boolean isExpiring()
-    {
-        return ttl != NO_TTL;
-    }
-
     public long timestamp()
     {
         return timestamp;
@@ -150,216 +122,8 @@
         return new BufferCell(column, timestamp, ttl, localDeletionTime, allocator.clone(value), path == null ? null : path.copy(allocator));
     }
 
-    public Cell markCounterLocalToBeCleared()
-    {
-        if (!isCounterCell())
-            return this;
-
-        ByteBuffer marked = CounterContext.instance().markLocalToBeCleared(value());
-        return marked == value() ? this : new BufferCell(column, timestamp, ttl, localDeletionTime, marked, path);
-    }
-
-    public Cell purge(DeletionPurger purger, int nowInSec)
-    {
-        if (!isLive(nowInSec))
-        {
-            if (purger.shouldPurge(timestamp, localDeletionTime))
-                return null;
-
-            // We slightly hijack purging to convert expired but not purgeable columns to tombstones. The reason we do that is
-            // that once a column has expired it is equivalent to a tombstone but actually using a tombstone is more compact since
-            // we don't keep the column value. The reason we do it here is that 1) it's somewhat related to dealing with tombstones
-            // so hopefully not too surprising and 2) we want to this and purging at the same places, so it's simpler/more efficient
-            // to do both here.
-            if (isExpiring())
-            {
-                // Note that as long as the expiring column and the tombstone put together live longer than GC grace seconds,
-                // we'll fulfil our responsibility to repair. See discussion at
-                // http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/repair-compaction-and-tombstone-rows-td7583481.html
-                return BufferCell.tombstone(column, timestamp, localDeletionTime - ttl, path);
-            }
-        }
-        return this;
-    }
-
-    public Cell updateAllTimestamp(long newTimestamp)
-    {
-        return new BufferCell(column, isTombstone() ? newTimestamp - 1 : newTimestamp, ttl, localDeletionTime, value, path);
-    }
-
-    public int dataSize()
-    {
-        return TypeSizes.sizeof(timestamp)
-             + TypeSizes.sizeof(ttl)
-             + TypeSizes.sizeof(localDeletionTime)
-             + value.remaining()
-             + (path == null ? 0 : path.dataSize());
-    }
-
     public long unsharedHeapSizeExcludingData()
     {
         return EMPTY_SIZE + ObjectSizes.sizeOnHeapExcludingData(value) + (path == null ? 0 : path.unsharedHeapSizeExcludingData());
     }
-
-    /**
-     * The serialization format for cell is:
-     *     [ flags ][ timestamp ][ deletion time ][    ttl    ][ path size ][ path ][ value size ][ value ]
-     *     [   1b  ][ 8b (vint) ][   4b (vint)   ][ 4b (vint) ][ 4b (vint) ][  arb ][  4b (vint) ][  arb  ]
-     *
-     * where not all field are always present (in fact, only the [ flags ] are guaranteed to be present). The fields have the following
-     * meaning:
-     *   - [ flags ] is the cell flags. It is a byte for which each bit represents a flag whose meaning is explained below (*_MASK constants)
-     *   - [ timestamp ] is the cell timestamp. Present unless the cell has the USE_TIMESTAMP_MASK.
-     *   - [ deletion time]: the local deletion time for the cell. Present if either the cell is deleted (IS_DELETED_MASK)
-     *       or it is expiring (IS_EXPIRING_MASK) but doesn't have the USE_ROW_TTL_MASK.
-     *   - [ ttl ]: the ttl for the cell. Present if the row is expiring (IS_EXPIRING_MASK) but doesn't have the
-     *       USE_ROW_TTL_MASK.
-     *   - [ value size ] is the size of the [ value ] field. It's present unless either the cell has the HAS_EMPTY_VALUE_MASK, or the value
-     *       for columns of this type have a fixed length.
-     *   - [ path size ] is the size of the [ path ] field. Present iff this is the cell of a complex column.
-     *   - [ value ]: the cell value, unless it has the HAS_EMPTY_VALUE_MASK.
-     *   - [ path ]: the cell path if the column this is a cell of is complex.
-     */
-    static class Serializer implements Cell.Serializer
-    {
-        private final static int IS_DELETED_MASK             = 0x01; // Whether the cell is a tombstone or not.
-        private final static int IS_EXPIRING_MASK            = 0x02; // Whether the cell is expiring.
-        private final static int HAS_EMPTY_VALUE_MASK        = 0x04; // Wether the cell has an empty value. This will be the case for tombstone in particular.
-        private final static int USE_ROW_TIMESTAMP_MASK      = 0x08; // Wether the cell has the same timestamp than the row this is a cell of.
-        private final static int USE_ROW_TTL_MASK            = 0x10; // Wether the cell has the same ttl than the row this is a cell of.
-
-        public void serialize(Cell cell, ColumnDefinition column, DataOutputPlus out, LivenessInfo rowLiveness, SerializationHeader header) throws IOException
-        {
-            assert cell != null;
-            boolean hasValue = cell.value().hasRemaining();
-            boolean isDeleted = cell.isTombstone();
-            boolean isExpiring = cell.isExpiring();
-            boolean useRowTimestamp = !rowLiveness.isEmpty() && cell.timestamp() == rowLiveness.timestamp();
-            boolean useRowTTL = isExpiring && rowLiveness.isExpiring() && cell.ttl() == rowLiveness.ttl() && cell.localDeletionTime() == rowLiveness.localExpirationTime();
-            int flags = 0;
-            if (!hasValue)
-                flags |= HAS_EMPTY_VALUE_MASK;
-
-            if (isDeleted)
-                flags |= IS_DELETED_MASK;
-            else if (isExpiring)
-                flags |= IS_EXPIRING_MASK;
-
-            if (useRowTimestamp)
-                flags |= USE_ROW_TIMESTAMP_MASK;
-            if (useRowTTL)
-                flags |= USE_ROW_TTL_MASK;
-
-            out.writeByte((byte)flags);
-
-            if (!useRowTimestamp)
-                header.writeTimestamp(cell.timestamp(), out);
-
-            if ((isDeleted || isExpiring) && !useRowTTL)
-                header.writeLocalDeletionTime(cell.localDeletionTime(), out);
-            if (isExpiring && !useRowTTL)
-                header.writeTTL(cell.ttl(), out);
-
-            if (column.isComplex())
-                column.cellPathSerializer().serialize(cell.path(), out);
-
-            if (hasValue)
-                header.getType(column).writeValue(cell.value(), out);
-        }
-
-        public Cell deserialize(DataInputPlus in, LivenessInfo rowLiveness, ColumnDefinition column, SerializationHeader header, SerializationHelper helper) throws IOException
-        {
-            int flags = in.readUnsignedByte();
-            boolean hasValue = (flags & HAS_EMPTY_VALUE_MASK) == 0;
-            boolean isDeleted = (flags & IS_DELETED_MASK) != 0;
-            boolean isExpiring = (flags & IS_EXPIRING_MASK) != 0;
-            boolean useRowTimestamp = (flags & USE_ROW_TIMESTAMP_MASK) != 0;
-            boolean useRowTTL = (flags & USE_ROW_TTL_MASK) != 0;
-
-            long timestamp = useRowTimestamp ? rowLiveness.timestamp() : header.readTimestamp(in);
-
-            int localDeletionTime = useRowTTL
-                                  ? rowLiveness.localExpirationTime()
-                                  : (isDeleted || isExpiring ? header.readLocalDeletionTime(in) : NO_DELETION_TIME);
-
-            int ttl = useRowTTL ? rowLiveness.ttl() : (isExpiring ? header.readTTL(in) : NO_TTL);
-
-            CellPath path = column.isComplex()
-                          ? column.cellPathSerializer().deserialize(in)
-                          : null;
-
-            boolean isCounter = localDeletionTime == NO_DELETION_TIME && column.type.isCounter();
-
-            ByteBuffer value = ByteBufferUtil.EMPTY_BYTE_BUFFER;
-            if (hasValue)
-            {
-                if (helper.canSkipValue(column) || (path != null && helper.canSkipValue(path)))
-                {
-                    header.getType(column).skipValue(in);
-                }
-                else
-                {
-                    value = header.getType(column).readValue(in, DatabaseDescriptor.getMaxValueSize());
-                    if (isCounter)
-                        value = helper.maybeClearCounterValue(value);
-                }
-            }
-
-            return new BufferCell(column, timestamp, ttl, localDeletionTime, value, path);
-        }
-
-        public long serializedSize(Cell cell, ColumnDefinition column, LivenessInfo rowLiveness, SerializationHeader header)
-        {
-            long size = 1; // flags
-            boolean hasValue = cell.value().hasRemaining();
-            boolean isDeleted = cell.isTombstone();
-            boolean isExpiring = cell.isExpiring();
-            boolean useRowTimestamp = !rowLiveness.isEmpty() && cell.timestamp() == rowLiveness.timestamp();
-            boolean useRowTTL = isExpiring && rowLiveness.isExpiring() && cell.ttl() == rowLiveness.ttl() && cell.localDeletionTime() == rowLiveness.localExpirationTime();
-
-            if (!useRowTimestamp)
-                size += header.timestampSerializedSize(cell.timestamp());
-
-            if ((isDeleted || isExpiring) && !useRowTTL)
-                size += header.localDeletionTimeSerializedSize(cell.localDeletionTime());
-            if (isExpiring && !useRowTTL)
-                size += header.ttlSerializedSize(cell.ttl());
-
-            if (column.isComplex())
-                size += column.cellPathSerializer().serializedSize(cell.path());
-
-            if (hasValue)
-                size += header.getType(column).writtenLength(cell.value());
-
-            return size;
-        }
-
-        // Returns if the skipped cell was an actual cell (i.e. it had its presence flag).
-        public boolean skip(DataInputPlus in, ColumnDefinition column, SerializationHeader header) throws IOException
-        {
-            int flags = in.readUnsignedByte();
-            boolean hasValue = (flags & HAS_EMPTY_VALUE_MASK) == 0;
-            boolean isDeleted = (flags & IS_DELETED_MASK) != 0;
-            boolean isExpiring = (flags & IS_EXPIRING_MASK) != 0;
-            boolean useRowTimestamp = (flags & USE_ROW_TIMESTAMP_MASK) != 0;
-            boolean useRowTTL = (flags & USE_ROW_TTL_MASK) != 0;
-
-            if (!useRowTimestamp)
-                header.skipTimestamp(in);
-
-            if (!useRowTTL && (isDeleted || isExpiring))
-                header.skipLocalDeletionTime(in);
-
-            if (!useRowTTL && isExpiring)
-                header.skipTTL(in);
-
-            if (column.isComplex())
-                column.cellPathSerializer().skip(in);
-
-            if (hasValue)
-                header.getType(column).skipValue(in);
-
-            return true;
-        }
-    }
 }
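
The liveness predicates deleted from BufferCell here are not removed outright; they move up into AbstractCell earlier in this patch, expressed against the accessor methods instead of the fields. A standalone sketch of the classification they implement, using this sketch's own sentinel constants in place of the NO_TTL and NO_DELETION_TIME sentinels:

    public class CellStateSketch
    {
        static final int NO_TTL = 0;
        static final int NO_DELETION_TIME = Integer.MAX_VALUE;

        static String classify(int ttl, int localDeletionTime, int nowInSec)
        {
            boolean isTombstone = localDeletionTime != NO_DELETION_TIME && ttl == NO_TTL;
            boolean isExpiring = ttl != NO_TTL;
            boolean isLive = localDeletionTime == NO_DELETION_TIME
                             || (ttl != NO_TTL && nowInSec < localDeletionTime);
            return (isTombstone ? "tombstone " : "") + (isExpiring ? "expiring " : "") + (isLive ? "live" : "dead");
        }

        public static void main(String[] args)
        {
            int now = 1_000_000; // hypothetical current time, seconds since epoch
            System.out.println(classify(NO_TTL, NO_DELETION_TIME, now)); // regular cell: live
            System.out.println(classify(NO_TTL, now - 10, now));         // tombstone: not live
            System.out.println(classify(3600, now + 3600, now));         // expiring, still live
            System.out.println(classify(3600, now - 10, now));           // expiring, already expired
        }
    }
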
diff --git a/src/java/org/apache/cassandra/db/rows/Cell.java b/src/java/org/apache/cassandra/db/rows/Cell.java
index d10cc74..19d1f30 100644
--- a/src/java/org/apache/cassandra/db/rows/Cell.java
+++ b/src/java/org/apache/cassandra/db/rows/Cell.java
@@ -21,10 +21,11 @@
 import java.nio.ByteBuffer;
 import java.util.Comparator;
 
-import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.config.*;
 import org.apache.cassandra.db.*;
 import org.apache.cassandra.io.util.DataOutputPlus;
 import org.apache.cassandra.io.util.DataInputPlus;
+import org.apache.cassandra.utils.ByteBufferUtil;
 import org.apache.cassandra.utils.memory.AbstractAllocator;
 
 /**
@@ -143,15 +144,165 @@
     // Overrides super type to provide a more precise return type.
     public abstract Cell purge(DeletionPurger purger, int nowInSec);
 
-    public interface Serializer
+    /**
+     * The serialization format for cell is:
+     *     [ flags ][ timestamp ][ deletion time ][    ttl    ][ path size ][ path ][ value size ][ value ]
+     *     [   1b  ][ 8b (vint) ][   4b (vint)   ][ 4b (vint) ][ 4b (vint) ][  arb ][  4b (vint) ][  arb  ]
+     *
+     * where not all fields are always present (in fact, only the [ flags ] are guaranteed to be present). The fields have the following
+     * meaning:
+     *   - [ flags ] is the cell flags. It is a byte for which each bit represents a flag whose meaning is explained below (*_MASK constants)
+     *   - [ timestamp ] is the cell timestamp. Present unless the cell has the USE_ROW_TIMESTAMP_MASK.
+     *   - [ deletion time]: the local deletion time for the cell. Present if either the cell is deleted (IS_DELETED_MASK)
+     *       or it is expiring (IS_EXPIRING_MASK) but doesn't have the USE_ROW_TTL_MASK.
+     *   - [ ttl ]: the ttl for the cell. Present if the cell is expiring (IS_EXPIRING_MASK) but doesn't have the
+     *       USE_ROW_TTL_MASK.
+     *   - [ value size ] is the size of the [ value ] field. It's present unless either the cell has the HAS_EMPTY_VALUE_MASK, or values
+     *       for columns of this type have a fixed length.
+     *   - [ path size ] is the size of the [ path ] field. Present iff this is the cell of a complex column.
+     *   - [ value ]: the cell value, unless it has the HAS_EMPTY_VALUE_MASK.
+     *   - [ path ]: the cell path if the column this is a cell of is complex.
+     */
+    static class Serializer
     {
-        public void serialize(Cell cell, ColumnDefinition column, DataOutputPlus out, LivenessInfo rowLiveness, SerializationHeader header) throws IOException;
+        private final static int IS_DELETED_MASK             = 0x01; // Whether the cell is a tombstone or not.
+        private final static int IS_EXPIRING_MASK            = 0x02; // Whether the cell is expiring.
+        private final static int HAS_EMPTY_VALUE_MASK        = 0x04; // Whether the cell has an empty value. This will be the case for tombstones in particular.
+        private final static int USE_ROW_TIMESTAMP_MASK      = 0x08; // Whether the cell has the same timestamp as the row it is a cell of.
+        private final static int USE_ROW_TTL_MASK            = 0x10; // Whether the cell has the same ttl as the row it is a cell of.
 
-        public Cell deserialize(DataInputPlus in, LivenessInfo rowLiveness, ColumnDefinition column, SerializationHeader header, SerializationHelper helper) throws IOException;
+        public void serialize(Cell cell, ColumnDefinition column, DataOutputPlus out, LivenessInfo rowLiveness, SerializationHeader header) throws IOException
+        {
+            assert cell != null;
+            boolean hasValue = cell.value().hasRemaining();
+            boolean isDeleted = cell.isTombstone();
+            boolean isExpiring = cell.isExpiring();
+            boolean useRowTimestamp = !rowLiveness.isEmpty() && cell.timestamp() == rowLiveness.timestamp();
+            boolean useRowTTL = isExpiring && rowLiveness.isExpiring() && cell.ttl() == rowLiveness.ttl() && cell.localDeletionTime() == rowLiveness.localExpirationTime();
+            int flags = 0;
+            if (!hasValue)
+                flags |= HAS_EMPTY_VALUE_MASK;
 
-        public long serializedSize(Cell cell, ColumnDefinition column, LivenessInfo rowLiveness, SerializationHeader header);
+            if (isDeleted)
+                flags |= IS_DELETED_MASK;
+            else if (isExpiring)
+                flags |= IS_EXPIRING_MASK;
+
+            if (useRowTimestamp)
+                flags |= USE_ROW_TIMESTAMP_MASK;
+            if (useRowTTL)
+                flags |= USE_ROW_TTL_MASK;
+
+            out.writeByte((byte)flags);
+
+            if (!useRowTimestamp)
+                header.writeTimestamp(cell.timestamp(), out);
+
+            if ((isDeleted || isExpiring) && !useRowTTL)
+                header.writeLocalDeletionTime(cell.localDeletionTime(), out);
+            if (isExpiring && !useRowTTL)
+                header.writeTTL(cell.ttl(), out);
+
+            if (column.isComplex())
+                column.cellPathSerializer().serialize(cell.path(), out);
+
+            if (hasValue)
+                header.getType(column).writeValue(cell.value(), out);
+        }
+
+        public Cell deserialize(DataInputPlus in, LivenessInfo rowLiveness, ColumnDefinition column, SerializationHeader header, SerializationHelper helper) throws IOException
+        {
+            int flags = in.readUnsignedByte();
+            boolean hasValue = (flags & HAS_EMPTY_VALUE_MASK) == 0;
+            boolean isDeleted = (flags & IS_DELETED_MASK) != 0;
+            boolean isExpiring = (flags & IS_EXPIRING_MASK) != 0;
+            boolean useRowTimestamp = (flags & USE_ROW_TIMESTAMP_MASK) != 0;
+            boolean useRowTTL = (flags & USE_ROW_TTL_MASK) != 0;
+
+            long timestamp = useRowTimestamp ? rowLiveness.timestamp() : header.readTimestamp(in);
+
+            int localDeletionTime = useRowTTL
+                                    ? rowLiveness.localExpirationTime()
+                                    : (isDeleted || isExpiring ? header.readLocalDeletionTime(in) : NO_DELETION_TIME);
+
+            int ttl = useRowTTL ? rowLiveness.ttl() : (isExpiring ? header.readTTL(in) : NO_TTL);
+
+            CellPath path = column.isComplex()
+                            ? column.cellPathSerializer().deserialize(in)
+                            : null;
+
+            ByteBuffer value = ByteBufferUtil.EMPTY_BYTE_BUFFER;
+            if (hasValue)
+            {
+                if (helper.canSkipValue(column) || (path != null && helper.canSkipValue(path)))
+                {
+                    header.getType(column).skipValue(in);
+                }
+                else
+                {
+                    boolean isCounter = localDeletionTime == NO_DELETION_TIME && column.type.isCounter();
+
+                    value = header.getType(column).readValue(in, DatabaseDescriptor.getMaxValueSize());
+                    if (isCounter)
+                        value = helper.maybeClearCounterValue(value);
+                }
+            }
+
+            return new BufferCell(column, timestamp, ttl, localDeletionTime, value, path);
+        }
+
+        public long serializedSize(Cell cell, ColumnDefinition column, LivenessInfo rowLiveness, SerializationHeader header)
+        {
+            long size = 1; // flags
+            boolean hasValue = cell.value().hasRemaining();
+            boolean isDeleted = cell.isTombstone();
+            boolean isExpiring = cell.isExpiring();
+            boolean useRowTimestamp = !rowLiveness.isEmpty() && cell.timestamp() == rowLiveness.timestamp();
+            boolean useRowTTL = isExpiring && rowLiveness.isExpiring() && cell.ttl() == rowLiveness.ttl() && cell.localDeletionTime() == rowLiveness.localExpirationTime();
+
+            if (!useRowTimestamp)
+                size += header.timestampSerializedSize(cell.timestamp());
+
+            if ((isDeleted || isExpiring) && !useRowTTL)
+                size += header.localDeletionTimeSerializedSize(cell.localDeletionTime());
+            if (isExpiring && !useRowTTL)
+                size += header.ttlSerializedSize(cell.ttl());
+
+            if (column.isComplex())
+                size += column.cellPathSerializer().serializedSize(cell.path());
+
+            if (hasValue)
+                size += header.getType(column).writtenLength(cell.value());
+
+            return size;
+        }
 
         // Returns if the skipped cell was an actual cell (i.e. it had its presence flag).
-        public boolean skip(DataInputPlus in, ColumnDefinition column, SerializationHeader header) throws IOException;
+        public boolean skip(DataInputPlus in, ColumnDefinition column, SerializationHeader header) throws IOException
+        {
+            int flags = in.readUnsignedByte();
+            boolean hasValue = (flags & HAS_EMPTY_VALUE_MASK) == 0;
+            boolean isDeleted = (flags & IS_DELETED_MASK) != 0;
+            boolean isExpiring = (flags & IS_EXPIRING_MASK) != 0;
+            boolean useRowTimestamp = (flags & USE_ROW_TIMESTAMP_MASK) != 0;
+            boolean useRowTTL = (flags & USE_ROW_TTL_MASK) != 0;
+
+            if (!useRowTimestamp)
+                header.skipTimestamp(in);
+
+            if (!useRowTTL && (isDeleted || isExpiring))
+                header.skipLocalDeletionTime(in);
+
+            if (!useRowTTL && isExpiring)
+                header.skipTTL(in);
+
+            if (column.isComplex())
+                column.cellPathSerializer().skip(in);
+
+            if (hasValue)
+                header.getType(column).skipValue(in);
+
+            return true;
+        }
     }
 }
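
The serializer documented above packs five booleans into the single flag byte and then writes the remaining fields only when the flags say they are needed. A standalone sketch of that flag handling, reusing the mask values from the patch but with hypothetical cell state:

    public class CellFlagsSketch
    {
        private static final int IS_DELETED_MASK        = 0x01;
        private static final int IS_EXPIRING_MASK       = 0x02;
        private static final int HAS_EMPTY_VALUE_MASK   = 0x04;
        private static final int USE_ROW_TIMESTAMP_MASK = 0x08;
        private static final int USE_ROW_TTL_MASK       = 0x10;

        public static void main(String[] args)
        {
            // Encode: an expiring cell with a value, sharing the row's timestamp but not its TTL.
            boolean hasValue = true, isDeleted = false, isExpiring = true;
            boolean useRowTimestamp = true, useRowTTL = false;
            int flags = 0;
            if (!hasValue)       flags |= HAS_EMPTY_VALUE_MASK;
            if (isDeleted)       flags |= IS_DELETED_MASK;
            else if (isExpiring) flags |= IS_EXPIRING_MASK;
            if (useRowTimestamp) flags |= USE_ROW_TIMESTAMP_MASK;
            if (useRowTTL)       flags |= USE_ROW_TTL_MASK;

            // Decode: which optional fields follow the flag byte for this cell?
            boolean readsTimestamp    = (flags & USE_ROW_TIMESTAMP_MASK) == 0;
            boolean readsDeletionTime = (flags & USE_ROW_TTL_MASK) == 0
                                        && (flags & (IS_DELETED_MASK | IS_EXPIRING_MASK)) != 0;
            boolean readsTtl          = (flags & USE_ROW_TTL_MASK) == 0 && (flags & IS_EXPIRING_MASK) != 0;
            System.out.printf("timestamp=%b deletionTime=%b ttl=%b%n", readsTimestamp, readsDeletionTime, readsTtl);
        }
    }
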
diff --git a/src/java/org/apache/cassandra/db/rows/CellPath.java b/src/java/org/apache/cassandra/db/rows/CellPath.java
index 68e3c2b..e2b362c 100644
--- a/src/java/org/apache/cassandra/db/rows/CellPath.java
+++ b/src/java/org/apache/cassandra/db/rows/CellPath.java
@@ -22,6 +22,7 @@
 import java.security.MessageDigest;
 import java.util.Objects;
 
+import org.apache.cassandra.db.marshal.UserType;
 import org.apache.cassandra.io.util.DataOutputPlus;
 import org.apache.cassandra.io.util.DataInputPlus;
 import org.apache.cassandra.utils.ByteBufferUtil;
@@ -39,11 +40,11 @@
     public abstract int size();
     public abstract ByteBuffer get(int i);
 
-    // The only complex we currently have are collections that have only one value.
+    // The only complex paths we currently have are collections and UDTs, which both have a depth of one
     public static CellPath create(ByteBuffer value)
     {
         assert value != null;
-        return new CollectionCellPath(value);
+        return new SingleItemCellPath(value);
     }
 
     public int dataSize()
@@ -98,13 +99,13 @@
         public void skip(DataInputPlus in) throws IOException;
     }
 
-    private static class CollectionCellPath extends CellPath
+    private static class SingleItemCellPath extends CellPath
     {
-        private static final long EMPTY_SIZE = ObjectSizes.measure(new CollectionCellPath(ByteBufferUtil.EMPTY_BYTE_BUFFER));
+        private static final long EMPTY_SIZE = ObjectSizes.measure(new SingleItemCellPath(ByteBufferUtil.EMPTY_BYTE_BUFFER));
 
         protected final ByteBuffer value;
 
-        private CollectionCellPath(ByteBuffer value)
+        private SingleItemCellPath(ByteBuffer value)
         {
             this.value = value;
         }
@@ -122,7 +123,7 @@
 
         public CellPath copy(AbstractAllocator allocator)
         {
-            return new CollectionCellPath(allocator.clone(value));
+            return new SingleItemCellPath(allocator.clone(value));
         }
 
         public long unsharedHeapSizeExcludingData()
diff --git a/src/java/org/apache/cassandra/db/rows/ComplexColumnData.java b/src/java/org/apache/cassandra/db/rows/ComplexColumnData.java
index 2a8e843..b2c09b1 100644
--- a/src/java/org/apache/cassandra/db/rows/ComplexColumnData.java
+++ b/src/java/org/apache/cassandra/db/rows/ComplexColumnData.java
@@ -19,22 +19,22 @@
 
 import java.nio.ByteBuffer;
 import java.security.MessageDigest;
-import java.util.*;
+import java.util.Iterator;
+import java.util.Objects;
 import java.util.function.BiFunction;
 
 import com.google.common.base.Function;
-import com.google.common.collect.Iterables;
 
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.ColumnDefinition;
-import org.apache.cassandra.db.DeletionTime;
 import org.apache.cassandra.db.DeletionPurger;
+import org.apache.cassandra.db.DeletionTime;
+import org.apache.cassandra.db.LivenessInfo;
 import org.apache.cassandra.db.filter.ColumnFilter;
 import org.apache.cassandra.db.marshal.ByteType;
 import org.apache.cassandra.db.marshal.SetType;
 import org.apache.cassandra.utils.ObjectSizes;
 import org.apache.cassandra.utils.btree.BTree;
-import org.apache.cassandra.utils.btree.UpdateFunction;
 
 /**
  * The data for a complex column, that is it's cells and potential complex
@@ -144,19 +144,21 @@
         return transformAndFilter(complexDeletion, Cell::markCounterLocalToBeCleared);
     }
 
-    public ComplexColumnData filter(ColumnFilter filter, DeletionTime activeDeletion, CFMetaData.DroppedColumn dropped)
+    public ComplexColumnData filter(ColumnFilter filter, DeletionTime activeDeletion, CFMetaData.DroppedColumn dropped, LivenessInfo rowLiveness)
     {
         ColumnFilter.Tester cellTester = filter.newTester(column);
         if (cellTester == null && activeDeletion.isLive() && dropped == null)
             return this;
 
         DeletionTime newDeletion = activeDeletion.supersedes(complexDeletion) ? DeletionTime.LIVE : complexDeletion;
-        return transformAndFilter(newDeletion,
-                                  (cell) ->
-                                           (cellTester == null || cellTester.includes(cell.path()))
-                                        && !activeDeletion.deletes(cell)
-                                        && (dropped == null || cell.timestamp() > dropped.droppedTime)
-                                           ? cell : null);
+        return transformAndFilter(newDeletion, (cell) ->
+        {
+            boolean isForDropped = dropped != null && cell.timestamp() <= dropped.droppedTime;
+            boolean isShadowed = activeDeletion.deletes(cell);
+            boolean isSkippable = cellTester != null && (!cellTester.fetches(cell.path())
+                                                         || (!cellTester.fetchedCellIsQueried(cell.path()) && cell.timestamp() < rowLiveness.timestamp()));
+            return isForDropped || isShadowed || isSkippable ? null : cell;
+        });
     }
 
     public ComplexColumnData purge(DeletionPurger purger, int nowInSec)
@@ -165,6 +167,11 @@
         return transformAndFilter(newDeletion, (cell) -> cell.purge(purger, nowInSec));
     }
 
+    public ComplexColumnData withOnlyQueriedData(ColumnFilter filter)
+    {
+        return transformAndFilter(complexDeletion, (cell) -> filter.fetchedCellIsQueried(column, cell.path()) ? null : cell);
+    }
+
     private ComplexColumnData transformAndFilter(DeletionTime newDeletion, Function<? super Cell, ? extends Cell> function)
     {
         Object[] transformed = BTree.transformAndFilter(cells, function);
@@ -240,8 +247,7 @@
         {
             this.column = column;
             this.complexDeletion = DeletionTime.LIVE; // default if writeComplexDeletion is not called
-            if (builder == null) builder = BTree.builder(column.cellComparator());
-            else builder.reuse(column.cellComparator());
+            this.builder = BTree.builder(column.cellComparator());
         }
 
         public void addComplexDeletion(DeletionTime complexDeletion)
diff --git a/src/java/org/apache/cassandra/db/rows/LazilyInitializedUnfilteredRowIterator.java b/src/java/org/apache/cassandra/db/rows/LazilyInitializedUnfilteredRowIterator.java
index 8ba4394..fc5bdbe 100644
--- a/src/java/org/apache/cassandra/db/rows/LazilyInitializedUnfilteredRowIterator.java
+++ b/src/java/org/apache/cassandra/db/rows/LazilyInitializedUnfilteredRowIterator.java
@@ -27,7 +27,7 @@
  *
  * This is used during partition range queries when we know the partition key but want
  * to defer the initialization of the rest of the UnfilteredRowIterator until we need those informations.
- * See {@link BigTableScanner#KeyScanningIterator} for instance.
+ * See {@link org.apache.cassandra.io.sstable.format.big.BigTableScanner#KeyScanningIterator} for instance.
  */
 public abstract class LazilyInitializedUnfilteredRowIterator extends AbstractIterator<Unfiltered> implements UnfilteredRowIterator
 {
@@ -42,12 +42,17 @@
 
     protected abstract UnfilteredRowIterator initializeIterator();
 
-    private void maybeInit()
+    protected void maybeInit()
     {
         if (iterator == null)
             iterator = initializeIterator();
     }
 
+    public boolean initialized()
+    {
+        return iterator != null;
+    }
+
     public CFMetaData metadata()
     {
         maybeInit();
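
The new initialized() accessor lets callers ask whether the expensive initializeIterator() call has already happened without triggering it. A generic standalone sketch of the same lazy-initialization shape (not the Cassandra class, just the pattern):

    import java.util.function.Supplier;

    public class LazySketch<T>
    {
        private final Supplier<T> init;
        private T value;

        public LazySketch(Supplier<T> init)
        {
            this.init = init;
        }

        protected void maybeInit()
        {
            if (value == null)
                value = init.get(); // analogue of initializeIterator()
        }

        public boolean initialized()
        {
            return value != null;
        }

        public T get()
        {
            maybeInit();
            return value;
        }

        public static void main(String[] args)
        {
            LazySketch<String> lazy = new LazySketch<>(() -> "expensive result");
            System.out.println(lazy.initialized()); // false: nothing has forced initialization yet
            System.out.println(lazy.get());         // first access initializes
            System.out.println(lazy.initialized()); // true
        }
    }
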
diff --git a/src/java/org/apache/cassandra/db/rows/NativeCell.java b/src/java/org/apache/cassandra/db/rows/NativeCell.java
new file mode 100644
index 0000000..5930332
--- /dev/null
+++ b/src/java/org/apache/cassandra/db/rows/NativeCell.java
@@ -0,0 +1,156 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.db.rows;
+
+import java.nio.ByteBuffer;
+import java.nio.ByteOrder;
+
+import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.utils.ObjectSizes;
+import org.apache.cassandra.utils.concurrent.OpOrder;
+import org.apache.cassandra.utils.memory.MemoryUtil;
+import org.apache.cassandra.utils.memory.NativeAllocator;
+
+public class NativeCell extends AbstractCell
+{
+    private static final long EMPTY_SIZE = ObjectSizes.measure(new NativeCell());
+
+    private static final long HAS_CELLPATH = 0;
+    private static final long TIMESTAMP = 1;
+    private static final long TTL = 9;
+    private static final long DELETION = 13;
+    private static final long LENGTH = 17;
+    private static final long VALUE = 21;
+
+    private final long peer;
+
+    private NativeCell()
+    {
+        super(null);
+        this.peer = 0;
+    }
+
+    public NativeCell(NativeAllocator allocator,
+                      OpOrder.Group writeOp,
+                      Cell cell)
+    {
+        this(allocator,
+             writeOp,
+             cell.column(),
+             cell.timestamp(),
+             cell.ttl(),
+             cell.localDeletionTime(),
+             cell.value(),
+             cell.path());
+    }
+
+    public NativeCell(NativeAllocator allocator,
+                      OpOrder.Group writeOp,
+                      ColumnDefinition column,
+                      long timestamp,
+                      int ttl,
+                      int localDeletionTime,
+                      ByteBuffer value,
+                      CellPath path)
+    {
+        super(column);
+        long size = simpleSize(value.remaining());
+
+        assert value.order() == ByteOrder.BIG_ENDIAN;
+        assert column.isComplex() == (path != null);
+        if (path != null)
+        {
+            assert path.size() == 1;
+            size += 4 + path.get(0).remaining();
+        }
+
+        if (size > Integer.MAX_VALUE)
+            throw new IllegalStateException();
+
+        // cellpath? : timestamp : ttl : localDeletionTime : length : <data> : [cell path length] : [<cell path data>]
+        peer = allocator.allocate((int) size, writeOp);
+        MemoryUtil.setByte(peer + HAS_CELLPATH, (byte)(path == null ? 0 : 1));
+        MemoryUtil.setLong(peer + TIMESTAMP, timestamp);
+        MemoryUtil.setInt(peer + TTL, ttl);
+        MemoryUtil.setInt(peer + DELETION, localDeletionTime);
+        MemoryUtil.setInt(peer + LENGTH, value.remaining());
+        MemoryUtil.setBytes(peer + VALUE, value);
+
+        if (path != null)
+        {
+            ByteBuffer pathbuffer = path.get(0);
+            assert pathbuffer.order() == ByteOrder.BIG_ENDIAN;
+
+            long offset = peer + VALUE + value.remaining();
+            MemoryUtil.setInt(offset, pathbuffer.remaining());
+            MemoryUtil.setBytes(offset + 4, pathbuffer);
+        }
+    }
+
+    private static long simpleSize(int length)
+    {
+        return VALUE + length;
+    }
+
+    public long timestamp()
+    {
+        return MemoryUtil.getLong(peer + TIMESTAMP);
+    }
+
+    public int ttl()
+    {
+        return MemoryUtil.getInt(peer + TTL);
+    }
+
+    public int localDeletionTime()
+    {
+        return MemoryUtil.getInt(peer + DELETION);
+    }
+
+    public ByteBuffer value()
+    {
+        int length = MemoryUtil.getInt(peer + LENGTH);
+        return MemoryUtil.getByteBuffer(peer + VALUE, length, ByteOrder.BIG_ENDIAN);
+    }
+
+    public CellPath path()
+    {
+        if (MemoryUtil.getByte(peer + HAS_CELLPATH) == 0)
+            return null;
+
+        long offset = peer + VALUE + MemoryUtil.getInt(peer + LENGTH);
+        int size = MemoryUtil.getInt(offset);
+        return CellPath.create(MemoryUtil.getByteBuffer(offset + 4, size, ByteOrder.BIG_ENDIAN));
+    }
+
+    public Cell withUpdatedValue(ByteBuffer newValue)
+    {
+        throw new UnsupportedOperationException();
+    }
+
+    public Cell withUpdatedColumn(ColumnDefinition column)
+    {
+        return new BufferCell(column, timestamp(), ttl(), localDeletionTime(), value(), path());
+    }
+
+    public long unsharedHeapSizeExcludingData()
+    {
+        return EMPTY_SIZE;
+    }
+
+}
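
NativeCell lays its fields out in a single off-heap allocation at the fixed offsets defined above, with the value bytes starting at offset 21 and an optional length-prefixed cell path after them. A standalone sketch of the size computation for that layout, with hypothetical value and path lengths:

    public class NativeCellLayoutSketch
    {
        // byte 0: has-cellpath flag; bytes 1-8: timestamp; 9-12: ttl; 13-16: localDeletionTime;
        // 17-20: value length; 21..: value bytes, then optional [path length][path bytes].
        private static final long VALUE_OFFSET = 21;

        static long sizeOf(int valueLength, Integer pathLength)
        {
            long size = VALUE_OFFSET + valueLength;
            if (pathLength != null)
                size += 4 + pathLength; // 4-byte path length prefix, then the path bytes
            return size;
        }

        public static void main(String[] args)
        {
            System.out.println(sizeOf(8, null)); // simple column, 8-byte value -> 29 bytes
            System.out.println(sizeOf(8, 16));   // complex column with a 16-byte cell path -> 49 bytes
        }
    }
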
diff --git a/src/java/org/apache/cassandra/db/rows/RangeTombstoneBoundMarker.java b/src/java/org/apache/cassandra/db/rows/RangeTombstoneBoundMarker.java
index b35033d..45e594b 100644
--- a/src/java/org/apache/cassandra/db/rows/RangeTombstoneBoundMarker.java
+++ b/src/java/org/apache/cassandra/db/rows/RangeTombstoneBoundMarker.java
@@ -28,43 +28,37 @@
 /**
  * A range tombstone marker that indicates the bound of a range tombstone (start or end).
  */
-public class RangeTombstoneBoundMarker extends AbstractRangeTombstoneMarker
+public class RangeTombstoneBoundMarker extends AbstractRangeTombstoneMarker<ClusteringBound>
 {
     private final DeletionTime deletion;
 
-    public RangeTombstoneBoundMarker(RangeTombstone.Bound bound, DeletionTime deletion)
+    public RangeTombstoneBoundMarker(ClusteringBound bound, DeletionTime deletion)
     {
         super(bound);
-        assert !bound.isBoundary();
         this.deletion = deletion;
     }
 
-    public RangeTombstoneBoundMarker(Slice.Bound bound, DeletionTime deletion)
-    {
-        this(new RangeTombstone.Bound(bound.kind(), bound.getRawValues()), deletion);
-    }
-
     public static RangeTombstoneBoundMarker inclusiveOpen(boolean reversed, ByteBuffer[] boundValues, DeletionTime deletion)
     {
-        RangeTombstone.Bound bound = RangeTombstone.Bound.inclusiveOpen(reversed, boundValues);
+        ClusteringBound bound = ClusteringBound.inclusiveOpen(reversed, boundValues);
         return new RangeTombstoneBoundMarker(bound, deletion);
     }
 
     public static RangeTombstoneBoundMarker exclusiveOpen(boolean reversed, ByteBuffer[] boundValues, DeletionTime deletion)
     {
-        RangeTombstone.Bound bound = RangeTombstone.Bound.exclusiveOpen(reversed, boundValues);
+        ClusteringBound bound = ClusteringBound.exclusiveOpen(reversed, boundValues);
         return new RangeTombstoneBoundMarker(bound, deletion);
     }
 
     public static RangeTombstoneBoundMarker inclusiveClose(boolean reversed, ByteBuffer[] boundValues, DeletionTime deletion)
     {
-        RangeTombstone.Bound bound = RangeTombstone.Bound.inclusiveClose(reversed, boundValues);
+        ClusteringBound bound = ClusteringBound.inclusiveClose(reversed, boundValues);
         return new RangeTombstoneBoundMarker(bound, deletion);
     }
 
     public static RangeTombstoneBoundMarker exclusiveClose(boolean reversed, ByteBuffer[] boundValues, DeletionTime deletion)
     {
-        RangeTombstone.Bound bound = RangeTombstone.Bound.exclusiveClose(reversed, boundValues);
+        ClusteringBound bound = ClusteringBound.exclusiveClose(reversed, boundValues);
         return new RangeTombstoneBoundMarker(bound, deletion);
     }
 
@@ -109,12 +103,12 @@
         return bound.isInclusive();
     }
 
-    public RangeTombstone.Bound openBound(boolean reversed)
+    public ClusteringBound openBound(boolean reversed)
     {
         return isOpen(reversed) ? clustering() : null;
     }
 
-    public RangeTombstone.Bound closeBound(boolean reversed)
+    public ClusteringBound closeBound(boolean reversed)
     {
         return isClose(reversed) ? clustering() : null;
     }
diff --git a/src/java/org/apache/cassandra/db/rows/RangeTombstoneBoundaryMarker.java b/src/java/org/apache/cassandra/db/rows/RangeTombstoneBoundaryMarker.java
index 06fbf87..fd41bea 100644
--- a/src/java/org/apache/cassandra/db/rows/RangeTombstoneBoundaryMarker.java
+++ b/src/java/org/apache/cassandra/db/rows/RangeTombstoneBoundaryMarker.java
@@ -28,12 +28,12 @@
 /**
  * A range tombstone marker that represents a boundary between 2 range tombstones (i.e. it closes one range and open another).
  */
-public class RangeTombstoneBoundaryMarker extends AbstractRangeTombstoneMarker
+public class RangeTombstoneBoundaryMarker extends AbstractRangeTombstoneMarker<ClusteringBoundary>
 {
     private final DeletionTime endDeletion;
     private final DeletionTime startDeletion;
 
-    public RangeTombstoneBoundaryMarker(RangeTombstone.Bound bound, DeletionTime endDeletion, DeletionTime startDeletion)
+    public RangeTombstoneBoundaryMarker(ClusteringBoundary bound, DeletionTime endDeletion, DeletionTime startDeletion)
     {
         super(bound);
         assert bound.isBoundary();
@@ -43,7 +43,7 @@
 
     public static RangeTombstoneBoundaryMarker exclusiveCloseInclusiveOpen(boolean reversed, ByteBuffer[] boundValues, DeletionTime closeDeletion, DeletionTime openDeletion)
     {
-        RangeTombstone.Bound bound = RangeTombstone.Bound.exclusiveCloseInclusiveOpen(reversed, boundValues);
+        ClusteringBoundary bound = ClusteringBoundary.exclusiveCloseInclusiveOpen(reversed, boundValues);
         DeletionTime endDeletion = reversed ? openDeletion : closeDeletion;
         DeletionTime startDeletion = reversed ? closeDeletion : openDeletion;
         return new RangeTombstoneBoundaryMarker(bound, endDeletion, startDeletion);
@@ -51,7 +51,7 @@
 
     public static RangeTombstoneBoundaryMarker inclusiveCloseExclusiveOpen(boolean reversed, ByteBuffer[] boundValues, DeletionTime closeDeletion, DeletionTime openDeletion)
     {
-        RangeTombstone.Bound bound = RangeTombstone.Bound.inclusiveCloseExclusiveOpen(reversed, boundValues);
+        ClusteringBoundary bound = ClusteringBoundary.inclusiveCloseExclusiveOpen(reversed, boundValues);
         DeletionTime endDeletion = reversed ? openDeletion : closeDeletion;
         DeletionTime startDeletion = reversed ? closeDeletion : openDeletion;
         return new RangeTombstoneBoundaryMarker(bound, endDeletion, startDeletion);
@@ -88,14 +88,14 @@
         return (bound.kind() == ClusteringPrefix.Kind.EXCL_END_INCL_START_BOUNDARY) ^ reversed;
     }
 
-    public RangeTombstone.Bound openBound(boolean reversed)
+    public ClusteringBound openBound(boolean reversed)
     {
-        return bound.withNewKind(bound.kind().openBoundOfBoundary(reversed));
+        return bound.openBound(reversed);
     }
 
-    public RangeTombstone.Bound closeBound(boolean reversed)
+    public ClusteringBound closeBound(boolean reversed)
     {
-        return bound.withNewKind(bound.kind().closeBoundOfBoundary(reversed));
+        return bound.closeBound(reversed);
     }
 
     public boolean closeIsInclusive(boolean reversed)
@@ -120,9 +120,9 @@
         return new RangeTombstoneBoundaryMarker(clustering().copy(allocator), endDeletion, startDeletion);
     }
 
-    public static RangeTombstoneBoundaryMarker makeBoundary(boolean reversed, Slice.Bound close, Slice.Bound open, DeletionTime closeDeletion, DeletionTime openDeletion)
+    public static RangeTombstoneBoundaryMarker makeBoundary(boolean reversed, ClusteringBound close, ClusteringBound open, DeletionTime closeDeletion, DeletionTime openDeletion)
     {
-        assert RangeTombstone.Bound.Kind.compare(close.kind(), open.kind()) == 0 : "Both bound don't form a boundary";
+        assert ClusteringPrefix.Kind.compare(close.kind(), open.kind()) == 0 : "Both bounds don't form a boundary";
         boolean isExclusiveClose = close.isExclusive() || (close.isInclusive() && open.isInclusive() && openDeletion.supersedes(closeDeletion));
         return isExclusiveClose
              ? exclusiveCloseInclusiveOpen(reversed, close.getRawValues(), closeDeletion, openDeletion)
diff --git a/src/java/org/apache/cassandra/db/rows/RangeTombstoneMarker.java b/src/java/org/apache/cassandra/db/rows/RangeTombstoneMarker.java
index 5771a86..bc98899 100644
--- a/src/java/org/apache/cassandra/db/rows/RangeTombstoneMarker.java
+++ b/src/java/org/apache/cassandra/db/rows/RangeTombstoneMarker.java
@@ -20,19 +20,19 @@
 import java.nio.ByteBuffer;
 import java.util.*;
 
-import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.db.*;
 import org.apache.cassandra.utils.memory.AbstractAllocator;
 
 /**
  * A marker for a range tombstone bound.
  * <p>
- * There is 2 types of markers: bounds (see {@link RangeTombstoneBound}) and boundaries (see {@link RangeTombstoneBoundary}).
+ * There are 2 types of markers: bounds (see {@link RangeTombstoneBoundMarker}) and boundaries (see {@link RangeTombstoneBoundaryMarker}).
+ * </p>
  */
 public interface RangeTombstoneMarker extends Unfiltered
 {
     @Override
-    public RangeTombstone.Bound clustering();
+    public ClusteringBoundOrBoundary clustering();
 
     public boolean isBoundary();
 
@@ -44,8 +44,8 @@
     public boolean openIsInclusive(boolean reversed);
     public boolean closeIsInclusive(boolean reversed);
 
-    public RangeTombstone.Bound openBound(boolean reversed);
-    public RangeTombstone.Bound closeBound(boolean reversed);
+    public ClusteringBound openBound(boolean reversed);
+    public ClusteringBound closeBound(boolean reversed);
 
     public RangeTombstoneMarker copy(AbstractAllocator allocator);
 
@@ -67,7 +67,7 @@
         private final DeletionTime partitionDeletion;
         private final boolean reversed;
 
-        private RangeTombstone.Bound bound;
+        private ClusteringBoundOrBoundary bound;
         private final RangeTombstoneMarker[] markers;
 
         // For each iterator, what is the currently open marker deletion time (or null if there is no open marker on that iterator)
diff --git a/src/java/org/apache/cassandra/db/rows/Row.java b/src/java/org/apache/cassandra/db/rows/Row.java
index c7c3216..53b0eb3 100644
--- a/src/java/org/apache/cassandra/db/rows/Row.java
+++ b/src/java/org/apache/cassandra/db/rows/Row.java
@@ -206,6 +206,16 @@
     public Row purge(DeletionPurger purger, int nowInSec);
 
     /**
+     * Returns a copy of this row which only includes the data queried by {@code filter}, excluding anything _fetched_ for
+     * internal reasons but not queried by the user (see {@link ColumnFilter} for details).
+     *
+     * @param filter the {@code ColumnFilter} to use when deciding what is user queried. This should be the filter
+     * that was used when querying the row on which this method is called.
+     * @return the row but with all data that wasn't queried by the user skipped.
+     */
+    public Row withOnlyQueriedData(ColumnFilter filter);
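(Illustrative sketch, not part of this patch: the new contract in use, assuming a Row 'row' and the ColumnFilter 'filter' that was used for the original query.)

    // Trim a row down to the user-selected columns; skip the copy when every
    // fetched column was also queried.
    Row trimmed = filter.allFetchedColumnsAreQueried()
                ? row
                : row.withOnlyQueriedData(filter);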
+
+    /**
      * Returns a copy of this row where all counter cells have they "local" shard marked for clearing.
      */
     public Row markCounterLocalToBeCleared();
@@ -424,11 +434,11 @@
         public Clustering clustering();
 
         /**
-         * Adds the liveness information for the primary key columns of this row.
+         * Adds the liveness information for the partition key columns of this row.
          *
          * This call is optional (skipping it is equivalent to calling {@code addPartitionKeyLivenessInfo(LivenessInfo.NONE)}).
          *
-         * @param info the liveness information for the primary key columns of the built row.
+         * @param info the liveness information for the partition key columns of the built row.
          */
         public void addPrimaryKeyLivenessInfo(LivenessInfo info);
 
diff --git a/src/java/org/apache/cassandra/db/rows/RowAndDeletionMergeIterator.java b/src/java/org/apache/cassandra/db/rows/RowAndDeletionMergeIterator.java
index 389fe45..5d7d8fe 100644
--- a/src/java/org/apache/cassandra/db/rows/RowAndDeletionMergeIterator.java
+++ b/src/java/org/apache/cassandra/db/rows/RowAndDeletionMergeIterator.java
@@ -153,12 +153,12 @@
         return range;
     }
 
-    private Slice.Bound openBound(RangeTombstone range)
+    private ClusteringBound openBound(RangeTombstone range)
     {
         return range.deletedSlice().open(isReverseOrder());
     }
 
-    private Slice.Bound closeBound(RangeTombstone range)
+    private ClusteringBound closeBound(RangeTombstone range)
     {
         return range.deletedSlice().close(isReverseOrder());
     }
diff --git a/src/java/org/apache/cassandra/db/rows/RowDiffListener.java b/src/java/org/apache/cassandra/db/rows/RowDiffListener.java
index ec848a0..0c7e32b 100644
--- a/src/java/org/apache/cassandra/db/rows/RowDiffListener.java
+++ b/src/java/org/apache/cassandra/db/rows/RowDiffListener.java
@@ -23,7 +23,7 @@
 /**
  * Interface that allows to act on the result of merging multiple rows.
  *
- * More precisely, given N rows and the result of merging them, one can call {@link Rows#diff()}
+ * More precisely, given N rows and the result of merging them, one can call {@link Rows#diff(RowDiffListener, Row, Row...)}
  * with a {@code RowDiffListener} and that listener will be informed for each input row of the diff between
  * that input and merge row.
  */
diff --git a/src/java/org/apache/cassandra/db/rows/RowIterator.java b/src/java/org/apache/cassandra/db/rows/RowIterator.java
index f0b4499..0cc4a3c 100644
--- a/src/java/org/apache/cassandra/db/rows/RowIterator.java
+++ b/src/java/org/apache/cassandra/db/rows/RowIterator.java
@@ -17,11 +17,6 @@
  */
 package org.apache.cassandra.db.rows;
 
-import java.util.Iterator;
-
-import org.apache.cassandra.config.CFMetaData;
-import org.apache.cassandra.db.*;
-
 /**
  * An iterator over rows belonging to a partition.
  *
diff --git a/src/java/org/apache/cassandra/db/rows/RowIterators.java b/src/java/org/apache/cassandra/db/rows/RowIterators.java
index ae051c0..bce6a7d 100644
--- a/src/java/org/apache/cassandra/db/rows/RowIterators.java
+++ b/src/java/org/apache/cassandra/db/rows/RowIterators.java
@@ -23,6 +23,7 @@
 import org.slf4j.LoggerFactory;
 
 import org.apache.cassandra.config.CFMetaData;
+import org.apache.cassandra.db.filter.ColumnFilter;
 import org.apache.cassandra.db.transform.Transformation;
 import org.apache.cassandra.utils.FBUtilities;
 
@@ -52,6 +53,22 @@
     }
 
     /**
+     * Filter the provided iterator to only include cells that are selected by the user.
+     *
+     * @param iterator the iterator to filter.
+     * @param filter the {@code ColumnFilter} to use when deciding which cells are queried by the user. This should be the filter
+     * that was used when querying {@code iterator}.
+     * @return the filtered iterator.
+     */
+    public static RowIterator withOnlyQueriedData(RowIterator iterator, ColumnFilter filter)
+    {
+        if (filter.allFetchedColumnsAreQueried())
+            return iterator;
+
+        return Transformation.apply(iterator, new WithOnlyQueriedData(filter));
+    }
+
+    /**
      * Wraps the provided iterator so it logs the returned rows for debugging purposes.
      * <p>
      * Note that this is only meant for debugging as this can log a very large amount of
diff --git a/src/java/org/apache/cassandra/db/rows/SerializationHelper.java b/src/java/org/apache/cassandra/db/rows/SerializationHelper.java
index 6b4bc2e..e40a1e1 100644
--- a/src/java/org/apache/cassandra/db/rows/SerializationHelper.java
+++ b/src/java/org/apache/cassandra/db/rows/SerializationHelper.java
@@ -67,34 +67,49 @@
         this(metadata, version, flag, null);
     }
 
-    public Columns fetchedStaticColumns(SerializationHeader header)
-    {
-        return columnsToFetch == null ? header.columns().statics : columnsToFetch.fetchedColumns().statics;
-    }
-
-    public Columns fetchedRegularColumns(SerializationHeader header)
-    {
-        return columnsToFetch == null ? header.columns().regulars : columnsToFetch.fetchedColumns().regulars;
-    }
-
     public boolean includes(ColumnDefinition column)
     {
-        return columnsToFetch == null || columnsToFetch.includes(column);
+        return columnsToFetch == null || columnsToFetch.fetches(column);
+    }
+
+    public boolean includes(Cell cell, LivenessInfo rowLiveness)
+    {
+        if (columnsToFetch == null)
+            return true;
+
+        // During queries, some columns are included even though they are not queried by the user because
+        // we always need to distinguish between having a row (with potentially only null values) and not
+        // having a row at all (see #CASSANDRA-7085 for background). In the case where the column is not
+        // actually requested by the user however (canSkipValue), we can skip the full cell if the cell
+        // timestamp is lower than the row one, because in that case, the row timestamp is enough proof
+        // of the liveness of the row. Otherwise, we'll only be able to skip the values of those cells.
+        ColumnDefinition column = cell.column();
+        if (column.isComplex())
+        {
+            if (!includes(cell.path()))
+                return false;
+
+            return !canSkipValue(cell.path()) || cell.timestamp() >= rowLiveness.timestamp();
+        }
+        else
+        {
+            return columnsToFetch.fetchedColumnIsQueried(column) || cell.timestamp() >= rowLiveness.timestamp();
+        }
     }
 
     public boolean includes(CellPath path)
     {
-        return path == null || tester == null || tester.includes(path);
+        return path == null || tester == null || tester.fetches(path);
     }
 
     public boolean canSkipValue(ColumnDefinition column)
     {
-        return columnsToFetch != null && columnsToFetch.canSkipValue(column);
+        return columnsToFetch != null && !columnsToFetch.fetchedColumnIsQueried(column);
     }
 
     public boolean canSkipValue(CellPath path)
     {
-        return path != null && tester != null && tester.canSkipValue(path);
+        return path != null && tester != null && !tester.fetchedCellIsQueried(path);
     }
 
     public void startOfComplexColumn(ColumnDefinition column)
diff --git a/src/java/org/apache/cassandra/db/rows/SliceableUnfilteredRowIterator.java b/src/java/org/apache/cassandra/db/rows/SliceableUnfilteredRowIterator.java
deleted file mode 100644
index 2250ee9..0000000
--- a/src/java/org/apache/cassandra/db/rows/SliceableUnfilteredRowIterator.java
+++ /dev/null
@@ -1,39 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.cassandra.db.rows;
-
-import java.util.Iterator;
-
-import org.apache.cassandra.db.Slice;
-
-public interface SliceableUnfilteredRowIterator extends UnfilteredRowIterator
-{
-    /**
-     * Move forward (resp. backward if isReverseOrder() is true for the iterator) in
-     * the iterator and return an iterator over the Unfiltered selected by the provided
-     * {@code slice}.
-     * <p>
-     * Please note that successive calls to {@code slice} are allowed provided the
-     * slice are non overlapping and are passed in clustering (resp. reverse clustering) order.
-     * However, {@code slice} is allowed to leave the iterator in an unknown state and there
-     * is no guarantee over what a call to {@code hasNext} or {@code next} will yield after
-     * a call to {@code slice}. In other words, for a given iterator, you should either use
-     * {@code slice} or {@code hasNext/next} but not both.
-     */
-    public Iterator<Unfiltered> slice(Slice slice);
-}
diff --git a/src/java/org/apache/cassandra/db/rows/Unfiltered.java b/src/java/org/apache/cassandra/db/rows/Unfiltered.java
index 9d96137..37ad447 100644
--- a/src/java/org/apache/cassandra/db/rows/Unfiltered.java
+++ b/src/java/org/apache/cassandra/db/rows/Unfiltered.java
@@ -49,7 +49,7 @@
      * Validate the data of this atom.
      *
      * @param metadata the metadata for the table this atom is part of.
-     * @throws MarshalException if some of the data in this atom is
+     * @throws org.apache.cassandra.serializers.MarshalException if some of the data in this atom is
      * invalid (some value is invalid for its column type, or some field
      * is nonsensical).
      */
diff --git a/src/java/org/apache/cassandra/db/rows/UnfilteredRowIteratorSerializer.java b/src/java/org/apache/cassandra/db/rows/UnfilteredRowIteratorSerializer.java
index 932ca4c..542f0a2 100644
--- a/src/java/org/apache/cassandra/db/rows/UnfilteredRowIteratorSerializer.java
+++ b/src/java/org/apache/cassandra/db/rows/UnfilteredRowIteratorSerializer.java
@@ -36,6 +36,7 @@
  * The serialization is composed of a header, follows by the rows and range tombstones of the iterator serialized
  * until we read the end of the partition (see UnfilteredSerializer for details). The header itself
  * is:
+ * {@code
  *     <cfid><key><flags><s_header>[<partition_deletion>][<static_row>][<row_estimate>]
  * where:
  *     <cfid> is the table cfid.
@@ -54,6 +55,7 @@
  *     <static_row> is the static row for this partition as serialized by UnfilteredSerializer.
  *     <row_estimate> is the (potentially estimated) number of rows serialized. This is only used for
  *         the purpose of sizing on the receiving end and should not be relied upon too strongly.
+ * }
  *
  * Please note that the format described above is the on-wire format. On-disk, the format is basically the
  * same, but the header is written once per sstable, not once per-partition. Further, the actual row and
diff --git a/src/java/org/apache/cassandra/db/rows/UnfilteredRowIteratorWithLowerBound.java b/src/java/org/apache/cassandra/db/rows/UnfilteredRowIteratorWithLowerBound.java
new file mode 100644
index 0000000..14730ac
--- /dev/null
+++ b/src/java/org/apache/cassandra/db/rows/UnfilteredRowIteratorWithLowerBound.java
@@ -0,0 +1,240 @@
+/*
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+package org.apache.cassandra.db.rows;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Comparator;
+import java.util.List;
+
+import org.apache.cassandra.config.CFMetaData;
+import org.apache.cassandra.db.*;
+import org.apache.cassandra.db.filter.ClusteringIndexFilter;
+import org.apache.cassandra.db.filter.ColumnFilter;
+import org.apache.cassandra.io.sstable.IndexInfo;
+import org.apache.cassandra.io.sstable.format.SSTableReader;
+import org.apache.cassandra.io.sstable.metadata.StatsMetadata;
+import org.apache.cassandra.thrift.ThriftResultsMerger;
+import org.apache.cassandra.utils.IteratorWithLowerBound;
+
+/**
+ * An unfiltered row iterator with a lower bound retrieved from either the global
+ * sstable statistics or the row index lower bounds (if available in the cache).
+ * Before initializing the sstable unfiltered row iterator, we return an empty row
+ * with the clustering set to the lower bound. The empty row will be filtered out and
+ * the result is that if we don't need to access this sstable, e.g. due to the LIMIT condition,
+ * then we will not. See CASSANDRA-8180 for examples of why this is useful.
+ */
+public class UnfilteredRowIteratorWithLowerBound extends LazilyInitializedUnfilteredRowIterator implements IteratorWithLowerBound<Unfiltered>
+{
+    private final SSTableReader sstable;
+    private final ClusteringIndexFilter filter;
+    private final ColumnFilter selectedColumns;
+    private final boolean isForThrift;
+    private final int nowInSec;
+    private final boolean applyThriftTransformation;
+    private ClusteringBound lowerBound;
+    private boolean firstItemRetrieved;
+
+    public UnfilteredRowIteratorWithLowerBound(DecoratedKey partitionKey,
+                                               SSTableReader sstable,
+                                               ClusteringIndexFilter filter,
+                                               ColumnFilter selectedColumns,
+                                               boolean isForThrift,
+                                               int nowInSec,
+                                               boolean applyThriftTransformation)
+    {
+        super(partitionKey);
+        this.sstable = sstable;
+        this.filter = filter;
+        this.selectedColumns = selectedColumns;
+        this.isForThrift = isForThrift;
+        this.nowInSec = nowInSec;
+        this.applyThriftTransformation = applyThriftTransformation;
+        this.lowerBound = null;
+        this.firstItemRetrieved = false;
+    }
+
+    public Unfiltered lowerBound()
+    {
+        if (lowerBound != null)
+            return makeBound(lowerBound);
+
+        // The partition index lower bound is more accurate than the sstable metadata lower bound but it is only
+        // present if the iterator has already been initialized, which we only do when there are tombstones since in
+        // this case we cannot use the sstable metadata clustering values
+        ClusteringBound ret = getPartitionIndexLowerBound();
+        return ret != null ? makeBound(ret) : makeBound(getMetadataLowerBound());
+    }
+
+    private Unfiltered makeBound(ClusteringBound bound)
+    {
+        if (bound == null)
+            return null;
+
+        if (lowerBound != bound)
+            lowerBound = bound;
+
+        return new RangeTombstoneBoundMarker(lowerBound, DeletionTime.LIVE);
+    }
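(Illustrative sketch, not part of this patch: surfacing a clustering lower bound as a "live" marker, here for a single int clustering column; ByteBufferUtil.bytes and the value 42 are assumptions for the example.)

    ByteBuffer[] values = { ByteBufferUtil.bytes(42) };
    ClusteringBound lower = ClusteringBound.inclusiveOpen(false /* forward order */, values);
    // DeletionTime.LIVE means the marker deletes nothing; it only conveys the bound.
    Unfiltered placeholder = new RangeTombstoneBoundMarker(lower, DeletionTime.LIVE);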
+
+    @Override
+    protected UnfilteredRowIterator initializeIterator()
+    {
+        sstable.incrementReadCount();
+
+        @SuppressWarnings("resource") // 'iter' is added to iterators which is closed on exception, or through the closing of the final merged iterator
+        UnfilteredRowIterator iter = sstable.iterator(partitionKey(), filter.getSlices(metadata()), selectedColumns, filter.isReversed(), isForThrift);
+        return isForThrift && applyThriftTransformation
+               ? ThriftResultsMerger.maybeWrap(iter, nowInSec)
+               : iter;
+    }
+
+    @Override
+    protected Unfiltered computeNext()
+    {
+        Unfiltered ret = super.computeNext();
+        if (firstItemRetrieved)
+            return ret;
+
+        // Check that the lower bound is not bigger than the first item retrieved
+        firstItemRetrieved = true;
+        if (lowerBound != null && ret != null)
+            assert comparator().compare(lowerBound, ret.clustering()) <= 0
+                : String.format("Lower bound [%s ]is bigger than first returned value [%s] for sstable %s",
+                                lowerBound.toString(sstable.metadata),
+                                ret.toString(sstable.metadata),
+                                sstable.getFilename());
+
+        return ret;
+    }
+
+    private Comparator<Clusterable> comparator()
+    {
+        return filter.isReversed() ? sstable.metadata.comparator.reversed() : sstable.metadata.comparator;
+    }
+
+    @Override
+    public CFMetaData metadata()
+    {
+        return sstable.metadata;
+    }
+
+    @Override
+    public boolean isReverseOrder()
+    {
+        return filter.isReversed();
+    }
+
+    @Override
+    public PartitionColumns columns()
+    {
+        return selectedColumns.fetchedColumns();
+    }
+
+    @Override
+    public EncodingStats stats()
+    {
+        return sstable.stats();
+    }
+
+    @Override
+    public DeletionTime partitionLevelDeletion()
+    {
+        if (!sstable.hasTombstones())
+            return DeletionTime.LIVE;
+
+        return super.partitionLevelDeletion();
+    }
+
+    @Override
+    public Row staticRow()
+    {
+        if (columns().statics.isEmpty())
+            return Rows.EMPTY_STATIC_ROW;
+
+        return super.staticRow();
+    }
+
+    /**
+     * @return the lower bound stored on the index entry for this partition, if available.
+     */
+    private ClusteringBound getPartitionIndexLowerBound()
+    {
+        // NOTE: CASSANDRA-11206 removed the lookup against the key-cache as the IndexInfo objects are no longer
+        // in memory for non-heap-backed IndexInfo objects (these are kept on disk).
+        // CASSANDRA-11369 is there to fix this afterwards.
+
+        // Creating the iterator ensures that rowIndexEntry is loaded if available (partitions bigger than
+        // DatabaseDescriptor.column_index_size_in_kb)
+        if (!canUseMetadataLowerBound())
+            maybeInit();
+
+        RowIndexEntry rowIndexEntry = sstable.getCachedPosition(partitionKey(), false);
+        if (rowIndexEntry == null || !rowIndexEntry.indexOnHeap())
+            return null;
+
+        try (RowIndexEntry.IndexInfoRetriever onHeapRetriever = rowIndexEntry.openWithIndex(null))
+        {
+            IndexInfo column = onHeapRetriever.columnsIndex(filter.isReversed() ? rowIndexEntry.columnsIndexCount() - 1 : 0);
+            ClusteringPrefix lowerBoundPrefix = filter.isReversed() ? column.lastName : column.firstName;
+            assert lowerBoundPrefix.getRawValues().length <= sstable.metadata.comparator.size() :
+            String.format("Unexpected number of clustering values %d, expected %d or fewer for %s",
+                          lowerBoundPrefix.getRawValues().length,
+                          sstable.metadata.comparator.size(),
+                          sstable.getFilename());
+            return ClusteringBound.inclusiveOpen(filter.isReversed(), lowerBoundPrefix.getRawValues());
+        }
+        catch (IOException e)
+        {
+            throw new RuntimeException("should never occur", e);
+        }
+    }
+
+    /**
+     * @return true if we can use the clustering values in the stats of the sstable:
+     * - we need the latest stats file format (or else the clustering values create clusterings with the wrong size)
+     * - we cannot create tombstone bounds from these values only and so we rule out sstables with tombstones
+     */
+    private boolean canUseMetadataLowerBound()
+    {
+        return !sstable.hasTombstones() && sstable.descriptor.version.hasNewStatsFile();
+    }
+
+    /**
+     * @return a global lower bound made from the clustering values stored in the sstable metadata; note that
+     * this currently does not correctly compare tombstone bounds, especially ranges.
+     */
+    private ClusteringBound getMetadataLowerBound()
+    {
+        if (!canUseMetadataLowerBound())
+            return null;
+
+        final StatsMetadata m = sstable.getSSTableMetadata();
+        List<ByteBuffer> vals = filter.isReversed() ? m.maxClusteringValues : m.minClusteringValues;
+        assert vals.size() <= sstable.metadata.comparator.size() :
+        String.format("Unexpected number of clustering values %d, expected %d or fewer for %s",
+                      vals.size(),
+                      sstable.metadata.comparator.size(),
+                      sstable.getFilename());
+        return ClusteringBound.inclusiveOpen(filter.isReversed(), vals.toArray(new ByteBuffer[vals.size()]));
+    }
+}
diff --git a/src/java/org/apache/cassandra/db/rows/UnfilteredRowIterators.java b/src/java/org/apache/cassandra/db/rows/UnfilteredRowIterators.java
index 3218ff2..d324190 100644
--- a/src/java/org/apache/cassandra/db/rows/UnfilteredRowIterators.java
+++ b/src/java/org/apache/cassandra/db/rows/UnfilteredRowIterators.java
@@ -25,6 +25,7 @@
 
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.db.*;
+import org.apache.cassandra.db.filter.ColumnFilter;
 import org.apache.cassandra.db.transform.FilteredRows;
 import org.apache.cassandra.db.transform.MoreRows;
 import org.apache.cassandra.db.transform.Transformation;
@@ -35,7 +36,6 @@
 import org.apache.cassandra.utils.FBUtilities;
 import org.apache.cassandra.utils.IMergeIterator;
 import org.apache.cassandra.utils.MergeIterator;
-import org.apache.cassandra.utils.memory.AbstractAllocator;
 
 /**
  * Static methods to work with atom iterators.
@@ -183,6 +183,23 @@
     }
 
     /**
+     * Filter the provided iterator to exclude cells that have been fetched but are not queried by the user
+     * (see ColumnFilter for details).
+     *
+     * @param iterator the iterator to filter.
+     * @param filter the {@code ColumnFilter} to use when deciding which columns are the ones queried by the
+     * user. This should be the filter that was used when querying {@code iterator}.
+     * @return the filtered iterator.
+     */
+    public static UnfilteredRowIterator withOnlyQueriedData(UnfilteredRowIterator iterator, ColumnFilter filter)
+    {
+        if (filter.allFetchedColumnsAreQueried())
+            return iterator;
+
+        return Transformation.apply(iterator, new WithOnlyQueriedData(filter));
+    }
+
+    /**
      * Returns an iterator that concatenate two atom iterators.
      * This method assumes that both iterator are from the same partition and that the atom from
      * {@code iter2} come after the ones of {@code iter1} (that is, that concatenating the iterator
@@ -212,32 +229,6 @@
         return MoreRows.extend(iter1, new Extend());
     }
 
-    public static UnfilteredRowIterator cloningIterator(UnfilteredRowIterator iterator, final AbstractAllocator allocator)
-    {
-        class Cloner extends Transformation
-        {
-            private final Row.Builder builder = allocator.cloningBTreeRowBuilder();
-
-            public Row applyToStatic(Row row)
-            {
-                return Rows.copy(row, builder).build();
-            }
-
-            @Override
-            public Row applyToRow(Row row)
-            {
-                return Rows.copy(row, builder).build();
-            }
-
-            @Override
-            public RangeTombstoneMarker applyToMarker(RangeTombstoneMarker marker)
-            {
-                return marker.copy(allocator);
-            }
-        }
-        return Transformation.apply(iterator, new Cloner());
-    }
-
     /**
      * Validate that the data of the provided iterator is valid, that is that the values
      * it contains are valid for the type they represent, and more generally that the
diff --git a/src/java/org/apache/cassandra/db/rows/UnfilteredSerializer.java b/src/java/org/apache/cassandra/db/rows/UnfilteredSerializer.java
index dc6f187..5ca7e03 100644
--- a/src/java/org/apache/cassandra/db/rows/UnfilteredSerializer.java
+++ b/src/java/org/apache/cassandra/db/rows/UnfilteredSerializer.java
@@ -21,52 +21,72 @@
 
 import com.google.common.collect.Collections2;
 
+import net.nicoulaj.compilecommand.annotations.Inline;
 import org.apache.cassandra.config.ColumnDefinition;
 import org.apache.cassandra.db.*;
 import org.apache.cassandra.io.util.DataInputPlus;
+import org.apache.cassandra.io.util.DataOutputBuffer;
 import org.apache.cassandra.io.util.DataOutputPlus;
 import org.apache.cassandra.utils.SearchIterator;
 
 /**
  * Serialize/deserialize a single Unfiltered (both on-wire and on-disk).
+ * <p>
  *
- * The encoded format for an unfiltered is <flags>(<row>|<marker>) where:
- *
- *   <flags> is a byte (or two) whose bits are flags used by the rest of the serialization. Each
- *       flag is defined/explained below as the "Unfiltered flags" constants. One of those flags
- *       is an extension flag, and if present, trigger the rid of another byte that contains more
- *       flags. If the extension is not set, defaults are assumed for the flags of that 2nd byte.
- *   <row> is <clustering><size>[<timestamp>][<ttl>][<deletion>]<sc1>...<sci><cc1>...<ccj> where
- *       <clustering> is the row clustering as serialized by {@code Clustering.serializer} (note
- *       that static row are an exception and don't have this).
- *       <size> is the size of the whole unfiltered on disk (it's only used for sstables and is
- *       used to efficiently skip rows).
- *       <timestamp>, <ttl> and <deletion> are the row timestamp, ttl and deletion
- *       whose presence is determined by the flags. <sci> is the simple columns of the row and <ccj> the
- *       complex ones.
- *       The columns for the row are then serialized if they differ from those in the header,
- *       and each cell then follows:
- *         * Each simple column <sci> will simply be a <cell>
- *           (which might have no value, see below),
- *         * Each <ccj> will be [<delTime>]<n><cell1>...<celln> where <delTime>
- *           is the deletion for this complex column (if flags indicates it present), <n>
- *           is the vint encoded value of n, i.e. <celln>'s 1-based index, <celli>
- *           are the <cell> for this complex column
- *   <marker> is <bound><deletion> where <bound> is the marker bound as serialized
- *       by {@code Slice.Bound.serializer} and <deletion> is the marker deletion
- *       time.
- *
- *   <cell> A cell start with a 1 byte <flag>. The 2nd and third flag bits indicate if
- *       it's a deleted or expiring cell. The 4th flag indicates if the value
- *       is empty or not. The 5th and 6th indicates if the timestamp and ttl/
- *       localDeletionTime for the cell are the same than the row one (if that
- *       is the case, those are not repeated for the cell).Follows the <value>
- *       (unless it's marked empty in the flag) and a delta-encoded long <timestamp>
- *       (unless the flag tells to use the row level one).
- *       Then if it's a deleted or expiring cell a delta-encoded int <localDelTime>
- *       and if it's expiring a delta-encoded int <ttl> (unless it's an expiring cell
- *       and the ttl and localDeletionTime are indicated by the flags to be the same
- *       than the row ones, in which case none of those appears).
+ * The encoded format for an unfiltered is {@code <flags>(<row>|<marker>)} where:
+ * <ul>
+ *   <li>
+ *     {@code <flags>} is a byte (or two) whose bits are flags used by the rest
+ *     of the serialization. Each flag is defined/explained below as the
+ *     "Unfiltered flags" constants. One of those flags is an extension flag,
+ *     and if present, indicates the presence of a 2nd byte that contains more
+ *     flags. If the extension is not set, defaults are assumed for the flags
+ *     of that 2nd byte.
+ *   </li>
+ *   <li>
+ *     {@code <row>} is
+ *        {@code <clustering><sizes>[<pkliveness>][<deletion>][<columns>]<columns_data>}
+ *     where:
+ *     <ul>
+ *       <li>{@code <clustering>} is the row clustering as serialized by
+ *           {@link org.apache.cassandra.db.Clustering.Serializer} (note that static row are an
+ *           exception and don't have this). </li>
+ *       <li>{@code <sizes>} are the sizes of the whole unfiltered on disk and
+ *           of the previous unfiltered. This is only present for sstables and
+ *           is used to efficiently skip rows (both forward and backward).</li>
+ *       <li>{@code <pkliveness>} is the row primary key liveness infos, and it
+ *           contains the timestamp, ttl and local deletion time of that info,
+ *           though some/all of those can be absent based on the flags. </li>
+ *       <li>{@code deletion} is the row deletion. It's presence is determined
+ *           by the flags and if present, it conists of both the deletion
+ *           timestamp and local deletion time.</li>
+ *       <li>{@code <columns>} are the columns present in the row  encoded by
+ *           {@link org.apache.cassandra.db.Columns.Serializer#serializeSubset}. It is absent if the row
+ *           contains all the columns of the {@code SerializationHeader} (which
+ *           is then indicated by a flag). </li>
+ *       <li>{@code <columns_data>} is the data for each of the column present
+ *           in the row. The encoding of each data depends on whether the data
+ *           is for a simple or complex column:
+ *           <ul>
+ *              <li>Simple columns are simply encoded as one {@code <cell>}</li>
+ *              <li>Complex columns are encoded as {@code [<delTime>]<n><cell1>...<celln>}
+ *                  where {@code <delTime>} is the deletion for this complex
+ *                  column (if flags indicates its presence), {@code <n>} is the
+ *                  vint encoded value of n, i.e. {@code <celln>}'s 1-based
+ *                  index and {@code <celli>} are the {@code <cell>} for this
+ *                  complex column</li>
+ *           </ul>
+ *       </li>
+ *     </ul>
+ *   </li>
+ *   <li>
+ *     {@code <marker>} is {@code <bound><deletion>} where {@code <bound>} is
+ *     the marker bound as serialized by {@link org.apache.cassandra.db.ClusteringBoundOrBoundary.Serializer}
+ *     and {@code <deletion>} is the marker deletion time.
+ *   </li>
+ * </ul>
+ * <p>
+ * The serialization of a {@code <cell>} is defined by {@link Cell.Serializer}.
  */
 public class UnfilteredSerializer
 {
@@ -161,9 +181,37 @@
 
         if (header.isForSSTable())
         {
-            out.writeUnsignedVInt(serializedRowBodySize(row, header, previousUnfilteredSize, version));
-            out.writeUnsignedVInt(previousUnfilteredSize);
+            DataOutputBuffer dob = DataOutputBuffer.RECYCLER.get();
+            try
+            {
+                serializeRowBody(row, flags, header, dob);
+
+                out.writeUnsignedVInt(dob.position() + TypeSizes.sizeofUnsignedVInt(previousUnfilteredSize));
+                // We write the size of the previous unfiltered to make reverse queries more efficient (and simpler).
+                // This is currently not used however and using it is tbd.
+                out.writeUnsignedVInt(previousUnfilteredSize);
+                out.write(dob.buffer());
+            }
+            finally
+            {
+                dob.recycle();
+            }
         }
+        else
+        {
+            serializeRowBody(row, flags, header, out);
+        }
+    }
+
+    @Inline
+    private void serializeRowBody(Row row, int flags, SerializationHeader header, DataOutputPlus out)
+    throws IOException
+    {
+        boolean isStatic = row.isStatic();
+
+        Columns headerColumns = header.columns(isStatic);
+        LivenessInfo pkLiveness = row.primaryKeyLivenessInfo();
+        Row.Deletion deletion = row.deletion();
 
         if ((flags & HAS_TIMESTAMP) != 0)
             header.writeTimestamp(pkLiveness.timestamp(), out);
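(The sstable branch above buffers the row body in a recycled DataOutputBuffer so its size, plus the size of the previous unfiltered, can be written as an unsigned vint before the body itself. Below is a rough JDK-only analogue of that buffer-then-prefix pattern; the RowBodyWriter callback is a hypothetical stand-in for serializeRowBody and not part of this patch.)

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;

    final class SizePrefixedWrites
    {
        interface RowBodyWriter { void write(DataOutputStream out) throws IOException; }

        static void writeSizePrefixedBody(DataOutputStream out, RowBodyWriter body) throws IOException
        {
            ByteArrayOutputStream scratch = new ByteArrayOutputStream();
            body.write(new DataOutputStream(scratch)); // serialize the body once into a scratch buffer
            byte[] bytes = scratch.toByteArray();
            out.writeInt(bytes.length);                // the patch writes an unsigned vint here instead
            out.write(bytes);                          // then the already-serialized body
        }
    }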
@@ -175,7 +223,7 @@
         if ((flags & HAS_DELETION) != 0)
             header.writeDeletionTime(deletion.time(), out);
 
-        if (!hasAllColumns)
+        if ((flags & HAS_ALL_COLUMNS) == 0)
             Columns.serializer.serializeSubset(Collections2.transform(row, ColumnData::column), headerColumns, out);
 
         SearchIterator<ColumnDefinition, ColumnDefinition> si = headerColumns.iterator();
@@ -192,7 +240,7 @@
             if (data.column.isSimple())
                 Cell.serializer.serialize((Cell) data, column, out, pkLiveness, header);
             else
-                writeComplexColumn((ComplexColumnData) data, column, hasComplexDeletion, pkLiveness, header, out);
+                writeComplexColumn((ComplexColumnData) data, column, (flags & HAS_COMPLEX_DELETION) != 0, pkLiveness, header, out);
         }
     }
 
@@ -211,7 +259,7 @@
     throws IOException
     {
         out.writeByte((byte)IS_MARKER);
-        RangeTombstone.Bound.serializer.serialize(marker.clustering(), out, version, header.clusteringTypes());
+        ClusteringBoundOrBoundary.serializer.serialize(marker.clustering(), out, version, header.clusteringTypes());
 
         if (header.isForSSTable())
         {
@@ -317,7 +365,7 @@
     {
         assert !header.isForSSTable();
         return 1 // flags
-             + RangeTombstone.Bound.serializer.serializedSize(marker.clustering(), version, header.clusteringTypes())
+             + ClusteringBoundOrBoundary.serializer.serializedSize(marker.clustering(), version, header.clusteringTypes())
              + serializedMarkerBodySize(marker, header, previousUnfilteredSize, version);
     }
 
@@ -364,7 +412,7 @@
 
         if (kind(flags) == Unfiltered.Kind.RANGE_TOMBSTONE_MARKER)
         {
-            RangeTombstone.Bound bound = RangeTombstone.Bound.serializer.deserialize(in, helper.version, header.clusteringTypes());
+            ClusteringBoundOrBoundary bound = ClusteringBoundOrBoundary.serializer.deserialize(in, helper.version, header.clusteringTypes());
             return deserializeMarkerBody(in, header, bound);
         }
         else
@@ -395,7 +443,7 @@
         return deserializeRowBody(in, header, helper, flags, extendedFlags, builder);
     }
 
-    public RangeTombstoneMarker deserializeMarkerBody(DataInputPlus in, SerializationHeader header, RangeTombstone.Bound bound)
+    public RangeTombstoneMarker deserializeMarkerBody(DataInputPlus in, SerializationHeader header, ClusteringBoundOrBoundary bound)
     throws IOException
     {
         if (header.isForSSTable())
@@ -405,9 +453,9 @@
         }
 
         if (bound.isBoundary())
-            return new RangeTombstoneBoundaryMarker(bound, header.readDeletionTime(in), header.readDeletionTime(in));
+            return new RangeTombstoneBoundaryMarker((ClusteringBoundary) bound, header.readDeletionTime(in), header.readDeletionTime(in));
         else
-            return new RangeTombstoneBoundMarker(bound, header.readDeletionTime(in));
+            return new RangeTombstoneBoundMarker((ClusteringBound) bound, header.readDeletionTime(in));
     }
 
     public Row deserializeRowBody(DataInputPlus in,
@@ -441,7 +489,7 @@
                 long timestamp = header.readTimestamp(in);
                 int ttl = hasTTL ? header.readTTL(in) : LivenessInfo.NO_TTL;
                 int localDeletionTime = hasTTL ? header.readLocalDeletionTime(in) : LivenessInfo.NO_EXPIRATION_TIME;
-                rowLiveness = LivenessInfo.create(timestamp, ttl, localDeletionTime);
+                rowLiveness = LivenessInfo.withExpirationTime(timestamp, ttl, localDeletionTime);
             }
 
             builder.addPrimaryKeyLivenessInfo(rowLiveness);
@@ -474,7 +522,7 @@
         if (helper.includes(column))
         {
             Cell cell = Cell.serializer.deserialize(in, rowLiveness, column, header, helper);
-            if (!helper.isDropped(cell, false))
+            if (helper.includes(cell, rowLiveness) && !helper.isDropped(cell, false))
                 builder.addCell(cell);
         }
         else
@@ -500,7 +548,7 @@
             while (--count >= 0)
             {
                 Cell cell = Cell.serializer.deserialize(in, rowLiveness, column, header, helper);
-                if (helper.includes(cell.path()) && !helper.isDropped(cell, true))
+                if (helper.includes(cell, rowLiveness) && !helper.isDropped(cell, true))
                     builder.addCell(cell);
             }
 
diff --git a/src/java/org/apache/cassandra/db/rows/WithOnlyQueriedData.java b/src/java/org/apache/cassandra/db/rows/WithOnlyQueriedData.java
new file mode 100644
index 0000000..5e7d905
--- /dev/null
+++ b/src/java/org/apache/cassandra/db/rows/WithOnlyQueriedData.java
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.db.rows;
+
+import org.apache.cassandra.db.*;
+import org.apache.cassandra.db.filter.ColumnFilter;
+import org.apache.cassandra.db.transform.Transformation;
+
+/**
+ * Function to skip cells (from an iterator) that are not part of those queried by the user
+ * according to the provided {@code ColumnFilter}. See {@link UnfilteredRowIterators#withOnlyQueriedData}
+ * for more details.
+ */
+public class WithOnlyQueriedData<I extends BaseRowIterator<?>> extends Transformation<I>
+{
+    private final ColumnFilter filter;
+
+    public WithOnlyQueriedData(ColumnFilter filter)
+    {
+        this.filter = filter;
+    }
+
+    @Override
+    protected Row applyToStatic(Row row)
+    {
+        return row.withOnlyQueriedData(filter);
+    }
+
+    @Override
+    protected Row applyToRow(Row row)
+    {
+        return row.withOnlyQueriedData(filter);
+    }
+};
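(Minimal usage sketch, assuming an UnfilteredRowIterator 'iter' and the ColumnFilter 'filter' used for the read; it simply restates the wiring performed by UnfilteredRowIterators.withOnlyQueriedData above.)

    UnfilteredRowIterator trimmed = filter.allFetchedColumnsAreQueried()
                                  ? iter
                                  : Transformation.apply(iter, new WithOnlyQueriedData(filter));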
diff --git a/src/java/org/apache/cassandra/db/transform/BaseRows.java b/src/java/org/apache/cassandra/db/transform/BaseRows.java
index 7b0bb99..e4a6bb8 100644
--- a/src/java/org/apache/cassandra/db/transform/BaseRows.java
+++ b/src/java/org/apache/cassandra/db/transform/BaseRows.java
@@ -17,7 +17,7 @@
  * specific language governing permissions and limitations
  * under the License.
  *
-*/
+ */
 package org.apache.cassandra.db.transform;
 
 import org.apache.cassandra.config.CFMetaData;
@@ -33,11 +33,13 @@
 {
 
     private Row staticRow;
+    private DecoratedKey partitionKey;
 
     public BaseRows(I input)
     {
         super(input);
         staticRow = input.staticRow();
+        partitionKey = input.partitionKey();
     }
 
     // swap parameter order to avoid casting errors
@@ -45,6 +47,7 @@
     {
         super(copyFrom);
         staticRow = copyFrom.staticRow;
+        partitionKey = copyFrom.partitionKey();
     }
 
     public CFMetaData metadata()
@@ -104,6 +107,7 @@
         // transform any existing data
         staticRow = transformation.applyToStatic(staticRow);
         next = applyOne(next, transformation);
+        partitionKey = transformation.applyToPartitionKey(partitionKey);
     }
 
     @Override
diff --git a/src/java/org/apache/cassandra/db/transform/Stack.java b/src/java/org/apache/cassandra/db/transform/Stack.java
index f680ec9..b15dd55 100644
--- a/src/java/org/apache/cassandra/db/transform/Stack.java
+++ b/src/java/org/apache/cassandra/db/transform/Stack.java
@@ -24,6 +24,8 @@
 
 class Stack
 {
+    public static final Transformation[] EMPTY_TRANSFORMATIONS = new Transformation[0];
+    public static final MoreContentsHolder[] EMPTY_MORE_CONTENTS_HOLDERS = new MoreContentsHolder[0];
     static final Stack EMPTY = new Stack();
 
     Transformation[] stack;
@@ -44,8 +46,8 @@
 
     Stack()
     {
-        stack = new Transformation[0];
-        moreContents = new MoreContentsHolder[0];
+        stack = EMPTY_TRANSFORMATIONS;
+        moreContents = EMPTY_MORE_CONTENTS_HOLDERS;
     }
 
     Stack(Stack copy)
diff --git a/src/java/org/apache/cassandra/db/transform/Transformation.java b/src/java/org/apache/cassandra/db/transform/Transformation.java
index 6a31ece..3134725 100644
--- a/src/java/org/apache/cassandra/db/transform/Transformation.java
+++ b/src/java/org/apache/cassandra/db/transform/Transformation.java
@@ -20,6 +20,7 @@
  */
 package org.apache.cassandra.db.transform;
 
+import org.apache.cassandra.db.DecoratedKey;
 import org.apache.cassandra.db.DeletionTime;
 import org.apache.cassandra.db.partitions.PartitionIterator;
 import org.apache.cassandra.db.partitions.UnfilteredPartitionIterator;
@@ -82,6 +83,11 @@
     }
 
     /**
+     * Applied to the partition key of any rows/unfiltered iterator this transformation is applied to.
+     */
+    protected DecoratedKey applyToPartitionKey(DecoratedKey key) { return key; }
+
+    /**
      * Applied to the static row of any rows iterator.
      *
      * NOTE that this is only applied to the first iterator in any sequence of iterators filled by a MoreContents;
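(Illustrative, hypothetical subclass showing where the new applyToPartitionKey hook fits; the class name and the logging are assumptions, not part of this patch.)

    import org.apache.cassandra.db.DecoratedKey;
    import org.apache.cassandra.db.rows.UnfilteredRowIterator;
    import org.apache.cassandra.db.transform.Transformation;

    class KeyLoggingTransformation extends Transformation<UnfilteredRowIterator>
    {
        @Override
        protected DecoratedKey applyToPartitionKey(DecoratedKey key)
        {
            System.out.println("visiting partition " + key); // side effect only
            return key;                                      // keys pass through unchanged
        }
    }
    // usage: Transformation.apply(iterator, new KeyLoggingTransformation())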
diff --git a/src/java/org/apache/cassandra/db/view/TableViews.java b/src/java/org/apache/cassandra/db/view/TableViews.java
index 7feb67c..e4cdde3 100644
--- a/src/java/org/apache/cassandra/db/view/TableViews.java
+++ b/src/java/org/apache/cassandra/db/view/TableViews.java
@@ -28,7 +28,7 @@
 
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.db.*;
-import org.apache.cassandra.db.commitlog.ReplayPosition;
+import org.apache.cassandra.db.commitlog.CommitLogPosition;
 import org.apache.cassandra.db.filter.*;
 import org.apache.cassandra.db.rows.*;
 import org.apache.cassandra.db.partitions.*;
@@ -94,7 +94,7 @@
             viewCfs.dumpMemtable();
     }
 
-    public void truncateBlocking(ReplayPosition replayAfter, long truncatedAt)
+    public void truncateBlocking(CommitLogPosition replayAfter, long truncatedAt)
     {
         for (ColumnFamilyStore viewCfs : allViewsCfs())
         {
@@ -133,7 +133,7 @@
         ColumnFamilyStore cfs = Keyspace.openAndGetStore(update.metadata());
         long start = System.nanoTime();
         Collection<Mutation> mutations;
-        try (ReadOrderGroup orderGroup = command.startOrderGroup();
+        try (ReadExecutionController orderGroup = command.executionController();
              UnfilteredRowIterator existings = UnfilteredPartitionIterators.getOnlyElement(command.executeLocally(orderGroup), command);
              UnfilteredRowIterator updates = update.unfilteredIterator())
         {
diff --git a/src/java/org/apache/cassandra/db/view/View.java b/src/java/org/apache/cassandra/db/view/View.java
index 845a6ab..0b8de9e 100644
--- a/src/java/org/apache/cassandra/db/view/View.java
+++ b/src/java/org/apache/cassandra/db/view/View.java
@@ -17,9 +17,7 @@
  */
 package org.apache.cassandra.db.view;
 
-import java.nio.ByteBuffer;
 import java.util.*;
-import java.util.concurrent.TimeUnit;
 import java.util.stream.Collectors;
 
 import javax.annotation.Nullable;
@@ -31,16 +29,11 @@
 import org.apache.cassandra.cql3.statements.SelectStatement;
 import org.apache.cassandra.db.*;
 import org.apache.cassandra.config.*;
-import org.apache.cassandra.cql3.ColumnIdentifier;
 import org.apache.cassandra.db.compaction.CompactionManager;
-import org.apache.cassandra.db.partitions.*;
 import org.apache.cassandra.db.rows.*;
 import org.apache.cassandra.schema.KeyspaceMetadata;
 import org.apache.cassandra.service.ClientState;
-import org.apache.cassandra.service.pager.QueryPager;
-import org.apache.cassandra.transport.Server;
 import org.apache.cassandra.utils.FBUtilities;
-import org.apache.cassandra.utils.btree.BTreeSet;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
@@ -88,15 +81,11 @@
 
     /**
      * This updates the columns stored which are dependent on the base CFMetaData.
-     *
-     * @return true if the view contains only columns which are part of the base's primary key; false if there is at
-     *         least one column which is not.
      */
     public void updateDefinition(ViewDefinition definition)
     {
         this.definition = definition;
 
-        CFMetaData viewCfm = definition.metadata;
         List<ColumnDefinition> nonPKDefPartOfViewPK = new ArrayList<>();
         for (ColumnDefinition baseColumn : baseCfs.metadata.allColumns())
         {
@@ -262,12 +251,12 @@
             if (rel.isMultiColumn())
             {
                 sb.append(((MultiColumnRelation) rel).getEntities().stream()
-                        .map(ColumnIdentifier.Raw::toCQLString)
+                        .map(ColumnDefinition.Raw::toString)
                         .collect(Collectors.joining(", ", "(", ")")));
             }
             else
             {
-                sb.append(((SingleColumnRelation) rel).getEntity().toCQLString());
+                sb.append(((SingleColumnRelation) rel).getEntity());
             }
 
             sb.append(" ").append(rel.operator()).append(" ");
diff --git a/src/java/org/apache/cassandra/db/view/ViewBuilder.java b/src/java/org/apache/cassandra/db/view/ViewBuilder.java
index b55eda0..8f8abb8 100644
--- a/src/java/org/apache/cassandra/db/view/ViewBuilder.java
+++ b/src/java/org/apache/cassandra/db/view/ViewBuilder.java
@@ -42,6 +42,7 @@
 import org.apache.cassandra.dht.Token;
 import org.apache.cassandra.io.sstable.ReducingKeyIterator;
 import org.apache.cassandra.io.sstable.format.SSTableReader;
+import org.apache.cassandra.repair.SystemDistributedKeyspace;
 import org.apache.cassandra.service.StorageProxy;
 import org.apache.cassandra.service.StorageService;
 import org.apache.cassandra.service.pager.QueryPager;
@@ -71,7 +72,6 @@
 
     private void buildKey(DecoratedKey key)
     {
-        AtomicLong noBase = new AtomicLong(Long.MAX_VALUE);
         ReadQuery selectQuery = view.getReadQuery();
         if (!selectQuery.selectsKey(key))
             return;
@@ -84,22 +84,31 @@
         UnfilteredRowIterator empty = UnfilteredRowIterators.noRowsIterator(baseCfs.metadata, key, Rows.EMPTY_STATIC_ROW, DeletionTime.LIVE, false);
 
         Collection<Mutation> mutations;
-        try (ReadOrderGroup orderGroup = command.startOrderGroup();
+        try (ReadExecutionController orderGroup = command.executionController();
              UnfilteredRowIterator data = UnfilteredPartitionIterators.getOnlyElement(command.executeLocally(orderGroup), command))
         {
             mutations = baseCfs.keyspace.viewManager.forTable(baseCfs.metadata).generateViewUpdates(Collections.singleton(view), data, empty, nowInSec);
         }
 
         if (!mutations.isEmpty())
+        {
+            AtomicLong noBase = new AtomicLong(Long.MAX_VALUE);
             StorageProxy.mutateMV(key.getKey(), mutations, true, noBase);
+        }
     }
 
     public void run()
     {
+        logger.trace("Running view builder for {}.{}", baseCfs.metadata.ksName, view.name);
+        UUID localHostId = SystemKeyspace.getLocalHostId();
         String ksname = baseCfs.metadata.ksName, viewName = view.name;
 
         if (SystemKeyspace.isViewBuilt(ksname, viewName))
+        {
+            if (!SystemKeyspace.isViewStatusReplicated(ksname, viewName))
+                updateDistributed(ksname, viewName, localHostId);
             return;
+        }
 
         Iterable<Range<Token>> ranges = StorageService.instance.getLocalRanges(baseCfs.metadata.ksName);
         final Pair<Integer, Token> buildStatus = SystemKeyspace.getViewBuildStatus(ksname, viewName);
@@ -142,6 +151,7 @@
         try (Refs<SSTableReader> sstables = baseCfs.selectAndReference(function).refs;
              ReducingKeyIterator iter = new ReducingKeyIterator(sstables))
         {
+            SystemDistributedKeyspace.startViewBuild(ksname, viewName, localHostId);
             while (!isStopped && iter.hasNext())
             {
                 DecoratedKey key = iter.next();
@@ -166,19 +176,36 @@
             }
 
             if (!isStopped)
-            SystemKeyspace.finishViewBuildStatus(ksname, viewName);
-
+            {
+                SystemKeyspace.finishViewBuildStatus(ksname, viewName);
+                updateDistributed(ksname, viewName, localHostId);
+            }
         }
         catch (Exception e)
         {
-            final ViewBuilder builder = new ViewBuilder(baseCfs, view);
-            ScheduledExecutors.nonPeriodicTasks.schedule(() -> CompactionManager.instance.submitViewBuilder(builder),
+            ScheduledExecutors.nonPeriodicTasks.schedule(() -> CompactionManager.instance.submitViewBuilder(this),
                                                          5,
                                                          TimeUnit.MINUTES);
             logger.warn("Materialized View failed to complete, sleeping 5 minutes before restarting", e);
         }
     }
 
+    private void updateDistributed(String ksname, String viewName, UUID localHostId)
+    {
+        try
+        {
+            SystemDistributedKeyspace.successfulViewBuild(ksname, viewName, localHostId);
+            SystemKeyspace.setViewBuiltReplicated(ksname, viewName);
+        }
+        catch (Exception e)
+        {
+            ScheduledExecutors.nonPeriodicTasks.schedule(() -> CompactionManager.instance.submitViewBuilder(this),
+                                                         5,
+                                                         TimeUnit.MINUTES);
+            logger.warn("Failed to updated the distributed status of view, sleeping 5 minutes before retrying", e);
+        }
+    }
+
     public CompactionInfo getCompactionInfo()
     {
         long rangesLeft = 0, rangesTotal = 0;
diff --git a/src/java/org/apache/cassandra/db/view/ViewManager.java b/src/java/org/apache/cassandra/db/view/ViewManager.java
index fd04b97..14bcd58 100644
--- a/src/java/org/apache/cassandra/db/view/ViewManager.java
+++ b/src/java/org/apache/cassandra/db/view/ViewManager.java
@@ -17,7 +17,6 @@
  */
 package org.apache.cassandra.db.view;
 
-import java.nio.ByteBuffer;
 import java.util.*;
 import java.util.concurrent.ConcurrentMap;
 import java.util.concurrent.ConcurrentHashMap;
@@ -31,17 +30,17 @@
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.config.ViewDefinition;
 import org.apache.cassandra.db.*;
-import org.apache.cassandra.db.rows.*;
 import org.apache.cassandra.db.partitions.*;
-
+import org.apache.cassandra.repair.SystemDistributedKeyspace;
+import org.apache.cassandra.service.StorageService;
 
 /**
  * Manages {@link View}'s for a single {@link ColumnFamilyStore}. All of the views for that table are created when this
  * manager is initialized.
  *
  * The main purposes of the manager are to provide a single location for updates to be vetted to see whether they update
- * any views {@link ViewManager#updatesAffectView(Collection, boolean)}, provide locks to prevent multiple
- * updates from creating incoherent updates in the view {@link ViewManager#acquireLockFor(ByteBuffer)}, and
+ * any views {@link #updatesAffectView(Collection, boolean)}, provide locks to prevent multiple
+ * updates from creating incoherent updates in the view {@link #acquireLockFor(int)}, and
 * to effect change on the view.
  *
  * TODO: I think we can get rid of that class. For addition/removal of view by names, we could move it Keyspace. And we
@@ -126,6 +125,20 @@
                 addView(entry.getValue());
         }
 
+        // Building views involves updating view build status in the system_distributed
+        // keyspace and therefore it requires ring information. This check prevents builds
+        // being submitted when Keyspaces are initialized during CassandraDaemon::setup as
+        // that happens before StorageService & gossip are initialized. After SS has been
+        // init'd we schedule builds for *all* views anyway, so this doesn't have any effect
+        // on startup. It does mean, however, that builds will not be triggered if gossip is
+        // disabled via JMX or nodetool as that sets SS to an uninitialized state.
+        if (!StorageService.instance.isInitialized())
+        {
+            logger.info("Not submitting build tasks for views in keyspace {} as " +
+                        "storage service is not initialized", keyspace.getName());
+            return;
+        }
+
         for (View view : allViews())
         {
             view.build();
@@ -150,6 +163,7 @@
 
         forTable(view.getDefinition().baseTableMetadata()).removeByName(name);
         SystemKeyspace.setViewRemoved(keyspace.getName(), view.name);
+        SystemDistributedKeyspace.setViewRemoved(keyspace.getName(), view.name);
     }
 
     public void buildAllViews()
@@ -172,9 +186,9 @@
         return views;
     }
 
-    public static Lock acquireLockFor(ByteBuffer key)
+    public static Lock acquireLockFor(int keyAndCfidHash)
     {
-        Lock lock = LOCKS.get(key);
+        Lock lock = LOCKS.get(keyAndCfidHash);
 
         if (lock.tryLock())
             return lock;
diff --git a/src/java/org/apache/cassandra/db/view/ViewUpdateGenerator.java b/src/java/org/apache/cassandra/db/view/ViewUpdateGenerator.java
index 3bdc380..a8af37b 100644
--- a/src/java/org/apache/cassandra/db/view/ViewUpdateGenerator.java
+++ b/src/java/org/apache/cassandra/db/view/ViewUpdateGenerator.java
@@ -400,7 +400,7 @@
                 clusteringValues[viewColumn.position()] = value;
         }
 
-        currentViewEntryBuilder.newRow(new Clustering(clusteringValues));
+        currentViewEntryBuilder.newRow(Clustering.make(clusteringValues));
     }
 
     private LivenessInfo computeLivenessInfoForEntry(Row baseRow)
@@ -447,7 +447,7 @@
             }
             return ttl == baseLiveness.ttl()
                  ? baseLiveness
-                 : LivenessInfo.create(baseLiveness.timestamp(), ttl, expirationTime);
+                 : LivenessInfo.withExpirationTime(baseLiveness.timestamp(), ttl, expirationTime);
         }
 
         ColumnDefinition baseColumn = view.baseNonPKColumnsInViewPK.get(0);
@@ -455,7 +455,7 @@
         assert isLive(cell) : "We shouldn't have got there if the base row had no associated entry";
 
         long timestamp = Math.max(baseLiveness.timestamp(), cell.timestamp());
-        return LivenessInfo.create(timestamp, cell.ttl(), cell.localDeletionTime());
+        return LivenessInfo.withExpirationTime(timestamp, cell.ttl(), cell.localDeletionTime());
     }
 
     private long computeTimestampForEntryDeletion(Row baseRow)
diff --git a/src/java/org/apache/cassandra/db/view/ViewUtils.java b/src/java/org/apache/cassandra/db/view/ViewUtils.java
index 4d9517f..4dc1766 100644
--- a/src/java/org/apache/cassandra/db/view/ViewUtils.java
+++ b/src/java/org/apache/cassandra/db/view/ViewUtils.java
@@ -45,7 +45,7 @@
      * nodes in the local datacenter when calculating cardinality.
      *
      * For example, if we have the following ring:
-     *   A, T1 -> B, T2 -> C, T3 -> A
+     *   {@code A, T1 -> B, T2 -> C, T3 -> A}
      *
      * For the token T1, at RF=1, A would be included, so A's cardinality for T1 is 1. For the token T1, at RF=2, B would
      * be included, so B's cardinality for token T1 is 2. For token T3, at RF = 2, A would be included, so A's cardinality
diff --git a/src/java/org/apache/cassandra/dht/BootStrapper.java b/src/java/org/apache/cassandra/dht/BootStrapper.java
index 1c40482..392dbf2 100644
--- a/src/java/org/apache/cassandra/dht/BootStrapper.java
+++ b/src/java/org/apache/cassandra/dht/BootStrapper.java
@@ -73,7 +73,8 @@
                                                    "Bootstrap",
                                                    useStrictConsistency,
                                                    DatabaseDescriptor.getEndpointSnitch(),
-                                                   stateStore);
+                                                   stateStore,
+                                                   true);
         streamer.addSourceFilter(new RangeStreamer.FailureDetectorSourceFilter(FailureDetector.instance));
         streamer.addSourceFilter(new RangeStreamer.ExcludeLocalNodeFilter());
 
diff --git a/src/java/org/apache/cassandra/dht/ExcludingBounds.java b/src/java/org/apache/cassandra/dht/ExcludingBounds.java
index 8fbde28..ed6c2fc 100644
--- a/src/java/org/apache/cassandra/dht/ExcludingBounds.java
+++ b/src/java/org/apache/cassandra/dht/ExcludingBounds.java
@@ -23,7 +23,7 @@
 import org.apache.cassandra.utils.Pair;
 
 /**
- * AbstractBounds containing neither of its endpoints: (left, right).  Used by CQL key > X AND key < Y range scans.
+ * AbstractBounds containing neither of its endpoints: (left, right).  Used by {@code CQL key > X AND key < Y} range scans.
  */
 public class ExcludingBounds<T extends RingPosition<T>> extends AbstractBounds<T>
 {
diff --git a/src/java/org/apache/cassandra/dht/IPartitioner.java b/src/java/org/apache/cassandra/dht/IPartitioner.java
index e0a08dc..b559a6f 100644
--- a/src/java/org/apache/cassandra/dht/IPartitioner.java
+++ b/src/java/org/apache/cassandra/dht/IPartitioner.java
@@ -20,6 +20,7 @@
 import java.nio.ByteBuffer;
 import java.util.List;
 import java.util.Map;
+import java.util.Optional;
 
 import org.apache.cassandra.db.DecoratedKey;
 import org.apache.cassandra.db.marshal.AbstractType;
@@ -49,6 +50,17 @@
     public Token getMinimumToken();
 
     /**
+     * The biggest token for this partitioner. Unlike getMinimumToken, this token is actually used, and users wanting to
+     * include all tokens need to call getMaximumToken().maxKeyBound().
+     *
+     * Not implemented for the ordered partitioners
+     */
+    default Token getMaximumToken()
+    {
+        throw new UnsupportedOperationException("If you are using a splitting partitioner, getMaximumToken has to be implemented");
+    }
+
+    /**
      * @return a Token that can be used to route a given key
      * (This is NOT a method to create a Token from its string representation;
      * for that, use TokenFactory.fromString.)
@@ -84,4 +96,9 @@
      * Used by secondary indices.
      */
     public AbstractType<?> partitionOrdering();
+
+    default Optional<Splitter> splitter()
+    {
+        return Optional.empty();
+    }
 }
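
As the new javadoc notes, getMaximumToken() returns a real, usable token, so code that wants to cover every key must extend it to its key bound. A minimal sketch of that usage, assuming Murmur3Partitioner and the Bounds/PartitionPosition types behave as the surrounding changes suggest:

    import org.apache.cassandra.db.PartitionPosition;
    import org.apache.cassandra.dht.Bounds;
    import org.apache.cassandra.dht.IPartitioner;
    import org.apache.cassandra.dht.Murmur3Partitioner;

    // Hedged sketch: a bounds covering the whole ring ends at getMaximumToken().maxKeyBound(),
    // since the maximum token itself is an actual token that keys can hash to.
    IPartitioner partitioner = Murmur3Partitioner.instance;
    Bounds<PartitionPosition> everything =
        new Bounds<>(partitioner.getMinimumToken().minKeyBound(),
                     partitioner.getMaximumToken().maxKeyBound());
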
diff --git a/src/java/org/apache/cassandra/dht/IncludingExcludingBounds.java b/src/java/org/apache/cassandra/dht/IncludingExcludingBounds.java
index 19c098e..ac5185f 100644
--- a/src/java/org/apache/cassandra/dht/IncludingExcludingBounds.java
+++ b/src/java/org/apache/cassandra/dht/IncludingExcludingBounds.java
@@ -23,7 +23,7 @@
 import org.apache.cassandra.utils.Pair;
 
 /**
- * AbstractBounds containing only its left endpoint: [left, right).  Used by CQL key >= X AND key < Y range scans.
+ * AbstractBounds containing only its left endpoint: [left, right).  Used by {@code CQL key >= X AND key < Y} range scans.
  */
 public class IncludingExcludingBounds<T extends RingPosition<T>> extends AbstractBounds<T>
 {
diff --git a/src/java/org/apache/cassandra/dht/Murmur3Partitioner.java b/src/java/org/apache/cassandra/dht/Murmur3Partitioner.java
index d68be3f..9ed0cca 100644
--- a/src/java/org/apache/cassandra/dht/Murmur3Partitioner.java
+++ b/src/java/org/apache/cassandra/dht/Murmur3Partitioner.java
@@ -48,6 +48,19 @@
     public static final Murmur3Partitioner instance = new Murmur3Partitioner();
     public static final AbstractType<?> partitionOrdering = new PartitionerDefinedOrder(instance);
 
+    private final Splitter splitter = new Splitter(this)
+    {
+        public Token tokenForValue(BigInteger value)
+        {
+            return new LongToken(value.longValue());
+        }
+
+        public BigInteger valueForToken(Token token)
+        {
+            return BigInteger.valueOf(((LongToken) token).token);
+        }
+    };
+
     public DecoratedKey decorateKey(ByteBuffer key)
     {
         long[] hash = getHash(key);
@@ -265,7 +278,7 @@
         {
             try
             {
-                Long.valueOf(token);
+                fromString(token);
             }
             catch (NumberFormatException e)
             {
@@ -291,8 +304,18 @@
         return LongType.instance;
     }
 
+    public Token getMaximumToken()
+    {
+        return new LongToken(Long.MAX_VALUE);
+    }
+
     public AbstractType<?> partitionOrdering()
     {
         return partitionOrdering;
     }
+
+    public Optional<Splitter> splitter()
+    {
+        return Optional.of(splitter);
+    }
 }
diff --git a/src/java/org/apache/cassandra/dht/RandomPartitioner.java b/src/java/org/apache/cassandra/dht/RandomPartitioner.java
index b0dea01..96a96ca 100644
--- a/src/java/org/apache/cassandra/dht/RandomPartitioner.java
+++ b/src/java/org/apache/cassandra/dht/RandomPartitioner.java
@@ -50,6 +50,19 @@
     public static final RandomPartitioner instance = new RandomPartitioner();
     public static final AbstractType<?> partitionOrdering = new PartitionerDefinedOrder(instance);
 
+    private final Splitter splitter = new Splitter(this)
+    {
+        public Token tokenForValue(BigInteger value)
+        {
+            return new BigIntegerToken(value);
+        }
+
+        public BigInteger valueForToken(Token token)
+        {
+            return ((BigIntegerToken)token).getTokenValue();
+        }
+    };
+
     public DecoratedKey decorateKey(ByteBuffer key)
     {
         return new CachedHashDecoratedKey(getToken(key), key);
@@ -194,6 +207,11 @@
         return ownerships;
     }
 
+    public Token getMaximumToken()
+    {
+        return new BigIntegerToken(MAXIMUM);
+    }
+
     public AbstractType<?> getTokenValidator()
     {
         return IntegerType.instance;
@@ -203,4 +221,10 @@
     {
         return partitionOrdering;
     }
+
+    public Optional<Splitter> splitter()
+    {
+        return Optional.of(splitter);
+    }
+
 }
diff --git a/src/java/org/apache/cassandra/dht/Range.java b/src/java/org/apache/cassandra/dht/Range.java
index ba6854e..974d08e 100644
--- a/src/java/org/apache/cassandra/dht/Range.java
+++ b/src/java/org/apache/cassandra/dht/Range.java
@@ -166,7 +166,7 @@
         boolean thatwraps = isWrapAround(that.left, that.right);
         if (!thiswraps && !thatwraps)
         {
-            // neither wraps.  the straightforward case.
+            // neither wraps:  the straightforward case.
             if (!(left.compareTo(that.right) < 0 && that.left.compareTo(right) < 0))
                 return Collections.emptySet();
             return rangeSet(new Range<T>(ObjectUtils.max(this.left, that.left),
@@ -174,7 +174,7 @@
         }
         if (thiswraps && thatwraps)
         {
-            // if the starts are the same, one contains the other, which we have already ruled out.
+            // both wrap: if the starts are the same, one contains the other, which we have already ruled out.
             assert !this.left.equals(that.left);
             // two wrapping ranges always intersect.
             // since we have already determined that neither this nor that contains the other, we have 2 cases,
@@ -188,9 +188,9 @@
                    ? intersectionBothWrapping(this, that)
                    : intersectionBothWrapping(that, this);
         }
-        if (thiswraps && !thatwraps)
+        if (thiswraps) // this wraps, that does not wrap
             return intersectionOneWrapping(this, that);
-        assert (!thiswraps && thatwraps);
+        // the last case: this does not wrap, that wraps
         return intersectionOneWrapping(that, this);
     }
 
@@ -496,6 +496,23 @@
         return new Range<T>(left, newRight);
     }
 
+    public static <T extends RingPosition<T>> List<Range<T>> sort(Collection<Range<T>> ranges)
+    {
+        List<Range<T>> output = new ArrayList<>(ranges.size());
+        for (Range<T> r : ranges)
+            output.addAll(r.unwrap());
+        // sort by left
+        Collections.sort(output, new Comparator<Range<T>>()
+        {
+            public int compare(Range<T> b1, Range<T> b2)
+            {
+                return b1.left.compareTo(b2.left);
+            }
+        });
+        return output;
+    }
+
     /**
      * Compute a range of keys corresponding to a given range of token.
      */
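
A small, hedged usage sketch of the Range.sort helper added above: wrap-around ranges are unwrapped first and the resulting pieces are ordered by their left token (the concrete token values are purely illustrative):

    import java.util.Arrays;
    import java.util.List;
    import org.apache.cassandra.dht.Murmur3Partitioner.LongToken;
    import org.apache.cassandra.dht.Range;
    import org.apache.cassandra.dht.Token;

    // Hedged sketch: the wrapping range (100, 10] is unwrapped at the minimum token before
    // sorting, so the returned list is ordered by left token and contains no wrapping range.
    List<Range<Token>> ranges = Arrays.asList(
            new Range<Token>(new LongToken(100), new LongToken(10)),   // wraps around the ring
            new Range<Token>(new LongToken(20), new LongToken(50)));
    List<Range<Token>> sorted = Range.sort(ranges);
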
diff --git a/src/java/org/apache/cassandra/dht/RangeStreamer.java b/src/java/org/apache/cassandra/dht/RangeStreamer.java
index 38282c6..ee2d792 100644
--- a/src/java/org/apache/cassandra/dht/RangeStreamer.java
+++ b/src/java/org/apache/cassandra/dht/RangeStreamer.java
@@ -128,13 +128,14 @@
                          String description,
                          boolean useStrictConsistency,
                          IEndpointSnitch snitch,
-                         StreamStateStore stateStore)
+                         StreamStateStore stateStore,
+                         boolean connectSequentially)
     {
         this.metadata = metadata;
         this.tokens = tokens;
         this.address = address;
         this.description = description;
-        this.streamPlan = new StreamPlan(description, true);
+        this.streamPlan = new StreamPlan(description, true, connectSequentially);
         this.useStrictConsistency = useStrictConsistency;
         this.snitch = snitch;
         this.stateStore = stateStore;
@@ -325,8 +326,8 @@
                         throw new IllegalStateException("Unable to find sufficient sources for streaming range " + range + " in keyspace " + keyspace + " with RF=1." +
                                                         "If you want to ignore this, consider using system property -Dcassandra.consistent.rangemovement=false.");
                     else
-                        logger.warn("Unable to find sufficient sources for streaming range " + range + " in keyspace " + keyspace + " with RF=1. " +
-                                    "Keyspace might be missing data.");
+                        logger.warn("Unable to find sufficient sources for streaming range {} in keyspace {} with RF=1. " +
+                                    "Keyspace might be missing data.", range, keyspace);
                 }
                 else
                     throw new IllegalStateException("Unable to find sufficient sources for streaming range " + range + " in keyspace " + keyspace);
diff --git a/src/java/org/apache/cassandra/dht/Splitter.java b/src/java/org/apache/cassandra/dht/Splitter.java
new file mode 100644
index 0000000..4268e83
--- /dev/null
+++ b/src/java/org/apache/cassandra/dht/Splitter.java
@@ -0,0 +1,127 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.dht;
+
+import java.math.BigInteger;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+
+/**
+ * Partition splitter.
+ */
+public abstract class Splitter
+{
+    private final IPartitioner partitioner;
+
+    protected Splitter(IPartitioner partitioner)
+    {
+        this.partitioner = partitioner;
+    }
+
+    protected abstract Token tokenForValue(BigInteger value);
+
+    protected abstract BigInteger valueForToken(Token token);
+
+    public List<Token> splitOwnedRanges(int parts, List<Range<Token>> localRanges, boolean dontSplitRanges)
+    {
+        if (localRanges.isEmpty() || parts == 1)
+            return Collections.singletonList(partitioner.getMaximumToken());
+
+        BigInteger totalTokens = BigInteger.ZERO;
+        for (Range<Token> r : localRanges)
+        {
+            BigInteger right = valueForToken(token(r.right));
+            totalTokens = totalTokens.add(right.subtract(valueForToken(r.left)));
+        }
+        BigInteger perPart = totalTokens.divide(BigInteger.valueOf(parts));
+        // the range owned is so tiny we can't split it:
+        if (perPart.equals(BigInteger.ZERO))
+            return Collections.singletonList(partitioner.getMaximumToken());
+
+        if (dontSplitRanges)
+            return splitOwnedRangesNoPartialRanges(localRanges, perPart, parts);
+
+        List<Token> boundaries = new ArrayList<>();
+        BigInteger sum = BigInteger.ZERO;
+        for (Range<Token> r : localRanges)
+        {
+            Token right = token(r.right);
+            BigInteger currentRangeWidth = valueForToken(right).subtract(valueForToken(r.left)).abs();
+            BigInteger left = valueForToken(r.left);
+            while (sum.add(currentRangeWidth).compareTo(perPart) >= 0)
+            {
+                BigInteger withinRangeBoundary = perPart.subtract(sum);
+                left = left.add(withinRangeBoundary);
+                boundaries.add(tokenForValue(left));
+                currentRangeWidth = currentRangeWidth.subtract(withinRangeBoundary);
+                sum = BigInteger.ZERO;
+            }
+            sum = sum.add(currentRangeWidth);
+        }
+        boundaries.set(boundaries.size() - 1, partitioner.getMaximumToken());
+
+        assert boundaries.size() == parts : boundaries.size() +"!="+parts+" "+boundaries+":"+localRanges;
+        return boundaries;
+    }
+
+    private List<Token> splitOwnedRangesNoPartialRanges(List<Range<Token>> localRanges, BigInteger perPart, int parts)
+    {
+        List<Token> boundaries = new ArrayList<>(parts);
+        BigInteger sum = BigInteger.ZERO;
+        int i = 0;
+        while (boundaries.size() < parts - 1)
+        {
+            Range<Token> r = localRanges.get(i);
+            Range<Token> nextRange = localRanges.get(i + 1);
+            Token right = token(r.right);
+            Token nextRight = token(nextRange.right);
+
+            BigInteger currentRangeWidth = valueForToken(right).subtract(valueForToken(r.left));
+            BigInteger nextRangeWidth = valueForToken(nextRight).subtract(valueForToken(nextRange.left));
+            sum = sum.add(currentRangeWidth);
+            // does this or next range take us beyond the per part limit?
+            if (sum.compareTo(perPart) > 0 || sum.add(nextRangeWidth).compareTo(perPart) > 0)
+            {
+                // Either this or the next range will take us beyond the perPart limit. Will stopping now or
+                // adding the next range create the smallest difference to perPart?
+                BigInteger diffCurrent = sum.subtract(perPart).abs();
+                BigInteger diffNext = sum.add(nextRangeWidth).subtract(perPart).abs();
+                if (diffNext.compareTo(diffCurrent) >= 0)
+                {
+                    sum = BigInteger.ZERO;
+                    boundaries.add(right);
+                }
+            }
+            i++;
+        }
+        boundaries.add(partitioner.getMaximumToken());
+        return boundaries;
+    }
+
+    /**
+     * We avoid calculating for wrap-around ranges; instead we use the actual max token, and then, when translating
+     * to PartitionPositions, we include tokens from .minKeyBound to .maxKeyBound to make sure we include all tokens.
+     */
+    private Token token(Token t)
+    {
+        return t.equals(partitioner.getMinimumToken()) ? partitioner.getMaximumToken() : t;
+    }
+
+}
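
A hedged sketch of how the new Splitter is reached and used: the partitioner exposes it through the optional splitter() hook added to IPartitioner, and splitOwnedRanges returns the upper boundary tokens for the requested number of parts, the last of which is always getMaximumToken():

    import java.util.Collections;
    import java.util.List;
    import org.apache.cassandra.dht.Murmur3Partitioner;
    import org.apache.cassandra.dht.Murmur3Partitioner.LongToken;
    import org.apache.cassandra.dht.Range;
    import org.apache.cassandra.dht.Splitter;
    import org.apache.cassandra.dht.Token;

    // Hedged sketch: split a single locally-owned range into 4 parts of roughly equal token span.
    List<Range<Token>> localRanges = Collections.singletonList(
            new Range<Token>(new LongToken(0), new LongToken(1_000_000)));
    Splitter splitter = Murmur3Partitioner.instance.splitter().get();   // present for Murmur3/Random
    List<Token> boundaries = splitter.splitOwnedRanges(4, localRanges, false);
    // 'boundaries' now holds 4 upper-bound tokens; the final one is the partitioner's maximum token.
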
diff --git a/src/java/org/apache/cassandra/dht/tokenallocator/TokenAllocation.java b/src/java/org/apache/cassandra/dht/tokenallocator/TokenAllocation.java
index e715ff6..7658ea0 100644
--- a/src/java/org/apache/cassandra/dht/tokenallocator/TokenAllocation.java
+++ b/src/java/org/apache/cassandra/dht/tokenallocator/TokenAllocation.java
@@ -62,8 +62,8 @@
             TokenMetadata tokenMetadataCopy = tokenMetadata.cloneOnlyTokenMap();
             tokenMetadataCopy.updateNormalTokens(tokens, endpoint);
             SummaryStatistics ns = replicatedOwnershipStats(tokenMetadataCopy, rs, endpoint);
-            logger.warn("Replicated node load in datacentre before allocation " + statToString(os));
-            logger.warn("Replicated node load in datacentre after allocation " + statToString(ns));
+            logger.warn("Replicated node load in datacentre before allocation {}", statToString(os));
+            logger.warn("Replicated node load in datacentre after allocation {}", statToString(ns));
 
             // TODO: Is it worth doing the replicated ownership calculation always to be able to raise this alarm?
             if (ns.getStandardDeviation() > os.getStandardDeviation())
diff --git a/src/java/org/apache/cassandra/exceptions/StartupException.java b/src/java/org/apache/cassandra/exceptions/StartupException.java
index ec4890f..1513cf9 100644
--- a/src/java/org/apache/cassandra/exceptions/StartupException.java
+++ b/src/java/org/apache/cassandra/exceptions/StartupException.java
@@ -23,6 +23,10 @@
  */
 public class StartupException extends Exception
 {
+    public final static int ERR_WRONG_MACHINE_STATE = 1;
+    public final static int ERR_WRONG_DISK_STATE = 3;
+    public final static int ERR_WRONG_CONFIG = 100;
+
     public final int returnCode;
 
     public StartupException(int returnCode, String message)
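
A brief, hedged example of how the new named return codes might be raised by a startup check; the condition and message here are hypothetical, for illustration only:

    import org.apache.cassandra.exceptions.StartupException;

    // Hedged sketch: fail startup with a well-known exit code instead of a bare integer.
    boolean configLooksSane = false;   // hypothetical check, standing in for a real config validation
    if (!configLooksSane)
        throw new StartupException(StartupException.ERR_WRONG_CONFIG,
                                   "commitlog_directory is not set or not writable");
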
diff --git a/src/java/org/apache/cassandra/exceptions/UnrecognizedEntityException.java b/src/java/org/apache/cassandra/exceptions/UnrecognizedEntityException.java
deleted file mode 100644
index e8392e9..0000000
--- a/src/java/org/apache/cassandra/exceptions/UnrecognizedEntityException.java
+++ /dev/null
@@ -1,49 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.cassandra.exceptions;
-
-import org.apache.cassandra.cql3.ColumnIdentifier;
-import org.apache.cassandra.cql3.Relation;
-
-/**
- * Exception thrown when an entity is not recognized within a relation.
- */
-public final class UnrecognizedEntityException extends InvalidRequestException
-{
-    /**
-     * The unrecognized entity.
-     */
-    public final ColumnIdentifier entity;
-
-    /**
-     * The entity relation.
-     */
-    public final Relation relation;
-
-    /**
-     * Creates a new <code>UnrecognizedEntityException</code>.
-     * @param entity the unrecognized entity
-     * @param relation the entity relation
-     */
-    public UnrecognizedEntityException(ColumnIdentifier entity, Relation relation)
-    {
-        super(String.format("Undefined name %s in where clause ('%s')", entity, relation));
-        this.entity = entity;
-        this.relation = relation;
-    }
-}
diff --git a/src/java/org/apache/cassandra/gms/FailureDetector.java b/src/java/org/apache/cassandra/gms/FailureDetector.java
index b9b7944..964b4ad 100644
--- a/src/java/org/apache/cassandra/gms/FailureDetector.java
+++ b/src/java/org/apache/cassandra/gms/FailureDetector.java
@@ -246,7 +246,7 @@
         // it's worth being defensive here so minor bugs don't cause disproportionate
         // badness.  (See CASSANDRA-1463 for an example).
         if (epState == null)
-            logger.error("unknown endpoint {}", ep);
+            logger.error("Unknown endpoint: " + ep, new IllegalArgumentException(""));
         return epState != null && epState.isAlive();
     }
 
diff --git a/src/java/org/apache/cassandra/gms/GossipDigestAckVerbHandler.java b/src/java/org/apache/cassandra/gms/GossipDigestAckVerbHandler.java
index 9f69a94..15662b1 100644
--- a/src/java/org/apache/cassandra/gms/GossipDigestAckVerbHandler.java
+++ b/src/java/org/apache/cassandra/gms/GossipDigestAckVerbHandler.java
@@ -61,8 +61,9 @@
         if (Gossiper.instance.isInShadowRound())
         {
             if (logger.isDebugEnabled())
-                logger.debug("Finishing shadow round with {}", from);
-            Gossiper.instance.finishShadowRound();
+                logger.debug("Received an ack from {}, which may trigger exit from shadow round", from);
+            // if the ack is completely empty, then we can infer that the respondent is also in a shadow round
+            Gossiper.instance.maybeFinishShadowRound(from, gDigestList.isEmpty() && epStateMap.isEmpty());
             return; // don't bother doing anything else, we have what we came for
         }
 
diff --git a/src/java/org/apache/cassandra/gms/GossipDigestSynVerbHandler.java b/src/java/org/apache/cassandra/gms/GossipDigestSynVerbHandler.java
index 1c67570..6d0afa2 100644
--- a/src/java/org/apache/cassandra/gms/GossipDigestSynVerbHandler.java
+++ b/src/java/org/apache/cassandra/gms/GossipDigestSynVerbHandler.java
@@ -38,7 +38,7 @@
         InetAddress from = message.from;
         if (logger.isTraceEnabled())
             logger.trace("Received a GossipDigestSynMessage from {}", from);
-        if (!Gossiper.instance.isEnabled())
+        if (!Gossiper.instance.isEnabled() && !Gossiper.instance.isInShadowRound())
         {
             if (logger.isTraceEnabled())
                 logger.trace("Ignoring GossipDigestSynMessage because gossip is disabled");
@@ -60,6 +60,32 @@
         }
 
         List<GossipDigest> gDigestList = gDigestMessage.getGossipDigests();
+
+        // if the syn comes from a peer performing a shadow round and this node is
+        // also currently in a shadow round, send back a minimal ack. This node must
+        // be in the sender's seed list and doing this allows the sender to
+        // differentiate between seeds from which it is partitioned and those which
+        // are in their shadow round
+        if (!Gossiper.instance.isEnabled() && Gossiper.instance.isInShadowRound())
+        {
+            // a genuine syn (as opposed to one from a node currently
+            // doing a shadow round) will always contain > 0 digests
+            if (gDigestList.size() > 0)
+            {
+                logger.debug("Ignoring non-empty GossipDigestSynMessage because currently in gossip shadow round");
+                return;
+            }
+
+            logger.debug("Received a shadow round syn from {}. Gossip is disabled but " +
+                         "currently also in shadow round, responding with a minimal ack", from);
+            MessagingService.instance()
+                            .sendOneWay(new MessageOut<>(MessagingService.Verb.GOSSIP_DIGEST_ACK,
+                                                         new GossipDigestAck(new ArrayList<>(), new HashMap<>()),
+                                                         GossipDigestAck.serializer),
+                                        from);
+            return;
+        }
+
         if (logger.isTraceEnabled())
         {
             StringBuilder sb = new StringBuilder();
diff --git a/src/java/org/apache/cassandra/gms/Gossiper.java b/src/java/org/apache/cassandra/gms/Gossiper.java
index 6f63727..c5d243f 100644
--- a/src/java/org/apache/cassandra/gms/Gossiper.java
+++ b/src/java/org/apache/cassandra/gms/Gossiper.java
@@ -123,6 +123,7 @@
     private final Map<InetAddress, Long> expireTimeEndpointMap = new ConcurrentHashMap<InetAddress, Long>();
 
     private volatile boolean inShadowRound = false;
+    private final Set<InetAddress> seedsInShadowRound = new ConcurrentSkipListSet<>(inetcomparator);
 
     private volatile long lastProcessedMessageAt = System.currentTimeMillis();
 
@@ -708,27 +709,46 @@
     }
 
     /**
-     * Check if this endpoint can safely bootstrap into the cluster.
+     * Check if this node can safely be started and join the ring.
+     * If the node is bootstrapping, examines gossip state for any previous status to decide whether
+     * it's safe to allow this node to start and bootstrap. If not bootstrapping, compares the host ID
+     * that the node itself has (obtained by reading from system.local or generated if not present)
+     * with the host ID obtained from gossip for the endpoint address (if any). This latter case
+     * prevents a new, non-bootstrapping node from being started with the same address as a
+     * previously started, but currently down predecessor.
      *
      * @param endpoint - the endpoint to check
-     * @return true if the endpoint can join the cluster
+     * @param localHostUUID - the host id to check
+     * @param isBootstrapping - whether the node intends to bootstrap when joining
+     * @return true if it is safe to start the node, false otherwise
      */
-    public boolean isSafeForBootstrap(InetAddress endpoint)
+    public boolean isSafeForStartup(InetAddress endpoint, UUID localHostUUID, boolean isBootstrapping)
     {
         EndpointState epState = endpointStateMap.get(endpoint);
-
         // if there's no previous state, or the node was previously removed from the cluster, we're good
         if (epState == null || isDeadState(epState))
             return true;
 
-        String status = getGossipStatus(epState);
-
-        // these states are not allowed to join the cluster as it would not be safe
-        final List<String> unsafeStatuses = new ArrayList<String>() {{
-            add(""); // failed bootstrap but we did start gossiping
-            add(VersionedValue.STATUS_NORMAL); // node is legit in the cluster or it was stopped with kill -9
-            add(VersionedValue.SHUTDOWN); }}; // node was shutdown
-        return !unsafeStatuses.contains(status);
+        if (isBootstrapping)
+        {
+            String status = getGossipStatus(epState);
+            // these states are not allowed to join the cluster as it would not be safe
+            final List<String> unsafeStatuses = new ArrayList<String>()
+            {{
+                add("");                           // failed bootstrap but we did start gossiping
+                add(VersionedValue.STATUS_NORMAL); // node is legit in the cluster or it was stopped with kill -9
+                add(VersionedValue.SHUTDOWN);      // node was shutdown
+            }};
+            return !unsafeStatuses.contains(status);
+        }
+        else
+        {
+            // if the previous UUID matches what we currently have (i.e. what was read from
+            // system.local at startup), then we're good to start up. Otherwise, something
+            // is amiss and we need to replace the previous node
+            VersionedValue previous = epState.getApplicationState(ApplicationState.HOST_ID);
+            return UUID.fromString(previous.value).equals(localHostUUID);
+        }
     }
 
     private void doStatusCheck()
@@ -1318,11 +1338,19 @@
 
     /**
      *  Do a single 'shadow' round of gossip, where we do not modify any state
-     *  Only used when replacing a node, to get and assume its states
+     *  Used when preparing to join the ring:
+     *      * when replacing a node, to get and assume its tokens
+     *      * when joining, to check that the local host id matches any previous id for the endpoint address
      */
     public void doShadowRound()
     {
         buildSeedsList();
+        // it may be that the local address is the only entry in the seed
+        // list, in which case attempting a shadow round is pointless
+        if (seeds.isEmpty())
+            return;
+
+        seedsInShadowRound.clear();
         // send a completely empty syn
         List<GossipDigest> gDigests = new ArrayList<GossipDigest>();
         GossipDigestSyn digestSynMessage = new GossipDigestSyn(DatabaseDescriptor.getClusterName(),
@@ -1341,6 +1369,7 @@
                 if (slept % 5000 == 0)
                 { // CASSANDRA-8072, retry at the beginning and every 5 seconds
                     logger.trace("Sending shadow round GOSSIP DIGEST SYN to seeds {}", seeds);
+
                     for (InetAddress seed : seeds)
                         MessagingService.instance().sendOneWay(message, seed);
                 }
@@ -1351,7 +1380,15 @@
 
                 slept += 1000;
                 if (slept > StorageService.RING_DELAY)
-                    throw new RuntimeException("Unable to gossip with any seeds");
+                {
+                    // if we don't consider ourselves to be a seed, fail out
+                    if (!DatabaseDescriptor.getSeeds().contains(FBUtilities.getBroadcastAddress()))
+                        throw new RuntimeException("Unable to gossip with any seeds");
+
+                    logger.warn("Unable to gossip with any seeds but continuing since node is in its own seed list");
+                    inShadowRound = false;
+                    break;
+                }
             }
         }
         catch (InterruptedException wtf)
@@ -1478,10 +1515,33 @@
         return (scheduledGossipTask != null) && (!scheduledGossipTask.isCancelled());
     }
 
-    protected void finishShadowRound()
+    protected void maybeFinishShadowRound(InetAddress respondent, boolean isInShadowRound)
     {
         if (inShadowRound)
-            inShadowRound = false;
+        {
+            if (!isInShadowRound)
+            {
+                logger.debug("Received a regular ack from {}, can now exit shadow round", respondent);
+                // respondent sent back a full ack, so we can exit our shadow round
+                inShadowRound = false;
+                seedsInShadowRound.clear();
+            }
+            else
+            {
+                // respondent indicates it too is in a shadow round; if all seeds
+                // are in this state then we can exit our shadow round. Otherwise,
+                // we keep retrying the SR until one responds with a full ACK or
+                // we learn that all seeds are in SR.
+                logger.debug("Received an ack from {} indicating it is also in shadow round", respondent);
+                seedsInShadowRound.add(respondent);
+                if (seedsInShadowRound.containsAll(seeds))
+                {
+                    logger.debug("All seeds are in a shadow round, clearing this node to exit its own");
+                    inShadowRound = false;
+                    seedsInShadowRound.clear();
+                }
+            }
+        }
     }
 
     protected boolean isInShadowRound()
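
A hedged, simplified sketch of the startup-time check these changes enable; the wiring shown here is illustrative and not the exact StorageService code:

    import java.util.UUID;
    import org.apache.cassandra.db.SystemKeyspace;
    import org.apache.cassandra.gms.Gossiper;
    import org.apache.cassandra.utils.FBUtilities;

    // Hedged sketch: before joining, run a shadow round to learn any previous gossip state for
    // this address, then refuse to start a non-bootstrapping node whose host id does not match
    // the one previously seen for the same endpoint.
    UUID localHostId = SystemKeyspace.getLocalHostId();
    Gossiper.instance.doShadowRound();
    if (!Gossiper.instance.isSafeForStartup(FBUtilities.getBroadcastAddress(), localHostId, false))
        throw new RuntimeException("A node with this address already exists; use replace_address to supersede it");
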
diff --git a/src/java/org/apache/cassandra/gms/TokenSerializer.java b/src/java/org/apache/cassandra/gms/TokenSerializer.java
index 1404258..41bd821 100644
--- a/src/java/org/apache/cassandra/gms/TokenSerializer.java
+++ b/src/java/org/apache/cassandra/gms/TokenSerializer.java
@@ -19,6 +19,8 @@
 
 import org.apache.cassandra.dht.IPartitioner;
 import org.apache.cassandra.dht.Token;
+import org.apache.cassandra.utils.ByteBufferUtil;
+import org.apache.cassandra.utils.FBUtilities;
 
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
@@ -37,9 +39,9 @@
     {
         for (Token token : tokens)
         {
-            byte[] bintoken = partitioner.getTokenFactory().toByteArray(token).array();
-            out.writeInt(bintoken.length);
-            out.write(bintoken);
+            ByteBuffer tokenBuffer = partitioner.getTokenFactory().toByteArray(token);
+            assert tokenBuffer.arrayOffset() == 0;
+            ByteBufferUtil.writeWithLength(tokenBuffer.array(), out);
         }
         out.writeInt(0);
     }
@@ -52,7 +54,7 @@
             int size = in.readInt();
             if (size < 1)
                 break;
-            logger.trace("Reading token of {} bytes", size);
+            logger.trace("Reading token of {}", FBUtilities.prettyPrintMemory(size));
             byte[] bintoken = new byte[size];
             in.readFully(bintoken);
             tokens.add(partitioner.getTokenFactory().fromByteArray(ByteBuffer.wrap(bintoken)));
diff --git a/src/java/org/apache/cassandra/hadoop/cql3/CqlBulkOutputFormat.java b/src/java/org/apache/cassandra/hadoop/cql3/CqlBulkOutputFormat.java
index 051447c..dfdf855 100644
--- a/src/java/org/apache/cassandra/hadoop/cql3/CqlBulkOutputFormat.java
+++ b/src/java/org/apache/cassandra/hadoop/cql3/CqlBulkOutputFormat.java
@@ -45,10 +45,10 @@
  * As is the case with the {@link org.apache.cassandra.hadoop.cql3.CqlOutputFormat}, 
  * you need to set the prepared statement in your
  * Hadoop job Configuration. The {@link CqlConfigHelper} class, through its
- * {@link ConfigHelper#setOutputPreparedStatement} method, is provided to make this
+ * {@link org.apache.cassandra.hadoop.ConfigHelper#setOutputPreparedStatement} method, is provided to make this
  * simple.
 * You also need to set the Keyspace. The {@link ConfigHelper} class, through its
- * {@link ConfigHelper#setOutputColumnFamily} method, is provided to make this
+ * {@link org.apache.cassandra.hadoop.ConfigHelper#setOutputColumnFamily} method, is provided to make this
  * simple.
  * </p>
  */
diff --git a/src/java/org/apache/cassandra/hadoop/cql3/CqlConfigHelper.java b/src/java/org/apache/cassandra/hadoop/cql3/CqlConfigHelper.java
index 757be65..f76f7d9 100644
--- a/src/java/org/apache/cassandra/hadoop/cql3/CqlConfigHelper.java
+++ b/src/java/org/apache/cassandra/hadoop/cql3/CqlConfigHelper.java
@@ -529,13 +529,14 @@
     public static Optional<SSLOptions> getSSLOptions(Configuration conf)
     {
         Optional<String> truststorePath = getInputNativeSSLTruststorePath(conf);
-        Optional<String> keystorePath = getInputNativeSSLKeystorePath(conf);
-        Optional<String> truststorePassword = getInputNativeSSLTruststorePassword(conf);
-        Optional<String> keystorePassword = getInputNativeSSLKeystorePassword(conf);
-        Optional<String> cipherSuites = getInputNativeSSLCipherSuites(conf);
 
         if (truststorePath.isPresent())
         {
+            Optional<String> keystorePath = getInputNativeSSLKeystorePath(conf);
+            Optional<String> truststorePassword = getInputNativeSSLTruststorePassword(conf);
+            Optional<String> keystorePassword = getInputNativeSSLKeystorePassword(conf);
+            Optional<String> cipherSuites = getInputNativeSSLCipherSuites(conf);
+
             SSLContext context;
             try
             {
diff --git a/src/java/org/apache/cassandra/hadoop/cql3/CqlInputFormat.java b/src/java/org/apache/cassandra/hadoop/cql3/CqlInputFormat.java
index a426532..daba701 100644
--- a/src/java/org/apache/cassandra/hadoop/cql3/CqlInputFormat.java
+++ b/src/java/org/apache/cassandra/hadoop/cql3/CqlInputFormat.java
@@ -213,7 +213,7 @@
                 metadata.newToken(partitioner.getTokenFactory().toString(range.right)));
     }
 
-    private Map<TokenRange, Long> getSubSplits(String keyspace, String cfName, TokenRange range, Configuration conf, Session session) throws IOException
+    private Map<TokenRange, Long> getSubSplits(String keyspace, String cfName, TokenRange range, Configuration conf, Session session)
     {
         int splitSize = ConfigHelper.getInputSplitSize(conf);
         int splitSizeMb = ConfigHelper.getInputSplitSizeInMb(conf);
diff --git a/src/java/org/apache/cassandra/hints/ChecksummedDataInput.java b/src/java/org/apache/cassandra/hints/ChecksummedDataInput.java
index 095d7f4..8bb5b6d 100644
--- a/src/java/org/apache/cassandra/hints/ChecksummedDataInput.java
+++ b/src/java/org/apache/cassandra/hints/ChecksummedDataInput.java
@@ -22,9 +22,11 @@
 import java.nio.ByteBuffer;
 import java.util.zip.CRC32;
 
-import org.apache.cassandra.io.util.ChannelProxy;
-import org.apache.cassandra.io.util.DataPosition;
-import org.apache.cassandra.io.util.RandomAccessReader;
+import com.google.common.base.Preconditions;
+
+import org.apache.cassandra.io.compress.BufferType;
+import org.apache.cassandra.io.util.*;
+import org.apache.cassandra.utils.memory.BufferPool;
 
 /**
 * A {@link RandomAccessReader} wrapper that calculates the CRC in place.
@@ -37,35 +39,55 @@
  * corrupted sequence by reading a huge corrupted length of bytes via
 * {@link org.apache.cassandra.utils.ByteBufferUtil#readWithLength(java.io.DataInput)}.
  */
-public class ChecksummedDataInput extends RandomAccessReader.RandomAccessReaderWithOwnChannel
+public class ChecksummedDataInput extends RebufferingInputStream
 {
     private final CRC32 crc;
     private int crcPosition;
     private boolean crcUpdateDisabled;
 
     private long limit;
-    private DataPosition limitMark;
+    private long limitMark;
 
-    protected ChecksummedDataInput(Builder builder)
+    protected long bufferOffset;
+    protected final ChannelProxy channel;
+
+    ChecksummedDataInput(ChannelProxy channel, BufferType bufferType)
     {
-        super(builder);
+        super(BufferPool.get(RandomAccessReader.DEFAULT_BUFFER_SIZE, bufferType));
 
         crc = new CRC32();
         crcPosition = 0;
         crcUpdateDisabled = false;
+        this.channel = channel;
+        bufferOffset = 0;
+        buffer.limit(0);
 
         resetLimit();
     }
 
-    @SuppressWarnings("resource")   // channel owned by RandomAccessReaderWithOwnChannel
-    public static ChecksummedDataInput open(File file)
+    ChecksummedDataInput(ChannelProxy channel)
     {
-        return new Builder(new ChannelProxy(file)).build();
+        this(channel, BufferType.OFF_HEAP);
     }
 
-    protected void releaseBuffer()
+    @SuppressWarnings("resource")
+    public static ChecksummedDataInput open(File file)
     {
-        super.releaseBuffer();
+        return new ChecksummedDataInput(new ChannelProxy(file));
+    }
+
+    public boolean isEOF()
+    {
+        return getPosition() == channel.size();
+    }
+
+    /**
+     * Returns the position in the source file, which differs from getPosition() for compressed/encrypted files
+     * and may be imprecise.
+     */
+    public long getSourcePosition()
+    {
+        return getPosition();
     }
 
     public void resetCrc()
@@ -76,29 +98,34 @@
 
     public void limit(long newLimit)
     {
-        limit = newLimit;
-        limitMark = mark();
+        limitMark = getPosition();
+        limit = limitMark + newLimit;
+    }
+
+    /**
+     * Returns the exact position in the uncompressed view of the file.
+     */
+    protected long getPosition()
+    {
+        return bufferOffset + buffer.position();
     }
 
     public void resetLimit()
     {
         limit = Long.MAX_VALUE;
-        limitMark = null;
+        limitMark = -1;
     }
 
     public void checkLimit(int length) throws IOException
     {
-        if (limitMark == null)
-            return;
-
-        if ((bytesPastLimit() + length) > limit)
+        if (getPosition() + length > limit)
             throw new IOException("Digest mismatch exception");
     }
 
     public long bytesPastLimit()
     {
-        assert limitMark != null;
-        return bytesPastMark(limitMark);
+        assert limitMark != -1;
+        return getPosition() - limitMark;
     }
 
     public boolean checkCrc() throws IOException
@@ -134,13 +161,24 @@
     }
 
     @Override
-    public void reBuffer()
+    protected void reBuffer()
     {
+        Preconditions.checkState(buffer.remaining() == 0);
         updateCrc();
-        super.reBuffer();
+        bufferOffset += buffer.limit();
+
+        readBuffer();
+
         crcPosition = buffer.position();
     }
 
+    protected void readBuffer()
+    {
+        buffer.clear();
+        while ((channel.read(buffer, bufferOffset)) == 0) {}
+        buffer.flip();
+    }
+
     private void updateCrc()
     {
         if (crcPosition == buffer.position() || crcUpdateDisabled)
@@ -155,16 +193,20 @@
         crc.update(unprocessed);
     }
 
-    public static class Builder extends RandomAccessReader.Builder
+    @Override
+    public void close()
     {
-        public Builder(ChannelProxy channel)
-        {
-            super(channel);
-        }
+        BufferPool.put(buffer);
+        channel.close();
+    }
 
-        public ChecksummedDataInput build()
-        {
-            return new ChecksummedDataInput(this);
-        }
+    protected String getPath()
+    {
+        return channel.filePath();
+    }
+
+    public ChannelProxy getChannel()
+    {
+        return channel;
     }
 }
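
A hedged sketch of the calling pattern the limit/checkLimit machinery supports: bounding the next read by a length prefix so a corrupted size cannot drag the reader past the record (the record layout here is illustrative):

    import java.io.File;
    import java.io.IOException;
    import org.apache.cassandra.hints.ChecksummedDataInput;

    // Hedged sketch: read one length-prefixed record; reads beyond the declared size
    // fail the limit check instead of running off into corrupted bytes.
    static byte[] readOneRecord(File hintsFile) throws IOException
    {
        try (ChecksummedDataInput input = ChecksummedDataInput.open(hintsFile))
        {
            input.resetCrc();
            int size = input.readInt();       // declared size of the next record
            input.limit(size);                // at most 'size' further bytes may now be read
            input.checkLimit(size);           // verifies reading 'size' bytes stays within the limit
            byte[] record = new byte[size];
            input.readFully(record);
            return record;
        }
    }
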
diff --git a/src/java/org/apache/cassandra/hints/CompressedChecksummedDataInput.java b/src/java/org/apache/cassandra/hints/CompressedChecksummedDataInput.java
index cc4a6bd..f584dd1 100644
--- a/src/java/org/apache/cassandra/hints/CompressedChecksummedDataInput.java
+++ b/src/java/org/apache/cassandra/hints/CompressedChecksummedDataInput.java
@@ -21,6 +21,8 @@
 import java.io.IOException;
 import java.nio.ByteBuffer;
 
+import com.google.common.annotations.VisibleForTesting;
+
 import org.apache.cassandra.io.FSReadError;
 import org.apache.cassandra.io.compress.ICompressor;
 import org.apache.cassandra.io.util.ChannelProxy;
@@ -33,13 +35,11 @@
     private volatile ByteBuffer compressedBuffer = null;
     private final ByteBuffer metadataBuffer = ByteBuffer.allocate(CompressedHintsWriter.METADATA_SIZE);
 
-    public CompressedChecksummedDataInput(Builder builder)
+    public CompressedChecksummedDataInput(ChannelProxy channel, ICompressor compressor, long filePosition)
     {
-        super(builder);
-        assert regions == null;  //mmapped regions are not supported
-
-        compressor = builder.compressor;
-        filePosition = builder.position;
+        super(channel, compressor.preferredBufferType());
+        this.compressor = compressor;
+        this.filePosition = filePosition;
     }
 
     /**
@@ -51,7 +51,13 @@
         return filePosition == channel.size() && buffer.remaining() == 0;
     }
 
-    protected void reBufferStandard()
+    public long getSourcePosition()
+    {
+        return filePosition;
+    }
+
+    @Override
+    protected void readBuffer()
     {
         metadataBuffer.clear();
         channel.read(metadataBuffer, filePosition);
@@ -68,7 +74,7 @@
             {
                 BufferPool.put(compressedBuffer);
             }
-            compressedBuffer = allocateBuffer(bufferSize, compressor.preferredBufferType());
+            compressedBuffer = BufferPool.get(bufferSize, compressor.preferredBufferType());
         }
 
         compressedBuffer.clear();
@@ -77,12 +83,11 @@
         compressedBuffer.rewind();
         filePosition += compressedSize;
 
-        bufferOffset += buffer.position();
         if (buffer.capacity() < uncompressedSize)
         {
             int bufferSize = uncompressedSize + (uncompressedSize / 20);
             BufferPool.put(buffer);
-            buffer = allocateBuffer(bufferSize, compressor.preferredBufferType());
+            buffer = BufferPool.get(bufferSize, compressor.preferredBufferType());
         }
 
         buffer.clear();
@@ -98,63 +103,25 @@
         }
     }
 
-    protected void releaseBuffer()
+    @Override
+    public void close()
     {
-        super.releaseBuffer();
-        if (compressedBuffer != null)
-        {
-            BufferPool.put(compressedBuffer);
-            compressedBuffer = null;
-        }
+        BufferPool.put(compressedBuffer);
+        super.close();
     }
 
-    protected void reBufferMmap()
-    {
-        throw new UnsupportedOperationException();
-    }
-
-    public static final class Builder extends ChecksummedDataInput.Builder
-    {
-        private long position;
-        private ICompressor compressor;
-
-        public Builder(ChannelProxy channel)
-        {
-            super(channel);
-            bufferType = null;
-        }
-
-        public CompressedChecksummedDataInput build()
-        {
-            assert position >= 0;
-            assert compressor != null;
-            return new CompressedChecksummedDataInput(this);
-        }
-
-        public Builder withCompressor(ICompressor compressor)
-        {
-            this.compressor = compressor;
-            bufferType = compressor.preferredBufferType();
-            return this;
-        }
-
-        public Builder withPosition(long position)
-        {
-            this.position = position;
-            return this;
-        }
-    }
-
-    // Closing the CompressedChecksummedDataInput will close the underlying channel.
-    @SuppressWarnings("resource")
-    public static final CompressedChecksummedDataInput upgradeInput(ChecksummedDataInput input, ICompressor compressor)
+    @SuppressWarnings("resource") // Closing the ChecksummedDataInput will close the underlying channel.
+    public static ChecksummedDataInput upgradeInput(ChecksummedDataInput input, ICompressor compressor)
     {
         long position = input.getPosition();
         input.close();
 
-        Builder builder = new Builder(new ChannelProxy(input.getPath()));
-        builder.withPosition(position);
-        builder.withCompressor(compressor);
-        return builder.build();
+        return new CompressedChecksummedDataInput(new ChannelProxy(input.getPath()), compressor, position);
+    }
+
+    @VisibleForTesting
+    ICompressor getCompressor()
+    {
+        return compressor;
     }
 }
diff --git a/src/java/org/apache/cassandra/hints/CompressedHintsWriter.java b/src/java/org/apache/cassandra/hints/CompressedHintsWriter.java
index 491dceb..8792e32 100644
--- a/src/java/org/apache/cassandra/hints/CompressedHintsWriter.java
+++ b/src/java/org/apache/cassandra/hints/CompressedHintsWriter.java
@@ -24,6 +24,8 @@
 import java.nio.channels.FileChannel;
 import java.util.zip.CRC32;
 
+import com.google.common.annotations.VisibleForTesting;
+
 import org.apache.cassandra.io.compress.ICompressor;
 
 public class CompressedHintsWriter extends HintsWriter
@@ -64,4 +66,10 @@
         compressionBuffer.limit(compressedSize + METADATA_SIZE);
         super.writeBuffer(compressionBuffer);
     }
+
+    @VisibleForTesting
+    ICompressor getCompressor()
+    {
+        return compressor;
+    }
 }
diff --git a/src/java/org/apache/cassandra/hints/EncryptedChecksummedDataInput.java b/src/java/org/apache/cassandra/hints/EncryptedChecksummedDataInput.java
new file mode 100644
index 0000000..7ecfbfe
--- /dev/null
+++ b/src/java/org/apache/cassandra/hints/EncryptedChecksummedDataInput.java
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.hints;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import javax.crypto.Cipher;
+
+import com.google.common.annotations.VisibleForTesting;
+
+import org.apache.cassandra.security.EncryptionUtils;
+import org.apache.cassandra.io.FSReadError;
+import org.apache.cassandra.io.compress.ICompressor;
+import org.apache.cassandra.io.util.ChannelProxy;
+
+public class EncryptedChecksummedDataInput extends ChecksummedDataInput
+{
+    private static final ThreadLocal<ByteBuffer> reusableBuffers = new ThreadLocal<ByteBuffer>()
+    {
+        protected ByteBuffer initialValue()
+        {
+            return ByteBuffer.allocate(0);
+        }
+    };
+
+    private final Cipher cipher;
+    private final ICompressor compressor;
+
+    private final EncryptionUtils.ChannelProxyReadChannel readChannel;
+
+    protected EncryptedChecksummedDataInput(ChannelProxy channel, Cipher cipher, ICompressor compressor, long filePosition)
+    {
+        super(channel);
+        this.cipher = cipher;
+        this.compressor = compressor;
+        readChannel = new EncryptionUtils.ChannelProxyReadChannel(channel, filePosition);
+        assert cipher != null;
+        assert compressor != null;
+    }
+
+    /**
+     * Since an entire block of compressed data is read off disk at a time, not just a single hint,
+     * we don't report EOF until the decompressed data has also been read completely
+     */
+    public boolean isEOF()
+    {
+        return getSourcePosition() == channel.size() && buffer.remaining() == 0;
+    }
+
+    public long getSourcePosition()
+    {
+        return readChannel.getCurrentPosition();
+    }
+
+    @Override
+    protected void readBuffer()
+    {
+        try
+        {
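+            // Decrypt the next block from the channel into a reusable thread-local scratch buffer,
+            // then decompress it into the buffer served up by the parent ChecksummedDataInput.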
+            ByteBuffer byteBuffer = reusableBuffers.get();
+            ByteBuffer decrypted = EncryptionUtils.decrypt(readChannel, byteBuffer, true, cipher);
+            buffer = EncryptionUtils.uncompress(decrypted, buffer, true, compressor);
+
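+            // Hold on to the larger scratch buffer so subsequent blocks can reuse it without re-allocating.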
+            if (decrypted.capacity() > byteBuffer.capacity())
+                reusableBuffers.set(decrypted);
+        }
+        catch (IOException ioe)
+        {
+            throw new FSReadError(ioe, getPath());
+        }
+    }
+
+    @SuppressWarnings("resource")
+    public static ChecksummedDataInput upgradeInput(ChecksummedDataInput input, Cipher cipher, ICompressor compressor)
+    {
+        long position = input.getPosition();
+        input.close();
+
+        return new EncryptedChecksummedDataInput(new ChannelProxy(input.getPath()), cipher, compressor, position);
+    }
+
+    @VisibleForTesting
+    Cipher getCipher()
+    {
+        return cipher;
+    }
+
+    @VisibleForTesting
+    ICompressor getCompressor()
+    {
+        return compressor;
+    }
+}
diff --git a/src/java/org/apache/cassandra/hints/EncryptedHintsWriter.java b/src/java/org/apache/cassandra/hints/EncryptedHintsWriter.java
new file mode 100644
index 0000000..4786d9c
--- /dev/null
+++ b/src/java/org/apache/cassandra/hints/EncryptedHintsWriter.java
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.hints;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.util.zip.CRC32;
+import javax.crypto.Cipher;
+
+import com.google.common.annotations.VisibleForTesting;
+
+import org.apache.cassandra.security.EncryptionUtils;
+import org.apache.cassandra.io.compress.ICompressor;
+
+import static org.apache.cassandra.utils.FBUtilities.updateChecksum;
+
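+/**
+ * A {@link HintsWriter} for encrypted hints files: each buffer is compressed and then encrypted
+ * before being written to the channel, with the global CRC computed over the encrypted output.
+ */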
+public class EncryptedHintsWriter extends HintsWriter
+{
+    private final Cipher cipher;
+    private final ICompressor compressor;
+    private volatile ByteBuffer byteBuffer;
+
+    protected EncryptedHintsWriter(File directory, HintsDescriptor descriptor, File file, FileChannel channel, int fd, CRC32 globalCRC)
+    {
+        super(directory, descriptor, file, channel, fd, globalCRC);
+        cipher = descriptor.getCipher();
+        compressor = descriptor.createCompressor();
+    }
+
+    protected void writeBuffer(ByteBuffer input) throws IOException
+    {
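+        // Compress the serialized hints first, then encrypt and write the result; the global CRC
+        // is computed over the encrypted bytes that actually reach the channel.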
+        byteBuffer = EncryptionUtils.compress(input, byteBuffer, true, compressor);
+        ByteBuffer output = EncryptionUtils.encryptAndWrite(byteBuffer, channel, true, cipher);
+        updateChecksum(globalCRC, output);
+    }
+
+    @VisibleForTesting
+    Cipher getCipher()
+    {
+        return cipher;
+    }
+
+    @VisibleForTesting
+    ICompressor getCompressor()
+    {
+        return compressor;
+    }
+}
diff --git a/src/java/org/apache/cassandra/hints/HintVerbHandler.java b/src/java/org/apache/cassandra/hints/HintVerbHandler.java
index d8838a9..4fbd496 100644
--- a/src/java/org/apache/cassandra/hints/HintVerbHandler.java
+++ b/src/java/org/apache/cassandra/hints/HintVerbHandler.java
@@ -68,7 +68,7 @@
         }
         catch (MarshalException e)
         {
-            logger.warn("Failed to validate a hint for {} (table id {}) - skipped", hostId);
+            logger.warn("Failed to validate a hint for {} - skipped", hostId);
             reply(id, message.from);
             return;
         }
diff --git a/src/java/org/apache/cassandra/hints/HintsDescriptor.java b/src/java/org/apache/cassandra/hints/HintsDescriptor.java
index f5296b3..8a3ee8b 100644
--- a/src/java/org/apache/cassandra/hints/HintsDescriptor.java
+++ b/src/java/org/apache/cassandra/hints/HintsDescriptor.java
@@ -22,15 +22,20 @@
 import java.io.RandomAccessFile;
 import java.nio.charset.StandardCharsets;
 import java.nio.file.Path;
+import java.util.HashMap;
 import java.util.Map;
 import java.util.UUID;
 import java.util.regex.Pattern;
 import java.util.zip.CRC32;
+import javax.crypto.Cipher;
 
 import com.google.common.base.MoreObjects;
 import com.google.common.base.Objects;
 import com.google.common.collect.ImmutableMap;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
+import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.config.ParameterizedClass;
 import org.apache.cassandra.db.TypeSizes;
 import org.apache.cassandra.io.FSReadError;
@@ -38,6 +43,8 @@
 import org.apache.cassandra.io.util.DataOutputPlus;
 import org.apache.cassandra.net.MessagingService;
 import org.apache.cassandra.schema.CompressionParams;
+import org.apache.cassandra.security.EncryptionContext;
+import org.apache.cassandra.utils.Hex;
 import org.json.simple.JSONValue;
 
 import static org.apache.cassandra.utils.FBUtilities.updateChecksumInt;
@@ -50,10 +57,13 @@
  */
 final class HintsDescriptor
 {
+    private static final Logger logger = LoggerFactory.getLogger(HintsDescriptor.class);
+
     static final int VERSION_30 = 1;
     static final int CURRENT_VERSION = VERSION_30;
 
     static final String COMPRESSION = "compression";
+    static final String ENCRYPTION = "encryption";
 
     static final Pattern pattern =
         Pattern.compile("^[a-fA-F0-9]{8}\\-[a-fA-F0-9]{4}\\-[a-fA-F0-9]{4}\\-[a-fA-F0-9]{4}\\-[a-fA-F0-9]{12}\\-(\\d+)\\-(\\d+)\\.hints$");
@@ -62,17 +72,35 @@
     final int version;
     final long timestamp;
 
-    // implemented for future compression support - see CASSANDRA-9428
     final ImmutableMap<String, Object> parameters;
     final ParameterizedClass compressionConfig;
 
+    private final Cipher cipher;
+    private final ICompressor compressor;
+
     HintsDescriptor(UUID hostId, int version, long timestamp, ImmutableMap<String, Object> parameters)
     {
         this.hostId = hostId;
         this.version = version;
         this.timestamp = timestamp;
-        this.parameters = parameters;
         compressionConfig = createCompressionConfig(parameters);
+
+        EncryptionData encryption = createEncryption(parameters);
+        if (encryption == null)
+        {
+            cipher = null;
+            compressor = null;
+        }
+        else
+        {
+            if (compressionConfig != null)
+                throw new IllegalStateException("a hints file cannot be configured for both compression and encryption");
+            cipher = encryption.cipher;
+            compressor = encryption.compressor;
+            parameters = encryption.params;
+        }
+
+        this.parameters = parameters;
     }
 
     HintsDescriptor(UUID hostId, long timestamp, ImmutableMap<String, Object> parameters)
@@ -100,6 +128,71 @@
         }
     }
 
+    /**
+     * Create, if necessary, the required encryption components (for either decrypt or encrypt operations).
+     * Note that in the case of encryption (that is, when writing out a new hints file), we need to write
+     * the cipher's IV out to the header so it can be used when decrypting. Thus, we need to add an additional
+     * entry to the {@code params} map.
+     *
+     * @param params the base parameters for the descriptor.
+     * @return null if not using encryption; else, the initialized {@link Cipher} and a possibly updated version
+     * of the {@code params} map.
+     */
+    @SuppressWarnings("unchecked")
+    static EncryptionData createEncryption(ImmutableMap<String, Object> params)
+    {
+        if (params.containsKey(ENCRYPTION))
+        {
+            Map<?, ?> encryptionConfig = (Map<?, ?>) params.get(ENCRYPTION);
+            EncryptionContext encryptionContext = EncryptionContext.createFromMap(encryptionConfig, DatabaseDescriptor.getEncryptionContext());
+
+            try
+            {
+                Cipher cipher;
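+                // An IV already present in the stored parameters means we are reading an existing file,
+                // so build a decryptor; otherwise we are writing a new file, so create an encryptor and
+                // record its IV in the descriptor parameters for later decryption.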
+                if (encryptionConfig.containsKey(EncryptionContext.ENCRYPTION_IV))
+                {
+                    cipher = encryptionContext.getDecryptor();
+                }
+                else
+                {
+                    cipher = encryptionContext.getEncryptor();
+                    ImmutableMap<String, Object> encParams = ImmutableMap.<String, Object>builder()
+                                                                 .putAll(encryptionContext.toHeaderParameters())
+                                                                 .put(EncryptionContext.ENCRYPTION_IV, Hex.bytesToHex(cipher.getIV()))
+                                                                 .build();
+
+                    Map<String, Object> map = new HashMap<>(params);
+                    map.put(ENCRYPTION, encParams);
+                    params = ImmutableMap.<String, Object>builder().putAll(map).build();
+                }
+                return new EncryptionData(cipher, encryptionContext.getCompressor(), params);
+            }
+            catch (IOException ioe)
+            {
+                logger.warn("Failed to create encryption context for hints file; ignoring encryption for hints.", ioe);
+                return null;
+            }
+        }
+        else
+        {
+            return null;
+        }
+    }
+
+    private static final class EncryptionData
+    {
+        final Cipher cipher;
+        final ICompressor compressor;
+        final ImmutableMap<String, Object> params;
+
+        private EncryptionData(Cipher cipher, ICompressor compressor, ImmutableMap<String, Object> params)
+        {
+            this.cipher = cipher;
+            this.compressor = compressor;
+            this.params = params;
+        }
+    }
+
     String fileName()
     {
         return String.format("%s-%s-%s.hints", hostId, timestamp, version);
@@ -148,9 +241,23 @@
         return compressionConfig != null;
     }
 
+    public boolean isEncrypted()
+    {
+        return cipher != null;
+    }
+
     public ICompressor createCompressor()
     {
-        return isCompressed() ? CompressionParams.createCompressor(compressionConfig) : null;
+        if (isCompressed())
+            return CompressionParams.createCompressor(compressionConfig);
+        if (isEncrypted())
+            return compressor;
+        return null;
+    }
+
+    public Cipher getCipher()
+    {
+        return isEncrypted() ? cipher : null;
     }
 
     @Override
diff --git a/src/java/org/apache/cassandra/hints/HintsReader.java b/src/java/org/apache/cassandra/hints/HintsReader.java
index fe2b57a..5e73805 100644
--- a/src/java/org/apache/cassandra/hints/HintsReader.java
+++ b/src/java/org/apache/cassandra/hints/HintsReader.java
@@ -83,6 +83,8 @@
                 // The compressed input is instantiated with the uncompressed input's position
                 reader = CompressedChecksummedDataInput.upgradeInput(reader, descriptor.createCompressor());
             }
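+            // Encrypted hints are wrapped in a decrypting (and decompressing) input, again picking up at the plain input's position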
+            else if (descriptor.isEncrypted())
+                reader = EncryptedChecksummedDataInput.upgradeInput(reader, descriptor.getCipher(), descriptor.createCompressor());
             return new HintsReader(descriptor, file, reader, rateLimiter);
         }
         catch (IOException e)
@@ -109,7 +111,7 @@
 
     void seek(long newPosition)
     {
-        input.seek(newPosition);
+        throw new UnsupportedOperationException("Hints are not seekable.");
     }
 
     public Iterator<Page> iterator()
@@ -147,12 +149,12 @@
         @SuppressWarnings("resource")
         protected Page computeNext()
         {
-            CLibrary.trySkipCache(input.getChannel().getFileDescriptor(), 0, input.getFilePointer(), input.getPath());
+            CLibrary.trySkipCache(input.getChannel().getFileDescriptor(), 0, input.getSourcePosition(), input.getPath());
 
             if (input.isEOF())
                 return endOfData();
 
-            return new Page(input.getFilePointer());
+            return new Page(input.getSourcePosition());
         }
     }
 
@@ -175,7 +177,7 @@
 
             do
             {
-                long position = input.getFilePointer();
+                long position = input.getSourcePosition();
 
                 if (input.isEOF())
                     return endOfData(); // reached EOF
@@ -265,7 +267,7 @@
 
             do
             {
-                long position = input.getFilePointer();
+                long position = input.getSourcePosition();
 
                 if (input.isEOF())
                     return endOfData(); // reached EOF
diff --git a/src/java/org/apache/cassandra/hints/HintsWriter.java b/src/java/org/apache/cassandra/hints/HintsWriter.java
index 8836258..ae9e05a 100644
--- a/src/java/org/apache/cassandra/hints/HintsWriter.java
+++ b/src/java/org/apache/cassandra/hints/HintsWriter.java
@@ -33,7 +33,6 @@
 import org.apache.cassandra.io.FSWriteError;
 import org.apache.cassandra.io.util.DataOutputBuffer;
 import org.apache.cassandra.io.util.DataOutputBufferFixed;
-import org.apache.cassandra.net.MessagingService;
 import org.apache.cassandra.utils.CLibrary;
 import org.apache.cassandra.utils.SyncUtil;
 import org.apache.cassandra.utils.Throwables;
@@ -49,9 +48,9 @@
     private final File directory;
     private final HintsDescriptor descriptor;
     private final File file;
-    private final FileChannel channel;
+    protected final FileChannel channel;
     private final int fd;
-    private final CRC32 globalCRC;
+    protected final CRC32 globalCRC;
 
     private volatile long lastSyncPosition = 0L;
 
@@ -75,7 +74,8 @@
 
         CRC32 crc = new CRC32();
 
-        try (DataOutputBuffer dob = new DataOutputBuffer())
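+        // Borrow a pooled buffer from the recycler for serializing the descriptor; it is returned in the finally block below.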
+        DataOutputBuffer dob = null;
+        try (DataOutputBuffer ignored = dob = DataOutputBuffer.RECYCLER.get())
         {
             // write the descriptor
             descriptor.serialize(dob);
@@ -88,15 +88,16 @@
             channel.close();
             throw e;
         }
+        finally
+        {
+            dob.recycle();
+        }
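+        // Choose the writer implementation that matches how the descriptor says the file body is stored.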
 
+        if (descriptor.isEncrypted())
+            return new EncryptedHintsWriter(directory, descriptor, file, channel, fd, crc);
         if (descriptor.isCompressed())
-        {
             return new CompressedHintsWriter(directory, descriptor, file, channel, fd, crc);
-        }
-        else
-        {
-            return new HintsWriter(directory, descriptor, file, channel, fd, crc);
-        }
+        return new HintsWriter(directory, descriptor, file, channel, fd, crc);
     }
 
     HintsDescriptor descriptor()
diff --git a/src/java/org/apache/cassandra/hints/LegacyHintsMigrator.java b/src/java/org/apache/cassandra/hints/LegacyHintsMigrator.java
index 30e5fe0..93c1193 100644
--- a/src/java/org/apache/cassandra/hints/LegacyHintsMigrator.java
+++ b/src/java/org/apache/cassandra/hints/LegacyHintsMigrator.java
@@ -213,7 +213,7 @@
         }
         catch (IOException e)
         {
-            logger.error("Failed to migrate a hint for {} from legacy {}.{} table: {}",
+            logger.error("Failed to migrate a hint for {} from legacy {}.{} table",
                          row.getUUID("target_id"),
                          SystemKeyspace.NAME,
                          SystemKeyspace.LEGACY_HINTS,
@@ -222,7 +222,7 @@
         }
         catch (MarshalException e)
         {
-            logger.warn("Failed to validate a hint for {} (table id {}) from legacy {}.{} table - skipping: {})",
+            logger.warn("Failed to validate a hint for {} from legacy {}.{} table - skipping",
                         row.getUUID("target_id"),
                         SystemKeyspace.NAME,
                         SystemKeyspace.LEGACY_HINTS,
diff --git a/src/java/org/apache/cassandra/index/Index.java b/src/java/org/apache/cassandra/index/Index.java
index 469ef07..251d331 100644
--- a/src/java/org/apache/cassandra/index/Index.java
+++ b/src/java/org/apache/cassandra/index/Index.java
@@ -20,13 +20,16 @@
  */
 package org.apache.cassandra.index;
 
+import java.util.Collection;
 import java.util.Optional;
+import java.util.Set;
 import java.util.concurrent.Callable;
 import java.util.function.BiFunction;
 
 import org.apache.cassandra.config.ColumnDefinition;
 import org.apache.cassandra.cql3.Operator;
 import org.apache.cassandra.db.*;
+import org.apache.cassandra.db.compaction.OperationType;
 import org.apache.cassandra.db.filter.RowFilter;
 import org.apache.cassandra.db.marshal.AbstractType;
 import org.apache.cassandra.db.partitions.PartitionIterator;
@@ -34,7 +37,12 @@
 import org.apache.cassandra.db.partitions.UnfilteredPartitionIterator;
 import org.apache.cassandra.db.rows.Row;
 import org.apache.cassandra.exceptions.InvalidRequestException;
+import org.apache.cassandra.index.internal.CollatedViewIndexBuilder;
 import org.apache.cassandra.index.transactions.IndexTransaction;
+import org.apache.cassandra.io.sstable.Descriptor;
+import org.apache.cassandra.io.sstable.ReducingKeyIterator;
+import org.apache.cassandra.io.sstable.format.SSTableFlushObserver;
+import org.apache.cassandra.io.sstable.format.SSTableReader;
 import org.apache.cassandra.schema.IndexMetadata;
 import org.apache.cassandra.utils.concurrent.OpOrder;
 
@@ -67,7 +75,7 @@
  * scheduling its execution can rest with SecondaryIndexManager. For instance, a task like reloading index metadata
  * following potential updates caused by modifications to the base table may be performed in a blocking way. In
  * contrast, adding a new index may require it to be built from existing SSTable data, a potentially expensive task
- * which should be performed asyncronously.
+ * which should be performed asynchronously.
  *
  * Index Selection:
  * There are two facets to index selection, write time and read time selection. The former is concerned with
@@ -100,7 +108,7 @@
  * whether any of them are supported by a registered Index. supportsExpression is used to filter out Indexes which
  * cannot support a given Expression. After filtering, the set of candidate indexes are ranked according to the result
  * of getEstimatedResultRows and the most selective (i.e. the one expected to return the smallest number of results) is
- * chosen. A Searcher instance is then obtained from the searcherFor method & used to perform the actual Index lookup.
+ * chosen. A Searcher instance is then obtained from the searcherFor method and used to perform the actual Index lookup.
  * Finally, Indexes can define a post processing step to be performed on the coordinator, after results (partitions from
  * the primary table) have been received from replicas and reconciled. This post processing is defined as a
  * java.util.functions.BiFunction<PartitionIterator, RowFilter, PartitionIterator>, that is a function which takes as
@@ -130,10 +138,56 @@
 {
 
     /*
+     * Helpers for building indexes from SSTable data
+     */
+
+    /**
+     * Provider of {@code SecondaryIndexBuilder} instances. See {@code getBuildTaskSupport} and
+     * {@code SecondaryIndexManager.buildIndexesBlocking} for more detail.
+     */
+    interface IndexBuildingSupport
+    {
+        SecondaryIndexBuilder getIndexBuildTask(ColumnFamilyStore cfs, Set<Index> indexes, Collection<SSTableReader> sstables);
+    }
+
+    /**
+     * Default implementation of {@code IndexBuildingSupport} which uses a {@code ReducingKeyIterator} to obtain a
+     * collated view of the data in the SSTables.
+     */
+    public static class CollatedViewIndexBuildingSupport implements IndexBuildingSupport
+    {
+        public SecondaryIndexBuilder getIndexBuildTask(ColumnFamilyStore cfs, Set<Index> indexes, Collection<SSTableReader> sstables)
+        {
+            return new CollatedViewIndexBuilder(cfs, indexes, new ReducingKeyIterator(sstables));
+        }
+    }
+
+    /**
+     * Singleton instance of {@code CollatedViewIndexBuildingSupport}, which may be used by any {@code Index}
+     * implementation.
+     */
+    public static final CollatedViewIndexBuildingSupport INDEX_BUILDER_SUPPORT = new CollatedViewIndexBuildingSupport();
+
+    /*
      * Management functions
      */
 
     /**
+     * Get an instance of a helper to provide tasks for building the index from a set of SSTable data.
+     * When processing a number of indexes to be rebuilt, {@code SecondaryIndexManager.buildIndexesBlocking} groups
+     * those with the same {@code IndexBuildingSupport} instance, allowing multiple indexes to be built with a
+     * single pass through the data. The singleton instance returned from the default method implementation builds
+     * indexes using a {@code ReducingKeyIterator} to provide a collated view of the SSTable data.
+     *
+     * @return an instance of the index build task helper. Index implementations which return <b>the same instance</b>
+     * will be built using a single task.
+     */
+    default IndexBuildingSupport getBuildTaskSupport()
+    {
+        return INDEX_BUILDER_SUPPORT;
+    }
+
+    /**
      * Return a task to perform any initialization work when a new index instance is created.
      * This may involve costly operations such as (re)building the index, and is performed asynchronously
      * by SecondaryIndexManager
@@ -203,13 +257,25 @@
      * false enables the index implementation (or some other component) to control if and when SSTable data is
      * incorporated into the index.
      *
-     * This is called by SecondaryIndexManager in buildIndexBlocking, buildAllIndexesBlocking & rebuildIndexesBlocking
+     * This is called by SecondaryIndexManager in buildIndexBlocking, buildAllIndexesBlocking and rebuildIndexesBlocking
      * where a return value of false causes the index to be exluded from the set of those which will process the
      * SSTable data.
      * @return if the index should be included in the set which processes SSTable data, false otherwise.
      */
     public boolean shouldBuildBlocking();
 
+    /**
+     * Get a flush observer to observe partition/cell events generated by flushing an SSTable (memtable flush or compaction).
+     *
+     * @param descriptor The descriptor of the sstable the observer is requested for.
+     * @param opType The type of the operation requesting the observer, e.g. memtable flush or compaction.
+     *
+     * @return SSTable flush observer.
+     */
+    default SSTableFlushObserver getFlushObserver(Descriptor descriptor, OperationType opType)
+    {
+        return null;
+    }
 
     /*
      * Index selection
@@ -443,7 +509,7 @@
      * See CASSANDRA-8717 for further discussion.
      *
      * The function takes a PartitionIterator of the results from the replicas which has already been collated
-     * & reconciled, along with the command being executed. It returns another PartitionIterator containing the results
+     * and reconciled, along with the command being executed. It returns another PartitionIterator containing the results
      * of the transformation (which may be the same as the input if the transformation is a no-op).
      */
     public BiFunction<PartitionIterator, ReadCommand, PartitionIterator> postProcessorFor(ReadCommand command);
@@ -464,9 +530,9 @@
     public interface Searcher
     {
         /**
-         * @param orderGroup the collection of OpOrder.Groups which the ReadCommand is being performed under.
+         * @param executionController the execution controller under which the ReadCommand is being performed.
          * @return partitions from the base table matching the criteria of the search.
          */
-        public UnfilteredPartitionIterator search(ReadOrderGroup orderGroup);
+        public UnfilteredPartitionIterator search(ReadExecutionController executionController);
     }
 }
diff --git a/src/java/org/apache/cassandra/index/IndexNotAvailableException.java b/src/java/org/apache/cassandra/index/IndexNotAvailableException.java
index 5440e2a..5e5a753 100644
--- a/src/java/org/apache/cassandra/index/IndexNotAvailableException.java
+++ b/src/java/org/apache/cassandra/index/IndexNotAvailableException.java
@@ -25,7 +25,7 @@
 {
     /**
      * Creates a new <code>IndexNotAvailableException</code> for the specified index.
-     * @param name the index name
+     * @param index the index
      */
     public IndexNotAvailableException(Index index)
     {
diff --git a/src/java/org/apache/cassandra/index/SecondaryIndexBuilder.java b/src/java/org/apache/cassandra/index/SecondaryIndexBuilder.java
index e66f0a3..9ec8a4e 100644
--- a/src/java/org/apache/cassandra/index/SecondaryIndexBuilder.java
+++ b/src/java/org/apache/cassandra/index/SecondaryIndexBuilder.java
@@ -17,61 +17,12 @@
  */
 package org.apache.cassandra.index;
 
-import java.io.IOException;
-import java.util.Set;
-import java.util.UUID;
-
-import org.apache.cassandra.db.ColumnFamilyStore;
-import org.apache.cassandra.db.DecoratedKey;
-import org.apache.cassandra.db.Keyspace;
 import org.apache.cassandra.db.compaction.CompactionInfo;
-import org.apache.cassandra.db.compaction.CompactionInterruptedException;
-import org.apache.cassandra.db.compaction.OperationType;
-import org.apache.cassandra.io.sstable.ReducingKeyIterator;
-import org.apache.cassandra.utils.UUIDGen;
 
 /**
  * Manages building an entire index from column family data. Runs on to compaction manager.
  */
-public class SecondaryIndexBuilder extends CompactionInfo.Holder
+public abstract class SecondaryIndexBuilder extends CompactionInfo.Holder
 {
-    private final ColumnFamilyStore cfs;
-    private final Set<Index> indexers;
-    private final ReducingKeyIterator iter;
-    private final UUID compactionId;
-
-    public SecondaryIndexBuilder(ColumnFamilyStore cfs, Set<Index> indexers, ReducingKeyIterator iter)
-    {
-        this.cfs = cfs;
-        this.indexers = indexers;
-        this.iter = iter;
-        this.compactionId = UUIDGen.getTimeUUID();
-    }
-
-    public CompactionInfo getCompactionInfo()
-    {
-        return new CompactionInfo(cfs.metadata,
-                                  OperationType.INDEX_BUILD,
-                                  iter.getBytesRead(),
-                                  iter.getTotalBytes(),
-                                  compactionId);
-    }
-
-    public void build()
-    {
-        try
-        {
-            while (iter.hasNext())
-            {
-                if (isStopRequested())
-                    throw new CompactionInterruptedException(getCompactionInfo());
-                DecoratedKey key = iter.next();
-                Keyspace.indexPartition(key, cfs, indexers);
-            }
-        }
-        finally
-        {
-            iter.close();
-        }
-    }
+    public abstract void build();
 }
diff --git a/src/java/org/apache/cassandra/index/SecondaryIndexManager.java b/src/java/org/apache/cassandra/index/SecondaryIndexManager.java
index 6dfdeee..8de846f 100644
--- a/src/java/org/apache/cassandra/index/SecondaryIndexManager.java
+++ b/src/java/org/apache/cassandra/index/SecondaryIndexManager.java
@@ -52,7 +52,6 @@
 import org.apache.cassandra.exceptions.InvalidRequestException;
 import org.apache.cassandra.index.internal.CassandraIndex;
 import org.apache.cassandra.index.transactions.*;
-import org.apache.cassandra.io.sstable.ReducingKeyIterator;
 import org.apache.cassandra.io.sstable.format.SSTableReader;
 import org.apache.cassandra.schema.IndexMetadata;
 import org.apache.cassandra.schema.Indexes;
@@ -360,11 +359,20 @@
                     indexes.stream().map(i -> i.getIndexMetadata().name).collect(Collectors.joining(",")),
                     sstables.stream().map(SSTableReader::toString).collect(Collectors.joining(",")));
 
-        SecondaryIndexBuilder builder = new SecondaryIndexBuilder(baseCfs,
-                                                                  indexes,
-                                                                  new ReducingKeyIterator(sstables));
-        Future<?> future = CompactionManager.instance.submitIndexBuild(builder);
-        FBUtilities.waitOnFuture(future);
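+        // Group the indexes by their build support so that indexes sharing a strategy are built in a single pass over the sstables.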
+        Map<Index.IndexBuildingSupport, Set<Index>> byType = new HashMap<>();
+        for (Index index : indexes)
+        {
+            Set<Index> stored = byType.computeIfAbsent(index.getBuildTaskSupport(), i -> new HashSet<>());
+            stored.add(index);
+        }
+
+        List<Future<?>> futures = byType.entrySet()
+                                        .stream()
+                                        .map((e) -> e.getKey().getIndexBuildTask(baseCfs, e.getValue(), sstables))
+                                        .map(CompactionManager.instance::submitIndexBuild)
+                                        .collect(Collectors.toList());
+
+        FBUtilities.waitOnFutures(futures);
 
         flushIndexesBlocking(indexes);
         logger.info("Index build of {} complete",
@@ -553,7 +561,7 @@
      * Delete all data from all indexes for this partition.
      * For when cleanup rips a partition out entirely.
      *
-     * TODO : improve cleanup transaction to batch updates & perform them async
+     * TODO : improve cleanup transaction to batch updates and perform them async
      */
     public void deletePartition(UnfilteredRowIterator partition, int nowInSec)
     {
@@ -624,7 +632,7 @@
                 Tracing.trace("Command contains a custom index expression, using target index {}", customExpression.getTargetIndex().name);
                 return indexes.get(customExpression.getTargetIndex().name);
             }
-            else
+            else if (!expression.isUserDefined())
             {
                 indexes.values().stream()
                        .filter(index -> index.supportsExpression(expression.column(), expression.operator()))
@@ -657,6 +665,11 @@
         return selected;
     }
 
+    public Optional<Index> getBestIndexFor(RowFilter.Expression expression)
+    {
+        return indexes.values().stream().filter((i) -> i.supportsExpression(expression.column(), expression.operator())).findFirst();
+    }
+
     /**
      * Called at write time to ensure that values present in the update
      * are valid according to the rules of all registered indexes which
@@ -1037,6 +1050,12 @@
 
     private static void executeAllBlocking(Stream<Index> indexers, Function<Index, Callable<?>> function)
     {
+        if (function == null)
+        {
+            logger.error("Failed to flush indexes {} because the flush task is missing.", indexers);
+            return;
+        }
+
         List<Future<?>> waitFor = new ArrayList<>();
         indexers.forEach(indexer -> {
             Callable<?> task = function.apply(indexer);
diff --git a/src/java/org/apache/cassandra/index/TargetParser.java b/src/java/org/apache/cassandra/index/TargetParser.java
new file mode 100644
index 0000000..849ad16
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/TargetParser.java
@@ -0,0 +1,90 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index;
+
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import org.apache.commons.lang3.StringUtils;
+
+import org.apache.cassandra.config.CFMetaData;
+import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.cql3.ColumnIdentifier;
+import org.apache.cassandra.cql3.statements.IndexTarget;
+import org.apache.cassandra.exceptions.ConfigurationException;
+import org.apache.cassandra.schema.IndexMetadata;
+import org.apache.cassandra.utils.Pair;
+
+public class TargetParser
+{
+    private static final Pattern TARGET_REGEX = Pattern.compile("^(keys|entries|values|full)\\((.+)\\)$");
+    private static final Pattern TWO_QUOTES = Pattern.compile("\"\"");
+    private static final String QUOTE = "\"";
+
+    public static Pair<ColumnDefinition, IndexTarget.Type> parse(CFMetaData cfm, IndexMetadata indexDef)
+    {
+        String target = indexDef.options.get("target");
+        assert target != null : String.format("No target definition found for index %s", indexDef.name);
+        Pair<ColumnDefinition, IndexTarget.Type> result = parse(cfm, target);
+        if (result == null)
+            throw new ConfigurationException(String.format("Unable to parse targets for index %s (%s)", indexDef.name, target));
+        return result;
+    }
+
+    public static Pair<ColumnDefinition, IndexTarget.Type> parse(CFMetaData cfm, String target)
+    {
+        // if the regex matches then the target is in the form "keys(foo)", "entries(bar)" etc
+        // if not, then it must be a simple column name and implicitly its type is VALUES
+        Matcher matcher = TARGET_REGEX.matcher(target);
+        String columnName;
+        IndexTarget.Type targetType;
+        if (matcher.matches())
+        {
+            targetType = IndexTarget.Type.fromString(matcher.group(1));
+            columnName = matcher.group(2);
+        }
+        else
+        {
+            columnName = target;
+            targetType = IndexTarget.Type.VALUES;
+        }
+
+        // in the case of a quoted column name the name in the target string
+        // will be enclosed in quotes, which we need to unwrap. It may also
+        // include quote characters internally, escaped like so:
+        //      abc"def -> abc""def.
+        // Because the target string is stored in a CQL compatible form, we
+        // need to un-escape any such quotes to get the actual column name
+        if (columnName.startsWith(QUOTE))
+        {
+            columnName = StringUtils.substring(StringUtils.substring(columnName, 1), 0, -1);
+            columnName = TWO_QUOTES.matcher(columnName).replaceAll(QUOTE);
+        }
+
+        // if it's not a CQL table, we can't assume that the column name is utf8, so
+        // in that case we have to do a linear scan of the cfm's columns to get the matching one
+        if (cfm.isCQLTable())
+            return Pair.create(cfm.getColumnDefinition(new ColumnIdentifier(columnName, true)), targetType);
+        else
+            for (ColumnDefinition column : cfm.allColumns())
+                if (column.name.toString().equals(columnName))
+                    return Pair.create(column, targetType);
+
+        return null;
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/internal/CassandraIndex.java b/src/java/org/apache/cassandra/index/internal/CassandraIndex.java
index 2a0dec0..70aaf0d 100644
--- a/src/java/org/apache/cassandra/index/internal/CassandraIndex.java
+++ b/src/java/org/apache/cassandra/index/internal/CassandraIndex.java
@@ -25,19 +25,16 @@
 import java.util.concurrent.Callable;
 import java.util.concurrent.Future;
 import java.util.function.BiFunction;
-import java.util.regex.Matcher;
-import java.util.regex.Pattern;
 import java.util.stream.Collectors;
 import java.util.stream.StreamSupport;
 
 import com.google.common.collect.ImmutableSet;
-import org.apache.commons.lang3.StringUtils;
+
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.ColumnDefinition;
-import org.apache.cassandra.cql3.ColumnIdentifier;
 import org.apache.cassandra.cql3.Operator;
 import org.apache.cassandra.cql3.statements.IndexTarget;
 import org.apache.cassandra.db.*;
@@ -53,9 +50,7 @@
 import org.apache.cassandra.db.rows.*;
 import org.apache.cassandra.dht.LocalPartitioner;
 import org.apache.cassandra.exceptions.InvalidRequestException;
-import org.apache.cassandra.index.Index;
-import org.apache.cassandra.index.IndexRegistry;
-import org.apache.cassandra.index.SecondaryIndexBuilder;
+import org.apache.cassandra.index.*;
 import org.apache.cassandra.index.internal.composites.CompositesSearcher;
 import org.apache.cassandra.index.internal.keys.KeysSearcher;
 import org.apache.cassandra.index.transactions.IndexTransaction;
@@ -78,8 +73,6 @@
 {
     private static final Logger logger = LoggerFactory.getLogger(CassandraIndex.class);
 
-    public static final Pattern TARGET_REGEX = Pattern.compile("^(keys|entries|values|full)\\((.+)\\)$");
-
     public final ColumnFamilyStore baseCfs;
     protected IndexMetadata metadata;
     protected ColumnFamilyStore indexCfs;
@@ -230,7 +223,7 @@
     private void setMetadata(IndexMetadata indexDef)
     {
         metadata = indexDef;
-        Pair<ColumnDefinition, IndexTarget.Type> target = parseTarget(baseCfs.metadata, indexDef);
+        Pair<ColumnDefinition, IndexTarget.Type> target = TargetParser.parse(baseCfs.metadata, indexDef);
         functions = getFunctions(indexDef, target);
         CFMetaData cfm = indexCfsMetadata(baseCfs.metadata, indexDef);
         indexCfs = ColumnFamilyStore.createColumnFamilyStore(baseCfs.keyspace,
@@ -305,7 +298,6 @@
 
         if (target.isPresent())
         {
-            target.get().validateForIndexing();
             switch (getIndexMetadata().kind)
             {
                 case COMPOSITES:
@@ -452,7 +444,7 @@
                 insert(key.getKey(),
                        clustering,
                        cell,
-                       LivenessInfo.create(cell.timestamp(), cell.ttl(), cell.localDeletionTime()),
+                       LivenessInfo.withExpirationTime(cell.timestamp(), cell.ttl(), cell.localDeletionTime()),
                        opGroup);
             }
 
@@ -500,7 +492,7 @@
                         }
                     }
                 }
-                return LivenessInfo.create(baseCfs.metadata, timestamp, ttl, nowInSec);
+                return LivenessInfo.create(timestamp, ttl, nowInSec);
             }
         };
     }
@@ -716,9 +708,9 @@
                         metadata.name,
                         getSSTableNames(sstables));
 
-            SecondaryIndexBuilder builder = new SecondaryIndexBuilder(baseCfs,
-                                                                      Collections.singleton(this),
-                                                                      new ReducingKeyIterator(sstables));
+            SecondaryIndexBuilder builder = new CollatedViewIndexBuilder(baseCfs,
+                                                                         Collections.singleton(this),
+                                                                         new ReducingKeyIterator(sstables));
             Future<?> future = CompactionManager.instance.submitIndexBuild(builder);
             FBUtilities.waitOnFuture(future);
             indexCfs.forceBlockingFlush();
@@ -743,7 +735,7 @@
      */
     public static final CFMetaData indexCfsMetadata(CFMetaData baseCfsMetadata, IndexMetadata indexMetadata)
     {
-        Pair<ColumnDefinition, IndexTarget.Type> target = parseTarget(baseCfsMetadata, indexMetadata);
+        Pair<ColumnDefinition, IndexTarget.Type> target = TargetParser.parse(baseCfsMetadata, indexMetadata);
         CassandraIndexFunctions utils = getFunctions(indexMetadata, target);
         ColumnDefinition indexedColumn = target.left;
         AbstractType<?> indexedValueType = utils.getIndexedValueType(indexedColumn);
@@ -787,54 +779,7 @@
      */
     public static CassandraIndex newIndex(ColumnFamilyStore baseCfs, IndexMetadata indexMetadata)
     {
-        return getFunctions(indexMetadata, parseTarget(baseCfs.metadata, indexMetadata)).newIndexInstance(baseCfs, indexMetadata);
-    }
-
-    // Public because it's also used to convert index metadata into a thrift-compatible format
-    public static Pair<ColumnDefinition, IndexTarget.Type> parseTarget(CFMetaData cfm,
-                                                                       IndexMetadata indexDef)
-    {
-        String target = indexDef.options.get("target");
-        assert target != null : String.format("No target definition found for index %s", indexDef.name);
-
-        // if the regex matches then the target is in the form "keys(foo)", "entries(bar)" etc
-        // if not, then it must be a simple column name and implictly its type is VALUES
-        Matcher matcher = TARGET_REGEX.matcher(target);
-        String columnName;
-        IndexTarget.Type targetType;
-        if (matcher.matches())
-        {
-            targetType = IndexTarget.Type.fromString(matcher.group(1));
-            columnName = matcher.group(2);
-        }
-        else
-        {
-            columnName = target;
-            targetType = IndexTarget.Type.VALUES;
-        }
-
-        // in the case of a quoted column name the name in the target string
-        // will be enclosed in quotes, which we need to unwrap. It may also
-        // include quote characters internally, escaped like so:
-        //      abc"def -> abc""def.
-        // Because the target string is stored in a CQL compatible form, we
-        // need to un-escape any such quotes to get the actual column name
-        if (columnName.startsWith("\""))
-        {
-            columnName = StringUtils.substring(StringUtils.substring(columnName, 1), 0, -1);
-            columnName = columnName.replaceAll("\"\"", "\"");
-        }
-
-        // if it's not a CQL table, we can't assume that the column name is utf8, so
-        // in that case we have to do a linear scan of the cfm's columns to get the matching one
-        if (cfm.isCQLTable())
-            return Pair.create(cfm.getColumnDefinition(new ColumnIdentifier(columnName, true)), targetType);
-        else
-            for (ColumnDefinition column : cfm.allColumns())
-                if (column.name.toString().equals(columnName))
-                    return Pair.create(column, targetType);
-
-        throw new RuntimeException(String.format("Unable to parse targets for index %s (%s)", indexDef.name, target));
+        return getFunctions(indexMetadata, TargetParser.parse(baseCfs.metadata, indexMetadata)).newIndexInstance(baseCfs, indexMetadata);
     }
 
     static CassandraIndexFunctions getFunctions(IndexMetadata indexDef,
@@ -871,6 +816,7 @@
             case CLUSTERING:
                 return CassandraIndexFunctions.CLUSTERING_COLUMN_INDEX_FUNCTIONS;
             case REGULAR:
+            case STATIC:
                 return CassandraIndexFunctions.REGULAR_COLUMN_INDEX_FUNCTIONS;
             case PARTITION_KEY:
                 return CassandraIndexFunctions.PARTITION_KEY_INDEX_FUNCTIONS;
diff --git a/src/java/org/apache/cassandra/index/internal/CassandraIndexFunctions.java b/src/java/org/apache/cassandra/index/internal/CassandraIndexFunctions.java
index b7cb3a2..89eebdf 100644
--- a/src/java/org/apache/cassandra/index/internal/CassandraIndexFunctions.java
+++ b/src/java/org/apache/cassandra/index/internal/CassandraIndexFunctions.java
@@ -54,7 +54,7 @@
     /**
      * Add the clustering columns for a specific type of index table to the a CFMetaData.Builder (which is being
      * used to construct the index table's CFMetadata. In the default implementation, the clustering columns of the
-     * index table hold the partition key & clustering columns of the base table. This is overridden in several cases:
+     * index table hold the partition key and clustering columns of the base table. This is overridden in several cases:
      * * When the indexed value is itself a clustering column, in which case, we only need store the base table's
      *   *other* clustering values in the index - the indexed value being the index table's partition key
      * * When the indexed value is a collection value, in which case we also need to capture the cell path from the base
diff --git a/src/java/org/apache/cassandra/index/internal/CassandraIndexSearcher.java b/src/java/org/apache/cassandra/index/internal/CassandraIndexSearcher.java
index d6b39e6..7b622e3 100644
--- a/src/java/org/apache/cassandra/index/internal/CassandraIndexSearcher.java
+++ b/src/java/org/apache/cassandra/index/internal/CassandraIndexSearcher.java
@@ -56,14 +56,14 @@
 
     @SuppressWarnings("resource") // Both the OpOrder and 'indexIter' are closed on exception, or through the closing of the result
     // of this method.
-    public UnfilteredPartitionIterator search(ReadOrderGroup orderGroup)
+    public UnfilteredPartitionIterator search(ReadExecutionController executionController)
     {
         // the value of the index expression is the partition key in the index table
         DecoratedKey indexKey = index.getBackingTable().get().decorateKey(expression.getIndexValue());
-        UnfilteredRowIterator indexIter = queryIndex(indexKey, command, orderGroup);
+        UnfilteredRowIterator indexIter = queryIndex(indexKey, command, executionController);
         try
         {
-            return queryDataFromIndex(indexKey, UnfilteredRowIterators.filter(indexIter, command.nowInSec()), command, orderGroup);
+            return queryDataFromIndex(indexKey, UnfilteredRowIterators.filter(indexIter, command.nowInSec()), command, executionController);
         }
         catch (RuntimeException | Error e)
         {
@@ -72,13 +72,13 @@
         }
     }
 
-    private UnfilteredRowIterator queryIndex(DecoratedKey indexKey, ReadCommand command, ReadOrderGroup orderGroup)
+    private UnfilteredRowIterator queryIndex(DecoratedKey indexKey, ReadCommand command, ReadExecutionController executionController)
     {
         ClusteringIndexFilter filter = makeIndexFilter(command);
         ColumnFamilyStore indexCfs = index.getBackingTable().get();
         CFMetaData indexCfm = indexCfs.metadata;
         return SinglePartitionReadCommand.create(indexCfm, command.nowInSec(), indexKey, ColumnFilter.all(indexCfm), filter)
-                                         .queryMemtableAndDisk(indexCfs, orderGroup.indexReadOpOrderGroup());
+                                         .queryMemtableAndDisk(indexCfs, executionController.indexReadController());
     }
 
     private ClusteringIndexFilter makeIndexFilter(ReadCommand command)
@@ -132,8 +132,8 @@
                     DecoratedKey startKey = (DecoratedKey) range.left;
                     DecoratedKey endKey = (DecoratedKey) range.right;
 
-                    Slice.Bound start = Slice.Bound.BOTTOM;
-                    Slice.Bound end = Slice.Bound.TOP;
+                    ClusteringBound start = ClusteringBound.BOTTOM;
+                    ClusteringBound end = ClusteringBound.TOP;
 
                     /*
                      * For index queries over a range, we can't do a whole lot better than querying everything for the key range, though for
@@ -166,15 +166,15 @@
                 else
                 {
                     // otherwise, just start the index slice from the key we do have
-                    slice = Slice.make(makeIndexBound(((DecoratedKey)range.left).getKey(), Slice.Bound.BOTTOM),
-                                       Slice.Bound.TOP);
+                    slice = Slice.make(makeIndexBound(((DecoratedKey)range.left).getKey(), ClusteringBound.BOTTOM),
+                                       ClusteringBound.TOP);
                 }
             }
             return new ClusteringIndexSliceFilter(Slices.with(index.getIndexComparator(), slice), false);
         }
     }
 
-    private Slice.Bound makeIndexBound(ByteBuffer rowKey, Slice.Bound bound)
+    private ClusteringBound makeIndexBound(ByteBuffer rowKey, ClusteringBound bound)
     {
         return index.buildIndexClusteringPrefix(rowKey, bound, null)
                                  .buildBound(bound.isStart(), bound.isInclusive());
@@ -188,5 +188,5 @@
     protected abstract UnfilteredPartitionIterator queryDataFromIndex(DecoratedKey indexKey,
                                                                       RowIterator indexHits,
                                                                       ReadCommand command,
-                                                                      ReadOrderGroup orderGroup);
+                                                                      ReadExecutionController executionController);
 }
diff --git a/src/java/org/apache/cassandra/index/internal/CollatedViewIndexBuilder.java b/src/java/org/apache/cassandra/index/internal/CollatedViewIndexBuilder.java
new file mode 100644
index 0000000..8ea7a68
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/internal/CollatedViewIndexBuilder.java
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.internal;
+
+import java.util.Set;
+import java.util.UUID;
+
+import org.apache.cassandra.db.ColumnFamilyStore;
+import org.apache.cassandra.db.DecoratedKey;
+import org.apache.cassandra.db.Keyspace;
+import org.apache.cassandra.db.compaction.CompactionInfo;
+import org.apache.cassandra.db.compaction.CompactionInterruptedException;
+import org.apache.cassandra.db.compaction.OperationType;
+import org.apache.cassandra.index.Index;
+import org.apache.cassandra.index.SecondaryIndexBuilder;
+import org.apache.cassandra.io.sstable.ReducingKeyIterator;
+import org.apache.cassandra.utils.UUIDGen;
+
+/**
+ * Manages building an entire index from column family data. Runs on the compaction manager.
+ */
+public class CollatedViewIndexBuilder extends SecondaryIndexBuilder
+{
+    private final ColumnFamilyStore cfs;
+    private final Set<Index> indexers;
+    private final ReducingKeyIterator iter;
+    private final UUID compactionId;
+
+    public CollatedViewIndexBuilder(ColumnFamilyStore cfs, Set<Index> indexers, ReducingKeyIterator iter)
+    {
+        this.cfs = cfs;
+        this.indexers = indexers;
+        this.iter = iter;
+        this.compactionId = UUIDGen.getTimeUUID();
+    }
+
+    public CompactionInfo getCompactionInfo()
+    {
+        return new CompactionInfo(cfs.metadata,
+                OperationType.INDEX_BUILD,
+                iter.getBytesRead(),
+                iter.getTotalBytes(),
+                compactionId);
+    }
+
+    public void build()
+    {
+        try
+        {
+            while (iter.hasNext())
+            {
+                if (isStopRequested())
+                    throw new CompactionInterruptedException(getCompactionInfo());
+                DecoratedKey key = iter.next();
+                Keyspace.indexPartition(key, cfs, indexers);
+            }
+        }
+        finally
+        {
+            iter.close();
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/internal/IndexEntry.java b/src/java/org/apache/cassandra/index/internal/IndexEntry.java
index 97525d6..8ffd26a 100644
--- a/src/java/org/apache/cassandra/index/internal/IndexEntry.java
+++ b/src/java/org/apache/cassandra/index/internal/IndexEntry.java
@@ -23,6 +23,7 @@
 import java.nio.ByteBuffer;
 
 import org.apache.cassandra.db.Clustering;
 import org.apache.cassandra.db.DecoratedKey;
 
 /**
diff --git a/src/java/org/apache/cassandra/index/internal/composites/ClusteringColumnIndex.java b/src/java/org/apache/cassandra/index/internal/composites/ClusteringColumnIndex.java
index b932602..f207e9b 100644
--- a/src/java/org/apache/cassandra/index/internal/composites/ClusteringColumnIndex.java
+++ b/src/java/org/apache/cassandra/index/internal/composites/ClusteringColumnIndex.java
@@ -36,12 +36,14 @@
  * has no impact) and v the cell value.
  *
  * Such a cell is always indexed by this index (or rather, it is indexed if
+ * {@code 
  * n >= columnDef.componentIndex, which will always be the case in practice)
  * and it will generate (makeIndexColumnName()) an index entry whose:
  *   - row key will be ck_i (getIndexedValue()) where i == columnDef.componentIndex.
  *   - cell name will
  *       rk ck_0 ... ck_{i-1} ck_{i+1} ck_n
  *     where rk is the row key of the initial cell and i == columnDef.componentIndex.
+ * }
  */
 public class ClusteringColumnIndex extends CassandraIndex
 {
diff --git a/src/java/org/apache/cassandra/index/internal/composites/CollectionKeyIndexBase.java b/src/java/org/apache/cassandra/index/internal/composites/CollectionKeyIndexBase.java
index fe77c96..ef76870 100644
--- a/src/java/org/apache/cassandra/index/internal/composites/CollectionKeyIndexBase.java
+++ b/src/java/org/apache/cassandra/index/internal/composites/CollectionKeyIndexBase.java
@@ -54,6 +54,9 @@
     {
         CBuilder builder = CBuilder.create(getIndexComparator());
         builder.add(partitionKey);
+
+        // When indexing a static column, the prefix will be empty; only the
+        // partition key is needed at query time.
         for (int i = 0; i < prefix.size(); i++)
             builder.add(prefix.get(i));
 
@@ -63,16 +66,24 @@
     public IndexEntry decodeEntry(DecoratedKey indexedValue,
                                   Row indexEntry)
     {
-        int count = 1 + baseCfs.metadata.clusteringColumns().size();
         Clustering clustering = indexEntry.clustering();
-        CBuilder builder = CBuilder.create(baseCfs.getComparator());
-        for (int i = 0; i < count - 1; i++)
-            builder.add(clustering.get(i + 1));
+
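+        // Entries for a static column map back to the base row's STATIC_CLUSTERING; for regular columns
+        // the base clustering is rebuilt from the index entry's clustering, skipping its first component
+        // (the base partition key).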
+        Clustering indexedEntryClustering = null;
+        if (getIndexedColumn().isStatic())
+            indexedEntryClustering = Clustering.STATIC_CLUSTERING;
+        else
+        {
+            int count = 1 + baseCfs.metadata.clusteringColumns().size();
+            CBuilder builder = CBuilder.create(baseCfs.getComparator());
+            for (int i = 0; i < count - 1; i++)
+                builder.add(clustering.get(i + 1));
+            indexedEntryClustering = builder.build();
+        }
 
         return new IndexEntry(indexedValue,
                               clustering,
                               indexEntry.primaryKeyLivenessInfo().timestamp(),
                               clustering.get(0),
-                              builder.build());
+                              indexedEntryClustering);
     }
 }
diff --git a/src/java/org/apache/cassandra/index/internal/composites/CollectionValueIndex.java b/src/java/org/apache/cassandra/index/internal/composites/CollectionValueIndex.java
index 95bd7e1..5929e69 100644
--- a/src/java/org/apache/cassandra/index/internal/composites/CollectionValueIndex.java
+++ b/src/java/org/apache/cassandra/index/internal/composites/CollectionValueIndex.java
@@ -63,7 +63,10 @@
         for (int i = 0; i < prefix.size(); i++)
             builder.add(prefix.get(i));
 
-        // When indexing, cell will be present, but when searching, it won't  (CASSANDRA-7525)
+        // When indexing a static column, prefix will be empty; only the
+        // partition key is needed at query time.
+        // In the non-static case, the cell will be present during indexing
+        // but not when searching (CASSANDRA-7525).
         if (prefix.size() == baseCfs.metadata.clusteringColumns().size() && path != null)
             builder.add(path.get(0));
 
@@ -73,15 +76,22 @@
     public IndexEntry decodeEntry(DecoratedKey indexedValue, Row indexEntry)
     {
         Clustering clustering = indexEntry.clustering();
-        CBuilder builder = CBuilder.create(baseCfs.getComparator());
-        for (int i = 0; i < baseCfs.getComparator().size(); i++)
-            builder.add(clustering.get(i + 1));
+        Clustering indexedEntryClustering = null;
+        if (getIndexedColumn().isStatic())
+            indexedEntryClustering = Clustering.STATIC_CLUSTERING;
+        else
+        {
+            CBuilder builder = CBuilder.create(baseCfs.getComparator());
+            for (int i = 0; i < baseCfs.getComparator().size(); i++)
+                builder.add(clustering.get(i + 1));
+            indexedEntryClustering = builder.build();
+        }
 
         return new IndexEntry(indexedValue,
                                 clustering,
                                 indexEntry.primaryKeyLivenessInfo().timestamp(),
                                 clustering.get(0),
-                                builder.build());
+                                indexedEntryClustering);
     }
 
     public boolean supportsOperator(ColumnDefinition indexedColumn, Operator operator)
diff --git a/src/java/org/apache/cassandra/index/internal/composites/CompositesSearcher.java b/src/java/org/apache/cassandra/index/internal/composites/CompositesSearcher.java
index 135839b..e24e441 100644
--- a/src/java/org/apache/cassandra/index/internal/composites/CompositesSearcher.java
+++ b/src/java/org/apache/cassandra/index/internal/composites/CompositesSearcher.java
@@ -21,12 +21,11 @@
 import java.util.ArrayList;
 import java.util.List;
 
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.db.*;
 import org.apache.cassandra.db.filter.ClusteringIndexNamesFilter;
+import org.apache.cassandra.db.filter.ClusteringIndexSliceFilter;
+import org.apache.cassandra.db.filter.ColumnFilter;
 import org.apache.cassandra.db.filter.DataLimits;
 import org.apache.cassandra.db.filter.RowFilter;
 import org.apache.cassandra.db.partitions.UnfilteredPartitionIterator;
@@ -41,8 +40,6 @@
 
 public class CompositesSearcher extends CassandraIndexSearcher
 {
-    private static final Logger logger = LoggerFactory.getLogger(CompositesSearcher.class);
-
     public CompositesSearcher(ReadCommand command,
                               RowFilter.Expression expression,
                               CassandraIndex index)
@@ -55,10 +52,15 @@
         return command.selectsKey(partitionKey) && command.selectsClustering(partitionKey, entry.indexedEntryClustering);
     }
 
+    private boolean isStaticColumn()
+    {
+        return index.getIndexedColumn().isStatic();
+    }
+
     protected UnfilteredPartitionIterator queryDataFromIndex(final DecoratedKey indexKey,
                                                              final RowIterator indexHits,
                                                              final ReadCommand command,
-                                                             final ReadOrderGroup orderGroup)
+                                                             final ReadExecutionController executionController)
     {
         assert indexHits.staticRow() == Rows.EMPTY_STATIC_ROW;
 
@@ -108,49 +110,66 @@
                         nextEntry = index.decodeEntry(indexKey, indexHits.next());
                     }
 
-                    // Gather all index hits belonging to the same partition and query the data for those hits.
-                    // TODO: it's much more efficient to do 1 read for all hits to the same partition than doing
-                    // 1 read per index hit. However, this basically mean materializing all hits for a partition
-                    // in memory so we should consider adding some paging mechanism. However, index hits should
-                    // be relatively small so it's much better than the previous code that was materializing all
-                    // *data* for a given partition.
-                    BTreeSet.Builder<Clustering> clusterings = BTreeSet.builder(index.baseCfs.getComparator());
-                    List<IndexEntry> entries = new ArrayList<>();
+                    SinglePartitionReadCommand dataCmd;
                     DecoratedKey partitionKey = index.baseCfs.decorateKey(nextEntry.indexedKey);
-
-                    while (nextEntry != null && partitionKey.getKey().equals(nextEntry.indexedKey))
+                    List<IndexEntry> entries = new ArrayList<>();
+                    if (isStaticColumn())
                     {
-                        // We're queried a slice of the index, but some hits may not match some of the clustering column constraints
-                        if (isMatchingEntry(partitionKey, nextEntry, command))
-                        {
-                            clusterings.add(nextEntry.indexedEntryClustering);
-                            entries.add(nextEntry);
-                        }
-
+                        // If the index is on a static column, we just need to do a full read on the partition.
+                        // Note that we re-use command.columnFilter() so this read keeps tracking
+                        // whatever columns the command selects, even if that changes in the future.
+                        dataCmd = SinglePartitionReadCommand.create(index.baseCfs.metadata,
+                                                                    command.nowInSec(),
+                                                                    command.columnFilter(),
+                                                                    RowFilter.NONE,
+                                                                    DataLimits.NONE,
+                                                                    partitionKey,
+                                                                    new ClusteringIndexSliceFilter(Slices.ALL, false));
+                        entries.add(nextEntry);
                         nextEntry = indexHits.hasNext() ? index.decodeEntry(indexKey, indexHits.next()) : null;
                     }
+                    else
+                    {
+                        // Gather all index hits belonging to the same partition and query the data for those hits.
+                        // TODO: it's much more efficient to do 1 read for all hits to the same partition than doing
+                        // 1 read per index hit. However, this basically means materializing all hits for a partition
+                        // in memory, so we should consider adding some paging mechanism. That said, index hits should
+                        // be relatively small, so this is still much better than the previous code that materialized all
+                        // *data* for a given partition.
+                        BTreeSet.Builder<Clustering> clusterings = BTreeSet.builder(index.baseCfs.getComparator());
+                        while (nextEntry != null && partitionKey.getKey().equals(nextEntry.indexedKey))
+                        {
+                            // We've queried a slice of the index, but some hits may not match some of the clustering column constraints
+                            if (isMatchingEntry(partitionKey, nextEntry, command))
+                            {
+                                clusterings.add(nextEntry.indexedEntryClustering);
+                                entries.add(nextEntry);
+                            }
 
-                    // Because we've eliminated entries that don't match the clustering columns, it's possible we added nothing
-                    if (clusterings.isEmpty())
-                        continue;
+                            nextEntry = indexHits.hasNext() ? index.decodeEntry(indexKey, indexHits.next()) : null;
+                        }
 
-                    // Query the gathered index hits. We still need to filter stale hits from the resulting query.
-                    ClusteringIndexNamesFilter filter = new ClusteringIndexNamesFilter(clusterings.build(), false);
-                    SinglePartitionReadCommand dataCmd = SinglePartitionReadCommand.create(index.baseCfs.metadata,
-                                                                                           command.nowInSec(),
-                                                                                           command.columnFilter(),
-                                                                                           command.rowFilter(),
-                                                                                           DataLimits.NONE,
-                                                                                           partitionKey,
-                                                                                           filter);
+                        // Because we've eliminated entries that don't match the clustering columns, it's possible we added nothing
+                        if (clusterings.isEmpty())
+                            continue;
+
+                        // Query the gathered index hits. We still need to filter stale hits from the resulting query.
+                        ClusteringIndexNamesFilter filter = new ClusteringIndexNamesFilter(clusterings.build(), false);
+                        dataCmd = SinglePartitionReadCommand.create(index.baseCfs.metadata,
+                                                                    command.nowInSec(),
+                                                                    command.columnFilter(),
+                                                                    command.rowFilter(),
+                                                                    DataLimits.NONE,
+                                                                    partitionKey,
+                                                                    filter);
+                    }
+
                     @SuppressWarnings("resource") // We close right away if empty, and if it's assign to next it will be called either
                     // by the next caller of next, or through closing this iterator is this come before.
                     UnfilteredRowIterator dataIter =
-                        filterStaleEntries(dataCmd.queryMemtableAndDisk(index.baseCfs,
-                                                                        orderGroup.baseReadOpOrderGroup()),
+                        filterStaleEntries(dataCmd.queryMemtableAndDisk(index.baseCfs, executionController),
                                            indexKey.getKey(),
                                            entries,
-                                           orderGroup.writeOpOrderGroup(),
+                                           executionController.writeOpOrderGroup(),
                                            command.nowInSec());
 
                     if (dataIter.isEmpty())
@@ -182,11 +201,12 @@
     {
         entries.forEach(entry ->
             index.deleteStaleEntry(entry.indexValue,
-                                     entry.indexClustering,
-                                     new DeletionTime(entry.timestamp, nowInSec),
-                                     writeOp));
+                                   entry.indexClustering,
+                                   new DeletionTime(entry.timestamp, nowInSec),
+                                   writeOp));
     }
 
+    // We assume all rows in dataIter belong to the same partition.
     private UnfilteredRowIterator filterStaleEntries(UnfilteredRowIterator dataIter,
                                                      final ByteBuffer indexValue,
                                                      final List<IndexEntry> entries,
@@ -207,50 +227,75 @@
             });
         }
 
+        UnfilteredRowIterator iteratorToReturn = null;
         ClusteringComparator comparator = dataIter.metadata().comparator;
-        class Transform extends Transformation
+        if (isStaticColumn())
         {
-            private int entriesIdx;
+            if (entries.size() != 1)
+                throw new AssertionError("A partition should have at most one entry in a static column index");
 
-            @Override
-            public Row applyToRow(Row row)
+            iteratorToReturn = dataIter;
+            if (index.isStale(dataIter.staticRow(), indexValue, nowInSec))
             {
-                IndexEntry entry = findEntry(row.clustering());
-                if (!index.isStale(row, indexValue, nowInSec))
-                    return row;
-
-                staleEntries.add(entry);
-                return null;
+                // The entry is stale, so we return no rows for this partition.
+                staleEntries.addAll(entries);
+                iteratorToReturn = UnfilteredRowIterators.noRowsIterator(dataIter.metadata(),
+                                                                         dataIter.partitionKey(),
+                                                                         Rows.EMPTY_STATIC_ROW,
+                                                                         dataIter.partitionLevelDeletion(),
+                                                                         dataIter.isReverseOrder());
             }
-
-            private IndexEntry findEntry(Clustering clustering)
+            deleteAllEntries(staleEntries, writeOp, nowInSec);
+        }
+        else
+        {
+            class Transform extends Transformation
             {
-                assert entriesIdx < entries.size();
-                while (entriesIdx < entries.size())
+                private int entriesIdx;
+
+                @Override
+                public Row applyToRow(Row row)
                 {
-                    IndexEntry entry = entries.get(entriesIdx++);
-                    // The entries are in clustering order. So that the requested entry should be the
-                    // next entry, the one at 'entriesIdx'. However, we can have stale entries, entries
-                    // that have no corresponding row in the base table typically because of a range
-                    // tombstone or partition level deletion. Delete such stale entries.
-                    int cmp = comparator.compare(entry.indexedEntryClustering, clustering);
-                    assert cmp <= 0; // this would means entries are not in clustering order, which shouldn't happen
-                    if (cmp == 0)
-                        return entry;
-                    else
-                        staleEntries.add(entry);
-                }
-                // entries correspond to the rows we've queried, so we shouldn't have a row that has no corresponding entry.
-                throw new AssertionError();
-            }
+                    IndexEntry entry = findEntry(row.clustering());
+                    if (!index.isStale(row, indexValue, nowInSec))
+                        return row;
 
-            @Override
-            public void onPartitionClose()
-            {
-                deleteAllEntries(staleEntries, writeOp, nowInSec);
+                    staleEntries.add(entry);
+                    return null;
+                }
+
+                private IndexEntry findEntry(Clustering clustering)
+                {
+                    assert entriesIdx < entries.size();
+                    while (entriesIdx < entries.size())
+                    {
+                        IndexEntry entry = entries.get(entriesIdx++);
+                        // The entries are in clustering order, so the requested entry should be the
+                        // next entry, the one at 'entriesIdx'. However, we can have stale entries, entries
+                        // that have no corresponding row in the base table, typically because of a range
+                        // tombstone or partition-level deletion. Delete such stale entries.
+                        // For a static column, we only need to compare the partition key; otherwise we compare
+                        // the whole clustering.
+                        int cmp = comparator.compare(entry.indexedEntryClustering, clustering);
+                        assert cmp <= 0; // a positive value would mean entries are not in clustering order, which shouldn't happen
+                        if (cmp == 0)
+                            return entry;
+                        else
+                            staleEntries.add(entry);
+                    }
+                    // entries correspond to the rows we've queried, so we shouldn't have a row that has no corresponding entry.
+                    throw new AssertionError();
+                }
+
+                @Override
+                public void onPartitionClose()
+                {
+                    deleteAllEntries(staleEntries, writeOp, nowInSec);
+                }
             }
+            iteratorToReturn = Transformation.apply(dataIter, new Transform());
         }
 
-        return Transformation.apply(dataIter, new Transform());
+        return iteratorToReturn;
     }
 }
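Two different read commands come out of the branch in queryDataFromIndex() above: for a static-column index a single hit means the whole base partition is relevant, so the data read uses a slice filter over all clusterings; for regular columns the gathered hits are turned into a names filter. A minimal sketch of that choice, reusing only the calls visible in the hunk (the wrapper class and method name are hypothetical):

// Hypothetical extraction of the filter choice made in queryDataFromIndex() above.
import java.util.NavigableSet;

import org.apache.cassandra.db.Clustering;
import org.apache.cassandra.db.Slices;
import org.apache.cassandra.db.filter.ClusteringIndexFilter;
import org.apache.cassandra.db.filter.ClusteringIndexNamesFilter;
import org.apache.cassandra.db.filter.ClusteringIndexSliceFilter;

final class IndexDataFilters
{
    private IndexDataFilters() {}

    static ClusteringIndexFilter forIndexHits(boolean indexedColumnIsStatic,
                                              NavigableSet<Clustering> hitClusterings)
    {
        // Static column: any hit implies the whole partition, so read every row.
        if (indexedColumnIsStatic)
            return new ClusteringIndexSliceFilter(Slices.ALL, false);

        // Regular column: only read the rows named by the gathered index hits.
        return new ClusteringIndexNamesFilter(hitClusterings, false);
    }
}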
diff --git a/src/java/org/apache/cassandra/index/internal/composites/RegularColumnIndex.java b/src/java/org/apache/cassandra/index/internal/composites/RegularColumnIndex.java
index f1dc3af..9cbfe03 100644
--- a/src/java/org/apache/cassandra/index/internal/composites/RegularColumnIndex.java
+++ b/src/java/org/apache/cassandra/index/internal/composites/RegularColumnIndex.java
@@ -68,22 +68,34 @@
         for (int i = 0; i < prefix.size(); i++)
             builder.add(prefix.get(i));
 
+        // Note: if indexing a static column, prefix will be Clustering.STATIC_CLUSTERING,
+        // so the Clustering obtained from builder::build will contain a value only for
+        // the partition key. At query time though, this is all that's needed as the entire
+        // base table partition should be returned for any matching index entry.
         return builder;
     }
 
     public IndexEntry decodeEntry(DecoratedKey indexedValue, Row indexEntry)
     {
         Clustering clustering = indexEntry.clustering();
-        ClusteringComparator baseComparator = baseCfs.getComparator();
-        CBuilder builder = CBuilder.create(baseComparator);
-        for (int i = 0; i < baseComparator.size(); i++)
-            builder.add(clustering.get(i + 1));
+
+        Clustering indexedEntryClustering = null;
+        if (getIndexedColumn().isStatic())
+            indexedEntryClustering = Clustering.STATIC_CLUSTERING;
+        else
+        {
+            ClusteringComparator baseComparator = baseCfs.getComparator();
+            CBuilder builder = CBuilder.create(baseComparator);
+            for (int i = 0; i < baseComparator.size(); i++)
+                builder.add(clustering.get(i + 1));
+            indexedEntryClustering = builder.build();
+        }
 
         return new IndexEntry(indexedValue,
                                 clustering,
                                 indexEntry.primaryKeyLivenessInfo().timestamp(),
                                 clustering.get(0),
-                                builder.build());
+                                indexedEntryClustering);
     }
 
     public boolean isStale(Row data, ByteBuffer indexValue, int nowInSec)
diff --git a/src/java/org/apache/cassandra/index/internal/keys/KeysSearcher.java b/src/java/org/apache/cassandra/index/internal/keys/KeysSearcher.java
index 189b652..0169d3f 100644
--- a/src/java/org/apache/cassandra/index/internal/keys/KeysSearcher.java
+++ b/src/java/org/apache/cassandra/index/internal/keys/KeysSearcher.java
@@ -49,7 +49,7 @@
     protected UnfilteredPartitionIterator queryDataFromIndex(final DecoratedKey indexKey,
                                                              final RowIterator indexHits,
                                                              final ReadCommand command,
-                                                             final ReadOrderGroup orderGroup)
+                                                             final ReadExecutionController executionController)
     {
         assert indexHits.staticRow() == Rows.EMPTY_STATIC_ROW;
 
@@ -104,11 +104,10 @@
                     @SuppressWarnings("resource") // filterIfStale closes it's iterator if either it materialize it or if it returns null.
                                                   // Otherwise, we close right away if empty, and if it's assigned to next it will be called either
                                                   // by the next caller of next, or through closing this iterator is this come before.
-                    UnfilteredRowIterator dataIter = filterIfStale(dataCmd.queryMemtableAndDisk(index.baseCfs,
-                                                                                                orderGroup.baseReadOpOrderGroup()),
+                    UnfilteredRowIterator dataIter = filterIfStale(dataCmd.queryMemtableAndDisk(index.baseCfs, executionController),
                                                                    hit,
                                                                    indexKey.getKey(),
-                                                                   orderGroup.writeOpOrderGroup(),
+                                                                   executionController.writeOpOrderGroup(),
                                                                    isForThrift(),
                                                                    command.nowInSec());
 
@@ -139,7 +138,7 @@
 
     private ColumnFilter getExtendedFilter(ColumnFilter initialFilter)
     {
-        if (command.columnFilter().includes(index.getIndexedColumn()))
+        if (command.columnFilter().fetches(index.getIndexedColumn()))
             return initialFilter;
 
         ColumnFilter.Builder builder = ColumnFilter.selectionBuilder();
@@ -161,7 +160,7 @@
             // is the indexed name and so we need to materialize the partition.
             ImmutableBTreePartition result = ImmutableBTreePartition.create(iterator);
             iterator.close();
-            Row data = result.getRow(new Clustering(index.getIndexedColumn().name.bytes));
+            Row data = result.getRow(Clustering.make(index.getIndexedColumn().name.bytes));
             if (data == null)
                 return null;
 
@@ -173,14 +172,14 @@
             {
                 // Index is stale, remove the index entry and ignore
                 index.deleteStaleEntry(index.getIndexCfs().decorateKey(indexedValue),
-                                         new Clustering(index.getIndexedColumn().name.bytes),
+                                         Clustering.make(index.getIndexedColumn().name.bytes),
                                          new DeletionTime(indexHit.primaryKeyLivenessInfo().timestamp(), nowInSec),
                                          writeOp);
                 return null;
             }
             else
             {
-                if (command.columnFilter().includes(index.getIndexedColumn()))
+                if (command.columnFilter().fetches(index.getIndexedColumn()))
                     return result.unfilteredIterator();
 
                 // The query on the base table used an extended column filter to ensure that the
diff --git a/src/java/org/apache/cassandra/index/sasi/SASIIndex.java b/src/java/org/apache/cassandra/index/sasi/SASIIndex.java
new file mode 100644
index 0000000..0b9d900
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/SASIIndex.java
@@ -0,0 +1,343 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi;
+
+import java.util.*;
+import java.util.concurrent.Callable;
+import java.util.function.BiFunction;
+
+import com.googlecode.concurrenttrees.common.Iterables;
+
+import org.apache.cassandra.config.*;
+import org.apache.cassandra.cql3.Operator;
+import org.apache.cassandra.cql3.statements.IndexTarget;
+import org.apache.cassandra.db.*;
+import org.apache.cassandra.db.compaction.CompactionManager;
+import org.apache.cassandra.db.compaction.OperationType;
+import org.apache.cassandra.db.filter.RowFilter;
+import org.apache.cassandra.db.lifecycle.Tracker;
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.db.partitions.PartitionIterator;
+import org.apache.cassandra.db.partitions.PartitionUpdate;
+import org.apache.cassandra.db.rows.Row;
+import org.apache.cassandra.dht.Murmur3Partitioner;
+import org.apache.cassandra.exceptions.ConfigurationException;
+import org.apache.cassandra.exceptions.InvalidRequestException;
+import org.apache.cassandra.index.Index;
+import org.apache.cassandra.index.IndexRegistry;
+import org.apache.cassandra.index.SecondaryIndexBuilder;
+import org.apache.cassandra.index.TargetParser;
+import org.apache.cassandra.index.sasi.conf.ColumnIndex;
+import org.apache.cassandra.index.sasi.conf.IndexMode;
+import org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder.Mode;
+import org.apache.cassandra.index.sasi.disk.PerSSTableIndexWriter;
+import org.apache.cassandra.index.sasi.plan.QueryPlan;
+import org.apache.cassandra.index.transactions.IndexTransaction;
+import org.apache.cassandra.io.sstable.Descriptor;
+import org.apache.cassandra.io.sstable.format.SSTableFlushObserver;
+import org.apache.cassandra.io.sstable.format.SSTableReader;
+import org.apache.cassandra.notifications.*;
+import org.apache.cassandra.schema.IndexMetadata;
+import org.apache.cassandra.utils.FBUtilities;
+import org.apache.cassandra.utils.Pair;
+import org.apache.cassandra.utils.concurrent.OpOrder;
+
+public class SASIIndex implements Index, INotificationConsumer
+{
+    private static class SASIIndexBuildingSupport implements IndexBuildingSupport
+    {
+        public SecondaryIndexBuilder getIndexBuildTask(ColumnFamilyStore cfs,
+                                                       Set<Index> indexes,
+                                                       Collection<SSTableReader> sstablesToRebuild)
+        {
+            NavigableMap<SSTableReader, Map<ColumnDefinition, ColumnIndex>> sstables = new TreeMap<>((a, b) -> {
+                return Integer.compare(a.descriptor.generation, b.descriptor.generation);
+            });
+
+            indexes.stream()
+                   .filter((i) -> i instanceof SASIIndex)
+                   .forEach((i) -> {
+                       SASIIndex sasi = (SASIIndex) i;
+                       sstablesToRebuild.stream()
+                                        .filter((sstable) -> !sasi.index.hasSSTable(sstable))
+                                        .forEach((sstable) -> {
+                                            Map<ColumnDefinition, ColumnIndex> toBuild = sstables.get(sstable);
+                                            if (toBuild == null)
+                                                sstables.put(sstable, (toBuild = new HashMap<>()));
+
+                                            toBuild.put(sasi.index.getDefinition(), sasi.index);
+                                        });
+                   });
+
+            return new SASIIndexBuilder(cfs, sstables);
+        }
+    }
+
+    private static final SASIIndexBuildingSupport INDEX_BUILDER_SUPPORT = new SASIIndexBuildingSupport();
+
+    private final ColumnFamilyStore baseCfs;
+    private final IndexMetadata config;
+    private final ColumnIndex index;
+
+    public SASIIndex(ColumnFamilyStore baseCfs, IndexMetadata config)
+    {
+        this.baseCfs = baseCfs;
+        this.config = config;
+
+        ColumnDefinition column = TargetParser.parse(baseCfs.metadata, config).left;
+        this.index = new ColumnIndex(baseCfs.metadata.getKeyValidator(), column, config);
+
+        Tracker tracker = baseCfs.getTracker();
+        tracker.subscribe(this);
+
+        SortedMap<SSTableReader, Map<ColumnDefinition, ColumnIndex>> toRebuild = new TreeMap<>((a, b)
+                                                -> Integer.compare(a.descriptor.generation, b.descriptor.generation));
+
+        for (SSTableReader sstable : index.init(tracker.getView().liveSSTables()))
+        {
+            Map<ColumnDefinition, ColumnIndex> perSSTable = toRebuild.get(sstable);
+            if (perSSTable == null)
+                toRebuild.put(sstable, (perSSTable = new HashMap<>()));
+
+            perSSTable.put(index.getDefinition(), index);
+        }
+
+        CompactionManager.instance.submitIndexBuild(new SASIIndexBuilder(baseCfs, toRebuild));
+    }
+
+    public static Map<String, String> validateOptions(Map<String, String> options, CFMetaData cfm)
+    {
+        if (!(cfm.partitioner instanceof Murmur3Partitioner))
+            throw new ConfigurationException("SASI only supports Murmur3Partitioner.");
+
+        String targetColumn = options.get("target");
+        if (targetColumn == null)
+            throw new ConfigurationException("unknown target column");
+
+        Pair<ColumnDefinition, IndexTarget.Type> target = TargetParser.parse(cfm, targetColumn);
+        if (target == null)
+            throw new ConfigurationException("failed to retrieve target column for: " + targetColumn);
+
+        if (target.left.isComplex())
+            throw new ConfigurationException("complex columns are not yet supported by SASI");
+
+        IndexMode.validateAnalyzer(options);
+
+        IndexMode mode = IndexMode.getMode(target.left, options);
+        if (mode.mode == Mode.SPARSE)
+        {
+            if (mode.isLiteral)
+                throw new ConfigurationException("SPARSE mode is only supported on non-literal columns.");
+
+            if (mode.isAnalyzed)
+                throw new ConfigurationException("SPARSE mode doesn't support analyzers.");
+        }
+
+        return Collections.emptyMap();
+    }
+
+    public void register(IndexRegistry registry)
+    {
+        registry.registerIndex(this);
+    }
+
+    public IndexMetadata getIndexMetadata()
+    {
+        return config;
+    }
+
+    public Callable<?> getInitializationTask()
+    {
+        return null;
+    }
+
+    public Callable<?> getMetadataReloadTask(IndexMetadata indexMetadata)
+    {
+        return null;
+    }
+
+    public Callable<?> getBlockingFlushTask()
+    {
+        return null; // SASI indexes are flushed alongside the memtable
+    }
+
+    public Callable<?> getInvalidateTask()
+    {
+        return getTruncateTask(FBUtilities.timestampMicros());
+    }
+
+    public Callable<?> getTruncateTask(long truncatedAt)
+    {
+        return () -> {
+            index.dropData(truncatedAt);
+            return null;
+        };
+    }
+
+    public boolean shouldBuildBlocking()
+    {
+        return true;
+    }
+
+    public Optional<ColumnFamilyStore> getBackingTable()
+    {
+        return Optional.empty();
+    }
+
+    public boolean indexes(PartitionColumns columns)
+    {
+        return columns.contains(index.getDefinition());
+    }
+
+    public boolean dependsOn(ColumnDefinition column)
+    {
+        return index.getDefinition().compareTo(column) == 0;
+    }
+
+    public boolean supportsExpression(ColumnDefinition column, Operator operator)
+    {
+        return dependsOn(column) && index.supports(operator);
+    }
+
+    public AbstractType<?> customExpressionValueType()
+    {
+        return null;
+    }
+
+    public RowFilter getPostIndexQueryFilter(RowFilter filter)
+    {
+        return filter.withoutExpressions();
+    }
+
+    public long getEstimatedResultRows()
+    {
+        // this is temporary (until a proper QueryPlan is integrated into Cassandra)
+        // and allows us to prioritize SASI indexes, if there are any in the query, since they
+        // are going to be more efficient to query and intersect than built-in indexes.
+        return Long.MIN_VALUE;
+    }
+
+    public void validate(PartitionUpdate update) throws InvalidRequestException
+    {}
+
+    public Indexer indexerFor(DecoratedKey key, PartitionColumns columns, int nowInSec, OpOrder.Group opGroup, IndexTransaction.Type transactionType)
+    {
+        return new Indexer()
+        {
+            public void begin()
+            {}
+
+            public void partitionDelete(DeletionTime deletionTime)
+            {}
+
+            public void rangeTombstone(RangeTombstone tombstone)
+            {}
+
+            public void insertRow(Row row)
+            {
+                if (isNewData())
+                    adjustMemtableSize(index.index(key, row), opGroup);
+            }
+
+            public void updateRow(Row oldRow, Row newRow)
+            {
+                insertRow(newRow);
+            }
+
+            public void removeRow(Row row)
+            {}
+
+            public void finish()
+            {}
+
+            // we are only interested in data from the memtable;
+            // everything else is handled by SSTableWriter observers
+            private boolean isNewData()
+            {
+                return transactionType == IndexTransaction.Type.UPDATE;
+            }
+
+            public void adjustMemtableSize(long additionalSpace, OpOrder.Group opGroup)
+            {
+                baseCfs.getTracker().getView().getCurrentMemtable().getAllocator().onHeap().allocate(additionalSpace, opGroup);
+            }
+        };
+    }
+
+    public Searcher searcherFor(ReadCommand command) throws InvalidRequestException
+    {
+        CFMetaData config = command.metadata();
+        ColumnFamilyStore cfs = Schema.instance.getColumnFamilyStoreInstance(config.cfId);
+        return controller -> new QueryPlan(cfs, command, DatabaseDescriptor.getRangeRpcTimeout()).execute(controller);
+    }
+
+    public SSTableFlushObserver getFlushObserver(Descriptor descriptor, OperationType opType)
+    {
+        return newWriter(baseCfs.metadata.getKeyValidator(), descriptor, Collections.singletonMap(index.getDefinition(), index), opType);
+    }
+
+    public BiFunction<PartitionIterator, ReadCommand, PartitionIterator> postProcessorFor(ReadCommand command)
+    {
+        return (partitionIterator, readCommand) -> partitionIterator;
+    }
+
+    public IndexBuildingSupport getBuildTaskSupport()
+    {
+        return INDEX_BUILDER_SUPPORT;
+    }
+
+    public void handleNotification(INotification notification, Object sender)
+    {
+        // unfortunately, we can only check the type of notification via instanceof :(
+        if (notification instanceof SSTableAddedNotification)
+        {
+            SSTableAddedNotification notice = (SSTableAddedNotification) notification;
+            index.update(Collections.<SSTableReader>emptyList(), Iterables.toList(notice.added));
+        }
+        else if (notification instanceof SSTableListChangedNotification)
+        {
+            SSTableListChangedNotification notice = (SSTableListChangedNotification) notification;
+            index.update(notice.removed, notice.added);
+        }
+        else if (notification instanceof MemtableRenewedNotification)
+        {
+            index.switchMemtable();
+        }
+        else if (notification instanceof MemtableSwitchedNotification)
+        {
+            index.switchMemtable(((MemtableSwitchedNotification) notification).memtable);
+        }
+        else if (notification instanceof MemtableDiscardedNotification)
+        {
+            index.discardMemtable(((MemtableDiscardedNotification) notification).memtable);
+        }
+    }
+
+    public ColumnIndex getIndex()
+    {
+        return index;
+    }
+
+    protected static PerSSTableIndexWriter newWriter(AbstractType<?> keyValidator,
+                                                     Descriptor descriptor,
+                                                     Map<ColumnDefinition, ColumnIndex> indexes,
+                                                     OperationType opType)
+    {
+        return new PerSSTableIndexWriter(keyValidator, descriptor, opType, indexes);
+    }
+}
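SASIIndex builds the same generation-ordered map of sstables to per-column indexes in two places (the constructor and SASIIndexBuildingSupport), each time with a get/put-if-null sequence. A small sketch of that idiom under the same assumptions as the patch; the class and method names here are hypothetical, and computeIfAbsent stands in for the manual null check:

// Hypothetical helper showing the sstable-to-index map idiom used twice in SASIIndex above.
import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

import org.apache.cassandra.config.ColumnDefinition;
import org.apache.cassandra.index.sasi.conf.ColumnIndex;
import org.apache.cassandra.io.sstable.format.SSTableReader;

final class RebuildMap
{
    private RebuildMap() {}

    // Order sstables by generation so older data is (re)indexed first.
    static NavigableMap<SSTableReader, Map<ColumnDefinition, ColumnIndex>> newGenerationOrdered()
    {
        return new TreeMap<>(Comparator.comparingInt((SSTableReader s) -> s.descriptor.generation));
    }

    static void add(NavigableMap<SSTableReader, Map<ColumnDefinition, ColumnIndex>> map,
                    SSTableReader sstable,
                    ColumnIndex index)
    {
        map.computeIfAbsent(sstable, s -> new HashMap<>()).put(index.getDefinition(), index);
    }
}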
diff --git a/src/java/org/apache/cassandra/index/sasi/SASIIndexBuilder.java b/src/java/org/apache/cassandra/index/sasi/SASIIndexBuilder.java
new file mode 100644
index 0000000..de8d69b
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/SASIIndexBuilder.java
@@ -0,0 +1,151 @@
+/*
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+package org.apache.cassandra.index.sasi;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.*;
+
+import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.db.ColumnFamilyStore;
+import org.apache.cassandra.db.DecoratedKey;
+import org.apache.cassandra.db.RowIndexEntry;
+import org.apache.cassandra.db.compaction.CompactionInfo;
+import org.apache.cassandra.db.compaction.CompactionInterruptedException;
+import org.apache.cassandra.db.compaction.OperationType;
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.index.SecondaryIndexBuilder;
+import org.apache.cassandra.index.sasi.conf.ColumnIndex;
+import org.apache.cassandra.index.sasi.disk.PerSSTableIndexWriter;
+import org.apache.cassandra.io.FSReadError;
+import org.apache.cassandra.io.sstable.KeyIterator;
+import org.apache.cassandra.io.sstable.SSTable;
+import org.apache.cassandra.io.sstable.SSTableIdentityIterator;
+import org.apache.cassandra.io.sstable.format.SSTableReader;
+import org.apache.cassandra.io.util.RandomAccessReader;
+import org.apache.cassandra.utils.ByteBufferUtil;
+import org.apache.cassandra.utils.UUIDGen;
+
+class SASIIndexBuilder extends SecondaryIndexBuilder
+{
+    private final ColumnFamilyStore cfs;
+    private final UUID compactionId = UUIDGen.getTimeUUID();
+
+    private final SortedMap<SSTableReader, Map<ColumnDefinition, ColumnIndex>> sstables;
+
+    private long bytesProcessed = 0;
+    private final long totalSizeInBytes;
+
+    public SASIIndexBuilder(ColumnFamilyStore cfs, SortedMap<SSTableReader, Map<ColumnDefinition, ColumnIndex>> sstables)
+    {
+        long totalIndexBytes = 0;
+        for (SSTableReader sstable : sstables.keySet())
+            totalIndexBytes += getPrimaryIndexLength(sstable);
+
+        this.cfs = cfs;
+        this.sstables = sstables;
+        this.totalSizeInBytes = totalIndexBytes;
+    }
+
+    public void build()
+    {
+        AbstractType<?> keyValidator = cfs.metadata.getKeyValidator();
+        for (Map.Entry<SSTableReader, Map<ColumnDefinition, ColumnIndex>> e : sstables.entrySet())
+        {
+            SSTableReader sstable = e.getKey();
+            Map<ColumnDefinition, ColumnIndex> indexes = e.getValue();
+
+            try (RandomAccessReader dataFile = sstable.openDataReader())
+            {
+                PerSSTableIndexWriter indexWriter = SASIIndex.newWriter(keyValidator, sstable.descriptor, indexes, OperationType.COMPACTION);
+
+                long previousKeyPosition = 0;
+                try (KeyIterator keys = new KeyIterator(sstable.descriptor, cfs.metadata))
+                {
+                    while (keys.hasNext())
+                    {
+                        if (isStopRequested())
+                            throw new CompactionInterruptedException(getCompactionInfo());
+
+                        final DecoratedKey key = keys.next();
+                        final long keyPosition = keys.getKeyPosition();
+
+                        indexWriter.startPartition(key, keyPosition);
+
+                        try
+                        {
+                            RowIndexEntry indexEntry = sstable.getPosition(key, SSTableReader.Operator.EQ);
+                            dataFile.seek(indexEntry.position);
+                            ByteBufferUtil.readWithShortLength(dataFile); // key
+
+                            try (SSTableIdentityIterator partition = new SSTableIdentityIterator(sstable, dataFile, key))
+                            {
+                                // if the row has statics attached, it has to be indexed separately
+                                indexWriter.nextUnfilteredCluster(partition.staticRow());
+
+                                while (partition.hasNext())
+                                    indexWriter.nextUnfilteredCluster(partition.next());
+                            }
+                        }
+                        catch (IOException ex)
+                        {
+                            throw new FSReadError(ex, sstable.getFilename());
+                        }
+
+                        bytesProcessed += keyPosition - previousKeyPosition;
+                        previousKeyPosition = keyPosition;
+                    }
+
+                    completeSSTable(indexWriter, sstable, indexes.values());
+                }
+            }
+        }
+    }
+
+    public CompactionInfo getCompactionInfo()
+    {
+        return new CompactionInfo(cfs.metadata,
+                                  OperationType.INDEX_BUILD,
+                                  bytesProcessed,
+                                  totalSizeInBytes,
+                                  compactionId);
+    }
+
+    private long getPrimaryIndexLength(SSTable sstable)
+    {
+        File primaryIndex = new File(sstable.getIndexFilename());
+        return primaryIndex.exists() ? primaryIndex.length() : 0;
+    }
+
+    private void completeSSTable(PerSSTableIndexWriter indexWriter, SSTableReader sstable, Collection<ColumnIndex> indexes)
+    {
+        indexWriter.complete();
+
+        for (ColumnIndex index : indexes)
+        {
+            File tmpIndex = new File(sstable.descriptor.filenameFor(index.getComponent()));
+            if (!tmpIndex.exists()) // no data was inserted into the index for given sstable
+                continue;
+
+            index.update(Collections.<SSTableReader>emptyList(), Collections.singletonList(sstable));
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/SSTableIndex.java b/src/java/org/apache/cassandra/index/sasi/SSTableIndex.java
new file mode 100644
index 0000000..c67c39c
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/SSTableIndex.java
@@ -0,0 +1,198 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import org.apache.cassandra.db.DecoratedKey;
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.index.sasi.conf.ColumnIndex;
+import org.apache.cassandra.index.sasi.disk.OnDiskIndex;
+import org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder;
+import org.apache.cassandra.index.sasi.disk.Token;
+import org.apache.cassandra.index.sasi.plan.Expression;
+import org.apache.cassandra.index.sasi.utils.RangeIterator;
+import org.apache.cassandra.io.FSReadError;
+import org.apache.cassandra.io.sstable.format.SSTableReader;
+import org.apache.cassandra.io.util.FileUtils;
+import org.apache.cassandra.utils.concurrent.Ref;
+
+import org.apache.commons.lang3.builder.HashCodeBuilder;
+
+import com.google.common.base.Function;
+
+public class SSTableIndex
+{
+    private final ColumnIndex columnIndex;
+    private final Ref<SSTableReader> sstableRef;
+    private final SSTableReader sstable;
+    private final OnDiskIndex index;
+    private final AtomicInteger references = new AtomicInteger(1);
+    private final AtomicBoolean obsolete = new AtomicBoolean(false);
+
+    public SSTableIndex(ColumnIndex index, File indexFile, SSTableReader referent)
+    {
+        this.columnIndex = index;
+        this.sstableRef = referent.tryRef();
+        this.sstable = sstableRef.get();
+
+        if (sstable == null)
+            throw new IllegalStateException("Couldn't acquire reference to the sstable: " + referent);
+
+        AbstractType<?> validator = columnIndex.getValidator();
+
+        assert validator != null;
+        assert indexFile.exists() : String.format("SSTable %s should have index %s.",
+                sstable.getFilename(),
+                columnIndex.getIndexName());
+
+        this.index = new OnDiskIndex(indexFile, validator, new DecoratedKeyFetcher(sstable));
+    }
+
+    public OnDiskIndexBuilder.Mode mode()
+    {
+        return index.mode();
+    }
+
+    public boolean hasMarkedPartials()
+    {
+        return index.hasMarkedPartials();
+    }
+
+    public ByteBuffer minTerm()
+    {
+        return index.minTerm();
+    }
+
+    public ByteBuffer maxTerm()
+    {
+        return index.maxTerm();
+    }
+
+    public ByteBuffer minKey()
+    {
+        return index.minKey();
+    }
+
+    public ByteBuffer maxKey()
+    {
+        return index.maxKey();
+    }
+
+    public RangeIterator<Long, Token> search(Expression expression)
+    {
+        return index.search(expression);
+    }
+
+    public SSTableReader getSSTable()
+    {
+        return sstable;
+    }
+
+    public String getPath()
+    {
+        return index.getIndexPath();
+    }
+
+    public boolean reference()
+    {
+        while (true)
+        {
+            int n = references.get();
+            if (n <= 0)
+                return false;
+            if (references.compareAndSet(n, n + 1))
+                return true;
+        }
+    }
+
+    public void release()
+    {
+        int n = references.decrementAndGet();
+        if (n == 0)
+        {
+            FileUtils.closeQuietly(index);
+            sstableRef.release();
+            if (obsolete.get() || sstableRef.globalCount() == 0)
+                FileUtils.delete(index.getIndexPath());
+        }
+    }
+
+    public void markObsolete()
+    {
+        obsolete.getAndSet(true);
+        release();
+    }
+
+    public boolean isObsolete()
+    {
+        return obsolete.get();
+    }
+
+    public boolean equals(Object o)
+    {
+        return o instanceof SSTableIndex && index.getIndexPath().equals(((SSTableIndex) o).index.getIndexPath());
+    }
+
+    public int hashCode()
+    {
+        return new HashCodeBuilder().append(index.getIndexPath()).build();
+    }
+
+    public String toString()
+    {
+        return String.format("SSTableIndex(column: %s, SSTable: %s)", columnIndex.getColumnName(), sstable.descriptor);
+    }
+
+    private static class DecoratedKeyFetcher implements Function<Long, DecoratedKey>
+    {
+        private final SSTableReader sstable;
+
+        DecoratedKeyFetcher(SSTableReader reader)
+        {
+            sstable = reader;
+        }
+
+        public DecoratedKey apply(Long offset)
+        {
+            try
+            {
+                return sstable.keyAt(offset);
+            }
+            catch (IOException e)
+            {
+                throw new FSReadError(new IOException("Failed to read key from " + sstable.descriptor, e), sstable.getFilename());
+            }
+        }
+
+        public int hashCode()
+        {
+            return sstable.descriptor.hashCode();
+        }
+
+        public boolean equals(Object other)
+        {
+            return other instanceof DecoratedKeyFetcher
+                    && sstable.descriptor.equals(((DecoratedKeyFetcher) other).sstable.descriptor);
+        }
+    }
+}
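SSTableIndex guards the on-disk index with a manual reference count: reference() refuses to resurrect an index whose count has already reached zero, and the final release() closes the OnDiskIndex and, if the index was marked obsolete, deletes the file. A standalone sketch of that pattern, independent of the Cassandra types above (all names here are illustrative):

import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative refcount guard mirroring SSTableIndex.reference()/release()/markObsolete().
final class RefCountedResource
{
    private final AtomicInteger references = new AtomicInteger(1); // creator holds one reference
    private final AtomicBoolean obsolete = new AtomicBoolean(false);
    private final Runnable close;   // e.g. close the on-disk index
    private final Runnable delete;  // e.g. delete the index file

    RefCountedResource(Runnable close, Runnable delete)
    {
        this.close = close;
        this.delete = delete;
    }

    // CAS loop: only succeed if the resource hasn't already been fully released.
    boolean reference()
    {
        while (true)
        {
            int n = references.get();
            if (n <= 0)
                return false;
            if (references.compareAndSet(n, n + 1))
                return true;
        }
    }

    void release()
    {
        if (references.decrementAndGet() == 0)
        {
            close.run();
            if (obsolete.get())
                delete.run();
        }
    }

    // Drop the creator's reference and remember that the backing file should go away.
    void markObsolete()
    {
        obsolete.set(true);
        release();
    }
}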
diff --git a/src/java/org/apache/cassandra/index/sasi/Term.java b/src/java/org/apache/cassandra/index/sasi/Term.java
new file mode 100644
index 0000000..8f42d58
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/Term.java
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi;
+
+import java.nio.ByteBuffer;
+
+import org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder.TermSize;
+import org.apache.cassandra.index.sasi.utils.MappedBuffer;
+import org.apache.cassandra.db.marshal.AbstractType;
+
+import static org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder.IS_PARTIAL_BIT;
+
+public class Term
+{
+    protected final MappedBuffer content;
+    protected final TermSize termSize;
+
+    private final boolean hasMarkedPartials;
+
+    public Term(MappedBuffer content, TermSize size, boolean hasMarkedPartials)
+    {
+        this.content = content;
+        this.termSize = size;
+        this.hasMarkedPartials = hasMarkedPartials;
+    }
+
+    public ByteBuffer getTerm()
+    {
+        long offset = termSize.isConstant() ? content.position() : content.position() + 2;
+        int  length = termSize.isConstant() ? termSize.size : readLength(content.position());
+
+        return content.getPageRegion(offset, length);
+    }
+
+    public boolean isPartial()
+    {
+        return !termSize.isConstant()
+               && hasMarkedPartials
+               && (content.getShort(content.position()) & (1 << IS_PARTIAL_BIT)) != 0;
+    }
+
+    public long getDataOffset()
+    {
+        long position = content.position();
+        return position + (termSize.isConstant() ? termSize.size : 2 + readLength(position));
+    }
+
+    public int compareTo(AbstractType<?> comparator, ByteBuffer query)
+    {
+        return compareTo(comparator, query, true);
+    }
+
+    public int compareTo(AbstractType<?> comparator, ByteBuffer query, boolean checkFully)
+    {
+        long position = content.position();
+        int padding = termSize.isConstant() ? 0 : 2;
+        int len = termSize.isConstant() ? termSize.size : readLength(position);
+
+        return content.comparePageTo(position + padding, checkFully ? len : Math.min(len, query.remaining()), comparator, query);
+    }
+
+    private short readLength(long position)
+    {
+        return (short) (content.getShort(position) & ~(1 << IS_PARTIAL_BIT));
+    }
+}
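Variable-length terms carry a 2-byte length prefix, and one bit of that prefix (IS_PARTIAL_BIT) marks terms the analyzer split, which is what isPartial() and readLength() above decode. A standalone sketch of that masking, assuming the same single-flag-bit layout; the bit position used here is only an assumption for the example, the real value comes from OnDiskIndexBuilder.IS_PARTIAL_BIT:

import java.nio.ByteBuffer;

// Illustrative decoding of a length prefix that reserves one bit as a "partial term" flag.
final class TermHeader
{
    static final int IS_PARTIAL_BIT = 15; // assumed flag position for this sketch

    private TermHeader() {}

    static boolean isPartial(short header)
    {
        return (header & (1 << IS_PARTIAL_BIT)) != 0;
    }

    static int length(short header)
    {
        // mask off the flag bit and the sign-extended high bits from short-to-int promotion
        return (header & 0xFFFF) & ~(1 << IS_PARTIAL_BIT);
    }

    public static void main(String[] args)
    {
        // A 12-byte partial term: flag bit set, length bits carry 12.
        ByteBuffer buf = ByteBuffer.allocate(2).putShort((short) ((1 << IS_PARTIAL_BIT) | 12));
        short header = buf.getShort(0);
        System.out.println(isPartial(header) + " " + length(header)); // prints: true 12
    }
}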
diff --git a/src/java/org/apache/cassandra/index/sasi/TermIterator.java b/src/java/org/apache/cassandra/index/sasi/TermIterator.java
new file mode 100644
index 0000000..5b08a56
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/TermIterator.java
@@ -0,0 +1,218 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi;
+
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.*;
+import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.atomic.AtomicLong;
+
+import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder;
+import org.apache.cassandra.index.sasi.disk.Token;
+import org.apache.cassandra.index.sasi.plan.Expression;
+import org.apache.cassandra.index.sasi.utils.RangeUnionIterator;
+import org.apache.cassandra.index.sasi.utils.RangeIterator;
+import org.apache.cassandra.io.util.FileUtils;
+
+import com.google.common.util.concurrent.MoreExecutors;
+import com.google.common.util.concurrent.Uninterruptibles;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class TermIterator extends RangeIterator<Long, Token>
+{
+    private static final Logger logger = LoggerFactory.getLogger(TermIterator.class);
+
+    private static final ThreadLocal<ExecutorService> SEARCH_EXECUTOR = new ThreadLocal<ExecutorService>()
+    {
+        public ExecutorService initialValue()
+        {
+            final String currentThread = Thread.currentThread().getName();
+            final int concurrencyFactor = DatabaseDescriptor.searchConcurrencyFactor();
+
+            logger.info("Search Concurrency Factor is set to {} for {}", concurrencyFactor, currentThread);
+
+            return (concurrencyFactor <= 1)
+                    ? MoreExecutors.newDirectExecutorService()
+                    : Executors.newFixedThreadPool(concurrencyFactor, new ThreadFactory()
+            {
+                public final AtomicInteger count = new AtomicInteger();
+
+                public Thread newThread(Runnable task)
+                {
+                    return new Thread(task, currentThread + "-SEARCH-" + count.incrementAndGet()) {{ setDaemon(true); }};
+                }
+            });
+        }
+    };
+
+    private final Expression expression;
+
+    private final RangeIterator<Long, Token> union;
+    private final Set<SSTableIndex> referencedIndexes;
+
+    private TermIterator(Expression e,
+                         RangeIterator<Long, Token> union,
+                         Set<SSTableIndex> referencedIndexes)
+    {
+        super(union.getMinimum(), union.getMaximum(), union.getCount());
+
+        this.expression = e;
+        this.union = union;
+        this.referencedIndexes = referencedIndexes;
+    }
+
+    @SuppressWarnings("resource")
+    public static TermIterator build(final Expression e, Set<SSTableIndex> perSSTableIndexes)
+    {
+        final List<RangeIterator<Long, Token>> tokens = new CopyOnWriteArrayList<>();
+        final AtomicLong tokenCount = new AtomicLong(0);
+
+        RangeIterator<Long, Token> memtableIterator = e.index.searchMemtable(e);
+        if (memtableIterator != null)
+        {
+            tokens.add(memtableIterator);
+            tokenCount.addAndGet(memtableIterator.getCount());
+        }
+
+        final Set<SSTableIndex> referencedIndexes = new CopyOnWriteArraySet<>();
+
+        try
+        {
+            final CountDownLatch latch = new CountDownLatch(perSSTableIndexes.size());
+            final ExecutorService searchExecutor = SEARCH_EXECUTOR.get();
+
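+            // one latch count per per-sstable index: counted down immediately when an index cannot
+            // be referenced, and in the finally block of every submitted search task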
+            for (final SSTableIndex index : perSSTableIndexes)
+            {
+                if (e.getOp() == Expression.Op.PREFIX &&
+                    index.mode() == OnDiskIndexBuilder.Mode.CONTAINS && !index.hasMarkedPartials())
+                    throw new UnsupportedOperationException(String.format("The index %s has not yet been upgraded " +
+                                                                          "to support prefix queries in CONTAINS mode. " +
+                                                                          "Wait for compaction or rebuild the index.",
+                                                                          index.getPath()));
+
+
+                if (!index.reference())
+                {
+                    latch.countDown();
+                    continue;
+                }
+
+                // add to referenced right after the reference was acquired,
+                // that helps to release index if something goes bad inside of the search
+                referencedIndexes.add(index);
+
+                searchExecutor.submit((Runnable) () -> {
+                    try
+                    {
+                        e.checkpoint();
+
+                        RangeIterator<Long, Token> keyIterator = index.search(e);
+                        if (keyIterator == null)
+                        {
+                            releaseIndex(referencedIndexes, index);
+                            return;
+                        }
+
+                        tokens.add(keyIterator);
+                        tokenCount.getAndAdd(keyIterator.getCount());
+                    }
+                    catch (Throwable e1)
+                    {
+                        releaseIndex(referencedIndexes, index);
+
+                        if (logger.isDebugEnabled())
+                            logger.debug(String.format("Failed to search index %s, skipping.", index.getPath()), e1);
+                    }
+                    finally
+                    {
+                        latch.countDown();
+                    }
+                });
+            }
+
+            Uninterruptibles.awaitUninterruptibly(latch);
+
+            // checkpoint right away after all indexes complete search because we might have crossed the quota
+            e.checkpoint();
+
+            RangeIterator<Long, Token> ranges = RangeUnionIterator.build(tokens);
+            return ranges == null ? null : new TermIterator(e, ranges, referencedIndexes);
+        }
+        catch (Throwable ex)
+        {
+            // if execution quota was exceeded while opening indexes or something else happened
+            // local (yet to be tracked) indexes should be released first before re-throwing exception
+            referencedIndexes.forEach(TermIterator::releaseQuietly);
+
+            throw ex;
+        }
+    }
+
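+    // iteration is delegated to the union of memtable and sstable token ranges; the expression is
+    // checkpointed after every step so the query's execution quota is enforced while results are consumed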
+    protected Token computeNext()
+    {
+        try
+        {
+            return union.hasNext() ? union.next() : endOfData();
+        }
+        finally
+        {
+            expression.checkpoint();
+        }
+    }
+
+    protected void performSkipTo(Long nextToken)
+    {
+        try
+        {
+            union.skipTo(nextToken);
+        }
+        finally
+        {
+            expression.checkpoint();
+        }
+    }
+
+    public void close()
+    {
+        FileUtils.closeQuietly(union);
+        referencedIndexes.forEach(TermIterator::releaseQuietly);
+        referencedIndexes.clear();
+    }
+
+    private static void releaseIndex(Set<SSTableIndex> indexes, SSTableIndex index)
+    {
+        indexes.remove(index);
+        releaseQuietly(index);
+    }
+
+    private static void releaseQuietly(SSTableIndex index)
+    {
+        try
+        {
+            index.release();
+        }
+        catch (Throwable e)
+        {
+            logger.error(String.format("Failed to release index %s", index.getPath()), e);
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/analyzer/AbstractAnalyzer.java b/src/java/org/apache/cassandra/index/sasi/analyzer/AbstractAnalyzer.java
new file mode 100644
index 0000000..31c66cc
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/analyzer/AbstractAnalyzer.java
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.analyzer;
+
+import java.nio.ByteBuffer;
+import java.text.Normalizer;
+import java.util.Iterator;
+import java.util.Map;
+
+import org.apache.cassandra.db.marshal.AbstractType;
+
+public abstract class AbstractAnalyzer implements Iterator<ByteBuffer>
+{
+    protected ByteBuffer next = null;
+
+    public ByteBuffer next()
+    {
+        return next;
+    }
+
+    public void remove()
+    {
+        throw new UnsupportedOperationException();
+    }
+
+    public abstract void init(Map<String, String> options, AbstractType validator);
+
+    public abstract void reset(ByteBuffer input);
+
+    /**
+     * @return true if current analyzer provides text tokenization, false otherwise.
+     */
+    public boolean isTokenizing()
+    {
+        return false;
+    }
+
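+    /**
+     * Normalizes the input to Unicode NFC so that canonically equivalent
+     * representations of the same text index and compare equally.
+     */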
+    public static String normalize(String original)
+    {
+        return Normalizer.isNormalized(original, Normalizer.Form.NFC)
+                ? original
+                : Normalizer.normalize(original, Normalizer.Form.NFC);
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/analyzer/NoOpAnalyzer.java b/src/java/org/apache/cassandra/index/sasi/analyzer/NoOpAnalyzer.java
new file mode 100644
index 0000000..9939a13
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/analyzer/NoOpAnalyzer.java
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.analyzer;
+
+import java.nio.ByteBuffer;
+import java.util.Map;
+
+import org.apache.cassandra.db.marshal.AbstractType;
+
+/**
+ * Default no-op analyzer. The iterator yields exactly once,
+ * returning the unmodified input.
+ */
+public class NoOpAnalyzer extends AbstractAnalyzer
+{
+    private ByteBuffer input;
+    private boolean hasNext = false;
+
+    public void init(Map<String, String> options, AbstractType validator)
+    {}
+
+    public boolean hasNext()
+    {
+        if (hasNext)
+        {
+            this.next = input;
+            this.hasNext = false;
+            return true;
+        }
+        return false;
+    }
+
+    public void reset(ByteBuffer input)
+    {
+        this.next = null;
+        this.input = input;
+        this.hasNext = true;
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/analyzer/NonTokenizingAnalyzer.java b/src/java/org/apache/cassandra/index/sasi/analyzer/NonTokenizingAnalyzer.java
new file mode 100644
index 0000000..676b304
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/analyzer/NonTokenizingAnalyzer.java
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.analyzer;
+
+import java.nio.ByteBuffer;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.cassandra.index.sasi.analyzer.filter.BasicResultFilters;
+import org.apache.cassandra.index.sasi.analyzer.filter.FilterPipelineBuilder;
+import org.apache.cassandra.index.sasi.analyzer.filter.FilterPipelineExecutor;
+import org.apache.cassandra.index.sasi.analyzer.filter.FilterPipelineTask;
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.db.marshal.AsciiType;
+import org.apache.cassandra.db.marshal.UTF8Type;
+import org.apache.cassandra.serializers.MarshalException;
+import org.apache.cassandra.utils.ByteBufferUtil;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Analyzer that does *not* tokenize the input. Optionally applies
+ * filters to the output as defined in the analyzer options.
+ */
+public class NonTokenizingAnalyzer extends AbstractAnalyzer
+{
+    private static final Logger logger = LoggerFactory.getLogger(NonTokenizingAnalyzer.class);
+
+    private static final Set<AbstractType<?>> VALID_ANALYZABLE_TYPES = new HashSet<AbstractType<?>>()
+    {{
+            add(UTF8Type.instance);
+            add(AsciiType.instance);
+    }};
+
+    private AbstractType validator;
+    private NonTokenizingOptions options;
+    private FilterPipelineTask filterPipeline;
+
+    private ByteBuffer input;
+    private boolean hasNext = false;
+
+    public void init(Map<String, String> options, AbstractType validator)
+    {
+        init(NonTokenizingOptions.buildFromMap(options), validator);
+    }
+
+    public void init(NonTokenizingOptions tokenizerOptions, AbstractType validator)
+    {
+        this.validator = validator;
+        this.options = tokenizerOptions;
+        this.filterPipeline = getFilterPipeline();
+    }
+
+    public boolean hasNext()
+    {
+        // check that we know how to handle the input, otherwise bail
+        if (!VALID_ANALYZABLE_TYPES.contains(validator))
+            return false;
+
+        if (hasNext)
+        {
+            String inputStr;
+
+            try
+            {
+                inputStr = validator.getString(input);
+                if (inputStr == null)
+                    throw new MarshalException(String.format("'null' deserialized value for %s with %s", ByteBufferUtil.bytesToHex(input), validator));
+
+                Object pipelineRes = FilterPipelineExecutor.execute(filterPipeline, inputStr);
+                if (pipelineRes == null)
+                    return false;
+
+                next = validator.fromString(normalize((String) pipelineRes));
+                return true;
+            }
+            catch (MarshalException e)
+            {
+                logger.error("Failed to deserialize value with " + validator, e);
+                return false;
+            }
+            finally
+            {
+                hasNext = false;
+            }
+        }
+
+        return false;
+    }
+
+    public void reset(ByteBuffer input)
+    {
+        this.next = null;
+        this.input = input;
+        this.hasNext = true;
+    }
+
+    private FilterPipelineTask getFilterPipeline()
+    {
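+        // normalize_lowercase/normalize_uppercase apply while the index stays case sensitive
+        // (the default); a case-insensitive index always lower-cases its output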
+        FilterPipelineBuilder builder = new FilterPipelineBuilder(new BasicResultFilters.NoOperation());
+        if (options.isCaseSensitive() && options.shouldLowerCaseOutput())
+            builder = builder.add("to_lower", new BasicResultFilters.LowerCase());
+        if (options.isCaseSensitive() && options.shouldUpperCaseOutput())
+            builder = builder.add("to_upper", new BasicResultFilters.UpperCase());
+        if (!options.isCaseSensitive())
+            builder = builder.add("to_lower", new BasicResultFilters.LowerCase());
+        return builder.build();
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/analyzer/NonTokenizingOptions.java b/src/java/org/apache/cassandra/index/sasi/analyzer/NonTokenizingOptions.java
new file mode 100644
index 0000000..303087b
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/analyzer/NonTokenizingOptions.java
@@ -0,0 +1,147 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.analyzer;
+
+import java.util.Map;
+
+public class NonTokenizingOptions
+{
+    public static final String NORMALIZE_LOWERCASE = "normalize_lowercase";
+    public static final String NORMALIZE_UPPERCASE = "normalize_uppercase";
+    public static final String CASE_SENSITIVE = "case_sensitive";
+
+    private boolean caseSensitive;
+    private boolean upperCaseOutput;
+    private boolean lowerCaseOutput;
+
+    public boolean isCaseSensitive()
+    {
+        return caseSensitive;
+    }
+
+    public void setCaseSensitive(boolean caseSensitive)
+    {
+        this.caseSensitive = caseSensitive;
+    }
+
+    public boolean shouldUpperCaseOutput()
+    {
+        return upperCaseOutput;
+    }
+
+    public void setUpperCaseOutput(boolean upperCaseOutput)
+    {
+        this.upperCaseOutput = upperCaseOutput;
+    }
+
+    public boolean shouldLowerCaseOutput()
+    {
+        return lowerCaseOutput;
+    }
+
+    public void setLowerCaseOutput(boolean lowerCaseOutput)
+    {
+        this.lowerCaseOutput = lowerCaseOutput;
+    }
+
+    public static class OptionsBuilder
+    {
+        private boolean caseSensitive = true;
+        private boolean upperCaseOutput = false;
+        private boolean lowerCaseOutput = false;
+
+        public OptionsBuilder()
+        {
+        }
+
+        public OptionsBuilder caseSensitive(boolean caseSensitive)
+        {
+            this.caseSensitive = caseSensitive;
+            return this;
+        }
+
+        public OptionsBuilder upperCaseOutput(boolean upperCaseOutput)
+        {
+            this.upperCaseOutput = upperCaseOutput;
+            return this;
+        }
+
+        public OptionsBuilder lowerCaseOutput(boolean lowerCaseOutput)
+        {
+            this.lowerCaseOutput = lowerCaseOutput;
+            return this;
+        }
+
+        public NonTokenizingOptions build()
+        {
+            if (lowerCaseOutput && upperCaseOutput)
+                throw new IllegalArgumentException("Options to normalize terms cannot be " +
+                        "both uppercase and lowercase at the same time");
+
+            NonTokenizingOptions options = new NonTokenizingOptions();
+            options.setCaseSensitive(caseSensitive);
+            options.setUpperCaseOutput(upperCaseOutput);
+            options.setLowerCaseOutput(lowerCaseOutput);
+            return options;
+        }
+    }
+
+    public static NonTokenizingOptions buildFromMap(Map<String, String> optionsMap)
+    {
+        OptionsBuilder optionsBuilder = new OptionsBuilder();
+
+        if (optionsMap.containsKey(CASE_SENSITIVE) && (optionsMap.containsKey(NORMALIZE_LOWERCASE)
+                || optionsMap.containsKey(NORMALIZE_UPPERCASE)))
+            throw new IllegalArgumentException("case_sensitive option cannot be specified together " +
+                    "with either normalize_lowercase or normalize_uppercase");
+
+        for (Map.Entry<String, String> entry : optionsMap.entrySet())
+        {
+            switch (entry.getKey())
+            {
+                case NORMALIZE_LOWERCASE:
+                {
+                    boolean bool = Boolean.parseBoolean(entry.getValue());
+                    optionsBuilder = optionsBuilder.lowerCaseOutput(bool);
+                    break;
+                }
+                case NORMALIZE_UPPERCASE:
+                {
+                    boolean bool = Boolean.parseBoolean(entry.getValue());
+                    optionsBuilder = optionsBuilder.upperCaseOutput(bool);
+                    break;
+                }
+                case CASE_SENSITIVE:
+                {
+                    boolean bool = Boolean.parseBoolean(entry.getValue());
+                    optionsBuilder = optionsBuilder.caseSensitive(bool);
+                    break;
+                }
+            }
+        }
+        return optionsBuilder.build();
+    }
+
+    public static NonTokenizingOptions getDefaultOptions()
+    {
+        return new OptionsBuilder()
+                .caseSensitive(true).lowerCaseOutput(false)
+                .upperCaseOutput(false)
+                .build();
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/analyzer/SUPPLEMENTARY.jflex-macro b/src/java/org/apache/cassandra/index/sasi/analyzer/SUPPLEMENTARY.jflex-macro
new file mode 100644
index 0000000..f5bf68e
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/analyzer/SUPPLEMENTARY.jflex-macro
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+// Generated using ICU4J 52.1.0.0
+// by org.apache.lucene.analysis.icu.GenerateJFlexSupplementaryMacros
+
+
+ALetterSupp = (
+	  ([\ud83b][\uDE00-\uDE03\uDE05-\uDE1F\uDE21\uDE22\uDE24\uDE27\uDE29-\uDE32\uDE34-\uDE37\uDE39\uDE3B\uDE42\uDE47\uDE49\uDE4B\uDE4D-\uDE4F\uDE51\uDE52\uDE54\uDE57\uDE59\uDE5B\uDE5D\uDE5F\uDE61\uDE62\uDE64\uDE67-\uDE6A\uDE6C-\uDE72\uDE74-\uDE77\uDE79-\uDE7C\uDE7E\uDE80-\uDE89\uDE8B-\uDE9B\uDEA1-\uDEA3\uDEA5-\uDEA9\uDEAB-\uDEBB])
+	| ([\ud81a][\uDC00-\uDE38])
+	| ([\ud81b][\uDF00-\uDF44\uDF50\uDF93-\uDF9F])
+	| ([\ud835][\uDC00-\uDC54\uDC56-\uDC9C\uDC9E\uDC9F\uDCA2\uDCA5\uDCA6\uDCA9-\uDCAC\uDCAE-\uDCB9\uDCBB\uDCBD-\uDCC3\uDCC5-\uDD05\uDD07-\uDD0A\uDD0D-\uDD14\uDD16-\uDD1C\uDD1E-\uDD39\uDD3B-\uDD3E\uDD40-\uDD44\uDD46\uDD4A-\uDD50\uDD52-\uDEA5\uDEA8-\uDEC0\uDEC2-\uDEDA\uDEDC-\uDEFA\uDEFC-\uDF14\uDF16-\uDF34\uDF36-\uDF4E\uDF50-\uDF6E\uDF70-\uDF88\uDF8A-\uDFA8\uDFAA-\uDFC2\uDFC4-\uDFCB])
+	| ([\ud80d][\uDC00-\uDC2E])
+	| ([\ud80c][\uDC00-\uDFFF])
+	| ([\ud809][\uDC00-\uDC62])
+	| ([\ud808][\uDC00-\uDF6E])
+	| ([\ud805][\uDE80-\uDEAA])
+	| ([\ud804][\uDC03-\uDC37\uDC83-\uDCAF\uDCD0-\uDCE8\uDD03-\uDD26\uDD83-\uDDB2\uDDC1-\uDDC4])
+	| ([\ud801][\uDC00-\uDC9D])
+	| ([\ud800][\uDC00-\uDC0B\uDC0D-\uDC26\uDC28-\uDC3A\uDC3C\uDC3D\uDC3F-\uDC4D\uDC50-\uDC5D\uDC80-\uDCFA\uDD40-\uDD74\uDE80-\uDE9C\uDEA0-\uDED0\uDF00-\uDF1E\uDF30-\uDF4A\uDF80-\uDF9D\uDFA0-\uDFC3\uDFC8-\uDFCF\uDFD1-\uDFD5])
+	| ([\ud803][\uDC00-\uDC48])
+	| ([\ud802][\uDC00-\uDC05\uDC08\uDC0A-\uDC35\uDC37\uDC38\uDC3C\uDC3F-\uDC55\uDD00-\uDD15\uDD20-\uDD39\uDD80-\uDDB7\uDDBE\uDDBF\uDE00\uDE10-\uDE13\uDE15-\uDE17\uDE19-\uDE33\uDE60-\uDE7C\uDF00-\uDF35\uDF40-\uDF55\uDF60-\uDF72])
+)
+FormatSupp = (
+	  ([\ud804][\uDCBD])
+	| ([\ud834][\uDD73-\uDD7A])
+	| ([\udb40][\uDC01\uDC20-\uDC7F])
+)
+NumericSupp = (
+	  ([\ud805][\uDEC0-\uDEC9])
+	| ([\ud804][\uDC66-\uDC6F\uDCF0-\uDCF9\uDD36-\uDD3F\uDDD0-\uDDD9])
+	| ([\ud835][\uDFCE-\uDFFF])
+	| ([\ud801][\uDCA0-\uDCA9])
+)
+ExtendSupp = (
+	  ([\ud81b][\uDF51-\uDF7E\uDF8F-\uDF92])
+	| ([\ud805][\uDEAB-\uDEB7])
+	| ([\ud804][\uDC00-\uDC02\uDC38-\uDC46\uDC80-\uDC82\uDCB0-\uDCBA\uDD00-\uDD02\uDD27-\uDD34\uDD80-\uDD82\uDDB3-\uDDC0])
+	| ([\ud834][\uDD65-\uDD69\uDD6D-\uDD72\uDD7B-\uDD82\uDD85-\uDD8B\uDDAA-\uDDAD\uDE42-\uDE44])
+	| ([\ud800][\uDDFD])
+	| ([\udb40][\uDD00-\uDDEF])
+	| ([\ud802][\uDE01-\uDE03\uDE05\uDE06\uDE0C-\uDE0F\uDE38-\uDE3A\uDE3F])
+)
+KatakanaSupp = (
+	  ([\ud82c][\uDC00])
+)
+MidLetterSupp = (
+	  []
+)
+MidNumSupp = (
+	  []
+)
+MidNumLetSupp = (
+	  []
+)
+ExtendNumLetSupp = (
+	  []
+)
+ExtendNumLetSupp = (
+	  []
+)
+ComplexContextSupp = (
+	  []
+)
+HanSupp = (
+	  ([\ud87e][\uDC00-\uDE1D])
+	| ([\ud86b][\uDC00-\uDFFF])
+	| ([\ud86a][\uDC00-\uDFFF])
+	| ([\ud869][\uDC00-\uDED6\uDF00-\uDFFF])
+	| ([\ud868][\uDC00-\uDFFF])
+	| ([\ud86e][\uDC00-\uDC1D])
+	| ([\ud86d][\uDC00-\uDF34\uDF40-\uDFFF])
+	| ([\ud86c][\uDC00-\uDFFF])
+	| ([\ud863][\uDC00-\uDFFF])
+	| ([\ud862][\uDC00-\uDFFF])
+	| ([\ud861][\uDC00-\uDFFF])
+	| ([\ud860][\uDC00-\uDFFF])
+	| ([\ud867][\uDC00-\uDFFF])
+	| ([\ud866][\uDC00-\uDFFF])
+	| ([\ud865][\uDC00-\uDFFF])
+	| ([\ud864][\uDC00-\uDFFF])
+	| ([\ud858][\uDC00-\uDFFF])
+	| ([\ud859][\uDC00-\uDFFF])
+	| ([\ud85a][\uDC00-\uDFFF])
+	| ([\ud85b][\uDC00-\uDFFF])
+	| ([\ud85c][\uDC00-\uDFFF])
+	| ([\ud85d][\uDC00-\uDFFF])
+	| ([\ud85e][\uDC00-\uDFFF])
+	| ([\ud85f][\uDC00-\uDFFF])
+	| ([\ud850][\uDC00-\uDFFF])
+	| ([\ud851][\uDC00-\uDFFF])
+	| ([\ud852][\uDC00-\uDFFF])
+	| ([\ud853][\uDC00-\uDFFF])
+	| ([\ud854][\uDC00-\uDFFF])
+	| ([\ud855][\uDC00-\uDFFF])
+	| ([\ud856][\uDC00-\uDFFF])
+	| ([\ud857][\uDC00-\uDFFF])
+	| ([\ud849][\uDC00-\uDFFF])
+	| ([\ud848][\uDC00-\uDFFF])
+	| ([\ud84b][\uDC00-\uDFFF])
+	| ([\ud84a][\uDC00-\uDFFF])
+	| ([\ud84d][\uDC00-\uDFFF])
+	| ([\ud84c][\uDC00-\uDFFF])
+	| ([\ud84f][\uDC00-\uDFFF])
+	| ([\ud84e][\uDC00-\uDFFF])
+	| ([\ud841][\uDC00-\uDFFF])
+	| ([\ud840][\uDC00-\uDFFF])
+	| ([\ud843][\uDC00-\uDFFF])
+	| ([\ud842][\uDC00-\uDFFF])
+	| ([\ud845][\uDC00-\uDFFF])
+	| ([\ud844][\uDC00-\uDFFF])
+	| ([\ud847][\uDC00-\uDFFF])
+	| ([\ud846][\uDC00-\uDFFF])
+)
+HiraganaSupp = (
+	  ([\ud83c][\uDE00])
+	| ([\ud82c][\uDC01])
+)
+SingleQuoteSupp = (
+	  []
+)
+DoubleQuoteSupp = (
+	  []
+)
+HebrewLetterSupp = (
+	  []
+)
+RegionalIndicatorSupp = (
+	  ([\ud83c][\uDDE6-\uDDFF])
+)
diff --git a/src/java/org/apache/cassandra/index/sasi/analyzer/StandardAnalyzer.java b/src/java/org/apache/cassandra/index/sasi/analyzer/StandardAnalyzer.java
new file mode 100644
index 0000000..069164c
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/analyzer/StandardAnalyzer.java
@@ -0,0 +1,199 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.analyzer;
+
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.Reader;
+import java.nio.ByteBuffer;
+import java.util.Map;
+
+import org.apache.cassandra.index.sasi.analyzer.filter.*;
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.db.marshal.UTF8Type;
+import org.apache.cassandra.io.util.DataInputBuffer;
+import org.apache.cassandra.utils.ByteBufferUtil;
+
+import com.google.common.annotations.VisibleForTesting;
+
+import com.carrotsearch.hppc.IntObjectMap;
+import com.carrotsearch.hppc.IntObjectOpenHashMap;
+
+public class StandardAnalyzer extends AbstractAnalyzer
+{
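+    // Token types mirror the integer codes produced by the generated JFlex scanner
+    // (StandardTokenizerImpl); EOF is a sentinel value and never becomes an output token.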
+    public enum TokenType
+    {
+        EOF(-1),
+        ALPHANUM(0),
+        NUM(6),
+        SOUTHEAST_ASIAN(9),
+        IDEOGRAPHIC(10),
+        HIRAGANA(11),
+        KATAKANA(12),
+        HANGUL(13);
+
+        private static final IntObjectMap<TokenType> TOKENS = new IntObjectOpenHashMap<>();
+
+        static
+        {
+            for (TokenType type : TokenType.values())
+                TOKENS.put(type.value, type);
+        }
+
+        public final int value;
+
+        TokenType(int value)
+        {
+            this.value = value;
+        }
+
+        public int getValue()
+        {
+            return value;
+        }
+
+        public static TokenType fromValue(int val)
+        {
+            return TOKENS.get(val);
+        }
+    }
+
+    private AbstractType validator;
+
+    private StandardTokenizerInterface scanner;
+    private StandardTokenizerOptions options;
+    private FilterPipelineTask filterPipeline;
+
+    protected Reader inputReader = null;
+
+    public String getToken()
+    {
+        return scanner.getText();
+    }
+
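+    // advances the scanner to the next token whose length falls within the configured
+    // [minTokenLength, maxTokenLength] bounds; returns false once the end of input is reached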
+    public final boolean incrementToken() throws IOException
+    {
+        while(true)
+        {
+            TokenType currentTokenType = TokenType.fromValue(scanner.getNextToken());
+            if (currentTokenType == TokenType.EOF)
+                return false;
+            if (scanner.yylength() <= options.getMaxTokenLength()
+                    && scanner.yylength() >= options.getMinTokenLength())
+                return true;
+        }
+    }
+
+    protected String getFilteredCurrentToken() throws IOException
+    {
+        String token = getToken();
+        Object pipelineRes;
+
+        while (true)
+        {
+            pipelineRes = FilterPipelineExecutor.execute(filterPipeline, token);
+            if (pipelineRes != null)
+                break;
+
+            boolean hasMoreTokens = incrementToken();
+            if (!hasMoreTokens)
+                break;
+
+            token = getToken();
+        }
+
+        return (String) pipelineRes;
+    }
+
+    private FilterPipelineTask getFilterPipeline()
+    {
+        FilterPipelineBuilder builder = new FilterPipelineBuilder(new BasicResultFilters.NoOperation());
+        if (!options.isCaseSensitive() && options.shouldLowerCaseTerms())
+            builder = builder.add("to_lower", new BasicResultFilters.LowerCase());
+        if (!options.isCaseSensitive() && options.shouldUpperCaseTerms())
+            builder = builder.add("to_upper", new BasicResultFilters.UpperCase());
+        if (options.shouldIgnoreStopTerms())
+            builder = builder.add("skip_stop_words", new StopWordFilters.DefaultStopWordFilter(options.getLocale()));
+        if (options.shouldStemTerms())
+            builder = builder.add("term_stemming", new StemmingFilters.DefaultStemmingFilter(options.getLocale()));
+        return builder.build();
+    }
+
+    public void init(Map<String, String> options, AbstractType validator)
+    {
+        init(StandardTokenizerOptions.buildFromMap(options), validator);
+    }
+
+    @VisibleForTesting
+    protected void init(StandardTokenizerOptions options)
+    {
+        init(options, UTF8Type.instance);
+    }
+
+    public void init(StandardTokenizerOptions tokenizerOptions, AbstractType validator)
+    {
+        this.validator = validator;
+        this.options = tokenizerOptions;
+        this.filterPipeline = getFilterPipeline();
+
+        Reader reader = new InputStreamReader(new DataInputBuffer(ByteBufferUtil.EMPTY_BYTE_BUFFER, false));
+        this.scanner = new StandardTokenizerImpl(reader);
+        this.inputReader = reader;
+    }
+
+    public boolean hasNext()
+    {
+        try
+        {
+            if (incrementToken())
+            {
+                if (getFilteredCurrentToken() != null)
+                {
+                    this.next = validator.fromString(normalize(getFilteredCurrentToken()));
+                    return true;
+                }
+            }
+        }
+        catch (IOException e)
+        {
+            // scanning failed mid-stream; treat it as the end of the token stream
+        }
+
+        return false;
+    }
+
+    public void reset(ByteBuffer input)
+    {
+        this.next = null;
+        Reader reader = new InputStreamReader(new DataInputBuffer(input, false));
+        scanner.yyreset(reader);
+        this.inputReader = reader;
+    }
+
+    public void reset(InputStream input)
+    {
+        this.next = null;
+        Reader reader = new InputStreamReader(input);
+        scanner.yyreset(reader);
+        this.inputReader = reader;
+    }
+
+    public boolean isTokenizing()
+    {
+        return true;
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/analyzer/StandardTokenizerImpl.jflex b/src/java/org/apache/cassandra/index/sasi/analyzer/StandardTokenizerImpl.jflex
new file mode 100644
index 0000000..d0270ff
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/analyzer/StandardTokenizerImpl.jflex
@@ -0,0 +1,220 @@
+package org.apache.cassandra.index.sasi.analyzer;
+
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+import java.util.Arrays;
+
+/**
+ * This class implements Word Break rules from the Unicode Text Segmentation 
+ * algorithm, as specified in 
+ * <a href="http://unicode.org/reports/tr29/">Unicode Standard Annex #29</a>. ∂
+ * <p/>
+ * Tokens produced are of the following types:
+ * <ul>
+ *   <li>&lt;ALPHANUM&gt;: A sequence of alphabetic and numeric characters</li>
+ *   <li>&lt;NUM&gt;: A number</li>
+ *   <li>&lt;SOUTHEAST_ASIAN&gt;: A sequence of characters from South and Southeast
+ *       Asian languages, including Thai, Lao, Myanmar, and Khmer</li>
+ *   <li>&lt;IDEOGRAPHIC&gt;: A single CJKV ideographic character</li>
+ *   <li>&lt;HIRAGANA&gt;: A single hiragana character</li>
+ *   <li>&lt;KATAKANA&gt;: A sequence of katakana characters</li>
+ *   <li>&lt;HANGUL&gt;: A sequence of Hangul characters</li>
+ * </ul>
+ */
+%%
+
+%unicode 6.3
+%integer
+%final
+%public
+%class StandardTokenizerImpl
+%implements StandardTokenizerInterface
+%function getNextToken
+%char
+%buffer 4096
+
+%include SUPPLEMENTARY.jflex-macro
+ALetter           = (\p{WB:ALetter}                                     | {ALetterSupp})
+Format            = (\p{WB:Format}                                      | {FormatSupp})
+Numeric           = ([\p{WB:Numeric}[\p{Blk:HalfAndFullForms}&&\p{Nd}]] | {NumericSupp})
+Extend            = (\p{WB:Extend}                                      | {ExtendSupp})
+Katakana          = (\p{WB:Katakana}                                    | {KatakanaSupp})
+MidLetter         = (\p{WB:MidLetter}                                   | {MidLetterSupp})
+MidNum            = (\p{WB:MidNum}                                      | {MidNumSupp})
+MidNumLet         = (\p{WB:MidNumLet}                                   | {MidNumLetSupp})
+ExtendNumLet      = (\p{WB:ExtendNumLet}                                | {ExtendNumLetSupp})
+ComplexContext    = (\p{LB:Complex_Context}                             | {ComplexContextSupp})
+Han               = (\p{Script:Han}                                     | {HanSupp})
+Hiragana          = (\p{Script:Hiragana}                                | {HiraganaSupp})
+SingleQuote       = (\p{WB:Single_Quote}                                | {SingleQuoteSupp})
+DoubleQuote       = (\p{WB:Double_Quote}                                | {DoubleQuoteSupp})
+HebrewLetter      = (\p{WB:Hebrew_Letter}                               | {HebrewLetterSupp})
+RegionalIndicator = (\p{WB:Regional_Indicator}                          | {RegionalIndicatorSupp})
+HebrewOrALetter   = ({HebrewLetter} | {ALetter})
+
+// UAX#29 WB4. X (Extend | Format)* --> X
+//
+HangulEx            = [\p{Script:Hangul}&&[\p{WB:ALetter}\p{WB:Hebrew_Letter}]] ({Format} | {Extend})*
+HebrewOrALetterEx   = {HebrewOrALetter}                                         ({Format} | {Extend})*
+NumericEx           = {Numeric}                                                 ({Format} | {Extend})*
+KatakanaEx          = {Katakana}                                                ({Format} | {Extend})* 
+MidLetterEx         = ({MidLetter} | {MidNumLet} | {SingleQuote})               ({Format} | {Extend})* 
+MidNumericEx        = ({MidNum} | {MidNumLet} | {SingleQuote})                  ({Format} | {Extend})*
+ExtendNumLetEx      = {ExtendNumLet}                                            ({Format} | {Extend})*
+HanEx               = {Han}                                                     ({Format} | {Extend})*
+HiraganaEx          = {Hiragana}                                                ({Format} | {Extend})*
+SingleQuoteEx       = {SingleQuote}                                             ({Format} | {Extend})*                                            
+DoubleQuoteEx       = {DoubleQuote}                                             ({Format} | {Extend})*
+HebrewLetterEx      = {HebrewLetter}                                            ({Format} | {Extend})*
+RegionalIndicatorEx = {RegionalIndicator}                                       ({Format} | {Extend})*
+
+
+%{
+  /** Alphanumeric sequences */
+  public static final int WORD_TYPE = StandardAnalyzer.TokenType.ALPHANUM.value;
+  
+  /** Numbers */
+  public static final int NUMERIC_TYPE = StandardAnalyzer.TokenType.NUM.value;
+  
+  /**
+   * Chars in class \p{Line_Break = Complex_Context} are from South East Asian
+   * scripts (Thai, Lao, Myanmar, Khmer, etc.).  Sequences of these are kept 
+   * together as a single token rather than broken up, because the logic
+   * required to break them at word boundaries is too complex for UAX#29.
+   * <p>
+   * See Unicode Line Breaking Algorithm: http://www.unicode.org/reports/tr14/#SA
+   */
+  public static final int SOUTH_EAST_ASIAN_TYPE = StandardAnalyzer.TokenType.SOUTHEAST_ASIAN.value;
+  
+  public static final int IDEOGRAPHIC_TYPE = StandardAnalyzer.TokenType.IDEOGRAPHIC.value;
+  
+  public static final int HIRAGANA_TYPE = StandardAnalyzer.TokenType.HIRAGANA.value;
+  
+  public static final int KATAKANA_TYPE = StandardAnalyzer.TokenType.KATAKANA.value;
+  
+  public static final int HANGUL_TYPE = StandardAnalyzer.TokenType.HANGUL.value;
+
+  public final int yychar()
+  {
+    return yychar;
+  }
+
+  public String getText()
+  {
+    return String.valueOf(zzBuffer, zzStartRead, zzMarkedPos-zzStartRead);
+  }
+
+  public char[] getArray()
+  {
+    return Arrays.copyOfRange(zzBuffer, zzStartRead, zzMarkedPos);
+  }
+
+  public byte[] getBytes()
+  {
+    return getText().getBytes();
+  }
+
+%}
+
+%%
+
+// UAX#29 WB1.   sot   ÷
+//        WB2.     ÷   eot
+//
+<<EOF>> { return StandardAnalyzer.TokenType.EOF.value; }
+
+// UAX#29 WB8.   Numeric × Numeric
+//        WB11.  Numeric (MidNum | MidNumLet | Single_Quote) × Numeric
+//        WB12.  Numeric × (MidNum | MidNumLet | Single_Quote) Numeric
+//        WB13a. (ALetter | Hebrew_Letter | Numeric | Katakana | ExtendNumLet) × ExtendNumLet
+//        WB13b. ExtendNumLet × (ALetter | Hebrew_Letter | Numeric | Katakana) 
+//
+{ExtendNumLetEx}* {NumericEx} ( ( {ExtendNumLetEx}* | {MidNumericEx} ) {NumericEx} )* {ExtendNumLetEx}* 
+  { return NUMERIC_TYPE; }
+
+// subset of the below for typing purposes only!
+{HangulEx}+
+  { return HANGUL_TYPE; }
+  
+{KatakanaEx}+
+  { return KATAKANA_TYPE; }
+
+// UAX#29 WB5.   (ALetter | Hebrew_Letter) × (ALetter | Hebrew_Letter)
+//        WB6.   (ALetter | Hebrew_Letter) × (MidLetter | MidNumLet | Single_Quote) (ALetter | Hebrew_Letter)
+//        WB7.   (ALetter | Hebrew_Letter) (MidLetter | MidNumLet | Single_Quote) × (ALetter | Hebrew_Letter)
+//        WB7a.  Hebrew_Letter × Single_Quote
+//        WB7b.  Hebrew_Letter × Double_Quote Hebrew_Letter
+//        WB7c.  Hebrew_Letter Double_Quote × Hebrew_Letter
+//        WB9.   (ALetter | Hebrew_Letter) × Numeric
+//        WB10.  Numeric × (ALetter | Hebrew_Letter)
+//        WB13.  Katakana × Katakana
+//        WB13a. (ALetter | Hebrew_Letter | Numeric | Katakana | ExtendNumLet) × ExtendNumLet
+//        WB13b. ExtendNumLet × (ALetter | Hebrew_Letter | Numeric | Katakana) 
+//
+{ExtendNumLetEx}*  ( {KatakanaEx}          ( {ExtendNumLetEx}*   {KatakanaEx}                           )*
+                   | ( {HebrewLetterEx}    ( {SingleQuoteEx}     | {DoubleQuoteEx}  {HebrewLetterEx}    )
+                     | {NumericEx}         ( ( {ExtendNumLetEx}* | {MidNumericEx} ) {NumericEx}         )*
+                     | {HebrewOrALetterEx} ( ( {ExtendNumLetEx}* | {MidLetterEx}  ) {HebrewOrALetterEx} )*
+                     )+
+                   )
+({ExtendNumLetEx}+ ( {KatakanaEx}          ( {ExtendNumLetEx}*   {KatakanaEx}                           )*
+                   | ( {HebrewLetterEx}    ( {SingleQuoteEx}     | {DoubleQuoteEx}  {HebrewLetterEx}    )
+                     | {NumericEx}         ( ( {ExtendNumLetEx}* | {MidNumericEx} ) {NumericEx}         )*
+                     | {HebrewOrALetterEx} ( ( {ExtendNumLetEx}* | {MidLetterEx}  ) {HebrewOrALetterEx} )*
+                     )+
+                   )
+)*
+{ExtendNumLetEx}* 
+  { return WORD_TYPE; }
+
+
+// From UAX #29:
+//
+//    [C]haracters with the Line_Break property values of Contingent_Break (CB), 
+//    Complex_Context (SA/South East Asian), and XX (Unknown) are assigned word 
+//    boundary property values based on criteria outside of the scope of this
+//    annex.  That means that satisfactory treatment of languages like Chinese
+//    or Thai requires special handling.
+// 
+// In Unicode 6.3, only one character has the \p{Line_Break = Contingent_Break}
+// property: U+FFFC (  ) OBJECT REPLACEMENT CHARACTER.
+//
+// In the ICU implementation of UAX#29, \p{Line_Break = Complex_Context}
+// character sequences (from South East Asian scripts like Thai, Myanmar, Khmer,
+// Lao, etc.) are kept together.  This grammar does the same below.
+//
+// See also the Unicode Line Breaking Algorithm:
+//
+//    http://www.unicode.org/reports/tr14/#SA
+//
+{ComplexContext}+ { return SOUTH_EAST_ASIAN_TYPE; }
+
+// UAX#29 WB14.  Any ÷ Any
+//
+{HanEx} { return IDEOGRAPHIC_TYPE; }
+{HiraganaEx} { return HIRAGANA_TYPE; }
+
+
+// UAX#29 WB3.   CR × LF
+//        WB3a.  (Newline | CR | LF) ÷
+//        WB3b.  ÷ (Newline | CR | LF)
+//        WB13c. Regional_Indicator × Regional_Indicator
+//        WB14.  Any ÷ Any
+//
+{RegionalIndicatorEx} {RegionalIndicatorEx}+ | [^]
+  { /* Break so we don't hit fall-through warning: */ break; /* Not numeric, word, ideographic, hiragana, or SE Asian -- ignore it. */ }
diff --git a/src/java/org/apache/cassandra/index/sasi/analyzer/StandardTokenizerInterface.java b/src/java/org/apache/cassandra/index/sasi/analyzer/StandardTokenizerInterface.java
new file mode 100644
index 0000000..57e35d7
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/analyzer/StandardTokenizerInterface.java
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.analyzer;
+
+import java.io.IOException;
+import java.io.Reader;
+
+/**
+ * Internal interface for supporting versioned grammars.
+ */
+public interface StandardTokenizerInterface
+{
+
+    String getText();
+
+    char[] getArray();
+
+    byte[] getBytes();
+
+    /**
+     * Returns the current position.
+     */
+    int yychar();
+
+    /**
+     * Returns the length of the matched text region.
+     */
+    int yylength();
+
+    /**
+     * Resumes scanning until the next regular expression is matched,
+     * the end of input is encountered or an I/O-Error occurs.
+     *
+     * @return      the next token, {@link #YYEOF} on end of stream
+     * @exception   java.io.IOException  if any I/O-Error occurs
+     */
+    int getNextToken() throws IOException;
+
+    /**
+     * Resets the scanner to read from a new input stream.
+     * Does not close the old reader.
+     *
+     * All internal variables are reset, the old input stream
+     * <b>cannot</b> be reused (internal buffer is discarded and lost).
+     * Lexical state is set to <tt>ZZ_INITIAL</tt>.
+     *
+     * @param reader   the new input stream
+     */
+    void yyreset(Reader reader);
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/analyzer/StandardTokenizerOptions.java b/src/java/org/apache/cassandra/index/sasi/analyzer/StandardTokenizerOptions.java
new file mode 100644
index 0000000..2a5e4ef
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/analyzer/StandardTokenizerOptions.java
@@ -0,0 +1,272 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.analyzer;
+
+import java.util.Locale;
+import java.util.Map;
+
+/**
+ * Various options for controlling tokenization and enabling
+ * or disabling features
+ */
+public class StandardTokenizerOptions
+{
+    public static final String TOKENIZATION_ENABLE_STEMMING = "tokenization_enable_stemming";
+    public static final String TOKENIZATION_SKIP_STOP_WORDS = "tokenization_skip_stop_words";
+    public static final String TOKENIZATION_LOCALE = "tokenization_locale";
+    public static final String TOKENIZATION_NORMALIZE_LOWERCASE = "tokenization_normalize_lowercase";
+    public static final String TOKENIZATION_NORMALIZE_UPPERCASE = "tokenization_normalize_uppercase";
+
+    public static final int DEFAULT_MAX_TOKEN_LENGTH = 255;
+    public static final int DEFAULT_MIN_TOKEN_LENGTH = 0;
+
+    private boolean stemTerms;
+    private boolean ignoreStopTerms;
+    private Locale locale;
+    private boolean caseSensitive;
+    private boolean allTermsToUpperCase;
+    private boolean allTermsToLowerCase;
+    private int minTokenLength;
+    private int maxTokenLength;
+
+    public boolean shouldStemTerms()
+    {
+        return stemTerms;
+    }
+
+    public void setStemTerms(boolean stemTerms)
+    {
+        this.stemTerms = stemTerms;
+    }
+
+    public boolean shouldIgnoreStopTerms()
+    {
+        return ignoreStopTerms;
+    }
+
+    public void setIgnoreStopTerms(boolean ignoreStopTerms)
+    {
+        this.ignoreStopTerms = ignoreStopTerms;
+    }
+
+    public Locale getLocale()
+    {
+        return locale;
+    }
+
+    public void setLocale(Locale locale)
+    {
+        this.locale = locale;
+    }
+
+    public boolean isCaseSensitive()
+    {
+        return caseSensitive;
+    }
+
+    public void setCaseSensitive(boolean caseSensitive)
+    {
+        this.caseSensitive = caseSensitive;
+    }
+
+    public boolean shouldUpperCaseTerms()
+    {
+        return allTermsToUpperCase;
+    }
+
+    public void setAllTermsToUpperCase(boolean allTermsToUpperCase)
+    {
+        this.allTermsToUpperCase = allTermsToUpperCase;
+    }
+
+    public boolean shouldLowerCaseTerms()
+    {
+        return allTermsToLowerCase;
+    }
+
+    public void setAllTermsToLowerCase(boolean allTermsToLowerCase)
+    {
+        this.allTermsToLowerCase = allTermsToLowerCase;
+    }
+
+    public int getMinTokenLength()
+    {
+        return minTokenLength;
+    }
+
+    public void setMinTokenLength(int minTokenLength)
+    {
+        this.minTokenLength = minTokenLength;
+    }
+
+    public int getMaxTokenLength()
+    {
+        return maxTokenLength;
+    }
+
+    public void setMaxTokenLength(int maxTokenLength)
+    {
+        this.maxTokenLength = maxTokenLength;
+    }
+
+    public static class OptionsBuilder {
+        private boolean stemTerms;
+        private boolean ignoreStopTerms;
+        private Locale locale;
+        private boolean caseSensitive;
+        private boolean allTermsToUpperCase;
+        private boolean allTermsToLowerCase;
+        private int minTokenLength = DEFAULT_MIN_TOKEN_LENGTH;
+        private int maxTokenLength = DEFAULT_MAX_TOKEN_LENGTH;
+
+        public OptionsBuilder()
+        {
+        }
+
+        public OptionsBuilder stemTerms(boolean stemTerms)
+        {
+            this.stemTerms = stemTerms;
+            return this;
+        }
+
+        public OptionsBuilder ignoreStopTerms(boolean ignoreStopTerms)
+        {
+            this.ignoreStopTerms = ignoreStopTerms;
+            return this;
+        }
+
+        public OptionsBuilder useLocale(Locale locale)
+        {
+            this.locale = locale;
+            return this;
+        }
+
+        public OptionsBuilder caseSensitive(boolean caseSensitive)
+        {
+            this.caseSensitive = caseSensitive;
+            return this;
+        }
+
+        public OptionsBuilder alwaysUpperCaseTerms(boolean allTermsToUpperCase)
+        {
+            this.allTermsToUpperCase = allTermsToUpperCase;
+            return this;
+        }
+
+        public OptionsBuilder alwaysLowerCaseTerms(boolean allTermsToLowerCase)
+        {
+            this.allTermsToLowerCase = allTermsToLowerCase;
+            return this;
+        }
+
+        /**
+         * Set the min allowed token length.  Any token shorter
+         * than this is skipped.
+         */
+        public OptionsBuilder minTokenLength(int minTokenLength)
+        {
+            if (minTokenLength < 1)
+                throw new IllegalArgumentException("minTokenLength must be greater than zero");
+            this.minTokenLength = minTokenLength;
+            return this;
+        }
+
+        /**
+         * Set the max allowed token length.  Any token longer
+         * than this is skipped.
+         */
+        public OptionsBuilder maxTokenLength(int maxTokenLength)
+        {
+            if (maxTokenLength < 1)
+                throw new IllegalArgumentException("maxTokenLength must be greater than zero");
+            this.maxTokenLength = maxTokenLength;
+            return this;
+        }
+
+        public StandardTokenizerOptions build()
+        {
+            if(allTermsToLowerCase && allTermsToUpperCase)
+                throw new IllegalArgumentException("Options to normalize terms cannot be " +
+                        "both uppercase and lowercase at the same time");
+
+            StandardTokenizerOptions options = new StandardTokenizerOptions();
+            options.setIgnoreStopTerms(ignoreStopTerms);
+            options.setStemTerms(stemTerms);
+            options.setLocale(locale);
+            options.setCaseSensitive(caseSensitive);
+            options.setAllTermsToLowerCase(allTermsToLowerCase);
+            options.setAllTermsToUpperCase(allTermsToUpperCase);
+            options.setMinTokenLength(minTokenLength);
+            options.setMaxTokenLength(maxTokenLength);
+            return options;
+        }
+    }
+
+    public static StandardTokenizerOptions buildFromMap(Map<String, String> optionsMap)
+    {
+        OptionsBuilder optionsBuilder = new OptionsBuilder();
+
+        for (Map.Entry<String, String> entry : optionsMap.entrySet())
+        {
+            switch(entry.getKey())
+            {
+                case TOKENIZATION_ENABLE_STEMMING:
+                {
+                    boolean bool = Boolean.parseBoolean(entry.getValue());
+                    optionsBuilder = optionsBuilder.stemTerms(bool);
+                    break;
+                }
+                case TOKENIZATION_SKIP_STOP_WORDS:
+                {
+                    boolean bool = Boolean.parseBoolean(entry.getValue());
+                    optionsBuilder = optionsBuilder.ignoreStopTerms(bool);
+                    break;
+                }
+                case TOKENIZATION_LOCALE:
+                {
+                    Locale locale = new Locale(entry.getValue());
+                    optionsBuilder = optionsBuilder.useLocale(locale);
+                    break;
+                }
+                case TOKENIZATION_NORMALIZE_UPPERCASE:
+                {
+                    boolean bool = Boolean.parseBoolean(entry.getValue());
+                    optionsBuilder = optionsBuilder.alwaysUpperCaseTerms(bool);
+                    break;
+                }
+                case TOKENIZATION_NORMALIZE_LOWERCASE:
+                {
+                    boolean bool = Boolean.parseBoolean(entry.getValue());
+                    optionsBuilder = optionsBuilder.alwaysLowerCaseTerms(bool);
+                    break;
+                }
+                default:
+                {
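+                    // unrecognised option names are ignored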
+                }
+            }
+        }
+        return optionsBuilder.build();
+    }
+
+    public static StandardTokenizerOptions getDefaultOptions()
+    {
+        return new OptionsBuilder()
+                .ignoreStopTerms(true).alwaysLowerCaseTerms(true)
+                .stemTerms(false).useLocale(Locale.ENGLISH).build();
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/analyzer/filter/BasicResultFilters.java b/src/java/org/apache/cassandra/index/sasi/analyzer/filter/BasicResultFilters.java
new file mode 100644
index 0000000..2b949b8
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/analyzer/filter/BasicResultFilters.java
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.analyzer.filter;
+
+import java.util.Locale;
+
+/**
+ * Basic/General Token Filters
+ */
+public class BasicResultFilters
+{
+    private static final Locale DEFAULT_LOCALE = Locale.getDefault();
+
+    public static class LowerCase extends FilterPipelineTask<String, String>
+    {
+        private Locale locale;
+
+        public LowerCase(Locale locale)
+        {
+            this.locale = locale;
+        }
+
+        public LowerCase()
+        {
+            this.locale = DEFAULT_LOCALE;
+        }
+
+        public String process(String input) throws Exception
+        {
+            return input.toLowerCase(locale);
+        }
+    }
+
+    public static class UpperCase extends FilterPipelineTask<String, String>
+    {
+        private Locale locale;
+
+        public UpperCase(Locale locale)
+        {
+            this.locale = locale;
+        }
+
+        public UpperCase()
+        {
+            this.locale = DEFAULT_LOCALE;
+        }
+
+        public String process(String input) throws Exception
+        {
+            return input.toUpperCase(locale);
+        }
+    }
+
+    public static class NoOperation extends FilterPipelineTask<Object, Object>
+    {
+        public Object process(Object input) throws Exception
+        {
+            return input;
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/analyzer/filter/FilterPipelineBuilder.java b/src/java/org/apache/cassandra/index/sasi/analyzer/filter/FilterPipelineBuilder.java
new file mode 100644
index 0000000..e9d262d
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/analyzer/filter/FilterPipelineBuilder.java
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.analyzer.filter;
+
+/**
+ * Builds a pipeline of FilterPipelineTask instances, chaining each added
+ * task onto the previous one so that tasks execute in the order they were added
+ */
+public class FilterPipelineBuilder
+{
+    private final FilterPipelineTask<?,?> parent;
+    private FilterPipelineTask<?,?> current;
+
+    public FilterPipelineBuilder(FilterPipelineTask<?, ?> first)
+    {
+        this(first, first);
+    }
+
+    private FilterPipelineBuilder(FilterPipelineTask<?, ?> first, FilterPipelineTask<?, ?> current)
+    {
+        this.parent = first;
+        this.current = current;
+    }
+
+    public FilterPipelineBuilder add(String name, FilterPipelineTask<?,?> nextTask)
+    {
+        this.current.setLast(name, nextTask);
+        this.current = nextTask;
+        return this;
+    }
+
+    public FilterPipelineTask<?,?> build()
+    {
+        return this.parent;
+    }
+}
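A hedged usage sketch for this builder (the wrapper class and task name below are illustrative, not part of this patch): chaining a lower-case filter into a stemming filter, both of which are added later in this change set.

    import java.util.Locale;

    import org.apache.cassandra.index.sasi.analyzer.filter.BasicResultFilters;
    import org.apache.cassandra.index.sasi.analyzer.filter.FilterPipelineBuilder;
    import org.apache.cassandra.index.sasi.analyzer.filter.FilterPipelineTask;
    import org.apache.cassandra.index.sasi.analyzer.filter.StemmingFilters;

    public class PipelineBuildSketch
    {
        static FilterPipelineTask<?, ?> englishNormalizationPipeline()
        {
            // each add() appends to the tail of the chain, so tasks run in the order they were added
            return new FilterPipelineBuilder(new BasicResultFilters.LowerCase(Locale.ENGLISH))
                   .add("stemmer", new StemmingFilters.DefaultStemmingFilter(Locale.ENGLISH))
                   .build();
        }
    }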
diff --git a/src/java/org/apache/cassandra/index/sasi/analyzer/filter/FilterPipelineExecutor.java b/src/java/org/apache/cassandra/index/sasi/analyzer/filter/FilterPipelineExecutor.java
new file mode 100644
index 0000000..68c055e
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/analyzer/filter/FilterPipelineExecutor.java
@@ -0,0 +1,53 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.analyzer.filter;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Executes all linked Pipeline Tasks serially and returns
+ * output (if exists) from the executed logic
+ */
+public class FilterPipelineExecutor
+{
+    private static final Logger logger = LoggerFactory.getLogger(FilterPipelineExecutor.class);
+
+    public static <F,T> T execute(FilterPipelineTask<F, T> task, T initialInput)
+    {
+        FilterPipelineTask<?, ?> taskPtr = task;
+        T result = initialInput;
+        try
+        {
+            while (true)
+            {
+                FilterPipelineTask<F,T> taskGeneric = (FilterPipelineTask<F,T>) taskPtr;
+                result = taskGeneric.process((F) result);
+                taskPtr = taskPtr.next;
+                if (taskPtr == null)
+                    return result;
+            }
+        }
+        catch (Exception e)
+        {
+            logger.info("An unhandled exception to occurred while processing " +
+                    "pipeline [{}]", task.getName(), e);
+        }
+        return null;
+    }
+}
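A similarly hedged sketch of running a pipeline end to end (class name and sample term are illustrative): the executor walks the linked tasks in order, feeds each task's output to the next, and returns null if any task throws.

    import java.util.Locale;

    import org.apache.cassandra.index.sasi.analyzer.filter.BasicResultFilters;
    import org.apache.cassandra.index.sasi.analyzer.filter.FilterPipelineBuilder;
    import org.apache.cassandra.index.sasi.analyzer.filter.FilterPipelineExecutor;
    import org.apache.cassandra.index.sasi.analyzer.filter.FilterPipelineTask;
    import org.apache.cassandra.index.sasi.analyzer.filter.StopWordFilters;

    public class PipelineExecSketch
    {
        @SuppressWarnings("unchecked")
        public static void main(String[] args)
        {
            // lower-case the term, then map it to null if it is an English stop word
            FilterPipelineTask<String, String> pipeline = (FilterPipelineTask<String, String>)
                new FilterPipelineBuilder(new BasicResultFilters.LowerCase(Locale.ENGLISH))
                    .add("stopwords", new StopWordFilters.DefaultStopWordFilter(Locale.ENGLISH))
                    .build();

            // a non-stop-word survives as its lower-cased form; a stop word would come back as null
            System.out.println(FilterPipelineExecutor.execute(pipeline, "Jellyfish"));
        }
    }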
diff --git a/src/java/org/apache/cassandra/index/sasi/analyzer/filter/FilterPipelineTask.java b/src/java/org/apache/cassandra/index/sasi/analyzer/filter/FilterPipelineTask.java
new file mode 100644
index 0000000..13e2a17
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/analyzer/filter/FilterPipelineTask.java
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.analyzer.filter;
+
+/**
+ * A single task or set of work to process an input
+ * and return a single output. Maintains a link to the
+ * next task to be executed after itself
+ */
+public abstract class FilterPipelineTask<F, T>
+{
+    private String name;
+    public FilterPipelineTask<?, ?> next;
+
+    protected <K, V> void setLast(String name, FilterPipelineTask<K, V> last)
+    {
+        if (last == this)
+            throw new IllegalArgumentException("provided last task [" + last.name + "] cannot be set to itself");
+
+        if (this.next == null)
+        {
+            this.next = last;
+            this.name = name;
+        }
+        else
+        {
+            this.next.setLast(name, last);
+        }
+    }
+
+    public abstract T process(F input) throws Exception;
+
+    public String getName()
+    {
+        return name;
+    }
+}
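To show what a concrete task looks like, here is a hypothetical whitespace-trimming task (illustrative only, not part of this patch); a subclass only needs to implement process().

    import org.apache.cassandra.index.sasi.analyzer.filter.FilterPipelineTask;

    // Hypothetical example: strips surrounding whitespace from each term before further filtering.
    public class TrimFilter extends FilterPipelineTask<String, String>
    {
        public String process(String input) throws Exception
        {
            return input == null ? null : input.trim();
        }
    }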
diff --git a/src/java/org/apache/cassandra/index/sasi/analyzer/filter/StemmerFactory.java b/src/java/org/apache/cassandra/index/sasi/analyzer/filter/StemmerFactory.java
new file mode 100644
index 0000000..04da55c
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/analyzer/filter/StemmerFactory.java
@@ -0,0 +1,101 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.analyzer.filter;
+
+import java.lang.reflect.Constructor;
+import java.util.HashMap;
+import java.util.Locale;
+import java.util.Map;
+
+import org.tartarus.snowball.SnowballStemmer;
+import org.tartarus.snowball.ext.*;
+
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Returns a SnowballStemmer instance appropriate for
+ * a given language
+ */
+public class StemmerFactory
+{
+    private static final Logger logger = LoggerFactory.getLogger(StemmerFactory.class);
+    private static final LoadingCache<Class, Constructor<?>> STEMMER_CONSTRUCTOR_CACHE = CacheBuilder.newBuilder()
+            .build(new CacheLoader<Class, Constructor<?>>()
+            {
+                public Constructor<?> load(Class aClass) throws Exception
+                {
+                    try
+                    {
+                        return aClass.getConstructor();
+                    }
+                    catch (Exception e)
+                    {
+                        logger.error("Failed to get stemmer constructor", e);
+                    }
+                    return null;
+                }
+            });
+
+    private static final Map<String, Class> SUPPORTED_LANGUAGES;
+
+    static
+    {
+        SUPPORTED_LANGUAGES = new HashMap<>();
+        SUPPORTED_LANGUAGES.put("de", germanStemmer.class);
+        SUPPORTED_LANGUAGES.put("da", danishStemmer.class);
+        SUPPORTED_LANGUAGES.put("es", spanishStemmer.class);
+        SUPPORTED_LANGUAGES.put("en", englishStemmer.class);
+        SUPPORTED_LANGUAGES.put("fl", finnishStemmer.class);
+        SUPPORTED_LANGUAGES.put("fr", frenchStemmer.class);
+        SUPPORTED_LANGUAGES.put("hu", hungarianStemmer.class);
+        SUPPORTED_LANGUAGES.put("it", italianStemmer.class);
+        SUPPORTED_LANGUAGES.put("nl", dutchStemmer.class);
+        SUPPORTED_LANGUAGES.put("no", norwegianStemmer.class);
+        SUPPORTED_LANGUAGES.put("pt", portugueseStemmer.class);
+        SUPPORTED_LANGUAGES.put("ro", romanianStemmer.class);
+        SUPPORTED_LANGUAGES.put("ru", russianStemmer.class);
+        SUPPORTED_LANGUAGES.put("sv", swedishStemmer.class);
+        SUPPORTED_LANGUAGES.put("tr", turkishStemmer.class);
+    }
+
+    public static SnowballStemmer getStemmer(Locale locale)
+    {
+        if (locale == null)
+            return null;
+
+        String rootLang = locale.getLanguage().substring(0, 2);
+        try
+        {
+            Class clazz = SUPPORTED_LANGUAGES.get(rootLang);
+            if (clazz == null)
+                return null;
+            Constructor<?> ctor = STEMMER_CONSTRUCTOR_CACHE.get(clazz);
+            return (SnowballStemmer) ctor.newInstance();
+        }
+        catch (Exception e)
+        {
+            logger.debug("Failed to create new SnowballStemmer instance " +
+                    "for language [{}]", locale.getLanguage(), e);
+        }
+        return null;
+    }
+}
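A rough usage sketch (class name and expected output are assumptions about the Snowball English stemmer, not something this patch asserts): a stemmer from the factory is driven through setCurrent()/stem()/getCurrent(), as StemmingFilters does below.

    import java.util.Locale;

    import org.tartarus.snowball.SnowballStemmer;

    import org.apache.cassandra.index.sasi.analyzer.filter.StemmerFactory;

    public class StemmerSketch
    {
        public static void main(String[] args)
        {
            SnowballStemmer stemmer = StemmerFactory.getStemmer(Locale.ENGLISH);
            stemmer.setCurrent("indexing");
            if (stemmer.stem())
                System.out.println(stemmer.getCurrent()); // likely "index" for the English stemmer
        }
    }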
diff --git a/src/java/org/apache/cassandra/index/sasi/analyzer/filter/StemmingFilters.java b/src/java/org/apache/cassandra/index/sasi/analyzer/filter/StemmingFilters.java
new file mode 100644
index 0000000..cb840a8
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/analyzer/filter/StemmingFilters.java
@@ -0,0 +1,46 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.analyzer.filter;
+
+import java.util.Locale;
+
+import org.tartarus.snowball.SnowballStemmer;
+
+/**
+ * Filters for performing Stemming on tokens
+ */
+public class StemmingFilters
+{
+    public static class DefaultStemmingFilter extends FilterPipelineTask<String, String>
+    {
+        private SnowballStemmer stemmer;
+
+        public DefaultStemmingFilter(Locale locale)
+        {
+            stemmer = StemmerFactory.getStemmer(locale);
+        }
+
+        public String process(String input) throws Exception
+        {
+            if (input == null || stemmer == null)
+                return input;
+            stemmer.setCurrent(input);
+            return (stemmer.stem()) ? stemmer.getCurrent() : input;
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/analyzer/filter/StopWordFactory.java b/src/java/org/apache/cassandra/index/sasi/analyzer/filter/StopWordFactory.java
new file mode 100644
index 0000000..8ec02e0
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/analyzer/filter/StopWordFactory.java
@@ -0,0 +1,100 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.analyzer.filter;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.nio.charset.StandardCharsets;
+import java.util.Arrays;
+import java.util.HashSet;
+import java.util.Locale;
+import java.util.Set;
+import java.util.concurrent.ExecutionException;
+
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/**
+ * Provides a list of Stop Words for a given language
+ */
+public class StopWordFactory
+{
+    private static final Logger logger = LoggerFactory.getLogger(StopWordFactory.class);
+
+    private static final String DEFAULT_RESOURCE_EXT = "_ST.txt";
+    private static final String DEFAULT_RESOURCE_PREFIX = StopWordFactory.class.getPackage()
+            .getName().replace(".", File.separator);
+    private static final Set<String> SUPPORTED_LANGUAGES = new HashSet<>(
+            Arrays.asList("ar","bg","cs","de","en","es","fi","fr","hi","hu","it",
+            "pl","pt","ro","ru","sv"));
+
+    private static final LoadingCache<String, Set<String>> STOP_WORDS_CACHE = CacheBuilder.newBuilder()
+            .build(new CacheLoader<String, Set<String>>()
+            {
+                public Set<String> load(String s)
+                {
+                    return getStopWordsFromResource(s);
+                }
+            });
+
+    public static Set<String> getStopWordsForLanguage(Locale locale)
+    {
+        if (locale == null)
+            return null;
+
+        String rootLang = locale.getLanguage().substring(0, 2);
+        try
+        {
+            return (!SUPPORTED_LANGUAGES.contains(rootLang)) ? null : STOP_WORDS_CACHE.get(rootLang);
+        }
+        catch (ExecutionException e)
+        {
+            logger.error("Failed to populate Stop Words Cache for language [{}]", locale.getLanguage(), e);
+            return null;
+        }
+    }
+
+    private static Set<String> getStopWordsFromResource(String language)
+    {
+        Set<String> stopWords = new HashSet<>();
+        String resourceName = DEFAULT_RESOURCE_PREFIX + File.separator + language + DEFAULT_RESOURCE_EXT;
+        try (InputStream is = StopWordFactory.class.getClassLoader().getResourceAsStream(resourceName);
+             BufferedReader r = new BufferedReader(new InputStreamReader(is, StandardCharsets.UTF_8)))
+        {
+            String line;
+            while ((line = r.readLine()) != null)
+            {
+                // skip empty lines and comments (lines starting with the # char)
+                if (line.isEmpty() || line.charAt(0) == '#')
+                    continue;
+                stopWords.add(line.trim());
+            }
+        }
+        catch (Exception e)
+        {
+            logger.error("Failed to retrieve Stop Terms resource for language [{}]", language, e);
+        }
+        return stopWords;
+    }
+}
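A minimal consumption sketch (class name illustrative; it assumes the bundled stop-word resource, e.g. org/apache/cassandra/index/sasi/analyzer/filter/en_ST.txt, is on the classpath, which is how the resource name is derived above).

    import java.util.Locale;
    import java.util.Set;

    import org.apache.cassandra.index.sasi.analyzer.filter.StopWordFactory;

    public class StopWordsSketch
    {
        public static void main(String[] args)
        {
            // loads and caches the language's stop-word set on first use; null for unsupported languages
            Set<String> stopWords = StopWordFactory.getStopWordsForLanguage(Locale.ENGLISH);
            if (stopWords != null)
                System.out.println(stopWords.contains("the"));
        }
    }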
diff --git a/src/java/org/apache/cassandra/index/sasi/analyzer/filter/StopWordFilters.java b/src/java/org/apache/cassandra/index/sasi/analyzer/filter/StopWordFilters.java
new file mode 100644
index 0000000..4ae849c
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/analyzer/filter/StopWordFilters.java
@@ -0,0 +1,42 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.analyzer.filter;
+
+import java.util.Locale;
+import java.util.Set;
+
+/**
+ * Filter implementations for input matching Stop Words
+ */
+public class StopWordFilters
+{
+    public static class DefaultStopWordFilter extends FilterPipelineTask<String, String>
+    {
+        private Set<String> stopWords = null;
+
+        public DefaultStopWordFilter(Locale locale)
+        {
+            this.stopWords = StopWordFactory.getStopWordsForLanguage(locale);
+        }
+
+        public String process(String input) throws Exception
+        {
+            return (stopWords != null && stopWords.contains(input)) ? null : input;
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/conf/ColumnIndex.java b/src/java/org/apache/cassandra/index/sasi/conf/ColumnIndex.java
new file mode 100644
index 0000000..3f268e3
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/conf/ColumnIndex.java
@@ -0,0 +1,250 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.conf;
+
+import java.nio.ByteBuffer;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Optional;
+import java.util.Set;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.atomic.AtomicReference;
+
+import com.google.common.annotations.VisibleForTesting;
+
+import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.cql3.Operator;
+import org.apache.cassandra.db.DecoratedKey;
+import org.apache.cassandra.db.Memtable;
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.db.marshal.AsciiType;
+import org.apache.cassandra.db.marshal.UTF8Type;
+import org.apache.cassandra.db.rows.Cell;
+import org.apache.cassandra.db.rows.Row;
+import org.apache.cassandra.index.sasi.analyzer.AbstractAnalyzer;
+import org.apache.cassandra.index.sasi.conf.view.View;
+import org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder;
+import org.apache.cassandra.index.sasi.disk.Token;
+import org.apache.cassandra.index.sasi.memory.IndexMemtable;
+import org.apache.cassandra.index.sasi.plan.Expression;
+import org.apache.cassandra.index.sasi.plan.Expression.Op;
+import org.apache.cassandra.index.sasi.utils.RangeIterator;
+import org.apache.cassandra.index.sasi.utils.RangeUnionIterator;
+import org.apache.cassandra.io.sstable.Component;
+import org.apache.cassandra.io.sstable.format.SSTableReader;
+import org.apache.cassandra.schema.IndexMetadata;
+import org.apache.cassandra.utils.FBUtilities;
+
+public class ColumnIndex
+{
+    private static final String FILE_NAME_FORMAT = "SI_%s.db";
+
+    private final AbstractType<?> keyValidator;
+
+    private final ColumnDefinition column;
+    private final Optional<IndexMetadata> config;
+
+    private final AtomicReference<IndexMemtable> memtable;
+    private final ConcurrentMap<Memtable, IndexMemtable> pendingFlush = new ConcurrentHashMap<>();
+
+    private final IndexMode mode;
+
+    private final Component component;
+    private final DataTracker tracker;
+
+    private final boolean isTokenized;
+
+    public ColumnIndex(AbstractType<?> keyValidator, ColumnDefinition column, IndexMetadata metadata)
+    {
+        this.keyValidator = keyValidator;
+        this.column = column;
+        this.config = metadata == null ? Optional.empty() : Optional.of(metadata);
+        this.mode = IndexMode.getMode(column, config);
+        this.memtable = new AtomicReference<>(new IndexMemtable(this));
+        this.tracker = new DataTracker(keyValidator, this);
+        this.component = new Component(Component.Type.SECONDARY_INDEX, String.format(FILE_NAME_FORMAT, getIndexName()));
+        this.isTokenized = getAnalyzer().isTokenizing();
+    }
+
+    /**
+     * Initialize this column index with specific set of SSTables.
+     *
+     * @param sstables The sstables to be used by index initially.
+     *
+     * @return A collection of sstables which don't have this specific index attached to them.
+     */
+    public Iterable<SSTableReader> init(Set<SSTableReader> sstables)
+    {
+        return tracker.update(Collections.emptySet(), sstables);
+    }
+
+    public AbstractType<?> keyValidator()
+    {
+        return keyValidator;
+    }
+
+    public long index(DecoratedKey key, Row row)
+    {
+        return getCurrentMemtable().index(key, getValueOf(column, row, FBUtilities.nowInSeconds()));
+    }
+
+    public void switchMemtable()
+    {
+        // discard the current memtable with all of its data, useful on truncate
+        memtable.set(new IndexMemtable(this));
+    }
+
+    public void switchMemtable(Memtable parent)
+    {
+        pendingFlush.putIfAbsent(parent, memtable.getAndSet(new IndexMemtable(this)));
+    }
+
+    public void discardMemtable(Memtable parent)
+    {
+        pendingFlush.remove(parent);
+    }
+
+    @VisibleForTesting
+    public IndexMemtable getCurrentMemtable()
+    {
+        return memtable.get();
+    }
+
+    @VisibleForTesting
+    public Collection<IndexMemtable> getPendingMemtables()
+    {
+        return pendingFlush.values();
+    }
+
+    public RangeIterator<Long, Token> searchMemtable(Expression e)
+    {
+        RangeIterator.Builder<Long, Token> builder = new RangeUnionIterator.Builder<>();
+        builder.add(getCurrentMemtable().search(e));
+        for (IndexMemtable memtable : getPendingMemtables())
+            builder.add(memtable.search(e));
+
+        return builder.build();
+    }
+
+    public void update(Collection<SSTableReader> oldSSTables, Collection<SSTableReader> newSSTables)
+    {
+        tracker.update(oldSSTables, newSSTables);
+    }
+
+    public ColumnDefinition getDefinition()
+    {
+        return column;
+    }
+
+    public AbstractType<?> getValidator()
+    {
+        return column.cellValueType();
+    }
+
+    public Component getComponent()
+    {
+        return component;
+    }
+
+    public IndexMode getMode()
+    {
+        return mode;
+    }
+
+    public String getColumnName()
+    {
+        return column.name.toString();
+    }
+
+    public String getIndexName()
+    {
+        return config.isPresent() ? config.get().name : "undefined";
+    }
+
+    public AbstractAnalyzer getAnalyzer()
+    {
+        AbstractAnalyzer analyzer = mode.getAnalyzer(getValidator());
+        analyzer.init(config.isPresent() ? config.get().options : Collections.emptyMap(), column.cellValueType());
+        return analyzer;
+    }
+
+    public View getView()
+    {
+        return tracker.getView();
+    }
+
+    public boolean hasSSTable(SSTableReader sstable)
+    {
+        return tracker.hasSSTable(sstable);
+    }
+
+    public void dropData(long truncateUntil)
+    {
+        switchMemtable();
+        tracker.dropData(truncateUntil);
+    }
+
+    public boolean isIndexed()
+    {
+        return mode != IndexMode.NOT_INDEXED;
+    }
+
+    public boolean isLiteral()
+    {
+        AbstractType<?> validator = getValidator();
+        return isIndexed() ? mode.isLiteral : (validator instanceof UTF8Type || validator instanceof AsciiType);
+    }
+
+    public boolean supports(Operator op)
+    {
+        if (op == Operator.LIKE)
+            return isLiteral();
+
+        Op operator = Op.valueOf(op);
+        return !(isTokenized && operator == Op.EQ) // EQ is only applicable to non-tokenized indexes
+               && !(isTokenized && mode.mode == OnDiskIndexBuilder.Mode.CONTAINS && operator == Op.PREFIX) // PREFIX is not supported on tokenized CONTAINS mode indexes
+               && !(isLiteral() && operator == Op.RANGE) // RANGE is only applicable to non-literal indexes
+               && mode.supports(operator); // for all other cases let's refer to index itself
+
+    }
+
+    public static ByteBuffer getValueOf(ColumnDefinition column, Row row, int nowInSecs)
+    {
+        if (row == null)
+            return null;
+
+        switch (column.kind)
+        {
+            case CLUSTERING:
+                return row.clustering().get(column.position());
+
+            // treat static cell retrieval the same way as regular cells,
+            // but only if the row is static; otherwise return null
+            case STATIC:
+                if (!row.isStatic())
+                    return null;
+            case REGULAR:
+                Cell cell = row.getCell(column);
+                return cell == null || !cell.isLive(nowInSecs) ? null : cell.value();
+
+            default:
+                return null;
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/conf/DataTracker.java b/src/java/org/apache/cassandra/index/sasi/conf/DataTracker.java
new file mode 100644
index 0000000..9475d12
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/conf/DataTracker.java
@@ -0,0 +1,162 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.conf;
+
+import java.io.File;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.HashSet;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicReference;
+import java.util.stream.Collectors;
+
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.index.sasi.SSTableIndex;
+import org.apache.cassandra.index.sasi.conf.view.View;
+import org.apache.cassandra.io.sstable.format.SSTableReader;
+
+import com.google.common.collect.Sets;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+/** A pared-down version of DataTracker and DataTracker.View; one instance is needed per index of each column family. */
+public class DataTracker
+{
+    private static final Logger logger = LoggerFactory.getLogger(DataTracker.class);
+
+    private final AbstractType<?> keyValidator;
+    private final ColumnIndex columnIndex;
+    private final AtomicReference<View> view = new AtomicReference<>();
+
+    public DataTracker(AbstractType<?> keyValidator, ColumnIndex index)
+    {
+        this.keyValidator = keyValidator;
+        this.columnIndex = index;
+        this.view.set(new View(index, Collections.<SSTableIndex>emptySet()));
+    }
+
+    public View getView()
+    {
+        return view.get();
+    }
+
+    /**
+     * Replaces old SSTables with new by creating new immutable tracker.
+     *
+     * @param oldSSTables A set of SSTables to remove.
+     * @param newSSTables A set of SSTables to add to tracker.
+     *
+     * @return A collection of SSTables which don't have component attached for current index.
+     */
+    public Iterable<SSTableReader> update(Collection<SSTableReader> oldSSTables, Collection<SSTableReader> newSSTables)
+    {
+        final Set<SSTableIndex> newIndexes = getIndexes(newSSTables);
+        final Set<SSTableReader> indexedSSTables = getSSTables(newIndexes);
+
+        View currentView, newView;
+        do
+        {
+            currentView = view.get();
+            newView = new View(columnIndex, currentView.getIndexes(), oldSSTables, newIndexes);
+        }
+        while (!view.compareAndSet(currentView, newView));
+
+        return newSSTables.stream().filter(sstable -> !indexedSSTables.contains(sstable)).collect(Collectors.toList());
+    }
+
+    public boolean hasSSTable(SSTableReader sstable)
+    {
+        View currentView = view.get();
+        for (SSTableIndex index : currentView)
+        {
+            if (index.getSSTable().equals(sstable))
+                return true;
+        }
+
+        return false;
+    }
+
+    public void dropData(long truncateUntil)
+    {
+        View currentView = view.get();
+        if (currentView == null)
+            return;
+
+        Set<SSTableReader> toRemove = new HashSet<>();
+        for (SSTableIndex index : currentView)
+        {
+            SSTableReader sstable = index.getSSTable();
+            if (sstable.getMaxTimestamp() > truncateUntil)
+                continue;
+
+            index.markObsolete();
+            toRemove.add(sstable);
+        }
+
+        update(toRemove, Collections.<SSTableReader>emptyList());
+    }
+
+    private Set<SSTableIndex> getIndexes(Collection<SSTableReader> sstables)
+    {
+        Set<SSTableIndex> indexes = new HashSet<>(sstables.size());
+        for (SSTableReader sstable : sstables)
+        {
+            if (sstable.isMarkedCompacted())
+                continue;
+
+            File indexFile = new File(sstable.descriptor.filenameFor(columnIndex.getComponent()));
+            if (!indexFile.exists())
+                continue;
+
+            SSTableIndex index = null;
+
+            try
+            {
+                index = new SSTableIndex(columnIndex, indexFile, sstable);
+
+                logger.info("SSTableIndex.open(column: {}, minTerm: {}, maxTerm: {}, minKey: {}, maxKey: {}, sstable: {})",
+                            columnIndex.getColumnName(),
+                            columnIndex.getValidator().getString(index.minTerm()),
+                            columnIndex.getValidator().getString(index.maxTerm()),
+                            keyValidator.getString(index.minKey()),
+                            keyValidator.getString(index.maxKey()),
+                            index.getSSTable());
+
+                // Try to add the new index to the set; if the set already contains such an index, simply release it and move on.
+                // This covers the situation where the sstable collection contains the same sstable multiple
+                // times, because we don't know what kind of collection it actually is.
+                if (!indexes.add(index))
+                    index.release();
+            }
+            catch (Throwable t)
+            {
+                logger.error("Can't open index file at " + indexFile.getAbsolutePath() + ", skipping.", t);
+                if (index != null)
+                    index.release();
+            }
+        }
+
+        return indexes;
+    }
+
+    private Set<SSTableReader> getSSTables(Set<SSTableIndex> indexes)
+    {
+        return Sets.newHashSet(indexes.stream().map(SSTableIndex::getSSTable).collect(Collectors.toList()));
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/conf/IndexMode.java b/src/java/org/apache/cassandra/index/sasi/conf/IndexMode.java
new file mode 100644
index 0000000..1c85ed5
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/conf/IndexMode.java
@@ -0,0 +1,184 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.conf;
+
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Optional;
+import java.util.Set;
+
+import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.index.sasi.analyzer.AbstractAnalyzer;
+import org.apache.cassandra.index.sasi.analyzer.NoOpAnalyzer;
+import org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer;
+import org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer;
+import org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder.Mode;
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.db.marshal.AsciiType;
+import org.apache.cassandra.db.marshal.UTF8Type;
+import org.apache.cassandra.exceptions.ConfigurationException;
+import org.apache.cassandra.index.sasi.plan.Expression.Op;
+import org.apache.cassandra.schema.IndexMetadata;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class IndexMode
+{
+    private static final Logger logger = LoggerFactory.getLogger(IndexMode.class);
+
+    public static final IndexMode NOT_INDEXED = new IndexMode(Mode.PREFIX, true, false, NonTokenizingAnalyzer.class, 0);
+
+    private static final Set<AbstractType<?>> TOKENIZABLE_TYPES = new HashSet<AbstractType<?>>()
+    {{
+        add(UTF8Type.instance);
+        add(AsciiType.instance);
+    }};
+
+    private static final String INDEX_MODE_OPTION = "mode";
+    private static final String INDEX_ANALYZED_OPTION = "analyzed";
+    private static final String INDEX_ANALYZER_CLASS_OPTION = "analyzer_class";
+    private static final String INDEX_IS_LITERAL_OPTION = "is_literal";
+    private static final String INDEX_MAX_FLUSH_MEMORY_OPTION = "max_compaction_flush_memory_in_mb";
+    private static final double INDEX_MAX_FLUSH_DEFAULT_MULTIPLIER = 0.15;
+
+    public final Mode mode;
+    public final boolean isAnalyzed, isLiteral;
+    public final Class analyzerClass;
+    public final long maxCompactionFlushMemoryInMb;
+
+    private IndexMode(Mode mode, boolean isLiteral, boolean isAnalyzed, Class analyzerClass, long maxFlushMemMb)
+    {
+        this.mode = mode;
+        this.isLiteral = isLiteral;
+        this.isAnalyzed = isAnalyzed;
+        this.analyzerClass = analyzerClass;
+        this.maxCompactionFlushMemoryInMb = maxFlushMemMb;
+    }
+
+    public AbstractAnalyzer getAnalyzer(AbstractType<?> validator)
+    {
+        AbstractAnalyzer analyzer = new NoOpAnalyzer();
+
+        try
+        {
+            if (isAnalyzed)
+            {
+                if (analyzerClass != null)
+                    analyzer = (AbstractAnalyzer) analyzerClass.newInstance();
+                else if (TOKENIZABLE_TYPES.contains(validator))
+                    analyzer = new StandardAnalyzer();
+            }
+        }
+        catch (InstantiationException | IllegalAccessException e)
+        {
+            logger.error("Failed to create new instance of analyzer with class [{}]", analyzerClass.getName(), e);
+        }
+
+        return analyzer;
+    }
+
+    public static void validateAnalyzer(Map<String, String> indexOptions) throws ConfigurationException
+    {
+        // validate that a valid analyzer class was provided if specified
+        if (indexOptions.containsKey(INDEX_ANALYZER_CLASS_OPTION))
+        {
+            try
+            {
+                Class.forName(indexOptions.get(INDEX_ANALYZER_CLASS_OPTION));
+            }
+            catch (ClassNotFoundException e)
+            {
+                throw new ConfigurationException(String.format("Invalid analyzer class option specified [%s]",
+                                                               indexOptions.get(INDEX_ANALYZER_CLASS_OPTION)));
+            }
+        }
+    }
+
+    public static IndexMode getMode(ColumnDefinition column, Optional<IndexMetadata> config) throws ConfigurationException
+    {
+        return getMode(column, config.isPresent() ? config.get().options : null);
+    }
+
+    public static IndexMode getMode(ColumnDefinition column, Map<String, String> indexOptions) throws ConfigurationException
+    {
+        if (indexOptions == null || indexOptions.isEmpty())
+            return IndexMode.NOT_INDEXED;
+
+        Mode mode;
+
+        try
+        {
+            mode = indexOptions.get(INDEX_MODE_OPTION) == null
+                            ? Mode.PREFIX
+                            : Mode.mode(indexOptions.get(INDEX_MODE_OPTION));
+        }
+        catch (IllegalArgumentException e)
+        {
+            throw new ConfigurationException("Incorrect index mode: " + indexOptions.get(INDEX_MODE_OPTION));
+        }
+
+        boolean isAnalyzed = false;
+        Class analyzerClass = null;
+        try
+        {
+            if (indexOptions.get(INDEX_ANALYZER_CLASS_OPTION) != null)
+            {
+                analyzerClass = Class.forName(indexOptions.get(INDEX_ANALYZER_CLASS_OPTION));
+                isAnalyzed = indexOptions.get(INDEX_ANALYZED_OPTION) == null
+                              ? true : Boolean.valueOf(indexOptions.get(INDEX_ANALYZED_OPTION));
+            }
+            else if (indexOptions.get(INDEX_ANALYZED_OPTION) != null)
+            {
+                isAnalyzed = Boolean.valueOf(indexOptions.get(INDEX_ANALYZED_OPTION));
+            }
+        }
+        catch (ClassNotFoundException e)
+        {
+            // should not happen as we already validated the class can be resolved in validateAnalyzer()
+            logger.error("Failed to find specified analyzer class [{}]. Falling back to default analyzer",
+                         indexOptions.get(INDEX_ANALYZER_CLASS_OPTION));
+        }
+
+        boolean isLiteral = false;
+        try
+        {
+            String literalOption = indexOptions.get(INDEX_IS_LITERAL_OPTION);
+            AbstractType<?> validator = column.cellValueType();
+
+            isLiteral = literalOption == null
+                            ? (validator instanceof UTF8Type || validator instanceof AsciiType)
+                            : Boolean.valueOf(literalOption);
+        }
+        catch (Exception e)
+        {
+            logger.error("failed to parse {} option, defaulting to 'false'.", INDEX_IS_LITERAL_OPTION);
+        }
+
+        Long maxMemMb = indexOptions.get(INDEX_MAX_FLUSH_MEMORY_OPTION) == null
+                ? (long) (1073741824 * INDEX_MAX_FLUSH_DEFAULT_MULTIPLIER) // default: 15% of a 1GB memtable
+                : Long.parseLong(indexOptions.get(INDEX_MAX_FLUSH_MEMORY_OPTION));
+
+        return new IndexMode(mode, isLiteral, isAnalyzed, analyzerClass, maxMemMb);
+    }
+
+    public boolean supports(Op operator)
+    {
+        return mode.supports(operator);
+    }
+}
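To make the option handling above concrete, a hedged sketch of the map getMode() consumes; the option values are illustrative, and `column` stands for whatever ColumnDefinition the index is attached to.

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.cassandra.config.ColumnDefinition;
    import org.apache.cassandra.index.sasi.conf.IndexMode;

    public class IndexModeSketch
    {
        static IndexMode modeFor(ColumnDefinition column)
        {
            Map<String, String> options = new HashMap<>();
            options.put("mode", "CONTAINS");   // parsed via Mode.mode(); PREFIX when absent
            options.put("analyzed", "true");   // run terms through the configured analyzer
            options.put("analyzer_class", "org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer");
            options.put("max_compaction_flush_memory_in_mb", "256"); // otherwise 15% of a 1GB memtable
            return IndexMode.getMode(column, options);
        }
    }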
diff --git a/src/java/org/apache/cassandra/index/sasi/conf/view/PrefixTermTree.java b/src/java/org/apache/cassandra/index/sasi/conf/view/PrefixTermTree.java
new file mode 100644
index 0000000..f7cd942
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/conf/view/PrefixTermTree.java
@@ -0,0 +1,195 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.conf.view;
+
+import java.nio.ByteBuffer;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Set;
+
+import org.apache.cassandra.index.sasi.SSTableIndex;
+import org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder;
+import org.apache.cassandra.index.sasi.plan.Expression;
+import org.apache.cassandra.index.sasi.utils.trie.KeyAnalyzer;
+import org.apache.cassandra.index.sasi.utils.trie.PatriciaTrie;
+import org.apache.cassandra.index.sasi.utils.trie.Trie;
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.utils.Interval;
+import org.apache.cassandra.utils.IntervalTree;
+
+import com.google.common.collect.Sets;
+
+/**
+ * This class is an extension of RangeTermTree for string terms.
+ * It is required because the interval tree can't handle matching when the search term is a
+ * prefix of a range's min/max, so for ascii/utf8 fields we build an additional prefix trie
+ * (including both the min and max terms of each index) and return the union of the results
+ * of the prefix trie search and the interval tree lookup.
+ */
+public class PrefixTermTree extends RangeTermTree
+{
+    private final OnDiskIndexBuilder.Mode mode;
+    private final Trie<ByteBuffer, Set<SSTableIndex>> trie;
+
+    public PrefixTermTree(ByteBuffer min, ByteBuffer max,
+                          Trie<ByteBuffer, Set<SSTableIndex>> trie,
+                          IntervalTree<Term, SSTableIndex, Interval<Term, SSTableIndex>> ranges,
+                          OnDiskIndexBuilder.Mode mode,
+                          AbstractType<?> comparator)
+    {
+        super(min, max, ranges, comparator);
+
+        this.mode = mode;
+        this.trie = trie;
+    }
+
+    public Set<SSTableIndex> search(Expression e)
+    {
+        Map<ByteBuffer, Set<SSTableIndex>> indexes = (e == null || e.lower == null || mode == OnDiskIndexBuilder.Mode.CONTAINS)
+                                                        ? trie : trie.prefixMap(e.lower.value);
+
+        Set<SSTableIndex> view = new HashSet<>(indexes.size());
+        indexes.values().forEach(view::addAll);
+        return Sets.union(view, super.search(e));
+    }
+
+    public static class Builder extends RangeTermTree.Builder
+    {
+        private final PatriciaTrie<ByteBuffer, Set<SSTableIndex>> trie;
+
+        protected Builder(OnDiskIndexBuilder.Mode mode, final AbstractType<?> comparator)
+        {
+            super(mode, comparator);
+            trie = new PatriciaTrie<>(new ByteBufferKeyAnalyzer(comparator));
+        }
+
+        public void addIndex(SSTableIndex index)
+        {
+            super.addIndex(index);
+            addTerm(index.minTerm(), index);
+            addTerm(index.maxTerm(), index);
+        }
+
+        public TermTree build()
+        {
+            return new PrefixTermTree(min, max, trie, IntervalTree.build(intervals), mode, comparator);
+        }
+
+        private void addTerm(ByteBuffer term, SSTableIndex index)
+        {
+            Set<SSTableIndex> indexes = trie.get(term);
+            if (indexes == null)
+                trie.put(term, (indexes = new HashSet<>()));
+
+            indexes.add(index);
+        }
+    }
+
+    private static class ByteBufferKeyAnalyzer implements KeyAnalyzer<ByteBuffer>
+    {
+        private final AbstractType<?> comparator;
+
+        public ByteBufferKeyAnalyzer(AbstractType<?> comparator)
+        {
+            this.comparator = comparator;
+        }
+
+        /**
+         * A bit mask where the first bit is 1 and the others are zero
+         */
+        private static final int MSB = 1 << (Byte.SIZE - 1);
+
+        public int compare(ByteBuffer a, ByteBuffer b)
+        {
+            return comparator.compare(a, b);
+        }
+
+        public int lengthInBits(ByteBuffer o)
+        {
+            return o.remaining() * Byte.SIZE;
+        }
+
+        public boolean isBitSet(ByteBuffer key, int bitIndex)
+        {
+            if (bitIndex >= lengthInBits(key))
+                return false;
+
+            int index = bitIndex / Byte.SIZE;
+            int bit = bitIndex % Byte.SIZE;
+            return (key.get(index) & mask(bit)) != 0;
+        }
+
+        public int bitIndex(ByteBuffer key, ByteBuffer otherKey)
+        {
+            int length = Math.max(key.remaining(), otherKey.remaining());
+
+            boolean allNull = true;
+            for (int i = 0; i < length; i++)
+            {
+                byte b1 = valueAt(key, i);
+                byte b2 = valueAt(otherKey, i);
+
+                if (b1 != b2)
+                {
+                    int xor = b1 ^ b2;
+                    for (int j = 0; j < Byte.SIZE; j++)
+                    {
+                        if ((xor & mask(j)) != 0)
+                            return (i * Byte.SIZE) + j;
+                    }
+                }
+
+                if (b1 != 0)
+                    allNull = false;
+            }
+
+            return allNull ? KeyAnalyzer.NULL_BIT_KEY : KeyAnalyzer.EQUAL_BIT_KEY;
+        }
+
+        public boolean isPrefix(ByteBuffer key, ByteBuffer prefix)
+        {
+            if (key.remaining() < prefix.remaining())
+                return false;
+
+            for (int i = 0; i < prefix.remaining(); i++)
+            {
+                if (key.get(i) != prefix.get(i))
+                    return false;
+            }
+
+            return true;
+        }
+
+        /**
+         * Returns the {@code byte} value at the given index.
+         */
+        private byte valueAt(ByteBuffer value, int index)
+        {
+            return index >= 0 && index < value.remaining() ? value.get(index) : 0;
+        }
+
+        /**
+         * Returns a bit mask where the given bit is set
+         */
+        private int mask(int bit)
+        {
+            return MSB >>> bit;
+        }
+    }
+}
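A small self-contained illustration (plain strings, class name illustrative) of the gap the extra trie closes: an index covering only the term "apple" has the interval [apple, apple], and a prefix query for "app" compares lexically below it, so an interval lookup alone misses it even though "apple" starts with "app".

    public class PrefixMissSketch
    {
        public static void main(String[] args)
        {
            String min = "apple", max = "apple"; // single-term index -> interval [apple, apple]
            String query = "app";                // prefix search term

            // interval check: does the point interval [query, query] intersect [min, max]?
            boolean intervalHit = query.compareTo(min) >= 0 && query.compareTo(max) <= 0;
            boolean prefixHit = min.startsWith(query);

            System.out.println(intervalHit); // false - an interval tree alone misses the match
            System.out.println(prefixHit);   // true  - the prefix trie catches it
        }
    }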
diff --git a/src/java/org/apache/cassandra/index/sasi/conf/view/RangeTermTree.java b/src/java/org/apache/cassandra/index/sasi/conf/view/RangeTermTree.java
new file mode 100644
index 0000000..d6b4551
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/conf/view/RangeTermTree.java
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.conf.view;
+
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.HashSet;
+import java.util.List;
+import java.util.Set;
+
+import org.apache.cassandra.index.sasi.SSTableIndex;
+import org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder;
+import org.apache.cassandra.index.sasi.plan.Expression;
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.utils.Interval;
+import org.apache.cassandra.utils.IntervalTree;
+
+public class RangeTermTree implements TermTree
+{
+    protected final ByteBuffer min, max;
+    protected final IntervalTree<Term, SSTableIndex, Interval<Term, SSTableIndex>> rangeTree;
+    protected final AbstractType<?> comparator;
+
+    public RangeTermTree(ByteBuffer min, ByteBuffer max, IntervalTree<Term, SSTableIndex, Interval<Term, SSTableIndex>> rangeTree, AbstractType<?> comparator)
+    {
+        this.min = min;
+        this.max = max;
+        this.rangeTree = rangeTree;
+        this.comparator = comparator;
+    }
+
+    public Set<SSTableIndex> search(Expression e)
+    {
+        ByteBuffer minTerm = e.lower == null ? min : e.lower.value;
+        ByteBuffer maxTerm = e.upper == null ? max : e.upper.value;
+
+        return new HashSet<>(rangeTree.search(Interval.create(new Term(minTerm, comparator),
+                                                              new Term(maxTerm, comparator),
+                                                              (SSTableIndex) null)));
+    }
+
+    public int intervalCount()
+    {
+        return rangeTree.intervalCount();
+    }
+
+    static class Builder extends TermTree.Builder
+    {
+        protected final List<Interval<Term, SSTableIndex>> intervals = new ArrayList<>();
+
+        protected Builder(OnDiskIndexBuilder.Mode mode, AbstractType<?> comparator)
+        {
+            super(mode, comparator);
+        }
+
+        public void addIndex(SSTableIndex index)
+        {
+            intervals.add(Interval.create(new Term(index.minTerm(), comparator),
+                                          new Term(index.maxTerm(), comparator), index));
+        }
+
+
+        public TermTree build()
+        {
+            return new RangeTermTree(min, max, IntervalTree.build(intervals), comparator);
+        }
+    }
+
+
+    /**
+     * This is required since IntervalTree doesn't support custom Comparator
+     * implementations and relies on items being Comparable, which "raw" terms are not.
+     */
+    protected static class Term implements Comparable<Term>
+    {
+        private final ByteBuffer term;
+        private final AbstractType<?> comparator;
+
+        public Term(ByteBuffer term, AbstractType<?> comparator)
+        {
+            this.term = term;
+            this.comparator = comparator;
+        }
+
+        public int compareTo(Term o)
+        {
+            return comparator.compare(term, o.term);
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/conf/view/TermTree.java b/src/java/org/apache/cassandra/index/sasi/conf/view/TermTree.java
new file mode 100644
index 0000000..a175e22
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/conf/view/TermTree.java
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.conf.view;
+
+import java.nio.ByteBuffer;
+import java.util.Set;
+
+import org.apache.cassandra.index.sasi.SSTableIndex;
+import org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder;
+import org.apache.cassandra.index.sasi.plan.Expression;
+import org.apache.cassandra.db.marshal.AbstractType;
+
+public interface TermTree
+{
+    Set<SSTableIndex> search(Expression e);
+
+    int intervalCount();
+
+    abstract class Builder
+    {
+        protected final OnDiskIndexBuilder.Mode mode;
+        protected final AbstractType<?> comparator;
+        protected ByteBuffer min, max;
+
+        protected Builder(OnDiskIndexBuilder.Mode mode, AbstractType<?> comparator)
+        {
+            this.mode = mode;
+            this.comparator = comparator;
+        }
+
+        public final void add(SSTableIndex index)
+        {
+            addIndex(index);
+
+            min = min == null || comparator.compare(min, index.minTerm()) > 0 ? index.minTerm() : min;
+            max = max == null || comparator.compare(max, index.maxTerm()) < 0 ? index.maxTerm() : max;
+        }
+
+        protected abstract void addIndex(SSTableIndex index);
+
+        public abstract TermTree build();
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/conf/view/View.java b/src/java/org/apache/cassandra/index/sasi/conf/view/View.java
new file mode 100644
index 0000000..1f68b0c
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/conf/view/View.java
@@ -0,0 +1,130 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.conf.view;
+
+import java.nio.ByteBuffer;
+import java.util.*;
+
+import org.apache.cassandra.db.marshal.UUIDType;
+import org.apache.cassandra.index.sasi.SSTableIndex;
+import org.apache.cassandra.index.sasi.conf.ColumnIndex;
+import org.apache.cassandra.index.sasi.plan.Expression;
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.db.marshal.AsciiType;
+import org.apache.cassandra.db.marshal.UTF8Type;
+import org.apache.cassandra.io.sstable.Descriptor;
+import org.apache.cassandra.io.sstable.format.SSTableReader;
+import org.apache.cassandra.utils.Interval;
+import org.apache.cassandra.utils.IntervalTree;
+
+import com.google.common.collect.Iterables;
+import com.google.common.collect.Sets;
+
+public class View implements Iterable<SSTableIndex>
+{
+    private final Map<Descriptor, SSTableIndex> view;
+
+    private final TermTree termTree;
+    private final AbstractType<?> keyValidator;
+    private final IntervalTree<Key, SSTableIndex, Interval<Key, SSTableIndex>> keyIntervalTree;
+
+    public View(ColumnIndex index, Set<SSTableIndex> indexes)
+    {
+        this(index, Collections.<SSTableIndex>emptyList(), Collections.<SSTableReader>emptyList(), indexes);
+    }
+
+    public View(ColumnIndex index,
+                Collection<SSTableIndex> currentView,
+                Collection<SSTableReader> oldSSTables,
+                Set<SSTableIndex> newIndexes)
+    {
+        Map<Descriptor, SSTableIndex> newView = new HashMap<>();
+
+        AbstractType<?> validator = index.getValidator();
+        TermTree.Builder termTreeBuilder = (validator instanceof AsciiType || validator instanceof UTF8Type)
+                                            ? new PrefixTermTree.Builder(index.getMode().mode, validator)
+                                            : new RangeTermTree.Builder(index.getMode().mode, validator);
+
+        List<Interval<Key, SSTableIndex>> keyIntervals = new ArrayList<>();
+        for (SSTableIndex sstableIndex : Iterables.concat(currentView, newIndexes))
+        {
+            SSTableReader sstable = sstableIndex.getSSTable();
+            if (oldSSTables.contains(sstable) || sstable.isMarkedCompacted() || newView.containsKey(sstable.descriptor))
+            {
+                sstableIndex.release();
+                continue;
+            }
+
+            newView.put(sstable.descriptor, sstableIndex);
+
+            termTreeBuilder.add(sstableIndex);
+            keyIntervals.add(Interval.create(new Key(sstableIndex.minKey(), index.keyValidator()),
+                                             new Key(sstableIndex.maxKey(), index.keyValidator()),
+                                             sstableIndex));
+        }
+
+        this.view = newView;
+        this.termTree = termTreeBuilder.build();
+        this.keyValidator = index.keyValidator();
+        this.keyIntervalTree = IntervalTree.build(keyIntervals);
+
+        if (keyIntervalTree.intervalCount() != termTree.intervalCount())
+            throw new IllegalStateException(String.format("mismatched sizes for intervals tree for keys vs terms: %d != %d", keyIntervalTree.intervalCount(), termTree.intervalCount()));
+    }
+
+    public Set<SSTableIndex> match(Expression expression)
+    {
+        return termTree.search(expression);
+    }
+
+    public List<SSTableIndex> match(ByteBuffer minKey, ByteBuffer maxKey)
+    {
+        return keyIntervalTree.search(Interval.create(new Key(minKey, keyValidator), new Key(maxKey, keyValidator), (SSTableIndex) null));
+    }
+
+    public Iterator<SSTableIndex> iterator()
+    {
+        return view.values().iterator();
+    }
+
+    public Collection<SSTableIndex> getIndexes()
+    {
+        return view.values();
+    }
+
+    /**
+     * This is required since IntervalTree doesn't support custom Comparator
+     * implementations and relies on items being comparable, which "raw" keys are not.
+     */
+    private static class Key implements Comparable<Key>
+    {
+        private final ByteBuffer key;
+        private final AbstractType<?> comparator;
+
+        public Key(ByteBuffer key, AbstractType<?> comparator)
+        {
+            this.key = key;
+            this.comparator = comparator;
+        }
+
+        public int compareTo(Key o)
+        {
+            return comparator.compare(key, o.key);
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/disk/AbstractTokenTreeBuilder.java b/src/java/org/apache/cassandra/index/sasi/disk/AbstractTokenTreeBuilder.java
new file mode 100644
index 0000000..9a1f7f1
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/disk/AbstractTokenTreeBuilder.java
@@ -0,0 +1,673 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.index.sasi.disk;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+
+import org.apache.cassandra.io.util.DataOutputPlus;
+import org.apache.cassandra.utils.AbstractIterator;
+import org.apache.cassandra.utils.FBUtilities;
+import org.apache.cassandra.utils.Pair;
+
+import com.carrotsearch.hppc.LongArrayList;
+import com.carrotsearch.hppc.LongSet;
+import com.carrotsearch.hppc.cursors.LongCursor;
+
+public abstract class AbstractTokenTreeBuilder implements TokenTreeBuilder
+{
+    protected int numBlocks;
+    protected Node root;
+    protected InteriorNode rightmostParent;
+    protected Leaf leftmostLeaf;
+    protected Leaf rightmostLeaf;
+    protected long tokenCount = 0;
+    protected long treeMinToken;
+    protected long treeMaxToken;
+
+    public void add(TokenTreeBuilder other)
+    {
+        add(other.iterator());
+    }
+
+    public TokenTreeBuilder finish()
+    {
+        if (root == null)
+            constructTree();
+
+        return this;
+    }
+
+    public long getTokenCount()
+    {
+        return tokenCount;
+    }
+
+    public int serializedSize()
+    {
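+        // a single-block tree is just the root leaf: the block header followed by one
+        // 16-byte entry per token (short type + short offsetExtra + long token + int offsetData)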
+        if (numBlocks == 1)
+            return (BLOCK_HEADER_BYTES + ((int) tokenCount * 16));
+        else
+            return numBlocks * BLOCK_BYTES;
+    }
+
+    public void write(DataOutputPlus out) throws IOException
+    {
+        ByteBuffer blockBuffer = ByteBuffer.allocate(BLOCK_BYTES);
+        Iterator<Node> levelIterator = root.levelIterator();
+        long childBlockIndex = 1;
+
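+        // blocks are written out level by level, left to right; childBlockIndex tracks the
+        // on-disk index of the current node's first child so that interior nodes can compute
+        // their children's block offsets (see InteriorNode#serializeChildOffsets)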
+        while (levelIterator != null)
+        {
+            Node firstChild = null;
+            while (levelIterator.hasNext())
+            {
+                Node block = levelIterator.next();
+
+                if (firstChild == null && !block.isLeaf())
+                    firstChild = ((InteriorNode) block).children.get(0);
+
+                if (block.isSerializable())
+                {
+                    block.serialize(childBlockIndex, blockBuffer);
+                    flushBuffer(blockBuffer, out, numBlocks != 1);
+                }
+
+                childBlockIndex += block.childCount();
+            }
+
+            levelIterator = (firstChild == null) ? null : firstChild.levelIterator();
+        }
+    }
+
+    protected abstract void constructTree();
+
+    protected void flushBuffer(ByteBuffer buffer, DataOutputPlus o, boolean align) throws IOException
+    {
+        // seek to end of last block before flushing
+        if (align)
+            alignBuffer(buffer, BLOCK_BYTES);
+
+        buffer.flip();
+        o.write(buffer);
+        buffer.clear();
+    }
+
+    protected abstract class Node
+    {
+        protected InteriorNode parent;
+        protected Node next;
+        protected Long nodeMinToken, nodeMaxToken;
+
+        public Node(Long minToken, Long maxToken)
+        {
+            nodeMinToken = minToken;
+            nodeMaxToken = maxToken;
+        }
+
+        public abstract boolean isSerializable();
+        public abstract void serialize(long childBlockIndex, ByteBuffer buf);
+        public abstract int childCount();
+        public abstract int tokenCount();
+
+        public Long smallestToken()
+        {
+            return nodeMinToken;
+        }
+
+        public Long largestToken()
+        {
+            return nodeMaxToken;
+        }
+
+        public Iterator<Node> levelIterator()
+        {
+            return new LevelIterator(this);
+        }
+
+        public boolean isLeaf()
+        {
+            return (this instanceof Leaf);
+        }
+
+        protected boolean isLastLeaf()
+        {
+            return this == rightmostLeaf;
+        }
+
+        protected boolean isRoot()
+        {
+            return this == root;
+        }
+
+        protected void updateTokenRange(long token)
+        {
+            nodeMinToken = nodeMinToken == null ? token : Math.min(nodeMinToken, token);
+            nodeMaxToken = nodeMaxToken == null ? token : Math.max(nodeMaxToken, token);
+        }
+
+        protected void serializeHeader(ByteBuffer buf)
+        {
+            Header header;
+            if (isRoot())
+                header = new RootHeader();
+            else if (!isLeaf())
+                header = new InteriorNodeHeader();
+            else
+                header = new LeafHeader();
+
+            header.serialize(buf);
+            alignBuffer(buf, BLOCK_HEADER_BYTES);
+        }
+
+        private abstract class Header
+        {
+            public void serialize(ByteBuffer buf)
+            {
+                buf.put(infoByte())
+                   .putShort((short) (tokenCount()))
+                   .putLong(nodeMinToken)
+                   .putLong(nodeMaxToken);
+            }
+
+            protected abstract byte infoByte();
+        }
+
+        private class RootHeader extends Header
+        {
+            public void serialize(ByteBuffer buf)
+            {
+                super.serialize(buf);
+                writeMagic(buf);
+                buf.putLong(tokenCount)
+                   .putLong(treeMinToken)
+                   .putLong(treeMaxToken);
+            }
+
+            protected byte infoByte()
+            {
+                // if leaf, set leaf indicator and last leaf indicator (bits 0 & 1)
+                // if not leaf, clear both bits
+                return (byte) ((isLeaf()) ? 3 : 0);
+            }
+
+            protected void writeMagic(ByteBuffer buf)
+            {
+                switch (Descriptor.CURRENT_VERSION)
+                {
+                    case Descriptor.VERSION_AB:
+                        buf.putShort(AB_MAGIC);
+                        break;
+
+                    default:
+                        break;
+                }
+
+            }
+        }
+
+        private class InteriorNodeHeader extends Header
+        {
+            // bit 0 (leaf indicator) & bit 1 (last leaf indicator) cleared
+            protected byte infoByte()
+            {
+                return 0;
+            }
+        }
+
+        private class LeafHeader extends Header
+        {
+            // bit 0 set as leaf indicator
+            // bit 1 set if this is last leaf of data
+            protected byte infoByte()
+            {
+                byte infoByte = 1;
+                infoByte |= (isLastLeaf()) ? (1 << LAST_LEAF_SHIFT) : 0;
+
+                return infoByte;
+            }
+        }
+
+    }
+
+    protected abstract class Leaf extends Node
+    {
+        protected LongArrayList overflowCollisions;
+
+        public Leaf(Long minToken, Long maxToken)
+        {
+            super(minToken, maxToken);
+        }
+
+        public int childCount()
+        {
+            return 0;
+        }
+
+        protected void serializeOverflowCollisions(ByteBuffer buf)
+        {
+            if (overflowCollisions != null)
+                for (LongCursor offset : overflowCollisions)
+                    buf.putLong(offset.value);
+        }
+
+        public void serialize(long childBlockIndex, ByteBuffer buf)
+        {
+            serializeHeader(buf);
+            serializeData(buf);
+            serializeOverflowCollisions(buf);
+        }
+
+        protected abstract void serializeData(ByteBuffer buf);
+
+        protected LeafEntry createEntry(final long tok, final LongSet offsets)
+        {
+            int offsetCount = offsets.size();
+            switch (offsetCount)
+            {
+                case 0:
+                    throw new AssertionError("no offsets for token " + tok);
+                case 1:
+                    long offset = offsets.toArray()[0];
+                    if (offset > MAX_OFFSET)
+                        throw new AssertionError("offset " + offset + " cannot be greater than " + MAX_OFFSET);
+                    else if (offset <= Integer.MAX_VALUE)
+                        return new SimpleLeafEntry(tok, offset);
+                    else
+                        return new FactoredOffsetLeafEntry(tok, offset);
+                case 2:
+                    long[] rawOffsets = offsets.toArray();
+                    if (rawOffsets[0] <= Integer.MAX_VALUE && rawOffsets[1] <= Integer.MAX_VALUE &&
+                        (rawOffsets[0] <= Short.MAX_VALUE || rawOffsets[1] <= Short.MAX_VALUE))
+                        return new PackedCollisionLeafEntry(tok, rawOffsets);
+                    else
+                        return createOverflowEntry(tok, offsetCount, offsets);
+                default:
+                    return createOverflowEntry(tok, offsetCount, offsets);
+            }
+        }
+
+        private LeafEntry createOverflowEntry(final long tok, final int offsetCount, final LongSet offsets)
+        {
+            if (overflowCollisions == null)
+                overflowCollisions = new LongArrayList();
+
+            LeafEntry entry = new OverflowCollisionLeafEntry(tok, (short) overflowCollisions.size(), (short) offsetCount);
+            for (LongCursor o : offsets) {
+                if (overflowCollisions.size() == OVERFLOW_TRAILER_CAPACITY)
+                    throw new AssertionError("cannot have more than " + OVERFLOW_TRAILER_CAPACITY + " overflow collisions per leaf");
+                else
+                    overflowCollisions.add(o.value);
+            }
+            return entry;
+        }
+
+        protected abstract class LeafEntry
+        {
+            protected final long token;
+
+            abstract public EntryType type();
+            abstract public int offsetData();
+            abstract public short offsetExtra();
+
+            public LeafEntry(final long tok)
+            {
+                token = tok;
+            }
+
+            public void serialize(ByteBuffer buf)
+            {
+                buf.putShort((short) type().ordinal())
+                   .putShort(offsetExtra())
+                   .putLong(token)
+                   .putInt(offsetData());
+            }
+
+        }
+
+
+        // assumes there is a single offset and the offset is <= Integer.MAX_VALUE
+        protected class SimpleLeafEntry extends LeafEntry
+        {
+            private final long offset;
+
+            public SimpleLeafEntry(final long tok, final long off)
+            {
+                super(tok);
+                offset = off;
+            }
+
+            public EntryType type()
+            {
+                return EntryType.SIMPLE;
+            }
+
+            public int offsetData()
+            {
+                return (int) offset;
+            }
+
+            public short offsetExtra()
+            {
+                return 0;
+            }
+        }
+
+        // assumes there is a single offset and Integer.MAX_VALUE < offset <= MAX_OFFSET;
+        // takes the middle 32 bits of the offset (i.e. the top 32, given the offset is at most 48 bits)
+        // and stores them where the offset is normally stored; the bottom 16 bits of the offset
+        // go into the entry header
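+        //
+        // illustrative sketch (hypothetical values, not from this patch): for an offset of
+        // 0x1_2345_6789L, offsetData() returns (int) (offset >>> 16) == 0x1_2345 and
+        // offsetExtra() returns (short) offset == 0x6789; the reader can then presumably
+        // reassemble the 48-bit value as ((long) offsetData() << Short.SIZE) | (offsetExtra() & 0xFFFF)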
+        private class FactoredOffsetLeafEntry extends LeafEntry
+        {
+            private final long offset;
+
+            public FactoredOffsetLeafEntry(final long tok, final long off)
+            {
+                super(tok);
+                offset = off;
+            }
+
+            public EntryType type()
+            {
+                return EntryType.FACTORED;
+            }
+
+            public int offsetData()
+            {
+                return (int) (offset >>> Short.SIZE);
+            }
+
+            public short offsetExtra()
+            {
+                // extra offset is supposed to be an unsigned 16-bit integer
+                return (short) offset;
+            }
+        }
+
+        // holds an entry with two offsets that can be packed in an int & a short
+        // the int offset is stored where offset is normally stored. short offset is
+        // stored in entry header
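+        //
+        // e.g. (hypothetical values) offsets {42, 100_000} would be stored as
+        // smallerOffset = (short) 42 in the entry header and largerOffset = 100_000
+        // in the regular offset slot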
+        private class PackedCollisionLeafEntry extends LeafEntry
+        {
+            private short smallerOffset;
+            private int largerOffset;
+
+            public PackedCollisionLeafEntry(final long tok, final long[] offs)
+            {
+                super(tok);
+
+                smallerOffset = (short) Math.min(offs[0], offs[1]);
+                largerOffset = (int) Math.max(offs[0], offs[1]);
+            }
+
+            public EntryType type()
+            {
+                return EntryType.PACKED;
+            }
+
+            public int offsetData()
+            {
+                return largerOffset;
+            }
+
+            public short offsetExtra()
+            {
+                return smallerOffset;
+            }
+        }
+
+        // holds an entry with three or more offsets, or two offsets that cannot
+        // be packed into an int & a short. the index into the overflow list
+        // is stored where the offset is normally stored. the number of overflowed offsets
+        // for the entry is stored in the entry header
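+        //
+        // e.g. (hypothetical values) a token with three offsets {40_000, 50_000, 60_000}
+        // always overflows: the offsets are appended to overflowCollisions and the entry
+        // stores the start index and count = 3 in place of a direct offset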
+        private class OverflowCollisionLeafEntry extends LeafEntry
+        {
+            private final short startIndex;
+            private final short count;
+
+            public OverflowCollisionLeafEntry(final long tok, final short collisionStartIndex, final short collisionCount)
+            {
+                super(tok);
+                startIndex = collisionStartIndex;
+                count = collisionCount;
+            }
+
+            public EntryType type()
+            {
+                return EntryType.OVERFLOW;
+            }
+
+            public int offsetData()
+            {
+                return startIndex;
+            }
+
+            public short offsetExtra()
+            {
+                return count;
+            }
+
+        }
+
+    }
+
+    protected class InteriorNode extends Node
+    {
+        protected List<Long> tokens = new ArrayList<>(TOKENS_PER_BLOCK);
+        protected List<Node> children = new ArrayList<>(TOKENS_PER_BLOCK + 1);
+        protected int position = 0;
+
+        public InteriorNode()
+        {
+            super(null, null);
+        }
+
+        public boolean isSerializable()
+        {
+            return true;
+        }
+
+        public void serialize(long childBlockIndex, ByteBuffer buf)
+        {
+            serializeHeader(buf);
+            serializeTokens(buf);
+            serializeChildOffsets(childBlockIndex, buf);
+        }
+
+        public int childCount()
+        {
+            return children.size();
+        }
+
+        public int tokenCount()
+        {
+            return tokens.size();
+        }
+
+        public Long smallestToken()
+        {
+            return tokens.get(0);
+        }
+
+        protected void add(Long token, InteriorNode leftChild, InteriorNode rightChild)
+        {
+            int pos = tokens.size();
+            if (pos == TOKENS_PER_BLOCK)
+            {
+                InteriorNode sibling = split();
+                sibling.add(token, leftChild, rightChild);
+
+            }
+            else {
+                if (leftChild != null)
+                    children.add(pos, leftChild);
+
+                if (rightChild != null)
+                {
+                    children.add(pos + 1, rightChild);
+                    rightChild.parent = this;
+                }
+
+                updateTokenRange(token);
+                tokens.add(pos, token);
+            }
+        }
+
+        protected void add(Leaf node)
+        {
+
+            if (position == (TOKENS_PER_BLOCK + 1))
+            {
+                rightmostParent = split();
+                rightmostParent.add(node);
+            }
+            else
+            {
+
+                node.parent = this;
+                children.add(position, node);
+                position++;
+
+                // the first child is referenced only during bulk load; we don't take a value
+                // from it to store in the tree. one is subtracted because position has already
+                // been incremented for the next node to be added
+                if (position - 1 == 0)
+                    return;
+
+
+                // tokens are inserted one behind the current position, but 2 is subtracted because
+                // position has already been incremented for the next add
+                Long smallestToken = node.smallestToken();
+                updateTokenRange(smallestToken);
+                tokens.add(position - 2, smallestToken);
+            }
+
+        }
+
+        protected InteriorNode split()
+        {
+            Pair<Long, InteriorNode> splitResult = splitBlock();
+            Long middleValue = splitResult.left;
+            InteriorNode sibling = splitResult.right;
+            InteriorNode leftChild = null;
+
+            // create a new root if necessary
+            if (parent == null)
+            {
+                parent = new InteriorNode();
+                root = parent;
+                sibling.parent = parent;
+                leftChild = this;
+                numBlocks++;
+            }
+
+            parent.add(middleValue, leftChild, sibling);
+
+            return sibling;
+        }
+
+        protected Pair<Long, InteriorNode> splitBlock()
+        {
+            final int splitPosition = TOKENS_PER_BLOCK - 2;
+            InteriorNode sibling = new InteriorNode();
+            sibling.parent = parent;
+            next = sibling;
+
+            Long middleValue = tokens.get(splitPosition);
+
+            for (int i = splitPosition; i < TOKENS_PER_BLOCK; i++)
+            {
+                if (i != TOKENS_PER_BLOCK && i != splitPosition)
+                {
+                    long token = tokens.get(i);
+                    sibling.updateTokenRange(token);
+                    sibling.tokens.add(token);
+                }
+
+                Node child = children.get(i + 1);
+                child.parent = sibling;
+                sibling.children.add(child);
+                sibling.position++;
+            }
+
+            for (int i = TOKENS_PER_BLOCK; i >= splitPosition; i--)
+            {
+                if (i != TOKENS_PER_BLOCK)
+                    tokens.remove(i);
+
+                if (i != splitPosition)
+                    children.remove(i);
+            }
+
+            nodeMinToken = smallestToken();
+            nodeMaxToken = tokens.get(tokens.size() - 1);
+            numBlocks++;
+
+            return Pair.create(middleValue, sibling);
+        }
+
+        protected boolean isFull()
+        {
+            return (position >= TOKENS_PER_BLOCK + 1);
+        }
+
+        private void serializeTokens(ByteBuffer buf)
+        {
+            tokens.forEach(buf::putLong);
+        }
+
+        private void serializeChildOffsets(long childBlockIndex, ByteBuffer buf)
+        {
+            for (int i = 0; i < children.size(); i++)
+                buf.putLong((childBlockIndex + i) * BLOCK_BYTES);
+        }
+    }
+
+    public static class LevelIterator extends AbstractIterator<Node>
+    {
+        private Node currentNode;
+
+        LevelIterator(Node first)
+        {
+            currentNode = first;
+        }
+
+        public Node computeNext()
+        {
+            if (currentNode == null)
+                return endOfData();
+
+            Node returnNode = currentNode;
+            currentNode = returnNode.next;
+
+            return returnNode;
+        }
+    }
+
+
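+    // illustrative only: assuming a block size of 4096 bytes, a buffer positioned at 4100
+    // would be advanced to 8192, the next block boundary, while an already-aligned position
+    // such as 8192 is left untouched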
+    protected static void alignBuffer(ByteBuffer buffer, int blockSize)
+    {
+        long curPos = buffer.position();
+        if ((curPos & (blockSize - 1)) != 0) // align on the block boundary if needed
+            buffer.position((int) FBUtilities.align(curPos, blockSize));
+    }
+
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/disk/Descriptor.java b/src/java/org/apache/cassandra/index/sasi/disk/Descriptor.java
new file mode 100644
index 0000000..3aa6f14
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/disk/Descriptor.java
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.disk;
+
+/**
+ * Object descriptor for SASIIndex files. Similar to, and based upon, the sstable descriptor.
+ */
+public class Descriptor
+{
+    public static final String VERSION_AA = "aa";
+    public static final String VERSION_AB = "ab";
+    public static final String CURRENT_VERSION = VERSION_AB;
+    public static final Descriptor CURRENT = new Descriptor(CURRENT_VERSION);
+
+    public static class Version
+    {
+        public final String version;
+
+        public Version(String version)
+        {
+            this.version = version;
+        }
+
+        public String toString()
+        {
+            return version;
+        }
+    }
+
+    public final Version version;
+
+    public Descriptor(String v)
+    {
+        this.version = new Version(v);
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/disk/DynamicTokenTreeBuilder.java b/src/java/org/apache/cassandra/index/sasi/disk/DynamicTokenTreeBuilder.java
new file mode 100644
index 0000000..2ddfd89
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/disk/DynamicTokenTreeBuilder.java
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.disk;
+
+import java.nio.ByteBuffer;
+import java.util.*;
+
+import org.apache.cassandra.utils.AbstractIterator;
+import org.apache.cassandra.utils.Pair;
+
+import com.carrotsearch.hppc.LongOpenHashSet;
+import com.carrotsearch.hppc.LongSet;
+import com.carrotsearch.hppc.cursors.LongCursor;
+
+public class DynamicTokenTreeBuilder extends AbstractTokenTreeBuilder
+{
+    private final SortedMap<Long, LongSet> tokens = new TreeMap<>();
+
+
+    public DynamicTokenTreeBuilder()
+    {}
+
+    public DynamicTokenTreeBuilder(TokenTreeBuilder data)
+    {
+        add(data);
+    }
+
+    public DynamicTokenTreeBuilder(SortedMap<Long, LongSet> data)
+    {
+        add(data);
+    }
+
+    public void add(Long token, long keyPosition)
+    {
+        LongSet found = tokens.get(token);
+        if (found == null)
+            tokens.put(token, (found = new LongOpenHashSet(2)));
+
+        found.add(keyPosition);
+    }
+
+    public void add(Iterator<Pair<Long, LongSet>> data)
+    {
+        while (data.hasNext())
+        {
+            Pair<Long, LongSet> entry = data.next();
+            for (LongCursor l : entry.right)
+                add(entry.left, l.value);
+        }
+    }
+
+    public void add(SortedMap<Long, LongSet> data)
+    {
+        for (Map.Entry<Long, LongSet> newEntry : data.entrySet())
+        {
+            LongSet found = tokens.get(newEntry.getKey());
+            if (found == null)
+                tokens.put(newEntry.getKey(), (found = new LongOpenHashSet(4)));
+
+            for (LongCursor offset : newEntry.getValue())
+                found.add(offset.value);
+        }
+    }
+
+    public Iterator<Pair<Long, LongSet>> iterator()
+    {
+        final Iterator<Map.Entry<Long, LongSet>> iterator = tokens.entrySet().iterator();
+        return new AbstractIterator<Pair<Long, LongSet>>()
+        {
+            protected Pair<Long, LongSet> computeNext()
+            {
+                if (!iterator.hasNext())
+                    return endOfData();
+
+                Map.Entry<Long, LongSet> entry = iterator.next();
+                return Pair.create(entry.getKey(), entry.getValue());
+            }
+        };
+    }
+
+    public boolean isEmpty()
+    {
+        return tokens.size() == 0;
+    }
+
+    protected void constructTree()
+    {
+        tokenCount = tokens.size();
+        treeMinToken = tokens.firstKey();
+        treeMaxToken = tokens.lastKey();
+        numBlocks = 1;
+
+        // special case the tree that only has a single block in it (so we don't create a useless root)
+        if (tokenCount <= TOKENS_PER_BLOCK)
+        {
+            leftmostLeaf = new DynamicLeaf(tokens);
+            rightmostLeaf = leftmostLeaf;
+            root = leftmostLeaf;
+        }
+        else
+        {
+            root = new InteriorNode();
+            rightmostParent = (InteriorNode) root;
+
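+            // build DynamicLeaf blocks of up to TOKENS_PER_BLOCK tokens from the sorted map,
+            // chaining them left to right and attaching each to the rightmost interior node
+            // (splitting interior nodes as needed)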
+            int i = 0;
+            Leaf lastLeaf = null;
+            Long firstToken = tokens.firstKey();
+            Long finalToken = tokens.lastKey();
+            Long lastToken;
+            for (Long token : tokens.keySet())
+            {
+                if (i == 0 || (i % TOKENS_PER_BLOCK != 0 && i != (tokenCount - 1)))
+                {
+                    i++;
+                    continue;
+                }
+
+                lastToken = token;
+                Leaf leaf = (i != (tokenCount - 1) || token.equals(finalToken)) ?
+                        new DynamicLeaf(tokens.subMap(firstToken, lastToken)) : new DynamicLeaf(tokens.tailMap(firstToken));
+
+                if (i == TOKENS_PER_BLOCK)
+                    leftmostLeaf = leaf;
+                else
+                    lastLeaf.next = leaf;
+
+                rightmostParent.add(leaf);
+                lastLeaf = leaf;
+                rightmostLeaf = leaf;
+                firstToken = lastToken;
+                i++;
+                numBlocks++;
+
+                if (token.equals(finalToken))
+                {
+                    Leaf finalLeaf = new DynamicLeaf(tokens.tailMap(token));
+                    lastLeaf.next = finalLeaf;
+                    rightmostParent.add(finalLeaf);
+                    rightmostLeaf = finalLeaf;
+                    numBlocks++;
+                }
+            }
+
+        }
+    }
+
+    private class DynamicLeaf extends Leaf
+    {
+        private final SortedMap<Long, LongSet> tokens;
+
+        DynamicLeaf(SortedMap<Long, LongSet> data)
+        {
+            super(data.firstKey(), data.lastKey());
+            tokens = data;
+        }
+
+        public int tokenCount()
+        {
+            return tokens.size();
+        }
+
+        public boolean isSerializable()
+        {
+            return true;
+        }
+
+        protected void serializeData(ByteBuffer buf)
+        {
+            for (Map.Entry<Long, LongSet> entry : tokens.entrySet())
+                createEntry(entry.getKey(), entry.getValue()).serialize(buf);
+        }
+
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/disk/OnDiskBlock.java b/src/java/org/apache/cassandra/index/sasi/disk/OnDiskBlock.java
new file mode 100644
index 0000000..e335b50
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/disk/OnDiskBlock.java
@@ -0,0 +1,143 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.disk;
+
+import java.nio.ByteBuffer;
+
+import org.apache.cassandra.index.sasi.Term;
+import org.apache.cassandra.index.sasi.utils.MappedBuffer;
+import org.apache.cassandra.db.marshal.AbstractType;
+
+public abstract class OnDiskBlock<T extends Term>
+{
+    public enum BlockType
+    {
+        POINTER, DATA
+    }
+
+    // this contains offsets of the terms and term data
+    protected final MappedBuffer blockIndex;
+    protected final int blockIndexSize;
+
+    protected final boolean hasCombinedIndex;
+    protected final TokenTree combinedIndex;
+
+    public OnDiskBlock(Descriptor descriptor, MappedBuffer block, BlockType blockType)
+    {
+        blockIndex = block;
+
+        if (blockType == BlockType.POINTER)
+        {
+            hasCombinedIndex = false;
+            combinedIndex = null;
+            blockIndexSize = block.getInt() << 1; // num terms * sizeof(short)
+            return;
+        }
+
+        long blockOffset = block.position();
+        int combinedIndexOffset = block.getInt(blockOffset + OnDiskIndexBuilder.BLOCK_SIZE);
+
+        hasCombinedIndex = (combinedIndexOffset >= 0);
+        long blockIndexOffset = blockOffset + OnDiskIndexBuilder.BLOCK_SIZE + 4 + combinedIndexOffset;
+
+        combinedIndex = hasCombinedIndex ? new TokenTree(descriptor, blockIndex.duplicate().position(blockIndexOffset)) : null;
+        blockIndexSize = block.getInt() * 2;
+    }
+
+    public SearchResult<T> search(AbstractType<?> comparator, ByteBuffer query)
+    {
+        int cmp = -1, start = 0, end = termCount() - 1, middle = 0;
+
+        T element = null;
+        while (start <= end)
+        {
+            middle = start + ((end - start) >> 1);
+            element = getTerm(middle);
+
+            cmp = element.compareTo(comparator, query);
+            if (cmp == 0)
+                return new SearchResult<>(element, cmp, middle);
+            else if (cmp < 0)
+                start = middle + 1;
+            else
+                end = middle - 1;
+        }
+
+        return new SearchResult<>(element, cmp, middle);
+    }
+
+    @SuppressWarnings("resource")
+    protected T getTerm(int index)
+    {
+        MappedBuffer dup = blockIndex.duplicate();
+        long startsAt = getTermPosition(index);
+        if (termCount() - 1 == index) // last element
+            dup.position(startsAt);
+        else
+            dup.position(startsAt).limit(getTermPosition(index + 1));
+
+        return cast(dup);
+    }
+
+    protected long getTermPosition(int idx)
+    {
+        return getTermPosition(blockIndex, idx, blockIndexSize);
+    }
+
+    protected int termCount()
+    {
+        return blockIndexSize >> 1;
+    }
+
+    protected abstract T cast(MappedBuffer data);
+
+    static long getTermPosition(MappedBuffer data, int idx, int indexSize)
+    {
+        idx <<= 1;
+        assert idx < indexSize;
+        return data.position() + indexSize + data.getShort(data.position() + idx);
+    }
+
+    public TokenTree getBlockIndex()
+    {
+        return combinedIndex;
+    }
+
+    public int minOffset(OnDiskIndex.IteratorOrder order)
+    {
+        return order == OnDiskIndex.IteratorOrder.DESC ? 0 : termCount() - 1;
+    }
+
+    public int maxOffset(OnDiskIndex.IteratorOrder order)
+    {
+        return minOffset(order) == 0 ? termCount() - 1 : 0;
+    }
+
+    public static class SearchResult<T>
+    {
+        public final T result;
+        public final int index, cmp;
+
+        public SearchResult(T result, int cmp, int index)
+        {
+            this.result = result;
+            this.index = index;
+            this.cmp = cmp;
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/disk/OnDiskIndex.java b/src/java/org/apache/cassandra/index/sasi/disk/OnDiskIndex.java
new file mode 100644
index 0000000..4d43cd9
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/disk/OnDiskIndex.java
@@ -0,0 +1,813 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.disk;
+
+import java.io.*;
+import java.nio.ByteBuffer;
+import java.util.*;
+import java.util.stream.Collectors;
+
+import org.apache.cassandra.db.DecoratedKey;
+import org.apache.cassandra.index.sasi.Term;
+import org.apache.cassandra.index.sasi.plan.Expression;
+import org.apache.cassandra.index.sasi.plan.Expression.Op;
+import org.apache.cassandra.index.sasi.utils.MappedBuffer;
+import org.apache.cassandra.index.sasi.utils.RangeUnionIterator;
+import org.apache.cassandra.index.sasi.utils.AbstractIterator;
+import org.apache.cassandra.index.sasi.utils.RangeIterator;
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.io.FSReadError;
+import org.apache.cassandra.io.util.ChannelProxy;
+import org.apache.cassandra.io.util.FileUtils;
+import org.apache.cassandra.utils.ByteBufferUtil;
+import org.apache.cassandra.utils.FBUtilities;
+
+import com.google.common.base.Function;
+import com.google.common.collect.Iterables;
+import com.google.common.collect.Iterators;
+import com.google.common.collect.PeekingIterator;
+
+import static org.apache.cassandra.index.sasi.disk.OnDiskBlock.SearchResult;
+
+public class OnDiskIndex implements Iterable<OnDiskIndex.DataTerm>, Closeable
+{
+    public enum IteratorOrder
+    {
+        DESC(1), ASC(-1);
+
+        public final int step;
+
+        IteratorOrder(int step)
+        {
+            this.step = step;
+        }
+
+        public int startAt(OnDiskBlock<DataTerm> block, Expression e)
+        {
+            switch (this)
+            {
+                case DESC:
+                    return e.lower == null
+                            ? 0
+                            : startAt(block.search(e.validator, e.lower.value), e.lower.inclusive);
+
+                case ASC:
+                    return e.upper == null
+                            ? block.termCount() - 1
+                            : startAt(block.search(e.validator, e.upper.value), e.upper.inclusive);
+
+                default:
+                    throw new IllegalArgumentException("Unknown order: " + this);
+            }
+        }
+
+        public int startAt(SearchResult<DataTerm> found, boolean inclusive)
+        {
+            switch (this)
+            {
+                case DESC:
+                    if (found.cmp < 0)
+                        return found.index + 1;
+
+                    return inclusive || found.cmp != 0 ? found.index : found.index + 1;
+
+                case ASC:
+                    if (found.cmp < 0) // search term was bigger than the whole data set
+                        return found.index;
+                    return inclusive && (found.cmp == 0 || found.cmp < 0) ? found.index : found.index - 1;
+
+                default:
+                    throw new IllegalArgumentException("Unknown order: " + this);
+            }
+        }
+    }
+
+    public final Descriptor descriptor;
+    protected final OnDiskIndexBuilder.Mode mode;
+    protected final OnDiskIndexBuilder.TermSize termSize;
+
+    protected final AbstractType<?> comparator;
+    protected final MappedBuffer indexFile;
+    protected final long indexSize;
+    protected final boolean hasMarkedPartials;
+
+    protected final Function<Long, DecoratedKey> keyFetcher;
+
+    protected final String indexPath;
+
+    protected final PointerLevel[] levels;
+    protected final DataLevel dataLevel;
+
+    protected final ByteBuffer minTerm, maxTerm, minKey, maxKey;
+
+    @SuppressWarnings("resource")
+    public OnDiskIndex(File index, AbstractType<?> cmp, Function<Long, DecoratedKey> keyReader)
+    {
+        keyFetcher = keyReader;
+
+        comparator = cmp;
+        indexPath = index.getAbsolutePath();
+
+        RandomAccessFile backingFile = null;
+        try
+        {
+            backingFile = new RandomAccessFile(index, "r");
+
+            descriptor = new Descriptor(backingFile.readUTF());
+
+            termSize = OnDiskIndexBuilder.TermSize.of(backingFile.readShort());
+
+            minTerm = ByteBufferUtil.readWithShortLength(backingFile);
+            maxTerm = ByteBufferUtil.readWithShortLength(backingFile);
+
+            minKey = ByteBufferUtil.readWithShortLength(backingFile);
+            maxKey = ByteBufferUtil.readWithShortLength(backingFile);
+
+            mode = OnDiskIndexBuilder.Mode.mode(backingFile.readUTF());
+            hasMarkedPartials = backingFile.readBoolean();
+
+            indexSize = backingFile.length();
+            indexFile = new MappedBuffer(new ChannelProxy(indexPath, backingFile.getChannel()));
+
+            // start of the levels
+            indexFile.position(indexFile.getLong(indexSize - 8));
+
+            int numLevels = indexFile.getInt();
+            levels = new PointerLevel[numLevels];
+            for (int i = 0; i < levels.length; i++)
+            {
+                int blockCount = indexFile.getInt();
+                levels[i] = new PointerLevel(indexFile.position(), blockCount);
+                indexFile.position(indexFile.position() + blockCount * 8);
+            }
+
+            int blockCount = indexFile.getInt();
+            dataLevel = new DataLevel(indexFile.position(), blockCount);
+        }
+        catch (IOException e)
+        {
+            throw new FSReadError(e, index);
+        }
+        finally
+        {
+            FileUtils.closeQuietly(backingFile);
+        }
+    }
+
+    public boolean hasMarkedPartials()
+    {
+        return hasMarkedPartials;
+    }
+
+    public OnDiskIndexBuilder.Mode mode()
+    {
+        return mode;
+    }
+
+    public ByteBuffer minTerm()
+    {
+        return minTerm;
+    }
+
+    public ByteBuffer maxTerm()
+    {
+        return maxTerm;
+    }
+
+    public ByteBuffer minKey()
+    {
+        return minKey;
+    }
+
+    public ByteBuffer maxKey()
+    {
+        return maxKey;
+    }
+
+    public DataTerm min()
+    {
+        return dataLevel.getBlock(0).getTerm(0);
+    }
+
+    public DataTerm max()
+    {
+        DataBlock block = dataLevel.getBlock(dataLevel.blockCount - 1);
+        return block.getTerm(block.termCount() - 1);
+    }
+
+    /**
+     * Search for rows which match all of the terms inside the given expression in the index file.
+     *
+     * @param exp The expression to use for the query.
+     *
+     * @return Iterator which contains rows for all of the terms from the given range.
+     */
+    public RangeIterator<Long, Token> search(Expression exp)
+    {
+        assert mode.supports(exp.getOp());
+
+        if (exp.getOp() == Expression.Op.PREFIX && mode == OnDiskIndexBuilder.Mode.CONTAINS && !hasMarkedPartials)
+            throw new UnsupportedOperationException("prefix queries in CONTAINS mode are not supported by this index");
+
+        // optimization in case a single term is requested from the index,
+        // so we don't need to build an additional union iterator
+        if (exp.getOp() == Op.EQ)
+        {
+            DataTerm term = getTerm(exp.lower.value);
+            return term == null ? null : term.getTokens();
+        }
+
+        // convert single NOT_EQ to range with exclusion
+        final Expression expression = (exp.getOp() != Op.NOT_EQ)
+                                        ? exp
+                                        : new Expression(exp).setOp(Op.RANGE)
+                                                .setLower(new Expression.Bound(minTerm, true))
+                                                .setUpper(new Expression.Bound(maxTerm, true))
+                                                .addExclusion(exp.lower.value);
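+
+        // illustrative example (hypothetical term): NOT_EQ 'foo' over an index whose terms
+        // span [minTerm, maxTerm] becomes a RANGE [minTerm, maxTerm] excluding 'foo', which
+        // the code below splits into the sub-ranges [minTerm, 'foo') and ('foo', maxTerm]
+        // before unioning their results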
+
+        List<ByteBuffer> exclusions = new ArrayList<>(expression.exclusions.size());
+
+        Iterables.addAll(exclusions, expression.exclusions.stream().filter(exclusion -> {
+            // accept only exclusions which are in the bounds of lower/upper
+            return !(expression.lower != null && comparator.compare(exclusion, expression.lower.value) < 0)
+                && !(expression.upper != null && comparator.compare(exclusion, expression.upper.value) > 0);
+        }).collect(Collectors.toList()));
+
+        Collections.sort(exclusions, comparator);
+
+        if (exclusions.size() == 0)
+            return searchRange(expression);
+
+        List<Expression> ranges = new ArrayList<>(exclusions.size());
+
+        // calculate range splits based on the sorted exclusions
+        Iterator<ByteBuffer> exclusionsIterator = exclusions.iterator();
+
+        Expression.Bound min = expression.lower, max = null;
+        while (exclusionsIterator.hasNext())
+        {
+            max = new Expression.Bound(exclusionsIterator.next(), false);
+            ranges.add(new Expression(expression).setOp(Op.RANGE).setLower(min).setUpper(max));
+            min = max;
+        }
+
+        assert max != null;
+        ranges.add(new Expression(expression).setOp(Op.RANGE).setLower(max).setUpper(expression.upper));
+
+        RangeUnionIterator.Builder<Long, Token> builder = RangeUnionIterator.builder();
+        for (Expression e : ranges)
+        {
+            @SuppressWarnings("resource")
+            RangeIterator<Long, Token> range = searchRange(e);
+            if (range != null)
+                builder.add(range);
+        }
+
+        return builder.build();
+    }
+
+    private RangeIterator<Long, Token> searchRange(Expression range)
+    {
+        Expression.Bound lower = range.lower;
+        Expression.Bound upper = range.upper;
+
+        int lowerBlock = lower == null ? 0 : getDataBlock(lower.value);
+        int upperBlock = upper == null
+                ? dataLevel.blockCount - 1
+                // optimization so we don't have to fetch upperBlock when query has lower == upper
+                : (lower != null && comparator.compare(lower.value, upper.value) == 0) ? lowerBlock : getDataBlock(upper.value);
+
+        return (mode != OnDiskIndexBuilder.Mode.SPARSE || lowerBlock == upperBlock || upperBlock - lowerBlock <= 1)
+                ? searchPoint(lowerBlock, range)
+                : searchRange(lowerBlock, lower, upperBlock, upper);
+    }
+
+    private RangeIterator<Long, Token> searchRange(int lowerBlock, Expression.Bound lower, int upperBlock, Expression.Bound upper)
+    {
+        // if lower is at the beginning of the block that means we can just do a single iterator per block
+        SearchResult<DataTerm> lowerPosition = (lower == null) ? null : searchIndex(lower.value, lowerBlock);
+        SearchResult<DataTerm> upperPosition = (upper == null) ? null : searchIndex(upper.value, upperBlock);
+
+        RangeUnionIterator.Builder<Long, Token> builder = RangeUnionIterator.builder();
+
+        // optimistically assume that first and last blocks are full block reads, which saves at least 3 'else' conditions
+        int firstFullBlockIdx = lowerBlock, lastFullBlockIdx = upperBlock;
+
+        // 'lower' doesn't cover the whole block, so we need to do a partial iteration.
+        // Two reasons why that can happen:
+        //   - 'lower' is not the first element of the block
+        //   - 'lower' is the first element but it's not inclusive in the query
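+        // (e.g., hypothetically, if lowerPosition.index == 3 in a block of 10 terms, we add a
+        // partial iterator over terms [3, 10), or [4, 10) when 'lower' matched exactly but is
+        // exclusive)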
+        if (lowerPosition != null && (lowerPosition.index > 0 || !lower.inclusive))
+        {
+            DataBlock block = dataLevel.getBlock(lowerBlock);
+            int start = (lower.inclusive || lowerPosition.cmp != 0) ? lowerPosition.index : lowerPosition.index + 1;
+
+            builder.add(block.getRange(start, block.termCount()));
+            firstFullBlockIdx = lowerBlock + 1;
+        }
+
+        if (upperPosition != null)
+        {
+            DataBlock block = dataLevel.getBlock(upperBlock);
+            int lastIndex = block.termCount() - 1;
+
+            // The same as with 'lower', but here we need to check whether 'upper' is the last element of the block,
+            // which means that we only have to fetch individual results if:
+            //  - it *is not* the last element, or
+            //  - it *is*, but shouldn't be included (dictated by upper.inclusive)
+            if (upperPosition.index != lastIndex || !upper.inclusive)
+            {
+                int end = (upperPosition.cmp < 0 || (upperPosition.cmp == 0 && upper.inclusive))
+                                ? upperPosition.index + 1 : upperPosition.index;
+
+                builder.add(block.getRange(0, end));
+                lastFullBlockIdx = upperBlock - 1;
+            }
+        }
+
+        int totalSuperBlocks = (lastFullBlockIdx - firstFullBlockIdx) / OnDiskIndexBuilder.SUPER_BLOCK_SIZE;
+
+        // if there are no super-blocks, we can simply read all of the block iterators in sequence
+        if (totalSuperBlocks == 0)
+        {
+            for (int i = firstFullBlockIdx; i <= lastFullBlockIdx; i++)
+                builder.add(dataLevel.getBlock(i).getBlockIndex().iterator(keyFetcher));
+
+            return builder.build();
+        }
+
+        // first get all of the blocks which are aligned before the first super-block in the sequence,
+        // e.g. if the block range was (1, 9) and super-block-size = 4, we need to read blocks 1, 2 and 3
+        // individually; 4 - 7 are covered by the super-block, and 8, 9 are the remainder.
+
+        int superBlockAlignedStart = firstFullBlockIdx == 0 ? 0 : (int) FBUtilities.align(firstFullBlockIdx, OnDiskIndexBuilder.SUPER_BLOCK_SIZE);
+        for (int blockIdx = firstFullBlockIdx; blockIdx < Math.min(superBlockAlignedStart, lastFullBlockIdx); blockIdx++)
+            builder.add(getBlockIterator(blockIdx));
+
+        // now read all of the super-blocks matched by the request; in the example from the previous
+        // comment that's the super-block with index 1 (which covers blocks 4 to 7)
+
+        int superBlockIdx = superBlockAlignedStart / OnDiskIndexBuilder.SUPER_BLOCK_SIZE;
+        for (int offset = 0; offset < totalSuperBlocks - 1; offset++)
+            builder.add(dataLevel.getSuperBlock(superBlockIdx++).iterator());
+
+        // now it's time for the remainder read; in the example above that's blocks 8 and 9, because
+        // we have overshot the previous super-block but didn't request enough to cover the next one.
+
+        int lastCoveredBlock = superBlockIdx * OnDiskIndexBuilder.SUPER_BLOCK_SIZE;
+        for (int offset = 0; offset <= (lastFullBlockIdx - lastCoveredBlock); offset++)
+            builder.add(getBlockIterator(lastCoveredBlock + offset));
+
+        return builder.build();
+    }
+
+    private RangeIterator<Long, Token> searchPoint(int lowerBlock, Expression expression)
+    {
+        Iterator<DataTerm> terms = new TermIterator(lowerBlock, expression, IteratorOrder.DESC);
+        RangeUnionIterator.Builder<Long, Token> builder = RangeUnionIterator.builder();
+
+        while (terms.hasNext())
+        {
+            try
+            {
+                builder.add(terms.next().getTokens());
+            }
+            finally
+            {
+                expression.checkpoint();
+            }
+        }
+
+        return builder.build();
+    }
+
+    private RangeIterator<Long, Token> getBlockIterator(int blockIdx)
+    {
+        DataBlock block = dataLevel.getBlock(blockIdx);
+        return (block.hasCombinedIndex)
+                ? block.getBlockIndex().iterator(keyFetcher)
+                : block.getRange(0, block.termCount());
+    }
+
+    public Iterator<DataTerm> iteratorAt(ByteBuffer query, IteratorOrder order, boolean inclusive)
+    {
+        Expression e = new Expression("", comparator);
+        Expression.Bound bound = new Expression.Bound(query, inclusive);
+
+        switch (order)
+        {
+            case DESC:
+                e.setLower(bound);
+                break;
+
+            case ASC:
+                e.setUpper(bound);
+                break;
+
+            default:
+                throw new IllegalArgumentException("Unknown order: " + order);
+        }
+
+        return new TermIterator(levels.length == 0 ? 0 : getBlockIdx(findPointer(query), query), e, order);
+    }
+
+    private int getDataBlock(ByteBuffer query)
+    {
+        return levels.length == 0 ? 0 : getBlockIdx(findPointer(query), query);
+    }
+
+    public Iterator<DataTerm> iterator()
+    {
+        return new TermIterator(0, new Expression("", comparator), IteratorOrder.DESC);
+    }
+
+    public void close() throws IOException
+    {
+        FileUtils.closeQuietly(indexFile);
+    }
+
+    private PointerTerm findPointer(ByteBuffer query)
+    {
+        PointerTerm ptr = null;
+        for (PointerLevel level : levels)
+        {
+            if ((ptr = level.getPointer(ptr, query)) == null)
+                return null;
+        }
+
+        return ptr;
+    }
+
+    private DataTerm getTerm(ByteBuffer query)
+    {
+        SearchResult<DataTerm> term = searchIndex(query, getDataBlock(query));
+        return term.cmp == 0 ? term.result : null;
+    }
+
+    private SearchResult<DataTerm> searchIndex(ByteBuffer query, int blockIdx)
+    {
+        return dataLevel.getBlock(blockIdx).search(comparator, query);
+    }
+
+    private int getBlockIdx(PointerTerm ptr, ByteBuffer query)
+    {
+        int blockIdx = 0;
+        if (ptr != null)
+        {
+            int cmp = ptr.compareTo(comparator, query);
+            blockIdx = (cmp == 0 || cmp > 0) ? ptr.getBlock() : ptr.getBlock() + 1;
+        }
+
+        return blockIdx;
+    }
+
+    protected class PointerLevel extends Level<PointerBlock>
+    {
+        public PointerLevel(long offset, int count)
+        {
+            super(offset, count);
+        }
+
+        public PointerTerm getPointer(PointerTerm parent, ByteBuffer query)
+        {
+            return getBlock(getBlockIdx(parent, query)).search(comparator, query).result;
+        }
+
+        protected PointerBlock cast(MappedBuffer block)
+        {
+            return new PointerBlock(block);
+        }
+    }
+
+    protected class DataLevel extends Level<DataBlock>
+    {
+        protected final int superBlockCnt;
+        protected final long superBlocksOffset;
+
+        public DataLevel(long offset, int count)
+        {
+            super(offset, count);
+            long baseOffset = blockOffsets + blockCount * 8;
+            superBlockCnt = indexFile.getInt(baseOffset);
+            superBlocksOffset = baseOffset + 4;
+        }
+
+        protected DataBlock cast(MappedBuffer block)
+        {
+            return new DataBlock(block);
+        }
+
+        public OnDiskSuperBlock getSuperBlock(int idx)
+        {
+            assert idx < superBlockCnt : String.format("requested index %d is greater than super block count %d", idx, superBlockCnt);
+            long blockOffset = indexFile.getLong(superBlocksOffset + idx * 8);
+            return new OnDiskSuperBlock(indexFile.duplicate().position(blockOffset));
+        }
+    }
+
+    protected class OnDiskSuperBlock
+    {
+        private final TokenTree tokenTree;
+
+        public OnDiskSuperBlock(MappedBuffer buffer)
+        {
+            tokenTree = new TokenTree(descriptor, buffer);
+        }
+
+        public RangeIterator<Long, Token> iterator()
+        {
+            return tokenTree.iterator(keyFetcher);
+        }
+    }
+
+    protected abstract class Level<T extends OnDiskBlock>
+    {
+        protected final long blockOffsets;
+        protected final int blockCount;
+
+        public Level(long offsets, int count)
+        {
+            this.blockOffsets = offsets;
+            this.blockCount = count;
+        }
+
+        public T getBlock(int idx) throws FSReadError
+        {
+            assert idx >= 0 && idx < blockCount;
+
+            // calculate the block offset and move there
+            // (long is intentional; we just need an mmap implementation that supports long positions)
+            long blockOffset = indexFile.getLong(blockOffsets + idx * 8);
+            return cast(indexFile.duplicate().position(blockOffset));
+        }
+
+        protected abstract T cast(MappedBuffer block);
+    }
+
+    protected class DataBlock extends OnDiskBlock<DataTerm>
+    {
+        public DataBlock(MappedBuffer data)
+        {
+            super(descriptor, data, BlockType.DATA);
+        }
+
+        protected DataTerm cast(MappedBuffer data)
+        {
+            return new DataTerm(data, termSize, getBlockIndex());
+        }
+
+        public RangeIterator<Long, Token> getRange(int start, int end)
+        {
+            RangeUnionIterator.Builder<Long, Token> builder = RangeUnionIterator.builder();
+            NavigableMap<Long, Token> sparse = new TreeMap<>();
+
+            for (int i = start; i < end; i++)
+            {
+                DataTerm term = getTerm(i);
+
+                if (term.isSparse())
+                {
+                    NavigableMap<Long, Token> tokens = term.getSparseTokens();
+                    for (Map.Entry<Long, Token> t : tokens.entrySet())
+                    {
+                        Token token = sparse.get(t.getKey());
+                        if (token == null)
+                            sparse.put(t.getKey(), t.getValue());
+                        else
+                            token.merge(t.getValue());
+                    }
+                }
+                else
+                {
+                    builder.add(term.getTokens());
+                }
+            }
+
+            PrefetchedTokensIterator prefetched = sparse.isEmpty() ? null : new PrefetchedTokensIterator(sparse);
+
+            if (builder.rangeCount() == 0)
+                return prefetched;
+
+            builder.add(prefetched);
+            return builder.build();
+        }
+    }
+
+    protected class PointerBlock extends OnDiskBlock<PointerTerm>
+    {
+        public PointerBlock(MappedBuffer block)
+        {
+            super(descriptor, block, BlockType.POINTER);
+        }
+
+        protected PointerTerm cast(MappedBuffer data)
+        {
+            return new PointerTerm(data, termSize, hasMarkedPartials);
+        }
+    }
+
+    public class DataTerm extends Term implements Comparable<DataTerm>
+    {
+        private final TokenTree perBlockIndex;
+
+        protected DataTerm(MappedBuffer content, OnDiskIndexBuilder.TermSize size, TokenTree perBlockIndex)
+        {
+            super(content, size, hasMarkedPartials);
+            this.perBlockIndex = perBlockIndex;
+        }
+
+        public RangeIterator<Long, Token> getTokens()
+        {
+            final long blockEnd = FBUtilities.align(content.position(), OnDiskIndexBuilder.BLOCK_SIZE);
+
+            if (isSparse())
+                return new PrefetchedTokensIterator(getSparseTokens());
+
+            long offset = blockEnd + 4 + content.getInt(getDataOffset() + 1);
+            return new TokenTree(descriptor, indexFile.duplicate().position(offset)).iterator(keyFetcher);
+        }
+
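+        // A term's data section begins with a one-byte token count: a positive count means the
+        // token values are stored inline and resolved through the block's combined index, while
+        // zero means the next 4 bytes are an offset to the term's own TokenTree (see getTokens()).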
+        public boolean isSparse()
+        {
+            return content.get(getDataOffset()) > 0;
+        }
+
+        public NavigableMap<Long, Token> getSparseTokens()
+        {
+            long ptrOffset = getDataOffset();
+
+            byte size = content.get(ptrOffset);
+
+            assert size > 0;
+
+            NavigableMap<Long, Token> individualTokens = new TreeMap<>();
+            for (int i = 0; i < size; i++)
+            {
+                Token token = perBlockIndex.get(content.getLong(ptrOffset + 1 + (8 * i)), keyFetcher);
+
+                assert token != null;
+                individualTokens.put(token.get(), token);
+            }
+
+            return individualTokens;
+        }
+
+        public int compareTo(DataTerm other)
+        {
+            return other == null ? 1 : compareTo(comparator, other.getTerm());
+        }
+    }
+
+    protected static class PointerTerm extends Term
+    {
+        public PointerTerm(MappedBuffer content, OnDiskIndexBuilder.TermSize size, boolean hasMarkedPartials)
+        {
+            super(content, size, hasMarkedPartials);
+        }
+
+        public int getBlock()
+        {
+            return content.getInt(getDataOffset());
+        }
+    }
+
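+    // Iterates tokens that were already materialized in memory (the sparse case); skipping simply
+    // re-positions over a tail view of the backing NavigableMap.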
+    private static class PrefetchedTokensIterator extends RangeIterator<Long, Token>
+    {
+        private final NavigableMap<Long, Token> tokens;
+        private PeekingIterator<Token> currentIterator;
+
+        public PrefetchedTokensIterator(NavigableMap<Long, Token> tokens)
+        {
+            super(tokens.firstKey(), tokens.lastKey(), tokens.size());
+            this.tokens = tokens;
+            this.currentIterator = Iterators.peekingIterator(tokens.values().iterator());
+        }
+
+        protected Token computeNext()
+        {
+            return currentIterator != null && currentIterator.hasNext()
+                    ? currentIterator.next()
+                    : endOfData();
+        }
+
+        protected void performSkipTo(Long nextToken)
+        {
+            currentIterator = Iterators.peekingIterator(tokens.tailMap(nextToken, true).values().iterator());
+        }
+
+        public void close() throws IOException
+        {
+            endOfData();
+        }
+    }
+
+    public AbstractType<?> getComparator()
+    {
+        return comparator;
+    }
+
+    public String getIndexPath()
+    {
+        return indexPath;
+    }
+
+    private class TermIterator extends AbstractIterator<DataTerm>
+    {
+        private final Expression e;
+        private final IteratorOrder order;
+
+        protected OnDiskBlock<DataTerm> currentBlock;
+        protected int blockIndex, offset;
+
+        private boolean checkLower = true, checkUpper = true;
+
+        public TermIterator(int startBlock, Expression expression, IteratorOrder order)
+        {
+            this.e = expression;
+            this.order = order;
+            this.blockIndex = startBlock;
+
+            nextBlock();
+        }
+
+        protected DataTerm computeNext()
+        {
+            for (;;)
+            {
+                if (currentBlock == null)
+                    return endOfData();
+
+                if (offset >= 0 && offset < currentBlock.termCount())
+                {
+                    DataTerm currentTerm = currentBlock.getTerm(nextOffset());
+
+                    // in PREFIX mode we need to step over all of the partial terms
+                    // encountered by the query until the upper bound tells us to stop
+                    if (e.getOp() == Op.PREFIX && currentTerm.isPartial())
+                        continue;
+
+                    // haven't reached the start of the query range yet, so
+                    // keep skipping the current term until the lower bound is satisfied
+                    if (checkLower && !e.isLowerSatisfiedBy(currentTerm))
+                        continue;
+
+                    // flip the flag right on the first bounds match
+                    // to avoid expensive comparisons
+                    checkLower = false;
+
+                    if (checkUpper && !e.isUpperSatisfiedBy(currentTerm))
+                        return endOfData();
+
+                    return currentTerm;
+                }
+
+                nextBlock();
+            }
+        }
+
+        protected void nextBlock()
+        {
+            currentBlock = null;
+
+            if (blockIndex < 0 || blockIndex >= dataLevel.blockCount)
+                return;
+
+            currentBlock = dataLevel.getBlock(nextBlockIndex());
+            offset = checkLower ? order.startAt(currentBlock, e) : currentBlock.minOffset(order);
+
+            // let's check the last term of the new block right away;
+            // if the expression's upper bound is satisfied by it, we can avoid
+            // doing any expensive upper-bound checks for that block.
+            checkUpper = e.hasUpper() && !e.isUpperSatisfiedBy(currentBlock.getTerm(currentBlock.maxOffset(order)));
+        }
+
+        protected int nextBlockIndex()
+        {
+            int current = blockIndex;
+            blockIndex += order.step;
+            return current;
+        }
+
+        protected int nextOffset()
+        {
+            int current = offset;
+            offset += order.step;
+            return current;
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/disk/OnDiskIndexBuilder.java b/src/java/org/apache/cassandra/index/sasi/disk/OnDiskIndexBuilder.java
new file mode 100644
index 0000000..4946f06
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/disk/OnDiskIndexBuilder.java
@@ -0,0 +1,670 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.disk;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.*;
+
+import org.apache.cassandra.db.DecoratedKey;
+import org.apache.cassandra.index.sasi.plan.Expression.Op;
+import org.apache.cassandra.index.sasi.sa.IndexedTerm;
+import org.apache.cassandra.index.sasi.sa.IntegralSA;
+import org.apache.cassandra.index.sasi.sa.SA;
+import org.apache.cassandra.index.sasi.sa.TermIterator;
+import org.apache.cassandra.index.sasi.sa.SuffixSA;
+import org.apache.cassandra.db.marshal.*;
+import org.apache.cassandra.io.FSWriteError;
+import org.apache.cassandra.io.util.*;
+import org.apache.cassandra.utils.ByteBufferUtil;
+import org.apache.cassandra.utils.FBUtilities;
+import org.apache.cassandra.utils.Pair;
+
+import com.carrotsearch.hppc.LongArrayList;
+import com.carrotsearch.hppc.LongSet;
+import com.carrotsearch.hppc.ShortArrayList;
+import com.google.common.annotations.VisibleForTesting;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class OnDiskIndexBuilder
+{
+    private static final Logger logger = LoggerFactory.getLogger(OnDiskIndexBuilder.class);
+
+    public enum Mode
+    {
+        PREFIX(EnumSet.of(Op.EQ, Op.MATCH, Op.PREFIX, Op.NOT_EQ, Op.RANGE)),
+        CONTAINS(EnumSet.of(Op.EQ, Op.MATCH, Op.CONTAINS, Op.PREFIX, Op.SUFFIX, Op.NOT_EQ)),
+        SPARSE(EnumSet.of(Op.EQ, Op.NOT_EQ, Op.RANGE));
+
+        Set<Op> supportedOps;
+
+        Mode(Set<Op> ops)
+        {
+            supportedOps = ops;
+        }
+
+        public static Mode mode(String mode)
+        {
+            return Mode.valueOf(mode.toUpperCase());
+        }
+
+        public boolean supports(Op op)
+        {
+            return supportedOps.contains(op);
+        }
+    }
+
+    public enum TermSize
+    {
+        INT(4), LONG(8), UUID(16), VARIABLE(-1);
+
+        public final int size;
+
+        TermSize(int size)
+        {
+            this.size = size;
+        }
+
+        public boolean isConstant()
+        {
+            return this != VARIABLE;
+        }
+
+        public static TermSize of(int size)
+        {
+            switch (size)
+            {
+                case -1:
+                    return VARIABLE;
+
+                case 4:
+                    return INT;
+
+                case 8:
+                    return LONG;
+
+                case 16:
+                    return UUID;
+
+                default:
+                    throw new IllegalStateException("unknown state: " + size);
+            }
+        }
+
+        public static TermSize sizeOf(AbstractType<?> comparator)
+        {
+            if (comparator instanceof Int32Type || comparator instanceof FloatType)
+                return INT;
+
+            if (comparator instanceof LongType || comparator instanceof DoubleType
+                    || comparator instanceof TimestampType || comparator instanceof DateType)
+                return LONG;
+
+            if (comparator instanceof TimeUUIDType || comparator instanceof UUIDType)
+                return UUID;
+
+            return VARIABLE;
+        }
+    }
+
+    public static final int BLOCK_SIZE = 4096;
+    public static final int MAX_TERM_SIZE = 1024;
+    public static final int SUPER_BLOCK_SIZE = 64;
+    public static final int IS_PARTIAL_BIT = 15;
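+    // For variable-length terms, bit IS_PARTIAL_BIT (15) of the serialized length short marks the
+    // term as partial; see InMemoryTerm#serialize.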
+
+    private static final SequentialWriterOption WRITER_OPTION = SequentialWriterOption.newBuilder()
+                                                                                      .bufferSize(BLOCK_SIZE)
+                                                                                      .build();
+
+    private final List<MutableLevel<InMemoryPointerTerm>> levels = new ArrayList<>();
+    private MutableLevel<InMemoryDataTerm> dataLevel;
+
+    private final TermSize termSize;
+
+    private final AbstractType<?> keyComparator, termComparator;
+
+    private final Map<ByteBuffer, TokenTreeBuilder> terms;
+    private final Mode mode;
+    private final boolean marksPartials;
+
+    private ByteBuffer minKey, maxKey;
+    private long estimatedBytes;
+
+    public OnDiskIndexBuilder(AbstractType<?> keyComparator, AbstractType<?> comparator, Mode mode)
+    {
+        this(keyComparator, comparator, mode, true);
+    }
+
+    public OnDiskIndexBuilder(AbstractType<?> keyComparator, AbstractType<?> comparator, Mode mode, boolean marksPartials)
+    {
+        this.keyComparator = keyComparator;
+        this.termComparator = comparator;
+        this.terms = new HashMap<>();
+        this.termSize = TermSize.sizeOf(comparator);
+        this.mode = mode;
+        this.marksPartials = marksPartials;
+    }
+
+    public OnDiskIndexBuilder add(ByteBuffer term, DecoratedKey key, long keyPosition)
+    {
+        if (term.remaining() >= MAX_TERM_SIZE)
+        {
+            logger.error("Rejecting value (value size {}, maximum size {}).",
+                         FBUtilities.prettyPrintMemory(term.remaining()),
+                         FBUtilities.prettyPrintMemory(Short.MAX_VALUE));
+            return this;
+        }
+
+        TokenTreeBuilder tokens = terms.get(term);
+        if (tokens == null)
+        {
+            terms.put(term, (tokens = new DynamicTokenTreeBuilder()));
+
+            // on-heap size estimates from jol
+            // 64 bytes for TTB + 48 bytes for TreeMap in TTB + size bytes for the term (map key)
+            estimatedBytes += 64 + 48 + term.remaining();
+        }
+
+        tokens.add((Long) key.getToken().getTokenValue(), keyPosition);
+
+        // calculate key range (based on actual key values) for current index
+        minKey = (minKey == null || keyComparator.compare(minKey, key.getKey()) > 0) ? key.getKey() : minKey;
+        maxKey = (maxKey == null || keyComparator.compare(maxKey, key.getKey()) < 0) ? key.getKey() : maxKey;
+
+        // 60 ((boolean(1)*4) + (long(8)*4) + 24) bytes for the LongOpenHashSet created when the keyPosition was added
+        // + 40 bytes for the TreeMap.Entry + 8 bytes for the token (key).
+        // in the case of hash collision for the token we may overestimate but this is extremely rare
+        estimatedBytes += 60 + 40 + 8;
+
+        return this;
+    }
+
+    public long estimatedMemoryUse()
+    {
+        return estimatedBytes;
+    }
+
+    private void addTerm(InMemoryDataTerm term, SequentialWriter out) throws IOException
+    {
+        InMemoryPointerTerm ptr = dataLevel.add(term);
+        if (ptr == null)
+            return;
+
+        int levelIdx = 0;
+        for (;;)
+        {
+            MutableLevel<InMemoryPointerTerm> level = getIndexLevel(levelIdx++, out);
+            if ((ptr = level.add(ptr)) == null)
+                break;
+        }
+    }
+
+    public boolean isEmpty()
+    {
+        return terms.isEmpty();
+    }
+
+    public void finish(Pair<ByteBuffer, ByteBuffer> range, File file, TermIterator terms)
+    {
+        finish(Descriptor.CURRENT, range, file, terms);
+    }
+
+    /**
+     * Finishes up index building process by creating/populating index file.
+     *
+     * @param indexFile The file to write index contents to.
+     *
+     * @return true if index was written successfully, false otherwise (e.g. if index was empty).
+     *
+     * @throws FSWriteError on I/O error.
+     */
+    public boolean finish(File indexFile) throws FSWriteError
+    {
+        return finish(Descriptor.CURRENT, indexFile);
+    }
+
+    @VisibleForTesting
+    protected boolean finish(Descriptor descriptor, File file) throws FSWriteError
+    {
+        // no terms means there is nothing to build
+        if (terms.isEmpty())
+            return false;
+
+        // split terms into suffixes only if it's text, otherwise (even if CONTAINS is set) use terms in original form
+        SA sa = ((termComparator instanceof UTF8Type || termComparator instanceof AsciiType) && mode == Mode.CONTAINS)
+                    ? new SuffixSA(termComparator, mode) : new IntegralSA(termComparator, mode);
+
+        for (Map.Entry<ByteBuffer, TokenTreeBuilder> term : terms.entrySet())
+            sa.add(term.getKey(), term.getValue());
+
+        finish(descriptor, Pair.create(minKey, maxKey), file, sa.finish());
+        return true;
+    }
+
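+    // Layout written below: a header (version, term size, min/max term, min/max key, mode,
+    // marksPartials) padded to BLOCK_SIZE, then data and pointer blocks as they fill up, then the
+    // levels index (per-level block offsets plus data-level metadata) and, last, a long recording
+    // where that index starts.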
+    @SuppressWarnings("resource")
+    protected void finish(Descriptor descriptor, Pair<ByteBuffer, ByteBuffer> range, File file, TermIterator terms)
+    {
+        SequentialWriter out = null;
+
+        try
+        {
+            out = new SequentialWriter(file, WRITER_OPTION);
+
+            out.writeUTF(descriptor.version.toString());
+
+            out.writeShort(termSize.size);
+
+            // min, max term (useful to find initial scan range from search expressions)
+            ByteBufferUtil.writeWithShortLength(terms.minTerm(), out);
+            ByteBufferUtil.writeWithShortLength(terms.maxTerm(), out);
+
+            // min, max keys covered by index (useful when searching across multiple indexes)
+            ByteBufferUtil.writeWithShortLength(range.left, out);
+            ByteBufferUtil.writeWithShortLength(range.right, out);
+
+            out.writeUTF(mode.toString());
+            out.writeBoolean(marksPartials);
+
+            out.skipBytes((int) (BLOCK_SIZE - out.position()));
+
+            dataLevel = mode == Mode.SPARSE ? new DataBuilderLevel(out, new MutableDataBlock(termComparator, mode))
+                                            : new MutableLevel<>(out, new MutableDataBlock(termComparator, mode));
+            while (terms.hasNext())
+            {
+                Pair<IndexedTerm, TokenTreeBuilder> term = terms.next();
+                addTerm(new InMemoryDataTerm(term.left, term.right), out);
+            }
+
+            dataLevel.finalFlush();
+            for (MutableLevel l : levels)
+                l.flush(); // flush all of the buffers
+
+            // and finally write levels index
+            final long levelIndexPosition = out.position();
+
+            out.writeInt(levels.size());
+            for (int i = levels.size() - 1; i >= 0; i--)
+                levels.get(i).flushMetadata();
+
+            dataLevel.flushMetadata();
+
+            out.writeLong(levelIndexPosition);
+
+            // sync contents of the output and disk,
+            // since it's not done implicitly on close
+            out.sync();
+        }
+        catch (IOException e)
+        {
+            throw new FSWriteError(e, file);
+        }
+        finally
+        {
+            FileUtils.closeQuietly(out);
+        }
+    }
+
+    private MutableLevel<InMemoryPointerTerm> getIndexLevel(int idx, SequentialWriter out)
+    {
+        if (levels.size() == 0)
+            levels.add(new MutableLevel<>(out, new MutableBlock<>()));
+
+        if (levels.size() - 1 < idx)
+        {
+            int toAdd = idx - (levels.size() - 1);
+            for (int i = 0; i < toAdd; i++)
+                levels.add(new MutableLevel<>(out, new MutableBlock<>()));
+        }
+
+        return levels.get(idx);
+    }
+
+    protected static void alignToBlock(SequentialWriter out) throws IOException
+    {
+        long endOfBlock = out.position();
+        if ((endOfBlock & (BLOCK_SIZE - 1)) != 0) // align on the block boundary if needed
+            out.skipBytes((int) (FBUtilities.align(endOfBlock, BLOCK_SIZE) - endOfBlock));
+    }
+
+    private class InMemoryTerm
+    {
+        protected final IndexedTerm term;
+
+        public InMemoryTerm(IndexedTerm term)
+        {
+            this.term = term;
+        }
+
+        public int serializedSize()
+        {
+            return (termSize.isConstant() ? 0 : 2) + term.getBytes().remaining();
+        }
+
+        public void serialize(DataOutputPlus out) throws IOException
+        {
+            if (termSize.isConstant())
+            {
+                out.write(term.getBytes());
+            }
+            else
+            {
+                out.writeShort(term.getBytes().remaining() | ((marksPartials && term.isPartial() ? 1 : 0) << IS_PARTIAL_BIT));
+                out.write(term.getBytes());
+            }
+
+        }
+    }
+
+    private class InMemoryPointerTerm extends InMemoryTerm
+    {
+        protected final int blockCnt;
+
+        public InMemoryPointerTerm(IndexedTerm term, int blockCnt)
+        {
+            super(term);
+            this.blockCnt = blockCnt;
+        }
+
+        public int serializedSize()
+        {
+            return super.serializedSize() + 4;
+        }
+
+        public void serialize(DataOutputPlus out) throws IOException
+        {
+            super.serialize(out);
+            out.writeInt(blockCnt);
+        }
+    }
+
+    private class InMemoryDataTerm extends InMemoryTerm
+    {
+        private final TokenTreeBuilder keys;
+
+        public InMemoryDataTerm(IndexedTerm term, TokenTreeBuilder keys)
+        {
+            super(term);
+            this.keys = keys;
+        }
+    }
+
+    private class MutableLevel<T extends InMemoryTerm>
+    {
+        private final LongArrayList blockOffsets = new LongArrayList();
+
+        protected final SequentialWriter out;
+
+        private final MutableBlock<T> inProcessBlock;
+        private InMemoryPointerTerm lastTerm;
+
+        public MutableLevel(SequentialWriter out, MutableBlock<T> block)
+        {
+            this.out = out;
+            this.inProcessBlock = block;
+        }
+
+        /**
+         * @return If we flushed a block, return the last term of that block; else, null.
+         */
+        public InMemoryPointerTerm add(T term) throws IOException
+        {
+            InMemoryPointerTerm toPromote = null;
+
+            if (!inProcessBlock.hasSpaceFor(term))
+            {
+                flush();
+                toPromote = lastTerm;
+            }
+
+            inProcessBlock.add(term);
+
+            lastTerm = new InMemoryPointerTerm(term.term, blockOffsets.size());
+            return toPromote;
+        }
+
+        public void flush() throws IOException
+        {
+            blockOffsets.add(out.position());
+            inProcessBlock.flushAndClear(out);
+        }
+
+        public void finalFlush() throws IOException
+        {
+            flush();
+        }
+
+        public void flushMetadata() throws IOException
+        {
+            flushMetadata(blockOffsets);
+        }
+
+        protected void flushMetadata(LongArrayList longArrayList) throws IOException
+        {
+            out.writeInt(longArrayList.size());
+            for (int i = 0; i < longArrayList.size(); i++)
+                out.writeLong(longArrayList.get(i));
+        }
+    }
+
+    /** builds standard data blocks as well as super blocks */
+    private class DataBuilderLevel extends MutableLevel<InMemoryDataTerm>
+    {
+        private final LongArrayList superBlockOffsets = new LongArrayList();
+
+        /** count of regular data blocks written since the current super block was initialized */
+        private int dataBlocksCnt;
+        private TokenTreeBuilder superBlockTree;
+
+        public DataBuilderLevel(SequentialWriter out, MutableBlock<InMemoryDataTerm> block)
+        {
+            super(out, block);
+            superBlockTree = new DynamicTokenTreeBuilder();
+        }
+
+        public InMemoryPointerTerm add(InMemoryDataTerm term) throws IOException
+        {
+            InMemoryPointerTerm ptr = super.add(term);
+            if (ptr != null)
+            {
+                dataBlocksCnt++;
+                flushSuperBlock(false);
+            }
+            superBlockTree.add(term.keys);
+            return ptr;
+        }
+
+        public void flushSuperBlock(boolean force) throws IOException
+        {
+            if (dataBlocksCnt == SUPER_BLOCK_SIZE || (force && !superBlockTree.isEmpty()))
+            {
+                superBlockOffsets.add(out.position());
+                superBlockTree.finish().write(out);
+                alignToBlock(out);
+
+                dataBlocksCnt = 0;
+                superBlockTree = new DynamicTokenTreeBuilder();
+            }
+        }
+
+        public void finalFlush() throws IOException
+        {
+            super.flush();
+            flushSuperBlock(true);
+        }
+
+        public void flushMetadata() throws IOException
+        {
+            super.flushMetadata();
+            flushMetadata(superBlockOffsets);
+        }
+    }
+
+    private static class MutableBlock<T extends InMemoryTerm>
+    {
+        protected final DataOutputBufferFixed buffer;
+        protected final ShortArrayList offsets;
+
+        public MutableBlock()
+        {
+            buffer = new DataOutputBufferFixed(BLOCK_SIZE);
+            offsets = new ShortArrayList();
+        }
+
+        public final void add(T term) throws IOException
+        {
+            offsets.add((short) buffer.position());
+            addInternal(term);
+        }
+
+        protected void addInternal(T term) throws IOException
+        {
+            term.serialize(buffer);
+        }
+
+        public boolean hasSpaceFor(T element)
+        {
+            return sizeAfter(element) < BLOCK_SIZE;
+        }
+
+        protected int sizeAfter(T element)
+        {
+            return getWatermark() + 4 + element.serializedSize();
+        }
+
+        protected int getWatermark()
+        {
+            return 4 + offsets.size() * 2 + (int) buffer.position();
+        }
+
+        public void flushAndClear(SequentialWriter out) throws IOException
+        {
+            out.writeInt(offsets.size());
+            for (int i = 0; i < offsets.size(); i++)
+                out.writeShort(offsets.get(i));
+
+            out.write(buffer.buffer());
+
+            alignToBlock(out);
+
+            offsets.clear();
+            buffer.clear();
+        }
+    }
+
+    private static class MutableDataBlock extends MutableBlock<InMemoryDataTerm>
+    {
+        private static final int MAX_KEYS_SPARSE = 5;
+
+        private final AbstractType<?> comparator;
+        private final Mode mode;
+
+        private int offset = 0;
+
+        private final List<TokenTreeBuilder> containers = new ArrayList<>();
+        private TokenTreeBuilder combinedIndex;
+
+        public MutableDataBlock(AbstractType<?> comparator, Mode mode)
+        {
+            this.comparator = comparator;
+            this.mode = mode;
+            this.combinedIndex = initCombinedIndex();
+        }
+
+        protected void addInternal(InMemoryDataTerm term) throws IOException
+        {
+            TokenTreeBuilder keys = term.keys;
+
+            if (mode == Mode.SPARSE)
+            {
+                if (keys.getTokenCount() > MAX_KEYS_SPARSE)
+                    throw new IOException(String.format("Term - '%s' belongs to more than %d keys in %s mode, which is not allowed.",
+                                                        comparator.getString(term.term.getBytes()), MAX_KEYS_SPARSE, mode.name()));
+
+                writeTerm(term, keys);
+            }
+            else
+            {
+                writeTerm(term, offset);
+
+                offset += keys.serializedSize();
+                containers.add(keys);
+            }
+
+            if (mode == Mode.SPARSE)
+                combinedIndex.add(keys);
+        }
+
+        protected int sizeAfter(InMemoryDataTerm element)
+        {
+            return super.sizeAfter(element) + ptrLength(element);
+        }
+
+        public void flushAndClear(SequentialWriter out) throws IOException
+        {
+            super.flushAndClear(out);
+
+            out.writeInt(mode == Mode.SPARSE ? offset : -1);
+
+            if (containers.size() > 0)
+            {
+                for (TokenTreeBuilder tokens : containers)
+                    tokens.write(out);
+            }
+
+            if (mode == Mode.SPARSE && combinedIndex != null)
+                combinedIndex.finish().write(out);
+
+            alignToBlock(out);
+
+            containers.clear();
+            combinedIndex = initCombinedIndex();
+
+            offset = 0;
+        }
+
+        private int ptrLength(InMemoryDataTerm term)
+        {
+            return (term.keys.getTokenCount() > 5)
+                    ? 5 // 1 byte type + 4 byte offset to the tree
+                    : 1 + (8 * (int) term.keys.getTokenCount()); // 1 byte size + n 8 byte tokens
+        }
+
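+        // sparse layout: serialized term, a one-byte token count, then the token values inline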
+        private void writeTerm(InMemoryTerm term, TokenTreeBuilder keys) throws IOException
+        {
+            term.serialize(buffer);
+            buffer.writeByte((byte) keys.getTokenCount());
+            for (Pair<Long, LongSet> key : keys)
+                buffer.writeLong(key.left);
+        }
+
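+        // non-sparse layout: serialized term, a zero marker byte, then a 4-byte offset to the
+        // term's TokenTree, resolved against the end of the block by OnDiskIndex.DataTerm#getTokens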
+        private void writeTerm(InMemoryTerm term, int offset) throws IOException
+        {
+            term.serialize(buffer);
+            buffer.writeByte(0x0);
+            buffer.writeInt(offset);
+        }
+
+        private TokenTreeBuilder initCombinedIndex()
+        {
+            return mode == Mode.SPARSE ? new DynamicTokenTreeBuilder() : null;
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/disk/PerSSTableIndexWriter.java b/src/java/org/apache/cassandra/index/sasi/disk/PerSSTableIndexWriter.java
new file mode 100644
index 0000000..6a46338
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/disk/PerSSTableIndexWriter.java
@@ -0,0 +1,377 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.disk;
+
+import java.io.File;
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.HashSet;
+import java.util.Map;
+import java.util.Set;
+import java.util.concurrent.*;
+
+import org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor;
+import org.apache.cassandra.concurrent.NamedThreadFactory;
+import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.db.DecoratedKey;
+import org.apache.cassandra.db.compaction.OperationType;
+import org.apache.cassandra.db.rows.Row;
+import org.apache.cassandra.db.rows.Unfiltered;
+import org.apache.cassandra.index.sasi.analyzer.AbstractAnalyzer;
+import org.apache.cassandra.index.sasi.conf.ColumnIndex;
+import org.apache.cassandra.index.sasi.utils.CombinedTermIterator;
+import org.apache.cassandra.index.sasi.utils.TypeUtil;
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.io.FSError;
+import org.apache.cassandra.io.sstable.Descriptor;
+import org.apache.cassandra.io.sstable.format.SSTableFlushObserver;
+import org.apache.cassandra.io.util.FileUtils;
+import org.apache.cassandra.utils.FBUtilities;
+import org.apache.cassandra.utils.Pair;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.util.concurrent.Futures;
+import com.google.common.util.concurrent.Uninterruptibles;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class PerSSTableIndexWriter implements SSTableFlushObserver
+{
+    private static final Logger logger = LoggerFactory.getLogger(PerSSTableIndexWriter.class);
+
+    private static final ThreadPoolExecutor INDEX_FLUSHER_MEMTABLE;
+    private static final ThreadPoolExecutor INDEX_FLUSHER_GENERAL;
+
+    static
+    {
+        INDEX_FLUSHER_GENERAL = new JMXEnabledThreadPoolExecutor(1, 8, 60, TimeUnit.SECONDS,
+                                                                 new LinkedBlockingQueue<>(),
+                                                                 new NamedThreadFactory("SASI-General"),
+                                                                 "internal");
+        INDEX_FLUSHER_GENERAL.allowCoreThreadTimeOut(true);
+
+        INDEX_FLUSHER_MEMTABLE = new JMXEnabledThreadPoolExecutor(1, 8, 60, TimeUnit.SECONDS,
+                                                                  new LinkedBlockingQueue<>(),
+                                                                  new NamedThreadFactory("SASI-Memtable"),
+                                                                  "internal");
+        INDEX_FLUSHER_MEMTABLE.allowCoreThreadTimeOut(true);
+    }
+
+    private final int nowInSec = FBUtilities.nowInSeconds();
+
+    private final Descriptor descriptor;
+    private final OperationType source;
+
+    private final AbstractType<?> keyValidator;
+    private final Map<ColumnDefinition, ColumnIndex> supportedIndexes;
+
+    @VisibleForTesting
+    protected final Map<ColumnDefinition, Index> indexes;
+
+    private DecoratedKey currentKey;
+    private long currentKeyPosition;
+    private boolean isComplete;
+
+    public PerSSTableIndexWriter(AbstractType<?> keyValidator,
+                                 Descriptor descriptor,
+                                 OperationType source,
+                                 Map<ColumnDefinition, ColumnIndex> supportedIndexes)
+    {
+        this.keyValidator = keyValidator;
+        this.descriptor = descriptor;
+        this.source = source;
+        this.supportedIndexes = supportedIndexes;
+        this.indexes = new HashMap<>();
+    }
+
+    public void begin()
+    {}
+
+    public void startPartition(DecoratedKey key, long curPosition)
+    {
+        currentKey = key;
+        currentKeyPosition = curPosition;
+    }
+
+    public void nextUnfilteredCluster(Unfiltered unfiltered)
+    {
+        if (!unfiltered.isRow())
+            return;
+
+        Row row = (Row) unfiltered;
+
+        supportedIndexes.keySet().forEach((column) -> {
+            ByteBuffer value = ColumnIndex.getValueOf(column, row, nowInSec);
+            if (value == null)
+                return;
+
+            ColumnIndex columnIndex = supportedIndexes.get(column);
+            if (columnIndex == null)
+                return;
+
+            Index index = indexes.get(column);
+            if (index == null)
+                indexes.put(column, (index = newIndex(columnIndex)));
+
+            index.add(value.duplicate(), currentKey, currentKeyPosition);
+        });
+    }
+
+    public void complete()
+    {
+        if (isComplete)
+            return;
+
+        currentKey = null;
+
+        try
+        {
+            CountDownLatch latch = new CountDownLatch(indexes.size());
+            for (Index index : indexes.values())
+                index.complete(latch);
+
+            Uninterruptibles.awaitUninterruptibly(latch);
+        }
+        finally
+        {
+            indexes.clear();
+            isComplete = true;
+        }
+    }
+
+    public Index getIndex(ColumnDefinition columnDef)
+    {
+        return indexes.get(columnDef);
+    }
+
+    public Descriptor getDescriptor()
+    {
+        return descriptor;
+    }
+
+    @VisibleForTesting
+    protected Index newIndex(ColumnIndex columnIndex)
+    {
+        return new Index(columnIndex);
+    }
+
+    @VisibleForTesting
+    protected class Index
+    {
+        @VisibleForTesting
+        protected final String outputFile;
+
+        private final ColumnIndex columnIndex;
+        private final AbstractAnalyzer analyzer;
+        private final long maxMemorySize;
+
+        @VisibleForTesting
+        protected final Set<Future<OnDiskIndex>> segments;
+        private int segmentNumber = 0;
+
+        private OnDiskIndexBuilder currentBuilder;
+
+        public Index(ColumnIndex columnIndex)
+        {
+            this.columnIndex = columnIndex;
+            this.outputFile = descriptor.filenameFor(columnIndex.getComponent());
+            this.analyzer = columnIndex.getAnalyzer();
+            this.segments = new HashSet<>();
+            this.maxMemorySize = maxMemorySize(columnIndex);
+            this.currentBuilder = newIndexBuilder();
+        }
+
+        public void add(ByteBuffer term, DecoratedKey key, long keyPosition)
+        {
+            if (term.remaining() == 0)
+                return;
+
+            boolean isAdded = false;
+
+            analyzer.reset(term);
+            while (analyzer.hasNext())
+            {
+                ByteBuffer token = analyzer.next();
+                int size = token.remaining();
+
+                if (token.remaining() >= OnDiskIndexBuilder.MAX_TERM_SIZE)
+                {
+                    logger.info("Rejecting value (size {}, maximum {}) for column {} (analyzed {}) at {} SSTable.",
+                            FBUtilities.prettyPrintMemory(term.remaining()),
+                            FBUtilities.prettyPrintMemory(OnDiskIndexBuilder.MAX_TERM_SIZE),
+                            columnIndex.getColumnName(),
+                            columnIndex.getMode().isAnalyzed,
+                            descriptor);
+                    continue;
+                }
+
+                if (!TypeUtil.isValid(token, columnIndex.getValidator()))
+                {
+                    if ((token = TypeUtil.tryUpcast(token, columnIndex.getValidator())) == null)
+                    {
+                        logger.info("({}) Failed to add {} to index for key: {}, value size was {}, validator is {}.",
+                                    outputFile,
+                                    columnIndex.getColumnName(),
+                                    keyValidator.getString(key.getKey()),
+                                    FBUtilities.prettyPrintMemory(size),
+                                    columnIndex.getValidator());
+                        continue;
+                    }
+                }
+
+                currentBuilder.add(token, key, keyPosition);
+                isAdded = true;
+            }
+
+            if (!isAdded || currentBuilder.estimatedMemoryUse() < maxMemorySize)
+                return; // none of the generated tokens were added to the index, or the memory threshold wasn't reached
+
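+            // the in-memory builder reached its memory budget: hand it off to the flush executor as
+            // an on-disk segment and continue with a fresh builder (see scheduleSegmentFlush)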
+            segments.add(getExecutor().submit(scheduleSegmentFlush(false)));
+        }
+
+        @VisibleForTesting
+        protected Callable<OnDiskIndex> scheduleSegmentFlush(final boolean isFinal)
+        {
+            final OnDiskIndexBuilder builder = currentBuilder;
+            currentBuilder = newIndexBuilder();
+
+            final String segmentFile = filename(isFinal);
+
+            return () -> {
+                long start = System.nanoTime();
+
+                try
+                {
+                    File index = new File(segmentFile);
+                    return builder.finish(index) ? new OnDiskIndex(index, columnIndex.getValidator(), null) : null;
+                }
+                catch (Exception | FSError e)
+                {
+                    logger.error("Failed to build index segment {}", segmentFile, e);
+                    return null;
+                }
+                finally
+                {
+                    if (!isFinal)
+                        logger.info("Flushed index segment {}, took {} ms.", segmentFile, TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
+                }
+            };
+        }
+
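+        // Final flush: build the whole index from memory if no segments exist, otherwise flush any
+        // in-memory remainder as one more segment and stitch all segments into the final index file
+        // via CombinedTermIterator; per-segment temporary files are closed and deleted afterwards.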
+        public void complete(final CountDownLatch latch)
+        {
+            logger.info("Scheduling index flush to {}", outputFile);
+
+            getExecutor().submit((Runnable) () -> {
+                long start1 = System.nanoTime();
+
+                OnDiskIndex[] parts = new OnDiskIndex[segments.size() + 1];
+
+                try
+                {
+                    // no parts present, build entire index from memory
+                    if (segments.isEmpty())
+                    {
+                        scheduleSegmentFlush(true).call();
+                        return;
+                    }
+
+                    // parts are present but there is something still in memory, let's flush that inline
+                    if (!currentBuilder.isEmpty())
+                    {
+                        @SuppressWarnings("resource")
+                        OnDiskIndex last = scheduleSegmentFlush(false).call();
+                        segments.add(Futures.immediateFuture(last));
+                    }
+
+                    int index = 0;
+                    ByteBuffer combinedMin = null, combinedMax = null;
+
+                    for (Future<OnDiskIndex> f : segments)
+                    {
+                        OnDiskIndex part = f.get();
+                        if (part == null)
+                            continue;
+
+                        parts[index++] = part;
+                        combinedMin = (combinedMin == null || keyValidator.compare(combinedMin, part.minKey()) > 0) ? part.minKey() : combinedMin;
+                        combinedMax = (combinedMax == null || keyValidator.compare(combinedMax, part.maxKey()) < 0) ? part.maxKey() : combinedMax;
+                    }
+
+                    OnDiskIndexBuilder builder = newIndexBuilder();
+                    builder.finish(Pair.create(combinedMin, combinedMax),
+                                   new File(outputFile),
+                                   new CombinedTermIterator(parts));
+                }
+                catch (Exception | FSError e)
+                {
+                    logger.error("Failed to flush index {}.", outputFile, e);
+                    FileUtils.delete(outputFile);
+                }
+                finally
+                {
+                    logger.info("Index flush to {} took {} ms.", outputFile, TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start1));
+
+                    for (int segment = 0; segment < segmentNumber; segment++)
+                    {
+                        OnDiskIndex part = parts[segment];
+
+                        if (part != null)
+                            FileUtils.closeQuietly(part);
+
+                        FileUtils.delete(outputFile + "_" + segment);
+                    }
+
+                    latch.countDown();
+                }
+            });
+        }
+
+        private ExecutorService getExecutor()
+        {
+            return source == OperationType.FLUSH ? INDEX_FLUSHER_MEMTABLE : INDEX_FLUSHER_GENERAL;
+        }
+
+        private OnDiskIndexBuilder newIndexBuilder()
+        {
+            return new OnDiskIndexBuilder(keyValidator, columnIndex.getValidator(), columnIndex.getMode().mode);
+        }
+
+        public String filename(boolean isFinal)
+        {
+            return outputFile + (isFinal ? "" : "_" + segmentNumber++);
+        }
+    }
+
+    protected long maxMemorySize(ColumnIndex columnIndex)
+    {
+        // 1G during memtable flush, otherwise the configured per-compaction flush memory limit
+        return source == OperationType.FLUSH ? 1073741824L : columnIndex.getMode().maxCompactionFlushMemoryInMb;
+    }
+
+    public int hashCode()
+    {
+        return descriptor.hashCode();
+    }
+
+    public boolean equals(Object o)
+    {
+        return o instanceof PerSSTableIndexWriter && descriptor.equals(((PerSSTableIndexWriter) o).descriptor);
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/disk/StaticTokenTreeBuilder.java b/src/java/org/apache/cassandra/index/sasi/disk/StaticTokenTreeBuilder.java
new file mode 100644
index 0000000..7a41b38
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/disk/StaticTokenTreeBuilder.java
@@ -0,0 +1,252 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.disk;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Iterator;
+import java.util.SortedMap;
+
+import org.apache.cassandra.index.sasi.utils.CombinedTerm;
+import org.apache.cassandra.index.sasi.utils.RangeIterator;
+import org.apache.cassandra.io.util.DataOutputPlus;
+import org.apache.cassandra.utils.AbstractIterator;
+import org.apache.cassandra.utils.Pair;
+
+import com.carrotsearch.hppc.LongSet;
+import com.google.common.collect.Iterators;
+
+/**
+ * Intended usage of this class is to be used in place of {@link DynamicTokenTreeBuilder}
+ * when multiple index segments produced by {@link PerSSTableIndexWriter} are stitched together
+ * by {@link PerSSTableIndexWriter#complete()}.
+ *
+ * This class uses the RangeIterator, now provided by
+ * {@link CombinedTerm#getTokenIterator()}, to iterate the data twice.
+ * The first iteration builds the tree with leaves that contain only enough
+ * information to build the upper layers -- these leaves do not store more
+ * than their minimum and maximum tokens plus their total size, which makes them
+ * un-serializable.
+ *
+ * When the tree is written to disk the final layer is not
+ * written. It is at this point that the data is iterated once again to write
+ * the leaves to disk. This (logarithmically) reduces copying of the
+ * token values while building and writing upper layers of the tree,
+ * removes the use of SortedMap when combining SAs, and relies on the
+ * memory mapped SAs otherwise, greatly improving performance and no
+ * longer causing OOMs when TokenTree sizes are big.
+ *
+ * See https://issues.apache.org/jira/browse/CASSANDRA-11383 for more details.
+ */
+public class StaticTokenTreeBuilder extends AbstractTokenTreeBuilder
+{
+    private final CombinedTerm combinedTerm;
+
+    public StaticTokenTreeBuilder(CombinedTerm term)
+    {
+        combinedTerm = term;
+    }
+
+    public void add(Long token, long keyPosition)
+    {
+        throw new UnsupportedOperationException();
+    }
+
+    public void add(SortedMap<Long, LongSet> data)
+    {
+        throw new UnsupportedOperationException();
+    }
+
+    public void add(Iterator<Pair<Long, LongSet>> data)
+    {
+        throw new UnsupportedOperationException();
+    }
+
+    public boolean isEmpty()
+    {
+        return tokenCount == 0;
+    }
+
+    public Iterator<Pair<Long, LongSet>> iterator()
+    {
+        Iterator<Token> iterator = combinedTerm.getTokenIterator();
+        return new AbstractIterator<Pair<Long, LongSet>>()
+        {
+            protected Pair<Long, LongSet> computeNext()
+            {
+                if (!iterator.hasNext())
+                    return endOfData();
+
+                Token token = iterator.next();
+                return Pair.create(token.get(), token.getOffsets());
+            }
+        };
+    }
+
+    public long getTokenCount()
+    {
+        return tokenCount;
+    }
+
+    @Override
+    public void write(DataOutputPlus out) throws IOException
+    {
+        // if the root is not a leaf then none of the leaves have been written (all are PartialLeaf)
+        // so write out the last layer of the tree by converting PartialLeaf to StaticLeaf and
+        // iterating the data once more
+        super.write(out);
+        if (root.isLeaf())
+            return;
+
+        RangeIterator<Long, Token> tokens = combinedTerm.getTokenIterator();
+        ByteBuffer blockBuffer = ByteBuffer.allocate(BLOCK_BYTES);
+        Iterator<Node> leafIterator = leftmostLeaf.levelIterator();
+        while (leafIterator.hasNext())
+        {
+            Leaf leaf = (Leaf) leafIterator.next();
+            Leaf writeableLeaf = new StaticLeaf(Iterators.limit(tokens, leaf.tokenCount()), leaf);
+            writeableLeaf.serialize(-1, blockBuffer);
+            flushBuffer(blockBuffer, out, true);
+        }
+
+    }
+
+    protected void constructTree()
+    {
+        RangeIterator<Long, Token> tokens = combinedTerm.getTokenIterator();
+
+        tokenCount = 0;
+        treeMinToken = tokens.getMinimum();
+        treeMaxToken = tokens.getMaximum();
+        numBlocks = 1;
+
+        root = new InteriorNode();
+        rightmostParent = (InteriorNode) root;
+        Leaf lastLeaf = null;
+        Long lastToken, firstToken = null;
+        int leafSize = 0;
+        while (tokens.hasNext())
+        {
+            Long token = tokens.next().get();
+            if (firstToken == null)
+                firstToken = token;
+
+            tokenCount++;
+            leafSize++;
+
+            // skip until the last token in the leaf
+            if (tokenCount % TOKENS_PER_BLOCK != 0 && token != treeMaxToken)
+                continue;
+
+            lastToken = token;
+            Leaf leaf = new PartialLeaf(firstToken, lastToken, leafSize);
+            if (lastLeaf == null) // first leaf created
+                leftmostLeaf = leaf;
+            else
+                lastLeaf.next = leaf;
+
+
+            rightmostParent.add(leaf);
+            lastLeaf = rightmostLeaf = leaf;
+            firstToken = null;
+            numBlocks++;
+            leafSize = 0;
+        }
+
+        // if the tree is really a single leaf the empty root interior
+        // node must be discarded
+        if (root.tokenCount() == 0)
+        {
+            numBlocks = 1;
+            root = new StaticLeaf(combinedTerm.getTokenIterator(), treeMinToken, treeMaxToken, tokenCount, true);
+        }
+    }
+
+    // This denotes the leaf which only has min/max and token counts
+    // but doesn't have any associated data yet, so it can't be serialized.
+    private class PartialLeaf extends Leaf
+    {
+        private final int size;
+        public PartialLeaf(Long min, Long max, int count)
+        {
+            super(min, max);
+            size = count;
+        }
+
+        public int tokenCount()
+        {
+            return size;
+        }
+
+        public void serializeData(ByteBuffer buf)
+        {
+            throw new UnsupportedOperationException();
+        }
+
+        public boolean isSerializable()
+        {
+            return false;
+        }
+    }
+
+    // This denotes the leaf which has been filled with data and is ready to be serialized
+    private class StaticLeaf extends Leaf
+    {
+        private final Iterator<Token> tokens;
+        private final int count;
+        private final boolean isLast;
+
+        public StaticLeaf(Iterator<Token> tokens, Leaf leaf)
+        {
+            this(tokens, leaf.smallestToken(), leaf.largestToken(), leaf.tokenCount(), leaf.isLastLeaf());
+        }
+
+        public StaticLeaf(Iterator<Token> tokens, Long min, Long max, long count, boolean isLastLeaf)
+        {
+            super(min, max);
+
+            this.count = (int) count; // downcast is safe since leaf size is always < Integer.MAX_VALUE
+            this.tokens = tokens;
+            this.isLast = isLastLeaf;
+        }
+
+        public boolean isLastLeaf()
+        {
+            return isLast;
+        }
+
+        public int tokenCount()
+        {
+            return count;
+        }
+
+        public void serializeData(ByteBuffer buf)
+        {
+            while (tokens.hasNext())
+            {
+                Token entry = tokens.next();
+                createEntry(entry.get(), entry.getOffsets()).serialize(buf);
+            }
+        }
+
+        public boolean isSerializable()
+        {
+            return true;
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/disk/Token.java b/src/java/org/apache/cassandra/index/sasi/disk/Token.java
new file mode 100644
index 0000000..4cd1ea3
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/disk/Token.java
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.disk;
+
+import com.google.common.primitives.Longs;
+
+import org.apache.cassandra.db.DecoratedKey;
+import org.apache.cassandra.index.sasi.utils.CombinedValue;
+
+import com.carrotsearch.hppc.LongSet;
+
+public abstract class Token implements CombinedValue<Long>, Iterable<DecoratedKey>
+{
+    protected final long token;
+
+    public Token(long token)
+    {
+        this.token = token;
+    }
+
+    public Long get()
+    {
+        return token;
+    }
+
+    public abstract LongSet getOffsets();
+
+    public int compareTo(CombinedValue<Long> o)
+    {
+        return Longs.compare(token, ((Token) o).token);
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/disk/TokenTree.java b/src/java/org/apache/cassandra/index/sasi/disk/TokenTree.java
new file mode 100644
index 0000000..c69ce00
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/disk/TokenTree.java
@@ -0,0 +1,523 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.disk;
+
+import java.io.IOException;
+import java.util.*;
+
+import org.apache.cassandra.db.DecoratedKey;
+import org.apache.cassandra.index.sasi.utils.AbstractIterator;
+import org.apache.cassandra.index.sasi.utils.CombinedValue;
+import org.apache.cassandra.index.sasi.utils.MappedBuffer;
+import org.apache.cassandra.index.sasi.utils.RangeIterator;
+import org.apache.cassandra.utils.MergeIterator;
+
+import com.carrotsearch.hppc.LongOpenHashSet;
+import com.carrotsearch.hppc.LongSet;
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Function;
+import com.google.common.collect.Iterators;
+import org.apache.commons.lang3.builder.HashCodeBuilder;
+
+import static org.apache.cassandra.index.sasi.disk.TokenTreeBuilder.EntryType;
+
+// Note: all of the seekable offsets contained in TokenTree should be sizeof(long),
+// even though currently only the lower int portion of them is used, because that makes
+// it possible to switch to an mmap implementation which supports long positions
+// without any on-disk format changes and/or re-indexing, should the need ever arise.
+public class TokenTree
+{
+    private static final int LONG_BYTES = Long.SIZE / 8;
+    private static final int SHORT_BYTES = Short.SIZE / 8;
+
+    private final Descriptor descriptor;
+    private final MappedBuffer file;
+    private final long startPos;
+    private final long treeMinToken;
+    private final long treeMaxToken;
+    private final long tokenCount;
+
+    @VisibleForTesting
+    protected TokenTree(MappedBuffer tokenTree)
+    {
+        this(Descriptor.CURRENT, tokenTree);
+    }
+
+    public TokenTree(Descriptor d, MappedBuffer tokenTree)
+    {
+        descriptor = d;
+        file = tokenTree;
+        startPos = file.position();
+
+        file.position(startPos + TokenTreeBuilder.SHARED_HEADER_BYTES);
+
+        if (!validateMagic())
+            throw new IllegalArgumentException("invalid token tree");
+
+        tokenCount = file.getLong();
+        treeMinToken = file.getLong();
+        treeMaxToken = file.getLong();
+    }
+
+    public long getCount()
+    {
+        return tokenCount;
+    }
+
+    public RangeIterator<Long, Token> iterator(Function<Long, DecoratedKey> keyFetcher)
+    {
+        return new TokenTreeIterator(file.duplicate(), keyFetcher);
+    }
+
+    public OnDiskToken get(final long searchToken, Function<Long, DecoratedKey> keyFetcher)
+    {
+        seekToLeaf(searchToken, file);
+        long leafStart = file.position();
+        short leafSize = file.getShort(leafStart + 1); // skip the info byte
+
+        file.position(leafStart + TokenTreeBuilder.BLOCK_HEADER_BYTES); // skip to tokens
+        short tokenIndex = searchLeaf(searchToken, leafSize);
+
+        file.position(leafStart + TokenTreeBuilder.BLOCK_HEADER_BYTES);
+
+        OnDiskToken token = OnDiskToken.getTokenAt(file, tokenIndex, leafSize, keyFetcher);
+        return token.get().equals(searchToken) ? token : null;
+    }
+
+    private boolean validateMagic()
+    {
+        switch (descriptor.version.toString())
+        {
+            case Descriptor.VERSION_AA:
+                return true;
+            case Descriptor.VERSION_AB:
+                return TokenTreeBuilder.AB_MAGIC == file.getShort();
+            default:
+                return false;
+        }
+    }
+
+    // finds leaf that *could* contain token
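+    // Interior (non-leaf) blocks are laid out as a header (info byte, token count, min/max token)
+    // followed by `tokenCount` separator tokens and then `tokenCount + 1` child block offsets;
+    // the seek arithmetic below assumes exactly that layout.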
+    private void seekToLeaf(long token, MappedBuffer file)
+    {
+        // this loop always seeks forward except for the first iteration
+        // where it may seek back to the root
+        long blockStart = startPos;
+        while (true)
+        {
+            file.position(blockStart);
+
+            byte info = file.get();
+            boolean isLeaf = (info & 1) == 1;
+
+            if (isLeaf)
+            {
+                file.position(blockStart);
+                break;
+            }
+
+            short tokenCount = file.getShort();
+
+            long minToken = file.getLong();
+            long maxToken = file.getLong();
+
+            long seekBase = blockStart + TokenTreeBuilder.BLOCK_HEADER_BYTES;
+            if (minToken > token)
+            {
+                // seek to beginning of child offsets to locate first child
+                file.position(seekBase + tokenCount * LONG_BYTES);
+                blockStart = (startPos + (int) file.getLong());
+            }
+            else if (maxToken < token)
+            {
+                // seek to end of child offsets to locate last child
+                file.position(seekBase + (2 * tokenCount) * LONG_BYTES);
+                blockStart = (startPos + (int) file.getLong());
+            }
+            else
+            {
+                // skip to end of block header/start of interior block tokens
+                file.position(seekBase);
+
+                short offsetIndex = searchBlock(token, tokenCount, file);
+
+                // file pointer is now at beginning of offsets
+                if (offsetIndex == tokenCount)
+                    file.position(file.position() + (offsetIndex * LONG_BYTES));
+                else
+                    file.position(file.position() + ((tokenCount - offsetIndex - 1) + offsetIndex) * LONG_BYTES);
+
+                blockStart = (startPos + (int) file.getLong());
+            }
+        }
+    }
+
+    private short searchBlock(long searchToken, short tokenCount, MappedBuffer file)
+    {
+        short offsetIndex = 0;
+        for (int i = 0; i < tokenCount; i++)
+        {
+            long readToken = file.getLong();
+            if (searchToken < readToken)
+                break;
+
+            offsetIndex++;
+        }
+
+        return offsetIndex;
+    }
+
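+    // Each leaf entry occupies 16 bytes: info short (2) + offsetExtra short (2) + token long (8)
+    // + offsetData int (4), which is why an entry's token lives at bytes 4-11 of the entry
+    // (see TokenInfo.fetchOffsets for how the offset fields are read back).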
+    private short searchLeaf(long searchToken, short tokenCount)
+    {
+        long base = file.position();
+
+        int start = 0;
+        int end = tokenCount;
+        int middle = 0;
+
+        while (start <= end)
+        {
+            middle = start + ((end - start) >> 1);
+
+            // each entry is 16 bytes wide, token is in bytes 4-11
+            long token = file.getLong(base + (middle * (2 * LONG_BYTES) + 4));
+
+            if (token == searchToken)
+                break;
+
+            if (token < searchToken)
+                start = middle + 1;
+            else
+                end = middle - 1;
+        }
+
+        return (short) middle;
+    }
+
+    public class TokenTreeIterator extends RangeIterator<Long, Token>
+    {
+        private final Function<Long, DecoratedKey> keyFetcher;
+        private final MappedBuffer file;
+
+        private long currentLeafStart;
+        private int currentTokenIndex;
+
+        private long leafMinToken;
+        private long leafMaxToken;
+        private short leafSize;
+
+        protected boolean firstIteration = true;
+        private boolean lastLeaf;
+
+        TokenTreeIterator(MappedBuffer file, Function<Long, DecoratedKey> keyFetcher)
+        {
+            super(treeMinToken, treeMaxToken, tokenCount);
+
+            this.file = file;
+            this.keyFetcher = keyFetcher;
+        }
+
+        protected Token computeNext()
+        {
+            maybeFirstIteration();
+
+            if (currentTokenIndex >= leafSize && lastLeaf)
+                return endOfData();
+
+            if (currentTokenIndex < leafSize) // tokens remaining in this leaf
+            {
+                return getTokenAt(currentTokenIndex++);
+            }
+            else // no more tokens remaining in this leaf
+            {
+                assert !lastLeaf;
+
+                seekToNextLeaf();
+                setupBlock();
+                return computeNext();
+            }
+        }
+
+        protected void performSkipTo(Long nextToken)
+        {
+            maybeFirstIteration();
+
+            if (nextToken <= leafMaxToken) // next is in this leaf block
+            {
+                searchLeaf(nextToken);
+            }
+            else // next is in a leaf block that needs to be found
+            {
+                seekToLeaf(nextToken, file);
+                setupBlock();
+                findNearest(nextToken);
+            }
+        }
+
+        private void setupBlock()
+        {
+            currentLeafStart = file.position();
+            currentTokenIndex = 0;
+
+            lastLeaf = (file.get() & (1 << TokenTreeBuilder.LAST_LEAF_SHIFT)) > 0;
+            leafSize = file.getShort();
+
+            leafMinToken = file.getLong();
+            leafMaxToken = file.getLong();
+
+            // seek to end of leaf header/start of data
+            file.position(currentLeafStart + TokenTreeBuilder.BLOCK_HEADER_BYTES);
+        }
+
+        private void findNearest(Long next)
+        {
+            if (next > leafMaxToken && !lastLeaf)
+            {
+                seekToNextLeaf();
+                setupBlock();
+                findNearest(next);
+            }
+            else if (next > leafMinToken)
+                searchLeaf(next);
+        }
+
+        private void searchLeaf(long next)
+        {
+            for (int i = currentTokenIndex; i < leafSize; i++)
+            {
+                if (compareTokenAt(currentTokenIndex, next) >= 0)
+                    break;
+
+                currentTokenIndex++;
+            }
+        }
+
+        private int compareTokenAt(int idx, long toToken)
+        {
+            return Long.compare(file.getLong(getTokenPosition(idx)), toToken);
+        }
+
+        private Token getTokenAt(int idx)
+        {
+            return OnDiskToken.getTokenAt(file, idx, leafSize, keyFetcher);
+        }
+
+        private long getTokenPosition(int idx)
+        {
+            // skip 4 byte entry header to get position pointing directly at the entry's token
+            return OnDiskToken.getEntryPosition(idx, file) + (2 * SHORT_BYTES);
+        }
+
+        private void seekToNextLeaf()
+        {
+            file.position(currentLeafStart + TokenTreeBuilder.BLOCK_BYTES);
+        }
+
+        public void close() throws IOException
+        {
+            // nothing to do here
+        }
+
+        private void maybeFirstIteration()
+        {
+            // seek to the first token only when it is requested for the first time;
+            // this is a highly predictable branch and saves work by not traversing the tree
+            // at creation time, when it is not required at all.
+            if (!firstIteration)
+                return;
+
+            seekToLeaf(treeMinToken, file);
+            setupBlock();
+            firstIteration = false;
+        }
+    }
+
+    public static class OnDiskToken extends Token
+    {
+        private final Set<TokenInfo> info = new HashSet<>(2);
+        private final Set<DecoratedKey> loadedKeys = new TreeSet<>(DecoratedKey.comparator);
+
+        public OnDiskToken(MappedBuffer buffer, long position, short leafSize, Function<Long, DecoratedKey> keyFetcher)
+        {
+            super(buffer.getLong(position + (2 * SHORT_BYTES)));
+            info.add(new TokenInfo(buffer, position, leafSize, keyFetcher));
+        }
+
+        public void merge(CombinedValue<Long> other)
+        {
+            if (!(other instanceof Token))
+                return;
+
+            Token o = (Token) other;
+            if (token != o.token)
+                throw new IllegalArgumentException(String.format("%s != %s", token, o.token));
+
+            if (o instanceof OnDiskToken)
+            {
+                info.addAll(((OnDiskToken) other).info);
+            }
+            else
+            {
+                Iterators.addAll(loadedKeys, o.iterator());
+            }
+        }
+
+        public Iterator<DecoratedKey> iterator()
+        {
+            List<Iterator<DecoratedKey>> keys = new ArrayList<>(info.size());
+
+            for (TokenInfo i : info)
+                keys.add(i.iterator());
+
+            if (!loadedKeys.isEmpty())
+                keys.add(loadedKeys.iterator());
+
+            return MergeIterator.get(keys, DecoratedKey.comparator, new MergeIterator.Reducer<DecoratedKey, DecoratedKey>()
+            {
+                DecoratedKey reduced = null;
+
+                public boolean trivialReduceIsTrivial()
+                {
+                    return true;
+                }
+
+                public void reduce(int idx, DecoratedKey current)
+                {
+                    reduced = current;
+                }
+
+                protected DecoratedKey getReduced()
+                {
+                    return reduced;
+                }
+            });
+        }
+
+        public LongSet getOffsets()
+        {
+            LongSet offsets = new LongOpenHashSet(4);
+            for (TokenInfo i : info)
+            {
+                for (long offset : i.fetchOffsets())
+                    offsets.add(offset);
+            }
+
+            return offsets;
+        }
+
+        public static OnDiskToken getTokenAt(MappedBuffer buffer, int idx, short leafSize, Function<Long, DecoratedKey> keyFetcher)
+        {
+            return new OnDiskToken(buffer, getEntryPosition(idx, buffer), leafSize, keyFetcher);
+        }
+
+        private static long getEntryPosition(int idx, MappedBuffer file)
+        {
+            // info (4 bytes) + token (8 bytes) + offset (4 bytes) = 16 bytes
+            return file.position() + (idx * (2 * LONG_BYTES));
+        }
+    }
+
+    private static class TokenInfo
+    {
+        private final MappedBuffer buffer;
+        private final Function<Long, DecoratedKey> keyFetcher;
+
+        private final long position;
+        private final short leafSize;
+
+        public TokenInfo(MappedBuffer buffer, long position, short leafSize, Function<Long, DecoratedKey> keyFetcher)
+        {
+            this.keyFetcher = keyFetcher;
+            this.buffer = buffer;
+            this.position = position;
+            this.leafSize = leafSize;
+        }
+
+        public Iterator<DecoratedKey> iterator()
+        {
+            return new KeyIterator(keyFetcher, fetchOffsets());
+        }
+
+        public int hashCode()
+        {
+            return new HashCodeBuilder().append(keyFetcher).append(position).append(leafSize).build();
+        }
+
+        public boolean equals(Object other)
+        {
+            if (!(other instanceof TokenInfo))
+                return false;
+
+            TokenInfo o = (TokenInfo) other;
+            return keyFetcher == o.keyFetcher && position == o.position;
+        }
+
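+        // Reassembles the 48-bit offsets according to the entry type (a short worked example,
+        // mirroring the cases below):
+        //   SIMPLE   - offsetData alone is the offset.
+        //   FACTORED - offset = (offsetData << 16) + offsetExtra; e.g. offsetData = 3 and
+        //              offsetExtra = 16 give 3 * 65536 + 16 = 196624.
+        //   PACKED   - two small offsets, one in offsetExtra and one in offsetData.
+        //   OVERFLOW - offsetExtra is the number of offsets in the leaf's overflow trailer and
+        //              offsetData is the index of the first one.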
+        private long[] fetchOffsets()
+        {
+            short info = buffer.getShort(position);
+            // offset extra is unsigned short (right-most 16 bits of 48 bits allowed for an offset)
+            int offsetExtra = buffer.getShort(position + SHORT_BYTES) & 0xFFFF;
+            // offsetData is the left-most (32-bit) base of the actual offset in the index file
+            int offsetData = buffer.getInt(position + (2 * SHORT_BYTES) + LONG_BYTES);
+
+            EntryType type = EntryType.of(info & TokenTreeBuilder.ENTRY_TYPE_MASK);
+
+            switch (type)
+            {
+                case SIMPLE:
+                    return new long[] { offsetData };
+
+                case OVERFLOW:
+                    long[] offsets = new long[offsetExtra]; // offsetExtra holds the number of overflow offsets
+                    long offsetPos = (buffer.position() + (2 * (leafSize * LONG_BYTES)) + (offsetData * LONG_BYTES));
+
+                    for (int i = 0; i < offsetExtra; i++)
+                        offsets[i] = buffer.getLong(offsetPos + (i * LONG_BYTES));
+
+                    return offsets;
+
+                case FACTORED:
+                    return new long[] { (((long) offsetData) << Short.SIZE) + offsetExtra };
+
+                case PACKED:
+                    return new long[] { offsetExtra, offsetData };
+
+                default:
+                    throw new IllegalStateException("Unknown entry type: " + type);
+            }
+        }
+    }
+
+    private static class KeyIterator extends AbstractIterator<DecoratedKey>
+    {
+        private final Function<Long, DecoratedKey> keyFetcher;
+        private final long[] offsets;
+        private int index = 0;
+
+        public KeyIterator(Function<Long, DecoratedKey> keyFetcher, long[] offsets)
+        {
+            this.keyFetcher = keyFetcher;
+            this.offsets = offsets;
+        }
+
+        public DecoratedKey computeNext()
+        {
+            return index < offsets.length ? keyFetcher.apply(offsets[index++]) : endOfData();
+        }
+    }
+}
\ No newline at end of file
diff --git a/src/java/org/apache/cassandra/index/sasi/disk/TokenTreeBuilder.java b/src/java/org/apache/cassandra/index/sasi/disk/TokenTreeBuilder.java
new file mode 100644
index 0000000..2210964
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/disk/TokenTreeBuilder.java
@@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.disk;
+
+import java.io.IOException;
+import java.util.*;
+
+import org.apache.cassandra.io.util.DataOutputPlus;
+import org.apache.cassandra.utils.Pair;
+
+import com.carrotsearch.hppc.LongSet;
+
+public interface TokenTreeBuilder extends Iterable<Pair<Long, LongSet>>
+{
+    int BLOCK_BYTES = 4096;
+    int BLOCK_HEADER_BYTES = 64;
+    int OVERFLOW_TRAILER_BYTES = 64;
+    int OVERFLOW_TRAILER_CAPACITY = OVERFLOW_TRAILER_BYTES / 8;
+    int TOKENS_PER_BLOCK = (BLOCK_BYTES - BLOCK_HEADER_BYTES - OVERFLOW_TRAILER_BYTES) / 16;
+    long MAX_OFFSET = (1L << 47) - 1; // 48 bits for (signed) offset
+    byte LAST_LEAF_SHIFT = 1;
+    byte SHARED_HEADER_BYTES = 19;
+    byte ENTRY_TYPE_MASK = 0x03;
+    short AB_MAGIC = 0x5A51;
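+    // With the constants above a block holds (4096 - 64 - 64) / 16 = 248 token entries,
+    // and every offset must fit in 48 bits (MAX_OFFSET = 2^47 - 1).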
+
+    // note: ordinal positions are used here, do not change order
+    enum EntryType
+    {
+        SIMPLE, FACTORED, PACKED, OVERFLOW;
+
+        public static EntryType of(int ordinal)
+        {
+            if (ordinal == SIMPLE.ordinal())
+                return SIMPLE;
+
+            if (ordinal == FACTORED.ordinal())
+                return FACTORED;
+
+            if (ordinal == PACKED.ordinal())
+                return PACKED;
+
+            if (ordinal == OVERFLOW.ordinal())
+                return OVERFLOW;
+
+            throw new IllegalArgumentException("Unknown ordinal: " + ordinal);
+        }
+    }
+
+    void add(Long token, long keyPosition);
+    void add(SortedMap<Long, LongSet> data);
+    void add(Iterator<Pair<Long, LongSet>> data);
+    void add(TokenTreeBuilder ttb);
+
+    boolean isEmpty();
+    long getTokenCount();
+
+    TokenTreeBuilder finish();
+
+    int serializedSize();
+    void write(DataOutputPlus out) throws IOException;
+}
\ No newline at end of file
diff --git a/src/java/org/apache/cassandra/index/sasi/exceptions/TimeQuotaExceededException.java b/src/java/org/apache/cassandra/index/sasi/exceptions/TimeQuotaExceededException.java
new file mode 100644
index 0000000..af577dc
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/exceptions/TimeQuotaExceededException.java
@@ -0,0 +1,21 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.exceptions;
+
+public class TimeQuotaExceededException extends RuntimeException
+{}
diff --git a/src/java/org/apache/cassandra/index/sasi/memory/IndexMemtable.java b/src/java/org/apache/cassandra/index/sasi/memory/IndexMemtable.java
new file mode 100644
index 0000000..e55a806
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/memory/IndexMemtable.java
@@ -0,0 +1,72 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.memory;
+
+import java.nio.ByteBuffer;
+
+import org.apache.cassandra.db.DecoratedKey;
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.index.sasi.conf.ColumnIndex;
+import org.apache.cassandra.index.sasi.disk.Token;
+import org.apache.cassandra.index.sasi.plan.Expression;
+import org.apache.cassandra.index.sasi.utils.RangeIterator;
+import org.apache.cassandra.index.sasi.utils.TypeUtil;
+import org.apache.cassandra.utils.FBUtilities;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class IndexMemtable
+{
+    private static final Logger logger = LoggerFactory.getLogger(IndexMemtable.class);
+
+    private final MemIndex index;
+
+    public IndexMemtable(ColumnIndex columnIndex)
+    {
+        this.index = MemIndex.forColumn(columnIndex.keyValidator(), columnIndex);
+    }
+
+    public long index(DecoratedKey key, ByteBuffer value)
+    {
+        if (value == null || value.remaining() == 0)
+            return 0;
+
+        AbstractType<?> validator = index.columnIndex.getValidator();
+        if (!TypeUtil.isValid(value, validator))
+        {
+            int size = value.remaining();
+            if ((value = TypeUtil.tryUpcast(value, validator)) == null)
+            {
+                logger.error("Can't add column {} to index for key: {}, value size {}, validator: {}.",
+                             index.columnIndex.getColumnName(),
+                             index.columnIndex.keyValidator().getString(key.getKey()),
+                             FBUtilities.prettyPrintMemory(size),
+                             validator);
+                return 0;
+            }
+        }
+
+        return index.add(key, value);
+    }
+
+    public RangeIterator<Long, Token> search(Expression expression)
+    {
+        return index == null ? null : index.search(expression);
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/memory/KeyRangeIterator.java b/src/java/org/apache/cassandra/index/sasi/memory/KeyRangeIterator.java
new file mode 100644
index 0000000..a2f2c0e
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/memory/KeyRangeIterator.java
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.memory;
+
+import java.io.IOException;
+import java.util.Iterator;
+import java.util.SortedSet;
+import java.util.TreeSet;
+import java.util.concurrent.ConcurrentSkipListSet;
+
+import org.apache.cassandra.db.DecoratedKey;
+import org.apache.cassandra.index.sasi.disk.Token;
+import org.apache.cassandra.index.sasi.utils.AbstractIterator;
+import org.apache.cassandra.index.sasi.utils.CombinedValue;
+import org.apache.cassandra.index.sasi.utils.RangeIterator;
+
+import com.carrotsearch.hppc.LongOpenHashSet;
+import com.carrotsearch.hppc.LongSet;
+import com.google.common.collect.PeekingIterator;
+
+public class KeyRangeIterator extends RangeIterator<Long, Token>
+{
+    private final DKIterator iterator;
+
+    public KeyRangeIterator(ConcurrentSkipListSet<DecoratedKey> keys)
+    {
+        super((Long) keys.first().getToken().getTokenValue(), (Long) keys.last().getToken().getTokenValue(), keys.size());
+        this.iterator = new DKIterator(keys.iterator());
+    }
+
+    protected Token computeNext()
+    {
+        return iterator.hasNext() ? new DKToken(iterator.next()) : endOfData();
+    }
+
+    protected void performSkipTo(Long nextToken)
+    {
+        while (iterator.hasNext())
+        {
+            DecoratedKey key = iterator.peek();
+            if (Long.compare((long) key.getToken().getTokenValue(), nextToken) >= 0)
+                break;
+
+            // consume smaller key
+            iterator.next();
+        }
+    }
+
+    public void close() throws IOException
+    {}
+
+    private static class DKIterator extends AbstractIterator<DecoratedKey> implements PeekingIterator<DecoratedKey>
+    {
+        private final Iterator<DecoratedKey> keys;
+
+        public DKIterator(Iterator<DecoratedKey> keys)
+        {
+            this.keys = keys;
+        }
+
+        protected DecoratedKey computeNext()
+        {
+            return keys.hasNext() ? keys.next() : endOfData();
+        }
+    }
+
+    private static class DKToken extends Token
+    {
+        private final SortedSet<DecoratedKey> keys;
+
+        public DKToken(final DecoratedKey key)
+        {
+            super((long) key.getToken().getTokenValue());
+
+            keys = new TreeSet<DecoratedKey>(DecoratedKey.comparator)
+            {{
+                add(key);
+            }};
+        }
+
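+        // memtable-resident keys have no on-disk offsets, so (presumably as a stand-in)
+        // the token values themselves are returned here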
+        public LongSet getOffsets()
+        {
+            LongSet offsets = new LongOpenHashSet(4);
+            for (DecoratedKey key : keys)
+                offsets.add((long) key.getToken().getTokenValue());
+
+            return offsets;
+        }
+
+        public void merge(CombinedValue<Long> other)
+        {
+            if (!(other instanceof Token))
+                return;
+
+            Token o = (Token) other;
+            assert o.get().equals(token);
+
+            if (o instanceof DKToken)
+            {
+                keys.addAll(((DKToken) o).keys);
+            }
+            else
+            {
+                for (DecoratedKey key : o)
+                    keys.add(key);
+            }
+        }
+
+        public Iterator<DecoratedKey> iterator()
+        {
+            return keys.iterator();
+        }
+    }
+}
\ No newline at end of file
diff --git a/src/java/org/apache/cassandra/index/sasi/memory/MemIndex.java b/src/java/org/apache/cassandra/index/sasi/memory/MemIndex.java
new file mode 100644
index 0000000..22d6c9e
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/memory/MemIndex.java
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.memory;
+
+import java.nio.ByteBuffer;
+
+import org.apache.cassandra.db.DecoratedKey;
+import org.apache.cassandra.index.sasi.conf.ColumnIndex;
+import org.apache.cassandra.index.sasi.disk.Token;
+import org.apache.cassandra.index.sasi.plan.Expression;
+import org.apache.cassandra.index.sasi.utils.RangeIterator;
+import org.apache.cassandra.db.marshal.AbstractType;
+
+import org.github.jamm.MemoryMeter;
+
+public abstract class MemIndex
+{
+    protected final AbstractType<?> keyValidator;
+    protected final ColumnIndex columnIndex;
+
+    protected MemIndex(AbstractType<?> keyValidator, ColumnIndex columnIndex)
+    {
+        this.keyValidator = keyValidator;
+        this.columnIndex = columnIndex;
+    }
+
+    public abstract long add(DecoratedKey key, ByteBuffer value);
+    public abstract RangeIterator<Long, Token> search(Expression expression);
+
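+    // Literal (text-like) columns are indexed with a trie of analyzed terms,
+    // everything else with a skip-list keyed by the raw value.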
+    public static MemIndex forColumn(AbstractType<?> keyValidator, ColumnIndex columnIndex)
+    {
+        return columnIndex.isLiteral()
+                ? new TrieMemIndex(keyValidator, columnIndex)
+                : new SkipListMemIndex(keyValidator, columnIndex);
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/memory/SkipListMemIndex.java b/src/java/org/apache/cassandra/index/sasi/memory/SkipListMemIndex.java
new file mode 100644
index 0000000..69b57d0
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/memory/SkipListMemIndex.java
@@ -0,0 +1,97 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.memory;
+
+import java.nio.ByteBuffer;
+import java.util.*;
+import java.util.concurrent.ConcurrentSkipListMap;
+import java.util.concurrent.ConcurrentSkipListSet;
+
+import org.apache.cassandra.db.DecoratedKey;
+import org.apache.cassandra.index.sasi.conf.ColumnIndex;
+import org.apache.cassandra.index.sasi.disk.Token;
+import org.apache.cassandra.index.sasi.plan.Expression;
+import org.apache.cassandra.index.sasi.utils.RangeUnionIterator;
+import org.apache.cassandra.index.sasi.utils.RangeIterator;
+import org.apache.cassandra.db.marshal.AbstractType;
+
+public class SkipListMemIndex extends MemIndex
+{
+    public static final int CSLM_OVERHEAD = 128; // average overhead of CSLM
+
+    private final ConcurrentSkipListMap<ByteBuffer, ConcurrentSkipListSet<DecoratedKey>> index;
+
+    public SkipListMemIndex(AbstractType<?> keyValidator, ColumnIndex columnIndex)
+    {
+        super(keyValidator, columnIndex);
+        index = new ConcurrentSkipListMap<>(columnIndex.getValidator());
+    }
+
+    public long add(DecoratedKey key, ByteBuffer value)
+    {
+        long overhead = CSLM_OVERHEAD; // DKs are shared
+        ConcurrentSkipListSet<DecoratedKey> keys = index.get(value);
+
+        if (keys == null)
+        {
+            ConcurrentSkipListSet<DecoratedKey> newKeys = new ConcurrentSkipListSet<>(DecoratedKey.comparator);
+            keys = index.putIfAbsent(value, newKeys);
+            if (keys == null)
+            {
+                overhead += CSLM_OVERHEAD + value.remaining();
+                keys = newKeys;
+            }
+        }
+
+        keys.add(key);
+
+        return overhead;
+    }
+
+    public RangeIterator<Long, Token> search(Expression expression)
+    {
+        ByteBuffer min = expression.lower == null ? null : expression.lower.value;
+        ByteBuffer max = expression.upper == null ? null : expression.upper.value;
+
+        SortedMap<ByteBuffer, ConcurrentSkipListSet<DecoratedKey>> search;
+
+        if (min == null && max == null)
+        {
+            throw new IllegalArgumentException();
+        }
+        if (min != null && max != null)
+        {
+            search = index.subMap(min, expression.lower.inclusive, max, expression.upper.inclusive);
+        }
+        else if (min == null)
+        {
+            search = index.headMap(max, expression.upper.inclusive);
+        }
+        else
+        {
+            search = index.tailMap(min, expression.lower.inclusive);
+        }
+
+        RangeUnionIterator.Builder<Long, Token> builder = RangeUnionIterator.builder();
+        search.values().stream()
+                       .filter(keys -> !keys.isEmpty())
+                       .forEach(keys -> builder.add(new KeyRangeIterator(keys)));
+
+        return builder.build();
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/memory/TrieMemIndex.java b/src/java/org/apache/cassandra/index/sasi/memory/TrieMemIndex.java
new file mode 100644
index 0000000..ca60ac5
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/memory/TrieMemIndex.java
@@ -0,0 +1,283 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.memory;
+
+import java.nio.ByteBuffer;
+import java.util.Collections;
+import java.util.List;
+import java.util.concurrent.ConcurrentSkipListSet;
+
+import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.db.DecoratedKey;
+import org.apache.cassandra.index.sasi.conf.ColumnIndex;
+import org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder;
+import org.apache.cassandra.index.sasi.disk.Token;
+import org.apache.cassandra.index.sasi.plan.Expression;
+import org.apache.cassandra.index.sasi.plan.Expression.Op;
+import org.apache.cassandra.index.sasi.analyzer.AbstractAnalyzer;
+import org.apache.cassandra.index.sasi.utils.RangeUnionIterator;
+import org.apache.cassandra.index.sasi.utils.RangeIterator;
+import org.apache.cassandra.db.marshal.AbstractType;
+
+import com.googlecode.concurrenttrees.radix.ConcurrentRadixTree;
+import com.googlecode.concurrenttrees.suffix.ConcurrentSuffixTree;
+import com.googlecode.concurrenttrees.radix.node.concrete.SmartArrayBasedNodeFactory;
+import com.googlecode.concurrenttrees.radix.node.Node;
+import org.apache.cassandra.utils.FBUtilities;
+
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import static org.apache.cassandra.index.sasi.memory.SkipListMemIndex.CSLM_OVERHEAD;
+
+public class TrieMemIndex extends MemIndex
+{
+    private static final Logger logger = LoggerFactory.getLogger(TrieMemIndex.class);
+
+    private final ConcurrentTrie index;
+
+    public TrieMemIndex(AbstractType<?> keyValidator, ColumnIndex columnIndex)
+    {
+        super(keyValidator, columnIndex);
+
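+        // CONTAINS mode needs a suffix tree so substring/suffix queries can be answered,
+        // whereas PREFIX mode only needs a plain radix trie keyed by the whole term.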
+        switch (columnIndex.getMode().mode)
+        {
+            case CONTAINS:
+                index = new ConcurrentSuffixTrie(columnIndex.getDefinition());
+                break;
+
+            case PREFIX:
+                index = new ConcurrentPrefixTrie(columnIndex.getDefinition());
+                break;
+
+            default:
+                throw new IllegalStateException("Unsupported mode: " + columnIndex.getMode().mode);
+        }
+    }
+
+    public long add(DecoratedKey key, ByteBuffer value)
+    {
+        AbstractAnalyzer analyzer = columnIndex.getAnalyzer();
+        analyzer.reset(value.duplicate());
+
+        long size = 0;
+        while (analyzer.hasNext())
+        {
+            ByteBuffer term = analyzer.next();
+
+            if (term.remaining() >= OnDiskIndexBuilder.MAX_TERM_SIZE)
+            {
+                logger.info("Can't add term of column {} to index for key: {}, term size {}, max allowed size {}, use analyzed = true (if not yet set) for that column.",
+                            columnIndex.getColumnName(),
+                            keyValidator.getString(key.getKey()),
+                            FBUtilities.prettyPrintMemory(term.remaining()),
+                            FBUtilities.prettyPrintMemory(OnDiskIndexBuilder.MAX_TERM_SIZE));
+                continue;
+            }
+
+            size += index.add(columnIndex.getValidator().getString(term), key);
+        }
+
+        return size;
+    }
+
+    public RangeIterator<Long, Token> search(Expression expression)
+    {
+        return index.search(expression);
+    }
+
+    private static abstract class ConcurrentTrie
+    {
+        public static final SizeEstimatingNodeFactory NODE_FACTORY = new SizeEstimatingNodeFactory();
+
+        protected final ColumnDefinition definition;
+
+        public ConcurrentTrie(ColumnDefinition column)
+        {
+            definition = column;
+        }
+
+        public long add(String value, DecoratedKey key)
+        {
+            long overhead = CSLM_OVERHEAD;
+            ConcurrentSkipListSet<DecoratedKey> keys = get(value);
+            if (keys == null)
+            {
+                ConcurrentSkipListSet<DecoratedKey> newKeys = new ConcurrentSkipListSet<>(DecoratedKey.comparator);
+                keys = putIfAbsent(value, newKeys);
+                if (keys == null)
+                {
+                    overhead += CSLM_OVERHEAD + value.length();
+                    keys = newKeys;
+                }
+            }
+
+            keys.add(key);
+
+            // get and reset new memory size allocated by current thread
+            overhead += NODE_FACTORY.currentUpdateSize();
+            NODE_FACTORY.reset();
+
+            return overhead;
+        }
+
+        public RangeIterator<Long, Token> search(Expression expression)
+        {
+            ByteBuffer prefix = expression.lower == null ? null : expression.lower.value;
+
+            Iterable<ConcurrentSkipListSet<DecoratedKey>> search = search(expression.getOp(), definition.cellValueType().getString(prefix));
+
+            RangeUnionIterator.Builder<Long, Token> builder = RangeUnionIterator.builder();
+            for (ConcurrentSkipListSet<DecoratedKey> keys : search)
+            {
+                if (!keys.isEmpty())
+                    builder.add(new KeyRangeIterator(keys));
+            }
+
+            return builder.build();
+        }
+
+        protected abstract ConcurrentSkipListSet<DecoratedKey> get(String value);
+        protected abstract Iterable<ConcurrentSkipListSet<DecoratedKey>> search(Op operator, String value);
+        protected abstract ConcurrentSkipListSet<DecoratedKey> putIfAbsent(String value, ConcurrentSkipListSet<DecoratedKey> key);
+    }
+
+    protected static class ConcurrentPrefixTrie extends ConcurrentTrie
+    {
+        private final ConcurrentRadixTree<ConcurrentSkipListSet<DecoratedKey>> trie;
+
+        private ConcurrentPrefixTrie(ColumnDefinition column)
+        {
+            super(column);
+            trie = new ConcurrentRadixTree<>(NODE_FACTORY);
+        }
+
+        public ConcurrentSkipListSet<DecoratedKey> get(String value)
+        {
+            return trie.getValueForExactKey(value);
+        }
+
+        public ConcurrentSkipListSet<DecoratedKey> putIfAbsent(String value, ConcurrentSkipListSet<DecoratedKey> newKeys)
+        {
+            return trie.putIfAbsent(value, newKeys);
+        }
+
+        public Iterable<ConcurrentSkipListSet<DecoratedKey>> search(Op operator, String value)
+        {
+            switch (operator)
+            {
+                case EQ:
+                case MATCH:
+                    ConcurrentSkipListSet<DecoratedKey> keys = trie.getValueForExactKey(value);
+                    return keys == null ? Collections.emptyList() : Collections.singletonList(keys);
+
+                case PREFIX:
+                    return trie.getValuesForKeysStartingWith(value);
+
+                default:
+                    throw new UnsupportedOperationException(String.format("operation %s is not supported.", operator));
+            }
+        }
+    }
+
+    protected static class ConcurrentSuffixTrie extends ConcurrentTrie
+    {
+        private final ConcurrentSuffixTree<ConcurrentSkipListSet<DecoratedKey>> trie;
+
+        private ConcurrentSuffixTrie(ColumnDefinition column)
+        {
+            super(column);
+            trie = new ConcurrentSuffixTree<>(NODE_FACTORY);
+        }
+
+        public ConcurrentSkipListSet<DecoratedKey> get(String value)
+        {
+            return trie.getValueForExactKey(value);
+        }
+
+        public ConcurrentSkipListSet<DecoratedKey> putIfAbsent(String value, ConcurrentSkipListSet<DecoratedKey> newKeys)
+        {
+            return trie.putIfAbsent(value, newKeys);
+        }
+
+        public Iterable<ConcurrentSkipListSet<DecoratedKey>> search(Op operator, String value)
+        {
+            switch (operator)
+            {
+                case EQ:
+                case MATCH:
+                    ConcurrentSkipListSet<DecoratedKey> keys = trie.getValueForExactKey(value);
+                    return keys == null ? Collections.emptyList() : Collections.singletonList(keys);
+
+                case SUFFIX:
+                    return trie.getValuesForKeysEndingWith(value);
+
+                case PREFIX:
+                case CONTAINS:
+                    return trie.getValuesForKeysContaining(value);
+
+                default:
+                    throw new UnsupportedOperationException(String.format("operation %s is not supported.", operator));
+            }
+        }
+    }
+
+    // This relies on the fact that all of the tree updates are done under an exclusive write lock;
+    // the method may overestimate in certain circumstances, e.g. when nodes are replaced in place,
+    // but overestimating is still better than underestimating since it gives more breathing room to other memory users.
+    private static class SizeEstimatingNodeFactory extends SmartArrayBasedNodeFactory
+    {
+        private final ThreadLocal<Long> updateSize = ThreadLocal.withInitial(() -> 0L);
+
+        public Node createNode(CharSequence edgeCharacters, Object value, List<Node> childNodes, boolean isRoot)
+        {
+            Node node = super.createNode(edgeCharacters, value, childNodes, isRoot);
+            updateSize.set(updateSize.get() + measure(node));
+            return node;
+        }
+
+        public long currentUpdateSize()
+        {
+            return updateSize.get();
+        }
+
+        public void reset()
+        {
+            updateSize.set(0L);
+        }
+
+        private long measure(Node node)
+        {
+            // node with max overhead is CharArrayNodeLeafWithValue = 24B
+            long overhead = 24;
+
+            // array of chars (2 bytes each) + CharSequence overhead
+            overhead += 24 + node.getIncomingEdge().length() * 2;
+
+            if (node.getOutgoingEdges() != null)
+            {
+                // 16 bytes for AtomicReferenceArray
+                overhead += 16;
+                overhead += 24 * node.getOutgoingEdges().size();
+            }
+
+            return overhead;
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/plan/Expression.java b/src/java/org/apache/cassandra/index/sasi/plan/Expression.java
new file mode 100644
index 0000000..cc156ee
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/plan/Expression.java
@@ -0,0 +1,402 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.plan;
+
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Objects;
+
+import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.cql3.Operator;
+import org.apache.cassandra.index.sasi.analyzer.AbstractAnalyzer;
+import org.apache.cassandra.index.sasi.conf.ColumnIndex;
+import org.apache.cassandra.index.sasi.disk.OnDiskIndex;
+import org.apache.cassandra.index.sasi.utils.TypeUtil;
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.db.marshal.UTF8Type;
+import org.apache.cassandra.utils.ByteBufferUtil;
+import org.apache.cassandra.utils.FBUtilities;
+
+import org.apache.commons.lang3.builder.HashCodeBuilder;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.collect.Iterators;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+public class Expression
+{
+    private static final Logger logger = LoggerFactory.getLogger(Expression.class);
+
+    public enum Op
+    {
+        EQ, MATCH, PREFIX, SUFFIX, CONTAINS, NOT_EQ, RANGE;
+
+        public static Op valueOf(Operator operator)
+        {
+            switch (operator)
+            {
+                case EQ:
+                    return EQ;
+
+                case NEQ:
+                    return NOT_EQ;
+
+                case LT:
+                case GT:
+                case LTE:
+                case GTE:
+                    return RANGE;
+
+                case LIKE_PREFIX:
+                    return PREFIX;
+
+                case LIKE_SUFFIX:
+                    return SUFFIX;
+
+                case LIKE_CONTAINS:
+                    return CONTAINS;
+
+                case LIKE_MATCHES:
+                    return MATCH;
+
+                default:
+                    throw new IllegalArgumentException("unknown operator: " + operator);
+            }
+        }
+    }
+
+    private final QueryController controller;
+
+    public final AbstractAnalyzer analyzer;
+
+    public final ColumnIndex index;
+    public final AbstractType<?> validator;
+    public final boolean isLiteral;
+
+    @VisibleForTesting
+    protected Op operation;
+
+    public Bound lower, upper;
+    public List<ByteBuffer> exclusions = new ArrayList<>();
+
+    public Expression(Expression other)
+    {
+        this(other.controller, other.index);
+        operation = other.operation;
+    }
+
+    public Expression(QueryController controller, ColumnIndex columnIndex)
+    {
+        this.controller = controller;
+        this.index = columnIndex;
+        this.analyzer = columnIndex.getAnalyzer();
+        this.validator = columnIndex.getValidator();
+        this.isLiteral = columnIndex.isLiteral();
+    }
+
+    @VisibleForTesting
+    public Expression(String name, AbstractType<?> validator)
+    {
+        this(null, new ColumnIndex(UTF8Type.instance, ColumnDefinition.regularDef("sasi", "internal", name, validator), null));
+    }
+
+    public Expression setLower(Bound newLower)
+    {
+        lower = newLower == null ? null : new Bound(newLower.value, newLower.inclusive);
+        return this;
+    }
+
+    public Expression setUpper(Bound newUpper)
+    {
+        upper = newUpper == null ? null : new Bound(newUpper.value, newUpper.inclusive);
+        return this;
+    }
+
+    public Expression setOp(Op op)
+    {
+        this.operation = op;
+        return this;
+    }
+
+    public Expression add(Operator op, ByteBuffer value)
+    {
+        boolean lowerInclusive = false, upperInclusive = false;
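+        // note: LTE and GTE intentionally fall through to LT and GT below after
+        // flipping the corresponding inclusive flag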
+        switch (op)
+        {
+            case LIKE_PREFIX:
+            case LIKE_SUFFIX:
+            case LIKE_CONTAINS:
+            case LIKE_MATCHES:
+            case EQ:
+                lower = new Bound(value, true);
+                upper = lower;
+                operation = Op.valueOf(op);
+                break;
+
+            case NEQ:
+                // index expressions are priority sorted and NOT_EQ has the lowest priority,
+                // which means that for RANGE or EQ the operation type is always
+                // going to be set before we get here.
+                if (operation == null)
+                {
+                    operation = Op.NOT_EQ;
+                    lower = new Bound(value, true);
+                    upper = lower;
+                }
+                else
+                    exclusions.add(value);
+                break;
+
+            case LTE:
+                upperInclusive = true;
+            case LT:
+                operation = Op.RANGE;
+                upper = new Bound(value, upperInclusive);
+                break;
+
+            case GTE:
+                lowerInclusive = true;
+            case GT:
+                operation = Op.RANGE;
+                lower = new Bound(value, lowerInclusive);
+                break;
+        }
+
+        return this;
+    }
+
+    public Expression addExclusion(ByteBuffer value)
+    {
+        exclusions.add(value);
+        return this;
+    }
+
+    public boolean isSatisfiedBy(ByteBuffer value)
+    {
+        if (!TypeUtil.isValid(value, validator))
+        {
+            int size = value.remaining();
+            if ((value = TypeUtil.tryUpcast(value, validator)) == null)
+            {
+                logger.error("Can't cast value for {} to size accepted by {}, value size is {}.",
+                             index.getColumnName(),
+                             validator,
+                             FBUtilities.prettyPrintMemory(size));
+                return false;
+            }
+        }
+
+        if (lower != null)
+        {
+            // suffix check
+            if (isLiteral)
+            {
+                if (!validateStringValue(value, lower.value))
+                    return false;
+            }
+            else
+            {
+                // range or (not-)equals - (mainly) for numeric values
+                int cmp = validator.compare(lower.value, value);
+
+                // in case of (NOT_)EQ lower == upper
+                if (operation == Op.EQ || operation == Op.NOT_EQ)
+                    return cmp == 0;
+
+                if (cmp > 0 || (cmp == 0 && !lower.inclusive))
+                    return false;
+            }
+        }
+
+        if (upper != null && lower != upper)
+        {
+            // string (prefix or suffix) check
+            if (isLiteral)
+            {
+                if (!validateStringValue(value, upper.value))
+                    return false;
+            }
+            else
+            {
+                // range - mainly for numeric values
+                int cmp = validator.compare(upper.value, value);
+                if (cmp < 0 || (cmp == 0 && !upper.inclusive))
+                    return false;
+            }
+        }
+
+        // as a last step let's check exclusions for the given field,
+        // this covers EQ/RANGE with exclusions.
+        for (ByteBuffer term : exclusions)
+        {
+            if (isLiteral && validateStringValue(value, term))
+                return false;
+            else if (validator.compare(term, value) == 0)
+                return false;
+        }
+
+        return true;
+    }
+
+    private boolean validateStringValue(ByteBuffer columnValue, ByteBuffer requestedValue)
+    {
+        analyzer.reset(columnValue.duplicate());
+        while (analyzer.hasNext())
+        {
+            ByteBuffer term = analyzer.next();
+
+            boolean isMatch = false;
+            switch (operation)
+            {
+                case EQ:
+                case MATCH:
+                // Operation.isSatisfiedBy handles the final decision for !=;
+                // here we only need to check whether the term matches the value
+                case NOT_EQ:
+                    isMatch = validator.compare(term, requestedValue) == 0;
+                    break;
+
+                case PREFIX:
+                    isMatch = ByteBufferUtil.startsWith(term, requestedValue);
+                    break;
+
+                case SUFFIX:
+                    isMatch = ByteBufferUtil.endsWith(term, requestedValue);
+                    break;
+
+                case CONTAINS:
+                    isMatch = ByteBufferUtil.contains(term, requestedValue);
+                    break;
+            }
+
+            if (isMatch)
+                return true;
+        }
+
+        return false;
+    }
+
+    public Op getOp()
+    {
+        return operation;
+    }
+
+    public void checkpoint()
+    {
+        if (controller == null)
+            return;
+
+        controller.checkpoint();
+    }
+
+    public boolean hasLower()
+    {
+        return lower != null;
+    }
+
+    public boolean hasUpper()
+    {
+        return upper != null;
+    }
+
+    public boolean isLowerSatisfiedBy(OnDiskIndex.DataTerm term)
+    {
+        if (!hasLower())
+            return true;
+
+        int cmp = term.compareTo(validator, lower.value, false);
+        return cmp > 0 || cmp == 0 && lower.inclusive;
+    }
+
+    public boolean isUpperSatisfiedBy(OnDiskIndex.DataTerm term)
+    {
+        if (!hasUpper())
+            return true;
+
+        int cmp = term.compareTo(validator, upper.value, false);
+        return cmp < 0 || cmp == 0 && upper.inclusive;
+    }
+
+    public boolean isIndexed()
+    {
+        return index.isIndexed();
+    }
+
+    public String toString()
+    {
+        return String.format("Expression{name: %s, op: %s, lower: (%s, %s), upper: (%s, %s), exclusions: %s}",
+                             index.getColumnName(),
+                             operation,
+                             lower == null ? "null" : validator.getString(lower.value),
+                             lower != null && lower.inclusive,
+                             upper == null ? "null" : validator.getString(upper.value),
+                             upper != null && upper.inclusive,
+                             Iterators.toString(Iterators.transform(exclusions.iterator(), validator::getString)));
+    }
+
+    public int hashCode()
+    {
+        return new HashCodeBuilder().append(index.getColumnName())
+                                    .append(operation)
+                                    .append(validator)
+                                    .append(lower).append(upper)
+                                    .append(exclusions).build();
+    }
+
+    public boolean equals(Object other)
+    {
+        if (!(other instanceof Expression))
+            return false;
+
+        if (this == other)
+            return true;
+
+        Expression o = (Expression) other;
+
+        return Objects.equals(index.getColumnName(), o.index.getColumnName())
+                && validator.equals(o.validator)
+                && operation == o.operation
+                && Objects.equals(lower, o.lower)
+                && Objects.equals(upper, o.upper)
+                && exclusions.equals(o.exclusions);
+    }
+
+    public static class Bound
+    {
+        public final ByteBuffer value;
+        public final boolean inclusive;
+
+        public Bound(ByteBuffer value, boolean inclusive)
+        {
+            this.value = value;
+            this.inclusive = inclusive;
+        }
+
+        public boolean equals(Object other)
+        {
+            if (!(other instanceof Bound))
+                return false;
+
+            Bound o = (Bound) other;
+            return value.equals(o.value) && inclusive == o.inclusive;
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/plan/Operation.java b/src/java/org/apache/cassandra/index/sasi/plan/Operation.java
new file mode 100644
index 0000000..7c744e1
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/plan/Operation.java
@@ -0,0 +1,505 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.plan;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.*;
+
+import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.config.ColumnDefinition.Kind;
+import org.apache.cassandra.cql3.Operator;
+import org.apache.cassandra.db.filter.RowFilter;
+import org.apache.cassandra.db.rows.Row;
+import org.apache.cassandra.db.rows.Unfiltered;
+import org.apache.cassandra.index.sasi.conf.ColumnIndex;
+import org.apache.cassandra.index.sasi.analyzer.AbstractAnalyzer;
+import org.apache.cassandra.index.sasi.disk.Token;
+import org.apache.cassandra.index.sasi.plan.Expression.Op;
+import org.apache.cassandra.index.sasi.utils.RangeIntersectionIterator;
+import org.apache.cassandra.index.sasi.utils.RangeIterator;
+import org.apache.cassandra.index.sasi.utils.RangeUnionIterator;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.collect.*;
+import org.apache.cassandra.utils.FBUtilities;
+
+@SuppressWarnings("resource")
+public class Operation extends RangeIterator<Long, Token>
+{
+    public enum OperationType
+    {
+        AND, OR;
+
+        public boolean apply(boolean a, boolean b)
+        {
+            switch (this)
+            {
+                case OR:
+                    return a | b;
+
+                case AND:
+                    return a & b;
+
+                default:
+                    throw new AssertionError();
+            }
+        }
+    }
+
+    private final QueryController controller;
+
+    protected final OperationType op;
+    protected final ListMultimap<ColumnDefinition, Expression> expressions;
+    protected final RangeIterator<Long, Token> range;
+
+    protected Operation left, right;
+
+    private Operation(OperationType operation,
+                      QueryController controller,
+                      ListMultimap<ColumnDefinition, Expression> expressions,
+                      RangeIterator<Long, Token> range,
+                      Operation left, Operation right)
+    {
+        super(range);
+
+        this.op = operation;
+        this.controller = controller;
+        this.expressions = expressions;
+        this.range = range;
+
+        this.left = left;
+        this.right = right;
+    }
+
+    /**
+     * Recursive "satisfies" checks based on operation
+     * and data from the lower level members using depth-first search
+     * and bubbling the results back to the top level caller.
+     *
+     * Most of the work here is done by {@link #localSatisfiedBy(Unfiltered, Row, boolean)},
+     * see its comment for details. If there are no local expressions
+     * assigned to this Operation, satisfiedBy(Row) is called on its children instead.
+     *
+     * Query: first_name = X AND (last_name = Y OR address = XYZ AND street = IL AND city = C) OR (state = 'CA' AND country = 'US')
+     * Row: key1: (first_name: X, last_name: Z, address: XYZ, street: IL, city: C, state: NY, country:US)
+     *
+     * #1                       OR
+     *                        /    \
+     * #2       (first_name) AND   AND (state, country)
+     *                          \
+     * #3            (last_name) OR
+     *                             \
+     * #4                          AND (address, street, city)
+     *
+     *
+     * Evaluation of the key1 is top-down depth-first search:
+     *
+     * --- going down ---
+     * Level #1 is evaluated: the OR expression has to pull results from its children at level #2 and OR them together,
+     * Level #2 AND (state, country) can be evaluated right away, AND (first_name) refers to its "right" child at level #3,
+     * Level #3 OR (last_name) requests results from level #4,
+     * Level #4 AND (address, street, city) does a logical AND between its 3 fields and returns the result back to level #3.
+     * --- bubbling up ---
+     * Level #3 computes OR between the AND (address, street, city) result and its "last_name" expression,
+     * Level #2 computes AND between "first_name" and the result of level #3, plus AND (state, country) which is already computed,
+     * Level #1 does OR between the results of AND (first_name) and AND (state, country) and returns the final result.
+     *
+     * @param currentCluster The row cluster to check.
+     * @param staticRow The static row associated with current cluster.
+     * @param allowMissingColumns Allow column values to be null.
+     * @return true if the given Row satisfies all of the expressions in the tree,
+     *         false otherwise.
+     */
+    public boolean satisfiedBy(Unfiltered currentCluster, Row staticRow, boolean allowMissingColumns)
+    {
+        boolean sideL, sideR;
+
+        if (expressions == null || expressions.isEmpty())
+        {
+            sideL =  left != null &&  left.satisfiedBy(currentCluster, staticRow, allowMissingColumns);
+            sideR = right != null && right.satisfiedBy(currentCluster, staticRow, allowMissingColumns);
+
+            // one of the expressions was skipped
+            // because it had no indexes attached
+            if (left == null)
+                return sideR;
+        }
+        else
+        {
+            sideL = localSatisfiedBy(currentCluster, staticRow, allowMissingColumns);
+
+            // if there is no right child it means that this expression
+            // is last in the sequence, so we can just return the result of the local expressions
+            if (right == null)
+                return sideL;
+
+            sideR = right.satisfiedBy(currentCluster, staticRow, allowMissingColumns);
+        }
+
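+        // combine both sides using this node's logical operator (AND/OR)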
+        return op.apply(sideL, sideR);
+    }
+
+    /**
+     * Check every expression in the analyzed list to figure out if the
+     * columns in the given row match all of them, based on the operation
+     * type set on the current operation node.
+     *
+     * The algorithm is as follows: for every expression in the analyzed
+     * list, get the corresponding column from the Row:
+     *   - apply {@link Expression#isSatisfiedBy(ByteBuffer)}
+     *     method to figure out if it's satisfied;
+     *   - apply logical operation between boolean accumulator and current boolean result;
+     *   - if result == false and node's operation is AND return right away;
+     *
+     * After all of the expressions have been evaluated return resulting accumulator variable.
+     *
+     * Example:
+     *
+     * Operation = (op: AND, columns: [first_name = p, 5 < age < 7, last_name: y])
+     * Row = (first_name: pavel, last_name: y, age: 6, timestamp: 15)
+     *
+     * #1 get "first_name" = p (expressions)
+     *      - row-get "first_name"                      => "pavel"
+     *      - compare "pavel" against "p"               => true (current)
+     *      - set accumulator current                   => true (because this is expression #1)
+     *
+     * #2 get "last_name" = y (expressions)
+     *      - row-get "last_name"                       => "y"
+     *      - compare "y" against "y"                   => true (current)
+     *      - set accumulator to accumulator & current  => true
+     *
+     * #3 get 5 < "age" < 7 (expressions)
+     *      - row-get "age"                             => "6"
+     *      - compare 5 < 6 < 7                         => true (current)
+     *      - set accumulator to accumulator & current  => true
+     *
+     * #4 return accumulator => true (row satisfied all of the conditions)
+     *
+     * @param currentCluster The row cluster to check.
+     * @param staticRow The static row associated with current cluster.
+     * @param allowMissingColumns Allow column values to be null.
+     * @return true if the given Row satisfies all of the analyzed expressions,
+     *         false otherwise.
+     */
+    private boolean localSatisfiedBy(Unfiltered currentCluster, Row staticRow, boolean allowMissingColumns)
+    {
+        if (currentCluster == null || !currentCluster.isRow())
+            return false;
+
+        final int now = FBUtilities.nowInSeconds();
+        boolean result = false;
+        int idx = 0;
+
+        for (ColumnDefinition column : expressions.keySet())
+        {
+            if (column.kind == Kind.PARTITION_KEY)
+                continue;
+
+            ByteBuffer value = ColumnIndex.getValueOf(column, column.kind == Kind.STATIC ? staticRow : (Row) currentCluster, now);
+            boolean isMissingColumn = value == null;
+
+            if (!allowMissingColumns && isMissingColumn)
+                throw new IllegalStateException("All indexed columns should be included into the column slice, missing: " + column);
+
+            boolean isMatch = false;
+            // If a column has multiple expressions that effectively means an OR,
+            // e.g. comment = 'x y z' could be split into 'comment' EQ 'x', 'comment' EQ 'y', 'comment' EQ 'z'
+            // by the analyzer. In a situation like that we only need to check that at least one of the expressions matches,
+            // and that there is no hit on the NOT_EQ expressions (if any), which are always at the end of the filter list.
+            // The loop always starts from the end of the list, which makes it possible to break after the last
+            // NOT_EQ condition, on the first EQ/RANGE condition satisfied, instead of checking every
+            // single expression in the column filter list.
+            List<Expression> filters = expressions.get(column);
+            for (int i = filters.size() - 1; i >= 0; i--)
+            {
+                Expression expression = filters.get(i);
+                isMatch = !isMissingColumn && expression.isSatisfiedBy(value);
+                if (expression.getOp() == Op.NOT_EQ)
+                {
+                    // since this is NOT_EQ operation we have to
+                    // inverse match flag (to check against other expressions),
+                    // and break in case of negative inverse because that means
+                    // that it's a positive hit on the not-eq clause.
+                    isMatch = !isMatch;
+                    if (!isMatch)
+                        break;
+                } // if it was a match on EQ/RANGE or column is missing
+                else if (isMatch || isMissingColumn)
+                    break;
+            }
+
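+            // the first evaluated column seeds the accumulator, subsequent columns are folded in via op.apply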
+            if (idx++ == 0)
+            {
+                result = isMatch;
+                continue;
+            }
+
+            result = op.apply(result, isMatch);
+
+            // exit early because we already got a single false
+            if (op == OperationType.AND && !result)
+                return false;
+        }
+
+        return idx == 0 || result;
+    }
+
+    @VisibleForTesting
+    protected static ListMultimap<ColumnDefinition, Expression> analyzeGroup(QueryController controller,
+                                                                             OperationType op,
+                                                                             List<RowFilter.Expression> expressions)
+    {
+        ListMultimap<ColumnDefinition, Expression> analyzed = ArrayListMultimap.create();
+
+        // sort all of the expressions in the operation by column and by priority of the logical operator;
+        // this gives us an efficient way to handle inequalities and to combine bounds into ranges without extra processing
+        // and without converting expressions from one type to another.
+        Collections.sort(expressions, (a, b) -> {
+            int cmp = a.column().compareTo(b.column());
+            return cmp == 0 ? -Integer.compare(getPriority(a.operator()), getPriority(b.operator())) : cmp;
+        });
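+        // within a single column the order is by operator priority (EQ/LIKE first, range bounds next, NOT_EQ last),
+        // e.g. (age > 5, age <= 7) sorts to (age GT, age LTE) so both bounds fold into one range expression below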
+
+        for (final RowFilter.Expression e : expressions)
+        {
+            ColumnIndex columnIndex = controller.getIndex(e);
+            List<Expression> perColumn = analyzed.get(e.column());
+
+            if (columnIndex == null)
+                columnIndex = new ColumnIndex(controller.getKeyValidator(), e.column(), null);
+
+            AbstractAnalyzer analyzer = columnIndex.getAnalyzer();
+            analyzer.reset(e.getIndexValue());
+
+            // EQ/LIKE_*/NOT_EQ can produce multiple expressions, e.g. text = "Hello World"
+            // becomes text = "Hello" OR text = "World" because "space" is always interpreted as a split point (by the analyzer).
+            // NOT_EQ is made an independent expression only if there are already multiple EQ expressions for the column,
+            // if there are no prior expressions for the column when NOT_EQ is met, or if the single prior expression is itself a NOT_EQ;
+            // in such cases we know for certain that there will be no more EQ/RANGE expressions for the given column
+            // since NOT_EQ has the lowest priority.
+            boolean isMultiExpression = false;
+            switch (e.operator())
+            {
+                case EQ:
+                    isMultiExpression = false;
+                    break;
+
+                case LIKE_PREFIX:
+                case LIKE_SUFFIX:
+                case LIKE_CONTAINS:
+                case LIKE_MATCHES:
+                    isMultiExpression = true;
+                    break;
+
+                case NEQ:
+                    isMultiExpression = (perColumn.size() == 0 || perColumn.size() > 1
+                                     || (perColumn.size() == 1 && perColumn.get(0).getOp() == Op.NOT_EQ));
+                    break;
+            }
+
+            if (isMultiExpression)
+            {
+                while (analyzer.hasNext())
+                {
+                    final ByteBuffer token = analyzer.next();
+                    perColumn.add(new Expression(controller, columnIndex).add(e.operator(), token));
+                }
+            }
+            else
+            // "range" or not-equals operator: both bounds are combined into a single expression
+            // iff the operation of the group is AND, otherwise we are forced to create separate expressions;
+            // not-equals is likewise combined with the range iff the operator is AND.
+            {
+                Expression range;
+                if (perColumn.size() == 0 || op != OperationType.AND)
+                    perColumn.add((range = new Expression(controller, columnIndex)));
+                else
+                    range = Iterables.getLast(perColumn);
+
+                while (analyzer.hasNext())
+                    range.add(e.operator(), analyzer.next());
+            }
+        }
+
+        return analyzed;
+    }
+
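+    // Higher priority sorts earlier within a column (see analyzeGroup): equality and LIKE first,
+    // then lower bounds (GT/GTE), then upper bounds (LT/LTE), with NOT_EQ always last.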
+    private static int getPriority(Operator op)
+    {
+        switch (op)
+        {
+            case EQ:
+                return 5;
+
+            case LIKE_PREFIX:
+            case LIKE_SUFFIX:
+            case LIKE_CONTAINS:
+            case LIKE_MATCHES:
+                return 4;
+
+            case GTE:
+            case GT:
+                return 3;
+
+            case LTE:
+            case LT:
+                return 2;
+
+            case NEQ:
+                return 1;
+
+            default:
+                return 0;
+        }
+    }
+
+    protected Token computeNext()
+    {
+        return range != null && range.hasNext() ? range.next() : endOfData();
+    }
+
+    protected void performSkipTo(Long nextToken)
+    {
+        if (range != null)
+            range.skipTo(nextToken);
+    }
+
+    public void close() throws IOException
+    {
+        controller.releaseIndexes(this);
+    }
+
+    public static class Builder
+    {
+        private final QueryController controller;
+
+        protected final OperationType op;
+        protected final List<RowFilter.Expression> expressions;
+
+        protected Builder left, right;
+
+        public Builder(OperationType operation, QueryController controller, RowFilter.Expression... columns)
+        {
+            this.op = operation;
+            this.controller = controller;
+            this.expressions = new ArrayList<>();
+            Collections.addAll(expressions, columns);
+        }
+
+        public Builder setRight(Builder operation)
+        {
+            this.right = operation;
+            return this;
+        }
+
+        public Builder setLeft(Builder operation)
+        {
+            this.left = operation;
+            return this;
+        }
+
+        public void add(RowFilter.Expression e)
+        {
+            expressions.add(e);
+        }
+
+        public void add(Collection<RowFilter.Expression> newExpressions)
+        {
+            if (expressions != null)
+                expressions.addAll(newExpressions);
+        }
+
+        public Operation complete()
+        {
+            if (!expressions.isEmpty())
+            {
+                ListMultimap<ColumnDefinition, Expression> analyzedExpressions = analyzeGroup(controller, op, expressions);
+                RangeIterator.Builder<Long, Token> range = controller.getIndexes(op, analyzedExpressions.values());
+
+                Operation rightOp = null;
+                if (right != null)
+                {
+                    rightOp = right.complete();
+                    range.add(rightOp);
+                }
+
+                return new Operation(op, controller, analyzedExpressions, range.build(), null, rightOp);
+            }
+            else
+            {
+                Operation leftOp = null, rightOp = null;
+                boolean leftIndexes = false, rightIndexes = false;
+
+                if (left != null)
+                {
+                    leftOp = left.complete();
+                    leftIndexes = leftOp != null && leftOp.range != null;
+                }
+
+                if (right != null)
+                {
+                    rightOp = right.complete();
+                    rightIndexes = rightOp != null && rightOp.range != null;
+                }
+
+                RangeIterator<Long, Token> join;
+                /**
+                 * Operation should allow one of its sub-trees to wrap no indexes; that is related to the fact that we
+                 * have to accept defined-but-not-indexed columns as well as key ranges as IndexExpressions.
+                 *
+                 * Two cases are possible:
+                 *
+                 * only left child produced indexed iterators, that could happen when there are two columns
+                 * or key range on the right:
+                 *
+                 *                AND
+                 *              /     \
+                 *            OR       \
+                 *           /   \     AND
+                 *          a     b   /   \
+                 *                  key   key
+                 *
+                 * only right child produced indexed iterators:
+                 *
+                 *               AND
+                 *              /    \
+                 *            AND     a
+                 *           /   \
+                 *         key  key
+                 */
+                if (leftIndexes && !rightIndexes)
+                    join = leftOp;
+                else if (!leftIndexes && rightIndexes)
+                    join = rightOp;
+                else if (leftIndexes)
+                {
+                    RangeIterator.Builder<Long, Token> builder = op == OperationType.OR
+                                                ? RangeUnionIterator.<Long, Token>builder()
+                                                : RangeIntersectionIterator.<Long, Token>builder();
+
+                    join = builder.add(leftOp).add(rightOp).build();
+                }
+                else
+                    throw new AssertionError("both sub-trees have 0 indexes.");
+
+                return new Operation(op, controller, null, join, leftOp, rightOp);
+            }
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/plan/QueryController.java b/src/java/org/apache/cassandra/index/sasi/plan/QueryController.java
new file mode 100644
index 0000000..c8ae0d8
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/plan/QueryController.java
@@ -0,0 +1,255 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.plan;
+
+import java.util.*;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.collect.Sets;
+
+import org.apache.cassandra.config.CFMetaData;
+import org.apache.cassandra.db.*;
+import org.apache.cassandra.db.filter.DataLimits;
+import org.apache.cassandra.db.filter.RowFilter;
+import org.apache.cassandra.db.rows.UnfilteredRowIterator;
+import org.apache.cassandra.index.Index;
+import org.apache.cassandra.index.sasi.SASIIndex;
+import org.apache.cassandra.index.sasi.SSTableIndex;
+import org.apache.cassandra.index.sasi.TermIterator;
+import org.apache.cassandra.index.sasi.conf.ColumnIndex;
+import org.apache.cassandra.index.sasi.conf.view.View;
+import org.apache.cassandra.index.sasi.disk.Token;
+import org.apache.cassandra.index.sasi.exceptions.TimeQuotaExceededException;
+import org.apache.cassandra.index.sasi.plan.Operation.OperationType;
+import org.apache.cassandra.index.sasi.utils.RangeIntersectionIterator;
+import org.apache.cassandra.index.sasi.utils.RangeIterator;
+import org.apache.cassandra.index.sasi.utils.RangeUnionIterator;
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.io.sstable.format.SSTableReader;
+import org.apache.cassandra.io.util.FileUtils;
+import org.apache.cassandra.utils.Pair;
+
+public class QueryController
+{
+    private final long executionQuota;
+    private final long executionStart;
+
+    private final ColumnFamilyStore cfs;
+    private final PartitionRangeReadCommand command;
+    private final DataRange range;
+    private final Map<Collection<Expression>, List<RangeIterator<Long, Token>>> resources = new HashMap<>();
+
+    public QueryController(ColumnFamilyStore cfs, PartitionRangeReadCommand command, long timeQuotaMs)
+    {
+        this.cfs = cfs;
+        this.command = command;
+        this.range = command.dataRange();
+        this.executionQuota = TimeUnit.MILLISECONDS.toNanos(timeQuotaMs);
+        this.executionStart = System.nanoTime();
+    }
+
+    public boolean isForThrift()
+    {
+        return command.isForThrift();
+    }
+
+    public CFMetaData metadata()
+    {
+        return command.metadata();
+    }
+
+    public Collection<RowFilter.Expression> getExpressions()
+    {
+        return command.rowFilter().getExpressions();
+    }
+
+    public DataRange dataRange()
+    {
+        return command.dataRange();
+    }
+
+    public AbstractType<?> getKeyValidator()
+    {
+        return cfs.metadata.getKeyValidator();
+    }
+
+    public ColumnIndex getIndex(RowFilter.Expression expression)
+    {
+        Optional<Index> index = cfs.indexManager.getBestIndexFor(expression);
+        return index.isPresent() ? ((SASIIndex) index.get()).getIndex() : null;
+    }
+
+
+    public UnfilteredRowIterator getPartition(DecoratedKey key, ReadExecutionController executionController)
+    {
+        if (key == null)
+            throw new NullPointerException();
+        try
+        {
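+            // build a single-partition read with the filter expressions stripped;
+            // SASI re-applies them itself via Operation.satisfiedBy on the returned rows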
+            SinglePartitionReadCommand partition = SinglePartitionReadCommand.create(command.isForThrift(),
+                                                                                     cfs.metadata,
+                                                                                     command.nowInSec(),
+                                                                                     command.columnFilter(),
+                                                                                     command.rowFilter().withoutExpressions(),
+                                                                                     DataLimits.NONE,
+                                                                                     key,
+                                                                                     command.clusteringIndexFilter(key));
+
+            return partition.queryMemtableAndDisk(cfs, executionController);
+        }
+        finally
+        {
+            checkpoint();
+        }
+    }
+
+    /**
+     * Build a range iterator from the given list of expressions by applying the given operation (OR/AND).
+     * Building such an iterator involves an index search, the results of which are persisted in the internal resources list
+     * and can be released later via {@link QueryController#releaseIndexes(Operation)}.
+     *
+     * @param op The operation type to coalesce expressions with.
+     * @param expressions The expressions to build the range iterator from (expressions with no results are ignored).
+     *
+     * @return The range builder based on given expressions and operation type.
+     */
+    public RangeIterator.Builder<Long, Token> getIndexes(OperationType op, Collection<Expression> expressions)
+    {
+        if (resources.containsKey(expressions))
+            throw new IllegalArgumentException("Can't process the same expressions multiple times.");
+
+        RangeIterator.Builder<Long, Token> builder = op == OperationType.OR
+                                                ? RangeUnionIterator.<Long, Token>builder()
+                                                : RangeIntersectionIterator.<Long, Token>builder();
+
+        List<RangeIterator<Long, Token>> perIndexUnions = new ArrayList<>();
+
+        for (Map.Entry<Expression, Set<SSTableIndex>> e : getView(op, expressions).entrySet())
+        {
+            RangeIterator<Long, Token> index = TermIterator.build(e.getKey(), e.getValue());
+
+            if (index == null)
+                continue;
+
+            builder.add(index);
+            perIndexUnions.add(index);
+        }
+
+        resources.put(expressions, perIndexUnions);
+        return builder;
+    }
+
+    public void checkpoint()
+    {
+        if ((System.nanoTime() - executionStart) >= executionQuota)
+            throw new TimeQuotaExceededException();
+    }
+
+    public void releaseIndexes(Operation operation)
+    {
+        if (operation.expressions != null)
+            releaseIndexes(resources.remove(operation.expressions.values()));
+    }
+
+    private void releaseIndexes(List<RangeIterator<Long, Token>> indexes)
+    {
+        if (indexes == null)
+            return;
+
+        indexes.forEach(FileUtils::closeQuietly);
+    }
+
+    public void finish()
+    {
+        resources.values().forEach(this::releaseIndexes);
+    }
+
+    private Map<Expression, Set<SSTableIndex>> getView(OperationType op, Collection<Expression> expressions)
+    {
+        // first let's determine the primary expression if op is AND
+        Pair<Expression, Set<SSTableIndex>> primary = (op == OperationType.AND) ? calculatePrimary(expressions) : null;
+
+        Map<Expression, Set<SSTableIndex>> indexes = new HashMap<>();
+        for (Expression e : expressions)
+        {
+            // NOT_EQ and non-indexed column queries should only act as a FILTER BY for the satisfiedBy(Row) method,
+            // because otherwise they are likely to force a scan of the whole index.
+            if (!e.isIndexed() || e.getOp() == Expression.Op.NOT_EQ)
+                continue;
+
+            // primary expression, we have to add it as is
+            if (primary != null && e.equals(primary.left))
+            {
+                indexes.put(primary.left, primary.right);
+                continue;
+            }
+
+            View view = e.index.getView();
+            if (view == null)
+                continue;
+
+            Set<SSTableIndex> readers = new HashSet<>();
+            if (primary != null && primary.right.size() > 0)
+            {
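+                // narrow this expression's search to sstables overlapping the key bounds
+                // of the sstables already selected for the primary expression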
+                for (SSTableIndex index : primary.right)
+                    readers.addAll(view.match(index.minKey(), index.maxKey()));
+            }
+            else
+            {
+                readers.addAll(applyScope(view.match(e)));
+            }
+
+            indexes.put(e, readers);
+        }
+
+        return indexes;
+    }
+
+    private Pair<Expression, Set<SSTableIndex>> calculatePrimary(Collection<Expression> expressions)
+    {
+        Expression expression = null;
+        Set<SSTableIndex> primaryIndexes = Collections.emptySet();
+
+        for (Expression e : expressions)
+        {
+            if (!e.isIndexed())
+                continue;
+
+            View view = e.index.getView();
+            if (view == null)
+                continue;
+
+            Set<SSTableIndex> indexes = applyScope(view.match(e));
+            if (primaryIndexes.size() > indexes.size())
+            {
+                primaryIndexes = indexes;
+                expression = e;
+            }
+        }
+
+        return expression == null ? null : Pair.create(expression, primaryIndexes);
+    }
+
+    private Set<SSTableIndex> applyScope(Set<SSTableIndex> indexes)
+    {
+        return Sets.filter(indexes, index -> {
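+            // keep only sstables whose [first, last] key range intersects the queried data range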
+            SSTableReader sstable = index.getSSTable();
+            return range.startKey().compareTo(sstable.last) <= 0 && (range.stopKey().isMinimum() || sstable.first.compareTo(range.stopKey()) <= 0);
+        });
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java b/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java
new file mode 100644
index 0000000..4410756
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/plan/QueryPlan.java
@@ -0,0 +1,172 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.plan;
+
+import java.util.*;
+
+import org.apache.cassandra.config.CFMetaData;
+import org.apache.cassandra.db.*;
+import org.apache.cassandra.db.partitions.UnfilteredPartitionIterator;
+import org.apache.cassandra.db.rows.*;
+import org.apache.cassandra.dht.AbstractBounds;
+import org.apache.cassandra.index.sasi.disk.Token;
+import org.apache.cassandra.index.sasi.plan.Operation.OperationType;
+import org.apache.cassandra.exceptions.RequestTimeoutException;
+import org.apache.cassandra.io.util.FileUtils;
+import org.apache.cassandra.utils.AbstractIterator;
+
+public class QueryPlan
+{
+    private final QueryController controller;
+
+    public QueryPlan(ColumnFamilyStore cfs, ReadCommand command, long executionQuotaMs)
+    {
+        this.controller = new QueryController(cfs, (PartitionRangeReadCommand) command, executionQuotaMs);
+    }
+
+    /**
+     * Converts the expressions into an operation tree (which is currently just a single AND).
+     *
+     * The operation tree allows us to do a couple of important optimizations,
+     * namely: group flattening for AND operations (query rewrite), expression bounds checks,
+     * and "satisfied by" checks for resulting rows with an early exit.
+     *
+     * @return root of the operations tree.
+     */
+    private Operation analyze()
+    {
+        try
+        {
+            Operation.Builder and = new Operation.Builder(OperationType.AND, controller);
+            controller.getExpressions().forEach(and::add);
+            return and.complete();
+        }
+        catch (Exception | Error e)
+        {
+            controller.finish();
+            throw e;
+        }
+    }
+
+    public UnfilteredPartitionIterator execute(ReadExecutionController executionController) throws RequestTimeoutException
+    {
+        return new ResultIterator(analyze(), controller, executionController);
+    }
+
+    private static class ResultIterator extends AbstractIterator<UnfilteredRowIterator> implements UnfilteredPartitionIterator
+    {
+        private final AbstractBounds<PartitionPosition> keyRange;
+        private final Operation operationTree;
+        private final QueryController controller;
+        private final ReadExecutionController executionController;
+
+        private Iterator<DecoratedKey> currentKeys = null;
+
+        public ResultIterator(Operation operationTree, QueryController controller, ReadExecutionController executionController)
+        {
+            this.keyRange = controller.dataRange().keyRange();
+            this.operationTree = operationTree;
+            this.controller = controller;
+            this.executionController = executionController;
+            if (operationTree != null)
+                operationTree.skipTo((Long) keyRange.left.getToken().getTokenValue());
+        }
+
+        protected UnfilteredRowIterator computeNext()
+        {
+            if (operationTree == null)
+                return endOfData();
+
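+            // pull the next token from the index iterator, then materialize and post-filter
+            // each matching partition until one produces rows that satisfy the expression tree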
+            for (;;)
+            {
+                if (currentKeys == null || !currentKeys.hasNext())
+                {
+                    if (!operationTree.hasNext())
+                        return endOfData();
+
+                    Token token = operationTree.next();
+                    currentKeys = token.iterator();
+                }
+
+                while (currentKeys.hasNext())
+                {
+                    DecoratedKey key = currentKeys.next();
+
+                    if (!keyRange.right.isMinimum() && keyRange.right.compareTo(key) < 0)
+                        return endOfData();
+
+                    try (UnfilteredRowIterator partition = controller.getPartition(key, executionController))
+                    {
+                        Row staticRow = partition.staticRow();
+                        List<Unfiltered> clusters = new ArrayList<>();
+
+                        while (partition.hasNext())
+                        {
+                            Unfiltered row = partition.next();
+                            if (operationTree.satisfiedBy(row, staticRow, true))
+                                clusters.add(row);
+                        }
+
+                        if (!clusters.isEmpty())
+                            return new PartitionIterator(partition, clusters);
+                    }
+                }
+            }
+        }
+
+        private static class PartitionIterator extends AbstractUnfilteredRowIterator
+        {
+            private final Iterator<Unfiltered> rows;
+
+            public PartitionIterator(UnfilteredRowIterator partition, Collection<Unfiltered> content)
+            {
+                super(partition.metadata(),
+                      partition.partitionKey(),
+                      partition.partitionLevelDeletion(),
+                      partition.columns(),
+                      partition.staticRow(),
+                      partition.isReverseOrder(),
+                      partition.stats());
+
+                rows = content.iterator();
+            }
+
+            @Override
+            protected Unfiltered computeNext()
+            {
+                return rows.hasNext() ? rows.next() : endOfData();
+            }
+        }
+
+        public boolean isForThrift()
+        {
+            return controller.isForThrift();
+        }
+
+        public CFMetaData metadata()
+        {
+            return controller.metadata();
+        }
+
+        public void close()
+        {
+            FileUtils.closeQuietly(operationTree);
+            controller.finish();
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/sa/ByteTerm.java b/src/java/org/apache/cassandra/index/sasi/sa/ByteTerm.java
new file mode 100644
index 0000000..c7bbab7
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/sa/ByteTerm.java
@@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.sa;
+
+import java.nio.ByteBuffer;
+
+import org.apache.cassandra.index.sasi.disk.TokenTreeBuilder;
+import org.apache.cassandra.db.marshal.AbstractType;
+
+public class ByteTerm extends Term<ByteBuffer>
+{
+    public ByteTerm(int position, ByteBuffer value, TokenTreeBuilder tokens)
+    {
+        super(position, value, tokens);
+    }
+
+    public ByteBuffer getTerm()
+    {
+        return value.duplicate();
+    }
+
+    public ByteBuffer getSuffix(int start)
+    {
+        return (ByteBuffer) value.duplicate().position(value.position() + start);
+    }
+
+    public int compareTo(AbstractType<?> comparator, Term other)
+    {
+        return comparator.compare(value, (ByteBuffer) other.value);
+    }
+
+    public int length()
+    {
+        return value.remaining();
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/sa/CharTerm.java b/src/java/org/apache/cassandra/index/sasi/sa/CharTerm.java
new file mode 100644
index 0000000..533b566
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/sa/CharTerm.java
@@ -0,0 +1,54 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.sa;
+
+import java.nio.ByteBuffer;
+import java.nio.CharBuffer;
+
+import org.apache.cassandra.index.sasi.disk.TokenTreeBuilder;
+import org.apache.cassandra.db.marshal.AbstractType;
+
+import com.google.common.base.Charsets;
+
+public class CharTerm extends Term<CharBuffer>
+{
+    public CharTerm(int position, CharBuffer value, TokenTreeBuilder tokens)
+    {
+        super(position, value, tokens);
+    }
+
+    public ByteBuffer getTerm()
+    {
+        return Charsets.UTF_8.encode(value.duplicate());
+    }
+
+    public ByteBuffer getSuffix(int start)
+    {
+        return Charsets.UTF_8.encode(value.subSequence(value.position() + start, value.remaining()));
+    }
+
+    public int compareTo(AbstractType<?> comparator, Term other)
+    {
+        return value.compareTo((CharBuffer) other.value);
+    }
+
+    public int length()
+    {
+        return value.length();
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/sa/IndexedTerm.java b/src/java/org/apache/cassandra/index/sasi/sa/IndexedTerm.java
new file mode 100644
index 0000000..8e27134
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/sa/IndexedTerm.java
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.index.sasi.sa;
+
+import java.nio.ByteBuffer;
+
+public class IndexedTerm
+{
+    private final ByteBuffer term;
+    private final boolean isPartial;
+
+    public IndexedTerm(ByteBuffer term, boolean isPartial)
+    {
+        this.term = term;
+        this.isPartial = isPartial;
+    }
+
+    public ByteBuffer getBytes()
+    {
+        return term;
+    }
+
+    public boolean isPartial()
+    {
+        return isPartial;
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/sa/IntegralSA.java b/src/java/org/apache/cassandra/index/sasi/sa/IntegralSA.java
new file mode 100644
index 0000000..e3d591f
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/sa/IntegralSA.java
@@ -0,0 +1,85 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.sa;
+
+import java.nio.ByteBuffer;
+import java.util.Collections;
+import java.util.Comparator;
+import java.util.Iterator;
+
+import org.apache.cassandra.index.Index;
+import org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder;
+import org.apache.cassandra.index.sasi.disk.TokenTreeBuilder;
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.utils.Pair;
+
+public class IntegralSA extends SA<ByteBuffer>
+{
+    public IntegralSA(AbstractType<?> comparator, OnDiskIndexBuilder.Mode mode)
+    {
+        super(comparator, mode);
+    }
+
+    public Term<ByteBuffer> getTerm(ByteBuffer termValue, TokenTreeBuilder tokens)
+    {
+        return new ByteTerm(charCount, termValue, tokens);
+    }
+
+    public TermIterator finish()
+    {
+        return new IntegralSuffixIterator();
+    }
+
+
+    private class IntegralSuffixIterator extends TermIterator
+    {
+        private final Iterator<Term<ByteBuffer>> termIterator;
+
+        public IntegralSuffixIterator()
+        {
+            Collections.sort(terms, new Comparator<Term<?>>()
+            {
+                public int compare(Term<?> a, Term<?> b)
+                {
+                    return a.compareTo(comparator, b);
+                }
+            });
+
+            termIterator = terms.iterator();
+        }
+
+        public ByteBuffer minTerm()
+        {
+            return terms.get(0).getTerm();
+        }
+
+        public ByteBuffer maxTerm()
+        {
+            return terms.get(terms.size() - 1).getTerm();
+        }
+
+        protected Pair<IndexedTerm, TokenTreeBuilder> computeNext()
+        {
+            if (!termIterator.hasNext())
+                return endOfData();
+
+            Term<ByteBuffer> term = termIterator.next();
+            return Pair.create(new IndexedTerm(term.getTerm(), false), term.getTokens().finish());
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/sa/SA.java b/src/java/org/apache/cassandra/index/sasi/sa/SA.java
new file mode 100644
index 0000000..75f9f92
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/sa/SA.java
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.sa;
+
+import java.nio.Buffer;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder.Mode;
+import org.apache.cassandra.index.sasi.disk.TokenTreeBuilder;
+import org.apache.cassandra.db.marshal.AbstractType;
+
+public abstract class SA<T extends Buffer>
+{
+    protected final AbstractType<?> comparator;
+    protected final Mode mode;
+
+    protected final List<Term<T>> terms = new ArrayList<>();
+    protected int charCount = 0;
+
+    public SA(AbstractType<?> comparator, Mode mode)
+    {
+        this.comparator = comparator;
+        this.mode = mode;
+    }
+
+    public Mode getMode()
+    {
+        return mode;
+    }
+
+    public void add(ByteBuffer termValue, TokenTreeBuilder tokens)
+    {
+        Term<T> term = getTerm(termValue, tokens);
+        terms.add(term);
+        charCount += term.length();
+    }
+
+    public abstract TermIterator finish();
+
+    protected abstract Term<T> getTerm(ByteBuffer termValue, TokenTreeBuilder tokens);
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/sa/SuffixSA.java b/src/java/org/apache/cassandra/index/sasi/sa/SuffixSA.java
new file mode 100644
index 0000000..59c50b4
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/sa/SuffixSA.java
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.sa;
+
+import java.nio.ByteBuffer;
+import java.nio.CharBuffer;
+
+import org.apache.cassandra.index.sasi.disk.DynamicTokenTreeBuilder;
+import org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder;
+import org.apache.cassandra.index.sasi.disk.TokenTreeBuilder;
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.utils.Pair;
+
+import com.google.common.base.Charsets;
+import net.mintern.primitive.Primitive;
+
+public class SuffixSA extends SA<CharBuffer>
+{
+    public SuffixSA(AbstractType<?> comparator, OnDiskIndexBuilder.Mode mode)
+    {
+        super(comparator, mode);
+    }
+
+    protected Term<CharBuffer> getTerm(ByteBuffer termValue, TokenTreeBuilder tokens)
+    {
+        return new CharTerm(charCount, Charsets.UTF_8.decode(termValue.duplicate()), tokens);
+    }
+
+    public TermIterator finish()
+    {
+        return new SASuffixIterator();
+    }
+
+    private class SASuffixIterator extends TermIterator
+    {
+
+        private static final int COMPLETE_BIT = 31;
+
+        private final long[] suffixes;
+
+        private int current = 0;
+        private IndexedTerm lastProcessedSuffix;
+        private TokenTreeBuilder container;
+
+        public SASuffixIterator()
+        {
+            // each element has the term index and char position encoded as two 32-bit integers
+            // to avoid a binary search per suffix while sorting the suffix array.
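+            // layout: bits 63..32 hold the term index, bits 30..0 the absolute char position,
+            // and bit 31 (COMPLETE_BIT) marks a suffix that starts at the beginning of its term.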
+            suffixes = new long[charCount];
+
+            long termIndex = -1, currentTermLength = -1;
+            boolean isComplete = false;
+            for (int i = 0; i < charCount; i++)
+            {
+                if (i >= currentTermLength || currentTermLength == -1)
+                {
+                    Term currentTerm = terms.get((int) ++termIndex);
+                    currentTermLength = currentTerm.getPosition() + currentTerm.length();
+                    isComplete = true;
+                }
+
+                suffixes[i] = (termIndex << 32) | i;
+                if (isComplete)
+                    suffixes[i] |= (1L << COMPLETE_BIT);
+
+                isComplete = false;
+            }
+
+            Primitive.sort(suffixes, (a, b) -> {
+                Term aTerm = terms.get((int) (a >>> 32));
+                Term bTerm = terms.get((int) (b >>> 32));
+                return comparator.compare(aTerm.getSuffix(clearCompleteBit(a) - aTerm.getPosition()),
+                                          bTerm.getSuffix(clearCompleteBit(b) - bTerm.getPosition()));
+            });
+        }
+
+        private int clearCompleteBit(long value)
+        {
+            return (int) (value & ~(1L << COMPLETE_BIT));
+        }
+
+        private Pair<IndexedTerm, TokenTreeBuilder> suffixAt(int position)
+        {
+            long index = suffixes[position];
+            Term term = terms.get((int) (index >>> 32));
+            boolean isPartial = (index & (1L << COMPLETE_BIT)) == 0;
+            return Pair.create(new IndexedTerm(term.getSuffix(clearCompleteBit(index) - term.getPosition()), isPartial), term.getTokens());
+        }
+
+        public ByteBuffer minTerm()
+        {
+            return suffixAt(0).left.getBytes();
+        }
+
+        public ByteBuffer maxTerm()
+        {
+            return suffixAt(suffixes.length - 1).left.getBytes();
+        }
+
+        protected Pair<IndexedTerm, TokenTreeBuilder> computeNext()
+        {
+            while (true)
+            {
+                if (current >= suffixes.length)
+                {
+                    if (lastProcessedSuffix == null)
+                        return endOfData();
+
+                    Pair<IndexedTerm, TokenTreeBuilder> result = finishSuffix();
+
+                    lastProcessedSuffix = null;
+                    return result;
+                }
+
+                Pair<IndexedTerm, TokenTreeBuilder> suffix = suffixAt(current++);
+
+                if (lastProcessedSuffix == null)
+                {
+                    lastProcessedSuffix = suffix.left;
+                    container = new DynamicTokenTreeBuilder(suffix.right);
+                }
+                else if (comparator.compare(lastProcessedSuffix.getBytes(), suffix.left.getBytes()) == 0)
+                {
+                    lastProcessedSuffix = suffix.left;
+                    container.add(suffix.right);
+                }
+                else
+                {
+                    Pair<IndexedTerm, TokenTreeBuilder> result = finishSuffix();
+
+                    lastProcessedSuffix = suffix.left;
+                    container = new DynamicTokenTreeBuilder(suffix.right);
+
+                    return result;
+                }
+            }
+        }
+
+        private Pair<IndexedTerm, TokenTreeBuilder> finishSuffix()
+        {
+            return Pair.create(lastProcessedSuffix, container.finish());
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/sa/Term.java b/src/java/org/apache/cassandra/index/sasi/sa/Term.java
new file mode 100644
index 0000000..fe6eca8
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/sa/Term.java
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.sa;
+
+import java.nio.Buffer;
+import java.nio.ByteBuffer;
+
+import org.apache.cassandra.index.sasi.disk.TokenTreeBuilder;
+import org.apache.cassandra.db.marshal.AbstractType;
+
+public abstract class Term<T extends Buffer>
+{
+    protected final int position;
+    protected final T value;
+    protected TokenTreeBuilder tokens;
+
+    public Term(int position, T value, TokenTreeBuilder tokens)
+    {
+        this.position = position;
+        this.value = value;
+        this.tokens = tokens;
+    }
+
+    public int getPosition()
+    {
+        return position;
+    }
+
+    public abstract ByteBuffer getTerm();
+    public abstract ByteBuffer getSuffix(int start);
+
+    public TokenTreeBuilder getTokens()
+    {
+        return tokens;
+    }
+
+    public abstract int compareTo(AbstractType<?> comparator, Term other);
+
+    public abstract int length();
+
+}
+
diff --git a/src/java/org/apache/cassandra/index/sasi/sa/TermIterator.java b/src/java/org/apache/cassandra/index/sasi/sa/TermIterator.java
new file mode 100644
index 0000000..c8572a9
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/sa/TermIterator.java
@@ -0,0 +1,31 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.sa;
+
+import java.nio.ByteBuffer;
+
+import org.apache.cassandra.index.sasi.disk.TokenTreeBuilder;
+import org.apache.cassandra.utils.Pair;
+
+import com.google.common.collect.AbstractIterator;
+
+public abstract class TermIterator extends AbstractIterator<Pair<IndexedTerm, TokenTreeBuilder>>
+{
+    public abstract ByteBuffer minTerm();
+    public abstract ByteBuffer maxTerm();
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/utils/AbstractIterator.java b/src/java/org/apache/cassandra/index/sasi/utils/AbstractIterator.java
new file mode 100644
index 0000000..cf918c1
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/utils/AbstractIterator.java
@@ -0,0 +1,155 @@
+/*
+ * Copyright (C) 2007 The Guava Authors
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.index.sasi.utils;
+
+import java.util.NoSuchElementException;
+
+import com.google.common.collect.PeekingIterator;
+
+import static com.google.common.base.Preconditions.checkState;
+
+// This is a fork of the Guava AbstractIterator; the only difference
+// is that the state & next variables are now protected, which is required
+// for SkippableIterator.skipTo(..) to void all previous state.
+public abstract class AbstractIterator<T> implements PeekingIterator<T>
+{
+    protected State state = State.NOT_READY;
+
+    /** Constructor for use by subclasses. */
+    protected AbstractIterator() {}
+
+    protected enum State
+    {
+        /** We have computed the next element and haven't returned it yet. */
+        READY,
+
+        /** We haven't yet computed or have already returned the element. */
+        NOT_READY,
+
+        /** We have reached the end of the data and are finished. */
+        DONE,
+
+        /** We've suffered an exception and are kaput. */
+        FAILED,
+    }
+
+    protected T next;
+
+    /**
+     * Returns the next element. <b>Note:</b> the implementation must call {@link
+     * #endOfData()} when there are no elements left in the iteration. Failure to
+     * do so could result in an infinite loop.
+     *
+     * <p>The initial invocation of {@link #hasNext()} or {@link #next()} calls
+     * this method, as does the first invocation of {@code hasNext} or {@code
+     * next} following each successful call to {@code next}. Once the
+     * implementation either invokes {@code endOfData} or throws an exception,
+     * {@code computeNext} is guaranteed to never be called again.
+     *
+     * <p>If this method throws an exception, it will propagate outward to the
+     * {@code hasNext} or {@code next} invocation that invoked this method. Any
+     * further attempts to use the iterator will result in an {@link
+     * IllegalStateException}.
+     *
+     * <p>The implementation of this method may not invoke the {@code hasNext},
+     * {@code next}, or {@link #peek()} methods on this instance; if it does, an
+     * {@code IllegalStateException} will result.
+     *
+     * @return the next element if there was one. If {@code endOfData} was called
+     *     during execution, the return value will be ignored.
+     * @throws RuntimeException if any unrecoverable error happens. This exception
+     *     will propagate outward to the {@code hasNext()}, {@code next()}, or
+     *     {@code peek()} invocation that invoked this method. Any further
+     *     attempts to use the iterator will result in an
+     *     {@link IllegalStateException}.
+     */
+    protected abstract T computeNext();
+
+    /**
+     * Implementations of {@link #computeNext} <b>must</b> invoke this method when
+     * there are no elements left in the iteration.
+     *
+     * @return {@code null}; a convenience so your {@code computeNext}
+     *     implementation can use the simple statement {@code return endOfData();}
+     */
+    protected final T endOfData()
+    {
+        state = State.DONE;
+        return null;
+    }
+
+    public final boolean hasNext()
+    {
+        checkState(state != State.FAILED);
+
+        switch (state)
+        {
+            case DONE:
+                return false;
+
+            case READY:
+                return true;
+
+            default:
+        }
+
+        return tryToComputeNext();
+    }
+
+    protected boolean tryToComputeNext()
+    {
+        state = State.FAILED; // temporary pessimism
+        next = computeNext();
+
+        if (state != State.DONE)
+        {
+            state = State.READY;
+            return true;
+        }
+
+        return false;
+    }
+
+    public final T next()
+    {
+        if (!hasNext())
+            throw new NoSuchElementException();
+
+        state = State.NOT_READY;
+        return next;
+    }
+
+    public void remove()
+    {
+        throw new UnsupportedOperationException();
+    }
+
+    /**
+     * Returns the next element in the iteration without advancing the iteration,
+     * according to the contract of {@link PeekingIterator#peek()}.
+     *
+     * <p>Implementations of {@code AbstractIterator} that wish to expose this
+     * functionality should implement {@code PeekingIterator}.
+     */
+    public final T peek()
+    {
+        if (!hasNext())
+            throw new NoSuchElementException();
+
+        return next;
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/utils/CombinedTerm.java b/src/java/org/apache/cassandra/index/sasi/utils/CombinedTerm.java
new file mode 100644
index 0000000..81e535d
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/utils/CombinedTerm.java
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.utils;
+
+import java.nio.ByteBuffer;
+import java.util.*;
+
+import org.apache.cassandra.index.sasi.disk.*;
+import org.apache.cassandra.index.sasi.disk.OnDiskIndex.DataTerm;
+import org.apache.cassandra.db.marshal.AbstractType;
+
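+/**
+ * A single index term that may come from multiple on-disk index segments: token sets from merged
+ * terms are combined lazily via a {@link RangeUnionIterator} built in {@link #getTokenIterator()}.
+ */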
+public class CombinedTerm implements CombinedValue<DataTerm>
+{
+    private final AbstractType<?> comparator;
+    private final DataTerm term;
+    private final List<DataTerm> mergedTerms = new ArrayList<>();
+
+    public CombinedTerm(AbstractType<?> comparator, DataTerm term)
+    {
+        this.comparator = comparator;
+        this.term = term;
+    }
+
+    public ByteBuffer getTerm()
+    {
+        return term.getTerm();
+    }
+
+    public boolean isPartial()
+    {
+        return term.isPartial();
+    }
+
+    public RangeIterator<Long, Token> getTokenIterator()
+    {
+        RangeIterator.Builder<Long, Token> union = RangeUnionIterator.builder();
+        union.add(term.getTokens());
+        mergedTerms.stream().map(OnDiskIndex.DataTerm::getTokens).forEach(union::add);
+
+        return union.build();
+    }
+
+    public TokenTreeBuilder getTokenTreeBuilder()
+    {
+        return new StaticTokenTreeBuilder(this).finish();
+    }
+
+    public void merge(CombinedValue<DataTerm> other)
+    {
+        if (!(other instanceof CombinedTerm))
+            return;
+
+        CombinedTerm o = (CombinedTerm) other;
+
+        assert comparator == o.comparator;
+
+        mergedTerms.add(o.term);
+    }
+
+    public DataTerm get()
+    {
+        return term;
+    }
+
+    public int compareTo(CombinedValue<DataTerm> o)
+    {
+        return term.compareTo(comparator, o.get().getTerm());
+    }
+}
\ No newline at end of file
diff --git a/src/java/org/apache/cassandra/index/sasi/utils/CombinedTermIterator.java b/src/java/org/apache/cassandra/index/sasi/utils/CombinedTermIterator.java
new file mode 100644
index 0000000..4b004e0
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/utils/CombinedTermIterator.java
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.utils;
+
+import java.nio.ByteBuffer;
+
+import org.apache.cassandra.index.sasi.disk.Descriptor;
+import org.apache.cassandra.index.sasi.disk.OnDiskIndex;
+import org.apache.cassandra.index.sasi.disk.TokenTreeBuilder;
+import org.apache.cassandra.index.sasi.sa.IndexedTerm;
+import org.apache.cassandra.index.sasi.sa.TermIterator;
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.utils.Pair;
+
+@SuppressWarnings("resource")
+public class CombinedTermIterator extends TermIterator
+{
+    final Descriptor descriptor;
+    final RangeIterator<OnDiskIndex.DataTerm, CombinedTerm> union;
+    final ByteBuffer min;
+    final ByteBuffer max;
+
+    public CombinedTermIterator(OnDiskIndex... sas)
+    {
+        this(Descriptor.CURRENT, sas);
+    }
+
+    public CombinedTermIterator(Descriptor d, OnDiskIndex... parts)
+    {
+        descriptor = d;
+        union = OnDiskIndexIterator.union(parts);
+
+        AbstractType<?> comparator = parts[0].getComparator(); // assumes all SAs have same comparator
+        ByteBuffer minimum = parts[0].minTerm();
+        ByteBuffer maximum = parts[0].maxTerm();
+
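+        // compute the global min/max term across all non-null index parts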
+        for (int i = 1; i < parts.length; i++)
+        {
+            OnDiskIndex part = parts[i];
+            if (part == null)
+                continue;
+
+            minimum = comparator.compare(minimum, part.minTerm()) > 0 ? part.minTerm() : minimum;
+            maximum = comparator.compare(maximum, part.maxTerm()) < 0 ? part.maxTerm() : maximum;
+        }
+
+        min = minimum;
+        max = maximum;
+    }
+
+    public ByteBuffer minTerm()
+    {
+        return min;
+    }
+
+    public ByteBuffer maxTerm()
+    {
+        return max;
+    }
+
+    protected Pair<IndexedTerm, TokenTreeBuilder> computeNext()
+    {
+        if (!union.hasNext())
+        {
+            return endOfData();
+        }
+        else
+        {
+            CombinedTerm term = union.next();
+            return Pair.create(new IndexedTerm(term.getTerm(), term.isPartial()), term.getTokenTreeBuilder());
+        }
+
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/utils/CombinedValue.java b/src/java/org/apache/cassandra/index/sasi/utils/CombinedValue.java
new file mode 100644
index 0000000..ca5f9be
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/utils/CombinedValue.java
@@ -0,0 +1,25 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.utils;
+
+public interface CombinedValue<V> extends Comparable<CombinedValue<V>>
+{
+    void merge(CombinedValue<V> other);
+
+    V get();
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/utils/MappedBuffer.java b/src/java/org/apache/cassandra/index/sasi/utils/MappedBuffer.java
new file mode 100644
index 0000000..37ab1be
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/utils/MappedBuffer.java
@@ -0,0 +1,253 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.utils;
+
+import java.io.Closeable;
+import java.nio.ByteBuffer;
+import java.nio.MappedByteBuffer;
+import java.nio.channels.FileChannel.MapMode;
+
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.io.util.ChannelProxy;
+import org.apache.cassandra.io.util.FileUtils;
+import org.apache.cassandra.io.util.RandomAccessReader;
+
+import com.google.common.annotations.VisibleForTesting;
+
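+/**
+ * Read-only view of a file mapped as a series of fixed-size pages (1G each by default), with
+ * primitive accessors addressed by long positions so files larger than 2G can be navigated.
+ */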
+public class MappedBuffer implements Closeable
+{
+    private final MappedByteBuffer[] pages;
+
+    private long position, limit;
+    private final long capacity;
+    private final int pageSize, sizeBits;
+
+    private MappedBuffer(MappedBuffer other)
+    {
+        this.sizeBits = other.sizeBits;
+        this.pageSize = other.pageSize;
+        this.position = other.position;
+        this.limit = other.limit;
+        this.capacity = other.capacity;
+        this.pages = other.pages;
+    }
+
+    public MappedBuffer(RandomAccessReader file)
+    {
+        this(file.getChannel(), 30);
+    }
+
+    public MappedBuffer(ChannelProxy file)
+    {
+        this(file, 30);
+    }
+
+    @VisibleForTesting
+    protected MappedBuffer(ChannelProxy file, int numPageBits)
+    {
+        if (numPageBits > Integer.SIZE - 2) // pageSize is an int, so more than 30 bits (1G) would overflow
+            throw new IllegalArgumentException("page size can't be bigger than 1G");
+
+        sizeBits = numPageBits;
+        pageSize = 1 << sizeBits;
+        position = 0;
+        limit = capacity = file.size();
+        pages = new MappedByteBuffer[(int) (file.size() / pageSize) + 1];
+
+        try
+        {
+            long offset = 0;
+            for (int i = 0; i < pages.length; i++)
+            {
+                long pageSize = Math.min(this.pageSize, (capacity - offset));
+                pages[i] = file.map(MapMode.READ_ONLY, offset, pageSize);
+                offset += pageSize;
+            }
+        }
+        finally
+        {
+            file.close();
+        }
+    }
+
+    public int comparePageTo(long offset, int length, AbstractType<?> comparator, ByteBuffer other)
+    {
+        return comparator.compare(getPageRegion(offset, length), other);
+    }
+
+    public long capacity()
+    {
+        return capacity;
+    }
+
+    public long position()
+    {
+        return position;
+    }
+
+    public MappedBuffer position(long newPosition)
+    {
+        if (newPosition < 0 || newPosition > limit)
+            throw new IllegalArgumentException("position: " + newPosition + ", limit: " + limit);
+
+        position = newPosition;
+        return this;
+    }
+
+    public long limit()
+    {
+        return limit;
+    }
+
+    public MappedBuffer limit(long newLimit)
+    {
+        if (newLimit < position || newLimit > capacity)
+            throw new IllegalArgumentException();
+
+        limit = newLimit;
+        return this;
+    }
+
+    public long remaining()
+    {
+        return limit - position;
+    }
+
+    public boolean hasRemaining()
+    {
+        return remaining() > 0;
+    }
+
+    public byte get()
+    {
+        return get(position++);
+    }
+
+    public byte get(long pos)
+    {
+        return pages[getPage(pos)].get(getPageOffset(pos));
+    }
+
+    public short getShort()
+    {
+        short value = getShort(position);
+        position += 2;
+        return value;
+    }
+
+    public short getShort(long pos)
+    {
+        if (isPageAligned(pos, 2))
+            return pages[getPage(pos)].getShort(getPageOffset(pos));
+
+        int ch1 = get(pos)     & 0xff;
+        int ch2 = get(pos + 1) & 0xff;
+        return (short) ((ch1 << 8) + ch2);
+    }
+
+    public int getInt()
+    {
+        int value = getInt(position);
+        position += 4;
+        return value;
+    }
+
+    public int getInt(long pos)
+    {
+        if (isPageAligned(pos, 4))
+            return pages[getPage(pos)].getInt(getPageOffset(pos));
+
+        int ch1 = get(pos)     & 0xff;
+        int ch2 = get(pos + 1) & 0xff;
+        int ch3 = get(pos + 2) & 0xff;
+        int ch4 = get(pos + 3) & 0xff;
+
+        return ((ch1 << 24) + (ch2 << 16) + (ch3 << 8) + ch4);
+    }
+
+    public long getLong()
+    {
+        long value = getLong(position);
+        position += 8;
+        return value;
+    }
+
+    public long getLong(long pos)
+    {
+        // fast path if the long could be retrieved from a single page
+        // that would avoid multiple expensive look-ups into page array.
+        return (isPageAligned(pos, 8))
+                ? pages[getPage(pos)].getLong(getPageOffset(pos))
+                : ((long) (getInt(pos)) << 32) + (getInt(pos + 4) & 0xFFFFFFFFL);
+    }
+
+    public ByteBuffer getPageRegion(long position, int length)
+    {
+        if (!isPageAligned(position, length))
+            throw new IllegalArgumentException(String.format("range: %s-%s wraps more than one page", position, length));
+
+        ByteBuffer slice = pages[getPage(position)].duplicate();
+
+        int pageOffset = getPageOffset(position);
+        slice.position(pageOffset).limit(pageOffset + length);
+
+        return slice;
+    }
+
+    public MappedBuffer duplicate()
+    {
+        return new MappedBuffer(this);
+    }
+
+    public void close()
+    {
+        if (!FileUtils.isCleanerAvailable())
+            return;
+
+        /*
+         * Try forcing the unmapping of pages using undocumented unsafe sun APIs.
+         * If this fails (non-Sun JVM), we'll have to wait for the GC to finalize the mapping.
+         * If this works and a thread then tries to access any of the unmapped pages, all hell will break loose.
+         */
+        try
+        {
+            for (MappedByteBuffer segment : pages)
+                FileUtils.clean(segment);
+        }
+        catch (Exception e)
+        {
+            // This is not supposed to happen
+        }
+    }
+
+    private int getPage(long position)
+    {
+        return (int) (position >> sizeBits);
+    }
+
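+    // pageSize is a power of two, so masking with (pageSize - 1) yields the offset within the page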
+    private int getPageOffset(long position)
+    {
+        return (int) (position & pageSize - 1);
+    }
+
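+    // true only when the [position, position + length) region fits strictly within a single page;
+    // a region ending exactly on a page boundary is treated as not aligned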
+    private boolean isPageAligned(long position, int length)
+    {
+        return pageSize - (getPageOffset(position) + length) > 0;
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/utils/OnDiskIndexIterator.java b/src/java/org/apache/cassandra/index/sasi/utils/OnDiskIndexIterator.java
new file mode 100644
index 0000000..ae97cab
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/utils/OnDiskIndexIterator.java
@@ -0,0 +1,64 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.utils;
+
+import java.io.IOException;
+import java.util.Iterator;
+
+import org.apache.cassandra.index.sasi.disk.OnDiskIndex;
+import org.apache.cassandra.index.sasi.disk.OnDiskIndex.DataTerm;
+import org.apache.cassandra.db.marshal.AbstractType;
+
+public class OnDiskIndexIterator extends RangeIterator<DataTerm, CombinedTerm>
+{
+    private final AbstractType<?> comparator;
+    private final Iterator<DataTerm> terms;
+
+    public OnDiskIndexIterator(OnDiskIndex index)
+    {
+        super(index.min(), index.max(), Long.MAX_VALUE);
+
+        this.comparator = index.getComparator();
+        this.terms = index.iterator();
+    }
+
+    public static RangeIterator<DataTerm, CombinedTerm> union(OnDiskIndex... union)
+    {
+        RangeUnionIterator.Builder<DataTerm, CombinedTerm> builder = RangeUnionIterator.builder();
+        for (OnDiskIndex e : union)
+        {
+            if (e != null)
+                builder.add(new OnDiskIndexIterator(e));
+        }
+
+        return builder.build();
+    }
+
+    protected CombinedTerm computeNext()
+    {
+        return terms.hasNext() ? new CombinedTerm(comparator, terms.next()) : endOfData();
+    }
+
+    protected void performSkipTo(DataTerm nextToken)
+    {
+        throw new UnsupportedOperationException();
+    }
+
+    public void close() throws IOException
+    {}
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java b/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java
new file mode 100644
index 0000000..02d9527
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/utils/RangeIntersectionIterator.java
@@ -0,0 +1,282 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.utils;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.List;
+import java.util.PriorityQueue;
+
+import com.google.common.collect.Iterators;
+import org.apache.cassandra.io.util.FileUtils;
+
+import com.google.common.annotations.VisibleForTesting;
+
+@SuppressWarnings("resource")
+public class RangeIntersectionIterator
+{
+    protected enum Strategy
+    {
+        BOUNCE, LOOKUP, ADAPTIVE
+    }
+
+    public static <K extends Comparable<K>, D extends CombinedValue<K>> Builder<K, D> builder()
+    {
+        return builder(Strategy.ADAPTIVE);
+    }
+
+    @VisibleForTesting
+    protected static <K extends Comparable<K>, D extends CombinedValue<K>> Builder<K, D> builder(Strategy strategy)
+    {
+        return new Builder<>(strategy);
+    }
+
+    public static class Builder<K extends Comparable<K>, D extends CombinedValue<K>> extends RangeIterator.Builder<K, D>
+    {
+        private final Strategy strategy;
+
+        public Builder(Strategy strategy)
+        {
+            super(IteratorType.INTERSECTION);
+            this.strategy = strategy;
+        }
+
+        protected RangeIterator<K, D> buildIterator()
+        {
+            // if the range is disjoint we can simply return empty
+            // iterator of any type, because it's not going to produce any results.
+            if (statistics.isDisjoint())
+                return new BounceIntersectionIterator<>(statistics, new PriorityQueue<RangeIterator<K, D>>(1));
+
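+            // ADAPTIVE falls back to scan + lookup when the smallest range holds at most 1% of the
+            // tokens of the largest one, otherwise the merge-join ("bounce") iterator is used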
+            switch (strategy)
+            {
+                case LOOKUP:
+                    return new LookupIntersectionIterator<>(statistics, ranges);
+
+                case BOUNCE:
+                    return new BounceIntersectionIterator<>(statistics, ranges);
+
+                case ADAPTIVE:
+                    return statistics.sizeRatio() <= 0.01d
+                            ? new LookupIntersectionIterator<>(statistics, ranges)
+                            : new BounceIntersectionIterator<>(statistics, ranges);
+
+                default:
+                    throw new IllegalStateException("Unknown strategy: " + strategy);
+            }
+        }
+    }
+
+    private static abstract class AbstractIntersectionIterator<K extends Comparable<K>, D extends CombinedValue<K>> extends RangeIterator<K, D>
+    {
+        protected final PriorityQueue<RangeIterator<K, D>> ranges;
+
+        private AbstractIntersectionIterator(Builder.Statistics<K, D> statistics, PriorityQueue<RangeIterator<K, D>> ranges)
+        {
+            super(statistics);
+            this.ranges = ranges;
+        }
+
+        public void close() throws IOException
+        {
+            for (RangeIterator<K, D> range : ranges)
+                FileUtils.closeQuietly(range);
+        }
+    }
+
+    /**
+     * Iterator which performs intersection of multiple ranges by using a bouncing (merge-join) technique to identify
+     * common elements in the given ranges. The aforementioned "bounce" works as follows: the range queue is poll'ed for the
+     * range with the smallest current token (main loop), and that token is used to {@link RangeIterator#skipTo(Comparable)}
+     * the other ranges; if the token produced by {@link RangeIterator#skipTo(Comparable)} is equal to the current "candidate" token,
+     * the two get merged together and the same operation is repeated for the next range from the queue; if the returned token
+     * is not equal to the candidate, the candidate's range gets put back into the queue and the main loop is repeated until the
+     * next intersection token is found or at least one iterator runs out of tokens.
+     *
+     * This technique is very efficient at jumping over gaps in the ranges.
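+     *
+     * For example (illustrative token streams): given ranges [1, 4, 7, 9] and [2, 4, 9, 20], candidate 1 from the
+     * first range makes the second range skip to 2; since 2 != 1 both ranges are re-queued, 2 becomes the new
+     * candidate, the first range skips to 4, and so on until the common tokens 4 and 9 are produced.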
+     *
+     * @param <K> The type used to sort ranges.
+     * @param <D> The container type which is going to be returned by {@link Iterator#next()}.
+     */
+    @VisibleForTesting
+    protected static class BounceIntersectionIterator<K extends Comparable<K>, D extends CombinedValue<K>> extends AbstractIntersectionIterator<K, D>
+    {
+        private BounceIntersectionIterator(Builder.Statistics<K, D> statistics, PriorityQueue<RangeIterator<K, D>> ranges)
+        {
+            super(statistics, ranges);
+        }
+
+        protected D computeNext()
+        {
+            while (!ranges.isEmpty())
+            {
+                RangeIterator<K, D> head = ranges.poll();
+
+                // jump right to the beginning of the intersection or return next element
+                if (head.getCurrent().compareTo(getMinimum()) < 0)
+                    head.skipTo(getMinimum());
+
+                D candidate = head.hasNext() ? head.next() : null;
+                if (candidate == null || candidate.get().compareTo(getMaximum()) > 0)
+                {
+                    ranges.add(head);
+                    return endOfData();
+                }
+
+                List<RangeIterator<K, D>> processed = new ArrayList<>();
+
+                boolean intersectsAll = true, exhausted = false;
+                while (!ranges.isEmpty())
+                {
+                    RangeIterator<K, D> range = ranges.poll();
+
+                    // found a range which doesn't overlap with one (or possibly more) other range(s)
+                    if (!isOverlapping(head, range))
+                    {
+                        exhausted = true;
+                        intersectsAll = false;
+                        break;
+                    }
+
+                    D point = range.skipTo(candidate.get());
+
+                    if (point == null) // other range is exhausted
+                    {
+                        exhausted = true;
+                        intersectsAll = false;
+                        break;
+                    }
+
+                    processed.add(range);
+
+                    if (candidate.get().equals(point.get()))
+                    {
+                        candidate.merge(point);
+                        // advance skipped range to the next element if any
+                        Iterators.getNext(range, null);
+                    }
+                    else
+                    {
+                        intersectsAll = false;
+                        break;
+                    }
+                }
+
+                ranges.add(head);
+
+                for (RangeIterator<K, D> range : processed)
+                    ranges.add(range);
+
+                if (exhausted)
+                    return endOfData();
+
+                if (intersectsAll)
+                    return candidate;
+            }
+
+            return endOfData();
+        }
+
+        protected void performSkipTo(K nextToken)
+        {
+            List<RangeIterator<K, D>> skipped = new ArrayList<>();
+
+            while (!ranges.isEmpty())
+            {
+                RangeIterator<K, D> range = ranges.poll();
+                range.skipTo(nextToken);
+                skipped.add(range);
+            }
+
+            for (RangeIterator<K, D> range : skipped)
+                ranges.add(range);
+        }
+    }
+
+    /**
+     * Iterator which performs a linear scan over a primary range (the smallest of the ranges)
+     * and O(log(n)) lookup into secondary ranges using values from the primary iterator.
+     * This technique is efficient when one of the intersection ranges is smaller than the others,
+     * e.g. a size ratio of 0.01d (the default); in such a situation scan + lookup is more efficient compared
+     * to the "bounce" merge because the "bounce" distance is never going to be big.
+     *
+     * @param <K> The type used to sort ranges.
+     * @param <D> The container type which is going to be returned by {@link Iterator#next()}.
+     */
+    @VisibleForTesting
+    protected static class LookupIntersectionIterator<K extends Comparable<K>, D extends CombinedValue<K>> extends AbstractIntersectionIterator<K, D>
+    {
+        private final RangeIterator<K, D> smallestIterator;
+
+        private LookupIntersectionIterator(Builder.Statistics<K, D> statistics, PriorityQueue<RangeIterator<K, D>> ranges)
+        {
+            super(statistics, ranges);
+
+            smallestIterator = statistics.minRange;
+
+            if (smallestIterator.getCurrent().compareTo(getMinimum()) < 0)
+                smallestIterator.skipTo(getMinimum());
+        }
+
+        protected D computeNext()
+        {
+            while (smallestIterator.hasNext())
+            {
+                D candidate = smallestIterator.next();
+                K token = candidate.get();
+
+                boolean intersectsAll = true;
+                for (RangeIterator<K, D> range : ranges)
+                {
+                    // avoid checking against self, much cheaper than changing queue comparator
+                    // to compare based on the size and re-populating such queue.
+                    if (range.equals(smallestIterator))
+                        continue;
+
+                    // found a range which doesn't overlap with one (or possibly more) other range(s)
+                    if (!isOverlapping(smallestIterator, range))
+                        return endOfData();
+
+                    D point = range.skipTo(token);
+
+                    if (point == null) // one of the iterators is exhausted
+                        return endOfData();
+
+                    if (!point.get().equals(token))
+                    {
+                        intersectsAll = false;
+                        break;
+                    }
+
+                    candidate.merge(point);
+                }
+
+                if (intersectsAll)
+                    return candidate;
+            }
+
+            return endOfData();
+        }
+
+        protected void performSkipTo(K nextToken)
+        {
+            smallestIterator.skipTo(nextToken);
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/utils/RangeIterator.java b/src/java/org/apache/cassandra/index/sasi/utils/RangeIterator.java
new file mode 100644
index 0000000..1b5aee4
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/utils/RangeIterator.java
@@ -0,0 +1,279 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.utils;
+
+import java.io.Closeable;
+import java.util.Comparator;
+import java.util.List;
+import java.util.PriorityQueue;
+
+import com.google.common.annotations.VisibleForTesting;
+
+public abstract class RangeIterator<K extends Comparable<K>, T extends CombinedValue<K>> extends AbstractIterator<T> implements Closeable
+{
+    private final K min, max;
+    private final long count;
+    private K current;
+
+    protected RangeIterator(Builder.Statistics<K, T> statistics)
+    {
+        this(statistics.min, statistics.max, statistics.tokenCount);
+    }
+
+    public RangeIterator(RangeIterator<K, T> range)
+    {
+        this(range == null ? null : range.min, range == null ? null : range.max, range == null ? -1 : range.count);
+    }
+
+    public RangeIterator(K min, K max, long count)
+    {
+        this.min = min;
+        this.current = min;
+        this.max = max;
+        this.count = count;
+    }
+
+    public final K getMinimum()
+    {
+        return min;
+    }
+
+    public final K getCurrent()
+    {
+        return current;
+    }
+
+    public final K getMaximum()
+    {
+        return max;
+    }
+
+    public final long getCount()
+    {
+        return count;
+    }
+
+    /**
+     * When called, this iterators current position should
+     * be skipped forwards until finding either:
+     *   1) an element equal to or bigger than next
+     *   2) the end of the iterator
+     *
+     * @param nextToken value to skip the iterator forward until matching
+     *
+     * @return The next current token after the skip was performed
+     */
+    public final T skipTo(K nextToken)
+    {
+        if (min == null || max == null)
+            return endOfData();
+
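+        // already at or past the requested token: return the current element, computing it if needed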
+        if (current.compareTo(nextToken) >= 0)
+            return next == null ? recomputeNext() : next;
+
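+        // the requested token lies beyond this iterator's maximum, nothing left to produce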
+        if (max.compareTo(nextToken) < 0)
+            return endOfData();
+
+        performSkipTo(nextToken);
+        return recomputeNext();
+    }
+
+    protected abstract void performSkipTo(K nextToken);
+
+    protected T recomputeNext()
+    {
+        return tryToComputeNext() ? peek() : endOfData();
+    }
+
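+    // keeps the moving "current" token in sync with the element produced by computeNext(),
+    // falling back to the maximum once the underlying iterator is exhausted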
+    protected boolean tryToComputeNext()
+    {
+        boolean hasNext = super.tryToComputeNext();
+        current = hasNext ? next.get() : getMaximum();
+        return hasNext;
+    }
+
+    public static abstract class Builder<K extends Comparable<K>, D extends CombinedValue<K>>
+    {
+        public enum IteratorType
+        {
+            UNION, INTERSECTION
+        }
+
+        @VisibleForTesting
+        protected final Statistics<K, D> statistics;
+
+        @VisibleForTesting
+        protected final PriorityQueue<RangeIterator<K, D>> ranges;
+
+        public Builder(IteratorType type)
+        {
+            statistics = new Statistics<>(type);
+            ranges = new PriorityQueue<>(16, (Comparator<RangeIterator<K, D>>) (a, b) -> a.getCurrent().compareTo(b.getCurrent()));
+        }
+
+        public K getMinimum()
+        {
+            return statistics.min;
+        }
+
+        public K getMaximum()
+        {
+            return statistics.max;
+        }
+
+        public long getTokenCount()
+        {
+            return statistics.tokenCount;
+        }
+
+        public int rangeCount()
+        {
+            return ranges.size();
+        }
+
+        public Builder<K, D> add(RangeIterator<K, D> range)
+        {
+            if (range == null || range.getMinimum() == null || range.getMaximum() == null)
+                return this;
+
+            ranges.add(range);
+            statistics.update(range);
+
+            return this;
+        }
+
+        public Builder<K, D> add(List<RangeIterator<K, D>> ranges)
+        {
+            if (ranges == null || ranges.isEmpty())
+                return this;
+
+            ranges.forEach(this::add);
+            return this;
+        }
+
+        public final RangeIterator<K, D> build()
+        {
+            switch (rangeCount())
+            {
+                case 0:
+                    return null;
+
+                case 1:
+                    return ranges.poll();
+
+                default:
+                    return buildIterator();
+            }
+        }
+
+        protected abstract RangeIterator<K, D> buildIterator();
+
+        public static class Statistics<K extends Comparable<K>, D extends CombinedValue<K>>
+        {
+            protected final IteratorType iteratorType;
+
+            protected K min, max;
+            protected long tokenCount;
+
+            // iterator with the least number of items
+            protected RangeIterator<K, D> minRange;
+            // iterator with the most number of items
+            protected RangeIterator<K, D> maxRange;
+
+            // tracks if all of the added ranges overlap, which is useful in case of intersection,
+            // as it gives a direct answer as to whether such an iterator is going to produce any results.
+            protected boolean isOverlapping = true;
+
+            public Statistics(IteratorType iteratorType)
+            {
+                this.iteratorType = iteratorType;
+            }
+
+            /**
+             * Update statistics information with the given range.
+             *
+             * Updates min/max of the combined range, token count and
+             * tracks range with the least/most number of tokens.
+             *
+             * @param range The range to update statistics with.
+             */
+            public void update(RangeIterator<K, D> range)
+            {
+                switch (iteratorType)
+                {
+                    case UNION:
+                        min = min == null || min.compareTo(range.getMinimum()) > 0 ? range.getMinimum() : min;
+                        max = max == null || max.compareTo(range.getMaximum()) < 0 ? range.getMaximum() : max;
+                        break;
+
+                    case INTERSECTION:
+                        // minimum of the intersection is the biggest minimum of individual iterators
+                        min = min == null || min.compareTo(range.getMinimum()) < 0 ? range.getMinimum() : min;
+                        // maximum of the intersection is the smallest maximum of individual iterators
+                        max = max == null || max.compareTo(range.getMaximum()) > 0 ? range.getMaximum() : max;
+                        break;
+
+                    default:
+                        throw new IllegalStateException("Unknown iterator type: " + iteratorType);
+                }
+
+                // check if the new range is disjoint with the already added ranges, which means that this intersection
+                // is not going to produce any results, so we could clean up range storage and never add anything to it.
+                isOverlapping &= isOverlapping(min, max, range);
+
+                minRange = minRange == null ? range : min(minRange, range);
+                maxRange = maxRange == null ? range : max(maxRange, range);
+
+                tokenCount += range.getCount();
+
+            }
+
+            private RangeIterator<K, D> min(RangeIterator<K, D> a, RangeIterator<K, D> b)
+            {
+                return a.getCount() > b.getCount() ? b : a;
+            }
+
+            private RangeIterator<K, D> max(RangeIterator<K, D> a, RangeIterator<K, D> b)
+            {
+                return a.getCount() > b.getCount() ? a : b;
+            }
+
+            public boolean isDisjoint()
+            {
+                return !isOverlapping;
+            }
+
+            public double sizeRatio()
+            {
+                return minRange.getCount() * 1d / maxRange.getCount();
+            }
+        }
+    }
+
+    @VisibleForTesting
+    protected static <K extends Comparable<K>, D extends CombinedValue<K>> boolean isOverlapping(RangeIterator<K, D> a, RangeIterator<K, D> b)
+    {
+        return isOverlapping(a.getCurrent(), a.getMaximum(), b);
+    }
+
+    @VisibleForTesting
+    protected static <K extends Comparable<K>, D extends CombinedValue<K>> boolean isOverlapping(K min, K max, RangeIterator<K, D> b)
+    {
+        return min.compareTo(b.getMaximum()) <= 0 && b.getCurrent().compareTo(max) <= 0;
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/utils/RangeUnionIterator.java b/src/java/org/apache/cassandra/index/sasi/utils/RangeUnionIterator.java
new file mode 100644
index 0000000..4be460c
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/utils/RangeUnionIterator.java
@@ -0,0 +1,159 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.utils;
+
+import java.io.IOException;
+import java.util.*;
+import java.util.stream.Collectors;
+
+import org.apache.cassandra.io.util.FileUtils;
+
+/**
+ * Range Union Iterator is used to return a sorted stream of elements from multiple RangeIterator instances.
+ *
+ * A PriorityQueue is used as the sorting mechanism for the ranges, where each computeNext() operation polls
+ * from the queue (and pushes back when done) the range that contains the smallest element, because
+ * sorting is done on the moving window of range iteration, {@link RangeIterator#getCurrent()}. Once the
+ * smallest element (the return candidate) is retrieved, an attempt is made to merge it with other ranges, because there could
+ * be equal elements in adjacent ranges; such ranges are poll'ed only if their {@link RangeIterator#getCurrent()}
+ * equals the return candidate.
+ *
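+ * A minimal usage sketch (assuming {@code rangeA} and {@code rangeB} are existing
+ * {@code RangeIterator<Long, Token>} instances and {@code process} is a hypothetical consumer):
+ * <pre>{@code
+ * RangeIterator<Long, Token> union = RangeUnionIterator.<Long, Token>builder()
+ *                                                      .add(rangeA)
+ *                                                      .add(rangeB)
+ *                                                      .build();
+ * while (union.hasNext())
+ *     process(union.next()); // tokens come out sorted, equal tokens merged across ranges
+ * }</pre>
+ *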
+ * @param <K> The type used to sort ranges.
+ * @param <D> The container type which is going to be returned by {@link Iterator#next()}.
+ */
+@SuppressWarnings("resource")
+public class RangeUnionIterator<K extends Comparable<K>, D extends CombinedValue<K>> extends RangeIterator<K, D>
+{
+    private final PriorityQueue<RangeIterator<K, D>> ranges;
+
+    private RangeUnionIterator(Builder.Statistics<K, D> statistics, PriorityQueue<RangeIterator<K, D>> ranges)
+    {
+        super(statistics);
+        this.ranges = ranges;
+    }
+
+    public D computeNext()
+    {
+        RangeIterator<K, D> head = null;
+
+        while (!ranges.isEmpty())
+        {
+            head = ranges.poll();
+            if (head.hasNext())
+                break;
+
+            FileUtils.closeQuietly(head);
+        }
+
+        if (head == null || !head.hasNext())
+            return endOfData();
+
+        D candidate = head.next();
+
+        List<RangeIterator<K, D>> processedRanges = new ArrayList<>();
+
+        if (head.hasNext())
+            processedRanges.add(head);
+        else
+            FileUtils.closeQuietly(head);
+
+        while (!ranges.isEmpty())
+        {
+            // peek here instead of poll is an optimization
+            // so we can re-insert less ranges back if candidate
+            // is less than head of the current range.
+            RangeIterator<K, D> range = ranges.peek();
+
+            int cmp = candidate.get().compareTo(range.getCurrent());
+
+            assert cmp <= 0;
+
+            if (cmp < 0)
+            {
+                break; // candidate is smaller than next token, return immediately
+            }
+            else if (cmp == 0)
+            {
+                candidate.merge(range.next()); // consume and merge
+
+                range = ranges.poll();
+                // re-prioritize changed range
+
+                if (range.hasNext())
+                    processedRanges.add(range);
+                else
+                    FileUtils.closeQuietly(range);
+            }
+        }
+
+        ranges.addAll(processedRanges);
+        return candidate;
+    }
+
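+    // drop ranges that end before the skip target, fast-forward the rest and re-insert them
+    // so the queue is re-ordered on their new current tokens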
+    protected void performSkipTo(K nextToken)
+    {
+        List<RangeIterator<K, D>> changedRanges = new ArrayList<>();
+
+        while (!ranges.isEmpty())
+        {
+            if (ranges.peek().getCurrent().compareTo(nextToken) >= 0)
+                break;
+
+            RangeIterator<K, D> head = ranges.poll();
+
+            if (head.getMaximum().compareTo(nextToken) >= 0)
+            {
+                head.skipTo(nextToken);
+                changedRanges.add(head);
+                continue;
+            }
+
+            FileUtils.closeQuietly(head);
+        }
+
+        ranges.addAll(changedRanges.stream().collect(Collectors.toList()));
+    }
+
+    public void close() throws IOException
+    {
+        ranges.forEach(FileUtils::closeQuietly);
+    }
+
+    public static <K extends Comparable<K>, D extends CombinedValue<K>> Builder<K, D> builder()
+    {
+        return new Builder<>();
+    }
+
+    public static <K extends Comparable<K>, D extends CombinedValue<K>> RangeIterator<K, D> build(List<RangeIterator<K, D>> tokens)
+    {
+        return new Builder<K, D>().add(tokens).build();
+    }
+
+    public static class Builder<K extends Comparable<K>, D extends CombinedValue<K>> extends RangeIterator.Builder<K, D>
+    {
+        public Builder()
+        {
+            super(IteratorType.UNION);
+        }
+
+        protected RangeIterator<K, D> buildIterator()
+        {
+            return new RangeUnionIterator<>(statistics, ranges);
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/utils/TypeUtil.java b/src/java/org/apache/cassandra/index/sasi/utils/TypeUtil.java
new file mode 100644
index 0000000..8b38530
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/utils/TypeUtil.java
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.utils;
+
+import java.nio.ByteBuffer;
+
+import org.apache.cassandra.db.marshal.*;
+import org.apache.cassandra.serializers.MarshalException;
+
+public class TypeUtil
+{
+    public static boolean isValid(ByteBuffer term, AbstractType<?> validator)
+    {
+        try
+        {
+            validator.validate(term);
+            return true;
+        }
+        catch (MarshalException e)
+        {
+            return false;
+        }
+    }
+
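+    /**
+     * Attempts to re-encode a term under a wider compatible type (a 2-byte value as int,
+     * a 2/4-byte value as long, a 4-byte float as double), falling back to parsing the term
+     * as a UTF-8 string with the validator; returns null if no upcast is possible.
+     */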
+    public static ByteBuffer tryUpcast(ByteBuffer term, AbstractType<?> validator)
+    {
+        if (term.remaining() == 0)
+            return null;
+
+        try
+        {
+            if (validator instanceof Int32Type && term.remaining() == 2)
+            {
+                return Int32Type.instance.decompose((int) term.getShort(term.position()));
+            }
+            else if (validator instanceof LongType)
+            {
+                long upcastToken;
+
+                switch (term.remaining())
+                {
+                    case 2:
+                        upcastToken = (long) term.getShort(term.position());
+                        break;
+
+                    case 4:
+                        upcastToken = (long) Int32Type.instance.compose(term);
+                        break;
+
+                    default:
+                        upcastToken = Long.valueOf(UTF8Type.instance.getString(term));
+                }
+
+                return LongType.instance.decompose(upcastToken);
+            }
+            else if (validator instanceof DoubleType && term.remaining() == 4)
+            {
+                return DoubleType.instance.decompose((double) FloatType.instance.compose(term));
+            }
+
+            // maybe it was a string after all
+            return validator.fromString(UTF8Type.instance.getString(term));
+        }
+        catch (Exception e)
+        {
+            return null;
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/utils/trie/AbstractPatriciaTrie.java b/src/java/org/apache/cassandra/index/sasi/utils/trie/AbstractPatriciaTrie.java
new file mode 100644
index 0000000..b359416
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/utils/trie/AbstractPatriciaTrie.java
@@ -0,0 +1,1151 @@
+/*
+ * Copyright 2005-2010 Roger Kapsi, Sam Berlin
+ *
+ *   Licensed under the Apache License, Version 2.0 (the "License");
+ *   you may not use this file except in compliance with the License.
+ *   You may obtain a copy of the License at
+ *
+ *       http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *   Unless required by applicable law or agreed to in writing, software
+ *   distributed under the License is distributed on an "AS IS" BASIS,
+ *   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *   See the License for the specific language governing permissions and
+ *   limitations under the License.
+ */
+
+/**
+ * This class is taken from https://github.com/rkapsi/patricia-trie (v0.6) and slightly modified
+ * to correspond to Cassandra code style. It is the only Patricia Trie implementation
+ * that supports pluggable key comparators (e.g. commons-collections PatriciaTrie, which is based
+ * on the rkapsi/patricia-trie project, only supports String keys),
+ * but unfortunately it is not deployed to Maven Central as a downloadable artifact.
+ */
+
+package org.apache.cassandra.index.sasi.utils.trie;
+
+import java.util.AbstractCollection;
+import java.util.AbstractSet;
+import java.util.Collection;
+import java.util.ConcurrentModificationException;
+import java.util.Iterator;
+import java.util.Map;
+import java.util.NoSuchElementException;
+import java.util.Set;
+
+import org.apache.cassandra.index.sasi.utils.trie.Cursor.Decision;
+
+/**
+ * This class implements the base PATRICIA algorithm and everything that
+ * is related to the {@link Map} interface.
+ */
+abstract class AbstractPatriciaTrie<K, V> extends AbstractTrie<K, V>
+{
+    private static final long serialVersionUID = -2303909182832019043L;
+
+    /**
+     * The root node of the {@link Trie}. 
+     */
+    final TrieEntry<K, V> root = new TrieEntry<>(null, null, -1);
+    
+    /**
+     * Each of these fields is initialized to contain an instance of the
+     * appropriate view the first time this view is requested. The views are
+     * stateless, so there's no reason to create more than one of each.
+     */
+    private transient volatile Set<K> keySet;
+    private transient volatile Collection<V> values;
+    private transient volatile Set<Map.Entry<K,V>> entrySet;
+    
+    /**
+     * The current size of the {@link Trie}
+     */
+    private int size = 0;
+    
+    /**
+     * The number of times this {@link Trie} has been modified.
+     * It's used to detect concurrent modifications and fail-fast
+     * the {@link Iterator}s.
+     */
+    transient int modCount = 0;
+    
+    public AbstractPatriciaTrie(KeyAnalyzer<? super K> keyAnalyzer)
+    {
+        super(keyAnalyzer);
+    }
+    
+    public AbstractPatriciaTrie(KeyAnalyzer<? super K> keyAnalyzer, Map<? extends K, ? extends V> m)
+    {
+        super(keyAnalyzer);
+        putAll(m);
+    }
+    
+    @Override
+    public void clear()
+    {
+        root.key = null;
+        root.bitIndex = -1;
+        root.value = null;
+        
+        root.parent = null;
+        root.left = root;
+        root.right = null;
+        root.predecessor = root;
+        
+        size = 0;
+        incrementModCount();
+    }
+    
+    @Override
+    public int size()
+    {
+        return size;
+    }
+   
+    /**
+     * A helper method to increment the {@link Trie} size
+     * and the modification counter.
+     */
+    void incrementSize()
+    {
+        size++;
+        incrementModCount();
+    }
+    
+    /**
+     * A helper method to decrement the {@link Trie} size
+     * and increment the modification counter.
+     */
+    void decrementSize()
+    {
+        size--;
+        incrementModCount();
+    }
+    
+    /**
+     * A helper method to increment the modification counter.
+     */
+    private void incrementModCount()
+    {
+        ++modCount;
+    }
+    
+    @Override
+    public V put(K key, V value)
+    {
+        if (key == null)
+            throw new NullPointerException("Key cannot be null");
+        
+        int lengthInBits = lengthInBits(key);
+        
+        // The only place to store a key with a length
+        // of zero bits is the root node
+        if (lengthInBits == 0)
+        {
+            if (root.isEmpty())
+                incrementSize();
+            else
+                incrementModCount();
+
+            return root.setKeyValue(key, value);
+        }
+        
+        TrieEntry<K, V> found = getNearestEntryForKey(key);
+        if (compareKeys(key, found.key))
+        {
+            if (found.isEmpty()) // <- must be the root
+                incrementSize();
+            else
+                incrementModCount();
+
+            return found.setKeyValue(key, value);
+        }
+        
+        int bitIndex = bitIndex(key, found.key);
+        if (!Tries.isOutOfBoundsIndex(bitIndex))
+        {
+            if (Tries.isValidBitIndex(bitIndex)) // in 99.999...9% of cases
+            {
+                /* NEW KEY+VALUE TUPLE */
+                TrieEntry<K, V> t = new TrieEntry<>(key, value, bitIndex);
+                addEntry(t);
+                incrementSize();
+                return null;
+            }
+            else if (Tries.isNullBitKey(bitIndex))
+            {
+                // All bits of the Key are zero. The only place to
+                // store such a Key is the root Node!
+                
+                /* NULL BIT KEY */
+                if (root.isEmpty())
+                    incrementSize();
+                else
+                    incrementModCount();
+
+                return root.setKeyValue(key, value);
+                
+            }
+            else if (Tries.isEqualBitKey(bitIndex))
+            {
+                // This is a very special and rare case.
+                
+                /* REPLACE OLD KEY+VALUE */
+                if (found != root)
+                {
+                    incrementModCount();
+                    return found.setKeyValue(key, value);
+                }
+            }
+        }
+        
+        throw new IndexOutOfBoundsException("Failed to put: " 
+                + key + " -> " + value + ", " + bitIndex);
+    }
+    
+    /**
+     * Adds the given {@link TrieEntry} to the {@link Trie}
+     */
+    TrieEntry<K, V> addEntry(TrieEntry<K, V> entry)
+    {
+        TrieEntry<K, V> current = root.left;
+        TrieEntry<K, V> path = root;
+
+        while(true)
+        {
+            if (current.bitIndex >= entry.bitIndex || current.bitIndex <= path.bitIndex)
+            {
+                entry.predecessor = entry;
+                
+                if (!isBitSet(entry.key, entry.bitIndex))
+                {
+                    entry.left = entry;
+                    entry.right = current;
+                }
+                else
+                {
+                    entry.left = current;
+                    entry.right = entry;
+                }
+               
+                entry.parent = path;
+                if (current.bitIndex >= entry.bitIndex)
+                    current.parent = entry;
+                
+                // if we inserted an uplink, set the predecessor on it
+                if (current.bitIndex <= path.bitIndex)
+                    current.predecessor = entry;
+         
+                if (path == root || !isBitSet(entry.key, path.bitIndex))
+                    path.left = entry;
+                else
+                    path.right = entry;
+                
+                return entry;
+            }
+                
+            path = current;
+            
+            current = !isBitSet(entry.key, current.bitIndex)
+                       ? current.left : current.right;
+        }
+    }
+    
+    @Override
+    public V get(Object k)
+    {
+        TrieEntry<K, V> entry = getEntry(k);
+        return entry != null ? entry.getValue() : null;
+    }
+
+    /**
+     * Returns the entry associated with the specified key in the
+     * AbstractPatriciaTrie.  Returns null if the map contains no mapping
+     * for this key.
+     * 
+     * This may throw ClassCastException if the object is not of type K.
+     */
+    TrieEntry<K,V> getEntry(Object k)
+    {
+        K key = Tries.cast(k);
+        if (key == null)
+            return null;
+        
+        TrieEntry<K,V> entry = getNearestEntryForKey(key);
+        return !entry.isEmpty() && compareKeys(key, entry.key) ? entry : null;
+    }
+    
+    @Override
+    public Map.Entry<K, V> select(K key)
+    {
+        Reference<Map.Entry<K, V>> reference = new Reference<>();
+        return !selectR(root.left, -1, key, reference) ? reference.get() : null;
+    }
+    
+    @Override
+    public Map.Entry<K,V> select(K key, Cursor<? super K, ? super V> cursor)
+    {
+        Reference<Map.Entry<K, V>> reference = new Reference<>();
+        selectR(root.left, -1, key, cursor, reference);
+        return reference.get();
+    }
+
+    /**
+     * This is equivalent to the other {@link #selectR(TrieEntry, int,
+     * K, Cursor, Reference)} method but without its overhead
+     * because we're selecting only one best matching Entry from the
+     * {@link Trie}.
+     */
+    private boolean selectR(TrieEntry<K, V> h, int bitIndex, final K key, final Reference<Map.Entry<K, V>> reference)
+    {
+        if (h.bitIndex <= bitIndex)
+        {
+            // If we hit the root Node and it is empty
+            // we have to look for an alternative best
+            // matching node.
+            if (!h.isEmpty())
+            {
+                reference.set(h);
+                return false;
+            }
+            return true;
+        }
+
+        if (!isBitSet(key, h.bitIndex))
+        {
+            if (selectR(h.left, h.bitIndex, key, reference))
+            {
+                return selectR(h.right, h.bitIndex, key, reference);
+            }
+        }
+        else
+        {
+            if (selectR(h.right, h.bitIndex, key, reference))
+            {
+                return selectR(h.left, h.bitIndex, key, reference);
+            }
+        }
+
+        return false;
+    }
+    
+    /**
+     * Same as {@link #selectR(TrieEntry, int, Object, Reference)}, but consults the
+     * given {@link Cursor} on each visited entry to decide whether to continue,
+     * exit, or remove it.
+     */
+    private boolean selectR(TrieEntry<K,V> h, int bitIndex, 
+                            final K key, final Cursor<? super K, ? super V> cursor,
+                            final Reference<Map.Entry<K, V>> reference)
+    {
+        if (h.bitIndex <= bitIndex)
+        {
+            if (!h.isEmpty())
+            {
+                Decision decision = cursor.select(h);
+                switch(decision)
+                {
+                    case REMOVE:
+                        throw new UnsupportedOperationException("Cannot remove during select");
+
+                    case EXIT:
+                        reference.set(h);
+                        return false; // exit
+
+                    case REMOVE_AND_EXIT:
+                        TrieEntry<K, V> entry = new TrieEntry<>(h.getKey(), h.getValue(), -1);
+                        reference.set(entry);
+                        removeEntry(h);
+                        return false;
+
+                    case CONTINUE:
+                        // fall through.
+                }
+            }
+
+            return true; // continue
+        }
+
+        if (!isBitSet(key, h.bitIndex))
+        {
+            if (selectR(h.left, h.bitIndex, key, cursor, reference))
+            {
+                return selectR(h.right, h.bitIndex, key, cursor, reference);
+            }
+        }
+        else
+        {
+            if (selectR(h.right, h.bitIndex, key, cursor, reference))
+            {
+                return selectR(h.left, h.bitIndex, key, cursor, reference);
+            }
+        }
+        
+        return false;
+    }
+    
+    @Override
+    public Map.Entry<K, V> traverse(Cursor<? super K, ? super V> cursor)
+    {
+        TrieEntry<K, V> entry = nextEntry(null);
+        while (entry != null)
+        {
+            TrieEntry<K, V> current = entry;
+            
+            Decision decision = cursor.select(current);
+            entry = nextEntry(current);
+            
+            switch(decision)
+            {
+                case EXIT:
+                    return current;
+
+                case REMOVE:
+                    removeEntry(current);
+                    break; // out of switch, stay in while loop
+
+                case REMOVE_AND_EXIT:
+                    Map.Entry<K, V> value = new TrieEntry<>(current.getKey(), current.getValue(), -1);
+                    removeEntry(current);
+                    return value;
+
+                case CONTINUE: // do nothing.
+            }
+        }
+        
+        return null;
+    }
+    
+    @Override
+    public boolean containsKey(Object k)
+    {
+        if (k == null)
+            return false;
+        
+        K key = Tries.cast(k);
+        TrieEntry<K, V> entry = getNearestEntryForKey(key);
+        return !entry.isEmpty() && compareKeys(key, entry.key);
+    }
+    
+    @Override
+    public Set<Map.Entry<K,V>> entrySet()
+    {
+        if (entrySet == null)
+            entrySet = new EntrySet();
+
+        return entrySet;
+    }
+    
+    @Override
+    public Set<K> keySet()
+    {
+        if (keySet == null)
+            keySet = new KeySet();
+        return keySet;
+    }
+    
+    @Override
+    public Collection<V> values()
+    {
+        if (values == null)
+            values = new Values();
+        return values;
+    }
+    
+    /**
+     * {@inheritDoc}
+     * 
+     * @throws ClassCastException if provided key is of an incompatible type 
+     */
+    @Override
+    public V remove(Object k)
+    {
+        if (k == null)
+            return null;
+        
+        K key = Tries.cast(k);
+        TrieEntry<K, V> current = root.left;
+        TrieEntry<K, V> path = root;
+        while (true)
+        {
+            if (current.bitIndex <= path.bitIndex)
+            {
+                if (!current.isEmpty() && compareKeys(key, current.key))
+                {
+                    return removeEntry(current);
+                }
+                else
+                {
+                    return null;
+                }
+            }
+            
+            path = current;
+            current = !isBitSet(key, current.bitIndex) ? current.left : current.right;
+        }
+    }
+    
+    /**
+     * Returns the nearest entry for a given key.  This is useful
+     * for knowing whether a given key exists (and for finding its
+     * value), or for inserting the key.
+     * 
+     * The actual get implementation. This is very similar to
+     * selectR but with the exception that it might return the
+     * root Entry even if it's empty.
+     */
+    TrieEntry<K, V> getNearestEntryForKey(K key)
+    {
+        TrieEntry<K, V> current = root.left;
+        TrieEntry<K, V> path = root;
+
+        while(true)
+        {
+            if (current.bitIndex <= path.bitIndex)
+                return current;
+            
+            path = current;
+            current = !isBitSet(key, current.bitIndex) ? current.left : current.right;
+        }
+    }
+    
+    /**
+     * Removes a single entry from the {@link Trie}.
+     * 
+     * If we found a Key (Entry h) then figure out if it's
+     * an internal (hard to remove) or external Entry (easy 
+     * to remove)
+     */
+    V removeEntry(TrieEntry<K, V> h)
+    {
+        if (h != root)
+        {
+            if (h.isInternalNode())
+            {
+                removeInternalEntry(h);
+            }
+            else
+            {
+                removeExternalEntry(h);
+            }
+        }
+        
+        decrementSize();
+        return h.setKeyValue(null, null);
+    }
+    
+    /**
+     * Removes an external entry from the {@link Trie}.
+     * 
+     * If it's an external Entry then just remove it.
+     * This is very easy and straightforward.
+     */
+    private void removeExternalEntry(TrieEntry<K, V> h)
+    {
+        if (h == root)
+        {
+            throw new IllegalArgumentException("Cannot delete root Entry!");
+        }
+        else if (!h.isExternalNode())
+        {
+            throw new IllegalArgumentException(h + " is not an external Entry!");
+        } 
+        
+        TrieEntry<K, V> parent = h.parent;
+        TrieEntry<K, V> child = (h.left == h) ? h.right : h.left;
+        
+        if (parent.left == h)
+        {
+            parent.left = child;
+        }
+        else
+        {
+            parent.right = child;
+        }
+        
+        // either the parent is changing, or the predecessor is changing.
+        if (child.bitIndex > parent.bitIndex)
+        {
+            child.parent = parent;
+        }
+        else
+        {
+            child.predecessor = parent;
+        }
+        
+    }
+    
+    /**
+     * Removes an internal entry from the {@link Trie}.
+     * 
+     * If it's an internal Entry then "good luck" with understanding
+     * this code. The idea is essentially that Entry p takes Entry h's
+     * place in the trie which requires some re-wiring.
+     */
+    private void removeInternalEntry(TrieEntry<K, V> h)
+    {
+        if (h == root)
+        {
+            throw new IllegalArgumentException("Cannot delete root Entry!");
+        }
+        else if (!h.isInternalNode())
+        {
+            throw new IllegalArgumentException(h + " is not an internal Entry!");
+        } 
+        
+        TrieEntry<K, V> p = h.predecessor;
+        
+        // Set P's bitIndex
+        p.bitIndex = h.bitIndex;
+        
+        // Fix P's parent, predecessor and child Nodes
+        {
+            TrieEntry<K, V> parent = p.parent;
+            TrieEntry<K, V> child = (p.left == h) ? p.right : p.left;
+            
+            // If it was looping to itself previously,
+            // it will now be pointed to by its parent
+            // (unless we are removing its parent --
+            //  in that case, it keeps looping to itself);
+            // otherwise, it will continue to have the same
+            // predecessor.
+            if (p.predecessor == p && p.parent != h)
+                p.predecessor = p.parent;
+            
+            if (parent.left == p)
+            {
+                parent.left = child;
+            }
+            else
+            {
+                parent.right = child;
+            }
+            
+            if (child.bitIndex > parent.bitIndex)
+            {
+                child.parent = parent;
+            }
+        }
+        
+        // Fix H's parent and child Nodes
+        {         
+            // If H is a parent of its left and right child 
+            // then change them to P
+            if (h.left.parent == h)
+                h.left.parent = p;
+
+            if (h.right.parent == h)
+                h.right.parent = p;
+            
+            // Change H's parent
+            if (h.parent.left == h)
+            {
+                h.parent.left = p;
+            }
+            else
+            {
+                h.parent.right = p;
+            }
+        }
+        
+        // Copy the remaining fields from H to P
+        //p.bitIndex = h.bitIndex;
+        p.parent = h.parent;
+        p.left = h.left;
+        p.right = h.right;
+        
+        // Make sure that if h was pointing to any uplinks,
+        // p now points to them.
+        if (isValidUplink(p.left, p))
+            p.left.predecessor = p;
+        
+        if (isValidUplink(p.right, p))
+            p.right.predecessor = p;
+    }
+    
+    /**
+     * Returns the entry lexicographically after the given entry.
+     * If the given entry is null, returns the first node.
+     */
+    TrieEntry<K, V> nextEntry(TrieEntry<K, V> node)
+    {
+        return (node == null) ? firstEntry() : nextEntryImpl(node.predecessor, node, null);
+    }
+    
+    /**
+     * Scans for the next node, starting at the specified point, and using 'previous'
+     * as a hint that the last node we returned was 'previous' (so we know not to return
+     * it again).  If 'tree' is non-null, this will limit the search to the given tree.
+     * 
+     * The basic premise is that each iteration can follow the following steps:
+     * 
+     * 1) Scan all the way to the left.
+     *   a) If we already started from this node last time, proceed to Step 2.
+     *   b) If a valid uplink is found, use it.
+     *   c) If the result is an empty node (root not set), break the scan.
+     *   d) If we already returned the left node, break the scan.
+     *   
+     * 2) Check the right.
+     *   a) If we already returned the right node, proceed to Step 3.
+     *   b) If it is a valid uplink, use it.
+     *   c) Do Step 1 from the right node.
+     *   
+     * 3) Back up through the parents until we find a parent
+     *    that we're not the right child of.
+     *    
+     * 4) If there's no right child of that parent, the iteration is finished.
+     *    Otherwise continue to Step 5.
+     * 
+     * 5) Check to see if the right child is a valid uplink.
+     *    a) If we already returned that child, proceed to Step 6.
+     *       Otherwise, use it.
+     *    
+     * 6) If the right child of the parent is the parent itself, we've
+     *    already found & returned the end of the Trie, so exit.
+     *    
+     * 7) Do Step 1 on the parent's right child.
+     */
+    TrieEntry<K, V> nextEntryImpl(TrieEntry<K, V> start, TrieEntry<K, V> previous, TrieEntry<K, V> tree)
+    {
+        TrieEntry<K, V> current = start;
+
+        // Only look at the left if this was a recursive call or
+        // the first check, otherwise we know we've already looked
+        // at the left.
+        if (previous == null || start != previous.predecessor)
+        {
+            while (!current.left.isEmpty())
+            {
+                // stop traversing if we've already
+                // returned the left of this node.
+                if (previous == current.left)
+                    break;
+                
+                if (isValidUplink(current.left, current))
+                    return current.left;
+                
+                current = current.left;
+            }
+        }
+        
+        // If there's no data at all, exit.
+        if (current.isEmpty())
+            return null;
+        
+        // If we've already returned the left,
+        // and the immediate right is null,
+        // there's only one entry in the Trie
+        // which is stored at the root.
+        //
+        //  / ("")   <-- root
+        //  \_/  \
+        //       null <-- 'current'
+        //
+        if (current.right == null)
+            return null;
+        
+        // If nothing valid on the left, try the right.
+        if (previous != current.right)
+        {
+            // See if it immediately is valid.
+            if (isValidUplink(current.right, current))
+                return current.right;
+            
+            // Must search on the right side if it wasn't initially valid.
+            return nextEntryImpl(current.right, previous, tree);
+        }
+        
+        // Neither left nor right are valid, find the first parent
+        // whose child did not come from the right & traverse it.
+        while (current == current.parent.right)
+        {
+            // If we're going to traverse to above the subtree, stop.
+            if (current == tree)
+                return null;
+            
+            current = current.parent;
+        }
+
+        // If we're on the top of the subtree, we can't go any higher.
+        if (current == tree)
+            return null;
+        
+        // If there's no right, the parent must be root, so we're done.
+        if (current.parent.right == null)
+            return null;
+        
+        // If the parent's right is a valid uplink we haven't returned yet, we've found one.
+        if (previous != current.parent.right && isValidUplink(current.parent.right, current.parent))
+            return current.parent.right;
+        
+        // If the parent's right is itself, there can't be any more nodes.
+        if (current.parent.right == current.parent)
+            return null;
+        
+        // We need to traverse down the parent's right's path.
+        return nextEntryImpl(current.parent.right, previous, tree);
+    }
+    
+    /**
+     * Returns the first entry the {@link Trie} is storing.
+     * 
+     * This is implemented by going always to the left until
+     * we encounter a valid uplink. That uplink is the first key.
+     */
+    TrieEntry<K, V> firstEntry()
+    {
+        // if Trie is empty, no first node.
+        return isEmpty() ? null : followLeft(root);
+    }
+    
+    /** 
+     * Goes left through the tree until it finds a valid node. 
+     */
+    TrieEntry<K, V> followLeft(TrieEntry<K, V> node)
+    {
+        while(true)
+        {
+            TrieEntry<K, V> child = node.left;
+            // if we hit root and it didn't have a node, go right instead.
+            if (child.isEmpty())
+                child = node.right;
+            
+            if (child.bitIndex <= node.bitIndex)
+                return child;
+            
+            node = child;
+        }
+    }
+    
+    /** 
+     * Returns true if 'next' is a valid uplink coming from 'from'. 
+     */
+    static boolean isValidUplink(TrieEntry<?, ?> next, TrieEntry<?, ?> from)
+    {
+        return next != null && next.bitIndex <= from.bitIndex && !next.isEmpty();
+    }
+    
+    /**
+     * A {@link Reference} allows us to return something through a method's 
+     * argument list. An alternative would be to use an Array with a length of 
+     * one (1) but that leads to compiler warnings. Computationally and 
+     * memory-wise there's no difference (except for the need to load the 
+     * {@link Reference} class but that happens only once).
+     */
+    private static class Reference<E>
+    {
+        
+        private E item;
+        
+        public void set(E item)
+        {
+            this.item = item;
+        }
+        
+        public E get()
+        {
+            return item;
+        }
+    }
+    
+    /**
+     *  A {@link Trie} is a set of {@link TrieEntry} nodes
+     */
+    static class TrieEntry<K,V> extends BasicEntry<K, V>
+    {
+        
+        private static final long serialVersionUID = 4596023148184140013L;
+        
+        /** The index this entry is comparing. */
+        protected int bitIndex;
+        
+        /** The parent of this entry. */
+        protected TrieEntry<K,V> parent;
+        
+        /** The left child of this entry. */
+        protected TrieEntry<K,V> left;
+        
+        /** The right child of this entry. */
+        protected TrieEntry<K,V> right;
+        
+        /** The entry that uplinks to this entry. */ 
+        protected TrieEntry<K,V> predecessor;
+        
+        public TrieEntry(K key, V value, int bitIndex)
+        {
+            super(key, value);
+            
+            this.bitIndex = bitIndex;
+            
+            this.parent = null;
+            this.left = this;
+            this.right = null;
+            this.predecessor = this;
+        }
+        
+        /**
+         * Whether or not the entry is storing a key.
+         * Only the root can potentially be empty; all other
+         * nodes must have a key.
+         */
+        public boolean isEmpty()
+        {
+            return key == null;
+        }
+        
+        /** 
+         * Neither the left nor right child is a loopback 
+         */
+        public boolean isInternalNode()
+        {
+            return left != this && right != this;
+        }
+        
+        /** 
+         * Either the left or right child is a loopback 
+         */
+        public boolean isExternalNode()
+        {
+            return !isInternalNode();
+        }
+    }
+    
+
+    /**
+     * This is an entry set view of the {@link Trie} as returned 
+     * by {@link Map#entrySet()}
+     */
+    private class EntrySet extends AbstractSet<Map.Entry<K,V>>
+    {
+        @Override
+        public Iterator<Map.Entry<K,V>> iterator()
+        {
+            return new EntryIterator();
+        }
+        
+        @Override
+        public boolean contains(Object o)
+        {
+            if (!(o instanceof Map.Entry))
+                return false;
+            
+            TrieEntry<K,V> candidate = getEntry(((Map.Entry<?, ?>)o).getKey());
+            return candidate != null && candidate.equals(o);
+        }
+        
+        @Override
+        public boolean remove(Object o)
+        {
+            int size = size();
+            AbstractPatriciaTrie.this.remove(o);
+            return size != size();
+        }
+        
+        @Override
+        public int size()
+        {
+            return AbstractPatriciaTrie.this.size();
+        }
+        
+        @Override
+        public void clear()
+        {
+            AbstractPatriciaTrie.this.clear();
+        }
+        
+        /**
+         * An {@link Iterator} that returns {@link Entry} Objects
+         */
+        private class EntryIterator extends TrieIterator<Map.Entry<K,V>>
+        {
+            @Override
+            public Map.Entry<K,V> next()
+            {
+                return nextEntry();
+            }
+        }
+    }
+    
+    /**
+     * This is a key set view of the {@link Trie} as returned 
+     * by {@link Map#keySet()}
+     */
+    private class KeySet extends AbstractSet<K>
+    {
+        @Override
+        public Iterator<K> iterator()
+        {
+            return new KeyIterator();
+        }
+        
+        @Override
+        public int size()
+        {
+            return AbstractPatriciaTrie.this.size();
+        }
+        
+        @Override
+        public boolean contains(Object o)
+        {
+            return containsKey(o);
+        }
+        
+        @Override
+        public boolean remove(Object o)
+        {
+            int size = size();
+            AbstractPatriciaTrie.this.remove(o);
+            return size != size();
+        }
+        
+        @Override
+        public void clear()
+        {
+            AbstractPatriciaTrie.this.clear();
+        }
+        
+        /**
+         * An {@link Iterator} that returns Key Objects
+         */
+        private class KeyIterator extends TrieIterator<K>
+        {
+            @Override
+            public K next()
+            {
+                return nextEntry().getKey();
+            }
+        }
+    }
+    
+    /**
+     * This is a value view of the {@link Trie} as returned 
+     * by {@link Map#values()}
+     */
+    private class Values extends AbstractCollection<V>
+    {
+        @Override
+        public Iterator<V> iterator()
+        {
+            return new ValueIterator();
+        }
+        
+        @Override
+        public int size()
+        {
+            return AbstractPatriciaTrie.this.size();
+        }
+        
+        @Override
+        public boolean contains(Object o)
+        {
+            return containsValue(o);
+        }
+        
+        @Override
+        public void clear()
+        {
+            AbstractPatriciaTrie.this.clear();
+        }
+        
+        @Override
+        public boolean remove(Object o)
+        {
+            for (Iterator<V> it = iterator(); it.hasNext(); )
+            {
+                V value = it.next();
+                if (Tries.areEqual(value, o))
+                {
+                    it.remove();
+                    return true;
+                }
+            }
+            return false;
+        }
+        
+        /**
+         * An {@link Iterator} that returns Value Objects
+         */
+        private class ValueIterator extends TrieIterator<V>
+        {
+            @Override
+            public V next()
+            {
+                return nextEntry().getValue();
+            }
+        }
+    }
+    
+    /** 
+     * An iterator for the entries. 
+     */
+    abstract class TrieIterator<E> implements Iterator<E>
+    {
+        /**
+         * For fast-fail
+         */
+        protected int expectedModCount = AbstractPatriciaTrie.this.modCount;
+        
+        protected TrieEntry<K, V> next; // the next node to return
+        protected TrieEntry<K, V> current; // the current entry we're on
+        
+        /**
+         * Starts iteration from the root
+         */
+        protected TrieIterator()
+        {
+            next = AbstractPatriciaTrie.this.nextEntry(null);
+        }
+        
+        /**
+         * Starts iteration at the given entry
+         */
+        protected TrieIterator(TrieEntry<K, V> firstEntry)
+        {
+            next = firstEntry;
+        }
+        
+        /**
+         * Returns the next {@link TrieEntry}
+         */
+        protected TrieEntry<K,V> nextEntry()
+        {
+            if (expectedModCount != AbstractPatriciaTrie.this.modCount)
+                throw new ConcurrentModificationException();
+            
+            TrieEntry<K,V> e = next;
+            if (e == null)
+                throw new NoSuchElementException();
+            
+            next = findNext(e);
+            current = e;
+            return e;
+        }
+        
+        /**
+         * @see AbstractPatriciaTrie#nextEntry(TrieEntry)
+         */
+        protected TrieEntry<K, V> findNext(TrieEntry<K, V> prior)
+        {
+            return AbstractPatriciaTrie.this.nextEntry(prior);
+        }
+        
+        @Override
+        public boolean hasNext()
+        {
+            return next != null;
+        }
+        
+        @Override
+        public void remove()
+        {
+            if (current == null)
+                throw new IllegalStateException();
+            
+            if (expectedModCount != AbstractPatriciaTrie.this.modCount)
+                throw new ConcurrentModificationException();
+            
+            TrieEntry<K, V> node = current;
+            current = null;
+            AbstractPatriciaTrie.this.removeEntry(node);
+            
+            expectedModCount = AbstractPatriciaTrie.this.modCount;
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/utils/trie/AbstractTrie.java b/src/java/org/apache/cassandra/index/sasi/utils/trie/AbstractTrie.java
new file mode 100644
index 0000000..0bf9c20
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/utils/trie/AbstractTrie.java
@@ -0,0 +1,230 @@
+/*
+ * Copyright 2005-2010 Roger Kapsi, Sam Berlin
+ *
+ *   Licensed under the Apache License, Version 2.0 (the "License");
+ *   you may not use this file except in compliance with the License.
+ *   You may obtain a copy of the License at
+ *
+ *       http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *   Unless required by applicable law or agreed to in writing, software
+ *   distributed under the License is distributed on an "AS IS" BASIS,
+ *   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *   See the License for the specific language governing permissions and
+ *   limitations under the License.
+ */
+
+package org.apache.cassandra.index.sasi.utils.trie;
+
+import java.io.Serializable;
+import java.util.AbstractMap;
+import java.util.Map;
+
+/**
+ * This class is taken from https://github.com/rkapsi/patricia-trie (v0.6) and slightly modified
+ * to match Cassandra code style. It is the only Patricia Trie implementation that supports
+ * pluggable key comparators (e.g. the commons-collections PatriciaTrie, which is based on the
+ * rkapsi/patricia-trie project, only supports String keys), but it is unfortunately not
+ * published to Maven Central as a downloadable artifact.
+ */
+
+/**
+ * This class provides some basic {@link Trie} functionality and 
+ * utility methods for actual {@link Trie} implementations.
+ */
+abstract class AbstractTrie<K, V> extends AbstractMap<K, V> implements Serializable, Trie<K, V>
+{
+    private static final long serialVersionUID = -6358111100045408883L;
+    
+    /**
+     * The {@link KeyAnalyzer} that's being used to build the 
+     * PATRICIA {@link Trie}
+     */
+    protected final KeyAnalyzer<? super K> keyAnalyzer;
+    
+    /** 
+     * Constructs a new {@link Trie} using the given {@link KeyAnalyzer} 
+     */
+    public AbstractTrie(KeyAnalyzer<? super K> keyAnalyzer)
+    {
+        this.keyAnalyzer = Tries.notNull(keyAnalyzer, "keyAnalyzer");
+    }
+    
+    @Override
+    public K selectKey(K key)
+    {
+        Map.Entry<K, V> entry = select(key);
+        return entry != null ? entry.getKey() : null;
+    }
+    
+    @Override
+    public V selectValue(K key)
+    {
+        Map.Entry<K, V> entry = select(key);
+        return entry != null ? entry.getValue() : null;
+    }
+        
+    @Override
+    public String toString()
+    {
+        StringBuilder buffer = new StringBuilder();
+        buffer.append("Trie[").append(size()).append("]={\n");
+        for (Map.Entry<K, V> entry : entrySet())
+        {
+            buffer.append("  ").append(entry).append("\n");
+        }
+        buffer.append("}\n");
+        return buffer.toString();
+    }
+    
+    /**
+     * Returns the length of the given key in bits
+     * 
+     * @see KeyAnalyzer#lengthInBits(Object)
+     */
+    final int lengthInBits(K key)
+    {
+        return key == null ? 0 : keyAnalyzer.lengthInBits(key);
+    }
+    
+    /**
+     * Returns whether or not the given bit of the 
+     * key is set, or false if the key is null.
+     * 
+     * @see KeyAnalyzer#isBitSet(Object, int)
+     */
+    final boolean isBitSet(K key, int bitIndex)
+    {
+        return key != null && keyAnalyzer.isBitSet(key, bitIndex);
+    }
+    
+    /**
+     * Utility method for calling {@link KeyAnalyzer#bitIndex(Object, Object)}
+     */
+    final int bitIndex(K key, K otherKey)
+    {
+        if (key != null && otherKey != null)
+        {
+            return keyAnalyzer.bitIndex(key, otherKey);            
+        }
+        else if (key != null)
+        {
+            return bitIndex(key);
+        }
+        else if (otherKey != null)
+        {
+            return bitIndex(otherKey);
+        }
+        
+        return KeyAnalyzer.NULL_BIT_KEY;
+    }
+    
+    private int bitIndex(K key)
+    {
+        int lengthInBits = lengthInBits(key);
+        for (int i = 0; i < lengthInBits; i++)
+        {
+            if (isBitSet(key, i))
+                return i;
+        }
+        
+        return KeyAnalyzer.NULL_BIT_KEY;
+    }
+    
+    /**
+     * A utility method for calling {@link KeyAnalyzer#compare(Object, Object)}
+     */
+    final boolean compareKeys(K key, K other)
+    {
+        if (key == null)
+        {
+            return (other == null);
+        }
+        else if (other == null)
+        {
+            return false;
+        }
+        
+        return keyAnalyzer.compare(key, other) == 0;
+    }
+    
+    /**
+     * A basic implementation of {@link Entry}
+     */
+    abstract static class BasicEntry<K, V> implements Map.Entry<K, V>, Serializable
+    {
+        private static final long serialVersionUID = -944364551314110330L;
+
+        protected K key;
+        
+        protected V value;
+        
+        private transient int hashCode = 0;
+        
+        public BasicEntry(K key, V value)
+        {
+            this.key = key;
+            this.value = value;
+        }
+        
+        /**
+         * Replaces the current key and value with the provided
+         * key &amp; value
+         */
+        public V setKeyValue(K key, V value)
+        {
+            this.key = key;
+            this.hashCode = 0;
+            return setValue(value);
+        }
+        
+        @Override
+        public K getKey()
+        {
+            return key;
+        }
+        
+        @Override
+        public V getValue()
+        {
+            return value;
+        }
+        
+        @Override
+        public V setValue(V value)
+        {
+            V previous = this.value;
+            this.value = value;
+            return previous;
+        }
+        
+        @Override
+        public int hashCode()
+        {
+            if (hashCode == 0)
+                hashCode = (key != null ? key.hashCode() : 0);
+            return hashCode;
+        }
+        
+        @Override
+        public boolean equals(Object o)
+        {
+            if (o == this)
+            {
+                return true;
+            }
+            else if (!(o instanceof Map.Entry<?, ?>))
+            {
+                return false;
+            }
+            
+            Map.Entry<?, ?> other = (Map.Entry<?, ?>)o;
+            return Tries.areEqual(key, other.getKey()) && Tries.areEqual(value, other.getValue());
+        }
+        
+        @Override
+        public String toString()
+        {
+            return key + "=" + value;
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/utils/trie/Cursor.java b/src/java/org/apache/cassandra/index/sasi/utils/trie/Cursor.java
new file mode 100644
index 0000000..513fae0
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/utils/trie/Cursor.java
@@ -0,0 +1,83 @@
+/*
+ * Copyright 2005-2010 Roger Kapsi, Sam Berlin
+ *
+ *   Licensed under the Apache License, Version 2.0 (the "License");
+ *   you may not use this file except in compliance with the License.
+ *   You may obtain a copy of the License at
+ *
+ *       http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *   Unless required by applicable law or agreed to in writing, software
+ *   distributed under the License is distributed on an "AS IS" BASIS,
+ *   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *   See the License for the specific language governing permissions and
+ *   limitations under the License.
+ */
+
+package org.apache.cassandra.index.sasi.utils.trie;
+
+import java.util.Map;
+import java.util.Map.Entry;
+
+/**
+ * This class is taken from https://github.com/rkapsi/patricia-trie (v0.6) and slightly modified
+ * to match Cassandra code style. It is the only Patricia Trie implementation that supports
+ * pluggable key comparators (e.g. the commons-collections PatriciaTrie, which is based on the
+ * rkapsi/patricia-trie project, only supports String keys), but it is unfortunately not
+ * published to Maven Central as a downloadable artifact.
+ */
+
+/**
+ * A {@link Cursor} can be used to traverse a {@link Trie}, visit each node 
+ * step by step and make {@link Decision}s on each step how to continue with 
+ * traversing the {@link Trie}.
+ */
+public interface Cursor<K, V>
+{
+    
+    /**
+     * The {@link Decision} tells the {@link Cursor} what to do on each step 
+     * while traversing the {@link Trie}.
+     * 
+     * NOTE: Not all operations that work with a {@link Cursor} support all 
+     * {@link Decision} types
+     */
+    enum Decision
+    {
+        
+        /**
+         * Exit the traverse operation
+         */
+        EXIT, 
+        
+        /**
+         * Continue with the traverse operation
+         */
+        CONTINUE, 
+        
+        /**
+         * Remove the previously returned element
+         * from the {@link Trie} and continue
+         */
+        REMOVE, 
+        
+        /**
+         * Remove the previously returned element
+         * from the {@link Trie} and exit from the
+         * traverse operation
+         */
+        REMOVE_AND_EXIT
+    }
+    
+    /**
+     * Called for each {@link Entry} in the {@link Trie}. Return 
+     * {@link Decision#EXIT} to finish the {@link Trie} operation,
+     * {@link Decision#CONTINUE} to go to the next {@link Entry},
+     * {@link Decision#REMOVE} to remove the {@link Entry} and
+     * continue iterating or {@link Decision#REMOVE_AND_EXIT} to
+     * remove the {@link Entry} and stop iterating.
+     * 
+     * Note: Not all operations support {@link Decision#REMOVE}.
+     */
+    Decision select(Map.Entry<? extends K, ? extends V> entry);
+}
\ No newline at end of file
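As a quick illustration of the Decision protocol above, a hedged sketch of a cursor that collects at most a fixed number of values and then stops the traversal; this is an example only, not part of the patch:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;

    // When passed to Trie#traverse(Cursor), entries are visited in lexicographic key
    // order; this cursor collects values until 'limit' is reached and then exits.
    class LimitingCursor<K, V> implements Cursor<K, V>
    {
        private final int limit;
        private final List<V> collected = new ArrayList<>();

        LimitingCursor(int limit)
        {
            this.limit = limit;
        }

        public Decision select(Map.Entry<? extends K, ? extends V> entry)
        {
            collected.add(entry.getValue());
            return collected.size() >= limit ? Decision.EXIT : Decision.CONTINUE;
        }

        public List<V> collected()
        {
            return collected;
        }
    }

traverse(Cursor) returns the entry on which EXIT was signalled, and the collected values remain available from the cursor afterwards.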
diff --git a/src/java/org/apache/cassandra/index/sasi/utils/trie/KeyAnalyzer.java b/src/java/org/apache/cassandra/index/sasi/utils/trie/KeyAnalyzer.java
new file mode 100644
index 0000000..9cab4ae
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/utils/trie/KeyAnalyzer.java
@@ -0,0 +1,73 @@
+/*
+ * Copyright 2010 Roger Kapsi
+ *
+ *   Licensed under the Apache License, Version 2.0 (the "License");
+ *   you may not use this file except in compliance with the License.
+ *   You may obtain a copy of the License at
+ *
+ *       http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *   Unless required by applicable law or agreed to in writing, software
+ *   distributed under the License is distributed on an "AS IS" BASIS,
+ *   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *   See the License for the specific language governing permissions and
+ *   limitations under the License.
+ */
+
+package org.apache.cassandra.index.sasi.utils.trie;
+
+import java.util.Comparator;
+
+/**
+ * This class is taken from https://github.com/rkapsi/patricia-trie (v0.6) and slightly modified
+ * to match Cassandra code style. It is the only Patricia Trie implementation that supports
+ * pluggable key comparators (e.g. the commons-collections PatriciaTrie, which is based on the
+ * rkapsi/patricia-trie project, only supports String keys), but it is unfortunately not
+ * published to Maven Central as a downloadable artifact.
+ */
+
+/**
+ * The {@link KeyAnalyzer} provides bit-level access to keys
+ * for the {@link PatriciaTrie}.
+ */
+public interface KeyAnalyzer<K> extends Comparator<K>
+{
+    /**
+     * Returned by {@link #bitIndex(Object, Object)} if a key's
+     * bits were all zero (0).
+     */
+    int NULL_BIT_KEY = -1;
+    
+    /** 
+     * Returned by {@link #bitIndex(Object, Object)} if the
+     * bits of two keys were all equal.
+     */
+    int EQUAL_BIT_KEY = -2;
+    
+    /**
+     * Returned by {@link #bitIndex(Object, Object)} if a key's 
+     * indices are out of bounds.
+     */
+    int OUT_OF_BOUNDS_BIT_KEY = -3;
+    
+    /**
+     * Returns the key's length in bits.
+     */
+    int lengthInBits(K key);
+    
+    /**
+     * Returns {@code true} if a key's bit is set at the given index.
+     */
+    boolean isBitSet(K key, int bitIndex);
+    
+    /**
+     * Returns the index of the first bit that is different in the two keys.
+     */
+    int bitIndex(K key, K otherKey);
+    
+    /**
+     * Returns {@code true} if the second argument is a 
+     * prefix of the first argument.
+     */
+    boolean isPrefix(K key, K prefix);
+}
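To make this contract concrete, a rough sketch of an analyzer for fixed-width 32-bit integer keys, treating bit 0 as the most significant bit (an illustration only, not part of the patch):

    // KeyAnalyzer extends Comparator<K>, so compare() must be implemented as well.
    class IntKeyAnalyzer implements KeyAnalyzer<Integer>
    {
        public int lengthInBits(Integer key)
        {
            return Integer.SIZE; // every key is exactly 32 bits wide
        }

        public boolean isBitSet(Integer key, int bitIndex)
        {
            // bit 0 is the most significant bit
            return (key & (1 << (Integer.SIZE - 1 - bitIndex))) != 0;
        }

        public int bitIndex(Integer key, Integer otherKey)
        {
            if (key.intValue() == otherKey.intValue())
                return EQUAL_BIT_KEY;

            if (key.intValue() == 0)
                return NULL_BIT_KEY;

            // position (counted from the most significant bit) of the first differing bit
            return Integer.numberOfLeadingZeros(key ^ otherKey);
        }

        public boolean isPrefix(Integer key, Integer prefix)
        {
            // with fixed-width keys only an identical key counts as a prefix
            return key.intValue() == prefix.intValue();
        }

        public int compare(Integer a, Integer b)
        {
            return a.compareTo(b);
        }
    }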
diff --git a/src/java/org/apache/cassandra/index/sasi/utils/trie/PatriciaTrie.java b/src/java/org/apache/cassandra/index/sasi/utils/trie/PatriciaTrie.java
new file mode 100644
index 0000000..3c672ec
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/utils/trie/PatriciaTrie.java
@@ -0,0 +1,1261 @@
+/*
+ * Copyright 2005-2010 Roger Kapsi, Sam Berlin
+ *
+ *   Licensed under the Apache License, Version 2.0 (the "License");
+ *   you may not use this file except in compliance with the License.
+ *   You may obtain a copy of the License at
+ *
+ *       http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *   Unless required by applicable law or agreed to in writing, software
+ *   distributed under the License is distributed on an "AS IS" BASIS,
+ *   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *   See the License for the specific language governing permissions and
+ *   limitations under the License.
+ */
+
+package org.apache.cassandra.index.sasi.utils.trie;
+
+import java.io.Serializable;
+import java.util.*;
+
+/**
+ * This class is taken from https://github.com/rkapsi/patricia-trie (v0.6) and slightly modified
+ * to match Cassandra code style. It is the only Patricia Trie implementation that supports
+ * pluggable key comparators (e.g. the commons-collections PatriciaTrie, which is based on the
+ * rkapsi/patricia-trie project, only supports String keys), but it is unfortunately not
+ * published to Maven Central as a downloadable artifact.
+ */
+
+/**
+ * <h3>PATRICIA {@link Trie}</h3>
+ *  
+ * <i>Practical Algorithm to Retrieve Information Coded in Alphanumeric</i>
+ * 
+ * <p>A PATRICIA {@link Trie} is a compressed {@link Trie}. Instead of storing 
+ * all data at the edges of the {@link Trie} (and having empty internal nodes), 
+ * PATRICIA stores data in every node. This allows for very efficient traversal, 
+ * insert, delete, predecessor, successor, prefix, range, and {@link #select(Object)} 
+ * operations. All operations are performed at worst in O(K) time, where K 
+ * is the number of bits in the largest item in the tree. In practice, 
+ * operations actually take O(A(K)) time, where A(K) is the average number of 
+ * bits of all items in the tree.
+ * 
+ * <p>Most importantly, PATRICIA requires very few comparisons to keys while
+ * doing any operation. While performing a lookup, each comparison (at most 
+ * K of them, described above) will perform a single bit comparison against 
+ * the given key, instead of comparing the entire key to another key.
+ * 
+ * <p>The {@link Trie} can return its entries in lexicographical order using the 
+ * {@link #traverse(Cursor)}, 'prefix', 'submap', or 'iterator' methods. The 
+ * {@link Trie} can also scan for the item that is bitwise closest to a given key 
+ * (using an XOR metric) via the 'select' method. Bitwise closeness is determined 
+ * by the {@link KeyAnalyzer} returning true or false for a bit being set or not 
+ * in a given key.
+ * 
+ * <p>Any methods here that take an {@link Object} argument may throw a 
+ * {@link ClassCastException} if the method is expecting an instance of K 
+ * and it isn't K.
+ * 
+ * @see <a href="http://en.wikipedia.org/wiki/Radix_tree">Radix Tree</a>
+ * @see <a href="http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Tree/PATRICIA">PATRICIA</a>
+ * @see <a href="http://www.imperialviolet.org/binary/critbit.pdf">Crit-Bit Tree</a>
+ * 
+ * @author Roger Kapsi
+ * @author Sam Berlin
+ */
+public class PatriciaTrie<K, V> extends AbstractPatriciaTrie<K, V> implements Serializable
+{
+    private static final long serialVersionUID = -2246014692353432660L;
+    
+    public PatriciaTrie(KeyAnalyzer<? super K> keyAnalyzer)
+    {
+        super(keyAnalyzer);
+    }
+    
+    public PatriciaTrie(KeyAnalyzer<? super K> keyAnalyzer, Map<? extends K, ? extends V> m)
+    {
+        super(keyAnalyzer, m);
+    }
+    
+    @Override
+    public Comparator<? super K> comparator()
+    {
+        return keyAnalyzer;
+    }
+    
+    @Override
+    public SortedMap<K, V> prefixMap(K prefix)
+    {
+        return lengthInBits(prefix) == 0 ? this : new PrefixRangeMap(prefix);
+    }
+    
+    @Override
+    public K firstKey()
+    {
+        return firstEntry().getKey();
+    }
+    
+    @Override
+    public K lastKey()
+    {
+        TrieEntry<K, V> entry = lastEntry();
+        return entry != null ? entry.getKey() : null;
+    }
+    
+    @Override
+    public SortedMap<K, V> headMap(K toKey)
+    {
+        return new RangeEntryMap(null, toKey);
+    }
+    
+    @Override
+    public SortedMap<K, V> subMap(K fromKey, K toKey)
+    {
+        return new RangeEntryMap(fromKey, toKey);
+    }
+    
+    @Override
+    public SortedMap<K, V> tailMap(K fromKey)
+    {
+        return new RangeEntryMap(fromKey, null);
+    } 
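Putting the pieces together, a hedged usage sketch reusing the hypothetical IntKeyAnalyzer from the KeyAnalyzer example above (not part of the patch):

    // Assumes: import java.util.SortedMap;
    PatriciaTrie<Integer, String> trie = new PatriciaTrie<>(new IntKeyAnalyzer());
    trie.put(1, "one");
    trie.put(2, "two");
    trie.put(5, "five");

    // The entry whose key is bitwise closest to 3 under the XOR metric.
    String closest = trie.selectValue(3);

    // Entries with keys from 2 onwards, in key order (here {2=two, 5=five}).
    SortedMap<Integer, String> tail = trie.tailMap(2);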
+    
+    /**
+     * Returns an entry strictly higher than the given key,
+     * or null if no such entry exists.
+     */
+    private TrieEntry<K,V> higherEntry(K key)
+    {
+        // TODO: Cleanup so that we don't actually have to add/remove from the
+        //       tree.  (We do it here because there are other well-defined 
+        //       functions to perform the search.)
+        int lengthInBits = lengthInBits(key);
+        
+        if (lengthInBits == 0)
+        {
+            if (!root.isEmpty())
+            {
+                // If data in root, and more after -- return it.
+                return size() > 1 ? nextEntry(root) : null;
+            }
+            else
+            {
+                // Root is empty & we want something after empty, return first.
+                return firstEntry();
+            }
+        }
+        
+        TrieEntry<K, V> found = getNearestEntryForKey(key);
+        if (compareKeys(key, found.key))
+            return nextEntry(found);
+        
+        int bitIndex = bitIndex(key, found.key);
+        if (Tries.isValidBitIndex(bitIndex))
+        {
+            return replaceCeil(key, bitIndex);
+        }
+        else if (Tries.isNullBitKey(bitIndex))
+        {
+            if (!root.isEmpty())
+            {
+                return firstEntry();
+            }
+            else if (size() > 1)
+            {
+                return nextEntry(firstEntry());
+            }
+            else
+            {
+                return null;
+            }
+        }
+        else if (Tries.isEqualBitKey(bitIndex))
+        {
+            return nextEntry(found);
+        }
+
+        // we should have exited above.
+        throw new IllegalStateException("invalid lookup: " + key);
+    }
+    
+    /**
+     * Returns a key-value mapping associated with the least key greater
+     * than or equal to the given key, or null if there is no such key.
+     */
+    TrieEntry<K,V> ceilingEntry(K key)
+    {
+        // Basically:
+        // Follow the steps of adding an entry, but instead...
+        //
+        // - If we ever encounter a situation where we found an equal
+        //   key, we return it immediately.
+        //
+        // - If we hit an empty root, return the first iterable item.
+        //
+        // - If we have to add a new item, we temporarily add it,
+        //   find the successor to it, then remove the added item.
+        //
+        // These steps ensure that the returned value is either the
+        // entry for the key itself, or the first entry directly after
+        // the key.
+        
+        // TODO: Cleanup so that we don't actually have to add/remove from the
+        //       tree.  (We do it here because there are other well-defined 
+        //       functions to perform the search.)
+        int lengthInBits = lengthInBits(key);
+        
+        if (lengthInBits == 0)
+        {
+            if (!root.isEmpty())
+            {
+                return root;
+            }
+            else
+            {
+                return firstEntry();
+            }
+        }
+        
+        TrieEntry<K, V> found = getNearestEntryForKey(key);
+        if (compareKeys(key, found.key))
+            return found;
+        
+        int bitIndex = bitIndex(key, found.key);
+        if (Tries.isValidBitIndex(bitIndex))
+        {
+            return replaceCeil(key, bitIndex);
+        }
+        else if (Tries.isNullBitKey(bitIndex))
+        {
+            if (!root.isEmpty())
+            {
+                return root;
+            }
+            else
+            {
+                return firstEntry();
+            }
+        }
+        else if (Tries.isEqualBitKey(bitIndex))
+        {
+            return found;
+        }
+
+        // we should have exited above.
+        throw new IllegalStateException("invalid lookup: " + key);
+    }
+
+    private TrieEntry<K, V> replaceCeil(K key, int bitIndex)
+    {
+        TrieEntry<K, V> added = new TrieEntry<>(key, null, bitIndex);
+        addEntry(added);
+        incrementSize(); // must increment because remove will decrement
+        TrieEntry<K, V> ceil = nextEntry(added);
+        removeEntry(added);
+        modCount -= 2; // we didn't really modify it.
+        return ceil;
+    }
+
+    private TrieEntry<K, V> replaceLower(K key, int bitIndex)
+    {
+        TrieEntry<K, V> added = new TrieEntry<>(key, null, bitIndex);
+        addEntry(added);
+        incrementSize(); // must increment because remove will decrement
+        TrieEntry<K, V> prior = previousEntry(added);
+        removeEntry(added);
+        modCount -= 2; // we didn't really modify it.
+        return prior;
+    }
+    
+    /**
+     * Returns a key-value mapping associated with the greatest key
+     * strictly less than the given key, or null if there is no such key.
+     */
+    TrieEntry<K,V> lowerEntry(K key)
+    {
+        // Basically:
+        // Follow the steps of adding an entry, but instead...
+        //
+        // - If we ever encounter a situation where we found an equal
+        //   key, we return its previousEntry immediately.
+        //
+        // - If we hit root (empty or not), return null.
+        //
+        // - If we have to add a new item, we temporarily add it,
+        //   find the previousEntry to it, then remove the added item.
+        //
+        // These steps ensure that the returned value is always just before
+        // the key or null (if there was nothing before it).
+        
+        // TODO: Cleanup so that we don't actually have to add/remove from the
+        //       tree.  (We do it here because there are other well-defined 
+        //       functions to perform the search.)
+        int lengthInBits = lengthInBits(key);
+        
+        if (lengthInBits == 0)
+            return null; // there can never be anything before root.
+        
+        TrieEntry<K, V> found = getNearestEntryForKey(key);
+        if (compareKeys(key, found.key))
+            return previousEntry(found);
+        
+        int bitIndex = bitIndex(key, found.key);
+        if (Tries.isValidBitIndex(bitIndex))
+        {
+            return replaceLower(key, bitIndex);
+        }
+        else if (Tries.isNullBitKey(bitIndex))
+        {
+            return null;
+        }
+        else if (Tries.isEqualBitKey(bitIndex))
+        {
+            return previousEntry(found);
+        }
+
+        // we should have exited above.
+        throw new IllegalStateException("invalid lookup: " + key);
+    }
+    
+    /**
+     * Returns a key-value mapping associated with the greatest key
+     * less than or equal to the given key, or null if there is no such key.
+     */
+    TrieEntry<K,V> floorEntry(K key)
+    {
+        // TODO: Cleanup so that we don't actually have to add/remove from the
+        //       tree.  (We do it here because there are other well-defined 
+        //       functions to perform the search.)
+        int lengthInBits = lengthInBits(key);
+        
+        if (lengthInBits == 0)
+        {
+            return !root.isEmpty() ? root : null;
+        }
+        
+        TrieEntry<K, V> found = getNearestEntryForKey(key);
+        if (compareKeys(key, found.key))
+            return found;
+        
+        int bitIndex = bitIndex(key, found.key);
+        if (Tries.isValidBitIndex(bitIndex))
+        {
+            return replaceLower(key, bitIndex);
+        }
+        else if (Tries.isNullBitKey(bitIndex))
+        {
+            if (!root.isEmpty())
+            {
+                return root;
+            }
+            else
+            {
+                return null;
+            }
+        }
+        else if (Tries.isEqualBitKey(bitIndex))
+        {
+            return found;
+        }
+
+        // we should have exited above.
+        throw new IllegalStateException("invalid lookup: " + key);
+    }
+    
+    /**
+     * Finds the subtree that contains the prefix.
+     * 
+     * This is very similar to getNearestEntryForKey, but with the difference
+     * that we stop the lookup if current.bitIndex > lengthInBits.
+     */
+    private TrieEntry<K, V> subtree(K prefix)
+    {
+        int lengthInBits = lengthInBits(prefix);
+        
+        TrieEntry<K, V> current = root.left;
+        TrieEntry<K, V> path = root;
+        while(true)
+        {
+            if (current.bitIndex <= path.bitIndex || lengthInBits < current.bitIndex)
+                break;
+            
+            path = current;
+            current = !isBitSet(prefix, current.bitIndex)
+                    ? current.left : current.right;
+        }        
+
+        // Make sure the entry is valid for a subtree.
+        TrieEntry<K, V> entry = current.isEmpty() ? path : current;
+        
+        // If entry is root, it can't be empty.
+        if (entry.isEmpty())
+            return null;
+        
+        // if root && length of root is less than length of lookup,
+        // there's nothing.
+        // (this prevents returning the whole subtree if root has an empty
+        //  string and we want to lookup things with "\0")
+        if (entry == root && lengthInBits(entry.getKey()) < lengthInBits)
+            return null;
+        
+        // Found key's length-th bit differs from our key
+        // which means it cannot be the prefix...
+        if (isBitSet(prefix, lengthInBits) != isBitSet(entry.key, lengthInBits))
+            return null;
+        
+        // ... or there are less than 'length' equal bits
+        int bitIndex = bitIndex(prefix, entry.key);
+        return (bitIndex >= 0 && bitIndex < lengthInBits) ? null : entry;
+    }
+    
+    /**
+     * Returns the last entry the {@link Trie} is storing.
+     * 
+     * <p>This is implemented by always going to the right until
+     * we encounter a valid uplink. That uplink is the last key.
+     */
+    private TrieEntry<K, V> lastEntry()
+    {
+        return followRight(root.left);
+    }
+    
+    /**
+     * Traverses down the right path until it finds an uplink.
+     */
+    private TrieEntry<K, V> followRight(TrieEntry<K, V> node)
+    {
+        // if Trie is empty, no last entry.
+        if (node.right == null)
+            return null;
+        
+        // Go as far right as possible, until we encounter an uplink.
+        while (node.right.bitIndex > node.bitIndex)
+        {
+            node = node.right;
+        }
+        
+        return node.right;
+    }
+    
+    /**
+     * Returns the node lexicographically before the given node (or null if none).
+     * 
+     * This follows four simple branches:
+     *  - If the uplink that returned us was a right uplink:
+     *      - If predecessor's left is a valid uplink from predecessor, return it.
+     *      - Else, follow the right path from the predecessor's left.
+     *  - If the uplink that returned us was a left uplink:
+     *      - Loop back through parents until we encounter a node where 
+     *        node != node.parent.left.
+     *          - If node.parent.left is uplink from node.parent:
+     *              - If node.parent.left is not root, return it.
+     *              - If it is root & root isEmpty, return null.
+     *              - If it is root & root !isEmpty, return root.
+     *          - If node.parent.left is not uplink from node.parent:
+     *              - Follow right path for first right child from node.parent.left   
+     * 
+     * @param start the start entry
+     */
+    private TrieEntry<K, V> previousEntry(TrieEntry<K, V> start)
+    {
+        if (start.predecessor == null)
+            throw new IllegalArgumentException("must have come from somewhere!");
+        
+        if (start.predecessor.right == start)
+        {
+            return isValidUplink(start.predecessor.left, start.predecessor)
+                    ? start.predecessor.left
+                    : followRight(start.predecessor.left);
+        }
+
+        TrieEntry<K, V> node = start.predecessor;
+        while (node.parent != null && node == node.parent.left)
+        {
+            node = node.parent;
+        }
+
+        if (node.parent == null) // can be null if we're looking up root.
+            return null;
+
+        if (isValidUplink(node.parent.left, node.parent))
+        {
+            if (node.parent.left == root)
+            {
+                return root.isEmpty() ? null : root;
+            }
+            else
+            {
+                return node.parent.left;
+            }
+        }
+        else
+        {
+            return followRight(node.parent.left);
+        }
+    }
+    
+    /**
+     * Returns the entry lexicographically after the given entry.
+     * If the given entry is null, returns the first node.
+     * 
+     * This will traverse only within the subtree.  If the given node
+     * is not within the subtree, this will have undefined results.
+     */
+    private TrieEntry<K, V> nextEntryInSubtree(TrieEntry<K, V> node, TrieEntry<K, V> parentOfSubtree)
+    {
+        return (node == null) ? firstEntry() : nextEntryImpl(node.predecessor, node, parentOfSubtree);
+    }
+    
+    private boolean isPrefix(K key, K prefix)
+    {
+        return keyAnalyzer.isPrefix(key, prefix);
+    }
+    
+    /**
+     * A range view of the {@link Trie}
+     */
+    private abstract class RangeMap extends AbstractMap<K, V> implements SortedMap<K, V>
+    {
+        /**
+         * The {@link #entrySet()} view
+         */
+        private transient volatile Set<Map.Entry<K, V>> entrySet;
+
+        /**
+         * Creates and returns an {@link #entrySet()} 
+         * view of the {@link RangeMap}
+         */
+        protected abstract Set<Map.Entry<K, V>> createEntrySet();
+
+        /**
+         * Returns the FROM Key
+         */
+        protected abstract K getFromKey();
+        
+        /**
+         * Whether or not the {@link #getFromKey()} is in the range
+         */
+        protected abstract boolean isFromInclusive();
+        
+        /**
+         * Returns the TO Key
+         */
+        protected abstract K getToKey();
+        
+        /**
+         * Whether or not the {@link #getToKey()} is in the range
+         */
+        protected abstract boolean isToInclusive();
+        
+        
+        @Override
+        public Comparator<? super K> comparator()
+        {
+            return PatriciaTrie.this.comparator();
+        }
+        
+        @Override
+        public boolean containsKey(Object key)
+        {
+            return inRange(Tries.<K>cast(key)) && PatriciaTrie.this.containsKey(key);
+        }
+        
+        @Override
+        public V remove(Object key)
+        {
+            return (!inRange(Tries.<K>cast(key))) ? null : PatriciaTrie.this.remove(key);
+        }
+        
+        @Override
+        public V get(Object key)
+        {
+            return (!inRange(Tries.<K>cast(key))) ? null : PatriciaTrie.this.get(key);
+        }
+        
+        @Override
+        public V put(K key, V value)
+        {
+            if (!inRange(key))
+                throw new IllegalArgumentException("Key is out of range: " + key);
+
+            return PatriciaTrie.this.put(key, value);
+        }
+        
+        @Override
+        public Set<Map.Entry<K, V>> entrySet()
+        {
+            if (entrySet == null)
+                entrySet = createEntrySet();
+            return entrySet;
+        }
+        
+        @Override
+        public SortedMap<K, V> subMap(K fromKey, K toKey)
+        {
+            if (!inRange2(fromKey))
+                throw new IllegalArgumentException("FromKey is out of range: " + fromKey);
+
+            if (!inRange2(toKey))
+                throw new IllegalArgumentException("ToKey is out of range: " + toKey);
+
+            return createRangeMap(fromKey, isFromInclusive(), toKey, isToInclusive());
+        }
+        
+        @Override
+        public SortedMap<K, V> headMap(K toKey)
+        {
+            if (!inRange2(toKey))
+                throw new IllegalArgumentException("ToKey is out of range: " + toKey);
+
+            return createRangeMap(getFromKey(), isFromInclusive(), toKey, isToInclusive());
+        }
+        
+        @Override
+        public SortedMap<K, V> tailMap(K fromKey)
+        {
+            if (!inRange2(fromKey))
+                throw new IllegalArgumentException("FromKey is out of range: " + fromKey);
+
+            return createRangeMap(fromKey, isFromInclusive(), getToKey(), isToInclusive());
+        }
+
+        /**
+         * Returns true if the provided key is within the FROM and TO
+         * bounds of the {@link RangeMap}
+         */
+        protected boolean inRange(K key)
+        {
+            K fromKey = getFromKey();
+            K toKey = getToKey();
+
+            return (fromKey == null || inFromRange(key, false))
+                    && (toKey == null || inToRange(key, false));
+        }
+
+        /**
+         * This form allows the high endpoint (as well as all legit keys)
+         */
+        protected boolean inRange2(K key)
+        {
+            K fromKey = getFromKey();
+            K toKey = getToKey();
+
+            return (fromKey == null || inFromRange(key, false))
+                    && (toKey == null || inToRange(key, true));
+        }
+
+        /**
+         * Returns true if the provided key is in the FROM range 
+         * of the {@link RangeMap}
+         */
+        protected boolean inFromRange(K key, boolean forceInclusive)
+        {
+            K fromKey = getFromKey();
+            boolean fromInclusive = isFromInclusive();
+
+            int ret = keyAnalyzer.compare(key, fromKey);
+            return (fromInclusive || forceInclusive) ? ret >= 0 : ret > 0;
+        }
+
+        /**
+         * Returns true if the provided key is in the TO range 
+         * of the {@link RangeMap}
+         */
+        protected boolean inToRange(K key, boolean forceInclusive)
+        {
+            K toKey = getToKey();
+            boolean toInclusive = isToInclusive();
+
+            int ret = keyAnalyzer.compare(key, toKey);
+            return (toInclusive || forceInclusive) ? ret <= 0 : ret < 0;
+        }
+
+        /**
+         * Creates and returns a sub-range view of the current {@link RangeMap}
+         */
+        protected abstract SortedMap<K, V> createRangeMap(K fromKey, boolean fromInclusive, K toKey, boolean toInclusive);
+    }
+   
+   /**
+    * A {@link RangeMap} that deals with {@link Entry}s
+    */
+   private class RangeEntryMap extends RangeMap
+   {
+       /** 
+        * The key to start from, null if the beginning. 
+        */
+       protected final K fromKey;
+       
+       /** 
+        * The key to end at, null if till the end. 
+        */
+       protected final K toKey;
+       
+       /** 
+        * Whether or not the 'from' is inclusive. 
+        */
+       protected final boolean fromInclusive;
+       
+       /** 
+        * Whether or not the 'to' is inclusive. 
+        */
+       protected final boolean toInclusive;
+       
+       /**
+        * Creates a {@link RangeEntryMap} with the fromKey included and
+        * the toKey excluded from the range
+        */
+       protected RangeEntryMap(K fromKey, K toKey)
+       {
+           this(fromKey, true, toKey, false);
+       }
+       
+       /**
+        * Creates a {@link RangeEntryMap}
+        */
+       protected RangeEntryMap(K fromKey, boolean fromInclusive, K toKey, boolean toInclusive)
+       {
+           if (fromKey == null && toKey == null)
+               throw new IllegalArgumentException("must have a from or to!");
+           
+           if (fromKey != null && toKey != null && keyAnalyzer.compare(fromKey, toKey) > 0)
+               throw new IllegalArgumentException("fromKey > toKey");
+           
+           this.fromKey = fromKey;
+           this.fromInclusive = fromInclusive;
+           this.toKey = toKey;
+           this.toInclusive = toInclusive;
+       }
+       
+       
+       @Override
+       public K firstKey()
+       {
+           Map.Entry<K,V> e  = fromKey == null
+                ? firstEntry()
+                : fromInclusive ? ceilingEntry(fromKey) : higherEntry(fromKey);
+           
+           K first = e != null ? e.getKey() : null;
+           if (e == null || toKey != null && !inToRange(first, false))
+               throw new NoSuchElementException();
+
+           return first;
+       }
+
+       
+       @Override
+       public K lastKey()
+       {
+           Map.Entry<K,V> e = toKey == null
+                ? lastEntry()
+                : toInclusive ? floorEntry(toKey) : lowerEntry(toKey);
+           
+           K last = e != null ? e.getKey() : null;
+           if (e == null || fromKey != null && !inFromRange(last, false))
+               throw new NoSuchElementException();
+
+           return last;
+       }
+       
+       @Override
+       protected Set<Entry<K, V>> createEntrySet()
+       {
+           return new RangeEntrySet(this);
+       }
+       
+       @Override
+       public K getFromKey()
+       {
+           return fromKey;
+       }
+       
+       @Override
+       public K getToKey()
+       {
+           return toKey;
+       }
+       
+       @Override
+       public boolean isFromInclusive()
+       {
+           return fromInclusive;
+       }
+       
+       @Override
+       public boolean isToInclusive()
+       {
+           return toInclusive;
+       }
+       
+       @Override
+       protected SortedMap<K, V> createRangeMap(K fromKey, boolean fromInclusive, K toKey, boolean toInclusive)
+       {
+           return new RangeEntryMap(fromKey, fromInclusive, toKey, toInclusive);
+       }
+   }
+   
+    /**
+     * A {@link Set} view of a {@link RangeMap}
+     */
+    private class RangeEntrySet extends AbstractSet<Map.Entry<K, V>>
+    {
+
+        private final RangeMap delegate;
+
+        private int size = -1;
+
+        private int expectedModCount = -1;
+
+        /**
+         * Creates a {@link RangeEntrySet}
+         */
+        public RangeEntrySet(RangeMap delegate)
+        {
+            if (delegate == null)
+                throw new NullPointerException("delegate");
+
+            this.delegate = delegate;
+        }
+        
+        @Override
+        public Iterator<Map.Entry<K, V>> iterator()
+        {
+            K fromKey = delegate.getFromKey();
+            K toKey = delegate.getToKey();
+
+            TrieEntry<K, V> first = fromKey == null ? firstEntry() : ceilingEntry(fromKey);
+            TrieEntry<K, V> last = null;
+            if (toKey != null)
+                last = ceilingEntry(toKey);
+
+            return new EntryIterator(first, last);
+        }
+        
+        @Override
+        public int size()
+        {
+            if (size == -1 || expectedModCount != PatriciaTrie.this.modCount)
+            {
+                size = 0;
+
+                for (Iterator<?> it = iterator(); it.hasNext(); it.next())
+                {
+                    ++size;
+                }
+
+                expectedModCount = PatriciaTrie.this.modCount;
+            }
+
+            return size;
+        }
+        
+        @Override
+        public boolean isEmpty()
+        {
+            return !iterator().hasNext();
+        }
+        
+        @Override
+        public boolean contains(Object o)
+        {
+            if (!(o instanceof Map.Entry<?, ?>))
+                return false;
+
+            @SuppressWarnings("unchecked")
+            Map.Entry<K, V> entry = (Map.Entry<K, V>) o;
+            K key = entry.getKey();
+            if (!delegate.inRange(key))
+                return false;
+
+            TrieEntry<K, V> node = getEntry(key);
+            return node != null && Tries.areEqual(node.getValue(), entry.getValue());
+        }
+        
+        @Override
+        public boolean remove(Object o)
+        {
+            if (!(o instanceof Map.Entry<?, ?>))
+                return false;
+
+            @SuppressWarnings("unchecked")
+            Map.Entry<K, V> entry = (Map.Entry<K, V>) o;
+            K key = entry.getKey();
+            if (!delegate.inRange(key))
+                return false;
+
+            TrieEntry<K, V> node = getEntry(key);
+            if (node != null && Tries.areEqual(node.getValue(), entry.getValue()))
+            {
+                removeEntry(node);
+                return true;
+            }
+
+            return false;
+        }
+        
+        /** 
+         * An {@link Iterator} for {@link RangeEntrySet}s. 
+         */
+        private final class EntryIterator extends TrieIterator<Map.Entry<K,V>>
+        {
+            private final K excludedKey;
+
+            /**
+             * Creates an {@link EntryIterator}
+             */
+            private EntryIterator(TrieEntry<K,V> first, TrieEntry<K,V> last)
+            {
+                super(first);
+                this.excludedKey = (last != null ? last.getKey() : null);
+            }
+            
+            @Override
+            public boolean hasNext()
+            {
+                return next != null && !Tries.areEqual(next.key, excludedKey);
+            }
+            
+            @Override
+            public Map.Entry<K,V> next()
+            {
+                if (next == null || Tries.areEqual(next.key, excludedKey))
+                    throw new NoSuchElementException();
+                
+                return nextEntry();
+            }
+        }
+    }   
+   
+    /** 
+     * A submap used for prefix views over the {@link Trie}. 
+     */
+    private class PrefixRangeMap extends RangeMap
+    {
+        
+        private final K prefix;
+        
+        private K fromKey = null;
+        
+        private K toKey = null;
+        
+        private int expectedModCount = -1;
+        
+        private int size = -1;
+        
+        /**
+         * Creates a {@link PrefixRangeMap}
+         */
+        private PrefixRangeMap(K prefix)
+        {
+            this.prefix = prefix;
+        }
+        
+        /**
+         * This method does two things: it determines the FROM
+         * and TO range of the {@link PrefixRangeMap}, and the number
+         * of elements in the range. It must be called every
+         * time the {@link Trie} has changed.
+         */
+        private int fixup()
+        {
+            // The trie has changed since we last
+            // found our toKey / fromKey
+            if (size == - 1 || PatriciaTrie.this.modCount != expectedModCount)
+            {
+                Iterator<Map.Entry<K, V>> it = entrySet().iterator();
+                size = 0;
+                
+                Map.Entry<K, V> entry = null;
+                if (it.hasNext())
+                {
+                    entry = it.next();
+                    size = 1;
+                }
+                
+                fromKey = entry == null ? null : entry.getKey();
+                if (fromKey != null)
+                {
+                    TrieEntry<K, V> prior = previousEntry((TrieEntry<K, V>)entry);
+                    fromKey = prior == null ? null : prior.getKey();
+                }
+                
+                toKey = fromKey;
+                
+                while (it.hasNext())
+                {
+                    ++size;
+                    entry = it.next();
+                }
+                
+                toKey = entry == null ? null : entry.getKey();
+                
+                if (toKey != null)
+                {
+                    entry = nextEntry((TrieEntry<K, V>)entry);
+                    toKey = entry == null ? null : entry.getKey();
+                }
+                
+                expectedModCount = PatriciaTrie.this.modCount;
+            }
+            
+            return size;
+        }
+        
+        @Override
+        public K firstKey()
+        {
+            fixup();
+            
+            Map.Entry<K,V> e = fromKey == null ? firstEntry() : higherEntry(fromKey);
+            K first = e != null ? e.getKey() : null;
+            if (e == null || !isPrefix(first, prefix))
+                throw new NoSuchElementException();
+            
+            return first;
+        }
+        
+        @Override
+        public K lastKey()
+        {
+            fixup();
+            
+            Map.Entry<K,V> e = toKey == null ? lastEntry() : lowerEntry(toKey);
+            K last = e != null ? e.getKey() : null;
+            if (e == null || !isPrefix(last, prefix))
+                throw new NoSuchElementException();
+            
+            return last;
+        }
+        
+        /**
+         * Returns true if this {@link PrefixRangeMap}'s prefix is a prefix
+         * of the provided key.
+         */
+        @Override
+        protected boolean inRange(K key)
+        {
+            return isPrefix(key, prefix);
+        }
+
+        /**
+         * Same as {@link #inRange(Object)}
+         */
+        @Override
+        protected boolean inRange2(K key)
+        {
+            return inRange(key);
+        }
+        
+        /**
+         * Returns true if the provided Key is in the FROM range
+         * of the {@link PrefixRangeMap}
+         */
+        @Override
+        protected boolean inFromRange(K key, boolean forceInclusive)
+        {
+            return isPrefix(key, prefix);
+        }
+        
+        /**
+         * Returns true if the provided Key is in the TO range
+         * of the {@link PrefixRangeMap}
+         */
+        @Override
+        protected boolean inToRange(K key, boolean forceInclusive)
+        {
+            return isPrefix(key, prefix);
+        }
+        
+        @Override
+        protected Set<Map.Entry<K, V>> createEntrySet()
+        {
+            return new PrefixRangeEntrySet(this);
+        }
+        
+        @Override
+        public K getFromKey()
+        {
+            return fromKey;
+        }
+        
+        @Override
+        public K getToKey()
+        {
+            return toKey;
+        }
+        
+        @Override
+        public boolean isFromInclusive()
+        {
+            return false;
+        }
+        
+        @Override
+        public boolean isToInclusive()
+        {
+            return false;
+        }
+        
+        @Override
+        protected SortedMap<K, V> createRangeMap(K fromKey, boolean fromInclusive,
+                                                 K toKey, boolean toInclusive)
+        {
+            return new RangeEntryMap(fromKey, fromInclusive, toKey, toInclusive);
+        }
+    }
+    
+    /**
+     * A prefix {@link RangeEntrySet} view of the {@link Trie}
+     */
+    private final class PrefixRangeEntrySet extends RangeEntrySet
+    {
+        private final PrefixRangeMap delegate;
+        
+        private TrieEntry<K, V> prefixStart;
+        
+        private int expectedModCount = -1;
+        
+        /**
+         * Creates a {@link PrefixRangeEntrySet}
+         */
+        public PrefixRangeEntrySet(PrefixRangeMap delegate)
+        {
+            super(delegate);
+            this.delegate = delegate;
+        }
+        
+        @Override
+        public int size()
+        {
+            return delegate.fixup();
+        }
+        
+        @Override
+        public Iterator<Map.Entry<K,V>> iterator()
+        {
+            if (PatriciaTrie.this.modCount != expectedModCount)
+            {
+                prefixStart = subtree(delegate.prefix);
+                expectedModCount = PatriciaTrie.this.modCount;
+            }
+            
+            if (prefixStart == null)
+            {
+                Set<Map.Entry<K,V>> empty = Collections.emptySet();
+                return empty.iterator();
+            }
+            else if (lengthInBits(delegate.prefix) >= prefixStart.bitIndex)
+            {
+                return new SingletonIterator(prefixStart);
+            }
+            else
+            {
+                return new EntryIterator(prefixStart, delegate.prefix);
+            }
+        }
+        
+        /** 
+         * An {@link Iterator} that holds a single {@link TrieEntry}. 
+         */
+        private final class SingletonIterator implements Iterator<Map.Entry<K, V>>
+        {
+            private final TrieEntry<K, V> entry;
+            
+            private int hit = 0;
+            
+            public SingletonIterator(TrieEntry<K, V> entry)
+            {
+                this.entry = entry;
+            }
+            
+            @Override
+            public boolean hasNext()
+            {
+                return hit == 0;
+            }
+            
+            @Override
+            public Map.Entry<K, V> next()
+            {
+                if (hit != 0)
+                    throw new NoSuchElementException();
+                
+                ++hit;
+                return entry;
+            }
+
+            
+            @Override
+            public void remove()
+            {
+                if (hit != 1)
+                    throw new IllegalStateException();
+                
+                ++hit;
+                PatriciaTrie.this.removeEntry(entry);
+            }
+        }
+        
+        /** 
+         * An {@link Iterator} for iterating over a prefix search. 
+         */
+        private final class EntryIterator extends TrieIterator<Map.Entry<K, V>>
+        {
+            // values to reset the subtree if we remove it.
+            protected final K prefix;
+            protected boolean lastOne;
+            
+            protected TrieEntry<K, V> subtree; // the subtree to search within
+            
+            /**
+             * Starts iteration at the given entry and searches only
+             * within the given subtree.
+             */
+            EntryIterator(TrieEntry<K, V> startScan, K prefix)
+            {
+                subtree = startScan;
+                next = PatriciaTrie.this.followLeft(startScan);
+                this.prefix = prefix;
+            }
+            
+            @Override
+            public Map.Entry<K,V> next()
+            {
+                Map.Entry<K, V> entry = nextEntry();
+                if (lastOne)
+                    next = null;
+                return entry;
+            }
+            
+            @Override
+            protected TrieEntry<K, V> findNext(TrieEntry<K, V> prior)
+            {
+                return PatriciaTrie.this.nextEntryInSubtree(prior, subtree);
+            }
+            
+            @Override
+            public void remove()
+            {
+                // If the current entry we're removing is the subtree
+                // then we need to find a new subtree parent.
+                boolean needsFixing = false;
+                int bitIdx = subtree.bitIndex;
+                if (current == subtree)
+                    needsFixing = true;
+                
+                super.remove();
+                
+                // If the subtree changed its bitIndex or we
+                // removed the old subtree, get a new one.
+                if (bitIdx != subtree.bitIndex || needsFixing)
+                    subtree = subtree(prefix);
+
+                // If the subtree's bitIndex is not greater than the
+                // length of our prefix, it's the last item
+                // in the prefix tree.
+                if (lengthInBits(prefix) >= subtree.bitIndex)
+                    lastOne = true;
+            }
+        }
+    }
+}
\ No newline at end of file
diff --git a/src/java/org/apache/cassandra/index/sasi/utils/trie/Trie.java b/src/java/org/apache/cassandra/index/sasi/utils/trie/Trie.java
new file mode 100644
index 0000000..44809f3
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/utils/trie/Trie.java
@@ -0,0 +1,152 @@
+/*
+ * Copyright 2005-2010 Roger Kapsi, Sam Berlin
+ *
+ *   Licensed under the Apache License, Version 2.0 (the "License");
+ *   you may not use this file except in compliance with the License.
+ *   You may obtain a copy of the License at
+ *
+ *       http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *   Unless required by applicable law or agreed to in writing, software
+ *   distributed under the License is distributed on an "AS IS" BASIS,
+ *   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *   See the License for the specific language governing permissions and
+ *   limitations under the License.
+ */
+
+package org.apache.cassandra.index.sasi.utils.trie;
+
+import java.util.Map;
+import java.util.SortedMap;
+
+import org.apache.cassandra.index.sasi.utils.trie.Cursor.Decision;
+
+/**
+ * This class is taken from https://github.com/rkapsi/patricia-trie (v0.6) and slightly modified
+ * to match Cassandra code style. It is the only Patricia Trie implementation that supports
+ * pluggable key comparators (e.g. commons-collections PatriciaTrie, which is based on the
+ * rkapsi/patricia-trie project, only supports String keys), but unfortunately it is not
+ * published to Maven Central as a downloadable artifact.
+ */
+
+/**
+ * Defines the interface for a prefix tree, an ordered tree data structure. For 
+ * more information, see <a href="http://en.wikipedia.org/wiki/Trie">Tries</a>.
+ * 
+ * @author Roger Kapsi
+ * @author Sam Berlin
+ */
+public interface Trie<K, V> extends SortedMap<K, V>
+{
+    /**
+     * Returns the {@link Map.Entry} whose key is closest in a bitwise XOR 
+     * metric to the given key. This is NOT lexicographic closeness.
+     * For example, given the keys:
+     *
+     * <ol>
+     * <li>D = 1000100
+     * <li>H = 1001000
+     * <li>L = 1001100
+     * </ol>
+     * 
+     * If the {@link Trie} contained 'H' and 'L', a lookup of 'D' would 
+     * return 'L', because the XOR distance between D &amp; L is smaller 
+     * than the XOR distance between D &amp; H. 
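+     * (Concretely: D XOR L = 0001000 = 8, while D XOR H = 0001100 = 12,
+     * so 'L' is the closer key under the XOR metric.)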
+     * 
+     * @return The {@link Map.Entry} whose key is closest in a bitwise XOR metric
+     * to the provided key.
+     */
+    Map.Entry<K, V> select(K key);
+    
+    /**
+     * Returns the key that is closest in a bitwise XOR metric to the 
+     * provided key. This is NOT lexicographic closeness!
+     * 
+     * For example, given the keys:
+     * 
+     * <ol>
+     * <li>D = 1000100
+     * <li>H = 1001000
+     * <li>L = 1001100
+     * </ol>
+     * 
+     * If the {@link Trie} contained 'H' and 'L', a lookup of 'D' would 
+     * return 'L', because the XOR distance between D &amp; L is smaller 
+     * than the XOR distance between D &amp; H. 
+     * 
+     * @return The key that is closest in a bitwise XOR metric to the provided key.
+     */
+    @SuppressWarnings("unused")
+    K selectKey(K key);
+    
+    /**
+     * Returns the value whose key is closest in a bitwise XOR metric to 
+     * the provided key. This is NOT lexicographic closeness!
+     * 
+     * For example, given the keys:
+     * 
+     * <ol>
+     * <li>D = 1000100
+     * <li>H = 1001000
+     * <li>L = 1001100
+     * </ol>
+     * 
+     * If the {@link Trie} contained 'H' and 'L', a lookup of 'D' would 
+     * return 'L', because the XOR distance between D &amp; L is smaller 
+     * than the XOR distance between D &amp; H. 
+     * 
+     * @return The value whose key is closest in a bitwise XOR metric
+     * to the provided key.
+     */
+    @SuppressWarnings("unused")
+    V selectValue(K key);
+    
+    /**
+     * Iterates through the {@link Trie}, starting with the entry whose bitwise
+     * value is closest in an XOR metric to the given key. After the closest
+     * entry is found, the {@link Trie} will call select on that entry and continue
+     * calling select for each entry (traversing in order of XOR closeness,
+     * NOT lexicographically) until the cursor returns {@link Decision#EXIT}.
+     * 
+     * <p>The cursor can return {@link Decision#CONTINUE} to continue traversing.
+     * 
+     * <p>{@link Decision#REMOVE_AND_EXIT} is used to remove the current element
+     * and stop traversing.
+     * 
+     * <p>Note: The {@link Decision#REMOVE} operation is not supported.
+     * 
+     * @return The entry the cursor returned {@link Decision#EXIT} on, or null 
+     * if it continued till the end.
+     */
+    Map.Entry<K,V> select(K key, Cursor<? super K, ? super V> cursor);
+    
+    /**
+     * Traverses the {@link Trie} in lexicographical order. 
+     * {@link Cursor#select(java.util.Map.Entry)} will be called on each entry.
+     * 
+     * <p>The traversal will stop when the cursor returns {@link Decision#EXIT}, 
+     * {@link Decision#CONTINUE} is used to continue traversing and 
+     * {@link Decision#REMOVE} is used to remove the element that was selected 
+     * and continue traversing.
+     * 
+     * <p>{@link Decision#REMOVE_AND_EXIT} is used to remove the current element
+     * and stop traversing.
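+     *
+     * <p>A minimal sketch (illustrative only; assumes a {@code Trie<K, V> trie} instance
+     * and that {@link Cursor} declares a single {@code select(Map.Entry)} method
+     * returning a {@link Decision}):
+     * <pre>
+     * int[] count = { 0 };
+     * trie.traverse(entry -> { count[0]++; return Decision.CONTINUE; });
+     * // count[0] now holds the number of entries visited, in lexicographical order
+     * </pre>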
+     *   
+     * @return The entry the cursor returned {@link Decision#EXIT} on, or null 
+     * if it continued till the end.
+     */
+    Map.Entry<K,V> traverse(Cursor<? super K, ? super V> cursor);
+    
+    /**
+     * Returns a view of this {@link Trie} of all elements that are prefixed 
+     * by the given key.
+     * 
+     * <p>In a {@link Trie} with fixed size keys, this is essentially a 
+     * {@link #get(Object)} operation.
+     * 
+     * <p>For example, if the {@link Trie} contains 'Anna', 'Anael', 
+     * 'Analu', 'Andreas', 'Andrea', 'Andres', and 'Anatole', then
+     * a lookup of 'And' would return 'Andreas', 'Andrea', and 'Andres'.
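+     *
+     * <p>A minimal usage sketch (illustrative only, not part of the original API docs),
+     * assuming a {@code Trie<String, Integer> trie} built with a String {@link KeyAnalyzer}:
+     * <pre>
+     * trie.put("Anna", 1);
+     * trie.put("Andreas", 2);
+     * trie.put("Andres", 3);
+     * SortedMap&lt;String, Integer&gt; view = trie.prefixMap("And");
+     * // view now contains "Andreas" and "Andres", but not "Anna"
+     * </pre>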
+     */
+    SortedMap<K, V> prefixMap(K prefix);
+}
diff --git a/src/java/org/apache/cassandra/index/sasi/utils/trie/Tries.java b/src/java/org/apache/cassandra/index/sasi/utils/trie/Tries.java
new file mode 100644
index 0000000..c258dd2
--- /dev/null
+++ b/src/java/org/apache/cassandra/index/sasi/utils/trie/Tries.java
@@ -0,0 +1,95 @@
+/*
+ * Copyright 2005-2010 Roger Kapsi
+ *
+ *   Licensed under the Apache License, Version 2.0 (the "License");
+ *   you may not use this file except in compliance with the License.
+ *   You may obtain a copy of the License at
+ *
+ *       http://www.apache.org/licenses/LICENSE-2.0
+ *
+ *   Unless required by applicable law or agreed to in writing, software
+ *   distributed under the License is distributed on an "AS IS" BASIS,
+ *   WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ *   See the License for the specific language governing permissions and
+ *   limitations under the License.
+ */
+
+/**
+ * This class is taken from https://github.com/rkapsi/patricia-trie (v0.6) and slightly modified
+ * to match Cassandra code style. It is the only Patricia Trie implementation that supports
+ * pluggable key comparators (e.g. commons-collections PatriciaTrie, which is based on the
+ * rkapsi/patricia-trie project, only supports String keys), but unfortunately it is not
+ * published to Maven Central as a downloadable artifact.
+ */
+
+package org.apache.cassandra.index.sasi.utils.trie;
+
+/**
+ * A collection of {@link Trie} utilities
+ */
+public class Tries
+{
+    /** 
+     * Returns true if bitIndex is a {@link KeyAnalyzer#OUT_OF_BOUNDS_BIT_KEY}
+     */
+    static boolean isOutOfBoundsIndex(int bitIndex)
+    {
+        return bitIndex == KeyAnalyzer.OUT_OF_BOUNDS_BIT_KEY;
+    }
+
+    /** 
+     * Returns true if bitIndex is a {@link KeyAnalyzer#EQUAL_BIT_KEY}
+     */
+    static boolean isEqualBitKey(int bitIndex)
+    {
+        return bitIndex == KeyAnalyzer.EQUAL_BIT_KEY;
+    }
+
+    /** 
+     * Returns true if bitIndex is a {@link KeyAnalyzer#NULL_BIT_KEY} 
+     */
+    static boolean isNullBitKey(int bitIndex)
+    {
+        return bitIndex == KeyAnalyzer.NULL_BIT_KEY;
+    }
+
+    /** 
+     * Returns true if the given bitIndex is valid. Indices 
+     * are considered valid if they're between 0 and 
+     * {@link Integer#MAX_VALUE}
+     */
+    static boolean isValidBitIndex(int bitIndex)
+    {
+        return 0 <= bitIndex;
+    }
+
+    /**
+     * Returns true if both values are either null or equal
+     */
+    static boolean areEqual(Object a, Object b)
+    {
+        return (a == null ? b == null : a.equals(b));
+    }
+
+    /**
+     * Throws a {@link NullPointerException} with the given message if 
+     * the argument is null.
+     */
+    static <T> T notNull(T o, String message)
+    {
+        if (o == null)
+            throw new NullPointerException(message);
+
+        return o;
+    }
+
+    /**
+     * A utility method to cast keys. It actually doesn't
+     * cast anything. It's just fooling the compiler!
+     */
+    @SuppressWarnings("unchecked")
+    static <K> K cast(Object key)
+    {
+        return (K)key;
+    }
+}
diff --git a/src/java/org/apache/cassandra/index/transactions/CleanupTransaction.java b/src/java/org/apache/cassandra/index/transactions/CleanupTransaction.java
index 1d6ba56..29ee14c 100644
--- a/src/java/org/apache/cassandra/index/transactions/CleanupTransaction.java
+++ b/src/java/org/apache/cassandra/index/transactions/CleanupTransaction.java
@@ -26,7 +26,7 @@
  *
  * Notifies registered indexers of each partition being removed and
  *
- * Compaction & Cleanup are somewhat simpler than dealing with incoming writes,
+ * Compaction and Cleanup are somewhat simpler than dealing with incoming writes,
  * being only concerned with cleaning up stale index entries.
  *
  * When multiple versions of a row are compacted, the CleanupTransaction is
diff --git a/src/java/org/apache/cassandra/index/transactions/IndexTransaction.java b/src/java/org/apache/cassandra/index/transactions/IndexTransaction.java
index 3fb8235..3d4b7e2 100644
--- a/src/java/org/apache/cassandra/index/transactions/IndexTransaction.java
+++ b/src/java/org/apache/cassandra/index/transactions/IndexTransaction.java
@@ -26,7 +26,7 @@
  *   Used on the regular write path and when indexing newly acquired SSTables from streaming or sideloading. This type
  *   of transaction may include both row inserts and updates to rows previously existing in the base Memtable. Instances
  *   are scoped to a single partition update and are obtained from the factory method
- *   @{code SecondaryIndexManager#newUpdateTransaction}
+ *   {@code SecondaryIndexManager#newUpdateTransaction}
  *
  * * {@code CompactionTransaction}
  *   Used during compaction when stale entries which have been superceded are cleaned up from the index. As rows in a
diff --git a/src/java/org/apache/cassandra/index/transactions/UpdateTransaction.java b/src/java/org/apache/cassandra/index/transactions/UpdateTransaction.java
index c78304a..51533c2 100644
--- a/src/java/org/apache/cassandra/index/transactions/UpdateTransaction.java
+++ b/src/java/org/apache/cassandra/index/transactions/UpdateTransaction.java
@@ -53,7 +53,7 @@
  * onInserted(row)*              -- called for each Row not already present in the Memtable
  * onUpdated(existing, updated)* -- called for any Row in the update for where a version was already present
  *                                  in the Memtable. It's important to note here that existing is the previous
- *                                  row from the Memtable & updated is the final version replacing it. It is
+ *                                  row from the Memtable and updated is the final version replacing it. It is
  *                                  *not* the incoming row, but the result of merging the incoming and existing
  *                                  rows.
  * commit()                      -- finally, finish is called when the new Partition is swapped into the Memtable
diff --git a/src/java/org/apache/cassandra/io/ISerializer.java b/src/java/org/apache/cassandra/io/ISerializer.java
index 562d226..637a1c7 100644
--- a/src/java/org/apache/cassandra/io/ISerializer.java
+++ b/src/java/org/apache/cassandra/io/ISerializer.java
@@ -43,4 +43,9 @@
     public T deserialize(DataInputPlus in) throws IOException;
 
     public long serializedSize(T t);
+
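+    /**
+     * Skip a serialized value in the given input. The default implementation simply
+     * deserializes the value and discards it; implementations may override this with
+     * a cheaper skip.
+     */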
+    public default void skip(DataInputPlus in) throws IOException
+    {
+        deserialize(in);
+    }
 }
diff --git a/src/java/org/apache/cassandra/io/compress/CompressedRandomAccessReader.java b/src/java/org/apache/cassandra/io/compress/CompressedRandomAccessReader.java
deleted file mode 100644
index 329d932..0000000
--- a/src/java/org/apache/cassandra/io/compress/CompressedRandomAccessReader.java
+++ /dev/null
@@ -1,286 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.cassandra.io.compress;
-
-import java.io.*;
-import java.nio.ByteBuffer;
-import java.util.concurrent.ThreadLocalRandom;
-import java.util.zip.Checksum;
-import java.util.function.Supplier;
-
-import com.google.common.annotations.VisibleForTesting;
-import com.google.common.primitives.Ints;
-
-import org.apache.cassandra.io.FSReadError;
-import org.apache.cassandra.io.sstable.CorruptSSTableException;
-import org.apache.cassandra.io.util.*;
-import org.apache.cassandra.utils.memory.BufferPool;
-
-/**
- * CRAR extends RAR to transparently uncompress blocks from the file into RAR.buffer.  Most of the RAR
- * "read bytes from the buffer, rebuffering when necessary" machinery works unchanged after that.
- */
-public class CompressedRandomAccessReader extends RandomAccessReader
-{
-    private final CompressionMetadata metadata;
-
-    // we read the raw compressed bytes into this buffer, then move the uncompressed ones into super.buffer.
-    private ByteBuffer compressed;
-
-    // re-use single crc object
-    private final Checksum checksum;
-
-    // raw checksum bytes
-    private ByteBuffer checksumBytes;
-
-    @VisibleForTesting
-    public double getCrcCheckChance()
-    {
-        return metadata.parameters.getCrcCheckChance();
-    }
-
-    protected CompressedRandomAccessReader(Builder builder)
-    {
-        super(builder);
-        this.metadata = builder.metadata;
-        this.checksum = metadata.checksumType.newInstance();
-
-        if (regions == null)
-        {
-            compressed = allocateBuffer(metadata.compressor().initialCompressedBufferLength(metadata.chunkLength()), bufferType);
-            checksumBytes = ByteBuffer.wrap(new byte[4]);
-        }
-    }
-
-    @Override
-    protected void releaseBuffer()
-    {
-        try
-        {
-            if (buffer != null)
-            {
-                BufferPool.put(buffer);
-                buffer = null;
-            }
-        }
-        finally
-        {
-            // this will always be null if using mmap access mode (unlike in parent, where buffer is set to a region)
-            if (compressed != null)
-            {
-                BufferPool.put(compressed);
-                compressed = null;
-            }
-        }
-    }
-
-    @Override
-    protected void reBufferStandard()
-    {
-        try
-        {
-            long position = current();
-            assert position < metadata.dataLength;
-
-            CompressionMetadata.Chunk chunk = metadata.chunkFor(position);
-
-            if (compressed.capacity() < chunk.length)
-            {
-                BufferPool.put(compressed);
-                compressed = allocateBuffer(chunk.length, bufferType);
-            }
-            else
-            {
-                compressed.clear();
-            }
-
-            compressed.limit(chunk.length);
-            if (channel.read(compressed, chunk.offset) != chunk.length)
-                throw new CorruptBlockException(getPath(), chunk);
-
-            compressed.flip();
-            buffer.clear();
-
-            try
-            {
-                metadata.compressor().uncompress(compressed, buffer);
-            }
-            catch (IOException e)
-            {
-                throw new CorruptBlockException(getPath(), chunk);
-            }
-            finally
-            {
-                buffer.flip();
-            }
-
-            if (getCrcCheckChance() > ThreadLocalRandom.current().nextDouble())
-            {
-                compressed.rewind();
-                metadata.checksumType.update( checksum, (compressed));
-
-                if (checksum(chunk) != (int) checksum.getValue())
-                    throw new CorruptBlockException(getPath(), chunk);
-
-                // reset checksum object back to the original (blank) state
-                checksum.reset();
-            }
-
-            // buffer offset is always aligned
-            bufferOffset = position & ~(buffer.capacity() - 1);
-            buffer.position((int) (position - bufferOffset));
-            // the length() can be provided at construction time, to override the true (uncompressed) length of the file;
-            // this is permitted to occur within a compressed segment, so we truncate validBufferBytes if we cross the imposed length
-            if (bufferOffset + buffer.limit() > length())
-                buffer.limit((int)(length() - bufferOffset));
-        }
-        catch (CorruptBlockException e)
-        {
-            throw new CorruptSSTableException(e, getPath());
-        }
-        catch (IOException e)
-        {
-            throw new FSReadError(e, getPath());
-        }
-    }
-
-    @Override
-    protected void reBufferMmap()
-    {
-        try
-        {
-            long position = current();
-            assert position < metadata.dataLength;
-
-            CompressionMetadata.Chunk chunk = metadata.chunkFor(position);
-
-            MmappedRegions.Region region = regions.floor(chunk.offset);
-            long segmentOffset = region.bottom();
-            int chunkOffset = Ints.checkedCast(chunk.offset - segmentOffset);
-            ByteBuffer compressedChunk = region.buffer.duplicate(); // TODO: change to slice(chunkOffset) when we upgrade LZ4-java
-
-            compressedChunk.position(chunkOffset).limit(chunkOffset + chunk.length);
-
-            buffer.clear();
-
-            try
-            {
-                metadata.compressor().uncompress(compressedChunk, buffer);
-            }
-            catch (IOException e)
-            {
-                throw new CorruptBlockException(getPath(), chunk);
-            }
-            finally
-            {
-                buffer.flip();
-            }
-
-            if (getCrcCheckChance() > ThreadLocalRandom.current().nextDouble())
-            {
-                compressedChunk.position(chunkOffset).limit(chunkOffset + chunk.length);
-
-                metadata.checksumType.update( checksum, compressedChunk);
-
-                compressedChunk.limit(compressedChunk.capacity());
-                if (compressedChunk.getInt() != (int) checksum.getValue())
-                    throw new CorruptBlockException(getPath(), chunk);
-
-                // reset checksum object back to the original (blank) state
-                checksum.reset();
-            }
-
-            // buffer offset is always aligned
-            bufferOffset = position & ~(buffer.capacity() - 1);
-            buffer.position((int) (position - bufferOffset));
-            // the length() can be provided at construction time, to override the true (uncompressed) length of the file;
-            // this is permitted to occur within a compressed segment, so we truncate validBufferBytes if we cross the imposed length
-            if (bufferOffset + buffer.limit() > length())
-                buffer.limit((int)(length() - bufferOffset));
-        }
-        catch (CorruptBlockException e)
-        {
-            throw new CorruptSSTableException(e, getPath());
-        }
-
-    }
-
-    private int checksum(CompressionMetadata.Chunk chunk) throws IOException
-    {
-        long position = chunk.offset + chunk.length;
-        checksumBytes.clear();
-        if (channel.read(checksumBytes, position) != checksumBytes.capacity())
-            throw new CorruptBlockException(getPath(), chunk);
-        return checksumBytes.getInt(0);
-    }
-
-    @Override
-    public long length()
-    {
-        return metadata.dataLength;
-    }
-
-    @Override
-    public String toString()
-    {
-        return String.format("%s - chunk length %d, data length %d.", getPath(), metadata.chunkLength(), metadata.dataLength);
-    }
-
-    public final static class Builder extends RandomAccessReader.Builder
-    {
-        private final CompressionMetadata metadata;
-
-        public Builder(ICompressedFile file)
-        {
-            super(file.channel());
-            this.metadata = applyMetadata(file.getMetadata());
-            this.regions = file.regions();
-        }
-
-        public Builder(ChannelProxy channel, CompressionMetadata metadata)
-        {
-            super(channel);
-            this.metadata = applyMetadata(metadata);
-        }
-
-        private CompressionMetadata applyMetadata(CompressionMetadata metadata)
-        {
-            this.overrideLength = metadata.compressedFileLength;
-            this.bufferSize = metadata.chunkLength();
-            this.bufferType = metadata.compressor().preferredBufferType();
-
-            assert Integer.bitCount(this.bufferSize) == 1; //must be a power of two
-
-            return metadata;
-        }
-
-        @Override
-        protected ByteBuffer createBuffer()
-        {
-            buffer = allocateBuffer(bufferSize, bufferType);
-            buffer.limit(0);
-            return buffer;
-        }
-
-        @Override
-        public RandomAccessReader build()
-        {
-            return new CompressedRandomAccessReader(this);
-        }
-    }
-}
diff --git a/src/java/org/apache/cassandra/io/compress/CompressedSequentialWriter.java b/src/java/org/apache/cassandra/io/compress/CompressedSequentialWriter.java
index 9bd1145..9bdb1b4 100644
--- a/src/java/org/apache/cassandra/io/compress/CompressedSequentialWriter.java
+++ b/src/java/org/apache/cassandra/io/compress/CompressedSequentialWriter.java
@@ -17,29 +17,27 @@
  */
 package org.apache.cassandra.io.compress;
 
-import static org.apache.cassandra.utils.Throwables.merge;
-
 import java.io.DataOutputStream;
 import java.io.EOFException;
 import java.io.File;
 import java.io.IOException;
 import java.nio.ByteBuffer;
 import java.nio.channels.Channels;
+import java.util.Optional;
 import java.util.zip.CRC32;
 
 import org.apache.cassandra.io.FSReadError;
 import org.apache.cassandra.io.FSWriteError;
 import org.apache.cassandra.io.sstable.CorruptSSTableException;
 import org.apache.cassandra.io.sstable.metadata.MetadataCollector;
-import org.apache.cassandra.io.util.DataIntegrityMetadata;
-import org.apache.cassandra.io.util.DataPosition;
-import org.apache.cassandra.io.util.FileUtils;
-import org.apache.cassandra.io.util.SequentialWriter;
+import org.apache.cassandra.io.util.*;
 import org.apache.cassandra.schema.CompressionParams;
 
+import static org.apache.cassandra.utils.Throwables.merge;
+
 public class CompressedSequentialWriter extends SequentialWriter
 {
-    private final DataIntegrityMetadata.ChecksumWriter crcMetadata;
+    private final ChecksumWriter crcMetadata;
 
     // holds offset in the file where current chunk should be written
     // changed only by flush() method where data buffer gets compressed and stored to the file
@@ -60,14 +58,34 @@
     private final MetadataCollector sstableMetadataCollector;
 
     private final ByteBuffer crcCheckBuffer = ByteBuffer.allocate(4);
+    private final Optional<File> digestFile;
 
+    /**
+     * Create a CompressedSequentialWriter.
+     *
+     * @param file File to write
+     * @param offsetsPath File name to write compression metadata
+     * @param digestFile File to write digest to (may be null)
+     * @param option Write option (buffer size and type will be set the same as compression params)
+     * @param parameters Compression parameters
+     * @param sstableMetadataCollector Metadata collector
+     */
     public CompressedSequentialWriter(File file,
                                       String offsetsPath,
+                                      File digestFile,
+                                      SequentialWriterOption option,
                                       CompressionParams parameters,
                                       MetadataCollector sstableMetadataCollector)
     {
-        super(file, parameters.chunkLength(), parameters.getSstableCompressor().preferredBufferType());
+        super(file, SequentialWriterOption.newBuilder()
+                            .bufferSize(option.bufferSize())
+                            .bufferType(option.bufferType())
+                            .bufferSize(parameters.chunkLength())
+                            .bufferType(parameters.getSstableCompressor().preferredBufferType())
+                            .finishOnClose(option.finishOnClose())
+                            .build());
         this.compressor = parameters.getSstableCompressor();
+        this.digestFile = Optional.ofNullable(digestFile);
 
         // buffer for compression should be the same size as buffer itself
         compressed = compressor.preferredBufferType().allocate(compressor.initialCompressedBufferLength(buffer.capacity()));
@@ -76,7 +94,7 @@
         metadataWriter = CompressionMetadata.Writer.open(parameters, offsetsPath);
 
         this.sstableMetadataCollector = sstableMetadataCollector;
-        crcMetadata = new DataIntegrityMetadata.ChecksumWriter(new DataOutputStream(Channels.newOutputStream(channel)));
+        crcMetadata = new ChecksumWriter(new DataOutputStream(Channels.newOutputStream(channel)));
     }
 
     @Override
@@ -92,6 +110,17 @@
         }
     }
 
+    /**
+     * Get a quick estimate of how many bytes have been written to disk.
+     *
+     * For the most part it should be exactly the same as getOnDiskFilePointer().
+     */
+    @Override
+    public long getEstimatedOnDiskBytesWritten()
+    {
+        return chunkOffset;
+    }
+
     @Override
     public void flush()
     {
@@ -276,8 +305,7 @@
         protected void doPrepare()
         {
             syncInternal();
-            if (descriptor != null)
-                crcMetadata.writeFullChecksum(descriptor);
+            digestFile.ifPresent(crcMetadata::writeFullChecksum);
             sstableMetadataCollector.addCompressionRatio(compressedSize, uncompressedSize);
             metadataWriter.finalizeLength(current(), chunkCount).prepareToCommit();
         }
diff --git a/src/java/org/apache/cassandra/io/compress/ICompressor.java b/src/java/org/apache/cassandra/io/compress/ICompressor.java
index 5719834..40dc7c2 100644
--- a/src/java/org/apache/cassandra/io/compress/ICompressor.java
+++ b/src/java/org/apache/cassandra/io/compress/ICompressor.java
@@ -49,7 +49,7 @@
     public BufferType preferredBufferType();
 
     /**
-     * Checks if the given buffer would be supported by the compressor. If a type is supported the compressor must be
+     * Checks if the given buffer would be supported by the compressor. If a type is supported, the compressor must be
      * able to use it in combination with all other supported types.
      *
      * Direct and memory-mapped buffers must be supported by all compressors.
diff --git a/src/java/org/apache/cassandra/io/compress/LZ4Compressor.java b/src/java/org/apache/cassandra/io/compress/LZ4Compressor.java
index 3a3b024..3fd889e 100644
--- a/src/java/org/apache/cassandra/io/compress/LZ4Compressor.java
+++ b/src/java/org/apache/cassandra/io/compress/LZ4Compressor.java
@@ -23,31 +23,80 @@
 import java.util.HashSet;
 import java.util.Map;
 import java.util.Set;
+import java.util.concurrent.ConcurrentHashMap;
 
 import com.google.common.annotations.VisibleForTesting;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
 import net.jpountz.lz4.LZ4Exception;
 import net.jpountz.lz4.LZ4Factory;
-import org.apache.cassandra.schema.CompressionParams;
+import org.apache.cassandra.exceptions.ConfigurationException;
+import org.apache.cassandra.utils.Pair;
 
 public class LZ4Compressor implements ICompressor
 {
+    private static final Logger logger = LoggerFactory.getLogger(LZ4Compressor.class);
+
+    public static final String LZ4_FAST_COMPRESSOR = "fast";
+    public static final String LZ4_HIGH_COMPRESSOR = "high";
+    private static final Set<String> VALID_COMPRESSOR_TYPES = new HashSet<>(Arrays.asList(LZ4_FAST_COMPRESSOR, LZ4_HIGH_COMPRESSOR));
+
+    private static final int DEFAULT_HIGH_COMPRESSION_LEVEL = 9;
+    private static final String DEFAULT_LZ4_COMPRESSOR_TYPE = LZ4_FAST_COMPRESSOR;
+
+    public static final String LZ4_HIGH_COMPRESSION_LEVEL = "lz4_high_compressor_level";
+    public static final String LZ4_COMPRESSOR_TYPE = "lz4_compressor_type";
+
     private static final int INTEGER_BYTES = 4;
 
-    @VisibleForTesting
-    public static final LZ4Compressor instance = new LZ4Compressor();
+    private static final ConcurrentHashMap<Pair<String, Integer>, LZ4Compressor> instances = new ConcurrentHashMap<>();
 
-    public static LZ4Compressor create(Map<String, String> args)
+    public static LZ4Compressor create(Map<String, String> args) throws ConfigurationException
     {
+        String compressorType = validateCompressorType(args.get(LZ4_COMPRESSOR_TYPE));
+        Integer compressionLevel = validateCompressionLevel(args.get(LZ4_HIGH_COMPRESSION_LEVEL));
+
+        Pair<String, Integer> compressorTypeAndLevel = Pair.create(compressorType, compressionLevel);
+        LZ4Compressor instance = instances.get(compressorTypeAndLevel);
+        if (instance == null)
+        {
+            if (compressorType.equals(LZ4_FAST_COMPRESSOR) && args.get(LZ4_HIGH_COMPRESSION_LEVEL) != null)
+                logger.warn("'{}' parameter is ignored when '{}' is '{}'", LZ4_HIGH_COMPRESSION_LEVEL, LZ4_COMPRESSOR_TYPE, LZ4_FAST_COMPRESSOR);
+            instance = new LZ4Compressor(compressorType, compressionLevel);
+            LZ4Compressor instanceFromMap = instances.putIfAbsent(compressorTypeAndLevel, instance);
+            if (instanceFromMap != null)
+                instance = instanceFromMap;
+        }
         return instance;
     }
 
     private final net.jpountz.lz4.LZ4Compressor compressor;
     private final net.jpountz.lz4.LZ4FastDecompressor decompressor;
+    @VisibleForTesting
+    final String compressorType;
+    @VisibleForTesting
+    final Integer compressionLevel;
 
-    private LZ4Compressor()
+    private LZ4Compressor(String type, Integer compressionLevel)
     {
+        this.compressorType = type;
+        this.compressionLevel = compressionLevel;
         final LZ4Factory lz4Factory = LZ4Factory.fastestInstance();
-        compressor = lz4Factory.fastCompressor();
+        switch (type)
+        {
+            case LZ4_HIGH_COMPRESSOR:
+            {
+                compressor = lz4Factory.highCompressor(compressionLevel);
+                break;
+            }
+            case LZ4_FAST_COMPRESSOR:
+            default:
+            {
+                compressor = lz4Factory.fastCompressor();
+            }
+        }
+
         decompressor = lz4Factory.fastDecompressor();
     }
 
@@ -127,7 +176,50 @@
 
     public Set<String> supportedOptions()
     {
-        return new HashSet<>();
+        return new HashSet<>(Arrays.asList(LZ4_HIGH_COMPRESSION_LEVEL, LZ4_COMPRESSOR_TYPE));
+    }
+
+    public static String validateCompressorType(String compressorType) throws ConfigurationException
+    {
+        if (compressorType == null)
+            return DEFAULT_LZ4_COMPRESSOR_TYPE;
+
+        if (!VALID_COMPRESSOR_TYPES.contains(compressorType))
+        {
+            throw new ConfigurationException(String.format("Invalid compressor type '%s' specified for LZ4 parameter '%s'. "
+                                                           + "Valid options are %s.", compressorType, LZ4_COMPRESSOR_TYPE,
+                                                           VALID_COMPRESSOR_TYPES.toString()));
+        }
+        else
+        {
+            return compressorType;
+        }
+    }
+
+    public static Integer validateCompressionLevel(String compressionLevel) throws ConfigurationException
+    {
+        if (compressionLevel == null)
+            return DEFAULT_HIGH_COMPRESSION_LEVEL;
+
+        ConfigurationException ex = new ConfigurationException("Invalid value [" + compressionLevel + "] for parameter '"
+                                                                 + LZ4_HIGH_COMPRESSION_LEVEL + "'. Value must be between 1 and 17.");
+
+        Integer level;
+        try
+        {
+            level = Integer.parseInt(compressionLevel);
+        }
+        catch (NumberFormatException e)
+        {
+            throw ex;
+        }
+
+        if (level < 1 || level > 17)
+        {
+            throw ex;
+        }
+
+        return level;
     }
 
     public BufferType preferredBufferType()
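
A hedged sketch of the new options in use; the keys and the create() factory are exactly the ones added above, while the concrete values (and the usual java.util imports) are assumed for the example. In schema terms these keys would presumably be supplied through the table's compression options map.

    // Illustrative only.
    Map<String, String> options = new HashMap<>();
    options.put(LZ4Compressor.LZ4_COMPRESSOR_TYPE, LZ4Compressor.LZ4_HIGH_COMPRESSOR);   // "high"
    options.put(LZ4Compressor.LZ4_HIGH_COMPRESSION_LEVEL, "9");                          // valid range is 1..17

    LZ4Compressor compressor = LZ4Compressor.create(options);   // instances are cached per (type, level) pair

    // Out-of-range or non-numeric levels fail fast:
    //   LZ4Compressor.validateCompressionLevel("42")  -> ConfigurationException
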
diff --git a/src/java/org/apache/cassandra/io/sstable/AbstractSSTableSimpleWriter.java b/src/java/org/apache/cassandra/io/sstable/AbstractSSTableSimpleWriter.java
index 62348ec..0213fd5 100644
--- a/src/java/org/apache/cassandra/io/sstable/AbstractSSTableSimpleWriter.java
+++ b/src/java/org/apache/cassandra/io/sstable/AbstractSSTableSimpleWriter.java
@@ -22,6 +22,7 @@
 import java.io.IOException;
 import java.io.Closeable;
 import java.nio.ByteBuffer;
+import java.util.Collections;
 import java.util.HashSet;
 import java.util.Set;
 import java.util.concurrent.atomic.AtomicInteger;
@@ -65,7 +66,8 @@
                                        0,
                                        ActiveRepairService.UNREPAIRED_SSTABLE,
                                        0,
-                                       new SerializationHeader(true, metadata, columns, EncodingStats.NO_STATS));
+                                       new SerializationHeader(true, metadata, columns, EncodingStats.NO_STATS),
+                                       Collections.emptySet());
     }
 
     private static Descriptor createDescriptor(File directory, final String keyspace, final String columnFamily, final SSTableFormat.Type fmt)
diff --git a/src/java/org/apache/cassandra/io/sstable/CQLSSTableWriter.java b/src/java/org/apache/cassandra/io/sstable/CQLSSTableWriter.java
index 81a3356..76c0e19 100644
--- a/src/java/org/apache/cassandra/io/sstable/CQLSSTableWriter.java
+++ b/src/java/org/apache/cassandra/io/sstable/CQLSSTableWriter.java
@@ -21,28 +21,44 @@
 import java.io.File;
 import java.io.IOException;
 import java.nio.ByteBuffer;
-import java.util.*;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.SortedSet;
+import java.util.stream.Collectors;
 
-import org.apache.cassandra.config.*;
-import org.apache.cassandra.cql3.*;
-import org.apache.cassandra.cql3.statements.CFStatement;
+import com.datastax.driver.core.ProtocolVersion;
+import com.datastax.driver.core.TypeCodec;
+import org.apache.cassandra.config.CFMetaData;
+import org.apache.cassandra.config.Config;
+import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.config.Schema;
+import org.apache.cassandra.cql3.ColumnSpecification;
+import org.apache.cassandra.cql3.QueryOptions;
+import org.apache.cassandra.cql3.QueryProcessor;
+import org.apache.cassandra.cql3.UpdateParameters;
+import org.apache.cassandra.cql3.functions.UDHelper;
 import org.apache.cassandra.cql3.statements.CreateTableStatement;
+import org.apache.cassandra.cql3.statements.CreateTypeStatement;
 import org.apache.cassandra.cql3.statements.ParsedStatement;
 import org.apache.cassandra.cql3.statements.UpdateStatement;
 import org.apache.cassandra.db.Clustering;
 import org.apache.cassandra.db.DecoratedKey;
-import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.db.marshal.UserType;
 import org.apache.cassandra.db.partitions.Partition;
 import org.apache.cassandra.dht.IPartitioner;
 import org.apache.cassandra.dht.Murmur3Partitioner;
 import org.apache.cassandra.exceptions.InvalidRequestException;
 import org.apache.cassandra.exceptions.RequestValidationException;
+import org.apache.cassandra.exceptions.SyntaxException;
 import org.apache.cassandra.io.sstable.format.SSTableFormat;
 import org.apache.cassandra.schema.KeyspaceMetadata;
 import org.apache.cassandra.schema.KeyspaceParams;
-import org.apache.cassandra.schema.Tables;
 import org.apache.cassandra.schema.Types;
 import org.apache.cassandra.service.ClientState;
+import org.apache.cassandra.utils.ByteBufferUtil;
 import org.apache.cassandra.utils.Pair;
 
 /**
@@ -50,12 +66,14 @@
  * <p>
  * Typical usage looks like:
  * <pre>
+ *   String type = "CREATE TYPE myKs.myType (a int, b int)";
  *   String schema = "CREATE TABLE myKs.myTable ("
  *                 + "  k int PRIMARY KEY,"
  *                 + "  v1 text,"
- *                 + "  v2 int"
+ *                 + "  v2 int,"
+ *                 + "  v3 myType"
  *                 + ")";
- *   String insert = "INSERT INTO myKs.myTable (k, v1, v2) VALUES (?, ?, ?)";
+ *   String insert = "INSERT INTO myKs.myTable (k, v1, v2, v3) VALUES (?, ?, ?, ?)";
  *
  *   // Creates a new writer. You need to provide at least the directory where to write the created sstable,
  *   // the schema for the sstable to write and a (prepared) insert statement to use. If you do not use the
@@ -63,13 +81,15 @@
  *   // CQLSSTableWriter.Builder for more details on the available options.
  *   CQLSSTableWriter writer = CQLSSTableWriter.builder()
  *                                             .inDirectory("path/to/directory")
+ *                                             .withType(type)
  *                                             .forTable(schema)
  *                                             .using(insert).build();
  *
+ *   UserType myType = writer.getUDType("myType");
 *   // Adds a number of rows to the resulting sstable
- *   writer.addRow(0, "test1", 24);
- *   writer.addRow(1, "test2", null);
- *   writer.addRow(2, "test3", 42);
+ *   writer.addRow(0, "test1", 24, myType.newValue().setInt("a", 10).setInt("b", 20));
+ *   writer.addRow(1, "test2", null, null);
+ *   writer.addRow(2, "test3", 42, myType.newValue().setInt("a", 30).setInt("b", 40));
  *
  *   // Close the writer, finalizing the sstable
  *   writer.close();
@@ -81,6 +101,8 @@
  */
 public class CQLSSTableWriter implements Closeable
 {
+    public static final ByteBuffer UNSET_VALUE = ByteBufferUtil.UNSET_BYTE_BUFFER;
+
     static
     {
         Config.setClientMode(true);
@@ -92,12 +114,15 @@
     private final AbstractSSTableSimpleWriter writer;
     private final UpdateStatement insert;
     private final List<ColumnSpecification> boundNames;
+    private final List<TypeCodec> typeCodecs;
 
     private CQLSSTableWriter(AbstractSSTableSimpleWriter writer, UpdateStatement insert, List<ColumnSpecification> boundNames)
     {
         this.writer = writer;
         this.insert = insert;
         this.boundNames = boundNames;
+        this.typeCodecs = boundNames.stream().map(bn ->  UDHelper.codecFor(UDHelper.driverType(bn.type)))
+                                             .collect(Collectors.toList());
     }
 
     /**
@@ -145,8 +170,13 @@
     {
         int size = Math.min(values.size(), boundNames.size());
         List<ByteBuffer> rawValues = new ArrayList<>(size);
+
         for (int i = 0; i < size; i++)
-            rawValues.add(values.get(i) == null ? null : ((AbstractType)boundNames.get(i).type).decompose(values.get(i)));
+        {
+            Object value = values.get(i);
+            rawValues.add(serialize(value, typeCodecs.get(i)));
+        }
+
         return rawAddRow(rawValues);
     }
 
@@ -175,10 +205,11 @@
     {
         int size = boundNames.size();
         List<ByteBuffer> rawValues = new ArrayList<>(size);
-        for (int i = 0; i < size; i++) {
+        for (int i = 0; i < size; i++)
+        {
             ColumnSpecification spec = boundNames.get(i);
             Object value = values.get(spec.name.toString());
-            rawValues.add(value == null ? null : ((AbstractType)spec.type).decompose(value));
+            rawValues.add(serialize(value, typeCodecs.get(i)));
         }
         return rawAddRow(rawValues);
     }
@@ -270,6 +301,20 @@
     }
 
     /**
+     * Returns the User Defined type used in this SSTable Writer that can
+     * be used to create UDTValue instances.
+     *
+     * @param dataType name of the User Defined type
+     * @return user defined type
+     */
+    public com.datastax.driver.core.UserType getUDType(String dataType)
+    {
+        KeyspaceMetadata ksm = Schema.instance.getKSMetaData(insert.keyspace());
+        UserType userType = ksm.types.getNullable(ByteBufferUtil.bytes(dataType));
+        return (com.datastax.driver.core.UserType) UDHelper.driverType(userType);
+    }
+
+    /**
      * Close this writer.
      * <p>
      * This method should be called, otherwise the produced sstables are not
@@ -280,6 +325,13 @@
         writer.close();
     }
 
+    private ByteBuffer serialize(Object value, TypeCodec codec)
+    {
+        if (value == null || value == UNSET_VALUE)
+            return (ByteBuffer) value;
+
+        return codec.serialize(value, ProtocolVersion.NEWEST_SUPPORTED);
+    }
     /**
      * A Builder for a CQLSSTableWriter object.
      */
@@ -289,14 +341,17 @@
 
         protected SSTableFormat.Type formatType = null;
 
-        private CFMetaData schema;
-        private UpdateStatement insert;
-        private List<ColumnSpecification> boundNames;
+        private CreateTableStatement.RawStatement schemaStatement;
+        private final List<CreateTypeStatement> typeStatements;
+        private UpdateStatement.ParsedInsert insertStatement;
+        private IPartitioner partitioner;
 
         private boolean sorted = false;
         private long bufferSizeInMB = 128;
 
-        protected Builder() {}
+        protected Builder() {
+            this.typeStatements = new ArrayList<>();
+        }
 
         /**
          * The directory where to write the sstables.
@@ -334,6 +389,12 @@
             return this;
         }
 
+        public Builder withType(String typeDefinition) throws SyntaxException
+        {
+            typeStatements.add(parseStatement(typeDefinition, CreateTypeStatement.class, "CREATE TYPE"));
+            return this;
+        }
+
         /**
          * The schema (CREATE TABLE statement) for the table for which sstable are to be created.
          * <p>
@@ -350,52 +411,8 @@
          */
         public Builder forTable(String schema)
         {
-            try
-            {
-                synchronized (CQLSSTableWriter.class)
-                {
-                    this.schema = getTableMetadata(schema);
-
-                    // We need to register the keyspace/table metadata through Schema, otherwise we won't be able to properly
-                    // build the insert statement in using().
-                    KeyspaceMetadata ksm = Schema.instance.getKSMetaData(this.schema.ksName);
-                    if (ksm == null)
-                    {
-                        createKeyspaceWithTable(this.schema);
-                    }
-                    else if (Schema.instance.getCFMetaData(this.schema.ksName, this.schema.cfName) == null)
-                    {
-                        addTableToKeyspace(ksm, this.schema);
-                    }
-                    return this;
-                }
-            }
-            catch (RequestValidationException e)
-            {
-                throw new IllegalArgumentException(e.getMessage(), e);
-            }
-        }
-
-        /**
-         * Creates the keyspace with the specified table.
-         *
-         * @param table the table that must be created.
-         */
-        private static void createKeyspaceWithTable(CFMetaData table)
-        {
-            Schema.instance.load(KeyspaceMetadata.create(table.ksName, KeyspaceParams.simple(1), Tables.of(table)));
-        }
-
-        /**
-         * Adds the table to the to the specified keyspace.
-         *
-         * @param keyspace the keyspace to add to
-         * @param table the table to add
-         */
-        private static void addTableToKeyspace(KeyspaceMetadata keyspace, CFMetaData table)
-        {
-            Schema.instance.load(table);
-            Schema.instance.setKeyspaceMetadata(keyspace.withSwapped(keyspace.tables.with(table)));
+            this.schemaStatement = parseStatement(schema, CreateTableStatement.RawStatement.class, "CREATE TABLE");
+            return this;
         }
 
         /**
@@ -410,7 +427,7 @@
          */
         public Builder withPartitioner(IPartitioner partitioner)
         {
-            this.schema = schema.copy(partitioner);
+            this.partitioner = partitioner;
             return this;
         }
 
@@ -424,27 +441,16 @@
          * <p>
         * This is a mandatory option, and this needs to be called after forTable().
          *
-         * @param insertStatement an insertion statement that defines the order
+         * @param insert an insertion statement that defines the order
          * of column values to use.
          * @return this builder.
          *
         * @throws IllegalArgumentException if {@code insert} is not a valid insertion
         * statement, does not have a fully-qualified table name or has no bind variables.
          */
-        public Builder using(String insertStatement)
+        public Builder using(String insert)
         {
-            if (schema == null)
-                throw new IllegalStateException("You need to define the schema by calling forTable() prior to this call.");
-
-            Pair<UpdateStatement, List<ColumnSpecification>> p = getStatement(insertStatement, UpdateStatement.class, "INSERT");
-            this.insert = p.left;
-            this.boundNames = p.right;
-            if (this.insert.hasConditions())
-                throw new IllegalArgumentException("Conditional statements are not supported");
-            if (this.insert.isCounter())
-                throw new IllegalArgumentException("Counter update statements are not supported");
-            if (this.boundNames.isEmpty())
-                throw new IllegalArgumentException("Provided insert statement has no bind variables");
+            this.insertStatement = parseStatement(insert, UpdateStatement.ParsedInsert.class, "INSERT");
             return this;
         }
 
@@ -490,54 +496,111 @@
             return this;
         }
 
-        private static CFMetaData getTableMetadata(String schema)
-        {
-            CFStatement parsed = (CFStatement)QueryProcessor.parseStatement(schema);
-            // tables with UDTs are currently not supported by CQLSSTableWrite, so we just use Types.none(), for now
-            // see CASSANDRA-10624 for more details
-            CreateTableStatement statement = (CreateTableStatement) ((CreateTableStatement.RawStatement) parsed).prepare(Types.none()).statement;
-            statement.validate(ClientState.forInternalCalls());
-            return statement.getCFMetaData();
-        }
-
-        private static <T extends CQLStatement> Pair<T, List<ColumnSpecification>> getStatement(String query, Class<T> klass, String type)
-        {
-            try
-            {
-                ClientState state = ClientState.forInternalCalls();
-                ParsedStatement.Prepared prepared = QueryProcessor.getStatement(query, state);
-                CQLStatement stmt = prepared.statement;
-                stmt.validate(state);
-
-                if (!stmt.getClass().equals(klass))
-                    throw new IllegalArgumentException("Invalid query, must be a " + type + " statement");
-
-                return Pair.create(klass.cast(stmt), prepared.boundNames);
-            }
-            catch (RequestValidationException e)
-            {
-                throw new IllegalArgumentException(e.getMessage(), e);
-            }
-        }
-
         @SuppressWarnings("resource")
         public CQLSSTableWriter build()
         {
             if (directory == null)
                throw new IllegalStateException("No output directory specified, you should provide a directory with inDirectory()");
-            if (schema == null)
+            if (schemaStatement == null)
                 throw new IllegalStateException("Missing schema, you should provide the schema for the SSTable to create with forTable()");
-            if (insert == null)
+            if (insertStatement == null)
                 throw new IllegalStateException("No insert statement specified, you should provide an insert statement through using()");
 
-            AbstractSSTableSimpleWriter writer = sorted
-                                               ? new SSTableSimpleWriter(directory, schema, insert.updatedColumns())
-                                               : new SSTableSimpleUnsortedWriter(directory, schema, insert.updatedColumns(), bufferSizeInMB);
+            synchronized (CQLSSTableWriter.class)
+            {
+                String keyspace = schemaStatement.keyspace();
 
-            if (formatType != null)
-                writer.setSSTableFormatType(formatType);
+                if (Schema.instance.getKSMetaData(keyspace) == null)
+                    Schema.instance.load(KeyspaceMetadata.create(keyspace, KeyspaceParams.simple(1)));
 
-            return new CQLSSTableWriter(writer, insert, boundNames);
+                createTypes(keyspace);
+                CFMetaData cfMetaData = createTable(keyspace);
+                Pair<UpdateStatement, List<ColumnSpecification>> preparedInsert = prepareInsert();
+
+                AbstractSSTableSimpleWriter writer = sorted
+                                                     ? new SSTableSimpleWriter(directory, cfMetaData, preparedInsert.left.updatedColumns())
+                                                     : new SSTableSimpleUnsortedWriter(directory, cfMetaData, preparedInsert.left.updatedColumns(), bufferSizeInMB);
+
+                if (formatType != null)
+                    writer.setSSTableFormatType(formatType);
+
+                return new CQLSSTableWriter(writer, preparedInsert.left, preparedInsert.right);
+            }
+        }
+
+        private void createTypes(String keyspace)
+        {
+            KeyspaceMetadata ksm = Schema.instance.getKSMetaData(keyspace);
+            Types.RawBuilder builder = Types.rawBuilder(keyspace);
+            for (CreateTypeStatement st : typeStatements)
+                st.addToRawBuilder(builder);
+
+            ksm = ksm.withSwapped(builder.build());
+            Schema.instance.setKeyspaceMetadata(ksm);
+        }
+        /**
+         * Creates the table according to the schema statement
+         *
+         * @param keyspace name of the keyspace where the table should be created
+         */
+        private CFMetaData createTable(String keyspace)
+        {
+            KeyspaceMetadata ksm = Schema.instance.getKSMetaData(keyspace);
+
+            CFMetaData cfMetaData = ksm.tables.getNullable(schemaStatement.columnFamily());
+            if (cfMetaData == null)
+            {
+                CreateTableStatement statement = (CreateTableStatement) schemaStatement.prepare(ksm.types).statement;
+                statement.validate(ClientState.forInternalCalls());
+
+                cfMetaData = statement.getCFMetaData();
+
+                Schema.instance.load(cfMetaData);
+                Schema.instance.setKeyspaceMetadata(ksm.withSwapped(ksm.tables.with(cfMetaData)));
+            }
+
+            if (partitioner != null)
+                return cfMetaData.copy(partitioner);
+            else
+                return cfMetaData;
+        }
+
+        /**
+         * Prepares the insert statement for writing data to the SSTable
+         *
+         * @return the prepared insert statement and its bound names
+         */
+        private Pair<UpdateStatement, List<ColumnSpecification>> prepareInsert()
+        {
+            ParsedStatement.Prepared cqlStatement = insertStatement.prepare();
+            UpdateStatement insert = (UpdateStatement) cqlStatement.statement;
+            insert.validate(ClientState.forInternalCalls());
+
+            if (insert.hasConditions())
+                throw new IllegalArgumentException("Conditional statements are not supported");
+            if (insert.isCounter())
+                throw new IllegalArgumentException("Counter update statements are not supported");
+            if (cqlStatement.boundNames.isEmpty())
+                throw new IllegalArgumentException("Provided insert statement has no bind variables");
+
+            return Pair.create(insert, cqlStatement.boundNames);
+        }
+    }
+
+    private static <T extends ParsedStatement> T parseStatement(String query, Class<T> klass, String type)
+    {
+        try
+        {
+            ParsedStatement stmt = QueryProcessor.parseStatement(query);
+
+            if (!stmt.getClass().equals(klass))
+                throw new IllegalArgumentException("Invalid query, must be a " + type + " statement but was: " + stmt.getClass());
+
+            return klass.cast(stmt);
+        }
+        catch (RequestValidationException e)
+        {
+            throw new IllegalArgumentException(e.getMessage(), e);
         }
     }
 }
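
To make the two user-visible additions concrete, a small sketch combining UDT values and the new UNSET_VALUE sentinel, reusing the type/schema/insert strings from the class javadoc above; the directory path is a placeholder.

    CQLSSTableWriter writer = CQLSSTableWriter.builder()
                                              .inDirectory("path/to/directory")
                                              .withType("CREATE TYPE myKs.myType (a int, b int)")
                                              .forTable(schema)   // CREATE TABLE string as in the javadoc
                                              .using(insert)      // INSERT string as in the javadoc
                                              .build();

    com.datastax.driver.core.UserType myType = writer.getUDType("myType");

    // v2 is left unset rather than written as an explicit null.
    writer.addRow(0, "test1", CQLSSTableWriter.UNSET_VALUE, myType.newValue().setInt("a", 10).setInt("b", 20));
    writer.close();
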
diff --git a/src/java/org/apache/cassandra/io/sstable/Component.java b/src/java/org/apache/cassandra/io/sstable/Component.java
index 9454882..38152af 100644
--- a/src/java/org/apache/cassandra/io/sstable/Component.java
+++ b/src/java/org/apache/cassandra/io/sstable/Component.java
@@ -19,6 +19,7 @@
 
 import java.io.File;
 import java.util.EnumSet;
+import java.util.regex.Pattern;
 
 import com.google.common.base.Objects;
 
@@ -57,6 +58,8 @@
         SUMMARY("Summary.db"),
         // table of contents, stores the list of all components for the sstable
         TOC("TOC.txt"),
+        // built-in secondary index (may be multiple per sstable)
+        SECONDARY_INDEX("SI_.*.db"),
         // custom component, used by e.g. custom compaction strategy
         CUSTOM(new String[] { null });
         
@@ -74,9 +77,12 @@
         static Type fromRepresentation(String repr)
         {
             for (Type type : TYPES)
-                for (String representation : type.repr)
-                    if (repr.equals(representation))
-                        return type;
+            {
+                if (type.repr == null || type.repr.length == 0 || type.repr[0] == null)
+                    continue;
+                if (Pattern.matches(type.repr[0], repr))
+                    return type;
+            }
             return CUSTOM;
         }
     }
@@ -169,6 +175,7 @@
             case CRC:               component = Component.CRC;                          break;
             case SUMMARY:           component = Component.SUMMARY;                      break;
             case TOC:               component = Component.TOC;                          break;
+            case SECONDARY_INDEX:   component = new Component(Type.SECONDARY_INDEX, path.right); break;
             case CUSTOM:            component = new Component(Type.CUSTOM, path.right); break;
             default:
                  throw new IllegalStateException();
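
One note on the change above: component types are now matched as regular expressions rather than by string equality, which is what lets per-sstable secondary index files ("SI_*.db") map to the new SECONDARY_INDEX type. A tiny illustration, with made-up file names:

    // "SI_.*.db" is the SECONDARY_INDEX representation added above; matching uses Pattern.matches.
    java.util.regex.Pattern.matches("SI_.*.db", "SI_idx1.db");   // true  -> SECONDARY_INDEX
    java.util.regex.Pattern.matches("SI_.*.db", "Summary.db");   // false -> matches the SUMMARY type instead
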
diff --git a/src/java/org/apache/cassandra/io/sstable/Descriptor.java b/src/java/org/apache/cassandra/io/sstable/Descriptor.java
index ff4abfc..7840985 100644
--- a/src/java/org/apache/cassandra/io/sstable/Descriptor.java
+++ b/src/java/org/apache/cassandra/io/sstable/Descriptor.java
@@ -278,11 +278,12 @@
 
         // version
         nexttok = tokenStack.pop();
-        Version version = fmt.info.getVersion(nexttok);
 
-        if (!version.validate(nexttok))
+        if (!Version.validate(nexttok))
             throw new UnsupportedOperationException("SSTable " + name + " is too old to open.  Upgrade to 2.0 first, and run upgradesstables");
 
+        Version version = fmt.info.getVersion(nexttok);
+
         // ks/cf names
         String ksname, cfname;
         if (version.hasNewFileName())
diff --git a/src/java/org/apache/cassandra/io/sstable/IndexHelper.java b/src/java/org/apache/cassandra/io/sstable/IndexHelper.java
deleted file mode 100644
index 74a0fc5..0000000
--- a/src/java/org/apache/cassandra/io/sstable/IndexHelper.java
+++ /dev/null
@@ -1,192 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.cassandra.io.sstable;
-
-import java.io.*;
-import java.util.Collections;
-import java.util.List;
-
-import org.apache.cassandra.config.CFMetaData;
-import org.apache.cassandra.db.*;
-import org.apache.cassandra.io.ISerializer;
-import org.apache.cassandra.io.sstable.format.Version;
-import org.apache.cassandra.io.util.DataInputPlus;
-import org.apache.cassandra.io.util.DataOutputPlus;
-import org.apache.cassandra.utils.*;
-
-/**
- * Provides helper to serialize, deserialize and use column indexes.
- */
-public final class IndexHelper
-{
-    private IndexHelper()
-    {
-    }
-
-    /**
-     * The index of the IndexInfo in which a scan starting with @name should begin.
-     *
-     * @param name name to search for
-     * @param indexList list of the indexInfo objects
-     * @param comparator the comparator to use
-     * @param reversed whether or not the search is reversed, i.e. we scan forward or backward from name
-     * @param lastIndex where to start the search from in indexList
-     *
-     * @return int index
-     */
-    public static int indexFor(ClusteringPrefix name, List<IndexInfo> indexList, ClusteringComparator comparator, boolean reversed, int lastIndex)
-    {
-        IndexInfo target = new IndexInfo(name, name, 0, 0, null);
-        /*
-        Take the example from the unit test, and say your index looks like this:
-        [0..5][10..15][20..25]
-        and you look for the slice [13..17].
-
-        When doing forward slice, we are doing a binary search comparing 13 (the start of the query)
-        to the lastName part of the index slot. You'll end up with the "first" slot, going from left to right,
-        that may contain the start.
-
-        When doing a reverse slice, we do the same thing, only using as a start column the end of the query,
-        i.e. 17 in this example, compared to the firstName part of the index slots.  bsearch will give us the
-        first slot where firstName > start ([20..25] here), so we subtract an extra one to get the slot just before.
-        */
-        int startIdx = 0;
-        List<IndexInfo> toSearch = indexList;
-        if (reversed)
-        {
-            if (lastIndex < indexList.size() - 1)
-            {
-                toSearch = indexList.subList(0, lastIndex + 1);
-            }
-        }
-        else
-        {
-            if (lastIndex > 0)
-            {
-                startIdx = lastIndex;
-                toSearch = indexList.subList(lastIndex, indexList.size());
-            }
-        }
-        int index = Collections.binarySearch(toSearch, target, comparator.indexComparator(reversed));
-        return startIdx + (index < 0 ? -index - (reversed ? 2 : 1) : index);
-    }
-
-    public static class IndexInfo
-    {
-        private static final long EMPTY_SIZE = ObjectSizes.measure(new IndexInfo(null, null, 0, 0, null));
-
-        public final long offset;
-        public final long width;
-        public final ClusteringPrefix firstName;
-        public final ClusteringPrefix lastName;
-
-        // If at the end of the index block there is an open range tombstone marker, this marker
-        // deletion infos. null otherwise.
-        public final DeletionTime endOpenMarker;
-
-        public IndexInfo(ClusteringPrefix firstName,
-                         ClusteringPrefix lastName,
-                         long offset,
-                         long width,
-                         DeletionTime endOpenMarker)
-        {
-            this.firstName = firstName;
-            this.lastName = lastName;
-            this.offset = offset;
-            this.width = width;
-            this.endOpenMarker = endOpenMarker;
-        }
-
-        public static class Serializer
-        {
-            // This is the default index size that we use to delta-encode width when serializing so we get better vint-encoding.
-            // This is imperfect as user can change the index size and ideally we would save the index size used with each index file
-            // to use as base. However, that's a bit more involved a change that we want for now and very seldom do use change the index
-            // size so using the default is almost surely better than using no base at all.
-            public static final long WIDTH_BASE = 64 * 1024;
-
-            private final ISerializer<ClusteringPrefix> clusteringSerializer;
-            private final Version version;
-
-            public Serializer(CFMetaData metadata, Version version, SerializationHeader header)
-            {
-                this.clusteringSerializer = metadata.serializers().indexEntryClusteringPrefixSerializer(version, header);
-                this.version = version;
-            }
-
-            public void serialize(IndexInfo info, DataOutputPlus out) throws IOException
-            {
-                assert version.storeRows() : "We read old index files but we should never write them";
-
-                clusteringSerializer.serialize(info.firstName, out);
-                clusteringSerializer.serialize(info.lastName, out);
-                out.writeUnsignedVInt(info.offset);
-                out.writeVInt(info.width - WIDTH_BASE);
-
-                out.writeBoolean(info.endOpenMarker != null);
-                if (info.endOpenMarker != null)
-                    DeletionTime.serializer.serialize(info.endOpenMarker, out);
-            }
-
-            public IndexInfo deserialize(DataInputPlus in) throws IOException
-            {
-                ClusteringPrefix firstName = clusteringSerializer.deserialize(in);
-                ClusteringPrefix lastName = clusteringSerializer.deserialize(in);
-                long offset;
-                long width;
-                DeletionTime endOpenMarker = null;
-                if (version.storeRows())
-                {
-                    offset = in.readUnsignedVInt();
-                    width = in.readVInt() + WIDTH_BASE;
-                    if (in.readBoolean())
-                        endOpenMarker = DeletionTime.serializer.deserialize(in);
-                }
-                else
-                {
-                    offset = in.readLong();
-                    width = in.readLong();
-                }
-                return new IndexInfo(firstName, lastName, offset, width, endOpenMarker);
-            }
-
-            public long serializedSize(IndexInfo info)
-            {
-                assert version.storeRows() : "We read old index files but we should never write them";
-
-                long size = clusteringSerializer.serializedSize(info.firstName)
-                          + clusteringSerializer.serializedSize(info.lastName)
-                          + TypeSizes.sizeofUnsignedVInt(info.offset)
-                          + TypeSizes.sizeofVInt(info.width - WIDTH_BASE)
-                          + TypeSizes.sizeof(info.endOpenMarker != null);
-
-                if (info.endOpenMarker != null)
-                    size += DeletionTime.serializer.serializedSize(info.endOpenMarker);
-                return size;
-            }
-        }
-
-        public long unsharedHeapSize()
-        {
-            return EMPTY_SIZE
-                 + firstName.unsharedHeapSize()
-                 + lastName.unsharedHeapSize()
-                 + (endOpenMarker == null ? 0 : endOpenMarker.unsharedHeapSize());
-        }
-    }
-}
diff --git a/src/java/org/apache/cassandra/io/sstable/IndexInfo.java b/src/java/org/apache/cassandra/io/sstable/IndexInfo.java
new file mode 100644
index 0000000..b07ce4a
--- /dev/null
+++ b/src/java/org/apache/cassandra/io/sstable/IndexInfo.java
@@ -0,0 +1,178 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.io.sstable;
+
+import java.io.IOException;
+
+import org.apache.cassandra.db.ClusteringPrefix;
+import org.apache.cassandra.db.DeletionTime;
+import org.apache.cassandra.db.RowIndexEntry;
+import org.apache.cassandra.db.TypeSizes;
+import org.apache.cassandra.io.ISerializer;
+import org.apache.cassandra.io.sstable.format.Version;
+import org.apache.cassandra.io.util.DataInputPlus;
+import org.apache.cassandra.io.util.DataOutputPlus;
+import org.apache.cassandra.utils.ObjectSizes;
+
+/**
+ * {@code IndexInfo} is embedded in the indexed version of {@link RowIndexEntry}.
+ * Each instance roughly covers a range of {@link org.apache.cassandra.config.Config#column_index_size_in_kb column_index_size_in_kb} kB
+ * and contains the first and last clustering value (or slice bound), its offset in the data file and width in the data file.
+ * <p>
+ * Each {@code IndexInfo} object is serialized as follows.
+ * </p>
+ * <p>
+ * Serialization format changed in 3.0. First, the {@code endOpenMarker} has been introduced.
+ * Second, the <i>order</i> of the fields in serialized representation changed to allow future
+ * optimizations to access {@code offset} and {@code width} fields directly without skipping
+ * {@code firstName}/{@code lastName}.
+ * </p>
+ * <p>
+ * {@code
+ *    (*) IndexInfo.firstName (ClusteringPrefix serializer, either Clustering.serializer.serialize or Slice.Bound.serializer.serialize)
+ *    (*) IndexInfo.lastName (ClusteringPrefix serializer, either Clustering.serializer.serialize or Slice.Bound.serializer.serialize)
+ * (long) IndexInfo.offset
+ * (long) IndexInfo.width
+ * (bool) IndexInfo.endOpenMarker != null              (if 3.0)
+ *  (int) IndexInfo.endOpenMarker.localDeletionTime    (if 3.0 && IndexInfo.endOpenMarker != null)
+ * (long) IndexInfo.endOpenMarker.markedForDeletionAt  (if 3.0 && IndexInfo.endOpenMarker != null)
+ * }
+ * </p>
+ */
+public class IndexInfo
+{
+    private static final long EMPTY_SIZE = ObjectSizes.measure(new IndexInfo(null, null, 0, 0, null));
+
+    public final long offset;
+    public final long width;
+    public final ClusteringPrefix firstName;
+    public final ClusteringPrefix lastName;
+
+    // If at the end of the index block there is an open range tombstone marker, this holds
+    // that marker's deletion time. null otherwise.
+    public final DeletionTime endOpenMarker;
+
+    public IndexInfo(ClusteringPrefix firstName,
+                     ClusteringPrefix lastName,
+                     long offset,
+                     long width,
+                     DeletionTime endOpenMarker)
+    {
+        this.firstName = firstName;
+        this.lastName = lastName;
+        this.offset = offset;
+        this.width = width;
+        this.endOpenMarker = endOpenMarker;
+    }
+
+    public static class Serializer implements ISerializer<IndexInfo>
+    {
+        // This is the default index size that we use to delta-encode width when serializing so we get better vint-encoding.
+        // This is imperfect as the user can change the index size and ideally we would save the index size used with each index file
+        // to use as base. However, that's a more involved change than we want for now, and users very seldom change the index
+        // size, so using the default is almost surely better than using no base at all.
+        public static final long WIDTH_BASE = 64 * 1024;
+
+        private final ISerializer<ClusteringPrefix> clusteringSerializer;
+        private final Version version;
+
+        public Serializer(Version version, ISerializer<ClusteringPrefix> clusteringSerializer)
+        {
+            this.clusteringSerializer = clusteringSerializer;
+            this.version = version;
+        }
+
+        public void serialize(IndexInfo info, DataOutputPlus out) throws IOException
+        {
+            assert version.storeRows() : "We read old index files but we should never write them";
+
+            clusteringSerializer.serialize(info.firstName, out);
+            clusteringSerializer.serialize(info.lastName, out);
+            out.writeUnsignedVInt(info.offset);
+            out.writeVInt(info.width - WIDTH_BASE);
+
+            out.writeBoolean(info.endOpenMarker != null);
+            if (info.endOpenMarker != null)
+                DeletionTime.serializer.serialize(info.endOpenMarker, out);
+        }
+
+        public void skip(DataInputPlus in) throws IOException
+        {
+            clusteringSerializer.skip(in);
+            clusteringSerializer.skip(in);
+            if (version.storeRows())
+            {
+                in.readUnsignedVInt();
+                in.readVInt();
+                if (in.readBoolean())
+                    DeletionTime.serializer.skip(in);
+            }
+            else
+            {
+                in.skipBytes(TypeSizes.sizeof(0L));
+                in.skipBytes(TypeSizes.sizeof(0L));
+            }
+        }
+
+        public IndexInfo deserialize(DataInputPlus in) throws IOException
+        {
+            ClusteringPrefix firstName = clusteringSerializer.deserialize(in);
+            ClusteringPrefix lastName = clusteringSerializer.deserialize(in);
+            long offset;
+            long width;
+            DeletionTime endOpenMarker = null;
+            if (version.storeRows())
+            {
+                offset = in.readUnsignedVInt();
+                width = in.readVInt() + WIDTH_BASE;
+                if (in.readBoolean())
+                    endOpenMarker = DeletionTime.serializer.deserialize(in);
+            }
+            else
+            {
+                offset = in.readLong();
+                width = in.readLong();
+            }
+            return new IndexInfo(firstName, lastName, offset, width, endOpenMarker);
+        }
+
+        public long serializedSize(IndexInfo info)
+        {
+            assert version.storeRows() : "We read old index files but we should never write them";
+
+            long size = clusteringSerializer.serializedSize(info.firstName)
+                        + clusteringSerializer.serializedSize(info.lastName)
+                        + TypeSizes.sizeofUnsignedVInt(info.offset)
+                        + TypeSizes.sizeofVInt(info.width - WIDTH_BASE)
+                        + TypeSizes.sizeof(info.endOpenMarker != null);
+
+            if (info.endOpenMarker != null)
+                size += DeletionTime.serializer.serializedSize(info.endOpenMarker);
+            return size;
+        }
+    }
+
+    public long unsharedHeapSize()
+    {
+        return EMPTY_SIZE
+             + firstName.unsharedHeapSize()
+             + lastName.unsharedHeapSize()
+             + (endOpenMarker == null ? 0 : endOpenMarker.unsharedHeapSize());
+    }
+}
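
A small worked example of the width delta-encoding performed by the serializer above; the concrete width is arbitrary.

    // WIDTH_BASE is 64 * 1024 = 65536.
    long width    = 66_560;                                      // block width in bytes
    long delta    = width - IndexInfo.Serializer.WIDTH_BASE;     // 1024, written as a signed vint
    long restored = delta + IndexInfo.Serializer.WIDTH_BASE;     // 66560 again on deserialization
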
diff --git a/src/java/org/apache/cassandra/io/sstable/IndexSummary.java b/src/java/org/apache/cassandra/io/sstable/IndexSummary.java
index 371a243..6de3478 100644
--- a/src/java/org/apache/cassandra/io/sstable/IndexSummary.java
+++ b/src/java/org/apache/cassandra/io/sstable/IndexSummary.java
@@ -29,7 +29,9 @@
 import org.apache.cassandra.db.PartitionPosition;
 import org.apache.cassandra.dht.IPartitioner;
 import org.apache.cassandra.io.util.*;
+import org.apache.cassandra.utils.ByteBufferUtil;
 import org.apache.cassandra.utils.FBUtilities;
+import org.apache.cassandra.utils.Pair;
 import org.apache.cassandra.utils.concurrent.Ref;
 import org.apache.cassandra.utils.concurrent.WrappedSharedCloseable;
 import org.apache.cassandra.utils.memory.MemoryUtil;
@@ -347,5 +349,26 @@
                 offsets.setInt(i, (int) (offsets.getInt(i) - offsets.size()));
             return new IndexSummary(partitioner, offsets, offsetCount, entries, entries.size(), fullSamplingSummarySize, minIndexInterval, samplingLevel);
         }
+
+        /**
+         * Deserializes the first and last key stored in the summary
+         *
+         * Only for use by offline tools like SSTableMetadataViewer, otherwise SSTable.first/last should be used.
+         */
+        public Pair<DecoratedKey, DecoratedKey> deserializeFirstLastKey(DataInputStream in, IPartitioner partitioner, boolean haveSamplingLevel) throws IOException
+        {
+            in.skipBytes(4); // minIndexInterval
+            int offsetCount = in.readInt();
+            long offheapSize = in.readLong();
+            if (haveSamplingLevel)
+                in.skipBytes(8); // samplingLevel, fullSamplingSummarySize
+
+            in.skip(offsetCount * 4);
+            in.skip(offheapSize - offsetCount * 4);
+
+            DecoratedKey first = partitioner.decorateKey(ByteBufferUtil.readWithLength(in));
+            DecoratedKey last = partitioner.decorateKey(ByteBufferUtil.readWithLength(in));
+            return Pair.create(first, last);
+        }
     }
 }
diff --git a/src/java/org/apache/cassandra/io/sstable/IndexSummaryBuilder.java b/src/java/org/apache/cassandra/io/sstable/IndexSummaryBuilder.java
index 6110afe..1f4fdc2 100644
--- a/src/java/org/apache/cassandra/io/sstable/IndexSummaryBuilder.java
+++ b/src/java/org/apache/cassandra/io/sstable/IndexSummaryBuilder.java
@@ -85,6 +85,7 @@
         }
     }
 
+    @SuppressWarnings("resource")
     public IndexSummaryBuilder(long expectedKeys, int minIndexInterval, int samplingLevel)
     {
         this.samplingLevel = samplingLevel;
@@ -281,7 +282,6 @@
      * @param partitioner the partitioner used for the index summary
      * @return a new IndexSummary
      */
-    @SuppressWarnings("resource")
     public static IndexSummary downsample(IndexSummary existing, int newSamplingLevel, int minIndexInterval, IPartitioner partitioner)
     {
         // To downsample the old index summary, we'll go through (potentially) several rounds of downsampling.
diff --git a/src/java/org/apache/cassandra/io/sstable/IndexSummaryRedistribution.java b/src/java/org/apache/cassandra/io/sstable/IndexSummaryRedistribution.java
index b4eae31..8fb4835 100644
--- a/src/java/org/apache/cassandra/io/sstable/IndexSummaryRedistribution.java
+++ b/src/java/org/apache/cassandra/io/sstable/IndexSummaryRedistribution.java
@@ -40,6 +40,7 @@
 import org.apache.cassandra.db.compaction.OperationType;
 import org.apache.cassandra.db.lifecycle.LifecycleTransaction;
 import org.apache.cassandra.io.sstable.format.SSTableReader;
+import org.apache.cassandra.utils.FBUtilities;
 import org.apache.cassandra.utils.Pair;
 
 import static org.apache.cassandra.io.sstable.Downsampling.BASE_SAMPLING_LEVEL;
@@ -131,8 +132,8 @@
         total = 0;
         for (SSTableReader sstable : Iterables.concat(compacting, oldFormatSSTables, newSSTables))
             total += sstable.getIndexSummaryOffHeapSize();
-        logger.trace("Completed resizing of index summaries; current approximate memory used: {} MB",
-                     total / 1024.0 / 1024.0);
+        logger.trace("Completed resizing of index summaries; current approximate memory used: {}",
+                     FBUtilities.prettyPrintMemory(total));
 
         return newSSTables;
     }
@@ -183,11 +184,12 @@
             int numEntriesAtNewSamplingLevel = IndexSummaryBuilder.entriesAtSamplingLevel(newSamplingLevel, maxSummarySize);
             double effectiveIndexInterval = sstable.getEffectiveIndexInterval();
 
-            logger.trace("{} has {} reads/sec; ideal space for index summary: {} bytes ({} entries); considering moving " +
-                    "from level {} ({} entries, {} bytes) to level {} ({} entries, {} bytes)",
-                    sstable.getFilename(), readsPerSec, idealSpace, targetNumEntries, currentSamplingLevel, currentNumEntries,
-                    currentNumEntries * avgEntrySize, newSamplingLevel, numEntriesAtNewSamplingLevel,
-                    numEntriesAtNewSamplingLevel * avgEntrySize);
+            logger.trace("{} has {} reads/sec; ideal space for index summary: {} ({} entries); considering moving " +
+                    "from level {} ({} entries, {}) " +
+                    "to level {} ({} entries, {})",
+                    sstable.getFilename(), readsPerSec, FBUtilities.prettyPrintMemory(idealSpace), targetNumEntries,
+                    currentSamplingLevel, currentNumEntries, FBUtilities.prettyPrintMemory((long) (currentNumEntries * avgEntrySize)),
+                    newSamplingLevel, numEntriesAtNewSamplingLevel, FBUtilities.prettyPrintMemory((long) (numEntriesAtNewSamplingLevel * avgEntrySize)));
 
             if (effectiveIndexInterval < minIndexInterval)
             {
diff --git a/src/java/org/apache/cassandra/io/sstable/KeyIterator.java b/src/java/org/apache/cassandra/io/sstable/KeyIterator.java
index f02b9d1..d51e97b 100644
--- a/src/java/org/apache/cassandra/io/sstable/KeyIterator.java
+++ b/src/java/org/apache/cassandra/io/sstable/KeyIterator.java
@@ -84,6 +84,7 @@
     private final In in;
     private final IPartitioner partitioner;
 
+    private long keyPosition;
 
     public KeyIterator(Descriptor desc, CFMetaData metadata)
     {
@@ -99,6 +100,7 @@
             if (in.isEOF())
                 return endOfData();
 
+            keyPosition = in.getFilePointer();
             DecoratedKey key = partitioner.decorateKey(ByteBufferUtil.readWithShortLength(in.get()));
             RowIndexEntry.Serializer.skip(in.get(), desc.version); // skip remainder of the entry
             return key;
@@ -123,4 +125,9 @@
     {
         return in.length();
     }
+
+    public long getKeyPosition()
+    {
+        return keyPosition;
+    }
 }
diff --git a/src/java/org/apache/cassandra/io/sstable/SSTableLoader.java b/src/java/org/apache/cassandra/io/sstable/SSTableLoader.java
index 3286522..043f6fa 100644
--- a/src/java/org/apache/cassandra/io/sstable/SSTableLoader.java
+++ b/src/java/org/apache/cassandra/io/sstable/SSTableLoader.java
@@ -159,7 +159,7 @@
         client.init(keyspace);
         outputHandler.output("Established connection to initial hosts");
 
-        StreamPlan plan = new StreamPlan("Bulk Load", 0, connectionsPerHost, false, false).connectionFactory(client.getConnectionFactory());
+        StreamPlan plan = new StreamPlan("Bulk Load", 0, connectionsPerHost, false, false, false).connectionFactory(client.getConnectionFactory());
 
         Map<InetAddress, Collection<Range<Token>>> endpointToRanges = client.getEndpointToRangesMap();
         openSSTables(endpointToRanges);
diff --git a/src/java/org/apache/cassandra/io/sstable/SSTableRewriter.java b/src/java/org/apache/cassandra/io/sstable/SSTableRewriter.java
index f4a2e1b..715a33a 100644
--- a/src/java/org/apache/cassandra/io/sstable/SSTableRewriter.java
+++ b/src/java/org/apache/cassandra/io/sstable/SSTableRewriter.java
@@ -65,7 +65,6 @@
     private long currentlyOpenedEarlyAt; // the position (in MB) in the target file we last (re)opened at
 
     private final List<SSTableWriter> writers = new ArrayList<>();
-    private final boolean isOffline; // true for operations that are performed without Cassandra running (prevents updates of Tracker)
     private final boolean keepOriginals; // true if we do not want to obsolete the originals
 
     private SSTableWriter writer;
@@ -74,34 +73,45 @@
     // for testing (TODO: remove when have byteman setup)
     private boolean throwEarly, throwLate;
 
+    @Deprecated
     public SSTableRewriter(LifecycleTransaction transaction, long maxAge, boolean isOffline)
     {
         this(transaction, maxAge, isOffline, true);
     }
-
+    @Deprecated
     public SSTableRewriter(LifecycleTransaction transaction, long maxAge, boolean isOffline, boolean shouldOpenEarly)
     {
-        this(transaction, maxAge, isOffline, calculateOpenInterval(shouldOpenEarly), false);
+        this(transaction, maxAge, calculateOpenInterval(shouldOpenEarly), false);
     }
 
     @VisibleForTesting
-    public SSTableRewriter(LifecycleTransaction transaction, long maxAge, boolean isOffline, long preemptiveOpenInterval, boolean keepOriginals)
+    public SSTableRewriter(LifecycleTransaction transaction, long maxAge, long preemptiveOpenInterval, boolean keepOriginals)
     {
         this.transaction = transaction;
         this.maxAge = maxAge;
-        this.isOffline = isOffline;
         this.keepOriginals = keepOriginals;
         this.preemptiveOpenInterval = preemptiveOpenInterval;
     }
 
+    @Deprecated
     public static SSTableRewriter constructKeepingOriginals(LifecycleTransaction transaction, boolean keepOriginals, long maxAge, boolean isOffline)
     {
-        return new SSTableRewriter(transaction, maxAge, isOffline, calculateOpenInterval(true), keepOriginals);
+        return constructKeepingOriginals(transaction, keepOriginals, maxAge);
     }
 
-    public static SSTableRewriter construct(ColumnFamilyStore cfs, LifecycleTransaction transaction, boolean keepOriginals, long maxAge, boolean isOffline)
+    public static SSTableRewriter constructKeepingOriginals(LifecycleTransaction transaction, boolean keepOriginals, long maxAge)
     {
-        return new SSTableRewriter(transaction, maxAge, isOffline, calculateOpenInterval(cfs.supportsEarlyOpen()), keepOriginals);
+        return new SSTableRewriter(transaction, maxAge, calculateOpenInterval(true), keepOriginals);
+    }
+
+    public static SSTableRewriter constructWithoutEarlyOpening(LifecycleTransaction transaction, boolean keepOriginals, long maxAge)
+    {
+        return new SSTableRewriter(transaction, maxAge, calculateOpenInterval(false), keepOriginals);
+    }
+
+    public static SSTableRewriter construct(ColumnFamilyStore cfs, LifecycleTransaction transaction, boolean keepOriginals, long maxAge)
+    {
+        return new SSTableRewriter(transaction, maxAge, calculateOpenInterval(cfs.supportsEarlyOpen()), keepOriginals);
     }
 
     private static long calculateOpenInterval(boolean shouldOpenEarly)
@@ -123,7 +133,7 @@
         DecoratedKey key = partition.partitionKey();
         maybeReopenEarly(key);
         RowIndexEntry index = writer.append(partition);
-        if (!isOffline && index != null)
+        if (!transaction.isOffline() && index != null)
         {
             boolean save = false;
             for (SSTableReader reader : transaction.originals())
@@ -159,7 +169,7 @@
     {
         if (writer.getFilePointer() - currentlyOpenedEarlyAt > preemptiveOpenInterval)
         {
-            if (isOffline)
+            if (transaction.isOffline())
             {
                 for (SSTableReader reader : transaction.originals())
                 {
@@ -215,7 +225,7 @@
      */
     private void moveStarts(SSTableReader newReader, DecoratedKey lowerbound)
     {
-        if (isOffline)
+        if (transaction.isOffline())
             return;
         if (preemptiveOpenInterval == Long.MAX_VALUE)
             return;
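
With the isOffline field removed, SSTableRewriter derives offline behaviour from LifecycleTransaction.isOffline(), and the constructors that took an isOffline flag are deprecated in favour of the static factories. The following is a minimal usage sketch of the new factory methods, illustrative only (the hypothetical RewriterFactorySketch class is not part of this patch):

    import org.apache.cassandra.db.ColumnFamilyStore;
    import org.apache.cassandra.db.lifecycle.LifecycleTransaction;
    import org.apache.cassandra.io.sstable.SSTableRewriter;

    class RewriterFactorySketch
    {
        static SSTableRewriter open(ColumnFamilyStore cfs, LifecycleTransaction txn, long maxAge)
        {
            // keepOriginals = false: the original sstables are obsoleted when the rewrite commits.
            // Early opening follows cfs.supportsEarlyOpen(); offline behaviour now comes from
            // txn.isOffline() inside the rewriter rather than a constructor flag.
            return SSTableRewriter.construct(cfs, txn, false, maxAge);
        }
    }
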
diff --git a/src/java/org/apache/cassandra/io/sstable/SSTableSimpleIterator.java b/src/java/org/apache/cassandra/io/sstable/SSTableSimpleIterator.java
index f82db4e..eb69271 100644
--- a/src/java/org/apache/cassandra/io/sstable/SSTableSimpleIterator.java
+++ b/src/java/org/apache/cassandra/io/sstable/SSTableSimpleIterator.java
@@ -138,24 +138,27 @@
 
         protected Unfiltered computeNext()
         {
-            try
+            while (true)
             {
-                if (!deserializer.hasNext())
-                    return endOfData();
-
-                Unfiltered unfiltered = deserializer.readNext();
-                if (metadata.isStaticCompactTable() && unfiltered.kind() == Unfiltered.Kind.ROW)
+                try
                 {
-                    Row row = (Row) unfiltered;
-                    ColumnDefinition def = metadata.getColumnDefinition(LegacyLayout.encodeClustering(metadata, row.clustering()));
-                    if (def != null && def.isStatic())
-                        return computeNext();
+                    if (!deserializer.hasNext())
+                        return endOfData();
+
+                    Unfiltered unfiltered = deserializer.readNext();
+                    if (metadata.isStaticCompactTable() && unfiltered.kind() == Unfiltered.Kind.ROW)
+                    {
+                        Row row = (Row) unfiltered;
+                        ColumnDefinition def = metadata.getColumnDefinition(LegacyLayout.encodeClustering(metadata, row.clustering()));
+                        if (def != null && def.isStatic())
+                            continue;
+                    }
+                    return unfiltered;
                 }
-                return unfiltered;
-            }
-            catch (IOException e)
-            {
-                throw new IOError(e);
+                catch (IOException e)
+                {
+                    throw new IOError(e);
+                }
             }
         }
 
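
The computeNext() rewrite above replaces the self-recursive call used to skip static rows of a static compact table with a while (true) loop and continue, so a long run of skipped rows no longer deepens the call stack. A generic sketch of the same pattern (SkippingIterator is a hypothetical class, not a Cassandra one; it only illustrates the recursion-to-loop refactoring):

    import java.util.Iterator;
    import java.util.function.Predicate;
    import com.google.common.collect.AbstractIterator;

    class SkippingIterator<T> extends AbstractIterator<T>
    {
        private final Iterator<T> source;
        private final Predicate<T> skip;

        SkippingIterator(Iterator<T> source, Predicate<T> skip)
        {
            this.source = source;
            this.skip = skip;
        }

        protected T computeNext()
        {
            while (true)                    // was: a recursive call for every skipped element
            {
                if (!source.hasNext())
                    return endOfData();
                T next = source.next();
                if (skip.test(next))
                    continue;               // skip without recursing, constant stack depth
                return next;
            }
        }
    }
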
diff --git a/src/java/org/apache/cassandra/io/sstable/SSTableTxnWriter.java b/src/java/org/apache/cassandra/io/sstable/SSTableTxnWriter.java
index e889d85..5286ac5 100644
--- a/src/java/org/apache/cassandra/io/sstable/SSTableTxnWriter.java
+++ b/src/java/org/apache/cassandra/io/sstable/SSTableTxnWriter.java
@@ -27,6 +27,7 @@
 import org.apache.cassandra.db.compaction.OperationType;
 import org.apache.cassandra.db.lifecycle.LifecycleTransaction;
 import org.apache.cassandra.db.rows.UnfilteredRowIterator;
+import org.apache.cassandra.index.Index;
 import org.apache.cassandra.io.sstable.format.SSTableReader;
 import org.apache.cassandra.io.sstable.metadata.MetadataCollector;
 import org.apache.cassandra.utils.concurrent.Transactional;
@@ -102,12 +103,18 @@
     }
 
     @SuppressWarnings("resource") // log and writer closed during postCleanup
-    public static SSTableTxnWriter create(CFMetaData cfm, Descriptor descriptor, long keyCount, long repairedAt, int sstableLevel, SerializationHeader header)
+    public static SSTableTxnWriter create(CFMetaData cfm,
+                                          Descriptor descriptor,
+                                          long keyCount,
+                                          long repairedAt,
+                                          int sstableLevel,
+                                          SerializationHeader header,
+                                          Collection<Index> indexes)
     {
         // if the column family store does not exist, we create a new default SSTableMultiWriter to use:
         LifecycleTransaction txn = LifecycleTransaction.offline(OperationType.WRITE);
         MetadataCollector collector = new MetadataCollector(cfm.comparator).sstableLevel(sstableLevel);
-        SSTableMultiWriter writer = SimpleSSTableMultiWriter.create(descriptor, keyCount, repairedAt, cfm, collector, header, txn);
+        SSTableMultiWriter writer = SimpleSSTableMultiWriter.create(descriptor, keyCount, repairedAt, cfm, collector, header, indexes, txn);
         return new SSTableTxnWriter(txn, writer);
     }
 
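
SSTableTxnWriter.create() now threads the table's indexes down to the writer so each index can attach an SSTableFlushObserver (see the new interface later in this diff). A hedged usage sketch follows; passing an empty collection simply writes without observers, and keyCount/repairedAt/sstableLevel are placeholder values for illustration:

    import java.util.Collections;
    import org.apache.cassandra.config.CFMetaData;
    import org.apache.cassandra.db.SerializationHeader;
    import org.apache.cassandra.io.sstable.Descriptor;
    import org.apache.cassandra.io.sstable.SSTableTxnWriter;

    class TxnWriterSketch
    {
        static SSTableTxnWriter open(CFMetaData cfm, Descriptor desc, SerializationHeader header)
        {
            // keyCount = 0, repairedAt = 0, sstableLevel = 0 are placeholders; the final
            // argument carries the Index instances whose flush observers should run.
            return SSTableTxnWriter.create(cfm, desc, 0, 0, 0, header, Collections.emptyList());
        }
    }
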
diff --git a/src/java/org/apache/cassandra/io/sstable/SimpleSSTableMultiWriter.java b/src/java/org/apache/cassandra/io/sstable/SimpleSSTableMultiWriter.java
index fd1b9a7..2217ae2 100644
--- a/src/java/org/apache/cassandra/io/sstable/SimpleSSTableMultiWriter.java
+++ b/src/java/org/apache/cassandra/io/sstable/SimpleSSTableMultiWriter.java
@@ -27,6 +27,7 @@
 import org.apache.cassandra.db.SerializationHeader;
 import org.apache.cassandra.db.lifecycle.LifecycleTransaction;
 import org.apache.cassandra.db.rows.UnfilteredRowIterator;
+import org.apache.cassandra.index.Index;
 import org.apache.cassandra.io.sstable.format.SSTableReader;
 import org.apache.cassandra.io.sstable.format.SSTableWriter;
 import org.apache.cassandra.io.sstable.metadata.MetadataCollector;
@@ -34,9 +35,11 @@
 public class SimpleSSTableMultiWriter implements SSTableMultiWriter
 {
     private final SSTableWriter writer;
+    private final LifecycleTransaction txn;
 
-    protected SimpleSSTableMultiWriter(SSTableWriter writer)
+    protected SimpleSSTableMultiWriter(SSTableWriter writer, LifecycleTransaction txn)
     {
+        this.txn = txn;
         this.writer = writer;
     }
 
@@ -89,6 +92,7 @@
 
     public Throwable abort(Throwable accumulate)
     {
+        txn.untrackNew(writer);
         return writer.abort(accumulate);
     }
 
@@ -109,9 +113,10 @@
                                             CFMetaData cfm,
                                             MetadataCollector metadataCollector,
                                             SerializationHeader header,
+                                            Collection<Index> indexes,
                                             LifecycleTransaction txn)
     {
-        SSTableWriter writer = SSTableWriter.create(descriptor, keyCount, repairedAt, cfm, metadataCollector, header, txn);
-        return new SimpleSSTableMultiWriter(writer);
+        SSTableWriter writer = SSTableWriter.create(descriptor, keyCount, repairedAt, cfm, metadataCollector, header, indexes, txn);
+        return new SimpleSSTableMultiWriter(writer, txn);
     }
 }
diff --git a/src/java/org/apache/cassandra/io/sstable/format/RangeAwareSSTableWriter.java b/src/java/org/apache/cassandra/io/sstable/format/RangeAwareSSTableWriter.java
new file mode 100644
index 0000000..3665da7
--- /dev/null
+++ b/src/java/org/apache/cassandra/io/sstable/format/RangeAwareSSTableWriter.java
@@ -0,0 +1,207 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.io.sstable.format;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collection;
+import java.util.List;
+import java.util.UUID;
+
+import org.apache.cassandra.db.ColumnFamilyStore;
+import org.apache.cassandra.db.DecoratedKey;
+import org.apache.cassandra.db.Directories;
+import org.apache.cassandra.db.PartitionPosition;
+import org.apache.cassandra.db.SerializationHeader;
+import org.apache.cassandra.db.lifecycle.LifecycleTransaction;
+import org.apache.cassandra.db.rows.UnfilteredRowIterator;
+import org.apache.cassandra.io.sstable.Descriptor;
+import org.apache.cassandra.io.sstable.SSTableMultiWriter;
+import org.apache.cassandra.service.StorageService;
+import org.apache.cassandra.utils.FBUtilities;
+
+public class RangeAwareSSTableWriter implements SSTableMultiWriter
+{
+    private final List<PartitionPosition> boundaries;
+    private final Directories.DataDirectory[] directories;
+    private final int sstableLevel;
+    private final long estimatedKeys;
+    private final long repairedAt;
+    private final SSTableFormat.Type format;
+    private final SerializationHeader header;
+    private final LifecycleTransaction txn;
+    private int currentIndex = -1;
+    public final ColumnFamilyStore cfs;
+    private final List<SSTableMultiWriter> finishedWriters = new ArrayList<>();
+    private final List<SSTableReader> finishedReaders = new ArrayList<>();
+    private SSTableMultiWriter currentWriter = null;
+
+    public RangeAwareSSTableWriter(ColumnFamilyStore cfs, long estimatedKeys, long repairedAt, SSTableFormat.Type format, int sstableLevel, long totalSize, LifecycleTransaction txn, SerializationHeader header) throws IOException
+    {
+        directories = cfs.getDirectories().getWriteableLocations();
+        this.sstableLevel = sstableLevel;
+        this.cfs = cfs;
+        this.estimatedKeys = estimatedKeys / directories.length;
+        this.repairedAt = repairedAt;
+        this.format = format;
+        this.txn = txn;
+        this.header = header;
+        boundaries = StorageService.getDiskBoundaries(cfs, directories);
+        if (boundaries == null)
+        {
+            Directories.DataDirectory localDir = cfs.getDirectories().getWriteableLocation(totalSize);
+            if (localDir == null)
+                throw new IOException(String.format("Insufficient disk space to store %s",
+                                                    FBUtilities.prettyPrintMemory(totalSize)));
+            Descriptor desc = Descriptor.fromFilename(cfs.getSSTablePath(cfs.getDirectories().getLocationForDisk(localDir), format));
+            currentWriter = cfs.createSSTableMultiWriter(desc, estimatedKeys, repairedAt, sstableLevel, header, txn);
+        }
+    }
+
+    private void maybeSwitchWriter(DecoratedKey key)
+    {
+        if (boundaries == null)
+            return;
+
+        boolean switched = false;
+        while (currentIndex < 0 || key.compareTo(boundaries.get(currentIndex)) > 0)
+        {
+            switched = true;
+            currentIndex++;
+        }
+
+        if (switched)
+        {
+            if (currentWriter != null)
+                finishedWriters.add(currentWriter);
+
+            Descriptor desc = Descriptor.fromFilename(cfs.getSSTablePath(cfs.getDirectories().getLocationForDisk(directories[currentIndex])), format);
+            currentWriter = cfs.createSSTableMultiWriter(desc, estimatedKeys, repairedAt, sstableLevel, header, txn);
+        }
+    }
+
+    public boolean append(UnfilteredRowIterator partition)
+    {
+        maybeSwitchWriter(partition.partitionKey());
+        return currentWriter.append(partition);
+    }
+
+    @Override
+    public Collection<SSTableReader> finish(long repairedAt, long maxDataAge, boolean openResult)
+    {
+        if (currentWriter != null)
+            finishedWriters.add(currentWriter);
+        currentWriter = null;
+        for (SSTableMultiWriter writer : finishedWriters)
+        {
+            if (writer.getFilePointer() > 0)
+                finishedReaders.addAll(writer.finish(repairedAt, maxDataAge, openResult));
+            else
+                SSTableMultiWriter.abortOrDie(writer);
+        }
+        return finishedReaders;
+    }
+
+    @Override
+    public Collection<SSTableReader> finish(boolean openResult)
+    {
+        if (currentWriter != null)
+            finishedWriters.add(currentWriter);
+        currentWriter = null;
+        for (SSTableMultiWriter writer : finishedWriters)
+        {
+            if (writer.getFilePointer() > 0)
+                finishedReaders.addAll(writer.finish(openResult));
+            else
+                SSTableMultiWriter.abortOrDie(writer);
+        }
+        return finishedReaders;
+    }
+
+    @Override
+    public Collection<SSTableReader> finished()
+    {
+        return finishedReaders;
+    }
+
+    @Override
+    public SSTableMultiWriter setOpenResult(boolean openResult)
+    {
+        finishedWriters.forEach((w) -> w.setOpenResult(openResult));
+        currentWriter.setOpenResult(openResult);
+        return this;
+    }
+
+    public String getFilename()
+    {
+        return String.join("/", cfs.keyspace.getName(), cfs.getTableName());
+    }
+
+    @Override
+    public long getFilePointer()
+    {
+        return currentWriter.getFilePointer();
+    }
+
+    @Override
+    public UUID getCfId()
+    {
+        return currentWriter.getCfId();
+    }
+
+    @Override
+    public Throwable commit(Throwable accumulate)
+    {
+        if (currentWriter != null)
+            finishedWriters.add(currentWriter);
+        currentWriter = null;
+        for (SSTableMultiWriter writer : finishedWriters)
+            accumulate = writer.commit(accumulate);
+        return accumulate;
+    }
+
+    @Override
+    public Throwable abort(Throwable accumulate)
+    {
+        if (currentWriter != null)
+            finishedWriters.add(currentWriter);
+        currentWriter = null;
+        for (SSTableMultiWriter finishedWriter : finishedWriters)
+            accumulate = finishedWriter.abort(accumulate);
+
+        return accumulate;
+    }
+
+    @Override
+    public void prepareToCommit()
+    {
+        if (currentWriter != null)
+            finishedWriters.add(currentWriter);
+        currentWriter = null;
+        finishedWriters.forEach(SSTableMultiWriter::prepareToCommit);
+    }
+
+    @Override
+    public void close()
+    {
+        if (currentWriter != null)
+            finishedWriters.add(currentWriter);
+        currentWriter = null;
+        finishedWriters.forEach(SSTableMultiWriter::close);
+    }
+}
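
RangeAwareSSTableWriter fans incoming partitions out across the node's data directories: maybeSwitchWriter() walks the disk boundaries returned by StorageService.getDiskBoundaries() and opens a fresh per-directory writer each time the (sorted) partition keys cross the next boundary, while the constructor falls back to a single writer on a directory with enough free space when no boundaries are available. A simplified sketch of the boundary walk, using plain longs instead of PartitionPosition (illustrative only, not part of the patch):

    import java.util.Arrays;
    import java.util.List;

    class BoundaryWalkSketch
    {
        private final List<Long> boundaries = Arrays.asList(100L, 200L, 300L); // one upper bound per data directory
        private int currentIndex = -1;

        // Keys must arrive in sorted order, as they do for an sstable writer.
        int writerIndexFor(long key)
        {
            while (currentIndex < 0 || key > boundaries.get(currentIndex))
                currentIndex++;             // crossing a boundary means switching to the next directory's writer
            return currentIndex;
        }

        public static void main(String[] args)
        {
            BoundaryWalkSketch s = new BoundaryWalkSketch();
            System.out.println(s.writerIndexFor(42));   // 0: first directory
            System.out.println(s.writerIndexFor(150));  // 1: crossed the first boundary
            System.out.println(s.writerIndexFor(151));  // 1: same directory, no switch
        }
    }
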
diff --git a/src/java/org/apache/cassandra/io/sstable/format/SSTableFlushObserver.java b/src/java/org/apache/cassandra/io/sstable/format/SSTableFlushObserver.java
new file mode 100644
index 0000000..f0b6bac
--- /dev/null
+++ b/src/java/org/apache/cassandra/io/sstable/format/SSTableFlushObserver.java
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.io.sstable.format;
+
+import org.apache.cassandra.db.DecoratedKey;
+import org.apache.cassandra.db.rows.Unfiltered;
+
+/**
+ * Observer for events in the lifecycle of writing out an sstable.
+ */
+public interface SSTableFlushObserver
+{
+    /**
+     * Called before writing any data to the sstable.
+     */
+    void begin();
+
+    /**
+     * Called when a new partition is being written to the sstable,
+     * but before any cells are processed (see {@link #nextUnfilteredCluster(Unfiltered)}).
+     *
+     * @param key The key being appended to the SSTable.
+     * @param indexPosition The position of the key in the SSTable PRIMARY_INDEX file.
+     */
+    void startPartition(DecoratedKey key, long indexPosition);
+
+    /**
+     * Called after the unfiltered cluster is written to the sstable.
+     * Will be preceded by a call to {@code startPartition(DecoratedKey, long)},
+     * and the cluster should be assumed to belong to that partition.
+     *
+     * @param unfilteredCluster The unfiltered cluster being added to the SSTable.
+     */
+    void nextUnfilteredCluster(Unfiltered unfilteredCluster);
+
+    /**
+     * Called when all data is written to the file and it's ready to be finished up.
+     */
+    void complete();
+}
\ No newline at end of file
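
The new SSTableFlushObserver interface gives secondary indexes a hook into the sstable write lifecycle: begin() before any data, startPartition() with the key and its index file position, nextUnfilteredCluster() for each row or marker, and complete() once the write finishes or commits (SSTableWriter takes care of the final call, as shown further down). A minimal illustrative implementation that merely counts what it sees (not part of the patch):

    import org.apache.cassandra.db.DecoratedKey;
    import org.apache.cassandra.db.rows.Unfiltered;
    import org.apache.cassandra.io.sstable.format.SSTableFlushObserver;

    class CountingFlushObserver implements SSTableFlushObserver
    {
        private long partitions;
        private long unfiltereds;

        public void begin() { partitions = 0; unfiltereds = 0; }

        public void startPartition(DecoratedKey key, long indexPosition) { partitions++; }

        public void nextUnfilteredCluster(Unfiltered unfiltered) { unfiltereds++; }

        public void complete()
        {
            System.out.println(partitions + " partitions / " + unfiltereds + " unfiltereds written");
        }
    }
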
diff --git a/src/java/org/apache/cassandra/io/sstable/format/SSTableFormat.java b/src/java/org/apache/cassandra/io/sstable/format/SSTableFormat.java
index 1286f16..e68ca2a 100644
--- a/src/java/org/apache/cassandra/io/sstable/format/SSTableFormat.java
+++ b/src/java/org/apache/cassandra/io/sstable/format/SSTableFormat.java
@@ -18,16 +18,10 @@
 package org.apache.cassandra.io.sstable.format;
 
 import com.google.common.base.CharMatcher;
-import com.google.common.collect.ImmutableList;
 import org.apache.cassandra.config.CFMetaData;
-import org.apache.cassandra.db.LegacyLayout;
 import org.apache.cassandra.db.RowIndexEntry;
 import org.apache.cassandra.db.SerializationHeader;
-import org.apache.cassandra.db.compaction.CompactionController;
 import org.apache.cassandra.io.sstable.format.big.BigFormat;
-import org.apache.cassandra.io.util.FileDataInput;
-
-import java.util.Iterator;
 
 /**
  * Provides the accessors to data on disk.
diff --git a/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java b/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java
index 9f2663e..d11e057 100644
--- a/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java
+++ b/src/java/org/apache/cassandra/io/sstable/format/SSTableReader.java
@@ -45,9 +45,9 @@
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.config.Schema;
 import org.apache.cassandra.db.*;
-import org.apache.cassandra.db.commitlog.ReplayPosition;
 import org.apache.cassandra.db.filter.ColumnFilter;
-import org.apache.cassandra.db.rows.SliceableUnfilteredRowIterator;
+import org.apache.cassandra.db.rows.EncodingStats;
+import org.apache.cassandra.db.rows.UnfilteredRowIterator;
 import org.apache.cassandra.dht.AbstractBounds;
 import org.apache.cassandra.dht.Range;
 import org.apache.cassandra.dht.Token;
@@ -420,7 +420,8 @@
             System.exit(1);
         }
 
-        logger.debug("Opening {} ({} bytes)", descriptor, new File(descriptor.filenameFor(Component.DATA)).length());
+        long fileLength = new File(descriptor.filenameFor(Component.DATA)).length();
+        logger.debug("Opening {} ({})", descriptor, FBUtilities.prettyPrintMemory(fileLength));
         SSTableReader sstable = internalOpen(descriptor,
                                              components,
                                              metadata,
@@ -436,7 +437,7 @@
                 : new BufferedSegmentedFile.Builder())
         {
             if (!sstable.loadSummary(ibuilder, dbuilder))
-                sstable.buildSummary(false, ibuilder, dbuilder, false, Downsampling.BASE_SAMPLING_LEVEL);
+                sstable.buildSummary(false, false, Downsampling.BASE_SAMPLING_LEVEL);
             sstable.ifile = ibuilder.buildIndex(sstable.descriptor, sstable.indexSummary);
             sstable.dfile = dbuilder.buildData(sstable.descriptor, statsMetadata);
             sstable.bf = FilterFactory.AlwaysPresent;
@@ -476,7 +477,8 @@
             System.exit(1);
         }
 
-        logger.debug("Opening {} ({} bytes)", descriptor, new File(descriptor.filenameFor(Component.DATA)).length());
+        long fileLength = new File(descriptor.filenameFor(Component.DATA)).length();
+        logger.debug("Opening {} ({})", descriptor, FBUtilities.prettyPrintMemory(fileLength));
         SSTableReader sstable = internalOpen(descriptor,
                                              components,
                                              metadata,
@@ -735,7 +737,7 @@
             boolean builtSummary = false;
             if (recreateBloomFilter || !summaryLoaded)
             {
-                buildSummary(recreateBloomFilter, ibuilder, dbuilder, summaryLoaded, Downsampling.BASE_SAMPLING_LEVEL);
+                buildSummary(recreateBloomFilter, summaryLoaded, Downsampling.BASE_SAMPLING_LEVEL);
                 builtSummary = true;
             }
 
@@ -775,12 +777,10 @@
      * Build index summary(and optionally bloom filter) by reading through Index.db file.
      *
      * @param recreateBloomFilter true if recreate bloom filter
-     * @param ibuilder
-     * @param dbuilder
      * @param summaryLoaded true if index summary is already loaded and not need to build again
      * @throws IOException
      */
-    private void buildSummary(boolean recreateBloomFilter, SegmentedFile.Builder ibuilder, SegmentedFile.Builder dbuilder, boolean summaryLoaded, int samplingLevel) throws IOException
+    private void buildSummary(boolean recreateBloomFilter, boolean summaryLoaded, int samplingLevel) throws IOException
     {
          if (!components.contains(Component.PRIMARY_INDEX))
              return;
@@ -800,12 +800,11 @@
             try (IndexSummaryBuilder summaryBuilder = summaryLoaded ? null : new IndexSummaryBuilder(estimatedKeys, metadata.params.minIndexInterval, samplingLevel))
             {
                 long indexPosition;
-                RowIndexEntry.IndexSerializer rowIndexSerializer = descriptor.getFormat().getIndexSerializer(metadata, descriptor.version, header);
 
                 while ((indexPosition = primaryIndex.getFilePointer()) != indexSize)
                 {
                     ByteBuffer key = ByteBufferUtil.readWithShortLength(primaryIndex);
-                    RowIndexEntry indexEntry = rowIndexSerializer.deserialize(primaryIndex);
+                    RowIndexEntry.Serializer.skip(primaryIndex, descriptor.version);
                     DecoratedKey decoratedKey = decorateKey(key);
                     if (first == null)
                         first = decoratedKey;
@@ -1196,7 +1195,7 @@
 
     /**
      * Gets the position in the index file to start scanning to find the given key (at most indexInterval keys away,
-     * modulo downsampling of the index summary). Always returns a value >= 0
+     * modulo downsampling of the index summary). Always returns a {@code value >= 0}
      */
     public long getIndexScanPosition(PartitionPosition key)
     {
@@ -1522,8 +1521,8 @@
      */
     protected abstract RowIndexEntry getPosition(PartitionPosition key, Operator op, boolean updateCacheAndStats, boolean permitMatchPastLast);
 
-    public abstract SliceableUnfilteredRowIterator iterator(DecoratedKey key, ColumnFilter selectedColumns, boolean reversed, boolean isForThrift);
-    public abstract SliceableUnfilteredRowIterator iterator(FileDataInput file, DecoratedKey key, RowIndexEntry indexEntry, ColumnFilter selectedColumns, boolean reversed, boolean isForThrift);
+    public abstract UnfilteredRowIterator iterator(DecoratedKey key, Slices slices, ColumnFilter selectedColumns, boolean reversed, boolean isForThrift);
+    public abstract UnfilteredRowIterator iterator(FileDataInput file, DecoratedKey key, RowIndexEntry indexEntry, Slices slices, ColumnFilter selectedColumns, boolean reversed, boolean isForThrift);
 
     /**
      * Finds and returns the first key beyond a given token in this SSTable or null if no such key exists.
@@ -1568,7 +1567,7 @@
      */
     public long uncompressedLength()
     {
-        return dfile.length;
+        return dfile.dataLength();
     }
 
     /**
@@ -1607,7 +1606,6 @@
      * When calling this function, the caller must ensure that the SSTableReader is not referenced anywhere
      * except for threads holding a reference.
      *
-     * @return true if the this is the first time the file was marked obsolete.  Calling this
      * multiple times is usually buggy (see exceptions in Tracker.unmarkCompacting and removeOldSSTablesSize).
      */
     public void markObsolete(Runnable tidier)
@@ -1694,7 +1692,7 @@
     /**
      * Direct I/O SSTableScanner over an iterator of bounds.
      *
-     * @param bounds the keys to cover
+     * @param rangeIterator the keys to cover
      * @return A Scanner for seeking over the rows of the SSTable.
      */
     public abstract ISSTableScanner getScanner(Iterator<AbstractBounds<PartitionPosition>> rangeIterator);
@@ -1739,6 +1737,26 @@
         return sstableMetadata.repairedAt != ActiveRepairService.UNREPAIRED_SSTABLE;
     }
 
+    public DecoratedKey keyAt(long indexPosition) throws IOException
+    {
+        DecoratedKey key;
+        try (FileDataInput in = ifile.createReader(indexPosition))
+        {
+            if (in.isEOF())
+                return null;
+
+            key = decorateKey(ByteBufferUtil.readWithShortLength(in));
+
+            // hint read path about key location if caching is enabled
+            // this saves index summary lookup and index file iteration which would be pretty costly
+            // especially in presence of promoted column indexes
+            if (isKeyCacheSetup())
+                cacheKey(key, rowIndexEntrySerializer.deserialize(in, in.getFilePointer()));
+        }
+
+        return key;
+    }
+
     /**
      * TODO: Move someplace reusable
      */
@@ -1840,6 +1858,14 @@
         return sstableMetadata.maxLocalDeletionTime;
     }
 
+    /** sstable contains no tombstones if minLocalDeletionTime == Integer.MAX_VALUE */
+    public boolean hasTombstones()
+    {
+        // sstable contains no tombstones if minLocalDeletionTime is still set to the default value Integer.MAX_VALUE,
+        // which is bigger than any valid deletion time
+        return getMinLocalDeletionTime() != Integer.MAX_VALUE;
+    }
+
     public int getMinTTL()
     {
         return sstableMetadata.minTTL;
@@ -1919,6 +1945,11 @@
         return ifile.channel;
     }
 
+    public SegmentedFile getIndexFile()
+    {
+        return ifile;
+    }
+
     /**
      * @param component component to get timestamp.
      * @return last modified time for given component. 0 if given component does not exist or IO error occurs.
@@ -1990,6 +2021,13 @@
         }
     }
 
+    public EncodingStats stats()
+    {
+        // We could return sstable.header.stats(), but this may not be as accurate as the actual sstable stats (see
+        // SerializationHeader.make() for details) so we use the latter instead.
+        return new EncodingStats(getMinTimestamp(), getMinLocalDeletionTime(), getMinTTL());
+    }
+
     public Ref<SSTableReader> tryRef()
     {
         return selfRef.tryRef();
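
Among the SSTableReader additions, keyAt(indexPosition) reads the partition key stored at a given position in the -Index.db file and, when the key cache is set up, also caches the deserialized RowIndexEntry so a later read can skip the index lookup; hasTombstones() and stats() expose metadata consumed elsewhere in the patch. A small usage sketch, illustrative only and assuming an already-open reader:

    import java.io.IOException;
    import org.apache.cassandra.db.DecoratedKey;
    import org.apache.cassandra.io.sstable.format.SSTableReader;

    class ReaderHelpersSketch
    {
        static void inspect(SSTableReader reader, long indexPosition) throws IOException
        {
            DecoratedKey key = reader.keyAt(indexPosition); // null when indexPosition is at EOF
            if (key != null && reader.hasTombstones())
                System.out.println(key + " belongs to an sstable that still contains tombstones");
        }
    }
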
diff --git a/src/java/org/apache/cassandra/io/sstable/format/SSTableWriter.java b/src/java/org/apache/cassandra/io/sstable/format/SSTableWriter.java
index 5f35029..9f2e159 100644
--- a/src/java/org/apache/cassandra/io/sstable/format/SSTableWriter.java
+++ b/src/java/org/apache/cassandra/io/sstable/format/SSTableWriter.java
@@ -18,20 +18,21 @@
 
 package org.apache.cassandra.io.sstable.format;
 
-import java.util.Arrays;
-import java.util.HashSet;
-import java.util.Map;
-import java.util.Set;
+import java.util.*;
 
 import com.google.common.annotations.VisibleForTesting;
+import com.google.common.collect.ImmutableList;
 import com.google.common.collect.Sets;
 
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.Schema;
 import org.apache.cassandra.db.RowIndexEntry;
 import org.apache.cassandra.db.SerializationHeader;
+import org.apache.cassandra.db.compaction.OperationType;
 import org.apache.cassandra.db.lifecycle.LifecycleTransaction;
 import org.apache.cassandra.db.rows.UnfilteredRowIterator;
+import org.apache.cassandra.index.Index;
+import org.apache.cassandra.io.FSWriteError;
 import org.apache.cassandra.io.sstable.Component;
 import org.apache.cassandra.io.sstable.Descriptor;
 import org.apache.cassandra.io.sstable.SSTable;
@@ -58,6 +59,7 @@
     protected final RowIndexEntry.IndexSerializer rowIndexEntrySerializer;
     protected final SerializationHeader header;
     protected final TransactionalProxy txnProxy = txnProxy();
+    protected final Collection<SSTableFlushObserver> observers;
 
     protected abstract TransactionalProxy txnProxy();
 
@@ -69,12 +71,13 @@
         protected boolean openResult;
     }
 
-    protected SSTableWriter(Descriptor descriptor, 
-                            long keyCount, 
-                            long repairedAt, 
-                            CFMetaData metadata, 
-                            MetadataCollector metadataCollector, 
-                            SerializationHeader header)
+    protected SSTableWriter(Descriptor descriptor,
+                            long keyCount,
+                            long repairedAt,
+                            CFMetaData metadata,
+                            MetadataCollector metadataCollector,
+                            SerializationHeader header,
+                            Collection<SSTableFlushObserver> observers)
     {
         super(descriptor, components(metadata), metadata);
         this.keyCount = keyCount;
@@ -82,6 +85,7 @@
         this.metadataCollector = metadataCollector;
         this.header = header != null ? header : SerializationHeader.makeWithoutStats(metadata); //null header indicates streaming from pre-3.0 sstable
         this.rowIndexEntrySerializer = descriptor.version.getSSTableFormat().getIndexSerializer(metadata, descriptor.version, header);
+        this.observers = observers == null ? Collections.emptySet() : observers;
     }
 
     public static SSTableWriter create(Descriptor descriptor,
@@ -90,16 +94,23 @@
                                        CFMetaData metadata,
                                        MetadataCollector metadataCollector,
                                        SerializationHeader header,
+                                       Collection<Index> indexes,
                                        LifecycleTransaction txn)
     {
         Factory writerFactory = descriptor.getFormat().getWriterFactory();
-        return writerFactory.open(descriptor, keyCount, repairedAt, metadata, metadataCollector, header, txn);
+        return writerFactory.open(descriptor, keyCount, repairedAt, metadata, metadataCollector, header, observers(descriptor, indexes, txn.opType()), txn);
     }
 
-    public static SSTableWriter create(Descriptor descriptor, long keyCount, long repairedAt, int sstableLevel, SerializationHeader header, LifecycleTransaction txn)
+    public static SSTableWriter create(Descriptor descriptor,
+                                       long keyCount,
+                                       long repairedAt,
+                                       int sstableLevel,
+                                       SerializationHeader header,
+                                       Collection<Index> indexes,
+                                       LifecycleTransaction txn)
     {
         CFMetaData metadata = Schema.instance.getCFMetaData(descriptor);
-        return create(metadata, descriptor, keyCount, repairedAt, sstableLevel, header, txn);
+        return create(metadata, descriptor, keyCount, repairedAt, sstableLevel, header, indexes, txn);
     }
 
     public static SSTableWriter create(CFMetaData metadata,
@@ -108,21 +119,34 @@
                                        long repairedAt,
                                        int sstableLevel,
                                        SerializationHeader header,
+                                       Collection<Index> indexes,
                                        LifecycleTransaction txn)
     {
         MetadataCollector collector = new MetadataCollector(metadata.comparator).sstableLevel(sstableLevel);
-        return create(descriptor, keyCount, repairedAt, metadata, collector, header, txn);
+        return create(descriptor, keyCount, repairedAt, metadata, collector, header, indexes, txn);
     }
 
-    public static SSTableWriter create(String filename, long keyCount, long repairedAt, int sstableLevel, SerializationHeader header,LifecycleTransaction txn)
+    public static SSTableWriter create(String filename,
+                                       long keyCount,
+                                       long repairedAt,
+                                       int sstableLevel,
+                                       SerializationHeader header,
+                                       Collection<Index> indexes,
+                                       LifecycleTransaction txn)
     {
-        return create(Descriptor.fromFilename(filename), keyCount, repairedAt, sstableLevel, header, txn);
+        return create(Descriptor.fromFilename(filename), keyCount, repairedAt, sstableLevel, header, indexes, txn);
     }
 
     @VisibleForTesting
-    public static SSTableWriter create(String filename, long keyCount, long repairedAt, SerializationHeader header, LifecycleTransaction txn)
+    public static SSTableWriter create(String filename,
+                                       long keyCount,
+                                       long repairedAt,
+                                       SerializationHeader header,
+                                       Collection<Index> indexes,
+                                       LifecycleTransaction txn)
     {
-        return create(Descriptor.fromFilename(filename), keyCount, repairedAt, 0, header, txn);
+        Descriptor descriptor = Descriptor.fromFilename(filename);
+        return create(descriptor, keyCount, repairedAt, 0, header, indexes, txn);
     }
 
     private static Set<Component> components(CFMetaData metadata)
@@ -150,6 +174,27 @@
         return components;
     }
 
+    private static Collection<SSTableFlushObserver> observers(Descriptor descriptor,
+                                                              Collection<Index> indexes,
+                                                              OperationType operationType)
+    {
+        if (indexes == null)
+            return Collections.emptyList();
+
+        List<SSTableFlushObserver> observers = new ArrayList<>(indexes.size());
+        for (Index index : indexes)
+        {
+            SSTableFlushObserver observer = index.getFlushObserver(descriptor, operationType);
+            if (observer != null)
+            {
+                observer.begin();
+                observers.add(observer);
+            }
+        }
+
+        return ImmutableList.copyOf(observers);
+    }
+
     public abstract void mark();
 
     /**
@@ -167,6 +212,11 @@
 
     public abstract long getOnDiskFilePointer();
 
+    public long getEstimatedOnDiskBytesWritten()
+    {
+        return getOnDiskFilePointer();
+    }
+
     public abstract void resetAndTruncate();
 
     public SSTableWriter setRepairedAt(long repairedAt)
@@ -211,6 +261,7 @@
     {
         setOpenResult(openResult);
         txnProxy.finish();
+        observers.forEach(SSTableFlushObserver::complete);
         return finished();
     }
 
@@ -231,7 +282,14 @@
 
     public final Throwable commit(Throwable accumulate)
     {
-        return txnProxy.commit(accumulate);
+        try
+        {
+            return txnProxy.commit(accumulate);
+        }
+        finally
+        {
+            observers.forEach(SSTableFlushObserver::complete);
+        }
     }
 
     public final Throwable abort(Throwable accumulate)
@@ -285,6 +343,7 @@
                                            CFMetaData metadata,
                                            MetadataCollector metadataCollector,
                                            SerializationHeader header,
+                                           Collection<SSTableFlushObserver> observers,
                                            LifecycleTransaction txn);
     }
 }
diff --git a/src/java/org/apache/cassandra/io/sstable/format/big/BigFormat.java b/src/java/org/apache/cassandra/io/sstable/format/big/BigFormat.java
index e0fb3b1..9b6f491 100644
--- a/src/java/org/apache/cassandra/io/sstable/format/big/BigFormat.java
+++ b/src/java/org/apache/cassandra/io/sstable/format/big/BigFormat.java
@@ -17,6 +17,7 @@
  */
 package org.apache.cassandra.io.sstable.format.big;
 
+import java.util.Collection;
 import java.util.Set;
 
 import org.apache.cassandra.config.CFMetaData;
@@ -25,10 +26,7 @@
 import org.apache.cassandra.db.lifecycle.LifecycleTransaction;
 import org.apache.cassandra.io.sstable.Component;
 import org.apache.cassandra.io.sstable.Descriptor;
-import org.apache.cassandra.io.sstable.format.SSTableFormat;
-import org.apache.cassandra.io.sstable.format.SSTableReader;
-import org.apache.cassandra.io.sstable.format.SSTableWriter;
-import org.apache.cassandra.io.sstable.format.Version;
+import org.apache.cassandra.io.sstable.format.*;
 import org.apache.cassandra.io.sstable.metadata.MetadataCollector;
 import org.apache.cassandra.io.sstable.metadata.StatsMetadata;
 import org.apache.cassandra.net.MessagingService;
@@ -88,9 +86,10 @@
                                   CFMetaData metadata,
                                   MetadataCollector metadataCollector,
                                   SerializationHeader header,
+                                  Collection<SSTableFlushObserver> observers,
                                   LifecycleTransaction txn)
         {
-            return new BigTableWriter(descriptor, keyCount, repairedAt, metadata, metadataCollector, header, txn);
+            return new BigTableWriter(descriptor, keyCount, repairedAt, metadata, metadataCollector, header, observers, txn);
         }
     }
 
@@ -124,7 +123,7 @@
         // lb (2.2.7): commit log lower bound included
         // ma (3.0.0): swap bf hash order
         //             store rows natively
-        // mb (3.0.6): commit log lower bound included
+        // mb (3.0.7, 3.7): commit log lower bound included
         //
         // NOTE: when adding a new version, please add that to LegacySSTableTest, too.
 
diff --git a/src/java/org/apache/cassandra/io/sstable/format/big/BigTableReader.java b/src/java/org/apache/cassandra/io/sstable/format/big/BigTableReader.java
index 1fbf1f2..7a7ce8c 100644
--- a/src/java/org/apache/cassandra/io/sstable/format/big/BigTableReader.java
+++ b/src/java/org/apache/cassandra/io/sstable/format/big/BigTableReader.java
@@ -21,10 +21,12 @@
 import org.apache.cassandra.cache.KeyCacheKey;
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.db.*;
-import org.apache.cassandra.db.rows.SliceableUnfilteredRowIterator;
 import org.apache.cassandra.db.filter.ColumnFilter;
 import org.apache.cassandra.db.columniterator.SSTableIterator;
 import org.apache.cassandra.db.columniterator.SSTableReversedIterator;
+import org.apache.cassandra.db.rows.Rows;
+import org.apache.cassandra.db.rows.UnfilteredRowIterator;
+import org.apache.cassandra.db.rows.UnfilteredRowIterators;
 import org.apache.cassandra.dht.AbstractBounds;
 import org.apache.cassandra.dht.Range;
 import org.apache.cassandra.dht.Token;
@@ -57,18 +59,19 @@
         super(desc, components, metadata, maxDataAge, sstableMetadata, openReason, header);
     }
 
-    public SliceableUnfilteredRowIterator iterator(DecoratedKey key, ColumnFilter selectedColumns, boolean reversed, boolean isForThrift)
+    public UnfilteredRowIterator iterator(DecoratedKey key, Slices slices, ColumnFilter selectedColumns, boolean reversed, boolean isForThrift)
     {
-        return reversed
-             ? new SSTableReversedIterator(this, key, selectedColumns, isForThrift)
-             : new SSTableIterator(this, key, selectedColumns, isForThrift);
+        RowIndexEntry rie = getPosition(key, SSTableReader.Operator.EQ);
+        return iterator(null, key, rie, slices, selectedColumns, reversed, isForThrift);
     }
 
-    public SliceableUnfilteredRowIterator iterator(FileDataInput file, DecoratedKey key, RowIndexEntry indexEntry, ColumnFilter selectedColumns, boolean reversed, boolean isForThrift)
+    public UnfilteredRowIterator iterator(FileDataInput file, DecoratedKey key, RowIndexEntry indexEntry, Slices slices, ColumnFilter selectedColumns, boolean reversed, boolean isForThrift)
     {
+        if (indexEntry == null)
+            return UnfilteredRowIterators.noRowsIterator(metadata, key, Rows.EMPTY_STATIC_ROW, DeletionTime.LIVE, reversed);
         return reversed
-             ? new SSTableReversedIterator(this, file, key, indexEntry, selectedColumns, isForThrift)
-             : new SSTableIterator(this, file, key, indexEntry, selectedColumns, isForThrift);
+             ? new SSTableReversedIterator(this, file, key, indexEntry, slices, selectedColumns, isForThrift, ifile)
+             : new SSTableIterator(this, file, key, indexEntry, slices, selectedColumns, isForThrift, ifile);
     }
 
     /**
@@ -227,7 +230,7 @@
                 if (opSatisfied)
                 {
                     // read data position from index entry
-                    RowIndexEntry indexEntry = rowIndexEntrySerializer.deserialize(in);
+                    RowIndexEntry indexEntry = rowIndexEntrySerializer.deserialize(in, in.getFilePointer());
                     if (exactMatch && updateCacheAndStats)
                     {
                         assert key instanceof DecoratedKey; // key can be == to the index key only if it's a true row key
@@ -249,7 +252,7 @@
                     }
                     if (op == Operator.EQ && updateCacheAndStats)
                         bloomFilterTracker.addTruePositive();
-                    Tracing.trace("Partition index with {} entries found for sstable {}", indexEntry.columnsIndex().size(), descriptor.generation);
+                    Tracing.trace("Partition index with {} entries found for sstable {}", indexEntry.columnsIndexCount(), descriptor.generation);
                     return indexEntry;
                 }
 
diff --git a/src/java/org/apache/cassandra/io/sstable/format/big/BigTableScanner.java b/src/java/org/apache/cassandra/io/sstable/format/big/BigTableScanner.java
index a3bd442..6d31844 100644
--- a/src/java/org/apache/cassandra/io/sstable/format/big/BigTableScanner.java
+++ b/src/java/org/apache/cassandra/io/sstable/format/big/BigTableScanner.java
@@ -288,7 +288,7 @@
                             return endOfData();
 
                         currentKey = sstable.decorateKey(ByteBufferUtil.readWithShortLength(ifile));
-                        currentEntry = rowIndexEntrySerializer.deserialize(ifile);
+                        currentEntry = rowIndexEntrySerializer.deserialize(ifile, ifile.getFilePointer());
                     } while (!currentRange.contains(currentKey));
                 }
                 else
@@ -307,7 +307,7 @@
                 {
                     // we need the position of the start of the next key, regardless of whether it falls in the current range
                     nextKey = sstable.decorateKey(ByteBufferUtil.readWithShortLength(ifile));
-                    nextEntry = rowIndexEntrySerializer.deserialize(ifile);
+                    nextEntry = rowIndexEntrySerializer.deserialize(ifile, ifile.getFilePointer());
 
                     if (!currentRange.contains(nextKey))
                     {
@@ -335,7 +335,7 @@
                             }
 
                             ClusteringIndexFilter filter = dataRange.clusteringIndexFilter(partitionKey());
-                            return filter.filter(sstable.iterator(dfile, partitionKey(), currentEntry, columns, filter.isReversed(), isForThrift));
+                            return sstable.iterator(dfile, partitionKey(), currentEntry, filter.getSlices(BigTableScanner.this.metadata()), columns, filter.isReversed(), isForThrift);
                         }
                         catch (CorruptSSTableException | IOException e)
                         {
diff --git a/src/java/org/apache/cassandra/io/sstable/format/big/BigTableWriter.java b/src/java/org/apache/cassandra/io/sstable/format/big/BigTableWriter.java
index d3630d7..c1d9bbc 100644
--- a/src/java/org/apache/cassandra/io/sstable/format/big/BigTableWriter.java
+++ b/src/java/org/apache/cassandra/io/sstable/format/big/BigTableWriter.java
@@ -17,71 +17,87 @@
  */
 package org.apache.cassandra.io.sstable.format.big;
 
-import java.io.*;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Collection;
 import java.util.Map;
 
-import org.apache.cassandra.db.*;
-import org.apache.cassandra.db.lifecycle.LifecycleTransaction;
-import org.apache.cassandra.db.transform.Transformation;
-import org.apache.cassandra.io.sstable.*;
-import org.apache.cassandra.io.sstable.format.SSTableReader;
-import org.apache.cassandra.io.sstable.format.SSTableWriter;
-
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
+
+import org.apache.cassandra.cache.ChunkCache;
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.db.*;
+import org.apache.cassandra.db.lifecycle.LifecycleTransaction;
 import org.apache.cassandra.db.rows.*;
+import org.apache.cassandra.db.transform.Transformation;
 import org.apache.cassandra.io.FSWriteError;
 import org.apache.cassandra.io.compress.CompressedSequentialWriter;
+import org.apache.cassandra.io.sstable.*;
+import org.apache.cassandra.io.sstable.format.SSTableFlushObserver;
+import org.apache.cassandra.io.sstable.format.SSTableReader;
+import org.apache.cassandra.io.sstable.format.SSTableWriter;
 import org.apache.cassandra.io.sstable.metadata.MetadataCollector;
 import org.apache.cassandra.io.sstable.metadata.MetadataComponent;
 import org.apache.cassandra.io.sstable.metadata.MetadataType;
 import org.apache.cassandra.io.sstable.metadata.StatsMetadata;
 import org.apache.cassandra.io.util.*;
-import org.apache.cassandra.utils.ByteBufferUtil;
-import org.apache.cassandra.utils.FBUtilities;
-import org.apache.cassandra.utils.FilterFactory;
-import org.apache.cassandra.utils.IFilter;
+import org.apache.cassandra.utils.*;
 import org.apache.cassandra.utils.concurrent.Transactional;
 
-import org.apache.cassandra.utils.SyncUtil;
-
 public class BigTableWriter extends SSTableWriter
 {
     private static final Logger logger = LoggerFactory.getLogger(BigTableWriter.class);
 
+    private final ColumnIndex columnIndexWriter;
     private final IndexWriter iwriter;
     private final SegmentedFile.Builder dbuilder;
     protected final SequentialWriter dataFile;
     private DecoratedKey lastWrittenKey;
     private DataPosition dataMark;
+    private long lastEarlyOpenLength = 0;
 
-    public BigTableWriter(Descriptor descriptor, 
-                          Long keyCount, 
-                          Long repairedAt, 
-                          CFMetaData metadata, 
+    private final SequentialWriterOption writerOption = SequentialWriterOption.newBuilder()
+                                                        .trickleFsync(DatabaseDescriptor.getTrickleFsync())
+                                                        .trickleFsyncByteInterval(DatabaseDescriptor.getTrickleFsyncIntervalInKb() * 1024)
+                                                        .build();
+
+    public BigTableWriter(Descriptor descriptor,
+                          long keyCount,
+                          long repairedAt,
+                          CFMetaData metadata,
                           MetadataCollector metadataCollector, 
                           SerializationHeader header,
+                          Collection<SSTableFlushObserver> observers,
                           LifecycleTransaction txn)
     {
-        super(descriptor, keyCount, repairedAt, metadata, metadataCollector, header);
+        super(descriptor, keyCount, repairedAt, metadata, metadataCollector, header, observers);
         txn.trackNew(this); // must track before any files are created
 
         if (compression)
         {
-            dataFile = SequentialWriter.open(getFilename(),
+            dataFile = new CompressedSequentialWriter(new File(getFilename()),
                                              descriptor.filenameFor(Component.COMPRESSION_INFO),
+                                             new File(descriptor.filenameFor(descriptor.digestComponent)),
+                                             writerOption,
                                              metadata.params.compression,
                                              metadataCollector);
             dbuilder = SegmentedFile.getCompressedBuilder((CompressedSequentialWriter) dataFile);
         }
         else
         {
-            dataFile = SequentialWriter.open(new File(getFilename()), new File(descriptor.filenameFor(Component.CRC)));
+            dataFile = new ChecksummedSequentialWriter(new File(getFilename()),
+                                                       new File(descriptor.filenameFor(Component.CRC)),
+                                                       new File(descriptor.filenameFor(descriptor.digestComponent)),
+                                                       writerOption);
             dbuilder = SegmentedFile.getBuilder(DatabaseDescriptor.getDiskAccessMode(), false);
         }
-        iwriter = new IndexWriter(keyCount, dataFile);
+        iwriter = new IndexWriter(keyCount);
+
+        columnIndexWriter = new ColumnIndex(this.header, dataFile, descriptor.version, this.observers, getRowIndexEntrySerializer().indexInfoSerializer());
     }
 
     public void mark()
@@ -107,7 +123,7 @@
         return (lastWrittenKey == null) ? 0 : dataFile.position();
     }
 
-    private void afterAppend(DecoratedKey decoratedKey, long dataEnd, RowIndexEntry index) throws IOException
+    private void afterAppend(DecoratedKey decoratedKey, long dataEnd, RowIndexEntry index, ByteBuffer indexInfo) throws IOException
     {
         metadataCollector.addKey(decoratedKey.getKey());
         lastWrittenKey = decoratedKey;
@@ -117,7 +133,7 @@
 
         if (logger.isTraceEnabled())
             logger.trace("wrote {} at {}", decoratedKey, dataEnd);
-        iwriter.append(decoratedKey, index, dataEnd);
+        iwriter.append(decoratedKey, index, dataEnd, indexInfo);
     }
 
     /**
@@ -143,18 +159,33 @@
             return null;
 
         long startPosition = beforeAppend(key);
+        observers.forEach((o) -> o.startPartition(key, iwriter.indexFile.position()));
+
+        // Reuse the writer for each row
+        columnIndexWriter.reset();
 
         try (UnfilteredRowIterator collecting = Transformation.apply(iterator, new StatsCollector(metadataCollector)))
         {
-            ColumnIndex index = ColumnIndex.writeAndBuildIndex(collecting, dataFile, header, descriptor.version);
+            columnIndexWriter.buildRowIndex(collecting);
 
-            RowIndexEntry entry = RowIndexEntry.create(startPosition, collecting.partitionLevelDeletion(), index);
+            // afterAppend() writes the partition key before the first RowIndexEntry - so we have to add its
+            // serialized size to the index-writer position
+            long indexFilePosition = ByteBufferUtil.serializedSizeWithShortLength(key.getKey()) + iwriter.indexFile.position();
+
+            RowIndexEntry entry = RowIndexEntry.create(startPosition, indexFilePosition,
+                                                       collecting.partitionLevelDeletion(),
+                                                       columnIndexWriter.headerLength,
+                                                       columnIndexWriter.columnIndexCount,
+                                                       columnIndexWriter.indexInfoSerializedSize(),
+                                                       columnIndexWriter.indexSamples(),
+                                                       columnIndexWriter.offsets(),
+                                                       getRowIndexEntrySerializer().indexInfoSerializer());
 
             long endPosition = dataFile.position();
             long rowSize = endPosition - startPosition;
             maybeLogLargePartitionWarning(key, rowSize);
             metadataCollector.addPartitionSizeInBytes(rowSize);
-            afterAppend(key, endPosition, entry);
+            afterAppend(key, endPosition, entry, columnIndexWriter.buffer());
             return entry;
         }
         catch (IOException e)
@@ -163,12 +194,17 @@
         }
     }
 
+    private RowIndexEntry.IndexSerializer<IndexInfo> getRowIndexEntrySerializer()
+    {
+        return (RowIndexEntry.IndexSerializer<IndexInfo>) rowIndexEntrySerializer;
+    }
+
     private void maybeLogLargePartitionWarning(DecoratedKey key, long rowSize)
     {
         if (rowSize > DatabaseDescriptor.getCompactionLargePartitionWarningThreshold())
         {
             String keyString = metadata.getKeyValidator().getString(key.getKey());
-            logger.warn("Writing large partition {}/{}:{} ({} bytes)", metadata.ksName, metadata.cfName, keyString, rowSize);
+            logger.warn("Writing large partition {}/{}:{} ({})", metadata.ksName, metadata.cfName, keyString, FBUtilities.prettyPrintMemory(rowSize));
         }
     }
 
@@ -243,6 +279,7 @@
         IndexSummary indexSummary = iwriter.summary.build(metadata.partitioner, boundary);
         SegmentedFile ifile = iwriter.builder.buildIndex(descriptor, indexSummary, boundary);
         SegmentedFile dfile = dbuilder.buildData(descriptor, stats, boundary);
+        invalidateCacheAtBoundary(dfile);
         SSTableReader sstable = SSTableReader.internalOpen(descriptor,
                                                            components, metadata,
                                                            ifile, dfile, indexSummary,
@@ -254,6 +291,13 @@
         return sstable;
     }
 
+    void invalidateCacheAtBoundary(SegmentedFile dfile)
+    {
+        if (ChunkCache.instance != null && lastEarlyOpenLength != 0 && dfile.dataLength() > lastEarlyOpenLength)
+            ChunkCache.instance.invalidatePosition(dfile, lastEarlyOpenLength);
+        lastEarlyOpenLength = dfile.dataLength();
+    }
+
     public SSTableReader openFinalEarly()
     {
         // we must ensure the data is completely flushed to disk
@@ -274,6 +318,7 @@
         IndexSummary indexSummary = iwriter.summary.build(this.metadata.partitioner);
         SegmentedFile ifile = iwriter.builder.buildIndex(desc, indexSummary);
         SegmentedFile dfile = dbuilder.buildData(desc, stats);
+        invalidateCacheAtBoundary(dfile);
         SSTableReader sstable = SSTableReader.internalOpen(desc,
                                                            components,
                                                            this.metadata,
@@ -303,7 +348,7 @@
             iwriter.prepareToCommit();
 
             // write sstable statistics
-            dataFile.setDescriptor(descriptor).prepareToCommit();
+            dataFile.prepareToCommit();
             writeMetadata(descriptor, finalizeMetadata());
 
             // save the table of components
@@ -335,13 +380,13 @@
         }
     }
 
-    private static void writeMetadata(Descriptor desc, Map<MetadataType, MetadataComponent> components)
+    private void writeMetadata(Descriptor desc, Map<MetadataType, MetadataComponent> components)
     {
         File file = new File(desc.filenameFor(Component.STATS));
-        try (SequentialWriter out = SequentialWriter.open(file))
+        try (SequentialWriter out = new SequentialWriter(file, writerOption))
         {
             desc.getMetadataSerializer().serialize(components, out, desc.version);
-            out.setDescriptor(desc).finish();
+            out.finish();
         }
         catch (IOException e)
         {
@@ -359,6 +404,11 @@
         return dataFile.getOnDiskFilePointer();
     }
 
+    public long getEstimatedOnDiskBytesWritten()
+    {
+        return dataFile.getEstimatedOnDiskBytesWritten();
+    }
+
     /**
      * Encapsulates writing the index and filter for an SSTable. The state of this object is not valid until it has been closed.
      */
@@ -370,27 +420,15 @@
         public final IFilter bf;
         private DataPosition mark;
 
-        IndexWriter(long keyCount, final SequentialWriter dataFile)
+        IndexWriter(long keyCount)
         {
-            indexFile = SequentialWriter.open(new File(descriptor.filenameFor(Component.PRIMARY_INDEX)));
+            indexFile = new SequentialWriter(new File(descriptor.filenameFor(Component.PRIMARY_INDEX)), writerOption);
             builder = SegmentedFile.getBuilder(DatabaseDescriptor.getIndexAccessMode(), false);
             summary = new IndexSummaryBuilder(keyCount, metadata.params.minIndexInterval, Downsampling.BASE_SAMPLING_LEVEL);
             bf = FilterFactory.getFilter(keyCount, metadata.params.bloomFilterFpChance, true, descriptor.version.hasOldBfHashOrder());
             // register listeners to be alerted when the data files are flushed
-            indexFile.setPostFlushListener(new Runnable()
-            {
-                public void run()
-                {
-                    summary.markIndexSynced(indexFile.getLastFlushOffset());
-                }
-            });
-            dataFile.setPostFlushListener(new Runnable()
-            {
-                public void run()
-                {
-                    summary.markDataSynced(dataFile.getLastFlushOffset());
-                }
-            });
+            indexFile.setPostFlushListener(() -> summary.markIndexSynced(indexFile.getLastFlushOffset()));
+            dataFile.setPostFlushListener(() -> summary.markDataSynced(dataFile.getLastFlushOffset()));
         }
 
         // finds the last (-offset) decorated key that can be guaranteed to occur fully in the flushed portion of the index file
@@ -399,14 +437,14 @@
             return summary.getLastReadableBoundary();
         }
 
-        public void append(DecoratedKey key, RowIndexEntry indexEntry, long dataEnd) throws IOException
+        public void append(DecoratedKey key, RowIndexEntry indexEntry, long dataEnd, ByteBuffer indexInfo) throws IOException
         {
             bf.add(key);
             long indexStart = indexFile.position();
             try
             {
                 ByteBufferUtil.writeWithShortLength(key.getKey(), indexFile);
-                rowIndexEntrySerializer.serialize(indexEntry, indexFile);
+                rowIndexEntrySerializer.serialize(indexEntry, indexFile, indexInfo);
             }
             catch (IOException e)
             {
@@ -461,15 +499,15 @@
             flushBf();
 
             // truncate index file
-            long position = iwriter.indexFile.position();
-            iwriter.indexFile.setDescriptor(descriptor).prepareToCommit();
-            FileUtils.truncate(iwriter.indexFile.getPath(), position);
+            long position = indexFile.position();
+            indexFile.prepareToCommit();
+            FileUtils.truncate(indexFile.getPath(), position);
 
             // save summary
             summary.prepareToCommit();
-            try (IndexSummary summary = iwriter.summary.build(getPartitioner()))
+            try (IndexSummary indexSummary = summary.build(getPartitioner()))
             {
-                SSTableReader.saveSummary(descriptor, first, last, iwriter.builder, dbuilder, summary);
+                SSTableReader.saveSummary(descriptor, first, last, builder, dbuilder, indexSummary);
             }
         }
 
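The new invalidateCacheAtBoundary() hook above drops cached chunks that lie past the previous early-open boundary, since that tail may have been cached while the data file was still growing. A minimal standalone sketch of the boundary-tracking idea follows; the Cache interface is a hypothetical stand-in for ChunkCache.invalidatePosition() and only illustrates the condition used above.

// Hypothetical stand-in for the relevant ChunkCache call.
interface Cache
{
    void invalidateFrom(long position);
}

final class EarlyOpenBoundary
{
    private long lastEarlyOpenLength = 0;

    // Mirrors invalidateCacheAtBoundary(): only the range that grew past the previous
    // early-open length needs to be dropped; everything before it is still valid.
    void onOpen(Cache cache, long newDataLength)
    {
        if (cache != null && lastEarlyOpenLength != 0 && newDataLength > lastEarlyOpenLength)
            cache.invalidateFrom(lastEarlyOpenLength);
        lastEarlyOpenLength = newDataLength;
    }
}
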
diff --git a/src/java/org/apache/cassandra/io/sstable/metadata/LegacyMetadataSerializer.java b/src/java/org/apache/cassandra/io/sstable/metadata/LegacyMetadataSerializer.java
index 4561520..505de49 100644
--- a/src/java/org/apache/cassandra/io/sstable/metadata/LegacyMetadataSerializer.java
+++ b/src/java/org/apache/cassandra/io/sstable/metadata/LegacyMetadataSerializer.java
@@ -24,7 +24,7 @@
 import com.google.common.collect.Maps;
 
 import org.apache.cassandra.db.TypeSizes;
-import org.apache.cassandra.db.commitlog.ReplayPosition;
+import org.apache.cassandra.db.commitlog.CommitLogPosition;
 import org.apache.cassandra.io.sstable.Component;
 import org.apache.cassandra.io.sstable.Descriptor;
 import org.apache.cassandra.io.sstable.format.Version;
@@ -55,7 +55,7 @@
 
         EstimatedHistogram.serializer.serialize(stats.estimatedPartitionSize, out);
         EstimatedHistogram.serializer.serialize(stats.estimatedColumnCount, out);
-        ReplayPosition.serializer.serialize(stats.commitLogUpperBound, out);
+        CommitLogPosition.serializer.serialize(stats.commitLogUpperBound, out);
         out.writeLong(stats.minTimestamp);
         out.writeLong(stats.maxTimestamp);
         out.writeInt(stats.maxLocalDeletionTime);
@@ -72,7 +72,7 @@
         for (ByteBuffer value : stats.maxClusteringValues)
             ByteBufferUtil.writeWithShortLength(value, out);
         if (version.hasCommitLogLowerBound())
-            ReplayPosition.serializer.serialize(stats.commitLogLowerBound, out);
+            CommitLogPosition.serializer.serialize(stats.commitLogLowerBound, out);
     }
 
     /**
@@ -94,8 +94,8 @@
             {
                 EstimatedHistogram partitionSizes = EstimatedHistogram.serializer.deserialize(in);
                 EstimatedHistogram columnCounts = EstimatedHistogram.serializer.deserialize(in);
-                ReplayPosition commitLogLowerBound = ReplayPosition.NONE;
-                ReplayPosition commitLogUpperBound = ReplayPosition.serializer.deserialize(in);
+                CommitLogPosition commitLogLowerBound = CommitLogPosition.NONE;
+                CommitLogPosition commitLogUpperBound = CommitLogPosition.serializer.deserialize(in);
                 long minTimestamp = in.readLong();
                 long maxTimestamp = in.readLong();
                 int maxLocalDeletionTime = in.readInt();
@@ -120,7 +120,7 @@
                     maxColumnNames.add(ByteBufferUtil.readWithShortLength(in));
 
                 if (descriptor.version.hasCommitLogLowerBound())
-                    commitLogLowerBound = ReplayPosition.serializer.deserialize(in);
+                    commitLogLowerBound = CommitLogPosition.serializer.deserialize(in);
 
                 if (types.contains(MetadataType.VALIDATION))
                     components.put(MetadataType.VALIDATION,
diff --git a/src/java/org/apache/cassandra/io/sstable/metadata/MetadataCollector.java b/src/java/org/apache/cassandra/io/sstable/metadata/MetadataCollector.java
index 53cf0b0..299bc87 100644
--- a/src/java/org/apache/cassandra/io/sstable/metadata/MetadataCollector.java
+++ b/src/java/org/apache/cassandra/io/sstable/metadata/MetadataCollector.java
@@ -17,14 +17,11 @@
  */
 package org.apache.cassandra.io.sstable.metadata;
 
-import java.io.File;
 import java.nio.ByteBuffer;
 import java.util.ArrayList;
 import java.util.Collections;
-import java.util.HashSet;
 import java.util.List;
 import java.util.Map;
-import java.util.Set;
 
 import com.google.common.collect.Maps;
 import com.google.common.collect.Ordering;
@@ -32,7 +29,7 @@
 import com.clearspring.analytics.stream.cardinality.HyperLogLogPlus;
 import com.clearspring.analytics.stream.cardinality.ICardinality;
 import org.apache.cassandra.db.*;
-import org.apache.cassandra.db.commitlog.ReplayPosition;
+import org.apache.cassandra.db.commitlog.CommitLogPosition;
 import org.apache.cassandra.db.marshal.AbstractType;
 import org.apache.cassandra.db.partitions.PartitionStatisticsCollector;
 import org.apache.cassandra.db.rows.Cell;
@@ -69,8 +66,8 @@
     {
         return new StatsMetadata(defaultPartitionSizeHistogram(),
                                  defaultCellPerPartitionCountHistogram(),
-                                 ReplayPosition.NONE,
-                                 ReplayPosition.NONE,
+                                 CommitLogPosition.NONE,
+                                 CommitLogPosition.NONE,
                                  Long.MIN_VALUE,
                                  Long.MAX_VALUE,
                                  Integer.MAX_VALUE,
@@ -91,8 +88,8 @@
     protected EstimatedHistogram estimatedPartitionSize = defaultPartitionSizeHistogram();
     // TODO: count the number of rows per partition (either with the number of cells, or instead)
     protected EstimatedHistogram estimatedCellPerPartitionCount = defaultCellPerPartitionCountHistogram();
-    protected ReplayPosition commitLogLowerBound = ReplayPosition.NONE;
-    protected ReplayPosition commitLogUpperBound = ReplayPosition.NONE;
+    protected CommitLogPosition commitLogLowerBound = CommitLogPosition.NONE;
+    protected CommitLogPosition commitLogUpperBound = CommitLogPosition.NONE;
     protected final MinMaxLongTracker timestampTracker = new MinMaxLongTracker();
     protected final MinMaxIntTracker localDeletionTimeTracker = new MinMaxIntTracker(Cell.NO_DELETION_TIME, Cell.NO_DELETION_TIME);
     protected final MinMaxIntTracker ttlTracker = new MinMaxIntTracker(Cell.NO_TTL, Cell.NO_TTL);
@@ -126,7 +123,7 @@
     {
         this(comparator);
 
-        ReplayPosition min = null, max = null;
+        CommitLogPosition min = null, max = null;
         for (SSTableReader sstable : sstables)
         {
             if (min == null)
@@ -229,13 +226,13 @@
         ttlTracker.update(newTTL);
     }
 
-    public MetadataCollector commitLogLowerBound(ReplayPosition commitLogLowerBound)
+    public MetadataCollector commitLogLowerBound(CommitLogPosition commitLogLowerBound)
     {
         this.commitLogLowerBound = commitLogLowerBound;
         return this;
     }
 
-    public MetadataCollector commitLogUpperBound(ReplayPosition commitLogUpperBound)
+    public MetadataCollector commitLogUpperBound(CommitLogPosition commitLogUpperBound)
     {
         this.commitLogUpperBound = commitLogUpperBound;
         return this;
diff --git a/src/java/org/apache/cassandra/io/sstable/metadata/MetadataSerializer.java b/src/java/org/apache/cassandra/io/sstable/metadata/MetadataSerializer.java
index 635adcd..ae1787a 100644
--- a/src/java/org/apache/cassandra/io/sstable/metadata/MetadataSerializer.java
+++ b/src/java/org/apache/cassandra/io/sstable/metadata/MetadataSerializer.java
@@ -37,7 +37,7 @@
 import org.apache.cassandra.utils.FBUtilities;
 
 /**
- * Metadata serializer for SSTables version >= 'k'.
+ * Metadata serializer for SSTables {@code version >= 'k'}.
  *
  * <pre>
  * File format := | number of components (4 bytes) | toc | component1 | component2 | ... |
diff --git a/src/java/org/apache/cassandra/io/sstable/metadata/StatsMetadata.java b/src/java/org/apache/cassandra/io/sstable/metadata/StatsMetadata.java
index 07e35bb..e765235 100644
--- a/src/java/org/apache/cassandra/io/sstable/metadata/StatsMetadata.java
+++ b/src/java/org/apache/cassandra/io/sstable/metadata/StatsMetadata.java
@@ -26,7 +26,7 @@
 import org.apache.commons.lang3.builder.EqualsBuilder;
 import org.apache.commons.lang3.builder.HashCodeBuilder;
 import org.apache.cassandra.db.TypeSizes;
-import org.apache.cassandra.db.commitlog.ReplayPosition;
+import org.apache.cassandra.db.commitlog.CommitLogPosition;
 import org.apache.cassandra.io.util.DataInputPlus;
 import org.apache.cassandra.io.util.DataOutputPlus;
 import org.apache.cassandra.utils.ByteBufferUtil;
@@ -42,8 +42,8 @@
 
     public final EstimatedHistogram estimatedPartitionSize;
     public final EstimatedHistogram estimatedColumnCount;
-    public final ReplayPosition commitLogLowerBound;
-    public final ReplayPosition commitLogUpperBound;
+    public final CommitLogPosition commitLogLowerBound;
+    public final CommitLogPosition commitLogUpperBound;
     public final long minTimestamp;
     public final long maxTimestamp;
     public final int minLocalDeletionTime;
@@ -62,8 +62,8 @@
 
     public StatsMetadata(EstimatedHistogram estimatedPartitionSize,
                          EstimatedHistogram estimatedColumnCount,
-                         ReplayPosition commitLogLowerBound,
-                         ReplayPosition commitLogUpperBound,
+                         CommitLogPosition commitLogLowerBound,
+                         CommitLogPosition commitLogUpperBound,
                          long minTimestamp,
                          long maxTimestamp,
                          int minLocalDeletionTime,
@@ -239,7 +239,7 @@
             int size = 0;
             size += EstimatedHistogram.serializer.serializedSize(component.estimatedPartitionSize);
             size += EstimatedHistogram.serializer.serializedSize(component.estimatedColumnCount);
-            size += ReplayPosition.serializer.serializedSize(component.commitLogUpperBound);
+            size += CommitLogPosition.serializer.serializedSize(component.commitLogUpperBound);
             if (version.storeRows())
                 size += 8 + 8 + 4 + 4 + 4 + 4 + 8 + 8; // mix/max timestamp(long), min/maxLocalDeletionTime(int), min/max TTL, compressionRatio(double), repairedAt (long)
             else
@@ -258,7 +258,7 @@
             if (version.storeRows())
                 size += 8 + 8; // totalColumnsSet, totalRows
             if (version.hasCommitLogLowerBound())
-                size += ReplayPosition.serializer.serializedSize(component.commitLogLowerBound);
+                size += CommitLogPosition.serializer.serializedSize(component.commitLogLowerBound);
             return size;
         }
 
@@ -266,7 +266,7 @@
         {
             EstimatedHistogram.serializer.serialize(component.estimatedPartitionSize, out);
             EstimatedHistogram.serializer.serialize(component.estimatedColumnCount, out);
-            ReplayPosition.serializer.serialize(component.commitLogUpperBound, out);
+            CommitLogPosition.serializer.serialize(component.commitLogUpperBound, out);
             out.writeLong(component.minTimestamp);
             out.writeLong(component.maxTimestamp);
             if (version.storeRows())
@@ -296,15 +296,15 @@
             }
 
             if (version.hasCommitLogLowerBound())
-                ReplayPosition.serializer.serialize(component.commitLogLowerBound, out);
+                CommitLogPosition.serializer.serialize(component.commitLogLowerBound, out);
         }
 
         public StatsMetadata deserialize(Version version, DataInputPlus in) throws IOException
         {
             EstimatedHistogram partitionSizes = EstimatedHistogram.serializer.deserialize(in);
             EstimatedHistogram columnCounts = EstimatedHistogram.serializer.deserialize(in);
-            ReplayPosition commitLogLowerBound = ReplayPosition.NONE, commitLogUpperBound;
-            commitLogUpperBound = ReplayPosition.serializer.deserialize(in);
+            CommitLogPosition commitLogLowerBound = CommitLogPosition.NONE, commitLogUpperBound;
+            commitLogUpperBound = CommitLogPosition.serializer.deserialize(in);
             long minTimestamp = in.readLong();
             long maxTimestamp = in.readLong();
             // We use MAX_VALUE as that's the default value for "no deletion time"
@@ -337,7 +337,7 @@
             long totalRows = version.storeRows() ? in.readLong() : -1L;
 
             if (version.hasCommitLogLowerBound())
-                commitLogLowerBound = ReplayPosition.serializer.deserialize(in);
+                commitLogLowerBound = CommitLogPosition.serializer.deserialize(in);
 
             return new StatsMetadata(partitionSizes,
                                      columnCounts,
diff --git a/src/java/org/apache/cassandra/io/util/AbstractReaderFileProxy.java b/src/java/org/apache/cassandra/io/util/AbstractReaderFileProxy.java
new file mode 100644
index 0000000..5dc0d37
--- /dev/null
+++ b/src/java/org/apache/cassandra/io/util/AbstractReaderFileProxy.java
@@ -0,0 +1,61 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.io.util;
+
+public abstract class AbstractReaderFileProxy implements ReaderFileProxy
+{
+    protected final ChannelProxy channel;
+    protected final long fileLength;
+
+    public AbstractReaderFileProxy(ChannelProxy channel, long fileLength)
+    {
+        this.channel = channel;
+        this.fileLength = fileLength >= 0 ? fileLength : channel.size();
+    }
+
+    @Override
+    public ChannelProxy channel()
+    {
+        return channel;
+    }
+
+    @Override
+    public long fileLength()
+    {
+        return fileLength;
+    }
+
+    @Override
+    public String toString()
+    {
+        return getClass().getSimpleName() + "(filePath='" + channel + "')";
+    }
+
+    @Override
+    public void close()
+    {
+        // nothing in base class
+    }
+
+    @Override
+    public double getCrcCheckChance()
+    {
+        return 0; // Only valid for compressed files.
+    }
+}
\ No newline at end of file
diff --git a/src/java/org/apache/cassandra/io/util/BufferManagingRebufferer.java b/src/java/org/apache/cassandra/io/util/BufferManagingRebufferer.java
new file mode 100644
index 0000000..95af31f
--- /dev/null
+++ b/src/java/org/apache/cassandra/io/util/BufferManagingRebufferer.java
@@ -0,0 +1,147 @@
+/*
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+package org.apache.cassandra.io.util;
+
+import java.nio.ByteBuffer;
+
+import org.apache.cassandra.utils.memory.BufferPool;
+
+/**
+ * Buffer manager used for reading from a ChunkReader when cache is not in use. Instances of this class are
+ * reader-specific and thus do not need to be thread-safe since the reader itself isn't.
+ *
+ * The instances reuse themselves as the BufferHolder to avoid having to return a new object for each rebuffer call.
+ */
+public abstract class BufferManagingRebufferer implements Rebufferer, Rebufferer.BufferHolder
+{
+    protected final ChunkReader source;
+    protected final ByteBuffer buffer;
+    protected long offset = 0;
+
+    public static BufferManagingRebufferer on(ChunkReader wrapped)
+    {
+        return wrapped.alignmentRequired()
+             ? new Aligned(wrapped)
+             : new Unaligned(wrapped);
+    }
+
+    abstract long alignedPosition(long position);
+
+    public BufferManagingRebufferer(ChunkReader wrapped)
+    {
+        this.source = wrapped;
+        buffer = RandomAccessReader.allocateBuffer(wrapped.chunkSize(), wrapped.preferredBufferType());
+        buffer.limit(0);
+    }
+
+    @Override
+    public void closeReader()
+    {
+        BufferPool.put(buffer);
+        offset = -1;
+    }
+
+    @Override
+    public void close()
+    {
+        assert offset == -1;    // reader must be closed at this point.
+        source.close();
+    }
+
+    @Override
+    public ChannelProxy channel()
+    {
+        return source.channel();
+    }
+
+    @Override
+    public long fileLength()
+    {
+        return source.fileLength();
+    }
+
+    @Override
+    public BufferHolder rebuffer(long position)
+    {
+        offset = alignedPosition(position);
+        source.readChunk(offset, buffer);
+        return this;
+    }
+
+    @Override
+    public double getCrcCheckChance()
+    {
+        return source.getCrcCheckChance();
+    }
+
+    @Override
+    public String toString()
+    {
+        return "BufferManagingRebufferer." + getClass().getSimpleName() + ":" + source.toString();
+    }
+
+    // BufferHolder methods
+
+    public ByteBuffer buffer()
+    {
+        return buffer;
+    }
+
+    public long offset()
+    {
+        return offset;
+    }
+
+    @Override
+    public void release()
+    {
+        // nothing to do, we don't delete buffers before we're closed.
+    }
+
+    public static class Unaligned extends BufferManagingRebufferer
+    {
+        public Unaligned(ChunkReader wrapped)
+        {
+            super(wrapped);
+        }
+
+        @Override
+        long alignedPosition(long position)
+        {
+            return position;
+        }
+    }
+
+    public static class Aligned extends BufferManagingRebufferer
+    {
+        public Aligned(ChunkReader wrapped)
+        {
+            super(wrapped);
+            assert Integer.bitCount(wrapped.chunkSize()) == 1;
+        }
+
+        @Override
+        long alignedPosition(long position)
+        {
+            return position & -buffer.capacity();
+        }
+    }
+}
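The Aligned variant above relies on chunk sizes being powers of two so that position & -buffer.capacity() rounds a position down to the start of the containing chunk. A small self-contained illustration of that masking trick, using made-up positions and a 64 KiB chunk size:

public class AlignmentDemo
{
    static long alignedPosition(long position, int chunkSize)
    {
        // the mask only works when exactly one bit is set, i.e. chunkSize is a power of two
        assert Integer.bitCount(chunkSize) == 1 : "alignment requires a power-of-two chunk size";
        return position & -chunkSize;
    }

    public static void main(String[] args)
    {
        System.out.println(alignedPosition(70_000, 65_536));   // 65536
        System.out.println(alignedPosition(65_536, 65_536));   // 65536
        System.out.println(alignedPosition(65_535, 65_536));   // 0
    }
}
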
diff --git a/src/java/org/apache/cassandra/io/util/BufferedSegmentedFile.java b/src/java/org/apache/cassandra/io/util/BufferedSegmentedFile.java
index 090c5bd..a46ec14 100644
--- a/src/java/org/apache/cassandra/io/util/BufferedSegmentedFile.java
+++ b/src/java/org/apache/cassandra/io/util/BufferedSegmentedFile.java
@@ -17,11 +17,19 @@
  */
 package org.apache.cassandra.io.util;
 
+import org.apache.cassandra.cache.ChunkCache;
+import org.apache.cassandra.io.compress.BufferType;
+
 public class BufferedSegmentedFile extends SegmentedFile
 {
     public BufferedSegmentedFile(ChannelProxy channel, int bufferSize, long length)
     {
-        super(new Cleanup(channel), channel, bufferSize, length);
+        this(channel, createRebufferer(channel, length, bufferSize), length);
+    }
+
+    private BufferedSegmentedFile(ChannelProxy channel, RebuffererFactory rebufferer, long length)
+    {
+        super(new Cleanup(channel, rebufferer), channel, rebufferer, length);
     }
 
     private BufferedSegmentedFile(BufferedSegmentedFile copy)
@@ -29,6 +37,11 @@
         super(copy);
     }
 
+    private static RebuffererFactory createRebufferer(ChannelProxy channel, long length, int bufferSize)
+    {
+        return ChunkCache.maybeWrap(new SimpleChunkReader(channel, length, BufferType.OFF_HEAP, bufferSize));
+    }
+
     public static class Builder extends SegmentedFile.Builder
     {
         public SegmentedFile complete(ChannelProxy channel, int bufferSize, long overrideLength)
diff --git a/src/java/org/apache/cassandra/io/util/ChannelProxy.java b/src/java/org/apache/cassandra/io/util/ChannelProxy.java
index f866160..361b0d3 100644
--- a/src/java/org/apache/cassandra/io/util/ChannelProxy.java
+++ b/src/java/org/apache/cassandra/io/util/ChannelProxy.java
@@ -125,6 +125,7 @@
     {
         try
         {
+            // FIXME: consider wrapping in a while loop
             return channel.read(buffer, position);
         }
         catch (IOException e)
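The FIXME added above notes that FileChannel.read() may return fewer bytes than requested. One possible shape of the loop it alludes to, sketched against the plain NIO API rather than ChannelProxy:

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

final class ReadFully
{
    // Keeps reading until the buffer is full or EOF is hit; returns the number of bytes read.
    static int readFully(FileChannel channel, ByteBuffer buffer, long position) throws IOException
    {
        int total = 0;
        while (buffer.hasRemaining())
        {
            int n = channel.read(buffer, position + total);
            if (n < 0)
                break;              // EOF before the buffer was filled
            total += n;
        }
        return total;
    }
}
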
diff --git a/src/java/org/apache/cassandra/io/util/ChecksumWriter.java b/src/java/org/apache/cassandra/io/util/ChecksumWriter.java
new file mode 100644
index 0000000..dc5eaea
--- /dev/null
+++ b/src/java/org/apache/cassandra/io/util/ChecksumWriter.java
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.io.util;
+
+import java.io.*;
+import java.nio.ByteBuffer;
+import java.nio.file.Files;
+import java.util.zip.CRC32;
+
+import javax.annotation.Nonnull;
+
+import com.google.common.base.Charsets;
+
+import org.apache.cassandra.io.FSWriteError;
+
+public class ChecksumWriter
+{
+    private final CRC32 incrementalChecksum = new CRC32();
+    private final DataOutput incrementalOut;
+    private final CRC32 fullChecksum = new CRC32();
+
+    public ChecksumWriter(DataOutput incrementalOut)
+    {
+        this.incrementalOut = incrementalOut;
+    }
+
+    public void writeChunkSize(int length)
+    {
+        try
+        {
+            incrementalOut.writeInt(length);
+        }
+        catch (IOException e)
+        {
+            throw new IOError(e);
+        }
+    }
+
+    // checksumIncrementalResult indicates whether the checksum we compute for this buffer should itself be
+    // included in the full checksum, i.e. whether the partial checksum is serialized along with the
+    // data it checksums (in which case the file checksum as calculated by external tools would mismatch if
+    // we did not include it) or written independently of it.
+
+    // CompressedSequentialWriters serialize the partial checksums inline with the compressed data chunks they
+    // corroborate, whereas ChecksummedSequentialWriters serialize them to a different file.
+    public void appendDirect(ByteBuffer bb, boolean checksumIncrementalResult)
+    {
+        try
+        {
+            ByteBuffer toAppend = bb.duplicate();
+            toAppend.mark();
+            incrementalChecksum.update(toAppend);
+            toAppend.reset();
+
+            int incrementalChecksumValue = (int) incrementalChecksum.getValue();
+            incrementalOut.writeInt(incrementalChecksumValue);
+
+            fullChecksum.update(toAppend);
+            if (checksumIncrementalResult)
+            {
+                ByteBuffer byteBuffer = ByteBuffer.allocate(4);
+                byteBuffer.putInt(incrementalChecksumValue);
+                assert byteBuffer.arrayOffset() == 0;
+                fullChecksum.update(byteBuffer.array(), 0, byteBuffer.array().length);
+            }
+            incrementalChecksum.reset();
+
+        }
+        catch (IOException e)
+        {
+            throw new IOError(e);
+        }
+    }
+
+    public void writeFullChecksum(@Nonnull File digestFile)
+    {
+        try (BufferedWriter out = Files.newBufferedWriter(digestFile.toPath(), Charsets.UTF_8))
+        {
+            out.write(String.valueOf(fullChecksum.getValue()));
+        }
+        catch (IOException e)
+        {
+            throw new FSWriteError(e, digestFile);
+        }
+    }
+}
+
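A hedged usage sketch for the relocated ChecksumWriter: per-chunk CRCs go to the DataOutput passed to the constructor, while writeFullChecksum() records the whole-file CRC in a separate digest file. The file names below are purely illustrative.

import java.io.DataOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import org.apache.cassandra.io.util.ChecksumWriter;

public class ChecksumWriterExample
{
    public static void main(String[] args) throws IOException
    {
        ByteBuffer chunk = ByteBuffer.wrap("some chunk of data".getBytes(StandardCharsets.UTF_8));
        try (DataOutputStream crcOut = new DataOutputStream(new FileOutputStream("Data.crc")))
        {
            ChecksumWriter writer = new ChecksumWriter(crcOut);
            writer.writeChunkSize(chunk.remaining());          // header: chunk size used for later validation
            writer.appendDirect(chunk, false);                 // false: per-chunk CRC kept out of the full checksum,
                                                               // as when the CRCs live in a separate file
            writer.writeFullChecksum(new File("Data.digest")); // whole-file CRC written as a decimal string
        }
    }
}
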
diff --git a/src/java/org/apache/cassandra/io/util/ChecksummedRandomAccessReader.java b/src/java/org/apache/cassandra/io/util/ChecksummedRandomAccessReader.java
index 30f1e0c..25ef615 100644
--- a/src/java/org/apache/cassandra/io/util/ChecksummedRandomAccessReader.java
+++ b/src/java/org/apache/cassandra/io/util/ChecksummedRandomAccessReader.java
@@ -19,13 +19,13 @@
 
 import java.io.File;
 import java.io.IOException;
-import java.util.zip.CRC32;
 
 import org.apache.cassandra.io.compress.BufferType;
+import org.apache.cassandra.io.util.DataIntegrityMetadata.ChecksumValidator;
 import org.apache.cassandra.utils.ByteBufferUtil;
-import org.apache.cassandra.utils.Throwables;
+import org.apache.cassandra.utils.ChecksumType;
 
-public class ChecksummedRandomAccessReader extends RandomAccessReader
+public class ChecksummedRandomAccessReader
 {
     @SuppressWarnings("serial")
     public static class CorruptFileException extends RuntimeException
@@ -39,67 +39,56 @@
         }
     }
 
-    private final DataIntegrityMetadata.ChecksumValidator validator;
-
-    private ChecksummedRandomAccessReader(Builder builder)
+    static class ChecksummedRebufferer extends BufferManagingRebufferer
     {
-        super(builder);
-        this.validator = builder.validator;
-    }
+        private final DataIntegrityMetadata.ChecksumValidator validator;
 
-    @SuppressWarnings("resource")
-    @Override
-    protected void reBufferStandard()
-    {
-        long desiredPosition = current();
-        // align with buffer size, as checksums were computed in chunks of buffer size each.
-        bufferOffset = (desiredPosition / buffer.capacity()) * buffer.capacity();
-
-        buffer.clear();
-
-        long position = bufferOffset;
-        while (buffer.hasRemaining())
+        public ChecksummedRebufferer(ChannelProxy channel, ChecksumValidator validator)
         {
-            int n = channel.read(buffer, position);
-            if (n < 0)
-                break;
-            position += n;
+            super(new SimpleChunkReader(channel, channel.size(), BufferType.ON_HEAP, validator.chunkSize));
+            this.validator = validator;
         }
 
-        buffer.flip();
+        @Override
+        public BufferHolder rebuffer(long desiredPosition)
+        {
+            if (desiredPosition != offset + buffer.position())
+                validator.seek(desiredPosition);
 
-        try
-        {
-            validator.validate(ByteBufferUtil.getArray(buffer), 0, buffer.remaining());
-        }
-        catch (IOException e)
-        {
-            throw new CorruptFileException(e, channel.filePath());
+            // align with buffer size, as checksums were computed in chunks of buffer size each.
+            offset = alignedPosition(desiredPosition);
+            source.readChunk(offset, buffer);
+
+            try
+            {
+                validator.validate(ByteBufferUtil.getArray(buffer), 0, buffer.remaining());
+            }
+            catch (IOException e)
+            {
+                throw new CorruptFileException(e, channel().filePath());
+            }
+
+            return this;
         }
 
-        buffer.position((int) (desiredPosition - bufferOffset));
-    }
+        @Override
+        public void close()
+        {
+            try
+            {
+                source.close();
+            }
+            finally
+            {
+                validator.close();
+            }
+        }
 
-    @Override
-    protected void reBufferMmap()
-    {
-        throw new AssertionError("Unsupported operation");
-    }
-
-    @Override
-    public void seek(long newPosition)
-    {
-        validator.seek(newPosition);
-        super.seek(newPosition);
-    }
-
-    @Override
-    public void close()
-    {
-        Throwables.perform(channel.filePath(), Throwables.FileOpType.READ,
-                           super::close,
-                           validator::close,
-                           channel::close);
+        @Override
+        long alignedPosition(long desiredPosition)
+        {
+            return (desiredPosition / buffer.capacity()) * buffer.capacity();
+        }
     }
 
     public static final class Builder extends RandomAccessReader.Builder
@@ -110,18 +99,22 @@
         public Builder(File file, File crcFile) throws IOException
         {
             super(new ChannelProxy(file));
-            this.validator = new DataIntegrityMetadata.ChecksumValidator(new CRC32(),
+            this.validator = new DataIntegrityMetadata.ChecksumValidator(ChecksumType.CRC32,
                                                                          RandomAccessReader.open(crcFile),
                                                                          file.getPath());
+        }
 
-            super.bufferSize(validator.chunkSize)
-                 .bufferType(BufferType.ON_HEAP);
+        @Override
+        protected Rebufferer createRebufferer()
+        {
+            return new ChecksummedRebufferer(channel, validator);
         }
 
         @Override
         public RandomAccessReader build()
         {
-            return new ChecksummedRandomAccessReader(this);
+            // Always own and close the channel.
+            return buildWithChannel();
         }
     }
 }
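A short sketch of how the rewritten Builder might be driven: the reader now validates each chunk through the ChecksummedRebufferer as it rebuffers, and a mismatch surfaces as CorruptFileException. Paths are illustrative only.

import java.io.File;
import java.io.IOException;

import org.apache.cassandra.io.util.ChecksummedRandomAccessReader;
import org.apache.cassandra.io.util.RandomAccessReader;

public class ChecksummedReadExample
{
    public static void main(String[] args) throws IOException
    {
        File data = new File("Data.db");
        File crc = new File("Data.crc");
        try (RandomAccessReader reader = new ChecksummedRandomAccessReader.Builder(data, crc).build())
        {
            byte[] firstBytes = new byte[16];
            reader.readFully(firstBytes);   // any chunk/CRC mismatch surfaces as CorruptFileException
        }
    }
}
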
diff --git a/src/java/org/apache/cassandra/io/util/ChecksummedSequentialWriter.java b/src/java/org/apache/cassandra/io/util/ChecksummedSequentialWriter.java
index fd88151..f89e7cc 100644
--- a/src/java/org/apache/cassandra/io/util/ChecksummedSequentialWriter.java
+++ b/src/java/org/apache/cassandra/io/util/ChecksummedSequentialWriter.java
@@ -19,20 +19,25 @@
 
 import java.io.File;
 import java.nio.ByteBuffer;
-
-import org.apache.cassandra.io.compress.BufferType;
+import java.util.Optional;
 
 public class ChecksummedSequentialWriter extends SequentialWriter
 {
-    private final SequentialWriter crcWriter;
-    private final DataIntegrityMetadata.ChecksumWriter crcMetadata;
+    private static final SequentialWriterOption CRC_WRITER_OPTION = SequentialWriterOption.newBuilder()
+                                                                                          .bufferSize(8 * 1024)
+                                                                                          .build();
 
-    public ChecksummedSequentialWriter(File file, int bufferSize, File crcPath)
+    private final SequentialWriter crcWriter;
+    private final ChecksumWriter crcMetadata;
+    private final Optional<File> digestFile;
+
+    public ChecksummedSequentialWriter(File file, File crcPath, File digestFile, SequentialWriterOption option)
     {
-        super(file, bufferSize, BufferType.ON_HEAP);
-        crcWriter = new SequentialWriter(crcPath, 8 * 1024, BufferType.ON_HEAP);
-        crcMetadata = new DataIntegrityMetadata.ChecksumWriter(crcWriter);
+        super(file, option);
+        crcWriter = new SequentialWriter(crcPath, CRC_WRITER_OPTION);
+        crcMetadata = new ChecksumWriter(crcWriter);
         crcMetadata.writeChunkSize(buffer.capacity());
+        this.digestFile = Optional.ofNullable(digestFile);
     }
 
     @Override
@@ -63,9 +68,8 @@
         protected void doPrepare()
         {
             syncInternal();
-            if (descriptor != null)
-                crcMetadata.writeFullChecksum(descriptor);
-            crcWriter.setDescriptor(descriptor).prepareToCommit();
+            digestFile.ifPresent(crcMetadata::writeFullChecksum);
+            crcWriter.prepareToCommit();
         }
     }
 
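A hedged sketch of the new constructor shape: the digest file is now passed explicitly (or null to skip the full checksum) together with a SequentialWriterOption instead of a raw buffer size. File names and the buffer size are illustrative.

import java.io.File;
import java.io.IOException;

import org.apache.cassandra.io.util.ChecksummedSequentialWriter;
import org.apache.cassandra.io.util.SequentialWriter;
import org.apache.cassandra.io.util.SequentialWriterOption;

public class ChecksummedWriteExample
{
    public static void main(String[] args) throws IOException
    {
        SequentialWriterOption option = SequentialWriterOption.newBuilder()
                                                              .bufferSize(64 * 1024)
                                                              .build();
        SequentialWriter writer = new ChecksummedSequentialWriter(new File("Data.db"),
                                                                  new File("Data.crc"),
                                                                  new File("Data.digest"),
                                                                  option);
        writer.write(new byte[]{ 1, 2, 3 });   // per-chunk CRCs accumulate as buffers are flushed
        writer.finish();                       // commits the data file, the .crc stream and the digest file
    }
}
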
diff --git a/src/java/org/apache/cassandra/io/util/ChunkReader.java b/src/java/org/apache/cassandra/io/util/ChunkReader.java
new file mode 100644
index 0000000..a04299a
--- /dev/null
+++ b/src/java/org/apache/cassandra/io/util/ChunkReader.java
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.io.util;
+
+import java.nio.ByteBuffer;
+
+import org.apache.cassandra.io.compress.BufferType;
+
+/**
+ * RandomFileReader component that reads data from a file into a provided buffer and may have requirements over the
+ * size and alignment of reads.
+ * A caching or buffer-managing rebufferer will reference one of these to do the actual reading.
+ * Note: Implementations of this interface must be thread-safe!
+ */
+public interface ChunkReader extends RebuffererFactory
+{
+    /**
+     * Read the chunk at the given position, attempting to fill the capacity of the given buffer.
+     * The filled buffer must be positioned at 0, with limit set at the size of the available data.
+     * The source may have requirements for the positioning and/or size of the buffer (e.g. chunk-aligned and
+     * chunk-sized). These must be satisfied by the caller. 
+     */
+    void readChunk(long position, ByteBuffer buffer);
+
+    /**
+     * Buffer size required for this rebufferer. Must be power of 2 if alignment is required.
+     */
+    int chunkSize();
+
+    /**
+     * If true, positions passed to this rebufferer must be aligned to chunkSize.
+     */
+    boolean alignmentRequired();
+
+    /**
+     * Specifies type of buffer the caller should attempt to give.
+     * This is not guaranteed to be fulfilled.
+     */
+    BufferType preferredBufferType();
+}
\ No newline at end of file
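A standalone illustration of the readChunk() contract described above: fill as much of the destination buffer as is available from the requested position, then leave it positioned at 0 with the limit at the size of the data. It is backed by a byte[] here purely to keep the sketch self-contained.

import java.nio.ByteBuffer;

final class InMemoryChunkSource
{
    private final byte[] data;

    InMemoryChunkSource(byte[] data)
    {
        this.data = data;
    }

    // Fills the buffer from 'position' and flips it: position 0, limit == bytes actually available.
    void readChunk(long position, ByteBuffer buffer)
    {
        buffer.clear();
        int length = (int) Math.max(Math.min(buffer.capacity(), data.length - position), 0);
        buffer.put(data, (int) position, length);
        buffer.flip();
    }

    public static void main(String[] args)
    {
        InMemoryChunkSource source = new InMemoryChunkSource(new byte[100]);
        ByteBuffer chunk = ByteBuffer.allocate(64);
        source.readChunk(64, chunk);
        System.out.println(chunk.remaining());   // 36: only part of the last chunk is available
    }
}
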
diff --git a/src/java/org/apache/cassandra/io/util/CompressedSegmentedFile.java b/src/java/org/apache/cassandra/io/util/CompressedSegmentedFile.java
index 16f791a..7365d40 100644
--- a/src/java/org/apache/cassandra/io/util/CompressedSegmentedFile.java
+++ b/src/java/org/apache/cassandra/io/util/CompressedSegmentedFile.java
@@ -17,49 +17,61 @@
  */
 package org.apache.cassandra.io.util;
 
-import com.google.common.util.concurrent.RateLimiter;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.concurrent.ThreadLocalRandom;
 
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.primitives.Ints;
 
+import org.apache.cassandra.cache.ChunkCache;
 import org.apache.cassandra.config.Config;
+import org.apache.cassandra.config.Config.DiskAccessMode;
+import org.apache.cassandra.io.compress.*;
+import org.apache.cassandra.io.sstable.CorruptSSTableException;
 import org.apache.cassandra.config.DatabaseDescriptor;
-import org.apache.cassandra.io.compress.CompressedRandomAccessReader;
-import org.apache.cassandra.io.compress.CompressedSequentialWriter;
-import org.apache.cassandra.io.compress.CompressionMetadata;
-import org.apache.cassandra.utils.JVMStabilityInspector;
 import org.apache.cassandra.utils.concurrent.Ref;
 
 public class CompressedSegmentedFile extends SegmentedFile implements ICompressedFile
 {
-    private static final Logger logger = LoggerFactory.getLogger(CompressedSegmentedFile.class);
-    private static final boolean useMmap = DatabaseDescriptor.getDiskAccessMode() == Config.DiskAccessMode.mmap;
-
     public final CompressionMetadata metadata;
-    private final MmappedRegions regions;
 
-    public CompressedSegmentedFile(ChannelProxy channel, int bufferSize, CompressionMetadata metadata)
+    public CompressedSegmentedFile(ChannelProxy channel, CompressionMetadata metadata, Config.DiskAccessMode mode)
     {
         this(channel,
-             bufferSize,
              metadata,
-             useMmap
+             mode == DiskAccessMode.mmap
              ? MmappedRegions.map(channel, metadata)
              : null);
     }
 
-    public CompressedSegmentedFile(ChannelProxy channel, int bufferSize, CompressionMetadata metadata, MmappedRegions regions)
+    public CompressedSegmentedFile(ChannelProxy channel, CompressionMetadata metadata, MmappedRegions regions)
     {
-        super(new Cleanup(channel, metadata, regions), channel, bufferSize, metadata.dataLength, metadata.compressedFileLength);
+        this(channel, metadata, regions, createRebufferer(channel, metadata, regions));
+    }
+
+    private static RebuffererFactory createRebufferer(ChannelProxy channel, CompressionMetadata metadata, MmappedRegions regions)
+    {
+        return ChunkCache.maybeWrap(chunkReader(channel, metadata, regions));
+    }
+
+    public static ChunkReader chunkReader(ChannelProxy channel, CompressionMetadata metadata, MmappedRegions regions)
+    {
+        return regions != null
+               ? new Mmap(channel, metadata, regions)
+               : new Standard(channel, metadata);
+    }
+
+    public CompressedSegmentedFile(ChannelProxy channel, CompressionMetadata metadata, MmappedRegions regions, RebuffererFactory rebufferer)
+    {
+        super(new Cleanup(channel, metadata, regions, rebufferer), channel, rebufferer, metadata.compressedFileLength);
         this.metadata = metadata;
-        this.regions = regions;
     }
 
     private CompressedSegmentedFile(CompressedSegmentedFile copy)
     {
         super(copy);
         this.metadata = copy.metadata;
-        this.regions = copy.regions;
     }
 
     public ChannelProxy channel()
@@ -67,33 +79,21 @@
         return channel;
     }
 
-    public MmappedRegions regions()
-    {
-        return regions;
-    }
-
     private static final class Cleanup extends SegmentedFile.Cleanup
     {
         final CompressionMetadata metadata;
-        private final MmappedRegions regions;
 
-        protected Cleanup(ChannelProxy channel, CompressionMetadata metadata, MmappedRegions regions)
+        protected Cleanup(ChannelProxy channel, CompressionMetadata metadata, MmappedRegions regions, ReaderFileProxy rebufferer)
         {
-            super(channel);
+            super(channel, rebufferer);
             this.metadata = metadata;
-            this.regions = regions;
         }
         public void tidy()
         {
-            Throwable err = regions == null ? null : regions.close(null);
-            if (err != null)
+            if (ChunkCache.instance != null)
             {
-                JVMStabilityInspector.inspectThrowable(err);
-
-                // This is not supposed to happen
-                logger.error("Error while closing mmapped regions", err);
+                ChunkCache.instance.invalidateFile(name());
             }
-
             metadata.close();
 
             super.tidy();
@@ -114,9 +114,12 @@
     public static class Builder extends SegmentedFile.Builder
     {
         final CompressedSequentialWriter writer;
+        final Config.DiskAccessMode mode;
+
         public Builder(CompressedSequentialWriter writer)
         {
             this.writer = writer;
+            this.mode = DatabaseDescriptor.getDiskAccessMode();
         }
 
         protected CompressionMetadata metadata(String path, long overrideLength)
@@ -129,7 +132,7 @@
 
         public SegmentedFile complete(ChannelProxy channel, int bufferSize, long overrideLength)
         {
-            return new CompressedSegmentedFile(channel, bufferSize, metadata(channel.filePath(), overrideLength));
+            return new CompressedSegmentedFile(channel, metadata(channel.filePath(), overrideLength), mode);
         }
     }
 
@@ -140,18 +143,216 @@
         super.dropPageCache(metadata.chunkFor(before).offset);
     }
 
-    public RandomAccessReader createReader()
-    {
-        return new CompressedRandomAccessReader.Builder(this).build();
-    }
-
-    public RandomAccessReader createReader(RateLimiter limiter)
-    {
-        return new CompressedRandomAccessReader.Builder(this).limiter(limiter).build();
-    }
-
     public CompressionMetadata getMetadata()
     {
         return metadata;
     }
+
+    public long dataLength()
+    {
+        return metadata.dataLength;
+    }
+
+    @VisibleForTesting
+    public abstract static class CompressedChunkReader extends AbstractReaderFileProxy implements ChunkReader
+    {
+        final CompressionMetadata metadata;
+
+        public CompressedChunkReader(ChannelProxy channel, CompressionMetadata metadata)
+        {
+            super(channel, metadata.dataLength);
+            this.metadata = metadata;
+            assert Integer.bitCount(metadata.chunkLength()) == 1; //must be a power of two
+        }
+
+        @VisibleForTesting
+        public double getCrcCheckChance()
+        {
+            return metadata.parameters.getCrcCheckChance();
+        }
+
+        @Override
+        public String toString()
+        {
+            return String.format("CompressedChunkReader.%s(%s - %s, chunk length %d, data length %d)",
+                                 getClass().getSimpleName(),
+                                 channel.filePath(),
+                                 metadata.compressor().getClass().getSimpleName(),
+                                 metadata.chunkLength(),
+                                 metadata.dataLength);
+        }
+
+        @Override
+        public int chunkSize()
+        {
+            return metadata.chunkLength();
+        }
+
+        @Override
+        public boolean alignmentRequired()
+        {
+            return true;
+        }
+
+        @Override
+        public BufferType preferredBufferType()
+        {
+            return metadata.compressor().preferredBufferType();
+        }
+
+        @Override
+        public Rebufferer instantiateRebufferer()
+        {
+            return BufferManagingRebufferer.on(this);
+        }
+    }
+
+    static class Standard extends CompressedChunkReader
+    {
+        // we read the raw compressed bytes into this buffer, then uncompress them into the provided one.
+        private final ThreadLocal<ByteBuffer> compressedHolder;
+
+        public Standard(ChannelProxy channel, CompressionMetadata metadata)
+        {
+            super(channel, metadata);
+            compressedHolder = ThreadLocal.withInitial(this::allocateBuffer);
+        }
+
+        public ByteBuffer allocateBuffer()
+        {
+            return allocateBuffer(metadata.compressor().initialCompressedBufferLength(metadata.chunkLength()));
+        }
+
+        public ByteBuffer allocateBuffer(int size)
+        {
+            return metadata.compressor().preferredBufferType().allocate(size);
+        }
+
+        @Override
+        public void readChunk(long position, ByteBuffer uncompressed)
+        {
+            try
+            {
+                // accesses must always be aligned
+                assert (position & -uncompressed.capacity()) == position;
+                assert position <= fileLength;
+
+                CompressionMetadata.Chunk chunk = metadata.chunkFor(position);
+                ByteBuffer compressed = compressedHolder.get();
+
+                if (compressed.capacity() < chunk.length)
+                {
+                    compressed = allocateBuffer(chunk.length);
+                    compressedHolder.set(compressed);
+                }
+                else
+                {
+                    compressed.clear();
+                }
+
+                compressed.limit(chunk.length);
+                if (channel.read(compressed, chunk.offset) != chunk.length)
+                    throw new CorruptBlockException(channel.filePath(), chunk);
+
+                compressed.flip();
+                uncompressed.clear();
+
+                try
+                {
+                    metadata.compressor().uncompress(compressed, uncompressed);
+                }
+                catch (IOException e)
+                {
+                    throw new CorruptBlockException(channel.filePath(), chunk);
+                }
+                finally
+                {
+                    uncompressed.flip();
+                }
+
+                if (getCrcCheckChance() > ThreadLocalRandom.current().nextDouble())
+                {
+                    compressed.rewind();
+                    int checksum = (int) metadata.checksumType.of(compressed);
+
+                    compressed.clear().limit(Integer.BYTES);
+                    if (channel.read(compressed, chunk.offset + chunk.length) != Integer.BYTES
+                        || compressed.getInt(0) != checksum)
+                        throw new CorruptBlockException(channel.filePath(), chunk);
+                }
+            }
+            catch (CorruptBlockException e)
+            {
+                throw new CorruptSSTableException(e, channel.filePath());
+            }
+        }
+    }
+
+    static class Mmap extends CompressedChunkReader
+    {
+        protected final MmappedRegions regions;
+
+        public Mmap(ChannelProxy channel, CompressionMetadata metadata, MmappedRegions regions)
+        {
+            super(channel, metadata);
+            this.regions = regions;
+        }
+
+        @Override
+        public void readChunk(long position, ByteBuffer uncompressed)
+        {
+            try
+            {
+                // accesses must always be aligned
+                assert (position & -uncompressed.capacity()) == position;
+                assert position <= fileLength;
+
+                CompressionMetadata.Chunk chunk = metadata.chunkFor(position);
+
+                MmappedRegions.Region region = regions.floor(chunk.offset);
+                long segmentOffset = region.offset();
+                int chunkOffset = Ints.checkedCast(chunk.offset - segmentOffset);
+                ByteBuffer compressedChunk = region.buffer();
+
+                compressedChunk.position(chunkOffset).limit(chunkOffset + chunk.length);
+
+                uncompressed.clear();
+
+                try
+                {
+                    metadata.compressor().uncompress(compressedChunk, uncompressed);
+                }
+                catch (IOException e)
+                {
+                    throw new CorruptBlockException(channel.filePath(), chunk);
+                }
+                finally
+                {
+                    uncompressed.flip();
+                }
+
+                if (getCrcCheckChance() > ThreadLocalRandom.current().nextDouble())
+                {
+                    compressedChunk.position(chunkOffset).limit(chunkOffset + chunk.length);
+
+                    int checksum = (int) metadata.checksumType.of(compressedChunk);
+
+                    compressedChunk.limit(compressedChunk.capacity());
+                    if (compressedChunk.getInt() != checksum)
+                        throw new CorruptBlockException(channel.filePath(), chunk);
+                }
+            }
+            catch (CorruptBlockException e)
+            {
+                throw new CorruptSSTableException(e, channel.filePath());
+            }
+
+        }
+
+        public void close()
+        {
+            regions.closeQuietly();
+            super.close();
+        }
+    }
 }
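Both chunk readers above gate checksum verification on crc_check_chance, re-verifying only a random fraction of reads to trade a little safety for less CPU per read. A self-contained sketch of that gating, with CRC32 standing in for the table's configured ChecksumType:

import java.nio.ByteBuffer;
import java.util.concurrent.ThreadLocalRandom;
import java.util.zip.CRC32;

final class ProbabilisticCrcCheck
{
    static void maybeVerify(ByteBuffer compressedChunk, int storedChecksum, double crcCheckChance)
    {
        // only a crcCheckChance fraction of reads pay for the re-hash
        if (crcCheckChance > ThreadLocalRandom.current().nextDouble())
        {
            CRC32 crc = new CRC32();
            crc.update(compressedChunk.duplicate());   // duplicate() so the caller's position is untouched
            if ((int) crc.getValue() != storedChecksum)
                throw new IllegalStateException("corrupt chunk");
        }
    }
}
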
diff --git a/src/java/org/apache/cassandra/io/util/DataIntegrityMetadata.java b/src/java/org/apache/cassandra/io/util/DataIntegrityMetadata.java
index 0a89d74..0eecef3 100644
--- a/src/java/org/apache/cassandra/io/util/DataIntegrityMetadata.java
+++ b/src/java/org/apache/cassandra/io/util/DataIntegrityMetadata.java
@@ -17,23 +17,15 @@
  */
 package org.apache.cassandra.io.util;
 
-import java.io.BufferedWriter;
 import java.io.Closeable;
-import java.io.DataOutput;
 import java.io.File;
-import java.io.IOError;
 import java.io.IOException;
-import java.nio.ByteBuffer;
-import java.nio.file.Files;
-import java.util.zip.CRC32;
 import java.util.zip.CheckedInputStream;
 import java.util.zip.Checksum;
 
-import com.google.common.base.Charsets;
-
-import org.apache.cassandra.io.FSWriteError;
 import org.apache.cassandra.io.sstable.Component;
 import org.apache.cassandra.io.sstable.Descriptor;
+import org.apache.cassandra.utils.ChecksumType;
 import org.apache.cassandra.utils.Throwables;
 
 public class DataIntegrityMetadata
@@ -45,21 +37,21 @@
 
     public static class ChecksumValidator implements Closeable
     {
-        private final Checksum checksum;
+        private final ChecksumType checksumType;
         private final RandomAccessReader reader;
         public final int chunkSize;
         private final String dataFilename;
 
         public ChecksumValidator(Descriptor descriptor) throws IOException
         {
-            this(descriptor.version.uncompressedChecksumType().newInstance(),
+            this(descriptor.version.uncompressedChecksumType(),
                  RandomAccessReader.open(new File(descriptor.filenameFor(Component.CRC))),
                  descriptor.filenameFor(Component.DATA));
         }
 
-        public ChecksumValidator(Checksum checksum, RandomAccessReader reader, String dataFilename) throws IOException
+        public ChecksumValidator(ChecksumType checksumType, RandomAccessReader reader, String dataFilename) throws IOException
         {
-            this.checksum = checksum;
+            this.checksumType = checksumType;
             this.reader = reader;
             this.dataFilename = dataFilename;
             chunkSize = reader.readInt();
@@ -79,9 +71,7 @@
 
         public void validate(byte[] bytes, int start, int end) throws IOException
         {
-            checksum.update(bytes, start, end);
-            int current = (int) checksum.getValue();
-            checksum.reset();
+            int current = (int) checksumType.of(bytes, start, end);
             int actual = reader.readInt();
             if (current != actual)
                 throw new IOException("Corrupted File : " + dataFilename);
@@ -143,80 +133,4 @@
                                dataReader::close);
         }
     }
-
-
-    public static class ChecksumWriter
-    {
-        private final CRC32 incrementalChecksum = new CRC32();
-        private final DataOutput incrementalOut;
-        private final CRC32 fullChecksum = new CRC32();
-
-        public ChecksumWriter(DataOutput incrementalOut)
-        {
-            this.incrementalOut = incrementalOut;
-        }
-
-        public void writeChunkSize(int length)
-        {
-            try
-            {
-                incrementalOut.writeInt(length);
-            }
-            catch (IOException e)
-            {
-                throw new IOError(e);
-            }
-        }
-
-        // checksumIncrementalResult indicates if the checksum we compute for this buffer should itself be
-        // included in the full checksum, translating to if the partial checksum is serialized along with the
-        // data it checksums (in which case the file checksum as calculated by external tools would mismatch if
-        // we did not include it), or independently.
-
-        // CompressedSequentialWriters serialize the partial checksums inline with the compressed data chunks they
-        // corroborate, whereas ChecksummedSequentialWriters serialize them to a different file.
-        public void appendDirect(ByteBuffer bb, boolean checksumIncrementalResult)
-        {
-            try
-            {
-
-                ByteBuffer toAppend = bb.duplicate();
-                toAppend.mark();
-                incrementalChecksum.update(toAppend);
-                toAppend.reset();
-
-                int incrementalChecksumValue = (int) incrementalChecksum.getValue();
-                incrementalOut.writeInt(incrementalChecksumValue);
-
-                fullChecksum.update(toAppend);
-                if (checksumIncrementalResult)
-                {
-                    ByteBuffer byteBuffer = ByteBuffer.allocate(4);
-                    byteBuffer.putInt(incrementalChecksumValue);
-                    fullChecksum.update(byteBuffer.array(), 0, byteBuffer.array().length);
-                }
-                incrementalChecksum.reset();
-
-            }
-            catch (IOException e)
-            {
-                throw new IOError(e);
-            }
-        }
-
-        public void writeFullChecksum(Descriptor descriptor)
-        {
-            if (descriptor.digestComponent == null)
-                throw new NullPointerException("Null digest component for " + descriptor.ksname + '.' + descriptor.cfname + " file " + descriptor.baseFilename());
-            File outFile = new File(descriptor.filenameFor(descriptor.digestComponent));
-            try (BufferedWriter out =Files.newBufferedWriter(outFile.toPath(), Charsets.UTF_8))
-            {
-                out.write(String.valueOf(fullChecksum.getValue()));
-            }
-            catch (IOException e)
-            {
-                throw new FSWriteError(e, outFile);
-            }
-        }
-    }
 }
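A standalone sketch of the scheme ChecksumValidator implements against the simplified ChecksumType-based API above: the companion .crc file starts with the chunk size and then carries one CRC per data chunk, each of which is recomputed and compared on read. File names are illustrative and CRC32 stands in for the version's checksum type.

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.zip.CRC32;

final class CrcFileCheck
{
    public static void main(String[] args) throws IOException
    {
        byte[] data = Files.readAllBytes(Paths.get("Data.db"));
        try (DataInputStream crcIn = new DataInputStream(new FileInputStream("Data.crc")))
        {
            int chunkSize = crcIn.readInt();                    // header written by writeChunkSize()
            for (int start = 0; start < data.length; start += chunkSize)
            {
                int length = Math.min(chunkSize, data.length - start);
                CRC32 crc = new CRC32();
                crc.update(data, start, length);
                if ((int) crc.getValue() != crcIn.readInt())    // one stored CRC per chunk
                    throw new IOException("Corrupted file: Data.db");
            }
        }
    }
}
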
diff --git a/src/java/org/apache/cassandra/io/util/DataOutputBuffer.java b/src/java/org/apache/cassandra/io/util/DataOutputBuffer.java
index 80a7fe2..f08b48f 100644
--- a/src/java/org/apache/cassandra/io/util/DataOutputBuffer.java
+++ b/src/java/org/apache/cassandra/io/util/DataOutputBuffer.java
@@ -21,11 +21,12 @@
 import java.nio.ByteBuffer;
 import java.nio.channels.WritableByteChannel;
 
-import org.apache.cassandra.config.Config;
-
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.base.Preconditions;
 
+import io.netty.util.Recycler;
+import org.apache.cassandra.config.Config;
+
 /**
  * An implementation of the DataOutputStream interface using a FastByteArrayOutputStream and exposing
  * its buffer so copies can be avoided.
@@ -39,19 +40,58 @@
      */
     private static final long DOUBLING_THRESHOLD = Long.getLong(Config.PROPERTY_PREFIX + "DOB_DOUBLING_THRESHOLD_MB", 64);
 
+    /*
+     * Only recycle OutputBuffers up to 1 MiB; larger buffers are not returned to the pool.
+     */
+    private static final int MAX_RECYCLE_BUFFER_SIZE = 1024 * 1024;
+
+    private static final int DEFAULT_INITIAL_BUFFER_SIZE = 128;
+
+    public static final Recycler<DataOutputBuffer> RECYCLER = new Recycler<DataOutputBuffer>()
+    {
+        protected DataOutputBuffer newObject(Handle handle)
+        {
+            return new DataOutputBuffer(handle);
+        }
+    };
+
+    private final Recycler.Handle handle;
+
+    private DataOutputBuffer(Recycler.Handle handle)
+    {
+        this(DEFAULT_INITIAL_BUFFER_SIZE, handle);
+    }
+
     public DataOutputBuffer()
     {
-        this(128);
+        this(DEFAULT_INITIAL_BUFFER_SIZE);
     }
 
     public DataOutputBuffer(int size)
     {
-        super(ByteBuffer.allocate(size));
+        this(size, null);
     }
 
-    protected DataOutputBuffer(ByteBuffer buffer)
+    protected DataOutputBuffer(int size, Recycler.Handle handle)
+    {
+        this(ByteBuffer.allocate(size), handle);
+    }
+
+    protected DataOutputBuffer(ByteBuffer buffer, Recycler.Handle handle)
     {
         super(buffer);
+        this.handle = handle;
+    }
+
+    public void recycle()
+    {
+        assert handle != null;
+
+        if (buffer().capacity() <= MAX_RECYCLE_BUFFER_SIZE)
+        {
+            buffer.rewind();
+            RECYCLER.recycle(this, handle);
+        }
     }
 
     @Override
@@ -170,6 +210,7 @@
 
     public byte[] getData()
     {
+        assert buffer.arrayOffset() == 0;
         return buffer.array();
     }
 
@@ -188,6 +229,11 @@
         return getLength();
     }
 
+    public ByteBuffer asNewBuffer()
+    {
+        return ByteBuffer.wrap(getData(), 0, getLength());
+    }
+
     public byte[] toByteArray()
     {
         ByteBuffer buffer = buffer();
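A rough usage sketch of the Recycler-backed pooling added above (illustrative only, not taken from this patch; the value written and the method name are arbitrary):

    // Borrow a pooled DataOutputBuffer, write into it, consume the result, then return it.
    static int pooledSerializedSize(int value) throws IOException
    {
        DataOutputBuffer out = DataOutputBuffer.RECYCLER.get();
        try
        {
            out.writeInt(value);      // any DataOutput-style writes
            return out.getLength();   // consume the contents before recycling
        }
        finally
        {
            out.recycle();            // re-pooled only if the backing buffer is <= MAX_RECYCLE_BUFFER_SIZE
        }
    }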
diff --git a/src/java/org/apache/cassandra/io/util/DataOutputBufferFixed.java b/src/java/org/apache/cassandra/io/util/DataOutputBufferFixed.java
index c815c9e..5193401 100644
--- a/src/java/org/apache/cassandra/io/util/DataOutputBufferFixed.java
+++ b/src/java/org/apache/cassandra/io/util/DataOutputBufferFixed.java
@@ -38,12 +38,12 @@
 
     public DataOutputBufferFixed(int size)
     {
-        super(ByteBuffer.allocate(size));
+        super(size, null);
     }
 
     public DataOutputBufferFixed(ByteBuffer buffer)
     {
-        super(buffer);
+        super(buffer, null);
     }
 
     @Override
@@ -62,4 +62,9 @@
     {
         throw new BufferOverflowException();
     }
+
+    public void clear()
+    {
+        buffer.clear();
+    }
 }
diff --git a/src/java/org/apache/cassandra/io/util/DataOutputStreamPlus.java b/src/java/org/apache/cassandra/io/util/DataOutputStreamPlus.java
index a846384..4adb6d2 100644
--- a/src/java/org/apache/cassandra/io/util/DataOutputStreamPlus.java
+++ b/src/java/org/apache/cassandra/io/util/DataOutputStreamPlus.java
@@ -22,6 +22,7 @@
 import java.nio.ByteBuffer;
 import java.nio.channels.WritableByteChannel;
 
+import io.netty.util.concurrent.FastThreadLocal;
 import org.apache.cassandra.config.Config;
 import org.apache.cassandra.utils.ByteBufferUtil;
 
@@ -64,7 +65,7 @@
         return bytes;
     }
 
-    private static final ThreadLocal<byte[]> tempBuffer = new ThreadLocal<byte[]>()
+    private static final FastThreadLocal<byte[]> tempBuffer = new FastThreadLocal<byte[]>()
     {
         @Override
         public byte[] initialValue()
@@ -86,7 +87,7 @@
             }
 
             @Override
-            public void close() throws IOException
+            public void close()
             {
             }
 
diff --git a/src/java/org/apache/cassandra/io/util/DiskAwareRunnable.java b/src/java/org/apache/cassandra/io/util/DiskAwareRunnable.java
deleted file mode 100644
index 1a15d6f..0000000
--- a/src/java/org/apache/cassandra/io/util/DiskAwareRunnable.java
+++ /dev/null
@@ -1,42 +0,0 @@
-/*
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- *
- *     http://www.apache.org/licenses/LICENSE-2.0
- *
- * Unless required by applicable law or agreed to in writing, software
- * distributed under the License is distributed on an "AS IS" BASIS,
- * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
- * See the License for the specific language governing permissions and
- * limitations under the License.
- */
-package org.apache.cassandra.io.util;
-
-import java.io.IOException;
-
-import org.apache.cassandra.db.Directories;
-import org.apache.cassandra.io.FSWriteError;
-import org.apache.cassandra.utils.WrappedRunnable;
-
-public abstract class DiskAwareRunnable extends WrappedRunnable
-{
-    protected Directories.DataDirectory getWriteDirectory(long writeSize)
-    {
-        Directories.DataDirectory directory = getDirectories().getWriteableLocation(writeSize);
-        if (directory == null)
-            throw new FSWriteError(new IOException("Insufficient disk space to write " + writeSize + " bytes"), "");
-
-        return directory;
-    }
-
-    /**
-     * Get sstable directories for the CF.
-     * @return Directories instance for the CF.
-     */
-    protected abstract Directories getDirectories();
-}
diff --git a/src/java/org/apache/cassandra/io/util/FileUtils.java b/src/java/org/apache/cassandra/io/util/FileUtils.java
index 9e81da5..39c7800 100644
--- a/src/java/org/apache/cassandra/io/util/FileUtils.java
+++ b/src/java/org/apache/cassandra/io/util/FileUtils.java
@@ -23,10 +23,12 @@
 import java.nio.charset.Charset;
 import java.nio.charset.StandardCharsets;
 import java.nio.file.*;
+import java.nio.file.attribute.BasicFileAttributes;
 import java.text.DecimalFormat;
 import java.util.Arrays;
 import java.util.Collections;
 import java.util.List;
+import java.util.Optional;
 import java.util.concurrent.atomic.AtomicReference;
 
 import org.slf4j.Logger;
@@ -56,7 +58,7 @@
 
     private static final DecimalFormat df = new DecimalFormat("#.##");
     private static final boolean canCleanDirectBuffers;
-    private static final AtomicReference<FSErrorHandler> fsErrorHandler = new AtomicReference<>();
+    private static final AtomicReference<Optional<FSErrorHandler>> fsErrorHandler = new AtomicReference<>(Optional.empty());
 
     static
     {
@@ -339,6 +341,8 @@
 
     public static void clean(ByteBuffer buffer)
     {
+        if (buffer == null)
+            return;
         if (isCleanerAvailable() && buffer.isDirect())
         {
             DirectBuffer db = (DirectBuffer) buffer;
@@ -394,25 +398,25 @@
         {
             d = value / TB;
             String val = df.format(d);
-            return val + " TB";
+            return val + " TiB";
         }
         else if ( value >= GB )
         {
             d = value / GB;
             String val = df.format(d);
-            return val + " GB";
+            return val + " GiB";
         }
         else if ( value >= MB )
         {
             d = value / MB;
             String val = df.format(d);
-            return val + " MB";
+            return val + " MiB";
         }
         else if ( value >= KB )
         {
             d = value / KB;
             String val = df.format(d);
-            return val + " KB";
+            return val + " KiB";
         }
         else
         {
@@ -452,39 +456,45 @@
                 deleteRecursiveOnExit(new File(dir, child));
         }
 
-        logger.trace("Scheduling deferred deletion of file: " + dir);
+        logger.trace("Scheduling deferred deletion of file: {}", dir);
         dir.deleteOnExit();
     }
 
     public static void handleCorruptSSTable(CorruptSSTableException e)
     {
-        FSErrorHandler handler = fsErrorHandler.get();
-        if (handler != null)
-            handler.handleCorruptSSTable(e);
+        fsErrorHandler.get().ifPresent(handler -> handler.handleCorruptSSTable(e));
     }
 
     public static void handleFSError(FSError e)
     {
-        FSErrorHandler handler = fsErrorHandler.get();
-        if (handler != null)
-            handler.handleFSError(e);
+        fsErrorHandler.get().ifPresent(handler -> handler.handleFSError(e));
     }
+
     /**
      * Get the size of a directory in bytes
-     * @param directory The directory for which we need size.
+     * @param folder The directory for which we need size.
      * @return The size of the directory
      */
-    public static long folderSize(File directory)
+    public static long folderSize(File folder)
     {
-        long length = 0;
-        for (File file : directory.listFiles())
+        final long [] sizeArr = {0L};
+        try
         {
-            if (file.isFile())
-                length += file.length();
-            else
-                length += folderSize(file);
+            Files.walkFileTree(folder.toPath(), new SimpleFileVisitor<Path>()
+            {
+                @Override
+                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
+                {
+                    sizeArr[0] += attrs.size();
+                    return FileVisitResult.CONTINUE;
+                }
+            });
         }
-        return length;
+        catch (IOException e)
+        {
+            logger.error("Error while getting {} folder size", folder, e);
+        }
+        return sizeArr[0];
     }
 
     public static void copyTo(DataInput in, OutputStream out, int length) throws IOException
@@ -575,6 +585,6 @@
 
     public static void setFSErrorHandler(FSErrorHandler handler)
     {
-        fsErrorHandler.getAndSet(handler);
+        fsErrorHandler.getAndSet(Optional.ofNullable(handler));
     }
 }
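For reference, a standalone, pure-JDK sketch of the walkFileTree approach that folderSize now uses (the class and method names here are illustrative, not part of the patch):

    import java.io.IOException;
    import java.nio.file.FileVisitResult;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.nio.file.SimpleFileVisitor;
    import java.nio.file.attribute.BasicFileAttributes;

    public class FolderSizeSketch
    {
        // Sums the sizes of all regular files under 'folder' without recursing manually.
        public static long size(Path folder) throws IOException
        {
            final long[] total = { 0L };
            Files.walkFileTree(folder, new SimpleFileVisitor<Path>()
            {
                @Override
                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                {
                    total[0] += attrs.size();
                    return FileVisitResult.CONTINUE;
                }
            });
            return total[0];
        }
    }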
diff --git a/src/java/org/apache/cassandra/io/util/ICompressedFile.java b/src/java/org/apache/cassandra/io/util/ICompressedFile.java
index 43cef8c..e69487c 100644
--- a/src/java/org/apache/cassandra/io/util/ICompressedFile.java
+++ b/src/java/org/apache/cassandra/io/util/ICompressedFile.java
@@ -23,6 +23,4 @@
 {
     ChannelProxy channel();
     CompressionMetadata getMetadata();
-    MmappedRegions regions();
-
 }
diff --git a/src/java/org/apache/cassandra/io/util/LimitingRebufferer.java b/src/java/org/apache/cassandra/io/util/LimitingRebufferer.java
new file mode 100644
index 0000000..a1e9715
--- /dev/null
+++ b/src/java/org/apache/cassandra/io/util/LimitingRebufferer.java
@@ -0,0 +1,126 @@
+/*
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+package org.apache.cassandra.io.util;
+
+import java.nio.ByteBuffer;
+
+import com.google.common.primitives.Ints;
+import com.google.common.util.concurrent.RateLimiter;
+
+/**
+ * Rebufferer wrapper that applies rate limiting.
+ *
+ * Instantiated once per RandomAccessReader, thread-unsafe.
+ * The instances reuse themselves as the BufferHolder to avoid having to return a new object for each rebuffer call.
+ */
+public class LimitingRebufferer implements Rebufferer, Rebufferer.BufferHolder
+{
+    final private Rebufferer wrapped;
+    final private RateLimiter limiter;
+    final private int limitQuant;
+
+    private BufferHolder bufferHolder;
+    private ByteBuffer buffer;
+    private long offset;
+
+    public LimitingRebufferer(Rebufferer wrapped, RateLimiter limiter, int limitQuant)
+    {
+        this.wrapped = wrapped;
+        this.limiter = limiter;
+        this.limitQuant = limitQuant;
+    }
+
+    @Override
+    public BufferHolder rebuffer(long position)
+    {
+        bufferHolder = wrapped.rebuffer(position);
+        buffer = bufferHolder.buffer();
+        offset = bufferHolder.offset();
+        int posInBuffer = Ints.checkedCast(position - offset);
+        int remaining = buffer.limit() - posInBuffer;
+        if (remaining == 0)
+            return this;
+
+        if (remaining > limitQuant)
+        {
+            buffer.limit(posInBuffer + limitQuant); // certainly below current limit
+            remaining = limitQuant;
+        }
+        limiter.acquire(remaining);
+        return this;
+    }
+
+    @Override
+    public ChannelProxy channel()
+    {
+        return wrapped.channel();
+    }
+
+    @Override
+    public long fileLength()
+    {
+        return wrapped.fileLength();
+    }
+
+    @Override
+    public double getCrcCheckChance()
+    {
+        return wrapped.getCrcCheckChance();
+    }
+
+    @Override
+    public void close()
+    {
+        wrapped.close();
+    }
+
+    @Override
+    public void closeReader()
+    {
+        wrapped.closeReader();
+    }
+
+    @Override
+    public String toString()
+    {
+        return "LimitingRebufferer[" + limiter.toString() + "]:" + wrapped.toString();
+    }
+
+    // BufferHolder methods
+
+    @Override
+    public ByteBuffer buffer()
+    {
+        return buffer;
+    }
+
+    @Override
+    public long offset()
+    {
+        return offset;
+    }
+
+    @Override
+    public void release()
+    {
+        bufferHolder.release();
+    }
+}
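A hypothetical wiring sketch for the wrapper above; 'base' is an assumed pre-existing Rebufferer and the 16 MiB/s rate is arbitrary:

    // Throttle reads through an existing rebufferer to roughly 16 MiB/s (RateLimiter permits are bytes here).
    static Rebufferer throttle(Rebufferer base)
    {
        RateLimiter limiter = RateLimiter.create(16d * 1024 * 1024);
        return new LimitingRebufferer(base, limiter, RandomAccessReader.MAX_BUFFER_SIZE);
    }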
diff --git a/src/java/org/apache/cassandra/io/util/MemoryInputStream.java b/src/java/org/apache/cassandra/io/util/MemoryInputStream.java
index e009528..3daa4c4 100644
--- a/src/java/org/apache/cassandra/io/util/MemoryInputStream.java
+++ b/src/java/org/apache/cassandra/io/util/MemoryInputStream.java
@@ -71,6 +71,6 @@
 
     private static ByteBuffer getByteBuffer(long offset, int length)
     {
-        return MemoryUtil.getByteBuffer(offset, length).order(ByteOrder.BIG_ENDIAN);
+        return MemoryUtil.getByteBuffer(offset, length, ByteOrder.BIG_ENDIAN);
     }
 }
diff --git a/src/java/org/apache/cassandra/io/util/MmapRebufferer.java b/src/java/org/apache/cassandra/io/util/MmapRebufferer.java
new file mode 100644
index 0000000..9d79919
--- /dev/null
+++ b/src/java/org/apache/cassandra/io/util/MmapRebufferer.java
@@ -0,0 +1,69 @@
+/*
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+package org.apache.cassandra.io.util;
+
+/**
+ * Rebufferer for memory-mapped files. Thread-safe and shared among reader instances.
+ * This is simply a thin wrapper around MmappedRegions as the buffers there can be used directly after duplication.
+ */
+class MmapRebufferer extends AbstractReaderFileProxy implements Rebufferer, RebuffererFactory
+{
+    protected final MmappedRegions regions;
+
+    public MmapRebufferer(ChannelProxy channel, long fileLength, MmappedRegions regions)
+    {
+        super(channel, fileLength);
+        this.regions = regions;
+    }
+
+    @Override
+    public BufferHolder rebuffer(long position)
+    {
+        return regions.floor(position);
+    }
+
+    @Override
+    public Rebufferer instantiateRebufferer()
+    {
+        return this;
+    }
+
+    @Override
+    public void close()
+    {
+        regions.closeQuietly();
+    }
+
+    @Override
+    public void closeReader()
+    {
+        // Instance is shared among readers. Nothing to release.
+    }
+
+    @Override
+    public String toString()
+    {
+        return String.format("%s(%s - data length %d)",
+                             getClass().getSimpleName(),
+                             channel.filePath(),
+                             fileLength());
+    }
+}
diff --git a/src/java/org/apache/cassandra/io/util/MmappedRegions.java b/src/java/org/apache/cassandra/io/util/MmappedRegions.java
index 8f6cd92..f269b84 100644
--- a/src/java/org/apache/cassandra/io/util/MmappedRegions.java
+++ b/src/java/org/apache/cassandra/io/util/MmappedRegions.java
@@ -22,8 +22,11 @@
 import java.nio.channels.FileChannel;
 import java.util.Arrays;
 
+import org.slf4j.LoggerFactory;
+
 import org.apache.cassandra.io.FSReadError;
 import org.apache.cassandra.io.compress.CompressionMetadata;
+import org.apache.cassandra.utils.JVMStabilityInspector;
 import org.apache.cassandra.utils.Throwables;
 import org.apache.cassandra.utils.concurrent.RefCounted;
 import org.apache.cassandra.utils.concurrent.SharedCloseableImpl;
@@ -190,8 +193,20 @@
         assert !isCleanedUp() : "Attempted to use closed region";
         return state.floor(position);
     }
+
+    public void closeQuietly()
+    {
+        Throwable err = close(null);
+        if (err != null)
+        {
+            JVMStabilityInspector.inspectThrowable(err);
 
-    public static final class Region
+            // This is not supposed to happen
+            LoggerFactory.getLogger(getClass()).error("Error while closing mmapped regions", err);
+        }
+    }
+
+    public static final class Region implements Rebufferer.BufferHolder
     {
         public final long offset;
         public final ByteBuffer buffer;
@@ -202,15 +217,25 @@
             this.buffer = buffer;
         }
 
-        public long bottom()
+        public ByteBuffer buffer()
+        {
+            return buffer.duplicate();
+        }
+
+        public long offset()
         {
             return offset;
         }
 
-        public long top()
+        public long end()
         {
             return offset + buffer.capacity();
         }
+
+        public void release()
+        {
+            // only released after no readers are present
+        }
     }
 
     private static final class State
@@ -260,7 +285,7 @@
 
         private Region floor(long position)
         {
-            assert 0 <= position && position < length : String.format("%d >= %d", position, length);
+            assert 0 <= position && position <= length : String.format("%d > %d", position, length);
 
             int idx = Arrays.binarySearch(offsets, 0, last +1, position);
             assert idx != -1 : String.format("Bad position %d for regions %s, last %d in %s", position, Arrays.toString(offsets), last, channel);
diff --git a/src/java/org/apache/cassandra/io/util/MmappedSegmentedFile.java b/src/java/org/apache/cassandra/io/util/MmappedSegmentedFile.java
index 5f56ff6..d514bf8 100644
--- a/src/java/org/apache/cassandra/io/util/MmappedSegmentedFile.java
+++ b/src/java/org/apache/cassandra/io/util/MmappedSegmentedFile.java
@@ -19,30 +19,29 @@
 
 import java.io.*;
 
-import com.google.common.util.concurrent.RateLimiter;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 import org.apache.cassandra.db.TypeSizes;
 import org.apache.cassandra.io.sstable.format.Version;
-import org.apache.cassandra.utils.JVMStabilityInspector;
 
 public class MmappedSegmentedFile extends SegmentedFile
 {
     private static final Logger logger = LoggerFactory.getLogger(MmappedSegmentedFile.class);
 
-    private final MmappedRegions regions;
-
-    public MmappedSegmentedFile(ChannelProxy channel, int bufferSize, long length, MmappedRegions regions)
+    public MmappedSegmentedFile(ChannelProxy channel, long length, MmappedRegions regions)
     {
-        super(new Cleanup(channel, regions), channel, bufferSize, length);
-        this.regions = regions;
+        this(channel, new MmapRebufferer(channel, length, regions), length);
+    }
+
+    public MmappedSegmentedFile(ChannelProxy channel, RebuffererFactory rebufferer, long length)
+    {
+        super(new Cleanup(channel, rebufferer), channel, rebufferer, length);
     }
 
     private MmappedSegmentedFile(MmappedSegmentedFile copy)
     {
         super(copy);
-        this.regions = copy.regions;
     }
 
     public MmappedSegmentedFile sharedCopy()
@@ -50,49 +49,6 @@
         return new MmappedSegmentedFile(this);
     }
 
-    public RandomAccessReader createReader()
-    {
-        return new RandomAccessReader.Builder(channel)
-               .overrideLength(length)
-               .regions(regions)
-               .build();
-    }
-
-    public RandomAccessReader createReader(RateLimiter limiter)
-    {
-        return new RandomAccessReader.Builder(channel)
-               .overrideLength(length)
-               .bufferSize(bufferSize)
-               .regions(regions)
-               .limiter(limiter)
-               .build();
-    }
-
-    private static final class Cleanup extends SegmentedFile.Cleanup
-    {
-        private final MmappedRegions regions;
-
-        Cleanup(ChannelProxy channel, MmappedRegions regions)
-        {
-            super(channel);
-            this.regions = regions;
-        }
-
-        public void tidy()
-        {
-            Throwable err = regions.close(null);
-            if (err != null)
-            {
-                JVMStabilityInspector.inspectThrowable(err);
-
-                // This is not supposed to happen
-                logger.error("Error while closing mmapped regions", err);
-            }
-
-            super.tidy();
-        }
-    }
-
     /**
      * Overrides the default behaviour to create segments of a maximum size.
      */
@@ -110,7 +66,7 @@
             long length = overrideLength > 0 ? overrideLength : channel.size();
             updateRegions(channel, length);
 
-            return new MmappedSegmentedFile(channel, bufferSize, length, regions.sharedCopy());
+            return new MmappedSegmentedFile(channel, length, regions.sharedCopy());
         }
 
         private void updateRegions(ChannelProxy channel, long length)
diff --git a/src/java/org/apache/cassandra/io/util/RandomAccessReader.java b/src/java/org/apache/cassandra/io/util/RandomAccessReader.java
index 1943773..725b367 100644
--- a/src/java/org/apache/cassandra/io/util/RandomAccessReader.java
+++ b/src/java/org/apache/cassandra/io/util/RandomAccessReader.java
@@ -21,11 +21,13 @@
 import java.nio.ByteBuffer;
 import java.nio.ByteOrder;
 
+import com.google.common.annotations.VisibleForTesting;
 import com.google.common.primitives.Ints;
 import com.google.common.util.concurrent.RateLimiter;
 
-import org.apache.cassandra.io.FSReadError;
 import org.apache.cassandra.io.compress.BufferType;
+import org.apache.cassandra.io.compress.CompressionMetadata;
+import org.apache.cassandra.io.util.Rebufferer.BufferHolder;
 import org.apache.cassandra.utils.memory.BufferPool;
 
 public class RandomAccessReader extends RebufferingInputStream implements FileDataInput
@@ -41,62 +43,24 @@
     //       and because our BufferPool currently has a maximum allocation size of this.
     public static final int MAX_BUFFER_SIZE = 1 << 16; // 64k
 
-    // the IO channel to the file, we do not own a reference to this due to
-    // performance reasons (CASSANDRA-9379) so it's up to the owner of the RAR to
-    // ensure that the channel stays open and that it is closed afterwards
-    protected final ChannelProxy channel;
-
-    // optional memory mapped regions for the channel
-    protected final MmappedRegions regions;
-
-    // An optional limiter that will throttle the amount of data we read
-    protected final RateLimiter limiter;
-
-    // the file length, this can be overridden at construction to a value shorter
-    // than the true length of the file; if so, it acts as an imposed limit on reads,
-    // required when opening sstables early not to read past the mark
-    private final long fileLength;
-
-    // the buffer size for buffered readers
-    protected final int bufferSize;
-
-    // the buffer type for buffered readers
-    protected final BufferType bufferType;
-
-    // offset from the beginning of the file
-    protected long bufferOffset;
-
     // offset of the last file mark
     protected long markedPointer;
 
-    protected RandomAccessReader(Builder builder)
-    {
-        super(builder.createBuffer());
+    @VisibleForTesting
+    final Rebufferer rebufferer;
+    BufferHolder bufferHolder = Rebufferer.EMPTY;
 
-        this.channel = builder.channel;
-        this.regions = builder.regions;
-        this.limiter = builder.limiter;
-        this.fileLength = builder.overrideLength <= 0 ? builder.channel.size() : builder.overrideLength;
-        this.bufferSize = builder.bufferSize;
-        this.bufferType = builder.bufferType;
-        this.buffer = builder.buffer;
+    protected RandomAccessReader(Rebufferer rebufferer)
+    {
+        super(Rebufferer.EMPTY.buffer());
+        this.rebufferer = rebufferer;
     }
 
-    protected static ByteBuffer allocateBuffer(int size, BufferType bufferType)
+    public static ByteBuffer allocateBuffer(int size, BufferType bufferType)
     {
         return BufferPool.get(size, bufferType).order(ByteOrder.BIG_ENDIAN);
     }
 
-    protected void releaseBuffer()
-    {
-        if (buffer != null)
-        {
-            if (regions == null)
-                BufferPool.put(buffer);
-            buffer = null;
-        }
-    }
-
     /**
      * Read data from file starting from current currentOffset to populate buffer.
      */
@@ -105,80 +69,40 @@
         if (isEOF())
             return;
 
-        if (regions == null)
-            reBufferStandard();
-        else
-            reBufferMmap();
+        reBufferAt(current());
+    }
 
-        if (limiter != null)
-            limiter.acquire(buffer.remaining());
+    public void reBufferAt(long position)
+    {
+        bufferHolder.release();
+        bufferHolder = rebufferer.rebuffer(position);
+        buffer = bufferHolder.buffer();
+        buffer.position(Ints.checkedCast(position - bufferHolder.offset()));
 
         assert buffer.order() == ByteOrder.BIG_ENDIAN : "Buffer must have BIG ENDIAN byte ordering";
     }
 
-    protected void reBufferStandard()
-    {
-        bufferOffset += buffer.position();
-        assert bufferOffset < fileLength;
-
-        buffer.clear();
-        long position = bufferOffset;
-        long limit = bufferOffset;
-
-        long pageAligedPos = position & ~4095;
-        // Because the buffer capacity is a multiple of the page size, we read less
-        // the first time and then we should read at page boundaries only,
-        // unless the user seeks elsewhere
-        long upperLimit = Math.min(fileLength, pageAligedPos + buffer.capacity());
-        buffer.limit((int)(upperLimit - position));
-        while (buffer.hasRemaining() && limit < upperLimit)
-        {
-            int n = channel.read(buffer, position);
-            if (n < 0)
-                throw new FSReadError(new IOException("Unexpected end of file"), channel.filePath());
-
-            position += n;
-            limit = bufferOffset + buffer.position();
-        }
-
-        buffer.flip();
-    }
-
-    protected void reBufferMmap()
-    {
-        long position = bufferOffset + buffer.position();
-        assert position < fileLength;
-
-        MmappedRegions.Region region = regions.floor(position);
-        bufferOffset = region.bottom();
-        buffer = region.buffer.duplicate();
-        buffer.position(Ints.checkedCast(position - bufferOffset));
-
-        if (limiter != null && bufferSize < buffer.remaining())
-        { // ensure accurate throttling
-            buffer.limit(buffer.position() + bufferSize);
-        }
-    }
-
     @Override
     public long getFilePointer()
     {
+        if (buffer == null)     // closed already
+            return rebufferer.fileLength();
         return current();
     }
 
     protected long current()
     {
-        return bufferOffset + (buffer == null ? 0 : buffer.position());
+        return bufferHolder.offset() + buffer.position();
     }
 
     public String getPath()
     {
-        return channel.filePath();
+        return getChannel().filePath();
     }
 
     public ChannelProxy getChannel()
     {
-        return channel;
+        return rebufferer.channel();
     }
 
     @Override
@@ -242,12 +166,14 @@
     @Override
     public void close()
     {
-	    //make idempotent
+        // close needs to be idempotent.
         if (buffer == null)
             return;
 
-        bufferOffset += buffer.position();
-        releaseBuffer();
+        bufferHolder.release();
+        rebufferer.closeReader();
+        buffer = null;
+        bufferHolder = null;
 
         //For performance reasons we don't keep a reference to the file
         //channel so we don't close it
@@ -256,7 +182,7 @@
     @Override
     public String toString()
     {
-        return getClass().getSimpleName() + "(filePath='" + channel + "')";
+        return getClass().getSimpleName() + ':' + rebufferer.toString();
     }
 
     /**
@@ -281,26 +207,17 @@
         if (buffer == null)
             throw new IllegalStateException("Attempted to seek in a closed RAR");
 
-        if (newPosition >= length()) // it is save to call length() in read-only mode
-        {
-            if (newPosition > length())
-                throw new IllegalArgumentException(String.format("Unable to seek to position %d in %s (%d bytes) in read-only mode",
-                                                             newPosition, getPath(), length()));
-            buffer.limit(0);
-            bufferOffset = newPosition;
-            return;
-        }
-
+        long bufferOffset = bufferHolder.offset();
         if (newPosition >= bufferOffset && newPosition < bufferOffset + buffer.limit())
         {
             buffer.position((int) (newPosition - bufferOffset));
             return;
         }
-        // Set current location to newPosition and clear buffer so reBuffer calculates from newPosition
-        bufferOffset = newPosition;
-        buffer.clear();
-        reBuffer();
-        assert current() == newPosition;
+
+        if (newPosition > length())
+            throw new IllegalArgumentException(String.format("Unable to seek to position %d in %s (%d bytes) in read-only mode",
+                                                         newPosition, getPath(), length()));
+        reBufferAt(newPosition);
     }
 
     /**
@@ -308,10 +225,10 @@
      * represented by zero or more characters followed by {@code '\n'}, {@code
      * '\r'}, {@code "\r\n"} or the end of file marker. The string does not
      * include the line terminating sequence.
-     * <p/>
+     * <p>
      * Blocks until a line terminating sequence has been read, the end of the
      * file is reached or an exception is thrown.
-     *
+     * </p>
      * @return the contents of the line or {@code null} if no characters have
      * been read before the end of the file has been reached.
      * @throws IOException if this file is closed or another I/O error occurs.
@@ -353,7 +270,7 @@
 
     public long length()
     {
-        return fileLength;
+        return rebufferer.fileLength();
     }
 
     public long getPosition()
@@ -361,17 +278,38 @@
         return current();
     }
 
+    public double getCrcCheckChance()
+    {
+        return rebufferer.getCrcCheckChance();
+    }
+
+    protected static Rebufferer instantiateRebufferer(RebuffererFactory fileRebufferer, RateLimiter limiter)
+    {
+        Rebufferer rebufferer = fileRebufferer.instantiateRebufferer();
+
+        if (limiter != null)
+            rebufferer = new LimitingRebufferer(rebufferer, limiter, MAX_BUFFER_SIZE);
+
+        return rebufferer;
+    }
+
+    public static RandomAccessReader build(SegmentedFile file, RateLimiter limiter)
+    {
+        return new RandomAccessReader(instantiateRebufferer(file.rebuffererFactory(), limiter));
+    }
+
+    public static Builder builder(ChannelProxy channel)
+    {
+        return new Builder(channel);
+    }
+
     public static class Builder
     {
         // The NIO file channel or an empty channel
         public final ChannelProxy channel;
 
-        // We override the file length when we open sstables early, so that we do not
-        // read past the early mark
-        public long overrideLength;
-
         // The size of the buffer for buffered readers
-        public int bufferSize;
+        protected int bufferSize;
 
         // The type of the buffer for buffered readers
         public BufferType bufferType;
@@ -379,20 +317,20 @@
         // The buffer
         public ByteBuffer buffer;
 
+        // An optional limiter that will throttle the amount of data we read
+        public RateLimiter limiter;
+
         // The mmap segments for mmap readers
         public MmappedRegions regions;
 
-        // An optional limiter that will throttle the amount of data we read
-        public RateLimiter limiter;
+        // Compression for compressed readers
+        public CompressionMetadata compression;
 
         public Builder(ChannelProxy channel)
         {
             this.channel = channel;
-            this.overrideLength = -1L;
             this.bufferSize = DEFAULT_BUFFER_SIZE;
             this.bufferType = BufferType.OFF_HEAP;
-            this.regions = null;
-            this.limiter = null;
         }
 
         /** The buffer size is typically already page aligned but if that is not the case
@@ -400,38 +338,30 @@
          * buffer size unless we are throttling, in which case we may as well read the maximum
          * directly since the intention is to read the full file, see CASSANDRA-8630.
          * */
-        private void setBufferSize()
+        private int adjustedBufferSize()
         {
             if (limiter != null)
-            {
-                bufferSize = MAX_BUFFER_SIZE;
-                return;
-            }
+                return MAX_BUFFER_SIZE;
 
-            if ((bufferSize & ~4095) != bufferSize)
-            { // should already be a page size multiple but if that's not case round it up
-                bufferSize = (bufferSize + 4095) & ~4095;
-            }
-
-            bufferSize = Math.min(MAX_BUFFER_SIZE, bufferSize);
+            // should already be a page size multiple but if that's not case round it up
+            int wholePageSize = (bufferSize + 4095) & ~4095;
+            return Math.min(MAX_BUFFER_SIZE, wholePageSize);
         }
 
-        protected ByteBuffer createBuffer()
+        protected Rebufferer createRebufferer()
         {
-            setBufferSize();
-
-            buffer = regions == null
-                     ? allocateBuffer(bufferSize, bufferType)
-                     : regions.floor(0).buffer.duplicate();
-
-            buffer.limit(0);
-            return buffer;
+            return instantiateRebufferer(chunkReader(), limiter);
         }
 
-        public Builder overrideLength(long overrideLength)
+        public RebuffererFactory chunkReader()
         {
-            this.overrideLength = overrideLength;
-            return this;
+            if (compression != null)
+                return CompressedSegmentedFile.chunkReader(channel, compression, regions);
+            if (regions != null)
+                return new MmapRebufferer(channel, -1, regions);
+
+            int adjustedSize = adjustedBufferSize();
+            return new SimpleChunkReader(channel, -1, bufferType, adjustedSize);
         }
 
         public Builder bufferSize(int bufferSize)
@@ -455,6 +385,12 @@
             return this;
         }
 
+        public Builder compression(CompressionMetadata metadata)
+        {
+            this.compression = metadata;
+            return this;
+        }
+
         public Builder limiter(RateLimiter limiter)
         {
             this.limiter = limiter;
@@ -463,12 +399,12 @@
 
         public RandomAccessReader build()
         {
-            return new RandomAccessReader(this);
+            return new RandomAccessReader(createRebufferer());
         }
 
         public RandomAccessReader buildWithChannel()
         {
-            return new RandomAccessReaderWithOwnChannel(this);
+            return new RandomAccessReaderWithOwnChannel(createRebufferer());
         }
     }
 
@@ -479,9 +415,9 @@
     // not have a shared channel.
     public static class RandomAccessReaderWithOwnChannel extends RandomAccessReader
     {
-        protected RandomAccessReaderWithOwnChannel(Builder builder)
+        protected RandomAccessReaderWithOwnChannel(Rebufferer rebufferer)
         {
-            super(builder);
+            super(rebufferer);
         }
 
         @Override
@@ -493,7 +429,14 @@
             }
             finally
             {
-                channel.close();
+                try
+                {
+                    rebufferer.close();
+                }
+                finally
+                {
+                    getChannel().close();
+                }
             }
         }
     }
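As an illustration of the new builder/factory path (a sketch; 'channel' is an assumed open ChannelProxy and the 4 KiB buffer size is arbitrary):

    static int readLeadingInt(ChannelProxy channel) throws IOException
    {
        try (RandomAccessReader reader = RandomAccessReader.builder(channel)
                                                           .bufferSize(4096)
                                                           .build())
        {
            reader.seek(0);
            return reader.readInt();
        }
    }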
diff --git a/src/java/org/apache/cassandra/io/util/ReaderFileProxy.java b/src/java/org/apache/cassandra/io/util/ReaderFileProxy.java
new file mode 100644
index 0000000..3ddb143
--- /dev/null
+++ b/src/java/org/apache/cassandra/io/util/ReaderFileProxy.java
@@ -0,0 +1,36 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.io.util;
+
+/**
+ * Base class for the RandomAccessReader components that implement reading.
+ */
+public interface ReaderFileProxy extends AutoCloseable
+{
+    void close();               // no checked exceptions
+
+    ChannelProxy channel();
+
+    long fileLength();
+
+    /**
+     * Needed for tests. Returns the table's CRC check chance, which is only set for compressed tables.
+     */
+    double getCrcCheckChance();
+}
\ No newline at end of file
diff --git a/src/java/org/apache/cassandra/io/util/Rebufferer.java b/src/java/org/apache/cassandra/io/util/Rebufferer.java
new file mode 100644
index 0000000..e88c7cb
--- /dev/null
+++ b/src/java/org/apache/cassandra/io/util/Rebufferer.java
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.io.util;
+
+import java.nio.ByteBuffer;
+
+/**
+ * Rebufferer for reading data by a RandomAccessReader.
+ */
+public interface Rebufferer extends ReaderFileProxy
+{
+    /**
+     * Rebuffer (move on or seek to) a given position, and return a buffer that can be used there.
+     * The only guarantee about the size of the returned data is that unless rebuffering at the end of the file,
+     * the buffer will not be empty and will contain the requested position, i.e.
+     * {@code offset <= position < offset + bh.buffer().limit()}, but the buffer will not be positioned there.
+     */
+    BufferHolder rebuffer(long position);
+
+    /**
+     * Called when a reader is closed. Should clean up reader-specific data.
+     */
+    void closeReader();
+
+    public interface BufferHolder
+    {
+        /**
+         * Returns a usable buffer (i.e. one whose position and limit can be freely modified). Its limit will be set
+         * to the size of the available data in the buffer.
+         * The buffer must be treated as read-only.
+         */
+        ByteBuffer buffer();
+
+        /**
+         * Position in the file of the start of the buffer.
+         */
+        long offset();
+
+        /**
+         * To be called when this buffer is no longer in use. Must be called for all BufferHolders, or ChunkCache
+         * will not be able to free blocks.
+         */
+        void release();
+    }
+
+    static final BufferHolder EMPTY = new BufferHolder()
+    {
+        final ByteBuffer EMPTY_BUFFER = ByteBuffer.allocate(0);
+
+        @Override
+        public ByteBuffer buffer()
+        {
+            return EMPTY_BUFFER;
+        }
+
+        @Override
+        public long offset()
+        {
+            return 0;
+        }
+
+        @Override
+        public void release()
+        {
+            // nothing to do
+        }
+    };
+}
\ No newline at end of file
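A minimal sketch of the rebuffer/release contract described above ('rebufferer' is assumed to come from a RebuffererFactory, and 'position' must lie within the file):

    static byte byteAt(Rebufferer rebufferer, long position)
    {
        Rebufferer.BufferHolder bh = rebufferer.rebuffer(position);
        try
        {
            // The holder guarantees offset() <= position < offset() + buffer().limit() (unless at EOF).
            return bh.buffer().get((int) (position - bh.offset()));
        }
        finally
        {
            bh.release(); // must always be called so a cache-backed rebufferer can free the chunk
        }
    }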
diff --git a/src/java/org/apache/cassandra/io/util/RebuffererFactory.java b/src/java/org/apache/cassandra/io/util/RebuffererFactory.java
new file mode 100644
index 0000000..ec35f0b
--- /dev/null
+++ b/src/java/org/apache/cassandra/io/util/RebuffererFactory.java
@@ -0,0 +1,32 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.io.util;
+
+/**
+ * Interface for the classes that can be used to instantiate rebufferers over a given file.
+ *
+ * These are one of two types:
+ *  - chunk sources (e.g. SimpleChunkReader) which instantiate a buffer managing rebufferer referencing
+ *    themselves.
+ *  - thread-safe shared rebufferers (e.g. MmapRebufferer) which directly return themselves.
+ */
+public interface RebuffererFactory extends ReaderFileProxy
+{
+    Rebufferer instantiateRebufferer();
+}
diff --git a/src/java/org/apache/cassandra/io/util/RebufferingInputStream.java b/src/java/org/apache/cassandra/io/util/RebufferingInputStream.java
index 15d0975..6745526 100644
--- a/src/java/org/apache/cassandra/io/util/RebufferingInputStream.java
+++ b/src/java/org/apache/cassandra/io/util/RebufferingInputStream.java
@@ -275,16 +275,4 @@
             return -1;
         }
     }
-
-    @Override
-    public void reset() throws IOException
-    {
-        throw new IOException("mark/reset not supported");
-    }
-
-    @Override
-    public boolean markSupported()
-    {
-        return false;
-    }
 }
diff --git a/src/java/org/apache/cassandra/io/util/RewindableDataInputStreamPlus.java b/src/java/org/apache/cassandra/io/util/RewindableDataInputStreamPlus.java
index 3a680f4..ad2e8bd 100644
--- a/src/java/org/apache/cassandra/io/util/RewindableDataInputStreamPlus.java
+++ b/src/java/org/apache/cassandra/io/util/RewindableDataInputStreamPlus.java
@@ -33,19 +33,19 @@
  * Adds mark/reset functionality to another input stream by caching read bytes to a memory buffer and
  * spilling to disk if necessary.
  *
- * When the stream is marked via {@link this#mark()} or {@link this#mark(int)}, up to
+ * When the stream is marked via {@link #mark()} or {@link #mark(int)}, up to
  * <code>maxMemBufferSize</code> will be cached in memory (heap). If more than
  * <code>maxMemBufferSize</code> bytes are read while the stream is marked, the
  * following bytes are cached on the <code>spillFile</code> for up to <code>maxDiskBufferSize</code>.
  *
- * Please note that successive calls to {@link this#mark()} and {@link this#reset()} will write
+ * Please note that successive calls to {@link #mark()} and {@link #reset()} will write
  * sequentially to the same <code>spillFile</code> until <code>maxDiskBufferSize</code> is reached.
  * At this point, if less than <code>maxDiskBufferSize</code> bytes are currently cached on the
  * <code>spillFile</code>, the remaining bytes are written to the beginning of the file,
  * treating the <code>spillFile</code> as a circular buffer.
  *
  * If more than <code>maxMemBufferSize + maxDiskBufferSize</code> are cached while the stream is marked,
- * the following {@link this#reset()} invocation will throw a {@link IllegalStateException}.
+ * the following {@link #reset()} invocation will throw an {@link IllegalStateException}.
  *
  */
 public class RewindableDataInputStreamPlus extends FilterInputStream implements RewindableDataInput, Closeable
@@ -83,7 +83,7 @@
     /* RewindableDataInput methods */
 
     /**
-     * Marks the current position of a stream to return to this position later via the {@link this#reset(DataPosition)} method.
+     * Marks the current position of a stream to return to this position later via the {@link #reset(DataPosition)} method.
      * @return An empty {@link DataPosition} object
      */
     public DataPosition mark()
@@ -93,7 +93,7 @@
     }
 
     /**
-     * Rewinds to the previously marked position via the {@link this#mark()} method.
+     * Rewinds to the previously marked position via the {@link #mark()} method.
      * @param mark it's not possible to return to a custom position, so this parameter is ignored.
      * @throws IOException if an error occurs while resetting
      */
@@ -121,7 +121,7 @@
 
     /**
      * Marks the current position of a stream to return to this position
-     * later via the {@link this#reset()} method.
+     * later via the {@link #reset()} method.
      * @param readlimit the maximum amount of bytes to cache
      */
     public synchronized void mark(int readlimit)
@@ -277,7 +277,7 @@
             if (len > 0 && diskTailAvailable > 0)
             {
                 int readFromTail = diskTailAvailable < len? diskTailAvailable : len;
-                getIfNotClosed(spillBuffer).read(b, off, readFromTail);
+                readFromTail = getIfNotClosed(spillBuffer).read(b, off, readFromTail);
                 readBytes += readFromTail;
                 diskTailAvailable -= readFromTail;
                 off += readFromTail;
@@ -288,7 +288,7 @@
             if (len > 0 && diskHeadAvailable > 0)
             {
                 int readFromHead = diskHeadAvailable < len? diskHeadAvailable : len;
-                getIfNotClosed(spillBuffer).read(b, off, readFromHead);
+                readFromHead = getIfNotClosed(spillBuffer).read(b, off, readFromHead);
                 readBytes += readFromHead;
                 diskHeadAvailable -= readFromHead;
                 off += readFromHead;
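A small usage sketch of the mark/reset behaviour documented above ('in' is an assumed, already-constructed stream):

    // Peek at the next byte without consuming it, replaying from the in-memory/spill cache on reset.
    static int peek(RewindableDataInputStreamPlus in) throws IOException
    {
        DataPosition pos = in.mark();
        int next = in.read();
        in.reset(pos);
        return next;
    }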
diff --git a/src/java/org/apache/cassandra/io/util/SafeMemory.java b/src/java/org/apache/cassandra/io/util/SafeMemory.java
index e8cd54f..4482d96 100644
--- a/src/java/org/apache/cassandra/io/util/SafeMemory.java
+++ b/src/java/org/apache/cassandra/io/util/SafeMemory.java
@@ -84,7 +84,7 @@
             this.size = size;
         }
 
-        public void tidy() throws Exception
+        public void tidy()
         {
             /** see {@link Memory#Memory(long)} re: null pointers*/
             if (peer != 0)
diff --git a/src/java/org/apache/cassandra/io/util/SafeMemoryWriter.java b/src/java/org/apache/cassandra/io/util/SafeMemoryWriter.java
index 24eb93c..88912f9 100644
--- a/src/java/org/apache/cassandra/io/util/SafeMemoryWriter.java
+++ b/src/java/org/apache/cassandra/io/util/SafeMemoryWriter.java
@@ -33,7 +33,7 @@
 
     private SafeMemoryWriter(SafeMemory memory)
     {
-        super(tailBuffer(memory).order(ByteOrder.BIG_ENDIAN));
+        super(tailBuffer(memory).order(ByteOrder.BIG_ENDIAN), null);
         this.memory = memory;
     }
 
diff --git a/src/java/org/apache/cassandra/io/util/SegmentedFile.java b/src/java/org/apache/cassandra/io/util/SegmentedFile.java
index ab2d291..62e14ba 100644
--- a/src/java/org/apache/cassandra/io/util/SegmentedFile.java
+++ b/src/java/org/apache/cassandra/io/util/SegmentedFile.java
@@ -21,7 +21,6 @@
 import java.io.DataOutput;
 import java.io.File;
 import java.io.IOException;
-import java.util.function.Supplier;
 
 import com.google.common.util.concurrent.RateLimiter;
 
@@ -52,27 +51,21 @@
 public abstract class SegmentedFile extends SharedCloseableImpl
 {
     public final ChannelProxy channel;
-    public final int bufferSize;
-    public final long length;
 
     // This differs from length for compressed files (but we still need length for
     // SegmentIterator because offsets in the file are relative to the uncompressed size)
     public final long onDiskLength;
 
     /**
-     * Use getBuilder to get a Builder to construct a SegmentedFile.
+     * Rebufferer to use to construct RandomAccessReaders.
      */
-    SegmentedFile(Cleanup cleanup, ChannelProxy channel, int bufferSize, long length)
-    {
-        this(cleanup, channel, bufferSize, length, length);
-    }
+    private final RebuffererFactory rebufferer;
 
-    protected SegmentedFile(Cleanup cleanup, ChannelProxy channel, int bufferSize, long length, long onDiskLength)
+    protected SegmentedFile(Cleanup cleanup, ChannelProxy channel, RebuffererFactory rebufferer, long onDiskLength)
     {
         super(cleanup);
+        this.rebufferer = rebufferer;
         this.channel = channel;
-        this.bufferSize = bufferSize;
-        this.length = length;
         this.onDiskLength = onDiskLength;
     }
 
@@ -80,8 +73,7 @@
     {
         super(copy);
         channel = copy.channel;
-        bufferSize = copy.bufferSize;
-        length = copy.length;
+        rebufferer = copy.rebufferer;
         onDiskLength = copy.onDiskLength;
     }
 
@@ -90,12 +82,24 @@
         return channel.filePath();
     }
 
+    public long dataLength()
+    {
+        return rebufferer.fileLength();
+    }
+
+    public RebuffererFactory rebuffererFactory()
+    {
+        return rebufferer;
+    }
+
     protected static class Cleanup implements RefCounted.Tidy
     {
         final ChannelProxy channel;
-        protected Cleanup(ChannelProxy channel)
+        final ReaderFileProxy rebufferer;
+        protected Cleanup(ChannelProxy channel, ReaderFileProxy rebufferer)
         {
             this.channel = channel;
+            this.rebufferer = rebufferer;
         }
 
         public String name()
@@ -105,7 +109,14 @@
 
         public void tidy()
         {
-            channel.close();
+            try
+            {
+                channel.close();
+            }
+            finally
+            {
+                rebufferer.close();
+            }
         }
     }
 
@@ -113,19 +124,12 @@
 
     public RandomAccessReader createReader()
     {
-        return new RandomAccessReader.Builder(channel)
-               .overrideLength(length)
-               .bufferSize(bufferSize)
-               .build();
+        return RandomAccessReader.build(this, null);
     }
 
     public RandomAccessReader createReader(RateLimiter limiter)
     {
-        return new RandomAccessReader.Builder(channel)
-               .overrideLength(length)
-               .bufferSize(bufferSize)
-               .limiter(limiter)
-               .build();
+        return RandomAccessReader.build(this, limiter);
     }
 
     public FileDataInput createReader(long position)
@@ -308,7 +312,7 @@
     @Override
     public String toString() {
         return getClass().getSimpleName() + "(path='" + path() + '\'' +
-               ", length=" + length +
+               ", length=" + rebufferer.fileLength() +
                ')';
-}
+    }
 }
diff --git a/src/java/org/apache/cassandra/io/util/SequentialWriter.java b/src/java/org/apache/cassandra/io/util/SequentialWriter.java
index 26316a2..e71f2fa 100644
--- a/src/java/org/apache/cassandra/io/util/SequentialWriter.java
+++ b/src/java/org/apache/cassandra/io/util/SequentialWriter.java
@@ -17,32 +17,24 @@
  */
 package org.apache.cassandra.io.util;
 
-import java.io.*;
+import java.io.File;
+import java.io.IOException;
 import java.nio.channels.FileChannel;
 import java.nio.file.StandardOpenOption;
 
-import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.io.FSReadError;
 import org.apache.cassandra.io.FSWriteError;
-import org.apache.cassandra.io.compress.BufferType;
-import org.apache.cassandra.io.compress.CompressedSequentialWriter;
-import org.apache.cassandra.schema.CompressionParams;
-import org.apache.cassandra.io.sstable.Descriptor;
-import org.apache.cassandra.io.sstable.metadata.MetadataCollector;
+import org.apache.cassandra.utils.SyncUtil;
 import org.apache.cassandra.utils.concurrent.Transactional;
 
 import static org.apache.cassandra.utils.Throwables.merge;
 
-import org.apache.cassandra.utils.SyncUtil;
-
 /**
  * Adds buffering, mark, and fsyncing to OutputStream.  We always fsync on close; we may also
  * fsync incrementally if Config.trickle_fsync is enabled.
  */
 public class SequentialWriter extends BufferedDataOutputStreamPlus implements Transactional
 {
-    private static final int DEFAULT_BUFFER_SIZE = 64 * 1024;
-
     // absolute path to the given file
     private final String filePath;
 
@@ -53,8 +45,7 @@
 
     // whether to do trickling fsync() to avoid sudden bursts of dirty buffer flushing by kernel causing read
     // latency spikes
-    private boolean trickleFsync;
-    private int trickleFsyncByteInterval;
+    private final SequentialWriterOption option;
     private int bytesSinceTrickleFsync = 0;
 
     protected long lastFlushOffset;
@@ -62,8 +53,6 @@
     protected Runnable runPostFlush;
 
     private final TransactionalProxy txnProxy = txnProxy();
-    private boolean finishOnClose;
-    protected Descriptor descriptor;
 
     // due to lack of multiple-inheritance, we proxy our transactional implementation
     protected class TransactionalProxy extends AbstractTransactional
@@ -102,7 +91,8 @@
     }
 
     // TODO: we should specify as a parameter if we permit an existing file or not
-    private static FileChannel openChannel(File file) {
+    private static FileChannel openChannel(File file)
+    {
         try
         {
             if (file.exists())
@@ -130,43 +120,38 @@
         }
     }
 
-    public SequentialWriter(File file, int bufferSize, BufferType bufferType)
+    /**
+     * Create a heap-based, non-compressed SequentialWriter with the default buffer size (64k).
+     *
+     * @param file File to write
+     */
+    public SequentialWriter(File file)
     {
-        super(openChannel(file), bufferType.allocate(bufferSize));
+        this(file, SequentialWriterOption.DEFAULT);
+    }
+
+    /**
+     * Create SequentialWriter for given file with specific writer option.
+     *
+     * @param file File to write
+     * @param option Writer option
+     */
+    public SequentialWriter(File file, SequentialWriterOption option)
+    {
+        super(openChannel(file), option.allocateBuffer());
         strictFlushing = true;
         fchannel = (FileChannel)channel;
 
         filePath = file.getAbsolutePath();
 
-        this.trickleFsync = DatabaseDescriptor.getTrickleFsync();
-        this.trickleFsyncByteInterval = DatabaseDescriptor.getTrickleFsyncIntervalInKb() * 1024;
+        this.option = option;
     }
 
-    /**
-     * Open a heap-based, non-compressed SequentialWriter
-     */
-    public static SequentialWriter open(File file)
+    public void skipBytes(int numBytes) throws IOException
     {
-        return new SequentialWriter(file, DEFAULT_BUFFER_SIZE, BufferType.ON_HEAP);
-    }
-
-    public static ChecksummedSequentialWriter open(File file, File crcPath)
-    {
-        return new ChecksummedSequentialWriter(file, DEFAULT_BUFFER_SIZE, crcPath);
-    }
-
-    public static CompressedSequentialWriter open(String dataFilePath,
-                                                  String offsetsPath,
-                                                  CompressionParams parameters,
-                                                  MetadataCollector sstableMetadataCollector)
-    {
-        return new CompressedSequentialWriter(new File(dataFilePath), offsetsPath, parameters, sstableMetadataCollector);
-    }
-
-    public SequentialWriter finishOnClose()
-    {
-        finishOnClose = true;
-        return this;
+        flush();
+        fchannel.position(fchannel.position() + numBytes);
+        bufferOffset = fchannel.position();
     }
 
     /**
@@ -205,10 +190,10 @@
     {
         flushData();
 
-        if (trickleFsync)
+        if (option.trickleFsync())
         {
             bytesSinceTrickleFsync += buffer.position();
-            if (bytesSinceTrickleFsync >= trickleFsyncByteInterval)
+            if (bytesSinceTrickleFsync >= option.trickleFsyncByteInterval())
             {
                 syncDataOnlyInternal();
                 bytesSinceTrickleFsync = 0;
@@ -269,6 +254,11 @@
         return position();
     }
 
+    public long getEstimatedOnDiskBytesWritten()
+    {
+        return getOnDiskFilePointer();
+    }
+
     public long length()
     {
         try
@@ -336,6 +326,7 @@
             throw new FSReadError(e, getPath());
         }
 
+        bufferOffset = truncateTarget;
         resetBuffer();
     }
 
@@ -349,6 +340,7 @@
         try
         {
             fchannel.truncate(toSize);
+            lastFlushOffset = toSize;
         }
         catch (IOException e)
         {
@@ -361,12 +353,6 @@
         return channel.isOpen();
     }
 
-    public SequentialWriter setDescriptor(Descriptor descriptor)
-    {
-        this.descriptor = descriptor;
-        return this;
-    }
-
     public final void prepareToCommit()
     {
         txnProxy.prepareToCommit();
@@ -385,7 +371,7 @@
     @Override
     public final void close()
     {
-        if (finishOnClose)
+        if (option.finishOnClose())
             txnProxy.finish();
         else
             txnProxy.close();
diff --git a/src/java/org/apache/cassandra/io/util/SequentialWriterOption.java b/src/java/org/apache/cassandra/io/util/SequentialWriterOption.java
new file mode 100644
index 0000000..61f375b
--- /dev/null
+++ b/src/java/org/apache/cassandra/io/util/SequentialWriterOption.java
@@ -0,0 +1,154 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.io.util;
+
+import java.nio.ByteBuffer;
+import java.util.Objects;
+
+import org.apache.cassandra.io.compress.BufferType;
+
+/**
+ * SequentialWriter option
+ */
+public class SequentialWriterOption
+{
+    /**
+     * Default write option.
+     *
+     * <ul>
+     *   <li>buffer size: 64 KB
+     *   <li>buffer type: on heap
+     *   <li>trickle fsync: false
+     *   <li>trickle fsync byte interval: 10 MB
+     *   <li>finish on close: false
+     * </ul>
+     */
+    public static final SequentialWriterOption DEFAULT = SequentialWriterOption.newBuilder().build();
+
+    private final int bufferSize;
+    private final BufferType bufferType;
+    private final boolean trickleFsync;
+    private final int trickleFsyncByteInterval;
+    private final boolean finishOnClose;
+
+    private SequentialWriterOption(int bufferSize,
+                                   BufferType bufferType,
+                                   boolean trickleFsync,
+                                   int trickleFsyncByteInterval,
+                                   boolean finishOnClose)
+    {
+        this.bufferSize = bufferSize;
+        this.bufferType = bufferType;
+        this.trickleFsync = trickleFsync;
+        this.trickleFsyncByteInterval = trickleFsyncByteInterval;
+        this.finishOnClose = finishOnClose;
+    }
+
+    public static Builder newBuilder()
+    {
+        return new Builder();
+    }
+
+    public int bufferSize()
+    {
+        return bufferSize;
+    }
+
+    public BufferType bufferType()
+    {
+        return bufferType;
+    }
+
+    public boolean trickleFsync()
+    {
+        return trickleFsync;
+    }
+
+    public int trickleFsyncByteInterval()
+    {
+        return trickleFsyncByteInterval;
+    }
+
+    public boolean finishOnClose()
+    {
+        return finishOnClose;
+    }
+
+    /**
+     * Allocate buffer using set buffer type and buffer size.
+     *
+     * @return allocated ByteBuffer
+     */
+    public ByteBuffer allocateBuffer()
+    {
+        return bufferType.allocate(bufferSize);
+    }
+
+    public static class Builder
+    {
+        /* default buffer size: 64k */
+        private int bufferSize = 64 * 1024;
+        /* default buffer type: on heap */
+        private BufferType bufferType = BufferType.ON_HEAP;
+        /* default: no trickle fsync */
+        private boolean trickleFsync = false;
+        /* default trickle fsync byte interval: 10MB */
+        private int trickleFsyncByteInterval = 10 * 1024 * 1024;
+        private boolean finishOnClose = false;
+
+        /* construct through SequentialWriterOption.newBuilder */
+        private Builder() {}
+
+        public SequentialWriterOption build()
+        {
+            return new SequentialWriterOption(bufferSize, bufferType, trickleFsync,
+                                              trickleFsyncByteInterval, finishOnClose);
+        }
+
+        public Builder bufferSize(int bufferSize)
+        {
+            this.bufferSize = bufferSize;
+            return this;
+        }
+
+        public Builder bufferType(BufferType bufferType)
+        {
+            this.bufferType = Objects.requireNonNull(bufferType);
+            return this;
+        }
+
+        public Builder trickleFsync(boolean trickleFsync)
+        {
+            this.trickleFsync = trickleFsync;
+            return this;
+        }
+
+        public Builder trickleFsyncByteInterval(int trickleFsyncByteInterval)
+        {
+            this.trickleFsyncByteInterval = trickleFsyncByteInterval;
+            return this;
+        }
+
+        public Builder finishOnClose(boolean finishOnClose)
+        {
+            this.finishOnClose = finishOnClose;
+            return this;
+        }
+    }
+}
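
For reference, a minimal usage sketch of the new option builder together with the SequentialWriter(File, SequentialWriterOption) constructor added above; the settings and the file path are illustrative only, and the sketch compiles only inside the Cassandra source tree:

    import java.io.File;

    import org.apache.cassandra.io.compress.BufferType;
    import org.apache.cassandra.io.util.SequentialWriter;
    import org.apache.cassandra.io.util.SequentialWriterOption;

    public class SequentialWriterOptionExample
    {
        public static void main(String[] args) throws Exception
        {
            // Illustrative settings; anything not set falls back to the builder defaults.
            SequentialWriterOption option = SequentialWriterOption.newBuilder()
                                                                  .bufferSize(4 * 1024)
                                                                  .bufferType(BufferType.OFF_HEAP)
                                                                  .trickleFsync(true)
                                                                  .trickleFsyncByteInterval(1024 * 1024)
                                                                  .finishOnClose(true)
                                                                  .build();
            // finishOnClose(true) makes close() run the transactional finish path instead of a plain close.
            try (SequentialWriter writer = new SequentialWriter(new File("/tmp/example.db"), option))
            {
                writer.writeInt(42); // buffered write; the data is fsynced when the writer is finished
            }
        }
    }
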
diff --git a/src/java/org/apache/cassandra/io/util/SimpleChunkReader.java b/src/java/org/apache/cassandra/io/util/SimpleChunkReader.java
new file mode 100644
index 0000000..7bfb57b
--- /dev/null
+++ b/src/java/org/apache/cassandra/io/util/SimpleChunkReader.java
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.io.util;
+
+import java.nio.ByteBuffer;
+
+import org.apache.cassandra.io.compress.BufferType;
+
+class SimpleChunkReader extends AbstractReaderFileProxy implements ChunkReader
+{
+    private final int bufferSize;
+    private final BufferType bufferType;
+
+    public SimpleChunkReader(ChannelProxy channel, long fileLength, BufferType bufferType, int bufferSize)
+    {
+        super(channel, fileLength);
+        this.bufferSize = bufferSize;
+        this.bufferType = bufferType;
+    }
+
+    @Override
+    public void readChunk(long position, ByteBuffer buffer)
+    {
+        buffer.clear();
+        channel.read(buffer, position);
+        buffer.flip();
+    }
+
+    @Override
+    public int chunkSize()
+    {
+        return bufferSize;
+    }
+
+    @Override
+    public BufferType preferredBufferType()
+    {
+        return bufferType;
+    }
+
+    @Override
+    public boolean alignmentRequired()
+    {
+        return false;
+    }
+
+    @Override
+    public Rebufferer instantiateRebufferer()
+    {
+        return BufferManagingRebufferer.on(this);
+    }
+
+    @Override
+    public String toString()
+    {
+        return String.format("%s(%s - chunk length %d, data length %d)",
+                             getClass().getSimpleName(),
+                             channel.filePath(),
+                             bufferSize,
+                             fileLength());
+    }
+}
\ No newline at end of file
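
The readChunk implementation above follows the standard clear/read/flip idiom for positional reads. A self-contained sketch of that idiom with plain java.nio (independent of ChannelProxy), including the short-read loop a bare FileChannel needs; the names are illustrative:

    import java.nio.ByteBuffer;
    import java.nio.channels.FileChannel;
    import java.nio.file.Paths;
    import java.nio.file.StandardOpenOption;

    public class PositionalChunkRead
    {
        // Read up to chunkSize bytes starting at the given file position.
        static ByteBuffer readChunk(FileChannel channel, long position, int chunkSize) throws Exception
        {
            ByteBuffer buffer = ByteBuffer.allocate(chunkSize);
            buffer.clear();                                   // prepare the buffer for filling
            while (buffer.hasRemaining())
            {
                int read = channel.read(buffer, position + buffer.position());
                if (read < 0)
                    break;                                    // end of file before the chunk was full
            }
            buffer.flip();                                    // prepare the buffer for consumption
            return buffer;
        }

        public static void main(String[] args) throws Exception
        {
            try (FileChannel channel = FileChannel.open(Paths.get(args[0]), StandardOpenOption.READ))
            {
                System.out.println(readChunk(channel, 0, 4096).remaining() + " bytes read");
            }
        }
    }
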
diff --git a/src/java/org/apache/cassandra/locator/DynamicEndpointSnitch.java b/src/java/org/apache/cassandra/locator/DynamicEndpointSnitch.java
index 6280dc2..70aecb0 100644
--- a/src/java/org/apache/cassandra/locator/DynamicEndpointSnitch.java
+++ b/src/java/org/apache/cassandra/locator/DynamicEndpointSnitch.java
@@ -255,7 +255,7 @@
 
     private void updateScores() // this is expensive
     {
-        if (!StorageService.instance.isInitialized()) 
+        if (!StorageService.instance.isGossipActive())
             return;
         if (!registered)
         {
diff --git a/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java b/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java
index 7c8d95e..756b689 100644
--- a/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java
+++ b/src/java/org/apache/cassandra/locator/NetworkTopologyStrategy.java
@@ -28,6 +28,7 @@
 import org.apache.cassandra.dht.Token;
 import org.apache.cassandra.locator.TokenMetadata.Topology;
 import org.apache.cassandra.utils.FBUtilities;
+import org.apache.cassandra.utils.Pair;
 
 import com.google.common.collect.Multimap;
 
@@ -48,14 +49,12 @@
  */
 public class NetworkTopologyStrategy extends AbstractReplicationStrategy
 {
-    private final IEndpointSnitch snitch;
     private final Map<String, Integer> datacenters;
     private static final Logger logger = LoggerFactory.getLogger(NetworkTopologyStrategy.class);
 
     public NetworkTopologyStrategy(String keyspaceName, TokenMetadata tokenMetadata, IEndpointSnitch snitch, Map<String, String> configOptions) throws ConfigurationException
     {
         super(keyspaceName, tokenMetadata, snitch, configOptions);
-        this.snitch = snitch;
 
         Map<String, Integer> newDatacenters = new HashMap<String, Integer>();
         if (configOptions != null)
@@ -75,17 +74,78 @@
     }
 
     /**
-     * calculate endpoints in one pass through the tokens by tracking our progress in each DC, rack etc.
+     * Endpoint adder applying the replication rules for a given DC.
      */
-    @SuppressWarnings("serial")
+    private static final class DatacenterEndpoints
+    {
+        /** Set that accepted endpoints get pushed into. */
+        Set<InetAddress> endpoints;
+        /**
+         * Racks encountered so far. Replicas are put into separate racks where possible.
+         * For efficiency the set is shared between the instances, using the location pair (dc, rack) to make sure
+         * clashing names aren't a problem.
+         */
+        Set<Pair<String, String>> racks;
+
+        /** Number of replicas left to fill from this DC. */
+        int rfLeft;
+        int acceptableRackRepeats;
+
+        DatacenterEndpoints(int rf, int rackCount, int nodeCount, Set<InetAddress> endpoints, Set<Pair<String, String>> racks)
+        {
+            this.endpoints = endpoints;
+            this.racks = racks;
+            // If there aren't enough nodes in this DC to fill the RF, the number of nodes is the effective RF.
+            this.rfLeft = Math.min(rf, nodeCount);
+            // If there aren't enough racks in this DC to fill the RF, we'll still use at least one node from each rack,
+            // and the difference is to be filled by the first encountered nodes.
+            acceptableRackRepeats = rf - rackCount;
+        }
+
+        /**
+         * Attempts to add an endpoint to the replicas for this datacenter, adding to the endpoints set if successful.
+         * Returns true if the endpoint was added, and this datacenter does not require further replicas.
+         */
+        boolean addEndpointAndCheckIfDone(InetAddress ep, Pair<String,String> location)
+        {
+            if (done())
+                return false;
+
+            if (racks.add(location))
+            {
+                // New rack.
+                --rfLeft;
+                boolean added = endpoints.add(ep);
+                assert added;
+                return done();
+            }
+            if (acceptableRackRepeats <= 0)
+                // There must be rfLeft distinct racks left, do not add any more rack repeats.
+                return false;
+            if (!endpoints.add(ep))
+                // Cannot repeat a node.
+                return false;
+            // Added a node from an already-used rack to match the RF when there aren't enough racks.
+            --acceptableRackRepeats;
+            --rfLeft;
+            return done();
+        }
+
+        boolean done()
+        {
+            assert rfLeft >= 0;
+            return rfLeft == 0;
+        }
+    }
+
+    /**
+     * calculate endpoints in one pass through the tokens by tracking our progress in each DC.
+     */
     public List<InetAddress> calculateNaturalEndpoints(Token searchToken, TokenMetadata tokenMetadata)
     {
         // we want to preserve insertion order so that the first added endpoint becomes primary
         Set<InetAddress> replicas = new LinkedHashSet<>();
-        // replicas we have found in each DC
-        Map<String, Set<InetAddress>> dcReplicas = new HashMap<>(datacenters.size());
-        for (Map.Entry<String, Integer> dc : datacenters.entrySet())
-            dcReplicas.put(dc.getKey(), new HashSet<InetAddress>(dc.getValue()));
+        Set<Pair<String, String>> seenRacks = new HashSet<>();
 
         Topology topology = tokenMetadata.getTopology();
         // all endpoints in each DC, so we can check when we have exhausted all the members of a DC
@@ -94,74 +154,45 @@
         Map<String, Multimap<String, InetAddress>> racks = topology.getDatacenterRacks();
         assert !allEndpoints.isEmpty() && !racks.isEmpty() : "not aware of any cluster members";
 
-        // tracks the racks we have already placed replicas in
-        Map<String, Set<String>> seenRacks = new HashMap<>(datacenters.size());
-        for (Map.Entry<String, Integer> dc : datacenters.entrySet())
-            seenRacks.put(dc.getKey(), new HashSet<String>());
+        int dcsToFill = 0;
+        Map<String, DatacenterEndpoints> dcs = new HashMap<>(datacenters.size() * 2);
 
-        // tracks the endpoints that we skipped over while looking for unique racks
-        // when we relax the rack uniqueness we can append this to the current result so we don't have to wind back the iterator
-        Map<String, Set<InetAddress>> skippedDcEndpoints = new HashMap<>(datacenters.size());
-        for (Map.Entry<String, Integer> dc : datacenters.entrySet())
-            skippedDcEndpoints.put(dc.getKey(), new LinkedHashSet<InetAddress>());
+        // Create a DatacenterEndpoints object for each non-empty DC.
+        for (Map.Entry<String, Integer> en : datacenters.entrySet())
+        {
+            String dc = en.getKey();
+            int rf = en.getValue();
+            int nodeCount = sizeOrZero(allEndpoints.get(dc));
+
+            if (rf <= 0 || nodeCount <= 0)
+                continue;
+
+            DatacenterEndpoints dcEndpoints = new DatacenterEndpoints(rf, sizeOrZero(racks.get(dc)), nodeCount, replicas, seenRacks);
+            dcs.put(dc, dcEndpoints);
+            ++dcsToFill;
+        }
 
         Iterator<Token> tokenIter = TokenMetadata.ringIterator(tokenMetadata.sortedTokens(), searchToken, false);
-        while (tokenIter.hasNext() && !hasSufficientReplicas(dcReplicas, allEndpoints))
+        while (dcsToFill > 0 && tokenIter.hasNext())
         {
             Token next = tokenIter.next();
             InetAddress ep = tokenMetadata.getEndpoint(next);
-            String dc = snitch.getDatacenter(ep);
-            // have we already found all replicas for this dc?
-            if (!datacenters.containsKey(dc) || hasSufficientReplicas(dc, dcReplicas, allEndpoints))
-                continue;
-            // can we skip checking the rack?
-            if (seenRacks.get(dc).size() == racks.get(dc).keySet().size())
-            {
-                dcReplicas.get(dc).add(ep);
-                replicas.add(ep);
-            }
-            else
-            {
-                String rack = snitch.getRack(ep);
-                // is this a new rack?
-                if (seenRacks.get(dc).contains(rack))
-                {
-                    skippedDcEndpoints.get(dc).add(ep);
-                }
-                else
-                {
-                    dcReplicas.get(dc).add(ep);
-                    replicas.add(ep);
-                    seenRacks.get(dc).add(rack);
-                    // if we've run out of distinct racks, add the hosts we skipped past already (up to RF)
-                    if (seenRacks.get(dc).size() == racks.get(dc).keySet().size())
-                    {
-                        Iterator<InetAddress> skippedIt = skippedDcEndpoints.get(dc).iterator();
-                        while (skippedIt.hasNext() && !hasSufficientReplicas(dc, dcReplicas, allEndpoints))
-                        {
-                            InetAddress nextSkipped = skippedIt.next();
-                            dcReplicas.get(dc).add(nextSkipped);
-                            replicas.add(nextSkipped);
-                        }
-                    }
-                }
-            }
+            Pair<String, String> location = topology.getLocation(ep);
+            DatacenterEndpoints dcEndpoints = dcs.get(location.left);
+            if (dcEndpoints != null && dcEndpoints.addEndpointAndCheckIfDone(ep, location))
+                --dcsToFill;
         }
-
-        return new ArrayList<InetAddress>(replicas);
+        return new ArrayList<>(replicas);
     }
 
-    private boolean hasSufficientReplicas(String dc, Map<String, Set<InetAddress>> dcReplicas, Multimap<String, InetAddress> allEndpoints)
+    private int sizeOrZero(Multimap<?, ?> collection)
     {
-        return dcReplicas.get(dc).size() >= Math.min(allEndpoints.get(dc).size(), getReplicationFactor(dc));
+        return collection != null ? collection.asMap().size() : 0;
     }
 
-    private boolean hasSufficientReplicas(Map<String, Set<InetAddress>> dcReplicas, Multimap<String, InetAddress> allEndpoints)
+    private int sizeOrZero(Collection<?> collection)
     {
-        for (String dc : datacenters.keySet())
-            if (!hasSufficientReplicas(dc, dcReplicas, allEndpoints))
-                return false;
-        return true;
+        return collection != null ? collection.size() : 0;
     }
 
     public int getReplicationFactor()
@@ -193,12 +224,6 @@
         }
     }
 
-    public Collection<String> recognizedOptions()
-    {
-        // We explicitely allow all options
-        return null;
-    }
-
     @Override
     public boolean hasSameSettings(AbstractReplicationStrategy other)
     {
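
The rewritten calculateNaturalEndpoints walks the ring once and delegates the accept/reject decision to one DatacenterEndpoints per DC: a replica from a new rack is always accepted, while a repeated rack is accepted only while rf - rackCount repeats remain. A toy re-implementation of just that rule, for one DC with rf = 3, two racks and three nodes (illustrative only, not the Cassandra class):

    import java.util.HashSet;
    import java.util.LinkedHashSet;
    import java.util.Set;

    public class RackFirstDemo
    {
        static int rfLeft = 3;                // min(rf, nodeCount) with rf = 3, nodeCount = 3
        static int acceptableRackRepeats = 1; // rf - rackCount = 3 - 2
        static Set<String> seenRacks = new HashSet<>();
        static Set<String> replicas = new LinkedHashSet<>();

        // Mirrors the shape of addEndpointAndCheckIfDone: returns true when this DC needs no more replicas.
        static boolean add(String node, String rack)
        {
            if (rfLeft == 0)
                return false;
            if (seenRacks.add(rack))          // first replica from this rack: always accept
            {
                rfLeft--;
                replicas.add(node);
                return rfLeft == 0;
            }
            if (acceptableRackRepeats <= 0 || !replicas.add(node))
                return false;                 // keep room for unseen racks; never repeat a node
            acceptableRackRepeats--;
            rfLeft--;
            return rfLeft == 0;
        }

        public static void main(String[] args)
        {
            add("10.0.0.1", "rack1");
            add("10.0.0.2", "rack1");         // rack repeat, allowed because rf > rackCount
            add("10.0.0.3", "rack2");
            System.out.println(replicas);     // [10.0.0.1, 10.0.0.2, 10.0.0.3]
        }
    }
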
diff --git a/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java b/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
index 3277af7..6b6182f 100644
--- a/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
+++ b/src/java/org/apache/cassandra/locator/ReconnectableSnitchHelper.java
@@ -80,7 +80,7 @@
 
     public void onChange(InetAddress endpoint, ApplicationState state, VersionedValue value)
     {
-        if (preferLocal && !Gossiper.instance.isDeadState(Gossiper.instance.getEndpointStateForEndpoint(endpoint)) && state == ApplicationState.INTERNAL_IP)
+        if (preferLocal && state == ApplicationState.INTERNAL_IP && !Gossiper.instance.isDeadState(Gossiper.instance.getEndpointStateForEndpoint(endpoint)))
             reconnect(endpoint, value);
     }
 
diff --git a/src/java/org/apache/cassandra/locator/TokenMetadata.java b/src/java/org/apache/cassandra/locator/TokenMetadata.java
index 97c5f10..8e43016 100644
--- a/src/java/org/apache/cassandra/locator/TokenMetadata.java
+++ b/src/java/org/apache/cassandra/locator/TokenMetadata.java
@@ -126,7 +126,7 @@
     }
 
     /**
-     * To be used by tests only (via {@link StorageService.setPartitionerUnsafe}).
+     * To be used by tests only (via {@link org.apache.cassandra.service.StorageService#setPartitionerUnsafe}).
      */
     @VisibleForTesting
     public TokenMetadata cloneWithNewPartitioner(IPartitioner newPartitioner)
@@ -869,18 +869,18 @@
 
     public Token getPredecessor(Token token)
     {
-        List tokens = sortedTokens();
+        List<Token> tokens = sortedTokens();
         int index = Collections.binarySearch(tokens, token);
         assert index >= 0 : token + " not found in " + StringUtils.join(tokenToEndpointMap.keySet(), ", ");
-        return (Token) (index == 0 ? tokens.get(tokens.size() - 1) : tokens.get(index - 1));
+        return index == 0 ? tokens.get(tokens.size() - 1) : tokens.get(index - 1);
     }
 
     public Token getSuccessor(Token token)
     {
-        List tokens = sortedTokens();
+        List<Token> tokens = sortedTokens();
         int index = Collections.binarySearch(tokens, token);
         assert index >= 0 : token + " not found in " + StringUtils.join(tokenToEndpointMap.keySet(), ", ");
-        return (Token) ((index == (tokens.size() - 1)) ? tokens.get(0) : tokens.get(index + 1));
+        return (index == (tokens.size() - 1)) ? tokens.get(0) : tokens.get(index + 1);
     }
 
     /** @return a copy of the bootstrapping tokens map */
@@ -941,7 +941,7 @@
         }
     }
 
-    public static int firstTokenIndex(final ArrayList ring, Token start, boolean insertMin)
+    public static int firstTokenIndex(final ArrayList<Token> ring, Token start, boolean insertMin)
     {
         assert ring.size() > 0;
         // insert the minimum token (at index == -1) if we were asked to include it and it isn't a member of the ring
@@ -969,7 +969,7 @@
     {
         if (ring.isEmpty())
             return includeMin ? Iterators.singletonIterator(start.getPartitioner().getMinimumToken())
-                              : Iterators.<Token>emptyIterator();
+                              : Collections.emptyIterator();
 
         final boolean insertMin = includeMin && !ring.get(0).isMinimum();
         final int startIndex = firstTokenIndex(ring, start, insertMin);
@@ -1307,5 +1307,14 @@
         {
             return dcRacks;
         }
+
+        /**
+         * @return The DC and rack of the given endpoint.
+         */
+        public Pair<String, String> getLocation(InetAddress addr)
+        {
+            return currentLocations.get(addr);
+        }
+
     }
 }
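
The getPredecessor/getSuccessor cleanups keep the same circular lookup: binary-search the sorted ring and wrap around at either end. A standalone sketch of that wraparound with plain longs standing in for Tokens (illustrative only):

    import java.util.Arrays;
    import java.util.Collections;
    import java.util.List;

    public class RingLookup
    {
        static long predecessor(List<Long> sortedRing, long token)
        {
            int index = Collections.binarySearch(sortedRing, token);
            assert index >= 0 : token + " not found in ring";
            return index == 0 ? sortedRing.get(sortedRing.size() - 1) : sortedRing.get(index - 1);
        }

        static long successor(List<Long> sortedRing, long token)
        {
            int index = Collections.binarySearch(sortedRing, token);
            assert index >= 0 : token + " not found in ring";
            return index == sortedRing.size() - 1 ? sortedRing.get(0) : sortedRing.get(index + 1);
        }

        public static void main(String[] args)
        {
            List<Long> ring = Arrays.asList(-100L, 0L, 100L);
            System.out.println(predecessor(ring, -100L)); // 100: the smallest token wraps to the largest
            System.out.println(successor(ring, 100L));    // -100: the largest token wraps to the smallest
        }
    }
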
diff --git a/src/java/org/apache/cassandra/metrics/CacheMetrics.java b/src/java/org/apache/cassandra/metrics/CacheMetrics.java
index 151268b..e623dcb 100644
--- a/src/java/org/apache/cassandra/metrics/CacheMetrics.java
+++ b/src/java/org/apache/cassandra/metrics/CacheMetrics.java
@@ -17,8 +17,6 @@
  */
 package org.apache.cassandra.metrics;
 
-import java.util.concurrent.atomic.AtomicLong;
-
 import com.codahale.metrics.Gauge;
 import com.codahale.metrics.Meter;
 import com.codahale.metrics.RatioGauge;
@@ -56,7 +54,7 @@
      * @param type Type of Cache to identify metrics.
      * @param cache Cache to measure metrics
      */
-    public CacheMetrics(String type, final ICache cache)
+    public CacheMetrics(String type, final ICache<?, ?> cache)
     {
         MetricNameFactory factory = new DefaultNameFactory("Cache", type);
 
diff --git a/src/java/org/apache/cassandra/metrics/CacheMissMetrics.java b/src/java/org/apache/cassandra/metrics/CacheMissMetrics.java
new file mode 100644
index 0000000..19d61ef
--- /dev/null
+++ b/src/java/org/apache/cassandra/metrics/CacheMissMetrics.java
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.metrics;
+
+import com.codahale.metrics.Gauge;
+import com.codahale.metrics.Meter;
+import com.codahale.metrics.RatioGauge;
+import com.codahale.metrics.Timer;
+import org.apache.cassandra.cache.CacheSize;
+
+import static org.apache.cassandra.metrics.CassandraMetricsRegistry.Metrics;
+
+/**
+ * Metrics for {@code ICache}.
+ */
+public class CacheMissMetrics
+{
+    /** Cache capacity in bytes */
+    public final Gauge<Long> capacity;
+    /** Total number of cache hits */
+    public final Meter misses;
+    /** Total number of cache requests */
+    public final Meter requests;
+    /** Latency of misses */
+    public final Timer missLatency;
+    /** all time cache hit rate */
+    public final Gauge<Double> hitRate;
+    /** 1m hit rate */
+    public final Gauge<Double> oneMinuteHitRate;
+    /** 5m hit rate */
+    public final Gauge<Double> fiveMinuteHitRate;
+    /** 15m hit rate */
+    public final Gauge<Double> fifteenMinuteHitRate;
+    /** Total size of cache, in bytes */
+    public final Gauge<Long> size;
+    /** Total number of cache entries */
+    public final Gauge<Integer> entries;
+
+    /**
+     * Create metrics for given cache.
+     *
+     * @param type Type of Cache to identify metrics.
+     * @param cache Cache to measure metrics
+     */
+    public CacheMissMetrics(String type, final CacheSize cache)
+    {
+        MetricNameFactory factory = new DefaultNameFactory("Cache", type);
+
+        capacity = Metrics.register(factory.createMetricName("Capacity"), (Gauge<Long>) cache::capacity);
+        misses = Metrics.meter(factory.createMetricName("Misses"));
+        requests = Metrics.meter(factory.createMetricName("Requests"));
+        missLatency = Metrics.timer(factory.createMetricName("MissLatency"));
+        hitRate = Metrics.register(factory.createMetricName("HitRate"), new RatioGauge()
+        {
+            @Override
+            public Ratio getRatio()
+            {
+                long req = requests.getCount();
+                long mis = misses.getCount();
+                return Ratio.of(req - mis, req);
+            }
+        });
+        oneMinuteHitRate = Metrics.register(factory.createMetricName("OneMinuteHitRate"), new RatioGauge()
+        {
+            protected Ratio getRatio()
+            {
+                double req = requests.getOneMinuteRate();
+                double mis = misses.getOneMinuteRate();
+                return Ratio.of(req - mis, req);
+            }
+        });
+        fiveMinuteHitRate = Metrics.register(factory.createMetricName("FiveMinuteHitRate"), new RatioGauge()
+        {
+            protected Ratio getRatio()
+            {
+                double req = requests.getFiveMinuteRate();
+                double mis = misses.getFiveMinuteRate();
+                return Ratio.of(req - mis, req);
+            }
+        });
+        fifteenMinuteHitRate = Metrics.register(factory.createMetricName("FifteenMinuteHitRate"), new RatioGauge()
+        {
+            protected Ratio getRatio()
+            {
+                double req = requests.getFifteenMinuteRate();
+                double mis = misses.getFifteenMinuteRate();
+                return Ratio.of(req - mis, req);
+            }
+        });
+        size = Metrics.register(factory.createMetricName("Size"), (Gauge<Long>) cache::weightedSize);
+        entries = Metrics.register(factory.createMetricName("Entries"), (Gauge<Integer>) cache::size);
+    }
+
+    public void reset()
+    {
+        requests.mark(-requests.getCount());
+        misses.mark(-misses.getCount());
+    }
+}
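
The hit-rate gauges above all follow the same shape: derive hits as requests minus misses and let RatioGauge handle the division. A standalone sketch of that shape with plain Dropwizard Metrics; the metric names are illustrative:

    import com.codahale.metrics.Meter;
    import com.codahale.metrics.MetricRegistry;
    import com.codahale.metrics.RatioGauge;

    public class HitRateDemo
    {
        public static void main(String[] args)
        {
            MetricRegistry registry = new MetricRegistry();
            Meter requests = registry.meter("demo.requests");
            Meter misses = registry.meter("demo.misses");

            RatioGauge hitRate = new RatioGauge()
            {
                protected Ratio getRatio()
                {
                    long req = requests.getCount();
                    long mis = misses.getCount();
                    return Ratio.of(req - mis, req); // RatioGauge reports NaN for a zero denominator
                }
            };
            registry.register("demo.hitRate", hitRate);

            requests.mark(10);
            misses.mark(2);
            System.out.println(hitRate.getValue()); // 0.8
        }
    }
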
diff --git a/src/java/org/apache/cassandra/metrics/CassandraMetricsRegistry.java b/src/java/org/apache/cassandra/metrics/CassandraMetricsRegistry.java
index 1cd3b6c..0d9e6d6 100644
--- a/src/java/org/apache/cassandra/metrics/CassandraMetricsRegistry.java
+++ b/src/java/org/apache/cassandra/metrics/CassandraMetricsRegistry.java
@@ -191,6 +191,18 @@
             mBeanServer.unregisterMBean(name.getMBeanName());
         } catch (Exception ignore) {}
     }
+
+    /**
+     * Strips a single trailing '$' from the input, if present.
+     *
+     * @param s String to strip
+     * @return the string without its final '$', or the string unchanged if there is none
+     */
+    private static String withoutFinalDollar(String s)
+    {
+        int l = s.length();
+        return (l != 0 && '$' == s.charAt(l - 1)) ? s.substring(0, l - 1) : s;
+    }
 
     public interface MetricMBean
     {
@@ -601,7 +613,7 @@
         public MetricName(Class<?> klass, String name, String scope)
         {
             this(klass.getPackage() == null ? "" : klass.getPackage().getName(),
-                    klass.getSimpleName().replaceAll("\\$$", ""),
+                    withoutFinalDollar(klass.getSimpleName()),
                     name,
                     scope);
         }
@@ -811,7 +823,7 @@
         {
             if (type == null || type.isEmpty())
             {
-                type = klass.getSimpleName().replaceAll("\\$$", "");
+                type = withoutFinalDollar(klass.getSimpleName());
             }
             return type;
         }
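
withoutFinalDollar replaces the previous replaceAll("\\$$", "") calls with a plain character check, so no regex pattern is compiled per metric name. A small check that the two spellings agree on typical class simple names (a sketch; the helper below mirrors the one added above):

    public class DollarStripDemo
    {
        private static String withoutFinalDollar(String s)
        {
            int l = s.length();
            return (l != 0 && s.charAt(l - 1) == '$') ? s.substring(0, l - 1) : s;
        }

        public static void main(String[] args)
        {
            for (String name : new String[]{ "CacheMetrics", "StorageService$1$", "$", "" })
            {
                String viaRegex = name.replaceAll("\\$$", "");
                String viaChars = withoutFinalDollar(name);
                System.out.println(name + " -> " + viaChars + ", regex agrees: " + viaRegex.equals(viaChars));
            }
        }
    }
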
diff --git a/src/java/org/apache/cassandra/metrics/CommitLogMetrics.java b/src/java/org/apache/cassandra/metrics/CommitLogMetrics.java
index 1da6ed0..08c1c8e 100644
--- a/src/java/org/apache/cassandra/metrics/CommitLogMetrics.java
+++ b/src/java/org/apache/cassandra/metrics/CommitLogMetrics.java
@@ -17,11 +17,10 @@
  */
 package org.apache.cassandra.metrics;
 
-
 import com.codahale.metrics.Gauge;
 import com.codahale.metrics.Timer;
 import org.apache.cassandra.db.commitlog.AbstractCommitLogService;
-import org.apache.cassandra.db.commitlog.CommitLogSegmentManager;
+import org.apache.cassandra.db.commitlog.AbstractCommitLogSegmentManager;
 
 import static org.apache.cassandra.metrics.CassandraMetricsRegistry.Metrics;
 
@@ -42,14 +41,14 @@
     public final Timer waitingOnSegmentAllocation;
     /** The time spent waiting on CL sync; for Periodic this is only occurs when the sync is lagging its sync interval */
     public final Timer waitingOnCommit;
-    
+
     public CommitLogMetrics()
     {
         waitingOnSegmentAllocation = Metrics.timer(factory.createMetricName("WaitingOnSegmentAllocation"));
         waitingOnCommit = Metrics.timer(factory.createMetricName("WaitingOnCommit"));
     }
 
-    public void attach(final AbstractCommitLogService service, final CommitLogSegmentManager allocator)
+    public void attach(final AbstractCommitLogService service, final AbstractCommitLogSegmentManager segmentManager)
     {
         completedTasks = Metrics.register(factory.createMetricName("CompletedTasks"), new Gauge<Long>()
         {
@@ -69,7 +68,7 @@
         {
             public Long getValue()
             {
-                return allocator.onDiskSize();
+                return segmentManager.onDiskSize();
             }
         });
     }
diff --git a/src/java/org/apache/cassandra/metrics/CompactionMetrics.java b/src/java/org/apache/cassandra/metrics/CompactionMetrics.java
index 19eadc8..9d2863f 100644
--- a/src/java/org/apache/cassandra/metrics/CompactionMetrics.java
+++ b/src/java/org/apache/cassandra/metrics/CompactionMetrics.java
@@ -24,6 +24,7 @@
 import com.codahale.metrics.Gauge;
 import com.codahale.metrics.Meter;
 
+import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.Schema;
 import org.apache.cassandra.db.ColumnFamilyStore;
 import org.apache.cassandra.db.Keyspace;
@@ -44,6 +45,9 @@
 
     /** Estimated number of compactions remaining to perform */
     public final Gauge<Integer> pendingTasks;
+    /** Estimated number of compactions remaining to perform, grouped by keyspace and then table name */
+    public final Gauge<Map<String, Map<String, Integer>>> pendingTasksByTableName;
+
     /** Number of completed compactions since server [re]start */
     public final Gauge<Long> completedTasks;
     /** Total number of compactions since server [re]start */
@@ -68,6 +72,58 @@
                 return n + compactions.size();
             }
         });
+
+        pendingTasksByTableName = Metrics.register(factory.createMetricName("PendingTasksByTableName"),
+            new Gauge<Map<String, Map<String, Integer>>>()
+        {
+            @Override
+            public Map<String, Map<String, Integer>> getValue() {
+                Map<String, Map<String, Integer>> resultMap = new HashMap<>();
+                // estimated number of compactions that still need to be done, per keyspace/table
+                for (String keyspaceName : Schema.instance.getKeyspaces())
+                {
+                    for (ColumnFamilyStore cfs : Keyspace.open(keyspaceName).getColumnFamilyStores())
+                    {
+                        int taskNumber = cfs.getCompactionStrategyManager().getEstimatedRemainingTasks();
+                        if (taskNumber > 0)
+                        {
+                            if (!resultMap.containsKey(keyspaceName))
+                            {
+                                resultMap.put(keyspaceName, new HashMap<>());
+                            }
+                            resultMap.get(keyspaceName).put(cfs.getTableName(), taskNumber);
+                        }
+                    }
+                }
+
+                // currently running compactions
+                for (CompactionInfo.Holder compaction : compactions)
+                {
+                    CFMetaData metaData = compaction.getCompactionInfo().getCFMetaData();
+                    if (metaData == null)
+                    {
+                        continue;
+                    }
+                    if (!resultMap.containsKey(metaData.ksName))
+                    {
+                        resultMap.put(metaData.ksName, new HashMap<>());
+                    }
+
+                    Map<String, Integer> tableNameToCountMap = resultMap.get(metaData.ksName);
+                    if (tableNameToCountMap.containsKey(metaData.cfName))
+                    {
+                        tableNameToCountMap.put(metaData.cfName,
+                                                tableNameToCountMap.get(metaData.cfName) + 1);
+                    }
+                    else
+                    {
+                        tableNameToCountMap.put(metaData.cfName, 1);
+                    }
+                }
+                return resultMap;
+            }
+        });
+
         completedTasks = Metrics.register(factory.createMetricName("CompletedTasks"), new Gauge<Long>()
         {
             public Long getValue()
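
The PendingTasksByTableName gauge builds its nested map with explicit containsKey/put bookkeeping. Not part of this patch, but the same counting can be written more compactly with computeIfAbsent and merge; a sketch with made-up keyspace and table names:

    import java.util.HashMap;
    import java.util.Map;

    public class PendingTaskCounting
    {
        public static void main(String[] args)
        {
            Map<String, Map<String, Integer>> resultMap = new HashMap<>();

            addPending(resultMap, "ks1", "t1", 3);    // strategy-estimated pending tasks
            incrementRunning(resultMap, "ks1", "t1"); // a running compaction on the same table
            incrementRunning(resultMap, "ks1", "t2"); // a running compaction on another table

            System.out.println(resultMap); // e.g. {ks1={t1=4, t2=1}} (HashMap order may vary)
        }

        static void addPending(Map<String, Map<String, Integer>> map, String ks, String table, int tasks)
        {
            if (tasks > 0)
                map.computeIfAbsent(ks, k -> new HashMap<>()).put(table, tasks);
        }

        static void incrementRunning(Map<String, Map<String, Integer>> map, String ks, String table)
        {
            map.computeIfAbsent(ks, k -> new HashMap<>()).merge(table, 1, Integer::sum);
        }
    }
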
diff --git a/src/java/org/apache/cassandra/metrics/DroppedMessageMetrics.java b/src/java/org/apache/cassandra/metrics/DroppedMessageMetrics.java
index 58c80fb..2a94c9f 100644
--- a/src/java/org/apache/cassandra/metrics/DroppedMessageMetrics.java
+++ b/src/java/org/apache/cassandra/metrics/DroppedMessageMetrics.java
@@ -18,6 +18,8 @@
 package org.apache.cassandra.metrics;
 
 import com.codahale.metrics.Meter;
+import com.codahale.metrics.Timer;
+
 import org.apache.cassandra.net.MessagingService;
 
 import static org.apache.cassandra.metrics.CassandraMetricsRegistry.Metrics;
@@ -30,9 +32,17 @@
     /** Number of dropped messages */
     public final Meter dropped;
 
+    /** The latency of dropped messages measured within the node */
+    public final Timer internalDroppedLatency;
+
+    /** The latency of dropped messages measured across nodes */
+    public final Timer crossNodeDroppedLatency;
+
     public DroppedMessageMetrics(MessagingService.Verb verb)
     {
         MetricNameFactory factory = new DefaultNameFactory("DroppedMessage", verb.toString());
         dropped = Metrics.meter(factory.createMetricName("Dropped"));
+        internalDroppedLatency = Metrics.timer(factory.createMetricName("InternalDroppedLatency"));
+        crossNodeDroppedLatency = Metrics.timer(factory.createMetricName("CrossNodeDroppedLatency"));
     }
 }
diff --git a/src/java/org/apache/cassandra/metrics/HintsServiceMetrics.java b/src/java/org/apache/cassandra/metrics/HintsServiceMetrics.java
index 062f67d..0d36905 100644
--- a/src/java/org/apache/cassandra/metrics/HintsServiceMetrics.java
+++ b/src/java/org/apache/cassandra/metrics/HintsServiceMetrics.java
@@ -18,7 +18,7 @@
 package org.apache.cassandra.metrics;
 
 /**
- * Metrics for {@link HintsService}.
+ * Metrics for {@link org.apache.cassandra.hints.HintsService}.
  */
 public final class HintsServiceMetrics
 {
diff --git a/src/java/org/apache/cassandra/metrics/MessagingMetrics.java b/src/java/org/apache/cassandra/metrics/MessagingMetrics.java
new file mode 100644
index 0000000..e126c93
--- /dev/null
+++ b/src/java/org/apache/cassandra/metrics/MessagingMetrics.java
@@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.metrics;
+
+import java.net.InetAddress;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.cassandra.config.DatabaseDescriptor;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.codahale.metrics.Timer;
+
+import static org.apache.cassandra.metrics.CassandraMetricsRegistry.Metrics;
+
+/**
+ * Metrics for messages
+ */
+public class MessagingMetrics
+{
+    private static final Logger logger = LoggerFactory.getLogger(MessagingMetrics.class);
+    private static final MetricNameFactory factory = new DefaultNameFactory("Messaging");
+    public final Timer crossNodeLatency;
+    public final ConcurrentHashMap<String, Timer> dcLatency;
+
+    public MessagingMetrics()
+    {
+        crossNodeLatency = Metrics.timer(factory.createMetricName("CrossNodeLatency"));
+        dcLatency = new ConcurrentHashMap<>();
+    }
+
+    public void addTimeTaken(InetAddress from, long timeTaken)
+    {
+        String dc = DatabaseDescriptor.getEndpointSnitch().getDatacenter(from);
+        Timer timer = dcLatency.get(dc);
+        if (timer == null)
+        {
+            timer = dcLatency.computeIfAbsent(dc, k -> Metrics.timer(factory.createMetricName(dc + "-Latency")));
+        }
+        timer.update(timeTaken, TimeUnit.MILLISECONDS);
+        crossNodeLatency.update(timeTaken, TimeUnit.MILLISECONDS);
+    }
+}
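
addTimeTaken lazily creates one Timer per datacenter; the plain get() before computeIfAbsent skips the compute call once the timer exists. A standalone sketch of that per-key timer pattern with plain Dropwizard Metrics and illustrative names:

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.TimeUnit;

    import com.codahale.metrics.MetricRegistry;
    import com.codahale.metrics.Timer;

    public class PerKeyTimers
    {
        private static final MetricRegistry registry = new MetricRegistry();
        private static final ConcurrentHashMap<String, Timer> timers = new ConcurrentHashMap<>();

        static void record(String key, long millis)
        {
            Timer timer = timers.get(key);   // fast path: skip computeIfAbsent once the timer exists
            if (timer == null)
                timer = timers.computeIfAbsent(key, k -> registry.timer(k + "-Latency"));
            timer.update(millis, TimeUnit.MILLISECONDS);
        }

        public static void main(String[] args)
        {
            record("dc1", 12);
            record("dc1", 7);
            System.out.println(timers.get("dc1").getCount()); // 2
        }
    }
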
diff --git a/src/java/org/apache/cassandra/metrics/TableMetrics.java b/src/java/org/apache/cassandra/metrics/TableMetrics.java
index 85bf7f6..d4be287 100644
--- a/src/java/org/apache/cassandra/metrics/TableMetrics.java
+++ b/src/java/org/apache/cassandra/metrics/TableMetrics.java
@@ -24,13 +24,18 @@
 
 import com.codahale.metrics.*;
 import com.codahale.metrics.Timer;
+import com.google.common.collect.Iterables;
 import com.google.common.collect.Maps;
+
+import org.apache.cassandra.config.Schema;
 import org.apache.cassandra.db.ColumnFamilyStore;
 import org.apache.cassandra.db.Keyspace;
 import org.apache.cassandra.db.Memtable;
 import org.apache.cassandra.db.lifecycle.SSTableSet;
+import org.apache.cassandra.index.SecondaryIndexManager;
+import org.apache.cassandra.io.compress.CompressionMetadata;
 import org.apache.cassandra.io.sstable.format.SSTableReader;
-import org.apache.cassandra.io.sstable.metadata.MetadataCollector;
+import org.apache.cassandra.repair.SystemDistributedKeyspace;
 import org.apache.cassandra.utils.EstimatedHistogram;
 import org.apache.cassandra.utils.TopKSampler;
 
@@ -42,6 +47,8 @@
 public class TableMetrics
 {
 
+    public static final long[] EMPTY = new long[0];
+
     /** Total amount of data stored in the memtable that resides on-heap, including column related overhead and partitions overwritten. */
     public final Gauge<Long> memtableOnHeapSize;
     /** Total amount of data stored in the memtable that resides off-heap, including column related overhead and partitions overwritten. */
@@ -76,6 +83,10 @@
     public final LatencyMetrics writeLatency;
     /** Estimated number of tasks pending for this table */
     public final Counter pendingFlushes;
+    /** Total number of bytes flushed since server [re]start */
+    public final Counter bytesFlushed;
+    /** Total number of bytes written by compaction since server [re]start */
+    public final Counter compactionBytesWritten;
     /** Estimate of number of pending compactios for this table */
     public final Gauge<Integer> pendingCompactions;
     /** Number of SSTables on disk for this CF */
@@ -132,6 +143,8 @@
     public final LatencyMetrics casPropose;
     /** CAS Commit metrics */
     public final LatencyMetrics casCommit;
+    /** percent of the data that is repaired */
+    public final Gauge<Double> percentRepaired;
 
     public final Timer coordinatorReadLatency;
     public final Timer coordinatorScanLatency;
@@ -139,6 +152,9 @@
     /** Time spent waiting for free memtable space, either on- or off-heap */
     public final Histogram waitingOnFreeMemtableSpace;
 
+    /** Dropped Mutations Count */
+    public final Counter droppedMutations;
+
     private final MetricNameFactory factory;
     private final MetricNameFactory aliasFactory;
     private static final MetricNameFactory globalFactory = new AllTableMetricNameFactory("Table");
@@ -150,6 +166,40 @@
     public final static LatencyMetrics globalWriteLatency = new LatencyMetrics(globalFactory, globalAliasFactory, "Write");
     public final static LatencyMetrics globalRangeLatency = new LatencyMetrics(globalFactory, globalAliasFactory, "Range");
 
+    public final static Gauge<Double> globalPercentRepaired = Metrics.register(globalFactory.createMetricName("PercentRepaired"),
+            new Gauge<Double>()
+    {
+        public Double getValue()
+        {
+            double repaired = 0;
+            double total = 0;
+            for (String keyspace : Schema.instance.getNonSystemKeyspaces())
+            {
+                Keyspace k = Schema.instance.getKeyspaceInstance(keyspace);
+                if (SystemDistributedKeyspace.NAME.equals(k.getName()))
+                    continue;
+                if (k.getReplicationStrategy().getReplicationFactor() < 2)
+                    continue;
+
+                for (ColumnFamilyStore cf : k.getColumnFamilyStores())
+                {
+                    if (!SecondaryIndexManager.isIndexColumnFamily(cf.name))
+                    {
+                        for (SSTableReader sstable : cf.getSSTables(SSTableSet.CANONICAL))
+                        {
+                            if (sstable.isRepaired())
+                            {
+                                repaired += sstable.uncompressedLength();
+                            }
+                            total += sstable.uncompressedLength();
+                        }
+                    }
+                }
+            }
+            return total > 0 ? (repaired / total) * 100 : 100.0;
+        }
+    });
+
     public final Map<Sampler, TopKSampler<ByteBuffer>> samplers;
     /**
      * stores metrics that will be rolled into a single global metric
@@ -171,7 +221,7 @@
         Iterator<SSTableReader> iterator = sstables.iterator();
         if (!iterator.hasNext())
         {
-            return new long[0];
+            return EMPTY;
         }
         long[] firstBucket = getHistogram.getHistogram(iterator.next()).getBuckets(false);
         long[] values = new long[firstBucket.length];
@@ -323,42 +373,39 @@
         {
             public Double getValue()
             {
-                double sum = 0;
-                int total = 0;
-                for (SSTableReader sstable : cfs.getSSTables(SSTableSet.CANONICAL))
-                {
-                    if (sstable.getCompressionRatio() != MetadataCollector.NO_COMPRESSION_RATIO)
-                    {
-                        sum += sstable.getCompressionRatio();
-                        total++;
-                    }
-                }
-                return total != 0 ? sum / total : 0;
+                return computeCompressionRatio(cfs.getSSTables(SSTableSet.CANONICAL));
             }
         }, new Gauge<Double>() // global gauge
         {
             public Double getValue()
             {
-                double sum = 0;
-                int total = 0;
-                for (Keyspace keyspace : Keyspace.all())
+                return computeCompressionRatio(Iterables.concat(Iterables.transform(Keyspace.all(),
+                                                                                    p -> p.getAllSSTables(SSTableSet.CANONICAL))));
+            }
+        });
+        percentRepaired = createTableGauge("PercentRepaired", new Gauge<Double>()
+        {
+            public Double getValue()
+            {
+                double repaired = 0;
+                double total = 0;
+                for (SSTableReader sstable : cfs.getSSTables(SSTableSet.CANONICAL))
                 {
-                    for (SSTableReader sstable : keyspace.getAllSSTables(SSTableSet.CANONICAL))
+                    if (sstable.isRepaired())
                     {
-                        if (sstable.getCompressionRatio() != MetadataCollector.NO_COMPRESSION_RATIO)
-                        {
-                            sum += sstable.getCompressionRatio();
-                            total++;
-                        }
+                        repaired += sstable.uncompressedLength();
                     }
+                    total += sstable.uncompressedLength();
                 }
-                return total != 0 ? sum / total : 0;
+                return total > 0 ? (repaired / total) * 100 : 100.0;
             }
         });
         readLatency = new LatencyMetrics(factory, "Read", cfs.keyspace.metric.readLatency, globalReadLatency);
         writeLatency = new LatencyMetrics(factory, "Write", cfs.keyspace.metric.writeLatency, globalWriteLatency);
         rangeLatency = new LatencyMetrics(factory, "Range", cfs.keyspace.metric.rangeLatency, globalRangeLatency);
         pendingFlushes = createTableCounter("PendingFlushes");
+        bytesFlushed = createTableCounter("BytesFlushed");
+        compactionBytesWritten = createTableCounter("CompactionBytesWritten");
         pendingCompactions = createTableGauge("PendingCompactions", new Gauge<Integer>()
         {
             public Integer getValue()
@@ -640,6 +687,7 @@
         rowCacheHitOutOfRange = createTableCounter("RowCacheHitOutOfRange");
         rowCacheHit = createTableCounter("RowCacheHit");
         rowCacheMiss = createTableCounter("RowCacheMiss");
+        droppedMutations = createTableCounter("DroppedMutations");
 
         casPrepare = new LatencyMetrics(factory, "CasPrepare", cfs.keyspace.metric.casPrepare);
         casPropose = new LatencyMetrics(factory, "CasPropose", cfs.keyspace.metric.casPropose);
@@ -748,6 +796,32 @@
     }
 
     /**
+     * Computes the compression ratio for the specified SSTables
+     *
+     * @param sstables the SSTables
+     * @return the compression ratio for the specified SSTables
+     */
+    private static Double computeCompressionRatio(Iterable<SSTableReader> sstables)
+    {
+        double compressedLengthSum = 0;
+        double dataLengthSum = 0;
+        for (SSTableReader sstable : sstables)
+        {
+            if (sstable.compression)
+            {
+                // We should not have any sstable which are in an open early mode as the sstable were selected
+                // using SSTableSet.CANONICAL.
+                assert sstable.openReason != SSTableReader.OpenReason.EARLY;
+
+                CompressionMetadata compressionMetadata = sstable.getCompressionMetadata();
+                compressedLengthSum += compressionMetadata.compressedFileLength;
+                dataLengthSum += compressionMetadata.dataLength;
+            }
+        }
+        return dataLengthSum != 0 ? compressedLengthSum / dataLengthSum : 0;
+    }
+
+    /**
      * Create a histogram-like interface that will register both a CF, keyspace and global level
      * histogram and forward any updates to both
      */
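
computeCompressionRatio changes how the ratio is aggregated: instead of averaging each sstable's own ratio, it sums compressed and uncompressed lengths and divides once, so large sstables weigh proportionally more. A small numeric illustration with made-up lengths:

    public class CompressionRatioDemo
    {
        public static void main(String[] args)
        {
            long[][] sstables = {            // { compressedFileLength, dataLength }, illustrative
                {  50,  100 },               // small sstable, ratio 0.5
                { 900, 1000 },               // large sstable, ratio 0.9
            };

            double compressedLengthSum = 0;
            double dataLengthSum = 0;
            double perSSTableRatioSum = 0;
            for (long[] sstable : sstables)
            {
                compressedLengthSum += sstable[0];
                dataLengthSum += sstable[1];
                perSSTableRatioSum += (double) sstable[0] / sstable[1];
            }

            System.out.println("aggregate ratio: " + compressedLengthSum / dataLengthSum);  // ~0.864
            System.out.println("mean of ratios:  " + perSSTableRatioSum / sstables.length); // 0.7
        }
    }
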
diff --git a/src/java/org/apache/cassandra/net/IAsyncCallback.java b/src/java/org/apache/cassandra/net/IAsyncCallback.java
index a29260c..d159e0c 100644
--- a/src/java/org/apache/cassandra/net/IAsyncCallback.java
+++ b/src/java/org/apache/cassandra/net/IAsyncCallback.java
@@ -31,7 +31,7 @@
  */
 public interface IAsyncCallback<T>
 {
-    public static Predicate<InetAddress> isAlive = new Predicate<InetAddress>()
+    Predicate<InetAddress> isAlive = new Predicate<InetAddress>()
     {
         public boolean apply(InetAddress endpoint)
         {
@@ -42,7 +42,7 @@
     /**
      * @param msg response received.
      */
-    public void response(MessageIn<T> msg);
+    void response(MessageIn<T> msg);
 
     /**
      * @return true if this callback is on the read path and its latency should be
diff --git a/src/java/org/apache/cassandra/net/IAsyncCallbackWithFailure.java b/src/java/org/apache/cassandra/net/IAsyncCallbackWithFailure.java
index 744bb62..546a416 100644
--- a/src/java/org/apache/cassandra/net/IAsyncCallbackWithFailure.java
+++ b/src/java/org/apache/cassandra/net/IAsyncCallbackWithFailure.java
@@ -25,5 +25,5 @@
     /**
      * Called when there is an exception on the remote node or timeout happens
      */
-    public void onFailure(InetAddress from);
+    void onFailure(InetAddress from);
 }
diff --git a/src/java/org/apache/cassandra/net/IVerbHandler.java b/src/java/org/apache/cassandra/net/IVerbHandler.java
index 574f30f..b9f1a54 100644
--- a/src/java/org/apache/cassandra/net/IVerbHandler.java
+++ b/src/java/org/apache/cassandra/net/IVerbHandler.java
@@ -19,6 +19,8 @@
 
 import java.io.IOException;
 
+import org.apache.cassandra.db.ReadCommand;
+
 /**
  * IVerbHandler provides the method that all verb handlers need to implement.
  * The concrete implementation of this interface would provide the functionality
@@ -36,5 +38,5 @@
      * @param message - incoming message that needs handling.
      * @param id
      */
-    public void doVerb(MessageIn<T> message, int id) throws IOException;
+    void doVerb(MessageIn<T> message, int id) throws IOException;
 }
diff --git a/src/java/org/apache/cassandra/net/IncomingTcpConnection.java b/src/java/org/apache/cassandra/net/IncomingTcpConnection.java
index de64444..9e8e2e1 100644
--- a/src/java/org/apache/cassandra/net/IncomingTcpConnection.java
+++ b/src/java/org/apache/cassandra/net/IncomingTcpConnection.java
@@ -187,18 +187,7 @@
         else
             id = input.readInt();
 
-        long timestamp = System.currentTimeMillis();
-        boolean isCrossNodeTimestamp = false;
-        // make sure to readInt, even if cross_node_to is not enabled
-        int partial = input.readInt();
-        if (DatabaseDescriptor.hasCrossNodeTimeout())
-        {
-            long crossNodeTimestamp = (timestamp & 0xFFFFFFFF00000000L) | (((partial & 0xFFFFFFFFL) << 2) >> 2);
-            isCrossNodeTimestamp = (timestamp != crossNodeTimestamp);
-            timestamp = crossNodeTimestamp;
-        }
-
-        MessageIn message = MessageIn.read(input, version, id);
+        MessageIn message = MessageIn.read(input, version, id, MessageIn.readTimestamp(from, input, System.currentTimeMillis()));
         if (message == null)
         {
             // callback expired; nothing to do
@@ -206,7 +195,7 @@
         }
         if (version <= MessagingService.current_version)
         {
-            MessagingService.instance().receive(message, id, timestamp, isCrossNodeTimestamp);
+            MessagingService.instance().receive(message, id);
         }
         else
         {
diff --git a/src/java/org/apache/cassandra/net/MessageDeliveryTask.java b/src/java/org/apache/cassandra/net/MessageDeliveryTask.java
index ce6eebc..d9f8b7c 100644
--- a/src/java/org/apache/cassandra/net/MessageDeliveryTask.java
+++ b/src/java/org/apache/cassandra/net/MessageDeliveryTask.java
@@ -33,25 +33,22 @@
 
     private final MessageIn message;
     private final int id;
-    private final long constructionTime;
-    private final boolean isCrossNodeTimestamp;
 
-    public MessageDeliveryTask(MessageIn message, int id, long timestamp, boolean isCrossNodeTimestamp)
+    public MessageDeliveryTask(MessageIn message, int id)
     {
         assert message != null;
         this.message = message;
         this.id = id;
-        this.constructionTime = timestamp;
-        this.isCrossNodeTimestamp = isCrossNodeTimestamp;
     }
 
     public void run()
     {
         MessagingService.Verb verb = message.verb;
+        long timeTaken = System.currentTimeMillis() - message.constructionTime.timestamp;
         if (MessagingService.DROPPABLE_VERBS.contains(verb)
-            && System.currentTimeMillis() > constructionTime + message.getTimeout())
+            && timeTaken > message.getTimeout())
         {
-            MessagingService.instance().incrementDroppedMessages(verb, isCrossNodeTimestamp);
+            MessagingService.instance().incrementDroppedMessages(message, timeTaken);
             return;
         }
 
@@ -83,7 +80,7 @@
         }
 
         if (GOSSIP_VERBS.contains(message.verb))
-            Gossiper.instance.setLastProcessedMessageAt(constructionTime);
+            Gossiper.instance.setLastProcessedMessageAt(message.constructionTime.timestamp);
     }
 
     private void handleFailure(Throwable t)
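
An illustrative, stand-alone sketch of the timeout check that run() now performs, with plain
epoch-millisecond values standing in for message.constructionTime.timestamp and
message.getTimeout() (assumed values, not part of the patch):

    public final class DropDecisionSketch
    {
        public static void main(String[] args)
        {
            long constructionTimeMillis = System.currentTimeMillis() - 15_000; // pretend the message was built 15 s ago
            long timeoutMillis = 10_000;                                       // pretend the verb times out after 10 s
            long timeTaken = System.currentTimeMillis() - constructionTimeMillis;
            System.out.println(timeTaken > timeoutMillis ? "drop and count it" : "deliver to the verb handler");
        }
    }
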
diff --git a/src/java/org/apache/cassandra/net/MessageIn.java b/src/java/org/apache/cassandra/net/MessageIn.java
index 64b8e81..014ee93 100644
--- a/src/java/org/apache/cassandra/net/MessageIn.java
+++ b/src/java/org/apache/cassandra/net/MessageIn.java
@@ -24,40 +24,55 @@
 
 import com.google.common.collect.ImmutableMap;
 
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
 import org.apache.cassandra.concurrent.Stage;
 import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.db.monitoring.ConstructionTime;
+import org.apache.cassandra.db.monitoring.MonitorableImpl;
 import org.apache.cassandra.io.IVersionedSerializer;
 import org.apache.cassandra.io.util.DataInputPlus;
 import org.apache.cassandra.io.util.FileUtils;
 
 public class MessageIn<T>
 {
-    private static final Logger logger = LoggerFactory.getLogger(MessageIn.class);
-
     public final InetAddress from;
     public final T payload;
     public final Map<String, byte[]> parameters;
     public final MessagingService.Verb verb;
     public final int version;
+    public final ConstructionTime constructionTime;
 
-    private MessageIn(InetAddress from, T payload, Map<String, byte[]> parameters, MessagingService.Verb verb, int version)
+    private MessageIn(InetAddress from,
+                      T payload,
+                      Map<String, byte[]> parameters,
+                      MessagingService.Verb verb,
+                      int version,
+                      ConstructionTime constructionTime)
     {
         this.from = from;
         this.payload = payload;
         this.parameters = parameters;
         this.verb = verb;
         this.version = version;
+        this.constructionTime = constructionTime;
     }
 
-    public static <T> MessageIn<T> create(InetAddress from, T payload, Map<String, byte[]> parameters, MessagingService.Verb verb, int version)
+    public static <T> MessageIn<T> create(InetAddress from,
+                                          T payload,
+                                          Map<String, byte[]> parameters,
+                                          MessagingService.Verb verb,
+                                          int version,
+                                          ConstructionTime constructionTime)
     {
-        return new MessageIn<T>(from, payload, parameters, verb, version);
+        return new MessageIn<>(from, payload, parameters, verb, version, constructionTime);
     }
 
     public static <T2> MessageIn<T2> read(DataInputPlus in, int version, int id) throws IOException
     {
+        return read(in, version, id, new ConstructionTime());
+    }
+
+    public static <T2> MessageIn<T2> read(DataInputPlus in, int version, int id, ConstructionTime constructionTime) throws IOException
+    {
         InetAddress from = CompactEndpointSerializationHelper.deserialize(in);
 
         MessagingService.Verb verb = MessagingService.Verb.values()[in.readInt()];
@@ -94,9 +109,34 @@
             serializer = (IVersionedSerializer<T2>) callback.serializer;
         }
         if (payloadSize == 0 || serializer == null)
-            return create(from, null, parameters, verb, version);
+            return create(from, null, parameters, verb, version, constructionTime);
+
         T2 payload = serializer.deserialize(in, version);
-        return MessageIn.create(from, payload, parameters, verb, version);
+        return MessageIn.create(from, payload, parameters, verb, version, constructionTime);
+    }
+
+    public static ConstructionTime createTimestamp()
+    {
+        return new ConstructionTime();
+    }
+
+    public static ConstructionTime readTimestamp(InetAddress from, DataInputPlus input, long timestamp) throws IOException
+    {
+        // make sure to readInt, even if cross_node_timeout is not enabled
+        int partial = input.readInt();
+        long crossNodeTimestamp = (timestamp & 0xFFFFFFFF00000000L) | (((partial & 0xFFFFFFFFL) << 2) >> 2);
+        if (timestamp > crossNodeTimestamp)
+        {
+            MessagingService.instance().metrics.addTimeTaken(from, timestamp - crossNodeTimestamp);
+        }
+        if (DatabaseDescriptor.hasCrossNodeTimeout())
+        {
+            return new ConstructionTime(crossNodeTimestamp, timestamp != crossNodeTimestamp);
+        }
+        else
+        {
+            return new ConstructionTime();
+        }
     }
 
     public Stage getMessageType()
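
A minimal sketch of the splice readTimestamp performs: the sender puts only the low 32 bits of its
millisecond clock on the wire, and the receiver combines them with the high 32 bits of its own
clock. The names below are invented for illustration; the arithmetic is the line above, and it
holds except across a 32-bit millisecond rollover (roughly every 49.7 days):

    public final class CrossNodeTimestampSketch
    {
        static long reconstruct(long localMillis, int lowBitsFromWire)
        {
            // same bit arithmetic as readTimestamp above
            return (localMillis & 0xFFFFFFFF00000000L) | (((lowBitsFromWire & 0xFFFFFFFFL) << 2) >> 2);
        }

        public static void main(String[] args)
        {
            long receiverMillis = System.currentTimeMillis();
            long senderMillis = receiverMillis - 250;                     // pretend the message left 250 ms ago
            long reconstructed = reconstruct(receiverMillis, (int) senderMillis);
            System.out.println((receiverMillis - reconstructed) + " ms"); // ~250 ms of apparent cross-node latency
        }
    }
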
diff --git a/src/java/org/apache/cassandra/net/MessageOut.java b/src/java/org/apache/cassandra/net/MessageOut.java
index a524e7a..bc5c41b 100644
--- a/src/java/org/apache/cassandra/net/MessageOut.java
+++ b/src/java/org/apache/cassandra/net/MessageOut.java
@@ -33,10 +33,6 @@
 import org.apache.cassandra.io.util.DataOutputPlus;
 import org.apache.cassandra.tracing.Tracing;
 import org.apache.cassandra.utils.FBUtilities;
-import org.apache.cassandra.utils.UUIDGen;
-
-import static org.apache.cassandra.tracing.Tracing.TRACE_HEADER;
-import static org.apache.cassandra.tracing.Tracing.TRACE_TYPE;
 import static org.apache.cassandra.tracing.Tracing.isTracing;
 
 public class MessageOut<T>
@@ -61,8 +57,7 @@
              payload,
              serializer,
              isTracing()
-                 ? ImmutableMap.of(TRACE_HEADER, UUIDGen.decompose(Tracing.instance.getSessionId()),
-                                   TRACE_TYPE, new byte[] { Tracing.TraceType.serialize(Tracing.instance.getTraceType()) })
+                 ? Tracing.instance.getTraceHeaders()
                  : Collections.<String, byte[]>emptyMap());
     }
 
diff --git a/src/java/org/apache/cassandra/net/MessagingService.java b/src/java/org/apache/cassandra/net/MessagingService.java
index d01419f..220fc66 100644
--- a/src/java/org/apache/cassandra/net/MessagingService.java
+++ b/src/java/org/apache/cassandra/net/MessagingService.java
@@ -67,6 +67,7 @@
 import org.apache.cassandra.locator.ILatencySubscriber;
 import org.apache.cassandra.metrics.ConnectionMetrics;
 import org.apache.cassandra.metrics.DroppedMessageMetrics;
+import org.apache.cassandra.metrics.MessagingMetrics;
 import org.apache.cassandra.repair.messages.RepairMessage;
 import org.apache.cassandra.security.SSLFactory;
 import org.apache.cassandra.service.*;
@@ -101,6 +102,8 @@
     private boolean allNodesAtLeast22 = true;
     private boolean allNodesAtLeast30 = true;
 
+    public final MessagingMetrics metrics = new MessagingMetrics();
+
     /* All verb handler identifiers */
     public enum Verb
     {
@@ -810,7 +813,7 @@
         }
     }
 
-    public void receive(MessageIn message, int id, long timestamp, boolean isCrossNodeTimestamp)
+    public void receive(MessageIn message, int id)
     {
         TraceState state = Tracing.instance.initializeFromMessage(message);
         if (state != null)
@@ -821,7 +824,7 @@
             if (!ms.allowIncomingMessage(message, id))
                 return;
 
-        Runnable runnable = new MessageDeliveryTask(message, id, timestamp, isCrossNodeTimestamp);
+        Runnable runnable = new MessageDeliveryTask(message, id);
         LocalAwareExecutorService stage = StageManager.getStage(message.getMessageType());
         assert stage != null : "No stage for message type " + message.verb;
 
@@ -956,17 +959,69 @@
         return versions.containsKey(endpoint);
     }
 
+    public void incrementDroppedMutations(Optional<IMutation> mutationOpt, long timeTaken)
+    {
+        if (mutationOpt.isPresent())
+        {
+            updateDroppedMutationCount(mutationOpt.get());
+        }
+        incrementDroppedMessages(Verb.MUTATION, timeTaken);
+    }
+
     public void incrementDroppedMessages(Verb verb)
     {
         incrementDroppedMessages(verb, false);
     }
 
+    public void incrementDroppedMessages(Verb verb, long timeTaken)
+    {
+        incrementDroppedMessages(verb, timeTaken, false);
+    }
+
+    public void incrementDroppedMessages(MessageIn message, long timeTaken)
+    {
+        if (message.payload instanceof IMutation)
+        {
+            updateDroppedMutationCount((IMutation) message.payload);
+        }
+        incrementDroppedMessages(message.verb, timeTaken, message.constructionTime.isCrossNode);
+    }
+
+    public void incrementDroppedMessages(Verb verb, long timeTaken, boolean isCrossNodeTimeout)
+    {
+        assert DROPPABLE_VERBS.contains(verb) : "Verb " + verb + " should not legally be dropped";
+        incrementDroppedMessages(droppedMessagesMap.get(verb), timeTaken, isCrossNodeTimeout);
+    }
+
     public void incrementDroppedMessages(Verb verb, boolean isCrossNodeTimeout)
     {
         assert DROPPABLE_VERBS.contains(verb) : "Verb " + verb + " should not legally be dropped";
         incrementDroppedMessages(droppedMessagesMap.get(verb), isCrossNodeTimeout);
     }
 
+    private void updateDroppedMutationCount(IMutation mutation)
+    {
+        assert mutation != null : "Mutation should not be null when updating dropped mutations count";
+
+        for (UUID columnFamilyId : mutation.getColumnFamilyIds())
+        {
+            ColumnFamilyStore cfs = Keyspace.open(mutation.getKeyspaceName()).getColumnFamilyStore(columnFamilyId);
+            if (cfs != null)
+            {
+                cfs.metric.droppedMutations.inc();
+            }
+        }
+    }
+
+    private void incrementDroppedMessages(DroppedMessages droppedMessages, long timeTaken, boolean isCrossNodeTimeout)
+    {
+        if (isCrossNodeTimeout)
+            droppedMessages.metrics.crossNodeDroppedLatency.update(timeTaken, TimeUnit.MILLISECONDS);
+        else
+            droppedMessages.metrics.internalDroppedLatency.update(timeTaken, TimeUnit.MILLISECONDS);
+        incrementDroppedMessages(droppedMessages, isCrossNodeTimeout);
+    }
+
     private void incrementDroppedMessages(DroppedMessages droppedMessages, boolean isCrossNodeTimeout)
     {
         droppedMessages.metrics.dropped.mark();
@@ -999,11 +1054,14 @@
             int droppedCrossNodeTimeout = droppedMessages.droppedCrossNodeTimeout.getAndSet(0);
             if (droppedInternalTimeout > 0 || droppedCrossNodeTimeout > 0)
             {
-                ret.add(String.format("%s messages were dropped in last %d ms: %d for internal timeout and %d for cross node timeout",
-                                      verb,
-                                      LOG_DROPPED_INTERVAL_IN_MS,
-                                      droppedInternalTimeout,
-                                      droppedCrossNodeTimeout));
+                ret.add(String.format("%s messages were dropped in last %d ms: %d for internal timeout and %d for cross node timeout."
+                                     + " Mean internal dropped latency: %d ms and Mean cross-node dropped latency: %d ms",
+                                     verb,
+                                     LOG_DROPPED_INTERVAL_IN_MS,
+                                     droppedInternalTimeout,
+                                     droppedCrossNodeTimeout,
+                                     TimeUnit.NANOSECONDS.toMillis((long)droppedMessages.metrics.internalDroppedLatency.getSnapshot().getMean()),
+                                     TimeUnit.NANOSECONDS.toMillis((long)droppedMessages.metrics.crossNodeDroppedLatency.getSnapshot().getMean())));
             }
         }
         return ret;
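
For reference, the new summary line rendered with made-up numbers, so the added latency fields are
visible (the format string is the one above; the values are illustrative only):

    public final class DroppedMessageLogSketch
    {
        public static void main(String[] args)
        {
            String line = String.format("%s messages were dropped in last %d ms: %d for internal timeout and %d for cross node timeout."
                                        + " Mean internal dropped latency: %d ms and Mean cross-node dropped latency: %d ms",
                                        "MUTATION", 5000, 42, 7, 2100, 3800);
            System.out.println(line);
        }
    }
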
diff --git a/src/java/org/apache/cassandra/net/OutboundTcpConnection.java b/src/java/org/apache/cassandra/net/OutboundTcpConnection.java
index f573787..c2d10fd 100644
--- a/src/java/org/apache/cassandra/net/OutboundTcpConnection.java
+++ b/src/java/org/apache/cassandra/net/OutboundTcpConnection.java
@@ -284,7 +284,7 @@
                 {
                     byte[] traceTypeBytes = qm.message.parameters.get(Tracing.TRACE_TYPE);
                     Tracing.TraceType traceType = traceTypeBytes == null ? Tracing.TraceType.QUERY : Tracing.TraceType.deserialize(traceTypeBytes[0]);
-                    TraceState.mutateWithTracing(ByteBuffer.wrap(sessionBytes), message, -1, traceType.getTTL());
+                    Tracing.instance.trace(ByteBuffer.wrap(sessionBytes), message, traceType.getTTL());
                 }
                 else
                 {
diff --git a/src/java/org/apache/cassandra/net/OutboundTcpConnectionPool.java b/src/java/org/apache/cassandra/net/OutboundTcpConnectionPool.java
index 2af0016..ecadf89 100644
--- a/src/java/org/apache/cassandra/net/OutboundTcpConnectionPool.java
+++ b/src/java/org/apache/cassandra/net/OutboundTcpConnectionPool.java
@@ -122,8 +122,7 @@
         return newSocket(endPoint());
     }
 
-    // Closing the socket will close the underlying channel.
-    @SuppressWarnings("resource")
+    @SuppressWarnings("resource") // Closing the socket will close the underlying channel.
     public static Socket newSocket(InetAddress endpoint) throws IOException
     {
         // zero means 'bind on any available port.'
diff --git a/src/java/org/apache/cassandra/notifications/MemtableDiscardedNotification.java b/src/java/org/apache/cassandra/notifications/MemtableDiscardedNotification.java
new file mode 100644
index 0000000..778cad0
--- /dev/null
+++ b/src/java/org/apache/cassandra/notifications/MemtableDiscardedNotification.java
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.notifications;
+
+import org.apache.cassandra.db.Memtable;
+
+public class MemtableDiscardedNotification implements INotification
+{
+    public final Memtable memtable;
+
+    public MemtableDiscardedNotification(Memtable discarded)
+    {
+        this.memtable = discarded;
+    }
+}
diff --git a/src/java/org/apache/cassandra/notifications/MemtableSwitchedNotification.java b/src/java/org/apache/cassandra/notifications/MemtableSwitchedNotification.java
new file mode 100644
index 0000000..946de4e
--- /dev/null
+++ b/src/java/org/apache/cassandra/notifications/MemtableSwitchedNotification.java
@@ -0,0 +1,30 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.notifications;
+
+import org.apache.cassandra.db.Memtable;
+
+public class MemtableSwitchedNotification implements INotification
+{
+    public final Memtable memtable;
+
+    public MemtableSwitchedNotification(Memtable switched)
+    {
+        this.memtable = switched;
+    }
+}
diff --git a/src/java/org/apache/cassandra/notifications/SSTableRepairStatusChanged.java b/src/java/org/apache/cassandra/notifications/SSTableRepairStatusChanged.java
index d1398bc..8c48fa8 100644
--- a/src/java/org/apache/cassandra/notifications/SSTableRepairStatusChanged.java
+++ b/src/java/org/apache/cassandra/notifications/SSTableRepairStatusChanged.java
@@ -24,10 +24,10 @@
 
 public class SSTableRepairStatusChanged implements INotification
 {
-    public final Collection<SSTableReader> sstable;
+    public final Collection<SSTableReader> sstables;
 
     public SSTableRepairStatusChanged(Collection<SSTableReader> repairStatusChanged)
     {
-        this.sstable = repairStatusChanged;
+        this.sstables = repairStatusChanged;
     }
 }
diff --git a/src/java/org/apache/cassandra/repair/LocalSyncTask.java b/src/java/org/apache/cassandra/repair/LocalSyncTask.java
index daace01..a92708f 100644
--- a/src/java/org/apache/cassandra/repair/LocalSyncTask.java
+++ b/src/java/org/apache/cassandra/repair/LocalSyncTask.java
@@ -73,7 +73,7 @@
             isIncremental = prs.isIncremental;
         }
         Tracing.traceRepair(message);
-        new StreamPlan("Repair", repairedAt, 1, false, isIncremental).listeners(this)
+        new StreamPlan("Repair", repairedAt, 1, false, isIncremental, false).listeners(this)
                                             .flushBeforeTransfer(true)
                                             // request ranges from the remote node
                                             .requestRanges(dst, preferred, desc.keyspace, differences, desc.columnFamily)
@@ -98,9 +98,9 @@
                 break;
             case FILE_PROGRESS:
                 ProgressInfo pi = ((StreamEvent.ProgressEvent) event).progress;
-                state.trace("{}/{} bytes ({}%) {} idx:{}{}",
-                            new Object[] { pi.currentBytes,
-                                           pi.totalBytes,
+                state.trace("{}/{} ({}%) {} idx:{}{}",
+                            new Object[] { FBUtilities.prettyPrintMemory(pi.currentBytes),
+                                           FBUtilities.prettyPrintMemory(pi.totalBytes),
                                            pi.currentBytes * 100 / pi.totalBytes,
                                            pi.direction == ProgressInfo.Direction.OUT ? "sent to" : "received from",
                                            pi.sessionIndex,
diff --git a/src/java/org/apache/cassandra/repair/RepairJob.java b/src/java/org/apache/cassandra/repair/RepairJob.java
index cba176c..454865b 100644
--- a/src/java/org/apache/cassandra/repair/RepairJob.java
+++ b/src/java/org/apache/cassandra/repair/RepairJob.java
@@ -85,7 +85,7 @@
             ListenableFuture<List<InetAddress>> allSnapshotTasks = Futures.allAsList(snapshotTasks);
             validations = Futures.transform(allSnapshotTasks, new AsyncFunction<List<InetAddress>, List<TreeResponse>>()
             {
-                public ListenableFuture<List<TreeResponse>> apply(List<InetAddress> endpoints) throws Exception
+                public ListenableFuture<List<TreeResponse>> apply(List<InetAddress> endpoints)
                 {
                     if (parallelismDegree == RepairParallelism.SEQUENTIAL)
                         return sendSequentialValidationRequest(endpoints);
@@ -103,7 +103,7 @@
         // When all validations complete, submit sync tasks
         ListenableFuture<List<SyncStat>> syncResults = Futures.transform(validations, new AsyncFunction<List<TreeResponse>, List<SyncStat>>()
         {
-            public ListenableFuture<List<SyncStat>> apply(List<TreeResponse> trees) throws Exception
+            public ListenableFuture<List<SyncStat>> apply(List<TreeResponse> trees)
             {
                 InetAddress local = FBUtilities.getLocalAddress();
 
diff --git a/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java b/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java
index edcb4f9..312daed 100644
--- a/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java
+++ b/src/java/org/apache/cassandra/repair/RepairMessageVerbHandler.java
@@ -90,6 +90,7 @@
                                                                      desc.keyspace, desc.columnFamily), message.from, id);
                         return;
                     }
+
                     ActiveRepairService.ParentRepairSession prs = ActiveRepairService.instance.getParentRepairSession(desc.parentSessionId);
                     if (prs.isGlobal)
                     {
@@ -105,7 +106,7 @@
                                        !sstable.metadata.isIndex() && // exclude SSTables from 2i
                                        new Bounds<>(sstable.first.getToken(), sstable.last.getToken()).intersects(desc.ranges);
                             }
-                        }, true); //ephemeral snapshot, if repair fails, it will be cleaned next startup
+                        }, true, false); //ephemeral snapshot, if repair fails, it will be cleaned next startup
                     }
                     logger.debug("Enqueuing response to snapshot request {} to {}", desc.sessionId, message.from);
                     MessagingService.instance().sendReply(new MessageOut(MessagingService.Verb.INTERNAL_RESPONSE), id, message.from);
@@ -150,7 +151,7 @@
                         {
                             MessagingService.instance().sendReply(new MessageOut(MessagingService.Verb.INTERNAL_RESPONSE), id, message.from);
                         }
-                    }, MoreExecutors.sameThreadExecutor());
+                    }, MoreExecutors.directExecutor());
                     break;
 
                 case CLEANUP:
diff --git a/src/java/org/apache/cassandra/repair/RepairRunnable.java b/src/java/org/apache/cassandra/repair/RepairRunnable.java
index 21d0cd6..d099f72 100644
--- a/src/java/org/apache/cassandra/repair/RepairRunnable.java
+++ b/src/java/org/apache/cassandra/repair/RepairRunnable.java
@@ -194,7 +194,7 @@
         }
 
         final UUID parentSession = UUIDGen.getTimeUUID();
-        SystemDistributedKeyspace.startParentRepair(parentSession, keyspace, cfnames, options.getRanges());
+        SystemDistributedKeyspace.startParentRepair(parentSession, keyspace, cfnames, options);
         long repairedAt;
         try
         {
@@ -276,7 +276,7 @@
         ListenableFuture anticompactionResult = Futures.transform(allSessions, new AsyncFunction<List<RepairSessionResult>, Object>()
         {
             @SuppressWarnings("unchecked")
-            public ListenableFuture apply(List<RepairSessionResult> results) throws Exception
+            public ListenableFuture apply(List<RepairSessionResult> results)
             {
                 // filter out null(=failed) results and get successful ranges
                 for (RepairSessionResult sessionResult : results)
diff --git a/src/java/org/apache/cassandra/repair/StreamingRepairTask.java b/src/java/org/apache/cassandra/repair/StreamingRepairTask.java
index 25ef06e..b6936b6 100644
--- a/src/java/org/apache/cassandra/repair/StreamingRepairTask.java
+++ b/src/java/org/apache/cassandra/repair/StreamingRepairTask.java
@@ -62,7 +62,7 @@
             ActiveRepairService.ParentRepairSession prs = ActiveRepairService.instance.getParentRepairSession(desc.parentSessionId);
             isIncremental = prs.isIncremental;
         }
-        new StreamPlan("Repair", repairedAt, 1, false, isIncremental).listeners(this)
+        new StreamPlan("Repair", repairedAt, 1, false, isIncremental, false).listeners(this)
                                             .flushBeforeTransfer(true)
                                             // request ranges from the remote node
                                             .requestRanges(dest, preferred, desc.keyspace, request.ranges, desc.columnFamily)
diff --git a/src/java/org/apache/cassandra/repair/SystemDistributedKeyspace.java b/src/java/org/apache/cassandra/repair/SystemDistributedKeyspace.java
index 9cf6c3e..fbbc125 100644
--- a/src/java/org/apache/cassandra/repair/SystemDistributedKeyspace.java
+++ b/src/java/org/apache/cassandra/repair/SystemDistributedKeyspace.java
@@ -23,11 +23,15 @@
 import java.nio.ByteBuffer;
 import java.util.ArrayList;
 import java.util.Collection;
+import java.util.Collections;
+import java.util.HashMap;
 import java.util.List;
+import java.util.Map;
 import java.util.Set;
 import java.util.UUID;
 
 import com.google.common.base.Joiner;
+import com.google.common.collect.Lists;
 import com.google.common.collect.Sets;
 
 import org.slf4j.Logger;
@@ -35,15 +39,19 @@
 
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.cql3.QueryProcessor;
+import org.apache.cassandra.cql3.UntypedResultSet;
 import org.apache.cassandra.db.ConsistencyLevel;
+import org.apache.cassandra.db.Keyspace;
 import org.apache.cassandra.dht.Range;
 import org.apache.cassandra.dht.Token;
+import org.apache.cassandra.repair.messages.RepairOption;
 import org.apache.cassandra.schema.KeyspaceMetadata;
 import org.apache.cassandra.schema.KeyspaceParams;
 import org.apache.cassandra.schema.Tables;
-import org.apache.cassandra.utils.ByteBufferUtil;
 import org.apache.cassandra.utils.FBUtilities;
 
+import static org.apache.cassandra.utils.ByteBufferUtil.bytes;
+
 public final class SystemDistributedKeyspace
 {
     private SystemDistributedKeyspace()
@@ -58,6 +66,8 @@
 
     public static final String PARENT_REPAIR_HISTORY = "parent_repair_history";
 
+    public static final String VIEW_BUILD_STATUS = "view_build_status";
+
     private static final CFMetaData RepairHistory =
         compile(REPAIR_HISTORY,
                 "Repair history",
@@ -90,8 +100,19 @@
                      + "exception_stacktrace text,"
                      + "requested_ranges set<text>,"
                      + "successful_ranges set<text>,"
+                     + "options map<text, text>,"
                      + "PRIMARY KEY (parent_id))");
 
+    private static final CFMetaData ViewBuildStatus =
+    compile(VIEW_BUILD_STATUS,
+            "Materialized View build status",
+            "CREATE TABLE %s ("
+                     + "keyspace_name text,"
+                     + "view_name text,"
+                     + "host_id uuid,"
+                     + "status text,"
+                     + "PRIMARY KEY ((keyspace_name, view_name), host_id))");
+
     private static CFMetaData compile(String name, String description, String schema)
     {
         return CFMetaData.compile(String.format(schema, name), NAME)
@@ -100,18 +121,43 @@
 
     public static KeyspaceMetadata metadata()
     {
-        return KeyspaceMetadata.create(NAME, KeyspaceParams.simple(3), Tables.of(RepairHistory, ParentRepairHistory));
+        return KeyspaceMetadata.create(NAME, KeyspaceParams.simple(3), Tables.of(RepairHistory, ParentRepairHistory, ViewBuildStatus));
     }
 
-    public static void startParentRepair(UUID parent_id, String keyspaceName, String[] cfnames, Collection<Range<Token>> ranges)
+    public static void startParentRepair(UUID parent_id, String keyspaceName, String[] cfnames, RepairOption options)
     {
-
-        String query = "INSERT INTO %s.%s (parent_id, keyspace_name, columnfamily_names, requested_ranges, started_at)"+
-                                 " VALUES (%s,        '%s',          { '%s' },           { '%s' },          toTimestamp(now()))";
-        String fmtQry = String.format(query, NAME, PARENT_REPAIR_HISTORY, parent_id.toString(), keyspaceName, Joiner.on("','").join(cfnames), Joiner.on("','").join(ranges));
+        Collection<Range<Token>> ranges = options.getRanges();
+        String query = "INSERT INTO %s.%s (parent_id, keyspace_name, columnfamily_names, requested_ranges, started_at,          options)"+
+                                 " VALUES (%s,        '%s',          { '%s' },           { '%s' },          toTimestamp(now()), { %s })";
+        String fmtQry = String.format(query,
+                                      NAME,
+                                      PARENT_REPAIR_HISTORY,
+                                      parent_id.toString(),
+                                      keyspaceName,
+                                      Joiner.on("','").join(cfnames),
+                                      Joiner.on("','").join(ranges),
+                                      toCQLMap(options.asMap(), RepairOption.RANGES_KEY, RepairOption.COLUMNFAMILIES_KEY));
         processSilent(fmtQry);
     }
 
+    private static String toCQLMap(Map<String, String> options, String ... ignore)
+    {
+        Set<String> toIgnore = Sets.newHashSet(ignore);
+        StringBuilder map = new StringBuilder();
+        boolean first = true;
+        for (Map.Entry<String, String> entry : options.entrySet())
+        {
+            if (!toIgnore.contains(entry.getKey()))
+            {
+                if (!first)
+                    map.append(',');
+                first = false;
+                map.append(String.format("'%s': '%s'", entry.getKey(), entry.getValue()));
+            }
+        }
+        return map.toString();
+    }
+
     public static void failParentRepair(UUID parent_id, Throwable t)
     {
         String query = "UPDATE %s.%s SET finished_at = toTimestamp(now()), exception_message=?, exception_stacktrace=? WHERE parent_id=%s";
@@ -192,6 +238,58 @@
         processSilent(fmtQry, t.getMessage(), sw.toString());
     }
 
+    public static void startViewBuild(String keyspace, String view, UUID hostId)
+    {
+        String query = "INSERT INTO %s.%s (keyspace_name, view_name, host_id, status) VALUES (?, ?, ?, ?)";
+        QueryProcessor.process(String.format(query, NAME, VIEW_BUILD_STATUS),
+                               ConsistencyLevel.ONE,
+                               Lists.newArrayList(bytes(keyspace),
+                                                  bytes(view),
+                                                  bytes(hostId),
+                                                  bytes(BuildStatus.STARTED.toString())));
+    }
+
+    public static void successfulViewBuild(String keyspace, String view, UUID hostId)
+    {
+        String query = "UPDATE %s.%s SET status = ? WHERE keyspace_name = ? AND view_name = ? AND host_id = ?";
+        QueryProcessor.process(String.format(query, NAME, VIEW_BUILD_STATUS),
+                               ConsistencyLevel.ONE,
+                               Lists.newArrayList(bytes(BuildStatus.SUCCESS.toString()),
+                                                  bytes(keyspace),
+                                                  bytes(view),
+                                                  bytes(hostId)));
+    }
+
+    public static Map<UUID, String> viewStatus(String keyspace, String view)
+    {
+        String query = "SELECT host_id, status FROM %s.%s WHERE keyspace_name = ? AND view_name = ?";
+        UntypedResultSet results;
+        try
+        {
+            results = QueryProcessor.execute(String.format(query, NAME, VIEW_BUILD_STATUS),
+                                             ConsistencyLevel.ONE,
+                                             keyspace,
+                                             view);
+        } catch (Exception e) {
+            return Collections.emptyMap();
+        }
+
+
+        Map<UUID, String> status = new HashMap<>();
+        for (UntypedResultSet.Row row : results)
+        {
+            status.put(row.getUUID("host_id"), row.getString("status"));
+        }
+        return status;
+    }
+
+    public static void setViewRemoved(String keyspaceName, String viewName)
+    {
+        String buildReq = "DELETE FROM %s.%s WHERE keyspace_name = ? AND view_name = ?";
+        QueryProcessor.executeInternal(String.format(buildReq, NAME, VIEW_BUILD_STATUS), keyspaceName, viewName);
+        forceBlockingFlush(VIEW_BUILD_STATUS);
+    }
+
     private static void processSilent(String fmtQry, String... values)
     {
         try
@@ -199,7 +297,7 @@
             List<ByteBuffer> valueList = new ArrayList<>();
             for (String v : values)
             {
-                valueList.add(ByteBufferUtil.bytes(v));
+                valueList.add(bytes(v));
             }
             QueryProcessor.process(fmtQry, ConsistencyLevel.ONE, valueList);
         }
@@ -209,9 +307,19 @@
         }
     }
 
+    public static void forceBlockingFlush(String table)
+    {
+        if (!Boolean.getBoolean("cassandra.unsafesystem"))
+            FBUtilities.waitOnFuture(Keyspace.open(NAME).getColumnFamilyStore(table).forceFlush());
+    }
 
     private enum RepairState
     {
         STARTED, SUCCESS, FAILED
     }
+
+    private enum BuildStatus
+    {
+        UNKNOWN, STARTED, SUCCESS
+    }
 }
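
A small sketch of what the new toCQLMap helper produces for a typical repair options map. A plain
LinkedHashMap stands in for RepairOption.asMap() and the literal key "ranges" stands in for
RepairOption.RANGES_KEY; both are assumptions for illustration:

    import java.util.LinkedHashMap;
    import java.util.Map;

    public final class ToCqlMapSketch
    {
        public static void main(String[] args)
        {
            Map<String, String> options = new LinkedHashMap<>();
            options.put("parallelism", "parallel");
            options.put("incremental", "true");
            options.put("ranges", "0:100");                    // filtered out, like RANGES_KEY above

            StringBuilder map = new StringBuilder();
            boolean first = true;
            for (Map.Entry<String, String> entry : options.entrySet())
            {
                if (entry.getKey().equals("ranges"))
                    continue;
                if (!first)
                    map.append(',');
                first = false;
                map.append(String.format("'%s': '%s'", entry.getKey(), entry.getValue()));
            }
            System.out.println("{ " + map + " }");             // { 'parallelism': 'parallel','incremental': 'true' }
        }
    }
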
diff --git a/src/java/org/apache/cassandra/repair/Validator.java b/src/java/org/apache/cassandra/repair/Validator.java
index 217c9de..e51dc0e 100644
--- a/src/java/org/apache/cassandra/repair/Validator.java
+++ b/src/java/org/apache/cassandra/repair/Validator.java
@@ -128,7 +128,7 @@
      * Called (in order) for every row present in the CF.
      * Hashes the row, and adds it to the tree being built.
      *
-     * @param row Row to add hash
+     * @param partition Partition to add hash
      */
     public void add(UnfilteredRowIterator partition)
     {
diff --git a/src/java/org/apache/cassandra/repair/messages/RepairOption.java b/src/java/org/apache/cassandra/repair/messages/RepairOption.java
index d50a2ed..843efde 100644
--- a/src/java/org/apache/cassandra/repair/messages/RepairOption.java
+++ b/src/java/org/apache/cassandra/repair/messages/RepairOption.java
@@ -19,6 +19,7 @@
 
 import java.util.*;
 
+import com.google.common.base.Joiner;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
@@ -28,7 +29,6 @@
 import org.apache.cassandra.dht.Range;
 import org.apache.cassandra.dht.Token;
 import org.apache.cassandra.repair.RepairParallelism;
-import org.apache.cassandra.tools.nodetool.Repair;
 import org.apache.cassandra.utils.FBUtilities;
 
 /**
@@ -45,6 +45,7 @@
     public static final String DATACENTERS_KEY = "dataCenters";
     public static final String HOSTS_KEY = "hosts";
     public static final String TRACE_KEY = "trace";
+    public static final String SUB_RANGE_REPAIR_KEY = "sub_range_repair";
 
     // we don't want to push nodes too much for repair
     public static final int MAX_JOB_THREADS = 4;
@@ -317,4 +318,20 @@
                        ", # of ranges: " + ranges.size() +
                        ')';
     }
+
+    public Map<String, String> asMap()
+    {
+        Map<String, String> options = new HashMap<>();
+        options.put(PARALLELISM_KEY, parallelism.toString());
+        options.put(PRIMARY_RANGE_KEY, Boolean.toString(primaryRange));
+        options.put(INCREMENTAL_KEY, Boolean.toString(incremental));
+        options.put(JOB_THREADS_KEY, Integer.toString(jobThreads));
+        options.put(COLUMNFAMILIES_KEY, Joiner.on(",").join(columnFamilies));
+        options.put(DATACENTERS_KEY, Joiner.on(",").join(dataCenters));
+        options.put(HOSTS_KEY, Joiner.on(",").join(hosts));
+        options.put(SUB_RANGE_REPAIR_KEY, Boolean.toString(isSubrangeRepair));
+        options.put(TRACE_KEY, Boolean.toString(trace));
+        options.put(RANGES_KEY, Joiner.on(",").join(ranges));
+        return options;
+    }
 }
diff --git a/src/java/org/apache/cassandra/schema/CompressionParams.java b/src/java/org/apache/cassandra/schema/CompressionParams.java
index cd1686f..bd10f75 100644
--- a/src/java/org/apache/cassandra/schema/CompressionParams.java
+++ b/src/java/org/apache/cassandra/schema/CompressionParams.java
@@ -56,7 +56,7 @@
     public static final String CHUNK_LENGTH_IN_KB = "chunk_length_in_kb";
     public static final String ENABLED = "enabled";
 
-    public static final CompressionParams DEFAULT = new CompressionParams(LZ4Compressor.instance,
+    public static final CompressionParams DEFAULT = new CompressionParams(LZ4Compressor.create(Collections.<String, String>emptyMap()),
                                                                           DEFAULT_CHUNK_LENGTH,
                                                                           Collections.emptyMap());
 
@@ -139,7 +139,7 @@
 
     public static CompressionParams lz4(Integer chunkLength)
     {
-        return new CompressionParams(LZ4Compressor.instance, chunkLength, Collections.emptyMap());
+        return new CompressionParams(LZ4Compressor.create(Collections.emptyMap()), chunkLength, Collections.emptyMap());
     }
 
     public CompressionParams(String sstableCompressorClass, Integer chunkLength, Map<String, String> otherOptions) throws ConfigurationException
@@ -272,7 +272,7 @@
     private static Map<String, String> copyOptions(Map<? extends CharSequence, ? extends CharSequence> co)
     {
         if (co == null || co.isEmpty())
-            return Collections.<String, String>emptyMap();
+            return Collections.emptyMap();
 
         Map<String, String> compressionOptions = new HashMap<>();
         for (Map.Entry<? extends CharSequence, ? extends CharSequence> entry : co.entrySet())
diff --git a/src/java/org/apache/cassandra/schema/Functions.java b/src/java/org/apache/cassandra/schema/Functions.java
index c65f58d..a936d81 100644
--- a/src/java/org/apache/cassandra/schema/Functions.java
+++ b/src/java/org/apache/cassandra/schema/Functions.java
@@ -131,7 +131,7 @@
      */
     public static boolean typesMatch(AbstractType<?> t1, AbstractType<?> t2)
     {
-        return t1.asCQL3Type().toString().equals(t2.asCQL3Type().toString());
+        return t1.freeze().asCQL3Type().toString().equals(t2.freeze().asCQL3Type().toString());
     }
 
     public static boolean typesMatch(List<AbstractType<?>> t1, List<AbstractType<?>> t2)
diff --git a/src/java/org/apache/cassandra/schema/IndexMetadata.java b/src/java/org/apache/cassandra/schema/IndexMetadata.java
index 7c60a64..04e06ab 100644
--- a/src/java/org/apache/cassandra/schema/IndexMetadata.java
+++ b/src/java/org/apache/cassandra/schema/IndexMetadata.java
@@ -21,6 +21,7 @@
 import java.io.IOException;
 import java.lang.reflect.InvocationTargetException;
 import java.util.*;
+import java.util.regex.Pattern;
 import java.util.stream.Collectors;
 
 import com.google.common.base.Objects;
@@ -46,6 +47,10 @@
 public final class IndexMetadata
 {
     private static final Logger logger = LoggerFactory.getLogger(IndexMetadata.class);
+    
+    private static final Pattern PATTERN_NON_WORD_CHAR = Pattern.compile("\\W");
+    private static final Pattern PATTERN_WORD_CHARS = Pattern.compile("\\w+");
+
 
     public static final Serializer serializer = new Serializer();
 
@@ -127,15 +132,15 @@
 
     public static boolean isNameValid(String name)
     {
-        return name != null && !name.isEmpty() && name.matches("\\w+");
+        return name != null && !name.isEmpty() && PATTERN_WORD_CHARS.matcher(name).matches();
     }
 
     public static String getDefaultIndexName(String cfName, String root)
     {
         if (root == null)
-            return (cfName + "_" + "idx").replaceAll("\\W", "");
+            return PATTERN_NON_WORD_CHAR.matcher(cfName + "_" + "idx").replaceAll("");
         else
-            return (cfName + "_" + root + "_idx").replaceAll("\\W", "");
+            return PATTERN_NON_WORD_CHAR.matcher(cfName + "_" + root + "_idx").replaceAll("");
     }
 
     public void validate(CFMetaData cfm)
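
The precompiled patterns above are behaviorally equivalent to the previous String.matches and
replaceAll calls; a tiny sketch of both uses (the index names are invented):

    import java.util.regex.Pattern;

    public final class IndexNamePatternSketch
    {
        private static final Pattern NON_WORD = Pattern.compile("\\W");
        private static final Pattern WORD_CHARS = Pattern.compile("\\w+");

        public static void main(String[] args)
        {
            System.out.println(WORD_CHARS.matcher("users_by_email_idx").matches());    // true: a valid index name
            System.out.println(NON_WORD.matcher("users by email_idx").replaceAll("")); // usersbyemail_idx
        }
    }
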
diff --git a/src/java/org/apache/cassandra/schema/Indexes.java b/src/java/org/apache/cassandra/schema/Indexes.java
index 49a1d3b..eb49d39 100644
--- a/src/java/org/apache/cassandra/schema/Indexes.java
+++ b/src/java/org/apache/cassandra/schema/Indexes.java
@@ -95,7 +95,7 @@
     /**
      * Get the index with the specified id
      *
-     * @param name a UUID which identifies an index
+     * @param id a UUID which identifies an index
      * @return an empty {@link Optional} if no index with the specified id is found; a non-empty optional of
      *         {@link IndexMetadata} otherwise
      */
@@ -107,7 +107,7 @@
 
     /**
      * Answer true if contains an index with the specified id.
-     * @param name a UUID which identifies an index.
+     * @param id a UUID which identifies an index.
      * @return true if an index with the specified id is found; false otherwise
      */
     public boolean has(UUID id)
diff --git a/src/java/org/apache/cassandra/schema/KeyspaceMetadata.java b/src/java/org/apache/cassandra/schema/KeyspaceMetadata.java
index 76ba27d..e6f7754 100644
--- a/src/java/org/apache/cassandra/schema/KeyspaceMetadata.java
+++ b/src/java/org/apache/cassandra/schema/KeyspaceMetadata.java
@@ -23,6 +23,7 @@
 
 import javax.annotation.Nullable;
 
+import com.google.common.base.MoreObjects;
 import com.google.common.base.Objects;
 import com.google.common.collect.Iterables;
 
@@ -154,14 +155,14 @@
     @Override
     public String toString()
     {
-        return Objects.toStringHelper(this)
-                      .add("name", name)
-                      .add("params", params)
-                      .add("tables", tables)
-                      .add("views", views)
-                      .add("functions", functions)
-                      .add("types", types)
-                      .toString();
+        return MoreObjects.toStringHelper(this)
+                          .add("name", name)
+                          .add("params", params)
+                          .add("tables", tables)
+                          .add("views", views)
+                          .add("functions", functions)
+                          .add("types", types)
+                          .toString();
     }
 
     public void validate()
diff --git a/src/java/org/apache/cassandra/schema/KeyspaceParams.java b/src/java/org/apache/cassandra/schema/KeyspaceParams.java
index c0e8916..63775fa 100644
--- a/src/java/org/apache/cassandra/schema/KeyspaceParams.java
+++ b/src/java/org/apache/cassandra/schema/KeyspaceParams.java
@@ -19,6 +19,7 @@
 
 import java.util.Map;
 
+import com.google.common.base.MoreObjects;
 import com.google.common.base.Objects;
 
 /**
@@ -102,9 +103,9 @@
     @Override
     public String toString()
     {
-        return Objects.toStringHelper(this)
-                      .add(Option.DURABLE_WRITES.toString(), durableWrites)
-                      .add(Option.REPLICATION.toString(), replication)
-                      .toString();
+        return MoreObjects.toStringHelper(this)
+                          .add(Option.DURABLE_WRITES.toString(), durableWrites)
+                          .add(Option.REPLICATION.toString(), replication)
+                          .toString();
     }
 }
diff --git a/src/java/org/apache/cassandra/schema/LegacySchemaMigrator.java b/src/java/org/apache/cassandra/schema/LegacySchemaMigrator.java
index b6d8d2b..93591f0 100644
--- a/src/java/org/apache/cassandra/schema/LegacySchemaMigrator.java
+++ b/src/java/org/apache/cassandra/schema/LegacySchemaMigrator.java
@@ -28,6 +28,7 @@
 
 import org.apache.cassandra.config.*;
 import org.apache.cassandra.cql3.ColumnIdentifier;
+import org.apache.cassandra.cql3.FieldIdentifier;
 import org.apache.cassandra.cql3.QueryProcessor;
 import org.apache.cassandra.cql3.UntypedResultSet;
 import org.apache.cassandra.cql3.functions.FunctionName;
@@ -718,6 +719,14 @@
 
         AbstractType<?> validator = parseType(row.getString("validator"));
 
+        // In the 2.x schema we didn't store UDTs with a FrozenType wrapper because they were implicitly frozen. After
+        // CASSANDRA-7423 (non-frozen UDTs), this is no longer true, so we need to freeze UDTs and nested freezable
+        // types (UDTs and collections) to properly migrate the schema.  See CASSANDRA-11609 and CASSANDRA-11613.
+        if (validator.isUDT() && validator.isMultiCell())
+            validator = validator.freeze();
+        else
+            validator = validator.freezeNestedMulticellTypes();
+
         return new ColumnDefinition(keyspace, table, name, validator, componentIndex, kind);
     }
 
@@ -747,21 +756,26 @@
             if (row.has("index_options"))
                 indexOptions = fromJsonMap(row.getString("index_options"));
 
-            String indexName = null;
-            if (row.has("index_name"))
-                indexName = row.getString("index_name");
+            if (row.has("index_name")) 
+            {
+                String indexName = row.getString("index_name");
 
-            ColumnDefinition column = createColumnFromColumnRow(row,
-                                                                keyspace,
-                                                                table,
-                                                                rawComparator,
-                                                                rawSubComparator,
-                                                                isSuper,
-                                                                isCQLTable,
-                                                                isStaticCompactTable,
-                                                                needsUpgrade);
-
-            indexes.add(IndexMetadata.fromLegacyMetadata(cfm, column, indexName, kind, indexOptions));
+                ColumnDefinition column = createColumnFromColumnRow(row,
+                                                                    keyspace,
+                                                                    table,
+                                                                    rawComparator,
+                                                                    rawSubComparator,
+                                                                    isSuper,
+                                                                    isCQLTable,
+                                                                    isStaticCompactTable,
+                                                                    needsUpgrade);
+    
+                indexes.add(IndexMetadata.fromLegacyMetadata(cfm, column, indexName, kind, indexOptions));
+            } 
+            else 
+            {
+                logger.error("Failed to find index name for legacy migration of index on {}.{}", keyspace, table);
+            }
         }
 
         return indexes.build();
@@ -831,8 +845,8 @@
         DecoratedKey key = store.metadata.decorateKey(AsciiType.instance.fromString(keyspaceName));
         SinglePartitionReadCommand command = SinglePartitionReadCommand.create(store.metadata, nowInSec, key, slices);
 
-        try (OpOrder.Group op = store.readOrdering.start();
-             RowIterator partition = UnfilteredRowIterators.filter(command.queryMemtableAndDisk(store, op), nowInSec))
+        try (ReadExecutionController controller = command.executionController();
+             RowIterator partition = UnfilteredRowIterators.filter(command.queryMemtableAndDisk(store, controller), nowInSec))
         {
             return partition.next().primaryKeyLivenessInfo().timestamp();
         }
@@ -845,10 +859,10 @@
                               SystemKeyspace.LEGACY_USERTYPES);
         UntypedResultSet.Row row = query(query, keyspaceName, typeName).one();
 
-        List<ByteBuffer> names =
+        List<FieldIdentifier> names =
             row.getList("field_names", UTF8Type.instance)
                .stream()
-               .map(ByteBufferUtil::bytes)
+               .map(t -> FieldIdentifier.forInternalString(t))
                .collect(Collectors.toList());
 
         List<AbstractType<?>> types =
@@ -857,7 +871,7 @@
                .map(LegacySchemaMigrator::parseType)
                .collect(Collectors.toList());
 
-        return new UserType(keyspaceName, bytes(typeName), names, types);
+        return new UserType(keyspaceName, bytes(typeName), names, types, true);
     }
 
     /*
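
A sketch of the new shape of legacy user-type reconstruction: field names are carried as
FieldIdentifier instances and the trailing boolean marks the type multi-cell, mirroring the
readType call above. The keyspace, type and field names are invented, and a Cassandra classpath is
assumed:

    import java.util.Arrays;
    import java.util.List;
    import java.util.stream.Collectors;
    import java.util.stream.Stream;

    import org.apache.cassandra.cql3.FieldIdentifier;
    import org.apache.cassandra.db.marshal.AbstractType;
    import org.apache.cassandra.db.marshal.UTF8Type;
    import org.apache.cassandra.db.marshal.UserType;
    import org.apache.cassandra.utils.ByteBufferUtil;

    public final class LegacyUserTypeSketch
    {
        public static void main(String[] args)
        {
            List<FieldIdentifier> names = Stream.of("street", "city")
                                                .map(FieldIdentifier::forInternalString)
                                                .collect(Collectors.toList());
            List<AbstractType<?>> types = Arrays.<AbstractType<?>>asList(UTF8Type.instance, UTF8Type.instance);
            // trailing argument: isMultiCell == true, matching the call in readType
            UserType address = new UserType("ks", ByteBufferUtil.bytes("address"), names, types, true);
            System.out.println(address.asCQL3Type());
        }
    }
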
diff --git a/src/java/org/apache/cassandra/schema/SchemaKeyspace.java b/src/java/org/apache/cassandra/schema/SchemaKeyspace.java
index e3756ec..8e3961e 100644
--- a/src/java/org/apache/cassandra/schema/SchemaKeyspace.java
+++ b/src/java/org/apache/cassandra/schema/SchemaKeyspace.java
@@ -116,6 +116,7 @@
                 + "min_index_interval int,"
                 + "read_repair_chance double,"
                 + "speculative_retry text,"
+                + "cdc boolean,"
                 + "PRIMARY KEY ((keyspace_name), table_name))");
 
     private static final CFMetaData Columns =
@@ -179,6 +180,7 @@
                 + "min_index_interval int,"
                 + "read_repair_chance double,"
                 + "speculative_retry text,"
+                + "cdc boolean,"
                 + "PRIMARY KEY ((keyspace_name), view_name))");
 
     private static final CFMetaData Indexes =
@@ -303,8 +305,8 @@
                 continue;
 
             ReadCommand cmd = getReadCommandForTableSchema(table);
-            try (ReadOrderGroup orderGroup = cmd.startOrderGroup();
-                 PartitionIterator schema = cmd.executeInternal(orderGroup))
+            try (ReadExecutionController executionController = cmd.executionController();
+                 PartitionIterator schema = cmd.executeInternal(executionController))
             {
                 while (schema.hasNext())
                 {
@@ -351,7 +353,8 @@
     private static void convertSchemaToMutations(Map<DecoratedKey, Mutation> mutationMap, String schemaTableName)
     {
         ReadCommand cmd = getReadCommandForTableSchema(schemaTableName);
-        try (ReadOrderGroup orderGroup = cmd.startOrderGroup(); UnfilteredPartitionIterator iter = cmd.executeLocally(orderGroup))
+        try (ReadExecutionController executionController = cmd.executionController();
+             UnfilteredPartitionIterator iter = cmd.executeLocally(executionController))
         {
             while (iter.hasNext())
             {
@@ -368,7 +371,7 @@
                         mutationMap.put(key, mutation);
                     }
 
-                    mutation.add(PartitionUpdate.fromIterator(partition));
+                    mutation.add(PartitionUpdate.fromIterator(partition, cmd.columnFilter()));
                 }
             }
         }
@@ -432,7 +435,7 @@
     {
         RowUpdateBuilder adder = new RowUpdateBuilder(Types, timestamp, mutation)
                                  .clustering(type.getNameAsString())
-                                 .frozenList("field_names", type.fieldNames().stream().map(SchemaKeyspace::bbToString).collect(toList()))
+                                 .frozenList("field_names", type.fieldNames().stream().map(FieldIdentifier::toString).collect(toList()))
                                  .frozenList("field_types", type.fieldTypes().stream().map(AbstractType::asCQL3Type).map(CQL3Type::toString).collect(toList()));
 
         adder.build();
@@ -507,7 +510,8 @@
              .frozenMap("caching", params.caching.asMap())
              .frozenMap("compaction", params.compaction.asMap())
              .frozenMap("compression", params.compression.asMap())
-             .frozenMap("extensions", params.extensions);
+             .frozenMap("extensions", params.extensions)
+             .add("cdc", params.cdc);
     }
 
     public static Mutation makeUpdateTableMutation(KeyspaceMetadata keyspace,
@@ -947,6 +951,12 @@
         boolean isCompound = flags.contains(CFMetaData.Flag.COMPOUND);
 
         List<ColumnDefinition> columns = fetchColumns(keyspaceName, tableName, types);
+        if (!columns.stream().anyMatch(ColumnDefinition::isPartitionKey))
+        {
+            String msg = String.format("Table %s.%s did not have any partition key columns in the schema tables", keyspaceName, tableName);
+            throw new AssertionError(msg);
+        }
+
         Map<ByteBuffer, CFMetaData.DroppedColumn> droppedColumns = fetchDroppedColumns(keyspaceName, tableName);
         Indexes indexes = fetchIndexes(keyspaceName, tableName);
         Triggers triggers = fetchTriggers(keyspaceName, tableName);
@@ -985,6 +995,7 @@
                           .readRepairChance(row.getDouble("read_repair_chance"))
                           .crcCheckChance(row.getDouble("crc_check_chance"))
                           .speculativeRetry(SpeculativeRetryParam.fromString(row.getString("speculative_retry")))
+                          .cdc(row.has("cdc") ? row.getBoolean("cdc") : false)
                           .build();
     }
 
diff --git a/src/java/org/apache/cassandra/schema/TableParams.java b/src/java/org/apache/cassandra/schema/TableParams.java
index 29d3e29..02112af 100644
--- a/src/java/org/apache/cassandra/schema/TableParams.java
+++ b/src/java/org/apache/cassandra/schema/TableParams.java
@@ -26,6 +26,7 @@
 
 import org.apache.cassandra.exceptions.ConfigurationException;
 import org.apache.cassandra.utils.BloomCalculations;
+
 import static java.lang.String.format;
 
 public final class TableParams
@@ -48,7 +49,8 @@
         MIN_INDEX_INTERVAL,
         READ_REPAIR_CHANCE,
         SPECULATIVE_RETRY,
-        CRC_CHECK_CHANCE;
+        CRC_CHECK_CHANCE,
+        CDC;
 
         @Override
         public String toString()
@@ -82,6 +84,7 @@
     public final CompactionParams compaction;
     public final CompressionParams compression;
     public final ImmutableMap<String, ByteBuffer> extensions;
+    public final boolean cdc;
 
     private TableParams(Builder builder)
     {
@@ -102,6 +105,7 @@
         compaction = builder.compaction;
         compression = builder.compression;
         extensions = builder.extensions;
+        cdc = builder.cdc;
     }
 
     public static Builder builder()
@@ -125,7 +129,8 @@
                             .minIndexInterval(params.minIndexInterval)
                             .readRepairChance(params.readRepairChance)
                             .speculativeRetry(params.speculativeRetry)
-                            .extensions(params.extensions);
+                            .extensions(params.extensions)
+                            .cdc(params.cdc);
     }
 
     public void validate()
@@ -215,7 +220,8 @@
             && caching.equals(p.caching)
             && compaction.equals(p.compaction)
             && compression.equals(p.compression)
-            && extensions.equals(p.extensions);
+            && extensions.equals(p.extensions)
+            && cdc == p.cdc;
     }
 
     @Override
@@ -235,7 +241,8 @@
                                 caching,
                                 compaction,
                                 compression,
-                                extensions);
+                                extensions,
+                                cdc);
     }
 
     @Override
@@ -257,6 +264,7 @@
                           .add(Option.COMPACTION.toString(), compaction)
                           .add(Option.COMPRESSION.toString(), compression)
                           .add(Option.EXTENSIONS.toString(), extensions)
+                          .add(Option.CDC.toString(), cdc)
                           .toString();
     }
 
@@ -277,6 +285,7 @@
         private CompactionParams compaction = CompactionParams.DEFAULT;
         private CompressionParams compression = CompressionParams.DEFAULT;
         private ImmutableMap<String, ByteBuffer> extensions = ImmutableMap.of();
+        private boolean cdc;
 
         public Builder()
         {
@@ -371,6 +380,12 @@
             return this;
         }
 
+        public Builder cdc(boolean val)
+        {
+            cdc = val;
+            return this;
+        }
+
         public Builder extensions(Map<String, ByteBuffer> val)
         {
             extensions = ImmutableMap.copyOf(val);
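
A minimal sketch of setting the new cdc flag through the builder shown above; build() is assumed
to behave as in the existing TableParams.Builder, and every other parameter keeps its default:

    import org.apache.cassandra.schema.TableParams;

    public final class CdcParamSketch
    {
        public static void main(String[] args)
        {
            TableParams params = TableParams.builder()
                                            .cdc(true)   // the new flag; defaults to false
                                            .build();
            System.out.println(params.cdc);              // true
        }
    }
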
diff --git a/src/java/org/apache/cassandra/schema/TriggerMetadata.java b/src/java/org/apache/cassandra/schema/TriggerMetadata.java
index 2e0d547..d985081 100644
--- a/src/java/org/apache/cassandra/schema/TriggerMetadata.java
+++ b/src/java/org/apache/cassandra/schema/TriggerMetadata.java
@@ -18,6 +18,7 @@
  */
 package org.apache.cassandra.schema;
 
+import com.google.common.base.MoreObjects;
 import com.google.common.base.Objects;
 
 public final class TriggerMetadata
@@ -64,9 +65,9 @@
     @Override
     public String toString()
     {
-        return Objects.toStringHelper(this)
-                      .add("name", name)
-                      .add("class", classOption)
-                      .toString();
+        return MoreObjects.toStringHelper(this)
+                          .add("name", name)
+                          .add("class", classOption)
+                          .toString();
     }
 }
diff --git a/src/java/org/apache/cassandra/schema/Types.java b/src/java/org/apache/cassandra/schema/Types.java
index 1b71364..25efd70 100644
--- a/src/java/org/apache/cassandra/schema/Types.java
+++ b/src/java/org/apache/cassandra/schema/Types.java
@@ -22,12 +22,9 @@
 
 import javax.annotation.Nullable;
 
-import com.google.common.collect.HashMultimap;
-import com.google.common.collect.ImmutableMap;
-import com.google.common.collect.MapDifference;
-import com.google.common.collect.Maps;
-import com.google.common.collect.Multimap;
+import com.google.common.collect.*;
 
+import org.apache.cassandra.cql3.FieldIdentifier;
 import org.apache.cassandra.cql3.CQL3Type;
 import org.apache.cassandra.db.marshal.AbstractType;
 import org.apache.cassandra.db.marshal.UserType;
@@ -139,7 +136,30 @@
     @Override
     public boolean equals(Object o)
     {
-        return this == o || (o instanceof Types && types.equals(((Types) o).types));
+        if (this == o)
+            return true;
+
+        if (!(o instanceof Types))
+            return false;
+
+        Types other = (Types) o;
+
+        if (types.size() != other.types.size())
+            return false;
+
+        Iterator<Map.Entry<ByteBuffer, UserType>> thisIter = this.types.entrySet().iterator();
+        Iterator<Map.Entry<ByteBuffer, UserType>> otherIter = other.types.entrySet().iterator();
+        while (thisIter.hasNext())
+        {
+            Map.Entry<ByteBuffer, UserType> thisNext = thisIter.next();
+            Map.Entry<ByteBuffer, UserType> otherNext = otherIter.next();
+            if (!thisNext.getKey().equals(otherNext.getKey()))
+                return false;
+
+            if (!thisNext.getValue().equals(otherNext.getValue(), true))  // ignore freezing
+                return false;
+        }
+        return true;
     }
 
     @Override
@@ -156,7 +176,7 @@
 
     public static final class Builder
     {
-        final ImmutableMap.Builder<ByteBuffer, UserType> types = ImmutableMap.builder();
+        final ImmutableSortedMap.Builder<ByteBuffer, UserType> types = ImmutableSortedMap.naturalOrder();
 
         private Builder()
         {
@@ -169,6 +189,7 @@
 
         public Builder add(UserType type)
         {
+            assert type.isMultiCell();
             types.put(type.name, type);
             return this;
         }
@@ -283,9 +304,9 @@
 
             UserType prepare(String keyspace, Types types)
             {
-                List<ByteBuffer> preparedFieldNames =
+                List<FieldIdentifier> preparedFieldNames =
                     fieldNames.stream()
-                              .map(ByteBufferUtil::bytes)
+                              .map(t -> FieldIdentifier.forInternalString(t))
                               .collect(toList());
 
                 List<AbstractType<?>> preparedFieldTypes =
@@ -293,7 +314,7 @@
                               .map(t -> t.prepareInternal(keyspace, types).getType())
                               .collect(toList());
 
-                return new UserType(keyspace, bytes(name), preparedFieldNames, preparedFieldTypes);
+                return new UserType(keyspace, bytes(name), preparedFieldNames, preparedFieldTypes, true);
             }
 
             @Override
diff --git a/src/java/org/apache/cassandra/security/CipherFactory.java b/src/java/org/apache/cassandra/security/CipherFactory.java
new file mode 100644
index 0000000..7c1495a
--- /dev/null
+++ b/src/java/org/apache/cassandra/security/CipherFactory.java
@@ -0,0 +1,175 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.security;
+
+import java.io.IOException;
+import java.lang.reflect.Constructor;
+import java.security.InvalidAlgorithmParameterException;
+import java.security.InvalidKeyException;
+import java.security.Key;
+import java.security.NoSuchAlgorithmException;
+import java.security.SecureRandom;
+import java.util.Arrays;
+import java.util.concurrent.ExecutionException;
+import javax.crypto.Cipher;
+import javax.crypto.NoSuchPaddingException;
+import javax.crypto.spec.IvParameterSpec;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.RemovalListener;
+import com.google.common.cache.RemovalNotification;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.cassandra.config.TransparentDataEncryptionOptions;
+
+/**
+ * A factory for loading encryption keys from {@link KeyProvider} instances.
+ * Maintains a cache of loaded keys to avoid invoking the key provider on every call.
+ */
+public class CipherFactory
+{
+    private final Logger logger = LoggerFactory.getLogger(CipherFactory.class);
+
+    /**
+     * Keep around thread-local instances of Cipher as they are quite expensive to instantiate ({@code Cipher#getInstance}).
+     * Bonus points if you can also avoid calling {@code Cipher#init}; hence the supporting struct
+     * for caching Cipher instances.
+     */
+    private static final ThreadLocal<CachedCipher> cipherThreadLocal = new ThreadLocal<>();
+
+    private final SecureRandom secureRandom;
+    private final LoadingCache<String, Key> cache;
+    private final int ivLength;
+    private final KeyProvider keyProvider;
+
+    public CipherFactory(TransparentDataEncryptionOptions options)
+    {
+        logger.info("initializing CipherFactory");
+        ivLength = options.iv_length;
+
+        try
+        {
+            secureRandom = SecureRandom.getInstance("SHA1PRNG");
+            Class<KeyProvider> keyProviderClass = (Class<KeyProvider>)Class.forName(options.key_provider.class_name);
+            Constructor ctor = keyProviderClass.getConstructor(TransparentDataEncryptionOptions.class);
+            keyProvider = (KeyProvider)ctor.newInstance(options);
+        }
+        catch (Exception e)
+        {
+            throw new RuntimeException("couldn't load cipher factory", e);
+        }
+
+        cache = CacheBuilder.newBuilder() // by default cache is unbounded
+                .maximumSize(64) // a value large enough that we should never even get close (so nothing gets evicted)
+                .concurrencyLevel(Runtime.getRuntime().availableProcessors())
+                .removalListener(new RemovalListener<String, Key>()
+                {
+                    public void onRemoval(RemovalNotification<String, Key> notice)
+                    {
+                        // maybe reload the key? (to avoid the reload being on the user's dime)
+                        logger.info("key {} removed from cipher key cache", notice.getKey());
+                    }
+                })
+                .build(new CacheLoader<String, Key>()
+                {
+                    @Override
+                    public Key load(String alias) throws Exception
+                    {
+                        logger.info("loading secret key for alias {}", alias);
+                        return keyProvider.getSecretKey(alias);
+                    }
+                });
+    }
+
+    public Cipher getEncryptor(String transformation, String keyAlias) throws IOException
+    {
+        byte[] iv = new byte[ivLength];
+        secureRandom.nextBytes(iv);
+        return buildCipher(transformation, keyAlias, iv, Cipher.ENCRYPT_MODE);
+    }
+
+    public Cipher getDecryptor(String transformation, String keyAlias, byte[] iv) throws IOException
+    {
+        assert iv != null && iv.length > 0 : "trying to decrypt, but the initialization vector is empty";
+        return buildCipher(transformation, keyAlias, iv, Cipher.DECRYPT_MODE);
+    }
+
+    @VisibleForTesting
+    Cipher buildCipher(String transformation, String keyAlias, byte[] iv, int cipherMode) throws IOException
+    {
+        try
+        {
+            CachedCipher cachedCipher = cipherThreadLocal.get();
+            if (cachedCipher != null)
+            {
+                Cipher cipher = cachedCipher.cipher;
+                // rigorous checks to make sure we've absolutely got the correct instance (with correct alg/key/iv/...)
+                if (cachedCipher.mode == cipherMode && cipher.getAlgorithm().equals(transformation)
+                    && cachedCipher.keyAlias.equals(keyAlias) && Arrays.equals(cipher.getIV(), iv))
+                    return cipher;
+            }
+
+            Key key = retrieveKey(keyAlias);
+            Cipher cipher = Cipher.getInstance(transformation);
+            cipher.init(cipherMode, key, new IvParameterSpec(iv));
+            cipherThreadLocal.set(new CachedCipher(cipherMode, keyAlias, cipher));
+            return cipher;
+        }
+        catch (NoSuchAlgorithmException | NoSuchPaddingException | InvalidAlgorithmParameterException | InvalidKeyException e)
+        {
+            logger.error("could not build cipher", e);
+            throw new IOException("cannot load cipher", e);
+        }
+    }
+
+    private Key retrieveKey(String keyAlias) throws IOException
+    {
+        try
+        {
+            return cache.get(keyAlias);
+        }
+        catch (ExecutionException e)
+        {
+            if (e.getCause() instanceof IOException)
+                throw (IOException)e.getCause();
+            throw new IOException("failed to load key from cache: " + keyAlias, e);
+        }
+    }
+
+    /**
+     * A simple struct used for the thread-local caching of Cipher, as we can't get the mode (encrypt/decrypt) or the
+     * key_alias (or key!) from the Cipher itself to use for comparisons.
+     */
+    private static class CachedCipher
+    {
+        public final int mode;
+        public final String keyAlias;
+        public final Cipher cipher;
+
+        private CachedCipher(int mode, String keyAlias, Cipher cipher)
+        {
+            this.mode = mode;
+            this.keyAlias = keyAlias;
+            this.cipher = cipher;
+        }
+    }
+}
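For orientation, a minimal usage sketch of the factory above (not part of the patch; CipherFactoryExample is a hypothetical name, and the options are assumed to be populated from cassandra.yaml with a key provider, cipher and key alias configured):

import javax.crypto.Cipher;

import org.apache.cassandra.config.TransparentDataEncryptionOptions;
import org.apache.cassandra.security.CipherFactory;

public class CipherFactoryExample
{
    // opts is assumed to carry a key_provider, cipher (e.g. "AES/CBC/PKCS5Padding") and key_alias.
    public static byte[] roundTrip(TransparentDataEncryptionOptions opts, byte[] plainText) throws Exception
    {
        CipherFactory factory = new CipherFactory(opts);

        // Encrypt: the factory generates a fresh IV for each encryptor it hands out.
        Cipher encryptor = factory.getEncryptor(opts.cipher, opts.key_alias);
        byte[] iv = encryptor.getIV();
        byte[] cipherText = encryptor.doFinal(plainText);

        // Decrypt: the same transformation, key alias and IV must be supplied.
        Cipher decryptor = factory.getDecryptor(opts.cipher, opts.key_alias, iv);
        return decryptor.doFinal(cipherText);
    }
}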
diff --git a/src/java/org/apache/cassandra/security/EncryptionContext.java b/src/java/org/apache/cassandra/security/EncryptionContext.java
new file mode 100644
index 0000000..8176d60
--- /dev/null
+++ b/src/java/org/apache/cassandra/security/EncryptionContext.java
@@ -0,0 +1,172 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.security;
+
+import java.io.IOException;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.Map;
+import javax.crypto.Cipher;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Objects;
+
+import org.apache.cassandra.config.TransparentDataEncryptionOptions;
+import org.apache.cassandra.exceptions.ConfigurationException;
+import org.apache.cassandra.io.compress.ICompressor;
+import org.apache.cassandra.io.compress.LZ4Compressor;
+import org.apache.cassandra.utils.Hex;
+
+/**
+ * A (largely) immutable wrapper for the application-wide file-level encryption settings.
+ */
+public class EncryptionContext
+{
+    public static final String ENCRYPTION_CIPHER = "encCipher";
+    public static final String ENCRYPTION_KEY_ALIAS = "encKeyAlias";
+    public static final String ENCRYPTION_IV = "encIV";
+
+    private final TransparentDataEncryptionOptions tdeOptions;
+    private final ICompressor compressor;
+    private final CipherFactory cipherFactory;
+
+    private final byte[] iv;
+    private final int chunkLength;
+
+    public EncryptionContext()
+    {
+        this(new TransparentDataEncryptionOptions());
+    }
+
+    public EncryptionContext(TransparentDataEncryptionOptions tdeOptions)
+    {
+        this(tdeOptions, null, true);
+    }
+
+    @VisibleForTesting
+    public EncryptionContext(TransparentDataEncryptionOptions tdeOptions, byte[] iv, boolean init)
+    {
+        this.tdeOptions = tdeOptions;
+        compressor = LZ4Compressor.create(Collections.<String, String>emptyMap());
+        chunkLength = tdeOptions.chunk_length_kb * 1024;
+        this.iv = iv;
+
+        // always attempt to load the cipher factory, as we could be in the situation where the user has disabled encryption,
+        // but has existing commitlogs and sstables on disk that are still encrypted (and still need to be read)
+        CipherFactory factory = null;
+
+        if (tdeOptions.enabled && init)
+        {
+            try
+            {
+                factory = new CipherFactory(tdeOptions);
+            }
+            catch (Exception e)
+            {
+                throw new ConfigurationException("failed to load key provider for transparent data encryption", e);
+            }
+        }
+
+        cipherFactory = factory;
+    }
+
+    public ICompressor getCompressor()
+    {
+        return compressor;
+    }
+
+    public Cipher getEncryptor() throws IOException
+    {
+        return cipherFactory.getEncryptor(tdeOptions.cipher, tdeOptions.key_alias);
+    }
+
+    public Cipher getDecryptor() throws IOException
+    {
+        if (iv == null || iv.length == 0)
+            throw new IllegalStateException("no initialization vector (IV) found in this context");
+        return cipherFactory.getDecryptor(tdeOptions.cipher, tdeOptions.key_alias, iv);
+    }
+
+    public boolean isEnabled()
+    {
+        return tdeOptions.enabled;
+    }
+
+    public int getChunkLength()
+    {
+        return chunkLength;
+    }
+
+    public byte[] getIV()
+    {
+        return iv;
+    }
+
+    public TransparentDataEncryptionOptions getTransparentDataEncryptionOptions()
+    {
+        return tdeOptions;
+    }
+
+    public boolean equals(Object o)
+    {
+        return o instanceof EncryptionContext && equals((EncryptionContext) o);
+    }
+
+    public boolean equals(EncryptionContext other)
+    {
+        return Objects.equal(tdeOptions, other.tdeOptions)
+               && Objects.equal(compressor, other.compressor)
+               && Arrays.equals(iv, other.iv);
+    }
+
+    public Map<String, String> toHeaderParameters()
+    {
+        Map<String, String> map = new HashMap<>(3);
+        // add compression options, someday ...
+        if (tdeOptions.enabled)
+        {
+            map.put(ENCRYPTION_CIPHER, tdeOptions.cipher);
+            map.put(ENCRYPTION_KEY_ALIAS, tdeOptions.key_alias);
+
+            if (iv != null && iv.length > 0)
+                map.put(ENCRYPTION_IV, Hex.bytesToHex(iv));
+        }
+        return map;
+    }
+
+    /**
+     * If encryption headers are found in the {@code parameters},
+     * those headers are merged with the application-wide {@code encryptionContext}.
+     */
+    public static EncryptionContext createFromMap(Map<?, ?> parameters, EncryptionContext encryptionContext)
+    {
+        if (parameters == null || parameters.isEmpty())
+            return new EncryptionContext(new TransparentDataEncryptionOptions(false));
+
+        String keyAlias = (String)parameters.get(ENCRYPTION_KEY_ALIAS);
+        String cipher = (String)parameters.get(ENCRYPTION_CIPHER);
+        String ivString = (String)parameters.get(ENCRYPTION_IV);
+        if (keyAlias == null || cipher == null)
+            return new EncryptionContext(new TransparentDataEncryptionOptions(false));
+
+        TransparentDataEncryptionOptions tdeOptions = new TransparentDataEncryptionOptions(cipher, keyAlias, encryptionContext.getTransparentDataEncryptionOptions().key_provider);
+        byte[] iv = ivString != null ? Hex.hexToBytes(ivString) : null;
+        return new EncryptionContext(tdeOptions, iv, true);
+    }
+}
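A short sketch (illustrative only; EncryptionContextExample is a hypothetical name) of the header round trip described above: the write-side context serializes its cipher, key alias and IV into header parameters, and createFromMap merges them back with the application-wide context on read:

import java.util.Map;

import org.apache.cassandra.config.TransparentDataEncryptionOptions;
import org.apache.cassandra.security.EncryptionContext;

public class EncryptionContextExample
{
    // tdeOptions is assumed to come from cassandra.yaml with encryption enabled.
    public static EncryptionContext persistAndRestore(TransparentDataEncryptionOptions tdeOptions)
    {
        EncryptionContext writeContext = new EncryptionContext(tdeOptions);

        // The cipher, key alias and (when present) IV are captured as header parameters ...
        Map<String, String> headerParams = writeContext.toHeaderParameters();

        // ... and later merged back with the application-wide context when the file is read.
        return EncryptionContext.createFromMap(headerParams, writeContext);
    }
}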
diff --git a/src/java/org/apache/cassandra/security/EncryptionUtils.java b/src/java/org/apache/cassandra/security/EncryptionUtils.java
new file mode 100644
index 0000000..7e72b3e
--- /dev/null
+++ b/src/java/org/apache/cassandra/security/EncryptionUtils.java
@@ -0,0 +1,313 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.security;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.ReadableByteChannel;
+import java.nio.channels.WritableByteChannel;
+import javax.crypto.BadPaddingException;
+import javax.crypto.Cipher;
+import javax.crypto.IllegalBlockSizeException;
+import javax.crypto.ShortBufferException;
+
+import com.google.common.base.Preconditions;
+
+import org.apache.cassandra.db.commitlog.EncryptedSegment;
+import org.apache.cassandra.io.compress.ICompressor;
+import org.apache.cassandra.io.util.ChannelProxy;
+import org.apache.cassandra.io.util.FileDataInput;
+import org.apache.cassandra.utils.ByteBufferUtil;
+
+/**
+ * Encryption and decryption functions specific to the commit log.
+ * See comments in {@link EncryptedSegment} for details on the binary format.
+ * The normal, and expected, invocation pattern is to compress then encrypt the data on the encryption pass,
+ * then decrypt and uncompress the data on the decrypt pass.
+ */
+public class EncryptionUtils
+{
+    public static final int COMPRESSED_BLOCK_HEADER_SIZE = 4;
+    public static final int ENCRYPTED_BLOCK_HEADER_SIZE = 8;
+
+    private static final ThreadLocal<ByteBuffer> reusableBuffers = new ThreadLocal<ByteBuffer>()
+    {
+        protected ByteBuffer initialValue()
+        {
+            return ByteBuffer.allocate(ENCRYPTED_BLOCK_HEADER_SIZE);
+        }
+    };
+
+    /**
+     * Compress the raw data and manage the sizing of the {@code outputBuffer}; if the buffer is not big enough,
+     * deallocate the current buffer and allocate one that is large enough.
+     * The plain text length is written to the beginning of the buffer so that it is encapsulated in the encrypted
+     * block as well (the encrypted and compressed lengths are written by {@link #encryptAndWrite}).
+     *
+     * @return the byte buffer that was actually written to; it may be the {@code outputBuffer} if it had enough capacity,
+     * or it may be a new, larger instance. Callers should capture the returned buffer (if calling multiple times).
+     */
+    public static ByteBuffer compress(ByteBuffer inputBuffer, ByteBuffer outputBuffer, boolean allowBufferResize, ICompressor compressor) throws IOException
+    {
+        int inputLength = inputBuffer.remaining();
+        final int compressedLength = compressor.initialCompressedBufferLength(inputLength);
+        outputBuffer = ByteBufferUtil.ensureCapacity(outputBuffer, compressedLength + COMPRESSED_BLOCK_HEADER_SIZE, allowBufferResize);
+
+        outputBuffer.putInt(inputLength);
+        compressor.compress(inputBuffer, outputBuffer);
+        outputBuffer.flip();
+
+        return outputBuffer;
+    }
+
+    /**
+     * Encrypt the input data and write it out to the same input buffer; if the buffer is not big enough,
+     * deallocate the current buffer and allocate one that is large enough.
+     * The cipher text and the block headers are also written out to the channel.
+     *
+     * Note: channel is a parameter as we cannot write header info to the output buffer as we assume the input and output
+     * buffers can be the same buffer (and writing the headers to a shared buffer will corrupt any input data). Hence,
+     * we write out the headers directly to the channel, and then the cipher text (once encrypted).
+     */
+    public static ByteBuffer encryptAndWrite(ByteBuffer inputBuffer, WritableByteChannel channel, boolean allowBufferResize, Cipher cipher) throws IOException
+    {
+        final int plainTextLength = inputBuffer.remaining();
+        final int encryptLength = cipher.getOutputSize(plainTextLength);
+        ByteBuffer outputBuffer = inputBuffer.duplicate();
+        outputBuffer = ByteBufferUtil.ensureCapacity(outputBuffer, encryptLength, allowBufferResize);
+
+        // it's unfortunate that we need to allocate a small buffer here just for the headers, but if we reuse the input buffer
+        // for the output, then we would overwrite the first n bytes of the real data with the header data.
+        ByteBuffer intBuf = ByteBuffer.allocate(ENCRYPTED_BLOCK_HEADER_SIZE);
+        intBuf.putInt(0, encryptLength);
+        intBuf.putInt(4, plainTextLength);
+        channel.write(intBuf);
+
+        try
+        {
+            cipher.doFinal(inputBuffer, outputBuffer);
+        }
+        catch (ShortBufferException | IllegalBlockSizeException | BadPaddingException e)
+        {
+            throw new IOException("failed to encrypt commit log block", e);
+        }
+
+        outputBuffer.position(0).limit(encryptLength);
+        channel.write(outputBuffer);
+        outputBuffer.position(0).limit(encryptLength);
+
+        return outputBuffer;
+    }
+
+    public static ByteBuffer encrypt(ByteBuffer inputBuffer, ByteBuffer outputBuffer, boolean allowBufferResize, Cipher cipher) throws IOException
+    {
+        Preconditions.checkNotNull(outputBuffer, "output buffer may not be null");
+        return encryptAndWrite(inputBuffer, new ChannelAdapter(outputBuffer), allowBufferResize, cipher);
+    }
+
+    /**
+     * Decrypt the input data and manage the sizing of the {@code outputBuffer}; if the buffer is not big enough,
+     * deallocate the current buffer and allocate one that is large enough.
+     *
+     * @return the byte buffer that was actually written to; it may be the {@code outputBuffer} if it had enough capacity,
+     * or it may be a new, larger instance. Callers should capture the returned buffer (if calling multiple times).
+     */
+    public static ByteBuffer decrypt(ReadableByteChannel channel, ByteBuffer outputBuffer, boolean allowBufferResize, Cipher cipher) throws IOException
+    {
+        ByteBuffer metadataBuffer = reusableBuffers.get();
+        if (metadataBuffer.capacity() < ENCRYPTED_BLOCK_HEADER_SIZE)
+        {
+            metadataBuffer = ByteBufferUtil.ensureCapacity(metadataBuffer, ENCRYPTED_BLOCK_HEADER_SIZE, true);
+            reusableBuffers.set(metadataBuffer);
+        }
+
+        metadataBuffer.position(0).limit(ENCRYPTED_BLOCK_HEADER_SIZE);
+        channel.read(metadataBuffer);
+        if (metadataBuffer.remaining() < ENCRYPTED_BLOCK_HEADER_SIZE)
+            throw new IllegalStateException("could not read encrypted block metadata header");
+        int encryptedLength = metadataBuffer.getInt();
+        // this is the length of the compressed data
+        int plainTextLength = metadataBuffer.getInt();
+
+        outputBuffer = ByteBufferUtil.ensureCapacity(outputBuffer, Math.max(plainTextLength, encryptedLength), allowBufferResize);
+        outputBuffer.position(0).limit(encryptedLength);
+        channel.read(outputBuffer);
+
+        ByteBuffer dupe = outputBuffer.duplicate();
+        dupe.clear();
+
+        try
+        {
+            cipher.doFinal(outputBuffer, dupe);
+        }
+        catch (ShortBufferException | IllegalBlockSizeException | BadPaddingException e)
+        {
+            throw new IOException("failed to decrypt commit log block", e);
+        }
+
+        dupe.position(0).limit(plainTextLength);
+        return dupe;
+    }
+
+    // path used when decrypting commit log files
+    public static ByteBuffer decrypt(FileDataInput fileDataInput, ByteBuffer outputBuffer, boolean allowBufferResize, Cipher cipher) throws IOException
+    {
+        return decrypt(new DataInputReadChannel(fileDataInput), outputBuffer, allowBufferResize, cipher);
+    }
+
+    /**
+     * Uncompress the input data and manage the sizing of the {@code outputBuffer}; if the buffer is not big enough,
+     * deallocate the current buffer and allocate one that is large enough.
+     *
+     * @return the byte buffer that was actually written to; it may be the {@code outputBuffer} if it had enough capacity,
+     * or it may be a new, larger instance. Callers should capture the returned buffer (if calling multiple times).
+     */
+    public static ByteBuffer uncompress(ByteBuffer inputBuffer, ByteBuffer outputBuffer, boolean allowBufferResize, ICompressor compressor) throws IOException
+    {
+        int outputLength = inputBuffer.getInt();
+        outputBuffer = ByteBufferUtil.ensureCapacity(outputBuffer, outputLength, allowBufferResize);
+        compressor.uncompress(inputBuffer, outputBuffer);
+        outputBuffer.position(0).limit(outputLength);
+
+        return outputBuffer;
+    }
+
+    public static int uncompress(byte[] input, int inputOffset, int inputLength, byte[] output, int outputOffset, ICompressor compressor) throws IOException
+    {
+        int outputLength = readInt(input, inputOffset);
+        inputOffset += 4;
+        inputLength -= 4;
+
+        if (output.length - outputOffset < outputLength)
+        {
+            String msg = String.format("buffer to uncompress into is not large enough; buf size = %d, buf offset = %d, target size = %s",
+                                       output.length, outputOffset, outputLength);
+            throw new IllegalStateException(msg);
+        }
+
+        return compressor.uncompress(input, inputOffset, inputLength, output, outputOffset);
+    }
+
+    private static int readInt(byte[] input, int inputOffset)
+    {
+        return  (input[inputOffset + 3] & 0xFF)
+                | ((input[inputOffset + 2] & 0xFF) << 8)
+                | ((input[inputOffset + 1] & 0xFF) << 16)
+                | ((input[inputOffset] & 0xFF) << 24);
+    }
+
+    /**
+     * A simple {@link java.nio.channels.Channel} adapter for ByteBuffers.
+     */
+    private static final class ChannelAdapter implements WritableByteChannel
+    {
+        private final ByteBuffer buffer;
+
+        private ChannelAdapter(ByteBuffer buffer)
+        {
+            this.buffer = buffer;
+        }
+
+        public int write(ByteBuffer src)
+        {
+            int count = src.remaining();
+            buffer.put(src);
+            return count;
+        }
+
+        public boolean isOpen()
+        {
+            return true;
+        }
+
+        public void close()
+        {
+            // nop
+        }
+    }
+
+    private static class DataInputReadChannel implements ReadableByteChannel
+    {
+        private final FileDataInput fileDataInput;
+
+        private DataInputReadChannel(FileDataInput dataInput)
+        {
+            this.fileDataInput = dataInput;
+        }
+
+        public int read(ByteBuffer dst) throws IOException
+        {
+            int readLength = dst.remaining();
+            // we should only be performing encrypt/decrypt operations with on-heap buffers, so calling BB.array() should be legit here
+            fileDataInput.readFully(dst.array(), dst.position(), readLength);
+            return readLength;
+        }
+
+        public boolean isOpen()
+        {
+            try
+            {
+                return fileDataInput.isEOF();
+            }
+            catch (IOException e)
+            {
+                return true;
+            }
+        }
+
+        public void close()
+        {
+            // nop
+        }
+    }
+
+    public static class ChannelProxyReadChannel implements ReadableByteChannel
+    {
+        private final ChannelProxy channelProxy;
+        private volatile long currentPosition;
+
+        public ChannelProxyReadChannel(ChannelProxy channelProxy, long currentPosition)
+        {
+            this.channelProxy = channelProxy;
+            this.currentPosition = currentPosition;
+        }
+
+        public int read(ByteBuffer dst) throws IOException
+        {
+            int bytesRead = channelProxy.read(dst, currentPosition);
+            dst.flip();
+            currentPosition += bytesRead;
+            return bytesRead;
+        }
+
+        public long getCurrentPosition()
+        {
+            return currentPosition;
+        }
+
+        public boolean isOpen()
+        {
+            return channelProxy.isCleanedUp();
+        }
+
+        public void close()
+        {
+            // nop
+        }
+    }
+}
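The expected invocation pattern from the class comment — compress then encrypt on write, decrypt then uncompress on read — might look roughly like the sketch below (EncryptionUtilsExample is a hypothetical helper; buffers are grown lazily via the allowBufferResize flag, and the read-path context is assumed to carry the IV used on the write path):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.WritableByteChannel;

import org.apache.cassandra.security.EncryptionContext;
import org.apache.cassandra.security.EncryptionUtils;

public class EncryptionUtilsExample
{
    // Write path: compress the plain text, then encrypt the compressed block onto the channel.
    public static void writeBlock(EncryptionContext context, ByteBuffer plainText, WritableByteChannel channel) throws IOException
    {
        ByteBuffer compressed = EncryptionUtils.compress(plainText, ByteBuffer.allocate(0), true, context.getCompressor());
        EncryptionUtils.encryptAndWrite(compressed, channel, true, context.getEncryptor());
    }

    // Read path: decrypt the block from the channel, then uncompress it back to plain text.
    // decrypt() expects one of the read channels defined above (they leave the buffer positioned
    // at 0 after filling it); the context must carry the write-path IV.
    public static ByteBuffer readBlock(EncryptionContext context, EncryptionUtils.ChannelProxyReadChannel channel) throws IOException
    {
        ByteBuffer decrypted = EncryptionUtils.decrypt(channel, ByteBuffer.allocate(0), true, context.getDecryptor());
        return EncryptionUtils.uncompress(decrypted, ByteBuffer.allocate(0), true, context.getCompressor());
    }
}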
diff --git a/src/java/org/apache/cassandra/security/JKSKeyProvider.java b/src/java/org/apache/cassandra/security/JKSKeyProvider.java
new file mode 100644
index 0000000..db7a2b9
--- /dev/null
+++ b/src/java/org/apache/cassandra/security/JKSKeyProvider.java
@@ -0,0 +1,83 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.security;
+
+import java.io.FileInputStream;
+import java.io.IOException;
+import java.security.Key;
+import java.security.KeyStore;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.cassandra.config.TransparentDataEncryptionOptions;
+
+/**
+ * A {@code KeyProvider} that retrieves keys from a java keystore.
+ */
+public class JKSKeyProvider implements KeyProvider
+{
+    private static final Logger logger = LoggerFactory.getLogger(JKSKeyProvider.class);
+    static final String PROP_KEYSTORE = "keystore";
+    static final String PROP_KEYSTORE_PW = "keystore_password";
+    static final String PROP_KEYSTORE_TYPE = "store_type";
+    static final String PROP_KEY_PW = "key_password";
+
+    private final KeyStore store;
+    private final boolean isJceks;
+    private final TransparentDataEncryptionOptions options;
+
+    public JKSKeyProvider(TransparentDataEncryptionOptions options)
+    {
+        this.options = options;
+        logger.info("initializing keystore from file {}", options.get(PROP_KEYSTORE));
+        try (FileInputStream inputStream = new FileInputStream(options.get(PROP_KEYSTORE)))
+        {
+            store = KeyStore.getInstance(options.get(PROP_KEYSTORE_TYPE));
+            store.load(inputStream, options.get(PROP_KEYSTORE_PW).toCharArray());
+            isJceks = store.getType().equalsIgnoreCase("jceks");
+        }
+        catch (Exception e)
+        {
+            throw new RuntimeException("couldn't load keystore", e);
+        }
+    }
+
+    public Key getSecretKey(String keyAlias) throws IOException
+    {
+        // there's a lovely behavior with jceks files that all aliases are lower-cased
+        if (isJceks)
+            keyAlias = keyAlias.toLowerCase();
+
+        Key key;
+        try
+        {
+            String password = options.get(PROP_KEY_PW);
+            if (password == null || password.isEmpty())
+                password = options.get(PROP_KEYSTORE_PW);
+            key = store.getKey(keyAlias, password.toCharArray());
+        }
+        catch (Exception e)
+        {
+            throw new IOException("unable to load key from keystore", e);
+        }
+        if (key == null)
+            throw new IOException(String.format("key %s was not found in keystore", keyAlias));
+        return key;
+    }
+}
diff --git a/src/java/org/apache/cassandra/security/KeyProvider.java b/src/java/org/apache/cassandra/security/KeyProvider.java
new file mode 100644
index 0000000..f380aed
--- /dev/null
+++ b/src/java/org/apache/cassandra/security/KeyProvider.java
@@ -0,0 +1,33 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.security;
+
+import java.io.IOException;
+import java.security.Key;
+
+/**
+ * Customizable key retrieval mechanism. Implementations should expect that retrieved keys will be cached.
+ * Further, each key will be requested non-concurrently (that is, no stampeding herds for the same key), although
+ * unique keys may be requested concurrently (unless you mark {@code getSecretKey} synchronized).
+ *
+ * Implementations must provide a constructor that accepts {@code TransparentDataEncryptionOptions} as the sole parameter.
+ */
+public interface KeyProvider
+{
+    Key getSecretKey(String alias) throws IOException;
+}
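As an illustration of the constructor contract noted in the javadoc, a hypothetical provider might look like the sketch below (StaticKeyProvider and the raw_key option are invented for the example; real deployments would use JKSKeyProvider or an external key-management integration):

import java.io.IOException;
import java.security.Key;
import javax.crypto.spec.SecretKeySpec;

import org.apache.cassandra.config.TransparentDataEncryptionOptions;
import org.apache.cassandra.security.KeyProvider;
import org.apache.cassandra.utils.Hex;

// Hypothetical provider that derives a fixed AES key from a configured hex string.
public class StaticKeyProvider implements KeyProvider
{
    private final byte[] rawKey;

    // This constructor signature is the contract CipherFactory relies on.
    public StaticKeyProvider(TransparentDataEncryptionOptions options)
    {
        rawKey = Hex.hexToBytes(options.get("raw_key")); // "raw_key" is an invented option name
    }

    public Key getSecretKey(String alias) throws IOException
    {
        return new SecretKeySpec(rawKey, "AES");
    }
}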
diff --git a/src/java/org/apache/cassandra/security/SSLFactory.java b/src/java/org/apache/cassandra/security/SSLFactory.java
index a327de9..2e59b06 100644
--- a/src/java/org/apache/cassandra/security/SSLFactory.java
+++ b/src/java/org/apache/cassandra/security/SSLFactory.java
@@ -31,6 +31,7 @@
 
 import javax.net.ssl.KeyManagerFactory;
 import javax.net.ssl.SSLContext;
+import javax.net.ssl.SSLParameters;
 import javax.net.ssl.SSLServerSocket;
 import javax.net.ssl.SSLSocket;
 import javax.net.ssl.TrustManager;
@@ -53,7 +54,6 @@
 public final class SSLFactory
 {
     private static final Logger logger = LoggerFactory.getLogger(SSLFactory.class);
-    public static final String[] ACCEPTED_PROTOCOLS = new String[] {"SSLv2Hello", "TLSv1", "TLSv1.1", "TLSv1.2"};
     private static boolean checkedExpiry = false;
 
     public static SSLServerSocket getServerSocket(EncryptionOptions options, InetAddress address, int port) throws IOException
@@ -61,11 +61,9 @@
         SSLContext ctx = createSSLContext(options, true);
         SSLServerSocket serverSocket = (SSLServerSocket)ctx.getServerSocketFactory().createServerSocket();
         serverSocket.setReuseAddress(true);
-        String[] suites = filterCipherSuites(serverSocket.getSupportedCipherSuites(), options.cipher_suites);
-        serverSocket.setEnabledCipherSuites(suites);
-        serverSocket.setNeedClientAuth(options.require_client_auth);
-        serverSocket.setEnabledProtocols(ACCEPTED_PROTOCOLS);
+        prepareSocket(serverSocket, options);
         serverSocket.bind(new InetSocketAddress(address, port), 500);
+
         return serverSocket;
     }
 
@@ -74,9 +72,7 @@
     {
         SSLContext ctx = createSSLContext(options, true);
         SSLSocket socket = (SSLSocket) ctx.getSocketFactory().createSocket(address, port, localAddress, localPort);
-        String[] suites = filterCipherSuites(socket.getSupportedCipherSuites(), options.cipher_suites);
-        socket.setEnabledCipherSuites(suites);
-        socket.setEnabledProtocols(ACCEPTED_PROTOCOLS);
+        prepareSocket(socket, options);
         return socket;
     }
 
@@ -85,9 +81,7 @@
     {
         SSLContext ctx = createSSLContext(options, true);
         SSLSocket socket = (SSLSocket) ctx.getSocketFactory().createSocket(address, port);
-        String[] suites = filterCipherSuites(socket.getSupportedCipherSuites(), options.cipher_suites);
-        socket.setEnabledCipherSuites(suites);
-        socket.setEnabledProtocols(ACCEPTED_PROTOCOLS);
+        prepareSocket(socket, options);
         return socket;
     }
 
@@ -96,12 +90,37 @@
     {
         SSLContext ctx = createSSLContext(options, true);
         SSLSocket socket = (SSLSocket) ctx.getSocketFactory().createSocket();
-        String[] suites = filterCipherSuites(socket.getSupportedCipherSuites(), options.cipher_suites);
-        socket.setEnabledCipherSuites(suites);
-        socket.setEnabledProtocols(ACCEPTED_PROTOCOLS);
+        prepareSocket(socket, options);
         return socket;
     }
 
+    /** Sets relevant socket options specified in encryption settings */
+    private static void prepareSocket(SSLServerSocket serverSocket, EncryptionOptions options)
+    {
+        String[] suites = filterCipherSuites(serverSocket.getSupportedCipherSuites(), options.cipher_suites);
+        if (options.require_endpoint_verification)
+        {
+            SSLParameters sslParameters = serverSocket.getSSLParameters();
+            sslParameters.setEndpointIdentificationAlgorithm("HTTPS");
+            serverSocket.setSSLParameters(sslParameters);
+        }
+        serverSocket.setEnabledCipherSuites(suites);
+        serverSocket.setNeedClientAuth(options.require_client_auth);
+    }
+
+    /** Sets relevant socket options specified in encryption settings */
+    private static void prepareSocket(SSLSocket socket, EncryptionOptions options)
+    {
+        String[] suites = filterCipherSuites(socket.getSupportedCipherSuites(), options.cipher_suites);
+        if (options.require_endpoint_verification)
+        {
+            SSLParameters sslParameters = socket.getSSLParameters();
+            sslParameters.setEndpointIdentificationAlgorithm("HTTPS");
+            socket.setSSLParameters(sslParameters);
+        }
+        socket.setEnabledCipherSuites(suites);
+    }
+
     @SuppressWarnings("resource")
     public static SSLContext createSSLContext(EncryptionOptions options, boolean buildTruststore) throws IOException
     {
diff --git a/src/java/org/apache/cassandra/serializers/AsciiSerializer.java b/src/java/org/apache/cassandra/serializers/AsciiSerializer.java
index b013b23..e265cb2 100644
--- a/src/java/org/apache/cassandra/serializers/AsciiSerializer.java
+++ b/src/java/org/apache/cassandra/serializers/AsciiSerializer.java
@@ -35,7 +35,7 @@
         for (int i = bytes.position(); i < bytes.limit(); i++)
         {
             byte b = bytes.get(i);
-            if (b < 0 || b > 127)
+            if (b < 0)
                 throw new MarshalException("Invalid byte for ascii: " + Byte.toString(b));
         }
     }
diff --git a/src/java/org/apache/cassandra/serializers/ByteSerializer.java b/src/java/org/apache/cassandra/serializers/ByteSerializer.java
index 8c736cb..9d34fbc 100644
--- a/src/java/org/apache/cassandra/serializers/ByteSerializer.java
+++ b/src/java/org/apache/cassandra/serializers/ByteSerializer.java
@@ -28,7 +28,7 @@
 
     public Byte deserialize(ByteBuffer bytes)
     {
-        return bytes.remaining() == 0 ? null : bytes.get(bytes.position());
+        return bytes == null || bytes.remaining() == 0 ? null : bytes.get(bytes.position());
     }
 
     public ByteBuffer serialize(Byte value)
diff --git a/src/java/org/apache/cassandra/serializers/TypeSerializer.java b/src/java/org/apache/cassandra/serializers/TypeSerializer.java
index e66c36d..bf197cb 100644
--- a/src/java/org/apache/cassandra/serializers/TypeSerializer.java
+++ b/src/java/org/apache/cassandra/serializers/TypeSerializer.java
@@ -23,11 +23,17 @@
 public interface TypeSerializer<T>
 {
     public ByteBuffer serialize(T value);
+
+    /*
+     * Does not modify the position or limit of the buffer even temporarily.
+     */
     public T deserialize(ByteBuffer bytes);
 
     /*
      * Validate that the byte array is a valid sequence for the type this represents.
      * This guarantees deserialize() can be called without errors.
+     *
+     * Does not modify the position or limit of the buffer even temporarily
      */
     public void validate(ByteBuffer bytes) throws MarshalException;
 
diff --git a/src/java/org/apache/cassandra/serializers/UTF8Serializer.java b/src/java/org/apache/cassandra/serializers/UTF8Serializer.java
index e3ea2d5..e7a5854 100644
--- a/src/java/org/apache/cassandra/serializers/UTF8Serializer.java
+++ b/src/java/org/apache/cassandra/serializers/UTF8Serializer.java
@@ -94,10 +94,8 @@
                             if (b == (byte)0xf0)
                                 // 0xf0, 0x90-0xbf, 0x80-0xbf, 0x80-0xbf
                                 state = State.FOUR_90bf;
-                            else if (b == (byte)0xf4)
-                                // 0xf4, 0x80-0xbf, 0x80-0xbf, 0x80-0xbf
-                                state = State.FOUR_80bf_3;
                             else
+                                // 0xf4, 0x80-0xbf, 0x80-0xbf, 0x80-0xbf
                                 // 0xf1-0xf3, 0x80-0xbf, 0x80-0xbf, 0x80-0xbf
                                 state = State.FOUR_80bf_3;
                             break;
diff --git a/src/java/org/apache/cassandra/serializers/UUIDSerializer.java b/src/java/org/apache/cassandra/serializers/UUIDSerializer.java
index f8e2582..4501f34 100644
--- a/src/java/org/apache/cassandra/serializers/UUIDSerializer.java
+++ b/src/java/org/apache/cassandra/serializers/UUIDSerializer.java
@@ -34,7 +34,7 @@
 
     public ByteBuffer serialize(UUID value)
     {
-        return value == null ? ByteBufferUtil.EMPTY_BYTE_BUFFER : ByteBuffer.wrap(UUIDGen.decompose(value));
+        return value == null ? ByteBufferUtil.EMPTY_BYTE_BUFFER : UUIDGen.toByteBuffer(value);
     }
 
     public void validate(ByteBuffer bytes) throws MarshalException
diff --git a/src/java/org/apache/cassandra/service/ActiveRepairService.java b/src/java/org/apache/cassandra/service/ActiveRepairService.java
index 09154c2..04e39db 100644
--- a/src/java/org/apache/cassandra/service/ActiveRepairService.java
+++ b/src/java/org/apache/cassandra/service/ActiveRepairService.java
@@ -160,7 +160,7 @@
                 gossiper.unregister(session);
                 sessions.remove(session.getId());
             }
-        }, MoreExecutors.sameThreadExecutor());
+        }, MoreExecutors.directExecutor());
         session.start(executor);
         return session;
     }
@@ -439,7 +439,7 @@
             {
                 removeParentRepairSession(parentRepairSession);
             }
-        }, MoreExecutors.sameThreadExecutor());
+        }, MoreExecutors.directExecutor());
 
         return allAntiCompactionResults;
     }
@@ -611,7 +611,7 @@
                                !(sstable.metadata.isIndex()) && // exclude SSTables from 2i
                                new Bounds<>(sstable.first.getToken(), sstable.last.getToken()).intersects(ranges);
                     }
-                }, true);
+                }, true, false);
 
                 if (isAlreadyRepairing(cfId, parentSessionId, snapshottedSSTables))
                 {
diff --git a/src/java/org/apache/cassandra/service/AsyncRepairCallback.java b/src/java/org/apache/cassandra/service/AsyncRepairCallback.java
index dec5319..4e70d56 100644
--- a/src/java/org/apache/cassandra/service/AsyncRepairCallback.java
+++ b/src/java/org/apache/cassandra/service/AsyncRepairCallback.java
@@ -46,7 +46,7 @@
         {
             StageManager.getStage(Stage.READ_REPAIR).execute(new WrappedRunnable()
             {
-                protected void runMayThrow() throws DigestMismatchException, IOException
+                protected void runMayThrow()
                 {
                     repairResolver.resolve();
                 }
diff --git a/src/java/org/apache/cassandra/service/CacheService.java b/src/java/org/apache/cassandra/service/CacheService.java
index c51a5d1..625d687 100644
--- a/src/java/org/apache/cassandra/service/CacheService.java
+++ b/src/java/org/apache/cassandra/service/CacheService.java
@@ -40,10 +40,8 @@
 import org.apache.cassandra.cache.AutoSavingCache.CacheSerializer;
 import org.apache.cassandra.concurrent.Stage;
 import org.apache.cassandra.concurrent.StageManager;
-import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.ColumnDefinition;
 import org.apache.cassandra.config.DatabaseDescriptor;
-import org.apache.cassandra.config.Schema;
 import org.apache.cassandra.db.*;
 import org.apache.cassandra.db.rows.*;
 import org.apache.cassandra.db.filter.*;
@@ -64,7 +62,7 @@
 
     public static final String MBEAN_NAME = "org.apache.cassandra.db:type=Caches";
 
-    public static enum CacheType
+    public enum CacheType
     {
         KEY_CACHE("KeyCache"),
         ROW_CACHE("RowCache"),
@@ -72,7 +70,7 @@
 
         private final String name;
 
-        private CacheType(String typeName)
+        CacheType(String typeName)
         {
             name = typeName;
         }
@@ -391,7 +389,8 @@
 
                     ClusteringIndexFilter filter = new ClusteringIndexNamesFilter(FBUtilities.<Clustering>singleton(name.clustering, cfs.metadata.comparator), false);
                     SinglePartitionReadCommand cmd = SinglePartitionReadCommand.create(cfs.metadata, nowInSec, key, builder.build(), filter);
-                    try (OpOrder.Group op = cfs.readOrdering.start(); RowIterator iter = UnfilteredRowIterators.filter(cmd.queryMemtableAndDisk(cfs, op), nowInSec))
+                    try (ReadExecutionController controller = cmd.executionController();
+                         RowIterator iter = UnfilteredRowIterators.filter(cmd.queryMemtableAndDisk(cfs, controller), nowInSec))
                     {
                         Cell cell;
                         if (column.isStatic())
@@ -430,9 +429,9 @@
             //Keyspace and CF name are deserialized by AutoSaving cache and used to fetch the CFS provided as a
             //parameter so they aren't deserialized here, even though they are serialized by this serializer
             final ByteBuffer buffer = ByteBufferUtil.readWithLength(in);
-            final int rowsToCache = cfs.metadata.params.caching.rowsPerPartitionToCache();
             if (cfs == null  || !cfs.isRowCacheEnabled())
                 return null;
+            final int rowsToCache = cfs.metadata.params.caching.rowsPerPartitionToCache();
             assert(!cfs.isIndex());//Shouldn't have row cache entries for indexes
 
             return StageManager.getStage(Stage.READ).submit(new Callable<Pair<RowCacheKey, IRowCacheEntry>>()
@@ -441,7 +440,8 @@
                 {
                     DecoratedKey key = cfs.decorateKey(buffer);
                     int nowInSec = FBUtilities.nowInSeconds();
-                    try (OpOrder.Group op = cfs.readOrdering.start(); UnfilteredRowIterator iter = SinglePartitionReadCommand.fullPartitionRead(cfs.metadata, nowInSec, key).queryMemtableAndDisk(cfs, op))
+                    SinglePartitionReadCommand cmd = SinglePartitionReadCommand.fullPartitionRead(cfs.metadata, nowInSec, key);
+                    try (ReadExecutionController controller = cmd.executionController(); UnfilteredRowIterator iter = cmd.queryMemtableAndDisk(cfs, controller))
                     {
                         CachedPartition toCache = CachedBTreePartition.create(DataLimits.cqlLimits(rowsToCache).filter(iter, nowInSec), nowInSec);
                         return Pair.create(new RowCacheKey(cfs.metadata.ksAndCFName, key), (IRowCacheEntry)toCache);
@@ -467,7 +467,9 @@
             ByteBufferUtil.writeWithLength(key.key, out);
             out.writeInt(key.desc.generation);
             out.writeBoolean(true);
-            key.desc.getFormat().getIndexSerializer(cfs.metadata, key.desc.version, SerializationHeader.forKeyCache(cfs.metadata)).serialize(entry, out);
+
+            SerializationHeader header = new SerializationHeader(false, cfs.metadata, cfs.metadata.partitionColumns(), EncodingStats.NO_STATS);
+            key.desc.getFormat().getIndexSerializer(cfs.metadata, key.desc.version, header).serializeForCache(entry, out);
         }
 
         public Future<Pair<KeyCacheKey, RowIndexEntry>> deserialize(DataInputPlus input, ColumnFamilyStore cfs) throws IOException
@@ -483,20 +485,20 @@
             ByteBuffer key = ByteBufferUtil.read(input, keyLength);
             int generation = input.readInt();
             input.readBoolean(); // backwards compatibility for "promoted indexes" boolean
-            SSTableReader reader = null;
+            SSTableReader reader;
             if (cfs == null || !cfs.isKeyCacheEnabled() || (reader = findDesc(generation, cfs.getSSTables(SSTableSet.CANONICAL))) == null)
             {
                 // The sstable doesn't exist anymore, so we can't be sure of the exact version and assume its the current version. The only case where we'll be
                 // wrong is during upgrade, in which case we fail at deserialization. This is not a huge deal however since 1) this is unlikely enough that
                 // this won't affect many users (if any) and only once, 2) this doesn't prevent the node from starting and 3) CASSANDRA-10219 shows that this
                 // part of the code has been broken for a while without anyone noticing (it is, btw, still broken until CASSANDRA-10219 is fixed).
-                RowIndexEntry.Serializer.skip(input, BigFormat.instance.getLatestVersion());
+                RowIndexEntry.Serializer.skipForCache(input, BigFormat.instance.getLatestVersion());
                 return null;
             }
             RowIndexEntry.IndexSerializer<?> indexSerializer = reader.descriptor.getFormat().getIndexSerializer(reader.metadata,
                                                                                                                 reader.descriptor.version,
-                                                                                                                SerializationHeader.forKeyCache(cfs.metadata));
-            RowIndexEntry entry = indexSerializer.deserialize(input);
+                                                                                                                reader.header);
+            RowIndexEntry<?> entry = indexSerializer.deserializeForCache(input);
             return Futures.immediateFuture(Pair.create(new KeyCacheKey(cfs.metadata.ksAndCFName, reader.descriptor, key), entry));
         }
 
diff --git a/src/java/org/apache/cassandra/service/CassandraDaemon.java b/src/java/org/apache/cassandra/service/CassandraDaemon.java
index d7a90f0..0151208 100644
--- a/src/java/org/apache/cassandra/service/CassandraDaemon.java
+++ b/src/java/org/apache/cassandra/service/CassandraDaemon.java
@@ -22,39 +22,33 @@
 import java.lang.management.ManagementFactory;
 import java.lang.management.MemoryPoolMXBean;
 import java.net.InetAddress;
+import java.net.URL;
 import java.net.UnknownHostException;
-import java.rmi.registry.LocateRegistry;
-import java.rmi.server.RMIServerSocketFactory;
-import java.util.Collections;
 import java.util.List;
-import java.util.Map;
 import java.util.concurrent.TimeUnit;
-
 import javax.management.MBeanServer;
 import javax.management.ObjectName;
 import javax.management.StandardMBean;
 import javax.management.remote.JMXConnectorServer;
-import javax.management.remote.JMXServiceURL;
-import javax.management.remote.rmi.RMIConnectorServer;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.util.concurrent.Futures;
+import com.google.common.util.concurrent.ListenableFuture;
+import com.google.common.util.concurrent.Uninterruptibles;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
 
 import com.addthis.metrics3.reporter.config.ReporterConfig;
 import com.codahale.metrics.Meter;
 import com.codahale.metrics.MetricRegistryListener;
 import com.codahale.metrics.SharedMetricRegistries;
-import com.google.common.annotations.VisibleForTesting;
-import com.google.common.util.concurrent.Futures;
-import com.google.common.util.concurrent.ListenableFuture;
-import com.google.common.util.concurrent.Uninterruptibles;
-
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
-import org.apache.cassandra.concurrent.*;
+import org.apache.cassandra.batchlog.LegacyBatchlogMigrator;
+import org.apache.cassandra.concurrent.ScheduledExecutors;
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.config.Schema;
+import org.apache.cassandra.cql3.functions.ThreadAwareSecurityManager;
 import org.apache.cassandra.db.*;
-import org.apache.cassandra.batchlog.LegacyBatchlogMigrator;
 import org.apache.cassandra.db.commitlog.CommitLog;
 import org.apache.cassandra.exceptions.ConfigurationException;
 import org.apache.cassandra.exceptions.StartupException;
@@ -67,7 +61,6 @@
 import org.apache.cassandra.metrics.DefaultNameFactory;
 import org.apache.cassandra.metrics.StorageMetrics;
 import org.apache.cassandra.schema.LegacySchemaMigrator;
-import org.apache.cassandra.cql3.functions.ThreadAwareSecurityManager;
 import org.apache.cassandra.thrift.ThriftServer;
 import org.apache.cassandra.tracing.Tracing;
 import org.apache.cassandra.utils.*;
@@ -81,7 +74,6 @@
 public class CassandraDaemon
 {
     public static final String MBEAN_NAME = "org.apache.cassandra.db:type=NativeAccess";
-    private static JMXConnectorServer jmxServer = null;
 
     private static final Logger logger;
     static {
@@ -105,22 +97,46 @@
 
     private void maybeInitJmx()
     {
+        // If the standard com.sun.management.jmxremote.port property has been set
+        // then the JVM agent will have already started up a default JMX connector
+        // server. This behaviour is deprecated, but some clients may be relying
+        // on it, so log a warning and skip setting up the server with the settings
+        // as configured in cassandra-env.(sh|ps1)
+        // See: CASSANDRA-11540 & CASSANDRA-11725
         if (System.getProperty("com.sun.management.jmxremote.port") != null)
+        {
+            logger.warn("JMX settings in cassandra-env.sh have been bypassed as the JMX connector server is " +
+                        "already initialized. Please refer to cassandra-env.(sh|ps1) for JMX configuration info");
             return;
+        }
 
-        String jmxPort = System.getProperty("cassandra.jmx.local.port");
+        System.setProperty("java.rmi.server.randomIDs", "true");
+
+        // If a remote port has been specified then use that to set up a JMX
+        // connector server which can be accessed remotely. Otherwise, look
+        // for the local port property and create a server which is bound
+        // only to the loopback address. Auth options are applied to both
+        // remote and local-only servers, but currently SSL is only
+        // available for remote.
+        // If neither the remote nor the local port is set in cassandra-env.(sh|ps1),
+        // then JMX is effectively disabled.
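+        // For example (the port value is illustrative only):
+        //   -Dcassandra.jmx.remote.port=7199   -> remotely accessible connector server
+        //   -Dcassandra.jmx.local.port=7199    -> connector server bound to loopback only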
+        boolean localOnly = false;
+        String jmxPort = System.getProperty("cassandra.jmx.remote.port");
+
+        if (jmxPort == null)
+        {
+            localOnly = true;
+            jmxPort = System.getProperty("cassandra.jmx.local.port");
+        }
+
         if (jmxPort == null)
             return;
 
-        System.setProperty("java.rmi.server.hostname", InetAddress.getLoopbackAddress().getHostAddress());
-        RMIServerSocketFactory serverFactory = new RMIServerSocketFactoryImpl();
-        Map<String, ?> env = Collections.singletonMap(RMIConnectorServer.RMI_SERVER_SOCKET_FACTORY_ATTRIBUTE, serverFactory);
         try
         {
-            LocateRegistry.createRegistry(Integer.valueOf(jmxPort), null, serverFactory);
-            JMXServiceURL url = new JMXServiceURL(String.format("service:jmx:rmi://localhost/jndi/rmi://localhost:%s/jmxrmi", jmxPort));
-            jmxServer = new RMIConnectorServer(url, env, ManagementFactory.getPlatformMBeanServer());
-            jmxServer.start();
+            jmxServer = JMXServerUtils.createJMXServer(Integer.parseInt(jmxPort), localOnly);
+            if (jmxServer == null)
+                return;
         }
         catch (IOException e)
         {
@@ -132,6 +148,7 @@
 
     public Server thriftServer;
     private NativeTransportService nativeTransportService;
+    private JMXConnectorServer jmxServer;
 
     private final boolean runManaged;
     protected final StartupChecks startupChecks;
@@ -242,7 +259,16 @@
                 continue;
 
             for (CFMetaData cfm : Schema.instance.getTablesAndViews(keyspaceName))
-                ColumnFamilyStore.scrubDataDirectories(cfm);
+            {
+                try
+                {
+                    ColumnFamilyStore.scrubDataDirectories(cfm);
+                }
+                catch (StartupException e)
+                {
+                    exitOrFail(e.returnCode, e.getMessage(), e.getCause());
+                }
+            }
         }
 
         Keyspace.setInitialized();
@@ -283,10 +309,10 @@
             logger.warn("Unable to start GCInspector (currently only supported on the Sun JVM)");
         }
 
-        // replay the log if necessary
+        // Replay any CommitLogSegments found on disk
         try
         {
-            CommitLog.instance.recover();
+            CommitLog.instance.recoverSegmentsOnDisk();
         }
         catch (IOException e)
         {
@@ -315,24 +341,29 @@
             }
         }
 
-        Runnable viewRebuild = new Runnable()
-        {
-            @Override
-            public void run()
-            {
-                for (Keyspace keyspace : Keyspace.all())
-                {
-                    keyspace.viewManager.buildAllViews();
-                }
-                logger.debug("Completed submission of build tasks for any materialized views defined at startup");
-            }
-        };
-
-        ScheduledExecutors.optionalTasks.schedule(viewRebuild, StorageService.RING_DELAY, TimeUnit.MILLISECONDS);
-
-
         SystemKeyspace.finishStartup();
 
+        // Metrics
+        String metricsReporterConfigFile = System.getProperty("cassandra.metricsReporterConfigFile");
+        if (metricsReporterConfigFile != null)
+        {
+            logger.info("Trying to load metrics-reporter-config from file: {}", metricsReporterConfigFile);
+            try
+            {
+                URL resource = CassandraDaemon.class.getClassLoader().getResource(metricsReporterConfigFile);
+                if (resource == null)
+                {
+                    logger.warn("Failed to load metrics-reporter-config, file does not exist: {}", metricsReporterConfigFile);
+                }
+                else
+                {
+                    String reportFileLocation = resource.getFile();
+                    ReporterConfig.loadFromFile(reportFileLocation).enableAll(CassandraMetricsRegistry.Metrics);
+                }
+            }
+            catch (Exception e)
+            {
+                logger.warn("Failed to load metrics-reporter-config, metric sinks will not be activated", e);
+            }
+        }
+
         // start server internals
         StorageService.instance.registerDaemon(this);
         try
@@ -345,23 +376,19 @@
             exitOrFail(1, "Fatal configuration error", e);
         }
 
-        Mx4jTool.maybeLoad();
+        // Because we are writing to the system_distributed keyspace, this should happen after that is created, which
+        // happens in StorageService.instance.initServer()
+        Runnable viewRebuild = () -> {
+            for (Keyspace keyspace : Keyspace.all())
+            {
+                keyspace.viewManager.buildAllViews();
+            }
+            logger.debug("Completed submission of build tasks for any materialized views defined at startup");
+        };
 
-        // Metrics
-        String metricsReporterConfigFile = System.getProperty("cassandra.metricsReporterConfigFile");
-        if (metricsReporterConfigFile != null)
-        {
-            logger.info("Trying to load metrics-reporter-config from file: {}", metricsReporterConfigFile);
-            try
-            {
-                String reportFileLocation = CassandraDaemon.class.getClassLoader().getResource(metricsReporterConfigFile).getFile();
-                ReporterConfig.loadFromFile(reportFileLocation).enableAll(CassandraMetricsRegistry.Metrics);
-            }
-            catch (Exception e)
-            {
-                logger.warn("Failed to load metrics-reporter-config, metric sinks will not be activated", e);
-            }
-        }
+        ScheduledExecutors.optionalTasks.schedule(viewRebuild, StorageService.RING_DELAY, TimeUnit.MILLISECONDS);
+
+        Mx4jTool.maybeLoad();
 
         if (!FBUtilities.getBroadcastAddress().equals(InetAddress.getLoopbackAddress()))
             waitForGossipToSettle();
@@ -429,7 +456,9 @@
 	        }
 
 	        logger.info("JVM vendor/version: {}/{}", System.getProperty("java.vm.name"), System.getProperty("java.version"));
-	        logger.info("Heap size: {}/{}", Runtime.getRuntime().totalMemory(), Runtime.getRuntime().maxMemory());
+	        logger.info("Heap size: {}/{}",
+                        FBUtilities.prettyPrintMemory(Runtime.getRuntime().totalMemory()),
+                        FBUtilities.prettyPrintMemory(Runtime.getRuntime().maxMemory()));
 
 	        for(MemoryPoolMXBean pool: ManagementFactory.getMemoryPoolMXBeans())
 	            logger.info("{} {}: {}", pool.getName(), pool.getType(), pool.getPeakUsage());
@@ -626,7 +655,8 @@
         stop();
         destroy();
         // completely shut down cassandra
-        if(!runManaged) {
+        if(!runManaged)
+        {
             System.exit(0);
         }
     }
@@ -686,21 +716,24 @@
         instance.activate();
     }
 
-    private void exitOrFail(int code, String message) {
+    private void exitOrFail(int code, String message)
+    {
         exitOrFail(code, message, null);
     }
 
-    private void exitOrFail(int code, String message, Throwable cause) {
-            if(runManaged) {
-                RuntimeException t = cause!=null ? new RuntimeException(message, cause) : new RuntimeException(message);
-                throw t;
-            }
-            else {
-                logger.error(message, cause);
-                System.exit(code);
-            }
-
+    private void exitOrFail(int code, String message, Throwable cause)
+    {
+        if (runManaged)
+        {
+            RuntimeException t = cause != null ? new RuntimeException(message, cause) : new RuntimeException(message);
+            throw t;
         }
+        else
+        {
+            logger.error(message, cause);
+            System.exit(code);
+        }
+    }
 
     static class NativeAccess implements NativeAccessMBean
     {
diff --git a/src/java/org/apache/cassandra/service/ClientState.java b/src/java/org/apache/cassandra/service/ClientState.java
index 43002d2..b131701 100644
--- a/src/java/org/apache/cassandra/service/ClientState.java
+++ b/src/java/org/apache/cassandra/service/ClientState.java
@@ -29,6 +29,7 @@
 
 import org.apache.cassandra.auth.*;
 import org.apache.cassandra.config.Config;
+import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.config.Schema;
 import org.apache.cassandra.cql3.QueryHandler;
@@ -293,6 +294,12 @@
         hasAccess(keyspace, perm, DataResource.table(keyspace, columnFamily));
     }
 
+    public void hasColumnFamilyAccess(CFMetaData cfm, Permission perm)
+    throws UnauthorizedException, InvalidRequestException
+    {
+        hasAccess(cfm.ksName, perm, cfm.resource);
+    }
+
     private void hasAccess(String keyspace, Permission perm, DataResource resource)
     throws UnauthorizedException, InvalidRequestException
     {
@@ -311,7 +318,7 @@
 
     public void ensureHasPermission(Permission perm, IResource resource) throws UnauthorizedException
     {
-        if (DatabaseDescriptor.getAuthorizer() instanceof AllowAllAuthorizer)
+        if (!DatabaseDescriptor.getAuthorizer().requireAuthorization())
             return;
 
         // Access to built in functions is unrestricted
@@ -327,7 +334,7 @@
     public void ensureHasPermission(Permission permission, Function function)
     {
         // Save creating a FunctionResource if we don't need to
-        if (DatabaseDescriptor.getAuthorizer() instanceof AllowAllAuthorizer)
+        if (!DatabaseDescriptor.getAuthorizer().requireAuthorization())
             return;
 
         // built in functions are always available to all
diff --git a/src/java/org/apache/cassandra/service/DataResolver.java b/src/java/org/apache/cassandra/service/DataResolver.java
index 4e5bfb8..2c1b347 100644
--- a/src/java/org/apache/cassandra/service/DataResolver.java
+++ b/src/java/org/apache/cassandra/service/DataResolver.java
@@ -28,6 +28,7 @@
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.db.*;
 import org.apache.cassandra.db.filter.ClusteringIndexFilter;
+import org.apache.cassandra.db.filter.ColumnFilter;
 import org.apache.cassandra.db.filter.DataLimits;
 import org.apache.cassandra.db.partitions.*;
 import org.apache.cassandra.db.rows.*;
@@ -173,7 +174,7 @@
             // For each source, the time of the current deletion as known by the source.
             private final DeletionTime[] sourceDeletionTime = new DeletionTime[sources.length];
             // For each source, record if there is an open range to send as repair, and from where.
-            private final Slice.Bound[] markerToRepair = new Slice.Bound[sources.length];
+            private final ClusteringBound[] markerToRepair = new ClusteringBound[sources.length];
 
             public MergeListener(DecoratedKey partitionKey, PartitionColumns columns, boolean isReversed)
             {
@@ -203,10 +204,22 @@
 
                     public void onCell(int i, Clustering clustering, Cell merged, Cell original)
                     {
-                        if (merged != null && !merged.equals(original))
+                        if (merged != null && !merged.equals(original) && isQueried(merged))
                             currentRow(i, clustering).addCell(merged);
                     }
 
+                    private boolean isQueried(Cell cell)
+                    {
+                        // When we read, we may have some cells that have been fetched but are not selected by the user. Those cells may
+                        // have empty values as an optimization (see CASSANDRA-10655) and hence should not be included in the read-repair.
+                        // This is fine since those columns are not actually requested by the user and are only present for the sake of CQL
+                        // semantics (making sure we can always distinguish between a row that doesn't exist and one that does exist but has
+                        // no value for the columns requested by the user), so it won't be unexpected by the user that those columns are
+                        // not repaired.
+                        ColumnDefinition column = cell.column();
+                        ColumnFilter filter = command.columnFilter();
+                        return column.isComplex() ? filter.fetchedCellIsQueried(column, cell.path()) : filter.fetchedColumnIsQueried(column);
+                    }
                 };
             }
 
@@ -318,8 +331,8 @@
 
                         if (merged.isClose(isReversed))
                         {
-                            // We're closing the merged range. If we've marked the source as needing to be repaired for
-                            // that range, close and add it to the repair to be sent.
+                            // We're closing the merged range. If we've recorded that this should be repaired for the
+                            // source, close it and add that range to the repair to send.
                             if (markerToRepair[i] != null)
                                 closeOpenMarker(i, merged.closeBound(isReversed));
 
@@ -342,9 +355,9 @@
                     mergedDeletionTime = merged.isOpen(isReversed) ? merged.openDeletionTime(isReversed) : null;
             }
 
-            private void closeOpenMarker(int i, Slice.Bound close)
+            private void closeOpenMarker(int i, ClusteringBound close)
             {
-                Slice.Bound open = markerToRepair[i];
+                ClusteringBound open = markerToRepair[i];
                 update(i).add(new RangeTombstone(Slice.make(isReversed ? close : open, isReversed ? open : close), currentDeletion()));
                 markerToRepair[i] = null;
             }
diff --git a/src/java/org/apache/cassandra/service/DefaultFSErrorHandler.java b/src/java/org/apache/cassandra/service/DefaultFSErrorHandler.java
index 88a1fce..c653683 100644
--- a/src/java/org/apache/cassandra/service/DefaultFSErrorHandler.java
+++ b/src/java/org/apache/cassandra/service/DefaultFSErrorHandler.java
@@ -39,7 +39,7 @@
     @Override
     public void handleCorruptSSTable(CorruptSSTableException e)
     {
-        if (!StorageService.instance.isSetupCompleted())
+        if (!StorageService.instance.isDaemonSetupCompleted())
             handleStartupFSError(e);
 
         JVMStabilityInspector.inspectThrowable(e);
@@ -54,7 +54,7 @@
     @Override
     public void handleFSError(FSError e)
     {
-        if (!StorageService.instance.isSetupCompleted())
+        if (!StorageService.instance.isDaemonSetupCompleted())
             handleStartupFSError(e);
 
         JVMStabilityInspector.inspectThrowable(e);
diff --git a/src/java/org/apache/cassandra/service/MigrationManager.java b/src/java/org/apache/cassandra/service/MigrationManager.java
index 0f2fd28..ba239b3 100644
--- a/src/java/org/apache/cassandra/service/MigrationManager.java
+++ b/src/java/org/apache/cassandra/service/MigrationManager.java
@@ -431,6 +431,7 @@
 
     public static void announceTypeUpdate(UserType updatedType, boolean announceLocally)
     {
+        logger.info(String.format("Update type '%s.%s' to %s", updatedType.keyspace, updatedType.getNameAsString(), updatedType));
         announceNewType(updatedType, announceLocally);
     }
 
diff --git a/src/java/org/apache/cassandra/service/NativeTransportService.java b/src/java/org/apache/cassandra/service/NativeTransportService.java
index eff3a89..48839f1 100644
--- a/src/java/org/apache/cassandra/service/NativeTransportService.java
+++ b/src/java/org/apache/cassandra/service/NativeTransportService.java
@@ -145,16 +145,7 @@
         servers = Collections.emptyList();
 
         // shutdown executors used by netty for native transport server
-        Future<?> wgStop = workerGroup.shutdownGracefully(0, 0, TimeUnit.SECONDS);
-
-        try
-        {
-            wgStop.await(5000);
-        }
-        catch (InterruptedException e1)
-        {
-            Thread.currentThread().interrupt();
-        }
+        workerGroup.shutdownGracefully(3, 5, TimeUnit.SECONDS).awaitUninterruptibly();
 
         // shutdownGracefully not implemented yet in RequestThreadPoolExecutor
         eventExecutorGroup.shutdown();
diff --git a/src/java/org/apache/cassandra/service/QueryState.java b/src/java/org/apache/cassandra/service/QueryState.java
index ddbc959..c70c692 100644
--- a/src/java/org/apache/cassandra/service/QueryState.java
+++ b/src/java/org/apache/cassandra/service/QueryState.java
@@ -18,6 +18,9 @@
 package org.apache.cassandra.service;
 
 import java.net.InetAddress;
+import java.nio.ByteBuffer;
+import java.util.Collections;
+import java.util.Map;
 import java.util.UUID;
 import java.util.concurrent.ThreadLocalRandom;
 
@@ -76,14 +79,19 @@
 
     public void createTracingSession()
     {
+        createTracingSession(Collections.emptyMap());
+    }
+
+    public void createTracingSession(Map<String,ByteBuffer> customPayload)
+    {
         UUID session = this.preparedTracingSession;
         if (session == null)
         {
-            Tracing.instance.newSession();
+            Tracing.instance.newSession(customPayload);
         }
         else
         {
-            Tracing.instance.newSession(session);
+            Tracing.instance.newSession(session, customPayload);
             this.preparedTracingSession = null;
         }
     }
diff --git a/src/java/org/apache/cassandra/service/ReadCallback.java b/src/java/org/apache/cassandra/service/ReadCallback.java
index 8747004..47eacdf 100644
--- a/src/java/org/apache/cassandra/service/ReadCallback.java
+++ b/src/java/org/apache/cassandra/service/ReadCallback.java
@@ -192,7 +192,8 @@
                                                            result,
                                                            Collections.<String, byte[]>emptyMap(),
                                                            MessagingService.Verb.INTERNAL_RESPONSE,
-                                                           MessagingService.current_version);
+                                                           MessagingService.current_version,
+                                                           MessageIn.createTimestamp());
         response(message);
     }
 
diff --git a/src/java/org/apache/cassandra/service/StartupChecks.java b/src/java/org/apache/cassandra/service/StartupChecks.java
index ad6a104..223da51 100644
--- a/src/java/org/apache/cassandra/service/StartupChecks.java
+++ b/src/java/org/apache/cassandra/service/StartupChecks.java
@@ -32,12 +32,16 @@
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.config.Schema;
-import org.apache.cassandra.db.*;
+import org.apache.cassandra.db.ColumnFamilyStore;
+import org.apache.cassandra.db.Directories;
+import org.apache.cassandra.db.SystemKeyspace;
 import org.apache.cassandra.exceptions.ConfigurationException;
 import org.apache.cassandra.exceptions.StartupException;
 import org.apache.cassandra.io.sstable.Descriptor;
 import org.apache.cassandra.io.util.FileUtils;
-import org.apache.cassandra.utils.*;
+import org.apache.cassandra.utils.CLibrary;
+import org.apache.cassandra.utils.FBUtilities;
+import org.apache.cassandra.utils.SigarLibrary;
 
 /**
  * Verifies that the system and environment is in a fit state to be started.
@@ -71,6 +75,7 @@
     private final List<StartupCheck> DEFAULT_TESTS = ImmutableList.of(checkJemalloc,
                                                                       checkValidLaunchDate,
                                                                       checkJMXPorts,
+                                                                      checkJMXProperties,
                                                                       inspectJvmOptions,
                                                                       checkJnaInitialization,
                                                                       initSigarLibrary,
@@ -109,7 +114,7 @@
 
     public static final StartupCheck checkJemalloc = new StartupCheck()
     {
-        public void execute() throws StartupException
+        public void execute()
         {
             if (FBUtilities.isWindows())
                 return;
@@ -135,8 +140,9 @@
         {
             long now = System.currentTimeMillis();
             if (now < EARLIEST_LAUNCH_DATE)
-                throw new StartupException(1, String.format("current machine time is %s, but that is seemingly incorrect. exiting now.",
-                                                            new Date(now).toString()));
+                throw new StartupException(StartupException.ERR_WRONG_MACHINE_STATE,
+                                           String.format("current machine time is %s, but that is seemingly incorrect. exiting now.",
+                                                         new Date(now).toString()));
         }
     };
 
@@ -144,7 +150,7 @@
     {
         public void execute()
         {
-            String jmxPort = System.getProperty("com.sun.management.jmxremote.port");
+            String jmxPort = System.getProperty("cassandra.jmx.remote.port");
             if (jmxPort == null)
             {
                 logger.warn("JMX is not enabled to receive remote connections. Please see cassandra-env.sh for more info.");
@@ -154,7 +160,19 @@
             }
             else
             {
-                logger.info("JMX is enabled to receive remote connections on port: " + jmxPort);
+                logger.info("JMX is enabled to receive remote connections on port: {}", jmxPort);
+            }
+        }
+    };
+
+    public static final StartupCheck checkJMXProperties = new StartupCheck()
+    {
+        public void execute()
+        {
+            if (System.getProperty("com.sun.management.jmxremote.port") != null)
+            {
+                logger.warn("Use of com.sun.management.jmxremote.port at startup is deprecated. " +
+                            "Please use cassandra.jmx.remote.port instead.");
             }
         }
     };
@@ -187,7 +205,7 @@
         {
             // Fail-fast if JNA is not available or failing to initialize properly
             if (!CLibrary.jnaAvailable())
-                throw new StartupException(3, "JNA failing to initialize properly. ");
+                throw new StartupException(StartupException.ERR_WRONG_MACHINE_STATE, "JNA failing to initialize properly. ");
         }
     };
 
@@ -217,12 +235,14 @@
                 logger.warn("Directory {} doesn't exist", dataDir);
                 // if they don't, failing their creation, stop cassandra.
                 if (!dir.mkdirs())
-                    throw new StartupException(3, "Has no permission to create directory "+ dataDir);
+                    throw new StartupException(StartupException.ERR_WRONG_DISK_STATE,
+                                               "Has no permission to create directory "+ dataDir);
             }
 
             // if directories exist verify their permissions
             if (!Directories.verifyFullPermissions(dir, dataDir))
-                throw new StartupException(3, "Insufficient permissions on directory " + dataDir);
+                throw new StartupException(StartupException.ERR_WRONG_DISK_STATE,
+                                           "Insufficient permissions on directory " + dataDir);
         }
     };
 
@@ -238,7 +258,7 @@
 
             FileVisitor<Path> sstableVisitor = new SimpleFileVisitor<Path>()
             {
-                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException
+                public FileVisitResult visitFile(Path file, BasicFileAttributes attrs)
                 {
                     if (!Descriptor.isValidFile(file.getFileName().toString()))
                         return FileVisitResult.CONTINUE;
@@ -279,11 +299,12 @@
             }
 
             if (!invalid.isEmpty())
-                throw new StartupException(3, String.format("Detected unreadable sstables %s, please check " +
-                                                            "NEWS.txt and ensure that you have upgraded through " +
-                                                            "all required intermediate versions, running " +
-                                                            "upgradesstables",
-                                                            Joiner.on(",").join(invalid)));
+                throw new StartupException(StartupException.ERR_WRONG_DISK_STATE,
+                                           String.format("Detected unreadable sstables %s, please check " +
+                                                         "NEWS.txt and ensure that you have upgraded through " +
+                                                         "all required intermediate versions, running " +
+                                                         "upgradesstables",
+                                                         Joiner.on(",").join(invalid)));
 
         }
     };
@@ -325,7 +346,7 @@
                         String formatMessage = "Cannot start node if snitch's data center (%s) differs from previous data center (%s). " +
                                                "Please fix the snitch configuration, decommission and rebootstrap this node or use the flag -Dcassandra.ignore_dc=true.";
 
-                        throw new StartupException(100, String.format(formatMessage, currentDc, storedDc));
+                        throw new StartupException(StartupException.ERR_WRONG_CONFIG, String.format(formatMessage, currentDc, storedDc));
                     }
                 }
             }
@@ -347,7 +368,7 @@
                         String formatMessage = "Cannot start node if snitch's rack (%s) differs from previous rack (%s). " +
                                                "Please fix the snitch configuration, decommission and rebootstrap this node or use the flag -Dcassandra.ignore_rack=true.";
 
-                        throw new StartupException(100, String.format(formatMessage, currentRack, storedRack));
+                        throw new StartupException(StartupException.ERR_WRONG_CONFIG, String.format(formatMessage, currentRack, storedRack));
                     }
                 }
             }
diff --git a/src/java/org/apache/cassandra/service/StorageProxy.java b/src/java/org/apache/cassandra/service/StorageProxy.java
index 483da67..3ce8013 100644
--- a/src/java/org/apache/cassandra/service/StorageProxy.java
+++ b/src/java/org/apache/cassandra/service/StorageProxy.java
@@ -48,6 +48,7 @@
 import org.apache.cassandra.db.*;
 import org.apache.cassandra.db.filter.DataLimits;
 import org.apache.cassandra.db.filter.TombstoneOverwhelmingException;
+import org.apache.cassandra.db.monitoring.ConstructionTime;
 import org.apache.cassandra.db.partitions.*;
 import org.apache.cassandra.db.rows.RowIterator;
 import org.apache.cassandra.db.view.ViewUtils;
@@ -518,6 +519,8 @@
         MessageOut<Commit> message = new MessageOut<Commit>(MessagingService.Verb.PAXOS_COMMIT, proposal, Commit.serializer);
         for (InetAddress destination : Iterables.concat(naturalEndpoints, pendingEndpoints))
         {
+            checkHintOverload(destination);
+
             if (FailureDetector.instance.isAlive(destination))
             {
                 if (shouldBlock)
@@ -955,7 +958,7 @@
             logger.trace("Sending batchlog store request {} to {} for {} mutations", batch.id, target, batch.size());
 
             if (canDoLocalRequest(target))
-                performLocally(Stage.MUTATION, () -> BatchlogManager.store(batch), handler);
+                performLocally(Stage.MUTATION, Optional.empty(), () -> BatchlogManager.store(batch), handler);
             else
                 MessagingService.instance().sendRR(message, target, handler);
         }
@@ -1244,7 +1247,7 @@
             submitHint(mutation, endpointsToHint, responseHandler);
 
         if (insertLocal)
-            performLocally(stage, mutation::apply, responseHandler);
+            performLocally(stage, Optional.of(mutation), mutation::apply, responseHandler);
 
         if (dcGroups != null)
         {
@@ -1281,7 +1284,7 @@
         InetAddress target = iter.next();
 
         // Add the other destinations of the same message as a FORWARD_HEADER entry
-        try (DataOutputBuffer out = new DataOutputBuffer())
+        try(DataOutputBuffer out = new DataOutputBuffer())
         {
             out.writeInt(targets.size() - 1);
             while (iter.hasNext())
@@ -1333,9 +1336,9 @@
         });
     }
 
-    private static void performLocally(Stage stage, final Runnable runnable, final IAsyncCallbackWithFailure<?> handler)
+    private static void performLocally(Stage stage, Optional<IMutation> mutation, final Runnable runnable, final IAsyncCallbackWithFailure<?> handler)
     {
-        StageManager.getStage(stage).maybeExecuteImmediately(new LocalMutationRunnable()
+        StageManager.getStage(stage).maybeExecuteImmediately(new LocalMutationRunnable(mutation)
         {
             public void runMayThrow()
             {
@@ -1467,7 +1470,7 @@
             {
                 assert mutation instanceof CounterMutation;
 
-                Mutation result = ((CounterMutation) mutation).apply();
+                Mutation result = ((CounterMutation) mutation).applyCounterMutation();
                 responseHandler.response(null);
 
                 Set<InetAddress> remotes = Sets.difference(ImmutableSet.copyOf(targets),
@@ -1791,10 +1794,25 @@
         {
             try
             {
-                try (ReadOrderGroup orderGroup = command.startOrderGroup(); UnfilteredPartitionIterator iterator = command.executeLocally(orderGroup))
+                command.setMonitoringTime(new ConstructionTime(constructionTime), timeout);
+
+                ReadResponse response;
+                try (ReadExecutionController executionController = command.executionController();
+                     UnfilteredPartitionIterator iterator = command.executeLocally(executionController))
                 {
-                    handler.response(command.createResponse(iterator));
+                    response = command.createResponse(iterator);
                 }
+
+                if (command.complete())
+                {
+                    handler.response(response);
+                }
+                else
+                {
+                    MessagingService.instance().incrementDroppedMessages(verb, System.currentTimeMillis() - constructionTime);
+                    handler.onFailure(FBUtilities.getBroadcastAddress());
+                }
+
                 MessagingService.instance().addLatency(FBUtilities.getBroadcastAddress(), TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
             }
             catch (Throwable t)
@@ -2348,9 +2366,8 @@
      * @param cfname
      * @throws UnavailableException If some of the hosts in the ring are down.
      * @throws TimeoutException
-     * @throws IOException
      */
-    public static void truncateBlocking(String keyspace, String cfname) throws UnavailableException, TimeoutException, IOException
+    public static void truncateBlocking(String keyspace, String cfname) throws UnavailableException, TimeoutException
     {
         logger.debug("Starting a blocking truncate operation on keyspace {}, CF {}", keyspace, cfname);
         if (isAnyStorageHostDown())
@@ -2428,20 +2445,23 @@
      */
     private static abstract class DroppableRunnable implements Runnable
     {
-        private final long constructionTime = System.nanoTime();
-        private final MessagingService.Verb verb;
+        final long constructionTime;
+        final MessagingService.Verb verb;
+        final long timeout;
 
         public DroppableRunnable(MessagingService.Verb verb)
         {
+            this.constructionTime = System.currentTimeMillis();
             this.verb = verb;
+            this.timeout = DatabaseDescriptor.getTimeout(verb);
         }
 
         public final void run()
         {
-
-            if (TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - constructionTime) > DatabaseDescriptor.getTimeout(verb))
+            long timeTaken = System.currentTimeMillis() - constructionTime;
+            if (timeTaken > timeout)
             {
-                MessagingService.instance().incrementDroppedMessages(verb);
+                MessagingService.instance().incrementDroppedMessages(verb, timeTaken);
                 return;
             }
             try
@@ -2454,11 +2474,6 @@
             }
         }
 
-        protected MessagingService.Verb verb()
-        {
-            return verb;
-        }
-
         abstract protected void runMayThrow() throws Exception;
     }
 
@@ -2470,13 +2485,27 @@
     {
         private final long constructionTime = System.currentTimeMillis();
 
+        private final Optional<IMutation> mutationOpt;
+
+        public LocalMutationRunnable(Optional<IMutation> mutationOpt)
+        {
+            this.mutationOpt = mutationOpt;
+        }
+
+        public LocalMutationRunnable()
+        {
+            this.mutationOpt = Optional.empty();
+        }
+
         public final void run()
         {
             final MessagingService.Verb verb = verb();
-            if (System.currentTimeMillis() > constructionTime + DatabaseDescriptor.getTimeout(verb))
+            long mutationTimeout = DatabaseDescriptor.getTimeout(verb);
+            long timeTaken = System.currentTimeMillis() - constructionTime;
+            if (timeTaken > mutationTimeout)
             {
-                if (MessagingService.DROPPABLE_VERBS.contains(verb()))
-                    MessagingService.instance().incrementDroppedMessages(verb);
+                if (MessagingService.DROPPABLE_VERBS.contains(verb))
+                    MessagingService.instance().incrementDroppedMutations(mutationOpt, timeTaken);
                 HintRunnable runnable = new HintRunnable(Collections.singleton(FBUtilities.getBroadcastAddress()))
                 {
                     protected void runMayThrow() throws Exception
diff --git a/src/java/org/apache/cassandra/service/StorageService.java b/src/java/org/apache/cassandra/service/StorageService.java
index 35830d9..7500593 100644
--- a/src/java/org/apache/cassandra/service/StorageService.java
+++ b/src/java/org/apache/cassandra/service/StorageService.java
@@ -27,6 +27,8 @@
 import java.util.concurrent.*;
 import java.util.concurrent.atomic.AtomicBoolean;
 import java.util.concurrent.atomic.AtomicInteger;
+import java.util.regex.MatchResult;
+import java.util.regex.Pattern;
 import javax.annotation.Nullable;
 import javax.management.*;
 import javax.management.openmbean.TabularData;
@@ -102,7 +104,7 @@
 public class StorageService extends NotificationBroadcasterSupport implements IEndpointStateChangeSubscriber, StorageServiceMBean
 {
     private static final Logger logger = LoggerFactory.getLogger(StorageService.class);
-
+    
     public static final int RING_DELAY = getRingDelay(); // delay after which we assume ring has stablized
 
     private final JMXProgressSupport progressSupport = new JMXProgressSupport(this);
@@ -169,8 +171,10 @@
     /* true if node is rebuilding and receiving data */
     private final AtomicBoolean isRebuilding = new AtomicBoolean();
 
-    private boolean initialized;
+    private volatile boolean initialized = false;
     private volatile boolean joined = false;
+    private volatile boolean gossipActive = false;
+    private volatile boolean authSetupComplete = false;
 
     /* the probability for tracing any particular request, 0 disables tracing and 1 enables for all */
     private double traceProbability = 0.0;
@@ -290,24 +294,24 @@
     // should only be called via JMX
     public void stopGossiping()
     {
-        if (initialized)
+        if (gossipActive)
         {
             logger.warn("Stopping gossip by operator request");
             Gossiper.instance.stop();
-            initialized = false;
+            gossipActive = false;
         }
     }
 
     // should only be called via JMX
     public void startGossiping()
     {
-        if (!initialized)
+        if (!gossipActive)
         {
             logger.warn("Starting gossip by operator request");
             setGossipTokens(getLocalTokens());
             Gossiper.instance.forceNewerGeneration();
             Gossiper.instance.start((int) (System.currentTimeMillis() / 1000));
-            initialized = true;
+            gossipActive = true;
         }
     }
 
@@ -383,7 +387,7 @@
 
     public void stopTransports()
     {
-        if (isInitialized())
+        if (isGossipActive())
         {
             logger.error("Stopping gossiper");
             stopGossiping();
@@ -421,7 +425,12 @@
         return initialized;
     }
 
-    public boolean isSetupCompleted()
+    public boolean isGossipActive()
+    {
+        return gossipActive;
+    }
+
+    public boolean isDaemonSetupCompleted()
     {
         return daemon == null
                ? false
@@ -435,50 +444,56 @@
         daemon.deactivate();
     }
 
-    public synchronized Collection<Token> prepareReplacementInfo() throws ConfigurationException
+    private synchronized UUID prepareReplacementInfo(InetAddress replaceAddress) throws ConfigurationException
     {
         logger.info("Gathering node replacement information for {}", DatabaseDescriptor.getReplaceAddress());
-        if (!MessagingService.instance().isListening())
-            MessagingService.instance().listen();
-
-        // make magic happen
         Gossiper.instance.doShadowRound();
+        // as we've completed the shadow round of gossip, we should be able to find the node we're replacing
+        if (Gossiper.instance.getEndpointStateForEndpoint(replaceAddress) == null)
+            throw new RuntimeException(String.format("Cannot replace_address %s because it doesn't exist in gossip", replaceAddress));
 
-        UUID hostId = null;
-        // now that we've gossiped at least once, we should be able to find the node we're replacing
-        if (Gossiper.instance.getEndpointStateForEndpoint(DatabaseDescriptor.getReplaceAddress())== null)
-            throw new RuntimeException("Cannot replace_address " + DatabaseDescriptor.getReplaceAddress() + " because it doesn't exist in gossip");
-        hostId = Gossiper.instance.getHostId(DatabaseDescriptor.getReplaceAddress());
         try
         {
-            VersionedValue tokensVersionedValue = Gossiper.instance.getEndpointStateForEndpoint(DatabaseDescriptor.getReplaceAddress()).getApplicationState(ApplicationState.TOKENS);
+            VersionedValue tokensVersionedValue = Gossiper.instance.getEndpointStateForEndpoint(replaceAddress).getApplicationState(ApplicationState.TOKENS);
             if (tokensVersionedValue == null)
-                throw new RuntimeException("Could not find tokens for " + DatabaseDescriptor.getReplaceAddress() + " to replace");
-            Collection<Token> tokens = TokenSerializer.deserialize(tokenMetadata.partitioner, new DataInputStream(new ByteArrayInputStream(tokensVersionedValue.toBytes())));
+                throw new RuntimeException(String.format("Could not find tokens for %s to replace", replaceAddress));
 
-            SystemKeyspace.setLocalHostId(hostId); // use the replacee's host Id as our own so we receive hints, etc
-            Gossiper.instance.resetEndpointStateMap(); // clean up since we have what we need
-            return tokens;
+            bootstrapTokens = TokenSerializer.deserialize(tokenMetadata.partitioner, new DataInputStream(new ByteArrayInputStream(tokensVersionedValue.toBytes())));
         }
         catch (IOException e)
         {
             throw new RuntimeException(e);
         }
+
+        // we'll use the replacee's host Id as our own so we receive hints, etc
+        UUID localHostId = Gossiper.instance.getHostId(replaceAddress);
+        SystemKeyspace.setLocalHostId(localHostId);
+        Gossiper.instance.resetEndpointStateMap(); // clean up since we have what we need
+        return localHostId;
     }
 
-    public synchronized void checkForEndpointCollision() throws ConfigurationException
+    private synchronized void checkForEndpointCollision(UUID localHostId) throws ConfigurationException
     {
+        if (Boolean.getBoolean("cassandra.allow_unsafe_join"))
+        {
+            logger.warn("Skipping endpoint collision check as cassandra.allow_unsafe_join=true");
+            return;
+        }
+
         logger.debug("Starting shadow gossip round to check for endpoint collision");
-        if (!MessagingService.instance().isListening())
-            MessagingService.instance().listen();
         Gossiper.instance.doShadowRound();
-        if (!Gossiper.instance.isSafeForBootstrap(FBUtilities.getBroadcastAddress()))
+        // If bootstrapping, check whether any previously known status for the endpoint makes it unsafe to do so.
+        // If not bootstrapping, compare the host id for this endpoint learned from gossip (if any) with the local
+        // one, which was either read from system.local or generated at startup. If a learned id is present and
+        // doesn't match the local one, then the node needs replacing.
+        if (!Gossiper.instance.isSafeForStartup(FBUtilities.getBroadcastAddress(), localHostId, shouldBootstrap()))
         {
             throw new RuntimeException(String.format("A node with address %s already exists, cancelling join. " +
                                                      "Use cassandra.replace_address if you want to replace this node.",
                                                      FBUtilities.getBroadcastAddress()));
         }
-        if (useStrictConsistency && !allowSimultaneousMoves())
+
+        if (shouldBootstrap() && useStrictConsistency && !allowSimultaneousMoves())
         {
             for (Map.Entry<InetAddress, EndpointState> entry : Gossiper.instance.getEndpointStates())
             {
@@ -492,6 +507,7 @@
                     throw new UnsupportedOperationException("Other bootstrapping/leaving/moving nodes detected, cannot bootstrap while cassandra.consistent.rangemovement is true");
             }
         }
+        logger.debug("Resetting gossip state after shadow round");
         Gossiper.instance.resetEndpointStateMap();
     }
 
@@ -504,6 +520,7 @@
     public void unsafeInitialize() throws ConfigurationException
     {
         initialized = true;
+        gossipActive = true;
         Gossiper.instance.register(this);
         Gossiper.instance.start((int) (System.currentTimeMillis() / 1000)); // needed for node-ring gathering.
         Gossiper.instance.addLocalApplicationState(ApplicationState.NET_VERSION, valueFactory.networkVersion());
@@ -538,8 +555,6 @@
         logger.info("CQL supported versions: {} (default: {})",
                 StringUtils.join(ClientState.getCQLSupportedVersion(), ","), ClientState.DEFAULT_CQL_VERSION);
 
-        initialized = true;
-
         try
         {
             // Ensure StorageProxy is initialized on start-up; see CASSANDRA-3797.
@@ -552,27 +567,6 @@
             throw new AssertionError(e);
         }
 
-        if (Boolean.parseBoolean(System.getProperty("cassandra.load_ring_state", "true")))
-        {
-            logger.info("Loading persisted ring state");
-            Multimap<InetAddress, Token> loadedTokens = SystemKeyspace.loadTokens();
-            Map<InetAddress, UUID> loadedHostIds = SystemKeyspace.loadHostIds();
-            for (InetAddress ep : loadedTokens.keySet())
-            {
-                if (ep.equals(FBUtilities.getBroadcastAddress()))
-                {
-                    // entry has been mistakenly added, delete it
-                    SystemKeyspace.removeEndpoint(ep);
-                }
-                else
-                {
-                    if (loadedHostIds.containsKey(ep))
-                        tokenMetadata.updateHostId(loadedHostIds.get(ep), ep);
-                    Gossiper.instance.addSavedEndpoint(ep);
-                }
-            }
-        }
-
         // daemon threads, like our executors', continue to run while shutdown hooks are invoked
         drainOnShutdown = new Thread(new WrappedRunnable()
         {
@@ -645,6 +639,9 @@
         if (!Boolean.parseBoolean(System.getProperty("cassandra.start_gossip", "true")))
         {
             logger.info("Not starting gossip as requested.");
+            // load ring state in preparation for starting gossip later
+            loadRingState();
+            initialized = true;
             return;
         }
 
@@ -679,6 +676,32 @@
             }
             logger.info("Not joining ring as requested. Use JMX (StorageService->joinRing()) to initiate ring joining");
         }
+
+        initialized = true;
+    }
+
+    private void loadRingState()
+    {
+        if (Boolean.parseBoolean(System.getProperty("cassandra.load_ring_state", "true")))
+        {
+            logger.info("Loading persisted ring state");
+            Multimap<InetAddress, Token> loadedTokens = SystemKeyspace.loadTokens();
+            Map<InetAddress, UUID> loadedHostIds = SystemKeyspace.loadHostIds();
+            for (InetAddress ep : loadedTokens.keySet())
+            {
+                if (ep.equals(FBUtilities.getBroadcastAddress()))
+                {
+                    // entry has been mistakenly added, delete it
+                    SystemKeyspace.removeEndpoint(ep);
+                }
+                else
+                {
+                    if (loadedHostIds.containsKey(ep))
+                        tokenMetadata.updateHostId(loadedHostIds.get(ep), ep);
+                    Gossiper.instance.addSavedEndpoint(ep);
+                }
+            }
+        }
     }
 
     /**
@@ -714,47 +737,71 @@
                 else
                     throw new ConfigurationException("This node was decommissioned and will not rejoin the ring unless cassandra.override_decommission=true has been set, or all existing data is removed and the node is bootstrapped again");
             }
-            if (replacing && !(Boolean.parseBoolean(System.getProperty("cassandra.join_ring", "true"))))
-                throw new ConfigurationException("Cannot set both join_ring=false and attempt to replace a node");
+
             if (DatabaseDescriptor.getReplaceTokens().size() > 0 || DatabaseDescriptor.getReplaceNode() != null)
                 throw new RuntimeException("Replace method removed; use cassandra.replace_address instead");
+
+            if (!MessagingService.instance().isListening())
+                MessagingService.instance().listen();
+
+            UUID localHostId = SystemKeyspace.getLocalHostId();
+
             if (replacing)
             {
                 if (SystemKeyspace.bootstrapComplete())
                     throw new RuntimeException("Cannot replace address with a node that is already bootstrapped");
-                if (!DatabaseDescriptor.isAutoBootstrap())
-                    throw new RuntimeException("Trying to replace_address with auto_bootstrap disabled will not work, check your configuration");
-                bootstrapTokens = prepareReplacementInfo();
+
+                if (!(Boolean.parseBoolean(System.getProperty("cassandra.join_ring", "true"))))
+                    throw new ConfigurationException("Cannot set both join_ring=false and attempt to replace a node");
+
+                if (!DatabaseDescriptor.isAutoBootstrap() && !Boolean.getBoolean("cassandra.allow_unsafe_replace"))
+                    throw new RuntimeException("Replacing a node without bootstrapping risks invalidating consistency " +
+                                               "guarantees as the expected data may not be present until repair is run. " +
+                                               "To perform this operation, please restart with " +
+                                               "-Dcassandra.allow_unsafe_replace=true");
+
+                InetAddress replaceAddress = DatabaseDescriptor.getReplaceAddress();
+                localHostId = prepareReplacementInfo(replaceAddress);
                 appStates.put(ApplicationState.TOKENS, valueFactory.tokens(bootstrapTokens));
-                appStates.put(ApplicationState.STATUS, valueFactory.hibernate(true));
+
+                // if we want to bootstrap the ranges of the node we're replacing,
+                // go into hibernate mode while that happens. Otherwise, persist
+                // the tokens we're taking over locally so that they don't get
+                // clobbered with auto-generated ones in joinTokenRing
+                if (DatabaseDescriptor.isAutoBootstrap())
+                    appStates.put(ApplicationState.STATUS, valueFactory.hibernate(true));
+                else
+                    SystemKeyspace.updateTokens(bootstrapTokens);
             }
-            else if (shouldBootstrap())
+            else
             {
-                checkForEndpointCollision();
+                checkForEndpointCollision(localHostId);
             }
 
             // have to start the gossip service before we can see any info on other nodes.  this is necessary
             // for bootstrap to get the load info it needs.
             // (we won't be part of the storage ring though until we add a counterId to our state, below.)
             // Seed the host ID-to-endpoint map with our own ID.
-            UUID localHostId = SystemKeyspace.getLocalHostId();
             getTokenMetadata().updateHostId(localHostId, FBUtilities.getBroadcastAddress());
             appStates.put(ApplicationState.NET_VERSION, valueFactory.networkVersion());
             appStates.put(ApplicationState.HOST_ID, valueFactory.hostId(localHostId));
             appStates.put(ApplicationState.RPC_ADDRESS, valueFactory.rpcaddress(DatabaseDescriptor.getBroadcastRpcAddress()));
             appStates.put(ApplicationState.RELEASE_VERSION, valueFactory.releaseVersion());
+
+            // load the persisted ring state. This used to be done earlier in the init process,
+            // but now we always perform a shadow round when preparing to join and we have to
+            // clear endpoint states after doing that.
+            loadRingState();
+
             logger.info("Starting up server gossip");
             Gossiper.instance.register(this);
             Gossiper.instance.start(SystemKeyspace.incrementAndGetGeneration(), appStates); // needed for node-ring gathering.
+            gossipActive = true;
             // gossip snitch infos (local DC and rack)
             gossipSnitchInfo();
             // gossip Schema.emptyVersion forcing immediate check for schema updates (see MigrationManager#maybeScheduleSchemaPull)
             Schema.instance.updateVersionAndAnnounce(); // Ensure we know our own actual Schema UUID in preparation for updates
-
-            if (!MessagingService.instance().isListening())
-                MessagingService.instance().listen();
             LoadBroadcaster.instance.startBroadcasting();
-
             HintsService.instance.startDispatch();
             BatchlogManager.instance.start();
         }
@@ -996,6 +1043,12 @@
         DatabaseDescriptor.getAuthenticator().setup();
         DatabaseDescriptor.getAuthorizer().setup();
         MigrationManager.instance.register(new AuthMigrationListener());
+        authSetupComplete = true;
+    }
+
+    public boolean isAuthSetupComplete()
+    {
+        return authSetupComplete;
     }
 
     private void maybeAddKeyspace(KeyspaceMetadata ksm)
@@ -1050,13 +1103,26 @@
 
     public void rebuild(String sourceDc)
     {
+        rebuild(sourceDc, null, null);
+    }
+
+    public void rebuild(String sourceDc, String keyspace, String tokens)
+    {
         // check on going rebuild
         if (!isRebuilding.compareAndSet(false, true))
         {
             throw new IllegalStateException("Node is still rebuilding. Check nodetool netstats.");
         }
 
-        logger.info("rebuild from dc: {}", sourceDc == null ? "(any dc)" : sourceDc);
+        // check the arguments
+        if (keyspace == null && tokens != null)
+        {
+            throw new IllegalArgumentException("Cannot specify tokens without keyspace.");
+        }
+
+        logger.info("rebuild from dc: {}, {}, {}", sourceDc == null ? "(any dc)" : sourceDc,
+                    keyspace == null ? "(All keyspaces)" : keyspace,
+                    tokens == null ? "(All tokens)" : tokens);
 
         try
         {
@@ -1066,13 +1132,41 @@
                                                        "Rebuild",
                                                        !replacing && useStrictConsistency,
                                                        DatabaseDescriptor.getEndpointSnitch(),
-                                                       streamStateStore);
+                                                       streamStateStore,
+                                                       false);
             streamer.addSourceFilter(new RangeStreamer.FailureDetectorSourceFilter(FailureDetector.instance));
             if (sourceDc != null)
                 streamer.addSourceFilter(new RangeStreamer.SingleDatacenterFilter(DatabaseDescriptor.getEndpointSnitch(), sourceDc));
 
-            for (String keyspaceName : Schema.instance.getNonLocalStrategyKeyspaces())
-                streamer.addRanges(keyspaceName, getLocalRanges(keyspaceName));
+            if (keyspace == null)
+            {
+                for (String keyspaceName : Schema.instance.getNonLocalStrategyKeyspaces())
+                    streamer.addRanges(keyspaceName, getLocalRanges(keyspaceName));
+            }
+            else if (tokens == null)
+            {
+                streamer.addRanges(keyspace, getLocalRanges(keyspace));
+            }
+            else
+            {
+                Token.TokenFactory factory = getTokenFactory();
+                List<Range<Token>> ranges = new ArrayList<>();
+                Pattern rangePattern = Pattern.compile("\\(\\s*(-?\\w+)\\s*,\\s*(-?\\w+)\\s*\\]");
+                try (Scanner tokenScanner = new Scanner(tokens))
+                {
+                    while (tokenScanner.findInLine(rangePattern) != null)
+                    {
+                        MatchResult range = tokenScanner.match();
+                        Token startToken = factory.fromString(range.group(1));
+                        Token endToken = factory.fromString(range.group(2));
+                        logger.info(String.format("adding range: (%s,%s]", startToken, endToken));
+                        ranges.add(new Range<>(startToken, endToken));
+                    }
+                    if (tokenScanner.hasNext())
+                        throw new IllegalArgumentException("Unexpected string: " + tokenScanner.next());
+                }
+                streamer.addRanges(keyspace, ranges);
+            }
 
             StreamResultFuture resultFuture = streamer.fetchAsync();
             // wait for result
@@ -1095,6 +1189,94 @@
         }
     }
 
+    public void setRpcTimeout(long value)
+    {
+        DatabaseDescriptor.setRpcTimeout(value);
+        logger.info("set rpc timeout to {} ms", value);
+    }
+
+    public long getRpcTimeout()
+    {
+        return DatabaseDescriptor.getRpcTimeout();
+    }
+
+    public void setReadRpcTimeout(long value)
+    {
+        DatabaseDescriptor.setReadRpcTimeout(value);
+        logger.info("set read rpc timeout to {} ms", value);
+    }
+
+    public long getReadRpcTimeout()
+    {
+        return DatabaseDescriptor.getReadRpcTimeout();
+    }
+
+    public void setRangeRpcTimeout(long value)
+    {
+        DatabaseDescriptor.setRangeRpcTimeout(value);
+        logger.info("set range rpc timeout to {} ms", value);
+    }
+
+    public long getRangeRpcTimeout()
+    {
+        return DatabaseDescriptor.getRangeRpcTimeout();
+    }
+
+    public void setWriteRpcTimeout(long value)
+    {
+        DatabaseDescriptor.setWriteRpcTimeout(value);
+        logger.info("set write rpc timeout to {} ms", value);
+    }
+
+    public long getWriteRpcTimeout()
+    {
+        return DatabaseDescriptor.getWriteRpcTimeout();
+    }
+
+    public void setCounterWriteRpcTimeout(long value)
+    {
+        DatabaseDescriptor.setCounterWriteRpcTimeout(value);
+        logger.info("set counter write rpc timeout to {} ms", value);
+    }
+
+    public long getCounterWriteRpcTimeout()
+    {
+        return DatabaseDescriptor.getCounterWriteRpcTimeout();
+    }
+
+    public void setCasContentionTimeout(long value)
+    {
+        DatabaseDescriptor.setCasContentionTimeout(value);
+        logger.info("set cas contention rpc timeout to {} ms", value);
+    }
+
+    public long getCasContentionTimeout()
+    {
+        return DatabaseDescriptor.getCasContentionTimeout();
+    }
+
+    public void setTruncateRpcTimeout(long value)
+    {
+        DatabaseDescriptor.setTruncateRpcTimeout(value);
+        logger.info("set truncate rpc timeout to {} ms", value);
+    }
+
+    public long getTruncateRpcTimeout()
+    {
+        return DatabaseDescriptor.getTruncateRpcTimeout();
+    }
+
+    public void setStreamingSocketTimeout(int value)
+    {
+        DatabaseDescriptor.setStreamingSocketTimeout(value);
+        logger.info("set streaming socket timeout to {} ms", value);
+    }
+
+    public int getStreamingSocketTimeout()
+    {
+        return DatabaseDescriptor.getStreamingSocketTimeout();
+    }
+
     public void setStreamThroughputMbPerSec(int value)
     {
         DatabaseDescriptor.setStreamThroughputOutboundMegabitsPerSec(value);
@@ -2408,7 +2590,7 @@
     }
 
     // TODO
-    public final void deliverHints(String host) throws UnknownHostException
+    public final void deliverHints(String host)
     {
         throw new UnsupportedOperationException();
     }
@@ -2630,6 +2812,64 @@
         }
     }
 
+    public int relocateSSTables(String keyspaceName, String ... columnFamilies) throws IOException, ExecutionException, InterruptedException
+    {
+        return relocateSSTables(0, keyspaceName, columnFamilies);
+    }
+
+    public int relocateSSTables(int jobs, String keyspaceName, String ... columnFamilies) throws IOException, ExecutionException, InterruptedException
+    {
+        CompactionManager.AllSSTableOpStatus status = CompactionManager.AllSSTableOpStatus.SUCCESSFUL;
+        for (ColumnFamilyStore cfs : getValidColumnFamilies(false, false, keyspaceName, columnFamilies))
+        {
+            CompactionManager.AllSSTableOpStatus oneStatus = cfs.relocateSSTables(jobs);
+            if (oneStatus != CompactionManager.AllSSTableOpStatus.SUCCESSFUL)
+                status = oneStatus;
+        }
+        return status.statusCode;
+    }
+
+    /**
+     * Takes a snapshot of the specified keyspaces and/or tables. A snapshot name must be specified.
+     *
+     * @param tag
+     *            the tag given to the snapshot; may not be null or empty
+     * @param options
+     *            Map of options (skipFlush is the only supported option for now)
+     * @param entities
+     *            list of keyspaces / tables in the form of empty | ks1 ks2 ... | ks1.cf1,ks2.cf2,...
+     */
+    @Override
+    public void takeSnapshot(String tag, Map<String, String> options, String... entities) throws IOException
+    {
+        boolean skipFlush = Boolean.parseBoolean(options.getOrDefault("skipFlush", "false"));
+
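+        // if the first entity contains a '.', all entities are treated as ks.table pairs; otherwise they are keyspace names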
+        if (entities != null && entities.length > 0 && entities[0].contains("."))
+        {
+            takeMultipleTableSnapshot(tag, skipFlush, entities);
+        }
+        else
+        {
+            takeSnapshot(tag, skipFlush, entities);
+        }
+    }
+
+    /**
+     * Takes the snapshot of a specific table. A snapshot name must be
+     * specified.
+     *
+     * @param keyspaceName
+     *            the keyspace which holds the specified table
+     * @param tableName
+     *            the table to snapshot
+     * @param tag
+     *            the tag given to the snapshot; may not be null or empty
+     */
+    public void takeTableSnapshot(String keyspaceName, String tableName, String tag) throws IOException
+    {
+        takeMultipleTableSnapshot(tag, false, keyspaceName + "." + tableName);
+    }
+
     /**
      * Takes the snapshot for the given keyspaces. A snapshot name must be specified.
      *
@@ -2638,6 +2878,32 @@
      */
     public void takeSnapshot(String tag, String... keyspaceNames) throws IOException
     {
+        takeSnapshot(tag, false, keyspaceNames);
+    }
+
+    /**
+     * Takes a snapshot of multiple column families from different keyspaces. A snapshot name must be specified.
+     *
+     * @param tag
+     *            the tag given to the snapshot; may not be null or empty
+     * @param tableList
+     *            list of tables from different keyspaces, in the form ks1.cf1 ks2.cf2
+     */
+    public void takeMultipleTableSnapshot(String tag, String... tableList)
+            throws IOException
+    {
+        takeMultipleTableSnapshot(tag, false, tableList);
+    }
+
+    /**
+     * Takes the snapshot for the given keyspaces. A snapshot name must be specified.
+     *
+     * @param tag the tag given to the snapshot; may not be null or empty
+     * @param skipFlush Skip blocking flush of memtable
+     * @param keyspaceNames the names of the keyspaces to snapshot; empty means "all."
+     */
+    private void takeSnapshot(String tag, boolean skipFlush, String... keyspaceNames) throws IOException
+    {
         if (operationMode == Mode.JOINING)
             throw new IOException("Cannot snapshot until bootstrap completes");
         if (tag == null || tag.equals(""))
@@ -2663,37 +2929,7 @@
 
 
         for (Keyspace keyspace : keyspaces)
-            keyspace.snapshot(tag, null);
-    }
-
-    /**
-     * Takes the snapshot of a specific table. A snapshot name must be specified.
-     *
-     * @param keyspaceName the keyspace which holds the specified table
-     * @param tableName the table to snapshot
-     * @param tag the tag given to the snapshot; may not be null or empty
-     */
-    public void takeTableSnapshot(String keyspaceName, String tableName, String tag) throws IOException
-    {
-        if (keyspaceName == null)
-            throw new IOException("You must supply a keyspace name");
-        if (operationMode == Mode.JOINING)
-            throw new IOException("Cannot snapshot until bootstrap completes");
-
-        if (tableName == null)
-            throw new IOException("You must supply a table name");
-        if (tableName.contains("."))
-            throw new IllegalArgumentException("Cannot take a snapshot of a secondary index by itself. Run snapshot on the table that owns the index.");
-
-        if (tag == null || tag.equals(""))
-            throw new IOException("You must supply a snapshot name.");
-
-        Keyspace keyspace = getValidKeyspace(keyspaceName);
-        ColumnFamilyStore columnFamilyStore = keyspace.getColumnFamilyStore(tableName);
-        if (columnFamilyStore.snapshotExists(tag))
-            throw new IOException("Snapshot " + tag + " already exists.");
-
-        columnFamilyStore.snapshot(tag);
+            keyspace.snapshot(tag, null, skipFlush);
     }
 
     /**
@@ -2702,17 +2938,18 @@
      *
      * @param tag
      *            the tag given to the snapshot; may not be null or empty
+     * @param skipFlush
+     *            Skip blocking flush of memtable
      * @param tableList
      *            list of tables from different keyspace in the form of ks1.cf1 ks2.cf2
      */
-    @Override
-    public void takeMultipleTableSnapshot(String tag, String... tableList)
+    private void takeMultipleTableSnapshot(String tag, boolean skipFlush, String... tableList)
             throws IOException
     {
         Map<Keyspace, List<String>> keyspaceColumnfamily = new HashMap<Keyspace, List<String>>();
         for (String table : tableList)
         {
-            String splittedString[] = table.split("\\.");
+            String splittedString[] = StringUtils.split(table, '.');
             if (splittedString.length == 2)
             {
                 String keyspaceName = splittedString[0];
@@ -2755,7 +2992,7 @@
         for (Entry<Keyspace, List<String>> entry : keyspaceColumnfamily.entrySet())
         {
             for (String table : entry.getValue())
-                entry.getKey().snapshot(tag, table);
+                entry.getKey().snapshot(tag, table, skipFlush);
         }
 
     }
@@ -3795,7 +4032,7 @@
 
     /**
      * Force a remove operation to complete. This may be necessary if a remove operation
-     * blocks forever due to node/stream failure. removeToken() must be called
+     * blocks forever due to node/stream failure. removeNode() must be called
      * first, this is a last resort measure.  No further attempt will be made to restore replicas.
      */
     public void forceRemoveCompletion()
@@ -3814,18 +4051,18 @@
         }
         else
         {
-            logger.warn("No tokens to force removal on, call 'removenode' first");
+            logger.warn("No nodes to force removal on, call 'removenode' first");
         }
     }
 
     /**
      * Remove a node that has died, attempting to restore the replica count.
      * If the node is alive, decommission should be attempted.  If decommission
-     * fails, then removeToken should be called.  If we fail while trying to
+     * fails, then removeNode should be called.  If we fail while trying to
      * restore the replica count, finally forceRemoveCompleteion should be
      * called to forcibly remove the node without regard to replica count.
      *
-     * @param hostIdString token for the node
+     * @param hostIdString Host ID for the node
      */
     public void removeNode(String hostIdString)
     {
@@ -3837,7 +4074,8 @@
         if (endpoint == null)
             throw new UnsupportedOperationException("Host ID not found.");
 
-        Collection<Token> tokens = tokenMetadata.getTokens(endpoint);
+        if (!tokenMetadata.isMember(endpoint))
+            throw new UnsupportedOperationException("Node to be removed is not a member of the token ring");
 
         if (endpoint.equals(myAddress))
              throw new UnsupportedOperationException("Cannot remove self");
@@ -3852,6 +4090,8 @@
         if (!replicatingNodes.isEmpty())
             throw new UnsupportedOperationException("This node is already processing a removal. Wait for it to complete, or use 'removenode force' if this has failed.");
 
+        Collection<Token> tokens = tokenMetadata.getTokens(endpoint);
+
         // Find the endpoints that are going to become responsible for data
         for (String keyspaceName : Schema.instance.getNonLocalStrategyKeyspaces())
         {
@@ -4181,6 +4421,25 @@
         return Collections.unmodifiableList(Schema.instance.getNonLocalStrategyKeyspaces());
     }
 
+    public Map<String, String> getViewBuildStatuses(String keyspace, String view)
+    {
+        Map<UUID, String> coreViewStatus = SystemDistributedKeyspace.viewStatus(keyspace, view);
+        Map<InetAddress, UUID> hostIdToEndpoint = tokenMetadata.getEndpointToHostIdMapForReading();
+        Map<String, String> result = new HashMap<>();
+
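+        // report one entry per endpoint in the ring; hosts with no recorded status are reported as UNKNOWN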
+        for (Map.Entry<InetAddress, UUID> entry : hostIdToEndpoint.entrySet())
+        {
+            UUID hostId = entry.getValue();
+            InetAddress endpoint = entry.getKey();
+            result.put(endpoint.toString(),
+                       coreViewStatus.containsKey(hostId)
+                       ? coreViewStatus.get(hostId)
+                       : "UNKNOWN");
+        }
+
+        return Collections.unmodifiableMap(result);
+    }
+
     public void updateSnitch(String epSnitchClassName, Boolean dynamic, Integer dynamicUpdateInterval, Integer dynamicResetInterval, Double dynamicBadnessThreshold) throws ClassNotFoundException
     {
         IEndpointSnitch oldSnitch = DatabaseDescriptor.getEndpointSnitch();
@@ -4499,4 +4758,61 @@
         logger.info(String.format("Updated hinted_handoff_throttle_in_kb to %d", throttleInKB));
     }
 
+    public static List<PartitionPosition> getDiskBoundaries(ColumnFamilyStore cfs, Directories.DataDirectory[] directories)
+    {
+        if (!cfs.getPartitioner().splitter().isPresent())
+            return null;
+
+        Collection<Range<Token>> lr;
+
+        if (StorageService.instance.isBootstrapMode())
+        {
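+            // while bootstrapping we only have pending ranges, so split those across the disks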
+            lr = StorageService.instance.getTokenMetadata().getPendingRanges(cfs.keyspace.getName(), FBUtilities.getBroadcastAddress());
+        }
+        else
+        {
+            // The reason we use the future-settled TMD is that if we decommission a node, we want to stream
+            // from that node to the correct location on disk; if we didn't, we would put new files in the wrong places.
+            // We do this to minimize the amount of data we need to move in rebalancedisks once everything has settled.
+            TokenMetadata tmd = StorageService.instance.getTokenMetadata().cloneAfterAllSettled();
+            lr = cfs.keyspace.getReplicationStrategy().getAddressRanges(tmd).get(FBUtilities.getBroadcastAddress());
+        }
+
+        if (lr == null || lr.isEmpty())
+            return null;
+        List<Range<Token>> localRanges = Range.sort(lr);
+
+        return getDiskBoundaries(localRanges, cfs.getPartitioner(), directories);
+    }
+
+    public static List<PartitionPosition> getDiskBoundaries(ColumnFamilyStore cfs)
+    {
+        return getDiskBoundaries(cfs, cfs.getDirectories().getWriteableLocations());
+    }
+
+    /**
+     * Returns a list of disk boundaries; the result differs depending on whether vnodes are enabled or not.
+     *
+     * The returned positions are upper bounds for the disks: everything from partitioner.minToken up to
+     * getDiskBoundaries(..).get(0) should be on the first disk, everything between boundaries 0 and 1 on the
+     * second disk, and so on.
+     *
+     * The final entry in the returned list is always the upper key bound of the partitioner's maximum token.
+     *
+     * @param localRanges the sorted token ranges owned locally
+     * @param partitioner the partitioner in use; it must support splitting
+     * @param dataDirectories the data directories to spread the ranges over
+     * @return one upper-bound position per data directory
+     */
+    public static List<PartitionPosition> getDiskBoundaries(List<Range<Token>> localRanges, IPartitioner partitioner, Directories.DataDirectory[] dataDirectories)
+    {
+        assert partitioner.splitter().isPresent();
+        Splitter splitter = partitioner.splitter().get();
+        List<Token> boundaries = splitter.splitOwnedRanges(dataDirectories.length, localRanges, DatabaseDescriptor.getNumTokens() > 1);
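+        // each boundary's upper key bound caps one disk; the last disk is capped by the partitioner's maximum token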
+        List<PartitionPosition> diskBoundaries = new ArrayList<>();
+        for (int i = 0; i < boundaries.size() - 1; i++)
+            diskBoundaries.add(boundaries.get(i).maxKeyBound());
+        diskBoundaries.add(partitioner.getMaximumToken().maxKeyBound());
+        return diskBoundaries;
+    }
 }
diff --git a/src/java/org/apache/cassandra/service/StorageServiceMBean.java b/src/java/org/apache/cassandra/service/StorageServiceMBean.java
index e111bc4..f7da817 100644
--- a/src/java/org/apache/cassandra/service/StorageServiceMBean.java
+++ b/src/java/org/apache/cassandra/service/StorageServiceMBean.java
@@ -194,31 +194,34 @@
     public List<InetAddress> getNaturalEndpoints(String keyspaceName, ByteBuffer key);
 
     /**
-     * Takes the snapshot for the given keyspaces. A snapshot name must be specified.
-     *
-     * @param tag the tag given to the snapshot; may not be null or empty
-     * @param keyspaceNames the name of the keyspaces to snapshot; empty means "all."
+     * @deprecated use {@link #takeSnapshot(String tag, Map options, String... entities)} instead.
      */
+    @Deprecated
     public void takeSnapshot(String tag, String... keyspaceNames) throws IOException;
 
     /**
-     * Takes the snapshot of a specific column family. A snapshot name must be specified.
-     *
-     * @param keyspaceName the keyspace which holds the specified column family
-     * @param tableName the table to snapshot
-     * @param tag the tag given to the snapshot; may not be null or empty
+     * @deprecated use {@link #takeSnapshot(String tag, Map options, String... entities)} instead.
      */
+    @Deprecated
     public void takeTableSnapshot(String keyspaceName, String tableName, String tag) throws IOException;
 
     /**
+     * @deprecated use {@link #takeSnapshot(String tag, Map options, String... entities)} instead.
+     */
+    @Deprecated
+    public void takeMultipleTableSnapshot(String tag, String... tableList) throws IOException;
+
+    /**
      * Takes the snapshot of a multiple column family from different keyspaces. A snapshot name must be specified.
      * 
      * @param tag
      *            the tag given to the snapshot; may not be null or empty
-     * @param tableList
-     *            list of tables from different keyspace in the form of ks1.cf1 ks2.cf2
+     * @param options
+     *            Map of options (skipFlush is the only supported option for now)
+     * @param entities
+     *            list of keyspaces / tables in the form of empty | ks1 ks2 ... | ks1.cf1,ks2.cf2,...
      */
-    public void takeMultipleTableSnapshot(String tag, String... tableList) throws IOException;
+    public void takeSnapshot(String tag, Map<String, String> options, String... entities) throws IOException;
 
     /**
      * Remove the snapshot with the given name from the given keyspaces.
@@ -248,6 +251,9 @@
      */
     public void forceKeyspaceCompaction(boolean splitOutput, String keyspaceName, String... tableNames) throws IOException, ExecutionException, InterruptedException;
 
+    @Deprecated
+    public int relocateSSTables(String keyspace, String ... cfnames) throws IOException, ExecutionException, InterruptedException;
+    public int relocateSSTables(int jobs, String keyspace, String ... cfnames) throws IOException, ExecutionException, InterruptedException;
     /**
      * Trigger a cleanup of keys on a single keyspace
      */
@@ -390,7 +396,7 @@
      * If classQualifer is not empty but level is empty/null, it will set the level to null for the defined classQualifer<br>
      * If level cannot be parsed, then the level will be defaulted to DEBUG<br>
      * <br>
-     * The logback configuration should have < jmxConfigurator /> set
+     * The logback configuration should have {@code < jmxConfigurator />} set
      * 
      * @param classQualifier The logger's classQualifer
      * @param level The log level
@@ -429,7 +435,7 @@
 
     /**
      * given a list of tokens (representing the nodes in the cluster), returns
-     *   a mapping from "token -> %age of cluster owned by that token"
+     *   a mapping from {@code "token -> %age of cluster owned by that token"}
      */
     public Map<InetAddress, Float> getOwnership();
 
@@ -448,6 +454,8 @@
 
     public List<String> getNonLocalStrategyKeyspaces();
 
+    public Map<String, String> getViewBuildStatuses(String keyspace, String view);
+
     /**
      * Change endpointsnitch class and dynamic-ness (and dynamic attributes) at runtime
      * @param epSnitchClassName        the canonical path name for a class implementing IEndpointSnitch
@@ -470,7 +478,7 @@
     // allows a user to forcibly completely stop cassandra
     public void stopDaemon();
 
-    // to determine if gossip is disabled
+    // to determine if initialization has completed
     public boolean isInitialized();
 
     // allows a user to disable thrift
@@ -490,6 +498,30 @@
     public void joinRing() throws IOException;
     public boolean isJoined();
 
+    public void setRpcTimeout(long value);
+    public long getRpcTimeout();
+
+    public void setReadRpcTimeout(long value);
+    public long getReadRpcTimeout();
+
+    public void setRangeRpcTimeout(long value);
+    public long getRangeRpcTimeout();
+
+    public void setWriteRpcTimeout(long value);
+    public long getWriteRpcTimeout();
+
+    public void setCounterWriteRpcTimeout(long value);
+    public long getCounterWriteRpcTimeout();
+
+    public void setCasContentionTimeout(long value);
+    public long getCasContentionTimeout();
+
+    public void setTruncateRpcTimeout(long value);
+    public long getTruncateRpcTimeout();
+
+    public void setStreamingSocketTimeout(int value);
+    public int getStreamingSocketTimeout();
+
     public void setStreamThroughputMbPerSec(int value);
     public int getStreamThroughputMbPerSec();
 
@@ -511,6 +543,16 @@
      */
     public void rebuild(String sourceDc);
 
+    /**
+     * Same as {@link #rebuild(String)}, but only for the specified keyspace and ranges.
+     *
+     * @param sourceDc Name of DC from which to select sources for streaming, or null to pick any node
+     * @param keyspace Name of the keyspace to rebuild, or null to rebuild all keyspaces.
+     * @param tokens Ranges of tokens to rebuild, or null to rebuild all token ranges. In the format:
+     *               "(start_token_1,end_token_1],(start_token_2,end_token_2],...(start_token_n,end_token_n]"
+     */
+    public void rebuild(String sourceDc, String keyspace, String tokens);
+
     /** Starts a bulk load and blocks until it completes. */
     public void bulkLoad(String directory);
 
@@ -526,7 +568,7 @@
      * Load new SSTables to the given keyspace/table
      *
      * @param ksName The parent keyspace name
-     * @param cfName The ColumnFamily name where SSTables belong
+     * @param tableName The ColumnFamily name where SSTables belong
      */
     public void loadNewSSTables(String ksName, String tableName);
 
diff --git a/src/java/org/apache/cassandra/service/pager/AbstractQueryPager.java b/src/java/org/apache/cassandra/service/pager/AbstractQueryPager.java
index 48f6c04..06252ef 100644
--- a/src/java/org/apache/cassandra/service/pager/AbstractQueryPager.java
+++ b/src/java/org/apache/cassandra/service/pager/AbstractQueryPager.java
@@ -52,9 +52,9 @@
         this.remainingInPartition = limits.perPartitionCount();
     }
 
-    public ReadOrderGroup startOrderGroup()
+    public ReadExecutionController executionController()
     {
-        return command.startOrderGroup();
+        return command.executionController();
     }
 
     public PartitionIterator fetchPage(int pageSize, ConsistencyLevel consistency, ClientState clientState) throws RequestValidationException, RequestExecutionException
@@ -67,14 +67,14 @@
         return Transformation.apply(nextPageReadCommand(pageSize).execute(consistency, clientState), pager);
     }
 
-    public PartitionIterator fetchPageInternal(int pageSize, ReadOrderGroup orderGroup) throws RequestValidationException, RequestExecutionException
+    public PartitionIterator fetchPageInternal(int pageSize, ReadExecutionController executionController) throws RequestValidationException, RequestExecutionException
     {
         if (isExhausted())
             return EmptyIterators.partition();
 
         pageSize = Math.min(pageSize, remaining);
         Pager pager = new Pager(limits.forPaging(pageSize), command.nowInSec());
-        return Transformation.apply(nextPageReadCommand(pageSize).executeInternal(orderGroup), pager);
+        return Transformation.apply(nextPageReadCommand(pageSize).executeInternal(executionController), pager);
     }
 
     private class Pager extends Transformation<RowIterator>
diff --git a/src/java/org/apache/cassandra/service/pager/MultiPartitionPager.java b/src/java/org/apache/cassandra/service/pager/MultiPartitionPager.java
index 8caa14d..57d6c62 100644
--- a/src/java/org/apache/cassandra/service/pager/MultiPartitionPager.java
+++ b/src/java/org/apache/cassandra/service/pager/MultiPartitionPager.java
@@ -88,7 +88,7 @@
             return null;
 
         PagingState state = pagers[current].state();
-        return new PagingState(pagers[current].key(), state == null ? null : state.rowMark, remaining, Integer.MAX_VALUE);
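+        // carry remainingInPartition in the paging state so the next page can respect the per-partition limit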
+        return new PagingState(pagers[current].key(), state == null ? null : state.rowMark, remaining, pagers[current].remainingInPartition());
     }
 
     public boolean isExhausted()
@@ -106,14 +106,14 @@
         return true;
     }
 
-    public ReadOrderGroup startOrderGroup()
+    public ReadExecutionController executionController()
     {
         // Note that for all pagers, the only difference is the partition key to which it applies, so in practice we
         // can use any of the sub-pager ReadOrderGroup group to protect the whole pager
         for (int i = current; i < pagers.length; i++)
         {
             if (pagers[i] != null)
-                return pagers[i].startOrderGroup();
+                return pagers[i].executionController();
         }
         throw new AssertionError("Shouldn't be called on an exhausted pager");
     }
@@ -129,10 +129,10 @@
     }
 
     @SuppressWarnings("resource") // iter closed via countingIter
-    public PartitionIterator fetchPageInternal(int pageSize, ReadOrderGroup orderGroup) throws RequestValidationException, RequestExecutionException
+    public PartitionIterator fetchPageInternal(int pageSize, ReadExecutionController executionController) throws RequestValidationException, RequestExecutionException
     {
         int toQuery = Math.min(remaining, pageSize);
-        PagersIterator iter = new PagersIterator(toQuery, null, null, orderGroup);
+        PagersIterator iter = new PagersIterator(toQuery, null, null, executionController);
         DataLimits.Counter counter = limit.forPaging(toQuery).newCounter(nowInSec, true);
         iter.setCounter(counter);
         return counter.applyTo(iter);
@@ -149,14 +149,14 @@
         private final ClientState clientState;
 
         // For internal queries
-        private final ReadOrderGroup orderGroup;
+        private final ReadExecutionController executionController;
 
-        public PagersIterator(int pageSize, ConsistencyLevel consistency, ClientState clientState, ReadOrderGroup orderGroup)
+        public PagersIterator(int pageSize, ConsistencyLevel consistency, ClientState clientState, ReadExecutionController executionController)
         {
             this.pageSize = pageSize;
             this.consistency = consistency;
             this.clientState = clientState;
-            this.orderGroup = orderGroup;
+            this.executionController = executionController;
         }
 
         public void setCounter(DataLimits.Counter counter)
@@ -177,7 +177,7 @@
 
                 int toQuery = pageSize - counter.counted();
                 result = consistency == null
-                       ? pagers[current].fetchPageInternal(toQuery, orderGroup)
+                       ? pagers[current].fetchPageInternal(toQuery, executionController)
                        : pagers[current].fetchPage(toQuery, consistency, clientState);
             }
             return result.next();
diff --git a/src/java/org/apache/cassandra/service/pager/QueryPager.java b/src/java/org/apache/cassandra/service/pager/QueryPager.java
index cdf2b97..e2d7f5e 100644
--- a/src/java/org/apache/cassandra/service/pager/QueryPager.java
+++ b/src/java/org/apache/cassandra/service/pager/QueryPager.java
@@ -18,8 +18,8 @@
 package org.apache.cassandra.service.pager;
 
 import org.apache.cassandra.db.ConsistencyLevel;
+import org.apache.cassandra.db.ReadExecutionController;
 import org.apache.cassandra.db.EmptyIterators;
-import org.apache.cassandra.db.ReadOrderGroup;
 import org.apache.cassandra.db.partitions.PartitionIterator;
 import org.apache.cassandra.exceptions.RequestExecutionException;
 import org.apache.cassandra.exceptions.RequestValidationException;
@@ -46,11 +46,11 @@
  */
 public interface QueryPager
 {
-    public static final QueryPager EMPTY = new QueryPager()
+    QueryPager EMPTY = new QueryPager()
     {
-        public ReadOrderGroup startOrderGroup()
+        public ReadExecutionController executionController()
         {
-            return ReadOrderGroup.emptyGroup();
+            return ReadExecutionController.empty();
         }
 
         public PartitionIterator fetchPage(int pageSize, ConsistencyLevel consistency, ClientState clientState) throws RequestValidationException, RequestExecutionException
@@ -58,7 +58,7 @@
             return EmptyIterators.partition();
         }
 
-        public PartitionIterator fetchPageInternal(int pageSize, ReadOrderGroup orderGroup) throws RequestValidationException, RequestExecutionException
+        public PartitionIterator fetchPageInternal(int pageSize, ReadExecutionController executionController) throws RequestValidationException, RequestExecutionException
         {
             return EmptyIterators.partition();
         }
@@ -99,16 +99,16 @@
      *
      * @return a newly started order group for this {@code QueryPager}.
      */
-    public ReadOrderGroup startOrderGroup();
+    public ReadExecutionController executionController();
 
     /**
      * Fetches the next page internally (in other, this does a local query).
      *
      * @param pageSize the maximum number of elements to return in the next page.
-     * @param orderGroup the {@code ReadOrderGroup} protecting the read.
+     * @param executionController the {@code ReadExecutionController} protecting the read.
      * @return the page of result.
      */
-    public PartitionIterator fetchPageInternal(int pageSize, ReadOrderGroup orderGroup) throws RequestValidationException, RequestExecutionException;
+    public PartitionIterator fetchPageInternal(int pageSize, ReadExecutionController executionController) throws RequestValidationException, RequestExecutionException;
 
     /**
      * Whether or not this pager is exhausted, i.e. whether or not a call to
diff --git a/src/java/org/apache/cassandra/service/pager/SinglePartitionPager.java b/src/java/org/apache/cassandra/service/pager/SinglePartitionPager.java
index 6f17284..acb55bb 100644
--- a/src/java/org/apache/cassandra/service/pager/SinglePartitionPager.java
+++ b/src/java/org/apache/cassandra/service/pager/SinglePartitionPager.java
@@ -70,7 +70,11 @@
 
     protected ReadCommand nextPageReadCommand(int pageSize)
     {
-        return command.forPaging(lastReturned == null ? null : lastReturned.clustering(command.metadata()), pageSize);
+        Clustering clustering = lastReturned == null ? null : lastReturned.clustering(command.metadata());
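+        // on the first page (or for Thrift) page purely by size; on later pages also account for rows already returned in this partition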
+        DataLimits limits = (lastReturned == null || command.isForThrift()) ? limits().forPaging(pageSize)
+                                                                            : limits().forPaging(pageSize, key(), remainingInPartition());
+
+        return command.forPaging(clustering, limits);
     }
 
     protected void recordLast(DecoratedKey key, Row last)
diff --git a/src/java/org/apache/cassandra/streaming/ConnectionHandler.java b/src/java/org/apache/cassandra/streaming/ConnectionHandler.java
index d3d8ed2..71ea7bd 100644
--- a/src/java/org/apache/cassandra/streaming/ConnectionHandler.java
+++ b/src/java/org/apache/cassandra/streaming/ConnectionHandler.java
@@ -37,6 +37,8 @@
 
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
+
+import io.netty.util.concurrent.FastThreadLocalThread;
 import org.apache.cassandra.io.util.DataOutputStreamPlus;
 import org.apache.cassandra.io.util.BufferedDataOutputStreamPlus;
 import org.apache.cassandra.io.util.WrappedDataOutputStreamPlus;
@@ -214,7 +216,7 @@
             this.socket = socket;
             this.protocolVersion = protocolVersion;
 
-            new Thread(this, name() + "-" + session.peer).start();
+            new FastThreadLocalThread(this, name() + "-" + socket.getRemoteSocketAddress()).start();
         }
 
         public ListenableFuture<?> close()
diff --git a/src/java/org/apache/cassandra/streaming/StreamCoordinator.java b/src/java/org/apache/cassandra/streaming/StreamCoordinator.java
index 603366d..2cb75f7 100644
--- a/src/java/org/apache/cassandra/streaming/StreamCoordinator.java
+++ b/src/java/org/apache/cassandra/streaming/StreamCoordinator.java
@@ -41,19 +41,23 @@
     // streaming is handled directly by the ConnectionHandler's incoming and outgoing threads.
     private static final DebuggableThreadPoolExecutor streamExecutor = DebuggableThreadPoolExecutor.createWithFixedPoolSize("StreamConnectionEstablisher",
                                                                                                                             FBUtilities.getAvailableProcessors());
+    private final boolean connectSequentially;
 
     private Map<InetAddress, HostStreamingData> peerSessions = new HashMap<>();
     private final int connectionsPerHost;
     private StreamConnectionFactory factory;
     private final boolean keepSSTableLevel;
     private final boolean isIncremental;
+    private Iterator<StreamSession> sessionsToConnect = null;
 
-    public StreamCoordinator(int connectionsPerHost, boolean keepSSTableLevel, boolean isIncremental, StreamConnectionFactory factory)
+    public StreamCoordinator(int connectionsPerHost, boolean keepSSTableLevel, boolean isIncremental,
+                             StreamConnectionFactory factory, boolean connectSequentially)
     {
         this.connectionsPerHost = connectionsPerHost;
         this.factory = factory;
         this.keepSSTableLevel = keepSSTableLevel;
         this.isIncremental = isIncremental;
+        this.connectSequentially = connectSequentially;
     }
 
     public void setConnectionFactory(StreamConnectionFactory factory)
@@ -89,12 +93,59 @@
         return connectionsPerHost == 0;
     }
 
-    public void connectAllStreamSessions()
+    public void connect(StreamResultFuture future)
+    {
+        if (this.connectSequentially)
+            connectSequentially(future);
+        else
+            connectAllStreamSessions();
+    }
+
+    private void connectAllStreamSessions()
     {
         for (HostStreamingData data : peerSessions.values())
             data.connectAllStreamSessions();
     }
 
+    private void connectSequentially(StreamResultFuture future)
+    {
+        sessionsToConnect = getAllStreamSessions().iterator();
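+        // connect one session at a time; each subsequent session is started when the previous one has prepared or completed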
+        future.addEventListener(new StreamEventHandler()
+        {
+            public void handleStreamEvent(StreamEvent event)
+            {
+                if (event.eventType == StreamEvent.Type.STREAM_PREPARED || event.eventType == StreamEvent.Type.STREAM_COMPLETE)
+                    connectNext();
+            }
+
+            public void onSuccess(StreamState result)
+            {
+
+            }
+
+            public void onFailure(Throwable t)
+            {
+
+            }
+        });
+        connectNext();
+    }
+
+    private void connectNext()
+    {
+        if (sessionsToConnect == null)
+            return;
+
+        if (sessionsToConnect.hasNext())
+        {
+            StreamSession next = sessionsToConnect.next();
+            logger.debug("Connecting next session {} with {}.", next.planId(), next.peer.getHostAddress());
+            streamExecutor.execute(new StreamSessionConnector(next));
+        }
+        else
+            logger.debug("Finished connecting all sessions");
+    }
+
     public synchronized Set<InetAddress> getPeers()
     {
         return new HashSet<>(peerSessions.keySet());
diff --git a/src/java/org/apache/cassandra/streaming/StreamHook.java b/src/java/org/apache/cassandra/streaming/StreamHook.java
new file mode 100644
index 0000000..d610297
--- /dev/null
+++ b/src/java/org/apache/cassandra/streaming/StreamHook.java
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.streaming;
+
+import org.apache.cassandra.db.ColumnFamilyStore;
+import org.apache.cassandra.io.sstable.SSTableMultiWriter;
+import org.apache.cassandra.io.sstable.format.SSTableReader;
+import org.apache.cassandra.streaming.messages.OutgoingFileMessage;
+import org.apache.cassandra.utils.FBUtilities;
+
+public interface StreamHook
+{
+    public static final StreamHook instance = createHook();
+
+    public OutgoingFileMessage reportOutgoingFile(StreamSession session, SSTableReader sstable, OutgoingFileMessage message);
+    public void reportStreamFuture(StreamSession session, StreamResultFuture future);
+    public void reportIncomingFile(ColumnFamilyStore cfs, SSTableMultiWriter writer, StreamSession session, int sequenceNumber);
+
+    static StreamHook createHook()
+    {
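+        // the cassandra.stream_hook system property may name a StreamHook implementation to load; otherwise a no-op hook is used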
+        String className = System.getProperty("cassandra.stream_hook");
+        if (className != null)
+        {
+            return FBUtilities.construct(className, StreamHook.class.getSimpleName());
+        }
+        else
+        {
+            return new StreamHook()
+            {
+                public OutgoingFileMessage reportOutgoingFile(StreamSession session, SSTableReader sstable, OutgoingFileMessage message)
+                {
+                    return message;
+                }
+
+                public void reportStreamFuture(StreamSession session, StreamResultFuture future) {}
+
+                public void reportIncomingFile(ColumnFamilyStore cfs, SSTableMultiWriter writer, StreamSession session, int sequenceNumber) {}
+            };
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/streaming/StreamManager.java b/src/java/org/apache/cassandra/streaming/StreamManager.java
index dc8ec19..52652c0 100644
--- a/src/java/org/apache/cassandra/streaming/StreamManager.java
+++ b/src/java/org/apache/cassandra/streaming/StreamManager.java
@@ -131,7 +131,7 @@
             {
                 initiatedStreams.remove(result.planId);
             }
-        }, MoreExecutors.sameThreadExecutor());
+        }, MoreExecutors.directExecutor());
 
         initiatedStreams.put(result.planId, result);
     }
@@ -146,7 +146,7 @@
             {
                 receivingStreams.remove(result.planId);
             }
-        }, MoreExecutors.sameThreadExecutor());
+        }, MoreExecutors.directExecutor());
 
         receivingStreams.put(result.planId, result);
     }
diff --git a/src/java/org/apache/cassandra/streaming/StreamPlan.java b/src/java/org/apache/cassandra/streaming/StreamPlan.java
index 0d963ed..e9d43cb 100644
--- a/src/java/org/apache/cassandra/streaming/StreamPlan.java
+++ b/src/java/org/apache/cassandra/streaming/StreamPlan.java
@@ -32,6 +32,7 @@
  */
 public class StreamPlan
 {
+    public static final String[] EMPTY_COLUMN_FAMILIES = new String[0];
     private final UUID planId = UUIDGen.getTimeUUID();
     private final String description;
     private final List<StreamEventHandler> handlers = new ArrayList<>();
@@ -47,19 +48,21 @@
      */
     public StreamPlan(String description)
     {
-        this(description, ActiveRepairService.UNREPAIRED_SSTABLE, 1, false, false);
+        this(description, ActiveRepairService.UNREPAIRED_SSTABLE, 1, false, false, false);
     }
 
-    public StreamPlan(String description, boolean keepSSTableLevels)
+    public StreamPlan(String description, boolean keepSSTableLevels, boolean connectSequentially)
     {
-        this(description, ActiveRepairService.UNREPAIRED_SSTABLE, 1, keepSSTableLevels, false);
+        this(description, ActiveRepairService.UNREPAIRED_SSTABLE, 1, keepSSTableLevels, false, connectSequentially);
     }
 
-    public StreamPlan(String description, long repairedAt, int connectionsPerHost, boolean keepSSTableLevels, boolean isIncremental)
+    public StreamPlan(String description, long repairedAt, int connectionsPerHost, boolean keepSSTableLevels,
+                      boolean isIncremental, boolean connectSequentially)
     {
         this.description = description;
         this.repairedAt = repairedAt;
-        this.coordinator = new StreamCoordinator(connectionsPerHost, keepSSTableLevels, isIncremental, new DefaultConnectionFactory());
+        this.coordinator = new StreamCoordinator(connectionsPerHost, keepSSTableLevels, isIncremental, new DefaultConnectionFactory(),
+                                                 connectSequentially);
     }
 
     /**
@@ -73,7 +76,7 @@
      */
     public StreamPlan requestRanges(InetAddress from, InetAddress connecting, String keyspace, Collection<Range<Token>> ranges)
     {
-        return requestRanges(from, connecting, keyspace, ranges, new String[0]);
+        return requestRanges(from, connecting, keyspace, ranges, EMPTY_COLUMN_FAMILIES);
     }
 
     /**
@@ -114,7 +117,7 @@
      */
     public StreamPlan transferRanges(InetAddress to, InetAddress connecting, String keyspace, Collection<Range<Token>> ranges)
     {
-        return transferRanges(to, connecting, keyspace, ranges, new String[0]);
+        return transferRanges(to, connecting, keyspace, ranges, EMPTY_COLUMN_FAMILIES);
     }
 
     /**
diff --git a/src/java/org/apache/cassandra/streaming/StreamReader.java b/src/java/org/apache/cassandra/streaming/StreamReader.java
index f8db26b..0cd6329 100644
--- a/src/java/org/apache/cassandra/streaming/StreamReader.java
+++ b/src/java/org/apache/cassandra/streaming/StreamReader.java
@@ -35,9 +35,9 @@
 import org.apache.cassandra.config.Schema;
 import org.apache.cassandra.db.*;
 import org.apache.cassandra.db.rows.*;
-import org.apache.cassandra.io.sstable.Descriptor;
 import org.apache.cassandra.io.sstable.SSTableMultiWriter;
 import org.apache.cassandra.io.sstable.SSTableSimpleIterator;
+import org.apache.cassandra.io.sstable.format.RangeAwareSSTableWriter;
 import org.apache.cassandra.io.sstable.format.SSTableFormat;
 import org.apache.cassandra.io.sstable.format.Version;
 import org.apache.cassandra.io.util.RewindableDataInputStreamPlus;
@@ -67,8 +67,6 @@
     protected final SerializationHeader.Component header;
     protected final int fileSeqNum;
 
-    protected Descriptor desc;
-
     public StreamReader(FileMessageHeader header, StreamSession session)
     {
         this.session = session;
@@ -119,17 +117,17 @@
             {
                 writePartition(deserializer, writer);
                 // TODO move this to BytesReadTracker
-                session.progress(desc, ProgressInfo.Direction.IN, in.getBytesRead(), totalSize);
+                session.progress(writer.getFilename(), ProgressInfo.Direction.IN, in.getBytesRead(), totalSize);
             }
             logger.debug("[Stream #{}] Finished receiving file #{} from {} readBytes = {}, totalSize = {}",
-                         session.planId(), fileSeqNum, session.peer, in.getBytesRead(), totalSize);
+                         session.planId(), fileSeqNum, session.peer, FBUtilities.prettyPrintMemory(in.getBytesRead()), FBUtilities.prettyPrintMemory(totalSize));
             return writer;
         }
         catch (Throwable e)
         {
             if (deserializer != null)
                 logger.warn("[Stream {}] Error while reading partition {} from stream on ks='{}' and table='{}'.",
-                            session.planId(), deserializer.partitionKey(), cfs.keyspace.getName(), cfs.getColumnFamilyName());
+                            session.planId(), deserializer.partitionKey(), cfs.keyspace.getName(), cfs.getTableName());
             if (writer != null)
             {
                 writer.abort(e);
@@ -156,10 +154,11 @@
     {
         Directories.DataDirectory localDir = cfs.getDirectories().getWriteableLocation(totalSize);
         if (localDir == null)
-            throw new IOException("Insufficient disk space to store " + totalSize + " bytes");
-        desc = Descriptor.fromFilename(cfs.getSSTablePath(cfs.getDirectories().getLocationForDisk(localDir), format));
+            throw new IOException(String.format("Insufficient disk space to store %s", FBUtilities.prettyPrintMemory(totalSize)));
 
-        return cfs.createSSTableMultiWriter(desc, estimatedKeys, repairedAt, sstableLevel, getHeader(cfs.metadata), session.getTransaction(cfId));
+        RangeAwareSSTableWriter writer = new RangeAwareSSTableWriter(cfs, estimatedKeys, repairedAt, format, sstableLevel, totalSize, session.getTransaction(cfId), getHeader(cfs.metadata));
+        StreamHook.instance.reportIncomingFile(cfs, writer, session, fileSeqNum);
+        return writer;
     }
 
     protected void drain(InputStream dis, long bytesRead) throws IOException
diff --git a/src/java/org/apache/cassandra/streaming/StreamReceiveTask.java b/src/java/org/apache/cassandra/streaming/StreamReceiveTask.java
index 040906b..88238bc 100644
--- a/src/java/org/apache/cassandra/streaming/StreamReceiveTask.java
+++ b/src/java/org/apache/cassandra/streaming/StreamReceiveTask.java
@@ -35,6 +35,7 @@
 import org.apache.cassandra.db.Keyspace;
 import org.apache.cassandra.db.Mutation;
 import org.apache.cassandra.db.compaction.OperationType;
+import org.apache.cassandra.db.filter.ColumnFilter;
 import org.apache.cassandra.db.lifecycle.LifecycleTransaction;
 import org.apache.cassandra.db.partitions.PartitionUpdate;
 import org.apache.cassandra.db.rows.UnfilteredRowIterator;
@@ -151,6 +152,7 @@
         public void run()
         {
             boolean hasViews = false;
+            boolean hasCDC = false;
             ColumnFamilyStore cfs = null;
             try
             {
@@ -165,16 +167,22 @@
                 }
                 cfs = Keyspace.open(kscf.left).getColumnFamilyStore(kscf.right);
                 hasViews = !Iterables.isEmpty(View.findAll(kscf.left, kscf.right));
+                hasCDC = cfs.metadata.params.cdc;
 
                 Collection<SSTableReader> readers = task.sstables;
 
                 try (Refs<SSTableReader> refs = Refs.ref(readers))
                 {
-                    //We have a special path for views.
-                    //Since the view requires cleaning up any pre-existing state, we must put
-                    //all partitions through the same write path as normal mutations.
-                    //This also ensures any 2is are also updated
-                    if (hasViews)
+                    /*
+                     * We have a special path for views and for CDC.
+                     *
+                     * For views, since the view requires cleaning up any pre-existing state, we must put all partitions
+                     * through the same write path as normal mutations. This also ensures any 2is are also updated.
+                     *
+                     * For CDC-enabled tables, we want to ensure that the mutations are run through the CommitLog so they
+                     * can be archived by the CDC process on discard.
+                     */
+                    if (hasViews || hasCDC)
                     {
                         for (SSTableReader reader : readers)
                         {
@@ -184,8 +192,17 @@
                                 {
                                     try (UnfilteredRowIterator rowIterator = scanner.next())
                                     {
-                                        //Apply unsafe (we will flush below before transaction is done)
-                                        new Mutation(PartitionUpdate.fromIterator(rowIterator)).applyUnsafe();
+                                        Mutation m = new Mutation(PartitionUpdate.fromIterator(rowIterator, ColumnFilter.all(cfs.metadata)));
+
+                                        // MV *can* be applied unsafe if there's no CDC on the CFS as we flush below
+                                        // before transaction is done.
+                                        //
+                                        // If the CFS has CDC, however, these updates need to be written to the CommitLog
+                                        // so they get archived into the cdc_raw folder
+                                        if (hasCDC)
+                                            m.apply();
+                                        else
+                                            m.applyUnsafe();
                                     }
                                 }
                             }
@@ -235,9 +252,9 @@
             }
             finally
             {
-                //We don't keep the streamed sstables since we've applied them manually
-                //So we abort the txn and delete the streamed sstables
-                if (hasViews)
+                // We don't keep the streamed sstables since we've applied them manually so we abort the txn and delete
+                // the streamed sstables.
+                if (hasViews || hasCDC)
                 {
                     if (cfs != null)
                         cfs.forceBlockingFlush();
diff --git a/src/java/org/apache/cassandra/streaming/StreamResultFuture.java b/src/java/org/apache/cassandra/streaming/StreamResultFuture.java
index b299b87..71ca9b1 100644
--- a/src/java/org/apache/cassandra/streaming/StreamResultFuture.java
+++ b/src/java/org/apache/cassandra/streaming/StreamResultFuture.java
@@ -19,6 +19,7 @@
 
 import java.io.IOException;
 import java.net.InetAddress;
+import java.net.Socket;
 import java.util.*;
 import java.util.concurrent.ConcurrentLinkedQueue;
 
@@ -28,6 +29,7 @@
 import org.slf4j.LoggerFactory;
 
 import org.apache.cassandra.net.IncomingStreamingConnection;
+import org.apache.cassandra.utils.FBUtilities;
 
 /**
  * A future on the result ({@link StreamState}) of a streaming plan.
@@ -72,10 +74,12 @@
 
     private StreamResultFuture(UUID planId, String description, boolean keepSSTableLevels, boolean isIncremental)
     {
-        this(planId, description, new StreamCoordinator(0, keepSSTableLevels, isIncremental, new DefaultConnectionFactory()));
+        this(planId, description, new StreamCoordinator(0, keepSSTableLevels, isIncremental,
+                                                        new DefaultConnectionFactory(), false));
     }
 
-    static StreamResultFuture init(UUID planId, String description, Collection<StreamEventHandler> listeners, StreamCoordinator coordinator)
+    static StreamResultFuture init(UUID planId, String description, Collection<StreamEventHandler> listeners,
+                                   StreamCoordinator coordinator)
     {
         StreamResultFuture future = createAndRegister(planId, description, coordinator);
         if (listeners != null)
@@ -84,14 +88,15 @@
                 future.addEventListener(listener);
         }
 
-        logger.info("[Stream #{}] Executing streaming plan for {}", planId, description);
+        logger.info("[Stream #{}] Executing streaming plan for {}", planId,  description);
 
         // Initialize and start all sessions
         for (final StreamSession session : coordinator.getAllStreamSessions())
         {
             session.init(future);
         }
-        coordinator.connectAllStreamSessions();
+
+        coordinator.connect(future);
 
         return future;
     }
@@ -166,13 +171,13 @@
     void handleSessionPrepared(StreamSession session)
     {
         SessionInfo sessionInfo = session.getSessionInfo();
-        logger.info("[Stream #{} ID#{}] Prepare completed. Receiving {} files({} bytes), sending {} files({} bytes)",
-                    session.planId(),
-                    session.sessionIndex(),
-                    sessionInfo.getTotalFilesToReceive(),
-                    sessionInfo.getTotalSizeToReceive(),
-                    sessionInfo.getTotalFilesToSend(),
-                    sessionInfo.getTotalSizeToSend());
+        logger.info("[Stream #{} ID#{}] Prepare completed. Receiving {} files({}), sending {} files({})",
+                              session.planId(),
+                              session.sessionIndex(),
+                              sessionInfo.getTotalFilesToReceive(),
+                              FBUtilities.prettyPrintMemory(sessionInfo.getTotalSizeToReceive()),
+                              sessionInfo.getTotalFilesToSend(),
+                              FBUtilities.prettyPrintMemory(sessionInfo.getTotalSizeToSend()));
         StreamEvent.SessionPreparedEvent event = new StreamEvent.SessionPreparedEvent(planId, sessionInfo);
         coordinator.addSessionInfo(sessionInfo);
         fireStreamEvent(event);
diff --git a/src/java/org/apache/cassandra/streaming/StreamSession.java b/src/java/org/apache/cassandra/streaming/StreamSession.java
index 12f561b..e99abbd 100644
--- a/src/java/org/apache/cassandra/streaming/StreamSession.java
+++ b/src/java/org/apache/cassandra/streaming/StreamSession.java
@@ -43,8 +43,6 @@
 import org.apache.cassandra.dht.Range;
 import org.apache.cassandra.dht.Token;
 import org.apache.cassandra.gms.*;
-import org.apache.cassandra.io.sstable.Component;
-import org.apache.cassandra.io.sstable.Descriptor;
 import org.apache.cassandra.metrics.StreamingMetrics;
 import org.apache.cassandra.service.ActiveRepairService;
 import org.apache.cassandra.streaming.messages.*;
@@ -226,6 +224,7 @@
     public void init(StreamResultFuture streamResult)
     {
         this.streamResult = streamResult;
+        StreamHook.instance.reportStreamFuture(this, streamResult);
     }
 
     public void start()
@@ -539,7 +538,10 @@
         }
         else
         {
-            logger.error("[Stream #{}] Streaming error occurred", planId(), e);
+            logger.error("[Stream #{}] Streaming error occurred on session with peer {}{}", planId(),
+                                                                                            peer.getHostAddress(),
+                                                                                            peer.equals(connecting) ? "" : " through " + connecting.getHostAddress(),
+                                                                                            e);
         }
         // send session failure message
         if (handler.isOutgoingConnected())
@@ -607,9 +609,9 @@
         receivers.get(message.header.cfId).received(message.sstable);
     }
 
-    public void progress(Descriptor desc, ProgressInfo.Direction direction, long bytes, long total)
+    public void progress(String filename, ProgressInfo.Direction direction, long bytes, long total)
     {
-        ProgressInfo progress = new ProgressInfo(peer, index, desc.filenameFor(Component.DATA), direction, bytes, total);
+        ProgressInfo progress = new ProgressInfo(peer, index, filename, direction, bytes, total);
         streamResult.handleProgress(progress);
     }
 
diff --git a/src/java/org/apache/cassandra/streaming/StreamTransferTask.java b/src/java/org/apache/cassandra/streaming/StreamTransferTask.java
index f14abd2..e8d0cae 100644
--- a/src/java/org/apache/cassandra/streaming/StreamTransferTask.java
+++ b/src/java/org/apache/cassandra/streaming/StreamTransferTask.java
@@ -23,14 +23,12 @@
 import java.util.concurrent.atomic.AtomicInteger;
 
 import com.google.common.base.Throwables;
-import com.google.common.collect.Iterables;
 
 import org.apache.cassandra.concurrent.NamedThreadFactory;
 import org.apache.cassandra.io.sstable.format.SSTableReader;
 import org.apache.cassandra.streaming.messages.OutgoingFileMessage;
 import org.apache.cassandra.utils.Pair;
 import org.apache.cassandra.utils.concurrent.Ref;
-import org.apache.cassandra.utils.concurrent.RefCounted;
 
 /**
  * StreamTransferTask sends sections of SSTable files in certain ColumnFamily.
@@ -56,6 +54,7 @@
     {
         assert ref.get() != null && cfId.equals(ref.get().metadata.cfId);
         OutgoingFileMessage message = new OutgoingFileMessage(ref, sequenceNumber.getAndIncrement(), estimatedKeys, sections, repairedAt, session.keepSSTableLevel());
+        message = StreamHook.instance.reportOutgoingFile(session, ref.get(), message);
         files.put(message.header.sequenceNumber, message);
         totalSize += message.header.size();
     }
diff --git a/src/java/org/apache/cassandra/streaming/StreamWriter.java b/src/java/org/apache/cassandra/streaming/StreamWriter.java
index 721ae1e..1d30419 100644
--- a/src/java/org/apache/cassandra/streaming/StreamWriter.java
+++ b/src/java/org/apache/cassandra/streaming/StreamWriter.java
@@ -35,6 +35,7 @@
 import org.apache.cassandra.io.util.FileUtils;
 import org.apache.cassandra.io.util.RandomAccessReader;
 import org.apache.cassandra.streaming.StreamManager.StreamRateLimiter;
+import org.apache.cassandra.utils.FBUtilities;
 import org.apache.cassandra.utils.Pair;
 
 /**
@@ -108,7 +109,7 @@
                     long lastBytesRead = write(file, validator, readOffset, length, bytesRead);
                     bytesRead += lastBytesRead;
                     progress += (lastBytesRead - readOffset);
-                    session.progress(sstable.descriptor, ProgressInfo.Direction.OUT, progress, totalSize);
+                    session.progress(sstable.descriptor.filenameFor(Component.DATA), ProgressInfo.Direction.OUT, progress, totalSize);
                     readOffset = 0;
                 }
 
@@ -116,7 +117,7 @@
                 compressedOutput.flush();
             }
             logger.debug("[Stream #{}] Finished streaming file {} to {}, bytesTransferred = {}, totalSize = {}",
-                         session.planId(), sstable.getFilename(), session.peer, progress, totalSize);
+                         session.planId(), sstable.getFilename(), session.peer, FBUtilities.prettyPrintMemory(progress), FBUtilities.prettyPrintMemory(totalSize));
         }
     }
 
diff --git a/src/java/org/apache/cassandra/streaming/compress/CompressedInputStream.java b/src/java/org/apache/cassandra/streaming/compress/CompressedInputStream.java
index 55ac7ac..3aaa1a3 100644
--- a/src/java/org/apache/cassandra/streaming/compress/CompressedInputStream.java
+++ b/src/java/org/apache/cassandra/streaming/compress/CompressedInputStream.java
@@ -25,14 +25,13 @@
 import java.util.concurrent.BlockingQueue;
 import java.util.concurrent.ThreadLocalRandom;
 import java.util.function.Supplier;
-import java.util.zip.Checksum;
 
 import com.google.common.collect.Iterators;
 import com.google.common.primitives.Ints;
-
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
+import io.netty.util.concurrent.FastThreadLocalThread;
 import org.apache.cassandra.io.compress.CompressionMetadata;
 import org.apache.cassandra.utils.ChecksumType;
 import org.apache.cassandra.utils.WrappedRunnable;
@@ -60,7 +59,7 @@
     // number of bytes in the buffer that are actually valid
     protected int validBufferBytes = -1;
 
-    private final Checksum checksum;
+    private final ChecksumType checksumType;
 
     // raw checksum bytes
     private final byte[] checksumBytes = new byte[4];
@@ -76,37 +75,67 @@
     public CompressedInputStream(InputStream source, CompressionInfo info, ChecksumType checksumType, Supplier<Double> crcCheckChanceSupplier)
     {
         this.info = info;
-        this.checksum =  checksumType.newInstance();
         this.buffer = new byte[info.parameters.chunkLength()];
         // buffer is limited to store up to 1024 chunks
         this.dataBuffer = new ArrayBlockingQueue<>(Math.min(info.chunks.length, 1024));
         this.crcCheckChanceSupplier = crcCheckChanceSupplier;
+        this.checksumType = checksumType;
 
-        new Thread(new Reader(source, info, dataBuffer)).start();
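+        // Use Netty's FastThreadLocalThread for the background reader so FastThreadLocal accesses on that thread take the fast path.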
+        new FastThreadLocalThread(new Reader(source, info, dataBuffer)).start();
     }
 
+    private void decompressNextChunk() throws IOException
+    {
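+        // Block until the reader thread hands over the next compressed chunk, then decompress it into 'buffer'; a poison pill or interrupt means no more data.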
+        try
+        {
+            byte[] compressedWithCRC = dataBuffer.take();
+            if (compressedWithCRC == POISON_PILL)
+                throw new EOFException("No chunk available");
+            decompress(compressedWithCRC);
+        }
+        catch (InterruptedException e)
+        {
+            throw new EOFException("No chunk available");
+        }
+    }
+
+    @Override
     public int read() throws IOException
     {
         if (current >= bufferOffset + buffer.length || validBufferBytes == -1)
-        {
-            try
-            {
-                byte[] compressedWithCRC = dataBuffer.take();
-                if (compressedWithCRC == POISON_PILL)
-                    throw new EOFException("No chunk available");
-                decompress(compressedWithCRC);
-            }
-            catch (InterruptedException e)
-            {
-                throw new EOFException("No chunk available");
-            }
-        }
+            decompressNextChunk();
 
         assert current >= bufferOffset && current < bufferOffset + validBufferBytes;
 
         return ((int) buffer[(int) (current++ - bufferOffset)]) & 0xff;
     }
 
+    @Override
+    public int read(byte[] b, int off, int len) throws IOException
+    {
+        long nextCurrent = current + len;
+
+        if (current >= bufferOffset + buffer.length || validBufferBytes == -1)
+            decompressNextChunk();
+
+        assert nextCurrent >= bufferOffset;
+
+        int read = 0;
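+        // Copy from the already-decompressed buffer, pulling and decompressing further chunks until 'len' bytes have been copied.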
+        while (read < len)
+        {
+            int nextLen = Math.min((len - read), (int)((bufferOffset + validBufferBytes) - current));
+
+            System.arraycopy(buffer, (int)(current - bufferOffset), b, off + read, nextLen);
+            read += nextLen;
+
+            current += nextLen;
+            if (read != len)
+                decompressNextChunk();
+        }
+
+        return len;
+    }
+
     public void position(long position)
     {
         assert position >= current : "stream can only read forward.";
@@ -122,14 +151,11 @@
         // validate crc randomly
         if (this.crcCheckChanceSupplier.get() > ThreadLocalRandom.current().nextDouble())
         {
-            checksum.update(compressed, 0, compressed.length - checksumBytes.length);
+            int checksum = (int) checksumType.of(compressed, 0, compressed.length - checksumBytes.length);
 
             System.arraycopy(compressed, compressed.length - checksumBytes.length, checksumBytes, 0, checksumBytes.length);
-            if (Ints.fromByteArray(checksumBytes) != (int) checksum.getValue())
+            if (Ints.fromByteArray(checksumBytes) != checksum)
                 throw new IOException("CRC unmatched");
-
-            // reset checksum object back to the original (blank) state
-            checksum.reset();
         }
 
         // buffer offset is always aligned
diff --git a/src/java/org/apache/cassandra/streaming/compress/CompressedStreamReader.java b/src/java/org/apache/cassandra/streaming/compress/CompressedStreamReader.java
index 9719587..8dafa9c 100644
--- a/src/java/org/apache/cassandra/streaming/compress/CompressedStreamReader.java
+++ b/src/java/org/apache/cassandra/streaming/compress/CompressedStreamReader.java
@@ -17,15 +17,11 @@
  */
 package org.apache.cassandra.streaming.compress;
 
-import java.io.DataInputStream;
 import java.io.IOException;
 import java.nio.channels.Channels;
 import java.nio.channels.ReadableByteChannel;
 
 import com.google.common.base.Throwables;
-
-import org.apache.cassandra.io.sstable.SSTableMultiWriter;
-
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
@@ -33,11 +29,13 @@
 import org.apache.cassandra.db.ColumnFamilyStore;
 import org.apache.cassandra.db.Keyspace;
 import org.apache.cassandra.io.compress.CompressionMetadata;
+import org.apache.cassandra.io.sstable.SSTableMultiWriter;
+import org.apache.cassandra.io.util.TrackedInputStream;
 import org.apache.cassandra.streaming.ProgressInfo;
 import org.apache.cassandra.streaming.StreamReader;
 import org.apache.cassandra.streaming.StreamSession;
 import org.apache.cassandra.streaming.messages.FileMessageHeader;
-import org.apache.cassandra.io.util.TrackedInputStream;
+import org.apache.cassandra.utils.FBUtilities;
 import org.apache.cassandra.utils.Pair;
 
 /**
@@ -90,6 +88,7 @@
         try
         {
             writer = createWriter(cfs, totalSize, repairedAt, format);
+            String filename = writer.getFilename();
             int sectionIdx = 0;
             for (Pair<Long, Long> section : sections)
             {
@@ -105,11 +104,11 @@
                 {
                     writePartition(deserializer, writer);
                     // when compressed, report total bytes of compressed chunks read since remoteFile.size is the sum of chunks transferred
-                    session.progress(desc, ProgressInfo.Direction.IN, cis.getTotalCompressedBytesRead(), totalSize);
+                    session.progress(filename, ProgressInfo.Direction.IN, cis.getTotalCompressedBytesRead(), totalSize);
                 }
             }
             logger.debug("[Stream #{}] Finished receiving file #{} from {} readBytes = {}, totalSize = {}", session.planId(), fileSeqNum,
-                         session.peer, cis.getTotalCompressedBytesRead(), totalSize);
+                         session.peer, FBUtilities.prettyPrintMemory(cis.getTotalCompressedBytesRead()), FBUtilities.prettyPrintMemory(totalSize));
             return writer;
         }
         catch (Throwable e)
diff --git a/src/java/org/apache/cassandra/streaming/compress/CompressedStreamWriter.java b/src/java/org/apache/cassandra/streaming/compress/CompressedStreamWriter.java
index f37af29..900c1ad 100644
--- a/src/java/org/apache/cassandra/streaming/compress/CompressedStreamWriter.java
+++ b/src/java/org/apache/cassandra/streaming/compress/CompressedStreamWriter.java
@@ -29,6 +29,7 @@
 import org.slf4j.LoggerFactory;
 
 import org.apache.cassandra.io.compress.CompressionMetadata;
+import org.apache.cassandra.io.sstable.Component;
 import org.apache.cassandra.io.sstable.format.SSTableReader;
 import org.apache.cassandra.io.util.ChannelProxy;
 import org.apache.cassandra.io.util.DataOutputStreamPlus;
@@ -37,6 +38,7 @@
 import org.apache.cassandra.streaming.ProgressInfo;
 import org.apache.cassandra.streaming.StreamSession;
 import org.apache.cassandra.streaming.StreamWriter;
+import org.apache.cassandra.utils.FBUtilities;
 import org.apache.cassandra.utils.Pair;
 
 /**
@@ -88,11 +90,11 @@
                     long lastWrite = out.applyToChannel((wbc) -> fc.transferTo(section.left + bytesTransferredFinal, toTransfer, wbc));
                     bytesTransferred += lastWrite;
                     progress += lastWrite;
-                    session.progress(sstable.descriptor, ProgressInfo.Direction.OUT, progress, totalSize);
+                    session.progress(sstable.descriptor.filenameFor(Component.DATA), ProgressInfo.Direction.OUT, progress, totalSize);
                 }
             }
             logger.debug("[Stream #{}] Finished streaming file {} to {}, bytesTransferred = {}, totalSize = {}",
-                         session.planId(), sstable.getFilename(), session.peer, progress, totalSize);
+                         session.planId(), sstable.getFilename(), session.peer, FBUtilities.prettyPrintMemory(progress), FBUtilities.prettyPrintMemory(totalSize));
         }
     }
 
diff --git a/src/java/org/apache/cassandra/streaming/messages/IncomingFileMessage.java b/src/java/org/apache/cassandra/streaming/messages/IncomingFileMessage.java
index d881d43..b828dc4 100644
--- a/src/java/org/apache/cassandra/streaming/messages/IncomingFileMessage.java
+++ b/src/java/org/apache/cassandra/streaming/messages/IncomingFileMessage.java
@@ -72,7 +72,7 @@
             }
         }
 
-        public void serialize(IncomingFileMessage message, DataOutputStreamPlus out, int version, StreamSession session) throws IOException
+        public void serialize(IncomingFileMessage message, DataOutputStreamPlus out, int version, StreamSession session)
         {
             throw new UnsupportedOperationException("Not allowed to call serialize on an incoming file");
         }
diff --git a/src/java/org/apache/cassandra/streaming/messages/OutgoingFileMessage.java b/src/java/org/apache/cassandra/streaming/messages/OutgoingFileMessage.java
index f10b42e..9530e14 100644
--- a/src/java/org/apache/cassandra/streaming/messages/OutgoingFileMessage.java
+++ b/src/java/org/apache/cassandra/streaming/messages/OutgoingFileMessage.java
@@ -38,7 +38,7 @@
 {
     public static Serializer<OutgoingFileMessage> serializer = new Serializer<OutgoingFileMessage>()
     {
-        public OutgoingFileMessage deserialize(ReadableByteChannel in, int version, StreamSession session) throws IOException
+        public OutgoingFileMessage deserialize(ReadableByteChannel in, int version, StreamSession session)
         {
             throw new UnsupportedOperationException("Not allowed to call deserialize on an outgoing file");
         }
diff --git a/src/java/org/apache/cassandra/thrift/CassandraServer.java b/src/java/org/apache/cassandra/thrift/CassandraServer.java
index 0dec94e..a189000 100644
--- a/src/java/org/apache/cassandra/thrift/CassandraServer.java
+++ b/src/java/org/apache/cassandra/thrift/CassandraServer.java
@@ -48,7 +48,6 @@
 import org.apache.cassandra.dht.*;
 import org.apache.cassandra.dht.Range;
 import org.apache.cassandra.exceptions.*;
-import org.apache.cassandra.index.Index;
 import org.apache.cassandra.io.util.DataOutputBuffer;
 import org.apache.cassandra.locator.DynamicEndpointSnitch;
 import org.apache.cassandra.metrics.ClientMetrics;
@@ -361,7 +360,7 @@
     private ClusteringIndexFilter toInternalFilter(CFMetaData metadata, ColumnParent parent, SliceRange range)
     {
         if (metadata.isSuper() && parent.isSetSuper_column())
-            return new ClusteringIndexNamesFilter(FBUtilities.singleton(new Clustering(parent.bufferForSuper_column()), metadata.comparator), range.reversed);
+            return new ClusteringIndexNamesFilter(FBUtilities.singleton(Clustering.make(parent.bufferForSuper_column()), metadata.comparator), range.reversed);
         else
             return new ClusteringIndexSliceFilter(makeSlices(metadata, range), range.reversed);
     }
@@ -385,13 +384,13 @@
                 {
                     if (parent.isSetSuper_column())
                     {
-                        return new ClusteringIndexNamesFilter(FBUtilities.singleton(new Clustering(parent.bufferForSuper_column()), metadata.comparator), false);
+                        return new ClusteringIndexNamesFilter(FBUtilities.singleton(Clustering.make(parent.bufferForSuper_column()), metadata.comparator), false);
                     }
                     else
                     {
                         NavigableSet<Clustering> clusterings = new TreeSet<>(metadata.comparator);
                         for (ByteBuffer bb : predicate.column_names)
-                            clusterings.add(new Clustering(bb));
+                            clusterings.add(Clustering.make(bb));
                         return new ClusteringIndexNamesFilter(clusterings, false);
                     }
                 }
@@ -461,7 +460,7 @@
             // We only want to include the static columns that are selected by the slices
             for (ColumnDefinition def : columns.statics)
             {
-                if (slices.selects(new Clustering(def.name.bytes)))
+                if (slices.selects(Clustering.make(def.name.bytes)))
                     builder.add(def);
             }
             columns = builder.build();
@@ -618,7 +617,7 @@
                     builder.select(dynamicDef, CellPath.create(column_path.column));
                     columns = builder.build();
                 }
-                filter = new ClusteringIndexNamesFilter(FBUtilities.singleton(new Clustering(column_path.super_column), metadata.comparator),
+                filter = new ClusteringIndexNamesFilter(FBUtilities.singleton(Clustering.make(column_path.super_column), metadata.comparator),
                                                   false);
             }
             else
@@ -632,7 +631,7 @@
                     builder.add(cellname.column);
                     builder.add(metadata.compactValueColumn());
                     columns = builder.build();
-                    filter = new ClusteringIndexNamesFilter(FBUtilities.singleton(new Clustering(column_path.column), metadata.comparator), false);
+                    filter = new ClusteringIndexNamesFilter(FBUtilities.singleton(Clustering.make(column_path.column), metadata.comparator), false);
                 }
                 else
                 {
@@ -817,9 +816,21 @@
     private Cell cellFromColumn(CFMetaData metadata, LegacyLayout.LegacyCellName name, Column column)
     {
         CellPath path = name.collectionElement == null ? null : CellPath.create(name.collectionElement);
-        return column.ttl == 0
-             ? BufferCell.live(metadata, name.column, column.timestamp, column.value, path)
-             : BufferCell.expiring(name.column, column.timestamp, column.ttl, FBUtilities.nowInSeconds(), column.value, path);
+        int ttl = getTtl(metadata, column);
+        return ttl == LivenessInfo.NO_TTL
+             ? BufferCell.live(name.column, column.timestamp, column.value, path)
+             : BufferCell.expiring(name.column, column.timestamp, ttl, FBUtilities.nowInSeconds(), column.value, path);
+    }
+
+    private int getTtl(CFMetaData metadata, Column column)
+    {
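+        // The Thrift client did not set a TTL at all: fall back to the table's default_time_to_live.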
+        if (!column.isSetTtl())
+            return metadata.params.defaultTimeToLive;
+
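+        // An explicit TTL of 0 means "no expiration" and must win over a non-zero table default.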
+        if (column.ttl == LivenessInfo.NO_TTL && metadata.params.defaultTimeToLive != LivenessInfo.NO_TTL)
+            return LivenessInfo.NO_TTL;
+
+        return column.ttl;
     }
 
     private void internal_insert(ByteBuffer key, ColumnParent column_parent, Column column, ConsistencyLevel consistency_level)
@@ -942,7 +953,7 @@
             DecoratedKey dk = metadata.decorateKey(key);
             int nowInSec = FBUtilities.nowInSeconds();
 
-            PartitionUpdate partitionUpdates = PartitionUpdate.fromIterator(LegacyLayout.toRowIterator(metadata, dk, toLegacyCells(metadata, updates, nowInSec).iterator(), nowInSec));
+            PartitionUpdate partitionUpdates = PartitionUpdate.fromIterator(LegacyLayout.toRowIterator(metadata, dk, toLegacyCells(metadata, updates, nowInSec).iterator(), nowInSec), ColumnFilter.all(metadata));
             // Indexed column values cannot be larger than 64K.  See CASSANDRA-3057/4240 for more details
             Keyspace.open(metadata.ksName).getColumnFamilyStore(metadata.cfName).indexManager.validate(partitionUpdates);
 
@@ -1144,7 +1155,7 @@
 
                 sortAndMerge(metadata, cells, nowInSec);
                 DecoratedKey dk = metadata.decorateKey(key);
-                PartitionUpdate update = PartitionUpdate.fromIterator(LegacyLayout.toUnfilteredRowIterator(metadata, dk, delInfo, cells.iterator()));
+                PartitionUpdate update = PartitionUpdate.fromIterator(LegacyLayout.toUnfilteredRowIterator(metadata, dk, delInfo, cells.iterator()), ColumnFilter.all(metadata));
 
                 // Indexed column values cannot be larger than 64K.  See CASSANDRA-3057/4240 for more details
                 Keyspace.open(metadata.ksName).getColumnFamilyStore(metadata.cfName).indexManager.validate(update);
@@ -1207,7 +1218,7 @@
         }
     }
 
-    private void addRange(CFMetaData cfm, LegacyLayout.LegacyDeletionInfo delInfo, Slice.Bound start, Slice.Bound end, long timestamp, int nowInSec)
+    private void addRange(CFMetaData cfm, LegacyLayout.LegacyDeletionInfo delInfo, ClusteringBound start, ClusteringBound end, long timestamp, int nowInSec)
     {
         delInfo.add(cfm, new RangeTombstone(Slice.make(start, end), new DeletionTime(timestamp, nowInSec)));
     }
@@ -1222,7 +1233,7 @@
                 try
                 {
                     if (del.super_column == null && cfm.isSuper())
-                        addRange(cfm, delInfo, Slice.Bound.inclusiveStartOf(c), Slice.Bound.inclusiveEndOf(c), del.timestamp, nowInSec);
+                        addRange(cfm, delInfo, ClusteringBound.inclusiveStartOf(c), ClusteringBound.inclusiveEndOf(c), del.timestamp, nowInSec);
                     else if (del.super_column != null)
                         cells.add(toLegacyDeletion(cfm, del.super_column, c, del.timestamp, nowInSec));
                     else
@@ -1256,7 +1267,7 @@
         else
         {
             if (del.super_column != null)
-                addRange(cfm, delInfo, Slice.Bound.inclusiveStartOf(del.super_column), Slice.Bound.inclusiveEndOf(del.super_column), del.timestamp, nowInSec);
+                addRange(cfm, delInfo, ClusteringBound.inclusiveStartOf(del.super_column), ClusteringBound.inclusiveEndOf(del.super_column), del.timestamp, nowInSec);
             else
                 delInfo.add(new DeletionTime(del.timestamp, nowInSec));
         }
@@ -1354,7 +1365,7 @@
         }
         else if (column_path.super_column != null && column_path.column == null)
         {
-            Row row = BTreeRow.emptyDeletedRow(new Clustering(column_path.super_column), Row.Deletion.regular(new DeletionTime(timestamp, nowInSec)));
+            Row row = BTreeRow.emptyDeletedRow(Clustering.make(column_path.super_column), Row.Deletion.regular(new DeletionTime(timestamp, nowInSec)));
             update = PartitionUpdate.singleRowUpdate(metadata, dk, row);
         }
         else
@@ -1612,7 +1623,7 @@
                 ClusteringIndexFilter filter = new ClusteringIndexSliceFilter(Slices.ALL, false);
                 DataLimits limits = getLimits(range.count, true, Integer.MAX_VALUE);
                 Clustering pageFrom = metadata.isSuper()
-                                    ? new Clustering(start_column)
+                                    ? Clustering.make(start_column)
                                     : LegacyLayout.decodeCellName(metadata, start_column).clustering;
                 PartitionRangeReadCommand cmd = new PartitionRangeReadCommand(false,
                                                                               0,
@@ -2099,10 +2110,6 @@
         {
             throw new TimedOutException();
         }
-        catch (IOException e)
-        {
-            throw (UnavailableException) new UnavailableException().initCause(e);
-        }
         finally
         {
             Tracing.instance.stopSession();
@@ -2171,7 +2178,7 @@
                 // See UpdateParameters.addCounter() for more details on this
                 ByteBuffer value = CounterContext.instance().createLocal(column.value);
                 CellPath path = name.collectionElement == null ? null : CellPath.create(name.collectionElement);
-                Cell cell = BufferCell.live(metadata, name.column, FBUtilities.timestampMicros(), value, path);
+                Cell cell = BufferCell.live(name.column, FBUtilities.timestampMicros(), value, path);
 
                 PartitionUpdate update = PartitionUpdate.singleRowUpdate(metadata, key, BTreeRow.singleCellRow(name.clustering, cell));
 
@@ -2426,8 +2433,8 @@
             for (int i = 0 ; i < request.getColumn_slices().size() ; i++)
             {
                 fixOptionalSliceParameters(request.getColumn_slices().get(i));
-                Slice.Bound start = LegacyLayout.decodeBound(metadata, request.getColumn_slices().get(i).start, true).bound;
-                Slice.Bound finish = LegacyLayout.decodeBound(metadata, request.getColumn_slices().get(i).finish, false).bound;
+                ClusteringBound start = LegacyLayout.decodeBound(metadata, request.getColumn_slices().get(i).start, true).bound;
+                ClusteringBound finish = LegacyLayout.decodeBound(metadata, request.getColumn_slices().get(i).finish, false).bound;
 
                 int compare = metadata.comparator.compare(start, finish);
                 if (!request.reversed && compare > 0)
diff --git a/src/java/org/apache/cassandra/thrift/CustomTThreadPoolServer.java b/src/java/org/apache/cassandra/thrift/CustomTThreadPoolServer.java
index c5f34ae..46da9d5 100644
--- a/src/java/org/apache/cassandra/thrift/CustomTThreadPoolServer.java
+++ b/src/java/org/apache/cassandra/thrift/CustomTThreadPoolServer.java
@@ -56,8 +56,8 @@
  * Slightly modified version of the Apache Thrift TThreadPoolServer.
  * <p>
  * This allows passing an executor so you have more control over the actual
- * behaviour of the tasks being run.
- * <p/>
+ * behavior of the tasks being run.
+ * </p>
  * Newer version of Thrift should make this obsolete.
  */
 public class CustomTThreadPoolServer extends TServer
@@ -256,8 +256,7 @@
                     SSLServerSocket sslServerSocket = (SSLServerSocket) sslServer.getServerSocket();
                     String[] suites = SSLFactory.filterCipherSuites(sslServerSocket.getSupportedCipherSuites(), clientEnc.cipher_suites);
                     sslServerSocket.setEnabledCipherSuites(suites);
-                    sslServerSocket.setEnabledProtocols(SSLFactory.ACCEPTED_PROTOCOLS);
-                    serverTransport = new TCustomServerSocket(sslServer.getServerSocket(), args.keepAlive, args.sendBufferSize, args.recvBufferSize);
+                    serverTransport = new TCustomServerSocket(sslServerSocket, args.keepAlive, args.sendBufferSize, args.recvBufferSize);
                 }
                 else
                 {
diff --git a/src/java/org/apache/cassandra/thrift/ThriftConversion.java b/src/java/org/apache/cassandra/thrift/ThriftConversion.java
index 3443b6e..35adddf 100644
--- a/src/java/org/apache/cassandra/thrift/ThriftConversion.java
+++ b/src/java/org/apache/cassandra/thrift/ThriftConversion.java
@@ -18,7 +18,6 @@
 package org.apache.cassandra.thrift;
 
 import java.util.*;
-import java.util.regex.Matcher;
 
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.base.Strings;
@@ -35,7 +34,7 @@
 import org.apache.cassandra.db.filter.RowFilter;
 import org.apache.cassandra.db.marshal.*;
 import org.apache.cassandra.exceptions.*;
-import org.apache.cassandra.index.internal.CassandraIndex;
+import org.apache.cassandra.index.TargetParser;
 import org.apache.cassandra.io.compress.ICompressor;
 import org.apache.cassandra.locator.AbstractReplicationStrategy;
 import org.apache.cassandra.locator.LocalStrategy;
@@ -591,7 +590,7 @@
         IndexMetadata matchedIndex = null;
         for (IndexMetadata index : cfMetaData.getIndexes())
         {
-            Pair<ColumnDefinition, IndexTarget.Type> target  = CassandraIndex.parseTarget(cfMetaData, index);
+            Pair<ColumnDefinition, IndexTarget.Type> target  = TargetParser.parse(cfMetaData, index);
             if (target.left.equals(column))
             {
                 // we already found an index for this column, we've no option but to
diff --git a/src/java/org/apache/cassandra/thrift/ThriftResultsMerger.java b/src/java/org/apache/cassandra/thrift/ThriftResultsMerger.java
index ea3fa2f..a14409e 100644
--- a/src/java/org/apache/cassandra/thrift/ThriftResultsMerger.java
+++ b/src/java/org/apache/cassandra/thrift/ThriftResultsMerger.java
@@ -199,7 +199,7 @@
             Cell cell = staticCells.next();
 
             // Given a static cell, the equivalent row uses the column name as clustering and the value as unique cell value.
-            builder.newRow(new Clustering(cell.column().name.bytes));
+            builder.newRow(Clustering.make(cell.column().name.bytes));
             builder.addCell(new BufferCell(metadata().compactValueColumn(), cell.timestamp(), cell.ttl(), cell.localDeletionTime(), cell.value(), cell.path()));
             nextToMerge = builder.build();
         }
diff --git a/src/java/org/apache/cassandra/thrift/ThriftValidation.java b/src/java/org/apache/cassandra/thrift/ThriftValidation.java
index 5e46459..be3e489 100644
--- a/src/java/org/apache/cassandra/thrift/ThriftValidation.java
+++ b/src/java/org/apache/cassandra/thrift/ThriftValidation.java
@@ -363,8 +363,8 @@
     {
         if (column.isSetTtl())
         {
-            if (column.ttl <= 0)
-                throw new org.apache.cassandra.exceptions.InvalidRequestException("ttl must be positive");
+            if (column.ttl < 0)
+                throw new org.apache.cassandra.exceptions.InvalidRequestException("ttl must be greater than or equal to 0");
 
             if (column.ttl > Attributes.MAX_TTL)
                 throw new org.apache.cassandra.exceptions.InvalidRequestException(String.format("ttl is too large. requested (%d) maximum (%d)", column.ttl, Attributes.MAX_TTL));
diff --git a/src/java/org/apache/cassandra/tools/BulkLoadException.java b/src/java/org/apache/cassandra/tools/BulkLoadException.java
new file mode 100644
index 0000000..3c5c03d
--- /dev/null
+++ b/src/java/org/apache/cassandra/tools/BulkLoadException.java
@@ -0,0 +1,33 @@
+/*
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+package org.apache.cassandra.tools;
+
+public class BulkLoadException extends Exception
+{
+
+    private static final long serialVersionUID = 1L;
+
+    public BulkLoadException(Throwable cause)
+    {
+        super(cause);
+    }
+
+}
diff --git a/src/java/org/apache/cassandra/tools/BulkLoader.java b/src/java/org/apache/cassandra/tools/BulkLoader.java
index 2b2db15..c094d0a 100644
--- a/src/java/org/apache/cassandra/tools/BulkLoader.java
+++ b/src/java/org/apache/cassandra/tools/BulkLoader.java
@@ -17,64 +17,41 @@
  */
 package org.apache.cassandra.tools;
 
-import java.io.File;
 import java.io.IOException;
-import java.lang.reflect.Constructor;
-import java.lang.reflect.InvocationTargetException;
 import java.net.InetAddress;
-import java.net.MalformedURLException;
-import java.net.UnknownHostException;
-import java.util.*;
-
-import com.google.common.collect.HashMultimap;
-import com.google.common.collect.Multimap;
-import org.apache.commons.cli.*;
+import java.util.Set;
+import javax.net.ssl.SSLContext;
 
 import com.datastax.driver.core.AuthProvider;
 import com.datastax.driver.core.JdkSSLOptions;
-import com.datastax.driver.core.PlainTextAuthProvider;
 import com.datastax.driver.core.SSLOptions;
-import javax.net.ssl.SSLContext;
-import org.apache.cassandra.config.*;
-import org.apache.cassandra.exceptions.ConfigurationException;
+import com.google.common.collect.HashMultimap;
+import com.google.common.collect.Multimap;
+import org.apache.commons.cli.Option;
+import org.apache.commons.cli.Options;
+
+import org.apache.cassandra.config.Config;
+import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.config.EncryptionOptions;
 import org.apache.cassandra.io.sstable.SSTableLoader;
 import org.apache.cassandra.security.SSLFactory;
 import org.apache.cassandra.streaming.*;
+import org.apache.cassandra.utils.FBUtilities;
 import org.apache.cassandra.utils.JVMStabilityInspector;
 import org.apache.cassandra.utils.NativeSSTableLoaderClient;
 import org.apache.cassandra.utils.OutputHandler;
 
 public class BulkLoader
 {
-    private static final String TOOL_NAME = "sstableloader";
-    private static final String VERBOSE_OPTION  = "verbose";
-    private static final String HELP_OPTION  = "help";
-    private static final String NOPROGRESS_OPTION  = "no-progress";
-    private static final String IGNORE_NODES_OPTION  = "ignore";
-    private static final String INITIAL_HOST_ADDRESS_OPTION = "nodes";
-    private static final String NATIVE_PORT_OPTION = "port";
-    private static final String USER_OPTION = "username";
-    private static final String PASSWD_OPTION = "password";
-    private static final String AUTH_PROVIDER_OPTION = "auth-provider";
-    private static final String THROTTLE_MBITS = "throttle";
-    private static final String INTER_DC_THROTTLE_MBITS = "inter-dc-throttle";
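+    // Command-line entry point: parse the arguments into LoaderOptions and delegate to load().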
+    public static void main(String[] args) throws BulkLoadException
+    {
+        LoaderOptions options = LoaderOptions.builder().parseArgs(args).build();
+        load(options);
+    }
 
-    /* client encryption options */
-    private static final String SSL_TRUSTSTORE = "truststore";
-    private static final String SSL_TRUSTSTORE_PW = "truststore-password";
-    private static final String SSL_KEYSTORE = "keystore";
-    private static final String SSL_KEYSTORE_PW = "keystore-password";
-    private static final String SSL_PROTOCOL = "ssl-protocol";
-    private static final String SSL_ALGORITHM = "ssl-alg";
-    private static final String SSL_STORE_TYPE = "store-type";
-    private static final String SSL_CIPHER_SUITES = "ssl-ciphers";
-    private static final String CONNECTIONS_PER_HOST = "connections-per-host";
-    private static final String CONFIG_PATH = "conf-path";
-
-    public static void main(String args[])
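+    // Performs the bulk load with pre-built options; failures surface as BulkLoadException rather than System.exit so callers can handle them.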
+    public static void load(LoaderOptions options) throws BulkLoadException
     {
         Config.setClientMode(true);
-        LoaderOptions options = LoaderOptions.parseArgs(args).validateArguments();
         OutputHandler handler = new OutputHandler.SystemOutput(options.verbose, options.debug);
         SSTableLoader loader = new SSTableLoader(
                 options.directory,
@@ -86,8 +63,8 @@
                         options.sslStoragePort,
                         options.serverEncOptions,
                         buildSSLOptions(options.clientEncOptions)),
-                handler,
-                options.connectionsPerHost);
+                        handler,
+                        options.connectionsPerHost);
         DatabaseDescriptor.setStreamThroughputOutboundMegabitsPerSec(options.throttle);
         DatabaseDescriptor.setInterDCStreamThroughputOutboundMegabitsPerSec(options.interDcThrottle);
         StreamResultFuture future = null;
@@ -110,9 +87,11 @@
             JVMStabilityInspector.inspectThrowable(e);
             System.err.println(e.getMessage());
             if (e.getCause() != null)
+            {
                 System.err.println(e.getCause());
+            }
             e.printStackTrace(System.err);
-            System.exit(1);
+            throw new BulkLoadException(e);
         }
 
         try
@@ -120,18 +99,20 @@
             future.get();
 
             if (!options.noProgress)
+            {
                 indicator.printSummary(options.connectionsPerHost);
+            }
 
             // Give sockets time to gracefully close
             Thread.sleep(1000);
-            System.exit(0); // We need that to stop non daemonized threads
+            // System.exit(0); // Removed: load() can now be invoked programmatically, so the caller is responsible for stopping non-daemon threads
         }
         catch (Exception e)
         {
             System.err.println("Streaming to the following hosts failed:");
             System.err.println(loader.getFailedHosts());
             e.printStackTrace(System.err);
-            System.exit(1);
+            throw new BulkLoadException(e);
         }
     }
 
@@ -142,7 +123,7 @@
         private long lastProgress;
         private long lastTime;
 
-        private int peak = 0;
+        private long peak = 0;
         private int totalFiles = 0;
 
         private final Multimap<InetAddress, SessionInfo> sessionsByHost = HashMultimap.create();
@@ -196,14 +177,16 @@
                         long current = 0;
                         int completed = 0;
 
-                        if (progressInfo != null && session.peer.equals(progressInfo.peer) && (session.sessionIndex == progressInfo.sessionIndex))
+                        if (progressInfo != null && session.peer.equals(progressInfo.peer) && session.sessionIndex == progressInfo.sessionIndex)
                         {
                             session.updateProgress(progressInfo);
                         }
                         for (ProgressInfo progress : session.getSendingFiles())
                         {
                             if (progress.isCompleted())
+                            {
                                 completed++;
+                            }
                             current += progress.currentBytes;
                         }
                         totalProgress += current;
@@ -215,7 +198,9 @@
                         sb.append(" ").append(String.format("%-3d", size == 0 ? 100L : current * 100L / size)).append("% ");
 
                         if (updateTotalFiles)
+                        {
                             totalFiles += session.getTotalFilesToSend();
+                        }
                     }
                 }
 
@@ -224,35 +209,37 @@
                 lastProgress = totalProgress;
 
                 sb.append("total: ").append(totalSize == 0 ? 100L : totalProgress * 100L / totalSize).append("% ");
-                sb.append(String.format("%-3d", mbPerSec(deltaProgress, deltaTime))).append("MB/s");
-                int average = mbPerSec(totalProgress, (time - start));
-                if (average > peak)
-                    peak = average;
-                sb.append("(avg: ").append(average).append(" MB/s)");
+                sb.append(FBUtilities.prettyPrintMemoryPerSecond(deltaProgress, deltaTime));
+                long average = bytesPerSecond(totalProgress, time - start);
 
-                System.out.print(sb.toString());
+                if (average > peak)
+                {
+                    peak = average;
+                }
+                sb.append(" (avg: ").append(FBUtilities.prettyPrintMemoryPerSecond(totalProgress, time - start)).append(")");
+
+                System.out.println(sb.toString());
             }
         }
 
-        private int mbPerSec(long bytes, long timeInNano)
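+        // Convert bytes transferred over a nanosecond interval into a bytes-per-second rate, guarding against division by zero.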
+        private long bytesPerSecond(long bytes, long timeInNano)
         {
-            double bytesPerNano = ((double)bytes) / timeInNano;
-            return (int)((bytesPerNano * 1000 * 1000 * 1000) / (1024 * 1024));
+            return timeInNano != 0 ? (long) (((double) bytes / timeInNano) * 1000 * 1000 * 1000) : 0;
         }
 
         private void printSummary(int connectionsPerHost)
         {
             long end = System.nanoTime();
             long durationMS = ((end - start) / (1000000));
-            int average = mbPerSec(lastProgress, (end - start));
+
             StringBuilder sb = new StringBuilder();
             sb.append("\nSummary statistics: \n");
-            sb.append(String.format("   %-30s: %-10d%n", "Connections per host: ", connectionsPerHost));
-            sb.append(String.format("   %-30s: %-10d%n", "Total files transferred: ", totalFiles));
-            sb.append(String.format("   %-30s: %-10d%n", "Total bytes transferred: ", lastProgress));
-            sb.append(String.format("   %-30s: %-10d%n", "Total duration (ms): ", durationMS));
-            sb.append(String.format("   %-30s: %-10d%n", "Average transfer rate (MB/s): ", + average));
-            sb.append(String.format("   %-30s: %-10d%n", "Peak transfer rate (MB/s): ", + peak));
+            sb.append(String.format("   %-24s: %-10d%n", "Connections per host ", connectionsPerHost));
+            sb.append(String.format("   %-24s: %-10d%n", "Total files transferred ", totalFiles));
+            sb.append(String.format("   %-24s: %-10s%n", "Total bytes transferred ", FBUtilities.prettyPrintMemory(lastProgress)));
+            sb.append(String.format("   %-24s: %-10s%n", "Total duration ", durationMS + " ms"));
+            sb.append(String.format("   %-24s: %-10s%n", "Average transfer rate ", FBUtilities.prettyPrintMemoryPerSecond(lastProgress, end - start)));
+            sb.append(String.format("   %-24s: %-10s%n", "Peak transfer rate ",  FBUtilities.prettyPrintMemoryPerSecond(peak)));
             System.out.println(sb.toString());
         }
     }
@@ -261,7 +248,9 @@
     {
 
         if (!clientEncryptionOptions.enabled)
+        {
             return null;
+        }
 
         SSLContext sslContext;
         try
@@ -296,7 +285,7 @@
             super(hosts, port, authProvider, sslOptions);
             this.storagePort = storagePort;
             this.sslStoragePort = sslStoragePort;
-            this.serverEncOptions = serverEncryptionOptions;
+            serverEncOptions = serverEncryptionOptions;
         }
 
         @Override
@@ -306,333 +295,6 @@
         }
     }
 
-    static class LoaderOptions
-    {
-        public final File directory;
-
-        public boolean debug;
-        public boolean verbose;
-        public boolean noProgress;
-        public int nativePort = 9042;
-        public String user;
-        public String passwd;
-        public String authProviderName;
-        public AuthProvider authProvider;
-        public int throttle = 0;
-        public int interDcThrottle = 0;
-        public int storagePort;
-        public int sslStoragePort;
-        public EncryptionOptions.ClientEncryptionOptions clientEncOptions = new EncryptionOptions.ClientEncryptionOptions();
-        public int connectionsPerHost = 1;
-        public EncryptionOptions.ServerEncryptionOptions serverEncOptions = new EncryptionOptions.ServerEncryptionOptions();
-
-        public final Set<InetAddress> hosts = new HashSet<>();
-        public final Set<InetAddress> ignores = new HashSet<>();
-
-        LoaderOptions(File directory)
-        {
-            this.directory = directory;
-        }
-
-        public static LoaderOptions parseArgs(String cmdArgs[])
-        {
-            CommandLineParser parser = new GnuParser();
-            CmdLineOptions options = getCmdLineOptions();
-            try
-            {
-                CommandLine cmd = parser.parse(options, cmdArgs, false);
-
-                if (cmd.hasOption(HELP_OPTION))
-                {
-                    printUsage(options);
-                    System.exit(0);
-                }
-
-                String[] args = cmd.getArgs();
-                if (args.length == 0)
-                {
-                    System.err.println("Missing sstable directory argument");
-                    printUsage(options);
-                    System.exit(1);
-                }
-
-                if (args.length > 1)
-                {
-                    System.err.println("Too many arguments");
-                    printUsage(options);
-                    System.exit(1);
-                }
-
-                String dirname = args[0];
-                File dir = new File(dirname);
-
-                if (!dir.exists())
-                    errorMsg("Unknown directory: " + dirname, options);
-
-                if (!dir.isDirectory())
-                    errorMsg(dirname + " is not a directory", options);
-
-                LoaderOptions opts = new LoaderOptions(dir);
-
-                opts.verbose = cmd.hasOption(VERBOSE_OPTION);
-                opts.noProgress = cmd.hasOption(NOPROGRESS_OPTION);
-
-                if (cmd.hasOption(NATIVE_PORT_OPTION))
-                    opts.nativePort = Integer.parseInt(cmd.getOptionValue(NATIVE_PORT_OPTION));
-
-                if (cmd.hasOption(USER_OPTION))
-                    opts.user = cmd.getOptionValue(USER_OPTION);
-
-                if (cmd.hasOption(PASSWD_OPTION))
-                    opts.passwd = cmd.getOptionValue(PASSWD_OPTION);
-
-                if (cmd.hasOption(AUTH_PROVIDER_OPTION))
-                    opts.authProviderName = cmd.getOptionValue(AUTH_PROVIDER_OPTION);
-
-                if (cmd.hasOption(INITIAL_HOST_ADDRESS_OPTION))
-                {
-                    String[] nodes = cmd.getOptionValue(INITIAL_HOST_ADDRESS_OPTION).split(",");
-                    try
-                    {
-                        for (String node : nodes)
-                        {
-                            opts.hosts.add(InetAddress.getByName(node.trim()));
-                        }
-                    }
-                    catch (UnknownHostException e)
-                    {
-                        errorMsg("Unknown host: " + e.getMessage(), options);
-                    }
-
-                }
-                else
-                {
-                    System.err.println("Initial hosts must be specified (-d)");
-                    printUsage(options);
-                    System.exit(1);
-                }
-
-                if (cmd.hasOption(IGNORE_NODES_OPTION))
-                {
-                    String[] nodes = cmd.getOptionValue(IGNORE_NODES_OPTION).split(",");
-                    try
-                    {
-                        for (String node : nodes)
-                        {
-                            opts.ignores.add(InetAddress.getByName(node.trim()));
-                        }
-                    }
-                    catch (UnknownHostException e)
-                    {
-                        errorMsg("Unknown host: " + e.getMessage(), options);
-                    }
-                }
-
-                if (cmd.hasOption(CONNECTIONS_PER_HOST))
-                    opts.connectionsPerHost = Integer.parseInt(cmd.getOptionValue(CONNECTIONS_PER_HOST));
-
-                // try to load config file first, so that values can be rewritten with other option values.
-                // otherwise use default config.
-                Config config;
-                if (cmd.hasOption(CONFIG_PATH))
-                {
-                    File configFile = new File(cmd.getOptionValue(CONFIG_PATH));
-                    if (!configFile.exists())
-                    {
-                        errorMsg("Config file not found", options);
-                    }
-                    config = new YamlConfigurationLoader().loadConfig(configFile.toURI().toURL());
-                }
-                else
-                {
-                    config = new Config();
-                    // unthrottle stream by default
-                    config.stream_throughput_outbound_megabits_per_sec = 0;
-                    config.inter_dc_stream_throughput_outbound_megabits_per_sec = 0;
-                }
-                opts.storagePort = config.storage_port;
-                opts.sslStoragePort = config.ssl_storage_port;
-                opts.throttle = config.stream_throughput_outbound_megabits_per_sec;
-                opts.interDcThrottle = config.inter_dc_stream_throughput_outbound_megabits_per_sec;
-                opts.clientEncOptions = config.client_encryption_options;
-                opts.serverEncOptions = config.server_encryption_options;
-
-                if (cmd.hasOption(THROTTLE_MBITS))
-                {
-                    opts.throttle = Integer.parseInt(cmd.getOptionValue(THROTTLE_MBITS));
-                }
-
-                if (cmd.hasOption(INTER_DC_THROTTLE_MBITS))
-                {
-                    opts.interDcThrottle = Integer.parseInt(cmd.getOptionValue(INTER_DC_THROTTLE_MBITS));
-                }
-
-                if (cmd.hasOption(SSL_TRUSTSTORE) || cmd.hasOption(SSL_TRUSTSTORE_PW) ||
-                    cmd.hasOption(SSL_KEYSTORE) || cmd.hasOption(SSL_KEYSTORE_PW))
-                {
-                    opts.clientEncOptions.enabled = true;
-                }
-
-                if (cmd.hasOption(SSL_TRUSTSTORE))
-                {
-                    opts.clientEncOptions.truststore = cmd.getOptionValue(SSL_TRUSTSTORE);
-                }
-
-                if (cmd.hasOption(SSL_TRUSTSTORE_PW))
-                {
-                    opts.clientEncOptions.truststore_password = cmd.getOptionValue(SSL_TRUSTSTORE_PW);
-                }
-
-                if (cmd.hasOption(SSL_KEYSTORE))
-                {
-                    opts.clientEncOptions.keystore = cmd.getOptionValue(SSL_KEYSTORE);
-                    // if a keystore was provided, lets assume we'll need to use it
-                    opts.clientEncOptions.require_client_auth = true;
-                }
-
-                if (cmd.hasOption(SSL_KEYSTORE_PW))
-                {
-                    opts.clientEncOptions.keystore_password = cmd.getOptionValue(SSL_KEYSTORE_PW);
-                }
-
-                if (cmd.hasOption(SSL_PROTOCOL))
-                {
-                    opts.clientEncOptions.protocol = cmd.getOptionValue(SSL_PROTOCOL);
-                }
-
-                if (cmd.hasOption(SSL_ALGORITHM))
-                {
-                    opts.clientEncOptions.algorithm = cmd.getOptionValue(SSL_ALGORITHM);
-                }
-
-                if (cmd.hasOption(SSL_STORE_TYPE))
-                {
-                    opts.clientEncOptions.store_type = cmd.getOptionValue(SSL_STORE_TYPE);
-                }
-
-                if (cmd.hasOption(SSL_CIPHER_SUITES))
-                {
-                    opts.clientEncOptions.cipher_suites = cmd.getOptionValue(SSL_CIPHER_SUITES).split(",");
-                }
-
-                return opts;
-            }
-            catch (ParseException | ConfigurationException | MalformedURLException e)
-            {
-                errorMsg(e.getMessage(), options);
-                return null;
-            }
-        }
-
-        public LoaderOptions validateArguments()
-        {
-            // Both username and password need to be provided
-            if ((user != null) != (passwd != null))
-                errorMsg("Username and password must both be provided", getCmdLineOptions());
-
-            if (user != null)
-            {
-                // Support for 3rd party auth providers that support plain text credentials.
-                // In this case the auth provider must provide a constructor of the form:
-                //
-                // public MyAuthProvider(String username, String password)
-                if (authProviderName != null)
-                {
-                    try
-                    {
-                        Class authProviderClass = Class.forName(authProviderName);
-                        Constructor constructor = authProviderClass.getConstructor(String.class, String.class);
-                        authProvider = (AuthProvider)constructor.newInstance(user, passwd);
-                    }
-                    catch (ClassNotFoundException e)
-                    {
-                        errorMsg("Unknown auth provider: " + e.getMessage(), getCmdLineOptions());
-                    }
-                    catch (NoSuchMethodException e)
-                    {
-                        errorMsg("Auth provider does not support plain text credentials: " + e.getMessage(), getCmdLineOptions());
-                    }
-                    catch (InstantiationException | IllegalAccessException | IllegalArgumentException | InvocationTargetException e)
-                    {
-                        errorMsg("Could not create auth provider with plain text credentials: " + e.getMessage(), getCmdLineOptions());
-                    }
-                }
-                else
-                {
-                    // If a 3rd party auth provider wasn't provided use the driver plain text provider
-                    authProvider = new PlainTextAuthProvider(user, passwd);
-                }
-            }
-            // Alternate support for 3rd party auth providers that don't use plain text credentials.
-            // In this case the auth provider must provide a nullary constructor of the form:
-            //
-            // public MyAuthProvider()
-            else if (authProviderName != null)
-            {
-                try
-                {
-                    authProvider = (AuthProvider)Class.forName(authProviderName).newInstance();
-                }
-                catch (ClassNotFoundException | InstantiationException | IllegalAccessException e)
-                {
-                    errorMsg("Unknown auth provider" + e.getMessage(), getCmdLineOptions());
-                }
-            }
-
-            return this;
-        }
-
-        private static void errorMsg(String msg, CmdLineOptions options)
-        {
-            System.err.println(msg);
-            printUsage(options);
-            System.exit(1);
-        }
-
-        private static CmdLineOptions getCmdLineOptions()
-        {
-            CmdLineOptions options = new CmdLineOptions();
-            options.addOption("v",  VERBOSE_OPTION,      "verbose output");
-            options.addOption("h",  HELP_OPTION,         "display this help message");
-            options.addOption(null, NOPROGRESS_OPTION,   "don't display progress");
-            options.addOption("i",  IGNORE_NODES_OPTION, "NODES", "don't stream to this (comma separated) list of nodes");
-            options.addOption("d",  INITIAL_HOST_ADDRESS_OPTION, "initial hosts", "Required. try to connect to these hosts (comma separated) initially for ring information");
-            options.addOption("p",  NATIVE_PORT_OPTION, "rpc port", "port used for native connection (default 9042)");
-            options.addOption("t",  THROTTLE_MBITS, "throttle", "throttle speed in Mbits (default unlimited)");
-            options.addOption("idct",  INTER_DC_THROTTLE_MBITS, "inter-dc-throttle", "inter-datacenter throttle speed in Mbits (default unlimited)");
-            options.addOption("u",  USER_OPTION, "username", "username for cassandra authentication");
-            options.addOption("pw", PASSWD_OPTION, "password", "password for cassandra authentication");
-            options.addOption("ap", AUTH_PROVIDER_OPTION, "auth provider", "custom AuthProvider class name for cassandra authentication");
-            options.addOption("cph", CONNECTIONS_PER_HOST, "connectionsPerHost", "number of concurrent connections-per-host.");
-            // ssl connection-related options
-            options.addOption("ts", SSL_TRUSTSTORE, "TRUSTSTORE", "Client SSL: full path to truststore");
-            options.addOption("tspw", SSL_TRUSTSTORE_PW, "TRUSTSTORE-PASSWORD", "Client SSL: password of the truststore");
-            options.addOption("ks", SSL_KEYSTORE, "KEYSTORE", "Client SSL: full path to keystore");
-            options.addOption("kspw", SSL_KEYSTORE_PW, "KEYSTORE-PASSWORD", "Client SSL: password of the keystore");
-            options.addOption("prtcl", SSL_PROTOCOL, "PROTOCOL", "Client SSL: connections protocol to use (default: TLS)");
-            options.addOption("alg", SSL_ALGORITHM, "ALGORITHM", "Client SSL: algorithm (default: SunX509)");
-            options.addOption("st", SSL_STORE_TYPE, "STORE-TYPE", "Client SSL: type of store");
-            options.addOption("ciphers", SSL_CIPHER_SUITES, "CIPHER-SUITES", "Client SSL: comma-separated list of encryption suites to use");
-            options.addOption("f", CONFIG_PATH, "path to config file", "cassandra.yaml file path for streaming throughput and client/server SSL.");
-            return options;
-        }
-
-        public static void printUsage(Options options)
-        {
-            String usage = String.format("%s [options] <dir_path>", TOOL_NAME);
-            String header = System.lineSeparator() +
-                            "Bulk load the sstables found in the directory <dir_path> to the configured cluster." +
-                            "The parent directories of <dir_path> are used as the target keyspace/table name. " +
-                            "So for instance, to load an sstable named Standard1-g-1-Data.db into Keyspace1/Standard1, " +
-                            "you will need to have the files Standard1-g-1-Data.db and Standard1-g-1-Index.db into a directory /path/to/Keyspace1/Standard1/.";
-            String footer = System.lineSeparator() +
-                            "You can provide cassandra.yaml file with -f command line option to set up streaming throughput, client and server encryption options. " +
-                            "Only stream_throughput_outbound_megabits_per_sec, inter_dc_stream_throughput_outbound_megabits_per_sec, server_encryption_options and client_encryption_options are read from yaml. " +
-                            "You can override options read from cassandra.yaml with corresponding command line options.";
-            new HelpFormatter().printHelp(usage, header, options, footer);
-        }
-    }
-
     public static class CmdLineOptions extends Options
     {
         /**
diff --git a/src/java/org/apache/cassandra/tools/JsonTransformer.java b/src/java/org/apache/cassandra/tools/JsonTransformer.java
index 3deed96..3b98595 100644
--- a/src/java/org/apache/cassandra/tools/JsonTransformer.java
+++ b/src/java/org/apache/cassandra/tools/JsonTransformer.java
@@ -24,6 +24,7 @@
 import java.io.OutputStream;
 import java.io.OutputStreamWriter;
 import java.nio.ByteBuffer;
+import java.nio.charset.StandardCharsets;
 import java.time.Instant;
 import java.util.List;
 import java.util.concurrent.TimeUnit;
@@ -31,11 +32,7 @@
 
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.ColumnDefinition;
-import org.apache.cassandra.db.ClusteringPrefix;
-import org.apache.cassandra.db.DecoratedKey;
-import org.apache.cassandra.db.DeletionTime;
-import org.apache.cassandra.db.LivenessInfo;
-import org.apache.cassandra.db.RangeTombstone;
+import org.apache.cassandra.db.*;
 import org.apache.cassandra.db.marshal.AbstractType;
 import org.apache.cassandra.db.marshal.CollectionType;
 import org.apache.cassandra.db.marshal.CompositeType;
@@ -95,7 +92,7 @@
     public static void toJson(ISSTableScanner currentScanner, Stream<UnfilteredRowIterator> partitions, boolean rawTime, CFMetaData metadata, OutputStream out)
             throws IOException
     {
-        try (JsonGenerator json = jsonFactory.createJsonGenerator(new OutputStreamWriter(out, "UTF-8")))
+        try (JsonGenerator json = jsonFactory.createJsonGenerator(new OutputStreamWriter(out, StandardCharsets.UTF_8)))
         {
             JsonTransformer transformer = new JsonTransformer(json, currentScanner, rawTime, metadata);
             json.writeStartArray();
@@ -106,7 +103,7 @@
 
     public static void keysToJson(ISSTableScanner currentScanner, Stream<DecoratedKey> keys, boolean rawTime, CFMetaData metadata, OutputStream out) throws IOException
     {
-        try (JsonGenerator json = jsonFactory.createJsonGenerator(new OutputStreamWriter(out, "UTF-8")))
+        try (JsonGenerator json = jsonFactory.createJsonGenerator(new OutputStreamWriter(out, StandardCharsets.UTF_8)))
         {
             JsonTransformer transformer = new JsonTransformer(json, currentScanner, rawTime, metadata);
             json.writeStartArray();
@@ -317,7 +314,7 @@
         }
     }
 
-    private void serializeBound(RangeTombstone.Bound bound, DeletionTime deletionTime) throws IOException
+    private void serializeBound(ClusteringBound bound, DeletionTime deletionTime) throws IOException
     {
         json.writeFieldName(bound.isStart() ? "start" : "end");
         json.writeStartObject();
@@ -384,7 +381,6 @@
                     objectIndenter.setCompact(true);
                     json.writeStartObject();
                     json.writeFieldName("name");
-                    AbstractType<?> type = cd.column().type;
                     json.writeString(cd.column().name.toCQLString());
                     serializeDeletion(complexData.complexDeletion());
                     objectIndenter.setCompact(true);
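For context, the switch above from the "UTF-8" string to StandardCharsets.UTF_8 trades a runtime charset lookup (and the checked UnsupportedEncodingException) for a compile-time constant. A small standalone sketch of the difference, not part of this patch:

import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public class CharsetOverloadExample
{
    public static void main(String[] args) throws Exception
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();

        // String overload: charset resolved by name at runtime, caller must
        // handle the checked UnsupportedEncodingException (an IOException).
        try (Writer byName = new OutputStreamWriter(bytes, "UTF-8"))
        {
            byName.write("by name");
        }

        // Charset overload: resolved at compile time, no extra checked exception.
        try (Writer byConstant = new OutputStreamWriter(bytes, StandardCharsets.UTF_8))
        {
            byConstant.write(" / by constant");
        }

        System.out.println(bytes.toString(StandardCharsets.UTF_8.name()));
    }
}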
diff --git a/src/java/org/apache/cassandra/tools/LoaderOptions.java b/src/java/org/apache/cassandra/tools/LoaderOptions.java
new file mode 100644
index 0000000..28d7bce
--- /dev/null
+++ b/src/java/org/apache/cassandra/tools/LoaderOptions.java
@@ -0,0 +1,563 @@
+/*
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+package org.apache.cassandra.tools;
+
+import java.io.File;
+import java.lang.reflect.Constructor;
+import java.lang.reflect.InvocationTargetException;
+import java.net.*;
+import java.util.HashSet;
+import java.util.Set;
+
+import org.apache.cassandra.config.*;
+import org.apache.cassandra.exceptions.ConfigurationException;
+import org.apache.cassandra.tools.BulkLoader.CmdLineOptions;
+
+import com.datastax.driver.core.AuthProvider;
+import com.datastax.driver.core.PlainTextAuthProvider;
+import org.apache.commons.cli.*;
+
+public class LoaderOptions
+{
+
+    public static final String HELP_OPTION = "help";
+    public static final String VERBOSE_OPTION = "verbose";
+    public static final String NOPROGRESS_OPTION = "no-progress";
+    public static final String NATIVE_PORT_OPTION = "port";
+    public static final String USER_OPTION = "username";
+    public static final String PASSWD_OPTION = "password";
+    public static final String AUTH_PROVIDER_OPTION = "auth-provider";
+    public static final String INITIAL_HOST_ADDRESS_OPTION = "nodes";
+    public static final String IGNORE_NODES_OPTION = "ignore";
+    public static final String CONNECTIONS_PER_HOST = "connections-per-host";
+    public static final String CONFIG_PATH = "conf-path";
+    public static final String THROTTLE_MBITS = "throttle";
+    public static final String INTER_DC_THROTTLE_MBITS = "inter-dc-throttle";
+    public static final String TOOL_NAME = "sstableloader";
+
+    /* client encryption options */
+    public static final String SSL_TRUSTSTORE = "truststore";
+    public static final String SSL_TRUSTSTORE_PW = "truststore-password";
+    public static final String SSL_KEYSTORE = "keystore";
+    public static final String SSL_KEYSTORE_PW = "keystore-password";
+    public static final String SSL_PROTOCOL = "ssl-protocol";
+    public static final String SSL_ALGORITHM = "ssl-alg";
+    public static final String SSL_STORE_TYPE = "store-type";
+    public static final String SSL_CIPHER_SUITES = "ssl-ciphers";
+
+    public final File directory;
+    public final boolean debug;
+    public final boolean verbose;
+    public final boolean noProgress;
+    public final int nativePort;
+    public final String user;
+    public final String passwd;
+    public final AuthProvider authProvider;
+    public final int throttle;
+    public final int interDcThrottle;
+    public final int storagePort;
+    public final int sslStoragePort;
+    public final EncryptionOptions.ClientEncryptionOptions clientEncOptions;
+    public final int connectionsPerHost;
+    public final EncryptionOptions.ServerEncryptionOptions serverEncOptions;
+    public final Set<InetAddress> hosts;
+    public final Set<InetAddress> ignores = new HashSet<>();
+
+    LoaderOptions(Builder builder)
+    {
+        directory = builder.directory;
+        debug = builder.debug;
+        verbose = builder.verbose;
+        noProgress = builder.noProgress;
+        nativePort = builder.nativePort;
+        user = builder.user;
+        passwd = builder.passwd;
+        authProvider = builder.authProvider;
+        throttle = builder.throttle;
+        interDcThrottle = builder.interDcThrottle;
+        storagePort = builder.storagePort;
+        sslStoragePort = builder.sslStoragePort;
+        clientEncOptions = builder.clientEncOptions;
+        connectionsPerHost = builder.connectionsPerHost;
+        serverEncOptions = builder.serverEncOptions;
+        hosts = builder.hosts;
+    }
+
+    static class Builder
+    {
+        File directory;
+        boolean debug;
+        boolean verbose;
+        boolean noProgress;
+        int nativePort = 9042;
+        String user;
+        String passwd;
+        String authProviderName;
+        AuthProvider authProvider;
+        int throttle = 0;
+        int interDcThrottle = 0;
+        int storagePort;
+        int sslStoragePort;
+        EncryptionOptions.ClientEncryptionOptions clientEncOptions = new EncryptionOptions.ClientEncryptionOptions();
+        int connectionsPerHost = 1;
+        EncryptionOptions.ServerEncryptionOptions serverEncOptions = new EncryptionOptions.ServerEncryptionOptions();
+        Set<InetAddress> hosts = new HashSet<>();
+        Set<InetAddress> ignores = new HashSet<>();
+
+        Builder()
+        {
+            //
+        }
+
+        public LoaderOptions build()
+        {
+            constructAuthProvider();
+            return new LoaderOptions(this);
+        }
+
+        public Builder directory(File directory)
+        {
+            this.directory = directory;
+            return this;
+        }
+
+        public Builder debug(boolean debug)
+        {
+            this.debug = debug;
+            return this;
+        }
+
+        public Builder verbose(boolean verbose)
+        {
+            this.verbose = verbose;
+            return this;
+        }
+
+        public Builder noProgress(boolean noProgress)
+        {
+            this.noProgress = noProgress;
+            return this;
+        }
+
+        public Builder nativePort(int nativePort)
+        {
+            this.nativePort = nativePort;
+            return this;
+        }
+
+        public Builder user(String user)
+        {
+            this.user = user;
+            return this;
+        }
+
+        public Builder password(String passwd)
+        {
+            this.passwd = passwd;
+            return this;
+        }
+
+        public Builder authProvider(AuthProvider authProvider)
+        {
+            this.authProvider = authProvider;
+            return this;
+        }
+
+        public Builder throttle(int throttle)
+        {
+            this.throttle = throttle;
+            return this;
+        }
+
+        public Builder interDcThrottle(int interDcThrottle)
+        {
+            this.interDcThrottle = interDcThrottle;
+            return this;
+        }
+
+        public Builder storagePort(int storagePort)
+        {
+            this.storagePort = storagePort;
+            return this;
+        }
+
+        public Builder sslStoragePort(int sslStoragePort)
+        {
+            this.sslStoragePort = sslStoragePort;
+            return this;
+        }
+
+        public Builder encOptions(EncryptionOptions.ClientEncryptionOptions encOptions)
+        {
+            this.clientEncOptions = encOptions;
+            return this;
+        }
+
+        public Builder connectionsPerHost(int connectionsPerHost)
+        {
+            this.connectionsPerHost = connectionsPerHost;
+            return this;
+        }
+
+        public Builder serverEncOptions(EncryptionOptions.ServerEncryptionOptions serverEncOptions)
+        {
+            this.serverEncOptions = serverEncOptions;
+            return this;
+        }
+
+        public Builder hosts(Set<InetAddress> hosts)
+        {
+            this.hosts = hosts;
+            return this;
+        }
+
+        public Builder host(InetAddress host)
+        {
+            hosts.add(host);
+            return this;
+        }
+
+        public Builder ignore(Set<InetAddress> ignores)
+        {
+            this.ignores = ignores;
+            return this;
+        }
+
+        public Builder ignore(InetAddress ignore)
+        {
+            ignores.add(ignore);
+            return this;
+        }
+
+        public Builder parseArgs(String cmdArgs[])
+        {
+            CommandLineParser parser = new GnuParser();
+            CmdLineOptions options = getCmdLineOptions();
+            try
+            {
+                CommandLine cmd = parser.parse(options, cmdArgs, false);
+
+                if (cmd.hasOption(HELP_OPTION))
+                {
+                    printUsage(options);
+                    System.exit(0);
+                }
+
+                String[] args = cmd.getArgs();
+                if (args.length == 0)
+                {
+                    System.err.println("Missing sstable directory argument");
+                    printUsage(options);
+                    System.exit(1);
+                }
+
+                if (args.length > 1)
+                {
+                    System.err.println("Too many arguments");
+                    printUsage(options);
+                    System.exit(1);
+                }
+
+                String dirname = args[0];
+                File dir = new File(dirname);
+
+                if (!dir.exists())
+                {
+                    errorMsg("Unknown directory: " + dirname, options);
+                }
+
+                if (!dir.isDirectory())
+                {
+                    errorMsg(dirname + " is not a directory", options);
+                }
+
+                directory = dir;
+
+                verbose = cmd.hasOption(VERBOSE_OPTION);
+                noProgress = cmd.hasOption(NOPROGRESS_OPTION);
+
+                if (cmd.hasOption(NATIVE_PORT_OPTION))
+                {
+                    nativePort = Integer.parseInt(cmd.getOptionValue(NATIVE_PORT_OPTION));
+                }
+
+                if (cmd.hasOption(USER_OPTION))
+                {
+                    user = cmd.getOptionValue(USER_OPTION);
+                }
+
+                if (cmd.hasOption(PASSWD_OPTION))
+                {
+                    passwd = cmd.getOptionValue(PASSWD_OPTION);
+                }
+
+                if (cmd.hasOption(AUTH_PROVIDER_OPTION))
+                {
+                    authProviderName = cmd.getOptionValue(AUTH_PROVIDER_OPTION);
+                }
+
+                if (cmd.hasOption(INITIAL_HOST_ADDRESS_OPTION))
+                {
+                    String[] nodes = cmd.getOptionValue(INITIAL_HOST_ADDRESS_OPTION).split(",");
+                    try
+                    {
+                        for (String node : nodes)
+                        {
+                            hosts.add(InetAddress.getByName(node.trim()));
+                        }
+                    }
+                    catch (UnknownHostException e)
+                    {
+                        errorMsg("Unknown host: " + e.getMessage(), options);
+                    }
+
+                }
+                else
+                {
+                    System.err.println("Initial hosts must be specified (-d)");
+                    printUsage(options);
+                    System.exit(1);
+                }
+
+                if (cmd.hasOption(IGNORE_NODES_OPTION))
+                {
+                    String[] nodes = cmd.getOptionValue(IGNORE_NODES_OPTION).split(",");
+                    try
+                    {
+                        for (String node : nodes)
+                        {
+                            ignores.add(InetAddress.getByName(node.trim()));
+                        }
+                    }
+                    catch (UnknownHostException e)
+                    {
+                        errorMsg("Unknown host: " + e.getMessage(), options);
+                    }
+                }
+
+                if (cmd.hasOption(CONNECTIONS_PER_HOST))
+                {
+                    connectionsPerHost = Integer.parseInt(cmd.getOptionValue(CONNECTIONS_PER_HOST));
+                }
+
+                // try to load config file first, so that values can be
+                // rewritten with other option values.
+                // otherwise use default config.
+                Config config;
+                if (cmd.hasOption(CONFIG_PATH))
+                {
+                    File configFile = new File(cmd.getOptionValue(CONFIG_PATH));
+                    if (!configFile.exists())
+                    {
+                        errorMsg("Config file not found", options);
+                    }
+                    config = new YamlConfigurationLoader().loadConfig(configFile.toURI().toURL());
+                }
+                else
+                {
+                    config = new Config();
+                    // unthrottle stream by default
+                    config.stream_throughput_outbound_megabits_per_sec = 0;
+                    config.inter_dc_stream_throughput_outbound_megabits_per_sec = 0;
+                }
+                storagePort = config.storage_port;
+                sslStoragePort = config.ssl_storage_port;
+                throttle = config.stream_throughput_outbound_megabits_per_sec;
+                clientEncOptions = config.client_encryption_options;
+                serverEncOptions = config.server_encryption_options;
+
+                if (cmd.hasOption(THROTTLE_MBITS))
+                {
+                    throttle = Integer.parseInt(cmd.getOptionValue(THROTTLE_MBITS));
+                }
+
+                if (cmd.hasOption(INTER_DC_THROTTLE_MBITS))
+                {
+                    interDcThrottle = Integer.parseInt(cmd.getOptionValue(INTER_DC_THROTTLE_MBITS));
+                }
+
+                if (cmd.hasOption(SSL_TRUSTSTORE) || cmd.hasOption(SSL_TRUSTSTORE_PW) ||
+                            cmd.hasOption(SSL_KEYSTORE) || cmd.hasOption(SSL_KEYSTORE_PW))
+                {
+                    clientEncOptions.enabled = true;
+                }
+
+                if (cmd.hasOption(SSL_TRUSTSTORE))
+                {
+                    clientEncOptions.truststore = cmd.getOptionValue(SSL_TRUSTSTORE);
+                }
+
+                if (cmd.hasOption(SSL_TRUSTSTORE_PW))
+                {
+                    clientEncOptions.truststore_password = cmd.getOptionValue(SSL_TRUSTSTORE_PW);
+                }
+
+                if (cmd.hasOption(SSL_KEYSTORE))
+                {
+                    clientEncOptions.keystore = cmd.getOptionValue(SSL_KEYSTORE);
+                    // if a keystore was provided, let's assume we'll need to
+                    // use it
+                    clientEncOptions.require_client_auth = true;
+                }
+
+                if (cmd.hasOption(SSL_KEYSTORE_PW))
+                {
+                    clientEncOptions.keystore_password = cmd.getOptionValue(SSL_KEYSTORE_PW);
+                }
+
+                if (cmd.hasOption(SSL_PROTOCOL))
+                {
+                    clientEncOptions.protocol = cmd.getOptionValue(SSL_PROTOCOL);
+                }
+
+                if (cmd.hasOption(SSL_ALGORITHM))
+                {
+                    clientEncOptions.algorithm = cmd.getOptionValue(SSL_ALGORITHM);
+                }
+
+                if (cmd.hasOption(SSL_STORE_TYPE))
+                {
+                    clientEncOptions.store_type = cmd.getOptionValue(SSL_STORE_TYPE);
+                }
+
+                if (cmd.hasOption(SSL_CIPHER_SUITES))
+                {
+                    clientEncOptions.cipher_suites = cmd.getOptionValue(SSL_CIPHER_SUITES).split(",");
+                }
+
+                return this;
+            }
+            catch (ParseException | ConfigurationException | MalformedURLException e)
+            {
+                errorMsg(e.getMessage(), options);
+                return null;
+            }
+        }
+
+        private void constructAuthProvider()
+        {
+            // Both username and password need to be provided
+            if ((user != null) != (passwd != null))
+                errorMsg("Username and password must both be provided", getCmdLineOptions());
+
+            if (user != null)
+            {
+                // Support for 3rd party auth providers that support plain text credentials.
+                // In this case the auth provider must provide a constructor of the form:
+                //
+                // public MyAuthProvider(String username, String password)
+                if (authProviderName != null)
+                {
+                    try
+                    {
+                        Class authProviderClass = Class.forName(authProviderName);
+                        Constructor constructor = authProviderClass.getConstructor(String.class, String.class);
+                        authProvider = (AuthProvider)constructor.newInstance(user, passwd);
+                    }
+                    catch (ClassNotFoundException e)
+                    {
+                        errorMsg("Unknown auth provider: " + e.getMessage(), getCmdLineOptions());
+                    }
+                    catch (NoSuchMethodException e)
+                    {
+                        errorMsg("Auth provider does not support plain text credentials: " + e.getMessage(), getCmdLineOptions());
+                    }
+                    catch (InstantiationException | IllegalAccessException | IllegalArgumentException | InvocationTargetException e)
+                    {
+                        errorMsg("Could not create auth provider with plain text credentials: " + e.getMessage(), getCmdLineOptions());
+                    }
+                }
+                else
+                {
+                    // If a 3rd party auth provider wasn't provided use the driver plain text provider
+                    this.authProvider = new PlainTextAuthProvider(user, passwd);
+                }
+            }
+            // Alternate support for 3rd party auth providers that don't use plain text credentials.
+            // In this case the auth provider must provide a nullary constructor of the form:
+            //
+            // public MyAuthProvider()
+            else if (authProviderName != null)
+            {
+                try
+                {
+                    authProvider = (AuthProvider)Class.forName(authProviderName).newInstance();
+                }
+                catch (ClassNotFoundException | InstantiationException | IllegalAccessException e)
+                {
+                    errorMsg("Unknown auth provider: " + e.getMessage(), getCmdLineOptions());
+                }
+            }
+        }
+    }
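The reflective lookup in constructAuthProvider() above only requires that a third-party provider passed with -ap/--auth-provider expose a public (username, password) constructor, or a no-arg constructor when no credentials are given. A hypothetical sketch of such a provider, assuming the bundled driver's PlainTextAuthProvider can be extended; the package and class names below are made up:

package com.example.auth;                      // hypothetical package and class

import com.datastax.driver.core.PlainTextAuthProvider;

public class MyAuthProvider extends PlainTextAuthProvider
{
    // This is the constructor shape constructAuthProvider() looks up reflectively
    // when both a username and a password are supplied on the command line.
    public MyAuthProvider(String username, String password)
    {
        super(username, password);
    }
}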
+
+    public static Builder builder()
+    {
+        return new Builder();
+    }
+
+    private static void errorMsg(String msg, CmdLineOptions options)
+    {
+        System.err.println(msg);
+        printUsage(options);
+        System.exit(1);
+    }
+
+    private static CmdLineOptions getCmdLineOptions()
+    {
+        CmdLineOptions options = new CmdLineOptions();
+        options.addOption("v", VERBOSE_OPTION, "verbose output");
+        options.addOption("h", HELP_OPTION, "display this help message");
+        options.addOption(null, NOPROGRESS_OPTION, "don't display progress");
+        options.addOption("i", IGNORE_NODES_OPTION, "NODES", "don't stream to this (comma separated) list of nodes");
+        options.addOption("d", INITIAL_HOST_ADDRESS_OPTION, "initial hosts", "Required. Try to connect to these hosts (comma separated) initially for ring information");
+        options.addOption("p", NATIVE_PORT_OPTION, "native transport port", "port used for native connection (default 9042)");
+        options.addOption("t", THROTTLE_MBITS, "throttle", "throttle speed in Mbits (default unlimited)");
+        options.addOption("idct", INTER_DC_THROTTLE_MBITS, "inter-dc-throttle", "inter-datacenter throttle speed in Mbits (default unlimited)");
+        options.addOption("u", USER_OPTION, "username", "username for cassandra authentication");
+        options.addOption("pw", PASSWD_OPTION, "password", "password for cassandra authentication");
+        options.addOption("ap", AUTH_PROVIDER_OPTION, "auth provider", "custom AuthProvider class name for cassandra authentication");
+        options.addOption("cph", CONNECTIONS_PER_HOST, "connectionsPerHost", "number of concurrent connections-per-host.");
+        // ssl connection-related options
+        options.addOption("ts", SSL_TRUSTSTORE, "TRUSTSTORE", "Client SSL: full path to truststore");
+        options.addOption("tspw", SSL_TRUSTSTORE_PW, "TRUSTSTORE-PASSWORD", "Client SSL: password of the truststore");
+        options.addOption("ks", SSL_KEYSTORE, "KEYSTORE", "Client SSL: full path to keystore");
+        options.addOption("kspw", SSL_KEYSTORE_PW, "KEYSTORE-PASSWORD", "Client SSL: password of the keystore");
+        options.addOption("prtcl", SSL_PROTOCOL, "PROTOCOL", "Client SSL: connections protocol to use (default: TLS)");
+        options.addOption("alg", SSL_ALGORITHM, "ALGORITHM", "Client SSL: algorithm (default: SunX509)");
+        options.addOption("st", SSL_STORE_TYPE, "STORE-TYPE", "Client SSL: type of store");
+        options.addOption("ciphers", SSL_CIPHER_SUITES, "CIPHER-SUITES", "Client SSL: comma-separated list of encryption suites to use");
+        options.addOption("f", CONFIG_PATH, "path to config file", "cassandra.yaml file path for streaming throughput and client/server SSL.");
+        return options;
+    }
+
+    public static void printUsage(Options options)
+    {
+        String usage = String.format("%s [options] <dir_path>", TOOL_NAME);
+        String header = System.lineSeparator() +
+                "Bulk load the sstables found in the directory <dir_path> to the configured cluster. " +
+                "The parent directories of <dir_path> are used as the target keyspace/table name. " +
+                "So for instance, to load an sstable named Standard1-g-1-Data.db into Keyspace1/Standard1, " +
+                "you will need to have the files Standard1-g-1-Data.db and Standard1-g-1-Index.db in a directory /path/to/Keyspace1/Standard1/.";
+        String footer = System.lineSeparator() +
+                "You can provide a cassandra.yaml file with the -f command line option to set up streaming throughput and client/server encryption options. " +
+                "Only stream_throughput_outbound_megabits_per_sec, server_encryption_options and client_encryption_options are read from yaml. " +
+                "You can override options read from cassandra.yaml with corresponding command line options.";
+        new HelpFormatter().printHelp(usage, header, options, footer);
+    }
+}
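For reference, a minimal standalone sketch of driving the new fluent Builder directly instead of through parseArgs(); it lives in the same package because Builder is package-private, and the path and credentials are placeholders:

package org.apache.cassandra.tools;            // same package: Builder is package-private

import java.io.File;
import java.net.InetAddress;
import java.net.UnknownHostException;

public class LoaderOptionsExample
{
    public static void main(String[] args) throws UnknownHostException
    {
        LoaderOptions options = LoaderOptions.builder()
                                             .directory(new File("/path/to/Keyspace1/Standard1"))  // placeholder path
                                             .host(InetAddress.getByName("127.0.0.1"))
                                             .nativePort(9042)
                                             .user("cassandra")                                     // placeholder credentials
                                             .password("cassandra")
                                             .connectionsPerHost(2)
                                             .build();

        System.out.println("Would stream to " + options.hosts + " on native port " + options.nativePort);
    }
}

parseArgs() populates the same builder from the sstableloader command line, so both paths yield identical LoaderOptions instances.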
diff --git a/src/java/org/apache/cassandra/tools/NodeProbe.java b/src/java/org/apache/cassandra/tools/NodeProbe.java
index 80fab15..3bf99ef 100644
--- a/src/java/org/apache/cassandra/tools/NodeProbe.java
+++ b/src/java/org/apache/cassandra/tools/NodeProbe.java
@@ -85,11 +85,13 @@
 import org.apache.cassandra.streaming.management.StreamStateCompositeData;
 
 import com.google.common.base.Function;
+import com.google.common.base.Strings;
 import com.google.common.collect.Iterables;
 import com.google.common.collect.Maps;
 import com.google.common.collect.Multimap;
 import com.google.common.collect.Sets;
 import com.google.common.util.concurrent.Uninterruptibles;
+import org.apache.cassandra.tools.nodetool.GetTimeout;
 
 /**
  * JMX client operations for Cassandra.
@@ -224,7 +226,7 @@
                 mbeanServerConn, ManagementFactory.RUNTIME_MXBEAN_NAME, RuntimeMXBean.class);
     }
 
-    private RMIClientSocketFactory getRMIClientSocketFactory() throws IOException
+    private RMIClientSocketFactory getRMIClientSocketFactory()
     {
         if (Boolean.parseBoolean(System.getProperty("ssl.enable")))
             return new SslRMIClientSocketFactory();
@@ -303,12 +305,21 @@
         }
     }
 
+    public void forceUserDefinedCompaction(String datafiles) throws IOException, ExecutionException, InterruptedException
+    {
+        compactionProxy.forceUserDefinedCompaction(datafiles);
+    }
 
     public void forceKeyspaceCompaction(boolean splitOutput, String keyspaceName, String... tableNames) throws IOException, ExecutionException, InterruptedException
     {
         ssProxy.forceKeyspaceCompaction(splitOutput, keyspaceName, tableNames);
     }
 
+    public void relocateSSTables(int jobs, String keyspace, String[] cfnames) throws IOException, ExecutionException, InterruptedException
+    {
+        ssProxy.relocateSSTables(jobs, keyspace, cfnames);
+    }
+
     public void forceKeyspaceFlush(String keyspaceName, String... tableNames) throws IOException, ExecutionException, InterruptedException
     {
         ssProxy.forceKeyspaceFlush(keyspaceName, tableNames);
@@ -519,9 +530,10 @@
      *
      * @param snapshotName the name of the snapshot.
      * @param table the table to snapshot or all on null
+     * @param options Options (skipFlush for now)
      * @param keyspaces the keyspaces to snapshot
      */
-    public void takeSnapshot(String snapshotName, String table, String... keyspaces) throws IOException
+    public void takeSnapshot(String snapshotName, String table, Map<String, String> options, String... keyspaces) throws IOException
     {
         if (table != null)
         {
@@ -529,10 +541,11 @@
             {
                 throw new IOException("When specifying the table for a snapshot, you must specify one and only one keyspace");
             }
-            ssProxy.takeTableSnapshot(keyspaces[0], table, snapshotName);
+
+            ssProxy.takeSnapshot(snapshotName, options, keyspaces[0] + "." + table);
         }
         else
-            ssProxy.takeSnapshot(snapshotName, keyspaces);
+            ssProxy.takeSnapshot(snapshotName, options, keyspaces);
     }
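As a rough standalone sketch of the new map-based signature, assuming a local node with JMX on port 7199 and the "skipFlush" option key implied by the javadoc:

import java.util.Collections;

import org.apache.cassandra.tools.NodeProbe;

public class SnapshotWithOptionsExample
{
    public static void main(String[] args) throws Exception
    {
        NodeProbe probe = new NodeProbe("127.0.0.1", 7199);
        try
        {
            // A null table plus a single keyspace snapshots every table in that keyspace;
            // "skipFlush" = "true" requests a snapshot without flushing memtables first.
            probe.takeSnapshot("pre_upgrade", null, Collections.singletonMap("skipFlush", "true"), "my_keyspace");
        }
        finally
        {
            probe.close();
        }
    }
}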
 
     /**
@@ -540,21 +553,22 @@
      *
      * @param snapshotName
      *            the name of the snapshot.
+     * @param options
+     *            Options (skipFlush for now)
      * @param tableList
      *            list of columnfamily from different keyspace in the form of ks1.cf1 ks2.cf2
      */
-    public void takeMultipleTableSnapshot(String snapshotName, String... tableList)
+    public void takeMultipleTableSnapshot(String snapshotName, Map<String, String> options, String... tableList)
             throws IOException
     {
         if (null != tableList && tableList.length != 0)
         {
-            ssProxy.takeMultipleTableSnapshot(snapshotName, tableList);
+            ssProxy.takeSnapshot(snapshotName, options, tableList);
         }
         else
         {
             throw new IOException("The column family List  for a snapshot should not be empty or null");
         }
-
     }
 
     /**
@@ -689,10 +703,10 @@
         return ssProxy.getNaturalEndpoints(keyspace, cf, key);
     }
 
-    public List<String> getSSTables(String keyspace, String cf, String key)
+    public List<String> getSSTables(String keyspace, String cf, String key, boolean hexFormat)
     {
         ColumnFamilyStoreMBean cfsProxy = getCfsProxy(keyspace, cf);
-        return cfsProxy.getSSTablesForKey(key);
+        return cfsProxy.getSSTablesForKey(key, hexFormat);
     }
 
     public Set<StreamState> getStreamStatus()
@@ -849,6 +863,11 @@
         return spProxy.getHintedHandoffDisabledDCs();
     }
 
+    public Map<String, String> getViewBuildStatuses(String keyspace, String view)
+    {
+        return ssProxy.getViewBuildStatuses(keyspace, view);
+    }
+
     public void pauseHintsDelivery()
     {
         hhProxy.pauseHintsDelivery(true);
@@ -953,6 +972,31 @@
         return ssProxy.getCompactionThroughputMbPerSec();
     }
 
+    public long getTimeout(String type)
+    {
+        switch (type)
+        {
+            case "misc":
+                return ssProxy.getRpcTimeout();
+            case "read":
+                return ssProxy.getReadRpcTimeout();
+            case "range":
+                return ssProxy.getRangeRpcTimeout();
+            case "write":
+                return ssProxy.getWriteRpcTimeout();
+            case "counterwrite":
+                return ssProxy.getCounterWriteRpcTimeout();
+            case "cascontention":
+                return ssProxy.getCasContentionTimeout();
+            case "truncate":
+                return ssProxy.getTruncateRpcTimeout();
+            case "streamingsocket":
+                return (long) ssProxy.getStreamingSocketTimeout();
+            default:
+                throw new RuntimeException("Timeout type requires one of (" + GetTimeout.TIMEOUT_TYPES + ")");
+        }
+    }
+
     public int getStreamThroughput()
     {
         return ssProxy.getStreamThroughputMbPerSec();
@@ -998,6 +1042,44 @@
         compactionProxy.stopCompaction(string);
     }
 
+    public void setTimeout(String type, long value)
+    {
+        if (value < 0)
+            throw new RuntimeException("timeout must be non-negative");
+
+        switch (type)
+        {
+            case "misc":
+                ssProxy.setRpcTimeout(value);
+                break;
+            case "read":
+                ssProxy.setReadRpcTimeout(value);
+                break;
+            case "range":
+                ssProxy.setRangeRpcTimeout(value);
+                break;
+            case "write":
+                ssProxy.setWriteRpcTimeout(value);
+                break;
+            case "counterwrite":
+                ssProxy.setCounterWriteRpcTimeout(value);
+                break;
+            case "cascontention":
+                ssProxy.setCasContentionTimeout(value);
+                break;
+            case "truncate":
+                ssProxy.setTruncateRpcTimeout(value);
+                break;
+            case "streamingsocket":
+                if (value > Integer.MAX_VALUE)
+                    throw new RuntimeException("streamingsocket timeout must be less than " + Integer.MAX_VALUE);
+                ssProxy.setStreamingSocketTimeout((int) value);
+                break;
+            default:
+                throw new RuntimeException("Timeout type requires one of (" + GetTimeout.TIMEOUT_TYPES + ")");
+        }
+    }
+
     public void stopById(String compactionId)
     {
         compactionProxy.stopCompactionById(compactionId);
@@ -1028,9 +1110,9 @@
         return ssProxy.describeRingJMX(keyspaceName);
     }
 
-    public void rebuild(String sourceDc)
+    public void rebuild(String sourceDc, String keyspace, String tokens)
     {
-        ssProxy.rebuild(sourceDc);
+        ssProxy.rebuild(sourceDc, keyspace, tokens);
     }
 
     public List<String> sampleKeyRange()
@@ -1084,9 +1166,18 @@
                             CassandraMetricsRegistry.JmxGaugeMBean.class).getValue();
                 case "Requests":
                 case "Hits":
+                case "Misses":
                     return JMX.newMBeanProxy(mbeanServerConn,
                             new ObjectName("org.apache.cassandra.metrics:type=Cache,scope=" + cacheType + ",name=" + metricName),
                             CassandraMetricsRegistry.JmxMeterMBean.class).getCount();
+                case "MissLatency":
+                    return JMX.newMBeanProxy(mbeanServerConn,
+                            new ObjectName("org.apache.cassandra.metrics:type=Cache,scope=" + cacheType + ",name=" + metricName),
+                            CassandraMetricsRegistry.JmxTimerMBean.class).getMean();
+                case "MissLatencyUnit":
+                    return JMX.newMBeanProxy(mbeanServerConn,
+                            new ObjectName("org.apache.cassandra.metrics:type=Cache,scope=" + cacheType + ",name=MissLatency"),
+                            CassandraMetricsRegistry.JmxTimerMBean.class).getDurationUnit();
                 default:
                     throw new RuntimeException("Unknown cache metric name.");
 
@@ -1114,16 +1205,28 @@
 
     /**
      * Retrieve ColumnFamily metrics
-     * @param ks Keyspace for which stats are to be displayed.
-     * @param cf ColumnFamily for which stats are to be displayed.
+     * @param ks Keyspace for which stats are to be displayed or null for the global value
+     * @param cf ColumnFamily for which stats are to be displayed or null for the keyspace value (if ks supplied)
      * @param metricName View {@link TableMetrics}.
      */
     public Object getColumnFamilyMetric(String ks, String cf, String metricName)
     {
         try
         {
-            String type = cf.contains(".") ? "IndexTable" : "Table";
-            ObjectName oName = new ObjectName(String.format("org.apache.cassandra.metrics:type=%s,keyspace=%s,scope=%s,name=%s", type, ks, cf, metricName));
+            ObjectName oName = null;
+            if (!Strings.isNullOrEmpty(ks) && !Strings.isNullOrEmpty(cf))
+            {
+                String type = cf.contains(".") ? "IndexTable" : "Table";
+                oName = new ObjectName(String.format("org.apache.cassandra.metrics:type=%s,keyspace=%s,scope=%s,name=%s", type, ks, cf, metricName));
+            }
+            else if (!Strings.isNullOrEmpty(ks))
+            {
+                oName = new ObjectName(String.format("org.apache.cassandra.metrics:type=Keyspace,keyspace=%s,name=%s", ks, metricName));
+            }
+            else
+            {
+                oName = new ObjectName(String.format("org.apache.cassandra.metrics:type=Table,name=%s", metricName));
+            }
             switch(metricName)
             {
                 case "BloomFilterDiskSpaceUsed":
@@ -1144,6 +1247,7 @@
                 case "MemtableLiveDataSize":
                 case "MemtableOffHeapSize":
                 case "MinPartitionSize":
+                case "PercentRepaired":
                 case "RecentBloomFilterFalsePositives":
                 case "RecentBloomFilterFalseRatio":
                 case "SnapshotsSize":
@@ -1155,6 +1259,7 @@
                 case "WriteTotalLatency":
                 case "ReadTotalLatency":
                 case "PendingFlushes":
+                case "DroppedMutations":
                     return JMX.newMBeanProxy(mbeanServerConn, oName, CassandraMetricsRegistry.JmxCounterMBean.class).getCount();
                 case "CoordinatorReadLatency":
                 case "CoordinatorScanLatency":
@@ -1209,6 +1314,7 @@
                             CassandraMetricsRegistry.JmxCounterMBean.class);
                 case "CompletedTasks":
                 case "PendingTasks":
+                case "PendingTasksByTableName":
                     return JMX.newMBeanProxy(mbeanServerConn,
                             new ObjectName("org.apache.cassandra.metrics:type=Compaction,name=" + metricName),
                             CassandraMetricsRegistry.JmxGaugeMBean.class).getValue();
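A standalone sketch of the new timeout accessors, assuming a local node with JMX on port 7199; the type strings are the ones accepted by the switch above, and values are in milliseconds:

import org.apache.cassandra.tools.NodeProbe;

public class TimeoutProbeExample
{
    public static void main(String[] args) throws Exception
    {
        NodeProbe probe = new NodeProbe("127.0.0.1", 7199);
        try
        {
            // Read the current read-request timeout, then raise it by one second.
            long readTimeout = probe.getTimeout("read");
            System.out.println("read timeout: " + readTimeout + " ms");
            probe.setTimeout("read", readTimeout + 1000);
        }
        finally
        {
            probe.close();
        }
    }
}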
diff --git a/src/java/org/apache/cassandra/tools/NodeTool.java b/src/java/org/apache/cassandra/tools/NodeTool.java
index 7d125ad..8640b58 100644
--- a/src/java/org/apache/cassandra/tools/NodeTool.java
+++ b/src/java/org/apache/cassandra/tools/NodeTool.java
@@ -79,6 +79,7 @@
                 GcStats.class,
                 GetCompactionThreshold.class,
                 GetCompactionThroughput.class,
+                GetTimeout.class,
                 GetStreamThroughput.class,
                 GetTraceProbability.class,
                 GetInterDCStreamThroughput.class,
@@ -103,6 +104,7 @@
                 SetHintedHandoffThrottleInKB.class,
                 SetCompactionThreshold.class,
                 SetCompactionThroughput.class,
+                SetTimeout.class,
                 SetStreamThroughput.class,
                 SetInterDCStreamThroughput.class,
                 SetTraceProbability.class,
@@ -136,7 +138,9 @@
                 DisableHintsForDC.class,
                 EnableHintsForDC.class,
                 FailureDetectorInfo.class,
-                RefreshSizeEstimates.class
+                RefreshSizeEstimates.class,
+                RelocateSSTables.class,
+                ViewBuildStatus.class
         );
 
         Cli.CliBuilder<Runnable> builder = Cli.builder("nodetool");
@@ -305,7 +309,7 @@
                     nodeClient = new NodeProbe(host, parseInt(port));
                 else
                     nodeClient = new NodeProbe(host, parseInt(port), username, password);
-            } catch (IOException e)
+            } catch (IOException | SecurityException e)
             {
                 Throwable rootCause = Throwables.getRootCause(e);
                 System.err.println(format("nodetool: Failed to connect to '%s:%s' - %s: '%s'.", host, port, rootCause.getClass().getSimpleName(), rootCause.getMessage()));
diff --git a/src/java/org/apache/cassandra/tools/RepairRunner.java b/src/java/org/apache/cassandra/tools/RepairRunner.java
index 0813775..8497a1b 100644
--- a/src/java/org/apache/cassandra/tools/RepairRunner.java
+++ b/src/java/org/apache/cassandra/tools/RepairRunner.java
@@ -56,7 +56,8 @@
         cmd = ssProxy.repairAsync(keyspace, options);
         if (cmd <= 0)
         {
-            String message = String.format("[%s] Nothing to repair for keyspace '%s'", format.format(System.currentTimeMillis()), keyspace);
+            // repairAsync can only return 0 for replication factor 1.
+            String message = String.format("[%s] Replication factor is 1. No repair is needed for keyspace '%s'", format.format(System.currentTimeMillis()), keyspace);
             out.println(message);
         }
         else
diff --git a/src/java/org/apache/cassandra/tools/SSTableExport.java b/src/java/org/apache/cassandra/tools/SSTableExport.java
index 09dbbed..cc6b84b 100644
--- a/src/java/org/apache/cassandra/tools/SSTableExport.java
+++ b/src/java/org/apache/cassandra/tools/SSTableExport.java
@@ -98,14 +98,10 @@
         if (!desc.version.storeRows())
             throw new IOException("pre-3.0 SSTable is not supported.");
 
-        EnumSet<MetadataType> types = EnumSet.of(MetadataType.VALIDATION, MetadataType.STATS, MetadataType.HEADER);
+        EnumSet<MetadataType> types = EnumSet.of(MetadataType.STATS, MetadataType.HEADER);
         Map<MetadataType, MetadataComponent> sstableMetadata = desc.getMetadataSerializer().deserialize(desc, types);
-        ValidationMetadata validationMetadata = (ValidationMetadata) sstableMetadata.get(MetadataType.VALIDATION);
         SerializationHeader.Component header = (SerializationHeader.Component) sstableMetadata.get(MetadataType.HEADER);
-
-        IPartitioner partitioner = SecondaryIndexManager.isIndexColumnFamily(desc.cfname)
-                                   ? new LocalPartitioner(header.getKeyType())
-                                   : FBUtilities.newPartitioner(validationMetadata.partitioner);
+        IPartitioner partitioner = FBUtilities.newPartitioner(desc);
 
         CFMetaData.Builder builder = CFMetaData.Builder.create("keyspace", "table").withPartitioner(partitioner);
         header.getStaticColumns().entrySet().stream()
diff --git a/src/java/org/apache/cassandra/tools/SSTableMetadataViewer.java b/src/java/org/apache/cassandra/tools/SSTableMetadataViewer.java
index 420b802..3c8ba64 100644
--- a/src/java/org/apache/cassandra/tools/SSTableMetadataViewer.java
+++ b/src/java/org/apache/cassandra/tools/SSTableMetadataViewer.java
@@ -17,14 +17,27 @@
  */
 package org.apache.cassandra.tools;
 
-import java.io.File;
-import java.io.IOException;
-import java.io.PrintStream;
+import java.io.*;
+import java.nio.ByteBuffer;
+import java.util.Arrays;
 import java.util.EnumSet;
+import java.util.List;
 import java.util.Map;
+import java.util.stream.Collectors;
 
+import org.apache.cassandra.db.DecoratedKey;
+import org.apache.cassandra.db.SerializationHeader;
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.db.marshal.UTF8Type;
+import org.apache.cassandra.db.rows.EncodingStats;
+import org.apache.cassandra.dht.IPartitioner;
+import org.apache.cassandra.io.compress.CompressionMetadata;
+import org.apache.cassandra.io.sstable.Component;
 import org.apache.cassandra.io.sstable.Descriptor;
+import org.apache.cassandra.io.sstable.IndexSummary;
 import org.apache.cassandra.io.sstable.metadata.*;
+import org.apache.cassandra.utils.FBUtilities;
+import org.apache.cassandra.utils.Pair;
 
 /**
  * Shows the contents of sstable metadata
@@ -54,6 +67,11 @@
                 ValidationMetadata validation = (ValidationMetadata) metadata.get(MetadataType.VALIDATION);
                 StatsMetadata stats = (StatsMetadata) metadata.get(MetadataType.STATS);
                 CompactionMetadata compaction = (CompactionMetadata) metadata.get(MetadataType.COMPACTION);
+                CompressionMetadata compression = null;
+                File compressionFile = new File(descriptor.filenameFor(Component.COMPRESSION_INFO));
+                if (compressionFile.exists())
+                    compression = CompressionMetadata.create(fname);
+                SerializationHeader.Component header = (SerializationHeader.Component) metadata.get(MetadataType.HEADER);
 
                 out.printf("SSTable: %s%n", descriptor);
                 if (validation != null)
@@ -65,17 +83,44 @@
                 {
                     out.printf("Minimum timestamp: %s%n", stats.minTimestamp);
                     out.printf("Maximum timestamp: %s%n", stats.maxTimestamp);
+                    out.printf("SSTable min local deletion time: %s%n", stats.minLocalDeletionTime);
                     out.printf("SSTable max local deletion time: %s%n", stats.maxLocalDeletionTime);
-                    out.printf("Compression ratio: %s%n", stats.compressionRatio);
+                    out.printf("Compressor: %s%n", compression != null ? compression.compressor().getClass().getName() : "-");
+                    if (compression != null)
+                        out.printf("Compression ratio: %s%n", stats.compressionRatio);
+                    out.printf("TTL min: %s%n", stats.minTTL);
+                    out.printf("TTL max: %s%n", stats.maxTTL);
+
+                    if (validation != null && header != null)
+                        printMinMaxToken(descriptor, FBUtilities.newPartitioner(descriptor), header.getKeyType(), out);
+
+                    if (header != null && header.getClusteringTypes().size() == stats.minClusteringValues.size())
+                    {
+                        List<AbstractType<?>> clusteringTypes = header.getClusteringTypes();
+                        List<ByteBuffer> minClusteringValues = stats.minClusteringValues;
+                        List<ByteBuffer> maxClusteringValues = stats.maxClusteringValues;
+                        String[] minValues = new String[clusteringTypes.size()];
+                        String[] maxValues = new String[clusteringTypes.size()];
+                        for (int i = 0; i < clusteringTypes.size(); i++)
+                        {
+                            minValues[i] = clusteringTypes.get(i).getString(minClusteringValues.get(i));
+                            maxValues[i] = clusteringTypes.get(i).getString(maxClusteringValues.get(i));
+                        }
+                        out.printf("minClusteringValues: %s%n", Arrays.toString(minValues));
+                        out.printf("maxClusteringValues: %s%n", Arrays.toString(maxValues));
+                    }
                     out.printf("Estimated droppable tombstones: %s%n", stats.getEstimatedDroppableTombstoneRatio((int) (System.currentTimeMillis() / 1000)));
                     out.printf("SSTable Level: %d%n", stats.sstableLevel);
                     out.printf("Repaired at: %d%n", stats.repairedAt);
                     out.printf("Minimum replay position: %s\n", stats.commitLogLowerBound);
                     out.printf("Maximum replay position: %s\n", stats.commitLogUpperBound);
+                    out.printf("totalColumnsSet: %s%n", stats.totalColumnsSet);
+                    out.printf("totalRows: %s%n", stats.totalRows);
                     out.println("Estimated tombstone drop times:");
-                    for (Map.Entry<Double, Long> entry : stats.estimatedTombstoneDropTime.getAsMap().entrySet())
+
+                    for (Map.Entry<Number, long[]> entry : stats.estimatedTombstoneDropTime.getAsMap().entrySet())
                     {
-                        out.printf("%-10s:%10s%n",entry.getKey().intValue(), entry.getValue());
+                        out.printf("%-10s:%10s%n",entry.getKey().intValue(), entry.getValue()[0]);
                     }
                     printHistograms(stats, out);
                 }
@@ -83,6 +128,30 @@
                 {
                     out.printf("Estimated cardinality: %s%n", compaction.cardinalityEstimator.cardinality());
                 }
+                if (header != null)
+                {
+                    EncodingStats encodingStats = header.getEncodingStats();
+                    AbstractType<?> keyType = header.getKeyType();
+                    List<AbstractType<?>> clusteringTypes = header.getClusteringTypes();
+                    Map<ByteBuffer, AbstractType<?>> staticColumns = header.getStaticColumns();
+                    Map<String, String> statics = staticColumns.entrySet().stream()
+                                                               .collect(Collectors.toMap(
+                                                                e -> UTF8Type.instance.getString(e.getKey()),
+                                                                e -> e.getValue().toString()));
+                    Map<ByteBuffer, AbstractType<?>> regularColumns = header.getRegularColumns();
+                    Map<String, String> regulars = regularColumns.entrySet().stream()
+                                                                 .collect(Collectors.toMap(
+                                                                 e -> UTF8Type.instance.getString(e.getKey()),
+                                                                 e -> e.getValue().toString()));
+
+                    out.printf("EncodingStats minTTL: %s%n", encodingStats.minTTL);
+                    out.printf("EncodingStats minLocalDeletionTime: %s%n", encodingStats.minLocalDeletionTime);
+                    out.printf("EncodingStats minTimestamp: %s%n", encodingStats.minTimestamp);
+                    out.printf("KeyType: %s%n", keyType.toString());
+                    out.printf("ClusteringTypes: %s%n", clusteringTypes.toString());
+                    out.printf("StaticColumns: {%s}%n", FBUtilities.toString(statics));
+                    out.printf("RegularColumns: {%s}%n", FBUtilities.toString(regulars));
+                }
             }
             else
             {
@@ -108,4 +177,19 @@
                                       (i < ecch.length ? ecch[i] : "")));
         }
     }
+
+    private static void printMinMaxToken(Descriptor descriptor, IPartitioner partitioner, AbstractType<?> keyType, PrintStream out) throws IOException
+    {
+        File summariesFile = new File(descriptor.filenameFor(Component.SUMMARY));
+        if (!summariesFile.exists())
+            return;
+
+        try (DataInputStream iStream = new DataInputStream(new FileInputStream(summariesFile)))
+        {
+            Pair<DecoratedKey, DecoratedKey> firstLast = new IndexSummary.IndexSummarySerializer().deserializeFirstLastKey(iStream, partitioner, descriptor.version.hasSamplingLevel());
+            out.printf("First token: %s (key=%s)%n", firstLast.left.getToken(), keyType.getString(firstLast.left.getKey()));
+            out.printf("Last token: %s (key=%s)%n", firstLast.right.getToken(), keyType.getString(firstLast.right.getKey()));
+        }
+    }
+
 }
diff --git a/src/java/org/apache/cassandra/tools/SSTableOfflineRelevel.java b/src/java/org/apache/cassandra/tools/SSTableOfflineRelevel.java
index b27b07a..9f0395b 100644
--- a/src/java/org/apache/cassandra/tools/SSTableOfflineRelevel.java
+++ b/src/java/org/apache/cassandra/tools/SSTableOfflineRelevel.java
@@ -17,18 +17,22 @@
  */
 package org.apache.cassandra.tools;
 
+import java.io.File;
 import java.io.IOException;
 import java.io.PrintStream;
 import java.util.ArrayList;
 import java.util.Collections;
 import java.util.Comparator;
-import java.util.HashSet;
 import java.util.Iterator;
 import java.util.List;
 import java.util.Map;
 import java.util.Set;
 
 import com.google.common.base.Throwables;
+import com.google.common.collect.ArrayListMultimap;
+import com.google.common.collect.HashMultimap;
+import com.google.common.collect.Multimap;
+import com.google.common.collect.SetMultimap;
 
 import org.apache.cassandra.config.Schema;
 import org.apache.cassandra.db.ColumnFamilyStore;
@@ -96,7 +100,7 @@
         Keyspace ks = Keyspace.openWithoutSSTables(keyspace);
         ColumnFamilyStore cfs = ks.getColumnFamilyStore(columnfamily);
         Directories.SSTableLister lister = cfs.getDirectories().sstableLister(Directories.OnTxnErr.THROW).skipTemporary(true);
-        Set<SSTableReader> sstables = new HashSet<>();
+        SetMultimap<File, SSTableReader> sstableMultimap = HashMultimap.create();
         for (Map.Entry<Descriptor, Set<Component>> sstable : lister.list().entrySet())
         {
             if (sstable.getKey() != null)
@@ -104,7 +108,7 @@
                 try
                 {
                     SSTableReader reader = SSTableReader.open(sstable.getKey());
-                    sstables.add(reader);
+                    sstableMultimap.put(reader.descriptor.directory, reader);
                 }
                 catch (Throwable t)
                 {
@@ -113,13 +117,20 @@
                 }
             }
         }
-        if (sstables.isEmpty())
+        if (sstableMultimap.isEmpty())
         {
             out.println("No sstables to relevel for "+keyspace+"."+columnfamily);
             System.exit(1);
         }
-        Relevel rl = new Relevel(sstables);
-        rl.relevel(dryRun);
+        for (File directory : sstableMultimap.keySet())
+        {
+            if (!sstableMultimap.get(directory).isEmpty())
+            {
+                Relevel rl = new Relevel(sstableMultimap.get(directory));
+                out.println("For sstables in " + directory + ":");
+                rl.relevel(dryRun);
+            }
+        }
         System.exit(0);
 
     }
@@ -134,8 +145,23 @@
             approxExpectedLevels = (int) Math.ceil(Math.log10(sstables.size()));
         }
 
+        private void printLeveling(Iterable<SSTableReader> sstables)
+        {
+            Multimap<Integer, SSTableReader> leveling = ArrayListMultimap.create();
+            int maxLevel = 0;
+            for (SSTableReader sstable : sstables)
+            {
+                leveling.put(sstable.getSSTableLevel(), sstable);
+                maxLevel = Math.max(sstable.getSSTableLevel(), maxLevel);
+            }
+            System.out.println("Current leveling:");
+            for (int i = 0; i <= maxLevel; i++)
+                System.out.println(String.format("L%d=%d", i, leveling.get(i).size()));
+        }
+
         public void relevel(boolean dryRun) throws IOException
         {
+            printLeveling(sstables);
             List<SSTableReader> sortedSSTables = new ArrayList<>(sstables);
             Collections.sort(sortedSSTables, new Comparator<SSTableReader>()
             {
@@ -178,8 +204,9 @@
                 System.out.println("New leveling: ");
 
             System.out.println("L0="+l0.size());
+            // item 0 in levels is the highest level we will create, printing from L1 up here:
             for (int i = levels.size() - 1; i >= 0; i--)
-                System.out.println(String.format("L%d %d", levels.size() - i, levels.get(i).size()));
+                System.out.println(String.format("L%d=%d", levels.size() - i, levels.get(i).size()));
 
             if (!dryRun)
             {
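The per-directory split above is a straightforward Guava multimap grouping; a self-contained sketch of the same pattern with placeholder paths:

import java.io.File;

import com.google.common.collect.HashMultimap;
import com.google.common.collect.SetMultimap;

public class GroupByDirectoryExample
{
    public static void main(String[] args)
    {
        // Placeholder sstable paths spread over two data directories.
        String[] paths = { "/data1/ks/tbl/ma-1-big-Data.db", "/data2/ks/tbl/ma-2-big-Data.db", "/data1/ks/tbl/ma-3-big-Data.db" };

        SetMultimap<File, File> byDirectory = HashMultimap.create();
        for (String path : paths)
        {
            File f = new File(path);
            byDirectory.put(f.getParentFile(), f);
        }

        // Each directory gets its own bucket, mirroring the per-directory relevel above.
        for (File dir : byDirectory.keySet())
            System.out.println(dir + " -> " + byDirectory.get(dir).size() + " sstable(s)");
    }
}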
diff --git a/src/java/org/apache/cassandra/tools/StandaloneScrubber.java b/src/java/org/apache/cassandra/tools/StandaloneScrubber.java
index 4249430..42772ef 100644
--- a/src/java/org/apache/cassandra/tools/StandaloneScrubber.java
+++ b/src/java/org/apache/cassandra/tools/StandaloneScrubber.java
@@ -161,7 +161,7 @@
     private static void checkManifest(CompactionStrategyManager strategyManager, ColumnFamilyStore cfs, Collection<SSTableReader> sstables)
     {
         int maxSizeInMB = (int)((cfs.getCompactionStrategyManager().getMaxSSTableBytes()) / (1024L * 1024L));
-        if (strategyManager.getStrategies().size() == 2 && strategyManager.getStrategies().get(0) instanceof LeveledCompactionStrategy)
+        if (strategyManager.getCompactionParams().klass().equals(LeveledCompactionStrategy.class))
         {
             System.out.println("Checking leveled manifest");
             Predicate<SSTableReader> repairedPredicate = new Predicate<SSTableReader>()
diff --git a/src/java/org/apache/cassandra/tools/nodetool/Compact.java b/src/java/org/apache/cassandra/tools/nodetool/Compact.java
index 002541d..f268f0a 100644
--- a/src/java/org/apache/cassandra/tools/nodetool/Compact.java
+++ b/src/java/org/apache/cassandra/tools/nodetool/Compact.java
@@ -27,18 +27,37 @@
 import org.apache.cassandra.tools.NodeProbe;
 import org.apache.cassandra.tools.NodeTool.NodeToolCmd;
 
-@Command(name = "compact", description = "Force a (major) compaction on one or more tables")
+@Command(name = "compact", description = "Force a (major) compaction on one or more tables or user-defined compaction on given SSTables")
 public class Compact extends NodeToolCmd
 {
-    @Arguments(usage = "[<keyspace> <tables>...]", description = "The keyspace followed by one or many tables")
+    @Arguments(usage = "[<keyspace> <tables>...] or <SSTable file>...", description = "The keyspace followed by one or many tables or list of SSTable data files when using --user-defined")
     private List<String> args = new ArrayList<>();
 
     @Option(title = "split_output", name = {"-s", "--split-output"}, description = "Use -s to not create a single big file")
     private boolean splitOutput = false;
 
+    @Option(title = "user-defined", name = {"--user-defined"}, description = "Use --user-defined to submit listed files for user-defined compaction")
+    private boolean userDefined = false;
+
     @Override
     public void execute(NodeProbe probe)
     {
+        if (splitOutput && userDefined)
+        {
+            throw new RuntimeException("Invalid option combination: User defined compaction cannot be split");
+        }
+        else if (userDefined)
+        {
+            try
+            {
+                String userDefinedFiles = String.join(",", args);
+                probe.forceUserDefinedCompaction(userDefinedFiles);
+            }
+            catch (Exception e)
+            {
+                throw new RuntimeException("Error occurred during user defined compaction", e);
+            }
+            return;
+        }
+
         List<String> keyspaces = parseOptionalKeyspace(args, probe);
         String[] tableNames = parseOptionalTables(args);
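Under the hood, --user-defined simply joins the listed Data.db files with commas and hands them to NodeProbe.forceUserDefinedCompaction(). A standalone sketch, assuming a local node with JMX on port 7199 and placeholder paths:

import org.apache.cassandra.tools.NodeProbe;

public class UserDefinedCompactionExample
{
    public static void main(String[] args) throws Exception
    {
        NodeProbe probe = new NodeProbe("127.0.0.1", 7199);
        try
        {
            // Comma-separated list of -Data.db files, as built by the command above.
            probe.forceUserDefinedCompaction("/var/lib/cassandra/data/ks/tbl/ma-1-big-Data.db,"
                                             + "/var/lib/cassandra/data/ks/tbl/ma-2-big-Data.db");
        }
        finally
        {
            probe.close();
        }
    }
}

The equivalent CLI call is nodetool compact --user-defined with the same file list, which is exactly the join performed in execute() above.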
 
diff --git a/src/java/org/apache/cassandra/tools/nodetool/CompactionHistory.java b/src/java/org/apache/cassandra/tools/nodetool/CompactionHistory.java
index cbb054a..40c6887 100644
--- a/src/java/org/apache/cassandra/tools/nodetool/CompactionHistory.java
+++ b/src/java/org/apache/cassandra/tools/nodetool/CompactionHistory.java
@@ -17,16 +17,22 @@
  */
 package org.apache.cassandra.tools.nodetool;
 
-import static com.google.common.collect.Iterables.toArray;
-import io.airlift.command.Command;
-
+import java.time.Instant;
+import java.time.LocalDateTime;
+import java.time.ZoneId;
+import java.util.ArrayList;
+import java.util.Collections;
 import java.util.List;
 import java.util.Set;
-
 import javax.management.openmbean.TabularData;
 
+import io.airlift.command.Command;
+
 import org.apache.cassandra.tools.NodeProbe;
 import org.apache.cassandra.tools.NodeTool.NodeToolCmd;
+import org.apache.cassandra.tools.nodetool.formatter.TableBuilder;
+
+import static com.google.common.collect.Iterables.toArray;
 
 @Command(name = "compactionhistory", description = "Print history of compaction")
 public class CompactionHistory extends NodeToolCmd
@@ -43,15 +49,75 @@
             return;
         }
 
-        String format = "%-41s%-19s%-29s%-26s%-15s%-15s%s%n";
+        TableBuilder table = new TableBuilder();
         List<String> indexNames = tabularData.getTabularType().getIndexNames();
-        System.out.printf(format, toArray(indexNames, Object.class));
+        table.add(toArray(indexNames, String.class));
 
         Set<?> values = tabularData.keySet();
+        List<CompactionHistoryRow> chr = new ArrayList<>();
         for (Object eachValue : values)
         {
             List<?> value = (List<?>) eachValue;
-            System.out.printf(format, toArray(value, Object.class));
+            CompactionHistoryRow chc = new CompactionHistoryRow((String)value.get(0),
+                                                                (String)value.get(1),
+                                                                (String)value.get(2),
+                                                                (Long)value.get(3),
+                                                                (Long)value.get(4),
+                                                                (Long)value.get(5),
+                                                                (String)value.get(6));
+            chr.add(chc);
+        }
+        Collections.sort(chr);
+        for (CompactionHistoryRow eachChc : chr)
+        {
+            table.add(eachChc.getAllAsArray());
+        }
+        table.printTo(System.out);
+    }
+
+    /**
+     * Allows the Compaction History output to be ordered by 'compactedAt' - that is the
+     * time at which compaction finished.
+     */
+    private static class CompactionHistoryRow implements Comparable<CompactionHistoryRow>
+    {
+        private final String id;
+        private final String ksName;
+        private final String cfName;
+        private final long compactedAt;
+        private final long bytesIn;
+        private final long bytesOut;
+        private final String rowMerged;
+
+        CompactionHistoryRow(String id, String ksName, String cfName, long compactedAt, long bytesIn, long bytesOut, String rowMerged)
+        {
+            this.id = id;
+            this.ksName = ksName;
+            this.cfName = cfName;
+            this.compactedAt = compactedAt;
+            this.bytesIn = bytesIn;
+            this.bytesOut = bytesOut;
+            this.rowMerged = rowMerged;
+        }
+
+        public int compareTo(CompactionHistoryRow chc)
+        {
+            return Long.signum(chc.compactedAt - this.compactedAt);
+        }
+
+        public String[] getAllAsArray()
+        {
+            String[] obj = new String[7];
+            obj[0] = this.id;
+            obj[1] = this.ksName;
+            obj[2] = this.cfName;
+            Instant instant = Instant.ofEpochMilli(this.compactedAt);
+            LocalDateTime ldt = LocalDateTime.ofInstant(instant, ZoneId.systemDefault());
+            obj[3] = ldt.toString();
+            obj[4] = Long.toString(this.bytesIn);
+            obj[5] = Long.toString(this.bytesOut);
+            obj[6] = this.rowMerged;
+            return obj;
         }
     }
-}
\ No newline at end of file
+}
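The rewritten compactionhistory command above collects every row, sorts it newest-first by compactedAt, and hands the rows to the TableBuilder from org.apache.cassandra.tools.nodetool.formatter. That class is not part of this hunk; as a rough, illustrative sketch of the add(String...)/printTo(PrintStream) contract the command relies on (not the actual formatter), a minimal column-aligning builder could look like this:

import java.io.PrintStream;
import java.util.ArrayList;
import java.util.List;

// Illustrative stand-in for the formatter used above; column widths are taken
// from the widest cell seen in each position so rows print left-aligned.
public class AlignedTableSketch
{
    private final List<String[]> rows = new ArrayList<>();

    public void add(String... columns)
    {
        rows.add(columns);
    }

    public void printTo(PrintStream out)
    {
        if (rows.isEmpty())
            return;
        int[] widths = new int[rows.get(0).length];
        for (String[] row : rows)
            for (int i = 0; i < widths.length; i++)
                widths[i] = Math.max(widths[i], row[i].length());
        for (String[] row : rows)
        {
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < widths.length; i++)
                line.append(String.format("%-" + (widths[i] + 1) + "s", row[i]));
            out.println(line.toString());
        }
    }
}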
diff --git a/src/java/org/apache/cassandra/tools/nodetool/CompactionStats.java b/src/java/org/apache/cassandra/tools/nodetool/CompactionStats.java
index e57d2ee..69fcbab 100644
--- a/src/java/org/apache/cassandra/tools/nodetool/CompactionStats.java
+++ b/src/java/org/apache/cassandra/tools/nodetool/CompactionStats.java
@@ -17,44 +17,63 @@
  */
 package org.apache.cassandra.tools.nodetool;
 
-import static java.lang.String.format;
-import io.airlift.command.Command;
-import io.airlift.command.Option;
-
 import java.text.DecimalFormat;
-import java.util.ArrayList;
 import java.util.List;
 import java.util.Map;
+import java.util.Map.Entry;
+
+import io.airlift.command.Command;
+import io.airlift.command.Option;
 
 import org.apache.cassandra.db.compaction.CompactionManagerMBean;
 import org.apache.cassandra.db.compaction.OperationType;
 import org.apache.cassandra.io.util.FileUtils;
 import org.apache.cassandra.tools.NodeProbe;
-import org.apache.cassandra.tools.NodeTool;
 import org.apache.cassandra.tools.NodeTool.NodeToolCmd;
+import org.apache.cassandra.tools.nodetool.formatter.TableBuilder;
+
+import static java.lang.String.format;
 
 @Command(name = "compactionstats", description = "Print statistics on compactions")
 public class CompactionStats extends NodeToolCmd
 {
     @Option(title = "human_readable",
             name = {"-H", "--human-readable"},
-            description = "Display bytes in human readable form, i.e. KB, MB, GB, TB")
+            description = "Display bytes in human readable form, i.e. KiB, MiB, GiB, TiB")
     private boolean humanReadable = false;
 
     @Override
     public void execute(NodeProbe probe)
     {
         CompactionManagerMBean cm = probe.getCompactionManagerProxy();
-        System.out.println("pending tasks: " + probe.getCompactionMetric("PendingTasks"));
+        Map<String, Map<String, Integer>> pendingTaskNumberByTable =
+            (Map<String, Map<String, Integer>>) probe.getCompactionMetric("PendingTasksByTableName");
+        int numTotalPendingTask = 0;
+        for (Entry<String, Map<String, Integer>> ksEntry : pendingTaskNumberByTable.entrySet())
+        {
+            for (Entry<String, Integer> tableEntry : ksEntry.getValue().entrySet())
+                numTotalPendingTask += tableEntry.getValue();
+        }
+        System.out.println("pending tasks: " + numTotalPendingTask);
+        for (Entry<String, Map<String, Integer>> ksEntry : pendingTaskNumberByTable.entrySet())
+        {
+            String ksName = ksEntry.getKey();
+            for (Entry<String, Integer> tableEntry : ksEntry.getValue().entrySet())
+            {
+                String tableName = tableEntry.getKey();
+                int pendingTaskCount = tableEntry.getValue();
+
+                System.out.println("- " + ksName + '.' + tableName + ": " + pendingTaskCount);
+            }
+        }
+        System.out.println();
         long remainingBytes = 0;
+        TableBuilder table = new TableBuilder();
         List<Map<String, String>> compactions = cm.getCompactions();
         if (!compactions.isEmpty())
         {
             int compactionThroughput = probe.getCompactionThroughput();
-            List<String[]> lines = new ArrayList<>();
-            int[] columnSizes = new int[] { 0, 0, 0, 0, 0, 0, 0, 0 };
-
-            addLine(lines, columnSizes, "id", "compaction type", "keyspace", "table", "completed", "total", "unit", "progress");
+            table.add("id", "compaction type", "keyspace", "table", "completed", "total", "unit", "progress");
             for (Map<String, String> c : compactions)
             {
                 long total = Long.parseLong(c.get("total"));
@@ -67,24 +86,11 @@
                 String unit = c.get("unit");
                 String percentComplete = total == 0 ? "n/a" : new DecimalFormat("0.00").format((double) completed / total * 100) + "%";
                 String id = c.get("compactionId");
-                addLine(lines, columnSizes, id, taskType, keyspace, columnFamily, completedStr, totalStr, unit, percentComplete);
+                table.add(id, taskType, keyspace, columnFamily, completedStr, totalStr, unit, percentComplete);
                 if (taskType.equals(OperationType.COMPACTION.toString()))
                     remainingBytes += total - completed;
             }
-
-            StringBuilder buffer = new StringBuilder();
-            for (int columnSize : columnSizes) {
-                buffer.append("%");
-                buffer.append(columnSize + 3);
-                buffer.append("s");
-            }
-            buffer.append("%n");
-            String format = buffer.toString();
-
-            for (String[] line : lines)
-            {
-                System.out.printf(format, line[0], line[1], line[2], line[3], line[4], line[5], line[6], line[7]);
-            }
+            table.printTo(System.out);
 
             String remainingTime = "n/a";
             if (compactionThroughput != 0)
@@ -95,11 +101,4 @@
             System.out.printf("%25s%10s%n", "Active compaction remaining time : ", remainingTime);
         }
     }
-
-    private void addLine(List<String[]> lines, int[] columnSizes, String... columns) {
-        lines.add(columns);
-        for (int i = 0; i < columns.length; i++) {
-            columnSizes[i] = Math.max(columnSizes[i], columns[i].length());
-        }
-    }
 }
\ No newline at end of file
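The compactionstats change replaces the single "pending tasks" figure with a per-table breakdown driven by the PendingTasksByTableName metric, which arrives as a keyspace -> (table -> count) map. A small self-contained sketch of that aggregation and the output shape (the keyspace and table names are made-up sample data):

import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of the per-table pending-task summary added above; sample data only.
public class PendingTasksSketch
{
    public static void main(String[] args)
    {
        Map<String, Map<String, Integer>> pendingByTable = new LinkedHashMap<>();
        Map<String, Integer> ks1 = new LinkedHashMap<>();
        ks1.put("t1", 3);
        ks1.put("t2", 1);
        pendingByTable.put("ks1", ks1);
        Map<String, Integer> ks2 = new LinkedHashMap<>();
        ks2.put("t3", 2);
        pendingByTable.put("ks2", ks2);

        // the total is simply the sum over every table entry
        int total = 0;
        for (Map<String, Integer> tables : pendingByTable.values())
            for (int count : tables.values())
                total += count;

        System.out.println("pending tasks: " + total);
        for (Map.Entry<String, Map<String, Integer>> ks : pendingByTable.entrySet())
            for (Map.Entry<String, Integer> table : ks.getValue().entrySet())
                System.out.println("- " + ks.getKey() + '.' + table.getKey() + ": " + table.getValue());
    }
}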
diff --git a/src/java/org/apache/cassandra/tools/nodetool/GetSSTables.java b/src/java/org/apache/cassandra/tools/nodetool/GetSSTables.java
index 2c5d46b..849ad94 100644
--- a/src/java/org/apache/cassandra/tools/nodetool/GetSSTables.java
+++ b/src/java/org/apache/cassandra/tools/nodetool/GetSSTables.java
@@ -24,13 +24,19 @@
 import java.util.ArrayList;
 import java.util.List;
 
+import io.airlift.command.Option;
 import org.apache.cassandra.tools.NodeProbe;
 import org.apache.cassandra.tools.NodeTool.NodeToolCmd;
 
 @Command(name = "getsstables", description = "Print the sstable filenames that own the key")
 public class GetSSTables extends NodeToolCmd
 {
-    @Arguments(usage = "<keyspace> <table> <key>", description = "The keyspace, the table, and the key")
+    @Option(title = "hex_format",
+           name = {"-hf", "--hex-format"},
+           description = "Specify the key in hexadecimal string format")
+    private boolean hexFormat = false;
+
+    @Arguments(usage = "<keyspace> <cfname> <key>", description = "The keyspace, the column family, and the key")
     private List<String> args = new ArrayList<>();
 
     @Override
@@ -41,7 +47,7 @@
         String cf = args.get(1);
         String key = args.get(2);
 
-        List<String> sstables = probe.getSSTables(ks, cf, key);
+        List<String> sstables = probe.getSSTables(ks, cf, key, hexFormat);
         for (String sstable : sstables)
         {
             System.out.println(sstable);
diff --git a/src/java/org/apache/cassandra/tools/nodetool/GetTimeout.java b/src/java/org/apache/cassandra/tools/nodetool/GetTimeout.java
new file mode 100644
index 0000000..b12c9a7
--- /dev/null
+++ b/src/java/org/apache/cassandra/tools/nodetool/GetTimeout.java
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.tools.nodetool;
+
+import io.airlift.command.Arguments;
+import io.airlift.command.Command;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.cassandra.tools.NodeProbe;
+import org.apache.cassandra.tools.NodeTool.NodeToolCmd;
+
+import static com.google.common.base.Preconditions.checkArgument;
+
+@Command(name = "gettimeout", description = "Print the timeout of the given type in ms")
+public class GetTimeout extends NodeToolCmd
+{
+    public static final String TIMEOUT_TYPES = "read, range, write, counterwrite, cascontention, truncate, streamingsocket, misc (general rpc_timeout_in_ms)";
+
+    @Arguments(usage = "<timeout_type>", description = "The timeout type, one of (" + TIMEOUT_TYPES + ")")
+    private List<String> args = new ArrayList<>();
+
+    @Override
+    public void execute(NodeProbe probe)
+    {
+        checkArgument(args.size() == 1, "gettimeout requires a timeout type, one of (" + TIMEOUT_TYPES + ")");
+        try
+        {
+            System.out.println("Current timeout for type " + args.get(0) + ": " + probe.getTimeout(args.get(0)) + " ms");
+        } catch (Exception e)
+        {
+            throw new IllegalArgumentException(e.getMessage());
+        }
+
+    }
+}
diff --git a/src/java/org/apache/cassandra/tools/nodetool/Info.java b/src/java/org/apache/cassandra/tools/nodetool/Info.java
index 0d9bd73..032e47f 100644
--- a/src/java/org/apache/cassandra/tools/nodetool/Info.java
+++ b/src/java/org/apache/cassandra/tools/nodetool/Info.java
@@ -43,7 +43,7 @@
     @Override
     public void execute(NodeProbe probe)
     {
-        boolean gossipInitialized = probe.isInitialized();
+        boolean gossipInitialized = probe.isGossipRunning();
 
         System.out.printf("%-23s: %s%n", "ID", probe.getLocalHostId());
         System.out.printf("%-23s: %s%n", "Gossip active", gossipInitialized);
@@ -117,6 +117,31 @@
                 probe.getCacheMetric("CounterCache", "HitRate"),
                 cacheService.getCounterCacheSavePeriodInSeconds());
 
+        // Chunk Cache: Hits, Requests, RecentHitRate, SavePeriodInSeconds
+        try
+        {
+            System.out.printf("%-23s: entries %d, size %s, capacity %s, %d misses, %d requests, %.3f recent hit rate, %.3f %s miss latency%n",
+                    "Chunk Cache",
+                    probe.getCacheMetric("ChunkCache", "Entries"),
+                    FileUtils.stringifyFileSize((long) probe.getCacheMetric("ChunkCache", "Size")),
+                    FileUtils.stringifyFileSize((long) probe.getCacheMetric("ChunkCache", "Capacity")),
+                    probe.getCacheMetric("ChunkCache", "Misses"),
+                    probe.getCacheMetric("ChunkCache", "Requests"),
+                    probe.getCacheMetric("ChunkCache", "HitRate"),
+                    probe.getCacheMetric("ChunkCache", "MissLatency"),
+                    probe.getCacheMetric("ChunkCache", "MissLatencyUnit"));
+        }
+        catch (RuntimeException e)
+        {
+            if (!(e.getCause() instanceof InstanceNotFoundException))
+                throw e;
+
+            // Chunk cache is not on.
+        }
+
+        // Global table stats
+        System.out.printf("%-23s: %s%%%n", "Percent Repaired", probe.getColumnFamilyMetric(null, null, "PercentRepaired"));
+
         // check if node is already joined, before getting tokens, since it throws exception if not.
         if (probe.isJoined())
         {
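The Info change wraps the new Chunk Cache block in a guard so that nodes without a chunk cache MBean still print the rest of nodetool info. A condensed sketch of that guard, with a hypothetical MetricSource standing in for NodeProbe:

import javax.management.InstanceNotFoundException;

// Sketch of the "metric may not exist" guard used for the Chunk Cache block;
// MetricSource and the lambda below are hypothetical stand-ins for NodeProbe.
public class OptionalMetricSketch
{
    interface MetricSource
    {
        Object metric(String cache, String name);
    }

    static void printIfPresent(MetricSource source)
    {
        try
        {
            System.out.println("Chunk Cache entries: " + source.metric("ChunkCache", "Entries"));
        }
        catch (RuntimeException e)
        {
            // only a missing MBean is tolerated; anything else is still fatal
            if (!(e.getCause() instanceof InstanceNotFoundException))
                throw e;
        }
    }

    public static void main(String[] args)
    {
        // behaves like a node where the chunk cache is not enabled
        printIfPresent((cache, name) -> { throw new RuntimeException(new InstanceNotFoundException(cache)); });
    }
}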
diff --git a/src/java/org/apache/cassandra/tools/nodetool/ListSnapshots.java b/src/java/org/apache/cassandra/tools/nodetool/ListSnapshots.java
index ee7bf34..1b3065b 100644
--- a/src/java/org/apache/cassandra/tools/nodetool/ListSnapshots.java
+++ b/src/java/org/apache/cassandra/tools/nodetool/ListSnapshots.java
@@ -17,17 +17,17 @@
  */
 package org.apache.cassandra.tools.nodetool;
 
-import io.airlift.command.Command;
-
 import java.util.List;
 import java.util.Map;
 import java.util.Set;
-
 import javax.management.openmbean.TabularData;
 
+import io.airlift.command.Command;
+
 import org.apache.cassandra.io.util.FileUtils;
 import org.apache.cassandra.tools.NodeProbe;
 import org.apache.cassandra.tools.NodeTool.NodeToolCmd;
+import org.apache.cassandra.tools.nodetool.formatter.TableBuilder;
 
 @Command(name = "listsnapshots", description = "Lists all the snapshots along with the size on disk and true size.")
 public class ListSnapshots extends NodeToolCmd
@@ -47,10 +47,10 @@
             }
 
             final long trueSnapshotsSize = probe.trueSnapshotsSize();
-            final String format = "%-20s%-29s%-29s%-19s%-19s%n";
+            TableBuilder table = new TableBuilder();
             // display column names only once
             final List<String> indexNames = snapshotDetails.entrySet().iterator().next().getValue().getTabularType().getIndexNames();
-            System.out.printf(format, (Object[]) indexNames.toArray(new String[indexNames.size()]));
+            table.add(indexNames.toArray(new String[indexNames.size()]));
 
             for (final Map.Entry<String, TabularData> snapshotDetail : snapshotDetails.entrySet())
             {
@@ -58,9 +58,10 @@
                 for (Object eachValue : values)
                 {
                     final List<?> value = (List<?>) eachValue;
-                    System.out.printf(format, value.toArray(new Object[value.size()]));
+                    table.add(value.toArray(new String[value.size()]));
                 }
             }
+            table.printTo(System.out);
 
             System.out.println("\nTotal TrueDiskSpaceUsed: " + FileUtils.stringifyFileSize(trueSnapshotsSize) + "\n");
         }
diff --git a/src/java/org/apache/cassandra/tools/nodetool/NetStats.java b/src/java/org/apache/cassandra/tools/nodetool/NetStats.java
index 5b84dff..c171a3e 100644
--- a/src/java/org/apache/cassandra/tools/nodetool/NetStats.java
+++ b/src/java/org/apache/cassandra/tools/nodetool/NetStats.java
@@ -35,7 +35,7 @@
 {
     @Option(title = "human_readable",
             name = {"-H", "--human-readable"},
-            description = "Display bytes in human readable form, i.e. KB, MB, GB, TB")
+            description = "Display bytes in human readable form, i.e. KiB, MiB, GiB, TiB")
     private boolean humanReadable = false;
 
     @Override
diff --git a/src/java/org/apache/cassandra/tools/nodetool/ProxyHistograms.java b/src/java/org/apache/cassandra/tools/nodetool/ProxyHistograms.java
index 2a2851d..656e7ed 100644
--- a/src/java/org/apache/cassandra/tools/nodetool/ProxyHistograms.java
+++ b/src/java/org/apache/cassandra/tools/nodetool/ProxyHistograms.java
@@ -29,24 +29,30 @@
     @Override
     public void execute(NodeProbe probe)
     {
-        String[] percentiles = new String[]{"50%", "75%", "95%", "98%", "99%", "Min", "Max"};
+        String[] percentiles = {"50%", "75%", "95%", "98%", "99%", "Min", "Max"};
         double[] readLatency = probe.metricPercentilesAsArray(probe.getProxyMetric("Read"));
         double[] writeLatency = probe.metricPercentilesAsArray(probe.getProxyMetric("Write"));
         double[] rangeLatency = probe.metricPercentilesAsArray(probe.getProxyMetric("RangeSlice"));
+        double[] casReadLatency = probe.metricPercentilesAsArray(probe.getProxyMetric("CASRead"));
+        double[] casWriteLatency = probe.metricPercentilesAsArray(probe.getProxyMetric("CASWrite"));
+        double[] viewWriteLatency = probe.metricPercentilesAsArray(probe.getProxyMetric("ViewWrite"));
 
         System.out.println("proxy histograms");
-        System.out.println(format("%-10s%18s%18s%18s",
-                "Percentile", "Read Latency", "Write Latency", "Range Latency"));
-        System.out.println(format("%-10s%18s%18s%18s",
-                "", "(micros)", "(micros)", "(micros)"));
+        System.out.println(format("%-10s%19s%19s%19s%19s%19s%19s",
+                "Percentile", "Read Latency", "Write Latency", "Range Latency", "CAS Read Latency", "CAS Write Latency", "View Write Latency"));
+        System.out.println(format("%-10s%19s%19s%19s%19s%19s%19s",
+                "", "(micros)", "(micros)", "(micros)", "(micros)", "(micros)", "(micros)"));
         for (int i = 0; i < percentiles.length; i++)
         {
-            System.out.println(format("%-10s%18.2f%18.2f%18.2f",
+            System.out.println(format("%-10s%19.2f%19.2f%19.2f%19.2f%19.2f%19.2f",
                     percentiles[i],
                     readLatency[i],
                     writeLatency[i],
-                    rangeLatency[i]));
+                    rangeLatency[i],
+                    casReadLatency[i],
+                    casWriteLatency[i],
+                    viewWriteLatency[i]));
         }
         System.out.println();
     }
-}
\ No newline at end of file
+}
diff --git a/src/java/org/apache/cassandra/tools/nodetool/Rebuild.java b/src/java/org/apache/cassandra/tools/nodetool/Rebuild.java
index 8a6dbf1..865f9fe 100644
--- a/src/java/org/apache/cassandra/tools/nodetool/Rebuild.java
+++ b/src/java/org/apache/cassandra/tools/nodetool/Rebuild.java
@@ -19,6 +19,7 @@
 
 import io.airlift.command.Arguments;
 import io.airlift.command.Command;
+import io.airlift.command.Option;
 
 import org.apache.cassandra.tools.NodeProbe;
 import org.apache.cassandra.tools.NodeTool.NodeToolCmd;
@@ -26,12 +27,29 @@
 @Command(name = "rebuild", description = "Rebuild data by streaming from other nodes (similarly to bootstrap)")
 public class Rebuild extends NodeToolCmd
 {
-    @Arguments(usage = "<src-dc-name>", description = "Name of DC from which to select sources for streaming. By default, pick any DC")
+    @Arguments(usage = "<src-dc-name>",
+               description = "Name of DC from which to select sources for streaming. By default, pick any DC")
     private String sourceDataCenterName = null;
 
+    @Option(title = "specific_keyspace",
+            name = {"-ks", "--keyspace"},
+            description = "Use -ks to rebuild specific keyspace.")
+    private String keyspace = null;
+
+    @Option(title = "specific_tokens",
+            name = {"-ts", "--tokens"},
+            description = "Use -ts to rebuild specific token ranges, in the format of \"(start_token_1,end_token_1],(start_token_2,end_token_2],...(start_token_n,end_token_n]\".")
+    private String tokens = null;
+
     @Override
     public void execute(NodeProbe probe)
     {
-        probe.rebuild(sourceDataCenterName);
+        // check the arguments
+        if (keyspace == null && tokens != null)
+        {
+            throw new IllegalArgumentException("Cannot specify tokens without keyspace.");
+        }
+
+        probe.rebuild(sourceDataCenterName, keyspace, tokens);
     }
 }
\ No newline at end of file
diff --git a/src/java/org/apache/cassandra/tools/nodetool/RelocateSSTables.java b/src/java/org/apache/cassandra/tools/nodetool/RelocateSSTables.java
new file mode 100644
index 0000000..7c3066c
--- /dev/null
+++ b/src/java/org/apache/cassandra/tools/nodetool/RelocateSSTables.java
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.tools.nodetool;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import io.airlift.command.Arguments;
+import io.airlift.command.Command;
+import io.airlift.command.Option;
+import org.apache.cassandra.tools.NodeProbe;
+import org.apache.cassandra.tools.NodeTool;
+
+@Command(name = "relocatesstables", description = "Relocates sstables to the correct disk")
+public class RelocateSSTables extends NodeTool.NodeToolCmd
+{
+    @Arguments(usage = "<keyspace> <table>", description = "The keyspace and table name")
+    private List<String> args = new ArrayList<>();
+
+    @Option(title = "jobs",
+            name = {"-j", "--jobs"},
+            description = "Number of sstables to relocate simultanously, set to 0 to use all available compaction threads")
+    private int jobs = 2;
+
+    @Override
+    public void execute(NodeProbe probe)
+    {
+        List<String> keyspaces = parseOptionalKeyspace(args, probe);
+        String[] cfnames = parseOptionalTables(args);
+        try
+        {
+            for (String keyspace : keyspaces)
+                probe.relocateSSTables(jobs, keyspace, cfnames);
+        }
+        catch (Exception e)
+        {
+            throw new RuntimeException("Got error while relocating", e);
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/tools/nodetool/Ring.java b/src/java/org/apache/cassandra/tools/nodetool/Ring.java
index 03d9449..55220a1 100644
--- a/src/java/org/apache/cassandra/tools/nodetool/Ring.java
+++ b/src/java/org/apache/cassandra/tools/nodetool/Ring.java
@@ -88,7 +88,7 @@
         }
         catch (IllegalArgumentException ex)
         {
-            System.out.printf("%nError: " + ex.getMessage() + "%n");
+            System.out.printf("%nError: %s%n", ex.getMessage());
             return;
         }
 
@@ -174,4 +174,4 @@
         }
         System.out.println();
     }
-}
\ No newline at end of file
+}
diff --git a/src/java/org/apache/cassandra/tools/nodetool/SetTimeout.java b/src/java/org/apache/cassandra/tools/nodetool/SetTimeout.java
new file mode 100644
index 0000000..0b99efd
--- /dev/null
+++ b/src/java/org/apache/cassandra/tools/nodetool/SetTimeout.java
@@ -0,0 +1,55 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.tools.nodetool;
+
+import io.airlift.command.Arguments;
+import io.airlift.command.Command;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.cassandra.tools.NodeProbe;
+import org.apache.cassandra.tools.NodeTool.NodeToolCmd;
+
+import static com.google.common.base.Preconditions.checkArgument;
+
+@Command(name = "settimeout", description = "Set the specified timeout in ms, or 0 to disable timeout")
+public class SetTimeout extends NodeToolCmd
+{
+    @Arguments(usage = "<timeout_type> <timeout_in_ms>", description = "Timeout type followed by value in ms " +
+            "(0 disables socket streaming timeout). Type should be one of (" + GetTimeout.TIMEOUT_TYPES + ")",
+            required = true)
+    private List<String> args = new ArrayList<>();
+
+    @Override
+    public void execute(NodeProbe probe)
+    {
+        checkArgument(args.size() == 2, "Timeout type followed by value in ms (0 disables socket streaming timeout)." +
+                " Type should be one of (" + GetTimeout.TIMEOUT_TYPES + ")");
+
+        try
+        {
+            String type = args.get(0);
+            long timeout = Long.parseLong(args.get(1));
+            probe.setTimeout(type, timeout);
+        } catch (Exception e)
+        {
+            throw new IllegalArgumentException(e.getMessage());
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/tools/nodetool/Snapshot.java b/src/java/org/apache/cassandra/tools/nodetool/Snapshot.java
index 4f549e5..8941ec1 100644
--- a/src/java/org/apache/cassandra/tools/nodetool/Snapshot.java
+++ b/src/java/org/apache/cassandra/tools/nodetool/Snapshot.java
@@ -25,7 +25,9 @@
 
 import java.io.IOException;
 import java.util.ArrayList;
+import java.util.HashMap;
 import java.util.List;
+import java.util.Map;
 
 import org.apache.cassandra.tools.NodeProbe;
 import org.apache.cassandra.tools.NodeTool.NodeToolCmd;
@@ -45,6 +47,9 @@
     @Option(title = "ktlist", name = { "-kt", "--kt-list", "-kc", "--kc.list" }, description = "The list of Keyspace.table to take snapshot.(you must not specify only keyspace)")
     private String ktList = null;
 
+    @Option(title = "skip-flush", name = {"-sf", "--skip-flush"}, description = "Do not flush memtables before snapshotting (snapshot will not contain unflushed data)")
+    private boolean skipFlush = false;
+
     @Override
     public void execute(NodeProbe probe)
     {
@@ -54,6 +59,9 @@
 
             sb.append("Requested creating snapshot(s) for ");
 
+            Map<String, String> options = new HashMap<String,String>();
+            options.put("skipFlush", Boolean.toString(skipFlush));
+
             // Create a separate path for kclist to avoid breaking of already existing scripts
             if (null != ktList && !ktList.isEmpty())
             {
@@ -67,8 +75,9 @@
                 }
                 if (!snapshotName.isEmpty())
                     sb.append(" with snapshot name [").append(snapshotName).append("]");
+                sb.append(" and options ").append(options.toString());
                 System.out.println(sb.toString());
-                probe.takeMultipleTableSnapshot(snapshotName, ktList.split(","));
+                probe.takeMultipleTableSnapshot(snapshotName, options, ktList.split(","));
                 System.out.println("Snapshot directory: " + snapshotName);
             }
             else
@@ -80,10 +89,10 @@
 
                 if (!snapshotName.isEmpty())
                     sb.append(" with snapshot name [").append(snapshotName).append("]");
-
+                sb.append(" and options ").append(options.toString());
                 System.out.println(sb.toString());
 
-                probe.takeSnapshot(snapshotName, table, toArray(keyspaces, String.class));
+                probe.takeSnapshot(snapshotName, table, options, toArray(keyspaces, String.class));
                 System.out.println("Snapshot directory: " + snapshotName);
             }
         }
diff --git a/src/java/org/apache/cassandra/tools/nodetool/Status.java b/src/java/org/apache/cassandra/tools/nodetool/Status.java
index 99f745d..a43b703 100644
--- a/src/java/org/apache/cassandra/tools/nodetool/Status.java
+++ b/src/java/org/apache/cassandra/tools/nodetool/Status.java
@@ -81,7 +81,7 @@
         }
         catch (IllegalArgumentException ex)
         {
-            System.out.printf("%nError: " + ex.getMessage() + "%n");
+            System.out.printf("%nError: %s%n", ex.getMessage());
             System.exit(1);
         }
 
@@ -204,4 +204,4 @@
 
         return format;
     }
-}
\ No newline at end of file
+}
diff --git a/src/java/org/apache/cassandra/tools/nodetool/TableHistograms.java b/src/java/org/apache/cassandra/tools/nodetool/TableHistograms.java
index be3f799..8f4ffa6 100644
--- a/src/java/org/apache/cassandra/tools/nodetool/TableHistograms.java
+++ b/src/java/org/apache/cassandra/tools/nodetool/TableHistograms.java
@@ -34,16 +34,29 @@
 @Command(name = "tablehistograms", description = "Print statistic histograms for a given table")
 public class TableHistograms extends NodeToolCmd
 {
-    @Arguments(usage = "<keyspace> <table>", description = "The keyspace and table name")
+    @Arguments(usage = "<keyspace> <table> | <keyspace.table>", description = "The keyspace and table name")
     private List<String> args = new ArrayList<>();
 
     @Override
     public void execute(NodeProbe probe)
     {
-        checkArgument(args.size() == 2, "tablehistograms requires keyspace and table name arguments");
-
-        String keyspace = args.get(0);
-        String table = args.get(1);
+        String keyspace = null, table = null;
+        if (args.size() == 2)
+        {
+            keyspace = args.get(0);
+            table = args.get(1);
+        }
+        else if (args.size() == 1)
+        {
+            String[] input = args.get(0).split("\\.");
+            checkArgument(input.length == 2, "tablehistograms requires keyspace and table name arguments");
+            keyspace = input[0];
+            table = input[1];
+        }
+        else
+        {
+            checkArgument(false, "tablehistograms requires keyspace and table name arguments");
+        }
 
         // calculate percentile of row size and column count
         long[] estimatedPartitionSize = (long[]) probe.getColumnFamilyMetric(keyspace, table, "EstimatedPartitionSizeHistogram");
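With this change tablehistograms accepts either two arguments or a single dotted keyspace.table argument. A standalone sketch of that parsing path (the class name is hypothetical):

import java.util.Arrays;
import java.util.List;

// Sketch of the argument handling introduced above: "ks table" and "ks.table"
// both resolve to the same keyspace/table pair; anything else is rejected.
public class TableHistogramsArgsSketch
{
    static String[] parse(List<String> args)
    {
        if (args.size() == 2)
            return new String[]{ args.get(0), args.get(1) };
        if (args.size() == 1)
        {
            String[] input = args.get(0).split("\\.");
            if (input.length != 2)
                throw new IllegalArgumentException("tablehistograms requires keyspace and table name arguments");
            return input;
        }
        throw new IllegalArgumentException("tablehistograms requires keyspace and table name arguments");
    }

    public static void main(String[] args)
    {
        System.out.println(Arrays.toString(parse(Arrays.asList("ks1", "table1"))));
        System.out.println(Arrays.toString(parse(Arrays.asList("ks1.table1"))));
    }
}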
diff --git a/src/java/org/apache/cassandra/tools/nodetool/TableStats.java b/src/java/org/apache/cassandra/tools/nodetool/TableStats.java
index bb7f192..ec729a5 100644
--- a/src/java/org/apache/cassandra/tools/nodetool/TableStats.java
+++ b/src/java/org/apache/cassandra/tools/nodetool/TableStats.java
@@ -17,23 +17,23 @@
  */
 package org.apache.cassandra.tools.nodetool;
 
+import java.util.*;
+import javax.management.InstanceNotFoundException;
+
+import com.google.common.collect.ArrayListMultimap;
 import io.airlift.command.Arguments;
 import io.airlift.command.Command;
 import io.airlift.command.Option;
 
-import java.util.ArrayList;
-import java.util.HashMap;
-import java.util.Iterator;
-import java.util.List;
-import java.util.Map;
-
-import javax.management.InstanceNotFoundException;
-
 import org.apache.cassandra.db.ColumnFamilyStoreMBean;
 import org.apache.cassandra.io.util.FileUtils;
 import org.apache.cassandra.metrics.CassandraMetricsRegistry;
 import org.apache.cassandra.tools.NodeProbe;
 import org.apache.cassandra.tools.NodeTool.NodeToolCmd;
+import org.apache.cassandra.tools.nodetool.stats.StatsHolder;
+import org.apache.cassandra.tools.nodetool.stats.StatsKeyspace;
+import org.apache.cassandra.tools.nodetool.stats.StatsTable;
+import org.apache.cassandra.tools.nodetool.stats.TableStatsPrinter;
 
 @Command(name = "tablestats", description = "Print statistics on tables")
 public class TableStats extends NodeToolCmd
@@ -46,113 +46,83 @@
 
     @Option(title = "human_readable",
             name = {"-H", "--human-readable"},
-            description = "Display bytes in human readable form, i.e. KB, MB, GB, TB")
+            description = "Display bytes in human readable form, i.e. KiB, MiB, GiB, TiB")
     private boolean humanReadable = false;
 
+    @Option(title = "format",
+            name = {"-F", "--format"},
+            description = "Output format (json, yaml)")
+    private String outputFormat = "";
+
     @Override
     public void execute(NodeProbe probe)
     {
-        TableStats.OptionFilter filter = new OptionFilter(ignore, tableNames);
-        Map<String, List<ColumnFamilyStoreMBean>> tableStoreMap = new HashMap<>();
-
-        // get a list of column family stores
-        Iterator<Map.Entry<String, ColumnFamilyStoreMBean>> tables = probe.getColumnFamilyStoreMBeanProxies();
-
-        while (tables.hasNext())
+        if (!outputFormat.isEmpty() && !"json".equals(outputFormat) && !"yaml".equals(outputFormat))
         {
-            Map.Entry<String, ColumnFamilyStoreMBean> entry = tables.next();
+            throw new IllegalArgumentException("arguments for -F are json,yaml only.");
+        }
+
+        TableStats.OptionFilter filter = new OptionFilter(ignore, tableNames);
+        ArrayListMultimap<String, ColumnFamilyStoreMBean> selectedTableMbeans = ArrayListMultimap.create();
+        Map<String, StatsKeyspace> keyspaceStats = new HashMap<>();
+
+        // get a list of table stores
+        Iterator<Map.Entry<String, ColumnFamilyStoreMBean>> tableMBeans = probe.getColumnFamilyStoreMBeanProxies();
+
+        while (tableMBeans.hasNext())
+        {
+            Map.Entry<String, ColumnFamilyStoreMBean> entry = tableMBeans.next();
             String keyspaceName = entry.getKey();
             ColumnFamilyStoreMBean tableProxy = entry.getValue();
 
-            if (!tableStoreMap.containsKey(keyspaceName) && filter.isColumnFamilyIncluded(entry.getKey(), tableProxy.getColumnFamilyName()))
+            if (filter.isKeyspaceIncluded(keyspaceName))
             {
-                List<ColumnFamilyStoreMBean> columnFamilies = new ArrayList<>();
-                columnFamilies.add(tableProxy);
-                tableStoreMap.put(keyspaceName, columnFamilies);
-            } else if (filter.isColumnFamilyIncluded(entry.getKey(), tableProxy.getColumnFamilyName()))
-            {
-                tableStoreMap.get(keyspaceName).add(tableProxy);
+                StatsKeyspace stats = keyspaceStats.get(keyspaceName);
+                if (stats == null)
+                {
+                    stats = new StatsKeyspace(probe, keyspaceName);
+                    keyspaceStats.put(keyspaceName, stats);
+                }
+                stats.add(tableProxy);
+
+                if (filter.isTableIncluded(keyspaceName, tableProxy.getTableName()))
+                    selectedTableMbeans.put(keyspaceName, tableProxy);
             }
         }
 
         // make sure all specified keyspace and tables exist
         filter.verifyKeyspaces(probe.getKeyspaces());
-        filter.verifyColumnFamilies();
+        filter.verifyTables();
 
-        // print out the table statistics
-        for (Map.Entry<String, List<ColumnFamilyStoreMBean>> entry : tableStoreMap.entrySet())
+        // get metrics of keyspace
+        StatsHolder holder = new StatsHolder();
+        for (Map.Entry<String, Collection<ColumnFamilyStoreMBean>> entry : selectedTableMbeans.asMap().entrySet())
         {
             String keyspaceName = entry.getKey();
-            List<ColumnFamilyStoreMBean> columnFamilies = entry.getValue();
-            long keyspaceReadCount = 0;
-            long keyspaceWriteCount = 0;
-            int keyspacePendingFlushes = 0;
-            double keyspaceTotalReadTime = 0.0f;
-            double keyspaceTotalWriteTime = 0.0f;
+            Collection<ColumnFamilyStoreMBean> tables = entry.getValue();
+            StatsKeyspace statsKeyspace = keyspaceStats.get(keyspaceName);
 
-            System.out.println("Keyspace: " + keyspaceName);
-            for (ColumnFamilyStoreMBean table : columnFamilies)
+            // get metrics of table statistics for this keyspace
+            for (ColumnFamilyStoreMBean table : tables)
             {
-                String tableName = table.getColumnFamilyName();
-                long writeCount = ((CassandraMetricsRegistry.JmxTimerMBean) probe.getColumnFamilyMetric(keyspaceName, tableName, "WriteLatency")).getCount();
-                long readCount = ((CassandraMetricsRegistry.JmxTimerMBean) probe.getColumnFamilyMetric(keyspaceName, tableName, "ReadLatency")).getCount();
-
-                if (readCount > 0)
-                {
-                    keyspaceReadCount += readCount;
-                    keyspaceTotalReadTime += (long) probe.getColumnFamilyMetric(keyspaceName, tableName, "ReadTotalLatency");
-                }
-                if (writeCount > 0)
-                {
-                    keyspaceWriteCount += writeCount;
-                    keyspaceTotalWriteTime += (long) probe.getColumnFamilyMetric(keyspaceName, tableName, "WriteTotalLatency");
-                }
-                keyspacePendingFlushes += (long) probe.getColumnFamilyMetric(keyspaceName, tableName, "PendingFlushes");
-            }
-
-            double keyspaceReadLatency = keyspaceReadCount > 0
-                                         ? keyspaceTotalReadTime / keyspaceReadCount / 1000
-                                         : Double.NaN;
-            double keyspaceWriteLatency = keyspaceWriteCount > 0
-                                          ? keyspaceTotalWriteTime / keyspaceWriteCount / 1000
-                                          : Double.NaN;
-
-            System.out.println("\tRead Count: " + keyspaceReadCount);
-            System.out.println("\tRead Latency: " + String.format("%s", keyspaceReadLatency) + " ms.");
-            System.out.println("\tWrite Count: " + keyspaceWriteCount);
-            System.out.println("\tWrite Latency: " + String.format("%s", keyspaceWriteLatency) + " ms.");
-            System.out.println("\tPending Flushes: " + keyspacePendingFlushes);
-
-            // print out column family statistics for this keyspace
-            for (ColumnFamilyStoreMBean table : columnFamilies)
-            {
-                String tableName = table.getColumnFamilyName();
-                if (tableName.contains("."))
-                    System.out.println("\t\tTable (index): " + tableName);
-                else
-                    System.out.println("\t\tTable: " + tableName);
-
-                System.out.println("\t\tSSTable count: " + probe.getColumnFamilyMetric(keyspaceName, tableName, "LiveSSTableCount"));
-
+                String tableName = table.getTableName();
+                StatsTable statsTable = new StatsTable();
+                statsTable.name = tableName;
+                statsTable.isIndex = tableName.contains(".");
+                statsTable.sstableCount = probe.getColumnFamilyMetric(keyspaceName, tableName, "LiveSSTableCount");
                 int[] leveledSStables = table.getSSTableCountPerLevel();
                 if (leveledSStables != null)
                 {
-                    System.out.print("\t\tSSTables in each level: [");
+                    statsTable.isLeveledSstable = true;
+
                     for (int level = 0; level < leveledSStables.length; level++)
                     {
                         int count = leveledSStables[level];
-                        System.out.print(count);
                         long maxCount = 4L; // for L0
                         if (level > 0)
                             maxCount = (long) Math.pow(10, level);
-                        //  show max threshold for level when exceeded
-                        if (count > maxCount)
-                            System.out.print("/" + maxCount);
-
-                        if (level < leveledSStables.length - 1)
-                            System.out.print(", ");
-                        else
-                            System.out.println("]");
+                        // show max threshold for level when exceeded
+                        statsTable.sstablesInEachLevel.add(count + ((count > maxCount) ? "/" + maxCount : ""));
                     }
                 }
 
@@ -160,8 +130,8 @@
                 Long bloomFilterOffHeapSize = null;
                 Long indexSummaryOffHeapSize = null;
                 Long compressionMetadataOffHeapSize = null;
-
                 Long offHeapSize = null;
+                Double percentRepaired = null;
 
                 try
                 {
@@ -169,8 +139,8 @@
                     bloomFilterOffHeapSize = (Long) probe.getColumnFamilyMetric(keyspaceName, tableName, "BloomFilterOffHeapMemoryUsed");
                     indexSummaryOffHeapSize = (Long) probe.getColumnFamilyMetric(keyspaceName, tableName, "IndexSummaryOffHeapMemoryUsed");
                     compressionMetadataOffHeapSize = (Long) probe.getColumnFamilyMetric(keyspaceName, tableName, "CompressionMetadataOffHeapMemoryUsed");
-
                     offHeapSize = memtableOffHeapSize + bloomFilterOffHeapSize + indexSummaryOffHeapSize + compressionMetadataOffHeapSize;
+                    percentRepaired = (Double) probe.getColumnFamilyMetric(keyspaceName, tableName, "PercentRepaired");
                 }
                 catch (RuntimeException e)
                 {
@@ -179,61 +149,90 @@
                         throw e;
                 }
 
-                System.out.println("\t\tSpace used (live): " + format((Long) probe.getColumnFamilyMetric(keyspaceName, tableName, "LiveDiskSpaceUsed"), humanReadable));
-                System.out.println("\t\tSpace used (total): " + format((Long) probe.getColumnFamilyMetric(keyspaceName, tableName, "TotalDiskSpaceUsed"), humanReadable));
-                System.out.println("\t\tSpace used by snapshots (total): " + format((Long) probe.getColumnFamilyMetric(keyspaceName, tableName, "SnapshotsSize"), humanReadable));
+                statsTable.spaceUsedLive = format((Long) probe.getColumnFamilyMetric(keyspaceName, tableName, "LiveDiskSpaceUsed"), humanReadable);
+                statsTable.spaceUsedTotal = format((Long) probe.getColumnFamilyMetric(keyspaceName, tableName, "TotalDiskSpaceUsed"), humanReadable);
+                statsTable.spaceUsedBySnapshotsTotal = format((Long) probe.getColumnFamilyMetric(keyspaceName, tableName, "SnapshotsSize"), humanReadable);
                 if (offHeapSize != null)
-                    System.out.println("\t\tOff heap memory used (total): " + format(offHeapSize, humanReadable));
-                System.out.println("\t\tSSTable Compression Ratio: " + probe.getColumnFamilyMetric(keyspaceName, tableName, "CompressionRatio"));
+                {
+                    statsTable.offHeapUsed = true;
+                    statsTable.offHeapMemoryUsedTotal = format(offHeapSize, humanReadable);
 
+                }
+                if (percentRepaired != null)
+                {
+                    statsTable.percentRepaired = Math.round(100 * percentRepaired) / 100.0;
+                }
+                statsTable.sstableCompressionRatio = probe.getColumnFamilyMetric(keyspaceName, tableName, "CompressionRatio");
                 Object estimatedPartitionCount = probe.getColumnFamilyMetric(keyspaceName, tableName, "EstimatedPartitionCount");
                 if (Long.valueOf(-1L).equals(estimatedPartitionCount))
                 {
                     estimatedPartitionCount = 0L;
                 }
-                System.out.println("\t\tNumber of keys (estimate): " + estimatedPartitionCount);
+                statsTable.numberOfKeysEstimate = estimatedPartitionCount;
 
-                System.out.println("\t\tMemtable cell count: " + probe.getColumnFamilyMetric(keyspaceName, tableName, "MemtableColumnsCount"));
-                System.out.println("\t\tMemtable data size: " + format((Long) probe.getColumnFamilyMetric(keyspaceName, tableName, "MemtableLiveDataSize"), humanReadable));
+                statsTable.memtableCellCount = probe.getColumnFamilyMetric(keyspaceName, tableName, "MemtableColumnsCount");
+                statsTable.memtableDataSize = format((Long) probe.getColumnFamilyMetric(keyspaceName, tableName, "MemtableLiveDataSize"), humanReadable);
                 if (memtableOffHeapSize != null)
-                    System.out.println("\t\tMemtable off heap memory used: " + format(memtableOffHeapSize, humanReadable));
-                System.out.println("\t\tMemtable switch count: " + probe.getColumnFamilyMetric(keyspaceName, tableName, "MemtableSwitchCount"));
-                System.out.println("\t\tLocal read count: " + ((CassandraMetricsRegistry.JmxTimerMBean) probe.getColumnFamilyMetric(keyspaceName, tableName, "ReadLatency")).getCount());
+                {
+                    statsTable.memtableOffHeapUsed = true;
+                    statsTable.memtableOffHeapMemoryUsed = format(memtableOffHeapSize, humanReadable);
+                }
+                statsTable.memtableSwitchCount = probe.getColumnFamilyMetric(keyspaceName, tableName, "MemtableSwitchCount");
+                statsTable.localReadCount = ((CassandraMetricsRegistry.JmxTimerMBean) probe.getColumnFamilyMetric(keyspaceName, tableName, "ReadLatency")).getCount();
+
                 double localReadLatency = ((CassandraMetricsRegistry.JmxTimerMBean) probe.getColumnFamilyMetric(keyspaceName, tableName, "ReadLatency")).getMean() / 1000;
                 double localRLatency = localReadLatency > 0 ? localReadLatency : Double.NaN;
-                System.out.printf("\t\tLocal read latency: %01.3f ms%n", localRLatency);
-                System.out.println("\t\tLocal write count: " + ((CassandraMetricsRegistry.JmxTimerMBean) probe.getColumnFamilyMetric(keyspaceName, tableName, "WriteLatency")).getCount());
+                statsTable.localReadLatencyMs = localRLatency;
+                statsTable.localWriteCount = ((CassandraMetricsRegistry.JmxTimerMBean) probe.getColumnFamilyMetric(keyspaceName, tableName, "WriteLatency")).getCount();
+
                 double localWriteLatency = ((CassandraMetricsRegistry.JmxTimerMBean) probe.getColumnFamilyMetric(keyspaceName, tableName, "WriteLatency")).getMean() / 1000;
                 double localWLatency = localWriteLatency > 0 ? localWriteLatency : Double.NaN;
-                System.out.printf("\t\tLocal write latency: %01.3f ms%n", localWLatency);
-                System.out.println("\t\tPending flushes: " + probe.getColumnFamilyMetric(keyspaceName, tableName, "PendingFlushes"));
-                System.out.println("\t\tBloom filter false positives: " + probe.getColumnFamilyMetric(keyspaceName, tableName, "BloomFilterFalsePositives"));
-                System.out.printf("\t\tBloom filter false ratio: %s%n", String.format("%01.5f", probe.getColumnFamilyMetric(keyspaceName, tableName, "RecentBloomFilterFalseRatio")));
-                System.out.println("\t\tBloom filter space used: " + format((Long) probe.getColumnFamilyMetric(keyspaceName, tableName, "BloomFilterDiskSpaceUsed"), humanReadable));
+                statsTable.localWriteLatencyMs = localWLatency;
+                statsTable.pendingFlushes = probe.getColumnFamilyMetric(keyspaceName, tableName, "PendingFlushes");
+
+                statsTable.bloomFilterFalsePositives = probe.getColumnFamilyMetric(keyspaceName, tableName, "BloomFilterFalsePositives");
+                statsTable.bloomFilterFalseRatio = probe.getColumnFamilyMetric(keyspaceName, tableName, "RecentBloomFilterFalseRatio");
+                statsTable.bloomFilterSpaceUsed = format((Long) probe.getColumnFamilyMetric(keyspaceName, tableName, "BloomFilterDiskSpaceUsed"), humanReadable);
+
                 if (bloomFilterOffHeapSize != null)
-                    System.out.println("\t\tBloom filter off heap memory used: " + format(bloomFilterOffHeapSize, humanReadable));
+                {
+                    statsTable.bloomFilterOffHeapUsed = true;
+                    statsTable.bloomFilterOffHeapMemoryUsed = format(bloomFilterOffHeapSize, humanReadable);
+                }
+
                 if (indexSummaryOffHeapSize != null)
-                    System.out.println("\t\tIndex summary off heap memory used: " + format(indexSummaryOffHeapSize, humanReadable));
+                {
+                    statsTable.indexSummaryOffHeapUsed = true;
+                    statsTable.indexSummaryOffHeapMemoryUsed = format(indexSummaryOffHeapSize, humanReadable);
+                }
                 if (compressionMetadataOffHeapSize != null)
-                    System.out.println("\t\tCompression metadata off heap memory used: " + format(compressionMetadataOffHeapSize, humanReadable));
+                {
+                    statsTable.compressionMetadataOffHeapUsed = true;
+                    statsTable.compressionMetadataOffHeapMemoryUsed = format(compressionMetadataOffHeapSize, humanReadable);
+                }
+                statsTable.compactedPartitionMinimumBytes = (Long) probe.getColumnFamilyMetric(keyspaceName, tableName, "MinPartitionSize");
+                statsTable.compactedPartitionMaximumBytes = (Long) probe.getColumnFamilyMetric(keyspaceName, tableName, "MaxPartitionSize");
+                statsTable.compactedPartitionMeanBytes = (Long) probe.getColumnFamilyMetric(keyspaceName, tableName, "MeanPartitionSize");
 
-                System.out.println("\t\tCompacted partition minimum bytes: " + format((Long) probe.getColumnFamilyMetric(keyspaceName, tableName, "MinPartitionSize"), humanReadable));
-                System.out.println("\t\tCompacted partition maximum bytes: " + format((Long) probe.getColumnFamilyMetric(keyspaceName, tableName, "MaxPartitionSize"), humanReadable));
-                System.out.println("\t\tCompacted partition mean bytes: " + format((Long) probe.getColumnFamilyMetric(keyspaceName, tableName, "MeanPartitionSize"), humanReadable));
                 CassandraMetricsRegistry.JmxHistogramMBean histogram = (CassandraMetricsRegistry.JmxHistogramMBean) probe.getColumnFamilyMetric(keyspaceName, tableName, "LiveScannedHistogram");
-                System.out.println("\t\tAverage live cells per slice (last five minutes): " + histogram.getMean());
-                System.out.println("\t\tMaximum live cells per slice (last five minutes): " + histogram.getMax());
-                histogram = (CassandraMetricsRegistry.JmxHistogramMBean) probe.getColumnFamilyMetric(keyspaceName, tableName, "TombstoneScannedHistogram");
-                System.out.println("\t\tAverage tombstones per slice (last five minutes): " + histogram.getMean());
-                System.out.println("\t\tMaximum tombstones per slice (last five minutes): " + histogram.getMax());
+                statsTable.averageLiveCellsPerSliceLastFiveMinutes = histogram.getMean();
+                statsTable.maximumLiveCellsPerSliceLastFiveMinutes = histogram.getMax();
 
-                System.out.println("");
+                histogram = (CassandraMetricsRegistry.JmxHistogramMBean) probe.getColumnFamilyMetric(keyspaceName, tableName, "TombstoneScannedHistogram");
+                statsTable.averageTombstonesPerSliceLastFiveMinutes = histogram.getMean();
+                statsTable.maximumTombstonesPerSliceLastFiveMinutes = histogram.getMax();
+                statsTable.droppedMutations = format((Long) probe.getColumnFamilyMetric(keyspaceName, tableName, "DroppedMutations"), humanReadable);
+                statsKeyspace.tables.add(statsTable);
             }
-            System.out.println("----------------");
+            holder.keyspaces.add(statsKeyspace);
         }
+        // print out the keyspace and table statistics
+        TableStatsPrinter printer = TableStatsPrinter.from(outputFormat);
+        printer.print(holder, System.out);
     }
 
-    private String format(long bytes, boolean humanReadable) {
+    private String format(long bytes, boolean humanReadable)
+    {
         return humanReadable ? FileUtils.stringifyFileSize(bytes) : Long.toString(bytes);
     }
 
@@ -242,12 +241,13 @@
      */
     private static class OptionFilter
     {
-        private Map<String, List<String>> filter = new HashMap<>();
-        private Map<String, List<String>> verifier = new HashMap<>();
-        private List<String> filterList = new ArrayList<>();
-        private boolean ignoreMode;
+        private final Map<String, List<String>> filter = new HashMap<>();
+        private final Map<String, List<String>> verifier = new HashMap<>(); // Same as filter initially, but we remove tables every time we've checked them for inclusion
+                                                                            // in isTableIncluded() so that we detect if the requested tables don't exist (verifyTables())
+        private final List<String> filterList = new ArrayList<>();
+        private final boolean ignoreMode;
 
-        public OptionFilter(boolean ignoreMode, List<String> filterList)
+        OptionFilter(boolean ignoreMode, List<String> filterList)
         {
             this.filterList.addAll(filterList);
             this.ignoreMode = ignoreMode;
@@ -259,26 +259,19 @@
                 // build the map that stores the keyspaces and tables to use
                 if (!filter.containsKey(keyValues[0]))
                 {
-                    filter.put(keyValues[0], new ArrayList<String>());
-                    verifier.put(keyValues[0], new ArrayList<String>());
+                    filter.put(keyValues[0], new ArrayList<>());
+                    verifier.put(keyValues[0], new ArrayList<>());
+                }
 
-                    if (keyValues.length == 2)
-                    {
-                        filter.get(keyValues[0]).add(keyValues[1]);
-                        verifier.get(keyValues[0]).add(keyValues[1]);
-                    }
-                } else
+                if (keyValues.length == 2)
                 {
-                    if (keyValues.length == 2)
-                    {
-                        filter.get(keyValues[0]).add(keyValues[1]);
-                        verifier.get(keyValues[0]).add(keyValues[1]);
-                    }
+                    filter.get(keyValues[0]).add(keyValues[1]);
+                    verifier.get(keyValues[0]).add(keyValues[1]);
                 }
             }
         }
 
-        public boolean isColumnFamilyIncluded(String keyspace, String columnFamily)
+        public boolean isTableIncluded(String keyspace, String table)
         {
             // supplying empty params list is treated as wanting to display all keyspaces and tables
             if (filterList.isEmpty())
@@ -291,12 +284,24 @@
                 return ignoreMode;
                 // only a keyspace with no tables was supplied
                 // so ignore or include (based on the flag) every column family in specified keyspace
-            else if (tables.size() == 0)
+            else if (tables.isEmpty())
                 return !ignoreMode;
 
             // keyspace exists, and it contains specific table
-            verifier.get(keyspace).remove(columnFamily);
-            return ignoreMode ^ tables.contains(columnFamily);
+            verifier.get(keyspace).remove(table);
+            return ignoreMode ^ tables.contains(table);
+        }
+
+        public boolean isKeyspaceIncluded(String keyspace)
+        {
+            // supplying empty params list is treated as wanting to display all keyspaces and tables
+            if (filterList.isEmpty())
+                return !ignoreMode;
+
+            // Note that if any table is listed for the keyspace, we want to include the keyspace regardless
+            // of the ignoreMode, since the ignoreMode then applies to the tables inside the keyspace while the
+            // keyspace itself is not ignored
+            return filter.get(keyspace) != null || ignoreMode;
         }
 
         public void verifyKeyspaces(List<String> keyspaces)
@@ -306,10 +311,10 @@
                     throw new IllegalArgumentException("Unknown keyspace: " + ks);
         }
 
-        public void verifyColumnFamilies()
+        public void verifyTables()
         {
             for (String ks : filter.keySet())
-                if (verifier.get(ks).size() > 0)
+                if (!verifier.get(ks).isEmpty())
                     throw new IllegalArgumentException("Unknown tables: " + verifier.get(ks) + " in keyspace: " + ks);
         }
     }
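
For reference, the behaviour of the rewritten filter can be sketched as follows (illustrative only: OptionFilter is a private helper of the table stats command, and the keyspace/table names here are made up):

    // arguments of the form "ks1.t1 ks2", ignoreMode = false
    OptionFilter filter = new OptionFilter(false, Arrays.asList("ks1.t1", "ks2"));
    filter.isKeyspaceIncluded("ks1");     // true  - a table of ks1 was requested
    filter.isTableIncluded("ks1", "t1");  // true  - explicitly requested
    filter.isTableIncluded("ks1", "t2");  // false - ks1 was narrowed to t1
    filter.isTableIncluded("ks2", "x");   // true  - the whole keyspace was requested
    filter.verifyTables();                // throws only if a requested table was never matched
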
diff --git a/src/java/org/apache/cassandra/tools/nodetool/ViewBuildStatus.java b/src/java/org/apache/cassandra/tools/nodetool/ViewBuildStatus.java
new file mode 100644
index 0000000..0696396
--- /dev/null
+++ b/src/java/org/apache/cassandra/tools/nodetool/ViewBuildStatus.java
@@ -0,0 +1,84 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.tools.nodetool;
+
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+import io.airlift.command.Arguments;
+import io.airlift.command.Command;
+import org.apache.cassandra.tools.NodeProbe;
+import org.apache.cassandra.tools.NodeTool;
+import org.apache.cassandra.tools.nodetool.formatter.TableBuilder;
+
+import static com.google.common.base.Preconditions.checkArgument;
+
+@Command(name = "viewbuildstatus", description = "Show progress of a materialized view build")
+public class ViewBuildStatus extends NodeTool.NodeToolCmd
+{
+    private final static String SUCCESS = "SUCCESS";
+
+    @Arguments(usage = "<keyspace> <view> | <keyspace.view>", description = "The keyspace and view name")
+    private List<String> args = new ArrayList<>();
+
+    protected void execute(NodeProbe probe)
+    {
+        String keyspace = null, view = null;
+        if (args.size() == 2)
+        {
+            keyspace = args.get(0);
+            view = args.get(1);
+        }
+        else if (args.size() == 1)
+        {
+            String[] input = args.get(0).split("\\.");
+            checkArgument(input.length == 2, "viewbuildstatus requires keyspace and view name arguments");
+            keyspace = input[0];
+            view = input[1];
+        }
+        else
+        {
+            checkArgument(false, "viewbuildstatus requires keyspace and view name arguments");
+        }
+
+        Map<String, String> buildStatus = probe.getViewBuildStatuses(keyspace, view);
+        boolean failed = false;
+        TableBuilder builder = new TableBuilder();
+
+        builder.add("Host", "Info");
+        for (Map.Entry<String, String> status : buildStatus.entrySet())
+        {
+            if (!status.getValue().equals(SUCCESS))
+                failed = true;
+            builder.add(status.getKey(), status.getValue());
+        }
+
+        if (failed)
+        {
+            System.out.println(String.format("%s.%s has not finished building; node status is below.", keyspace, view));
+            System.out.println();
+            builder.printTo(System.out);
+            System.exit(1);
+        }
+        else
+        {
+            System.out.println(String.format("%s.%s has finished building", keyspace, view));
+            System.exit(0);
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/tools/nodetool/formatter/TableBuilder.java b/src/java/org/apache/cassandra/tools/nodetool/formatter/TableBuilder.java
new file mode 100644
index 0000000..a56e52e
--- /dev/null
+++ b/src/java/org/apache/cassandra/tools/nodetool/formatter/TableBuilder.java
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.tools.nodetool.formatter;
+
+import java.io.PrintStream;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Objects;
+import javax.annotation.Nonnull;
+
+/**
+ * Build and print table.
+ *
+ * usage:
+ * <pre>
+ * {@code
+ * TableBuilder table = new TableBuilder();
+ * for (String[] row : data)
+ * {
+ *     table.add(row);
+ * }
+ * table.printTo(System.out);
+ * }
+ * </pre>
+ */
+public class TableBuilder
+{
+    // column delimiter char
+    private final char columnDelimiter;
+
+    private int[] maximumColumnWidth;
+    private final List<String[]> rows = new ArrayList<>();
+
+    public TableBuilder()
+    {
+        this(' ');
+    }
+
+    public TableBuilder(char columnDelimiter)
+    {
+        this.columnDelimiter = columnDelimiter;
+    }
+
+    public void add(@Nonnull String... row)
+    {
+        Objects.requireNonNull(row);
+
+        if (rows.isEmpty())
+        {
+            maximumColumnWidth = new int[row.length];
+        }
+
+        // expand max column widths if given row has more columns
+        if (row.length > maximumColumnWidth.length)
+        {
+            int[] tmp = new int[row.length];
+            System.arraycopy(maximumColumnWidth, 0, tmp, 0, maximumColumnWidth.length);
+            maximumColumnWidth = tmp;
+        }
+        // calculate maximum column width
+        int i = 0;
+        for (String col : row)
+        {
+            maximumColumnWidth[i] = Math.max(maximumColumnWidth[i], col != null ? col.length() : 1);
+            i++;
+        }
+        rows.add(row);
+    }
+
+    public void printTo(PrintStream out)
+    {
+        if (rows.isEmpty())
+            return;
+
+        for (String[] row : rows)
+        {
+            for (int i = 0; i < maximumColumnWidth.length; i++)
+            {
+                String col = i < row.length ? row[i] : "";
+                out.print(String.format("%-" + maximumColumnWidth[i] + 's', col != null ? col : ""));
+                if (i < maximumColumnWidth.length - 1)
+                    out.print(columnDelimiter);
+            }
+            out.println();
+        }
+    }
+}
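
One detail worth calling out: add() grows the tracked column widths when a later row has more columns than earlier ones, and printTo() pads every cell to the widest value seen in its column. A small sketch (the rows are made up):

    TableBuilder table = new TableBuilder();     // default delimiter is a single space
    table.add("Host", "Info");
    table.add("127.0.0.1", "SUCCESS", "extra");  // a third column appears later; widths expand
    table.printTo(System.out);
    // Host      Info
    // 127.0.0.1 SUCCESS extra
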
diff --git a/src/java/org/apache/cassandra/tools/nodetool/stats/StatsHolder.java b/src/java/org/apache/cassandra/tools/nodetool/stats/StatsHolder.java
new file mode 100644
index 0000000..e26f3f7
--- /dev/null
+++ b/src/java/org/apache/cassandra/tools/nodetool/stats/StatsHolder.java
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.tools.nodetool.stats;
+
+import java.util.ArrayList;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+public class StatsHolder
+{
+    public List<StatsKeyspace> keyspaces;
+
+    public StatsHolder()
+    {
+        keyspaces = new ArrayList<>();
+    }
+
+    public Map<String, HashMap<String, Object>> convert2Map()
+    {
+        HashMap<String, HashMap<String, Object>> mpRet = new HashMap<>();
+        for (StatsKeyspace keyspace : keyspaces)
+        {
+            // store each keyspace's metrics to map
+            HashMap<String, Object> mpKeyspace = new HashMap<>();
+            mpKeyspace.put("read_latency", keyspace.readLatency());
+            mpKeyspace.put("read_count", keyspace.readCount);
+            mpKeyspace.put("read_latency_ms", keyspace.readLatency());
+            mpKeyspace.put("write_count", keyspace.writeCount);
+            mpKeyspace.put("write_latency_ms", keyspace.writeLatency());
+            mpKeyspace.put("pending_flushes", keyspace.pendingFlushes);
+
+            // store each table's metrics to map
+            List<StatsTable> tables = keyspace.tables;
+            Map<String, Map<String, Object>> mpTables = new HashMap<>();
+            for (StatsTable table : tables)
+            {
+                Map<String, Object> mpTable = new HashMap<>();
+
+                mpTable.put("sstables_in_each_level", table.sstablesInEachLevel);
+                mpTable.put("space_used_live", table.spaceUsedLive);
+                mpTable.put("space_used_total", table.spaceUsedTotal);
+                mpTable.put("space_used_by_snapshots_total", table.spaceUsedBySnapshotsTotal);
+                if (table.offHeapUsed)
+                    mpTable.put("off_heap_memory_used_total", table.offHeapMemoryUsedTotal);
+                mpTable.put("sstable_compression_ratio", table.sstableCompressionRatio);
+                mpTable.put("number_of_keys_estimate", table.numberOfKeysEstimate);
+                mpTable.put("memtable_cell_count", table.memtableCellCount);
+                mpTable.put("memtable_data_size", table.memtableDataSize);
+                if (table.memtableOffHeapUsed)
+                    mpTable.put("memtable_off_heap_memory_used", table.memtableOffHeapMemoryUsed);
+                mpTable.put("memtable_switch_count", table.memtableSwitchCount);
+                mpTable.put("local_read_count", table.localReadCount);
+                mpTable.put("local_read_latency_ms", String.format("%01.3f", table.localReadLatencyMs));
+                mpTable.put("local_write_count", table.localWriteCount);
+                mpTable.put("local_write_latency_ms", String.format("%01.3f", table.localWriteLatencyMs));
+                mpTable.put("pending_flushes", table.pendingFlushes);
+                mpTable.put("percent_repaired", table.percentRepaired);
+                mpTable.put("bloom_filter_false_positives", table.bloomFilterFalsePositives);
+                mpTable.put("bloom_filter_false_ratio", String.format("%01.5f", table.bloomFilterFalseRatio));
+                mpTable.put("bloom_filter_space_used", table.bloomFilterSpaceUsed);
+                if (table.bloomFilterOffHeapUsed)
+                    mpTable.put("bloom_filter_off_heap_memory_used", table.bloomFilterOffHeapMemoryUsed);
+                if (table.indexSummaryOffHeapUsed)
+                    mpTable.put("index_summary_off_heap_memory_used", table.indexSummaryOffHeapMemoryUsed);
+                if (table.compressionMetadataOffHeapUsed)
+                    mpTable.put("compression_metadata_off_heap_memory_used",
+                                table.compressionMetadataOffHeapMemoryUsed);
+                mpTable.put("compacted_partition_minimum_bytes", table.compactedPartitionMinimumBytes);
+                mpTable.put("compacted_partition_maximum_bytes", table.compactedPartitionMaximumBytes);
+                mpTable.put("compacted_partition_mean_bytes", table.compactedPartitionMeanBytes);
+                mpTable.put("average_live_cells_per_slice_last_five_minutes",
+                            table.averageLiveCellsPerSliceLastFiveMinutes);
+                mpTable.put("maximum_live_cells_per_slice_last_five_minutes",
+                            table.maximumLiveCellsPerSliceLastFiveMinutes);
+                mpTable.put("average_tombstones_per_slice_last_five_minutes",
+                            table.averageTombstonesPerSliceLastFiveMinutes);
+                mpTable.put("maximum_tombstones_per_slice_last_five_minutes",
+                            table.maximumTombstonesPerSliceLastFiveMinutes);
+                mpTable.put("dropped_mutations", table.droppedMutations);
+
+                mpTables.put(table.name, mpTable);
+            }
+            mpKeyspace.put("tables", mpTables);
+            mpRet.put(keyspace.name, mpKeyspace);
+        }
+        return mpRet;
+    }
+}
diff --git a/src/java/org/apache/cassandra/tools/nodetool/stats/StatsKeyspace.java b/src/java/org/apache/cassandra/tools/nodetool/stats/StatsKeyspace.java
new file mode 100644
index 0000000..dc15332
--- /dev/null
+++ b/src/java/org/apache/cassandra/tools/nodetool/stats/StatsKeyspace.java
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.tools.nodetool.stats;
+
+import java.util.ArrayList;
+import java.util.List;
+
+import org.apache.cassandra.db.ColumnFamilyStoreMBean;
+import org.apache.cassandra.metrics.CassandraMetricsRegistry;
+import org.apache.cassandra.tools.NodeProbe;
+
+public class StatsKeyspace
+{
+    public List<StatsTable> tables = new ArrayList<>();
+    private final NodeProbe probe;
+
+    public String name;
+    public long readCount;
+    public long writeCount;
+    public int pendingFlushes;
+    private double totalReadTime;
+    private double totalWriteTime;
+
+    public StatsKeyspace(NodeProbe probe, String keyspaceName)
+    {
+        this.probe = probe;
+        this.name = keyspaceName;
+    }
+
+    public void add(ColumnFamilyStoreMBean table)
+    {
+        String tableName = table.getTableName();
+        long tableWriteCount = ((CassandraMetricsRegistry.JmxTimerMBean) probe.getColumnFamilyMetric(name, tableName, "WriteLatency")).getCount();
+        long tableReadCount = ((CassandraMetricsRegistry.JmxTimerMBean) probe.getColumnFamilyMetric(name, tableName, "ReadLatency")).getCount();
+
+        if (tableReadCount > 0)
+        {
+            readCount += tableReadCount;
+            totalReadTime += (long) probe.getColumnFamilyMetric(name, tableName, "ReadTotalLatency");
+        }
+        if (tableWriteCount > 0)
+        {
+            writeCount += tableWriteCount;
+            totalWriteTime += (long) probe.getColumnFamilyMetric(name, tableName, "WriteTotalLatency");
+        }
+        pendingFlushes += (long) probe.getColumnFamilyMetric(name, tableName, "PendingFlushes");
+    }
+
+    public double readLatency()
+    {
+        return readCount > 0
+               ? totalReadTime / readCount / 1000
+               : Double.NaN;
+    }
+
+    public double writeLatency()
+    {
+        return writeCount > 0
+               ? totalWriteTime / writeCount / 1000
+               : Double.NaN;
+    }
+}
\ No newline at end of file
diff --git a/src/java/org/apache/cassandra/tools/nodetool/stats/StatsPrinter.java b/src/java/org/apache/cassandra/tools/nodetool/stats/StatsPrinter.java
new file mode 100644
index 0000000..2d98781
--- /dev/null
+++ b/src/java/org/apache/cassandra/tools/nodetool/stats/StatsPrinter.java
@@ -0,0 +1,25 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.tools.nodetool.stats;
+
+import java.io.PrintStream;
+
+public interface StatsPrinter<T>
+{
+    void printFormat(T data, PrintStream out);
+}
diff --git a/src/java/org/apache/cassandra/tools/nodetool/stats/StatsTable.java b/src/java/org/apache/cassandra/tools/nodetool/stats/StatsTable.java
new file mode 100644
index 0000000..71f35e9
--- /dev/null
+++ b/src/java/org/apache/cassandra/tools/nodetool/stats/StatsTable.java
@@ -0,0 +1,66 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.tools.nodetool.stats;
+
+import java.util.ArrayList;
+import java.util.List;
+
+public class StatsTable
+{
+    public String name;
+    public boolean isIndex;
+    public boolean isLeveledSstable = false;
+    public Object sstableCount;
+    public String spaceUsedLive;
+    public String spaceUsedTotal;
+    public String spaceUsedBySnapshotsTotal;
+    public boolean offHeapUsed = false;
+    public String offHeapMemoryUsedTotal;
+    public Object sstableCompressionRatio;
+    public Object numberOfKeysEstimate;
+    public Object memtableCellCount;
+    public String memtableDataSize;
+    public boolean memtableOffHeapUsed = false;
+    public String memtableOffHeapMemoryUsed;
+    public Object memtableSwitchCount;
+    public long localReadCount;
+    public double localReadLatencyMs;
+    public long localWriteCount;
+    public double localWriteLatencyMs;
+    public Object pendingFlushes;
+    public Object bloomFilterFalsePositives;
+    public Object bloomFilterFalseRatio;
+    public String bloomFilterSpaceUsed;
+    public boolean bloomFilterOffHeapUsed = false;
+    public String bloomFilterOffHeapMemoryUsed;
+    public boolean indexSummaryOffHeapUsed = false;
+    public String indexSummaryOffHeapMemoryUsed;
+    public boolean compressionMetadataOffHeapUsed = false;
+    public String compressionMetadataOffHeapMemoryUsed;
+    public long compactedPartitionMinimumBytes;
+    public long compactedPartitionMaximumBytes;
+    public long compactedPartitionMeanBytes;
+    public double percentRepaired;
+    public double averageLiveCellsPerSliceLastFiveMinutes;
+    public long maximumLiveCellsPerSliceLastFiveMinutes;
+    public double averageTombstonesPerSliceLastFiveMinutes;
+    public long maximumTombstonesPerSliceLastFiveMinutes;
+    public String droppedMutations;
+    public List<String> sstablesInEachLevel = new ArrayList<>();
+}
diff --git a/src/java/org/apache/cassandra/tools/nodetool/stats/TableStatsPrinter.java b/src/java/org/apache/cassandra/tools/nodetool/stats/TableStatsPrinter.java
new file mode 100644
index 0000000..a6da189
--- /dev/null
+++ b/src/java/org/apache/cassandra/tools/nodetool/stats/TableStatsPrinter.java
@@ -0,0 +1,150 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.tools.nodetool.stats;
+
+import java.io.PrintStream;
+import java.util.List;
+
+import org.json.simple.JSONObject;
+import org.yaml.snakeyaml.Yaml;
+
+public enum TableStatsPrinter
+{
+    DEFAULT(new DefaultPrinter()),
+    JSON(new JsonPrinter()),
+    YAML(new YamlPrinter());
+
+    private final StatsPrinter<StatsHolder> printer;
+
+    TableStatsPrinter(StatsPrinter<StatsHolder> printer)
+    {
+        this.printer = printer;
+    }
+
+    public void print(StatsHolder stats, PrintStream out)
+    {
+        printer.printFormat(stats, out);
+    }
+
+    public static TableStatsPrinter from(String format)
+    {
+        switch (format)
+        {
+            case "json":
+                return JSON;
+            case "yaml":
+                return YAML;
+            default:
+                return DEFAULT;
+        }
+    }
+
+    private static class DefaultPrinter implements StatsPrinter<StatsHolder>
+    {
+        @Override
+        public void printFormat(StatsHolder data, PrintStream out)
+        {
+            List<StatsKeyspace> keyspaces = data.keyspaces;
+            for (StatsKeyspace keyspace : keyspaces)
+            {
+                // print each keyspace's information
+                out.println("Keyspace : " + keyspace.name);
+                out.println("\tRead Count: " + keyspace.readCount);
+                out.println("\tRead Latency: " + keyspace.readLatency() + " ms.");
+                out.println("\tWrite Count: " + keyspace.writeCount);
+                out.println("\tWrite Latency: " + keyspace.writeLatency() + " ms.");
+                out.println("\tPending Flushes: " + keyspace.pendingFlushes);
+
+                // print each table's information
+                List<StatsTable> tables = keyspace.tables;
+                for (StatsTable table : tables)
+                {
+                    out.println("\t\tTable" + (table.isIndex ? " (index): " : ": ") + table.name);
+                    if (table.isLeveledSstable)
+                        out.println("\t\tSSTables in each level: [" + String.join(", ",
+                                                                                  table.sstablesInEachLevel) + "]");
+
+                    out.println("\t\tSpace used (live): " + table.spaceUsedLive);
+                    out.println("\t\tSpace used (total): " + table.spaceUsedTotal);
+                    out.println("\t\tSpace used by snapshots (total): " + table.spaceUsedBySnapshotsTotal);
+
+                    if (table.offHeapUsed)
+                        out.println("\t\tOff heap memory used (total): " + table.offHeapMemoryUsedTotal);
+                    out.println("\t\tSSTable Compression Ratio: " + table.sstableCompressionRatio);
+                    out.println("\t\tNumber of keys (estimate): " + table.numberOfKeysEstimate);
+                    out.println("\t\tMemtable cell count: " + table.memtableCellCount);
+                    out.println("\t\tMemtable data size: " + table.memtableDataSize);
+
+                    if (table.memtableOffHeapUsed)
+                        out.println("\t\tMemtable off heap memory used: " + table.memtableOffHeapMemoryUsed);
+                    out.println("\t\tMemtable switch count: " + table.memtableSwitchCount);
+                    out.println("\t\tLocal read count: " + table.localReadCount);
+                    out.printf("\t\tLocal read latency: %01.3f ms%n", table.localReadLatencyMs);
+                    out.println("\t\tLocal write count: " + table.localWriteCount);
+                    out.printf("\t\tLocal write latency: %01.3f ms%n", table.localWriteLatencyMs);
+                    out.println("\t\tPending flushes: " + table.pendingFlushes);
+                    out.println("\t\tPercent repaired: " + table.percentRepaired);
+
+                    out.println("\t\tBloom filter false positives: " + table.bloomFilterFalsePositives);
+                    out.printf("\t\tBloom filter false ratio: %01.5f%n", table.bloomFilterFalseRatio);
+                    out.println("\t\tBloom filter space used: " + table.bloomFilterSpaceUsed);
+
+                    if (table.bloomFilterOffHeapUsed)
+                        out.println("\t\tBloom filter off heap memory used: " + table.bloomFilterOffHeapMemoryUsed);
+                    if (table.indexSummaryOffHeapUsed)
+                        out.println("\t\tIndex summary off heap memory used: " + table.indexSummaryOffHeapMemoryUsed);
+                    if (table.compressionMetadataOffHeapUsed)
+                        out.println("\t\tCompression metadata off heap memory used: " + table.compressionMetadataOffHeapMemoryUsed);
+
+                    out.println("\t\tCompacted partition minimum bytes: " + table.compactedPartitionMinimumBytes);
+                    out.println("\t\tCompacted partition maximum bytes: " + table.compactedPartitionMaximumBytes);
+                    out.println("\t\tCompacted partition mean bytes: " + table.compactedPartitionMeanBytes);
+                    out.println("\t\tAverage live cells per slice (last five minutes): " + table.averageLiveCellsPerSliceLastFiveMinutes);
+                    out.println("\t\tMaximum live cells per slice (last five minutes): " + table.maximumLiveCellsPerSliceLastFiveMinutes);
+                    out.println("\t\tAverage tombstones per slice (last five minutes): " + table.averageTombstonesPerSliceLastFiveMinutes);
+                    out.println("\t\tMaximum tombstones per slice (last five minutes): " + table.maximumTombstonesPerSliceLastFiveMinutes);
+                    out.println("\t\tDropped Mutations: " + table.droppedMutations);
+                    out.println("");
+                }
+                out.println("----------------");
+            }
+        }
+    }
+
+    private static class JsonPrinter implements StatsPrinter<StatsHolder>
+    {
+        @Override
+        public void printFormat(StatsHolder data, PrintStream out)
+        {
+            JSONObject json = new JSONObject();
+            json.putAll(data.convert2Map());
+            out.println(json.toString());
+        }
+    }
+
+    private static class YamlPrinter implements StatsPrinter<StatsHolder>
+    {
+        @Override
+        public void printFormat(StatsHolder data, PrintStream out)
+        {
+            Yaml yaml = new Yaml();
+            out.println(yaml.dump(data.convert2Map()));
+        }
+    }
+
+}
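
With this split, the command body only has to choose a format string and hand the populated StatsHolder to the matching printer; a sketch of the dispatch (the format values mirror the switch above):

    StatsHolder holder = new StatsHolder();
    // ... populate holder.keyspaces as TableStats does above ...
    TableStatsPrinter.from("json").print(holder, System.out);  // JSONObject built from convert2Map()
    TableStatsPrinter.from("yaml").print(holder, System.out);  // snakeyaml dump of the same map
    TableStatsPrinter.from("table").print(holder, System.out); // anything else falls back to DEFAULT
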
diff --git a/src/java/org/apache/cassandra/tracing/ExpiredTraceState.java b/src/java/org/apache/cassandra/tracing/ExpiredTraceState.java
index 5cc3c21..bc8d5dd 100644
--- a/src/java/org/apache/cassandra/tracing/ExpiredTraceState.java
+++ b/src/java/org/apache/cassandra/tracing/ExpiredTraceState.java
@@ -1,5 +1,5 @@
 /*
- * 
+ *
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
@@ -7,33 +7,44 @@
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
- * 
+ *
  *   http://www.apache.org/licenses/LICENSE-2.0
- * 
+ *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
- * 
+ *
  */
 
 package org.apache.cassandra.tracing;
 
-import java.util.UUID;
-
 import org.apache.cassandra.utils.FBUtilities;
 
-public class ExpiredTraceState extends TraceState
+class ExpiredTraceState extends TraceState
 {
-    public ExpiredTraceState(UUID sessionId, Tracing.TraceType traceType)
+    private final TraceState delegate;
+
+    ExpiredTraceState(TraceState delegate)
     {
-        super(FBUtilities.getBroadcastAddress(), sessionId, traceType);
+        super(FBUtilities.getBroadcastAddress(), delegate.sessionId, delegate.traceType);
+        this.delegate = delegate;
     }
 
     public int elapsed()
     {
         return -1;
     }
+
+    protected void traceImpl(String message)
+    {
+        delegate.traceImpl(message);
+    }
+
+    protected void waitForPendingEvents()
+    {
+        delegate.waitForPendingEvents();
+    }
 }
diff --git a/src/java/org/apache/cassandra/tracing/TraceState.java b/src/java/org/apache/cassandra/tracing/TraceState.java
index 03e510f..ec2bc9e 100644
--- a/src/java/org/apache/cassandra/tracing/TraceState.java
+++ b/src/java/org/apache/cassandra/tracing/TraceState.java
@@ -19,7 +19,6 @@
 
 import java.net.InetAddress;
 import java.nio.ByteBuffer;
-import java.util.Collections;
 import java.util.List;
 import java.util.UUID;
 import java.util.concurrent.CopyOnWriteArrayList;
@@ -27,19 +26,9 @@
 import java.util.concurrent.atomic.AtomicInteger;
 
 import com.google.common.base.Stopwatch;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
 import org.slf4j.helpers.MessageFormatter;
 
-import org.apache.cassandra.concurrent.Stage;
-import org.apache.cassandra.concurrent.StageManager;
-import org.apache.cassandra.db.ConsistencyLevel;
-import org.apache.cassandra.db.Mutation;
-import org.apache.cassandra.exceptions.OverloadedException;
-import org.apache.cassandra.service.StorageProxy;
 import org.apache.cassandra.utils.ByteBufferUtil;
-import org.apache.cassandra.utils.JVMStabilityInspector;
-import org.apache.cassandra.utils.WrappedRunnable;
 import org.apache.cassandra.utils.progress.ProgressEvent;
 import org.apache.cassandra.utils.progress.ProgressEventNotifier;
 import org.apache.cassandra.utils.progress.ProgressListener;
@@ -48,12 +37,8 @@
  * ThreadLocal state for a tracing session. The presence of an instance of this class as a ThreadLocal denotes that an
  * operation is being traced.
  */
-public class TraceState implements ProgressEventNotifier
+public abstract class TraceState implements ProgressEventNotifier
 {
-    private static final Logger logger = LoggerFactory.getLogger(TraceState.class);
-    private static final int WAIT_FOR_PENDING_EVENTS_TIMEOUT_SECS =
-    Integer.valueOf(System.getProperty("cassandra.wait_for_tracing_events_timeout_secs", "1"));
-
     public final UUID sessionId;
     public final InetAddress coordinator;
     public final Stopwatch watch;
@@ -78,7 +63,7 @@
     // See CASSANDRA-7626 for more details.
     private final AtomicInteger references = new AtomicInteger(1);
 
-    public TraceState(InetAddress coordinator, UUID sessionId, Tracing.TraceType traceType)
+    protected TraceState(InetAddress coordinator, UUID sessionId, Tracing.TraceType traceType)
     {
         assert coordinator != null;
         assert sessionId != null;
@@ -90,7 +75,7 @@
         this.ttl = traceType.getTTL();
         watch = Stopwatch.createStarted();
         this.status = Status.IDLE;
-}
+    }
 
     /**
      * Activate notification with provided {@code tag} name.
@@ -160,7 +145,7 @@
         return status;
     }
 
-    private synchronized void notifyActivity()
+    protected synchronized void notifyActivity()
     {
         status = Status.ACTIVE;
         notifyAll();
@@ -186,12 +171,7 @@
         if (notify)
             notifyActivity();
 
-        final String threadName = Thread.currentThread().getName();
-        final int elapsed = elapsed();
-
-        executeMutation(TraceKeyspace.makeEventMutation(sessionIdBytes, message, elapsed, threadName, ttl));
-        if (logger.isTraceEnabled())
-            logger.trace("Adding <{}> to trace events", message);
+        traceImpl(message);
 
         for (ProgressListener listener : listeners)
         {
@@ -199,72 +179,9 @@
         }
     }
 
-    static void executeMutation(final Mutation mutation)
-    {
-        StageManager.getStage(Stage.TRACING).execute(new WrappedRunnable()
-        {
-            protected void runMayThrow() throws Exception
-            {
-                mutateWithCatch(mutation);
-            }
-        });
-    }
+    protected abstract void traceImpl(String message);
 
-    /**
-     * Called from {@link org.apache.cassandra.net.OutboundTcpConnection} for non-local traces (traces
-     * that are not initiated by local node == coordinator).
-     */
-    public static void mutateWithTracing(final ByteBuffer sessionId, final String message, final int elapsed, final int ttl)
-    {
-        final String threadName = Thread.currentThread().getName();
-
-        StageManager.getStage(Stage.TRACING).execute(new WrappedRunnable()
-        {
-            public void runMayThrow()
-            {
-                mutateWithCatch(TraceKeyspace.makeEventMutation(sessionId, message, elapsed, threadName, ttl));
-            }
-        });
-    }
-
-    static void mutateWithCatch(Mutation mutation)
-    {
-        try
-        {
-            StorageProxy.mutate(Collections.singletonList(mutation), ConsistencyLevel.ANY);
-        }
-        catch (OverloadedException e)
-        {
-            Tracing.logger.warn("Too many nodes are overloaded to save trace events");
-        }
-    }
-
-    /**
-     * Post a no-op event to the TRACING stage, so that we can be sure that any previous mutations
-     * have at least been applied to one replica. This works because the tracking executor only
-     * has one thread in its pool, see {@link StageManager#tracingExecutor()}.
-     */
-    protected void waitForPendingEvents()
-    {
-        if (WAIT_FOR_PENDING_EVENTS_TIMEOUT_SECS <= 0)
-            return;
-
-        try
-        {
-            if (logger.isTraceEnabled())
-                logger.trace("Waiting for up to {} seconds for trace events to complete",
-                             +WAIT_FOR_PENDING_EVENTS_TIMEOUT_SECS);
-
-            StageManager.getStage(Stage.TRACING).submit(StageManager.NO_OP_TASK)
-                        .get(WAIT_FOR_PENDING_EVENTS_TIMEOUT_SECS, TimeUnit.SECONDS);
-        }
-        catch (Throwable t)
-        {
-            JVMStabilityInspector.inspectThrowable(t);
-            logger.debug("Failed to wait for tracing events to complete: {}", t);
-        }
-    }
-
+    protected abstract void waitForPendingEvents();
 
     public boolean acquireReference()
     {
diff --git a/src/java/org/apache/cassandra/tracing/TraceStateImpl.java b/src/java/org/apache/cassandra/tracing/TraceStateImpl.java
new file mode 100644
index 0000000..e2d3a68
--- /dev/null
+++ b/src/java/org/apache/cassandra/tracing/TraceStateImpl.java
@@ -0,0 +1,112 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.tracing;
+
+import java.net.InetAddress;
+import java.util.Collections;
+import java.util.UUID;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.cassandra.concurrent.Stage;
+import org.apache.cassandra.concurrent.StageManager;
+import org.apache.cassandra.db.ConsistencyLevel;
+import org.apache.cassandra.db.Mutation;
+import org.apache.cassandra.exceptions.OverloadedException;
+import org.apache.cassandra.service.StorageProxy;
+import org.apache.cassandra.utils.JVMStabilityInspector;
+import org.apache.cassandra.utils.WrappedRunnable;
+
+/**
+ * ThreadLocal state for a tracing session. The presence of an instance of this class as a ThreadLocal denotes that an
+ * operation is being traced.
+ */
+public class TraceStateImpl extends TraceState
+{
+    private static final Logger logger = LoggerFactory.getLogger(TraceStateImpl.class);
+    private static final int WAIT_FOR_PENDING_EVENTS_TIMEOUT_SECS =
+      Integer.valueOf(System.getProperty("cassandra.wait_for_tracing_events_timeout_secs", "1"));
+
+    public TraceStateImpl(InetAddress coordinator, UUID sessionId, Tracing.TraceType traceType)
+    {
+        super(coordinator, sessionId, traceType);
+    }
+
+    protected void traceImpl(String message)
+    {
+        final String threadName = Thread.currentThread().getName();
+        final int elapsed = elapsed();
+
+        executeMutation(TraceKeyspace.makeEventMutation(sessionIdBytes, message, elapsed, threadName, ttl));
+        if (logger.isTraceEnabled())
+            logger.trace("Adding <{}> to trace events", message);
+    }
+
+    /**
+     * Post a no-op event to the TRACING stage, so that we can be sure that any previous mutations
+     * have at least been applied to one replica. This works because the tracing executor only
+     * has one thread in its pool, see {@link StageManager#tracingExecutor()}.
+     */
+    protected void waitForPendingEvents()
+    {
+        if (WAIT_FOR_PENDING_EVENTS_TIMEOUT_SECS <= 0)
+            return;
+
+        try
+        {
+            if (logger.isTraceEnabled())
+                logger.trace("Waiting for up to {} seconds for trace events to complete",
+                             WAIT_FOR_PENDING_EVENTS_TIMEOUT_SECS);
+
+            StageManager.getStage(Stage.TRACING).submit(StageManager.NO_OP_TASK)
+                        .get(WAIT_FOR_PENDING_EVENTS_TIMEOUT_SECS, TimeUnit.SECONDS);
+        }
+        catch (Throwable t)
+        {
+            JVMStabilityInspector.inspectThrowable(t);
+            logger.debug("Failed to wait for tracing events to complete: {}", t);
+        }
+    }
+
+    static void executeMutation(final Mutation mutation)
+    {
+        StageManager.getStage(Stage.TRACING).execute(new WrappedRunnable()
+        {
+            protected void runMayThrow()
+            {
+                mutateWithCatch(mutation);
+            }
+        });
+    }
+
+    static void mutateWithCatch(Mutation mutation)
+    {
+        try
+        {
+            StorageProxy.mutate(Collections.singletonList(mutation), ConsistencyLevel.ANY);
+        }
+        catch (OverloadedException e)
+        {
+            Tracing.logger.warn("Too many nodes are overloaded to save trace events");
+        }
+    }
+
+}
diff --git a/src/java/org/apache/cassandra/tracing/Tracing.java b/src/java/org/apache/cassandra/tracing/Tracing.java
index bf9cee7..e69645f 100644
--- a/src/java/org/apache/cassandra/tracing/Tracing.java
+++ b/src/java/org/apache/cassandra/tracing/Tracing.java
@@ -21,11 +21,13 @@
 
 import java.net.InetAddress;
 import java.nio.ByteBuffer;
+import java.util.Collections;
 import java.util.Map;
 import java.util.UUID;
 import java.util.concurrent.ConcurrentHashMap;
 import java.util.concurrent.ConcurrentMap;
 
+import com.google.common.collect.ImmutableMap;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
@@ -35,14 +37,15 @@
 import org.apache.cassandra.net.MessageIn;
 import org.apache.cassandra.net.MessagingService;
 import org.apache.cassandra.utils.FBUtilities;
+import org.apache.cassandra.utils.JVMStabilityInspector;
 import org.apache.cassandra.utils.UUIDGen;
 
 
 /**
  * A trace session context. Able to track and store trace sessions. A session is usually a user initiated query, and may
- * have multiple local and remote events before it is completed. All events and sessions are stored at keyspace.
+ * have multiple local and remote events before it is completed.
  */
-public class Tracing implements ExecutorLocal<TraceState>
+public abstract class Tracing implements ExecutorLocal<TraceState>
 {
     public static final String TRACE_HEADER = "TraceSession";
     public static final String TRACE_TYPE = "TraceType";
@@ -77,15 +80,34 @@
         }
     }
 
-    static final Logger logger = LoggerFactory.getLogger(Tracing.class);
+    protected static final Logger logger = LoggerFactory.getLogger(Tracing.class);
 
     private final InetAddress localAddress = FBUtilities.getLocalAddress();
 
     private final ThreadLocal<TraceState> state = new ThreadLocal<>();
 
-    private final ConcurrentMap<UUID, TraceState> sessions = new ConcurrentHashMap<>();
+    protected final ConcurrentMap<UUID, TraceState> sessions = new ConcurrentHashMap<>();
 
-    public static final Tracing instance = new Tracing();
+    public static final Tracing instance;
+
+    static {
+        Tracing tracing = null;
+        String customTracingClass = System.getProperty("cassandra.custom_tracing_class");
+        if (null != customTracingClass)
+        {
+            try
+            {
+                tracing = FBUtilities.construct(customTracingClass, "Tracing");
+                logger.info("Using {} for tracing queries (as requested with -Dcassandra.custom_tracing_class)", customTracingClass);
+            }
+            catch (Exception e)
+            {
+                JVMStabilityInspector.inspectThrowable(e);
+                logger.error("Cannot use class {} for tracing ({}); ignoring it and defaulting to normal tracing", customTracingClass, e.getMessage());
+            }
+        }
+        instance = null != tracing ? tracing : new TracingImpl();
+    }
 
     public UUID getSessionId()
     {
@@ -110,30 +132,33 @@
      */
     public static boolean isTracing()
     {
-        return instance.state.get() != null;
+        return instance.get() != null;
     }
 
-    public UUID newSession()
+    public UUID newSession(Map<String,ByteBuffer> customPayload)
     {
         return newSession(TraceType.QUERY);
     }
 
     public UUID newSession(TraceType traceType)
     {
-        return newSession(TimeUUIDType.instance.compose(ByteBuffer.wrap(UUIDGen.getTimeUUIDBytes())), traceType);
+        return newSession(
+                TimeUUIDType.instance.compose(ByteBuffer.wrap(UUIDGen.getTimeUUIDBytes())),
+                traceType,
+                Collections.EMPTY_MAP);
     }
 
-    public UUID newSession(UUID sessionId)
+    public UUID newSession(UUID sessionId, Map<String,ByteBuffer> customPayload)
     {
-        return newSession(sessionId, TraceType.QUERY);
+        return newSession(sessionId, TraceType.QUERY, Collections.EMPTY_MAP);
     }
 
-    private UUID newSession(UUID sessionId, TraceType traceType)
+    protected UUID newSession(UUID sessionId, TraceType traceType, Map<String,ByteBuffer> customPayload)
     {
-        assert state.get() == null;
+        assert get() == null;
 
-        TraceState ts = new TraceState(localAddress, sessionId, traceType);
-        state.set(ts);
+        TraceState ts = newTraceState(localAddress, sessionId, traceType);
+        set(ts);
         sessions.put(sessionId, ts);
 
         return sessionId;
@@ -145,30 +170,29 @@
             sessions.remove(state.sessionId);
     }
 
     /**
      * Stop the session and record its completion.  Called by the coordinator when the request is complete.
      */
     public void stopSession()
     {
-        TraceState state = this.state.get();
+        TraceState state = get();
         if (state == null) // inline isTracing to avoid implicit two calls to state.get()
         {
             logger.trace("request complete");
         }
         else
         {
-            final int elapsed = state.elapsed();
-            final ByteBuffer sessionId = state.sessionIdBytes;
-            final int ttl = state.ttl;
-
-            TraceState.executeMutation(TraceKeyspace.makeStopSessionMutation(sessionId, elapsed, ttl));
+            stopSessionImpl();
 
             state.stop();
             sessions.remove(state.sessionId);
-            this.state.set(null);
+            set(null);
         }
     }
 
+    protected abstract void stopSessionImpl();
+
     public TraceState get()
     {
         return state.get();
@@ -189,24 +213,11 @@
         return begin(request, null, parameters);
     }
 
-    public TraceState begin(final String request, final InetAddress client, final Map<String, String> parameters)
-    {
-        assert isTracing();
-
-        final TraceState state = this.state.get();
-        final long startedAt = System.currentTimeMillis();
-        final ByteBuffer sessionId = state.sessionIdBytes;
-        final String command = state.traceType.toString();
-        final int ttl = state.ttl;
-
-        TraceState.executeMutation(TraceKeyspace.makeStartSessionMutation(sessionId, client, parameters, request, startedAt, command, ttl));
-
-        return state;
-    }
+    public abstract TraceState begin(String request, InetAddress client, Map<String, String> parameters);
 
     /**
      * Determines the tracing context from a message.  Does NOT set the threadlocal state.
-     * 
+     *
      * @param message The internode message
      */
     public TraceState initializeFromMessage(final MessageIn<?> message)
@@ -218,7 +229,7 @@
 
         assert sessionBytes.length == 16;
         UUID sessionId = UUIDGen.getUUID(ByteBuffer.wrap(sessionBytes));
-        TraceState ts = sessions.get(sessionId);
+        TraceState ts = get(sessionId);
         if (ts != null && ts.acquireReference())
             return ts;
 
@@ -230,16 +241,26 @@
         if (message.verb == MessagingService.Verb.REQUEST_RESPONSE)
         {
             // received a message for a session we've already closed out.  see CASSANDRA-5668
-            return new ExpiredTraceState(sessionId, traceType);
+            return new ExpiredTraceState(newTraceState(message.from, sessionId, traceType));
         }
         else
         {
-            ts = new TraceState(message.from, sessionId, traceType);
+            ts = newTraceState(message.from, sessionId, traceType);
             sessions.put(sessionId, ts);
             return ts;
         }
     }
 
+    public Map<String, byte[]> getTraceHeaders()
+    {
+        assert isTracing();
+
+        return ImmutableMap.of(
+                TRACE_HEADER, UUIDGen.decompose(Tracing.instance.getSessionId()),
+                TRACE_TYPE, new byte[] { Tracing.TraceType.serialize(Tracing.instance.getTraceType()) });
+    }
+
+    protected abstract TraceState newTraceState(InetAddress coordinator, UUID sessionId, Tracing.TraceType traceType);
 
     // repair just gets a varargs method since it's so heavyweight anyway
     public static void traceRepair(String format, Object... args)
@@ -287,4 +308,10 @@
 
         state.trace(format, args);
     }
+
+    /**
+     * Called from {@link org.apache.cassandra.net.OutboundTcpConnection} for non-local traces (traces
+     * that are not initiated by local node == coordinator).
+     */
+    public abstract void trace(ByteBuffer sessionId, String message, int ttl);
 }
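
The static initializer above makes the tracing backend pluggable through -Dcassandra.custom_tracing_class. A minimal sketch of such a plugin, assuming a made-up LoggingTracing class that just forwards events to a logger (it only needs a public no-arg constructor, the abstract methods introduced here, and a TraceState providing traceImpl()/waitForPendingEvents()):

    package org.apache.cassandra.tracing;

    import java.net.InetAddress;
    import java.nio.ByteBuffer;
    import java.util.Map;
    import java.util.UUID;

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    // Hypothetical plugin, selected with -Dcassandra.custom_tracing_class=org.apache.cassandra.tracing.LoggingTracing
    public class LoggingTracing extends Tracing
    {
        private static final Logger log = LoggerFactory.getLogger(LoggingTracing.class);

        protected void stopSessionImpl()
        {
            log.info("trace session {} stopped", get().sessionId);
        }

        public TraceState begin(String request, InetAddress client, Map<String, String> parameters)
        {
            log.info("begin tracing '{}' for {}", request, client);
            return get();
        }

        protected TraceState newTraceState(InetAddress coordinator, UUID sessionId, TraceType traceType)
        {
            return new TraceState(coordinator, sessionId, traceType)
            {
                protected void traceImpl(String message)
                {
                    log.info("[{}] {}", sessionId, message);
                }

                protected void waitForPendingEvents()
                {
                    // events are logged synchronously, so there is nothing to wait for
                }
            };
        }

        public void trace(ByteBuffer sessionId, String message, int ttl)
        {
            log.info("remote trace event: {}", message);
        }
    }
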
diff --git a/src/java/org/apache/cassandra/tracing/TracingImpl.java b/src/java/org/apache/cassandra/tracing/TracingImpl.java
new file mode 100644
index 0000000..52ac183
--- /dev/null
+++ b/src/java/org/apache/cassandra/tracing/TracingImpl.java
@@ -0,0 +1,89 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+package org.apache.cassandra.tracing;
+
+import java.net.InetAddress;
+import java.nio.ByteBuffer;
+import java.util.Map;
+import java.util.UUID;
+
+import org.apache.cassandra.concurrent.Stage;
+import org.apache.cassandra.concurrent.StageManager;
+import org.apache.cassandra.utils.WrappedRunnable;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+
+/**
+ * A trace session context. Able to track and store trace sessions. A session is usually a user initiated query, and may
+ * have multiple local and remote events before it is completed. All events and sessions are stored in the trace keyspace.
+ */
+class TracingImpl extends Tracing
+{
+    private static final Logger logger = LoggerFactory.getLogger(TracingImpl.class);
+
+    public void stopSessionImpl()
+    {
+        TraceState state = get();
+        int elapsed = state.elapsed();
+        ByteBuffer sessionId = state.sessionIdBytes;
+        int ttl = state.ttl;
+        TraceStateImpl.executeMutation(TraceKeyspace.makeStopSessionMutation(sessionId, elapsed, ttl));
+    }
+
+    public TraceState begin(final String request, final InetAddress client, final Map<String, String> parameters)
+    {
+        assert isTracing();
+
+        final TraceState state = get();
+        final long startedAt = System.currentTimeMillis();
+        final ByteBuffer sessionId = state.sessionIdBytes;
+        final String command = state.traceType.toString();
+        final int ttl = state.ttl;
+
+        TraceStateImpl.executeMutation(TraceKeyspace.makeStartSessionMutation(sessionId, client, parameters, request, startedAt, command, ttl));
+
+        return state;
+    }
+
+    @Override
+    protected TraceState newTraceState(InetAddress coordinator, UUID sessionId, TraceType traceType)
+    {
+        return new TraceStateImpl(coordinator, sessionId, traceType);
+    }
+
+    /**
+     * Called from {@link org.apache.cassandra.net.OutboundTcpConnection} for non-local traces (traces
+     * that are not initiated by local node == coordinator).
+     */
+    public void trace(final ByteBuffer sessionId, final String message, final int ttl)
+    {
+        final String threadName = Thread.currentThread().getName();
+
+        StageManager.getStage(Stage.TRACING).execute(new WrappedRunnable()
+        {
+            public void runMayThrow()
+            {
+                TraceStateImpl.mutateWithCatch(TraceKeyspace.makeEventMutation(sessionId, message, -1, threadName, ttl));
+            }
+        });
+    }
+}
diff --git a/src/java/org/apache/cassandra/transport/CBUtil.java b/src/java/org/apache/cassandra/transport/CBUtil.java
index 800a9a8..43f4bbd 100644
--- a/src/java/org/apache/cassandra/transport/CBUtil.java
+++ b/src/java/org/apache/cassandra/transport/CBUtil.java
@@ -33,15 +33,19 @@
 import java.util.Map;
 import java.util.UUID;
 
-import io.netty.buffer.*;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.ByteBufAllocator;
+import io.netty.buffer.ByteBufUtil;
+import io.netty.buffer.PooledByteBufAllocator;
+import io.netty.buffer.UnpooledByteBufAllocator;
 import io.netty.util.CharsetUtil;
-
+import io.netty.util.concurrent.FastThreadLocal;
 import org.apache.cassandra.config.Config;
 import org.apache.cassandra.db.ConsistencyLevel;
 import org.apache.cassandra.db.TypeSizes;
+import org.apache.cassandra.utils.ByteBufferUtil;
 import org.apache.cassandra.utils.Pair;
 import org.apache.cassandra.utils.UUIDGen;
-import org.apache.cassandra.utils.ByteBufferUtil;
 
 /**
  * ByteBuf utility methods.
@@ -55,9 +59,7 @@
     public static final boolean USE_HEAP_ALLOCATOR = Boolean.getBoolean(Config.PROPERTY_PREFIX + "netty_use_heap_allocator");
     public static final ByteBufAllocator allocator = USE_HEAP_ALLOCATOR ? new UnpooledByteBufAllocator(false) : new PooledByteBufAllocator(true);
 
-    private CBUtil() {}
-
-    private final static ThreadLocal<CharsetDecoder> decoder = new ThreadLocal<CharsetDecoder>()
+    private final static FastThreadLocal<CharsetDecoder> TL_UTF8_DECODER = new FastThreadLocal<CharsetDecoder>()
     {
         @Override
         protected CharsetDecoder initialValue()
@@ -66,6 +68,40 @@
         }
     };
 
+    private final static FastThreadLocal<CharBuffer> TL_CHAR_BUFFER = new FastThreadLocal<>();
+
+    private CBUtil() {}
+
+
+    // Taken from Netty's ChannelBuffers.decodeString(). We need to use our own decoder to properly handle invalid
+    // UTF-8 sequences.  See CASSANDRA-8101 for more details.  This can be removed once https://github.com/netty/netty/pull/2999
+    // is resolved in a release used by Cassandra.
+    private static String decodeString(ByteBuffer src) throws CharacterCodingException
+    {
+        // the decoder needs to be reset every time we use it, hence the copy per thread
+        CharsetDecoder theDecoder = TL_UTF8_DECODER.get();
+        theDecoder.reset();
+        CharBuffer dst = TL_CHAR_BUFFER.get();
+        int capacity = (int) ((double) src.remaining() * theDecoder.maxCharsPerByte());
+        if (dst == null) {
+            capacity = Math.max(capacity, 4096);
+            dst = CharBuffer.allocate(capacity);
+            TL_CHAR_BUFFER.set(dst);
+        }
+        else {
+            dst.clear();
+            if (dst.capacity() < capacity){
+                dst = CharBuffer.allocate(capacity);
+                TL_CHAR_BUFFER.set(dst);
+            }
+        }
+        CoderResult cr = theDecoder.decode(src, dst, true);
+        if (!cr.isUnderflow())
+            cr.throwException();
+
+        return dst.flip().toString();
+    }
+
     private static String readString(ByteBuf cb, int length)
     {
         if (length == 0)
@@ -97,34 +133,12 @@
         }
     }
 
-    // Taken from Netty's ChannelBuffers.decodeString(). We need to use our own decoder to properly handle invalid
-    // UTF-8 sequences.  See CASSANDRA-8101 for more details.  This can be removed once https://github.com/netty/netty/pull/2999
-    // is resolved in a release used by Cassandra.
-    private static String decodeString(ByteBuffer src) throws CharacterCodingException
-    {
-        // the decoder needs to be reset every time we use it, hence the copy per thread
-        CharsetDecoder theDecoder = decoder.get();
-        theDecoder.reset();
-
-        final CharBuffer dst = CharBuffer.allocate(
-                (int) ((double) src.remaining() * theDecoder.maxCharsPerByte()));
-
-        CoderResult cr = theDecoder.decode(src, dst, true);
-        if (!cr.isUnderflow())
-            cr.throwException();
-
-        cr = theDecoder.flush(dst);
-        if (!cr.isUnderflow())
-            cr.throwException();
-
-        return dst.flip().toString();
-    }
-
     public static void writeString(String str, ByteBuf cb)
     {
-        byte[] bytes = str.getBytes(CharsetUtil.UTF_8);
-        cb.writeShort(bytes.length);
-        cb.writeBytes(bytes);
+        int writerIndex = cb.writerIndex();
+        cb.writeShort(0);
+        int lengthBytes = ByteBufUtil.writeUtf8(cb, str);
+        cb.setShort(writerIndex, lengthBytes);
     }
 
     public static int sizeOfString(String str)
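
writeString() above avoids the intermediate byte[] by reserving the two length bytes, letting Netty encode the UTF-8 body straight into the buffer, and then back-patching the real length. A small self-contained sketch of the same trick (this class is not part of the patch; it only assumes Netty's public ByteBuf API):

import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufUtil;
import io.netty.buffer.Unpooled;

public class WriteStringSketch
{
    static void writeLengthPrefixedUtf8(String str, ByteBuf cb)
    {
        int lengthIndex = cb.writerIndex();
        cb.writeShort(0);                             // placeholder for the length
        int written = ByteBufUtil.writeUtf8(cb, str); // encode directly into the buffer
        cb.setShort(lengthIndex, written);            // back-patch the actual byte count
    }

    public static void main(String[] args)
    {
        ByteBuf buf = Unpooled.buffer();
        writeLengthPrefixedUtf8("héllo", buf);
        System.out.println(buf.readUnsignedShort());  // prints 6: the UTF-8 body is 6 bytes
    }
}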
diff --git a/src/java/org/apache/cassandra/transport/DataType.java b/src/java/org/apache/cassandra/transport/DataType.java
index e3eaf32..eb1f1f4 100644
--- a/src/java/org/apache/cassandra/transport/DataType.java
+++ b/src/java/org/apache/cassandra/transport/DataType.java
@@ -28,8 +28,9 @@
 
 import io.netty.buffer.ByteBuf;
 
-import org.apache.cassandra.exceptions.RequestValidationException;
+import org.apache.cassandra.cql3.FieldIdentifier;
 import org.apache.cassandra.db.marshal.*;
+import org.apache.cassandra.exceptions.RequestValidationException;
 import org.apache.cassandra.utils.Pair;
 
 public enum DataType implements OptionCodec.Codecable<DataType>
@@ -66,6 +67,7 @@
     private final int id;
     private final int protocolVersion;
     private final AbstractType type;
+    private final Pair<DataType, Object> pair;
     private static final Map<AbstractType, DataType> dataTypeMap = new HashMap<AbstractType, DataType>();
     static
     {
@@ -81,6 +83,7 @@
         this.id = id;
         this.type = type;
         this.protocolVersion = protocolVersion;
+        pair = Pair.create(this, null);
     }
 
     public int getId(int version)
@@ -109,14 +112,14 @@
                 String ks = CBUtil.readString(cb);
                 ByteBuffer name = UTF8Type.instance.decompose(CBUtil.readString(cb));
                 int n = cb.readUnsignedShort();
-                List<ByteBuffer> fieldNames = new ArrayList<>(n);
+                List<FieldIdentifier> fieldNames = new ArrayList<>(n);
                 List<AbstractType<?>> fieldTypes = new ArrayList<>(n);
                 for (int i = 0; i < n; i++)
                 {
-                    fieldNames.add(UTF8Type.instance.decompose(CBUtil.readString(cb)));
+                    fieldNames.add(FieldIdentifier.forInternalString(CBUtil.readString(cb)));
                     fieldTypes.add(DataType.toType(codec.decodeOne(cb, version)));
                 }
-                return new UserType(ks, name, fieldNames, fieldTypes);
+                return new UserType(ks, name, fieldNames, fieldTypes, true);
             case TUPLE:
                 n = cb.readUnsignedShort();
                 List<AbstractType<?>> types = new ArrayList<>(n);
@@ -161,7 +164,7 @@
                 cb.writeShort(udt.size());
                 for (int i = 0; i < udt.size(); i++)
                 {
-                    CBUtil.writeString(UTF8Type.instance.compose(udt.fieldName(i)), cb);
+                    CBUtil.writeString(udt.fieldName(i).toString(), cb);
                     codec.writeOne(DataType.fromType(udt.fieldType(i), version), cb, version);
                 }
                 break;
@@ -201,7 +204,7 @@
                 size += 2;
                 for (int i = 0; i < udt.size(); i++)
                 {
-                    size += CBUtil.sizeOfString(UTF8Type.instance.compose(udt.fieldName(i)));
+                    size += CBUtil.sizeOfString(udt.fieldName(i).toString());
                     size += codec.oneSerializedSize(DataType.fromType(udt.fieldType(i), version), version);
                 }
                 return size;
@@ -261,7 +264,7 @@
             // Fall back to CUSTOM if target doesn't know this data type
             if (version < dt.protocolVersion)
                 return Pair.<DataType, Object>create(CUSTOM, type.toString());
-            return Pair.create(dt, null);
+            return dt.pair;
         }
     }
 
diff --git a/src/java/org/apache/cassandra/transport/Frame.java b/src/java/org/apache/cassandra/transport/Frame.java
index 3940b47..d0d4aee 100644
--- a/src/java/org/apache/cassandra/transport/Frame.java
+++ b/src/java/org/apache/cassandra/transport/Frame.java
@@ -27,6 +27,7 @@
 import io.netty.handler.codec.ByteToMessageDecoder;
 import io.netty.handler.codec.MessageToMessageDecoder;
 import io.netty.handler.codec.MessageToMessageEncoder;
+import io.netty.util.Attribute;
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.exceptions.InvalidRequestException;
 import org.apache.cassandra.transport.messages.ErrorMessage;
@@ -224,12 +225,13 @@
             idx += bodyLength;
             buffer.readerIndex(idx);
 
-            Connection connection = ctx.channel().attr(Connection.attributeKey).get();
+            Attribute<Connection> attrConn = ctx.channel().attr(Connection.attributeKey);
+            Connection connection = attrConn.get();
             if (connection == null)
             {
                 // First message seen on this channel, attach the connection object
                 connection = factory.newConnection(ctx.channel(), version);
-                ctx.channel().attr(Connection.attributeKey).set(connection);
+                attrConn.set(connection);
             }
             else if (connection.getVersion() != version)
             {
diff --git a/src/java/org/apache/cassandra/transport/Server.java b/src/java/org/apache/cassandra/transport/Server.java
index 63194d0..388fca0 100644
--- a/src/java/org/apache/cassandra/transport/Server.java
+++ b/src/java/org/apache/cassandra/transport/Server.java
@@ -359,7 +359,6 @@
             String[] suites = SSLFactory.filterCipherSuites(sslEngine.getSupportedCipherSuites(), encryptionOptions.cipher_suites);
             sslEngine.setEnabledCipherSuites(suites);
             sslEngine.setNeedClientAuth(encryptionOptions.require_client_auth);
-            sslEngine.setEnabledProtocols(SSLFactory.ACCEPTED_PROTOCOLS);
             return new SslHandler(sslEngine);
         }
     }
diff --git a/src/java/org/apache/cassandra/transport/SimpleClient.java b/src/java/org/apache/cassandra/transport/SimpleClient.java
index 4759c2a..6e20cfa 100644
--- a/src/java/org/apache/cassandra/transport/SimpleClient.java
+++ b/src/java/org/apache/cassandra/transport/SimpleClient.java
@@ -293,7 +293,6 @@
             sslEngine.setUseClientMode(true);
             String[] suites = SSLFactory.filterCipherSuites(sslEngine.getSupportedCipherSuites(), encryptionOptions.cipher_suites);
             sslEngine.setEnabledCipherSuites(suites);
-            sslEngine.setEnabledProtocols(SSLFactory.ACCEPTED_PROTOCOLS);
             channel.pipeline().addFirst("ssl", new SslHandler(sslEngine));
         }
     }
diff --git a/src/java/org/apache/cassandra/transport/messages/ExecuteMessage.java b/src/java/org/apache/cassandra/transport/messages/ExecuteMessage.java
index e9923b4..a5348a4 100644
--- a/src/java/org/apache/cassandra/transport/messages/ExecuteMessage.java
+++ b/src/java/org/apache/cassandra/transport/messages/ExecuteMessage.java
@@ -23,6 +23,7 @@
 import io.netty.buffer.ByteBuf;
 
 import org.apache.cassandra.cql3.CQLStatement;
+import org.apache.cassandra.cql3.ColumnSpecification;
 import org.apache.cassandra.cql3.QueryHandler;
 import org.apache.cassandra.cql3.QueryOptions;
 import org.apache.cassandra.cql3.statements.ParsedStatement;
@@ -110,7 +111,7 @@
 
             if (state.traceNextQuery())
             {
-                state.createTracingSession();
+                state.createTracingSession(getCustomPayload());
 
                 ImmutableMap.Builder<String, String> builder = ImmutableMap.builder();
                 if (options.getPageSize() > 0)
@@ -119,8 +120,23 @@
                     builder.put("consistency_level", options.getConsistency().name());
                 if(options.getSerialConsistency() != null)
                     builder.put("serial_consistency_level", options.getSerialConsistency().name());
+                builder.put("query", prepared.rawCQLStatement);
 
-                // TODO we don't have [typed] access to CQL bind variables here.  CASSANDRA-4560 is open to add support.
+                for (int i = 0; i < prepared.boundNames.size(); i++)
+                {
+                    ColumnSpecification cs = prepared.boundNames.get(i);
+                    String boundName = cs.name.toString();
+                    String boundValue = cs.type.asCQL3Type().toCQLLiteral(options.getValues().get(i), options.getProtocolVersion());
+                    if (boundValue.length() > 1000)
+                    {
+                        boundValue = boundValue.substring(0, 1000) + "...'";
+                    }
+
+                    // Prefix boundName with the index to avoid possible collisions in builder keys caused by
+                    // multiple bound values for the same variable
+                    builder.put("bound_var_" + Integer.toString(i) + "_" + boundName, boundValue);
+                }
+
                 Tracing.instance.begin("Execute CQL3 prepared query", state.getClientAddress(), builder.build());
             }
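
The loop above records each bound value in the trace parameters, capping very large literals and prefixing keys with the bind index so repeated names cannot collide. A rough standalone sketch of that key/value assembly (plain Map instead of the ImmutableMap.Builder; names are illustrative):

import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TraceParamsSketch
{
    static Map<String, String> boundVariableParams(List<String> names, List<String> cqlLiterals)
    {
        Map<String, String> params = new LinkedHashMap<>();
        for (int i = 0; i < names.size(); i++)
        {
            String value = cqlLiterals.get(i);
            // mirror the 1000-character cap applied above
            if (value.length() > 1000)
                value = value.substring(0, 1000) + "...'";
            // index prefix keeps keys unique even if two binds share a name
            params.put("bound_var_" + i + "_" + names.get(i), value);
        }
        return params;
    }
}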
 
diff --git a/src/java/org/apache/cassandra/triggers/TriggerExecutor.java b/src/java/org/apache/cassandra/triggers/TriggerExecutor.java
index 40d4094..8cfa3e2 100644
--- a/src/java/org/apache/cassandra/triggers/TriggerExecutor.java
+++ b/src/java/org/apache/cassandra/triggers/TriggerExecutor.java
@@ -31,6 +31,7 @@
 import org.apache.cassandra.cql3.QueryProcessor;
 import org.apache.cassandra.db.*;
 import org.apache.cassandra.db.partitions.PartitionUpdate;
+import org.apache.cassandra.exceptions.CassandraException;
 import org.apache.cassandra.exceptions.InvalidRequestException;
 import org.apache.cassandra.schema.TriggerMetadata;
 import org.apache.cassandra.schema.Triggers;
@@ -231,9 +232,13 @@
             }
             return tmutations;
         }
+        catch (CassandraException ex)
+        {
+            throw ex;
+        }
         catch (Exception ex)
         {
-            throw new RuntimeException(String.format("Exception while creating trigger on table with ID: %s", update.metadata().cfId), ex);
+            throw new RuntimeException(String.format("Exception while executing trigger on table with ID: %s", update.metadata().cfId), ex);
         }
         finally
         {
diff --git a/src/java/org/apache/cassandra/utils/BloomFilter.java b/src/java/org/apache/cassandra/utils/BloomFilter.java
index ce6c638..4ff07b7 100644
--- a/src/java/org/apache/cassandra/utils/BloomFilter.java
+++ b/src/java/org/apache/cassandra/utils/BloomFilter.java
@@ -19,13 +19,15 @@
 
 import com.google.common.annotations.VisibleForTesting;
 
+import io.netty.util.concurrent.FastThreadLocal;
+import net.nicoulaj.compilecommand.annotations.Inline;
 import org.apache.cassandra.utils.concurrent.Ref;
 import org.apache.cassandra.utils.concurrent.WrappedSharedCloseable;
 import org.apache.cassandra.utils.obs.IBitSet;
 
 public class BloomFilter extends WrappedSharedCloseable implements IFilter
 {
-    private static final ThreadLocal<long[]> reusableIndexes = new ThreadLocal<long[]>()
+    private final static FastThreadLocal<long[]> reusableIndexes = new FastThreadLocal<long[]>()
     {
         protected long[] initialValue()
         {
@@ -84,16 +86,19 @@
     // to avoid generating a lot of garbage since stack allocation currently does not support stores
     // (CASSANDRA-6609).  it returns the array so that the caller does not need to perform
     // a second threadlocal lookup.
+    @Inline
     private long[] indexes(FilterKey key)
     {
         // we use the same array both for storing the hash result, and for storing the indexes we return,
         // so that we do not need to allocate two arrays.
         long[] indexes = reusableIndexes.get();
+
         key.filterHash(indexes);
         setIndexes(indexes[1], indexes[0], hashCount, bitset.capacity(), indexes);
         return indexes;
     }
 
+    @Inline
     private void setIndexes(long base, long inc, int count, long max, long[] results)
     {
         if (oldBfHashOrder)
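
BloomFilter now keeps its scratch array in a Netty FastThreadLocal rather than a plain ThreadLocal; the lookup is cheaper when the calling thread is a FastThreadLocalThread, and the initialValue() contract is the same. A minimal sketch of that usage (the array size here is arbitrary, chosen only for the example):

import io.netty.util.concurrent.FastThreadLocal;

public class ReusableScratchSketch
{
    private static final FastThreadLocal<long[]> SCRATCH = new FastThreadLocal<long[]>()
    {
        @Override
        protected long[] initialValue()
        {
            return new long[21]; // arbitrary scratch size for this sketch
        }
    };

    public static void main(String[] args)
    {
        long[] a = SCRATCH.get();
        long[] b = SCRATCH.get();
        System.out.println(a == b); // true: the same array is reused per thread
    }
}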
diff --git a/src/java/org/apache/cassandra/utils/ByteBufferUtil.java b/src/java/org/apache/cassandra/utils/ByteBufferUtil.java
index c1b0721..cb4fc1d 100644
--- a/src/java/org/apache/cassandra/utils/ByteBufferUtil.java
+++ b/src/java/org/apache/cassandra/utils/ByteBufferUtil.java
@@ -35,8 +35,8 @@
 import net.nicoulaj.compilecommand.annotations.Inline;
 import org.apache.cassandra.db.TypeSizes;
 import org.apache.cassandra.io.util.DataInputPlus;
+import org.apache.cassandra.io.compress.BufferType;
 import org.apache.cassandra.io.util.DataOutputPlus;
-import org.apache.cassandra.io.util.FileDataInput;
 import org.apache.cassandra.io.util.FileUtils;
 
 /**
@@ -162,7 +162,6 @@
     public static byte[] getArray(ByteBuffer buffer)
     {
         int length = buffer.remaining();
-
         if (buffer.hasArray())
         {
             int boff = buffer.arrayOffset() + buffer.position();
@@ -386,7 +385,6 @@
 
     /**
      * @param in data input
-     * @return null
      * @throws IOException if an I/O error occurs.
      */
     public static void skipShortLength(DataInputPlus in) throws IOException
@@ -510,8 +508,16 @@
         };
     }
 
+    /*
+     * Does not modify position or limit of buffer even temporarily
+     * so this is safe even without duplication.
+     */
     public static String bytesToHex(ByteBuffer bytes)
     {
+        if (bytes.hasArray()) {
+            return Hex.bytesToHex(bytes.array(), bytes.arrayOffset() + bytes.position(), bytes.remaining());
+        }
+
         final int offset = bytes.position();
         final int size = bytes.remaining();
         final char[] c = new char[size * 2];
@@ -622,4 +628,120 @@
         return readBytes(bb, length);
     }
 
+    /**
+     * Ensure {@code buf} is large enough for {@code outputLength}. If not, it is cleaned up and a new buffer is allocated;
+     * otherwise, the buffer has its position/limit set appropriately.
+     *
+     * @param buf buffer to test the size of; may be null, in which case, a new buffer is allocated.
+     * @param outputLength the minimum target size of the buffer
+     * @param allowBufferResize true if resizing (reallocating) the buffer is allowed
+     * @return {@code buf} if it was large enough, else a newly allocated buffer.
+     */
+    public static ByteBuffer ensureCapacity(ByteBuffer buf, int outputLength, boolean allowBufferResize)
+    {
+        BufferType bufferType = buf != null ? BufferType.typeOf(buf) : BufferType.ON_HEAP;
+        return ensureCapacity(buf, outputLength, allowBufferResize, bufferType);
+    }
+
+    /**
+     * Ensure {@code buf} is large enough for {@code outputLength}. If not, it is cleaned up and a new buffer is allocated;
+     * otherwise, the buffer has its position/limit set appropriately.
+     *
+     * @param buf buffer to test the size of; may be null, in which case, a new buffer is allocated.
+     * @param outputLength the minimum target size of the buffer
+     * @param allowBufferResize true if resizing (reallocating) the buffer is allowed
+     * @param bufferType on- or off- heap byte buffer
+     * @return {@code buf} if it was large enough, else a newly allocated buffer.
+     */
+    public static ByteBuffer ensureCapacity(ByteBuffer buf, int outputLength, boolean allowBufferResize, BufferType bufferType)
+    {
+        if (0 > outputLength)
+            throw new IllegalArgumentException("invalid size for output buffer: " + outputLength);
+        if (buf == null || buf.capacity() < outputLength)
+        {
+            if (!allowBufferResize)
+                throw new IllegalStateException(String.format("output buffer is not large enough for data: current capacity %d, required %d", buf == null ? 0 : buf.capacity(), outputLength));
+            FileUtils.clean(buf);
+            buf = bufferType.allocate(outputLength);
+        }
+        else
+        {
+            buf.position(0).limit(outputLength);
+        }
+        return buf;
+    }
+
+    /**
+     * Check if the given buffer contains a given sub-buffer.
+     *
+     * @param buffer The buffer to search for sequence of bytes in.
+     * @param subBuffer The buffer to match.
+     *
+     * @return true if buffer contains sub-buffer, false otherwise.
+     */
+    public static boolean contains(ByteBuffer buffer, ByteBuffer subBuffer)
+    {
+        int len = subBuffer.remaining();
+        if (buffer.remaining() - len < 0)
+            return false;
+
+        // adapted from the JDK's String.indexOf()
+        byte first = subBuffer.get(subBuffer.position());
+        int max = buffer.position() + (buffer.remaining() - len);
+
+        for (int i = buffer.position(); i <= max; i++)
+        {
+            /* Look for first character. */
+            if (buffer.get(i) != first)
+            {
+                while (++i <= max && buffer.get(i) != first)
+                {}
+            }
+
+            /* (maybe) Found first character, now look at the rest of the sub-buffer */
+            if (i <= max)
+            {
+                int j = i + 1;
+                int end = j + len - 1;
+                for (int k = 1 + subBuffer.position(); j < end && buffer.get(j) == subBuffer.get(k); j++, k++)
+                {}
+
+                if (j == end)
+                    return true;
+            }
+        }
+        return false;
+    }
+
+    public static boolean startsWith(ByteBuffer src, ByteBuffer prefix)
+    {
+        return startsWith(src, prefix, 0);
+    }
+
+    public static boolean endsWith(ByteBuffer src, ByteBuffer suffix)
+    {
+        return startsWith(src, suffix, src.remaining() - suffix.remaining());
+    }
+
+    private static boolean startsWith(ByteBuffer src, ByteBuffer prefix, int offset)
+    {
+        if (offset < 0)
+            return false;
+
+        int sPos = src.position() + offset;
+        int pPos = prefix.position();
+
+        if (src.remaining() - offset < prefix.remaining())
+            return false;
+
+        int len = Math.min(src.remaining() - offset, prefix.remaining());
+
+        while (len-- > 0)
+        {
+            if (src.get(sPos++) != prefix.get(pPos++))
+                return false;
+        }
+
+        return true;
+    }
 }
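
The new contains/startsWith/endsWith helpers compare byte-by-byte against absolute indexes, so they never move the buffers' position or limit. An illustrative use of the methods added above:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import org.apache.cassandra.utils.ByteBufferUtil;

public class ByteBufferMatchSketch
{
    public static void main(String[] args)
    {
        ByteBuffer haystack = ByteBuffer.wrap("apache cassandra".getBytes(StandardCharsets.UTF_8));
        ByteBuffer needle = ByteBuffer.wrap("cassandra".getBytes(StandardCharsets.UTF_8));

        System.out.println(ByteBufferUtil.contains(haystack, needle));   // true
        System.out.println(ByteBufferUtil.startsWith(haystack, needle)); // false
        System.out.println(ByteBufferUtil.endsWith(haystack, needle));   // true
    }
}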
diff --git a/src/java/org/apache/cassandra/utils/CassandraVersion.java b/src/java/org/apache/cassandra/utils/CassandraVersion.java
index 62d68be..759ca97 100644
--- a/src/java/org/apache/cassandra/utils/CassandraVersion.java
+++ b/src/java/org/apache/cassandra/utils/CassandraVersion.java
@@ -26,12 +26,19 @@
 
 /**
  * Implements versioning used in Cassandra and CQL.
- * <p/>
+ * <p>
  * Note: The following code uses a slight variation from the semver document (http://semver.org).
+ * </p>
  */
 public class CassandraVersion implements Comparable<CassandraVersion>
 {
-    private static final String VERSION_REGEXP = "(\\d+)\\.(\\d+)\\.(\\d+)(\\-[.\\w]+)?([.+][.\\w]+)?";
+    /**
+     * Note: the 3rd group matches word characters but only digits are accepted; this is validated after
+     * the regexp test because the 3rd group and the last group can otherwise be identical in form.
+     **/
+    private static final String VERSION_REGEXP = "(\\d+)\\.(\\d+)(?:\\.(\\w+))?(\\-[.\\w]+)?([.+][.\\w]+)?";
+    private static final Pattern PATTERN_WHITESPACE = Pattern.compile("\\w+");
+
     private static final Pattern pattern = Pattern.compile(VERSION_REGEXP);
     private static final Pattern SNAPSHOT = Pattern.compile("-SNAPSHOT");
 
@@ -42,15 +49,6 @@
     private final String[] preRelease;
     private final String[] build;
 
-    private CassandraVersion(int major, int minor, int patch, String[] preRelease, String[] build)
-    {
-        this.major = major;
-        this.minor = minor;
-        this.patch = patch;
-        this.preRelease = preRelease;
-        this.build = build;
-    }
-
     /**
      * Parse a version from a string.
      *
@@ -69,7 +67,7 @@
         {
             this.major = Integer.parseInt(matcher.group(1));
             this.minor = Integer.parseInt(matcher.group(2));
-            this.patch = Integer.parseInt(matcher.group(3));
+            this.patch = matcher.group(3) != null ? Integer.parseInt(matcher.group(3)) : 0;
 
             String pr = matcher.group(4);
             String bld = matcher.group(5);
@@ -79,7 +77,7 @@
         }
         catch (NumberFormatException e)
         {
-            throw new IllegalArgumentException("Invalid version value: " + version);
+            throw new IllegalArgumentException("Invalid version value: " + version, e);
         }
     }
 
@@ -87,10 +85,10 @@
     {
         // Drop initial - or +
         str = str.substring(1);
-        String[] parts = str.split("\\.");
+        String[] parts = StringUtils.split(str, '.');
         for (String part : parts)
         {
-            if (!part.matches("\\w+"))
+            if (!PATTERN_WHITESPACE.matcher(part).matches())
                 throw new IllegalArgumentException("Invalid version value: " + version);
         }
         return parts;
@@ -123,13 +121,14 @@
     /**
      * Returns a version that is backward compatible with this version amongst a list
      * of provided version, or null if none can be found.
-     * <p/>
+     * <p>
      * For instance:
      * "2.0.0".findSupportingVersion("2.0.0", "3.0.0") == "2.0.0"
      * "2.0.0".findSupportingVersion("2.1.3", "3.0.0") == "2.1.3"
      * "2.0.0".findSupportingVersion("3.0.0") == null
      * "2.0.3".findSupportingVersion("2.0.0") == "2.0.0"
      * "2.1.0".findSupportingVersion("2.0.0") == null
+     * </p>
      */
     public CassandraVersion findSupportingVersion(CassandraVersion... versions)
     {
diff --git a/src/java/org/apache/cassandra/utils/ChecksumType.java b/src/java/org/apache/cassandra/utils/ChecksumType.java
index c9a1eb8..3fa245b 100644
--- a/src/java/org/apache/cassandra/utils/ChecksumType.java
+++ b/src/java/org/apache/cassandra/utils/ChecksumType.java
@@ -24,7 +24,7 @@
 
 public enum ChecksumType
 {
-    Adler32()
+    Adler32
     {
 
         @Override
@@ -40,7 +40,7 @@
         }
 
     },
-    CRC32()
+    CRC32
     {
 
         @Override
@@ -58,6 +58,28 @@
     };
 
     public abstract Checksum newInstance();
-
     public abstract void update(Checksum checksum, ByteBuffer buf);
+
+    private ThreadLocal<Checksum> instances = ThreadLocal.withInitial(this::newInstance);
+
+    public Checksum threadLocalInstance()
+    {
+        return instances.get();
+    }
+
+    public long of(ByteBuffer buf)
+    {
+        Checksum checksum = instances.get();
+        checksum.reset();
+        update(checksum, buf);
+        return checksum.getValue();
+    }
+
+    public long of(byte[] data, int off, int len)
+    {
+        Checksum checksum = instances.get();
+        checksum.reset();
+        checksum.update(data, off, len);
+        return checksum.getValue();
+    }
 }
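
The new of() overloads reuse a per-thread Checksum instance, so callers no longer allocate or reset one for each chunk they verify. Illustrative usage of the methods added above:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

import org.apache.cassandra.utils.ChecksumType;

public class ChecksumSketch
{
    public static void main(String[] args)
    {
        byte[] data = "some sstable chunk".getBytes(StandardCharsets.UTF_8);

        long fromArray  = ChecksumType.CRC32.of(data, 0, data.length);
        long fromBuffer = ChecksumType.CRC32.of(ByteBuffer.wrap(data));

        System.out.println(fromArray == fromBuffer); // same bytes, same checksum
    }
}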
diff --git a/src/java/org/apache/cassandra/utils/DirectorySizeCalculator.java b/src/java/org/apache/cassandra/utils/DirectorySizeCalculator.java
new file mode 100644
index 0000000..aa7898c
--- /dev/null
+++ b/src/java/org/apache/cassandra/utils/DirectorySizeCalculator.java
@@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.utils;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.file.FileVisitResult;
+import java.nio.file.Path;
+import java.nio.file.SimpleFileVisitor;
+import java.nio.file.attribute.BasicFileAttributes;
+import java.util.List;
+import java.util.Set;
+import java.util.concurrent.atomic.AtomicLong;
+
+import com.google.common.collect.ImmutableSet;
+
+import static com.google.common.collect.Sets.newHashSet;
+
+/**
+ * Walks a directory recursively, summing up the sizes of the files within.
+ */
+public class DirectorySizeCalculator extends SimpleFileVisitor<Path>
+{
+    protected final AtomicLong size = new AtomicLong(0);
+    protected Set<String> visited = newHashSet(); //count each file only once
+    protected Set<String> alive = newHashSet();
+    protected final File path;
+
+    public DirectorySizeCalculator(File path)
+    {
+        super();
+        this.path = path;
+        rebuildFileList();
+    }
+
+    public DirectorySizeCalculator(List<File> files)
+    {
+        super();
+        this.path = null;
+        ImmutableSet.Builder<String> builder = ImmutableSet.builder();
+        for (File file : files)
+            builder.add(file.getName());
+        alive = builder.build();
+    }
+
+    public boolean isAcceptable(Path file)
+    {
+        return true;
+    }
+
+    public void rebuildFileList()
+    {
+        assert path != null;
+        ImmutableSet.Builder<String> builder = ImmutableSet.builder();
+        for (File file : path.listFiles())
+            builder.add(file.getName());
+        size.set(0);
+        alive = builder.build();
+    }
+
+    @Override
+    public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException
+    {
+        if (isAcceptable(file))
+        {
+            size.addAndGet(attrs.size());
+            visited.add(file.toFile().getName());
+        }
+        return FileVisitResult.CONTINUE;
+    }
+
+    @Override
+    public FileVisitResult visitFileFailed(Path file, IOException exc) throws IOException
+    {
+        return FileVisitResult.CONTINUE;
+    }
+
+    public long getAllocatedSize()
+    {
+        return size.get();
+    }
+}
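
The calculator is a SimpleFileVisitor, so it plugs straight into Files.walkFileTree. A hedged usage sketch (the data directory path below is purely illustrative):

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

import org.apache.cassandra.utils.DirectorySizeCalculator;

public class DirectorySizeSketch
{
    public static void main(String[] args) throws IOException
    {
        File dir = new File("/var/lib/cassandra/data"); // hypothetical directory
        DirectorySizeCalculator visitor = new DirectorySizeCalculator(dir);

        Files.walkFileTree(dir.toPath(), visitor);
        System.out.println("bytes visited: " + visitor.getAllocatedSize());
    }
}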
diff --git a/src/java/org/apache/cassandra/utils/FBUtilities.java b/src/java/org/apache/cassandra/utils/FBUtilities.java
index 5f0e0a0..af2cb1b 100644
--- a/src/java/org/apache/cassandra/utils/FBUtilities.java
+++ b/src/java/org/apache/cassandra/utils/FBUtilities.java
@@ -32,6 +32,7 @@
 import javax.annotation.Nonnull;
 import javax.annotation.Nullable;
 
+import com.google.common.annotations.VisibleForTesting;
 import com.google.common.base.Joiner;
 import org.apache.commons.lang3.StringUtils;
 import org.slf4j.Logger;
@@ -42,11 +43,18 @@
 import org.apache.cassandra.auth.IRoleManager;
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.db.DecoratedKey;
+import org.apache.cassandra.db.SerializationHeader;
+import org.apache.cassandra.db.marshal.AbstractType;
 import org.apache.cassandra.dht.IPartitioner;
+import org.apache.cassandra.dht.LocalPartitioner;
 import org.apache.cassandra.dht.Range;
 import org.apache.cassandra.dht.Token;
 import org.apache.cassandra.exceptions.ConfigurationException;
 import org.apache.cassandra.io.IVersionedSerializer;
+import org.apache.cassandra.io.sstable.Descriptor;
+import org.apache.cassandra.io.sstable.metadata.MetadataComponent;
+import org.apache.cassandra.io.sstable.metadata.MetadataType;
+import org.apache.cassandra.io.sstable.metadata.ValidationMetadata;
 import org.apache.cassandra.schema.CompressionParams;
 import org.apache.cassandra.io.util.DataOutputBuffer;
 import org.apache.cassandra.io.util.DataOutputBufferFixed;
@@ -395,10 +403,37 @@
             result.get(ms, TimeUnit.MILLISECONDS);
     }
 
+    /**
+     * Create a new instance of a partitioner defined in an SSTable Descriptor
+     * @param desc Descriptor of an sstable
+     * @return a new IPartitioner instance
+     * @throws IOException if the sstable metadata cannot be read
+     */
+    public static IPartitioner newPartitioner(Descriptor desc) throws IOException
+    {
+        EnumSet<MetadataType> types = EnumSet.of(MetadataType.VALIDATION, MetadataType.HEADER);
+        Map<MetadataType, MetadataComponent> sstableMetadata = desc.getMetadataSerializer().deserialize(desc, types);
+        ValidationMetadata validationMetadata = (ValidationMetadata) sstableMetadata.get(MetadataType.VALIDATION);
+        SerializationHeader.Component header = (SerializationHeader.Component) sstableMetadata.get(MetadataType.HEADER);
+        return newPartitioner(validationMetadata.partitioner, Optional.of(header.getKeyType()));
+    }
+
     public static IPartitioner newPartitioner(String partitionerClassName) throws ConfigurationException
     {
+        return newPartitioner(partitionerClassName, Optional.empty());
+    }
+
+    @VisibleForTesting
+    static IPartitioner newPartitioner(String partitionerClassName, Optional<AbstractType<?>> comparator) throws ConfigurationException
+    {
         if (!partitionerClassName.contains("."))
             partitionerClassName = "org.apache.cassandra.dht." + partitionerClassName;
+
+        if (partitionerClassName.equals("org.apache.cassandra.dht.LocalPartitioner"))
+        {
+            assert comparator.isPresent() : "Expected a comparator for local partitioner";
+            return new LocalPartitioner(comparator.get());
+        }
         return FBUtilities.instanceOrConstruct(partitionerClassName, "partitioner");
     }
 
@@ -588,11 +623,36 @@
 
     public static String prettyPrintMemory(long size)
     {
+        return prettyPrintMemory(size, false);
+    }
+
+    public static String prettyPrintMemory(long size, boolean includeSpace)
+    {
         if (size >= 1 << 30)
-            return String.format("%.3fGiB", size / (double) (1 << 30));
+            return String.format("%.3f%sGiB", size / (double) (1 << 30), includeSpace ? " " : "");
         if (size >= 1 << 20)
-            return String.format("%.3fMiB", size / (double) (1 << 20));
-        return String.format("%.3fKiB", size / (double) (1 << 10));
+            return String.format("%.3f%sMiB", size / (double) (1 << 20), includeSpace ? " " : "");
+        return String.format("%.3f%sKiB", size / (double) (1 << 10), includeSpace ? " " : "");
+    }
+
+    public static String prettyPrintMemoryPerSecond(long rate)
+    {
+        if (rate >= 1 << 30)
+            return String.format("%.3fGiB/s", rate / (double) (1 << 30));
+        if (rate >= 1 << 20)
+            return String.format("%.3fMiB/s", rate / (double) (1 << 20));
+        return String.format("%.3fKiB/s", rate / (double) (1 << 10));
+    }
+
+    public static String prettyPrintMemoryPerSecond(long bytes, long timeInNano)
+    {
+        // We can't sanely calculate a rate over 0 nanoseconds
+        if (timeInNano == 0)
+            return "NaN  KiB/s";
+
+        long rate = (long) (((double) bytes / timeInNano) * 1000 * 1000 * 1000);
+
+        return prettyPrintMemoryPerSecond(rate);
     }
 
     /**
@@ -832,4 +892,21 @@
             throw new RuntimeException(e);
         }
     }
+
+    public static void sleepQuietly(long millis)
+    {
+        try
+        {
+            Thread.sleep(millis);
+        }
+        catch (InterruptedException e)
+        {
+            throw new RuntimeException(e);
+        }
+    }
+
+    public static long align(long val, int boundary)
+    {
+        return (val + boundary) & ~(boundary - 1);
+    }
 }
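
prettyPrintMemoryPerSecond(bytes, timeInNano) converts a byte count and an elapsed time into a human-readable rate, guarding against a zero-duration interval. A small usage sketch of the helpers added above:

import org.apache.cassandra.utils.FBUtilities;

public class RateFormatSketch
{
    public static void main(String[] args)
    {
        long bytes = 256L * 1024 * 1024;  // 256 MiB transferred
        long nanos = 2_000_000_000L;      // over two seconds

        System.out.println(FBUtilities.prettyPrintMemory(bytes, true));           // "256.000 MiB"
        System.out.println(FBUtilities.prettyPrintMemoryPerSecond(bytes, nanos)); // "128.000MiB/s"
    }
}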
diff --git a/src/java/org/apache/cassandra/utils/FastByteOperations.java b/src/java/org/apache/cassandra/utils/FastByteOperations.java
index 68e395c..cf8d305 100644
--- a/src/java/org/apache/cassandra/utils/FastByteOperations.java
+++ b/src/java/org/apache/cassandra/utils/FastByteOperations.java
@@ -333,7 +333,7 @@
          * @param memoryOffset2 Where to start comparing in the right buffer (pure memory address if buffer1 is null, or relative otherwise)
          * @param length1 How much to compare from the left buffer
          * @param length2 How much to compare from the right buffer
-         * @return 0 if equal, < 0 if left is less than right, etc.
+         * @return 0 if equal, {@code < 0} if left is less than right, etc.
          */
         @Inline
         public static int compareTo(Object buffer1, long memoryOffset1, int length1,
diff --git a/src/java/org/apache/cassandra/utils/FilterFactory.java b/src/java/org/apache/cassandra/utils/FilterFactory.java
index 869f3fa..298e734 100644
--- a/src/java/org/apache/cassandra/utils/FilterFactory.java
+++ b/src/java/org/apache/cassandra/utils/FilterFactory.java
@@ -20,14 +20,14 @@
 import java.io.DataInput;
 import java.io.IOException;
 
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
 import org.apache.cassandra.io.util.DataOutputPlus;
 import org.apache.cassandra.utils.obs.IBitSet;
 import org.apache.cassandra.utils.obs.OffHeapBitSet;
 import org.apache.cassandra.utils.obs.OpenBitSet;
 
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
-
 public class FilterFactory
 {
     public static final IFilter AlwaysPresent = new AlwaysPresentFilter();
diff --git a/src/java/org/apache/cassandra/utils/HeapUtils.java b/src/java/org/apache/cassandra/utils/HeapUtils.java
index bfc8a0b..65364d8 100644
--- a/src/java/org/apache/cassandra/utils/HeapUtils.java
+++ b/src/java/org/apache/cassandra/utils/HeapUtils.java
@@ -125,15 +125,16 @@
      */
     private static void logProcessOutput(Process p) throws IOException
     {
-        BufferedReader input = new BufferedReader(new InputStreamReader(p.getInputStream()));
-
-        StrBuilder builder = new StrBuilder();
-        String line;
-        while ((line = input.readLine()) != null)
+        try (BufferedReader input = new BufferedReader(new InputStreamReader(p.getInputStream())))
         {
-            builder.appendln(line);
+            StrBuilder builder = new StrBuilder();
+            String line;
+            while ((line = input.readLine()) != null)
+            {
+                builder.appendln(line);
+            }
+            logger.info(builder.toString());
         }
-        logger.info(builder.toString());
     }
 
     /**
diff --git a/src/java/org/apache/cassandra/utils/Hex.java b/src/java/org/apache/cassandra/utils/Hex.java
index 0883c34..c4b4586 100644
--- a/src/java/org/apache/cassandra/utils/Hex.java
+++ b/src/java/org/apache/cassandra/utils/Hex.java
@@ -70,10 +70,15 @@
 
     public static String bytesToHex(byte... bytes)
     {
-        char[] c = new char[bytes.length * 2];
-        for (int i = 0; i < bytes.length; i++)
+        return bytesToHex(bytes, 0, bytes.length);
+    }
+
+    public static String bytesToHex(byte bytes[], int offset, int length)
+    {
+        char[] c = new char[length * 2];
+        for (int i = 0; i < length; i++)
         {
-            int bint = bytes[i];
+            int bint = bytes[i + offset];
             c[i * 2] = byteToChar[(bint & 0xf0) >> 4];
             c[1 + i * 2] = byteToChar[bint & 0x0f];
         }
@@ -96,7 +101,7 @@
             try
             {
                 s = stringConstructor.newInstance(0, c.length, c);
-            } 
+            }
             catch (InvocationTargetException ite) {
                 // The underlying constructor failed. Unwrapping the exception.
                 Throwable cause = ite.getCause();
diff --git a/src/java/org/apache/cassandra/utils/HistogramBuilder.java b/src/java/org/apache/cassandra/utils/HistogramBuilder.java
index 5d22352..093c52c 100644
--- a/src/java/org/apache/cassandra/utils/HistogramBuilder.java
+++ b/src/java/org/apache/cassandra/utils/HistogramBuilder.java
@@ -25,6 +25,9 @@
 public class HistogramBuilder
 {
 
+    public static final long[] EMPTY_LONG_ARRAY = new long[]{};
+    public static final long[] ZERO = new long[]{ 0 };
+
     public HistogramBuilder() {}
     public HistogramBuilder(long[] values)
     {
@@ -73,7 +76,7 @@
         final long[] values = this.values;
 
         if (count == 0)
-            return new EstimatedHistogram(new long[] { }, new long[] { 0 });
+            return new EstimatedHistogram(EMPTY_LONG_ARRAY, ZERO);
 
         long min = Long.MAX_VALUE, max = Long.MIN_VALUE;
         double sum = 0, sumsq = 0;
@@ -114,7 +117,7 @@
             // minormax == mean we have no range to produce, but given the exclusive starts
             // that begin at zero by default (or -Inf) in EstimatedHistogram we have to generate a min range
             // to indicate where we start from
-            return ismin ? new long[] { mean - 1 } : new long[0];
+            return ismin ? new long[] { mean - 1 } : EMPTY_LONG_ARRAY;
 
         if (stdev < 1)
         {
diff --git a/src/java/org/apache/cassandra/utils/IMergeIterator.java b/src/java/org/apache/cassandra/utils/IMergeIterator.java
index deddc4c..e45b897 100644
--- a/src/java/org/apache/cassandra/utils/IMergeIterator.java
+++ b/src/java/org/apache/cassandra/utils/IMergeIterator.java
@@ -21,5 +21,6 @@
 
 public interface IMergeIterator<In, Out> extends CloseableIterator<Out>
 {
+
     Iterable<? extends Iterator<In>> iterators();
 }
diff --git a/src/java/org/apache/cassandra/utils/IntervalTree.java b/src/java/org/apache/cassandra/utils/IntervalTree.java
index b92112e..f761180 100644
--- a/src/java/org/apache/cassandra/utils/IntervalTree.java
+++ b/src/java/org/apache/cassandra/utils/IntervalTree.java
@@ -114,7 +114,7 @@
     public Iterator<I> iterator()
     {
         if (head == null)
-            return Iterators.<I>emptyIterator();
+            return Collections.emptyIterator();
 
         return new TreeIterator(head);
     }
@@ -272,20 +272,21 @@
 
         protected I computeNext()
         {
-            if (current != null && current.hasNext())
-                return current.next();
+            while (true)
+            {
+                if (current != null && current.hasNext())
+                    return current.next();
 
-            IntervalNode node = stack.pollFirst();
-            if (node == null)
-                return endOfData();
+                IntervalNode node = stack.pollFirst();
+                if (node == null)
+                    return endOfData();
 
-            current = node.intersectsLeft.iterator();
+                current = node.intersectsLeft.iterator();
 
-            // We know this is the smaller not returned yet, but before doing
-            // its parent, we must do everyone on it's right.
-            gotoMinOf(node.right);
-
-            return computeNext();
+                // We know this is the smaller not returned yet, but before doing
+                // its parent, we must do everyone on it's right.
+                gotoMinOf(node.right);
+            }
         }
 
         private void gotoMinOf(IntervalNode node)
diff --git a/src/java/org/apache/cassandra/utils/IteratorWithLowerBound.java b/src/java/org/apache/cassandra/utils/IteratorWithLowerBound.java
new file mode 100644
index 0000000..85eeede
--- /dev/null
+++ b/src/java/org/apache/cassandra/utils/IteratorWithLowerBound.java
@@ -0,0 +1,24 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.utils;
+
+public interface IteratorWithLowerBound<In>
+{
+    In lowerBound();
+}
diff --git a/src/java/org/apache/cassandra/utils/JMXServerUtils.java b/src/java/org/apache/cassandra/utils/JMXServerUtils.java
new file mode 100644
index 0000000..dad757e
--- /dev/null
+++ b/src/java/org/apache/cassandra/utils/JMXServerUtils.java
@@ -0,0 +1,371 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.utils;
+
+import java.io.IOException;
+import java.lang.management.ManagementFactory;
+import java.lang.reflect.InvocationHandler;
+import java.lang.reflect.Proxy;
+import java.net.InetAddress;
+import java.rmi.*;
+import java.rmi.server.RMIClientSocketFactory;
+import java.rmi.server.RMIServerSocketFactory;
+import java.rmi.server.UnicastRemoteObject;
+import java.util.Arrays;
+import java.util.HashMap;
+import java.util.Map;
+import java.util.stream.Collectors;
+import javax.management.remote.*;
+import javax.management.remote.rmi.RMIConnectorServer;
+import javax.rmi.ssl.SslRMIClientSocketFactory;
+import javax.rmi.ssl.SslRMIServerSocketFactory;
+import javax.security.auth.Subject;
+
+import com.google.common.collect.ImmutableMap;
+import org.apache.commons.lang3.StringUtils;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import com.sun.jmx.remote.internal.RMIExporter;
+import com.sun.jmx.remote.security.JMXPluggableAuthenticator;
+import org.apache.cassandra.auth.jmx.AuthenticationProxy;
+import org.apache.cassandra.exceptions.ConfigurationException;
+import sun.rmi.registry.RegistryImpl;
+import sun.rmi.server.UnicastServerRef2;
+
+public class JMXServerUtils
+{
+    private static final Logger logger = LoggerFactory.getLogger(JMXServerUtils.class);
+
+    private static java.rmi.registry.Registry registry;
+
+    /**
+     * Creates a server programmatically. This allows us to set parameters which normally are
+     * inaccessible.
+     */
+    public static JMXConnectorServer createJMXServer(int port, boolean local)
+    throws IOException
+    {
+        Map<String, Object> env = new HashMap<>();
+
+        String urlTemplate = "service:jmx:rmi://%1$s/jndi/rmi://%1$s:%2$d/jmxrmi";
+        InetAddress serverAddress = null;
+        if (local)
+        {
+            serverAddress = InetAddress.getLoopbackAddress();
+            System.setProperty("java.rmi.server.hostname", serverAddress.getHostAddress());
+        }
+
+        // Configure the RMI client & server socket factories, including SSL config.
+        env.putAll(configureJmxSocketFactories(serverAddress, local));
+
+        // Configure authn, using a JMXAuthenticator which either wraps a set of LoginModules configured
+        // via a JAAS configuration entry, or one which delegates to the standard file based authenticator.
+        // Authn is disabled if com.sun.management.jmxremote.authenticate=false
+        env.putAll(configureJmxAuthentication());
+
+        // Configure authz - if a custom proxy class is specified an instance will be returned.
+        // If not, but a location for the standard access file is set in system properties, the
+        // return value is null, and an entry is added to the env map detailing that location.
+        // If neither method is specified, no access control is applied.
+        MBeanServerForwarder authzProxy = configureJmxAuthorization(env);
+
+        // Make sure we use our custom exporter so a full GC doesn't get scheduled every
+        // sun.rmi.dgc.server.gcInterval millis (default is 3600000ms/1 hour)
+        env.put(RMIExporter.EXPORTER_ATTRIBUTE, new Exporter());
+
+        String url = String.format(urlTemplate, (serverAddress != null ? serverAddress.getHostAddress() : "0.0.0.0"), port);
+
+        int rmiPort = Integer.getInteger("com.sun.management.jmxremote.rmi.port", 0);
+        JMXConnectorServer jmxServer =
+            JMXConnectorServerFactory.newJMXConnectorServer(new JMXServiceURL("rmi", null, rmiPort),
+                                                            env,
+                                                            ManagementFactory.getPlatformMBeanServer());
+
+        // If a custom authz proxy was created, attach it to the server now.
+        if (authzProxy != null)
+            jmxServer.setMBeanServerForwarder(authzProxy);
+
+        jmxServer.start();
+
+        // use a custom Registry to avoid having to interact with it internally using the remoting interface
+        configureRMIRegistry(port, env);
+
+        logger.info("Configured JMX server at: {}", url);
+        return jmxServer;
+    }
+
+    private static void configureRMIRegistry(int port, Map<String, Object> env) throws RemoteException
+    {
+        Exporter exporter = (Exporter)env.get(RMIExporter.EXPORTER_ATTRIBUTE);
+        // If ssl is enabled, make sure it's also in place for the RMI registry
+        // by using the SSL socket factories already created and stashed in env
+        if (Boolean.getBoolean("com.sun.management.jmxremote.ssl"))
+        {
+            registry = new Registry(port,
+                                   (RMIClientSocketFactory)env.get(RMIConnectorServer.RMI_CLIENT_SOCKET_FACTORY_ATTRIBUTE),
+                                   (RMIServerSocketFactory)env.get(RMIConnectorServer.RMI_SERVER_SOCKET_FACTORY_ATTRIBUTE),
+                                   exporter.connectorServer);
+        }
+        else
+        {
+            registry = new Registry(port, exporter.connectorServer);
+        }
+    }
+
+    private static Map<String, Object> configureJmxAuthentication()
+    {
+        Map<String, Object> env = new HashMap<>();
+        if (!Boolean.getBoolean("com.sun.management.jmxremote.authenticate"))
+            return env;
+
+        // If authentication is enabled, initialize the appropriate JMXAuthenticator
+        // and stash it in the environment settings.
+        // A JAAS configuration entry takes precedence. If one is supplied, use
+        // Cassandra's own custom JMXAuthenticator implementation which delegates
+        // auth to the LoginModules specified by the JAAS configuration entry.
+        // If no JAAS entry is found, an instance of the JDK's own
+        // JMXPluggableAuthenticator is created. In that case, the admin may have
+        // set a location for the JMX password file which must be added to env
+        // before creating the authenticator. If no password file has been
+        // explicitly set, it's read from the default location
+        // $JAVA_HOME/lib/management/jmxremote.password
+        String configEntry = System.getProperty("cassandra.jmx.remote.login.config");
+        if (configEntry != null)
+        {
+            env.put(JMXConnectorServer.AUTHENTICATOR, new AuthenticationProxy(configEntry));
+        }
+        else
+        {
+            String passwordFile = System.getProperty("com.sun.management.jmxremote.password.file");
+            if (passwordFile != null)
+            {
+                // stash the password file location where JMXPluggableAuthenticator expects it
+                env.put("jmx.remote.x.password.file", passwordFile);
+            }
+
+            env.put(JMXConnectorServer.AUTHENTICATOR, new JMXPluggableAuthenticatorWrapper(env));
+        }
+
+        return env;
+    }
+
+    private static MBeanServerForwarder configureJmxAuthorization(Map<String, Object> env)
+    {
+        // If a custom authz proxy is supplied (Cassandra ships with AuthorizationProxy, which
+        // delegates to its own role based IAuthorizer), then instantiate and return one which
+        // can be set as the JMXConnectorServer's MBeanServerForwarder.
+        // If no custom proxy is supplied, check system properties for the location of the
+        // standard access file & stash it in env
+        String authzProxyClass = System.getProperty("cassandra.jmx.authorizer");
+        if (authzProxyClass != null)
+        {
+            final InvocationHandler handler = FBUtilities.construct(authzProxyClass, "JMX authz proxy");
+            final Class[] interfaces = { MBeanServerForwarder.class };
+
+            Object proxy = Proxy.newProxyInstance(MBeanServerForwarder.class.getClassLoader(), interfaces, handler);
+            return MBeanServerForwarder.class.cast(proxy);
+        }
+        else
+        {
+            String accessFile = System.getProperty("com.sun.management.jmxremote.access.file");
+            if (accessFile != null)
+            {
+                env.put("jmx.remote.x.access.file", accessFile);
+            }
+            return null;
+        }
+    }
+
+    private static Map<String, Object> configureJmxSocketFactories(InetAddress serverAddress, boolean localOnly)
+    {
+        Map<String, Object> env = new HashMap<>();
+        if (Boolean.getBoolean("com.sun.management.jmxremote.ssl"))
+        {
+            boolean requireClientAuth = Boolean.getBoolean("com.sun.management.jmxremote.ssl.need.client.auth");
+            String[] protocols = null;
+            String protocolList = System.getProperty("com.sun.management.jmxremote.ssl.enabled.protocols");
+            if (protocolList != null)
+            {
+                System.setProperty("javax.rmi.ssl.client.enabledProtocols", protocolList);
+                protocols = StringUtils.split(protocolList, ',');
+            }
+
+            String[] ciphers = null;
+            String cipherList = System.getProperty("com.sun.management.jmxremote.ssl.enabled.cipher.suites");
+            if (cipherList != null)
+            {
+                System.setProperty("javax.rmi.ssl.client.enabledCipherSuites", cipherList);
+                ciphers = StringUtils.split(cipherList, ',');
+            }
+
+            SslRMIClientSocketFactory clientFactory = new SslRMIClientSocketFactory();
+            SslRMIServerSocketFactory serverFactory = new SslRMIServerSocketFactory(ciphers, protocols, requireClientAuth);
+            env.put(RMIConnectorServer.RMI_SERVER_SOCKET_FACTORY_ATTRIBUTE, serverFactory);
+            env.put(RMIConnectorServer.RMI_CLIENT_SOCKET_FACTORY_ATTRIBUTE, clientFactory);
+            env.put("com.sun.jndi.rmi.factory.socket", clientFactory);
+            logJmxSslConfig(serverFactory);
+        }
+        else if (localOnly)
+        {
+            env.put(RMIConnectorServer.RMI_SERVER_SOCKET_FACTORY_ATTRIBUTE,
+                    new RMIServerSocketFactoryImpl(serverAddress));
+        }
+
+        return env;
+    }
+
+    private static void logJmxSslConfig(SslRMIServerSocketFactory serverFactory)
+    {
+        logger.debug("JMX SSL configuration. { protocols: [{}], cipher_suites: [{}], require_client_auth: {} }",
+                     serverFactory.getEnabledProtocols() == null
+                     ? "'JVM defaults'"
+                     : Arrays.stream(serverFactory.getEnabledProtocols()).collect(Collectors.joining("','", "'", "'")),
+                     serverFactory.getEnabledCipherSuites() == null
+                     ? "'JVM defaults'"
+                     : Arrays.stream(serverFactory.getEnabledCipherSuites()).collect(Collectors.joining("','", "'", "'")),
+                     serverFactory.getNeedClientAuth());
+    }
+
+    private static class JMXPluggableAuthenticatorWrapper implements JMXAuthenticator
+    {
+        final Map<?, ?> env;
+        private JMXPluggableAuthenticatorWrapper(Map<?, ?> env)
+        {
+            this.env = ImmutableMap.copyOf(env);
+        }
+
+        public Subject authenticate(Object credentials)
+        {
+            JMXPluggableAuthenticator authenticator = new JMXPluggableAuthenticator(env);
+            return authenticator.authenticate(credentials);
+        }
+    }
+
+    /**
+     * In the RMI subsystem, the ObjectTable instance holds references to remote
+     * objects for distributed garbage collection purposes. When objects are
+     * added to the ObjectTable (exported), a flag is passed to indicate the
+     * "permanence" of that object. Exporting as permanent has two effects: the
+     * object is not eligible for distributed garbage collection, and its
+     * existence will not prevent the JVM from exiting after all non-daemon
+     * threads terminate. Neither of these is bad for our case, as we
+     * attach the server exactly once (i.e. at startup, not subsequently using
+     * the Attach API) and don't disconnect it before shutdown. The primary
+     * benefit we gain is that it doesn't trigger the scheduled full GC that
+     * is otherwise incurred by programmatically configuring the management server.
+     *
+     * To that end, we use this private implementation of RMIExporter to register
+     * our JMXConnectorServer as a permanent object by adding it to the map of
+     * environment variables under the key RMIExporter.EXPORTER_ATTRIBUTE
+     * (com.sun.jmx.remote.rmi.exporter) prior to calling server.start()
+     *
+     * See also:
+     *  * CASSANDRA-2967 for background
+     *  * https://www.jclarity.com/2015/01/27/rmi-system-gc-unplugged/ for more detail
+     *  * https://bugs.openjdk.java.net/browse/JDK-6760712 for info on setting the exporter
+     *  * sun.management.remote.ConnectorBootstrap to trace how the inbuilt management agent
+     *    sets up the JMXConnectorServer
+     */
+    private static class Exporter implements RMIExporter
+    {
+        // the first object to be exported by this instance is *always* the JMXConnectorServer
+        // instance created by createJMXServer. Keep a handle to it, as it needs to be supplied
+        // to our custom Registry too.
+        private Remote connectorServer;
+
+        public Remote exportObject(Remote obj, int port, RMIClientSocketFactory csf, RMIServerSocketFactory ssf)
+        throws RemoteException
+        {
+            Remote remote = new UnicastServerRef2(port, csf, ssf).exportObject(obj, null, true);
+            // Keep a reference to the first object exported, the JMXConnectorServer
+            if (connectorServer == null)
+                connectorServer = remote;
+
+            return remote;
+        }
+
+        public boolean unexportObject(Remote obj, boolean force) throws NoSuchObjectException
+        {
+            return UnicastRemoteObject.unexportObject(obj, force);
+        }
+    }
+
+    /**
+     * Using this class avoids the need to interact with the registry via its
+     * remoting interface. That matters because when SSL is enabled for the registry,
+     * such remote interaction is treated just the same as one from an external client.
+     * This is problematic when binding the JMXConnectorServer to the Registry, as it requires
+     * the client, which in this case is our own internal code, to connect like any other SSL
+     * client, meaning we would need a truststore containing our own certificate.
+     * Bypassing the binding API completely emulates the behaviour of ConnectorBootstrap
+     * when the subsystem is initialized by the JVM Agent directly.
+     *
+     * See CASSANDRA-12109.
+     */
+    private static class Registry extends RegistryImpl
+    {
+        private final static String KEY = "jmxrmi";
+        private final Remote connectorServer;
+
+        private Registry(int port, Remote connectorServer) throws RemoteException
+        {
+            super(port);
+            this.connectorServer = connectorServer;
+        }
+
+        private Registry(int port,
+                         RMIClientSocketFactory csf,
+                         RMIServerSocketFactory ssf,
+                         Remote connectorServer) throws RemoteException
+        {
+            super(port, csf, ssf);
+            this.connectorServer = connectorServer;
+        }
+
+        public Remote lookup(String name) throws RemoteException, NotBoundException
+        {
+            if (name.equals(KEY))
+                return connectorServer;
+
+            throw new NotBoundException(String.format("Only the JMX Connector Server named %s " +
+                                                      "is bound in this registry", KEY));
+        }
+
+        public void bind(String name, Remote obj) throws RemoteException, AlreadyBoundException
+        {
+            throw new UnsupportedOperationException("Unsupported");
+        }
+
+        public void unbind(String name) throws RemoteException, NotBoundException
+        {
+            throw new UnsupportedOperationException("Unsupported");
+        }
+
+        public void rebind(String name, Remote obj) throws RemoteException
+        {
+            throw new UnsupportedOperationException("Unsupported");
+        }
+
+        public String[] list() throws RemoteException
+        {
+            return new String[] {KEY};
+        }
+    }
+}
diff --git a/src/java/org/apache/cassandra/utils/JVMStabilityInspector.java b/src/java/org/apache/cassandra/utils/JVMStabilityInspector.java
index c06a97b..e1a109a 100644
--- a/src/java/org/apache/cassandra/utils/JVMStabilityInspector.java
+++ b/src/java/org/apache/cassandra/utils/JVMStabilityInspector.java
@@ -77,11 +77,12 @@
 
     public static void inspectCommitLogThrowable(Throwable t)
     {
-        if (!StorageService.instance.isSetupCompleted())
+        if (!StorageService.instance.isDaemonSetupCompleted())
         {
             logger.error("Exiting due to error while processing commit log during initialization.", t);
             killer.killCurrentJVM(t, true);
-        } else if (DatabaseDescriptor.getCommitFailurePolicy() == Config.CommitFailurePolicy.die)
+        }
+        else if (DatabaseDescriptor.getCommitFailurePolicy() == Config.CommitFailurePolicy.die)
             killer.killCurrentJVM(t);
         else
             inspectThrowable(t);
diff --git a/src/java/org/apache/cassandra/utils/MD5Digest.java b/src/java/org/apache/cassandra/utils/MD5Digest.java
index 2dc57de..2feb09e 100644
--- a/src/java/org/apache/cassandra/utils/MD5Digest.java
+++ b/src/java/org/apache/cassandra/utils/MD5Digest.java
@@ -17,9 +17,10 @@
  */
 package org.apache.cassandra.utils;
 
-import java.io.UnsupportedEncodingException;
+import java.nio.charset.StandardCharsets;
 import java.util.Arrays;
 
+
 /**
  * The result of the computation of an MD5 digest.
  *
@@ -51,14 +52,7 @@
 
     public static MD5Digest compute(String toHash)
     {
-        try
-        {
-            return compute(toHash.getBytes("UTF-8"));
-        }
-        catch (UnsupportedEncodingException e)
-        {
-            throw new RuntimeException(e.getMessage());
-        }
+        return compute(toHash.getBytes(StandardCharsets.UTF_8));
     }
 
     @Override
diff --git a/src/java/org/apache/cassandra/utils/MergeIterator.java b/src/java/org/apache/cassandra/utils/MergeIterator.java
index 0cc5306..c9e445b 100644
--- a/src/java/org/apache/cassandra/utils/MergeIterator.java
+++ b/src/java/org/apache/cassandra/utils/MergeIterator.java
@@ -17,11 +17,8 @@
  */
 package org.apache.cassandra.utils;
 
-import java.io.Closeable;
 import java.util.*;
 
-import org.apache.cassandra.utils.AbstractIterator;
-
 /** Merges sorted input iterators which individually contain unique items. */
 public abstract class MergeIterator<In,Out> extends AbstractIterator<Out> implements IMergeIterator<In, Out>
 {
@@ -203,7 +200,7 @@
 
             reducer.onKeyChange();
             assert !heap[0].equalParent;
-            reducer.reduce(heap[0].idx, heap[0].consume());
+            heap[0].consume(reducer);
             final int size = this.size;
             final int sortedSectionSize = Math.min(size, SORTED_SECTION_SIZE);
             int i;
@@ -212,7 +209,7 @@
                 {
                     if (!heap[i].equalParent)
                         break consume;
-                    reducer.reduce(heap[i].idx, heap[i].consume());
+                    heap[i].consume(reducer);
                 }
                 i = Math.max(i, consumeHeap(i) + 1);
             }
@@ -230,7 +227,7 @@
             if (idx >= size || !heap[idx].equalParent)
                 return -1;
 
-            reducer.reduce(heap[idx].idx, heap[idx].consume());
+            heap[idx].consume(reducer);
             int nextIdx = (idx << 1) - (SORTED_SECTION_SIZE - 1);
             return Math.max(idx, Math.max(consumeHeap(nextIdx), consumeHeap(nextIdx + 1)));
         }
@@ -354,6 +351,7 @@
         private final Comparator<? super In> comp;
         private final int idx;
         private In item;
+        private In lowerBound;
         boolean equalParent;
 
         public Candidate(int idx, Iterator<? extends In> iter, Comparator<? super In> comp)
@@ -361,29 +359,55 @@
             this.iter = iter;
             this.comp = comp;
             this.idx = idx;
+            this.lowerBound = iter instanceof IteratorWithLowerBound ? ((IteratorWithLowerBound<In>)iter).lowerBound() : null;
         }
 
         /** @return this if our iterator had an item, and it is now available, otherwise null */
         protected Candidate<In> advance()
         {
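+            // if the underlying iterator advertised a lower bound, surface that bound first
+            // without advancing the iterator; the real first item is only fetched after the
+            // bound has been consumed (and discarded) by the merge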
+            if (lowerBound != null)
+            {
+                item = lowerBound;
+                return this;
+            }
+
             if (!iter.hasNext())
                 return null;
+
             item = iter.next();
             return this;
         }
 
         public int compareTo(Candidate<In> that)
         {
-            assert item != null && that.item != null;
-            return comp.compare(this.item, that.item);
+            assert this.item != null && that.item != null;
+            int ret = comp.compare(this.item, that.item);
+            if (ret == 0 && (this.isLowerBound() ^ that.isLowerBound()))
+            {   // if the items are equal and one of them is a lower bound (but not the other one)
+                // then ensure the lower bound is less than the real item so we can safely
+                // skip lower bounds when consuming
+                return this.isLowerBound() ? -1 : 1;
+            }
+            return ret;
         }
 
-        public In consume()
+        private boolean isLowerBound()
         {
-            In temp = item;
-            item = null;
-            assert temp != null;
-            return temp;
+            return item == lowerBound;
+        }
+
+        public void consume(Reducer reducer)
+        {
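+            // lower bounds are synthetic placeholders used only for ordering; they are dropped
+            // here rather than passed to the reducer, so only genuine items get reduced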
+            if (isLowerBound())
+            {
+                item = null;
+                lowerBound = null;
+            }
+            else
+            {
+                reducer.reduce(idx, item);
+                item = null;
+            }
         }
 
         public boolean needsAdvance()
diff --git a/src/java/org/apache/cassandra/utils/MerkleTree.java b/src/java/org/apache/cassandra/utils/MerkleTree.java
index b3bccac..bc39b91 100644
--- a/src/java/org/apache/cassandra/utils/MerkleTree.java
+++ b/src/java/org/apache/cassandra/utils/MerkleTree.java
@@ -326,19 +326,28 @@
 
     TreeRange getHelper(Hashable hashable, Token pleft, Token pright, byte depth, Token t)
     {
-        if (hashable instanceof Leaf)
+        while (true)
         {
-            // we've reached a hash: wrap it up and deliver it
-            return new TreeRange(this, pleft, pright, depth, hashable);
-        }
-        // else: node.
+            if (hashable instanceof Leaf)
+            {
+                // we've reached a hash: wrap it up and deliver it
+                return new TreeRange(this, pleft, pright, depth, hashable);
+            }
+            // else: node.
 
-        Inner node = (Inner)hashable;
-        if (Range.contains(pleft, node.token, t))
-            // left child contains token
-            return getHelper(node.lchild, pleft, node.token, inc(depth), t);
-        // else: right child contains token
-        return getHelper(node.rchild, node.token, pright, inc(depth), t);
+            Inner node = (Inner) hashable;
+            depth = inc(depth);
+            if (Range.contains(pleft, node.token, t))
+            { // left child contains token
+                hashable = node.lchild;
+                pright = node.token;
+            }
+            else
+            { // else: right child contains token
+                hashable = node.rchild;
+                pleft = node.token;
+            }
+        }
     }
 
     /**
@@ -404,33 +413,42 @@
      */
     private Hashable findHelper(Hashable current, Range<Token> activeRange, Range<Token> find) throws StopRecursion
     {
-        if (current instanceof Leaf)
+        while (true)
         {
-            if (!find.contains(activeRange))
-                // we are not fully contained in this range!
+            if (current instanceof Leaf)
+            {
+                if (!find.contains(activeRange))
+                    // we are not fully contained in this range!
+                    throw new StopRecursion.BadRange();
+                return current;
+            }
+            // else: node.
+
+            Inner node = (Inner) current;
+            Range<Token> leftRange = new Range<>(activeRange.left, node.token);
+            Range<Token> rightRange = new Range<>(node.token, activeRange.right);
+
+            if (find.contains(activeRange))
+                // this node is fully contained in the range
+                return node.calc();
+
+            // else: one of our children contains the range
+
+            if (leftRange.contains(find))
+            { // left child contains/matches the range
+                current = node.lchild;
+                activeRange = leftRange;
+            }
+            else if (rightRange.contains(find))
+            { // right child contains/matches the range
+                current = node.rchild;
+                activeRange = rightRange;
+            }
+            else
+            {
                 throw new StopRecursion.BadRange();
-            return current;
+            }
         }
-        // else: node.
-
-        Inner node = (Inner)current;
-        Range<Token> leftRange = new Range<Token>(activeRange.left, node.token);
-        Range<Token> rightRange = new Range<Token>(node.token, activeRange.right);
-
-        if (find.contains(activeRange))
-            // this node is fully contained in the range
-            return node.calc();
-
-        // else: one of our children contains the range
-
-        if (leftRange.contains(find))
-            // left child contains/matches the range
-            return findHelper(node.lchild, leftRange, find);
-        else if (rightRange.contains(find))
-            // right child contains/matches the range
-            return findHelper(node.rchild, rightRange, find);
-        else
-            throw new StopRecursion.BadRange();
     }
 
     /**
diff --git a/src/java/org/apache/cassandra/utils/NanoTimeToCurrentTimeMillis.java b/src/java/org/apache/cassandra/utils/NanoTimeToCurrentTimeMillis.java
index f124383..69b7a47 100644
--- a/src/java/org/apache/cassandra/utils/NanoTimeToCurrentTimeMillis.java
+++ b/src/java/org/apache/cassandra/utils/NanoTimeToCurrentTimeMillis.java
@@ -19,6 +19,7 @@
 
 import java.util.concurrent.TimeUnit;
 
+import org.apache.cassandra.concurrent.ScheduledExecutors;
 import org.apache.cassandra.config.Config;
 
 import com.google.common.annotations.VisibleForTesting;
@@ -36,9 +37,6 @@
 
     private static volatile long TIMESTAMP_BASE[] = new long[] { System.currentTimeMillis(), System.nanoTime() };
 
-    @VisibleForTesting
-    public static final Object TIMESTAMP_UPDATE = new Object();
-
     /*
      * System.currentTimeMillis() is 25 nanoseconds. This is 2 nanoseconds (maybe) according to JMH.
      * Faster than calling both currentTimeMillis() and nanoTime().
@@ -48,41 +46,29 @@
      * These timestamps don't order with System.currentTimeMillis() because currentTimeMillis() can tick over
      * before this one does. I have seen it behind by as much as 2ms on Linux and 25ms on Windows.
      */
-    public static final long convert(long nanoTime)
+    public static long convert(long nanoTime)
     {
         final long timestampBase[] = TIMESTAMP_BASE;
         return timestampBase[0] + TimeUnit.NANOSECONDS.toMillis(nanoTime - timestampBase[1]);
     }
 
+    public static void updateNow()
+    {
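+        // asynchronously refresh the wall-clock/nanoTime base pair on the shared fast-task executor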
+        ScheduledExecutors.scheduledFastTasks.submit(NanoTimeToCurrentTimeMillis::updateTimestampBase);
+    }
+
     static
     {
-        //Pick up updates from NTP periodically
-        Thread t = new Thread("NanoTimeToCurrentTimeMillis updater")
-        {
-            @Override
-            public void run()
-            {
-                while (true)
-                {
-                    try
-                    {
-                        synchronized (TIMESTAMP_UPDATE)
-                        {
-                            TIMESTAMP_UPDATE.wait(TIMESTAMP_UPDATE_INTERVAL);
-                        }
-                    }
-                    catch (InterruptedException e)
-                    {
-                        return;
-                    }
+        ScheduledExecutors.scheduledFastTasks.scheduleWithFixedDelay(NanoTimeToCurrentTimeMillis::updateTimestampBase,
+                                                                     TIMESTAMP_UPDATE_INTERVAL,
+                                                                     TIMESTAMP_UPDATE_INTERVAL,
+                                                                     TimeUnit.MILLISECONDS);
+    }
 
-                    TIMESTAMP_BASE = new long[] {
-                            Math.max(TIMESTAMP_BASE[0], System.currentTimeMillis()),
-                            Math.max(TIMESTAMP_BASE[1], System.nanoTime()) };
-                }
-            }
-        };
-        t.setDaemon(true);
-        t.start();
+    private static void updateTimestampBase()
+    {
+        TIMESTAMP_BASE = new long[] {
+                                    Math.max(TIMESTAMP_BASE[0], System.currentTimeMillis()),
+                                    Math.max(TIMESTAMP_BASE[1], System.nanoTime()) };
     }
 }
diff --git a/src/java/org/apache/cassandra/utils/OverlapIterator.java b/src/java/org/apache/cassandra/utils/OverlapIterator.java
index b346a62..7c1544a 100644
--- a/src/java/org/apache/cassandra/utils/OverlapIterator.java
+++ b/src/java/org/apache/cassandra/utils/OverlapIterator.java
@@ -17,7 +17,7 @@
  * specific language governing permissions and limitations
  * under the License.
  *
-*/
+ */
 package org.apache.cassandra.utils;
 
 import java.util.*;
diff --git a/src/java/org/apache/cassandra/utils/RMIServerSocketFactoryImpl.java b/src/java/org/apache/cassandra/utils/RMIServerSocketFactoryImpl.java
index 6d4ff72..4ac4a39 100644
--- a/src/java/org/apache/cassandra/utils/RMIServerSocketFactoryImpl.java
+++ b/src/java/org/apache/cassandra/utils/RMIServerSocketFactoryImpl.java
@@ -26,14 +26,19 @@
 import java.rmi.server.RMIServerSocketFactory;
 import javax.net.ServerSocketFactory;
 
-
 public class RMIServerSocketFactoryImpl implements RMIServerSocketFactory
 {
+    // Address to bind server sockets to; may be null, indicating all local interfaces are to be bound
+    private final InetAddress bindAddress;
+
+    public RMIServerSocketFactoryImpl(InetAddress bindAddress)
+    {
+        this.bindAddress = bindAddress;
+    }
 
     public ServerSocket createServerSocket(final int pPort) throws IOException
     {
-        ServerSocket socket = ServerSocketFactory.getDefault()
-                                                 .createServerSocket(pPort, 0, InetAddress.getLoopbackAddress());
+        ServerSocket socket = ServerSocketFactory.getDefault().createServerSocket(pPort, 0, bindAddress);
         socket.setReuseAddress(true);
         return socket;
     }
@@ -57,3 +62,4 @@
         return RMIServerSocketFactoryImpl.class.hashCode();
     }
 }
+
diff --git a/src/java/org/apache/cassandra/utils/SearchIterator.java b/src/java/org/apache/cassandra/utils/SearchIterator.java
index 5309f4a..95cb33c 100644
--- a/src/java/org/apache/cassandra/utils/SearchIterator.java
+++ b/src/java/org/apache/cassandra/utils/SearchIterator.java
@@ -26,7 +26,7 @@
      * if this or any key greater has already been returned by the iterator, the method may
      * choose to return null, the correct or incorrect output, or fail an assertion.
      *
-     * it is permitted to search past the end of the iterator, i.e. !hasNext() => next(?) == null
+     * it is permitted to search past the end of the iterator, i.e. {@code !hasNext() => next(?) == null}
      *
      * @param key to search for
      * @return value associated with key, if present in direction of travel
diff --git a/src/java/org/apache/cassandra/utils/StreamingHistogram.java b/src/java/org/apache/cassandra/utils/StreamingHistogram.java
index b925395..a500450 100644
--- a/src/java/org/apache/cassandra/utils/StreamingHistogram.java
+++ b/src/java/org/apache/cassandra/utils/StreamingHistogram.java
@@ -39,7 +39,10 @@
     public static final StreamingHistogramSerializer serializer = new StreamingHistogramSerializer();
 
     // TreeMap to hold bins of histogram.
-    private final TreeMap<Double, Long> bin;
+    // The key is a numeric type so that streams of different key types can be added without boxing/unboxing
+    // The value is an unboxed long array, always of length == 1
+    // Serialized histograms are always written with double keys, for backwards compatibility
+    private final TreeMap<Number, long[]> bin;
 
     // maximum bin size for this histogram
     private final int maxBinSize;
@@ -51,22 +54,28 @@
     public StreamingHistogram(int maxBinSize)
     {
         this.maxBinSize = maxBinSize;
-        bin = new TreeMap<>();
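+        // keys of the same type compare via their natural ordering; mixed key types fall
+        // back to comparing their double values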
+        bin = new TreeMap<>((o1, o2) -> {
+            if (o1.getClass().equals(o2.getClass()))
+                return ((Comparable)o1).compareTo(o2);
+            else
+                return Double.compare(o1.doubleValue(), o2.doubleValue());
+        });
     }
 
     private StreamingHistogram(int maxBinSize, Map<Double, Long> bin)
     {
-        this.maxBinSize = maxBinSize;
-        this.bin = new TreeMap<>(bin);
+        this(maxBinSize);
+        for (Map.Entry<Double, Long> entry : bin.entrySet())
+            this.bin.put(entry.getKey(), new long[]{entry.getValue()});
     }
 
     /**
      * Adds new point p to this histogram.
      * @param p
      */
-    public void update(double p)
+    public void update(Number p)
     {
-        update(p, 1);
+        update(p, 1L);
     }
 
     /**
@@ -74,30 +83,31 @@
      * @param p
      * @param m
      */
-    public void update(double p, long m)
+    public void update(Number p, long m)
     {
-        Long mi = bin.get(p);
+        long[] mi = bin.get(p);
         if (mi != null)
         {
             // we found the same p so increment that counter
-            bin.put(p, mi + m);
+            mi[0] += m;
         }
         else
         {
-            bin.put(p, m);
+            mi = new long[]{m};
+            bin.put(p, mi);
             // if bin size exceeds maximum bin size then trim down to max size
             while (bin.size() > maxBinSize)
             {
                 // find points p1, p2 which have smallest difference
-                Iterator<Double> keys = bin.keySet().iterator();
-                double p1 = keys.next();
-                double p2 = keys.next();
+                Iterator<Number> keys = bin.keySet().iterator();
+                double p1 = keys.next().doubleValue();
+                double p2 = keys.next().doubleValue();
                 double smallestDiff = p2 - p1;
                 double q1 = p1, q2 = p2;
                 while (keys.hasNext())
                 {
                     p1 = p2;
-                    p2 = keys.next();
+                    p2 = keys.next().doubleValue();
                     double diff = p2 - p1;
                     if (diff < smallestDiff)
                     {
@@ -107,9 +117,13 @@
                     }
                 }
                 // merge those two
-                long k1 = bin.remove(q1);
-                long k2 = bin.remove(q2);
-                bin.put((q1 * k1 + q2 * k2) / (k1 + k2), k1 + k2);
+                long[] a1 = bin.remove(q1);
+                long[] a2 = bin.remove(q2);
+                long k1 = a1[0];
+                long k2 = a2[0];
+
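+                // reuse a1 as the merged bin's counter (k1 + k2), keyed by the weighted
+                // average of the two closest points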
+                a1[0] += k2;
+                bin.put((q1 * k1 + q2 * k2) / (k1 + k2), a1);
             }
         }
     }
@@ -124,8 +138,8 @@
         if (other == null)
             return;
 
-        for (Map.Entry<Double, Long> entry : other.getAsMap().entrySet())
-            update(entry.getKey(), entry.getValue());
+        for (Map.Entry<Number, long[]> entry : other.getAsMap().entrySet())
+            update(entry.getKey(), entry.getValue()[0]);
     }
 
     /**
@@ -138,32 +152,32 @@
     {
         double sum = 0;
         // find the points pi, pnext which satisfy pi <= b < pnext
-        Map.Entry<Double, Long> pnext = bin.higherEntry(b);
+        Map.Entry<Number, long[]> pnext = bin.higherEntry(b);
         if (pnext == null)
         {
             // if b is greater than any key in this histogram,
             // just count all appearance and return
-            for (Long value : bin.values())
-                sum += value;
+            for (long[] value : bin.values())
+                sum += value[0];
         }
         else
         {
-            Map.Entry<Double, Long> pi = bin.floorEntry(b);
+            Map.Entry<Number, long[]> pi = bin.floorEntry(b);
             if (pi == null)
                 return 0;
             // calculate estimated count mb for point b
-            double weight = (b - pi.getKey()) / (pnext.getKey() - pi.getKey());
-            double mb = pi.getValue() + (pnext.getValue() - pi.getValue()) * weight;
-            sum += (pi.getValue() + mb) * weight / 2;
+            double weight = (b - pi.getKey().doubleValue()) / (pnext.getKey().doubleValue() - pi.getKey().doubleValue());
+            double mb = pi.getValue()[0] + (pnext.getValue()[0] - pi.getValue()[0]) * weight;
+            sum += (pi.getValue()[0] + mb) * weight / 2;
 
-            sum += pi.getValue() / 2.0;
-            for (Long value : bin.headMap(pi.getKey(), false).values())
-                sum += value;
+            sum += pi.getValue()[0] / 2.0;
+            for (long[] value : bin.headMap(pi.getKey(), false).values())
+                sum += value[0];
         }
         return sum;
     }
 
-    public Map<Double, Long> getAsMap()
+    public Map<Number, long[]> getAsMap()
     {
         return Collections.unmodifiableMap(bin);
     }
@@ -173,12 +187,12 @@
         public void serialize(StreamingHistogram histogram, DataOutputPlus out) throws IOException
         {
             out.writeInt(histogram.maxBinSize);
-            Map<Double, Long> entries = histogram.getAsMap();
+            Map<Number, long[]> entries = histogram.getAsMap();
             out.writeInt(entries.size());
-            for (Map.Entry<Double, Long> entry : entries.entrySet())
+            for (Map.Entry<Number, long[]> entry : entries.entrySet())
             {
-                out.writeDouble(entry.getKey());
-                out.writeLong(entry.getValue());
+                out.writeDouble(entry.getKey().doubleValue());
+                out.writeLong(entry.getValue()[0]);
             }
         }
 
@@ -198,7 +212,7 @@
         public long serializedSize(StreamingHistogram histogram)
         {
             long size = TypeSizes.sizeof(histogram.maxBinSize);
-            Map<Double, Long> entries = histogram.getAsMap();
+            Map<Number, long[]> entries = histogram.getAsMap();
             size += TypeSizes.sizeof(entries.size());
             // size of entries = size * (8(double) + 8(long))
             size += entries.size() * (8L + 8L);
diff --git a/src/java/org/apache/cassandra/utils/UUIDGen.java b/src/java/org/apache/cassandra/utils/UUIDGen.java
index 3efcb5e..a8b3093 100644
--- a/src/java/org/apache/cassandra/utils/UUIDGen.java
+++ b/src/java/org/apache/cassandra/utils/UUIDGen.java
@@ -25,13 +25,13 @@
 import java.util.Collection;
 import java.util.Random;
 import java.util.UUID;
+import java.util.concurrent.atomic.AtomicLong;
 import java.util.concurrent.TimeUnit;
 
 import com.google.common.annotations.VisibleForTesting;
 import com.google.common.base.Charsets;
 import com.google.common.primitives.Ints;
 
-
 /**
  * The goods are here: www.ietf.org/rfc/rfc4122.txt.
  */
@@ -60,7 +60,7 @@
     // placement of this singleton is important.  It needs to be instantiated *AFTER* the other statics.
     private static final UUIDGen instance = new UUIDGen();
 
-    private long lastNanos;
+    private AtomicLong lastNanos = new AtomicLong();
 
     private UUIDGen()
     {
@@ -142,6 +142,15 @@
         return new UUID(raw.getLong(raw.position()), raw.getLong(raw.position() + 8));
     }
 
+    public static ByteBuffer toByteBuffer(UUID uuid)
+    {
+        ByteBuffer buffer = ByteBuffer.allocate(16);
+        buffer.putLong(uuid.getMostSignificantBits());
+        buffer.putLong(uuid.getLeastSignificantBits());
+        buffer.flip();
+        return buffer;
+    }
+
     /** decomposes a uuid into raw bytes. */
     public static byte[] decompose(UUID uuid)
     {
@@ -239,7 +248,7 @@
      * of a type 1 UUID (a time-based UUID).
      *
      * To specify a 100-nanoseconds precision timestamp, one should provide a milliseconds timestamp and
-     * a number 0 <= n < 10000 such that n*100 is the number of nanoseconds within that millisecond.
+     * a number {@code 0 <= n < 10000} such that n*100 is the number of nanoseconds within that millisecond.
      *
      * <p><i><b>Warning:</b> This method is not guaranteed to return unique UUIDs; Multiple
      * invocations using identical timestamps will result in identical UUIDs.</i></p>
@@ -294,15 +303,31 @@
 
     // needs to return two different values for the same when.
     // we can generate at most 10k UUIDs per ms.
-    private synchronized long createTimeSafe()
+    private long createTimeSafe()
     {
-        long nanosSince = (System.currentTimeMillis() - START_EPOCH) * 10000;
-        if (nanosSince > lastNanos)
-            lastNanos = nanosSince;
-        else
-            nanosSince = ++lastNanos;
-
-        return createTime(nanosSince);
+        long newLastNanos;
+        while (true)
+        {
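+            // lock-free scheme: either install the current wall-clock reading via CAS, or, if the
+            // clock has not advanced past lastNanos, claim the next 100ns slot with an atomic increment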
+            //Generate a candidate value for new lastNanos
+            newLastNanos = (System.currentTimeMillis() - START_EPOCH) * 10000;
+            long originalLastNanos = lastNanos.get();
+            if (newLastNanos > originalLastNanos)
+            {
+                //Slow path once per millisecond do a CAS
+                if (lastNanos.compareAndSet(originalLastNanos, newLastNanos))
+                {
+                    break;
+                }
+            }
+            else
+            {
+                //Fast path do an atomic increment
+                //Or when falling behind this will move time forward past the clock if necessary
+                newLastNanos = lastNanos.incrementAndGet();
+                break;
+            }
+        }
+        return createTime(newLastNanos);
     }
 
     private long createTimeUnsafe(long when, int nanos)
diff --git a/src/java/org/apache/cassandra/utils/WindowsTimer.java b/src/java/org/apache/cassandra/utils/WindowsTimer.java
index 9db8559..eed8eb2 100644
--- a/src/java/org/apache/cassandra/utils/WindowsTimer.java
+++ b/src/java/org/apache/cassandra/utils/WindowsTimer.java
@@ -51,7 +51,7 @@
             return;
         assert(period > 0);
         if (timeBeginPeriod(period) != 0)
-            logger.warn("Failed to set timer to : " + period + ". Performance will be degraded.");
+            logger.warn("Failed to set timer to : {}. Performance will be degraded.", period);
     }
 
     public static void endTimerPeriod(int period)
@@ -60,6 +60,6 @@
             return;
         assert(period > 0);
         if (timeEndPeriod(period) != 0)
-            logger.warn("Failed to end accelerated timer period. System timer will remain set to: " + period + " ms.");
+            logger.warn("Failed to end accelerated timer period. System timer will remain set to: {} ms.", period);
     }
 }
diff --git a/src/java/org/apache/cassandra/utils/btree/BTree.java b/src/java/org/apache/cassandra/utils/btree/BTree.java
index fe08011..33f4152 100644
--- a/src/java/org/apache/cassandra/utils/btree/BTree.java
+++ b/src/java/org/apache/cassandra/utils/btree/BTree.java
@@ -24,6 +24,7 @@
 import com.google.common.collect.Iterators;
 import com.google.common.collect.Ordering;
 
+import io.netty.util.Recycler;
 import org.apache.cassandra.utils.ObjectSizes;
 
 import static com.google.common.collect.Iterables.concat;
@@ -67,6 +68,8 @@
     // NB we encode Path indexes as Bytes, so this needs to be less than Byte.MAX_VALUE / 2
     static final int FAN_FACTOR = 1 << FAN_SHIFT;
 
+    static final int MINIMAL_NODE_SIZE = FAN_FACTOR >> 1;
+
     // An empty BTree Leaf - which is the same as an empty BTree
     static final Object[] EMPTY_LEAF = new Object[1];
 
@@ -137,12 +140,9 @@
             return values;
         }
 
-        Queue<TreeBuilder> queue = modifier.get();
-        TreeBuilder builder = queue.poll();
-        if (builder == null)
-            builder = new TreeBuilder();
+        TreeBuilder builder = TreeBuilder.newInstance();
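+        // TreeBuilder instances are pooled via a Recycler; build() returns this one to the pool when done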
         Object[] btree = builder.build(source, updateF, size);
-        queue.add(builder);
+
         return btree;
     }
 
@@ -174,12 +174,9 @@
         if (isEmpty(btree))
             return build(updateWith, updateWithLength, updateF);
 
-        Queue<TreeBuilder> queue = modifier.get();
-        TreeBuilder builder = queue.poll();
-        if (builder == null)
-            builder = new TreeBuilder();
+
+        TreeBuilder builder = TreeBuilder.newInstance();
         btree = builder.update(btree, comparator, updateWith, updateF);
-        queue.add(builder);
         return btree;
     }
 
@@ -201,12 +198,12 @@
 
     public static <V> Iterator<V> iterator(Object[] btree, Dir dir)
     {
-        return new BTreeSearchIterator<V, V>(btree, null, dir);
+        return new BTreeSearchIterator<>(btree, null, dir);
     }
 
     public static <V> Iterator<V> iterator(Object[] btree, int lb, int ub, Dir dir)
     {
-        return new BTreeSearchIterator<V, V>(btree, null, dir, lb, ub);
+        return new BTreeSearchIterator<>(btree, null, dir, lb, ub);
     }
 
     public static <V> Iterable<V> iterable(Object[] btree)
@@ -296,6 +293,40 @@
 
     /**
      * Modifies the provided btree directly. THIS SHOULD NOT BE USED WITHOUT EXTREME CARE as BTrees are meant to be immutable.
+     * Finds and replaces the item provided by index in the tree.
+     */
+    public static <V> void replaceInSitu(Object[] tree, int index, V replace)
+    {
+        // WARNING: if semantics change, see also InternalCursor.seekTo, which mirrors this implementation
+        if ((index < 0) | (index >= size(tree)))
+            throw new IndexOutOfBoundsException(index + " not in range [0.." + size(tree) + ")");
+
+        while (!isLeaf(tree))
+        {
+            final int[] sizeMap = getSizeMap(tree);
+            int boundary = Arrays.binarySearch(sizeMap, index);
+            if (boundary >= 0)
+            {
+                // exact match, in this branch node
+                assert boundary < sizeMap.length - 1;
+                tree[boundary] = replace;
+                return;
+            }
+
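+            // no exact match: descend into the child covering the index, making the index
+            // relative to that child's subtree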
+            boundary = -1 - boundary;
+            if (boundary > 0)
+            {
+                assert boundary < sizeMap.length;
+                index -= (1 + sizeMap[boundary - 1]);
+            }
+            tree = (Object[]) tree[getChildStart(tree) + boundary];
+        }
+        assert index < getLeafKeyEnd(tree);
+        tree[index] = replace;
+    }
+
+    /**
+     * Modifies the provided btree directly. THIS SHOULD NOT BE USED WITHOUT EXTREME CARE as BTrees are meant to be immutable.
      * Finds and replaces the provided item in the tree. Both should sort as equal to each other (although this is not enforced)
      */
     public static <V> void replaceInSitu(Object[] node, Comparator<? super V> comparator, V find, V replace)
@@ -735,28 +766,29 @@
         return 1 + lookupSizeMap(root, childIndex - 1);
     }
 
-    private static final ThreadLocal<Queue<TreeBuilder>> modifier = new ThreadLocal<Queue<TreeBuilder>>()
+    final static Recycler<Builder> builderRecycler = new Recycler<Builder>()
     {
-        @Override
-        protected Queue<TreeBuilder> initialValue()
+        protected Builder newObject(Handle handle)
         {
-            return new ArrayDeque<>();
+            return new Builder(handle);
         }
     };
 
     public static <V> Builder<V> builder(Comparator<? super V> comparator)
     {
-        return new Builder<>(comparator);
+        Builder<V> builder = builderRecycler.get();
+        builder.reuse(comparator);
+
+        return builder;
     }
 
     public static <V> Builder<V> builder(Comparator<? super V> comparator, int initialCapacity)
     {
-        return new Builder<>(comparator);
+        return builder(comparator);
     }
 
     public static class Builder<V>
     {
-
         // a user-defined bulk resolution, to be applied manually via resolve()
         public static interface Resolver
         {
@@ -781,16 +813,13 @@
         boolean detected = true; // true if we have managed to cheaply ensure sorted (+ filtered, if resolver == null) as we have added
         boolean auto = true; // false if the user has promised to enforce the sort order and resolve any duplicates
         QuickResolver<V> quickResolver;
+        final Recycler.Handle recycleHandle;
 
-        protected Builder(Comparator<? super V> comparator)
-        {
-            this(comparator, 16);
-        }
 
-        protected Builder(Comparator<? super V> comparator, int initialCapacity)
+        private Builder(Recycler.Handle handle)
         {
-            this.comparator = comparator;
-            this.values = new Object[initialCapacity];
+            this.recycleHandle = handle;
+            this.values = new Object[16];
         }
 
         public Builder<V> setQuickResolver(QuickResolver<V> quickResolver)
@@ -799,16 +828,30 @@
             return this;
         }
 
-        public void reuse()
+        public void recycle()
         {
-            reuse(comparator);
+            if (recycleHandle != null)
+            {
+                this.cleanup();
+                builderRecycler.recycle(this, recycleHandle);
+            }
         }
 
-        public void reuse(Comparator<? super V> comparator)
+        /**
+         * Cleans up the Builder instance before recycling it.
+         */
+        private void cleanup()
         {
-            this.comparator = comparator;
+            quickResolver = null;
+            Arrays.fill(values, 0, count, null);
             count = 0;
             detected = true;
+            auto = true;
+        }
+
+        private void reuse(Comparator<? super V> comparator)
+        {
+            this.comparator = comparator;
         }
 
         public Builder<V> auto(boolean auto)
@@ -1035,9 +1078,16 @@
 
         public Object[] build()
         {
-            if (auto)
-                autoEnforce();
-            return BTree.build(Arrays.asList(values).subList(0, count), UpdateFunction.noOp());
+            try
+            {
+                if (auto)
+                    autoEnforce();
+                return BTree.build(Arrays.asList(values).subList(0, count), UpdateFunction.noOp());
+            }
+            finally
+            {
+                this.recycle();
+            }
         }
     }
 
@@ -1073,11 +1123,20 @@
             return node.length >= FAN_FACTOR / 2 && node.length <= FAN_FACTOR + 1;
         }
 
+        final int keyCount = getBranchKeyEnd(node);
+        if ((!isRoot && keyCount < FAN_FACTOR / 2) || keyCount > FAN_FACTOR + 1)
+            return false;
+
         int type = 0;
+        int size = -1;
+        int[] sizeMap = getSizeMap(node);
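+        // also verify that the cached size map matches the actual cumulative sizes of the children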
         // compare each child node with the branch element at the head of this node it corresponds with
         for (int i = getChildStart(node); i < getChildEnd(node) ; i++)
         {
             Object[] child = (Object[]) node[i];
+            size += size(child) + 1;
+            if (sizeMap[i - getChildStart(node)] != size)
+                return false;
             Object localmax = i < node.length - 2 ? node[i - getChildStart(node)] : max;
             if (!isWellFormed(cmp, child, false, min, localmax))
                 return false;
diff --git a/src/java/org/apache/cassandra/utils/btree/BTreeRemoval.java b/src/java/org/apache/cassandra/utils/btree/BTreeRemoval.java
new file mode 100644
index 0000000..b72214f
--- /dev/null
+++ b/src/java/org/apache/cassandra/utils/btree/BTreeRemoval.java
@@ -0,0 +1,342 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.cassandra.utils.btree;
+
+import java.util.Arrays;
+import java.util.Comparator;
+
+public class BTreeRemoval
+{
+    /**
+     * Remove |elem| from |btree|. If it's not present then return |btree| itself.
+     */
+    public static <V> Object[] remove(final Object[] btree, final Comparator<? super V> comparator, final V elem)
+    {
+        if (BTree.isEmpty(btree))
+            return btree;
+        int index = -1;
+        V elemToSwap = null;
+        int lb = 0;
+        Object[] node = btree;
+        while (true)
+        {
+            int keyEnd = BTree.getKeyEnd(node);
+            int i = Arrays.binarySearch((V[]) node, 0, keyEnd, elem, comparator);
+
+            if (i >= 0)
+            {
+                if (BTree.isLeaf(node))
+                    index = lb + i;
+                else
+                {
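+                    // elem sits in a branch node: instead remove its in-order predecessor (always
+                    // in a leaf) and afterwards write that predecessor back over elem's slot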
+                    final int indexInNode = BTree.getSizeMap(node)[i];
+                    index = lb + indexInNode - 1;
+                    elemToSwap = BTree.findByIndex(node, indexInNode - 1);
+                }
+                break;
+            }
+            if (BTree.isLeaf(node))
+                return btree;
+
+            i = -1 - i;
+            if (i > 0)
+                lb += BTree.getSizeMap(node)[i - 1] + 1;
+
+            node = (Object[]) node[keyEnd + i];
+        }
+        if (BTree.size(btree) == 1)
+            return BTree.empty();
+        Object[] result = removeFromLeaf(btree, index);
+        if (elemToSwap != null)
+            BTree.replaceInSitu(result, index, elemToSwap);
+        return result;
+    }
+
+    /**
+     * Remove the element at position |index| from |node|. The element has to be present and to reside in a leaf node.
+     */
+    private static Object[] removeFromLeaf(Object[] node, int index)
+    {
+        Object[] result = null;
+        Object[] prevNode = null;
+        int prevI = -1;
+        boolean needsCopy = true;
+        while (!BTree.isLeaf(node))
+        {
+            final int keyEnd = BTree.getBranchKeyEnd(node);
+            int i = -1 - Arrays.binarySearch(BTree.getSizeMap(node), index);
+            if (i > 0)
+                index -= (1 + BTree.getSizeMap(node)[i - 1]);
+            Object[] nextNode = (Object[]) node[keyEnd + i];
+            boolean nextNodeNeedsCopy = true;
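+            // keep the descent path above minimal occupancy: descend directly if the child has spare
+            // keys, otherwise borrow from a sibling (rotate), and merge with a sibling as a last resort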
+            if (BTree.getKeyEnd(nextNode) > BTree.MINIMAL_NODE_SIZE)
+                node = copyIfNeeded(node, needsCopy);
+            else if (i > 0 && BTree.getKeyEnd((Object[]) node[keyEnd + i - 1]) > BTree.MINIMAL_NODE_SIZE)
+            {
+                node = copyIfNeeded(node, needsCopy);
+                final Object[] leftNeighbour = (Object[]) node[keyEnd + i - 1];
+                index++;
+                if (!BTree.isLeaf(leftNeighbour))
+                    index += BTree.size((Object[])leftNeighbour[BTree.getChildEnd(leftNeighbour) - 1]);
+                nextNode = rotateLeft(node, i);
+            }
+            else if (i < keyEnd && BTree.getKeyEnd((Object[]) node[keyEnd + i + 1]) > BTree.MINIMAL_NODE_SIZE)
+            {
+                node = copyIfNeeded(node, needsCopy);
+                nextNode = rotateRight(node, i);
+            }
+            else
+            {
+                nextNodeNeedsCopy = false;
+                if (i > 0)
+                {
+                    final Object[] leftNeighbour = (Object[]) node[keyEnd + i - 1];
+                    final Object nodeKey = node[i - 1];
+                    node = keyEnd == 1 ? null : copyWithKeyAndChildRemoved(node, i - 1, i - 1, false);
+                    nextNode = merge(leftNeighbour, nextNode, nodeKey);
+                    i = i - 1;
+                    index += BTree.size(leftNeighbour) + 1;
+                }
+                else
+                {
+                    final Object[] rightNeighbour = (Object[]) node[keyEnd + i + 1];
+                    final Object nodeKey = node[i];
+                    node = keyEnd == 1 ? null : copyWithKeyAndChildRemoved(node, i, i, false);
+                    nextNode = merge(nextNode, rightNeighbour, nodeKey);
+                }
+            }
+
+            if (node != null)
+            {
+                final int[] sizeMap = BTree.getSizeMap(node);
+                for (int j = i; j < sizeMap.length; ++j)
+                    sizeMap[j] -= 1;
+                if (prevNode != null)
+                    prevNode[prevI] = node;
+                else
+                    result = node;
+                prevNode = node;
+                prevI = BTree.getChildStart(node) + i;
+            }
+
+            node = nextNode;
+            needsCopy = nextNodeNeedsCopy;
+        }
+        final int keyEnd = BTree.getLeafKeyEnd(node);
+        final Object[] newLeaf = new Object[(keyEnd & 1) == 1 ? keyEnd : keyEnd - 1];
+        copyKeys(node, newLeaf, 0, index);
+        if (prevNode != null)
+            prevNode[prevI] = newLeaf;
+        else
+            result = newLeaf;
+        return result;
+    }
+
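+    /**
+     * Borrows from the right sibling: the parent's separating key moves down to the end of the
+     * under-filled child, the right sibling's first key moves up into the parent, and (for branch
+     * nodes) the right sibling's first child migrates across; the parent's size map is adjusted to match.
+     */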
+    private static Object[] rotateRight(final Object[] node, final int i)
+    {
+        final int keyEnd = BTree.getBranchKeyEnd(node);
+        final Object[] nextNode = (Object[]) node[keyEnd + i];
+        final Object[] rightNeighbour = (Object[]) node[keyEnd + i + 1];
+        final boolean leaves = BTree.isLeaf(nextNode);
+        final int nextKeyEnd = BTree.getKeyEnd(nextNode);
+        final Object[] newChild = leaves ? null : (Object[]) rightNeighbour[BTree.getChildStart(rightNeighbour)];
+        final Object[] newNextNode =
+                copyWithKeyAndChildInserted(nextNode, nextKeyEnd, node[i], BTree.getChildCount(nextNode), newChild);
+        node[i] = rightNeighbour[0];
+        node[keyEnd + i + 1] = copyWithKeyAndChildRemoved(rightNeighbour, 0, 0, true);
+        BTree.getSizeMap(node)[i] +=
+                leaves ? 1 : 1 + BTree.size((Object[]) newNextNode[BTree.getChildEnd(newNextNode) - 1]);
+        return newNextNode;
+    }
+
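+    /**
+     * Mirror of rotateRight, borrowing from the left sibling: the parent's separating key moves
+     * down to the front of the under-filled child and the left sibling's last key moves up into
+     * the parent.
+     */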
+    private static Object[] rotateLeft(final Object[] node, final int i)
+    {
+        final int keyEnd = BTree.getBranchKeyEnd(node);
+        final Object[] nextNode = (Object[]) node[keyEnd + i];
+        final Object[] leftNeighbour = (Object[]) node[keyEnd + i - 1];
+        final int leftNeighbourEndKey = BTree.getKeyEnd(leftNeighbour);
+        final boolean leaves = BTree.isLeaf(nextNode);
+        final Object[] newChild = leaves ? null : (Object[]) leftNeighbour[BTree.getChildEnd(leftNeighbour) - 1];
+        final Object[] newNextNode = copyWithKeyAndChildInserted(nextNode, 0, node[i - 1], 0, newChild);
+        node[i - 1] = leftNeighbour[leftNeighbourEndKey - 1];
+        node[keyEnd + i - 1] = copyWithKeyAndChildRemoved(leftNeighbour, leftNeighbourEndKey - 1, leftNeighbourEndKey, true);
+        BTree.getSizeMap(node)[i - 1] -= leaves ? 1 : 1 + BTree.getSizeMap(newNextNode)[0];
+        return newNextNode;
+    }
+
+    private static <V> Object[] copyWithKeyAndChildInserted(final Object[] node, final int keyIndex, final V key, final int childIndex, final Object[] child)
+    {
+        final boolean leaf = BTree.isLeaf(node);
+        final int keyEnd = BTree.getKeyEnd(node);
+        final Object[] copy;
+        if (leaf)
+            copy = new Object[keyEnd + ((keyEnd & 1) == 1 ? 2 : 1)];
+        else
+            copy = new Object[node.length + 2];
+
+        if (keyIndex > 0)
+            System.arraycopy(node, 0, copy, 0, keyIndex);
+        copy[keyIndex] = key;
+        if (keyIndex < keyEnd)
+            System.arraycopy(node, keyIndex, copy, keyIndex + 1, keyEnd - keyIndex);
+
+        if (!leaf)
+        {
+            if (childIndex > 0)
+                System.arraycopy(node,
+                                 BTree.getChildStart(node),
+                                 copy,
+                                 keyEnd + 1,
+                                 childIndex);
+            copy[keyEnd + 1 + childIndex] = child;
+            if (childIndex <= keyEnd)
+                System.arraycopy(node,
+                                 BTree.getChildStart(node) + childIndex,
+                                 copy,
+                                 keyEnd + childIndex + 2,
+                                 keyEnd - childIndex + 1);
+            final int[] sizeMap = BTree.getSizeMap(node);
+            final int[] newSizeMap = new int[sizeMap.length + 1];
+            if (childIndex > 0)
+                System.arraycopy(sizeMap, 0, newSizeMap, 0, childIndex);
+            final int childSize = BTree.size(child);
+            newSizeMap[childIndex] = childSize + ((childIndex == 0) ? 0 : newSizeMap[childIndex - 1] + 1);
+            for (int i = childIndex + 1; i < newSizeMap.length; ++i)
+                newSizeMap[i] = sizeMap[i - 1] + childSize + 1;
+            copy[copy.length - 1] = newSizeMap;
+        }
+        return copy;
+    }
+
+    private static Object[] copyWithKeyAndChildRemoved(final Object[] node, final int keyIndex, final int childIndex, final boolean substractSize)
+    {
+        final boolean leaf = BTree.isLeaf(node);
+        final Object[] newNode;
+        if (leaf)
+        {
+            final int keyEnd = BTree.getKeyEnd(node);
+            newNode = new Object[keyEnd - ((keyEnd & 1) == 1 ? 0 : 1)];
+        }
+        else
+        {
+            newNode = new Object[node.length - 2];
+        }
+        int offset = copyKeys(node, newNode, 0, keyIndex);
+        if (!leaf)
+        {
+            offset = copyChildren(node, newNode, offset, childIndex);
+            final int[] nodeSizeMap = BTree.getSizeMap(node);
+            final int[] newNodeSizeMap = new int[nodeSizeMap.length - 1];
+            int pos = 0;
+            final int sizeToRemove = BTree.size((Object[])node[BTree.getChildStart(node) + childIndex]) + 1;
+            for (int i = 0; i < nodeSizeMap.length; ++i)
+                if (i != childIndex)
+                    newNodeSizeMap[pos++] = nodeSizeMap[i] -
+                        ((substractSize && i > childIndex) ? sizeToRemove : 0);
+            newNode[offset] = newNodeSizeMap;
+        }
+        return newNode;
+    }
+
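+    /**
+     * Joins two minimally-sized siblings and the parent key that separated them into a single
+     * node, concatenating their keys, children and size maps.
+     */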
+    private static <V> Object[] merge(final Object[] left, final Object[] right, final V nodeKey)
+    {
+        assert BTree.getKeyEnd(left) == BTree.MINIMAL_NODE_SIZE;
+        assert BTree.getKeyEnd(right) == BTree.MINIMAL_NODE_SIZE;
+        final boolean leaves = BTree.isLeaf(left);
+        final Object[] result;
+        if (leaves)
+            result = new Object[BTree.MINIMAL_NODE_SIZE * 2 + 1];
+        else
+            result = new Object[left.length + right.length];
+        int offset = 0;
+        offset = copyKeys(left, result, offset);
+        result[offset++] = nodeKey;
+        offset = copyKeys(right, result, offset);
+        if (!leaves)
+        {
+            offset = copyChildren(left, result, offset);
+            offset = copyChildren(right, result, offset);
+            final int[] leftSizeMap = BTree.getSizeMap(left);
+            final int[] rightSizeMap = BTree.getSizeMap(right);
+            final int[] newSizeMap = new int[leftSizeMap.length + rightSizeMap.length];
+            offset = 0;
+            offset = copySizeMap(leftSizeMap, newSizeMap, offset, 0);
+            offset = copySizeMap(rightSizeMap, newSizeMap, offset, leftSizeMap[leftSizeMap.length - 1] + 1);
+            result[result.length - 1] = newSizeMap;
+        }
+        return result;
+    }
+
+    private static int copyKeys(final Object[] from, final Object[] to, final int offset)
+    {
+        final int keysCount = BTree.getKeyEnd(from);
+        System.arraycopy(from, 0, to, offset, keysCount);
+        return offset + keysCount;
+    }
+
+    private static int copyKeys(final Object[] from, final Object[] to, final int offset, final int skipIndex)
+    {
+        final int keysCount = BTree.getKeyEnd(from);
+        if (skipIndex > 0)
+            System.arraycopy(from, 0, to, offset, skipIndex);
+        if (skipIndex + 1 < keysCount)
+            System.arraycopy(from, skipIndex + 1, to, offset + skipIndex, keysCount - skipIndex - 1);
+        return offset + keysCount - 1;
+    }
+
+    private static int copyChildren(final Object[] from, final Object[] to, final int offset)
+    {
+        assert !BTree.isLeaf(from);
+        final int start = BTree.getChildStart(from);
+        final int childCount = BTree.getChildCount(from);
+        System.arraycopy(from, start, to, offset, childCount);
+        return offset + childCount;
+    }
+
+    private static int copyChildren(final Object[] from, final Object[] to, final int offset, final int skipIndex)
+    {
+        assert !BTree.isLeaf(from);
+        final int start = BTree.getChildStart(from);
+        final int childCount = BTree.getChildCount(from);
+        if (skipIndex > 0)
+            System.arraycopy(from, start, to, offset, skipIndex);
+        if (skipIndex + 1 <= childCount)
+            System.arraycopy(from, start + skipIndex + 1, to, offset + skipIndex, childCount - skipIndex - 1);
+        return offset + childCount - 1;
+    }
+
+    private static int copySizeMap(final int[] from, final int[] to, final int offset, final int extra)
+    {
+        for (int i = 0; i < from.length; ++i)
+            to[offset + i] = from[i] + extra;
+        return offset + from.length;
+    }
+
+    private static Object[] copyIfNeeded(final Object[] node, boolean needCopy)
+    {
+        if (!needCopy) return node;
+        final Object[] copy = new Object[node.length];
+        System.arraycopy(node, 0, copy, 0, node.length);
+        if (!BTree.isLeaf(node))
+        {
+            final int[] sizeMap = BTree.getSizeMap(node);
+            final int[] copySizeMap = new int[sizeMap.length];
+            System.arraycopy(sizeMap, 0, copySizeMap, 0, sizeMap.length);
+            copy[copy.length - 1] = copySizeMap;
+        }
+        return copy;
+    }
+}
diff --git a/src/java/org/apache/cassandra/utils/btree/BTreeSet.java b/src/java/org/apache/cassandra/utils/btree/BTreeSet.java
index 03fa1ec..a59e481 100644
--- a/src/java/org/apache/cassandra/utils/btree/BTreeSet.java
+++ b/src/java/org/apache/cassandra/utils/btree/BTreeSet.java
@@ -21,14 +21,11 @@
 import java.util.*;
 
 import com.google.common.collect.ImmutableList;
-import com.google.common.collect.Iterables;
 import com.google.common.collect.Ordering;
 
 import org.apache.cassandra.utils.btree.BTree.Dir;
 
 import static org.apache.cassandra.utils.btree.BTree.findIndex;
-import static org.apache.cassandra.utils.btree.BTree.lower;
-import static org.apache.cassandra.utils.btree.BTree.toArray;
 
 public class BTreeSet<V> implements NavigableSet<V>, List<V>
 {
@@ -639,6 +636,6 @@
 
     public static <V> BTreeSet<V> of(Comparator<? super V> comparator, V value)
     {
-        return new BTreeSet<>(BTree.build(ImmutableList.of(value), UpdateFunction.<V>noOp()), comparator);
+        return new BTreeSet<>(BTree.singleton(value), comparator);
     }
 }
diff --git a/src/java/org/apache/cassandra/utils/btree/TreeBuilder.java b/src/java/org/apache/cassandra/utils/btree/TreeBuilder.java
index 024902e..f42de0f 100644
--- a/src/java/org/apache/cassandra/utils/btree/TreeBuilder.java
+++ b/src/java/org/apache/cassandra/utils/btree/TreeBuilder.java
@@ -20,6 +20,8 @@
 
 import java.util.Comparator;
 
+import io.netty.util.Recycler;
+
 import static org.apache.cassandra.utils.btree.BTree.EMPTY_LEAF;
 import static org.apache.cassandra.utils.btree.BTree.FAN_SHIFT;
 import static org.apache.cassandra.utils.btree.BTree.POSITIVE_INFINITY;
@@ -28,12 +30,32 @@
  * A class for constructing a new BTree, either from an existing one and some set of modifications
  * or a new tree from a sorted collection of items.
  * <p/>
- * This is a fairly heavy-weight object, so a ThreadLocal instance is created for making modifications to a tree
+ * This is a fairly heavy-weight object, so instances are pooled and reused via a Recycler when making modifications to a tree
  */
 final class TreeBuilder
 {
+
+    private final static Recycler<TreeBuilder> builderRecycler = new Recycler<TreeBuilder>()
+    {
+        protected TreeBuilder newObject(Handle handle)
+        {
+            return new TreeBuilder(handle);
+        }
+    };
+
+    public static TreeBuilder newInstance()
+    {
+        return builderRecycler.get();
+    }
+
+    private final Recycler.Handle recycleHandle;
     private final NodeBuilder rootBuilder = new NodeBuilder();
 
+    private TreeBuilder(Recycler.Handle handle)
+    {
+        this.recycleHandle = handle;
+    }
+
     /**
      * At the highest level, we adhere to the classic b-tree insertion algorithm:
      *
@@ -93,6 +115,9 @@
 
         Object[] r = current.toNode();
         current.clear();
+
+        builderRecycler.recycle(this, recycleHandle);
+
         return r;
     }
 
@@ -114,6 +139,9 @@
 
         Object[] r = current.toNode();
         current.clear();
+
+        builderRecycler.recycle(this, recycleHandle);
+
         return r;
     }
 }
diff --git a/src/java/org/apache/cassandra/utils/btree/TreeCursor.java b/src/java/org/apache/cassandra/utils/btree/TreeCursor.java
index 5e55698..60c0eb9 100644
--- a/src/java/org/apache/cassandra/utils/btree/TreeCursor.java
+++ b/src/java/org/apache/cassandra/utils/btree/TreeCursor.java
@@ -219,8 +219,7 @@
             return;
         }
 
-        NodeCursor<K> cur = this.cur;
-        cur = root();
+        NodeCursor<K> cur = root();
         assert cur.nodeOffset == 0;
         while (true)
         {
diff --git a/src/java/org/apache/cassandra/utils/concurrent/Ref.java b/src/java/org/apache/cassandra/utils/concurrent/Ref.java
index 71d04f0..c3dd8b2 100644
--- a/src/java/org/apache/cassandra/utils/concurrent/Ref.java
+++ b/src/java/org/apache/cassandra/utils/concurrent/Ref.java
@@ -68,13 +68,14 @@
  * This class' functionality is achieved by what may look at first glance like a complex web of references,
  * but boils down to:
  *
+ * {@code
  * Target --> selfRef --> [Ref.State] <--> Ref.GlobalState --> Tidy
  *                                             ^
  *                                             |
  * Ref ----------------------------------------
  *                                             |
  * Global -------------------------------------
- *
+ * }
  * So that, if Target is collected, Impl is collected and, hence, so is selfRef.
  *
  * Once ref or selfRef are collected, the paired Ref.State's release method is called, which if it had
diff --git a/src/java/org/apache/cassandra/utils/concurrent/WrappedSharedCloseable.java b/src/java/org/apache/cassandra/utils/concurrent/WrappedSharedCloseable.java
index 0eefae3..31894b1 100644
--- a/src/java/org/apache/cassandra/utils/concurrent/WrappedSharedCloseable.java
+++ b/src/java/org/apache/cassandra/utils/concurrent/WrappedSharedCloseable.java
@@ -20,8 +20,6 @@
 
 import java.util.Arrays;
 
-import org.apache.cassandra.utils.Throwables;
-
 import static org.apache.cassandra.utils.Throwables.maybeFail;
 import static org.apache.cassandra.utils.Throwables.merge;
 
@@ -35,7 +33,7 @@
 
     public WrappedSharedCloseable(final AutoCloseable closeable)
     {
-        this(new AutoCloseable[] { closeable});
+        this(new AutoCloseable[] {closeable});
     }
 
     public WrappedSharedCloseable(final AutoCloseable[] closeable)
diff --git a/src/java/org/apache/cassandra/utils/memory/AbstractAllocator.java b/src/java/org/apache/cassandra/utils/memory/AbstractAllocator.java
index 9066335..c3cac2b 100644
--- a/src/java/org/apache/cassandra/utils/memory/AbstractAllocator.java
+++ b/src/java/org/apache/cassandra/utils/memory/AbstractAllocator.java
@@ -20,7 +20,6 @@
 import java.nio.ByteBuffer;
 
 import org.apache.cassandra.db.Clustering;
-import org.apache.cassandra.db.Columns;
 import org.apache.cassandra.db.rows.BTreeRow;
 import org.apache.cassandra.db.rows.Cell;
 import org.apache.cassandra.db.rows.Row;
diff --git a/src/java/org/apache/cassandra/utils/memory/BufferPool.java b/src/java/org/apache/cassandra/utils/memory/BufferPool.java
index f972059..3458c62 100644
--- a/src/java/org/apache/cassandra/utils/memory/BufferPool.java
+++ b/src/java/org/apache/cassandra/utils/memory/BufferPool.java
@@ -21,22 +21,23 @@
 import java.lang.ref.PhantomReference;
 import java.lang.ref.ReferenceQueue;
 import java.nio.ByteBuffer;
-import java.util.*;
+import java.util.Queue;
 import java.util.concurrent.*;
 import java.util.concurrent.atomic.AtomicLong;
 import java.util.concurrent.atomic.AtomicLongFieldUpdater;
 
-import org.apache.cassandra.concurrent.NamedThreadFactory;
-import org.apache.cassandra.io.compress.BufferType;
-import org.apache.cassandra.io.util.FileUtils;
-import org.apache.cassandra.utils.NoSpamLogger;
-
 import com.google.common.annotations.VisibleForTesting;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
+import io.netty.util.concurrent.FastThreadLocal;
+import org.apache.cassandra.concurrent.NamedThreadFactory;
 import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.io.compress.BufferType;
+import org.apache.cassandra.io.util.FileUtils;
 import org.apache.cassandra.metrics.BufferPoolMetrics;
+import org.apache.cassandra.utils.FBUtilities;
+import org.apache.cassandra.utils.NoSpamLogger;
 import org.apache.cassandra.utils.concurrent.Ref;
 
 /**
@@ -45,7 +46,7 @@
 public class BufferPool
 {
     /** The size of a page aligned buffer, 64KiB */
-    static final int CHUNK_SIZE = 64 << 10;
+    public static final int CHUNK_SIZE = 64 << 10;
 
     @VisibleForTesting
     public static long MEMORY_USAGE_THRESHOLD = DatabaseDescriptor.getFileCacheSizeInMB() * 1024L * 1024L;
@@ -67,7 +68,7 @@
     private static final GlobalPool globalPool = new GlobalPool();
 
     /** A thread local pool of chunks, where chunks come from the global pool */
-    private static final ThreadLocal<LocalPool> localPool = new ThreadLocal<LocalPool>() {
+    private static final FastThreadLocal<LocalPool> localPool = new FastThreadLocal<LocalPool>() {
         @Override
         protected LocalPool initialValue()
         {
@@ -115,7 +116,7 @@
             return ret;
 
         if (logger.isTraceEnabled())
-            logger.trace("Requested buffer size {} has been allocated directly due to lack of capacity", size);
+            logger.trace("Requested buffer size {} has been allocated directly due to lack of capacity", FBUtilities.prettyPrintMemory(size));
 
         return localPool.get().allocate(size, allocateOnHeapWhenExhausted);
     }
@@ -131,7 +132,9 @@
         if (size > CHUNK_SIZE)
         {
             if (logger.isTraceEnabled())
-                logger.trace("Requested buffer size {} is bigger than {}, allocating directly", size, CHUNK_SIZE);
+                logger.trace("Requested buffer size {} is bigger than {}, allocating directly",
+                             FBUtilities.prettyPrintMemory(size),
+                             FBUtilities.prettyPrintMemory(CHUNK_SIZE));
 
             return localPool.get().allocate(size, allocateOnHeapWhenExhausted);
         }
@@ -223,8 +226,8 @@
             if (DISABLED)
                 logger.info("Global buffer pool is disabled, allocating {}", ALLOCATE_ON_HEAP_WHEN_EXAHUSTED ? "on heap" : "off heap");
             else
-                logger.info("Global buffer pool is enabled, when pool is exahusted (max is {} mb) it will allocate {}",
-                            MEMORY_USAGE_THRESHOLD / (1024L * 1024L),
+                logger.info("Global buffer pool is enabled, when pool is exhausted (max is {}) it will allocate {}",
+                            FBUtilities.prettyPrintMemory(MEMORY_USAGE_THRESHOLD),
                             ALLOCATE_ON_HEAP_WHEN_EXAHUSTED ? "on heap" : "off heap");
         }
 
@@ -260,8 +263,9 @@
                 long cur = memoryUsage.get();
                 if (cur + MACRO_CHUNK_SIZE > MEMORY_USAGE_THRESHOLD)
                 {
-                    noSpamLogger.info("Maximum memory usage reached ({} bytes), cannot allocate chunk of {} bytes",
-                                      MEMORY_USAGE_THRESHOLD, MACRO_CHUNK_SIZE);
+                    noSpamLogger.info("Maximum memory usage reached ({}), cannot allocate chunk of {}",
+                                      FBUtilities.prettyPrintMemory(MEMORY_USAGE_THRESHOLD),
+                                      FBUtilities.prettyPrintMemory(MACRO_CHUNK_SIZE));
                     return false;
                 }
                 if (memoryUsage.compareAndSet(cur, cur + MACRO_CHUNK_SIZE))
@@ -269,7 +273,22 @@
             }
 
             // allocate a large chunk
-            Chunk chunk = new Chunk(allocateDirectAligned(MACRO_CHUNK_SIZE));
+            Chunk chunk;
+            try
+            {
+                chunk = new Chunk(allocateDirectAligned(MACRO_CHUNK_SIZE));
+            }
+            catch (OutOfMemoryError oom)
+            {
+                noSpamLogger.error("Buffer pool failed to allocate chunk of {}, current size {} ({}). " +
+                                   "Attempting to continue; buffers will be allocated in on-heap memory which can degrade performance. " +
+                                   "Make sure direct memory size (-XX:MaxDirectMemorySize) is large enough to accommodate off-heap memtables and caches.",
+                                   FBUtilities.prettyPrintMemory(MACRO_CHUNK_SIZE),
+                                   FBUtilities.prettyPrintMemory(sizeInBytes()),
+                                   oom.toString());
+                return false;
+            }
+
             chunk.acquire(null);
             macroChunks.add(chunk);
             for (int i = 0 ; i < MACRO_CHUNK_SIZE ; i += CHUNK_SIZE)
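
Two themes run through the BufferPool hunk: the thread-local pool moves from java.lang.ThreadLocal to Netty's FastThreadLocal, and byte counts in log messages are routed through FBUtilities.prettyPrintMemory. A minimal sketch of the FastThreadLocal idiom, mirroring the initialValue() override used for localPool above (the class, field, and buffer size are illustrative only):

    import io.netty.util.concurrent.FastThreadLocal;

    final class PerThreadScratch
    {
        // Faster than ThreadLocal when the accessing thread is a Netty FastThreadLocalThread;
        // falls back to an ordinary thread-local lookup on other threads.
        private static final FastThreadLocal<byte[]> BUFFER = new FastThreadLocal<byte[]>()
        {
            @Override
            protected byte[] initialValue()
            {
                return new byte[4096]; // illustrative size
            }
        };

        static byte[] buffer()
        {
            return BUFFER.get();
        }
    }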
diff --git a/src/java/org/apache/cassandra/utils/memory/EnsureOnHeap.java b/src/java/org/apache/cassandra/utils/memory/EnsureOnHeap.java
new file mode 100644
index 0000000..8345118
--- /dev/null
+++ b/src/java/org/apache/cassandra/utils/memory/EnsureOnHeap.java
@@ -0,0 +1,170 @@
+/*
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+package org.apache.cassandra.utils.memory;
+
+import java.util.Iterator;
+
+import org.apache.cassandra.db.BufferDecoratedKey;
+import org.apache.cassandra.db.Clustering;
+import org.apache.cassandra.db.DecoratedKey;
+import org.apache.cassandra.db.DeletionInfo;
+import org.apache.cassandra.db.rows.*;
+import org.apache.cassandra.db.transform.Transformation;
+import org.apache.cassandra.utils.SearchIterator;
+
+public abstract class EnsureOnHeap extends Transformation
+{
+    public abstract DecoratedKey applyToPartitionKey(DecoratedKey key);
+    public abstract UnfilteredRowIterator applyToPartition(UnfilteredRowIterator partition);
+    public abstract SearchIterator<Clustering, Row> applyToPartition(SearchIterator<Clustering, Row> partition);
+    public abstract Iterator<Row> applyToPartition(Iterator<Row> partition);
+    public abstract DeletionInfo applyToDeletionInfo(DeletionInfo deletionInfo);
+    public abstract Row applyToRow(Row row);
+    public abstract Row applyToStatic(Row row);
+    public abstract RangeTombstoneMarker applyToMarker(RangeTombstoneMarker marker);
+
+    static class CloneToHeap extends EnsureOnHeap
+    {
+        protected BaseRowIterator<?> applyToPartition(BaseRowIterator partition)
+        {
+            return partition instanceof UnfilteredRowIterator
+                   ? Transformation.apply((UnfilteredRowIterator) partition, this)
+                   : Transformation.apply((RowIterator) partition, this);
+        }
+
+        public DecoratedKey applyToPartitionKey(DecoratedKey key)
+        {
+            return new BufferDecoratedKey(key.getToken(), HeapAllocator.instance.clone(key.getKey()));
+        }
+
+        public Row applyToRow(Row row)
+        {
+            if (row == null)
+                return null;
+            return Rows.copy(row, HeapAllocator.instance.cloningBTreeRowBuilder()).build();
+        }
+
+        public Row applyToStatic(Row row)
+        {
+            if (row == Rows.EMPTY_STATIC_ROW)
+                return row;
+            return applyToRow(row);
+        }
+
+        public RangeTombstoneMarker applyToMarker(RangeTombstoneMarker marker)
+        {
+            return marker.copy(HeapAllocator.instance);
+        }
+
+        public UnfilteredRowIterator applyToPartition(UnfilteredRowIterator partition)
+        {
+            return Transformation.apply(partition, this);
+        }
+
+        public SearchIterator<Clustering, Row> applyToPartition(SearchIterator<Clustering, Row> partition)
+        {
+            return new SearchIterator<Clustering, Row>()
+            {
+                public boolean hasNext()
+                {
+                    return partition.hasNext();
+                }
+
+                public Row next(Clustering key)
+                {
+                    return applyToRow(partition.next(key));
+                }
+            };
+        }
+
+        public Iterator<Row> applyToPartition(Iterator<Row> partition)
+        {
+            return new Iterator<Row>()
+            {
+                public boolean hasNext()
+                {
+                    return partition.hasNext();
+                }
+                public Row next()
+                {
+                    return applyToRow(partition.next());
+                }
+                public void remove()
+                {
+                    partition.remove();
+                }
+            };
+        }
+
+        public DeletionInfo applyToDeletionInfo(DeletionInfo deletionInfo)
+        {
+            return deletionInfo.copy(HeapAllocator.instance);
+        }
+    }
+
+    static class NoOp extends EnsureOnHeap
+    {
+        protected BaseRowIterator<?> applyToPartition(BaseRowIterator partition)
+        {
+            return partition;
+        }
+
+        public DecoratedKey applyToPartitionKey(DecoratedKey key)
+        {
+            return key;
+        }
+
+        public Row applyToRow(Row row)
+        {
+            return row;
+        }
+
+        public Row applyToStatic(Row row)
+        {
+            return row;
+        }
+
+        public RangeTombstoneMarker applyToMarker(RangeTombstoneMarker marker)
+        {
+            return marker;
+        }
+
+        public UnfilteredRowIterator applyToPartition(UnfilteredRowIterator partition)
+        {
+            return partition;
+        }
+
+        public SearchIterator<Clustering, Row> applyToPartition(SearchIterator<Clustering, Row> partition)
+        {
+            return partition;
+        }
+
+        public Iterator<Row> applyToPartition(Iterator<Row> partition)
+        {
+            return partition;
+        }
+
+        public DeletionInfo applyToDeletionInfo(DeletionInfo deletionInfo)
+        {
+            return deletionInfo;
+        }
+    }
+}
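
EnsureOnHeap is a Transformation with two concrete flavours: CloneToHeap deep-copies keys, rows, and markers via HeapAllocator.instance, while NoOp passes everything through for allocators whose data already lives on the heap. A hedged caller-side sketch, assuming a MemtableAllocator in scope as in the surrounding memtable code (the helper name is hypothetical):

    // Sketch only: make a row iterator safe to use after it leaves the memtable's
    // off-heap region by routing it through the allocator's EnsureOnHeap.
    static Iterator<Row> copyRowsToHeap(MemtableAllocator allocator, Iterator<Row> rows)
    {
        EnsureOnHeap ensureOnHeap = allocator.ensureOnHeap();
        return ensureOnHeap.applyToPartition(rows);
    }

Which flavour comes back is the allocator's decision, so on-heap memtables pay no copying cost at the same call site.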
diff --git a/src/java/org/apache/cassandra/utils/memory/HeapAllocator.java b/src/java/org/apache/cassandra/utils/memory/HeapAllocator.java
index 41877f5..8333142 100644
--- a/src/java/org/apache/cassandra/utils/memory/HeapAllocator.java
+++ b/src/java/org/apache/cassandra/utils/memory/HeapAllocator.java
@@ -33,4 +33,9 @@
     {
         return ByteBuffer.allocate(size);
     }
+
+    public boolean allocatingOnHeap()
+    {
+        return true;
+    }
 }
diff --git a/src/java/org/apache/cassandra/utils/memory/HeapPool.java b/src/java/org/apache/cassandra/utils/memory/HeapPool.java
index 19f81be..593b443 100644
--- a/src/java/org/apache/cassandra/utils/memory/HeapPool.java
+++ b/src/java/org/apache/cassandra/utils/memory/HeapPool.java
@@ -32,11 +32,6 @@
         super(maxOnHeapMemory, 0, cleanupThreshold, cleaner);
     }
 
-    public boolean needToCopyOnHeap()
-    {
-        return false;
-    }
-
     public MemtableAllocator newAllocator()
     {
         // TODO
diff --git a/src/java/org/apache/cassandra/utils/memory/MemoryUtil.java b/src/java/org/apache/cassandra/utils/memory/MemoryUtil.java
index 22ecbf5..3a18964 100644
--- a/src/java/org/apache/cassandra/utils/memory/MemoryUtil.java
+++ b/src/java/org/apache/cassandra/utils/memory/MemoryUtil.java
@@ -31,7 +31,7 @@
     private static final long UNSAFE_COPY_THRESHOLD = 1024 * 1024L; // copied from java.nio.Bits
 
     private static final Unsafe unsafe;
-    private static final Class<?> DIRECT_BYTE_BUFFER_CLASS;
+    private static final Class<?> DIRECT_BYTE_BUFFER_CLASS, RO_DIRECT_BYTE_BUFFER_CLASS;
     private static final long DIRECT_BYTE_BUFFER_ADDRESS_OFFSET;
     private static final long DIRECT_BYTE_BUFFER_CAPACITY_OFFSET;
     private static final long DIRECT_BYTE_BUFFER_LIMIT_OFFSET;
@@ -67,6 +67,7 @@
             DIRECT_BYTE_BUFFER_POSITION_OFFSET = unsafe.objectFieldOffset(Buffer.class.getDeclaredField("position"));
             DIRECT_BYTE_BUFFER_ATTACHMENT_OFFSET = unsafe.objectFieldOffset(clazz.getDeclaredField("att"));
             DIRECT_BYTE_BUFFER_CLASS = clazz;
+            RO_DIRECT_BYTE_BUFFER_CLASS = ByteBuffer.allocateDirect(0).asReadOnlyBuffer().getClass();
 
             clazz = ByteBuffer.allocate(0).getClass();
             BYTE_BUFFER_OFFSET_OFFSET = unsafe.objectFieldOffset(ByteBuffer.class.getDeclaredField("offset"));
@@ -107,6 +108,11 @@
         unsafe.putByte(address, b);
     }
 
+    public static void setByte(long address, int count, byte b)
+    {
+        unsafe.setMemory(address, count, b);
+    }
+
     public static void setShort(long address, short s)
     {
         unsafe.putShort(address, s);
@@ -150,13 +156,23 @@
 
     public static ByteBuffer getByteBuffer(long address, int length)
     {
-        ByteBuffer instance = getHollowDirectByteBuffer();
+        return getByteBuffer(address, length, ByteOrder.nativeOrder());
+    }
+
+    public static ByteBuffer getByteBuffer(long address, int length, ByteOrder order)
+    {
+        ByteBuffer instance = getHollowDirectByteBuffer(order);
         setByteBuffer(instance, address, length);
         return instance;
     }
 
     public static ByteBuffer getHollowDirectByteBuffer()
     {
+        return getHollowDirectByteBuffer(ByteOrder.nativeOrder());
+    }
+
+    public static ByteBuffer getHollowDirectByteBuffer(ByteOrder order)
+    {
         ByteBuffer instance;
         try
         {
@@ -166,7 +182,7 @@
         {
             throw new AssertionError(e);
         }
-        instance.order(ByteOrder.nativeOrder());
+        instance.order(order);
         return instance;
     }
 
@@ -206,7 +222,7 @@
 
     public static ByteBuffer duplicateDirectByteBuffer(ByteBuffer source, ByteBuffer hollowBuffer)
     {
-        assert source.getClass() == DIRECT_BYTE_BUFFER_CLASS;
+        assert source.getClass() == DIRECT_BYTE_BUFFER_CLASS || source.getClass() == RO_DIRECT_BYTE_BUFFER_CLASS;
         unsafe.putLong(hollowBuffer, DIRECT_BYTE_BUFFER_ADDRESS_OFFSET, unsafe.getLong(source, DIRECT_BYTE_BUFFER_ADDRESS_OFFSET));
         unsafe.putInt(hollowBuffer, DIRECT_BYTE_BUFFER_POSITION_OFFSET, unsafe.getInt(source, DIRECT_BYTE_BUFFER_POSITION_OFFSET));
         unsafe.putInt(hollowBuffer, DIRECT_BYTE_BUFFER_LIMIT_OFFSET, unsafe.getInt(source, DIRECT_BYTE_BUFFER_LIMIT_OFFSET));
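
The MemoryUtil additions are a bulk setByte backed by Unsafe.setMemory, byte-order-aware variants of getByteBuffer/getHollowDirectByteBuffer, and acceptance of read-only direct buffers in duplicateDirectByteBuffer. A small sketch combining the first two, assuming the address and length come from an earlier native allocation owned by the caller (the helper name is made up for illustration):

    import java.nio.ByteBuffer;
    import java.nio.ByteOrder;

    // Sketch only: zero a native region with the new bulk setter, then expose it
    // through a big-endian ByteBuffer view instead of the default native order.
    static ByteBuffer zeroedBigEndianView(long address, int length)
    {
        MemoryUtil.setByte(address, length, (byte) 0);
        return MemoryUtil.getByteBuffer(address, length, ByteOrder.BIG_ENDIAN);
    }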
diff --git a/src/java/org/apache/cassandra/utils/memory/MemtableAllocator.java b/src/java/org/apache/cassandra/utils/memory/MemtableAllocator.java
index 5a64c3c..b326af7 100644
--- a/src/java/org/apache/cassandra/utils/memory/MemtableAllocator.java
+++ b/src/java/org/apache/cassandra/utils/memory/MemtableAllocator.java
@@ -20,7 +20,6 @@
 
 import java.util.concurrent.atomic.AtomicLongFieldUpdater;
 
-import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.db.*;
 import org.apache.cassandra.db.rows.*;
 import org.apache.cassandra.utils.concurrent.OpOrder;
@@ -32,7 +31,7 @@
     private final SubAllocator offHeap;
     volatile LifeCycle state = LifeCycle.LIVE;
 
-    static enum LifeCycle
+    enum LifeCycle
     {
         LIVE, DISCARDING, DISCARDED;
         LifeCycle transition(LifeCycle targetState)
@@ -62,6 +61,7 @@
     public abstract Row.Builder rowBuilder(OpOrder.Group opGroup);
     public abstract DecoratedKey clone(DecoratedKey key, OpOrder.Group opGroup);
     public abstract DataReclaimer reclaimer();
+    public abstract EnsureOnHeap ensureOnHeap();
 
     public SubAllocator onHeap()
     {
@@ -251,4 +251,5 @@
         private static final AtomicLongFieldUpdater<SubAllocator> reclaimingUpdater = AtomicLongFieldUpdater.newUpdater(SubAllocator.class, "reclaiming");
     }
 
+
 }
diff --git a/src/java/org/apache/cassandra/utils/memory/MemtablePool.java b/src/java/org/apache/cassandra/utils/memory/MemtablePool.java
index e792944..c082856 100644
--- a/src/java/org/apache/cassandra/utils/memory/MemtablePool.java
+++ b/src/java/org/apache/cassandra/utils/memory/MemtablePool.java
@@ -63,7 +63,6 @@
         return cleaner == null ? null : new MemtableCleanerThread<>(this, cleaner);
     }
 
-    public abstract boolean needToCopyOnHeap();
     public abstract MemtableAllocator newAllocator();
 
     /**
diff --git a/src/java/org/apache/cassandra/utils/memory/NativeAllocator.java b/src/java/org/apache/cassandra/utils/memory/NativeAllocator.java
index 4857f34..5bdaf08 100644
--- a/src/java/org/apache/cassandra/utils/memory/NativeAllocator.java
+++ b/src/java/org/apache/cassandra/utils/memory/NativeAllocator.java
@@ -24,10 +24,8 @@
 import java.util.concurrent.atomic.AtomicInteger;
 import java.util.concurrent.atomic.AtomicReference;
 
-import org.apache.cassandra.config.CFMetaData;
-import org.apache.cassandra.db.DecoratedKey;
-import org.apache.cassandra.db.NativeDecoratedKey;
-import org.apache.cassandra.db.rows.Row;
+import org.apache.cassandra.db.*;
+import org.apache.cassandra.db.rows.*;
 import org.apache.cassandra.utils.concurrent.OpOrder;
 
 /**
@@ -57,16 +55,42 @@
 
     private final AtomicReference<Region> currentRegion = new AtomicReference<>();
     private final ConcurrentLinkedQueue<Region> regions = new ConcurrentLinkedQueue<>();
+    private final EnsureOnHeap.CloneToHeap cloneToHeap = new EnsureOnHeap.CloneToHeap();
 
     protected NativeAllocator(NativePool pool)
     {
         super(pool.onHeap.newAllocator(), pool.offHeap.newAllocator());
     }
 
+    private static class CloningBTreeRowBuilder extends BTreeRow.Builder
+    {
+        final OpOrder.Group writeOp;
+        final NativeAllocator allocator;
+        private CloningBTreeRowBuilder(OpOrder.Group writeOp, NativeAllocator allocator)
+        {
+            super(true);
+            this.writeOp = writeOp;
+            this.allocator = allocator;
+        }
+
+        @Override
+        public void newRow(Clustering clustering)
+        {
+            if (clustering != Clustering.STATIC_CLUSTERING)
+                clustering = new NativeClustering(allocator, writeOp, clustering);
+            super.newRow(clustering);
+        }
+
+        @Override
+        public void addCell(Cell cell)
+        {
+            super.addCell(new NativeCell(allocator, writeOp, cell));
+        }
+    }
+
     public Row.Builder rowBuilder(OpOrder.Group opGroup)
     {
-        // TODO
-        throw new UnsupportedOperationException();
+        return new CloningBTreeRowBuilder(opGroup, this);
     }
 
     public DecoratedKey clone(DecoratedKey key, OpOrder.Group writeOp)
@@ -80,6 +104,11 @@
         return NO_OP;
     }
 
+    public EnsureOnHeap ensureOnHeap()
+    {
+        return cloneToHeap;
+    }
+
     public long allocate(int size, OpOrder.Group opGroup)
     {
         assert size >= 0;
@@ -146,6 +175,7 @@
     {
         for (Region region : regions)
             MemoryUtil.free(region.peer);
+
         super.setDiscarded();
     }
 
@@ -191,12 +221,12 @@
          * Offset for the next allocation, or the sentinel value -1
          * which implies that the region is still uninitialized.
          */
-        private AtomicInteger nextFreeOffset = new AtomicInteger(0);
+        private final AtomicInteger nextFreeOffset = new AtomicInteger(0);
 
         /**
          * Total number of allocations satisfied from this buffer
          */
-        private AtomicInteger allocCount = new AtomicInteger();
+        private final AtomicInteger allocCount = new AtomicInteger();
 
         /**
          * Create an uninitialized region. Note that memory is not allocated yet, so
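
With this hunk, NativeAllocator.rowBuilder() is no longer unsupported: the returned CloningBTreeRowBuilder rewrites every non-static clustering into a NativeClustering and every cell into a NativeCell as the row is assembled, and ensureOnHeap() hands back the CloneToHeap transformation for reading that data safely later. A hedged round-trip sketch using only methods shown in the patch (the helper name and parameters are illustrative):

    // Sketch only: copy an incoming row off-heap through the cloning builder, then
    // clone it back on-heap when it has to outlive the memtable's native memory.
    static Row offHeapThenOnHeap(NativeAllocator allocator, OpOrder.Group writeOp, Row incoming)
    {
        Row offHeap = Rows.copy(incoming, allocator.rowBuilder(writeOp)).build();
        return allocator.ensureOnHeap().applyToRow(offHeap);
    }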
diff --git a/src/java/org/apache/cassandra/utils/memory/NativePool.java b/src/java/org/apache/cassandra/utils/memory/NativePool.java
index 012867a..800c777 100644
--- a/src/java/org/apache/cassandra/utils/memory/NativePool.java
+++ b/src/java/org/apache/cassandra/utils/memory/NativePool.java
@@ -26,12 +26,6 @@
     }
 
     @Override
-    public boolean needToCopyOnHeap()
-    {
-        return true;
-    }
-
-    @Override
     public NativeAllocator newAllocator()
     {
         return new NativeAllocator(this);
diff --git a/src/java/org/apache/cassandra/utils/memory/SlabAllocator.java b/src/java/org/apache/cassandra/utils/memory/SlabAllocator.java
index 8ffead1..38b0885 100644
--- a/src/java/org/apache/cassandra/utils/memory/SlabAllocator.java
+++ b/src/java/org/apache/cassandra/utils/memory/SlabAllocator.java
@@ -59,13 +59,20 @@
 
     // this queue is used to keep references to off-heap allocated regions so that we can free them when we are discarded
     private final ConcurrentLinkedQueue<Region> offHeapRegions = new ConcurrentLinkedQueue<>();
-    private AtomicLong unslabbedSize = new AtomicLong(0);
+    private final AtomicLong unslabbedSize = new AtomicLong(0);
     private final boolean allocateOnHeapOnly;
+    private final EnsureOnHeap ensureOnHeap;
 
     SlabAllocator(SubAllocator onHeap, SubAllocator offHeap, boolean allocateOnHeapOnly)
     {
         super(onHeap, offHeap);
         this.allocateOnHeapOnly = allocateOnHeapOnly;
+        this.ensureOnHeap = allocateOnHeapOnly ? new EnsureOnHeap.NoOp() : new EnsureOnHeap.CloneToHeap();
+    }
+
+    public EnsureOnHeap ensureOnHeap()
+    {
+        return ensureOnHeap;
     }
 
     public ByteBuffer allocate(int size)
@@ -168,18 +175,18 @@
         /**
          * Actual underlying data
          */
-        private ByteBuffer data;
+        private final ByteBuffer data;
 
         /**
          * Offset for the next allocation, or the sentinel value -1
          * which implies that the region is still uninitialized.
          */
-        private AtomicInteger nextFreeOffset = new AtomicInteger(0);
+        private final AtomicInteger nextFreeOffset = new AtomicInteger(0);
 
         /**
          * Total number of allocations satisfied from this buffer
          */
-        private AtomicInteger allocCount = new AtomicInteger();
+        private final AtomicInteger allocCount = new AtomicInteger();
 
         /**
          * Create an uninitialized region. Note that memory is not allocated yet, so
diff --git a/src/java/org/apache/cassandra/utils/memory/SlabPool.java b/src/java/org/apache/cassandra/utils/memory/SlabPool.java
index c5c44e1..bd7ec1f 100644
--- a/src/java/org/apache/cassandra/utils/memory/SlabPool.java
+++ b/src/java/org/apache/cassandra/utils/memory/SlabPool.java
@@ -32,9 +32,4 @@
     {
         return new SlabAllocator(onHeap.newAllocator(), offHeap.newAllocator(), allocateOnHeap);
     }
-
-    public boolean needToCopyOnHeap()
-    {
-        return !allocateOnHeap;
-    }
 }
diff --git a/src/java/org/apache/cassandra/utils/vint/VIntCoding.java b/src/java/org/apache/cassandra/utils/vint/VIntCoding.java
index daf5006..3872424 100644
--- a/src/java/org/apache/cassandra/utils/vint/VIntCoding.java
+++ b/src/java/org/apache/cassandra/utils/vint/VIntCoding.java
@@ -50,6 +50,7 @@
 import java.io.DataOutput;
 import java.io.IOException;
 
+import io.netty.util.concurrent.FastThreadLocal;
 import net.nicoulaj.compilecommand.annotations.Inline;
 
 /**
@@ -103,7 +104,7 @@
         return Integer.numberOfLeadingZeros(~firstByte) - 24;
     }
 
-    protected static final ThreadLocal<byte[]> encodingBuffer = new ThreadLocal<byte[]>()
+    protected static final FastThreadLocal<byte[]> encodingBuffer = new FastThreadLocal<byte[]>()
     {
         @Override
         public byte[] initialValue()
diff --git a/src/resources/org/apache/cassandra/cql3/functions/JavaSourceUDF.txt b/src/resources/org/apache/cassandra/cql3/functions/JavaSourceUDF.txt
index 4bd3601..d736a5a 100644
--- a/src/resources/org/apache/cassandra/cql3/functions/JavaSourceUDF.txt
+++ b/src/resources/org/apache/cassandra/cql3/functions/JavaSourceUDF.txt
@@ -1,17 +1,20 @@
 package #package_name#;
 
 import java.nio.ByteBuffer;
-import java.util.List;
+import java.util.*;
 
 import org.apache.cassandra.cql3.functions.JavaUDF;
+import org.apache.cassandra.cql3.functions.UDFContext;
 
 import com.datastax.driver.core.TypeCodec;
+import com.datastax.driver.core.TupleValue;
+import com.datastax.driver.core.UDTValue;
 
 public final class #class_name# extends JavaUDF
 {
-    public #class_name#(TypeCodec<Object> returnCodec, TypeCodec<Object>[] argCodecs)
+    public #class_name#(TypeCodec<Object> returnCodec, TypeCodec<Object>[] argCodecs, UDFContext udfContext)
     {
-        super(returnCodec, argCodecs);
+        super(returnCodec, argCodecs, udfContext);
     }
 
     protected ByteBuffer executeImpl(int protocolVersion, List<ByteBuffer> params)
diff --git a/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/ar_ST.txt b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/ar_ST.txt
new file mode 100644
index 0000000..97bedb6
--- /dev/null
+++ b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/ar_ST.txt
@@ -0,0 +1,163 @@
+# Stop Words List from http://members.unine.ch/jacques.savoy/clef/index.html




+عشر
+عدد
+عدة
+عشرة
+عدم
+عام
+عاما
+عن
+عند
+عندما
+على
+عليه
+عليها
+زيارة
+سنة
+سنوات
+تم
+ضد
+بعد
+بعض
+اعادة
+اعلنت
+بسبب
+حتى
+اذا
+احد
+اثر
+برس
+باسم
+غدا
+شخصا
+صباح
+اطار
+اربعة
+اخرى
+بان
+اجل
+غير
+بشكل
+حاليا
+بن
+به
+ثم
+اف
+ان
+او
+اي
+بها
+صفر
+حيث
+اكد
+الا
+اما
+امس
+السابق
+التى
+التي
+اكثر
+ايار
+ايضا
+ثلاثة
+الذاتي
+الاخيرة
+الثاني
+الثانية
+الذى
+الذي
+الان
+امام
+ايام
+خلال
+حوالى
+الذين
+الاول
+الاولى
+بين
+ذلك
+دون
+حول
+حين
+الف
+الى
+انه
+اول
+ضمن
+انها
+جميع
+الماضي
+الوقت
+المقبل
+اليوم



+و6
+قد
+لا
+ما
+مع
+مساء
+هذا
+واحد
+واضاف
+واضافت
+فان
+قبل
+قال
+كان
+لدى
+نحو
+هذه
+وان
+واكد
+كانت
+واوضح
+مايو
+فى
+في
+كل
+لم
+لن
+له
+من
+هو
+هي
+قوة
+كما
+لها
+منذ
+وقد
+ولا
+نفسه
+لقاء
+مقابل
+هناك
+وقال
+وكان
+نهاية
+وقالت
+وكانت
+للامم
+فيه
+كلم
+لكن
+وفي
+وقف
+ولم
+ومن
+وهو
+وهي
+يوم
+فيها
+منها
+مليار
+لوكالة
+يكون
+يمكن
+مليون
diff --git a/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/bg_ST.txt b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/bg_ST.txt
new file mode 100644
index 0000000..ed6049d
--- /dev/null
+++ b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/bg_ST.txt
@@ -0,0 +1,260 @@
+# Stop Words List from http://members.unine.ch/jacques.savoy/clef/index.html

+автентичен
+аз
+ако
+ала
+бе
+без
+беше
+би
+бивш
+бивша
+бившо
+бил
+била
+били
+било
+благодаря
+близо
+бъдат
+бъде
+бяха

+вас
+ваш
+ваша
+вероятно
+вече
+взема
+ви
+вие
+винаги
+внимава
+време
+все
+всеки
+всички
+всичко
+всяка
+във
+въпреки
+върху

+ги
+главен
+главна
+главно
+глас
+го
+година
+години
+годишен

+да
+дали
+два
+двама
+двамата
+две
+двете
+ден
+днес
+дни
+до
+добра
+добре
+добро
+добър
+докато
+докога
+дори
+досега
+доста
+друг
+друга
+други

+евтин
+едва
+един
+една
+еднаква
+еднакви
+еднакъв
+едно
+екип
+ето
+живот
+за
+забавям
+зад
+заедно
+заради
+засега
+заспал
+затова
+защо
+защото

+из
+или
+им
+има
+имат
+иска

+каза
+как
+каква
+какво
+както
+какъв
+като
+кога
+когато
+което
+които
+кой
+който
+колко
+която
+къде
+където
+към
+лесен
+лесно
+ли
+лош

+май
+малко
+ме
+между
+мек
+мен
+месец
+ми
+много
+мнозина
+мога
+могат
+може
+мокър
+моля
+момента
+му

+на
+над
+назад
+най
+направи
+напред
+например
+нас
+не
+него
+нещо
+нея
+ни
+ние
+никой
+нито
+нищо
+но
+нов
+нова
+нови
+новина
+някои
+някой
+няколко
+няма
+обаче
+около
+освен
+особено
+от
+отгоре
+отново
+още
+пак
+по
+повече
+повечето
+под
+поне
+поради
+после
+почти
+прави
+пред
+преди
+през
+при
+пък
+първата
+първи
+първо
+пъти
+равен
+равна

+са
+сам
+само
+се
+сега
+си
+син
+скоро
+след
+следващ
+сме
+смях
+според
+сред
+срещу
+сте
+съм
+със
+също

+тази
+така
+такива
+такъв
+там
+твой
+те
+тези
+ти
+т.н.
+то
+това
+тогава
+този
+той
+толкова
+точно
+три
+трябва
+тук
+тъй
+тя
+тях

+утре
+харесва
+хиляди

+часа
+че
+често
+чрез
+ще
+щом
+юмрук

+як
\ No newline at end of file
diff --git a/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/cs_ST.txt b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/cs_ST.txt
new file mode 100644
index 0000000..49b52e1
--- /dev/null
+++ b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/cs_ST.txt
@@ -0,0 +1,257 @@
+# Stop Words List from http://members.unine.ch/jacques.savoy/clef/index.html
+ačkoli
+ahoj
+ale
+anebo
+ano
+asi
+aspoň
+během
+bez
+beze
+blízko
+bohužel
+brzo
+bude
+budeme
+budeš
+budete
+budou
+budu
+byl
+byla
+byli
+bylo
+byly
+bys
+čau
+chce
+chceme
+chceš
+chcete
+chci
+chtějí
+chtít
+chut'
+chuti
+co
+čtrnáct
+čtyři
+dál
+dále
+daleko
+děkovat
+děkujeme
+děkuji
+den
+deset
+devatenáct
+devět
+do
+dobrý
+docela
+dva
+dvacet
+dvanáct
+dvě
+hodně
+já
+jak
+jde
+je
+jeden
+jedenáct
+jedna
+jedno
+jednou
+jedou
+jeho
+její
+jejich
+jemu
+jen
+jenom
+ještě
+jestli
+jestliže
+jí
+jich
+jím
+jimi
+jinak
+jsem
+jsi
+jsme
+jsou
+jste
+kam
+kde
+kdo
+kdy
+když
+ke
+kolik
+kromě
+která
+které
+kteří
+který
+kvůli
+má
+mají
+málo
+mám
+máme
+máš
+máte
+mé
+mě
+mezi
+mí
+mít
+mně
+mnou
+moc
+mohl
+mohou
+moje
+moji
+možná
+můj
+musí
+může
+my
+na
+nad
+nade
+nám
+námi
+naproti
+nás
+náš
+naše
+naši
+ne
+ně
+nebo
+nebyl
+nebyla
+nebyli
+nebyly
+něco
+nedělá
+nedělají
+nedělám
+neděláme
+neděláš
+neděláte
+nějak
+nejsi
+někde
+někdo
+nemají
+nemáme
+nemáte
+neměl
+němu
+není
+nestačí
+nevadí
+než
+nic
+nich
+ním
+nimi
+nula
+od
+ode
+on
+ona
+oni
+ono
+ony
+osm
+osmnáct
+pak
+patnáct
+pět
+po
+pořád
+potom
+pozdě
+před
+přes
+přese
+pro
+proč
+prosím
+prostě
+proti
+protože
+rovně
+se
+sedm
+sedmnáct
+šest
+šestnáct
+skoro
+smějí
+smí
+snad
+spolu
+sta
+sté
+sto
+ta
+tady
+tak
+takhle
+taky
+tam
+tamhle
+tamhleto
+tamto
+tě
+tebe
+tebou
+ted'
+tedy
+ten
+ti
+tisíc
+tisíce
+to
+tobě
+tohle
+toto
+třeba
+tři
+třináct
+trošku
+tvá
+tvé
+tvoje
+tvůj
+ty
+určitě
+už
+vám
+vámi
+vás
+váš
+vaše
+vaši
+ve
+večer
+vedle
+vlastně
+všechno
+všichni
+vůbec
+vy
+vždy
+za
+zač
+zatímco
+ze
+že
diff --git a/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/de_ST.txt b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/de_ST.txt
new file mode 100644
index 0000000..747e682
--- /dev/null
+++ b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/de_ST.txt
@@ -0,0 +1,604 @@
+# Stop Words List from http://members.unine.ch/jacques.savoy/clef/index.html

+a

+ab

+aber

+aber

+ach

+acht

+achte

+achten

+achter

+achtes

+ag

+alle

+allein

+allem

+allen

+aller

+allerdings

+alles

+allgemeinen

+als

+als

+also

+am

+an

+andere

+anderen

+andern

+anders

+au

+auch

+auch

+auf

+aus

+ausser

+au�er

+ausserdem

+au�erdem

+b

+bald

+bei

+beide

+beiden

+beim

+beispiel

+bekannt

+bereits

+besonders

+besser

+besten

+bin

+bis

+bisher

+bist

+c

+d

+da

+dabei

+dadurch

+daf�r

+dagegen

+daher

+dahin

+dahinter

+damals

+damit

+danach

+daneben

+dank

+dann

+daran

+darauf

+daraus

+darf

+darfst

+darin

+dar�ber

+darum

+darunter

+das

+das

+dasein

+daselbst

+dass

+da�

+dasselbe

+davon

+davor

+dazu

+dazwischen

+dein

+deine

+deinem

+deiner

+dem

+dementsprechend

+demgegen�ber

+demgem�ss

+demgem��

+demselben

+demzufolge

+den

+denen

+denn

+denn

+denselben

+der

+deren

+derjenige

+derjenigen

+dermassen

+derma�en

+derselbe

+derselben

+des

+deshalb

+desselben

+dessen

+deswegen

+d.h

+dich

+die

+diejenige

+diejenigen

+dies

+diese

+dieselbe

+dieselben

+diesem

+diesen

+dieser

+dieses

+dir

+doch

+dort

+drei

+drin

+dritte

+dritten

+dritter

+drittes

+du

+durch

+durchaus

+d�rfen

+d�rft

+durfte

+durften

+e

+eben

+ebenso

+ehrlich

+ei

+ei,

+ei,

+eigen

+eigene

+eigenen

+eigener

+eigenes

+ein

+einander

+eine

+einem

+einen

+einer

+eines

+einige

+einigen

+einiger

+einiges

+einmal

+einmal

+eins

+elf

+en

+ende

+endlich

+entweder

+entweder

+er

+Ernst

+erst

+erste

+ersten

+erster

+erstes

+es

+etwa

+etwas

+euch

+f

+fr�her

+f�nf

+f�nfte

+f�nften

+f�nfter

+f�nftes

+f�r

+g

+gab

+ganz

+ganze

+ganzen

+ganzer

+ganzes

+gar

+gedurft

+gegen

+gegen�ber

+gehabt

+gehen

+geht

+gekannt

+gekonnt

+gemacht

+gemocht

+gemusst

+genug

+gerade

+gern

+gesagt

+gesagt

+geschweige

+gewesen

+gewollt

+geworden

+gibt

+ging

+gleich

+gott

+gross

+gro�

+grosse

+gro�e

+grossen

+gro�en

+grosser

+gro�er

+grosses

+gro�es

+gut

+gute

+guter

+gutes

+h

+habe

+haben

+habt

+hast

+hat

+hatte

+h�tte

+hatten

+h�tten

+heisst

+her

+heute

+hier

+hin

+hinter

+hoch

+i

+ich

+ihm

+ihn

+ihnen

+ihr

+ihre

+ihrem

+ihren

+ihrer

+ihres

+im

+im

+immer

+in

+in

+indem

+infolgedessen

+ins

+irgend

+ist

+j

+ja

+ja

+jahr

+jahre

+jahren

+je

+jede

+jedem

+jeden

+jeder

+jedermann

+jedermanns

+jedoch

+jemand

+jemandem

+jemanden

+jene

+jenem

+jenen

+jener

+jenes

+jetzt

+k

+kam

+kann

+kannst

+kaum

+kein

+keine

+keinem

+keinen

+keiner

+kleine

+kleinen

+kleiner

+kleines

+kommen

+kommt

+k�nnen

+k�nnt

+konnte

+k�nnte

+konnten

+kurz

+l

+lang

+lange

+lange

+leicht

+leide

+lieber

+los

+m

+machen

+macht

+machte

+mag

+magst

+mahn

+man

+manche

+manchem

+manchen

+mancher

+manches

+mann

+mehr

+mein

+meine

+meinem

+meinen

+meiner

+meines

+mensch

+menschen

+mich

+mir

+mit

+mittel

+mochte

+m�chte

+mochten

+m�gen

+m�glich

+m�gt

+morgen

+muss

+mu�

+m�ssen

+musst

+m�sst

+musste

+mussten

+n

+na

+nach

+nachdem

+nahm

+nat�rlich

+neben

+nein

+neue

+neuen

+neun

+neunte

+neunten

+neunter

+neuntes

+nicht

+nicht

+nichts

+nie

+niemand

+niemandem

+niemanden

+noch

+nun

+nun

+nur

+o

+ob

+ob

+oben

+oder

+oder

+offen

+oft

+oft

+ohne

+Ordnung

+p

+q

+r

+recht

+rechte

+rechten

+rechter

+rechtes

+richtig

+rund

+s

+sa

+sache

+sagt

+sagte

+sah

+satt

+schlecht

+Schluss

+schon

+sechs

+sechste

+sechsten

+sechster

+sechstes

+sehr

+sei

+sei

+seid

+seien

+sein

+seine

+seinem

+seinen

+seiner

+seines

+seit

+seitdem

+selbst

+selbst

+sich

+sie

+sieben

+siebente

+siebenten

+siebenter

+siebentes

+sind

+so

+solang

+solche

+solchem

+solchen

+solcher

+solches

+soll

+sollen

+sollte

+sollten

+sondern

+sonst

+sowie

+sp�ter

+statt

+t

+tag

+tage

+tagen

+tat

+teil

+tel

+tritt

+trotzdem

+tun

+u

+�ber

+�berhaupt

+�brigens

+uhr

+um

+und

+und?

+uns

+unser

+unsere

+unserer

+unter

+v

+vergangenen

+viel

+viele

+vielem

+vielen

+vielleicht

+vier

+vierte

+vierten

+vierter

+viertes

+vom

+von

+vor

+w

+wahr?

+w�hrend

+w�hrenddem

+w�hrenddessen

+wann

+war

+w�re

+waren

+wart

+warum

+was

+wegen

+weil

+weit

+weiter

+weitere

+weiteren

+weiteres

+welche

+welchem

+welchen

+welcher

+welches

+wem

+wen

+wenig

+wenig

+wenige

+weniger

+weniges

+wenigstens

+wenn

+wenn

+wer

+werde

+werden

+werdet

+wessen

+wie

+wie

+wieder

+will

+willst

+wir

+wird

+wirklich

+wirst

+wo

+wohl

+wollen

+wollt

+wollte

+wollten

+worden

+wurde

+w�rde

+wurden

+w�rden

+x

+y

+z

+z.b

+zehn

+zehnte

+zehnten

+zehnter

+zehntes

+zeit

+zu

+zuerst

+zugleich

+zum

+zum

+zun�chst

+zur

+zur�ck

+zusammen

+zwanzig

+zwar

+zwar

+zwei

+zweite

+zweiten

+zweiter

+zweites

+zwischen

+zw�lf

diff --git a/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/en_ST.txt b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/en_ST.txt
new file mode 100644
index 0000000..d30da31
--- /dev/null
+++ b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/en_ST.txt
@@ -0,0 +1,572 @@
+# Stop Words List from http://members.unine.ch/jacques.savoy/clef/index.html

+a

+a's

+able

+about

+above

+according

+accordingly

+across

+actually

+after

+afterwards

+again

+against

+ain't

+all

+allow

+allows

+almost

+alone

+along

+already

+also

+although

+always

+am

+among

+amongst

+an

+and

+another

+any

+anybody

+anyhow

+anyone

+anything

+anyway

+anyways

+anywhere

+apart

+appear

+appreciate

+appropriate

+are

+aren't

+around

+as

+aside

+ask

+asking

+associated

+at

+available

+away

+awfully

+b

+be

+became

+because

+become

+becomes

+becoming

+been

+before

+beforehand

+behind

+being

+believe

+below

+beside

+besides

+best

+better

+between

+beyond

+both

+brief

+but

+by

+c

+c'mon

+c's

+came

+can

+can't

+cannot

+cant

+cause

+causes

+certain

+certainly

+changes

+clearly

+co

+com

+come

+comes

+concerning

+consequently

+consider

+considering

+contain

+containing

+contains

+corresponding

+could

+couldn't

+course

+currently

+d

+definitely

+described

+despite

+did

+didn't

+different

+do

+does

+doesn't

+doing

+don't

+done

+down

+downwards

+during

+e

+each

+edu

+eg

+eight

+either

+else

+elsewhere

+enough

+entirely

+especially

+et

+etc

+even

+ever

+every

+everybody

+everyone

+everything

+everywhere

+ex

+exactly

+example

+except

+f

+far

+few

+fifth

+first

+five

+followed

+following

+follows

+for

+former

+formerly

+forth

+four

+from

+further

+furthermore

+g

+get

+gets

+getting

+given

+gives

+go

+goes

+going

+gone

+got

+gotten

+greetings

+h

+had

+hadn't

+happens

+hardly

+has

+hasn't

+have

+haven't

+having

+he

+he's

+hello

+help

+hence

+her

+here

+here's

+hereafter

+hereby

+herein

+hereupon

+hers

+herself

+hi

+him

+himself

+his

+hither

+hopefully

+how

+howbeit

+however

+i

+i'd

+i'll

+i'm

+i've

+ie

+if

+ignored

+immediate

+in

+inasmuch

+inc

+indeed

+indicate

+indicated

+indicates

+inner

+insofar

+instead

+into

+inward

+is

+isn't

+it

+it'd

+it'll

+it's

+its

+itself

+j

+just

+k

+keep

+keeps

+kept

+know

+knows

+known

+l

+last

+lately

+later

+latter

+latterly

+least

+less

+lest

+let

+let's

+like

+liked

+likely

+little

+look

+looking

+looks

+ltd

+m

+mainly

+many

+may

+maybe

+me

+mean

+meanwhile

+merely

+might

+more

+moreover

+most

+mostly

+much

+must

+my

+myself

+n

+name

+namely

+nd

+near

+nearly

+necessary

+need

+needs

+neither

+never

+nevertheless

+new

+next

+nine

+no

+nobody

+non

+none

+noone

+nor

+normally

+not

+nothing

+novel

+now

+nowhere

+o

+obviously

+of

+off

+often

+oh

+ok

+okay

+old

+on

+once

+one

+ones

+only

+onto

+or

+other

+others

+otherwise

+ought

+our

+ours

+ourselves

+out

+outside

+over

+overall

+own

+p

+particular

+particularly

+per

+perhaps

+placed

+please

+plus

+possible

+presumably

+probably

+provides

+q

+que

+quite

+qv

+r

+rather

+rd

+re

+really

+reasonably

+regarding

+regardless

+regards

+relatively

+respectively

+right

+s

+said

+same

+saw

+say

+saying

+says

+second

+secondly

+see

+seeing

+seem

+seemed

+seeming

+seems

+seen

+self

+selves

+sensible

+sent

+serious

+seriously

+seven

+several

+shall

+she

+should

+shouldn't

+since

+six

+so

+some

+somebody

+somehow

+someone

+something

+sometime

+sometimes

+somewhat

+somewhere

+soon

+sorry

+specified

+specify

+specifying

+still

+sub

+such

+sup

+sure

+t

+t's

+take

+taken

+tell

+tends

+th

+than

+thank

+thanks

+thanx

+that

+that's

+thats

+the

+their

+theirs

+them

+themselves

+then

+thence

+there

+there's

+thereafter

+thereby

+therefore

+therein

+theres

+thereupon

+these

+they

+they'd

+they'll

+they're

+they've

+think

+third

+this

+thorough

+thoroughly

+those

+though

+three

+through

+throughout

+thru

+thus

+to

+together

+too

+took

+toward

+towards

+tried

+tries

+truly

+try

+trying

+twice

+two

+u

+un

+under

+unfortunately

+unless

+unlikely

+until

+unto

+up

+upon

+us

+use

+used

+useful

+uses

+using

+usually

+uucp

+v

+value

+various

+very

+via

+viz

+vs

+w

+want

+wants

+was

+wasn't

+way

+we

+we'd

+we'll

+we're

+we've

+welcome

+well

+went

+were

+weren't

+what

+what's

+whatever

+when

+whence

+whenever

+where

+where's

+whereafter

+whereas

+whereby

+wherein

+whereupon

+wherever

+whether

+which

+while

+whither

+who

+who's

+whoever

+whole

+whom

+whose

+why

+will

+willing

+wish

+with

+within

+without

+won't

+wonder

+would

+would

+wouldn't

+x

+y

+yes

+yet

+you

+you'd

+you'll

+you're

+you've

+your

+yours

+yourself

+yourselves

+z

+zero

diff --git a/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/es_ST.txt b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/es_ST.txt
new file mode 100644
index 0000000..75e2086
--- /dev/null
+++ b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/es_ST.txt
@@ -0,0 +1,308 @@
+# Stop Words List from http://members.unine.ch/jacques.savoy/clef/index.html

+a

+acuerdo

+adelante

+ademas

+adem�s

+adrede

+ahi

+ah�

+ahora

+al

+alli

+all�

+alrededor

+antano

+anta�o

+ante

+antes

+apenas

+aproximadamente

+aquel

+aqu�l

+aquella

+aqu�lla

+aquellas

+aqu�llas

+aquello

+aquellos

+aqu�llos

+aqui

+aqu�

+arribaabajo

+asi

+as�

+aun

+a�n

+aunque

+b

+bajo

+bastante

+bien

+breve

+c

+casi

+cerca

+claro

+como

+c�mo

+con

+conmigo

+contigo

+contra

+cual

+cu�l

+cuales

+cu�les

+cuando

+cu�ndo

+cuanta

+cu�nta

+cuantas

+cu�ntas

+cuanto

+cu�nto

+cuantos

+cu�ntos

+d

+de

+debajo

+del

+delante

+demasiado

+dentro

+deprisa

+desde

+despacio

+despues

+despu�s

+detras

+detr�s

+dia

+d�a

+dias

+d�as

+donde

+d�nde

+dos

+durante

+e

+el

+�l

+ella

+ellas

+ellos

+en

+encima

+enfrente

+enseguida

+entre

+es

+esa

+�sa

+esas

+�sas

+ese

+�se

+eso

+esos

+�sos

+esta

+est�

+�sta

+estado

+estados

+estan

+est�n

+estar

+estas

+�stas

+este

+�ste

+esto

+estos

+�stos

+ex

+excepto

+f

+final

+fue

+fuera

+fueron

+g

+general

+gran

+h

+ha

+habia

+hab�a

+habla

+hablan

+hace

+hacia

+han

+hasta

+hay

+horas

+hoy

+i

+incluso

+informo

+inform�

+j

+junto

+k

+l

+la

+lado

+las

+le

+lejos

+lo

+los

+luego

+m

+mal

+mas

+m�s

+mayor

+me

+medio

+mejor

+menos

+menudo

+mi

+m�

+mia

+m�a

+mias

+m�as

+mientras

+mio

+m�o

+mios

+m�os

+mis

+mismo

+mucho

+muy

+n

+nada

+nadie

+ninguna

+no

+nos

+nosotras

+nosotros

+nuestra

+nuestras

+nuestro

+nuestros

+nueva

+nuevo

+nunca

+o

+os

+otra

+otros

+p

+pais

+pa�s

+para

+parte

+pasado

+peor

+pero

+poco

+por

+porque

+pronto

+proximo

+pr�ximo

+puede

+q

+qeu

+que

+qu�

+quien

+qui�n

+quienes

+qui�nes

+quiza

+quiz�

+quizas

+quiz�s

+r

+raras

+repente

+s

+salvo

+se

+s�

+segun

+seg�n

+ser

+sera

+ser�

+si

+s�

+sido

+siempre

+sin

+sobre

+solamente

+solo

+s�lo

+son

+soyos

+su

+supuesto

+sus

+suya

+suyas

+suyo

+t

+tal

+tambien

+tambi�n

+tampoco

+tarde

+te

+temprano

+ti

+tiene

+todavia

+todav�a

+todo

+todos

+tras

+tu

+t�

+tus

+tuya

+tuyas

+tuyo

+tuyos

+u

+un

+una

+unas

+uno

+unos

+usted

+ustedes

+v

+veces

+vez

+vosotras

+vosotros

+vuestra

+vuestras

+vuestro

+vuestros

+w

+x

+y

+ya

+yo

+z

diff --git a/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/fi_ST.txt b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/fi_ST.txt
new file mode 100644
index 0000000..3c8bfd5
--- /dev/null
+++ b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/fi_ST.txt
@@ -0,0 +1,748 @@
+# Stop Words List from http://members.unine.ch/jacques.savoy/clef/index.html

+aiemmin

+aika

+aikaa

+aikaan

+aikaisemmin

+aikaisin

+aikajen

+aikana

+aikoina

+aikoo

+aikovat

+aina

+ainakaan

+ainakin

+ainoa

+ainoat

+aiomme

+aion

+aiotte

+aist

+aivan

+ajan

+�l�

+alas

+alemmas

+�lk��n

+alkuisin

+alkuun

+alla

+alle

+aloitamme

+aloitan

+aloitat

+aloitatte

+aloitattivat

+aloitettava

+aloitettevaksi

+aloitettu

+aloitimme

+aloitin

+aloitit

+aloititte

+aloittaa

+aloittamatta

+aloitti

+aloittivat

+alta

+aluksi

+alussa

+alusta

+annettavaksi

+annetteva

+annettu

+antaa

+antamatta

+antoi

+aoua

+apu

+asia

+asiaa

+asian

+asiasta

+asiat

+asioiden

+asioihin

+asioita

+asti

+avuksi

+avulla

+avun

+avutta

+edell�

+edelle

+edelleen

+edelt�

+edemm�s

+edes

+edess�

+edest�

+ehk�

+ei

+eik�

+eilen

+eiv�t

+eli

+ellei

+elleiv�t

+ellemme

+ellen

+ellet

+ellette

+emme

+en

+en��

+enemm�n

+eniten

+ennen

+ensi

+ensimm�inen

+ensimm�iseksi

+ensimm�isen

+ensimm�isen�

+ensimm�iset

+ensimm�isi�

+ensimm�isiksi

+ensimm�isin�

+ensimm�ist�

+ensin

+entinen

+entisen

+entisi�

+entist�

+entisten

+er��t

+er�iden

+er�s

+eri

+eritt�in

+erityisesti

+esi

+esiin

+esill�

+esimerkiksi

+et

+eteen

+etenkin

+ett�

+ette

+ettei

+halua

+haluaa

+haluamatta

+haluamme

+haluan

+haluat

+haluatte

+haluavat

+halunnut

+halusi

+halusimme

+halusin

+halusit

+halusitte

+halusivat

+halutessa

+haluton

+h�n

+h�neen

+h�nell�

+h�nelle

+h�nelt�

+h�nen

+h�ness�

+h�nest�

+h�net

+he

+hei

+heid�n

+heihin

+heille

+heilt�

+heiss�

+heist�

+heit�

+helposti

+heti

+hetkell�

+hieman

+huolimatta

+huomenna

+hyv�

+hyv��

+hyv�t

+hyvi�

+hyvien

+hyviin

+hyviksi

+hyville

+hyvilt�

+hyvin

+hyvin�

+hyviss�

+hyvist�

+ihan

+ilman

+ilmeisesti

+itse

+itse��n

+itsens�

+ja

+j��

+j�lkeen

+j�lleen

+jo

+johon

+joiden

+joihin

+joiksi

+joilla

+joille

+joilta

+joissa

+joista

+joita

+joka

+jokainen

+jokin

+joko

+joku

+jolla

+jolle

+jolloin

+jolta

+jompikumpi

+jonka

+jonkin

+jonne

+joo

+jopa

+jos

+joskus

+jossa

+josta

+jota

+jotain

+joten

+jotenkin

+jotenkuten

+jotka

+jotta

+jouduimme

+jouduin

+jouduit

+jouduitte

+joudumme

+joudun

+joudutte

+joukkoon

+joukossa

+joukosta

+joutua

+joutui

+joutuivat

+joutumaan

+joutuu

+joutuvat

+juuri

+kahdeksan

+kahdeksannen

+kahdella

+kahdelle

+kahdelta

+kahden

+kahdessa

+kahdesta

+kahta

+kahteen

+kai

+kaiken

+kaikille

+kaikilta

+kaikkea

+kaikki

+kaikkia

+kaikkiaan

+kaikkialla

+kaikkialle

+kaikkialta

+kaikkien

+kaikkin

+kaksi

+kannalta

+kannattaa

+kanssa

+kanssaan

+kanssamme

+kanssani

+kanssanne

+kanssasi

+kauan

+kauemmas

+kautta

+kehen

+keiden

+keihin

+keiksi

+keill�

+keille

+keilt�

+kein�

+keiss�

+keist�

+keit�

+keitt�

+keitten

+keneen

+keneksi

+kenell�

+kenelle

+kenelt�

+kenen

+kenen�

+keness�

+kenest�

+kenet

+kenett�

+kenness�st�

+kerran

+kerta

+kertaa

+kesken

+keskim��rin

+ket�

+ketk�

+kiitos

+kohti

+koko

+kokonaan

+kolmas

+kolme

+kolmen

+kolmesti

+koska

+koskaan

+kovin

+kuin

+kuinka

+kuitenkaan

+kuitenkin

+kuka

+kukaan

+kukin

+kumpainen

+kumpainenkaan

+kumpi

+kumpikaan

+kumpikin

+kun

+kuten

+kuuden

+kuusi

+kuutta

+kyll�

+kymmenen

+kyse

+l�hekk�in

+l�hell�

+l�helle

+l�helt�

+l�hemm�s

+l�hes

+l�hinn�

+l�htien

+l�pi

+liian

+liki

+lis��

+lis�ksi

+luo

+mahdollisimman

+mahdollista

+me

+meid�n

+meill�

+meille

+melkein

+melko

+menee

+meneet

+menemme

+menen

+menet

+menette

+menev�t

+meni

+menimme

+menin

+menit

+meniv�t

+menness�

+mennyt

+menossa

+mihin

+mik�

+mik��n

+mik�li

+mikin

+miksi

+milloin

+min�

+minne

+minun

+minut

+miss�

+mist�

+mit�

+mit��n

+miten

+moi

+molemmat

+mones

+monesti

+monet

+moni

+moniaalla

+moniaalle

+moniaalta

+monta

+muassa

+muiden

+muita

+muka

+mukaan

+mukaansa

+mukana

+mutta

+muu

+muualla

+muualle

+muualta

+muuanne

+muulloin

+muun

+muut

+muuta

+muutama

+muutaman

+muuten

+my�hemmin

+my�s

+my�sk��n

+my�skin

+my�t�

+n�iden

+n�in

+n�iss�

+n�iss�hin

+n�iss�lle

+n�iss�lt�

+n�iss�st�

+n�it�

+n�m�

+ne

+nelj�

+nelj��

+nelj�n

+niiden

+niin

+niist�

+niit�

+noin

+nopeammin

+nopeasti

+nopeiten

+nro

+nuo

+nyt

+ohi

+oikein

+ole

+olemme

+olen

+olet

+olette

+oleva

+olevan

+olevat

+oli

+olimme

+olin

+olisi

+olisimme

+olisin

+olisit

+olisitte

+olisivat

+olit

+olitte

+olivat

+olla

+olleet

+olli

+ollut

+oma

+omaa

+omaan

+omaksi

+omalle

+omalta

+oman

+omassa

+omat

+omia

+omien

+omiin

+omiksi

+omille

+omilta

+omissa

+omista

+on

+onkin

+onko

+ovat

+p��lle

+paikoittain

+paitsi

+pakosti

+paljon

+paremmin

+parempi

+parhaillaan

+parhaiten

+per�ti

+perusteella

+pian

+pieneen

+pieneksi

+pienell�

+pienelle

+pienelt�

+pienempi

+pienest�

+pieni

+pienin

+puolesta

+puolestaan

+runsaasti

+saakka

+sadam

+sama

+samaa

+samaan

+samalla

+samallalta

+samallassa

+samallasta

+saman

+samat

+samoin

+sata

+sataa

+satojen

+se

+seitsem�n

+sek�

+sen

+seuraavat

+siell�

+sielt�

+siihen

+siin�

+siis

+siit�

+sijaan

+siksi

+sill�

+silloin

+silti

+sin�

+sinne

+sinua

+sinulle

+sinulta

+sinun

+sinussa

+sinusta

+sinut

+sis�kk�in

+sis�ll�

+sit�

+siten

+sitten

+suoraan

+suuntaan

+suuren

+suuret

+suuri

+suuria

+suurin

+suurten

+taa

+t��ll�

+t��lt�

+taas

+taemmas

+t�h�n

+tahansa

+tai

+takaa

+takaisin

+takana

+takia

+t�ll�

+t�ll�in

+t�m�

+t�m�n

+t�n�

+t�n��n

+t�nne

+tapauksessa

+t�ss�

+t�st�

+t�t�

+t�ten

+tavalla

+tavoitteena

+t�ysin

+t�ytyv�t

+t�ytyy

+te

+tietysti

+todella

+toinen

+toisaalla

+toisaalle

+toisaalta

+toiseen

+toiseksi

+toisella

+toiselle

+toiselta

+toisemme

+toisen

+toisensa

+toisessa

+toisesta

+toista

+toistaiseksi

+toki

+tosin

+tuhannen

+tuhat

+tule

+tulee

+tulemme

+tulen

+tulet

+tulette

+tulevat

+tulimme

+tulin

+tulisi

+tulisimme

+tulisin

+tulisit

+tulisitte

+tulisivat

+tulit

+tulitte

+tulivat

+tulla

+tulleet

+tullut

+tuntuu

+tuo

+tuolla

+tuolloin

+tuolta

+tuonne

+tuskin

+tyk�

+usea

+useasti

+useimmiten

+usein

+useita

+uudeksi

+uudelleen

+uuden

+uudet

+uusi

+uusia

+uusien

+uusinta

+uuteen

+uutta

+vaan

+v�h�n

+v�hemm�n

+v�hint��n

+v�hiten

+vai

+vaiheessa

+vaikea

+vaikean

+vaikeat

+vaikeilla

+vaikeille

+vaikeilta

+vaikeissa

+vaikeista

+vaikka

+vain

+v�lill�

+varmasti

+varsin

+varsinkin

+varten

+vasta

+vastaan

+vastakkain

+verran

+viel�

+vierekk�in

+vieri

+viiden

+viime

+viimeinen

+viimeisen

+viimeksi

+viisi

+voi

+voidaan

+voimme

+voin

+voisi

+voit

+voitte

+voivat

+vuoden

+vuoksi

+vuosi

+vuosien

+vuosina

+vuotta

+yh�

+yhdeks�n

+yhden

+yhdess�

+yht�

+yht��ll�

+yht��lle

+yht��lt�

+yht��n

+yhteen

+yhteens�

+yhteydess�

+yhteyteen

+yksi

+yksin

+yksitt�in

+yleens�

+ylemm�s

+yli

+yl�s

+ymp�ri

diff --git a/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/fr_ST.txt b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/fr_ST.txt
new file mode 100644
index 0000000..c84d8c1
--- /dev/null
+++ b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/fr_ST.txt
@@ -0,0 +1,464 @@
+# Stop Words List from http://members.unine.ch/jacques.savoy/clef/index.html
+a
+à
+â
+abord
+afin
+ah
+ai
+aie
+ainsi
+allaient
+allo
+allô
+allons
+après
+assez
+attendu
+au
+aucun
+aucune
+aujourd
+aujourd'hui
+auquel
+aura
+auront
+aussi
+autre
+autres
+aux
+auxquelles
+auxquels
+avaient
+avais
+avait
+avant
+avec
+avoir
+ayant
+b
+bah
+beaucoup
+bien
+bigre
+boum
+bravo
+brrr
+c
+ça
+car
+ce
+ceci
+cela
+celle
+celle-ci
+celle-là
+celles
+celles-ci
+celles-là
+celui
+celui-ci
+celui-là
+cent
+cependant
+certain
+certaine
+certaines
+certains
+certes
+ces
+cet
+cette
+ceux
+ceux-ci
+ceux-là
+chacun
+chaque
+cher
+chère
+chères
+chers
+chez
+chiche
+chut
+ci
+cinq
+cinquantaine
+cinquante
+cinquantième
+cinquième
+clac
+clic
+combien
+comme
+comment
+compris
+concernant
+contre
+couic
+crac
+d
+da
+dans
+de
+debout
+dedans
+dehors
+delà
+depuis
+derrière
+des
+dès
+désormais
+desquelles
+desquels
+dessous
+dessus
+deux
+deuxième
+deuxièmement
+devant
+devers
+devra
+différent
+différente
+différentes
+différents
+dire
+divers
+diverse
+diverses
+dix
+dix-huit
+dixième
+dix-neuf
+dix-sept
+doit
+doivent
+donc
+dont
+douze
+douzième
+dring
+du
+duquel
+durant
+e
+effet
+eh
+elle
+elle-même
+elles
+elles-mêmes
+en
+encore
+entre
+envers
+environ
+es
+ès
+est
+et
+etant
+étaient
+étais
+était
+étant
+etc
+été
+etre
+être
+eu
+euh
+eux
+eux-mêmes
+excepté
+f
+façon
+fais
+faisaient
+faisant
+fait
+feront
+fi
+flac
+floc
+font
+g
+gens
+h
+ha
+hé
+hein
+hélas
+hem
+hep
+hi
+ho
+holà
+hop
+hormis
+hors
+hou
+houp
+hue
+hui
+huit
+huitième
+hum
+hurrah
+i
+il
+ils
+importe
+j
+je
+jusqu
+jusque
+k
+l
+la
+là
+laquelle
+las
+le
+lequel
+les
+lès
+lesquelles
+lesquels
+leur
+leurs
+longtemps
+lorsque
+lui
+lui-même
+m
+ma
+maint
+mais
+malgré
+me
+même
+mêmes
+merci
+mes
+mien
+mienne
+miennes
+miens
+mille
+mince
+moi
+moi-même
+moins
+mon
+moyennant
+n
+na
+ne
+néanmoins
+neuf
+neuvième
+ni
+nombreuses
+nombreux
+non
+nos
+notre
+nôtre
+nôtres
+nous
+nous-mêmes
+nul
+o
+o|
+ô
+oh
+ohé
+olé
+ollé
+on
+ont
+onze
+onzième
+ore
+ou
+où
+ouf
+ouias
+oust
+ouste
+outre
+p
+paf
+pan
+par
+parmi
+partant
+particulier
+particulière
+particulièrement
+pas
+passé
+pendant
+personne
+peu
+peut
+peuvent
+peux
+pff
+pfft
+pfut
+pif
+plein
+plouf
+plus
+plusieurs
+plutôt
+pouah
+pour
+pourquoi
+premier
+première
+premièrement
+près
+proche
+psitt
+puisque
+q
+qu
+quand
+quant
+quanta
+quant-à-soi
+quarante
+quatorze
+quatre
+quatre-vingt
+quatrième
+quatrièmement
+que
+quel
+quelconque
+quelle
+quelles
+quelque
+quelques
+quelqu'un
+quels
+qui
+quiconque
+quinze
+quoi
+quoique
+r
+revoici
+revoilà
+rien
+s
+sa
+sacrebleu
+sans
+sapristi
+sauf
+se
+seize
+selon
+sept
+septième
+sera
+seront
+ses
+si
+sien
+sienne
+siennes
+siens
+sinon
+six
+sixième
+soi
+soi-même
+soit
+soixante
+son
+sont
+sous
+stop
+suis
+suivant
+sur
+surtout
+t
+ta
+tac
+tant
+te
+té
+tel
+telle
+tellement
+telles
+tels
+tenant
+tes
+tic
+tien
+tienne
+tiennes
+tiens
+toc
+toi
+toi-même
+ton
+touchant
+toujours
+tous
+tout
+toute
+toutes
+treize
+trente
+très
+trois
+troisième
+troisièmement
+trop
+tsoin
+tsouin
+tu
+u
+un
+une
+unes
+uns
+v
+va
+vais
+vas
+vé
+vers
+via
+vif
+vifs
+vingt
+vivat
+vive
+vives
+vlan
+voici
+voilà
+vont
+vos
+votre
+vôtre
+vôtres
+vous
+vous-mêmes
+vu
+w
+x
+y
+z
+zut
diff --git a/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/hi_ST.txt b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/hi_ST.txt
new file mode 100644
index 0000000..426fc2d
--- /dev/null
+++ b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/hi_ST.txt
@@ -0,0 +1,164 @@
+# Stop Words List from http://members.unine.ch/jacques.savoy/clef/index.html

+पर

+इन 

+वह 

+यिह 

+वुह 

+जिन्हें

+जिन्हों

+तिन्हें

+तिन्हों

+किन्हों

+किन्हें

+इत्यादि

+द्वारा

+इन्हें

+इन्हों

+उन्हों

+बिलकुल

+निहायत

+ऱ्वासा

+इन्हीं

+उन्हीं

+उन्हें

+इसमें

+जितना

+दुसरा

+कितना

+दबारा

+साबुत

+वग़ैरह

+दूसरे

+कौनसा

+लेकिन

+होता

+करने

+किया

+लिये

+अपने

+नहीं

+दिया

+इसका

+करना

+वाले

+सकते

+इसके

+सबसे

+होने

+करते

+बहुत

+वर्ग

+करें

+होती

+अपनी

+उनके

+कहते

+होते

+करता

+उनकी

+इसकी

+सकता

+रखें

+अपना

+उसके

+जिसे

+तिसे

+किसे

+किसी

+काफ़ी

+पहले

+नीचे

+बाला

+यहाँ

+जैसा

+जैसे

+मानो

+अंदर

+भीतर

+पूरा

+सारा

+होना

+उनको

+वहाँ

+वहीं

+जहाँ

+जीधर

+उनका

+इनका

+के

+हैं

+गया

+बनी

+एवं

+हुआ

+साथ

+बाद

+लिए

+कुछ

+कहा

+यदि

+हुई

+इसे

+हुए

+अभी

+सभी

+कुल

+रहा

+रहे

+इसी

+उसे

+जिस

+जिन

+तिस

+तिन

+कौन

+किस

+कोई

+ऐसे

+तरह

+किर

+साभ

+संग

+यही

+बही

+उसी

+फिर

+मगर

+का

+एक

+यह

+से

+को

+इस

+कि

+जो

+कर

+मे

+ने

+तो

+ही

+या

+हो

+था

+तक

+आप

+ये

+थे

+दो

+वे

+थी

+जा

+ना

+उस

+एस

+पे

+उन

+सो

+भी

+और

+घर

+तब

+जब

+अत

+व

+न

diff --git a/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/hu_ST.txt b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/hu_ST.txt
new file mode 100644
index 0000000..7cf3e1c
--- /dev/null
+++ b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/hu_ST.txt
@@ -0,0 +1,738 @@
+# Stop Words List from http://members.unine.ch/jacques.savoy/clef/index.html
+a
+abba
+abban
+abból
+addig
+ahhoz
+ahol
+akár
+aki
+akik
+akkor
+alá
+alád
+alájuk
+alám
+alánk
+alapján
+alátok
+alatt
+alatta
+alattad
+alattam
+alattatok
+alattuk
+alattunk
+alól
+alóla
+alólad
+alólam
+alólatok
+alóluk
+alólunk
+által
+általában
+ám
+amely
+amelybol
+amelyek
+amelyekben
+amelyeket
+amelyet
+amelyik
+amelynek
+ami
+amíg
+amikor
+amit
+amott
+annak
+annál
+arra
+arról
+át
+attól
+az
+azért
+aznap
+azok
+azokat
+azokba
+azokban
+azokból
+azokért
+azokhoz
+azokig
+azokká
+azokkal
+azoknak
+azoknál
+azokon
+azokra
+azokról
+azoktól
+azon
+azonban
+azonnal
+azt
+aztán
+azzá
+azzal
+bal
+balra
+ban
+bár
+bárcsak
+bármilyen
+be
+belé
+beléd
+beléjük
+belém
+belénk
+belétek
+belőle
+belőled
+belőlem
+belőletek
+belőlük
+belőlünk
+belül
+ben
+benne
+benned
+bennem
+bennetek
+bennük
+bennünk
+búcsú
+csak
+csakhogy
+csupán
+de
+dehogy
+ebbe
+ebben
+ebből
+eddig
+egész
+egészen
+egy
+egyéb
+egyebek
+egyebet
+egyedül
+egyelőre
+egyet
+egyik
+egymás
+egyre
+egyszerre
+együtt
+ehhez
+el
+elé
+eléd
+elég
+eleinte
+eléjük
+elém
+elénk
+elétek
+éljen
+ellen
+ellenére
+ellenes
+elleni
+elmondta
+előbb
+elől
+előle
+előled
+előlem
+előletek
+előlük
+előlünk
+először
+előtt
+előtte
+előtted
+előttem
+előttetek
+előttük
+előttünk
+előző
+első
+elsők
+elsősorban
+elsőt
+én
+engem
+ennek
+ennél
+ennyi
+enyém
+erre
+erről
+érte
+érted
+értem
+értetek
+értük
+értünk
+és
+esetben
+ettől
+év
+évben
+éve
+évek
+éves
+évi
+évvel
+ez
+ezek
+ezekbe
+ezekben
+ezekből
+ezeken
+ezekért
+ezeket
+ezekhez
+ezekig
+ezekké
+ezekkel
+ezeknek
+ezeknél
+ezekre
+ezekről
+ezektől
+ezen
+ezentúl
+ezer
+ezért
+ezret
+ezt
+ezután
+ezzé
+ezzel
+fel
+fél
+fele
+felé
+felek
+felet
+felett
+fent
+fenti
+fölé
+gyakran
+ha
+halló
+hamar
+hanem
+hány
+hányszor
+harmadik
+harmadikat
+hármat
+harminc
+három
+hat
+hát
+hátha
+hatodik
+hatodikat
+hatot
+hátulsó
+hatvan
+helyett
+hét
+hetedik
+hetediket
+hetet
+hetven
+hiába
+hirtelen
+hiszen
+hogy
+hol
+holnap
+holnapot
+honnan
+hova
+hozzá
+hozzád
+hozzájuk
+hozzám
+hozzánk
+hozzátok
+hurrá
+húsz
+huszadik
+idén
+ide-oda
+igazán
+igen
+így
+illetve
+ilyen
+immár
+inkább
+is
+ismét
+itt
+jelenleg
+jó
+jobban
+jobbra
+jól
+jólesik
+jóval
+jövőre
+kell
+kellene
+kellett
+kelljen
+képest
+kérem
+kérlek
+késő
+később
+későn
+kész
+két
+kétszer
+ketten
+kettő
+kettőt
+kevés
+ki
+kiben
+kiből
+kicsit
+kicsoda
+kié
+kiért
+kihez
+kik
+kikbe
+kikben
+kikből
+kiken
+kikért
+kiket
+kikhez
+kikké
+kikkel
+kiknek
+kiknél
+kikre
+kikről
+kiktől
+kilenc
+kilencedik
+kilencediket
+kilencet
+kilencven
+kin
+kinek
+kinél
+kire
+kiről
+kit
+kitől
+kivé
+kivel
+korábban
+körül
+köszönhetően
+köszönöm
+közben
+közé
+közel
+közepén
+közepesen
+között
+közül
+külön
+különben
+különböző
+különbözőbb
+különbözőek
+lassan
+le
+legalább
+legyen
+lehet
+lehetetlen
+lehetőleg
+lehetőség
+lenne
+lennék
+lennének
+lesz
+leszek
+lesznek
+leszünk
+lett
+lettek
+lettem
+lettünk
+lévő
+ma
+maga
+magad
+magam
+magát
+magatokat
+magukat
+magunkat
+mai
+majd
+majdnem
+manapság
+már
+más
+másik
+másikat
+másnap
+második
+másodszor
+mások
+másokat
+mást
+meg
+még
+megcsinál
+megcsinálnak
+megint
+mégis
+megvan
+mellé
+melléd
+melléjük
+mellém
+mellénk
+mellétek
+mellett
+mellette
+melletted
+mellettem
+mellettetek
+mellettük
+mellettünk
+mellől
+mellőle
+mellőled
+mellőlem
+mellőletek
+mellőlük
+mellőlünk
+melyik
+mennyi
+mert
+mi
+miatt
+miatta
+miattad
+miattam
+miattatok
+miattuk
+miattunk
+mibe
+miben
+miből
+miért
+míg
+mihez
+mik
+mikbe
+mikben
+mikből
+miken
+mikért
+miket
+mikhez
+mikké
+mikkel
+miknek
+miknél
+mikor
+mikre
+mikről
+miktől
+milyen
+min
+mind
+mindegyik
+mindegyiket
+minden
+mindenesetre
+mindenki
+mindent
+mindenütt
+mindig
+mindketten
+minek
+minél
+minket
+mint
+mire
+miről
+mit
+mitől
+mivé
+mivel
+mögé
+mögéd
+mögéjük
+mögém
+mögénk
+mögétek
+mögött
+mögötte
+mögötted
+mögöttem
+mögöttetek
+mögöttük
+mögöttünk
+mögül
+mögüle
+mögüled
+mögülem
+mögületek
+mögülük
+mögülünk
+mondta
+most
+mostanáig
+múltkor
+múlva
+na
+nagyon
+nála
+nálad
+nálam
+nálatok
+náluk
+nálunk
+naponta
+napot
+ne
+négy
+negyedik
+negyediket
+négyet
+negyven
+néha
+néhány
+neked
+nekem
+neki
+nekik
+nektek
+nekünk
+nélkül
+nem
+nemcsak
+nemrég
+nincs
+nyolc
+nyolcadik
+nyolcadikat
+nyolcat
+nyolcvan

+ők
+őket
+olyan
+ön
+önbe
+önben
+önből
+önért
+önhöz
+onnan
+önnek
+önnel
+önnél
+önök
+önökbe
+önökben
+önökből
+önökért
+önöket
+önökhöz
+önökkel
+önöknek
+önöknél
+önökön
+önökre
+önökről
+önöktől
+önön
+önre
+önről
+önt
+öntől
+öt
+őt
+óta
+ötödik
+ötödiket
+ötöt
+ott
+ötven
+pár
+pedig
+például
+persze
+rá
+rád
+rajta
+rajtad
+rajtam
+rajtatok
+rajtuk
+rajtunk
+rájuk
+rám
+ránk
+rátok
+régen
+régóta
+rendben
+részére
+rögtön
+róla
+rólad
+rólam
+rólatok
+róluk
+rólunk
+rosszul
+se
+sem
+semmi
+semmilyen
+semmiség
+senki
+soha
+sok
+sokáig
+sokan
+sokszor
+során
+sőt
+stb.
+számára
+száz
+századik
+százat
+szemben
+szépen
+szerbusz
+szerint
+szerinte
+szerinted
+szerintem
+szerintetek
+szerintük
+szerintünk
+szervusz
+szinte
+szíves
+szívesen
+szíveskedjék
+talán
+tavaly
+távol
+te
+téged
+tegnap
+tegnapelőtt
+tehát
+tele
+tényleg
+tessék
+ti
+tied
+titeket
+tíz
+tizedik
+tizediket
+tizenegy
+tizenegyedik
+tizenhárom
+tizenhat
+tizenhét
+tizenkét
+tizenkettedik
+tizenkettő
+tizenkilenc
+tizennégy
+tizennyolc
+tizenöt
+tizet
+több
+többi
+többször
+tőle
+tőled
+tőlem
+tőletek
+tőlük
+tőlünk
+tovább
+további
+túl
+úgy
+ugyanakkor
+ugyanez
+ugyanis
+ugye
+úgyis
+úgynevezett
+újra
+úr
+urak
+uram
+urat
+után
+utoljára
+utolsó
+vagy
+vagyis
+vagyok
+vagytok
+vagyunk
+vajon
+valahol
+valaki
+valakit
+valamelyik
+valami
+valamint
+van
+vannak
+végén
+végre
+végül
+vele
+veled
+velem
+veletek
+velük
+velünk
+viszlát
+viszont
+viszontlátásra
+volna
+volnának
+volnék
+volt
+voltak
+voltam
+voltunk
diff --git a/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/it_ST.txt b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/it_ST.txt
new file mode 100644
index 0000000..23c80a2
--- /dev/null
+++ b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/it_ST.txt
@@ -0,0 +1,400 @@
+# Stop Words List from http://members.unine.ch/jacques.savoy/clef/index.html
+a
+abbastanza
+accidenti
+ad
+adesso
+affinche
+agli
+ahime
+ahimè
+ai
+al
+alcuna
+alcuni
+alcuno
+all
+alla
+alle
+allo
+altri
+altrimenti
+altro
+altrui
+anche
+ancora
+anni
+anno
+ansa
+assai
+attesa
+avanti
+avendo
+avente
+aver
+avere
+avete
+aveva
+avuta
+avute
+avuti
+avuto
+basta
+bene
+benissimo
+berlusconi
+brava
+bravo
+c
+casa
+caso
+cento
+certa
+certe
+certi
+certo
+che
+chi
+chicchessia
+chiunque
+ci
+ciascuna
+ciascuno
+cima
+cio
+ciò
+cioe
+cioè
+circa
+citta
+città
+codesta
+codesti
+codesto
+cogli
+coi
+col
+colei
+coll
+coloro
+colui
+come
+con
+concernente
+consiglio
+contro
+cortesia
+cos
+cosa
+cosi
+così
+cui
+d
+da
+dagli
+dai
+dal
+dall
+dalla
+dalle
+dallo
+davanti
+degli
+dei
+del
+dell
+della
+delle
+dello
+dentro
+detto
+deve
+di
+dice
+dietro
+dire
+dirimpetto
+dopo
+dove
+dovra
+dovrà
+due
+dunque
+durante
+e

+ecco
+ed
+egli
+ella
+eppure
+era
+erano
+esse
+essendo
+esser
+essere
+essi
+ex
+fa
+fare
+fatto
+favore
+fin
+finalmente
+finche
+fine
+fino
+forse
+fra
+fuori
+gia
+già
+giacche
+giorni
+giorno
+gli
+gliela
+gliele
+glieli
+glielo
+gliene
+governo
+grande
+grazie
+gruppo
+ha
+hai
+hanno
+ho
+i
+ieri
+il
+improvviso
+in
+infatti
+insieme
+intanto
+intorno
+invece
+io
+l
+la
+là
+lavoro
+le
+lei
+li
+lo
+lontano
+loro
+lui
+lungo
+ma
+macche
+magari
+mai
+male
+malgrado
+malissimo
+me
+medesimo
+mediante
+meglio
+meno
+mentre
+mesi
+mezzo
+mi
+mia
+mie
+miei
+mila
+miliardi
+milioni
+ministro
+mio
+moltissimo
+molto
+mondo
+nazionale
+ne
+negli
+nei
+nel
+nell
+nella
+nelle
+nello
+nemmeno
+neppure
+nessuna
+nessuno
+niente
+no
+noi
+non
+nondimeno
+nostra
+nostre
+nostri
+nostro
+nulla
+nuovo
+o
+od
+oggi
+ogni
+ognuna
+ognuno
+oltre
+oppure
+ora
+ore
+osi
+ossia
+paese
+parecchi
+parecchie
+parecchio
+parte
+partendo
+peccato
+peggio
+per
+perche
+perchè
+percio
+perciò
+perfino
+pero
+però
+persone
+piedi
+pieno
+piglia
+piu
+più
+po
+pochissimo
+poco
+poi
+poiche
+press
+prima
+primo
+proprio
+puo
+può
+pure
+purtroppo
+qualche
+qualcuna
+qualcuno
+quale
+quali
+qualunque
+quando
+quanta
+quante
+quanti
+quanto
+quantunque
+quasi
+quattro
+quel
+quella
+quelli
+quello
+quest
+questa
+queste
+questi
+questo
+qui
+quindi
+riecco
+salvo
+sara
+sarà
+sarebbe
+scopo
+scorso
+se
+secondo
+seguente
+sei
+sempre
+senza
+si
+sia
+siamo
+siete
+solito
+solo
+sono
+sopra
+sotto
+sta
+staranno
+stata
+state
+stati
+stato
+stesso
+su
+sua
+successivo
+sue
+sugli
+sui
+sul
+sull
+sulla
+sulle
+sullo
+suo
+suoi
+tale
+talvolta
+tanto
+te
+tempo
+ti
+torino
+tra
+tranne
+tre
+troppo
+tu
+tua
+tue
+tuo
+tuoi
+tutta
+tuttavia
+tutte
+tutti
+tutto
+uguali
+un
+una
+uno
+uomo
+va
+vale
+varia
+varie
+vario
+verso
+vi
+via
+vicino
+visto
+vita
+voi
+volta
+vostra
+vostre
+vostri
+vostro
diff --git a/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/pl_ST.txt b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/pl_ST.txt
new file mode 100644
index 0000000..e27c30e
--- /dev/null
+++ b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/pl_ST.txt
@@ -0,0 +1,139 @@
+# Stop Words List from http://members.unine.ch/jacques.savoy/clef/index.html
+ach
+aj
+albo
+bardzo
+bez
+bo
+być
+ci
+cię
+ciebie
+co
+czy
+daleko
+dla
+dlaczego
+dlatego
+do
+dobrze
+dokąd
+dość
+dużo
+dwa
+dwaj
+dwie
+dwoje
+dziś
+dzisiaj
+gdyby
+gdzie
+go
+ich
+ile
+im
+inny
+ja
+ją
+jak
+jakby
+jaki
+je
+jeden
+jedna
+jedno
+jego
+jej
+jemu
+jeśli
+jest
+jestem
+jeżeli
+już
+każdy
+kiedy
+kierunku
+kto
+ku
+lub
+ma
+mają
+mam
+mi
+mną
+mnie
+moi
+mój
+moja
+moje
+może
+mu
+my
+na
+nam
+nami
+nas
+nasi
+nasz
+nasza
+nasze
+natychmiast
+nią
+nic
+nich
+nie
+niego
+niej
+niemu
+nigdy
+nim
+nimi
+niż
+obok
+od
+około
+on
+ona
+one
+oni
+ono
+owszem
+po
+pod
+ponieważ
+przed
+przedtem
+są
+sam
+sama
+się
+skąd
+tak
+taki
+tam
+ten
+to
+tobą
+tobie
+tu
+tutaj
+twoi
+twój
+twoja
+twoje
+ty
+wam
+wami
+was
+wasi
+wasz
+wasza
+wasze
+we
+więc
+wszystko
+wtedy
+wy
+żaden
+zawsze
+że
diff --git a/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/pt_ST.txt b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/pt_ST.txt
new file mode 100644
index 0000000..da60644
--- /dev/null
+++ b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/pt_ST.txt
@@ -0,0 +1,357 @@
+# Stop Words List from http://members.unine.ch/jacques.savoy/clef/index.html
+a
+à
+adeus
+agora
+aí
+ainda
+além
+algo
+algumas
+alguns
+ali
+ano
+anos
+antes
+ao
+aos
+apenas
+apoio
+após
+aquela
+aquelas
+aquele
+aqueles
+aqui
+aquilo
+área
+as
+às
+assim
+até
+atrás
+através
+baixo
+bastante
+bem
+bom
+breve
+cá
+cada
+catorze
+cedo
+cento
+certamente
+certeza
+cima
+cinco
+coisa
+com
+como
+conselho
+contra
+custa
+da
+dá
+dão
+daquela
+daquele
+dar
+das
+de
+debaixo
+demais
+dentro
+depois
+desde
+dessa
+desse
+desta
+deste
+deve
+deverá
+dez
+dezanove
+dezasseis
+dezassete
+dezoito
+dia
+diante
+diz
+dizem
+dizer
+do
+dois
+dos
+doze
+duas
+dúvida
+e
+é
+ela
+elas
+ele
+eles
+em
+embora
+entre
+era
+és
+essa
+essas
+esse
+esses
+esta
+está
+estar
+estas
+estás
+estava
+este
+estes
+esteve
+estive
+estivemos
+estiveram
+estiveste
+estivestes
+estou
+eu
+exemplo
+faço
+falta
+favor
+faz
+fazeis
+fazem
+fazemos
+fazer
+fazes
+fez
+fim
+final
+foi
+fomos
+for
+foram
+forma
+foste
+fostes
+fui
+geral
+grande
+grandes
+grupo
+há
+hoje
+horas
+isso
+isto
+já
+lá
+lado
+local
+logo
+longe
+lugar
+maior
+maioria
+mais
+mal
+mas
+máximo
+me
+meio
+menor
+menos
+mês
+meses
+meu
+meus
+mil
+minha
+minhas
+momento
+muito
+muitos
+na
+nada
+não
+naquela
+naquele
+nas
+nem
+nenhuma
+nessa
+nesse
+nesta
+neste
+nível
+no
+noite
+nome
+nos
+nós
+nossa
+nossas
+nosso
+nossos
+nova
+nove
+novo
+novos
+num
+numa
+número
+nunca
+o
+obra
+obrigada
+obrigado
+oitava
+oitavo
+oito
+onde
+ontem
+onze
+os
+ou
+outra
+outras
+outro
+outros
+para
+parece
+parte
+partir
+pela
+pelas
+pelo
+pelos
+perto
+pode
+pôde
+podem
+poder
+põe
+põem
+ponto
+pontos
+por
+porque
+porquê
+posição
+possível
+possivelmente
+posso
+pouca
+pouco
+primeira
+primeiro
+próprio
+próximo
+puderam
+qual
+quando
+quanto
+quarta
+quarto
+quatro
+que
+quê
+quem
+quer
+quero
+questão
+quinta
+quinto
+quinze
+relação
+sabe
+são
+se
+segunda
+segundo
+sei
+seis
+sem
+sempre
+ser
+seria
+sete
+sétima
+sétimo
+seu
+seus
+sexta
+sexto
+sim
+sistema
+sob
+sobre
+sois
+somos
+sou
+sua
+suas
+tal
+talvez
+também
+tanto
+tão
+tarde
+te
+tem
+têm
+temos
+tendes
+tenho
+tens
+ter
+terceira
+terceiro
+teu
+teus
+teve
+tive
+tivemos
+tiveram
+tiveste
+tivestes
+toda
+todas
+todo
+todos
+trabalho
+três
+treze
+tu
+tua
+tuas
+tudo
+um
+uma
+umas
+uns
+vai
+vais
+vão
+vários
+vem
+vêm
+vens
+ver
+vez
+vezes
+viagem
+vindo
+vinte
+você
+vocês
+vos
+vós
+vossa
+vossas
+vosso
+vossos
+zero
diff --git a/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/ro_ST.txt b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/ro_ST.txt
new file mode 100644
index 0000000..ec7c517
--- /dev/null
+++ b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/ro_ST.txt
@@ -0,0 +1,283 @@
+# Stop Words List from http://members.unine.ch/jacques.savoy/clef/index.html

+acea

+aceasta

+această

+aceea

+acei

+aceia

+acel

+acela

+acele

+acelea

+acest

+acesta

+aceste

+acestea

+aceşti

+aceştia

+acolo

+acord

+acum

+ai

+aia

+aibă

+aici

+al

+ăla

+ale

+alea

+ălea

+altceva

+altcineva

+am

+ar

+are

+aş

+aşadar

+asemenea

+asta

+ăsta

+astăzi

+astea

+ăstea

+ăştia

+asupra

+aţi

+au

+avea

+avem

+aveţi

+azi

+bine

+bucur

+bună

+ca

+că

+căci

+când

+care

+cărei

+căror

+cărui

+cât

+câte

+câţi

+către

+câtva

+caut

+ce

+cel

+ceva

+chiar

+cinci

+cînd

+cine

+cineva

+cît

+cîte

+cîţi

+cîtva

+contra

+cu

+cum

+cumva

+curând

+curînd

+da

+dă

+dacă

+dar

+dată

+datorită

+dau

+de

+deci

+deja

+deoarece

+departe

+deşi

+din

+dinaintea

+dintr-

+dintre

+doi

+doilea

+două

+drept

+după

+ea

+ei

+el

+ele

+eram

+este

+eşti

+eu

+face

+fără

+fata

+fi

+fie

+fiecare

+fii

+fim

+fiţi

+fiu

+frumos

+graţie

+halbă

+iar

+ieri

+îi

+îl

+îmi

+împotriva

+în 

+înainte

+înaintea

+încât

+încît

+încotro

+între

+întrucât

+întrucît

+îţi

+la

+lângă

+le

+li

+lîngă

+lor

+lui

+mă

+mai

+mâine

+mea

+mei

+mele

+mereu

+meu

+mi

+mie

+mîine

+mine

+mult

+multă

+mulţi

+mulţumesc

+ne

+nevoie

+nicăieri

+nici

+nimeni

+nimeri

+nimic

+nişte

+noastră

+noastre

+noi

+noroc

+noştri

+nostru

+nouă

+nu

+opt

+ori

+oricând

+oricare

+oricât

+orice

+oricînd

+oricine

+oricît

+oricum

+oriunde

+până

+patra

+patru

+patrulea

+pe

+pentru

+peste

+pic

+pînă

+poate

+pot

+prea

+prima

+primul

+prin

+printr-

+puţin

+puţina

+puţină

+rog

+sa

+să

+săi

+sale

+şapte

+şase

+sau

+său

+se

+şi

+sînt

+sîntem

+sînteţi

+spate

+spre

+ştiu

+sub

+sunt

+suntem

+sunteţi

+sută

+ta

+tăi

+tale

+tău

+te

+ţi

+ţie

+timp

+tine

+toată

+toate

+tot

+toţi

+totuşi

+trei

+treia

+treilea

+tu

+un

+una

+unde

+undeva

+unei

+uneia

+unele

+uneori

+unii

+unor

+unora

+unu

+unui

+unuia

+unul

+vă

+vi

+voastră

+voastre

+voi

+voştri

+vostru

+vouă

+vreme

+vreo

+vreun

+zece

+zero

+zi

+zice

diff --git a/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/ru_ST.txt b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/ru_ST.txt
new file mode 100644
index 0000000..d7de4e5
--- /dev/null
+++ b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/ru_ST.txt
@@ -0,0 +1,423 @@
+# Stop Words List from http://members.unine.ch/jacques.savoy/clef/index.html







+на

+не

+ни

+об

+но

+он

+мне

+мои

+мож

+она

+они

+оно

+мной

+много

+многочисленное

+многочисленная

+многочисленные

+многочисленный

+мною

+мой

+мог

+могут

+можно

+может

+можхо

+мор

+моя

+моё

+мочь

+над

+нее

+оба

+нам

+нем

+нами

+ними

+мимо

+немного

+одной

+одного

+менее

+однажды

+однако

+меня

+нему

+меньше

+ней

+наверху

+него

+ниже

+мало

+надо

+один

+одиннадцать

+одиннадцатый

+назад

+наиболее

+недавно

+миллионов

+недалеко

+между

+низко

+меля

+нельзя

+нибудь

+непрерывно

+наконец

+никогда

+никуда

+нас

+наш

+нет

+нею

+неё

+них

+мира

+наша

+наше

+наши

+ничего

+начала

+нередко

+несколько

+обычно

+опять

+около

+мы

+ну

+нх

+от

+отовсюду

+особенно

+нужно

+очень

+отсюда


+во

+вон

+вниз

+внизу

+вокруг

+вот

+восемнадцать

+восемнадцатый

+восемь

+восьмой

+вверх

+вам

+вами

+важное

+важная

+важные

+важный

+вдали

+везде

+ведь

+вас

+ваш

+ваша

+ваше

+ваши

+впрочем

+весь

+вдруг

+вы

+все

+второй

+всем

+всеми

+времени

+время

+всему

+всего

+всегда

+всех

+всею

+всю

+вся

+всё

+всюду


+год

+говорил

+говорит

+года

+году

+где

+да

+ее

+за

+из

+ли

+же

+им

+до

+по

+ими

+под

+иногда

+довольно

+именно

+долго

+позже

+более

+должно

+пожалуйста

+значит

+иметь

+больше

+пока

+ему

+имя

+пор

+пора

+потом

+потому

+после

+почему

+почти

+посреди

+ей

+два

+две

+двенадцать

+двенадцатый

+двадцать

+двадцатый

+двух

+его

+дел

+или

+без

+день

+занят

+занята

+занято

+заняты

+действительно

+давно

+девятнадцать

+девятнадцатый

+девять

+девятый

+даже

+алло

+жизнь

+далеко

+близко

+здесь

+дальше

+для

+лет

+зато

+даром

+первый

+перед

+затем

+зачем

+лишь

+десять

+десятый

+ею

+её

+их

+бы

+еще

+при

+был

+про

+процентов

+против

+просто

+бывает

+бывь

+если

+люди

+была

+были

+было

+будем

+будет

+будете

+будешь

+прекрасно

+буду

+будь

+будто

+будут

+ещё

+пятнадцать

+пятнадцатый

+друго

+другое

+другой

+другие

+другая

+других

+есть

+пять

+быть

+лучше

+пятый


+ком

+конечно

+кому

+кого

+когда

+которой

+которого

+которая

+которые

+который

+которых

+кем

+каждое

+каждая

+каждые

+каждый

+кажется

+как

+какой

+какая

+кто

+кроме

+куда

+кругом





+та

+те

+уж

+со

+то

+том

+снова

+тому

+совсем

+того

+тогда

+тоже

+собой

+тобой

+собою

+тобою

+сначала

+только

+уметь

+тот

+тою

+хорошо

+хотеть

+хочешь

+хоть

+хотя

+свое

+свои

+твой

+своей

+своего

+своих

+свою

+твоя

+твоё

+раз

+уже

+сам

+там

+тем

+чем

+сама

+сами

+теми

+само

+рано

+самом

+самому

+самой

+самого

+семнадцать

+семнадцатый

+самим

+самими

+самих

+саму

+семь

+чему

+раньше

+сейчас

+чего

+сегодня

+себе

+тебе

+сеаой

+человек

+разве

+теперь

+себя

+тебя

+седьмой

+спасибо

+слишком

+так

+такое

+такой

+такие

+также

+такая

+сих

+тех

+чаще

+четвертый

+через

+часто

+шестой

+шестнадцать

+шестнадцатый

+шесть

+четыре

+четырнадцать

+четырнадцатый

+сколько

+сказал

+сказала

+сказать

+ту

+ты

+три

+эта

+эти

+что

+это

+чтоб

+этом

+этому

+этой

+этого

+чтобы

+этот

+стал

+туда

+этим

+этими

+рядом

+тринадцать

+тринадцатый

+этих

+третий

+тут

+эту

+суть

+чуть

+тысяч

+

diff --git a/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/sv_ST.txt b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/sv_ST.txt
new file mode 100644
index 0000000..582ab5a
--- /dev/null
+++ b/src/resources/org/apache/cassandra/index/sasi/analyzer/filter/sv_ST.txt
@@ -0,0 +1,387 @@
+# Stop Words List from http://members.unine.ch/jacques.savoy/clef/index.html
+aderton
+adertonde
+adjö
+aldrig
+alla
+allas
+allt
+alltid
+alltså
+än
+andra
+andras
+annan
+annat
+ännu
+artonde
+artonn
+åtminstone
+att
+åtta
+åttio
+åttionde
+åttonde
+av
+även
+båda
+bådas
+bakom
+bara
+bäst
+bättre
+behöva
+behövas
+behövde
+behövt
+beslut
+beslutat
+beslutit
+bland
+blev
+bli
+blir
+blivit
+bort
+borta
+bra
+då
+dag
+dagar
+dagarna
+dagen
+där
+därför
+de
+del
+delen
+dem
+den
+deras
+dess
+det
+detta
+dig
+din
+dina
+dit
+ditt
+dock
+du
+efter
+eftersom
+elfte
+eller
+elva
+en
+enkel
+enkelt
+enkla
+enligt
+er
+era
+ert
+ett
+ettusen
+få
+fanns
+får
+fått
+fem
+femte
+femtio
+femtionde
+femton
+femtonde
+fick
+fin
+finnas
+finns
+fjärde
+fjorton
+fjortonde
+fler
+flera
+flesta
+följande
+för
+före
+förlåt
+förra
+första
+fram
+framför
+från
+fyra
+fyrtio
+fyrtionde
+gå
+gälla
+gäller
+gällt
+går
+gärna
+gått
+genast
+genom
+gick
+gjorde
+gjort
+god
+goda
+godare
+godast
+gör
+göra
+gott
+ha
+hade
+haft
+han
+hans
+har
+här
+heller
+hellre
+helst
+helt
+henne
+hennes
+hit
+hög
+höger
+högre
+högst
+hon
+honom
+hundra
+hundraen
+hundraett
+hur
+i
+ibland
+idag
+igår
+igen
+imorgon
+in
+inför
+inga
+ingen
+ingenting
+inget
+innan
+inne
+inom
+inte
+inuti
+ja
+jag
+jämfört
+kan
+kanske
+knappast
+kom
+komma
+kommer
+kommit
+kr
+kunde
+kunna
+kunnat
+kvar
+länge
+längre
+långsam
+långsammare
+långsammast
+långsamt
+längst
+långt
+lätt
+lättare
+lättast
+legat
+ligga
+ligger
+lika
+likställd
+likställda
+lilla
+lite
+liten
+litet
+man
+många
+måste
+med
+mellan
+men
+mer
+mera
+mest
+mig
+min
+mina
+mindre
+minst
+mitt
+mittemot
+möjlig
+möjligen
+möjligt
+möjligtvis
+mot
+mycket
+någon
+någonting
+något
+några
+när
+nästa
+ned
+nederst
+nedersta
+nedre
+nej
+ner
+ni
+nio
+nionde
+nittio
+nittionde
+nitton
+nittonde
+nödvändig
+nödvändiga
+nödvändigt
+nödvändigtvis
+nog
+noll
+nr
+nu
+nummer
+och
+också
+ofta
+oftast
+olika
+olikt
+om
+oss
+över
+övermorgon
+överst
+övre
+på
+rakt
+rätt
+redan
+så
+sade
+säga
+säger
+sagt
+samma
+sämre
+sämst
+sedan
+senare
+senast
+sent
+sex
+sextio
+sextionde
+sexton
+sextonde
+sig
+sin
+sina
+sist
+sista
+siste
+sitt
+sjätte
+sju
+sjunde
+sjuttio
+sjuttionde
+sjutton
+sjuttonde
+ska
+skall
+skulle
+slutligen
+små
+smått
+snart
+som
+stor
+stora
+större
+störst
+stort
+tack
+tidig
+tidigare
+tidigast
+tidigt
+till
+tills
+tillsammans
+tio
+tionde
+tjugo
+tjugoen
+tjugoett
+tjugonde
+tjugotre
+tjugotvå
+tjungo
+tolfte
+tolv
+tre
+tredje
+trettio
+trettionde
+tretton
+trettonde
+två
+tvåhundra
+under
+upp
+ur
+ursäkt
+ut
+utan
+utanför
+ute
+vad
+vänster
+vänstra
+var
+vår
+vara
+våra
+varför
+varifrån
+varit
+varken
+värre
+varsågod
+vart
+vårt
+vem
+vems
+verkligen
+vi
+vid
+vidare
+viktig
+viktigare
+viktigast
+viktigt
+vilka
+vilken
+vilket
+vill
diff --git a/test/conf/cassandra-murmur.yaml b/test/conf/cassandra-murmur.yaml
new file mode 100644
index 0000000..a4b25ba
--- /dev/null
+++ b/test/conf/cassandra-murmur.yaml
@@ -0,0 +1,45 @@
+#
+# Warning!
+# Consider the effects on 'o.a.c.i.s.LegacySSTableTest' before changing schemas in this file.
+#
+cluster_name: Test Cluster
+memtable_allocation_type: heap_buffers
+commitlog_sync: batch
+commitlog_sync_batch_window_in_ms: 1.0
+commitlog_segment_size_in_mb: 5
+commitlog_directory: build/test/cassandra/commitlog
+cdc_raw_directory: build/test/cassandra/cdc_raw
+cdc_enabled: false
+hints_directory: build/test/cassandra/hints
+partitioner: org.apache.cassandra.dht.Murmur3Partitioner
+listen_address: 127.0.0.1
+storage_port: 7010
+rpc_port: 9170
+start_native_transport: true
+native_transport_port: 9042
+column_index_size_in_kb: 4
+saved_caches_directory: build/test/cassandra/saved_caches
+data_file_directories:
+    - build/test/cassandra/data
+disk_access_mode: mmap
+seed_provider:
+    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
+      parameters:
+          - seeds: "127.0.0.1"
+endpoint_snitch: org.apache.cassandra.locator.SimpleSnitch
+dynamic_snitch: true
+request_scheduler: org.apache.cassandra.scheduler.RoundRobinScheduler
+request_scheduler_id: keyspace
+server_encryption_options:
+    internode_encryption: none
+    keystore: conf/.keystore
+    keystore_password: cassandra
+    truststore: conf/.truststore
+    truststore_password: cassandra
+incremental_backups: true
+concurrent_compactors: 4
+compaction_throughput_mb_per_sec: 0
+row_cache_class_name: org.apache.cassandra.cache.OHCProvider
+row_cache_size_in_mb: 16
+enable_user_defined_functions: true
+enable_scripted_user_defined_functions: true
diff --git a/test/conf/cassandra.keystore b/test/conf/cassandra.keystore
new file mode 100644
index 0000000..9a704ca
--- /dev/null
+++ b/test/conf/cassandra.keystore
Binary files differ
diff --git a/test/conf/cassandra.yaml b/test/conf/cassandra.yaml
index 1dba284..cf02634 100644
--- a/test/conf/cassandra.yaml
+++ b/test/conf/cassandra.yaml
@@ -3,11 +3,14 @@
 # Consider the effects on 'o.a.c.i.s.LegacySSTableTest' before changing schemas in this file.
 #
 cluster_name: Test Cluster
-memtable_allocation_type: heap_buffers
+# memtable_allocation_type: heap_buffers
+memtable_allocation_type: offheap_objects
 commitlog_sync: batch
 commitlog_sync_batch_window_in_ms: 1.0
 commitlog_segment_size_in_mb: 5
 commitlog_directory: build/test/cassandra/commitlog
+cdc_raw_directory: build/test/cassandra/cdc_raw
+cdc_enabled: false
 hints_directory: build/test/cassandra/hints
 partitioner: org.apache.cassandra.dht.ByteOrderedPartitioner
 listen_address: 127.0.0.1
diff --git a/test/conf/cassandra_encryption.yaml b/test/conf/cassandra_encryption.yaml
new file mode 100644
index 0000000..47e1312
--- /dev/null
+++ b/test/conf/cassandra_encryption.yaml
@@ -0,0 +1,14 @@
+transparent_data_encryption_options:
+    enabled: true
+    chunk_length_kb: 2
+    cipher: AES/CBC/PKCS5Padding
+    key_alias: testing:1
+    # CBC requires iv length to be 16 bytes
+    # iv_length: 16
+    key_provider: 
+      - class_name: org.apache.cassandra.security.JKSKeyProvider
+        parameters: 
+          - keystore: test/conf/cassandra.keystore
+            keystore_password: cassandra
+            store_type: JCEKS
+            key_password: cassandra
diff --git a/test/conf/cdc.yaml b/test/conf/cdc.yaml
new file mode 100644
index 0000000..f79930a
--- /dev/null
+++ b/test/conf/cdc.yaml
@@ -0,0 +1 @@
+cdc_enabled: true
diff --git a/test/conf/logback-test.xml b/test/conf/logback-test.xml
index abedc32..f6a5492 100644
--- a/test/conf/logback-test.xml
+++ b/test/conf/logback-test.xml
@@ -42,13 +42,13 @@
       <immediateFlush>false</immediateFlush>
     </encoder>
   </appender>
-  
+
   <appender name="STDOUT" target="System.out" class="org.apache.cassandra.ConsoleAppender">
     <encoder>
       <pattern>%-5level %date{HH:mm:ss,SSS} %msg%n</pattern>
     </encoder>
     <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
-      <level>INFO</level>
+      <level>DEBUG</level>
     </filter>
   </appender>
 
@@ -59,6 +59,8 @@
 
   <logger name="org.apache.hadoop" level="WARN"/>
 
+  <logger name="org.apache.cassandra.db.monitoring" level="DEBUG"/>
+
   <!-- Do not change the name of this appender. LogbackStatusListener uses the thread name
        tied to the appender name to know when to write to real stdout/stderr vs forwarding to logback -->
   <appender name="ASYNC" class="ch.qos.logback.classic.AsyncAppender">
diff --git a/test/data/bloom-filter/ka/foo.cql b/test/data/bloom-filter/ka/foo.cql
index c4aed6a..4926e3a 100644
--- a/test/data/bloom-filter/ka/foo.cql
+++ b/test/data/bloom-filter/ka/foo.cql
@@ -59,6 +59,6 @@
 Estimated droppable tombstones: 0.0
 SSTable Level: 0
 Repaired at: 0
-ReplayPosition(segmentId=1428529465658, position=6481)
+CommitLogPosition(segmentId=1428529465658, position=6481)
 Estimated tombstone drop times:%n
 
diff --git a/test/data/legacy-commitlog/3.4-encrypted/CommitLog-6-1452918948163.log b/test/data/legacy-commitlog/3.4-encrypted/CommitLog-6-1452918948163.log
new file mode 100644
index 0000000..3be1fcf
--- /dev/null
+++ b/test/data/legacy-commitlog/3.4-encrypted/CommitLog-6-1452918948163.log
Binary files differ
diff --git a/test/data/legacy-commitlog/3.4-encrypted/hash.txt b/test/data/legacy-commitlog/3.4-encrypted/hash.txt
new file mode 100644
index 0000000..d4cca55
--- /dev/null
+++ b/test/data/legacy-commitlog/3.4-encrypted/hash.txt
@@ -0,0 +1,5 @@
+#CommitLog upgrade test, version 3.4-SNAPSHOT
+#Fri Jan 15 20:35:53 PST 2016
+cells=8777
+hash=-542543236
+cfid=9debf690-bc0a-11e5-9ac3-9fafc76bc377
diff --git a/test/long/org/apache/cassandra/cql3/CachingBench.java b/test/long/org/apache/cassandra/cql3/CachingBench.java
new file mode 100644
index 0000000..370b3ff
--- /dev/null
+++ b/test/long/org/apache/cassandra/cql3/CachingBench.java
@@ -0,0 +1,375 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.cql3;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.List;
+import java.util.Random;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.Future;
+import java.util.concurrent.atomic.AtomicLong;
+import java.util.function.Predicate;
+
+import com.google.common.collect.Iterables;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import junit.framework.Assert;
+import org.apache.cassandra.config.Config.CommitLogSync;
+import org.apache.cassandra.config.Config.DiskAccessMode;
+import org.apache.cassandra.cache.ChunkCache;
+import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.db.ColumnFamilyStore;
+import org.apache.cassandra.db.compaction.CompactionManager;
+import org.apache.cassandra.db.rows.Row;
+import org.apache.cassandra.db.rows.Unfiltered;
+import org.apache.cassandra.db.rows.UnfilteredRowIterator;
+import org.apache.cassandra.io.sstable.ISSTableScanner;
+import org.apache.cassandra.io.sstable.format.SSTableReader;
+import org.apache.cassandra.io.util.FileUtils;
+import org.apache.cassandra.utils.FBUtilities;
+
+public class CachingBench extends CQLTester
+{
+    private static final String STRATEGY = "LeveledCompactionStrategy";
+
+    private static final int DEL_SECTIONS = 1000;
+    private static final int FLUSH_FREQ = 10000;
+    private static final int SCAN_FREQUENCY_INV = 12000;
+    static final int COUNT = 29000;
+    static final int ITERS = 9;
+
+    static final int KEY_RANGE = 30;
+    static final int CLUSTERING_RANGE = 210000;
+
+    static final int EXTRA_SIZE = 1025;
+    static final boolean CONCURRENT_COMPACTIONS = true;
+
+    // The name of this method is important!
+    // CommitLog settings must be applied before CQLTester sets up; by using the same name as its @BeforeClass method we
+    // are effectively overriding it.
+    @BeforeClass
+    public static void setUpClass()
+    {
+        DatabaseDescriptor.setCommitLogSync(CommitLogSync.periodic);
+        DatabaseDescriptor.setCommitLogSyncPeriod(100);
+        CQLTester.setUpClass();
+    }
+    
+    String hashQuery;
+
+    @Before
+    public void before() throws Throwable
+    {
+        createTable("CREATE TABLE %s(" +
+                    "  key int," +
+                    "  column int," +
+                    "  data int," +
+                    "  extra text," +
+                    "  PRIMARY KEY(key, column)" +
+                    ")"
+                   );
+
+        String hashIFunc = parseFunctionName(createFunction(KEYSPACE, "int, int",
+                " CREATE FUNCTION %s (state int, val int)" +
+                " CALLED ON NULL INPUT" +
+                " RETURNS int" +
+                " LANGUAGE java" +
+                " AS 'return val != null ? state * 17 + val : state;'")).name;
+        String hashTFunc = parseFunctionName(createFunction(KEYSPACE, "int, text",
+                " CREATE FUNCTION %s (state int, val text)" +
+                " CALLED ON NULL INPUT" +
+                " RETURNS int" +
+                " LANGUAGE java" +
+                " AS 'return val != null ? state * 17 + val.hashCode() : state;'")).name;
+
+        String hashInt = createAggregate(KEYSPACE, "int",
+                " CREATE AGGREGATE %s (int)" +
+                " SFUNC " + hashIFunc +
+                " STYPE int" +
+                " INITCOND 1");
+        String hashText = createAggregate(KEYSPACE, "text",
+                " CREATE AGGREGATE %s (text)" +
+                " SFUNC " + hashTFunc +
+                " STYPE int" +
+                " INITCOND 1");
+
+        hashQuery = String.format("SELECT count(column), %s(key), %s(column), %s(data), %s(extra), avg(key), avg(column), avg(data) FROM %%s",
+                                  hashInt, hashInt, hashInt, hashText);
+    }
+    AtomicLong id = new AtomicLong();
+    long compactionTimeNanos = 0;
+
+    void pushData(Random rand, int count) throws Throwable
+    {
+        for (int i = 0; i < count; ++i)
+        {
+            long ii = id.incrementAndGet();
+            if (ii % 1000 == 0)
+                System.out.print('.');
+            int key = rand.nextInt(KEY_RANGE);
+            int column = rand.nextInt(CLUSTERING_RANGE);
+            execute("INSERT INTO %s (key, column, data, extra) VALUES (?, ?, ?, ?)", key, column, (int) ii, genExtra(rand));
+            maybeCompact(ii);
+        }
+    }
+
+    private String genExtra(Random rand)
+    {
+        StringBuilder builder = new StringBuilder(EXTRA_SIZE);
+        for (int i = 0; i < EXTRA_SIZE; ++i)
+            builder.append((char) ('a' + rand.nextInt('z' - 'a' + 1)));
+        return builder.toString();
+    }
+
+    void readAndDelete(Random rand, int count) throws Throwable
+    {
+        for (int i = 0; i < count; ++i)
+        {
+            int key;
+            UntypedResultSet res;
+            long ii = id.incrementAndGet();
+            if (ii % 1000 == 0)
+                System.out.print('-');
+            if (rand.nextInt(SCAN_FREQUENCY_INV) != 1)
+            {
+                do
+                {
+                    key = rand.nextInt(KEY_RANGE);
+                    long cid = rand.nextInt(DEL_SECTIONS);
+                    int cstart = (int) (cid * CLUSTERING_RANGE / DEL_SECTIONS);
+                    int cend = (int) ((cid + 1) * CLUSTERING_RANGE / DEL_SECTIONS);
+                    res = execute("SELECT column FROM %s WHERE key = ? AND column >= ? AND column < ? LIMIT 1", key, cstart, cend);
+                } while (res.size() == 0);
+                UntypedResultSet.Row r = Iterables.get(res, rand.nextInt(res.size()));
+                int clustering = r.getInt("column");
+                execute("DELETE FROM %s WHERE key = ? AND column = ?", key, clustering);
+            }
+            else
+            {
+                execute(hashQuery);
+            }
+            maybeCompact(ii);
+        }
+    }
+
+    private void maybeCompact(long ii)
+    {
+        if (ii % FLUSH_FREQ == 0)
+        {
+            System.out.print("F");
+            flush();
+            if (ii % (FLUSH_FREQ * 10) == 0)
+            {
+                System.out.println("C");
+                long startTime = System.nanoTime();
+                getCurrentColumnFamilyStore().enableAutoCompaction(!CONCURRENT_COMPACTIONS);
+                long endTime = System.nanoTime();
+                compactionTimeNanos += endTime - startTime;
+                getCurrentColumnFamilyStore().disableAutoCompaction();
+            }
+        }
+    }
+
+    public void testSetup(String compactionClass, String compressorClass, DiskAccessMode mode, boolean cacheEnabled) throws Throwable
+    {
+        id.set(0);
+        compactionTimeNanos = 0;
+        ChunkCache.instance.enable(cacheEnabled);
+        DatabaseDescriptor.setDiskAccessMode(mode);
+        alterTable("ALTER TABLE %s WITH compaction = { 'class' :  '" + compactionClass + "'  };");
+        alterTable("ALTER TABLE %s WITH compression = { 'sstable_compression' : '" + compressorClass + "'  };");
+        ColumnFamilyStore cfs = getCurrentColumnFamilyStore();
+        cfs.disableAutoCompaction();
+
+        long onStartTime = System.currentTimeMillis();
+        ExecutorService es = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
+        List<Future<?>> tasks = new ArrayList<>();
+        for (int ti = 0; ti < 1; ++ti)
+        {
+            Random rand = new Random(ti);
+            tasks.add(es.submit(() -> 
+            {
+                for (int i = 0; i < ITERS; ++i)
+                    try
+                    {
+                        pushData(rand, COUNT);
+                        readAndDelete(rand, COUNT / 3);
+                    }
+                    catch (Throwable e)
+                    {
+                        throw new AssertionError(e);
+                    }
+            }));
+        }
+        for (Future<?> task : tasks)
+            task.get();
+
+        flush();
+        long onEndTime = System.currentTimeMillis();
+        int startRowCount = countRows(cfs);
+        int startTombCount = countTombstoneMarkers(cfs);
+        int startRowDeletions = countRowDeletions(cfs);
+        int startTableCount = cfs.getLiveSSTables().size();
+        long startSize = SSTableReader.getTotalBytes(cfs.getLiveSSTables());
+        System.out.println("\nCompession: " + cfs.getCompressionParameters().toString());
+        System.out.println("Reader " + cfs.getLiveSSTables().iterator().next().getFileDataInput(0).toString());
+        if (cacheEnabled)
+            System.out.format("Cache size %s requests %,d hit ratio %f\n",
+                FileUtils.stringifyFileSize(ChunkCache.instance.metrics.size.getValue()),
+                ChunkCache.instance.metrics.requests.getCount(),
+                ChunkCache.instance.metrics.hitRate.getValue());
+        else
+        {
+            Assert.assertTrue("Chunk cache had requests: " + ChunkCache.instance.metrics.requests.getCount(), ChunkCache.instance.metrics.requests.getCount() < COUNT);
+            System.out.println("Cache disabled");
+        }
+        System.out.println(String.format("Operations completed in %.3fs", (onEndTime - onStartTime) * 1e-3));
+        if (!CONCURRENT_COMPACTIONS)
+            System.out.println(String.format(", out of which %.3f for non-concurrent compaction", compactionTimeNanos * 1e-9));
+        else
+            System.out.println();
+
+        String hashesBefore = getHashes();
+        long startTime = System.currentTimeMillis();
+        CompactionManager.instance.performMaximal(cfs, true);
+        long endTime = System.currentTimeMillis();
+
+        int endRowCount = countRows(cfs);
+        int endTombCount = countTombstoneMarkers(cfs);
+        int endRowDeletions = countRowDeletions(cfs);
+        int endTableCount = cfs.getLiveSSTables().size();
+        long endSize = SSTableReader.getTotalBytes(cfs.getLiveSSTables());
+
+        System.out.println(String.format("Major compaction completed in %.3fs",
+                (endTime - startTime) * 1e-3));
+        System.out.println(String.format("At start: %,12d tables %12s %,12d rows %,12d deleted rows %,12d tombstone markers",
+                startTableCount, FileUtils.stringifyFileSize(startSize), startRowCount, startRowDeletions, startTombCount));
+        System.out.println(String.format("At end:   %,12d tables %12s %,12d rows %,12d deleted rows %,12d tombstone markers",
+                endTableCount, FileUtils.stringifyFileSize(endSize), endRowCount, endRowDeletions, endTombCount));
+        String hashesAfter = getHashes();
+
+        Assert.assertEquals(hashesBefore, hashesAfter);
+    }
+
+    private String getHashes() throws Throwable
+    {
+        long startTime = System.currentTimeMillis();
+        String hashes = Arrays.toString(getRows(execute(hashQuery))[0]);
+        long endTime = System.currentTimeMillis();
+        System.out.println(String.format("Hashes: %s, retrieved in %.3fs", hashes, (endTime - startTime) * 1e-3));
+        return hashes;
+    }
+
+    @Test
+    public void testWarmup() throws Throwable
+    {
+        testSetup(STRATEGY, "LZ4Compressor", DiskAccessMode.mmap, false);
+    }
+
+    @Test
+    public void testLZ4CachedMmap() throws Throwable
+    {
+        testSetup(STRATEGY, "LZ4Compressor", DiskAccessMode.mmap, true);
+    }
+
+    @Test
+    public void testLZ4CachedStandard() throws Throwable
+    {
+        testSetup(STRATEGY, "LZ4Compressor", DiskAccessMode.standard, true);
+    }
+
+    @Test
+    public void testLZ4UncachedMmap() throws Throwable
+    {
+        testSetup(STRATEGY, "LZ4Compressor", DiskAccessMode.mmap, false);
+    }
+
+    @Test
+    public void testLZ4UncachedStandard() throws Throwable
+    {
+        testSetup(STRATEGY, "LZ4Compressor", DiskAccessMode.standard, false);
+    }
+
+    @Test
+    public void testCachedStandard() throws Throwable
+    {
+        testSetup(STRATEGY, "", DiskAccessMode.standard, true);
+    }
+
+    @Test
+    public void testUncachedStandard() throws Throwable
+    {
+        testSetup(STRATEGY, "", DiskAccessMode.standard, false);
+    }
+
+    @Test
+    public void testMmapped() throws Throwable
+    {
+        testSetup(STRATEGY, "", DiskAccessMode.mmap, false /* doesn't matter */);
+    }
+
+    int countTombstoneMarkers(ColumnFamilyStore cfs)
+    {
+        return count(cfs, x -> x.isRangeTombstoneMarker());
+    }
+
+    int countRowDeletions(ColumnFamilyStore cfs)
+    {
+        return count(cfs, x -> x.isRow() && !((Row) x).deletion().isLive());
+    }
+
+    int countRows(ColumnFamilyStore cfs)
+    {
+        int nowInSec = FBUtilities.nowInSeconds();
+        return count(cfs, x -> x.isRow() && ((Row) x).hasLiveData(nowInSec));
+    }
+
+    private int count(ColumnFamilyStore cfs, Predicate<Unfiltered> predicate)
+    {
+        int count = 0;
+        for (SSTableReader reader : cfs.getLiveSSTables())
+            count += count(reader, predicate);
+        return count;
+    }
+
+    int count(SSTableReader reader, Predicate<Unfiltered> predicate)
+    {
+        int instances = 0;
+        try (ISSTableScanner partitions = reader.getScanner())
+        {
+            while (partitions.hasNext())
+            {
+                try (UnfilteredRowIterator iter = partitions.next())
+                {
+                    while (iter.hasNext())
+                    {
+                        Unfiltered atom = iter.next();
+                        if (predicate.test(atom))
+                            ++instances;
+                    }
+                }
+            }
+        }
+        return instances;
+    }
+}
diff --git a/test/long/org/apache/cassandra/db/commitlog/CommitLogStressTest.java b/test/long/org/apache/cassandra/db/commitlog/CommitLogStressTest.java
index d517055..239077e 100644
--- a/test/long/org/apache/cassandra/db/commitlog/CommitLogStressTest.java
+++ b/test/long/org/apache/cassandra/db/commitlog/CommitLogStressTest.java
@@ -21,44 +21,28 @@
  *
  */
 
-import java.io.File;
-import java.io.FileInputStream;
-import java.io.IOException;
+import java.io.*;
 import java.nio.ByteBuffer;
-import java.util.ArrayList;
-import java.util.Collection;
-import java.util.Iterator;
-import java.util.List;
-import java.util.Map;
-import java.util.Random;
-import java.util.concurrent.Executors;
-import java.util.concurrent.ScheduledExecutorService;
-import java.util.concurrent.ThreadLocalRandom;
-import java.util.concurrent.TimeUnit;
+import java.util.*;
+import java.util.concurrent.*;
+import java.util.concurrent.atomic.AtomicInteger;
 import java.util.concurrent.atomic.AtomicLong;
 
-import junit.framework.Assert;
-
 import com.google.common.util.concurrent.RateLimiter;
+import org.junit.*;
 
-import org.junit.Before;
-import org.junit.BeforeClass;
-import org.junit.Test;
-import org.apache.cassandra.SchemaLoader;
-import org.apache.cassandra.Util;
-import org.apache.cassandra.UpdateBuilder;
+import org.apache.cassandra.*;
 import org.apache.cassandra.config.Config.CommitLogSync;
-import org.apache.cassandra.config.DatabaseDescriptor;
-import org.apache.cassandra.config.ParameterizedClass;
-import org.apache.cassandra.config.Schema;
+import org.apache.cassandra.config.*;
 import org.apache.cassandra.db.Mutation;
-import org.apache.cassandra.db.rows.Cell;
-import org.apache.cassandra.db.rows.Row;
-import org.apache.cassandra.db.rows.SerializationHelper;
-import org.apache.cassandra.db.partitions.PartitionUpdate;
 import org.apache.cassandra.db.marshal.UTF8Type;
+import org.apache.cassandra.db.partitions.PartitionUpdate;
+import org.apache.cassandra.db.rows.*;
 import org.apache.cassandra.io.util.DataInputBuffer;
 import org.apache.cassandra.io.util.DataInputPlus;
+import org.apache.cassandra.security.EncryptionContext;
+import org.apache.cassandra.security.EncryptionContextGenerator;
+
 
 public class CommitLogStressTest
 {
@@ -129,7 +113,7 @@
     volatile boolean stop = false;
     boolean randomSize = false;
     boolean discardedRun = false;
-    ReplayPosition discardedPos;
+    CommitLogPosition discardedPos;
 
     @BeforeClass
     static public void initialize() throws IOException
@@ -146,10 +130,11 @@
 
         SchemaLoader.loadSchema();
         SchemaLoader.schemaDefinition(""); // leave def. blank to maintain old behaviour
+        CommitLog.instance.stopUnsafe(true);
     }
 
     @Before
-    public void cleanDir()
+    public void cleanDir() throws IOException
     {
         File dir = new File(location);
         if (dir.isDirectory())
@@ -186,8 +171,8 @@
     @Test
     public void testDiscardedRun() throws Exception
     {
-        discardedRun = true;
         randomSize = true;
+        discardedRun = true;
 
         testAllLogConfigs();
     }
@@ -198,37 +183,57 @@
         DatabaseDescriptor.setCommitLogSyncBatchWindow(1);
         DatabaseDescriptor.setCommitLogSyncPeriod(30);
         DatabaseDescriptor.setCommitLogSegmentSize(32);
-        for (ParameterizedClass compressor : new ParameterizedClass[] {
-                null,
-                new ParameterizedClass("LZ4Compressor", null),
-                new ParameterizedClass("SnappyCompressor", null),
-                new ParameterizedClass("DeflateCompressor", null) })
+
+        // test plain vanilla commit logs (the choice of 98% of users)
+        testLog(null, EncryptionContextGenerator.createDisabledContext());
+
+        // test the compression types
+        testLog(new ParameterizedClass("LZ4Compressor", null), EncryptionContextGenerator.createDisabledContext());
+        testLog(new ParameterizedClass("SnappyCompressor", null), EncryptionContextGenerator.createDisabledContext());
+        testLog(new ParameterizedClass("DeflateCompressor", null), EncryptionContextGenerator.createDisabledContext());
+
+        // test the encrypted commit log
+        testLog(null, EncryptionContextGenerator.createContext(true));
+    }
+
+    public void testLog(ParameterizedClass compression, EncryptionContext encryptionContext) throws IOException, InterruptedException
+    {
+        DatabaseDescriptor.setCommitLogCompression(compression);
+        DatabaseDescriptor.setEncryptionContext(encryptionContext);
+
+        String originalDir = DatabaseDescriptor.getCommitLogLocation();
+        try
         {
-            DatabaseDescriptor.setCommitLogCompression(compressor);
+            DatabaseDescriptor.setCommitLogLocation(location);
             for (CommitLogSync sync : CommitLogSync.values())
             {
                 DatabaseDescriptor.setCommitLogSync(sync);
-                CommitLog commitLog = new CommitLog(location, CommitLogArchiver.disabled()).start();
+                CommitLog commitLog = new CommitLog(CommitLogArchiver.disabled());
+                commitLog.segmentManager.enableReserveSegmentCreation();
+                commitLog.start();
                 testLog(commitLog);
+                assert !failed;
             }
         }
-        assert !failed;
+        finally
+        {
+            DatabaseDescriptor.setCommitLogLocation(originalDir);
+        }
     }
 
-    public void testLog(CommitLog commitLog) throws IOException, InterruptedException
-    {
-        System.out.format("\nTesting commit log size %.0fmb, compressor %s, sync %s%s%s\n",
-                          mb(DatabaseDescriptor.getCommitLogSegmentSize()),
-                          commitLog.configuration.getCompressorName(),
-                          commitLog.executor.getClass().getSimpleName(),
-                          randomSize ? " random size" : "",
-                          discardedRun ? " with discarded run" : "");
-        commitLog.allocator.enableReserveSegmentCreation();
+    public void testLog(CommitLog commitLog) throws IOException, InterruptedException {
+        System.out.format("\nTesting commit log size %.0fmb, compressor: %s, encryption enabled: %b, sync %s%s%s\n",
+                           mb(DatabaseDescriptor.getCommitLogSegmentSize()),
+                           commitLog.configuration.getCompressorName(),
+                           commitLog.configuration.useEncryption(),
+                           commitLog.executor.getClass().getSimpleName(),
+                           randomSize ? " random size" : "",
+                           discardedRun ? " with discarded run" : "");
 
-        final List<CommitlogExecutor> threads = new ArrayList<>();
+        final List<CommitlogThread> threads = new ArrayList<>();
         ScheduledExecutorService scheduled = startThreads(commitLog, threads);
 
-        discardedPos = ReplayPosition.NONE;
+        discardedPos = CommitLogPosition.NONE;
         if (discardedRun)
         {
             // Makes sure post-break data is not deleted, and that replayer correctly rejects earlier mutations.
@@ -237,17 +242,17 @@
             scheduled.shutdown();
             scheduled.awaitTermination(2, TimeUnit.SECONDS);
 
-            for (CommitlogExecutor t : threads)
+            for (CommitlogThread t: threads)
             {
                 t.join();
-                if (t.rp.compareTo(discardedPos) > 0)
-                    discardedPos = t.rp;
+                if (t.clsp.compareTo(discardedPos) > 0)
+                    discardedPos = t.clsp;
             }
             verifySizes(commitLog);
 
-            commitLog.discardCompletedSegments(Schema.instance.getCFMetaData("Keyspace1", "Standard1").cfId,
-                                               discardedPos);
+            commitLog.discardCompletedSegments(Schema.instance.getCFMetaData("Keyspace1", "Standard1").cfId, discardedPos);
             threads.clear();
+
             System.out.format("Discarded at %s\n", discardedPos);
             verifySizes(commitLog);
 
@@ -261,7 +266,7 @@
 
         int hash = 0;
         int cells = 0;
-        for (CommitlogExecutor t : threads)
+        for (CommitlogThread t: threads)
         {
             t.join();
             hash += t.hash;
@@ -271,25 +276,30 @@
 
         commitLog.shutdownBlocking();
 
-        System.out.print("Stopped. Replaying... ");
+        System.out.println("Stopped. Replaying... ");
         System.out.flush();
-        Replayer repl = new Replayer(commitLog);
+        Reader reader = new Reader();
         File[] files = new File(location).listFiles();
-        repl.recover(files);
+
+        DummyHandler handler = new DummyHandler();
+        reader.readAllFiles(handler, files);
 
         for (File f : files)
             if (!f.delete())
                 Assert.fail("Failed to delete " + f);
 
-        if (hash == repl.hash && cells == repl.cells)
-            System.out.println("Test success.");
+        if (hash == reader.hash && cells == reader.cells)
+            System.out.format("Test success. compressor = %s, encryption enabled = %b; discarded = %d, skipped = %d\n",
+                              commitLog.configuration.getCompressorName(),
+                              commitLog.configuration.useEncryption(),
+                              reader.discarded, reader.skipped);
         else
         {
-            System.out.format("Test failed. Cells %d expected %d, hash %d expected %d.\n",
-                              repl.cells,
-                              cells,
-                              repl.hash,
-                              hash);
+            System.out.format("Test failed (compressor = %s, encryption enabled = %b). Cells %d, expected %d, diff %d; discarded = %d, skipped = %d -  hash %d expected %d.\n",
+                              commitLog.configuration.getCompressorName(),
+                              commitLog.configuration.useEncryption(),
+                              reader.cells, cells, cells - reader.cells, reader.discarded, reader.skipped,
+                              reader.hash, hash);
             failed = true;
         }
     }
@@ -302,17 +312,18 @@
         // (which shouldn't write anything) to make sure the first we triggered completes.
         // FIXME: The executor should give us a chance to await completion of the sync we requested.
         commitLog.executor.requestExtraSync().awaitUninterruptibly();
+
         // Wait for any pending deletes or segment allocations to complete.
-        commitLog.allocator.awaitManagementTasksCompletion();
+        commitLog.segmentManager.awaitManagementTasksCompletion();
 
         long combinedSize = 0;
-        for (File f : new File(commitLog.location).listFiles())
+        for (File f : new File(DatabaseDescriptor.getCommitLogLocation()).listFiles())
             combinedSize += f.length();
         Assert.assertEquals(combinedSize, commitLog.getActiveOnDiskSize());
 
         List<String> logFileNames = commitLog.getActiveSegmentNames();
         Map<String, Double> ratios = commitLog.getActiveSegmentCompressionRatios();
-        Collection<CommitLogSegment> segments = commitLog.allocator.getActiveSegments();
+        Collection<CommitLogSegment> segments = commitLog.segmentManager.getActiveSegments();
 
         for (CommitLogSegment segment : segments)
         {
@@ -326,12 +337,11 @@
         Assert.assertTrue(ratios.isEmpty());
     }
 
-    public ScheduledExecutorService startThreads(final CommitLog commitLog, final List<CommitlogExecutor> threads)
+    public ScheduledExecutorService startThreads(final CommitLog commitLog, final List<CommitlogThread> threads)
     {
         stop = false;
-        for (int ii = 0; ii < NUM_THREADS; ii++)
-        {
-            final CommitlogExecutor t = new CommitlogExecutor(commitLog, new Random(ii));
+        for (int ii = 0; ii < NUM_THREADS; ii++) {
+            final CommitlogThread t = new CommitlogThread(commitLog, new Random(ii));
             threads.add(t);
             t.start();
         }
@@ -349,10 +359,10 @@
                 long freeMemory = runtime.freeMemory();
                 long temp = 0;
                 long sz = 0;
-                for (CommitlogExecutor cle : threads)
+                for (CommitlogThread clt : threads)
                 {
-                    temp += cle.counter.get();
-                    sz += cle.dataSize;
+                    temp += clt.counter.get();
+                    sz += clt.dataSize;
                 }
                 double time = (System.currentTimeMillis() - start) / 1000.0;
                 double avg = (temp / time);
@@ -397,18 +407,18 @@
         return slice;
     }
 
-    public class CommitlogExecutor extends Thread
-    {
+    public class CommitlogThread extends Thread {
         final AtomicLong counter = new AtomicLong();
         int hash = 0;
         int cells = 0;
         int dataSize = 0;
         final CommitLog commitLog;
         final Random random;
+        final AtomicInteger threadID = new AtomicInteger(0);
 
-        volatile ReplayPosition rp;
+        volatile CommitLogPosition clsp;
 
-        public CommitlogExecutor(CommitLog commitLog, Random rand)
+        public CommitlogThread(CommitLog commitLog, Random rand)
         {
             this.commitLog = commitLog;
             this.random = rand;
@@ -416,6 +426,7 @@
 
         public void run()
         {
+            Thread.currentThread().setName("CommitLogThread-" + threadID.getAndIncrement());
             RateLimiter rl = rateLimit != 0 ? RateLimiter.create(rateLimit) : null;
             final Random rand = random != null ? random : ThreadLocalRandom.current();
             while (!stop)
@@ -435,33 +446,39 @@
                     dataSize += sz;
                 }
 
-                rp = commitLog.add(new Mutation(builder.build()));
+                clsp = commitLog.add(new Mutation(builder.build()));
                 counter.incrementAndGet();
             }
         }
     }
 
-    class Replayer extends CommitLogReplayer
+    class Reader extends CommitLogReader
     {
-        Replayer(CommitLog log)
-        {
-            super(log, discardedPos, null, ReplayFilter.create());
-        }
-
-        int hash = 0;
-        int cells = 0;
+        int hash;
+        int cells;
+        int discarded;
+        int skipped;
 
         @Override
-        void replayMutation(byte[] inputBuffer, int size, final int entryLocation, final CommitLogDescriptor desc)
+        protected void readMutation(CommitLogReadHandler handler,
+                                    byte[] inputBuffer,
+                                    int size,
+                                    CommitLogPosition minPosition,
+                                    final int entryLocation,
+                                    final CommitLogDescriptor desc) throws IOException
         {
-            if (desc.id < discardedPos.segment)
+            if (desc.id < discardedPos.segmentId)
             {
                 System.out.format("Mutation from discarded segment, segment %d pos %d\n", desc.id, entryLocation);
+                discarded++;
                 return;
             }
-            else if (desc.id == discardedPos.segment && entryLocation <= discardedPos.position)
+            else if (desc.id == discardedPos.segmentId && entryLocation <= discardedPos.position)
+            {
                 // Skip over this mutation.
+                skipped++;
                 return;
+            }
 
             DataInputPlus bufIn = new DataInputBuffer(inputBuffer, 0, size);
             Mutation mutation;
@@ -497,4 +514,13 @@
             }
         }
     }
+
+    class DummyHandler implements CommitLogReadHandler
+    {
+        public boolean shouldSkipSegmentOnError(CommitLogReadException exception) throws IOException { return false; }
+
+        public void handleUnrecoverableError(CommitLogReadException exception) throws IOException { }
+
+        public void handleMutation(Mutation m, int size, int entryLocation, CommitLogDescriptor desc) { }
+    }
 }
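
The stress test now drives replay through the CommitLogReader / CommitLogReadHandler split from the CDC work instead of subclassing CommitLogReplayer. A minimal sketch of that handler contract, using only the three callback signatures visible in DummyHandler above, might look like the following; the import paths, the nested CommitLogReadException type, and the countMutations helper are assumptions for illustration, not part of the patch:

import java.io.File;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicLong;

import org.apache.cassandra.db.Mutation;
import org.apache.cassandra.db.commitlog.CommitLogDescriptor;
import org.apache.cassandra.db.commitlog.CommitLogReadHandler;
import org.apache.cassandra.db.commitlog.CommitLogReader;

public class CountingCommitLogHandler implements CommitLogReadHandler
{
    final AtomicLong seen = new AtomicLong();

    public boolean shouldSkipSegmentOnError(CommitLogReadException exception) throws IOException
    {
        return false; // surface segment errors rather than skipping them silently
    }

    public void handleUnrecoverableError(CommitLogReadException exception) throws IOException
    {
        throw new IOException(exception); // wrap and rethrow; stops the read
    }

    public void handleMutation(Mutation m, int size, int entryLocation, CommitLogDescriptor desc)
    {
        seen.incrementAndGet(); // one callback per deserialized mutation
    }

    // Hypothetical helper, not in the patch: replay every segment in a directory and report the count.
    static long countMutations(File commitLogDir) throws IOException
    {
        CountingCommitLogHandler handler = new CountingCommitLogHandler();
        new CommitLogReader().readAllFiles(handler, commitLogDir.listFiles());
        return handler.seen.get();
    }
}
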
diff --git a/test/long/org/apache/cassandra/db/compaction/LongLeveledCompactionStrategyTest.java b/test/long/org/apache/cassandra/db/compaction/LongLeveledCompactionStrategyTest.java
index 79497aa..f8f5a7c 100644
--- a/test/long/org/apache/cassandra/db/compaction/LongLeveledCompactionStrategyTest.java
+++ b/test/long/org/apache/cassandra/db/compaction/LongLeveledCompactionStrategyTest.java
@@ -45,6 +45,7 @@
 {
     public static final String KEYSPACE1 = "LongLeveledCompactionStrategyTest";
     public static final String CF_STANDARDLVL = "StandardLeveled";
+    public static final String CF_STANDARDLVL2 = "StandardLeveled2";
 
     @BeforeClass
     public static void defineSchema() throws ConfigurationException
@@ -55,6 +56,8 @@
         SchemaLoader.createKeyspace(KEYSPACE1,
                                     KeyspaceParams.simple(1),
                                     SchemaLoader.standardCFMD(KEYSPACE1, CF_STANDARDLVL)
+                                                .compaction(CompactionParams.lcs(leveledOptions)),
+                                    SchemaLoader.standardCFMD(KEYSPACE1, CF_STANDARDLVL2)
                                                 .compaction(CompactionParams.lcs(leveledOptions)));
     }
 
@@ -66,8 +69,8 @@
         Keyspace keyspace = Keyspace.open(ksname);
         ColumnFamilyStore store = keyspace.getColumnFamilyStore(cfname);
         store.disableAutoCompaction();
-
-        LeveledCompactionStrategy lcs = (LeveledCompactionStrategy)store.getCompactionStrategyManager().getStrategies().get(1);
+        CompactionStrategyManager mgr = store.getCompactionStrategyManager();
+        LeveledCompactionStrategy lcs = (LeveledCompactionStrategy) mgr.getStrategies().get(1).get(0);
 
         ByteBuffer value = ByteBuffer.wrap(new byte[100 * 1024]); // 100 KB value, make it easy to have multiple files
 
@@ -135,7 +138,7 @@
 
                 if (level > 0)
                 {// overlap check for levels greater than 0
-                    Set<SSTableReader> overlaps = LeveledManifest.overlapping(sstable, sstables);
+                    Set<SSTableReader> overlaps = LeveledManifest.overlapping(sstable.first.getToken(), sstable.last.getToken(), sstables);
                     assert overlaps.size() == 1 && overlaps.contains(sstable);
                 }
             }
@@ -145,14 +148,32 @@
     @Test
     public void testLeveledScanner() throws Exception
     {
-        testParallelLeveledCompaction();
         Keyspace keyspace = Keyspace.open(KEYSPACE1);
-        ColumnFamilyStore store = keyspace.getColumnFamilyStore(CF_STANDARDLVL);
+        ColumnFamilyStore store = keyspace.getColumnFamilyStore(CF_STANDARDLVL2);
+        ByteBuffer value = ByteBuffer.wrap(new byte[100 * 1024]); // 100 KB value, make it easy to have multiple files
+
+        // Enough data to have a level 1 and 2
+        int rows = 128;
+        int columns = 10;
+
+        // Adds enough data to trigger multiple sstables per level
+        for (int r = 0; r < rows; r++)
+        {
+            DecoratedKey key = Util.dk(String.valueOf(r));
+            UpdateBuilder builder = UpdateBuilder.create(store.metadata, key);
+            for (int c = 0; c < columns; c++)
+                builder.newRow("column" + c).add("val", value);
+
+            Mutation rm = new Mutation(builder.build());
+            rm.apply();
+            store.forceBlockingFlush();
+        }
+        LeveledCompactionStrategyTest.waitForLeveling(store);
         store.disableAutoCompaction();
+        CompactionStrategyManager mgr = store.getCompactionStrategyManager();
+        LeveledCompactionStrategy lcs = (LeveledCompactionStrategy) mgr.getStrategies().get(1).get(0);
 
-        LeveledCompactionStrategy lcs = (LeveledCompactionStrategy)store.getCompactionStrategyManager().getStrategies().get(1);
-
-        ByteBuffer value = ByteBuffer.wrap(new byte[10 * 1024]); // 10 KB value
+        value = ByteBuffer.wrap(new byte[10 * 1024]); // 10 KB value
 
         // Adds 10 partitions
         for (int r = 0; r < 10; r++)
diff --git a/test/long/org/apache/cassandra/io/compress/CompressorPerformance.java b/test/long/org/apache/cassandra/io/compress/CompressorPerformance.java
index 17122f5..e703839 100644
--- a/test/long/org/apache/cassandra/io/compress/CompressorPerformance.java
+++ b/test/long/org/apache/cassandra/io/compress/CompressorPerformance.java
@@ -23,6 +23,7 @@
 import java.io.FileInputStream;
 import java.io.IOException;
 import java.nio.ByteBuffer;
+import java.util.Collections;
 import java.util.concurrent.ThreadLocalRandom;
 
 public class CompressorPerformance
@@ -33,7 +34,7 @@
         for (ICompressor compressor: new ICompressor[] {
                 SnappyCompressor.instance,  // warm up
                 DeflateCompressor.instance,
-                LZ4Compressor.instance,
+                LZ4Compressor.create(Collections.emptyMap()),
                 SnappyCompressor.instance
         })
         {
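
With the static LZ4Compressor.instance gone, the benchmark obtains its compressor from the create(Map) factory with an empty option map. A round-trip sketch of that usage, assuming ICompressor's ByteBuffer-based compress/uncompress overloads together with initialCompressedBufferLength and preferredBufferType, might look like this (illustrative only, not part of the patch):

import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.Collections;
import java.util.concurrent.ThreadLocalRandom;

import org.apache.cassandra.io.compress.ICompressor;
import org.apache.cassandra.io.compress.LZ4Compressor;

public class Lz4RoundTripSketch
{
    public static void main(String[] args) throws IOException
    {
        ICompressor lz4 = LZ4Compressor.create(Collections.emptyMap()); // default options, as in the benchmark

        byte[] raw = new byte[64 * 1024];
        ThreadLocalRandom.current().nextBytes(raw);

        ByteBuffer input = lz4.preferredBufferType().allocate(raw.length);
        input.put(raw);
        input.flip();

        // initialCompressedBufferLength() gives a safe upper bound for the compressed output
        ByteBuffer compressed = lz4.preferredBufferType().allocate(lz4.initialCompressedBufferLength(raw.length));
        lz4.compress(input, compressed);
        compressed.flip();

        ByteBuffer restored = lz4.preferredBufferType().allocate(raw.length);
        lz4.uncompress(compressed, restored);
        restored.flip();

        System.out.printf("in=%d bytes, compressed=%d bytes, restored=%d bytes%n",
                          raw.length, compressed.limit(), restored.limit());
    }
}
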
diff --git a/test/long/org/apache/cassandra/streaming/LongStreamingTest.java b/test/long/org/apache/cassandra/streaming/LongStreamingTest.java
new file mode 100644
index 0000000..7e53ba2
--- /dev/null
+++ b/test/long/org/apache/cassandra/streaming/LongStreamingTest.java
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.streaming;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.io.Files;
+import org.junit.AfterClass;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import org.apache.cassandra.SchemaLoader;
+import org.apache.cassandra.config.CFMetaData;
+import org.apache.cassandra.config.Config;
+import org.apache.cassandra.config.Schema;
+import org.apache.cassandra.cql3.QueryProcessor;
+import org.apache.cassandra.cql3.UntypedResultSet;
+import org.apache.cassandra.db.Keyspace;
+import org.apache.cassandra.dht.Range;
+import org.apache.cassandra.dht.Token;
+import org.apache.cassandra.exceptions.InvalidRequestException;
+import org.apache.cassandra.io.sstable.CQLSSTableWriter;
+import org.apache.cassandra.io.sstable.SSTableLoader;
+import org.apache.cassandra.service.StorageService;
+import org.apache.cassandra.utils.FBUtilities;
+import org.apache.cassandra.utils.OutputHandler;
+
+import static org.junit.Assert.assertEquals;
+
+public class LongStreamingTest
+{
+    @BeforeClass
+    public static void setup() throws Exception
+    {
+        SchemaLoader.cleanupAndLeaveDirs();
+        Keyspace.setInitialized();
+        StorageService.instance.initServer();
+
+        StorageService.instance.setCompactionThroughputMbPerSec(0);
+        StorageService.instance.setStreamThroughputMbPerSec(0);
+        StorageService.instance.setInterDCStreamThroughputMbPerSec(0);
+    }
+
+    @AfterClass
+    public static void tearDown()
+    {
+        Config.setClientMode(false);
+    }
+
+    @Test
+    public void testCompressedStream() throws InvalidRequestException, IOException, ExecutionException, InterruptedException
+    {
+        String KS = "cql_keyspace";
+        String TABLE = "table1";
+
+        File tempdir = Files.createTempDir();
+        File dataDir = new File(tempdir.getAbsolutePath() + File.separator + KS + File.separator + TABLE);
+        assert dataDir.mkdirs();
+
+        String schema = "CREATE TABLE cql_keyspace.table1 ("
+                        + "  k int PRIMARY KEY,"
+                        + "  v1 text,"
+                        + "  v2 int"
+                        + ");";// with compression = {};";
+        String insert = "INSERT INTO cql_keyspace.table1 (k, v1, v2) VALUES (?, ?, ?)";
+        CQLSSTableWriter writer = CQLSSTableWriter.builder()
+                                                  .sorted()
+                                                  .inDirectory(dataDir)
+                                                  .forTable(schema)
+                                                  .using(insert).build();
+        long start = System.nanoTime();
+
+        for (int i = 0; i < 10_000_000; i++)
+            writer.addRow(i, "test1", 24);
+
+        writer.close();
+        System.err.println(String.format("Writer finished after %d seconds....", TimeUnit.NANOSECONDS.toSeconds(System.nanoTime() - start)));
+
+        File[] dataFiles = dataDir.listFiles((dir, name) -> name.endsWith("-Data.db"));
+        long dataSize = 0l;
+        for (File file : dataFiles)
+        {
+            System.err.println("File : "+file.getAbsolutePath());
+            dataSize += file.length();
+        }
+
+        SSTableLoader loader = new SSTableLoader(dataDir, new SSTableLoader.Client()
+        {
+            private String ks;
+            public void init(String keyspace)
+            {
+                for (Range<Token> range : StorageService.instance.getLocalRanges("cql_keyspace"))
+                    addRangeForEndpoint(range, FBUtilities.getBroadcastAddress());
+
+                this.ks = keyspace;
+            }
+
+            public CFMetaData getTableMetadata(String cfName)
+            {
+                return Schema.instance.getCFMetaData(ks, cfName);
+            }
+        }, new OutputHandler.SystemOutput(false, false));
+
+        start = System.nanoTime();
+        loader.stream().get();
+
+        long millis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
+        System.err.println(String.format("Finished Streaming in %.2f seconds: %.2f Mb/sec",
+                                         millis/1000d,
+                                         (dataSize / (1 << 20) / (millis / 1000d)) * 8));
+
+
+        //Stream again
+        loader = new SSTableLoader(dataDir, new SSTableLoader.Client()
+        {
+            private String ks;
+            public void init(String keyspace)
+            {
+                for (Range<Token> range : StorageService.instance.getLocalRanges("cql_keyspace"))
+                    addRangeForEndpoint(range, FBUtilities.getBroadcastAddress());
+
+                this.ks = keyspace;
+            }
+
+            public CFMetaData getTableMetadata(String cfName)
+            {
+                return Schema.instance.getCFMetaData(ks, cfName);
+            }
+        }, new OutputHandler.SystemOutput(false, false));
+
+        start = System.nanoTime();
+        loader.stream().get();
+
+        millis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
+        System.err.println(String.format("Finished Streaming in %.2f seconds: %.2f Mb/sec",
+                                         millis/1000d,
+                                         (dataSize / (1 << 20) / (millis / 1000d)) * 8));
+
+
+        //Compact them both
+        start = System.nanoTime();
+        Keyspace.open(KS).getColumnFamilyStore(TABLE).forceMajorCompaction();
+        millis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
+
+        System.err.println(String.format("Finished Compacting in %.2f seconds: %.2f Mb/sec",
+                                         millis / 1000d,
+                                         (dataSize * 2 / (1 << 20) / (millis / 1000d)) * 8));
+
+        UntypedResultSet rs = QueryProcessor.executeInternal("SELECT * FROM cql_keyspace.table1 limit 100;");
+        assertEquals(100, rs.size());
+    }
+}
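
The new long test writes sstables offline with CQLSSTableWriter, streams them twice with SSTableLoader, then compacts. A trimmed-down sketch of just the offline-write half, reusing only the builder calls that appear in the test, might look like the following; the temp-directory layout, row count, and try-with-resources close are illustrative assumptions:

import java.io.File;
import java.io.IOException;

import com.google.common.io.Files;

import org.apache.cassandra.exceptions.InvalidRequestException;
import org.apache.cassandra.io.sstable.CQLSSTableWriter;

public class OfflineWriterSketch
{
    public static void main(String[] args) throws InvalidRequestException, IOException
    {
        // Mirror the test's <keyspace>/<table> directory layout under a temp dir
        File dataDir = new File(Files.createTempDir(), "cql_keyspace" + File.separator + "table1");
        if (!dataDir.mkdirs())
            throw new IOException("could not create " + dataDir);

        String schema = "CREATE TABLE cql_keyspace.table1 (k int PRIMARY KEY, v1 text, v2 int);";
        String insert = "INSERT INTO cql_keyspace.table1 (k, v1, v2) VALUES (?, ?, ?)";

        // sorted() appends rows directly instead of buffering and re-sorting them (the test above also uses it)
        try (CQLSSTableWriter writer = CQLSSTableWriter.builder()
                                                       .sorted()
                                                       .inDirectory(dataDir)
                                                       .forTable(schema)
                                                       .using(insert)
                                                       .build())
        {
            for (int i = 0; i < 1_000; i++)
                writer.addRow(i, "test1", 24);
        }
    }
}
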
diff --git a/test/microbench/org/apache/cassandra/test/microbench/DirectorySizerBench.java b/test/microbench/org/apache/cassandra/test/microbench/DirectorySizerBench.java
new file mode 100644
index 0000000..a653c81
--- /dev/null
+++ b/test/microbench/org/apache/cassandra/test/microbench/DirectorySizerBench.java
@@ -0,0 +1,105 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.test.microbench;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.PrintWriter;
+import java.nio.file.Files;
+import java.util.Arrays;
+import java.util.UUID;
+import java.util.concurrent.TimeUnit;
+
+import org.apache.cassandra.io.util.FileUtils;
+import org.apache.cassandra.utils.DirectorySizeCalculator;
+import org.openjdk.jmh.annotations.*;
+import org.openjdk.jmh.infra.Blackhole;
+
+@BenchmarkMode(Mode.AverageTime)
+@OutputTimeUnit(TimeUnit.MILLISECONDS)
+@Warmup(iterations = 1)
+@Measurement(iterations = 30)
+@Fork(value = 1,jvmArgsAppend = "-Xmx512M")
+@Threads(1)
+@State(Scope.Benchmark)
+public class DirectorySizerBench
+{
+    private File tempDir;
+    private DirectorySizeCalculator sizer;
+
+    @Setup(Level.Trial)
+    public void setUp() throws IOException
+    {
+        tempDir = Files.createTempDirectory(randString()).toFile();
+
+        // Since #'s on laptops and commodity desktops are so useful in considering enterprise virtualized server environments...
+
+        // Spinning disk 7200rpm 1TB, win10, ntfs, i6600 skylake, 256 files:
+        // [java] Result: 0.581 ±(99.9%) 0.003 ms/op [Average]
+        // [java]   Statistics: (min, avg, max) = (0.577, 0.581, 0.599), stdev = 0.005
+        // [java]   Confidence interval (99.9%): [0.577, 0.584]
+
+        // Same hardware, 25600 files:
+        // [java] Result: 56.990 ±(99.9%) 0.374 ms/op [Average]
+        // [java]   Statistics: (min, avg, max) = (56.631, 56.990, 59.829), stdev = 0.560
+        // [java]   Confidence interval (99.9%): [56.616, 57.364]
+
+        // #'s on a rmbp, 2014, SSD, ubuntu 15.10, ext4, i7-4850HQ @ 2.3, 25600 samples
+        // [java] Result: 74.714 ±(99.9%) 0.558 ms/op [Average]
+        // [java]   Statistics: (min, avg, max) = (73.687, 74.714, 76.872), stdev = 0.835
+        // [java]   Confidence interval (99.9%): [74.156, 75.272]
+
+        // Throttle CPU on the Windows box to .87GHZ from 4.3GHZ turbo single-core, and #'s for 25600:
+        // [java] Result: 298.628 ±(99.9%) 14.755 ms/op [Average]
+        // [java]   Statistics: (min, avg, max) = (291.245, 298.628, 412.881), stdev = 22.085
+        // [java]   Confidence interval (99.9%): [283.873, 313.383]
+
+        // Test w/25,600 files, 100x the load of a full default CommitLog (8192) divided by size (32 per)
+        populateRandomFiles(tempDir, 25600);
+        sizer = new DirectorySizeCalculator(tempDir);
+    }
+
+    @TearDown
+    public void tearDown()
+    {
+        FileUtils.deleteRecursive(tempDir);
+    }
+
+    private void populateRandomFiles(File dir, int count) throws IOException
+    {
+        for (int i = 0; i < count; i++)
+        {
+            PrintWriter pw = new PrintWriter(dir + File.separator + randString(), "UTF-8");
+            pw.write(randString());
+            pw.close();
+        }
+    }
+
+    private String randString()
+    {
+        return UUID.randomUUID().toString();
+    }
+
+    @Benchmark
+    public void countFiles(final Blackhole bh) throws IOException
+    {
+        sizer.rebuildFileList();
+        Files.walkFileTree(tempDir.toPath(), sizer);
+    }
+}
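
The benchmark measures how long DirectorySizeCalculator takes to walk a directory tree via Files.walkFileTree. The same walk outside JMH might look like the sketch below; the getAllocatedSize() accessor used to read the total back is an assumption about the calculator's API, and the timing output is illustrative:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

import org.apache.cassandra.utils.DirectorySizeCalculator;

public class DirectorySizeSketch
{
    public static void main(String[] args) throws IOException
    {
        Path dir = Paths.get(args.length > 0 ? args[0] : System.getProperty("java.io.tmpdir"));
        DirectorySizeCalculator sizer = new DirectorySizeCalculator(dir.toFile());

        long start = System.nanoTime();
        sizer.rebuildFileList();          // refresh the file snapshot, as the benchmark does each run
        Files.walkFileTree(dir, sizer);   // the calculator visits each file and accumulates its length
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;

        // getAllocatedSize() is an assumed accessor for the accumulated total
        System.out.printf("%s: %d bytes in %d ms%n", dir, sizer.getAllocatedSize(), elapsedMs);
    }
}
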
diff --git a/test/resources/auth/cassandra-test-jaas.conf b/test/resources/auth/cassandra-test-jaas.conf
new file mode 100644
index 0000000..ccb8b6a
--- /dev/null
+++ b/test/resources/auth/cassandra-test-jaas.conf
@@ -0,0 +1,4 @@
+// Delegates authentication to a stub login module, hardcoded to authenticate as a particular user - see JMXAuthTest
+TestLogin {
+  org.apache.cassandra.auth.jmx.JMXAuthTest$StubLoginModule REQUIRED role_name=test_role;
+};
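
The new JAAS file defines one login configuration entry, TestLogin, backed by a stub LoginModule so JMXAuthTest can authenticate a fixed role without a real credential store. Selecting such an entry through the standard JDK JAAS machinery (not the test's own wiring) might look like the sketch below, assuming the stub needs no callback handler:

import javax.security.auth.Subject;
import javax.security.auth.login.LoginContext;
import javax.security.auth.login.LoginException;

public class JaasLoginSketch
{
    public static void main(String[] args) throws LoginException
    {
        // Normally supplied as -Djava.security.auth.login.config=<path>; set here for illustration
        System.setProperty("java.security.auth.login.config", "test/resources/auth/cassandra-test-jaas.conf");

        // "TestLogin" matches the entry name in the config; the stub module needs no callbacks
        LoginContext ctx = new LoginContext("TestLogin");
        ctx.login();

        Subject subject = ctx.getSubject();
        System.out.println("Authenticated principals: " + subject.getPrincipals());

        ctx.logout();
    }
}
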
diff --git a/test/resources/tokenization/adventures_of_huckleberry_finn_mark_twain.txt b/test/resources/tokenization/adventures_of_huckleberry_finn_mark_twain.txt
new file mode 100644
index 0000000..27cadc3
--- /dev/null
+++ b/test/resources/tokenization/adventures_of_huckleberry_finn_mark_twain.txt
@@ -0,0 +1,12361 @@
+

+

+The Project Gutenberg EBook of Adventures of Huckleberry Finn, Complete

+by Mark Twain (Samuel Clemens)

+

+This eBook is for the use of anyone anywhere at no cost and with almost

+no restrictions whatsoever. You may copy it, give it away or re-use

+it under the terms of the Project Gutenberg License included with this

+eBook or online at www.gutenberg.net

+

+Title: Adventures of Huckleberry Finn, Complete

+

+Author: Mark Twain (Samuel Clemens)

+

+Release Date: August 20, 2006 [EBook #76]

+

+Last Updated: October 20, 2012]

+

+Language: English

+

+

+*** START OF THIS PROJECT GUTENBERG EBOOK HUCKLEBERRY FINN ***

+

+Produced by David Widger

+

+

+

+

+

+ADVENTURES

+

+OF

+

+HUCKLEBERRY FINN

+

+(Tom Sawyer's Comrade)

+

+By Mark Twain

+

+Complete

+

+

+

+

+CONTENTS.

+

+CHAPTER I. Civilizing Huck.—Miss Watson.—Tom Sawyer Waits.

+

+CHAPTER II. The Boys Escape Jim.—Torn Sawyer's Gang.—Deep-laid Plans.

+

+CHAPTER III. A Good Going-over.—Grace Triumphant.—"One of Tom Sawyers's

+Lies".

+

+CHAPTER IV. Huck and the Judge.—Superstition.

+

+CHAPTER V. Huck's Father.—The Fond Parent.—Reform.

+

+CHAPTER VI. He Went for Judge Thatcher.—Huck Decided to Leave.—Political

+Economy.—Thrashing Around.

+

+CHAPTER VII. Laying for Him.—Locked in the Cabin.—Sinking the

+Body.—Resting.

+

+CHAPTER VIII. Sleeping in the Woods.—Raising the Dead.—Exploring the

+Island.—Finding Jim.—Jim's Escape.—Signs.—Balum.

+

+CHAPTER IX. The Cave.—The Floating House.

+

+CHAPTER X. The Find.—Old Hank Bunker.—In Disguise.

+

+CHAPTER XI. Huck and the Woman.—The Search.—Prevarication.—Going to

+Goshen.

+

+CHAPTER XII. Slow Navigation.—Borrowing Things.—Boarding the Wreck.—The

+Plotters.—Hunting for the Boat.

+

+CHAPTER XIII. Escaping from the Wreck.—The Watchman.—Sinking.

+

+CHAPTER XIV. A General Good Time.—The Harem.—French.

+

+CHAPTER XV. Huck Loses the Raft.—In the Fog.—Huck Finds the Raft.—Trash.

+

+CHAPTER XVI. Expectation.—A White Lie.—Floating Currency.—Running by

+Cairo.—Swimming Ashore.

+

+CHAPTER XVII. An Evening Call.—The Farm in Arkansaw.—Interior

+Decorations.—Stephen Dowling Bots.—Poetical Effusions.

+

+CHAPTER XVIII. Col. Grangerford.—Aristocracy.—Feuds.—The

+Testament.—Recovering the Raft.—The Wood—pile.—Pork and Cabbage.

+

+CHAPTER XIX. Tying Up Day—times.—An Astronomical Theory.—Running a

+Temperance Revival.—The Duke of Bridgewater.—The Troubles of Royalty.

+

+CHAPTER XX. Huck Explains.—Laying Out a Campaign.—Working the

+Camp—meeting.—A Pirate at the Camp—meeting.—The Duke as a Printer.

+

+CHAPTER XXI. Sword Exercise.—Hamlet's Soliloquy.—They Loafed Around

+Town.—A Lazy Town.—Old Boggs.—Dead.

+

+CHAPTER XXII. Sherburn.—Attending the Circus.—Intoxication in the

+Ring.—The Thrilling Tragedy.

+

+CHAPTER XXIII. Sold.—Royal Comparisons.—Jim Gets Home-sick.

+

+CHAPTER XXIV. Jim in Royal Robes.—They Take a Passenger.—Getting

+Information.—Family Grief.

+

+CHAPTER XXV. Is It Them?—Singing the "Doxologer."—Awful Square—Funeral

+Orgies.—A Bad Investment .

+

+CHAPTER XXVI. A Pious King.—The King's Clergy.—She Asked His

+Pardon.—Hiding in the Room.—Huck Takes the Money.

+

+CHAPTER XXVII. The Funeral.—Satisfying Curiosity.—Suspicious of

+Huck,—Quick Sales and Small.

+

+CHAPTER XXVIII. The Trip to England.—"The Brute!"—Mary Jane Decides to

+Leave.—Huck Parting with Mary Jane.—Mumps.—The Opposition Line.

+

+CHAPTER XXIX. Contested Relationship.—The King Explains the Loss.—A

+Question of Handwriting.—Digging up the Corpse.—Huck Escapes.

+

+CHAPTER XXX. The King Went for Him.—A Royal Row.—Powerful Mellow.

+

+CHAPTER XXXI. Ominous Plans.—News from Jim.—Old Recollections.—A Sheep

+Story.—Valuable Information.

+

+CHAPTER XXXII. Still and Sunday—like.—Mistaken Identity.—Up a Stump.—In

+a Dilemma.

+

+CHAPTER XXXIII. A Nigger Stealer.—Southern Hospitality.—A Pretty Long

+Blessing.—Tar and Feathers.

+

+CHAPTER XXXIV. The Hut by the Ash Hopper.—Outrageous.—Climbing the

+Lightning Rod.—Troubled with Witches.

+

+CHAPTER XXXV. Escaping Properly.—Dark Schemes.—Discrimination in

+Stealing.—A Deep Hole.

+

+CHAPTER XXXVI. The Lightning Rod.—His Level Best.—A Bequest to

+Posterity.—A High Figure.

+

+CHAPTER XXXVII. The Last Shirt.—Mooning Around.—Sailing Orders.—The

+Witch Pie.

+

+CHAPTER XXXVIII. The Coat of Arms.—A Skilled Superintendent.—Unpleasant

+Glory.—A Tearful Subject.

+

+CHAPTER XXXIX. Rats.—Lively Bed—fellows.—The Straw Dummy.

+

+CHAPTER XL. Fishing.—The Vigilance Committee.—A Lively Run.—Jim Advises

+a Doctor.

+

+CHAPTER XLI. The Doctor.—Uncle Silas.—Sister Hotchkiss.—Aunt Sally in

+Trouble.

+

+CHAPTER XLII. Tom Sawyer Wounded.—The Doctor's Story.—Tom

+Confesses.—Aunt Polly Arrives.—Hand Out Them Letters    .

+

+CHAPTER THE LAST. Out of Bondage.—Paying the Captive.—Yours Truly, Huck

+Finn.

+

+

+

+

+ILLUSTRATIONS.

+

+The Widows

+

+Moses and the "Bulrushers"

+

+Miss Watson

+

+Huck Stealing Away

+

+They Tip-toed Along

+

+Jim

+

+Tom Sawyer's Band of Robbers  

+

+Huck Creeps into his Window

+

+Miss Watson's Lecture

+

+The Robbers Dispersed

+

+Rubbing the Lamp

+

+! ! ! !

+

+Judge Thatcher surprised

+

+Jim Listening

+

+"Pap"

+

+Huck and his Father

+

+Reforming the Drunkard

+

+Falling from Grace

+

+The Widows

+

+Moses and the "Bulrushers"

+

+Miss Watson

+

+Huck Stealing Away

+

+They Tip-toed Along

+

+Jim

+

+Tom Sawyer's Band of Robbers  

+

+Huck Creeps into his Window

+

+Miss Watson's Lecture

+

+The Robbers Dispersed

+

+Rubbing the Lamp

+

+! ! ! !

+

+Judge Thatcher surprised

+

+Jim Listening

+

+"Pap"

+

+Huck and his Father

+

+Reforming the Drunkard

+

+Falling from Grace

+

+Getting out of the Way

+

+Solid Comfort

+

+Thinking it Over

+

+Raising a Howl

+

+"Git Up"

+

+The Shanty

+

+Shooting the Pig

+

+Taking a Rest

+

+In the Woods

+

+Watching the Boat

+

+Discovering the Camp Fire

+

+Jim and the Ghost

+

+Misto Bradish's Nigger

+

+Exploring the Cave

+

+In the Cave

+

+Jim sees a Dead Man

+

+They Found Eight Dollars

+

+Jim and the Snake

+

+Old Hank Bunker

+

+"A Fair Fit"

+

+"Come In"

+

+"Him and another Man"

+

+She puts up a Snack

+

+"Hump Yourself"

+

+On the Raft

+

+He sometimes Lifted a Chicken

+

+"Please don't, Bill"

+

+"It ain't Good Morals"

+

+"Oh! Lordy, Lordy!"

+

+In a Fix

+

+"Hello, What's Up?"

+

+The Wreck

+

+We turned in and Slept

+

+Turning over the Truck

+

+Solomon and his Million Wives

+

+The story of "Sollermun"

+

+"We Would Sell the Raft"

+

+Among the Snags

+

+Asleep on the Raft

+

+"Something being Raftsman"

+

+"Boy, that's a Lie"

+

+"Here I is, Huck"

+

+Climbing up the Bank

+

+"Who's There?"

+

+"Buck"

+

+"It made Her look Spidery"

+

+"They got him out and emptied Him"  

+

+The House

+

+Col. Grangerford

+

+Young Harney Shepherdson

+

+Miss Charlotte

+

+"And asked me if I Liked Her"

+

+"Behind the Wood-pile"

+

+Hiding Day-times

+

+"And Dogs a-Coming"

+

+"By rights I am a Duke!"

+

+"I am the Late Dauphin"

+

+Tail Piece

+

+On the Raft

+

+The King as Juliet

+

+"Courting on the Sly"

+

+"A Pirate for Thirty Years"

+

+Another little Job

+

+Practizing

+

+Hamlet's Soliloquy

+

+"Gimme a Chaw"

+

+A Little Monthly Drunk

+

+The Death of Boggs

+

+Sherburn steps out

+

+A Dead Head

+

+He shed Seventeen Suits

+

+Tragedy

+

+Their Pockets Bulged

+

+Henry the Eighth in Boston Harbor

+

+Harmless

+

+Adolphus

+

+He fairly emptied that Young Fellow

+

+"Alas, our Poor Brother"

+

+"You Bet it is"

+

+Leaking

+

+Making up the "Deffisit"

+

+Going for him

+

+The Doctor

+

+The Bag of Money

+

+The Cubby

+

+Supper with the Hare-Lip

+

+Honest Injun

+

+The Duke looks under the Bed

+

+Huck takes the Money

+

+A Crack in the Dining-room Door

+

+The Undertaker

+

+"He had a Rat!"

+

+"Was you in my Room?"

+

+Jawing

+

+In Trouble

+

+Indignation

+

+How to Find Them

+

+He Wrote

+

+Hannah with the Mumps

+

+The Auction

+

+The True Brothers

+

+The Doctor leads Huck

+

+The Duke Wrote

+

+"Gentlemen, Gentlemen!"

+

+"Jim Lit Out"

+

+The King shakes Huck

+

+The Duke went for Him

+

+Spanish Moss

+

+"Who Nailed Him?"

+

+Thinking

+

+He gave him Ten Cents

+

+Striking for the Back Country

+

+Still and Sunday-like

+

+She hugged him tight

+

+"Who do you reckon it is?"

+

+"It was Tom Sawyer"

+

+"Mr. Archibald Nichols, I presume?"

+

+A pretty long Blessing

+

+Traveling By Rail

+

+Vittles

+

+A Simple Job

+

+Witches

+

+Getting Wood

+

+One of the Best Authorities

+

+The Breakfast-Horn

+

+Smouching the Knives

+

+Going down the Lightning-Rod

+

+Stealing spoons

+

+Tom advises a Witch Pie

+

+The Rubbage-Pile

+

+"Missus, dey's a Sheet Gone"

+

+In a Tearing Way

+

+One of his Ancestors

+

+Jim's Coat of Arms

+

+A Tough Job

+

+Buttons on their Tails

+

+Irrigation

+

+Keeping off Dull Times

+

+Sawdust Diet

+

+Trouble is Brewing

+

+Fishing

+

+Every one had a Gun

+

+Tom caught on a Splinter

+

+Jim advises a Doctor

+

+The Doctor

+

+Uncle Silas in Danger

+

+Old Mrs. Hotchkiss

+

+Aunt Sally talks to Huck

+

+Tom Sawyer wounded

+

+The Doctor speaks for Jim

+

+Tom rose square up in Bed

+

+"Hand out them Letters"

+

+Out of Bondage

+

+Tom's Liberality

+

+Yours Truly

+

+

+

+

+EXPLANATORY

+

+IN this book a number of dialects are used, to wit:  the Missouri negro

+dialect; the extremest form of the backwoods Southwestern dialect; the

+ordinary "Pike County" dialect; and four modified varieties of this

+last. The shadings have not been done in a haphazard fashion, or by

+guesswork; but painstakingly, and with the trustworthy guidance and

+support of personal familiarity with these several forms of speech.

+

+I make this explanation for the reason that without it many readers

+would suppose that all these characters were trying to talk alike and

+not succeeding.

+

+THE AUTHOR.

+

+

+

+

+HUCKLEBERRY FINN

+

+Scene:  The Mississippi Valley Time:  Forty to fifty years ago

+

+

+

+

+CHAPTER I.

+

+YOU don't know about me without you have read a book by the name of The

+Adventures of Tom Sawyer; but that ain't no matter.  That book was made

+by Mr. Mark Twain, and he told the truth, mainly.  There was things

+which he stretched, but mainly he told the truth.  That is nothing.  I

+never seen anybody but lied one time or another, without it was Aunt

+Polly, or the widow, or maybe Mary.  Aunt Polly—Tom's Aunt Polly, she

+is—and Mary, and the Widow Douglas is all told about in that book, which

+is mostly a true book, with some stretchers, as I said before.

+

+Now the way that the book winds up is this:  Tom and me found the money

+that the robbers hid in the cave, and it made us rich.  We got six

+thousand dollars apiece—all gold.  It was an awful sight of money when

+it was piled up.  Well, Judge Thatcher he took it and put it out

+at interest, and it fetched us a dollar a day apiece all the year

+round—more than a body could tell what to do with.  The Widow Douglas

+she took me for her son, and allowed she would sivilize me; but it was

+rough living in the house all the time, considering how dismal regular

+and decent the widow was in all her ways; and so when I couldn't stand

+it no longer I lit out.  I got into my old rags and my sugar-hogshead

+again, and was free and satisfied.  But Tom Sawyer he hunted me up and

+said he was going to start a band of robbers, and I might join if I

+would go back to the widow and be respectable.  So I went back.

+

+The widow she cried over me, and called me a poor lost lamb, and she

+called me a lot of other names, too, but she never meant no harm by

+it. She put me in them new clothes again, and I couldn't do nothing but

+sweat and sweat, and feel all cramped up.  Well, then, the old thing

+commenced again.  The widow rung a bell for supper, and you had to come

+to time. When you got to the table you couldn't go right to eating, but

+you had to wait for the widow to tuck down her head and grumble a little

+over the victuals, though there warn't really anything the matter with

+them,—that is, nothing only everything was cooked by itself.  In a

+barrel of odds and ends it is different; things get mixed up, and the

+juice kind of swaps around, and the things go better.

+

+After supper she got out her book and learned me about Moses and the

+Bulrushers, and I was in a sweat to find out all about him; but by and

+by she let it out that Moses had been dead a considerable long time; so

+then I didn't care no more about him, because I don't take no stock in

+dead people.

+

+Pretty soon I wanted to smoke, and asked the widow to let me.  But she

+wouldn't.  She said it was a mean practice and wasn't clean, and I must

+try to not do it any more.  That is just the way with some people.  They

+get down on a thing when they don't know nothing about it.  Here she was

+a-bothering about Moses, which was no kin to her, and no use to anybody,

+being gone, you see, yet finding a power of fault with me for doing a

+thing that had some good in it.  And she took snuff, too; of course that

+was all right, because she done it herself.

+

+Her sister, Miss Watson, a tolerable slim old maid, with goggles on,

+had just come to live with her, and took a set at me now with a

+spelling-book. She worked me middling hard for about an hour, and then

+the widow made her ease up.  I couldn't stood it much longer.  Then for

+an hour it was deadly dull, and I was fidgety.  Miss Watson would say,

+"Don't put your feet up there, Huckleberry;" and "Don't scrunch up

+like that, Huckleberry—set up straight;" and pretty soon she would

+say, "Don't gap and stretch like that, Huckleberry—why don't you try to

+behave?"  Then she told me all about the bad place, and I said I wished

+I was there. She got mad then, but I didn't mean no harm.  All I wanted

+was to go somewheres; all I wanted was a change, I warn't particular.

+ She said it was wicked to say what I said; said she wouldn't say it for

+the whole world; she was going to live so as to go to the good place.

+ Well, I couldn't see no advantage in going where she was going, so I

+made up my mind I wouldn't try for it.  But I never said so, because it

+would only make trouble, and wouldn't do no good.

+

+Now she had got a start, and she went on and told me all about the good

+place.  She said all a body would have to do there was to go around all

+day long with a harp and sing, forever and ever.  So I didn't think

+much of it. But I never said so.  I asked her if she reckoned Tom Sawyer

+would go there, and she said not by a considerable sight.  I was glad

+about that, because I wanted him and me to be together.

+

+Miss Watson she kept pecking at me, and it got tiresome and lonesome.

+ By and by they fetched the niggers in and had prayers, and then

+everybody was off to bed.  I went up to my room with a piece of candle,

+and put it on the table.  Then I set down in a chair by the window and

+tried to think of something cheerful, but it warn't no use.  I felt

+so lonesome I most wished I was dead.  The stars were shining, and the

+leaves rustled in the woods ever so mournful; and I heard an owl, away

+off, who-whooing about somebody that was dead, and a whippowill and a

+dog crying about somebody that was going to die; and the wind was trying

+to whisper something to me, and I couldn't make out what it was, and so

+it made the cold shivers run over me. Then away out in the woods I heard

+that kind of a sound that a ghost makes when it wants to tell about

+something that's on its mind and can't make itself understood, and so

+can't rest easy in its grave, and has to go about that way every night

+grieving.  I got so down-hearted and scared I did wish I had some

+company.  Pretty soon a spider went crawling up my shoulder, and I

+flipped it off and it lit in the candle; and before I could budge it

+was all shriveled up.  I didn't need anybody to tell me that that was

+an awful bad sign and would fetch me some bad luck, so I was scared

+and most shook the clothes off of me. I got up and turned around in my

+tracks three times and crossed my breast every time; and then I tied

+up a little lock of my hair with a thread to keep witches away.  But

+I hadn't no confidence.  You do that when you've lost a horseshoe that

+you've found, instead of nailing it up over the door, but I hadn't ever

+heard anybody say it was any way to keep off bad luck when you'd killed

+a spider.

+

+I set down again, a-shaking all over, and got out my pipe for a smoke;

+for the house was all as still as death now, and so the widow wouldn't

+know. Well, after a long time I heard the clock away off in the town

+go boom—boom—boom—twelve licks; and all still again—stiller than

+ever. Pretty soon I heard a twig snap down in the dark amongst the

+trees—something was a stirring.  I set still and listened.  Directly I

+could just barely hear a "me-yow! me-yow!" down there.  That was good!

+ Says I, "me-yow! me-yow!" as soft as I could, and then I put out the

+light and scrambled out of the window on to the shed.  Then I slipped

+down to the ground and crawled in among the trees, and, sure enough,

+there was Tom Sawyer waiting for me.

+

+

+

+

+CHAPTER II.

+

+WE went tiptoeing along a path amongst the trees back towards the end of

+the widow's garden, stooping down so as the branches wouldn't scrape our

+heads. When we was passing by the kitchen I fell over a root and made

+a noise.  We scrouched down and laid still.  Miss Watson's big nigger,

+named Jim, was setting in the kitchen door; we could see him pretty

+clear, because there was a light behind him.  He got up and stretched

+his neck out about a minute, listening.  Then he says:

+

+"Who dah?"

+

+He listened some more; then he come tiptoeing down and stood right

+between us; we could a touched him, nearly.  Well, likely it was

+minutes and minutes that there warn't a sound, and we all there so close

+together.  There was a place on my ankle that got to itching, but I

+dasn't scratch it; and then my ear begun to itch; and next my back,

+right between my shoulders.  Seemed like I'd die if I couldn't scratch.

+ Well, I've noticed that thing plenty times since.  If you are with

+the quality, or at a funeral, or trying to go to sleep when you ain't

+sleepy—if you are anywheres where it won't do for you to scratch, why

+you will itch all over in upwards of a thousand places. Pretty soon Jim

+says:

+

+"Say, who is you?  Whar is you?  Dog my cats ef I didn' hear sumf'n.

+Well, I know what I's gwyne to do:  I's gwyne to set down here and

+listen tell I hears it agin."

+

+So he set down on the ground betwixt me and Tom.  He leaned his back up

+against a tree, and stretched his legs out till one of them most touched

+one of mine.  My nose begun to itch.  It itched till the tears come into

+my eyes.  But I dasn't scratch.  Then it begun to itch on the inside.

+Next I got to itching underneath.  I didn't know how I was going to set

+still. This miserableness went on as much as six or seven minutes; but

+it seemed a sight longer than that.  I was itching in eleven different

+places now.  I reckoned I couldn't stand it more'n a minute longer,

+but I set my teeth hard and got ready to try.  Just then Jim begun

+to breathe heavy; next he begun to snore—and then I was pretty soon

+comfortable again.

+

+Tom he made a sign to me—kind of a little noise with his mouth—and we

+went creeping away on our hands and knees.  When we was ten foot off Tom

+whispered to me, and wanted to tie Jim to the tree for fun.  But I said

+no; he might wake and make a disturbance, and then they'd find out I

+warn't in. Then Tom said he hadn't got candles enough, and he would slip

+in the kitchen and get some more.  I didn't want him to try.  I said Jim

+might wake up and come.  But Tom wanted to resk it; so we slid in there

+and got three candles, and Tom laid five cents on the table for pay.

+Then we got out, and I was in a sweat to get away; but nothing would do

+Tom but he must crawl to where Jim was, on his hands and knees, and play

+something on him.  I waited, and it seemed a good while, everything was

+so still and lonesome.

+

+As soon as Tom was back we cut along the path, around the garden fence,

+and by and by fetched up on the steep top of the hill the other side of

+the house.  Tom said he slipped Jim's hat off of his head and hung it

+on a limb right over him, and Jim stirred a little, but he didn't wake.

+Afterwards Jim said the witches be witched him and put him in a trance,

+and rode him all over the State, and then set him under the trees again,

+and hung his hat on a limb to show who done it.  And next time Jim told

+it he said they rode him down to New Orleans; and, after that, every

+time he told it he spread it more and more, till by and by he said they

+rode him all over the world, and tired him most to death, and his back

+was all over saddle-boils.  Jim was monstrous proud about it, and he

+got so he wouldn't hardly notice the other niggers.  Niggers would come

+miles to hear Jim tell about it, and he was more looked up to than any

+nigger in that country.  Strange niggers would stand with their mouths

+open and look him all over, same as if he was a wonder.  Niggers is

+always talking about witches in the dark by the kitchen fire; but

+whenever one was talking and letting on to know all about such things,

+Jim would happen in and say, "Hm!  What you know 'bout witches?" and

+that nigger was corked up and had to take a back seat.  Jim always kept

+that five-center piece round his neck with a string, and said it was a

+charm the devil give to him with his own hands, and told him he could

+cure anybody with it and fetch witches whenever he wanted to just by

+saying something to it; but he never told what it was he said to it.

+ Niggers would come from all around there and give Jim anything they

+had, just for a sight of that five-center piece; but they wouldn't touch

+it, because the devil had had his hands on it.  Jim was most ruined for

+a servant, because he got stuck up on account of having seen the devil

+and been rode by witches.

+

+Well, when Tom and me got to the edge of the hilltop we looked away down

+into the village and could see three or four lights twinkling, where

+there was sick folks, maybe; and the stars over us was sparkling ever

+so fine; and down by the village was the river, a whole mile broad, and

+awful still and grand.  We went down the hill and found Jo Harper and

+Ben Rogers, and two or three more of the boys, hid in the old tanyard.

+ So we unhitched a skiff and pulled down the river two mile and a half,

+to the big scar on the hillside, and went ashore.

+

+We went to a clump of bushes, and Tom made everybody swear to keep the

+secret, and then showed them a hole in the hill, right in the thickest

+part of the bushes.  Then we lit the candles, and crawled in on our

+hands and knees.  We went about two hundred yards, and then the cave

+opened up. Tom poked about amongst the passages, and pretty soon ducked

+under a wall where you wouldn't a noticed that there was a hole.  We

+went along a narrow place and got into a kind of room, all damp and

+sweaty and cold, and there we stopped.  Tom says:

+

+"Now, we'll start this band of robbers and call it Tom Sawyer's Gang.

+Everybody that wants to join has got to take an oath, and write his name

+in blood."

+

+Everybody was willing.  So Tom got out a sheet of paper that he had

+wrote the oath on, and read it.  It swore every boy to stick to the

+band, and never tell any of the secrets; and if anybody done anything to

+any boy in the band, whichever boy was ordered to kill that person and

+his family must do it, and he mustn't eat and he mustn't sleep till he

+had killed them and hacked a cross in their breasts, which was the sign

+of the band. And nobody that didn't belong to the band could use that

+mark, and if he did he must be sued; and if he done it again he must be

+killed.  And if anybody that belonged to the band told the secrets, he

+must have his throat cut, and then have his carcass burnt up and the

+ashes scattered all around, and his name blotted off of the list with

+blood and never mentioned again by the gang, but have a curse put on it

+and be forgot forever.

+

+Everybody said it was a real beautiful oath, and asked Tom if he got

+it out of his own head.  He said, some of it, but the rest was out of

+pirate-books and robber-books, and every gang that was high-toned had

+it.

+

+Some thought it would be good to kill the families of boys that told

+the secrets.  Tom said it was a good idea, so he took a pencil and wrote

+it in. Then Ben Rogers says:

+

+"Here's Huck Finn, he hain't got no family; what you going to do 'bout

+him?"

+

+"Well, hain't he got a father?" says Tom Sawyer.

+

+"Yes, he's got a father, but you can't never find him these days.  He

+used to lay drunk with the hogs in the tanyard, but he hain't been seen

+in these parts for a year or more."

+

+They talked it over, and they was going to rule me out, because they

+said every boy must have a family or somebody to kill, or else it

+wouldn't be fair and square for the others.  Well, nobody could think of

+anything to do—everybody was stumped, and set still.  I was most ready

+to cry; but all at once I thought of a way, and so I offered them Miss

+Watson—they could kill her.  Everybody said:

+

+"Oh, she'll do.  That's all right.  Huck can come in."

+

+Then they all stuck a pin in their fingers to get blood to sign with,

+and I made my mark on the paper.

+

+"Now," says Ben Rogers, "what's the line of business of this Gang?"

+

+"Nothing only robbery and murder," Tom said.

+

+"But who are we going to rob?—houses, or cattle, or—"

+

+"Stuff! stealing cattle and such things ain't robbery; it's burglary,"

+says Tom Sawyer.  "We ain't burglars.  That ain't no sort of style.  We

+are highwaymen.  We stop stages and carriages on the road, with masks

+on, and kill the people and take their watches and money."

+

+"Must we always kill the people?"

+

+"Oh, certainly.  It's best.  Some authorities think different, but

+mostly it's considered best to kill them—except some that you bring to

+the cave here, and keep them till they're ransomed."

+

+"Ransomed?  What's that?"

+

+"I don't know.  But that's what they do.  I've seen it in books; and so

+of course that's what we've got to do."

+

+"But how can we do it if we don't know what it is?"

+

+"Why, blame it all, we've got to do it.  Don't I tell you it's in the

+books?  Do you want to go to doing different from what's in the books,

+and get things all muddled up?"

+

+"Oh, that's all very fine to say, Tom Sawyer, but how in the nation

+are these fellows going to be ransomed if we don't know how to do it

+to them?—that's the thing I want to get at.  Now, what do you reckon it

+is?"

+

+"Well, I don't know.  But per'aps if we keep them till they're ransomed,

+it means that we keep them till they're dead."

+

+"Now, that's something like.  That'll answer.  Why couldn't you said

+that before?  We'll keep them till they're ransomed to death; and a

+bothersome lot they'll be, too—eating up everything, and always trying

+to get loose."

+

+"How you talk, Ben Rogers.  How can they get loose when there's a guard

+over them, ready to shoot them down if they move a peg?"

+

+"A guard!  Well, that is good.  So somebody's got to set up all night

+and never get any sleep, just so as to watch them.  I think that's

+foolishness. Why can't a body take a club and ransom them as soon as

+they get here?"

+

+"Because it ain't in the books so—that's why.  Now, Ben Rogers, do you

+want to do things regular, or don't you?—that's the idea.  Don't you

+reckon that the people that made the books knows what's the correct

+thing to do?  Do you reckon you can learn 'em anything?  Not by a good

+deal. No, sir, we'll just go on and ransom them in the regular way."

+

+"All right.  I don't mind; but I say it's a fool way, anyhow.  Say, do

+we kill the women, too?"

+

+"Well, Ben Rogers, if I was as ignorant as you I wouldn't let on.  Kill

+the women?  No; nobody ever saw anything in the books like that.  You

+fetch them to the cave, and you're always as polite as pie to them;

+and by and by they fall in love with you, and never want to go home any

+more."

+

+"Well, if that's the way I'm agreed, but I don't take no stock in it.

+Mighty soon we'll have the cave so cluttered up with women, and fellows

+waiting to be ransomed, that there won't be no place for the robbers.

+But go ahead, I ain't got nothing to say."

+

+Little Tommy Barnes was asleep now, and when they waked him up he was

+scared, and cried, and said he wanted to go home to his ma, and didn't

+want to be a robber any more.

+

+So they all made fun of him, and called him cry-baby, and that made him

+mad, and he said he would go straight and tell all the secrets.  But

+Tom give him five cents to keep quiet, and said we would all go home and

+meet next week, and rob somebody and kill some people.

+

+Ben Rogers said he couldn't get out much, only Sundays, and so he wanted

+to begin next Sunday; but all the boys said it would be wicked to do it

+on Sunday, and that settled the thing.  They agreed to get together and

+fix a day as soon as they could, and then we elected Tom Sawyer first

+captain and Jo Harper second captain of the Gang, and so started home.

+

+I clumb up the shed and crept into my window just before day was

+breaking. My new clothes was all greased up and clayey, and I was

+dog-tired.

+

+

+

+

+CHAPTER III.

+

+WELL, I got a good going-over in the morning from old Miss Watson on

+account of my clothes; but the widow she didn't scold, but only cleaned

+off the grease and clay, and looked so sorry that I thought I would

+behave awhile if I could.  Then Miss Watson she took me in the closet

+and prayed, but nothing come of it.  She told me to pray every day, and

+whatever I asked for I would get it.  But it warn't so.  I tried it.

+Once I got a fish-line, but no hooks.  It warn't any good to me without

+hooks.  I tried for the hooks three or four times, but somehow I

+couldn't make it work.  By and by, one day, I asked Miss Watson to

+try for me, but she said I was a fool.  She never told me why, and I

+couldn't make it out no way.

+

+I set down one time back in the woods, and had a long think about it.

+ I says to myself, if a body can get anything they pray for, why don't

+Deacon Winn get back the money he lost on pork?  Why can't the widow get

+back her silver snuffbox that was stole?  Why can't Miss Watson fat up?

+No, says I to my self, there ain't nothing in it.  I went and told the

+widow about it, and she said the thing a body could get by praying for

+it was "spiritual gifts."  This was too many for me, but she told me

+what she meant—I must help other people, and do everything I could for

+other people, and look out for them all the time, and never think about

+myself. This was including Miss Watson, as I took it.  I went out in the

+woods and turned it over in my mind a long time, but I couldn't see no

+advantage about it—except for the other people; so at last I reckoned

+I wouldn't worry about it any more, but just let it go.  Sometimes the

+widow would take me one side and talk about Providence in a way to make

+a body's mouth water; but maybe next day Miss Watson would take hold

+and knock it all down again.  I judged I could see that there was two

+Providences, and a poor chap would stand considerable show with the

+widow's Providence, but if Miss Watson's got him there warn't no help

+for him any more.  I thought it all out, and reckoned I would belong

+to the widow's if he wanted me, though I couldn't make out how he was

+a-going to be any better off then than what he was before, seeing I was

+so ignorant, and so kind of low-down and ornery.

+

+Pap he hadn't been seen for more than a year, and that was comfortable

+for me; I didn't want to see him no more.  He used to always whale me

+when he was sober and could get his hands on me; though I used to take

+to the woods most of the time when he was around.  Well, about this time

+he was found in the river drownded, about twelve mile above town, so

+people said.  They judged it was him, anyway; said this drownded man was

+just his size, and was ragged, and had uncommon long hair, which was all

+like pap; but they couldn't make nothing out of the face, because it had

+been in the water so long it warn't much like a face at all.  They said

+he was floating on his back in the water.  They took him and buried him

+on the bank.  But I warn't comfortable long, because I happened to think

+of something.  I knowed mighty well that a drownded man don't float on

+his back, but on his face.  So I knowed, then, that this warn't pap, but

+a woman dressed up in a man's clothes.  So I was uncomfortable again.

+ I judged the old man would turn up again by and by, though I wished he

+wouldn't.

+

+We played robber now and then about a month, and then I resigned.  All

+the boys did.  We hadn't robbed nobody, hadn't killed any people, but

+only just pretended.  We used to hop out of the woods and go charging

+down on hog-drivers and women in carts taking garden stuff to market,

+but we never hived any of them.  Tom Sawyer called the hogs "ingots,"

+and he called the turnips and stuff "julery," and we would go to the

+cave and powwow over what we had done, and how many people we had killed

+and marked.  But I couldn't see no profit in it.  One time Tom sent a

+boy to run about town with a blazing stick, which he called a slogan

+(which was the sign for the Gang to get together), and then he said he

+had got secret news by his spies that next day a whole parcel of Spanish

+merchants and rich A-rabs was going to camp in Cave Hollow with two

+hundred elephants, and six hundred camels, and over a thousand "sumter"

+mules, all loaded down with di'monds, and they didn't have only a guard

+of four hundred soldiers, and so we would lay in ambuscade, as he called

+it, and kill the lot and scoop the things.  He said we must slick up

+our swords and guns, and get ready.  He never could go after even a

+turnip-cart but he must have the swords and guns all scoured up for it,

+though they was only lath and broomsticks, and you might scour at them

+till you rotted, and then they warn't worth a mouthful of ashes more

+than what they was before.  I didn't believe we could lick such a crowd

+of Spaniards and A-rabs, but I wanted to see the camels and elephants,

+so I was on hand next day, Saturday, in the ambuscade; and when we got

+the word we rushed out of the woods and down the hill.  But there warn't

+no Spaniards and A-rabs, and there warn't no camels nor no elephants.

+ It warn't anything but a Sunday-school picnic, and only a primer-class

+at that.  We busted it up, and chased the children up the hollow; but we

+never got anything but some doughnuts and jam, though Ben Rogers got

+a rag doll, and Jo Harper got a hymn-book and a tract; and then the

+teacher charged in, and made us drop everything and cut.

+

+ I didn't see no di'monds, and I told Tom Sawyer so.  He said there was

+loads of them there, anyway; and he said there was A-rabs there, too,

+and elephants and things.  I said, why couldn't we see them, then?  He

+said if I warn't so ignorant, but had read a book called Don Quixote, I

+would know without asking.  He said it was all done by enchantment.  He

+said there was hundreds of soldiers there, and elephants and treasure,

+and so on, but we had enemies which he called magicians; and they had

+turned the whole thing into an infant Sunday-school, just out of spite.

+ I said, all right; then the thing for us to do was to go for the

+magicians.  Tom Sawyer said I was a numskull.

+

+"Why," said he, "a magician could call up a lot of genies, and they

+would hash you up like nothing before you could say Jack Robinson.  They

+are as tall as a tree and as big around as a church."

+

+"Well," I says, "s'pose we got some genies to help us—can't we lick

+the other crowd then?"

+

+"How you going to get them?"

+

+"I don't know.  How do they get them?"

+

+"Why, they rub an old tin lamp or an iron ring, and then the genies

+come tearing in, with the thunder and lightning a-ripping around and the

+smoke a-rolling, and everything they're told to do they up and do it.

+ They don't think nothing of pulling a shot-tower up by the roots, and

+belting a Sunday-school superintendent over the head with it—or any

+other man."

+

+"Who makes them tear around so?"

+

+"Why, whoever rubs the lamp or the ring.  They belong to whoever rubs

+the lamp or the ring, and they've got to do whatever he says.  If he

+tells them to build a palace forty miles long out of di'monds, and fill

+it full of chewing-gum, or whatever you want, and fetch an emperor's

+daughter from China for you to marry, they've got to do it—and they've

+got to do it before sun-up next morning, too.  And more:  they've got

+to waltz that palace around over the country wherever you want it, you

+understand."

+

+"Well," says I, "I think they are a pack of flat-heads for not keeping

+the palace themselves 'stead of fooling them away like that.  And what's

+more—if I was one of them I would see a man in Jericho before I would

+drop my business and come to him for the rubbing of an old tin lamp."

+

+"How you talk, Huck Finn.  Why, you'd have to come when he rubbed it,

+whether you wanted to or not."

+

+"What! and I as high as a tree and as big as a church?  All right, then;

+I would come; but I lay I'd make that man climb the highest tree there

+was in the country."

+

+"Shucks, it ain't no use to talk to you, Huck Finn.  You don't seem to

+know anything, somehow—perfect saphead."

+

+I thought all this over for two or three days, and then I reckoned I

+would see if there was anything in it.  I got an old tin lamp and an

+iron ring, and went out in the woods and rubbed and rubbed till I sweat

+like an Injun, calculating to build a palace and sell it; but it warn't

+no use, none of the genies come.  So then I judged that all that stuff

+was only just one of Tom Sawyer's lies.  I reckoned he believed in the

+A-rabs and the elephants, but as for me I think different.  It had all

+the marks of a Sunday-school.

+

+

+

+

+CHAPTER IV.

+

+WELL, three or four months run along, and it was well into the winter

+now. I had been to school most all the time and could spell and read and

+write just a little, and could say the multiplication table up to six

+times seven is thirty-five, and I don't reckon I could ever get any

+further than that if I was to live forever.  I don't take no stock in

+mathematics, anyway.

+

+At first I hated the school, but by and by I got so I could stand it.

+Whenever I got uncommon tired I played hookey, and the hiding I got next

+day done me good and cheered me up.  So the longer I went to school the

+easier it got to be.  I was getting sort of used to the widow's ways,

+too, and they warn't so raspy on me.  Living in a house and sleeping in

+a bed pulled on me pretty tight mostly, but before the cold weather I

+used to slide out and sleep in the woods sometimes, and so that was a

+rest to me.  I liked the old ways best, but I was getting so I liked the

+new ones, too, a little bit. The widow said I was coming along slow but

+sure, and doing very satisfactory.  She said she warn't ashamed of me.

+

+One morning I happened to turn over the salt-cellar at breakfast.

+ I reached for some of it as quick as I could to throw over my left

+shoulder and keep off the bad luck, but Miss Watson was in ahead of me,

+and crossed me off. She says, "Take your hands away, Huckleberry; what

+a mess you are always making!"  The widow put in a good word for me, but

+that warn't going to keep off the bad luck, I knowed that well enough.

+ I started out, after breakfast, feeling worried and shaky, and

+wondering where it was going to fall on me, and what it was going to be.

+ There is ways to keep off some kinds of bad luck, but this wasn't one

+of them kind; so I never tried to do anything, but just poked along

+low-spirited and on the watch-out.

+

+I went down to the front garden and clumb over the stile where you go

+through the high board fence.  There was an inch of new snow on the

+ground, and I seen somebody's tracks.  They had come up from the quarry

+and stood around the stile a while, and then went on around the garden

+fence.  It was funny they hadn't come in, after standing around so.  I

+couldn't make it out.  It was very curious, somehow.  I was going to

+follow around, but I stooped down to look at the tracks first.  I didn't

+notice anything at first, but next I did.  There was a cross in the left

+boot-heel made with big nails, to keep off the devil.

+

+I was up in a second and shinning down the hill.  I looked over my

+shoulder every now and then, but I didn't see nobody.  I was at Judge

+Thatcher's as quick as I could get there.  He said:

+

+"Why, my boy, you are all out of breath.  Did you come for your

+interest?"

+

+"No, sir," I says; "is there some for me?"

+

+"Oh, yes, a half-yearly is in last night—over a hundred and fifty

+dollars.  Quite a fortune for you.  You had better let me invest it

+along with your six thousand, because if you take it you'll spend it."

+

+"No, sir," I says, "I don't want to spend it.  I don't want it at

+all—nor the six thousand, nuther.  I want you to take it; I want to give

+it to you—the six thousand and all."

+

+He looked surprised.  He couldn't seem to make it out.  He says:

+

+"Why, what can you mean, my boy?"

+

+I says, "Don't you ask me no questions about it, please.  You'll take

+it—won't you?"

+

+He says:

+

+"Well, I'm puzzled.  Is something the matter?"

+

+"Please take it," says I, "and don't ask me nothing—then I won't have to

+tell no lies."

+

+He studied a while, and then he says:

+

+"Oho-o!  I think I see.  You want to sell all your property to me—not

+give it.  That's the correct idea."

+

+Then he wrote something on a paper and read it over, and says:

+

+"There; you see it says 'for a consideration.'  That means I have bought

+it of you and paid you for it.  Here's a dollar for you.  Now you sign

+it."

+

+So I signed it, and left.

+

+Miss Watson's nigger, Jim, had a hair-ball as big as your fist, which

+had been took out of the fourth stomach of an ox, and he used to do

+magic with it.  He said there was a spirit inside of it, and it knowed

+everything.  So I went to him that night and told him pap was here

+again, for I found his tracks in the snow.  What I wanted to know was,

+what he was going to do, and was he going to stay?  Jim got out his

+hair-ball and said something over it, and then he held it up and dropped

+it on the floor.  It fell pretty solid, and only rolled about an inch.

+ Jim tried it again, and then another time, and it acted just the same.

+ Jim got down on his knees, and put his ear against it and listened.

+ But it warn't no use; he said it wouldn't talk. He said sometimes it

+wouldn't talk without money.  I told him I had an old slick counterfeit

+quarter that warn't no good because the brass showed through the silver

+a little, and it wouldn't pass nohow, even if the brass didn't show,

+because it was so slick it felt greasy, and so that would tell on it

+every time.  (I reckoned I wouldn't say nothing about the dollar I got

+from the judge.) I said it was pretty bad money, but maybe the hair-ball

+would take it, because maybe it wouldn't know the difference.  Jim smelt

+it and bit it and rubbed it, and said he would manage so the hair-ball

+would think it was good.  He said he would split open a raw Irish potato

+and stick the quarter in between and keep it there all night, and next

+morning you couldn't see no brass, and it wouldn't feel greasy no more,

+and so anybody in town would take it in a minute, let alone a hair-ball.

+ Well, I knowed a potato would do that before, but I had forgot it.

+

+Jim put the quarter under the hair-ball, and got down and listened

+again. This time he said the hair-ball was all right.  He said it

+would tell my whole fortune if I wanted it to.  I says, go on.  So the

+hair-ball talked to Jim, and Jim told it to me.  He says:

+

+"Yo' ole father doan' know yit what he's a-gwyne to do.  Sometimes he

+spec he'll go 'way, en den agin he spec he'll stay.  De bes' way is to

+res' easy en let de ole man take his own way.  Dey's two angels hoverin'

+roun' 'bout him.  One uv 'em is white en shiny, en t'other one is black.

+De white one gits him to go right a little while, den de black one sail

+in en bust it all up.  A body can't tell yit which one gwyne to fetch

+him at de las'.  But you is all right.  You gwyne to have considable

+trouble in yo' life, en considable joy.  Sometimes you gwyne to git

+hurt, en sometimes you gwyne to git sick; but every time you's gwyne

+to git well agin.  Dey's two gals flyin' 'bout you in yo' life.  One

+uv 'em's light en t'other one is dark. One is rich en t'other is po'.

+ You's gwyne to marry de po' one fust en de rich one by en by.  You

+wants to keep 'way fum de water as much as you kin, en don't run no

+resk, 'kase it's down in de bills dat you's gwyne to git hung."

+

+When I lit my candle and went up to my room that night there sat pap his

+own self!

+

+

+

+

+CHAPTER V.

+

+I had shut the door to.  Then I turned around and there he was.  I used

+to be scared of him all the time, he tanned me so much.  I reckoned I

+was scared now, too; but in a minute I see I was mistaken—that is, after

+the first jolt, as you may say, when my breath sort of hitched, he being

+so unexpected; but right away after I see I warn't scared of him worth

+bothring about.

+

+He was most fifty, and he looked it.  His hair was long and tangled and

+greasy, and hung down, and you could see his eyes shining through

+like he was behind vines.  It was all black, no gray; so was his long,

+mixed-up whiskers.  There warn't no color in his face, where his face

+showed; it was white; not like another man's white, but a white to make

+a body sick, a white to make a body's flesh crawl—a tree-toad white, a

+fish-belly white.  As for his clothes—just rags, that was all.  He had

+one ankle resting on t'other knee; the boot on that foot was busted, and

+two of his toes stuck through, and he worked them now and then.  His hat

+was laying on the floor—an old black slouch with the top caved in, like

+a lid.

+

+I stood a-looking at him; he set there a-looking at me, with his chair

+tilted back a little.  I set the candle down.  I noticed the window was

+up; so he had clumb in by the shed.  He kept a-looking me all over.  By

+and by he says:

+

+"Starchy clothes—very.  You think you're a good deal of a big-bug,

+don't you?"

+

+"Maybe I am, maybe I ain't," I says.

+

+"Don't you give me none o' your lip," says he.  "You've put on

+considerable many frills since I been away.  I'll take you down a peg

+before I get done with you.  You're educated, too, they say—can read and

+write.  You think you're better'n your father, now, don't you, because

+he can't?  I'll take it out of you.  Who told you you might meddle

+with such hifalut'n foolishness, hey?—who told you you could?"

+

+"The widow.  She told me."

+

+"The widow, hey?—and who told the widow she could put in her shovel

+about a thing that ain't none of her business?"

+

+"Nobody never told her."

+

+"Well, I'll learn her how to meddle.  And looky here—you drop that

+school, you hear?  I'll learn people to bring up a boy to put on airs

+over his own father and let on to be better'n what he is.  You lemme

+catch you fooling around that school again, you hear?  Your mother

+couldn't read, and she couldn't write, nuther, before she died.  None

+of the family couldn't before they died.  I can't; and here you're

+a-swelling yourself up like this.  I ain't the man to stand it—you hear?

+Say, lemme hear you read."

+

+I took up a book and begun something about General Washington and the

+wars. When I'd read about a half a minute, he fetched the book a whack

+with his hand and knocked it across the house.  He says:

+

+"It's so.  You can do it.  I had my doubts when you told me.  Now looky

+here; you stop that putting on frills.  I won't have it.  I'll lay for

+you, my smarty; and if I catch you about that school I'll tan you good.

+First you know you'll get religion, too.  I never see such a son."

+

+He took up a little blue and yaller picture of some cows and a boy, and

+says:

+

+"What's this?"

+

+"It's something they give me for learning my lessons good."

+

+He tore it up, and says:

+

+"I'll give you something better—I'll give you a cowhide."

+

+He set there a-mumbling and a-growling a minute, and then he says:

+

+"Ain't you a sweet-scented dandy, though?  A bed; and bedclothes; and

+a look'n'-glass; and a piece of carpet on the floor—and your own father

+got to sleep with the hogs in the tanyard.  I never see such a son.  I

+bet I'll take some o' these frills out o' you before I'm done with you.

+Why, there ain't no end to your airs—they say you're rich.  Hey?—how's

+that?"

+

+"They lie—that's how."

+

+"Looky here—mind how you talk to me; I'm a-standing about all I can

+stand now—so don't gimme no sass.  I've been in town two days, and I

+hain't heard nothing but about you bein' rich.  I heard about it

+away down the river, too.  That's why I come.  You git me that money

+to-morrow—I want it."

+

+"I hain't got no money."

+

+"It's a lie.  Judge Thatcher's got it.  You git it.  I want it."

+

+"I hain't got no money, I tell you.  You ask Judge Thatcher; he'll tell

+you the same."

+

+"All right.  I'll ask him; and I'll make him pungle, too, or I'll know

+the reason why.  Say, how much you got in your pocket?  I want it."

+

+"I hain't got only a dollar, and I want that to—"

+

+"It don't make no difference what you want it for—you just shell it

+out."

+

+He took it and bit it to see if it was good, and then he said he was

+going down town to get some whisky; said he hadn't had a drink all day.

+When he had got out on the shed he put his head in again, and cussed

+me for putting on frills and trying to be better than him; and when I

+reckoned he was gone he come back and put his head in again, and told me

+to mind about that school, because he was going to lay for me and lick

+me if I didn't drop that.

+

+Next day he was drunk, and he went to Judge Thatcher's and bullyragged

+him, and tried to make him give up the money; but he couldn't, and then

+he swore he'd make the law force him.

+

+The judge and the widow went to law to get the court to take me away

+from him and let one of them be my guardian; but it was a new judge that

+had just come, and he didn't know the old man; so he said courts mustn't

+interfere and separate families if they could help it; said he'd druther

+not take a child away from its father.  So Judge Thatcher and the widow

+had to quit on the business.

+

+That pleased the old man till he couldn't rest.  He said he'd cowhide

+me till I was black and blue if I didn't raise some money for him.  I

+borrowed three dollars from Judge Thatcher, and pap took it and got

+drunk, and went a-blowing around and cussing and whooping and carrying

+on; and he kept it up all over town, with a tin pan, till most midnight;

+then they jailed him, and next day they had him before court, and jailed

+him again for a week.  But he said he was satisfied; said he was boss

+of his son, and he'd make it warm for him.

+

+When he got out the new judge said he was a-going to make a man of him.

+So he took him to his own house, and dressed him up clean and nice, and

+had him to breakfast and dinner and supper with the family, and was just

+old pie to him, so to speak.  And after supper he talked to him about

+temperance and such things till the old man cried, and said he'd been

+a fool, and fooled away his life; but now he was a-going to turn over

+a new leaf and be a man nobody wouldn't be ashamed of, and he hoped the

+judge would help him and not look down on him.  The judge said he could

+hug him for them words; so he cried, and his wife she cried again; pap

+said he'd been a man that had always been misunderstood before, and the

+judge said he believed it.  The old man said that what a man wanted

+that was down was sympathy, and the judge said it was so; so they cried

+again.  And when it was bedtime the old man rose up and held out his

+hand, and says:

+

+"Look at it, gentlemen and ladies all; take a-hold of it; shake it.

+There's a hand that was the hand of a hog; but it ain't so no more; it's

+the hand of a man that's started in on a new life, and'll die before

+he'll go back.  You mark them words—don't forget I said them.  It's a

+clean hand now; shake it—don't be afeard."

+

+So they shook it, one after the other, all around, and cried.  The

+judge's wife she kissed it.  Then the old man he signed a pledge—made

+his mark. The judge said it was the holiest time on record, or something

+like that. Then they tucked the old man into a beautiful room, which was

+the spare room, and in the night some time he got powerful thirsty and

+clumb out on to the porch-roof and slid down a stanchion and traded his

+new coat for a jug of forty-rod, and clumb back again and had a good old

+time; and towards daylight he crawled out again, drunk as a fiddler, and

+rolled off the porch and broke his left arm in two places, and was most

+froze to death when somebody found him after sun-up.  And when they come

+to look at that spare room they had to take soundings before they could

+navigate it.

+

+The judge he felt kind of sore.  He said he reckoned a body could reform

+the old man with a shotgun, maybe, but he didn't know no other way.

+

+

+

+

+CHAPTER VI.

+

+WELL, pretty soon the old man was up and around again, and then he went

+for Judge Thatcher in the courts to make him give up that money, and he

+went for me, too, for not stopping school.  He catched me a couple of

+times and thrashed me, but I went to school just the same, and dodged

+him or outrun him most of the time.  I didn't want to go to school much

+before, but I reckoned I'd go now to spite pap.  That law trial was a

+slow business—appeared like they warn't ever going to get started on it;

+so every now and then I'd borrow two or three dollars off of the judge

+for him, to keep from getting a cowhiding.  Every time he got money he

+got drunk; and every time he got drunk he raised Cain around town; and

+every time he raised Cain he got jailed.  He was just suited—this kind

+of thing was right in his line.

+

+He got to hanging around the widow's too much and so she told him at

+last that if he didn't quit using around there she would make trouble

+for him. Well, wasn't he mad?  He said he would show who was Huck

+Finn's boss.  So he watched out for me one day in the spring, and

+catched me, and took me up the river about three mile in a skiff, and

+crossed over to the Illinois shore where it was woody and there warn't

+no houses but an old log hut in a place where the timber was so thick

+you couldn't find it if you didn't know where it was.

+

+He kept me with him all the time, and I never got a chance to run off.

+We lived in that old cabin, and he always locked the door and put the

+key under his head nights.  He had a gun which he had stole, I reckon,

+and we fished and hunted, and that was what we lived on.  Every little

+while he locked me in and went down to the store, three miles, to the

+ferry, and traded fish and game for whisky, and fetched it home and got

+drunk and had a good time, and licked me.  The widow she found out where

+I was by and by, and she sent a man over to try to get hold of me; but

+pap drove him off with the gun, and it warn't long after that till I was

+used to being where I was, and liked it—all but the cowhide part.

+

+It was kind of lazy and jolly, laying off comfortable all day, smoking

+and fishing, and no books nor study.  Two months or more run along, and

+my clothes got to be all rags and dirt, and I didn't see how I'd ever

+got to like it so well at the widow's, where you had to wash, and eat on

+a plate, and comb up, and go to bed and get up regular, and be forever

+bothering over a book, and have old Miss Watson pecking at you all the

+time.  I didn't want to go back no more.  I had stopped cussing, because

+the widow didn't like it; but now I took to it again because pap hadn't

+no objections.  It was pretty good times up in the woods there, take it

+all around.

+

+But by and by pap got too handy with his hick'ry, and I couldn't stand

+it. I was all over welts.  He got to going away so much, too, and

+locking me in.  Once he locked me in and was gone three days.  It was

+dreadful lonesome.  I judged he had got drownded, and I wasn't ever

+going to get out any more.  I was scared.  I made up my mind I would fix

+up some way to leave there.  I had tried to get out of that cabin many

+a time, but I couldn't find no way.  There warn't a window to it big

+enough for a dog to get through.  I couldn't get up the chimbly; it

+was too narrow.  The door was thick, solid oak slabs.  Pap was pretty

+careful not to leave a knife or anything in the cabin when he was away;

+I reckon I had hunted the place over as much as a hundred times; well, I

+was most all the time at it, because it was about the only way to put in

+the time.  But this time I found something at last; I found an old rusty

+wood-saw without any handle; it was laid in between a rafter and the

+clapboards of the roof. I greased it up and went to work.  There was an

+old horse-blanket nailed against the logs at the far end of the cabin

+behind the table, to keep the wind from blowing through the chinks and

+putting the candle out.  I got under the table and raised the blanket,

+and went to work to saw a section of the big bottom log out—big enough

+to let me through.  Well, it was a good long job, but I was getting

+towards the end of it when I heard pap's gun in the woods.  I got rid of

+the signs of my work, and dropped the blanket and hid my saw, and pretty

+soon pap come in.

+

+Pap warn't in a good humor—so he was his natural self.  He said he was

+down town, and everything was going wrong.  His lawyer said he reckoned

+he would win his lawsuit and get the money if they ever got started on

+the trial; but then there was ways to put it off a long time, and Judge

+Thatcher knowed how to do it. And he said people allowed there'd be

+another trial to get me away from him and give me to the widow for my

+guardian, and they guessed it would win this time.  This shook me up

+considerable, because I didn't want to go back to the widow's any more

+and be so cramped up and sivilized, as they called it.  Then the old man

+got to cussing, and cussed everything and everybody he could think of,

+and then cussed them all over again to make sure he hadn't skipped any,

+and after that he polished off with a kind of a general cuss all round,

+including a considerable parcel of people which he didn't know the names

+of, and so called them what's-his-name when he got to them, and went

+right along with his cussing.

+

+He said he would like to see the widow get me.  He said he would watch

+out, and if they tried to come any such game on him he knowed of a place

+six or seven mile off to stow me in, where they might hunt till they

+dropped and they couldn't find me.  That made me pretty uneasy again,

+but only for a minute; I reckoned I wouldn't stay on hand till he got

+that chance.

+

+The old man made me go to the skiff and fetch the things he had

+got. There was a fifty-pound sack of corn meal, and a side of bacon,

+ammunition, and a four-gallon jug of whisky, and an old book and two

+newspapers for wadding, besides some tow.  I toted up a load, and went

+back and set down on the bow of the skiff to rest.  I thought it all

+over, and I reckoned I would walk off with the gun and some lines, and

+take to the woods when I run away.  I guessed I wouldn't stay in one

+place, but just tramp right across the country, mostly night times, and

+hunt and fish to keep alive, and so get so far away that the old man nor

+the widow couldn't ever find me any more.  I judged I would saw out and

+leave that night if pap got drunk enough, and I reckoned he would.  I

+got so full of it I didn't notice how long I was staying till the old

+man hollered and asked me whether I was asleep or drownded.

+

+I got the things all up to the cabin, and then it was about dark.  While

+I was cooking supper the old man took a swig or two and got sort of

+warmed up, and went to ripping again.  He had been drunk over in town,

+and laid in the gutter all night, and he was a sight to look at.  A body

+would a thought he was Adam—he was just all mud.  Whenever his liquor

+begun to work he most always went for the govment, this time he says:

+

+"Call this a govment! why, just look at it and see what it's like.

+Here's the law a-standing ready to take a man's son away from him—a

+man's own son, which he has had all the trouble and all the anxiety

+and all the expense of raising.  Yes, just as that man has got that

+son raised at last, and ready to go to work and begin to do suthin' for

+him and give him a rest, the law up and goes for him.  And they call

+that govment!  That ain't all, nuther.  The law backs that old Judge

+Thatcher up and helps him to keep me out o' my property.  Here's what

+the law does:  The law takes a man worth six thousand dollars and

+up'ards, and jams him into an old trap of a cabin like this, and lets

+him go round in clothes that ain't fitten for a hog. They call that

+govment!  A man can't get his rights in a govment like this. Sometimes

+I've a mighty notion to just leave the country for good and all. Yes,

+and I told 'em so; I told old Thatcher so to his face.  Lots of 'em

+heard me, and can tell what I said.  Says I, for two cents I'd leave the

+blamed country and never come a-near it agin.  Them's the very words.  I

+says look at my hat—if you call it a hat—but the lid raises up and the

+rest of it goes down till it's below my chin, and then it ain't rightly

+a hat at all, but more like my head was shoved up through a jint o'

+stove-pipe.  Look at it, says I—such a hat for me to wear—one of the

+wealthiest men in this town if I could git my rights.

+

+"Oh, yes, this is a wonderful govment, wonderful.  Why, looky here.

+There was a free nigger there from Ohio—a mulatter, most as white as

+a white man.  He had the whitest shirt on you ever see, too, and the

+shiniest hat; and there ain't a man in that town that's got as fine

+clothes as what he had; and he had a gold watch and chain, and a

+silver-headed cane—the awfulest old gray-headed nabob in the State.  And

+what do you think?  They said he was a p'fessor in a college, and could

+talk all kinds of languages, and knowed everything.  And that ain't the

+wust. They said he could vote when he was at home.  Well, that let me

+out. Thinks I, what is the country a-coming to?  It was 'lection day,

+and I was just about to go and vote myself if I warn't too drunk to get

+there; but when they told me there was a State in this country where

+they'd let that nigger vote, I drawed out.  I says I'll never vote agin.

+ Them's the very words I said; they all heard me; and the country may

+rot for all me—I'll never vote agin as long as I live.  And to see the

+cool way of that nigger—why, he wouldn't a give me the road if I hadn't

+shoved him out o' the way.  I says to the people, why ain't this nigger

+put up at auction and sold?—that's what I want to know.  And what do you

+reckon they said? Why, they said he couldn't be sold till he'd been in

+the State six months, and he hadn't been there that long yet.  There,

+now—that's a specimen.  They call that a govment that can't sell a free

+nigger till he's been in the State six months.  Here's a govment that

+calls itself a govment, and lets on to be a govment, and thinks it is a

+govment, and yet's got to set stock-still for six whole months before

+it can take a hold of a prowling, thieving, infernal, white-shirted free

+nigger, and—"

+

+Pap was agoing on so he never noticed where his old limber legs was

+taking him to, so he went head over heels over the tub of salt pork and

+barked both shins, and the rest of his speech was all the hottest kind

+of language—mostly hove at the nigger and the govment, though he give

+the tub some, too, all along, here and there.  He hopped around the

+cabin considerable, first on one leg and then on the other, holding

+first one shin and then the other one, and at last he let out with his

+left foot all of a sudden and fetched the tub a rattling kick.  But it

+warn't good judgment, because that was the boot that had a couple of his

+toes leaking out of the front end of it; so now he raised a howl that

+fairly made a body's hair raise, and down he went in the dirt, and

+rolled there, and held his toes; and the cussing he done then laid over

+anything he had ever done previous.  He said so his own self afterwards.

+ He had heard old Sowberry Hagan in his best days, and he said it laid

+over him, too; but I reckon that was sort of piling it on, maybe.

+

+After supper pap took the jug, and said he had enough whisky there

+for two drunks and one delirium tremens.  That was always his word.  I

+judged he would be blind drunk in about an hour, and then I would steal

+the key, or saw myself out, one or t'other.  He drank and drank, and

+tumbled down on his blankets by and by; but luck didn't run my way.

+ He didn't go sound asleep, but was uneasy.  He groaned and moaned and

+thrashed around this way and that for a long time.  At last I got so

+sleepy I couldn't keep my eyes open all I could do, and so before I

+knowed what I was about I was sound asleep, and the candle burning.

+

+I don't know how long I was asleep, but all of a sudden there was an

+awful scream and I was up.  There was pap looking wild, and skipping

+around every which way and yelling about snakes.  He said they was

+crawling up his legs; and then he would give a jump and scream, and say

+one had bit him on the cheek—but I couldn't see no snakes.  He started

+and run round and round the cabin, hollering "Take him off! take him

+off! he's biting me on the neck!"  I never see a man look so wild in the

+eyes. Pretty soon he was all fagged out, and fell down panting; then he

+rolled over and over wonderful fast, kicking things every which way,

+and striking and grabbing at the air with his hands, and screaming and

+saying there was devils a-hold of him.  He wore out by and by, and laid

+still a while, moaning.  Then he laid stiller, and didn't make a sound.

+ I could hear the owls and the wolves away off in the woods, and it

+seemed terrible still.  He was laying over by the corner. By and by he

+raised up part way and listened, with his head to one side.  He says,

+very low:

+

+"Tramp—tramp—tramp; that's the dead; tramp—tramp—tramp; they're coming

+after me; but I won't go.  Oh, they're here! don't touch me—don't! hands

+off—they're cold; let go.  Oh, let a poor devil alone!"

+

+Then he went down on all fours and crawled off, begging them to let him

+alone, and he rolled himself up in his blanket and wallowed in under the

+old pine table, still a-begging; and then he went to crying.  I could

+hear him through the blanket.

+

+By and by he rolled out and jumped up on his feet looking wild, and he

+see me and went for me.  He chased me round and round the place with a

+clasp-knife, calling me the Angel of Death, and saying he would kill me,

+and then I couldn't come for him no more.  I begged, and told him I

+was only Huck; but he laughed such a screechy laugh, and roared and

+cussed, and kept on chasing me up.  Once when I turned short and

+dodged under his arm he made a grab and got me by the jacket between my

+shoulders, and I thought I was gone; but I slid out of the jacket quick

+as lightning, and saved myself. Pretty soon he was all tired out, and

+dropped down with his back against the door, and said he would rest a

+minute and then kill me. He put his knife under him, and said he would

+sleep and get strong, and then he would see who was who.

+

+So he dozed off pretty soon.  By and by I got the old split-bottom chair

+and clumb up as easy as I could, not to make any noise, and got down the

+gun.  I slipped the ramrod down it to make sure it was loaded, then I

+laid it across the turnip barrel, pointing towards pap, and set down

+behind it to wait for him to stir.  And how slow and still the time did

+drag along.

+

+

+

+

+CHAPTER VII.

+

+"GIT up!  What you 'bout?"

+

+I opened my eyes and looked around, trying to make out where I was.  It

+was after sun-up, and I had been sound asleep.  Pap was standing over me

+looking sour and sick, too.  He says:

+

+"What you doin' with this gun?"

+

+I judged he didn't know nothing about what he had been doing, so I says:

+

+"Somebody tried to get in, so I was laying for him."

+

+"Why didn't you roust me out?"

+

+"Well, I tried to, but I couldn't; I couldn't budge you."

+

+"Well, all right.  Don't stand there palavering all day, but out with

+you and see if there's a fish on the lines for breakfast.  I'll be along

+in a minute."

+

+He unlocked the door, and I cleared out up the river-bank.  I noticed

+some pieces of limbs and such things floating down, and a sprinkling of

+bark; so I knowed the river had begun to rise.  I reckoned I would have

+great times now if I was over at the town.  The June rise used to be

+always luck for me; because as soon as that rise begins here comes

+cordwood floating down, and pieces of log rafts—sometimes a dozen logs

+together; so all you have to do is to catch them and sell them to the

+wood-yards and the sawmill.

+

+I went along up the bank with one eye out for pap and t'other one out

+for what the rise might fetch along.  Well, all at once here comes a

+canoe; just a beauty, too, about thirteen or fourteen foot long, riding

+high like a duck.  I shot head-first off of the bank like a frog,

+clothes and all on, and struck out for the canoe.  I just expected

+there'd be somebody laying down in it, because people often done that

+to fool folks, and when a chap had pulled a skiff out most to it they'd

+raise up and laugh at him.  But it warn't so this time.  It was a

+drift-canoe sure enough, and I clumb in and paddled her ashore.  Thinks

+I, the old man will be glad when he sees this—she's worth ten dollars.

+ But when I got to shore pap wasn't in sight yet, and as I was running

+her into a little creek like a gully, all hung over with vines and

+willows, I struck another idea:  I judged I'd hide her good, and then,

+'stead of taking to the woods when I run off, I'd go down the river

+about fifty mile and camp in one place for good, and not have such a

+rough time tramping on foot.

+

+It was pretty close to the shanty, and I thought I heard the old man

+coming all the time; but I got her hid; and then I out and looked around

+a bunch of willows, and there was the old man down the path a piece just

+drawing a bead on a bird with his gun.  So he hadn't seen anything.

+

+When he got along I was hard at it taking up a "trot" line.  He abused

+me a little for being so slow; but I told him I fell in the river, and

+that was what made me so long.  I knowed he would see I was wet, and

+then he would be asking questions.  We got five catfish off the lines

+and went home.

+

+While we laid off after breakfast to sleep up, both of us being about

+wore out, I got to thinking that if I could fix up some way to keep pap

+and the widow from trying to follow me, it would be a certainer thing

+than trusting to luck to get far enough off before they missed me; you

+see, all kinds of things might happen.  Well, I didn't see no way for a

+while, but by and by pap raised up a minute to drink another barrel of

+water, and he says:

+

+"Another time a man comes a-prowling round here you roust me out, you

+hear? That man warn't here for no good.  I'd a shot him.  Next time you

+roust me out, you hear?"

+

+Then he dropped down and went to sleep again; but what he had been

+saying give me the very idea I wanted.  I says to myself, I can fix it

+now so nobody won't think of following me.

+

+About twelve o'clock we turned out and went along up the bank.  The

+river was coming up pretty fast, and lots of driftwood going by on the

+rise. By and by along comes part of a log raft—nine logs fast together.

+ We went out with the skiff and towed it ashore.  Then we had dinner.

+Anybody but pap would a waited and seen the day through, so as to catch

+more stuff; but that warn't pap's style.  Nine logs was enough for one

+time; he must shove right over to town and sell.  So he locked me in and

+took the skiff, and started off towing the raft about half-past three.

+ I judged he wouldn't come back that night.  I waited till I reckoned he

+had got a good start; then I out with my saw, and went to work on that

+log again.  Before he was t'other side of the river I was out of the

+hole; him and his raft was just a speck on the water away off yonder.

+

+I took the sack of corn meal and took it to where the canoe was hid, and

+shoved the vines and branches apart and put it in; then I done the same

+with the side of bacon; then the whisky-jug.  I took all the coffee and

+sugar there was, and all the ammunition; I took the wadding; I took the

+bucket and gourd; I took a dipper and a tin cup, and my old saw and two

+blankets, and the skillet and the coffee-pot.  I took fish-lines and

+matches and other things—everything that was worth a cent.  I cleaned

+out the place.  I wanted an axe, but there wasn't any, only the one out

+at the woodpile, and I knowed why I was going to leave that.  I fetched

+out the gun, and now I was done.

+

+I had wore the ground a good deal crawling out of the hole and dragging

+out so many things.  So I fixed that as good as I could from the outside

+by scattering dust on the place, which covered up the smoothness and the

+sawdust.  Then I fixed the piece of log back into its place, and put two

+rocks under it and one against it to hold it there, for it was bent up

+at that place and didn't quite touch ground.  If you stood four or five

+foot away and didn't know it was sawed, you wouldn't never notice

+it; and besides, this was the back of the cabin, and it warn't likely

+anybody would go fooling around there.

+

+It was all grass clear to the canoe, so I hadn't left a track.  I

+followed around to see.  I stood on the bank and looked out over the

+river.  All safe.  So I took the gun and went up a piece into the woods,

+and was hunting around for some birds when I see a wild pig; hogs soon

+went wild in them bottoms after they had got away from the prairie

+farms. I shot this fellow and took him into camp.

+

+I took the axe and smashed in the door.  I beat it and hacked it

+considerable a-doing it.  I fetched the pig in, and took him back nearly

+to the table and hacked into his throat with the axe, and laid him down

+on the ground to bleed; I say ground because it was ground—hard packed,

+and no boards.  Well, next I took an old sack and put a lot of big rocks

+in it—all I could drag—and I started it from the pig, and dragged it to

+the door and through the woods down to the river and dumped it in, and

+down it sunk, out of sight.  You could easy see that something had been

+dragged over the ground.  I did wish Tom Sawyer was there; I knowed he

+would take an interest in this kind of business, and throw in the fancy

+touches.  Nobody could spread himself like Tom Sawyer in such a thing as

+that.

+

+Well, last I pulled out some of my hair, and blooded the axe good, and

+stuck it on the back side, and slung the axe in the corner.  Then I

+took up the pig and held him to my breast with my jacket (so he couldn't

+drip) till I got a good piece below the house and then dumped him into

+the river.  Now I thought of something else.  So I went and got the bag

+of meal and my old saw out of the canoe, and fetched them to the house.

+ I took the bag to where it used to stand, and ripped a hole in the

+bottom of it with the saw, for there warn't no knives and forks on the

+place—pap done everything with his clasp-knife about the cooking.  Then

+I carried the sack about a hundred yards across the grass and through

+the willows east of the house, to a shallow lake that was five mile wide

+and full of rushes—and ducks too, you might say, in the season.  There

+was a slough or a creek leading out of it on the other side that went

+miles away, I don't know where, but it didn't go to the river.  The meal

+sifted out and made a little track all the way to the lake.  I dropped

+pap's whetstone there too, so as to look like it had been done by

+accident. Then I tied up the rip in the meal sack with a string, so it

+wouldn't leak no more, and took it and my saw to the canoe again.

+

+It was about dark now; so I dropped the canoe down the river under some

+willows that hung over the bank, and waited for the moon to rise.  I

+made fast to a willow; then I took a bite to eat, and by and by laid

+down in the canoe to smoke a pipe and lay out a plan.  I says to myself,

+they'll follow the track of that sackful of rocks to the shore and then

+drag the river for me.  And they'll follow that meal track to the lake

+and go browsing down the creek that leads out of it to find the robbers

+that killed me and took the things.  They won't ever hunt the river for

+anything but my dead carcass. They'll soon get tired of that, and won't

+bother no more about me.  All right; I can stop anywhere I want to.

+Jackson's Island is good enough for me; I know that island pretty well,

+and nobody ever comes there.  And then I can paddle over to town nights,

+and slink around and pick up things I want. Jackson's Island's the

+place.

+

+I was pretty tired, and the first thing I knowed I was asleep.  When

+I woke up I didn't know where I was for a minute.  I set up and looked

+around, a little scared.  Then I remembered.  The river looked miles and

+miles across.  The moon was so bright I could a counted the drift logs

+that went a-slipping along, black and still, hundreds of yards out from

+shore. Everything was dead quiet, and it looked late, and smelt late.

+You know what I mean—I don't know the words to put it in.

+

+I took a good gap and a stretch, and was just going to unhitch and start

+when I heard a sound away over the water.  I listened.  Pretty soon I

+made it out.  It was that dull kind of a regular sound that comes from

+oars working in rowlocks when it's a still night.  I peeped out through

+the willow branches, and there it was—a skiff, away across the water.

+ I couldn't tell how many was in it.  It kept a-coming, and when it was

+abreast of me I see there warn't but one man in it.  Think's I, maybe

+it's pap, though I warn't expecting him.  He dropped below me with the

+current, and by and by he came a-swinging up shore in the easy water,

+and he went by so close I could a reached out the gun and touched him.

+ Well, it was pap, sure enough—and sober, too, by the way he laid his

+oars.

+

+I didn't lose no time.  The next minute I was a-spinning down stream

+soft but quick in the shade of the bank.  I made two mile and a half,

+and then struck out a quarter of a mile or more towards the middle of

+the river, because pretty soon I would be passing the ferry landing, and

+people might see me and hail me.  I got out amongst the driftwood, and

+then laid down in the bottom of the canoe and let her float.

+

+ I laid there, and had a good rest and a smoke out of my pipe, looking

+away into the sky; not a cloud in it.  The sky looks ever so deep when

+you lay down on your back in the moonshine; I never knowed it before.

+ And how far a body can hear on the water such nights!  I heard people

+talking at the ferry landing. I heard what they said, too—every word

+of it.  One man said it was getting towards the long days and the short

+nights now.  T'other one said this warn't one of the short ones, he

+reckoned—and then they laughed, and he said it over again, and they

+laughed again; then they waked up another fellow and told him, and

+laughed, but he didn't laugh; he ripped out something brisk, and said

+let him alone.  The first fellow said he 'lowed to tell it to his

+old woman—she would think it was pretty good; but he said that warn't

+nothing to some things he had said in his time. I heard one man say it

+was nearly three o'clock, and he hoped daylight wouldn't wait more than

+about a week longer.  After that the talk got further and further away,

+and I couldn't make out the words any more; but I could hear the mumble,

+and now and then a laugh, too, but it seemed a long ways off.

+

+I was away below the ferry now.  I rose up, and there was Jackson's

+Island, about two mile and a half down stream, heavy timbered and

+standing up out of the middle of the river, big and dark and solid, like

+a steamboat without any lights.  There warn't any signs of the bar at

+the head—it was all under water now.

+

+It didn't take me long to get there.  I shot past the head at a ripping

+rate, the current was so swift, and then I got into the dead water and

+landed on the side towards the Illinois shore.  I run the canoe into

+a deep dent in the bank that I knowed about; I had to part the willow

+branches to get in; and when I made fast nobody could a seen the canoe

+from the outside.

+

+I went up and set down on a log at the head of the island, and looked

+out on the big river and the black driftwood and away over to the town,

+three mile away, where there was three or four lights twinkling.  A

+monstrous big lumber-raft was about a mile up stream, coming along down,

+with a lantern in the middle of it.  I watched it come creeping down,

+and when it was most abreast of where I stood I heard a man say, "Stern

+oars, there! heave her head to stabboard!"  I heard that just as plain

+as if the man was by my side.

+

+There was a little gray in the sky now; so I stepped into the woods, and

+laid down for a nap before breakfast.

+

+

+

+

+CHAPTER VIII.

+

+THE sun was up so high when I waked that I judged it was after eight

+o'clock.  I laid there in the grass and the cool shade thinking about

+things, and feeling rested and ruther comfortable and satisfied.  I

+could see the sun out at one or two holes, but mostly it was big trees

+all about, and gloomy in there amongst them.  There was freckled places

+on the ground where the light sifted down through the leaves, and the

+freckled places swapped about a little, showing there was a little

+breeze up there.  A couple of squirrels set on a limb and jabbered at me

+very friendly.

+

+I was powerful lazy and comfortable—didn't want to get up and cook

+breakfast.  Well, I was dozing off again when I thinks I hears a deep

+sound of "boom!" away up the river.  I rouses up, and rests on my elbow

+and listens; pretty soon I hears it again.  I hopped up, and went and

+looked out at a hole in the leaves, and I see a bunch of smoke laying

+on the water a long ways up—about abreast the ferry.  And there was the

+ferryboat full of people floating along down.  I knowed what was the

+matter now.  "Boom!" I see the white smoke squirt out of the ferryboat's

+side.  You see, they was firing cannon over the water, trying to make my

+carcass come to the top.

+

+I was pretty hungry, but it warn't going to do for me to start a fire,

+because they might see the smoke.  So I set there and watched the

+cannon-smoke and listened to the boom.  The river was a mile wide there,

+and it always looks pretty on a summer morning—so I was having a good

+enough time seeing them hunt for my remainders if I only had a bite to

+eat. Well, then I happened to think how they always put quicksilver in

+loaves of bread and float them off, because they always go right to the

+drownded carcass and stop there.  So, says I, I'll keep a lookout, and

+if any of them's floating around after me I'll give them a show.  I

+changed to the Illinois edge of the island to see what luck I could

+have, and I warn't disappointed.  A big double loaf come along, and I

+most got it with a long stick, but my foot slipped and she floated out

+further.  Of course I was where the current set in the closest to the

+shore—I knowed enough for that.  But by and by along comes another one,

+and this time I won.  I took out the plug and shook out the little dab

+of quicksilver, and set my teeth in.  It was "baker's bread"—what the

+quality eat; none of your low-down corn-pone.

+

+I got a good place amongst the leaves, and set there on a log, munching

+the bread and watching the ferry-boat, and very well satisfied.  And

+then something struck me.  I says, now I reckon the widow or the parson

+or somebody prayed that this bread would find me, and here it has gone

+and done it.  So there ain't no doubt but there is something in that

+thing—that is, there's something in it when a body like the widow or the

+parson prays, but it don't work for me, and I reckon it don't work for

+only just the right kind.

+

+I lit a pipe and had a good long smoke, and went on watching.  The

+ferryboat was floating with the current, and I allowed I'd have a chance

+to see who was aboard when she come along, because she would come in

+close, where the bread did.  When she'd got pretty well along down

+towards me, I put out my pipe and went to where I fished out the bread,

+and laid down behind a log on the bank in a little open place.  Where

+the log forked I could peep through.

+

+By and by she come along, and she drifted in so close that they could

+a run out a plank and walked ashore.  Most everybody was on the boat.

+ Pap, and Judge Thatcher, and Bessie Thatcher, and Jo Harper, and Tom

+Sawyer, and his old Aunt Polly, and Sid and Mary, and plenty more.

+ Everybody was talking about the murder, but the captain broke in and

+says:

+

+"Look sharp, now; the current sets in the closest here, and maybe he's

+washed ashore and got tangled amongst the brush at the water's edge.  I

+hope so, anyway."

+

+I didn't hope so.  They all crowded up and leaned over the rails, nearly

+in my face, and kept still, watching with all their might.  I could see

+them first-rate, but they couldn't see me.  Then the captain sung out:

+

+"Stand away!" and the cannon let off such a blast right before me that

+it made me deef with the noise and pretty near blind with the smoke, and

+I judged I was gone.  If they'd a had some bullets in, I reckon they'd

+a got the corpse they was after.  Well, I see I warn't hurt, thanks to

+goodness. The boat floated on and went out of sight around the shoulder

+of the island.  I could hear the booming now and then, further and

+further off, and by and by, after an hour, I didn't hear it no more.

+ The island was three mile long.  I judged they had got to the foot, and

+was giving it up.  But they didn't yet a while.  They turned around

+the foot of the island and started up the channel on the Missouri side,

+under steam, and booming once in a while as they went.  I crossed over

+to that side and watched them. When they got abreast the head of the

+island they quit shooting and dropped over to the Missouri shore and

+went home to the town.

+

+I knowed I was all right now.  Nobody else would come a-hunting after

+me. I got my traps out of the canoe and made me a nice camp in the thick

+woods.  I made a kind of a tent out of my blankets to put my things

+under so the rain couldn't get at them.  I catched a catfish and haggled

+him open with my saw, and towards sundown I started my camp fire and had

+supper.  Then I set out a line to catch some fish for breakfast.

+

+When it was dark I set by my camp fire smoking, and feeling pretty well

+satisfied; but by and by it got sort of lonesome, and so I went and set

+on the bank and listened to the current swashing along, and counted the

+stars and drift logs and rafts that come down, and then went to bed;

+there ain't no better way to put in time when you are lonesome; you

+can't stay so, you soon get over it.

+

+And so for three days and nights.  No difference—just the same thing.

+But the next day I went exploring around down through the island.  I was

+boss of it; it all belonged to me, so to say, and I wanted to know

+all about it; but mainly I wanted to put in the time.  I found plenty

+strawberries, ripe and prime; and green summer grapes, and green

+razberries; and the green blackberries was just beginning to show.  They

+would all come handy by and by, I judged.

+

+Well, I went fooling along in the deep woods till I judged I warn't

+far from the foot of the island.  I had my gun along, but I hadn't shot

+nothing; it was for protection; thought I would kill some game nigh

+home. About this time I mighty near stepped on a good-sized snake,

+and it went sliding off through the grass and flowers, and I after

+it, trying to get a shot at it. I clipped along, and all of a sudden I

+bounded right on to the ashes of a camp fire that was still smoking.

+

+My heart jumped up amongst my lungs.  I never waited for to look

+further, but uncocked my gun and went sneaking back on my tiptoes as

+fast as ever I could.  Every now and then I stopped a second amongst the

+thick leaves and listened, but my breath come so hard I couldn't hear

+nothing else.  I slunk along another piece further, then listened again;

+and so on, and so on.  If I see a stump, I took it for a man; if I trod

+on a stick and broke it, it made me feel like a person had cut one of my

+breaths in two and I only got half, and the short half, too.

+

+When I got to camp I warn't feeling very brash, there warn't much sand

+in my craw; but I says, this ain't no time to be fooling around.  So I

+got all my traps into my canoe again so as to have them out of sight,

+and I put out the fire and scattered the ashes around to look like an

+old last year's camp, and then clumb a tree.

+

+I reckon I was up in the tree two hours; but I didn't see nothing,

+I didn't hear nothing—I only thought I heard and seen as much as a

+thousand things.  Well, I couldn't stay up there forever; so at last I

+got down, but I kept in the thick woods and on the lookout all the

+time. All I could get to eat was berries and what was left over from

+breakfast.

+

+By the time it was night I was pretty hungry.  So when it was good

+and dark I slid out from shore before moonrise and paddled over to the

+Illinois bank—about a quarter of a mile.  I went out in the woods and

+cooked a supper, and I had about made up my mind I would stay there

+all night when I hear a plunkety-plunk, plunkety-plunk, and says

+to myself, horses coming; and next I hear people's voices.  I got

+everything into the canoe as quick as I could, and then went creeping

+through the woods to see what I could find out.  I hadn't got far when I

+hear a man say:

+

+"We better camp here if we can find a good place; the horses is about

+beat out.  Let's look around."

+

+I didn't wait, but shoved out and paddled away easy.  I tied up in the

+old place, and reckoned I would sleep in the canoe.

+

+I didn't sleep much.  I couldn't, somehow, for thinking.  And every time

+I waked up I thought somebody had me by the neck.  So the sleep didn't

+do me no good.  By and by I says to myself, I can't live this way; I'm

+a-going to find out who it is that's here on the island with me; I'll

+find it out or bust.  Well, I felt better right off.

+

+So I took my paddle and slid out from shore just a step or two, and

+then let the canoe drop along down amongst the shadows.  The moon was

+shining, and outside of the shadows it made it most as light as day.

+ I poked along well on to an hour, everything still as rocks and sound

+asleep. Well, by this time I was most down to the foot of the island.  A

+little ripply, cool breeze begun to blow, and that was as good as saying

+the night was about done.  I give her a turn with the paddle and brung

+her nose to shore; then I got my gun and slipped out and into the edge

+of the woods.  I sat down there on a log, and looked out through the

+leaves.  I see the moon go off watch, and the darkness begin to blanket

+the river. But in a little while I see a pale streak over the treetops,

+and knowed the day was coming.  So I took my gun and slipped off towards

+where I had run across that camp fire, stopping every minute or two

+to listen.  But I hadn't no luck somehow; I couldn't seem to find the

+place.  But by and by, sure enough, I catched a glimpse of fire away

+through the trees.  I went for it, cautious and slow.  By and by I was

+close enough to have a look, and there laid a man on the ground.  It

+most give me the fan-tods. He had a blanket around his head, and his

+head was nearly in the fire.  I set there behind a clump of bushes, in

+about six foot of him, and kept my eyes on him steady.  It was getting

+gray daylight now.  Pretty soon he gapped and stretched himself and hove

+off the blanket, and it was Miss Watson's Jim!  I bet I was glad to see

+him.  I says:

+

+"Hello, Jim!" and skipped out.

+

+He bounced up and stared at me wild.  Then he drops down on his knees,

+and puts his hands together and says:

+

+"Doan' hurt me—don't!  I hain't ever done no harm to a ghos'.  I alwuz

+liked dead people, en done all I could for 'em.  You go en git in de

+river agin, whah you b'longs, en doan' do nuffn to Ole Jim, 'at 'uz

+awluz yo' fren'."

+

+Well, I warn't long making him understand I warn't dead.  I was ever so

+glad to see Jim.  I warn't lonesome now.  I told him I warn't afraid of

+him telling the people where I was.  I talked along, but he only set

+there and looked at me; never said nothing.  Then I says:

+

+"It's good daylight.  Le's get breakfast.  Make up your camp fire good."

+

+"What's de use er makin' up de camp fire to cook strawbries en sich

+truck? But you got a gun, hain't you?  Den we kin git sumfn better den

+strawbries."

+

+"Strawberries and such truck," I says.  "Is that what you live on?"

+

+"I couldn' git nuffn else," he says.

+

+"Why, how long you been on the island, Jim?"

+

+"I come heah de night arter you's killed."

+

+"What, all that time?"

+

+"Yes—indeedy."

+

+"And ain't you had nothing but that kind of rubbage to eat?"

+

+"No, sah—nuffn else."

+

+"Well, you must be most starved, ain't you?"

+

+"I reck'n I could eat a hoss.  I think I could. How long you ben on de

+islan'?"

+

+"Since the night I got killed."

+

+"No!  W'y, what has you lived on?  But you got a gun.  Oh, yes, you got

+a gun.  Dat's good.  Now you kill sumfn en I'll make up de fire."

+

+So we went over to where the canoe was, and while he built a fire in

+a grassy open place amongst the trees, I fetched meal and bacon and

+coffee, and coffee-pot and frying-pan, and sugar and tin cups, and the

+nigger was set back considerable, because he reckoned it was all done

+with witchcraft. I catched a good big catfish, too, and Jim cleaned him

+with his knife, and fried him.

+

+When breakfast was ready we lolled on the grass and eat it smoking hot.

+Jim laid it in with all his might, for he was most about starved.  Then

+when we had got pretty well stuffed, we laid off and lazied.  By and by

+Jim says:

+

+"But looky here, Huck, who wuz it dat 'uz killed in dat shanty ef it

+warn't you?"

+

+Then I told him the whole thing, and he said it was smart.  He said Tom

+Sawyer couldn't get up no better plan than what I had.  Then I says:

+

+"How do you come to be here, Jim, and how'd you get here?"

+

+He looked pretty uneasy, and didn't say nothing for a minute.  Then he

+says:

+

+"Maybe I better not tell."

+

+"Why, Jim?"

+

+"Well, dey's reasons.  But you wouldn' tell on me ef I uz to tell you,

+would you, Huck?"

+

+"Blamed if I would, Jim."

+

+"Well, I b'lieve you, Huck.  I—I run off."

+

+"Jim!"

+

+"But mind, you said you wouldn' tell—you know you said you wouldn' tell,

+Huck."

+

+"Well, I did.  I said I wouldn't, and I'll stick to it.  Honest injun,

+I will.  People would call me a low-down Abolitionist and despise me for

+keeping mum—but that don't make no difference.  I ain't a-going to tell,

+and I ain't a-going back there, anyways.  So, now, le's know all about

+it."

+

+"Well, you see, it 'uz dis way.  Ole missus—dat's Miss Watson—she pecks

+on me all de time, en treats me pooty rough, but she awluz said she

+wouldn' sell me down to Orleans.  But I noticed dey wuz a nigger trader

+roun' de place considable lately, en I begin to git oneasy.  Well, one

+night I creeps to de do' pooty late, en de do' warn't quite shet, en I

+hear old missus tell de widder she gwyne to sell me down to Orleans, but

+she didn' want to, but she could git eight hund'd dollars for me, en it

+'uz sich a big stack o' money she couldn' resis'.  De widder she try to

+git her to say she wouldn' do it, but I never waited to hear de res'.  I

+lit out mighty quick, I tell you.

+

+"I tuck out en shin down de hill, en 'spec to steal a skift 'long de

+sho' som'ers 'bove de town, but dey wuz people a-stirring yit, so I hid

+in de ole tumble-down cooper-shop on de bank to wait for everybody to

+go 'way. Well, I wuz dah all night.  Dey wuz somebody roun' all de time.

+ 'Long 'bout six in de mawnin' skifts begin to go by, en 'bout eight er

+nine every skift dat went 'long wuz talkin' 'bout how yo' pap come over

+to de town en say you's killed.  Dese las' skifts wuz full o' ladies en

+genlmen a-goin' over for to see de place.  Sometimes dey'd pull up at

+de sho' en take a res' b'fo' dey started acrost, so by de talk I got to

+know all 'bout de killin'.  I 'uz powerful sorry you's killed, Huck, but

+I ain't no mo' now.

+

+"I laid dah under de shavin's all day.  I 'uz hungry, but I warn't

+afeard; bekase I knowed ole missus en de widder wuz goin' to start to

+de camp-meet'n' right arter breakfas' en be gone all day, en dey knows

+I goes off wid de cattle 'bout daylight, so dey wouldn' 'spec to see me

+roun' de place, en so dey wouldn' miss me tell arter dark in de evenin'.

+De yuther servants wouldn' miss me, kase dey'd shin out en take holiday

+soon as de ole folks 'uz out'n de way.

+

+"Well, when it come dark I tuck out up de river road, en went 'bout two

+mile er more to whah dey warn't no houses.  I'd made up my mine 'bout

+what I's agwyne to do.  You see, ef I kep' on tryin' to git away afoot,

+de dogs 'ud track me; ef I stole a skift to cross over, dey'd miss dat

+skift, you see, en dey'd know 'bout whah I'd lan' on de yuther side, en

+whah to pick up my track.  So I says, a raff is what I's arter; it doan'

+make no track.

+

+"I see a light a-comin' roun' de p'int bymeby, so I wade' in en shove'

+a log ahead o' me en swum more'n half way acrost de river, en got in

+'mongst de drift-wood, en kep' my head down low, en kinder swum agin de

+current tell de raff come along.  Den I swum to de stern uv it en tuck

+a-holt.  It clouded up en 'uz pooty dark for a little while.  So I clumb

+up en laid down on de planks.  De men 'uz all 'way yonder in de middle,

+whah de lantern wuz.  De river wuz a-risin', en dey wuz a good current;

+so I reck'n'd 'at by fo' in de mawnin' I'd be twenty-five mile down de

+river, en den I'd slip in jis b'fo' daylight en swim asho', en take to

+de woods on de Illinois side.

+

+"But I didn' have no luck.  When we 'uz mos' down to de head er de

+islan' a man begin to come aft wid de lantern, I see it warn't no use

+fer to wait, so I slid overboard en struck out fer de islan'.  Well, I

+had a notion I could lan' mos' anywhers, but I couldn't—bank too bluff.

+ I 'uz mos' to de foot er de islan' b'fo' I found' a good place.  I went

+into de woods en jedged I wouldn' fool wid raffs no mo', long as dey

+move de lantern roun' so.  I had my pipe en a plug er dog-leg, en some

+matches in my cap, en dey warn't wet, so I 'uz all right."

+

+"And so you ain't had no meat nor bread to eat all this time?  Why

+didn't you get mud-turkles?"

+

+"How you gwyne to git 'm?  You can't slip up on um en grab um; en how's

+a body gwyne to hit um wid a rock?  How could a body do it in de night?

+ En I warn't gwyne to show mysef on de bank in de daytime."

+

+"Well, that's so.  You've had to keep in the woods all the time, of

+course. Did you hear 'em shooting the cannon?"

+

+"Oh, yes.  I knowed dey was arter you.  I see um go by heah—watched um

+thoo de bushes."

+

+Some young birds come along, flying a yard or two at a time and

+lighting. Jim said it was a sign it was going to rain.  He said it was

+a sign when young chickens flew that way, and so he reckoned it was the

+same way when young birds done it.  I was going to catch some of them,

+but Jim wouldn't let me.  He said it was death.  He said his father laid

+mighty sick once, and some of them catched a bird, and his old granny

+said his father would die, and he did.

+

+And Jim said you mustn't count the things you are going to cook for

+dinner, because that would bring bad luck.  The same if you shook the

+table-cloth after sundown.  And he said if a man owned a beehive

+and that man died, the bees must be told about it before sun-up next

+morning, or else the bees would all weaken down and quit work and die.

+ Jim said bees wouldn't sting idiots; but I didn't believe that, because

+I had tried them lots of times myself, and they wouldn't sting me.

+

+I had heard about some of these things before, but not all of them.  Jim

+knowed all kinds of signs.  He said he knowed most everything.  I said

+it looked to me like all the signs was about bad luck, and so I asked

+him if there warn't any good-luck signs.  He says:

+

+"Mighty few—an' dey ain't no use to a body.  What you want to know

+when good luck's a-comin' for?  Want to keep it off?"  And he said:  "Ef

+you's got hairy arms en a hairy breas', it's a sign dat you's agwyne

+to be rich. Well, dey's some use in a sign like dat, 'kase it's so fur

+ahead. You see, maybe you's got to be po' a long time fust, en so you

+might git discourage' en kill yo'sef 'f you didn' know by de sign dat

+you gwyne to be rich bymeby."

+

+"Have you got hairy arms and a hairy breast, Jim?"

+

+"What's de use to ax dat question?  Don't you see I has?"

+

+"Well, are you rich?"

+

+"No, but I ben rich wunst, and gwyne to be rich agin.  Wunst I had

+foteen dollars, but I tuck to specalat'n', en got busted out."

+

+"What did you speculate in, Jim?"

+

+"Well, fust I tackled stock."

+

+"What kind of stock?"

+

+"Why, live stock—cattle, you know.  I put ten dollars in a cow.  But

+I ain' gwyne to resk no mo' money in stock.  De cow up 'n' died on my

+han's."

+

+"So you lost the ten dollars."

+

+"No, I didn't lose it all.  I on'y los' 'bout nine of it.  I sole de

+hide en taller for a dollar en ten cents."

+

+"You had five dollars and ten cents left.  Did you speculate any more?"

+

+"Yes.  You know that one-laigged nigger dat b'longs to old Misto

+Bradish? Well, he sot up a bank, en say anybody dat put in a dollar

+would git fo' dollars mo' at de en' er de year.  Well, all de niggers

+went in, but dey didn't have much.  I wuz de on'y one dat had much.  So

+I stuck out for mo' dan fo' dollars, en I said 'f I didn' git it I'd

+start a bank mysef. Well, o' course dat nigger want' to keep me out er

+de business, bekase he says dey warn't business 'nough for two banks, so

+he say I could put in my five dollars en he pay me thirty-five at de en'

+er de year.

+

+"So I done it.  Den I reck'n'd I'd inves' de thirty-five dollars right

+off en keep things a-movin'.  Dey wuz a nigger name' Bob, dat had

+ketched a wood-flat, en his marster didn' know it; en I bought it off'n

+him en told him to take de thirty-five dollars when de en' er de

+year come; but somebody stole de wood-flat dat night, en nex day de

+one-laigged nigger say de bank's busted.  So dey didn' none uv us git no

+money."

+

+"What did you do with the ten cents, Jim?"

+

+"Well, I 'uz gwyne to spen' it, but I had a dream, en de dream tole me

+to give it to a nigger name' Balum—Balum's Ass dey call him for short;

+he's one er dem chuckleheads, you know.  But he's lucky, dey say, en I

+see I warn't lucky.  De dream say let Balum inves' de ten cents en he'd

+make a raise for me.  Well, Balum he tuck de money, en when he wuz in

+church he hear de preacher say dat whoever give to de po' len' to de

+Lord, en boun' to git his money back a hund'd times.  So Balum he tuck

+en give de ten cents to de po', en laid low to see what wuz gwyne to

+come of it."

+

+"Well, what did come of it, Jim?"

+

+"Nuffn never come of it.  I couldn' manage to k'leck dat money no way;

+en Balum he couldn'.  I ain' gwyne to len' no mo' money 'dout I see de

+security.  Boun' to git yo' money back a hund'd times, de preacher says!

+Ef I could git de ten cents back, I'd call it squah, en be glad er de

+chanst."

+

+"Well, it's all right anyway, Jim, long as you're going to be rich again

+some time or other."

+

+"Yes; en I's rich now, come to look at it.  I owns mysef, en I's wuth

+eight hund'd dollars.  I wisht I had de money, I wouldn' want no mo'."

+

+

+

+

+CHAPTER IX.

+

+I wanted to go and look at a place right about the middle of the island

+that I'd found when I was exploring; so we started and soon got to it,

+because the island was only three miles long and a quarter of a mile

+wide.

+

+This place was a tolerable long, steep hill or ridge about forty foot

+high. We had a rough time getting to the top, the sides was so steep and

+the bushes so thick.  We tramped and clumb around all over it, and by

+and by found a good big cavern in the rock, most up to the top on the

+side towards Illinois.  The cavern was as big as two or three rooms

+bunched together, and Jim could stand up straight in it.  It was cool in

+there. Jim was for putting our traps in there right away, but I said we

+didn't want to be climbing up and down there all the time.

+

+Jim said if we had the canoe hid in a good place, and had all the traps

+in the cavern, we could rush there if anybody was to come to the island,

+and they would never find us without dogs.  And, besides, he said them

+little birds had said it was going to rain, and did I want the things to

+get wet?

+

+So we went back and got the canoe, and paddled up abreast the cavern,

+and lugged all the traps up there.  Then we hunted up a place close by

+to hide the canoe in, amongst the thick willows.  We took some fish off

+of the lines and set them again, and begun to get ready for dinner.

+

+The door of the cavern was big enough to roll a hogshead in, and on one

+side of the door the floor stuck out a little bit, and was flat and a

+good place to build a fire on.  So we built it there and cooked dinner.

+

+We spread the blankets inside for a carpet, and eat our dinner in there.

+We put all the other things handy at the back of the cavern.  Pretty

+soon it darkened up, and begun to thunder and lighten; so the birds was

+right about it.  Directly it begun to rain, and it rained like all fury,

+too, and I never see the wind blow so.  It was one of these regular

+summer storms.  It would get so dark that it looked all blue-black

+outside, and lovely; and the rain would thrash along by so thick that

+the trees off a little ways looked dim and spider-webby; and here would

+come a blast of wind that would bend the trees down and turn up the

+pale underside of the leaves; and then a perfect ripper of a gust would

+follow along and set the branches to tossing their arms as if they

+was just wild; and next, when it was just about the bluest and

+blackest—FST! it was as bright as glory, and you'd have a little

+glimpse of tree-tops a-plunging about away off yonder in the storm,

+hundreds of yards further than you could see before; dark as sin again

+in a second, and now you'd hear the thunder let go with an awful crash,

+and then go rumbling, grumbling, tumbling, down the sky towards the

+under side of the world, like rolling empty barrels down stairs—where

+it's long stairs and they bounce a good deal, you know.

+

+"Jim, this is nice," I says.  "I wouldn't want to be nowhere else but

+here. Pass me along another hunk of fish and some hot corn-bread."

+

+"Well, you wouldn't a ben here 'f it hadn't a ben for Jim.  You'd a ben

+down dah in de woods widout any dinner, en gittn' mos' drownded, too;

+dat you would, honey.  Chickens knows when it's gwyne to rain, en so do

+de birds, chile."

+

+The river went on raising and raising for ten or twelve days, till at

+last it was over the banks.  The water was three or four foot deep on

+the island in the low places and on the Illinois bottom.  On that side

+it was a good many miles wide, but on the Missouri side it was the same

+old distance across—a half a mile—because the Missouri shore was just a

+wall of high bluffs.

+

+Daytimes we paddled all over the island in the canoe, It was mighty cool

+and shady in the deep woods, even if the sun was blazing outside.  We

+went winding in and out amongst the trees, and sometimes the vines hung

+so thick we had to back away and go some other way.  Well, on every old

+broken-down tree you could see rabbits and snakes and such things; and

+when the island had been overflowed a day or two they got so tame, on

+account of being hungry, that you could paddle right up and put your

+hand on them if you wanted to; but not the snakes and turtles—they would

+slide off in the water.  The ridge our cavern was in was full of them.

+We could a had pets enough if we'd wanted them.

+

+One night we catched a little section of a lumber raft—nice pine planks.

+It was twelve foot wide and about fifteen or sixteen foot long, and

+the top stood above water six or seven inches—a solid, level floor.  We

+could see saw-logs go by in the daylight sometimes, but we let them go;

+we didn't show ourselves in daylight.

+

+Another night when we was up at the head of the island, just before

+daylight, here comes a frame-house down, on the west side.  She was

+a two-story, and tilted over considerable.  We paddled out and got

+aboard—clumb in at an upstairs window.  But it was too dark to see yet,

+so we made the canoe fast and set in her to wait for daylight.

+

+The light begun to come before we got to the foot of the island.  Then

+we looked in at the window.  We could make out a bed, and a table, and

+two old chairs, and lots of things around about on the floor, and there

+was clothes hanging against the wall.  There was something laying on the

+floor in the far corner that looked like a man.  So Jim says:

+

+"Hello, you!"

+

+But it didn't budge.  So I hollered again, and then Jim says:

+

+"De man ain't asleep—he's dead.  You hold still—I'll go en see."

+

+He went, and bent down and looked, and says:

+

+"It's a dead man.  Yes, indeedy; naked, too.  He's ben shot in de back.

+I reck'n he's ben dead two er three days.  Come in, Huck, but doan' look

+at his face—it's too gashly."

+

+I didn't look at him at all.  Jim throwed some old rags over him, but

+he needn't done it; I didn't want to see him.  There was heaps of old

+greasy cards scattered around over the floor, and old whisky bottles,

+and a couple of masks made out of black cloth; and all over the walls

+was the ignorantest kind of words and pictures made with charcoal.

+ There was two old dirty calico dresses, and a sun-bonnet, and some

+women's underclothes hanging against the wall, and some men's clothing,

+too.  We put the lot into the canoe—it might come good.  There was a

+boy's old speckled straw hat on the floor; I took that, too.  And there

+was a bottle that had had milk in it, and it had a rag stopper for a

+baby to suck.  We would a took the bottle, but it was broke.  There was

+a seedy old chest, and an old hair trunk with the hinges broke.  They

+stood open, but there warn't nothing left in them that was any account.

+ The way things was scattered about we reckoned the people left in a

+hurry, and warn't fixed so as to carry off most of their stuff.

+

+We got an old tin lantern, and a butcher-knife without any handle, and

+a bran-new Barlow knife worth two bits in any store, and a lot of tallow

+candles, and a tin candlestick, and a gourd, and a tin cup, and a ratty

+old bedquilt off the bed, and a reticule with needles and pins and

+beeswax and buttons and thread and all such truck in it, and a hatchet

+and some nails, and a fishline as thick as my little finger with some

+monstrous hooks on it, and a roll of buckskin, and a leather dog-collar,

+and a horseshoe, and some vials of medicine that didn't have no label

+on them; and just as we was leaving I found a tolerable good curry-comb,

+and Jim he found a ratty old fiddle-bow, and a wooden leg.  The straps

+was broke off of it, but, barring that, it was a good enough leg, though

+it was too long for me and not long enough for Jim, and we couldn't find

+the other one, though we hunted all around.

+

+And so, take it all around, we made a good haul.  When we was ready to

+shove off we was a quarter of a mile below the island, and it was pretty

+broad day; so I made Jim lay down in the canoe and cover up with the

+quilt, because if he set up people could tell he was a nigger a good

+ways off.  I paddled over to the Illinois shore, and drifted down most

+a half a mile doing it.  I crept up the dead water under the bank, and

+hadn't no accidents and didn't see nobody.  We got home all safe.

+

+

+

+

+CHAPTER X.

+

+AFTER breakfast I wanted to talk about the dead man and guess out how he

+come to be killed, but Jim didn't want to.  He said it would fetch bad

+luck; and besides, he said, he might come and ha'nt us; he said a man

+that warn't buried was more likely to go a-ha'nting around than one

+that was planted and comfortable.  That sounded pretty reasonable, so

+I didn't say no more; but I couldn't keep from studying over it and

+wishing I knowed who shot the man, and what they done it for.

+

+We rummaged the clothes we'd got, and found eight dollars in silver

+sewed up in the lining of an old blanket overcoat.  Jim said he reckoned

+the people in that house stole the coat, because if they'd a knowed the

+money was there they wouldn't a left it.  I said I reckoned they killed

+him, too; but Jim didn't want to talk about that.  I says:

+

+"Now you think it's bad luck; but what did you say when I fetched in the

+snake-skin that I found on the top of the ridge day before yesterday?

+You said it was the worst bad luck in the world to touch a snake-skin

+with my hands.  Well, here's your bad luck!  We've raked in all this

+truck and eight dollars besides.  I wish we could have some bad luck

+like this every day, Jim."

+

+"Never you mind, honey, never you mind.  Don't you git too peart.  It's

+a-comin'.  Mind I tell you, it's a-comin'."

+

+It did come, too.  It was a Tuesday that we had that talk.  Well, after

+dinner Friday we was laying around in the grass at the upper end of the

+ridge, and got out of tobacco.  I went to the cavern to get some, and

+found a rattlesnake in there.  I killed him, and curled him up on the

+foot of Jim's blanket, ever so natural, thinking there'd be some fun

+when Jim found him there.  Well, by night I forgot all about the snake,

+and when Jim flung himself down on the blanket while I struck a light

+the snake's mate was there, and bit him.

+

+He jumped up yelling, and the first thing the light showed was the

+varmint curled up and ready for another spring.  I laid him out in a

+second with a stick, and Jim grabbed pap's whisky-jug and begun to pour

+it down.

+

+He was barefooted, and the snake bit him right on the heel.  That all

+comes of my being such a fool as to not remember that wherever you leave

+a dead snake its mate always comes there and curls around it.  Jim told

+me to chop off the snake's head and throw it away, and then skin the

+body and roast a piece of it.  I done it, and he eat it and said it

+would help cure him. He made me take off the rattles and tie them around

+his wrist, too.  He said that that would help.  Then I slid out quiet

+and throwed the snakes clear away amongst the bushes; for I warn't going

+to let Jim find out it was all my fault, not if I could help it.

+

+Jim sucked and sucked at the jug, and now and then he got out of his

+head and pitched around and yelled; but every time he come to himself he

+went to sucking at the jug again.  His foot swelled up pretty big, and

+so did his leg; but by and by the drunk begun to come, and so I judged

+he was all right; but I'd druther been bit with a snake than pap's

+whisky.

+

+Jim was laid up for four days and nights.  Then the swelling was all

+gone and he was around again.  I made up my mind I wouldn't ever take

+a-holt of a snake-skin again with my hands, now that I see what had come

+of it. Jim said he reckoned I would believe him next time.  And he said

+that handling a snake-skin was such awful bad luck that maybe we hadn't

+got to the end of it yet.  He said he druther see the new moon over his

+left shoulder as much as a thousand times than take up a snake-skin

+in his hand.  Well, I was getting to feel that way myself, though I've

+always reckoned that looking at the new moon over your left shoulder is

+one of the carelessest and foolishest things a body can do.  Old Hank

+Bunker done it once, and bragged about it; and in less than two years he

+got drunk and fell off of the shot-tower, and spread himself out so

+that he was just a kind of a layer, as you may say; and they slid him

+edgeways between two barn doors for a coffin, and buried him so, so

+they say, but I didn't see it.  Pap told me.  But anyway it all come of

+looking at the moon that way, like a fool.

+

+Well, the days went along, and the river went down between its banks

+again; and about the first thing we done was to bait one of the big

+hooks with a skinned rabbit and set it and catch a catfish that was

+as big as a man, being six foot two inches long, and weighed over two

+hundred pounds. We couldn't handle him, of course; he would a flung us

+into Illinois.  We just set there and watched him rip and tear around

+till he drownded.  We found a brass button in his stomach and a round

+ball, and lots of rubbage.  We split the ball open with the hatchet,

+and there was a spool in it.  Jim said he'd had it there a long time, to

+coat it over so and make a ball of it.  It was as big a fish as was ever

+catched in the Mississippi, I reckon.  Jim said he hadn't ever seen

+a bigger one.  He would a been worth a good deal over at the village.

+ They peddle out such a fish as that by the pound in the market-house

+there; everybody buys some of him; his meat's as white as snow and makes

+a good fry.

+

+Next morning I said it was getting slow and dull, and I wanted to get a

+stirring up some way.  I said I reckoned I would slip over the river and

+find out what was going on.  Jim liked that notion; but he said I

+must go in the dark and look sharp.  Then he studied it over and said,

+couldn't I put on some of them old things and dress up like a girl?

+ That was a good notion, too.  So we shortened up one of the calico

+gowns, and I turned up my trouser-legs to my knees and got into it.  Jim

+hitched it behind with the hooks, and it was a fair fit.  I put on the

+sun-bonnet and tied it under my chin, and then for a body to look in

+and see my face was like looking down a joint of stove-pipe.  Jim said

+nobody would know me, even in the daytime, hardly.  I practiced around

+all day to get the hang of the things, and by and by I could do pretty

+well in them, only Jim said I didn't walk like a girl; and he said

+I must quit pulling up my gown to get at my britches-pocket.  I took

+notice, and done better.

+

+I started up the Illinois shore in the canoe just after dark.

+

+I started across to the town from a little below the ferry-landing, and

+the drift of the current fetched me in at the bottom of the town.  I

+tied up and started along the bank.  There was a light burning in a

+little shanty that hadn't been lived in for a long time, and I wondered

+who had took up quarters there.  I slipped up and peeped in at the

+window.  There was a woman about forty year old in there knitting by

+a candle that was on a pine table.  I didn't know her face; she was a

+stranger, for you couldn't start a face in that town that I didn't know.

+ Now this was lucky, because I was weakening; I was getting afraid I had

+come; people might know my voice and find me out.  But if this woman had

+been in such a little town two days she could tell me all I wanted to

+know; so I knocked at the door, and made up my mind I wouldn't forget I

+was a girl.

+

+

+

+

+CHAPTER XI.

+

+"COME in," says the woman, and I did.  She says:  "Take a cheer."

+

+I done it.  She looked me all over with her little shiny eyes, and says:

+

+"What might your name be?"

+

+"Sarah Williams."

+

+"Where 'bouts do you live?  In this neighborhood?'

+

+"No'm.  In Hookerville, seven mile below.  I've walked all the way and

+I'm all tired out."

+

+"Hungry, too, I reckon.  I'll find you something."

+

+"No'm, I ain't hungry.  I was so hungry I had to stop two miles below

+here at a farm; so I ain't hungry no more.  It's what makes me so late.

+My mother's down sick, and out of money and everything, and I come to

+tell my uncle Abner Moore.  He lives at the upper end of the town, she

+says.  I hain't ever been here before.  Do you know him?"

+

+"No; but I don't know everybody yet.  I haven't lived here quite two

+weeks. It's a considerable ways to the upper end of the town.  You

+better stay here all night.  Take off your bonnet."

+

+"No," I says; "I'll rest a while, I reckon, and go on.  I ain't afeared

+of the dark."

+

+She said she wouldn't let me go by myself, but her husband would be in

+by and by, maybe in a hour and a half, and she'd send him along with me.

+Then she got to talking about her husband, and about her relations up

+the river, and her relations down the river, and about how much better

+off they used to was, and how they didn't know but they'd made a mistake

+coming to our town, instead of letting well alone—and so on and so on,

+till I was afeard I had made a mistake coming to her to find out what

+was going on in the town; but by and by she dropped on to pap and the

+murder, and then I was pretty willing to let her clatter right along.

+ She told about me and Tom Sawyer finding the six thousand dollars (only

+she got it ten) and all about pap and what a hard lot he was, and what

+a hard lot I was, and at last she got down to where I was murdered.  I

+says:

+

+"Who done it?  We've heard considerable about these goings on down in

+Hookerville, but we don't know who 'twas that killed Huck Finn."

+

+"Well, I reckon there's a right smart chance of people here that'd

+like to know who killed him.  Some think old Finn done it himself."

+

+"No—is that so?"

+

+"Most everybody thought it at first.  He'll never know how nigh he come

+to getting lynched.  But before night they changed around and judged it

+was done by a runaway nigger named Jim."

+

+"Why he—"

+

+I stopped.  I reckoned I better keep still.  She run on, and never

+noticed I had put in at all:

+

+"The nigger run off the very night Huck Finn was killed.  So there's a

+reward out for him—three hundred dollars.  And there's a reward out for

+old Finn, too—two hundred dollars.  You see, he come to town the

+morning after the murder, and told about it, and was out with 'em on the

+ferryboat hunt, and right away after he up and left.  Before night they

+wanted to lynch him, but he was gone, you see.  Well, next day they

+found out the nigger was gone; they found out he hadn't ben seen sence

+ten o'clock the night the murder was done.  So then they put it on him,

+you see; and while they was full of it, next day, back comes old Finn,

+and went boo-hooing to Judge Thatcher to get money to hunt for the

+nigger all over Illinois with. The judge gave him some, and that evening

+he got drunk, and was around till after midnight with a couple of mighty

+hard-looking strangers, and then went off with them.  Well, he hain't

+come back sence, and they ain't looking for him back till this thing

+blows over a little, for people thinks now that he killed his boy and

+fixed things so folks would think robbers done it, and then he'd get

+Huck's money without having to bother a long time with a lawsuit.

+ People do say he warn't any too good to do it.  Oh, he's sly, I reckon.

+ If he don't come back for a year he'll be all right.  You can't prove

+anything on him, you know; everything will be quieted down then, and

+he'll walk in Huck's money as easy as nothing."

+

+"Yes, I reckon so, 'm.  I don't see nothing in the way of it.  Has

+everybody quit thinking the nigger done it?"

+

+"Oh, no, not everybody.  A good many thinks he done it.  But they'll get

+the nigger pretty soon now, and maybe they can scare it out of him."

+

+"Why, are they after him yet?"

+

+"Well, you're innocent, ain't you!  Does three hundred dollars lay

+around every day for people to pick up?  Some folks think the nigger

+ain't far from here.  I'm one of them—but I hain't talked it around.  A

+few days ago I was talking with an old couple that lives next door in

+the log shanty, and they happened to say hardly anybody ever goes to

+that island over yonder that they call Jackson's Island.  Don't anybody

+live there? says I. No, nobody, says they.  I didn't say any more, but

+I done some thinking.  I was pretty near certain I'd seen smoke over

+there, about the head of the island, a day or two before that, so I says

+to myself, like as not that nigger's hiding over there; anyway, says

+I, it's worth the trouble to give the place a hunt.  I hain't seen any

+smoke sence, so I reckon maybe he's gone, if it was him; but husband's

+going over to see—him and another man.  He was gone up the river; but he

+got back to-day, and I told him as soon as he got here two hours ago."

+

+I had got so uneasy I couldn't set still.  I had to do something with my

+hands; so I took up a needle off of the table and went to threading

+it. My hands shook, and I was making a bad job of it.  When the woman

+stopped talking I looked up, and she was looking at me pretty curious

+and smiling a little.  I put down the needle and thread, and let on to

+be interested—and I was, too—and says:

+

+"Three hundred dollars is a power of money.  I wish my mother could get

+it. Is your husband going over there to-night?"

+

+"Oh, yes.  He went up-town with the man I was telling you of, to get a

+boat and see if they could borrow another gun.  They'll go over after

+midnight."

+

+"Couldn't they see better if they was to wait till daytime?"

+

+"Yes.  And couldn't the nigger see better, too?  After midnight he'll

+likely be asleep, and they can slip around through the woods and hunt up

+his camp fire all the better for the dark, if he's got one."

+

+"I didn't think of that."

+

+The woman kept looking at me pretty curious, and I didn't feel a bit

+comfortable.  Pretty soon she says,

+

+"What did you say your name was, honey?"

+

+"M—Mary Williams."

+

+Somehow it didn't seem to me that I said it was Mary before, so I didn't

+look up—seemed to me I said it was Sarah; so I felt sort of cornered,

+and was afeared maybe I was looking it, too.  I wished the woman would

+say something more; the longer she set still the uneasier I was.  But

+now she says:

+

+"Honey, I thought you said it was Sarah when you first come in?"

+

+"Oh, yes'm, I did.  Sarah Mary Williams.  Sarah's my first name.  Some

+calls me Sarah, some calls me Mary."

+

+"Oh, that's the way of it?"

+

+"Yes'm."

+

+I was feeling better then, but I wished I was out of there, anyway.  I

+couldn't look up yet.

+

+Well, the woman fell to talking about how hard times was, and how poor

+they had to live, and how the rats was as free as if they owned the

+place, and so forth and so on, and then I got easy again.  She was right

+about the rats. You'd see one stick his nose out of a hole in the corner

+every little while.  She said she had to have things handy to throw at

+them when she was alone, or they wouldn't give her no peace.  She showed

+me a bar of lead twisted up into a knot, and said she was a good shot

+with it generly, but she'd wrenched her arm a day or two ago, and didn't

+know whether she could throw true now.  But she watched for a chance,

+and directly banged away at a rat; but she missed him wide, and said

+"Ouch!" it hurt her arm so.  Then she told me to try for the next one.

+ I wanted to be getting away before the old man got back, but of course

+I didn't let on.  I got the thing, and the first rat that showed his

+nose I let drive, and if he'd a stayed where he was he'd a been a

+tolerable sick rat.  She said that was first-rate, and she reckoned I

+would hive the next one.  She went and got the lump of lead and fetched

+it back, and brought along a hank of yarn which she wanted me to help

+her with.  I held up my two hands and she put the hank over them, and

+went on talking about her and her husband's matters.  But she broke off

+to say:

+

+"Keep your eye on the rats.  You better have the lead in your lap,

+handy."

+

+So she dropped the lump into my lap just at that moment, and I clapped

+my legs together on it and she went on talking.  But only about a

+minute. Then she took off the hank and looked me straight in the face,

+and very pleasant, and says:

+

+"Come, now, what's your real name?"

+

+"Wh—what, mum?"

+

+"What's your real name?  Is it Bill, or Tom, or Bob?—or what is it?"

+

+I reckon I shook like a leaf, and I didn't know hardly what to do.  But

+I says:

+

+"Please to don't poke fun at a poor girl like me, mum.  If I'm in the

+way here, I'll—"

+

+"No, you won't.  Set down and stay where you are.  I ain't going to hurt

+you, and I ain't going to tell on you, nuther.  You just tell me your

+secret, and trust me.  I'll keep it; and, what's more, I'll help

+you. So'll my old man if you want him to.  You see, you're a runaway

+'prentice, that's all.  It ain't anything.  There ain't no harm in it.

+You've been treated bad, and you made up your mind to cut.  Bless you,

+child, I wouldn't tell on you.  Tell me all about it now, that's a good

+boy."

+

+So I said it wouldn't be no use to try to play it any longer, and I

+would just make a clean breast and tell her everything, but she musn't

+go back on her promise.  Then I told her my father and mother was dead,

+and the law had bound me out to a mean old farmer in the country thirty

+mile back from the river, and he treated me so bad I couldn't stand it

+no longer; he went away to be gone a couple of days, and so I took my

+chance and stole some of his daughter's old clothes and cleared out, and

+I had been three nights coming the thirty miles.  I traveled nights,

+and hid daytimes and slept, and the bag of bread and meat I carried from

+home lasted me all the way, and I had a-plenty.  I said I believed my

+uncle Abner Moore would take care of me, and so that was why I struck

+out for this town of Goshen.

+

+"Goshen, child?  This ain't Goshen.  This is St. Petersburg.  Goshen's

+ten mile further up the river.  Who told you this was Goshen?"

+

+"Why, a man I met at daybreak this morning, just as I was going to turn

+into the woods for my regular sleep.  He told me when the roads forked I

+must take the right hand, and five mile would fetch me to Goshen."

+

+"He was drunk, I reckon.  He told you just exactly wrong."

+

+"Well, he did act like he was drunk, but it ain't no matter now.  I got

+to be moving along.  I'll fetch Goshen before daylight."

+

+"Hold on a minute.  I'll put you up a snack to eat.  You might want it."

+

+So she put me up a snack, and says:

+

+"Say, when a cow's laying down, which end of her gets up first?  Answer

+up prompt now—don't stop to study over it.  Which end gets up first?"

+

+"The hind end, mum."

+

+"Well, then, a horse?"

+

+"The for'rard end, mum."

+

+"Which side of a tree does the moss grow on?"

+

+"North side."

+

+"If fifteen cows is browsing on a hillside, how many of them eats with

+their heads pointed the same direction?"

+

+"The whole fifteen, mum."

+

+"Well, I reckon you have lived in the country.  I thought maybe you

+was trying to hocus me again.  What's your real name, now?"

+

+"George Peters, mum."

+

+"Well, try to remember it, George.  Don't forget and tell me it's

+Elexander before you go, and then get out by saying it's George

+Elexander when I catch you.  And don't go about women in that old

+calico.  You do a girl tolerable poor, but you might fool men, maybe.

+ Bless you, child, when you set out to thread a needle don't hold the

+thread still and fetch the needle up to it; hold the needle still and

+poke the thread at it; that's the way a woman most always does, but a

+man always does t'other way.  And when you throw at a rat or anything,

+hitch yourself up a tiptoe and fetch your hand up over your head as

+awkward as you can, and miss your rat about six or seven foot. Throw

+stiff-armed from the shoulder, like there was a pivot there for it to

+turn on, like a girl; not from the wrist and elbow, with your arm out

+to one side, like a boy.  And, mind you, when a girl tries to catch

+anything in her lap she throws her knees apart; she don't clap them

+together, the way you did when you catched the lump of lead.  Why, I

+spotted you for a boy when you was threading the needle; and I contrived

+the other things just to make certain.  Now trot along to your uncle,

+Sarah Mary Williams George Elexander Peters, and if you get into trouble

+you send word to Mrs. Judith Loftus, which is me, and I'll do what I can

+to get you out of it.  Keep the river road all the way, and next time

+you tramp take shoes and socks with you. The river road's a rocky one,

+and your feet'll be in a condition when you get to Goshen, I reckon."

+

+I went up the bank about fifty yards, and then I doubled on my tracks

+and slipped back to where my canoe was, a good piece below the house.  I

+jumped in, and was off in a hurry.  I went up-stream far enough to

+make the head of the island, and then started across.  I took off the

+sun-bonnet, for I didn't want no blinders on then.  When I was about the

+middle I heard the clock begin to strike, so I stops and listens; the

+sound come faint over the water but clear—eleven.  When I struck the

+head of the island I never waited to blow, though I was most winded, but

+I shoved right into the timber where my old camp used to be, and started

+a good fire there on a high and dry spot.

+

+Then I jumped in the canoe and dug out for our place, a mile and a half

+below, as hard as I could go.  I landed, and slopped through the timber

+and up the ridge and into the cavern.  There Jim laid, sound asleep on

+the ground.  I roused him out and says:

+

+"Git up and hump yourself, Jim!  There ain't a minute to lose.  They're

+after us!"

+

+Jim never asked no questions, he never said a word; but the way he

+worked for the next half an hour showed about how he was scared.  By

+that time everything we had in the world was on our raft, and she was

+ready to be shoved out from the willow cove where she was hid.  We

+put out the camp fire at the cavern the first thing, and didn't show a

+candle outside after that.

+

+I took the canoe out from the shore a little piece, and took a look;

+but if there was a boat around I couldn't see it, for stars and shadows

+ain't good to see by.  Then we got out the raft and slipped along down

+in the shade, past the foot of the island dead still—never saying a

+word.

+

+

+

+

+CHAPTER XII.

+

+IT must a been close on to one o'clock when we got below the island at

+last, and the raft did seem to go mighty slow.  If a boat was to come

+along we was going to take to the canoe and break for the Illinois

+shore; and it was well a boat didn't come, for we hadn't ever thought to

+put the gun in the canoe, or a fishing-line, or anything to eat.  We

+was in ruther too much of a sweat to think of so many things.  It warn't

+good judgment to put everything on the raft.

+

+If the men went to the island I just expect they found the camp fire I

+built, and watched it all night for Jim to come.  Anyways, they stayed

+away from us, and if my building the fire never fooled them it warn't no

+fault of mine.  I played it as low down on them as I could.

+

+When the first streak of day began to show we tied up to a towhead in a

+big bend on the Illinois side, and hacked off cottonwood branches with

+the hatchet, and covered up the raft with them so she looked like there

+had been a cave-in in the bank there.  A tow-head is a sandbar that has

+cottonwoods on it as thick as harrow-teeth.

+

+We had mountains on the Missouri shore and heavy timber on the Illinois

+side, and the channel was down the Missouri shore at that place, so we

+warn't afraid of anybody running across us.  We laid there all day,

+and watched the rafts and steamboats spin down the Missouri shore, and

+up-bound steamboats fight the big river in the middle.  I told Jim all

+about the time I had jabbering with that woman; and Jim said she was

+a smart one, and if she was to start after us herself she wouldn't set

+down and watch a camp fire—no, sir, she'd fetch a dog.  Well, then, I

+said, why couldn't she tell her husband to fetch a dog?  Jim said he

+bet she did think of it by the time the men was ready to start, and he

+believed they must a gone up-town to get a dog and so they lost all that

+time, or else we wouldn't be here on a towhead sixteen or seventeen mile

+below the village—no, indeedy, we would be in that same old town again.

+ So I said I didn't care what was the reason they didn't get us as long

+as they didn't.

+

+When it was beginning to come on dark we poked our heads out of the

+cottonwood thicket, and looked up and down and across; nothing in sight;

+so Jim took up some of the top planks of the raft and built a snug

+wigwam to get under in blazing weather and rainy, and to keep the things

+dry. Jim made a floor for the wigwam, and raised it a foot or more above

+the level of the raft, so now the blankets and all the traps was out of

+reach of steamboat waves.  Right in the middle of the wigwam we made a

+layer of dirt about five or six inches deep with a frame around it for

+to hold it to its place; this was to build a fire on in sloppy weather

+or chilly; the wigwam would keep it from being seen.  We made an extra

+steering-oar, too, because one of the others might get broke on a snag

+or something. We fixed up a short forked stick to hang the old lantern

+on, because we must always light the lantern whenever we see a steamboat

+coming down-stream, to keep from getting run over; but we wouldn't have

+to light it for up-stream boats unless we see we was in what they call

+a "crossing"; for the river was pretty high yet, very low banks being

+still a little under water; so up-bound boats didn't always run the

+channel, but hunted easy water.

+

+This second night we run between seven and eight hours, with a current

+that was making over four mile an hour.  We catched fish and talked,

+and we took a swim now and then to keep off sleepiness.  It was kind of

+solemn, drifting down the big, still river, laying on our backs looking

+up at the stars, and we didn't ever feel like talking loud, and it

+warn't often that we laughed—only a little kind of a low chuckle.  We

+had mighty good weather as a general thing, and nothing ever happened to

+us at all—that night, nor the next, nor the next.

+

+Every night we passed towns, some of them away up on black hillsides,

+nothing but just a shiny bed of lights; not a house could you see.  The

+fifth night we passed St. Louis, and it was like the whole world lit up.

+In St. Petersburg they used to say there was twenty or thirty thousand

+people in St. Louis, but I never believed it till I see that wonderful

+spread of lights at two o'clock that still night.  There warn't a sound

+there; everybody was asleep.

+

+Every night now I used to slip ashore towards ten o'clock at some little

+village, and buy ten or fifteen cents' worth of meal or bacon or other

+stuff to eat; and sometimes I lifted a chicken that warn't roosting

+comfortable, and took him along.  Pap always said, take a chicken when

+you get a chance, because if you don't want him yourself you can easy

+find somebody that does, and a good deed ain't ever forgot.  I never see

+pap when he didn't want the chicken himself, but that is what he used to

+say, anyway.

+

+Mornings before daylight I slipped into cornfields and borrowed a

+watermelon, or a mushmelon, or a punkin, or some new corn, or things of

+that kind.  Pap always said it warn't no harm to borrow things if you

+was meaning to pay them back some time; but the widow said it warn't

+anything but a soft name for stealing, and no decent body would do it.

+ Jim said he reckoned the widow was partly right and pap was partly

+right; so the best way would be for us to pick out two or three things

+from the list and say we wouldn't borrow them any more—then he reckoned

+it wouldn't be no harm to borrow the others.  So we talked it over all

+one night, drifting along down the river, trying to make up our minds

+whether to drop the watermelons, or the cantelopes, or the mushmelons,

+or what.  But towards daylight we got it all settled satisfactory, and

+concluded to drop crabapples and p'simmons.  We warn't feeling just

+right before that, but it was all comfortable now.  I was glad the way

+it come out, too, because crabapples ain't ever good, and the p'simmons

+wouldn't be ripe for two or three months yet.

+

+We shot a water-fowl now and then that got up too early in the morning

+or didn't go to bed early enough in the evening.  Take it all round, we

+lived pretty high.

+

+The fifth night below St. Louis we had a big storm after midnight, with

+a power of thunder and lightning, and the rain poured down in a solid

+sheet. We stayed in the wigwam and let the raft take care of itself.

+When the lightning glared out we could see a big straight river ahead,

+and high, rocky bluffs on both sides.  By and by says I, "Hel-lo, Jim,

+looky yonder!" It was a steamboat that had killed herself on a rock.

+ We was drifting straight down for her.  The lightning showed her very

+distinct.  She was leaning over, with part of her upper deck above

+water, and you could see every little chimbly-guy clean and clear, and a

+chair by the big bell, with an old slouch hat hanging on the back of it,

+when the flashes come.

+

+Well, it being away in the night and stormy, and all so mysterious-like,

+I felt just the way any other boy would a felt when I see that wreck

+laying there so mournful and lonesome in the middle of the river.  I

+wanted to get aboard of her and slink around a little, and see what

+there was there.  So I says:

+

+"Le's land on her, Jim."

+

+But Jim was dead against it at first.  He says:

+

+"I doan' want to go fool'n 'long er no wrack.  We's doin' blame' well,

+en we better let blame' well alone, as de good book says.  Like as not

+dey's a watchman on dat wrack."

+

+"Watchman your grandmother," I says; "there ain't nothing to watch but

+the texas and the pilot-house; and do you reckon anybody's going to resk

+his life for a texas and a pilot-house such a night as this, when

+it's likely to break up and wash off down the river any minute?"  Jim

+couldn't say nothing to that, so he didn't try.  "And besides," I says,

+"we might borrow something worth having out of the captain's stateroom.

+ Seegars, I bet you—and cost five cents apiece, solid cash.  Steamboat

+captains is always rich, and get sixty dollars a month, and they don't

+care a cent what a thing costs, you know, long as they want it.  Stick a

+candle in your pocket; I can't rest, Jim, till we give her a rummaging.

+ Do you reckon Tom Sawyer would ever go by this thing?  Not for pie, he

+wouldn't. He'd call it an adventure—that's what he'd call it; and he'd

+land on that wreck if it was his last act.  And wouldn't he throw style

+into it?—wouldn't he spread himself, nor nothing?  Why, you'd think it

+was Christopher C'lumbus discovering Kingdom-Come.  I wish Tom Sawyer

+was here."

+

+Jim he grumbled a little, but give in.  He said we mustn't talk any more

+than we could help, and then talk mighty low.  The lightning showed us

+the wreck again just in time, and we fetched the stabboard derrick, and

+made fast there.

+

+The deck was high out here.  We went sneaking down the slope of it to

+labboard, in the dark, towards the texas, feeling our way slow with our

+feet, and spreading our hands out to fend off the guys, for it was so

+dark we couldn't see no sign of them.  Pretty soon we struck the forward

+end of the skylight, and clumb on to it; and the next step fetched us in

+front of the captain's door, which was open, and by Jimminy, away down

+through the texas-hall we see a light! and all in the same second we

+seem to hear low voices in yonder!

+

+Jim whispered and said he was feeling powerful sick, and told me to come

+along.  I says, all right, and was going to start for the raft; but just

+then I heard a voice wail out and say:

+

+"Oh, please don't, boys; I swear I won't ever tell!"

+

+Another voice said, pretty loud:

+

+"It's a lie, Jim Turner.  You've acted this way before.  You always want

+more'n your share of the truck, and you've always got it, too, because

+you've swore 't if you didn't you'd tell.  But this time you've said

+it jest one time too many.  You're the meanest, treacherousest hound in

+this country."

+

+By this time Jim was gone for the raft.  I was just a-biling with

+curiosity; and I says to myself, Tom Sawyer wouldn't back out now,

+and so I won't either; I'm a-going to see what's going on here.  So I

+dropped on my hands and knees in the little passage, and crept aft

+in the dark till there warn't but one stateroom betwixt me and the

+cross-hall of the texas.  Then in there I see a man stretched on the

+floor and tied hand and foot, and two men standing over him, and one

+of them had a dim lantern in his hand, and the other one had a pistol.

+ This one kept pointing the pistol at the man's head on the floor, and

+saying:

+

+"I'd like to!  And I orter, too—a mean skunk!"

+

+The man on the floor would shrivel up and say, "Oh, please don't, Bill;

+I hain't ever goin' to tell."

+

+And every time he said that the man with the lantern would laugh and

+say:

+

+"'Deed you ain't!  You never said no truer thing 'n that, you bet

+you." And once he said:  "Hear him beg! and yit if we hadn't got the

+best of him and tied him he'd a killed us both.  And what for?  Jist

+for noth'n. Jist because we stood on our rights—that's what for.  But

+I lay you ain't a-goin' to threaten nobody any more, Jim Turner.  Put

+up that pistol, Bill."

+

+Bill says:

+

+"I don't want to, Jake Packard.  I'm for killin' him—and didn't he kill

+old Hatfield jist the same way—and don't he deserve it?"

+

+"But I don't want him killed, and I've got my reasons for it."

+

+"Bless yo' heart for them words, Jake Packard!  I'll never forgit you

+long's I live!" says the man on the floor, sort of blubbering.

+

+Packard didn't take no notice of that, but hung up his lantern on a nail

+and started towards where I was there in the dark, and motioned Bill

+to come.  I crawfished as fast as I could about two yards, but the boat

+slanted so that I couldn't make very good time; so to keep from getting

+run over and catched I crawled into a stateroom on the upper side.

+ The man came a-pawing along in the dark, and when Packard got to my

+stateroom, he says:

+

+"Here—come in here."

+

+And in he come, and Bill after him.  But before they got in I was up

+in the upper berth, cornered, and sorry I come.  Then they stood there,

+with their hands on the ledge of the berth, and talked.  I couldn't see

+them, but I could tell where they was by the whisky they'd been having.

+ I was glad I didn't drink whisky; but it wouldn't made much difference

+anyway, because most of the time they couldn't a treed me because I

+didn't breathe.  I was too scared.  And, besides, a body couldn't

+breathe and hear such talk.  They talked low and earnest.  Bill wanted

+to kill Turner.  He says:

+

+"He's said he'll tell, and he will.  If we was to give both our shares

+to him now it wouldn't make no difference after the row and the way

+we've served him.  Shore's you're born, he'll turn State's evidence; now

+you hear me.  I'm for putting him out of his troubles."

+

+"So'm I," says Packard, very quiet.

+

+"Blame it, I'd sorter begun to think you wasn't.  Well, then, that's all

+right.  Le's go and do it."

+

+"Hold on a minute; I hain't had my say yit.  You listen to me.

+Shooting's good, but there's quieter ways if the thing's got to be

+done. But what I say is this:  it ain't good sense to go court'n around

+after a halter if you can git at what you're up to in some way that's

+jist as good and at the same time don't bring you into no resks.  Ain't

+that so?"

+

+"You bet it is.  But how you goin' to manage it this time?"

+

+"Well, my idea is this:  we'll rustle around and gather up whatever

+pickins we've overlooked in the staterooms, and shove for shore and hide

+the truck. Then we'll wait.  Now I say it ain't a-goin' to be more'n two

+hours befo' this wrack breaks up and washes off down the river.  See?

+He'll be drownded, and won't have nobody to blame for it but his own

+self.  I reckon that's a considerble sight better 'n killin' of him.

+ I'm unfavorable to killin' a man as long as you can git aroun' it; it

+ain't good sense, it ain't good morals.  Ain't I right?"

+

+"Yes, I reck'n you are.  But s'pose she don't break up and wash off?"

+

+"Well, we can wait the two hours anyway and see, can't we?"

+

+"All right, then; come along."

+

+So they started, and I lit out, all in a cold sweat, and scrambled

+forward. It was dark as pitch there; but I said, in a kind of a coarse

+whisper, "Jim!" and he answered up, right at my elbow, with a sort of a

+moan, and I says:

+

+"Quick, Jim, it ain't no time for fooling around and moaning; there's a

+gang of murderers in yonder, and if we don't hunt up their boat and set

+her drifting down the river so these fellows can't get away from the

+wreck there's one of 'em going to be in a bad fix.  But if we find their

+boat we can put all of 'em in a bad fix—for the sheriff 'll get 'em.

+Quick—hurry!  I'll hunt the labboard side, you hunt the stabboard. You

+start at the raft, and—"

+

+"Oh, my lordy, lordy!  raf'?  Dey ain' no raf' no mo'; she done broke

+loose en gone I—en here we is!"

+

+

+

+

+CHAPTER XIII.

+

+WELL, I catched my breath and most fainted.  Shut up on a wreck with

+such a gang as that!  But it warn't no time to be sentimentering.  We'd

+got to find that boat now—had to have it for ourselves.  So we went

+a-quaking and shaking down the stabboard side, and slow work it was,

+too—seemed a week before we got to the stern.  No sign of a boat.  Jim

+said he didn't believe he could go any further—so scared he hadn't

+hardly any strength left, he said.  But I said, come on, if we get left

+on this wreck we are in a fix, sure.  So on we prowled again.  We struck

+for the stern of the texas, and found it, and then scrabbled along

+forwards on the skylight, hanging on from shutter to shutter, for the

+edge of the skylight was in the water.  When we got pretty close to the

+cross-hall door there was the skiff, sure enough!  I could just barely

+see her.  I felt ever so thankful.  In another second I would a been

+aboard of her, but just then the door opened.  One of the men stuck his

+head out only about a couple of foot from me, and I thought I was gone;

+but he jerked it in again, and says:

+

+"Heave that blame lantern out o' sight, Bill!"

+

+He flung a bag of something into the boat, and then got in himself and

+set down.  It was Packard.  Then Bill he come out and got in.  Packard

+says, in a low voice:

+

+"All ready—shove off!"

+

+I couldn't hardly hang on to the shutters, I was so weak.  But Bill

+says:

+

+"Hold on—'d you go through him?"

+

+"No.  Didn't you?"

+

+"No.  So he's got his share o' the cash yet."

+

+"Well, then, come along; no use to take truck and leave money."

+

+"Say, won't he suspicion what we're up to?"

+

+"Maybe he won't.  But we got to have it anyway. Come along."

+

+So they got out and went in.

+

+The door slammed to because it was on the careened side; and in a half

+second I was in the boat, and Jim come tumbling after me.  I out with my

+knife and cut the rope, and away we went!

+

+We didn't touch an oar, and we didn't speak nor whisper, nor hardly even

+breathe.  We went gliding swift along, dead silent, past the tip of the

+paddle-box, and past the stern; then in a second or two more we was a

+hundred yards below the wreck, and the darkness soaked her up, every

+last sign of her, and we was safe, and knowed it.

+

+When we was three or four hundred yards down-stream we see the lantern

+show like a little spark at the texas door for a second, and we knowed

+by that that the rascals had missed their boat, and was beginning to

+understand that they was in just as much trouble now as Jim Turner was.

+

+Then Jim manned the oars, and we took out after our raft.  Now was the

+first time that I begun to worry about the men—I reckon I hadn't

+had time to before.  I begun to think how dreadful it was, even for

+murderers, to be in such a fix.  I says to myself, there ain't no

+telling but I might come to be a murderer myself yet, and then how would

+I like it?  So says I to Jim:

+

+"The first light we see we'll land a hundred yards below it or above

+it, in a place where it's a good hiding-place for you and the skiff, and

+then I'll go and fix up some kind of a yarn, and get somebody to go for

+that gang and get them out of their scrape, so they can be hung when

+their time comes."

+

+But that idea was a failure; for pretty soon it begun to storm again,

+and this time worse than ever.  The rain poured down, and never a light

+showed; everybody in bed, I reckon.  We boomed along down the river,

+watching for lights and watching for our raft.  After a long time the

+rain let up, but the clouds stayed, and the lightning kept whimpering,

+and by and by a flash showed us a black thing ahead, floating, and we

+made for it.

+

+It was the raft, and mighty glad was we to get aboard of it again.  We

+seen a light now away down to the right, on shore.  So I said I would

+go for it. The skiff was half full of plunder which that gang had stole

+there on the wreck.  We hustled it on to the raft in a pile, and I told

+Jim to float along down, and show a light when he judged he had gone

+about two mile, and keep it burning till I come; then I manned my oars

+and shoved for the light.  As I got down towards it three or four more

+showed—up on a hillside.  It was a village.  I closed in above the shore

+light, and laid on my oars and floated.  As I went by I see it was a

+lantern hanging on the jackstaff of a double-hull ferryboat.  I skimmed

+around for the watchman, a-wondering whereabouts he slept; and by and

+by I found him roosting on the bitts forward, with his head down between

+his knees.  I gave his shoulder two or three little shoves, and begun to

+cry.

+

+He stirred up in a kind of a startlish way; but when he see it was only

+me he took a good gap and stretch, and then he says:

+

+"Hello, what's up?  Don't cry, bub.  What's the trouble?"

+

+I says:

+

+"Pap, and mam, and sis, and—"

+

+Then I broke down.  He says:

+

+"Oh, dang it now, don't take on so; we all has to have our troubles,

+and this 'n 'll come out all right.  What's the matter with 'em?"

+

+"They're—they're—are you the watchman of the boat?"

+

+"Yes," he says, kind of pretty-well-satisfied like.  "I'm the captain

+and the owner and the mate and the pilot and watchman and head

+deck-hand; and sometimes I'm the freight and passengers.  I ain't as

+rich as old Jim Hornback, and I can't be so blame' generous and good

+to Tom, Dick, and Harry as what he is, and slam around money the way he

+does; but I've told him a many a time 't I wouldn't trade places with

+him; for, says I, a sailor's life's the life for me, and I'm derned if

+I'd live two mile out o' town, where there ain't nothing ever goin'

+on, not for all his spondulicks and as much more on top of it.  Says I—"

+

+I broke in and says:

+

+"They're in an awful peck of trouble, and—"

+

+"Who is?"

+

+"Why, pap and mam and sis and Miss Hooker; and if you'd take your

+ferryboat and go up there—"

+

+"Up where?  Where are they?"

+

+"On the wreck."

+

+"What wreck?"

+

+"Why, there ain't but one."

+

+"What, you don't mean the Walter Scott?"

+

+"Yes."

+

+"Good land! what are they doin' there, for gracious sakes?"

+

+"Well, they didn't go there a-purpose."

+

+"I bet they didn't!  Why, great goodness, there ain't no chance for 'em

+if they don't git off mighty quick!  Why, how in the nation did they

+ever git into such a scrape?"

+

+"Easy enough.  Miss Hooker was a-visiting up there to the town—"

+

+"Yes, Booth's Landing—go on."

+

+"She was a-visiting there at Booth's Landing, and just in the edge of

+the evening she started over with her nigger woman in the horse-ferry

+to stay all night at her friend's house, Miss What-you-may-call-her I

+disremember her name—and they lost their steering-oar, and swung

+around and went a-floating down, stern first, about two mile, and

+saddle-baggsed on the wreck, and the ferryman and the nigger woman and

+the horses was all lost, but Miss Hooker she made a grab and got aboard

+the wreck.  Well, about an hour after dark we come along down in our

+trading-scow, and it was so dark we didn't notice the wreck till we was

+right on it; and so we saddle-baggsed; but all of us was saved but

+Bill Whipple—and oh, he was the best cretur!—I most wish 't it had

+been me, I do."

+

+"My George!  It's the beatenest thing I ever struck.  And then what

+did you all do?"

+

+"Well, we hollered and took on, but it's so wide there we couldn't

+make nobody hear.  So pap said somebody got to get ashore and get help

+somehow. I was the only one that could swim, so I made a dash for it,

+and Miss Hooker she said if I didn't strike help sooner, come here and

+hunt up her uncle, and he'd fix the thing.  I made the land about a mile

+below, and been fooling along ever since, trying to get people to do

+something, but they said, 'What, in such a night and such a current?

+There ain't no sense in it; go for the steam ferry.'  Now if you'll go

+and—"

+

+"By Jackson, I'd like to, and, blame it, I don't know but I will; but

+who in the dingnation's a-going' to pay for it?  Do you reckon your

+pap—"

+

+"Why that's all right.  Miss Hooker she tole me, particular, that

+her uncle Hornback—"

+

+"Great guns! is he her uncle?  Looky here, you break for that light

+over yonder-way, and turn out west when you git there, and about a

+quarter of a mile out you'll come to the tavern; tell 'em to dart you

+out to Jim Hornback's, and he'll foot the bill.  And don't you fool

+around any, because he'll want to know the news.  Tell him I'll have

+his niece all safe before he can get to town.  Hump yourself, now; I'm

+a-going up around the corner here to roust out my engineer."

+

+I struck for the light, but as soon as he turned the corner I went back

+and got into my skiff and bailed her out, and then pulled up shore in

+the easy water about six hundred yards, and tucked myself in among

+some woodboats; for I couldn't rest easy till I could see the ferryboat

+start. But take it all around, I was feeling ruther comfortable on

+accounts of taking all this trouble for that gang, for not many would

+a done it.  I wished the widow knowed about it.  I judged she would be

+proud of me for helping these rapscallions, because rapscallions and

+dead beats is the kind the widow and good people takes the most interest

+in.

+

+Well, before long here comes the wreck, dim and dusky, sliding along

+down! A kind of cold shiver went through me, and then I struck out for

+her.  She was very deep, and I see in a minute there warn't much chance

+for anybody being alive in her.  I pulled all around her and hollered

+a little, but there wasn't any answer; all dead still.  I felt a little

+bit heavy-hearted about the gang, but not much, for I reckoned if they

+could stand it I could.

+

+Then here comes the ferryboat; so I shoved for the middle of the river

+on a long down-stream slant; and when I judged I was out of eye-reach

+I laid on my oars, and looked back and see her go and smell around the

+wreck for Miss Hooker's remainders, because the captain would know her

+uncle Hornback would want them; and then pretty soon the ferryboat give

+it up and went for the shore, and I laid into my work and went a-booming

+down the river.

+

+It did seem a powerful long time before Jim's light showed up; and when

+it did show it looked like it was a thousand mile off.  By the time I

+got there the sky was beginning to get a little gray in the east; so we

+struck for an island, and hid the raft, and sunk the skiff, and turned

+in and slept like dead people.

+

+

+

+

+CHAPTER XIV.

+

+BY and by, when we got up, we turned over the truck the gang had stole

+off of the wreck, and found boots, and blankets, and clothes, and all

+sorts of other things, and a lot of books, and a spyglass, and three

+boxes of seegars.  We hadn't ever been this rich before in neither of

+our lives.  The seegars was prime.  We laid off all the afternoon in the

+woods talking, and me reading the books, and having a general good

+time. I told Jim all about what happened inside the wreck and at the

+ferryboat, and I said these kinds of things was adventures; but he said

+he didn't want no more adventures.  He said that when I went in the

+texas and he crawled back to get on the raft and found her gone he

+nearly died, because he judged it was all up with him anyway it could

+be fixed; for if he didn't get saved he would get drownded; and if he

+did get saved, whoever saved him would send him back home so as to get

+the reward, and then Miss Watson would sell him South, sure.  Well, he

+was right; he was most always right; he had an uncommon level head for a

+nigger.

+

+I read considerable to Jim about kings and dukes and earls and such, and

+how gaudy they dressed, and how much style they put on, and called each

+other your majesty, and your grace, and your lordship, and so on, 'stead

+of mister; and Jim's eyes bugged out, and he was interested.  He says:

+

+"I didn' know dey was so many un um.  I hain't hearn 'bout none un um,

+skasely, but ole King Sollermun, onless you counts dem kings dat's in a

+pack er k'yards.  How much do a king git?"

+

+"Get?"  I says; "why, they get a thousand dollars a month if they want

+it; they can have just as much as they want; everything belongs to

+them."

+

+"Ain' dat gay?  En what dey got to do, Huck?"

+

+"They don't do nothing!  Why, how you talk! They just set around."

+

+"No; is dat so?"

+

+"Of course it is.  They just set around—except, maybe, when there's a

+war; then they go to the war.  But other times they just lazy around; or

+go hawking—just hawking and sp—Sh!—d' you hear a noise?"

+

+We skipped out and looked; but it warn't nothing but the flutter of a

+steamboat's wheel away down, coming around the point; so we come back.

+

+"Yes," says I, "and other times, when things is dull, they fuss with the

+parlyment; and if everybody don't go just so he whacks their heads off.

+But mostly they hang round the harem."

+

+"Roun' de which?"

+

+"Harem."

+

+"What's de harem?"

+

+"The place where he keeps his wives.  Don't you know about the harem?

+Solomon had one; he had about a million wives."

+

+"Why, yes, dat's so; I—I'd done forgot it.  A harem's a bo'd'n-house, I

+reck'n.  Mos' likely dey has rackety times in de nussery.  En I reck'n

+de wives quarrels considable; en dat 'crease de racket.  Yit dey say

+Sollermun de wises' man dat ever live'.  I doan' take no stock in

+dat. Bekase why: would a wise man want to live in de mids' er sich a

+blim-blammin' all de time?  No—'deed he wouldn't.  A wise man 'ud take

+en buil' a biler-factry; en den he could shet down de biler-factry

+when he want to res'."

+

+"Well, but he was the wisest man, anyway; because the widow she told

+me so, her own self."

+

+"I doan k'yer what de widder say, he warn't no wise man nuther.  He

+had some er de dad-fetchedes' ways I ever see.  Does you know 'bout dat

+chile dat he 'uz gwyne to chop in two?"

+

+"Yes, the widow told me all about it."

+

+"Well, den!  Warn' dat de beatenes' notion in de worl'?  You jes'

+take en look at it a minute.  Dah's de stump, dah—dat's one er de women;

+heah's you—dat's de yuther one; I's Sollermun; en dish yer dollar bill's

+de chile.  Bofe un you claims it.  What does I do?  Does I shin aroun'

+mongs' de neighbors en fine out which un you de bill do b'long to, en

+han' it over to de right one, all safe en soun', de way dat anybody dat

+had any gumption would?  No; I take en whack de bill in two, en give

+half un it to you, en de yuther half to de yuther woman.  Dat's de way

+Sollermun was gwyne to do wid de chile.  Now I want to ast you:  what's

+de use er dat half a bill?—can't buy noth'n wid it.  En what use is a

+half a chile?  I wouldn' give a dern for a million un um."

+

+"But hang it, Jim, you've clean missed the point—blame it, you've missed

+it a thousand mile."

+

+"Who?  Me?  Go 'long.  Doan' talk to me 'bout yo' pints.  I reck'n I

+knows sense when I sees it; en dey ain' no sense in sich doin's as

+dat. De 'spute warn't 'bout a half a chile, de 'spute was 'bout a whole

+chile; en de man dat think he kin settle a 'spute 'bout a whole chile

+wid a half a chile doan' know enough to come in out'n de rain.  Doan'

+talk to me 'bout Sollermun, Huck, I knows him by de back."

+

+"But I tell you you don't get the point."

+

+"Blame de point!  I reck'n I knows what I knows.  En mine you, de real

+pint is down furder—it's down deeper.  It lays in de way Sollermun was

+raised.  You take a man dat's got on'y one or two chillen; is dat man

+gwyne to be waseful o' chillen?  No, he ain't; he can't 'ford it.  He

+know how to value 'em.  But you take a man dat's got 'bout five million

+chillen runnin' roun' de house, en it's diffunt.  He as soon chop a

+chile in two as a cat. Dey's plenty mo'.  A chile er two, mo' er less,

+warn't no consekens to Sollermun, dad fatch him!"

+

+I never see such a nigger.  If he got a notion in his head once, there

+warn't no getting it out again.  He was the most down on Solomon of

+any nigger I ever see.  So I went to talking about other kings, and let

+Solomon slide.  I told about Louis Sixteenth that got his head cut off

+in France long time ago; and about his little boy the dolphin, that

+would a been a king, but they took and shut him up in jail, and some say

+he died there.

+

+"Po' little chap."

+

+"But some says he got out and got away, and come to America."

+

+"Dat's good!  But he'll be pooty lonesome—dey ain' no kings here, is

+dey, Huck?"

+

+"No."

+

+"Den he cain't git no situation.  What he gwyne to do?"

+

+"Well, I don't know.  Some of them gets on the police, and some of them

+learns people how to talk French."

+

+"Why, Huck, doan' de French people talk de same way we does?"

+

+"No, Jim; you couldn't understand a word they said—not a single word."

+

+"Well, now, I be ding-busted!  How do dat come?"

+

+"I don't know; but it's so.  I got some of their jabber out of a book.

+S'pose a man was to come to you and say Polly-voo-franzy—what would you

+think?"

+

+"I wouldn' think nuff'n; I'd take en bust him over de head—dat is, if he

+warn't white.  I wouldn't 'low no nigger to call me dat."

+

+"Shucks, it ain't calling you anything.  It's only saying, do you know

+how to talk French?"

+

+"Well, den, why couldn't he say it?"

+

+"Why, he is a-saying it.  That's a Frenchman's way of saying it."

+

+"Well, it's a blame ridicklous way, en I doan' want to hear no mo' 'bout

+it.  Dey ain' no sense in it."

+

+"Looky here, Jim; does a cat talk like we do?"

+

+"No, a cat don't."

+

+"Well, does a cow?"

+

+"No, a cow don't, nuther."

+

+"Does a cat talk like a cow, or a cow talk like a cat?"

+

+"No, dey don't."

+

+"It's natural and right for 'em to talk different from each other, ain't

+it?"

+

+"Course."

+

+"And ain't it natural and right for a cat and a cow to talk different

+from us?"

+

+"Why, mos' sholy it is."

+

+"Well, then, why ain't it natural and right for a Frenchman to talk

+different from us?  You answer me that."

+

+"Is a cat a man, Huck?"

+

+"No."

+

+"Well, den, dey ain't no sense in a cat talkin' like a man.  Is a cow a

+man?—er is a cow a cat?"

+

+"No, she ain't either of them."

+

+"Well, den, she ain't got no business to talk like either one er the

+yuther of 'em.  Is a Frenchman a man?"

+

+"Yes."

+

+"Well, den!  Dad blame it, why doan' he talk like a man?  You answer

+me dat!"

+

+I see it warn't no use wasting words—you can't learn a nigger to argue.

+So I quit.

+

+

+

+

+CHAPTER XV.

+

+WE judged that three nights more would fetch us to Cairo, at the bottom

+of Illinois, where the Ohio River comes in, and that was what we was

+after.  We would sell the raft and get on a steamboat and go way up the

+Ohio amongst the free States, and then be out of trouble.

+

+Well, the second night a fog begun to come on, and we made for a towhead

+to tie to, for it wouldn't do to try to run in a fog; but when I paddled

+ahead in the canoe, with the line to make fast, there warn't anything

+but little saplings to tie to.  I passed the line around one of them

+right on the edge of the cut bank, but there was a stiff current, and

+the raft come booming down so lively she tore it out by the roots and

+away she went.  I see the fog closing down, and it made me so sick and

+scared I couldn't budge for most a half a minute it seemed to me—and

+then there warn't no raft in sight; you couldn't see twenty yards.  I

+jumped into the canoe and run back to the stern, and grabbed the paddle

+and set her back a stroke.  But she didn't come.  I was in such a hurry

+I hadn't untied her.  I got up and tried to untie her, but I was so

+excited my hands shook so I couldn't hardly do anything with them.

+

+As soon as I got started I took out after the raft, hot and heavy, right

+down the towhead.  That was all right as far as it went, but the towhead

+warn't sixty yards long, and the minute I flew by the foot of it I shot

+out into the solid white fog, and hadn't no more idea which way I was

+going than a dead man.

+

+Thinks I, it won't do to paddle; first I know I'll run into the bank

+or a towhead or something; I got to set still and float, and yet it's

+mighty fidgety business to have to hold your hands still at such a time.

+ I whooped and listened.  Away down there somewheres I hears a small

+whoop, and up comes my spirits.  I went tearing after it, listening

+sharp to hear it again.  The next time it come I see I warn't heading

+for it, but heading away to the right of it.  And the next time I was

+heading away to the left of it—and not gaining on it much either, for

+I was flying around, this way and that and t'other, but it was going

+straight ahead all the time.

+

+I did wish the fool would think to beat a tin pan, and beat it all the

+time, but he never did, and it was the still places between the whoops

+that was making the trouble for me.  Well, I fought along, and directly

+I hears the whoop behind me.  I was tangled good now.  That was

+somebody else's whoop, or else I was turned around.

+

+I throwed the paddle down.  I heard the whoop again; it was behind me

+yet, but in a different place; it kept coming, and kept changing its

+place, and I kept answering, till by and by it was in front of me again,

+and I knowed the current had swung the canoe's head down-stream, and I

+was all right if that was Jim and not some other raftsman hollering.

+ I couldn't tell nothing about voices in a fog, for nothing don't look

+natural nor sound natural in a fog.

+

+The whooping went on, and in about a minute I come a-booming down on a

+cut bank with smoky ghosts of big trees on it, and the current throwed

+me off to the left and shot by, amongst a lot of snags that fairly

+roared, the current was tearing by them so swift.

+

+In another second or two it was solid white and still again.  I set

+perfectly still then, listening to my heart thump, and I reckon I didn't

+draw a breath while it thumped a hundred.

+

+I just give up then.  I knowed what the matter was.  That cut bank

+was an island, and Jim had gone down t'other side of it.  It warn't no

+towhead that you could float by in ten minutes.  It had the big timber

+of a regular island; it might be five or six miles long and more than

+half a mile wide.

+

+I kept quiet, with my ears cocked, about fifteen minutes, I reckon.  I

+was floating along, of course, four or five miles an hour; but you don't

+ever think of that.  No, you feel like you are laying dead still on

+the water; and if a little glimpse of a snag slips by you don't think to

+yourself how fast you're going, but you catch your breath and think,

+my! how that snag's tearing along.  If you think it ain't dismal and

+lonesome out in a fog that way by yourself in the night, you try it

+once—you'll see.

+

+Next, for about a half an hour, I whoops now and then; at last I hears

+the answer a long ways off, and tries to follow it, but I couldn't do

+it, and directly I judged I'd got into a nest of towheads, for I had

+little dim glimpses of them on both sides of me—sometimes just a narrow

+channel between, and some that I couldn't see I knowed was there because

+I'd hear the wash of the current against the old dead brush and trash

+that hung over the banks.  Well, I warn't long loosing the whoops down

+amongst the towheads; and I only tried to chase them a little while,

+anyway, because it was worse than chasing a Jack-o'-lantern.  You never

+knowed a sound dodge around so, and swap places so quick and so much.

+

+I had to claw away from the bank pretty lively four or five times, to

+keep from knocking the islands out of the river; and so I judged the

+raft must be butting into the bank every now and then, or else it would

+get further ahead and clear out of hearing—it was floating a little

+faster than what I was.

+

+Well, I seemed to be in the open river again by and by, but I couldn't

+hear no sign of a whoop nowheres.  I reckoned Jim had fetched up on a

+snag, maybe, and it was all up with him.  I was good and tired, so I

+laid down in the canoe and said I wouldn't bother no more.  I didn't

+want to go to sleep, of course; but I was so sleepy I couldn't help it;

+so I thought I would take jest one little cat-nap.

+

+But I reckon it was more than a cat-nap, for when I waked up the stars

+was shining bright, the fog was all gone, and I was spinning down a

+big bend stern first.  First I didn't know where I was; I thought I was

+dreaming; and when things began to come back to me they seemed to come

+up dim out of last week.

+

+It was a monstrous big river here, with the tallest and the thickest

+kind of timber on both banks; just a solid wall, as well as I could see

+by the stars.  I looked away down-stream, and seen a black speck on the

+water. I took after it; but when I got to it it warn't nothing but a

+couple of sawlogs made fast together.  Then I see another speck, and

+chased that; then another, and this time I was right.  It was the raft.

+

+When I got to it Jim was setting there with his head down between his

+knees, asleep, with his right arm hanging over the steering-oar.  The

+other oar was smashed off, and the raft was littered up with leaves and

+branches and dirt.  So she'd had a rough time.

+

+I made fast and laid down under Jim's nose on the raft, and began to

+gap, and stretch my fists out against Jim, and says:

+

+"Hello, Jim, have I been asleep?  Why didn't you stir me up?"

+

+"Goodness gracious, is dat you, Huck?  En you ain' dead—you ain'

+drownded—you's back agin?  It's too good for true, honey, it's too good

+for true. Lemme look at you chile, lemme feel o' you.  No, you ain'

+dead! you's back agin, 'live en soun', jis de same ole Huck—de same ole

+Huck, thanks to goodness!"

+

+"What's the matter with you, Jim?  You been a-drinking?"

+

+"Drinkin'?  Has I ben a-drinkin'?  Has I had a chance to be a-drinkin'?"

+

+"Well, then, what makes you talk so wild?"

+

+"How does I talk wild?"

+

+"How?  Why, hain't you been talking about my coming back, and all that

+stuff, as if I'd been gone away?"

+

+"Huck—Huck Finn, you look me in de eye; look me in de eye.  Hain't you

+ben gone away?"

+

+"Gone away?  Why, what in the nation do you mean?  I hain't been gone

+anywheres.  Where would I go to?"

+

+"Well, looky here, boss, dey's sumf'n wrong, dey is.  Is I me, or who

+is I? Is I heah, or whah is I?  Now dat's what I wants to know."

+

+"Well, I think you're here, plain enough, but I think you're a

+tangle-headed old fool, Jim."

+

+"I is, is I?  Well, you answer me dis:  Didn't you tote out de line in

+de canoe fer to make fas' to de tow-head?"

+

+"No, I didn't.  What tow-head?  I hain't see no tow-head."

+

+"You hain't seen no towhead?  Looky here, didn't de line pull loose en

+de raf' go a-hummin' down de river, en leave you en de canoe behine in

+de fog?"

+

+"What fog?"

+

+"Why, de fog!—de fog dat's been aroun' all night.  En didn't you whoop,

+en didn't I whoop, tell we got mix' up in de islands en one un us got

+los' en t'other one was jis' as good as los', 'kase he didn' know whah

+he wuz? En didn't I bust up agin a lot er dem islands en have a turrible

+time en mos' git drownded?  Now ain' dat so, boss—ain't it so?  You

+answer me dat."

+

+"Well, this is too many for me, Jim.  I hain't seen no fog, nor no

+islands, nor no troubles, nor nothing.  I been setting here talking with

+you all night till you went to sleep about ten minutes ago, and I reckon

+I done the same.  You couldn't a got drunk in that time, so of course

+you've been dreaming."

+

+"Dad fetch it, how is I gwyne to dream all dat in ten minutes?"

+

+"Well, hang it all, you did dream it, because there didn't any of it

+happen."

+

+"But, Huck, it's all jis' as plain to me as—"

+

+"It don't make no difference how plain it is; there ain't nothing in it.

+I know, because I've been here all the time."

+

+Jim didn't say nothing for about five minutes, but set there studying

+over it.  Then he says:

+

+"Well, den, I reck'n I did dream it, Huck; but dog my cats ef it ain't

+de powerfullest dream I ever see.  En I hain't ever had no dream b'fo'

+dat's tired me like dis one."

+

+"Oh, well, that's all right, because a dream does tire a body like

+everything sometimes.  But this one was a staving dream; tell me all

+about it, Jim."

+

+So Jim went to work and told me the whole thing right through, just as

+it happened, only he painted it up considerable.  Then he said he must

+start in and "'terpret" it, because it was sent for a warning.  He said

+the first towhead stood for a man that would try to do us some good, but

+the current was another man that would get us away from him.  The whoops

+was warnings that would come to us every now and then, and if we didn't

+try hard to make out to understand them they'd just take us into bad

+luck, 'stead of keeping us out of it.  The lot of towheads was troubles

+we was going to get into with quarrelsome people and all kinds of mean

+folks, but if we minded our business and didn't talk back and aggravate

+them, we would pull through and get out of the fog and into the big

+clear river, which was the free States, and wouldn't have no more

+trouble.

+

+It had clouded up pretty dark just after I got on to the raft, but it

+was clearing up again now.

+

+"Oh, well, that's all interpreted well enough as far as it goes, Jim," I

+says; "but what does these things stand for?"

+

+It was the leaves and rubbish on the raft and the smashed oar.  You

+could see them first-rate now.

+

+Jim looked at the trash, and then looked at me, and back at the trash

+again.  He had got the dream fixed so strong in his head that he

+couldn't seem to shake it loose and get the facts back into its place

+again right away.  But when he did get the thing straightened around he

+looked at me steady without ever smiling, and says:

+

+"What do dey stan' for?  I'se gwyne to tell you.  When I got all wore

+out wid work, en wid de callin' for you, en went to sleep, my heart wuz

+mos' broke bekase you wuz los', en I didn' k'yer no' mo' what become

+er me en de raf'.  En when I wake up en fine you back agin, all safe

+en soun', de tears come, en I could a got down on my knees en kiss yo'

+foot, I's so thankful. En all you wuz thinkin' 'bout wuz how you could

+make a fool uv ole Jim wid a lie.  Dat truck dah is trash; en trash

+is what people is dat puts dirt on de head er dey fren's en makes 'em

+ashamed."

+

+Then he got up slow and walked to the wigwam, and went in there without

+saying anything but that.  But that was enough.  It made me feel so mean

+I could almost kissed his foot to get him to take it back.

+

+It was fifteen minutes before I could work myself up to go and humble

+myself to a nigger; but I done it, and I warn't ever sorry for it

+afterwards, neither.  I didn't do him no more mean tricks, and I

+wouldn't done that one if I'd a knowed it would make him feel that way.

+

+

+

+

+CHAPTER XVI.

+

+WE slept most all day, and started out at night, a little ways behind a

+monstrous long raft that was as long going by as a procession.  She had

+four long sweeps at each end, so we judged she carried as many as thirty

+men, likely.  She had five big wigwams aboard, wide apart, and an open

+camp fire in the middle, and a tall flag-pole at each end.  There was a

+power of style about her.  It amounted to something being a raftsman

+on such a craft as that.

+

+We went drifting down into a big bend, and the night clouded up and got

+hot.  The river was very wide, and was walled with solid timber on

+both sides; you couldn't see a break in it hardly ever, or a light.  We

+talked about Cairo, and wondered whether we would know it when we got to

+it.  I said likely we wouldn't, because I had heard say there warn't but

+about a dozen houses there, and if they didn't happen to have them lit

+up, how was we going to know we was passing a town?  Jim said if the two

+big rivers joined together there, that would show.  But I said maybe

+we might think we was passing the foot of an island and coming into the

+same old river again. That disturbed Jim—and me too.  So the question

+was, what to do?  I said, paddle ashore the first time a light showed,

+and tell them pap was behind, coming along with a trading-scow, and

+was a green hand at the business, and wanted to know how far it was to

+Cairo.  Jim thought it was a good idea, so we took a smoke on it and

+waited.

+

+There warn't nothing to do now but to look out sharp for the town, and

+not pass it without seeing it.  He said he'd be mighty sure to see it,

+because he'd be a free man the minute he seen it, but if he missed it

+he'd be in a slave country again and no more show for freedom.  Every

+little while he jumps up and says:

+

+"Dah she is?"

+

+But it warn't.  It was Jack-o'-lanterns, or lightning bugs; so he set

+down again, and went to watching, same as before.  Jim said it made him

+all over trembly and feverish to be so close to freedom.  Well, I can

+tell you it made me all over trembly and feverish, too, to hear him,

+because I begun to get it through my head that he was most free—and

+who was to blame for it?  Why, me.  I couldn't get that out of my

+conscience, no how nor no way. It got to troubling me so I couldn't

+rest; I couldn't stay still in one place.  It hadn't ever come home to

+me before, what this thing was that I was doing.  But now it did; and it

+stayed with me, and scorched me more and more.  I tried to make out to

+myself that I warn't to blame, because I didn't run Jim off from his

+rightful owner; but it warn't no use, conscience up and says, every

+time, "But you knowed he was running for his freedom, and you could a

+paddled ashore and told somebody."  That was so—I couldn't get around

+that noway.  That was where it pinched.  Conscience says to me, "What

+had poor Miss Watson done to you that you could see her nigger go off

+right under your eyes and never say one single word?  What did that poor

+old woman do to you that you could treat her so mean?  Why, she tried to

+learn you your book, she tried to learn you your manners, she tried to

+be good to you every way she knowed how.  That's what she done."

+

+I got to feeling so mean and so miserable I most wished I was dead.  I

+fidgeted up and down the raft, abusing myself to myself, and Jim was

+fidgeting up and down past me.  We neither of us could keep still.

+ Every time he danced around and says, "Dah's Cairo!" it went through me

+like a shot, and I thought if it was Cairo I reckoned I would die of

+miserableness.

+

+Jim talked out loud all the time while I was talking to myself.  He was

+saying how the first thing he would do when he got to a free State he

+would go to saving up money and never spend a single cent, and when he

+got enough he would buy his wife, which was owned on a farm close to

+where Miss Watson lived; and then they would both work to buy the

+two children, and if their master wouldn't sell them, they'd get an

+Ab'litionist to go and steal them.

+

+It most froze me to hear such talk.  He wouldn't ever dared to talk such

+talk in his life before.  Just see what a difference it made in him the

+minute he judged he was about free.  It was according to the old saying,

+"Give a nigger an inch and he'll take an ell."  Thinks I, this is what

+comes of my not thinking.  Here was this nigger, which I had as good

+as helped to run away, coming right out flat-footed and saying he would

+steal his children—children that belonged to a man I didn't even know; a

+man that hadn't ever done me no harm.

+

+I was sorry to hear Jim say that, it was such a lowering of him.  My

+conscience got to stirring me up hotter than ever, until at last I says

+to it, "Let up on me—it ain't too late yet—I'll paddle ashore at the

+first light and tell."  I felt easy and happy and light as a feather

+right off.  All my troubles was gone.  I went to looking out sharp for a

+light, and sort of singing to myself.  By and by one showed.  Jim sings

+out:

+

+"We's safe, Huck, we's safe!  Jump up and crack yo' heels!  Dat's de

+good ole Cairo at las', I jis knows it!"

+

+I says:

+

+"I'll take the canoe and go and see, Jim.  It mightn't be, you know."

+

+He jumped and got the canoe ready, and put his old coat in the bottom

+for me to set on, and give me the paddle; and as I shoved off, he says:

+

+"Pooty soon I'll be a-shout'n' for joy, en I'll say, it's all on

+accounts o' Huck; I's a free man, en I couldn't ever ben free ef it

+hadn' ben for Huck; Huck done it.  Jim won't ever forgit you, Huck;

+you's de bes' fren' Jim's ever had; en you's de only fren' ole Jim's

+got now."

+

+I was paddling off, all in a sweat to tell on him; but when he says

+this, it seemed to kind of take the tuck all out of me.  I went along

+slow then, and I warn't right down certain whether I was glad I started

+or whether I warn't.  When I was fifty yards off, Jim says:

+

+"Dah you goes, de ole true Huck; de on'y white genlman dat ever kep' his

+promise to ole Jim."

+

+Well, I just felt sick.  But I says, I got to do it—I can't get out

+of it.  Right then along comes a skiff with two men in it with guns, and

+they stopped and I stopped.  One of them says:

+

+"What's that yonder?"

+

+"A piece of a raft," I says.

+

+"Do you belong on it?"

+

+"Yes, sir."

+

+"Any men on it?"

+

+"Only one, sir."

+

+"Well, there's five niggers run off to-night up yonder, above the head

+of the bend.  Is your man white or black?"

+

+I didn't answer up prompt.  I tried to, but the words wouldn't come. I

+tried for a second or two to brace up and out with it, but I warn't man

+enough—hadn't the spunk of a rabbit.  I see I was weakening; so I just

+give up trying, and up and says:

+

+"He's white."

+

+"I reckon we'll go and see for ourselves."

+

+"I wish you would," says I, "because it's pap that's there, and maybe

+you'd help me tow the raft ashore where the light is.  He's sick—and so

+is mam and Mary Ann."

+

+"Oh, the devil! we're in a hurry, boy.  But I s'pose we've got to.

+ Come, buckle to your paddle, and let's get along."

+

+I buckled to my paddle and they laid to their oars.  When we had made a

+stroke or two, I says:

+

+"Pap'll be mighty much obleeged to you, I can tell you.  Everybody goes

+away when I want them to help me tow the raft ashore, and I can't do it

+by myself."

+

+"Well, that's infernal mean.  Odd, too.  Say, boy, what's the matter

+with your father?"

+

+"It's the—a—the—well, it ain't anything much."

+

+They stopped pulling.  It warn't but a mighty little ways to the raft

+now. One says:

+

+"Boy, that's a lie.  What is the matter with your pap?  Answer up

+square now, and it'll be the better for you."

+

+"I will, sir, I will, honest—but don't leave us, please.  It's

+the—the—Gentlemen, if you'll only pull ahead, and let me heave you the

+headline, you won't have to come a-near the raft—please do."

+

+"Set her back, John, set her back!" says one.  They backed water.  "Keep

+away, boy—keep to looard.  Confound it, I just expect the wind has

+blowed it to us.  Your pap's got the small-pox, and you know it precious

+well.  Why didn't you come out and say so?  Do you want to spread it all

+over?"

+

+"Well," says I, a-blubbering, "I've told everybody before, and they just

+went away and left us."

+

+"Poor devil, there's something in that.  We are right down sorry for

+you, but we—well, hang it, we don't want the small-pox, you see.  Look

+here, I'll tell you what to do.  Don't you try to land by yourself, or

+you'll smash everything to pieces.  You float along down about twenty

+miles, and you'll come to a town on the left-hand side of the river.  It

+will be long after sun-up then, and when you ask for help you tell them

+your folks are all down with chills and fever.  Don't be a fool again,

+and let people guess what is the matter.  Now we're trying to do you a

+kindness; so you just put twenty miles between us, that's a good boy.

+ It wouldn't do any good to land yonder where the light is—it's only a

+wood-yard. Say, I reckon your father's poor, and I'm bound to say he's

+in pretty hard luck.  Here, I'll put a twenty-dollar gold piece on this

+board, and you get it when it floats by.  I feel mighty mean to leave

+you; but my kingdom! it won't do to fool with small-pox, don't you see?"

+

+"Hold on, Parker," says the other man, "here's a twenty to put on the

+board for me.  Good-bye, boy; you do as Mr. Parker told you, and you'll

+be all right."

+

+"That's so, my boy—good-bye, good-bye.  If you see any runaway niggers

+you get help and nab them, and you can make some money by it."

+

+"Good-bye, sir," says I; "I won't let no runaway niggers get by me if I

+can help it."

+

+They went off and I got aboard the raft, feeling bad and low, because I

+knowed very well I had done wrong, and I see it warn't no use for me

+to try to learn to do right; a body that don't get started right when

+he's little ain't got no show—when the pinch comes there ain't nothing

+to back him up and keep him to his work, and so he gets beat.  Then I

+thought a minute, and says to myself, hold on; s'pose you'd a done right

+and give Jim up, would you felt better than what you do now?  No, says

+I, I'd feel bad—I'd feel just the same way I do now.  Well, then, says

+I, what's the use you learning to do right when it's troublesome to do

+right and ain't no trouble to do wrong, and the wages is just the same?

+ I was stuck.  I couldn't answer that.  So I reckoned I wouldn't bother

+no more about it, but after this always do whichever come handiest at

+the time.

+

+I went into the wigwam; Jim warn't there.  I looked all around; he

+warn't anywhere.  I says:

+

+"Jim!"

+

+"Here I is, Huck.  Is dey out o' sight yit?  Don't talk loud."

+

+He was in the river under the stern oar, with just his nose out.  I told

+him they were out of sight, so he come aboard.  He says:

+

+"I was a-listenin' to all de talk, en I slips into de river en was gwyne

+to shove for sho' if dey come aboard.  Den I was gwyne to swim to de

+raf' agin when dey was gone.  But lawsy, how you did fool 'em, Huck!

+ Dat wuz de smartes' dodge!  I tell you, chile, I'spec it save' ole

+Jim—ole Jim ain't going to forgit you for dat, honey."

+

+Then we talked about the money.  It was a pretty good raise—twenty

+dollars apiece.  Jim said we could take deck passage on a steamboat

+now, and the money would last us as far as we wanted to go in the free

+States. He said twenty mile more warn't far for the raft to go, but he

+wished we was already there.

+

+Towards daybreak we tied up, and Jim was mighty particular about hiding

+the raft good.  Then he worked all day fixing things in bundles, and

+getting all ready to quit rafting.

+

+That night about ten we hove in sight of the lights of a town away down

+in a left-hand bend.

+

+I went off in the canoe to ask about it.  Pretty soon I found a man out

+in the river with a skiff, setting a trot-line.  I ranged up and says:

+

+"Mister, is that town Cairo?"

+

+"Cairo? no.  You must be a blame' fool."

+

+"What town is it, mister?"

+

+"If you want to know, go and find out.  If you stay here botherin'

+around me for about a half a minute longer you'll get something you

+won't want."

+

+I paddled to the raft.  Jim was awful disappointed, but I said never

+mind, Cairo would be the next place, I reckoned.

+

+We passed another town before daylight, and I was going out again; but

+it was high ground, so I didn't go.  No high ground about Cairo, Jim

+said. I had forgot it.  We laid up for the day on a towhead tolerable

+close to the left-hand bank.  I begun to suspicion something.  So did

+Jim.  I says:

+

+"Maybe we went by Cairo in the fog that night."

+

+He says:

+

+"Doan' le's talk about it, Huck.  Po' niggers can't have no luck.  I

+awluz 'spected dat rattlesnake-skin warn't done wid its work."

+

+"I wish I'd never seen that snake-skin, Jim—I do wish I'd never laid

+eyes on it."

+

+"It ain't yo' fault, Huck; you didn' know.  Don't you blame yo'self

+'bout it."

+

+When it was daylight, here was the clear Ohio water inshore, sure

+enough, and outside was the old regular Muddy!  So it was all up with

+Cairo.

+

+We talked it all over.  It wouldn't do to take to the shore; we couldn't

+take the raft up the stream, of course.  There warn't no way but to wait

+for dark, and start back in the canoe and take the chances.  So we slept

+all day amongst the cottonwood thicket, so as to be fresh for the work,

+and when we went back to the raft about dark the canoe was gone!

+

+We didn't say a word for a good while.  There warn't anything to

+say.  We both knowed well enough it was some more work of the

+rattlesnake-skin; so what was the use to talk about it?  It would only

+look like we was finding fault, and that would be bound to fetch more

+bad luck—and keep on fetching it, too, till we knowed enough to keep

+still.

+

+By and by we talked about what we better do, and found there warn't no

+way but just to go along down with the raft till we got a chance to buy

+a canoe to go back in.  We warn't going to borrow it when there warn't

+anybody around, the way pap would do, for that might set people after

+us.

+

+So we shoved out after dark on the raft.

+

+Anybody that don't believe yet that it's foolishness to handle a

+snake-skin, after all that that snake-skin done for us, will believe it

+now if they read on and see what more it done for us.

+

+The place to buy canoes is off of rafts laying up at shore.  But we

+didn't see no rafts laying up; so we went along during three hours and

+more.  Well, the night got gray and ruther thick, which is the next

+meanest thing to fog.  You can't tell the shape of the river, and you

+can't see no distance. It got to be very late and still, and then along

+comes a steamboat up the river.  We lit the lantern, and judged she

+would see it.  Up-stream boats didn't generly come close to us; they

+go out and follow the bars and hunt for easy water under the reefs; but

+nights like this they bull right up the channel against the whole river.

+

+We could hear her pounding along, but we didn't see her good till she

+was close.  She aimed right for us.  Often they do that and try to see

+how close they can come without touching; sometimes the wheel bites off

+a sweep, and then the pilot sticks his head out and laughs, and thinks

+he's mighty smart.  Well, here she comes, and we said she was going to

+try and shave us; but she didn't seem to be sheering off a bit.  She

+was a big one, and she was coming in a hurry, too, looking like a black

+cloud with rows of glow-worms around it; but all of a sudden she bulged

+out, big and scary, with a long row of wide-open furnace doors shining

+like red-hot teeth, and her monstrous bows and guards hanging right

+over us.  There was a yell at us, and a jingling of bells to stop the

+engines, a powwow of cussing, and whistling of steam—and as Jim went

+overboard on one side and I on the other, she come smashing straight

+through the raft.

+

+I dived—and I aimed to find the bottom, too, for a thirty-foot wheel

+had got to go over me, and I wanted it to have plenty of room.  I could

+always stay under water a minute; this time I reckon I stayed under a

+minute and a half.  Then I bounced for the top in a hurry, for I was

+nearly busting.  I popped out to my armpits and blowed the water out of

+my nose, and puffed a bit.  Of course there was a booming current; and

+of course that boat started her engines again ten seconds after she

+stopped them, for they never cared much for raftsmen; so now she was

+churning along up the river, out of sight in the thick weather, though I

+could hear her.

+

+I sung out for Jim about a dozen times, but I didn't get any answer;

+so I grabbed a plank that touched me while I was "treading water," and

+struck out for shore, shoving it ahead of me.  But I made out to see

+that the drift of the current was towards the left-hand shore, which

+meant that I was in a crossing; so I changed off and went that way.

+

+It was one of these long, slanting, two-mile crossings; so I was a good

+long time in getting over.  I made a safe landing, and clumb up the

+bank. I couldn't see but a little ways, but I went poking along over

+rough ground for a quarter of a mile or more, and then I run across a

+big old-fashioned double log-house before I noticed it.  I was going to

+rush by and get away, but a lot of dogs jumped out and went to howling

+and barking at me, and I knowed better than to move another peg.

+

+

+

+

+CHAPTER XVII.

+

+IN about a minute somebody spoke out of a window without putting his

+head out, and says:

+

+"Be done, boys!  Who's there?"

+

+I says:

+

+"It's me."

+

+"Who's me?"

+

+"George Jackson, sir."

+

+"What do you want?"

+

+"I don't want nothing, sir.  I only want to go along by, but the dogs

+won't let me."

+

+"What are you prowling around here this time of night for—hey?"

+

+"I warn't prowling around, sir, I fell overboard off of the steamboat."

+

+"Oh, you did, did you?  Strike a light there, somebody.  What did you

+say your name was?"

+

+"George Jackson, sir.  I'm only a boy."

+

+"Look here, if you're telling the truth you needn't be afraid—nobody'll

+hurt you.  But don't try to budge; stand right where you are.  Rouse out

+Bob and Tom, some of you, and fetch the guns.  George Jackson, is there

+anybody with you?"

+

+"No, sir, nobody."

+

+I heard the people stirring around in the house now, and see a light.

+The man sung out:

+

+"Snatch that light away, Betsy, you old fool—ain't you got any sense?

+Put it on the floor behind the front door.  Bob, if you and Tom are

+ready, take your places."

+

+"All ready."

+

+"Now, George Jackson, do you know the Shepherdsons?"

+

+"No, sir; I never heard of them."

+

+"Well, that may be so, and it mayn't.  Now, all ready.  Step forward,

+George Jackson.  And mind, don't you hurry—come mighty slow.  If there's

+anybody with you, let him keep back—if he shows himself he'll be shot.

+Come along now.  Come slow; push the door open yourself—just enough to

+squeeze in, d' you hear?"

+

+I didn't hurry; I couldn't if I'd a wanted to.  I took one slow step at

+a time and there warn't a sound, only I thought I could hear my heart.

+ The dogs were as still as the humans, but they followed a little behind

+me. When I got to the three log doorsteps I heard them unlocking and

+unbarring and unbolting.  I put my hand on the door and pushed it a

+little and a little more till somebody said, "There, that's enough—put

+your head in." I done it, but I judged they would take it off.

+

+The candle was on the floor, and there they all was, looking at me, and

+me at them, for about a quarter of a minute:  Three big men with guns

+pointed at me, which made me wince, I tell you; the oldest, gray

+and about sixty, the other two thirty or more—all of them fine and

+handsome—and the sweetest old gray-headed lady, and back of her two

+young women which I couldn't see right well.  The old gentleman says:

+

+"There; I reckon it's all right.  Come in."

+

+As soon as I was in the old gentleman he locked the door and barred it

+and bolted it, and told the young men to come in with their guns, and

+they all went in a big parlor that had a new rag carpet on the floor,

+and got together in a corner that was out of the range of the front

+windows—there warn't none on the side.  They held the candle, and took a

+good look at me, and all said, "Why, he ain't a Shepherdson—no, there

+ain't any Shepherdson about him."  Then the old man said he hoped I

+wouldn't mind being searched for arms, because he didn't mean no harm by

+it—it was only to make sure.  So he didn't pry into my pockets, but only

+felt outside with his hands, and said it was all right.  He told me to

+make myself easy and at home, and tell all about myself; but the old

+lady says:

+

+"Why, bless you, Saul, the poor thing's as wet as he can be; and don't

+you reckon it may be he's hungry?"

+

+"True for you, Rachel—I forgot."

+

+So the old lady says:

+

+"Betsy" (this was a nigger woman), "you fly around and get him something

+to eat as quick as you can, poor thing; and one of you girls go and wake

+up Buck and tell him—oh, here he is himself.  Buck, take this little

+stranger and get the wet clothes off from him and dress him up in some

+of yours that's dry."

+

+Buck looked about as old as me—thirteen or fourteen or along there,

+though he was a little bigger than me.  He hadn't on anything but a

+shirt, and he was very frowzy-headed.  He came in gaping and digging one

+fist into his eyes, and he was dragging a gun along with the other one.

+He says:

+

+"Ain't they no Shepherdsons around?"

+

+They said, no, 'twas a false alarm.

+

+"Well," he says, "if they'd a ben some, I reckon I'd a got one."

+

+They all laughed, and Bob says:

+

+"Why, Buck, they might have scalped us all, you've been so slow in

+coming."

+

+"Well, nobody come after me, and it ain't right I'm always kept down; I

+don't get no show."

+

+"Never mind, Buck, my boy," says the old man, "you'll have show enough,

+all in good time, don't you fret about that.  Go 'long with you now, and

+do as your mother told you."

+

+When we got up-stairs to his room he got me a coarse shirt and a

+roundabout and pants of his, and I put them on.  While I was at it he

+asked me what my name was, but before I could tell him he started to

+tell me about a bluejay and a young rabbit he had catched in the woods

+day before yesterday, and he asked me where Moses was when the candle

+went out.  I said I didn't know; I hadn't heard about it before, no way.

+

+"Well, guess," he says.

+

+"How'm I going to guess," says I, "when I never heard tell of it

+before?"

+

+"But you can guess, can't you?  It's just as easy."

+

+"Which candle?"  I says.

+

+"Why, any candle," he says.

+

+"I don't know where he was," says I; "where was he?"

+

+"Why, he was in the dark!  That's where he was!"

+

+"Well, if you knowed where he was, what did you ask me for?"

+

+"Why, blame it, it's a riddle, don't you see?  Say, how long are you

+going to stay here?  You got to stay always.  We can just have booming

+times—they don't have no school now.  Do you own a dog?  I've got a

+dog—and he'll go in the river and bring out chips that you throw in.  Do

+you like to comb up Sundays, and all that kind of foolishness?  You bet

+I don't, but ma she makes me.  Confound these ole britches!  I reckon

+I'd better put 'em on, but I'd ruther not, it's so warm.  Are you all

+ready? All right.  Come along, old hoss."

+

+Cold corn-pone, cold corn-beef, butter and buttermilk—that is what they

+had for me down there, and there ain't nothing better that ever I've

+come across yet.  Buck and his ma and all of them smoked cob pipes,

+except the nigger woman, which was gone, and the two young women.  They

+all smoked and talked, and I eat and talked.  The young women had

+quilts around them, and their hair down their backs.  They all asked me

+questions, and I told them how pap and me and all the family was living

+on a little farm down at the bottom of Arkansaw, and my sister Mary Ann

+run off and got married and never was heard of no more, and Bill went

+to hunt them and he warn't heard of no more, and Tom and Mort died,

+and then there warn't nobody but just me and pap left, and he was just

+trimmed down to nothing, on account of his troubles; so when he died

+I took what there was left, because the farm didn't belong to us, and

+started up the river, deck passage, and fell overboard; and that was how

+I come to be here.  So they said I could have a home there as long as I

+wanted it.  Then it was most daylight and everybody went to bed, and I

+went to bed with Buck, and when I waked up in the morning, drat it all,

+I had forgot what my name was. So I laid there about an hour trying to

+think, and when Buck waked up I says:

+

+"Can you spell, Buck?"

+

+"Yes," he says.

+

+"I bet you can't spell my name," says I.

+

+"I bet you what you dare I can," says he.

+

+"All right," says I, "go ahead."

+

+"G-e-o-r-g-e J-a-x-o-n—there now," he says.

+

+"Well," says I, "you done it, but I didn't think you could.  It ain't no

+slouch of a name to spell—right off without studying."

+

+I set it down, private, because somebody might want me to spell it

+next, and so I wanted to be handy with it and rattle it off like I was

+used to it.

+

+It was a mighty nice family, and a mighty nice house, too.  I hadn't

+seen no house out in the country before that was so nice and had so much

+style.  It didn't have an iron latch on the front door, nor a wooden one

+with a buckskin string, but a brass knob to turn, the same as houses in

+town. There warn't no bed in the parlor, nor a sign of a bed; but heaps

+of parlors in towns has beds in them.  There was a big fireplace that

+was bricked on the bottom, and the bricks was kept clean and red by

+pouring water on them and scrubbing them with another brick; sometimes

+they wash them over with red water-paint that they call Spanish-brown,

+same as they do in town.  They had big brass dog-irons that could hold

+up a saw-log. There was a clock on the middle of the mantelpiece, with

+a picture of a town painted on the bottom half of the glass front, and

+a round place in the middle of it for the sun, and you could see the

+pendulum swinging behind it.  It was beautiful to hear that clock tick;

+and sometimes when one of these peddlers had been along and scoured her

+up and got her in good shape, she would start in and strike a hundred

+and fifty before she got tuckered out.  They wouldn't took any money for

+her.

+

+Well, there was a big outlandish parrot on each side of the clock,

+made out of something like chalk, and painted up gaudy.  By one of the

+parrots was a cat made of crockery, and a crockery dog by the other;

+and when you pressed down on them they squeaked, but didn't open

+their mouths nor look different nor interested.  They squeaked through

+underneath.  There was a couple of big wild-turkey-wing fans spread out

+behind those things.  On the table in the middle of the room was a kind

+of a lovely crockery basket that had apples and oranges and peaches and

+grapes piled up in it, which was much redder and yellower and prettier

+than real ones is, but they warn't real because you could see where

+pieces had got chipped off and showed the white chalk, or whatever it

+was, underneath.

+

+This table had a cover made out of beautiful oilcloth, with a red and

+blue spread-eagle painted on it, and a painted border all around.  It

+come all the way from Philadelphia, they said.  There was some books,

+too, piled up perfectly exact, on each corner of the table.  One was a

+big family Bible full of pictures.  One was Pilgrim's Progress, about a

+man that left his family, it didn't say why.  I read considerable in it

+now and then.  The statements was interesting, but tough.  Another was

+Friendship's Offering, full of beautiful stuff and poetry; but I didn't

+read the poetry.  Another was Henry Clay's Speeches, and another was Dr.

+Gunn's Family Medicine, which told you all about what to do if a body

+was sick or dead.  There was a hymn book, and a lot of other books.  And

+there was nice split-bottom chairs, and perfectly sound, too—not bagged

+down in the middle and busted, like an old basket.

+

+They had pictures hung on the walls—mainly Washingtons and Lafayettes,

+and battles, and Highland Marys, and one called "Signing the

+Declaration." There was some that they called crayons, which one of the

+daughters which was dead made her own self when she was only

+fifteen years old.  They was different from any pictures I ever see

+before—blacker, mostly, than is common.  One was a woman in a slim black

+dress, belted small under the armpits, with bulges like a cabbage in

+the middle of the sleeves, and a large black scoop-shovel bonnet with

+a black veil, and white slim ankles crossed about with black tape, and

+very wee black slippers, like a chisel, and she was leaning pensive on a

+tombstone on her right elbow, under a weeping willow, and her other hand

+hanging down her side holding a white handkerchief and a reticule,

+and underneath the picture it said "Shall I Never See Thee More Alas."

+ Another one was a young lady with her hair all combed up straight

+to the top of her head, and knotted there in front of a comb like a

+chair-back, and she was crying into a handkerchief and had a dead bird

+laying on its back in her other hand with its heels up, and underneath

+the picture it said "I Shall Never Hear Thy Sweet Chirrup More Alas."

+ There was one where a young lady was at a window looking up at the

+moon, and tears running down her cheeks; and she had an open letter in

+one hand with black sealing wax showing on one edge of it, and she was

+mashing a locket with a chain to it against her mouth, and underneath

+the picture it said "And Art Thou Gone Yes Thou Art Gone Alas."  These

+was all nice pictures, I reckon, but I didn't somehow seem to take

+to them, because if ever I was down a little they always give me the

+fan-tods.  Everybody was sorry she died, because she had laid out a lot

+more of these pictures to do, and a body could see by what she had done

+what they had lost.  But I reckoned that with her disposition she was

+having a better time in the graveyard.  She was at work on what they

+said was her greatest picture when she took sick, and every day and

+every night it was her prayer to be allowed to live till she got it

+done, but she never got the chance.  It was a picture of a young woman

+in a long white gown, standing on the rail of a bridge all ready to jump

+off, with her hair all down her back, and looking up to the moon, with

+the tears running down her face, and she had two arms folded across her

+breast, and two arms stretched out in front, and two more reaching up

+towards the moon—and the idea was to see which pair would look best,

+and then scratch out all the other arms; but, as I was saying, she died

+before she got her mind made up, and now they kept this picture over the

+head of the bed in her room, and every time her birthday come they hung

+flowers on it.  Other times it was hid with a little curtain.  The young

+woman in the picture had a kind of a nice sweet face, but there was so

+many arms it made her look too spidery, seemed to me.

+

+This young girl kept a scrap-book when she was alive, and used to paste

+obituaries and accidents and cases of patient suffering in it out of the

+Presbyterian Observer, and write poetry after them out of her own head.

+It was very good poetry. This is what she wrote about a boy by the name

+of Stephen Dowling Bots that fell down a well and was drownded:

+

+ODE TO STEPHEN DOWLING BOTS, DEC'D

+

+And did young Stephen sicken,

+And did young Stephen die?

+And did the sad hearts thicken,

+And did the mourners cry?

+

+No; such was not the fate of

+Young Stephen Dowling Bots;

+Though sad hearts round him thickened,

+'Twas not from sickness' shots.

+

+No whooping-cough did rack his frame,

+Nor measles drear with spots;

+Not these impaired the sacred name

+Of Stephen Dowling Bots.

+

+Despised love struck not with woe

+That head of curly knots,

+Nor stomach troubles laid him low,

+Young Stephen Dowling Bots.

+

+O no. Then list with tearful eye,

+Whilst I his fate do tell.

+His soul did from this cold world fly

+By falling down a well.

+

+They got him out and emptied him;

+Alas it was too late;

+His spirit was gone for to sport aloft

+In the realms of the good and great.

+

+If Emmeline Grangerford could make poetry like that before she was

+fourteen, there ain't no telling what she could a done by and by.  Buck

+said she could rattle off poetry like nothing.  She didn't ever have to

+stop to think.  He said she would slap down a line, and if she couldn't

+find anything to rhyme with it would just scratch it out and slap down

+another one, and go ahead. She warn't particular; she could write about

+anything you choose to give her to write about just so it was sadful.

+Every time a man died, or a woman died, or a child died, she would be on

+hand with her "tribute" before he was cold.  She called them tributes.

+The neighbors said it was the doctor first, then Emmeline, then the

+undertaker—the undertaker never got in ahead of Emmeline but once, and

+then she hung fire on a rhyme for the dead person's name, which was

+Whistler.  She warn't ever the same after that; she never complained,

+but she kinder pined away and did not live long.  Poor thing, many's the

+time I made myself go up to the little room that used to be hers and get

+out her poor old scrap-book and read in it when her pictures had been

+aggravating me and I had soured on her a little.  I liked all that

+family, dead ones and all, and warn't going to let anything come between

+us.  Poor Emmeline made poetry about all the dead people when she was

+alive, and it didn't seem right that there warn't nobody to make some

+about her now she was gone; so I tried to sweat out a verse or two

+myself, but I couldn't seem to make it go somehow.  They kept Emmeline's

+room trim and nice, and all the things fixed in it just the way she

+liked to have them when she was alive, and nobody ever slept there.

+ The old lady took care of the room herself, though there was plenty

+of niggers, and she sewed there a good deal and read her Bible there

+mostly.

+

+Well, as I was saying about the parlor, there was beautiful curtains on

+the windows:  white, with pictures painted on them of castles with vines

+all down the walls, and cattle coming down to drink.  There was a little

+old piano, too, that had tin pans in it, I reckon, and nothing was ever

+so lovely as to hear the young ladies sing "The Last Link is Broken"

+and play "The Battle of Prague" on it.  The walls of all the rooms was

+plastered, and most had carpets on the floors, and the whole house was

+whitewashed on the outside.

+

+It was a double house, and the big open place betwixt them was roofed

+and floored, and sometimes the table was set there in the middle of the

+day, and it was a cool, comfortable place.  Nothing couldn't be better.

+ And warn't the cooking good, and just bushels of it too!

+

+

+

+

+CHAPTER XVIII.

+

+COL.  Grangerford was a gentleman, you see.  He was a gentleman all

+over; and so was his family.  He was well born, as the saying is, and

+that's worth as much in a man as it is in a horse, so the Widow Douglas

+said, and nobody ever denied that she was of the first aristocracy

+in our town; and pap he always said it, too, though he warn't no more

+quality than a mudcat himself.  Col.  Grangerford was very tall and

+very slim, and had a darkish-paly complexion, not a sign of red in it

+anywheres; he was clean shaved every morning all over his thin face, and

+he had the thinnest kind of lips, and the thinnest kind of nostrils, and

+a high nose, and heavy eyebrows, and the blackest kind of eyes, sunk so

+deep back that they seemed like they was looking out of caverns at

+you, as you may say.  His forehead was high, and his hair was black and

+straight and hung to his shoulders. His hands was long and thin, and

+every day of his life he put on a clean shirt and a full suit from head

+to foot made out of linen so white it hurt your eyes to look at it;

+and on Sundays he wore a blue tail-coat with brass buttons on it.  He

+carried a mahogany cane with a silver head to it.  There warn't no

+frivolishness about him, not a bit, and he warn't ever loud.  He was

+as kind as he could be—you could feel that, you know, and so you had

+confidence.  Sometimes he smiled, and it was good to see; but when he

+straightened himself up like a liberty-pole, and the lightning begun to

+flicker out from under his eyebrows, you wanted to climb a tree first,

+and find out what the matter was afterwards.  He didn't ever have to

+tell anybody to mind their manners—everybody was always good-mannered

+where he was.  Everybody loved to have him around, too; he was sunshine

+most always—I mean he made it seem like good weather.  When he turned

+into a cloudbank it was awful dark for half a minute, and that was

+enough; there wouldn't nothing go wrong again for a week.

+

+When him and the old lady come down in the morning all the family got

+up out of their chairs and give them good-day, and didn't set down again

+till they had set down.  Then Tom and Bob went to the sideboard where

+the decanter was, and mixed a glass of bitters and handed it to him, and

+he held it in his hand and waited till Tom's and Bob's was mixed, and

+then they bowed and said, "Our duty to you, sir, and madam;" and they

+bowed the least bit in the world and said thank you, and so they drank,

+all three, and Bob and Tom poured a spoonful of water on the sugar and

+the mite of whisky or apple brandy in the bottom of their tumblers, and

+give it to me and Buck, and we drank to the old people too.

+

+Bob was the oldest and Tom next—tall, beautiful men with very broad

+shoulders and brown faces, and long black hair and black eyes.  They

+dressed in white linen from head to foot, like the old gentleman, and

+wore broad Panama hats.

+

+Then there was Miss Charlotte; she was twenty-five, and tall and proud

+and grand, but as good as she could be when she warn't stirred up; but

+when she was she had a look that would make you wilt in your tracks,

+like her father.  She was beautiful.

+

+So was her sister, Miss Sophia, but it was a different kind.  She was

+gentle and sweet like a dove, and she was only twenty.

+

+Each person had their own nigger to wait on them—Buck too.  My nigger

+had a monstrous easy time, because I warn't used to having anybody do

+anything for me, but Buck's was on the jump most of the time.

+

+This was all there was of the family now, but there used to be

+more—three sons; they got killed; and Emmeline that died.

+

+The old gentleman owned a lot of farms and over a hundred niggers.

+Sometimes a stack of people would come there, horseback, from ten or

+fifteen mile around, and stay five or six days, and have such junketings

+round about and on the river, and dances and picnics in the woods

+daytimes, and balls at the house nights.  These people was mostly

+kinfolks of the family.  The men brought their guns with them.  It was a

+handsome lot of quality, I tell you.

+

+There was another clan of aristocracy around there—five or six

+families—mostly of the name of Shepherdson.  They was as high-toned

+and well born and rich and grand as the tribe of Grangerfords.  The

+Shepherdsons and Grangerfords used the same steamboat landing, which was

+about two mile above our house; so sometimes when I went up there with a

+lot of our folks I used to see a lot of the Shepherdsons there on their

+fine horses.

+

+One day Buck and me was away out in the woods hunting, and heard a horse

+coming.  We was crossing the road.  Buck says:

+

+"Quick!  Jump for the woods!"

+

+We done it, and then peeped down the woods through the leaves.  Pretty

+soon a splendid young man come galloping down the road, setting his

+horse easy and looking like a soldier.  He had his gun across his

+pommel.  I had seen him before.  It was young Harney Shepherdson.  I

+heard Buck's gun go off at my ear, and Harney's hat tumbled off from his

+head.  He grabbed his gun and rode straight to the place where we was

+hid.  But we didn't wait.  We started through the woods on a run.  The

+woods warn't thick, so I looked over my shoulder to dodge the bullet,

+and twice I seen Harney cover Buck with his gun; and then he rode away

+the way he come—to get his hat, I reckon, but I couldn't see.  We never

+stopped running till we got home.  The old gentleman's eyes blazed a

+minute—'twas pleasure, mainly, I judged—then his face sort of smoothed

+down, and he says, kind of gentle:

+

+"I don't like that shooting from behind a bush.  Why didn't you step

+into the road, my boy?"

+

+"The Shepherdsons don't, father.  They always take advantage."

+

+Miss Charlotte she held her head up like a queen while Buck was telling

+his tale, and her nostrils spread and her eyes snapped.  The two young

+men looked dark, but never said nothing.  Miss Sophia she turned pale,

+but the color come back when she found the man warn't hurt.

+

+Soon as I could get Buck down by the corn-cribs under the trees by

+ourselves, I says:

+

+"Did you want to kill him, Buck?"

+

+"Well, I bet I did."

+

+"What did he do to you?"

+

+"Him?  He never done nothing to me."

+

+"Well, then, what did you want to kill him for?"

+

+"Why, nothing—only it's on account of the feud."

+

+"What's a feud?"

+

+"Why, where was you raised?  Don't you know what a feud is?"

+

+"Never heard of it before—tell me about it."

+

+"Well," says Buck, "a feud is this way:  A man has a quarrel with

+another man, and kills him; then that other man's brother kills him;

+then the other brothers, on both sides, goes for one another; then the

+cousins chip in—and by and by everybody's killed off, and there ain't

+no more feud.  But it's kind of slow, and takes a long time."

+

+"Has this one been going on long, Buck?"

+

+"Well, I should reckon!  It started thirty year ago, or som'ers along

+there.  There was trouble 'bout something, and then a lawsuit to settle

+it; and the suit went agin one of the men, and so he up and shot the

+man that won the suit—which he would naturally do, of course.  Anybody

+would."

+

+"What was the trouble about, Buck?—land?"

+

+"I reckon maybe—I don't know."

+

+"Well, who done the shooting?  Was it a Grangerford or a Shepherdson?"

+

+"Laws, how do I know?  It was so long ago."

+

+"Don't anybody know?"

+

+"Oh, yes, pa knows, I reckon, and some of the other old people; but they

+don't know now what the row was about in the first place."

+

+"Has there been many killed, Buck?"

+

+"Yes; right smart chance of funerals.  But they don't always kill.  Pa's

+got a few buckshot in him; but he don't mind it 'cuz he don't weigh

+much, anyway.  Bob's been carved up some with a bowie, and Tom's been

+hurt once or twice."

+

+"Has anybody been killed this year, Buck?"

+

+"Yes; we got one and they got one.  'Bout three months ago my cousin

+Bud, fourteen year old, was riding through the woods on t'other side

+of the river, and didn't have no weapon with him, which was blame'

+foolishness, and in a lonesome place he hears a horse a-coming behind

+him, and sees old Baldy Shepherdson a-linkin' after him with his gun in

+his hand and his white hair a-flying in the wind; and 'stead of jumping

+off and taking to the brush, Bud 'lowed he could out-run him; so they

+had it, nip and tuck, for five mile or more, the old man a-gaining all

+the time; so at last Bud seen it warn't any use, so he stopped and faced

+around so as to have the bullet holes in front, you know, and the old

+man he rode up and shot him down.  But he didn't git much chance to

+enjoy his luck, for inside of a week our folks laid him out."

+

+"I reckon that old man was a coward, Buck."

+

+"I reckon he warn't a coward.  Not by a blame' sight.  There ain't a

+coward amongst them Shepherdsons—not a one.  And there ain't no cowards

+amongst the Grangerfords either.  Why, that old man kep' up his end in a

+fight one day for half an hour against three Grangerfords, and come

+out winner.  They was all a-horseback; he lit off of his horse and got

+behind a little woodpile, and kep' his horse before him to stop the

+bullets; but the Grangerfords stayed on their horses and capered around

+the old man, and peppered away at him, and he peppered away at them.

+ Him and his horse both went home pretty leaky and crippled, but the

+Grangerfords had to be fetched home—and one of 'em was dead, and

+another died the next day.  No, sir; if a body's out hunting for cowards

+he don't want to fool away any time amongst them Shepherdsons, becuz

+they don't breed any of that kind."

+

+Next Sunday we all went to church, about three mile, everybody

+a-horseback. The men took their guns along, so did Buck, and kept

+them between their knees or stood them handy against the wall.  The

+Shepherdsons done the same.  It was pretty ornery preaching—all about

+brotherly love, and such-like tiresomeness; but everybody said it was

+a good sermon, and they all talked it over going home, and had such

+a powerful lot to say about faith and good works and free grace and

+preforeordestination, and I don't know what all, that it did seem to me

+to be one of the roughest Sundays I had run across yet.

+

+About an hour after dinner everybody was dozing around, some in their

+chairs and some in their rooms, and it got to be pretty dull.  Buck and

+a dog was stretched out on the grass in the sun sound asleep.  I went up

+to our room, and judged I would take a nap myself.  I found that sweet

+Miss Sophia standing in her door, which was next to ours, and she took

+me in her room and shut the door very soft, and asked me if I liked her,

+and I said I did; and she asked me if I would do something for her and

+not tell anybody, and I said I would.  Then she said she'd forgot her

+Testament, and left it in the seat at church between two other books,

+and would I slip out quiet and go there and fetch it to her, and not say

+nothing to nobody.  I said I would. So I slid out and slipped off up the

+road, and there warn't anybody at the church, except maybe a hog or two,

+for there warn't any lock on the door, and hogs likes a puncheon floor

+in summer-time because it's cool.  If you notice, most folks don't go to

+church only when they've got to; but a hog is different.

+

+Says I to myself, something's up; it ain't natural for a girl to be in

+such a sweat about a Testament.  So I give it a shake, and out drops a

+little piece of paper with "HALF-PAST TWO" wrote on it with a pencil.  I

+ransacked it, but couldn't find anything else.  I couldn't make anything

+out of that, so I put the paper in the book again, and when I got home

+and upstairs there was Miss Sophia in her door waiting for me.  She

+pulled me in and shut the door; then she looked in the Testament till

+she found the paper, and as soon as she read it she looked glad; and

+before a body could think she grabbed me and give me a squeeze, and

+said I was the best boy in the world, and not to tell anybody.  She was

+mighty red in the face for a minute, and her eyes lighted up, and it

+made her powerful pretty.  I was a good deal astonished, but when I got

+my breath I asked her what the paper was about, and she asked me if I

+had read it, and I said no, and she asked me if I could read writing,

+and I told her "no, only coarse-hand," and then she said the paper

+warn't anything but a book-mark to keep her place, and I might go and

+play now.

+

+I went off down to the river, studying over this thing, and pretty soon

+I noticed that my nigger was following along behind.  When we was out

+of sight of the house he looked back and around a second, and then comes

+a-running, and says:

+

+"Mars Jawge, if you'll come down into de swamp I'll show you a whole

+stack o' water-moccasins."

+

+Thinks I, that's mighty curious; he said that yesterday.  He oughter

+know a body don't love water-moccasins enough to go around hunting for

+them. What is he up to, anyway?  So I says:

+

+"All right; trot ahead."

+

+I followed a half a mile; then he struck out over the swamp, and waded

+ankle deep as much as another half-mile.  We come to a little flat piece

+of land which was dry and very thick with trees and bushes and vines,

+and he says:

+

+"You shove right in dah jist a few steps, Mars Jawge; dah's whah dey is.

+I's seed 'm befo'; I don't k'yer to see 'em no mo'."

+

+Then he slopped right along and went away, and pretty soon the trees hid

+him.  I poked into the place a-ways and come to a little open patch

+as big as a bedroom all hung around with vines, and found a man laying

+there asleep—and, by jings, it was my old Jim!

+

+I waked him up, and I reckoned it was going to be a grand surprise to

+him to see me again, but it warn't.  He nearly cried he was so glad, but

+he warn't surprised.  Said he swum along behind me that night, and heard

+me yell every time, but dasn't answer, because he didn't want nobody to

+pick him up and take him into slavery again.  Says he:

+

+"I got hurt a little, en couldn't swim fas', so I wuz a considable ways

+behine you towards de las'; when you landed I reck'ned I could ketch

+up wid you on de lan' 'dout havin' to shout at you, but when I see dat

+house I begin to go slow.  I 'uz off too fur to hear what dey say to

+you—I wuz 'fraid o' de dogs; but when it 'uz all quiet agin I knowed

+you's in de house, so I struck out for de woods to wait for day.  Early

+in de mawnin' some er de niggers come along, gwyne to de fields, en dey

+tuk me en showed me dis place, whah de dogs can't track me on accounts

+o' de water, en dey brings me truck to eat every night, en tells me how

+you's a-gitt'n along."

+

+"Why didn't you tell my Jack to fetch me here sooner, Jim?"

+

+"Well, 'twarn't no use to 'sturb you, Huck, tell we could do sumfn—but

+we's all right now.  I ben a-buyin' pots en pans en vittles, as I got a

+chanst, en a-patchin' up de raf' nights when—"

+

+"What raft, Jim?"

+

+"Our ole raf'."

+

+"You mean to say our old raft warn't smashed all to flinders?"

+

+"No, she warn't.  She was tore up a good deal—one en' of her was; but

+dey warn't no great harm done, on'y our traps was mos' all los'.  Ef we

+hadn' dive' so deep en swum so fur under water, en de night hadn' ben

+so dark, en we warn't so sk'yerd, en ben sich punkin-heads, as de sayin'

+is, we'd a seed de raf'.  But it's jis' as well we didn't, 'kase now

+she's all fixed up agin mos' as good as new, en we's got a new lot o'

+stuff, in de place o' what 'uz los'."

+

+"Why, how did you get hold of the raft again, Jim—did you catch her?"

+

+"How I gwyne to ketch her en I out in de woods?  No; some er de niggers

+foun' her ketched on a snag along heah in de ben', en dey hid her in a

+crick 'mongst de willows, en dey wuz so much jawin' 'bout which un 'um

+she b'long to de mos' dat I come to heah 'bout it pooty soon, so I ups

+en settles de trouble by tellin' 'um she don't b'long to none uv um, but

+to you en me; en I ast 'm if dey gwyne to grab a young white genlman's

+propaty, en git a hid'n for it?  Den I gin 'm ten cents apiece, en dey

+'uz mighty well satisfied, en wisht some mo' raf's 'ud come along en

+make 'm rich agin. Dey's mighty good to me, dese niggers is, en whatever

+I wants 'm to do fur me I doan' have to ast 'm twice, honey.  Dat Jack's

+a good nigger, en pooty smart."

+

+"Yes, he is.  He ain't ever told me you was here; told me to come, and

+he'd show me a lot of water-moccasins.  If anything happens he ain't

+mixed up in it.  He can say he never seen us together, and it 'll be the

+truth."

+

+I don't want to talk much about the next day.  I reckon I'll cut it

+pretty short.  I waked up about dawn, and was a-going to turn over and

+go to sleep again when I noticed how still it was—didn't seem to be

+anybody stirring.  That warn't usual.  Next I noticed that Buck was

+up and gone. Well, I gets up, a-wondering, and goes down stairs—nobody

+around; everything as still as a mouse.  Just the same outside.  Thinks

+I, what does it mean?  Down by the wood-pile I comes across my Jack, and

+says:

+

+"What's it all about?"

+

+Says he:

+

+"Don't you know, Mars Jawge?"

+

+"No," says I, "I don't."

+

+"Well, den, Miss Sophia's run off! 'deed she has.  She run off in de

+night some time—nobody don't know jis' when; run off to get married

+to dat young Harney Shepherdson, you know—leastways, so dey 'spec.  De

+fambly foun' it out 'bout half an hour ago—maybe a little mo'—en' I

+tell you dey warn't no time los'.  Sich another hurryin' up guns

+en hosses you never see!  De women folks has gone for to stir up de

+relations, en ole Mars Saul en de boys tuck dey guns en rode up de

+river road for to try to ketch dat young man en kill him 'fo' he kin

+git acrost de river wid Miss Sophia.  I reck'n dey's gwyne to be mighty

+rough times."

+

+"Buck went off 'thout waking me up."

+

+"Well, I reck'n he did!  Dey warn't gwyne to mix you up in it.

+ Mars Buck he loaded up his gun en 'lowed he's gwyne to fetch home a

+Shepherdson or bust. Well, dey'll be plenty un 'm dah, I reck'n, en you

+bet you he'll fetch one ef he gits a chanst."

+

+I took up the river road as hard as I could put.  By and by I begin to

+hear guns a good ways off.  When I come in sight of the log store and

+the woodpile where the steamboats lands I worked along under the trees

+and brush till I got to a good place, and then I clumb up into the

+forks of a cottonwood that was out of reach, and watched.  There was a

+wood-rank four foot high a little ways in front of the tree, and first I

+was going to hide behind that; but maybe it was luckier I didn't.

+

+There was four or five men cavorting around on their horses in the open

+place before the log store, cussing and yelling, and trying to get at

+a couple of young chaps that was behind the wood-rank alongside of the

+steamboat landing; but they couldn't come it.  Every time one of them

+showed himself on the river side of the woodpile he got shot at.  The

+two boys was squatting back to back behind the pile, so they could watch

+both ways.

+

+By and by the men stopped cavorting around and yelling.  They started

+riding towards the store; then up gets one of the boys, draws a steady

+bead over the wood-rank, and drops one of them out of his saddle.  All

+the men jumped off of their horses and grabbed the hurt one and started

+to carry him to the store; and that minute the two boys started on the

+run.  They got half way to the tree I was in before the men noticed.

+Then the men see them, and jumped on their horses and took out after

+them.  They gained on the boys, but it didn't do no good, the boys had

+too good a start; they got to the woodpile that was in front of my tree,

+and slipped in behind it, and so they had the bulge on the men again.

+One of the boys was Buck, and the other was a slim young chap about

+nineteen years old.

+

+The men ripped around awhile, and then rode away.  As soon as they was

+out of sight I sung out to Buck and told him.  He didn't know what

+to make of my voice coming out of the tree at first.  He was awful

+surprised.  He told me to watch out sharp and let him know when the

+men come in sight again; said they was up to some devilment or

+other—wouldn't be gone long.  I wished I was out of that tree, but I

+dasn't come down.  Buck begun to cry and rip, and 'lowed that him and

+his cousin Joe (that was the other young chap) would make up for this

+day yet.  He said his father and his two brothers was killed, and two

+or three of the enemy.  Said the Shepherdsons laid for them in

+ambush.  Buck said his father and brothers ought to waited for their

+relations—the Shepherdsons was too strong for them.  I asked him what

+was become of young Harney and Miss Sophia.  He said they'd got across

+the river and was safe.  I was glad of that; but the way Buck did take

+on because he didn't manage to kill Harney that day he shot at him—I

+hain't ever heard anything like it.

+

+All of a sudden, bang! bang! bang! goes three or four guns—the men had

+slipped around through the woods and come in from behind without their

+horses!  The boys jumped for the river—both of them hurt—and as they

+swum down the current the men run along the bank shooting at them and

+singing out, "Kill them, kill them!"  It made me so sick I most fell out

+of the tree.  I ain't a-going to tell all that happened—it would make

+me sick again if I was to do that.  I wished I hadn't ever come ashore

+that night to see such things.  I ain't ever going to get shut of

+them—lots of times I dream about them.

+

+I stayed in the tree till it begun to get dark, afraid to come down.

+Sometimes I heard guns away off in the woods; and twice I seen little

+gangs of men gallop past the log store with guns; so I reckoned the

+trouble was still a-going on.  I was mighty downhearted; so I made up my

+mind I wouldn't ever go anear that house again, because I reckoned I

+was to blame, somehow. I judged that that piece of paper meant that Miss

+Sophia was to meet Harney somewheres at half-past two and run off; and

+I judged I ought to told her father about that paper and the curious way

+she acted, and then maybe he would a locked her up, and this awful mess

+wouldn't ever happened.

+

+When I got down out of the tree I crept along down the river bank a

+piece, and found the two bodies laying in the edge of the water, and

+tugged at them till I got them ashore; then I covered up their faces,

+and got away as quick as I could.  I cried a little when I was covering

+up Buck's face, for he was mighty good to me.

+

+It was just dark now.  I never went near the house, but struck through

+the woods and made for the swamp.  Jim warn't on his island, so I

+tramped off in a hurry for the crick, and crowded through the willows,

+red-hot to jump aboard and get out of that awful country.  The raft was

+gone!  My souls, but I was scared!  I couldn't get my breath for most

+a minute. Then I raised a yell.  A voice not twenty-five foot from me

+says:

+

+"Good lan'! is dat you, honey?  Doan' make no noise."

+

+It was Jim's voice—nothing ever sounded so good before.  I run along the

+bank a piece and got aboard, and Jim he grabbed me and hugged me, he was

+so glad to see me.  He says:

+

+"Laws bless you, chile, I 'uz right down sho' you's dead agin.  Jack's

+been heah; he say he reck'n you's ben shot, kase you didn' come home no

+mo'; so I's jes' dis minute a startin' de raf' down towards de mouf er

+de crick, so's to be all ready for to shove out en leave soon as Jack

+comes agin en tells me for certain you is dead.  Lawsy, I's mighty

+glad to git you back again, honey."

+

+I says:

+

+"All right—that's mighty good; they won't find me, and they'll think

+I've been killed, and floated down the river—there's something up there

+that 'll help them think so—so don't you lose no time, Jim, but just

+shove off for the big water as fast as ever you can."

+

+I never felt easy till the raft was two mile below there and out in

+the middle of the Mississippi.  Then we hung up our signal lantern, and

+judged that we was free and safe once more.  I hadn't had a bite to eat

+since yesterday, so Jim he got out some corn-dodgers and buttermilk,

+and pork and cabbage and greens—there ain't nothing in the world so good

+when it's cooked right—and whilst I eat my supper we talked and had a

+good time.  I was powerful glad to get away from the feuds, and so was

+Jim to get away from the swamp.  We said there warn't no home like a

+raft, after all.  Other places do seem so cramped up and smothery, but a

+raft don't.  You feel mighty free and easy and comfortable on a raft.

+

+

+

+

+CHAPTER XIX.

+

+TWO or three days and nights went by; I reckon I might say they swum by,

+they slid along so quiet and smooth and lovely.  Here is the way we put

+in the time.  It was a monstrous big river down there—sometimes a mile

+and a half wide; we run nights, and laid up and hid daytimes; soon as

+night was most gone we stopped navigating and tied up—nearly always

+in the dead water under a towhead; and then cut young cottonwoods and

+willows, and hid the raft with them.  Then we set out the lines.  Next

+we slid into the river and had a swim, so as to freshen up and cool

+off; then we set down on the sandy bottom where the water was about knee

+deep, and watched the daylight come.  Not a sound anywheres—perfectly

+still—just like the whole world was asleep, only sometimes the bullfrogs

+a-cluttering, maybe.  The first thing to see, looking away over the

+water, was a kind of dull line—that was the woods on t'other side; you

+couldn't make nothing else out; then a pale place in the sky; then more

+paleness spreading around; then the river softened up away off, and

+warn't black any more, but gray; you could see little dark spots

+drifting along ever so far away—trading scows, and such things; and

+long black streaks—rafts; sometimes you could hear a sweep screaking; or

+jumbled up voices, it was so still, and sounds come so far; and by and

+by you could see a streak on the water which you know by the look of the

+streak that there's a snag there in a swift current which breaks on it

+and makes that streak look that way; and you see the mist curl up off

+of the water, and the east reddens up, and the river, and you make out a

+log-cabin in the edge of the woods, away on the bank on t'other side of

+the river, being a woodyard, likely, and piled by them cheats so you can

+throw a dog through it anywheres; then the nice breeze springs up, and

+comes fanning you from over there, so cool and fresh and sweet to smell

+on account of the woods and the flowers; but sometimes not that way,

+because they've left dead fish laying around, gars and such, and they

+do get pretty rank; and next you've got the full day, and everything

+smiling in the sun, and the song-birds just going it!

+

+A little smoke couldn't be noticed now, so we would take some fish off

+of the lines and cook up a hot breakfast.  And afterwards we would watch

+the lonesomeness of the river, and kind of lazy along, and by and by

+lazy off to sleep.  Wake up by and by, and look to see what done it, and

+maybe see a steamboat coughing along up-stream, so far off towards the

+other side you couldn't tell nothing about her only whether she was

+a stern-wheel or side-wheel; then for about an hour there wouldn't be

+nothing to hear nor nothing to see—just solid lonesomeness.  Next

+you'd see a raft sliding by, away off yonder, and maybe a galoot on it

+chopping, because they're most always doing it on a raft; you'd see the

+axe flash and come down—you don't hear nothing; you see that axe go

+up again, and by the time it's above the man's head then you hear the

+k'chunk!—it had took all that time to come over the water.  So we

+would put in the day, lazying around, listening to the stillness.  Once

+there was a thick fog, and the rafts and things that went by was beating

+tin pans so the steamboats wouldn't run over them.  A scow or a

+raft went by so close we could hear them talking and cussing and

+laughing—heard them plain; but we couldn't see no sign of them; it made

+you feel crawly; it was like spirits carrying on that way in the air.

+ Jim said he believed it was spirits; but I says:

+

+"No; spirits wouldn't say, 'Dern the dern fog.'"

+

+Soon as it was night out we shoved; when we got her out to about the

+middle we let her alone, and let her float wherever the current wanted

+her to; then we lit the pipes, and dangled our legs in the water, and

+talked about all kinds of things—we was always naked, day and night,

+whenever the mosquitoes would let us—the new clothes Buck's folks made

+for me was too good to be comfortable, and besides I didn't go much on

+clothes, nohow.

+

+Sometimes we'd have that whole river all to ourselves for the longest

+time. Yonder was the banks and the islands, across the water; and maybe

+a spark—which was a candle in a cabin window; and sometimes on the water

+you could see a spark or two—on a raft or a scow, you know; and maybe

+you could hear a fiddle or a song coming over from one of them crafts.

+It's lovely to live on a raft.  We had the sky up there, all speckled

+with stars, and we used to lay on our backs and look up at them, and

+discuss about whether they was made or only just happened.  Jim he

+allowed they was made, but I allowed they happened; I judged it would

+have took too long to make so many.  Jim said the moon could a laid

+them; well, that looked kind of reasonable, so I didn't say nothing

+against it, because I've seen a frog lay most as many, so of course it

+could be done. We used to watch the stars that fell, too, and see them

+streak down.  Jim allowed they'd got spoiled and was hove out of the

+nest.

+

+Once or twice of a night we would see a steamboat slipping along in the

+dark, and now and then she would belch a whole world of sparks up out

+of her chimbleys, and they would rain down in the river and look awful

+pretty; then she would turn a corner and her lights would wink out and

+her powwow shut off and leave the river still again; and by and by her

+waves would get to us, a long time after she was gone, and joggle the

+raft a bit, and after that you wouldn't hear nothing for you couldn't

+tell how long, except maybe frogs or something.

+

+After midnight the people on shore went to bed, and then for two or

+three hours the shores was black—no more sparks in the cabin windows.

+ These sparks was our clock—the first one that showed again meant

+morning was coming, so we hunted a place to hide and tie up right away.

+

+One morning about daybreak I found a canoe and crossed over a chute to

+the main shore—it was only two hundred yards—and paddled about a mile

+up a crick amongst the cypress woods, to see if I couldn't get some

+berries. Just as I was passing a place where a kind of a cowpath crossed

+the crick, here comes a couple of men tearing up the path as tight as

+they could foot it.  I thought I was a goner, for whenever anybody was

+after anybody I judged it was me—or maybe Jim.  I was about to dig out

+from there in a hurry, but they was pretty close to me then, and sung

+out and begged me to save their lives—said they hadn't been doing

+nothing, and was being chased for it—said there was men and dogs

+a-coming.  They wanted to jump right in, but I says:

+

+"Don't you do it.  I don't hear the dogs and horses yet; you've got time

+to crowd through the brush and get up the crick a little ways; then you

+take to the water and wade down to me and get in—that'll throw the dogs

+off the scent."

+

+They done it, and soon as they was aboard I lit out for our towhead,

+and in about five or ten minutes we heard the dogs and the men away off,

+shouting. We heard them come along towards the crick, but couldn't

+see them; they seemed to stop and fool around a while; then, as we got

+further and further away all the time, we couldn't hardly hear them at

+all; by the time we had left a mile of woods behind us and struck the

+river, everything was quiet, and we paddled over to the towhead and hid

+in the cottonwoods and was safe.

+

+One of these fellows was about seventy or upwards, and had a bald head

+and very gray whiskers.  He had an old battered-up slouch hat on, and

+a greasy blue woollen shirt, and ragged old blue jeans britches stuffed

+into his boot-tops, and home-knit galluses—no, he only had one.  He had

+an old long-tailed blue jeans coat with slick brass buttons flung over

+his arm, and both of them had big, fat, ratty-looking carpet-bags.

+

+The other fellow was about thirty, and dressed about as ornery.  After

+breakfast we all laid off and talked, and the first thing that come out

+was that these chaps didn't know one another.

+

+"What got you into trouble?" says the baldhead to t'other chap.

+

+"Well, I'd been selling an article to take the tartar off the teeth—and

+it does take it off, too, and generly the enamel along with it—but I

+stayed about one night longer than I ought to, and was just in the act

+of sliding out when I ran across you on the trail this side of town, and

+you told me they were coming, and begged me to help you to get off.  So

+I told you I was expecting trouble myself, and would scatter out with

+you. That's the whole yarn—what's yourn?"

+

+"Well, I'd ben a-running' a little temperance revival thar 'bout a week,

+and was the pet of the women folks, big and little, for I was makin' it

+mighty warm for the rummies, I tell you, and takin' as much as five

+or six dollars a night—ten cents a head, children and niggers free—and

+business a-growin' all the time, when somehow or another a little report

+got around last night that I had a way of puttin' in my time with a

+private jug on the sly.  A nigger rousted me out this mornin', and told

+me the people was getherin' on the quiet with their dogs and horses, and

+they'd be along pretty soon and give me 'bout half an hour's start,

+and then run me down if they could; and if they got me they'd tar

+and feather me and ride me on a rail, sure.  I didn't wait for no

+breakfast—I warn't hungry."

+

+"Old man," said the young one, "I reckon we might double-team it

+together; what do you think?"

+

+"I ain't undisposed.  What's your line—mainly?"

+

+"Jour printer by trade; do a little in patent medicines;

+theater-actor—tragedy, you know; take a turn to mesmerism and phrenology

+when there's a chance; teach singing-geography school for a change;

+sling a lecture sometimes—oh, I do lots of things—most anything that

+comes handy, so it ain't work.  What's your lay?"

+

+"I've done considerble in the doctoring way in my time.  Layin' on o'

+hands is my best holt—for cancer and paralysis, and sich things; and I

+k'n tell a fortune pretty good when I've got somebody along to find out

+the facts for me.  Preachin's my line, too, and workin' camp-meetin's,

+and missionaryin' around."

+

+Nobody never said anything for a while; then the young man hove a sigh

+and says:

+

+"Alas!"

+

+"What 're you alassin' about?" says the bald-head.

+

+"To think I should have lived to be leading such a life, and be degraded

+down into such company."  And he begun to wipe the corner of his eye

+with a rag.

+

+"Dern your skin, ain't the company good enough for you?" says the

+baldhead, pretty pert and uppish.

+

+"Yes, it is good enough for me; it's as good as I deserve; for who

+fetched me so low when I was so high?  I did myself.  I don't blame

+you, gentlemen—far from it; I don't blame anybody.  I deserve it

+all.  Let the cold world do its worst; one thing I know—there's a grave

+somewhere for me. The world may go on just as it's always done, and take

+everything from me—loved ones, property, everything; but it can't take

+that. Some day I'll lie down in it and forget it all, and my poor broken

+heart will be at rest."  He went on a-wiping.

+

+"Drot your pore broken heart," says the baldhead; "what are you heaving

+your pore broken heart at us f'r?  we hain't done nothing."

+

+"No, I know you haven't.  I ain't blaming you, gentlemen.  I brought

+myself down—yes, I did it myself.  It's right I should suffer—perfectly

+right—I don't make any moan."

+

+"Brought you down from whar?  Whar was you brought down from?"

+

+"Ah, you would not believe me; the world never believes—let it pass—'tis

+no matter.  The secret of my birth—"

+

+"The secret of your birth!  Do you mean to say—"

+

+"Gentlemen," says the young man, very solemn, "I will reveal it to you,

+for I feel I may have confidence in you.  By rights I am a duke!"

+

+Jim's eyes bugged out when he heard that; and I reckon mine did, too.

+Then the baldhead says:  "No! you can't mean it?"

+

+"Yes.  My great-grandfather, eldest son of the Duke of Bridgewater, fled

+to this country about the end of the last century, to breathe the pure

+air of freedom; married here, and died, leaving a son, his own father

+dying about the same time.  The second son of the late duke seized the

+titles and estates—the infant real duke was ignored.  I am the lineal

+descendant of that infant—I am the rightful Duke of Bridgewater; and

+here am I, forlorn, torn from my high estate, hunted of men, despised

+by the cold world, ragged, worn, heart-broken, and degraded to the

+companionship of felons on a raft!"

+

+Jim pitied him ever so much, and so did I. We tried to comfort him, but

+he said it warn't much use, he couldn't be much comforted; said if we

+was a mind to acknowledge him, that would do him more good than most

+anything else; so we said we would, if he would tell us how.  He said we

+ought to bow when we spoke to him, and say "Your Grace," or "My Lord,"

+or "Your Lordship"—and he wouldn't mind it if we called him plain

+"Bridgewater," which, he said, was a title anyway, and not a name; and

+one of us ought to wait on him at dinner, and do any little thing for

+him he wanted done.

+

+Well, that was all easy, so we done it.  All through dinner Jim stood

+around and waited on him, and says, "Will yo' Grace have some o' dis or

+some o' dat?" and so on, and a body could see it was mighty pleasing to

+him.

+

+But the old man got pretty silent by and by—didn't have much to say, and

+didn't look pretty comfortable over all that petting that was going on

+around that duke.  He seemed to have something on his mind.  So, along

+in the afternoon, he says:

+

+"Looky here, Bilgewater," he says, "I'm nation sorry for you, but you

+ain't the only person that's had troubles like that."

+

+"No?"

+

+"No you ain't.  You ain't the only person that's ben snaked down

+wrongfully out'n a high place."

+

+"Alas!"

+

+"No, you ain't the only person that's had a secret of his birth."  And,

+by jings, he begins to cry.

+

+"Hold!  What do you mean?"

+

+"Bilgewater, kin I trust you?" says the old man, still sort of sobbing.

+

+"To the bitter death!"  He took the old man by the hand and squeezed it,

+and says, "That secret of your being:  speak!"

+

+"Bilgewater, I am the late Dauphin!"

+

+You bet you, Jim and me stared this time.  Then the duke says:

+

+"You are what?"

+

+"Yes, my friend, it is too true—your eyes is lookin' at this very moment

+on the pore disappeared Dauphin, Looy the Seventeen, son of Looy the

+Sixteen and Marry Antonette."

+

+"You!  At your age!  No!  You mean you're the late Charlemagne; you must

+be six or seven hundred years old, at the very least."

+

+"Trouble has done it, Bilgewater, trouble has done it; trouble has brung

+these gray hairs and this premature balditude.  Yes, gentlemen, you

+see before you, in blue jeans and misery, the wanderin', exiled,

+trampled-on, and sufferin' rightful King of France."

+

+Well, he cried and took on so that me and Jim didn't know hardly what to

+do, we was so sorry—and so glad and proud we'd got him with us, too.

+ So we set in, like we done before with the duke, and tried to comfort

+him. But he said it warn't no use, nothing but to be dead and done

+with it all could do him any good; though he said it often made him feel

+easier and better for a while if people treated him according to his

+rights, and got down on one knee to speak to him, and always called him

+"Your Majesty," and waited on him first at meals, and didn't set down

+in his presence till he asked them. So Jim and me set to majestying him,

+and doing this and that and t'other for him, and standing up till he

+told us we might set down.  This done him heaps of good, and so he

+got cheerful and comfortable.  But the duke kind of soured on him, and

+didn't look a bit satisfied with the way things was going; still,

+the king acted real friendly towards him, and said the duke's

+great-grandfather and all the other Dukes of Bilgewater was a good

+deal thought of by his father, and was allowed to come to the palace

+considerable; but the duke stayed huffy a good while, till by and by the

+king says:

+

+"Like as not we got to be together a blamed long time on this h-yer

+raft, Bilgewater, and so what's the use o' your bein' sour?  It 'll only

+make things oncomfortable.  It ain't my fault I warn't born a duke,

+it ain't your fault you warn't born a king—so what's the use to worry?

+ Make the best o' things the way you find 'em, says I—that's my motto.

+ This ain't no bad thing that we've struck here—plenty grub and an easy

+life—come, give us your hand, duke, and le's all be friends."

+

+The duke done it, and Jim and me was pretty glad to see it.  It took

+away all the uncomfortableness and we felt mighty good over it, because

+it would a been a miserable business to have any unfriendliness on the

+raft; for what you want, above all things, on a raft, is for everybody

+to be satisfied, and feel right and kind towards the others.

+

+It didn't take me long to make up my mind that these liars warn't no

+kings nor dukes at all, but just low-down humbugs and frauds.  But I

+never said nothing, never let on; kept it to myself; it's the best way;

+then you don't have no quarrels, and don't get into no trouble.  If they

+wanted us to call them kings and dukes, I hadn't no objections, 'long as

+it would keep peace in the family; and it warn't no use to tell Jim, so

+I didn't tell him.  If I never learnt nothing else out of pap, I learnt

+that the best way to get along with his kind of people is to let them

+have their own way.

+

+

+

+

+CHAPTER XX.

+

+THEY asked us considerable many questions; wanted to know what we

+covered up the raft that way for, and laid by in the daytime instead of

+running—was Jim a runaway nigger?  Says I:

+

+"Goodness sakes! would a runaway nigger run south?"

+

+No, they allowed he wouldn't.  I had to account for things some way, so

+I says:

+

+"My folks was living in Pike County, in Missouri, where I was born, and

+they all died off but me and pa and my brother Ike.  Pa, he 'lowed

+he'd break up and go down and live with Uncle Ben, who's got a little

+one-horse place on the river, forty-four mile below Orleans.  Pa was

+pretty poor, and had some debts; so when he'd squared up there warn't

+nothing left but sixteen dollars and our nigger, Jim.  That warn't

+enough to take us fourteen hundred mile, deck passage nor no other way.

+ Well, when the river rose pa had a streak of luck one day; he ketched

+this piece of a raft; so we reckoned we'd go down to Orleans on it.

+ Pa's luck didn't hold out; a steamboat run over the forrard corner of

+the raft one night, and we all went overboard and dove under the wheel;

+Jim and me come up all right, but pa was drunk, and Ike was only four

+years old, so they never come up no more.  Well, for the next day or

+two we had considerable trouble, because people was always coming out in

+skiffs and trying to take Jim away from me, saying they believed he was

+a runaway nigger.  We don't run daytimes no more now; nights they don't

+bother us."

+

+The duke says:

+

+"Leave me alone to cipher out a way so we can run in the daytime if we

+want to.  I'll think the thing over—I'll invent a plan that'll fix it.

+We'll let it alone for to-day, because of course we don't want to go by

+that town yonder in daylight—it mightn't be healthy."

+

+Towards night it begun to darken up and look like rain; the heat

+lightning was squirting around low down in the sky, and the leaves was

+beginning to shiver—it was going to be pretty ugly, it was easy to see

+that.  So the duke and the king went to overhauling our wigwam, to see

+what the beds was like.  My bed was a straw tick better than Jim's,

+which was a corn-shuck tick; there's always cobs around about in a shuck

+tick, and they poke into you and hurt; and when you roll over the dry

+shucks sound like you was rolling over in a pile of dead leaves; it

+makes such a rustling that you wake up.  Well, the duke allowed he would

+take my bed; but the king allowed he wouldn't.  He says:

+

+"I should a reckoned the difference in rank would a sejested to you that

+a corn-shuck bed warn't just fitten for me to sleep on.  Your Grace 'll

+take the shuck bed yourself."

+

+Jim and me was in a sweat again for a minute, being afraid there was

+going to be some more trouble amongst them; so we was pretty glad when

+the duke says:

+

+"'Tis my fate to be always ground into the mire under the iron heel of

+oppression.  Misfortune has broken my once haughty spirit; I yield, I

+submit; 'tis my fate.  I am alone in the world—let me suffer; can bear

+it."

+

+We got away as soon as it was good and dark.  The king told us to stand

+well out towards the middle of the river, and not show a light till we

+got a long ways below the town.  We come in sight of the little bunch of

+lights by and by—that was the town, you know—and slid by, about a half

+a mile out, all right.  When we was three-quarters of a mile below we

+hoisted up our signal lantern; and about ten o'clock it come on to rain

+and blow and thunder and lighten like everything; so the king told us

+to both stay on watch till the weather got better; then him and the duke

+crawled into the wigwam and turned in for the night.  It was my watch

+below till twelve, but I wouldn't a turned in anyway if I'd had a bed,

+because a body don't see such a storm as that every day in the week, not

+by a long sight.  My souls, how the wind did scream along!  And every

+second or two there'd come a glare that lit up the white-caps for a half

+a mile around, and you'd see the islands looking dusty through the rain,

+and the trees thrashing around in the wind; then comes a H-WHACK!—bum!

+bum! bumble-umble-um-bum-bum-bum-bum—and the thunder would go rumbling

+and grumbling away, and quit—and then RIP comes another flash and

+another sockdolager.  The waves most washed me off the raft sometimes,

+but I hadn't any clothes on, and didn't mind.  We didn't have no trouble

+about snags; the lightning was glaring and flittering around so constant

+that we could see them plenty soon enough to throw her head this way or

+that and miss them.

+

+I had the middle watch, you know, but I was pretty sleepy by that time,

+so Jim he said he would stand the first half of it for me; he was always

+mighty good that way, Jim was.  I crawled into the wigwam, but the king

+and the duke had their legs sprawled around so there warn't no show for

+me; so I laid outside—I didn't mind the rain, because it was warm, and

+the waves warn't running so high now.  About two they come up again,

+though, and Jim was going to call me; but he changed his mind, because

+he reckoned they warn't high enough yet to do any harm; but he was

+mistaken about that, for pretty soon all of a sudden along comes a

+regular ripper and washed me overboard.  It most killed Jim a-laughing.

+ He was the easiest nigger to laugh that ever was, anyway.

+

+I took the watch, and Jim he laid down and snored away; and by and by

+the storm let up for good and all; and the first cabin-light that showed

+I rousted him out, and we slid the raft into hiding quarters for the

+day.

+

+The king got out an old ratty deck of cards after breakfast, and him

+and the duke played seven-up a while, five cents a game.  Then they got

+tired of it, and allowed they would "lay out a campaign," as they called

+it. The duke went down into his carpet-bag, and fetched up a lot of

+little printed bills and read them out loud.  One bill said, "The

+celebrated Dr. Armand de Montalban, of Paris," would "lecture on the

+Science of Phrenology" at such and such a place, on the blank day of

+blank, at ten cents admission, and "furnish charts of character at

+twenty-five cents apiece."  The duke said that was him.  In another

+bill he was the "world-renowned Shakespearian tragedian, Garrick the

+Younger, of Drury Lane, London."  In other bills he had a lot of other

+names and done other wonderful things, like finding water and gold with

+a "divining-rod," "dissipating witch spells," and so on.  By and by he

+says:

+

+"But the histrionic muse is the darling.  Have you ever trod the boards,

+Royalty?"

+

+"No," says the king.

+

+"You shall, then, before you're three days older, Fallen Grandeur," says

+the duke.  "The first good town we come to we'll hire a hall and do the

+sword fight in Richard III. and the balcony scene in Romeo and Juliet.

+How does that strike you?"

+

+"I'm in, up to the hub, for anything that will pay, Bilgewater; but, you

+see, I don't know nothing about play-actin', and hain't ever seen much

+of it.  I was too small when pap used to have 'em at the palace.  Do you

+reckon you can learn me?"

+

+"Easy!"

+

+"All right.  I'm jist a-freezn' for something fresh, anyway.  Le's

+commence right away."

+

+So the duke he told him all about who Romeo was and who Juliet was, and

+said he was used to being Romeo, so the king could be Juliet.

+

+"But if Juliet's such a young gal, duke, my peeled head and my white

+whiskers is goin' to look oncommon odd on her, maybe."

+

+"No, don't you worry; these country jakes won't ever think of that.

+Besides, you know, you'll be in costume, and that makes all the

+difference in the world; Juliet's in a balcony, enjoying the moonlight

+before she goes to bed, and she's got on her night-gown and her ruffled

+nightcap.  Here are the costumes for the parts."

+

+He got out two or three curtain-calico suits, which he said was

+meedyevil armor for Richard III. and t'other chap, and a long white

+cotton nightshirt and a ruffled nightcap to match.  The king was

+satisfied; so the duke got out his book and read the parts over in the

+most splendid spread-eagle way, prancing around and acting at the same

+time, to show how it had got to be done; then he give the book to the

+king and told him to get his part by heart.

+

+There was a little one-horse town about three mile down the bend, and

+after dinner the duke said he had ciphered out his idea about how to run

+in daylight without it being dangersome for Jim; so he allowed he would

+go down to the town and fix that thing.  The king allowed he would go,

+too, and see if he couldn't strike something.  We was out of coffee, so

+Jim said I better go along with them in the canoe and get some.

+

+When we got there there warn't nobody stirring; streets empty, and

+perfectly dead and still, like Sunday.  We found a sick nigger sunning

+himself in a back yard, and he said everybody that warn't too young or

+too sick or too old was gone to camp-meeting, about two mile back in the

+woods.  The king got the directions, and allowed he'd go and work that

+camp-meeting for all it was worth, and I might go, too.

+

+The duke said what he was after was a printing-office.  We found it;

+a little bit of a concern, up over a carpenter shop—carpenters and

+printers all gone to the meeting, and no doors locked.  It was a dirty,

+littered-up place, and had ink marks, and handbills with pictures of

+horses and runaway niggers on them, all over the walls.  The duke shed

+his coat and said he was all right now.  So me and the king lit out for

+the camp-meeting.

+

+We got there in about a half an hour fairly dripping, for it was a most

+awful hot day.  There was as much as a thousand people there from

+twenty mile around.  The woods was full of teams and wagons, hitched

+everywheres, feeding out of the wagon-troughs and stomping to keep

+off the flies.  There was sheds made out of poles and roofed over with

+branches, where they had lemonade and gingerbread to sell, and piles of

+watermelons and green corn and such-like truck.

+

+The preaching was going on under the same kinds of sheds, only they was

+bigger and held crowds of people.  The benches was made out of outside

+slabs of logs, with holes bored in the round side to drive sticks into

+for legs. They didn't have no backs.  The preachers had high platforms

+to stand on at one end of the sheds.  The women had on sun-bonnets;

+and some had linsey-woolsey frocks, some gingham ones, and a few of the

+young ones had on calico.  Some of the young men was barefooted, and

+some of the children didn't have on any clothes but just a tow-linen

+shirt.  Some of the old women was knitting, and some of the young folks

+was courting on the sly.

+

+The first shed we come to the preacher was lining out a hymn.  He lined

+out two lines, everybody sung it, and it was kind of grand to hear it,

+there was so many of them and they done it in such a rousing way; then

+he lined out two more for them to sing—and so on.  The people woke up

+more and more, and sung louder and louder; and towards the end some

+begun to groan, and some begun to shout.  Then the preacher begun to

+preach, and begun in earnest, too; and went weaving first to one side of

+the platform and then the other, and then a-leaning down over the front

+of it, with his arms and his body going all the time, and shouting his

+words out with all his might; and every now and then he would hold up

+his Bible and spread it open, and kind of pass it around this way and

+that, shouting, "It's the brazen serpent in the wilderness!  Look upon

+it and live!"  And people would shout out, "Glory!—A-a-men!"  And so

+he went on, and the people groaning and crying and saying amen:

+

+"Oh, come to the mourners' bench! come, black with sin! (Amen!) come,

+sick and sore! (Amen!) come, lame and halt and blind! (Amen!) come,

+pore and needy, sunk in shame! (A-A-Men!) come, all that's worn and

+soiled and suffering!—come with a broken spirit! come with a contrite

+heart! come in your rags and sin and dirt! the waters that cleanse

+is free, the door of heaven stands open—oh, enter in and be at rest!"

+(A-A-Men!  Glory, Glory Hallelujah!)

+

+And so on.  You couldn't make out what the preacher said any more, on

+account of the shouting and crying.  Folks got up everywheres in the

+crowd, and worked their way just by main strength to the mourners'

+bench, with the tears running down their faces; and when all the

+mourners had got up there to the front benches in a crowd, they sung and

+shouted and flung themselves down on the straw, just crazy and wild.

+

+Well, the first I knowed the king got a-going, and you could hear him

+over everybody; and next he went a-charging up on to the platform, and

+the preacher he begged him to speak to the people, and he done it.  He

+told them he was a pirate—been a pirate for thirty years out in the

+Indian Ocean—and his crew was thinned out considerable last spring in

+a fight, and he was home now to take out some fresh men, and thanks to

+goodness he'd been robbed last night and put ashore off of a steamboat

+without a cent, and he was glad of it; it was the blessedest thing that

+ever happened to him, because he was a changed man now, and happy for

+the first time in his life; and, poor as he was, he was going to start

+right off and work his way back to the Indian Ocean, and put in the rest

+of his life trying to turn the pirates into the true path; for he could

+do it better than anybody else, being acquainted with all pirate crews

+in that ocean; and though it would take him a long time to get there

+without money, he would get there anyway, and every time he convinced

+a pirate he would say to him, "Don't you thank me, don't you give me no

+credit; it all belongs to them dear people in Pokeville camp-meeting,

+natural brothers and benefactors of the race, and that dear preacher

+there, the truest friend a pirate ever had!"

+

+And then he busted into tears, and so did everybody.  Then somebody

+sings out, "Take up a collection for him, take up a collection!"  Well,

+a half a dozen made a jump to do it, but somebody sings out, "Let him

+pass the hat around!"  Then everybody said it, the preacher too.

+

+So the king went all through the crowd with his hat swabbing his eyes,

+and blessing the people and praising them and thanking them for being

+so good to the poor pirates away off there; and every little while the

+prettiest kind of girls, with the tears running down their cheeks, would

+up and ask him would he let them kiss him for to remember him by; and he

+always done it; and some of them he hugged and kissed as many as five or

+six times—and he was invited to stay a week; and everybody wanted him to

+live in their houses, and said they'd think it was an honor; but he said

+as this was the last day of the camp-meeting he couldn't do no good, and

+besides he was in a sweat to get to the Indian Ocean right off and go to

+work on the pirates.

+

+When we got back to the raft and he come to count up he found he had

+collected eighty-seven dollars and seventy-five cents.  And then he had

+fetched away a three-gallon jug of whisky, too, that he found under a

+wagon when he was starting home through the woods.  The king said,

+take it all around, it laid over any day he'd ever put in in the

+missionarying line.  He said it warn't no use talking, heathens don't

+amount to shucks alongside of pirates to work a camp-meeting with.

+

+The duke was thinking he'd been doing pretty well till the king come

+to show up, but after that he didn't think so so much.  He had set

+up and printed off two little jobs for farmers in that

+printing-office—horse bills—and took the money, four dollars.  And he

+had got in ten dollars' worth of advertisements for the paper, which he

+said he would put in for four dollars if they would pay in advance—so

+they done it. The price of the paper was two dollars a year, but he took

+in three subscriptions for half a dollar apiece on condition of them

+paying him in advance; they were going to pay in cordwood and onions as

+usual, but he said he had just bought the concern and knocked down the

+price as low as he could afford it, and was going to run it for cash.

+ He set up a little piece of poetry, which he made, himself, out of

+his own head—three verses—kind of sweet and saddish—the name of it was,

+"Yes, crush, cold world, this breaking heart"—and he left that all set

+up and ready to print in the paper, and didn't charge nothing for it.

+ Well, he took in nine dollars and a half, and said he'd done a pretty

+square day's work for it.

+

+Then he showed us another little job he'd printed and hadn't charged

+for, because it was for us.  It had a picture of a runaway nigger with

+a bundle on a stick over his shoulder, and "$200 reward" under it.  The

+reading was all about Jim, and just described him to a dot.  It said

+he run away from St. Jacques' plantation, forty mile below New Orleans,

+last winter, and likely went north, and whoever would catch him and send

+him back he could have the reward and expenses.

+

+"Now," says the duke, "after to-night we can run in the daytime if we

+want to.  Whenever we see anybody coming we can tie Jim hand and foot

+with a rope, and lay him in the wigwam and show this handbill and say we

+captured him up the river, and were too poor to travel on a steamboat,

+so we got this little raft on credit from our friends and are going down

+to get the reward.  Handcuffs and chains would look still better on Jim,

+but it wouldn't go well with the story of us being so poor.  Too much

+like jewelry.  Ropes are the correct thing—we must preserve the unities,

+as we say on the boards."

+

+We all said the duke was pretty smart, and there couldn't be no trouble

+about running daytimes.  We judged we could make miles enough that night

+to get out of the reach of the powwow we reckoned the duke's work in

+the printing office was going to make in that little town; then we could

+boom right along if we wanted to.

+

+We laid low and kept still, and never shoved out till nearly ten

+o'clock; then we slid by, pretty wide away from the town, and didn't

+hoist our lantern till we was clear out of sight of it.

+

+When Jim called me to take the watch at four in the morning, he says:

+

+"Huck, does you reck'n we gwyne to run acrost any mo' kings on dis

+trip?"

+

+"No," I says, "I reckon not."

+

+"Well," says he, "dat's all right, den.  I doan' mine one er two kings,

+but dat's enough.  Dis one's powerful drunk, en de duke ain' much

+better."

+

+I found Jim had been trying to get him to talk French, so he could hear

+what it was like; but he said he had been in this country so long, and

+had so much trouble, he'd forgot it.

+

+

+

+

+CHAPTER XXI.

+

+IT was after sun-up now, but we went right on and didn't tie up.  The

+king and the duke turned out by and by looking pretty rusty; but after

+they'd jumped overboard and took a swim it chippered them up a good

+deal. After breakfast the king he took a seat on the corner of the raft,

+and pulled off his boots and rolled up his britches, and let his legs

+dangle in the water, so as to be comfortable, and lit his pipe, and went

+to getting his Romeo and Juliet by heart.  When he had got it pretty

+good him and the duke begun to practice it together.  The duke had to

+learn him over and over again how to say every speech; and he made him

+sigh, and put his hand on his heart, and after a while he said he done

+it pretty well; "only," he says, "you mustn't bellow out Romeo!

+that way, like a bull—you must say it soft and sick and languishy,

+so—R-o-o-meo! that is the idea; for Juliet's a dear sweet mere child of

+a girl, you know, and she doesn't bray like a jackass."

+

+Well, next they got out a couple of long swords that the duke made out

+of oak laths, and begun to practice the sword fight—the duke called

+himself Richard III.; and the way they laid on and pranced around

+the raft was grand to see.  But by and by the king tripped and fell

+overboard, and after that they took a rest, and had a talk about all

+kinds of adventures they'd had in other times along the river.

+

+After dinner the duke says:

+

+"Well, Capet, we'll want to make this a first-class show, you know, so

+I guess we'll add a little more to it.  We want a little something to

+answer encores with, anyway."

+

+"What's onkores, Bilgewater?"

+

+The duke told him, and then says:

+

+"I'll answer by doing the Highland fling or the sailor's hornpipe; and

+you—well, let me see—oh, I've got it—you can do Hamlet's soliloquy."

+

+"Hamlet's which?"

+

+"Hamlet's soliloquy, you know; the most celebrated thing in Shakespeare.

+Ah, it's sublime, sublime!  Always fetches the house.  I haven't got

+it in the book—I've only got one volume—but I reckon I can piece it out

+from memory.  I'll just walk up and down a minute, and see if I can call

+it back from recollection's vaults."

+

+So he went to marching up and down, thinking, and frowning horrible

+every now and then; then he would hoist up his eyebrows; next he would

+squeeze his hand on his forehead and stagger back and kind of moan; next

+he would sigh, and next he'd let on to drop a tear.  It was beautiful

+to see him. By and by he got it.  He told us to give attention.  Then

+he strikes a most noble attitude, with one leg shoved forwards, and his

+arms stretched away up, and his head tilted back, looking up at the sky;

+and then he begins to rip and rave and grit his teeth; and after that,

+all through his speech, he howled, and spread around, and swelled up his

+chest, and just knocked the spots out of any acting ever I see before.

+ This is the speech—I learned it, easy enough, while he was learning it

+to the king:

+

+To be, or not to be; that is the bare bodkin That makes calamity of

+so long life; For who would fardels bear, till Birnam Wood do come

+to Dunsinane, But that the fear of something after death Murders the

+innocent sleep, Great nature's second course, And makes us rather sling

+the arrows of outrageous fortune Than fly to others that we know not of.

+There's the respect must give us pause: Wake Duncan with thy knocking! I

+would thou couldst; For who would bear the whips and scorns of time, The

+oppressor's wrong, the proud man's contumely, The law's delay, and the

+quietus which his pangs might take. In the dead waste and middle of the

+night, when churchyards yawn In customary suits of solemn black, But

+that the undiscovered country from whose bourne no traveler returns,

+Breathes forth contagion on the world, And thus the native hue of

+resolution, like the poor cat i' the adage, Is sicklied o'er with care.

+And all the clouds that lowered o'er our housetops, With this

+regard their currents turn awry, And lose the name of action. 'Tis a

+consummation devoutly to be wished. But soft you, the fair Ophelia: Ope

+not thy ponderous and marble jaws. But get thee to a nunnery—go!

+

+Well, the old man he liked that speech, and he mighty soon got it so he

+could do it first rate. It seemed like he was just born for it; and when

+he had his hand in and was excited, it was perfectly lovely the way he

+would rip and tear and rair up behind when he was getting it off.

+

+The first chance we got, the duke he had some show bills printed; and

+after that, for two or three days as we floated along, the raft was a

+most uncommon lively place, for there warn't nothing but sword-fighting

+and rehearsing—as the duke called it—going on all the time. One morning,

+when we was pretty well down the State of Arkansaw, we come in sight

+of a little one-horse town in a big bend; so we tied up about

+three-quarters of a mile above it, in the mouth of a crick which was

+shut in like a tunnel by the cypress trees, and all of us but Jim took

+the canoe and went down there to see if there was any chance in that

+place for our show.

+

+We struck it mighty lucky; there was going to be a circus there that

+afternoon, and the country people was already beginning to come in, in

+all kinds of old shackly wagons, and on horses. The circus would leave

+before night, so our show would have a pretty good chance. The duke he

+hired the court house, and we went around and stuck up our bills. They

+read like this:

+

+Shaksperean Revival!!!

+

+Wonderful Attraction!

+

+For One Night Only! The world renowned tragedians,

+

+David Garrick the younger, of Drury Lane Theatre, London,

+

+and

+

+Edmund Kean the elder, of the Royal Haymarket Theatre, Whitechapel,

+Pudding Lane, Piccadilly, London, and the Royal Continental Theatres, in

+their sublime Shaksperean Spectacle entitled The Balcony Scene in

+

+Romeo and Juliet!!!

+

+Romeo...................................... Mr. Garrick.

+

+Juliet..................................... Mr. Kean.

+

+Assisted by the whole strength of the company!

+

+New costumes, new scenery, new appointments!

+

+Also:

+

+The thrilling, masterly, and blood-curdling Broad-sword conflict In

+Richard III.!!!

+

+Richard III................................ Mr. Garrick.

+

+Richmond................................... Mr. Kean.

+

+also:

+

+(by special request,)

+

+Hamlet's Immortal Soliloquy!!

+

+By the Illustrious Kean!

+

+Done by him 300 consecutive nights in Paris!

+

+For One Night Only,

+

+On account of imperative European engagements!

+

+Admission 25 cents; children and servants, 10 cents.

+

+Then we went loafing around the town. The stores and houses was most all

+old shackly dried-up frame concerns that hadn't ever been painted; they

+was set up three or four foot above ground on stilts, so as to be out of

+reach of the water when the river was overflowed. The houses had little

+gardens around them, but they didn't seem to raise hardly anything in

+them but jimpson weeds, and sunflowers, and ash-piles, and old curled-up

+boots and shoes, and pieces of bottles, and rags, and played-out

+tin-ware. The fences was made of different kinds of boards, nailed on

+at different times; and they leaned every which-way, and had gates that

+didn't generly have but one hinge—a leather one. Some of the fences

+had been whitewashed, some time or another, but the duke said it was in

+Clumbus's time, like enough. There was generly hogs in the garden, and

+people driving them out.

+

+All the stores was along one street.  They had white domestic awnings in

+front, and the country people hitched their horses to the awning-posts.

+There was empty drygoods boxes under the awnings, and loafers roosting

+on them all day long, whittling them with their Barlow knives; and

+chawing tobacco, and gaping and yawning and stretching—a mighty ornery

+lot. They generly had on yellow straw hats most as wide as an umbrella,

+but didn't wear no coats nor waistcoats, they called one another Bill,

+and Buck, and Hank, and Joe, and Andy, and talked lazy and drawly, and

+used considerable many cuss words.  There was as many as one loafer

+leaning up against every awning-post, and he most always had his hands

+in his britches-pockets, except when he fetched them out to lend a chaw

+of tobacco or scratch.  What a body was hearing amongst them all the

+time was:

+

+"Gimme a chaw 'v tobacker, Hank."

+

+"Cain't; I hain't got but one chaw left.  Ask Bill."

+

+Maybe Bill he gives him a chaw; maybe he lies and says he ain't got

+none. Some of them kinds of loafers never has a cent in the world, nor a

+chaw of tobacco of their own.  They get all their chawing by borrowing;

+they say to a fellow, "I wisht you'd len' me a chaw, Jack, I jist this

+minute give Ben Thompson the last chaw I had"—which is a lie pretty

+much everytime; it don't fool nobody but a stranger; but Jack ain't no

+stranger, so he says:

+

+"You give him a chaw, did you?  So did your sister's cat's

+grandmother. You pay me back the chaws you've awready borry'd off'n me,

+Lafe Buckner, then I'll loan you one or two ton of it, and won't charge

+you no back intrust, nuther."

+

+"Well, I did pay you back some of it wunst."

+

+"Yes, you did—'bout six chaws.  You borry'd store tobacker and paid back

+nigger-head."

+

+Store tobacco is flat black plug, but these fellows mostly chaws the

+natural leaf twisted.  When they borrow a chaw they don't generly cut it

+off with a knife, but set the plug in between their teeth, and gnaw with

+their teeth and tug at the plug with their hands till they get it in

+two; then sometimes the one that owns the tobacco looks mournful at it

+when it's handed back, and says, sarcastic:

+

+"Here, gimme the chaw, and you take the plug."

+

+All the streets and lanes was just mud; they warn't nothing else but

+mud—mud as black as tar and nigh about a foot deep in some places,

+and two or three inches deep in all the places.  The hogs loafed and

+grunted around everywheres.  You'd see a muddy sow and a litter of pigs

+come lazying along the street and whollop herself right down in the way,

+where folks had to walk around her, and she'd stretch out and shut her

+eyes and wave her ears whilst the pigs was milking her, and look as

+happy as if she was on salary. And pretty soon you'd hear a loafer

+sing out, "Hi!  so boy! sick him, Tige!" and away the sow would go,

+squealing most horrible, with a dog or two swinging to each ear, and

+three or four dozen more a-coming; and then you would see all the

+loafers get up and watch the thing out of sight, and laugh at the fun

+and look grateful for the noise.  Then they'd settle back again till

+there was a dog fight.  There couldn't anything wake them up all over,

+and make them happy all over, like a dog fight—unless it might be

+putting turpentine on a stray dog and setting fire to him, or tying a

+tin pan to his tail and see him run himself to death.

+

+On the river front some of the houses was sticking out over the bank,

+and they was bowed and bent, and about ready to tumble in. The people

+had moved out of them.  The bank was caved away under one corner of some

+others, and that corner was hanging over.  People lived in them yet, but

+it was dangersome, because sometimes a strip of land as wide as a house

+caves in at a time.  Sometimes a belt of land a quarter of a mile deep

+will start in and cave along and cave along till it all caves into the

+river in one summer. Such a town as that has to be always moving back,

+and back, and back, because the river's always gnawing at it.

+

+The nearer it got to noon that day the thicker and thicker was the

+wagons and horses in the streets, and more coming all the time.

+ Families fetched their dinners with them from the country, and eat them

+in the wagons.  There was considerable whisky drinking going on, and I

+seen three fights.  By and by somebody sings out:

+

+"Here comes old Boggs!—in from the country for his little old monthly

+drunk; here he comes, boys!"

+

+All the loafers looked glad; I reckoned they was used to having fun out

+of Boggs.  One of them says:

+

+"Wonder who he's a-gwyne to chaw up this time.  If he'd a-chawed up all

+the men he's ben a-gwyne to chaw up in the last twenty year he'd have

+considerable ruputation now."

+

+Another one says, "I wisht old Boggs 'd threaten me, 'cuz then I'd know

+I warn't gwyne to die for a thousan' year."

+

+Boggs comes a-tearing along on his horse, whooping and yelling like an

+Injun, and singing out:

+

+"Cler the track, thar.  I'm on the waw-path, and the price uv coffins is

+a-gwyne to raise."

+

+He was drunk, and weaving about in his saddle; he was over fifty year

+old, and had a very red face.  Everybody yelled at him and laughed at

+him and sassed him, and he sassed back, and said he'd attend to them and

+lay them out in their regular turns, but he couldn't wait now because

+he'd come to town to kill old Colonel Sherburn, and his motto was, "Meat

+first, and spoon vittles to top off on."

+

+He see me, and rode up and says:

+

+"Whar'd you come f'm, boy?  You prepared to die?"

+

+Then he rode on.  I was scared, but a man says:

+

+"He don't mean nothing; he's always a-carryin' on like that when he's

+drunk.  He's the best naturedest old fool in Arkansaw—never hurt nobody,

+drunk nor sober."

+

+Boggs rode up before the biggest store in town, and bent his head down

+so he could see under the curtain of the awning and yells:

+

+"Come out here, Sherburn! Come out and meet the man you've swindled.

+You're the houn' I'm after, and I'm a-gwyne to have you, too!"

+

+And so he went on, calling Sherburn everything he could lay his tongue

+to, and the whole street packed with people listening and laughing and

+going on.  By and by a proud-looking man about fifty-five—and he was a

+heap the best dressed man in that town, too—steps out of the store, and

+the crowd drops back on each side to let him come.  He says to Boggs,

+mighty ca'm and slow—he says:

+

+"I'm tired of this, but I'll endure it till one o'clock.  Till one

+o'clock, mind—no longer.  If you open your mouth against me only once

+after that time you can't travel so far but I will find you."

+

+Then he turns and goes in.  The crowd looked mighty sober; nobody

+stirred, and there warn't no more laughing.  Boggs rode off

+blackguarding Sherburn as loud as he could yell, all down the street;

+and pretty soon back he comes and stops before the store, still keeping

+it up.  Some men crowded around him and tried to get him to shut up,

+but he wouldn't; they told him it would be one o'clock in about fifteen

+minutes, and so he must go home—he must go right away.  But it didn't

+do no good.  He cussed away with all his might, and throwed his hat down

+in the mud and rode over it, and pretty soon away he went a-raging down

+the street again, with his gray hair a-flying. Everybody that could get

+a chance at him tried their best to coax him off of his horse so they

+could lock him up and get him sober; but it warn't no use—up the street

+he would tear again, and give Sherburn another cussing.  By and by

+somebody says:

+

+"Go for his daughter!—quick, go for his daughter; sometimes he'll listen

+to her.  If anybody can persuade him, she can."

+

+So somebody started on a run.  I walked down street a ways and stopped.

+In about five or ten minutes here comes Boggs again, but not on his

+horse.  He was a-reeling across the street towards me, bare-headed, with

+a friend on both sides of him a-holt of his arms and hurrying him along.

+He was quiet, and looked uneasy; and he warn't hanging back any, but was

+doing some of the hurrying himself.  Somebody sings out:

+

+"Boggs!"

+

+I looked over there to see who said it, and it was that Colonel

+Sherburn. He was standing perfectly still in the street, and had a

+pistol raised in his right hand—not aiming it, but holding it out with

+the barrel tilted up towards the sky.  The same second I see a young

+girl coming on the run, and two men with her.  Boggs and the men turned

+round to see who called him, and when they see the pistol the men

+jumped to one side, and the pistol-barrel come down slow and steady to

+a level—both barrels cocked. Boggs throws up both of his hands and says,

+"O Lord, don't shoot!"  Bang! goes the first shot, and he staggers back,

+clawing at the air—bang! goes the second one, and he tumbles backwards

+on to the ground, heavy and solid, with his arms spread out.  That young

+girl screamed out and comes rushing, and down she throws herself on her

+father, crying, and saying, "Oh, he's killed him, he's killed him!"  The

+crowd closed up around them, and shouldered and jammed one another, with

+their necks stretched, trying to see, and people on the inside trying to

+shove them back and shouting, "Back, back! give him air, give him air!"

+

+Colonel Sherburn he tossed his pistol on to the ground, and turned

+around on his heels and walked off.

+

+They took Boggs to a little drug store, the crowd pressing around just

+the same, and the whole town following, and I rushed and got a good

+place at the window, where I was close to him and could see in.  They

+laid him on the floor and put one large Bible under his head, and opened

+another one and spread it on his breast; but they tore open his shirt

+first, and I seen where one of the bullets went in.  He made about a

+dozen long gasps, his breast lifting the Bible up when he drawed in his

+breath, and letting it down again when he breathed it out—and after that

+he laid still; he was dead.  Then they pulled his daughter away from

+him, screaming and crying, and took her off.  She was about sixteen, and

+very sweet and gentle looking, but awful pale and scared.

+

+Well, pretty soon the whole town was there, squirming and scrouging and

+pushing and shoving to get at the window and have a look, but people

+that had the places wouldn't give them up, and folks behind them was

+saying all the time, "Say, now, you've looked enough, you fellows;

+'tain't right and 'tain't fair for you to stay thar all the time, and

+never give nobody a chance; other folks has their rights as well as

+you."

+

+There was considerable jawing back, so I slid out, thinking maybe

+there was going to be trouble.  The streets was full, and everybody was

+excited. Everybody that seen the shooting was telling how it happened,

+and there was a big crowd packed around each one of these fellows,

+stretching their necks and listening.  One long, lanky man, with long

+hair and a big white fur stovepipe hat on the back of his head, and a

+crooked-handled cane, marked out the places on the ground where Boggs

+stood and where Sherburn stood, and the people following him around from

+one place to t'other and watching everything he done, and bobbing their

+heads to show they understood, and stooping a little and resting their

+hands on their thighs to watch him mark the places on the ground with

+his cane; and then he stood up straight and stiff where Sherburn had

+stood, frowning and having his hat-brim down over his eyes, and sung

+out, "Boggs!" and then fetched his cane down slow to a level, and says

+"Bang!" staggered backwards, says "Bang!" again, and fell down flat on

+his back. The people that had seen the thing said he done it perfect;

+said it was just exactly the way it all happened.  Then as much as a

+dozen people got out their bottles and treated him.

+

+Well, by and by somebody said Sherburn ought to be lynched.  In about a

+minute everybody was saying it; so away they went, mad and yelling, and

+snatching down every clothes-line they come to to do the hanging with.

+

+

+

+

+CHAPTER XXII.

+

+THEY swarmed up towards Sherburn's house, a-whooping and raging like

+Injuns, and everything had to clear the way or get run over and tromped

+to mush, and it was awful to see.  Children was heeling it ahead of the

+mob, screaming and trying to get out of the way; and every window along

+the road was full of women's heads, and there was nigger boys in every

+tree, and bucks and wenches looking over every fence; and as soon as the

+mob would get nearly to them they would break and skaddle back out of

+reach.  Lots of the women and girls was crying and taking on, scared

+most to death.

+

+They swarmed up in front of Sherburn's palings as thick as they could

+jam together, and you couldn't hear yourself think for the noise.  It

+was a little twenty-foot yard.  Some sung out "Tear down the fence! tear

+down the fence!"  Then there was a racket of ripping and tearing and

+smashing, and down she goes, and the front wall of the crowd begins to

+roll in like a wave.

+

+Just then Sherburn steps out on to the roof of his little front porch,

+with a double-barrel gun in his hand, and takes his stand, perfectly

+ca'm and deliberate, not saying a word.  The racket stopped, and the

+wave sucked back.

+

+Sherburn never said a word—just stood there, looking down.  The

+stillness was awful creepy and uncomfortable.  Sherburn run his eye slow

+along the crowd; and wherever it struck the people tried a little to

+out-gaze him, but they couldn't; they dropped their eyes and looked

+sneaky. Then pretty soon Sherburn sort of laughed; not the pleasant

+kind, but the kind that makes you feel like when you are eating bread

+that's got sand in it.

+

+Then he says, slow and scornful:

+

+"The idea of you lynching anybody!  It's amusing.  The idea of you

+thinking you had pluck enough to lynch a man!  Because you're brave

+enough to tar and feather poor friendless cast-out women that come along

+here, did that make you think you had grit enough to lay your hands on a

+man?  Why, a man's safe in the hands of ten thousand of your kind—as

+long as it's daytime and you're not behind him.

+

+"Do I know you?  I know you clear through. I was born and raised in the

+South, and I've lived in the North; so I know the average all around.

+The average man's a coward.  In the North he lets anybody walk over him

+that wants to, and goes home and prays for a humble spirit to bear it.

+In the South one man all by himself, has stopped a stage full of men

+in the daytime, and robbed the lot.  Your newspapers call you a

+brave people so much that you think you are braver than any other

+people—whereas you're just as brave, and no braver.  Why don't your

+juries hang murderers?  Because they're afraid the man's friends will

+shoot them in the back, in the dark—and it's just what they would do.

+

+"So they always acquit; and then a man goes in the night, with a

+hundred masked cowards at his back and lynches the rascal.  Your mistake

+is, that you didn't bring a man with you; that's one mistake, and the

+other is that you didn't come in the dark and fetch your masks.  You

+brought part of a man—Buck Harkness, there—and if you hadn't had him

+to start you, you'd a taken it out in blowing.

+

+"You didn't want to come.  The average man don't like trouble and

+danger. You don't like trouble and danger.  But if only half a

+man—like Buck Harkness, there—shouts 'Lynch him! lynch him!' you're

+afraid to back down—afraid you'll be found out to be what you

+are—cowards—and so you raise a yell, and hang yourselves on to that

+half-a-man's coat-tail, and come raging up here, swearing what big

+things you're going to do. The pitifulest thing out is a mob; that's

+what an army is—a mob; they don't fight with courage that's born in

+them, but with courage that's borrowed from their mass, and from their

+officers.  But a mob without any man at the head of it is beneath

+pitifulness.  Now the thing for you to do is to droop your tails and

+go home and crawl in a hole.  If any real lynching's going to be done it

+will be done in the dark, Southern fashion; and when they come they'll

+bring their masks, and fetch a man along.  Now leave—and take your

+half-a-man with you"—tossing his gun up across his left arm and cocking

+it when he says this.

+

+The crowd washed back sudden, and then broke all apart, and went tearing

+off every which way, and Buck Harkness he heeled it after them, looking

+tolerable cheap.  I could a stayed if I wanted to, but I didn't want to.

+

+I went to the circus and loafed around the back side till the watchman

+went by, and then dived in under the tent.  I had my twenty-dollar gold

+piece and some other money, but I reckoned I better save it, because

+there ain't no telling how soon you are going to need it, away from

+home and amongst strangers that way.  You can't be too careful.  I ain't

+opposed to spending money on circuses when there ain't no other way, but

+there ain't no use in wasting it on them.

+

+It was a real bully circus.  It was the splendidest sight that ever was

+when they all come riding in, two and two, a gentleman and lady, side

+by side, the men just in their drawers and undershirts, and no shoes

+nor stirrups, and resting their hands on their thighs easy and

+comfortable—there must a been twenty of them—and every lady with a

+lovely complexion, and perfectly beautiful, and looking just like a gang

+of real sure-enough queens, and dressed in clothes that cost millions of

+dollars, and just littered with diamonds.  It was a powerful fine sight;

+I never see anything so lovely.  And then one by one they got up

+and stood, and went a-weaving around the ring so gentle and wavy and

+graceful, the men looking ever so tall and airy and straight, with their

+heads bobbing and skimming along, away up there under the tent-roof, and

+every lady's rose-leafy dress flapping soft and silky around her hips,

+and she looking like the most loveliest parasol.

+

+And then faster and faster they went, all of them dancing, first one

+foot out in the air and then the other, the horses leaning more and

+more, and the ringmaster going round and round the center-pole, cracking

+his whip and shouting "Hi!—hi!" and the clown cracking jokes behind

+him; and by and by all hands dropped the reins, and every lady put her

+knuckles on her hips and every gentleman folded his arms, and then how

+the horses did lean over and hump themselves!  And so one after the

+other they all skipped off into the ring, and made the sweetest bow I

+ever see, and then scampered out, and everybody clapped their hands and

+went just about wild.

+

+Well, all through the circus they done the most astonishing things; and

+all the time that clown carried on so it most killed the people.  The

+ringmaster couldn't ever say a word to him but he was back at him quick

+as a wink with the funniest things a body ever said; and how he ever

+could think of so many of them, and so sudden and so pat, was what I

+couldn't noway understand. Why, I couldn't a thought of them in a year.

+And by and by a drunk man tried to get into the ring—said he wanted to

+ride; said he could ride as well as anybody that ever was.  They argued

+and tried to keep him out, but he wouldn't listen, and the whole show

+come to a standstill.  Then the people begun to holler at him and make

+fun of him, and that made him mad, and he begun to rip and tear; so that

+stirred up the people, and a lot of men begun to pile down off of the

+benches and swarm towards the ring, saying, "Knock him down! throw him

+out!" and one or two women begun to scream.  So, then, the ringmaster

+he made a little speech, and said he hoped there wouldn't be no

+disturbance, and if the man would promise he wouldn't make no more

+trouble he would let him ride if he thought he could stay on the horse.

+ So everybody laughed and said all right, and the man got on. The minute

+he was on, the horse begun to rip and tear and jump and cavort around,

+with two circus men hanging on to his bridle trying to hold him, and the

+drunk man hanging on to his neck, and his heels flying in the air every

+jump, and the whole crowd of people standing up shouting and laughing

+till tears rolled down.  And at last, sure enough, all the circus men

+could do, the horse broke loose, and away he went like the very nation,

+round and round the ring, with that sot laying down on him and hanging

+to his neck, with first one leg hanging most to the ground on one side,

+and then t'other one on t'other side, and the people just crazy.  It

+warn't funny to me, though; I was all of a tremble to see his danger.

+ But pretty soon he struggled up astraddle and grabbed the bridle,

+a-reeling this way and that; and the next minute he sprung up and

+dropped the bridle and stood! and the horse a-going like a house afire

+too.  He just stood up there, a-sailing around as easy and comfortable

+as if he warn't ever drunk in his life—and then he begun to pull off his

+clothes and sling them.  He shed them so thick they kind of clogged up

+the air, and altogether he shed seventeen suits. And, then, there he

+was, slim and handsome, and dressed the gaudiest and prettiest you

+ever saw, and he lit into that horse with his whip and made him fairly

+hum—and finally skipped off, and made his bow and danced off to

+the dressing-room, and everybody just a-howling with pleasure and

+astonishment.

+

+Then the ringmaster he see how he had been fooled, and he was the

+sickest ringmaster you ever see, I reckon.  Why, it was one of his own

+men!  He had got up that joke all out of his own head, and never let on

+to nobody. Well, I felt sheepish enough to be took in so, but I wouldn't

+a been in that ringmaster's place, not for a thousand dollars.  I don't

+know; there may be bullier circuses than what that one was, but I

+never struck them yet. Anyways, it was plenty good enough for me; and

+wherever I run across it, it can have all of my custom every time.

+

+Well, that night we had our show; but there warn't only about twelve

+people there—just enough to pay expenses.  And they laughed all the

+time, and that made the duke mad; and everybody left, anyway, before

+the show was over, but one boy which was asleep.  So the duke said these

+Arkansaw lunkheads couldn't come up to Shakespeare; what they wanted

+was low comedy—and maybe something ruther worse than low comedy, he

+reckoned.  He said he could size their style.  So next morning he got

+some big sheets of wrapping paper and some black paint, and drawed off

+some handbills, and stuck them up all over the village.  The bills said:

+

+

+

+

+CHAPTER XXIII.

+

+WELL, all day him and the king was hard at it, rigging up a stage and

+a curtain and a row of candles for footlights; and that night the house

+was jam full of men in no time.  When the place couldn't hold no more,

+the duke he quit tending door and went around the back way and come on

+to the stage and stood up before the curtain and made a little speech,

+and praised up this tragedy, and said it was the most thrillingest one

+that ever was; and so he went on a-bragging about the tragedy, and about

+Edmund Kean the Elder, which was to play the main principal part in it;

+and at last when he'd got everybody's expectations up high enough, he

+rolled up the curtain, and the next minute the king come a-prancing

+out on all fours, naked; and he was painted all over,

+ring-streaked-and-striped, all sorts of colors, as splendid as a

+rainbow.  And—but never mind the rest of his outfit; it was just wild,

+but it was awful funny. The people most killed themselves laughing; and

+when the king got done capering and capered off behind the scenes, they

+roared and clapped and stormed and haw-hawed till he come back and done

+it over again, and after that they made him do it another time. Well, it

+would make a cow laugh to see the shines that old idiot cut.

+

+Then the duke he lets the curtain down, and bows to the people, and says

+the great tragedy will be performed only two nights more, on accounts of

+pressing London engagements, where the seats is all sold already for it

+in Drury Lane; and then he makes them another bow, and says if he has

+succeeded in pleasing them and instructing them, he will be deeply

+obleeged if they will mention it to their friends and get them to come

+and see it.

+

+Twenty people sings out:

+

+"What, is it over?  Is that all?"

+

+The duke says yes.  Then there was a fine time.  Everybody sings

+out, "Sold!" and rose up mad, and was a-going for that stage and them

+tragedians.  But a big, fine looking man jumps up on a bench and shouts:

+

+"Hold on!  Just a word, gentlemen."  They stopped to listen.  "We are

+sold—mighty badly sold.  But we don't want to be the laughing stock of

+this whole town, I reckon, and never hear the last of this thing as long

+as we live.  No.  What we want is to go out of here quiet, and talk

+this show up, and sell the rest of the town!  Then we'll all be in the

+same boat.  Ain't that sensible?" ("You bet it is!—the jedge is right!"

+everybody sings out.) "All right, then—not a word about any sell.  Go

+along home, and advise everybody to come and see the tragedy."

+

+Next day you couldn't hear nothing around that town but how splendid

+that show was.  House was jammed again that night, and we sold this

+crowd the same way.  When me and the king and the duke got home to the

+raft we all had a supper; and by and by, about midnight, they made Jim

+and me back her out and float her down the middle of the river, and

+fetch her in and hide her about two mile below town.

+

+The third night the house was crammed again—and they warn't new-comers

+this time, but people that was at the show the other two nights.  I

+stood by the duke at the door, and I see that every man that went in had

+his pockets bulging, or something muffled up under his coat—and I see it

+warn't no perfumery, neither, not by a long sight.  I smelt sickly eggs

+by the barrel, and rotten cabbages, and such things; and if I know the

+signs of a dead cat being around, and I bet I do, there was sixty-four

+of them went in.  I shoved in there for a minute, but it was too various

+for me; I couldn't stand it.  Well, when the place couldn't hold no more

+people the duke he give a fellow a quarter and told him to tend door

+for him a minute, and then he started around for the stage door, I after

+him; but the minute we turned the corner and was in the dark he says:

+

+"Walk fast now till you get away from the houses, and then shin for the

+raft like the dickens was after you!"

+

+I done it, and he done the same.  We struck the raft at the same time,

+and in less than two seconds we was gliding down stream, all dark and

+still, and edging towards the middle of the river, nobody saying a

+word. I reckoned the poor king was in for a gaudy time of it with the

+audience, but nothing of the sort; pretty soon he crawls out from under

+the wigwam, and says:

+

+"Well, how'd the old thing pan out this time, duke?"  He hadn't been

+up-town at all.

+

+We never showed a light till we was about ten mile below the village.

+Then we lit up and had a supper, and the king and the duke fairly

+laughed their bones loose over the way they'd served them people.  The

+duke says:

+

+"Greenhorns, flatheads!  I knew the first house would keep mum and let

+the rest of the town get roped in; and I knew they'd lay for us the

+third night, and consider it was their turn now.  Well, it is their

+turn, and I'd give something to know how much they'd take for it.  I

+would just like to know how they're putting in their opportunity.

+ They can turn it into a picnic if they want to—they brought plenty

+provisions."

+

+Them rapscallions took in four hundred and sixty-five dollars in that

+three nights.  I never see money hauled in by the wagon-load like that

+before.  By and by, when they was asleep and snoring, Jim says:

+

+"Don't it s'prise you de way dem kings carries on, Huck?"

+

+"No," I says, "it don't."

+

+"Why don't it, Huck?"

+

+"Well, it don't, because it's in the breed.  I reckon they're all

+alike."

+

+"But, Huck, dese kings o' ourn is reglar rapscallions; dat's jist what

+dey is; dey's reglar rapscallions."

+

+"Well, that's what I'm a-saying; all kings is mostly rapscallions, as

+fur as I can make out."

+

+"Is dat so?"

+

+"You read about them once—you'll see.  Look at Henry the Eight; this 'n

+'s a Sunday-school Superintendent to him.  And look at Charles Second,

+and Louis Fourteen, and Louis Fifteen, and James Second, and Edward

+Second, and Richard Third, and forty more; besides all them Saxon

+heptarchies that used to rip around so in old times and raise Cain.  My,

+you ought to seen old Henry the Eight when he was in bloom.  He was a

+blossom.  He used to marry a new wife every day, and chop off her head

+next morning.  And he would do it just as indifferent as if he was

+ordering up eggs.  'Fetch up Nell Gwynn,' he says.  They fetch her up.

+Next morning, 'Chop off her head!'  And they chop it off.  'Fetch up

+Jane Shore,' he says; and up she comes.  Next morning, 'Chop off her

+head'—and they chop it off.  'Ring up Fair Rosamun.'  Fair Rosamun

+answers the bell.  Next morning, 'Chop off her head.'  And he made every

+one of them tell him a tale every night; and he kept that up till he had

+hogged a thousand and one tales that way, and then he put them all in a

+book, and called it Domesday Book—which was a good name and stated the

+case.  You don't know kings, Jim, but I know them; and this old rip

+of ourn is one of the cleanest I've struck in history.  Well, Henry he

+takes a notion he wants to get up some trouble with this country. How

+does he go at it—give notice?—give the country a show?  No.  All of a

+sudden he heaves all the tea in Boston Harbor overboard, and whacks

+out a declaration of independence, and dares them to come on.  That was

+his style—he never give anybody a chance.  He had suspicions of his

+father, the Duke of Wellington.  Well, what did he do?  Ask him to show

+up?  No—drownded him in a butt of mamsey, like a cat.  S'pose people

+left money laying around where he was—what did he do?  He collared it.

+ S'pose he contracted to do a thing, and you paid him, and didn't set

+down there and see that he done it—what did he do?  He always done the

+other thing. S'pose he opened his mouth—what then?  If he didn't shut it

+up powerful quick he'd lose a lie every time.  That's the kind of a bug

+Henry was; and if we'd a had him along 'stead of our kings he'd a fooled

+that town a heap worse than ourn done.  I don't say that ourn is lambs,

+because they ain't, when you come right down to the cold facts; but they

+ain't nothing to that old ram, anyway.  All I say is, kings is kings,

+and you got to make allowances.  Take them all around, they're a mighty

+ornery lot. It's the way they're raised."

+

+"But dis one do smell so like de nation, Huck."

+

+"Well, they all do, Jim.  We can't help the way a king smells; history

+don't tell no way."

+

+"Now de duke, he's a tolerble likely man in some ways."

+

+"Yes, a duke's different.  But not very different.  This one's

+a middling hard lot for a duke.  When he's drunk there ain't no

+near-sighted man could tell him from a king."

+

+"Well, anyways, I doan' hanker for no mo' un um, Huck.  Dese is all I

+kin stan'."

+

+"It's the way I feel, too, Jim.  But we've got them on our hands, and we

+got to remember what they are, and make allowances.  Sometimes I wish we

+could hear of a country that's out of kings."

+

+What was the use to tell Jim these warn't real kings and dukes?  It

+wouldn't a done no good; and, besides, it was just as I said:  you

+couldn't tell them from the real kind.

+

+I went to sleep, and Jim didn't call me when it was my turn.  He often

+done that.  When I waked up just at daybreak he was sitting there with

+his head down betwixt his knees, moaning and mourning to himself.  I

+didn't take notice nor let on.  I knowed what it was about.  He was

+thinking about his wife and his children, away up yonder, and he was low

+and homesick; because he hadn't ever been away from home before in his

+life; and I do believe he cared just as much for his people as white

+folks does for their'n.  It don't seem natural, but I reckon it's so.

+ He was often moaning and mourning that way nights, when he judged I

+was asleep, and saying, "Po' little 'Lizabeth! po' little Johnny! it's

+mighty hard; I spec' I ain't ever gwyne to see you no mo', no mo'!"  He

+was a mighty good nigger, Jim was.

+

+But this time I somehow got to talking to him about his wife and young

+ones; and by and by he says:

+

+"What makes me feel so bad dis time 'uz bekase I hear sumpn over yonder

+on de bank like a whack, er a slam, while ago, en it mine me er de time

+I treat my little 'Lizabeth so ornery.  She warn't on'y 'bout fo' year

+ole, en she tuck de sk'yarlet fever, en had a powful rough spell; but

+she got well, en one day she was a-stannin' aroun', en I says to her, I

+says:

+

+"'Shet de do'.'

+

+"She never done it; jis' stood dah, kiner smilin' up at me.  It make me

+mad; en I says agin, mighty loud, I says:

+

+"'Doan' you hear me?  Shet de do'!'

+

+"She jis stood de same way, kiner smilin' up.  I was a-bilin'!  I says:

+

+"'I lay I make you mine!'

+

+"En wid dat I fetch' her a slap side de head dat sont her a-sprawlin'.

+Den I went into de yuther room, en 'uz gone 'bout ten minutes; en when

+I come back dah was dat do' a-stannin' open yit, en dat chile stannin'

+mos' right in it, a-lookin' down and mournin', en de tears runnin' down.

+ My, but I wuz mad!  I was a-gwyne for de chile, but jis' den—it was a

+do' dat open innerds—jis' den, 'long come de wind en slam it to, behine

+de chile, ker-BLAM!—en my lan', de chile never move'!  My breff mos'

+hop outer me; en I feel so—so—I doan' know HOW I feel.  I crope out,

+all a-tremblin', en crope aroun' en open de do' easy en slow, en poke my

+head in behine de chile, sof' en still, en all uv a sudden I says POW!

+jis' as loud as I could yell.  She never budge!  Oh, Huck, I bust out

+a-cryin' en grab her up in my arms, en say, 'Oh, de po' little thing!

+ De Lord God Amighty fogive po' ole Jim, kaze he never gwyne to fogive

+hisself as long's he live!'  Oh, she was plumb deef en dumb, Huck, plumb

+deef en dumb—en I'd ben a-treat'n her so!"

+

+

+

+

+CHAPTER XXIV.

+

+NEXT day, towards night, we laid up under a little willow towhead out in

+the middle, where there was a village on each side of the river, and the

+duke and the king begun to lay out a plan for working them towns.  Jim

+he spoke to the duke, and said he hoped it wouldn't take but a few

+hours, because it got mighty heavy and tiresome to him when he had to

+lay all day in the wigwam tied with the rope.  You see, when we left him

+all alone we had to tie him, because if anybody happened on to him all

+by himself and not tied it wouldn't look much like he was a runaway

+nigger, you know. So the duke said it was kind of hard to have to lay

+roped all day, and he'd cipher out some way to get around it.

+

+He was uncommon bright, the duke was, and he soon struck it.  He dressed

+Jim up in King Lear's outfit—it was a long curtain-calico gown, and a

+white horse-hair wig and whiskers; and then he took his theater paint

+and painted Jim's face and hands and ears and neck all over a dead,

+dull, solid blue, like a man that's been drownded nine days.  Blamed if

+he warn't the horriblest looking outrage I ever see.  Then the duke took

+and wrote out a sign on a shingle so:

+

+Sick Arab—but harmless when not out of his head.

+

+And he nailed that shingle to a lath, and stood the lath up four or five

+foot in front of the wigwam.  Jim was satisfied.  He said it was a sight

+better than lying tied a couple of years every day, and trembling all

+over every time there was a sound.  The duke told him to make himself

+free and easy, and if anybody ever come meddling around, he must hop

+out of the wigwam, and carry on a little, and fetch a howl or two like

+a wild beast, and he reckoned they would light out and leave him alone.

+ Which was sound enough judgment; but you take the average man, and he

+wouldn't wait for him to howl.  Why, he didn't only look like he was

+dead, he looked considerable more than that.

+

+These rapscallions wanted to try the Nonesuch again, because there was

+so much money in it, but they judged it wouldn't be safe, because maybe

+the news might a worked along down by this time.  They couldn't hit no

+project that suited exactly; so at last the duke said he reckoned he'd

+lay off and work his brains an hour or two and see if he couldn't put up

+something on the Arkansaw village; and the king he allowed he would drop

+over to t'other village without any plan, but just trust in Providence

+to lead him the profitable way—meaning the devil, I reckon.  We had all

+bought store clothes where we stopped last; and now the king put his'n

+on, and he told me to put mine on.  I done it, of course.  The king's

+duds was all black, and he did look real swell and starchy.  I never

+knowed how clothes could change a body before.  Why, before, he looked

+like the orneriest old rip that ever was; but now, when he'd take off

+his new white beaver and make a bow and do a smile, he looked that grand

+and good and pious that you'd say he had walked right out of the ark,

+and maybe was old Leviticus himself.  Jim cleaned up the canoe, and I

+got my paddle ready.  There was a big steamboat laying at the shore away

+up under the point, about three mile above the town—been there a couple

+of hours, taking on freight.  Says the king:

+

+"Seein' how I'm dressed, I reckon maybe I better arrive down from St.

+Louis or Cincinnati, or some other big place.  Go for the steamboat,

+Huckleberry; we'll come down to the village on her."

+

+I didn't have to be ordered twice to go and take a steamboat ride.

+ I fetched the shore a half a mile above the village, and then went

+scooting along the bluff bank in the easy water.  Pretty soon we come to

+a nice innocent-looking young country jake setting on a log swabbing the

+sweat off of his face, for it was powerful warm weather; and he had a

+couple of big carpet-bags by him.

+

+"Run her nose in shore," says the king.  I done it.  "Wher' you bound

+for, young man?"

+

+"For the steamboat; going to Orleans."

+

+"Git aboard," says the king.  "Hold on a minute, my servant 'll he'p you

+with them bags.  Jump out and he'p the gentleman, Adolphus"—meaning me,

+I see.

+

+I done so, and then we all three started on again.  The young chap was

+mighty thankful; said it was tough work toting his baggage such weather.

+He asked the king where he was going, and the king told him he'd come

+down the river and landed at the other village this morning, and now he

+was going up a few mile to see an old friend on a farm up there.  The

+young fellow says:

+

+"When I first see you I says to myself, 'It's Mr. Wilks, sure, and he

+come mighty near getting here in time.'  But then I says again, 'No, I

+reckon it ain't him, or else he wouldn't be paddling up the river.'  You

+ain't him, are you?"

+

+"No, my name's Blodgett—Elexander Blodgett—Reverend Elexander

+Blodgett, I s'pose I must say, as I'm one o' the Lord's poor servants.

+ But still I'm jist as able to be sorry for Mr. Wilks for not arriving

+in time, all the same, if he's missed anything by it—which I hope he

+hasn't."

+

+"Well, he don't miss any property by it, because he'll get that all

+right; but he's missed seeing his brother Peter die—which he mayn't

+mind, nobody can tell as to that—but his brother would a give anything

+in this world to see him before he died; never talked about nothing

+else all these three weeks; hadn't seen him since they was boys

+together—and hadn't ever seen his brother William at all—that's the deef

+and dumb one—William ain't more than thirty or thirty-five.  Peter and

+George were the only ones that come out here; George was the married

+brother; him and his wife both died last year.  Harvey and William's the

+only ones that's left now; and, as I was saying, they haven't got here

+in time."

+

+"Did anybody send 'em word?"

+

+"Oh, yes; a month or two ago, when Peter was first took; because Peter

+said then that he sorter felt like he warn't going to get well this

+time. You see, he was pretty old, and George's g'yirls was too young to

+be much company for him, except Mary Jane, the red-headed one; and so he

+was kinder lonesome after George and his wife died, and didn't seem

+to care much to live.  He most desperately wanted to see Harvey—and

+William, too, for that matter—because he was one of them kind that can't

+bear to make a will.  He left a letter behind for Harvey, and said he'd

+told in it where his money was hid, and how he wanted the rest of the

+property divided up so George's g'yirls would be all right—for George

+didn't leave nothing.  And that letter was all they could get him to put

+a pen to."

+

+"Why do you reckon Harvey don't come?  Wher' does he live?"

+

+"Oh, he lives in England—Sheffield—preaches there—hasn't ever been in

+this country.  He hasn't had any too much time—and besides he mightn't a

+got the letter at all, you know."

+

+"Too bad, too bad he couldn't a lived to see his brothers, poor soul.

+You going to Orleans, you say?"

+

+"Yes, but that ain't only a part of it.  I'm going in a ship, next

+Wednesday, for Ryo Janeero, where my uncle lives."

+

+"It's a pretty long journey.  But it'll be lovely; wisht I was a-going.

+Is Mary Jane the oldest?  How old is the others?"

+

+"Mary Jane's nineteen, Susan's fifteen, and Joanna's about

+fourteen—that's the one that gives herself to good works and has a

+hare-lip."

+

+"Poor things! to be left alone in the cold world so."

+

+"Well, they could be worse off.  Old Peter had friends, and they

+ain't going to let them come to no harm.  There's Hobson, the Babtis'

+preacher; and Deacon Lot Hovey, and Ben Rucker, and Abner Shackleford,

+and Levi Bell, the lawyer; and Dr. Robinson, and their wives, and the

+widow Bartley, and—well, there's a lot of them; but these are the ones

+that Peter was thickest with, and used to write about sometimes, when

+he wrote home; so Harvey 'll know where to look for friends when he gets

+here."

+

+Well, the old man went on asking questions till he just fairly emptied

+that young fellow.  Blamed if he didn't inquire about everybody and

+everything in that blessed town, and all about the Wilkses; and about

+Peter's business—which was a tanner; and about George's—which was a

+carpenter; and about Harvey's—which was a dissentering minister; and so

+on, and so on.  Then he says:

+

+"What did you want to walk all the way up to the steamboat for?"

+

+"Because she's a big Orleans boat, and I was afeard she mightn't stop

+there.  When they're deep they won't stop for a hail.  A Cincinnati boat

+will, but this is a St. Louis one."

+

+"Was Peter Wilks well off?"

+

+"Oh, yes, pretty well off.  He had houses and land, and it's reckoned he

+left three or four thousand in cash hid up som'ers."

+

+"When did you say he died?"

+

+"I didn't say, but it was last night."

+

+"Funeral to-morrow, likely?"

+

+"Yes, 'bout the middle of the day."

+

+"Well, it's all terrible sad; but we've all got to go, one time or

+another. So what we want to do is to be prepared; then we're all right."

+

+"Yes, sir, it's the best way.  Ma used to always say that."

+

+When we struck the boat she was about done loading, and pretty soon she

+got off.  The king never said nothing about going aboard, so I lost

+my ride, after all.  When the boat was gone the king made me paddle up

+another mile to a lonesome place, and then he got ashore and says:

+

+"Now hustle back, right off, and fetch the duke up here, and the new

+carpet-bags.  And if he's gone over to t'other side, go over there and

+git him.  And tell him to git himself up regardless.  Shove along, now."

+

+I see what he was up to; but I never said nothing, of course.  When

+I got back with the duke we hid the canoe, and then they set down on a

+log, and the king told him everything, just like the young fellow had

+said it—every last word of it.  And all the time he was a-doing it he

+tried to talk like an Englishman; and he done it pretty well, too, for

+a slouch. I can't imitate him, and so I ain't a-going to try to; but he

+really done it pretty good.  Then he says:

+

+"How are you on the deef and dumb, Bilgewater?"

+

+The duke said, leave him alone for that; said he had played a deef

+and dumb person on the histronic boards.  So then they waited for a

+steamboat.

+

+About the middle of the afternoon a couple of little boats come along,

+but they didn't come from high enough up the river; but at last there

+was a big one, and they hailed her.  She sent out her yawl, and we went

+aboard, and she was from Cincinnati; and when they found we only wanted

+to go four or five mile they was booming mad, and gave us a cussing, and

+said they wouldn't land us.  But the king was ca'm.  He says:

+

+"If gentlemen kin afford to pay a dollar a mile apiece to be took on and

+put off in a yawl, a steamboat kin afford to carry 'em, can't it?"

+

+So they softened down and said it was all right; and when we got to the

+village they yawled us ashore.  About two dozen men flocked down when

+they see the yawl a-coming, and when the king says:

+

+"Kin any of you gentlemen tell me wher' Mr. Peter Wilks lives?" they

+give a glance at one another, and nodded their heads, as much as to say,

+"What d' I tell you?"  Then one of them says, kind of soft and gentle:

+

+"I'm sorry sir, but the best we can do is to tell you where he did

+live yesterday evening."

+

+Sudden as winking the ornery old cretur went all to smash, and fell up

+against the man, and put his chin on his shoulder, and cried down his

+back, and says:

+

+"Alas, alas, our poor brother—gone, and we never got to see him; oh,

+it's too, too hard!"

+

+Then he turns around, blubbering, and makes a lot of idiotic signs to

+the duke on his hands, and blamed if he didn't drop a carpet-bag and

+bust out a-crying.  If they warn't the beatenest lot, them two frauds,

+that ever I struck.

+

+Well, the men gathered around and sympathized with them, and said all

+sorts of kind things to them, and carried their carpet-bags up the hill

+for them, and let them lean on them and cry, and told the king all about

+his brother's last moments, and the king he told it all over again on

+his hands to the duke, and both of them took on about that dead tanner

+like they'd lost the twelve disciples.  Well, if ever I struck anything

+like it, I'm a nigger. It was enough to make a body ashamed of the human

+race.

+

+

+

+

+CHAPTER XXV.

+

+THE news was all over town in two minutes, and you could see the people

+tearing down on the run from every which way, some of them putting on

+their coats as they come.  Pretty soon we was in the middle of a crowd,

+and the noise of the tramping was like a soldier march.  The windows and

+dooryards was full; and every minute somebody would say, over a fence:

+

+"Is it them?"

+

+And somebody trotting along with the gang would answer back and say:

+

+"You bet it is."

+

+When we got to the house the street in front of it was packed, and the

+three girls was standing in the door.  Mary Jane was red-headed, but

+that don't make no difference, she was most awful beautiful, and her

+face and her eyes was all lit up like glory, she was so glad her uncles

+was come. The king he spread his arms, and Mary Jane she jumped for

+them, and the hare-lip jumped for the duke, and there they had it!

+ Everybody most, leastways women, cried for joy to see them meet again

+at last and have such good times.

+

+Then the king he hunched the duke private—I see him do it—and then he

+looked around and see the coffin, over in the corner on two chairs; so

+then him and the duke, with a hand across each other's shoulder, and

+t'other hand to their eyes, walked slow and solemn over there, everybody

+dropping back to give them room, and all the talk and noise stopping,

+people saying "Sh!" and all the men taking their hats off and drooping

+their heads, so you could a heard a pin fall.  And when they got there

+they bent over and looked in the coffin, and took one sight, and then

+they bust out a-crying so you could a heard them to Orleans, most; and

+then they put their arms around each other's necks, and hung their chins

+over each other's shoulders; and then for three minutes, or maybe four,

+I never see two men leak the way they done.  And, mind you, everybody

+was doing the same; and the place was that damp I never see anything

+like it. Then one of them got on one side of the coffin, and t'other on

+t'other side, and they kneeled down and rested their foreheads on the

+coffin, and let on to pray all to themselves.  Well, when it come

+to that it worked the crowd like you never see anything like it, and

+everybody broke down and went to sobbing right out loud—the poor girls,

+too; and every woman, nearly, went up to the girls, without saying a

+word, and kissed them, solemn, on the forehead, and then put their hand

+on their head, and looked up towards the sky, with the tears running

+down, and then busted out and went off sobbing and swabbing, and give

+the next woman a show.  I never see anything so disgusting.

+

+Well, by and by the king he gets up and comes forward a little, and

+works himself up and slobbers out a speech, all full of tears and

+flapdoodle about its being a sore trial for him and his poor brother

+to lose the diseased, and to miss seeing diseased alive after the long

+journey of four thousand mile, but it's a trial that's sweetened and

+sanctified to us by this dear sympathy and these holy tears, and so he

+thanks them out of his heart and out of his brother's heart, because out

+of their mouths they can't, words being too weak and cold, and all that

+kind of rot and slush, till it was just sickening; and then he blubbers

+out a pious goody-goody Amen, and turns himself loose and goes to crying

+fit to bust.

+

+And the minute the words were out of his mouth somebody over in the

+crowd struck up the doxolojer, and everybody joined in with all their

+might, and it just warmed you up and made you feel as good as church

+letting out. Music is a good thing; and after all that soul-butter and

+hogwash I never see it freshen up things so, and sound so honest and

+bully.

+

+Then the king begins to work his jaw again, and says how him and his

+nieces would be glad if a few of the main principal friends of the

+family would take supper here with them this evening, and help set up

+with the ashes of the diseased; and says if his poor brother laying

+yonder could speak he knows who he would name, for they was names that

+was very dear to him, and mentioned often in his letters; and so he will

+name the same, to wit, as follows, vizz.:—Rev. Mr. Hobson, and Deacon

+Lot Hovey, and Mr. Ben Rucker, and Abner Shackleford, and Levi Bell, and

+Dr. Robinson, and their wives, and the widow Bartley.

+

+Rev. Hobson and Dr. Robinson was down to the end of the town a-hunting

+together—that is, I mean the doctor was shipping a sick man to t'other

+world, and the preacher was pinting him right.  Lawyer Bell was away up

+to Louisville on business.  But the rest was on hand, and so they all

+come and shook hands with the king and thanked him and talked to him;

+and then they shook hands with the duke and didn't say nothing, but just

+kept a-smiling and bobbing their heads like a passel of sapheads whilst

+he made all sorts of signs with his hands and said "Goo-goo—goo-goo-goo"

+all the time, like a baby that can't talk.

+

+So the king he blattered along, and managed to inquire about pretty

+much everybody and dog in town, by his name, and mentioned all sorts

+of little things that happened one time or another in the town, or to

+George's family, or to Peter.  And he always let on that Peter wrote him

+the things; but that was a lie:  he got every blessed one of them out of

+that young flathead that we canoed up to the steamboat.

+

+Then Mary Jane she fetched the letter her father left behind, and the

+king he read it out loud and cried over it.  It give the dwelling-house

+and three thousand dollars, gold, to the girls; and it give the tanyard

+(which was doing a good business), along with some other houses and

+land (worth about seven thousand), and three thousand dollars in gold

+to Harvey and William, and told where the six thousand cash was hid down

+cellar.  So these two frauds said they'd go and fetch it up, and have

+everything square and above-board; and told me to come with a candle.

+ We shut the cellar door behind us, and when they found the bag

+they spilt it out on the floor, and it was a lovely sight, all them

+yaller-boys.  My, the way the king's eyes did shine!  He slaps the duke

+on the shoulder and says:

+

+"Oh, this ain't bully nor noth'n!  Oh, no, I reckon not!  Why,

+bully, it beats the Nonesuch, don't it?"

+

+The duke allowed it did.  They pawed the yaller-boys, and sifted them

+through their fingers and let them jingle down on the floor; and the

+king says:

+

+"It ain't no use talkin'; bein' brothers to a rich dead man and

+representatives of furrin heirs that's got left is the line for you and

+me, Bilge.  Thish yer comes of trust'n to Providence.  It's the best

+way, in the long run.  I've tried 'em all, and ther' ain't no better

+way."

+

+Most everybody would a been satisfied with the pile, and took it on

+trust; but no, they must count it.  So they counts it, and it comes out

+four hundred and fifteen dollars short.  Says the king:

+

+"Dern him, I wonder what he done with that four hundred and fifteen

+dollars?"

+

+They worried over that awhile, and ransacked all around for it.  Then

+the duke says:

+

+"Well, he was a pretty sick man, and likely he made a mistake—I reckon

+that's the way of it.  The best way's to let it go, and keep still about

+it.  We can spare it."

+

+"Oh, shucks, yes, we can spare it.  I don't k'yer noth'n 'bout

+that—it's the count I'm thinkin' about.  We want to be awful square

+and open and above-board here, you know.  We want to lug this h-yer

+money up stairs and count it before everybody—then ther' ain't noth'n

+suspicious.  But when the dead man says ther's six thous'n dollars, you

+know, we don't want to—"

+

+"Hold on," says the duke.  "Le's make up the deffisit," and he begun to

+haul out yaller-boys out of his pocket.

+

+"It's a most amaz'n' good idea, duke—you have got a rattlin' clever

+head on you," says the king.  "Blest if the old Nonesuch ain't a heppin'

+us out agin," and he begun to haul out yaller-jackets and stack them

+up.

+

+It most busted them, but they made up the six thousand clean and clear.

+

+"Say," says the duke, "I got another idea.  Le's go up stairs and count

+this money, and then take and give it to the girls."

+

+"Good land, duke, lemme hug you!  It's the most dazzling idea 'at ever a

+man struck.  You have cert'nly got the most astonishin' head I ever see.

+Oh, this is the boss dodge, ther' ain't no mistake 'bout it.  Let 'em

+fetch along their suspicions now if they want to—this 'll lay 'em out."

+

+When we got up-stairs everybody gethered around the table, and the king

+he counted it and stacked it up, three hundred dollars in a pile—twenty

+elegant little piles.  Everybody looked hungry at it, and licked their

+chops.  Then they raked it into the bag again, and I see the king begin

+to swell himself up for another speech.  He says:

+

+"Friends all, my poor brother that lays yonder has done generous by

+them that's left behind in the vale of sorrers.  He has done generous by

+these yer poor little lambs that he loved and sheltered, and that's left

+fatherless and motherless.  Yes, and we that knowed him knows that he

+would a done more generous by 'em if he hadn't ben afeard o' woundin'

+his dear William and me.  Now, wouldn't he?  Ther' ain't no question

+'bout it in my mind.  Well, then, what kind o' brothers would it be

+that 'd stand in his way at sech a time?  And what kind o' uncles would

+it be that 'd rob—yes, rob—sech poor sweet lambs as these 'at he loved

+so at sech a time?  If I know William—and I think I do—he—well, I'll

+jest ask him." He turns around and begins to make a lot of signs to

+the duke with his hands, and the duke he looks at him stupid and

+leather-headed a while; then all of a sudden he seems to catch his

+meaning, and jumps for the king, goo-gooing with all his might for joy,

+and hugs him about fifteen times before he lets up.  Then the king says,

+"I knowed it; I reckon that 'll convince anybody the way he feels

+about it.  Here, Mary Jane, Susan, Joanner, take the money—take it

+all.  It's the gift of him that lays yonder, cold but joyful."

+

+Mary Jane she went for him, Susan and the hare-lip went for the

+duke, and then such another hugging and kissing I never see yet.  And

+everybody crowded up with the tears in their eyes, and most shook the

+hands off of them frauds, saying all the time:

+

+"You dear good souls!—how lovely!—how could you!"

+

+Well, then, pretty soon all hands got to talking about the diseased

+again, and how good he was, and what a loss he was, and all that; and

+before long a big iron-jawed man worked himself in there from outside,

+and stood a-listening and looking, and not saying anything; and nobody

+saying anything to him either, because the king was talking and they was

+all busy listening.  The king was saying—in the middle of something he'd

+started in on—

+

+"—they bein' partickler friends o' the diseased.  That's why they're

+invited here this evenin'; but tomorrow we want all to come—everybody;

+for he respected everybody, he liked everybody, and so it's fitten that

+his funeral orgies sh'd be public."

+

+And so he went a-mooning on and on, liking to hear himself talk, and

+every little while he fetched in his funeral orgies again, till the duke

+he couldn't stand it no more; so he writes on a little scrap of paper,

+"Obsequies, you old fool," and folds it up, and goes to goo-gooing and

+reaching it over people's heads to him.  The king he reads it and puts

+it in his pocket, and says:

+

+"Poor William, afflicted as he is, his heart's aluz right.  Asks me

+to invite everybody to come to the funeral—wants me to make 'em all

+welcome.  But he needn't a worried—it was jest what I was at."

+

+Then he weaves along again, perfectly ca'm, and goes to dropping in his

+funeral orgies again every now and then, just like he done before.  And

+when he done it the third time he says:

+

+"I say orgies, not because it's the common term, because it

+ain't—obsequies bein' the common term—but because orgies is the right

+term. Obsequies ain't used in England no more now—it's gone out.  We

+say orgies now in England.  Orgies is better, because it means the thing

+you're after more exact.  It's a word that's made up out'n the Greek

+orgo, outside, open, abroad; and the Hebrew jeesum, to plant, cover

+up; hence inter.  So, you see, funeral orgies is an open er public

+funeral."

+

+He was the worst I ever struck.  Well, the iron-jawed man he laughed

+right in his face.  Everybody was shocked.  Everybody says, "Why,

+doctor!" and Abner Shackleford says:

+

+"Why, Robinson, hain't you heard the news?  This is Harvey Wilks."

+

+The king he smiled eager, and shoved out his flapper, and says:

+

+"Is it my poor brother's dear good friend and physician?  I—"

+

+"Keep your hands off of me!" says the doctor.  "You talk like an

+Englishman, don't you?  It's the worst imitation I ever heard.  You

+Peter Wilks's brother!  You're a fraud, that's what you are!"

+

+Well, how they all took on!  They crowded around the doctor and tried to

+quiet him down, and tried to explain to him and tell him how Harvey 'd

+showed in forty ways that he was Harvey, and knowed everybody by name,

+and the names of the very dogs, and begged and begged him not to hurt

+Harvey's feelings and the poor girl's feelings, and all that.  But it

+warn't no use; he stormed right along, and said any man that pretended

+to be an Englishman and couldn't imitate the lingo no better than what

+he did was a fraud and a liar.  The poor girls was hanging to the king

+and crying; and all of a sudden the doctor ups and turns on them.  He

+says:

+

+"I was your father's friend, and I'm your friend; and I warn you as a

+friend, and an honest one that wants to protect you and keep you out of

+harm and trouble, to turn your backs on that scoundrel and have nothing

+to do with him, the ignorant tramp, with his idiotic Greek and Hebrew,

+as he calls it.  He is the thinnest kind of an impostor—has come here

+with a lot of empty names and facts which he picked up somewheres, and

+you take them for proofs, and are helped to fool yourselves by these

+foolish friends here, who ought to know better.  Mary Jane Wilks, you

+know me for your friend, and for your unselfish friend, too.  Now listen

+to me; turn this pitiful rascal out—I beg you to do it.  Will you?"

+

+Mary Jane straightened herself up, and my, but she was handsome!  She

+says:

+

+"Here is my answer."  She hove up the bag of money and put it in the

+king's hands, and says, "Take this six thousand dollars, and invest for

+me and my sisters any way you want to, and don't give us no receipt for

+it."

+

+Then she put her arm around the king on one side, and Susan and the

+hare-lip done the same on the other.  Everybody clapped their hands and

+stomped on the floor like a perfect storm, whilst the king held up his

+head and smiled proud.  The doctor says:

+

+"All right; I wash my hands of the matter.  But I warn you all that a

+time 's coming when you're going to feel sick whenever you think of this

+day." And away he went.

+

+"All right, doctor," says the king, kinder mocking him; "we'll try and

+get 'em to send for you;" which made them all laugh, and they said it

+was a prime good hit.

+

+

+

+

+CHAPTER XXVI.

+

+WELL, when they was all gone the king he asks Mary Jane how they was off

+for spare rooms, and she said she had one spare room, which would do for

+Uncle William, and she'd give her own room to Uncle Harvey, which was

+a little bigger, and she would turn into the room with her sisters and

+sleep on a cot; and up garret was a little cubby, with a pallet in it.

+The king said the cubby would do for his valley—meaning me.

+

+So Mary Jane took us up, and she showed them their rooms, which was

+plain but nice.  She said she'd have her frocks and a lot of other traps

+took out of her room if they was in Uncle Harvey's way, but he said

+they warn't.  The frocks was hung along the wall, and before them was

+a curtain made out of calico that hung down to the floor.  There was an

+old hair trunk in one corner, and a guitar-box in another, and all sorts

+of little knickknacks and jimcracks around, like girls brisken up a room

+with.  The king said it was all the more homely and more pleasanter for

+these fixings, and so don't disturb them.  The duke's room was pretty

+small, but plenty good enough, and so was my cubby.

+

+That night they had a big supper, and all them men and women was there,

+and I stood behind the king and the duke's chairs and waited on them,

+and the niggers waited on the rest.  Mary Jane she set at the head of

+the table, with Susan alongside of her, and said how bad the biscuits

+was, and how mean the preserves was, and how ornery and tough the fried

+chickens was—and all that kind of rot, the way women always do for to

+force out compliments; and the people all knowed everything was tiptop,

+and said so—said "How do you get biscuits to brown so nice?" and

+"Where, for the land's sake, did you get these amaz'n pickles?" and

+all that kind of humbug talky-talk, just the way people always does at a

+supper, you know.

+

+And when it was all done me and the hare-lip had supper in the kitchen

+off of the leavings, whilst the others was helping the niggers clean up

+the things.  The hare-lip she got to pumping me about England, and blest

+if I didn't think the ice was getting mighty thin sometimes.  She says:

+

+"Did you ever see the king?"

+

+"Who?  William Fourth?  Well, I bet I have—he goes to our church."  I

+knowed he was dead years ago, but I never let on.  So when I says he

+goes to our church, she says:

+

+"What—regular?"

+

+"Yes—regular.  His pew's right over opposite ourn—on t'other side the

+pulpit."

+

+"I thought he lived in London?"

+

+"Well, he does.  Where would he live?"

+

+"But I thought you lived in Sheffield?"

+

+I see I was up a stump.  I had to let on to get choked with a chicken

+bone, so as to get time to think how to get down again.  Then I says:

+

+"I mean he goes to our church regular when he's in Sheffield.  That's

+only in the summer time, when he comes there to take the sea baths."

+

+"Why, how you talk—Sheffield ain't on the sea."

+

+"Well, who said it was?"

+

+"Why, you did."

+

+"I didn't nuther."

+

+"You did!"

+

+"I didn't."

+

+"You did."

+

+"I never said nothing of the kind."

+

+"Well, what did you say, then?"

+

+"Said he come to take the sea baths—that's what I said."

+

+"Well, then, how's he going to take the sea baths if it ain't on the

+sea?"

+

+"Looky here," I says; "did you ever see any Congress-water?"

+

+"Yes."

+

+"Well, did you have to go to Congress to get it?"

+

+"Why, no."

+

+"Well, neither does William Fourth have to go to the sea to get a sea

+bath."

+

+"How does he get it, then?"

+

+"Gets it the way people down here gets Congress-water—in barrels.  There

+in the palace at Sheffield they've got furnaces, and he wants his water

+hot.  They can't bile that amount of water away off there at the sea.

+They haven't got no conveniences for it."

+

+"Oh, I see, now.  You might a said that in the first place and saved

+time."

+

+When she said that I see I was out of the woods again, and so I was

+comfortable and glad.  Next, she says:

+

+"Do you go to church, too?"

+

+"Yes—regular."

+

+"Where do you set?"

+

+"Why, in our pew."

+

+"Whose pew?"

+

+"Why, ourn—your Uncle Harvey's."

+

+"His'n?  What does he want with a pew?"

+

+"Wants it to set in.  What did you reckon he wanted with it?"

+

+"Why, I thought he'd be in the pulpit."

+

+Rot him, I forgot he was a preacher.  I see I was up a stump again, so I

+played another chicken bone and got another think.  Then I says:

+

+"Blame it, do you suppose there ain't but one preacher to a church?"

+

+"Why, what do they want with more?"

+

+"What!—to preach before a king?  I never did see such a girl as you.

+They don't have no less than seventeen."

+

+"Seventeen!  My land!  Why, I wouldn't set out such a string as that,

+not if I never got to glory.  It must take 'em a week."

+

+"Shucks, they don't all of 'em preach the same day—only one of 'em."

+

+"Well, then, what does the rest of 'em do?"

+

+"Oh, nothing much.  Loll around, pass the plate—and one thing or

+another.  But mainly they don't do nothing."

+

+"Well, then, what are they for?"

+

+"Why, they're for style.  Don't you know nothing?"

+

+"Well, I don't want to know no such foolishness as that.  How is

+servants treated in England?  Do they treat 'em better 'n we treat our

+niggers?"

+

+"No!  A servant ain't nobody there.  They treat them worse than dogs."

+

+"Don't they give 'em holidays, the way we do, Christmas and New Year's

+week, and Fourth of July?"

+

+"Oh, just listen!  A body could tell you hain't ever been to England

+by that.  Why, Hare-l—why, Joanna, they never see a holiday from year's

+end to year's end; never go to the circus, nor theater, nor nigger

+shows, nor nowheres."

+

+"Nor church?"

+

+"Nor church."

+

+"But you always went to church."

+

+Well, I was gone up again.  I forgot I was the old man's servant.  But

+next minute I whirled in on a kind of an explanation how a valley was

+different from a common servant and had to go to church whether he

+wanted to or not, and set with the family, on account of its being the

+law.  But I didn't do it pretty good, and when I got done I see she

+warn't satisfied.  She says:

+

+"Honest injun, now, hain't you been telling me a lot of lies?"

+

+"Honest injun," says I.

+

+"None of it at all?"

+

+"None of it at all.  Not a lie in it," says I.

+

+"Lay your hand on this book and say it."

+

+I see it warn't nothing but a dictionary, so I laid my hand on it and

+said it.  So then she looked a little better satisfied, and says:

+

+"Well, then, I'll believe some of it; but I hope to gracious if I'll

+believe the rest."

+

+"What is it you won't believe, Joe?" says Mary Jane, stepping in with

+Susan behind her.  "It ain't right nor kind for you to talk so to him,

+and him a stranger and so far from his people.  How would you like to be

+treated so?"

+

+"That's always your way, Maim—always sailing in to help somebody before

+they're hurt.  I hain't done nothing to him.  He's told some stretchers,

+I reckon, and I said I wouldn't swallow it all; and that's every bit

+and grain I did say.  I reckon he can stand a little thing like that,

+can't he?"

+

+"I don't care whether 'twas little or whether 'twas big; he's here in

+our house and a stranger, and it wasn't good of you to say it.  If you

+was in his place it would make you feel ashamed; and so you oughtn't to

+say a thing to another person that will make them feel ashamed."

+

+"Why, Mam, he said—"

+

+"It don't make no difference what he said—that ain't the thing.  The

+thing is for you to treat him kind, and not be saying things to make

+him remember he ain't in his own country and amongst his own folks."

+

+I says to myself, this is a girl that I'm letting that old reptile rob

+her of her money!

+

+Then Susan she waltzed in; and if you'll believe me, she did give

+Hare-lip hark from the tomb!

+

+Says I to myself, and this is another one that I'm letting him rob her

+of her money!

+

+Then Mary Jane she took another inning, and went in sweet and lovely

+again—which was her way; but when she got done there warn't hardly

+anything left o' poor Hare-lip.  So she hollered.

+

+"All right, then," says the other girls; "you just ask his pardon."

+

+She done it, too; and she done it beautiful.  She done it so beautiful

+it was good to hear; and I wished I could tell her a thousand lies, so

+she could do it again.

+

+I says to myself, this is another one that I'm letting him rob her of

+her money.  And when she got through they all jest laid theirselves

+out to make me feel at home and know I was amongst friends.  I felt so

+ornery and low down and mean that I says to myself, my mind's made up;

+I'll hive that money for them or bust.

+

+So then I lit out—for bed, I said, meaning some time or another.  When

+I got by myself I went to thinking the thing over.  I says to myself,

+shall I go to that doctor, private, and blow on these frauds?  No—that

+won't do. He might tell who told him; then the king and the duke would

+make it warm for me.  Shall I go, private, and tell Mary Jane?  No—I

+dasn't do it. Her face would give them a hint, sure; they've got the

+money, and they'd slide right out and get away with it.  If she was to

+fetch in help I'd get mixed up in the business before it was done with,

+I judge.  No; there ain't no good way but one.  I got to steal that

+money, somehow; and I got to steal it some way that they won't suspicion

+that I done it. They've got a good thing here, and they ain't a-going

+to leave till they've played this family and this town for all they're

+worth, so I'll find a chance time enough. I'll steal it and hide it; and

+by and by, when I'm away down the river, I'll write a letter and tell

+Mary Jane where it's hid.  But I better hive it tonight if I can,

+because the doctor maybe hasn't let up as much as he lets on he has; he

+might scare them out of here yet.

+

+So, thinks I, I'll go and search them rooms.  Upstairs the hall was

+dark, but I found the duke's room, and started to paw around it with

+my hands; but I recollected it wouldn't be much like the king to let

+anybody else take care of that money but his own self; so then I went to

+his room and begun to paw around there.  But I see I couldn't do nothing

+without a candle, and I dasn't light one, of course.  So I judged I'd

+got to do the other thing—lay for them and eavesdrop.  About that time

+I hears their footsteps coming, and was going to skip under the bed; I

+reached for it, but it wasn't where I thought it would be; but I touched

+the curtain that hid Mary Jane's frocks, so I jumped in behind that and

+snuggled in amongst the gowns, and stood there perfectly still.

+

+They come in and shut the door; and the first thing the duke done was to

+get down and look under the bed.  Then I was glad I hadn't found the bed

+when I wanted it.  And yet, you know, it's kind of natural to hide under

+the bed when you are up to anything private.  They sets down then, and

+the king says:

+

+"Well, what is it?  And cut it middlin' short, because it's better for

+us to be down there a-whoopin' up the mournin' than up here givin' 'em a

+chance to talk us over."

+

+"Well, this is it, Capet.  I ain't easy; I ain't comfortable.  That

+doctor lays on my mind.  I wanted to know your plans.  I've got a

+notion, and I think it's a sound one."

+

+"What is it, duke?"

+

+"That we better glide out of this before three in the morning, and clip

+it down the river with what we've got.  Specially, seeing we got it so

+easy—given back to us, flung at our heads, as you may say, when of

+course we allowed to have to steal it back.  I'm for knocking off and

+lighting out."

+

+That made me feel pretty bad.  About an hour or two ago it would a been

+a little different, but now it made me feel bad and disappointed.  The

+king rips out and says:

+

+"What!  And not sell out the rest o' the property?  March off like

+a passel of fools and leave eight or nine thous'n' dollars' worth o'

+property layin' around jest sufferin' to be scooped in?—and all good,

+salable stuff, too."

+

+The duke he grumbled; said the bag of gold was enough, and he didn't

+want to go no deeper—didn't want to rob a lot of orphans of everything

+they had.

+

+"Why, how you talk!" says the king.  "We sha'n't rob 'em of nothing at

+all but jest this money.  The people that buys the property is the

+suff'rers; because as soon 's it's found out 'at we didn't own it—which

+won't be long after we've slid—the sale won't be valid, and it 'll all

+go back to the estate.  These yer orphans 'll git their house back agin,

+and that's enough for them; they're young and spry, and k'n easy

+earn a livin'.  They ain't a-goin to suffer.  Why, jest think—there's

+thous'n's and thous'n's that ain't nigh so well off.  Bless you, they

+ain't got noth'n' to complain of."

+

+Well, the king he talked him blind; so at last he give in, and said all

+right, but said he believed it was blamed foolishness to stay, and that

+doctor hanging over them.  But the king says:

+

+"Cuss the doctor!  What do we k'yer for him?  Hain't we got all the

+fools in town on our side?  And ain't that a big enough majority in any

+town?"

+

+So they got ready to go down stairs again.  The duke says:

+

+"I don't think we put that money in a good place."

+

+That cheered me up.  I'd begun to think I warn't going to get a hint of

+no kind to help me.  The king says:

+

+"Why?"

+

+"Because Mary Jane 'll be in mourning from this out; and first you know

+the nigger that does up the rooms will get an order to box these duds

+up and put 'em away; and do you reckon a nigger can run across money and

+not borrow some of it?"

+

+"Your head's level agin, duke," says the king; and he comes a-fumbling

+under the curtain two or three foot from where I was.  I stuck tight to

+the wall and kept mighty still, though quivery; and I wondered what them

+fellows would say to me if they catched me; and I tried to think what

+I'd better do if they did catch me.  But the king he got the bag before

+I could think more than about a half a thought, and he never suspicioned

+I was around.  They took and shoved the bag through a rip in the straw

+tick that was under the feather-bed, and crammed it in a foot or two

+amongst the straw and said it was all right now, because a nigger only

+makes up the feather-bed, and don't turn over the straw tick only about

+twice a year, and so it warn't in no danger of getting stole now.

+

+But I knowed better.  I had it out of there before they was half-way

+down stairs.  I groped along up to my cubby, and hid it there till I

+could get a chance to do better.  I judged I better hide it outside

+of the house somewheres, because if they missed it they would give the

+house a good ransacking:  I knowed that very well.  Then I turned in,

+with my clothes all on; but I couldn't a gone to sleep if I'd a wanted

+to, I was in such a sweat to get through with the business.  By and by I

+heard the king and the duke come up; so I rolled off my pallet and laid

+with my chin at the top of my ladder, and waited to see if anything was

+going to happen.  But nothing did.

+

+So I held on till all the late sounds had quit and the early ones hadn't

+begun yet; and then I slipped down the ladder.

+

+

+

+

+CHAPTER XXVII.

+

+I crept to their doors and listened; they was snoring.  So I tiptoed

+along, and got down stairs all right.  There warn't a sound anywheres.

+ I peeped through a crack of the dining-room door, and see the men that

+was watching the corpse all sound asleep on their chairs.  The door

+was open into the parlor, where the corpse was laying, and there was a

+candle in both rooms. I passed along, and the parlor door was open; but

+I see there warn't nobody in there but the remainders of Peter; so I

+shoved on by; but the front door was locked, and the key wasn't there.

+ Just then I heard somebody coming down the stairs, back behind me.  I

+run in the parlor and took a swift look around, and the only place I

+see to hide the bag was in the coffin.  The lid was shoved along about

+a foot, showing the dead man's face down in there, with a wet cloth over

+it, and his shroud on.  I tucked the money-bag in under the lid, just

+down beyond where his hands was crossed, which made me creep, they was

+so cold, and then I run back across the room and in behind the door.

+

+The person coming was Mary Jane.  She went to the coffin, very soft, and

+kneeled down and looked in; then she put up her handkerchief, and I see

+she begun to cry, though I couldn't hear her, and her back was to me.  I

+slid out, and as I passed the dining-room I thought I'd make sure them

+watchers hadn't seen me; so I looked through the crack, and everything

+was all right.  They hadn't stirred.

+

+I slipped up to bed, feeling ruther blue, on accounts of the thing

+playing out that way after I had took so much trouble and run so much

+resk about it.  Says I, if it could stay where it is, all right; because

+when we get down the river a hundred mile or two I could write back to

+Mary Jane, and she could dig him up again and get it; but that ain't the

+thing that's going to happen; the thing that's going to happen is, the

+money 'll be found when they come to screw on the lid.  Then the king

+'ll get it again, and it 'll be a long day before he gives anybody

+another chance to smouch it from him. Of course I wanted to slide

+down and get it out of there, but I dasn't try it.  Every minute it was

+getting earlier now, and pretty soon some of them watchers would begin

+to stir, and I might get catched—catched with six thousand dollars in my

+hands that nobody hadn't hired me to take care of.  I don't wish to be

+mixed up in no such business as that, I says to myself.

+

+When I got down stairs in the morning the parlor was shut up, and the

+watchers was gone.  There warn't nobody around but the family and the

+widow Bartley and our tribe.  I watched their faces to see if anything

+had been happening, but I couldn't tell.

+

+Towards the middle of the day the undertaker come with his man, and they

+set the coffin in the middle of the room on a couple of chairs, and then

+set all our chairs in rows, and borrowed more from the neighbors till

+the hall and the parlor and the dining-room was full.  I see the coffin

+lid was the way it was before, but I dasn't go to look in under it, with

+folks around.

+

+Then the people begun to flock in, and the beats and the girls took

+seats in the front row at the head of the coffin, and for a half an hour

+the people filed around slow, in single rank, and looked down at the

+dead man's face a minute, and some dropped in a tear, and it was

+all very still and solemn, only the girls and the beats holding

+handkerchiefs to their eyes and keeping their heads bent, and sobbing a

+little.  There warn't no other sound but the scraping of the feet on

+the floor and blowing noses—because people always blows them more at a

+funeral than they do at other places except church.

+

+When the place was packed full the undertaker he slid around in his

+black gloves with his softy soothering ways, putting on the last

+touches, and getting people and things all ship-shape and comfortable,

+and making no more sound than a cat.  He never spoke; he moved people

+around, he squeezed in late ones, he opened up passageways, and done

+it with nods, and signs with his hands.  Then he took his place over

+against the wall. He was the softest, glidingest, stealthiest man I ever

+see; and there warn't no more smile to him than there is to a ham.

+

+They had borrowed a melodeum—a sick one; and when everything was ready

+a young woman set down and worked it, and it was pretty skreeky and

+colicky, and everybody joined in and sung, and Peter was the only one

+that had a good thing, according to my notion.  Then the Reverend Hobson

+opened up, slow and solemn, and begun to talk; and straight off the most

+outrageous row busted out in the cellar a body ever heard; it was only

+one dog, but he made a most powerful racket, and he kept it up right

+along; the parson he had to stand there, over the coffin, and wait—you

+couldn't hear yourself think.  It was right down awkward, and nobody

+didn't seem to know what to do.  But pretty soon they see that

+long-legged undertaker make a sign to the preacher as much as to say,

+"Don't you worry—just depend on me."  Then he stooped down and begun

+to glide along the wall, just his shoulders showing over the people's

+heads.  So he glided along, and the powwow and racket getting more and

+more outrageous all the time; and at last, when he had gone around two

+sides of the room, he disappears down cellar.  Then in about two seconds

+we heard a whack, and the dog he finished up with a most amazing howl or

+two, and then everything was dead still, and the parson begun his solemn

+talk where he left off.  In a minute or two here comes this undertaker's

+back and shoulders gliding along the wall again; and so he glided and

+glided around three sides of the room, and then rose up, and shaded his

+mouth with his hands, and stretched his neck out towards the preacher,

+over the people's heads, and says, in a kind of a coarse whisper, "He

+had a rat!"  Then he drooped down and glided along the wall again to

+his place.  You could see it was a great satisfaction to the people,

+because naturally they wanted to know.  A little thing like that don't

+cost nothing, and it's just the little things that makes a man to be

+looked up to and liked.  There warn't no more popular man in town than

+what that undertaker was.

+

+Well, the funeral sermon was very good, but pison long and tiresome; and

+then the king he shoved in and got off some of his usual rubbage, and

+at last the job was through, and the undertaker begun to sneak up on the

+coffin with his screw-driver.  I was in a sweat then, and watched him

+pretty keen. But he never meddled at all; just slid the lid along as

+soft as mush, and screwed it down tight and fast.  So there I was!  I

+didn't know whether the money was in there or not.  So, says I, s'pose

+somebody has hogged that bag on the sly?—now how do I know whether

+to write to Mary Jane or not? S'pose she dug him up and didn't find

+nothing, what would she think of me? Blame it, I says, I might get

+hunted up and jailed; I'd better lay low and keep dark, and not write at

+all; the thing's awful mixed now; trying to better it, I've worsened it

+a hundred times, and I wish to goodness I'd just let it alone, dad fetch

+the whole business!

+

+They buried him, and we come back home, and I went to watching faces

+again—I couldn't help it, and I couldn't rest easy.  But nothing come of

+it; the faces didn't tell me nothing.

+

+The king he visited around in the evening, and sweetened everybody up,

+and made himself ever so friendly; and he give out the idea that his

+congregation over in England would be in a sweat about him, so he must

+hurry and settle up the estate right away and leave for home.  He was

+very sorry he was so pushed, and so was everybody; they wished he could

+stay longer, but they said they could see it couldn't be done.  And he

+said of course him and William would take the girls home with them; and

+that pleased everybody too, because then the girls would be well fixed

+and amongst their own relations; and it pleased the girls, too—tickled

+them so they clean forgot they ever had a trouble in the world; and told

+him to sell out as quick as he wanted to, they would be ready.  Them

+poor things was that glad and happy it made my heart ache to see them

+getting fooled and lied to so, but I didn't see no safe way for me to

+chip in and change the general tune.

+

+Well, blamed if the king didn't bill the house and the niggers and all

+the property for auction straight off—sale two days after the funeral;

+but anybody could buy private beforehand if they wanted to.

+

+So the next day after the funeral, along about noon-time, the girls' joy

+got the first jolt.  A couple of nigger traders come along, and the king

+sold them the niggers reasonable, for three-day drafts as they called

+it, and away they went, the two sons up the river to Memphis, and their

+mother down the river to Orleans.  I thought them poor girls and them

+niggers would break their hearts for grief; they cried around each

+other, and took on so it most made me down sick to see it.  The girls

+said they hadn't ever dreamed of seeing the family separated or sold

+away from the town.  I can't ever get it out of my memory, the sight of

+them poor miserable girls and niggers hanging around each other's necks

+and crying; and I reckon I couldn't a stood it all, but would a had

+to bust out and tell on our gang if I hadn't knowed the sale warn't no

+account and the niggers would be back home in a week or two.

+

+The thing made a big stir in the town, too, and a good many come out

+flatfooted and said it was scandalous to separate the mother and the

+children that way.  It injured the frauds some; but the old fool he

+bulled right along, spite of all the duke could say or do, and I tell

+you the duke was powerful uneasy.

+

+Next day was auction day.  About broad day in the morning the king and

+the duke come up in the garret and woke me up, and I see by their look

+that there was trouble.  The king says:

+

+"Was you in my room night before last?"

+

+"No, your majesty"—which was the way I always called him when nobody but

+our gang warn't around.

+

+"Was you in there yisterday er last night?"

+

+"No, your majesty."

+

+"Honor bright, now—no lies."

+

+"Honor bright, your majesty, I'm telling you the truth.  I hain't been

+a-near your room since Miss Mary Jane took you and the duke and showed

+it to you."

+

+The duke says:

+

+"Have you seen anybody else go in there?"

+

+"No, your grace, not as I remember, I believe."

+

+"Stop and think."

+

+I studied awhile and see my chance; then I says:

+

+"Well, I see the niggers go in there several times."

+

+Both of them gave a little jump, and looked like they hadn't ever

+expected it, and then like they had.  Then the duke says:

+

+"What, all of them?"

+

+"No—leastways, not all at once—that is, I don't think I ever see them

+all come out at once but just one time."

+

+"Hello!  When was that?"

+

+"It was the day we had the funeral.  In the morning.  It warn't early,

+because I overslept.  I was just starting down the ladder, and I see

+them."

+

+"Well, go on, go on!  What did they do?  How'd they act?"

+

+"They didn't do nothing.  And they didn't act anyway much, as fur as I

+see. They tiptoed away; so I seen, easy enough, that they'd shoved in

+there to do up your majesty's room, or something, s'posing you was up;

+and found you warn't up, and so they was hoping to slide out of the

+way of trouble without waking you up, if they hadn't already waked you

+up."

+

+"Great guns, this is a go!" says the king; and both of them looked

+pretty sick and tolerable silly.  They stood there a-thinking and

+scratching their heads a minute, and the duke he bust into a kind of a

+little raspy chuckle, and says:

+

+"It does beat all how neat the niggers played their hand.  They let on

+to be sorry they was going out of this region!  And I believed they

+was sorry, and so did you, and so did everybody.  Don't ever tell me

+any more that a nigger ain't got any histrionic talent.  Why, the way

+they played that thing it would fool anybody.  In my opinion, there's

+a fortune in 'em.  If I had capital and a theater, I wouldn't want a

+better lay-out than that—and here we've gone and sold 'em for a song.

+ Yes, and ain't privileged to sing the song yet.  Say, where is that

+song—that draft?"

+

+"In the bank for to be collected.  Where would it be?"

+

+"Well, that's all right then, thank goodness."

+

+Says I, kind of timid-like:

+

+"Is something gone wrong?"

+

+The king whirls on me and rips out:

+

+"None o' your business!  You keep your head shet, and mind y'r own

+affairs—if you got any.  Long as you're in this town don't you forgit

+that—you hear?"  Then he says to the duke, "We got to jest swaller it

+and say noth'n':  mum's the word for us."

+

+As they was starting down the ladder the duke he chuckles again, and

+says:

+

+"Quick sales and small profits!  It's a good business—yes."

+

+The king snarls around on him and says:

+

+"I was trying to do for the best in sellin' 'em out so quick.  If the

+profits has turned out to be none, lackin' considable, and none to

+carry, is it my fault any more'n it's yourn?"

+

+"Well, they'd be in this house yet and we wouldn't if I could a got

+my advice listened to."

+

+The king sassed back as much as was safe for him, and then swapped

+around and lit into me again.  He give me down the banks for not

+coming and telling him I see the niggers come out of his room acting

+that way—said any fool would a knowed something was up.  And then

+waltzed in and cussed himself awhile, and said it all come of him not

+laying late and taking his natural rest that morning, and he'd be

+blamed if he'd ever do it again.  So they went off a-jawing; and I felt

+dreadful glad I'd worked it all off on to the niggers, and yet hadn't

+done the niggers no harm by it.

+

+

+

+

+CHAPTER XXVIII.

+

+BY and by it was getting-up time.  So I come down the ladder and started

+for down-stairs; but as I come to the girls' room the door was open, and

+I see Mary Jane setting by her old hair trunk, which was open and she'd

+been packing things in it—getting ready to go to England.  But she

+had stopped now with a folded gown in her lap, and had her face in her

+hands, crying.  I felt awful bad to see it; of course anybody would.  I

+went in there and says:

+

+"Miss Mary Jane, you can't a-bear to see people in trouble, and I

+can't—most always.  Tell me about it."

+

+So she done it.  And it was the niggers—I just expected it.  She said

+the beautiful trip to England was most about spoiled for her; she didn't

+know how she was ever going to be happy there, knowing the mother and

+the children warn't ever going to see each other no more—and then busted

+out bitterer than ever, and flung up her hands, and says:

+

+"Oh, dear, dear, to think they ain't ever going to see each other any

+more!"

+

+"But they will—and inside of two weeks—and I know it!" says I.

+

+Laws, it was out before I could think!  And before I could budge she

+throws her arms around my neck and told me to say it again, say it

+again, say it again!

+

+I see I had spoke too sudden and said too much, and was in a close

+place. I asked her to let me think a minute; and she set there, very

+impatient and excited and handsome, but looking kind of happy and

+eased-up, like a person that's had a tooth pulled out.  So I went to

+studying it out.  I says to myself, I reckon a body that ups and tells

+the truth when he is in a tight place is taking considerable many resks,

+though I ain't had no experience, and can't say for certain; but it

+looks so to me, anyway; and yet here's a case where I'm blest if it

+don't look to me like the truth is better and actuly safer than a lie.

+ I must lay it by in my mind, and think it over some time or other, it's

+so kind of strange and unregular. I never see nothing like it.  Well, I

+says to myself at last, I'm a-going to chance it; I'll up and tell the

+truth this time, though it does seem most like setting down on a kag of

+powder and touching it off just to see where you'll go to. Then I says:

+

+"Miss Mary Jane, is there any place out of town a little ways where you

+could go and stay three or four days?"

+

+"Yes; Mr. Lothrop's.  Why?"

+

+"Never mind why yet.  If I'll tell you how I know the niggers will see

+each other again inside of two weeks—here in this house—and prove how

+I know it—will you go to Mr. Lothrop's and stay four days?"

+

+"Four days!" she says; "I'll stay a year!"

+

+"All right," I says, "I don't want nothing more out of you than just

+your word—I druther have it than another man's kiss-the-Bible."  She

+smiled and reddened up very sweet, and I says, "If you don't mind it,

+I'll shut the door—and bolt it."

+

+Then I come back and set down again, and says:

+

+"Don't you holler.  Just set still and take it like a man.  I got to

+tell the truth, and you want to brace up, Miss Mary, because it's a

+bad kind, and going to be hard to take, but there ain't no help for

+it.  These uncles of yourn ain't no uncles at all; they're a couple of

+frauds—regular dead-beats.  There, now we're over the worst of it, you

+can stand the rest middling easy."

+

+It jolted her up like everything, of course; but I was over the shoal

+water now, so I went right along, her eyes a-blazing higher and higher

+all the time, and told her every blame thing, from where we first struck

+that young fool going up to the steamboat, clear through to where she

+flung herself on to the king's breast at the front door and he kissed

+her sixteen or seventeen times—and then up she jumps, with her face

+afire like sunset, and says:

+

+"The brute!  Come, don't waste a minute—not a second—we'll have them

+tarred and feathered, and flung in the river!"

+

+Says I:

+

+"Cert'nly.  But do you mean before you go to Mr. Lothrop's, or—"

+

+"Oh," she says, "what am I thinking about!" she says, and set right

+down again.  "Don't mind what I said—please don't—you won't, now,

+will you?" Laying her silky hand on mine in that kind of a way that

+I said I would die first.  "I never thought, I was so stirred up," she

+says; "now go on, and I won't do so any more.  You tell me what to do,

+and whatever you say I'll do it."

+

+"Well," I says, "it's a rough gang, them two frauds, and I'm fixed so

+I got to travel with them a while longer, whether I want to or not—I

+druther not tell you why; and if you was to blow on them this town would

+get me out of their claws, and I'd be all right; but there'd be another

+person that you don't know about who'd be in big trouble.  Well, we

+got to save him, hain't we?  Of course.  Well, then, we won't blow on

+them."

+

+Saying them words put a good idea in my head.  I see how maybe I could

+get me and Jim rid of the frauds; get them jailed here, and then leave.

+But I didn't want to run the raft in the daytime without anybody aboard

+to answer questions but me; so I didn't want the plan to begin working

+till pretty late to-night.  I says:

+

+"Miss Mary Jane, I'll tell you what we'll do, and you won't have to stay

+at Mr. Lothrop's so long, nuther.  How fur is it?"

+

+"A little short of four miles—right out in the country, back here."

+

+"Well, that 'll answer.  Now you go along out there, and lay low

+till nine or half-past to-night, and then get them to fetch you home

+again—tell them you've thought of something.  If you get here before

+eleven put a candle in this window, and if I don't turn up wait till

+eleven, and then if I don't turn up it means I'm gone, and out of the

+way, and safe. Then you come out and spread the news around, and get

+these beats jailed."

+

+"Good," she says, "I'll do it."

+

+"And if it just happens so that I don't get away, but get took up along

+with them, you must up and say I told you the whole thing beforehand,

+and you must stand by me all you can."

+

+"Stand by you! indeed I will.  They sha'n't touch a hair of your head!"

+she says, and I see her nostrils spread and her eyes snap when she said

+it, too.

+

+"If I get away I sha'n't be here," I says, "to prove these rapscallions

+ain't your uncles, and I couldn't do it if I was here.  I could swear

+they was beats and bummers, that's all, though that's worth something.

+Well, there's others can do that better than what I can, and they're

+people that ain't going to be doubted as quick as I'd be.  I'll tell you

+how to find them.  Gimme a pencil and a piece of paper.  There—'Royal

+Nonesuch, Bricksville.'  Put it away, and don't lose it.  When the

+court wants to find out something about these two, let them send up to

+Bricksville and say they've got the men that played the Royal Nonesuch,

+and ask for some witnesses—why, you'll have that entire town down here

+before you can hardly wink, Miss Mary.  And they'll come a-biling, too."

+

+I judged we had got everything fixed about right now.  So I says:

+

+"Just let the auction go right along, and don't worry.  Nobody don't

+have to pay for the things they buy till a whole day after the auction

+on accounts of the short notice, and they ain't going out of this till

+they get that money; and the way we've fixed it the sale ain't going to

+count, and they ain't going to get no money.  It's just like the way

+it was with the niggers—it warn't no sale, and the niggers will be

+back before long.  Why, they can't collect the money for the niggers

+yet—they're in the worst kind of a fix, Miss Mary."

+

+"Well," she says, "I'll run down to breakfast now, and then I'll start

+straight for Mr. Lothrop's."

+

+"'Deed, that ain't the ticket, Miss Mary Jane," I says, "by no manner

+of means; go before breakfast."

+

+"Why?"

+

+"What did you reckon I wanted you to go at all for, Miss Mary?"

+

+"Well, I never thought—and come to think, I don't know.  What was it?"

+

+"Why, it's because you ain't one of these leather-face people.  I don't

+want no better book than what your face is.  A body can set down and

+read it off like coarse print.  Do you reckon you can go and face your

+uncles when they come to kiss you good-morning, and never—"

+

+"There, there, don't!  Yes, I'll go before breakfast—I'll be glad to.

+And leave my sisters with them?"

+

+"Yes; never mind about them.  They've got to stand it yet a while.  They

+might suspicion something if all of you was to go.  I don't want you to

+see them, nor your sisters, nor nobody in this town; if a neighbor was

+to ask how is your uncles this morning your face would tell something.

+ No, you go right along, Miss Mary Jane, and I'll fix it with all of

+them. I'll tell Miss Susan to give your love to your uncles and say

+you've went away for a few hours for to get a little rest and change, or

+to see a friend, and you'll be back to-night or early in the morning."

+

+"Gone to see a friend is all right, but I won't have my love given to

+them."

+

+"Well, then, it sha'n't be."  It was well enough to tell her so—no

+harm in it.  It was only a little thing to do, and no trouble; and it's

+the little things that smooths people's roads the most, down here below;

+it would make Mary Jane comfortable, and it wouldn't cost nothing.  Then

+I says:  "There's one more thing—that bag of money."

+

+"Well, they've got that; and it makes me feel pretty silly to think

+how they got it."

+

+"No, you're out, there.  They hain't got it."

+

+"Why, who's got it?"

+

+"I wish I knowed, but I don't.  I had it, because I stole it from

+them; and I stole it to give to you; and I know where I hid it, but I'm

+afraid it ain't there no more.  I'm awful sorry, Miss Mary Jane, I'm

+just as sorry as I can be; but I done the best I could; I did honest.  I

+come nigh getting caught, and I had to shove it into the first place I

+come to, and run—and it warn't a good place."

+

+"Oh, stop blaming yourself—it's too bad to do it, and I won't allow

+it—you couldn't help it; it wasn't your fault.  Where did you hide it?"

+

+I didn't want to set her to thinking about her troubles again; and I

+couldn't seem to get my mouth to tell her what would make her see that

+corpse laying in the coffin with that bag of money on his stomach.  So

+for a minute I didn't say nothing; then I says:

+

+"I'd ruther not tell you where I put it, Miss Mary Jane, if you don't

+mind letting me off; but I'll write it for you on a piece of paper, and

+you can read it along the road to Mr. Lothrop's, if you want to.  Do you

+reckon that 'll do?"

+

+"Oh, yes."

+

+So I wrote:  "I put it in the coffin.  It was in there when you was

+crying there, away in the night.  I was behind the door, and I was

+mighty sorry for you, Miss Mary Jane."

+

+It made my eyes water a little to remember her crying there all by

+herself in the night, and them devils laying there right under her own

+roof, shaming her and robbing her; and when I folded it up and give it

+to her I see the water come into her eyes, too; and she shook me by the

+hand, hard, and says:

+

+"Good-bye.  I'm going to do everything just as you've told me; and if

+I don't ever see you again, I sha'n't ever forget you and I'll think of

+you a many and a many a time, and I'll pray for you, too!"—and she was

+gone.

+

+Pray for me!  I reckoned if she knowed me she'd take a job that was more

+nearer her size.  But I bet she done it, just the same—she was just that

+kind.  She had the grit to pray for Judus if she took the notion—there

+warn't no back-down to her, I judge.  You may say what you want to, but

+in my opinion she had more sand in her than any girl I ever see; in

+my opinion she was just full of sand.  It sounds like flattery, but it

+ain't no flattery.  And when it comes to beauty—and goodness, too—she

+lays over them all.  I hain't ever seen her since that time that I see

+her go out of that door; no, I hain't ever seen her since, but I reckon

+I've thought of her a many and a many a million times, and of her saying

+she would pray for me; and if ever I'd a thought it would do any good

+for me to pray for her, blamed if I wouldn't a done it or bust.

+

+Well, Mary Jane she lit out the back way, I reckon; because nobody see

+her go.  When I struck Susan and the hare-lip, I says:

+

+"What's the name of them people over on t'other side of the river that

+you all goes to see sometimes?"

+

+They says:

+

+"There's several; but it's the Proctors, mainly."

+

+"That's the name," I says; "I most forgot it.  Well, Miss Mary Jane she

+told me to tell you she's gone over there in a dreadful hurry—one of

+them's sick."

+

+"Which one?"

+

+"I don't know; leastways, I kinder forget; but I thinks it's—"

+

+"Sakes alive, I hope it ain't Hanner?"

+

+"I'm sorry to say it," I says, "but Hanner's the very one."

+

+"My goodness, and she so well only last week!  Is she took bad?"

+

+"It ain't no name for it.  They set up with her all night, Miss Mary

+Jane said, and they don't think she'll last many hours."

+

+"Only think of that, now!  What's the matter with her?"

+

+I couldn't think of anything reasonable, right off that way, so I says:

+

+"Mumps."

+

+"Mumps your granny!  They don't set up with people that's got the

+mumps."

+

+"They don't, don't they?  You better bet they do with these mumps.

+ These mumps is different.  It's a new kind, Miss Mary Jane said."

+

+"How's it a new kind?"

+

+"Because it's mixed up with other things."

+

+"What other things?"

+

+"Well, measles, and whooping-cough, and erysiplas, and consumption, and

+yaller janders, and brain-fever, and I don't know what all."

+

+"My land!  And they call it the mumps?"

+

+"That's what Miss Mary Jane said."

+

+"Well, what in the nation do they call it the mumps for?"

+

+"Why, because it is the mumps.  That's what it starts with."

+

+"Well, ther' ain't no sense in it.  A body might stump his toe, and take

+pison, and fall down the well, and break his neck, and bust his brains

+out, and somebody come along and ask what killed him, and some numskull

+up and say, 'Why, he stumped his toe.'  Would ther' be any sense

+in that? No.  And ther' ain't no sense in this, nuther.  Is it

+ketching?"

+

+"Is it ketching?  Why, how you talk.  Is a harrow catching—in the

+dark? If you don't hitch on to one tooth, you're bound to on another,

+ain't you? And you can't get away with that tooth without fetching the

+whole harrow along, can you?  Well, these kind of mumps is a kind of a

+harrow, as you may say—and it ain't no slouch of a harrow, nuther, you

+come to get it hitched on good."

+

+"Well, it's awful, I think," says the hare-lip.  "I'll go to Uncle

+Harvey and—"

+

+"Oh, yes," I says, "I would.  Of course I would.  I wouldn't lose no

+time."

+

+"Well, why wouldn't you?"

+

+"Just look at it a minute, and maybe you can see.  Hain't your uncles

+obleegd to get along home to England as fast as they can?  And do you

+reckon they'd be mean enough to go off and leave you to go all that

+journey by yourselves?  you know they'll wait for you.  So fur, so

+good. Your uncle Harvey's a preacher, ain't he?  Very well, then; is a

+preacher going to deceive a steamboat clerk? is he going to deceive

+a ship clerk?—so as to get them to let Miss Mary Jane go aboard?  Now

+you know he ain't.  What will he do, then?  Why, he'll say, 'It's a

+great pity, but my church matters has got to get along the best way they

+can; for my niece has been exposed to the dreadful pluribus-unum mumps,

+and so it's my bounden duty to set down here and wait the three months

+it takes to show on her if she's got it.'  But never mind, if you think

+it's best to tell your uncle Harvey—"

+

+"Shucks, and stay fooling around here when we could all be having good

+times in England whilst we was waiting to find out whether Mary Jane's

+got it or not?  Why, you talk like a muggins."

+

+"Well, anyway, maybe you'd better tell some of the neighbors."

+

+"Listen at that, now.  You do beat all for natural stupidness.  Can't

+you see that they'd go and tell?  Ther' ain't no way but just to not

+tell anybody at all."

+

+"Well, maybe you're right—yes, I judge you are right."

+

+"But I reckon we ought to tell Uncle Harvey she's gone out a while,

+anyway, so he won't be uneasy about her?"

+

+"Yes, Miss Mary Jane she wanted you to do that.  She says, 'Tell them to

+give Uncle Harvey and William my love and a kiss, and say I've run over

+the river to see Mr.'—Mr.—what is the name of that rich family your

+uncle Peter used to think so much of?—I mean the one that—"

+

+"Why, you must mean the Apthorps, ain't it?"

+

+"Of course; bother them kind of names, a body can't ever seem to

+remember them, half the time, somehow.  Yes, she said, say she has run

+over for to ask the Apthorps to be sure and come to the auction and buy

+this house, because she allowed her uncle Peter would ruther they had

+it than anybody else; and she's going to stick to them till they say

+they'll come, and then, if she ain't too tired, she's coming home; and

+if she is, she'll be home in the morning anyway.  She said, don't say

+nothing about the Proctors, but only about the Apthorps—which 'll be

+perfectly true, because she is going there to speak about their buying

+the house; I know it, because she told me so herself."

+

+"All right," they said, and cleared out to lay for their uncles, and

+give them the love and the kisses, and tell them the message.

+

+Everything was all right now.  The girls wouldn't say nothing because

+they wanted to go to England; and the king and the duke would ruther

+Mary Jane was off working for the auction than around in reach of

+Doctor Robinson.  I felt very good; I judged I had done it pretty neat—I

+reckoned Tom Sawyer couldn't a done it no neater himself.  Of course he

+would a throwed more style into it, but I can't do that very handy, not

+being brung up to it.

+

+Well, they held the auction in the public square, along towards the end

+of the afternoon, and it strung along, and strung along, and the old man

+he was on hand and looking his level pisonest, up there longside of the

+auctioneer, and chipping in a little Scripture now and then, or a little

+goody-goody saying of some kind, and the duke he was around goo-gooing

+for sympathy all he knowed how, and just spreading himself generly.

+

+But by and by the thing dragged through, and everything was

+sold—everything but a little old trifling lot in the graveyard.  So

+they'd got to work that off—I never see such a girafft as the king was

+for wanting to swallow everything.  Well, whilst they was at it a

+steamboat landed, and in about two minutes up comes a crowd a-whooping

+and yelling and laughing and carrying on, and singing out:

+

+"Here's your opposition line! here's your two sets o' heirs to old

+Peter Wilks—and you pays your money and you takes your choice!"

+

+

+

+

+CHAPTER XXIX.

+

+THEY was fetching a very nice-looking old gentleman along, and a

+nice-looking younger one, with his right arm in a sling.  And, my souls,

+how the people yelled and laughed, and kept it up.  But I didn't see no

+joke about it, and I judged it would strain the duke and the king some

+to see any.  I reckoned they'd turn pale.  But no, nary a pale did

+they turn. The duke he never let on he suspicioned what was up, but

+just went a goo-gooing around, happy and satisfied, like a jug that's

+googling out buttermilk; and as for the king, he just gazed and gazed

+down sorrowful on them new-comers like it give him the stomach-ache in

+his very heart to think there could be such frauds and rascals in the

+world.  Oh, he done it admirable.  Lots of the principal people

+gethered around the king, to let him see they was on his side.  That old

+gentleman that had just come looked all puzzled to death.  Pretty

+soon he begun to speak, and I see straight off he pronounced like an

+Englishman—not the king's way, though the king's was pretty good for

+an imitation.  I can't give the old gent's words, nor I can't imitate

+him; but he turned around to the crowd, and says, about like this:

+

+"This is a surprise to me which I wasn't looking for; and I'll

+acknowledge, candid and frank, I ain't very well fixed to meet it and

+answer it; for my brother and me has had misfortunes; he's broke his

+arm, and our baggage got put off at a town above here last night in the

+night by a mistake.  I am Peter Wilks' brother Harvey, and this is his

+brother William, which can't hear nor speak—and can't even make signs to

+amount to much, now't he's only got one hand to work them with.  We are

+who we say we are; and in a day or two, when I get the baggage, I can

+prove it. But up till then I won't say nothing more, but go to the hotel

+and wait."

+

+So him and the new dummy started off; and the king he laughs, and

+blethers out:

+

+"Broke his arm—very likely, ain't it?—and very convenient, too,

+for a fraud that's got to make signs, and ain't learnt how.  Lost

+their baggage! That's mighty good!—and mighty ingenious—under the

+circumstances!"

+

+So he laughed again; and so did everybody else, except three or four,

+or maybe half a dozen.  One of these was that doctor; another one was

+a sharp-looking gentleman, with a carpet-bag of the old-fashioned kind

+made out of carpet-stuff, that had just come off of the steamboat and

+was talking to him in a low voice, and glancing towards the king now and

+then and nodding their heads—it was Levi Bell, the lawyer that was gone

+up to Louisville; and another one was a big rough husky that come along

+and listened to all the old gentleman said, and was listening to the

+king now. And when the king got done this husky up and says:

+

+"Say, looky here; if you are Harvey Wilks, when'd you come to this

+town?"

+

+"The day before the funeral, friend," says the king.

+

+"But what time o' day?"

+

+"In the evenin'—'bout an hour er two before sundown."

+

+"How'd you come?"

+

+"I come down on the Susan Powell from Cincinnati."

+

+"Well, then, how'd you come to be up at the Pint in the mornin'—in a

+canoe?"

+

+"I warn't up at the Pint in the mornin'."

+

+"It's a lie."

+

+Several of them jumped for him and begged him not to talk that way to an

+old man and a preacher.

+

+"Preacher be hanged, he's a fraud and a liar.  He was up at the Pint

+that mornin'.  I live up there, don't I?  Well, I was up there, and

+he was up there.  I see him there.  He come in a canoe, along with Tim

+Collins and a boy."

+

+The doctor he up and says:

+

+"Would you know the boy again if you was to see him, Hines?"

+

+"I reckon I would, but I don't know.  Why, yonder he is, now.  I know

+him perfectly easy."

+

+It was me he pointed at.  The doctor says:

+

+"Neighbors, I don't know whether the new couple is frauds or not; but if

+these two ain't frauds, I am an idiot, that's all.  I think it's our

+duty to see that they don't get away from here till we've looked into

+this thing. Come along, Hines; come along, the rest of you.  We'll take

+these fellows to the tavern and affront them with t'other couple, and I

+reckon we'll find out something before we get through."

+

+It was nuts for the crowd, though maybe not for the king's friends; so

+we all started.  It was about sundown.  The doctor he led me along by

+the hand, and was plenty kind enough, but he never let go my hand.

+

+We all got in a big room in the hotel, and lit up some candles, and

+fetched in the new couple.  First, the doctor says:

+

+"I don't wish to be too hard on these two men, but I think they're

+frauds, and they may have complices that we don't know nothing about.

+ If they have, won't the complices get away with that bag of gold Peter

+Wilks left?  It ain't unlikely.  If these men ain't frauds, they won't

+object to sending for that money and letting us keep it till they prove

+they're all right—ain't that so?"

+

+Everybody agreed to that.  So I judged they had our gang in a pretty

+tight place right at the outstart.  But the king he only looked

+sorrowful, and says:

+

+"Gentlemen, I wish the money was there, for I ain't got no disposition

+to throw anything in the way of a fair, open, out-and-out investigation

+o' this misable business; but, alas, the money ain't there; you k'n send

+and see, if you want to."

+

+"Where is it, then?"

+

+"Well, when my niece give it to me to keep for her I took and hid it

+inside o' the straw tick o' my bed, not wishin' to bank it for the few

+days we'd be here, and considerin' the bed a safe place, we not bein'

+used to niggers, and suppos'n' 'em honest, like servants in England.

+ The niggers stole it the very next mornin' after I had went down

+stairs; and when I sold 'em I hadn't missed the money yit, so they got

+clean away with it.  My servant here k'n tell you 'bout it, gentlemen."

+

+The doctor and several said "Shucks!" and I see nobody didn't altogether

+believe him.  One man asked me if I see the niggers steal it.  I said

+no, but I see them sneaking out of the room and hustling away, and I

+never thought nothing, only I reckoned they was afraid they had waked up

+my master and was trying to get away before he made trouble with them.

+ That was all they asked me.  Then the doctor whirls on me and says:

+

+"Are you English, too?"

+

+I says yes; and him and some others laughed, and said, "Stuff!"

+

+Well, then they sailed in on the general investigation, and there we had

+it, up and down, hour in, hour out, and nobody never said a word about

+supper, nor ever seemed to think about it—and so they kept it up, and

+kept it up; and it was the worst mixed-up thing you ever see.  They

+made the king tell his yarn, and they made the old gentleman tell his'n;

+and anybody but a lot of prejudiced chuckleheads would a seen that the

+old gentleman was spinning truth and t'other one lies.  And by and by

+they had me up to tell what I knowed.  The king he give me a left-handed

+look out of the corner of his eye, and so I knowed enough to talk on the

+right side.  I begun to tell about Sheffield, and how we lived there,

+and all about the English Wilkses, and so on; but I didn't get pretty

+fur till the doctor begun to laugh; and Levi Bell, the lawyer, says:

+

+"Set down, my boy; I wouldn't strain myself if I was you.  I reckon

+you ain't used to lying, it don't seem to come handy; what you want is

+practice.  You do it pretty awkward."

+

+I didn't care nothing for the compliment, but I was glad to be let off,

+anyway.

+

+The doctor he started to say something, and turns and says:

+

+"If you'd been in town at first, Levi Bell—" The king broke in and

+reached out his hand, and says:

+

+"Why, is this my poor dead brother's old friend that he's wrote so often

+about?"

+

+The lawyer and him shook hands, and the lawyer smiled and looked

+pleased, and they talked right along awhile, and then got to one side

+and talked low; and at last the lawyer speaks up and says:

+

+"That 'll fix it.  I'll take the order and send it, along with your

+brother's, and then they'll know it's all right."

+

+So they got some paper and a pen, and the king he set down and twisted

+his head to one side, and chawed his tongue, and scrawled off something;

+and then they give the pen to the duke—and then for the first time the

+duke looked sick.  But he took the pen and wrote.  So then the lawyer

+turns to the new old gentleman and says:

+

+"You and your brother please write a line or two and sign your names."

+

+The old gentleman wrote, but nobody couldn't read it.  The lawyer looked

+powerful astonished, and says:

+

+"Well, it beats me"—and snaked a lot of old letters out of his pocket,

+and examined them, and then examined the old man's writing, and then

+them again; and then says:  "These old letters is from Harvey Wilks;

+and here's these two handwritings, and anybody can see they didn't

+write them" (the king and the duke looked sold and foolish, I tell

+you, to see how the lawyer had took them in), "and here's this old

+gentleman's hand writing, and anybody can tell, easy enough, he didn't

+write them—fact is, the scratches he makes ain't properly writing at

+all.  Now, here's some letters from—"

+

+The new old gentleman says:

+

+"If you please, let me explain.  Nobody can read my hand but my brother

+there—so he copies for me.  It's his hand you've got there, not mine."

+

+"Well!" says the lawyer, "this is a state of things.  I've got some

+of William's letters, too; so if you'll get him to write a line or so we

+can com—"

+

+"He can't write with his left hand," says the old gentleman.  "If he

+could use his right hand, you would see that he wrote his own letters

+and mine too.  Look at both, please—they're by the same hand."

+

+The lawyer done it, and says:

+

+"I believe it's so—and if it ain't so, there's a heap stronger

+resemblance than I'd noticed before, anyway.  Well, well, well!  I

+thought we was right on the track of a solution, but it's gone to grass,

+partly.  But anyway, one thing is proved—these two ain't either of 'em

+Wilkses"—and he wagged his head towards the king and the duke.

+

+Well, what do you think?  That muleheaded old fool wouldn't give in

+then! Indeed he wouldn't.  Said it warn't no fair test.  Said his

+brother William was the cussedest joker in the world, and hadn't tried

+to write—he see William was going to play one of his jokes the minute

+he put the pen to paper.  And so he warmed up and went warbling and

+warbling right along till he was actuly beginning to believe what he was

+saying himself; but pretty soon the new gentleman broke in, and says:

+

+"I've thought of something.  Is there anybody here that helped to lay

+out my br—helped to lay out the late Peter Wilks for burying?"

+

+"Yes," says somebody, "me and Ab Turner done it.  We're both here."

+

+Then the old man turns towards the king, and says:

+

+"Perhaps this gentleman can tell me what was tattooed on his breast?"

+

+Blamed if the king didn't have to brace up mighty quick, or he'd a

+squshed down like a bluff bank that the river has cut under, it took

+him so sudden; and, mind you, it was a thing that was calculated to make

+most anybody sqush to get fetched such a solid one as that without any

+notice, because how was he going to know what was tattooed on the man?

+ He whitened a little; he couldn't help it; and it was mighty still in

+there, and everybody bending a little forwards and gazing at him.  Says

+I to myself, now he'll throw up the sponge—there ain't no more use.

+ Well, did he?  A body can't hardly believe it, but he didn't.  I reckon

+he thought he'd keep the thing up till he tired them people out, so

+they'd thin out, and him and the duke could break loose and get away.

+ Anyway, he set there, and pretty soon he begun to smile, and says:

+

+"Mf!  It's a very tough question, ain't it!  yes, sir, I k'n

+tell you what's tattooed on his breast.  It's jest a small, thin, blue

+arrow—that's what it is; and if you don't look clost, you can't see it.

+ now what do you say—hey?"

+

+Well, I never see anything like that old blister for clean out-and-out

+cheek.

+

+The new old gentleman turns brisk towards Ab Turner and his pard, and

+his eye lights up like he judged he'd got the king this time, and

+says:

+

+"There—you've heard what he said!  Was there any such mark on Peter

+Wilks' breast?"

+

+Both of them spoke up and says:

+

+"We didn't see no such mark."

+

+"Good!" says the old gentleman.  "Now, what you did see on his breast

+was a small dim P, and a B (which is an initial he dropped when he was

+young), and a W, with dashes between them, so:  P—B—W"—and he marked

+them that way on a piece of paper.  "Come, ain't that what you saw?"

+

+Both of them spoke up again, and says:

+

+"No, we didn't.  We never seen any marks at all."

+

+Well, everybody was in a state of mind now, and they sings out:

+

+"The whole bilin' of 'm 's frauds!  Le's duck 'em! le's drown 'em!

+le's ride 'em on a rail!" and everybody was whooping at once, and there

+was a rattling powwow.  But the lawyer he jumps on the table and yells,

+and says:

+

+"Gentlemen—gentlemen!  Hear me just a word—just a single word—if you

+please!  There's one way yet—let's go and dig up the corpse and look."

+

+That took them.

+

+"Hooray!" they all shouted, and was starting right off; but the lawyer

+and the doctor sung out:

+

+"Hold on, hold on!  Collar all these four men and the boy, and fetch

+them along, too!"

+

+"We'll do it!" they all shouted; "and if we don't find them marks we'll

+lynch the whole gang!"

+

+I was scared, now, I tell you.  But there warn't no getting away, you

+know. They gripped us all, and marched us right along, straight for the

+graveyard, which was a mile and a half down the river, and the whole

+town at our heels, for we made noise enough, and it was only nine in the

+evening.

+

+As we went by our house I wished I hadn't sent Mary Jane out of town;

+because now if I could tip her the wink she'd light out and save me, and

+blow on our dead-beats.

+

+Well, we swarmed along down the river road, just carrying on like

+wildcats; and to make it more scary the sky was darking up, and the

+lightning beginning to wink and flitter, and the wind to shiver amongst

+the leaves. This was the most awful trouble and most dangersome I ever

+was in; and I was kinder stunned; everything was going so different from

+what I had allowed for; stead of being fixed so I could take my own time

+if I wanted to, and see all the fun, and have Mary Jane at my back to

+save me and set me free when the close-fit come, here was nothing in the

+world betwixt me and sudden death but just them tattoo-marks.  If they

+didn't find them—

+

+I couldn't bear to think about it; and yet, somehow, I couldn't think

+about nothing else.  It got darker and darker, and it was a beautiful

+time to give the crowd the slip; but that big husky had me by the

+wrist—Hines—and a body might as well try to give Goliar the slip.  He

+dragged me right along, he was so excited, and I had to run to keep up.

+

+When they got there they swarmed into the graveyard and washed over it

+like an overflow.  And when they got to the grave they found they had

+about a hundred times as many shovels as they wanted, but nobody hadn't

+thought to fetch a lantern.  But they sailed into digging anyway by the

+flicker of the lightning, and sent a man to the nearest house, a half a

+mile off, to borrow one.

+

+So they dug and dug like everything; and it got awful dark, and the rain

+started, and the wind swished and swushed along, and the lightning come

+brisker and brisker, and the thunder boomed; but them people never took

+no notice of it, they was so full of this business; and one minute

+you could see everything and every face in that big crowd, and the

+shovelfuls of dirt sailing up out of the grave, and the next second the

+dark wiped it all out, and you couldn't see nothing at all.

+

+At last they got out the coffin and begun to unscrew the lid, and then

+such another crowding and shouldering and shoving as there was, to

+scrouge in and get a sight, you never see; and in the dark, that way, it

+was awful.  Hines he hurt my wrist dreadful pulling and tugging so,

+and I reckon he clean forgot I was in the world, he was so excited and

+panting.

+

+All of a sudden the lightning let go a perfect sluice of white glare,

+and somebody sings out:

+

+"By the living jingo, here's the bag of gold on his breast!"

+

+Hines let out a whoop, like everybody else, and dropped my wrist and

+give a big surge to bust his way in and get a look, and the way I lit

+out and shinned for the road in the dark there ain't nobody can tell.

+

+I had the road all to myself, and I fairly flew—leastways, I had it all

+to myself except the solid dark, and the now-and-then glares, and the

+buzzing of the rain, and the thrashing of the wind, and the splitting of

+the thunder; and sure as you are born I did clip it along!

+

+When I struck the town I see there warn't nobody out in the storm, so

+I never hunted for no back streets, but humped it straight through the

+main one; and when I begun to get towards our house I aimed my eye and

+set it. No light there; the house all dark—which made me feel sorry and

+disappointed, I didn't know why.  But at last, just as I was sailing by,

+flash comes the light in Mary Jane's window! and my heart swelled up

+sudden, like to bust; and the same second the house and all was behind

+me in the dark, and wasn't ever going to be before me no more in this

+world. She was the best girl I ever see, and had the most sand.

+

+The minute I was far enough above the town to see I could make the

+towhead, I begun to look sharp for a boat to borrow, and the first

+time the lightning showed me one that wasn't chained I snatched it and

+shoved. It was a canoe, and warn't fastened with nothing but a rope.

+ The towhead was a rattling big distance off, away out there in the

+middle of the river, but I didn't lose no time; and when I struck the

+raft at last I was so fagged I would a just laid down to blow and gasp

+if I could afforded it.  But I didn't.  As I sprung aboard I sung out:

+

+"Out with you, Jim, and set her loose!  Glory be to goodness, we're shut

+of them!"

+

+Jim lit out, and was a-coming for me with both arms spread, he was so

+full of joy; but when I glimpsed him in the lightning my heart shot up

+in my mouth and I went overboard backwards; for I forgot he was old King

+Lear and a drownded A-rab all in one, and it most scared the livers and

+lights out of me.  But Jim fished me out, and was going to hug me and

+bless me, and so on, he was so glad I was back and we was shut of the

+king and the duke, but I says:

+

+"Not now; have it for breakfast, have it for breakfast!  Cut loose and

+let her slide!"

+

+So in two seconds away we went a-sliding down the river, and it did

+seem so good to be free again and all by ourselves on the big river, and

+nobody to bother us.  I had to skip around a bit, and jump up and crack

+my heels a few times—I couldn't help it; but about the third crack

+I noticed a sound that I knowed mighty well, and held my breath and

+listened and waited; and sure enough, when the next flash busted out

+over the water, here they come!—and just a-laying to their oars and

+making their skiff hum!  It was the king and the duke.

+

+So I wilted right down on to the planks then, and give up; and it was

+all I could do to keep from crying.

+

+

+

+

+CHAPTER XXX.

+

+WHEN they got aboard the king went for me, and shook me by the collar,

+and says:

+

+"Tryin' to give us the slip, was ye, you pup!  Tired of our company,

+hey?"

+

+I says:

+

+"No, your majesty, we warn't—please don't, your majesty!"

+

+"Quick, then, and tell us what was your idea, or I'll shake the

+insides out o' you!"

+

+"Honest, I'll tell you everything just as it happened, your majesty.

+ The man that had a-holt of me was very good to me, and kept saying he

+had a boy about as big as me that died last year, and he was sorry

+to see a boy in such a dangerous fix; and when they was all took by

+surprise by finding the gold, and made a rush for the coffin, he lets go

+of me and whispers, 'Heel it now, or they'll hang ye, sure!' and I lit

+out.  It didn't seem no good for me to stay—I couldn't do nothing,

+and I didn't want to be hung if I could get away.  So I never stopped

+running till I found the canoe; and when I got here I told Jim to hurry,

+or they'd catch me and hang me yet, and said I was afeard you and the

+duke wasn't alive now, and I was awful sorry, and so was Jim, and was

+awful glad when we see you coming; you may ask Jim if I didn't."

+

+Jim said it was so; and the king told him to shut up, and said, "Oh,

+yes, it's mighty likely!" and shook me up again, and said he reckoned

+he'd drownd me.  But the duke says:

+

+"Leggo the boy, you old idiot!  Would you a done any different?  Did

+you inquire around for him when you got loose?  I don't remember it."

+

+So the king let go of me, and begun to cuss that town and everybody in

+it. But the duke says:

+

+"You better a blame' sight give yourself a good cussing, for you're

+the one that's entitled to it most.  You hain't done a thing from the

+start that had any sense in it, except coming out so cool and cheeky

+with that imaginary blue-arrow mark.  That was bright—it was right

+down bully; and it was the thing that saved us.  For if it hadn't been

+for that they'd a jailed us till them Englishmen's baggage come—and

+then—the penitentiary, you bet! But that trick took 'em to the

+graveyard, and the gold done us a still bigger kindness; for if the

+excited fools hadn't let go all holts and made that rush to get a

+look we'd a slept in our cravats to-night—cravats warranted to wear,

+too—longer than we'd need 'em."

+

+They was still a minute—thinking; then the king says, kind of

+absent-minded like:

+

+"Mf!  And we reckoned the niggers stole it!"

+

+That made me squirm!

+

+"Yes," says the duke, kinder slow and deliberate and sarcastic, "we

+did."

+

+After about a half a minute the king drawls out:

+

+"Leastways, I did."

+

+The duke says, the same way:

+

+"On the contrary, I did."

+

+The king kind of ruffles up, and says:

+

+"Looky here, Bilgewater, what'r you referrin' to?"

+

+The duke says, pretty brisk:

+

+"When it comes to that, maybe you'll let me ask, what was you

+referring to?"

+

+"Shucks!" says the king, very sarcastic; "but I don't know—maybe you was

+asleep, and didn't know what you was about."

+

+The duke bristles up now, and says:

+

+"Oh, let up on this cussed nonsense; do you take me for a blame' fool?

+Don't you reckon I know who hid that money in that coffin?"

+

+"Yes, sir!  I know you do know, because you done it yourself!"

+

+"It's a lie!"—and the duke went for him.  The king sings out:

+

+"Take y'r hands off!—leggo my throat!—I take it all back!"

+

+The duke says:

+

+"Well, you just own up, first, that you did hide that money there,

+intending to give me the slip one of these days, and come back and dig

+it up, and have it all to yourself."

+

+"Wait jest a minute, duke—answer me this one question, honest and fair;

+if you didn't put the money there, say it, and I'll b'lieve you, and

+take back everything I said."

+

+"You old scoundrel, I didn't, and you know I didn't.  There, now!"

+

+"Well, then, I b'lieve you.  But answer me only jest this one more—now

+don't git mad; didn't you have it in your mind to hook the money and

+hide it?"

+

+The duke never said nothing for a little bit; then he says:

+

+"Well, I don't care if I did, I didn't do it, anyway.  But you not

+only had it in mind to do it, but you done it."

+

+"I wisht I never die if I done it, duke, and that's honest.  I won't say

+I warn't goin' to do it, because I was; but you—I mean somebody—got in

+ahead o' me."

+

+"It's a lie!  You done it, and you got to say you done it, or—"

+

+The king began to gurgle, and then he gasps out:

+

+"'Nough!—I own up!"

+

+I was very glad to hear him say that; it made me feel much more easier

+than what I was feeling before.  So the duke took his hands off and

+says:

+

+"If you ever deny it again I'll drown you.  It's well for you to set

+there and blubber like a baby—it's fitten for you, after the way

+you've acted. I never see such an old ostrich for wanting to gobble

+everything—and I a-trusting you all the time, like you was my own

+father.  You ought to been ashamed of yourself to stand by and hear it

+saddled on to a lot of poor niggers, and you never say a word for 'em.

+ It makes me feel ridiculous to think I was soft enough to believe

+that rubbage.  Cuss you, I can see now why you was so anxious to make

+up the deffisit—you wanted to get what money I'd got out of the Nonesuch

+and one thing or another, and scoop it all!"

+

+The king says, timid, and still a-snuffling:

+

+"Why, duke, it was you that said make up the deffisit; it warn't me."

+

+"Dry up!  I don't want to hear no more out of you!" says the duke.  "And

+now you see what you GOT by it.  They've got all their own money back,

+and all of ourn but a shekel or two besides.  G'long to bed, and

+don't you deffersit me no more deffersits, long 's you live!"

+

+So the king sneaked into the wigwam and took to his bottle for comfort,

+and before long the duke tackled HIS bottle; and so in about a half an

+hour they was as thick as thieves again, and the tighter they got the

+lovinger they got, and went off a-snoring in each other's arms.  They

+both got powerful mellow, but I noticed the king didn't get mellow

+enough to forget to remember to not deny about hiding the money-bag

+again.  That made me feel easy and satisfied.  Of course when they got

+to snoring we had a long gabble, and I told Jim everything.

+

+

+

+

+CHAPTER XXXI.

+

+WE dasn't stop again at any town for days and days; kept right along

+down the river.  We was down south in the warm weather now, and a mighty

+long ways from home.  We begun to come to trees with Spanish moss on

+them, hanging down from the limbs like long, gray beards.  It was the

+first I ever see it growing, and it made the woods look solemn and

+dismal.  So now the frauds reckoned they was out of danger, and they

+begun to work the villages again.

+

+First they done a lecture on temperance; but they didn't make enough

+for them both to get drunk on.  Then in another village they started

+a dancing-school; but they didn't know no more how to dance than a

+kangaroo does; so the first prance they made the general public jumped

+in and pranced them out of town.  Another time they tried to go at

+yellocution; but they didn't yellocute long till the audience got up and

+give them a solid good cussing, and made them skip out.  They tackled

+missionarying, and mesmerizing, and doctoring, and telling fortunes, and

+a little of everything; but they couldn't seem to have no luck.  So at

+last they got just about dead broke, and laid around the raft as she

+floated along, thinking and thinking, and never saying nothing, by the

+half a day at a time, and dreadful blue and desperate.

+

+And at last they took a change and begun to lay their heads together in

+the wigwam and talk low and confidential two or three hours at a time.

+Jim and me got uneasy.  We didn't like the look of it.  We judged they

+was studying up some kind of worse deviltry than ever.  We turned it

+over and over, and at last we made up our minds they was going to break

+into somebody's house or store, or was going into the counterfeit-money

+business, or something. So then we was pretty scared, and made up an

+agreement that we wouldn't have nothing in the world to do with such

+actions, and if we ever got the least show we would give them the cold

+shake and clear out and leave them behind. Well, early one morning we

+hid the raft in a good, safe place about two mile below a little bit of

+a shabby village named Pikesville, and the king he went ashore and told

+us all to stay hid whilst he went up to town and smelt around to see

+if anybody had got any wind of the Royal Nonesuch there yet. ("House to

+rob, you mean," says I to myself; "and when you get through robbing it

+you'll come back here and wonder what has become of me and Jim and the

+raft—and you'll have to take it out in wondering.") And he said if he

+warn't back by midday the duke and me would know it was all right, and

+we was to come along.

+

+So we stayed where we was.  The duke he fretted and sweated around, and

+was in a mighty sour way.  He scolded us for everything, and we couldn't

+seem to do nothing right; he found fault with every little thing.

+Something was a-brewing, sure.  I was good and glad when midday come

+and no king; we could have a change, anyway—and maybe a chance for the

+change on top of it.  So me and the duke went up to the village, and

+hunted around there for the king, and by and by we found him in the

+back room of a little low doggery, very tight, and a lot of loafers

+bullyragging him for sport, and he a-cussing and a-threatening with all

+his might, and so tight he couldn't walk, and couldn't do nothing to

+them.  The duke he begun to abuse him for an old fool, and the king

+begun to sass back, and the minute they was fairly at it I lit out and

+shook the reefs out of my hind legs, and spun down the river road like

+a deer, for I see our chance; and I made up my mind that it would be a

+long day before they ever see me and Jim again.  I got down there all

+out of breath but loaded up with joy, and sung out:

+

+"Set her loose, Jim! we're all right now!"

+

+But there warn't no answer, and nobody come out of the wigwam.  Jim was

+gone!  I set up a shout—and then another—and then another one; and run

+this way and that in the woods, whooping and screeching; but it warn't

+no use—old Jim was gone.  Then I set down and cried; I couldn't help

+it. But I couldn't set still long.  Pretty soon I went out on the road,

+trying to think what I better do, and I run across a boy walking, and

+asked him if he'd seen a strange nigger dressed so and so, and he says:

+

+"Yes."

+

+"Whereabouts?" says I.

+

+"Down to Silas Phelps' place, two mile below here.  He's a runaway

+nigger, and they've got him.  Was you looking for him?"

+

+"You bet I ain't!  I run across him in the woods about an hour or two

+ago, and he said if I hollered he'd cut my livers out—and told me to lay

+down and stay where I was; and I done it.  Been there ever since; afeard

+to come out."

+

+"Well," he says, "you needn't be afeard no more, becuz they've got him.

+He run off f'm down South, som'ers."

+

+"It's a good job they got him."

+

+"Well, I reckon!  There's two hunderd dollars reward on him.  It's

+like picking up money out'n the road."

+

+"Yes, it is—and I could a had it if I'd been big enough; I see him

+first. Who nailed him?"

+

+"It was an old fellow—a stranger—and he sold out his chance in him for

+forty dollars, becuz he's got to go up the river and can't wait.  Think

+o' that, now!  You bet I'd wait, if it was seven year."

+

+"That's me, every time," says I.  "But maybe his chance ain't worth

+no more than that, if he'll sell it so cheap.  Maybe there's something

+ain't straight about it."

+

+"But it is, though—straight as a string.  I see the handbill myself.

+ It tells all about him, to a dot—paints him like a picture, and tells

+the plantation he's frum, below Newrleans.  No-sirree-bob, they

+ain't no trouble 'bout that speculation, you bet you.  Say, gimme a

+chaw tobacker, won't ye?"

+

+I didn't have none, so he left.  I went to the raft, and set down in the

+wigwam to think.  But I couldn't come to nothing.  I thought till I wore

+my head sore, but I couldn't see no way out of the trouble.  After all

+this long journey, and after all we'd done for them scoundrels, here it

+was all come to nothing, everything all busted up and ruined, because

+they could have the heart to serve Jim such a trick as that, and make

+him a slave again all his life, and amongst strangers, too, for forty

+dirty dollars.

+

+Once I said to myself it would be a thousand times better for Jim to

+be a slave at home where his family was, as long as he'd got to be a

+slave, and so I'd better write a letter to Tom Sawyer and tell him to

+tell Miss Watson where he was.  But I soon give up that notion for two

+things: she'd be mad and disgusted at his rascality and ungratefulness

+for leaving her, and so she'd sell him straight down the river again;

+and if she didn't, everybody naturally despises an ungrateful nigger,

+and they'd make Jim feel it all the time, and so he'd feel ornery and

+disgraced. And then think of me!  It would get all around that Huck

+Finn helped a nigger to get his freedom; and if I was ever to see

+anybody from that town again I'd be ready to get down and lick his boots

+for shame.  That's just the way:  a person does a low-down thing, and

+then he don't want to take no consequences of it. Thinks as long as he

+can hide it, it ain't no disgrace.  That was my fix exactly. The more I

+studied about this the more my conscience went to grinding me, and the

+more wicked and low-down and ornery I got to feeling. And at last, when

+it hit me all of a sudden that here was the plain hand of Providence

+slapping me in the face and letting me know my wickedness was being

+watched all the time from up there in heaven, whilst I was stealing a

+poor old woman's nigger that hadn't ever done me no harm, and now was

+showing me there's One that's always on the lookout, and ain't a-going

+to allow no such miserable doings to go only just so fur and no further,

+I most dropped in my tracks I was so scared.  Well, I tried the best I

+could to kinder soften it up somehow for myself by saying I was brung

+up wicked, and so I warn't so much to blame; but something inside of me

+kept saying, "There was the Sunday-school, you could a gone to it; and

+if you'd a done it they'd a learnt you there that people that acts as

+I'd been acting about that nigger goes to everlasting fire."

+

+It made me shiver.  And I about made up my mind to pray, and see if I

+couldn't try to quit being the kind of a boy I was and be better.  So

+I kneeled down.  But the words wouldn't come.  Why wouldn't they?  It

+warn't no use to try and hide it from Him.  Nor from me, neither.  I

+knowed very well why they wouldn't come.  It was because my heart warn't

+right; it was because I warn't square; it was because I was playing

+double.  I was letting on to give up sin, but away inside of me I was

+holding on to the biggest one of all.  I was trying to make my mouth

+say I would do the right thing and the clean thing, and go and write

+to that nigger's owner and tell where he was; but deep down in me I

+knowed it was a lie, and He knowed it.  You can't pray a lie—I found

+that out.

+

+So I was full of trouble, full as I could be; and didn't know what to

+do. At last I had an idea; and I says, I'll go and write the letter—and

+then see if I can pray.  Why, it was astonishing, the way I felt as

+light as a feather right straight off, and my troubles all gone.  So I

+got a piece of paper and a pencil, all glad and excited, and set down

+and wrote:

+

+Miss Watson, your runaway nigger Jim is down here two mile below

+Pikesville, and Mr. Phelps has got him and he will give him up for the

+reward if you send.

+

+Huck Finn.

+

+I felt good and all washed clean of sin for the first time I had ever

+felt so in my life, and I knowed I could pray now.  But I didn't do it

+straight off, but laid the paper down and set there thinking—thinking

+how good it was all this happened so, and how near I come to being lost

+and going to hell.  And went on thinking.  And got to thinking over our

+trip down the river; and I see Jim before me all the time:  in the day

+and in the night-time, sometimes moonlight, sometimes storms, and we

+a-floating along, talking and singing and laughing.  But somehow I

+couldn't seem to strike no places to harden me against him, but only the

+other kind.  I'd see him standing my watch on top of his'n, 'stead of

+calling me, so I could go on sleeping; and see him how glad he was when

+I come back out of the fog; and when I come to him again in the swamp,

+up there where the feud was; and such-like times; and would always call

+me honey, and pet me and do everything he could think of for me, and how

+good he always was; and at last I struck the time I saved him by telling

+the men we had small-pox aboard, and he was so grateful, and said I was

+the best friend old Jim ever had in the world, and the only one he's

+got now; and then I happened to look around and see that paper.

+

+It was a close place.  I took it up, and held it in my hand.  I was

+a-trembling, because I'd got to decide, forever, betwixt two things, and

+I knowed it.  I studied a minute, sort of holding my breath, and then

+says to myself:

+

+"All right, then, I'll go to hell"—and tore it up.

+

+It was awful thoughts and awful words, but they was said.  And I let

+them stay said; and never thought no more about reforming.  I shoved the

+whole thing out of my head, and said I would take up wickedness again,

+which was in my line, being brung up to it, and the other warn't.  And

+for a starter I would go to work and steal Jim out of slavery again;

+and if I could think up anything worse, I would do that, too; because as

+long as I was in, and in for good, I might as well go the whole hog.

+

+Then I set to thinking over how to get at it, and turned over some

+considerable many ways in my mind; and at last fixed up a plan that

+suited me.  So then I took the bearings of a woody island that was down

+the river a piece, and as soon as it was fairly dark I crept out with my

+raft and went for it, and hid it there, and then turned in.  I slept the

+night through, and got up before it was light, and had my breakfast,

+and put on my store clothes, and tied up some others and one thing or

+another in a bundle, and took the canoe and cleared for shore.  I landed

+below where I judged was Phelps's place, and hid my bundle in the woods,

+and then filled up the canoe with water, and loaded rocks into her and

+sunk her where I could find her again when I wanted her, about a quarter

+of a mile below a little steam sawmill that was on the bank.

+

+Then I struck up the road, and when I passed the mill I see a sign on

+it, "Phelps's Sawmill," and when I come to the farm-houses, two or

+three hundred yards further along, I kept my eyes peeled, but didn't

+see nobody around, though it was good daylight now.  But I didn't mind,

+because I didn't want to see nobody just yet—I only wanted to get the

+lay of the land. According to my plan, I was going to turn up there from

+the village, not from below.  So I just took a look, and shoved along,

+straight for town. Well, the very first man I see when I got there was

+the duke.  He was sticking up a bill for the Royal Nonesuch—three-night

+performance—like that other time.  They had the cheek, them frauds!  I

+was right on him before I could shirk.  He looked astonished, and says:

+

+"Hel-lo!  Where'd you come from?"  Then he says, kind of glad and

+eager, "Where's the raft?—got her in a good place?"

+

+I says:

+

+"Why, that's just what I was going to ask your grace."

+

+Then he didn't look so joyful, and says:

+

+"What was your idea for asking me?" he says.

+

+"Well," I says, "when I see the king in that doggery yesterday I says

+to myself, we can't get him home for hours, till he's soberer; so I went

+a-loafing around town to put in the time and wait.  A man up and offered

+me ten cents to help him pull a skiff over the river and back to fetch

+a sheep, and so I went along; but when we was dragging him to the boat,

+and the man left me a-holt of the rope and went behind him to shove him

+along, he was too strong for me and jerked loose and run, and we after

+him.  We didn't have no dog, and so we had to chase him all over the

+country till we tired him out.  We never got him till dark; then we

+fetched him over, and I started down for the raft.  When I got there and

+see it was gone, I says to myself, 'They've got into trouble and had to

+leave; and they've took my nigger, which is the only nigger I've got in

+the world, and now I'm in a strange country, and ain't got no property

+no more, nor nothing, and no way to make my living;' so I set down and

+cried.  I slept in the woods all night.  But what did become of the

+raft, then?—and Jim—poor Jim!"

+

+"Blamed if I know—that is, what's become of the raft.  That old fool had

+made a trade and got forty dollars, and when we found him in the doggery

+the loafers had matched half-dollars with him and got every cent but

+what he'd spent for whisky; and when I got him home late last night and

+found the raft gone, we said, 'That little rascal has stole our raft and

+shook us, and run off down the river.'"

+

+"I wouldn't shake my nigger, would I?—the only nigger I had in the

+world, and the only property."

+

+"We never thought of that.  Fact is, I reckon we'd come to consider him

+our nigger; yes, we did consider him so—goodness knows we had trouble

+enough for him.  So when we see the raft was gone and we flat broke,

+there warn't anything for it but to try the Royal Nonesuch another

+shake. And I've pegged along ever since, dry as a powder-horn.  Where's

+that ten cents? Give it here."

+

+I had considerable money, so I give him ten cents, but begged him to

+spend it for something to eat, and give me some, because it was all the

+money I had, and I hadn't had nothing to eat since yesterday.  He never

+said nothing.  The next minute he whirls on me and says:

+

+"Do you reckon that nigger would blow on us?  We'd skin him if he done

+that!"

+

+"How can he blow?  Hain't he run off?"

+

+"No!  That old fool sold him, and never divided with me, and the money's

+gone."

+

+"Sold him?"  I says, and begun to cry; "why, he was my nigger, and

+that was my money.  Where is he?—I want my nigger."

+

+"Well, you can't get your nigger, that's all—so dry up your

+blubbering. Looky here—do you think you'd venture to blow on us?

+ Blamed if I think I'd trust you.  Why, if you was to blow on us—"

+

+He stopped, but I never see the duke look so ugly out of his eyes

+before. I went on a-whimpering, and says:

+

+"I don't want to blow on nobody; and I ain't got no time to blow, nohow.

+I got to turn out and find my nigger."

+

+He looked kinder bothered, and stood there with his bills fluttering on

+his arm, thinking, and wrinkling up his forehead.  At last he says:

+

+"I'll tell you something.  We got to be here three days.  If you'll

+promise you won't blow, and won't let the nigger blow, I'll tell you

+where to find him."

+

+So I promised, and he says:

+

+"A farmer by the name of Silas Ph—" and then he stopped.  You see, he

+started to tell me the truth; but when he stopped that way, and begun to

+study and think again, I reckoned he was changing his mind.  And so he

+was. He wouldn't trust me; he wanted to make sure of having me out of

+the way the whole three days.  So pretty soon he says:

+

+"The man that bought him is named Abram Foster—Abram G. Foster—and he

+lives forty mile back here in the country, on the road to Lafayette."

+

+"All right," I says, "I can walk it in three days.  And I'll start this

+very afternoon."

+

+"No you wont, you'll start now; and don't you lose any time about it,

+neither, nor do any gabbling by the way.  Just keep a tight tongue in

+your head and move right along, and then you won't get into trouble with

+us, d'ye hear?"

+

+That was the order I wanted, and that was the one I played for.  I

+wanted to be left free to work my plans.

+

+"So clear out," he says; "and you can tell Mr. Foster whatever you want

+to. Maybe you can get him to believe that Jim is your nigger—some

+idiots don't require documents—leastways I've heard there's such down

+South here.  And when you tell him the handbill and the reward's bogus,

+maybe he'll believe you when you explain to him what the idea was for

+getting 'em out.  Go 'long now, and tell him anything you want to; but

+mind you don't work your jaw any between here and there."

+

+So I left, and struck for the back country.  I didn't look around, but I

+kinder felt like he was watching me.  But I knowed I could tire him out

+at that.  I went straight out in the country as much as a mile before

+I stopped; then I doubled back through the woods towards Phelps'.  I

+reckoned I better start in on my plan straight off without fooling

+around, because I wanted to stop Jim's mouth till these fellows could

+get away.  I didn't want no trouble with their kind.  I'd seen all I

+wanted to of them, and wanted to get entirely shut of them.

+

+

+

+

+CHAPTER XXXII.

+

+WHEN I got there it was all still and Sunday-like, and hot and sunshiny;

+the hands was gone to the fields; and there was them kind of faint

+dronings of bugs and flies in the air that makes it seem so lonesome and

+like everybody's dead and gone; and if a breeze fans along and quivers

+the leaves it makes you feel mournful, because you feel like it's

+spirits whispering—spirits that's been dead ever so many years—and you

+always think they're talking about you.  As a general thing it makes a

+body wish he was dead, too, and done with it all.

+

+Phelps' was one of these little one-horse cotton plantations, and they

+all look alike.  A rail fence round a two-acre yard; a stile made out

+of logs sawed off and up-ended in steps, like barrels of a different

+length, to climb over the fence with, and for the women to stand on when

+they are going to jump on to a horse; some sickly grass-patches in the

+big yard, but mostly it was bare and smooth, like an old hat with the

+nap rubbed off; big double log-house for the white folks—hewed logs,

+with the chinks stopped up with mud or mortar, and these mud-stripes

+been whitewashed some time or another; round-log kitchen, with a big

+broad, open but roofed passage joining it to the house; log smoke-house

+back of the kitchen; three little log nigger-cabins in a row t'other

+side the smoke-house; one little hut all by itself away down against

+the back fence, and some outbuildings down a piece the other side;

+ash-hopper and big kettle to bile soap in by the little hut; bench by

+the kitchen door, with bucket of water and a gourd; hound asleep there

+in the sun; more hounds asleep round about; about three shade trees away

+off in a corner; some currant bushes and gooseberry bushes in one place

+by the fence; outside of the fence a garden and a watermelon patch; then

+the cotton fields begins, and after the fields the woods.

+

+I went around and clumb over the back stile by the ash-hopper, and

+started for the kitchen.  When I got a little ways I heard the dim hum

+of a spinning-wheel wailing along up and sinking along down again;

+and then I knowed for certain I wished I was dead—for that is the

+lonesomest sound in the whole world.

+

+I went right along, not fixing up any particular plan, but just trusting

+to Providence to put the right words in my mouth when the time come; for

+I'd noticed that Providence always did put the right words in my mouth

+if I left it alone.

+

+When I got half-way, first one hound and then another got up and went

+for me, and of course I stopped and faced them, and kept still.  And

+such another powwow as they made!  In a quarter of a minute I was a kind

+of a hub of a wheel, as you may say—spokes made out of dogs—circle of

+fifteen of them packed together around me, with their necks and noses

+stretched up towards me, a-barking and howling; and more a-coming; you

+could see them sailing over fences and around corners from everywheres.

+

+A nigger woman come tearing out of the kitchen with a rolling-pin in her

+hand, singing out, "Begone you Tige! you Spot! begone sah!" and she

+fetched first one and then another of them a clip and sent them howling,

+and then the rest followed; and the next second half of them come back,

+wagging their tails around me, and making friends with me.  There ain't

+no harm in a hound, nohow.

+

+And behind the woman comes a little nigger girl and two little nigger

+boys without anything on but tow-linen shirts, and they hung on to their

+mother's gown, and peeped out from behind her at me, bashful, the way

+they always do.  And here comes the white woman running from the house,

+about forty-five or fifty year old, bareheaded, and her spinning-stick

+in her hand; and behind her comes her little white children, acting the

+same way the little niggers was doing.  She was smiling all over so she

+could hardly stand—and says:

+

+"It's you, at last!—ain't it?"

+

+I out with a "Yes'm" before I thought.

+

+She grabbed me and hugged me tight; and then gripped me by both hands

+and shook and shook; and the tears come in her eyes, and run down over;

+and she couldn't seem to hug and shake enough, and kept saying, "You

+don't look as much like your mother as I reckoned you would; but law

+sakes, I don't care for that, I'm so glad to see you!  Dear, dear, it

+does seem like I could eat you up!  Children, it's your cousin Tom!—tell

+him howdy."

+

+But they ducked their heads, and put their fingers in their mouths, and

+hid behind her.  So she run on:

+

+"Lize, hurry up and get him a hot breakfast right away—or did you get

+your breakfast on the boat?"

+

+I said I had got it on the boat.  So then she started for the house,

+leading me by the hand, and the children tagging after.  When we got

+there she set me down in a split-bottomed chair, and set herself down on

+a little low stool in front of me, holding both of my hands, and says:

+

+"Now I can have a good look at you; and, laws-a-me, I've been hungry

+for it a many and a many a time, all these long years, and it's come

+at last! We been expecting you a couple of days and more.  What kep'

+you?—boat get aground?"

+

+"Yes'm—she—"

+

+"Don't say yes'm—say Aunt Sally.  Where'd she get aground?"

+

+I didn't rightly know what to say, because I didn't know whether the

+boat would be coming up the river or down.  But I go a good deal on

+instinct; and my instinct said she would be coming up—from down towards

+Orleans. That didn't help me much, though; for I didn't know the names

+of bars down that way.  I see I'd got to invent a bar, or forget the

+name of the one we got aground on—or—Now I struck an idea, and fetched

+it out:

+

+"It warn't the grounding—that didn't keep us back but a little.  We

+blowed out a cylinder-head."

+

+"Good gracious! anybody hurt?"

+

+"No'm.  Killed a nigger."

+

+"Well, it's lucky; because sometimes people do get hurt.  Two years ago

+last Christmas your uncle Silas was coming up from Newrleans on the old

+Lally Rook, and she blowed out a cylinder-head and crippled a man.  And

+I think he died afterwards.  He was a Baptist.  Your uncle Silas knowed

+a family in Baton Rouge that knowed his people very well.  Yes, I

+remember now, he did die.  Mortification set in, and they had to

+amputate him. But it didn't save him.  Yes, it was mortification—that

+was it.  He turned blue all over, and died in the hope of a glorious

+resurrection. They say he was a sight to look at.  Your uncle's been up

+to the town every day to fetch you. And he's gone again, not more'n an

+hour ago; he'll be back any minute now. You must a met him on the road,

+didn't you?—oldish man, with a—"

+

+"No, I didn't see nobody, Aunt Sally.  The boat landed just at daylight,

+and I left my baggage on the wharf-boat and went looking around the town

+and out a piece in the country, to put in the time and not get here too

+soon; and so I come down the back way."

+

+"Who'd you give the baggage to?"

+

+"Nobody."

+

+"Why, child, it 'll be stole!"

+

+"Not where I hid it I reckon it won't," I says.

+

+"How'd you get your breakfast so early on the boat?"

+

+It was kinder thin ice, but I says:

+

+"The captain see me standing around, and told me I better have something

+to eat before I went ashore; so he took me in the texas to the officers'

+lunch, and give me all I wanted."

+

+I was getting so uneasy I couldn't listen good.  I had my mind on the

+children all the time; I wanted to get them out to one side and pump

+them a little, and find out who I was.  But I couldn't get no show, Mrs.

+Phelps kept it up and run on so.  Pretty soon she made the cold chills

+streak all down my back, because she says:

+

+"But here we're a-running on this way, and you hain't told me a word

+about Sis, nor any of them.  Now I'll rest my works a little, and you

+start up yourn; just tell me everything—tell me all about 'm all every

+one of 'm; and how they are, and what they're doing, and what they told

+you to tell me; and every last thing you can think of."

+

+Well, I see I was up a stump—and up it good.  Providence had stood by

+me this fur all right, but I was hard and tight aground now.  I see it

+warn't a bit of use to try to go ahead—I'd got to throw up my hand.  So

+I says to myself, here's another place where I got to resk the truth.

+ I opened my mouth to begin; but she grabbed me and hustled me in behind

+the bed, and says:

+

+"Here he comes!  Stick your head down lower—there, that'll do; you can't

+be seen now.  Don't you let on you're here.  I'll play a joke on him.

+Children, don't you say a word."

+

+I see I was in a fix now.  But it warn't no use to worry; there warn't

+nothing to do but just hold still, and try and be ready to stand from

+under when the lightning struck.

+

+I had just one little glimpse of the old gentleman when he come in; then

+the bed hid him.  Mrs. Phelps she jumps for him, and says:

+

+"Has he come?"

+

+"No," says her husband.

+

+"Good-ness gracious!" she says, "what in the warld can have become of

+him?"

+

+"I can't imagine," says the old gentleman; "and I must say it makes me

+dreadful uneasy."

+

+"Uneasy!" she says; "I'm ready to go distracted!  He must a come; and

+you've missed him along the road.  I know it's so—something tells me

+so."

+

+"Why, Sally, I couldn't miss him along the road—you know that."

+

+"But oh, dear, dear, what will Sis say!  He must a come!  You must a

+missed him.  He—"

+

+"Oh, don't distress me any more'n I'm already distressed.  I don't know

+what in the world to make of it.  I'm at my wit's end, and I don't mind

+acknowledging 't I'm right down scared.  But there's no hope that he's

+come; for he couldn't come and me miss him.  Sally, it's terrible—just

+terrible—something's happened to the boat, sure!"

+

+"Why, Silas!  Look yonder!—up the road!—ain't that somebody coming?"

+

+He sprung to the window at the head of the bed, and that give Mrs.

+Phelps the chance she wanted.  She stooped down quick at the foot of the

+bed and give me a pull, and out I come; and when he turned back from the

+window there she stood, a-beaming and a-smiling like a house afire, and

+I standing pretty meek and sweaty alongside.  The old gentleman stared,

+and says:

+

+"Why, who's that?"

+

+"Who do you reckon 't is?"

+

+"I hain't no idea.  Who is it?"

+

+"It's Tom Sawyer!"

+

+By jings, I most slumped through the floor!  But there warn't no time to

+swap knives; the old man grabbed me by the hand and shook, and kept on

+shaking; and all the time how the woman did dance around and laugh and

+cry; and then how they both did fire off questions about Sid, and Mary,

+and the rest of the tribe.

+

+But if they was joyful, it warn't nothing to what I was; for it was like

+being born again, I was so glad to find out who I was.  Well, they froze

+to me for two hours; and at last, when my chin was so tired it couldn't

+hardly go any more, I had told them more about my family—I mean the

+Sawyer family—than ever happened to any six Sawyer families.  And I

+explained all about how we blowed out a cylinder-head at the mouth of

+White River, and it took us three days to fix it.  Which was all right,

+and worked first-rate; because they didn't know but what it would take

+three days to fix it.  If I'd a called it a bolthead it would a done

+just as well.

+

+Now I was feeling pretty comfortable all down one side, and pretty

+uncomfortable all up the other.  Being Tom Sawyer was easy and

+comfortable, and it stayed easy and comfortable till by and by I hear a

+steamboat coughing along down the river.  Then I says to myself, s'pose

+Tom Sawyer comes down on that boat?  And s'pose he steps in here any

+minute, and sings out my name before I can throw him a wink to keep

+quiet?

+

+Well, I couldn't have it that way; it wouldn't do at all.  I must go

+up the road and waylay him.  So I told the folks I reckoned I would go

+up to the town and fetch down my baggage.  The old gentleman was for

+going along with me, but I said no, I could drive the horse myself, and

+I druther he wouldn't take no trouble about me.

+

+

+

+

+CHAPTER XXXIII.

+

+SO I started for town in the wagon, and when I was half-way I see a

+wagon coming, and sure enough it was Tom Sawyer, and I stopped and

+waited till he come along.  I says "Hold on!" and it stopped alongside,

+and his mouth opened up like a trunk, and stayed so; and he swallowed

+two or three times like a person that's got a dry throat, and then says:

+

+"I hain't ever done you no harm.  You know that.  So, then, what you

+want to come back and ha'nt me for?"

+

+I says:

+

+"I hain't come back—I hain't been gone."

+

+When he heard my voice it righted him up some, but he warn't quite

+satisfied yet.  He says:

+

+"Don't you play nothing on me, because I wouldn't on you.  Honest injun

+now, you ain't a ghost?"

+

+"Honest injun, I ain't," I says.

+

+"Well—I—I—well, that ought to settle it, of course; but I can't somehow

+seem to understand it no way.  Looky here, warn't you ever murdered at

+all?"

+

+"No.  I warn't ever murdered at all—I played it on them.  You come in

+here and feel of me if you don't believe me."

+

+So he done it; and it satisfied him; and he was that glad to see me

+again he didn't know what to do.  And he wanted to know all about it

+right off, because it was a grand adventure, and mysterious, and so it

+hit him where he lived.  But I said, leave it alone till by and by; and

+told his driver to wait, and we drove off a little piece, and I told

+him the kind of a fix I was in, and what did he reckon we better do?  He

+said, let him alone a minute, and don't disturb him.  So he thought and

+thought, and pretty soon he says:

+

+"It's all right; I've got it.  Take my trunk in your wagon, and let on

+it's your'n; and you turn back and fool along slow, so as to get to the

+house about the time you ought to; and I'll go towards town a piece, and

+take a fresh start, and get there a quarter or a half an hour after you;

+and you needn't let on to know me at first."

+

+I says:

+

+"All right; but wait a minute.  There's one more thing—a thing that

+nobody don't know but me.  And that is, there's a nigger here that

+I'm a-trying to steal out of slavery, and his name is Jim—old Miss

+Watson's Jim."

+

+He says:

+

+"What!  Why, Jim is—"

+

+He stopped and went to studying.  I says:

+

+"I know what you'll say.  You'll say it's dirty, low-down business; but

+what if it is?  I'm low down; and I'm a-going to steal him, and I want

+you keep mum and not let on.  Will you?"

+

+His eye lit up, and he says:

+

+"I'll help you steal him!"

+

+Well, I let go all holts then, like I was shot.  It was the most

+astonishing speech I ever heard—and I'm bound to say Tom Sawyer fell

+considerable in my estimation.  Only I couldn't believe it.  Tom Sawyer

+a nigger-stealer!

+

+"Oh, shucks!"  I says; "you're joking."

+

+"I ain't joking, either."

+

+"Well, then," I says, "joking or no joking, if you hear anything said

+about a runaway nigger, don't forget to remember that you don't know

+nothing about him, and I don't know nothing about him."

+

+Then we took the trunk and put it in my wagon, and he drove off his

+way and I drove mine.  But of course I forgot all about driving slow on

+accounts of being glad and full of thinking; so I got home a heap too

+quick for that length of a trip.  The old gentleman was at the door, and

+he says:

+

+"Why, this is wonderful!  Whoever would a thought it was in that mare

+to do it?  I wish we'd a timed her.  And she hain't sweated a hair—not

+a hair. It's wonderful.  Why, I wouldn't take a hundred dollars for that

+horse now—I wouldn't, honest; and yet I'd a sold her for fifteen before,

+and thought 'twas all she was worth."

+

+That's all he said.  He was the innocentest, best old soul I ever see.

+But it warn't surprising; because he warn't only just a farmer, he was

+a preacher, too, and had a little one-horse log church down back of the

+plantation, which he built it himself at his own expense, for a church

+and schoolhouse, and never charged nothing for his preaching, and it was

+worth it, too.  There was plenty other farmer-preachers like that, and

+done the same way, down South.

+

+In about half an hour Tom's wagon drove up to the front stile, and Aunt

+Sally she see it through the window, because it was only about fifty

+yards, and says:

+

+"Why, there's somebody come!  I wonder who 'tis?  Why, I do believe it's

+a stranger.  Jimmy" (that's one of the children) "run and tell Lize to

+put on another plate for dinner."

+

+Everybody made a rush for the front door, because, of course, a stranger

+don't come every year, and so he lays over the yaller-fever, for

+interest, when he does come.  Tom was over the stile and starting for

+the house; the wagon was spinning up the road for the village, and we

+was all bunched in the front door.  Tom had his store clothes on, and an

+audience—and that was always nuts for Tom Sawyer.  In them circumstances

+it warn't no trouble to him to throw in an amount of style that was

+suitable.  He warn't a boy to meeky along up that yard like a sheep; no,

+he come ca'm and important, like the ram.  When he got a-front of us he

+lifts his hat ever so gracious and dainty, like it was the lid of a box

+that had butterflies asleep in it and he didn't want to disturb them,

+and says:

+

+"Mr. Archibald Nichols, I presume?"

+

+"No, my boy," says the old gentleman, "I'm sorry to say 't your driver

+has deceived you; Nichols's place is down a matter of three mile more.

+Come in, come in."

+

+Tom he took a look back over his shoulder, and says, "Too late—he's out

+of sight."

+

+"Yes, he's gone, my son, and you must come in and eat your dinner with

+us; and then we'll hitch up and take you down to Nichols's."

+

+"Oh, I can't make you so much trouble; I couldn't think of it.  I'll

+walk—I don't mind the distance."

+

+"But we won't let you walk—it wouldn't be Southern hospitality to do

+it. Come right in."

+

+"Oh, do," says Aunt Sally; "it ain't a bit of trouble to us, not a

+bit in the world.  You must stay.  It's a long, dusty three mile, and

+we can't let you walk.  And, besides, I've already told 'em to put on

+another plate when I see you coming; so you mustn't disappoint us.  Come

+right in and make yourself at home."

+

+So Tom he thanked them very hearty and handsome, and let himself be

+persuaded, and come in; and when he was in he said he was a stranger

+from Hicksville, Ohio, and his name was William Thompson—and he made

+another bow.

+

+Well, he run on, and on, and on, making up stuff about Hicksville and

+everybody in it he could invent, and I getting a little nervious, and

+wondering how this was going to help me out of my scrape; and at last,

+still talking along, he reached over and kissed Aunt Sally right on the

+mouth, and then settled back again in his chair comfortable, and was

+going on talking; but she jumped up and wiped it off with the back of

+her hand, and says:

+

+"You owdacious puppy!"

+

+He looked kind of hurt, and says:

+

+"I'm surprised at you, m'am."

+

+"You're s'rp—Why, what do you reckon I am?  I've a good notion to take

+and—Say, what do you mean by kissing me?"

+

+He looked kind of humble, and says:

+

+"I didn't mean nothing, m'am.  I didn't mean no harm.  I—I—thought you'd

+like it."

+

+"Why, you born fool!"  She took up the spinning stick, and it looked

+like it was all she could do to keep from giving him a crack with it.

+ "What made you think I'd like it?"

+

+"Well, I don't know.  Only, they—they—told me you would."

+

+"They told you I would.  Whoever told you's another lunatic.  I

+never heard the beat of it.  Who's they?"

+

+"Why, everybody.  They all said so, m'am."

+

+It was all she could do to hold in; and her eyes snapped, and her

+fingers worked like she wanted to scratch him; and she says:

+

+"Who's 'everybody'?  Out with their names, or ther'll be an idiot

+short."

+

+He got up and looked distressed, and fumbled his hat, and says:

+

+"I'm sorry, and I warn't expecting it.  They told me to.  They all told

+me to.  They all said, kiss her; and said she'd like it.  They all said

+it—every one of them.  But I'm sorry, m'am, and I won't do it no more—I

+won't, honest."

+

+"You won't, won't you?  Well, I sh'd reckon you won't!"

+

+"No'm, I'm honest about it; I won't ever do it again—till you ask me."

+

+"Till I ask you!  Well, I never see the beat of it in my born days!

+ I lay you'll be the Methusalem-numskull of creation before ever I ask

+you—or the likes of you."

+

+"Well," he says, "it does surprise me so.  I can't make it out, somehow.

+They said you would, and I thought you would.  But—" He stopped and

+looked around slow, like he wished he could run across a friendly eye

+somewheres, and fetched up on the old gentleman's, and says, "Didn't

+you think she'd like me to kiss her, sir?"

+

+"Why, no; I—I—well, no, I b'lieve I didn't."

+

+Then he looks on around the same way to me, and says:

+

+"Tom, didn't you think Aunt Sally 'd open out her arms and say, 'Sid

+Sawyer—'"

+

+"My land!" she says, breaking in and jumping for him, "you impudent

+young rascal, to fool a body so—" and was going to hug him, but he

+fended her off, and says:

+

+"No, not till you've asked me first."

+

+So she didn't lose no time, but asked him; and hugged him and kissed

+him over and over again, and then turned him over to the old man, and he

+took what was left.  And after they got a little quiet again she says:

+

+"Why, dear me, I never see such a surprise.  We warn't looking for you

+at all, but only Tom.  Sis never wrote to me about anybody coming but

+him."

+

+"It's because it warn't intended for any of us to come but Tom," he

+says; "but I begged and begged, and at the last minute she let me

+come, too; so, coming down the river, me and Tom thought it would be a

+first-rate surprise for him to come here to the house first, and for me

+to by and by tag along and drop in, and let on to be a stranger.  But it

+was a mistake, Aunt Sally.  This ain't no healthy place for a stranger

+to come."

+

+"No—not impudent whelps, Sid.  You ought to had your jaws boxed; I

+hain't been so put out since I don't know when.  But I don't care, I

+don't mind the terms—I'd be willing to stand a thousand such jokes to

+have you here. Well, to think of that performance!  I don't deny it, I

+was most putrified with astonishment when you give me that smack."

+

+We had dinner out in that broad open passage betwixt the house and

+the kitchen; and there was things enough on that table for seven

+families—and all hot, too; none of your flabby, tough meat that's laid

+in a cupboard in a damp cellar all night and tastes like a hunk of

+old cold cannibal in the morning.  Uncle Silas he asked a pretty long

+blessing over it, but it was worth it; and it didn't cool it a bit,

+neither, the way I've seen them kind of interruptions do lots of times.

+ There was a considerable good deal of talk all the afternoon, and me

+and Tom was on the lookout all the time; but it warn't no use, they

+didn't happen to say nothing about any runaway nigger, and we was afraid

+to try to work up to it.  But at supper, at night, one of the little

+boys says:

+

+"Pa, mayn't Tom and Sid and me go to the show?"

+

+"No," says the old man, "I reckon there ain't going to be any; and you

+couldn't go if there was; because the runaway nigger told Burton and

+me all about that scandalous show, and Burton said he would tell the

+people; so I reckon they've drove the owdacious loafers out of town

+before this time."

+

+So there it was!—but I couldn't help it.  Tom and me was to sleep in the

+same room and bed; so, being tired, we bid good-night and went up to

+bed right after supper, and clumb out of the window and down the

+lightning-rod, and shoved for the town; for I didn't believe anybody was

+going to give the king and the duke a hint, and so if I didn't hurry up

+and give them one they'd get into trouble sure.

+

+On the road Tom he told me all about how it was reckoned I was murdered,

+and how pap disappeared pretty soon, and didn't come back no more, and

+what a stir there was when Jim run away; and I told Tom all about our

+Royal Nonesuch rapscallions, and as much of the raft voyage as I had

+time to; and as we struck into the town and up through the middle of

+it—it was as much as half-after eight, then—here comes a raging rush of

+people with torches, and an awful whooping and yelling, and banging tin

+pans and blowing horns; and we jumped to one side to let them go by;

+and as they went by I see they had the king and the duke astraddle of a

+rail—that is, I knowed it was the king and the duke, though they was

+all over tar and feathers, and didn't look like nothing in the

+world that was human—just looked like a couple of monstrous big

+soldier-plumes.  Well, it made me sick to see it; and I was sorry for

+them poor pitiful rascals, it seemed like I couldn't ever feel any

+hardness against them any more in the world.  It was a dreadful thing to

+see.  Human beings can be awful cruel to one another.

+

+We see we was too late—couldn't do no good.  We asked some stragglers

+about it, and they said everybody went to the show looking very

+innocent; and laid low and kept dark till the poor old king was in the

+middle of his cavortings on the stage; then somebody give a signal, and

+the house rose up and went for them.

+

+So we poked along back home, and I warn't feeling so brash as I was

+before, but kind of ornery, and humble, and to blame, somehow—though

+I hadn't done nothing.  But that's always the way; it don't make no

+difference whether you do right or wrong, a person's conscience ain't

+got no sense, and just goes for him anyway.  If I had a yaller dog that

+didn't know no more than a person's conscience does I would pison him.

+It takes up more room than all the rest of a person's insides, and yet

+ain't no good, nohow.  Tom Sawyer he says the same.

+

+

+

+

+CHAPTER XXXIV.

+

+WE stopped talking, and got to thinking.  By and by Tom says:

+

+"Looky here, Huck, what fools we are to not think of it before!  I bet I

+know where Jim is."

+

+"No!  Where?"

+

+"In that hut down by the ash-hopper.  Why, looky here.  When we was at

+dinner, didn't you see a nigger man go in there with some vittles?"

+

+"Yes."

+

+"What did you think the vittles was for?"

+

+"For a dog."

+

+"So 'd I. Well, it wasn't for a dog."

+

+"Why?"

+

+"Because part of it was watermelon."

+

+"So it was—I noticed it.  Well, it does beat all that I never thought

+about a dog not eating watermelon.  It shows how a body can see and

+don't see at the same time."

+

+"Well, the nigger unlocked the padlock when he went in, and he locked it

+again when he came out.  He fetched uncle a key about the time we got up

+from table—same key, I bet.  Watermelon shows man, lock shows prisoner;

+and it ain't likely there's two prisoners on such a little plantation,

+and where the people's all so kind and good.  Jim's the prisoner.  All

+right—I'm glad we found it out detective fashion; I wouldn't give shucks

+for any other way.  Now you work your mind, and study out a plan to

+steal Jim, and I will study out one, too; and we'll take the one we like

+the best."

+

+What a head for just a boy to have!  If I had Tom Sawyer's head I

+wouldn't trade it off to be a duke, nor mate of a steamboat, nor clown

+in a circus, nor nothing I can think of.  I went to thinking out a plan,

+but only just to be doing something; I knowed very well where the right

+plan was going to come from.  Pretty soon Tom says:

+

+"Ready?"

+

+"Yes," I says.

+

+"All right—bring it out."

+

+"My plan is this," I says.  "We can easy find out if it's Jim in there.

+Then get up my canoe to-morrow night, and fetch my raft over from the

+island.  Then the first dark night that comes steal the key out of the

+old man's britches after he goes to bed, and shove off down the river

+on the raft with Jim, hiding daytimes and running nights, the way me and

+Jim used to do before.  Wouldn't that plan work?"

+

+"Work?  Why, cert'nly it would work, like rats a-fighting.  But it's

+too blame' simple; there ain't nothing to it.  What's the good of a

+plan that ain't no more trouble than that?  It's as mild as goose-milk.

+ Why, Huck, it wouldn't make no more talk than breaking into a soap

+factory."

+

+I never said nothing, because I warn't expecting nothing different; but

+I knowed mighty well that whenever he got his plan ready it wouldn't

+have none of them objections to it.

+

+And it didn't.  He told me what it was, and I see in a minute it was

+worth fifteen of mine for style, and would make Jim just as free a man

+as mine would, and maybe get us all killed besides.  So I was satisfied,

+and said we would waltz in on it.  I needn't tell what it was here,

+because I knowed it wouldn't stay the way it was.  I knowed he would be

+changing it around every which way as we went along, and heaving in new

+bullinesses wherever he got a chance.  And that is what he done.

+

+Well, one thing was dead sure, and that was that Tom Sawyer was in

+earnest, and was actuly going to help steal that nigger out of slavery.

+That was the thing that was too many for me.  Here was a boy that was

+respectable and well brung up; and had a character to lose; and folks at

+home that had characters; and he was bright and not leather-headed; and

+knowing and not ignorant; and not mean, but kind; and yet here he was,

+without any more pride, or rightness, or feeling, than to stoop to

+this business, and make himself a shame, and his family a shame,

+before everybody.  I couldn't understand it no way at all.  It was

+outrageous, and I knowed I ought to just up and tell him so; and so be

+his true friend, and let him quit the thing right where he was and save

+himself. And I did start to tell him; but he shut me up, and says:

+

+"Don't you reckon I know what I'm about?  Don't I generly know what I'm

+about?"

+

+"Yes."

+

+"Didn't I say I was going to help steal the nigger?"

+

+"Yes."

+

+"Well, then."

+

+That's all he said, and that's all I said.  It warn't no use to say any

+more; because when he said he'd do a thing, he always done it.  But I

+couldn't make out how he was willing to go into this thing; so I just

+let it go, and never bothered no more about it.  If he was bound to have

+it so, I couldn't help it.

+

+When we got home the house was all dark and still; so we went on down to

+the hut by the ash-hopper for to examine it.  We went through the yard

+so as to see what the hounds would do.  They knowed us, and didn't make

+no more noise than country dogs is always doing when anything comes by

+in the night.  When we got to the cabin we took a look at the front and

+the two sides; and on the side I warn't acquainted with—which was the

+north side—we found a square window-hole, up tolerable high, with just

+one stout board nailed across it.  I says:

+

+"Here's the ticket.  This hole's big enough for Jim to get through if we

+wrench off the board."

+

+Tom says:

+

+"It's as simple as tit-tat-toe, three-in-a-row, and as easy as

+playing hooky.  I should hope we can find a way that's a little more

+complicated than that, Huck Finn."

+

+"Well, then," I says, "how 'll it do to saw him out, the way I done

+before I was murdered that time?"

+

+"That's more like," he says.  "It's real mysterious, and troublesome,

+and good," he says; "but I bet we can find a way that's twice as long.

+ There ain't no hurry; le's keep on looking around."

+

+Betwixt the hut and the fence, on the back side, was a lean-to that

+joined the hut at the eaves, and was made out of plank.  It was as long

+as the hut, but narrow—only about six foot wide.  The door to it was at

+the south end, and was padlocked.  Tom he went to the soap-kettle and

+searched around, and fetched back the iron thing they lift the lid with;

+so he took it and prized out one of the staples.  The chain fell down,

+and we opened the door and went in, and shut it, and struck a match,

+and see the shed was only built against a cabin and hadn't no connection

+with it; and there warn't no floor to the shed, nor nothing in it but

+some old rusty played-out hoes and spades and picks and a crippled plow.

+ The match went out, and so did we, and shoved in the staple again, and

+the door was locked as good as ever. Tom was joyful.  He says:

+

+"Now we're all right.  We'll dig him out.  It 'll take about a week!"

+

+Then we started for the house, and I went in the back door—you only have

+to pull a buckskin latch-string, they don't fasten the doors—but that

+warn't romantical enough for Tom Sawyer; no way would do him but he must

+climb up the lightning-rod.  But after he got up half way about three

+times, and missed fire and fell every time, and the last time most

+busted his brains out, he thought he'd got to give it up; but after he

+was rested he allowed he would give her one more turn for luck, and this

+time he made the trip.

+

+In the morning we was up at break of day, and down to the nigger cabins

+to pet the dogs and make friends with the nigger that fed Jim—if it

+was Jim that was being fed.  The niggers was just getting through

+breakfast and starting for the fields; and Jim's nigger was piling up

+a tin pan with bread and meat and things; and whilst the others was

+leaving, the key come from the house.

+

+This nigger had a good-natured, chuckle-headed face, and his wool was

+all tied up in little bunches with thread.  That was to keep witches

+off.  He said the witches was pestering him awful these nights, and

+making him see all kinds of strange things, and hear all kinds of

+strange words and noises, and he didn't believe he was ever witched so

+long before in his life.  He got so worked up, and got to running on so

+about his troubles, he forgot all about what he'd been a-going to do.

+ So Tom says:

+

+"What's the vittles for?  Going to feed the dogs?"

+

+The nigger kind of smiled around gradually over his face, like when you

+heave a brickbat in a mud-puddle, and he says:

+

+"Yes, Mars Sid, A dog.  Cur'us dog, too.  Does you want to go en look at

+'im?"

+

+"Yes."

+

+I hunched Tom, and whispers:

+

+"You going, right here in the daybreak?  that warn't the plan."

+

+"No, it warn't; but it's the plan now."

+

+So, drat him, we went along, but I didn't like it much.  When we got in

+we couldn't hardly see anything, it was so dark; but Jim was there, sure

+enough, and could see us; and he sings out:

+

+"Why, Huck!  En good lan'! ain' dat Misto Tom?"

+

+I just knowed how it would be; I just expected it.  I didn't know

+nothing to do; and if I had I couldn't a done it, because that nigger

+busted in and says:

+

+"Why, de gracious sakes! do he know you genlmen?"

+

+We could see pretty well now.  Tom he looked at the nigger, steady and

+kind of wondering, and says:

+

+"Does who know us?"

+

+"Why, dis-yer runaway nigger."

+

+"I don't reckon he does; but what put that into your head?"

+

+"What put it dar?  Didn' he jis' dis minute sing out like he knowed

+you?"

+

+Tom says, in a puzzled-up kind of way:

+

+"Well, that's mighty curious.  Who sung out? when did he sing out?

+ what did he sing out?" And turns to me, perfectly ca'm, and says,

+"Did you hear anybody sing out?"

+

+Of course there warn't nothing to be said but the one thing; so I says:

+

+"No; I ain't heard nobody say nothing."

+

+Then he turns to Jim, and looks him over like he never see him before,

+and says:

+

+"Did you sing out?"

+

+"No, sah," says Jim; "I hain't said nothing, sah."

+

+"Not a word?"

+

+"No, sah, I hain't said a word."

+

+"Did you ever see us before?"

+

+"No, sah; not as I knows on."

+

+So Tom turns to the nigger, which was looking wild and distressed, and

+says, kind of severe:

+

+"What do you reckon's the matter with you, anyway?  What made you think

+somebody sung out?"

+

+"Oh, it's de dad-blame' witches, sah, en I wisht I was dead, I do.

+ Dey's awluz at it, sah, en dey do mos' kill me, dey sk'yers me so.

+ Please to don't tell nobody 'bout it sah, er ole Mars Silas he'll scole

+me; 'kase he say dey ain't no witches.  I jis' wish to goodness he was

+heah now—den what would he say!  I jis' bet he couldn' fine no way to

+git aroun' it dis time.  But it's awluz jis' so; people dat's sot,

+stays sot; dey won't look into noth'n'en fine it out f'r deyselves, en

+when you fine it out en tell um 'bout it, dey doan' b'lieve you."

+

+Tom give him a dime, and said we wouldn't tell nobody; and told him to

+buy some more thread to tie up his wool with; and then looks at Jim, and

+says:

+

+"I wonder if Uncle Silas is going to hang this nigger.  If I was to

+catch a nigger that was ungrateful enough to run away, I wouldn't give

+him up, I'd hang him."  And whilst the nigger stepped to the door to

+look at the dime and bite it to see if it was good, he whispers to Jim

+and says:

+

+"Don't ever let on to know us.  And if you hear any digging going on

+nights, it's us; we're going to set you free."

+

+Jim only had time to grab us by the hand and squeeze it; then the nigger

+come back, and we said we'd come again some time if the nigger wanted

+us to; and he said he would, more particular if it was dark, because the

+witches went for him mostly in the dark, and it was good to have folks

+around then.

+

+

+

+

+CHAPTER XXXV.

+

+IT would be most an hour yet till breakfast, so we left and struck down

+into the woods; because Tom said we got to have some light to see how

+to dig by, and a lantern makes too much, and might get us into trouble;

+what we must have was a lot of them rotten chunks that's called

+fox-fire, and just makes a soft kind of a glow when you lay them in a

+dark place.  We fetched an armful and hid it in the weeds, and set down

+to rest, and Tom says, kind of dissatisfied:

+

+"Blame it, this whole thing is just as easy and awkward as it can be.

+And so it makes it so rotten difficult to get up a difficult plan.

+ There ain't no watchman to be drugged—now there ought to be a

+watchman.  There ain't even a dog to give a sleeping-mixture to.  And

+there's Jim chained by one leg, with a ten-foot chain, to the leg of his

+bed:  why, all you got to do is to lift up the bedstead and slip off

+the chain.  And Uncle Silas he trusts everybody; sends the key to the

+punkin-headed nigger, and don't send nobody to watch the nigger.  Jim

+could a got out of that window-hole before this, only there wouldn't be

+no use trying to travel with a ten-foot chain on his leg.  Why, drat it,

+Huck, it's the stupidest arrangement I ever see. You got to invent all

+the difficulties.  Well, we can't help it; we got to do the best we can

+with the materials we've got. Anyhow, there's one thing—there's more

+honor in getting him out through a lot of difficulties and dangers,

+where there warn't one of them furnished to you by the people who it was

+their duty to furnish them, and you had to contrive them all out of your

+own head.  Now look at just that one thing of the lantern.  When you

+come down to the cold facts, we simply got to let on that a lantern's

+resky.  Why, we could work with a torchlight procession if we wanted to,

+I believe.  Now, whilst I think of it, we got to hunt up something to

+make a saw out of the first chance we get."

+

+"What do we want of a saw?"

+

+"What do we want of it?  Hain't we got to saw the leg of Jim's bed

+off, so as to get the chain loose?"

+

+"Why, you just said a body could lift up the bedstead and slip the chain

+off."

+

+"Well, if that ain't just like you, Huck Finn.  You can get up the

+infant-schooliest ways of going at a thing.  Why, hain't you ever read

+any books at all?—Baron Trenck, nor Casanova, nor Benvenuto Chelleeny,

+nor Henri IV., nor none of them heroes?  Who ever heard of getting a

+prisoner loose in such an old-maidy way as that?  No; the way all the

+best authorities does is to saw the bed-leg in two, and leave it just

+so, and swallow the sawdust, so it can't be found, and put some dirt and

+grease around the sawed place so the very keenest seneskal can't see

+no sign of it's being sawed, and thinks the bed-leg is perfectly sound.

+Then, the night you're ready, fetch the leg a kick, down she goes; slip

+off your chain, and there you are.  Nothing to do but hitch your

+rope ladder to the battlements, shin down it, break your leg in the

+moat—because a rope ladder is nineteen foot too short, you know—and

+there's your horses and your trusty vassles, and they scoop you up and

+fling you across a saddle, and away you go to your native Langudoc, or

+Navarre, or wherever it is. It's gaudy, Huck.  I wish there was a moat

+to this cabin. If we get time, the night of the escape, we'll dig one."

+

+I says:

+

+"What do we want of a moat when we're going to snake him out from under

+the cabin?"

+

+But he never heard me.  He had forgot me and everything else.  He had

+his chin in his hand, thinking.  Pretty soon he sighs and shakes his

+head; then sighs again, and says:

+

+"No, it wouldn't do—there ain't necessity enough for it."

+

+"For what?"  I says.

+

+"Why, to saw Jim's leg off," he says.

+

+"Good land!"  I says; "why, there ain't no necessity for it.  And what

+would you want to saw his leg off for, anyway?"

+

+"Well, some of the best authorities has done it.  They couldn't get the

+chain off, so they just cut their hand off and shoved.  And a leg would

+be better still.  But we got to let that go.  There ain't necessity

+enough in this case; and, besides, Jim's a nigger, and wouldn't

+understand the reasons for it, and how it's the custom in Europe; so

+we'll let it go.  But there's one thing—he can have a rope ladder; we

+can tear up our sheets and make him a rope ladder easy enough.  And we

+can send it to him in a pie; it's mostly done that way.  And I've et

+worse pies."

+

+"Why, Tom Sawyer, how you talk," I says; "Jim ain't got no use for a

+rope ladder."

+

+"He has got use for it.  How you talk, you better say; you don't

+know nothing about it.  He's got to have a rope ladder; they all do."

+

+"What in the nation can he do with it?"

+

+"Do with it?  He can hide it in his bed, can't he?"  That's what they

+all do; and he's got to, too.  Huck, you don't ever seem to want to do

+anything that's regular; you want to be starting something fresh all the

+time. S'pose he don't do nothing with it? ain't it there in his bed,

+for a clew, after he's gone? and don't you reckon they'll want clews?

+ Of course they will.  And you wouldn't leave them any?  That would be a

+pretty howdy-do, wouldn't it!  I never heard of such a thing."

+

+"Well," I says, "if it's in the regulations, and he's got to have

+it, all right, let him have it; because I don't wish to go back on no

+regulations; but there's one thing, Tom Sawyer—if we go to tearing up

+our sheets to make Jim a rope ladder, we're going to get into trouble

+with Aunt Sally, just as sure as you're born.  Now, the way I look at

+it, a hickry-bark ladder don't cost nothing, and don't waste nothing,

+and is just as good to load up a pie with, and hide in a straw tick,

+as any rag ladder you can start; and as for Jim, he ain't had no

+experience, and so he don't care what kind of a—"

+

+"Oh, shucks, Huck Finn, if I was as ignorant as you I'd keep

+still—that's what I'D do.  Who ever heard of a state prisoner escaping

+by a hickry-bark ladder?  Why, it's perfectly ridiculous."

+

+"Well, all right, Tom, fix it your own way; but if you'll take my

+advice, you'll let me borrow a sheet off of the clothesline."

+

+He said that would do.  And that gave him another idea, and he says:

+

+"Borrow a shirt, too."

+

+"What do we want of a shirt, Tom?"

+

+"Want it for Jim to keep a journal on."

+

+"Journal your granny—Jim can't write."

+

+"S'pose he can't write—he can make marks on the shirt, can't he, if

+we make him a pen out of an old pewter spoon or a piece of an old iron

+barrel-hoop?"

+

+"Why, Tom, we can pull a feather out of a goose and make him a better

+one; and quicker, too."

+

+"Prisoners don't have geese running around the donjon-keep to pull

+pens out of, you muggins.  They always make their pens out of the

+hardest, toughest, troublesomest piece of old brass candlestick or

+something like that they can get their hands on; and it takes them weeks

+and weeks and months and months to file it out, too, because they've got

+to do it by rubbing it on the wall.  They wouldn't use a goose-quill

+if they had it. It ain't regular."

+

+"Well, then, what'll we make him the ink out of?"

+

+"Many makes it out of iron-rust and tears; but that's the common sort

+and women; the best authorities uses their own blood.  Jim can do that;

+and when he wants to send any little common ordinary mysterious message

+to let the world know where he's captivated, he can write it on the

+bottom of a tin plate with a fork and throw it out of the window.  The

+Iron Mask always done that, and it's a blame' good way, too."

+

+"Jim ain't got no tin plates.  They feed him in a pan."

+

+"That ain't nothing; we can get him some."

+

+"Can't nobody read his plates."

+

+"That ain't got anything to do with it, Huck Finn.  All he's got to

+do is to write on the plate and throw it out.  You don't have to be

+able to read it. Why, half the time you can't read anything a prisoner

+writes on a tin plate, or anywhere else."

+

+"Well, then, what's the sense in wasting the plates?"

+

+"Why, blame it all, it ain't the prisoner's plates."

+

+"But it's somebody's plates, ain't it?"

+

+"Well, spos'n it is?  What does the prisoner care whose—"

+

+He broke off there, because we heard the breakfast-horn blowing.  So we

+cleared out for the house.

+

+Along during the morning I borrowed a sheet and a white shirt off of the

+clothes-line; and I found an old sack and put them in it, and we went

+down and got the fox-fire, and put that in too.  I called it borrowing,

+because that was what pap always called it; but Tom said it warn't

+borrowing, it was stealing.  He said we was representing prisoners; and

+prisoners don't care how they get a thing so they get it, and nobody

+don't blame them for it, either.  It ain't no crime in a prisoner to

+steal the thing he needs to get away with, Tom said; it's his right; and

+so, as long as we was representing a prisoner, we had a perfect right to

+steal anything on this place we had the least use for to get ourselves

+out of prison with.  He said if we warn't prisoners it would be a very

+different thing, and nobody but a mean, ornery person would steal when

+he warn't a prisoner.  So we allowed we would steal everything there was

+that come handy.  And yet he made a mighty fuss, one day, after that,

+when I stole a watermelon out of the nigger-patch and eat it; and he

+made me go and give the niggers a dime without telling them what it

+was for. Tom said that what he meant was, we could steal anything we

+needed. Well, I says, I needed the watermelon.  But he said I didn't

+need it to get out of prison with; there's where the difference was.

+ He said if I'd a wanted it to hide a knife in, and smuggle it to Jim

+to kill the seneskal with, it would a been all right.  So I let it go at

+that, though I couldn't see no advantage in my representing a prisoner

+if I got to set down and chaw over a lot of gold-leaf distinctions like

+that every time I see a chance to hog a watermelon.

+

+Well, as I was saying, we waited that morning till everybody was settled

+down to business, and nobody in sight around the yard; then Tom he

+carried the sack into the lean-to whilst I stood off a piece to keep

+watch.  By and by he come out, and we went and set down on the woodpile

+to talk.  He says:

+

+"Everything's all right now except tools; and that's easy fixed."

+

+"Tools?"  I says.

+

+"Yes."

+

+"Tools for what?"

+

+"Why, to dig with.  We ain't a-going to gnaw him out, are we?"

+

+"Ain't them old crippled picks and things in there good enough to dig a

+nigger out with?"  I says.

+

+He turns on me, looking pitying enough to make a body cry, and says:

+

+"Huck Finn, did you ever hear of a prisoner having picks and shovels,

+and all the modern conveniences in his wardrobe to dig himself out with?

+ Now I want to ask you—if you got any reasonableness in you at all—what

+kind of a show would that give him to be a hero?  Why, they might as

+well lend him the key and done with it.  Picks and shovels—why, they

+wouldn't furnish 'em to a king."

+

+"Well, then," I says, "if we don't want the picks and shovels, what do

+we want?"

+

+"A couple of case-knives."

+

+"To dig the foundations out from under that cabin with?"

+

+"Yes."

+

+"Confound it, it's foolish, Tom."

+

+"It don't make no difference how foolish it is, it's the right way—and

+it's the regular way.  And there ain't no other way, that ever I heard

+of, and I've read all the books that gives any information about these

+things. They always dig out with a case-knife—and not through dirt, mind

+you; generly it's through solid rock.  And it takes them weeks and weeks

+and weeks, and for ever and ever.  Why, look at one of them prisoners in

+the bottom dungeon of the Castle Deef, in the harbor of Marseilles, that

+dug himself out that way; how long was he at it, you reckon?"

+

+"I don't know."

+

+"Well, guess."

+

+"I don't know.  A month and a half."

+

+"Thirty-seven year—and he come out in China.  That's the kind.  I

+wish the bottom of this fortress was solid rock."

+

+"Jim don't know nobody in China."

+

+"What's that got to do with it?  Neither did that other fellow.  But

+you're always a-wandering off on a side issue.  Why can't you stick to

+the main point?"

+

+"All right—I don't care where he comes out, so he comes out; and Jim

+don't, either, I reckon.  But there's one thing, anyway—Jim's too old to

+be dug out with a case-knife.  He won't last."

+

+"Yes he will last, too.  You don't reckon it's going to take

+thirty-seven years to dig out through a dirt foundation, do you?"

+

+"How long will it take, Tom?"

+

+"Well, we can't resk being as long as we ought to, because it mayn't

+take very long for Uncle Silas to hear from down there by New Orleans.

+ He'll hear Jim ain't from there.  Then his next move will be to

+advertise Jim, or something like that.  So we can't resk being as long

+digging him out as we ought to.  By rights I reckon we ought to be

+a couple of years; but we can't.  Things being so uncertain, what I

+recommend is this:  that we really dig right in, as quick as we can;

+and after that, we can let on, to ourselves, that we was at it

+thirty-seven years.  Then we can snatch him out and rush him away the

+first time there's an alarm.  Yes, I reckon that 'll be the best way."

+

+"Now, there's sense in that," I says.  "Letting on don't cost nothing;

+letting on ain't no trouble; and if it's any object, I don't mind

+letting on we was at it a hundred and fifty year.  It wouldn't strain

+me none, after I got my hand in.  So I'll mosey along now, and smouch a

+couple of case-knives."

+

+"Smouch three," he says; "we want one to make a saw out of."

+

+"Tom, if it ain't unregular and irreligious to sejest it," I says,

+"there's an old rusty saw-blade around yonder sticking under the

+weather-boarding behind the smoke-house."

+

+He looked kind of weary and discouraged-like, and says:

+

+"It ain't no use to try to learn you nothing, Huck.  Run along and

+smouch the knives—three of them."  So I done it.

+

+

+

+

+CHAPTER XXXVI.

+

+AS soon as we reckoned everybody was asleep that night we went down the

+lightning-rod, and shut ourselves up in the lean-to, and got out our

+pile of fox-fire, and went to work.  We cleared everything out of the

+way, about four or five foot along the middle of the bottom log.  Tom

+said he was right behind Jim's bed now, and we'd dig in under it, and

+when we got through there couldn't nobody in the cabin ever know there

+was any hole there, because Jim's counter-pin hung down most to the

+ground, and you'd have to raise it up and look under to see the hole.

+ So we dug and dug with the case-knives till most midnight; and then

+we was dog-tired, and our hands was blistered, and yet you couldn't see

+we'd done anything hardly.  At last I says:

+

+"This ain't no thirty-seven year job; this is a thirty-eight year job,

+Tom Sawyer."

+

+He never said nothing.  But he sighed, and pretty soon he stopped

+digging, and then for a good little while I knowed that he was thinking.

+Then he says:

+

+"It ain't no use, Huck, it ain't a-going to work.  If we was prisoners

+it would, because then we'd have as many years as we wanted, and no

+hurry; and we wouldn't get but a few minutes to dig, every day, while

+they was changing watches, and so our hands wouldn't get blistered, and

+we could keep it up right along, year in and year out, and do it right,

+and the way it ought to be done.  But we can't fool along; we got to

+rush; we ain't got no time to spare.  If we was to put in another

+night this way we'd have to knock off for a week to let our hands get

+well—couldn't touch a case-knife with them sooner."

+

+"Well, then, what we going to do, Tom?"

+

+"I'll tell you.  It ain't right, and it ain't moral, and I wouldn't like

+it to get out; but there ain't only just the one way:  we got to dig him

+out with the picks, and let on it's case-knives."

+

+"Now you're talking!"  I says; "your head gets leveler and leveler

+all the time, Tom Sawyer," I says.  "Picks is the thing, moral or no

+moral; and as for me, I don't care shucks for the morality of it, nohow.

+ When I start in to steal a nigger, or a watermelon, or a Sunday-school

+book, I ain't no ways particular how it's done so it's done.  What I

+want is my nigger; or what I want is my watermelon; or what I want is my

+Sunday-school book; and if a pick's the handiest thing, that's the thing

+I'm a-going to dig that nigger or that watermelon or that Sunday-school

+book out with; and I don't give a dead rat what the authorities thinks

+about it nuther."

+

+"Well," he says, "there's excuse for picks and letting-on in a case like

+this; if it warn't so, I wouldn't approve of it, nor I wouldn't stand by

+and see the rules broke—because right is right, and wrong is wrong,

+and a body ain't got no business doing wrong when he ain't ignorant and

+knows better.  It might answer for you to dig Jim out with a pick,

+without any letting on, because you don't know no better; but it

+wouldn't for me, because I do know better.  Gimme a case-knife."

+

+He had his own by him, but I handed him mine.  He flung it down, and

+says:

+

+"Gimme a case-knife."

+

+I didn't know just what to do—but then I thought.  I scratched around

+amongst the old tools, and got a pickaxe and give it to him, and he took

+it and went to work, and never said a word.

+

+He was always just that particular.  Full of principle.

+

+So then I got a shovel, and then we picked and shoveled, turn about,

+and made the fur fly.  We stuck to it about a half an hour, which was as

+long as we could stand up; but we had a good deal of a hole to show for

+it. When I got up stairs I looked out at the window and see Tom doing

+his level best with the lightning-rod, but he couldn't come it, his

+hands was so sore.  At last he says:

+

+"It ain't no use, it can't be done.  What you reckon I better do?  Can't

+you think of no way?"

+

+"Yes," I says, "but I reckon it ain't regular.  Come up the stairs, and

+let on it's a lightning-rod."

+

+So he done it.

+

+Next day Tom stole a pewter spoon and a brass candlestick in the house,

+for to make some pens for Jim out of, and six tallow candles; and I

+hung around the nigger cabins and laid for a chance, and stole three tin

+plates.  Tom says it wasn't enough; but I said nobody wouldn't ever see

+the plates that Jim throwed out, because they'd fall in the dog-fennel

+and jimpson weeds under the window-hole—then we could tote them back and

+he could use them over again.  So Tom was satisfied.  Then he says:

+

+"Now, the thing to study out is, how to get the things to Jim."

+

+"Take them in through the hole," I says, "when we get it done."

+

+He only just looked scornful, and said something about nobody ever heard

+of such an idiotic idea, and then he went to studying.  By and by he

+said he had ciphered out two or three ways, but there warn't no need to

+decide on any of them yet.  Said we'd got to post Jim first.

+

+That night we went down the lightning-rod a little after ten, and took

+one of the candles along, and listened under the window-hole, and heard

+Jim snoring; so we pitched it in, and it didn't wake him.  Then we

+whirled in with the pick and shovel, and in about two hours and a half

+the job was done.  We crept in under Jim's bed and into the cabin, and

+pawed around and found the candle and lit it, and stood over Jim awhile,

+and found him looking hearty and healthy, and then we woke him up gentle

+and gradual.  He was so glad to see us he most cried; and called us

+honey, and all the pet names he could think of; and was for having us

+hunt up a cold-chisel to cut the chain off of his leg with right away,

+and clearing out without losing any time.  But Tom he showed him how

+unregular it would be, and set down and told him all about our plans,

+and how we could alter them in a minute any time there was an alarm; and

+not to be the least afraid, because we would see he got away, sure.

+ So Jim he said it was all right, and we set there and talked over old

+times awhile, and then Tom asked a lot of questions, and when Jim told

+him Uncle Silas come in every day or two to pray with him, and Aunt

+Sally come in to see if he was comfortable and had plenty to eat, and

+both of them was kind as they could be, Tom says:

+

+"Now I know how to fix it.  We'll send you some things by them."

+

+I said, "Don't do nothing of the kind; it's one of the most jackass

+ideas I ever struck;" but he never paid no attention to me; went right

+on.  It was his way when he'd got his plans set.

+

+So he told Jim how we'd have to smuggle in the rope-ladder pie and other

+large things by Nat, the nigger that fed him, and he must be on the

+lookout, and not be surprised, and not let Nat see him open them; and

+we would put small things in uncle's coat-pockets and he must steal them

+out; and we would tie things to aunt's apron-strings or put them in her

+apron-pocket, if we got a chance; and told him what they would be and

+what they was for.  And told him how to keep a journal on the shirt with

+his blood, and all that. He told him everything.  Jim he couldn't see

+no sense in the most of it, but he allowed we was white folks and knowed

+better than him; so he was satisfied, and said he would do it all just

+as Tom said.

+

+Jim had plenty corn-cob pipes and tobacco; so we had a right down good

+sociable time; then we crawled out through the hole, and so home to

+bed, with hands that looked like they'd been chawed.  Tom was in high

+spirits. He said it was the best fun he ever had in his life, and the

+most intellectural; and said if he only could see his way to it we would

+keep it up all the rest of our lives and leave Jim to our children to

+get out; for he believed Jim would come to like it better and better the

+more he got used to it.  He said that in that way it could be strung out

+to as much as eighty year, and would be the best time on record.  And he

+said it would make us all celebrated that had a hand in it.

+

+In the morning we went out to the woodpile and chopped up the brass

+candlestick into handy sizes, and Tom put them and the pewter spoon in

+his pocket.  Then we went to the nigger cabins, and while I got Nat's

+notice off, Tom shoved a piece of candlestick into the middle of a

+corn-pone that was in Jim's pan, and we went along with Nat to see how

+it would work, and it just worked noble; when Jim bit into it it most

+mashed all his teeth out; and there warn't ever anything could a worked

+better. Tom said so himself. Jim he never let on but what it was only

+just a piece of rock or something like that that's always getting into

+bread, you know; but after that he never bit into nothing but what he

+jabbed his fork into it in three or four places first.

+

+And whilst we was a-standing there in the dimmish light, here comes a

+couple of the hounds bulging in from under Jim's bed; and they kept on

+piling in till there was eleven of them, and there warn't hardly room

+in there to get your breath.  By jings, we forgot to fasten that lean-to

+door!  The nigger Nat he only just hollered "Witches" once, and keeled

+over on to the floor amongst the dogs, and begun to groan like he was

+dying.  Tom jerked the door open and flung out a slab of Jim's meat,

+and the dogs went for it, and in two seconds he was out himself and back

+again and shut the door, and I knowed he'd fixed the other door too.

+Then he went to work on the nigger, coaxing him and petting him, and

+asking him if he'd been imagining he saw something again.  He raised up,

+and blinked his eyes around, and says:

+

+"Mars Sid, you'll say I's a fool, but if I didn't b'lieve I see most a

+million dogs, er devils, er some'n, I wisht I may die right heah in dese

+tracks.  I did, mos' sholy.  Mars Sid, I felt um—I felt um, sah; dey

+was all over me.  Dad fetch it, I jis' wisht I could git my han's on one

+er dem witches jis' wunst—on'y jis' wunst—it's all I'd ast.  But mos'ly

+I wisht dey'd lemme 'lone, I does."

+

+Tom says:

+

+"Well, I tell you what I think.  What makes them come here just at this

+runaway nigger's breakfast-time?  It's because they're hungry; that's

+the reason.  You make them a witch pie; that's the thing for you to

+do."

+

+"But my lan', Mars Sid, how's I gwyne to make 'm a witch pie?  I doan'

+know how to make it.  I hain't ever hearn er sich a thing b'fo'."

+

+"Well, then, I'll have to make it myself."

+

+"Will you do it, honey?—will you?  I'll wusshup de groun' und' yo' foot,

+I will!"

+

+"All right, I'll do it, seeing it's you, and you've been good to us and

+showed us the runaway nigger.  But you got to be mighty careful.  When

+we come around, you turn your back; and then whatever we've put in the

+pan, don't you let on you see it at all.  And don't you look when Jim

+unloads the pan—something might happen, I don't know what.  And above

+all, don't you handle the witch-things."

+

+"Hannel 'M, Mars Sid?  What is you a-talkin' 'bout?  I wouldn'

+lay de weight er my finger on um, not f'r ten hund'd thous'n billion

+dollars, I wouldn't."

+

+

+

+

+CHAPTER XXXVII.

+

+THAT was all fixed.  So then we went away and went to the rubbage-pile

+in the back yard, where they keep the old boots, and rags, and pieces

+of bottles, and wore-out tin things, and all such truck, and scratched

+around and found an old tin washpan, and stopped up the holes as well as

+we could, to bake the pie in, and took it down cellar and stole it full

+of flour and started for breakfast, and found a couple of shingle-nails

+that Tom said would be handy for a prisoner to scrabble his name and

+sorrows on the dungeon walls with, and dropped one of them in Aunt

+Sally's apron-pocket which was hanging on a chair, and t'other we stuck

+in the band of Uncle Silas's hat, which was on the bureau, because we

+heard the children say their pa and ma was going to the runaway nigger's

+house this morning, and then went to breakfast, and Tom dropped the

+pewter spoon in Uncle Silas's coat-pocket, and Aunt Sally wasn't come

+yet, so we had to wait a little while.

+

+And when she come she was hot and red and cross, and couldn't hardly

+wait for the blessing; and then she went to sluicing out coffee with one

+hand and cracking the handiest child's head with her thimble with the

+other, and says:

+

+"I've hunted high and I've hunted low, and it does beat all what has

+become of your other shirt."

+

+My heart fell down amongst my lungs and livers and things, and a hard

+piece of corn-crust started down my throat after it and got met on the

+road with a cough, and was shot across the table, and took one of the

+children in the eye and curled him up like a fishing-worm, and let a cry

+out of him the size of a warwhoop, and Tom he turned kinder blue around

+the gills, and it all amounted to a considerable state of things for

+about a quarter of a minute or as much as that, and I would a sold out

+for half price if there was a bidder.  But after that we was all right

+again—it was the sudden surprise of it that knocked us so kind of cold.

+Uncle Silas he says:

+

+"It's most uncommon curious, I can't understand it.  I know perfectly

+well I took it off, because—"

+

+"Because you hain't got but one on.  Just listen at the man!  I know

+you took it off, and know it by a better way than your wool-gethering

+memory, too, because it was on the clo's-line yesterday—I see it there

+myself. But it's gone, that's the long and the short of it, and you'll

+just have to change to a red flann'l one till I can get time to make a

+new one. And it 'll be the third I've made in two years.  It just keeps

+a body on the jump to keep you in shirts; and whatever you do manage to

+do with 'm all is more'n I can make out.  A body 'd think you would

+learn to take some sort of care of 'em at your time of life."

+

+"I know it, Sally, and I do try all I can.  But it oughtn't to be

+altogether my fault, because, you know, I don't see them nor have

+nothing to do with them except when they're on me; and I don't believe

+I've ever lost one of them off of me."

+

+"Well, it ain't your fault if you haven't, Silas; you'd a done it

+if you could, I reckon.  And the shirt ain't all that's gone, nuther.

+ Ther's a spoon gone; and that ain't all.  There was ten, and now

+ther's only nine. The calf got the shirt, I reckon, but the calf never

+took the spoon, that's certain."

+

+"Why, what else is gone, Sally?"

+

+"Ther's six candles gone—that's what.  The rats could a got the

+candles, and I reckon they did; I wonder they don't walk off with the

+whole place, the way you're always going to stop their holes and don't

+do it; and if they warn't fools they'd sleep in your hair, Silas—you'd

+never find it out; but you can't lay the spoon on the rats, and that I

+know."

+

+"Well, Sally, I'm in fault, and I acknowledge it; I've been remiss; but

+I won't let to-morrow go by without stopping up them holes."

+

+"Oh, I wouldn't hurry; next year 'll do.  Matilda Angelina Araminta

+Phelps!"

+

+Whack comes the thimble, and the child snatches her claws out of the

+sugar-bowl without fooling around any.  Just then the nigger woman steps

+on to the passage, and says:

+

+"Missus, dey's a sheet gone."

+

+"A sheet gone!  Well, for the land's sake!"

+

+"I'll stop up them holes to-day," says Uncle Silas, looking sorrowful.

+

+"Oh, do shet up!—s'pose the rats took the sheet?  where's it gone,

+Lize?"

+

+"Clah to goodness I hain't no notion, Miss' Sally.  She wuz on de

+clo'sline yistiddy, but she done gone:  she ain' dah no mo' now."

+

+"I reckon the world is coming to an end.  I never see the beat of it

+in all my born days.  A shirt, and a sheet, and a spoon, and six can—"

+

+"Missus," comes a young yaller wench, "dey's a brass cannelstick

+miss'n."

+

+"Cler out from here, you hussy, er I'll take a skillet to ye!"

+

+Well, she was just a-biling.  I begun to lay for a chance; I reckoned

+I would sneak out and go for the woods till the weather moderated.  She

+kept a-raging right along, running her insurrection all by herself, and

+everybody else mighty meek and quiet; and at last Uncle Silas, looking

+kind of foolish, fishes up that spoon out of his pocket.  She stopped,

+with her mouth open and her hands up; and as for me, I wished I was in

+Jeruslem or somewheres. But not long, because she says:

+

+"It's just as I expected.  So you had it in your pocket all the time;

+and like as not you've got the other things there, too.  How'd it get

+there?"

+

+"I reely don't know, Sally," he says, kind of apologizing, "or you know

+I would tell.  I was a-studying over my text in Acts Seventeen before

+breakfast, and I reckon I put it in there, not noticing, meaning to put

+my Testament in, and it must be so, because my Testament ain't in; but

+I'll go and see; and if the Testament is where I had it, I'll know I

+didn't put it in, and that will show that I laid the Testament down and

+took up the spoon, and—"

+

+"Oh, for the land's sake!  Give a body a rest!  Go 'long now, the whole

+kit and biling of ye; and don't come nigh me again till I've got back my

+peace of mind."

+

+I'D a heard her if she'd a said it to herself, let alone speaking it

+out; and I'd a got up and obeyed her if I'd a been dead.  As we was

+passing through the setting-room the old man he took up his hat, and the

+shingle-nail fell out on the floor, and he just merely picked it up and

+laid it on the mantel-shelf, and never said nothing, and went out.  Tom

+see him do it, and remembered about the spoon, and says:

+

+"Well, it ain't no use to send things by him no more, he ain't

+reliable." Then he says:  "But he done us a good turn with the spoon,

+anyway, without knowing it, and so we'll go and do him one without him

+knowing it—stop up his rat-holes."

+

+There was a noble good lot of them down cellar, and it took us a whole

+hour, but we done the job tight and good and shipshape.  Then we heard

+steps on the stairs, and blowed out our light and hid; and here comes

+the old man, with a candle in one hand and a bundle of stuff in t'other,

+looking as absent-minded as year before last.  He went a mooning around,

+first to one rat-hole and then another, till he'd been to them all.

+ Then he stood about five minutes, picking tallow-drip off of his candle

+and thinking.  Then he turns off slow and dreamy towards the stairs,

+saying:

+

+"Well, for the life of me I can't remember when I done it.  I could

+show her now that I warn't to blame on account of the rats.  But never

+mind—let it go.  I reckon it wouldn't do no good."

+

+And so he went on a-mumbling up stairs, and then we left.  He was a

+mighty nice old man.  And always is.

+

+Tom was a good deal bothered about what to do for a spoon, but he said

+we'd got to have it; so he took a think.  When he had ciphered it out

+he told me how we was to do; then we went and waited around the

+spoon-basket till we see Aunt Sally coming, and then Tom went to

+counting the spoons and laying them out to one side, and I slid one of

+them up my sleeve, and Tom says:

+

+"Why, Aunt Sally, there ain't but nine spoons yet."

+

+She says:

+

+"Go 'long to your play, and don't bother me.  I know better, I counted

+'m myself."

+

+"Well, I've counted them twice, Aunty, and I can't make but nine."

+

+She looked out of all patience, but of course she come to count—anybody

+would.

+

+"I declare to gracious ther' ain't but nine!" she says.  "Why, what in

+the world—plague take the things, I'll count 'm again."

+

+So I slipped back the one I had, and when she got done counting, she

+says:

+

+"Hang the troublesome rubbage, ther's ten now!" and she looked huffy

+and bothered both.  But Tom says:

+

+"Why, Aunty, I don't think there's ten."

+

+"You numskull, didn't you see me count 'm?"

+

+"I know, but—"

+

+"Well, I'll count 'm again."

+

+So I smouched one, and they come out nine, same as the other time.

+ Well, she was in a tearing way—just a-trembling all over, she was so

+mad.  But she counted and counted till she got that addled she'd start

+to count in the basket for a spoon sometimes; and so, three times they

+come out right, and three times they come out wrong.  Then she grabbed

+up the basket and slammed it across the house and knocked the cat

+galley-west; and she said cle'r out and let her have some peace, and if

+we come bothering around her again betwixt that and dinner she'd skin

+us.  So we had the odd spoon, and dropped it in her apron-pocket whilst

+she was a-giving us our sailing orders, and Jim got it all right, along

+with her shingle nail, before noon.  We was very well satisfied with

+this business, and Tom allowed it was worth twice the trouble it took,

+because he said now she couldn't ever count them spoons twice alike

+again to save her life; and wouldn't believe she'd counted them right if

+she did; and said that after she'd about counted her head off for the

+next three days he judged she'd give it up and offer to kill anybody

+that wanted her to ever count them any more.

+

+So we put the sheet back on the line that night, and stole one out of

+her closet; and kept on putting it back and stealing it again for a

+couple of days till she didn't know how many sheets she had any more,

+and she didn't care, and warn't a-going to bullyrag the rest of her

+soul out about it, and wouldn't count them again not to save her life;

+she druther die first.

+

+So we was all right now, as to the shirt and the sheet and the spoon

+and the candles, by the help of the calf and the rats and the mixed-up

+counting; and as to the candlestick, it warn't no consequence, it would

+blow over by and by.

+

+But that pie was a job; we had no end of trouble with that pie.  We

+fixed it up away down in the woods, and cooked it there; and we got it

+done at last, and very satisfactory, too; but not all in one day; and we

+had to use up three wash-pans full of flour before we got through, and

+we got burnt pretty much all over, in places, and eyes put out with

+the smoke; because, you see, we didn't want nothing but a crust, and we

+couldn't prop it up right, and she would always cave in.  But of course

+we thought of the right way at last—which was to cook the ladder, too,

+in the pie.  So then we laid in with Jim the second night, and tore

+up the sheet all in little strings and twisted them together, and long

+before daylight we had a lovely rope that you could a hung a person

+with.  We let on it took nine months to make it.

+

+And in the forenoon we took it down to the woods, but it wouldn't go

+into the pie.  Being made of a whole sheet, that way, there was rope

+enough for forty pies if we'd a wanted them, and plenty left over

+for soup, or sausage, or anything you choose.  We could a had a whole

+dinner.

+

+But we didn't need it.  All we needed was just enough for the pie, and

+so we throwed the rest away.  We didn't cook none of the pies in the

+wash-pan—afraid the solder would melt; but Uncle Silas he had a noble

+brass warming-pan which he thought considerable of, because it belonged

+to one of his ancesters with a long wooden handle that come over from

+England with William the Conqueror in the Mayflower or one of them early

+ships and was hid away up garret with a lot of other old pots and things

+that was valuable, not on account of being any account, because they

+warn't, but on account of them being relicts, you know, and we snaked

+her out, private, and took her down there, but she failed on the first

+pies, because we didn't know how, but she come up smiling on the last

+one.  We took and lined her with dough, and set her in the coals, and

+loaded her up with rag rope, and put on a dough roof, and shut down the

+lid, and put hot embers on top, and stood off five foot, with the long

+handle, cool and comfortable, and in fifteen minutes she turned out a

+pie that was a satisfaction to look at. But the person that et it would

+want to fetch a couple of kags of toothpicks along, for if that rope

+ladder wouldn't cramp him down to business I don't know nothing what I'm

+talking about, and lay him in enough stomach-ache to last him till next

+time, too.

+

+Nat didn't look when we put the witch pie in Jim's pan; and we put the

+three tin plates in the bottom of the pan under the vittles; and so Jim

+got everything all right, and as soon as he was by himself he busted

+into the pie and hid the rope ladder inside of his straw tick,

+and scratched some marks on a tin plate and throwed it out of the

+window-hole.

+

+

+

+

+CHAPTER XXXVIII.

+

+MAKING them pens was a distressid tough job, and so was the saw; and Jim

+allowed the inscription was going to be the toughest of all.  That's the

+one which the prisoner has to scrabble on the wall.  But he had to have

+it; Tom said he'd got to; there warn't no case of a state prisoner not

+scrabbling his inscription to leave behind, and his coat of arms.

+

+"Look at Lady Jane Grey," he says; "look at Gilford Dudley; look at old

+Northumberland!  Why, Huck, s'pose it is considerble trouble?—what

+you going to do?—how you going to get around it?  Jim's got to do his

+inscription and coat of arms.  They all do."

+

+Jim says:

+

+"Why, Mars Tom, I hain't got no coat o' arm; I hain't got nuffn but dish

+yer ole shirt, en you knows I got to keep de journal on dat."

+

+"Oh, you don't understand, Jim; a coat of arms is very different."

+

+"Well," I says, "Jim's right, anyway, when he says he ain't got no coat

+of arms, because he hain't."

+

+"I reckon I knowed that," Tom says, "but you bet he'll have one before

+he goes out of this—because he's going out right, and there ain't

+going to be no flaws in his record."

+

+So whilst me and Jim filed away at the pens on a brickbat apiece, Jim

+a-making his'n out of the brass and I making mine out of the spoon,

+Tom set to work to think out the coat of arms.  By and by he said he'd

+struck so many good ones he didn't hardly know which to take, but there

+was one which he reckoned he'd decide on.  He says:

+

+"On the scutcheon we'll have a bend or in the dexter base, a saltire

+murrey in the fess, with a dog, couchant, for common charge, and under

+his foot a chain embattled, for slavery, with a chevron vert in a

+chief engrailed, and three invected lines on a field azure, with the

+nombril points rampant on a dancette indented; crest, a runaway nigger,

+sable, with his bundle over his shoulder on a bar sinister; and a

+couple of gules for supporters, which is you and me; motto, Maggiore

+Fretta, Minore Otto.  Got it out of a book—means the more haste the

+less speed."

+

+"Geewhillikins," I says, "but what does the rest of it mean?"

+

+"We ain't got no time to bother over that," he says; "we got to dig in

+like all git-out."

+

+"Well, anyway," I says, "what's some of it?  What's a fess?"

+

+"A fess—a fess is—you don't need to know what a fess is.  I'll show

+him how to make it when he gets to it."

+

+"Shucks, Tom," I says, "I think you might tell a person.  What's a bar

+sinister?"

+

+"Oh, I don't know.  But he's got to have it.  All the nobility does."

+

+That was just his way.  If it didn't suit him to explain a thing to you,

+he wouldn't do it.  You might pump at him a week, it wouldn't make no

+difference.

+

+He'd got all that coat of arms business fixed, so now he started in to

+finish up the rest of that part of the work, which was to plan out a

+mournful inscription—said Jim got to have one, like they all done.  He

+made up a lot, and wrote them out on a paper, and read them off, so:

+

+1.  Here a captive heart busted. 2.  Here a poor prisoner, forsook by

+the world and friends, fretted his sorrowful life. 3.  Here a lonely

+heart broke, and a worn spirit went to its rest, after thirty-seven

+years of solitary captivity. 4.  Here, homeless and friendless, after

+thirty-seven years of bitter captivity, perished a noble stranger,

+natural son of Louis XIV.

+

+Tom's voice trembled whilst he was reading them, and he most broke down.

+When he got done he couldn't no way make up his mind which one for Jim

+to scrabble on to the wall, they was all so good; but at last he allowed

+he would let him scrabble them all on.  Jim said it would take him a

+year to scrabble such a lot of truck on to the logs with a nail, and he

+didn't know how to make letters, besides; but Tom said he would block

+them out for him, and then he wouldn't have nothing to do but just

+follow the lines.  Then pretty soon he says:

+

+"Come to think, the logs ain't a-going to do; they don't have log walls

+in a dungeon:  we got to dig the inscriptions into a rock.  We'll fetch

+a rock."

+

+Jim said the rock was worse than the logs; he said it would take him

+such a pison long time to dig them into a rock he wouldn't ever get out.

+ But Tom said he would let me help him do it.  Then he took a look to

+see how me and Jim was getting along with the pens.  It was most pesky

+tedious hard work and slow, and didn't give my hands no show to get

+well of the sores, and we didn't seem to make no headway, hardly; so Tom

+says:

+

+"I know how to fix it.  We got to have a rock for the coat of arms and

+mournful inscriptions, and we can kill two birds with that same rock.

+There's a gaudy big grindstone down at the mill, and we'll smouch it,

+and carve the things on it, and file out the pens and the saw on it,

+too."

+

+It warn't no slouch of an idea; and it warn't no slouch of a grindstone

+nuther; but we allowed we'd tackle it.  It warn't quite midnight yet,

+so we cleared out for the mill, leaving Jim at work.  We smouched the

+grindstone, and set out to roll her home, but it was a most nation tough

+job. Sometimes, do what we could, we couldn't keep her from falling

+over, and she come mighty near mashing us every time.  Tom said she was

+going to get one of us, sure, before we got through.  We got her half

+way; and then we was plumb played out, and most drownded with sweat.  We

+see it warn't no use; we got to go and fetch Jim. So he raised up his

+bed and slid the chain off of the bed-leg, and wrapt it round and round

+his neck, and we crawled out through our hole and down there, and Jim

+and me laid into that grindstone and walked her along like nothing; and

+Tom superintended.  He could out-superintend any boy I ever see.  He

+knowed how to do everything.

+

+Our hole was pretty big, but it warn't big enough to get the grindstone

+through; but Jim he took the pick and soon made it big enough.  Then Tom

+marked out them things on it with the nail, and set Jim to work on them,

+with the nail for a chisel and an iron bolt from the rubbage in the

+lean-to for a hammer, and told him to work till the rest of his candle

+quit on him, and then he could go to bed, and hide the grindstone under

+his straw tick and sleep on it.  Then we helped him fix his chain back

+on the bed-leg, and was ready for bed ourselves.  But Tom thought of

+something, and says:

+

+"You got any spiders in here, Jim?"

+

+"No, sah, thanks to goodness I hain't, Mars Tom."

+

+"All right, we'll get you some."

+

+"But bless you, honey, I doan' want none.  I's afeard un um.  I jis'

+'s soon have rattlesnakes aroun'."

+

+Tom thought a minute or two, and says:

+

+"It's a good idea.  And I reckon it's been done.  It must a been done;

+it stands to reason.  Yes, it's a prime good idea.  Where could you keep

+it?"

+

+"Keep what, Mars Tom?"

+

+"Why, a rattlesnake."

+

+"De goodness gracious alive, Mars Tom!  Why, if dey was a rattlesnake to

+come in heah I'd take en bust right out thoo dat log wall, I would, wid

+my head."

+

+"Why, Jim, you wouldn't be afraid of it after a little.  You could tame

+it."

+

+"Tame it!"

+

+"Yes—easy enough.  Every animal is grateful for kindness and petting,

+and they wouldn't think of hurting a person that pets them.  Any book

+will tell you that.  You try—that's all I ask; just try for two or three

+days. Why, you can get him so, in a little while, that he'll love you;

+and sleep with you; and won't stay away from you a minute; and will let

+you wrap him round your neck and put his head in your mouth."

+

+"Please, Mars Tom—doan' talk so!  I can't stan' it!  He'd let

+me shove his head in my mouf—fer a favor, hain't it?  I lay he'd wait a

+pow'ful long time 'fo' I ast him.  En mo' en dat, I doan' want him

+to sleep wid me."

+

+"Jim, don't act so foolish.  A prisoner's got to have some kind of a

+dumb pet, and if a rattlesnake hain't ever been tried, why, there's more

+glory to be gained in your being the first to ever try it than any other

+way you could ever think of to save your life."

+

+"Why, Mars Tom, I doan' want no sich glory.  Snake take 'n bite

+Jim's chin off, den whah is de glory?  No, sah, I doan' want no sich

+doin's."

+

+"Blame it, can't you try?  I only want you to try—you needn't keep

+it up if it don't work."

+

+"But de trouble all done ef de snake bite me while I's a tryin' him.

+Mars Tom, I's willin' to tackle mos' anything 'at ain't onreasonable,

+but ef you en Huck fetches a rattlesnake in heah for me to tame, I's

+gwyne to leave, dat's shore."

+

+"Well, then, let it go, let it go, if you're so bull-headed about it.

+ We can get you some garter-snakes, and you can tie some buttons on

+their tails, and let on they're rattlesnakes, and I reckon that 'll have

+to do."

+

+"I k'n stan' dem, Mars Tom, but blame' 'f I couldn' get along widout

+um, I tell you dat.  I never knowed b'fo' 't was so much bother and

+trouble to be a prisoner."

+

+"Well, it always is when it's done right.  You got any rats around

+here?"

+

+"No, sah, I hain't seed none."

+

+"Well, we'll get you some rats."

+

+"Why, Mars Tom, I doan' want no rats.  Dey's de dadblamedest creturs

+to 'sturb a body, en rustle roun' over 'im, en bite his feet, when he's

+tryin' to sleep, I ever see.  No, sah, gimme g'yarter-snakes, 'f I's

+got to have 'm, but doan' gimme no rats; I hain' got no use f'r um,

+skasely."

+

+"But, Jim, you got to have 'em—they all do.  So don't make no more

+fuss about it.  Prisoners ain't ever without rats.  There ain't no

+instance of it.  And they train them, and pet them, and learn them

+tricks, and they get to be as sociable as flies.  But you got to play

+music to them.  You got anything to play music on?"

+

+"I ain' got nuffn but a coase comb en a piece o' paper, en a juice-harp;

+but I reck'n dey wouldn' take no stock in a juice-harp."

+

+"Yes they would they don't care what kind of music 'tis.  A

+jews-harp's plenty good enough for a rat.  All animals like music—in a

+prison they dote on it.  Specially, painful music; and you can't get no

+other kind out of a jews-harp.  It always interests them; they come out

+to see what's the matter with you.  Yes, you're all right; you're fixed

+very well.  You want to set on your bed nights before you go to sleep,

+and early in the mornings, and play your jews-harp; play 'The Last Link

+is Broken'—that's the thing that 'll scoop a rat quicker 'n anything

+else; and when you've played about two minutes you'll see all the rats,

+and the snakes, and spiders, and things begin to feel worried about you,

+and come.  And they'll just fairly swarm over you, and have a noble good

+time."

+

+"Yes, dey will, I reck'n, Mars Tom, but what kine er time is Jim

+havin'? Blest if I kin see de pint.  But I'll do it ef I got to.  I

+reck'n I better keep de animals satisfied, en not have no trouble in de

+house."

+

+Tom waited to think it over, and see if there wasn't nothing else; and

+pretty soon he says:

+

+"Oh, there's one thing I forgot.  Could you raise a flower here, do you

+reckon?"

+

+"I doan know but maybe I could, Mars Tom; but it's tolable dark in heah,

+en I ain' got no use f'r no flower, nohow, en she'd be a pow'ful sight

+o' trouble."

+

+"Well, you try it, anyway.  Some other prisoners has done it."

+

+"One er dem big cat-tail-lookin' mullen-stalks would grow in heah, Mars

+Tom, I reck'n, but she wouldn't be wuth half de trouble she'd coss."

+

+"Don't you believe it.  We'll fetch you a little one and you plant it in

+the corner over there, and raise it.  And don't call it mullen, call it

+Pitchiola—that's its right name when it's in a prison.  And you want to

+water it with your tears."

+

+"Why, I got plenty spring water, Mars Tom."

+

+"You don't want spring water; you want to water it with your tears.

+ It's the way they always do."

+

+"Why, Mars Tom, I lay I kin raise one er dem mullen-stalks twyste wid

+spring water whiles another man's a start'n one wid tears."

+

+"That ain't the idea.  You got to do it with tears."

+

+"She'll die on my han's, Mars Tom, she sholy will; kase I doan' skasely

+ever cry."

+

+So Tom was stumped.  But he studied it over, and then said Jim would

+have to worry along the best he could with an onion.  He promised

+he would go to the nigger cabins and drop one, private, in Jim's

+coffee-pot, in the morning. Jim said he would "jis' 's soon have

+tobacker in his coffee;" and found so much fault with it, and with the

+work and bother of raising the mullen, and jews-harping the rats, and

+petting and flattering up the snakes and spiders and things, on top of

+all the other work he had to do on pens, and inscriptions, and journals,

+and things, which made it more trouble and worry and responsibility to

+be a prisoner than anything he ever undertook, that Tom most lost all

+patience with him; and said he was just loadened down with more gaudier

+chances than a prisoner ever had in the world to make a name for

+himself, and yet he didn't know enough to appreciate them, and they was

+just about wasted on him.  So Jim he was sorry, and said he wouldn't

+behave so no more, and then me and Tom shoved for bed.

+

+

+

+

+CHAPTER XXXIX.

+

+IN the morning we went up to the village and bought a wire rat-trap and

+fetched it down, and unstopped the best rat-hole, and in about an hour

+we had fifteen of the bulliest kind of ones; and then we took it and put

+it in a safe place under Aunt Sally's bed.  But while we was gone for

+spiders little Thomas Franklin Benjamin Jefferson Elexander Phelps found

+it there, and opened the door of it to see if the rats would come out,

+and they did; and Aunt Sally she come in, and when we got back she was

+a-standing on top of the bed raising Cain, and the rats was doing what

+they could to keep off the dull times for her.  So she took and dusted

+us both with the hickry, and we was as much as two hours catching

+another fifteen or sixteen, drat that meddlesome cub, and they warn't

+the likeliest, nuther, because the first haul was the pick of the flock.

+ I never see a likelier lot of rats than what that first haul was.

+

+We got a splendid stock of sorted spiders, and bugs, and frogs, and

+caterpillars, and one thing or another; and we like to got a hornet's

+nest, but we didn't.  The family was at home.  We didn't give it right

+up, but stayed with them as long as we could; because we allowed we'd

+tire them out or they'd got to tire us out, and they done it.  Then we

+got allycumpain and rubbed on the places, and was pretty near all right

+again, but couldn't set down convenient.  And so we went for the snakes,

+and grabbed a couple of dozen garters and house-snakes, and put them in

+a bag, and put it in our room, and by that time it was supper-time, and

+a rattling good honest day's work:  and hungry?—oh, no, I reckon not!

+ And there warn't a blessed snake up there when we went back—we didn't

+half tie the sack, and they worked out somehow, and left.  But it didn't

+matter much, because they was still on the premises somewheres.  So

+we judged we could get some of them again.  No, there warn't no real

+scarcity of snakes about the house for a considerable spell.  You'd see

+them dripping from the rafters and places every now and then; and they

+generly landed in your plate, or down the back of your neck, and most

+of the time where you didn't want them.  Well, they was handsome and

+striped, and there warn't no harm in a million of them; but that never

+made no difference to Aunt Sally; she despised snakes, be the breed what

+they might, and she couldn't stand them no way you could fix it; and

+every time one of them flopped down on her, it didn't make no difference

+what she was doing, she would just lay that work down and light out.  I

+never see such a woman.  And you could hear her whoop to Jericho.  You

+couldn't get her to take a-holt of one of them with the tongs.  And if

+she turned over and found one in bed she would scramble out and lift a

+howl that you would think the house was afire.  She disturbed the old

+man so that he said he could most wish there hadn't ever been no snakes

+created.  Why, after every last snake had been gone clear out of the

+house for as much as a week Aunt Sally warn't over it yet; she warn't

+near over it; when she was setting thinking about something you could

+touch her on the back of her neck with a feather and she would jump

+right out of her stockings.  It was very curious.  But Tom said all

+women was just so.  He said they was made that way for some reason or

+other.

+

+We got a licking every time one of our snakes come in her way, and she

+allowed these lickings warn't nothing to what she would do if we ever

+loaded up the place again with them.  I didn't mind the lickings,

+because they didn't amount to nothing; but I minded the trouble we

+had to lay in another lot.  But we got them laid in, and all the other

+things; and you never see a cabin as blithesome as Jim's was when they'd

+all swarm out for music and go for him.  Jim didn't like the spiders,

+and the spiders didn't like Jim; and so they'd lay for him, and make it

+mighty warm for him.  And he said that between the rats and the snakes

+and the grindstone there warn't no room in bed for him, skasely; and

+when there was, a body couldn't sleep, it was so lively, and it was

+always lively, he said, because they never all slept at one time, but

+took turn about, so when the snakes was asleep the rats was on deck, and

+when the rats turned in the snakes come on watch, so he always had one

+gang under him, in his way, and t'other gang having a circus over him,

+and if he got up to hunt a new place the spiders would take a chance at

+him as he crossed over. He said if he ever got out this time he wouldn't

+ever be a prisoner again, not for a salary.

+

+Well, by the end of three weeks everything was in pretty good shape.

+ The shirt was sent in early, in a pie, and every time a rat bit Jim he

+would get up and write a little in his journal whilst the ink was fresh;

+the pens was made, the inscriptions and so on was all carved on the

+grindstone; the bed-leg was sawed in two, and we had et up the sawdust,

+and it give us a most amazing stomach-ache.  We reckoned we was all

+going to die, but didn't.  It was the most undigestible sawdust I ever

+see; and Tom said the same.

+

+But as I was saying, we'd got all the work done now, at last; and we was

+all pretty much fagged out, too, but mainly Jim.  The old man had wrote

+a couple of times to the plantation below Orleans to come and get their

+runaway nigger, but hadn't got no answer, because there warn't no such

+plantation; so he allowed he would advertise Jim in the St. Louis and

+New Orleans papers; and when he mentioned the St. Louis ones it give me

+the cold shivers, and I see we hadn't no time to lose. So Tom said, now

+for the nonnamous letters.

+

+"What's them?"  I says.

+

+"Warnings to the people that something is up.  Sometimes it's done one

+way, sometimes another.  But there's always somebody spying around that

+gives notice to the governor of the castle.  When Louis XVI. was going

+to light out of the Tooleries, a servant-girl done it.  It's a very good

+way, and so is the nonnamous letters.  We'll use them both.  And it's

+usual for the prisoner's mother to change clothes with him, and she

+stays in, and he slides out in her clothes.  We'll do that, too."

+

+"But looky here, Tom, what do we want to warn anybody for that

+something's up?  Let them find it out for themselves—it's their

+lookout."

+

+"Yes, I know; but you can't depend on them.  It's the way they've acted

+from the very start—left us to do everything.  They're so confiding

+and mullet-headed they don't take notice of nothing at all.  So if we

+don't give them notice there won't be nobody nor nothing to interfere

+with us, and so after all our hard work and trouble this escape 'll go

+off perfectly flat; won't amount to nothing—won't be nothing to it."

+

+"Well, as for me, Tom, that's the way I'd like."

+

+"Shucks!" he says, and looked disgusted.  So I says:

+

+"But I ain't going to make no complaint.  Any way that suits you suits

+me. What you going to do about the servant-girl?"

+

+"You'll be her.  You slide in, in the middle of the night, and hook that

+yaller girl's frock."

+

+"Why, Tom, that 'll make trouble next morning; because, of course, she

+prob'bly hain't got any but that one."

+

+"I know; but you don't want it but fifteen minutes, to carry the

+nonnamous letter and shove it under the front door."

+

+"All right, then, I'll do it; but I could carry it just as handy in my

+own togs."

+

+"You wouldn't look like a servant-girl then, would you?"

+

+"No, but there won't be nobody to see what I look like, anyway."

+

+"That ain't got nothing to do with it.  The thing for us to do is just

+to do our duty, and not worry about whether anybody sees us do it or

+not. Hain't you got no principle at all?"

+

+"All right, I ain't saying nothing; I'm the servant-girl.  Who's Jim's

+mother?"

+

+"I'm his mother.  I'll hook a gown from Aunt Sally."

+

+"Well, then, you'll have to stay in the cabin when me and Jim leaves."

+

+"Not much.  I'll stuff Jim's clothes full of straw and lay it on his bed

+to represent his mother in disguise, and Jim 'll take the nigger woman's

+gown off of me and wear it, and we'll all evade together.  When a

+prisoner of style escapes it's called an evasion.  It's always called

+so when a king escapes, f'rinstance.  And the same with a king's son;

+it don't make no difference whether he's a natural one or an unnatural

+one."

+

+So Tom he wrote the nonnamous letter, and I smouched the yaller wench's

+frock that night, and put it on, and shoved it under the front door, the

+way Tom told me to.  It said:

+

+Beware.  Trouble is brewing.  Keep a sharp lookout. Unknown Friend.

+

+Next night we stuck a picture, which Tom drawed in blood, of a skull and

+crossbones on the front door; and next night another one of a coffin on

+the back door.  I never see a family in such a sweat.  They couldn't a

+been worse scared if the place had a been full of ghosts laying for them

+behind everything and under the beds and shivering through the air.  If

+a door banged, Aunt Sally she jumped and said "ouch!" if anything fell,

+she jumped and said "ouch!" if you happened to touch her, when she

+warn't noticing, she done the same; she couldn't face noway and be

+satisfied, because she allowed there was something behind her every

+time—so she was always a-whirling around sudden, and saying "ouch," and

+before she'd got two-thirds around she'd whirl back again, and say it

+again; and she was afraid to go to bed, but she dasn't set up.  So the

+thing was working very well, Tom said; he said he never see a thing work

+more satisfactory. He said it showed it was done right.

+

+So he said, now for the grand bulge!  So the very next morning at the

+streak of dawn we got another letter ready, and was wondering what we

+better do with it, because we heard them say at supper they was going

+to have a nigger on watch at both doors all night.  Tom he went down the

+lightning-rod to spy around; and the nigger at the back door was asleep,

+and he stuck it in the back of his neck and come back.  This letter

+said:

+

+Don't betray me, I wish to be your friend.  There is a desprate gang of

+cutthroats from over in the Indian Territory going to steal your runaway

+nigger to-night, and they have been trying to scare you so as you will

+stay in the house and not bother them.  I am one of the gang, but have

+got religgion and wish to quit it and lead an honest life again, and

+will betray the helish design. They will sneak down from northards,

+along the fence, at midnight exact, with a false key, and go in the

+nigger's cabin to get him. I am to be off a piece and blow a tin horn

+if I see any danger; but stead of that I will baa like a sheep soon as

+they get in and not blow at all; then whilst they are getting his

+chains loose, you slip there and lock them in, and can kill them at your

+leasure.  Don't do anything but just the way I am telling you, if you do

+they will suspicion something and raise whoop-jamboreehoo. I do not wish

+any reward but to know I have done the right thing. Unknown Friend.

+

+

+

+

+CHAPTER XL.

+

+WE was feeling pretty good after breakfast, and took my canoe and went

+over the river a-fishing, with a lunch, and had a good time, and took a

+look at the raft and found her all right, and got home late to supper,

+and found them in such a sweat and worry they didn't know which end they

+was standing on, and made us go right off to bed the minute we was done

+supper, and wouldn't tell us what the trouble was, and never let on a

+word about the new letter, but didn't need to, because we knowed as much

+about it as anybody did, and as soon as we was half up stairs and her

+back was turned we slid for the cellar cupboard and loaded up a good

+lunch and took it up to our room and went to bed, and got up about

+half-past eleven, and Tom put on Aunt Sally's dress that he stole and

+was going to start with the lunch, but says:

+

+"Where's the butter?"

+

+"I laid out a hunk of it," I says, "on a piece of a corn-pone."

+

+"Well, you left it laid out, then—it ain't here."

+

+"We can get along without it," I says.

+

+"We can get along with it, too," he says; "just you slide down cellar

+and fetch it.  And then mosey right down the lightning-rod and come

+along. I'll go and stuff the straw into Jim's clothes to represent his

+mother in disguise, and be ready to baa like a sheep and shove soon as

+you get there."

+

+So out he went, and down cellar went I. The hunk of butter, big as

+a person's fist, was where I had left it, so I took up the slab of

+corn-pone with it on, and blowed out my light, and started up stairs

+very stealthy, and got up to the main floor all right, but here comes

+Aunt Sally with a candle, and I clapped the truck in my hat, and clapped

+my hat on my head, and the next second she see me; and she says:

+

+"You been down cellar?"

+

+"Yes'm."

+

+"What you been doing down there?"

+

+"Noth'n."

+

+"Noth'n!"

+

+"No'm."

+

+"Well, then, what possessed you to go down there this time of night?"

+

+"I don't know 'm."

+

+"You don't know?  Don't answer me that way. Tom, I want to know what

+you been doing down there."

+

+"I hain't been doing a single thing, Aunt Sally, I hope to gracious if I

+have."

+

+I reckoned she'd let me go now, and as a generl thing she would; but I

+s'pose there was so many strange things going on she was just in a sweat

+about every little thing that warn't yard-stick straight; so she says,

+very decided:

+

+"You just march into that setting-room and stay there till I come.  You

+been up to something you no business to, and I lay I'll find out what it

+is before I'M done with you."

+

+So she went away as I opened the door and walked into the setting-room.

+My, but there was a crowd there!  Fifteen farmers, and every one of them

+had a gun.  I was most powerful sick, and slunk to a chair and set down.

+They was setting around, some of them talking a little, in a low voice,

+and all of them fidgety and uneasy, but trying to look like they warn't;

+but I knowed they was, because they was always taking off their hats,

+and putting them on, and scratching their heads, and changing their

+seats, and fumbling with their buttons.  I warn't easy myself, but I

+didn't take my hat off, all the same.

+

+I did wish Aunt Sally would come, and get done with me, and lick me, if

+she wanted to, and let me get away and tell Tom how we'd overdone this

+thing, and what a thundering hornet's-nest we'd got ourselves into, so

+we could stop fooling around straight off, and clear out with Jim before

+these rips got out of patience and come for us.

+

+At last she come and begun to ask me questions, but I couldn't answer

+them straight, I didn't know which end of me was up; because these men

+was in such a fidget now that some was wanting to start right NOW and

+lay for them desperadoes, and saying it warn't but a few minutes to

+midnight; and others was trying to get them to hold on and wait for the

+sheep-signal; and here was Aunty pegging away at the questions, and

+me a-shaking all over and ready to sink down in my tracks I was

+that scared; and the place getting hotter and hotter, and the butter

+beginning to melt and run down my neck and behind my ears; and pretty

+soon, when one of them says, "I'M for going and getting in the cabin

+first and right now, and catching them when they come," I most

+dropped; and a streak of butter come a-trickling down my forehead, and

+Aunt Sally she see it, and turns white as a sheet, and says:

+

+"For the land's sake, what is the matter with the child?  He's got the

+brain-fever as shore as you're born, and they're oozing out!"

+

+And everybody runs to see, and she snatches off my hat, and out comes

+the bread and what was left of the butter, and she grabbed me, and

+hugged me, and says:

+

+"Oh, what a turn you did give me! and how glad and grateful I am it

+ain't no worse; for luck's against us, and it never rains but it pours,

+and when I see that truck I thought we'd lost you, for I knowed by

+the color and all it was just like your brains would be if—Dear,

+dear, whyd'nt you tell me that was what you'd been down there for, I

+wouldn't a cared.  Now cler out to bed, and don't lemme see no more of

+you till morning!"

+

+I was up stairs in a second, and down the lightning-rod in another one,

+and shinning through the dark for the lean-to.  I couldn't hardly get my

+words out, I was so anxious; but I told Tom as quick as I could we must

+jump for it now, and not a minute to lose—the house full of men, yonder,

+with guns!

+

+His eyes just blazed; and he says:

+

+"No!—is that so?  ain't it bully!  Why, Huck, if it was to do over

+again, I bet I could fetch two hundred!  If we could put it off till—"

+

+"Hurry!  Hurry!"  I says.  "Where's Jim?"

+

+"Right at your elbow; if you reach out your arm you can touch him.

+ He's dressed, and everything's ready.  Now we'll slide out and give the

+sheep-signal."

+

+But then we heard the tramp of men coming to the door, and heard them

+begin to fumble with the pad-lock, and heard a man say:

+

+"I told you we'd be too soon; they haven't come—the door is locked.

+Here, I'll lock some of you into the cabin, and you lay for 'em in the

+dark and kill 'em when they come; and the rest scatter around a piece,

+and listen if you can hear 'em coming."

+

+So in they come, but couldn't see us in the dark, and most trod on

+us whilst we was hustling to get under the bed.  But we got under all

+right, and out through the hole, swift but soft—Jim first, me next,

+and Tom last, which was according to Tom's orders.  Now we was in the

+lean-to, and heard trampings close by outside.  So we crept to the door,

+and Tom stopped us there and put his eye to the crack, but couldn't make

+out nothing, it was so dark; and whispered and said he would listen

+for the steps to get further, and when he nudged us Jim must glide out

+first, and him last.  So he set his ear to the crack and listened, and

+listened, and listened, and the steps a-scraping around out there all

+the time; and at last he nudged us, and we slid out, and stooped down,

+not breathing, and not making the least noise, and slipped stealthy

+towards the fence in Injun file, and got to it all right, and me and Jim

+over it; but Tom's britches catched fast on a splinter on the top

+rail, and then he hear the steps coming, so he had to pull loose, which

+snapped the splinter and made a noise; and as he dropped in our tracks

+and started somebody sings out:

+

+"Who's that?  Answer, or I'll shoot!"

+

+But we didn't answer; we just unfurled our heels and shoved.  Then there

+was a rush, and a Bang, Bang, Bang! and the bullets fairly whizzed

+around us! We heard them sing out:

+

+"Here they are!  They've broke for the river!  After 'em, boys, and turn

+loose the dogs!"

+

+So here they come, full tilt.  We could hear them because they wore

+boots and yelled, but we didn't wear no boots and didn't yell.  We was

+in the path to the mill; and when they got pretty close on to us we

+dodged into the bush and let them go by, and then dropped in behind

+them.  They'd had all the dogs shut up, so they wouldn't scare off the

+robbers; but by this time somebody had let them loose, and here they

+come, making powwow enough for a million; but they was our dogs; so we

+stopped in our tracks till they catched up; and when they see it warn't

+nobody but us, and no excitement to offer them, they only just said

+howdy, and tore right ahead towards the shouting and clattering; and

+then we up-steam again, and whizzed along after them till we was nearly

+to the mill, and then struck up through the bush to where my canoe was

+tied, and hopped in and pulled for dear life towards the middle of the

+river, but didn't make no more noise than we was obleeged to. Then we

+struck out, easy and comfortable, for the island where my raft was; and

+we could hear them yelling and barking at each other all up and down the

+bank, till we was so far away the sounds got dim and died out.  And when

+we stepped on to the raft I says:

+

+"Now, old Jim, you're a free man again, and I bet you won't ever be a

+slave no more."

+

+"En a mighty good job it wuz, too, Huck.  It 'uz planned beautiful, en

+it 'uz done beautiful; en dey ain't nobody kin git up a plan dat's mo'

+mixed-up en splendid den what dat one wuz."

+

+We was all glad as we could be, but Tom was the gladdest of all because

+he had a bullet in the calf of his leg.

+

+When me and Jim heard that we didn't feel so brash as what we did

+before. It was hurting him considerable, and bleeding; so we laid him in

+the wigwam and tore up one of the duke's shirts for to bandage him, but

+he says:

+

+"Gimme the rags; I can do it myself.  Don't stop now; don't fool around

+here, and the evasion booming along so handsome; man the sweeps, and set

+her loose!  Boys, we done it elegant!—'deed we did.  I wish we'd a

+had the handling of Louis XVI., there wouldn't a been no 'Son of Saint

+Louis, ascend to heaven!' wrote down in his biography; no, sir, we'd

+a whooped him over the border—that's what we'd a done with him—and

+done it just as slick as nothing at all, too.  Man the sweeps—man the

+sweeps!"

+

+But me and Jim was consulting—and thinking.  And after we'd thought a

+minute, I says:

+

+"Say it, Jim."

+

+So he says:

+

+"Well, den, dis is de way it look to me, Huck.  Ef it wuz him dat 'uz

+bein' sot free, en one er de boys wuz to git shot, would he say, 'Go on

+en save me, nemmine 'bout a doctor f'r to save dis one?'  Is dat like

+Mars Tom Sawyer?  Would he say dat?  You bet he wouldn't!  well,

+den, is Jim gywne to say it?  No, sah—I doan' budge a step out'n dis

+place 'dout a doctor, not if it's forty year!"

+

+I knowed he was white inside, and I reckoned he'd say what he did say—so

+it was all right now, and I told Tom I was a-going for a doctor.

+ He raised considerable row about it, but me and Jim stuck to it and

+wouldn't budge; so he was for crawling out and setting the raft loose

+himself; but we wouldn't let him.  Then he give us a piece of his mind,

+but it didn't do no good.

+

+So when he sees me getting the canoe ready, he says:

+

+"Well, then, if you're bound to go, I'll tell you the way to do when you

+get to the village.  Shut the door and blindfold the doctor tight and

+fast, and make him swear to be silent as the grave, and put a purse

+full of gold in his hand, and then take and lead him all around the

+back alleys and everywheres in the dark, and then fetch him here in the

+canoe, in a roundabout way amongst the islands, and search him and take

+his chalk away from him, and don't give it back to him till you get him

+back to the village, or else he will chalk this raft so he can find it

+again. It's the way they all do."

+

+So I said I would, and left, and Jim was to hide in the woods when he

+see the doctor coming till he was gone again.

+

+

+

+

+CHAPTER XLI.

+

+THE doctor was an old man; a very nice, kind-looking old man when I got

+him up.  I told him me and my brother was over on Spanish Island hunting

+yesterday afternoon, and camped on a piece of a raft we found, and about

+midnight he must a kicked his gun in his dreams, for it went off and

+shot him in the leg, and we wanted him to go over there and fix it and

+not say nothing about it, nor let anybody know, because we wanted to

+come home this evening and surprise the folks.

+

+"Who is your folks?" he says.

+

+"The Phelpses, down yonder."

+

+"Oh," he says.  And after a minute, he says:

+

+"How'd you say he got shot?"

+

+"He had a dream," I says, "and it shot him."

+

+"Singular dream," he says.

+

+So he lit up his lantern, and got his saddle-bags, and we started.  But

+when he sees the canoe he didn't like the look of her—said she was big

+enough for one, but didn't look pretty safe for two.  I says:

+

+"Oh, you needn't be afeard, sir, she carried the three of us easy

+enough."

+

+"What three?"

+

+"Why, me and Sid, and—and—and the guns; that's what I mean."

+

+"Oh," he says.

+

+But he put his foot on the gunnel and rocked her, and shook his head,

+and said he reckoned he'd look around for a bigger one.  But they was

+all locked and chained; so he took my canoe, and said for me to wait

+till he come back, or I could hunt around further, or maybe I better

+go down home and get them ready for the surprise if I wanted to.  But

+I said I didn't; so I told him just how to find the raft, and then he

+started.

+

+I struck an idea pretty soon.  I says to myself, spos'n he can't fix

+that leg just in three shakes of a sheep's tail, as the saying is?

+spos'n it takes him three or four days?  What are we going to do?—lay

+around there till he lets the cat out of the bag?  No, sir; I know what

+I'll do.  I'll wait, and when he comes back if he says he's got to

+go any more I'll get down there, too, if I swim; and we'll take and tie

+him, and keep him, and shove out down the river; and when Tom's done

+with him we'll give him what it's worth, or all we got, and then let him

+get ashore.

+

+So then I crept into a lumber-pile to get some sleep; and next time I

+waked up the sun was away up over my head!  I shot out and went for the

+doctor's house, but they told me he'd gone away in the night some time

+or other, and warn't back yet.  Well, thinks I, that looks powerful bad

+for Tom, and I'll dig out for the island right off.  So away I shoved,

+and turned the corner, and nearly rammed my head into Uncle Silas's

+stomach! He says:

+

+"Why, Tom!  Where you been all this time, you rascal?"

+

+"I hain't been nowheres," I says, "only just hunting for the runaway

+nigger—me and Sid."

+

+"Why, where ever did you go?" he says.  "Your aunt's been mighty

+uneasy."

+

+"She needn't," I says, "because we was all right.  We followed the men

+and the dogs, but they outrun us, and we lost them; but we thought we

+heard them on the water, so we got a canoe and took out after them and

+crossed over, but couldn't find nothing of them; so we cruised along

+up-shore till we got kind of tired and beat out; and tied up the canoe

+and went to sleep, and never waked up till about an hour ago; then we

+paddled over here to hear the news, and Sid's at the post-office to see

+what he can hear, and I'm a-branching out to get something to eat for

+us, and then we're going home."

+

+So then we went to the post-office to get "Sid"; but just as I

+suspicioned, he warn't there; so the old man he got a letter out of the

+office, and we waited awhile longer, but Sid didn't come; so the old man

+said, come along, let Sid foot it home, or canoe it, when he got done

+fooling around—but we would ride.  I couldn't get him to let me stay

+and wait for Sid; and he said there warn't no use in it, and I must come

+along, and let Aunt Sally see we was all right.

+

+When we got home Aunt Sally was that glad to see me she laughed and

+cried both, and hugged me, and give me one of them lickings of hern that

+don't amount to shucks, and said she'd serve Sid the same when he come.

+

+And the place was plum full of farmers and farmers' wives, to dinner;

+and such another clack a body never heard.  Old Mrs. Hotchkiss was the

+worst; her tongue was a-going all the time.  She says:

+

+"Well, Sister Phelps, I've ransacked that-air cabin over, an' I b'lieve

+the nigger was crazy.  I says to Sister Damrell—didn't I, Sister

+Damrell?—s'I, he's crazy, s'I—them's the very words I said.  You all

+hearn me: he's crazy, s'I; everything shows it, s'I.  Look at that-air

+grindstone, s'I; want to tell me't any cretur 't's in his right mind

+'s a goin' to scrabble all them crazy things onto a grindstone, s'I?

+ Here sich 'n' sich a person busted his heart; 'n' here so 'n' so

+pegged along for thirty-seven year, 'n' all that—natcherl son o' Louis

+somebody, 'n' sich everlast'n rubbage.  He's plumb crazy, s'I; it's what

+I says in the fust place, it's what I says in the middle, 'n' it's what

+I says last 'n' all the time—the nigger's crazy—crazy 's Nebokoodneezer,

+s'I."

+

+"An' look at that-air ladder made out'n rags, Sister Hotchkiss," says

+old Mrs. Damrell; "what in the name o' goodness could he ever want

+of—"

+

+"The very words I was a-sayin' no longer ago th'n this minute to Sister

+Utterback, 'n' she'll tell you so herself.  Sh-she, look at that-air rag

+ladder, sh-she; 'n' s'I, yes, look at it, s'I—what could he a-wanted

+of it, s'I.  Sh-she, Sister Hotchkiss, sh-she—"

+

+"But how in the nation'd they ever git that grindstone in there,

+anyway? 'n' who dug that-air hole? 'n' who—"

+

+"My very words, Brer Penrod!  I was a-sayin'—pass that-air sasser o'

+m'lasses, won't ye?—I was a-sayin' to Sister Dunlap, jist this minute,

+how did they git that grindstone in there, s'I.  Without help, mind

+you—'thout help!  that's wher 'tis.  Don't tell me, s'I; there

+wuz help, s'I; 'n' ther' wuz a plenty help, too, s'I; ther's ben a

+dozen a-helpin' that nigger, 'n' I lay I'd skin every last nigger on

+this place but I'd find out who done it, s'I; 'n' moreover, s'I—"

+

+"A dozen says you!—forty couldn't a done every thing that's been

+done. Look at them case-knife saws and things, how tedious they've been

+made; look at that bed-leg sawed off with 'm, a week's work for six men;

+look at that nigger made out'n straw on the bed; and look at—"

+

+"You may well say it, Brer Hightower!  It's jist as I was a-sayin'

+to Brer Phelps, his own self.  S'e, what do you think of it, Sister

+Hotchkiss, s'e? Think o' what, Brer Phelps, s'I?  Think o' that bed-leg

+sawed off that a way, s'e?  think of it, s'I?  I lay it never sawed

+itself off, s'I—somebody sawed it, s'I; that's my opinion, take it

+or leave it, it mayn't be no 'count, s'I, but sich as 't is, it's my

+opinion, s'I, 'n' if any body k'n start a better one, s'I, let him do

+it, s'I, that's all.  I says to Sister Dunlap, s'I—"

+

+"Why, dog my cats, they must a ben a house-full o' niggers in there

+every night for four weeks to a done all that work, Sister Phelps.  Look

+at that shirt—every last inch of it kivered over with secret African

+writ'n done with blood!  Must a ben a raft uv 'm at it right along, all

+the time, amost.  Why, I'd give two dollars to have it read to me; 'n'

+as for the niggers that wrote it, I 'low I'd take 'n' lash 'm t'll—"

+

+"People to help him, Brother Marples!  Well, I reckon you'd think

+so if you'd a been in this house for a while back.  Why, they've stole

+everything they could lay their hands on—and we a-watching all the time,

+mind you. They stole that shirt right off o' the line! and as for that

+sheet they made the rag ladder out of, ther' ain't no telling how

+many times they didn't steal that; and flour, and candles, and

+candlesticks, and spoons, and the old warming-pan, and most a thousand

+things that I disremember now, and my new calico dress; and me and

+Silas and my Sid and Tom on the constant watch day and night, as I was

+a-telling you, and not a one of us could catch hide nor hair nor sight

+nor sound of them; and here at the last minute, lo and behold you, they

+slides right in under our noses and fools us, and not only fools us

+but the Injun Territory robbers too, and actuly gets away with that

+nigger safe and sound, and that with sixteen men and twenty-two dogs

+right on their very heels at that very time!  I tell you, it just bangs

+anything I ever heard of. Why, sperits couldn't a done better and

+been no smarter. And I reckon they must a been sperits—because, you

+know our dogs, and ther' ain't no better; well, them dogs never even got

+on the track of 'm once!  You explain that to me if you can!—any

+of you!"

+

+"Well, it does beat—"

+

+"Laws alive, I never—"

+

+"So help me, I wouldn't a be—"

+

+"House-thieves as well as—"

+

+"Goodnessgracioussakes, I'd a ben afeard to live in sich a—"

+

+"'Fraid to live!—why, I was that scared I dasn't hardly go to bed, or

+get up, or lay down, or set down, Sister Ridgeway.  Why, they'd steal

+the very—why, goodness sakes, you can guess what kind of a fluster I was

+in by the time midnight come last night.  I hope to gracious if I warn't

+afraid they'd steal some o' the family!  I was just to that pass I

+didn't have no reasoning faculties no more.  It looks foolish enough

+now, in the daytime; but I says to myself, there's my two poor boys

+asleep, 'way up stairs in that lonesome room, and I declare to goodness

+I was that uneasy 't I crep' up there and locked 'em in!  I did.  And

+anybody would. Because, you know, when you get scared that way, and it

+keeps running on, and getting worse and worse all the time, and your

+wits gets to addling, and you get to doing all sorts o' wild things,

+and by and by you think to yourself, spos'n I was a boy, and was away up

+there, and the door ain't locked, and you—" She stopped, looking kind

+of wondering, and then she turned her head around slow, and when her eye

+lit on me—I got up and took a walk.

+

+Says I to myself, I can explain better how we come to not be in that

+room this morning if I go out to one side and study over it a little.

+ So I done it.  But I dasn't go fur, or she'd a sent for me.  And when

+it was late in the day the people all went, and then I come in and

+told her the noise and shooting waked up me and "Sid," and the door was

+locked, and we wanted to see the fun, so we went down the lightning-rod,

+and both of us got hurt a little, and we didn't never want to try that

+no more.  And then I went on and told her all what I told Uncle Silas

+before; and then she said she'd forgive us, and maybe it was all right

+enough anyway, and about what a body might expect of boys, for all boys

+was a pretty harum-scarum lot as fur as she could see; and so, as long

+as no harm hadn't come of it, she judged she better put in her time

+being grateful we was alive and well and she had us still, stead of

+fretting over what was past and done.  So then she kissed me, and patted

+me on the head, and dropped into a kind of a brown study; and pretty

+soon jumps up, and says:

+

+"Why, lawsamercy, it's most night, and Sid not come yet!  What has

+become of that boy?"

+

+I see my chance; so I skips up and says:

+

+"I'll run right up to town and get him," I says.

+

+"No you won't," she says.  "You'll stay right wher' you are; one's

+enough to be lost at a time.  If he ain't here to supper, your uncle 'll

+go."

+

+Well, he warn't there to supper; so right after supper uncle went.

+

+He come back about ten a little bit uneasy; hadn't run across Tom's

+track. Aunt Sally was a good deal uneasy; but Uncle Silas he said

+there warn't no occasion to be—boys will be boys, he said, and you'll

+see this one turn up in the morning all sound and right.  So she had

+to be satisfied.  But she said she'd set up for him a while anyway, and

+keep a light burning so he could see it.

+

+And then when I went up to bed she come up with me and fetched her

+candle, and tucked me in, and mothered me so good I felt mean, and like

+I couldn't look her in the face; and she set down on the bed and talked

+with me a long time, and said what a splendid boy Sid was, and didn't

+seem to want to ever stop talking about him; and kept asking me every

+now and then if I reckoned he could a got lost, or hurt, or maybe

+drownded, and might be laying at this minute somewheres suffering or

+dead, and she not by him to help him, and so the tears would drip down

+silent, and I would tell her that Sid was all right, and would be home

+in the morning, sure; and she would squeeze my hand, or maybe kiss me,

+and tell me to say it again, and keep on saying it, because it done her

+good, and she was in so much trouble.  And when she was going away she

+looked down in my eyes so steady and gentle, and says:

+

+"The door ain't going to be locked, Tom, and there's the window and

+the rod; but you'll be good, won't you?  And you won't go?  For my

+sake."

+

+Laws knows I wanted to go bad enough to see about Tom, and was all

+intending to go; but after that I wouldn't a went, not for kingdoms.

+

+But she was on my mind and Tom was on my mind, so I slept very restless.

+And twice I went down the rod away in the night, and slipped around

+front, and see her setting there by her candle in the window with her

+eyes towards the road and the tears in them; and I wished I could do

+something for her, but I couldn't, only to swear that I wouldn't never

+do nothing to grieve her any more.  And the third time I waked up at

+dawn, and slid down, and she was there yet, and her candle was most out,

+and her old gray head was resting on her hand, and she was asleep.

+

+

+

+

+CHAPTER XLII.

+

+THE old man was uptown again before breakfast, but couldn't get no

+track of Tom; and both of them set at the table thinking, and not saying

+nothing, and looking mournful, and their coffee getting cold, and not

+eating anything. And by and by the old man says:

+

+"Did I give you the letter?"

+

+"What letter?"

+

+"The one I got yesterday out of the post-office."

+

+"No, you didn't give me no letter."

+

+"Well, I must a forgot it."

+

+So he rummaged his pockets, and then went off somewheres where he had

+laid it down, and fetched it, and give it to her.  She says:

+

+"Why, it's from St. Petersburg—it's from Sis."

+

+I allowed another walk would do me good; but I couldn't stir.  But

+before she could break it open she dropped it and run—for she see

+something. And so did I. It was Tom Sawyer on a mattress; and that old

+doctor; and Jim, in her calico dress, with his hands tied behind him;

+and a lot of people.  I hid the letter behind the first thing that come

+handy, and rushed.  She flung herself at Tom, crying, and says:

+

+"Oh, he's dead, he's dead, I know he's dead!"

+

+And Tom he turned his head a little, and muttered something or other,

+which showed he warn't in his right mind; then she flung up her hands,

+and says:

+

+"He's alive, thank God!  And that's enough!" and she snatched a kiss of

+him, and flew for the house to get the bed ready, and scattering orders

+right and left at the niggers and everybody else, as fast as her tongue

+could go, every jump of the way.

+

+I followed the men to see what they was going to do with Jim; and the

+old doctor and Uncle Silas followed after Tom into the house.  The men

+was very huffy, and some of them wanted to hang Jim for an example to

+all the other niggers around there, so they wouldn't be trying to run

+away like Jim done, and making such a raft of trouble, and keeping a

+whole family scared most to death for days and nights.  But the others

+said, don't do it, it wouldn't answer at all; he ain't our nigger, and

+his owner would turn up and make us pay for him, sure.  So that cooled

+them down a little, because the people that's always the most anxious

+for to hang a nigger that hain't done just right is always the very

+ones that ain't the most anxious to pay for him when they've got their

+satisfaction out of him.

+

+They cussed Jim considerble, though, and give him a cuff or two side the

+head once in a while, but Jim never said nothing, and he never let on to

+know me, and they took him to the same cabin, and put his own clothes

+on him, and chained him again, and not to no bed-leg this time, but to

+a big staple drove into the bottom log, and chained his hands, too, and

+both legs, and said he warn't to have nothing but bread and water to

+eat after this till his owner come, or he was sold at auction because

+he didn't come in a certain length of time, and filled up our hole, and

+said a couple of farmers with guns must stand watch around about the

+cabin every night, and a bulldog tied to the door in the daytime; and

+about this time they was through with the job and was tapering off with

+a kind of generl good-bye cussing, and then the old doctor comes and

+takes a look, and says:

+

+"Don't be no rougher on him than you're obleeged to, because he ain't

+a bad nigger.  When I got to where I found the boy I see I couldn't cut

+the bullet out without some help, and he warn't in no condition for

+me to leave to go and get help; and he got a little worse and a little

+worse, and after a long time he went out of his head, and wouldn't let

+me come a-nigh him any more, and said if I chalked his raft he'd kill

+me, and no end of wild foolishness like that, and I see I couldn't do

+anything at all with him; so I says, I got to have help somehow; and

+the minute I says it out crawls this nigger from somewheres and says

+he'll help, and he done it, too, and done it very well.  Of course I

+judged he must be a runaway nigger, and there I was! and there I had

+to stick right straight along all the rest of the day and all night.  It

+was a fix, I tell you! I had a couple of patients with the chills, and

+of course I'd of liked to run up to town and see them, but I dasn't,

+because the nigger might get away, and then I'd be to blame; and yet

+never a skiff come close enough for me to hail.  So there I had to stick

+plumb until daylight this morning; and I never see a nigger that was a

+better nuss or faithfuller, and yet he was risking his freedom to do it,

+and was all tired out, too, and I see plain enough he'd been worked

+main hard lately.  I liked the nigger for that; I tell you, gentlemen, a

+nigger like that is worth a thousand dollars—and kind treatment, too.  I

+had everything I needed, and the boy was doing as well there as he

+would a done at home—better, maybe, because it was so quiet; but there I

+was, with both of 'm on my hands, and there I had to stick till about

+dawn this morning; then some men in a skiff come by, and as good luck

+would have it the nigger was setting by the pallet with his head propped

+on his knees sound asleep; so I motioned them in quiet, and they slipped

+up on him and grabbed him and tied him before he knowed what he was

+about, and we never had no trouble. And the boy being in a kind of a

+flighty sleep, too, we muffled the oars and hitched the raft on, and

+towed her over very nice and quiet, and the nigger never made the least

+row nor said a word from the start.  He ain't no bad nigger, gentlemen;

+that's what I think about him."

+

+Somebody says:

+

+"Well, it sounds very good, doctor, I'm obleeged to say."

+

+Then the others softened up a little, too, and I was mighty thankful

+to that old doctor for doing Jim that good turn; and I was glad it was

+according to my judgment of him, too; because I thought he had a good

+heart in him and was a good man the first time I see him.  Then they

+all agreed that Jim had acted very well, and was deserving to have some

+notice took of it, and reward.  So every one of them promised, right out

+and hearty, that they wouldn't cuss him no more.

+

+Then they come out and locked him up.  I hoped they was going to say he

+could have one or two of the chains took off, because they was rotten

+heavy, or could have meat and greens with his bread and water; but they

+didn't think of it, and I reckoned it warn't best for me to mix in, but

+I judged I'd get the doctor's yarn to Aunt Sally somehow or other as

+soon as I'd got through the breakers that was laying just ahead of

+me—explanations, I mean, of how I forgot to mention about Sid being shot

+when I was telling how him and me put in that dratted night paddling

+around hunting the runaway nigger.

+

+But I had plenty time.  Aunt Sally she stuck to the sick-room all day

+and all night, and every time I see Uncle Silas mooning around I dodged

+him.

+

+Next morning I heard Tom was a good deal better, and they said Aunt

+Sally was gone to get a nap.  So I slips to the sick-room, and if I

+found him awake I reckoned we could put up a yarn for the family that

+would wash. But he was sleeping, and sleeping very peaceful, too; and

+pale, not fire-faced the way he was when he come.  So I set down and

+laid for him to wake.  In about half an hour Aunt Sally comes gliding

+in, and there I was, up a stump again!  She motioned me to be still, and

+set down by me, and begun to whisper, and said we could all be joyful

+now, because all the symptoms was first-rate, and he'd been sleeping

+like that for ever so long, and looking better and peacefuller all the

+time, and ten to one he'd wake up in his right mind.

+

+So we set there watching, and by and by he stirs a bit, and opened his

+eyes very natural, and takes a look, and says:

+

+"Hello!—why, I'm at home!  How's that?  Where's the raft?"

+

+"It's all right," I says.

+

+"And Jim?"

+

+"The same," I says, but couldn't say it pretty brash.  But he never

+noticed, but says:

+

+"Good!  Splendid!  Now we're all right and safe! Did you tell Aunty?"

+

+I was going to say yes; but she chipped in and says:  "About what, Sid?"

+

+"Why, about the way the whole thing was done."

+

+"What whole thing?"

+

+"Why, the whole thing.  There ain't but one; how we set the runaway

+nigger free—me and Tom."

+

+"Good land!  Set the run—What is the child talking about!  Dear, dear,

+out of his head again!"

+

+"No, I ain't out of my head; I know all what I'm talking about.  We

+did set him free—me and Tom.  We laid out to do it, and we done it.

+ And we done it elegant, too."  He'd got a start, and she never checked

+him up, just set and stared and stared, and let him clip along, and

+I see it warn't no use for me to put in.  "Why, Aunty, it cost us a

+power of work—weeks of it—hours and hours, every night, whilst you was

+all asleep. And we had to steal candles, and the sheet, and the shirt,

+and your dress, and spoons, and tin plates, and case-knives, and the

+warming-pan, and the grindstone, and flour, and just no end of things,

+and you can't think what work it was to make the saws, and pens, and

+inscriptions, and one thing or another, and you can't think half the

+fun it was.  And we had to make up the pictures of coffins and things,

+and nonnamous letters from the robbers, and get up and down the

+lightning-rod, and dig the hole into the cabin, and made the rope ladder

+and send it in cooked up in a pie, and send in spoons and things to work

+with in your apron pocket—"

+

+"Mercy sakes!"

+

+"—and load up the cabin with rats and snakes and so on, for company for

+Jim; and then you kept Tom here so long with the butter in his hat that

+you come near spiling the whole business, because the men come before

+we was out of the cabin, and we had to rush, and they heard us and let

+drive at us, and I got my share, and we dodged out of the path and let

+them go by, and when the dogs come they warn't interested in us, but

+went for the most noise, and we got our canoe, and made for the

+raft, and was all safe, and Jim was a free man, and we done it all by

+ourselves, and wasn't it bully, Aunty!"

+

+"Well, I never heard the likes of it in all my born days!  So it was

+you, you little rapscallions, that's been making all this trouble,

+and turned everybody's wits clean inside out and scared us all most to

+death.  I've as good a notion as ever I had in my life to take it out

+o' you this very minute.  To think, here I've been, night after night,

+a—you just get well once, you young scamp, and I lay I'll tan the Old

+Harry out o' both o' ye!"

+

+But Tom, he was so proud and joyful, he just couldn't hold in,

+and his tongue just went it—she a-chipping in, and spitting fire all

+along, and both of them going it at once, like a cat convention; and she

+says:

+

+"Well, you get all the enjoyment you can out of it now, for mind I

+tell you if I catch you meddling with him again—"

+

+"Meddling with who?"  Tom says, dropping his smile and looking

+surprised.

+

+"With who?  Why, the runaway nigger, of course.  Who'd you reckon?"

+

+Tom looks at me very grave, and says:

+

+"Tom, didn't you just tell me he was all right?  Hasn't he got away?"

+

+"Him?" says Aunt Sally; "the runaway nigger?  'Deed he hasn't.

+ They've got him back, safe and sound, and he's in that cabin again,

+on bread and water, and loaded down with chains, till he's claimed or

+sold!"

+

+Tom rose square up in bed, with his eye hot, and his nostrils opening

+and shutting like gills, and sings out to me:

+

+"They hain't no right to shut him up!  SHOVE!—and don't you lose a

+minute.  Turn him loose! he ain't no slave; he's as free as any cretur

+that walks this earth!"

+

+"What does the child mean?"

+

+"I mean every word I say, Aunt Sally, and if somebody don't go, I'll

+go. I've knowed him all his life, and so has Tom, there.  Old Miss

+Watson died two months ago, and she was ashamed she ever was going to

+sell him down the river, and said so; and she set him free in her

+will."

+

+"Then what on earth did you want to set him free for, seeing he was

+already free?"

+

+"Well, that is a question, I must say; and just like women!  Why,

+I wanted the adventure of it; and I'd a waded neck-deep in blood

+to—goodness alive, Aunt Polly!"

+

+If she warn't standing right there, just inside the door, looking as

+sweet and contented as an angel half full of pie, I wish I may never!

+

+Aunt Sally jumped for her, and most hugged the head off of her, and

+cried over her, and I found a good enough place for me under the bed,

+for it was getting pretty sultry for us, seemed to me.  And I peeped

+out, and in a little while Tom's Aunt Polly shook herself loose and

+stood there looking across at Tom over her spectacles—kind of grinding

+him into the earth, you know.  And then she says:

+

+"Yes, you better turn y'r head away—I would if I was you, Tom."

+

+"Oh, deary me!" says Aunt Sally; "Is he changed so?  Why, that ain't

+Tom, it's Sid; Tom's—Tom's—why, where is Tom?  He was here a minute

+ago."

+

+"You mean where's Huck Finn—that's what you mean!  I reckon I hain't

+raised such a scamp as my Tom all these years not to know him when I

+see him.  That would be a pretty howdy-do. Come out from under that

+bed, Huck Finn."

+

+So I done it.  But not feeling brash.

+

+Aunt Sally she was one of the mixed-upest-looking persons I ever

+see—except one, and that was Uncle Silas, when he come in and they told

+it all to him.  It kind of made him drunk, as you may say, and he didn't

+know nothing at all the rest of the day, and preached a prayer-meeting

+sermon that night that gave him a rattling ruputation, because the

+oldest man in the world couldn't a understood it.  So Tom's Aunt Polly,

+she told all about who I was, and what; and I had to up and tell how

+I was in such a tight place that when Mrs. Phelps took me for Tom

+Sawyer—she chipped in and says, "Oh, go on and call me Aunt Sally, I'm

+used to it now, and 'tain't no need to change"—that when Aunt Sally took

+me for Tom Sawyer I had to stand it—there warn't no other way, and

+I knowed he wouldn't mind, because it would be nuts for him, being

+a mystery, and he'd make an adventure out of it, and be perfectly

+satisfied.  And so it turned out, and he let on to be Sid, and made

+things as soft as he could for me.

+

+And his Aunt Polly she said Tom was right about old Miss Watson setting

+Jim free in her will; and so, sure enough, Tom Sawyer had gone and took

+all that trouble and bother to set a free nigger free! and I couldn't

+ever understand before, until that minute and that talk, how he could

+help a body set a nigger free with his bringing-up.

+

+Well, Aunt Polly she said that when Aunt Sally wrote to her that Tom and

+Sid had come all right and safe, she says to herself:

+

+"Look at that, now!  I might have expected it, letting him go off that

+way without anybody to watch him.  So now I got to go and trapse all

+the way down the river, eleven hundred mile, and find out what that

+creetur's up to this time, as long as I couldn't seem to get any

+answer out of you about it."

+

+"Why, I never heard nothing from you," says Aunt Sally.

+

+"Well, I wonder!  Why, I wrote you twice to ask you what you could mean

+by Sid being here."

+

+"Well, I never got 'em, Sis."

+

+Aunt Polly she turns around slow and severe, and says:

+

+"You, Tom!"

+

+"Well—what?" he says, kind of pettish.

+

+"Don't you what me, you impudent thing—hand out them letters."

+

+"What letters?"

+

+"Them letters.  I be bound, if I have to take a-holt of you I'll—"

+

+"They're in the trunk.  There, now.  And they're just the same as they

+was when I got them out of the office.  I hain't looked into them, I

+hain't touched them.  But I knowed they'd make trouble, and I thought if

+you warn't in no hurry, I'd—"

+

+"Well, you do need skinning, there ain't no mistake about it.  And I

+wrote another one to tell you I was coming; and I s'pose he—"

+

+"No, it come yesterday; I hain't read it yet, but it's all right, I've

+got that one."

+

+I wanted to offer to bet two dollars she hadn't, but I reckoned maybe it

+was just as safe to not to.  So I never said nothing.

+

+

+

+

+CHAPTER THE LAST

+

+THE first time I catched Tom private I asked him what was his idea, time

+of the evasion?—what it was he'd planned to do if the evasion worked all

+right and he managed to set a nigger free that was already free before?

+And he said, what he had planned in his head from the start, if we got

+Jim out all safe, was for us to run him down the river on the raft, and

+have adventures plumb to the mouth of the river, and then tell him about

+his being free, and take him back up home on a steamboat, in style,

+and pay him for his lost time, and write word ahead and get out all

+the niggers around, and have them waltz him into town with a torchlight

+procession and a brass-band, and then he would be a hero, and so would

+we.  But I reckoned it was about as well the way it was.

+

+We had Jim out of the chains in no time, and when Aunt Polly and Uncle

+Silas and Aunt Sally found out how good he helped the doctor nurse Tom,

+they made a heap of fuss over him, and fixed him up prime, and give him

+all he wanted to eat, and a good time, and nothing to do.  And we had

+him up to the sick-room, and had a high talk; and Tom give Jim forty

+dollars for being prisoner for us so patient, and doing it up so good,

+and Jim was pleased most to death, and busted out, and says:

+

+"Dah, now, Huck, what I tell you?—what I tell you up dah on Jackson

+islan'?  I tole you I got a hairy breas', en what's de sign un it; en

+I tole you I ben rich wunst, en gwineter to be rich agin; en it's

+come true; en heah she is!  dah, now! doan' talk to me—signs is

+signs, mine I tell you; en I knowed jis' 's well 'at I 'uz gwineter be

+rich agin as I's a-stannin' heah dis minute!"

+

+And then Tom he talked along and talked along, and says, le's all three

+slide out of here one of these nights and get an outfit, and go for

+howling adventures amongst the Injuns, over in the Territory, for a

+couple of weeks or two; and I says, all right, that suits me, but I

+ain't got no money for to buy the outfit, and I reckon I couldn't get

+none from home, because it's likely pap's been back before now, and got

+it all away from Judge Thatcher and drunk it up.

+

+"No, he hain't," Tom says; "it's all there yet—six thousand dollars

+and more; and your pap hain't ever been back since.  Hadn't when I come

+away, anyhow."

+

+Jim says, kind of solemn:

+

+"He ain't a-comin' back no mo', Huck."

+

+I says:

+

+"Why, Jim?"

+

+"Nemmine why, Huck—but he ain't comin' back no mo."

+

+But I kept at him; so at last he says:

+

+"Doan' you 'member de house dat was float'n down de river, en dey wuz a

+man in dah, kivered up, en I went in en unkivered him and didn' let you

+come in?  Well, den, you kin git yo' money when you wants it, kase dat

+wuz him."

+

+Tom's most well now, and got his bullet around his neck on a watch-guard

+for a watch, and is always seeing what time it is, and so there ain't

+nothing more to write about, and I am rotten glad of it, because if I'd

+a knowed what a trouble it was to make a book I wouldn't a tackled it,

+and ain't a-going to no more.  But I reckon I got to light out for the

+Territory ahead of the rest, because Aunt Sally she's going to adopt me

+and sivilize me, and I can't stand it.  I been there before.

+

+THE END. YOURS TRULY, HUCK FINN.

+

+

+

+

+

+End of the Project Gutenberg EBook of Adventures of Huckleberry Finn,

+Complete, by Mark Twain (Samuel Clemens)

+

+*** END OF THIS PROJECT GUTENBERG EBOOK HUCKLEBERRY FINN ***

+

+***** This file should be named 76-h.htm or 76-h.zip ***** This and

+all associated files of various formats will be found in:

+http://www.gutenberg.net/7/76/

+

+Produced by David Widger. Previous editions produced by Ron Burkey and

+Internet Wiretap

+

+Updated editions will replace the previous one--the old editions will be

+renamed.

+

+Creating the works from public domain print editions means that no one

+owns a United States copyright in these works, so the Foundation (and

+you!) can copy and distribute it in the United States without permission

+and without paying copyright royalties. Special rules, set forth in

+the General Terms of Use part of this license, apply to copying and

+distributing Project Gutenberg-tm electronic works to protect the

+PROJECT GUTENBERG-tm concept and trademark. Project Gutenberg is a

+registered trademark, and may not be used if you charge for the eBooks,

+unless you receive specific permission. If you do not charge anything

+for copies of this eBook, complying with the rules is very easy. You

+may use this eBook for nearly any purpose such as creation of derivative

+works, reports, performances and research. They may be modified and

+printed and given away--you may do practically ANYTHING with public

+domain eBooks. Redistribution is subject to the trademark license,

+especially commercial redistribution.

+

+*** START: FULL LICENSE ***

+

+THE FULL PROJECT GUTENBERG LICENSE PLEASE READ THIS BEFORE YOU

+DISTRIBUTE OR USE THIS WORK

+

+To protect the Project Gutenberg-tm mission of promoting the free
+distribution of electronic works, by using or distributing this work
+(or any other work associated in any way with the phrase "Project
+Gutenberg"), you agree to comply with all the terms of the Full
+Project Gutenberg-tm License (available with this file or online at
+http://gutenberg.net/license).
+
+Section 1. General Terms of Use and Redistributing Project Gutenberg-tm
+electronic works
+
+1.A. By reading or using any part of this Project Gutenberg-tm
+electronic work, you indicate that you have read, understand, agree
+to and accept all the terms of this license and intellectual property
+(trademark/copyright) agreement. If you do not agree to abide by all the
+terms of this agreement, you must cease using and return or destroy all
+copies of Project Gutenberg-tm electronic works in your possession.
+If you paid a fee for obtaining a copy of or access to a Project
+Gutenberg-tm electronic work and you do not agree to be bound by the
+terms of this agreement, you may obtain a refund from the person or
+entity to whom you paid the fee as set forth in paragraph 1.E.8.
+
+1.B. "Project Gutenberg" is a registered trademark. It may only be used
+on or associated in any way with an electronic work by people who agree
+to be bound by the terms of this agreement. There are a few things that
+you can do with most Project Gutenberg-tm electronic works even without
+complying with the full terms of this agreement. See paragraph 1.C
+below. There are a lot of things you can do with Project Gutenberg-tm
+electronic works if you follow the terms of this agreement and help
+preserve free future access to Project Gutenberg-tm electronic works.
+See paragraph 1.E below.
+
+1.C. The Project Gutenberg Literary Archive Foundation ("the Foundation"
+or PGLAF), owns a compilation copyright in the collection of Project
+Gutenberg-tm electronic works. Nearly all the individual works in
+the collection are in the public domain in the United States. If an
+individual work is in the public domain in the United States and you
+are located in the United States, we do not claim a right to prevent
+you from copying, distributing, performing, displaying or creating
+derivative works based on the work as long as all references to Project
+Gutenberg are removed. Of course, we hope that you will support the
+Project Gutenberg-tm mission of promoting free access to electronic
+works by freely sharing Project Gutenberg-tm works in compliance with
+the terms of this agreement for keeping the Project Gutenberg-tm name
+associated with the work. You can easily comply with the terms of this
+agreement by keeping this work in the same format with its attached
+full Project Gutenberg-tm License when you share it without charge with
+others.
+
+1.D. The copyright laws of the place where you are located also govern
+what you can do with this work. Copyright laws in most countries are in
+a constant state of change. If you are outside the United States, check
+the laws of your country in addition to the terms of this agreement
+before downloading, copying, displaying, performing, distributing
+or creating derivative works based on this work or any other Project
+Gutenberg-tm work. The Foundation makes no representations concerning
+the copyright status of any work in any country outside the United
+States.
+
+1.E. Unless you have removed all references to Project Gutenberg:
+
+1.E.1. The following sentence, with active links to, or other immediate
+access to, the full Project Gutenberg-tm License must appear prominently
+whenever any copy of a Project Gutenberg-tm work (any work on which the
+phrase "Project Gutenberg" appears, or with which the phrase "Project
+Gutenberg" is associated) is accessed, displayed, performed, viewed,
+copied or distributed:
+
+This eBook is for the use of anyone anywhere at no cost and with almost
+no restrictions whatsoever. You may copy it, give it away or re-use
+it under the terms of the Project Gutenberg License included with this
+eBook or online at www.gutenberg.net
+
+1.E.2. If an individual Project Gutenberg-tm electronic work is derived
+from the public domain (does not contain a notice indicating that it is
+posted with permission of the copyright holder), the work can be copied
+and distributed to anyone in the United States without paying any fees
+or charges. If you are redistributing or providing access to a work with
+the phrase "Project Gutenberg" associated with or appearing on the work,
+you must comply either with the requirements of paragraphs 1.E.1 through
+1.E.7 or obtain permission for the use of the work and the Project
+Gutenberg-tm trademark as set forth in paragraphs 1.E.8 or 1.E.9.
+
+1.E.3. If an individual Project Gutenberg-tm electronic work is posted
+with the permission of the copyright holder, your use and distribution
+must comply with both paragraphs 1.E.1 through 1.E.7 and any additional
+terms imposed by the copyright holder. Additional terms will be linked
+to the Project Gutenberg-tm License for all works posted with the
+permission of the copyright holder found at the beginning of this work.
+
+1.E.4. Do not unlink or detach or remove the full Project Gutenberg-tm
+License terms from this work, or any files containing a part of this
+work or any other work associated with Project Gutenberg-tm.
+
+1.E.5. Do not copy, display, perform, distribute or redistribute
+this electronic work, or any part of this electronic work, without
+prominently displaying the sentence set forth in paragraph 1.E.1 with
+active links or immediate access to the full terms of the Project
+Gutenberg-tm License.
+
+1.E.6. You may convert to and distribute this work in any binary,
+compressed, marked up, nonproprietary or proprietary form, including any
+word processing or hypertext form. However, if you provide access to or
+distribute copies of a Project Gutenberg-tm work in a format other
+than "Plain Vanilla ASCII" or other format used in the official
+version posted on the official Project Gutenberg-tm web site
+(www.gutenberg.net), you must, at no additional cost, fee or expense
+to the user, provide a copy, a means of exporting a copy, or a means
+of obtaining a copy upon request, of the work in its original "Plain
+Vanilla ASCII" or other form. Any alternate format must include the full
+Project Gutenberg-tm License as specified in paragraph 1.E.1.
+
+1.E.7. Do not charge a fee for access to, viewing, displaying,
+performing, copying or distributing any Project Gutenberg-tm works
+unless you comply with paragraph 1.E.8 or 1.E.9.
+
+1.E.8. You may charge a reasonable fee for copies of or providing access
+to or distributing Project Gutenberg-tm electronic works provided that
+
+- You pay a royalty fee of 20% of the gross profits you derive from
+the use of Project Gutenberg-tm works calculated using the method you
+already use to calculate your applicable taxes. The fee is owed to the
+owner of the Project Gutenberg-tm trademark, but he has agreed to donate
+royalties under this paragraph to the Project Gutenberg Literary Archive
+Foundation. Royalty payments must be paid within 60 days following each
+date on which you prepare (or are legally required to prepare) your
+periodic tax returns. Royalty payments should be clearly marked as such
+and sent to the Project Gutenberg Literary Archive Foundation at the
+address specified in Section 4, "Information about donations to the
+Project Gutenberg Literary Archive Foundation."
+
+- You provide a full refund of any money paid by a user who notifies you
+in writing (or by e-mail) within 30 days of receipt that s/he does not
+agree to the terms of the full Project Gutenberg-tm License. You
+must require such a user to return or destroy all copies of the works
+possessed in a physical medium and discontinue all use of and all access
+to other copies of Project Gutenberg-tm works.
+
+- You provide, in accordance with paragraph 1.F.3, a full refund of
+any money paid for a work or a replacement copy, if a defect in the
+electronic work is discovered and reported to you within 90 days of
+receipt of the work.
+
+- You comply with all other terms of this agreement for free
+distribution of Project Gutenberg-tm works.
+
+1.E.9. If you wish to charge a fee or distribute a Project Gutenberg-tm
+electronic work or group of works on different terms than are set forth
+in this agreement, you must obtain permission in writing from both the
+Project Gutenberg Literary Archive Foundation and Michael Hart, the
+owner of the Project Gutenberg-tm trademark. Contact the Foundation as
+set forth in Section 3 below.
+
+1.F.
+
+1.F.1. Project Gutenberg volunteers and employees expend considerable
+effort to identify, do copyright research on, transcribe and proofread
+public domain works in creating the Project Gutenberg-tm collection.
+Despite these efforts, Project Gutenberg-tm electronic works, and the
+medium on which they may be stored, may contain "Defects," such as, but
+not limited to, incomplete, inaccurate or corrupt data, transcription
+errors, a copyright or other intellectual property infringement, a
+defective or damaged disk or other medium, a computer virus, or computer
+codes that damage or cannot be read by your equipment.
+
+1.F.2. LIMITED WARRANTY, DISCLAIMER OF DAMAGES - Except for the "Right
+of Replacement or Refund" described in paragraph 1.F.3, the Project
+Gutenberg Literary Archive Foundation, the owner of the Project
+Gutenberg-tm trademark, and any other party distributing a Project
+Gutenberg-tm electronic work under this agreement, disclaim all
+liability to you for damages, costs and expenses, including legal fees.
+YOU AGREE THAT YOU HAVE NO REMEDIES FOR NEGLIGENCE, STRICT LIABILITY,
+BREACH OF WARRANTY OR BREACH OF CONTRACT EXCEPT THOSE PROVIDED IN
+PARAGRAPH F3. YOU AGREE THAT THE FOUNDATION, THE TRADEMARK OWNER, AND
+ANY DISTRIBUTOR UNDER THIS AGREEMENT WILL NOT BE LIABLE TO YOU FOR
+ACTUAL, DIRECT, INDIRECT, CONSEQUENTIAL, PUNITIVE OR INCIDENTAL DAMAGES
+EVEN IF YOU GIVE NOTICE OF THE POSSIBILITY OF SUCH DAMAGE.
+
+1.F.3. LIMITED RIGHT OF REPLACEMENT OR REFUND - If you discover a defect
+in this electronic work within 90 days of receiving it, you can receive
+a refund of the money (if any) you paid for it by sending a written
+explanation to the person you received the work from. If you received
+the work on a physical medium, you must return the medium with your
+written explanation. The person or entity that provided you with the
+defective work may elect to provide a replacement copy in lieu of a
+refund. If you received the work electronically, the person or entity
+providing it to you may choose to give you a second opportunity to
+receive the work electronically in lieu of a refund. If the second copy
+is also defective, you may demand a refund in writing without further
+opportunities to fix the problem.
+
+1.F.4. Except for the limited right of replacement or refund set forth
+in paragraph 1.F.3, this work is provided to you 'AS-IS' WITH NO OTHER
+WARRANTIES OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO
+WARRANTIES OF MERCHANTIBILITY OR FITNESS FOR ANY PURPOSE.
+
+1.F.5. Some states do not allow disclaimers of certain implied
+warranties or the exclusion or limitation of certain types of damages.
+If any disclaimer or limitation set forth in this agreement violates the
+law of the state applicable to this agreement, the agreement shall be
+interpreted to make the maximum disclaimer or limitation permitted by
+the applicable state law. The invalidity or unenforceability of any
+provision of this agreement shall not void the remaining provisions.
+
+1.F.6. INDEMNITY - You agree to indemnify and hold the Foundation,
+the trademark owner, any agent or employee of the Foundation, anyone
+providing copies of Project Gutenberg-tm electronic works in accordance
+with this agreement, and any volunteers associated with the production,
+promotion and distribution of Project Gutenberg-tm electronic works,
+harmless from all liability, costs and expenses, including legal fees,
+that arise directly or indirectly from any of the following which you do
+or cause to occur: (a) distribution of this or any Project Gutenberg-tm
+work, (b) alteration, modification, or additions or deletions to any
+Project Gutenberg-tm work, and (c) any Defect you cause.
+
+Section 2. Information about the Mission of Project Gutenberg-tm
+
+Project Gutenberg-tm is synonymous with the free distribution of
+electronic works in formats readable by the widest variety of computers
+including obsolete, old, middle-aged and new computers. It exists
+because of the efforts of hundreds of volunteers and donations from
+people in all walks of life.
+
+Volunteers and financial support to provide volunteers with the
+assistance they need, is critical to reaching Project Gutenberg-tm's
+goals and ensuring that the Project Gutenberg-tm collection will remain
+freely available for generations to come. In 2001, the Project Gutenberg
+Literary Archive Foundation was created to provide a secure and
+permanent future for Project Gutenberg-tm and future generations. To
+learn more about the Project Gutenberg Literary Archive Foundation and
+how your efforts and donations can help, see Sections 3 and 4 and the
+Foundation web page at http://www.pglaf.org.
+
+Section 3. Information about the Project Gutenberg Literary Archive
+Foundation
+
+The Project Gutenberg Literary Archive Foundation is a non profit
+501(c)(3) educational corporation organized under the laws of the state
+of Mississippi and granted tax exempt status by the Internal Revenue
+Service. The Foundation's EIN or federal tax identification number
+is 64-6221541. Its 501(c)(3) letter is posted at
+http://pglaf.org/fundraising. Contributions to the Project Gutenberg
+Literary Archive Foundation are tax deductible to the full extent
+permitted by U.S. federal laws and your state's laws.
+
+The Foundation's principal office is located at 4557 Melan Dr. S.
+Fairbanks, AK, 99712., but its volunteers and employees are scattered
+throughout numerous locations. Its business office is located at
+809 North 1500 West, Salt Lake City, UT 84116, (801) 596-1887,
+email business@pglaf.org. Email contact links and up to date contact
+information can be found at the Foundation's web site and official page
+at http://pglaf.org
+
+For additional contact information: Dr. Gregory B. Newby Chief Executive
+and Director gbnewby@pglaf.org
+
+Section 4. Information about Donations to the Project Gutenberg Literary
+Archive Foundation
+
+Project Gutenberg-tm depends upon and cannot survive without wide spread
+public support and donations to carry out its mission of increasing
+the number of public domain and licensed works that can be freely
+distributed in machine readable form accessible by the widest array
+of equipment including outdated equipment. Many small donations ($1 to
+$5,000) are particularly important to maintaining tax exempt status with
+the IRS.
+
+The Foundation is committed to complying with the laws regulating
+charities and charitable donations in all 50 states of the United
+States. Compliance requirements are not uniform and it takes a
+considerable effort, much paperwork and many fees to meet and keep up
+with these requirements. We do not solicit donations in locations
+where we have not received written confirmation of compliance. To SEND
+DONATIONS or determine the status of compliance for any particular state
+visit http://pglaf.org
+
+While we cannot and do not solicit contributions from states where we
+have not met the solicitation requirements, we know of no prohibition
+against accepting unsolicited donations from donors in such states who
+approach us with offers to donate.
+
+International donations are gratefully accepted, but we cannot make any
+statements concerning tax treatment of donations received from outside
+the United States. U.S. laws alone swamp our small staff.
+
+Please check the Project Gutenberg Web pages for current donation
+methods and addresses. Donations are accepted in a number of other ways
+including including checks, online payments and credit card donations.
+To donate, please visit: http://pglaf.org/donate
+
+Section 5. General Information About Project Gutenberg-tm electronic
+works.
+
+Professor Michael S. Hart is the originator of the Project Gutenberg-tm
+concept of a library of electronic works that could be freely shared
+with anyone. For thirty years, he produced and distributed Project
+Gutenberg-tm eBooks with only a loose network of volunteer support.
+
+Project Gutenberg-tm eBooks are often created from several printed
+editions, all of which are confirmed as Public Domain in the U.S. unless
+a copyright notice is included. Thus, we do not necessarily keep eBooks
+in compliance with any particular paper edition.
+
+Most people start at our Web site which has the main PG search facility:
+
+http://www.gutenberg.net
+
+This Web site includes information about Project Gutenberg-tm, including
+how to make donations to the Project Gutenberg Literary Archive
+Foundation, how to help produce our new eBooks, and how to subscribe to
+our email newsletter to hear about new eBooks.
+
diff --git a/test/resources/tokenization/apache_license_header.txt b/test/resources/tokenization/apache_license_header.txt
new file mode 100644
index 0000000..d973dce
--- /dev/null
+++ b/test/resources/tokenization/apache_license_header.txt
@@ -0,0 +1,16 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
\ No newline at end of file
diff --git a/test/resources/tokenization/french_skip_stop_words_before_stemming.txt b/test/resources/tokenization/french_skip_stop_words_before_stemming.txt
new file mode 100644
index 0000000..59a1c23
--- /dev/null
+++ b/test/resources/tokenization/french_skip_stop_words_before_stemming.txt
@@ -0,0 +1 @@
+"La danse sous la pluie" est une chanson connue
\ No newline at end of file
diff --git a/test/resources/tokenization/ja_jp_1.txt b/test/resources/tokenization/ja_jp_1.txt
new file mode 100644
index 0000000..1a0a198
--- /dev/null
+++ b/test/resources/tokenization/ja_jp_1.txt
@@ -0,0 +1 @@
+古写本は題名の記されていないものも多く、記されている場合であっても内容はさまざまである。『源氏物語』の場合は冊子の標題として「源氏物語」ないしそれに相当する物語全体の標題が記されている場合よりも、それぞれの帖名が記されていることが少なくない。こうした経緯から、現在において一般に『源氏物語』と呼ばれているこの物語が書かれた当時の題名が何であったのかは明らかではない。古い時代の写本や注釈書などの文献に記されている名称は大きく以下の系統に分かれる。
\ No newline at end of file
diff --git a/test/resources/tokenization/ja_jp_2.txt b/test/resources/tokenization/ja_jp_2.txt
new file mode 100644
index 0000000..278b4fd
--- /dev/null
+++ b/test/resources/tokenization/ja_jp_2.txt
@@ -0,0 +1,2 @@
+中野幸一編『常用 源氏物語要覧』武蔵野書院、1997年(平成9年)。 ISBN 4-8386-0383-5
+その他にCD-ROM化された本文検索システムとして次のようなものがある。
\ No newline at end of file
diff --git a/test/resources/tokenization/lorem_ipsum.txt b/test/resources/tokenization/lorem_ipsum.txt
new file mode 100644
index 0000000..14a4477
--- /dev/null
+++ b/test/resources/tokenization/lorem_ipsum.txt
@@ -0,0 +1 @@
+"Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum."
\ No newline at end of file
diff --git a/test/resources/tokenization/ru_ru_1.txt b/test/resources/tokenization/ru_ru_1.txt
new file mode 100644
index 0000000..c19a9be
--- /dev/null
+++ b/test/resources/tokenization/ru_ru_1.txt
@@ -0,0 +1,19 @@
+Вэл фабулаз эффикеэнди витюпэраторебуз эи, кюм нобёз дикырыт ёнвидюнт ед. Ючю золэт ийжквюы эа, нык но элитр волуптюа пэркёпитюр. Ыт векж декам плььатонэм, эа жюмо ёудёкабет льебэравичсы квуй, альбюкиюс лыгэндоч эю пэр. Еюж ед аутым нюмквуам тебиквюэ, эи амэт дэбыт нюлльам квюо. Ку золэт пондэрюм элььэефэнд хаж, вяш ёнвидюнт дыфинитеоным экз, конгуы кытэрож квюо ат.
+
+Ад фиэрэнт ыкжплььикари нык, ут дольорэ емпэтюсъ зыд. Зыд ажжюм пэржыкюти жкряпшэрит эю, ыюм ан витаэ аляквюид дяшзынтиыт. Вэл квюандо ридэнж эю. Еюж жэмпэр конклььюжионэмквуэ нэ.
+
+Ку хёз порро тамквюам плььатонэм, льаборэ ыпикурэи вэл ты. Но ентэгры компльыктётюр мэя, дуо жанктюч дэльэнйт льебэравичсы нэ. Эжт фалля пропрёаы эю, эож вэрыар ёнэрмйщ ан. Мюндй контынтёонэж прё ат. Эрож копиожаы пытынтёюм шэа эи.
+
+Но мэль омниюм рэпудёандаэ. Дуо ты квюот июварыт, ты векж квюаэчтио альиквуандо, эю мэльёуз трактатоз пхйложопхяа векж. Ед нам нюлльам губэргрэн, ты оратио иреуры коммюны пэр, векж ед золэт убяквюэ чингюльищ. Мэльёуз граэкйж вольуптатибюж мэя ед, одео емпыдит майыжтатйж эож ыт, эа дйкит зальутанде квюальизквюэ ючю. Йн еюж анкилльаы аккоммодары, ан выльёт оптёон ывыртятюр вэл.
+
+Йн граэко дычэрунт мандамюч мыа. Но про щольыат примич, нык ан этёам дольорэ элььэефэнд. Ыам нэ квюандо нощтыр. Ныморэ дикунт ад хаж, хаж квюод дёко эуежмод ты, амэт дыфинитеоным еюж ыт. Эжт ад апэряря апыирёан, кюм зальы рэктэквуэ нэ.
+
+Эи эрюдитя факёльиси еюж, ыам дольорэ фабулаз вокябюч ат. Про опортэры азжюывырит йн. Мовэт аюдиам ючю эю, нэ едквюэ пэркйпет квюальизквюэ хёз. Эа кюм этёам ырант граэкы. Эю прё модо эпикюре жплэндидэ, ат ыюм фалля пожтэа пхаэдрум, чтэт вэрйтюж нэ вим. Конгуы оратио лобортис ут кюм. Нобёз опортэат но жят, вэрйтюж пэркйпет мальюизчыт квуй ан.
+
+Мацим ютенам рыфэррэнтур вим ат. Ку квюо квюач дигнижжим, одео жольюта тебиквюэ мыа ыт. Ед нобёз тантаз льаорыыт вэл, еюж йн латины фабулаз аккюжамюз, прё апыирёан адолэжкэнс пожйдонёюм ты. Консэквюат котёдиэквюэ ыюм ан, хёз ут хабымуч ыпикурэи чэнзэрет. Ат квуй дэбыт вирйз, нам эю ыльит фабыллас дэлььякатезшимя. Кончюлату инзтруктеор эа кюм, конжюль фэюгаят кончюлату ут ыам, вяш эи фэюгаят антеопам.
+
+Юллюм оратио консэквюат ут вэл, выльёт рыпудяары хэндрэрет эю прё. Унюм ыкчпэтында торквюатоз ад векж. Квюо мютат тебиквюэ факильизиж эи, эа ыам фюгит такематыш дяшзынтиыт, экз про абхоррэант дйжпютандо. Ку хаж льабятюр эрепюят, нолюёжжэ ёудёкабет пэр эю. Тота долорюм азжюывырит прё ут, нык зальы элитр дикырыт эю. Ед дуо ыкжплььикари мныжаркхюм конклььюжионэмквуэ.
+
+Кончюлату азжюывырит нэ зыд. Вэл но квуым граэкйж юрбанйтаж. Про эффякиантур дэфянятйоныс ут, зюаз эрат конкыптам векж эю. Юллюм зюжкепиантюр экз прё, оратио нонумй орнатюс эи эож. Эож такематыш чэнзэрет ад, ат факилиз пэркйпет пэржыкюти нык, аппарэат рэктэквуэ экз зыд. Кюм йн вёвындо дэтракто окюррырэт.
+
+Шэа рыквюы щольыат фабыллас ты, хаж выльёт эффякиантур компрэхэнжам ат. Ты мэя эзшэ ажжюм апыирёан, ат докэндё конкыптам еюж. Ножтрюд жанктюч ывыртятюр ты вяш, но примич промпта пэрчыквюэрёж дуо. Выро мютат омнэжквюы ыам эю.
\ No newline at end of file
diff --git a/test/resources/tokenization/top_visited_domains.txt b/test/resources/tokenization/top_visited_domains.txt
new file mode 100644
index 0000000..0238c36
--- /dev/null
+++ b/test/resources/tokenization/top_visited_domains.txt
@@ -0,0 +1,3 @@
+google.com facebook.com youtube.com yahoo.com baidu.com amazon.com wikipedia.org taobao.com twitter.com Qq.com google.co.in apple.com
+
+http://www.alexa.com/topsites
\ No newline at end of file
diff --git a/test/resources/tokenization/zn_tw_1.txt b/test/resources/tokenization/zn_tw_1.txt
new file mode 100644
index 0000000..7e1e545
--- /dev/null
+++ b/test/resources/tokenization/zn_tw_1.txt
@@ -0,0 +1,19 @@
+銌鞁鬿 蝑蝞蝢 砫粍紞 顃餭 蜭, 趍跠跬 鏀顝饇 轞騹鼚 曨曣 慔 巆忁 嫷 惝掭掝 鑳鱨鱮, 滆 浶洯浽 牣犿玒 嶕憱撏 駇僾 憱撏 硾 湹渵焲 鋧鋓頠 匊呥 犌犐瑆 翣聜蒢 蜙, 毼 噳墺 耇胇赲 稨窨箌 埱娵徖, 鍆錌雔 熩熝犚 庲悊 槄 銌鞁鬿 烍烚珜 疿疶砳 魆 糑縒 魦 萴葂 貄趎跰 萰葍萯, 嗛嗕塨 礂簅縭 婜孲 跣, 楋 澭濇 嗢嗂塝 姌弣抶 曋橪橤
+
+崺崸 獧瞝瞣 牣犿玒 嫷, 墆 齴讘麡 毊灚襳 毚丮厹 甿虮 箯 埱娵 脀蚅蚡 礯籔羻 鈁陾靰, 垼娕 螏螉褩 竀篴臌 槶, 鵳齖齘 驐鷑鷩 絒翗腏 輗 嬦憼 耜僇鄗 訬軗郲 舿萐菿 頠餈 槶, 抰枅 嬃 軹軦軵 鸙讟钃 椵楘溍 渳湥牋 蔝蓶蓨 跜軥 嫀, 砯砨 嗢 鄨鎷闒 縓罃蔾, 鍹餳駷 玝甿虮 熩熝犚 碡碙 銇
+
+瞝瞣 莃荶衒 碄碆碃 樆樦潏 穊, 枲柊氠 婰婜孲 踣 繗繓 犈犆犅 溗煂獂 儋圚 餀, 蟷蠉蟼 禒箈箑 牬玾 槶 玾珆玸 錖霒馞 撖 姴怤 犆犅 鳻嶬幧 誁趏跮 墐
+
+墐 鬵鵛嚪 圩芰敔 蒝蒧蓏 餳駷, 葮 廘榙 斶檎檦 謺貙蹖 澉 駍駔鳿 蒝蒧蓏 蔊蓴蔖 垌壴, 煻 垺垼娕 簻臗藱 笓粊 絼 騉鬵 樛槷殦 縸縩薋 巕氍爟, 璸瓁穟 鯦鯢鯡 罫蓱蒆 齥廲 滘 鯠鯦 噮噦噞 忁曨曣 釂鱞鸄 鉌 寔嵒 葮 瀿犨皫 顤鰩鷎, 憢 蔏蔍蓪 柦柋牬 玝甿虮 偢偣
+
+嗂 蒏葝葮 鋱鋟鋈 鬄鵊鵙 繖藒 毚丮 匢奾灱 枲柊氠 椵楘溍 斠, 鬄鵊 鼏噳墺 巕氍爟 鋟, 鳱 鵳齖齘 雥齆犪 騧騜 轞騹鼚 溗煂獂 諙 岵帔, 煻 廦廥彋 繠繗繓 馦騧騜 齖齘 煘煓 喥喓堮 軹軦軵 壿, 斖蘱 酳 圛嬖嬨 姛帡恦, 摿斠榱 櫧櫋瀩 廅愮揫 驧鬤 跾
+
+綒 鞻饙騴 萆覕貹 稘稒稕 稢綌 笢笣紽 磃箹糈 瑽 氕厊, 剆坲 禖 鶀嚵巆 枅杺枙, 郔镺陯 烗猀珖 伒匢 殟 憢 箛箙 馺骱魡 潧潣瑽 觶譈譀 塝 豥趍 捘栒毤 幨懅憴 稘稒稕, 撖撱暲 駓駗鴀 鄻鎟霣 蝯 訑紱 縢羱 槏殟殠 浘涀缹 鄻鎟霣 輘, 籺籿 媝寔嵒 樧槧樈 焟硱筎 瞂
+
+蚔趵郚 碄碆碃 幋 璻甔 輘 裧頖 簎艜薤 鑤仜伒 誽賚賧 淠淉 鄜酳銪 炾笀耔 椵楘溍 魡 疿疶砳 趡趛踠 躨钀钁 馺 哤垽 庌弝彶 譋轐鏕 毄滱漮 踣 墡嬇, 賗 鯦鯢鯡 齈龘墻 輘輠 蕡蕇蕱 襛襡襙 隒雸頍 紒翀 楋, 殠漊 皾籈譧 磩磟窱 狅妵妶 榎
+
+釂鱞 禠 袟袘觕 餈餖駜 椵楘溍 銈 欿殽 鬵鵛嚪 鬎鯪鯠 礂簅縭, 彃 嶝仉圠 裍裚詷 莃荶 茺苶 趍跠跬 燚璒瘭 廲籗糴 殠 魦魵 姛帡恦 賌輈鄍 沀皯竻 墏, 橁橖澭 牣犿玒 捃挸栚 酳 劻穋 噮噦噞 獧瞝瞣 釂鱞 暕, 蝺 葝葮 壾嵷幓 褣諝趥
+
+跿 鮛鮥鴮 燲獯璯 鵵鵹鵿 唗哱 蓪 塛嫆嫊 邆錉霋 哤垽, 瀁瀎 馺骱魡 鏾鐇闠 闟顣飁 墆, 壾嵷幓 摬摙敳 鵳齖齘 歅 鋄銶 澂 櫞氌瀙 忕汌卣 蠁襏 斶檎檦 觶譈譀 釪傛 瑽, 觾韄鷡 輐銛靾 廞 袚觙 剆坲姏 鼏噳墺 榯槄 觢, 榎 鷃黫鼱 蛚袲褁 闟顣飁 饙騴, 諙踣踙 齸圞趲 鄜 鶾鷃 驐鷑鷩 禒箈箑 痵 娭屔, 蓨蝪 譋轐鏕 蔪蓩蔮 楋
+
+褅褌諃 蛃袚觙 傎圌媔 侹厗 榃, 緦 恦拻 杍肜阰 軥軱逴 緷 摲摓 郔镺陯 揈敜敥, 誙賗跿 彔抳抰 袀豇貣 蜬蝁 榎 傎圌 圛嬖嬨 鑴鱱爧 潣, 枲柊 誙賗跿 貵趀跅 鮂鮐嚃 溿 禖 笓粊 齴讘麡 漻漍犕 趡趛踠, 廞 騩鰒鰔 峷敊浭 烒珛
\ No newline at end of file
diff --git a/test/unit/org/apache/cassandra/OffsetAwareConfigurationLoader.java b/test/unit/org/apache/cassandra/OffsetAwareConfigurationLoader.java
index 3bdb192..0047f48 100644
--- a/test/unit/org/apache/cassandra/OffsetAwareConfigurationLoader.java
+++ b/test/unit/org/apache/cassandra/OffsetAwareConfigurationLoader.java
@@ -47,17 +47,20 @@
     {
         Config config = super.loadConfig();
 
+        String sep = File.pathSeparator;
 
         config.rpc_port += offset;
         config.native_transport_port += offset;
         config.storage_port += offset;
 
-        config.commitlog_directory += File.pathSeparator + offset;
-        config.saved_caches_directory += File.pathSeparator + offset;
-        config.hints_directory += File.pathSeparator + offset;
-        for (int i = 0; i < config.data_file_directories.length; i++)
-            config.data_file_directories[i] += File.pathSeparator + offset;
+        config.commitlog_directory += sep + offset;
+        config.saved_caches_directory += sep + offset;
+        config.hints_directory += sep + offset;
 
+        config.cdc_raw_directory += sep + offset;
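+        // (cdc_raw_directory is new in this branch and needs the same per-instance offset as the other directories)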
+
+        for (int i = 0; i < config.data_file_directories.length; i++)
+            config.data_file_directories[i] += sep + offset;
 
         return config;
     }
diff --git a/test/unit/org/apache/cassandra/SchemaLoader.java b/test/unit/org/apache/cassandra/SchemaLoader.java
index 5d720c4..28fc8d5 100644
--- a/test/unit/org/apache/cassandra/SchemaLoader.java
+++ b/test/unit/org/apache/cassandra/SchemaLoader.java
@@ -21,6 +21,9 @@
 import java.io.IOException;
 import java.util.*;
 
+import org.apache.cassandra.dht.Murmur3Partitioner;
+import org.apache.cassandra.index.sasi.SASIIndex;
+import org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder;
 import org.junit.After;
 import org.junit.BeforeClass;
 
@@ -68,6 +71,8 @@
 
     public static void startGossiper()
     {
+        // skip shadow round and endpoint collision check in tests
+        System.setProperty("cassandra.allow_unsafe_join", "true");
         if (!Gossiper.instance.isEnabled())
             Gossiper.instance.start((int) (System.currentTimeMillis() / 1000));
     }
@@ -249,6 +254,8 @@
                         + "WITH COMPACT STORAGE", ks_cql)
         )));
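+        // register the sasi test keyspace only under Murmur3Partitioner, the partitioner SASI supports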
 
+        if (DatabaseDescriptor.getPartitioner() instanceof Murmur3Partitioner)
+            schema.add(KeyspaceMetadata.create("sasi", KeyspaceParams.simpleTransient(1), Tables.of(sasiCFMD("sasi", "test_cf"), clusteringSASICFMD("sasi", "clustering_test_cf"))));
 
         if (Boolean.parseBoolean(System.getProperty("cassandra.test.compression", "false")))
             useCompression(schema);
@@ -398,7 +405,12 @@
         return standardCFMD(ksName, cfName);
 
     }
-    public static CFMetaData compositeIndexCFMD(String ksName, String cfName, boolean withIndex) throws ConfigurationException
+    public static CFMetaData compositeIndexCFMD(String ksName, String cfName, boolean withRegularIndex) throws ConfigurationException
+    {
+        return compositeIndexCFMD(ksName, cfName, withRegularIndex, false);
+    }
+
+    public static CFMetaData compositeIndexCFMD(String ksName, String cfName, boolean withRegularIndex, boolean withStaticIndex) throws ConfigurationException
     {
         // the withIndex flag exists to allow tests index creation
         // on existing columns
@@ -407,9 +419,11 @@
                 .addClusteringColumn("c1", AsciiType.instance)
                 .addRegularColumn("birthdate", LongType.instance)
                 .addRegularColumn("notbirthdate", LongType.instance)
+                .addStaticColumn("static", LongType.instance)
                 .build();
 
-        if (withIndex)
+        if (withRegularIndex)
+        {
             cfm.indexes(
                 cfm.getIndexes()
                    .with(IndexMetadata.fromIndexTargets(cfm,
@@ -419,6 +433,20 @@
                                                         "birthdate_key_index",
                                                         IndexMetadata.Kind.COMPOSITES,
                                                         Collections.EMPTY_MAP)));
+        }
+
+        if (withStaticIndex)
+        {
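+            // same shape as the regular "birthdate" index above, but targeting the static column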
+            cfm.indexes(
+                    cfm.getIndexes()
+                       .with(IndexMetadata.fromIndexTargets(cfm,
+                                                            Collections.singletonList(
+                                                                new IndexTarget(new ColumnIdentifier("static", true),
+                                                                                IndexTarget.Type.VALUES)),
+                                                            "static_index",
+                                                            IndexMetadata.Kind.COMPOSITES,
+                                                            Collections.EMPTY_MAP)));
+        }
 
         return cfm.compression(getCompressionParameters());
     }
@@ -446,7 +474,7 @@
 
         return cfm.compression(getCompressionParameters());
     }
-    
+
     public static CFMetaData jdbcCFMD(String ksName, String cfName, AbstractType comp)
     {
         return CFMetaData.Builder.create(ksName, cfName).addPartitionKey("key", BytesType.instance)
@@ -454,6 +482,204 @@
                                                         .compression(getCompressionParameters());
     }
 
+    public static CFMetaData sasiCFMD(String ksName, String cfName)
+    {
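+        // catch-all SASI test table: the text and numeric columns below each get an index with a different mode and/or analyzer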
+        CFMetaData cfm = CFMetaData.Builder.create(ksName, cfName)
+                                           .addPartitionKey("id", UTF8Type.instance)
+                                           .addRegularColumn("first_name", UTF8Type.instance)
+                                           .addRegularColumn("last_name", UTF8Type.instance)
+                                           .addRegularColumn("age", Int32Type.instance)
+                                           .addRegularColumn("height", Int32Type.instance)
+                                           .addRegularColumn("timestamp", LongType.instance)
+                                           .addRegularColumn("address", UTF8Type.instance)
+                                           .addRegularColumn("score", DoubleType.instance)
+                                           .addRegularColumn("comment", UTF8Type.instance)
+                                           .addRegularColumn("comment_suffix_split", UTF8Type.instance)
+                                           .addRegularColumn("/output/full-name/", UTF8Type.instance)
+                                           .addRegularColumn("/data/output/id", UTF8Type.instance)
+                                           .addRegularColumn("first_name_prefix", UTF8Type.instance)
+                                           .build();
+
+        cfm.indexes(cfm.getIndexes()
+                        .with(IndexMetadata.fromSchemaMetadata("first_name", IndexMetadata.Kind.CUSTOM, new HashMap<String, String>()
+                        {{
+                            put(IndexTarget.CUSTOM_INDEX_OPTION_NAME, SASIIndex.class.getName());
+                            put(IndexTarget.TARGET_OPTION_NAME, "first_name");
+                            put("mode", OnDiskIndexBuilder.Mode.CONTAINS.toString());
+                        }}))
+                        .with(IndexMetadata.fromSchemaMetadata("last_name", IndexMetadata.Kind.CUSTOM, new HashMap<String, String>()
+                        {{
+                            put(IndexTarget.CUSTOM_INDEX_OPTION_NAME, SASIIndex.class.getName());
+                            put(IndexTarget.TARGET_OPTION_NAME, "last_name");
+                            put("mode", OnDiskIndexBuilder.Mode.CONTAINS.toString());
+                        }}))
+                        .with(IndexMetadata.fromSchemaMetadata("age", IndexMetadata.Kind.CUSTOM, new HashMap<String, String>()
+                        {{
+                            put(IndexTarget.CUSTOM_INDEX_OPTION_NAME, SASIIndex.class.getName());
+                            put(IndexTarget.TARGET_OPTION_NAME, "age");
+
+                        }}))
+                        .with(IndexMetadata.fromSchemaMetadata("timestamp", IndexMetadata.Kind.CUSTOM, new HashMap<String, String>()
+                        {{
+                            put(IndexTarget.CUSTOM_INDEX_OPTION_NAME, SASIIndex.class.getName());
+                            put(IndexTarget.TARGET_OPTION_NAME, "timestamp");
+                            put("mode", OnDiskIndexBuilder.Mode.SPARSE.toString());
+
+                        }}))
+                        .with(IndexMetadata.fromSchemaMetadata("address", IndexMetadata.Kind.CUSTOM, new HashMap<String, String>()
+                        {{
+                            put("analyzer_class", "org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer");
+                            put(IndexTarget.CUSTOM_INDEX_OPTION_NAME, SASIIndex.class.getName());
+                            put(IndexTarget.TARGET_OPTION_NAME, "address");
+                            put("mode", OnDiskIndexBuilder.Mode.PREFIX.toString());
+                            put("case_sensitive", "false");
+                        }}))
+                        .with(IndexMetadata.fromSchemaMetadata("score", IndexMetadata.Kind.CUSTOM, new HashMap<String, String>()
+                        {{
+                            put(IndexTarget.CUSTOM_INDEX_OPTION_NAME, SASIIndex.class.getName());
+                            put(IndexTarget.TARGET_OPTION_NAME, "score");
+                        }}))
+                        .with(IndexMetadata.fromSchemaMetadata("comment", IndexMetadata.Kind.CUSTOM, new HashMap<String, String>()
+                        {{
+                            put(IndexTarget.CUSTOM_INDEX_OPTION_NAME, SASIIndex.class.getName());
+                            put(IndexTarget.TARGET_OPTION_NAME, "comment");
+                            put("mode", OnDiskIndexBuilder.Mode.CONTAINS.toString());
+                            put("analyzed", "true");
+                        }}))
+                        .with(IndexMetadata.fromSchemaMetadata("comment_suffix_split", IndexMetadata.Kind.CUSTOM, new HashMap<String, String>()
+                        {{
+                            put(IndexTarget.CUSTOM_INDEX_OPTION_NAME, SASIIndex.class.getName());
+                            put(IndexTarget.TARGET_OPTION_NAME, "comment_suffix_split");
+                            put("mode", OnDiskIndexBuilder.Mode.CONTAINS.toString());
+                            put("analyzed", "false");
+                        }}))
+                        .with(IndexMetadata.fromSchemaMetadata("output_full_name", IndexMetadata.Kind.CUSTOM, new HashMap<String, String>()
+                        {{
+                            put(IndexTarget.CUSTOM_INDEX_OPTION_NAME, SASIIndex.class.getName());
+                            put(IndexTarget.TARGET_OPTION_NAME, "/output/full-name/");
+                            put("analyzed", "true");
+                            put("analyzer_class", "org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer");
+                            put("case_sensitive", "false");
+                        }}))
+                        .with(IndexMetadata.fromSchemaMetadata("data_output_id", IndexMetadata.Kind.CUSTOM, new HashMap<String, String>()
+                        {{
+                            put(IndexTarget.CUSTOM_INDEX_OPTION_NAME, SASIIndex.class.getName());
+                            put(IndexTarget.TARGET_OPTION_NAME, "/data/output/id");
+                            put("mode", OnDiskIndexBuilder.Mode.CONTAINS.toString());
+                        }}))
+                        .with(IndexMetadata.fromSchemaMetadata("first_name_prefix", IndexMetadata.Kind.CUSTOM, new HashMap<String, String>()
+                        {{
+                            put(IndexTarget.CUSTOM_INDEX_OPTION_NAME, SASIIndex.class.getName());
+                            put(IndexTarget.TARGET_OPTION_NAME, "first_name_prefix");
+                            put("analyzed", "true");
+                            put("tokenization_normalize_lowercase", "true");
+                        }})));
+
+        return cfm;
+    }
+
+    public static CFMetaData clusteringSASICFMD(String ksName, String cfName)
+    {
+        return clusteringSASICFMD(ksName, cfName, "location", "age", "height", "score");
+    }
+
+    public static CFMetaData clusteringSASICFMD(String ksName, String cfName, String...indexedColumns)
+    {
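+        // clustering variant: every column named in indexedColumns gets a PREFIX-mode SASI index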
+        CFMetaData cfm = CFMetaData.Builder.create(ksName, cfName)
+                                           .addPartitionKey("name", UTF8Type.instance)
+                                           .addClusteringColumn("location", UTF8Type.instance)
+                                           .addClusteringColumn("age", Int32Type.instance)
+                                           .addRegularColumn("height", Int32Type.instance)
+                                           .addRegularColumn("score", DoubleType.instance)
+                                           .build();
+
+        Indexes indexes = cfm.getIndexes();
+        for (String indexedColumn : indexedColumns)
+        {
+            indexes = indexes.with(IndexMetadata.fromSchemaMetadata(indexedColumn, IndexMetadata.Kind.CUSTOM, new HashMap<String, String>()
+            {{
+                put(IndexTarget.CUSTOM_INDEX_OPTION_NAME, SASIIndex.class.getName());
+                put(IndexTarget.TARGET_OPTION_NAME, indexedColumn);
+                put("mode", OnDiskIndexBuilder.Mode.PREFIX.toString());
+            }}));
+        }
+        cfm.indexes(indexes);
+        return cfm;
+    }
+
+    public static CFMetaData staticSASICFMD(String ksName, String cfName)
+    {
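+        // sensor-style table: the static sensor_type column is indexed alongside the regular value and variance columns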
+        CFMetaData cfm = CFMetaData.Builder.create(ksName, cfName)
+                                           .addPartitionKey("sensor_id", Int32Type.instance)
+                                           .addStaticColumn("sensor_type", UTF8Type.instance)
+                                           .addClusteringColumn("date", LongType.instance)
+                                           .addRegularColumn("value", DoubleType.instance)
+                                           .addRegularColumn("variance", Int32Type.instance)
+                                           .build();
+
+        Indexes indexes = cfm.getIndexes();
+        indexes = indexes.with(IndexMetadata.fromSchemaMetadata("sensor_type", IndexMetadata.Kind.CUSTOM, new HashMap<String, String>()
+        {{
+            put(IndexTarget.CUSTOM_INDEX_OPTION_NAME, SASIIndex.class.getName());
+            put(IndexTarget.TARGET_OPTION_NAME, "sensor_type");
+            put("mode", OnDiskIndexBuilder.Mode.PREFIX.toString());
+            put("analyzer_class", "org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer");
+            put("case_sensitive", "false");
+        }}));
+
+        indexes = indexes.with(IndexMetadata.fromSchemaMetadata("value", IndexMetadata.Kind.CUSTOM, new HashMap<String, String>()
+        {{
+            put(IndexTarget.CUSTOM_INDEX_OPTION_NAME, SASIIndex.class.getName());
+            put(IndexTarget.TARGET_OPTION_NAME, "value");
+            put("mode", OnDiskIndexBuilder.Mode.PREFIX.toString());
+        }}));
+
+        indexes = indexes.with(IndexMetadata.fromSchemaMetadata("variance", IndexMetadata.Kind.CUSTOM, new HashMap<String, String>()
+        {{
+            put(IndexTarget.CUSTOM_INDEX_OPTION_NAME, SASIIndex.class.getName());
+            put(IndexTarget.TARGET_OPTION_NAME, "variance");
+            put("mode", OnDiskIndexBuilder.Mode.PREFIX.toString());
+        }}));
+
+        cfm.indexes(indexes);
+        return cfm;
+    }
+
+    public static CFMetaData fullTextSearchSASICFMD(String ksName, String cfName)
+    {
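+        // "title" is analyzed with the StandardAnalyzer (stemming, stop words, lowercasing); "artist" uses a case-insensitive non-tokenizing analyzer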
+        CFMetaData cfm = CFMetaData.Builder.create(ksName, cfName)
+                                           .addPartitionKey("song_id", UUIDType.instance)
+                                           .addRegularColumn("title", UTF8Type.instance)
+                                           .addRegularColumn("artist", UTF8Type.instance)
+                                           .build();
+
+        Indexes indexes = cfm.getIndexes();
+        indexes = indexes.with(IndexMetadata.fromSchemaMetadata("title", IndexMetadata.Kind.CUSTOM, new HashMap<String, String>()
+        {{
+            put(IndexTarget.CUSTOM_INDEX_OPTION_NAME, SASIIndex.class.getName());
+            put(IndexTarget.TARGET_OPTION_NAME, "title");
+            put("mode", OnDiskIndexBuilder.Mode.CONTAINS.toString());
+            put("analyzer_class", "org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer");
+            put("tokenization_enable_stemming", "true");
+            put("tokenization_locale", "en");
+            put("tokenization_skip_stop_words", "true");
+            put("tokenization_normalize_lowercase", "true");
+        }}));
+
+        indexes = indexes.with(IndexMetadata.fromSchemaMetadata("artist", IndexMetadata.Kind.CUSTOM, new HashMap<String, String>()
+        {{
+            put(IndexTarget.CUSTOM_INDEX_OPTION_NAME, SASIIndex.class.getName());
+            put(IndexTarget.TARGET_OPTION_NAME, "artist");
+            put("mode", OnDiskIndexBuilder.Mode.CONTAINS.toString());
+            put("analyzer_class", "org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer");
+            put("case_sensitive", "false");
+
+        }}));
+
+        cfm.indexes(indexes);
+        return cfm;
+    }
+
     public static CompressionParams getCompressionParameters()
     {
         return getCompressionParameters(null);
diff --git a/test/unit/org/apache/cassandra/Util.java b/test/unit/org/apache/cassandra/Util.java
index e7b1ffa..7493b1f 100644
--- a/test/unit/org/apache/cassandra/Util.java
+++ b/test/unit/org/apache/cassandra/Util.java
@@ -228,8 +228,9 @@
     public static void compact(ColumnFamilyStore cfs, Collection<SSTableReader> sstables)
     {
         int gcBefore = cfs.gcBefore(FBUtilities.nowInSeconds());
-        AbstractCompactionTask task = cfs.getCompactionStrategyManager().getUserDefinedTask(sstables, gcBefore);
-        task.execute(null);
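+        // user-defined compaction may now be split into several tasks; execute each of them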
+        List<AbstractCompactionTask> tasks = cfs.getCompactionStrategyManager().getUserDefinedTasks(sstables, gcBefore);
+        for (AbstractCompactionTask task : tasks)
+            task.execute(null);
     }
 
     public static void expectEOF(Callable<?> callable)
@@ -275,7 +276,8 @@
 
     public static void assertEmptyUnfiltered(ReadCommand command)
     {
-        try (ReadOrderGroup orderGroup = command.startOrderGroup(); UnfilteredPartitionIterator iterator = command.executeLocally(orderGroup))
+        try (ReadExecutionController executionController = command.executionController();
+             UnfilteredPartitionIterator iterator = command.executeLocally(executionController))
         {
             if (iterator.hasNext())
             {
@@ -289,7 +291,8 @@
 
     public static void assertEmpty(ReadCommand command)
     {
-        try (ReadOrderGroup orderGroup = command.startOrderGroup(); PartitionIterator iterator = command.executeInternal(orderGroup))
+        try (ReadExecutionController executionController = command.executionController();
+             PartitionIterator iterator = command.executeInternal(executionController))
         {
             if (iterator.hasNext())
             {
@@ -304,7 +307,8 @@
     public static List<ImmutableBTreePartition> getAllUnfiltered(ReadCommand command)
     {
         List<ImmutableBTreePartition> results = new ArrayList<>();
-        try (ReadOrderGroup orderGroup = command.startOrderGroup(); UnfilteredPartitionIterator iterator = command.executeLocally(orderGroup))
+        try (ReadExecutionController executionController = command.executionController();
+             UnfilteredPartitionIterator iterator = command.executeLocally(executionController))
         {
             while (iterator.hasNext())
             {
@@ -320,7 +324,8 @@
     public static List<FilteredPartition> getAll(ReadCommand command)
     {
         List<FilteredPartition> results = new ArrayList<>();
-        try (ReadOrderGroup orderGroup = command.startOrderGroup(); PartitionIterator iterator = command.executeInternal(orderGroup))
+        try (ReadExecutionController executionController = command.executionController();
+             PartitionIterator iterator = command.executeInternal(executionController))
         {
             while (iterator.hasNext())
             {
@@ -335,7 +340,8 @@
 
     public static Row getOnlyRowUnfiltered(ReadCommand cmd)
     {
-        try (ReadOrderGroup orderGroup = cmd.startOrderGroup(); UnfilteredPartitionIterator iterator = cmd.executeLocally(orderGroup))
+        try (ReadExecutionController executionController = cmd.executionController();
+             UnfilteredPartitionIterator iterator = cmd.executeLocally(executionController))
         {
             assert iterator.hasNext() : "Expecting one row in one partition but got nothing";
             try (UnfilteredRowIterator partition = iterator.next())
@@ -352,7 +358,8 @@
 
     public static Row getOnlyRow(ReadCommand cmd)
     {
-        try (ReadOrderGroup orderGroup = cmd.startOrderGroup(); PartitionIterator iterator = cmd.executeInternal(orderGroup))
+        try (ReadExecutionController executionController = cmd.executionController();
+             PartitionIterator iterator = cmd.executeInternal(executionController))
         {
             assert iterator.hasNext() : "Expecting one row in one partition but got nothing";
             try (RowIterator partition = iterator.next())
@@ -368,7 +375,8 @@
 
     public static ImmutableBTreePartition getOnlyPartitionUnfiltered(ReadCommand cmd)
     {
-        try (ReadOrderGroup orderGroup = cmd.startOrderGroup(); UnfilteredPartitionIterator iterator = cmd.executeLocally(orderGroup))
+        try (ReadExecutionController executionController = cmd.executionController();
+             UnfilteredPartitionIterator iterator = cmd.executeLocally(executionController))
         {
             assert iterator.hasNext() : "Expecting a single partition but got nothing";
             try (UnfilteredRowIterator partition = iterator.next())
@@ -381,7 +389,8 @@
 
     public static FilteredPartition getOnlyPartition(ReadCommand cmd)
     {
-        try (ReadOrderGroup orderGroup = cmd.startOrderGroup(); PartitionIterator iterator = cmd.executeInternal(orderGroup))
+        try (ReadExecutionController executionController = cmd.executionController();
+             PartitionIterator iterator = cmd.executeInternal(executionController))
         {
             assert iterator.hasNext() : "Expecting a single partition but got nothing";
             try (RowIterator partition = iterator.next())
@@ -462,7 +471,7 @@
     // moved & refactored from KeyspaceTest in < 3.0
     public static void assertColumns(Row row, String... expectedColumnNames)
     {
-        Iterator<Cell> cells = row == null ? Iterators.<Cell>emptyIterator() : row.cells().iterator();
+        Iterator<Cell> cells = row == null ? Collections.emptyIterator() : row.cells().iterator();
         String[] actual = Iterators.toArray(Iterators.transform(cells, new Function<Cell, String>()
         {
             public String apply(Cell cell)
@@ -623,9 +632,9 @@
 
     public static UnfilteredPartitionIterator executeLocally(PartitionRangeReadCommand command,
                                                              ColumnFamilyStore cfs,
-                                                             ReadOrderGroup orderGroup)
+                                                             ReadExecutionController controller)
     {
-        return new InternalPartitionRangeReadCommand(command).queryStorageInternal(cfs, orderGroup);
+        return new InternalPartitionRangeReadCommand(command).queryStorageInternal(cfs, controller);
     }
 
     private static final class InternalPartitionRangeReadCommand extends PartitionRangeReadCommand
@@ -646,9 +655,9 @@
         }
 
         private UnfilteredPartitionIterator queryStorageInternal(ColumnFamilyStore cfs,
-                                                                 ReadOrderGroup orderGroup)
+                                                                 ReadExecutionController controller)
         {
-            return queryStorage(cfs, orderGroup);
+            return queryStorage(cfs, controller);
         }
     }
 }
diff --git a/test/unit/org/apache/cassandra/auth/StubAuthorizer.java b/test/unit/org/apache/cassandra/auth/StubAuthorizer.java
new file mode 100644
index 0000000..8e0d141
--- /dev/null
+++ b/test/unit/org/apache/cassandra/auth/StubAuthorizer.java
@@ -0,0 +1,120 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.auth;
+
+import java.util.*;
+import java.util.stream.Collectors;
+
+import org.apache.cassandra.exceptions.ConfigurationException;
+import org.apache.cassandra.exceptions.RequestExecutionException;
+import org.apache.cassandra.exceptions.RequestValidationException;
+import org.apache.cassandra.utils.Pair;
+
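+/**
+ * Simple in-memory IAuthorizer for unit tests: permissions are kept in a map
+ * keyed by (role name, resource) and are never persisted.
+ */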
+public class StubAuthorizer implements IAuthorizer
+{
+    Map<Pair<String, IResource>, Set<Permission>> userPermissions = new HashMap<>();
+
+    public void clear()
+    {
+        userPermissions.clear();
+    }
+
+    public Set<Permission> authorize(AuthenticatedUser user, IResource resource)
+    {
+        Pair<String, IResource> key = Pair.create(user.getName(), resource);
+        Set<Permission> perms = userPermissions.get(key);
+        return perms != null ? perms : Collections.emptySet();
+    }
+
+    public void grant(AuthenticatedUser performer,
+                      Set<Permission> permissions,
+                      IResource resource,
+                      RoleResource grantee) throws RequestValidationException, RequestExecutionException
+    {
+        Pair<String, IResource> key = Pair.create(grantee.getRoleName(), resource);
+        Set<Permission> perms = userPermissions.get(key);
+        if (null == perms)
+        {
+            perms = new HashSet<>();
+            userPermissions.put(key, perms);
+        }
+        perms.addAll(permissions);
+    }
+
+    public void revoke(AuthenticatedUser performer,
+                       Set<Permission> permissions,
+                       IResource resource,
+                       RoleResource revokee) throws RequestValidationException, RequestExecutionException
+    {
+        Pair<String, IResource> key = Pair.create(revokee.getRoleName(), resource);
+        Set<Permission> perms = userPermissions.get(key);
+        if (null != perms)
+        {
+            perms.removeAll(permissions);
+            if (perms.isEmpty())
+                userPermissions.remove(key);
+        }
+    }
+
+    public Set<PermissionDetails> list(AuthenticatedUser performer,
+                                       Set<Permission> permissions,
+                                       IResource resource,
+                                       RoleResource grantee) throws RequestValidationException, RequestExecutionException
+    {
+        return userPermissions.entrySet()
+                              .stream()
+                              .filter(entry -> entry.getKey().left.equals(grantee.getRoleName())
+                                               && (resource == null || entry.getKey().right.equals(resource)))
+                              .flatMap(entry -> entry.getValue()
+                                                     .stream()
+                                                     .filter(permissions::contains)
+                                                     .map(p -> new PermissionDetails(entry.getKey().left,
+                                                                                     entry.getKey().right,
+                                                                                     p)))
+                              .collect(Collectors.toSet());
+
+    }
+
+    public void revokeAllFrom(RoleResource revokee)
+    {
+        // use removeIf to avoid a ConcurrentModificationException from removing while iterating the key set
+        userPermissions.keySet().removeIf(key -> key.left.equals(revokee.getRoleName()));
+    }
+
+    public void revokeAllOn(IResource droppedResource)
+    {
+        userPermissions.keySet().removeIf(key -> key.right.equals(droppedResource));
+    }
+
+    public Set<? extends IResource> protectedResources()
+    {
+        return Collections.emptySet();
+    }
+
+    public void validateConfiguration() throws ConfigurationException
+    {
+    }
+
+    public void setup()
+    {
+    }
+}
diff --git a/test/unit/org/apache/cassandra/auth/jmx/AuthorizationProxyTest.java b/test/unit/org/apache/cassandra/auth/jmx/AuthorizationProxyTest.java
new file mode 100644
index 0000000..9943acb
--- /dev/null
+++ b/test/unit/org/apache/cassandra/auth/jmx/AuthorizationProxyTest.java
@@ -0,0 +1,574 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.auth.jmx;
+
+import java.util.*;
+import java.util.concurrent.atomic.AtomicBoolean;
+import java.util.function.Function;
+import java.util.function.Supplier;
+import java.util.stream.Collectors;
+import javax.management.MalformedObjectNameException;
+import javax.management.ObjectName;
+import javax.management.remote.JMXPrincipal;
+import javax.security.auth.Subject;
+
+import com.google.common.collect.ImmutableMap;
+import com.google.common.collect.ImmutableSet;
+import org.junit.Test;
+
+import org.apache.cassandra.auth.*;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+
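+/**
+ * Exercises AuthorizationProxy by wiring its collaborators (superuser check,
+ * permission lookup, "is authorization required" flag) to lambdas through
+ * ProxyBuilder, so each test controls exactly what the proxy sees.
+ */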
+public class AuthorizationProxyTest
+{
+    JMXResource osBean = JMXResource.mbean("java.lang:type=OperatingSystem");
+    JMXResource runtimeBean = JMXResource.mbean("java.lang:type=Runtime");
+    JMXResource threadingBean = JMXResource.mbean("java.lang:type=Threading");
+    JMXResource javaLangWildcard = JMXResource.mbean("java.lang:type=*");
+
+    JMXResource hintsBean = JMXResource.mbean("org.apache.cassandra.hints:type=HintsService");
+    JMXResource batchlogBean = JMXResource.mbean("org.apache.cassandra.db:type=BatchlogManager");
+    JMXResource customBean = JMXResource.mbean("org.apache.cassandra:type=CustomBean,property=foo");
+    Set<ObjectName> allBeans = objectNames(osBean, runtimeBean, threadingBean, hintsBean, batchlogBean, customBean);
+
+    RoleResource role1 = RoleResource.role("r1");
+
+    @Test
+    public void roleHasRequiredPermission() throws Throwable
+    {
+        Map<RoleResource, Set<PermissionDetails>> permissions =
+            ImmutableMap.of(role1, Collections.singleton(permission(role1, osBean, Permission.SELECT)));
+
+        AuthorizationProxy proxy = new ProxyBuilder().isSuperuser((role) -> false)
+                                                     .getPermissions(permissions::get)
+                                                     .isAuthzRequired(() -> true)
+                                                     .build();
+
+        assertTrue(proxy.authorize(subject(role1.getRoleName()),
+                                   "getAttribute",
+                                   new Object[]{ objectName(osBean), "arch" }));
+    }
+
+    @Test
+    public void roleDoesNotHaveRequiredPermission() throws Throwable
+    {
+        Map<RoleResource, Set<PermissionDetails>> permissions =
+            ImmutableMap.of(role1, Collections.singleton(permission(role1, osBean, Permission.AUTHORIZE)));
+
+        AuthorizationProxy proxy = new ProxyBuilder().isSuperuser((role) -> false)
+                                                     .getPermissions(permissions::get)
+                                                     .isAuthzRequired(() -> true).build();
+
+        assertFalse(proxy.authorize(subject(role1.getRoleName()),
+                                    "setAttribute",
+                                    new Object[]{ objectName(osBean), "arch" }));
+    }
+
+    @Test
+    public void roleHasRequiredPermissionOnRootResource() throws Throwable
+    {
+        Map<RoleResource, Set<PermissionDetails>> permissions =
+            ImmutableMap.of(role1, Collections.singleton(permission(role1, JMXResource.root(), Permission.SELECT)));
+
+        AuthorizationProxy proxy = new ProxyBuilder().isSuperuser((role) -> false)
+                                                     .getPermissions(permissions::get)
+                                                     .isAuthzRequired(() -> true)
+                                                     .build();
+
+        assertTrue(proxy.authorize(subject(role1.getRoleName()),
+                                   "getAttribute",
+                                   new Object[]{ objectName(osBean), "arch" }));
+    }
+
+    @Test
+    public void roleHasOtherPermissionOnRootResource() throws Throwable
+    {
+        Map<RoleResource, Set<PermissionDetails>> permissions =
+            ImmutableMap.of(role1, Collections.singleton(permission(role1, JMXResource.root(), Permission.AUTHORIZE)));
+
+        AuthorizationProxy proxy = new ProxyBuilder().isSuperuser((role) -> false)
+                                                     .getPermissions(permissions::get)
+                                                     .isAuthzRequired(() -> true)
+                                                     .build();
+
+        assertFalse(proxy.authorize(subject(role1.getRoleName()),
+                                    "invoke",
+                                    new Object[]{ objectName(osBean), "bogusMethod" }));
+    }
+
+    @Test
+    public void roleHasNoPermissions() throws Throwable
+    {
+        AuthorizationProxy proxy = new ProxyBuilder().isSuperuser((role) -> false)
+                                                     .getPermissions((role) -> Collections.emptySet())
+                                                     .isAuthzRequired(() -> true)
+                                                     .build();
+
+        assertFalse(proxy.authorize(subject(role1.getRoleName()),
+                                    "getAttribute",
+                                    new Object[]{ objectName(osBean), "arch" }));
+    }
+
+    @Test
+    public void roleHasNoPermissionsButIsSuperuser() throws Throwable
+    {
+        AuthorizationProxy proxy = new ProxyBuilder().isSuperuser((role) -> true)
+                                                     .getPermissions((role) -> Collections.emptySet())
+                                                     .isAuthzRequired(() -> true)
+                                                     .build();
+
+        assertTrue(proxy.authorize(subject(role1.getRoleName()),
+                                   "getAttribute",
+                                   new Object[]{ objectName(osBean), "arch" }));
+    }
+
+    @Test
+    public void roleHasNoPermissionsButAuthzNotRequired() throws Throwable
+    {
+        AuthorizationProxy proxy = new ProxyBuilder().isSuperuser((role) -> false)
+                                                     .getPermissions((role) -> Collections.emptySet())
+                                                     .isAuthzRequired(() -> false)
+                                                     .build();
+
+        assertTrue(proxy.authorize(subject(role1.getRoleName()),
+                                   "getAttribute",
+                                   new Object[]{ objectName(osBean), "arch" }));
+    }
+
+    @Test
+    public void authorizeWhenSubjectIsNull() throws Throwable
+    {
+        // a null subject indicates that the action is being performed by the
+        // connector itself, so we always authorize it
+        // Verify that the superuser status is never tested as the request returns early
+        // due to the null Subject
+        // Also, hardcode the permissions provider to return an empty set, so we
+        // can be doubly sure that it's the null Subject which causes the authz to succeed
+        final AtomicBoolean suStatusChecked = new AtomicBoolean(false);
+        AuthorizationProxy proxy = new ProxyBuilder().getPermissions((role) -> Collections.emptySet())
+                                                     .isAuthzRequired(() -> true)
+                                                     .isSuperuser((role) ->
+                                                                  {
+                                                                      suStatusChecked.set(true);
+                                                                      return false;
+                                                                  })
+                                                     .build();
+
+        assertTrue(proxy.authorize(null,
+                                   "getAttribute",
+                                   new Object[]{ objectName(osBean), "arch" }));
+        assertFalse(suStatusChecked.get());
+    }
+
+    @Test
+    public void rejectWhenSubjectNotAuthenticated() throws Throwable
+    {
+        // Access is denied to a Subject without any associated Principals
+        // Verify that the superuser status is never tested as the request is rejected early
+        // due to the Subject
+        final AtomicBoolean suStatusChecked = new AtomicBoolean(false);
+        AuthorizationProxy proxy = new ProxyBuilder().isAuthzRequired(() -> true)
+                                                     .isSuperuser((role) ->
+                                                                  {
+                                                                      suStatusChecked.set(true);
+                                                                      return true;
+                                                                  })
+                                                     .build();
+        assertFalse(proxy.authorize(new Subject(),
+                                    "getAttribute",
+                                    new Object[]{ objectName(osBean), "arch" }));
+        assertFalse(suStatusChecked.get());
+    }
+
+    @Test
+    public void authorizeWhenWildcardGrantCoversExactTarget() throws Throwable
+    {
+        Map<RoleResource, Set<PermissionDetails>> permissions =
+            ImmutableMap.of(role1, Collections.singleton(permission(role1, javaLangWildcard, Permission.SELECT)));
+
+        AuthorizationProxy proxy = new ProxyBuilder().isAuthzRequired(() -> true)
+                                                     .isSuperuser((role) -> false)
+                                                     .getPermissions(permissions::get)
+                                                     .build();
+
+        assertTrue(proxy.authorize(subject(role1.getRoleName()),
+                                   "getAttribute",
+                                   new Object[]{ objectName(osBean), "arch" }));
+    }
+
+    @Test
+    public void rejectWhenWildcardGrantDoesNotCoverExactTarget() throws Throwable
+    {
+        Map<RoleResource, Set<PermissionDetails>> permissions =
+            ImmutableMap.of(role1, Collections.singleton(permission(role1, javaLangWildcard, Permission.SELECT)));
+
+        AuthorizationProxy proxy = new ProxyBuilder().isAuthzRequired(() -> true)
+                                                     .isSuperuser((role) -> false)
+                                                     .getPermissions(permissions::get)
+                                                     .build();
+
+        assertFalse(proxy.authorize(subject(role1.getRoleName()),
+                                    "getAttribute",
+                                    new Object[]{ objectName(customBean), "arch" }));
+    }
+
+    @Test
+    public void authorizeWhenWildcardGrantCoversWildcardTarget() throws Throwable
+    {
+        Map<RoleResource, Set<PermissionDetails>> permissions =
+            ImmutableMap.of(role1, Collections.singleton(permission(role1, javaLangWildcard, Permission.DESCRIBE)));
+
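+        // the invocation target is itself a wildcard, so the proxy expands it via the injected
+        // queryNames function and must find a covering DESCRIBE grant for every matching bean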
+        AuthorizationProxy proxy = new ProxyBuilder().isAuthzRequired(() -> true)
+                                                     .isSuperuser((role) -> false)
+                                                     .getPermissions(permissions::get)
+                                                     .queryNames(matcher(allBeans))
+                                                     .build();
+
+        assertTrue(proxy.authorize(subject(role1.getRoleName()),
+                                   "queryNames",
+                                   new Object[]{ objectName(javaLangWildcard), null }));
+    }
+
+    @Test
+    public void rejectWhenWildcardGrantIsDisjointWithWildcardTarget() throws Throwable
+    {
+        JMXResource customWildcard = JMXResource.mbean("org.apache.cassandra:*");
+        Map<RoleResource, Set<PermissionDetails>> permissions =
+            ImmutableMap.of(role1, Collections.singleton(permission(role1, customWildcard, Permission.DESCRIBE)));
+
+        AuthorizationProxy proxy = new ProxyBuilder().isAuthzRequired(() -> true)
+                                                     .isSuperuser((role) -> false)
+                                                     .getPermissions(permissions::get)
+                                                     .queryNames(matcher(allBeans))
+                                                     .build();
+
+        // the grant on org.apache.cassandra:* shouldn't permit us to invoke queryNames with java.lang:*
+        assertFalse(proxy.authorize(subject(role1.getRoleName()),
+                                    "queryNames",
+                                    new Object[]{ objectName(javaLangWildcard), null }));
+    }
+
+    @Test
+    public void rejectWhenWildcardGrantIntersectsWithWildcardTarget() throws Throwable
+    {
+        // in this test, permissions are granted on org.apache.cassandra:type=CustomBean,property=*
+        // and on all beans in the org.apache.cassandra.hints domain, but the target of the
+        // invocation is org.apache.cassandra*:*
+        // i.e. the subject has permissions on all CustomBeans and on the HintsService bean, but is
+        // attempting to query all names in the org.apache.cassandra* domain. The operation should
+        // be rejected as the permissions don't cover all known beans matching that domain, due to
+        // the BatchlogManager bean.
+
+        JMXResource allCustomBeans = JMXResource.mbean("org.apache.cassandra:type=CustomBean,property=*");
+        JMXResource allHintsBeans = JMXResource.mbean("org.apache.cassandra.hints:*");
+        ObjectName allCassandraBeans = ObjectName.getInstance("org.apache.cassandra*:*");
+
+        Map<RoleResource, Set<PermissionDetails>> permissions =
+            ImmutableMap.of(role1, ImmutableSet.of(permission(role1, allCustomBeans, Permission.DESCRIBE),
+                                                   permission(role1, allHintsBeans, Permission.DESCRIBE)));
+
+        AuthorizationProxy proxy = new ProxyBuilder().isAuthzRequired(() -> true)
+                                                     .isSuperuser((role) -> false)
+                                                     .getPermissions(permissions::get)
+                                                     .queryNames(matcher(allBeans))
+                                                     .build();
+
+        // the grants on the CustomBean and hints wildcards don't cover the BatchlogManager bean,
+        // so invoking queryNames with org.apache.cassandra*:* should be rejected
+        assertFalse(proxy.authorize(subject(role1.getRoleName()),
+                                    "queryNames",
+                                    new Object[]{ allCassandraBeans, null }));
+    }
+
+    @Test
+    public void authorizeOnTargetWildcardWithPermissionOnRoot() throws Throwable
+    {
+        Map<RoleResource, Set<PermissionDetails>> permissions =
+            ImmutableMap.of(role1, Collections.singleton(permission(role1, JMXResource.root(), Permission.SELECT)));
+
+        AuthorizationProxy proxy = new ProxyBuilder().isAuthzRequired(() -> true)
+                                                     .isSuperuser((role) -> false)
+                                                     .getPermissions(permissions::get)
+                                                     .build();
+
+        assertTrue(proxy.authorize(subject(role1.getRoleName()),
+                                   "getAttribute",
+                                   new Object[]{ objectName(javaLangWildcard), "arch" }));
+    }
+
+    @Test
+    public void rejectInvocationOfUnknownMethod() throws Throwable
+    {
+        // Grant ALL permissions on the root resource, so we know that it's
+        // the unknown method that causes the authz rejection. Of course, this
+        // isn't foolproof but it's something.
+        Set<PermissionDetails> allPerms = Permission.ALL.stream()
+                                                        .map(perm -> permission(role1, JMXResource.root(), perm))
+                                                        .collect(Collectors.toSet());
+        Map<RoleResource, Set<PermissionDetails>> permissions = ImmutableMap.of(role1, allPerms);
+        AuthorizationProxy proxy = new ProxyBuilder().isAuthzRequired(() -> true)
+                                                     .isSuperuser((role) -> false)
+                                                     .getPermissions(permissions::get)
+                                                     .build();
+
+        assertFalse(proxy.authorize(subject(role1.getRoleName()),
+                                    "unKnownMethod",
+                                    new Object[] { ObjectName.getInstance(osBean.getObjectName()) }));
+    }
+
+    @Test
+    public void rejectInvocationOfBlacklistedMethods() throws Throwable
+    {
+        String[] methods = { "createMBean",
+                             "deserialize",
+                             "getClassLoader",
+                             "getClassLoaderFor",
+                             "instantiate",
+                             "registerMBean",
+                             "unregisterMBean" };
+
+        // Hardcode the superuser status check to return true, so any allowed method can be invoked.
+        AuthorizationProxy proxy = new ProxyBuilder().isAuthzRequired(() -> true)
+                                                     .isSuperuser((role) -> true)
+                                                     .build();
+
+        for (String method : methods)
+            // the arguments array isn't significant, so it can just be empty
+            assertFalse(proxy.authorize(subject(role1.getRoleName()), method, new Object[0]));
+    }
+
+    @Test
+    public void authorizeMethodsWithoutMBeanArgumentIfPermissionsGranted() throws Throwable
+    {
+        // Certain methods on MBeanServer don't take an ObjectName as their first argument.
+        // These methods are characterised by AuthorizationProxy as being concerned with
+        // the MBeanServer itself, as opposed to a specific managed bean. Of these methods,
+        // only those considered "descriptive" are allowed to be invoked by remote users.
+        // These require the DESCRIBE permission on the root JMXResource.
+        testNonMbeanMethods(true);
+    }
+
+    @Test
+    public void rejectMethodsWithoutMBeanArgumentIfPermissionsNotGranted() throws Throwable
+    {
+        testNonMbeanMethods(false);
+    }
+
+    @Test
+    public void rejectWhenAuthSetupIsNotComplete() throws Throwable
+    {
+        // IAuthorizer & IRoleManager should not be considered ready to use until
+        // we know that auth setup has completed. So, even though the IAuthorizer
+        // would theoretically grant access, the auth proxy should deny it if setup
+        // hasn't finished.
+
+        Map<RoleResource, Set<PermissionDetails>> permissions =
+        ImmutableMap.of(role1, Collections.singleton(permission(role1, osBean, Permission.SELECT)));
+
+        // verify that access is granted when setup is complete
+        AuthorizationProxy proxy = new ProxyBuilder().isSuperuser((role) -> false)
+                                                     .getPermissions(permissions::get)
+                                                     .isAuthzRequired(() -> true)
+                                                     .isAuthSetupComplete(() -> true)
+                                                     .build();
+
+        assertTrue(proxy.authorize(subject(role1.getRoleName()),
+                                   "getAttribute",
+                                   new Object[]{ objectName(osBean), "arch" }));
+
+        // and denied when it isn't
+        proxy = new ProxyBuilder().isSuperuser((role) -> false)
+                                  .getPermissions(permissions::get)
+                                  .isAuthzRequired(() -> true)
+                                  .isAuthSetupComplete(() -> false)
+                                  .build();
+
+        assertFalse(proxy.authorize(subject(role1.getRoleName()),
+                                   "getAttribute",
+                                   new Object[]{ objectName(osBean), "arch" }));
+    }
+
+    private void testNonMbeanMethods(boolean withPermission)
+    {
+        String[] methods = { "getDefaultDomain",
+                             "getDomains",
+                             "getMBeanCount",
+                             "hashCode",
+                             "queryMBeans",
+                             "queryNames",
+                             "toString" };
+
+        ProxyBuilder builder = new ProxyBuilder().isAuthzRequired(() -> true).isSuperuser((role) -> false);
+        if (withPermission)
+        {
+            Map<RoleResource, Set<PermissionDetails>> permissions =
+                ImmutableMap.of(role1, ImmutableSet.of(permission(role1, JMXResource.root(), Permission.DESCRIBE)));
+            builder.getPermissions(permissions::get);
+        }
+        else
+        {
+            builder.getPermissions((role) -> Collections.emptySet());
+        }
+        AuthorizationProxy proxy = builder.build();
+
+        for (String method : methods)
+            assertEquals(withPermission, proxy.authorize(subject(role1.getRoleName()), method, new Object[]{ null }));
+
+        // non-whitelisted methods should be rejected regardless.
+        // This isn't exactly comprehensive, but it's better than nothing
+        String[] notAllowed = { "fooMethod", "barMethod", "bazMethod" };
+        for (String method : notAllowed)
+            assertFalse(proxy.authorize(subject(role1.getRoleName()), method, new Object[]{ null }));
+    }
+
+    // provides a simple matching function which can be substituted for the proxy's queryNames
+    // utility (which by default just delegates to the MBeanServer)
+    // This function just iterates over a supplied set of ObjectNames and filters out those
+    // to which the target name *doesn't* apply
+    private static Function<ObjectName, Set<ObjectName>> matcher(Set<ObjectName> allBeans)
+    {
+        return (target) -> allBeans.stream()
+                                   .filter(target::apply)
+                                   .collect(Collectors.toSet());
+    }
+
+    private static PermissionDetails permission(RoleResource grantee, IResource resource, Permission permission)
+    {
+        return new PermissionDetails(grantee.getRoleName(), resource, permission);
+    }
+
+    private static Subject subject(String roleName)
+    {
+        Subject subject = new Subject();
+        subject.getPrincipals().add(new CassandraPrincipal(roleName));
+        return subject;
+    }
+
+    private static ObjectName objectName(JMXResource resource) throws MalformedObjectNameException
+    {
+        return ObjectName.getInstance(resource.getObjectName());
+    }
+
+    private static Set<ObjectName> objectNames(JMXResource... resource)
+    {
+        Set<ObjectName> names = new HashSet<>();
+        try
+        {
+            for (JMXResource r : resource)
+                names.add(objectName(r));
+        }
+        catch (MalformedObjectNameException e)
+        {
+            fail("JMXResource returned invalid object name: " + e.getMessage());
+        }
+        return names;
+    }
+
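+    // Test-only builder which constructs an AuthorizationProxy whose collaborators (permission
+    // lookup, queryNames, superuser / authz-required / auth-setup checks) are replaced by the
+    // supplied functions, so no real IAuthorizer or IRoleManager is needed.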
+    public static class ProxyBuilder
+    {
+        Function<RoleResource, Set<PermissionDetails>> getPermissions;
+        Function<ObjectName, Set<ObjectName>> queryNames;
+        Function<RoleResource, Boolean> isSuperuser;
+        Supplier<Boolean> isAuthzRequired;
+        Supplier<Boolean> isAuthSetupComplete = () -> true;
+
+        AuthorizationProxy build()
+        {
+            InjectableAuthProxy proxy = new InjectableAuthProxy();
+
+            if (getPermissions != null)
+                proxy.setGetPermissions(getPermissions);
+
+            if (queryNames != null)
+                proxy.setQueryNames(queryNames);
+
+            if (isSuperuser != null)
+                proxy.setIsSuperuser(isSuperuser);
+
+            if (isAuthzRequired != null)
+                proxy.setIsAuthzRequired(isAuthzRequired);
+
+            proxy.setIsAuthSetupComplete(isAuthSetupComplete);
+
+            return proxy;
+        }
+
+        ProxyBuilder getPermissions(Function<RoleResource, Set<PermissionDetails>> f)
+        {
+            getPermissions = f;
+            return this;
+        }
+
+        ProxyBuilder queryNames(Function<ObjectName, Set<ObjectName>> f)
+        {
+            queryNames = f;
+            return this;
+        }
+
+        ProxyBuilder isSuperuser(Function<RoleResource, Boolean> f)
+        {
+            isSuperuser = f;
+            return this;
+        }
+
+        ProxyBuilder isAuthzRequired(Supplier<Boolean> s)
+        {
+            isAuthzRequired = s;
+            return this;
+        }
+
+        ProxyBuilder isAuthSetupComplete(Supplier<Boolean> s)
+        {
+            isAuthSetupComplete = s;
+            return this;
+        }
+
+        private static class InjectableAuthProxy extends AuthorizationProxy
+        {
+            void setGetPermissions(Function<RoleResource, Set<PermissionDetails>> f)
+            {
+                this.getPermissions = f;
+            }
+
+            void setQueryNames(Function<ObjectName, Set<ObjectName>> f)
+            {
+                this.queryNames = f;
+            }
+
+            void setIsSuperuser(Function<RoleResource, Boolean> f)
+            {
+                this.isSuperuser = f;
+            }
+
+            void setIsAuthzRequired(Supplier<Boolean> s)
+            {
+                this.isAuthzRequired = s;
+            }
+
+            void setIsAuthSetupComplete(Supplier<Boolean> s)
+            {
+                this.isAuthSetupComplete = s;
+            }
+        }
+    }
+}
diff --git a/test/unit/org/apache/cassandra/auth/jmx/JMXAuthTest.java b/test/unit/org/apache/cassandra/auth/jmx/JMXAuthTest.java
new file mode 100644
index 0000000..10c871b
--- /dev/null
+++ b/test/unit/org/apache/cassandra/auth/jmx/JMXAuthTest.java
@@ -0,0 +1,279 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.auth.jmx;
+
+import java.lang.reflect.Field;
+import java.nio.file.Paths;
+import java.rmi.server.RMISocketFactory;
+import java.util.HashMap;
+import java.util.Map;
+import javax.management.JMX;
+import javax.management.MBeanServerConnection;
+import javax.management.ObjectName;
+import javax.management.remote.*;
+import javax.security.auth.Subject;
+import javax.security.auth.callback.CallbackHandler;
+import javax.security.auth.login.LoginException;
+import javax.security.auth.spi.LoginModule;
+
+import com.google.common.collect.ImmutableSet;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import org.apache.cassandra.auth.*;
+import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.cql3.CQLTester;
+import org.apache.cassandra.db.ColumnFamilyStoreMBean;
+import org.apache.cassandra.utils.JMXServerUtils;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.fail;
+
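+// End-to-end JMX authorization test: starts a local JMX connector server that authenticates via
+// the StubLoginModule JAAS config and authorizes via NoSuperUserAuthorizationProxy, then checks
+// that SELECT/MODIFY/EXECUTE grants on JMXResources gate attribute reads, writes and invocations.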
+public class JMXAuthTest extends CQLTester
+{
+    private static JMXConnectorServer jmxServer;
+    private static MBeanServerConnection connection;
+    private RoleResource role;
+    private String tableName;
+    private JMXResource tableMBean;
+
+    @FunctionalInterface
+    private interface MBeanAction
+    {
+        void execute();
+    }
+
+    @BeforeClass
+    public static void setupClass() throws Exception
+    {
+        setupAuthorizer();
+        setupJMXServer();
+    }
+
+    private static void setupAuthorizer()
+    {
+        try
+        {
+            IAuthorizer authorizer = new StubAuthorizer();
+            Field authorizerField = DatabaseDescriptor.class.getDeclaredField("authorizer");
+            authorizerField.setAccessible(true);
+            authorizerField.set(null, authorizer);
+            DatabaseDescriptor.setPermissionsValidity(0);
+        }
+        catch (IllegalAccessException | NoSuchFieldException e)
+        {
+            throw new RuntimeException(e);
+        }
+    }
+
+    private static void setupJMXServer() throws Exception
+    {
+        String config = Paths.get(ClassLoader.getSystemResource("auth/cassandra-test-jaas.conf").toURI()).toString();
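+        // authenticate remote connections through the TestLogin JAAS entry (StubLoginModule) and
+        // route authorization through NoSuperUserAuthorizationProxy so only explicit grants apply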
+        System.setProperty("com.sun.management.jmxremote.authenticate", "true");
+        System.setProperty("java.security.auth.login.config", config);
+        System.setProperty("cassandra.jmx.remote.login.config", "TestLogin");
+        System.setProperty("cassandra.jmx.authorizer", NoSuperUserAuthorizationProxy.class.getName());
+        jmxServer = JMXServerUtils.createJMXServer(9999, true);
+        jmxServer.start();
+
+        JMXServiceURL jmxUrl = new JMXServiceURL("service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
+        Map<String, Object> env = new HashMap<>();
+        env.put("com.sun.jndi.rmi.factory.socket", RMISocketFactory.getDefaultSocketFactory());
+        JMXConnector jmxc = JMXConnectorFactory.connect(jmxUrl, env);
+        connection = jmxc.getMBeanServerConnection();
+    }
+
+    @Before
+    public void setup() throws Throwable
+    {
+        role = RoleResource.role("test_role");
+        clearAllPermissions();
+        tableName = createTable("CREATE TABLE %s (k int, v int, PRIMARY KEY (k))");
+        tableMBean = JMXResource.mbean(String.format("org.apache.cassandra.db:type=Tables,keyspace=%s,table=%s",
+                                                     KEYSPACE, tableName));
+    }
+
+    @Test
+    public void readAttribute() throws Throwable
+    {
+        ColumnFamilyStoreMBean proxy = JMX.newMBeanProxy(connection,
+                                                         ObjectName.getInstance(tableMBean.getObjectName()),
+                                                         ColumnFamilyStoreMBean.class);
+
+        // grant SELECT on a single specific Table mbean
+        assertPermissionOnResource(Permission.SELECT, tableMBean, proxy::getTableName);
+
+        // grant SELECT on all Table mbeans in named keyspace
+        clearAllPermissions();
+        JMXResource allTablesInKeyspace = JMXResource.mbean(String.format("org.apache.cassandra.db:type=Tables,keyspace=%s,*",
+                                                                          KEYSPACE));
+        assertPermissionOnResource(Permission.SELECT, allTablesInKeyspace, proxy::getTableName);
+
+        // grant SELECT on all Table mbeans
+        clearAllPermissions();
+        JMXResource allTables = JMXResource.mbean("org.apache.cassandra.db:type=Tables,*");
+        assertPermissionOnResource(Permission.SELECT, allTables, proxy::getTableName);
+
+        // grant SELECT ON ALL MBEANS
+        clearAllPermissions();
+        assertPermissionOnResource(Permission.SELECT, JMXResource.root(), proxy::getTableName);
+    }
+
+    @Test
+    public void writeAttribute() throws Throwable
+    {
+        ColumnFamilyStoreMBean proxy = JMX.newMBeanProxy(connection,
+                                                         ObjectName.getInstance(tableMBean.getObjectName()),
+                                                         ColumnFamilyStoreMBean.class);
+        MBeanAction action = () -> proxy.setMinimumCompactionThreshold(4);
+
+        // grant MODIFY on a single specific Table mbean
+        assertPermissionOnResource(Permission.MODIFY, tableMBean, action);
+
+        // grant MODIFY on all Table mbeans in named keyspace
+        clearAllPermissions();
+        JMXResource allTablesInKeyspace = JMXResource.mbean(String.format("org.apache.cassandra.db:type=Tables,keyspace=%s,*",
+                                                                          KEYSPACE));
+        assertPermissionOnResource(Permission.MODIFY, allTablesInKeyspace, action);
+
+        // grant MODIFY on all Table mbeans
+        clearAllPermissions();
+        JMXResource allTables = JMXResource.mbean("org.apache.cassandra.db:type=Tables,*");
+        assertPermissionOnResource(Permission.MODIFY, allTables, action);
+
+        // grant MODIFY ON ALL MBEANS
+        clearAllPermissions();
+        assertPermissionOnResource(Permission.MODIFY, JMXResource.root(), action);
+    }
+
+    @Test
+    public void executeMethod() throws Throwable
+    {
+        ColumnFamilyStoreMBean proxy = JMX.newMBeanProxy(connection,
+                                                         ObjectName.getInstance(tableMBean.getObjectName()),
+                                                         ColumnFamilyStoreMBean.class);
+
+        // grant EXECUTE on a single specific Table mbean
+        assertPermissionOnResource(Permission.EXECUTE, tableMBean, proxy::estimateKeys);
+
+        // grant EXECUTE on all Table mbeans in named keyspace
+        clearAllPermissions();
+        JMXResource allTablesInKeyspace = JMXResource.mbean(String.format("org.apache.cassandra.db:type=Tables,keyspace=%s,*",
+                                                                          KEYSPACE));
+        assertPermissionOnResource(Permission.EXECUTE, allTablesInKeyspace, proxy::estimateKeys);
+
+        // grant EXECUTE on all Table mbeans
+        clearAllPermissions();
+        JMXResource allTables = JMXResource.mbean("org.apache.cassandra.db:type=Tables,*");
+        assertPermissionOnResource(Permission.EXECUTE, allTables, proxy::estimateKeys);
+
+        // grant EXECUTE ON ALL MBEANS
+        clearAllPermissions();
+        assertPermissionOnResource(Permission.EXECUTE, JMXResource.root(), proxy::estimateKeys);
+    }
+
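+    // asserts the action is rejected before the grant exists and succeeds once it has been granted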
+    private void assertPermissionOnResource(Permission permission,
+                                            JMXResource resource,
+                                            MBeanAction action)
+    {
+        assertUnauthorized(action);
+        grantPermission(permission, resource, role);
+        assertAuthorized(action);
+    }
+
+    private void grantPermission(Permission permission, JMXResource resource, RoleResource role)
+    {
+        DatabaseDescriptor.getAuthorizer().grant(AuthenticatedUser.SYSTEM_USER,
+                                                 ImmutableSet.of(permission),
+                                                 resource,
+                                                 role);
+    }
+
+    private void assertAuthorized(MBeanAction action)
+    {
+        action.execute();
+    }
+
+    private void assertUnauthorized(MBeanAction action)
+    {
+        try
+        {
+            action.execute();
+            fail("Expected an UnauthorizedException, but none was thrown");
+        }
+        catch (SecurityException e)
+        {
+            assertEquals("Access Denied", e.getLocalizedMessage());
+        }
+    }
+
+    private void clearAllPermissions()
+    {
+        ((StubAuthorizer) DatabaseDescriptor.getAuthorizer()).clear();
+    }
+
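+    // Minimal JAAS LoginModule backing the TestLogin configuration: every login succeeds and the
+    // Subject gains a CassandraPrincipal named by the module's "role_name" option.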
+    public static class StubLoginModule implements LoginModule
+    {
+        private CassandraPrincipal principal;
+        private Subject subject;
+
+        public StubLoginModule(){}
+
+        public void initialize(Subject subject, CallbackHandler callbackHandler, Map<String, ?> sharedState, Map<String, ?> options)
+        {
+            this.subject = subject;
+            principal = new CassandraPrincipal((String)options.get("role_name"));
+        }
+
+        public boolean login() throws LoginException
+        {
+            return true;
+        }
+
+        public boolean commit() throws LoginException
+        {
+            if (!subject.getPrincipals().contains(principal))
+                subject.getPrincipals().add(principal);
+            return true;
+        }
+
+        public boolean abort() throws LoginException
+        {
+            return true;
+        }
+
+        public boolean logout() throws LoginException
+        {
+            return true;
+        }
+    }
+
+    // always answers false to isSuperuser and true to isAuthSetupComplete - saves us having to
+    // initialize a real IRoleManager and StorageService for the test
+    public static class NoSuperUserAuthorizationProxy extends AuthorizationProxy
+    {
+        public NoSuperUserAuthorizationProxy()
+        {
+            super();
+            this.isSuperuser = (role) -> false;
+            this.isAuthSetupComplete = () -> true;
+        }
+    }
+}
diff --git a/test/unit/org/apache/cassandra/batchlog/BatchlogManagerTest.java b/test/unit/org/apache/cassandra/batchlog/BatchlogManagerTest.java
index dd5444f..d4e621f 100644
--- a/test/unit/org/apache/cassandra/batchlog/BatchlogManagerTest.java
+++ b/test/unit/org/apache/cassandra/batchlog/BatchlogManagerTest.java
@@ -40,7 +40,7 @@
 import org.apache.cassandra.db.Mutation;
 import org.apache.cassandra.db.RowUpdateBuilder;
 import org.apache.cassandra.db.SystemKeyspace;
-import org.apache.cassandra.db.commitlog.ReplayPosition;
+import org.apache.cassandra.db.commitlog.CommitLogPosition;
 import org.apache.cassandra.db.marshal.BytesType;
 import org.apache.cassandra.db.partitions.ImmutableBTreePartition;
 import org.apache.cassandra.db.partitions.PartitionUpdate;
@@ -248,7 +248,7 @@
             if (i == 500)
                 SystemKeyspace.saveTruncationRecord(Keyspace.open(KEYSPACE1).getColumnFamilyStore(CF_STANDARD2),
                                                     timestamp,
-                                                    ReplayPosition.NONE);
+                                                    CommitLogPosition.NONE);
 
             // Adjust the timestamp (slightly) to make the test deterministic.
             if (i >= 500)
diff --git a/test/unit/org/apache/cassandra/cache/AutoSavingCacheTest.java b/test/unit/org/apache/cassandra/cache/AutoSavingCacheTest.java
index 0c7e8a5..c952470 100644
--- a/test/unit/org/apache/cassandra/cache/AutoSavingCacheTest.java
+++ b/test/unit/org/apache/cassandra/cache/AutoSavingCacheTest.java
@@ -19,6 +19,7 @@
 
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.db.*;
 import org.apache.cassandra.db.marshal.AsciiType;
 import org.apache.cassandra.io.sstable.format.SSTableReader;
@@ -51,9 +52,23 @@
     }
 
     @Test
+    public void testSerializeAndLoadKeyCache0kB() throws Exception
+    {
+        DatabaseDescriptor.setColumnIndexCacheSize(0);
+        doTestSerializeAndLoadKeyCache();
+    }
+
+    @Test
     public void testSerializeAndLoadKeyCache() throws Exception
     {
+        DatabaseDescriptor.setColumnIndexCacheSize(8);
+        doTestSerializeAndLoadKeyCache();
+    }
+
+    private static void doTestSerializeAndLoadKeyCache() throws Exception
+    {
         ColumnFamilyStore cfs = Keyspace.open(KEYSPACE1).getColumnFamilyStore(CF_STANDARD1);
+        cfs.truncateBlocking();
         for (int i = 0; i < 2; i++)
         {
             ColumnDefinition colDef = ColumnDefinition.regularDef(cfs.metadata, ByteBufferUtil.bytes("col1"), AsciiType.instance);
diff --git a/test/unit/org/apache/cassandra/config/CFMetaDataTest.java b/test/unit/org/apache/cassandra/config/CFMetaDataTest.java
index 9d91df3..6bfe5c0 100644
--- a/test/unit/org/apache/cassandra/config/CFMetaDataTest.java
+++ b/test/unit/org/apache/cassandra/config/CFMetaDataTest.java
@@ -26,18 +26,11 @@
 import org.apache.cassandra.db.ColumnFamilyStore;
 import org.apache.cassandra.db.Keyspace;
 import org.apache.cassandra.db.Mutation;
-import org.apache.cassandra.db.marshal.AsciiType;
-import org.apache.cassandra.db.marshal.Int32Type;
-import org.apache.cassandra.db.marshal.UTF8Type;
+import org.apache.cassandra.db.marshal.*;
 import org.apache.cassandra.db.partitions.PartitionUpdate;
 import org.apache.cassandra.db.rows.UnfilteredRowIterators;
 import org.apache.cassandra.exceptions.ConfigurationException;
-import org.apache.cassandra.schema.CompressionParams;
-import org.apache.cassandra.schema.KeyspaceMetadata;
-import org.apache.cassandra.schema.KeyspaceParams;
-import org.apache.cassandra.schema.SchemaKeyspace;
-import org.apache.cassandra.schema.TableParams;
-import org.apache.cassandra.schema.Types;
+import org.apache.cassandra.schema.*;
 import org.apache.cassandra.thrift.CfDef;
 import org.apache.cassandra.thrift.ColumnDef;
 import org.apache.cassandra.thrift.IndexType;
@@ -49,6 +42,8 @@
 import org.junit.Test;
 
 import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
 
 public class CFMetaDataTest
 {
@@ -173,4 +168,98 @@
         assertEquals(cfm.params, params);
         assertEquals(new HashSet<>(cfm.allColumns()), columns);
     }
+
+    @Test
+    public void testIsNameValidPositive()
+    {
+         assertTrue(CFMetaData.isNameValid("abcdefghijklmnopqrstuvwxyz"));
+         assertTrue(CFMetaData.isNameValid("ABCDEFGHIJKLMNOPQRSTUVWXYZ"));
+         assertTrue(CFMetaData.isNameValid("_01234567890"));
+    }
+
+    @Test
+    public void testIsNameValidNegative()
+    {
+        assertFalse(CFMetaData.isNameValid(null));
+        assertFalse(CFMetaData.isNameValid(""));
+        assertFalse(CFMetaData.isNameValid(" "));
+        assertFalse(CFMetaData.isNameValid("@"));
+        assertFalse(CFMetaData.isNameValid("!"));
+    }
+
+    private static Set<String> primitiveTypes = new HashSet<String>(Arrays.asList(new String[] { "ascii", "bigint", "blob", "boolean", "date",
+                                                                                                 "decimal", "double", "float", "inet", "int",
+                                                                                                 "smallint", "text", "time", "timestamp",
+                                                                                                 "timeuuid", "tinyint", "uuid", "varchar",
+                                                                                                 "varint" }));
+
+    @Test
+    public void typeCompatibilityTest() throws Throwable
+    {
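+        // key: destination type, value: source types whose serialized values it can accept,
+        // as asserted below via destinationType.isValueCompatibleWith(sourceType)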
+        Map<String, Set<String>> compatibilityMap = new HashMap<>();
+        compatibilityMap.put("bigint", new HashSet<>(Arrays.asList(new String[] {"timestamp"})));
+        compatibilityMap.put("blob", new HashSet<>(Arrays.asList(new String[] {"ascii", "bigint", "boolean", "date", "decimal", "double",
+                                                                               "float", "inet", "int", "smallint", "text", "time", "timestamp",
+                                                                               "timeuuid", "tinyint", "uuid", "varchar", "varint"})));
+        compatibilityMap.put("date", new HashSet<>(Arrays.asList(new String[] {"int"})));
+        compatibilityMap.put("time", new HashSet<>(Arrays.asList(new String[] {"bigint"})));
+        compatibilityMap.put("text", new HashSet<>(Arrays.asList(new String[] {"ascii", "varchar"})));
+        compatibilityMap.put("timestamp", new HashSet<>(Arrays.asList(new String[] {"bigint"})));
+        compatibilityMap.put("varchar", new HashSet<>(Arrays.asList(new String[] {"ascii", "text"})));
+        compatibilityMap.put("varint", new HashSet<>(Arrays.asList(new String[] {"bigint", "int", "timestamp"})));
+        compatibilityMap.put("uuid", new HashSet<>(Arrays.asList(new String[] {"timeuuid"})));
+
+        for (String sourceTypeString: primitiveTypes)
+        {
+            AbstractType sourceType = CQLTypeParser.parse("KEYSPACE", sourceTypeString, Types.none());
+            for (String destinationTypeString: primitiveTypes)
+            {
+                AbstractType destinationType = CQLTypeParser.parse("KEYSPACE", destinationTypeString, Types.none());
+
+                if (compatibilityMap.get(destinationTypeString) != null &&
+                    compatibilityMap.get(destinationTypeString).contains(sourceTypeString) ||
+                    sourceTypeString.equals(destinationTypeString))
+                {
+                    assertTrue(sourceTypeString + " should be compatible with " + destinationTypeString,
+                               destinationType.isValueCompatibleWith(sourceType));
+                }
+                else
+                {
+                    assertFalse(sourceTypeString + " should not be compatible with " + destinationTypeString,
+                                destinationType.isValueCompatibleWith(sourceType));
+                }
+            }
+        }
+    }
+
+    @Test
+    public void clusteringColumnTypeCompatibilityTest() throws Throwable
+    {
+        Map<String, Set<String>> compatibilityMap = new HashMap<>();
+        compatibilityMap.put("blob", new HashSet<>(Arrays.asList(new String[] {"ascii", "text", "varchar"})));
+        compatibilityMap.put("text", new HashSet<>(Arrays.asList(new String[] {"ascii", "varchar"})));
+        compatibilityMap.put("varchar", new HashSet<>(Arrays.asList(new String[] {"ascii", "text" })));
+
+        for (String sourceTypeString: primitiveTypes)
+        {
+            AbstractType sourceType = CQLTypeParser.parse("KEYSPACE", sourceTypeString, Types.none());
+            for (String destinationTypeString: primitiveTypes)
+            {
+                AbstractType destinationType = CQLTypeParser.parse("KEYSPACE", destinationTypeString, Types.none());
+
+                if (compatibilityMap.get(destinationTypeString) != null &&
+                    compatibilityMap.get(destinationTypeString).contains(sourceTypeString) ||
+                    sourceTypeString.equals(destinationTypeString))
+                {
+                    assertTrue(sourceTypeString + " should be compatible with " + destinationTypeString,
+                               destinationType.isCompatibleWith(sourceType));
+                }
+                else
+                {
+                    assertFalse(sourceTypeString + " should not be compatible with " + destinationTypeString,
+                                destinationType.isCompatibleWith(sourceType));
+                }
+            }
+        }
+    }
 }
diff --git a/test/unit/org/apache/cassandra/config/DatabaseDescriptorTest.java b/test/unit/org/apache/cassandra/config/DatabaseDescriptorTest.java
index 3a3b6ee..84f0235 100644
--- a/test/unit/org/apache/cassandra/config/DatabaseDescriptorTest.java
+++ b/test/unit/org/apache/cassandra/config/DatabaseDescriptorTest.java
@@ -23,6 +23,8 @@
 import java.net.Inet6Address;
 import java.net.InetAddress;
 import java.net.NetworkInterface;
+import java.util.Arrays;
+import java.util.Collection;
 import java.util.Enumeration;
 
 import org.junit.BeforeClass;
@@ -43,6 +45,8 @@
 import static org.junit.Assert.assertNotNull;
 import static org.junit.Assert.assertNull;
 
+import static org.junit.Assert.assertTrue;
+
 @RunWith(OrderedJUnit4ClassRunner.class)
 public class DatabaseDescriptorTest
 {
@@ -265,4 +269,15 @@
         DatabaseDescriptor.applyAddressConfig(testConfig);
 
     }
+
+    @Test
+    public void testTokensFromString()
+    {
+        assertTrue(DatabaseDescriptor.tokensFromString(null).isEmpty());
+        Collection<String> tokens = DatabaseDescriptor.tokensFromString(" a,b ,c , d, f,g,h");
+        assertEquals(7, tokens.size());
+        assertTrue(tokens.containsAll(Arrays.asList(new String[]{"a", "b", "c", "d", "f", "g", "h"})));
+    }
 }
diff --git a/test/unit/org/apache/cassandra/cql3/CDCStatementTest.java b/test/unit/org/apache/cassandra/cql3/CDCStatementTest.java
new file mode 100644
index 0000000..632c290
--- /dev/null
+++ b/test/unit/org/apache/cassandra/cql3/CDCStatementTest.java
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.cql3;
+
+import org.junit.Assert;
+import org.junit.Test;
+
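+// Verifies that the cdc table parameter can be enabled at CREATE TABLE time and toggled via ALTER TABLE.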
+public class CDCStatementTest extends CQLTester
+{
+    @Test
+    public void testEnableOnCreate() throws Throwable
+    {
+        createTable("CREATE TABLE %s (key text, val int, primary key(key)) WITH cdc = true;");
+        Assert.assertTrue(currentTableMetadata().params.cdc);
+    }
+
+    @Test
+    public void testEnableOnAlter() throws Throwable
+    {
+        createTable("CREATE TABLE %s (key text, val int, primary key(key));");
+        Assert.assertFalse(currentTableMetadata().params.cdc);
+        execute("ALTER TABLE %s WITH cdc = true;");
+        Assert.assertTrue(currentTableMetadata().params.cdc);
+    }
+
+    @Test
+    public void testDisableOnAlter() throws Throwable
+    {
+        createTable("CREATE TABLE %s (key text, val int, primary key(key)) WITH cdc = true;");
+        Assert.assertTrue(currentTableMetadata().params.cdc);
+        execute("ALTER TABLE %s WITH cdc = false;");
+        Assert.assertFalse(currentTableMetadata().params.cdc);
+    }
+}
diff --git a/test/unit/org/apache/cassandra/cql3/CQL3TypeLiteralTest.java b/test/unit/org/apache/cassandra/cql3/CQL3TypeLiteralTest.java
index 02ed1a8..43dc267 100644
--- a/test/unit/org/apache/cassandra/cql3/CQL3TypeLiteralTest.java
+++ b/test/unit/org/apache/cassandra/cql3/CQL3TypeLiteralTest.java
@@ -629,14 +629,14 @@
     static UserType randomUserType(int level)
     {
         int typeCount = 2 + randInt(5);
-        List<ByteBuffer> names = new ArrayList<>();
+        List<FieldIdentifier> names = new ArrayList<>();
         List<AbstractType<?>> types = new ArrayList<>();
         for (int i = 0; i < typeCount; i++)
         {
-            names.add(UTF8Type.instance.fromString('f' + randLetters(i)));
+            names.add(FieldIdentifier.forQuoted('f' + randLetters(i)));
             types.add(randomNestedType(level));
         }
-        return new UserType("ks", UTF8Type.instance.fromString("u" + randInt(1000000)), names, types);
+        return new UserType("ks", UTF8Type.instance.fromString("u" + randInt(1000000)), names, types, true);
     }
 
     //
diff --git a/test/unit/org/apache/cassandra/cql3/CQLTester.java b/test/unit/org/apache/cassandra/cql3/CQLTester.java
index a213edf..7e1516a 100644
--- a/test/unit/org/apache/cassandra/cql3/CQLTester.java
+++ b/test/unit/org/apache/cassandra/cql3/CQLTester.java
@@ -110,9 +110,6 @@
         }
         PROTOCOL_VERSIONS = builder.build();
 
-        // Once per-JVM is enough
-        prepareServer();
-
         nativeAddr = InetAddress.getLoopbackAddress();
 
         try
@@ -142,6 +139,11 @@
     private boolean usePrepared = USE_PREPARED_VALUES;
     private static boolean reusePrepared = REUSE_PREPARED;
 
+    protected boolean usePrepared()
+    {
+        return usePrepared;
+    }
+
     public static void prepareServer()
     {
         if (isServerPrepared)
@@ -194,6 +196,10 @@
             FileUtils.deleteRecursive(dir);
         }
 
+        File cdcDir = new File(DatabaseDescriptor.getCDCLogLocation());
+        if (cdcDir.exists())
+            FileUtils.deleteRecursive(cdcDir);
+
         cleanupSavedCaches();
 
         // clean up data directory which are stored as data directory/keyspace/data files
@@ -228,6 +234,9 @@
             DatabaseDescriptor.setRowCacheSizeInMB(ROW_CACHE_SIZE_IN_MB);
 
         StorageService.instance.setPartitionerUnsafe(Murmur3Partitioner.instance);
+
+        // Once per-JVM is enough
+        prepareServer();
     }
 
     @AfterClass
@@ -374,6 +383,12 @@
              : Keyspace.open(KEYSPACE).getColumnFamilyStore(currentTable);
     }
 
+    public void flush(boolean forceFlush)
+    {
+        if (forceFlush)
+            flush();
+    }
+
     public void flush()
     {
         ColumnFamilyStore store = getCurrentColumnFamilyStore();
@@ -381,6 +396,12 @@
             store.forceBlockingFlush();
     }
 
+    public void disableCompaction()
+    {
+        ColumnFamilyStore store = getCurrentColumnFamilyStore();
+        store.disableAutoCompaction();
+    }
+
     public void compact()
     {
         try
@@ -682,6 +703,11 @@
         return currentTable == null ? query : String.format(query, KEYSPACE + "." + currentTable);
     }
 
+    protected ResultMessage.Prepared prepare(String query) throws Throwable
+    {
+        return QueryProcessor.prepare(formatQuery(query), ClientState.forInternalCalls(), false);
+    }
+
     protected UntypedResultSet execute(String query, Object... values) throws Throwable
     {
         query = formatQuery(query);
@@ -832,8 +858,17 @@
         {
             while (iter.hasNext())
             {
-                iter.next();
+                UntypedResultSet.Row actual = iter.next();
                 i++;
+
+                StringBuilder str = new StringBuilder();
+                for (int j = 0; j < meta.size(); j++)
+                {
+                    ColumnSpecification column = meta.get(j);
+                    ByteBuffer actualValue = actual.getBytes(column.name.toString());
+                    str.append(String.format("%s=%s ", column.name, formatValue(actualValue, column.type)));
+                }
+                logger.info("Extra row num {}: {}", i, str.toString());
             }
             Assert.fail(String.format("Got more rows than expected. Expected %d but got %d.", rows.length, i));
         }
@@ -1129,6 +1164,24 @@
                 e.getMessage().contains(text));
     }
 
+    @FunctionalInterface
+    public interface CheckedFunction
+    {
+        void apply() throws Throwable;
+    }
+
+    /**
+     * Runs the given function before and after a flush of sstables.  This is useful for checking that behavior is
+     * the same whether data is in memtables or sstables.
+     * @param runnable the assertions to run before and after flushing
+     * @throws Throwable if the given function throws
+     */
+    public void beforeAndAfterFlush(CheckedFunction runnable) throws Throwable
+    {
+        runnable.apply();
+        flush();
+        runnable.apply();
+    }
+
     private static String replaceValues(String query, Object[] values)
     {
         StringBuilder sb = new StringBuilder();
@@ -1339,7 +1392,7 @@
         if (value instanceof ByteBuffer)
             return (ByteBuffer)value;
 
-        return type.decompose(value);
+        return type.decompose(serializeTuples(value));
     }
 
     private static String formatValue(ByteBuffer bb, AbstractType<?> type)
@@ -1366,7 +1419,19 @@
 
     protected Object userType(Object... values)
     {
-        return new TupleValue(values).toByteBuffer();
+        if (values.length % 2 != 0)
+            throw new IllegalArgumentException("userType() requires an even number of arguments");
+
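+        // arguments alternate field name and field value: name0, value0, name1, value1, ...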
+        String[] fieldNames = new String[values.length / 2];
+        Object[] fieldValues = new Object[values.length / 2];
+        int fieldNum = 0;
+        for (int i = 0; i < values.length; i += 2)
+        {
+            fieldNames[fieldNum] = (String) values[i];
+            fieldValues[fieldNum] = values[i + 1];
+            fieldNum++;
+        }
+        return new UserTypeValue(fieldNames, fieldValues);
     }
 
     protected Object list(Object...values)
@@ -1480,7 +1545,7 @@
 
     private static class TupleValue
     {
-        private final Object[] values;
+        protected final Object[] values;
 
         TupleValue(Object[] values)
         {
@@ -1514,4 +1579,43 @@
             return "TupleValue" + toCQLString();
         }
     }
+
+    private static class UserTypeValue extends TupleValue
+    {
+        private final String[] fieldNames;
+
+        UserTypeValue(String[] fieldNames, Object[] fieldValues)
+        {
+            super(fieldValues);
+            this.fieldNames = fieldNames;
+        }
+
+        @Override
+        public String toCQLString()
+        {
+            StringBuilder sb = new StringBuilder();
+            sb.append("{");
+            boolean haveEntry = false;
+            for (int i = 0; i < values.length; i++)
+            {
+                if (values[i] != null)
+                {
+                    if (haveEntry)
+                        sb.append(", ");
+                    sb.append(ColumnIdentifier.maybeQuote(fieldNames[i]));
+                    sb.append(": ");
+                    sb.append(formatForCQL(values[i]));
+                    haveEntry = true;
+                }
+            }
+            assert haveEntry;
+            sb.append("}");
+            return sb.toString();
+        }
+
+        public String toString()
+        {
+            return "UserTypeValue" + toCQLString();
+        }
+    }
 }
diff --git a/test/unit/org/apache/cassandra/cql3/ColumnConditionTest.java b/test/unit/org/apache/cassandra/cql3/ColumnConditionTest.java
index 71524c5..989c524 100644
--- a/test/unit/org/apache/cassandra/cql3/ColumnConditionTest.java
+++ b/test/unit/org/apache/cassandra/cql3/ColumnConditionTest.java
@@ -186,7 +186,7 @@
         ColumnDefinition definition = ColumnDefinition.regularDef("ks", "cf", "c", ListType.getInstance(Int32Type.instance, true));
 
         // EQ
-        ColumnCondition condition = ColumnCondition.condition(definition, null, new Lists.Value(Arrays.asList(ONE)), Operator.EQ);
+        ColumnCondition condition = ColumnCondition.condition(definition, new Lists.Value(Arrays.asList(ONE)), Operator.EQ);
         ColumnCondition.CollectionBound bound = (ColumnCondition.CollectionBound) condition.bind(QueryOptions.DEFAULT);
         assertTrue(listAppliesTo(bound, list(ONE), list(ONE)));
         assertTrue(listAppliesTo(bound, list(), list()));
@@ -202,7 +202,7 @@
         assertTrue(listAppliesTo(bound, list(ByteBufferUtil.EMPTY_BYTE_BUFFER), list(ByteBufferUtil.EMPTY_BYTE_BUFFER)));
 
         // NEQ
-        condition = ColumnCondition.condition(definition, null, new Lists.Value(Arrays.asList(ONE)), Operator.NEQ);
+        condition = ColumnCondition.condition(definition, new Lists.Value(Arrays.asList(ONE)), Operator.NEQ);
         bound = (ColumnCondition.CollectionBound) condition.bind(QueryOptions.DEFAULT);
         assertFalse(listAppliesTo(bound, list(ONE), list(ONE)));
         assertFalse(listAppliesTo(bound, list(), list()));
@@ -218,7 +218,7 @@
         assertFalse(listAppliesTo(bound, list(ByteBufferUtil.EMPTY_BYTE_BUFFER), list(ByteBufferUtil.EMPTY_BYTE_BUFFER)));
 
         // LT
-        condition = ColumnCondition.condition(definition, null, new Lists.Value(Arrays.asList(ONE)), Operator.LT);
+        condition = ColumnCondition.condition(definition, new Lists.Value(Arrays.asList(ONE)), Operator.LT);
         bound = (ColumnCondition.CollectionBound) condition.bind(QueryOptions.DEFAULT);
         assertFalse(listAppliesTo(bound, list(ONE), list(ONE)));
         assertFalse(listAppliesTo(bound, list(), list()));
@@ -234,7 +234,7 @@
         assertFalse(listAppliesTo(bound, list(ByteBufferUtil.EMPTY_BYTE_BUFFER), list(ByteBufferUtil.EMPTY_BYTE_BUFFER)));
 
         // LTE
-        condition = ColumnCondition.condition(definition, null, new Lists.Value(Arrays.asList(ONE)), Operator.LTE);
+        condition = ColumnCondition.condition(definition, new Lists.Value(Arrays.asList(ONE)), Operator.LTE);
         bound = (ColumnCondition.CollectionBound) condition.bind(QueryOptions.DEFAULT);
         assertTrue(listAppliesTo(bound, list(ONE), list(ONE)));
         assertTrue(listAppliesTo(bound, list(), list()));
@@ -250,7 +250,7 @@
         assertTrue(listAppliesTo(bound, list(ByteBufferUtil.EMPTY_BYTE_BUFFER), list(ByteBufferUtil.EMPTY_BYTE_BUFFER)));
 
         // GT
-        condition = ColumnCondition.condition(definition, null, new Lists.Value(Arrays.asList(ONE)), Operator.GT);
+        condition = ColumnCondition.condition(definition, new Lists.Value(Arrays.asList(ONE)), Operator.GT);
         bound = (ColumnCondition.CollectionBound) condition.bind(QueryOptions.DEFAULT);
         assertFalse(listAppliesTo(bound, list(ONE), list(ONE)));
         assertFalse(listAppliesTo(bound, list(), list()));
@@ -266,7 +266,7 @@
         assertFalse(listAppliesTo(bound, list(ByteBufferUtil.EMPTY_BYTE_BUFFER), list(ByteBufferUtil.EMPTY_BYTE_BUFFER)));
 
         // GTE
-        condition = ColumnCondition.condition(definition, null, new Lists.Value(Arrays.asList(ONE)), Operator.GTE);
+        condition = ColumnCondition.condition(definition, new Lists.Value(Arrays.asList(ONE)), Operator.GTE);
         bound = (ColumnCondition.CollectionBound) condition.bind(QueryOptions.DEFAULT);
         assertTrue(listAppliesTo(bound, list(ONE), list(ONE)));
         assertTrue(listAppliesTo(bound, list(), list()));
@@ -315,7 +315,7 @@
         ColumnDefinition definition = ColumnDefinition.regularDef("ks", "cf", "c", ListType.getInstance(Int32Type.instance, true));
 
         // EQ
-        ColumnCondition condition = ColumnCondition.condition(definition, null, new Sets.Value(set(ONE)), Operator.EQ);
+        ColumnCondition condition = ColumnCondition.condition(definition, new Sets.Value(set(ONE)), Operator.EQ);
         ColumnCondition.CollectionBound bound = (ColumnCondition.CollectionBound) condition.bind(QueryOptions.DEFAULT);
         assertTrue(setAppliesTo(bound, set(ONE), list(ONE)));
         assertTrue(setAppliesTo(bound, set(), list()));
@@ -331,7 +331,7 @@
         assertTrue(setAppliesTo(bound, set(ByteBufferUtil.EMPTY_BYTE_BUFFER), list(ByteBufferUtil.EMPTY_BYTE_BUFFER)));
 
         // NEQ
-        condition = ColumnCondition.condition(definition, null, new Sets.Value(set(ONE)), Operator.NEQ);
+        condition = ColumnCondition.condition(definition, new Sets.Value(set(ONE)), Operator.NEQ);
         bound = (ColumnCondition.CollectionBound) condition.bind(QueryOptions.DEFAULT);
         assertFalse(setAppliesTo(bound, set(ONE), list(ONE)));
         assertFalse(setAppliesTo(bound, set(), list()));
@@ -347,7 +347,7 @@
         assertFalse(setAppliesTo(bound, set(ByteBufferUtil.EMPTY_BYTE_BUFFER), list(ByteBufferUtil.EMPTY_BYTE_BUFFER)));
 
         // LT
-        condition = ColumnCondition.condition(definition, null, new Lists.Value(Arrays.asList(ONE)), Operator.LT);
+        condition = ColumnCondition.condition(definition, new Lists.Value(Arrays.asList(ONE)), Operator.LT);
         bound = (ColumnCondition.CollectionBound) condition.bind(QueryOptions.DEFAULT);
         assertFalse(setAppliesTo(bound, set(ONE), list(ONE)));
         assertFalse(setAppliesTo(bound, set(), list()));
@@ -363,7 +363,7 @@
         assertFalse(setAppliesTo(bound, set(ByteBufferUtil.EMPTY_BYTE_BUFFER), list(ByteBufferUtil.EMPTY_BYTE_BUFFER)));
 
         // LTE
-        condition = ColumnCondition.condition(definition, null, new Lists.Value(Arrays.asList(ONE)), Operator.LTE);
+        condition = ColumnCondition.condition(definition, new Lists.Value(Arrays.asList(ONE)), Operator.LTE);
         bound = (ColumnCondition.CollectionBound) condition.bind(QueryOptions.DEFAULT);
         assertTrue(setAppliesTo(bound, set(ONE), list(ONE)));
         assertTrue(setAppliesTo(bound, set(), list()));
@@ -379,7 +379,7 @@
         assertTrue(setAppliesTo(bound, set(ByteBufferUtil.EMPTY_BYTE_BUFFER), list(ByteBufferUtil.EMPTY_BYTE_BUFFER)));
 
         // GT
-        condition = ColumnCondition.condition(definition, null, new Lists.Value(Arrays.asList(ONE)), Operator.GT);
+        condition = ColumnCondition.condition(definition, new Lists.Value(Arrays.asList(ONE)), Operator.GT);
         bound = (ColumnCondition.CollectionBound) condition.bind(QueryOptions.DEFAULT);
         assertFalse(setAppliesTo(bound, set(ONE), list(ONE)));
         assertFalse(setAppliesTo(bound, set(), list()));
@@ -395,7 +395,7 @@
         assertFalse(setAppliesTo(bound, set(ByteBufferUtil.EMPTY_BYTE_BUFFER), list(ByteBufferUtil.EMPTY_BYTE_BUFFER)));
 
         // GTE
-        condition = ColumnCondition.condition(definition, null, new Lists.Value(Arrays.asList(ONE)), Operator.GTE);
+        condition = ColumnCondition.condition(definition, new Lists.Value(Arrays.asList(ONE)), Operator.GTE);
         bound = (ColumnCondition.CollectionBound) condition.bind(QueryOptions.DEFAULT);
         assertTrue(setAppliesTo(bound, set(ONE), list(ONE)));
         assertTrue(setAppliesTo(bound, set(), list()));
@@ -448,7 +448,7 @@
         Maps.Value placeholder = new Maps.Value(placeholderMap);
 
         // EQ
-        ColumnCondition condition = ColumnCondition.condition(definition, null, placeholder, Operator.EQ);
+        ColumnCondition condition = ColumnCondition.condition(definition, placeholder, Operator.EQ);
         ColumnCondition.CollectionBound bound = (ColumnCondition.CollectionBound) condition.bind(QueryOptions.DEFAULT);
 
         assertTrue(mapAppliesTo(bound, map(ONE, ONE), map(ONE, ONE)));
@@ -470,7 +470,7 @@
         assertTrue(mapAppliesTo(bound, map(ONE, ByteBufferUtil.EMPTY_BYTE_BUFFER), map(ONE, ByteBufferUtil.EMPTY_BYTE_BUFFER)));
 
         // NEQ
-        condition = ColumnCondition.condition(definition, null, placeholder, Operator.NEQ);
+        condition = ColumnCondition.condition(definition, placeholder, Operator.NEQ);
         bound = (ColumnCondition.CollectionBound) condition.bind(QueryOptions.DEFAULT);
 
         assertFalse(mapAppliesTo(bound, map(ONE, ONE), map(ONE, ONE)));
@@ -492,7 +492,7 @@
         assertFalse(mapAppliesTo(bound, map(ONE, ByteBufferUtil.EMPTY_BYTE_BUFFER), map(ONE, ByteBufferUtil.EMPTY_BYTE_BUFFER)));
 
         // LT
-        condition = ColumnCondition.condition(definition, null, placeholder, Operator.LT);
+        condition = ColumnCondition.condition(definition, placeholder, Operator.LT);
         bound = (ColumnCondition.CollectionBound) condition.bind(QueryOptions.DEFAULT);
 
         assertFalse(mapAppliesTo(bound, map(ONE, ONE), map(ONE, ONE)));
@@ -514,7 +514,7 @@
         assertFalse(mapAppliesTo(bound, map(ONE, ByteBufferUtil.EMPTY_BYTE_BUFFER), map(ONE, ByteBufferUtil.EMPTY_BYTE_BUFFER)));
 
         // LTE
-        condition = ColumnCondition.condition(definition, null, placeholder, Operator.LTE);
+        condition = ColumnCondition.condition(definition, placeholder, Operator.LTE);
         bound = (ColumnCondition.CollectionBound) condition.bind(QueryOptions.DEFAULT);
 
         assertTrue(mapAppliesTo(bound, map(ONE, ONE), map(ONE, ONE)));
@@ -536,7 +536,7 @@
         assertTrue(mapAppliesTo(bound, map(ONE, ByteBufferUtil.EMPTY_BYTE_BUFFER), map(ONE, ByteBufferUtil.EMPTY_BYTE_BUFFER)));
 
         // GT
-        condition = ColumnCondition.condition(definition, null, placeholder, Operator.GT);
+        condition = ColumnCondition.condition(definition, placeholder, Operator.GT);
         bound = (ColumnCondition.CollectionBound) condition.bind(QueryOptions.DEFAULT);
 
         assertFalse(mapAppliesTo(bound, map(ONE, ONE), map(ONE, ONE)));
@@ -558,7 +558,7 @@
         assertFalse(mapAppliesTo(bound, map(ONE, ByteBufferUtil.EMPTY_BYTE_BUFFER), map(ONE, ByteBufferUtil.EMPTY_BYTE_BUFFER)));
 
         // GTE
-        condition = ColumnCondition.condition(definition, null, placeholder, Operator.GTE);
+        condition = ColumnCondition.condition(definition, placeholder, Operator.GTE);
         bound = (ColumnCondition.CollectionBound) condition.bind(QueryOptions.DEFAULT);
 
         assertTrue(mapAppliesTo(bound, map(ONE, ONE), map(ONE, ONE)));
diff --git a/test/unit/org/apache/cassandra/cql3/ColumnIdentifierTest.java b/test/unit/org/apache/cassandra/cql3/ColumnIdentifierTest.java
index c287883..158110c 100644
--- a/test/unit/org/apache/cassandra/cql3/ColumnIdentifierTest.java
+++ b/test/unit/org/apache/cassandra/cql3/ColumnIdentifierTest.java
@@ -26,6 +26,7 @@
 import junit.framework.Assert;
 import org.apache.cassandra.db.marshal.BytesType;
 import org.apache.cassandra.utils.ByteBufferUtil;
+import static org.junit.Assert.assertEquals;
 
 public class ColumnIdentifierTest
 {
@@ -57,5 +58,23 @@
     {
         return v < 0 ? -1 : v > 0 ? 1 : 0;
     }
+    
+    @Test
+    public void testMaybeQuote()
+    {
+        String unquotable = "a";
+        assertEquals(unquotable, ColumnIdentifier.maybeQuote(unquotable));
+        unquotable = "z4";
+        assertEquals(unquotable, ColumnIdentifier.maybeQuote(unquotable));
+        unquotable = "m_4_";
+        assertEquals(unquotable, ColumnIdentifier.maybeQuote(unquotable));
+        unquotable = "f__";
+        assertEquals(unquotable, ColumnIdentifier.maybeQuote(unquotable));
+        
+        assertEquals("\"A\"", ColumnIdentifier.maybeQuote("A"));
+        assertEquals("\"4b\"", ColumnIdentifier.maybeQuote("4b"));
+        assertEquals("\"\"\"\"", ColumnIdentifier.maybeQuote("\""));
+        assertEquals("\"\"\"a\"\"b\"\"\"", ColumnIdentifier.maybeQuote("\"a\"b\""));
+    }
 
 }
diff --git a/test/unit/org/apache/cassandra/cql3/KeyCacheCqlTest.java b/test/unit/org/apache/cassandra/cql3/KeyCacheCqlTest.java
index 54d39b1..21a17fa 100644
--- a/test/unit/org/apache/cassandra/cql3/KeyCacheCqlTest.java
+++ b/test/unit/org/apache/cassandra/cql3/KeyCacheCqlTest.java
@@ -22,13 +22,16 @@
 import java.util.ArrayList;
 import java.util.Iterator;
 import java.util.List;
+import java.util.concurrent.Callable;
 
 import org.junit.Assert;
 import org.junit.Test;
 
 import org.apache.cassandra.cache.KeyCacheKey;
+import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.config.Schema;
 import org.apache.cassandra.db.Keyspace;
+import org.apache.cassandra.index.Index;
 import org.apache.cassandra.metrics.CacheMetrics;
 import org.apache.cassandra.metrics.CassandraMetricsRegistry;
 import org.apache.cassandra.service.CacheService;
@@ -79,7 +82,20 @@
                                      "0123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789";
 
     @Test
-    public void testSliceQueries() throws Throwable
+    public void testSliceQueriesShallowIndexEntry() throws Throwable
+    {
+        DatabaseDescriptor.setColumnIndexCacheSize(0);
+        testSliceQueries();
+    }
+
+    @Test
+    public void testSliceQueriesIndexInfoOnHeap() throws Throwable
+    {
+        DatabaseDescriptor.setColumnIndexCacheSize(8);
+        testSliceQueries();
+    }
+
+    private void testSliceQueries() throws Throwable
     {
         createTable("CREATE TABLE %s (pk text, ck1 int, ck2 int, val text, vpk text, vck1 int, vck2 int, PRIMARY KEY (pk, ck1, ck2))");
 
@@ -163,7 +179,20 @@
     }
 
     @Test
-    public void test2iKeyCachePaths() throws Throwable
+    public void test2iKeyCachePathsShallowIndexEntry() throws Throwable
+    {
+        DatabaseDescriptor.setColumnIndexCacheSize(0);
+        test2iKeyCachePaths();
+    }
+
+    @Test
+    public void test2iKeyCachePathsIndexInfoOnHeap() throws Throwable
+    {
+        DatabaseDescriptor.setColumnIndexCacheSize(8);
+        test2iKeyCachePaths();
+    }
+
+    private void test2iKeyCachePaths() throws Throwable
     {
         String table = createTable("CREATE TABLE %s ("
                                    + commonColumnsDef
@@ -240,7 +269,20 @@
     }
 
     @Test
-    public void test2iKeyCachePathsSaveKeysForDroppedTable() throws Throwable
+    public void test2iKeyCachePathsSaveKeysForDroppedTableShallowIndexEntry() throws Throwable
+    {
+        DatabaseDescriptor.setColumnIndexCacheSize(0);
+        test2iKeyCachePathsSaveKeysForDroppedTable();
+    }
+
+    @Test
+    public void test2iKeyCachePathsSaveKeysForDroppedTableIndexInfoOnHeap() throws Throwable
+    {
+        DatabaseDescriptor.setColumnIndexCacheSize(8);
+        test2iKeyCachePathsSaveKeysForDroppedTable();
+    }
+
+    private void test2iKeyCachePathsSaveKeysForDroppedTable() throws Throwable
     {
         String table = createTable("CREATE TABLE %s ("
                                    + commonColumnsDef
@@ -300,7 +342,20 @@
     }
 
     @Test
-    public void testKeyCacheNonClustered() throws Throwable
+    public void testKeyCacheNonClusteredShallowIndexEntry() throws Throwable
+    {
+        DatabaseDescriptor.setColumnIndexCacheSize(0);
+        testKeyCacheNonClustered();
+    }
+
+    @Test
+    public void testKeyCacheNonClusteredIndexInfoOnHeap() throws Throwable
+    {
+        DatabaseDescriptor.setColumnIndexCacheSize(8);
+        testKeyCacheNonClustered();
+    }
+
+    private void testKeyCacheNonClustered() throws Throwable
     {
         String table = createTable("CREATE TABLE %s ("
                                    + commonColumnsDef
@@ -333,7 +388,20 @@
     }
 
     @Test
-    public void testKeyCacheClustered() throws Throwable
+    public void testKeyCacheClusteredShallowIndexEntry() throws Throwable
+    {
+        DatabaseDescriptor.setColumnIndexCacheSize(0);
+        testKeyCacheClustered();
+    }
+
+    @Test
+    public void testKeyCacheClusteredIndexInfoOnHeap() throws Throwable
+    {
+        DatabaseDescriptor.setColumnIndexCacheSize(8);
+        testKeyCacheClustered();
+    }
+
+    private void testKeyCacheClustered() throws Throwable
     {
         String table = createTable("CREATE TABLE %s ("
                                    + commonColumnsDef
@@ -407,7 +475,7 @@
         if (index != null)
         {
             StorageService.instance.disableAutoCompaction(KEYSPACE, table + '.' + index);
-            Keyspace.open(KEYSPACE).getColumnFamilyStore(table).indexManager.getIndexByName(index).getBlockingFlushTask().call();
+            triggerBlockingFlush(Keyspace.open(KEYSPACE).getColumnFamilyStore(table).indexManager.getIndexByName(index));
         }
 
         for (int i = 0; i < 100; i++)
@@ -432,7 +500,7 @@
             {
                 Keyspace.open(KEYSPACE).getColumnFamilyStore(table).forceFlush().get();
                 if (index != null)
-                    Keyspace.open(KEYSPACE).getColumnFamilyStore(table).indexManager.getIndexByName(index).getBlockingFlushTask().call();
+                    triggerBlockingFlush(Keyspace.open(KEYSPACE).getColumnFamilyStore(table).indexManager.getIndexByName(index));
             }
         }
     }
@@ -464,4 +532,12 @@
         Assert.assertEquals(0L, metrics.requests.getCount());
         Assert.assertEquals(0L, metrics.size.getValue().longValue());
     }
+
+    private static void triggerBlockingFlush(Index index) throws Exception
+    {
+        assert index != null;
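+        // The blocking flush task may be null when the index has nothing to flush, so only invoke it when present.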
+        Callable<?> flushTask = index.getBlockingFlushTask();
+        if (flushTask != null)
+            flushTask.call();
+    }
 }
diff --git a/test/unit/org/apache/cassandra/cql3/OutOfSpaceTest.java b/test/unit/org/apache/cassandra/cql3/OutOfSpaceTest.java
index 1527b1e..fd7afd9 100644
--- a/test/unit/org/apache/cassandra/cql3/OutOfSpaceTest.java
+++ b/test/unit/org/apache/cassandra/cql3/OutOfSpaceTest.java
@@ -149,7 +149,7 @@
 
         // Make sure commit log wasn't discarded.
         UUID cfid = currentTableMetadata().cfId;
-        for (CommitLogSegment segment : CommitLog.instance.allocator.getActiveSegments())
+        for (CommitLogSegment segment : CommitLog.instance.segmentManager.getActiveSegments())
             if (segment.getDirtyCFIDs().contains(cfid))
                 return;
         fail("Expected commit log to remain dirty for the affected table.");
diff --git a/test/unit/org/apache/cassandra/cql3/PagingQueryTest.java b/test/unit/org/apache/cassandra/cql3/PagingQueryTest.java
new file mode 100644
index 0000000..8f5f282
--- /dev/null
+++ b/test/unit/org/apache/cassandra/cql3/PagingQueryTest.java
@@ -0,0 +1,111 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.cql3;
+
+import java.util.Iterator;
+import java.util.concurrent.ThreadLocalRandom;
+
+import org.junit.Test;
+
+import com.datastax.driver.core.*;
+import com.datastax.driver.core.ResultSet;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+
+public class PagingQueryTest extends CQLTester
+{
+    @Test
+    public void pagingOnRegularColumn() throws Throwable
+    {
+        createTable("CREATE TABLE %s (" +
+                    " k1 int," +
+                    " c1 int," +
+                    " c2 int," +
+                    " v1 text," +
+                    " v2 text," +
+                    " v3 text," +
+                    " v4 text," +
+                    "PRIMARY KEY (k1, c1, c2))");
+
+        for (int c1 = 0; c1 < 100; c1++)
+        {
+            for (int c2 = 0; c2 < 100; c2++)
+            {
+                execute("INSERT INTO %s (k1, c1, c2, v1, v2, v3, v4) VALUES (?, ?, ?, ?, ?, ?, ?)", 1, c1, c2,
+                        Integer.toString(c1), Integer.toString(c2), someText(), someText());
+            }
+
+            if (c1 % 30 == 0)
+                flush();
+        }
+
+        flush();
+
+        try (Session session = sessionNet())
+        {
+            SimpleStatement stmt = new SimpleStatement("SELECT c1, c2, v1, v2 FROM " + KEYSPACE + '.' + currentTable() + " WHERE k1 = 1");
+            stmt.setFetchSize(3);
+            ResultSet rs = session.execute(stmt);
+            Iterator<Row> iter = rs.iterator();
+            for (int c1 = 0; c1 < 100; c1++)
+            {
+                for (int c2 = 0; c2 < 100; c2++)
+                {
+                    assertTrue(iter.hasNext());
+                    Row row = iter.next();
+                    String msg = "On " + c1 + ',' + c2;
+                    assertEquals(msg, c1, row.getInt(0));
+                    assertEquals(msg, c2, row.getInt(1));
+                    assertEquals(msg, Integer.toString(c1), row.getString(2));
+                    assertEquals(msg, Integer.toString(c2), row.getString(3));
+                }
+            }
+            assertFalse(iter.hasNext());
+
+            for (int c1 = 0; c1 < 100; c1++)
+            {
+                stmt = new SimpleStatement("SELECT c1, c2, v1, v2 FROM " + KEYSPACE + '.' + currentTable() + " WHERE k1 = 1 AND c1 = ?", c1);
+                stmt.setFetchSize(3);
+                rs = session.execute(stmt);
+                iter = rs.iterator();
+                for (int c2 = 0; c2 < 100; c2++)
+                {
+                    assertTrue(iter.hasNext());
+                    Row row = iter.next();
+                    String msg = "Within " + c1 + " on " + c2;
+                    assertEquals(msg, c1, row.getInt(0));
+                    assertEquals(msg, c2, row.getInt(1));
+                    assertEquals(msg, Integer.toString(c1), row.getString(2));
+                    assertEquals(msg, Integer.toString(c2), row.getString(3));
+                }
+                assertFalse(iter.hasNext());
+            }
+        }
+    }
+
+    private static String someText()
+    {
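+        // Build 1 KB of random printable ASCII (codes 32-126) to pad each row with some bulk.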
+        char[] arr = new char[1024];
+        for (int i = 0; i < arr.length; i++)
+            arr[i] = (char)(32 + ThreadLocalRandom.current().nextInt(95));
+        return new String(arr);
+    }
+}
diff --git a/test/unit/org/apache/cassandra/cql3/SimpleQueryTest.java b/test/unit/org/apache/cassandra/cql3/SimpleQueryTest.java
index 052b53d..22a4c49 100644
--- a/test/unit/org/apache/cassandra/cql3/SimpleQueryTest.java
+++ b/test/unit/org/apache/cassandra/cql3/SimpleQueryTest.java
@@ -18,6 +18,7 @@
 package org.apache.cassandra.cql3;
 
 import java.util.*;
+
 import org.junit.Test;
 
 import static junit.framework.Assert.*;
@@ -391,24 +392,44 @@
 
         execute("INSERT INTO %s (k, t, v, s) values (?, ?, ?, ?)", "key1", 3, "foo3", "st3");
         execute("INSERT INTO %s (k, t, v) values (?, ?, ?)", "key1", 4, "foo4");
+        execute("INSERT INTO %s (k, t, v, s) values (?, ?, ?, ?)", "key1", 2, "foo2", "st2-repeat");
+
+        flush();
+
+        execute("INSERT INTO %s (k, t, v, s) values (?, ?, ?, ?)", "key1", 5, "foo5", "st5");
+        execute("INSERT INTO %s (k, t, v) values (?, ?, ?)", "key1", 6, "foo6");
+
 
         assertRows(execute("SELECT * FROM %s"),
-            row("key1",  1, "st3", "foo1"),
-            row("key1",  2, "st3", "foo2"),
-            row("key1",  3, "st3", "foo3"),
-            row("key1",  4, "st3", "foo4")
+            row("key1",  1, "st5", "foo1"),
+            row("key1",  2, "st5", "foo2"),
+            row("key1",  3, "st5", "foo3"),
+            row("key1",  4, "st5", "foo4"),
+            row("key1",  5, "st5", "foo5"),
+            row("key1",  6, "st5", "foo6")
         );
 
         assertRows(execute("SELECT s FROM %s WHERE k = ?", "key1"),
-            row("st3"),
-            row("st3"),
-            row("st3"),
-            row("st3")
+            row("st5"),
+            row("st5"),
+            row("st5"),
+            row("st5"),
+            row("st5"),
+            row("st5")
         );
 
         assertRows(execute("SELECT DISTINCT s FROM %s WHERE k = ?", "key1"),
-            row("st3")
+            row("st5")
         );
+
+        assertEmpty(execute("SELECT * FROM %s WHERE k = ? AND t > ? AND t < ?", "key1", 7, 5));
+        assertEmpty(execute("SELECT * FROM %s WHERE k = ? AND t > ? AND t < ? ORDER BY t DESC", "key1", 7, 5));
+
+        assertRows(execute("SELECT * FROM %s WHERE k = ? AND t = ?", "key1", 2),
+            row("key1", 2, "st5", "foo2"));
+
+        assertRows(execute("SELECT * FROM %s WHERE k = ? AND t = ? ORDER BY t DESC", "key1", 2),
+            row("key1", 2, "st5", "foo2"));
     }
 
     @Test
@@ -529,4 +550,43 @@
             row(0, 0, 0, 0)
         );
     }
+
+    /** Test for CASSANDRA-10958 */
+    @Test
+    public void restrictionOnRegularColumnWithStaticColumnPresentTest() throws Throwable
+    {
+        createTable("CREATE TABLE %s (id int, id2 int, age int static, extra int, PRIMARY KEY(id, id2))");
+
+        execute("INSERT INTO %s (id, id2, age, extra) VALUES (?, ?, ?, ?)", 1, 1, 1, 1);
+        execute("INSERT INTO %s (id, id2, age, extra) VALUES (?, ?, ?, ?)", 2, 2, 2, 2);
+        execute("UPDATE %s SET age=? WHERE id=?", 3, 3);
+
+        assertRows(execute("SELECT * FROM %s"),
+            row(1, 1, 1, 1),
+            row(2, 2, 2, 2),
+            row(3, null, 3, null)
+        );
+
+        assertRows(execute("SELECT * FROM %s WHERE extra > 1 ALLOW FILTERING"),
+            row(2, 2, 2, 2)
+        );
+    }
+
+    @Test
+    public void testRowFilteringOnStaticColumn() throws Throwable
+    {
+        createTable("CREATE TABLE %s (id int, name text, age int static, PRIMARY KEY (id, name))");
+        for (int i = 0; i < 5; i++)
+        {
+            execute("INSERT INTO %s (id, name, age) VALUES (?, ?, ?)", i, "NameDoesNotMatter", i);
+        }
+
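+        // Filtering on the static column is only accepted with ALLOW FILTERING.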
+        assertInvalid("SELECT id, age FROM %s WHERE age < 1");
+        assertRows(execute("SELECT id, age FROM %s WHERE age < 1 ALLOW FILTERING"),
+                   row(0, 0));
+        assertRows(execute("SELECT id, age FROM %s WHERE age > 0 AND age < 3 ALLOW FILTERING"),
+                   row(1, 1), row(2, 2));
+        assertRows(execute("SELECT id, age FROM %s WHERE age > 3 ALLOW FILTERING"),
+                   row(4, 4));
+    }
 }
diff --git a/test/unit/org/apache/cassandra/cql3/TombstonesWithIndexedSSTableTest.java b/test/unit/org/apache/cassandra/cql3/TombstonesWithIndexedSSTableTest.java
index 3042acd..787c309 100644
--- a/test/unit/org/apache/cassandra/cql3/TombstonesWithIndexedSSTableTest.java
+++ b/test/unit/org/apache/cassandra/cql3/TombstonesWithIndexedSSTableTest.java
@@ -24,8 +24,8 @@
 import org.apache.cassandra.Util;
 import org.apache.cassandra.db.*;
 import org.apache.cassandra.db.marshal.Int32Type;
-import org.apache.cassandra.io.sstable.IndexHelper;
 import org.apache.cassandra.io.sstable.format.SSTableReader;
+import org.apache.cassandra.io.util.FileDataInput;
 import org.apache.cassandra.utils.ByteBufferUtil;
 
 public class TombstonesWithIndexedSSTableTest extends CQLTester
@@ -76,13 +76,17 @@
             {
                 // The line below failed with key caching off (CASSANDRA-11158)
                 @SuppressWarnings("unchecked")
-                RowIndexEntry<IndexHelper.IndexInfo> indexEntry = sstable.getPosition(dk, SSTableReader.Operator.EQ);
+                RowIndexEntry indexEntry = sstable.getPosition(dk, SSTableReader.Operator.EQ);
                 if (indexEntry != null && indexEntry.isIndexed())
                 {
-                    ClusteringPrefix firstName = indexEntry.columnsIndex().get(1).firstName;
-                    if (firstName.kind().isBoundary())
-                        break deletionLoop;
-                    indexedRow = Int32Type.instance.compose(firstName.get(0));
+                    try (FileDataInput reader = sstable.openIndexReader())
+                    {
+                        RowIndexEntry.IndexInfoRetriever infoRetriever = indexEntry.openWithIndex(sstable.getIndexFile());
+                        ClusteringPrefix firstName = infoRetriever.columnsIndex(1).firstName;
+                        if (firstName.kind().isBoundary())
+                            break deletionLoop;
+                        indexedRow = Int32Type.instance.compose(firstName.get(0));
+                    }
                 }
             }
             assert indexedRow >= 0;
@@ -104,7 +108,6 @@
         assertRowCount(execute("SELECT DISTINCT s FROM %s WHERE k = ? ORDER BY t DESC", 0), 1);
     }
 
-    // Creates a random string
     public static String makeRandomString(int length)
     {
         Random random = new Random();
diff --git a/test/unit/org/apache/cassandra/cql3/TraceCqlTest.java b/test/unit/org/apache/cassandra/cql3/TraceCqlTest.java
new file mode 100644
index 0000000..735fb6a
--- /dev/null
+++ b/test/unit/org/apache/cassandra/cql3/TraceCqlTest.java
@@ -0,0 +1,126 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.cql3;
+
+import org.junit.Test;
+
+import com.datastax.driver.core.CodecRegistry;
+import com.datastax.driver.core.DataType;
+import com.datastax.driver.core.PreparedStatement;
+import com.datastax.driver.core.ProtocolVersion;
+import com.datastax.driver.core.QueryTrace;
+import com.datastax.driver.core.Session;
+import com.datastax.driver.core.TupleType;
+import com.datastax.driver.core.TupleValue;
+
+import static org.junit.Assert.assertEquals;
+
+public class TraceCqlTest extends CQLTester
+{
+    @Test
+    public void testCqlStatementTracing() throws Throwable
+    {
+        requireNetwork();
+
+        createTable("CREATE TABLE %s (id int primary key, v1 text, v2 text)");
+        execute("INSERT INTO %s (id, v1, v2) VALUES (?, ?, ?)", 1, "Apache", "Cassandra");
+        execute("INSERT INTO %s (id, v1, v2) VALUES (?, ?, ?)", 2, "trace", "test");
+
+        try (Session session = sessionNet())
+        {
+            String cql = "SELECT id, v1, v2 FROM " + KEYSPACE + '.' + currentTable() + " WHERE id = ?";
+            PreparedStatement pstmt = session.prepare(cql)
+                                             .enableTracing();
+            QueryTrace trace = session.execute(pstmt.bind(1)).getExecutionInfo().getQueryTrace();
+            assertEquals(cql, trace.getParameters().get("query"));
+
+            assertEquals("1", trace.getParameters().get("bound_var_0_id"));
+
+            String cql2 = "SELECT id, v1, v2 FROM " + KEYSPACE + '.' + currentTable() + " WHERE id IN (?, ?, ?)";
+            pstmt = session.prepare(cql2).enableTracing();
+            trace = session.execute(pstmt.bind(19, 15, 16)).getExecutionInfo().getQueryTrace();
+            assertEquals(cql2, trace.getParameters().get("query"));
+            assertEquals("19", trace.getParameters().get("bound_var_0_id"));
+            assertEquals("15", trace.getParameters().get("bound_var_1_id"));
+            assertEquals("16", trace.getParameters().get("bound_var_2_id"));
+
+            // Some more complex tests for tables with map and tuple data types and long bound values
+            createTable("CREATE TABLE %s (id int primary key, v1 text, v2 tuple<int, text, float>, v3 map<int, text>)");
+            execute("INSERT INTO %s (id, v1, v2, v3) values (?, ?, ?, ?)", 12, "mahdix", tuple(3, "bar", 2.1f),
+                    map(1290, "birthday", 39, "anniversary"));
+            execute("INSERT INTO %s (id, v1, v2, v3) values (?, ?, ?, ?)", 274, "CassandraRocks", tuple(9, "foo", 3.14f),
+                    map(9181, "statement", 716, "public speech"));
+
+            cql = "SELECT id, v1, v2, v3 FROM " + KEYSPACE + '.' + currentTable() + " WHERE v2 = ? ALLOW FILTERING";
+            pstmt = session.prepare(cql)
+                           .enableTracing();
+            TupleType tt = TupleType.of(ProtocolVersion.NEWEST_SUPPORTED, CodecRegistry.DEFAULT_INSTANCE, DataType.cint(),
+                                        DataType.text(), DataType.cfloat());
+            TupleValue value = tt.newValue();
+            value.setInt(0, 3);
+            value.setString(1, "bar");
+            value.setFloat(2, 2.1f);
+
+            trace = session.execute(pstmt.bind(value)).getExecutionInfo().getQueryTrace();
+            assertEquals(cql, trace.getParameters().get("query"));
+            assertEquals("(3, 'bar', 2.1)", trace.getParameters().get("bound_var_0_v2"));
+
+            cql2 = "SELECT id, v1, v2, v3 FROM " + KEYSPACE + '.' + currentTable() + " WHERE v3 CONTAINS KEY ? ALLOW FILTERING";
+            pstmt = session.prepare(cql2).enableTracing();
+            trace = session.execute(pstmt.bind(9181)).getExecutionInfo().getQueryTrace();
+
+            assertEquals(cql2, trace.getParameters().get("query"));
+            assertEquals("9181", trace.getParameters().get("bound_var_0_key(v3)"));
+
+            String boundValue = "Indulgence announcing uncommonly met she continuing two unpleasing terminated. Now " +
+                                "busy say down the shed eyes roof paid her. Of shameless collected suspicion existence " +
+                                "in. Share walls stuff think but the arise guest. Course suffer to do he sussex it " +
+                                "window advice. Yet matter enable misery end extent common men should. Her indulgence " +
+                                "but assistance favourable cultivated everything collecting." +
+                                "On projection apartments unsatiable so if he entreaties appearance. Rose you wife " +
+                                "how set lady half wish. Hard sing an in true felt. Welcomed stronger if steepest " +
+                                "ecstatic an suitable finished of oh. Entered at excited at forming between so " +
+                                "produce. Chicken unknown besides attacks gay compact out you. Continuing no " +
+                                "simplicity no favourable on reasonably melancholy estimating. Own hence views two " +
+                                "ask right whole ten seems. What near kept met call old west dine. Our announcing " +
+                                "sufficient why pianoforte. Full age foo set feel her told. Tastes giving in passed" +
+                                "direct me valley as supply. End great stood boy noisy often way taken short. Rent the " +
+                                "size our more door. Years no place abode in \uFEFFno child my. Man pianoforte too " +
+                                "solicitude friendship devonshire ten ask. Course sooner its silent but formal she " +
+                                "led. Extensive he assurance extremity at breakfast. Dear sure ye sold fine sell on. " +
+                                "Projection at up connection literature insensible motionless projecting." +
+                                "Nor hence hoped her after other known defer his. For county now sister engage had " +
+                                "season better had waited. Occasional mrs interested far expression acceptance. Day " +
+                                "either mrs talent pulled men rather regret admire but. Life ye sake it shed. Five " +
+                                "lady he cold in meet up. Service get met adapted matters offence for. Principles man " +
+                                "any insipidity age you simplicity understood. Do offering pleasure no ecstatic " +
+                                "whatever on mr directly. ";
+
+            String cql3 = "SELECT id, v1, v2, v3 FROM " + KEYSPACE + '.' + currentTable() + " WHERE v3 CONTAINS ? ALLOW FILTERING";
+            pstmt = session.prepare(cql3).enableTracing();
+            trace = session.execute(pstmt.bind(boundValue)).getExecutionInfo().getQueryTrace();
+
+            assertEquals(cql3, trace.getParameters().get("query"));
+
+            // When traced, the bound value is wrapped in single quotes and truncated to its first 999 characters
+            // followed by an ellipsis; the expected string below accounts for the surrounding quotes.
+            assertEquals("'" + boundValue.substring(0, 999) + "...'", trace.getParameters().get("bound_var_0_value(v3)"));
+        }
+    }
+}
diff --git a/test/unit/org/apache/cassandra/cql3/functions/CastFctsTest.java b/test/unit/org/apache/cassandra/cql3/functions/CastFctsTest.java
new file mode 100644
index 0000000..9b4b570
--- /dev/null
+++ b/test/unit/org/apache/cassandra/cql3/functions/CastFctsTest.java
@@ -0,0 +1,314 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.cql3.functions;
+
+import java.math.BigDecimal;
+import java.math.BigInteger;
+import java.util.Date;
+
+import org.apache.cassandra.cql3.CQLTester;
+import org.apache.cassandra.serializers.SimpleDateSerializer;
+import org.apache.cassandra.utils.UUIDGen;
+import org.joda.time.DateTime;
+import org.joda.time.DateTimeZone;
+import org.joda.time.format.DateTimeFormat;
+import org.junit.Test;
+
+public class CastFctsTest extends CQLTester
+{
+    @Test
+    public void testInvalidQueries() throws Throwable
+    {
+        createTable("CREATE TABLE %s (a int primary key, b text, c double)");
+
+        assertInvalidSyntaxMessage("no viable alternative at input '(' (... b, c) VALUES ([CAST](...)",
+                                   "INSERT INTO %s (a, b, c) VALUES (CAST(? AS int), ?, ?)", 1.6, "test", 6.3);
+
+        assertInvalidSyntaxMessage("no viable alternative at input '(' (..." + KEYSPACE + "." + currentTable()
+                + " SET c = [cast](...)",
+                                   "UPDATE %s SET c = cast(? as double) WHERE a = ?", 1, 1);
+
+        assertInvalidSyntaxMessage("no viable alternative at input '(' (...= ? WHERE a = [CAST] (...)",
+                                   "UPDATE %s SET c = ? WHERE a = CAST (? AS INT)", 1, 2.0);
+
+        assertInvalidSyntaxMessage("no viable alternative at input '(' (..." + KEYSPACE + "." + currentTable()
+                + " WHERE a = [CAST] (...)",
+                                   "DELETE FROM %s WHERE a = CAST (? AS INT)", 1, 2.0);
+
+        assertInvalidSyntaxMessage("no viable alternative at input '(' (..." + KEYSPACE + "." + currentTable()
+                + " WHERE a = [CAST] (...)",
+                                   "SELECT * FROM %s WHERE a = CAST (? AS INT)", 1, 2.0);
+
+        assertInvalidMessage("a cannot be cast to boolean", "SELECT CAST(a AS boolean) FROM %s");
+    }
+
+    @Test
+    public void testNumericCastsInSelectionClause() throws Throwable
+    {
+        createTable("CREATE TABLE %s (a tinyint primary key,"
+                                   + " b smallint,"
+                                   + " c int,"
+                                   + " d bigint,"
+                                   + " e float,"
+                                   + " f double,"
+                                   + " g decimal,"
+                                   + " h varint,"
+                                   + " i int)");
+
+        execute("INSERT INTO %s (a, b, c, d, e, f, g, h) VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
+                (byte) 1, (short) 2, 3, 4L, 5.2F, 6.3, BigDecimal.valueOf(6.3), BigInteger.valueOf(4));
+
+        assertColumnNames(execute("SELECT CAST(b AS int), CAST(c AS int), CAST(d AS double) FROM %s"),
+                          "cast(b as int)",
+                          "c",
+                          "cast(d as double)");
+
+        assertRows(execute("SELECT CAST(a AS tinyint), " +
+                "CAST(b AS tinyint), " +
+                "CAST(c AS tinyint), " +
+                "CAST(d AS tinyint), " +
+                "CAST(e AS tinyint), " +
+                "CAST(f AS tinyint), " +
+                "CAST(g AS tinyint), " +
+                "CAST(h AS tinyint), " +
+                "CAST(i AS tinyint) FROM %s"),
+                   row((byte) 1, (byte) 2, (byte) 3, (byte) 4L, (byte) 5, (byte) 6, (byte) 6, (byte) 4, null));
+
+        assertRows(execute("SELECT CAST(a AS smallint), " +
+                "CAST(b AS smallint), " +
+                "CAST(c AS smallint), " +
+                "CAST(d AS smallint), " +
+                "CAST(e AS smallint), " +
+                "CAST(f AS smallint), " +
+                "CAST(g AS smallint), " +
+                "CAST(h AS smallint), " +
+                "CAST(i AS smallint) FROM %s"),
+                   row((short) 1, (short) 2, (short) 3, (short) 4L, (short) 5, (short) 6, (short) 6, (short) 4, null));
+
+        assertRows(execute("SELECT CAST(a AS int), " +
+                "CAST(b AS int), " +
+                "CAST(c AS int), " +
+                "CAST(d AS int), " +
+                "CAST(e AS int), " +
+                "CAST(f AS int), " +
+                "CAST(g AS int), " +
+                "CAST(h AS int), " +
+                "CAST(i AS int) FROM %s"),
+                   row(1, 2, 3, 4, 5, 6, 6, 4, null));
+
+        assertRows(execute("SELECT CAST(a AS bigint), " +
+                "CAST(b AS bigint), " +
+                "CAST(c AS bigint), " +
+                "CAST(d AS bigint), " +
+                "CAST(e AS bigint), " +
+                "CAST(f AS bigint), " +
+                "CAST(g AS bigint), " +
+                "CAST(h AS bigint), " +
+                "CAST(i AS bigint) FROM %s"),
+                   row(1L, 2L, 3L, 4L, 5L, 6L, 6L, 4L, null));
+
+        assertRows(execute("SELECT CAST(a AS float), " +
+                "CAST(b AS float), " +
+                "CAST(c AS float), " +
+                "CAST(d AS float), " +
+                "CAST(e AS float), " +
+                "CAST(f AS float), " +
+                "CAST(g AS float), " +
+                "CAST(h AS float), " +
+                "CAST(i AS float) FROM %s"),
+                   row(1.0F, 2.0F, 3.0F, 4.0F, 5.2F, 6.3F, 6.3F, 4.0F, null));
+
+        assertRows(execute("SELECT CAST(a AS double), " +
+                "CAST(b AS double), " +
+                "CAST(c AS double), " +
+                "CAST(d AS double), " +
+                "CAST(e AS double), " +
+                "CAST(f AS double), " +
+                "CAST(g AS double), " +
+                "CAST(h AS double), " +
+                "CAST(i AS double) FROM %s"),
+                   row(1.0, 2.0, 3.0, 4.0, (double) 5.2F, 6.3, 6.3, 4.0, null));
+
+        assertRows(execute("SELECT CAST(a AS decimal), " +
+                "CAST(b AS decimal), " +
+                "CAST(c AS decimal), " +
+                "CAST(d AS decimal), " +
+                "CAST(e AS decimal), " +
+                "CAST(f AS decimal), " +
+                "CAST(g AS decimal), " +
+                "CAST(h AS decimal), " +
+                "CAST(i AS decimal) FROM %s"),
+                   row(BigDecimal.valueOf(1.0),
+                       BigDecimal.valueOf(2.0),
+                       BigDecimal.valueOf(3.0),
+                       BigDecimal.valueOf(4.0),
+                       BigDecimal.valueOf(5.2F),
+                       BigDecimal.valueOf(6.3),
+                       BigDecimal.valueOf(6.3),
+                       BigDecimal.valueOf(4.0),
+                       null));
+
+        assertRows(execute("SELECT CAST(a AS ascii), " +
+                "CAST(b AS ascii), " +
+                "CAST(c AS ascii), " +
+                "CAST(d AS ascii), " +
+                "CAST(e AS ascii), " +
+                "CAST(f AS ascii), " +
+                "CAST(g AS ascii), " +
+                "CAST(h AS ascii), " +
+                "CAST(i AS ascii) FROM %s"),
+                   row("1",
+                       "2",
+                       "3",
+                       "4",
+                       "5.2",
+                       "6.3",
+                       "6.3",
+                       "4",
+                       null));
+
+        assertRows(execute("SELECT CAST(a AS text), " +
+                "CAST(b AS text), " +
+                "CAST(c AS text), " +
+                "CAST(d AS text), " +
+                "CAST(e AS text), " +
+                "CAST(f AS text), " +
+                "CAST(g AS text), " +
+                "CAST(h AS text), " +
+                "CAST(i AS text) FROM %s"),
+                   row("1",
+                       "2",
+                       "3",
+                       "4",
+                       "5.2",
+                       "6.3",
+                       "6.3",
+                       "4",
+                       null));
+    }
+
+    @Test
+    public void testTimeCastsInSelectionClause() throws Throwable
+    {
+        createTable("CREATE TABLE %s (a timeuuid primary key, b timestamp, c date, d time)");
+
+        DateTime dateTime = DateTimeFormat.forPattern("yyyy-MM-dd hh:mm:ss")
+                .withZone(DateTimeZone.UTC)
+                .parseDateTime("2015-05-21 11:03:02");
+
+        DateTime date = DateTimeFormat.forPattern("yyyy-MM-dd")
+                .withZone(DateTimeZone.UTC)
+                .parseDateTime("2015-05-21");
+
+        long timeInMillis = dateTime.getMillis();
+
+        execute("INSERT INTO %s (a, b, c, d) VALUES (?, '2015-05-21 11:03:02+00', '2015-05-21', '11:03:02')",
+                UUIDGen.getTimeUUID(timeInMillis));
+
+        assertRows(execute("SELECT CAST(a AS timestamp), " +
+                           "CAST(b AS timestamp), " +
+                           "CAST(c AS timestamp) FROM %s"),
+                   row(new Date(dateTime.getMillis()), new Date(dateTime.getMillis()), new Date(date.getMillis())));
+
+        int timeInMillisToDay = SimpleDateSerializer.timeInMillisToDay(date.getMillis());
+        assertRows(execute("SELECT CAST(a AS date), " +
+                           "CAST(b AS date), " +
+                           "CAST(c AS date) FROM %s"),
+                   row(timeInMillisToDay, timeInMillisToDay, timeInMillisToDay));
+
+        assertRows(execute("SELECT CAST(b AS text), " +
+                           "CAST(c AS text), " +
+                           "CAST(d AS text) FROM %s"),
+                   row("2015-05-21T11:03:02.000Z", "2015-05-21", "11:03:02.000000000"));
+    }
+
+    @Test
+    public void testOtherTypeCastsInSelectionClause() throws Throwable
+    {
+        createTable("CREATE TABLE %s (a ascii primary key,"
+                                   + " b inet,"
+                                   + " c boolean)");
+
+        execute("INSERT INTO %s (a, b, c) VALUES (?, '127.0.0.1', ?)",
+                "test", true);
+
+        assertRows(execute("SELECT CAST(a AS text), " +
+                "CAST(b AS text), " +
+                "CAST(c AS text) FROM %s"),
+                   row("test", "127.0.0.1", "true"));
+    }
+
+    @Test
+    public void testCastsWithReverseOrder() throws Throwable
+    {
+        createTable("CREATE TABLE %s (a int,"
+                                   + " b smallint,"
+                                   + " c double,"
+                                   + " primary key (a, b)) WITH CLUSTERING ORDER BY (b DESC);");
+
+        execute("INSERT INTO %s (a, b, c) VALUES (?, ?, ?)",
+                1, (short) 2, 6.3);
+
+        assertRows(execute("SELECT CAST(a AS tinyint), " +
+                "CAST(b AS tinyint), " +
+                "CAST(c AS tinyint) FROM %s"),
+                   row((byte) 1, (byte) 2, (byte) 6));
+
+        assertRows(execute("SELECT CAST(CAST(a AS tinyint) AS smallint), " +
+                "CAST(CAST(b AS tinyint) AS smallint), " +
+                "CAST(CAST(c AS tinyint) AS smallint) FROM %s"),
+                   row((short) 1, (short) 2, (short) 6));
+
+        assertRows(execute("SELECT CAST(CAST(CAST(a AS tinyint) AS double) AS text), " +
+                "CAST(CAST(CAST(b AS tinyint) AS double) AS text), " +
+                "CAST(CAST(CAST(c AS tinyint) AS double) AS text) FROM %s"),
+                   row("1.0", "2.0", "6.0"));
+
+        String f = createFunction(KEYSPACE, "int",
+                                  "CREATE FUNCTION %s(val int) " +
+                                          "RETURNS NULL ON NULL INPUT " +
+                                          "RETURNS double " +
+                                          "LANGUAGE java " +
+                                          "AS 'return (double)val;'");
+
+        assertRows(execute("SELECT " + f + "(CAST(b AS int)) FROM %s"),
+                   row((double) 2));
+
+        assertRows(execute("SELECT CAST(" + f + "(CAST(b AS int)) AS text) FROM %s"),
+                   row("2.0"));
+    }
+
+    @Test
+    public void testCounterCastsInSelectionClause() throws Throwable
+    {
+        createTable("CREATE TABLE %s (a int primary key, b counter)");
+
+        execute("UPDATE %s SET b = b + 2 WHERE a = 1");
+
+        assertRows(execute("SELECT CAST(b AS tinyint), " +
+                "CAST(b AS smallint), " +
+                "CAST(b AS int), " +
+                "CAST(b AS bigint), " +
+                "CAST(b AS float), " +
+                "CAST(b AS double), " +
+                "CAST(b AS decimal), " +
+                "CAST(b AS ascii), " +
+                "CAST(b AS text) FROM %s"),
+                   row((byte) 2, (short) 2, 2, 2L, 2.0F, 2.0, BigDecimal.valueOf(2.0), "2", "2"));
+    }
+}
diff --git a/test/unit/org/apache/cassandra/cql3/restrictions/PrimaryKeyRestrictionSetTest.java b/test/unit/org/apache/cassandra/cql3/restrictions/ClusteringColumnRestrictionsTest.java
similarity index 87%
rename from test/unit/org/apache/cassandra/cql3/restrictions/PrimaryKeyRestrictionSetTest.java
rename to test/unit/org/apache/cassandra/cql3/restrictions/ClusteringColumnRestrictionsTest.java
index abbd36b..f78967d 100644
--- a/test/unit/org/apache/cassandra/cql3/restrictions/PrimaryKeyRestrictionSetTest.java
+++ b/test/unit/org/apache/cassandra/cql3/restrictions/ClusteringColumnRestrictionsTest.java
@@ -39,16 +39,16 @@
 import static org.junit.Assert.assertEquals;
 import static org.junit.Assert.assertTrue;
 
-public class PrimaryKeyRestrictionSetTest
+public class ClusteringColumnRestrictionsTest
 {
     @Test
     public void testBoundsAsClusteringWithNoRestrictions()
     {
         CFMetaData cfMetaData = newCFMetaData(Sort.ASC);
 
-        PrimaryKeyRestrictions restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        ClusteringColumnRestrictions restrictions = new ClusteringColumnRestrictions(cfMetaData);
 
-        SortedSet<Slice.Bound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
+        SortedSet<ClusteringBound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
         assertEquals(1, bounds.size());
         assertEmptyStart(get(bounds, 0));
 
@@ -68,10 +68,10 @@
         ByteBuffer clustering_0 = ByteBufferUtil.bytes(1);
         Restriction eq = newSingleEq(cfMetaData, 0, clustering_0);
 
-        PrimaryKeyRestrictions restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        ClusteringColumnRestrictions restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(eq);
 
-        SortedSet<Slice.Bound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
+        SortedSet<ClusteringBound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
         assertEquals(1, bounds.size());
         assertStartBound(get(bounds, 0), true, clustering_0);
 
@@ -91,10 +91,10 @@
         ByteBuffer clustering_0 = ByteBufferUtil.bytes(1);
         Restriction eq = newSingleEq(cfMetaData, 0, clustering_0);
 
-        PrimaryKeyRestrictions restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        ClusteringColumnRestrictions restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(eq);
 
-        SortedSet<Slice.Bound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
+        SortedSet<ClusteringBound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
         assertEquals(1, bounds.size());
         assertStartBound(get(bounds, 0), true, clustering_0);
 
@@ -117,10 +117,10 @@
 
         Restriction in = newSingleIN(cfMetaData, 0, value1, value2, value3);
 
-        PrimaryKeyRestrictions restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        ClusteringColumnRestrictions restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(in);
 
-        SortedSet<Slice.Bound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
+        SortedSet<ClusteringBound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
         assertEquals(3, bounds.size());
         assertStartBound(get(bounds, 0), true, value1);
         assertStartBound(get(bounds, 1), true, value2);
@@ -145,10 +145,10 @@
         ByteBuffer value2 = ByteBufferUtil.bytes(2);
 
         Restriction slice = newSingleSlice(cfMetaData, 0, Bound.START, false, value1);
-        PrimaryKeyRestrictions restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        ClusteringColumnRestrictions restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
-        SortedSet<Slice.Bound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
+        SortedSet<ClusteringBound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
         assertEquals(1, bounds.size());
         assertStartBound(get(bounds, 0), false, value1);
 
@@ -157,7 +157,7 @@
         assertEmptyEnd(get(bounds, 0));
 
         slice = newSingleSlice(cfMetaData, 0, Bound.START, true, value1);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -169,7 +169,7 @@
         assertEmptyEnd(get(bounds, 0));
 
         slice = newSingleSlice(cfMetaData, 0, Bound.END, true, value1);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -181,7 +181,7 @@
         assertEndBound(get(bounds, 0), true, value1);
 
         slice = newSingleSlice(cfMetaData, 0, Bound.END, false, value1);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -194,7 +194,7 @@
 
         slice = newSingleSlice(cfMetaData, 0, Bound.START, false, value1);
         Restriction slice2 = newSingleSlice(cfMetaData, 0, Bound.END, false, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice).mergeWith(slice2);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -207,7 +207,7 @@
 
         slice = newSingleSlice(cfMetaData, 0, Bound.START, true, value1);
         slice2 = newSingleSlice(cfMetaData, 0, Bound.END, true, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice).mergeWith(slice2);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -231,10 +231,10 @@
         ByteBuffer value2 = ByteBufferUtil.bytes(2);
 
         Restriction slice = newSingleSlice(cfMetaData, 0, Bound.START, false, value1);
-        PrimaryKeyRestrictions restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        ClusteringColumnRestrictions restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
-        SortedSet<Slice.Bound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
+        SortedSet<ClusteringBound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
         assertEquals(1, bounds.size());
         assertEmptyStart(get(bounds, 0));
 
@@ -243,7 +243,7 @@
         assertEndBound(get(bounds, 0), false, value1);
 
         slice = newSingleSlice(cfMetaData, 0, Bound.START, true, value1);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -255,7 +255,7 @@
         assertEndBound(get(bounds, 0), true, value1);
 
         slice = newSingleSlice(cfMetaData, 0, Bound.END, true, value1);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -267,7 +267,7 @@
         assertEmptyEnd(get(bounds, 0));
 
         slice = newSingleSlice(cfMetaData, 0, Bound.END, false, value1);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -280,7 +280,7 @@
 
         slice = newSingleSlice(cfMetaData, 0, Bound.START, false, value1);
         Restriction slice2 = newSingleSlice(cfMetaData, 0, Bound.END, false, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice).mergeWith(slice2);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -293,7 +293,7 @@
 
         slice = newSingleSlice(cfMetaData, 0, Bound.START, true, value1);
         slice2 = newSingleSlice(cfMetaData, 0, Bound.END, true, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice).mergeWith(slice2);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -318,10 +318,10 @@
         ByteBuffer value3 = ByteBufferUtil.bytes(3);
         Restriction eq = newSingleEq(cfMetaData, 0, value1);
         Restriction in = newSingleIN(cfMetaData, 1, value1, value2, value3);
-        PrimaryKeyRestrictions restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        ClusteringColumnRestrictions restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(eq).mergeWith(in);
 
-        SortedSet<Slice.Bound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
+        SortedSet<ClusteringBound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
         assertEquals(3, bounds.size());
         assertStartBound(get(bounds, 0), true, value1, value1);
         assertStartBound(get(bounds, 1), true, value1, value2);
@@ -349,10 +349,10 @@
         Restriction eq = newSingleEq(cfMetaData, 0, value3);
 
         Restriction slice = newSingleSlice(cfMetaData, 1, Bound.START, false, value1);
-        PrimaryKeyRestrictions restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        ClusteringColumnRestrictions restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(eq).mergeWith(slice);
 
-        SortedSet<Slice.Bound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
+        SortedSet<ClusteringBound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
         assertEquals(1, bounds.size());
         assertStartBound(get(bounds, 0), false, value3, value1);
 
@@ -361,7 +361,7 @@
         assertEndBound(get(bounds, 0), true, value3);
 
         slice = newSingleSlice(cfMetaData, 1, Bound.START, true, value1);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(eq).mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -373,7 +373,7 @@
         assertEndBound(get(bounds, 0), true, value3);
 
         slice = newSingleSlice(cfMetaData, 1, Bound.END, true, value1);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(eq).mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -385,7 +385,7 @@
         assertEndBound(get(bounds, 0), true, value3, value1);
 
         slice = newSingleSlice(cfMetaData, 1, Bound.END, false, value1);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(eq).mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -398,7 +398,7 @@
 
         slice = newSingleSlice(cfMetaData, 1, Bound.START, false, value1);
         Restriction slice2 = newSingleSlice(cfMetaData, 1, Bound.END, false, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(eq).mergeWith(slice).mergeWith(slice2);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -411,7 +411,7 @@
 
         slice = newSingleSlice(cfMetaData, 1, Bound.START, true, value1);
         slice2 = newSingleSlice(cfMetaData, 1, Bound.END, true, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(eq).mergeWith(slice).mergeWith(slice2);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -434,10 +434,10 @@
         ByteBuffer value1 = ByteBufferUtil.bytes(1);
         ByteBuffer value2 = ByteBufferUtil.bytes(2);
         Restriction eq = newMultiEq(cfMetaData, 0, value1, value2);
-        PrimaryKeyRestrictions restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        ClusteringColumnRestrictions restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(eq);
 
-        SortedSet<Slice.Bound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
+        SortedSet<ClusteringBound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
         assertEquals(1, bounds.size());
         assertStartBound(get(bounds, 0), true, value1, value2);
 
@@ -458,10 +458,10 @@
         ByteBuffer value2 = ByteBufferUtil.bytes(2);
         ByteBuffer value3 = ByteBufferUtil.bytes(3);
         Restriction in = newMultiIN(cfMetaData, 0, asList(value1, value2), asList(value2, value3));
-        PrimaryKeyRestrictions restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        ClusteringColumnRestrictions restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(in);
 
-        SortedSet<Slice.Bound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
+        SortedSet<ClusteringBound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
         assertEquals(2, bounds.size());
         assertStartBound(get(bounds, 0), true, value1, value2);
         assertStartBound(get(bounds, 1), true, value2, value3);
@@ -485,10 +485,10 @@
         ByteBuffer value2 = ByteBufferUtil.bytes(2);
 
         Restriction slice = newMultiSlice(cfMetaData, 0, Bound.START, false, value1);
-        PrimaryKeyRestrictions restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        ClusteringColumnRestrictions restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
-        SortedSet<Slice.Bound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
+        SortedSet<ClusteringBound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
         assertEquals(1, bounds.size());
         assertStartBound(get(bounds, 0), false, value1);
 
@@ -497,7 +497,7 @@
         assertEmptyEnd(get(bounds, 0));
 
         slice = newMultiSlice(cfMetaData, 0, Bound.START, true, value1);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -509,7 +509,7 @@
         assertEmptyEnd(get(bounds, 0));
 
         slice = newMultiSlice(cfMetaData, 0, Bound.END, true, value1);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -521,7 +521,7 @@
         assertEndBound(get(bounds, 0), true, value1);
 
         slice = newMultiSlice(cfMetaData, 0, Bound.END, false, value1);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -534,7 +534,7 @@
 
         slice = newMultiSlice(cfMetaData, 0, Bound.START, false, value1);
         Restriction slice2 = newMultiSlice(cfMetaData, 0, Bound.END, false, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice).mergeWith(slice2);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -547,7 +547,7 @@
 
         slice = newMultiSlice(cfMetaData, 0, Bound.START, true, value1);
         slice2 = newMultiSlice(cfMetaData, 0, Bound.END, true, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice).mergeWith(slice2);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -572,10 +572,10 @@
         ByteBuffer value2 = ByteBufferUtil.bytes(2);
 
         Restriction slice = newMultiSlice(cfMetaData, 0, Bound.START, false, value1);
-        PrimaryKeyRestrictions restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        ClusteringColumnRestrictions restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
-        SortedSet<Slice.Bound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
+        SortedSet<ClusteringBound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
         assertEquals(1, bounds.size());
         assertEmptyStart(get(bounds, 0));
 
@@ -584,7 +584,7 @@
         assertEndBound(get(bounds, 0), false, value1);
 
         slice = newMultiSlice(cfMetaData, 0, Bound.START, true, value1);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -596,7 +596,7 @@
         assertEndBound(get(bounds, 0), true, value1);
 
         slice = newMultiSlice(cfMetaData, 0, Bound.END, true, value1);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -608,7 +608,7 @@
         assertEmptyEnd(get(bounds, 0));
 
         slice = newMultiSlice(cfMetaData, 0, Bound.END, false, value1);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -621,7 +621,7 @@
 
         slice = newMultiSlice(cfMetaData, 0, Bound.START, false, value1);
         Restriction slice2 = newMultiSlice(cfMetaData, 0, Bound.END, false, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice).mergeWith(slice2);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -634,7 +634,7 @@
 
         slice = newMultiSlice(cfMetaData, 0, Bound.START, true, value1);
         slice2 = newMultiSlice(cfMetaData, 0, Bound.END, true, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice).mergeWith(slice2);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -659,10 +659,10 @@
 
         // (clustering_0, clustering1) > (1, 2)
         Restriction slice = newMultiSlice(cfMetaData, 0, Bound.START, false, value1, value2);
-        PrimaryKeyRestrictions restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        ClusteringColumnRestrictions restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
-        SortedSet<Slice.Bound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
+        SortedSet<ClusteringBound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
         assertEquals(1, bounds.size());
         assertStartBound(get(bounds, 0), false, value1, value2);
 
@@ -672,7 +672,7 @@
 
         // (clustering_0, clustering1) >= (1, 2)
         slice = newMultiSlice(cfMetaData, 0, Bound.START, true, value1, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -685,7 +685,7 @@
 
         // (clustering_0, clustering1) <= (1, 2)
         slice = newMultiSlice(cfMetaData, 0, Bound.END, true, value1, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -698,7 +698,7 @@
 
         // (clustering_0, clustering1) < (1, 2)
         slice = newMultiSlice(cfMetaData, 0, Bound.END, false, value1, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -712,7 +712,7 @@
         // (clustering_0, clustering1) > (1, 2) AND (clustering_0) < (2)
         slice = newMultiSlice(cfMetaData, 0, Bound.START, false, value1, value2);
         Restriction slice2 = newMultiSlice(cfMetaData, 0, Bound.END, false, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice).mergeWith(slice2);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -726,7 +726,7 @@
         // (clustering_0, clustering1) >= (1, 2) AND (clustering_0, clustering1) <= (2, 1)
         slice = newMultiSlice(cfMetaData, 0, Bound.START, true, value1, value2);
         slice2 = newMultiSlice(cfMetaData, 0, Bound.END, true, value2, value1);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice).mergeWith(slice2);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -751,10 +751,10 @@
 
         // (clustering_0, clustering1) > (1, 2)
         Restriction slice = newMultiSlice(cfMetaData, 0, Bound.START, false, value1, value2);
-        PrimaryKeyRestrictions restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        ClusteringColumnRestrictions restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
-        SortedSet<Slice.Bound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
+        SortedSet<ClusteringBound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
         assertEquals(1, bounds.size());
         assertEmptyStart(get(bounds, 0));
 
@@ -764,7 +764,7 @@
 
         // (clustering_0, clustering1) >= (1, 2)
         slice = newMultiSlice(cfMetaData, 0, Bound.START, true, value1, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -777,7 +777,7 @@
 
         // (clustering_0, clustering1) <= (1, 2)
         slice = newMultiSlice(cfMetaData, 0, Bound.END, true, value1, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -790,7 +790,7 @@
 
         // (clustering_0, clustering1) < (1, 2)
         slice = newMultiSlice(cfMetaData, 0, Bound.END, false, value1, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -805,7 +805,7 @@
         // (clustering_0, clustering1) > (1, 2) AND (clustering_0) < (2)
         slice = newMultiSlice(cfMetaData, 0, Bound.START, false, value1, value2);
         Restriction slice2 = newMultiSlice(cfMetaData, 0, Bound.END, false, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice).mergeWith(slice2);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -819,7 +819,7 @@
         // (clustering_0, clustering1) >= (1, 2) AND (clustering_0, clustering1) <= (2, 1)
         slice = newMultiSlice(cfMetaData, 0, Bound.START, true, value1, value2);
         slice2 = newMultiSlice(cfMetaData, 0, Bound.END, true, value2, value1);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice).mergeWith(slice2);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -845,10 +845,10 @@
 
         // (clustering_0, clustering1) > (1, 2)
         Restriction slice = newMultiSlice(cfMetaData, 0, Bound.START, false, value1, value2);
-        PrimaryKeyRestrictions restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        ClusteringColumnRestrictions restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
-        SortedSet<Slice.Bound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
+        SortedSet<ClusteringBound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
         assertEquals(2, bounds.size());
         assertEmptyStart(get(bounds, 0));
         assertStartBound(get(bounds, 1), false, value1, value2);
@@ -860,7 +860,7 @@
 
         // (clustering_0, clustering1) >= (1, 2)
         slice = newMultiSlice(cfMetaData, 0, Bound.START, true, value1, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -875,7 +875,7 @@
 
         // (clustering_0, clustering1) <= (1, 2)
         slice = newMultiSlice(cfMetaData, 0, Bound.END, true, value1, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -890,7 +890,7 @@
 
         // (clustering_0, clustering1) < (1, 2)
         slice = newMultiSlice(cfMetaData, 0, Bound.END, false, value1, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -906,7 +906,7 @@
         // (clustering_0, clustering1) > (1, 2) AND (clustering_0) < (2)
         slice = newMultiSlice(cfMetaData, 0, Bound.START, false, value1, value2);
         Restriction slice2 = newMultiSlice(cfMetaData, 0, Bound.END, false, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice).mergeWith(slice2);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -922,7 +922,7 @@
         // (clustering_0) > (1) AND (clustering_0, clustering1) < (2, 1)
         slice = newMultiSlice(cfMetaData, 0, Bound.START, false, value1);
         slice2 = newMultiSlice(cfMetaData, 0, Bound.END, false, value2, value1);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice).mergeWith(slice2);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -938,7 +938,7 @@
         // (clustering_0, clustering1) >= (1, 2) AND (clustering_0, clustering1) <= (2, 1)
         slice = newMultiSlice(cfMetaData, 0, Bound.START, true, value1, value2);
         slice2 = newMultiSlice(cfMetaData, 0, Bound.END, true, value2, value1);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice).mergeWith(slice2);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -968,10 +968,10 @@
 
         // (clustering_0, clustering1) > (1, 2)
         Restriction slice = newMultiSlice(cfMetaData, 0, Bound.START, false, value1, value2);
-        PrimaryKeyRestrictions restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        ClusteringColumnRestrictions restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
-        SortedSet<Slice.Bound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
+        SortedSet<ClusteringBound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
         assertEquals(2, bounds.size());
         assertStartBound(get(bounds, 0), true, value1);
         assertStartBound(get(bounds, 1), false, value1);
@@ -983,7 +983,7 @@
 
         // (clustering_0, clustering1) >= (1, 2)
         slice = newMultiSlice(cfMetaData, 0, Bound.START, true, value1, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -998,7 +998,7 @@
 
         // (clustering_0, clustering1) <= (1, 2)
         slice = newMultiSlice(cfMetaData, 0, Bound.END, true, value1, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -1013,7 +1013,7 @@
 
         // (clustering_0, clustering1) < (1, 2)
         slice = newMultiSlice(cfMetaData, 0, Bound.END, false, value1, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -1029,7 +1029,7 @@
         // (clustering_0, clustering1) > (1, 2) AND (clustering_0) < (2)
         slice = newMultiSlice(cfMetaData, 0, Bound.START, false, value1, value2);
         Restriction slice2 = newMultiSlice(cfMetaData, 0, Bound.END, false, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice).mergeWith(slice2);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -1045,7 +1045,7 @@
         // (clustering_0, clustering1) >= (1, 2) AND (clustering_0, clustering1) <= (2, 1)
         slice = newMultiSlice(cfMetaData, 0, Bound.START, true, value1, value2);
         slice2 = newMultiSlice(cfMetaData, 0, Bound.END, true, value2, value1);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice).mergeWith(slice2);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -1077,10 +1077,10 @@
 
         // (clustering_0, clustering1, clustering_2, clustering_3) > (1, 2, 3, 4)
         Restriction slice = newMultiSlice(cfMetaData, 0, Bound.START, false, value1, value2, value3, value4);
-        PrimaryKeyRestrictions restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        ClusteringColumnRestrictions restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
-        SortedSet<Slice.Bound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
+        SortedSet<ClusteringBound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
         assertEquals(2, bounds.size());
         assertStartBound(get(bounds, 0), true, value1, value2);
         assertStartBound(get(bounds, 1), false, value1, value2);
@@ -1093,7 +1093,7 @@
         // clustering_0 = 1 AND (clustering_1, clustering_2, clustering_3) > (2, 3, 4)
         Restriction eq = newSingleEq(cfMetaData, 0, value1);
         slice = newMultiSlice(cfMetaData, 1, Bound.START, false, value2, value3, value4);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
         restrictions = restrictions.mergeWith(eq);
 
@@ -1110,7 +1110,7 @@
         // clustering_0 IN (1, 2) AND (clustering_1, clustering_2, clustering_3) > (2, 3, 4)
         Restriction in = newSingleIN(cfMetaData, 0, value1, value2);
         slice = newMultiSlice(cfMetaData, 1, Bound.START, false, value2, value3, value4);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
         restrictions = restrictions.mergeWith(in);
 
@@ -1130,7 +1130,7 @@
 
         // (clustering_0, clustering1) >= (1, 2)
         slice = newMultiSlice(cfMetaData, 0, Bound.START, true, value1, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -1143,7 +1143,7 @@
 
         // (clustering_0, clustering1, clustering_2, clustering_3) >= (1, 2, 3, 4)
         slice = newMultiSlice(cfMetaData, 0, Bound.START, true, value1, value2, value3, value4);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -1158,7 +1158,7 @@
 
         // (clustering_0, clustering1, clustering_2, clustering_3) <= (1, 2, 3, 4)
         slice = newMultiSlice(cfMetaData, 0, Bound.END, true, value1, value2, value3, value4);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -1173,7 +1173,7 @@
 
         // (clustering_0, clustering1, clustering_2, clustering_3) < (1, 2, 3, 4)
         slice = newMultiSlice(cfMetaData, 0, Bound.END, false, value1, value2, value3, value4);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -1189,7 +1189,7 @@
         // (clustering_0, clustering1, clustering_2, clustering_3) > (1, 2, 3, 4) AND (clustering_0, clustering_1) < (2, 3)
         slice = newMultiSlice(cfMetaData, 0, Bound.START, false, value1, value2, value3, value4);
         Restriction slice2 = newMultiSlice(cfMetaData, 0, Bound.END, false, value2, value3);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice).mergeWith(slice2);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -1205,7 +1205,7 @@
         // (clustering_0, clustering1, clustering_2, clustering_3) >= (1, 2, 3, 4) AND (clustering_0, clustering1, clustering_2, clustering_3) <= (4, 3, 2, 1)
         slice = newMultiSlice(cfMetaData, 0, Bound.START, true, value1, value2, value3, value4);
         slice2 = newMultiSlice(cfMetaData, 0, Bound.END, true, value4, value3, value2, value1);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice).mergeWith(slice2);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -1237,10 +1237,10 @@
 
         // (clustering_0, clustering1, clustering_2, clustering_3) > (1, 2, 3, 4)
         Restriction slice = newMultiSlice(cfMetaData, 0, Bound.START, false, value1, value2, value3, value4);
-        PrimaryKeyRestrictions restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        ClusteringColumnRestrictions restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
-        SortedSet<Slice.Bound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
+        SortedSet<ClusteringBound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
         assertEquals(4, bounds.size());
         assertStartBound(get(bounds, 0), true, value1);
         assertStartBound(get(bounds, 1), true, value1, value2, value3);
@@ -1258,7 +1258,7 @@
         // clustering_0 = 1 AND (clustering_1, clustering_2, clustering_3) > (2, 3, 4)
         Restriction eq = newSingleEq(cfMetaData, 0, value1);
         slice = newMultiSlice(cfMetaData, 1, Bound.START, false, value2, value3, value4);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
         restrictions = restrictions.mergeWith(eq);
 
@@ -1276,7 +1276,7 @@
 
         // (clustering_0, clustering1) >= (1, 2)
         slice = newMultiSlice(cfMetaData, 0, Bound.START, true, value1, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -1291,7 +1291,7 @@
 
         // (clustering_0, clustering1, clustering_2, clustering_3) >= (1, 2, 3, 4)
         slice = newMultiSlice(cfMetaData, 0, Bound.START, true, value1, value2, value3, value4);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -1310,7 +1310,7 @@
 
         // (clustering_0, clustering1, clustering_2, clustering_3) <= (1, 2, 3, 4)
         slice = newMultiSlice(cfMetaData, 0, Bound.END, true, value1, value2, value3, value4);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -1329,7 +1329,7 @@
 
         // (clustering_0, clustering1, clustering_2, clustering_3) < (1, 2, 3, 4)
         slice = newMultiSlice(cfMetaData, 0, Bound.END, false, value1, value2, value3, value4);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -1349,7 +1349,7 @@
         // (clustering_0, clustering1, clustering_2, clustering_3) > (1, 2, 3, 4) AND (clustering_0, clustering_1) < (2, 3)
         slice = newMultiSlice(cfMetaData, 0, Bound.START, false, value1, value2, value3, value4);
         Restriction slice2 = newMultiSlice(cfMetaData, 0, Bound.END, false, value2, value3);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice).mergeWith(slice2);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -1371,7 +1371,7 @@
         // (clustering_0, clustering1, clustering_2, clustering_3) >= (1, 2, 3, 4) AND (clustering_0, clustering1, clustering_2, clustering_3) <= (4, 3, 2, 1)
         slice = newMultiSlice(cfMetaData, 0, Bound.START, true, value1, value2, value3, value4);
         slice2 = newMultiSlice(cfMetaData, 0, Bound.END, true, value4, value3, value2, value1);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(slice).mergeWith(slice2);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -1411,10 +1411,10 @@
         // clustering_0 = 1 AND (clustering_1, clustering_2) = (2, 3)
         Restriction singleEq = newSingleEq(cfMetaData, 0, value1);
         Restriction multiEq = newMultiEq(cfMetaData, 1, value2, value3);
-        PrimaryKeyRestrictions restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        ClusteringColumnRestrictions restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(singleEq).mergeWith(multiEq);
 
-        SortedSet<Slice.Bound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
+        SortedSet<ClusteringBound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
         assertEquals(1, bounds.size());
         assertStartBound(get(bounds, 0), true, value1, value2, value3);
 
@@ -1426,7 +1426,7 @@
         singleEq = newSingleEq(cfMetaData, 0, value1);
         Restriction singleEq2 = newSingleEq(cfMetaData, 1, value2);
         multiEq = newMultiEq(cfMetaData, 2, value3, value4);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(singleEq).mergeWith(singleEq2).mergeWith(multiEq);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -1440,7 +1440,7 @@
         // (clustering_0, clustering_1) = (1, 2) AND clustering_2 = 3
         singleEq = newSingleEq(cfMetaData, 2, value3);
         multiEq = newMultiEq(cfMetaData, 0, value1, value2);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(singleEq).mergeWith(multiEq);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -1455,7 +1455,7 @@
         singleEq = newSingleEq(cfMetaData, 0, value1);
         singleEq2 = newSingleEq(cfMetaData, 3, value4);
         multiEq = newMultiEq(cfMetaData, 1, value2, value3);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(singleEq).mergeWith(multiEq).mergeWith(singleEq2);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -1484,10 +1484,10 @@
         // clustering_0 = 1 AND (clustering_1, clustering_2) IN ((2, 3), (4, 5))
         Restriction singleEq = newSingleEq(cfMetaData, 0, value1);
         Restriction multiIN = newMultiIN(cfMetaData, 1, asList(value2, value3), asList(value4, value5));
-        PrimaryKeyRestrictions restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        ClusteringColumnRestrictions restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(singleEq).mergeWith(multiIN);
 
-        SortedSet<Slice.Bound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
+        SortedSet<ClusteringBound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
         assertEquals(2, bounds.size());
         assertStartBound(get(bounds, 0), true, value1, value2, value3);
         assertStartBound(get(bounds, 1), true, value1, value4, value5);
@@ -1500,7 +1500,7 @@
         // clustering_0 = 1 AND (clustering_1, clustering_2) IN ((2, 3))
         singleEq = newSingleEq(cfMetaData, 0, value1);
         multiIN = newMultiIN(cfMetaData, 1, asList(value2, value3));
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(multiIN).mergeWith(singleEq);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -1515,7 +1515,7 @@
         singleEq = newSingleEq(cfMetaData, 0, value1);
         Restriction singleEq2 = newSingleEq(cfMetaData, 1, value5);
         multiIN = newMultiIN(cfMetaData, 2, asList(value2, value3), asList(value4, value5));
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(singleEq).mergeWith(multiIN).mergeWith(singleEq2);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -1547,10 +1547,10 @@
         // clustering_0 = 1 AND (clustering_1, clustering_2) > (2, 3)
         Restriction singleEq = newSingleEq(cfMetaData, 0, value1);
         Restriction multiSlice = newMultiSlice(cfMetaData, 1, Bound.START, false, value2, value3);
-        PrimaryKeyRestrictions restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        ClusteringColumnRestrictions restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(singleEq).mergeWith(multiSlice);
 
-        SortedSet<Slice.Bound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
+        SortedSet<ClusteringBound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
         assertEquals(1, bounds.size());
         assertStartBound(get(bounds, 0), false, value1, value2, value3);
 
@@ -1562,7 +1562,7 @@
         singleEq = newSingleEq(cfMetaData, 0, value1);
         multiSlice = newMultiSlice(cfMetaData, 1, Bound.START, false, value2, value3);
         Restriction multiSlice2 = newMultiSlice(cfMetaData, 1, Bound.END, false, value4);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(multiSlice2).mergeWith(singleEq).mergeWith(multiSlice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -1577,7 +1577,7 @@
         singleEq = newSingleEq(cfMetaData, 0, value1);
         multiSlice = newMultiSlice(cfMetaData, 1, Bound.START, true, value2, value3);
         multiSlice2 = newMultiSlice(cfMetaData, 1, Bound.END, true, value4, value5);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(multiSlice2).mergeWith(singleEq).mergeWith(multiSlice);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -1605,10 +1605,10 @@
         // (clustering_0, clustering_1) = (1, 2) AND clustering_2 > 3
         Restriction multiEq = newMultiEq(cfMetaData, 0, value1, value2);
         Restriction singleSlice = newSingleSlice(cfMetaData, 2, Bound.START, false, value3);
-        PrimaryKeyRestrictions restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        ClusteringColumnRestrictions restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(multiEq).mergeWith(singleSlice);
 
-        SortedSet<Slice.Bound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
+        SortedSet<ClusteringBound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
         assertEquals(1, bounds.size());
         assertStartBound(get(bounds, 0), false, value1, value2, value3);
 
@@ -1631,10 +1631,10 @@
         // (clustering_0, clustering_1) = (1, 2) AND (clustering_2, clustering_3) > (3, 4)
         Restriction multiEq = newMultiEq(cfMetaData, 0, value1, value2);
         Restriction multiSlice = newMultiSlice(cfMetaData, 2, Bound.START, false, value3, value4);
-        PrimaryKeyRestrictions restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        ClusteringColumnRestrictions restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(multiEq).mergeWith(multiSlice);
 
-        SortedSet<Slice.Bound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
+        SortedSet<ClusteringBound> bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
         assertEquals(1, bounds.size());
         assertStartBound(get(bounds, 0), false, value1, value2, value3, value4);
 
@@ -1645,7 +1645,7 @@
         // (clustering_0, clustering_1) = (1, 2) AND (clustering_2, clustering_3) IN ((3, 4), (4, 5))
         multiEq = newMultiEq(cfMetaData, 0, value1, value2);
         Restriction multiIN = newMultiIN(cfMetaData, 2, asList(value3, value4), asList(value4, value5));
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(multiEq).mergeWith(multiIN);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -1661,7 +1661,7 @@
         // (clustering_0, clustering_1) = (1, 2) AND (clustering_2, clustering_3) = (3, 4)
         multiEq = newMultiEq(cfMetaData, 0, value1, value2);
         Restriction multiEq2 = newMultiEq(cfMetaData, 2, value3, value4);
-        restrictions = new PrimaryKeyRestrictionSet(cfMetaData.comparator, false);
+        restrictions = new ClusteringColumnRestrictions(cfMetaData);
         restrictions = restrictions.mergeWith(multiEq).mergeWith(multiEq2);
 
         bounds = restrictions.boundsAsClustering(Bound.START, QueryOptions.DEFAULT);
@@ -1678,9 +1678,9 @@
      *
      * @param bound the bound to check
      */
-    private static void assertEmptyStart(Slice.Bound bound)
+    private static void assertEmptyStart(ClusteringBound bound)
     {
-        assertEquals(Slice.Bound.BOTTOM, bound);
+        assertEquals(ClusteringBound.BOTTOM, bound);
     }
 
     /**
@@ -1688,36 +1688,36 @@
      *
      * @param bound the bound to check
      */
-    private static void assertEmptyEnd(Slice.Bound bound)
+    private static void assertEmptyEnd(ClusteringBound bound)
     {
-        assertEquals(Slice.Bound.TOP, bound);
+        assertEquals(ClusteringBound.TOP, bound);
     }
 
     /**
-     * Asserts that the specified <code>Slice.Bound</code> is a start with the specified elements.
+     * Asserts that the specified <code>ClusteringBound</code> is a start with the specified elements.
      *
      * @param bound the bound to check
      * @param isInclusive if the bound is expected to be inclusive
      * @param elements the expected elements of the clustering
      */
-    private static void assertStartBound(Slice.Bound bound, boolean isInclusive, ByteBuffer... elements)
+    private static void assertStartBound(ClusteringBound bound, boolean isInclusive, ByteBuffer... elements)
     {
         assertBound(bound, true, isInclusive, elements);
     }
 
     /**
-     * Asserts that the specified <code>Slice.Bound</code> is a end with the specified elements.
+     * Asserts that the specified <code>ClusteringBound</code> is an end with the specified elements.
      *
      * @param bound the bound to check
      * @param isInclusive if the bound is expected to be inclusive
      * @param elements the expected elements of the clustering
      */
-    private static void assertEndBound(Slice.Bound bound, boolean isInclusive, ByteBuffer... elements)
+    private static void assertEndBound(ClusteringBound bound, boolean isInclusive, ByteBuffer... elements)
     {
         assertBound(bound, false, isInclusive, elements);
     }
 
-    private static void assertBound(Slice.Bound bound, boolean isStart, boolean isInclusive, ByteBuffer... elements)
+    private static void assertBound(ClusteringBound bound, boolean isStart, boolean isInclusive, ByteBuffer... elements)
     {
         assertEquals("the bound size is not the expected one:", elements.length, bound.size());
         assertEquals("the bound should be a " + (isStart ? "start" : "end") + " but is a " + (bound.isStart() ? "start" : "end"), isStart, bound.isStart());
diff --git a/test/unit/org/apache/cassandra/cql3/selection/SelectionColumnMappingTest.java b/test/unit/org/apache/cassandra/cql3/selection/SelectionColumnMappingTest.java
index 2b7a197..c930b2a 100644
--- a/test/unit/org/apache/cassandra/cql3/selection/SelectionColumnMappingTest.java
+++ b/test/unit/org/apache/cassandra/cql3/selection/SelectionColumnMappingTest.java
@@ -54,6 +54,8 @@
     public static void setUpClass()
     {
         DatabaseDescriptor.setPartitionerUnsafe(ByteOrderedPartitioner.instance);
+
+        prepareServer();
     }
 
     @Test
@@ -68,7 +70,7 @@
                                 " v1 int," +
                                 " v2 ascii," +
                                 " v3 frozen<" + typeName + ">)");
-        userType = Schema.instance.getKSMetaData(KEYSPACE).types.get(ByteBufferUtil.bytes(typeName)).get();
+        userType = Schema.instance.getKSMetaData(KEYSPACE).types.get(ByteBufferUtil.bytes(typeName)).get().freeze();
         functionName = createFunction(KEYSPACE, "int, ascii",
                                       "CREATE FUNCTION %s (i int, a ascii) " +
                                       "CALLED ON NULL INPUT " +
diff --git a/test/unit/org/apache/cassandra/cql3/selection/TermSelectionTest.java b/test/unit/org/apache/cassandra/cql3/selection/TermSelectionTest.java
new file mode 100644
index 0000000..065fdbd
--- /dev/null
+++ b/test/unit/org/apache/cassandra/cql3/selection/TermSelectionTest.java
@@ -0,0 +1,338 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.cql3.selection;
+
+import java.math.BigDecimal;
+import java.math.BigInteger;
+import java.util.*;
+
+import org.junit.Test;
+
+import org.apache.cassandra.cql3.*;
+import org.apache.cassandra.db.marshal.*;
+import org.apache.cassandra.transport.messages.ResultMessage;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNull;
+
+public class TermSelectionTest extends CQLTester
+{
+    // Helper method for testSelectLiteral()
+    private void assertConstantResult(UntypedResultSet result, Object constant)
+    {
+        assertRows(result,
+                   row(1, "one", constant),
+                   row(2, "two", constant),
+                   row(3, "three", constant));
+    }
+
+    @Test
+    public void testSelectLiteral() throws Throwable
+    {
+        createTable("CREATE TABLE %s (pk int, ck int, t text, PRIMARY KEY (pk, ck) )");
+        execute("INSERT INTO %s (pk, ck, t) VALUES (1, 1, 'one')");
+        execute("INSERT INTO %s (pk, ck, t) VALUES (1, 2, 'two')");
+        execute("INSERT INTO %s (pk, ck, t) VALUES (1, 3, 'three')");
+
+        assertInvalidMessage("Cannot infer type for term", "SELECT ck, t, 'a const' FROM %s");
+        assertConstantResult(execute("SELECT ck, t, (text)'a const' FROM %s"), "a const");
+
+        assertInvalidMessage("Cannot infer type for term", "SELECT ck, t, 42 FROM %s");
+        assertConstantResult(execute("SELECT ck, t, (int)42 FROM %s"), 42);
+
+        assertInvalidMessage("Cannot infer type for term", "SELECT ck, t, (1, 'foo') FROM %s");
+        assertConstantResult(execute("SELECT ck, t, (tuple<int, text>)(1, 'foo') FROM %s"), tuple(1, "foo"));
+
+        assertInvalidMessage("Cannot infer type for term", "SELECT ck, t, [1, 2, 3] FROM %s");
+        assertConstantResult(execute("SELECT ck, t, (list<int>)[1, 2, 3] FROM %s"), list(1, 2, 3));
+
+        assertInvalidMessage("Cannot infer type for term", "SELECT ck, t, {1, 2, 3} FROM %s");
+        assertConstantResult(execute("SELECT ck, t, (set<int>){1, 2, 3} FROM %s"), set(1, 2, 3));
+
+        assertInvalidMessage("Cannot infer type for term", "SELECT ck, t, {1: 'foo', 2: 'bar', 3: 'baz'} FROM %s");
+        assertConstantResult(execute("SELECT ck, t, (map<int, text>){1: 'foo', 2: 'bar', 3: 'baz'} FROM %s"), map(1, "foo", 2, "bar", 3, "baz"));
+
+        assertColumnNames(execute("SELECT ck, t, (int)42, (int)43 FROM %s"), "ck", "t", "(int)42", "(int)43");
+        assertRows(execute("SELECT ck, t, (int) 42, (int) 43 FROM %s"),
+                   row(1, "one", 42, 43),
+                   row(2, "two", 42, 43),
+                   row(3, "three", 42, 43));
+    }
+
+    @Test
+    public void testSelectUDTLiteral() throws Throwable
+    {
+        String type = createType("CREATE TYPE %s(a int, b text)");
+        createTable("CREATE TABLE %s (k int PRIMARY KEY, v " + type + ")");
+
+        execute("INSERT INTO %s(k, v) VALUES (?, ?)", 0, userType("a", 3, "b", "foo"));
+
+        assertInvalidMessage("Cannot infer type for term", "SELECT k, v, { a: 4, b: 'bar'} FROM %s");
+
+        assertRows(execute("SELECT k, v, (" + type + "){ a: 4, b: 'bar'} FROM %s"),
+            row(0, userType("a", 3, "b", "foo"), userType("a", 4, "b", "bar"))
+        );
+    }
+
+    @Test
+    public void testInvalidSelect() throws Throwable
+    {
+        // Creates a table just so we can reference it in the (invalid) SELECT below
+        createTable("CREATE TABLE %s (k int PRIMARY KEY)");
+
+        assertInvalidMessage("Cannot infer type for term", "SELECT ? FROM %s");
+        assertInvalidMessage("Cannot infer type for term", "SELECT k, ? FROM %s");
+
+        assertInvalidMessage("Cannot infer type for term", "SELECT k, null FROM %s");
+    }
+
+    private void assertColumnSpec(ColumnSpecification spec, String expectedName, AbstractType<?> expectedType)
+    {
+        assertEquals(expectedName, spec.name.toString());
+        assertEquals(expectedType, spec.type);
+    }
+
+    @Test
+    public void testSelectPrepared() throws Throwable
+    {
+        createTable("CREATE TABLE %s (pk int, ck int, t text, PRIMARY KEY (pk, ck) )");
+        execute("INSERT INTO %s (pk, ck, t) VALUES (1, 1, 'one')");
+        execute("INSERT INTO %s (pk, ck, t) VALUES (1, 2, 'two')");
+        execute("INSERT INTO %s (pk, ck, t) VALUES (1, 3, 'three')");
+
+        String query = "SELECT (int)?, (decimal):adecimal, (text)?, (tuple<int,text>):atuple, pk, ck, t FROM %s WHERE pk = ?";
+        ResultMessage.Prepared prepared = prepare(query);
+
+        List<ColumnSpecification> boundNames = prepared.metadata.names;
+
+        // 5 bound variables
+        assertEquals(5, boundNames.size());
+        assertColumnSpec(boundNames.get(0), "[selection]", Int32Type.instance);
+        assertColumnSpec(boundNames.get(1), "adecimal", DecimalType.instance);
+        assertColumnSpec(boundNames.get(2), "[selection]", UTF8Type.instance);
+        assertColumnSpec(boundNames.get(3), "atuple", TypeParser.parse("TupleType(Int32Type,UTF8Type)"));
+        assertColumnSpec(boundNames.get(4), "pk", Int32Type.instance);
+
+
+        List<ColumnSpecification> resultNames = prepared.resultMetadata.names;
+
+        // 7 result "columns"
+        assertEquals(7, resultNames.size());
+        assertColumnSpec(resultNames.get(0), "(int)?", Int32Type.instance);
+        assertColumnSpec(resultNames.get(1), "(decimal)?", DecimalType.instance);
+        assertColumnSpec(resultNames.get(2), "(text)?", UTF8Type.instance);
+        assertColumnSpec(resultNames.get(3), "(tuple<int, text>)?", TypeParser.parse("TupleType(Int32Type,UTF8Type)"));
+        assertColumnSpec(resultNames.get(4), "pk", Int32Type.instance);
+        assertColumnSpec(resultNames.get(5), "ck", Int32Type.instance);
+        assertColumnSpec(resultNames.get(6), "t", UTF8Type.instance);
+
+        assertRows(execute(query, 88, BigDecimal.TEN, "foo bar baz", tuple(42, "ursus"), 1),
+                   row(88, BigDecimal.TEN, "foo bar baz", tuple(42, "ursus"),
+                       1, 1, "one"),
+                   row(88, BigDecimal.TEN, "foo bar baz", tuple(42, "ursus"),
+                       1, 2, "two"),
+                   row(88, BigDecimal.TEN, "foo bar baz", tuple(42, "ursus"),
+                       1, 3, "three"));
+    }
+
+    @Test
+    public void testConstantFunctionArgs() throws Throwable
+    {
+        String fInt = createFunction(KEYSPACE,
+                                     "int,int",
+                                     "CREATE FUNCTION %s (val1 int, val2 int) " +
+                                     "CALLED ON NULL INPUT " +
+                                     "RETURNS int " +
+                                     "LANGUAGE java\n" +
+                                     "AS 'return Math.max(val1, val2);';");
+        String fFloat = createFunction(KEYSPACE,
+                                       "float,float",
+                                       "CREATE FUNCTION %s (val1 float, val2 float) " +
+                                       "CALLED ON NULL INPUT " +
+                                       "RETURNS float " +
+                                       "LANGUAGE java\n" +
+                                       "AS 'return Math.max(val1, val2);';");
+        String fText = createFunction(KEYSPACE,
+                                      "text,text",
+                                      "CREATE FUNCTION %s (val1 text, val2 text) " +
+                                      "CALLED ON NULL INPUT " +
+                                      "RETURNS text " +
+                                      "LANGUAGE java\n" +
+                                      "AS 'return val2;';");
+        String fAscii = createFunction(KEYSPACE,
+                                       "ascii,ascii",
+                                       "CREATE FUNCTION %s (val1 ascii, val2 ascii) " +
+                                       "CALLED ON NULL INPUT " +
+                                       "RETURNS ascii " +
+                                       "LANGUAGE java\n" +
+                                       "AS 'return val2;';");
+        String fTimeuuid = createFunction(KEYSPACE,
+                                          "timeuuid,timeuuid",
+                                          "CREATE FUNCTION %s (val1 timeuuid, val2 timeuuid) " +
+                                          "CALLED ON NULL INPUT " +
+                                          "RETURNS timeuuid " +
+                                          "LANGUAGE java\n" +
+                                          "AS 'return val2;';");
+
+        createTable("CREATE TABLE %s (pk int PRIMARY KEY, valInt int, valFloat float, valText text, valAscii ascii, valTimeuuid timeuuid)");
+        execute("INSERT INTO %s (pk, valInt, valFloat, valText, valAscii, valTimeuuid) " +
+                "VALUES (1, 10, 10.0, '100', '100', 2deb23e0-96b5-11e5-b26d-a939dd1405a3)");
+
+        assertRows(execute("SELECT pk, " + fInt + "(valInt, 100) FROM %s"),
+                   row(1, 100));
+        assertRows(execute("SELECT pk, " + fInt + "(valInt, (int)100) FROM %s"),
+                   row(1, 100));
+        assertInvalidMessage("Type error: (bigint)100 cannot be passed as argument 1 of function",
+                             "SELECT pk, " + fInt + "(valInt, (bigint)100) FROM %s");
+        assertRows(execute("SELECT pk, " + fFloat + "(valFloat, (float)100.00) FROM %s"),
+                   row(1, 100f));
+        assertRows(execute("SELECT pk, " + fText + "(valText, 'foo') FROM %s"),
+                   row(1, "foo"));
+        assertRows(execute("SELECT pk, " + fAscii + "(valAscii, (ascii)'foo') FROM %s"),
+                   row(1, "foo"));
+        assertRows(execute("SELECT pk, " + fTimeuuid + "(valTimeuuid, (timeuuid)34617f80-96b5-11e5-b26d-a939dd1405a3) FROM %s"),
+                   row(1, UUID.fromString("34617f80-96b5-11e5-b26d-a939dd1405a3")));
+
+        // ambiguous
+
+        String fAmbiguousFunc1 = createFunction(KEYSPACE,
+                                                "int,bigint",
+                                                "CREATE FUNCTION %s (val1 int, val2 bigint) " +
+                                                "CALLED ON NULL INPUT " +
+                                                "RETURNS bigint " +
+                                                "LANGUAGE java\n" +
+                                                "AS 'return Math.max((long)val1, val2);';");
+        assertRows(execute("SELECT pk, " + fAmbiguousFunc1 + "(valInt, 100) FROM %s"),
+                   row(1, 100L));
+        createFunctionOverload(fAmbiguousFunc1, "int,int",
+                                                "CREATE FUNCTION %s (val1 int, val2 int) " +
+                                                "CALLED ON NULL INPUT " +
+                                                "RETURNS bigint " +
+                                                "LANGUAGE java\n" +
+                                                "AS 'return (long)Math.max(val1, val2);';");
+        assertInvalidMessage("Ambiguous call to function cql_test_keyspace.function_",
+                             "SELECT pk, " + fAmbiguousFunc1 + "(valInt, 100) FROM %s");
+    }
+
+    @Test
+    public void testPreparedFunctionArgs() throws Throwable
+    {
+        createTable("CREATE TABLE %s (pk int, ck int, t text, i int, PRIMARY KEY (pk, ck) )");
+        execute("INSERT INTO %s (pk, ck, t, i) VALUES (1, 1, 'one', 50)");
+        execute("INSERT INTO %s (pk, ck, t, i) VALUES (1, 2, 'two', 100)");
+        execute("INSERT INTO %s (pk, ck, t, i) VALUES (1, 3, 'three', 150)");
+
+        String fIntMax = createFunction(KEYSPACE,
+                                        "int,int",
+                                        "CREATE FUNCTION %s (val1 int, val2 int) " +
+                                        "CALLED ON NULL INPUT " +
+                                        "RETURNS int " +
+                                        "LANGUAGE java\n" +
+                                        "AS 'return Math.max(val1, val2);';");
+
+        // weak typing
+
+        assertRows(execute("SELECT pk, ck, " + fIntMax + "(i, ?) FROM %s", 0),
+                   row(1, 1, 50),
+                   row(1, 2, 100),
+                   row(1, 3, 150));
+        assertRows(execute("SELECT pk, ck, " + fIntMax + "(i, ?) FROM %s", 100),
+                   row(1, 1, 100),
+                   row(1, 2, 100),
+                   row(1, 3, 150));
+        assertRows(execute("SELECT pk, ck, " + fIntMax + "(i, ?) FROM %s", 200),
+                   row(1, 1, 200),
+                   row(1, 2, 200),
+                   row(1, 3, 200));
+
+        // explicit typing
+
+        assertRows(execute("SELECT pk, ck, " + fIntMax + "(i, (int)?) FROM %s", 0),
+                   row(1, 1, 50),
+                   row(1, 2, 100),
+                   row(1, 3, 150));
+        assertRows(execute("SELECT pk, ck, " + fIntMax + "(i, (int)?) FROM %s", 100),
+                   row(1, 1, 100),
+                   row(1, 2, 100),
+                   row(1, 3, 150));
+        assertRows(execute("SELECT pk, ck, " + fIntMax + "(i, (int)?) FROM %s", 200),
+                   row(1, 1, 200),
+                   row(1, 2, 200),
+                   row(1, 3, 200));
+
+        // weak typing
+
+        assertRows(execute("SELECT pk, ck, " + fIntMax + "(i, ?) FROM %s WHERE pk = " + fIntMax + "(1,1)", 0),
+                   row(1, 1, 50),
+                   row(1, 2, 100),
+                   row(1, 3, 150));
+        assertRows(execute("SELECT pk, ck, " + fIntMax + "(i, ?) FROM %s WHERE pk = " + fIntMax + "(2,1)", 0));
+
+        assertRows(execute("SELECT pk, ck, " + fIntMax + "(i, ?) FROM %s WHERE pk = " + fIntMax + "(?,1)", 0, 1),
+                   row(1, 1, 50),
+                   row(1, 2, 100),
+                   row(1, 3, 150));
+        assertRows(execute("SELECT pk, ck, " + fIntMax + "(i, ?) FROM %s WHERE pk = " + fIntMax + "(?,1)", 0, 2));
+
+        // explicit typing
+
+        assertRows(execute("SELECT pk, ck, " + fIntMax + "(i, (int)?) FROM %s WHERE pk = " + fIntMax + "((int)1,(int)1)", 0),
+                   row(1, 1, 50),
+                   row(1, 2, 100),
+                   row(1, 3, 150));
+        assertRows(execute("SELECT pk, ck, " + fIntMax + "(i, (int)?) FROM %s WHERE pk = " + fIntMax + "((int)2,(int)1)", 0));
+
+        assertRows(execute("SELECT pk, ck, " + fIntMax + "(i, (int)?) FROM %s WHERE pk = " + fIntMax + "((int)?,(int)1)", 0, 1),
+                   row(1, 1, 50),
+                   row(1, 2, 100),
+                   row(1, 3, 150));
+        assertRows(execute("SELECT pk, ck, " + fIntMax + "(i, (int)?) FROM %s WHERE pk = " + fIntMax + "((int)?,(int)1)", 0, 2));
+
+        assertInvalidMessage("Invalid unset value for argument", "SELECT pk, ck, " + fIntMax + "(i, (int)?) FROM %s WHERE pk = " + fIntMax + "((int)1,(int)1)", unset());
+    }
+
+    @Test
+    public void testInsertUpdateDelete() throws Throwable
+    {
+        String fIntMax = createFunction(KEYSPACE,
+                                        "int,int",
+                                        "CREATE FUNCTION %s (val1 int, val2 int) " +
+                                        "CALLED ON NULL INPUT " +
+                                        "RETURNS int " +
+                                        "LANGUAGE java\n" +
+                                        "AS 'return Math.max(val1, val2);';");
+
+        createTable("CREATE TABLE %s (pk int, ck int, t text, i int, PRIMARY KEY (pk, ck) )");
+
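+        // function calls are accepted in SET assignments, WHERE restrictions and INSERT values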
+        execute("UPDATE %s SET i = " + fIntMax + "(100, 200) WHERE pk = 1 AND ck = 1");
+        assertRows(execute("SELECT i FROM %s WHERE pk = 1 AND ck = 1"),
+                   row(200));
+
+        execute("UPDATE %s SET i = " + fIntMax + "(100, 300) WHERE pk = 1 AND ck = " + fIntMax + "(1,2)");
+        assertRows(execute("SELECT i FROM %s WHERE pk = 1 AND ck = 2"),
+                   row(300));
+
+        execute("DELETE FROM %s WHERE pk = 1 AND ck = " + fIntMax + "(1,2)");
+        assertRows(execute("SELECT i FROM %s WHERE pk = 1 AND ck = 2"));
+
+        execute("INSERT INTO %s (pk, ck, i) VALUES (1, " + fIntMax + "(1,2), " + fIntMax + "(100, 300))");
+        assertRows(execute("SELECT i FROM %s WHERE pk = 1 AND ck = 2"),
+                   row(300));
+    }
+}
diff --git a/test/unit/org/apache/cassandra/cql3/statements/PropertyDefinitionsTest.java b/test/unit/org/apache/cassandra/cql3/statements/PropertyDefinitionsTest.java
new file mode 100644
index 0000000..18487f7
--- /dev/null
+++ b/test/unit/org/apache/cassandra/cql3/statements/PropertyDefinitionsTest.java
@@ -0,0 +1,81 @@
+/*
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+package org.apache.cassandra.cql3.statements;
+
+import org.junit.After;
+import org.junit.Test;
+import org.junit.Before;
+
+import static org.junit.Assert.assertEquals;
+
+public class PropertyDefinitionsTest
+{
+    
+    PropertyDefinitions pd;
+    
+    @Before
+    public void setUp()
+    {
+        pd = new PropertyDefinitions();
+    }
+    
+    @After
+    public void clear()
+    {
+        pd = null;
+    }
+    
+
+    @Test
+    public void testGetBooleanExistent()
+    {
+        String key = "one";
+        pd.addProperty(key, "1");
+        assertEquals(Boolean.TRUE, pd.getBoolean(key, null));
+        
+        key = "TRUE";
+        pd.addProperty(key, "TrUe");
+        assertEquals(Boolean.TRUE, pd.getBoolean(key, null));
+        
+        key = "YES";
+        pd.addProperty(key, "YeS");
+        assertEquals(Boolean.TRUE, pd.getBoolean(key, null));
+   
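+        // values containing whitespace are not parsed as true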
+        key = "BAD_ONE";
+        pd.addProperty(key, " 1");
+        assertEquals(Boolean.FALSE, pd.getBoolean(key, null));
+        
+        key = "BAD_TRUE";
+        pd.addProperty(key, "true ");
+        assertEquals(Boolean.FALSE, pd.getBoolean(key, null));
+        
+        key = "BAD_YES";
+        pd.addProperty(key, "ye s");
+        assertEquals(Boolean.FALSE, pd.getBoolean(key, null));
+    }
+    
+    @Test
+    public void testGetBooleanNonexistent()
+    {
+        assertEquals(Boolean.FALSE, pd.getBoolean("nonexistant", Boolean.FALSE));
+        assertEquals(Boolean.TRUE, pd.getBoolean("nonexistant", Boolean.TRUE));
+    }
+    
+}
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/CountersTest.java b/test/unit/org/apache/cassandra/cql3/validation/entities/CountersTest.java
index c9939c8..33b4a4f 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/entities/CountersTest.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/CountersTest.java
@@ -116,7 +116,7 @@
     @Test
     public void testCounterFiltering() throws Throwable
     {
-        for (String compactStorageClause: new String[] {"", " WITH COMPACT STORAGE"})
+        for (String compactStorageClause : new String[]{ "", " WITH COMPACT STORAGE" })
         {
             createTable("CREATE TABLE %s (k int PRIMARY KEY, a counter)" + compactStorageClause);
 
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/FrozenCollectionsTest.java b/test/unit/org/apache/cassandra/cql3/validation/entities/FrozenCollectionsTest.java
index 9df8ea0..4c52ed2 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/entities/FrozenCollectionsTest.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/FrozenCollectionsTest.java
@@ -41,10 +41,12 @@
 public class FrozenCollectionsTest extends CQLTester
 {
     @BeforeClass
-    public static void setUpClass()
+    public static void setUpClass()     // overrides CQLTester.setUpClass()
     {
         // Selecting partitioner for a table is not exposed on CREATE TABLE.
         StorageService.instance.setPartitionerUnsafe(ByteOrderedPartitioner.instance);
+
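+        // this override replaces CQLTester.setUpClass(), so the server has to be prepared explicitly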
+        prepareServer();
     }
 
     @Test
@@ -630,17 +632,18 @@
                              "SELECT * FROM %s WHERE c CONTAINS KEY ?", 1);
 
         // normal indexes on frozen collections don't support CONTAINS or CONTAINS KEY
-        assertInvalidMessage("Cannot restrict clustering columns by a CONTAINS relation without a secondary index",
+        assertInvalidMessage("Clustering columns can only be restricted with CONTAINS with a secondary index or filtering",
                              "SELECT * FROM %s WHERE b CONTAINS ?", 1);
 
-        assertInvalidMessage("Cannot restrict clustering columns by a CONTAINS relation without a secondary index",
-                             "SELECT * FROM %s WHERE b CONTAINS ? ALLOW FILTERING", 1);
+        assertRows(execute("SELECT * FROM %s WHERE b CONTAINS ? ALLOW FILTERING", 1),
+                   row(0, list(1, 2, 3), set(1, 2, 3), map(1, "a")),
+                   row(1, list(1, 2, 3), set(4, 5, 6), map(2, "b")));
 
         assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
                              "SELECT * FROM %s WHERE d CONTAINS KEY ?", 1);
 
-        assertInvalidMessage("Cannot restrict clustering columns by a CONTAINS relation without a secondary index",
-                             "SELECT * FROM %s WHERE b CONTAINS ? AND d CONTAINS KEY ? ALLOW FILTERING", 1, 1);
+        assertRows(execute("SELECT * FROM %s WHERE b CONTAINS ? AND d CONTAINS KEY ? ALLOW FILTERING", 1, 1),
+                   row(0, list(1, 2, 3), set(1, 2, 3), map(1, "a")));
 
         // index lookup on b
         assertRows(execute("SELECT * FROM %s WHERE b=?", list(1, 2, 3)),
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/SecondaryIndexOnStaticColumnTest.java b/test/unit/org/apache/cassandra/cql3/validation/entities/SecondaryIndexOnStaticColumnTest.java
new file mode 100644
index 0000000..f69d8d5
--- /dev/null
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/SecondaryIndexOnStaticColumnTest.java
@@ -0,0 +1,217 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.cql3.validation.entities;
+
+import org.junit.Test;
+import org.apache.cassandra.cql3.CQLTester;
+
+public class SecondaryIndexOnStaticColumnTest extends CQLTester
+{
+    @Test
+    public void testSimpleStaticColumn() throws Throwable
+    {
+        createTable("CREATE TABLE %s (id int, name text, age int static, PRIMARY KEY (id, name))");
+
+        createIndex("CREATE INDEX static_age on %s(age)");
+        int id1 = 1, id2 = 2, age1 = 24, age2 = 32;
+        String name1A = "Taylor", name1B = "Swift",
+               name2 = "Jamie";
+
+        execute("INSERT INTO %s (id, name, age) VALUES (?, ?, ?)", id1, name1A, age1);
+        execute("INSERT INTO %s (id, name, age) VALUES (?, ?, ?)", id1, name1B, age1);
+        execute("INSERT INTO %s (id, name, age) VALUES (?, ?, ?)", id2, name2, age2);
+
+        assertRows(execute("SELECT id, name, age FROM %s WHERE age=?", age1),
+              row(id1, name1B, age1), row(id1, name1A, age1));
+        assertRows(execute("SELECT id, name, age FROM %s WHERE age=?", age2),
+                row(id2, name2, age2));
+
+        // Update the rows. Validate that updated values will be reflected in the index.
+        int newAge1 = 40;
+        execute("UPDATE %s SET age = ? WHERE id = ?", newAge1, id1);
+        assertEmpty(execute("SELECT id, name, age FROM %s WHERE age=?", age1));
+        assertRows(execute("SELECT id, name, age FROM %s WHERE age=?", newAge1),
+                row(id1, name1B, newAge1), row(id1, name1A, newAge1));
+        execute("DELETE FROM %s WHERE id = ?", id2);
+        assertEmpty(execute("SELECT id, name, age FROM %s WHERE age=?", age2));
+    }
+
+    @Test
+    public void testIndexOnCompoundRowKey() throws Throwable
+    {
+        createTable("CREATE TABLE %s (interval text, seq int, id int, severity int static, PRIMARY KEY ((interval, seq), id) ) WITH CLUSTERING ORDER BY (id DESC)");
+
+        execute("CREATE INDEX ON %s (severity)");
+
+        execute("insert into %s (interval, seq, id , severity) values('t',1, 3, 10)");
+        execute("insert into %s (interval, seq, id , severity) values('t',1, 4, 10)");
+        execute("insert into %s (interval, seq, id , severity) values('t',2, 3, 10)");
+        execute("insert into %s (interval, seq, id , severity) values('t',2, 4, 10)");
+        execute("insert into %s (interval, seq, id , severity) values('m',1, 3, 11)");
+        execute("insert into %s (interval, seq, id , severity) values('m',1, 4, 11)");
+        execute("insert into %s (interval, seq, id , severity) values('m',2, 3, 11)");
+        execute("insert into %s (interval, seq, id , severity) values('m',2, 4, 11)");
+
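+        // id DESC clustering order means the row with the larger id comes back first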
+        assertRows(execute("select * from %s where severity = 10 and interval = 't' and seq = 1"),
+                   row("t", 1, 4, 10), row("t", 1, 3, 10));
+    }
+
+    @Test
+    public void testIndexOnCollections() throws Throwable
+    {
+        createTable("CREATE TABLE %s (k int, v int, l list<int> static, s set<text> static, m map<text, int> static, PRIMARY KEY (k, v))");
+
+        createIndex("CREATE INDEX ON %s (l)");
+        createIndex("CREATE INDEX ON %s (s)");
+        createIndex("CREATE INDEX ON %s (m)");
+        createIndex("CREATE INDEX ON %s (keys(m))");
+
+        execute("INSERT INTO %s (k, v, l, s, m) VALUES (0, 0, [1, 2],    {'a'},      {'a' : 1, 'b' : 2})");
+        execute("INSERT INTO %s (k, v)          VALUES (0, 1)                                  ");
+        execute("INSERT INTO %s (k, v, l, s, m) VALUES (1, 0, [4, 5],    {'d'},      {'b' : 1, 'c' : 4})");
+
+        // lists
+        assertRows(execute("SELECT k, v FROM %s WHERE l CONTAINS 1"), row(0, 0), row(0, 1));
+        assertEmpty(execute("SELECT k, v FROM %s WHERE k = 1 AND l CONTAINS 1"));
+        assertRows(execute("SELECT k, v FROM %s WHERE l CONTAINS 4"), row(1, 0));
+        assertEmpty(execute("SELECT k, v FROM %s WHERE l CONTAINS 6"));
+
+        // update lists
+        execute("UPDATE %s SET l = l + [3] WHERE k = ?", 0);
+        assertRows(execute("SELECT k, v FROM %s WHERE l CONTAINS 3"), row(0, 0), row(0, 1));
+
+        // sets
+        assertRows(execute("SELECT k, v FROM %s WHERE s CONTAINS 'a'"), row(0, 0), row(0, 1));
+        assertRows(execute("SELECT k, v FROM %s WHERE k = 0 AND s CONTAINS 'a'"), row(0, 0), row(0, 1));
+        assertRows(execute("SELECT k, v FROM %s WHERE s CONTAINS 'd'"), row(1, 0));
+        assertEmpty(execute("SELECT k, v FROM %s  WHERE s CONTAINS 'e'"));
+
+        // update sets
+        execute("UPDATE %s SET s = s + {'b'} WHERE k = ?", 0);
+        assertRows(execute("SELECT k, v FROM %s WHERE s CONTAINS 'b'"), row(0, 0), row(0, 1));
+        execute("UPDATE %s SET s = s - {'a'} WHERE k = ?", 0);
+        assertEmpty(execute("SELECT k, v FROM %s WHERE s CONTAINS 'a'"));
+
+        // maps
+        assertRows(execute("SELECT k, v FROM %s WHERE m CONTAINS 1"), row(1, 0), row(0, 0), row(0, 1));
+        assertRows(execute("SELECT k, v FROM %s WHERE k = 0 AND m CONTAINS 1"), row(0, 0), row(0, 1));
+        assertRows(execute("SELECT k, v FROM %s WHERE m CONTAINS 4"), row(1, 0));
+        assertEmpty(execute("SELECT k, v FROM %s  WHERE m CONTAINS 5"));
+
+        assertRows(execute("SELECT k, v FROM %s WHERE m CONTAINS KEY 'b'"), row(1, 0), row(0, 0), row(0, 1));
+        assertRows(execute("SELECT k, v FROM %s WHERE k = 0 AND m CONTAINS KEY 'b'"), row(0, 0), row(0, 1));
+        assertRows(execute("SELECT k, v FROM %s WHERE m CONTAINS KEY 'c'"), row(1, 0));
+        assertEmpty(execute("SELECT k, v FROM %s  WHERE m CONTAINS KEY 'd'"));
+
+        // update maps.
+        execute("UPDATE %s SET m['c'] = 5 WHERE k = 0");
+        assertRows(execute("SELECT k, v FROM %s WHERE m CONTAINS 5"), row(0, 0), row(0, 1));
+        assertRows(execute("SELECT k, v FROM %s WHERE m CONTAINS KEY 'c'"), row(1, 0), row(0, 0), row(0, 1));
+        execute("DELETE m['a'] FROM %s WHERE k = 0");
+        assertEmpty(execute("SELECT k, v FROM %s  WHERE m CONTAINS KEY 'a'"));
+    }
+
+    @Test
+    public void testIndexOnFrozenCollections() throws Throwable
+    {
+        createTable("CREATE TABLE %s (k int, v int, l frozen<list<int>> static, s frozen<set<text>> static, m frozen<map<text, int>> static, PRIMARY KEY (k, v))");
+
+        createIndex("CREATE INDEX ON %s (FULL(l))");
+        createIndex("CREATE INDEX ON %s (FULL(s))");
+        createIndex("CREATE INDEX ON %s (FULL(m))");
+
+        execute("INSERT INTO %s (k, v, l, s, m) VALUES (0, 0, [1, 2],    {'a'},      {'a' : 1, 'b' : 2})");
+        execute("INSERT INTO %s (k, v)          VALUES (0, 1)                                  ");
+        execute("INSERT INTO %s (k, v, l, s, m) VALUES (1, 0, [4, 5],    {'d'},      {'b' : 1, 'c' : 4})");
+        execute("UPDATE %s SET l=[3], s={'3'}, m={'3': 3} WHERE k=3" );
+
+        // lists
+        assertRows(execute("SELECT k, v FROM %s WHERE l = [1, 2]"), row(0, 0), row(0, 1));
+        assertEmpty(execute("SELECT k, v FROM %s WHERE k = 1 AND l = [1, 2]"));
+        assertEmpty(execute("SELECT k, v FROM %s WHERE l = [4]"));
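+        // k=3 has only static values, so it is returned with a null clustering column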
+        assertRows(execute("SELECT k, v FROM %s WHERE l = [3]"), row(3, null));
+
+        // update lists
+        execute("UPDATE %s SET l = [1, 2, 3] WHERE k = ?", 0);
+        assertEmpty(execute("SELECT k, v FROM %s WHERE l = [1, 2]"));
+        assertRows(execute("SELECT k, v FROM %s WHERE l = [1, 2, 3]"), row(0, 0), row(0, 1));
+
+        // sets
+        assertRows(execute("SELECT k, v FROM %s WHERE s = {'a'}"), row(0, 0), row(0, 1));
+        assertEmpty(execute("SELECT k, v FROM %s WHERE k = 1 AND s = {'a'}"));
+        assertEmpty(execute("SELECT k, v FROM %s WHERE s = {'b'}"));
+        assertRows(execute("SELECT k, v FROM %s WHERE s = {'3'}"), row(3, null));
+
+        // update sets
+        execute("UPDATE %s SET s = {'a', 'b'} WHERE k = ?", 0);
+        assertEmpty(execute("SELECT k, v FROM %s WHERE s = {'a'}"));
+        assertRows(execute("SELECT k, v FROM %s WHERE s = {'a', 'b'}"), row(0, 0), row(0, 1));
+
+        // maps
+        assertRows(execute("SELECT k, v FROM %s WHERE m = {'a' : 1, 'b' : 2}"), row(0, 0), row(0, 1));
+        assertEmpty(execute("SELECT k, v FROM %s WHERE k = 1 AND m = {'a' : 1, 'b' : 2}"));
+        assertEmpty(execute("SELECT k, v FROM %s WHERE m = {'a' : 1, 'b' : 3}"));
+        assertEmpty(execute("SELECT k, v FROM %s WHERE m = {'a' : 1, 'c' : 2}"));
+        assertRows(execute("SELECT k, v FROM %s WHERE m = {'3': 3}"), row(3, null));
+
+        // update maps.
+        execute("UPDATE %s SET m = {'a': 2, 'b': 3} WHERE k = ?", 0);
+        assertEmpty(execute("SELECT k, v FROM %s WHERE m = {'a': 1, 'b': 2}"));
+        assertRows(execute("SELECT k, v FROM %s WHERE m = {'a': 2, 'b': 3}"), row(0, 0), row(0, 1));
+    }
+
+    @Test
+    public void testStaticIndexAndNonStaticIndex() throws Throwable
+    {
+        createTable("CREATE TABLE %s (id int, company text, age int static, salary int, PRIMARY KEY(id, company))");
+        createIndex("CREATE INDEX on %s(age)");
+        createIndex("CREATE INDEX on %s(salary)");
+
+        String company1 = "company1", company2 = "company2";
+
+        execute("INSERT INTO %s(id, company, age, salary) VALUES(?, ?, ?, ?)", 1, company1, 20, 1000);
+        execute("INSERT INTO %s(id, company,      salary) VALUES(?, ?,    ?)", 1, company2,     2000);
+        execute("INSERT INTO %s(id, company, age, salary) VALUES(?, ?, ?, ?)", 2, company1, 40, 2000);
+
+        assertRows(execute("SELECT id, company, age, salary FROM %s WHERE age = 20 AND salary = 2000 ALLOW FILTERING"),
+                   row(1, company2, 20, 2000));
+    }
+
+    @Test
+    public void testIndexOnUDT() throws Throwable
+    {
+        String typeName = createType("CREATE TYPE %s (street text, city text)");
+
+        createTable(String.format(
+            "CREATE TABLE %%s (id int, company text, home frozen<%s> static, price int, PRIMARY KEY(id, company))",
+            typeName));
+        createIndex("CREATE INDEX on %s(home)");
+
+        String addressString = "{street: 'Centre', city: 'C'}";
+        String companyName = "Random";
+
+        execute("INSERT INTO %s(id, company, home, price) "
+                + "VALUES(1, '" + companyName + "', " + addressString + ", 10000)");
+        assertRows(execute("SELECT id, company FROM %s WHERE home = " + addressString), row(1, companyName));
+        String newAddressString = "{street: 'Fifth', city: 'P'}";
+
+        execute("UPDATE %s SET home = " + newAddressString + " WHERE id = 1");
+        assertEmpty(execute("SELECT id, company FROM %s WHERE home = " + addressString));
+        assertRows(execute("SELECT id, company FROM %s WHERE home = " + newAddressString), row(1, companyName));
+    }
+}
\ No newline at end of file
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/SecondaryIndexTest.java b/test/unit/org/apache/cassandra/cql3/validation/entities/SecondaryIndexTest.java
index f9802d7..feea656 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/entities/SecondaryIndexTest.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/SecondaryIndexTest.java
@@ -44,6 +44,7 @@
 import org.apache.cassandra.index.SecondaryIndexManager;
 import org.apache.cassandra.index.StubIndex;
 import org.apache.cassandra.index.internal.CustomCassandraIndex;
+import org.apache.cassandra.index.sasi.SASIIndex;
 import org.apache.cassandra.schema.IndexMetadata;
 import org.apache.cassandra.service.ClientState;
 import org.apache.cassandra.transport.messages.ResultMessage;
@@ -685,6 +686,65 @@
                    "APPLY BATCH", map);
     }
 
+    @Test
+    public void prepareStatementsWithLIKEClauses() throws Throwable
+    {
+        createTable("CREATE TABLE %s (a int, c1 text, c2 text, v1 text, v2 text, v3 int, PRIMARY KEY (a, c1, c2))");
+        createIndex(String.format("CREATE CUSTOM INDEX c1_idx on %%s(c1) USING '%s' WITH OPTIONS = {'mode' : 'PREFIX'}",
+                                  SASIIndex.class.getName()));
+        createIndex(String.format("CREATE CUSTOM INDEX c2_idx on %%s(c2) USING '%s' WITH OPTIONS = {'mode' : 'CONTAINS'}",
+                                  SASIIndex.class.getName()));
+        createIndex(String.format("CREATE CUSTOM INDEX v1_idx on %%s(v1) USING '%s' WITH OPTIONS = {'mode' : 'PREFIX'}",
+                                  SASIIndex.class.getName()));
+        createIndex(String.format("CREATE CUSTOM INDEX v2_idx on %%s(v2) USING '%s' WITH OPTIONS = {'mode' : 'CONTAINS'}",
+                                  SASIIndex.class.getName()));
+        createIndex(String.format("CREATE CUSTOM INDEX v3_idx on %%s(v3) USING '%s'", SASIIndex.class.getName()));
+
+        forcePreparedValues();
+        // prefix mode indexes support prefix/contains/matches
+        assertInvalidMessage("c1 LIKE '%<term>' abc is only supported on properly indexed columns",
+                             "SELECT * FROM %s WHERE c1 LIKE ?",
+                             "%abc");
+        assertInvalidMessage("c1 LIKE '%<term>%' abc is only supported on properly indexed columns",
+                             "SELECT * FROM %s WHERE c1 LIKE ?",
+                             "%abc%");
+        execute("SELECT * FROM %s WHERE c1 LIKE ?", "abc%");
+        execute("SELECT * FROM %s WHERE c1 LIKE ?", "abc");
+        assertInvalidMessage("v1 LIKE '%<term>' abc is only supported on properly indexed columns",
+                             "SELECT * FROM %s WHERE v1 LIKE ?",
+                             "%abc");
+        assertInvalidMessage("v1 LIKE '%<term>%' abc is only supported on properly indexed columns",
+                             "SELECT * FROM %s WHERE v1 LIKE ?",
+                             "%abc%");
+        execute("SELECT * FROM %s WHERE v1 LIKE ?", "abc%");
+        execute("SELECT * FROM %s WHERE v1 LIKE ?", "abc");
+
+        // contains mode indexes support prefix/suffix/contains/matches
+        execute("SELECT * FROM %s WHERE c2 LIKE ?", "abc%");
+        execute("SELECT * FROM %s WHERE c2 LIKE ?", "%abc");
+        execute("SELECT * FROM %s WHERE c2 LIKE ?", "%abc%");
+        execute("SELECT * FROM %s WHERE c2 LIKE ?", "abc");
+        execute("SELECT * FROM %s WHERE v2 LIKE ?", "abc%");
+        execute("SELECT * FROM %s WHERE v2 LIKE ?", "%abc");
+        execute("SELECT * FROM %s WHERE v2 LIKE ?", "%abc%");
+        execute("SELECT * FROM %s WHERE v2 LIKE ?", "abc");
+
+        // LIKE is not supported on indexes of non-literal values
+        // this is rejected before binding, so the value isn't available in the error message
+        assertInvalidMessage("LIKE restriction is only supported on properly indexed columns. v3 LIKE ? is not valid",
+                             "SELECT * FROM %s WHERE v3 LIKE ?",
+                             "%abc");
+        assertInvalidMessage("LIKE restriction is only supported on properly indexed columns. v3 LIKE ? is not valid",
+                             "SELECT * FROM %s WHERE v3 LIKE ?",
+                             "%abc%");
+        assertInvalidMessage("LIKE restriction is only supported on properly indexed columns. v3 LIKE ? is not valid",
+                             "SELECT * FROM %s WHERE v3 LIKE ?",
+                             "%abc%");
+        assertInvalidMessage("LIKE restriction is only supported on properly indexed columns. v3 LIKE ? is not valid",
+                             "SELECT * FROM %s WHERE v3 LIKE ?",
+                             "abc");
+    }
+
     public void failInsert(String insertCQL, Object...args) throws Throwable
     {
         try
@@ -754,9 +814,6 @@
         assertInvalid("CREATE INDEX ON %s (a)");
         assertInvalid("CREATE INDEX ON %s (b)");
         assertInvalid("CREATE INDEX ON %s (c)");
-
-        createTable("CREATE TABLE %s (a int, b int, c int static , PRIMARY KEY (a, b))");
-        assertInvalid("CREATE INDEX ON %s (c)");
     }
 
     @Test
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/StaticColumnsTest.java b/test/unit/org/apache/cassandra/cql3/validation/entities/StaticColumnsTest.java
index 75cbcc7..ecffbf0 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/entities/StaticColumnsTest.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/StaticColumnsTest.java
@@ -37,9 +37,16 @@
     @Test
     public void testStaticColumns() throws Throwable
     {
+        testStaticColumns(false);
+        testStaticColumns(true);
+    }
+
+    private void testStaticColumns(boolean forceFlush) throws Throwable
+    {
         createTable("CREATE TABLE %s ( k int, p int, s int static, v int, PRIMARY KEY (k, p))");
 
         execute("INSERT INTO %s(k, s) VALUES (0, 42)");
+        flush(forceFlush);
 
         assertRows(execute("SELECT * FROM %s"), row(0, null, 42, null));
 
@@ -51,6 +58,7 @@
 
         execute("INSERT INTO %s (k, p, s, v) VALUES (0, 0, 12, 0)");
         execute("INSERT INTO %s (k, p, s, v) VALUES (0, 1, 24, 1)");
+        flush(forceFlush);
 
         // Check that the static columns are indeed "static"
         assertRows(execute("SELECT * FROM %s"), row(0, 0, 24, 0), row(0, 1, 24, 1));
@@ -81,10 +89,12 @@
 
         // Check that deleting a row doesn't implicitly delete statics
         execute("DELETE FROM %s WHERE k=0 AND p=0");
+        flush(forceFlush);
         assertRows(execute("SELECT * FROM %s"),row(0, 1, 24, 1));
 
         // But that explicitly deleting the static column does remove it
         execute("DELETE s FROM %s WHERE k=0");
+        flush(forceFlush);
         assertRows(execute("SELECT * FROM %s"), row(0, 1, null, 1));
 
         // Check we can add a static column ...
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/UFAuthTest.java b/test/unit/org/apache/cassandra/cql3/validation/entities/UFAuthTest.java
index 6993bec..d085a9d 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/entities/UFAuthTest.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/UFAuthTest.java
@@ -29,14 +29,11 @@
 import org.apache.cassandra.auth.*;
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.config.Schema;
-import org.apache.cassandra.cql3.Attributes;
-import org.apache.cassandra.cql3.CQLStatement;
-import org.apache.cassandra.cql3.QueryProcessor;
+import org.apache.cassandra.cql3.*;
 import org.apache.cassandra.cql3.functions.Function;
 import org.apache.cassandra.cql3.functions.FunctionName;
 import org.apache.cassandra.cql3.statements.BatchStatement;
 import org.apache.cassandra.cql3.statements.ModificationStatement;
-import org.apache.cassandra.cql3.CQLTester;
 import org.apache.cassandra.exceptions.*;
 import org.apache.cassandra.service.ClientState;
 import org.apache.cassandra.utils.Pair;
@@ -626,99 +623,4 @@
     {
         return String.format("%s(%s)", functionName, Joiner.on(",").join(args));
     }
-
-    static class StubAuthorizer implements IAuthorizer
-    {
-        Map<Pair<String, IResource>, Set<Permission>> userPermissions = new HashMap<>();
-
-        private void clear()
-        {
-            userPermissions.clear();
-        }
-
-        public Set<Permission> authorize(AuthenticatedUser user, IResource resource)
-        {
-            Pair<String, IResource> key = Pair.create(user.getName(), resource);
-            Set<Permission> perms = userPermissions.get(key);
-            return perms != null ? perms : Collections.<Permission>emptySet();
-        }
-
-        public void grant(AuthenticatedUser performer,
-                          Set<Permission> permissions,
-                          IResource resource,
-                          RoleResource grantee) throws RequestValidationException, RequestExecutionException
-        {
-            Pair<String, IResource> key = Pair.create(grantee.getRoleName(), resource);
-            Set<Permission> perms = userPermissions.get(key);
-            if (null == perms)
-            {
-                perms = new HashSet<>();
-                userPermissions.put(key, perms);
-            }
-            perms.addAll(permissions);
-        }
-
-        public void revoke(AuthenticatedUser performer,
-                           Set<Permission> permissions,
-                           IResource resource,
-                           RoleResource revokee) throws RequestValidationException, RequestExecutionException
-        {
-            Pair<String, IResource> key = Pair.create(revokee.getRoleName(), resource);
-            Set<Permission> perms = userPermissions.get(key);
-            if (null != perms)
-                perms.removeAll(permissions);
-            if (perms.isEmpty())
-                userPermissions.remove(key);
-        }
-
-        public Set<PermissionDetails> list(AuthenticatedUser performer,
-                                           Set<Permission> permissions,
-                                           IResource resource,
-                                           RoleResource grantee) throws RequestValidationException, RequestExecutionException
-        {
-            Pair<String, IResource> key = Pair.create(grantee.getRoleName(), resource);
-            Set<Permission> perms = userPermissions.get(key);
-            if (perms == null)
-                return Collections.emptySet();
-
-
-            Set<PermissionDetails> details = new HashSet<>();
-            for (Permission permission : perms)
-            {
-                if (permissions.contains(permission))
-                    details.add(new PermissionDetails(grantee.getRoleName(), resource, permission));
-            }
-            return details;
-        }
-
-        public void revokeAllFrom(RoleResource revokee)
-        {
-            for (Pair<String, IResource> key : userPermissions.keySet())
-                if (key.left.equals(revokee.getRoleName()))
-                    userPermissions.remove(key);
-        }
-
-        public void revokeAllOn(IResource droppedResource)
-        {
-            for (Pair<String, IResource> key : userPermissions.keySet())
-                if (key.right.equals(droppedResource))
-                    userPermissions.remove(key);
-
-        }
-
-        public Set<? extends IResource> protectedResources()
-        {
-            return Collections.emptySet();
-        }
-
-        public void validateConfiguration() throws ConfigurationException
-        {
-
-        }
-
-        public void setup()
-        {
-
-        }
-    }
 }
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/UFTest.java b/test/unit/org/apache/cassandra/cql3/validation/entities/UFTest.java
index cc0e806..e7c46a5 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/entities/UFTest.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/UFTest.java
@@ -18,6 +18,7 @@
 package org.apache.cassandra.cql3.validation.entities;
 
 import java.nio.ByteBuffer;
+import java.security.AccessControlException;
 import java.util.ArrayList;
 import java.util.Arrays;
 import java.util.Date;
@@ -27,21 +28,22 @@
 import java.util.TreeMap;
 import java.util.TreeSet;
 import java.util.UUID;
-import java.security.AccessControlException;
 
+import com.google.common.reflect.TypeToken;
 import org.junit.Assert;
 import org.junit.Test;
 
 import com.datastax.driver.core.*;
 import com.datastax.driver.core.exceptions.InvalidQueryException;
+import org.apache.cassandra.config.Config;
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.config.Schema;
 import org.apache.cassandra.cql3.CQL3Type;
 import org.apache.cassandra.cql3.CQLTester;
 import org.apache.cassandra.cql3.QueryProcessor;
 import org.apache.cassandra.cql3.UntypedResultSet;
-import org.apache.cassandra.config.Config;
 import org.apache.cassandra.cql3.functions.FunctionName;
+import org.apache.cassandra.cql3.functions.JavaBasedUDFunction;
 import org.apache.cassandra.cql3.functions.UDFunction;
 import org.apache.cassandra.cql3.functions.UDHelper;
 import org.apache.cassandra.db.marshal.CollectionType;
@@ -59,6 +61,15 @@
 public class UFTest extends CQLTester
 {
     @Test
+    public void testJavaSourceName()
+    {
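+        // javaSourceName() is expected to render driver TypeTokens as the type names used in generated UDF source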
+        Assert.assertEquals("String", JavaBasedUDFunction.javaSourceName(TypeToken.of(String.class)));
+        Assert.assertEquals("java.util.Map<Integer, String>", JavaBasedUDFunction.javaSourceName(TypeTokens.mapOf(Integer.class, String.class)));
+        Assert.assertEquals("com.datastax.driver.core.UDTValue", JavaBasedUDFunction.javaSourceName(TypeToken.of(UDTValue.class)));
+        Assert.assertEquals("java.util.Set<com.datastax.driver.core.UDTValue>", JavaBasedUDFunction.javaSourceName(TypeTokens.setOf(UDTValue.class)));
+    }
+
+    @Test
     public void testNonExistingOnes() throws Throwable
     {
         assertInvalidThrowMessage("Cannot drop non existing function", InvalidRequestException.class, "DROP FUNCTION " + KEYSPACE + ".func_does_not_exist");
@@ -2485,4 +2496,224 @@
             }
         }
     }
+
+    @Test
+    public void testArgumentGenerics() throws Throwable
+    {
+        createTable("CREATE TABLE %s (key int primary key, sval text, aval ascii, bval blob, empty_int int)");
+
+        String typeName = createType("CREATE TYPE %s (txt text, i int)");
+
+        createFunction(KEYSPACE, "map<text,bigint>,list<text>",
+                       "CREATE FUNCTION IF NOT EXISTS %s(state map<text,bigint>, styles list<text>)\n" +
+                       "  RETURNS NULL ON NULL INPUT\n" +
+                       "  RETURNS map<text,bigint>\n" +
+                       "  LANGUAGE java\n" +
+                       "  AS $$\n" +
+                       "    for (String style : styles) {\n" +
+                       "      if (state.containsKey(style)) {\n" +
+                       "        state.put(style, state.get(style) + 1L);\n" +
+                       "      } else {\n" +
+                       "        state.put(style, 1L);\n" +
+                       "      }\n" +
+                       "    }\n" +
+                       "    return state;\n" +
+                       "  $$");
+
+        createFunction(KEYSPACE, "text",
+                                  "CREATE OR REPLACE FUNCTION %s("                 +
+                                  "  listText list<text>,"                         +
+                                  "  setText set<text>,"                           +
+                                  "  mapTextInt map<text, int>,"                   +
+                                  "  mapListTextSetInt map<frozen<list<text>>, frozen<set<int>>>," +
+                                  "  mapTextTuple map<text, frozen<tuple<int, text>>>," +
+                                  "  mapTextType map<text, frozen<" + typeName + ">>" +
+                                  ") "                                             +
+                                  "CALLED ON NULL INPUT "                          +
+                                  "RETURNS map<frozen<list<text>>, frozen<set<int>>> " +
+                                  "LANGUAGE JAVA\n"                                +
+                                  "AS $$" +
+                                  "     for (String s : listtext) {};" +
+                                  "     for (String s : settext) {};" +
+                                  "     for (String s : maptextint.keySet()) {};" +
+                                  "     for (Integer s : maptextint.values()) {};" +
+                                  "     for (java.util.List<String> l : maplisttextsetint.keySet()) {};" +
+                                  "     for (java.util.Set<Integer> s : maplisttextsetint.values()) {};" +
+                                  "     for (com.datastax.driver.core.TupleValue t : maptexttuple.values()) {};" +
+                                  "     for (com.datastax.driver.core.UDTValue u : maptexttype.values()) {};" +
+                                  "     return maplisttextsetint;" +
+                                  "$$");
+    }
+
+    @Test
+    public void testArgAndReturnTypes() throws Throwable
+    {
+
+        String type = KEYSPACE + '.' + createType("CREATE TYPE %s (txt text, i int)");
+
+        createTable("CREATE TABLE %s (key int primary key, udt frozen<" + type + ">)");
+        execute("INSERT INTO %s (key, udt) VALUES (1, {txt: 'foo', i: 42})");
+
+        // Java UDFs
+
+        String f = createFunction(KEYSPACE, "int",
+                                  "CREATE OR REPLACE FUNCTION %s(val int) " +
+                                  "RETURNS NULL ON NULL INPUT " +
+                                  "RETURNS " + type + ' ' +
+                                  "LANGUAGE JAVA\n" +
+                                  "AS 'return udfContext.newReturnUDTValue();';");
+
+        assertRows(execute("SELECT " + f + "(key) FROM %s"),
+                   row(userType("txt", null, "i", null)));
+
+        f = createFunction(KEYSPACE, "int",
+                           "CREATE OR REPLACE FUNCTION %s(val " + type + ") " +
+                           "RETURNS NULL ON NULL INPUT " +
+                           "RETURNS " + type + ' ' +
+                           "LANGUAGE JAVA\n" +
+                           "AS $$" +
+                           "   com.datastax.driver.core.UDTValue udt = udfContext.newArgUDTValue(\"val\");" +
+                           "   udt.setString(\"txt\", \"baz\");" +
+                           "   udt.setInt(\"i\", 88);" +
+                           "   return udt;" +
+                           "$$;");
+
+        assertRows(execute("SELECT " + f + "(udt) FROM %s"),
+                   row(userType("txt", "baz", "i", 88)));
+
+        f = createFunction(KEYSPACE, "int",
+                           "CREATE OR REPLACE FUNCTION %s(val " + type + ") " +
+                           "RETURNS NULL ON NULL INPUT " +
+                           "RETURNS tuple<text, int>" +
+                           "LANGUAGE JAVA\n" +
+                           "AS $$" +
+                           "   com.datastax.driver.core.TupleValue tv = udfContext.newReturnTupleValue();" +
+                           "   tv.setString(0, \"baz\");" +
+                           "   tv.setInt(1, 88);" +
+                           "   return tv;" +
+                           "$$;");
+
+        assertRows(execute("SELECT " + f + "(udt) FROM %s"),
+                   row(tuple("baz", 88)));
+
+        // JavaScript UDFs
+
+        f = createFunction(KEYSPACE, "int",
+                           "CREATE OR REPLACE FUNCTION %s(val int) " +
+                           "RETURNS NULL ON NULL INPUT " +
+                           "RETURNS " + type + ' ' +
+                           "LANGUAGE JAVASCRIPT\n" +
+                           "AS $$" +
+                           "   udt = udfContext.newReturnUDTValue();" +
+                           "   udt;" +
+                           "$$;");
+
+        assertRows(execute("SELECT " + f + "(key) FROM %s"),
+                   row(userType("txt", null, "i", null)));
+
+        f = createFunction(KEYSPACE, "int",
+                           "CREATE OR REPLACE FUNCTION %s(val " + type + ") " +
+                           "RETURNS NULL ON NULL INPUT " +
+                           "RETURNS " + type + ' ' +
+                           "LANGUAGE JAVASCRIPT\n" +
+                           "AS $$" +
+                           "   udt = udfContext.newArgUDTValue(0);" +
+                           "   udt.setString(\"txt\", \"baz\");" +
+                           "   udt.setInt(\"i\", 88);" +
+                           "   udt;" +
+                           "$$;");
+
+        assertRows(execute("SELECT " + f + "(udt) FROM %s"),
+                   row(userType("txt", "baz", "i", 88)));
+
+        f = createFunction(KEYSPACE, "int",
+                           "CREATE OR REPLACE FUNCTION %s(val " + type + ") " +
+                           "RETURNS NULL ON NULL INPUT " +
+                           "RETURNS tuple<text, int>" +
+                           "LANGUAGE JAVASCRIPT\n" +
+                           "AS $$" +
+                           "   tv = udfContext.newReturnTupleValue();" +
+                           "   tv.setString(0, \"baz\");" +
+                           "   tv.setInt(1, 88);" +
+                           "   tv;" +
+                           "$$;");
+
+        assertRows(execute("SELECT " + f + "(udt) FROM %s"),
+                   row(tuple("baz", 88)));
+
+        createFunction(KEYSPACE, "map",
+                       "CREATE FUNCTION %s(my_map map<text, text>)\n" +
+                       "         CALLED ON NULL INPUT\n" +
+                       "         RETURNS text\n" +
+                       "         LANGUAGE java\n" +
+                       "         AS $$\n" +
+                       "             String buffer = \"\";\n" +
+                       "             for(java.util.Map.Entry<String, String> entry: my_map.entrySet()) {\n" +
+                       "                 buffer = buffer + entry.getKey() + \": \" + entry.getValue() + \", \";\n" +
+                       "             }\n" +
+                       "             return buffer;\n" +
+                       "         $$;\n");
+    }
+
+    @Test
+    public void testImportJavaUtil() throws Throwable
+    {
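+        // the UDF body uses Set/HashSet unqualified, relying on java.util being available to generated UDF code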
+        createFunction(KEYSPACE, "list<text>",
+                "CREATE OR REPLACE FUNCTION %s(listText list<text>) "                                             +
+                        "CALLED ON NULL INPUT "                          +
+                        "RETURNS set<text> " +
+                        "LANGUAGE JAVA\n"                                +
+                        "AS $$\n" +
+                        "     Set<String> set = new HashSet<String>(); " +
+                        "     for (String s : listtext) {" +
+                        "            set.add(s);" +
+                        "     }" +
+                        "     return set;" +
+                        "$$");
+
+    }
+
+    @Test
+    public void testAnyUserTupleType() throws Throwable
+    {
+        createTable("CREATE TABLE %s (key int primary key, sval text)");
+        execute("INSERT INTO %s (key, sval) VALUES (1, 'foo')");
+
+        String udt = createType("CREATE TYPE %s (a int, b text, c bigint)");
+
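+        // builds UDT and tuple values by type name via udfContext.newUDTValue()/newTupleValue()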
+        String fUdt = createFunction(KEYSPACE, "text",
+                                     "CREATE OR REPLACE FUNCTION %s(arg text) " +
+                                     "CALLED ON NULL INPUT " +
+                                     "RETURNS " + udt + " " +
+                                     "LANGUAGE JAVA\n" +
+                                     "AS $$\n" +
+                                     "    UDTValue udt = udfContext.newUDTValue(\"" + udt + "\");" +
+                                     "    udt.setInt(\"a\", 42);" +
+                                     "    udt.setString(\"b\", \"42\");" +
+                                     "    udt.setLong(\"c\", 4242);" +
+                                     "    return udt;" +
+                                     "$$");
+
+        assertRows(execute("SELECT " + fUdt + "(sval) FROM %s"),
+                   row(userType("a", 42, "b", "42", "c", 4242L)));
+
+        String fTup = createFunction(KEYSPACE, "text",
+                                     "CREATE OR REPLACE FUNCTION %s(arg text) " +
+                                     "CALLED ON NULL INPUT " +
+                                     "RETURNS tuple<int, " + udt + "> " +
+                                     "LANGUAGE JAVA\n" +
+                                     "AS $$\n" +
+                                     "    UDTValue udt = udfContext.newUDTValue(\"" + udt + "\");" +
+                                     "    udt.setInt(\"a\", 42);" +
+                                     "    udt.setString(\"b\", \"42\");" +
+                                     "    udt.setLong(\"c\", 4242);" +
+                                     "    TupleValue tup = udfContext.newTupleValue(\"tuple<int," + udt + ">\");" +
+                                     "    tup.setInt(0, 88);" +
+                                     "    tup.setUDTValue(1, udt);" +
+                                     "    return tup;" +
+                                     "$$");
+
+        assertRows(execute("SELECT " + fTup + "(sval) FROM %s"),
+                   row(tuple(88, userType("a", 42, "b", "42", "c", 4242L))));
+    }
 }
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/UFVerifierTest.java b/test/unit/org/apache/cassandra/cql3/validation/entities/UFVerifierTest.java
index 0b78bf2..9a8e682 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/entities/UFVerifierTest.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/UFVerifierTest.java
@@ -25,6 +25,7 @@
 import java.util.Arrays;
 import java.util.Collections;
 import java.util.HashSet;
+import java.util.Set;
 
 import org.junit.Test;
 
@@ -38,7 +39,10 @@
 import org.apache.cassandra.cql3.validation.entities.udfverify.ClassWithInitializer;
 import org.apache.cassandra.cql3.validation.entities.udfverify.ClassWithInitializer2;
 import org.apache.cassandra.cql3.validation.entities.udfverify.ClassWithInitializer3;
+import org.apache.cassandra.cql3.validation.entities.udfverify.ClassWithInnerClass;
+import org.apache.cassandra.cql3.validation.entities.udfverify.ClassWithInnerClass2;
 import org.apache.cassandra.cql3.validation.entities.udfverify.ClassWithStaticInitializer;
+import org.apache.cassandra.cql3.validation.entities.udfverify.ClassWithStaticInnerClass;
 import org.apache.cassandra.cql3.validation.entities.udfverify.GoodClass;
 import org.apache.cassandra.cql3.validation.entities.udfverify.UseOfSynchronized;
 import org.apache.cassandra.cql3.validation.entities.udfverify.UseOfSynchronizedWithNotify;
@@ -46,6 +50,7 @@
 import org.apache.cassandra.cql3.validation.entities.udfverify.UseOfSynchronizedWithWait;
 import org.apache.cassandra.cql3.validation.entities.udfverify.UseOfSynchronizedWithWaitL;
 import org.apache.cassandra.cql3.validation.entities.udfverify.UseOfSynchronizedWithWaitLI;
+import org.apache.cassandra.cql3.validation.entities.udfverify.UsingMapEntry;
 
 import static org.junit.Assert.assertEquals;
 
@@ -57,14 +62,14 @@
     @Test
     public void testByteCodeVerifier()
     {
-        new UDFByteCodeVerifier().verify(readClass(GoodClass.class));
+        verify(GoodClass.class);
     }
 
     @Test
     public void testClassWithField()
     {
         assertEquals(new HashSet<>(Collections.singletonList("field declared: field")),
-                     new UDFByteCodeVerifier().verify(readClass(ClassWithField.class)));
+                     verify(ClassWithField.class));
     }
 
     @Test
@@ -72,7 +77,7 @@
     {
         assertEquals(new HashSet<>(Arrays.asList("field declared: field",
                                                  "initializer declared")),
-                     new UDFByteCodeVerifier().verify(readClass(ClassWithInitializer.class)));
+                     verify(ClassWithInitializer.class));
     }
 
     @Test
@@ -80,91 +85,129 @@
     {
         assertEquals(new HashSet<>(Arrays.asList("field declared: field",
                                                  "initializer declared")),
-                     new UDFByteCodeVerifier().verify(readClass(ClassWithInitializer2.class)));
+                     verify(ClassWithInitializer2.class));
     }
 
     @Test
     public void testClassWithInitializer3()
     {
         assertEquals(new HashSet<>(Collections.singletonList("initializer declared")),
-                     new UDFByteCodeVerifier().verify(readClass(ClassWithInitializer3.class)));
+                     verify(ClassWithInitializer3.class));
     }
 
     @Test
     public void testClassWithStaticInitializer()
     {
         assertEquals(new HashSet<>(Collections.singletonList("static initializer declared")),
-                     new UDFByteCodeVerifier().verify(readClass(ClassWithStaticInitializer.class)));
+                     verify(ClassWithStaticInitializer.class));
     }
 
     @Test
     public void testUseOfSynchronized()
     {
         assertEquals(new HashSet<>(Collections.singletonList("use of synchronized")),
-                     new UDFByteCodeVerifier().verify(readClass(UseOfSynchronized.class)));
+                     verify(UseOfSynchronized.class));
     }
 
     @Test
     public void testUseOfSynchronizedWithNotify()
     {
         assertEquals(new HashSet<>(Arrays.asList("use of synchronized", "call to java.lang.Object.notify()")),
-                     new UDFByteCodeVerifier().verify(readClass(UseOfSynchronizedWithNotify.class)));
+                     verify(UseOfSynchronizedWithNotify.class));
     }
 
     @Test
     public void testUseOfSynchronizedWithNotifyAll()
     {
         assertEquals(new HashSet<>(Arrays.asList("use of synchronized", "call to java.lang.Object.notifyAll()")),
-                     new UDFByteCodeVerifier().verify(readClass(UseOfSynchronizedWithNotifyAll.class)));
+                     verify(UseOfSynchronizedWithNotifyAll.class));
     }
 
     @Test
     public void testUseOfSynchronizedWithWait()
     {
         assertEquals(new HashSet<>(Arrays.asList("use of synchronized", "call to java.lang.Object.wait()")),
-                     new UDFByteCodeVerifier().verify(readClass(UseOfSynchronizedWithWait.class)));
+                     verify(UseOfSynchronizedWithWait.class));
     }
 
     @Test
     public void testUseOfSynchronizedWithWaitL()
     {
         assertEquals(new HashSet<>(Arrays.asList("use of synchronized", "call to java.lang.Object.wait()")),
-                     new UDFByteCodeVerifier().verify(readClass(UseOfSynchronizedWithWaitL.class)));
+                     verify(UseOfSynchronizedWithWaitL.class));
     }
 
     @Test
     public void testUseOfSynchronizedWithWaitI()
     {
         assertEquals(new HashSet<>(Arrays.asList("use of synchronized", "call to java.lang.Object.wait()")),
-                     new UDFByteCodeVerifier().verify(readClass(UseOfSynchronizedWithWaitLI.class)));
+                     verify(UseOfSynchronizedWithWaitLI.class));
     }
 
     @Test
     public void testCallClone()
     {
         assertEquals(new HashSet<>(Collections.singletonList("call to java.lang.Object.clone()")),
-                     new UDFByteCodeVerifier().verify(readClass(CallClone.class)));
+                     verify(CallClone.class));
     }
 
     @Test
     public void testCallFinalize()
     {
         assertEquals(new HashSet<>(Collections.singletonList("call to java.lang.Object.finalize()")),
-                     new UDFByteCodeVerifier().verify(readClass(CallFinalize.class)));
+                     verify(CallFinalize.class));
     }
 
     @Test
     public void testCallComDatastax()
     {
         assertEquals(new HashSet<>(Collections.singletonList("call to com.datastax.driver.core.DataType.cint()")),
-                     new UDFByteCodeVerifier().addDisallowedPackage("com/").verify(readClass(CallComDatastax.class)));
+                     verify("com/", CallComDatastax.class));
     }
 
     @Test
     public void testCallOrgApache()
     {
         assertEquals(new HashSet<>(Collections.singletonList("call to org.apache.cassandra.config.DatabaseDescriptor.getClusterName()")),
-                     new UDFByteCodeVerifier().addDisallowedPackage("org/").verify(readClass(CallOrgApache.class)));
+                     verify("org/", CallOrgApache.class));
+    }
+
+    @Test
+    public void testClassStaticInnerClass()
+    {
+        assertEquals(new HashSet<>(Collections.singletonList("class declared as inner class")),
+                     verify(ClassWithStaticInnerClass.class));
+    }
+
+    @Test
+    public void testUsingMapEntry()
+    {
+        assertEquals(Collections.emptySet(),
+                     verify(UsingMapEntry.class));
+    }
+
+    @Test
+    public void testClassInnerClass()
+    {
+        assertEquals(new HashSet<>(Collections.singletonList("class declared as inner class")),
+                     verify(ClassWithInnerClass.class));
+    }
+
+    @Test
+    public void testClassInnerClass2()
+    {
+        assertEquals(Collections.emptySet(),
+                     verify(ClassWithInnerClass2.class));
+    }
+
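+    // helper used by the tests above: runs the bytecode verifier against the class's compiled bytes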
+    private Set<String> verify(Class cls)
+    {
+        return new UDFByteCodeVerifier().verify(cls.getName(), readClass(cls));
+    }
+
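+    // same, but also disallows calls into the given package prefix (e.g. "com/" or "org/")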
+    private Set<String> verify(String disallowedPkg, Class cls)
+    {
+        return new UDFByteCodeVerifier().addDisallowedPackage(disallowedPkg).verify(cls.getName(), readClass(cls));
     }
 
     @SuppressWarnings("resource")
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/UserTypesTest.java b/test/unit/org/apache/cassandra/cql3/validation/entities/UserTypesTest.java
index 535f3e3..59383c6 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/entities/UserTypesTest.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/UserTypesTest.java
@@ -29,10 +29,12 @@
 public class UserTypesTest extends CQLTester
 {
     @BeforeClass
-    public static void setUpClass()
+    public static void setUpClass()     // overrides CQLTester.setUpClass()
     {
         // Selecting partitioner for a table is not exposed on CREATE TABLE.
         StorageService.instance.setPartitionerUnsafe(ByteOrderedPartitioner.instance);
+
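+        // the base class normally prepares the server in setUpClass(); since it is overridden here, do it explicitly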
+        prepareServer();
     }
 
     @Test
@@ -73,32 +75,86 @@
         execute("INSERT INTO %s(k, v) VALUES (?, {x:?})", 1, -104.99251);
         execute("UPDATE %s SET b = ? WHERE k = ?", true, 1);
 
-        assertRows(execute("SELECT v.x FROM %s WHERE k = ? AND v = {x:?}", 1, -104.99251),
-            row(-104.99251)
-        );
-
-        flush();
-
-        assertRows(execute("SELECT v.x FROM %s WHERE k = ? AND v = {x:?}", 1, -104.99251),
-                   row(-104.99251)
+        beforeAndAfterFlush(() ->
+            assertRows(execute("SELECT v.x FROM %s WHERE k = ? AND v = {x:?}", 1, -104.99251),
+                row(-104.99251)
+            )
         );
     }
 
     @Test
-    public void testCreateInvalidTablesWithUDT() throws Throwable
+    public void testInvalidUDTStatements() throws Throwable
     {
-        String myType = createType("CREATE TYPE %s (f int)");
+        String typename = createType("CREATE TYPE %s (a int)");
+        String myType = KEYSPACE + '.' + typename;
 
-        // Using a UDT without frozen shouldn't work
-        assertInvalidMessage("Non-frozen User-Defined types are not supported, please use frozen<>",
-                             "CREATE TABLE " + KEYSPACE + ".wrong (k int PRIMARY KEY, v " + KEYSPACE + '.' + myType + ")");
+        // non-frozen UDTs in a table PK
+        assertInvalidMessage("Invalid non-frozen user-defined type for PRIMARY KEY component k",
+                "CREATE TABLE " + KEYSPACE + ".wrong (k " + myType + " PRIMARY KEY , v int)");
+        assertInvalidMessage("Invalid non-frozen user-defined type for PRIMARY KEY component k2",
+                "CREATE TABLE " + KEYSPACE + ".wrong (k1 int, k2 " + myType + ", v int, PRIMARY KEY (k1, k2))");
 
+        // non-frozen UDTs in a collection
+        assertInvalidMessage("Non-frozen UDTs are not allowed inside collections: list<" + myType + ">",
+                "CREATE TABLE " + KEYSPACE + ".wrong (k int PRIMARY KEY, v list<" + myType + ">)");
+        assertInvalidMessage("Non-frozen UDTs are not allowed inside collections: set<" + myType + ">",
+                "CREATE TABLE " + KEYSPACE + ".wrong (k int PRIMARY KEY, v set<" + myType + ">)");
+        assertInvalidMessage("Non-frozen UDTs are not allowed inside collections: map<" + myType + ", int>",
+                "CREATE TABLE " + KEYSPACE + ".wrong (k int PRIMARY KEY, v map<" + myType + ", int>)");
+        assertInvalidMessage("Non-frozen UDTs are not allowed inside collections: map<int, " + myType + ">",
+                "CREATE TABLE " + KEYSPACE + ".wrong (k int PRIMARY KEY, v map<int, " + myType + ">)");
+
+        // non-frozen UDT in a collection (as part of a UDT definition)
+        assertInvalidMessage("Non-frozen UDTs are not allowed inside collections: list<" + myType + ">",
+                "CREATE TYPE " + KEYSPACE + ".wrong (a int, b list<" + myType + ">)");
+
+        // non-frozen UDT in a UDT
+        assertInvalidMessage("A user type cannot contain non-frozen UDTs",
+                "CREATE TYPE " + KEYSPACE + ".wrong (a int, b " + myType + ")");
+
+        // referencing a UDT in another keyspace
         assertInvalidMessage("Statement on keyspace " + KEYSPACE + " cannot refer to a user type in keyspace otherkeyspace;" +
                              " user types can only be used in the keyspace they are defined in",
                              "CREATE TABLE " + KEYSPACE + ".wrong (k int PRIMARY KEY, v frozen<otherKeyspace.myType>)");
 
+        // referencing an unknown UDT
         assertInvalidMessage("Unknown type " + KEYSPACE + ".unknowntype",
                              "CREATE TABLE " + KEYSPACE + ".wrong (k int PRIMARY KEY, v frozen<" + KEYSPACE + '.' + "unknownType>)");
+
+        // bad deletions on frozen UDTs
+        createTable("CREATE TABLE %s (a int PRIMARY KEY, b frozen<" + myType + ">, c int)");
+        assertInvalidMessage("Frozen UDT column b does not support field deletion", "DELETE b.a FROM %s WHERE a = 0");
+        assertInvalidMessage("Invalid field deletion operation for non-UDT column c", "DELETE c.a FROM %s WHERE a = 0");
+
+        // bad updates on frozen UDTs
+        assertInvalidMessage("Invalid operation (b.a = 0) for frozen UDT column b", "UPDATE %s SET b.a = 0 WHERE a = 0");
+        assertInvalidMessage("Invalid operation (c.a = 0) for non-UDT column c", "UPDATE %s SET c.a = 0 WHERE a = 0");
+
+        // bad deletions on non-frozen UDTs
+        createTable("CREATE TABLE %s (a int PRIMARY KEY, b " + myType + ", c int)");
+        assertInvalidMessage("UDT column b does not have a field named foo", "DELETE b.foo FROM %s WHERE a = 0");
+
+        // bad updates on non-frozen UDTs
+        assertInvalidMessage("UDT column b does not have a field named foo", "UPDATE %s SET b.foo = 0 WHERE a = 0");
+
+        // bad insert on non-frozen UDTs
+        assertInvalidMessage("Unknown field 'foo' in value of user defined type", "INSERT INTO %s (a, b, c) VALUES (0, {a: 0, foo: 0}, 0)");
+        if (usePrepared())
+        {
+            assertInvalidMessage("Expected 1 value for " + typename + " column, but got more",
+                    "INSERT INTO %s (a, b, c) VALUES (0, ?, 0)", userType("a", 0, "foo", 0));
+        }
+        else
+        {
+            assertInvalidMessage("Unknown field 'foo' in value of user defined type " + typename,
+                    "INSERT INTO %s (a, b, c) VALUES (0, ?, 0)", userType("a", 0, "foo", 0));
+        }
+
+        // non-frozen UDT with non-frozen nested collection
+        String typename2 = createType("CREATE TYPE %s (bar int, foo list<int>)");
+        String myType2 = KEYSPACE + '.' + typename2;
+        assertInvalidMessage("Non-frozen UDTs with nested non-frozen collections are not supported",
+                "CREATE TABLE " + KEYSPACE + ".wrong (k int PRIMARY KEY, v " + myType2 + ")");
     }
 
     @Test
@@ -106,24 +162,61 @@
     {
         String myType = KEYSPACE + '.' + createType("CREATE TYPE %s (a int)");
         createTable("CREATE TABLE %s (a int PRIMARY KEY, b frozen<" + myType + ">)");
-        execute("INSERT INTO %s (a, b) VALUES (1, {a: 1})");
+        execute("INSERT INTO %s (a, b) VALUES (1, ?)", userType("a", 1));
 
         assertRows(execute("SELECT b.a FROM %s"), row(1));
 
         flush();
 
-        execute("ALTER TYPE " + myType + " ADD b int");
-        execute("INSERT INTO %s (a, b) VALUES (2, {a: 2, b :2})");
+        schemaChange("ALTER TYPE " + myType + " ADD b int");
+        execute("INSERT INTO %s (a, b) VALUES (2, ?)", userType("a", 2, "b", 2));
 
-        assertRows(execute("SELECT b.a, b.b FROM %s"),
-                   row(1, null),
-                   row(2, 2));
+        beforeAndAfterFlush(() ->
+            assertRows(execute("SELECT b.a, b.b FROM %s"),
+                       row(1, null),
+                       row(2, 2))
+        );
+    }
 
-        flush();
+    @Test
+    public void testAlterNonFrozenUDT() throws Throwable
+    {
+        String myType = KEYSPACE + '.' + createType("CREATE TYPE %s (a int, b text)");
+        createTable("CREATE TABLE %s (k int PRIMARY KEY, v " + myType + ")");
+        execute("INSERT INTO %s (k, v) VALUES (0, ?)", userType("a", 1, "b", "abc"));
 
-        assertRows(execute("SELECT b.a, b.b FROM %s"),
-                   row(1, null),
-                   row(2, 2));
+        beforeAndAfterFlush(() -> {
+            assertRows(execute("SELECT v FROM %s"), row(userType("a", 1, "b", "abc")));
+            assertRows(execute("SELECT v.a FROM %s"), row(1));
+            assertRows(execute("SELECT v.b FROM %s"), row("abc"));
+        });
+
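+        // rename field b to foo; reads through the new name should return the existing value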
+        schemaChange("ALTER TYPE " + myType + " RENAME b TO foo");
+        assertRows(execute("SELECT v FROM %s"), row(userType("a", 1, "b", "abc")));
+        assertRows(execute("SELECT v.a FROM %s"), row(1));
+        assertRows(execute("SELECT v.foo FROM %s"), row("abc"));
+
+        execute("UPDATE %s SET v.foo = 'def' WHERE k = 0");
+        assertRows(execute("SELECT v FROM %s"), row(userType("a", 1, "foo", "def")));
+        assertRows(execute("SELECT v.a FROM %s"), row(1));
+        assertRows(execute("SELECT v.foo FROM %s"), row("def"));
+
+        execute("INSERT INTO %s (k, v) VALUES (0, ?)", userType("a", 2, "foo", "def"));
+        assertRows(execute("SELECT v FROM %s"), row(userType("a", 2, "foo", "def")));
+        assertRows(execute("SELECT v.a FROM %s"), row(2));
+        assertRows(execute("SELECT v.foo FROM %s"), row("def"));
+
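+        // add a new field c; existing rows should read it as null until it is written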
+        schemaChange("ALTER TYPE " + myType + " ADD c int");
+        assertRows(execute("SELECT v FROM %s"), row(userType("a", 2, "foo", "def", "c", null)));
+        assertRows(execute("SELECT v.a FROM %s"), row(2));
+        assertRows(execute("SELECT v.foo FROM %s"), row("def"));
+        assertRows(execute("SELECT v.c FROM %s"), row(new Object[] {null}));
+
+        execute("INSERT INTO %s (k, v) VALUES (0, ?)", userType("a", 3, "foo", "abc", "c", 0));
+        beforeAndAfterFlush(() -> {
+            assertRows(execute("SELECT v FROM %s"), row(userType("a", 3, "foo", "abc", "c", 0)));
+            assertRows(execute("SELECT v.c FROM %s"), row(0));
+        });
     }
 
     @Test
@@ -134,11 +227,14 @@
         String myOtherType = createType("CREATE TYPE %s (a frozen<" + myType + ">)");
         createTable("CREATE TABLE %s (k int PRIMARY KEY, v frozen<" + myType + ">, z frozen<" + myOtherType + ">)");
 
-        assertInvalidMessage("Invalid unset value for field 'y' of user defined type " + myType,
-                             "INSERT INTO %s (k, v) VALUES (10, {x:?, y:?})", 1, unset());
+        if (usePrepared())
+        {
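+            // unset() markers only exist for bound variables, so these checks apply to prepared execution only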
+            assertInvalidMessage("Invalid unset value for field 'y' of user defined type " + myType,
+                    "INSERT INTO %s (k, v) VALUES (10, {x:?, y:?})", 1, unset());
 
-        assertInvalidMessage("Invalid unset value for field 'y' of user defined type " + myType,
-                             "INSERT INTO %s (k, v, z) VALUES (10, {x:?, y:?}, {a:{x: ?, y: ?}})", 1, 1, 1, unset());
+            assertInvalidMessage("Invalid unset value for field 'y' of user defined type " + myType,
+                    "INSERT INTO %s (k, v, z) VALUES (10, {x:?, y:?}, {a:{x: ?, y: ?}})", 1, 1, 1, unset());
+        }
     }
 
     @Test
@@ -153,28 +249,22 @@
 
             createTable("CREATE TABLE %s (x int PRIMARY KEY, y " + columnType + ")");
 
-            execute("INSERT INTO %s (x, y) VALUES(1, {'firstValue':{a:1}})");
-            assertRows(execute("SELECT * FROM %s"), row(1, map("firstValue", userType(1))));
+            execute("INSERT INTO %s (x, y) VALUES(1, ?)", map("firstValue", userType("a", 1)));
+            assertRows(execute("SELECT * FROM %s"), row(1, map("firstValue", userType("a", 1))));
             flush();
 
             execute("ALTER TYPE " + KEYSPACE + "." + ut1 + " ADD b int");
-            execute("INSERT INTO %s (x, y) VALUES(2, {'secondValue':{a:2, b:2}})");
-            execute("INSERT INTO %s (x, y) VALUES(3, {'thirdValue':{a:3}})");
-            execute("INSERT INTO %s (x, y) VALUES(4, {'fourthValue':{b:4}})");
+            execute("INSERT INTO %s (x, y) VALUES(2, ?)", map("secondValue", userType("a", 2, "b", 2)));
+            execute("INSERT INTO %s (x, y) VALUES(3, ?)", map("thirdValue", userType("a", 3, "b", null)));
+            execute("INSERT INTO %s (x, y) VALUES(4, ?)", map("fourthValue", userType("a", null, "b", 4)));
 
-            assertRows(execute("SELECT * FROM %s"),
-                    row(1, map("firstValue", userType(1))),
-                    row(2, map("secondValue", userType(2, 2))),
-                    row(3, map("thirdValue", userType(3, null))),
-                    row(4, map("fourthValue", userType(null, 4))));
-
-            flush();
-
-            assertRows(execute("SELECT * FROM %s"),
-                    row(1, map("firstValue", userType(1))),
-                    row(2, map("secondValue", userType(2, 2))),
-                    row(3, map("thirdValue", userType(3, null))),
-                    row(4, map("fourthValue", userType(null, 4))));
+            beforeAndAfterFlush(() ->
+                assertRows(execute("SELECT * FROM %s"),
+                        row(1, map("firstValue", userType("a", 1))),
+                        row(2, map("secondValue", userType("a", 2, "b", 2))),
+                        row(3, map("thirdValue", userType("a", 3, "b", null))),
+                        row(4, map("fourthValue", userType("a", null, "b", 4))))
+            );
         }
     }
 
@@ -188,22 +278,18 @@
 
         execute("INSERT INTO %s (x, y) VALUES(1, {'firstValue': {a: 1}})");
         assertRows(execute("SELECT * FROM %s"),
-                   row(1, map("firstValue", userType(1))));
+                   row(1, map("firstValue", userType("a", 1))));
 
         flush();
 
         execute("ALTER TYPE " + columnType + " ADD b int");
         execute("UPDATE %s SET y['secondValue'] = {a: 2, b: 2} WHERE x = 1");
 
-        assertRows(execute("SELECT * FROM %s"),
-                   row(1, map("firstValue", userType(1),
-                              "secondValue", userType(2, 2))));
-
-        flush();
-
-        assertRows(execute("SELECT * FROM %s"),
-                   row(1, map("firstValue", userType(1),
-                              "secondValue", userType(2, 2))));
+        beforeAndAfterFlush(() ->
+                            assertRows(execute("SELECT * FROM %s"),
+                                       row(1, map("firstValue", userType("a", 1),
+                                                  "secondValue", userType("a", 2, "b", 2))))
+        );
     }
 
     @Test
@@ -218,28 +304,22 @@
 
             createTable("CREATE TABLE %s (x int PRIMARY KEY, y " + columnType + ")");
 
-            execute("INSERT INTO %s (x, y) VALUES(1, {1} )");
-            assertRows(execute("SELECT * FROM %s"), row(1, set(userType(1))));
+            execute("INSERT INTO %s (x, y) VALUES(1, ?)", set(userType("a", 1)));
+            assertRows(execute("SELECT * FROM %s"), row(1, set(userType("a", 1))));
             flush();
 
             execute("ALTER TYPE " + KEYSPACE + "." + ut1 + " ADD b int");
-            execute("INSERT INTO %s (x, y) VALUES(2, {{a:2, b:2}})");
-            execute("INSERT INTO %s (x, y) VALUES(3, {{a:3}})");
-            execute("INSERT INTO %s (x, y) VALUES(4, {{b:4}})");
+            execute("INSERT INTO %s (x, y) VALUES(2, ?)", set(userType("a", 2, "b", 2)));
+            execute("INSERT INTO %s (x, y) VALUES(3, ?)", set(userType("a", 3, "b", null)));
+            execute("INSERT INTO %s (x, y) VALUES(4, ?)", set(userType("a", null, "b", 4)));
 
-            assertRows(execute("SELECT * FROM %s"),
-                    row(1, set(userType(1))),
-                    row(2, set(userType(2, 2))),
-                    row(3, set(userType(3, null))),
-                    row(4, set(userType(null, 4))));
-
-            flush();
-
-            assertRows(execute("SELECT * FROM %s"),
-                    row(1, set(userType(1))),
-                    row(2, set(userType(2, 2))),
-                    row(3, set(userType(3, null))),
-                    row(4, set(userType(null, 4))));
+            beforeAndAfterFlush(() ->
+                assertRows(execute("SELECT * FROM %s"),
+                        row(1, set(userType("a", 1))),
+                        row(2, set(userType("a", 2, "b", 2))),
+                        row(3, set(userType("a", 3, "b", null))),
+                        row(4, set(userType("a", null, "b", 4))))
+            );
         }
     }
 
@@ -255,28 +335,22 @@
 
             createTable("CREATE TABLE %s (x int PRIMARY KEY, y " + columnType + ")");
 
-            execute("INSERT INTO %s (x, y) VALUES(1, [1] )");
-            assertRows(execute("SELECT * FROM %s"), row(1, list(userType(1))));
+            execute("INSERT INTO %s (x, y) VALUES(1, ?)", list(userType("a", 1)));
+            assertRows(execute("SELECT * FROM %s"), row(1, list(userType("a", 1))));
             flush();
 
             execute("ALTER TYPE " + KEYSPACE + "." + ut1 + " ADD b int");
-            execute("INSERT INTO %s (x, y) VALUES(2, [{a:2, b:2}])");
-            execute("INSERT INTO %s (x, y) VALUES(3, [{a:3}])");
-            execute("INSERT INTO %s (x, y) VALUES(4, [{b:4}])");
+            execute("INSERT INTO %s (x, y) VALUES (2, ?)", list(userType("a", 2, "b", 2)));
+            execute("INSERT INTO %s (x, y) VALUES (3, ?)", list(userType("a", 3, "b", null)));
+            execute("INSERT INTO %s (x, y) VALUES (4, ?)", list(userType("a", null, "b", 4)));
 
-            assertRows(execute("SELECT * FROM %s"),
-                    row(1, list(userType(1))),
-                    row(2, list(userType(2, 2))),
-                    row(3, list(userType(3, null))),
-                    row(4, list(userType(null, 4))));
-
-            flush();
-
-            assertRows(execute("SELECT * FROM %s"),
-                    row(1, list(userType(1))),
-                    row(2, list(userType(2, 2))),
-                    row(3, list(userType(3, null))),
-                    row(4, list(userType(null, 4))));
+            beforeAndAfterFlush(() ->
+                assertRows(execute("SELECT * FROM %s"),
+                        row(1, list(userType("a", 1))),
+                        row(2, list(userType("a", 2, "b", 2))),
+                        row(3, list(userType("a", 3, "b", null))),
+                        row(4, list(userType("a", null, "b", 4))))
+            );
         }
     }
 
@@ -287,28 +361,22 @@
 
         createTable("CREATE TABLE %s (a int PRIMARY KEY, b frozen<tuple<int, " + KEYSPACE + "." + type + ">>)");
 
-        execute("INSERT INTO %s (a, b) VALUES(1, (1, {a:1, b:1}))");
-        assertRows(execute("SELECT * FROM %s"), row(1, tuple(1, userType(1, 1))));
+        execute("INSERT INTO %s (a, b) VALUES(1, (1, ?))", userType("a", 1, "b", 1));
+        assertRows(execute("SELECT * FROM %s"), row(1, tuple(1, userType("a", 1, "b", 1))));
         flush();
 
         execute("ALTER TYPE " + KEYSPACE + "." + type + " ADD c int");
-        execute("INSERT INTO %s (a, b) VALUES(2, (2, {a: 2, b: 2, c: 2}))");
-        execute("INSERT INTO %s (a, b) VALUES(3, (3, {a: 3, b: 3}))");
-        execute("INSERT INTO %s (a, b) VALUES(4, (4, {b:4}))");
+        execute("INSERT INTO %s (a, b) VALUES (2, (2, ?))", userType("a", 2, "b", 2, "c", 2));
+        execute("INSERT INTO %s (a, b) VALUES (3, (3, ?))", userType("a", 3, "b", 3, "c", null));
+        execute("INSERT INTO %s (a, b) VALUES (4, (4, ?))", userType("a", null, "b", 4, "c", null));
 
-        assertRows(execute("SELECT * FROM %s"),
-                   row(1, tuple(1, userType(1, 1))),
-                   row(2, tuple(2, userType(2, 2, 2))),
-                   row(3, tuple(3, userType(3, 3, null))),
-                   row(4, tuple(4, userType(null, 4, null))));
-
-        flush();
-
-        assertRows(execute("SELECT * FROM %s"),
-                   row(1, tuple(1, userType(1, 1))),
-                   row(2, tuple(2, userType(2, 2, 2))),
-                   row(3, tuple(3, userType(3, 3, null))),
-                   row(4, tuple(4, userType(null, 4, null))));
+        beforeAndAfterFlush(() ->
+            assertRows(execute("SELECT * FROM %s"),
+                    row(1, tuple(1, userType("a", 1, "b", 1))),
+                    row(2, tuple(2, userType("a", 2, "b", 2, "c", 2))),
+                    row(3, tuple(3, userType("a", 3, "b", 3, "c", null))),
+                    row(4, tuple(4, userType("a", null, "b", 4, "c", null))))
+        );
     }
 
     @Test
@@ -318,28 +386,22 @@
 
         createTable("CREATE TABLE %s (a int PRIMARY KEY, b frozen<tuple<int, tuple<int, " + KEYSPACE + "." + type + ">>>)");
 
-        execute("INSERT INTO %s (a, b) VALUES(1, (1, (1, {a:1, b:1})))");
-        assertRows(execute("SELECT * FROM %s"), row(1, tuple(1, tuple(1, userType(1, 1)))));
+        execute("INSERT INTO %s (a, b) VALUES(1, (1, (1, ?)))", userType("a", 1, "b", 1));
+        assertRows(execute("SELECT * FROM %s"), row(1, tuple(1, tuple(1, userType("a", 1, "b", 1)))));
         flush();
 
         execute("ALTER TYPE " + KEYSPACE + "." + type + " ADD c int");
-        execute("INSERT INTO %s (a, b) VALUES(2, (2, (1, {a: 2, b: 2, c: 2})))");
-        execute("INSERT INTO %s (a, b) VALUES(3, (3, (1, {a: 3, b: 3})))");
-        execute("INSERT INTO %s (a, b) VALUES(4, (4, (1, {b:4})))");
+        execute("INSERT INTO %s (a, b) VALUES(2, (2, (1, ?)))", userType("a", 2, "b", 2, "c", 2));
+        execute("INSERT INTO %s (a, b) VALUES(3, (3, ?))", tuple(1, userType("a", 3, "b", 3, "c", null)));
+        execute("INSERT INTO %s (a, b) VALUES(4, ?)", tuple(4, tuple(1, userType("a", null, "b", 4, "c", null))));
 
-        assertRows(execute("SELECT * FROM %s"),
-                   row(1, tuple(1, tuple(1, userType(1, 1)))),
-                   row(2, tuple(2, tuple(1, userType(2, 2, 2)))),
-                   row(3, tuple(3, tuple(1, userType(3, 3, null)))),
-                   row(4, tuple(4, tuple(1, userType(null, 4, null)))));
-
-        flush();
-
-        assertRows(execute("SELECT * FROM %s"),
-                   row(1, tuple(1, tuple(1, userType(1, 1)))),
-                   row(2, tuple(2, tuple(1, userType(2, 2, 2)))),
-                   row(3, tuple(3, tuple(1, userType(3, 3, null)))),
-                   row(4, tuple(4, tuple(1, userType(null, 4, null)))));
+        beforeAndAfterFlush(() ->
+            assertRows(execute("SELECT * FROM %s"),
+                    row(1, tuple(1, tuple(1, userType("a", 1, "b", 1)))),
+                    row(2, tuple(2, tuple(1, userType("a", 2, "b", 2, "c", 2)))),
+                    row(3, tuple(3, tuple(1, userType("a", 3, "b", 3, "c", null)))),
+                    row(4, tuple(4, tuple(1, userType("a", null, "b", 4, "c", null)))))
+        );
     }
 
     @Test
@@ -350,28 +412,24 @@
 
         createTable("CREATE TABLE %s (a int PRIMARY KEY, b frozen<" + KEYSPACE + "." + otherType + ">)");
 
-        execute("INSERT INTO %s (a, b) VALUES(1, {x: {a:1, b:1}})");
+        execute("INSERT INTO %s (a, b) VALUES(1, {x: ?})", userType("a", 1, "b", 1));
+        assertRows(execute("SELECT b.x.a, b.x.b FROM %s"), row(1, 1));
+        execute("INSERT INTO %s (a, b) VALUES(1, ?)", userType("x", userType("a", 1, "b", 1)));
         assertRows(execute("SELECT b.x.a, b.x.b FROM %s"), row(1, 1));
         flush();
 
         execute("ALTER TYPE " + KEYSPACE + "." + type + " ADD c int");
-        execute("INSERT INTO %s (a, b) VALUES(2, {x: {a: 2, b: 2, c: 2}})");
-        execute("INSERT INTO %s (a, b) VALUES(3, {x: {a: 3, b: 3}})");
-        execute("INSERT INTO %s (a, b) VALUES(4, {x: {b:4}})");
+        execute("INSERT INTO %s (a, b) VALUES(2, {x: ?})", userType("a", 2, "b", 2, "c", 2));
+        execute("INSERT INTO %s (a, b) VALUES(3, {x: ?})", userType("a", 3, "b", 3));
+        execute("INSERT INTO %s (a, b) VALUES(4, {x: ?})", userType("a", null, "b", 4));
 
-        assertRows(execute("SELECT b.x.a, b.x.b, b.x.c FROM %s"),
-                   row(1, 1, null),
-                   row(2, 2, 2),
-                   row(3, 3, null),
-                   row(null, 4, null));
-
-        flush();
-
-        assertRows(execute("SELECT b.x.a, b.x.b, b.x.c FROM %s"),
-                   row(1, 1, null),
-                   row(2, 2, 2),
-                   row(3, 3, null),
-                   row(null, 4, null));
+        beforeAndAfterFlush(() ->
+            assertRows(execute("SELECT b.x.a, b.x.b, b.x.c FROM %s"),
+                       row(1, 1, null),
+                       row(2, 2, 2),
+                       row(3, 3, null),
+                       row(null, 4, null))
+        );
     }
 
     /**
@@ -411,10 +469,11 @@
 
         createTable("CREATE TABLE %s (id int PRIMARY KEY, val frozen<" + type2 + ">)");
 
-        execute("INSERT INTO %s (id, val) VALUES (0, { s : {{ s : {'foo', 'bar'}, m : { 'foo' : 'bar' }, l : ['foo', 'bar']} }})");
+        execute("INSERT INTO %s (id, val) VALUES (0, ?)",
+                userType("s", set(userType("s", set("foo", "bar"), "m", map("foo", "bar"), "l", list("foo", "bar")))));
 
-        // TODO: check result once we have an easy way to do it. For now we just check it doesn't crash
-        execute("SELECT * FROM %s");
+        assertRows(execute("SELECT * FROM %s"),
+                row(0, userType("s", set(userType("s", set("foo", "bar"), "m", map("foo", "bar"), "l", list("foo", "bar"))))));
     }
 
     /**
@@ -426,9 +485,11 @@
         String typeName = createType("CREATE TYPE %s (fooint int, fooset set <text>)");
         createTable("CREATE TABLE %s (key int PRIMARY KEY, data frozen <" + typeName + ">)");
 
-        execute("INSERT INTO %s (key, data) VALUES (1, {fooint: 1, fooset: {'2'}})");
+        execute("INSERT INTO %s (key, data) VALUES (1, ?)", userType("fooint", 1, "fooset", set("2")));
         execute("ALTER TYPE " + keyspace() + "." + typeName + " ADD foomap map <int,text>");
-        execute("INSERT INTO %s (key, data) VALUES (1, {fooint: 1, fooset: {'2'}, foomap: {3 : 'bar'}})");
+        execute("INSERT INTO %s (key, data) VALUES (1, ?)", userType("fooint", 1, "fooset", set("2"), "foomap", map(3, "bar")));
+        assertRows(execute("SELECT * FROM %s"),
+                row(1, userType("fooint", 1, "fooset", set("2"), "foomap", map(3, "bar"))));
     }
 
     @Test
@@ -492,13 +553,13 @@
 
         type1 = createType("CREATE TYPE %s (foo ascii)");
         String type2 = createType("CREATE TYPE %s (foo frozen<" + type1 + ">)");
-        assertComplexInvalidAlterDropStatements(type1, type2, "{foo: 'abc'}");
+        assertComplexInvalidAlterDropStatements(type1, type2, "{foo: {foo: 'abc'}}");
 
         type1 = createType("CREATE TYPE %s (foo ascii)");
         type2 = createType("CREATE TYPE %s (foo frozen<" + type1 + ">)");
         assertComplexInvalidAlterDropStatements(type1,
                                                 "list<frozen<" + type2 + ">>",
-                                                "[{foo: 'abc'}]");
+                                                "[{foo: {foo: 'abc'}}]");
 
         type1 = createType("CREATE TYPE %s (foo ascii)");
         type2 = createType("CREATE TYPE %s (foo frozen<set<" + type1 + ">>)");
@@ -543,6 +604,110 @@
         assertInvalidMessage("Cannot drop user type " + typeWithKs(t), "DROP TYPE " + typeWithKs(t) + ';');
     }
 
+    @Test
+    public void testInsertNonFrozenUDT() throws Throwable
+    {
+        String typeName = createType("CREATE TYPE %s (a int, b text)");
+        createTable("CREATE TABLE %s (k int PRIMARY KEY, v " + typeName + ")");
+
+        execute("INSERT INTO %s (k, v) VALUES (?, {a: ?, b: ?})", 0, 0, "abc");
+        assertRows(execute("SELECT * FROM %s WHERE k = ?", 0), row(0, userType("a", 0, "b", "abc")));
+
+        execute("INSERT INTO %s (k, v) VALUES (?, ?)", 0, userType("a", 0, "b", "abc"));
+        assertRows(execute("SELECT * FROM %s WHERE k = ?", 0), row(0, userType("a", 0, "b", "abc")));
+
+        execute("INSERT INTO %s (k, v) VALUES (?, {a: ?, b: ?})", 0, 0, null);
+        assertRows(execute("SELECT * FROM %s WHERE k = ?", 0), row(0, userType("a", 0, "b", null)));
+
+        execute("INSERT INTO %s (k, v) VALUES (?, ?)", 0, userType("a", null, "b", "abc"));
+        assertRows(execute("SELECT * FROM %s WHERE k = ?", 0), row(0, userType("a", null, "b", "abc")));
+    }
+
+    @Test
+    public void testUpdateNonFrozenUDT() throws Throwable
+    {
+        String typeName = createType("CREATE TYPE %s (a int, b text)");
+        createTable("CREATE TABLE %s (k int PRIMARY KEY, v " + typeName + ")");
+
+        execute("INSERT INTO %s (k, v) VALUES (?, ?)", 0, userType("a", 0, "b", "abc"));
+        assertRows(execute("SELECT * FROM %s WHERE k = ?", 0), row(0, userType("a", 0, "b", "abc")));
+
+        // overwrite the whole UDT
+        execute("UPDATE %s SET v = ? WHERE k = ?", userType("a", 1, "b", "def"), 0);
+        assertRows(execute("SELECT * FROM %s WHERE k = ?", 0), row(0, userType("a", 1, "b", "def")));
+
+        execute("UPDATE %s SET v = ? WHERE k = ?", userType("a", 0, "b", null), 0);
+        assertRows(execute("SELECT * FROM %s WHERE k = ?", 0), row(0, userType("a", 0, "b", null)));
+
+        execute("UPDATE %s SET v = ? WHERE k = ?", userType("a", null, "b", "abc"), 0);
+        assertRows(execute("SELECT * FROM %s WHERE k = ?", 0), row(0, userType("a", null, "b", "abc")));
+
+        // individually set fields to non-null values
+        execute("UPDATE %s SET v.a = ? WHERE k = ?", 1, 0);
+        assertRows(execute("SELECT * FROM %s WHERE k = ?", 0), row(0, userType("a", 1, "b", "abc")));
+
+        execute("UPDATE %s SET v.b = ? WHERE k = ?", "def", 0);
+        assertRows(execute("SELECT * FROM %s WHERE k = ?", 0), row(0, userType("a", 1, "b", "def")));
+
+        execute("UPDATE %s SET v.a = ?, v.b = ? WHERE k = ?", 0, "abc", 0);
+        assertRows(execute("SELECT * FROM %s WHERE k = ?", 0), row(0, userType("a", 0, "b", "abc")));
+
+        execute("UPDATE %s SET v.b = ?, v.a = ? WHERE k = ?", "abc", 0, 0);
+        assertRows(execute("SELECT * FROM %s WHERE k = ?", 0), row(0, userType("a", 0, "b", "abc")));
+
+        // individually set fields to null values
+        execute("UPDATE %s SET v.a = ? WHERE k = ?", null, 0);
+        assertRows(execute("SELECT * FROM %s WHERE k = ?", 0), row(0, userType("a", null, "b", "abc")));
+
+        execute("UPDATE %s SET v.a = ? WHERE k = ?", 0, 0);
+        execute("UPDATE %s SET v.b = ? WHERE k = ?", null, 0);
+        assertRows(execute("SELECT * FROM %s WHERE k = ?", 0), row(0, userType("a", 0, "b", null)));
+
+        execute("UPDATE %s SET v.a = ? WHERE k = ?", null, 0);
+        assertRows(execute("SELECT * FROM %s WHERE k = ?", 0), row(0, null));
+
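+        // updates naming an unknown field or binding an ill-typed value must be rejected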
+        assertInvalid("UPDATE %s SET v.bad = ? FROM %s WHERE k = ?", 0, 0);
+        assertInvalid("UPDATE %s SET v = ? FROM %s WHERE k = ?", 0, 0);
+        assertInvalid("UPDATE %s SET v = ? FROM %s WHERE k = ?", userType("a", 1, "b", "abc", "bad", 123), 0);
+    }
+
+    @Test
+    public void testDeleteNonFrozenUDT() throws Throwable
+    {
+        String typeName = createType("CREATE TYPE %s (a int, b text)");
+        createTable("CREATE TABLE %s (k int PRIMARY KEY, v " + typeName + ")");
+
+        execute("INSERT INTO %s (k, v) VALUES (?, ?)", 0, userType("a", 0, "b", "abc"));
+        assertRows(execute("SELECT * FROM %s WHERE k = ?", 0), row(0, userType("a", 0, "b", "abc")));
+
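+        // deleting a single field nulls only that field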
+        execute("DELETE v.b FROM %s WHERE k = ?", 0);
+        assertRows(execute("SELECT * FROM %s WHERE k = ?", 0), row(0, userType("a", 0, "b", null)));
+
+        execute("INSERT INTO %s (k, v) VALUES (?, ?)", 0, userType("a", 0, "b", "abc"));
+        execute("DELETE v.a FROM %s WHERE k = ?", 0);
+        assertRows(execute("SELECT * FROM %s WHERE k = ?", 0), row(0, userType("a", null, "b", "abc")));
+
+        execute("DELETE v.b FROM %s WHERE k = ?", 0);
+        assertRows(execute("SELECT * FROM %s WHERE k = ?", 0), row(0, null));
+
+        // delete both fields at once
+        execute("INSERT INTO %s (k, v) VALUES (?, ?)", 0, userType("a", 0, "b", "abc"));
+        execute("DELETE v.a, v.b FROM %s WHERE k = ?", 0);
+        assertRows(execute("SELECT * FROM %s WHERE k = ?", 0), row(0, null));
+
+        // same, but reverse field order
+        execute("INSERT INTO %s (k, v) VALUES (?, ?)", 0, userType("a", 0, "b", "abc"));
+        execute("DELETE v.b, v.a FROM %s WHERE k = ?", 0);
+        assertRows(execute("SELECT * FROM %s WHERE k = ?", 0), row(0, null));
+
+        // delete the whole thing at once
+        execute("INSERT INTO %s (k, v) VALUES (?, ?)", 0, userType("a", 0, "b", "abc"));
+        execute("DELETE v FROM %s WHERE k = ?", 0);
+        assertRows(execute("SELECT * FROM %s WHERE k = ?", 0), row(0, null));
+
+        assertInvalid("DELETE v.bad FROM %s WHERE k = ?", 0);
+    }
+
     private String typeWithKs(String type1)
     {
         return keyspace() + '.' + type1;
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/CallClone.java b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/CallClone.java
index c01fbe6..e8bae70 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/CallClone.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/CallClone.java
@@ -23,15 +23,16 @@
 
 import com.datastax.driver.core.TypeCodec;
 import org.apache.cassandra.cql3.functions.JavaUDF;
+import org.apache.cassandra.cql3.functions.UDFContext;
 
 /**
  * Used by {@link org.apache.cassandra.cql3.validation.entities.UFVerifierTest}.
  */
 public final class CallClone extends JavaUDF
 {
-    public CallClone(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes)
+    public CallClone(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes, UDFContext udfContext)
     {
-        super(returnDataType, argDataTypes);
+        super(returnDataType, argDataTypes, udfContext);
     }
 
     protected ByteBuffer executeImpl(int protocolVersion, List<ByteBuffer> params)
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/CallComDatastax.java b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/CallComDatastax.java
index 9cd799f..1af5b01 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/CallComDatastax.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/CallComDatastax.java
@@ -24,15 +24,16 @@
 import com.datastax.driver.core.DataType;
 import com.datastax.driver.core.TypeCodec;
 import org.apache.cassandra.cql3.functions.JavaUDF;
+import org.apache.cassandra.cql3.functions.UDFContext;
 
 /**
  * Used by {@link org.apache.cassandra.cql3.validation.entities.UFVerifierTest}.
  */
 public final class CallComDatastax extends JavaUDF
 {
-    public CallComDatastax(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes)
+    public CallComDatastax(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes, UDFContext udfContext)
     {
-        super(returnDataType, argDataTypes);
+        super(returnDataType, argDataTypes, udfContext);
     }
 
     protected ByteBuffer executeImpl(int protocolVersion, List<ByteBuffer> params)
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/CallFinalize.java b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/CallFinalize.java
index a16bd31..5208849 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/CallFinalize.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/CallFinalize.java
@@ -23,15 +23,16 @@
 
 import com.datastax.driver.core.TypeCodec;
 import org.apache.cassandra.cql3.functions.JavaUDF;
+import org.apache.cassandra.cql3.functions.UDFContext;
 
 /**
  * Used by {@link org.apache.cassandra.cql3.validation.entities.UFVerifierTest}.
  */
 public final class CallFinalize extends JavaUDF
 {
-    public CallFinalize(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes)
+    public CallFinalize(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes, UDFContext udfContext)
     {
-        super(returnDataType, argDataTypes);
+        super(returnDataType, argDataTypes, udfContext);
     }
 
     protected ByteBuffer executeImpl(int protocolVersion, List<ByteBuffer> params)
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/CallOrgApache.java b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/CallOrgApache.java
index 4f511d7..758d0d0 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/CallOrgApache.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/CallOrgApache.java
@@ -24,15 +24,16 @@
 import com.datastax.driver.core.TypeCodec;
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.cql3.functions.JavaUDF;
+import org.apache.cassandra.cql3.functions.UDFContext;
 
 /**
  * Used by {@link org.apache.cassandra.cql3.validation.entities.UFVerifierTest}.
  */
 public final class CallOrgApache extends JavaUDF
 {
-    public CallOrgApache(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes)
+    public CallOrgApache(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes, UDFContext udfContext)
     {
-        super(returnDataType, argDataTypes);
+        super(returnDataType, argDataTypes, udfContext);
     }
 
     protected ByteBuffer executeImpl(int protocolVersion, List<ByteBuffer> params)
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithField.java b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithField.java
index d981c18..256c2bd 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithField.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithField.java
@@ -23,15 +23,16 @@
 
 import com.datastax.driver.core.TypeCodec;
 import org.apache.cassandra.cql3.functions.JavaUDF;
+import org.apache.cassandra.cql3.functions.UDFContext;
 
 /**
  * Used by {@link org.apache.cassandra.cql3.validation.entities.UFVerifierTest}.
  */
 public final class ClassWithField extends JavaUDF
 {
-    public ClassWithField(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes)
+    public ClassWithField(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes, UDFContext udfContext)
     {
-        super(returnDataType, argDataTypes);
+        super(returnDataType, argDataTypes, udfContext);
     }
 
     protected ByteBuffer executeImpl(int protocolVersion, List<ByteBuffer> params)
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithInitializer.java b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithInitializer.java
index f53cc24..3366314 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithInitializer.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithInitializer.java
@@ -23,15 +23,16 @@
 
 import com.datastax.driver.core.TypeCodec;
 import org.apache.cassandra.cql3.functions.JavaUDF;
+import org.apache.cassandra.cql3.functions.UDFContext;
 
 /**
  * Used by {@link org.apache.cassandra.cql3.validation.entities.UFVerifierTest}.
  */
 public final class ClassWithInitializer extends JavaUDF
 {
-    public ClassWithInitializer(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes)
+    public ClassWithInitializer(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes, UDFContext udfContext)
     {
-        super(returnDataType, argDataTypes);
+        super(returnDataType, argDataTypes, udfContext);
     }
 
     protected ByteBuffer executeImpl(int protocolVersion, List<ByteBuffer> params)
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithInitializer2.java b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithInitializer2.java
index 134f9f9..aaf3e7b 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithInitializer2.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithInitializer2.java
@@ -23,15 +23,16 @@
 
 import com.datastax.driver.core.TypeCodec;
 import org.apache.cassandra.cql3.functions.JavaUDF;
+import org.apache.cassandra.cql3.functions.UDFContext;
 
 /**
  * Used by {@link org.apache.cassandra.cql3.validation.entities.UFVerifierTest}.
  */
 public final class ClassWithInitializer2 extends JavaUDF
 {
-    public ClassWithInitializer2(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes)
+    public ClassWithInitializer2(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes, UDFContext udfContext)
     {
-        super(returnDataType, argDataTypes);
+        super(returnDataType, argDataTypes, udfContext);
     }
 
     protected ByteBuffer executeImpl(int protocolVersion, List<ByteBuffer> params)
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithInitializer3.java b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithInitializer3.java
index 9cd04fb..4895aa0 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithInitializer3.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithInitializer3.java
@@ -23,15 +23,16 @@
 
 import com.datastax.driver.core.TypeCodec;
 import org.apache.cassandra.cql3.functions.JavaUDF;
+import org.apache.cassandra.cql3.functions.UDFContext;
 
 /**
  * Used by {@link org.apache.cassandra.cql3.validation.entities.UFVerifierTest}.
  */
 public final class ClassWithInitializer3 extends JavaUDF
 {
-    public ClassWithInitializer3(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes)
+    public ClassWithInitializer3(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes, UDFContext udfContext)
     {
-        super(returnDataType, argDataTypes);
+        super(returnDataType, argDataTypes, udfContext);
     }
 
     protected ByteBuffer executeImpl(int protocolVersion, List<ByteBuffer> params)
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithInnerClass.java b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithInnerClass.java
new file mode 100644
index 0000000..2166771
--- /dev/null
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithInnerClass.java
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.cql3.validation.entities.udfverify;
+
+import java.nio.ByteBuffer;
+import java.util.List;
+
+import com.datastax.driver.core.TypeCodec;
+import org.apache.cassandra.cql3.functions.JavaUDF;
+import org.apache.cassandra.cql3.functions.UDFContext;
+
+/**
+ * Used by {@link org.apache.cassandra.cql3.validation.entities.UFVerifierTest}.
+ */
+public final class ClassWithInnerClass extends JavaUDF
+{
+    public ClassWithInnerClass(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes, UDFContext udfContext)
+    {
+        super(returnDataType, argDataTypes, udfContext);
+    }
+
+    protected ByteBuffer executeImpl(int protocolVersion, List<ByteBuffer> params)
+    {
+        return null;
+    }
+
+    // this is NOT fine - the verifier reports "class declared as inner class"
+    final class ClassWithInner_Inner
+    {
+    }
+}
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithInnerClass2.java b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithInnerClass2.java
new file mode 100644
index 0000000..9c18510
--- /dev/null
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithInnerClass2.java
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.cql3.validation.entities.udfverify;
+
+import java.nio.ByteBuffer;
+import java.util.List;
+
+import com.datastax.driver.core.TypeCodec;
+import org.apache.cassandra.cql3.functions.JavaUDF;
+import org.apache.cassandra.cql3.functions.UDFContext;
+
+/**
+ * Used by {@link org.apache.cassandra.cql3.validation.entities.UFVerifierTest}.
+ */
+public final class ClassWithInnerClass2 extends JavaUDF
+{
+    public ClassWithInnerClass2(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes, UDFContext udfContext)
+    {
+        super(returnDataType, argDataTypes, udfContext);
+    }
+
+    protected ByteBuffer executeImpl(int protocolVersion, List<ByteBuffer> params)
+    {
+        // this is fine - an anonymous class inside a method body is not flagged
+        new Runnable() {
+            public void run()
+            {
+
+            }
+        }.run();
+        return null;
+    }
+}
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithStaticInitializer.java b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithStaticInitializer.java
index 64470ca..3c958e8 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithStaticInitializer.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithStaticInitializer.java
@@ -23,15 +23,16 @@
 
 import com.datastax.driver.core.TypeCodec;
 import org.apache.cassandra.cql3.functions.JavaUDF;
+import org.apache.cassandra.cql3.functions.UDFContext;
 
 /**
  * Used by {@link org.apache.cassandra.cql3.validation.entities.UFVerifierTest}.
  */
 public final class ClassWithStaticInitializer extends JavaUDF
 {
-    public ClassWithStaticInitializer(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes)
+    public ClassWithStaticInitializer(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes, UDFContext udfContext)
     {
-        super(returnDataType, argDataTypes);
+        super(returnDataType, argDataTypes, udfContext);
     }
 
     protected ByteBuffer executeImpl(int protocolVersion, List<ByteBuffer> params)
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithStaticInnerClass.java b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithStaticInnerClass.java
new file mode 100644
index 0000000..fada145
--- /dev/null
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/ClassWithStaticInnerClass.java
@@ -0,0 +1,47 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.cql3.validation.entities.udfverify;
+
+import java.nio.ByteBuffer;
+import java.util.List;
+
+import com.datastax.driver.core.TypeCodec;
+import org.apache.cassandra.cql3.functions.JavaUDF;
+import org.apache.cassandra.cql3.functions.UDFContext;
+
+/**
+ * Used by {@link org.apache.cassandra.cql3.validation.entities.UFVerifierTest}.
+ */
+public final class ClassWithStaticInnerClass extends JavaUDF
+{
+    public ClassWithStaticInnerClass(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes, UDFContext udfContext)
+    {
+        super(returnDataType, argDataTypes, udfContext);
+    }
+
+    protected ByteBuffer executeImpl(int protocolVersion, List<ByteBuffer> params)
+    {
+        return null;
+    }
+
+    // this is NOT fine - the verifier reports "class declared as inner class"
+    static final class ClassWithStaticInner_Inner
+    {
+    }
+}
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/GoodClass.java b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/GoodClass.java
index e3bc1e2..eb25f72 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/GoodClass.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/GoodClass.java
@@ -23,15 +23,16 @@
 
 import com.datastax.driver.core.TypeCodec;
 import org.apache.cassandra.cql3.functions.JavaUDF;
+import org.apache.cassandra.cql3.functions.UDFContext;
 
 /**
  * Used by {@link org.apache.cassandra.cql3.validation.entities.UFVerifierTest}.
  */
 public final class GoodClass extends JavaUDF
 {
-    public GoodClass(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes)
+    public GoodClass(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes, UDFContext udfContext)
     {
-        super(returnDataType, argDataTypes);
+        super(returnDataType, argDataTypes, udfContext);
     }
 
     protected ByteBuffer executeImpl(int protocolVersion, List<ByteBuffer> params)
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/UseOfSynchronized.java b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/UseOfSynchronized.java
index 2927b3e..bbbc823 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/UseOfSynchronized.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/UseOfSynchronized.java
@@ -23,15 +23,16 @@
 
 import com.datastax.driver.core.TypeCodec;
 import org.apache.cassandra.cql3.functions.JavaUDF;
+import org.apache.cassandra.cql3.functions.UDFContext;
 
 /**
  * Used by {@link org.apache.cassandra.cql3.validation.entities.UFVerifierTest}.
  */
 public final class UseOfSynchronized extends JavaUDF
 {
-    public UseOfSynchronized(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes)
+    public UseOfSynchronized(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes, UDFContext udfContext)
     {
-        super(returnDataType, argDataTypes);
+        super(returnDataType, argDataTypes, udfContext);
     }
 
     protected ByteBuffer executeImpl(int protocolVersion, List<ByteBuffer> params)
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/UseOfSynchronizedWithNotify.java b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/UseOfSynchronizedWithNotify.java
index 7ef2e1c..07c70c7 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/UseOfSynchronizedWithNotify.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/UseOfSynchronizedWithNotify.java
@@ -23,15 +23,16 @@
 
 import com.datastax.driver.core.TypeCodec;
 import org.apache.cassandra.cql3.functions.JavaUDF;
+import org.apache.cassandra.cql3.functions.UDFContext;
 
 /**
  * Used by {@link org.apache.cassandra.cql3.validation.entities.UFVerifierTest}.
  */
 public final class UseOfSynchronizedWithNotify extends JavaUDF
 {
-    public UseOfSynchronizedWithNotify(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes)
+    public UseOfSynchronizedWithNotify(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes, UDFContext udfContext)
     {
-        super(returnDataType, argDataTypes);
+        super(returnDataType, argDataTypes, udfContext);
     }
 
     protected ByteBuffer executeImpl(int protocolVersion, List<ByteBuffer> params)
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/UseOfSynchronizedWithNotifyAll.java b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/UseOfSynchronizedWithNotifyAll.java
index 50a3da8..529c995 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/UseOfSynchronizedWithNotifyAll.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/UseOfSynchronizedWithNotifyAll.java
@@ -23,15 +23,16 @@
 
 import com.datastax.driver.core.TypeCodec;
 import org.apache.cassandra.cql3.functions.JavaUDF;
+import org.apache.cassandra.cql3.functions.UDFContext;
 
 /**
  * Used by {@link org.apache.cassandra.cql3.validation.entities.UFVerifierTest}.
  */
 public final class UseOfSynchronizedWithNotifyAll extends JavaUDF
 {
-    public UseOfSynchronizedWithNotifyAll(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes)
+    public UseOfSynchronizedWithNotifyAll(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes, UDFContext udfContext)
     {
-        super(returnDataType, argDataTypes);
+        super(returnDataType, argDataTypes, udfContext);
     }
 
     protected ByteBuffer executeImpl(int protocolVersion, List<ByteBuffer> params)
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/UseOfSynchronizedWithWait.java b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/UseOfSynchronizedWithWait.java
index 135c550..6e39813 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/UseOfSynchronizedWithWait.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/UseOfSynchronizedWithWait.java
@@ -23,15 +23,16 @@
 
 import com.datastax.driver.core.TypeCodec;
 import org.apache.cassandra.cql3.functions.JavaUDF;
+import org.apache.cassandra.cql3.functions.UDFContext;
 
 /**
  * Used by {@link org.apache.cassandra.cql3.validation.entities.UFVerifierTest}.
  */
 public final class UseOfSynchronizedWithWait extends JavaUDF
 {
-    public UseOfSynchronizedWithWait(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes)
+    public UseOfSynchronizedWithWait(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes, UDFContext udfContext)
     {
-        super(returnDataType, argDataTypes);
+        super(returnDataType, argDataTypes, udfContext);
     }
 
     protected ByteBuffer executeImpl(int protocolVersion, List<ByteBuffer> params)
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/UseOfSynchronizedWithWaitL.java b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/UseOfSynchronizedWithWaitL.java
index 4e49e5b..ac29211 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/UseOfSynchronizedWithWaitL.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/UseOfSynchronizedWithWaitL.java
@@ -23,15 +23,16 @@
 
 import com.datastax.driver.core.TypeCodec;
 import org.apache.cassandra.cql3.functions.JavaUDF;
+import org.apache.cassandra.cql3.functions.UDFContext;
 
 /**
  * Used by {@link org.apache.cassandra.cql3.validation.entities.UFVerifierTest}.
  */
 public final class UseOfSynchronizedWithWaitL extends JavaUDF
 {
-    public UseOfSynchronizedWithWaitL(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes)
+    public UseOfSynchronizedWithWaitL(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes, UDFContext udfContext)
     {
-        super(returnDataType, argDataTypes);
+        super(returnDataType, argDataTypes, udfContext);
     }
 
     protected ByteBuffer executeImpl(int protocolVersion, List<ByteBuffer> params)
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/UseOfSynchronizedWithWaitLI.java b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/UseOfSynchronizedWithWaitLI.java
index 6770e7a..3b9ce8b 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/UseOfSynchronizedWithWaitLI.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/UseOfSynchronizedWithWaitLI.java
@@ -23,15 +23,16 @@
 
 import com.datastax.driver.core.TypeCodec;
 import org.apache.cassandra.cql3.functions.JavaUDF;
+import org.apache.cassandra.cql3.functions.UDFContext;
 
 /**
  * Used by {@link org.apache.cassandra.cql3.validation.entities.UFVerifierTest}.
  */
 public final class UseOfSynchronizedWithWaitLI extends JavaUDF
 {
-    public UseOfSynchronizedWithWaitLI(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes)
+    public UseOfSynchronizedWithWaitLI(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes, UDFContext udfContext)
     {
-        super(returnDataType, argDataTypes);
+        super(returnDataType, argDataTypes, udfContext);
     }
 
     protected ByteBuffer executeImpl(int protocolVersion, List<ByteBuffer> params)
diff --git a/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/UsingMapEntry.java b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/UsingMapEntry.java
new file mode 100644
index 0000000..5091dc1
--- /dev/null
+++ b/test/unit/org/apache/cassandra/cql3/validation/entities/udfverify/UsingMapEntry.java
@@ -0,0 +1,50 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.cql3.validation.entities.udfverify;
+
+import java.nio.ByteBuffer;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+
+import com.datastax.driver.core.TypeCodec;
+import org.apache.cassandra.cql3.functions.JavaUDF;
+import org.apache.cassandra.cql3.functions.UDFContext;
+
+/**
+ * Used by {@link org.apache.cassandra.cql3.validation.entities.UFVerifierTest}.
+ */
+public final class UsingMapEntry extends JavaUDF
+{
+    public UsingMapEntry(TypeCodec<Object> returnDataType, TypeCodec<Object>[] argDataTypes, UDFContext udfContext)
+    {
+        super(returnDataType, argDataTypes, udfContext);
+    }
+
+    protected ByteBuffer executeImpl(int protocolVersion, List<ByteBuffer> params)
+    {
+        Map<String, String> map = new HashMap<>();
+        // using Map.Entry here counts as an "inner class usage" for the UDF verifier
+        for (Map.Entry<String, String> stringStringEntry : map.entrySet())
+        {
+
+        }
+        return null;
+    }
+}
diff --git a/test/unit/org/apache/cassandra/cql3/validation/miscellaneous/CrcCheckChanceTest.java b/test/unit/org/apache/cassandra/cql3/validation/miscellaneous/CrcCheckChanceTest.java
index d059f7d..2760ae5 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/miscellaneous/CrcCheckChanceTest.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/miscellaneous/CrcCheckChanceTest.java
@@ -30,8 +30,8 @@
 import org.apache.cassandra.db.Keyspace;
 import org.apache.cassandra.db.compaction.CompactionInterruptedException;
 import org.apache.cassandra.db.compaction.CompactionManager;
-import org.apache.cassandra.io.compress.CompressedRandomAccessReader;
 import org.apache.cassandra.io.sstable.format.SSTableReader;
+import org.apache.cassandra.io.util.RandomAccessReader;
 import org.apache.cassandra.utils.FBUtilities;
 
 
@@ -153,8 +153,8 @@
         // note: only compressed files currently perform crc checks, so only the dfile reader is relevant here
         SSTableReader baseSSTable = cfs.getLiveSSTables().iterator().next();
         SSTableReader idxSSTable = indexCfs.getLiveSSTables().iterator().next();
-        try (CompressedRandomAccessReader baseDataReader = (CompressedRandomAccessReader)baseSSTable.openDataReader();
-             CompressedRandomAccessReader idxDataReader = (CompressedRandomAccessReader)idxSSTable.openDataReader())
+        try (RandomAccessReader baseDataReader = baseSSTable.openDataReader();
+             RandomAccessReader idxDataReader = idxSSTable.openDataReader())
         {
             Assert.assertEquals(0.03, baseDataReader.getCrcCheckChance());
             Assert.assertEquals(0.03, idxDataReader.getCrcCheckChance());
diff --git a/test/unit/org/apache/cassandra/cql3/validation/miscellaneous/SSTablesIteratedTest.java b/test/unit/org/apache/cassandra/cql3/validation/miscellaneous/SSTablesIteratedTest.java
new file mode 100644
index 0000000..ad7bd15
--- /dev/null
+++ b/test/unit/org/apache/cassandra/cql3/validation/miscellaneous/SSTablesIteratedTest.java
@@ -0,0 +1,475 @@
+/*
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+package org.apache.cassandra.cql3.validation.miscellaneous;
+
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.cql3.CQLTester;
+import org.apache.cassandra.db.ColumnFamilyStore;
+import org.apache.cassandra.metrics.ClearableHistogram;
+
+/**
+ * Tests that check how many sstables are accessed by CQL queries with a LIMIT specified;
+ * see CASSANDRA-8180.
+ */
+public class SSTablesIteratedTest extends CQLTester
+{
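+    /** Runs the given query, verifies the returned rows, and asserts the maximum number of sstables read for the query. */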
+    private void executeAndCheck(String query, int numSSTables, Object[]... rows) throws Throwable
+    {
+        ColumnFamilyStore cfs = getCurrentColumnFamilyStore();
+
+        ((ClearableHistogram) cfs.metric.sstablesPerReadHistogram.cf).clear(); // resets counts
+
+        assertRows(execute(query), rows);
+
+        assertEquals(numSSTables, cfs.metric.sstablesPerReadHistogram.cf.getSnapshot().getMax()); // max sstables read
+    }
+
+    @Override
+    protected String createTable(String query)
+    {
+        String ret = super.createTable(query);
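+        // keep each flushed sstable separate so the per-read sstable counts stay deterministic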
+        disableCompaction();
+        return ret;
+    }
+
+    @Test
+    public void testSSTablesOnlyASC() throws Throwable
+    {
+        createTable("CREATE TABLE %s (id int, col int, val text, PRIMARY KEY (id, col)) WITH CLUSTERING ORDER BY (col ASC)");
+
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 10, "10");
+        flush();
+
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 20, "20");
+        flush();
+
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 30, "30");
+        flush();
+
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 1", 1, row(1, 10, "10"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 2", 2, row(1, 10, "10"), row(1, 20, "20"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 3", 3, row(1, 10, "10"), row(1, 20, "20"), row(1, 30, "30"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1", 3, row(1, 10, "10"), row(1, 20, "20"), row(1, 30, "30"));
+
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col > 25 LIMIT 1", 1, row(1, 30, "30"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col < 40 LIMIT 1", 1, row(1, 10, "10"));
+    }
+
+    @Test
+    public void testMixedMemtableSStablesASC() throws Throwable
+    {
+        createTable("CREATE TABLE %s (id int, col int, val text, PRIMARY KEY (id, col)) WITH CLUSTERING ORDER BY (col ASC)");
+
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 30, "30");
+        flush();
+
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 20, "20");
+        flush();
+
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 10, "10");
+
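+        // the (1, 10) row only exists in the memtable, so LIMIT 1 should not touch any sstable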
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 1", 0, row(1, 10, "10"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 2", 1, row(1, 10, "10"), row(1, 20, "20"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 3", 2, row(1, 10, "10"), row(1, 20, "20"), row(1, 30, "30"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1", 2, row(1, 10, "10"), row(1, 20, "20"), row(1, 30, "30"));
+
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col > 25 LIMIT 1", 1, row(1, 30, "30"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col < 40 LIMIT 1", 0, row(1, 10, "10"));
+    }
+
+    @Test
+    public void testOverlappingSStablesASC() throws Throwable
+    {
+        createTable("CREATE TABLE %s (id int, col int, val text, PRIMARY KEY (id, col)) WITH CLUSTERING ORDER BY (col ASC)");
+
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 10, "10");
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 30, "30");
+        flush();
+
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 20, "20");
+        flush();
+
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 1", 1, row(1, 10, "10"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 2", 2, row(1, 10, "10"), row(1, 20, "20"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 3", 2, row(1, 10, "10"), row(1, 20, "20"), row(1, 30, "30"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1", 2, row(1, 10, "10"), row(1, 20, "20"), row(1, 30, "30"));
+
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col > 25 LIMIT 1", 1, row(1, 30, "30"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col < 40 LIMIT 1", 1, row(1, 10, "10"));
+    }
+
+    @Test
+    public void testSSTablesOnlyDESC() throws Throwable
+    {
+        createTable("CREATE TABLE %s (id int, col int, val text, PRIMARY KEY (id, col)) WITH CLUSTERING ORDER BY (col DESC)");
+
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 10, "10");
+        flush();
+
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 20, "20");
+        flush();
+
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 30, "30");
+        flush();
+
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 1", 1, row(1, 30, "30"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 2", 2, row(1, 30, "30"), row(1, 20, "20"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 3", 3, row(1, 30, "30"), row(1, 20, "20"), row(1, 10, "10"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1", 3, row(1, 30, "30"), row(1, 20, "20"), row(1, 10, "10"));
+
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col > 25 LIMIT 1", 1, row(1, 30, "30"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col < 40 LIMIT 1", 1, row(1, 30, "30"));
+    }
+
+    @Test
+    public void testMixedMemtableSStablesDESC() throws Throwable
+    {
+        createTable("CREATE TABLE %s (id int, col int, val text, PRIMARY KEY (id, col)) WITH CLUSTERING ORDER BY (col DESC)");
+
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 10, "10");
+        flush();
+
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 20, "20");
+        flush();
+
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 30, "30");
+
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 1", 0, row(1, 30, "30"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 2", 1, row(1, 30, "30"), row(1, 20, "20"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 3", 2, row(1, 30, "30"), row(1, 20, "20"), row(1, 10, "10"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1", 2, row(1, 30, "30"), row(1, 20, "20"), row(1, 10, "10"));
+
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col > 25 LIMIT 1", 0, row(1, 30, "30"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col < 40 LIMIT 1", 0, row(1, 30, "30"));
+    }
+
+    @Test
+    public void testOverlappingSStablesDESC() throws Throwable
+    {
+        createTable("CREATE TABLE %s (id int, col int, val text, PRIMARY KEY (id, col)) WITH CLUSTERING ORDER BY (col DESC)");
+
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 10, "10");
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 30, "30");
+        flush();
+
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 20, "20");
+        flush();
+
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 1", 1, row(1, 30, "30"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 2", 2, row(1, 30, "30"), row(1, 20, "20"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 3", 2, row(1, 30, "30"), row(1, 20, "20"), row(1, 10, "10"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1", 2, row(1, 30, "30"), row(1, 20, "20"), row(1, 10, "10"));
+
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col > 25 LIMIT 1", 1, row(1, 30, "30"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col < 40 LIMIT 1", 1, row(1, 30, "30"));
+    }
+
+    @Test
+    public void testDeletionOnDifferentSSTables() throws Throwable
+    {
+        createTable("CREATE TABLE %s (id int, col int, val text, PRIMARY KEY (id, col)) WITH CLUSTERING ORDER BY (col DESC)");
+
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 10, "10");
+        flush();
+
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 20, "20");
+        flush();
+
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 30, "30");
+        flush();
+
+        execute("DELETE FROM %s WHERE id=1 and col=30");
+        flush();
+
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 1", 3, row(1, 20, "20"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 2", 4, row(1, 20, "20"), row(1, 10, "10"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 3", 4, row(1, 20, "20"), row(1, 10, "10"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1", 4, row(1, 20, "20"), row(1, 10, "10"));
+
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col > 25 LIMIT 1", 2);
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col < 40 LIMIT 1", 3, row(1, 20, "20"));
+    }
+
+    @Test
+    public void testDeletionOnSameSSTable() throws Throwable
+    {
+        createTable("CREATE TABLE %s (id int, col int, val text, PRIMARY KEY (id, col)) WITH CLUSTERING ORDER BY (col DESC)");
+
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 10, "10");
+        flush();
+
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 20, "20");
+        flush();
+
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 30, "30");
+        execute("DELETE FROM %s WHERE id=1 and col=30");
+        flush();
+
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 1", 2, row(1, 20, "20"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 2", 3, row(1, 20, "20"), row(1, 10, "10"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 3", 3, row(1, 20, "20"), row(1, 10, "10"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1", 3, row(1, 20, "20"), row(1, 10, "10"));
+
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col > 25 LIMIT 1", 1);
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col < 40 LIMIT 1", 2, row(1, 20, "20"));
+    }
+
+    @Test
+    public void testDeletionOnMemTable() throws Throwable
+    {
+        createTable("CREATE TABLE %s (id int, col int, val text, PRIMARY KEY (id, col)) WITH CLUSTERING ORDER BY (col DESC)");
+
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 10, "10");
+        flush();
+
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 20, "20");
+        flush();
+
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 30, "30");
+        execute("DELETE FROM %s WHERE id=1 and col=30");
+
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 1", 1, row(1, 20, "20"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 2", 2, row(1, 20, "20"), row(1, 10, "10"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 3", 2, row(1, 20, "20"), row(1, 10, "10"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1", 2, row(1, 20, "20"), row(1, 10, "10"));
+
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col > 25 LIMIT 1", 0);
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col < 40 LIMIT 1", 1, row(1, 20, "20"));
+    }
+
+    @Test
+    public void testDeletionOnIndexedSSTableDESC() throws Throwable
+    {
+        testDeletionOnIndexedSSTableDESC(true);
+        testDeletionOnIndexedSSTableDESC(false);
+    }
+
+    private void testDeletionOnIndexedSSTableDESC(boolean deleteWithRange) throws Throwable
+    {
+        // reduce the column index size so that columns get indexed during flush
+        DatabaseDescriptor.setColumnIndexSize(1);
+
+        createTable("CREATE TABLE %s (id int, col int, val text, PRIMARY KEY (id, col)) WITH CLUSTERING ORDER BY (col DESC)");
+
+        for (int i = 1; i <= 1000; i++)
+        {
+            execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, i, Integer.toString(i));
+        }
+        flush();
+
+        Object[][] allRows = new Object[1000][];
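+        // rows 1001..2000 survive the deletion below; store them in descending clustering order to match the DESC results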
+        for (int i = 1001; i <= 2000; i++)
+        {
+            execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, i, Integer.toString(i));
+            allRows[2000 - i] = row(1, i, Integer.toString(i));
+        }
+
+        if (deleteWithRange)
+        {
+            execute("DELETE FROM %s WHERE id=1 and col <= ?", 1000);
+        }
+        else
+        {
+            for (int i = 1; i <= 1000; i++)
+                execute("DELETE FROM %s WHERE id=1 and col = ?", i);
+        }
+        flush();
+
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 1", 1, row(1, 2000, "2000"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 2", 1, row(1, 2000, "2000"), row(1, 1999, "1999"));
+
+        executeAndCheck("SELECT * FROM %s WHERE id=1", 2, allRows);
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col > 1000 LIMIT 1", 1, row(1, 2000, "2000"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col <= 2000 LIMIT 1", 1, row(1, 2000, "2000"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col > 1000", 1, allRows);
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col <= 2000", 2, allRows);
+    }
+
+    @Test
+    public void testDeletionOnIndexedSSTableASC() throws Throwable
+    {
+        testDeletionOnIndexedSSTableASC(true);
+        testDeletionOnIndexedSSTableASC(false);
+    }
+
+    private void testDeletionOnIndexedSSTableASC(boolean deleteWithRange) throws Throwable
+    {
+        // reduce the column index size so that columns get indexed during flush
+        DatabaseDescriptor.setColumnIndexSize(1);
+
+        createTable("CREATE TABLE %s (id int, col int, val text, PRIMARY KEY (id, col)) WITH CLUSTERING ORDER BY (col ASC)");
+
+        for (int i = 1; i <= 1000; i++)
+        {
+            execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, i, Integer.toString(i));
+        }
+        flush();
+
+        Object[][] allRows = new Object[1000][];
+        for (int i = 1001; i <= 2000; i++)
+        {
+            execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, i, Integer.toString(i));
+            allRows[i - 1001] = row(1, i, Integer.toString(i));
+        }
+        flush();
+
+        if (deleteWithRange)
+        {
+            execute("DELETE FROM %s WHERE id =1 and col <= ?", 1000);
+        }
+        else
+        {
+            for (int i = 1; i <= 1000; i++)
+                execute("DELETE FROM %s WHERE id=1 and col = ?", i);
+        }
+        flush();
+
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 1", 3, row(1, 1001, "1001"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 2", 3, row(1, 1001, "1001"), row(1, 1002, "1002"));
+
+        executeAndCheck("SELECT * FROM %s WHERE id=1", 3, allRows);
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col > 1000 LIMIT 1", 2, row(1, 1001, "1001"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col <= 2000 LIMIT 1", 3, row(1, 1001, "1001"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col > 1000", 2, allRows);
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col <= 2000", 3, allRows);
+    }
+
+    @Test
+    public void testDeletionOnOverlappingIndexedSSTable() throws Throwable
+    {
+        testDeletionOnOverlappingIndexedSSTable(true);
+        testDeletionOnOverlappingIndexedSSTable(false);
+    }
+
+    private void testDeletionOnOverlappingIndexedSSTable(boolean deleteWithRange) throws Throwable
+    {
+        // reduce the column index size so that columns get indexed during flush
+        DatabaseDescriptor.setColumnIndexSize(1);
+
+        createTable("CREATE TABLE %s (id int, col int, val1 text, val2 text, PRIMARY KEY (id, col)) WITH CLUSTERING ORDER BY (col ASC)");
+
+        for (int i = 1; i <= 500; i++)
+        {
+            if (i % 2 == 0)
+                execute("INSERT INTO %s (id, col, val1) VALUES (?, ?, ?)", 1, i, Integer.toString(i));
+            else
+                execute("INSERT INTO %s (id, col, val1, val2) VALUES (?, ?, ?, ?)", 1, i, Integer.toString(i), Integer.toString(i));
+        }
+
+        for (int i = 1001; i <= 1500; i++)
+        {
+            if (i % 2 == 0)
+                execute("INSERT INTO %s (id, col, val1) VALUES (?, ?, ?)", 1, i, Integer.toString(i));
+            else
+                execute("INSERT INTO %s (id, col, val1, val2) VALUES (?, ?, ?, ?)", 1, i, Integer.toString(i), Integer.toString(i));
+        }
+
+        flush();
+
+        for (int i = 501; i <= 1000; i++)
+        {
+            if (i % 2 == 0)
+                execute("INSERT INTO %s (id, col, val1) VALUES (?, ?, ?)", 1, i, Integer.toString(i));
+            else
+                execute("INSERT INTO %s (id, col, val1, val2) VALUES (?, ?, ?, ?)", 1, i, Integer.toString(i), Integer.toString(i));
+        }
+
+        for (int i = 1501; i <= 2000; i++)
+        {
+            if (i % 2 == 0)
+                execute("INSERT INTO %s (id, col, val1) VALUES (?, ?, ?)", 1, i, Integer.toString(i));
+            else
+                execute("INSERT INTO %s (id, col, val1, val2) VALUES (?, ?, ?, ?)", 1, i, Integer.toString(i), Integer.toString(i));
+        }
+
+        if (deleteWithRange)
+        {
+            execute("DELETE FROM %s WHERE id=1 and col > ? and col <= ?", 250, 750);
+        }
+        else
+        {
+            for (int i = 251; i <= 750; i++)
+                execute("DELETE FROM %s WHERE id=1 and col = ?", i);
+        }
+
+        flush();
+
+        Object[][] allRows = new Object[1500][]; // non-deleted rows
+        for (int i = 1; i <= 2000; i++)
+        {
+            if (i > 250 && i <= 750)
+                continue; // skip deleted records
+
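+            // surviving rows 1..250 map to indexes 0..249 and rows 751..2000 map to indexes 250..1499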
+            int idx = (i <= 250 ? i - 1 : i - 501);
+
+            if (i % 2 == 0)
+                allRows[idx] = row(1, i, Integer.toString(i), null);
+            else
+                allRows[idx] = row(1, i, Integer.toString(i), Integer.toString(i));
+        }
+
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 1", 2, row(1, 1, "1", "1"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 LIMIT 2", 2, row(1, 1, "1", "1"), row(1, 2, "2", null));
+
+        executeAndCheck("SELECT * FROM %s WHERE id=1", 2, allRows);
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col > 1000 LIMIT 1", 2, row(1, 1001, "1001", "1001"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col <= 2000 LIMIT 1", 2, row(1, 1, "1", "1"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col > 500 LIMIT 1", 2, row(1, 751, "751", "751"));
+        executeAndCheck("SELECT * FROM %s WHERE id=1 AND col <= 500 LIMIT 1", 2, row(1, 1, "1", "1"));
+    }
+
+    @Test
+    public void testMultiplePartitionsDESC() throws Throwable
+    {
+        createTable("CREATE TABLE %s (id int, col int, val text, PRIMARY KEY (id, col)) WITH CLUSTERING ORDER BY (col DESC)");
+
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 10, "10");
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 2, 10, "10");
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 3, 10, "10");
+        flush();
+
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 20, "20");
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 2, 20, "20");
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 3, 20, "20");
+        flush();
+
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 1, 30, "30");
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 2, 30, "30");
+        execute("INSERT INTO %s (id, col, val) VALUES (?, ?, ?)", 3, 30, "30");
+        flush();
+
+        for (int i = 1; i <= 3; i++)
+        {
+            String base = "SELECT * FROM %s ";
+
+            executeAndCheck(base + String.format("WHERE id=%d LIMIT 1", i), 1, row(i, 30, "30"));
+            executeAndCheck(base + String.format("WHERE id=%d LIMIT 2", i), 2, row(i, 30, "30"), row(i, 20, "20"));
+            executeAndCheck(base + String.format("WHERE id=%d LIMIT 3", i), 3, row(i, 30, "30"), row(i, 20, "20"), row(i, 10, "10"));
+            executeAndCheck(base + String.format("WHERE id=%d", i), 3, row(i, 30, "30"), row(i, 20, "20"), row(i, 10, "10"));
+
+            executeAndCheck(base + String.format("WHERE id=%d AND col > 25 LIMIT 1", i), 1, row(i, 30, "30"));
+            executeAndCheck(base + String.format("WHERE id=%d AND col < 40 LIMIT 1", i), 1, row(i, 30, "30"));
+        }
+    }
+}
diff --git a/test/unit/org/apache/cassandra/cql3/validation/operations/AggregationTest.java b/test/unit/org/apache/cassandra/cql3/validation/operations/AggregationTest.java
index 411d5ee..24a9528 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/operations/AggregationTest.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/operations/AggregationTest.java
@@ -126,6 +126,7 @@
         assertRows(execute("SELECT COUNT(b), count(c), count(e), count(f) FROM %s LIMIT 2"), row(4L, 3L, 3L, 3L));
         assertRows(execute("SELECT COUNT(b), count(c), count(e), count(f) FROM %s WHERE a = 1 LIMIT 2"),
                    row(4L, 3L, 3L, 3L));
+        assertRows(execute("SELECT AVG(CAST(b AS double)) FROM %s"), row(11.0/4));
     }
 
     @Test
@@ -147,9 +148,6 @@
         assertColumnNames(execute("SELECT COUNT(1) as myCount FROM %s"), "mycount");
         assertRows(execute("SELECT COUNT(1) as myCount FROM %s"), row(0L));
 
-        // Test invalid call
-        assertInvalidSyntaxMessage("Only COUNT(1) is supported, got COUNT(2)", "SELECT COUNT(2) FROM %s");
-
         // Test with other aggregates
         assertColumnNames(execute("SELECT COUNT(*), max(b), b FROM %s"), "count", "system.max(b)", "b");
         assertRows(execute("SELECT COUNT(*), max(b), b  FROM %s"), row(0L, null, null));
@@ -249,7 +247,7 @@
         assertRows(execute("SELECT count(b.x), max(b.x) as max, b.x, c.x as first FROM %s"),
                    row(3L, 8, 2, null));
 
-        assertInvalidMessage("Invalid field selection: max(b) of type blob is not a user type",
+        assertInvalidMessage("Invalid field selection: system.max(b) of type blob is not a user type",
                              "SELECT max(b).x as max FROM %s");
     }
 
@@ -352,7 +350,6 @@
 
         assertInvalidSyntax("SELECT max(b), max(c) FROM %s WHERE max(a) = 1");
         assertInvalidMessage("aggregate functions cannot be used as arguments of aggregate functions", "SELECT max(sum(c)) FROM %s");
-        assertInvalidSyntax("SELECT COUNT(2) FROM %s");
     }
 
     @Test
diff --git a/test/unit/org/apache/cassandra/cql3/validation/operations/AlterTest.java b/test/unit/org/apache/cassandra/cql3/validation/operations/AlterTest.java
index 509aeac..bb4bf48 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/operations/AlterTest.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/operations/AlterTest.java
@@ -323,6 +323,87 @@
         assertInvalidThrow(InvalidRequestException.class, "ALTER TABLE %s ALTER column1 TYPE ascii");
     }
 
+    /*
+     * Test case to check addition of one column
+     */
+    @Test
+    public void testAlterAddOneColumn() throws Throwable
+    {
+        createTable("CREATE TABLE IF NOT EXISTS %s (id int, name text, PRIMARY KEY (id))");
+        alterTable("ALTER TABLE %s add mail text;");
+
+        assertColumnNames(execute("SELECT * FROM %s"), "id", "mail", "name");
+    }
+
+    /*
+     * Test case to check addition of more than one column
+     */
+    @Test
+    public void testAlterAddMultiColumn() throws Throwable
+    {
+        createTable("CREATE TABLE IF NOT EXISTS %s (id int, yearofbirth int, PRIMARY KEY (id))");
+        alterTable("ALTER TABLE %s add (firstname text, password blob, lastname text, \"SOME escaped col\" bigint)");
+
+        assertColumnNames(execute("SELECT * FROM %s"), "id", "SOME escaped col", "firstname", "lastname", "password", "yearofbirth");
+    }
+
+    /*
+     *  Should throw SyntaxException if multiple columns are added using the wrong syntax.
+     *  Expected syntax: ALTER TABLE t1 ADD (c1 datatype, c2 datatype, c3 datatype)
+     */
+    @Test(expected = SyntaxException.class)
+    public void testAlterAddMultiColumnWithoutBraces() throws Throwable
+    {
+        execute("ALTER TABLE %s.users add lastname text, password blob, yearofbirth int;");
+    }
+
+    /*
+     *  Test case to check deletion of one column
+     */
+    @Test
+    public void testAlterDropOneColumn() throws Throwable
+    {
+        createTable("CREATE TABLE IF NOT EXISTS %s (id text, telephone int, yearofbirth int, PRIMARY KEY (id))");
+        alterTable("ALTER TABLE %s drop telephone");
+
+        assertColumnNames(execute("SELECT * FROM %s"), "id", "yearofbirth");
+    }
+
+    /*
+     * Test case to check deletion of more than one column
+     */
+    @Test
+    public void testAlterDropMultiColumn() throws Throwable
+    {
+        createTable("CREATE TABLE IF NOT EXISTS %s (id text, address text, telephone int, yearofbirth int, \"SOME escaped col\" bigint, PRIMARY KEY (id))");
+        alterTable("ALTER TABLE %s drop (address, telephone, \"SOME escaped col\");");
+
+        assertColumnNames(execute("SELECT * FROM %s"), "id", "yearofbirth");
+    }
+
+    /*
+     *  Should throw SyntaxException if multiple columns are dropped using the wrong syntax.
+     */
+    @Test(expected = SyntaxException.class)
+    public void testAlterDeletionColumnWithoutBraces() throws Throwable
+    {
+        execute("ALTER TABLE %s.users drop name,address;");
+    }
+
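+    /*
+     *  Adding or dropping the same column twice in a single statement should be rejected.
+     */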
+    @Test(expected = InvalidRequestException.class)
+    public void testAlterAddDuplicateColumn() throws Throwable
+    {
+        createTable("CREATE TABLE IF NOT EXISTS %s (id text, address text, telephone int, yearofbirth int, PRIMARY KEY (id))");
+        execute("ALTER TABLE %s add (salary int, salary int);");
+    }
+
+    @Test(expected = InvalidRequestException.class)
+    public void testAlterDropDuplicateColumn() throws Throwable
+    {
+        createTable("CREATE TABLE IF NOT EXISTS %s (id text, address text, telephone int, yearofbirth int, PRIMARY KEY (id))");
+        execute("ALTER TABLE %s drop (address, address);");
+    }
+
     @Test
     public void testAlterToBlob() throws Throwable
     {
diff --git a/test/unit/org/apache/cassandra/cql3/validation/operations/CreateTest.java b/test/unit/org/apache/cassandra/cql3/validation/operations/CreateTest.java
index 33a41d8..8f92403 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/operations/CreateTest.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/operations/CreateTest.java
@@ -23,7 +23,6 @@
 
 import org.junit.Test;
 
-
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.Schema;
 import org.apache.cassandra.cql3.CQLTester;
@@ -36,10 +35,10 @@
 import org.apache.cassandra.utils.ByteBufferUtil;
 
 import static java.lang.String.format;
-import static junit.framework.Assert.assertFalse;
-import static junit.framework.Assert.fail;
 import static junit.framework.Assert.assertEquals;
+import static junit.framework.Assert.assertFalse;
 import static junit.framework.Assert.assertTrue;
+import static junit.framework.Assert.fail;
 
 public class CreateTest extends CQLTester
 {
diff --git a/test/unit/org/apache/cassandra/cql3/validation/operations/DeleteTest.java b/test/unit/org/apache/cassandra/cql3/validation/operations/DeleteTest.java
index 814e822..9b92ebb 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/operations/DeleteTest.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/operations/DeleteTest.java
@@ -326,6 +326,27 @@
 
         assertEmpty(execute("select * from %s  where a=1 and b=1"));
     }
+
+    /** Test that two deleted rows for the same partition but in different sstables do not resurface */
+    @Test
+    public void testDeletedRowsDoNotResurface() throws Throwable
+    {
+        createTable("CREATE TABLE %s (a int, b int, c text, primary key (a, b))");
+        execute("INSERT INTO %s (a, b, c) VALUES(1, 1, '1')");
+        execute("INSERT INTO %s (a, b, c) VALUES(1, 2, '2')");
+        execute("INSERT INTO %s (a, b, c) VALUES(1, 3, '3')");
+        flush();
+
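+        // each deletion is flushed separately so the two row deletions end up in different sstables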
+        execute("DELETE FROM %s where a=1 and b = 1");
+        flush();
+
+        execute("DELETE FROM %s where a=1 and b = 2");
+        flush();
+
+        assertRows(execute("SELECT * FROM %s WHERE a = ?", 1),
+                   row(1, 3, "3"));
+    }
+
     @Test
     public void testDeleteWithNoClusteringColumns() throws Throwable
     {
@@ -376,10 +397,10 @@
                                  "DELETE FROM %s WHERE partitionKey = ? AND partitionKey = ?", 0, 1);
 
             // unknown identifiers
-            assertInvalidMessage("Unknown identifier unknown",
+            assertInvalidMessage("Undefined column name unknown",
                                  "DELETE unknown FROM %s WHERE partitionKey = ?", 0);
 
-            assertInvalidMessage("Undefined name partitionkey1 in where clause ('partitionkey1 = ?')",
+            assertInvalidMessage("Undefined column name partitionkey1",
                                  "DELETE FROM %s WHERE partitionKey1 = ?", 0);
 
             // Invalid operator in the where clause
@@ -465,13 +486,13 @@
                                  "DELETE FROM %s WHERE partitionKey = ? AND clustering = ? AND clustering = ?", 0, 1, 1);
 
             // unknown identifiers
-            assertInvalidMessage("Unknown identifier value1",
+            assertInvalidMessage("Undefined column name value1",
                                  "DELETE value1 FROM %s WHERE partitionKey = ? AND clustering = ?", 0, 1);
 
-            assertInvalidMessage("Undefined name partitionkey1 in where clause ('partitionkey1 = ?')",
+            assertInvalidMessage("Undefined column name partitionkey1",
                                  "DELETE FROM %s WHERE partitionKey1 = ? AND clustering = ?", 0, 1);
 
-            assertInvalidMessage("Undefined name clustering_3 in where clause ('clustering_3 = ?')",
+            assertInvalidMessage("Undefined column name clustering_3",
                                  "DELETE FROM %s WHERE partitionKey = ? AND clustering_3 = ?", 0, 1);
 
             // Invalid operator in the where clause
@@ -595,13 +616,13 @@
                                  "DELETE FROM %s WHERE partitionKey = ? AND clustering_1 = ? AND clustering_2 = ? AND clustering_1 = ?", 0, 1, 1, 1);
 
             // unknown identifiers
-            assertInvalidMessage("Unknown identifier value1",
+            assertInvalidMessage("Undefined column name value1",
                                  "DELETE value1 FROM %s WHERE partitionKey = ? AND clustering_1 = ? AND clustering_2 = ?", 0, 1, 1);
 
-            assertInvalidMessage("Undefined name partitionkey1 in where clause ('partitionkey1 = ?')",
+            assertInvalidMessage("Undefined column name partitionkey1",
                                  "DELETE FROM %s WHERE partitionKey1 = ? AND clustering_1 = ? AND clustering_2 = ?", 0, 1, 1);
 
-            assertInvalidMessage("Undefined name clustering_3 in where clause ('clustering_3 = ?')",
+            assertInvalidMessage("Undefined column name clustering_3",
                                  "DELETE FROM %s WHERE partitionKey = ? AND clustering_1 = ? AND clustering_3 = ?", 0, 1, 1);
 
             // Invalid operator in the where clause
@@ -618,6 +639,39 @@
     }
 
     @Test
+    public void testDeleteWithNonoverlappingRange() throws Throwable
+    {
+        createTable("CREATE TABLE %s (a int, b int, c text, primary key (a, b))");
+
+        for (int i = 0; i < 10; i++)
+            execute("INSERT INTO %s (a, b, c) VALUES(1, ?, 'abc')", i);
+        flush();
+
+        execute("DELETE FROM %s WHERE a=1 and b <= 3");
+        flush();
+
+        // this query does not overlap the tombstone range above and previously caused the deleted rows to resurface
+        assertEmpty(execute("SELECT * FROM %s WHERE a=1 and b <= 2"));
+    }
+
+    @Test
+    public void testDeleteWithIntermediateRangeAndOneClusteringColumn() throws Throwable
+    {
+        createTable("CREATE TABLE %s (a int, b int, c text, primary key (a, b))");
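+        // write a range tombstone, then a row inside that range, then the same range tombstone again in a later sstable;
+        // only the row outside the deleted range should survive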
+        execute("INSERT INTO %s (a, b, c) VALUES(1, 1, '1')");
+        execute("INSERT INTO %s (a, b, c) VALUES(1, 3, '3')");
+        execute("DELETE FROM %s where a=1 and b >= 2 and b <= 3");
+        execute("INSERT INTO %s (a, b, c) VALUES(1, 2, '2')");
+        flush();
+
+        execute("DELETE FROM %s where a=1 and b >= 2 and b <= 3");
+        flush();
+
+        assertRows(execute("SELECT * FROM %s WHERE a = ?", 1),
+                   row(1, 1, "1"));
+    }
+
+    @Test
     public void testDeleteWithRangeAndOneClusteringColumn() throws Throwable
     {
         testDeleteWithRangeAndOneClusteringColumn(false);
@@ -1052,12 +1106,6 @@
         assertRows(execute("SELECT * FROM %s"), row(0, null));
     }
 
-    private void flush(boolean forceFlush)
-    {
-        if (forceFlush)
-            flush();
-    }
-
     @Test
     public void testDeleteAndReverseQueries() throws Throwable
     {
diff --git a/test/unit/org/apache/cassandra/cql3/validation/operations/InsertTest.java b/test/unit/org/apache/cassandra/cql3/validation/operations/InsertTest.java
index a030613..9adcb62 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/operations/InsertTest.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/operations/InsertTest.java
@@ -18,9 +18,12 @@
 
 package org.apache.cassandra.cql3.validation.operations;
 
+import org.junit.Assert;
 import org.junit.Test;
 
 import org.apache.cassandra.cql3.CQLTester;
+import org.apache.cassandra.cql3.UntypedResultSet;
+import org.apache.cassandra.cql3.UntypedResultSet.Row;
 import org.apache.cassandra.exceptions.InvalidRequestException;
 
 public class InsertTest extends CQLTester
@@ -95,10 +98,10 @@
                              "INSERT INTO %s (partitionKey, clustering, clustering, value) VALUES (0, 0, 0, 2)");
 
         // unknown identifiers
-        assertInvalidMessage("Unknown identifier clusteringx",
+        assertInvalidMessage("Undefined column name clusteringx",
                              "INSERT INTO %s (partitionKey, clusteringx, value) VALUES (0, 0, 2)");
 
-        assertInvalidMessage("Unknown identifier valuex",
+        assertInvalidMessage("Undefined column name valuex",
                              "INSERT INTO %s (partitionKey, clustering, valuex) VALUES (0, 0, 2)");
     }
 
@@ -143,10 +146,10 @@
                              "INSERT INTO %s (partitionKey, clustering, clustering, value) VALUES (0, 0, 0, 2)");
 
         // unknown identifiers
-        assertInvalidMessage("Unknown identifier clusteringx",
+        assertInvalidMessage("Undefined column name clusteringx",
                              "INSERT INTO %s (partitionKey, clusteringx, value) VALUES (0, 0, 2)");
 
-        assertInvalidMessage("Unknown identifier valuex",
+        assertInvalidMessage("Undefined column name valuex",
                              "INSERT INTO %s (partitionKey, clustering, valuex) VALUES (0, 0, 2)");
     }
 
@@ -188,10 +191,10 @@
                              "INSERT INTO %s (partitionKey, clustering_1, clustering_1, clustering_2, value) VALUES (0, 0, 0, 0, 2)");
 
         // unknown identifiers
-        assertInvalidMessage("Unknown identifier clustering_1x",
+        assertInvalidMessage("Undefined column name clustering_1x",
                              "INSERT INTO %s (partitionKey, clustering_1x, clustering_2, value) VALUES (0, 0, 0, 2)");
 
-        assertInvalidMessage("Unknown identifier valuex",
+        assertInvalidMessage("Undefined column name valuex",
                              "INSERT INTO %s (partitionKey, clustering_1, clustering_2, valuex) VALUES (0, 0, 0, 2)");
     }
 
@@ -241,10 +244,10 @@
                              "INSERT INTO %s (partitionKey, clustering_1, clustering_1, clustering_2, value) VALUES (0, 0, 0, 0, 2)");
 
         // unknown identifiers
-        assertInvalidMessage("Unknown identifier clustering_1x",
+        assertInvalidMessage("Undefined column name clustering_1x",
                              "INSERT INTO %s (partitionKey, clustering_1x, clustering_2, value) VALUES (0, 0, 0, 2)");
 
-        assertInvalidMessage("Unknown identifier valuex",
+        assertInvalidMessage("Undefined column name valuex",
                              "INSERT INTO %s (partitionKey, clustering_1, clustering_2, valuex) VALUES (0, 0, 0, 2)");
     }
 
@@ -285,10 +288,32 @@
                              "INSERT INTO %s (partitionKey, clustering_2, staticValue) VALUES (0, 0, 'A')");
     }
 
-    private void flush(boolean forceFlush)
+    @Test
+    public void testInsertWithDefaultTtl() throws Throwable
     {
-        if (forceFlush)
-            flush();
+        final int secondsPerMinute = 60;
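+        // the table's default TTL applies when no TTL is specified, an explicit TTL overrides it,
+        // TTL 0 disables expiration, and an unset bind value falls back to the table default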
+        createTable("CREATE TABLE %s (a int PRIMARY KEY, b int) WITH default_time_to_live = " + (10 * secondsPerMinute));
+
+        execute("INSERT INTO %s (a, b) VALUES (1, 1)");
+        UntypedResultSet resultSet = execute("SELECT ttl(b) FROM %s WHERE a = 1");
+        Assert.assertEquals(1, resultSet.size());
+        Row row = resultSet.one();
+        Assert.assertTrue(row.getInt("ttl(b)") >= (9 * secondsPerMinute));
+
+        execute("INSERT INTO %s (a, b) VALUES (2, 2) USING TTL ?", (5 * secondsPerMinute));
+        resultSet = execute("SELECT ttl(b) FROM %s WHERE a = 2");
+        Assert.assertEquals(1, resultSet.size());
+        row = resultSet.one();
+        Assert.assertTrue(row.getInt("ttl(b)") <= (5 * secondsPerMinute));
+
+        execute("INSERT INTO %s (a, b) VALUES (3, 3) USING TTL ?", 0);
+        assertRows(execute("SELECT ttl(b) FROM %s WHERE a = 3"), row(new Object[]{null}));
+
+        execute("INSERT INTO %s (a, b) VALUES (4, 4) USING TTL ?", unset());
+        resultSet = execute("SELECT ttl(b) FROM %s WHERE a = 4");
+        Assert.assertEquals(1, resultSet.size());
+        row = resultSet.one();
+        Assert.assertTrue(row.getInt("ttl(b)") >= (9 * secondsPerMinute));
     }
 
     @Test
diff --git a/test/unit/org/apache/cassandra/cql3/validation/operations/InsertUpdateIfConditionTest.java b/test/unit/org/apache/cassandra/cql3/validation/operations/InsertUpdateIfConditionTest.java
index a1ee4f8..b132958 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/operations/InsertUpdateIfConditionTest.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/operations/InsertUpdateIfConditionTest.java
@@ -497,6 +497,377 @@
         assertRows(execute("INSERT INTO %s (partition, key, owner) VALUES ('a', 'c', 'x') IF NOT EXISTS"), row(true));
     }
 
+    @Test
+    public void testWholeUDT() throws Throwable
+    {
+        String typename = createType("CREATE TYPE %s (a int, b text)");
+        String myType = KEYSPACE + '.' + typename;
+
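+        // exercise the same IF conditions against both a non-frozen and a frozen UDT column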
+        for (boolean frozen : new boolean[] {false, true})
+        {
+            createTable(String.format("CREATE TABLE %%s (k int PRIMARY KEY, v %s)",
+                                      frozen
+                                      ? "frozen<" + myType + ">"
+                                      : myType));
+
+            Object v = userType("a", 0, "b", "abc");
+            execute("INSERT INTO %s (k, v) VALUES (0, ?)", v);
+
+            checkAppliesUDT("v = {a: 0, b: 'abc'}", v);
+            checkAppliesUDT("v != null", v);
+            checkAppliesUDT("v != {a: 1, b: 'abc'}", v);
+            checkAppliesUDT("v != {a: 0, b: 'def'}", v);
+            checkAppliesUDT("v > {a: -1, b: 'abc'}", v);
+            checkAppliesUDT("v > {a: 0, b: 'aaa'}", v);
+            checkAppliesUDT("v > {a: 0}", v);
+            checkAppliesUDT("v >= {a: 0, b: 'aaa'}", v);
+            checkAppliesUDT("v >= {a: 0, b: 'abc'}", v);
+            checkAppliesUDT("v < {a: 0, b: 'zzz'}", v);
+            checkAppliesUDT("v < {a: 1, b: 'abc'}", v);
+            checkAppliesUDT("v < {a: 1}", v);
+            checkAppliesUDT("v <= {a: 0, b: 'zzz'}", v);
+            checkAppliesUDT("v <= {a: 0, b: 'abc'}", v);
+            checkAppliesUDT("v IN (null, {a: 0, b: 'abc'}, {a: 1})", v);
+
+            // multiple conditions
+            checkAppliesUDT("v > {a: -1, b: 'abc'} AND v > {a: 0}", v);
+            checkAppliesUDT("v != null AND v IN ({a: 0, b: 'abc'})", v);
+
+            // should not apply
+            checkDoesNotApplyUDT("v = {a: 0, b: 'def'}", v);
+            checkDoesNotApplyUDT("v = {a: 1, b: 'abc'}", v);
+            checkDoesNotApplyUDT("v = null", v);
+            checkDoesNotApplyUDT("v != {a: 0, b: 'abc'}", v);
+            checkDoesNotApplyUDT("v > {a: 1, b: 'abc'}", v);
+            checkDoesNotApplyUDT("v > {a: 0, b: 'zzz'}", v);
+            checkDoesNotApplyUDT("v >= {a: 1, b: 'abc'}", v);
+            checkDoesNotApplyUDT("v >= {a: 0, b: 'zzz'}", v);
+            checkDoesNotApplyUDT("v < {a: -1, b: 'abc'}", v);
+            checkDoesNotApplyUDT("v < {a: 0, b: 'aaa'}", v);
+            checkDoesNotApplyUDT("v <= {a: -1, b: 'abc'}", v);
+            checkDoesNotApplyUDT("v <= {a: 0, b: 'aaa'}", v);
+            checkDoesNotApplyUDT("v IN ({a: 0}, {b: 'abc'}, {a: 0, b: 'def'}, null)", v);
+            checkDoesNotApplyUDT("v IN ()", v);
+
+            // multiple conditions
+            checkDoesNotApplyUDT("v IN () AND v IN ({a: 0, b: 'abc'})", v);
+            checkDoesNotApplyUDT("v > {a: 0, b: 'aaa'} AND v < {a: 0, b: 'aaa'}", v);
+
+            // invalid conditions
+            checkInvalidUDT("v = {a: 1, b: 'abc', c: 'foo'}", v, InvalidRequestException.class);
+            checkInvalidUDT("v = {foo: 'foo'}", v, InvalidRequestException.class);
+            checkInvalidUDT("v < {a: 1, b: 'abc', c: 'foo'}", v, InvalidRequestException.class);
+            checkInvalidUDT("v < null", v, InvalidRequestException.class);
+            checkInvalidUDT("v <= {a: 1, b: 'abc', c: 'foo'}", v, InvalidRequestException.class);
+            checkInvalidUDT("v <= null", v, InvalidRequestException.class);
+            checkInvalidUDT("v > {a: 1, b: 'abc', c: 'foo'}", v, InvalidRequestException.class);
+            checkInvalidUDT("v > null", v, InvalidRequestException.class);
+            checkInvalidUDT("v >= {a: 1, b: 'abc', c: 'foo'}", v, InvalidRequestException.class);
+            checkInvalidUDT("v >= null", v, InvalidRequestException.class);
+            checkInvalidUDT("v IN null", v, SyntaxException.class);
+            checkInvalidUDT("v IN 367", v, SyntaxException.class);
+            checkInvalidUDT("v CONTAINS KEY 123", v, SyntaxException.class);
+            checkInvalidUDT("v CONTAINS 'bar'", v, SyntaxException.class);
+
+
+            /////////////////// null suffix on stored udt ////////////////////
+            v = userType("a", 0, "b", null);
+            execute("INSERT INTO %s (k, v) VALUES (0, ?)", v);
+
+            checkAppliesUDT("v = {a: 0}", v);
+            checkAppliesUDT("v = {a: 0, b: null}", v);
+            checkAppliesUDT("v != null", v);
+            checkAppliesUDT("v != {a: 1, b: null}", v);
+            checkAppliesUDT("v != {a: 1}", v);
+            checkAppliesUDT("v != {a: 0, b: 'def'}", v);
+            checkAppliesUDT("v > {a: -1, b: 'abc'}", v);
+            checkAppliesUDT("v > {a: -1}", v);
+            checkAppliesUDT("v >= {a: 0}", v);
+            checkAppliesUDT("v >= {a: -1, b: 'abc'}", v);
+            checkAppliesUDT("v < {a: 0, b: 'zzz'}", v);
+            checkAppliesUDT("v < {a: 1, b: 'abc'}", v);
+            checkAppliesUDT("v < {a: 1}", v);
+            checkAppliesUDT("v <= {a: 0, b: 'zzz'}", v);
+            checkAppliesUDT("v <= {a: 0}", v);
+            checkAppliesUDT("v IN (null, {a: 0, b: 'abc'}, {a: 0})", v);
+
+            // multiple conditions
+            checkAppliesUDT("v > {a: -1, b: 'abc'} AND v >= {a: 0}", v);
+            checkAppliesUDT("v != null AND v IN ({a: 0}, {a: 0, b: null})", v);
+
+            // should not apply
+            checkDoesNotApplyUDT("v = {a: 0, b: 'def'}", v);
+            checkDoesNotApplyUDT("v = {a: 1}", v);
+            checkDoesNotApplyUDT("v = {b: 'abc'}", v);
+            checkDoesNotApplyUDT("v = null", v);
+            checkDoesNotApplyUDT("v != {a: 0}", v);
+            checkDoesNotApplyUDT("v != {a: 0, b: null}", v);
+            checkDoesNotApplyUDT("v > {a: 1, b: 'abc'}", v);
+            checkDoesNotApplyUDT("v > {a: 0}", v);
+            checkDoesNotApplyUDT("v >= {a: 1, b: 'abc'}", v);
+            checkDoesNotApplyUDT("v >= {a: 1}", v);
+            checkDoesNotApplyUDT("v < {a: -1, b: 'abc'}", v);
+            checkDoesNotApplyUDT("v < {a: -1}", v);
+            checkDoesNotApplyUDT("v < {a: 0}", v);
+            checkDoesNotApplyUDT("v <= {a: -1, b: 'abc'}", v);
+            checkDoesNotApplyUDT("v <= {a: -1}", v);
+            checkDoesNotApplyUDT("v IN ({a: 1}, {b: 'abc'}, {a: 0, b: 'def'}, null)", v);
+            checkDoesNotApplyUDT("v IN ()", v);
+
+            // multiple conditions
+            checkDoesNotApplyUDT("v IN () AND v IN ({a: 0})", v);
+            checkDoesNotApplyUDT("v > {a: -1} AND v < {a: 0}", v);
+
+
+            /////////////////// null prefix on stored udt ////////////////////
+            v = userType("a", null, "b", "abc");
+            execute("INSERT INTO %s (k, v) VALUES (0, ?)", v);
+
+            checkAppliesUDT("v = {a: null, b: 'abc'}", v);
+            checkAppliesUDT("v = {b: 'abc'}", v);
+            checkAppliesUDT("v != null", v);
+            checkAppliesUDT("v != {a: 0, b: 'abc'}", v);
+            checkAppliesUDT("v != {a: 0}", v);
+            checkAppliesUDT("v != {b: 'def'}", v);
+            checkAppliesUDT("v > {a: null, b: 'aaa'}", v);
+            checkAppliesUDT("v > {b: 'aaa'}", v);
+            checkAppliesUDT("v >= {a: null, b: 'aaa'}", v);
+            checkAppliesUDT("v >= {b: 'abc'}", v);
+            checkAppliesUDT("v < {a: null, b: 'zzz'}", v);
+            checkAppliesUDT("v < {a: 0, b: 'abc'}", v);
+            checkAppliesUDT("v < {a: 0}", v);
+            checkAppliesUDT("v < {b: 'zzz'}", v);
+            checkAppliesUDT("v <= {a: null, b: 'zzz'}", v);
+            checkAppliesUDT("v <= {a: 0}", v);
+            checkAppliesUDT("v <= {b: 'abc'}", v);
+            checkAppliesUDT("v IN (null, {a: null, b: 'abc'}, {a: 0})", v);
+            checkAppliesUDT("v IN (null, {a: 0, b: 'abc'}, {b: 'abc'})", v);
+
+            // multiple conditions
+            checkAppliesUDT("v > {b: 'aaa'} AND v >= {b: 'abc'}", v);
+            checkAppliesUDT("v != null AND v IN ({a: 0}, {a: null, b: 'abc'})", v);
+
+            // should not apply
+            checkDoesNotApplyUDT("v = {a: 0, b: 'def'}", v);
+            checkDoesNotApplyUDT("v = {a: 1}", v);
+            checkDoesNotApplyUDT("v = {b: 'def'}", v);
+            checkDoesNotApplyUDT("v = null", v);
+            checkDoesNotApplyUDT("v != {b: 'abc'}", v);
+            checkDoesNotApplyUDT("v != {a: null, b: 'abc'}", v);
+            checkDoesNotApplyUDT("v > {a: 1, b: 'abc'}", v);
+            checkDoesNotApplyUDT("v > {a: null, b: 'zzz'}", v);
+            checkDoesNotApplyUDT("v > {b: 'zzz'}", v);
+            checkDoesNotApplyUDT("v >= {a: null, b: 'zzz'}", v);
+            checkDoesNotApplyUDT("v >= {a: 1}", v);
+            checkDoesNotApplyUDT("v >= {b: 'zzz'}", v);
+            checkDoesNotApplyUDT("v < {a: null, b: 'aaa'}", v);
+            checkDoesNotApplyUDT("v < {b: 'aaa'}", v);
+            checkDoesNotApplyUDT("v <= {a: null, b: 'aaa'}", v);
+            checkDoesNotApplyUDT("v <= {b: 'aaa'}", v);
+            checkDoesNotApplyUDT("v IN ({a: 1}, {a: 1, b: 'abc'}, {a: null, b: 'def'}, null)", v);
+            checkDoesNotApplyUDT("v IN ()", v);
+
+            // multiple conditions
+            checkDoesNotApplyUDT("v IN () AND v IN ({b: 'abc'})", v);
+            checkDoesNotApplyUDT("v IN () AND v IN ({a: null, b: 'abc'})", v);
+            checkDoesNotApplyUDT("v > {a: -1} AND v < {a: 0}", v);
+
+
+            /////////////////// null udt ////////////////////
+            v = null;
+            execute("INSERT INTO %s (k, v) VALUES (0, ?)", v);
+
+            checkAppliesUDT("v = null", v);
+            checkAppliesUDT("v IN (null, {a: null, b: 'abc'}, {a: 0})", v);
+            checkAppliesUDT("v IN (null, {a: 0, b: 'abc'}, {b: 'abc'})", v);
+
+            // multiple conditions
+            checkAppliesUDT("v = null AND v IN (null, {a: 0}, {a: null, b: 'abc'})", v);
+
+            // should not apply
+            checkDoesNotApplyUDT("v = {a: 0, b: 'def'}", v);
+            checkDoesNotApplyUDT("v = {a: 1}", v);
+            checkDoesNotApplyUDT("v = {b: 'def'}", v);
+            checkDoesNotApplyUDT("v != null", v);
+            checkDoesNotApplyUDT("v > {a: 1, b: 'abc'}", v);
+            checkDoesNotApplyUDT("v > {a: null, b: 'zzz'}", v);
+            checkDoesNotApplyUDT("v > {b: 'zzz'}", v);
+            checkDoesNotApplyUDT("v >= {a: null, b: 'zzz'}", v);
+            checkDoesNotApplyUDT("v >= {a: 1}", v);
+            checkDoesNotApplyUDT("v >= {b: 'zzz'}", v);
+            checkDoesNotApplyUDT("v < {a: null, b: 'aaa'}", v);
+            checkDoesNotApplyUDT("v < {b: 'aaa'}", v);
+            checkDoesNotApplyUDT("v <= {a: null, b: 'aaa'}", v);
+            checkDoesNotApplyUDT("v <= {b: 'aaa'}", v);
+            checkDoesNotApplyUDT("v IN ({a: 1}, {a: 1, b: 'abc'}, {a: null, b: 'def'})", v);
+            checkDoesNotApplyUDT("v IN ()", v);
+
+            // multiple conditions
+            checkDoesNotApplyUDT("v IN () AND v IN ({b: 'abc'})", v);
+            checkDoesNotApplyUDT("v > {a: -1} AND v < {a: 0}", v);
+
+        }
+    }
+
+    @Test
+    public void testUDTField() throws Throwable
+    {
+        String typename = createType("CREATE TYPE %s (a int, b text)");
+        String myType = KEYSPACE + '.' + typename;
+
+        for (boolean frozen : new boolean[] {false, true})
+        {
+            createTable(String.format("CREATE TABLE %%s (k int PRIMARY KEY, v %s)",
+                                      frozen
+                                      ? "frozen<" + myType + ">"
+                                      : myType));
+
+            Object v = userType("a", 0, "b", "abc");
+            execute("INSERT INTO %s (k, v) VALUES (0, ?)", v);
+
+            checkAppliesUDT("v.a = 0", v);
+            checkAppliesUDT("v.b = 'abc'", v);
+            checkAppliesUDT("v.a < 1", v);
+            checkAppliesUDT("v.b < 'zzz'", v);
+            checkAppliesUDT("v.b <= 'bar'", v);
+            checkAppliesUDT("v.b > 'aaa'", v);
+            checkAppliesUDT("v.b >= 'abc'", v);
+            checkAppliesUDT("v.a != -1", v);
+            checkAppliesUDT("v.b != 'xxx'", v);
+            checkAppliesUDT("v.a != null", v);
+            checkAppliesUDT("v.b != null", v);
+            checkAppliesUDT("v.a IN (null, 0, 1)", v);
+            checkAppliesUDT("v.b IN (null, 'xxx', 'abc')", v);
+            checkAppliesUDT("v.b > 'aaa' AND v.b < 'zzz'", v);
+            checkAppliesUDT("v.a = 0 AND v.b > 'aaa'", v);
+
+            // do not apply
+            checkDoesNotApplyUDT("v.a = -1", v);
+            checkDoesNotApplyUDT("v.b = 'xxx'", v);
+            checkDoesNotApplyUDT("v.a < -1", v);
+            checkDoesNotApplyUDT("v.b < 'aaa'", v);
+            checkDoesNotApplyUDT("v.b <= 'aaa'", v);
+            checkDoesNotApplyUDT("v.b > 'zzz'", v);
+            checkDoesNotApplyUDT("v.b >= 'zzz'", v);
+            checkDoesNotApplyUDT("v.a != 0", v);
+            checkDoesNotApplyUDT("v.b != 'abc'", v);
+            checkDoesNotApplyUDT("v.a IN (null, -1)", v);
+            checkDoesNotApplyUDT("v.b IN (null, 'xxx')", v);
+            checkDoesNotApplyUDT("v.a IN ()", v);
+            checkDoesNotApplyUDT("v.b IN ()", v);
+            checkDoesNotApplyUDT("v.b != null AND v.b IN ()", v);
+
+            // invalid
+            checkInvalidUDT("v.c = null", v, InvalidRequestException.class);
+            checkInvalidUDT("v.a < null", v, InvalidRequestException.class);
+            checkInvalidUDT("v.a <= null", v, InvalidRequestException.class);
+            checkInvalidUDT("v.a > null", v, InvalidRequestException.class);
+            checkInvalidUDT("v.a >= null", v, InvalidRequestException.class);
+            checkInvalidUDT("v.a IN null", v, SyntaxException.class);
+            checkInvalidUDT("v.a IN 367", v, SyntaxException.class);
+            checkInvalidUDT("v.b IN (1, 2, 3)", v, InvalidRequestException.class);
+            checkInvalidUDT("v.a CONTAINS 367", v, SyntaxException.class);
+            checkInvalidUDT("v.a CONTAINS KEY 367", v, SyntaxException.class);
+
+
+            /////////////// null suffix on udt ////////////////
+            v = userType("a", 0, "b", null);
+            execute("INSERT INTO %s (k, v) VALUES (0, ?)", v);
+
+            checkAppliesUDT("v.a = 0", v);
+            checkAppliesUDT("v.b = null", v);
+            checkAppliesUDT("v.b != 'xxx'", v);
+            checkAppliesUDT("v.a != null", v);
+            checkAppliesUDT("v.a IN (null, 0, 1)", v);
+            checkAppliesUDT("v.b IN (null, 'xxx', 'abc')", v);
+            checkAppliesUDT("v.a = 0 AND v.b = null", v);
+
+            // do not apply
+            checkDoesNotApplyUDT("v.b = 'abc'", v);
+            checkDoesNotApplyUDT("v.a < -1", v);
+            checkDoesNotApplyUDT("v.b < 'aaa'", v);
+            checkDoesNotApplyUDT("v.b <= 'aaa'", v);
+            checkDoesNotApplyUDT("v.b > 'zzz'", v);
+            checkDoesNotApplyUDT("v.b >= 'zzz'", v);
+            checkDoesNotApplyUDT("v.a != 0", v);
+            checkDoesNotApplyUDT("v.b != null", v);
+            checkDoesNotApplyUDT("v.a IN (null, -1)", v);
+            checkDoesNotApplyUDT("v.b IN ('xxx', 'abc')", v);
+            checkDoesNotApplyUDT("v.a IN ()", v);
+            checkDoesNotApplyUDT("v.b IN ()", v);
+            checkDoesNotApplyUDT("v.b != null AND v.b IN ()", v);
+
+
+            /////////////// null prefix on udt ////////////////
+            v = userType("a", null, "b", "abc");
+            execute("INSERT INTO %s (k, v) VALUES (0, ?)", v);
+
+            checkAppliesUDT("v.a = null", v);
+            checkAppliesUDT("v.b = 'abc'", v);
+            checkAppliesUDT("v.a != 0", v);
+            checkAppliesUDT("v.b != null", v);
+            checkAppliesUDT("v.a IN (null, 0, 1)", v);
+            checkAppliesUDT("v.b IN (null, 'xxx', 'abc')", v);
+            checkAppliesUDT("v.a = null AND v.b = 'abc'", v);
+
+            // do not apply
+            checkDoesNotApplyUDT("v.a = 0", v);
+            checkDoesNotApplyUDT("v.a < -1", v);
+            checkDoesNotApplyUDT("v.b >= 'zzz'", v);
+            checkDoesNotApplyUDT("v.a != null", v);
+            checkDoesNotApplyUDT("v.b != 'abc'", v);
+            checkDoesNotApplyUDT("v.a IN (-1, 0)", v);
+            checkDoesNotApplyUDT("v.b IN (null, 'xxx')", v);
+            checkDoesNotApplyUDT("v.a IN ()", v);
+            checkDoesNotApplyUDT("v.b IN ()", v);
+            checkDoesNotApplyUDT("v.b != null AND v.b IN ()", v);
+
+
+            /////////////// null udt ////////////////
+            v = null;
+            execute("INSERT INTO %s (k, v) VALUES (0, ?)", v);
+
+            checkAppliesUDT("v.a = null", v);
+            checkAppliesUDT("v.b = null", v);
+            checkAppliesUDT("v.a != 0", v);
+            checkAppliesUDT("v.b != 'abc'", v);
+            checkAppliesUDT("v.a IN (null, 0, 1)", v);
+            checkAppliesUDT("v.b IN (null, 'xxx', 'abc')", v);
+            checkAppliesUDT("v.a = null AND v.b = null", v);
+
+            // do not apply
+            checkDoesNotApplyUDT("v.a = 0", v);
+            checkDoesNotApplyUDT("v.a < -1", v);
+            checkDoesNotApplyUDT("v.b >= 'zzz'", v);
+            checkDoesNotApplyUDT("v.a != null", v);
+            checkDoesNotApplyUDT("v.b != null", v);
+            checkDoesNotApplyUDT("v.a IN (-1, 0)", v);
+            checkDoesNotApplyUDT("v.b IN ('xxx', 'abc')", v);
+            checkDoesNotApplyUDT("v.a IN ()", v);
+            checkDoesNotApplyUDT("v.b IN ()", v);
+            checkDoesNotApplyUDT("v.b != null AND v.b IN ()", v);
+        }
+    }
+
+    void checkAppliesUDT(String condition, Object value) throws Throwable
+    {
+        assertRows(execute("UPDATE %s SET v = ? WHERE k = 0 IF " + condition, value), row(true));
+        assertRows(execute("SELECT * FROM %s"), row(0, value));
+    }
+
+    void checkDoesNotApplyUDT(String condition, Object value) throws Throwable
+    {
+        assertRows(execute("UPDATE %s SET v = ? WHERE k = 0 IF " + condition, value),
+                   row(false, value));
+        assertRows(execute("SELECT * FROM %s"), row(0, value));
+    }
+
+    void checkInvalidUDT(String condition, Object value, Class<? extends Throwable> expected) throws Throwable
+    {
+        assertInvalidThrow(expected, "UPDATE %s SET v = ? WHERE k = 0 IF " + condition, value);
+        assertRows(execute("SELECT * FROM %s"), row(0, value));
+    }
+
     /**
      * Migrated from cql_tests.py:TestCQL.whole_list_conditional_test()
      */
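
The additions above extend the conditional-update tests with lightweight-transaction (IF) conditions on whole UDT values and, in testUDTField, on individual UDT fields, including the null-prefix, null-suffix and null-UDT corner cases. A minimal sketch of the pattern the checkAppliesUDT/checkDoesNotApplyUDT helpers wrap, using only the CQLTester helpers already exercised above (the test method name is illustrative):

    @Test
    public void udtConditionSketch() throws Throwable // illustrative sketch only
    {
        String typename = createType("CREATE TYPE %s (a int, b text)");
        createTable("CREATE TABLE %s (k int PRIMARY KEY, v frozen<" + KEYSPACE + '.' + typename + ">)");

        Object v = userType("a", 0, "b", "abc");
        execute("INSERT INTO %s (k, v) VALUES (0, ?)", v);

        // whole-value condition matches the stored UDT: the CAS applies and returns [applied] = true
        assertRows(execute("UPDATE %s SET v = ? WHERE k = 0 IF v = {a: 0, b: 'abc'}", v), row(true));

        // field-level condition does not match: the CAS fails and echoes the current value
        assertRows(execute("UPDATE %s SET v = ? WHERE k = 0 IF v.a = 1", v), row(false, v));
    }
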
diff --git a/test/unit/org/apache/cassandra/cql3/validation/operations/SelectLimitTest.java b/test/unit/org/apache/cassandra/cql3/validation/operations/SelectLimitTest.java
index aeb3d56..21c48dd 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/operations/SelectLimitTest.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/operations/SelectLimitTest.java
@@ -26,7 +26,6 @@
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.cql3.CQLTester;
 import org.apache.cassandra.dht.ByteOrderedPartitioner;
-import org.apache.cassandra.exceptions.InvalidRequestException;
 
 public class SelectLimitTest extends CQLTester
 {
@@ -135,6 +134,118 @@
     }
 
     @Test
+    public void testPerPartitionLimit() throws Throwable
+    {
+        perPartitionLimitTest(false);
+    }
+
+    @Test
+    public void testPerPartitionLimitWithCompactStorage() throws Throwable
+    {
+        perPartitionLimitTest(true);
+    }
+
+    private void perPartitionLimitTest(boolean withCompactStorage) throws Throwable
+    {
+        String query = "CREATE TABLE %s (a int, b int, c int, PRIMARY KEY (a, b))";
+
+        if (withCompactStorage)
+            createTable(query + " WITH COMPACT STORAGE");
+        else
+            createTable(query);
+
+        for (int i = 0; i < 5; i++)
+        {
+            for (int j = 0; j < 5; j++)
+            {
+                execute("INSERT INTO %s (a, b, c) VALUES (?, ?, ?)", i, j, j);
+            }
+        }
+
+        assertInvalidMessage("LIMIT must be strictly positive",
+                             "SELECT * FROM %s PER PARTITION LIMIT ?", 0);
+        assertInvalidMessage("LIMIT must be strictly positive",
+                             "SELECT * FROM %s PER PARTITION LIMIT ?", -1);
+
+        assertRowsIgnoringOrder(execute("SELECT * FROM %s PER PARTITION LIMIT ?", 2),
+                                row(0, 0, 0),
+                                row(0, 1, 1),
+                                row(1, 0, 0),
+                                row(1, 1, 1),
+                                row(2, 0, 0),
+                                row(2, 1, 1),
+                                row(3, 0, 0),
+                                row(3, 1, 1),
+                                row(4, 0, 0),
+                                row(4, 1, 1));
+
+        // Combined Per Partition and "global" limit
+        assertRowCount(execute("SELECT * FROM %s PER PARTITION LIMIT ? LIMIT ?", 2, 6),
+                       6);
+
+        // odd number of results
+        assertRowCount(execute("SELECT * FROM %s PER PARTITION LIMIT ? LIMIT ?", 2, 5),
+                       5);
+
+        // IN query
+        assertRows(execute("SELECT * FROM %s WHERE a IN (2,3) PER PARTITION LIMIT ?", 2),
+                   row(2, 0, 0),
+                   row(2, 1, 1),
+                   row(3, 0, 0),
+                   row(3, 1, 1));
+
+        assertRows(execute("SELECT * FROM %s WHERE a IN (2,3) PER PARTITION LIMIT ? LIMIT 3", 2),
+                   row(2, 0, 0),
+                   row(2, 1, 1),
+                   row(3, 0, 0));
+
+        assertRows(execute("SELECT * FROM %s WHERE a IN (1,2,3) PER PARTITION LIMIT ? LIMIT 3", 2),
+                   row(1, 0, 0),
+                   row(1, 1, 1),
+                   row(2, 0, 0));
+
+        // with restricted partition key
+        assertRows(execute("SELECT * FROM %s WHERE a = ? PER PARTITION LIMIT ?", 2, 3),
+                   row(2, 0, 0),
+                   row(2, 1, 1),
+                   row(2, 2, 2));
+
+        // with ordering
+        assertRows(execute("SELECT * FROM %s WHERE a IN (3, 2) ORDER BY b DESC PER PARTITION LIMIT ?", 2),
+                   row(2, 4, 4),
+                   row(3, 4, 4),
+                   row(2, 3, 3),
+                   row(3, 3, 3));
+
+        assertRows(execute("SELECT * FROM %s WHERE a IN (3, 2) ORDER BY b DESC PER PARTITION LIMIT ? LIMIT ?", 3, 4),
+                   row(2, 4, 4),
+                   row(3, 4, 4),
+                   row(2, 3, 3),
+                   row(3, 3, 3));
+
+        assertRows(execute("SELECT * FROM %s WHERE a = ? ORDER BY b DESC PER PARTITION LIMIT ?", 2, 3),
+                   row(2, 4, 4),
+                   row(2, 3, 3),
+                   row(2, 2, 2));
+
+        // with filtering
+        assertRows(execute("SELECT * FROM %s WHERE a = ? AND b > ? PER PARTITION LIMIT ? ALLOW FILTERING", 2, 0, 2),
+                   row(2, 1, 1),
+                   row(2, 2, 2));
+
+        assertRows(execute("SELECT * FROM %s WHERE a = ? AND b > ? ORDER BY b DESC PER PARTITION LIMIT ? ALLOW FILTERING", 2, 2, 2),
+                   row(2, 4, 4),
+                   row(2, 3, 3));
+
+        assertInvalidMessage("PER PARTITION LIMIT is not allowed with SELECT DISTINCT queries",
+                             "SELECT DISTINCT a FROM %s PER PARTITION LIMIT ?", 3);
+        assertInvalidMessage("PER PARTITION LIMIT is not allowed with SELECT DISTINCT queries",
+                             "SELECT DISTINCT a FROM %s PER PARTITION LIMIT ? LIMIT ?", 3, 4);
+        assertInvalidMessage("PER PARTITION LIMIT is not allowed with aggregate queries.",
+                             "SELECT COUNT(*) FROM %s PER PARTITION LIMIT ?", 3);
+    }
+
+    @Test
     public void testLimitWithDeletedRowsAndStaticColumns() throws Throwable
     {
         createTable("CREATE TABLE %s (pk int, c int, v int, s int static, PRIMARY KEY (pk, c))");
diff --git a/test/unit/org/apache/cassandra/cql3/validation/operations/SelectMultiColumnRelationTest.java b/test/unit/org/apache/cassandra/cql3/validation/operations/SelectMultiColumnRelationTest.java
index ce74fe2..7f43c6b 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/operations/SelectMultiColumnRelationTest.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/operations/SelectMultiColumnRelationTest.java
@@ -103,16 +103,16 @@
 
             assertInvalidMessage("Clustering column \"c\" cannot be restricted (preceding column \"b\" is restricted by a non-EQ relation)",
                                  "SELECT * FROM %s WHERE a = ? AND b > ?  AND (c, d) > (?, ?)", 0, 0, 0, 0);
-            assertInvalidMessage("Clustering column \"c\" cannot be restricted (preceding column \"b\" is restricted by a non-EQ relation)",
+            assertInvalidMessage("PRIMARY KEY column \"c\" cannot be restricted (preceding column \"b\" is restricted by a non-EQ relation)",
                                  "SELECT * FROM %s WHERE a = ? AND (c, d) > (?, ?) AND b > ?  ", 0, 0, 0, 0);
 
             assertInvalidMessage("Column \"c\" cannot be restricted by two inequalities not starting with the same column",
                                  "SELECT * FROM %s WHERE a = ? AND (b, c) > (?, ?) AND (b) < (?) AND (c) < (?)", 0, 0, 0, 0, 0);
             assertInvalidMessage("Column \"c\" cannot be restricted by two inequalities not starting with the same column",
                                  "SELECT * FROM %s WHERE a = ? AND (c) < (?) AND (b, c) > (?, ?) AND (b) < (?)", 0, 0, 0, 0, 0);
-            assertInvalidMessage("Column \"c\" cannot be restricted by two inequalities not starting with the same column",
+            assertInvalidMessage("Clustering column \"c\" cannot be restricted (preceding column \"b\" is restricted by a non-EQ relation)",
                                  "SELECT * FROM %s WHERE a = ? AND (b) < (?) AND (c) < (?) AND (b, c) > (?, ?)", 0, 0, 0, 0, 0);
-            assertInvalidMessage("Column \"c\" cannot be restricted by two inequalities not starting with the same column",
+            assertInvalidMessage("Clustering column \"c\" cannot be restricted (preceding column \"b\" is restricted by a non-EQ relation)",
                                  "SELECT * FROM %s WHERE a = ? AND (b) < (?) AND c < ? AND (b, c) > (?, ?)", 0, 0, 0, 0, 0);
 
             assertInvalidMessage("Column \"c\" cannot be restricted by two inequalities not starting with the same column",
@@ -885,6 +885,19 @@
     }
 
     @Test
+    public void testMultipleClusteringWithIndexAndValueOver64K() throws Throwable
+    {
+        createTable("CREATE TABLE %s (a int, b blob, c int, d int, PRIMARY KEY (a, b, c))");
+        createIndex("CREATE INDEX ON %s (b)");
+
+        execute("INSERT INTO %s (a, b, c, d) VALUES (?, ?, ?, ?)", 0, ByteBufferUtil.bytes(1), 0, 0);
+        execute("INSERT INTO %s (a, b, c, d) VALUES (?, ?, ?, ?)", 0, ByteBufferUtil.bytes(2), 1, 0);
+
+        assertInvalidMessage("Index expression values may not be larger than 64K",
+                             "SELECT * FROM %s WHERE (b, c) = (?, ?) AND d = ?  ALLOW FILTERING", TOO_BIG, 1, 2);
+    }
+
+    @Test
     public void testMultiColumnRestrictionsWithIndex() throws Throwable
     {
         createTable("CREATE TABLE %s (a int, b int, c int, d int, e int, v int, PRIMARY KEY (a, b, c, d, e))");
@@ -912,19 +925,6 @@
     }
 
     @Test
-    public void testMultipleClusteringWithIndexAndValueOver64K() throws Throwable
-    {
-        createTable("CREATE TABLE %s (a int, b blob, c int, d int, PRIMARY KEY (a, b, c))");
-        createIndex("CREATE INDEX ON %s (b)");
-
-        execute("INSERT INTO %s (a, b, c, d) VALUES (?, ?, ?, ?)", 0, ByteBufferUtil.bytes(1), 0, 0);
-        execute("INSERT INTO %s (a, b, c, d) VALUES (?, ?, ?, ?)", 0, ByteBufferUtil.bytes(2), 1, 0);
-
-        assertInvalidMessage("Index expression values may not be larger than 64K",
-                             "SELECT * FROM %s WHERE (b, c) = (?, ?) AND d = ?  ALLOW FILTERING", TOO_BIG, 1, 2);
-    }
-
-    @Test
     public void testMultiplePartitionKeyAndMultiClusteringWithIndex() throws Throwable
     {
         createTable("CREATE TABLE %s (a int, b int, c int, d int, e int, f int, PRIMARY KEY ((a, b), c, d, e))");
@@ -1936,11 +1936,11 @@
     public void testInvalidColumnNames() throws Throwable
     {
         createTable("CREATE TABLE %s (a int, b int, c int, d int, PRIMARY KEY (a, b, c))");
-        assertInvalidMessage("Undefined name e in where clause ('(b, e) = (0, 0)')", "SELECT * FROM %s WHERE (b, e) = (0, 0)");
-        assertInvalidMessage("Undefined name e in where clause ('(b, e) IN ((0, 1), (2, 4))')", "SELECT * FROM %s WHERE (b, e) IN ((0, 1), (2, 4))");
-        assertInvalidMessage("Undefined name e in where clause ('(b, e) > (0, 1)')", "SELECT * FROM %s WHERE (b, e) > (0, 1) and b <= 2");
-        assertInvalidMessage("Aliases aren't allowed in the where clause ('(b, e) = (0, 0)')", "SELECT c AS e FROM %s WHERE (b, e) = (0, 0)");
-        assertInvalidMessage("Aliases aren't allowed in the where clause ('(b, e) IN ((0, 1), (2, 4))')", "SELECT c AS e FROM %s WHERE (b, e) IN ((0, 1), (2, 4))");
-        assertInvalidMessage("Aliases aren't allowed in the where clause ('(b, e) > (0, 1)')", "SELECT c AS e FROM %s WHERE (b, e) > (0, 1) and b <= 2");
+        assertInvalidMessage("Undefined column name e", "SELECT * FROM %s WHERE (b, e) = (0, 0)");
+        assertInvalidMessage("Undefined column name e", "SELECT * FROM %s WHERE (b, e) IN ((0, 1), (2, 4))");
+        assertInvalidMessage("Undefined column name e", "SELECT * FROM %s WHERE (b, e) > (0, 1) and b <= 2");
+        assertInvalidMessage("Undefined column name e", "SELECT c AS e FROM %s WHERE (b, e) = (0, 0)");
+        assertInvalidMessage("Undefined column name e", "SELECT c AS e FROM %s WHERE (b, e) IN ((0, 1), (2, 4))");
+        assertInvalidMessage("Undefined column name e", "SELECT c AS e FROM %s WHERE (b, e) > (0, 1) and b <= 2");
     }
  }
diff --git a/test/unit/org/apache/cassandra/cql3/validation/operations/SelectOrderedPartitionerTest.java b/test/unit/org/apache/cassandra/cql3/validation/operations/SelectOrderedPartitionerTest.java
index 5e82020..83e7e47 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/operations/SelectOrderedPartitionerTest.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/operations/SelectOrderedPartitionerTest.java
@@ -503,9 +503,9 @@
     public void testTokenFunctionWithInvalidColumnNames() throws Throwable
     {
         createTable("CREATE TABLE %s (a int, b int, c int, d int, PRIMARY KEY ((a, b), c))");
-        assertInvalidMessage("Undefined name e in where clause ('token(a, e) = token(0, 0)')", "SELECT * FROM %s WHERE token(a, e) = token(0, 0)");
-        assertInvalidMessage("Undefined name e in where clause ('token(a, e) > token(0, 1)')", "SELECT * FROM %s WHERE token(a, e) > token(0, 1)");
-        assertInvalidMessage("Aliases aren't allowed in the where clause ('token(a, e) = token(0, 0)')", "SELECT b AS e FROM %s WHERE token(a, e) = token(0, 0)");
-        assertInvalidMessage("Aliases aren't allowed in the where clause ('token(a, e) > token(0, 1)')", "SELECT b AS e FROM %s WHERE token(a, e) > token(0, 1)");
+        assertInvalidMessage("Undefined column name e", "SELECT * FROM %s WHERE token(a, e) = token(0, 0)");
+        assertInvalidMessage("Undefined column name e", "SELECT * FROM %s WHERE token(a, e) > token(0, 1)");
+        assertInvalidMessage("Undefined column name e", "SELECT b AS e FROM %s WHERE token(a, e) = token(0, 0)");
+        assertInvalidMessage("Undefined column name e", "SELECT b AS e FROM %s WHERE token(a, e) > token(0, 1)");
     }
 }
diff --git a/test/unit/org/apache/cassandra/cql3/validation/operations/SelectSingleColumnRelationTest.java b/test/unit/org/apache/cassandra/cql3/validation/operations/SelectSingleColumnRelationTest.java
index 4beb1fb..0e2517b 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/operations/SelectSingleColumnRelationTest.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/operations/SelectSingleColumnRelationTest.java
@@ -620,16 +620,16 @@
     public void testInvalidColumnNames() throws Throwable
     {
         createTable("CREATE TABLE %s (a int, b int, c map<int, int>, PRIMARY KEY (a, b))");
-        assertInvalidMessage("Undefined name d in where clause ('d = 0')", "SELECT * FROM %s WHERE d = 0");
-        assertInvalidMessage("Undefined name d in where clause ('d IN [0, 1]')", "SELECT * FROM %s WHERE d IN (0, 1)");
-        assertInvalidMessage("Undefined name d in where clause ('d > 0')", "SELECT * FROM %s WHERE d > 0 and d <= 2");
-        assertInvalidMessage("Undefined name d in where clause ('d CONTAINS 0')", "SELECT * FROM %s WHERE d CONTAINS 0");
-        assertInvalidMessage("Undefined name d in where clause ('d CONTAINS KEY 0')", "SELECT * FROM %s WHERE d CONTAINS KEY 0");
-        assertInvalidMessage("Aliases aren't allowed in the where clause ('d = 0')", "SELECT a AS d FROM %s WHERE d = 0");
-        assertInvalidMessage("Aliases aren't allowed in the where clause ('d IN [0, 1]')", "SELECT b AS d FROM %s WHERE d IN (0, 1)");
-        assertInvalidMessage("Aliases aren't allowed in the where clause ('d > 0')", "SELECT b AS d FROM %s WHERE d > 0 and d <= 2");
-        assertInvalidMessage("Aliases aren't allowed in the where clause ('d CONTAINS 0')", "SELECT c AS d FROM %s WHERE d CONTAINS 0");
-        assertInvalidMessage("Aliases aren't allowed in the where clause ('d CONTAINS KEY 0')", "SELECT c AS d FROM %s WHERE d CONTAINS KEY 0");
-        assertInvalidMessage("Undefined name d in selection clause", "SELECT d FROM %s WHERE a = 0");
+        assertInvalidMessage("Undefined column name d", "SELECT * FROM %s WHERE d = 0");
+        assertInvalidMessage("Undefined column name d", "SELECT * FROM %s WHERE d IN (0, 1)");
+        assertInvalidMessage("Undefined column name d", "SELECT * FROM %s WHERE d > 0 and d <= 2");
+        assertInvalidMessage("Undefined column name d", "SELECT * FROM %s WHERE d CONTAINS 0");
+        assertInvalidMessage("Undefined column name d", "SELECT * FROM %s WHERE d CONTAINS KEY 0");
+        assertInvalidMessage("Undefined column name d", "SELECT a AS d FROM %s WHERE d = 0");
+        assertInvalidMessage("Undefined column name d", "SELECT b AS d FROM %s WHERE d IN (0, 1)");
+        assertInvalidMessage("Undefined column name d", "SELECT b AS d FROM %s WHERE d > 0 and d <= 2");
+        assertInvalidMessage("Undefined column name d", "SELECT c AS d FROM %s WHERE d CONTAINS 0");
+        assertInvalidMessage("Undefined column name d", "SELECT c AS d FROM %s WHERE d CONTAINS KEY 0");
+        assertInvalidMessage("Undefined column name d", "SELECT d FROM %s WHERE a = 0");
     }
 }
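
In the SelectTest changes below, bare flush() calls are replaced with beforeAndAfterFlush(...), so each group of assertions runs twice: once against memtable contents and once against the flushed sstables. A rough sketch of that pattern, assuming a throwing functional interface (both names here are illustrative, not the actual CQLTester signature):

    interface ThrowingChecks { void run() throws Throwable; }

    // Run the checks, flush memtables to sstables, then run the same checks again.
    void beforeAndAfterFlush(ThrowingChecks checks) throws Throwable
    {
        checks.run(); // before flush: data served from memtables
        flush();      // force memtables to disk
        checks.run(); // after flush: data served from sstables
    }
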
diff --git a/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java b/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java
index 1b6fe9b..9a1493b 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/operations/SelectTest.java
@@ -23,11 +23,11 @@
 import org.junit.Test;
 
 import junit.framework.Assert;
+import org.apache.cassandra.cql3.CQLTester;
 import org.apache.cassandra.cql3.UntypedResultSet;
 import org.apache.cassandra.cql3.restrictions.StatementRestrictions;
 import org.apache.cassandra.exceptions.InvalidRequestException;
 import org.apache.cassandra.utils.ByteBufferUtil;
-import org.apache.cassandra.cql3.CQLTester;
 
 import static org.junit.Assert.assertEquals;
 import static org.junit.Assert.assertTrue;
@@ -522,11 +522,11 @@
                              "SELECT * FROM %s WHERE account = ? AND id = ? AND categories CONTAINS ?", "test", 5, unset());
 
         assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE account = ? AND id = ? AND categories CONTAINS ? AND categories CONTAINS ?",
-                             "test", 5, "foo", "notPresent");
+                             "SELECT * FROM %s WHERE account = ? AND id = ? AND categories CONTAINS ? AND categories CONTAINS ?"
+                            , "test", 5, "foo", "notPresent");
 
-        assertEmpty(execute("SELECT * FROM %s WHERE account = ? AND id = ? AND categories CONTAINS ? AND categories CONTAINS ? ALLOW FILTERING",
-                            "test", 5, "foo", "notPresent"));
+        assertEmpty(execute("SELECT * FROM %s WHERE account = ? AND id = ? AND categories CONTAINS ? AND categories CONTAINS ? ALLOW FILTERING"
+                           , "test", 5, "foo", "notPresent"));
     }
 
     // See CASSANDRA-7525
@@ -1159,11 +1159,11 @@
         assertEquals(ByteBuffer.wrap(new byte[4]), rs.one().getBlob(rs.metadata().get(0).name.toString()));
 
         // test that select throws a meaningful exception for aliases in where clause
-        assertInvalidMessage("Aliases aren't allowed in the where clause",
+        assertInvalidMessage("Undefined column name user_id",
                              "SELECT id AS user_id, name AS user_name FROM %s WHERE user_id = 0");
 
         // test that select throws a meaningful exception for aliases in order by clause
-        assertInvalidMessage("Aliases are not allowed in order by clause",
+        assertInvalidMessage("Undefined column name user_name",
                              "SELECT id AS user_id, name AS user_name FROM %s WHERE id IN (0) ORDER BY user_name");
     }
 
@@ -1401,11 +1401,11 @@
         for (int i = 0; i < 5; i++)
             execute("INSERT INTO %s (id, name) VALUES (?, ?) USING TTL 10 AND TIMESTAMP 0", i, Integer.toString(i));
 
-        assertInvalidMessage("Aliases aren't allowed in the where clause",
+        assertInvalidMessage("Undefined column name user_id",
                              "SELECT id AS user_id, name AS user_name FROM %s WHERE user_id = 0");
 
         // test that select throws a meaningful exception for aliases in order by clause
-        assertInvalidMessage("Aliases are not allowed in order by clause",
+        assertInvalidMessage("Undefined column name user_name",
                              "SELECT id AS user_id, name AS user_name FROM %s WHERE id IN (0) ORDER BY user_name");
 
     }
@@ -1447,65 +1447,66 @@
         execute("DELETE FROM %s WHERE a = 1 AND b = 1");
         execute("DELETE FROM %s WHERE a = 2 AND b = 2");
 
-        flush();
+        beforeAndAfterFlush(() -> {
 
-        // Checks filtering
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE c = 4 AND d = 8");
+            // Checks filtering
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c = 4 AND d = 8");
 
-        assertRows(execute("SELECT * FROM %s WHERE c = 4 AND d = 8 ALLOW FILTERING"),
-                   row(1, 2, 1, 4, 8),
-                   row(1, 4, 1, 4, 8));
+            assertRows(execute("SELECT * FROM %s WHERE c = 4 AND d = 8 ALLOW FILTERING"),
+                       row(1, 2, 1, 4, 8),
+                       row(1, 4, 1, 4, 8));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE a = 1 AND b = 4 AND d = 8");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE a = 1 AND b = 4 AND d = 8");
 
-        assertRows(execute("SELECT * FROM %s WHERE a = 1 AND b = 4 AND d = 8 ALLOW FILTERING"),
-                   row(1, 4, 1, 4, 8));
+            assertRows(execute("SELECT * FROM %s WHERE a = 1 AND b = 4 AND d = 8 ALLOW FILTERING"),
+                       row(1, 4, 1, 4, 8));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE s = 1 AND d = 12");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE s = 1 AND d = 12");
 
-        assertRows(execute("SELECT * FROM %s WHERE s = 1 AND d = 12 ALLOW FILTERING"),
-                   row(1, 3, 1, 6, 12));
+            assertRows(execute("SELECT * FROM %s WHERE s = 1 AND d = 12 ALLOW FILTERING"),
+                       row(1, 3, 1, 6, 12));
 
-        assertInvalidMessage("IN predicates on non-primary-key columns (c) is not yet supported",
-                             "SELECT * FROM %s WHERE a IN (1, 2) AND c IN (6, 7)");
+            assertInvalidMessage("IN predicates on non-primary-key columns (c) is not yet supported",
+                                 "SELECT * FROM %s WHERE a IN (1, 2) AND c IN (6, 7)");
 
-        assertInvalidMessage("IN predicates on non-primary-key columns (c) is not yet supported",
-                             "SELECT * FROM %s WHERE a IN (1, 2) AND c IN (6, 7) ALLOW FILTERING");
+            assertInvalidMessage("IN predicates on non-primary-key columns (c) is not yet supported",
+                                 "SELECT * FROM %s WHERE a IN (1, 2) AND c IN (6, 7) ALLOW FILTERING");
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE c > 4");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c > 4");
 
-        assertRows(execute("SELECT * FROM %s WHERE c > 4 ALLOW FILTERING"),
-                   row(1, 3, 1, 6, 12),
-                   row(2, 3, 2, 7, 12));
+            assertRows(execute("SELECT * FROM %s WHERE c > 4 ALLOW FILTERING"),
+                       row(1, 3, 1, 6, 12),
+                       row(2, 3, 2, 7, 12));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                "SELECT * FROM %s WHERE s > 1");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE s > 1");
 
-        assertRows(execute("SELECT * FROM %s WHERE s > 1 ALLOW FILTERING"),
-                   row(2, 3, 2, 7, 12),
-                   row(3, null, 3, null, null));
+            assertRows(execute("SELECT * FROM %s WHERE s > 1 ALLOW FILTERING"),
+                       row(2, 3, 2, 7, 12),
+                       row(3, null, 3, null, null));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE b < 3 AND c <= 4");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE b < 3 AND c <= 4");
 
-        assertRows(execute("SELECT * FROM %s WHERE b < 3 AND c <= 4 ALLOW FILTERING"),
-                   row(1, 2, 1, 4, 8));
+            assertRows(execute("SELECT * FROM %s WHERE b < 3 AND c <= 4 ALLOW FILTERING"),
+                       row(1, 2, 1, 4, 8));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE c >= 3 AND c <= 6");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c >= 3 AND c <= 6");
 
-        assertRows(execute("SELECT * FROM %s WHERE c >= 3 AND c <= 6 ALLOW FILTERING"),
-                   row(1, 2, 1, 4, 8),
-                   row(1, 3, 1, 6, 12),
-                   row(1, 4, 1, 4, 8));
+            assertRows(execute("SELECT * FROM %s WHERE c >= 3 AND c <= 6 ALLOW FILTERING"),
+                       row(1, 2, 1, 4, 8),
+                       row(1, 3, 1, 6, 12),
+                       row(1, 4, 1, 4, 8));
 
-        assertRows(execute("SELECT * FROM %s WHERE s >= 1 LIMIT 2 ALLOW FILTERING"),
-                   row(1, 2, 1, 4, 8),
-                   row(1, 3, 1, 6, 12));
+            assertRows(execute("SELECT * FROM %s WHERE s >= 1 LIMIT 2 ALLOW FILTERING"),
+                       row(1, 2, 1, 4, 8),
+                       row(1, 3, 1, 6, 12));
+        });
 
         // Checks filtering with null
         assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
@@ -1534,24 +1535,6 @@
     }
 
     @Test
-    public void testIndexQueryWithCompositePartitionKey() throws Throwable
-    {
-        createTable("CREATE TABLE %s (p1 int, p2 int, v int, PRIMARY KEY ((p1, p2)))");
-        assertInvalidMessage("Partition key parts: p2 must be restricted as other parts are",
-                             "SELECT * FROM %s WHERE p1 = 1 AND v = 3 ALLOW FILTERING");
-
-        createIndex("CREATE INDEX ON %s(v)");
-
-        execute("INSERT INTO %s(p1, p2, v) values (?, ?, ?)", 1, 1, 3);
-        execute("INSERT INTO %s(p1, p2, v) values (?, ?, ?)", 1, 2, 3);
-        execute("INSERT INTO %s(p1, p2, v) values (?, ?, ?)", 2, 1, 3);
-
-        assertRows(execute("SELECT * FROM %s WHERE p1 = 1 AND v = 3 ALLOW FILTERING"),
-                   row(1, 2, 3),
-                   row(1, 1, 3));
-    }
-
-    @Test
     public void testFilteringOnCompactTablesWithoutIndices() throws Throwable
     {
         //----------------------------------------------
@@ -1570,41 +1553,43 @@
         execute("DELETE FROM %s WHERE a = 1 AND b = 1");
         execute("DELETE FROM %s WHERE a = 2 AND b = 2");
 
-        flush();
+        beforeAndAfterFlush(() -> {
 
-        // Checks filtering
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE a = 1 AND b = 4 AND c = 4");
+            // Checks filtering
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE a = 1 AND b = 4 AND c = 4");
 
-        assertRows(execute("SELECT * FROM %s WHERE a = 1 AND b = 4 AND c = 4 ALLOW FILTERING"),
-                   row(1, 4, 4));
+            assertRows(execute("SELECT * FROM %s WHERE a = 1 AND b = 4 AND c = 4 ALLOW FILTERING"),
+                       row(1, 4, 4));
 
-        assertInvalidMessage("IN predicates on non-primary-key columns (c) is not yet supported",
-                             "SELECT * FROM %s WHERE a IN (1, 2) AND c IN (6, 7)");
+            assertInvalidMessage("IN predicates on non-primary-key columns (c) is not yet supported",
+                                 "SELECT * FROM %s WHERE a IN (1, 2) AND c IN (6, 7)");
 
-        assertInvalidMessage("IN predicates on non-primary-key columns (c) is not yet supported",
-                             "SELECT * FROM %s WHERE a IN (1, 2) AND c IN (6, 7) ALLOW FILTERING");
+            assertInvalidMessage("IN predicates on non-primary-key columns (c) is not yet supported",
+                                 "SELECT * FROM %s WHERE a IN (1, 2) AND c IN (6, 7) ALLOW FILTERING");
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE c > 4");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c > 4");
 
-        assertRows(execute("SELECT * FROM %s WHERE c > 4 ALLOW FILTERING"),
-                   row(1, 3, 6),
-                   row(2, 3, 7));
+            assertRows(execute("SELECT * FROM %s WHERE c > 4 ALLOW FILTERING"),
+                       row(1, 3, 6),
+                       row(2, 3, 7));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE b < 3 AND c <= 4");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE b < 3 AND c <= 4");
 
-        assertRows(execute("SELECT * FROM %s WHERE b < 3 AND c <= 4 ALLOW FILTERING"),
-                   row(1, 2, 4));
+            assertRows(execute("SELECT * FROM %s WHERE b < 3 AND c <= 4 ALLOW FILTERING"),
+                       row(1, 2, 4));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE c >= 3 AND c <= 6");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c >= 3 AND c <= 6");
 
-        assertRows(execute("SELECT * FROM %s WHERE c >= 3 AND c <= 6 ALLOW FILTERING"),
-                   row(1, 2, 4),
-                   row(1, 3, 6),
-                   row(1, 4, 4));
+            assertRows(execute("SELECT * FROM %s WHERE c >= 3 AND c <= 6 ALLOW FILTERING"),
+                       row(1, 2, 4),
+                       row(1, 3, 6),
+                       row(1, 4, 4));
+
+        });
 
         // Checks filtering with null
         assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
@@ -1616,7 +1601,7 @@
         assertInvalidMessage("Unsupported null value for column c",
                              "SELECT * FROM %s WHERE c > null ALLOW FILTERING");
 
-        // // Checks filtering with unset
+        // Checks filtering with unset
         assertInvalidMessage("Unsupported unset value for column c",
                              "SELECT * FROM %s WHERE c = ? ALLOW FILTERING",
                              unset());
@@ -1640,42 +1625,43 @@
         execute("DELETE FROM %s WHERE a = 0");
         execute("DELETE FROM %s WHERE a = 5");
 
-        flush();
+        beforeAndAfterFlush(() -> {
 
-        // Checks filtering
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE a = 1 AND b = 4 AND c = 4");
+            // Checks filtering
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE a = 1 AND b = 4 AND c = 4");
 
-        assertRows(execute("SELECT * FROM %s WHERE a = 1 AND b = 2 AND c = 4 ALLOW FILTERING"),
-                   row(1, 2, 4));
+            assertRows(execute("SELECT * FROM %s WHERE a = 1 AND b = 2 AND c = 4 ALLOW FILTERING"),
+                       row(1, 2, 4));
 
-        assertInvalidMessage("IN predicates on non-primary-key columns (c) is not yet supported",
-                             "SELECT * FROM %s WHERE a IN (1, 2) AND c IN (6, 7)");
+            assertInvalidMessage("IN predicates on non-primary-key columns (c) is not yet supported",
+                                 "SELECT * FROM %s WHERE a IN (1, 2) AND c IN (6, 7)");
 
-        assertInvalidMessage("IN predicates on non-primary-key columns (c) is not yet supported",
-                             "SELECT * FROM %s WHERE a IN (1, 2) AND c IN (6, 7) ALLOW FILTERING");
+            assertInvalidMessage("IN predicates on non-primary-key columns (c) is not yet supported",
+                                 "SELECT * FROM %s WHERE a IN (1, 2) AND c IN (6, 7) ALLOW FILTERING");
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE c > 4");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c > 4");
 
-        assertRows(execute("SELECT * FROM %s WHERE c > 4 ALLOW FILTERING"),
-                   row(2, 1, 6),
-                   row(4, 1, 7));
+            assertRows(execute("SELECT * FROM %s WHERE c > 4 ALLOW FILTERING"),
+                       row(2, 1, 6),
+                       row(4, 1, 7));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE b < 3 AND c <= 4");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE b < 3 AND c <= 4");
 
-        assertRows(execute("SELECT * FROM %s WHERE b < 3 AND c <= 4 ALLOW FILTERING"),
-                   row(1, 2, 4),
-                   row(3, 2, 4));
+            assertRows(execute("SELECT * FROM %s WHERE b < 3 AND c <= 4 ALLOW FILTERING"),
+                       row(1, 2, 4),
+                       row(3, 2, 4));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE c >= 3 AND c <= 6");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c >= 3 AND c <= 6");
 
-        assertRows(execute("SELECT * FROM %s WHERE c >= 3 AND c <= 6 ALLOW FILTERING"),
-                   row(1, 2, 4),
-                   row(2, 1, 6),
-                   row(3, 2, 4));
+            assertRows(execute("SELECT * FROM %s WHERE c >= 3 AND c <= 6 ALLOW FILTERING"),
+                       row(1, 2, 4),
+                       row(2, 1, 6),
+                       row(3, 2, 4));
+        });
 
         // Checks filtering with null
         assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
@@ -1706,50 +1692,51 @@
         execute("INSERT INTO %s (a, b, c, d, e) VALUES (1, 4, [1, 2], {2, 4}, {1: 2})");
         execute("INSERT INTO %s (a, b, c, d, e) VALUES (2, 3, [3, 6], {6, 12}, {3: 6})");
 
-        flush();
+        beforeAndAfterFlush(() -> {
 
-        // Checks filtering for lists
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE c CONTAINS 2");
+            // Checks filtering for lists
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c CONTAINS 2");
 
-        assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 ALLOW FILTERING"),
-                   row(1, 3, list(3, 2), set(6, 4), map(3, 2)),
-                   row(1, 4, list(1, 2), set(2, 4), map(1, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 ALLOW FILTERING"),
+                       row(1, 3, list(3, 2), set(6, 4), map(3, 2)),
+                       row(1, 4, list(1, 2), set(2, 4), map(1, 2)));
 
-        assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 AND c CONTAINS 3 ALLOW FILTERING"),
-                   row(1, 3, list(3, 2), set(6, 4), map(3, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 AND c CONTAINS 3 ALLOW FILTERING"),
+                       row(1, 3, list(3, 2), set(6, 4), map(3, 2)));
 
-        // Checks filtering for sets
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE d CONTAINS 4");
+            // Checks filtering for sets
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE d CONTAINS 4");
 
-        assertRows(execute("SELECT * FROM %s WHERE d CONTAINS 4 ALLOW FILTERING"),
-                   row(1, 3, list(3, 2), set(6, 4), map(3, 2)),
-                   row(1, 4, list(1, 2), set(2, 4), map(1, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE d CONTAINS 4 ALLOW FILTERING"),
+                       row(1, 3, list(3, 2), set(6, 4), map(3, 2)),
+                       row(1, 4, list(1, 2), set(2, 4), map(1, 2)));
 
-        assertRows(execute("SELECT * FROM %s WHERE d CONTAINS 4 AND d CONTAINS 6 ALLOW FILTERING"),
-                   row(1, 3, list(3, 2), set(6, 4), map(3, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE d CONTAINS 4 AND d CONTAINS 6 ALLOW FILTERING"),
+                       row(1, 3, list(3, 2), set(6, 4), map(3, 2)));
 
-        // Checks filtering for maps
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE e CONTAINS 2");
+            // Checks filtering for maps
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE e CONTAINS 2");
 
-        assertRows(execute("SELECT * FROM %s WHERE e CONTAINS 2 ALLOW FILTERING"),
-                   row(1, 3, list(3, 2), set(6, 4), map(3, 2)),
-                   row(1, 4, list(1, 2), set(2, 4), map(1, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE e CONTAINS 2 ALLOW FILTERING"),
+                       row(1, 3, list(3, 2), set(6, 4), map(3, 2)),
+                       row(1, 4, list(1, 2), set(2, 4), map(1, 2)));
 
-        assertRows(execute("SELECT * FROM %s WHERE e CONTAINS KEY 1 ALLOW FILTERING"),
-                   row(1, 2, list(1, 6), set(2, 12), map(1, 6)),
-                   row(1, 4, list(1, 2), set(2, 4), map(1, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE e CONTAINS KEY 1 ALLOW FILTERING"),
+                       row(1, 2, list(1, 6), set(2, 12), map(1, 6)),
+                       row(1, 4, list(1, 2), set(2, 4), map(1, 2)));
 
-        assertRows(execute("SELECT * FROM %s WHERE e[1] = 6 ALLOW FILTERING"),
-                   row(1, 2, list(1, 6), set(2, 12), map(1, 6)));
+            assertRows(execute("SELECT * FROM %s WHERE e[1] = 6 ALLOW FILTERING"),
+                       row(1, 2, list(1, 6), set(2, 12), map(1, 6)));
 
-        assertRows(execute("SELECT * FROM %s WHERE e CONTAINS KEY 1 AND e CONTAINS 2 ALLOW FILTERING"),
-                   row(1, 4, list(1, 2), set(2, 4), map(1, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE e CONTAINS KEY 1 AND e CONTAINS 2 ALLOW FILTERING"),
+                       row(1, 4, list(1, 2), set(2, 4), map(1, 2)));
 
-        assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 AND d CONTAINS 4 AND e CONTAINS KEY 3 ALLOW FILTERING"),
-                   row(1, 3, list(3, 2), set(6, 4), map(3, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 AND d CONTAINS 4 AND e CONTAINS KEY 3 ALLOW FILTERING"),
+                       row(1, 3, list(3, 2), set(6, 4), map(3, 2)));
+        });
 
         // Checks filtering with null
         assertInvalidMessage("Unsupported null value for column c",
@@ -1796,100 +1783,101 @@
         execute("INSERT INTO %s (a, b, c, d, e) VALUES (1, 4, [1, 2], {2, 4}, {1: 2})");
         execute("INSERT INTO %s (a, b, c, d, e) VALUES (2, 3, [3, 6], {6, 12}, {3: 6})");
 
-        flush();
+        beforeAndAfterFlush(() -> {
 
-        // Checks filtering for lists
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE c = [3, 2]");
+            // Checks filtering for lists
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c = [3, 2]");
 
-        assertRows(execute("SELECT * FROM %s WHERE c = [3, 2] ALLOW FILTERING"),
-                   row(1, 3, list(3, 2), set(6, 4), map(3, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c = [3, 2] ALLOW FILTERING"),
+                       row(1, 3, list(3, 2), set(6, 4), map(3, 2)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE c > [1, 5] AND c < [3, 6]");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c > [1, 5] AND c < [3, 6]");
 
-        assertRows(execute("SELECT * FROM %s WHERE c > [1, 5] AND c < [3, 6] ALLOW FILTERING"),
-                   row(1, 2, list(1, 6), set(2, 12), map(1, 6)),
-                   row(1, 3, list(3, 2), set(6, 4), map(3, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c > [1, 5] AND c < [3, 6] ALLOW FILTERING"),
+                       row(1, 2, list(1, 6), set(2, 12), map(1, 6)),
+                       row(1, 3, list(3, 2), set(6, 4), map(3, 2)));
 
-        assertRows(execute("SELECT * FROM %s WHERE c >= [1, 6] AND c < [3, 3] ALLOW FILTERING"),
-                   row(1, 2, list(1, 6), set(2, 12), map(1, 6)),
-                   row(1, 3, list(3, 2), set(6, 4), map(3, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c >= [1, 6] AND c < [3, 3] ALLOW FILTERING"),
+                       row(1, 2, list(1, 6), set(2, 12), map(1, 6)),
+                       row(1, 3, list(3, 2), set(6, 4), map(3, 2)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                "SELECT * FROM %s WHERE c CONTAINS 2");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c CONTAINS 2");
 
-        assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 ALLOW FILTERING"),
-                   row(1, 3, list(3, 2), set(6, 4), map(3, 2)),
-                   row(1, 4, list(1, 2), set(2, 4), map(1, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 ALLOW FILTERING"),
+                       row(1, 3, list(3, 2), set(6, 4), map(3, 2)),
+                       row(1, 4, list(1, 2), set(2, 4), map(1, 2)));
 
-        assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 AND c CONTAINS 3 ALLOW FILTERING"),
-                   row(1, 3, list(3, 2), set(6, 4), map(3, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 AND c CONTAINS 3 ALLOW FILTERING"),
+                       row(1, 3, list(3, 2), set(6, 4), map(3, 2)));
 
-        // Checks filtering for sets
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE d = {6, 4}");
+            // Checks filtering for sets
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE d = {6, 4}");
 
-        assertRows(execute("SELECT * FROM %s WHERE d = {6, 4} ALLOW FILTERING"),
-                   row(1, 3, list(3, 2), set(6, 4), map(3, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE d = {6, 4} ALLOW FILTERING"),
+                       row(1, 3, list(3, 2), set(6, 4), map(3, 2)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE d > {4, 5} AND d < {6}");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE d > {4, 5} AND d < {6}");
 
-        assertRows(execute("SELECT * FROM %s WHERE d > {4, 5} AND d < {6} ALLOW FILTERING"),
-                   row(1, 3, list(3, 2), set(6, 4), map(3, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE d > {4, 5} AND d < {6} ALLOW FILTERING"),
+                       row(1, 3, list(3, 2), set(6, 4), map(3, 2)));
 
-        assertRows(execute("SELECT * FROM %s WHERE d >= {2, 12} AND d <= {4, 6} ALLOW FILTERING"),
-                   row(1, 2, list(1, 6), set(2, 12), map(1, 6)),
-                   row(1, 3, list(3, 2), set(6, 4), map(3, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE d >= {2, 12} AND d <= {4, 6} ALLOW FILTERING"),
+                       row(1, 2, list(1, 6), set(2, 12), map(1, 6)),
+                       row(1, 3, list(3, 2), set(6, 4), map(3, 2)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE d CONTAINS 4");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE d CONTAINS 4");
 
-        assertRows(execute("SELECT * FROM %s WHERE d CONTAINS 4 ALLOW FILTERING"),
-                   row(1, 3, list(3, 2), set(6, 4), map(3, 2)),
-                   row(1, 4, list(1, 2), set(2, 4), map(1, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE d CONTAINS 4 ALLOW FILTERING"),
+                       row(1, 3, list(3, 2), set(6, 4), map(3, 2)),
+                       row(1, 4, list(1, 2), set(2, 4), map(1, 2)));
 
-        assertRows(execute("SELECT * FROM %s WHERE d CONTAINS 4 AND d CONTAINS 6 ALLOW FILTERING"),
-                   row(1, 3, list(3, 2), set(6, 4), map(3, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE d CONTAINS 4 AND d CONTAINS 6 ALLOW FILTERING"),
+                       row(1, 3, list(3, 2), set(6, 4), map(3, 2)));
 
-        // Checks filtering for maps
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE e = {1 : 2}");
+            // Checks filtering for maps
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE e = {1 : 2}");
 
-        assertRows(execute("SELECT * FROM %s WHERE e = {1 : 2} ALLOW FILTERING"),
-                   row(1, 4, list(1, 2), set(2, 4), map(1, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE e = {1 : 2} ALLOW FILTERING"),
+                       row(1, 4, list(1, 2), set(2, 4), map(1, 2)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                "SELECT * FROM %s WHERE e > {1 : 4} AND e < {3 : 6}");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE e > {1 : 4} AND e < {3 : 6}");
 
-        assertRows(execute("SELECT * FROM %s WHERE e > {1 : 4} AND e < {3 : 6} ALLOW FILTERING"),
-                   row(1, 2, list(1, 6), set(2, 12), map(1, 6)),
-                   row(1, 3, list(3, 2), set(6, 4), map(3, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE e > {1 : 4} AND e < {3 : 6} ALLOW FILTERING"),
+                       row(1, 2, list(1, 6), set(2, 12), map(1, 6)),
+                       row(1, 3, list(3, 2), set(6, 4), map(3, 2)));
 
-        assertRows(execute("SELECT * FROM %s WHERE e >= {1 : 6} AND e <= {3 : 2} ALLOW FILTERING"),
-                   row(1, 2, list(1, 6), set(2, 12), map(1, 6)),
-                   row(1, 3, list(3, 2), set(6, 4), map(3, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE e >= {1 : 6} AND e <= {3 : 2} ALLOW FILTERING"),
+                       row(1, 2, list(1, 6), set(2, 12), map(1, 6)),
+                       row(1, 3, list(3, 2), set(6, 4), map(3, 2)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE e CONTAINS 2");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE e CONTAINS 2");
 
-        assertRows(execute("SELECT * FROM %s WHERE e CONTAINS 2 ALLOW FILTERING"),
-                   row(1, 3, list(3, 2), set(6, 4), map(3, 2)),
-                   row(1, 4, list(1, 2), set(2, 4), map(1, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE e CONTAINS 2 ALLOW FILTERING"),
+                       row(1, 3, list(3, 2), set(6, 4), map(3, 2)),
+                       row(1, 4, list(1, 2), set(2, 4), map(1, 2)));
 
-        assertRows(execute("SELECT * FROM %s WHERE e CONTAINS KEY 1 ALLOW FILTERING"),
-                   row(1, 2, list(1, 6), set(2, 12), map(1, 6)),
-                   row(1, 4, list(1, 2), set(2, 4), map(1, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE e CONTAINS KEY 1 ALLOW FILTERING"),
+                       row(1, 2, list(1, 6), set(2, 12), map(1, 6)),
+                       row(1, 4, list(1, 2), set(2, 4), map(1, 2)));
 
-        assertInvalidMessage("Map-entry equality predicates on frozen map column e are not supported",
-                             "SELECT * FROM %s WHERE e[1] = 6 ALLOW FILTERING");
+            assertInvalidMessage("Map-entry equality predicates on frozen map column e are not supported",
+                                 "SELECT * FROM %s WHERE e[1] = 6 ALLOW FILTERING");
 
-        assertRows(execute("SELECT * FROM %s WHERE e CONTAINS KEY 1 AND e CONTAINS 2 ALLOW FILTERING"),
-                   row(1, 4, list(1, 2), set(2, 4), map(1, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE e CONTAINS KEY 1 AND e CONTAINS 2 ALLOW FILTERING"),
+                       row(1, 4, list(1, 2), set(2, 4), map(1, 2)));
 
-        assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 AND d CONTAINS 4 AND e CONTAINS KEY 3 ALLOW FILTERING"),
-                   row(1, 3, list(3, 2), set(6, 4), map(3, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 AND d CONTAINS 4 AND e CONTAINS KEY 3 ALLOW FILTERING"),
+                       row(1, 3, list(3, 2), set(6, 4), map(3, 2)));
+        });
 
         // Checks filtering with null
         assertInvalidMessage("Unsupported null value for column c",
@@ -1954,47 +1942,48 @@
         execute("INSERT INTO %s (a, b, c) VALUES (1, 4, [4, 1])");
         execute("INSERT INTO %s (a, b, c) VALUES (2, 3, [7, 1])");
 
-        flush();
+        beforeAndAfterFlush(() -> {
 
-        // Checks filtering
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE a = 1 AND b = 4 AND c = [4, 1]");
+            // Checks filtering
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE a = 1 AND b = 4 AND c = [4, 1]");
 
-        assertRows(execute("SELECT * FROM %s WHERE a = 1 AND b = 4 AND c = [4, 1] ALLOW FILTERING"),
-                   row(1, 4, list(4, 1)));
+            assertRows(execute("SELECT * FROM %s WHERE a = 1 AND b = 4 AND c = [4, 1] ALLOW FILTERING"),
+                       row(1, 4, list(4, 1)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE c > [4, 2]");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c > [4, 2]");
 
-        assertRows(execute("SELECT * FROM %s WHERE c > [4, 2] ALLOW FILTERING"),
-                   row(1, 3, list(6, 2)),
-                   row(2, 3, list(7, 1)));
+            assertRows(execute("SELECT * FROM %s WHERE c > [4, 2] ALLOW FILTERING"),
+                       row(1, 3, list(6, 2)),
+                       row(2, 3, list(7, 1)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE b <= 3 AND c < [6, 2]");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE b <= 3 AND c < [6, 2]");
 
-        assertRows(execute("SELECT * FROM %s WHERE b <= 3 AND c < [6, 2] ALLOW FILTERING"),
-                   row(1, 2, list(4, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE b <= 3 AND c < [6, 2] ALLOW FILTERING"),
+                       row(1, 2, list(4, 2)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE c >= [4, 2] AND c <= [6, 4]");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c >= [4, 2] AND c <= [6, 4]");
 
-        assertRows(execute("SELECT * FROM %s WHERE c >= [4, 2] AND c <= [6, 4] ALLOW FILTERING"),
-                   row(1, 2, list(4, 2)),
-                   row(1, 3, list(6, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c >= [4, 2] AND c <= [6, 4] ALLOW FILTERING"),
+                       row(1, 2, list(4, 2)),
+                       row(1, 3, list(6, 2)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE c CONTAINS 2");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c CONTAINS 2");
 
-        assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 ALLOW FILTERING"),
-                   row(1, 2, list(4, 2)),
-                   row(1, 3, list(6, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 ALLOW FILTERING"),
+                       row(1, 2, list(4, 2)),
+                       row(1, 3, list(6, 2)));
 
-        assertInvalidMessage("Cannot use CONTAINS KEY on non-map column c",
-                             "SELECT * FROM %s WHERE c CONTAINS KEY 2 ALLOW FILTERING");
+            assertInvalidMessage("Cannot use CONTAINS KEY on non-map column c",
+                                 "SELECT * FROM %s WHERE c CONTAINS KEY 2 ALLOW FILTERING");
 
-        assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 AND c CONTAINS 6 ALLOW FILTERING"),
-                   row(1, 3, list(6, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 AND c CONTAINS 6 ALLOW FILTERING"),
+                       row(1, 3, list(6, 2)));
+        });
 
         // Checks filtering with null
         assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
@@ -2031,47 +2020,48 @@
         execute("INSERT INTO %s (a, b, c) VALUES (3, 2, [4, 1])");
         execute("INSERT INTO %s (a, b, c) VALUES (4, 1, [7, 1])");
 
-        flush();
+        beforeAndAfterFlush(() -> {
 
-        // Checks filtering
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE a = 1 AND b = 2 AND c = [4, 2]");
+            // Checks filtering
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE a = 1 AND b = 2 AND c = [4, 2]");
 
-        assertRows(execute("SELECT * FROM %s WHERE a = 1 AND b = 2 AND c = [4, 2] ALLOW FILTERING"),
-                   row(1, 2, list(4, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE a = 1 AND b = 2 AND c = [4, 2] ALLOW FILTERING"),
+                       row(1, 2, list(4, 2)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE c > [4, 2]");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c > [4, 2]");
 
-        assertRows(execute("SELECT * FROM %s WHERE c > [4, 2] ALLOW FILTERING"),
-                   row(2, 1, list(6, 2)),
-                   row(4, 1, list(7, 1)));
+            assertRows(execute("SELECT * FROM %s WHERE c > [4, 2] ALLOW FILTERING"),
+                       row(2, 1, list(6, 2)),
+                       row(4, 1, list(7, 1)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE b < 3 AND c <= [4, 2]");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE b < 3 AND c <= [4, 2]");
 
-        assertRows(execute("SELECT * FROM %s WHERE b < 3 AND c <= [4, 2] ALLOW FILTERING"),
-                   row(1, 2, list(4, 2)),
-                   row(3, 2, list(4, 1)));
+            assertRows(execute("SELECT * FROM %s WHERE b < 3 AND c <= [4, 2] ALLOW FILTERING"),
+                       row(1, 2, list(4, 2)),
+                       row(3, 2, list(4, 1)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE c >= [4, 3] AND c <= [7]");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c >= [4, 3] AND c <= [7]");
 
-        assertRows(execute("SELECT * FROM %s WHERE c >= [4, 3] AND c <= [7] ALLOW FILTERING"),
-                   row(2, 1, list(6, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c >= [4, 3] AND c <= [7] ALLOW FILTERING"),
+                       row(2, 1, list(6, 2)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                "SELECT * FROM %s WHERE c CONTAINS 2");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c CONTAINS 2");
 
-        assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 ALLOW FILTERING"),
-                   row(1, 2, list(4, 2)),
-                   row(2, 1, list(6, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 ALLOW FILTERING"),
+                       row(1, 2, list(4, 2)),
+                       row(2, 1, list(6, 2)));
 
-        assertInvalidMessage("Cannot use CONTAINS KEY on non-map column c",
-                             "SELECT * FROM %s WHERE c CONTAINS KEY 2 ALLOW FILTERING");
+            assertInvalidMessage("Cannot use CONTAINS KEY on non-map column c",
+                                 "SELECT * FROM %s WHERE c CONTAINS KEY 2 ALLOW FILTERING");
 
-        assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 AND c CONTAINS 6 ALLOW FILTERING"),
-                   row(2, 1, list(6, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 AND c CONTAINS 6 ALLOW FILTERING"),
+                       row(2, 1, list(6, 2)));
+        });
 
         // Checks filtering with null
         assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
@@ -2112,48 +2102,48 @@
         execute("INSERT INTO %s (a, b, c) VALUES (1, 4, {4, 1})");
         execute("INSERT INTO %s (a, b, c) VALUES (2, 3, {7, 1})");
 
-        flush();
+        beforeAndAfterFlush(() -> {
 
-        // Checks filtering
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE a = 1 AND b = 4 AND c = {4, 1}");
+            // Checks filtering
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE a = 1 AND b = 4 AND c = {4, 1}");
 
-        assertRows(execute("SELECT * FROM %s WHERE a = 1 AND b = 4 AND c = {4, 1} ALLOW FILTERING"),
-                   row(1, 4, set(4, 1)));
+            assertRows(execute("SELECT * FROM %s WHERE a = 1 AND b = 4 AND c = {4, 1} ALLOW FILTERING"),
+                       row(1, 4, set(4, 1)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE c > {4, 2}");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c > {4, 2}");
 
-        assertRows(execute("SELECT * FROM %s WHERE c > {4, 2} ALLOW FILTERING"),
-                   row(1, 3, set(6, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c > {4, 2} ALLOW FILTERING"),
+                       row(1, 3, set(6, 2)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE b <= 3 AND c < {6, 2}");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE b <= 3 AND c < {6, 2}");
 
-        assertRows(execute("SELECT * FROM %s WHERE b <= 3 AND c < {6, 2} ALLOW FILTERING"),
-                   row(1, 2, set(2, 4)),
-                   row(2, 3, set(1, 7)));
+            assertRows(execute("SELECT * FROM %s WHERE b <= 3 AND c < {6, 2} ALLOW FILTERING"),
+                       row(1, 2, set(2, 4)),
+                       row(2, 3, set(1, 7)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE c >= {4, 2} AND c <= {6, 4}");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c >= {4, 2} AND c <= {6, 4}");
 
-        assertRows(execute("SELECT * FROM %s WHERE c >= {4, 2} AND c <= {6, 4} ALLOW FILTERING"),
-                   row(1, 2, set(4, 2)),
-                   row(1, 3, set(6, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c >= {4, 2} AND c <= {6, 4} ALLOW FILTERING"),
+                       row(1, 2, set(4, 2)),
+                       row(1, 3, set(6, 2)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE c CONTAINS 2");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c CONTAINS 2");
 
-        assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 ALLOW FILTERING"),
-                   row(1, 2, set(4, 2)),
-                   row(1, 3, set(6, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 ALLOW FILTERING"),
+                       row(1, 2, set(4, 2)),
+                       row(1, 3, set(6, 2)));
 
-        assertInvalidMessage("Cannot use CONTAINS KEY on non-map column c",
-                             "SELECT * FROM %s WHERE c CONTAINS KEY 2 ALLOW FILTERING");
+            assertInvalidMessage("Cannot use CONTAINS KEY on non-map column c",
+                                 "SELECT * FROM %s WHERE c CONTAINS KEY 2 ALLOW FILTERING");
 
-        assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 AND c CONTAINS 6 ALLOW FILTERING"),
-                   row(1, 3, set(6, 2)));
-
+            assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 AND c CONTAINS 6 ALLOW FILTERING"),
+                       row(1, 3, set(6, 2)));
+        });
         // Checks filtering with null
         assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
                              "SELECT * FROM %s WHERE c = null");
@@ -2189,47 +2179,48 @@
         execute("INSERT INTO %s (a, b, c) VALUES (3, 2, {4, 1})");
         execute("INSERT INTO %s (a, b, c) VALUES (4, 1, {7, 1})");
 
-        flush();
+        beforeAndAfterFlush(() -> {
 
-        // Checks filtering
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE a = 1 AND b = 2 AND c = {4, 2}");
+            // Checks filtering
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE a = 1 AND b = 2 AND c = {4, 2}");
 
-        assertRows(execute("SELECT * FROM %s WHERE a = 1 AND b = 2 AND c = {4, 2} ALLOW FILTERING"),
-                   row(1, 2, set(4, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE a = 1 AND b = 2 AND c = {4, 2} ALLOW FILTERING"),
+                       row(1, 2, set(4, 2)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE c > {4, 2}");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c > {4, 2}");
 
-        assertRows(execute("SELECT * FROM %s WHERE c > {4, 2} ALLOW FILTERING"),
-                   row(2, 1, set(6, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c > {4, 2} ALLOW FILTERING"),
+                       row(2, 1, set(6, 2)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE b < 3 AND c <= {4, 2}");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE b < 3 AND c <= {4, 2}");
 
-        assertRows(execute("SELECT * FROM %s WHERE b < 3 AND c <= {4, 2} ALLOW FILTERING"),
-                   row(1, 2, set(4, 2)),
-                   row(4, 1, set(1, 7)),
-                   row(3, 2, set(4, 1)));
+            assertRows(execute("SELECT * FROM %s WHERE b < 3 AND c <= {4, 2} ALLOW FILTERING"),
+                       row(1, 2, set(4, 2)),
+                       row(4, 1, set(1, 7)),
+                       row(3, 2, set(4, 1)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE c >= {4, 3} AND c <= {7}");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c >= {4, 3} AND c <= {7}");
 
-        assertRows(execute("SELECT * FROM %s WHERE c >= {5, 2} AND c <= {7} ALLOW FILTERING"),
-                   row(2, 1, set(6, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c >= {5, 2} AND c <= {7} ALLOW FILTERING"),
+                       row(2, 1, set(6, 2)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                "SELECT * FROM %s WHERE c CONTAINS 2");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c CONTAINS 2");
 
-        assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 ALLOW FILTERING"),
-                   row(1, 2, set(4, 2)),
-                   row(2, 1, set(6, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 ALLOW FILTERING"),
+                       row(1, 2, set(4, 2)),
+                       row(2, 1, set(6, 2)));
 
-        assertInvalidMessage("Cannot use CONTAINS KEY on non-map column c",
-                             "SELECT * FROM %s WHERE c CONTAINS KEY 2 ALLOW FILTERING");
+            assertInvalidMessage("Cannot use CONTAINS KEY on non-map column c",
+                                 "SELECT * FROM %s WHERE c CONTAINS KEY 2 ALLOW FILTERING");
 
-        assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 AND c CONTAINS 6 ALLOW FILTERING"),
-                   row(2, 1, set(6, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 AND c CONTAINS 6 ALLOW FILTERING"),
+                       row(2, 1, set(6, 2)));
+        });
 
         // Checks filtering with null
         assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
@@ -2303,47 +2294,48 @@
         execute("INSERT INTO %s (a, b, c) VALUES (1, 4, {4 : 1})");
         execute("INSERT INTO %s (a, b, c) VALUES (2, 3, {7 : 1})");
 
-        flush();
+        beforeAndAfterFlush(() -> {
 
-        // Checks filtering
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE a = 1 AND b = 4 AND c = {4 : 1}");
+            // Checks filtering
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE a = 1 AND b = 4 AND c = {4 : 1}");
 
-        assertRows(execute("SELECT * FROM %s WHERE a = 1 AND b = 4 AND c = {4 : 1} ALLOW FILTERING"),
-                   row(1, 4, map(4, 1)));
+            assertRows(execute("SELECT * FROM %s WHERE a = 1 AND b = 4 AND c = {4 : 1} ALLOW FILTERING"),
+                       row(1, 4, map(4, 1)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE c > {4 : 2}");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c > {4 : 2}");
 
-        assertRows(execute("SELECT * FROM %s WHERE c > {4 : 2} ALLOW FILTERING"),
-                   row(1, 3, map(6, 2)),
-                   row(2, 3, map(7, 1)));
+            assertRows(execute("SELECT * FROM %s WHERE c > {4 : 2} ALLOW FILTERING"),
+                       row(1, 3, map(6, 2)),
+                       row(2, 3, map(7, 1)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE b <= 3 AND c < {6 : 2}");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE b <= 3 AND c < {6 : 2}");
 
-        assertRows(execute("SELECT * FROM %s WHERE b <= 3 AND c < {6 : 2} ALLOW FILTERING"),
-                   row(1, 2, map(4, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE b <= 3 AND c < {6 : 2} ALLOW FILTERING"),
+                       row(1, 2, map(4, 2)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE c >= {4 : 2} AND c <= {6 : 4}");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c >= {4 : 2} AND c <= {6 : 4}");
 
-        assertRows(execute("SELECT * FROM %s WHERE c >= {4 : 2} AND c <= {6 : 4} ALLOW FILTERING"),
-                   row(1, 2, map(4, 2)),
-                   row(1, 3, map(6, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c >= {4 : 2} AND c <= {6 : 4} ALLOW FILTERING"),
+                       row(1, 2, map(4, 2)),
+                       row(1, 3, map(6, 2)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE c CONTAINS 2");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c CONTAINS 2");
 
-        assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 ALLOW FILTERING"),
-                   row(1, 2, map(4, 2)),
-                   row(1, 3, map(6, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 ALLOW FILTERING"),
+                       row(1, 2, map(4, 2)),
+                       row(1, 3, map(6, 2)));
 
-        assertRows(execute("SELECT * FROM %s WHERE c CONTAINS KEY 6 ALLOW FILTERING"),
-                   row(1, 3, map(6, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c CONTAINS KEY 6 ALLOW FILTERING"),
+                       row(1, 3, map(6, 2)));
 
-        assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 AND c CONTAINS KEY 6 ALLOW FILTERING"),
-                   row(1, 3, map(6, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 AND c CONTAINS KEY 6 ALLOW FILTERING"),
+                       row(1, 3, map(6, 2)));
+        });
 
         // Checks filtering with null
         assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
@@ -2385,48 +2377,49 @@
         execute("INSERT INTO %s (a, b, c) VALUES (3, 2, {4 : 1})");
         execute("INSERT INTO %s (a, b, c) VALUES (4, 1, {7 : 1})");
 
-        flush();
+        beforeAndAfterFlush(() -> {
 
-        // Checks filtering
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE a = 1 AND b = 2 AND c = {4 : 2}");
+            // Checks filtering
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE a = 1 AND b = 2 AND c = {4 : 2}");
 
-        assertRows(execute("SELECT * FROM %s WHERE a = 1 AND b = 2 AND c = {4 : 2} ALLOW FILTERING"),
-                   row(1, 2, map(4, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE a = 1 AND b = 2 AND c = {4 : 2} ALLOW FILTERING"),
+                       row(1, 2, map(4, 2)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE c > {4 : 2}");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c > {4 : 2}");
 
-        assertRows(execute("SELECT * FROM %s WHERE c > {4 : 2} ALLOW FILTERING"),
-                   row(2, 1, map(6, 2)),
-                   row(4, 1, map(7, 1)));
+            assertRows(execute("SELECT * FROM %s WHERE c > {4 : 2} ALLOW FILTERING"),
+                       row(2, 1, map(6, 2)),
+                       row(4, 1, map(7, 1)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE b < 3 AND c <= {4 : 2}");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE b < 3 AND c <= {4 : 2}");
 
-        assertRows(execute("SELECT * FROM %s WHERE b < 3 AND c <= {4 : 2} ALLOW FILTERING"),
-                   row(1, 2, map(4, 2)),
-                   row(3, 2, map(4, 1)));
+            assertRows(execute("SELECT * FROM %s WHERE b < 3 AND c <= {4 : 2} ALLOW FILTERING"),
+                       row(1, 2, map(4, 2)),
+                       row(3, 2, map(4, 1)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                             "SELECT * FROM %s WHERE c >= {4 : 3} AND c <= {7 : 1}");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c >= {4 : 3} AND c <= {7 : 1}");
 
-        assertRows(execute("SELECT * FROM %s WHERE c >= {5 : 2} AND c <= {7 : 0} ALLOW FILTERING"),
-                   row(2, 1, map(6, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c >= {5 : 2} AND c <= {7 : 0} ALLOW FILTERING"),
+                       row(2, 1, map(6, 2)));
 
-        assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
-                "SELECT * FROM %s WHERE c CONTAINS 2");
+            assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
+                                 "SELECT * FROM %s WHERE c CONTAINS 2");
 
-        assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 ALLOW FILTERING"),
-                   row(1, 2, map(4, 2)),
-                   row(2, 1, map(6, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 ALLOW FILTERING"),
+                       row(1, 2, map(4, 2)),
+                       row(2, 1, map(6, 2)));
 
-        assertRows(execute("SELECT * FROM %s WHERE c CONTAINS KEY 4 ALLOW FILTERING"),
-                   row(1, 2, map(4, 2)),
-                   row(3, 2, map(4, 1)));
+            assertRows(execute("SELECT * FROM %s WHERE c CONTAINS KEY 4 ALLOW FILTERING"),
+                       row(1, 2, map(4, 2)),
+                       row(3, 2, map(4, 1)));
 
-        assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 AND c CONTAINS KEY 6 ALLOW FILTERING"),
-                   row(2, 1, map(6, 2)));
+            assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 2 AND c CONTAINS KEY 6 ALLOW FILTERING"),
+                       row(2, 1, map(6, 2)));
+        });
 
         // Checks filtering with null
         assertInvalidMessage(StatementRestrictions.REQUIRES_ALLOW_FILTERING_MESSAGE,
@@ -2461,6 +2454,417 @@
                              unset());
     }
 
+    @Test
+    public void filteringOnClusteringColumns() throws Throwable
+    {
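+        // Slice and equality restrictions on clustering columns, with and without ALLOW FILTERING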
+        createTable("CREATE TABLE %s (a int, b int, c int, d int, PRIMARY KEY (a, b, c))");
+
+        execute("INSERT INTO %s (a,b,c,d) VALUES (11, 12, 13, 14)");
+        execute("INSERT INTO %s (a,b,c,d) VALUES (11, 15, 16, 17)");
+        execute("INSERT INTO %s (a,b,c,d) VALUES (21, 22, 23, 24)");
+        execute("INSERT INTO %s (a,b,c,d) VALUES (31, 32, 33, 34)");
+
+        beforeAndAfterFlush(() -> {
+
+            assertRows(execute("SELECT * FROM %s WHERE a = 11 AND b = 15"),
+                       row(11, 15, 16, 17));
+
+            assertInvalidMessage("Clustering column \"c\" cannot be restricted (preceding column \"b\" is restricted by a non-EQ relation)",
+                                 "SELECT * FROM %s WHERE a = 11 AND b > 12 AND c = 15");
+
+            assertRows(execute("SELECT * FROM %s WHERE a = 11 AND b = 15 AND c > 15"),
+                       row(11, 15, 16, 17));
+
+            assertRows(execute("SELECT * FROM %s WHERE a = 11 AND b > 12 AND c > 13 AND d = 17 ALLOW FILTERING"),
+                       row(11, 15, 16, 17));
+            assertInvalidMessage("Clustering column \"c\" cannot be restricted (preceding column \"b\" is restricted by a non-EQ relation)",
+                                 "SELECT * FROM %s WHERE a = 11 AND b > 12 AND c > 13 and d = 17");
+
+            assertRows(execute("SELECT * FROM %s WHERE b > 20 AND c > 30 ALLOW FILTERING"),
+                       row(31, 32, 33, 34));
+            assertInvalidMessage("Clustering column \"c\" cannot be restricted (preceding column \"b\" is restricted by a non-EQ relation)",
+                                 "SELECT * FROM %s WHERE b > 20 AND c > 30");
+
+            assertRows(execute("SELECT * FROM %s WHERE b > 20 AND c < 30 ALLOW FILTERING"),
+                       row(21, 22, 23, 24));
+            assertInvalidMessage("Clustering column \"c\" cannot be restricted (preceding column \"b\" is restricted by a non-EQ relation)",
+                                 "SELECT * FROM %s WHERE b > 20 AND c < 30");
+
+            assertRows(execute("SELECT * FROM %s WHERE b > 20 AND c = 33 ALLOW FILTERING"),
+                       row(31, 32, 33, 34));
+            assertInvalidMessage("Clustering column \"c\" cannot be restricted (preceding column \"b\" is restricted by a non-EQ relation)",
+                                 "SELECT * FROM %s WHERE b > 20 AND c = 33");
+
+            assertRows(execute("SELECT * FROM %s WHERE c = 33 ALLOW FILTERING"),
+                       row(31, 32, 33, 34));
+            assertInvalidMessage("PRIMARY KEY column \"c\" cannot be restricted as preceding column \"b\" is not restricted",
+                                 "SELECT * FROM %s WHERE c = 33");
+        });
+
+        // --------------------------------------------------
+        // Clustering column within and across partition keys
+        // --------------------------------------------------
+        createTable("CREATE TABLE %s (a int, b int, c int, d int, PRIMARY KEY (a, b, c))");
+
+        execute("INSERT INTO %s (a,b,c,d) VALUES (11, 12, 13, 14)");
+        execute("INSERT INTO %s (a,b,c,d) VALUES (11, 15, 16, 17)");
+        execute("INSERT INTO %s (a,b,c,d) VALUES (11, 18, 19, 20)");
+
+        execute("INSERT INTO %s (a,b,c,d) VALUES (21, 22, 23, 24)");
+        execute("INSERT INTO %s (a,b,c,d) VALUES (21, 25, 26, 27)");
+        execute("INSERT INTO %s (a,b,c,d) VALUES (21, 28, 29, 30)");
+
+        execute("INSERT INTO %s (a,b,c,d) VALUES (31, 32, 33, 34)");
+        execute("INSERT INTO %s (a,b,c,d) VALUES (31, 35, 36, 37)");
+        execute("INSERT INTO %s (a,b,c,d) VALUES (31, 38, 39, 40)");
+
+        beforeAndAfterFlush(() -> {
+
+            assertRows(executeFilteringOnly("SELECT * FROM %s WHERE a = 21 AND c > 23"),
+                       row(21, 25, 26, 27),
+                       row(21, 28, 29, 30));
+            assertRows(executeFilteringOnly("SELECT * FROM %s WHERE a = 21 AND c > 23 ORDER BY b DESC"),
+                       row(21, 28, 29, 30),
+                       row(21, 25, 26, 27));
+            assertRows(executeFilteringOnly("SELECT * FROM %s WHERE c > 16 and c < 36"),
+                       row(11, 18, 19, 20),
+                       row(21, 22, 23, 24),
+                       row(21, 25, 26, 27),
+                       row(21, 28, 29, 30),
+                       row(31, 32, 33, 34));
+        });
+    }
+
+    @Test
+    public void filteringWithMultiColumnSlices() throws Throwable
+    {
+        //----------------------------------------
+        // Multi-column slices for clustering keys
+        //----------------------------------------
+        createTable("CREATE TABLE %s (a int, b int, c int, d int, e int, PRIMARY KEY (a, b, c, d))");
+
+        execute("INSERT INTO %s (a,b,c,d,e) VALUES (11, 12, 13, 14, 15)");
+        execute("INSERT INTO %s (a,b,c,d,e) VALUES (21, 22, 23, 24, 25)");
+        execute("INSERT INTO %s (a,b,c,d,e) VALUES (31, 32, 33, 34, 35)");
+
+        beforeAndAfterFlush(() -> {
+
+            assertRows(execute("SELECT * FROM %s WHERE b = 22 AND d = 24 ALLOW FILTERING"),
+                       row(21, 22, 23, 24, 25));
+            assertInvalidMessage("PRIMARY KEY column \"d\" cannot be restricted as preceding column \"c\" is not restricted",
+                                 "SELECT * FROM %s WHERE b = 22 AND d = 24");
+
+            assertRows(execute("SELECT * FROM %s WHERE (b, c) > (20, 30) AND d = 34 ALLOW FILTERING"),
+                       row(31, 32, 33, 34, 35));
+            assertInvalidMessage("Clustering column \"d\" cannot be restricted (preceding column \"b\" is restricted by a non-EQ relation)",
+                                 "SELECT * FROM %s WHERE (b, c) > (20, 30) AND d = 34");
+        });
+    }
+
+    @Test
+    public void containsFilteringForClusteringKeys() throws Throwable
+    {
+        //-------------------------------------------------
+        // Frozen collections filtering for clustering keys
+        //-------------------------------------------------
+
+        // first clustering column
+        createTable("CREATE TABLE %s (a int, b frozen<list<int>>, c int, PRIMARY KEY (a, b, c))");
+        execute("INSERT INTO %s (a,b,c) VALUES (?, ?, ?)", 11, list(1, 3), 14);
+        execute("INSERT INTO %s (a,b,c) VALUES (?, ?, ?)", 21, list(2, 3), 24);
+        execute("INSERT INTO %s (a,b,c) VALUES (?, ?, ?)", 21, list(3, 3), 34);
+
+        beforeAndAfterFlush(() -> {
+
+            assertRows(execute("SELECT * FROM %s WHERE a = 21 AND b CONTAINS 2 ALLOW FILTERING"),
+                       row(21, list(2, 3), 24));
+            assertInvalidMessage("Clustering columns can only be restricted with CONTAINS with a secondary index or filtering",
+                                 "SELECT * FROM %s WHERE a = 21 AND b CONTAINS 2");
+
+            assertRows(execute("SELECT * FROM %s WHERE b CONTAINS 2 ALLOW FILTERING"),
+                       row(21, list(2, 3), 24));
+            assertInvalidMessage("Clustering columns can only be restricted with CONTAINS with a secondary index or filtering",
+                                 "SELECT * FROM %s WHERE b CONTAINS 2");
+
+            assertRows(execute("SELECT * FROM %s WHERE b CONTAINS 3 ALLOW FILTERING"),
+                       row(11, list(1, 3), 14),
+                       row(21, list(2, 3), 24),
+                       row(21, list(3, 3), 34));
+        });
+
+        // non-first clustering column
+        createTable("CREATE TABLE %s (a int, b int, c frozen<list<int>>, d int, PRIMARY KEY (a, b, c))");
+
+        execute("INSERT INTO %s (a,b,c,d) VALUES (?, ?, ?, ?)", 11, 12, list(1, 3), 14);
+        execute("INSERT INTO %s (a,b,c,d) VALUES (?, ?, ?, ?)", 21, 22, list(2, 3), 24);
+        execute("INSERT INTO %s (a,b,c,d) VALUES (?, ?, ?, ?)", 21, 22, list(3, 3), 34);
+
+        beforeAndAfterFlush(() -> {
+
+            assertRows(execute("SELECT * FROM %s WHERE a = 21 AND c CONTAINS 2 ALLOW FILTERING"),
+                       row(21, 22, list(2, 3), 24));
+            assertInvalidMessage("Clustering columns can only be restricted with CONTAINS with a secondary index or filtering",
+                                 "SELECT * FROM %s WHERE a = 21 AND c CONTAINS 2");
+
+            assertRows(execute("SELECT * FROM %s WHERE b > 20 AND c CONTAINS 2 ALLOW FILTERING"),
+                       row(21, 22, list(2, 3), 24));
+            assertInvalidMessage("Clustering column \"c\" cannot be restricted (preceding column \"b\" is restricted by a non-EQ relation)",
+                                 "SELECT * FROM %s WHERE b > 20 AND c CONTAINS 2");
+
+            assertRows(execute("SELECT * FROM %s WHERE c CONTAINS 3 ALLOW FILTERING"),
+                       row(11, 12, list(1, 3), 14),
+                       row(21, 22, list(2, 3), 24),
+                       row(21, 22, list(3, 3), 34));
+        });
+
+        createTable("CREATE TABLE %s (a int, b int, c frozen<map<text, text>>, d int, PRIMARY KEY (a, b, c))");
+
+        execute("INSERT INTO %s (a,b,c,d) VALUES (?, ?, ?, ?)", 11, 12, map("1", "3"), 14);
+        execute("INSERT INTO %s (a,b,c,d) VALUES (?, ?, ?, ?)", 21, 22, map("2", "3"), 24);
+        execute("INSERT INTO %s (a,b,c,d) VALUES (?, ?, ?, ?)", 21, 22, map("3", "3"), 34);
+
+        beforeAndAfterFlush(() -> {
+            assertRows(execute("SELECT * FROM %s WHERE b > 20 AND c CONTAINS KEY '2' ALLOW FILTERING"),
+                       row(21, 22, map("2", "3"), 24));
+            assertInvalidMessage("Clustering column \"c\" cannot be restricted (preceding column \"b\" is restricted by a non-EQ relation)",
+                                 "SELECT * FROM %s WHERE b > 20 AND c CONTAINS KEY '2'");
+        });
+    }
+
+    @Test
+    public void filteringWithOrderClause() throws Throwable
+    {
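+        // Filtering on a non-first clustering column combined with ORDER BY on the first clustering column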
+        createTable("CREATE TABLE %s (a int, b int, c int, d list<int>, PRIMARY KEY (a, b, c))");
+
+        execute("INSERT INTO %s (a, b, c, d) VALUES (?, ?, ?, ?)", 11, 12, 13, list(1,4));
+        execute("INSERT INTO %s (a, b, c, d) VALUES (?, ?, ?, ?)", 21, 22, 23, list(2,4));
+        execute("INSERT INTO %s (a, b, c, d) VALUES (?, ?, ?, ?)", 21, 25, 26, list(2,7));
+        execute("INSERT INTO %s (a, b, c, d) VALUES (?, ?, ?, ?)", 31, 32, 33, list(3,4));
+
+        beforeAndAfterFlush(() -> {
+
+            assertRows(executeFilteringOnly("SELECT a, b, c, d FROM %s WHERE a = 21 AND c > 20 ORDER BY b DESC"),
+                       row(21, 25, 26, list(2, 7)),
+                       row(21, 22, 23, list(2, 4)));
+
+            assertRows(executeFilteringOnly("SELECT a, b, c, d FROM %s WHERE a IN(21, 31) AND c > 20 ORDER BY b DESC"),
+                       row(31, 32, 33, list(3, 4)),
+                       row(21, 25, 26, list(2, 7)),
+                       row(21, 22, 23, list(2, 4)));
+        });
+    }
+
+    @Test
+    public void filteringOnStaticColumnTest() throws Throwable
+    {
+        createTable("CREATE TABLE %s (a int, b int, c int, d int, s int static, PRIMARY KEY (a, b))");
+
+        execute("INSERT INTO %s (a, b, c, d, s) VALUES (11, 12, 13, 14, 15)");
+        execute("INSERT INTO %s (a, b, c, d, s) VALUES (21, 22, 23, 24, 25)");
+        execute("INSERT INTO %s (a, b, c, d, s) VALUES (21, 26, 27, 28, 29)");
+        execute("INSERT INTO %s (a, b, c, d, s) VALUES (31, 32, 33, 34, 35)");
+        execute("INSERT INTO %s (a, b, c, d, s) VALUES (11, 42, 43, 44, 45)");
+
+        beforeAndAfterFlush(() -> {
+
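+            // s is static, so the last value written for partition a = 21 (29) is returned with every row of that partition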
+            assertRows(executeFilteringOnly("SELECT a, b, c, d, s FROM %s WHERE s = 29"),
+                       row(21, 22, 23, 24, 29),
+                       row(21, 26, 27, 28, 29));
+            assertRows(executeFilteringOnly("SELECT a, b, c, d, s FROM %s WHERE b > 22 AND s = 29"),
+                       row(21, 26, 27, 28, 29));
+            assertRows(executeFilteringOnly("SELECT a, b, c, d, s FROM %s WHERE b > 10 and b < 26 AND s = 29"),
+                       row(21, 22, 23, 24, 29));
+            assertRows(executeFilteringOnly("SELECT a, b, c, d, s FROM %s WHERE c > 10 and c < 27 AND s = 29"),
+                       row(21, 22, 23, 24, 29));
+            assertRows(executeFilteringOnly("SELECT a, b, c, d, s FROM %s WHERE c > 10 and c < 43 AND s = 29"),
+                       row(21, 22, 23, 24, 29),
+                       row(21, 26, 27, 28, 29));
+            assertRows(executeFilteringOnly("SELECT a, b, c, d, s FROM %s WHERE c > 10 AND s > 15 AND s < 45"),
+                       row(21, 22, 23, 24, 29),
+                       row(21, 26, 27, 28, 29),
+                       row(31, 32, 33, 34, 35));
+            assertRows(executeFilteringOnly("SELECT a, b, c, d, s FROM %s WHERE a = 21 AND s > 15 AND s < 45 ORDER BY b DESC"),
+                       row(21, 26, 27, 28, 29),
+                       row(21, 22, 23, 24, 29));
+            assertRows(executeFilteringOnly("SELECT a, b, c, d, s FROM %s WHERE c > 13 and d < 44"),
+                       row(21, 22, 23, 24, 29),
+                       row(21, 26, 27, 28, 29),
+                       row(31, 32, 33, 34, 35));
+        });
+    }
+
+    @Test
+    public void containsFilteringOnNonClusteringColumn() throws Throwable
+    {
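+        // CONTAINS restrictions on the regular list column d are only accepted with ALLOW FILTERING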
+        createTable("CREATE TABLE %s (a int, b int, c int, d list<int>, PRIMARY KEY (a, b, c))");
+
+        execute("INSERT INTO %s (a, b, c, d) VALUES (?, ?, ?, ?)", 11, 12, 13, list(1,4));
+        execute("INSERT INTO %s (a, b, c, d) VALUES (?, ?, ?, ?)", 21, 22, 23, list(2,4));
+        execute("INSERT INTO %s (a, b, c, d) VALUES (?, ?, ?, ?)", 21, 25, 26, list(2,7));
+        execute("INSERT INTO %s (a, b, c, d) VALUES (?, ?, ?, ?)", 31, 32, 33, list(3,4));
+
+        beforeAndAfterFlush(() -> {
+
+            assertRows(executeFilteringOnly("SELECT a, b, c, d FROM %s WHERE b > 20 AND d CONTAINS 2"),
+                       row(21, 22, 23, list(2, 4)),
+                       row(21, 25, 26, list(2, 7)));
+
+            assertRows(executeFilteringOnly("SELECT a, b, c, d FROM %s WHERE b > 20 AND d CONTAINS 2 AND d contains 4"),
+                       row(21, 22, 23, list(2, 4)));
+        });
+    }
+
+    @Test
+    public void filteringOnCompactTable() throws Throwable
+    {
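+        // with plain int clustering and regular columns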
+        createTable("CREATE TABLE %s (a int, b int, c int, d int, PRIMARY KEY (a, b, c)) WITH COMPACT STORAGE");
+
+        execute("INSERT INTO %s (a, b, c, d) VALUES (?, ?, ?, ?)", 11, 12, 13, 14);
+        execute("INSERT INTO %s (a, b, c, d) VALUES (?, ?, ?, ?)", 21, 22, 23, 24);
+        execute("INSERT INTO %s (a, b, c, d) VALUES (?, ?, ?, ?)", 21, 25, 26, 27);
+        execute("INSERT INTO %s (a, b, c, d) VALUES (?, ?, ?, ?)", 31, 32, 33, 34);
+
+        beforeAndAfterFlush(() -> {
+
+            assertRows(executeFilteringOnly("SELECT * FROM %s WHERE c > 13"),
+                       row(21, 22, 23, 24),
+                       row(21, 25, 26, 27),
+                       row(31, 32, 33, 34));
+
+            assertRows(executeFilteringOnly("SELECT * FROM %s WHERE c > 13 AND c < 33"),
+                       row(21, 22, 23, 24),
+                       row(21, 25, 26, 27));
+
+            assertRows(executeFilteringOnly("SELECT * FROM %s WHERE c > 13 AND b < 32"),
+                       row(21, 22, 23, 24),
+                       row(21, 25, 26, 27));
+
+            assertRows(executeFilteringOnly("SELECT * FROM %s WHERE a = 21 AND c > 13 AND b < 32 ORDER BY b DESC"),
+                       row(21, 25, 26, 27),
+                       row(21, 22, 23, 24));
+
+            assertRows(executeFilteringOnly("SELECT * FROM %s WHERE a IN (21, 31) AND c > 13 ORDER BY b DESC"),
+                       row(31, 32, 33, 34),
+                       row(21, 25, 26, 27),
+                       row(21, 22, 23, 24));
+
+            assertRows(executeFilteringOnly("SELECT * FROM %s WHERE c > 13 AND d < 34"),
+                       row(21, 22, 23, 24),
+                       row(21, 25, 26, 27));
+
+            assertRows(executeFilteringOnly("SELECT * FROM %s WHERE c > 13"),
+                       row(21, 22, 23, 24),
+                       row(21, 25, 26, 27),
+                       row(31, 32, 33, 34));
+        });
+
+        // with frozen in clustering key
+        createTable("CREATE TABLE %s (a int, b int, c frozen<list<int>>, d int, PRIMARY KEY (a, b, c)) WITH COMPACT STORAGE");
+
+        execute("INSERT INTO %s (a, b, c, d) VALUES (?, ?, ?, ?)", 11, 12, list(1, 3), 14);
+        execute("INSERT INTO %s (a, b, c, d) VALUES (?, ?, ?, ?)", 21, 22, list(2, 3), 24);
+        execute("INSERT INTO %s (a, b, c, d) VALUES (?, ?, ?, ?)", 21, 25, list(2, 6), 27);
+        execute("INSERT INTO %s (a, b, c, d) VALUES (?, ?, ?, ?)", 31, 32, list(3, 3), 34);
+
+        beforeAndAfterFlush(() -> {
+
+            assertRows(executeFilteringOnly("SELECT * FROM %s WHERE c CONTAINS 2"),
+                       row(21, 22, list(2, 3), 24),
+                       row(21, 25, list(2, 6), 27));
+
+            assertRows(executeFilteringOnly("SELECT * FROM %s WHERE c CONTAINS 2 AND b < 25"),
+                       row(21, 22, list(2, 3), 24));
+
+            assertRows(executeFilteringOnly("SELECT * FROM %s WHERE c CONTAINS 2 AND c CONTAINS 3"),
+                       row(21, 22, list(2, 3), 24));
+
+            assertRows(executeFilteringOnly("SELECT * FROM %s WHERE b > 12 AND c CONTAINS 2 AND d < 27"),
+                       row(21, 22, list(2, 3), 24));
+        });
+
+        // with frozen in value
+        createTable("CREATE TABLE %s (a int, b int, c int, d frozen<list<int>>, PRIMARY KEY (a, b, c)) WITH COMPACT STORAGE");
+
+        execute("INSERT INTO %s (a, b, c, d) VALUES (?, ?, ?, ?)", 11, 12, 13, list(1, 4));
+        execute("INSERT INTO %s (a, b, c, d) VALUES (?, ?, ?, ?)", 21, 22, 23, list(2, 4));
+        execute("INSERT INTO %s (a, b, c, d) VALUES (?, ?, ?, ?)", 21, 25, 25, list(2, 6));
+        execute("INSERT INTO %s (a, b, c, d) VALUES (?, ?, ?, ?)", 31, 32, 34, list(3, 4));
+
+        beforeAndAfterFlush(() -> {
+
+            assertRows(executeFilteringOnly("SELECT * FROM %s WHERE d CONTAINS 2"),
+                       row(21, 22, 23, list(2, 4)),
+                       row(21, 25, 25, list(2, 6)));
+
+            assertRows(executeFilteringOnly("SELECT * FROM %s WHERE d CONTAINS 2 AND b < 25"),
+                       row(21, 22, 23, list(2, 4)));
+
+            assertRows(executeFilteringOnly("SELECT * FROM %s WHERE d CONTAINS 2 AND d CONTAINS 4"),
+                       row(21, 22, 23, list(2, 4)));
+
+            assertRows(executeFilteringOnly("SELECT * FROM %s WHERE b > 12 AND c < 25 AND d CONTAINS 2"),
+                       row(21, 22, 23, list(2, 4)));
+        });
+    }
+
+    @Test
+    public void testCustomIndexWithFiltering() throws Throwable
+    {
+        // Test for CASSANDRA-11310 compatibility with 2i
+        createTable("CREATE TABLE %s (a text, b int, c text, d int, PRIMARY KEY (a, b, c));");
+        createIndex("CREATE INDEX ON %s(c)");
+
+        execute("INSERT INTO %s (a, b, c, d) VALUES (?, ?, ?, ?)", "a", 0, "b", 1);
+        execute("INSERT INTO %s (a, b, c, d) VALUES (?, ?, ?, ?)", "a", 1, "b", 2);
+        execute("INSERT INTO %s (a, b, c, d) VALUES (?, ?, ?, ?)", "a", 2, "b", 3);
+        execute("INSERT INTO %s (a, b, c, d) VALUES (?, ?, ?, ?)", "c", 3, "b", 4);
+
+        assertRows(executeFilteringOnly("SELECT * FROM %s WHERE a='a' AND b > 0 AND c = 'b'"),
+                   row("a", 1, "b", 2),
+                   row("a", 2, "b", 3));
+    }
+
+    @Test
+    public void testFilteringWithCounters() throws Throwable
+    {
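+        // Run the same assertions against both a regular and a COMPACT STORAGE table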
+        for (String compactStorageClause: new String[] {"", " WITH COMPACT STORAGE"})
+        {
+            createTable("CREATE TABLE %s (a int, b int, c int, cnt counter, PRIMARY KEY (a, b, c))" + compactStorageClause);
+
+            execute("UPDATE %s SET cnt = cnt + ? WHERE a = ? AND b = ? AND c = ?", 14L, 11, 12, 13);
+            execute("UPDATE %s SET cnt = cnt + ? WHERE a = ? AND b = ? AND c = ?", 24L, 21, 22, 23);
+            execute("UPDATE %s SET cnt = cnt + ? WHERE a = ? AND b = ? AND c = ?", 27L, 21, 25, 26);
+            execute("UPDATE %s SET cnt = cnt + ? WHERE a = ? AND b = ? AND c = ?", 34L, 31, 32, 33);
+            execute("UPDATE %s SET cnt = cnt + ? WHERE a = ? AND b = ? AND c = ?", 24L, 41, 42, 43);
+
+            beforeAndAfterFlush(() -> {
+
+                assertRows(executeFilteringOnly("SELECT * FROM %s WHERE cnt = 24"),
+                           row(21, 22, 23, 24L),
+                           row(41, 42, 43, 24L));
+                assertRows(executeFilteringOnly("SELECT * FROM %s WHERE b > 22 AND cnt = 24"),
+                           row(41, 42, 43, 24L));
+                assertRows(executeFilteringOnly("SELECT * FROM %s WHERE b > 10 AND b < 25 AND cnt = 24"),
+                           row(21, 22, 23, 24L));
+                assertRows(executeFilteringOnly("SELECT * FROM %s WHERE b > 10 AND c < 25 AND cnt = 24"),
+                           row(21, 22, 23, 24L));
+                assertRows(executeFilteringOnly("SELECT * FROM %s WHERE a = 21 AND b > 10 AND cnt > 23 ORDER BY b DESC"),
+                           row(21, 25, 26, 27L),
+                           row(21, 22, 23, 24L));
+                assertRows(executeFilteringOnly("SELECT * FROM %s WHERE cnt > 20 AND cnt < 30"),
+                           row(21, 22, 23, 24L),
+                           row(21, 25, 26, 27L),
+                           row(41, 42, 43, 24L));
+            });
+        }
+    }
+
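+    /**
+     * Asserts that the statement is rejected without ALLOW FILTERING, then executes it with ALLOW FILTERING appended.
+     */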
+    private UntypedResultSet executeFilteringOnly(String statement) throws Throwable
+    {
+        assertInvalid(statement);
+        return execute(statement + " ALLOW FILTERING");
+    }
+
     /**
      * Check select with and without compact storage, with different column
      * order. See CASSANDRA-10988
@@ -2569,11 +2973,22 @@
 
         assertRows(execute("SELECT * FROM %s WHERE pk = 1 AND  c1 IN(0,1,2) AND c2 = 1 AND v = 3"),
                    row(1, 1, 1, 3, 3));
+    }
 
-        assertInvalidMessage("Clustering column \"c2\" cannot be restricted (preceding column \"c1\" is restricted by a non-EQ relation)",
-                             "SELECT * FROM %s WHERE pk = 1 AND  c1 > 0 AND c1 < 5 AND c2 = 1 ALLOW FILTERING;");
+    @Test
+    public void testIndexQueryWithCompositePartitionKey() throws Throwable
+    {
+        createTable("CREATE TABLE %s (p1 int, p2 int, v int, PRIMARY KEY ((p1, p2)))");
+        assertInvalidMessage("Partition key parts: p2 must be restricted as other parts are",
+                             "SELECT * FROM %s WHERE p1 = 1 AND v = 3 ALLOW FILTERING");
+        createIndex("CREATE INDEX ON %s(v)");
 
-        assertInvalidMessage("PRIMARY KEY column \"c2\" cannot be restricted as preceding column \"c1\" is not restricted",
-                             "SELECT * FROM %s WHERE pk = 1 AND  c2 = 1 ALLOW FILTERING;");
+        execute("INSERT INTO %s(p1, p2, v) values (?, ?, ?)", 1, 1, 3);
+        execute("INSERT INTO %s(p1, p2, v) values (?, ?, ?)", 1, 2, 3);
+        execute("INSERT INTO %s(p1, p2, v) values (?, ?, ?)", 2, 1, 3);
+
+        assertRows(execute("SELECT * FROM %s WHERE p1 = 1 AND v = 3 ALLOW FILTERING"),
+                   row(1, 2, 3),
+                   row(1, 1, 3));
     }
 }
diff --git a/test/unit/org/apache/cassandra/cql3/validation/operations/UpdateTest.java b/test/unit/org/apache/cassandra/cql3/validation/operations/UpdateTest.java
index 0170ed2..9c42fc2 100644
--- a/test/unit/org/apache/cassandra/cql3/validation/operations/UpdateTest.java
+++ b/test/unit/org/apache/cassandra/cql3/validation/operations/UpdateTest.java
@@ -20,10 +20,13 @@
 
 import java.util.Arrays;
 
+import org.junit.Assert;
 import org.junit.Test;
 
 import static org.apache.commons.lang3.StringUtils.isEmpty;
 import org.apache.cassandra.cql3.CQLTester;
+import org.apache.cassandra.cql3.UntypedResultSet;
+import org.apache.cassandra.cql3.UntypedResultSet.Row;
 import org.apache.cassandra.utils.ByteBufferUtil;
 
 public class UpdateTest extends CQLTester
@@ -193,13 +196,13 @@
                                  "UPDATE %s SET value = ? WHERE partitionKey = ? AND clustering_1 = ? AND clustering_1 = ?", 7, 0, 1, 1);
 
             // unknown identifiers
-            assertInvalidMessage("Unknown identifier value1",
+            assertInvalidMessage("Undefined column name value1",
                                  "UPDATE %s SET value1 = ? WHERE partitionKey = ? AND clustering_1 = ?", 7, 0, 1);
 
-            assertInvalidMessage("Undefined name partitionkey1 in where clause ('partitionkey1 = ?')",
+            assertInvalidMessage("Undefined column name partitionkey1",
                                  "UPDATE %s SET value = ? WHERE partitionKey1 = ? AND clustering_1 = ?", 7, 0, 1);
 
-            assertInvalidMessage("Undefined name clustering_3 in where clause ('clustering_3 = ?')",
+            assertInvalidMessage("Undefined column name clustering_3",
                                  "UPDATE %s SET value = ? WHERE partitionKey = ? AND clustering_3 = ?", 7, 0, 1);
 
             // Invalid operator in the where clause
@@ -380,13 +383,13 @@
                                  "UPDATE %s SET value = ? WHERE partitionKey = ? AND clustering_1 = ? AND clustering_2 = ? AND clustering_1 = ?", 7, 0, 1, 1, 1);
 
             // unknown identifiers
-            assertInvalidMessage("Unknown identifier value1",
+            assertInvalidMessage("Undefined column name value1",
                                  "UPDATE %s SET value1 = ? WHERE partitionKey = ? AND clustering_1 = ? AND clustering_2 = ?", 7, 0, 1, 1);
 
-            assertInvalidMessage("Undefined name partitionkey1 in where clause ('partitionkey1 = ?')",
+            assertInvalidMessage("Undefined column name partitionkey1",
                                  "UPDATE %s SET value = ? WHERE partitionKey1 = ? AND clustering_1 = ? AND clustering_2 = ?", 7, 0, 1, 1);
 
-            assertInvalidMessage("Undefined name clustering_3 in where clause ('clustering_3 = ?')",
+            assertInvalidMessage("Undefined column name clustering_3",
                                  "UPDATE %s SET value = ? WHERE partitionKey = ? AND clustering_1 = ? AND clustering_3 = ?", 7, 0, 1, 1);
 
             // Invalid operator in the where clause
@@ -521,9 +524,31 @@
         assertRows(execute("SELECT l FROM %s WHERE k = 0"), row(list("v1", "v4", "v3")));
     }
 
-    private void flush(boolean forceFlush)
+    @Test
+    public void testUpdateWithDefaultTtl() throws Throwable
     {
-        if (forceFlush)
-            flush();
+        final int secondsPerMinute = 60;
+        createTable("CREATE TABLE %s (a int PRIMARY KEY, b int) WITH default_time_to_live = " + (10 * secondsPerMinute));
+
+        execute("UPDATE %s SET b = 1 WHERE a = 1");
+        UntypedResultSet resultSet = execute("SELECT ttl(b) FROM %s WHERE a = 1");
+        Assert.assertEquals(1, resultSet.size());
+        Row row = resultSet.one();
+        Assert.assertTrue(row.getInt("ttl(b)") >= (9 * secondsPerMinute));
+
+        execute("UPDATE %s USING TTL ? SET b = 3 WHERE a = 1", 0);
+        assertRows(execute("SELECT ttl(b) FROM %s WHERE a = 1"), row(new Object[]{null}));
+
+        execute("UPDATE %s SET b = 3 WHERE a = 1");
+        resultSet = execute("SELECT ttl(b) FROM %s WHERE a = 1");
+        Assert.assertEquals(1, resultSet.size());
+        row = resultSet.one();
+        Assert.assertTrue(row.getInt("ttl(b)") >= (9 * secondsPerMinute));
+
+        execute("UPDATE %s USING TTL ? SET b = 2 WHERE a = 2", unset());
+        resultSet = execute("SELECT ttl(b) FROM %s WHERE a = 2");
+        Assert.assertEquals(1, resultSet.size());
+        row = resultSet.one();
+        Assert.assertTrue(row.getInt("ttl(b)") >= (9 * secondsPerMinute));
     }
 }
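
The new testUpdateWithDefaultTtl above pins down three behaviours of a table-level default_time_to_live: a plain UPDATE inherits the default, an explicit USING TTL 0 removes the TTL entirely, and an unset bound TTL falls back to the table default. A condensed sketch of the same flow, assuming the CQLTester helpers (createTable, execute, unset) used throughout these tests:

    createTable("CREATE TABLE %s (a int PRIMARY KEY, b int) WITH default_time_to_live = 600");

    execute("UPDATE %s SET b = 1 WHERE a = 1");                      // ttl(b) is close to 600
    execute("UPDATE %s USING TTL ? SET b = 3 WHERE a = 1", 0);       // ttl(b) is null: TTL removed
    execute("UPDATE %s USING TTL ? SET b = 2 WHERE a = 2", unset()); // ttl(b) is close to 600 again
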
diff --git a/test/unit/org/apache/cassandra/db/CellTest.java b/test/unit/org/apache/cassandra/db/CellTest.java
index 9072f98..1249989 100644
--- a/test/unit/org/apache/cassandra/db/CellTest.java
+++ b/test/unit/org/apache/cassandra/db/CellTest.java
@@ -53,8 +53,6 @@
                                                              .addRegularColumn("m", MapType.getInstance(IntegerType.instance, IntegerType.instance, true))
                                                              .build();
 
-    private static final CFMetaData fakeMetadata = CFMetaData.createFake("fakeKS", "fakeTable");
-
     @BeforeClass
     public static void defineSchema() throws ConfigurationException
     {
@@ -64,8 +62,8 @@
 
     private static ColumnDefinition fakeColumn(String name, AbstractType<?> type)
     {
-        return new ColumnDefinition(fakeMetadata.ksName,
-                                    fakeMetadata.cfName,
+        return new ColumnDefinition("fakeKs",
+                                    "fakeTable",
                                     ColumnIdentifier.getInterned(name, false),
                                     type,
                                     ColumnDefinition.NO_POSITION,
@@ -127,8 +125,8 @@
 
         // Valid cells
         c = fakeColumn("c", Int32Type.instance);
-        assertValid(BufferCell.live(fakeMetadata, c, 0, ByteBufferUtil.EMPTY_BYTE_BUFFER));
-        assertValid(BufferCell.live(fakeMetadata, c, 0, ByteBufferUtil.bytes(4)));
+        assertValid(BufferCell.live(c, 0, ByteBufferUtil.EMPTY_BYTE_BUFFER));
+        assertValid(BufferCell.live(c, 0, ByteBufferUtil.bytes(4)));
 
         assertValid(BufferCell.expiring(c, 0, 4, 4, ByteBufferUtil.EMPTY_BYTE_BUFFER));
         assertValid(BufferCell.expiring(c, 0, 4, 4, ByteBufferUtil.bytes(4)));
@@ -137,11 +135,11 @@
 
          // Invalid value (we don't allow empty values for smallint)
         c = fakeColumn("c", ShortType.instance);
-        assertInvalid(BufferCell.live(fakeMetadata, c, 0, ByteBufferUtil.EMPTY_BYTE_BUFFER));
+        assertInvalid(BufferCell.live(c, 0, ByteBufferUtil.EMPTY_BYTE_BUFFER));
         // But this should be valid even though the underlying value is an empty BB (catches bug #11618)
         assertValid(BufferCell.tombstone(c, 0, 4));
         // And of course, this should be valid with a proper value
-        assertValid(BufferCell.live(fakeMetadata, c, 0, ByteBufferUtil.bytes((short)4)));
+        assertValid(BufferCell.live(c, 0, ByteBufferUtil.bytes((short)4)));
 
         // Invalid ttl
         assertInvalid(BufferCell.expiring(c, 0, -4, 4, ByteBufferUtil.bytes(4)));
@@ -151,9 +149,9 @@
 
         c = fakeColumn("c", MapType.getInstance(Int32Type.instance, Int32Type.instance, true));
         // Valid cell path
-        assertValid(BufferCell.live(fakeMetadata, c, 0, ByteBufferUtil.bytes(4), CellPath.create(ByteBufferUtil.bytes(4))));
+        assertValid(BufferCell.live(c, 0, ByteBufferUtil.bytes(4), CellPath.create(ByteBufferUtil.bytes(4))));
         // Invalid cell path (int values should be 0 or 4 bytes)
-        assertInvalid(BufferCell.live(fakeMetadata, c, 0, ByteBufferUtil.bytes(4), CellPath.create(ByteBufferUtil.bytes((long)4))));
+        assertInvalid(BufferCell.live(c, 0, ByteBufferUtil.bytes(4), CellPath.create(ByteBufferUtil.bytes((long)4))));
     }
 
     @Test
@@ -187,15 +185,15 @@
         long ts1 = now1*1000000;
 
 
-        Cell r1m1 = BufferCell.live(cfm2, m, ts1, bb(1), CellPath.create(bb(1)));
-        Cell r1m2 = BufferCell.live(cfm2, m, ts1, bb(2), CellPath.create(bb(2)));
+        Cell r1m1 = BufferCell.live(m, ts1, bb(1), CellPath.create(bb(1)));
+        Cell r1m2 = BufferCell.live(m, ts1, bb(2), CellPath.create(bb(2)));
         List<Cell> cells1 = Lists.newArrayList(r1m1, r1m2);
 
         int now2 = now1 + 1;
         long ts2 = now2*1000000;
-        Cell r2m2 = BufferCell.live(cfm2, m, ts2, bb(1), CellPath.create(bb(2)));
-        Cell r2m3 = BufferCell.live(cfm2, m, ts2, bb(2), CellPath.create(bb(3)));
-        Cell r2m4 = BufferCell.live(cfm2, m, ts2, bb(3), CellPath.create(bb(4)));
+        Cell r2m2 = BufferCell.live(m, ts2, bb(1), CellPath.create(bb(2)));
+        Cell r2m3 = BufferCell.live(m, ts2, bb(2), CellPath.create(bb(3)));
+        Cell r2m4 = BufferCell.live(m, ts2, bb(3), CellPath.create(bb(4)));
         List<Cell> cells2 = Lists.newArrayList(r2m2, r2m3, r2m4);
 
         RowBuilder builder = new RowBuilder();
@@ -225,7 +223,7 @@
     private Cell regular(CFMetaData cfm, String columnName, String value, long timestamp)
     {
         ColumnDefinition cdef = cfm.getColumnDefinition(ByteBufferUtil.bytes(columnName));
-        return BufferCell.live(cfm, cdef, timestamp, ByteBufferUtil.bytes(value));
+        return BufferCell.live(cdef, timestamp, ByteBufferUtil.bytes(value));
     }
 
     private Cell expiring(CFMetaData cfm, String columnName, String value, long timestamp, int localExpirationTime)
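
Throughout CellTest the BufferCell.live factory drops its leading CFMetaData argument, so the fake table metadata is no longer needed just to build a cell. A minimal before/after sketch (cdef and ts are placeholders, not part of the patch):

    // Old call, removed above: table metadata had to be threaded through.
    // Cell c = BufferCell.live(fakeMetadata, cdef, ts, ByteBufferUtil.bytes(4));

    // New call: the ColumnDefinition alone identifies the column.
    Cell c = BufferCell.live(cdef, ts, ByteBufferUtil.bytes(4));
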
diff --git a/test/unit/org/apache/cassandra/db/CleanupTest.java b/test/unit/org/apache/cassandra/db/CleanupTest.java
index b4ffe57..abd5a04 100644
--- a/test/unit/org/apache/cassandra/db/CleanupTest.java
+++ b/test/unit/org/apache/cassandra/db/CleanupTest.java
@@ -174,6 +174,32 @@
     }
 
     @Test
+    public void testUserDefinedCleanupWithNewToken() throws ExecutionException, InterruptedException, UnknownHostException
+    {
+        StorageService.instance.getTokenMetadata().clearUnsafe();
+
+        Keyspace keyspace = Keyspace.open(KEYSPACE1);
+        ColumnFamilyStore cfs = keyspace.getColumnFamilyStore(CF_STANDARD1);
+
+        // insert data and verify we get it back w/ range query
+        fillCF(cfs, "val", LOOPS);
+
+        assertEquals(LOOPS, Util.getAll(Util.cmd(cfs).build()).size());
+        TokenMetadata tmd = StorageService.instance.getTokenMetadata();
+
+        byte[] tk1 = new byte[1], tk2 = new byte[1];
+        tk1[0] = 2;
+        tk2[0] = 1;
+        tmd.updateNormalToken(new BytesToken(tk1), InetAddress.getByName("127.0.0.1"));
+        tmd.updateNormalToken(new BytesToken(tk2), InetAddress.getByName("127.0.0.2"));
+
+        for (SSTableReader r : cfs.getLiveSSTables())
+            CompactionManager.instance.forceUserDefinedCleanup(r.getFilename());
+
+        assertEquals(0, Util.getAll(Util.cmd(cfs).build()).size());
+    }
+
+    @Test
     public void testNeedsCleanup() throws Exception
     {
         // setup
diff --git a/test/unit/org/apache/cassandra/db/ColumnFamilyStoreTest.java b/test/unit/org/apache/cassandra/db/ColumnFamilyStoreTest.java
index 6840e2b..af43152 100644
--- a/test/unit/org/apache/cassandra/db/ColumnFamilyStoreTest.java
+++ b/test/unit/org/apache/cassandra/db/ColumnFamilyStoreTest.java
@@ -321,8 +321,8 @@
         }
         ScrubTest.fillIndexCF(cfs, false, colValues);
 
-        cfs.snapshot("nonEphemeralSnapshot", null, false);
-        cfs.snapshot("ephemeralSnapshot", null, true);
+        cfs.snapshot("nonEphemeralSnapshot", null, false, false);
+        cfs.snapshot("ephemeralSnapshot", null, true, false);
 
         Map<String, Pair<Long, Long>> snapshotDetails = cfs.getSnapshotDetails();
         assertEquals(2, snapshotDetails.size());
diff --git a/test/unit/org/apache/cassandra/db/CounterCacheTest.java b/test/unit/org/apache/cassandra/db/CounterCacheTest.java
index 91157ad..4cfd848 100644
--- a/test/unit/org/apache/cassandra/db/CounterCacheTest.java
+++ b/test/unit/org/apache/cassandra/db/CounterCacheTest.java
@@ -169,10 +169,10 @@
         Clustering c2 = CBuilder.create(cfs.metadata.comparator).add(ByteBufferUtil.bytes(2)).build();
         ColumnDefinition cd = cfs.metadata.getColumnDefinition(ByteBufferUtil.bytes("c"));
 
-        assertEquals(ClockAndCount.create(1L, 1L), cfs.getCachedCounter(bytes(1), c1, cd, null));
-        assertEquals(ClockAndCount.create(1L, 2L), cfs.getCachedCounter(bytes(1), c2, cd, null));
-        assertEquals(ClockAndCount.create(1L, 1L), cfs.getCachedCounter(bytes(2), c1, cd, null));
-        assertEquals(ClockAndCount.create(1L, 2L), cfs.getCachedCounter(bytes(2), c2, cd, null));
+        assertEquals(1L, cfs.getCachedCounter(bytes(1), c1, cd, null).count);
+        assertEquals(2L, cfs.getCachedCounter(bytes(1), c2, cd, null).count);
+        assertEquals(1L, cfs.getCachedCounter(bytes(2), c1, cd, null).count);
+        assertEquals(2L, cfs.getCachedCounter(bytes(2), c2, cd, null).count);
     }
 
     @Test
diff --git a/test/unit/org/apache/cassandra/db/CounterCellTest.java b/test/unit/org/apache/cassandra/db/CounterCellTest.java
index 08e0b25..b09bfad 100644
--- a/test/unit/org/apache/cassandra/db/CounterCellTest.java
+++ b/test/unit/org/apache/cassandra/db/CounterCellTest.java
@@ -101,20 +101,20 @@
     {
         ColumnDefinition cDef = cfs.metadata.getColumnDefinition(colName);
         ByteBuffer val = CounterContext.instance().createLocal(count);
-        return BufferCell.live(cfs.metadata, cDef, ts, val);
+        return BufferCell.live(cDef, ts, val);
     }
 
     private Cell createCounterCell(ColumnFamilyStore cfs, ByteBuffer colName, CounterId id, long count, long ts)
     {
         ColumnDefinition cDef = cfs.metadata.getColumnDefinition(colName);
         ByteBuffer val = CounterContext.instance().createGlobal(id, ts, count);
-        return BufferCell.live(cfs.metadata, cDef, ts, val);
+        return BufferCell.live(cDef, ts, val);
     }
 
     private Cell createCounterCellFromContext(ColumnFamilyStore cfs, ByteBuffer colName, ContextState context, long ts)
     {
         ColumnDefinition cDef = cfs.metadata.getColumnDefinition(colName);
-        return BufferCell.live(cfs.metadata, cDef, ts, context.context);
+        return BufferCell.live(cDef, ts, context.context);
     }
 
     private Cell createDeleted(ColumnFamilyStore cfs, ByteBuffer colName, long ts, int localDeletionTime)
@@ -274,7 +274,7 @@
         Cell original = createCounterCellFromContext(cfs, col, state, 5);
 
         ColumnDefinition cDef = cfs.metadata.getColumnDefinition(col);
-        Cell cleared = BufferCell.live(cfs.metadata, cDef, 5, CounterContext.instance().clearAllLocal(state.context));
+        Cell cleared = BufferCell.live(cDef, 5, CounterContext.instance().clearAllLocal(state.context));
 
         CounterContext.instance().updateDigest(digest1, original.value());
         CounterContext.instance().updateDigest(digest2, cleared.value());
diff --git a/test/unit/org/apache/cassandra/db/CounterMutationTest.java b/test/unit/org/apache/cassandra/db/CounterMutationTest.java
index 912dd68..c8d4703 100644
--- a/test/unit/org/apache/cassandra/db/CounterMutationTest.java
+++ b/test/unit/org/apache/cassandra/db/CounterMutationTest.java
@@ -150,11 +150,11 @@
         CBuilder cb = CBuilder.create(cfsOne.metadata.comparator);
         cb.add("cc");
 
-        assertEquals(ClockAndCount.create(1L, 1L), cfsOne.getCachedCounter(Util.dk("key1").getKey(), cb.build(), c1cfs1, null));
-        assertEquals(ClockAndCount.create(1L, -1L), cfsOne.getCachedCounter(Util.dk("key1").getKey(), cb.build(), c2cfs1, null));
+        assertEquals(1L, cfsOne.getCachedCounter(Util.dk("key1").getKey(), cb.build(), c1cfs1, null).count);
+        assertEquals(-1L, cfsOne.getCachedCounter(Util.dk("key1").getKey(), cb.build(), c2cfs1, null).count);
 
-        assertEquals(ClockAndCount.create(1L, 2L), cfsTwo.getCachedCounter(Util.dk("key1").getKey(), cb.build(), c1cfs2, null));
-        assertEquals(ClockAndCount.create(1L, -2L), cfsTwo.getCachedCounter(Util.dk("key1").getKey(), cb.build(), c2cfs2, null));
+        assertEquals(2L, cfsTwo.getCachedCounter(Util.dk("key1").getKey(), cb.build(), c1cfs2, null).count);
+        assertEquals(-2L, cfsTwo.getCachedCounter(Util.dk("key1").getKey(), cb.build(), c2cfs2, null).count);
     }
 
     @Test
diff --git a/test/unit/org/apache/cassandra/db/KeyCacheTest.java b/test/unit/org/apache/cassandra/db/KeyCacheTest.java
index 515d30e..ada6b5b 100644
--- a/test/unit/org/apache/cassandra/db/KeyCacheTest.java
+++ b/test/unit/org/apache/cassandra/db/KeyCacheTest.java
@@ -17,16 +17,15 @@
  */
 package org.apache.cassandra.db;
 
+import java.io.IOException;
 import java.util.Collection;
 import java.util.HashMap;
 import java.util.Iterator;
 import java.util.Map;
 import java.util.Set;
 import java.util.concurrent.ExecutionException;
-import java.util.concurrent.TimeUnit;
 
 import com.google.common.collect.ImmutableList;
-import com.google.common.util.concurrent.Uninterruptibles;
 import org.junit.AfterClass;
 import org.junit.BeforeClass;
 import org.junit.Test;
@@ -34,9 +33,7 @@
 import org.apache.cassandra.SchemaLoader;
 import org.apache.cassandra.Util;
 import org.apache.cassandra.cache.KeyCacheKey;
-import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.DatabaseDescriptor;
-import org.apache.cassandra.config.Schema;
 import org.apache.cassandra.db.compaction.OperationType;
 import org.apache.cassandra.db.compaction.CompactionManager;
 import org.apache.cassandra.db.lifecycle.LifecycleTransaction;
@@ -54,6 +51,9 @@
     private static final String COLUMN_FAMILY1 = "Standard1";
     private static final String COLUMN_FAMILY2 = "Standard2";
     private static final String COLUMN_FAMILY3 = "Standard3";
+    private static final String COLUMN_FAMILY4 = "Standard4";
+    private static final String COLUMN_FAMILY5 = "Standard5";
+    private static final String COLUMN_FAMILY6 = "Standard6";
 
 
     @BeforeClass
@@ -64,7 +64,10 @@
                                     KeyspaceParams.simple(1),
                                     SchemaLoader.standardCFMD(KEYSPACE1, COLUMN_FAMILY1),
                                     SchemaLoader.standardCFMD(KEYSPACE1, COLUMN_FAMILY2),
-                                    SchemaLoader.standardCFMD(KEYSPACE1, COLUMN_FAMILY3));
+                                    SchemaLoader.standardCFMD(KEYSPACE1, COLUMN_FAMILY3),
+                                    SchemaLoader.standardCFMD(KEYSPACE1, COLUMN_FAMILY4),
+                                    SchemaLoader.standardCFMD(KEYSPACE1, COLUMN_FAMILY5),
+                                    SchemaLoader.standardCFMD(KEYSPACE1, COLUMN_FAMILY6));
     }
 
     @AfterClass
@@ -74,42 +77,61 @@
     }
 
     @Test
-    public void testKeyCacheLoad() throws Exception
+    public void testKeyCacheLoadShallowIndexEntry() throws Exception
+    {
+        DatabaseDescriptor.setColumnIndexCacheSize(0);
+        testKeyCacheLoad(COLUMN_FAMILY2);
+    }
+
+    @Test
+    public void testKeyCacheLoadIndexInfoOnHeap() throws Exception
+    {
+        DatabaseDescriptor.setColumnIndexCacheSize(8);
+        testKeyCacheLoad(COLUMN_FAMILY5);
+    }
+
+    private void testKeyCacheLoad(String cf) throws Exception
     {
         CompactionManager.instance.disableAutoCompaction();
 
-        ColumnFamilyStore store = Keyspace.open(KEYSPACE1).getColumnFamilyStore(COLUMN_FAMILY2);
+        ColumnFamilyStore store = Keyspace.open(KEYSPACE1).getColumnFamilyStore(cf);
 
         // empty the cache
         CacheService.instance.invalidateKeyCache();
-        assertKeyCacheSize(0, KEYSPACE1, COLUMN_FAMILY2);
+        assertKeyCacheSize(0, KEYSPACE1, cf);
 
         // insert data and force to disk
-        SchemaLoader.insertData(KEYSPACE1, COLUMN_FAMILY2, 0, 100);
+        SchemaLoader.insertData(KEYSPACE1, cf, 0, 100);
         store.forceBlockingFlush();
 
         // populate the cache
-        readData(KEYSPACE1, COLUMN_FAMILY2, 0, 100);
-        assertKeyCacheSize(100, KEYSPACE1, COLUMN_FAMILY2);
+        readData(KEYSPACE1, cf, 0, 100);
+        assertKeyCacheSize(100, KEYSPACE1, cf);
 
         // really? our caches don't implement the map interface? (hence no .addAll)
-        Map<KeyCacheKey, RowIndexEntry> savedMap = new HashMap<KeyCacheKey, RowIndexEntry>();
+        Map<KeyCacheKey, RowIndexEntry> savedMap = new HashMap<>();
+        Map<KeyCacheKey, RowIndexEntry.IndexInfoRetriever> savedInfoMap = new HashMap<>();
         for (Iterator<KeyCacheKey> iter = CacheService.instance.keyCache.keyIterator();
              iter.hasNext();)
         {
             KeyCacheKey k = iter.next();
-            if (k.desc.ksname.equals(KEYSPACE1) && k.desc.cfname.equals(COLUMN_FAMILY2))
-                savedMap.put(k, CacheService.instance.keyCache.get(k));
+            if (k.desc.ksname.equals(KEYSPACE1) && k.desc.cfname.equals(cf))
+            {
+                RowIndexEntry rie = CacheService.instance.keyCache.get(k);
+                savedMap.put(k, rie);
+                SSTableReader sstr = readerForKey(k);
+                savedInfoMap.put(k, rie.openWithIndex(sstr.getIndexFile()));
+            }
         }
 
         // force the cache to disk
         CacheService.instance.keyCache.submitWrite(Integer.MAX_VALUE).get();
 
         CacheService.instance.invalidateKeyCache();
-        assertKeyCacheSize(0, KEYSPACE1, COLUMN_FAMILY2);
+        assertKeyCacheSize(0, KEYSPACE1, cf);
 
         CacheService.instance.keyCache.loadSaved();
-        assertKeyCacheSize(savedMap.size(), KEYSPACE1, COLUMN_FAMILY2);
+        assertKeyCacheSize(savedMap.size(), KEYSPACE1, cf);
 
         // probably it's better to add equals/hashCode to RowIndexEntry...
         for (Map.Entry<KeyCacheKey, RowIndexEntry> entry : savedMap.entrySet())
@@ -117,77 +139,132 @@
             RowIndexEntry expected = entry.getValue();
             RowIndexEntry actual = CacheService.instance.keyCache.get(entry.getKey());
             assertEquals(expected.position, actual.position);
-            assertEquals(expected.columnsIndex(), actual.columnsIndex());
+            assertEquals(expected.columnsIndexCount(), actual.columnsIndexCount());
+            for (int i = 0; i < expected.columnsIndexCount(); i++)
+            {
+                SSTableReader actualSstr = readerForKey(entry.getKey());
+                try (RowIndexEntry.IndexInfoRetriever actualIir = actual.openWithIndex(actualSstr.getIndexFile()))
+                {
+                    RowIndexEntry.IndexInfoRetriever expectedIir = savedInfoMap.get(entry.getKey());
+                    assertEquals(expectedIir.columnsIndex(i), actualIir.columnsIndex(i));
+                }
+            }
             if (expected.isIndexed())
             {
                 assertEquals(expected.deletionTime(), actual.deletionTime());
             }
         }
+
+        savedInfoMap.values().forEach(iir -> {
+            try
+            {
+                if (iir != null)
+                    iir.close();
+            }
+            catch (IOException e)
+            {
+                throw new RuntimeException(e);
+            }
+        });
+    }
+
+    private static SSTableReader readerForKey(KeyCacheKey k)
+    {
+        return ColumnFamilyStore.getIfExists(k.desc.ksname, k.desc.cfname).getLiveSSTables()
+                                .stream()
+                                .filter(sstreader -> sstreader.descriptor.generation == k.desc.generation)
+                                .findFirst().get();
     }
 
     @Test
-    public void testKeyCacheLoadWithLostTable() throws Exception
+    public void testKeyCacheLoadWithLostTableShallowIndexEntry() throws Exception
+    {
+        DatabaseDescriptor.setColumnIndexCacheSize(0);
+        testKeyCacheLoadWithLostTable(COLUMN_FAMILY3);
+    }
+
+    @Test
+    public void testKeyCacheLoadWithLostTableIndexInfoOnHeap() throws Exception
+    {
+        DatabaseDescriptor.setColumnIndexCacheSize(8);
+        testKeyCacheLoadWithLostTable(COLUMN_FAMILY6);
+    }
+
+    private void testKeyCacheLoadWithLostTable(String cf) throws Exception
     {
         CompactionManager.instance.disableAutoCompaction();
 
-        ColumnFamilyStore store = Keyspace.open(KEYSPACE1).getColumnFamilyStore(COLUMN_FAMILY3);
+        ColumnFamilyStore store = Keyspace.open(KEYSPACE1).getColumnFamilyStore(cf);
 
         // empty the cache
         CacheService.instance.invalidateKeyCache();
-        assertKeyCacheSize(0, KEYSPACE1, COLUMN_FAMILY3);
+        assertKeyCacheSize(0, KEYSPACE1, cf);
 
         // insert data and force to disk
-        SchemaLoader.insertData(KEYSPACE1, COLUMN_FAMILY3, 0, 100);
+        SchemaLoader.insertData(KEYSPACE1, cf, 0, 100);
         store.forceBlockingFlush();
 
         Collection<SSTableReader> firstFlushTables = ImmutableList.copyOf(store.getLiveSSTables());
 
         // populate the cache
-        readData(KEYSPACE1, COLUMN_FAMILY3, 0, 100);
-        assertKeyCacheSize(100, KEYSPACE1, COLUMN_FAMILY3);
+        readData(KEYSPACE1, cf, 0, 100);
+        assertKeyCacheSize(100, KEYSPACE1, cf);
 
         // insert some new data and force to disk
-        SchemaLoader.insertData(KEYSPACE1, COLUMN_FAMILY3, 100, 50);
+        SchemaLoader.insertData(KEYSPACE1, cf, 100, 50);
         store.forceBlockingFlush();
 
         // check that it's fine
-        readData(KEYSPACE1, COLUMN_FAMILY3, 100, 50);
-        assertKeyCacheSize(150, KEYSPACE1, COLUMN_FAMILY3);
+        readData(KEYSPACE1, cf, 100, 50);
+        assertKeyCacheSize(150, KEYSPACE1, cf);
 
         // force the cache to disk
         CacheService.instance.keyCache.submitWrite(Integer.MAX_VALUE).get();
 
         CacheService.instance.invalidateKeyCache();
-        assertKeyCacheSize(0, KEYSPACE1, COLUMN_FAMILY3);
+        assertKeyCacheSize(0, KEYSPACE1, cf);
 
         // check that the content is written correctly
         CacheService.instance.keyCache.loadSaved();
-        assertKeyCacheSize(150, KEYSPACE1, COLUMN_FAMILY3);
+        assertKeyCacheSize(150, KEYSPACE1, cf);
 
         CacheService.instance.invalidateKeyCache();
-        assertKeyCacheSize(0, KEYSPACE1, COLUMN_FAMILY3);
+        assertKeyCacheSize(0, KEYSPACE1, cf);
 
         // now remove the first sstable from the store to simulate losing the file
         store.markObsolete(firstFlushTables, OperationType.UNKNOWN);
 
         // check that reading now correctly skips over lost table and reads the rest (CASSANDRA-10219)
         CacheService.instance.keyCache.loadSaved();
-        assertKeyCacheSize(50, KEYSPACE1, COLUMN_FAMILY3);
+        assertKeyCacheSize(50, KEYSPACE1, cf);
     }
 
     @Test
-    public void testKeyCache() throws ExecutionException, InterruptedException
+    public void testKeyCacheShallowIndexEntry() throws ExecutionException, InterruptedException
+    {
+        DatabaseDescriptor.setColumnIndexCacheSize(0);
+        testKeyCache(COLUMN_FAMILY1);
+    }
+
+    @Test
+    public void testKeyCacheIndexInfoOnHeap() throws ExecutionException, InterruptedException
+    {
+        DatabaseDescriptor.setColumnIndexCacheSize(8);
+        testKeyCache(COLUMN_FAMILY4);
+    }
+
+    private void testKeyCache(String cf) throws ExecutionException, InterruptedException
     {
         CompactionManager.instance.disableAutoCompaction();
 
         Keyspace keyspace = Keyspace.open(KEYSPACE1);
-        ColumnFamilyStore cfs = keyspace.getColumnFamilyStore(COLUMN_FAMILY1);
+        ColumnFamilyStore cfs = keyspace.getColumnFamilyStore(cf);
 
         // just to make sure that everything is clean
         CacheService.instance.invalidateKeyCache();
 
         // KeyCache should start at size 0 if we're caching X% of zero data.
-        assertKeyCacheSize(0, KEYSPACE1, COLUMN_FAMILY1);
+        assertKeyCacheSize(0, KEYSPACE1, cf);
 
         Mutation rm;
 
@@ -202,7 +279,7 @@
         Util.getAll(Util.cmd(cfs, "key1").build());
         Util.getAll(Util.cmd(cfs, "key2").build());
 
-        assertKeyCacheSize(2, KEYSPACE1, COLUMN_FAMILY1);
+        assertKeyCacheSize(2, KEYSPACE1, cf);
 
         Set<SSTableReader> readers = cfs.getLiveSSTables();
         Refs<SSTableReader> refs = Refs.tryRef(readers);
@@ -215,20 +292,20 @@
         // after compaction cache should have entries for new SSTables,
         // but since we have kept a reference to the old sstables,
         // if we had 2 keys in cache previously it should become 4
-        assertKeyCacheSize(noEarlyOpen ? 2 : 4, KEYSPACE1, COLUMN_FAMILY1);
+        assertKeyCacheSize(noEarlyOpen ? 2 : 4, KEYSPACE1, cf);
 
         refs.release();
 
         LifecycleTransaction.waitForDeletions();
 
         // after releasing the reference this should drop to 2
-        assertKeyCacheSize(2, KEYSPACE1, COLUMN_FAMILY1);
+        assertKeyCacheSize(2, KEYSPACE1, cf);
 
         // re-read same keys to verify that key cache didn't grow further
         Util.getAll(Util.cmd(cfs, "key1").build());
         Util.getAll(Util.cmd(cfs, "key2").build());
 
-        assertKeyCacheSize(noEarlyOpen ? 4 : 2, KEYSPACE1, COLUMN_FAMILY1);
+        assertKeyCacheSize(noEarlyOpen ? 4 : 2, KEYSPACE1, cf);
     }
 
     private static void readData(String keyspace, String columnFamily, int startRow, int numberOfRows)
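
The key cache tests now run twice, once with DatabaseDescriptor.setColumnIndexCacheSize(0) (shallow index entries) and once with a non-zero size (IndexInfo kept on heap), and RowIndexEntry no longer exposes its column index directly: columnsIndex() becomes columnsIndexCount() plus an IndexInfoRetriever opened against the sstable's index file. A sketch of the new access pattern, assuming a cached key and its live sstable as in the test; process is a hypothetical consumer and exception handling is elided:

    RowIndexEntry entry = CacheService.instance.keyCache.get(key);
    try (RowIndexEntry.IndexInfoRetriever retriever = entry.openWithIndex(sstable.getIndexFile()))
    {
        for (int i = 0; i < entry.columnsIndexCount(); i++)
            process(retriever.columnsIndex(i)); // hypothetical consumer of each IndexInfo
    }
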
diff --git a/test/unit/org/apache/cassandra/db/KeyspaceTest.java b/test/unit/org/apache/cassandra/db/KeyspaceTest.java
index d864fa3..5036749 100644
--- a/test/unit/org/apache/cassandra/db/KeyspaceTest.java
+++ b/test/unit/org/apache/cassandra/db/KeyspaceTest.java
@@ -129,13 +129,14 @@
 
     private static void assertRowsInSlice(ColumnFamilyStore cfs, String key, int sliceStart, int sliceEnd, int limit, boolean reversed, String columnValuePrefix)
     {
-        Clustering startClustering = new Clustering(ByteBufferUtil.bytes(sliceStart));
-        Clustering endClustering = new Clustering(ByteBufferUtil.bytes(sliceEnd));
+        Clustering startClustering = Clustering.make(ByteBufferUtil.bytes(sliceStart));
+        Clustering endClustering = Clustering.make(ByteBufferUtil.bytes(sliceEnd));
         Slices slices = Slices.with(cfs.getComparator(), Slice.make(startClustering, endClustering));
         ClusteringIndexSliceFilter filter = new ClusteringIndexSliceFilter(slices, reversed);
         SinglePartitionReadCommand command = singlePartitionSlice(cfs, key, filter, limit);
 
-        try (ReadOrderGroup orderGroup = command.startOrderGroup(); PartitionIterator iterator = command.executeInternal(orderGroup))
+        try (ReadExecutionController executionController = command.executionController();
+             PartitionIterator iterator = command.executeInternal(executionController))
         {
             try (RowIterator rowIterator = iterator.next())
             {
@@ -208,7 +209,8 @@
             PartitionColumns columns = PartitionColumns.of(cfs.metadata.getColumnDefinition(new ColumnIdentifier("c", false)));
             ClusteringIndexSliceFilter filter = new ClusteringIndexSliceFilter(Slices.ALL, false);
             SinglePartitionReadCommand command = singlePartitionSlice(cfs, "0", filter, null);
-            try (ReadOrderGroup orderGroup = command.startOrderGroup(); PartitionIterator iterator = command.executeInternal(orderGroup))
+            try (ReadExecutionController executionController = command.executionController();
+                 PartitionIterator iterator = command.executeInternal(executionController))
             {
                 try (RowIterator rowIterator = iterator.next())
                 {
@@ -222,7 +224,8 @@
 
     private static void assertRowsInResult(ColumnFamilyStore cfs, SinglePartitionReadCommand command, int ... columnValues)
     {
-        try (ReadOrderGroup orderGroup = command.startOrderGroup(); PartitionIterator iterator = command.executeInternal(orderGroup))
+        try (ReadExecutionController executionController = command.executionController();
+             PartitionIterator iterator = command.executeInternal(executionController))
         {
             if (columnValues.length == 0)
             {
@@ -248,12 +251,12 @@
 
     private static ClusteringIndexSliceFilter slices(ColumnFamilyStore cfs, Integer sliceStart, Integer sliceEnd, boolean reversed)
     {
-        Slice.Bound startBound = sliceStart == null
-                               ? Slice.Bound.BOTTOM
-                               : Slice.Bound.create(ClusteringPrefix.Kind.INCL_START_BOUND, new ByteBuffer[]{ByteBufferUtil.bytes(sliceStart)});
-        Slice.Bound endBound = sliceEnd == null
-                             ? Slice.Bound.TOP
-                             : Slice.Bound.create(ClusteringPrefix.Kind.INCL_END_BOUND, new ByteBuffer[]{ByteBufferUtil.bytes(sliceEnd)});
+        ClusteringBound startBound = sliceStart == null
+                                   ? ClusteringBound.BOTTOM
+                                   : ClusteringBound.create(ClusteringPrefix.Kind.INCL_START_BOUND, new ByteBuffer[]{ByteBufferUtil.bytes(sliceStart)});
+        ClusteringBound endBound = sliceEnd == null
+                                 ? ClusteringBound.TOP
+                                 : ClusteringBound.create(ClusteringPrefix.Kind.INCL_END_BOUND, new ByteBuffer[]{ByteBufferUtil.bytes(sliceEnd)});
         Slices slices = Slices.with(cfs.getComparator(), Slice.make(startBound, endBound));
         return new ClusteringIndexSliceFilter(slices, reversed);
     }
@@ -387,7 +390,7 @@
         // verify that we do indeed have multiple index entries
         SSTableReader sstable = cfs.getLiveSSTables().iterator().next();
         RowIndexEntry indexEntry = sstable.getPosition(Util.dk("0"), SSTableReader.Operator.EQ);
-        assert indexEntry.columnsIndex().size() > 2;
+        assert indexEntry.columnsIndexCount() > 2;
 
         validateSliceLarge(cfs);
     }
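
KeyspaceTest illustrates two renames that recur across this merge: ReadOrderGroup/startOrderGroup() becomes ReadExecutionController/executionController(), and Slice.Bound becomes ClusteringBound (with Clustering instances now built via Clustering.make). A minimal before/after sketch of the read path, assuming a prepared SinglePartitionReadCommand:

    // Old pattern, removed above:
    // try (ReadOrderGroup orderGroup = command.startOrderGroup();
    //      PartitionIterator iterator = command.executeInternal(orderGroup)) { ... }

    // New pattern:
    try (ReadExecutionController controller = command.executionController();
         PartitionIterator iterator = command.executeInternal(controller))
    {
        // consume partitions here
    }
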
diff --git a/test/unit/org/apache/cassandra/db/NativeCellTest.java b/test/unit/org/apache/cassandra/db/NativeCellTest.java
new file mode 100644
index 0000000..69e615b
--- /dev/null
+++ b/test/unit/org/apache/cassandra/db/NativeCellTest.java
@@ -0,0 +1,171 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.db;
+
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Random;
+import java.util.UUID;
+
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.cql3.ColumnIdentifier;
+import org.apache.cassandra.db.marshal.BytesType;
+import org.apache.cassandra.db.marshal.SetType;
+import org.apache.cassandra.db.marshal.UTF8Type;
+import org.apache.cassandra.db.rows.*;
+import org.apache.cassandra.utils.concurrent.OpOrder;
+import org.apache.cassandra.utils.memory.HeapAllocator;
+import org.apache.cassandra.utils.memory.NativeAllocator;
+import org.apache.cassandra.utils.memory.NativePool;
+
+public class NativeCellTest
+{
+
+    private static final Logger logger = LoggerFactory.getLogger(NativeCellTest.class);
+    private static final NativeAllocator nativeAllocator = new NativePool(Integer.MAX_VALUE, Integer.MAX_VALUE, 1f, null).newAllocator();
+    private static final OpOrder.Group group = new OpOrder().start();
+    private static Random rand;
+
+    @BeforeClass
+    public static void setUp()
+    {
+        long seed = System.currentTimeMillis();
+        logger.info("Seed : {}", seed);
+        rand = new Random(seed);
+    }
+
+    @Test
+    public void testCells() throws IOException
+    {
+        for (int run = 0 ; run < 1000 ; run++)
+        {
+            Row.Builder builder = BTreeRow.unsortedBuilder(1);
+            builder.newRow(rndclustering());
+            int count = 1 + rand.nextInt(10);
+            for (int i = 0 ; i < count ; i++)
+                rndcd(builder);
+            test(builder.build());
+        }
+    }
+
+    private static Clustering rndclustering()
+    {
+        int count = 1 + rand.nextInt(100);
+        ByteBuffer[] values = new ByteBuffer[count];
+        int size = rand.nextInt(65535);
+        for (int i = 0 ; i < count ; i++)
+        {
+            int twiceShare = 1 + (2 * size) / (count - i);
+            int nextSize = Math.min(size, rand.nextInt(twiceShare));
+            if (nextSize < 10 && rand.nextBoolean())
+                continue;
+
+            byte[] bytes = new byte[nextSize];
+            rand.nextBytes(bytes);
+            values[i] = ByteBuffer.wrap(bytes);
+            size -= nextSize;
+        }
+        return Clustering.make(values);
+    }
+
+    private static void rndcd(Row.Builder builder)
+    {
+        ColumnDefinition col = rndcol();
+        if (!col.isComplex())
+        {
+            builder.addCell(rndcell(col));
+        }
+        else
+        {
+            int count = 1 + rand.nextInt(100);
+            for (int i = 0 ; i < count ; i++)
+                builder.addCell(rndcell(col));
+        }
+    }
+
+    private static ColumnDefinition rndcol()
+    {
+        UUID uuid = new UUID(rand.nextLong(), rand.nextLong());
+        boolean isComplex = rand.nextBoolean();
+        return new ColumnDefinition("",
+                                    "",
+                                    ColumnIdentifier.getInterned(uuid.toString(), false),
+                                    isComplex ? new SetType<>(BytesType.instance, true) : BytesType.instance,
+                                    -1,
+                                    ColumnDefinition.Kind.REGULAR);
+    }
+
+    private static Cell rndcell(ColumnDefinition col)
+    {
+        long timestamp = rand.nextLong();
+        int ttl = rand.nextInt();
+        int localDeletionTime = rand.nextInt();
+        byte[] value = new byte[rand.nextInt(sanesize(expdecay()))];
+        rand.nextBytes(value);
+        CellPath path = null;
+        if (col.isComplex())
+        {
+            byte[] pathbytes = new byte[rand.nextInt(sanesize(expdecay()))];
+            rand.nextBytes(pathbytes);
+            path = CellPath.create(ByteBuffer.wrap(pathbytes));
+        }
+
+        return new BufferCell(col, timestamp, ttl, localDeletionTime, ByteBuffer.wrap(value), path);
+    }
+
+    private static int expdecay()
+    {
+        return 1 << Integer.numberOfTrailingZeros(Integer.lowestOneBit(rand.nextInt()));
+    }
+
+    private static int sanesize(int randomsize)
+    {
+        return Math.min(Math.max(1, randomsize), 1 << 26);
+    }
+
+    private static void test(Row row)
+    {
+        Row nrow = clone(row, nativeAllocator.rowBuilder(group));
+        Row brow = clone(row, HeapAllocator.instance.cloningBTreeRowBuilder());
+        Assert.assertEquals(row, nrow);
+        Assert.assertEquals(row, brow);
+        Assert.assertEquals(nrow, brow);
+
+        Assert.assertEquals(row.clustering(), nrow.clustering());
+        Assert.assertEquals(row.clustering(), brow.clustering());
+        Assert.assertEquals(nrow.clustering(), brow.clustering());
+
+        ClusteringComparator comparator = new ClusteringComparator(UTF8Type.instance);
+        Assert.assertTrue(comparator.compare(row.clustering(), nrow.clustering()) == 0);
+        Assert.assertTrue(comparator.compare(row.clustering(), brow.clustering()) == 0);
+        Assert.assertTrue(comparator.compare(nrow.clustering(), brow.clustering()) == 0);
+    }
+
+    private static Row clone(Row row, Row.Builder builder)
+    {
+        return Rows.copy(row, builder).build();
+    }
+
+}
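
The core round trip exercised by NativeCellTest is cloning a heap-built row into native memory and into a fresh heap row, then checking that all three compare equal both via equals() and via a ClusteringComparator. A condensed sketch of that round trip, reusing the allocator and op-order group declared in the test:

    Row nativeCopy = Rows.copy(row, nativeAllocator.rowBuilder(group)).build();
    Row heapCopy   = Rows.copy(row, HeapAllocator.instance.cloningBTreeRowBuilder()).build();
    Assert.assertEquals(row, nativeCopy);
    Assert.assertEquals(row, heapCopy);
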
diff --git a/test/unit/org/apache/cassandra/db/RangeTombstoneListTest.java b/test/unit/org/apache/cassandra/db/RangeTombstoneListTest.java
index f40abe9..d3dc835 100644
--- a/test/unit/org/apache/cassandra/db/RangeTombstoneListTest.java
+++ b/test/unit/org/apache/cassandra/db/RangeTombstoneListTest.java
@@ -605,7 +605,7 @@
 
     private static Clustering clustering(int i)
     {
-        return new Clustering(bb(i));
+        return Clustering.make(bb(i));
     }
 
     private static ByteBuffer bb(int i)
@@ -620,12 +620,12 @@
 
     private static RangeTombstone rt(int start, boolean startInclusive, int end, boolean endInclusive, long tstamp)
     {
-        return new RangeTombstone(Slice.make(Slice.Bound.create(cmp, true, startInclusive, start), Slice.Bound.create(cmp, false, endInclusive, end)), new DeletionTime(tstamp, 0));
+        return new RangeTombstone(Slice.make(ClusteringBound.create(cmp, true, startInclusive, start), ClusteringBound.create(cmp, false, endInclusive, end)), new DeletionTime(tstamp, 0));
     }
 
     private static RangeTombstone rt(int start, int end, long tstamp, int delTime)
     {
-        return new RangeTombstone(Slice.make(Slice.Bound.inclusiveStartOf(bb(start)), Slice.Bound.inclusiveEndOf(bb(end))), new DeletionTime(tstamp, delTime));
+        return new RangeTombstone(Slice.make(ClusteringBound.inclusiveStartOf(bb(start)), ClusteringBound.inclusiveEndOf(bb(end))), new DeletionTime(tstamp, delTime));
     }
 
     private static RangeTombstone rtei(int start, int end, long tstamp)
@@ -635,7 +635,7 @@
 
     private static RangeTombstone rtei(int start, int end, long tstamp, int delTime)
     {
-        return new RangeTombstone(Slice.make(Slice.Bound.exclusiveStartOf(bb(start)), Slice.Bound.inclusiveEndOf(bb(end))), new DeletionTime(tstamp, delTime));
+        return new RangeTombstone(Slice.make(ClusteringBound.exclusiveStartOf(bb(start)), ClusteringBound.inclusiveEndOf(bb(end))), new DeletionTime(tstamp, delTime));
     }
 
     private static RangeTombstone rtie(int start, int end, long tstamp)
@@ -645,26 +645,26 @@
 
     private static RangeTombstone rtie(int start, int end, long tstamp, int delTime)
     {
-        return new RangeTombstone(Slice.make(Slice.Bound.inclusiveStartOf(bb(start)), Slice.Bound.exclusiveEndOf(bb(end))), new DeletionTime(tstamp, delTime));
+        return new RangeTombstone(Slice.make(ClusteringBound.inclusiveStartOf(bb(start)), ClusteringBound.exclusiveEndOf(bb(end))), new DeletionTime(tstamp, delTime));
     }
 
     private static RangeTombstone atLeast(int start, long tstamp, int delTime)
     {
-        return new RangeTombstone(Slice.make(Slice.Bound.inclusiveStartOf(bb(start)), Slice.Bound.TOP), new DeletionTime(tstamp, delTime));
+        return new RangeTombstone(Slice.make(ClusteringBound.inclusiveStartOf(bb(start)), ClusteringBound.TOP), new DeletionTime(tstamp, delTime));
     }
 
     private static RangeTombstone atMost(int end, long tstamp, int delTime)
     {
-        return new RangeTombstone(Slice.make(Slice.Bound.BOTTOM, Slice.Bound.inclusiveEndOf(bb(end))), new DeletionTime(tstamp, delTime));
+        return new RangeTombstone(Slice.make(ClusteringBound.BOTTOM, ClusteringBound.inclusiveEndOf(bb(end))), new DeletionTime(tstamp, delTime));
     }
 
     private static RangeTombstone lessThan(int end, long tstamp, int delTime)
     {
-        return new RangeTombstone(Slice.make(Slice.Bound.BOTTOM, Slice.Bound.exclusiveEndOf(bb(end))), new DeletionTime(tstamp, delTime));
+        return new RangeTombstone(Slice.make(ClusteringBound.BOTTOM, ClusteringBound.exclusiveEndOf(bb(end))), new DeletionTime(tstamp, delTime));
     }
 
     private static RangeTombstone greaterThan(int start, long tstamp, int delTime)
     {
-        return new RangeTombstone(Slice.make(Slice.Bound.exclusiveStartOf(bb(start)), Slice.Bound.TOP), new DeletionTime(tstamp, delTime));
+        return new RangeTombstone(Slice.make(ClusteringBound.exclusiveStartOf(bb(start)), ClusteringBound.TOP), new DeletionTime(tstamp, delTime));
     }
 }
diff --git a/test/unit/org/apache/cassandra/db/RangeTombstoneTest.java b/test/unit/org/apache/cassandra/db/RangeTombstoneTest.java
index d0cc890..9120546 100644
--- a/test/unit/org/apache/cassandra/db/RangeTombstoneTest.java
+++ b/test/unit/org/apache/cassandra/db/RangeTombstoneTest.java
@@ -112,17 +112,17 @@
         int nowInSec = FBUtilities.nowInSeconds();
 
         for (int i : live)
-            assertTrue("Row " + i + " should be live", partition.getRow(new Clustering(bb(i))).hasLiveData(nowInSec));
+            assertTrue("Row " + i + " should be live", partition.getRow(Clustering.make(bb(i))).hasLiveData(nowInSec));
         for (int i : dead)
-            assertFalse("Row " + i + " shouldn't be live", partition.getRow(new Clustering(bb(i))).hasLiveData(nowInSec));
+            assertFalse("Row " + i + " shouldn't be live", partition.getRow(Clustering.make(bb(i))).hasLiveData(nowInSec));
 
         // Queries by slices
         partition = Util.getOnlyPartitionUnfiltered(Util.cmd(cfs, key).fromIncl(7).toIncl(30).build());
 
         for (int i : new int[]{ 7, 8, 9, 11, 13, 15, 17, 28, 29, 30 })
-            assertTrue("Row " + i + " should be live", partition.getRow(new Clustering(bb(i))).hasLiveData(nowInSec));
+            assertTrue("Row " + i + " should be live", partition.getRow(Clustering.make(bb(i))).hasLiveData(nowInSec));
         for (int i : new int[]{ 10, 12, 14, 16, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27 })
-            assertFalse("Row " + i + " shouldn't be live", partition.getRow(new Clustering(bb(i))).hasLiveData(nowInSec));
+            assertFalse("Row " + i + " shouldn't be live", partition.getRow(Clustering.make(bb(i))).hasLiveData(nowInSec));
     }
 
     @Test
@@ -207,8 +207,8 @@
         assertEquals(1, rt.size());
 
         Slices.Builder sb = new Slices.Builder(cfs.getComparator());
-        sb.add(Slice.Bound.create(cfs.getComparator(), true, true, 1), Slice.Bound.create(cfs.getComparator(), false, true, 10));
-        sb.add(Slice.Bound.create(cfs.getComparator(), true, true, 16), Slice.Bound.create(cfs.getComparator(), false, true, 20));
+        sb.add(ClusteringBound.create(cfs.getComparator(), true, true, 1), ClusteringBound.create(cfs.getComparator(), false, true, 10));
+        sb.add(ClusteringBound.create(cfs.getComparator(), true, true, 16), ClusteringBound.create(cfs.getComparator(), false, true, 20));
 
         partition = Util.getOnlyPartitionUnfiltered(SinglePartitionReadCommand.create(cfs.metadata, FBUtilities.nowInSeconds(), Util.dk(key), sb.build()));
         rt = rangeTombstones(partition);
@@ -408,22 +408,22 @@
         int nowInSec = FBUtilities.nowInSeconds();
 
         for (int i = 0; i < 5; i++)
-            assertTrue("Row " + i + " should be live", partition.getRow(new Clustering(bb(i))).hasLiveData(nowInSec));
+            assertTrue("Row " + i + " should be live", partition.getRow(Clustering.make(bb(i))).hasLiveData(nowInSec));
         for (int i = 16; i < 20; i++)
-            assertTrue("Row " + i + " should be live", partition.getRow(new Clustering(bb(i))).hasLiveData(nowInSec));
+            assertTrue("Row " + i + " should be live", partition.getRow(Clustering.make(bb(i))).hasLiveData(nowInSec));
         for (int i = 5; i <= 15; i++)
-            assertFalse("Row " + i + " shouldn't be live", partition.getRow(new Clustering(bb(i))).hasLiveData(nowInSec));
+            assertFalse("Row " + i + " shouldn't be live", partition.getRow(Clustering.make(bb(i))).hasLiveData(nowInSec));
 
         // Compact everything and re-test
         CompactionManager.instance.performMaximal(cfs, false);
         partition = Util.getOnlyPartitionUnfiltered(Util.cmd(cfs, key).build());
 
         for (int i = 0; i < 5; i++)
-            assertTrue("Row " + i + " should be live", partition.getRow(new Clustering(bb(i))).hasLiveData(FBUtilities.nowInSeconds()));
+            assertTrue("Row " + i + " should be live", partition.getRow(Clustering.make(bb(i))).hasLiveData(FBUtilities.nowInSeconds()));
         for (int i = 16; i < 20; i++)
-            assertTrue("Row " + i + " should be live", partition.getRow(new Clustering(bb(i))).hasLiveData(FBUtilities.nowInSeconds()));
+            assertTrue("Row " + i + " should be live", partition.getRow(Clustering.make(bb(i))).hasLiveData(FBUtilities.nowInSeconds()));
         for (int i = 5; i <= 15; i++)
-            assertFalse("Row " + i + " shouldn't be live", partition.getRow(new Clustering(bb(i))).hasLiveData(nowInSec));
+            assertFalse("Row " + i + " shouldn't be live", partition.getRow(Clustering.make(bb(i))).hasLiveData(nowInSec));
     }
 
     @Test
diff --git a/test/unit/org/apache/cassandra/db/ReadCommandTest.java b/test/unit/org/apache/cassandra/db/ReadCommandTest.java
new file mode 100644
index 0000000..663080b
--- /dev/null
+++ b/test/unit/org/apache/cassandra/db/ReadCommandTest.java
@@ -0,0 +1,149 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.db;
+
+import java.util.List;
+
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import org.apache.cassandra.SchemaLoader;
+import org.apache.cassandra.Util;
+import org.apache.cassandra.config.CFMetaData;
+import org.apache.cassandra.db.marshal.AsciiType;
+import org.apache.cassandra.db.marshal.BytesType;
+import org.apache.cassandra.db.partitions.FilteredPartition;
+import org.apache.cassandra.exceptions.ConfigurationException;
+import org.apache.cassandra.schema.KeyspaceParams;
+import org.apache.cassandra.utils.ByteBufferUtil;
+
+import static org.junit.Assert.assertEquals;
+
+public class ReadCommandTest
+{
+    private static final String KEYSPACE = "ReadCommandTest";
+    private static final String CF1 = "Standard1";
+    private static final String CF2 = "Standard2";
+
+    @BeforeClass
+    public static void defineSchema() throws ConfigurationException
+    {
+        CFMetaData metadata1 = SchemaLoader.standardCFMD(KEYSPACE, CF1);
+
+        CFMetaData metadata2 = CFMetaData.Builder.create(KEYSPACE, CF2)
+                                                         .addPartitionKey("key", BytesType.instance)
+                                                         .addClusteringColumn("col", AsciiType.instance)
+                                                         .addRegularColumn("a", AsciiType.instance)
+                                                         .addRegularColumn("b", AsciiType.instance).build();
+
+        SchemaLoader.prepareServer();
+        SchemaLoader.createKeyspace(KEYSPACE,
+                                    KeyspaceParams.simple(1),
+                                    metadata1,
+                                    metadata2);
+    }
+
+    @Test
+    public void testPartitionRangeAbort() throws Exception
+    {
+        ColumnFamilyStore cfs = Keyspace.open(KEYSPACE).getColumnFamilyStore(CF1);
+
+        new RowUpdateBuilder(cfs.metadata, 0, ByteBufferUtil.bytes("key1"))
+                .clustering("Column1")
+                .add("val", ByteBufferUtil.bytes("abcd"))
+                .build()
+                .apply();
+
+        cfs.forceBlockingFlush();
+
+        new RowUpdateBuilder(cfs.metadata, 0, ByteBufferUtil.bytes("key2"))
+                .clustering("Column1")
+                .add("val", ByteBufferUtil.bytes("abcd"))
+                .build()
+                .apply();
+
+        ReadCommand readCommand = Util.cmd(cfs).build();
+        assertEquals(2, Util.getAll(readCommand).size());
+
+        readCommand.abort();
+        assertEquals(0, Util.getAll(readCommand).size());
+    }
+
+    @Test
+    public void testSinglePartitionSliceAbort() throws Exception
+    {
+        ColumnFamilyStore cfs = Keyspace.open(KEYSPACE).getColumnFamilyStore(CF2);
+
+        cfs.truncateBlocking();
+
+        new RowUpdateBuilder(cfs.metadata, 0, ByteBufferUtil.bytes("key"))
+                .clustering("cc")
+                .add("a", ByteBufferUtil.bytes("abcd"))
+                .build()
+                .apply();
+
+        cfs.forceBlockingFlush();
+
+        new RowUpdateBuilder(cfs.metadata, 0, ByteBufferUtil.bytes("key"))
+                .clustering("dd")
+                .add("a", ByteBufferUtil.bytes("abcd"))
+                .build()
+                .apply();
+
+        ReadCommand readCommand = Util.cmd(cfs, Util.dk("key")).build();
+
+        List<FilteredPartition> partitions = Util.getAll(readCommand);
+        assertEquals(1, partitions.size());
+        assertEquals(2, partitions.get(0).rowCount());
+
+        readCommand.abort();
+        assertEquals(0, Util.getAll(readCommand).size());
+    }
+
+    @Test
+    public void testSinglePartitionNamesAbort() throws Exception
+    {
+        ColumnFamilyStore cfs = Keyspace.open(KEYSPACE).getColumnFamilyStore(CF2);
+
+        cfs.truncateBlocking();
+
+        new RowUpdateBuilder(cfs.metadata, 0, ByteBufferUtil.bytes("key"))
+                .clustering("cc")
+                .add("a", ByteBufferUtil.bytes("abcd"))
+                .build()
+                .apply();
+
+        cfs.forceBlockingFlush();
+
+        new RowUpdateBuilder(cfs.metadata, 0, ByteBufferUtil.bytes("key"))
+                .clustering("dd")
+                .add("a", ByteBufferUtil.bytes("abcd"))
+                .build()
+                .apply();
+
+        ReadCommand readCommand = Util.cmd(cfs, Util.dk("key")).includeRow("cc").includeRow("dd").build();
+
+        List<FilteredPartition> partitions = Util.getAll(readCommand);
+        assertEquals(1, partitions.size());
+        assertEquals(2, partitions.get(0).rowCount());
+
+        readCommand.abort();
+        assertEquals(0, Util.getAll(readCommand).size());
+    }
+}
diff --git a/test/unit/org/apache/cassandra/db/ReadMessageTest.java b/test/unit/org/apache/cassandra/db/ReadMessageTest.java
index d801b32..4047cc9 100644
--- a/test/unit/org/apache/cassandra/db/ReadMessageTest.java
+++ b/test/unit/org/apache/cassandra/db/ReadMessageTest.java
@@ -43,6 +43,8 @@
 import org.apache.cassandra.io.util.DataOutputBuffer;
 import org.apache.cassandra.net.MessagingService;
 import org.apache.cassandra.schema.KeyspaceParams;
+import org.apache.cassandra.service.CassandraDaemon;
+import org.apache.cassandra.service.StorageService;
 import org.apache.cassandra.utils.ByteBufferUtil;
 
 public class ReadMessageTest
@@ -56,6 +58,10 @@
     @BeforeClass
     public static void defineSchema() throws ConfigurationException
     {
+        CassandraDaemon daemon = new CassandraDaemon();
+        daemon.completeSetup(); // startup must be completed, otherwise a commit log failure would kill the JVM regardless of the failure policy
+        StorageService.instance.registerDaemon(daemon);
+
         CFMetaData cfForReadMetadata = CFMetaData.Builder.create(KEYSPACE1, CF_FOR_READ_TEST)
                                                             .addPartitionKey("key", BytesType.instance)
                                                             .addClusteringColumn("col1", AsciiType.instance)
@@ -195,7 +201,9 @@
 
         Checker checker = new Checker(cfs.metadata.getColumnDefinition(ByteBufferUtil.bytes("commit1")),
                                       cfsnocommit.metadata.getColumnDefinition(ByteBufferUtil.bytes("commit2")));
-        CommitLogTestReplayer.examineCommitLog(checker);
+
+        CommitLogTestReplayer replayer = new CommitLogTestReplayer(checker);
+        replayer.examineCommitLog();
 
         assertTrue(checker.commitLogMessageFound);
         assertFalse(checker.noCommitLogMessageFound);
@@ -219,7 +227,7 @@
         {
             for (PartitionUpdate upd : mutation.getPartitionUpdates())
             {
-                Row r = upd.getRow(new Clustering(ByteBufferUtil.bytes("c")));
+                Row r = upd.getRow(Clustering.make(ByteBufferUtil.bytes("c")));
                 if (r != null)
                 {
                     if (r.getCell(withCommit) != null)
diff --git a/test/unit/org/apache/cassandra/db/RecoveryManagerFlushedTest.java b/test/unit/org/apache/cassandra/db/RecoveryManagerFlushedTest.java
index d06c112..86fa5b4 100644
--- a/test/unit/org/apache/cassandra/db/RecoveryManagerFlushedTest.java
+++ b/test/unit/org/apache/cassandra/db/RecoveryManagerFlushedTest.java
@@ -36,14 +36,16 @@
 import org.apache.cassandra.SchemaLoader;
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.config.ParameterizedClass;
-import org.apache.cassandra.db.commitlog.CommitLog;
 import org.apache.cassandra.db.compaction.CompactionManager;
+import org.apache.cassandra.db.commitlog.CommitLog;
 import org.apache.cassandra.exceptions.ConfigurationException;
 import org.apache.cassandra.io.compress.DeflateCompressor;
 import org.apache.cassandra.io.compress.LZ4Compressor;
 import org.apache.cassandra.io.compress.SnappyCompressor;
 import org.apache.cassandra.schema.KeyspaceParams;
 import org.apache.cassandra.schema.SchemaKeyspace;
+import org.apache.cassandra.security.EncryptionContext;
+import org.apache.cassandra.security.EncryptionContextGenerator;
 import org.apache.cassandra.utils.FBUtilities;
 
 @RunWith(Parameterized.class)
@@ -55,19 +57,21 @@
     private static final String CF_STANDARD1 = "Standard1";
     private static final String CF_STANDARD2 = "Standard2";
 
-    @BeforeClass
-    public static void defineSchema() throws ConfigurationException
-    {
-        SchemaLoader.prepareServer();
-        SchemaLoader.createKeyspace(KEYSPACE1,
-                                    KeyspaceParams.simple(1),
-                                    SchemaLoader.standardCFMD(KEYSPACE1, CF_STANDARD1),
-                                    SchemaLoader.standardCFMD(KEYSPACE1, CF_STANDARD2));
-    }
-
-    public RecoveryManagerFlushedTest(ParameterizedClass commitLogCompression)
+    public RecoveryManagerFlushedTest(ParameterizedClass commitLogCompression, EncryptionContext encryptionContext)
     {
         DatabaseDescriptor.setCommitLogCompression(commitLogCompression);
+        DatabaseDescriptor.setEncryptionContext(encryptionContext);
+    }
+
+    @Parameters()
+    public static Collection<Object[]> generateData()
+    {
+        return Arrays.asList(new Object[][]{
+            {null, EncryptionContextGenerator.createDisabledContext()}, // No compression, no encryption
+            {null, EncryptionContextGenerator.createContext(true)}, // Encryption
+            {new ParameterizedClass(LZ4Compressor.class.getName(), Collections.emptyMap()), EncryptionContextGenerator.createDisabledContext()},
+            {new ParameterizedClass(SnappyCompressor.class.getName(), Collections.emptyMap()), EncryptionContextGenerator.createDisabledContext()},
+            {new ParameterizedClass(DeflateCompressor.class.getName(), Collections.emptyMap()), EncryptionContextGenerator.createDisabledContext()}});
     }
 
     @Before
@@ -76,14 +80,14 @@
         CommitLog.instance.resetUnsafe(true);
     }
 
-    @Parameters()
-    public static Collection<Object[]> generateData()
+    @BeforeClass
+    public static void defineSchema() throws ConfigurationException
     {
-        return Arrays.asList(new Object[][] {
-                { null }, // No compression
-                { new ParameterizedClass(LZ4Compressor.class.getName(), Collections.emptyMap()) },
-                { new ParameterizedClass(SnappyCompressor.class.getName(), Collections.emptyMap()) },
-                { new ParameterizedClass(DeflateCompressor.class.getName(), Collections.emptyMap()) } });
+        SchemaLoader.prepareServer();
+        SchemaLoader.createKeyspace(KEYSPACE1,
+                                    KeyspaceParams.simple(1),
+                                    SchemaLoader.standardCFMD(KEYSPACE1, CF_STANDARD1),
+                                    SchemaLoader.standardCFMD(KEYSPACE1, CF_STANDARD2));
     }
 
     @Test
diff --git a/test/unit/org/apache/cassandra/db/RecoveryManagerMissingHeaderTest.java b/test/unit/org/apache/cassandra/db/RecoveryManagerMissingHeaderTest.java
index 8ac7c5d..a67e9e5 100644
--- a/test/unit/org/apache/cassandra/db/RecoveryManagerMissingHeaderTest.java
+++ b/test/unit/org/apache/cassandra/db/RecoveryManagerMissingHeaderTest.java
@@ -36,14 +36,16 @@
 import org.apache.cassandra.Util;
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.config.ParameterizedClass;
-import org.apache.cassandra.db.commitlog.CommitLog;
 import org.apache.cassandra.db.rows.UnfilteredRowIterator;
+import org.apache.cassandra.db.commitlog.CommitLog;
 import org.apache.cassandra.exceptions.ConfigurationException;
 import org.apache.cassandra.io.compress.DeflateCompressor;
 import org.apache.cassandra.io.compress.LZ4Compressor;
 import org.apache.cassandra.io.compress.SnappyCompressor;
 import org.apache.cassandra.io.util.FileUtils;
 import org.apache.cassandra.schema.KeyspaceParams;
+import org.apache.cassandra.security.EncryptionContext;
+import org.apache.cassandra.security.EncryptionContextGenerator;
 
 @RunWith(Parameterized.class)
 public class RecoveryManagerMissingHeaderTest
@@ -54,9 +56,21 @@
     private static final String KEYSPACE2 = "RecoveryManager3Test2";
     private static final String CF_STANDARD3 = "Standard3";
 
-    public RecoveryManagerMissingHeaderTest(ParameterizedClass commitLogCompression)
+    public RecoveryManagerMissingHeaderTest(ParameterizedClass commitLogCompression, EncryptionContext encryptionContext)
     {
         DatabaseDescriptor.setCommitLogCompression(commitLogCompression);
+        DatabaseDescriptor.setEncryptionContext(encryptionContext);
+    }
+
+    @Parameters()
+    public static Collection<Object[]> generateData()
+    {
+        return Arrays.asList(new Object[][]{
+            {null, EncryptionContextGenerator.createDisabledContext()}, // No compression, no encryption
+            {null, EncryptionContextGenerator.createContext(true)}, // Encryption
+            {new ParameterizedClass(LZ4Compressor.class.getName(), Collections.emptyMap()), EncryptionContextGenerator.createDisabledContext()},
+            {new ParameterizedClass(SnappyCompressor.class.getName(), Collections.emptyMap()), EncryptionContextGenerator.createDisabledContext()},
+            {new ParameterizedClass(DeflateCompressor.class.getName(), Collections.emptyMap()), EncryptionContextGenerator.createDisabledContext()}});
     }
 
     @Before
@@ -65,16 +79,6 @@
         CommitLog.instance.resetUnsafe(true);
     }
 
-    @Parameters()
-    public static Collection<Object[]> generateData()
-    {
-        return Arrays.asList(new Object[][] {
-                { null }, // No compression
-                { new ParameterizedClass(LZ4Compressor.class.getName(), Collections.emptyMap()) },
-                { new ParameterizedClass(SnappyCompressor.class.getName(), Collections.emptyMap()) },
-                { new ParameterizedClass(DeflateCompressor.class.getName(), Collections.emptyMap()) } });
-    }
-
     @BeforeClass
     public static void defineSchema() throws ConfigurationException
     {
diff --git a/test/unit/org/apache/cassandra/db/RecoveryManagerTest.java b/test/unit/org/apache/cassandra/db/RecoveryManagerTest.java
index 397030a..37d719e 100644
--- a/test/unit/org/apache/cassandra/db/RecoveryManagerTest.java
+++ b/test/unit/org/apache/cassandra/db/RecoveryManagerTest.java
@@ -23,7 +23,26 @@
 import java.util.Collection;
 import java.util.Collections;
 import java.util.Date;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Future;
+import java.util.concurrent.Semaphore;
 import java.util.concurrent.TimeUnit;
+import java.util.concurrent.TimeoutException;
+import java.util.concurrent.atomic.AtomicReference;
+
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.cassandra.Util;
+import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.config.ParameterizedClass;
+import org.apache.cassandra.db.rows.*;
+import org.apache.cassandra.db.context.CounterContext;
+import org.apache.cassandra.exceptions.ConfigurationException;
+import org.apache.cassandra.io.compress.DeflateCompressor;
+import org.apache.cassandra.io.compress.LZ4Compressor;
+import org.apache.cassandra.io.compress.SnappyCompressor;
 
 import org.junit.Assert;
 import org.junit.Before;
@@ -33,27 +52,16 @@
 import org.junit.runners.Parameterized;
 import org.junit.runners.Parameterized.Parameters;
 
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
+import static org.junit.Assert.assertEquals;
 
 import org.apache.cassandra.SchemaLoader;
-import org.apache.cassandra.Util;
-import org.apache.cassandra.config.ColumnDefinition;
-import org.apache.cassandra.config.DatabaseDescriptor;
-import org.apache.cassandra.config.ParameterizedClass;
 import org.apache.cassandra.db.commitlog.CommitLog;
 import org.apache.cassandra.db.commitlog.CommitLogArchiver;
-import org.apache.cassandra.db.context.CounterContext;
-import org.apache.cassandra.db.rows.Row;
-import org.apache.cassandra.db.rows.UnfilteredRowIterator;
-import org.apache.cassandra.exceptions.ConfigurationException;
-import org.apache.cassandra.io.compress.DeflateCompressor;
-import org.apache.cassandra.io.compress.LZ4Compressor;
-import org.apache.cassandra.io.compress.SnappyCompressor;
 import org.apache.cassandra.schema.KeyspaceParams;
+import org.apache.cassandra.security.EncryptionContext;
+import org.apache.cassandra.security.EncryptionContextGenerator;
 import org.apache.cassandra.utils.ByteBufferUtil;
-
-import static org.junit.Assert.assertEquals;
+import org.apache.cassandra.db.commitlog.CommitLogReplayer;
 
 @RunWith(Parameterized.class)
 public class RecoveryManagerTest
@@ -67,6 +75,29 @@
     private static final String KEYSPACE2 = "RecoveryManagerTest2";
     private static final String CF_STANDARD3 = "Standard3";
 
+    public RecoveryManagerTest(ParameterizedClass commitLogCompression, EncryptionContext encryptionContext)
+    {
+        DatabaseDescriptor.setCommitLogCompression(commitLogCompression);
+        DatabaseDescriptor.setEncryptionContext(encryptionContext);
+    }
+
+    @Parameters()
+    public static Collection<Object[]> generateData()
+    {
+        return Arrays.asList(new Object[][]{
+            {null, EncryptionContextGenerator.createDisabledContext()}, // No compression, no encryption
+            {null, EncryptionContextGenerator.createContext(true)}, // Encryption
+            {new ParameterizedClass(LZ4Compressor.class.getName(), Collections.emptyMap()), EncryptionContextGenerator.createDisabledContext()},
+            {new ParameterizedClass(SnappyCompressor.class.getName(), Collections.emptyMap()), EncryptionContextGenerator.createDisabledContext()},
+            {new ParameterizedClass(DeflateCompressor.class.getName(), Collections.emptyMap()), EncryptionContextGenerator.createDisabledContext()}});
+    }
+
+    @Before
+    public void setUp() throws IOException
+    {
+        CommitLog.instance.resetUnsafe(true);
+    }
+
     @BeforeClass
     public static void defineSchema() throws ConfigurationException
     {
@@ -83,24 +114,11 @@
     @Before
     public void clearData()
     {
+        // clear data
         Keyspace.open(KEYSPACE1).getColumnFamilyStore(CF_STANDARD1).truncateBlocking();
         Keyspace.open(KEYSPACE1).getColumnFamilyStore(CF_COUNTER1).truncateBlocking();
         Keyspace.open(KEYSPACE2).getColumnFamilyStore(CF_STANDARD3).truncateBlocking();
     }
-    public RecoveryManagerTest(ParameterizedClass commitLogCompression)
-    {
-        DatabaseDescriptor.setCommitLogCompression(commitLogCompression);
-    }
-
-    @Parameters()
-    public static Collection<Object[]> generateData()
-    {
-        return Arrays.asList(new Object[][] {
-                { null }, // No compression
-                { new ParameterizedClass(LZ4Compressor.class.getName(), Collections.emptyMap()) },
-                { new ParameterizedClass(SnappyCompressor.class.getName(), Collections.emptyMap()) },
-                { new ParameterizedClass(DeflateCompressor.class.getName(), Collections.emptyMap()) } });
-    }
 
     @Test
     public void testNothingToRecover() throws IOException
@@ -109,6 +127,79 @@
     }
 
     @Test
+    public void testRecoverBlocksOnBytesOutstanding() throws Exception
+    {
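+        // Shrink the outstanding-bytes limit so the replayer has to wait on each mutation's
+        // (blocking) Future from the MockInitiator below before it can make further progress.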
+        long originalMaxOutstanding = CommitLogReplayer.MAX_OUTSTANDING_REPLAY_BYTES;
+        CommitLogReplayer.MAX_OUTSTANDING_REPLAY_BYTES = 1;
+        CommitLogReplayer.MutationInitiator originalInitiator = CommitLogReplayer.mutationInitiator;
+        MockInitiator mockInitiator = new MockInitiator();
+        CommitLogReplayer.mutationInitiator = mockInitiator;
+        try
+        {
+            CommitLog.instance.resetUnsafe(true);
+            Keyspace keyspace1 = Keyspace.open(KEYSPACE1);
+            Keyspace keyspace2 = Keyspace.open(KEYSPACE2);
+
+            UnfilteredRowIterator upd1 = Util.apply(new RowUpdateBuilder(keyspace1.getColumnFamilyStore(CF_STANDARD1).metadata, 1L, 0, "keymulti")
+                .clustering("col1").add("val", "1")
+                .build());
+
+            UnfilteredRowIterator upd2 = Util.apply(new RowUpdateBuilder(keyspace2.getColumnFamilyStore(CF_STANDARD3).metadata, 1L, 0, "keymulti")
+                                           .clustering("col2").add("val", "1")
+                                           .build());
+
+            keyspace1.getColumnFamilyStore("Standard1").clearUnsafe();
+            keyspace2.getColumnFamilyStore("Standard3").clearUnsafe();
+
+            DecoratedKey dk = Util.dk("keymulti");
+            Assert.assertTrue(Util.getAllUnfiltered(Util.cmd(keyspace1.getColumnFamilyStore(CF_STANDARD1), dk).build()).isEmpty());
+            Assert.assertTrue(Util.getAllUnfiltered(Util.cmd(keyspace2.getColumnFamilyStore(CF_STANDARD3), dk).build()).isEmpty());
+
+            final AtomicReference<Throwable> err = new AtomicReference<Throwable>();
+            Thread t = new Thread() {
+                @Override
+                public void run()
+                {
+                    try
+                    {
+                        CommitLog.instance.resetUnsafe(false); // disassociate segments from live CL
+                    }
+                    catch (Throwable t)
+                    {
+                        err.set(t);
+                    }
+                }
+            };
+            t.start();
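+            // the replay thread should now park in the MockInitiator's blocking Future;
+            // wait for it to signal that it is blocked and confirm it is still alive before releasing it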
+            Assert.assertTrue(mockInitiator.blocked.tryAcquire(1, 20, TimeUnit.SECONDS));
+            Thread.sleep(100);
+            Assert.assertTrue(t.isAlive());
+            mockInitiator.blocker.release(Integer.MAX_VALUE);
+            t.join(20 * 1000);
+
+            if (err.get() != null)
+                throw new RuntimeException(err.get());
+
+            if (t.isAlive())
+            {
+                Throwable toPrint = new Throwable();
+                toPrint.setStackTrace(Thread.getAllStackTraces().get(t));
+                toPrint.printStackTrace(System.out);
+            }
+            Assert.assertFalse(t.isAlive());
+
+            Assert.assertTrue(Util.equal(upd1, Util.getOnlyPartitionUnfiltered(Util.cmd(keyspace1.getColumnFamilyStore(CF_STANDARD1), dk).build()).unfilteredIterator()));
+            Assert.assertTrue(Util.equal(upd2, Util.getOnlyPartitionUnfiltered(Util.cmd(keyspace2.getColumnFamilyStore(CF_STANDARD3), dk).build()).unfilteredIterator()));
+        }
+        finally
+        {
+            CommitLogReplayer.mutationInitiator = originalInitiator;
+            CommitLogReplayer.MAX_OUTSTANDING_REPLAY_BYTES = originalMaxOutstanding;
+        }
+    }
+
+    @Test
     public void testOne() throws IOException
     {
         CommitLog.instance.resetUnsafe(true);
@@ -159,8 +250,8 @@
     @Test
     public void testRecoverPIT() throws Exception
     {
-        ColumnFamilyStore cfs = Keyspace.open(KEYSPACE1).getColumnFamilyStore(CF_STANDARD1);
         CommitLog.instance.resetUnsafe(true);
+        ColumnFamilyStore cfs = Keyspace.open(KEYSPACE1).getColumnFamilyStore(CF_STANDARD1);
         Date date = CommitLogArchiver.format.parse("2112:12:12 12:12:12");
         long timeMS = date.getTime() - 5000;
 
@@ -187,8 +278,8 @@
     @Test
     public void testRecoverPITUnordered() throws Exception
     {
-        ColumnFamilyStore cfs = Keyspace.open(KEYSPACE1).getColumnFamilyStore(CF_STANDARD1);
         CommitLog.instance.resetUnsafe(true);
+        ColumnFamilyStore cfs = Keyspace.open(KEYSPACE1).getColumnFamilyStore(CF_STANDARD1);
         Date date = CommitLogArchiver.format.parse("2112:12:12 12:12:12");
         long timeMS = date.getTime();
 
@@ -218,4 +309,64 @@
 
         assertEquals(2, Util.getAll(Util.cmd(cfs).build()).size());
     }
+
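+    /**
+     * MutationInitiator whose Futures block in get() until {@code blocker} is released and signal via
+     * {@code blocked} once replay has reached them; used by testRecoverBlocksOnBytesOutstanding above.
+     */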
+    private static class MockInitiator extends CommitLogReplayer.MutationInitiator
+    {
+        final Semaphore blocker = new Semaphore(0);
+        final Semaphore blocked = new Semaphore(0);
+
+        @Override
+        protected Future<Integer> initiateMutation(final Mutation mutation,
+                final long segmentId,
+                final int serializedSize,
+                final int entryLocation,
+                final CommitLogReplayer clr)
+        {
+            final Future<Integer> toWrap = super.initiateMutation(mutation,
+                                                                  segmentId,
+                                                                  serializedSize,
+                                                                  entryLocation,
+                                                                  clr);
+            return new Future<Integer>()
+            {
+
+                @Override
+                public boolean cancel(boolean mayInterruptIfRunning)
+                {
+                    throw new UnsupportedOperationException();
+                }
+
+                @Override
+                public boolean isCancelled()
+                {
+                    throw new UnsupportedOperationException();
+                }
+
+                @Override
+                public boolean isDone()
+                {
+                    return blocker.availablePermits() > 0 && toWrap.isDone();
+                }
+
+                @Override
+                public Integer get() throws InterruptedException, ExecutionException
+                {
+                    System.out.println("Got blocker once");
+                    blocked.release();
+                    blocker.acquire();
+                    return toWrap.get();
+                }
+
+                @Override
+                public Integer get(long timeout, TimeUnit unit)
+                        throws InterruptedException, ExecutionException, TimeoutException
+                {
+                    blocked.release();
+                    blocker.tryAcquire(1, timeout, unit);
+                    return toWrap.get(timeout, unit);
+                }
+
+            };
+        }
+    }
 }
diff --git a/test/unit/org/apache/cassandra/db/RecoveryManagerTruncateTest.java b/test/unit/org/apache/cassandra/db/RecoveryManagerTruncateTest.java
index 5a59f1c..738888f 100644
--- a/test/unit/org/apache/cassandra/db/RecoveryManagerTruncateTest.java
+++ b/test/unit/org/apache/cassandra/db/RecoveryManagerTruncateTest.java
@@ -23,13 +23,6 @@
 import java.util.Collection;
 import java.util.Collections;
 
-import org.junit.Before;
-import org.junit.BeforeClass;
-import org.junit.Test;
-import org.junit.runner.RunWith;
-import org.junit.runners.Parameterized;
-import org.junit.runners.Parameterized.Parameters;
-
 import org.apache.cassandra.SchemaLoader;
 import org.apache.cassandra.Util;
 import org.apache.cassandra.config.DatabaseDescriptor;
@@ -40,8 +33,17 @@
 import org.apache.cassandra.io.compress.LZ4Compressor;
 import org.apache.cassandra.io.compress.SnappyCompressor;
 import org.apache.cassandra.schema.KeyspaceParams;
+import org.apache.cassandra.security.EncryptionContext;
+import org.apache.cassandra.security.EncryptionContextGenerator;
 
-import static org.junit.Assert.assertTrue;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.Parameterized;
+import org.junit.runners.Parameterized.Parameters;
+
+import static org.junit.Assert.*;
 
 /**
  * Test for the truncate operation.
@@ -52,9 +54,21 @@
     private static final String KEYSPACE1 = "RecoveryManagerTruncateTest";
     private static final String CF_STANDARD1 = "Standard1";
 
-    public RecoveryManagerTruncateTest(ParameterizedClass commitLogCompression)
+    public RecoveryManagerTruncateTest(ParameterizedClass commitLogCompression, EncryptionContext encryptionContext)
     {
         DatabaseDescriptor.setCommitLogCompression(commitLogCompression);
+        DatabaseDescriptor.setEncryptionContext(encryptionContext);
+    }
+
+    @Parameters()
+    public static Collection<Object[]> generateData()
+    {
+        return Arrays.asList(new Object[][]{
+            {null, EncryptionContextGenerator.createDisabledContext()}, // No compression, no encryption
+            {null, EncryptionContextGenerator.createContext(true)}, // Encryption
+            {new ParameterizedClass(LZ4Compressor.class.getName(), Collections.emptyMap()), EncryptionContextGenerator.createDisabledContext()},
+            {new ParameterizedClass(SnappyCompressor.class.getName(), Collections.emptyMap()), EncryptionContextGenerator.createDisabledContext()},
+            {new ParameterizedClass(DeflateCompressor.class.getName(), Collections.emptyMap()), EncryptionContextGenerator.createDisabledContext()}});
     }
 
     @Before
@@ -63,16 +77,6 @@
         CommitLog.instance.resetUnsafe(true);
     }
 
-    @Parameters()
-    public static Collection<Object[]> generateData()
-    {
-        return Arrays.asList(new Object[][] {
-                { null }, // No compression
-                { new ParameterizedClass(LZ4Compressor.class.getName(), Collections.emptyMap()) },
-                { new ParameterizedClass(SnappyCompressor.class.getName(), Collections.emptyMap()) },
-                { new ParameterizedClass(DeflateCompressor.class.getName(), Collections.emptyMap()) } });
-    }
-
     @BeforeClass
     public static void defineSchema() throws ConfigurationException
     {
diff --git a/test/unit/org/apache/cassandra/db/RepairedDataTombstonesTest.java b/test/unit/org/apache/cassandra/db/RepairedDataTombstonesTest.java
index 3a74029..ad009a4 100644
--- a/test/unit/org/apache/cassandra/db/RepairedDataTombstonesTest.java
+++ b/test/unit/org/apache/cassandra/db/RepairedDataTombstonesTest.java
@@ -177,7 +177,8 @@
         Thread.sleep(1000);
         ReadCommand cmd = Util.cmd(getCurrentColumnFamilyStore()).build();
         int partitionsFound = 0;
-        try (ReadOrderGroup orderGroup = cmd.startOrderGroup(); UnfilteredPartitionIterator iterator = cmd.executeLocally(orderGroup))
+        try (ReadExecutionController executionController = cmd.executionController();
+             UnfilteredPartitionIterator iterator = cmd.executeLocally(executionController))
         {
             while (iterator.hasNext())
             {
@@ -232,7 +233,8 @@
     {
         ReadCommand cmd = Util.cmd(getCurrentColumnFamilyStore()).build();
         int foundRows = 0;
-        try (ReadOrderGroup orderGroup = cmd.startOrderGroup(); UnfilteredPartitionIterator iterator = cmd.executeLocally(orderGroup))
+        try (ReadExecutionController executionController = cmd.executionController();
+             UnfilteredPartitionIterator iterator = cmd.executeLocally(executionController))
         {
             while (iterator.hasNext())
             {
@@ -263,7 +265,8 @@
     {
         ReadCommand cmd = Util.cmd(getCurrentColumnFamilyStore(), Util.dk(ByteBufferUtil.bytes(key))).build();
         int foundRows = 0;
-        try (ReadOrderGroup orderGroup = cmd.startOrderGroup(); UnfilteredPartitionIterator iterator = cmd.executeLocally(orderGroup))
+        try (ReadExecutionController executionController = cmd.executionController();
+             UnfilteredPartitionIterator iterator = cmd.executeLocally(executionController))
         {
             while (iterator.hasNext())
             {
diff --git a/test/unit/org/apache/cassandra/db/RowCacheTest.java b/test/unit/org/apache/cassandra/db/RowCacheTest.java
index 267e5e4..21d7b8f 100644
--- a/test/unit/org/apache/cassandra/db/RowCacheTest.java
+++ b/test/unit/org/apache/cassandra/db/RowCacheTest.java
@@ -480,7 +480,7 @@
         for (int i = offset; i < offset + numberOfRows; i++)
         {
             DecoratedKey key = Util.dk("key" + i);
-            Clustering cl = new Clustering(ByteBufferUtil.bytes("col" + i));
+            Clustering cl = Clustering.make(ByteBufferUtil.bytes("col" + i));
             Util.getAll(Util.cmd(store, key).build());
         }
     }
diff --git a/test/unit/org/apache/cassandra/db/RowIndexEntryTest.java b/test/unit/org/apache/cassandra/db/RowIndexEntryTest.java
index 62c88a0..73f97fa 100644
--- a/test/unit/org/apache/cassandra/db/RowIndexEntryTest.java
+++ b/test/unit/org/apache/cassandra/db/RowIndexEntryTest.java
@@ -20,94 +20,380 @@
 import java.io.File;
 import java.io.IOException;
 import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collection;
 import java.util.Collections;
+import java.util.Iterator;
 import java.util.List;
 
+import com.google.common.primitives.Ints;
+import org.junit.Assert;
+import org.junit.Test;
+
 import org.apache.cassandra.Util;
+import org.apache.cassandra.cache.IMeasurableMemory;
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.cql3.CQLTester;
+import org.apache.cassandra.db.columniterator.AbstractSSTableIterator;
 import org.apache.cassandra.db.marshal.AbstractType;
 import org.apache.cassandra.db.marshal.LongType;
+import org.apache.cassandra.db.partitions.ImmutableBTreePartition;
+import org.apache.cassandra.db.rows.AbstractUnfilteredRowIterator;
+import org.apache.cassandra.db.rows.BTreeRow;
+import org.apache.cassandra.db.rows.BufferCell;
+import org.apache.cassandra.db.rows.ColumnData;
 import org.apache.cassandra.db.rows.EncodingStats;
-import org.apache.cassandra.db.partitions.*;
-import org.apache.cassandra.io.sstable.IndexHelper;
+import org.apache.cassandra.db.rows.RangeTombstoneMarker;
+import org.apache.cassandra.db.rows.Row;
+import org.apache.cassandra.db.rows.Unfiltered;
+import org.apache.cassandra.db.rows.UnfilteredRowIterator;
+import org.apache.cassandra.db.rows.UnfilteredSerializer;
+import org.apache.cassandra.dht.Murmur3Partitioner;
+import org.apache.cassandra.dht.Token;
+import org.apache.cassandra.io.compress.BufferType;
+import org.apache.cassandra.io.sstable.IndexInfo;
+import org.apache.cassandra.io.sstable.format.SSTableFlushObserver;
+import org.apache.cassandra.io.sstable.format.Version;
 import org.apache.cassandra.io.sstable.format.big.BigFormat;
-import org.apache.cassandra.io.util.DataInputBuffer;
-import org.apache.cassandra.io.util.DataOutputBuffer;
-import org.apache.cassandra.io.util.SequentialWriter;
+import org.apache.cassandra.io.util.*;
+import org.apache.cassandra.serializers.LongSerializer;
+import org.apache.cassandra.utils.ByteBufferUtil;
 import org.apache.cassandra.utils.FBUtilities;
-
-import org.junit.Assert;
-import org.junit.Test;
+import org.apache.cassandra.utils.ObjectSizes;
+import org.apache.cassandra.utils.btree.BTree;
 
 import static junit.framework.Assert.assertEquals;
 import static junit.framework.Assert.assertTrue;
 
 public class RowIndexEntryTest extends CQLTester
 {
-    private static final List<AbstractType<?>> clusterTypes = Collections.<AbstractType<?>>singletonList(LongType.instance);
+    private static final List<AbstractType<?>> clusterTypes = Collections.singletonList(LongType.instance);
     private static final ClusteringComparator comp = new ClusteringComparator(clusterTypes);
-    private static ClusteringPrefix cn(long l)
+
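+    // filler written before each generated row so the column index size threshold is crossed
+    // and an IndexInfo entry is produced per row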
+    private static final byte[] dummy_100k = new byte[100000];
+
+    private static Clustering cn(long l)
     {
         return Util.clustering(comp, l);
     }
 
     @Test
-    public void testArtificialIndexOf() throws IOException
+    public void testC11206AgainstPreviousArray() throws Exception
+    {
+        DatabaseDescriptor.setColumnIndexCacheSize(99999);
+        testC11206AgainstPrevious();
+    }
+
+    @Test
+    public void testC11206AgainstPreviousShallow() throws Exception
+    {
+        DatabaseDescriptor.setColumnIndexCacheSize(0);
+        testC11206AgainstPrevious();
+    }
+
+    private static void testC11206AgainstPrevious() throws Exception
+    {
+        // partition without IndexInfo
+        try (DoubleSerializer doubleSerializer = new DoubleSerializer())
+        {
+            doubleSerializer.build(null, partitionKey(42L),
+                                   Collections.singletonList(cn(42)),
+                                   0L);
+            assertEquals(doubleSerializer.rieOldSerialized, doubleSerializer.rieNewSerialized);
+        }
+
+        // partition with multiple IndexInfo
+        try (DoubleSerializer doubleSerializer = new DoubleSerializer())
+        {
+            doubleSerializer.build(null, partitionKey(42L),
+                                   Arrays.asList(cn(42), cn(43), cn(44)),
+                                   0L);
+            assertEquals(doubleSerializer.rieOldSerialized, doubleSerializer.rieNewSerialized);
+        }
+
+        // larger partition with many IndexInfo entries
+        try (DoubleSerializer doubleSerializer = new DoubleSerializer())
+        {
+            doubleSerializer.build(null, partitionKey(42L),
+                                   Arrays.asList(cn(42), cn(43), cn(44), cn(45), cn(46), cn(47), cn(48), cn(49), cn(50), cn(51)),
+                                   0L);
+            assertEquals(doubleSerializer.rieOldSerialized, doubleSerializer.rieNewSerialized);
+        }
+    }
+
+    private static DecoratedKey partitionKey(long l)
+    {
+        ByteBuffer key = LongSerializer.instance.serialize(l);
+        Token token = Murmur3Partitioner.instance.getToken(key);
+        return new BufferDecoratedKey(token, key);
+    }
+
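+    /**
+     * Serializes the same partition index twice, once with the current RowIndexEntry serializer and
+     * once with the pre-CASSANDRA-11206 implementation below, keeping both byte streams for comparison.
+     */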
+    private static class DoubleSerializer implements AutoCloseable
     {
         CFMetaData cfMeta = CFMetaData.compile("CREATE TABLE pipe.dev_null (pk bigint, ck bigint, val text, PRIMARY KEY(pk, ck))", "foo");
+        Version version = BigFormat.latestVersion;
 
         DeletionTime deletionInfo = new DeletionTime(FBUtilities.timestampMicros(), FBUtilities.nowInSeconds());
+        LivenessInfo primaryKeyLivenessInfo = LivenessInfo.EMPTY;
+        Row.Deletion deletion = Row.Deletion.LIVE;
 
         SerializationHeader header = new SerializationHeader(true, cfMeta, cfMeta.partitionColumns(), EncodingStats.NO_STATS);
-        IndexHelper.IndexInfo.Serializer indexSerializer = new IndexHelper.IndexInfo.Serializer(cfMeta, BigFormat.latestVersion, header);
 
-        DataOutputBuffer dob = new DataOutputBuffer();
-        dob.writeUnsignedVInt(0);
-        DeletionTime.serializer.serialize(DeletionTime.LIVE, dob);
-        dob.writeUnsignedVInt(3);
-        int off0 = dob.getLength();
-        indexSerializer.serialize(new IndexHelper.IndexInfo(cn(0L), cn(5L), 0, 0, deletionInfo), dob);
-        int off1 = dob.getLength();
-        indexSerializer.serialize(new IndexHelper.IndexInfo(cn(10L), cn(15L), 0, 0, deletionInfo), dob);
-        int off2 = dob.getLength();
-        indexSerializer.serialize(new IndexHelper.IndexInfo(cn(20L), cn(25L), 0, 0, deletionInfo), dob);
-        dob.writeInt(off0);
-        dob.writeInt(off1);
-        dob.writeInt(off2);
+        // create C-11206 + old serializer instances
+        RowIndexEntry.IndexSerializer rieSerializer = new RowIndexEntry.Serializer(cfMeta, version, header);
+        Pre_C_11206_RowIndexEntry.Serializer oldSerializer = new Pre_C_11206_RowIndexEntry.Serializer(cfMeta, version, header);
 
-        @SuppressWarnings("resource") DataOutputBuffer dobRie = new DataOutputBuffer();
-        dobRie.writeUnsignedVInt(42L);
-        dobRie.writeUnsignedVInt(dob.getLength());
-        dobRie.write(dob.buffer());
+        @SuppressWarnings({ "resource", "IOResourceOpenedButNotSafelyClosed" })
+        final DataOutputBuffer rieOutput = new DataOutputBuffer(1024);
+        @SuppressWarnings({ "resource", "IOResourceOpenedButNotSafelyClosed" })
+        final DataOutputBuffer oldOutput = new DataOutputBuffer(1024);
 
-        ByteBuffer buf = dobRie.buffer();
+        final SequentialWriter dataWriterNew;
+        final SequentialWriter dataWriterOld;
+        final org.apache.cassandra.db.ColumnIndex columnIndex;
 
-        RowIndexEntry<IndexHelper.IndexInfo> rie = new RowIndexEntry.Serializer(cfMeta, BigFormat.latestVersion, header).deserialize(new DataInputBuffer(buf, false));
+        RowIndexEntry rieNew;
+        ByteBuffer rieNewSerialized;
+        Pre_C_11206_RowIndexEntry rieOld;
+        ByteBuffer rieOldSerialized;
 
-        Assert.assertEquals(42L, rie.position);
+        DoubleSerializer() throws IOException
+        {
+            SequentialWriterOption option = SequentialWriterOption.newBuilder().bufferSize(1024).build();
+            File f = File.createTempFile("RowIndexEntryTest-", "db");
+            dataWriterNew = new SequentialWriter(f, option);
+            columnIndex = new org.apache.cassandra.db.ColumnIndex(header, dataWriterNew, version, Collections.emptyList(),
+                                                                  rieSerializer.indexInfoSerializer());
 
-        Assert.assertEquals(0, IndexHelper.indexFor(cn(-1L), rie.columnsIndex(), comp, false, -1));
-        Assert.assertEquals(0, IndexHelper.indexFor(cn(5L), rie.columnsIndex(), comp, false, -1));
-        Assert.assertEquals(1, IndexHelper.indexFor(cn(12L), rie.columnsIndex(), comp, false, -1));
-        Assert.assertEquals(2, IndexHelper.indexFor(cn(17L), rie.columnsIndex(), comp, false, -1));
-        Assert.assertEquals(3, IndexHelper.indexFor(cn(100L), rie.columnsIndex(), comp, false, -1));
-        Assert.assertEquals(3, IndexHelper.indexFor(cn(100L), rie.columnsIndex(), comp, false, 0));
-        Assert.assertEquals(3, IndexHelper.indexFor(cn(100L), rie.columnsIndex(), comp, false, 1));
-        Assert.assertEquals(3, IndexHelper.indexFor(cn(100L), rie.columnsIndex(), comp, false, 2));
-        Assert.assertEquals(3, IndexHelper.indexFor(cn(100L), rie.columnsIndex(), comp, false, 3));
+            f = File.createTempFile("RowIndexEntryTest-", "db");
+            dataWriterOld = new SequentialWriter(f, option);
+        }
 
-        Assert.assertEquals(-1, IndexHelper.indexFor(cn(-1L), rie.columnsIndex(), comp, true, -1));
-        Assert.assertEquals(0, IndexHelper.indexFor(cn(5L), rie.columnsIndex(), comp, true, 3));
-        Assert.assertEquals(0, IndexHelper.indexFor(cn(5L), rie.columnsIndex(), comp, true, 2));
-        Assert.assertEquals(1, IndexHelper.indexFor(cn(17L), rie.columnsIndex(), comp, true, 3));
-        Assert.assertEquals(2, IndexHelper.indexFor(cn(100L), rie.columnsIndex(), comp, true, 3));
-        Assert.assertEquals(2, IndexHelper.indexFor(cn(100L), rie.columnsIndex(), comp, true, 4));
-        Assert.assertEquals(1, IndexHelper.indexFor(cn(12L), rie.columnsIndex(), comp, true, 3));
-        Assert.assertEquals(1, IndexHelper.indexFor(cn(12L), rie.columnsIndex(), comp, true, 2));
-        Assert.assertEquals(1, IndexHelper.indexFor(cn(100L), rie.columnsIndex(), comp, true, 1));
-        Assert.assertEquals(2, IndexHelper.indexFor(cn(100L), rie.columnsIndex(), comp, true, 2));
+        public void close() throws Exception
+        {
+            dataWriterNew.close();
+            dataWriterOld.close();
+        }
+
+        void build(Row staticRow, DecoratedKey partitionKey,
+                   Collection<Clustering> clusterings, long startPosition) throws IOException
+        {
+
+            Iterator<Clustering> clusteringIter = clusterings.iterator();
+            columnIndex.buildRowIndex(makeRowIter(staticRow, partitionKey, clusteringIter, dataWriterNew));
+            rieNew = RowIndexEntry.create(startPosition, 0L,
+                                          deletionInfo, columnIndex.headerLength, columnIndex.columnIndexCount,
+                                          columnIndex.indexInfoSerializedSize(),
+                                          columnIndex.indexSamples(), columnIndex.offsets(),
+                                          rieSerializer.indexInfoSerializer());
+            rieSerializer.serialize(rieNew, rieOutput, columnIndex.buffer());
+            rieNewSerialized = rieOutput.buffer().duplicate();
+
+            Iterator<Clustering> clusteringIter2 = clusterings.iterator();
+            ColumnIndex columnIndex = RowIndexEntryTest.ColumnIndex.writeAndBuildIndex(makeRowIter(staticRow, partitionKey, clusteringIter2, dataWriterOld),
+                                                                                       dataWriterOld, header, Collections.emptySet(), BigFormat.latestVersion);
+            rieOld = Pre_C_11206_RowIndexEntry.create(startPosition, deletionInfo, columnIndex);
+            oldSerializer.serialize(rieOld, oldOutput);
+            rieOldSerialized = oldOutput.buffer().duplicate();
+        }
+
+        private AbstractUnfilteredRowIterator makeRowIter(Row staticRow, DecoratedKey partitionKey,
+                                                          Iterator<Clustering> clusteringIter, SequentialWriter dataWriter)
+        {
+            return new AbstractUnfilteredRowIterator(cfMeta, partitionKey, deletionInfo, cfMeta.partitionColumns(),
+                                                     staticRow, false, new EncodingStats(0, 0, 0))
+            {
+                protected Unfiltered computeNext()
+                {
+                    if (!clusteringIter.hasNext())
+                        return endOfData();
+                    try
+                    {
+                        // write some fake bytes to the data file to force writing the IndexInfo object
+                        dataWriter.write(dummy_100k);
+                    }
+                    catch (IOException e)
+                    {
+                        throw new RuntimeException(e);
+                    }
+                    return buildRow(clusteringIter.next());
+                }
+            };
+        }
+
+        private Unfiltered buildRow(Clustering clustering)
+        {
+            BTree.Builder<ColumnData> builder = BTree.builder(ColumnData.comparator);
+            builder.add(BufferCell.live(cfMeta.partitionColumns().iterator().next(),
+                                        1L,
+                                        ByteBuffer.allocate(0)));
+            return BTreeRow.create(clustering, primaryKeyLivenessInfo, deletion, builder.build());
+        }
+    }
+
+    /**
+     * Pre C-11206 code.
+     */
+    static final class ColumnIndex
+    {
+        final long partitionHeaderLength;
+        final List<IndexInfo> columnsIndex;
+
+        private static final ColumnIndex EMPTY = new ColumnIndex(-1, Collections.emptyList());
+
+        private ColumnIndex(long partitionHeaderLength, List<IndexInfo> columnsIndex)
+        {
+            assert columnsIndex != null;
+
+            this.partitionHeaderLength = partitionHeaderLength;
+            this.columnsIndex = columnsIndex;
+        }
+
+        static ColumnIndex writeAndBuildIndex(UnfilteredRowIterator iterator,
+                                              SequentialWriter output,
+                                              SerializationHeader header,
+                                              Collection<SSTableFlushObserver> observers,
+                                              Version version) throws IOException
+        {
+            assert !iterator.isEmpty() && version.storeRows();
+
+            Builder builder = new Builder(iterator, output, header, observers, version.correspondingMessagingVersion());
+            return builder.build();
+        }
+
+        public static ColumnIndex nothing()
+        {
+            return EMPTY;
+        }
+
+        /**
+         * Helps to create an index for a column family based on the size of its columns,
+         * and writes said columns to disk.
+         */
+        private static class Builder
+        {
+            private final UnfilteredRowIterator iterator;
+            private final SequentialWriter writer;
+            private final SerializationHeader header;
+            private final int version;
+
+            private final List<IndexInfo> columnsIndex = new ArrayList<>();
+            private final long initialPosition;
+            private long headerLength = -1;
+
+            private long startPosition = -1;
+
+            private int written;
+            private long previousRowStart;
+
+            private ClusteringPrefix firstClustering;
+            private ClusteringPrefix lastClustering;
+
+            private DeletionTime openMarker;
+
+            private final Collection<SSTableFlushObserver> observers;
+
+            Builder(UnfilteredRowIterator iterator,
+                           SequentialWriter writer,
+                           SerializationHeader header,
+                           Collection<SSTableFlushObserver> observers,
+                           int version)
+            {
+                this.iterator = iterator;
+                this.writer = writer;
+                this.header = header;
+                this.version = version;
+                this.observers = observers == null ? Collections.emptyList() : observers;
+                this.initialPosition = writer.position();
+            }
+
+            private void writePartitionHeader(UnfilteredRowIterator iterator) throws IOException
+            {
+                ByteBufferUtil.writeWithShortLength(iterator.partitionKey().getKey(), writer);
+                DeletionTime.serializer.serialize(iterator.partitionLevelDeletion(), writer);
+                if (header.hasStatic())
+                    UnfilteredSerializer.serializer.serializeStaticRow(iterator.staticRow(), header, writer, version);
+            }
+
+            public ColumnIndex build() throws IOException
+            {
+                writePartitionHeader(iterator);
+                this.headerLength = writer.position() - initialPosition;
+
+                while (iterator.hasNext())
+                    add(iterator.next());
+
+                return close();
+            }
+
+            private long currentPosition()
+            {
+                return writer.position() - initialPosition;
+            }
+
+            private void addIndexBlock()
+            {
+                IndexInfo cIndexInfo = new IndexInfo(firstClustering,
+                                                     lastClustering,
+                                                     startPosition,
+                                                     currentPosition() - startPosition,
+                                                     openMarker);
+                columnsIndex.add(cIndexInfo);
+                firstClustering = null;
+            }
+
+            private void add(Unfiltered unfiltered) throws IOException
+            {
+                long pos = currentPosition();
+
+                if (firstClustering == null)
+                {
+                    // Beginning of an index block: remember its first clustering and starting position
+                    firstClustering = unfiltered.clustering();
+                    startPosition = pos;
+                }
+
+                UnfilteredSerializer.serializer.serialize(unfiltered, header, writer, pos - previousRowStart, version);
+
+                // notify observers about each new row
+                if (!observers.isEmpty())
+                    observers.forEach((o) -> o.nextUnfilteredCluster(unfiltered));
+
+                lastClustering = unfiltered.clustering();
+                previousRowStart = pos;
+                ++written;
+
+                if (unfiltered.kind() == Unfiltered.Kind.RANGE_TOMBSTONE_MARKER)
+                {
+                    RangeTombstoneMarker marker = (RangeTombstoneMarker)unfiltered;
+                    openMarker = marker.isOpen(false) ? marker.openDeletionTime(false) : null;
+                }
+
+                // once the current block reaches the configured column index size, close it out as an index block.
+                if (currentPosition() - startPosition >= DatabaseDescriptor.getColumnIndexSize())
+                    addIndexBlock();
+            }
+
+            private ColumnIndex close() throws IOException
+            {
+                UnfilteredSerializer.serializer.writeEndOfPartition(writer);
+
+                // It's possible we add no rows, just a top level deletion
+                if (written == 0)
+                    return RowIndexEntryTest.ColumnIndex.EMPTY;
+
+                // the last column may already have fallen on an index boundary; if not, index it explicitly.
+                if (firstClustering != null)
+                    addIndexBlock();
+
+                // we should always have at least one computed index block, but we only write it out if there is more than that.
+                assert !columnsIndex.isEmpty() && headerLength >= 0;
+                return new ColumnIndex(headerLength, columnsIndex);
+            }
+        }
     }
 
     @Test
@@ -116,11 +402,11 @@
         String tableName = createTable("CREATE TABLE %s (a int, b text, c int, PRIMARY KEY(a, b))");
         ColumnFamilyStore cfs = Keyspace.open(KEYSPACE).getColumnFamilyStore(tableName);
 
-        final RowIndexEntry simple = new RowIndexEntry(123);
+        Pre_C_11206_RowIndexEntry simple = new Pre_C_11206_RowIndexEntry(123);
 
         DataOutputBuffer buffer = new DataOutputBuffer();
         SerializationHeader header = new SerializationHeader(true, cfs.metadata, cfs.metadata.partitionColumns(), EncodingStats.NO_STATS);
-        RowIndexEntry.Serializer serializer = new RowIndexEntry.Serializer(cfs.metadata, BigFormat.latestVersion, header);
+        Pre_C_11206_RowIndexEntry.Serializer serializer = new Pre_C_11206_RowIndexEntry.Serializer(cfs.metadata, BigFormat.latestVersion, header);
 
         serializer.serialize(simple, buffer);
 
@@ -128,16 +414,16 @@
 
         // write enough rows to ensure we get a few column index entries
         for (int i = 0; i <= DatabaseDescriptor.getColumnIndexSize() / 4; i++)
-            execute("INSERT INTO %s (a, b, c) VALUES (?, ?, ?)", 0, "" + i, i);
+            execute("INSERT INTO %s (a, b, c) VALUES (?, ?, ?)", 0, String.valueOf(i), i);
 
         ImmutableBTreePartition partition = Util.getOnlyPartitionUnfiltered(Util.cmd(cfs).build());
 
         File tempFile = File.createTempFile("row_index_entry_test", null);
         tempFile.deleteOnExit();
-        SequentialWriter writer = SequentialWriter.open(tempFile);
-        ColumnIndex columnIndex = ColumnIndex.writeAndBuildIndex(partition.unfilteredIterator(), writer, header, BigFormat.latestVersion);
-        RowIndexEntry<IndexHelper.IndexInfo> withIndex = RowIndexEntry.create(0xdeadbeef, DeletionTime.LIVE, columnIndex);
-        IndexHelper.IndexInfo.Serializer indexSerializer = new IndexHelper.IndexInfo.Serializer(cfs.metadata, BigFormat.latestVersion, header);
+        SequentialWriter writer = new SequentialWriter(tempFile);
+        ColumnIndex columnIndex = RowIndexEntryTest.ColumnIndex.writeAndBuildIndex(partition.unfilteredIterator(), writer, header, Collections.emptySet(), BigFormat.latestVersion);
+        Pre_C_11206_RowIndexEntry withIndex = Pre_C_11206_RowIndexEntry.create(0xdeadbeef, DeletionTime.LIVE, columnIndex);
+        IndexInfo.Serializer indexSerializer = cfs.metadata.serializers().indexInfoSerializer(BigFormat.latestVersion, header);
 
         // sanity check
         assertTrue(columnIndex.columnsIndex.size() >= 3);
@@ -174,11 +460,11 @@
 
         bb = buffer.buffer();
         input = new DataInputBuffer(bb, false);
-        RowIndexEntry.Serializer.skip(input, BigFormat.latestVersion);
+        Pre_C_11206_RowIndexEntry.Serializer.skip(input, BigFormat.latestVersion);
         Assert.assertEquals(0, bb.remaining());
     }
 
-    private void serializationCheck(RowIndexEntry<IndexHelper.IndexInfo> withIndex, IndexHelper.IndexInfo.Serializer indexSerializer, ByteBuffer bb, DataInputBuffer input) throws IOException
+    private static void serializationCheck(Pre_C_11206_RowIndexEntry withIndex, IndexInfo.Serializer indexSerializer, ByteBuffer bb, DataInputBuffer input) throws IOException
     {
         Assert.assertEquals(0xdeadbeef, input.readUnsignedVInt());
         Assert.assertEquals(withIndex.promotedSize(indexSerializer), input.readUnsignedVInt());
@@ -193,7 +479,7 @@
         {
             int pos = bb.position();
             offsets[i] = pos - offset;
-            IndexHelper.IndexInfo info = indexSerializer.deserialize(input);
+            IndexInfo info = indexSerializer.deserialize(input);
             int end = bb.position();
 
             Assert.assertEquals(indexSerializer.serializedSize(info), end - pos);
@@ -210,4 +496,361 @@
 
         Assert.assertEquals(0, bb.remaining());
     }
+
+    static class Pre_C_11206_RowIndexEntry implements IMeasurableMemory
+    {
+        private static final long EMPTY_SIZE = ObjectSizes.measure(new Pre_C_11206_RowIndexEntry(0));
+
+        public final long position;
+
+        Pre_C_11206_RowIndexEntry(long position)
+        {
+            this.position = position;
+        }
+
+        protected int promotedSize(IndexInfo.Serializer idxSerializer)
+        {
+            return 0;
+        }
+
+        public static Pre_C_11206_RowIndexEntry create(long position, DeletionTime deletionTime, ColumnIndex index)
+        {
+            assert index != null;
+            assert deletionTime != null;
+
+            // we only consider the columns summary when determining whether to create an IndexedEntry,
+            // since if there are insufficient columns to be worth indexing we're going to seek to
+            // the beginning of the row anyway, so we might as well read the tombstone there as well.
+            if (index.columnsIndex.size() > 1)
+                return new Pre_C_11206_RowIndexEntry.IndexedEntry(position, deletionTime, index.partitionHeaderLength, index.columnsIndex);
+            else
+                return new Pre_C_11206_RowIndexEntry(position);
+        }
+
+        /**
+         * @return true if this index entry contains the row-level tombstone and column summary.  Otherwise,
+         * caller should fetch these from the row header.
+         */
+        public boolean isIndexed()
+        {
+            return !columnsIndex().isEmpty();
+        }
+
+        public DeletionTime deletionTime()
+        {
+            throw new UnsupportedOperationException();
+        }
+
+        /**
+         * The length of the row header (partition key, partition deletion and static row).
+         * This value is only provided for indexed entries and this method will throw
+         * {@code UnsupportedOperationException} if {@code !isIndexed()}.
+         */
+        public long headerLength()
+        {
+            throw new UnsupportedOperationException();
+        }
+
+        public List<IndexInfo> columnsIndex()
+        {
+            return Collections.emptyList();
+        }
+
+        public long unsharedHeapSize()
+        {
+            return EMPTY_SIZE;
+        }
+
+        public static class Serializer
+        {
+            private final IndexInfo.Serializer idxSerializer;
+            private final Version version;
+
+            Serializer(CFMetaData metadata, Version version, SerializationHeader header)
+            {
+                this.idxSerializer = metadata.serializers().indexInfoSerializer(version, header);
+                this.version = version;
+            }
+
+            public void serialize(Pre_C_11206_RowIndexEntry rie, DataOutputPlus out) throws IOException
+            {
+                assert version.storeRows() : "We read old index files but we should never write them";
+
+                out.writeUnsignedVInt(rie.position);
+                out.writeUnsignedVInt(rie.promotedSize(idxSerializer));
+
+                if (rie.isIndexed())
+                {
+                    out.writeUnsignedVInt(rie.headerLength());
+                    DeletionTime.serializer.serialize(rie.deletionTime(), out);
+                    out.writeUnsignedVInt(rie.columnsIndex().size());
+
+                    // Calculate and write the offsets to the IndexInfo objects.
+
+                    int[] offsets = new int[rie.columnsIndex().size()];
+
+                    if (out.hasPosition())
+                    {
+                        // Out is usually a SequentialWriter, so using the file-pointer is fine to generate the offsets.
+                        // A DataOutputBuffer also works.
+                        long start = out.position();
+                        int i = 0;
+                        for (IndexInfo info : rie.columnsIndex())
+                        {
+                            offsets[i] = i == 0 ? 0 : (int)(out.position() - start);
+                            i++;
+                            idxSerializer.serialize(info, out);
+                        }
+                    }
+                    else
+                    {
+                        // Not sure this branch will ever be needed, but if it is called, it has to calculate the
+                        // serialized sizes instead of simply using the file-pointer.
+                        int i = 0;
+                        int offset = 0;
+                        for (IndexInfo info : rie.columnsIndex())
+                        {
+                            offsets[i++] = offset;
+                            idxSerializer.serialize(info, out);
+                            offset += idxSerializer.serializedSize(info);
+                        }
+                    }
+
+                    for (int off : offsets)
+                        out.writeInt(off);
+                }
+            }
+
+            public Pre_C_11206_RowIndexEntry deserialize(DataInputPlus in) throws IOException
+            {
+                if (!version.storeRows())
+                {
+                    long position = in.readLong();
+
+                    int size = in.readInt();
+                    if (size > 0)
+                    {
+                        DeletionTime deletionTime = DeletionTime.serializer.deserialize(in);
+
+                        int entries = in.readInt();
+                        List<IndexInfo> columnsIndex = new ArrayList<>(entries);
+
+                        long headerLength = 0L;
+                        for (int i = 0; i < entries; i++)
+                        {
+                            IndexInfo info = idxSerializer.deserialize(in);
+                            columnsIndex.add(info);
+                            if (i == 0)
+                                headerLength = info.offset;
+                        }
+
+                        return new Pre_C_11206_RowIndexEntry.IndexedEntry(position, deletionTime, headerLength, columnsIndex);
+                    }
+                    else
+                    {
+                        return new Pre_C_11206_RowIndexEntry(position);
+                    }
+                }
+
+                long position = in.readUnsignedVInt();
+
+                int size = (int)in.readUnsignedVInt();
+                if (size > 0)
+                {
+                    long headerLength = in.readUnsignedVInt();
+                    DeletionTime deletionTime = DeletionTime.serializer.deserialize(in);
+                    int entries = (int)in.readUnsignedVInt();
+                    List<IndexInfo> columnsIndex = new ArrayList<>(entries);
+                    for (int i = 0; i < entries; i++)
+                        columnsIndex.add(idxSerializer.deserialize(in));
+
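+                    // The trailing per-entry offsets (one int each) are not used by this legacy representation, so skip them.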
+                    in.skipBytesFully(entries * TypeSizes.sizeof(0));
+
+                    return new Pre_C_11206_RowIndexEntry.IndexedEntry(position, deletionTime, headerLength, columnsIndex);
+                }
+                else
+                {
+                    return new Pre_C_11206_RowIndexEntry(position);
+                }
+            }
+
+            // Reads only the data 'position' of the index entry and returns it. Note that this leaves 'in' in the middle
+            // of reading an entry, so it is only useful if you know what you are doing; in most cases 'deserialize'
+            // should be used instead.
+            static long readPosition(DataInputPlus in, Version version) throws IOException
+            {
+                return version.storeRows() ? in.readUnsignedVInt() : in.readLong();
+            }
+
+            public static void skip(DataInputPlus in, Version version) throws IOException
+            {
+                readPosition(in, version);
+                skipPromotedIndex(in, version);
+            }
+
+            private static void skipPromotedIndex(DataInputPlus in, Version version) throws IOException
+            {
+                int size = version.storeRows() ? (int)in.readUnsignedVInt() : in.readInt();
+                if (size <= 0)
+                    return;
+
+                in.skipBytesFully(size);
+            }
+
+            public int serializedSize(Pre_C_11206_RowIndexEntry rie)
+            {
+                assert version.storeRows() : "We read old index files but we should never write them";
+
+                int indexedSize = 0;
+                if (rie.isIndexed())
+                {
+                    List<IndexInfo> index = rie.columnsIndex();
+
+                    indexedSize += TypeSizes.sizeofUnsignedVInt(rie.headerLength());
+                    indexedSize += DeletionTime.serializer.serializedSize(rie.deletionTime());
+                    indexedSize += TypeSizes.sizeofUnsignedVInt(index.size());
+
+                    for (IndexInfo info : index)
+                        indexedSize += idxSerializer.serializedSize(info);
+
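+                    // Account for the one int offset written per IndexInfo entry after the entries themselves.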
+                    indexedSize += index.size() * TypeSizes.sizeof(0);
+                }
+
+                return TypeSizes.sizeofUnsignedVInt(rie.position) + TypeSizes.sizeofUnsignedVInt(indexedSize) + indexedSize;
+            }
+        }
+
+        /**
+         * An entry in the row index for a row whose columns are indexed.
+         */
+        private static final class IndexedEntry extends Pre_C_11206_RowIndexEntry
+        {
+            private final DeletionTime deletionTime;
+
+            // The length of the partition header in the data file, i.e. the offset of the first index block
+            private final long headerLength;
+            private final List<IndexInfo> columnsIndex;
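+
+            // Shallow on-heap size of an IndexedEntry plus an empty ArrayList; the two null placeholder entries only
+            // satisfy the constructor's assertions, and the per-IndexInfo sizes are added in unsharedHeapSize().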
+            private static final long BASE_SIZE =
+                ObjectSizes.measure(new IndexedEntry(0, DeletionTime.LIVE, 0, Arrays.asList(null, null)))
+                + ObjectSizes.measure(new ArrayList<>(1));
+
+            private IndexedEntry(long position, DeletionTime deletionTime, long headerLength, List<IndexInfo> columnsIndex)
+            {
+                super(position);
+                assert deletionTime != null;
+                assert columnsIndex != null && columnsIndex.size() > 1;
+                this.deletionTime = deletionTime;
+                this.headerLength = headerLength;
+                this.columnsIndex = columnsIndex;
+            }
+
+            @Override
+            public DeletionTime deletionTime()
+            {
+                return deletionTime;
+            }
+
+            @Override
+            public long headerLength()
+            {
+                return headerLength;
+            }
+
+            @Override
+            public List<IndexInfo> columnsIndex()
+            {
+                return columnsIndex;
+            }
+
+            @Override
+            protected int promotedSize(IndexInfo.Serializer idxSerializer)
+            {
+                long size = TypeSizes.sizeofUnsignedVInt(headerLength)
+                            + DeletionTime.serializer.serializedSize(deletionTime)
+                            + TypeSizes.sizeofUnsignedVInt(columnsIndex.size()); // number of entries
+                for (IndexInfo info : columnsIndex)
+                    size += idxSerializer.serializedSize(info);
+
+                size += columnsIndex.size() * TypeSizes.sizeof(0);
+
+                return Ints.checkedCast(size);
+            }
+
+            @Override
+            public long unsharedHeapSize()
+            {
+                long entrySize = 0;
+                for (IndexInfo idx : columnsIndex)
+                    entrySize += idx.unsharedHeapSize();
+
+                return BASE_SIZE
+                       + entrySize
+                       + deletionTime.unsharedHeapSize()
+                       + ObjectSizes.sizeOfReferenceArray(columnsIndex.size());
+            }
+        }
+    }
+
+    @Test
+    public void testIndexFor() throws IOException
+    {
+        DeletionTime deletionInfo = new DeletionTime(FBUtilities.timestampMicros(), FBUtilities.nowInSeconds());
+
+        List<IndexInfo> indexes = new ArrayList<>();
+        indexes.add(new IndexInfo(cn(0L), cn(5L), 0, 0, deletionInfo));
+        indexes.add(new IndexInfo(cn(10L), cn(15L), 0, 0, deletionInfo));
+        indexes.add(new IndexInfo(cn(20L), cn(25L), 0, 0, deletionInfo));
+
+        RowIndexEntry rie = new RowIndexEntry(0L)
+        {
+            public IndexInfoRetriever openWithIndex(SegmentedFile indexFile)
+            {
+                return new IndexInfoRetriever()
+                {
+                    public IndexInfo columnsIndex(int index)
+                    {
+                        return indexes.get(index);
+                    }
+
+                    public void close()
+                    {
+                    }
+                };
+            }
+
+            public int columnsIndexCount()
+            {
+                return indexes.size();
+            }
+        };
+
+        AbstractSSTableIterator.IndexState indexState = new AbstractSSTableIterator.IndexState(null, comp, rie, false, null);
+
+        assertEquals(0, indexState.indexFor(cn(-1L), -1));
+        assertEquals(0, indexState.indexFor(cn(5L), -1));
+        assertEquals(1, indexState.indexFor(cn(12L), -1));
+        assertEquals(2, indexState.indexFor(cn(17L), -1));
+        assertEquals(3, indexState.indexFor(cn(100L), -1));
+        assertEquals(3, indexState.indexFor(cn(100L), 0));
+        assertEquals(3, indexState.indexFor(cn(100L), 1));
+        assertEquals(3, indexState.indexFor(cn(100L), 2));
+        assertEquals(3, indexState.indexFor(cn(100L), 3));
+
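+        // Reversed iteration: lookups start from the previously returned block index and move toward the front of the partition.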
+        indexState = new AbstractSSTableIterator.IndexState(null, comp, rie, true, null);
+
+        assertEquals(-1, indexState.indexFor(cn(-1L), -1));
+        assertEquals(0, indexState.indexFor(cn(5L), 3));
+        assertEquals(0, indexState.indexFor(cn(5L), 2));
+        assertEquals(1, indexState.indexFor(cn(17L), 3));
+        assertEquals(2, indexState.indexFor(cn(100L), 3));
+        assertEquals(2, indexState.indexFor(cn(100L), 4));
+        assertEquals(1, indexState.indexFor(cn(12L), 3));
+        assertEquals(1, indexState.indexFor(cn(12L), 2));
+        assertEquals(1, indexState.indexFor(cn(100L), 1));
+        assertEquals(2, indexState.indexFor(cn(100L), 2));
+    }
 }
diff --git a/test/unit/org/apache/cassandra/db/RowTest.java b/test/unit/org/apache/cassandra/db/RowTest.java
index e3f4884..7294b3a 100644
--- a/test/unit/org/apache/cassandra/db/RowTest.java
+++ b/test/unit/org/apache/cassandra/db/RowTest.java
@@ -108,12 +108,12 @@
             while (merged.hasNext())
             {
                 RangeTombstoneBoundMarker openMarker = (RangeTombstoneBoundMarker)merged.next();
-                Slice.Bound openBound = openMarker.clustering();
+                ClusteringBound openBound = openMarker.clustering();
                 DeletionTime openDeletion = new DeletionTime(openMarker.deletionTime().markedForDeleteAt(),
                                                                    openMarker.deletionTime().localDeletionTime());
 
                 RangeTombstoneBoundMarker closeMarker = (RangeTombstoneBoundMarker)merged.next();
-                Slice.Bound closeBound = closeMarker.clustering();
+                ClusteringBound closeBound = closeMarker.clustering();
                 DeletionTime closeDeletion = new DeletionTime(closeMarker.deletionTime().markedForDeleteAt(),
                                                                     closeMarker.deletionTime().localDeletionTime());
 
@@ -185,16 +185,16 @@
         assertEquals(Integer.valueOf(1), map.get(row));
     }
 
-    private void assertRangeTombstoneMarkers(Slice.Bound start, Slice.Bound end, DeletionTime deletionTime, Object[] expected)
+    private void assertRangeTombstoneMarkers(ClusteringBound start, ClusteringBound end, DeletionTime deletionTime, Object[] expected)
     {
         AbstractType clusteringType = (AbstractType)cfm.comparator.subtype(0);
 
         assertEquals(1, start.size());
-        assertEquals(start.kind(), Slice.Bound.Kind.INCL_START_BOUND);
+        assertEquals(start.kind(), ClusteringPrefix.Kind.INCL_START_BOUND);
         assertEquals(expected[0], clusteringType.getString(start.get(0)));
 
         assertEquals(1, end.size());
-        assertEquals(end.kind(), Slice.Bound.Kind.INCL_END_BOUND);
+        assertEquals(end.kind(), ClusteringPrefix.Kind.INCL_END_BOUND);
         assertEquals(expected[1], clusteringType.getString(end.get(0)));
 
         assertEquals(expected[2], deletionTime.markedForDeleteAt());
@@ -213,6 +213,6 @@
                                       String value,
                                       long timestamp)
     {
-       builder.addCell(BufferCell.live(cfm, columnDefinition, timestamp, ((AbstractType) columnDefinition.cellValueType()).decompose(value)));
+       builder.addCell(BufferCell.live(columnDefinition, timestamp, ((AbstractType) columnDefinition.cellValueType()).decompose(value)));
     }
 }
diff --git a/test/unit/org/apache/cassandra/db/ScrubTest.java b/test/unit/org/apache/cassandra/db/ScrubTest.java
index f97d9a9..7e7e145 100644
--- a/test/unit/org/apache/cassandra/db/ScrubTest.java
+++ b/test/unit/org/apache/cassandra/db/ScrubTest.java
@@ -32,6 +32,7 @@
 import org.junit.runner.RunWith;
 
 import org.apache.cassandra.*;
+import org.apache.cassandra.cache.ChunkCache;
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.cql3.Operator;
@@ -420,6 +421,8 @@
         file.seek(startPosition);
         file.writeBytes(StringUtils.repeat('z', (int) (endPosition - startPosition)));
         file.close();
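+        // The corruption is written directly to the file, so drop any cached chunks that could otherwise mask it.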
+        if (ChunkCache.instance != null)
+            ChunkCache.instance.invalidateFile(sstable.getFilename());
     }
 
     private static void assertOrderedAll(ColumnFamilyStore cfs, int expectedSize)
@@ -636,14 +639,14 @@
     {
         SerializationHeader header = new SerializationHeader(true, metadata, metadata.partitionColumns(), EncodingStats.NO_STATS);
         MetadataCollector collector = new MetadataCollector(metadata.comparator).sstableLevel(0);
-        return new TestMultiWriter(new TestWriter(descriptor, keyCount, 0, metadata, collector, header, txn));
+        return new TestMultiWriter(new TestWriter(descriptor, keyCount, 0, metadata, collector, header, txn), txn);
     }
 
     private static class TestMultiWriter extends SimpleSSTableMultiWriter
     {
-        TestMultiWriter(SSTableWriter writer)
+        TestMultiWriter(SSTableWriter writer, LifecycleTransaction txn)
         {
-            super(writer);
+            super(writer, txn);
         }
     }
 
@@ -655,7 +658,7 @@
         TestWriter(Descriptor descriptor, long keyCount, long repairedAt, CFMetaData metadata,
                    MetadataCollector collector, SerializationHeader header, LifecycleTransaction txn)
         {
-            super(descriptor, keyCount, repairedAt, metadata, collector, header, txn);
+            super(descriptor, keyCount, repairedAt, metadata, collector, header, Collections.emptySet(), txn);
         }
 
         @Override
diff --git a/test/unit/org/apache/cassandra/db/SecondaryIndexTest.java b/test/unit/org/apache/cassandra/db/SecondaryIndexTest.java
index bbccc48..a037d90 100644
--- a/test/unit/org/apache/cassandra/db/SecondaryIndexTest.java
+++ b/test/unit/org/apache/cassandra/db/SecondaryIndexTest.java
@@ -28,7 +28,6 @@
 import org.junit.Before;
 import org.junit.BeforeClass;
 import org.junit.Test;
-
 import org.apache.cassandra.SchemaLoader;
 import org.apache.cassandra.Util;
 import org.apache.cassandra.config.ColumnDefinition;
@@ -37,10 +36,8 @@
 import org.apache.cassandra.db.marshal.AbstractType;
 import org.apache.cassandra.db.partitions.*;
 import org.apache.cassandra.db.rows.Row;
-import org.apache.cassandra.db.rows.RowIterator;
 import org.apache.cassandra.exceptions.ConfigurationException;
 import org.apache.cassandra.index.Index;
-import org.apache.cassandra.index.internal.CassandraIndex;
 import org.apache.cassandra.schema.IndexMetadata;
 import org.apache.cassandra.schema.KeyspaceParams;
 import org.apache.cassandra.utils.ByteBufferUtil;
@@ -66,7 +63,7 @@
         SchemaLoader.prepareServer();
         SchemaLoader.createKeyspace(KEYSPACE1,
                                     KeyspaceParams.simple(1),
-                                    SchemaLoader.compositeIndexCFMD(KEYSPACE1, WITH_COMPOSITE_INDEX, true).gcGraceSeconds(0),
+                                    SchemaLoader.compositeIndexCFMD(KEYSPACE1, WITH_COMPOSITE_INDEX, true, true).gcGraceSeconds(0),
                                     SchemaLoader.compositeIndexCFMD(KEYSPACE1, COMPOSITE_INDEX_TO_BE_ADDED, false).gcGraceSeconds(0),
                                     SchemaLoader.keysIndexCFMD(KEYSPACE1, WITH_KEYS_INDEX, true).gcGraceSeconds(0));
     }
@@ -119,7 +116,8 @@
                                       .build();
 
         Index.Searcher searcher = cfs.indexManager.getBestIndexFor(rc).searcherFor(rc);
-        try (ReadOrderGroup orderGroup = rc.startOrderGroup(); UnfilteredPartitionIterator pi = searcher.search(orderGroup))
+        try (ReadExecutionController executionController = rc.executionController();
+             UnfilteredPartitionIterator pi = searcher.search(executionController))
         {
             assertTrue(pi.hasNext());
             pi.next().close();
@@ -324,15 +322,30 @@
     @Test
     public void testDeleteOfInconsistentValuesFromCompositeIndex() throws Exception
     {
+        runDeleteOfInconsistentValuesFromCompositeIndexTest(false);
+    }
+
+    @Test
+    public void testDeleteOfInconsistentValuesFromCompositeIndexOnStaticColumn() throws Exception
+    {
+        runDeleteOfInconsistentValuesFromCompositeIndexTest(true);
+    }
+
+    private void runDeleteOfInconsistentValuesFromCompositeIndexTest(boolean isStatic) throws Exception
+    {
         Keyspace keyspace = Keyspace.open(KEYSPACE1);
         String cfName = WITH_COMPOSITE_INDEX;
 
         ColumnFamilyStore cfs = keyspace.getColumnFamilyStore(cfName);
 
-        ByteBuffer col = ByteBufferUtil.bytes("birthdate");
+        String colName = isStatic ? "static" : "birthdate";
+        ByteBuffer col = ByteBufferUtil.bytes(colName);
 
         // create a row and update the author value
-        new RowUpdateBuilder(cfs.metadata, 0, "k1").clustering("c").add("birthdate", 10l).build().applyUnsafe();
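+        // A static column has no clustering, so only add the clustering component in the regular-column case.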
+        RowUpdateBuilder builder = new RowUpdateBuilder(cfs.metadata, 0, "k1");
+        if (!isStatic)
+            builder = builder.clustering("c");
+        builder.add(colName, 10l).build().applyUnsafe();
 
         // test that the index query fetches this version
         assertIndexedOne(cfs, col, 10l);
@@ -342,9 +355,11 @@
         assertIndexedOne(cfs, col, 10l);
 
         // now apply another update, but force the index update to be skipped
-        keyspace.apply(new RowUpdateBuilder(cfs.metadata, 1, "k1").clustering("c").add("birthdate", 20l).build(),
-                       true,
-                       false);
+        builder = new RowUpdateBuilder(cfs.metadata, 0, "k1");
+        if (!isStatic)
+            builder = builder.clustering("c");
+        builder.add(colName, 20l);
+        keyspace.apply(builder.build(), true, false);
 
         // Now searching the index for either the old or new value should return 0 rows
         // because the new value was not indexed and the old value should be ignored
@@ -356,7 +371,11 @@
         // now, reset back to the original value, still skipping the index update, to
         // make sure the value was expunged from the index when it was discovered to be inconsistent
         // TODO: Figure out why this is re-inserting
-        keyspace.apply(new RowUpdateBuilder(cfs.metadata, 2, "k1").clustering("c1").add("birthdate", 10l).build(), true, false);
+        builder = new RowUpdateBuilder(cfs.metadata, 2, "k1");
+        if (!isStatic)
+            builder = builder.clustering("c");
+        builder.add(colName, 10L);
+        keyspace.apply(builder.build(), true, false);
         assertIndexedNone(cfs, col, 20l);
 
         ColumnFamilyStore indexCfs = cfs.indexManager.getAllIndexColumnFamilyStores().iterator().next();
@@ -508,8 +527,8 @@
         if (count != 0)
             assertNotNull(searcher);
 
-        try (ReadOrderGroup orderGroup = rc.startOrderGroup();
-             PartitionIterator iter = UnfilteredPartitionIterators.filter(searcher.search(orderGroup),
+        try (ReadExecutionController executionController = rc.executionController();
+             PartitionIterator iter = UnfilteredPartitionIterators.filter(searcher.search(executionController),
                                                                           FBUtilities.nowInSeconds()))
         {
             assertEquals(count, Util.size(iter));
@@ -519,8 +538,8 @@
     private void assertIndexCfsIsEmpty(ColumnFamilyStore indexCfs)
     {
         PartitionRangeReadCommand command = (PartitionRangeReadCommand)Util.cmd(indexCfs).build();
-        try (ReadOrderGroup orderGroup = command.startOrderGroup();
-             PartitionIterator iter = UnfilteredPartitionIterators.filter(Util.executeLocally(command, indexCfs, orderGroup),
+        try (ReadExecutionController controller = command.executionController();
+             PartitionIterator iter = UnfilteredPartitionIterators.filter(Util.executeLocally(command, indexCfs, controller),
                                                                           FBUtilities.nowInSeconds()))
         {
             assertFalse(iter.hasNext());
diff --git a/test/unit/org/apache/cassandra/db/SinglePartitionSliceCommandTest.java b/test/unit/org/apache/cassandra/db/SinglePartitionSliceCommandTest.java
index b5d8159..45a2c1e 100644
--- a/test/unit/org/apache/cassandra/db/SinglePartitionSliceCommandTest.java
+++ b/test/unit/org/apache/cassandra/db/SinglePartitionSliceCommandTest.java
@@ -114,7 +114,7 @@
 
         ColumnFilter columnFilter = ColumnFilter.selection(PartitionColumns.of(v));
         ByteBuffer zero = ByteBufferUtil.bytes(0);
-        Slices slices = Slices.with(cfm.comparator, Slice.make(Slice.Bound.inclusiveStartOf(zero), Slice.Bound.inclusiveEndOf(zero)));
+        Slices slices = Slices.with(cfm.comparator, Slice.make(ClusteringBound.inclusiveStartOf(zero), ClusteringBound.inclusiveEndOf(zero)));
         ClusteringIndexSliceFilter sliceFilter = new ClusteringIndexSliceFilter(slices, false);
         ReadCommand cmd = new SinglePartitionReadCommand(false, MessagingService.VERSION_30, true, cfm,
                                                           FBUtilities.nowInSeconds(),
@@ -130,16 +130,22 @@
         cmd = ReadCommand.legacyReadCommandSerializer.deserialize(in, MessagingService.VERSION_21);
 
         logger.debug("ReadCommand: {}", cmd);
-        UnfilteredPartitionIterator partitionIterator = cmd.executeLocally(ReadOrderGroup.emptyGroup());
-        ReadResponse response = ReadResponse.createDataResponse(partitionIterator, cmd);
+        try (ReadExecutionController controller = cmd.executionController();
+             UnfilteredPartitionIterator partitionIterator = cmd.executeLocally(controller))
+        {
+            ReadResponse response = ReadResponse.createDataResponse(partitionIterator, cmd);
 
-        logger.debug("creating response: {}", response);
-        partitionIterator = response.makeIterator(cmd);
-        assert partitionIterator.hasNext();
-        UnfilteredRowIterator partition = partitionIterator.next();
-
-        LegacyLayout.LegacyUnfilteredPartition rowIter = LegacyLayout.fromUnfilteredRowIterator(cmd, partition);
-        Assert.assertEquals(Collections.emptyList(), rowIter.cells);
+            logger.debug("creating response: {}", response);
+            try (UnfilteredPartitionIterator pIter = response.makeIterator(cmd))
+            {
+                assert pIter.hasNext();
+                try (UnfilteredRowIterator partition = pIter.next())
+                {
+                    LegacyLayout.LegacyUnfilteredPartition rowIter = LegacyLayout.fromUnfilteredRowIterator(cmd, partition);
+                    Assert.assertEquals(Collections.emptyList(), rowIter.cells);
+                }
+            }
+        }
     }
 
     private void checkForS(UnfilteredPartitionIterator pi)
@@ -175,7 +181,7 @@
                                                          sliceFilter);
 
         // check raw iterator for static cell
-        try (ReadOrderGroup orderGroup = cmd.startOrderGroup(); UnfilteredPartitionIterator pi = cmd.executeLocally(orderGroup))
+        try (ReadExecutionController executionController = cmd.executionController(); UnfilteredPartitionIterator pi = cmd.executeLocally(executionController))
         {
             checkForS(pi);
         }
@@ -186,7 +192,7 @@
         ReadResponse dst;
 
         // check (de)serialized iterator for memtable static cell
-        try (ReadOrderGroup orderGroup = cmd.startOrderGroup(); UnfilteredPartitionIterator pi = cmd.executeLocally(orderGroup))
+        try (ReadExecutionController executionController = cmd.executionController(); UnfilteredPartitionIterator pi = cmd.executeLocally(executionController))
         {
             response = ReadResponse.createDataResponse(pi, cmd);
         }
@@ -202,7 +208,7 @@
 
         // check (de)serialized iterator for sstable static cell
         Schema.instance.getColumnFamilyStoreInstance(cfm.cfId).forceBlockingFlush();
-        try (ReadOrderGroup orderGroup = cmd.startOrderGroup(); UnfilteredPartitionIterator pi = cmd.executeLocally(orderGroup))
+        try (ReadExecutionController executionController = cmd.executionController(); UnfilteredPartitionIterator pi = cmd.executeLocally(executionController))
         {
             response = ReadResponse.createDataResponse(pi, cmd);
         }
diff --git a/test/unit/org/apache/cassandra/db/VerifyTest.java b/test/unit/org/apache/cassandra/db/VerifyTest.java
index 9de01c1..d216860 100644
--- a/test/unit/org/apache/cassandra/db/VerifyTest.java
+++ b/test/unit/org/apache/cassandra/db/VerifyTest.java
@@ -23,6 +23,7 @@
 import org.apache.cassandra.OrderedJUnit4ClassRunner;
 import org.apache.cassandra.SchemaLoader;
 import org.apache.cassandra.Util;
+import org.apache.cassandra.cache.ChunkCache;
 import org.apache.cassandra.UpdateBuilder;
 import org.apache.cassandra.db.compaction.CompactionManager;
 import org.apache.cassandra.db.compaction.Verifier;
@@ -313,6 +314,8 @@
         file.seek(startPosition);
         file.writeBytes(StringUtils.repeat('z', (int) 2));
         file.close();
+        if (ChunkCache.instance != null)
+            ChunkCache.instance.invalidateFile(sstable.getFilename());
 
         // Update the Digest to have the right Checksum
         writeChecksum(simpleFullChecksum(sstable.getFilename()), sstable.descriptor.filenameFor(sstable.descriptor.digestComponent));
diff --git a/test/unit/org/apache/cassandra/db/commitlog/CommitLogDescriptorTest.java b/test/unit/org/apache/cassandra/db/commitlog/CommitLogDescriptorTest.java
index 898c19f..fdedafd 100644
--- a/test/unit/org/apache/cassandra/db/commitlog/CommitLogDescriptorTest.java
+++ b/test/unit/org/apache/cassandra/db/commitlog/CommitLogDescriptorTest.java
@@ -15,88 +15,298 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
+
 package org.apache.cassandra.db.commitlog;
 
 import java.io.IOException;
 import java.nio.ByteBuffer;
+import java.util.Collections;
 import java.util.HashMap;
 import java.util.Map;
 
 import com.google.common.collect.ImmutableMap;
-
+import org.junit.Assert;
+import org.junit.Before;
 import org.junit.Test;
 
 import org.apache.cassandra.config.ParameterizedClass;
+import org.apache.cassandra.config.TransparentDataEncryptionOptions;
 import org.apache.cassandra.exceptions.ConfigurationException;
-import org.apache.cassandra.io.util.DataInputBuffer;
+import org.apache.cassandra.io.compress.LZ4Compressor;
+import org.apache.cassandra.io.util.FileDataInput;
+import org.apache.cassandra.io.util.FileSegmentInputStream;
 import org.apache.cassandra.net.MessagingService;
-
-import static org.junit.Assert.assertEquals;
-import static org.junit.Assert.assertFalse;
-import static org.junit.Assert.assertTrue;
-import static org.junit.Assert.fail;
+import org.apache.cassandra.security.EncryptionContext;
+import org.apache.cassandra.security.EncryptionContextGenerator;
 
 public class CommitLogDescriptorTest
 {
+    private static final byte[] iv = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15};
+
+    ParameterizedClass compression;
+    TransparentDataEncryptionOptions enabledTdeOptions;
+
+    // Context with encryption enabled
+    EncryptionContext enabledEncryption;
+
+    // Context with encryption disabled, assuming it was never previously enabled
+    EncryptionContext neverEnabledEncryption;
+
+    // Context with encryption disabled, assuming it was previously enabled but has since been turned off
+    // by the operator changing the yaml.
+    EncryptionContext previouslyEnabledEncryption;
+
+    @Before
+    public void setup()
+    {
+        Map<String,String> params = new HashMap<>();
+        compression = new ParameterizedClass(LZ4Compressor.class.getName(), params);
+
+        enabledTdeOptions = EncryptionContextGenerator.createEncryptionOptions();
+        enabledEncryption = new EncryptionContext(enabledTdeOptions, iv, false);
+
+        neverEnabledEncryption = EncryptionContextGenerator.createDisabledContext();
+        TransparentDataEncryptionOptions disabledTdeOptions = new TransparentDataEncryptionOptions(false, enabledTdeOptions.cipher, enabledTdeOptions.key_alias, enabledTdeOptions.key_provider);
+        previouslyEnabledEncryption = new EncryptionContext(disabledTdeOptions);
+    }
+
     @Test
     public void testVersions()
     {
-        assertTrue(CommitLogDescriptor.isValid("CommitLog-1340512736956320000.log"));
-        assertTrue(CommitLogDescriptor.isValid("CommitLog-2-1340512736956320000.log"));
-        assertFalse(CommitLogDescriptor.isValid("CommitLog--1340512736956320000.log"));
-        assertFalse(CommitLogDescriptor.isValid("CommitLog--2-1340512736956320000.log"));
-        assertFalse(CommitLogDescriptor.isValid("CommitLog-2-1340512736956320000-123.log"));
+        Assert.assertTrue(CommitLogDescriptor.isValid("CommitLog-1340512736956320000.log"));
+        Assert.assertTrue(CommitLogDescriptor.isValid("CommitLog-2-1340512736956320000.log"));
+        Assert.assertFalse(CommitLogDescriptor.isValid("CommitLog--1340512736956320000.log"));
+        Assert.assertFalse(CommitLogDescriptor.isValid("CommitLog--2-1340512736956320000.log"));
+        Assert.assertFalse(CommitLogDescriptor.isValid("CommitLog-2-1340512736956320000-123.log"));
 
-        assertEquals(1340512736956320000L, CommitLogDescriptor.fromFileName("CommitLog-2-1340512736956320000.log").id);
+        Assert.assertEquals(1340512736956320000L, CommitLogDescriptor.fromFileName("CommitLog-2-1340512736956320000.log").id);
 
-        assertEquals(MessagingService.current_version, new CommitLogDescriptor(1340512736956320000L, null).getMessagingVersion());
+        Assert.assertEquals(MessagingService.current_version, new CommitLogDescriptor(1340512736956320000L, null, neverEnabledEncryption).getMessagingVersion());
         String newCLName = "CommitLog-" + CommitLogDescriptor.current_version + "-1340512736956320000.log";
-        assertEquals(MessagingService.current_version, CommitLogDescriptor.fromFileName(newCLName).getMessagingVersion());
+        Assert.assertEquals(MessagingService.current_version, CommitLogDescriptor.fromFileName(newCLName).getMessagingVersion());
     }
 
+    // migrated from CommitLogTest
     private void testDescriptorPersistence(CommitLogDescriptor desc) throws IOException
     {
         ByteBuffer buf = ByteBuffer.allocate(1024);
         CommitLogDescriptor.writeHeader(buf, desc);
+        long length = buf.position();
         // Put some extra data in the stream.
         buf.putDouble(0.1);
         buf.flip();
-
-        try (DataInputBuffer input = new DataInputBuffer(buf, false))
-        {
-            CommitLogDescriptor read = CommitLogDescriptor.readHeader(input);
-            assertEquals("Descriptors", desc, read);
-        }
+        FileDataInput input = new FileSegmentInputStream(buf, "input", 0);
+        CommitLogDescriptor read = CommitLogDescriptor.readHeader(input, neverEnabledEncryption);
+        Assert.assertEquals("Descriptor length", length, input.getFilePointer());
+        Assert.assertEquals("Descriptors", desc, read);
     }
 
+    // migrated from CommitLogTest
     @Test
     public void testDescriptorPersistence() throws IOException
     {
-        testDescriptorPersistence(new CommitLogDescriptor(11, null));
-        testDescriptorPersistence(new CommitLogDescriptor(CommitLogDescriptor.VERSION_21, 13, null));
-        testDescriptorPersistence(new CommitLogDescriptor(CommitLogDescriptor.VERSION_30, 15, null));
-        testDescriptorPersistence(new CommitLogDescriptor(CommitLogDescriptor.VERSION_30, 17, new ParameterizedClass("LZ4Compressor", null)));
-        testDescriptorPersistence(new CommitLogDescriptor(CommitLogDescriptor.VERSION_30, 19,
-                new ParameterizedClass("StubbyCompressor", ImmutableMap.of("parameter1", "value1", "flag2", "55", "argument3", "null"))));
+        testDescriptorPersistence(new CommitLogDescriptor(11, null, neverEnabledEncryption));
+        testDescriptorPersistence(new CommitLogDescriptor(CommitLogDescriptor.VERSION_21, 13, null, neverEnabledEncryption));
+        testDescriptorPersistence(new CommitLogDescriptor(CommitLogDescriptor.VERSION_22, 15, null, neverEnabledEncryption));
+        testDescriptorPersistence(new CommitLogDescriptor(CommitLogDescriptor.VERSION_22, 17, new ParameterizedClass("LZ4Compressor", null), neverEnabledEncryption));
+        testDescriptorPersistence(new CommitLogDescriptor(CommitLogDescriptor.VERSION_22, 19,
+                                                          new ParameterizedClass("StubbyCompressor", ImmutableMap.of("parameter1", "value1", "flag2", "55", "argument3", "null")
+                                                          ), neverEnabledEncryption));
     }
 
+    // migrated from CommitLogTest
     @Test
     public void testDescriptorInvalidParametersSize() throws IOException
     {
-        final int numberOfParameters = 65535;
-        Map<String, String> params = new HashMap<>(numberOfParameters);
-        for (int i=0; i<numberOfParameters; ++i)
+        Map<String, String> params = new HashMap<>();
+        for (int i=0; i<65535; ++i)
             params.put("key"+i, Integer.toString(i, 16));
         try {
-            CommitLogDescriptor desc = new CommitLogDescriptor(CommitLogDescriptor.VERSION_30,
+            CommitLogDescriptor desc = new CommitLogDescriptor(CommitLogDescriptor.VERSION_22,
                                                                21,
-                                                               new ParameterizedClass("LZ4Compressor", params));
+                                                               new ParameterizedClass("LZ4Compressor", params),
+                                                               neverEnabledEncryption);
+
             ByteBuffer buf = ByteBuffer.allocate(1024000);
             CommitLogDescriptor.writeHeader(buf, desc);
-            fail("Parameter object too long should fail on writing descriptor.");
+            Assert.fail("Parameter object too long should fail on writing descriptor.");
         } catch (ConfigurationException e)
         {
             // correct path
         }
     }
+
+    @Test
+    public void constructParametersString_NoCompressionOrEncryption()
+    {
+        String json = CommitLogDescriptor.constructParametersString(null, null, Collections.emptyMap());
+        Assert.assertFalse(json.contains(CommitLogDescriptor.COMPRESSION_CLASS_KEY));
+        Assert.assertFalse(json.contains(EncryptionContext.ENCRYPTION_CIPHER));
+
+        json = CommitLogDescriptor.constructParametersString(null, neverEnabledEncryption, Collections.emptyMap());
+        Assert.assertFalse(json.contains(CommitLogDescriptor.COMPRESSION_CLASS_KEY));
+        Assert.assertFalse(json.contains(EncryptionContext.ENCRYPTION_CIPHER));
+    }
+
+    @Test
+    public void constructParametersString_WithCompressionAndEncryption()
+    {
+        String json = CommitLogDescriptor.constructParametersString(compression, enabledEncryption, Collections.emptyMap());
+        Assert.assertTrue(json.contains(CommitLogDescriptor.COMPRESSION_CLASS_KEY));
+        Assert.assertTrue(json.contains(EncryptionContext.ENCRYPTION_CIPHER));
+    }
+
+    @Test
+    public void writeAndReadHeader_NoCompressionOrEncryption() throws IOException
+    {
+        CommitLogDescriptor descriptor = new CommitLogDescriptor(CommitLogDescriptor.current_version, 1, null, neverEnabledEncryption);
+        ByteBuffer buffer = ByteBuffer.allocate(16 * 1024);
+        CommitLogDescriptor.writeHeader(buffer, descriptor);
+        buffer.flip();
+        FileSegmentInputStream dataInput = new FileSegmentInputStream(buffer, null, 0);
+        CommitLogDescriptor result = CommitLogDescriptor.readHeader(dataInput, neverEnabledEncryption);
+        Assert.assertNotNull(result);
+        Assert.assertNull(result.compression);
+        Assert.assertFalse(result.getEncryptionContext().isEnabled());
+    }
+
+    @Test
+    public void writeAndReadHeader_OnlyCompression() throws IOException
+    {
+        CommitLogDescriptor descriptor = new CommitLogDescriptor(CommitLogDescriptor.current_version, 1, compression, neverEnabledEncryption);
+        ByteBuffer buffer = ByteBuffer.allocate(16 * 1024);
+        CommitLogDescriptor.writeHeader(buffer, descriptor);
+        buffer.flip();
+        FileSegmentInputStream dataInput = new FileSegmentInputStream(buffer, null, 0);
+        CommitLogDescriptor result = CommitLogDescriptor.readHeader(dataInput, neverEnabledEncryption);
+        Assert.assertNotNull(result);
+        Assert.assertEquals(compression, result.compression);
+        Assert.assertFalse(result.getEncryptionContext().isEnabled());
+    }
+
+    @Test
+    public void writeAndReadHeader_WithEncryptionHeader_EncryptionEnabledInYaml() throws IOException
+    {
+        CommitLogDescriptor descriptor = new CommitLogDescriptor(CommitLogDescriptor.current_version, 1, null, enabledEncryption);
+        ByteBuffer buffer = ByteBuffer.allocate(16 * 1024);
+        CommitLogDescriptor.writeHeader(buffer, descriptor);
+        buffer.flip();
+        FileSegmentInputStream dataInput = new FileSegmentInputStream(buffer, null, 0);
+        CommitLogDescriptor result = CommitLogDescriptor.readHeader(dataInput, enabledEncryption);
+        Assert.assertNotNull(result);
+        Assert.assertNull(result.compression);
+        Assert.assertTrue(result.getEncryptionContext().isEnabled());
+        Assert.assertArrayEquals(iv, result.getEncryptionContext().getIV());
+    }
+
+    /**
+     * Check that even though encryption is now disabled in the yaml, we can still read a commit log header that was written as encrypted.
+     */
+    @Test
+    public void writeAndReadHeader_WithEncryptionHeader_EncryptionDisabledInYaml() throws IOException
+    {
+        CommitLogDescriptor descriptor = new CommitLogDescriptor(CommitLogDescriptor.current_version, 1, null, enabledEncryption);
+        ByteBuffer buffer = ByteBuffer.allocate(16 * 1024);
+        CommitLogDescriptor.writeHeader(buffer, descriptor);
+        buffer.flip();
+        FileSegmentInputStream dataInput = new FileSegmentInputStream(buffer, null, 0);
+        CommitLogDescriptor result = CommitLogDescriptor.readHeader(dataInput, previouslyEnabledEncryption);
+        Assert.assertNotNull(result);
+        Assert.assertNull(result.compression);
+        Assert.assertTrue(result.getEncryptionContext().isEnabled());
+        Assert.assertArrayEquals(iv, result.getEncryptionContext().getIV());
+    }
+
+    /**
+     * Shouldn't happen in the real world (a segment should have either compression or encryption, not both), but the header
+     * functionality should still be correct.
+     */
+    @Test
+    public void writeAndReadHeader_WithCompressionAndEncryption() throws IOException
+    {
+        CommitLogDescriptor descriptor = new CommitLogDescriptor(CommitLogDescriptor.current_version, 1, compression, enabledEncryption);
+        ByteBuffer buffer = ByteBuffer.allocate(16 * 1024);
+        CommitLogDescriptor.writeHeader(buffer, descriptor);
+        buffer.flip();
+        FileSegmentInputStream dataInput = new FileSegmentInputStream(buffer, null, 0);
+        CommitLogDescriptor result = CommitLogDescriptor.readHeader(dataInput, enabledEncryption);
+        Assert.assertNotNull(result);
+        Assert.assertEquals(compression, result.compression);
+        Assert.assertTrue(result.getEncryptionContext().isEnabled());
+        Assert.assertEquals(enabledEncryption, result.getEncryptionContext());
+        Assert.assertArrayEquals(iv, result.getEncryptionContext().getIV());
+    }
+
+    @Test
+    public void equals_NoCompressionOrEncryption()
+    {
+        CommitLogDescriptor desc1 = new CommitLogDescriptor(CommitLogDescriptor.current_version, 1, null, null);
+        Assert.assertEquals(desc1, desc1);
+
+        CommitLogDescriptor desc2 = new CommitLogDescriptor(CommitLogDescriptor.current_version, 1, null, null);
+        Assert.assertEquals(desc1, desc2);
+
+        desc1 = new CommitLogDescriptor(CommitLogDescriptor.current_version, 1, null, neverEnabledEncryption);
+        Assert.assertEquals(desc1, desc1);
+        desc2 = new CommitLogDescriptor(CommitLogDescriptor.current_version, 1, null, neverEnabledEncryption);
+        Assert.assertEquals(desc1, desc2);
+
+        desc1 = new CommitLogDescriptor(CommitLogDescriptor.current_version, 1, null, previouslyEnabledEncryption);
+        Assert.assertEquals(desc1, desc1);
+        desc2 = new CommitLogDescriptor(CommitLogDescriptor.current_version, 1, null, previouslyEnabledEncryption);
+        Assert.assertEquals(desc1, desc2);
+    }
+
+    @Test
+    public void equals_OnlyCompression()
+    {
+        CommitLogDescriptor desc1 = new CommitLogDescriptor(CommitLogDescriptor.current_version, 1, compression, null);
+        Assert.assertEquals(desc1, desc1);
+
+        CommitLogDescriptor desc2 = new CommitLogDescriptor(CommitLogDescriptor.current_version, 1, compression, null);
+        Assert.assertEquals(desc1, desc2);
+
+        desc1 = new CommitLogDescriptor(CommitLogDescriptor.current_version, 1, compression, neverEnabledEncryption);
+        Assert.assertEquals(desc1, desc1);
+        desc2 = new CommitLogDescriptor(CommitLogDescriptor.current_version, 1, compression, neverEnabledEncryption);
+        Assert.assertEquals(desc1, desc2);
+
+        desc1 = new CommitLogDescriptor(CommitLogDescriptor.current_version, 1, compression, previouslyEnabledEncryption);
+        Assert.assertEquals(desc1, desc1);
+        desc2 = new CommitLogDescriptor(CommitLogDescriptor.current_version, 1, compression, previouslyEnabledEncryption);
+        Assert.assertEquals(desc1, desc2);
+    }
+
+    @Test
+    public void equals_OnlyEncryption()
+    {
+        CommitLogDescriptor desc1 = new CommitLogDescriptor(CommitLogDescriptor.current_version, 1, null, enabledEncryption);
+        Assert.assertEquals(desc1, desc1);
+
+        CommitLogDescriptor desc2 = new CommitLogDescriptor(CommitLogDescriptor.current_version, 1, null, enabledEncryption);
+        Assert.assertEquals(desc1, desc2);
+
+        desc1 = new CommitLogDescriptor(CommitLogDescriptor.current_version, 1, null, neverEnabledEncryption);
+        Assert.assertEquals(desc1, desc1);
+        desc2 = new CommitLogDescriptor(CommitLogDescriptor.current_version, 1, null, neverEnabledEncryption);
+        Assert.assertEquals(desc1, desc2);
+
+        desc1 = new CommitLogDescriptor(CommitLogDescriptor.current_version, 1, null, previouslyEnabledEncryption);
+        Assert.assertEquals(desc1, desc1);
+        desc2 = new CommitLogDescriptor(CommitLogDescriptor.current_version, 1, null, previouslyEnabledEncryption);
+        Assert.assertEquals(desc1, desc2);
+    }
+
+    /**
+     * Shouldn't have both enabled in real life, but ensure equality still behaves correctly nonetheless.
+     */
+    @Test
+    public void equals_BothCompressionAndEncryption()
+    {
+        CommitLogDescriptor desc1 = new CommitLogDescriptor(CommitLogDescriptor.current_version, 1, compression, enabledEncryption);
+        Assert.assertEquals(desc1, desc1);
+
+        CommitLogDescriptor desc2 = new CommitLogDescriptor(CommitLogDescriptor.current_version, 1, compression, enabledEncryption);
+        Assert.assertEquals(desc1, desc2);
+    }
 }
diff --git a/test/unit/org/apache/cassandra/db/commitlog/CommitLogReaderTest.java b/test/unit/org/apache/cassandra/db/commitlog/CommitLogReaderTest.java
new file mode 100644
index 0000000..edff3b7
--- /dev/null
+++ b/test/unit/org/apache/cassandra/db/commitlog/CommitLogReaderTest.java
@@ -0,0 +1,267 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.db.commitlog;
+
+import java.io.File;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.List;
+
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import org.apache.cassandra.config.CFMetaData;
+import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.config.Config;
+import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.cql3.CQLTester;
+import org.apache.cassandra.cql3.ColumnIdentifier;
+import org.apache.cassandra.db.Keyspace;
+import org.apache.cassandra.db.Mutation;
+import org.apache.cassandra.db.partitions.PartitionUpdate;
+import org.apache.cassandra.db.rows.Row;
+import org.apache.cassandra.utils.JVMStabilityInspector;
+import org.apache.cassandra.utils.KillerForTests;
+
+public class CommitLogReaderTest extends CQLTester
+{
+    @BeforeClass
+    public static void beforeClass()
+    {
+        DatabaseDescriptor.setCommitFailurePolicy(Config.CommitFailurePolicy.ignore);
+        JVMStabilityInspector.replaceKiller(new KillerForTests(false));
+    }
+
+    @Before
+    public void before() throws IOException
+    {
+        CommitLog.instance.resetUnsafe(true);
+    }
+
+    @Test
+    public void testReadAll() throws Throwable
+    {
+        int samples = 1000;
+        populateData(samples);
+        ArrayList<File> toCheck = getCommitLogs();
+
+        CommitLogReader reader = new CommitLogReader();
+
+        TestCLRHandler testHandler = new TestCLRHandler(currentTableMetadata());
+        for (File f : toCheck)
+            reader.readCommitLogSegment(testHandler, f, CommitLogReader.ALL_MUTATIONS, false);
+
+        Assert.assertEquals("Expected 1000 seen mutations, got: " + testHandler.seenMutationCount(),
+                            1000, testHandler.seenMutationCount());
+
+        confirmReadOrder(testHandler, 0);
+    }
+
+    @Test
+    public void testReadCount() throws Throwable
+    {
+        int samples = 50;
+        int readCount = 10;
+        populateData(samples);
+        ArrayList<File> toCheck = getCommitLogs();
+
+        CommitLogReader reader = new CommitLogReader();
+        TestCLRHandler testHandler = new TestCLRHandler();
+
+        for (File f : toCheck)
+            reader.readCommitLogSegment(testHandler, f, readCount - testHandler.seenMutationCount(), false);
+
+        Assert.assertEquals("Expected " + readCount + " seen mutations, got: " + testHandler.seenMutations.size(),
+                            readCount, testHandler.seenMutationCount());
+    }
+
+    @Test
+    public void testReadFromMidpoint() throws Throwable
+    {
+        int samples = 1000;
+        int readCount = 500;
+        CommitLogPosition midpoint = populateData(samples);
+        ArrayList<File> toCheck = getCommitLogs();
+
+        CommitLogReader reader = new CommitLogReader();
+        TestCLRHandler testHandler = new TestCLRHandler();
+
+        // Segments whose id doesn't match the midpoint's segment id will be skipped
+        for (File f : toCheck)
+            reader.readCommitLogSegment(testHandler, f, midpoint, readCount, false);
+
+        // Confirm correct count on replay
+        Assert.assertEquals("Expected " + readCount + " seen mutations, got: " + testHandler.seenMutations.size(),
+                            readCount, testHandler.seenMutationCount());
+
+        confirmReadOrder(testHandler, samples / 2);
+    }
+
+    @Test
+    public void testReadFromMidpointTooMany() throws Throwable
+    {
+        int samples = 1000;
+        int readCount = 5000;
+        CommitLogPosition midpoint = populateData(samples);
+        ArrayList<File> toCheck = getCommitLogs();
+
+        CommitLogReader reader = new CommitLogReader();
+        TestCLRHandler testHandler = new TestCLRHandler(currentTableMetadata());
+
+        // Read from the midpoint with a count that overshoots the remaining mutations by 4.5k.
+        // Segments whose id doesn't match the midpoint's segment id will be skipped.
+        for (File f : toCheck)
+            reader.readCommitLogSegment(testHandler, f, midpoint, readCount, false);
+
+        Assert.assertEquals("Expected " + samples / 2 + " seen mutations, got: " + testHandler.seenMutations.size(),
+                            samples / 2, testHandler.seenMutationCount());
+
+        confirmReadOrder(testHandler, samples / 2);
+    }
+
+    @Test
+    public void testReadCountFromMidpoint() throws Throwable
+    {
+        int samples = 1000;
+        int readCount = 10;
+        CommitLogPosition midpoint = populateData(samples);
+        ArrayList<File> toCheck = getCommitLogs();
+
+        CommitLogReader reader = new CommitLogReader();
+        TestCLRHandler testHandler = new TestCLRHandler();
+
+        for (File f: toCheck)
+            reader.readCommitLogSegment(testHandler, f, midpoint, readCount, false);
+
+        // Confirm correct count on replay
+        Assert.assertEquals("Expected " + readCount + " seen mutations, got: " + testHandler.seenMutations.size(),
+            readCount, testHandler.seenMutationCount());
+
+        confirmReadOrder(testHandler, samples / 2);
+    }
+
+    /**
+     * Since mutations for our cfm and mutations for other tables are mixed into the commit log, we ignore updates
+     * that aren't for the cfm the test handler is configured to check.
+     * @param handler handler whose seen mutations are verified
+     * @param offset integer offset added to the index when computing the value we expect to see in each record
+     */
+    private void confirmReadOrder(TestCLRHandler handler, int offset)
+    {
+        ColumnDefinition cd = currentTableMetadata().getColumnDefinition(new ColumnIdentifier("data", false));
+        int i = 0;
+        int j = 0;
+        while (i + j < handler.seenMutationCount())
+        {
+            PartitionUpdate pu = handler.seenMutations.get(i + j).get(currentTableMetadata());
+            if (pu == null)
+            {
+                j++;
+                continue;
+            }
+
+            for (Row r : pu)
+            {
+                String expected = Integer.toString(i + offset);
+                String seen = new String(r.getCell(cd).value().array());
+                if (!expected.equals(seen))
+                    Assert.fail("Mismatch at index: " + i + ". Offset: " + offset + " Expected: " + expected + " Seen: " + seen);
+            }
+            i++;
+        }
+    }
+
+    static ArrayList<File> getCommitLogs()
+    {
+        File dir = new File(DatabaseDescriptor.getCommitLogLocation());
+        File[] files = dir.listFiles();
+        ArrayList<File> results = new ArrayList<>();
+        for (File f : files)
+        {
+            if (f.isDirectory())
+                continue;
+            results.add(f);
+        }
+        Assert.assertTrue("Didn't find any commit log files.", 0 != results.size());
+        return results;
+    }
+
+    static class TestCLRHandler implements CommitLogReadHandler
+    {
+        public List<Mutation> seenMutations = new ArrayList<Mutation>();
+        public boolean sawStopOnErrorCheck = false;
+
+        private final CFMetaData cfm;
+
+        // Accept all
+        public TestCLRHandler()
+        {
+            this.cfm = null;
+        }
+
+        public TestCLRHandler(CFMetaData cfm)
+        {
+            this.cfm = cfm;
+        }
+
+        public boolean shouldSkipSegmentOnError(CommitLogReadException exception) throws IOException
+        {
+            sawStopOnErrorCheck = true;
+            return false;
+        }
+
+        public void handleUnrecoverableError(CommitLogReadException exception) throws IOException
+        {
+            sawStopOnErrorCheck = true;
+        }
+
+        public void handleMutation(Mutation m, int size, int entryLocation, CommitLogDescriptor desc)
+        {
+            if (cfm == null || m.get(cfm) != null)
+                seenMutations.add(m);
+        }
+
+        public int seenMutationCount() { return seenMutations.size(); }
+    }
+
+    /**
+     * Returns the commit log position captured after half of the entries have been written.
+     */
+    CommitLogPosition populateData(int entryCount) throws Throwable
+    {
+        Assert.assertEquals("entryCount must be an even number.", 0, entryCount % 2);
+
+        createTable("CREATE TABLE %s (idx INT, data TEXT, PRIMARY KEY(idx));");
+        int midpoint = entryCount / 2;
+
+        for (int i = 0; i < midpoint; i++) {
+            execute("INSERT INTO %s (idx, data) VALUES (?, ?)", i, Integer.toString(i));
+        }
+
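+        // Capture the position after the first half of the inserts; the midpoint tests replay from here.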
+        CommitLogPosition result = CommitLog.instance.getCurrentPosition();
+
+        for (int i = midpoint; i < entryCount; i++)
+            execute("INSERT INTO %s (idx, data) VALUES (?, ?)", i, Integer.toString(i));
+
+        Keyspace.open(keyspace()).getColumnFamilyStore(currentTable()).forceBlockingFlush();
+        return result;
+    }
+}
diff --git a/test/unit/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerCDCTest.java b/test/unit/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerCDCTest.java
new file mode 100644
index 0000000..e308a2f
--- /dev/null
+++ b/test/unit/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerCDCTest.java
@@ -0,0 +1,220 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.db.commitlog;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Random;
+
+import org.junit.Assert;
+import org.junit.Assume;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import org.apache.cassandra.config.CFMetaData;
+import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.cql3.CQLTester;
+import org.apache.cassandra.db.Keyspace;
+import org.apache.cassandra.db.RowUpdateBuilder;
+import org.apache.cassandra.db.commitlog.CommitLogSegment.CDCState;
+import org.apache.cassandra.exceptions.WriteTimeoutException;
+import org.apache.cassandra.io.util.FileUtils;
+
+public class CommitLogSegmentManagerCDCTest extends CQLTester
+{
+    private static Random random = new Random();
+
+    @BeforeClass
+    public static void checkConfig()
+    {
+        Assume.assumeTrue(DatabaseDescriptor.isCDCEnabled());
+    }
+
+    @Before
+    public void before() throws IOException
+    {
+        // disable reserve segment to get more deterministic allocation/testing of CDC boundary states
+        CommitLog.instance.forceRecycleAllSegments();
+        for (File f : new File(DatabaseDescriptor.getCDCLogLocation()).listFiles())
+            FileUtils.deleteWithConfirm(f);
+    }
+
+    @Test
+    public void testCDCWriteTimeout() throws Throwable
+    {
+        createTable("CREATE TABLE %s (idx int, data text, primary key(idx)) WITH cdc=true;");
+        CommitLogSegmentManagerCDC cdcMgr = (CommitLogSegmentManagerCDC)CommitLog.instance.segmentManager;
+        CFMetaData cfm = currentTableMetadata();
+
+        // Confirm that logic to check for whether or not we can allocate new CDC segments works
+        Integer originalCDCSize = DatabaseDescriptor.getCDCSpaceInMB();
+        try
+        {
+            DatabaseDescriptor.setCDCSpaceInMB(32);
+            // Spin until we hit CDC capacity and make sure we get a WriteTimeout
+            try
+            {
+                // Should trigger on anything < 20:1 compression ratio during compressed test
+                for (int i = 0; i < 100; i++)
+                {
+                    new RowUpdateBuilder(cfm, 0, i)
+                        .add("data", randomizeBuffer(DatabaseDescriptor.getCommitLogSegmentSize() / 3))
+                        .build().apply();
+                }
+                Assert.fail("Expected WriteTimeoutException from full CDC but did not receive it.");
+            }
+            catch (WriteTimeoutException e)
+            {
+                // expected, do nothing
+            }
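+            // Once the CDC space limit is reached, the active segment should be flagged FORBIDDEN.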
+            expectCurrentCDCState(CDCState.FORBIDDEN);
+
+            // Confirm we can create a non-cdc table and write to it even while at cdc capacity
+            createTable("CREATE TABLE %s (idx int, data text, primary key(idx)) WITH cdc=false;");
+            execute("INSERT INTO %s (idx, data) VALUES (1, '1');");
+
+            // Confirm that, on flush+recyle, we see files show up in cdc_raw
+            Keyspace.open(keyspace()).getColumnFamilyStore(currentTable()).forceBlockingFlush();
+            CommitLog.instance.forceRecycleAllSegments();
+            cdcMgr.awaitManagementTasksCompletion();
+            Assert.assertTrue("Expected files to be moved to overflow.", getCDCRawCount() > 0);
+
+            // Simulate a CDC consumer reading files then deleting them
+            for (File f : new File(DatabaseDescriptor.getCDCLogLocation()).listFiles())
+                FileUtils.deleteWithConfirm(f);
+
+            // Update the size tracker to reflect the deleted files. This should flip the flag on the current allocatingFrom segment back to PERMITTED.
+            cdcMgr.updateCDCTotalSize();
+            expectCurrentCDCState(CDCState.PERMITTED);
+        }
+        finally
+        {
+            DatabaseDescriptor.setCDCSpaceInMB(originalCDCSize);
+        }
+    }
+
+    @Test
+    public void testCLSMCDCDiscardLogic() throws Throwable
+    {
+        CommitLogSegmentManagerCDC cdcMgr = (CommitLogSegmentManagerCDC)CommitLog.instance.segmentManager;
+
+        createTable("CREATE TABLE %s (idx int, data text, primary key(idx)) WITH cdc=false;");
+        for (int i = 0; i < 8; i++)
+        {
+            new RowUpdateBuilder(currentTableMetadata(), 0, i)
+                .add("data", randomizeBuffer(DatabaseDescriptor.getCommitLogSegmentSize() / 3))
+                .build().apply();
+        }
+
+        // Should have 4 CDC segments since we haven't flushed yet: 3 PERMITTED (one of which is active) and 1 PERMITTED in waiting
+        Assert.assertEquals(4 * DatabaseDescriptor.getCommitLogSegmentSize(), cdcMgr.updateCDCTotalSize());
+        expectCurrentCDCState(CDCState.PERMITTED);
+        CommitLog.instance.forceRecycleAllSegments();
+
+        // On flush, these PERMITTED segments should be deleted
+        Assert.assertEquals(0, new File(DatabaseDescriptor.getCDCLogLocation()).listFiles().length);
+
+        createTable("CREATE TABLE %s (idx int, data text, primary key(idx)) WITH cdc=true;");
+        for (int i = 0; i < 8; i++)
+        {
+            new RowUpdateBuilder(currentTableMetadata(), 0, i)
+                .add("data", randomizeBuffer(DatabaseDescriptor.getCommitLogSegmentSize() / 3))
+                .build().apply();
+        }
+        // 4 segments total again: 3 CONTAINS and 1 PERMITTED in waiting
+        Assert.assertEquals(4 * DatabaseDescriptor.getCommitLogSegmentSize(), cdcMgr.updateCDCTotalSize());
+        CommitLog.instance.forceRecycleAllSegments();
+        expectCurrentCDCState(CDCState.PERMITTED);
+
+        // On flush, PERMITTED is deleted, CONTAINS is preserved.
+        cdcMgr.awaitManagementTasksCompletion();
+        int seen = getCDCRawCount();
+        Assert.assertTrue("Expected >3 files in cdc_raw, saw: " + seen, seen >= 3);
+    }
+
+    @Test
+    public void testSegmentFlaggingOnCreation() throws Throwable
+    {
+        CommitLogSegmentManagerCDC cdcMgr = (CommitLogSegmentManagerCDC)CommitLog.instance.segmentManager;
+        String ct = createTable("CREATE TABLE %s (idx int, data text, primary key(idx)) WITH cdc=true;");
+
+        int origSize = DatabaseDescriptor.getCDCSpaceInMB();
+        try
+        {
+            DatabaseDescriptor.setCDCSpaceInMB(16);
+            CFMetaData ccfm = Keyspace.open(keyspace()).getColumnFamilyStore(ct).metadata;
+            // Spin until we hit CDC capacity and make sure we get a WriteTimeout
+            try
+            {
+                for (int i = 0; i < 1000; i++)
+                {
+                    new RowUpdateBuilder(ccfm, 0, i)
+                        .add("data", randomizeBuffer(DatabaseDescriptor.getCommitLogSegmentSize() / 3))
+                        .build().apply();
+                }
+                Assert.fail("Expected WriteTimeoutException from full CDC but did not receive it.");
+            }
+            catch (WriteTimeoutException e) { }
+
+            expectCurrentCDCState(CDCState.FORBIDDEN);
+            CommitLog.instance.forceRecycleAllSegments();
+
+            cdcMgr.awaitManagementTasksCompletion();
+            new File(DatabaseDescriptor.getCDCLogLocation()).listFiles()[0].delete();
+            cdcMgr.updateCDCTotalSize();
+            // Confirm the CDC update process changes the flag on the active segment
+            expectCurrentCDCState(CDCState.PERMITTED);
+
+            // Clear out archived CDC files
+            for (File f : new File(DatabaseDescriptor.getCDCLogLocation()).listFiles()) {
+                FileUtils.deleteWithConfirm(f);
+            }
+
+            // Set space to 0, confirm newly allocated segments are FORBIDDEN
+            DatabaseDescriptor.setCDCSpaceInMB(0);
+            CommitLog.instance.forceRecycleAllSegments();
+            CommitLog.instance.segmentManager.awaitManagementTasksCompletion();
+            expectCurrentCDCState(CDCState.FORBIDDEN);
+        }
+        finally
+        {
+            DatabaseDescriptor.setCDCSpaceInMB(origSize);
+        }
+    }
+
+    private ByteBuffer randomizeBuffer(int size)
+    {
+        byte[] toWrap = new byte[size];
+        random.nextBytes(toWrap);
+        return ByteBuffer.wrap(toWrap);
+    }
+
+    private int getCDCRawCount()
+    {
+        return new File(DatabaseDescriptor.getCDCLogLocation()).listFiles().length;
+    }
+
+    private void expectCurrentCDCState(CDCState state)
+    {
+        Assert.assertEquals("Received unexpected CDCState on current allocatingFrom segment.",
+            state, CommitLog.instance.segmentManager.allocatingFrom.getCDCState());
+    }
+}
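For orientation, the "simulate a CDC consumer reading files then deleting them" step above (read segments from cdc_raw, then delete them so updateCDCTotalSize() can flip the active segment back to PERMITTED) corresponds to a consumer loop roughly like the following sketch. The directory argument and processing callback are illustrative assumptions, not APIs introduced by this patch.

    import java.io.File;
    import java.util.function.Consumer;

    // Hypothetical CDC consumer loop: process each segment found in cdc_raw, then delete it
    // so the commit log's CDC size tracker can see that space as released.
    public final class CdcRawConsumerSketch
    {
        public static void drain(File cdcRawDir, Consumer<File> process)
        {
            File[] segments = cdcRawDir.listFiles();
            if (segments == null)
                return;
            for (File segment : segments)
            {
                process.accept(segment);          // read/replay the segment's mutations
                if (!segment.delete())            // free CDC space once the segment is consumed
                    throw new IllegalStateException("Could not delete " + segment);
            }
        }
    }
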
diff --git a/test/unit/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerTest.java b/test/unit/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerTest.java
index 6a4aace..b777389 100644
--- a/test/unit/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerTest.java
+++ b/test/unit/org/apache/cassandra/db/commitlog/CommitLogSegmentManagerTest.java
@@ -23,9 +23,14 @@
 import java.nio.ByteBuffer;
 import java.util.Random;
 import java.util.concurrent.Semaphore;
-
 import javax.naming.ConfigurationException;
 
+import com.google.common.collect.ImmutableMap;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+
 import org.apache.cassandra.SchemaLoader;
 import org.apache.cassandra.Util;
 import org.apache.cassandra.config.Config.CommitLogSync;
@@ -41,12 +46,6 @@
 import org.apache.cassandra.schema.KeyspaceParams;
 import org.jboss.byteman.contrib.bmunit.BMRule;
 import org.jboss.byteman.contrib.bmunit.BMUnitRunner;
-import org.junit.Assert;
-import org.junit.BeforeClass;
-import org.junit.Test;
-import org.junit.runner.RunWith;
-
-import com.google.common.collect.ImmutableMap;
 
 @RunWith(BMUnitRunner.class)
 public class CommitLogSegmentManagerTest
@@ -99,9 +98,9 @@
         });
         dummyThread.start();
 
-        CommitLogSegmentManager clsm = CommitLog.instance.allocator;
+        AbstractCommitLogSegmentManager clsm = CommitLog.instance.segmentManager;
 
-        //Protect against delay, but still break out as fast as possible
+        // Protect against delay, but still break out as fast as possible
         long start = System.currentTimeMillis();
         while (System.currentTimeMillis() - start < 5000)
         {
@@ -110,11 +109,11 @@
         }
         Thread.sleep(1000);
 
-        //Should only be able to create 3 segments not 7 because it blocks waiting for truncation that never comes
+        // Should only be able to create 3 segments (not 7) because it blocks waiting for truncation that never comes.
         Assert.assertEquals(3, clsm.getActiveSegments().size());
 
-        clsm.getActiveSegments().forEach( segment -> clsm.recycleSegment(segment));
+        clsm.getActiveSegments().forEach(segment -> clsm.recycleSegment(segment));
 
         Util.spinAssertEquals(3, () -> clsm.getActiveSegments().size(), 5);
     }
-}
+}
\ No newline at end of file
diff --git a/test/unit/org/apache/cassandra/db/commitlog/CommitLogTest.java b/test/unit/org/apache/cassandra/db/commitlog/CommitLogTest.java
index 39ba886..23ec58b 100644
--- a/test/unit/org/apache/cassandra/db/commitlog/CommitLogTest.java
+++ b/test/unit/org/apache/cassandra/db/commitlog/CommitLogTest.java
@@ -18,32 +18,24 @@
 */
 package org.apache.cassandra.db.commitlog;
 
-import java.io.ByteArrayOutputStream;
-import java.io.DataOutputStream;
-import java.io.File;
-import java.io.FileOutputStream;
-import java.io.IOException;
-import java.io.OutputStream;
+import java.io.*;
 import java.nio.ByteBuffer;
-import java.util.Arrays;
-import java.util.Collection;
-import java.util.Collections;
-import java.util.UUID;
+import java.util.*;
 import java.util.concurrent.Callable;
 import java.util.concurrent.ExecutionException;
 import java.util.zip.CRC32;
 import java.util.zip.Checksum;
 
-import org.junit.Assert;
-import org.junit.Before;
-import org.junit.BeforeClass;
-import org.junit.Test;
+import com.google.common.collect.Iterables;
+
+import org.junit.*;
 import org.junit.runner.RunWith;
 import org.junit.runners.Parameterized;
 import org.junit.runners.Parameterized.Parameters;
 
 import org.apache.cassandra.SchemaLoader;
 import org.apache.cassandra.Util;
+import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.config.ParameterizedClass;
 import org.apache.cassandra.db.ColumnFamilyStore;
@@ -54,17 +46,23 @@
 import org.apache.cassandra.db.compaction.CompactionManager;
 import org.apache.cassandra.db.marshal.AsciiType;
 import org.apache.cassandra.db.marshal.BytesType;
+import org.apache.cassandra.db.partitions.PartitionUpdate;
+import org.apache.cassandra.db.rows.Row;
 import org.apache.cassandra.exceptions.ConfigurationException;
 import org.apache.cassandra.io.compress.DeflateCompressor;
 import org.apache.cassandra.io.compress.LZ4Compressor;
 import org.apache.cassandra.io.compress.SnappyCompressor;
 import org.apache.cassandra.net.MessagingService;
 import org.apache.cassandra.schema.KeyspaceParams;
-import org.apache.cassandra.utils.ByteBufferUtil;
+import org.apache.cassandra.security.EncryptionContext;
+import org.apache.cassandra.security.EncryptionContextGenerator;
+import org.apache.cassandra.utils.Hex;
 import org.apache.cassandra.utils.JVMStabilityInspector;
 import org.apache.cassandra.utils.KillerForTests;
+import org.apache.cassandra.utils.Pair;
 import org.apache.cassandra.utils.vint.VIntCoding;
 
+import org.junit.After;
 import static org.apache.cassandra.utils.ByteBufferUtil.bytes;
 import static org.junit.Assert.assertEquals;
 import static org.junit.Assert.assertTrue;
@@ -77,29 +75,28 @@
     private static final String STANDARD1 = "Standard1";
     private static final String STANDARD2 = "Standard2";
 
-    public CommitLogTest(ParameterizedClass commitLogCompression)
+    private static JVMStabilityInspector.Killer oldKiller;
+    private static KillerForTests testKiller;
+
+    public CommitLogTest(ParameterizedClass commitLogCompression, EncryptionContext encryptionContext)
     {
         DatabaseDescriptor.setCommitLogCompression(commitLogCompression);
-    }
-
-    @Before
-    public void setUp() throws IOException
-    {
-        CommitLog.instance.resetUnsafe(true);
+        DatabaseDescriptor.setEncryptionContext(encryptionContext);
     }
 
     @Parameters()
     public static Collection<Object[]> generateData()
     {
-        return Arrays.asList(new Object[][] {
-                { null }, // No compression
-                { new ParameterizedClass(LZ4Compressor.class.getName(), Collections.emptyMap()) },
-                { new ParameterizedClass(SnappyCompressor.class.getName(), Collections.emptyMap()) },
-                { new ParameterizedClass(DeflateCompressor.class.getName(), Collections.emptyMap()) } });
+        return Arrays.asList(new Object[][]{
+            {null, EncryptionContextGenerator.createDisabledContext()}, // No compression, no encryption
+            {null, EncryptionContextGenerator.createContext(true)}, // Encryption
+            {new ParameterizedClass(LZ4Compressor.class.getName(), Collections.emptyMap()), EncryptionContextGenerator.createDisabledContext()},
+            {new ParameterizedClass(SnappyCompressor.class.getName(), Collections.emptyMap()), EncryptionContextGenerator.createDisabledContext()},
+            {new ParameterizedClass(DeflateCompressor.class.getName(), Collections.emptyMap()), EncryptionContextGenerator.createDisabledContext()}});
     }
 
     @BeforeClass
-    public static void defineSchema() throws ConfigurationException
+    public static void beforeClass() throws ConfigurationException
     {
         SchemaLoader.prepareServer();
         SchemaLoader.createKeyspace(KEYSPACE1,
@@ -111,13 +108,38 @@
                                     SchemaLoader.standardCFMD(KEYSPACE1, STANDARD1, 0, AsciiType.instance, BytesType.instance),
                                     SchemaLoader.standardCFMD(KEYSPACE1, STANDARD2, 0, AsciiType.instance, BytesType.instance));
         CompactionManager.instance.disableAutoCompaction();
+
+        testKiller = new KillerForTests();
+
+        // While we don't want the JVM to be nuked from under us on a test failure, we DO want some indication of
+        // an error. If we hit a "Kill the JVM" condition while working with the CL when we don't expect it, an
+        // aggressive KillerForTests will fail the test with an assertion.
+        oldKiller = JVMStabilityInspector.replaceKiller(testKiller);
+    }
+
+    @AfterClass
+    public static void afterClass()
+    {
+        JVMStabilityInspector.replaceKiller(oldKiller);
+    }
+
+    @Before
+    public void beforeTest() throws IOException
+    {
+        CommitLog.instance.resetUnsafe(true);
+    }
+
+    @After
+    public void afterTest()
+    {
+        testKiller.reset();
     }
 
     @Test
     public void testRecoveryWithEmptyLog() throws Exception
     {
         runExpecting(() -> {
-            CommitLog.instance.recover(new File[]{ tmpFile(CommitLogDescriptor.current_version) });
+            CommitLog.instance.recoverFiles(tmpFile(CommitLogDescriptor.current_version));
             return null;
         }, CommitLogReplayException.class);
     }
@@ -125,7 +147,7 @@
     @Test
     public void testRecoveryWithEmptyLog20() throws Exception
     {
-        CommitLog.instance.recover(new File[]{ tmpFile(CommitLogDescriptor.VERSION_20) });
+        CommitLog.instance.recoverFiles(tmpFile(CommitLogDescriptor.VERSION_20));
     }
 
     @Test
@@ -151,14 +173,6 @@
     }
 
     @Test
-    public void testRecoveryWithShortCheckSum() throws Exception
-    {
-        byte[] data = new byte[8];
-        data[3] = 10;   // make sure this is not a legacy end marker.
-        testRecovery(data, CommitLogReplayException.class);
-    }
-
-    @Test
     public void testRecoveryWithShortMutationSize() throws Exception
     {
         testRecoveryWithBadSizeArgument(9, 10);
@@ -209,13 +223,14 @@
     @Test
     public void testDontDeleteIfDirty() throws Exception
     {
-        ColumnFamilyStore cfs1 = Keyspace.open(KEYSPACE1).getColumnFamilyStore(STANDARD1);
-        ColumnFamilyStore cfs2 = Keyspace.open(KEYSPACE1).getColumnFamilyStore(STANDARD2);
+        Keyspace ks = Keyspace.open(KEYSPACE1);
+        ColumnFamilyStore cfs1 = ks.getColumnFamilyStore(STANDARD1);
+        ColumnFamilyStore cfs2 = ks.getColumnFamilyStore(STANDARD2);
 
         // Roughly 32 MB mutation
         Mutation m = new RowUpdateBuilder(cfs1.metadata, 0, "k")
                      .clustering("bytes")
-                     .add("val", ByteBuffer.allocate(DatabaseDescriptor.getCommitLogSegmentSize()/4))
+                     .add("val", ByteBuffer.allocate(DatabaseDescriptor.getCommitLogSegmentSize() / 4))
                      .build();
 
         // Adding it 5 times
@@ -232,39 +247,40 @@
                       .build();
         CommitLog.instance.add(m2);
 
-        assert CommitLog.instance.activeSegments() == 2 : "Expecting 2 segments, got " + CommitLog.instance.activeSegments();
+        assertEquals(2, CommitLog.instance.segmentManager.getActiveSegments().size());
 
         UUID cfid2 = m2.getColumnFamilyIds().iterator().next();
-        CommitLog.instance.discardCompletedSegments(cfid2, CommitLog.instance.getContext());
+        CommitLog.instance.discardCompletedSegments(cfid2, CommitLog.instance.getCurrentPosition());
 
-        // Assert we still have both our segment
-        assert CommitLog.instance.activeSegments() == 2 : "Expecting 2 segments, got " + CommitLog.instance.activeSegments();
+        // Assert we still have both our segments
+        assertEquals(2, CommitLog.instance.segmentManager.getActiveSegments().size());
     }
 
     @Test
     public void testDeleteIfNotDirty() throws Exception
     {
-        ColumnFamilyStore cfs1 = Keyspace.open(KEYSPACE1).getColumnFamilyStore(STANDARD1);
-        ColumnFamilyStore cfs2 = Keyspace.open(KEYSPACE1).getColumnFamilyStore(STANDARD2);
+        Keyspace ks = Keyspace.open(KEYSPACE1);
+        ColumnFamilyStore cfs1 = ks.getColumnFamilyStore(STANDARD1);
+        ColumnFamilyStore cfs2 = ks.getColumnFamilyStore(STANDARD2);
 
         // Roughly 32 MB mutation
-        Mutation rm = new RowUpdateBuilder(cfs1.metadata, 0, "k")
-                      .clustering("bytes")
-                      .add("val", ByteBuffer.allocate((DatabaseDescriptor.getCommitLogSegmentSize()/4) - 1))
-                      .build();
+        Mutation rm = new RowUpdateBuilder(cfs1.metadata, 0, "k")
+                      .clustering("bytes")
+                      .add("val", ByteBuffer.allocate((DatabaseDescriptor.getCommitLogSegmentSize() / 4) - 1))
+                      .build();
 
         // Adding it twice (won't change segment)
         CommitLog.instance.add(rm);
         CommitLog.instance.add(rm);
 
-        assert CommitLog.instance.activeSegments() == 1 : "Expecting 1 segment, got " + CommitLog.instance.activeSegments();
+        assertEquals(1, CommitLog.instance.segmentManager.getActiveSegments().size());
 
         // "Flush": this won't delete anything
         UUID cfid1 = rm.getColumnFamilyIds().iterator().next();
         CommitLog.instance.sync(true);
-        CommitLog.instance.discardCompletedSegments(cfid1, CommitLog.instance.getContext());
+        CommitLog.instance.discardCompletedSegments(cfid1, CommitLog.instance.getCurrentPosition());
 
-        assert CommitLog.instance.activeSegments() == 1 : "Expecting 1 segment, got " + CommitLog.instance.activeSegments();
+        assertEquals(1, CommitLog.instance.segmentManager.getActiveSegments().size());
 
         // Adding new mutation on another CF, large enough (including CL entry overhead) that a new segment is created
         Mutation rm2 = new RowUpdateBuilder(cfs2.metadata, 0, "k")
@@ -276,17 +292,16 @@
         CommitLog.instance.add(rm2);
         CommitLog.instance.add(rm2);
 
-        assert CommitLog.instance.activeSegments() == 3 : "Expecting 3 segments, got " + CommitLog.instance.activeSegments();
-
+        assertEquals(3, CommitLog.instance.segmentManager.getActiveSegments().size());
 
         // "Flush" second cf: The first segment should be deleted since we
         // didn't write anything on cf1 since last flush (and we flush cf2)
 
         UUID cfid2 = rm2.getColumnFamilyIds().iterator().next();
-        CommitLog.instance.discardCompletedSegments(cfid2, CommitLog.instance.getContext());
+        CommitLog.instance.discardCompletedSegments(cfid2, CommitLog.instance.getCurrentPosition());
 
         // Assert we still have both our segment
-        assert CommitLog.instance.activeSegments() == 1 : "Expecting 1 segment, got " + CommitLog.instance.activeSegments();
+        assertEquals(1, CommitLog.instance.segmentManager.getActiveSegments().size());
     }
 
     private static int getMaxRecordDataSize(String keyspace, ByteBuffer key, String cfName, String colName)
@@ -327,28 +342,20 @@
                       .clustering("bytes")
                       .add("val", ByteBuffer.allocate(getMaxRecordDataSize()))
                       .build();
-
         CommitLog.instance.add(rm);
     }
 
-    @Test
+    @Test(expected = IllegalArgumentException.class)
     public void testExceedRecordLimit() throws Exception
     {
-        CommitLog.instance.resetUnsafe(true);
-        ColumnFamilyStore cfs = Keyspace.open(KEYSPACE1).getColumnFamilyStore(STANDARD1);
-        try
-        {
-            Mutation rm = new RowUpdateBuilder(cfs.metadata, 0, "k")
-                          .clustering("bytes")
-                          .add("val", ByteBuffer.allocate(1 + getMaxRecordDataSize()))
-                          .build();
-            CommitLog.instance.add(rm);
-            throw new AssertionError("mutation larger than limit was accepted");
-        }
-        catch (IllegalArgumentException e)
-        {
-            // IAE is thrown on too-large mutations
-        }
+        Keyspace ks = Keyspace.open(KEYSPACE1);
+        ColumnFamilyStore cfs = ks.getColumnFamilyStore(STANDARD1);
+        Mutation rm = new RowUpdateBuilder(cfs.metadata, 0, "k")
+                      .clustering("bytes")
+                      .add("val", ByteBuffer.allocate(1 + getMaxRecordDataSize()))
+                      .build();
+        CommitLog.instance.add(rm);
+        throw new AssertionError("mutation larger than limit was accepted");
     }
 
     protected void testRecoveryWithBadSizeArgument(int size, int dataSize) throws Exception
@@ -369,10 +376,50 @@
         testRecovery(out.toByteArray(), CommitLogReplayException.class);
     }
 
+    /**
+     * Create a temporary commit log file with an appropriate descriptor at the head.
+     *
+     * @return the commit log file reference and the first position after the descriptor in the file
+     * (so that subsequent writes happen at the correct file location).
+     */
+    protected Pair<File, Integer> tmpFile() throws IOException
+    {
+        EncryptionContext encryptionContext = DatabaseDescriptor.getEncryptionContext();
+        CommitLogDescriptor desc = new CommitLogDescriptor(CommitLogDescriptor.current_version,
+                                                           CommitLogSegment.getNextId(),
+                                                           DatabaseDescriptor.getCommitLogCompression(),
+                                                           encryptionContext);
+
+        ByteBuffer buf = ByteBuffer.allocate(1024);
+        CommitLogDescriptor.writeHeader(buf, desc, getAdditionalHeaders(encryptionContext));
+        buf.flip();
+        int positionAfterHeader = buf.limit() + 1;
+
+        File logFile = new File(DatabaseDescriptor.getCommitLogLocation(), desc.fileName());
+
+        try (OutputStream lout = new FileOutputStream(logFile))
+        {
+            lout.write(buf.array(), 0, buf.limit());
+        }
+
+        return Pair.create(logFile, positionAfterHeader);
+    }
+
+    private Map<String, String> getAdditionalHeaders(EncryptionContext encryptionContext)
+    {
+        if (!encryptionContext.isEnabled())
+            return Collections.emptyMap();
+
+        // if we're testing encryption, we need to write out a cipher IV to the descriptor headers
+        byte[] buf = new byte[16];
+        new Random().nextBytes(buf);
+        return Collections.singletonMap(EncryptionContext.ENCRYPTION_IV, Hex.bytesToHex(buf));
+    }
+
     protected File tmpFile(int version) throws IOException
     {
         File logFile = File.createTempFile("CommitLog-" + version + "-", ".log");
-        logFile.deleteOnExit();
         assert logFile.length() == 0;
         return logFile;
     }
@@ -394,9 +441,9 @@
         File logFile = tmpFile(desc.version);
         CommitLogDescriptor fromFile = CommitLogDescriptor.fromFileName(logFile.getName());
         // Change id to match file.
-        desc = new CommitLogDescriptor(desc.version, fromFile.id, desc.compression);
+        desc = new CommitLogDescriptor(desc.version, fromFile.id, desc.compression, desc.getEncryptionContext());
         ByteBuffer buf = ByteBuffer.allocate(1024);
-        CommitLogDescriptor.writeHeader(buf, desc);
+        CommitLogDescriptor.writeHeader(buf, desc, getAdditionalHeaders(desc.getEncryptionContext()));
         try (OutputStream lout = new FileOutputStream(logFile))
         {
             lout.write(buf.array(), 0, buf.position());
@@ -410,7 +457,7 @@
     @Test
     public void testRecoveryWithIdMismatch() throws Exception
     {
-        CommitLogDescriptor desc = new CommitLogDescriptor(4, null);
+        CommitLogDescriptor desc = new CommitLogDescriptor(4, null, EncryptionContextGenerator.createDisabledContext());
         File logFile = tmpFile(desc.version);
         ByteBuffer buf = ByteBuffer.allocate(1024);
         CommitLogDescriptor.writeHeader(buf, desc);
@@ -428,7 +475,7 @@
     @Test
     public void testRecoveryWithBadCompressor() throws Exception
     {
-        CommitLogDescriptor desc = new CommitLogDescriptor(4, new ParameterizedClass("UnknownCompressor", null));
+        CommitLogDescriptor desc = new CommitLogDescriptor(4, new ParameterizedClass("UnknownCompressor", null), EncryptionContextGenerator.createDisabledContext());
         runExpecting(() -> {
             testRecovery(desc, new byte[0]);
             return null;
@@ -437,12 +484,6 @@
 
     protected void runExpecting(Callable<Void> r, Class<?> expected)
     {
-        JVMStabilityInspector.Killer originalKiller;
-        KillerForTests killerForTests;
-
-        killerForTests = new KillerForTests();
-        originalKiller = JVMStabilityInspector.replaceKiller(killerForTests);
-
         Throwable caught = null;
         try
         {
@@ -457,14 +498,28 @@
         if (expected != null && caught == null)
             Assert.fail("Expected exception " + expected + " but call completed successfully.");
 
-        JVMStabilityInspector.replaceKiller(originalKiller);
-        assertEquals("JVM killed", expected != null, killerForTests.wasKilled());
+        assertEquals("JVM kill state doesn't match expectation.", expected != null, testKiller.wasKilled());
     }
 
     protected void testRecovery(final byte[] logData, Class<?> expected) throws Exception
     {
+        ParameterizedClass commitLogCompression = DatabaseDescriptor.getCommitLogCompression();
+        EncryptionContext encryptionContext = DatabaseDescriptor.getEncryptionContext();
         runExpecting(() -> testRecovery(logData, CommitLogDescriptor.VERSION_20), expected);
-        runExpecting(() -> testRecovery(new CommitLogDescriptor(4, null), logData), expected);
+        runExpecting(() -> testRecovery(new CommitLogDescriptor(4, commitLogCompression, encryptionContext), logData), expected);
+    }
+
+    protected void testRecovery(byte[] logData) throws Exception
+    {
+        Pair<File, Integer> pair = tmpFile();
+        try (RandomAccessFile raf = new RandomAccessFile(pair.left, "rw"))
+        {
+            raf.seek(pair.right);
+            raf.write(logData);
+            raf.close();
+
+            CommitLog.instance.recoverFiles(pair.left); // CASSANDRA-1119 / CASSANDRA-1179: throw on failure
+        }
     }
 
     @Test
@@ -473,11 +528,11 @@
         boolean originalState = DatabaseDescriptor.isAutoSnapshot();
         try
         {
-            CommitLog.instance.resetUnsafe(true);
             boolean prev = DatabaseDescriptor.isAutoSnapshot();
             DatabaseDescriptor.setAutoSnapshot(false);
-            ColumnFamilyStore cfs1 = Keyspace.open(KEYSPACE1).getColumnFamilyStore(STANDARD1);
-            ColumnFamilyStore cfs2 = Keyspace.open(KEYSPACE1).getColumnFamilyStore(STANDARD2);
+            Keyspace ks = Keyspace.open(KEYSPACE1);
+            ColumnFamilyStore cfs1 = ks.getColumnFamilyStore(STANDARD1);
+            ColumnFamilyStore cfs2 = ks.getColumnFamilyStore(STANDARD2);
 
             new RowUpdateBuilder(cfs1.metadata, 0, "k").clustering("bytes").add("val", ByteBuffer.allocate(100)).build().applyUnsafe();
             cfs1.truncateBlocking();
@@ -490,13 +545,13 @@
             for (int i = 0 ; i < 5 ; i++)
                 CommitLog.instance.add(m2);
 
-            assertEquals(2, CommitLog.instance.activeSegments());
-            ReplayPosition position = CommitLog.instance.getContext();
-            for (Keyspace ks : Keyspace.system())
-                for (ColumnFamilyStore syscfs : ks.getColumnFamilyStores())
+            assertEquals(2, CommitLog.instance.segmentManager.getActiveSegments().size());
+            CommitLogPosition position = CommitLog.instance.getCurrentPosition();
+            for (Keyspace keyspace : Keyspace.system())
+                for (ColumnFamilyStore syscfs : keyspace.getColumnFamilyStores())
                     CommitLog.instance.discardCompletedSegments(syscfs.metadata.cfId, position);
             CommitLog.instance.discardCompletedSegments(cfs2.metadata.cfId, position);
-            assertEquals(1, CommitLog.instance.activeSegments());
+            assertEquals(1, CommitLog.instance.segmentManager.getActiveSegments().size());
         }
         finally
         {
@@ -516,12 +571,12 @@
 
             ColumnFamilyStore cfs = notDurableKs.getColumnFamilyStore("Standard1");
             new RowUpdateBuilder(cfs.metadata, 0, "key1")
-                .clustering("bytes").add("val", ByteBufferUtil.bytes("abcd"))
-                .build()
-                .applyUnsafe();
+            .clustering("bytes").add("val", bytes("abcd"))
+            .build()
+            .applyUnsafe();
 
             assertTrue(Util.getOnlyRow(Util.cmd(cfs).columns("val").build())
-                            .cells().iterator().next().value().equals(ByteBufferUtil.bytes("abcd")));
+                           .cells().iterator().next().value().equals(bytes("abcd")));
 
             cfs.truncateBlocking();
 
@@ -532,5 +587,110 @@
             DatabaseDescriptor.setAutoSnapshot(originalState);
         }
     }
+
+    @Test
+    public void replaySimple() throws IOException
+    {
+        int cellCount = 0;
+        ColumnFamilyStore cfs = Keyspace.open(KEYSPACE1).getColumnFamilyStore(STANDARD1);
+        final Mutation rm1 = new RowUpdateBuilder(cfs.metadata, 0, "k1")
+                             .clustering("bytes")
+                             .add("val", bytes("this is a string"))
+                             .build();
+        cellCount += 1;
+        CommitLog.instance.add(rm1);
+
+        final Mutation rm2 = new RowUpdateBuilder(cfs.metadata, 0, "k2")
+                             .clustering("bytes")
+                             .add("val", bytes("this is a string"))
+                             .build();
+        cellCount += 1;
+        CommitLog.instance.add(rm2);
+
+        CommitLog.instance.sync(true);
+
+        SimpleCountingReplayer replayer = new SimpleCountingReplayer(CommitLog.instance, CommitLogPosition.NONE, cfs.metadata);
+        List<String> activeSegments = CommitLog.instance.getActiveSegmentNames();
+        Assert.assertFalse(activeSegments.isEmpty());
+
+        File[] files = new File(CommitLog.instance.segmentManager.storageDirectory).listFiles((file, name) -> activeSegments.contains(name));
+        replayer.replayFiles(files);
+
+        assertEquals(cellCount, replayer.cells);
+    }
+
+    @Test
+    public void replayWithDiscard() throws IOException
+    {
+        int cellCount = 0;
+        int max = 1024;
+        int discardPosition = (int)(max * .8); // an arbitrary number of entries that we'll skip on the replay
+        CommitLogPosition commitLogPosition = null;
+        ColumnFamilyStore cfs = Keyspace.open(KEYSPACE1).getColumnFamilyStore(STANDARD1);
+
+        for (int i = 0; i < max; i++)
+        {
+            final Mutation rm1 = new RowUpdateBuilder(cfs.metadata, 0, "k" + 1)
+                                 .clustering("bytes")
+                                 .add("val", bytes("this is a string"))
+                                 .build();
+            CommitLogPosition position = CommitLog.instance.add(rm1);
+
+            if (i == discardPosition)
+                commitLogPosition = position;
+            if (i > discardPosition)
+            {
+                cellCount += 1;
+            }
+        }
+
+        CommitLog.instance.sync(true);
+
+        SimpleCountingReplayer replayer = new SimpleCountingReplayer(CommitLog.instance, commitLogPosition, cfs.metadata);
+        List<String> activeSegments = CommitLog.instance.getActiveSegmentNames();
+        Assert.assertFalse(activeSegments.isEmpty());
+
+        File[] files = new File(CommitLog.instance.segmentManager.storageDirectory).listFiles((file, name) -> activeSegments.contains(name));
+        replayer.replayFiles(files);
+
+        assertEquals(cellCount, replayer.cells);
+    }
+
+    class SimpleCountingReplayer extends CommitLogReplayer
+    {
+        private final CommitLogPosition filterPosition;
+        private final CFMetaData metadata;
+        int cells;
+        int skipped;
+
+        SimpleCountingReplayer(CommitLog commitLog, CommitLogPosition filterPosition, CFMetaData cfm)
+        {
+            super(commitLog, filterPosition, Collections.emptyMap(), ReplayFilter.create());
+            this.filterPosition = filterPosition;
+            this.metadata = cfm;
+        }
+
+        @SuppressWarnings("resource")
+        @Override
+        public void handleMutation(Mutation m, int size, int entryLocation, CommitLogDescriptor desc)
+        {
+            if (entryLocation <= filterPosition.position)
+            {
+                // Skip over this mutation.
+                skipped++;
+                return;
+            }
+            for (PartitionUpdate partitionUpdate : m.getPartitionUpdates())
+            {
+                // Only process mutations for the CFs we're testing against, since we can't deterministically predict
+                // whether system keyspaces will be mutated during a test.
+                if (partitionUpdate.metadata().cfName.equals(metadata.cfName))
+                {
+                    for (Row row : partitionUpdate)
+                        cells += Iterables.size(row.cells());
+                }
+            }
+        }
+    }
 }
 
diff --git a/test/unit/org/apache/cassandra/db/commitlog/CommitLogTestReplayer.java b/test/unit/org/apache/cassandra/db/commitlog/CommitLogTestReplayer.java
index e690785..9a22b04 100644
--- a/test/unit/org/apache/cassandra/db/commitlog/CommitLogTestReplayer.java
+++ b/test/unit/org/apache/cassandra/db/commitlog/CommitLogTestReplayer.java
@@ -22,13 +22,12 @@
 import java.io.IOException;
 
 import com.google.common.base.Predicate;
-
 import org.junit.Assert;
+
 import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.db.Mutation;
 import org.apache.cassandra.db.rows.SerializationHelper;
 import org.apache.cassandra.io.util.DataInputBuffer;
-import org.apache.cassandra.io.util.NIODataInputStream;
 import org.apache.cassandra.io.util.RebufferingInputStream;
 
 /**
@@ -36,44 +35,44 @@
  */
 public class CommitLogTestReplayer extends CommitLogReplayer
 {
-    public static void examineCommitLog(Predicate<Mutation> processor) throws IOException
+    private final Predicate<Mutation> processor;
+
+    public CommitLogTestReplayer(Predicate<Mutation> processor) throws IOException
     {
+        super(CommitLog.instance, CommitLogPosition.NONE, null, ReplayFilter.create());
         CommitLog.instance.sync(true);
 
-        CommitLogTestReplayer replayer = new CommitLogTestReplayer(CommitLog.instance, processor);
-        File commitLogDir = new File(DatabaseDescriptor.getCommitLogLocation());
-        replayer.recover(commitLogDir.listFiles());
-    }
-
-    final private Predicate<Mutation> processor;
-
-    public CommitLogTestReplayer(CommitLog log, Predicate<Mutation> processor)
-    {
-        this(log, ReplayPosition.NONE, processor);
-    }
-
-    public CommitLogTestReplayer(CommitLog log, ReplayPosition discardedPos, Predicate<Mutation> processor)
-    {
-        super(log, discardedPos, null, ReplayFilter.create());
         this.processor = processor;
+        commitLogReader = new CommitLogTestReader();
     }
 
-    @Override
-    void replayMutation(byte[] inputBuffer, int size, final int entryLocation, final CommitLogDescriptor desc)
+    public void examineCommitLog() throws IOException
     {
-        RebufferingInputStream bufIn = new DataInputBuffer(inputBuffer, 0, size);
-        Mutation mutation;
-        try
+        replayFiles(new File(DatabaseDescriptor.getCommitLogLocation()).listFiles());
+    }
+
+    private class CommitLogTestReader extends CommitLogReader
+    {
+        @Override
+        protected void readMutation(CommitLogReadHandler handler,
+                                    byte[] inputBuffer,
+                                    int size,
+                                    CommitLogPosition minPosition,
+                                    final int entryLocation,
+                                    final CommitLogDescriptor desc) throws IOException
         {
-            mutation = Mutation.serializer.deserialize(bufIn,
-                                                           desc.getMessagingVersion(),
-                                                           SerializationHelper.Flag.LOCAL);
-            Assert.assertTrue(processor.apply(mutation));
-        }
-        catch (IOException e)
-        {
-            // Test fails.
-            throw new AssertionError(e);
+            RebufferingInputStream bufIn = new DataInputBuffer(inputBuffer, 0, size);
+            Mutation mutation;
+            try
+            {
+                mutation = Mutation.serializer.deserialize(bufIn, desc.getMessagingVersion(), SerializationHelper.Flag.LOCAL);
+                Assert.assertTrue(processor.apply(mutation));
+            }
+            catch (IOException e)
+            {
+                // Test fails.
+                throw new AssertionError(e);
+            }
         }
     }
 }
diff --git a/test/unit/org/apache/cassandra/db/commitlog/CommitLogUpgradeTest.java b/test/unit/org/apache/cassandra/db/commitlog/CommitLogUpgradeTest.java
index 00a143b..90e4ffc 100644
--- a/test/unit/org/apache/cassandra/db/commitlog/CommitLogUpgradeTest.java
+++ b/test/unit/org/apache/cassandra/db/commitlog/CommitLogUpgradeTest.java
@@ -37,6 +37,7 @@
 
 import org.apache.cassandra.SchemaLoader;
 import org.apache.cassandra.config.CFMetaData;
+import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.config.Schema;
 import org.apache.cassandra.db.Mutation;
 import org.apache.cassandra.db.rows.Cell;
@@ -45,10 +46,15 @@
 import org.apache.cassandra.db.marshal.BytesType;
 import org.apache.cassandra.db.partitions.PartitionUpdate;
 import org.apache.cassandra.schema.KeyspaceParams;
+import org.apache.cassandra.security.EncryptionContextGenerator;
 import org.apache.cassandra.utils.JVMStabilityInspector;
 import org.apache.cassandra.utils.KillerForTests;
 import org.apache.cassandra.db.commitlog.CommitLogReplayer.CommitLogReplayException;
 
+/**
+ * Note: if you are looking to create new test cases for this test, check out
+ * {@link CommitLogUpgradeTestMaker}
+ */
 public class CommitLogUpgradeTest
 {
     static final String DATA_DIR = "test/data/legacy-commitlog/";
@@ -65,6 +71,13 @@
     private KillerForTests killerForTests;
     private boolean shouldBeKilled = false;
 
+    static CFMetaData metadata = CFMetaData.Builder.createDense(KEYSPACE, TABLE, false, false)
+                                                   .addPartitionKey("key", AsciiType.instance)
+                                                   .addClusteringColumn("col", AsciiType.instance)
+                                                   .addRegularColumn("val", BytesType.instance)
+                                                   .build()
+                                                   .compression(SchemaLoader.getCompressionParameters());
+
     @Before
     public void prepareToBeKilled()
     {
@@ -92,7 +105,6 @@
     }
 
     @Test
-
     public void test22() throws Exception
     {
         testRestore(DATA_DIR + "2.2");
@@ -125,10 +137,13 @@
     @Test
     public void test22_bitrot_ignored() throws Exception
     {
-        try {
+        try
+        {
             System.setProperty(CommitLogReplayer.IGNORE_REPLAY_ERRORS_PROPERTY, "true");
             testRestore(DATA_DIR + "2.2-lz4-bitrot");
-        } finally {
+        }
+        finally
+        {
             System.clearProperty(CommitLogReplayer.IGNORE_REPLAY_ERRORS_PROPERTY);
         }
     }
@@ -143,27 +158,31 @@
     @Test
     public void test22_bitrot2_ignored() throws Exception
     {
-        try {
+        try
+        {
             System.setProperty(CommitLogReplayer.IGNORE_REPLAY_ERRORS_PROPERTY, "true");
             testRestore(DATA_DIR + "2.2-lz4-bitrot2");
-        } finally {
+        }
+        finally
+        {
             System.clearProperty(CommitLogReplayer.IGNORE_REPLAY_ERRORS_PROPERTY);
         }
     }
 
-    @BeforeClass
-    static public void initialize() throws FileNotFoundException, IOException, InterruptedException
+    @Test
+    public void test34_encrypted() throws Exception
     {
-        CFMetaData metadata = CFMetaData.Builder.createDense(KEYSPACE, TABLE, false, false)
-                                                .addPartitionKey("key", AsciiType.instance)
-                                                .addClusteringColumn("col", AsciiType.instance)
-                                                .addRegularColumn("val", BytesType.instance)
-                                                .build()
-                                                .compression(SchemaLoader.getCompressionParameters());
+        testRestore(DATA_DIR + "3.4-encrypted");
+    }
+
+    @BeforeClass
+    public static void initialize()
+    {
         SchemaLoader.loadSchema();
         SchemaLoader.createKeyspace(KEYSPACE,
                                     KeyspaceParams.simple(1),
                                     metadata);
+        DatabaseDescriptor.setEncryptionContext(EncryptionContextGenerator.createContext(true));
     }
 
     public void testRestore(String location) throws IOException, InterruptedException
@@ -186,9 +205,9 @@
         }
 
         Hasher hasher = new Hasher();
-        CommitLogTestReplayer replayer = new CommitLogTestReplayer(CommitLog.instance, hasher);
+        CommitLogTestReplayer replayer = new CommitLogTestReplayer(hasher);
         File[] files = new File(location).listFiles((file, name) -> name.endsWith(".log"));
-        replayer.recover(files);
+        replayer.replayFiles(files);
 
         Assert.assertEquals(cells, hasher.cells);
         Assert.assertEquals(hash, hasher.hash);
diff --git a/test/unit/org/apache/cassandra/db/commitlog/CommitLogUpgradeTestMaker.java b/test/unit/org/apache/cassandra/db/commitlog/CommitLogUpgradeTestMaker.java
index 3538bd1..5a03f9f 100644
--- a/test/unit/org/apache/cassandra/db/commitlog/CommitLogUpgradeTestMaker.java
+++ b/test/unit/org/apache/cassandra/db/commitlog/CommitLogUpgradeTestMaker.java
@@ -42,6 +42,7 @@
 import org.apache.cassandra.db.Mutation;
 import org.apache.cassandra.exceptions.ConfigurationException;
 import org.apache.cassandra.io.util.FileUtils;
+import org.apache.cassandra.schema.KeyspaceParams;
 import org.apache.cassandra.utils.FBUtilities;
 
 import static org.apache.cassandra.db.commitlog.CommitLogUpgradeTest.*;
@@ -91,17 +92,20 @@
         }
 
         SchemaLoader.loadSchema();
-        SchemaLoader.schemaDefinition("");
+        SchemaLoader.createKeyspace(KEYSPACE,
+                                    KeyspaceParams.simple(1),
+                                    metadata);
     }
 
     public void makeLog() throws IOException, InterruptedException
     {
         CommitLog commitLog = CommitLog.instance;
-        System.out.format("\nUsing commit log size %dmb, compressor %s, sync %s%s\n",
+        System.out.format("\nUsing commit log size: %dmb, compressor: %s, encryption: %s, sync: %s, %s\n",
                           mb(DatabaseDescriptor.getCommitLogSegmentSize()),
                           commitLog.configuration.getCompressorName(),
+                          commitLog.configuration.useEncryption(),
                           commitLog.executor.getClass().getSimpleName(),
-                          randomSize ? " random size" : "");
+                          randomSize ? "random size" : "");
         final List<CommitlogExecutor> threads = new ArrayList<>();
         ScheduledExecutorService scheduled = startThreads(commitLog, threads);
 
@@ -215,7 +219,7 @@
         int dataSize = 0;
         final CommitLog commitLog;
 
-        volatile ReplayPosition rp;
+        volatile CommitLogPosition clsp;
 
         public CommitlogExecutor(CommitLog commitLog)
         {
@@ -230,7 +234,6 @@
             {
                 if (rl != null)
                     rl.acquire();
-                String ks = KEYSPACE;
                 ByteBuffer key = randomBytes(16, tlr);
 
                 UpdateBuilder builder = UpdateBuilder.create(Schema.instance.getCFMetaData(KEYSPACE, TABLE), Util.dk(key));
@@ -245,7 +248,7 @@
                     dataSize += sz;
                 }
 
-                rp = commitLog.add((Mutation)builder.makeMutation());
+                clsp = commitLog.add((Mutation)builder.makeMutation());
                 counter.incrementAndGet();
             }
         }
diff --git a/test/unit/org/apache/cassandra/db/commitlog/SegmentReaderTest.java b/test/unit/org/apache/cassandra/db/commitlog/SegmentReaderTest.java
new file mode 100644
index 0000000..88300a1
--- /dev/null
+++ b/test/unit/org/apache/cassandra/db/commitlog/SegmentReaderTest.java
@@ -0,0 +1,196 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.db.commitlog;
+
+import java.io.DataInput;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.io.RandomAccessFile;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.util.Collections;
+import java.util.Random;
+import java.util.function.BiFunction;
+
+import javax.crypto.Cipher;
+
+import org.junit.Assert;
+import org.junit.Test;
+
+import org.apache.cassandra.db.commitlog.CommitLogSegmentReader.CompressedSegmenter;
+import org.apache.cassandra.db.commitlog.CommitLogSegmentReader.EncryptedSegmenter;
+import org.apache.cassandra.db.commitlog.CommitLogSegmentReader.SyncSegment;
+import org.apache.cassandra.io.compress.DeflateCompressor;
+import org.apache.cassandra.io.compress.ICompressor;
+import org.apache.cassandra.io.compress.LZ4Compressor;
+import org.apache.cassandra.io.compress.SnappyCompressor;
+import org.apache.cassandra.io.util.FileDataInput;
+import org.apache.cassandra.io.util.RandomAccessReader;
+import org.apache.cassandra.security.CipherFactory;
+import org.apache.cassandra.security.EncryptionUtils;
+import org.apache.cassandra.security.EncryptionContext;
+import org.apache.cassandra.security.EncryptionContextGenerator;
+import org.apache.cassandra.utils.ByteBufferUtil;
+
+public class SegmentReaderTest
+{
+    static final Random random = new Random();
+
+    @Test
+    public void compressedSegmenter_LZ4() throws IOException
+    {
+        compressedSegmenter(LZ4Compressor.create(Collections.emptyMap()));
+    }
+
+    @Test
+    public void compressedSegmenter_Snappy() throws IOException
+    {
+        compressedSegmenter(SnappyCompressor.create(null));
+    }
+
+    @Test
+    public void compressedSegmenter_Deflate() throws IOException
+    {
+        compressedSegmenter(DeflateCompressor.create(null));
+    }
+
+    private void compressedSegmenter(ICompressor compressor) throws IOException
+    {
+        int rawSize = (1 << 15) - 137;
+        ByteBuffer plainTextBuffer = compressor.preferredBufferType().allocate(rawSize);
+        byte[] b = new byte[rawSize];
+        random.nextBytes(b);
+        plainTextBuffer.put(b);
+        plainTextBuffer.flip();
+
+        int uncompressedHeaderSize = 4;  // the plain-text size is prepended to the block we write out
+        int length = compressor.initialCompressedBufferLength(rawSize);
+        ByteBuffer compBuffer = ByteBufferUtil.ensureCapacity(null, length + uncompressedHeaderSize, true, compressor.preferredBufferType());
+        compBuffer.putInt(rawSize);
+        compressor.compress(plainTextBuffer, compBuffer);
+        compBuffer.flip();
+
+        File compressedFile = File.createTempFile("compressed-segment-", ".log");
+        compressedFile.deleteOnExit();
+        FileOutputStream fos = new FileOutputStream(compressedFile);
+        fos.getChannel().write(compBuffer);
+        fos.close();
+
+        try (RandomAccessReader reader = RandomAccessReader.open(compressedFile))
+        {
+            CompressedSegmenter segmenter = new CompressedSegmenter(compressor, reader);
+            int fileLength = (int) compressedFile.length();
+            SyncSegment syncSegment = segmenter.nextSegment(0, fileLength);
+            FileDataInput fileDataInput = syncSegment.input;
+            ByteBuffer fileBuffer = readBytes(fileDataInput, rawSize);
+
+            plainTextBuffer.flip();
+            Assert.assertEquals(plainTextBuffer, fileBuffer);
+
+            // CompressedSegmenter includes the sync header length in the syncSegment.endPosition value
+            Assert.assertEquals(rawSize, syncSegment.endPosition - CommitLogSegment.SYNC_MARKER_SIZE);
+        }
+    }
+
+    private ByteBuffer readBytes(FileDataInput input, int len)
+    {
+        byte[] buf = new byte[len];
+        try
+        {
+            input.readFully(buf);
+        }
+        catch (IOException e)
+        {
+            throw new RuntimeException(e);
+        }
+        return ByteBuffer.wrap(buf);
+    }
+
+    private ByteBuffer readBytesSeek(FileDataInput input, int len)
+    {
+        byte[] buf = new byte[len];
+
+        // divide the output buffer into 5 chunks
+        int[] offsets = new int[] { 0, len / 5, 2 * len / 5, 3 * len / 5, 4 * len / 5, len };
+
+        // remember the starting file position so seeks are relative to it
+        long inputStart = input.getFilePointer();
+
+        for (int i = 0; i < offsets.length - 1; i++)
+        {
+            try
+            {
+                // seek to the beginning of this chunk
+                input.seek(inputStart + offsets[i]);
+                // read this chunk of the output
+                input.readFully(buf, offsets[i], offsets[i + 1] - offsets[i]);
+            }
+            catch (IOException e)
+            {
+                throw new RuntimeException(e);
+            }
+        }
+        return ByteBuffer.wrap(buf);
+    }
+
+    @Test
+    public void encryptedSegmenterRead() throws IOException
+    {
+        underlyingEncryptedSegmenterTest((s, t) -> readBytes(s, t));
+    }
+
+    @Test
+    public void encryptedSegmenterSeek() throws IOException
+    {
+        underlyingEncryptedSegmenterTest((s, t) -> readBytesSeek(s, t));
+    }
+
+    public void underlyingEncryptedSegmenterTest(BiFunction<FileDataInput, Integer, ByteBuffer> readFun)
+            throws IOException
+    {
+        EncryptionContext context = EncryptionContextGenerator.createContext(true);
+        CipherFactory cipherFactory = new CipherFactory(context.getTransparentDataEncryptionOptions());
+
+        int plainTextLength = (1 << 13) - 137;
+        ByteBuffer plainTextBuffer = ByteBuffer.allocate(plainTextLength);
+        random.nextBytes(plainTextBuffer.array());
+
+        ByteBuffer compressedBuffer = EncryptionUtils.compress(plainTextBuffer, null, true, context.getCompressor());
+        Cipher cipher = cipherFactory.getEncryptor(context.getTransparentDataEncryptionOptions().cipher, context.getTransparentDataEncryptionOptions().key_alias);
+        File encryptedFile = File.createTempFile("encrypted-segment-", ".log");
+        encryptedFile.deleteOnExit();
+        FileChannel channel = new RandomAccessFile(encryptedFile, "rw").getChannel();
+        channel.write(ByteBufferUtil.bytes(plainTextLength));
+        EncryptionUtils.encryptAndWrite(compressedBuffer, channel, true, cipher);
+        channel.close();
+
+        try (RandomAccessReader reader = RandomAccessReader.open(encryptedFile))
+        {
+            context = EncryptionContextGenerator.createContext(cipher.getIV(), true);
+            EncryptedSegmenter segmenter = new EncryptedSegmenter(reader, context);
+            SyncSegment syncSegment = segmenter.nextSegment(0, (int) reader.length());
+
+            // EncryptedSegmenter includes the sync header length in the syncSegment.endPosition value
+            Assert.assertEquals(plainTextLength, syncSegment.endPosition - CommitLogSegment.SYNC_MARKER_SIZE);
+            ByteBuffer fileBuffer = readFun.apply(syncSegment.input, plainTextLength);
+            plainTextBuffer.position(0);
+            Assert.assertEquals(plainTextBuffer, fileBuffer);
+        }
+    }
+}
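Both segmenter tests above write the same on-disk layout (a 4-byte plain-text length followed by the compressed or encrypted payload) and assert that the segmenter reports the sync-marker header as part of endPosition. A tiny sketch of that bookkeeping follows; the 8-byte marker size is stated here only as an assumption for illustration, not a value defined by this patch.

    // Sketch of the layout and endPosition arithmetic asserted by the tests above.
    public final class SegmentLayoutSketch
    {
        static final int ASSUMED_SYNC_MARKER_SIZE = 8; // assumption for this sketch

        public static void main(String[] args)
        {
            int rawSize = (1 << 15) - 137;   // uncompressed payload length written by the test
            // file = [4-byte rawSize][transformed payload]; the segmenter reports
            // endPosition = rawSize + sync marker size, so the tests assert
            // rawSize == syncSegment.endPosition - SYNC_MARKER_SIZE.
            int reportedEndPosition = rawSize + ASSUMED_SYNC_MARKER_SIZE;
            assert rawSize == reportedEndPosition - ASSUMED_SYNC_MARKER_SIZE;
        }
    }
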
diff --git a/test/unit/org/apache/cassandra/db/compaction/BlacklistingCompactionsTest.java b/test/unit/org/apache/cassandra/db/compaction/BlacklistingCompactionsTest.java
index df2d8a9..54579fb 100644
--- a/test/unit/org/apache/cassandra/db/compaction/BlacklistingCompactionsTest.java
+++ b/test/unit/org/apache/cassandra/db/compaction/BlacklistingCompactionsTest.java
@@ -37,6 +37,7 @@
 
 import org.apache.cassandra.SchemaLoader;
 import org.apache.cassandra.Util;
+import org.apache.cassandra.cache.ChunkCache;
 import org.apache.cassandra.config.*;
 import org.apache.cassandra.db.*;
 import org.apache.cassandra.exceptions.ConfigurationException;
@@ -172,6 +173,8 @@
                 byte[] corruption = new byte[corruptionSize];
                 random.nextBytes(corruption);
                 raf.write(corruption);
+                if (ChunkCache.instance != null)
+                    ChunkCache.instance.invalidateFile(sstable.getFilename());
 
             }
             finally
diff --git a/test/unit/org/apache/cassandra/db/compaction/CompactionsCQLTest.java b/test/unit/org/apache/cassandra/db/compaction/CompactionsCQLTest.java
index afbfee1..ca206b3 100644
--- a/test/unit/org/apache/cassandra/db/compaction/CompactionsCQLTest.java
+++ b/test/unit/org/apache/cassandra/db/compaction/CompactionsCQLTest.java
@@ -18,6 +18,7 @@
 package org.apache.cassandra.db.compaction;
 
 import java.util.HashMap;
+import java.util.List;
 import java.util.Map;
 
 import org.junit.Test;
@@ -217,9 +218,9 @@
     public boolean verifyStrategies(CompactionStrategyManager manager, Class<? extends AbstractCompactionStrategy> expected)
     {
         boolean found = false;
-        for (AbstractCompactionStrategy actualStrategy : manager.getStrategies())
+        for (List<AbstractCompactionStrategy> strategies : manager.getStrategies())
         {
-            if (!actualStrategy.getClass().equals(expected))
+            if (!strategies.stream().allMatch((strategy) -> strategy.getClass().equals(expected)))
                 return false;
             found = true;
         }
diff --git a/test/unit/org/apache/cassandra/db/compaction/CompactionsPurgeTest.java b/test/unit/org/apache/cassandra/db/compaction/CompactionsPurgeTest.java
index 26d53ed..ef26b35 100644
--- a/test/unit/org/apache/cassandra/db/compaction/CompactionsPurgeTest.java
+++ b/test/unit/org/apache/cassandra/db/compaction/CompactionsPurgeTest.java
@@ -19,6 +19,7 @@
 package org.apache.cassandra.db.compaction;
 
 import java.util.Collection;
+import java.util.List;
 import java.util.concurrent.ExecutionException;
 
 import org.junit.BeforeClass;
@@ -167,7 +168,9 @@
                 .build().applyUnsafe();
 
         cfs.forceBlockingFlush();
-        cfs.getCompactionStrategyManager().getUserDefinedTask(sstablesIncomplete, Integer.MAX_VALUE).execute(null);
+        List<AbstractCompactionTask> tasks = cfs.getCompactionStrategyManager().getUserDefinedTasks(sstablesIncomplete, Integer.MAX_VALUE);
+        assertEquals(1, tasks.size());
+        tasks.get(0).execute(null);
 
         // verify that minor compaction does GC when key is provably not
         // present in a non-compacted sstable
@@ -215,7 +218,9 @@
         cfs.forceBlockingFlush();
 
         // compact the sstables with the c1/c2 data and the c1 tombstone
-        cfs.getCompactionStrategyManager().getUserDefinedTask(sstablesIncomplete, Integer.MAX_VALUE).execute(null);
+        List<AbstractCompactionTask> tasks = cfs.getCompactionStrategyManager().getUserDefinedTasks(sstablesIncomplete, Integer.MAX_VALUE);
+        assertEquals(1, tasks.size());
+        tasks.get(0).execute(null);
 
         // We should have both the c1 and c2 tombstones still. Since the min timestamp in the c2 tombstone
         // sstable is older than the c1 tombstone, it is invalid to throw out the c1 tombstone.
diff --git a/test/unit/org/apache/cassandra/db/compaction/LeveledCompactionStrategyTest.java b/test/unit/org/apache/cassandra/db/compaction/LeveledCompactionStrategyTest.java
index 1277209..bd964ed 100644
--- a/test/unit/org/apache/cassandra/db/compaction/LeveledCompactionStrategyTest.java
+++ b/test/unit/org/apache/cassandra/db/compaction/LeveledCompactionStrategyTest.java
@@ -56,6 +56,7 @@
 import org.apache.cassandra.service.ActiveRepairService;
 import org.apache.cassandra.utils.FBUtilities;
 
+import static java.util.Collections.singleton;
 import static org.junit.Assert.assertEquals;
 import static org.junit.Assert.assertFalse;
 import static org.junit.Assert.assertTrue;
@@ -122,10 +123,11 @@
         }
 
         waitForLeveling(cfs);
-        CompactionStrategyManager strategy =  cfs.getCompactionStrategyManager();
+        CompactionStrategyManager strategyManager = cfs.getCompactionStrategyManager();
         // Checking we're not completely bad at math
-        int l1Count = strategy.getSSTableCountPerLevel()[1];
-        int l2Count = strategy.getSSTableCountPerLevel()[2];
+
+        int l1Count = strategyManager.getSSTableCountPerLevel()[1];
+        int l2Count = strategyManager.getSSTableCountPerLevel()[2];
         if (l1Count == 0 || l2Count == 0)
         {
             logger.error("L1 or L2 has 0 sstables. Expected > 0 on both.");
@@ -177,10 +179,10 @@
         }
 
         waitForLeveling(cfs);
-        CompactionStrategyManager strategy =  cfs.getCompactionStrategyManager();
+        CompactionStrategyManager strategyManager = cfs.getCompactionStrategyManager();
         // Checking we're not completely bad at math
-        assertTrue(strategy.getSSTableCountPerLevel()[1] > 0);
-        assertTrue(strategy.getSSTableCountPerLevel()[2] > 0);
+        assertTrue(strategyManager.getSSTableCountPerLevel()[1] > 0);
+        assertTrue(strategyManager.getSSTableCountPerLevel()[2] > 0);
 
         Range<Token> range = new Range<>(Util.token(""), Util.token(""));
         int gcBefore = keyspace.getColumnFamilyStore(CF_STANDARDDLEVELED).gcBefore(FBUtilities.nowInSeconds());
@@ -194,7 +196,7 @@
     /**
      * wait for leveled compaction to quiesce on the given columnfamily
      */
-    private void waitForLeveling(ColumnFamilyStore cfs) throws InterruptedException
+    public static void waitForLeveling(ColumnFamilyStore cfs) throws InterruptedException
     {
         CompactionStrategyManager strategyManager = cfs.getCompactionStrategyManager();
         while (true)
@@ -204,16 +206,19 @@
             // so it should be good enough
             boolean allL0Empty = true;
             boolean anyL1NonEmpty = false;
-            for (AbstractCompactionStrategy strategy : strategyManager.getStrategies())
+            for (List<AbstractCompactionStrategy> strategies : strategyManager.getStrategies())
             {
-                if (!(strategy instanceof LeveledCompactionStrategy))
-                    return;
-                // note that we check > 1 here, if there is too little data in L0, we don't compact it up to L1
-                if (((LeveledCompactionStrategy)strategy).getLevelSize(0) > 1)
-                    allL0Empty = false;
-                for (int i = 1; i < 5; i++)
-                    if (((LeveledCompactionStrategy)strategy).getLevelSize(i) > 0)
-                        anyL1NonEmpty = true;
+                for (AbstractCompactionStrategy strategy : strategies)
+                {
+                    if (!(strategy instanceof LeveledCompactionStrategy))
+                        return;
+                    // note that we check > 1 here, if there is too little data in L0, we don't compact it up to L1
+                    if (((LeveledCompactionStrategy)strategy).getLevelSize(0) > 1)
+                        allL0Empty = false;
+                    for (int i = 1; i < 5; i++)
+                        if (((LeveledCompactionStrategy)strategy).getLevelSize(i) > 0)
+                            anyL1NonEmpty = true;
+                }
             }
             if (allL0Empty && anyL1NonEmpty)
                 return;
@@ -240,7 +245,7 @@
         }
 
         waitForLeveling(cfs);
-        LeveledCompactionStrategy strategy = (LeveledCompactionStrategy) (cfs.getCompactionStrategyManager()).getStrategies().get(1);
+        LeveledCompactionStrategy strategy = (LeveledCompactionStrategy) cfs.getCompactionStrategyManager().getStrategies().get(1).get(0);
         assert strategy.getLevelSize(1) > 0;
 
         // get LeveledScanner for level 1 sstables
@@ -276,7 +281,7 @@
             cfs.forceBlockingFlush();
         }
         cfs.forceBlockingFlush();
-        LeveledCompactionStrategy strategy = (LeveledCompactionStrategy) ( cfs.getCompactionStrategyManager()).getStrategies().get(1);
+        LeveledCompactionStrategy strategy = (LeveledCompactionStrategy) cfs.getCompactionStrategyManager().getStrategies().get(1).get(0);
         cfs.forceMajorCompaction();
 
         for (SSTableReader s : cfs.getLiveSSTables())
@@ -322,14 +327,14 @@
         while(CompactionManager.instance.isCompacting(Arrays.asList(cfs)))
             Thread.sleep(100);
 
-        CompactionStrategyManager strategy =  cfs.getCompactionStrategyManager();
-        List<AbstractCompactionStrategy> strategies = strategy.getStrategies();
-        LeveledCompactionStrategy repaired = (LeveledCompactionStrategy) strategies.get(0);
-        LeveledCompactionStrategy unrepaired = (LeveledCompactionStrategy) strategies.get(1);
+        CompactionStrategyManager manager = cfs.getCompactionStrategyManager();
+        List<List<AbstractCompactionStrategy>> strategies = manager.getStrategies();
+        LeveledCompactionStrategy repaired = (LeveledCompactionStrategy) strategies.get(0).get(0);
+        LeveledCompactionStrategy unrepaired = (LeveledCompactionStrategy) strategies.get(1).get(0);
         assertEquals(0, repaired.manifest.getLevelCount() );
         assertEquals(2, unrepaired.manifest.getLevelCount());
-        assertTrue(strategy.getSSTableCountPerLevel()[1] > 0);
-        assertTrue(strategy.getSSTableCountPerLevel()[2] > 0);
+        assertTrue(manager.getSSTableCountPerLevel()[1] > 0);
+        assertTrue(manager.getSSTableCountPerLevel()[2] > 0);
 
         for (SSTableReader sstable : cfs.getLiveSSTables())
             assertFalse(sstable.isRepaired());
@@ -347,7 +352,7 @@
         sstable1.reloadSSTableMetadata();
         assertTrue(sstable1.isRepaired());
 
-        strategy.handleNotification(new SSTableRepairStatusChanged(Arrays.asList(sstable1)), this);
+        manager.handleNotification(new SSTableRepairStatusChanged(Arrays.asList(sstable1)), this);
 
         int repairedSSTableCount = 0;
         for (List<SSTableReader> level : repaired.manifest.generations)
@@ -359,7 +364,7 @@
         assertFalse(unrepaired.manifest.generations[2].contains(sstable1));
 
         unrepaired.removeSSTable(sstable2);
-        strategy.handleNotification(new SSTableAddedNotification(Collections.singleton(sstable2)), this);
+        manager.handleNotification(new SSTableAddedNotification(singleton(sstable2)), this);
         assertTrue(unrepaired.manifest.getLevel(1).contains(sstable2));
         assertFalse(repaired.manifest.getLevel(1).contains(sstable2));
     }
diff --git a/test/unit/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyTest.java b/test/unit/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyTest.java
index 3238170..5041b31 100644
--- a/test/unit/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyTest.java
+++ b/test/unit/org/apache/cassandra/db/compaction/TimeWindowCompactionStrategyTest.java
@@ -26,9 +26,6 @@
 
 import com.google.common.collect.HashMultimap;
 import com.google.common.collect.Iterables;
-import com.google.common.collect.Iterables;
-import com.google.common.collect.Lists;
-
 
 import org.junit.BeforeClass;
 import org.junit.Test;
@@ -179,10 +176,10 @@
             Pair<Long,Long> bounds = getWindowBoundsInMillis(TimeUnit.HOURS, 1, tstamp );
             buckets.put(bounds.left, sstrs.get(i));
         }
-        List<SSTableReader> newBucket = newestBucket(buckets, 4, 32, TimeUnit.HOURS, 1, new SizeTieredCompactionStrategyOptions(), getWindowBoundsInMillis(TimeUnit.HOURS, 1, System.currentTimeMillis()).left );
+        List<SSTableReader> newBucket = newestBucket(buckets, 4, 32, new SizeTieredCompactionStrategyOptions(), getWindowBoundsInMillis(TimeUnit.HOURS, 1, System.currentTimeMillis()).left );
         assertTrue("incoming bucket should not be accepted when it has below the min threshold SSTables", newBucket.isEmpty());
 
-        newBucket = newestBucket(buckets, 2, 32, TimeUnit.HOURS, 1, new SizeTieredCompactionStrategyOptions(), getWindowBoundsInMillis(TimeUnit.HOURS, 1, System.currentTimeMillis()).left);
+        newBucket = newestBucket(buckets, 2, 32, new SizeTieredCompactionStrategyOptions(), getWindowBoundsInMillis(TimeUnit.HOURS, 1, System.currentTimeMillis()).left);
         assertTrue("incoming bucket should be accepted when it is larger than the min threshold SSTables", !newBucket.isEmpty());
 
         // And 2 into the second bucket (1 hour back)
@@ -218,7 +215,7 @@
             buckets.put(bounds.left, sstrs.get(i));
         }
 
-        newBucket = newestBucket(buckets, 4, 32, TimeUnit.DAYS, 1, new SizeTieredCompactionStrategyOptions(), getWindowBoundsInMillis(TimeUnit.HOURS, 1, System.currentTimeMillis()).left);
+        newBucket = newestBucket(buckets, 4, 32, new SizeTieredCompactionStrategyOptions(), getWindowBoundsInMillis(TimeUnit.HOURS, 1, System.currentTimeMillis()).left);
         assertEquals("new bucket should be trimmed to max threshold of 32", newBucket.size(),  32);
     }
 
diff --git a/test/unit/org/apache/cassandra/db/filter/SliceTest.java b/test/unit/org/apache/cassandra/db/filter/SliceTest.java
index 2f07a24..b0705ce 100644
--- a/test/unit/org/apache/cassandra/db/filter/SliceTest.java
+++ b/test/unit/org/apache/cassandra/db/filter/SliceTest.java
@@ -367,14 +367,14 @@
         assertSlicesNormalization(cc, slices(s(-1, 2), s(-1, 3), s(5, 9)), slices(s(-1, 3), s(5, 9)));
     }
 
-    private static Slice.Bound makeBound(ClusteringPrefix.Kind kind, Integer... components)
+    private static ClusteringBound makeBound(ClusteringPrefix.Kind kind, Integer... components)
     {
         ByteBuffer[] values = new ByteBuffer[components.length];
         for (int i = 0; i < components.length; i++)
         {
             values[i] = ByteBufferUtil.bytes(components[i]);
         }
-        return Slice.Bound.create(kind, values);
+        return ClusteringBound.create(kind, values);
     }
 
     private static List<ByteBuffer> columnNames(Integer ... components)
diff --git a/test/unit/org/apache/cassandra/db/lifecycle/LogTransactionTest.java b/test/unit/org/apache/cassandra/db/lifecycle/LogTransactionTest.java
index 0f03baf..5ed22e4 100644
--- a/test/unit/org/apache/cassandra/db/lifecycle/LogTransactionTest.java
+++ b/test/unit/org/apache/cassandra/db/lifecycle/LogTransactionTest.java
@@ -520,7 +520,7 @@
                             getTemporaryFiles(dataFolder2));
 
         // normally called at startup
-        LogTransaction.removeUnfinishedLeftovers(Arrays.asList(dataFolder1, dataFolder2));
+        assertTrue(LogTransaction.removeUnfinishedLeftovers(Arrays.asList(dataFolder1, dataFolder2)));
 
         // new tables should be only table left
         assertFiles(dataFolder1.getPath(), new HashSet<>(sstables[1].getAllFilePaths()));
@@ -571,7 +571,7 @@
                             getTemporaryFiles(dataFolder2));
 
         // normally called at startup
-        LogTransaction.removeUnfinishedLeftovers(Arrays.asList(dataFolder1, dataFolder2));
+        assertTrue(LogTransaction.removeUnfinishedLeftovers(Arrays.asList(dataFolder1, dataFolder2)));
 
         // old tables should be only table left
         assertFiles(dataFolder1.getPath(), new HashSet<>(sstables[0].getAllFilePaths()));
@@ -736,7 +736,8 @@
 
         Arrays.stream(sstables).forEach(s -> s.selfRef().release());
 
-        LogTransaction.removeUnfinishedLeftovers(Arrays.asList(dataFolder1, dataFolder2));
+        // if shouldCommit is true then it should remove the leftovers and return true, false otherwise
+        assertEquals(shouldCommit, LogTransaction.removeUnfinishedLeftovers(Arrays.asList(dataFolder1, dataFolder2)));
         LogTransaction.waitForDeletions();
 
         if (shouldCommit)
@@ -929,7 +930,7 @@
                                   if (filePath.endsWith("Data.db"))
                                   {
                                       assertTrue(FileUtils.delete(filePath));
-                                      assertNull(t.txnFile().syncFolder(null));
+                                      assertNull(t.txnFile().syncDirectory(null));
                                       break;
                                   }
                               }
diff --git a/test/unit/org/apache/cassandra/db/lifecycle/RealTransactionsTest.java b/test/unit/org/apache/cassandra/db/lifecycle/RealTransactionsTest.java
index 4fbbb36..595610e 100644
--- a/test/unit/org/apache/cassandra/db/lifecycle/RealTransactionsTest.java
+++ b/test/unit/org/apache/cassandra/db/lifecycle/RealTransactionsTest.java
@@ -153,7 +153,7 @@
         int nowInSec = FBUtilities.nowInSeconds();
         try (CompactionController controller = new CompactionController(cfs, txn.originals(), cfs.gcBefore(FBUtilities.nowInSeconds())))
         {
-            try (SSTableRewriter rewriter = new SSTableRewriter(txn, 1000, false);
+            try (SSTableRewriter rewriter = SSTableRewriter.constructKeepingOriginals(txn, false, 1000);
                  AbstractCompactionStrategy.ScannerList scanners = cfs.getCompactionStrategyManager().getScanners(txn.originals());
                  CompactionIterator ci = new CompactionIterator(txn.opType(), scanners.scanners, controller, nowInSec, txn.opId())
             )
@@ -168,6 +168,7 @@
                                                            0,
                                                            0,
                                                            SerializationHeader.make(cfs.metadata, txn.originals()),
+                                                           cfs.indexManager.listIndexes(),
                                                            txn));
                 while (ci.hasNext())
                 {
diff --git a/test/unit/org/apache/cassandra/db/lifecycle/TrackerTest.java b/test/unit/org/apache/cassandra/db/lifecycle/TrackerTest.java
index b8de711..1668ddc 100644
--- a/test/unit/org/apache/cassandra/db/lifecycle/TrackerTest.java
+++ b/test/unit/org/apache/cassandra/db/lifecycle/TrackerTest.java
@@ -39,7 +39,7 @@
 import org.apache.cassandra.db.ColumnFamilyStore;
 import org.apache.cassandra.db.Memtable;
 import org.apache.cassandra.db.commitlog.CommitLog;
-import org.apache.cassandra.db.commitlog.ReplayPosition;
+import org.apache.cassandra.db.commitlog.CommitLogPosition;
 import org.apache.cassandra.db.compaction.OperationType;
 import org.apache.cassandra.io.sstable.format.SSTableReader;
 import org.apache.cassandra.notifications.*;
@@ -266,23 +266,24 @@
         Tracker tracker = cfs.getTracker();
         tracker.subscribe(listener);
 
-        Memtable prev1 = tracker.switchMemtable(true, new Memtable(new AtomicReference<>(CommitLog.instance.getContext()), cfs));
+        Memtable prev1 = tracker.switchMemtable(true, new Memtable(new AtomicReference<>(CommitLog.instance.getCurrentPosition()), cfs));
         OpOrder.Group write1 = cfs.keyspace.writeOrder.getCurrent();
         OpOrder.Barrier barrier1 = cfs.keyspace.writeOrder.newBarrier();
-        prev1.setDiscarding(barrier1, new AtomicReference<>(CommitLog.instance.getContext()));
+        prev1.setDiscarding(barrier1, new AtomicReference<>(CommitLog.instance.getCurrentPosition()));
         barrier1.issue();
-        Memtable prev2 = tracker.switchMemtable(false, new Memtable(new AtomicReference<>(CommitLog.instance.getContext()), cfs));
+        Memtable prev2 = tracker.switchMemtable(false, new Memtable(new AtomicReference<>(CommitLog.instance.getCurrentPosition()), cfs));
         OpOrder.Group write2 = cfs.keyspace.writeOrder.getCurrent();
         OpOrder.Barrier barrier2 = cfs.keyspace.writeOrder.newBarrier();
-        prev2.setDiscarding(barrier2, new AtomicReference<>(CommitLog.instance.getContext()));
+        prev2.setDiscarding(barrier2, new AtomicReference<>(CommitLog.instance.getCurrentPosition()));
         barrier2.issue();
         Memtable cur = tracker.getView().getCurrentMemtable();
         OpOrder.Group writecur = cfs.keyspace.writeOrder.getCurrent();
-        Assert.assertEquals(prev1, tracker.getMemtableFor(write1, ReplayPosition.NONE));
-        Assert.assertEquals(prev2, tracker.getMemtableFor(write2, ReplayPosition.NONE));
-        Assert.assertEquals(cur, tracker.getMemtableFor(writecur, ReplayPosition.NONE));
-        Assert.assertEquals(1, listener.received.size());
+        Assert.assertEquals(prev1, tracker.getMemtableFor(write1, CommitLogPosition.NONE));
+        Assert.assertEquals(prev2, tracker.getMemtableFor(write2, CommitLogPosition.NONE));
+        Assert.assertEquals(cur, tracker.getMemtableFor(writecur, CommitLogPosition.NONE));
+        Assert.assertEquals(2, listener.received.size());
         Assert.assertTrue(listener.received.get(0) instanceof MemtableRenewedNotification);
+        Assert.assertTrue(listener.received.get(1) instanceof MemtableSwitchedNotification);
         listener.received.clear();
 
         tracker.markFlushing(prev2);
@@ -298,13 +299,14 @@
         Assert.assertTrue(tracker.getView().flushingMemtables.contains(prev2));
 
         SSTableReader reader = MockSchema.sstable(0, 10, false, cfs);
-        tracker.replaceFlushed(prev2, Collections.singleton(reader));
+        tracker.replaceFlushed(prev2, singleton(reader));
         Assert.assertEquals(1, tracker.getView().sstables.size());
         Assert.assertEquals(1, tracker.getView().premature.size());
         tracker.permitCompactionOfFlushed(singleton(reader));
         Assert.assertEquals(0, tracker.getView().premature.size());
-        Assert.assertEquals(1, listener.received.size());
-        Assert.assertEquals(singleton(reader), ((SSTableAddedNotification) listener.received.get(0)).added);
+        Assert.assertEquals(2, listener.received.size());
+        Assert.assertEquals(prev2, ((MemtableDiscardedNotification) listener.received.get(0)).memtable);
+        Assert.assertEquals(singleton(reader), ((SSTableAddedNotification) listener.received.get(1)).added);
         listener.received.clear();
         Assert.assertTrue(reader.isKeyCacheSetup());
         Assert.assertEquals(10, cfs.metric.liveDiskSpaceUsed.getCount());
@@ -314,17 +316,21 @@
         tracker = cfs.getTracker();
         listener = new MockListener(false);
         tracker.subscribe(listener);
-        prev1 = tracker.switchMemtable(false, new Memtable(new AtomicReference<>(CommitLog.instance.getContext()), cfs));
+        prev1 = tracker.switchMemtable(false, new Memtable(new AtomicReference<>(CommitLog.instance.getCurrentPosition()), cfs));
         tracker.markFlushing(prev1);
         reader = MockSchema.sstable(0, 10, true, cfs);
         cfs.invalidate(false);
-        tracker.replaceFlushed(prev1, Collections.singleton(reader));
+        tracker.replaceFlushed(prev1, singleton(reader));
         tracker.permitCompactionOfFlushed(Collections.singleton(reader));
         Assert.assertEquals(0, tracker.getView().sstables.size());
         Assert.assertEquals(0, tracker.getView().flushingMemtables.size());
         Assert.assertEquals(0, cfs.metric.liveDiskSpaceUsed.getCount());
-        Assert.assertEquals(reader, (((SSTableDeletingNotification) listener.received.get(0)).deleting));
-        Assert.assertEquals(1, ((SSTableListChangedNotification) listener.received.get(1)).removed.size());
+        System.out.println(listener.received);
+        Assert.assertEquals(4, listener.received.size());
+        Assert.assertEquals(prev1, ((MemtableSwitchedNotification) listener.received.get(0)).memtable);
+        Assert.assertEquals(prev1, ((MemtableDiscardedNotification) listener.received.get(1)).memtable);
+        Assert.assertTrue(listener.received.get(2) instanceof SSTableDeletingNotification);
+        Assert.assertEquals(1, ((SSTableListChangedNotification) listener.received.get(3)).removed.size());
         DatabaseDescriptor.setIncrementalBackupsEnabled(backups);
     }
 
@@ -347,7 +353,7 @@
         Assert.assertEquals(singleton(r2), ((SSTableListChangedNotification) listener.received.get(0)).added);
         listener.received.clear();
         tracker.notifySSTableRepairedStatusChanged(singleton(r1));
-        Assert.assertEquals(singleton(r1), ((SSTableRepairStatusChanged) listener.received.get(0)).sstable);
+        Assert.assertEquals(singleton(r1), ((SSTableRepairStatusChanged) listener.received.get(0)).sstables);
         listener.received.clear();
         Memtable memtable = MockSchema.memtable(cfs);
         tracker.notifyRenewed(memtable);
diff --git a/test/unit/org/apache/cassandra/db/lifecycle/ViewTest.java b/test/unit/org/apache/cassandra/db/lifecycle/ViewTest.java
index a5dceca..2cf79bd 100644
--- a/test/unit/org/apache/cassandra/db/lifecycle/ViewTest.java
+++ b/test/unit/org/apache/cassandra/db/lifecycle/ViewTest.java
@@ -42,6 +42,7 @@
 import static com.google.common.collect.ImmutableSet.copyOf;
 import static com.google.common.collect.ImmutableSet.of;
 import static com.google.common.collect.Iterables.concat;
+import static java.util.Collections.singleton;
 import static org.apache.cassandra.db.lifecycle.Helpers.emptySet;
 
 public class ViewTest
@@ -195,7 +196,7 @@
         Assert.assertEquals(memtable3, cur.getCurrentMemtable());
 
         SSTableReader sstable = MockSchema.sstable(1, cfs);
-        cur = View.replaceFlushed(memtable1, Collections.singleton(sstable)).apply(cur);
+        cur = View.replaceFlushed(memtable1, singleton(sstable)).apply(cur);
         Assert.assertEquals(0, cur.flushingMemtables.size());
         Assert.assertEquals(1, cur.liveMemtables.size());
         Assert.assertEquals(memtable3, cur.getCurrentMemtable());
diff --git a/test/unit/org/apache/cassandra/db/marshal/AbstractCompositeTypeTest.java b/test/unit/org/apache/cassandra/db/marshal/AbstractCompositeTypeTest.java
new file mode 100644
index 0000000..dc78bb9
--- /dev/null
+++ b/test/unit/org/apache/cassandra/db/marshal/AbstractCompositeTypeTest.java
@@ -0,0 +1,55 @@
+/*
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+package org.apache.cassandra.db.marshal;
+
+import org.junit.Test;
+import static org.junit.Assert.assertEquals;
+
+public class AbstractCompositeTypeTest
+{
+    
+    @Test
+    public void testEscape()
+    {
+        assertEquals("", AbstractCompositeType.escape(""));
+        assertEquals("Ab!CdXy \\Z123-345", AbstractCompositeType.escape("Ab!CdXy \\Z123-345"));
+        assertEquals("Ab!CdXy \\Z123-345!!", AbstractCompositeType.escape("Ab!CdXy \\Z123-345!"));
+        assertEquals("Ab!CdXy \\Z123-345\\!", AbstractCompositeType.escape("Ab!CdXy \\Z123-345\\"));
+        
+        assertEquals("A\\:b!CdXy \\\\:Z123-345", AbstractCompositeType.escape("A:b!CdXy \\:Z123-345"));
+        assertEquals("A\\:b!CdXy \\\\:Z123-345!!", AbstractCompositeType.escape("A:b!CdXy \\:Z123-345!"));
+        assertEquals("A\\:b!CdXy \\\\:Z123-345\\!", AbstractCompositeType.escape("A:b!CdXy \\:Z123-345\\"));
+        
+    }
+    
+    @Test
+    public void testUnescape()
+    {
+        assertEquals("", AbstractCompositeType.escape(""));
+        assertEquals("Ab!CdXy \\Z123-345", AbstractCompositeType.unescape("Ab!CdXy \\Z123-345"));
+        assertEquals("Ab!CdXy \\Z123-345!", AbstractCompositeType.unescape("Ab!CdXy \\Z123-345!!"));
+        assertEquals("Ab!CdXy \\Z123-345\\", AbstractCompositeType.unescape("Ab!CdXy \\Z123-345\\!"));
+        
+        assertEquals("A:b!CdXy \\:Z123-345", AbstractCompositeType.unescape("A\\:b!CdXy \\\\:Z123-345"));
+        assertEquals("A:b!CdXy \\:Z123-345!", AbstractCompositeType.unescape("A\\:b!CdXy \\\\:Z123-345!!"));
+        assertEquals("A:b!CdXy \\:Z123-345\\", AbstractCompositeType.unescape("A\\:b!CdXy \\\\:Z123-345\\!"));
+    }
+}
diff --git a/test/unit/org/apache/cassandra/db/monitoring/MonitoringTaskTest.java b/test/unit/org/apache/cassandra/db/monitoring/MonitoringTaskTest.java
new file mode 100644
index 0000000..4490519
--- /dev/null
+++ b/test/unit/org/apache/cassandra/db/monitoring/MonitoringTaskTest.java
@@ -0,0 +1,341 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.db.monitoring;
+
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.List;
+import java.util.UUID;
+import java.util.concurrent.CountDownLatch;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+
+public class MonitoringTaskTest
+{
+    private static final long timeout = 100;
+    private static final long MAX_SPIN_TIME_NANOS = TimeUnit.SECONDS.toNanos(5);
+
+    public static final int REPORT_INTERVAL_MS = 600000; // long enough so that it won't check unless told to do so
+    public static final int MAX_TIMEDOUT_OPERATIONS = -1; // unlimited
+
+    @BeforeClass
+    public static void setup()
+    {
+        MonitoringTask.instance = MonitoringTask.make(REPORT_INTERVAL_MS, MAX_TIMEDOUT_OPERATIONS);
+    }
+
+    private static final class TestMonitor extends MonitorableImpl
+    {
+        private final String name;
+
+        TestMonitor(String name, ConstructionTime constructionTime, long timeout)
+        {
+            this.name = name;
+            setMonitoringTime(constructionTime, timeout);
+        }
+
+        public String name()
+        {
+            return name;
+        }
+
+        @Override
+        public String toString()
+        {
+            return name();
+        }
+    }
+
+    private static void waitForOperationsToComplete(Monitorable... operations) throws InterruptedException
+    {
+        waitForOperationsToComplete(Arrays.asList(operations));
+    }
+
+    private static void waitForOperationsToComplete(List<Monitorable> operations) throws InterruptedException
+    {
+        long timeout = operations.stream().map(Monitorable::timeout).reduce(0L, Long::max);
+        Thread.sleep(timeout * 2 + ApproximateTime.precision());
+
+        long start = System.nanoTime();
+        while(System.nanoTime() - start <= MAX_SPIN_TIME_NANOS)
+        {
+            long numInProgress = operations.stream().filter(Monitorable::isInProgress).count();
+            if (numInProgress == 0)
+                return;
+
+            Thread.yield();
+        }
+    }
+
+    @Test
+    public void testAbort() throws InterruptedException
+    {
+        Monitorable operation = new TestMonitor("Test abort", new ConstructionTime(System.currentTimeMillis()), timeout);
+        waitForOperationsToComplete(operation);
+
+        assertTrue(operation.isAborted());
+        assertFalse(operation.isCompleted());
+        assertEquals(1, MonitoringTask.instance.getFailedOperations().size());
+    }
+
+    @Test
+    public void testAbortIdemPotent() throws InterruptedException
+    {
+        Monitorable operation = new TestMonitor("Test abort", new ConstructionTime(System.currentTimeMillis()), timeout);
+        waitForOperationsToComplete(operation);
+
+        assertTrue(operation.abort());
+
+        assertTrue(operation.isAborted());
+        assertFalse(operation.isCompleted());
+        assertEquals(1, MonitoringTask.instance.getFailedOperations().size());
+    }
+
+    @Test
+    public void testAbortCrossNode() throws InterruptedException
+    {
+        Monitorable operation = new TestMonitor("Test for cross node", new ConstructionTime(System.currentTimeMillis(), true), timeout);
+        waitForOperationsToComplete(operation);
+
+        assertTrue(operation.isAborted());
+        assertFalse(operation.isCompleted());
+        assertEquals(1, MonitoringTask.instance.getFailedOperations().size());
+    }
+
+    @Test
+    public void testComplete() throws InterruptedException
+    {
+        Monitorable operation = new TestMonitor("Test complete", new ConstructionTime(System.currentTimeMillis()), timeout);
+        operation.complete();
+        waitForOperationsToComplete(operation);
+
+        assertFalse(operation.isAborted());
+        assertTrue(operation.isCompleted());
+        assertEquals(0, MonitoringTask.instance.getFailedOperations().size());
+    }
+
+    @Test
+    public void testCompleteIdemPotent() throws InterruptedException
+    {
+        Monitorable operation = new TestMonitor("Test complete", new ConstructionTime(System.currentTimeMillis()), timeout);
+        operation.complete();
+        waitForOperationsToComplete(operation);
+
+        assertTrue(operation.complete());
+
+        assertFalse(operation.isAborted());
+        assertTrue(operation.isCompleted());
+        assertEquals(0, MonitoringTask.instance.getFailedOperations().size());
+    }
+
+    @Test
+    public void testReport() throws InterruptedException
+    {
+        Monitorable operation = new TestMonitor("Test report", new ConstructionTime(System.currentTimeMillis()), timeout);
+        waitForOperationsToComplete(operation);
+
+        assertTrue(operation.isAborted());
+        assertFalse(operation.isCompleted());
+        MonitoringTask.instance.logFailedOperations(ApproximateTime.currentTimeMillis());
+        assertEquals(0, MonitoringTask.instance.getFailedOperations().size());
+    }
+
+    @Test
+    public void testRealScheduling() throws InterruptedException
+    {
+        MonitoringTask.instance = MonitoringTask.make(10, -1);
+        try
+        {
+            Monitorable operation = new TestMonitor("Test report", new ConstructionTime(System.currentTimeMillis()), timeout);
+            waitForOperationsToComplete(operation);
+
+            assertTrue(operation.isAborted());
+            assertFalse(operation.isCompleted());
+
+            Thread.sleep(ApproximateTime.precision() + 500);
+            assertEquals(0, MonitoringTask.instance.getFailedOperations().size());
+        }
+        finally
+        {
+            MonitoringTask.instance = MonitoringTask.make(REPORT_INTERVAL_MS, MAX_TIMEDOUT_OPERATIONS);
+        }
+    }
+
+    @Test
+    public void testMultipleThreads() throws InterruptedException
+    {
+        final int opCount = 50;
+        final ExecutorService executorService = Executors.newFixedThreadPool(20);
+        final List<Monitorable> operations = Collections.synchronizedList(new ArrayList<>(opCount));
+
+        for (int i = 0; i < opCount; i++)
+        {
+            executorService.submit(() ->
+                operations.add(new TestMonitor(UUID.randomUUID().toString(), new ConstructionTime(), timeout))
+            );
+        }
+
+        executorService.shutdown();
+        assertTrue(executorService.awaitTermination(30, TimeUnit.SECONDS));
+        assertEquals(opCount, operations.size());
+
+        waitForOperationsToComplete(operations);
+        assertEquals(opCount, MonitoringTask.instance.getFailedOperations().size());
+    }
+
+    @Test
+    public void testZeroMaxTimedoutOperations() throws InterruptedException
+    {
+        doTestMaxTimedoutOperations(0, 1, 0);
+    }
+
+    @Test
+    public void testMaxTimedoutOperationsExceeded() throws InterruptedException
+    {
+        doTestMaxTimedoutOperations(5, 10, 6);
+    }
+
+    private static void doTestMaxTimedoutOperations(int maxTimedoutOperations,
+                                                    int numThreads,
+                                                    int numExpectedOperations) throws InterruptedException
+    {
+        MonitoringTask.instance = MonitoringTask.make(REPORT_INTERVAL_MS, maxTimedoutOperations);
+        try
+        {
+            final int threadCount = numThreads;
+            ExecutorService executorService = Executors.newFixedThreadPool(threadCount);
+            final CountDownLatch finished = new CountDownLatch(threadCount);
+
+            for (int i = 0; i < threadCount; i++)
+            {
+                final String operationName = "Operation " + Integer.toString(i+1);
+                final int numTimes = i + 1;
+                executorService.submit(() -> {
+                    try
+                    {
+                        for (int j = 0; j < numTimes; j++)
+                        {
+                            Monitorable operation = new TestMonitor(operationName,
+                                                                    new ConstructionTime(System.currentTimeMillis()),
+                                                                    timeout);
+                            waitForOperationsToComplete(operation);
+                        }
+                    }
+                    catch (InterruptedException e)
+                    {
+                        e.printStackTrace();
+                        fail("Unexpected exception");
+                    }
+                    finally
+                    {
+                        finished.countDown();
+                    }
+                });
+            }
+
+            finished.await();
+            assertEquals(0, executorService.shutdownNow().size());
+
+            List<String> failedOperations = MonitoringTask.instance.getFailedOperations();
+            assertEquals(numExpectedOperations, failedOperations.size());
+            if (numExpectedOperations > 0)
+                assertTrue(failedOperations.get(numExpectedOperations - 1).startsWith("..."));
+        }
+        finally
+        {
+            MonitoringTask.instance = MonitoringTask.make(REPORT_INTERVAL_MS, MAX_TIMEDOUT_OPERATIONS);
+        }
+    }
+
+    @Test
+    public void testMultipleThreadsSameName() throws InterruptedException
+    {
+        final int threadCount = 50;
+        final List<Monitorable> operations = new ArrayList<>(threadCount);
+        ExecutorService executorService = Executors.newFixedThreadPool(threadCount);
+        final CountDownLatch finished = new CountDownLatch(threadCount);
+
+        for (int i = 0; i < threadCount; i++)
+        {
+            executorService.submit(() -> {
+                try
+                {
+                    Monitorable operation = new TestMonitor("Test testMultipleThreadsSameName",
+                                                            new ConstructionTime(System.currentTimeMillis()),
+                                                            timeout);
+                    operations.add(operation);
+                }
+                finally
+                {
+                    finished.countDown();
+                }
+            });
+        }
+
+        finished.await();
+        assertEquals(0, executorService.shutdownNow().size());
+
+        waitForOperationsToComplete(operations);
+        //MonitoringTask.instance.checkFailedOperations(ApproximateTime.currentTimeMillis());
+        assertEquals(1, MonitoringTask.instance.getFailedOperations().size());
+    }
+
+    @Test
+    public void testMultipleThreadsNoFailedOps() throws InterruptedException
+    {
+        final int threadCount = 50;
+        final List<Monitorable> operations = new ArrayList<>(threadCount);
+        ExecutorService executorService = Executors.newFixedThreadPool(threadCount);
+        final CountDownLatch finished = new CountDownLatch(threadCount);
+
+        for (int i = 0; i < threadCount; i++)
+        {
+            executorService.submit(() -> {
+                try
+                {
+                    Monitorable operation = new TestMonitor("Test thread " + Thread.currentThread().getName(),
+                                                            new ConstructionTime(System.currentTimeMillis()),
+                                                            timeout);
+                    operations.add(operation);
+                    operation.complete();
+                }
+                finally
+                {
+                    finished.countDown();
+                }
+            });
+        }
+
+        finished.await();
+        assertEquals(0, executorService.shutdownNow().size());
+
+        waitForOperationsToComplete(operations);
+        assertEquals(0, MonitoringTask.instance.getFailedOperations().size());
+    }
+}
diff --git a/test/unit/org/apache/cassandra/db/partition/PartitionImplementationTest.java b/test/unit/org/apache/cassandra/db/partition/PartitionImplementationTest.java
index f215331..90d6310 100644
--- a/test/unit/org/apache/cassandra/db/partition/PartitionImplementationTest.java
+++ b/test/unit/org/apache/cassandra/db/partition/PartitionImplementationTest.java
@@ -39,7 +39,6 @@
 import org.apache.cassandra.config.ColumnDefinition;
 import org.apache.cassandra.cql3.ColumnIdentifier;
 import org.apache.cassandra.db.*;
-import org.apache.cassandra.db.Slice.Bound;
 import org.apache.cassandra.db.filter.ColumnFilter;
 import org.apache.cassandra.db.marshal.AsciiType;
 import org.apache.cassandra.db.partitions.AbstractBTreePartition;
@@ -104,7 +103,7 @@
         ColumnDefinition defCol = cfm.getColumnDefinition(new ColumnIdentifier("col", true));
         Row.Builder row = BTreeRow.unsortedBuilder(TIMESTAMP);
         row.newRow(clustering);
-        row.addCell(BufferCell.live(cfm, defCol, TIMESTAMP, ByteBufferUtil.bytes(colValue)));
+        row.addCell(BufferCell.live(defCol, TIMESTAMP, ByteBufferUtil.bytes(colValue)));
         return row.build();
     }
 
@@ -113,7 +112,7 @@
         ColumnDefinition defCol = cfm.getColumnDefinition(new ColumnIdentifier("static_col", true));
         Row.Builder row = BTreeRow.unsortedBuilder(TIMESTAMP);
         row.newRow(Clustering.STATIC_CLUSTERING);
-        row.addCell(BufferCell.live(cfm, defCol, TIMESTAMP, ByteBufferUtil.bytes("static value")));
+        row.addCell(BufferCell.live(defCol, TIMESTAMP, ByteBufferUtil.bytes("static value")));
         return row.build();
     }
 
@@ -314,10 +313,10 @@
         testSearchIterator(sortedContent, partition, cf, true);
 
         // sliceable iter
-        testSliceableIterator(sortedContent, partition, ColumnFilter.all(cfm), false);
-        testSliceableIterator(sortedContent, partition, cf, false);
-        testSliceableIterator(sortedContent, partition, ColumnFilter.all(cfm), true);
-        testSliceableIterator(sortedContent, partition, cf, true);
+        testSlicingOfIterators(sortedContent, partition, ColumnFilter.all(cfm), false);
+        testSlicingOfIterators(sortedContent, partition, cf, false);
+        testSlicingOfIterators(sortedContent, partition, ColumnFilter.all(cfm), true);
+        testSlicingOfIterators(sortedContent, partition, cf, true);
     }
 
     void testSearchIterator(NavigableSet<Clusterable> sortedContent, Partition partition, ColumnFilter cf, boolean reversed)
@@ -355,29 +354,36 @@
             Clustering start = clustering(pos);
             pos += sz;
             Clustering end = clustering(pos);
-            Slice slice = Slice.make(skip == 0 ? Bound.exclusiveStartOf(start) : Bound.inclusiveStartOf(start), Bound.inclusiveEndOf(end));
+            Slice slice = Slice.make(skip == 0 ? ClusteringBound.exclusiveStartOf(start) : ClusteringBound.inclusiveStartOf(start), ClusteringBound.inclusiveEndOf(end));
             builder.add(slice);
         }
         return builder.build();
     }
 
-    void testSliceableIterator(NavigableSet<Clusterable> sortedContent, AbstractBTreePartition partition, ColumnFilter cf, boolean reversed)
+    void testSlicingOfIterators(NavigableSet<Clusterable> sortedContent, AbstractBTreePartition partition, ColumnFilter cf, boolean reversed)
     {
         Function<? super Clusterable, ? extends Clusterable> colFilter = x -> x instanceof Row ? ((Row) x).filter(cf, cfm) : x;
         Slices slices = makeSlices();
-        try (SliceableUnfilteredRowIterator sliceableIter = partition.sliceableUnfilteredIterator(cf, reversed))
+
+        // fetch each slice in turn
+        for (Slice slice : (Iterable<Slice>) () -> directed(slices, reversed))
         {
-            for (Slice slice : (Iterable<Slice>) () -> directed(slices, reversed))
+            try (UnfilteredRowIterator slicedIter = partition.unfilteredIterator(cf, Slices.with(cfm.comparator, slice), reversed))
+            {
                 assertIteratorsEqual(streamOf(directed(slice(sortedContent, slice), reversed)).map(colFilter).iterator(),
-                                     sliceableIter.slice(slice));
+                                     slicedIter);
+            }
         }
 
-        // Try using sliceable as unfiltered iterator
-        try (SliceableUnfilteredRowIterator sliceableIter = partition.sliceableUnfilteredIterator(cf, reversed))
+        // Fetch all slices at once
+        try (UnfilteredRowIterator slicedIter = partition.unfilteredIterator(cf, slices, reversed))
         {
-            assertIteratorsEqual((reversed ? sortedContent.descendingSet() : sortedContent).
-                                     stream().map(colFilter).iterator(),
-                                 sliceableIter);
+            List<Iterator<? extends Clusterable>> slicelist = new ArrayList<>();
+            slices.forEach(slice -> slicelist.add(directed(slice(sortedContent, slice), reversed)));
+            if (reversed)
+                Collections.reverse(slicelist);
+
+            assertIteratorsEqual(Iterators.concat(slicelist.toArray(new Iterator[0])), slicedIter);
         }
     }
 
diff --git a/test/unit/org/apache/cassandra/db/rows/DigestBackwardCompatibilityTest.java b/test/unit/org/apache/cassandra/db/rows/DigestBackwardCompatibilityTest.java
index c8f5cb1..a72d397 100644
--- a/test/unit/org/apache/cassandra/db/rows/DigestBackwardCompatibilityTest.java
+++ b/test/unit/org/apache/cassandra/db/rows/DigestBackwardCompatibilityTest.java
@@ -18,7 +18,6 @@
 package org.apache.cassandra.db.rows;
 
 import java.nio.ByteBuffer;
-import java.util.*;
 import java.security.MessageDigest;
 
 import org.junit.Test;
@@ -28,7 +27,6 @@
 import org.apache.cassandra.config.ColumnDefinition;
 import org.apache.cassandra.cql3.CQLTester;
 import org.apache.cassandra.db.*;
-import org.apache.cassandra.db.filter.*;
 import org.apache.cassandra.db.partitions.*;
 import org.apache.cassandra.db.context.CounterContext;
 import org.apache.cassandra.net.MessagingService;
@@ -78,7 +76,6 @@
         createTable("CREATE TABLE %s (k text, t int, v1 text, v2 int, PRIMARY KEY (k, t))");
 
         String key = "someKey";
-        int N = 10;
 
         for (int i = 0; i < 10; i++)
             execute("INSERT INTO %s(k, t, v1, v2) VALUES (?, ?, ?, ?) USING TIMESTAMP ? AND TTL ?", key, i, "v" + i, i, 1L, 200);
@@ -105,7 +102,6 @@
         createTable("CREATE TABLE %s (k text, t int, v text, PRIMARY KEY (k, t)) WITH COMPACT STORAGE");
 
         String key = "someKey";
-        int N = 10;
 
         for (int i = 0; i < 10; i++)
             execute("INSERT INTO %s(k, t, v) VALUES (?, ?, ?) USING TIMESTAMP ? AND TTL ?", key, i, "v" + i, 1L, 200);
@@ -174,7 +170,7 @@
         CFMetaData metadata = getCurrentColumnFamilyStore().metadata;
         ColumnDefinition column = metadata.getColumnDefinition(ByteBufferUtil.bytes("c"));
         ByteBuffer value = CounterContext.instance().createGlobal(CounterId.fromInt(1), 1L, 42L);
-        Row row = BTreeRow.singleCellRow(Clustering.STATIC_CLUSTERING, BufferCell.live(metadata, column, 0L, value));
+        Row row = BTreeRow.singleCellRow(Clustering.STATIC_CLUSTERING, BufferCell.live(column, 0L, value));
 
         new Mutation(PartitionUpdate.singleRowUpdate(metadata, Util.dk(key), row)).applyUnsafe();
 
diff --git a/test/unit/org/apache/cassandra/db/rows/RowAndDeletionMergeIteratorTest.java b/test/unit/org/apache/cassandra/db/rows/RowAndDeletionMergeIteratorTest.java
index 400d65a..93dc904 100644
--- a/test/unit/org/apache/cassandra/db/rows/RowAndDeletionMergeIteratorTest.java
+++ b/test/unit/org/apache/cassandra/db/rows/RowAndDeletionMergeIteratorTest.java
@@ -28,8 +28,6 @@
 import org.junit.BeforeClass;
 import org.junit.Test;
 
-import org.apache.cassandra.db.Slice.Bound;
-import org.apache.cassandra.db.ClusteringPrefix;
 import org.apache.cassandra.utils.ByteBufferUtil;
 import org.apache.cassandra.db.*;
 import org.apache.cassandra.db.filter.ColumnFilter;
@@ -41,9 +39,6 @@
 import org.apache.cassandra.SchemaLoader;
 import org.apache.cassandra.Util;
 import org.apache.cassandra.config.CFMetaData;
-import org.apache.cassandra.db.ColumnFamilyStore;
-import org.apache.cassandra.db.DecoratedKey;
-import org.apache.cassandra.db.Keyspace;
 import org.apache.cassandra.db.marshal.AsciiType;
 import org.apache.cassandra.exceptions.ConfigurationException;
 import org.apache.cassandra.schema.KeyspaceParams;
@@ -130,7 +125,7 @@
         assertRtMarker(iterator.next(), ClusteringPrefix.Kind.INCL_START_BOUND, 4);
 
         assertTrue(iterator.hasNext());
-        assertRtMarker(iterator.next(), Bound.TOP);
+        assertRtMarker(iterator.next(), ClusteringBound.TOP);
 
         assertFalse(iterator.hasNext());
     }
@@ -148,7 +143,7 @@
         UnfilteredRowIterator iterator = createMergeIterator(rowIterator, rangeTombstoneIterator, false);
 
         assertTrue(iterator.hasNext());
-        assertRtMarker(iterator.next(), Bound.BOTTOM);
+        assertRtMarker(iterator.next(), ClusteringBound.BOTTOM);
 
         assertTrue(iterator.hasNext());
         assertRtMarker(iterator.next(), ClusteringPrefix.Kind.INCL_END_BOUND, 0);
@@ -193,7 +188,7 @@
         assertRtMarker(iterator.next(), ClusteringPrefix.Kind.EXCL_START_BOUND, 2);
 
         assertTrue(iterator.hasNext());
-        assertRtMarker(iterator.next(), Bound.TOP);
+        assertRtMarker(iterator.next(), ClusteringBound.TOP);
 
         assertFalse(iterator.hasNext());
     }
@@ -212,7 +207,7 @@
         UnfilteredRowIterator iterator = createMergeIterator(rowIterator, rangeTombstoneIterator, false);
 
         assertTrue(iterator.hasNext());
-        assertRtMarker(iterator.next(), Bound.BOTTOM);
+        assertRtMarker(iterator.next(), ClusteringBound.BOTTOM);
 
         assertTrue(iterator.hasNext());
         assertRtMarker(iterator.next(), ClusteringPrefix.Kind.INCL_END_BOUND, 0);
@@ -227,7 +222,7 @@
         assertRtMarker(iterator.next(), ClusteringPrefix.Kind.EXCL_START_BOUND, 2);
 
         assertTrue(iterator.hasNext());
-        assertRtMarker(iterator.next(), Bound.TOP);
+        assertRtMarker(iterator.next(), ClusteringBound.TOP);
 
         assertFalse(iterator.hasNext());
     }
@@ -253,13 +248,13 @@
         UnfilteredRowIterator iterator = createMergeIterator(rowIterator, rangeTombstoneIterator, false);
 
         assertTrue(iterator.hasNext());
-        assertRtMarker(iterator.next(), Bound.BOTTOM);
+        assertRtMarker(iterator.next(), ClusteringBound.BOTTOM);
 
         assertTrue(iterator.hasNext());
         assertRtMarker(iterator.next(), ClusteringPrefix.Kind.INCL_END_EXCL_START_BOUNDARY, 2);
 
         assertTrue(iterator.hasNext());
-        assertRtMarker(iterator.next(), Bound.TOP);
+        assertRtMarker(iterator.next(), ClusteringBound.TOP);
 
         assertFalse(iterator.hasNext());
     }
@@ -278,13 +273,13 @@
         UnfilteredRowIterator iterator = createMergeIterator(rowIterator, rangeTombstoneIterator, false);
 
         assertTrue(iterator.hasNext());
-        assertRtMarker(iterator.next(), Bound.BOTTOM);
+        assertRtMarker(iterator.next(), ClusteringBound.BOTTOM);
 
         assertTrue(iterator.hasNext());
         assertRtMarker(iterator.next(), ClusteringPrefix.Kind.EXCL_END_INCL_START_BOUNDARY, 2);
 
         assertTrue(iterator.hasNext());
-        assertRtMarker(iterator.next(), Bound.TOP);
+        assertRtMarker(iterator.next(), ClusteringBound.TOP);
 
         assertFalse(iterator.hasNext());
     }
@@ -299,7 +294,7 @@
         UnfilteredRowIterator iterator = createMergeIterator(rowIterator, rangeTombstoneIterator, false);
 
         assertTrue(iterator.hasNext());
-        assertRtMarker(iterator.next(), Bound.BOTTOM);
+        assertRtMarker(iterator.next(), ClusteringBound.BOTTOM);
 
         assertTrue(iterator.hasNext());
         assertRow(iterator.next(), 0);
@@ -345,7 +340,7 @@
     }
 
 
-    private void assertRtMarker(Unfiltered unfiltered, Bound bound)
+    private void assertRtMarker(Unfiltered unfiltered, ClusteringBoundOrBoundary bound)
     {
         assertEquals(Unfiltered.Kind.RANGE_TOMBSTONE_MARKER, unfiltered.kind());
         assertEquals(bound, unfiltered.clustering());
@@ -400,38 +395,38 @@
 
     private void addRow(PartitionUpdate update, int col1, int a)
     {
-        update.add(BTreeRow.singleCellRow(update.metadata().comparator.make(col1), makeCell(cfm, defA, a, 0)));
+        update.add(BTreeRow.singleCellRow(update.metadata().comparator.make(col1), makeCell(defA, a, 0)));
     }
 
-    private Cell makeCell(CFMetaData cfm, ColumnDefinition columnDefinition, int value, long timestamp)
+    private Cell makeCell(ColumnDefinition columnDefinition, int value, long timestamp)
     {
-        return BufferCell.live(cfm, columnDefinition, timestamp, ((AbstractType)columnDefinition.cellValueType()).decompose(value));
+        return BufferCell.live(columnDefinition, timestamp, ((AbstractType)columnDefinition.cellValueType()).decompose(value));
     }
 
     private static RangeTombstone atLeast(int start, long tstamp, int delTime)
     {
-        return new RangeTombstone(Slice.make(Slice.Bound.inclusiveStartOf(bb(start)), Slice.Bound.TOP), new DeletionTime(tstamp, delTime));
+        return new RangeTombstone(Slice.make(ClusteringBound.inclusiveStartOf(bb(start)), ClusteringBound.TOP), new DeletionTime(tstamp, delTime));
     }
 
     private static RangeTombstone atMost(int end, long tstamp, int delTime)
     {
-        return new RangeTombstone(Slice.make(Slice.Bound.BOTTOM, Slice.Bound.inclusiveEndOf(bb(end))), new DeletionTime(tstamp, delTime));
+        return new RangeTombstone(Slice.make(ClusteringBound.BOTTOM, ClusteringBound.inclusiveEndOf(bb(end))), new DeletionTime(tstamp, delTime));
     }
 
     private static RangeTombstone lessThan(int end, long tstamp, int delTime)
     {
-        return new RangeTombstone(Slice.make(Slice.Bound.BOTTOM, Slice.Bound.exclusiveEndOf(bb(end))), new DeletionTime(tstamp, delTime));
+        return new RangeTombstone(Slice.make(ClusteringBound.BOTTOM, ClusteringBound.exclusiveEndOf(bb(end))), new DeletionTime(tstamp, delTime));
     }
 
     private static RangeTombstone greaterThan(int start, long tstamp, int delTime)
     {
-        return new RangeTombstone(Slice.make(Slice.Bound.exclusiveStartOf(bb(start)), Slice.Bound.TOP), new DeletionTime(tstamp, delTime));
+        return new RangeTombstone(Slice.make(ClusteringBound.exclusiveStartOf(bb(start)), ClusteringBound.TOP), new DeletionTime(tstamp, delTime));
     }
 
     private static RangeTombstone rt(int start, boolean startInclusive, int end, boolean endInclusive, long tstamp, int delTime)
     {
-        Slice.Bound startBound = startInclusive ? Slice.Bound.inclusiveStartOf(bb(start)) : Slice.Bound.exclusiveStartOf(bb(start));
-        Slice.Bound endBound = endInclusive ? Slice.Bound.inclusiveEndOf(bb(end)) : Slice.Bound.exclusiveEndOf(bb(end));
+        ClusteringBound startBound = startInclusive ? ClusteringBound.inclusiveStartOf(bb(start)) : ClusteringBound.exclusiveStartOf(bb(start));
+        ClusteringBound endBound = endInclusive ? ClusteringBound.inclusiveEndOf(bb(end)) : ClusteringBound.exclusiveEndOf(bb(end));
 
         return new RangeTombstone(Slice.make(startBound, endBound), new DeletionTime(tstamp, delTime));
     }
diff --git a/test/unit/org/apache/cassandra/db/rows/RowsTest.java b/test/unit/org/apache/cassandra/db/rows/RowsTest.java
index b47bea2..ba03478 100644
--- a/test/unit/org/apache/cassandra/db/rows/RowsTest.java
+++ b/test/unit/org/apache/cassandra/db/rows/RowsTest.java
@@ -210,15 +210,15 @@
         long ts = secondToTs(now);
         Row.Builder builder = BTreeRow.unsortedBuilder(now);
         builder.newRow(c);
-        builder.addPrimaryKeyLivenessInfo(LivenessInfo.create(kcvm, ts, now));
+        builder.addPrimaryKeyLivenessInfo(LivenessInfo.create(ts, now));
         if (vVal != null)
         {
-            builder.addCell(BufferCell.live(kcvm, v, ts, vVal));
+            builder.addCell(BufferCell.live(v, ts, vVal));
         }
         if (mKey != null && mVal != null)
         {
             builder.addComplexDeletion(m, new DeletionTime(ts - 1, now));
-            builder.addCell(BufferCell.live(kcvm, m, ts, mVal, CellPath.create(mKey)));
+            builder.addCell(BufferCell.live(m, ts, mVal, CellPath.create(mKey)));
         }
 
         return builder;
@@ -231,13 +231,13 @@
         long ts = secondToTs(now);
         Row.Builder originalBuilder = BTreeRow.unsortedBuilder(now);
         originalBuilder.newRow(c1);
-        LivenessInfo liveness = LivenessInfo.create(kcvm, ts, now);
+        LivenessInfo liveness = LivenessInfo.create(ts, now);
         originalBuilder.addPrimaryKeyLivenessInfo(liveness);
         DeletionTime complexDeletion = new DeletionTime(ts-1, now);
         originalBuilder.addComplexDeletion(m, complexDeletion);
-        List<Cell> expectedCells = Lists.newArrayList(BufferCell.live(kcvm, v, secondToTs(now), BB1),
-                                                      BufferCell.live(kcvm, m, secondToTs(now), BB1, CellPath.create(BB1)),
-                                                      BufferCell.live(kcvm, m, secondToTs(now), BB2, CellPath.create(BB2)));
+        List<Cell> expectedCells = Lists.newArrayList(BufferCell.live(v, secondToTs(now), BB1),
+                                                      BufferCell.live(m, secondToTs(now), BB1, CellPath.create(BB1)),
+                                                      BufferCell.live(m, secondToTs(now), BB2, CellPath.create(BB2)));
         expectedCells.forEach(originalBuilder::addCell);
         // We need to use ts-1 so the deletion doesn't shadow what we've created
         Row.Deletion rowDeletion = new Row.Deletion(new DeletionTime(ts-1, now), false);
@@ -260,13 +260,13 @@
         long ts = secondToTs(now);
         Row.Builder builder = BTreeRow.unsortedBuilder(now);
         builder.newRow(c1);
-        LivenessInfo liveness = LivenessInfo.create(kcvm, ts, now);
+        LivenessInfo liveness = LivenessInfo.create(ts, now);
         builder.addPrimaryKeyLivenessInfo(liveness);
         DeletionTime complexDeletion = new DeletionTime(ts-1, now);
         builder.addComplexDeletion(m, complexDeletion);
-        List<Cell> expectedCells = Lists.newArrayList(BufferCell.live(kcvm, v, ts, BB1),
-                                                      BufferCell.live(kcvm, m, ts, BB1, CellPath.create(BB1)),
-                                                      BufferCell.live(kcvm, m, ts, BB2, CellPath.create(BB2)));
+        List<Cell> expectedCells = Lists.newArrayList(BufferCell.live(v, ts, BB1),
+                                                      BufferCell.live(m, ts, BB1, CellPath.create(BB1)),
+                                                      BufferCell.live(m, ts, BB2, CellPath.create(BB2)));
         expectedCells.forEach(builder::addCell);
         // We need to use ts-1 so the deletion doesn't shadow what we've created
         Row.Deletion rowDeletion = new Row.Deletion(new DeletionTime(ts-1, now), false);
@@ -298,14 +298,14 @@
         long ts1 = secondToTs(now1);
         Row.Builder r1Builder = BTreeRow.unsortedBuilder(now1);
         r1Builder.newRow(c1);
-        LivenessInfo r1Liveness = LivenessInfo.create(kcvm, ts1, now1);
+        LivenessInfo r1Liveness = LivenessInfo.create(ts1, now1);
         r1Builder.addPrimaryKeyLivenessInfo(r1Liveness);
         DeletionTime r1ComplexDeletion = new DeletionTime(ts1-1, now1);
         r1Builder.addComplexDeletion(m, r1ComplexDeletion);
 
-        Cell r1v = BufferCell.live(kcvm, v, ts1, BB1);
-        Cell r1m1 = BufferCell.live(kcvm, m, ts1, BB1, CellPath.create(BB1));
-        Cell r1m2 = BufferCell.live(kcvm, m, ts1, BB2, CellPath.create(BB2));
+        Cell r1v = BufferCell.live(v, ts1, BB1);
+        Cell r1m1 = BufferCell.live(m, ts1, BB1, CellPath.create(BB1));
+        Cell r1m2 = BufferCell.live(m, ts1, BB2, CellPath.create(BB2));
         List<Cell> r1ExpectedCells = Lists.newArrayList(r1v, r1m1, r1m2);
 
         r1ExpectedCells.forEach(r1Builder::addCell);
@@ -314,12 +314,12 @@
         long ts2 = secondToTs(now2);
         Row.Builder r2Builder = BTreeRow.unsortedBuilder(now2);
         r2Builder.newRow(c1);
-        LivenessInfo r2Liveness = LivenessInfo.create(kcvm, ts2, now2);
+        LivenessInfo r2Liveness = LivenessInfo.create(ts2, now2);
         r2Builder.addPrimaryKeyLivenessInfo(r2Liveness);
-        Cell r2v = BufferCell.live(kcvm, v, ts2, BB2);
-        Cell r2m2 = BufferCell.live(kcvm, m, ts2, BB1, CellPath.create(BB2));
-        Cell r2m3 = BufferCell.live(kcvm, m, ts2, BB2, CellPath.create(BB3));
-        Cell r2m4 = BufferCell.live(kcvm, m, ts2, BB3, CellPath.create(BB4));
+        Cell r2v = BufferCell.live(v, ts2, BB2);
+        Cell r2m2 = BufferCell.live(m, ts2, BB1, CellPath.create(BB2));
+        Cell r2m3 = BufferCell.live(m, ts2, BB2, CellPath.create(BB3));
+        Cell r2m4 = BufferCell.live(m, ts2, BB3, CellPath.create(BB4));
         List<Cell> r2ExpectedCells = Lists.newArrayList(r2v, r2m2, r2m3, r2m4);
 
         r2ExpectedCells.forEach(r2Builder::addCell);
@@ -374,7 +374,7 @@
         long ts1 = secondToTs(now1);
         Row.Builder r1Builder = BTreeRow.unsortedBuilder(now1);
         r1Builder.newRow(c1);
-        LivenessInfo r1Liveness = LivenessInfo.create(kcvm, ts1, now1);
+        LivenessInfo r1Liveness = LivenessInfo.create(ts1, now1);
         r1Builder.addPrimaryKeyLivenessInfo(r1Liveness);
 
         // mergedData == null
@@ -382,14 +382,14 @@
         long ts2 = secondToTs(now2);
         Row.Builder r2Builder = BTreeRow.unsortedBuilder(now2);
         r2Builder.newRow(c1);
-        LivenessInfo r2Liveness = LivenessInfo.create(kcvm, ts2, now2);
+        LivenessInfo r2Liveness = LivenessInfo.create(ts2, now2);
         r2Builder.addPrimaryKeyLivenessInfo(r2Liveness);
         DeletionTime r2ComplexDeletion = new DeletionTime(ts2-1, now2);
         r2Builder.addComplexDeletion(m, r2ComplexDeletion);
-        Cell r2v = BufferCell.live(kcvm, v, ts2, BB2);
-        Cell r2m2 = BufferCell.live(kcvm, m, ts2, BB1, CellPath.create(BB2));
-        Cell r2m3 = BufferCell.live(kcvm, m, ts2, BB2, CellPath.create(BB3));
-        Cell r2m4 = BufferCell.live(kcvm, m, ts2, BB3, CellPath.create(BB4));
+        Cell r2v = BufferCell.live(v, ts2, BB2);
+        Cell r2m2 = BufferCell.live(m, ts2, BB1, CellPath.create(BB2));
+        Cell r2m3 = BufferCell.live(m, ts2, BB2, CellPath.create(BB3));
+        Cell r2m4 = BufferCell.live(m, ts2, BB3, CellPath.create(BB4));
         List<Cell> r2ExpectedCells = Lists.newArrayList(r2v, r2m2, r2m3, r2m4);
 
         r2ExpectedCells.forEach(r2Builder::addCell);
@@ -428,7 +428,7 @@
         long ts1 = secondToTs(now1);
         Row.Builder r1Builder = BTreeRow.unsortedBuilder(now1);
         r1Builder.newRow(c1);
-        LivenessInfo r1Liveness = LivenessInfo.create(kcvm, ts1, now1);
+        LivenessInfo r1Liveness = LivenessInfo.create(ts1, now1);
         r1Builder.addPrimaryKeyLivenessInfo(r1Liveness);
 
         // mergedData == null
@@ -436,14 +436,14 @@
         long ts2 = secondToTs(now2);
         Row.Builder r2Builder = BTreeRow.unsortedBuilder(now2);
         r2Builder.newRow(c1);
-        LivenessInfo r2Liveness = LivenessInfo.create(kcvm, ts2, now2);
+        LivenessInfo r2Liveness = LivenessInfo.create(ts2, now2);
         r2Builder.addPrimaryKeyLivenessInfo(r2Liveness);
         DeletionTime r2ComplexDeletion = new DeletionTime(ts2-1, now2);
         r2Builder.addComplexDeletion(m, r2ComplexDeletion);
-        Cell r2v = BufferCell.live(kcvm, v, ts2, BB2);
-        Cell r2m2 = BufferCell.live(kcvm, m, ts2, BB1, CellPath.create(BB2));
-        Cell r2m3 = BufferCell.live(kcvm, m, ts2, BB2, CellPath.create(BB3));
-        Cell r2m4 = BufferCell.live(kcvm, m, ts2, BB3, CellPath.create(BB4));
+        Cell r2v = BufferCell.live(v, ts2, BB2);
+        Cell r2m2 = BufferCell.live(m, ts2, BB1, CellPath.create(BB2));
+        Cell r2m3 = BufferCell.live(m, ts2, BB2, CellPath.create(BB3));
+        Cell r2m4 = BufferCell.live(m, ts2, BB3, CellPath.create(BB4));
         List<Cell> r2ExpectedCells = Lists.newArrayList(r2v, r2m2, r2m3, r2m4);
 
         r2ExpectedCells.forEach(r2Builder::addCell);
@@ -481,8 +481,8 @@
         int now2 = now1 + 1;
         long ts2 = secondToTs(now2);
 
-        Cell expectedVCell = BufferCell.live(kcvm, v, ts2, BB2);
-        Cell expectedMCell = BufferCell.live(kcvm, m, ts2, BB2, CellPath.create(BB1));
+        Cell expectedVCell = BufferCell.live(v, ts2, BB2);
+        Cell expectedMCell = BufferCell.live(m, ts2, BB2, CellPath.create(BB1));
         DeletionTime expectedComplexDeletionTime = new DeletionTime(ts2 - 1, now2);
 
         Row.Builder updateBuilder = createBuilder(c1, now2, null, null, null);
@@ -494,7 +494,7 @@
         long td = Rows.merge(existingBuilder.build(), updateBuilder.build(), builder, now2 + 1);
 
         Assert.assertEquals(c1, builder.clustering);
-        Assert.assertEquals(LivenessInfo.create(kcvm, ts2, now2), builder.livenessInfo);
+        Assert.assertEquals(LivenessInfo.create(ts2, now2), builder.livenessInfo);
         Assert.assertEquals(Lists.newArrayList(Pair.create(m, new DeletionTime(ts2-1, now2))), builder.complexDeletions);
 
         Assert.assertEquals(2, builder.cells.size());
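
The RowsTest hunks are mechanical: in 3.9, BufferCell.live and LivenessInfo.create no longer take the table's CFMetaData (the kcvm handle) as their first argument. A hedged before/after fragment, reusing the column handles v and m, the timestamps ts/now and the ByteBuffer constant BB1 from the test's own fixtures (not a standalone program):

    // 3.0-style calls carried the table metadata:
    //   BufferCell.live(kcvm, v, ts, BB1);
    //   LivenessInfo.create(kcvm, ts, now);

    // 3.9-style calls drop the metadata argument:
    Cell simple  = BufferCell.live(v, ts, BB1);                        // regular column
    Cell complex = BufferCell.live(m, ts, BB1, CellPath.create(BB1));  // one entry of a complex (map) column
    LivenessInfo live = LivenessInfo.create(ts, now);                  // primary-key liveness
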
diff --git a/test/unit/org/apache/cassandra/db/rows/UnfilteredRowIteratorsMergeTest.java b/test/unit/org/apache/cassandra/db/rows/UnfilteredRowIteratorsMergeTest.java
index 7637fa0..0eeb379 100644
--- a/test/unit/org/apache/cassandra/db/rows/UnfilteredRowIteratorsMergeTest.java
+++ b/test/unit/org/apache/cassandra/db/rows/UnfilteredRowIteratorsMergeTest.java
@@ -33,7 +33,6 @@
 import org.apache.cassandra.Util;
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.db.*;
-import org.apache.cassandra.db.Slice.Bound;
 import org.apache.cassandra.db.marshal.AsciiType;
 import org.apache.cassandra.db.marshal.Int32Type;
 import org.apache.cassandra.db.rows.Unfiltered.Kind;
@@ -230,9 +229,12 @@
             if (prev != null && curr != null && prev.isClose(false) && curr.isOpen(false) && prev.clustering().invert().equals(curr.clustering()))
             {
                 // Join. Prefer not to use merger to check its correctness.
-                RangeTombstone.Bound b = prev.clustering();
-                b = b.withNewKind(b.isInclusive() ? RangeTombstone.Bound.Kind.INCL_END_EXCL_START_BOUNDARY : RangeTombstone.Bound.Kind.EXCL_END_INCL_START_BOUNDARY);
-                prev = new RangeTombstoneBoundaryMarker(b, prev.closeDeletionTime(false), curr.openDeletionTime(false));
+                ClusteringBound b = ((RangeTombstoneBoundMarker) prev).clustering();
+                ClusteringBoundary boundary = ClusteringBoundary.create(b.isInclusive()
+                                                                            ? ClusteringPrefix.Kind.INCL_END_EXCL_START_BOUNDARY
+                                                                            : ClusteringPrefix.Kind.EXCL_END_INCL_START_BOUNDARY,
+                                                                        b.getRawValues());
+                prev = new RangeTombstoneBoundaryMarker(boundary, prev.closeDeletionTime(false), curr.openDeletionTime(false));
                 currUnfiltered = prev;
                 --di;
             }
@@ -357,20 +359,20 @@
         return def;
     }
 
-    private static Bound boundFor(int pos, boolean start, boolean inclusive)
+    private static ClusteringBound boundFor(int pos, boolean start, boolean inclusive)
     {
-        return Bound.create(Bound.boundKind(start, inclusive), new ByteBuffer[] {Int32Type.instance.decompose(pos)});
+        return ClusteringBound.create(ClusteringBound.boundKind(start, inclusive), new ByteBuffer[] {Int32Type.instance.decompose(pos)});
     }
 
     private static Clustering clusteringFor(int i)
     {
-        return new Clustering(Int32Type.instance.decompose(i));
+        return Clustering.make(Int32Type.instance.decompose(i));
     }
 
     static Row emptyRowAt(int pos, Function<Integer, Integer> timeGenerator)
     {
         final Clustering clustering = clusteringFor(pos);
-        final LivenessInfo live = LivenessInfo.create(metadata, timeGenerator.apply(pos), nowInSec);
+        final LivenessInfo live = LivenessInfo.create(timeGenerator.apply(pos), nowInSec);
         return BTreeRow.noCellLiveRow(clustering, live);
     }
 
@@ -485,8 +487,8 @@
 
     private RangeTombstoneMarker marker(int pos, int delTime, boolean isStart, boolean inclusive)
     {
-        return new RangeTombstoneBoundMarker(Bound.create(Bound.boundKind(isStart, inclusive),
-                                                          new ByteBuffer[] {clusteringFor(pos).get(0)}),
+        return new RangeTombstoneBoundMarker(ClusteringBound.create(ClusteringBound.boundKind(isStart, inclusive),
+                                                                    new ByteBuffer[] {clusteringFor(pos).get(0)}),
                                              new DeletionTime(delTime, delTime));
     }
 }
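
In the merge test, joining an adjacent close/open marker pair now goes through ClusteringBoundary.create with an explicit ClusteringPrefix.Kind, since RangeTombstone.Bound.withNewKind is gone. A short fragment of the same idea, assuming prev is the RangeTombstoneBoundMarker closing one deletion and curr opens the next, as in the loop above:

    ClusteringBound b = ((RangeTombstoneBoundMarker) prev).clustering();
    ClusteringPrefix.Kind kind = b.isInclusive()
                               ? ClusteringPrefix.Kind.INCL_END_EXCL_START_BOUNDARY
                               : ClusteringPrefix.Kind.EXCL_END_INCL_START_BOUNDARY;
    // Reuse the bound's raw clustering values; only the kind changes.
    ClusteringBoundary boundary = ClusteringBoundary.create(kind, b.getRawValues());
    RangeTombstoneBoundaryMarker joined = new RangeTombstoneBoundaryMarker(boundary,
                                                                           prev.closeDeletionTime(false),
                                                                           curr.openDeletionTime(false));
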
diff --git a/test/unit/org/apache/cassandra/dht/BootStrapperTest.java b/test/unit/org/apache/cassandra/dht/BootStrapperTest.java
index 8974791..74126ee 100644
--- a/test/unit/org/apache/cassandra/dht/BootStrapperTest.java
+++ b/test/unit/org/apache/cassandra/dht/BootStrapperTest.java
@@ -96,7 +96,7 @@
         InetAddress myEndpoint = InetAddress.getByName("127.0.0.1");
 
         assertEquals(numOldNodes, tmd.sortedTokens().size());
-        RangeStreamer s = new RangeStreamer(tmd, null, myEndpoint, "Bootstrap", true, DatabaseDescriptor.getEndpointSnitch(), new StreamStateStore());
+        RangeStreamer s = new RangeStreamer(tmd, null, myEndpoint, "Bootstrap", true, DatabaseDescriptor.getEndpointSnitch(), new StreamStateStore(), false);
         IFailureDetector mockFailureDetector = new IFailureDetector()
         {
             public boolean isAlive(InetAddress ep)
diff --git a/test/unit/org/apache/cassandra/dht/LengthPartitioner.java b/test/unit/org/apache/cassandra/dht/LengthPartitioner.java
index 9cefbf2..e2202fe 100644
--- a/test/unit/org/apache/cassandra/dht/LengthPartitioner.java
+++ b/test/unit/org/apache/cassandra/dht/LengthPartitioner.java
@@ -61,6 +61,12 @@
         return MINIMUM;
     }
 
+    @Override
+    public Token getMaximumToken()
+    {
+        return null;
+    }
+
     public BigIntegerToken getRandomToken()
     {
         return new BigIntegerToken(BigInteger.valueOf(new Random().nextInt(15)));
diff --git a/test/unit/org/apache/cassandra/dht/SplitterTest.java b/test/unit/org/apache/cassandra/dht/SplitterTest.java
new file mode 100644
index 0000000..751a7d7
--- /dev/null
+++ b/test/unit/org/apache/cassandra/dht/SplitterTest.java
@@ -0,0 +1,158 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.dht;
+
+import java.math.BigDecimal;
+import java.math.BigInteger;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Random;
+import java.util.Set;
+
+import org.junit.Test;
+
+import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+
+public class SplitterTest
+{
+
+    @Test
+    public void randomSplitTestNoVNodesRandomPartitioner()
+    {
+        randomSplitTestNoVNodes(new RandomPartitioner());
+    }
+
+    @Test
+    public void randomSplitTestNoVNodesMurmur3Partitioner()
+    {
+        randomSplitTestNoVNodes(new Murmur3Partitioner());
+    }
+
+    @Test
+    public void randomSplitTestVNodesRandomPartitioner()
+    {
+        randomSplitTestVNodes(new RandomPartitioner());
+    }
+    @Test
+    public void randomSplitTestVNodesMurmur3Partitioner()
+    {
+        randomSplitTestVNodes(new Murmur3Partitioner());
+    }
+
+    public void randomSplitTestNoVNodes(IPartitioner partitioner)
+    {
+        Splitter splitter = partitioner.splitter().get();
+        Random r = new Random();
+        for (int i = 0; i < 10000; i++)
+        {
+            List<Range<Token>> localRanges = generateLocalRanges(1, r.nextInt(4)+1, splitter, r, partitioner instanceof RandomPartitioner);
+            List<Token> boundaries = splitter.splitOwnedRanges(r.nextInt(9) + 1, localRanges, false);
+            assertTrue("boundaries = "+boundaries+" ranges = "+localRanges, assertRangeSizeEqual(localRanges, boundaries, partitioner, splitter, true));
+        }
+    }
+
+    public void randomSplitTestVNodes(IPartitioner partitioner)
+    {
+        Splitter splitter = partitioner.splitter().get();
+        Random r = new Random();
+        for (int i = 0; i < 10000; i++)
+        {
+            // we need many tokens to be able to split evenly over the disks
+            int numTokens = 172 + r.nextInt(128);
+            int rf = r.nextInt(4) + 2;
+            int parts = r.nextInt(5)+1;
+            List<Range<Token>> localRanges = generateLocalRanges(numTokens, rf, splitter, r, partitioner instanceof RandomPartitioner);
+            List<Token> boundaries = splitter.splitOwnedRanges(parts, localRanges, true);
+            if (!assertRangeSizeEqual(localRanges, boundaries, partitioner, splitter, false))
+                fail(String.format("Could not split %d tokens with rf=%d into %d parts (localRanges=%s, boundaries=%s)", numTokens, rf, parts, localRanges, boundaries));
+        }
+    }
+
+    private boolean assertRangeSizeEqual(List<Range<Token>> localRanges, List<Token> tokens, IPartitioner partitioner, Splitter splitter, boolean splitIndividualRanges)
+    {
+        Token start = partitioner.getMinimumToken();
+        List<BigInteger> splits = new ArrayList<>();
+
+        for (int i = 0; i < tokens.size(); i++)
+        {
+            Token end = i == tokens.size() - 1 ? partitioner.getMaximumToken() : tokens.get(i);
+            splits.add(sumOwnedBetween(localRanges, start, end, splitter, splitIndividualRanges));
+            start = end;
+        }
+        // when we don't need to keep around full ranges, the difference is small between the partitions
+        BigDecimal delta = splitIndividualRanges ? BigDecimal.valueOf(0.001) : BigDecimal.valueOf(0.2);
+        boolean allBalanced = true;
+        for (BigInteger b : splits)
+        {
+            for (BigInteger i : splits)
+            {
+                BigDecimal bdb = new BigDecimal(b);
+                BigDecimal bdi = new BigDecimal(i);
+                BigDecimal q = bdb.divide(bdi, 2, BigDecimal.ROUND_HALF_DOWN);
+                if (q.compareTo(BigDecimal.ONE.add(delta)) > 0 || q.compareTo(BigDecimal.ONE.subtract(delta)) < 0)
+                    allBalanced = false;
+            }
+        }
+        return allBalanced;
+    }
+
+    private BigInteger sumOwnedBetween(List<Range<Token>> localRanges, Token start, Token end, Splitter splitter, boolean splitIndividualRanges)
+    {
+        BigInteger sum = BigInteger.ZERO;
+        for (Range<Token> range : localRanges)
+        {
+            if (splitIndividualRanges)
+            {
+                Set<Range<Token>> intersections = new Range<>(start, end).intersectionWith(range);
+                for (Range<Token> intersection : intersections)
+                    sum = sum.add(splitter.valueForToken(intersection.right).subtract(splitter.valueForToken(intersection.left)));
+            }
+            else
+            {
+                if (new Range<>(start, end).contains(range.left))
+                    sum = sum.add(splitter.valueForToken(range.right).subtract(splitter.valueForToken(range.left)));
+            }
+        }
+        return sum;
+    }
+
+    private List<Range<Token>> generateLocalRanges(int numTokens, int rf, Splitter splitter, Random r, boolean randomPartitioner)
+    {
+        int localTokens = numTokens * rf;
+        List<Token> randomTokens = new ArrayList<>();
+
+        for (int i = 0; i < localTokens * 2; i++)
+        {
+            Token t = splitter.tokenForValue(randomPartitioner ? new BigInteger(127, r) : BigInteger.valueOf(r.nextLong()));
+            randomTokens.add(t);
+        }
+
+        Collections.sort(randomTokens);
+
+        List<Range<Token>> localRanges = new ArrayList<>(localTokens);
+        for (int i = 0; i < randomTokens.size() - 1; i++)
+        {
+            assert randomTokens.get(i).compareTo(randomTokens.get(i+1)) < 0;
+            localRanges.add(new Range<>(randomTokens.get(i), randomTokens.get(i+1)));
+            i++;
+        }
+        return localRanges;
+    }
+}
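
SplitterTest accepts a split as balanced when every pair of partition sizes stays within a relative delta of each other (0.1% when individual ranges may be split, 20% when whole vnode ranges must stay intact). That check reduces to plain BigDecimal arithmetic and can be exercised on its own; a small self-contained sketch of the ratio test follows (the sizes are made-up numbers, not splitter output):

    import java.math.BigDecimal;
    import java.math.BigInteger;
    import java.util.Arrays;
    import java.util.List;

    public class BalanceCheckSketch
    {
        // True when every pair of partition sizes is within (1 +/- delta) of each other.
        static boolean balanced(List<BigInteger> sizes, BigDecimal delta)
        {
            for (BigInteger a : sizes)
                for (BigInteger b : sizes)
                {
                    BigDecimal ratio = new BigDecimal(a).divide(new BigDecimal(b), 2, BigDecimal.ROUND_HALF_DOWN);
                    if (ratio.compareTo(BigDecimal.ONE.add(delta)) > 0
                        || ratio.compareTo(BigDecimal.ONE.subtract(delta)) < 0)
                        return false;
                }
            return true;
        }

        public static void main(String[] args)
        {
            List<BigInteger> sizes = Arrays.asList(BigInteger.valueOf(1000), BigInteger.valueOf(1100));
            System.out.println(balanced(sizes, BigDecimal.valueOf(0.001))); // false: the sizes are 10% apart
            System.out.println(balanced(sizes, BigDecimal.valueOf(0.2)));   // true: within the 20% tolerance
        }
    }
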
diff --git a/test/unit/org/apache/cassandra/hints/AlteredHints.java b/test/unit/org/apache/cassandra/hints/AlteredHints.java
new file mode 100644
index 0000000..23dc32a
--- /dev/null
+++ b/test/unit/org/apache/cassandra/hints/AlteredHints.java
@@ -0,0 +1,129 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.hints;
+
+import java.io.File;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.UUID;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.collect.ImmutableMap;
+import com.google.common.io.Files;
+import org.junit.Assert;
+import org.junit.BeforeClass;
+
+import org.apache.cassandra.SchemaLoader;
+import org.apache.cassandra.config.CFMetaData;
+import org.apache.cassandra.config.Schema;
+import org.apache.cassandra.db.Mutation;
+import org.apache.cassandra.db.RowUpdateBuilder;
+import org.apache.cassandra.schema.KeyspaceParams;
+import org.apache.cassandra.utils.UUIDGen;
+
+import static org.apache.cassandra.utils.ByteBufferUtil.bytes;
+
+/**
+ * Base class for testing compressed and encrypted hints.
+ */
+public abstract class AlteredHints
+{
+    protected static final String KEYSPACE = "hints_compression_test";
+    private static final String TABLE = "table";
+
+    private static Mutation createMutation(int index, long timestamp)
+    {
+        CFMetaData table = Schema.instance.getCFMetaData(KEYSPACE, TABLE);
+        return new RowUpdateBuilder(table, timestamp, bytes(index))
+               .clustering(bytes(index))
+               .add("val", bytes(index))
+               .build();
+    }
+
+    private static Hint createHint(int idx, long baseTimestamp)
+    {
+        long timestamp = baseTimestamp + idx;
+        return Hint.create(createMutation(idx, TimeUnit.MILLISECONDS.toMicros(timestamp)), timestamp);
+    }
+
+    @BeforeClass
+    public static void defineSchema()
+    {
+        SchemaLoader.prepareServer();
+        SchemaLoader.createKeyspace(KEYSPACE, KeyspaceParams.simple(1), SchemaLoader.standardCFMD(KEYSPACE, TABLE));
+    }
+
+    abstract ImmutableMap<String, Object> params();
+    abstract boolean looksLegit(HintsWriter writer);
+    abstract boolean looksLegit(ChecksummedDataInput checksummedDataInput);
+
+    public void multiFlushAndDeserializeTest() throws Exception
+    {
+        int hintNum = 0;
+        int bufferSize = HintsWriteExecutor.WRITE_BUFFER_SIZE;
+        List<Hint> hints = new LinkedList<>();
+
+        UUID hostId = UUIDGen.getTimeUUID();
+        long ts = System.currentTimeMillis();
+
+        HintsDescriptor descriptor = new HintsDescriptor(hostId, ts, params());
+        File dir = Files.createTempDir();
+        try (HintsWriter writer = HintsWriter.create(dir, descriptor))
+        {
+            Assert.assertTrue(looksLegit(writer));
+
+            ByteBuffer writeBuffer = ByteBuffer.allocateDirect(bufferSize);
+            try (HintsWriter.Session session = writer.newSession(writeBuffer))
+            {
+                while (session.getBytesWritten() < bufferSize * 3)
+                {
+                    Hint hint = createHint(hintNum, ts+hintNum);
+                    session.append(hint);
+                    hints.add(hint);
+                    hintNum++;
+                }
+            }
+        }
+
+        try (HintsReader reader = HintsReader.open(new File(dir, descriptor.fileName())))
+        {
+            Assert.assertTrue(looksLegit(reader.getInput()));
+            List<Hint> deserialized = new ArrayList<>(hintNum);
+
+            for (HintsReader.Page page: reader)
+            {
+                Iterator<Hint> iterator = page.hintsIterator();
+                while (iterator.hasNext())
+                {
+                    deserialized.add(iterator.next());
+                }
+            }
+
+            Assert.assertEquals(hints.size(), deserialized.size());
+            hintNum = 0;
+            for (Hint expected: hints)
+            {
+                HintsTestUtil.assertHintsEqual(expected, deserialized.get(hintNum));
+                hintNum++;
+            }
+        }
+    }
+}
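
AlteredHints is the common template extracted from the old HintsCompressionTest: the base class drives the write-flush-read round trip, and subclasses only supply params() plus the two looksLegit() probes. The shape is ordinary template-method JUnit; a trimmed, Cassandra-free sketch of the same pattern (every name here is illustrative, not part of the real API):

    import java.util.Collections;
    import java.util.Map;

    abstract class RoundTripSketch
    {
        abstract Map<String, Object> params();        // subclass supplies compression/encryption params
        abstract boolean looksLegit(Object writer);   // subclass checks the concrete writer type

        final void writeThenReadBack()
        {
            Object writer = params();                 // placeholder for HintsWriter.create(dir, descriptor)
            if (!looksLegit(writer))
                throw new AssertionError("unexpected writer for " + params());
            // ... append hints until several buffers have flushed, reopen the file
            // and compare, as multiFlushAndDeserializeTest() does above.
        }
    }

    class CompressedRoundTripSketch extends RoundTripSketch
    {
        Map<String, Object> params()      { return Collections.singletonMap("class_name", "LZ4Compressor"); }
        boolean looksLegit(Object writer) { return writer instanceof Map; }

        public static void main(String[] args) { new CompressedRoundTripSketch().writeThenReadBack(); }
    }
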
diff --git a/test/unit/org/apache/cassandra/hints/ChecksummedDataInputTest.java b/test/unit/org/apache/cassandra/hints/ChecksummedDataInputTest.java
index 323a12d..5a48b21 100644
--- a/test/unit/org/apache/cassandra/hints/ChecksummedDataInputTest.java
+++ b/test/unit/org/apache/cassandra/hints/ChecksummedDataInputTest.java
@@ -77,7 +77,7 @@
         // save the buffer to file to create a RAR
         File file = File.createTempFile("testReadMethods", "1");
         file.deleteOnExit();
-        try (SequentialWriter writer = SequentialWriter.open(file))
+        try (SequentialWriter writer = new SequentialWriter(file))
         {
             writer.write(buffer);
             writer.writeInt((int) crc.getValue());
@@ -111,7 +111,7 @@
 
             // assert that the crc matches, and that we've read exactly as many bytes as expected
             assertTrue(reader.checkCrc());
-            assertEquals(0, reader.bytesRemaining());
+            assertTrue(reader.isEOF());
 
             reader.checkLimit(0);
         }
@@ -152,7 +152,7 @@
         // save the buffer to file to create a RAR
         File file = File.createTempFile("testResetCrc", "1");
         file.deleteOnExit();
-        try (SequentialWriter writer = SequentialWriter.open(file))
+        try (SequentialWriter writer = new SequentialWriter(file))
         {
             writer.write(buffer);
             writer.finish();
@@ -177,7 +177,7 @@
             assertEquals(2.2f, reader.readFloat());
             assertEquals(42, reader.readInt());
             assertTrue(reader.checkCrc());
-            assertEquals(0, reader.bytesRemaining());
+            assertTrue(reader.isEOF());
         }
     }
 
@@ -208,7 +208,7 @@
         // save the buffer to file to create a RAR
         File file = File.createTempFile("testFailedCrc", "1");
         file.deleteOnExit();
-        try (SequentialWriter writer = SequentialWriter.open(file))
+        try (SequentialWriter writer = new SequentialWriter(file))
         {
             writer.write(buffer);
             writer.finish();
@@ -227,7 +227,7 @@
             assertEquals(10, reader.readByte());
             assertEquals('t', reader.readChar());
             assertFalse(reader.checkCrc());
-            assertEquals(0, reader.bytesRemaining());
+            assertTrue(reader.isEOF());
         }
     }
 }
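
Two independent API changes run through ChecksummedDataInputTest: SequentialWriter lost its static open(File) factory in favour of a plain constructor, and end-of-input is now asserted with isEOF() rather than by counting remaining bytes. A before/after fragment (not standalone; file, buffer and reader come from the surrounding test):

    // 3.0:
    //   try (SequentialWriter writer = SequentialWriter.open(file)) { ... }
    //   assertEquals(0, reader.bytesRemaining());

    // 3.9:
    try (SequentialWriter writer = new SequentialWriter(file))
    {
        writer.write(buffer);
        writer.finish();
    }
    assertTrue(reader.isEOF());
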
diff --git a/test/unit/org/apache/cassandra/hints/HintTest.java b/test/unit/org/apache/cassandra/hints/HintTest.java
index 1d486e1..658a41c 100644
--- a/test/unit/org/apache/cassandra/hints/HintTest.java
+++ b/test/unit/org/apache/cassandra/hints/HintTest.java
@@ -232,7 +232,7 @@
         // Process hint message.
         HintMessage message = new HintMessage(localId, hint);
         MessagingService.instance().getVerbHandler(MessagingService.Verb.HINT).doVerb(
-                MessageIn.create(local, message, Collections.emptyMap(), MessagingService.Verb.HINT, MessagingService.current_version),
+                MessageIn.create(local, message, Collections.emptyMap(), MessagingService.Verb.HINT, MessagingService.current_version, MessageIn.createTimestamp()),
                 -1);
 
         // hint should not be applied as we no longer are a replica
@@ -277,7 +277,7 @@
             // Process hint message.
             HintMessage message = new HintMessage(localId, hint);
             MessagingService.instance().getVerbHandler(MessagingService.Verb.HINT).doVerb(
-                    MessageIn.create(local, message, Collections.emptyMap(), MessagingService.Verb.HINT, MessagingService.current_version),
+                    MessageIn.create(local, message, Collections.emptyMap(), MessagingService.Verb.HINT, MessagingService.current_version, MessageIn.createTimestamp()),
                     -1);
 
             // hint should not be applied as we no longer are a replica
@@ -331,8 +331,8 @@
     {
         ReadCommand cmd = cmd(key, table);
 
-        try (ReadOrderGroup orderGroup = cmd.startOrderGroup();
-             PartitionIterator iterator = cmd.executeInternal(orderGroup))
+        try (ReadExecutionController executionController = cmd.executionController();
+             PartitionIterator iterator = cmd.executeInternal(executionController))
         {
             assertFalse(iterator.hasNext());
         }
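
HintTest, and the index tests further down, follow the rename of ReadOrderGroup to ReadExecutionController: a ReadCommand now hands out its controller via executionController(), and both the controller and the resulting iterator are managed with try-with-resources. A hedged fragment of the 3.9 shape (cmd is a ReadCommand built elsewhere in the test):

    try (ReadExecutionController controller = cmd.executionController();
         PartitionIterator partitions = cmd.executeInternal(controller))
    {
        // Consume the partitions; both resources are closed in reverse order when the block exits.
        assertFalse(partitions.hasNext());
    }
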
diff --git a/test/unit/org/apache/cassandra/hints/HintsCompressionTest.java b/test/unit/org/apache/cassandra/hints/HintsCompressionTest.java
index d6a08ca..f82db49 100644
--- a/test/unit/org/apache/cassandra/hints/HintsCompressionTest.java
+++ b/test/unit/org/apache/cassandra/hints/HintsCompressionTest.java
@@ -18,65 +18,20 @@
 
 package org.apache.cassandra.hints;
 
-import java.io.File;
-import java.nio.ByteBuffer;
-import java.util.ArrayList;
-import java.util.Iterator;
-import java.util.LinkedList;
-import java.util.List;
-import java.util.UUID;
-import java.util.concurrent.TimeUnit;
-
 import com.google.common.collect.ImmutableMap;
-import com.google.common.io.Files;
-import org.junit.Assert;
-import org.junit.BeforeClass;
 import org.junit.Test;
 
-import org.apache.cassandra.SchemaLoader;
-import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.ParameterizedClass;
-import org.apache.cassandra.config.Schema;
-import org.apache.cassandra.db.Mutation;
-import org.apache.cassandra.db.RowUpdateBuilder;
 import org.apache.cassandra.io.compress.DeflateCompressor;
 import org.apache.cassandra.io.compress.ICompressor;
 import org.apache.cassandra.io.compress.LZ4Compressor;
 import org.apache.cassandra.io.compress.SnappyCompressor;
-import org.apache.cassandra.schema.KeyspaceParams;
-import org.apache.cassandra.utils.UUIDGen;
 
-import static org.apache.cassandra.utils.ByteBufferUtil.bytes;
-
-public class HintsCompressionTest
+public class HintsCompressionTest extends AlteredHints
 {
-    private static final String KEYSPACE = "hints_compression_test";
-    private static final String TABLE = "table";
+    private Class<? extends ICompressor> compressorClass;
 
-
-    private static Mutation createMutation(int index, long timestamp)
-    {
-        CFMetaData table = Schema.instance.getCFMetaData(KEYSPACE, TABLE);
-        return new RowUpdateBuilder(table, timestamp, bytes(index))
-               .clustering(bytes(index))
-               .add("val", bytes(index))
-               .build();
-    }
-
-    private static Hint createHint(int idx, long baseTimestamp)
-    {
-        long timestamp = baseTimestamp + idx;
-        return Hint.create(createMutation(idx, TimeUnit.MILLISECONDS.toMicros(timestamp)), timestamp);
-    }
-
-    @BeforeClass
-    public static void defineSchema()
-    {
-        SchemaLoader.prepareServer();
-        SchemaLoader.createKeyspace(KEYSPACE, KeyspaceParams.simple(1), SchemaLoader.standardCFMD(KEYSPACE, TABLE));
-    }
-
-    private ImmutableMap<String, Object> params(Class<? extends ICompressor> compressorClass)
+    ImmutableMap<String, Object> params()
     {
         ImmutableMap<String, Object> compressionParams = ImmutableMap.<String, Object>builder()
                                                                      .put(ParameterizedClass.CLASS_NAME, compressorClass.getSimpleName())
@@ -86,72 +41,40 @@
                            .build();
     }
 
-    public void multiFlushAndDeserializeTest(Class<? extends ICompressor> compressorClass) throws Exception
+    boolean looksLegit(HintsWriter writer)
     {
-        int hintNum = 0;
-        int bufferSize = HintsWriteExecutor.WRITE_BUFFER_SIZE;
-        List<Hint> hints = new LinkedList<>();
+        if (!(writer instanceof CompressedHintsWriter))
+            return false;
+        CompressedHintsWriter compressedHintsWriter = (CompressedHintsWriter)writer;
+        return compressedHintsWriter.getCompressor().getClass().isAssignableFrom(compressorClass);
+    }
 
-        UUID hostId = UUIDGen.getTimeUUID();
-        long ts = System.currentTimeMillis();
-
-        HintsDescriptor descriptor = new HintsDescriptor(hostId, ts, params(compressorClass));
-        File dir = Files.createTempDir();
-        try (HintsWriter writer = HintsWriter.create(dir, descriptor))
-        {
-            assert writer instanceof CompressedHintsWriter;
-
-            ByteBuffer writeBuffer = ByteBuffer.allocateDirect(bufferSize);
-            try (HintsWriter.Session session = writer.newSession(writeBuffer))
-            {
-                while (session.getBytesWritten() < bufferSize * 3)
-                {
-                    Hint hint = createHint(hintNum, ts+hintNum);
-                    session.append(hint);
-                    hints.add(hint);
-                    hintNum++;
-                }
-            }
-        }
-
-        try (HintsReader reader = HintsReader.open(new File(dir, descriptor.fileName())))
-        {
-            List<Hint> deserialized = new ArrayList<>(hintNum);
-
-            for (HintsReader.Page page: reader)
-            {
-                Iterator<Hint> iterator = page.hintsIterator();
-                while (iterator.hasNext())
-                {
-                    deserialized.add(iterator.next());
-                }
-            }
-
-            Assert.assertEquals(hints.size(), deserialized.size());
-            hintNum = 0;
-            for (Hint expected: hints)
-            {
-                HintsTestUtil.assertHintsEqual(expected, deserialized.get(hintNum));
-                hintNum++;
-            }
-        }
+    boolean looksLegit(ChecksummedDataInput checksummedDataInput)
+    {
+        if (!(checksummedDataInput instanceof CompressedChecksummedDataInput))
+            return false;
+        CompressedChecksummedDataInput compressedChecksummedDataInput = (CompressedChecksummedDataInput)checksummedDataInput;
+        return compressedChecksummedDataInput.getCompressor().getClass().isAssignableFrom(compressorClass);
     }
 
     @Test
     public void lz4Compressor() throws Exception
     {
-        multiFlushAndDeserializeTest(LZ4Compressor.class);
+        compressorClass = LZ4Compressor.class;
+        multiFlushAndDeserializeTest();
     }
 
     @Test
     public void snappyCompressor() throws Exception
     {
-        multiFlushAndDeserializeTest(SnappyCompressor.class);
+        compressorClass = SnappyCompressor.class;
+        multiFlushAndDeserializeTest();
     }
 
     @Test
     public void deflateCompressor() throws Exception
     {
-        multiFlushAndDeserializeTest(DeflateCompressor.class);
+        compressorClass = DeflateCompressor.class;
+        multiFlushAndDeserializeTest();
     }
 }
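
After the refactoring, HintsCompressionTest keeps only the compressor-specific pieces: a field selects the ICompressor implementation, params() wraps it into the descriptor's compression map, and the looksLegit() probes accept the writer or input when its compressor class matches. The match uses isAssignableFrom, which reads backwards at first glance; a tiny standalone illustration of its direction:

    // X.class.isAssignableFrom(Y.class) is true when Y is X or a subtype of X.
    public class AssignableSketch
    {
        public static void main(String[] args)
        {
            System.out.println(Number.class.isAssignableFrom(Integer.class)); // true
            System.out.println(Integer.class.isAssignableFrom(Number.class)); // false
            // The test above compares the concrete compressor the same way:
            // actual.getCompressor().getClass().isAssignableFrom(expectedCompressorClass).
        }
    }
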
diff --git a/test/unit/org/apache/cassandra/hints/HintsEncryptionTest.java b/test/unit/org/apache/cassandra/hints/HintsEncryptionTest.java
new file mode 100644
index 0000000..beb95d1
--- /dev/null
+++ b/test/unit/org/apache/cassandra/hints/HintsEncryptionTest.java
@@ -0,0 +1,81 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.hints;
+
+import java.util.Arrays;
+
+import javax.crypto.Cipher;
+
+import com.google.common.collect.ImmutableMap;
+import org.junit.Before;
+import org.junit.Test;
+
+import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.security.EncryptionContext;
+import org.apache.cassandra.security.EncryptionContextGenerator;
+
+public class HintsEncryptionTest extends AlteredHints
+{
+    EncryptionContext encryptionContext;
+    Cipher cipher;
+
+    @Before
+    public void setup()
+    {
+        encryptionContext = EncryptionContextGenerator.createContext(true);
+        DatabaseDescriptor.setEncryptionContext(encryptionContext);
+    }
+
+    @Test
+    public void encryptedHints() throws Exception
+    {
+        multiFlushAndDeserializeTest();
+    }
+
+    boolean looksLegit(HintsWriter writer)
+    {
+        if (!(writer instanceof EncryptedHintsWriter))
+            return false;
+
+        EncryptedHintsWriter encryptedHintsWriter = (EncryptedHintsWriter)writer;
+        cipher = encryptedHintsWriter.getCipher();
+
+        return encryptedHintsWriter.getCompressor().getClass().isAssignableFrom(encryptionContext.getCompressor().getClass());
+    }
+
+    boolean looksLegit(ChecksummedDataInput checksummedDataInput)
+    {
+        if (!(checksummedDataInput instanceof EncryptedChecksummedDataInput))
+            return false;
+
+        EncryptedChecksummedDataInput encryptedDataInput = (EncryptedChecksummedDataInput)checksummedDataInput;
+
+        return Arrays.equals(cipher.getIV(), encryptedDataInput.getCipher().getIV()) &&
+               encryptedDataInput.getCompressor().getClass().isAssignableFrom(encryptionContext.getCompressor().getClass());
+    }
+
+    ImmutableMap<String, Object> params()
+    {
+        ImmutableMap<String, Object> compressionParams = ImmutableMap.<String, Object>builder()
+                                                         .putAll(encryptionContext.toHeaderParameters())
+                                                         .build();
+        return ImmutableMap.<String, Object>builder()
+               .put(HintsDescriptor.ENCRYPTION, compressionParams)
+               .build();
+    }
+}
diff --git a/test/unit/org/apache/cassandra/index/CustomIndexTest.java b/test/unit/org/apache/cassandra/index/CustomIndexTest.java
index f02823c..a8ec0b3 100644
--- a/test/unit/org/apache/cassandra/index/CustomIndexTest.java
+++ b/test/unit/org/apache/cassandra/index/CustomIndexTest.java
@@ -41,7 +41,7 @@
 import org.apache.cassandra.cql3.statements.ModificationStatement;
 import org.apache.cassandra.db.ColumnFamilyStore;
 import org.apache.cassandra.db.ReadCommand;
-import org.apache.cassandra.db.ReadOrderGroup;
+import org.apache.cassandra.db.ReadExecutionController;
 import org.apache.cassandra.db.marshal.AbstractType;
 import org.apache.cassandra.db.marshal.Int32Type;
 import org.apache.cassandra.db.marshal.UTF8Type;
@@ -555,8 +555,8 @@
         assertEquals(0, index.partitionDeletions.size());
 
         ReadCommand cmd = Util.cmd(cfs, 0).build();
-        try (ReadOrderGroup orderGroup = cmd.startOrderGroup();
-             UnfilteredPartitionIterator iterator = cmd.executeLocally(orderGroup))
+        try (ReadExecutionController executionController = cmd.executionController();
+             UnfilteredPartitionIterator iterator = cmd.executeLocally(executionController))
         {
             assertTrue(iterator.hasNext());
             cfs.indexManager.deletePartition(iterator.next(), FBUtilities.nowInSeconds());
diff --git a/test/unit/org/apache/cassandra/index/StubIndex.java b/test/unit/org/apache/cassandra/index/StubIndex.java
index 28ea097..5f9d1f3 100644
--- a/test/unit/org/apache/cassandra/index/StubIndex.java
+++ b/test/unit/org/apache/cassandra/index/StubIndex.java
@@ -194,7 +194,7 @@
 
     public Searcher searcherFor(final ReadCommand command)
     {
-        return (orderGroup) -> Util.executeLocally((PartitionRangeReadCommand)command, baseCfs, orderGroup);
+        return (controller) -> Util.executeLocally((PartitionRangeReadCommand)command, baseCfs, controller);
     }
 
     public BiFunction<PartitionIterator, ReadCommand, PartitionIterator> postProcessorFor(ReadCommand readCommand)
diff --git a/test/unit/org/apache/cassandra/index/internal/CassandraIndexTest.java b/test/unit/org/apache/cassandra/index/internal/CassandraIndexTest.java
index 2845c19..e3994ef 100644
--- a/test/unit/org/apache/cassandra/index/internal/CassandraIndexTest.java
+++ b/test/unit/org/apache/cassandra/index/internal/CassandraIndexTest.java
@@ -502,8 +502,8 @@
                                                                                indexKey,
                                                                                ColumnFilter.all(indexCfs.metadata),
                                                                                filter);
-        try (ReadOrderGroup orderGroup = ReadOrderGroup.forCommand(command);
-             UnfilteredRowIterator iter = command.queryMemtableAndDisk(indexCfs, orderGroup.indexReadOpOrderGroup()))
+        try (ReadExecutionController executionController = command.executionController();
+             UnfilteredRowIterator iter = command.queryMemtableAndDisk(indexCfs, executionController))
         {
             while( iter.hasNext())
             {
diff --git a/test/unit/org/apache/cassandra/index/internal/CustomCassandraIndex.java b/test/unit/org/apache/cassandra/index/internal/CustomCassandraIndex.java
index 2124abe..1173801 100644
--- a/test/unit/org/apache/cassandra/index/internal/CustomCassandraIndex.java
+++ b/test/unit/org/apache/cassandra/index/internal/CustomCassandraIndex.java
@@ -28,6 +28,7 @@
 import java.util.stream.Collectors;
 import java.util.stream.StreamSupport;
 
+import org.apache.cassandra.index.TargetParser;
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
@@ -60,7 +61,6 @@
 
 import static org.apache.cassandra.index.internal.CassandraIndex.getFunctions;
 import static org.apache.cassandra.index.internal.CassandraIndex.indexCfsMetadata;
-import static org.apache.cassandra.index.internal.CassandraIndex.parseTarget;
 
 /**
  * Clone of KeysIndex used in CassandraIndexTest#testCustomIndexWithCFS to verify
@@ -159,7 +159,7 @@
     private void setMetadata(IndexMetadata indexDef)
     {
         metadata = indexDef;
-        Pair<ColumnDefinition, IndexTarget.Type> target = parseTarget(baseCfs.metadata, indexDef);
+        Pair<ColumnDefinition, IndexTarget.Type> target = TargetParser.parse(baseCfs.metadata, indexDef);
         functions = getFunctions(indexDef, target);
         CFMetaData cfm = indexCfsMetadata(baseCfs.metadata, indexDef);
         indexCfs = ColumnFamilyStore.createColumnFamilyStore(baseCfs.keyspace,
@@ -376,7 +376,7 @@
                 insert(key.getKey(),
                        clustering,
                        cell,
-                       LivenessInfo.create(cell.timestamp(), cell.ttl(), cell.localDeletionTime()),
+                       LivenessInfo.withExpirationTime(cell.timestamp(), cell.ttl(), cell.localDeletionTime()),
                        opGroup);
             }
 
@@ -424,7 +424,7 @@
                         }
                     }
                 }
-                return LivenessInfo.create(baseCfs.metadata, timestamp, ttl, nowInSec);
+                return LivenessInfo.create(timestamp, ttl, nowInSec);
             }
         };
     }
@@ -640,9 +640,9 @@
                         metadata.name,
                         getSSTableNames(sstables));
 
-            SecondaryIndexBuilder builder = new SecondaryIndexBuilder(baseCfs,
-                                                                      Collections.singleton(this),
-                                                                      new ReducingKeyIterator(sstables));
+            SecondaryIndexBuilder builder = new CollatedViewIndexBuilder(baseCfs,
+                                                                         Collections.singleton(this),
+                                                                         new ReducingKeyIterator(sstables));
             Future<?> future = CompactionManager.instance.submitIndexBuild(builder);
             FBUtilities.waitOnFuture(future);
             indexCfs.forceBlockingFlush();
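
CustomCassandraIndex picks up three parallel moves: index-target parsing now lives in TargetParser.parse, expiring liveness is built with LivenessInfo.withExpirationTime, and rebuilds go through CollatedViewIndexBuilder. A short fragment of the 3.9 calls exactly as used above (baseCfs, indexDef, cell, sstables and this belong to the surrounding class; not standalone):

    Pair<ColumnDefinition, IndexTarget.Type> target = TargetParser.parse(baseCfs.metadata, indexDef);
    LivenessInfo info = LivenessInfo.withExpirationTime(cell.timestamp(), cell.ttl(), cell.localDeletionTime());
    SecondaryIndexBuilder builder = new CollatedViewIndexBuilder(baseCfs,
                                                                 Collections.singleton(this),
                                                                 new ReducingKeyIterator(sstables));
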
diff --git a/test/unit/org/apache/cassandra/index/sasi/SASIIndexTest.java b/test/unit/org/apache/cassandra/index/sasi/SASIIndexTest.java
new file mode 100644
index 0000000..46d1a3c
--- /dev/null
+++ b/test/unit/org/apache/cassandra/index/sasi/SASIIndexTest.java
@@ -0,0 +1,2424 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi;
+
+import java.nio.ByteBuffer;
+import java.util.*;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.ThreadLocalRandom;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicInteger;
+
+import org.apache.cassandra.SchemaLoader;
+import org.apache.cassandra.config.CFMetaData;
+import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.cql3.*;
+import org.apache.cassandra.cql3.Term;
+import org.apache.cassandra.cql3.statements.IndexTarget;
+import org.apache.cassandra.cql3.statements.SelectStatement;
+import org.apache.cassandra.db.*;
+import org.apache.cassandra.db.filter.ColumnFilter;
+import org.apache.cassandra.db.filter.DataLimits;
+import org.apache.cassandra.db.filter.RowFilter;
+import org.apache.cassandra.db.marshal.*;
+import org.apache.cassandra.db.partitions.PartitionUpdate;
+import org.apache.cassandra.db.partitions.UnfilteredPartitionIterator;
+import org.apache.cassandra.db.rows.*;
+import org.apache.cassandra.dht.IPartitioner;
+import org.apache.cassandra.dht.Murmur3Partitioner;
+import org.apache.cassandra.dht.Range;
+import org.apache.cassandra.exceptions.ConfigurationException;
+import org.apache.cassandra.exceptions.InvalidRequestException;
+import org.apache.cassandra.exceptions.SyntaxException;
+import org.apache.cassandra.index.sasi.conf.ColumnIndex;
+import org.apache.cassandra.index.sasi.disk.OnDiskIndexBuilder;
+import org.apache.cassandra.index.sasi.exceptions.TimeQuotaExceededException;
+import org.apache.cassandra.index.sasi.memory.IndexMemtable;
+import org.apache.cassandra.index.sasi.plan.QueryController;
+import org.apache.cassandra.index.sasi.plan.QueryPlan;
+import org.apache.cassandra.schema.IndexMetadata;
+import org.apache.cassandra.schema.KeyspaceMetadata;
+import org.apache.cassandra.schema.KeyspaceParams;
+import org.apache.cassandra.schema.Tables;
+import org.apache.cassandra.serializers.MarshalException;
+import org.apache.cassandra.serializers.TypeSerializer;
+import org.apache.cassandra.service.MigrationManager;
+import org.apache.cassandra.service.QueryState;
+import org.apache.cassandra.thrift.CqlRow;
+import org.apache.cassandra.transport.messages.ResultMessage;
+import org.apache.cassandra.utils.ByteBufferUtil;
+import org.apache.cassandra.utils.FBUtilities;
+import org.apache.cassandra.utils.Pair;
+
+import com.google.common.collect.Lists;
+import com.google.common.util.concurrent.Uninterruptibles;
+
+import junit.framework.Assert;
+
+import org.junit.*;
+
+public class SASIIndexTest
+{
+    private static final IPartitioner PARTITIONER;
+
+    static {
+        System.setProperty("cassandra.config", "cassandra-murmur.yaml");
+        PARTITIONER = Murmur3Partitioner.instance;
+    }
+
+    private static final String KS_NAME = "sasi";
+    private static final String CF_NAME = "test_cf";
+    private static final String CLUSTERING_CF_NAME_1 = "clustering_test_cf_1";
+    private static final String CLUSTERING_CF_NAME_2 = "clustering_test_cf_2";
+    private static final String STATIC_CF_NAME = "static_sasi_test_cf";
+    private static final String FTS_CF_NAME = "full_text_search_sasi_test_cf";
+
+    @BeforeClass
+    public static void loadSchema() throws ConfigurationException
+    {
+        SchemaLoader.loadSchema();
+        MigrationManager.announceNewKeyspace(KeyspaceMetadata.create(KS_NAME,
+                                                                     KeyspaceParams.simpleTransient(1),
+                                                                     Tables.of(SchemaLoader.sasiCFMD(KS_NAME, CF_NAME),
+                                                                               SchemaLoader.clusteringSASICFMD(KS_NAME, CLUSTERING_CF_NAME_1),
+                                                                               SchemaLoader.clusteringSASICFMD(KS_NAME, CLUSTERING_CF_NAME_2, "location"),
+                                                                               SchemaLoader.staticSASICFMD(KS_NAME, STATIC_CF_NAME),
+                                                                               SchemaLoader.fullTextSearchSASICFMD(KS_NAME, FTS_CF_NAME))));
+    }
+
+    @Before
+    public void cleanUp()
+    {
+        Keyspace.open(KS_NAME).getColumnFamilyStore(CF_NAME).truncateBlocking();
+    }
+
+    @Test
+    public void testSingleExpressionQueries() throws Exception
+    {
+        testSingleExpressionQueries(false);
+        cleanupData();
+        testSingleExpressionQueries(true);
+    }
+
+    private void testSingleExpressionQueries(boolean forceFlush) throws Exception
+    {
+        Map<String, Pair<String, Integer>> data = new HashMap<String, Pair<String, Integer>>()
+        {{
+            put("key1", Pair.create("Pavel", 14));
+            put("key2", Pair.create("Pavel", 26));
+            put("key3", Pair.create("Pavel", 27));
+            put("key4", Pair.create("Jason", 27));
+        }};
+
+        ColumnFamilyStore store = loadData(data, forceFlush);
+
+        final ByteBuffer firstName = UTF8Type.instance.decompose("first_name");
+        final ByteBuffer age = UTF8Type.instance.decompose("age");
+
+        Set<String> rows;
+
+        rows = getIndexed(store, 10, buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1", "key2", "key3", "key4" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("av")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1", "key2", "key3" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("as")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key4" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("aw")));
+        Assert.assertEquals(rows.toString(), 0, rows.size());
+
+        rows = getIndexed(store, 10, buildExpression(firstName, Operator.LIKE_SUFFIX, UTF8Type.instance.decompose("avel")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1", "key2", "key3" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(firstName, Operator.LIKE_SUFFIX, UTF8Type.instance.decompose("n")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key4" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(age, Operator.EQ, Int32Type.instance.decompose(27)));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[]{"key3", "key4"}, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(age, Operator.EQ, Int32Type.instance.decompose(26)));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key2" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(age, Operator.EQ, Int32Type.instance.decompose(13)));
+        Assert.assertEquals(rows.toString(), 0, rows.size());
+    }
+
+    @Test
+    public void testEmptyTokenizedResults() throws Exception
+    {
+        testEmptyTokenizedResults(false);
+        cleanupData();
+        testEmptyTokenizedResults(true);
+    }
+
+    private void testEmptyTokenizedResults(boolean forceFlush) throws Exception
+    {
+        Map<String, Pair<String, Integer>> data = new HashMap<String, Pair<String, Integer>>()
+        {{
+                put("key1", Pair.create("  ", 14));
+        }};
+
+        ColumnFamilyStore store = loadData(data, forceFlush);
+
+        Set<String> rows= getIndexed(store, 10, buildExpression(UTF8Type.instance.decompose("first_name"), Operator.LIKE_MATCHES, UTF8Type.instance.decompose("doesntmatter")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[]{}, rows.toArray(new String[rows.size()])));
+    }
+
+    @Test
+    public void testMultiExpressionQueries() throws Exception
+    {
+        testMultiExpressionQueries(false);
+        cleanupData();
+        testMultiExpressionQueries(true);
+    }
+
+    public void testMultiExpressionQueries(boolean forceFlush) throws Exception
+    {
+        Map<String, Pair<String, Integer>> data = new HashMap<String, Pair<String, Integer>>()
+        {{
+                put("key1", Pair.create("Pavel", 14));
+                put("key2", Pair.create("Pavel", 26));
+                put("key3", Pair.create("Pavel", 27));
+                put("key4", Pair.create("Jason", 27));
+        }};
+
+        ColumnFamilyStore store = loadData(data, forceFlush);
+
+        final ByteBuffer firstName = UTF8Type.instance.decompose("first_name");
+        final ByteBuffer age = UTF8Type.instance.decompose("age");
+
+        Set<String> rows;
+        rows = getIndexed(store, 10,
+                          buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")),
+                          buildExpression(age, Operator.GT, Int32Type.instance.decompose(14)));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key2", "key3", "key4" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10,
+                          buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")),
+                          buildExpression(age, Operator.LT, Int32Type.instance.decompose(27)));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[]{"key1", "key2"}, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10,
+                         buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")),
+                         buildExpression(age, Operator.GT, Int32Type.instance.decompose(14)),
+                         buildExpression(age, Operator.LT, Int32Type.instance.decompose(27)));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key2" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10,
+                         buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")),
+                         buildExpression(age, Operator.GT, Int32Type.instance.decompose(12)));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[]{ "key1", "key2", "key3", "key4" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10,
+                         buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")),
+                         buildExpression(age, Operator.GTE, Int32Type.instance.decompose(13)));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1", "key2", "key3", "key4" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10,
+                         buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")),
+                         buildExpression(age, Operator.GTE, Int32Type.instance.decompose(16)));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key2", "key3", "key4" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10,
+                         buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")),
+                         buildExpression(age, Operator.LT, Int32Type.instance.decompose(30)));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1", "key2", "key3", "key4" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10,
+                         buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")),
+                         buildExpression(age, Operator.LTE, Int32Type.instance.decompose(29)));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1", "key2", "key3", "key4" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10,
+                         buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")),
+                         buildExpression(age, Operator.LTE, Int32Type.instance.decompose(25)));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[]{ "key1" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(firstName, Operator.LIKE_SUFFIX, UTF8Type.instance.decompose("avel")),
+                                     buildExpression(age, Operator.LTE, Int32Type.instance.decompose(25)));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(firstName, Operator.LIKE_SUFFIX, UTF8Type.instance.decompose("n")),
+                                     buildExpression(age, Operator.LTE, Int32Type.instance.decompose(25)));
+        Assert.assertTrue(rows.isEmpty());
+    }
+
+    @Test
+    public void testCrossSSTableQueries() throws Exception
+    {
+        testCrossSSTableQueries(false);
+        cleanupData();
+        testCrossSSTableQueries(true);
+    }
+
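+    // data is loaded in three batches so that, when forceFlush is true, matches have to be
+    // merged across three separate sstable indexes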
+    private void testCrossSSTableQueries(boolean forceFlush) throws Exception
+    {
+        Map<String, Pair<String, Integer>> part1 = new HashMap<String, Pair<String, Integer>>()
+        {{
+                put("key0", Pair.create("Maxie", 43));
+                put("key1", Pair.create("Chelsie", 33));
+                put("key2", Pair.create("Josephine", 43));
+                put("key3", Pair.create("Shanna", 27));
+                put("key4", Pair.create("Amiya", 36));
+            }};
+
+        loadData(part1, forceFlush); // first sstable
+
+        Map<String, Pair<String, Integer>> part2 = new HashMap<String, Pair<String, Integer>>()
+        {{
+                put("key5", Pair.create("Americo", 20));
+                put("key6", Pair.create("Fiona", 39));
+                put("key7", Pair.create("Francis", 41));
+                put("key8", Pair.create("Charley", 21));
+                put("key9", Pair.create("Amely", 40));
+            }};
+
+        loadData(part2, forceFlush);
+
+        Map<String, Pair<String, Integer>> part3 = new HashMap<String, Pair<String, Integer>>()
+        {{
+                put("key10", Pair.create("Eddie", 42));
+                put("key11", Pair.create("Oswaldo", 35));
+                put("key12", Pair.create("Susana", 35));
+                put("key13", Pair.create("Alivia", 42));
+                put("key14", Pair.create("Demario", 28));
+            }};
+
+        ColumnFamilyStore store = loadData(part3, forceFlush);
+
+        final ByteBuffer firstName = UTF8Type.instance.decompose("first_name");
+        final ByteBuffer age = UTF8Type.instance.decompose("age");
+
+        Set<String> rows;
+        rows = getIndexed(store, 10, buildExpression(firstName, Operator.EQ, UTF8Type.instance.decompose("Fiona")),
+                                     buildExpression(age, Operator.LT, Int32Type.instance.decompose(40)));
+
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key6" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10,
+                          buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")));
+
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key0", "key11", "key12", "key13", "key14",
+                                                                        "key3", "key4", "key6", "key7", "key8" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 5,
+                          buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")));
+
+        Assert.assertEquals(rows.toString(), 5, rows.size());
+
+        rows = getIndexed(store, 10,
+                          buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")),
+                          buildExpression(age, Operator.GTE, Int32Type.instance.decompose(35)));
+
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key0", "key11", "key12", "key13", "key4", "key6", "key7" },
+                                                         rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10,
+                          buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")),
+                          buildExpression(age, Operator.LT, Int32Type.instance.decompose(32)));
+
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key14", "key3", "key8" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10,
+                          buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")),
+                          buildExpression(age, Operator.GT, Int32Type.instance.decompose(27)),
+                          buildExpression(age, Operator.LT, Int32Type.instance.decompose(32)));
+
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key14" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10,
+                          buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")),
+                          buildExpression(age, Operator.GT, Int32Type.instance.decompose(10)));
+
+        Assert.assertEquals(rows.toString(), 10, rows.size());
+
+        rows = getIndexed(store, 10,
+                          buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")),
+                          buildExpression(age, Operator.LTE, Int32Type.instance.decompose(50)));
+
+        Assert.assertEquals(rows.toString(), 10, rows.size());
+
+        rows = getIndexed(store, 10,
+                          buildExpression(firstName, Operator.LIKE_SUFFIX, UTF8Type.instance.decompose("ie")),
+                          buildExpression(age, Operator.LT, Int32Type.instance.decompose(43)));
+
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1", "key10" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10,
+                          buildExpression(firstName, Operator.LIKE_SUFFIX, UTF8Type.instance.decompose("a")));
+
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key12", "key13", "key3", "key4", "key6" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10,
+                          buildExpression(firstName, Operator.LIKE_SUFFIX, UTF8Type.instance.decompose("a")),
+                          buildExpression(age, Operator.LT, Int32Type.instance.decompose(33)));
+
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key3" }, rows.toArray(new String[rows.size()])));
+    }
+
+    @Test
+    public void testQueriesThatShouldBeTokenized() throws Exception
+    {
+        testQueriesThatShouldBeTokenized(false);
+        cleanupData();
+        testQueriesThatShouldBeTokenized(true);
+    }
+
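+    // first_name values here are full sentences; both stored values and query terms are expected
+    // to be tokenized by the analyzer, so multi-word LIKE_CONTAINS queries match per word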
+    private void testQueriesThatShouldBeTokenized(boolean forceFlush) throws Exception
+    {
+        Map<String, Pair<String, Integer>> part1 = new HashMap<String, Pair<String, Integer>>()
+        {{
+                put("key0", Pair.create("If you can dream it, you can do it.", 43));
+                put("key1", Pair.create("What you get by achieving your goals is not " +
+                        "as important as what you become by achieving your goals, do it.", 33));
+                put("key2", Pair.create("Keep your face always toward the sunshine " +
+                        "- and shadows will fall behind you.", 43));
+                put("key3", Pair.create("We can't help everyone, but everyone can " +
+                        "help someone.", 27));
+            }};
+
+        ColumnFamilyStore store = loadData(part1, forceFlush);
+
+        final ByteBuffer firstName = UTF8Type.instance.decompose("first_name");
+        final ByteBuffer age = UTF8Type.instance.decompose("age");
+
+        Set<String> rows = getIndexed(store, 10,
+                buildExpression(firstName, Operator.LIKE_CONTAINS,
+                        UTF8Type.instance.decompose("What you get by achieving your goals")),
+                buildExpression(age, Operator.GT, Int32Type.instance.decompose(32)));
+
+        Assert.assertEquals(rows.toString(), Collections.singleton("key1"), rows);
+
+        rows = getIndexed(store, 10,
+                buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("do it.")));
+
+        Assert.assertEquals(rows.toString(), Arrays.asList("key0", "key1"), Lists.newArrayList(rows));
+    }
+
+    @Test
+    public void testPrefixSearchWithContainsMode() throws Exception
+    {
+        testPrefixSearchWithContainsMode(false);
+        cleanupData();
+        testPrefixSearchWithContainsMode(true);
+    }
+
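+    // prefix-style LIKE ('lady%') against the artist column of the full-text-search table,
+    // which is indexed in CONTAINS mode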
+    private void testPrefixSearchWithContainsMode(boolean forceFlush) throws Exception
+    {
+        ColumnFamilyStore store = Keyspace.open(KS_NAME).getColumnFamilyStore(FTS_CF_NAME);
+
+        executeCQL(FTS_CF_NAME, "INSERT INTO %s.%s (song_id, title, artist) VALUES(?, ?, ?)", UUID.fromString("1a4abbcd-b5de-4c69-a578-31231e01ff09"), "Poker Face", "Lady Gaga");
+        executeCQL(FTS_CF_NAME, "INSERT INTO %s.%s (song_id, title, artist) VALUES(?, ?, ?)", UUID.fromString("9472a394-359b-4a06-b1d5-b6afce590598"), "Forgetting the Way Home", "Our Lady of Bells");
+        executeCQL(FTS_CF_NAME, "INSERT INTO %s.%s (song_id, title, artist) VALUES(?, ?, ?)", UUID.fromString("4f8dc18e-54e6-4e16-b507-c5324b61523b"), "Zamki na piasku", "Lady Pank");
+        executeCQL(FTS_CF_NAME, "INSERT INTO %s.%s (song_id, title, artist) VALUES(?, ?, ?)", UUID.fromString("eaf294fa-bad5-49d4-8f08-35ba3636a706"), "Koncertowa", "Lady Pank");
+
+        if (forceFlush)
+            store.forceBlockingFlush();
+
+        final UntypedResultSet results = executeCQL(FTS_CF_NAME, "SELECT * FROM %s.%s WHERE artist LIKE 'lady%%'");
+        Assert.assertNotNull(results);
+        Assert.assertEquals(3, results.size());
+    }
+
+    @Test
+    public void testMultiExpressionQueriesWhereRowSplitBetweenSSTables() throws Exception
+    {
+        testMultiExpressionQueriesWhereRowSplitBetweenSSTables(false);
+        cleanupData();
+        testMultiExpressionQueriesWhereRowSplitBetweenSSTables(true);
+    }
+
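+    // the same partition keys are written across several batches with different columns populated,
+    // so queries have to merge index matches from multiple sstables (and the memtable) per row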
+    private void testMultiExpressionQueriesWhereRowSplitBetweenSSTables(boolean forceFlush) throws Exception
+    {
+        Map<String, Pair<String, Integer>> part1 = new HashMap<String, Pair<String, Integer>>()
+        {{
+                put("key0", Pair.create("Maxie", -1));
+                put("key1", Pair.create("Chelsie", 33));
+                put("key2", Pair.create((String)null, 43));
+                put("key3", Pair.create("Shanna", 27));
+                put("key4", Pair.create("Amiya", 36));
+        }};
+
+        loadData(part1, forceFlush); // first sstable
+
+        Map<String, Pair<String, Integer>> part2 = new HashMap<String, Pair<String, Integer>>()
+        {{
+                put("key5", Pair.create("Americo", 20));
+                put("key6", Pair.create("Fiona", 39));
+                put("key7", Pair.create("Francis", 41));
+                put("key8", Pair.create("Charley", 21));
+                put("key9", Pair.create("Amely", 40));
+                put("key14", Pair.create((String)null, 28));
+        }};
+
+        loadData(part2, forceFlush);
+
+        Map<String, Pair<String, Integer>> part3 = new HashMap<String, Pair<String, Integer>>()
+        {{
+                put("key0", Pair.create((String)null, 43));
+                put("key10", Pair.create("Eddie", 42));
+                put("key11", Pair.create("Oswaldo", 35));
+                put("key12", Pair.create("Susana", 35));
+                put("key13", Pair.create("Alivia", 42));
+                put("key14", Pair.create("Demario", -1));
+                put("key2", Pair.create("Josephine", -1));
+        }};
+
+        ColumnFamilyStore store = loadData(part3, forceFlush);
+
+        final ByteBuffer firstName = UTF8Type.instance.decompose("first_name");
+        final ByteBuffer age = UTF8Type.instance.decompose("age");
+
+        Set<String> rows = getIndexed(store, 10,
+                                      buildExpression(firstName, Operator.EQ, UTF8Type.instance.decompose("Fiona")),
+                                      buildExpression(age, Operator.LT, Int32Type.instance.decompose(40)));
+
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key6" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10,
+                          buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")));
+
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key0", "key11", "key12", "key13", "key14",
+                                                                        "key3", "key4", "key6", "key7", "key8" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 5,
+                          buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")));
+
+        Assert.assertEquals(rows.toString(), 5, rows.size());
+
+        rows = getIndexed(store, 10,
+                          buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")),
+                          buildExpression(age, Operator.GTE, Int32Type.instance.decompose(35)));
+
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key0", "key11", "key12", "key13", "key4", "key6", "key7" },
+                                                         rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10,
+                          buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")),
+                          buildExpression(age, Operator.LT, Int32Type.instance.decompose(32)));
+
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key14", "key3", "key8" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10,
+                          buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")),
+                          buildExpression(age, Operator.GT, Int32Type.instance.decompose(27)),
+                          buildExpression(age, Operator.LT, Int32Type.instance.decompose(32)));
+
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key14" }, rows.toArray(new String[rows.size()])));
+
+        Map<String, Pair<String, Integer>> part4 = new HashMap<String, Pair<String, Integer>>()
+        {{
+                put("key12", Pair.create((String)null, 12));
+                put("key14", Pair.create("Demario", 42));
+                put("key2", Pair.create("Frank", -1));
+        }};
+
+        store = loadData(part4, forceFlush);
+
+        rows = getIndexed(store, 10,
+                          buildExpression(firstName, Operator.LIKE_MATCHES, UTF8Type.instance.decompose("Susana")),
+                          buildExpression(age, Operator.LTE, Int32Type.instance.decompose(13)),
+                          buildExpression(age, Operator.GT, Int32Type.instance.decompose(10)));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key12" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10,
+                          buildExpression(firstName, Operator.LIKE_MATCHES, UTF8Type.instance.decompose("Demario")),
+                          buildExpression(age, Operator.LTE, Int32Type.instance.decompose(30)));
+        Assert.assertTrue(rows.toString(), rows.isEmpty());
+
+        rows = getIndexed(store, 10,
+                          buildExpression(firstName, Operator.LIKE_MATCHES, UTF8Type.instance.decompose("Josephine")));
+        Assert.assertTrue(rows.toString(), rows.isEmpty());
+
+        rows = getIndexed(store, 10,
+                          buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")),
+                          buildExpression(age, Operator.GT, Int32Type.instance.decompose(10)));
+
+        Assert.assertEquals(rows.toString(), 10, rows.size());
+
+        rows = getIndexed(store, 10,
+                          buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")),
+                          buildExpression(age, Operator.LTE, Int32Type.instance.decompose(50)));
+
+        Assert.assertEquals(rows.toString(), 10, rows.size());
+
+        rows = getIndexed(store, 10,
+                          buildExpression(firstName, Operator.LIKE_SUFFIX, UTF8Type.instance.decompose("ie")),
+                          buildExpression(age, Operator.LTE, Int32Type.instance.decompose(43)));
+
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key0", "key1", "key10" }, rows.toArray(new String[rows.size()])));
+    }
+
+    @Test
+    public void testPagination() throws Exception
+    {
+        testPagination(false);
+        cleanupData();
+        testPagination(true);
+    }
+
+    private void testPagination(boolean forceFlush) throws Exception
+    {
+        // split data into 3 distinct SSTables to test paging with overlapping token intervals.
+
+        Map<String, Pair<String, Integer>> part1 = new HashMap<String, Pair<String, Integer>>()
+        {{
+                put("key01", Pair.create("Ali", 33));
+                put("key02", Pair.create("Jeremy", 41));
+                put("key03", Pair.create("Elvera", 22));
+                put("key04", Pair.create("Bailey", 45));
+                put("key05", Pair.create("Emerson", 32));
+                put("key06", Pair.create("Kadin", 38));
+                put("key07", Pair.create("Maggie", 36));
+                put("key08", Pair.create("Kailey", 36));
+                put("key09", Pair.create("Armand", 21));
+                put("key10", Pair.create("Arnold", 35));
+        }};
+
+        Map<String, Pair<String, Integer>> part2 = new HashMap<String, Pair<String, Integer>>()
+        {{
+                put("key11", Pair.create("Ken", 38));
+                put("key12", Pair.create("Penelope", 43));
+                put("key13", Pair.create("Wyatt", 34));
+                put("key14", Pair.create("Johnpaul", 34));
+                put("key15", Pair.create("Trycia", 43));
+                put("key16", Pair.create("Aida", 21));
+                put("key17", Pair.create("Devon", 42));
+        }};
+
+        Map<String, Pair<String, Integer>> part3 = new HashMap<String, Pair<String, Integer>>()
+        {{
+                put("key18", Pair.create("Christina", 20));
+                put("key19", Pair.create("Rick", 19));
+                put("key20", Pair.create("Fannie", 22));
+                put("key21", Pair.create("Keegan", 29));
+                put("key22", Pair.create("Ignatius", 36));
+                put("key23", Pair.create("Ellis", 26));
+                put("key24", Pair.create("Annamarie", 29));
+                put("key25", Pair.create("Tianna", 31));
+                put("key26", Pair.create("Dennis", 32));
+        }};
+
+        ColumnFamilyStore store = loadData(part1, forceFlush);
+
+        loadData(part2, forceFlush);
+        loadData(part3, forceFlush);
+
+        final ByteBuffer firstName = UTF8Type.instance.decompose("first_name");
+        final ByteBuffer age = UTF8Type.instance.decompose("age");
+
+        Set<DecoratedKey> uniqueKeys = getPaged(store, 4,
+                buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")),
+                buildExpression(age, Operator.GTE, Int32Type.instance.decompose(21)));
+
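+        // paged range reads return partitions in token (partitioner) order,
+        // hence the non-lexical key ordering below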
+        List<String> expected = new ArrayList<String>()
+        {{
+                add("key25");
+                add("key20");
+                add("key13");
+                add("key22");
+                add("key09");
+                add("key14");
+                add("key16");
+                add("key24");
+                add("key03");
+                add("key04");
+                add("key08");
+                add("key07");
+                add("key15");
+                add("key06");
+                add("key21");
+        }};
+
+        Assert.assertEquals(expected, convert(uniqueKeys));
+
+        // now let's test a single LIKE_CONTAINS condition
+
+        uniqueKeys = getPaged(store, 4, buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")));
+
+        expected = new ArrayList<String>()
+        {{
+                add("key25");
+                add("key20");
+                add("key13");
+                add("key22");
+                add("key09");
+                add("key14");
+                add("key16");
+                add("key24");
+                add("key03");
+                add("key04");
+                add("key18");
+                add("key08");
+                add("key07");
+                add("key15");
+                add("key06");
+                add("key21");
+        }};
+
+        Assert.assertEquals(expected, convert(uniqueKeys));
+
+        // now let's test something which is smaller than a single page
+        uniqueKeys = getPaged(store, 4,
+                              buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")),
+                              buildExpression(age, Operator.EQ, Int32Type.instance.decompose(36)));
+
+        expected = new ArrayList<String>()
+        {{
+                add("key22");
+                add("key08");
+                add("key07");
+        }};
+
+        Assert.assertEquals(expected, convert(uniqueKeys));
+
+        // the same but with the page size of 2 to test minimal pagination windows
+
+        uniqueKeys = getPaged(store, 2,
+                              buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")),
+                              buildExpression(age, Operator.EQ, Int32Type.instance.decompose(36)));
+
+        Assert.assertEquals(expected, convert(uniqueKeys));
+
+        // and last but not least, test age range query with pagination
+        uniqueKeys = getPaged(store, 4,
+                buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")),
+                buildExpression(age, Operator.GT, Int32Type.instance.decompose(20)),
+                buildExpression(age, Operator.LTE, Int32Type.instance.decompose(36)));
+
+        expected = new ArrayList<String>()
+        {{
+                add("key25");
+                add("key20");
+                add("key13");
+                add("key22");
+                add("key09");
+                add("key14");
+                add("key16");
+                add("key24");
+                add("key03");
+                add("key08");
+                add("key07");
+                add("key21");
+        }};
+
+        Assert.assertEquals(expected, convert(uniqueKeys));
+
+        Set<String> rows;
+
+        rows = executeCQLWithKeys(String.format("SELECT * FROM %s.%s WHERE first_name LIKE '%%a%%' limit 10 ALLOW FILTERING;", KS_NAME, CF_NAME));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key03", "key04", "key09", "key13", "key14", "key16", "key20", "key22", "key24", "key25" }, rows.toArray(new String[rows.size()])));
+
+        rows = executeCQLWithKeys(String.format("SELECT * FROM %s.%s WHERE first_name LIKE '%%a%%' and token(id) >= token('key14') limit 5 ALLOW FILTERING;", KS_NAME, CF_NAME));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key03", "key04", "key14", "key16", "key24" }, rows.toArray(new String[rows.size()])));
+
+        rows = executeCQLWithKeys(String.format("SELECT * FROM %s.%s WHERE first_name LIKE '%%a%%' and token(id) >= token('key14') and token(id) <= token('key24') limit 5 ALLOW FILTERING;", KS_NAME, CF_NAME));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key14", "key16", "key24" }, rows.toArray(new String[rows.size()])));
+
+        rows = executeCQLWithKeys(String.format("SELECT * FROM %s.%s WHERE first_name LIKE '%%a%%' and age > 30 and token(id) >= token('key14') and token(id) <= token('key24') limit 5 ALLOW FILTERING;", KS_NAME, CF_NAME));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key14" }, rows.toArray(new String[rows.size()])));
+
+        rows = executeCQLWithKeys(String.format("SELECT * FROM %s.%s WHERE first_name like '%%ie' limit 5 ALLOW FILTERING;", KS_NAME, CF_NAME));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key07", "key20", "key24" }, rows.toArray(new String[rows.size()])));
+
+        rows = executeCQLWithKeys(String.format("SELECT * FROM %s.%s WHERE first_name like '%%ie' AND token(id) > token('key24') limit 5 ALLOW FILTERING;", KS_NAME, CF_NAME));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key07", "key24" }, rows.toArray(new String[rows.size()])));
+    }
+
+    @Test
+    public void testColumnNamesWithSlashes() throws Exception
+    {
+        testColumnNamesWithSlashes(false);
+        cleanupData();
+        testColumnNamesWithSlashes(true);
+    }
+
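+    // uses a column whose name contains '/' characters and checks indexing,
+    // invalidation and rebuild against it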
+    private void testColumnNamesWithSlashes(boolean forceFlush) throws Exception
+    {
+        ColumnFamilyStore store = Keyspace.open(KS_NAME).getColumnFamilyStore(CF_NAME);
+
+        Mutation rm1 = new Mutation(KS_NAME, decoratedKey(AsciiType.instance.decompose("key1")));
+        rm1.add(PartitionUpdate.singleRowUpdate(store.metadata,
+                                                rm1.key(),
+                                                buildRow(buildCell(store.metadata,
+                                                                   UTF8Type.instance.decompose("/data/output/id"),
+                                                                   AsciiType.instance.decompose("jason"),
+                                                                   System.currentTimeMillis()))));
+
+        Mutation rm2 = new Mutation(KS_NAME, decoratedKey(AsciiType.instance.decompose("key2")));
+        rm2.add(PartitionUpdate.singleRowUpdate(store.metadata,
+                                                rm2.key(),
+                                                buildRow(buildCell(store.metadata,
+                                                                   UTF8Type.instance.decompose("/data/output/id"),
+                                                                   AsciiType.instance.decompose("pavel"),
+                                                                   System.currentTimeMillis()))));
+
+        Mutation rm3 = new Mutation(KS_NAME, decoratedKey(AsciiType.instance.decompose("key3")));
+        rm3.add(PartitionUpdate.singleRowUpdate(store.metadata,
+                                                rm3.key(),
+                                                buildRow(buildCell(store.metadata,
+                                                                   UTF8Type.instance.decompose("/data/output/id"),
+                                                                   AsciiType.instance.decompose("Aleksey"),
+                                                                   System.currentTimeMillis()))));
+
+        rm1.apply();
+        rm2.apply();
+        rm3.apply();
+
+        if (forceFlush)
+            store.forceBlockingFlush();
+
+        final ByteBuffer dataOutputId = UTF8Type.instance.decompose("/data/output/id");
+
+        Set<String> rows = getIndexed(store, 10, buildExpression(dataOutputId, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1", "key2" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(dataOutputId, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("A")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[]{ "key3" }, rows.toArray(new String[rows.size()])));
+
+        // doesn't really make sense to rebuild index for in-memory data
+        if (!forceFlush)
+            return;
+
+        store.indexManager.invalidateAllIndexesBlocking();
+
+        rows = getIndexed(store, 10, buildExpression(dataOutputId, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")));
+        Assert.assertTrue(rows.toString(), rows.isEmpty());
+
+        rows = getIndexed(store, 10, buildExpression(dataOutputId, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("A")));
+        Assert.assertTrue(rows.toString(), rows.isEmpty());
+
+        // now let's trigger index rebuild and check if we got the data back
+        store.indexManager.buildIndexBlocking(store.indexManager.getIndexByName("data_output_id"));
+
+        rows = getIndexed(store, 10, buildExpression(dataOutputId, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1", "key2" }, rows.toArray(new String[rows.size()])));
+
+        // also let's try to build an index for a column which has no data, to make sure that doesn't fail
+        store.indexManager.buildIndexBlocking(store.indexManager.getIndexByName("first_name"));
+        store.indexManager.buildIndexBlocking(store.indexManager.getIndexByName("data_output_id"));
+
+        rows = getIndexed(store, 10, buildExpression(dataOutputId, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1", "key2" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(dataOutputId, Operator.LIKE_SUFFIX, UTF8Type.instance.decompose("el")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key2" }, rows.toArray(new String[rows.size()])));
+    }
+
+    @Test
+    public void testInvalidate() throws Exception
+    {
+        testInvalidate(false);
+        cleanupData();
+        testInvalidate(true);
+    }
+
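+    // after invalidateAllIndexesBlocking() previously indexed data must no longer be returned,
+    // while data written afterwards is indexed and searchable again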
+    private void testInvalidate(boolean forceFlush) throws Exception
+    {
+        Map<String, Pair<String, Integer>> part1 = new HashMap<String, Pair<String, Integer>>()
+        {{
+                put("key0", Pair.create("Maxie", -1));
+                put("key1", Pair.create("Chelsie", 33));
+                put("key2", Pair.create((String) null, 43));
+                put("key3", Pair.create("Shanna", 27));
+                put("key4", Pair.create("Amiya", 36));
+        }};
+
+        ColumnFamilyStore store = loadData(part1, forceFlush);
+
+        final ByteBuffer firstName = UTF8Type.instance.decompose("first_name");
+        final ByteBuffer age = UTF8Type.instance.decompose("age");
+
+        Set<String> rows = getIndexed(store, 10, buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[]{ "key0", "key3", "key4" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(age, Operator.EQ, Int32Type.instance.decompose(33)));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[]{ "key1" }, rows.toArray(new String[rows.size()])));
+
+        store.indexManager.invalidateAllIndexesBlocking();
+
+        rows = getIndexed(store, 10, buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")));
+        Assert.assertTrue(rows.toString(), rows.isEmpty());
+
+        rows = getIndexed(store, 10, buildExpression(age, Operator.EQ, Int32Type.instance.decompose(33)));
+        Assert.assertTrue(rows.toString(), rows.isEmpty());
+
+        Map<String, Pair<String, Integer>> part2 = new HashMap<String, Pair<String, Integer>>()
+        {{
+                put("key5", Pair.create("Americo", 20));
+                put("key6", Pair.create("Fiona", 39));
+                put("key7", Pair.create("Francis", 41));
+                put("key8", Pair.create("Fred", 21));
+                put("key9", Pair.create("Amely", 40));
+                put("key14", Pair.create("Dino", 28));
+        }};
+
+        loadData(part2, forceFlush);
+
+        rows = getIndexed(store, 10, buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[]{ "key6", "key7" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(age, Operator.EQ, Int32Type.instance.decompose(40)));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[]{ "key9" }, rows.toArray(new String[rows.size()])));
+    }
+
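+    // rows are written with explicit timestamps (1000/2000/3000) so that
+    // truncateAllIndexesBlocking(truncatedAt) only drops entries written before the given position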
+    @Test
+    public void testTruncate()
+    {
+        Map<String, Pair<String, Integer>> part1 = new HashMap<String, Pair<String, Integer>>()
+        {{
+                put("key01", Pair.create("Ali", 33));
+                put("key02", Pair.create("Jeremy", 41));
+                put("key03", Pair.create("Elvera", 22));
+                put("key04", Pair.create("Bailey", 45));
+                put("key05", Pair.create("Emerson", 32));
+                put("key06", Pair.create("Kadin", 38));
+                put("key07", Pair.create("Maggie", 36));
+                put("key08", Pair.create("Kailey", 36));
+                put("key09", Pair.create("Armand", 21));
+                put("key10", Pair.create("Arnold", 35));
+        }};
+
+        Map<String, Pair<String, Integer>> part2 = new HashMap<String, Pair<String, Integer>>()
+        {{
+                put("key11", Pair.create("Ken", 38));
+                put("key12", Pair.create("Penelope", 43));
+                put("key13", Pair.create("Wyatt", 34));
+                put("key14", Pair.create("Johnpaul", 34));
+                put("key15", Pair.create("Trycia", 43));
+                put("key16", Pair.create("Aida", 21));
+                put("key17", Pair.create("Devon", 42));
+        }};
+
+        Map<String, Pair<String, Integer>> part3 = new HashMap<String, Pair<String, Integer>>()
+        {{
+                put("key18", Pair.create("Christina", 20));
+                put("key19", Pair.create("Rick", 19));
+                put("key20", Pair.create("Fannie", 22));
+                put("key21", Pair.create("Keegan", 29));
+                put("key22", Pair.create("Ignatius", 36));
+                put("key23", Pair.create("Ellis", 26));
+                put("key24", Pair.create("Annamarie", 29));
+                put("key25", Pair.create("Tianna", 31));
+                put("key26", Pair.create("Dennis", 32));
+        }};
+
+        ColumnFamilyStore store = loadData(part1, 1000, true);
+
+        loadData(part2, 2000, true);
+        loadData(part3, 3000, true);
+
+        final ByteBuffer firstName = UTF8Type.instance.decompose("first_name");
+
+        Set<String> rows = getIndexed(store, 100, buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")));
+        Assert.assertEquals(rows.toString(), 16, rows.size());
+
+        // make sure we don't prematurely delete anything
+        store.indexManager.truncateAllIndexesBlocking(500);
+
+        rows = getIndexed(store, 100, buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")));
+        Assert.assertEquals(rows.toString(), 16, rows.size());
+
+        store.indexManager.truncateAllIndexesBlocking(1500);
+
+        rows = getIndexed(store, 100, buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")));
+        Assert.assertEquals(rows.toString(), 10, rows.size());
+
+        store.indexManager.truncateAllIndexesBlocking(2500);
+
+        rows = getIndexed(store, 100, buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")));
+        Assert.assertEquals(rows.toString(), 6, rows.size());
+
+        store.indexManager.truncateAllIndexesBlocking(3500);
+
+        rows = getIndexed(store, 100, buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")));
+        Assert.assertEquals(rows.toString(), 0, rows.size());
+
+        // add back in some data just to make sure it all still works
+        Map<String, Pair<String, Integer>> part4 = new HashMap<String, Pair<String, Integer>>()
+        {{
+                put("key40", Pair.create("Tianna", 31));
+                put("key41", Pair.create("Dennis", 32));
+        }};
+
+        loadData(part4, 4000, true);
+
+        rows = getIndexed(store, 100, buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")));
+        Assert.assertEquals(rows.toString(), 1, rows.size());
+    }
+
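+    // submits writeCount mutations from a thread pool while querying concurrently; the number of
+    // results must never decrease and must reach writeCount once all writes have been applied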
+    @Test
+    public void testConcurrentMemtableReadsAndWrites() throws Exception
+    {
+        final ColumnFamilyStore store = Keyspace.open(KS_NAME).getColumnFamilyStore(CF_NAME);
+
+        ExecutorService scheduler = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
+
+        final int writeCount = 10000;
+        final AtomicInteger updates = new AtomicInteger(0);
+
+        for (int i = 0; i < writeCount; i++)
+        {
+            final String key = "key" + i;
+            final String firstName = "first_name#" + i;
+            final String lastName = "last_name#" + i;
+
+            scheduler.submit((Runnable) () -> {
+                try
+                {
+                    newMutation(key, firstName, lastName, 26, System.currentTimeMillis()).apply();
+                    Uninterruptibles.sleepUninterruptibly(5, TimeUnit.MILLISECONDS); // back up a bit to do more reads
+                }
+                finally
+                {
+                    updates.incrementAndGet();
+                }
+            });
+        }
+
+        final ByteBuffer firstName = UTF8Type.instance.decompose("first_name");
+        final ByteBuffer age = UTF8Type.instance.decompose("age");
+
+        int previousCount = 0;
+
+        do
+        {
+            // this loop checks that the number of search results is monotonically increasing,
+            // to make sure that concurrent updates don't interfere with reads; it uses the first_name
+            // and age indexes to test correctness of both the Trie and SkipList ColumnIndex implementations.
+
+            Set<DecoratedKey> rows = getPaged(store, 100, buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")),
+                                                          buildExpression(age, Operator.EQ, Int32Type.instance.decompose(26)));
+
+            Assert.assertTrue(previousCount <= rows.size());
+            previousCount = rows.size();
+        }
+        while (updates.get() < writeCount);
+
+        // make sure that after all of the writes are done we can read back all writeCount rows
+        Set<DecoratedKey> rows = getPaged(store, 100, buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")),
+                                                      buildExpression(age, Operator.EQ, Int32Type.instance.decompose(26)));
+
+        Assert.assertEquals(writeCount, rows.size());
+    }
+
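+    // the same keys are written to two sstables and then to the memtable; queries must resolve
+    // each column to its most recent value (e.g. key1's age becomes 15, key4's becomes 29)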
+    @Test
+    public void testSameKeyInMemtableAndSSTables()
+    {
+        final ByteBuffer firstName = UTF8Type.instance.decompose("first_name");
+        final ByteBuffer age = UTF8Type.instance.decompose("age");
+
+        Map<String, Pair<String, Integer>> data1 = new HashMap<String, Pair<String, Integer>>()
+        {{
+                put("key1", Pair.create("Pavel", 14));
+                put("key2", Pair.create("Pavel", 26));
+                put("key3", Pair.create("Pavel", 27));
+                put("key4", Pair.create("Jason", 27));
+        }};
+
+        ColumnFamilyStore store = loadData(data1, true);
+
+        Map<String, Pair<String, Integer>> data2 = new HashMap<String, Pair<String, Integer>>()
+        {{
+                put("key1", Pair.create("Pavel", 14));
+                put("key2", Pair.create("Pavel", 27));
+                put("key4", Pair.create("Jason", 28));
+        }};
+
+        loadData(data2, true);
+
+        Map<String, Pair<String, Integer>> data3 = new HashMap<String, Pair<String, Integer>>()
+        {{
+                put("key1", Pair.create("Pavel", 15));
+                put("key4", Pair.create("Jason", 29));
+        }};
+
+        loadData(data3, false);
+
+        Set<String> rows = getIndexed(store, 100, buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1", "key2", "key3", "key4" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 100, buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")),
+                                      buildExpression(age, Operator.EQ, Int32Type.instance.decompose(15)));
+
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 100, buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")),
+                                      buildExpression(age, Operator.EQ, Int32Type.instance.decompose(29)));
+
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key4" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 100, buildExpression(firstName, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("a")),
+                                      buildExpression(age, Operator.EQ, Int32Type.instance.decompose(27)));
+
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[]{"key2", "key3"}, rows.toArray(new String[rows.size()])));
+    }
+
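+    // writes a long into the int-typed age column; the malformed value should simply be skipped by the index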
+    @Test
+    public void testInsertingIncorrectValuesIntoAgeIndex()
+    {
+        ColumnFamilyStore store = Keyspace.open(KS_NAME).getColumnFamilyStore(CF_NAME);
+
+        final ByteBuffer firstName = UTF8Type.instance.decompose("first_name");
+        final ByteBuffer age = UTF8Type.instance.decompose("age");
+
+        Mutation rm = new Mutation(KS_NAME, decoratedKey(AsciiType.instance.decompose("key1")));
+        update(rm, new ArrayList<Cell>()
+        {{
+            add(buildCell(age, LongType.instance.decompose(26L), System.currentTimeMillis()));
+            add(buildCell(firstName, AsciiType.instance.decompose("pavel"), System.currentTimeMillis()));
+        }});
+        rm.apply();
+
+        store.forceBlockingFlush();
+
+        Set<String> rows = getIndexed(store, 10, buildExpression(firstName, Operator.EQ, UTF8Type.instance.decompose("a")),
+                                                 buildExpression(age, Operator.GTE, Int32Type.instance.decompose(26)));
+
+        // the index is expected to have 0 results because the age value was of the wrong type
+        Assert.assertEquals(0, rows.size());
+    }
+
+    @Test
+    public void testUnicodeSupport()
+    {
+        testUnicodeSupport(false);
+        cleanupData();
+        testUnicodeSupport(true);
+    }
+
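+    // CONTAINS, SUFFIX and MATCHES queries against multi-byte UTF-8 values
+    // (enclosed alphanumerics, CJK ideographs, katakana)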
+    private void testUnicodeSupport(boolean forceFlush)
+    {
+        ColumnFamilyStore store = Keyspace.open(KS_NAME).getColumnFamilyStore(CF_NAME);
+
+        final ByteBuffer comment = UTF8Type.instance.decompose("comment");
+
+        Mutation rm = new Mutation(KS_NAME, decoratedKey("key1"));
+        update(rm, comment, UTF8Type.instance.decompose("ⓈⓅⒺⒸⒾⒶⓁ ⒞⒣⒜⒭⒮ and normal ones"), System.currentTimeMillis());
+        rm.apply();
+
+        rm = new Mutation(KS_NAME, decoratedKey("key2"));
+        update(rm, comment, UTF8Type.instance.decompose("龍馭鬱"), System.currentTimeMillis());
+        rm.apply();
+
+        rm = new Mutation(KS_NAME, decoratedKey("key3"));
+        update(rm, comment, UTF8Type.instance.decompose("インディアナ"), System.currentTimeMillis());
+        rm.apply();
+
+        rm = new Mutation(KS_NAME, decoratedKey("key4"));
+        update(rm, comment, UTF8Type.instance.decompose("レストラン"), System.currentTimeMillis());
+        rm.apply();
+
+        rm = new Mutation(KS_NAME, decoratedKey("key5"));
+        update(rm, comment, UTF8Type.instance.decompose("ベンジャミン ウエスト"), System.currentTimeMillis());
+        rm.apply();
+
+        if (forceFlush)
+            store.forceBlockingFlush();
+
+        Set<String> rows;
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("ⓈⓅⒺⒸⒾ")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("normal")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("龍")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key2" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("鬱")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key2" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("馭鬱")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key2" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("龍馭鬱")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key2" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("ベンジャミン")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key5" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("レストラ")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key4" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("インディ")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key3" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("ベンジャミ")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key5" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_SUFFIX, UTF8Type.instance.decompose("ン")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key4", "key5" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_MATCHES, UTF8Type.instance.decompose("レストラン")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key4" }, rows.toArray(new String[rows.size()])));
+    }
+
+    @Test
+    public void testUnicodeSuffixModeNoSplits()
+    {
+        testUnicodeSuffixModeNoSplits(false);
+        cleanupData();
+        testUnicodeSuffixModeNoSplits(true);
+    }
+
+    private void testUnicodeSuffixModeNoSplits(boolean forceFlush)
+    {
+        ColumnFamilyStore store = Keyspace.open(KS_NAME).getColumnFamilyStore(CF_NAME);
+
+        final ByteBuffer comment = UTF8Type.instance.decompose("comment_suffix_split");
+
+        Mutation rm = new Mutation(KS_NAME, decoratedKey("key1"));
+        update(rm, comment, UTF8Type.instance.decompose("龍馭鬱"), System.currentTimeMillis());
+        rm.apply();
+
+        rm = new Mutation(KS_NAME, decoratedKey("key2"));
+        update(rm, comment, UTF8Type.instance.decompose("インディアナ"), System.currentTimeMillis());
+        rm.apply();
+
+        rm = new Mutation(KS_NAME, decoratedKey("key3"));
+        update(rm, comment, UTF8Type.instance.decompose("レストラン"), System.currentTimeMillis());
+        rm.apply();
+
+        rm = new Mutation(KS_NAME, decoratedKey("key4"));
+        update(rm, comment, UTF8Type.instance.decompose("ベンジャミン ウエスト"), System.currentTimeMillis());
+        rm.apply();
+
+        if (forceFlush)
+            store.forceBlockingFlush();
+
+        Set<String> rows;
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("龍")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("鬱")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("馭鬱")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("龍馭鬱")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("ベンジャミン")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key4" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("トラン")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key3" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("ディア")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key2" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("ジャミン")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key4" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("ン")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key2", "key3", "key4" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_SUFFIX, UTF8Type.instance.decompose("ン")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key3" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_MATCHES, UTF8Type.instance.decompose("ベンジャミン ウエスト")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key4" }, rows.toArray(new String[rows.size()])));
+    }
+
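+    // values larger than OnDiskIndexBuilder.MAX_TERM_SIZE are expected to be dropped
+    // by the index, both before and after flushing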
+    @Test
+    public void testThatTooBigValueIsRejected()
+    {
+        ColumnFamilyStore store = Keyspace.open(KS_NAME).getColumnFamilyStore(CF_NAME);
+
+        final ByteBuffer comment = UTF8Type.instance.decompose("comment_suffix_split");
+
+        for (int i = 0; i < 10; i++)
+        {
+            byte[] randomBytes = new byte[ThreadLocalRandom.current().nextInt(OnDiskIndexBuilder.MAX_TERM_SIZE, 5 * OnDiskIndexBuilder.MAX_TERM_SIZE)];
+            ThreadLocalRandom.current().nextBytes(randomBytes);
+
+            final ByteBuffer bigValue = UTF8Type.instance.decompose(new String(randomBytes));
+
+            Mutation rm = new Mutation(KS_NAME, decoratedKey("key1"));
+            update(rm, comment, bigValue, System.currentTimeMillis());
+            rm.apply();
+
+            Set<String> rows;
+
+            rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_MATCHES, bigValue.duplicate()));
+            Assert.assertEquals(0, rows.size());
+
+            store.forceBlockingFlush();
+
+            rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_MATCHES, bigValue.duplicate()));
+            Assert.assertEquals(0, rows.size());
+        }
+    }
+
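+    // a QueryPlan given a zero time quota must fail with TimeQuotaExceededException,
+    // while the same query completes within the regular range RPC timeout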
+    @Test
+    public void testSearchTimeouts() throws Exception
+    {
+        final ByteBuffer firstName = UTF8Type.instance.decompose("first_name");
+
+        Map<String, Pair<String, Integer>> data1 = new HashMap<String, Pair<String, Integer>>()
+        {{
+                put("key1", Pair.create("Pavel", 14));
+                put("key2", Pair.create("Pavel", 26));
+                put("key3", Pair.create("Pavel", 27));
+                put("key4", Pair.create("Jason", 27));
+        }};
+
+        ColumnFamilyStore store = loadData(data1, true);
+
+        RowFilter filter = RowFilter.create();
+        filter.add(store.metadata.getColumnDefinition(firstName), Operator.LIKE_CONTAINS, AsciiType.instance.fromString("a"));
+
+        ReadCommand command = new PartitionRangeReadCommand(store.metadata,
+                                                            FBUtilities.nowInSeconds(),
+                                                            ColumnFilter.all(store.metadata),
+                                                            filter,
+                                                            DataLimits.NONE,
+                                                            DataRange.allData(store.metadata.partitioner),
+                                                            Optional.empty());
+
+        try
+        {
+            new QueryPlan(store, command, 0).execute(ReadExecutionController.empty());
+            Assert.fail();
+        }
+        catch (TimeQuotaExceededException e)
+        {
+            // correct behavior
+        }
+        catch (Exception e)
+        {
+            e.printStackTrace();
+            Assert.fail();
+        }
+
+        // make sure that the query doesn't fail under normal conditions
+
+        try (ReadExecutionController controller = command.executionController())
+        {
+            Set<String> rows = getKeys(new QueryPlan(store, command, DatabaseDescriptor.getRangeRpcTimeout()).execute(controller));
+            Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1", "key2", "key3", "key4" }, rows.toArray(new String[rows.size()])));
+        }
+    }
+
+    @Test
+    public void testLowerCaseAnalyzer()
+    {
+        testLowerCaseAnalyzer(false);
+        cleanupData();
+        testLowerCaseAnalyzer(true);
+    }
+
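+    // exact (EQ) and LIKE_PREFIX queries against multi-byte CJK full names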
+    @Test
+    public void testChinesePrefixSearch()
+    {
+        ColumnFamilyStore store = Keyspace.open(KS_NAME).getColumnFamilyStore(CF_NAME);
+
+        final ByteBuffer fullName = UTF8Type.instance.decompose("/output/full-name/");
+
+        Mutation rm = new Mutation(KS_NAME, decoratedKey("key1"));
+        update(rm, fullName, UTF8Type.instance.decompose("美加 八田"), System.currentTimeMillis());
+        rm.apply();
+
+        rm = new Mutation(KS_NAME, decoratedKey("key2"));
+        update(rm, fullName, UTF8Type.instance.decompose("仁美 瀧澤"), System.currentTimeMillis());
+        rm.apply();
+
+        rm = new Mutation(KS_NAME, decoratedKey("key3"));
+        update(rm, fullName, UTF8Type.instance.decompose("晃宏 高須"), System.currentTimeMillis());
+        rm.apply();
+
+        rm = new Mutation(KS_NAME, decoratedKey("key4"));
+        update(rm, fullName, UTF8Type.instance.decompose("弘孝 大竹"), System.currentTimeMillis());
+        rm.apply();
+
+        rm = new Mutation(KS_NAME, decoratedKey("key5"));
+        update(rm, fullName, UTF8Type.instance.decompose("満枝 榎本"), System.currentTimeMillis());
+        rm.apply();
+
+        rm = new Mutation(KS_NAME, decoratedKey("key6"));
+        update(rm, fullName, UTF8Type.instance.decompose("飛鳥 上原"), System.currentTimeMillis());
+        rm.apply();
+
+        rm = new Mutation(KS_NAME, decoratedKey("key7"));
+        update(rm, fullName, UTF8Type.instance.decompose("大輝 鎌田"), System.currentTimeMillis());
+        rm.apply();
+
+        rm = new Mutation(KS_NAME, decoratedKey("key8"));
+        update(rm, fullName, UTF8Type.instance.decompose("利久 寺地"), System.currentTimeMillis());
+        rm.apply();
+
+        store.forceBlockingFlush();
+
+
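+        // both exact (EQ) and LIKE_PREFIX expressions should match multi-byte UTF-8 terms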
+        Set<String> rows;
+
+        rows = getIndexed(store, 10, buildExpression(fullName, Operator.EQ, UTF8Type.instance.decompose("美加 八田")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(fullName, Operator.LIKE_PREFIX, UTF8Type.instance.decompose("美加")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(fullName, Operator.EQ, UTF8Type.instance.decompose("晃宏 高須")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key3" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(fullName, Operator.LIKE_PREFIX, UTF8Type.instance.decompose("大輝")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key7" }, rows.toArray(new String[rows.size()])));
+    }
+
+    public void testLowerCaseAnalyzer(boolean forceFlush)
+    {
+        ColumnFamilyStore store = Keyspace.open(KS_NAME).getColumnFamilyStore(CF_NAME);
+
+        final ByteBuffer comment = UTF8Type.instance.decompose("address");
+
+        Mutation rm = new Mutation(KS_NAME, decoratedKey("key1"));
+        update(rm, comment, UTF8Type.instance.decompose("577 Rogahn Valleys Apt. 178"), System.currentTimeMillis());
+        rm.apply();
+
+        rm = new Mutation(KS_NAME, decoratedKey("key2"));
+        update(rm, comment, UTF8Type.instance.decompose("89809 Beverly Course Suite 089"), System.currentTimeMillis());
+        rm.apply();
+
+        rm = new Mutation(KS_NAME, decoratedKey("key3"));
+        update(rm, comment, UTF8Type.instance.decompose("165 clydie oval apt. 399"), System.currentTimeMillis());
+        rm.apply();
+
+        if (forceFlush)
+            store.forceBlockingFlush();
+
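+        // prefix lookups should match regardless of the case used in the query term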
+        Set<String> rows;
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_PREFIX, UTF8Type.instance.decompose("577 Rogahn Valleys")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_PREFIX, UTF8Type.instance.decompose("577 ROgAhn VallEYs")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_PREFIX, UTF8Type.instance.decompose("577 rogahn valleys")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_PREFIX, UTF8Type.instance.decompose("577 rogahn")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_PREFIX, UTF8Type.instance.decompose("57")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_PREFIX, UTF8Type.instance.decompose("89809 Beverly Course")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key2" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_PREFIX, UTF8Type.instance.decompose("89809 BEVERly COURSE")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key2" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_PREFIX, UTF8Type.instance.decompose("89809 beverly course")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key2" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_PREFIX, UTF8Type.instance.decompose("89809 Beverly")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key2" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_PREFIX, UTF8Type.instance.decompose("8980")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key2" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_PREFIX, UTF8Type.instance.decompose("165 ClYdie OvAl APT. 399")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key3" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_PREFIX, UTF8Type.instance.decompose("165 Clydie Oval Apt. 399")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key3" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_PREFIX, UTF8Type.instance.decompose("165 clydie oval apt. 399")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key3" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_PREFIX, UTF8Type.instance.decompose("165 ClYdie OvA")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key3" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_PREFIX, UTF8Type.instance.decompose("165 ClYdi")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key3" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(comment, Operator.LIKE_PREFIX, UTF8Type.instance.decompose("165")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key3" }, rows.toArray(new String[rows.size()])));
+    }
+
+    @Test
+    public void testPrefixSSTableLookup()
+    {
+        // This test covers a particular case in which an interval lookup can return invalid results
+        // when queried on a prefix, e.g. "j".
+        ColumnFamilyStore store = Keyspace.open(KS_NAME).getColumnFamilyStore(CF_NAME);
+
+        final ByteBuffer name = UTF8Type.instance.decompose("first_name_prefix");
+
+        Mutation rm;
+
+        rm = new Mutation(KS_NAME, decoratedKey("key1"));
+        update(rm, name, UTF8Type.instance.decompose("Pavel"), System.currentTimeMillis());
+        rm.apply();
+
+        rm = new Mutation(KS_NAME, decoratedKey("key2"));
+        update(rm, name, UTF8Type.instance.decompose("Jordan"), System.currentTimeMillis());
+        rm.apply();
+
+        rm = new Mutation(KS_NAME, decoratedKey("key3"));
+        update(rm, name, UTF8Type.instance.decompose("Mikhail"), System.currentTimeMillis());
+        rm.apply();
+
+        rm = new Mutation(KS_NAME, decoratedKey("key4"));
+        update(rm, name, UTF8Type.instance.decompose("Michael"), System.currentTimeMillis());
+        rm.apply();
+
+        rm = new Mutation(KS_NAME, decoratedKey("key5"));
+        update(rm, name, UTF8Type.instance.decompose("Johnny"), System.currentTimeMillis());
+        rm.apply();
+
+        // first flush would make interval for name - 'johnny' -> 'pavel'
+        store.forceBlockingFlush();
+
+        rm = new Mutation(KS_NAME, decoratedKey("key6"));
+        update(rm, name, UTF8Type.instance.decompose("Jason"), System.currentTimeMillis());
+        rm.apply();
+
+        rm = new Mutation(KS_NAME, decoratedKey("key7"));
+        update(rm, name, UTF8Type.instance.decompose("Vijay"), System.currentTimeMillis());
+        rm.apply();
+
+        rm = new Mutation(KS_NAME, decoratedKey("key8")); // this name is going to be tokenized
+        update(rm, name, UTF8Type.instance.decompose("Jean-Claude"), System.currentTimeMillis());
+        rm.apply();
+
+        // this flush is going to produce range - 'jason' -> 'vijay'
+        store.forceBlockingFlush();
+
+        // make sure that overlapping prefixes are properly handled across sstables;
+        // a simple interval tree lookup is not going to cover this, so an actual prefix lookup is required.
+
+        Set<String> rows;
+
+        rows = getIndexed(store, 10, buildExpression(name, Operator.LIKE_PREFIX, UTF8Type.instance.decompose("J")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key2", "key5", "key6", "key8"}, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(name, Operator.LIKE_PREFIX, UTF8Type.instance.decompose("j")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key2", "key5", "key6", "key8" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(name, Operator.LIKE_PREFIX, UTF8Type.instance.decompose("m")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key3", "key4" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(name, Operator.LIKE_PREFIX, UTF8Type.instance.decompose("v")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key7" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(name, Operator.LIKE_PREFIX, UTF8Type.instance.decompose("p")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(name, Operator.LIKE_PREFIX, UTF8Type.instance.decompose("j")),
+                                     buildExpression(name, Operator.NEQ, UTF8Type.instance.decompose("joh")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key2", "key6", "key8" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(name, Operator.LIKE_MATCHES, UTF8Type.instance.decompose("pavel")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(name, Operator.EQ, UTF8Type.instance.decompose("Pave")));
+        Assert.assertTrue(rows.isEmpty());
+
+        rows = getIndexed(store, 10, buildExpression(name, Operator.EQ, UTF8Type.instance.decompose("Pavel")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key1" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(name, Operator.LIKE_MATCHES, UTF8Type.instance.decompose("JeAn")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key8" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(name, Operator.LIKE_MATCHES, UTF8Type.instance.decompose("claUde")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key8" }, rows.toArray(new String[rows.size()])));
+
+        rows = getIndexed(store, 10, buildExpression(name, Operator.EQ, UTF8Type.instance.decompose("Jean")));
+        Assert.assertTrue(rows.isEmpty());
+
+        rows = getIndexed(store, 10, buildExpression(name, Operator.EQ, UTF8Type.instance.decompose("Jean-Claude")));
+        Assert.assertTrue(rows.toString(), Arrays.equals(new String[] { "key8" }, rows.toArray(new String[rows.size()])));
+    }
+
+    @Test
+    public void testSettingIsLiteralOption()
+    {
+
+        // special type which delegates to UTF-8 internally but uses a custom comparator
+        AbstractType<?> stringType = new AbstractType<String>(AbstractType.ComparisonType.CUSTOM)
+        {
+            public ByteBuffer fromString(String source) throws MarshalException
+            {
+                return UTF8Type.instance.fromString(source);
+            }
+
+            public Term fromJSONObject(Object parsed) throws MarshalException
+            {
+                throw new UnsupportedOperationException();
+            }
+
+            public TypeSerializer<String> getSerializer()
+            {
+                return UTF8Type.instance.getSerializer();
+            }
+
+            public int compareCustom(ByteBuffer a, ByteBuffer b)
+            {
+                return UTF8Type.instance.compare(a, b);
+            }
+        };
+
+        // first let's check that we get 'false' for 'isLiteral' if we don't set the option and use the special comparator
+        ColumnDefinition columnA = ColumnDefinition.regularDef(KS_NAME, CF_NAME, "special-A", stringType);
+
+        ColumnIndex indexA = new ColumnIndex(UTF8Type.instance, columnA, IndexMetadata.fromSchemaMetadata("special-index-A", IndexMetadata.Kind.CUSTOM, new HashMap<String, String>()
+        {{
+            put(IndexTarget.CUSTOM_INDEX_OPTION_NAME, SASIIndex.class.getName());
+        }}));
+
+        Assert.assertEquals(true,  indexA.isIndexed());
+        Assert.assertEquals(false, indexA.isLiteral());
+
+        // now let's double-check that we do get 'true' when we set it
+        ColumnDefinition columnB = ColumnDefinition.regularDef(KS_NAME, CF_NAME, "special-B", stringType);
+
+        ColumnIndex indexB = new ColumnIndex(UTF8Type.instance, columnB, IndexMetadata.fromSchemaMetadata("special-index-B", IndexMetadata.Kind.CUSTOM, new HashMap<String, String>()
+        {{
+            put(IndexTarget.CUSTOM_INDEX_OPTION_NAME, SASIIndex.class.getName());
+            put("is_literal", "true");
+        }}));
+
+        Assert.assertEquals(true, indexB.isIndexed());
+        Assert.assertEquals(true, indexB.isLiteral());
+
+        // and finally we should also get a 'true' if it's built-in UTF-8/ASCII comparator
+        ColumnDefinition columnC = ColumnDefinition.regularDef(KS_NAME, CF_NAME, "special-C", UTF8Type.instance);
+
+        ColumnIndex indexC = new ColumnIndex(UTF8Type.instance, columnC, IndexMetadata.fromSchemaMetadata("special-index-C", IndexMetadata.Kind.CUSTOM, new HashMap<String, String>()
+        {{
+            put(IndexTarget.CUSTOM_INDEX_OPTION_NAME, SASIIndex.class.getName());
+        }}));
+
+        Assert.assertEquals(true, indexC.isIndexed());
+        Assert.assertEquals(true, indexC.isLiteral());
+
+        ColumnDefinition columnD = ColumnDefinition.regularDef(KS_NAME, CF_NAME, "special-D", AsciiType.instance);
+
+        ColumnIndex indexD = new ColumnIndex(UTF8Type.instance, columnD, IndexMetadata.fromSchemaMetadata("special-index-D", IndexMetadata.Kind.CUSTOM, new HashMap<String, String>()
+        {{
+            put(IndexTarget.CUSTOM_INDEX_OPTION_NAME, SASIIndex.class.getName());
+        }}));
+
+        Assert.assertEquals(true, indexD.isIndexed());
+        Assert.assertEquals(true, indexD.isLiteral());
+
+        // and the option should supersede the comparator type
+        ColumnDefinition columnE = ColumnDefinition.regularDef(KS_NAME, CF_NAME, "special-E", UTF8Type.instance);
+
+        ColumnIndex indexE = new ColumnIndex(UTF8Type.instance, columnE, IndexMetadata.fromSchemaMetadata("special-index-E", IndexMetadata.Kind.CUSTOM, new HashMap<String, String>()
+        {{
+            put(IndexTarget.CUSTOM_INDEX_OPTION_NAME, SASIIndex.class.getName());
+            put("is_literal", "false");
+        }}));
+
+        Assert.assertEquals(true,  indexE.isIndexed());
+        Assert.assertEquals(false, indexE.isLiteral());
+    }
+
+    @Test
+    public void testClusteringIndexes() throws Exception
+    {
+        testClusteringIndexes(false);
+        cleanupData();
+        testClusteringIndexes(true);
+    }
+
+    public void testClusteringIndexes(boolean forceFlush) throws Exception
+    {
+        ColumnFamilyStore store = Keyspace.open(KS_NAME).getColumnFamilyStore(CLUSTERING_CF_NAME_1);
+
+        executeCQL(CLUSTERING_CF_NAME_1, "INSERT INTO %s.%s (name, location, age, height, score) VALUES (?, ?, ?, ?, ?)", "Pavel", "US", 27, 183, 1.0);
+        executeCQL(CLUSTERING_CF_NAME_1, "INSERT INTO %s.%s (name, location, age, height, score) VALUES (?, ?, ?, ?, ?)", "Pavel", "BY", 28, 182, 2.0);
+        executeCQL(CLUSTERING_CF_NAME_1 ,"INSERT INTO %s.%s (name, location, age, height, score) VALUES (?, ?, ?, ?, ?)", "Jordan", "US", 27, 182, 1.0);
+
+        if (forceFlush)
+            store.forceBlockingFlush();
+
+        UntypedResultSet results;
+
+        results = executeCQL(CLUSTERING_CF_NAME_1 ,"SELECT * FROM %s.%s WHERE location = ? ALLOW FILTERING", "US");
+        Assert.assertNotNull(results);
+        Assert.assertEquals(2, results.size());
+
+        results = executeCQL(CLUSTERING_CF_NAME_1 ,"SELECT * FROM %s.%s WHERE age >= ? AND height = ? ALLOW FILTERING", 27, 182);
+        Assert.assertNotNull(results);
+        Assert.assertEquals(2, results.size());
+
+        results = executeCQL(CLUSTERING_CF_NAME_1 ,"SELECT * FROM %s.%s WHERE age = ? AND height = ? ALLOW FILTERING", 28, 182);
+        Assert.assertNotNull(results);
+        Assert.assertEquals(1, results.size());
+
+        results = executeCQL(CLUSTERING_CF_NAME_1 ,"SELECT * FROM %s.%s WHERE age >= ? AND height = ? AND score >= ? ALLOW FILTERING", 27, 182, 1.0);
+        Assert.assertNotNull(results);
+        Assert.assertEquals(2, results.size());
+
+        results = executeCQL(CLUSTERING_CF_NAME_1 ,"SELECT * FROM %s.%s WHERE age >= ? AND height = ? AND score = ? ALLOW FILTERING", 27, 182, 1.0);
+        Assert.assertNotNull(results);
+        Assert.assertEquals(1, results.size());
+
+        results = executeCQL(CLUSTERING_CF_NAME_1 ,"SELECT * FROM %s.%s WHERE location = ? AND age >= ? ALLOW FILTERING", "US", 27);
+        Assert.assertNotNull(results);
+        Assert.assertEquals(2, results.size());
+
+        results = executeCQL(CLUSTERING_CF_NAME_1 ,"SELECT * FROM %s.%s WHERE location = ? ALLOW FILTERING", "BY");
+        Assert.assertNotNull(results);
+        Assert.assertEquals(1, results.size());
+
+        results = executeCQL(CLUSTERING_CF_NAME_1 ,"SELECT * FROM %s.%s WHERE location LIKE 'U%%' ALLOW FILTERING");
+        Assert.assertNotNull(results);
+        Assert.assertEquals(2, results.size());
+
+        results = executeCQL(CLUSTERING_CF_NAME_1 ,"SELECT * FROM %s.%s WHERE location LIKE 'U%%' AND height >= 183 ALLOW FILTERING");
+        Assert.assertNotNull(results);
+        Assert.assertEquals(1, results.size());
+
+        results = executeCQL(CLUSTERING_CF_NAME_1 ,"SELECT * FROM %s.%s WHERE location LIKE 'US%%' ALLOW FILTERING");
+        Assert.assertNotNull(results);
+        Assert.assertEquals(2, results.size());
+
+        results = executeCQL(CLUSTERING_CF_NAME_1 ,"SELECT * FROM %s.%s WHERE location LIKE 'US' ALLOW FILTERING");
+        Assert.assertNotNull(results);
+        Assert.assertEquals(2, results.size());
+
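+        // suffix-only and empty LIKE patterns should be rejected with an InvalidRequestException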
+        try
+        {
+            executeCQL(CLUSTERING_CF_NAME_1 ,"SELECT * FROM %s.%s WHERE location LIKE '%%U' ALLOW FILTERING");
+            Assert.fail();
+        }
+        catch (InvalidRequestException e)
+        {
+            Assert.assertTrue(e.getMessage().contains("only supported"));
+            // expected
+        }
+
+        try
+        {
+            executeCQL(CLUSTERING_CF_NAME_1 ,"SELECT * FROM %s.%s WHERE location LIKE '%%' ALLOW FILTERING");
+            Assert.fail();
+        }
+        catch (InvalidRequestException e)
+        {
+            Assert.assertTrue(e.getMessage().contains("empty"));
+            // expected
+        }
+
+        try
+        {
+            executeCQL(CLUSTERING_CF_NAME_1 ,"SELECT * FROM %s.%s WHERE location LIKE '%%%%' ALLOW FILTERING");
+            Assert.fail();
+        }
+        catch (InvalidRequestException e)
+        {
+            Assert.assertTrue(e.getMessage().contains("empty"));
+            // expected
+        }
+
+        // check restrictions on non-indexed clustering columns when preceding columns are indexed
+        store = Keyspace.open(KS_NAME).getColumnFamilyStore(CLUSTERING_CF_NAME_2);
+        executeCQL(CLUSTERING_CF_NAME_2 ,"INSERT INTO %s.%s (name, location, age, height, score) VALUES (?, ?, ?, ?, ?)", "Tony", "US", 43, 184, 2.0);
+        executeCQL(CLUSTERING_CF_NAME_2 ,"INSERT INTO %s.%s (name, location, age, height, score) VALUES (?, ?, ?, ?, ?)", "Christopher", "US", 27, 180, 1.0);
+
+        if (forceFlush)
+            store.forceBlockingFlush();
+
+        results = executeCQL(CLUSTERING_CF_NAME_2 ,"SELECT * FROM %s.%s WHERE location LIKE 'US' AND age = 43 ALLOW FILTERING");
+        Assert.assertNotNull(results);
+        Assert.assertEquals(1, results.size());
+        Assert.assertEquals("Tony", results.one().getString("name"));
+    }
+
+    @Test
+    public void testStaticIndex() throws Exception
+    {
+        testStaticIndex(false);
+        cleanupData();
+        testStaticIndex(true);
+    }
+
+    public void testStaticIndex(boolean shouldFlush) throws Exception
+    {
+        ColumnFamilyStore store = Keyspace.open(KS_NAME).getColumnFamilyStore(STATIC_CF_NAME);
+
+        executeCQL(STATIC_CF_NAME, "INSERT INTO %s.%s (sensor_id,sensor_type) VALUES(?, ?)", 1, "TEMPERATURE");
+        executeCQL(STATIC_CF_NAME, "INSERT INTO %s.%s (sensor_id,date,value,variance) VALUES(?, ?, ?, ?)", 1, 20160401L, 24.46, 2);
+        executeCQL(STATIC_CF_NAME, "INSERT INTO %s.%s (sensor_id,date,value,variance) VALUES(?, ?, ?, ?)", 1, 20160402L, 25.62, 5);
+        executeCQL(STATIC_CF_NAME, "INSERT INTO %s.%s (sensor_id,date,value,variance) VALUES(?, ?, ?, ?)", 1, 20160403L, 24.96, 4);
+
+        if (shouldFlush)
+            store.forceBlockingFlush();
+
+        executeCQL(STATIC_CF_NAME, "INSERT INTO %s.%s (sensor_id,sensor_type) VALUES(?, ?)", 2, "PRESSURE");
+        executeCQL(STATIC_CF_NAME, "INSERT INTO %s.%s (sensor_id,date,value,variance) VALUES(?, ?, ?, ?)", 2, 20160401L, 1.03, 9);
+        executeCQL(STATIC_CF_NAME, "INSERT INTO %s.%s (sensor_id,date,value,variance) VALUES(?, ?, ?, ?)", 2, 20160402L, 1.04, 7);
+        executeCQL(STATIC_CF_NAME, "INSERT INTO %s.%s (sensor_id,date,value,variance) VALUES(?, ?, ?, ?)", 2, 20160403L, 1.01, 4);
+
+        if (shouldFlush)
+            store.forceBlockingFlush();
+
+        UntypedResultSet results;
+
+        // Prefix search on static column only
+        results = executeCQL(STATIC_CF_NAME ,"SELECT * FROM %s.%s WHERE sensor_type LIKE 'temp%%'");
+        Assert.assertNotNull(results);
+        Assert.assertEquals(3, results.size());
+
+        Iterator<UntypedResultSet.Row> iterator = results.iterator();
+
+        UntypedResultSet.Row row1 = iterator.next();
+        Assert.assertEquals(20160401L, row1.getLong("date"));
+        Assert.assertEquals(24.46, row1.getDouble("value"));
+        Assert.assertEquals(2, row1.getInt("variance"));
+
+
+        UntypedResultSet.Row row2 = iterator.next();
+        Assert.assertEquals(20160402L, row2.getLong("date"));
+        Assert.assertEquals(25.62, row2.getDouble("value"));
+        Assert.assertEquals(5, row2.getInt("variance"));
+
+        UntypedResultSet.Row row3 = iterator.next();
+        Assert.assertEquals(20160403L, row3.getLong("date"));
+        Assert.assertEquals(24.96, row3.getDouble("value"));
+        Assert.assertEquals(4, row3.getInt("variance"));
+
+
+        // Combined static and non-static column filtering
+        results = executeCQL(STATIC_CF_NAME ,"SELECT * FROM %s.%s WHERE sensor_type=? AND value >= ? AND value <= ? AND variance=? ALLOW FILTERING",
+                             "pressure", 1.02, 1.05, 7);
+        Assert.assertNotNull(results);
+        Assert.assertEquals(1, results.size());
+
+        row1 = results.one();
+        Assert.assertEquals(20160402L, row1.getLong("date"));
+        Assert.assertEquals(1.04, row1.getDouble("value"));
+        Assert.assertEquals(7, row1.getInt("variance"));
+
+        // Only non-static column filtering
+        results = executeCQL(STATIC_CF_NAME ,"SELECT * FROM %s.%s WHERE value >= ? AND variance <= ? ALLOW FILTERING", 1.02, 7);
+        Assert.assertNotNull(results);
+        Assert.assertEquals(4, results.size());
+
+        iterator = results.iterator();
+
+        row1 = iterator.next();
+        Assert.assertEquals("TEMPERATURE", row1.getString("sensor_type"));
+        Assert.assertEquals(20160401L, row1.getLong("date"));
+        Assert.assertEquals(24.46, row1.getDouble("value"));
+        Assert.assertEquals(2, row1.getInt("variance"));
+
+
+        row2 = iterator.next();
+        Assert.assertEquals("TEMPERATURE", row2.getString("sensor_type"));
+        Assert.assertEquals(20160402L, row2.getLong("date"));
+        Assert.assertEquals(25.62, row2.getDouble("value"));
+        Assert.assertEquals(5, row2.getInt("variance"));
+
+        row3 = iterator.next();
+        Assert.assertEquals("TEMPERATURE", row3.getString("sensor_type"));
+        Assert.assertEquals(20160403L, row3.getLong("date"));
+        Assert.assertEquals(24.96, row3.getDouble("value"));
+        Assert.assertEquals(4, row3.getInt("variance"));
+
+        UntypedResultSet.Row row4 = iterator.next();
+        Assert.assertEquals("PRESSURE", row4.getString("sensor_type"));
+        Assert.assertEquals(20160402L, row4.getLong("date"));
+        Assert.assertEquals(1.04, row4.getDouble("value"));
+        Assert.assertEquals(7, row4.getInt("variance"));
+    }
+
+    @Test
+    public void testInvalidIndexOptions()
+    {
+        ColumnFamilyStore store = Keyspace.open(KS_NAME).getColumnFamilyStore(CF_NAME);
+
+        try
+        {
+            // invalid index mode
+            SASIIndex.validateOptions(new HashMap<String, String>()
+                                      {{ put("target", "address"); put("mode", "NORMAL"); }},
+                                      store.metadata);
+            Assert.fail();
+        }
+        catch (ConfigurationException e)
+        {
+            Assert.assertTrue(e.getMessage().contains("Incorrect index mode"));
+        }
+
+        try
+        {
+            // invalid SPARSE on the literal index
+            SASIIndex.validateOptions(new HashMap<String, String>()
+                                      {{ put("target", "address"); put("mode", "SPARSE"); }},
+                                      store.metadata);
+            Assert.fail();
+        }
+        catch (ConfigurationException e)
+        {
+            Assert.assertTrue(e.getMessage().contains("non-literal"));
+        }
+
+        try
+        {
+            // invalid SPARSE on the explicitly literal index
+            SASIIndex.validateOptions(new HashMap<String, String>()
+                                      {{ put("target", "height"); put("mode", "SPARSE"); put("is_literal", "true"); }},
+                                      store.metadata);
+            Assert.fail();
+        }
+        catch (ConfigurationException e)
+        {
+            Assert.assertTrue(e.getMessage().contains("non-literal"));
+        }
+
+        try
+        {
+            // invalid SPARSE with analyzer
+            SASIIndex.validateOptions(new HashMap<String, String>()
+                                      {{ put("target", "height"); put("mode", "SPARSE"); put("analyzed", "true"); }},
+                                      store.metadata);
+            Assert.fail();
+        }
+        catch (ConfigurationException e)
+        {
+            Assert.assertTrue(e.getMessage().contains("doesn't support analyzers"));
+        }
+    }
+
+    @Test
+    public void testLIKEAndEQSemanticsWithDifferentKindsOfIndexes()
+    {
+        String containsTable = "sasi_like_contains_test";
+        String prefixTable = "sasi_like_prefix_test";
+        String analyzedPrefixTable = "sasi_like_analyzed_prefix_test";
+        String tokenizedContainsTable = "sasi_like_analyzed_contains_test";
+
+        QueryProcessor.executeOnceInternal(String.format("CREATE TABLE IF NOT EXISTS %s.%s (k int primary key, v text);", KS_NAME, containsTable));
+        QueryProcessor.executeOnceInternal(String.format("CREATE TABLE IF NOT EXISTS %s.%s (k int primary key, v text);", KS_NAME, prefixTable));
+        QueryProcessor.executeOnceInternal(String.format("CREATE TABLE IF NOT EXISTS %s.%s (k int primary key, v text);", KS_NAME, analyzedPrefixTable));
+        QueryProcessor.executeOnceInternal(String.format("CREATE TABLE IF NOT EXISTS %s.%s (k int primary key, v text);", KS_NAME, tokenizedContainsTable));
+
+        QueryProcessor.executeOnceInternal(String.format("CREATE CUSTOM INDEX IF NOT EXISTS ON %s.%s(v) " +
+                "USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode' : 'CONTAINS', " +
+                                                         "'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.NonTokenizingAnalyzer', " +
+                                                         "'case_sensitive': 'false' };",
+                                                         KS_NAME, containsTable));
+        QueryProcessor.executeOnceInternal(String.format("CREATE CUSTOM INDEX IF NOT EXISTS ON %s.%s(v) " +
+                "USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode' : 'PREFIX' };", KS_NAME, prefixTable));
+        QueryProcessor.executeOnceInternal(String.format("CREATE CUSTOM INDEX IF NOT EXISTS ON %s.%s(v) " +
+                "USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = { 'mode' : 'PREFIX', 'analyzed': 'true' };", KS_NAME, analyzedPrefixTable));
+        QueryProcessor.executeOnceInternal(String.format("CREATE CUSTOM INDEX IF NOT EXISTS ON %s.%s(v) " +
+                                                         "USING 'org.apache.cassandra.index.sasi.SASIIndex' WITH OPTIONS = " +
+                                                         "{ 'mode' : 'CONTAINS', 'analyzer_class': 'org.apache.cassandra.index.sasi.analyzer.StandardAnalyzer'," +
+                                                         "'analyzed': 'true', 'tokenization_enable_stemming': 'true', 'tokenization_normalize_lowercase': 'true', " +
+                                                         "'tokenization_locale': 'en' };",
+                                                         KS_NAME, tokenizedContainsTable));
+
+        testLIKEAndEQSemanticsWithDifferentKindsOfIndexes(containsTable, prefixTable, analyzedPrefixTable, tokenizedContainsTable, false);
+        testLIKEAndEQSemanticsWithDifferentKindsOfIndexes(containsTable, prefixTable, analyzedPrefixTable, tokenizedContainsTable, true);
+    }
+
+    private void testLIKEAndEQSemanticsWithDifferentKindsOfIndexes(String containsTable,
+                                                                    String prefixTable,
+                                                                    String analyzedPrefixTable,
+                                                                    String tokenizedContainsTable,
+                                                                    boolean forceFlush)
+    {
+        QueryProcessor.executeOnceInternal(String.format("INSERT INTO %s.%s (k, v) VALUES (?, ?);", KS_NAME, containsTable), 0, "Pavel");
+        QueryProcessor.executeOnceInternal(String.format("INSERT INTO %s.%s (k, v) VALUES (?, ?);", KS_NAME, prefixTable), 0, "Jean-Claude");
+        QueryProcessor.executeOnceInternal(String.format("INSERT INTO %s.%s (k, v) VALUES (?, ?);", KS_NAME, analyzedPrefixTable), 0, "Jean-Claude");
+        QueryProcessor.executeOnceInternal(String.format("INSERT INTO %s.%s (k, v) VALUES (?, ?);", KS_NAME, tokenizedContainsTable), 0, "Pavel");
+
+        if (forceFlush)
+        {
+            Keyspace keyspace = Keyspace.open(KS_NAME);
+            for (String table : Arrays.asList(containsTable, prefixTable, analyzedPrefixTable))
+                keyspace.getColumnFamilyStore(table).forceBlockingFlush();
+        }
+
+        UntypedResultSet results;
+
+        // CONTAINS
+
+        results = QueryProcessor.executeOnceInternal(String.format("SELECT * FROM %s.%s WHERE v LIKE 'Pav';", KS_NAME, containsTable));
+        Assert.assertNotNull(results);
+        Assert.assertEquals(0, results.size());
+
+        results = QueryProcessor.executeOnceInternal(String.format("SELECT * FROM %s.%s WHERE v LIKE 'Pav%%';", KS_NAME, containsTable));
+        Assert.assertNotNull(results);
+        Assert.assertEquals(1, results.size());
+
+        results = QueryProcessor.executeOnceInternal(String.format("SELECT * FROM %s.%s WHERE v LIKE 'Pavel';", KS_NAME, containsTable));
+        Assert.assertNotNull(results);
+        Assert.assertEquals(1, results.size());
+
+        results = QueryProcessor.executeOnceInternal(String.format("SELECT * FROM %s.%s WHERE v = 'Pav';", KS_NAME, containsTable));
+        Assert.assertNotNull(results);
+        Assert.assertEquals(0, results.size());
+
+        results = QueryProcessor.executeOnceInternal(String.format("SELECT * FROM %s.%s WHERE v = 'Pavel';", KS_NAME, containsTable));
+        Assert.assertNotNull(results);
+        Assert.assertEquals(1, results.size());
+
+        try
+        {
+            QueryProcessor.executeOnceInternal(String.format("SELECT * FROM %s.%s WHERE v = 'Pav';", KS_NAME, tokenizedContainsTable));
+            Assert.fail();
+        }
+        catch (InvalidRequestException e)
+        {
+            // expected since CONTAINS + analyzed indexes only support LIKE
+        }
+
+        try
+        {
+            QueryProcessor.executeOnceInternal(String.format("SELECT * FROM %s.%s WHERE v LIKE 'Pav%%';", KS_NAME, tokenizedContainsTable));
+            Assert.fail();
+        }
+        catch (InvalidRequestException e)
+        {
+            // expected since CONTAINS + analyzed indexes don't support prefix LIKE patterns
+        }
+
+        QueryProcessor.executeOnceInternal(String.format("SELECT * FROM %s.%s WHERE v LIKE 'Pav%%';", KS_NAME, containsTable));
+        Assert.assertNotNull(results);
+        Assert.assertEquals(1, results.size());
+
+        results = QueryProcessor.executeOnceInternal(String.format("SELECT * FROM %s.%s WHERE v LIKE '%%Pav';", KS_NAME, containsTable));
+        Assert.assertNotNull(results);
+        Assert.assertEquals(0, results.size());
+
+        results = QueryProcessor.executeOnceInternal(String.format("SELECT * FROM %s.%s WHERE v LIKE '%%Pav%%';", KS_NAME, containsTable));
+        Assert.assertNotNull(results);
+        Assert.assertEquals(1, results.size());
+
+        // PREFIX
+
+        results = QueryProcessor.executeOnceInternal(String.format("SELECT * FROM %s.%s WHERE v = 'Jean';", KS_NAME, prefixTable));
+        Assert.assertNotNull(results);
+        Assert.assertEquals(0, results.size());
+
+        results = QueryProcessor.executeOnceInternal(String.format("SELECT * FROM %s.%s WHERE v = 'Jean-Claude';", KS_NAME, prefixTable));
+        Assert.assertNotNull(results);
+        Assert.assertEquals(1, results.size());
+
+        results = QueryProcessor.executeOnceInternal(String.format("SELECT * FROM %s.%s WHERE v LIKE 'Jea';", KS_NAME, prefixTable));
+        Assert.assertNotNull(results);
+        Assert.assertEquals(0, results.size());
+
+        results = QueryProcessor.executeOnceInternal(String.format("SELECT * FROM %s.%s WHERE v LIKE 'Jea%%';", KS_NAME, prefixTable));
+        Assert.assertNotNull(results);
+        Assert.assertEquals(1, results.size());
+
+        try
+        {
+            QueryProcessor.executeOnceInternal(String.format("SELECT * FROM %s.%s WHERE v LIKE '%%Jea';", KS_NAME, prefixTable));
+            Assert.fail();
+        }
+        catch (InvalidRequestException e)
+        {
+            // expected since PREFIX indexes only support LIKE '<term>%'
+        }
+
+        try
+        {
+            QueryProcessor.executeOnceInternal(String.format("SELECT * FROM %s.%s WHERE v LIKE '%%Jea%%';", KS_NAME, prefixTable));
+            Assert.fail();
+        }
+        catch (InvalidRequestException e)
+        {
+            // expected since PREFIX indexes only support LIKE '<term>%'
+        }
+
+        // PREFIX + analyzer
+
+        try
+        {
+            QueryProcessor.executeOnceInternal(String.format("SELECT * FROM %s.%s WHERE v = 'Jean';", KS_NAME, analyzedPrefixTable));
+            Assert.fail();
+        }
+        catch (InvalidRequestException e)
+        {
+            // expected since PREFIX indexes only support EQ without tokenization
+        }
+
+        results = QueryProcessor.executeOnceInternal(String.format("SELECT * FROM %s.%s WHERE v LIKE 'Jean';", KS_NAME, analyzedPrefixTable));
+        Assert.assertNotNull(results);
+        Assert.assertEquals(1, results.size());
+
+        results = QueryProcessor.executeOnceInternal(String.format("SELECT * FROM %s.%s WHERE v LIKE 'Claude';", KS_NAME, analyzedPrefixTable));
+        Assert.assertNotNull(results);
+        Assert.assertEquals(1, results.size());
+
+        results = QueryProcessor.executeOnceInternal(String.format("SELECT * FROM %s.%s WHERE v LIKE 'Jean-Claude';", KS_NAME, analyzedPrefixTable));
+        Assert.assertNotNull(results);
+        Assert.assertEquals(1, results.size());
+
+        results = QueryProcessor.executeOnceInternal(String.format("SELECT * FROM %s.%s WHERE v LIKE 'Jean%%';", KS_NAME, analyzedPrefixTable));
+        Assert.assertNotNull(results);
+        Assert.assertEquals(1, results.size());
+
+        results = QueryProcessor.executeOnceInternal(String.format("SELECT * FROM %s.%s WHERE v LIKE 'Claude%%';", KS_NAME, analyzedPrefixTable));
+        Assert.assertNotNull(results);
+        Assert.assertEquals(1, results.size());
+
+        try
+        {
+            QueryProcessor.executeOnceInternal(String.format("SELECT * FROM %s.%s WHERE v LIKE '%%Jean';", KS_NAME, analyzedPrefixTable));
+            Assert.fail();
+        }
+        catch (InvalidRequestException e)
+        {
+            // expected since PREFIX indexes only support LIKE '<term>%' and LIKE '<term>'
+        }
+
+        try
+        {
+            QueryProcessor.executeOnceInternal(String.format("SELECT * FROM %s.%s WHERE v LIKE '%%Claude%%';", KS_NAME, analyzedPrefixTable));
+            Assert.fail();
+        }
+        catch (InvalidRequestException e)
+        {
+            // expected since PREFIX indexes only support LIKE '<term>%' and LIKE '<term>'
+        }
+
+        for (String table : Arrays.asList(containsTable, prefixTable, analyzedPrefixTable))
+            QueryProcessor.executeOnceInternal(String.format("TRUNCATE TABLE %s.%s", KS_NAME, table));
+    }
+
+    @Test
+    public void testIndexMemtableSwitching()
+    {
+        // write some data but don't flush
+        ColumnFamilyStore store = loadData(new HashMap<String, Pair<String, Integer>>()
+        {{
+            put("key1", Pair.create("Pavel", 14));
+        }}, false);
+
+        ColumnIndex index = ((SASIIndex) store.indexManager.getIndexByName("first_name")).getIndex();
+        IndexMemtable beforeFlushMemtable = index.getCurrentMemtable();
+
+        PartitionRangeReadCommand command = new PartitionRangeReadCommand(store.metadata,
+                                                                          FBUtilities.nowInSeconds(),
+                                                                          ColumnFilter.all(store.metadata),
+                                                                          RowFilter.NONE,
+                                                                          DataLimits.NONE,
+                                                                          DataRange.allData(store.getPartitioner()),
+                                                                          Optional.empty());
+
+        QueryController controller = new QueryController(store, command, Integer.MAX_VALUE);
+        org.apache.cassandra.index.sasi.plan.Expression expression =
+                new org.apache.cassandra.index.sasi.plan.Expression(controller, index)
+                                                    .add(Operator.LIKE_MATCHES, UTF8Type.instance.fromString("Pavel"));
+
+        Assert.assertTrue(beforeFlushMemtable.search(expression).getCount() > 0);
+
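+        // flushing should install a fresh index memtable; the flushed data is no longer searchable in memory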
+        store.forceBlockingFlush();
+
+        IndexMemtable afterFlushMemtable = index.getCurrentMemtable();
+
+        Assert.assertNotSame(afterFlushMemtable, beforeFlushMemtable);
+        Assert.assertNull(afterFlushMemtable.search(expression));
+        Assert.assertEquals(0, index.getPendingMemtables().size());
+
+        loadData(new HashMap<String, Pair<String, Integer>>()
+        {{
+            put("key2", Pair.create("Sam", 15));
+        }}, false);
+
+        expression = new org.apache.cassandra.index.sasi.plan.Expression(controller, index)
+                        .add(Operator.LIKE_MATCHES, UTF8Type.instance.fromString("Sam"));
+
+        beforeFlushMemtable = index.getCurrentMemtable();
+        Assert.assertTrue(beforeFlushMemtable.search(expression).getCount() > 0);
+
+        // let's emulate switching the memtable and see if we can still read data in "pending"
+        index.switchMemtable(store.getTracker().getView().getCurrentMemtable());
+
+        Assert.assertNotSame(index.getCurrentMemtable(), beforeFlushMemtable);
+        Assert.assertEquals(1, index.getPendingMemtables().size());
+
+        Assert.assertTrue(index.searchMemtable(expression).getCount() > 0);
+
+        // emulate "everything is flushed" notification
+        index.discardMemtable(store.getTracker().getView().getCurrentMemtable());
+
+        Assert.assertEquals(0, index.getPendingMemtables().size());
+        Assert.assertNull(index.searchMemtable(expression));
+
+        // test discarding data from memtable
+        loadData(new HashMap<String, Pair<String, Integer>>()
+        {{
+            put("key3", Pair.create("Jonathan", 16));
+        }}, false);
+
+        expression = new org.apache.cassandra.index.sasi.plan.Expression(controller, index)
+                .add(Operator.LIKE_MATCHES, UTF8Type.instance.fromString("Jonathan"));
+
+        Assert.assertTrue(index.searchMemtable(expression).getCount() > 0);
+
+        index.switchMemtable();
+        Assert.assertNull(index.searchMemtable(expression));
+    }
+
+    private static ColumnFamilyStore loadData(Map<String, Pair<String, Integer>> data, boolean forceFlush)
+    {
+        return loadData(data, System.currentTimeMillis(), forceFlush);
+    }
+
+    private static ColumnFamilyStore loadData(Map<String, Pair<String, Integer>> data, long timestamp, boolean forceFlush)
+    {
+        for (Map.Entry<String, Pair<String, Integer>> e : data.entrySet())
+            newMutation(e.getKey(), e.getValue().left, null, e.getValue().right, timestamp).apply();
+
+        ColumnFamilyStore store = Keyspace.open(KS_NAME).getColumnFamilyStore(CF_NAME);
+
+        if (forceFlush)
+            store.forceBlockingFlush();
+
+        return store;
+    }
+
+    private void cleanupData()
+    {
+        Keyspace ks = Keyspace.open(KS_NAME);
+        ks.getColumnFamilyStore(CF_NAME).truncateBlocking();
+        ks.getColumnFamilyStore(CLUSTERING_CF_NAME_1).truncateBlocking();
+    }
+
+    private static Set<String> getIndexed(ColumnFamilyStore store, int maxResults, Expression... expressions)
+    {
+        return getIndexed(store, ColumnFilter.all(store.metadata), maxResults, expressions);
+    }
+
+    private static Set<String> getIndexed(ColumnFamilyStore store, ColumnFilter columnFilter, int maxResults, Expression... expressions)
+    {
+        return getKeys(getIndexed(store, columnFilter, null, maxResults, expressions));
+    }
+
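+    // pages through the index by restarting each query from the last seen partition key,
+    // stopping once a page comes back smaller than the requested page size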
+    private static Set<DecoratedKey> getPaged(ColumnFamilyStore store, int pageSize, Expression... expressions)
+    {
+        UnfilteredPartitionIterator currentPage;
+        Set<DecoratedKey> uniqueKeys = new TreeSet<>();
+
+        DecoratedKey lastKey = null;
+
+        int count;
+        do
+        {
+            count = 0;
+            currentPage = getIndexed(store, ColumnFilter.all(store.metadata), lastKey, pageSize, expressions);
+            if (currentPage == null)
+                break;
+
+            while (currentPage.hasNext())
+            {
+                try (UnfilteredRowIterator row = currentPage.next())
+                {
+                    uniqueKeys.add(row.partitionKey());
+                    lastKey = row.partitionKey();
+                    count++;
+                }
+            }
+
+            currentPage.close();
+        }
+        while (count == pageSize);
+
+        return uniqueKeys;
+    }
+
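+    // builds a range read command with the given expressions as its row filter and executes it locally,
+    // optionally resuming from startKey to support paging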
+    private static UnfilteredPartitionIterator getIndexed(ColumnFamilyStore store, ColumnFilter columnFilter, DecoratedKey startKey, int maxResults, Expression... expressions)
+    {
+        DataRange range = (startKey == null)
+                            ? DataRange.allData(PARTITIONER)
+                            : DataRange.forKeyRange(new Range<>(startKey, PARTITIONER.getMinimumToken().maxKeyBound()));
+
+        RowFilter filter = RowFilter.create();
+        for (Expression e : expressions)
+            filter.add(store.metadata.getColumnDefinition(e.name), e.op, e.value);
+
+        ReadCommand command = new PartitionRangeReadCommand(store.metadata,
+                                                            FBUtilities.nowInSeconds(),
+                                                            columnFilter,
+                                                            filter,
+                                                            DataLimits.thriftLimits(maxResults, DataLimits.NO_LIMIT),
+                                                            range,
+                                                            Optional.empty());
+
+        return command.executeLocally(command.executionController());
+    }
+
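+    // builds a single-partition mutation with optional first_name/last_name/age cells sharing one timestamp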
+    private static Mutation newMutation(String key, String firstName, String lastName, int age, long timestamp)
+    {
+        Mutation rm = new Mutation(KS_NAME, decoratedKey(AsciiType.instance.decompose(key)));
+        List<Cell> cells = new ArrayList<>(3);
+
+        if (age >= 0)
+            cells.add(buildCell(ByteBufferUtil.bytes("age"), Int32Type.instance.decompose(age), timestamp));
+        if (firstName != null)
+            cells.add(buildCell(ByteBufferUtil.bytes("first_name"), UTF8Type.instance.decompose(firstName), timestamp));
+        if (lastName != null)
+            cells.add(buildCell(ByteBufferUtil.bytes("last_name"), UTF8Type.instance.decompose(lastName), timestamp));
+
+        update(rm, cells);
+        return rm;
+    }
+
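+    // collects the keys of all non-empty partitions into a sorted set, closing the iterator when done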
+    private static Set<String> getKeys(final UnfilteredPartitionIterator rows)
+    {
+        try
+        {
+            return new TreeSet<String>()
+            {{
+                while (rows.hasNext())
+                {
+                    try (UnfilteredRowIterator row = rows.next())
+                    {
+                        if (!row.isEmpty())
+                            add(AsciiType.instance.compose(row.partitionKey().getKey()));
+                    }
+                }
+            }};
+        }
+        finally
+        {
+            rows.close();
+        }
+    }
+
+    private static List<String> convert(final Set<DecoratedKey> keys)
+    {
+        return new ArrayList<String>()
+        {{
+            for (DecoratedKey key : keys)
+                add(AsciiType.instance.getString(key.getKey()));
+        }};
+    }
+
+    private UntypedResultSet executeCQL(String cfName, String query, Object... values)
+    {
+        return QueryProcessor.executeOnceInternal(String.format(query, KS_NAME, cfName), values);
+    }
+
+    private Set<String> executeCQLWithKeys(String rawStatement) throws Exception
+    {
+        SelectStatement statement = (SelectStatement) QueryProcessor.parseStatement(rawStatement).prepare().statement;
+        ResultMessage.Rows cqlRows = statement.executeInternal(QueryState.forInternalCalls(), QueryOptions.DEFAULT);
+
+        Set<String> results = new TreeSet<>();
+        for (CqlRow row : cqlRows.toThriftResult().getRows())
+        {
+            for (org.apache.cassandra.thrift.Column col : row.columns)
+            {
+                String columnName = UTF8Type.instance.getString(col.bufferForName());
+                if (columnName.equals("id"))
+                    results.add(AsciiType.instance.getString(col.bufferForValue()));
+            }
+        }
+
+        return results;
+    }
+
+    private static DecoratedKey decoratedKey(ByteBuffer key)
+    {
+        return PARTITIONER.decorateKey(key);
+    }
+
+    private static DecoratedKey decoratedKey(String key)
+    {
+        return decoratedKey(AsciiType.instance.fromString(key));
+    }
+
+    private static Row buildRow(Collection<Cell> cells)
+    {
+        return buildRow(cells.toArray(new Cell[cells.size()]));
+    }
+
+    private static Row buildRow(Cell... cells)
+    {
+        Row.Builder rowBuilder = BTreeRow.sortedBuilder();
+        rowBuilder.newRow(Clustering.EMPTY);
+        for (Cell c : cells)
+            rowBuilder.addCell(c);
+        return rowBuilder.build();
+    }
+
+    private static Cell buildCell(ByteBuffer name, ByteBuffer value, long timestamp)
+    {
+        CFMetaData cfm = Keyspace.open(KS_NAME).getColumnFamilyStore(CF_NAME).metadata;
+        return BufferCell.live(cfm.getColumnDefinition(name), timestamp, value);
+    }
+
+    private static Cell buildCell(CFMetaData cfm, ByteBuffer name, ByteBuffer value, long timestamp)
+    {
+        ColumnDefinition column = cfm.getColumnDefinition(name);
+        assert column != null;
+        return BufferCell.live(column, timestamp, value);
+    }
+
+    private static Expression buildExpression(ByteBuffer name, Operator op, ByteBuffer value)
+    {
+        return new Expression(name, op, value);
+    }
+
+    private static void update(Mutation rm, ByteBuffer name, ByteBuffer value, long timestamp)
+    {
+        CFMetaData metadata = Keyspace.open(KS_NAME).getColumnFamilyStore(CF_NAME).metadata;
+        rm.add(PartitionUpdate.singleRowUpdate(metadata, rm.key(), buildRow(buildCell(metadata, name, value, timestamp))));
+    }
+
+
+    private static void update(Mutation rm, List<Cell> cells)
+    {
+        CFMetaData metadata = Keyspace.open(KS_NAME).getColumnFamilyStore(CF_NAME).metadata;
+        rm.add(PartitionUpdate.singleRowUpdate(metadata, rm.key(), buildRow(cells)));
+    }
+
+    private static class Expression
+    {
+        public final ByteBuffer name;
+        public final Operator op;
+        public final ByteBuffer value;
+
+        public Expression(ByteBuffer name, Operator op, ByteBuffer value)
+        {
+            this.name = name;
+            this.op = op;
+            this.value = value;
+        }
+    }
+}
diff --git a/test/unit/org/apache/cassandra/index/sasi/analyzer/NonTokenizingAnalyzerTest.java b/test/unit/org/apache/cassandra/index/sasi/analyzer/NonTokenizingAnalyzerTest.java
new file mode 100644
index 0000000..ba67853
--- /dev/null
+++ b/test/unit/org/apache/cassandra/index/sasi/analyzer/NonTokenizingAnalyzerTest.java
@@ -0,0 +1,78 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.analyzer;
+
+import java.nio.ByteBuffer;
+
+import org.apache.cassandra.db.marshal.Int32Type;
+import org.apache.cassandra.db.marshal.UTF8Type;
+import org.apache.cassandra.utils.ByteBufferUtil;
+
+import org.junit.Assert;
+import org.junit.Test;
+
+/**
+ * Tests for the non-tokenizing analyzer
+ */
+public class NonTokenizingAnalyzerTest
+{
+    @Test
+    public void caseInsensitiveAnalyzer() throws Exception
+    {
+        NonTokenizingAnalyzer analyzer = new NonTokenizingAnalyzer();
+        NonTokenizingOptions options = NonTokenizingOptions.getDefaultOptions();
+        options.setCaseSensitive(false);
+        analyzer.init(options, UTF8Type.instance);
+
+        String testString = "Nip it in the bud";
+        ByteBuffer toAnalyze = ByteBuffer.wrap(testString.getBytes());
+        analyzer.reset(toAnalyze);
+        ByteBuffer analyzed = null;
+        while (analyzer.hasNext())
+            analyzed = analyzer.next();
+        Assert.assertTrue(testString.toLowerCase().equals(ByteBufferUtil.string(analyzed)));
+    }
+
+    @Test
+    public void caseSensitiveAnalyzer() throws Exception
+    {
+        NonTokenizingAnalyzer analyzer = new NonTokenizingAnalyzer();
+        NonTokenizingOptions options = NonTokenizingOptions.getDefaultOptions();
+        analyzer.init(options, UTF8Type.instance);
+
+        String testString = "Nip it in the bud";
+        ByteBuffer toAnalyze = ByteBuffer.wrap(testString.getBytes());
+        analyzer.reset(toAnalyze);
+        ByteBuffer analyzed = null;
+        while (analyzer.hasNext())
+            analyzed = analyzer.next();
+        Assert.assertFalse(testString.toLowerCase().equals(ByteBufferUtil.string(analyzed)));
+    }
+
+    @Test
+    public void ensureIncompatibleInputSkipped() throws Exception
+    {
+        NonTokenizingAnalyzer analyzer = new NonTokenizingAnalyzer();
+        NonTokenizingOptions options = NonTokenizingOptions.getDefaultOptions();
+        analyzer.init(options, Int32Type.instance);
+
+        ByteBuffer toAnalyze = ByteBufferUtil.bytes(1);
+        analyzer.reset(toAnalyze);
+        Assert.assertTrue(!analyzer.hasNext());
+    }
+}
diff --git a/test/unit/org/apache/cassandra/index/sasi/analyzer/StandardAnalyzerTest.java b/test/unit/org/apache/cassandra/index/sasi/analyzer/StandardAnalyzerTest.java
new file mode 100644
index 0000000..7a88a3d
--- /dev/null
+++ b/test/unit/org/apache/cassandra/index/sasi/analyzer/StandardAnalyzerTest.java
@@ -0,0 +1,227 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.analyzer;
+
+import java.io.InputStream;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Locale;
+
+import org.junit.Test;
+
+import org.apache.cassandra.serializers.UTF8Serializer;
+
+import static org.junit.Assert.assertEquals;
+
+public class StandardAnalyzerTest
+{
+    @Test
+    public void testTokenizationAscii() throws Exception
+    {
+        InputStream is = StandardAnalyzerTest.class.getClassLoader()
+                .getResourceAsStream("tokenization/apache_license_header.txt");
+
+        StandardTokenizerOptions options = new StandardTokenizerOptions.OptionsBuilder()
+                .maxTokenLength(5).build();
+        StandardAnalyzer tokenizer = new StandardAnalyzer();
+        tokenizer.init(options);
+
+        List<ByteBuffer> tokens = new ArrayList<>();
+        tokenizer.reset(is);
+        while (tokenizer.hasNext())
+            tokens.add(tokenizer.next());
+
+        assertEquals(67, tokens.size());
+    }
+
+    @Test
+    public void testTokenizationLoremIpsum() throws Exception
+    {
+        InputStream is = StandardAnalyzerTest.class.getClassLoader()
+                .getResourceAsStream("tokenization/lorem_ipsum.txt");
+
+        StandardAnalyzer tokenizer = new StandardAnalyzer();
+        tokenizer.init(StandardTokenizerOptions.getDefaultOptions());
+
+        List<ByteBuffer> tokens = new ArrayList<>();
+        tokenizer.reset(is);
+        while (tokenizer.hasNext())
+            tokens.add(tokenizer.next());
+
+        assertEquals(62, tokens.size());
+
+    }
+
+    @Test
+    public void testTokenizationJaJp1() throws Exception
+    {
+        InputStream is = StandardAnalyzerTest.class.getClassLoader()
+                .getResourceAsStream("tokenization/ja_jp_1.txt");
+
+        StandardAnalyzer tokenizer = new StandardAnalyzer();
+        tokenizer.init(StandardTokenizerOptions.getDefaultOptions());
+
+        tokenizer.reset(is);
+        List<ByteBuffer> tokens = new ArrayList<>();
+        while (tokenizer.hasNext())
+            tokens.add(tokenizer.next());
+
+        assertEquals(210, tokens.size());
+    }
+
+    @Test
+    public void testTokenizationJaJp2() throws Exception
+    {
+        InputStream is = StandardAnalyzerTest.class.getClassLoader()
+                .getResourceAsStream("tokenization/ja_jp_2.txt");
+
+        StandardTokenizerOptions options = new StandardTokenizerOptions.OptionsBuilder().stemTerms(true)
+                .ignoreStopTerms(true).alwaysLowerCaseTerms(true).build();
+        StandardAnalyzer tokenizer = new StandardAnalyzer();
+        tokenizer.init(options);
+
+        tokenizer.reset(is);
+        List<ByteBuffer> tokens = new ArrayList<>();
+        while (tokenizer.hasNext())
+            tokens.add(tokenizer.next());
+
+        assertEquals(57, tokens.size());
+    }
+
+    @Test
+    public void testTokenizationRuRu1() throws Exception
+    {
+        InputStream is = StandardAnalyzerTest.class.getClassLoader()
+                .getResourceAsStream("tokenization/ru_ru_1.txt");
+        StandardAnalyzer tokenizer = new StandardAnalyzer();
+        tokenizer.init(StandardTokenizerOptions.getDefaultOptions());
+
+        List<ByteBuffer> tokens = new ArrayList<>();
+        tokenizer.reset(is);
+        while (tokenizer.hasNext())
+            tokens.add(tokenizer.next());
+
+        assertEquals(456, tokens.size());
+    }
+
+    @Test
+    public void testTokenizationZnTw1() throws Exception
+    {
+        InputStream is = StandardAnalyzerTest.class.getClassLoader()
+                .getResourceAsStream("tokenization/zn_tw_1.txt");
+        StandardAnalyzer tokenizer = new StandardAnalyzer();
+        tokenizer.init(StandardTokenizerOptions.getDefaultOptions());
+
+        List<ByteBuffer> tokens = new ArrayList<>();
+        tokenizer.reset(is);
+        while (tokenizer.hasNext())
+            tokens.add(tokenizer.next());
+
+        assertEquals(963, tokens.size());
+    }
+
+    @Test
+    public void testTokenizationAdventuresOfHuckFinn() throws Exception
+    {
+        InputStream is = StandardAnalyzerTest.class.getClassLoader()
+                .getResourceAsStream("tokenization/adventures_of_huckleberry_finn_mark_twain.txt");
+
+        StandardTokenizerOptions options = new StandardTokenizerOptions.OptionsBuilder().stemTerms(true)
+                .ignoreStopTerms(true).useLocale(Locale.ENGLISH)
+                .alwaysLowerCaseTerms(true).build();
+        StandardAnalyzer tokenizer = new StandardAnalyzer();
+        tokenizer.init(options);
+
+        List<ByteBuffer> tokens = new ArrayList<>();
+        tokenizer.reset(is);
+        while (tokenizer.hasNext())
+            tokens.add(tokenizer.next());
+
+        assertEquals(37739, tokens.size());
+    }
+
+    @Test
+    public void testSkipStopWordBeforeStemmingFrench() throws Exception
+    {
+        InputStream is = StandardAnalyzerTest.class.getClassLoader()
+               .getResourceAsStream("tokenization/french_skip_stop_words_before_stemming.txt");
+
+        StandardTokenizerOptions options = new StandardTokenizerOptions.OptionsBuilder().stemTerms(true)
+                .ignoreStopTerms(true).useLocale(Locale.FRENCH)
+                .alwaysLowerCaseTerms(true).build();
+        StandardAnalyzer tokenizer = new StandardAnalyzer();
+        tokenizer.init(options);
+
+        List<ByteBuffer> tokens = new ArrayList<>();
+        List<String> words = new ArrayList<>();
+        tokenizer.reset(is);
+        while (tokenizer.hasNext())
+        {
+            final ByteBuffer nextToken = tokenizer.next();
+            tokens.add(nextToken);
+            words.add(UTF8Serializer.instance.deserialize(nextToken.duplicate()));
+        }
+
+        assertEquals(4, tokens.size());
+        assertEquals("dans", words.get(0));
+        assertEquals("plui", words.get(1));
+        assertEquals("chanson", words.get(2));
+        assertEquals("connu", words.get(3));
+    }
+
+    @Test
+    public void tokenizeDomainNamesAndUrls() throws Exception
+    {
+        InputStream is = StandardAnalyzerTest.class.getClassLoader()
+                .getResourceAsStream("tokenization/top_visited_domains.txt");
+
+        StandardAnalyzer tokenizer = new StandardAnalyzer();
+        tokenizer.init(StandardTokenizerOptions.getDefaultOptions());
+        tokenizer.reset(is);
+
+        List<ByteBuffer> tokens = new ArrayList<>();
+        while (tokenizer.hasNext())
+            tokens.add(tokenizer.next());
+
+        assertEquals(15, tokens.size());
+    }
+
+    @Test
+    public void testReuseAndResetTokenizerInstance() throws Exception
+    {
+        List<ByteBuffer> bbToTokenize = new ArrayList<>();
+        bbToTokenize.add(ByteBuffer.wrap("Nip it in the bud".getBytes()));
+        bbToTokenize.add(ByteBuffer.wrap("I couldn’t care less".getBytes()));
+        bbToTokenize.add(ByteBuffer.wrap("One and the same".getBytes()));
+        bbToTokenize.add(ByteBuffer.wrap("The squeaky wheel gets the grease.".getBytes()));
+        bbToTokenize.add(ByteBuffer.wrap("The pen is mightier than the sword.".getBytes()));
+
+        StandardAnalyzer tokenizer = new StandardAnalyzer();
+        tokenizer.init(StandardTokenizerOptions.getDefaultOptions());
+
+        List<ByteBuffer> tokens = new ArrayList<>();
+        for (ByteBuffer bb : bbToTokenize)
+        {
+            tokenizer.reset(bb);
+            while (tokenizer.hasNext())
+                tokens.add(tokenizer.next());
+        }
+        assertEquals(10, tokens.size());
+    }
+}
diff --git a/test/unit/org/apache/cassandra/index/sasi/disk/OnDiskIndexTest.java b/test/unit/org/apache/cassandra/index/sasi/disk/OnDiskIndexTest.java
new file mode 100644
index 0000000..a3985ca
--- /dev/null
+++ b/test/unit/org/apache/cassandra/index/sasi/disk/OnDiskIndexTest.java
@@ -0,0 +1,917 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.disk;
+
+import java.io.File;
+import java.nio.ByteBuffer;
+import java.util.*;
+import java.util.concurrent.ThreadLocalRandom;
+import java.util.stream.Collectors;
+
+import org.apache.cassandra.cql3.Operator;
+import org.apache.cassandra.db.BufferDecoratedKey;
+import org.apache.cassandra.db.DecoratedKey;
+import org.apache.cassandra.dht.Murmur3Partitioner;
+import org.apache.cassandra.index.sasi.plan.Expression;
+import org.apache.cassandra.index.sasi.utils.CombinedTerm;
+import org.apache.cassandra.index.sasi.utils.CombinedTermIterator;
+import org.apache.cassandra.index.sasi.utils.OnDiskIndexIterator;
+import org.apache.cassandra.index.sasi.utils.RangeIterator;
+import org.apache.cassandra.db.marshal.AbstractType;
+import org.apache.cassandra.db.marshal.Int32Type;
+import org.apache.cassandra.db.marshal.LongType;
+import org.apache.cassandra.db.marshal.UTF8Type;
+import org.apache.cassandra.io.util.DataOutputBuffer;
+import org.apache.cassandra.utils.MurmurHash;
+import org.apache.cassandra.utils.Pair;
+
+import com.carrotsearch.hppc.LongSet;
+import com.carrotsearch.hppc.cursors.LongCursor;
+
+import com.google.common.base.Function;
+import com.google.common.collect.Iterators;
+import com.google.common.collect.Sets;
+
+import junit.framework.Assert;
+import org.junit.Test;
+
+public class OnDiskIndexTest
+{
+    @Test
+    public void testStringSAConstruction() throws Exception
+    {
+        Map<ByteBuffer, TokenTreeBuilder> data = new HashMap<ByteBuffer, TokenTreeBuilder>()
+        {{
+                put(UTF8Type.instance.decompose("scat"), keyBuilder(1L));
+                put(UTF8Type.instance.decompose("mat"),  keyBuilder(2L));
+                put(UTF8Type.instance.decompose("fat"),  keyBuilder(3L));
+                put(UTF8Type.instance.decompose("cat"),  keyBuilder(1L, 4L));
+                put(UTF8Type.instance.decompose("till"), keyBuilder(2L, 6L));
+                put(UTF8Type.instance.decompose("bill"), keyBuilder(5L));
+                put(UTF8Type.instance.decompose("foo"),  keyBuilder(7L));
+                put(UTF8Type.instance.decompose("bar"),  keyBuilder(9L, 10L));
+                put(UTF8Type.instance.decompose("michael"), keyBuilder(11L, 12L, 1L));
+                put(UTF8Type.instance.decompose("am"), keyBuilder(15L));
+        }};
+
+        OnDiskIndexBuilder builder = new OnDiskIndexBuilder(UTF8Type.instance, UTF8Type.instance, OnDiskIndexBuilder.Mode.CONTAINS);
+        for (Map.Entry<ByteBuffer, TokenTreeBuilder> e : data.entrySet())
+            addAll(builder, e.getKey(), e.getValue());
+
+        File index = File.createTempFile("on-disk-sa-string", "db");
+        index.deleteOnExit();
+
+        builder.finish(index);
+
+        OnDiskIndex onDisk = new OnDiskIndex(index, UTF8Type.instance, new KeyConverter());
+
+        // first check if we can find exact matches
+        for (Map.Entry<ByteBuffer, TokenTreeBuilder> e : data.entrySet())
+        {
+            if (UTF8Type.instance.getString(e.getKey()).equals("cat"))
+                continue; // cat is embedded into scat, we'll test it in the next section
+
+            Assert.assertEquals("Key was: " + UTF8Type.instance.compose(e.getKey()), convert(e.getValue()), convert(onDisk.search(expressionFor(UTF8Type.instance, e.getKey()))));
+        }
+
+        // check that cat returns positions for scat & cat
+        Assert.assertEquals(convert(1, 4), convert(onDisk.search(expressionFor("cat"))));
+
+        // random suffix queries
+        Assert.assertEquals(convert(9, 10), convert(onDisk.search(expressionFor("ar"))));
+        Assert.assertEquals(convert(1, 2, 3, 4), convert(onDisk.search(expressionFor("at"))));
+        Assert.assertEquals(convert(1, 11, 12), convert(onDisk.search(expressionFor("mic"))));
+        Assert.assertEquals(convert(1, 11, 12), convert(onDisk.search(expressionFor("ae"))));
+        Assert.assertEquals(convert(2, 5, 6), convert(onDisk.search(expressionFor("ll"))));
+        Assert.assertEquals(convert(1, 2, 5, 6, 11, 12), convert(onDisk.search(expressionFor("l"))));
+        Assert.assertEquals(convert(7), convert(onDisk.search(expressionFor("oo"))));
+        Assert.assertEquals(convert(7), convert(onDisk.search(expressionFor("o"))));
+        Assert.assertEquals(convert(1, 2, 3, 4, 6), convert(onDisk.search(expressionFor("t"))));
+        Assert.assertEquals(convert(1, 2, 11, 12), convert(onDisk.search(expressionFor("m", Operator.LIKE_PREFIX))));
+
+        Assert.assertEquals(Collections.<DecoratedKey>emptySet(), convert(onDisk.search(expressionFor("hello"))));
+
+        onDisk.close();
+    }
+
+    @Test
+    public void testIntegerSAConstruction() throws Exception
+    {
+        final Map<ByteBuffer, TokenTreeBuilder> data = new HashMap<ByteBuffer, TokenTreeBuilder>()
+        {{
+                put(Int32Type.instance.decompose(5),  keyBuilder(1L));
+                put(Int32Type.instance.decompose(7),  keyBuilder(2L));
+                put(Int32Type.instance.decompose(1),  keyBuilder(3L));
+                put(Int32Type.instance.decompose(3),  keyBuilder(1L, 4L));
+                put(Int32Type.instance.decompose(8),  keyBuilder(2L, 6L));
+                put(Int32Type.instance.decompose(10), keyBuilder(5L));
+                put(Int32Type.instance.decompose(6),  keyBuilder(7L));
+                put(Int32Type.instance.decompose(4),  keyBuilder(9L, 10L));
+                put(Int32Type.instance.decompose(0),  keyBuilder(11L, 12L, 1L));
+        }};
+
+        OnDiskIndexBuilder builder = new OnDiskIndexBuilder(UTF8Type.instance, Int32Type.instance, OnDiskIndexBuilder.Mode.PREFIX);
+        for (Map.Entry<ByteBuffer, TokenTreeBuilder> e : data.entrySet())
+            addAll(builder, e.getKey(), e.getValue());
+
+        File index = File.createTempFile("on-disk-sa-int", "db");
+        index.deleteOnExit();
+
+        builder.finish(index);
+
+        OnDiskIndex onDisk = new OnDiskIndex(index, Int32Type.instance, new KeyConverter());
+
+        for (Map.Entry<ByteBuffer, TokenTreeBuilder> e : data.entrySet())
+        {
+            Assert.assertEquals(convert(e.getValue()), convert(onDisk.search(expressionFor(Operator.EQ, Int32Type.instance, e.getKey()))));
+        }
+
+        List<ByteBuffer> sortedNumbers = new ArrayList<ByteBuffer>()
+        {{
+            addAll(data.keySet().stream().collect(Collectors.toList()));
+        }};
+
+        Collections.sort(sortedNumbers, Int32Type.instance::compare);
+
+        // test full iteration
+        int idx = 0;
+        for (OnDiskIndex.DataTerm term : onDisk)
+        {
+            ByteBuffer number = sortedNumbers.get(idx++);
+            Assert.assertEquals(number, term.getTerm());
+            Assert.assertEquals(convert(data.get(number)), convert(term.getTokens()));
+        }
+
+        // test partial iteration (descending)
+        idx = 3; // start from the 3rd element
+        Iterator<OnDiskIndex.DataTerm> partialIter = onDisk.iteratorAt(sortedNumbers.get(idx), OnDiskIndex.IteratorOrder.DESC, true);
+        while (partialIter.hasNext())
+        {
+            OnDiskIndex.DataTerm term = partialIter.next();
+            ByteBuffer number = sortedNumbers.get(idx++);
+
+            Assert.assertEquals(number, term.getTerm());
+            Assert.assertEquals(convert(data.get(number)), convert(term.getTokens()));
+        }
+
+        idx = 3; // start from the 3rd element exclusive
+        partialIter = onDisk.iteratorAt(sortedNumbers.get(idx++), OnDiskIndex.IteratorOrder.DESC, false);
+        while (partialIter.hasNext())
+        {
+            OnDiskIndex.DataTerm term = partialIter.next();
+            ByteBuffer number = sortedNumbers.get(idx++);
+
+            Assert.assertEquals(number, term.getTerm());
+            Assert.assertEquals(convert(data.get(number)), convert(term.getTokens()));
+        }
+
+        // test partial iteration (ascending)
+        idx = 6; // start from the 6th element
+        partialIter = onDisk.iteratorAt(sortedNumbers.get(idx), OnDiskIndex.IteratorOrder.ASC, true);
+        while (partialIter.hasNext())
+        {
+            OnDiskIndex.DataTerm term = partialIter.next();
+            ByteBuffer number = sortedNumbers.get(idx--);
+
+            Assert.assertEquals(number, term.getTerm());
+            Assert.assertEquals(convert(data.get(number)), convert(term.getTokens()));
+        }
+
+        idx = 6; // start from the 6th element exclusive
+        partialIter = onDisk.iteratorAt(sortedNumbers.get(idx--), OnDiskIndex.IteratorOrder.ASC, false);
+        while (partialIter.hasNext())
+        {
+            OnDiskIndex.DataTerm term = partialIter.next();
+            ByteBuffer number = sortedNumbers.get(idx--);
+
+            Assert.assertEquals(number, term.getTerm());
+            Assert.assertEquals(convert(data.get(number)), convert(term.getTokens()));
+        }
+
+        onDisk.close();
+
+        List<ByteBuffer> iterCheckNums = new ArrayList<ByteBuffer>()
+        {{
+            add(Int32Type.instance.decompose(3));
+            add(Int32Type.instance.decompose(9));
+            add(Int32Type.instance.decompose(14));
+            add(Int32Type.instance.decompose(42));
+        }};
+
+        OnDiskIndexBuilder iterTest = new OnDiskIndexBuilder(UTF8Type.instance, Int32Type.instance, OnDiskIndexBuilder.Mode.PREFIX);
+        for (int i = 0; i < iterCheckNums.size(); i++)
+            iterTest.add(iterCheckNums.get(i), keyAt((long) i), i);
+
+        File iterIndex = File.createTempFile("sa-iter", ".db");
+        iterIndex.deleteOnExit();
+
+        iterTest.finish(iterIndex);
+
+        onDisk = new OnDiskIndex(iterIndex, Int32Type.instance, new KeyConverter());
+
+        ByteBuffer number = Int32Type.instance.decompose(1);
+        Assert.assertEquals(0, Iterators.size(onDisk.iteratorAt(number, OnDiskIndex.IteratorOrder.ASC, false)));
+        Assert.assertEquals(0, Iterators.size(onDisk.iteratorAt(number, OnDiskIndex.IteratorOrder.ASC, true)));
+        Assert.assertEquals(4, Iterators.size(onDisk.iteratorAt(number, OnDiskIndex.IteratorOrder.DESC, false)));
+        Assert.assertEquals(4, Iterators.size(onDisk.iteratorAt(number, OnDiskIndex.IteratorOrder.DESC, true)));
+
+        number = Int32Type.instance.decompose(44);
+        Assert.assertEquals(4, Iterators.size(onDisk.iteratorAt(number, OnDiskIndex.IteratorOrder.ASC, false)));
+        Assert.assertEquals(4, Iterators.size(onDisk.iteratorAt(number, OnDiskIndex.IteratorOrder.ASC, true)));
+        Assert.assertEquals(0, Iterators.size(onDisk.iteratorAt(number, OnDiskIndex.IteratorOrder.DESC, false)));
+        Assert.assertEquals(0, Iterators.size(onDisk.iteratorAt(number, OnDiskIndex.IteratorOrder.DESC, true)));
+
+        number = Int32Type.instance.decompose(20);
+        Assert.assertEquals(3, Iterators.size(onDisk.iteratorAt(number, OnDiskIndex.IteratorOrder.ASC, false)));
+        Assert.assertEquals(3, Iterators.size(onDisk.iteratorAt(number, OnDiskIndex.IteratorOrder.ASC, true)));
+        Assert.assertEquals(1, Iterators.size(onDisk.iteratorAt(number, OnDiskIndex.IteratorOrder.DESC, false)));
+        Assert.assertEquals(1, Iterators.size(onDisk.iteratorAt(number, OnDiskIndex.IteratorOrder.DESC, true)));
+
+        number = Int32Type.instance.decompose(5);
+        Assert.assertEquals(1, Iterators.size(onDisk.iteratorAt(number, OnDiskIndex.IteratorOrder.ASC, false)));
+        Assert.assertEquals(1, Iterators.size(onDisk.iteratorAt(number, OnDiskIndex.IteratorOrder.ASC, true)));
+        Assert.assertEquals(3, Iterators.size(onDisk.iteratorAt(number, OnDiskIndex.IteratorOrder.DESC, false)));
+        Assert.assertEquals(3, Iterators.size(onDisk.iteratorAt(number, OnDiskIndex.IteratorOrder.DESC, true)));
+
+        number = Int32Type.instance.decompose(10);
+        Assert.assertEquals(2, Iterators.size(onDisk.iteratorAt(number, OnDiskIndex.IteratorOrder.ASC, false)));
+        Assert.assertEquals(2, Iterators.size(onDisk.iteratorAt(number, OnDiskIndex.IteratorOrder.ASC, true)));
+        Assert.assertEquals(2, Iterators.size(onDisk.iteratorAt(number, OnDiskIndex.IteratorOrder.DESC, false)));
+        Assert.assertEquals(2, Iterators.size(onDisk.iteratorAt(number, OnDiskIndex.IteratorOrder.DESC, true)));
+
+        onDisk.close();
+    }
+
+    @Test
+    public void testMultiSuffixMatches() throws Exception
+    {
+        OnDiskIndexBuilder builder = new OnDiskIndexBuilder(UTF8Type.instance, UTF8Type.instance, OnDiskIndexBuilder.Mode.CONTAINS)
+        {{
+                addAll(this, UTF8Type.instance.decompose("Eliza"), keyBuilder(1L, 2L));
+                addAll(this, UTF8Type.instance.decompose("Elizabeth"), keyBuilder(3L, 4L));
+                addAll(this, UTF8Type.instance.decompose("Aliza"), keyBuilder(5L, 6L));
+                addAll(this, UTF8Type.instance.decompose("Taylor"), keyBuilder(7L, 8L));
+                addAll(this, UTF8Type.instance.decompose("Pavel"), keyBuilder(9L, 10L));
+        }};
+
+        File index = File.createTempFile("on-disk-sa-multi-suffix-match", ".db");
+        index.deleteOnExit();
+
+        builder.finish(index);
+
+        OnDiskIndex onDisk = new OnDiskIndex(index, UTF8Type.instance, new KeyConverter());
+
+        Assert.assertEquals(convert(1, 2, 3, 4, 5, 6), convert(onDisk.search(expressionFor("liz"))));
+        Assert.assertEquals(convert(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), convert(onDisk.search(expressionFor("a"))));
+        Assert.assertEquals(convert(5, 6), convert(onDisk.search(expressionFor("A"))));
+        Assert.assertEquals(convert(1, 2, 3, 4), convert(onDisk.search(expressionFor("E"))));
+        Assert.assertEquals(convert(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), convert(onDisk.search(expressionFor("l"))));
+        Assert.assertEquals(convert(3, 4), convert(onDisk.search(expressionFor("bet"))));
+        Assert.assertEquals(convert(3, 4, 9, 10), convert(onDisk.search(expressionFor("e"))));
+        Assert.assertEquals(convert(7, 8), convert(onDisk.search(expressionFor("yl"))));
+        Assert.assertEquals(convert(7, 8), convert(onDisk.search(expressionFor("T"))));
+        Assert.assertEquals(convert(1, 2, 3, 4, 5, 6), convert(onDisk.search(expressionFor("za"))));
+        Assert.assertEquals(convert(3, 4), convert(onDisk.search(expressionFor("ab"))));
+
+        Assert.assertEquals(Collections.<DecoratedKey>emptySet(), convert(onDisk.search(expressionFor("Pi"))));
+        Assert.assertEquals(Collections.<DecoratedKey>emptySet(), convert(onDisk.search(expressionFor("ethz"))));
+        Assert.assertEquals(Collections.<DecoratedKey>emptySet(), convert(onDisk.search(expressionFor("liw"))));
+        Assert.assertEquals(Collections.<DecoratedKey>emptySet(), convert(onDisk.search(expressionFor("Taw"))));
+        Assert.assertEquals(Collections.<DecoratedKey>emptySet(), convert(onDisk.search(expressionFor("Av"))));
+
+        onDisk.close();
+    }
+
+    @Test
+    public void testSparseMode() throws Exception
+    {
+        OnDiskIndexBuilder builder = new OnDiskIndexBuilder(UTF8Type.instance, LongType.instance, OnDiskIndexBuilder.Mode.SPARSE);
+
+        final long start = System.currentTimeMillis();
+        final int numIterations = 100000;
+
+        for (long i = 0; i < numIterations; i++)
+            builder.add(LongType.instance.decompose(start + i), keyAt(i), i);
+
+        File index = File.createTempFile("on-disk-sa-sparse", "db");
+        index.deleteOnExit();
+
+        builder.finish(index);
+
+        OnDiskIndex onDisk = new OnDiskIndex(index, LongType.instance, new KeyConverter());
+
+        ThreadLocalRandom random = ThreadLocalRandom.current();
+
+        for (long step = start; step < (start + numIterations); step += 1000)
+        {
+            boolean lowerInclusive = random.nextBoolean();
+            boolean upperInclusive = random.nextBoolean();
+
+            long limit = random.nextLong(step, start + numIterations);
+            RangeIterator<Long, Token> rows = onDisk.search(expressionFor(step, lowerInclusive, limit, upperInclusive));
+
+            long lowerKey = step - start;
+            long upperKey = lowerKey + (limit - step);
+
+            if (!lowerInclusive)
+                lowerKey += 1;
+
+            if (upperInclusive)
+                upperKey += 1;
+
+            Set<DecoratedKey> actual = convert(rows);
+            for (long key = lowerKey; key < upperKey; key++)
+                Assert.assertTrue("key" + key + " wasn't found", actual.contains(keyAt(key)));
+
+            Assert.assertEquals((upperKey - lowerKey), actual.size());
+        }
+
+        // let's also explicitly test a whole-range search
+        RangeIterator<Long, Token> rows = onDisk.search(expressionFor(start, true, start + numIterations, true));
+
+        Set<DecoratedKey> actual = convert(rows);
+        Assert.assertEquals(numIterations, actual.size());
+    }
+
+    @Test
+    public void testNotEqualsQueryForStrings() throws Exception
+    {
+        Map<ByteBuffer, TokenTreeBuilder> data = new HashMap<ByteBuffer, TokenTreeBuilder>()
+        {{
+                put(UTF8Type.instance.decompose("Pavel"),   keyBuilder(1L, 2L));
+                put(UTF8Type.instance.decompose("Jason"),   keyBuilder(3L));
+                put(UTF8Type.instance.decompose("Jordan"),  keyBuilder(4L));
+                put(UTF8Type.instance.decompose("Michael"), keyBuilder(5L, 6L));
+                put(UTF8Type.instance.decompose("Vijay"),   keyBuilder(7L));
+                put(UTF8Type.instance.decompose("Travis"),  keyBuilder(8L));
+                put(UTF8Type.instance.decompose("Aleksey"), keyBuilder(9L, 10L));
+        }};
+
+        OnDiskIndexBuilder builder = new OnDiskIndexBuilder(UTF8Type.instance, UTF8Type.instance, OnDiskIndexBuilder.Mode.PREFIX);
+        for (Map.Entry<ByteBuffer, TokenTreeBuilder> e : data.entrySet())
+            addAll(builder, e.getKey(), e.getValue());
+
+        File index = File.createTempFile("on-disk-sa-except-test", "db");
+        index.deleteOnExit();
+
+        builder.finish(index);
+
+        OnDiskIndex onDisk = new OnDiskIndex(index, UTF8Type.instance, new KeyConverter());
+
+        // test whole words first
+        Assert.assertEquals(convert(3, 4, 5, 6, 7, 8, 9, 10), convert(onDisk.search(expressionForNot("Aleksey", "Vijay", "Pavel"))));
+
+        Assert.assertEquals(convert(3, 4, 7, 8, 9, 10), convert(onDisk.search(expressionForNot("Aleksey", "Vijay", "Pavel", "Michael"))));
+
+        Assert.assertEquals(convert(3, 4, 7, 9, 10), convert(onDisk.search(expressionForNot("Aleksey", "Vijay", "Pavel", "Michael", "Travis"))));
+
+        // now test prefixes
+        Assert.assertEquals(convert(3, 4, 5, 6, 7, 8, 9, 10), convert(onDisk.search(expressionForNot("Aleksey", "Vijay", "Pav"))));
+
+        Assert.assertEquals(convert(3, 4, 7, 8, 9, 10), convert(onDisk.search(expressionForNot("Aleksey", "Vijay", "Pavel", "Mic"))));
+
+        Assert.assertEquals(convert(3, 4, 7, 9, 10), convert(onDisk.search(expressionForNot("Aleksey", "Vijay", "Pavel", "Micha", "Tr"))));
+
+        onDisk.close();
+    }
+
+    @Test
+    public void testNotEqualsQueryForNumbers() throws Exception
+    {
+        final Map<ByteBuffer, TokenTreeBuilder> data = new HashMap<ByteBuffer, TokenTreeBuilder>()
+        {{
+                put(Int32Type.instance.decompose(5),  keyBuilder(1L));
+                put(Int32Type.instance.decompose(7),  keyBuilder(2L));
+                put(Int32Type.instance.decompose(1),  keyBuilder(3L));
+                put(Int32Type.instance.decompose(3),  keyBuilder(1L, 4L));
+                put(Int32Type.instance.decompose(8),  keyBuilder(8L, 6L));
+                put(Int32Type.instance.decompose(10), keyBuilder(5L));
+                put(Int32Type.instance.decompose(6),  keyBuilder(7L));
+                put(Int32Type.instance.decompose(4),  keyBuilder(9L, 10L));
+                put(Int32Type.instance.decompose(0),  keyBuilder(11L, 12L, 1L));
+        }};
+
+        OnDiskIndexBuilder builder = new OnDiskIndexBuilder(UTF8Type.instance, Int32Type.instance, OnDiskIndexBuilder.Mode.PREFIX);
+        for (Map.Entry<ByteBuffer, TokenTreeBuilder> e : data.entrySet())
+            addAll(builder, e.getKey(), e.getValue());
+
+        File index = File.createTempFile("on-disk-sa-except-int-test", "db");
+        index.deleteOnExit();
+
+        builder.finish(index);
+
+        OnDiskIndex onDisk = new OnDiskIndex(index, Int32Type.instance, new KeyConverter());
+
+        Assert.assertEquals(convert(1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 12), convert(onDisk.search(expressionForNot(0, 10, 1))));
+        Assert.assertEquals(convert(1, 2, 4, 5, 7, 9, 10, 11, 12), convert(onDisk.search(expressionForNot(0, 10, 1, 8))));
+        Assert.assertEquals(convert(1, 2, 4, 5, 7, 11, 12), convert(onDisk.search(expressionForNot(0, 10, 1, 8, 4))));
+
+        onDisk.close();
+    }
+
+    @Test
+    public void testRangeQueryWithExclusions() throws Exception
+    {
+        final long lower = 0;
+        final long upper = 100000;
+
+        OnDiskIndexBuilder builder = new OnDiskIndexBuilder(UTF8Type.instance, LongType.instance, OnDiskIndexBuilder.Mode.SPARSE);
+        for (long i = lower; i <= upper; i++)
+            builder.add(LongType.instance.decompose(i), keyAt(i), i);
+
+        File index = File.createTempFile("on-disk-sa-except-long-ranges", "db");
+        index.deleteOnExit();
+
+        builder.finish(index);
+
+        OnDiskIndex onDisk = new OnDiskIndex(index, LongType.instance, new KeyConverter());
+
+        ThreadLocalRandom random = ThreadLocalRandom.current();
+
+        // single exclusion
+
+        // let's do a small range first to figure out if searchPoint works properly
+        validateExclusions(onDisk, lower, 50, Sets.newHashSet(42L));
+        // now let's do the whole data set to test SPARSE searching
+        validateExclusions(onDisk, lower, upper, Sets.newHashSet(31337L));
+
+        // pair of exclusions which would generate a split
+
+        validateExclusions(onDisk, lower, random.nextInt(400, 800), Sets.newHashSet(42L, 154L));
+        validateExclusions(onDisk, lower, upper, Sets.newHashSet(31337L, 54631L));
+
+        // 3 exclusions which would generate a split and change bounds
+
+        validateExclusions(onDisk, lower, random.nextInt(400, 800), Sets.newHashSet(42L, 154L));
+        validateExclusions(onDisk, lower, upper, Sets.newHashSet(31337L, 54631L));
+
+        validateExclusions(onDisk, lower, random.nextLong(400, upper), Sets.newHashSet(42L, 55L));
+        validateExclusions(onDisk, lower, random.nextLong(400, upper), Sets.newHashSet(42L, 55L, 93L));
+        validateExclusions(onDisk, lower, random.nextLong(400, upper), Sets.newHashSet(42L, 55L, 93L, 205L));
+
+        Set<Long> exclusions = Sets.newHashSet(3L, 12L, 13L, 14L, 27L, 54L, 81L, 125L, 384L, 771L, 1054L, 2048L, 78834L);
+
+        // test that exclusions are properly bound by lower/upper of the expression
+        Assert.assertEquals(392, validateExclusions(onDisk, lower, 400, exclusions, false));
+        Assert.assertEquals(101, validateExclusions(onDisk, lower, 100, Sets.newHashSet(-10L, -5L, -1L), false));
+
+        validateExclusions(onDisk, lower, upper, exclusions);
+
+        Assert.assertEquals(100000, convert(onDisk.search(new Expression("", LongType.instance)
+                                                    .add(Operator.NEQ, LongType.instance.decompose(100L)))).size());
+
+        Assert.assertEquals(49, convert(onDisk.search(new Expression("", LongType.instance)
+                                                    .add(Operator.LT, LongType.instance.decompose(50L))
+                                                    .add(Operator.NEQ, LongType.instance.decompose(10L)))).size());
+
+        Assert.assertEquals(99998, convert(onDisk.search(new Expression("", LongType.instance)
+                                                    .add(Operator.GT, LongType.instance.decompose(1L))
+                                                    .add(Operator.NEQ, LongType.instance.decompose(20L)))).size());
+
+        onDisk.close();
+    }
+
+    private void validateExclusions(OnDiskIndex sa, long lower, long upper, Set<Long> exclusions)
+    {
+        validateExclusions(sa, lower, upper, exclusions, true);
+    }
+
+    private int validateExclusions(OnDiskIndex sa, long lower, long upper, Set<Long> exclusions, boolean checkCount)
+    {
+        int count = 0;
+        for (DecoratedKey key : convert(sa.search(rangeWithExclusions(lower, true, upper, true, exclusions))))
+        {
+            String keyId = UTF8Type.instance.getString(key.getKey()).split("key")[1];
+            Assert.assertFalse("key" + keyId + " is present.", exclusions.contains(Long.valueOf(keyId)));
+            count++;
+        }
+
+        if (checkCount)
+            Assert.assertEquals(upper - (lower == 0 ? -1 : lower) - exclusions.size(), count);
+
+        return count;
+    }
+
+    @Test
+    public void testDescriptor() throws Exception
+    {
+        final Map<ByteBuffer, Pair<DecoratedKey, Long>> data = new HashMap<ByteBuffer, Pair<DecoratedKey, Long>>()
+        {{
+                put(Int32Type.instance.decompose(5), Pair.create(keyAt(1L), 1L));
+        }};
+
+        OnDiskIndexBuilder builder1 = new OnDiskIndexBuilder(UTF8Type.instance, Int32Type.instance, OnDiskIndexBuilder.Mode.PREFIX);
+        OnDiskIndexBuilder builder2 = new OnDiskIndexBuilder(UTF8Type.instance, Int32Type.instance, OnDiskIndexBuilder.Mode.PREFIX);
+        for (Map.Entry<ByteBuffer, Pair<DecoratedKey, Long>> e : data.entrySet())
+        {
+            DecoratedKey key = e.getValue().left;
+            Long position = e.getValue().right;
+
+            builder1.add(e.getKey(), key, position);
+            builder2.add(e.getKey(), key, position);
+        }
+
+        File index1 = File.createTempFile("on-disk-sa-int", "db");
+        File index2 = File.createTempFile("on-disk-sa-int2", "db");
+        index1.deleteOnExit();
+        index2.deleteOnExit();
+
+        builder1.finish(index1);
+        builder2.finish(new Descriptor(Descriptor.VERSION_AA), index2);
+
+        OnDiskIndex onDisk1 = new OnDiskIndex(index1, Int32Type.instance, new KeyConverter());
+        OnDiskIndex onDisk2 = new OnDiskIndex(index2, Int32Type.instance, new KeyConverter());
+
+        ByteBuffer number = Int32Type.instance.decompose(5);
+
+        Assert.assertEquals(Collections.singleton(data.get(number).left), convert(onDisk1.search(expressionFor(Operator.EQ, Int32Type.instance, number))));
+        Assert.assertEquals(Collections.singleton(data.get(number).left), convert(onDisk2.search(expressionFor(Operator.EQ, Int32Type.instance, number))));
+
+        Assert.assertEquals(onDisk1.descriptor.version.version, Descriptor.CURRENT_VERSION);
+        Assert.assertEquals(onDisk2.descriptor.version.version, Descriptor.VERSION_AA);
+    }
+
+    @Test
+    public void testSuperBlocks() throws Exception
+    {
+        Map<ByteBuffer, TokenTreeBuilder> terms = new HashMap<>();
+        terms.put(UTF8Type.instance.decompose("1234"), keyBuilder(1L, 2L));
+        terms.put(UTF8Type.instance.decompose("2345"), keyBuilder(3L, 4L));
+        terms.put(UTF8Type.instance.decompose("3456"), keyBuilder(5L, 6L));
+        terms.put(UTF8Type.instance.decompose("4567"), keyBuilder(7L, 8L));
+        terms.put(UTF8Type.instance.decompose("5678"), keyBuilder(9L, 10L));
+
+        OnDiskIndexBuilder builder = new OnDiskIndexBuilder(UTF8Type.instance, Int32Type.instance, OnDiskIndexBuilder.Mode.SPARSE);
+        for (Map.Entry<ByteBuffer, TokenTreeBuilder> entry : terms.entrySet())
+            addAll(builder, entry.getKey(), entry.getValue());
+
+        File index = File.createTempFile("on-disk-sa-try-superblocks", ".db");
+        index.deleteOnExit();
+
+        builder.finish(index);
+
+        OnDiskIndex onDisk = new OnDiskIndex(index, Int32Type.instance, new KeyConverter());
+        OnDiskIndex.OnDiskSuperBlock superBlock = onDisk.dataLevel.getSuperBlock(0);
+        Iterator<Token> iter = superBlock.iterator();
+
+        Long lastToken = null;
+        while (iter.hasNext())
+        {
+            Token token = iter.next();
+
+            if (lastToken != null)
+                Assert.assertTrue(lastToken.compareTo(token.get()) < 0);
+
+            lastToken = token.get();
+        }
+    }
+
+    @Test
+    public void testSuperBlockRetrieval() throws Exception
+    {
+        OnDiskIndexBuilder builder = new OnDiskIndexBuilder(UTF8Type.instance, LongType.instance, OnDiskIndexBuilder.Mode.SPARSE);
+        for (long i = 0; i < 100000; i++)
+            builder.add(LongType.instance.decompose(i), keyAt(i), i);
+
+        File index = File.createTempFile("on-disk-sa-multi-superblock-match", ".db");
+        index.deleteOnExit();
+
+        builder.finish(index);
+
+        OnDiskIndex onDiskIndex = new OnDiskIndex(index, LongType.instance, new KeyConverter());
+
+        testSearchRangeWithSuperBlocks(onDiskIndex, 0, 500);
+        testSearchRangeWithSuperBlocks(onDiskIndex, 300, 93456);
+        testSearchRangeWithSuperBlocks(onDiskIndex, 210, 1700);
+        testSearchRangeWithSuperBlocks(onDiskIndex, 530, 3200);
+
+        Random random = new Random(0xdeadbeef);
+        for (int i = 0; i < 100000; i += random.nextInt(1500)) // random steps with max of 1500 elements
+        {
+            for (int j = 0; j < 3; j++)
+                testSearchRangeWithSuperBlocks(onDiskIndex, i, ThreadLocalRandom.current().nextInt(i, 100000));
+        }
+    }
+
+    public void putAll(SortedMap<Long, LongSet> offsets, TokenTreeBuilder ttb)
+    {
+        for (Pair<Long, LongSet> entry : ttb)
+            offsets.put(entry.left, entry.right);
+    }
+
+    @Test
+    public void testCombiningOfThePartitionedSA() throws Exception
+    {
+        OnDiskIndexBuilder builderA = new OnDiskIndexBuilder(UTF8Type.instance, LongType.instance, OnDiskIndexBuilder.Mode.PREFIX);
+        OnDiskIndexBuilder builderB = new OnDiskIndexBuilder(UTF8Type.instance, LongType.instance, OnDiskIndexBuilder.Mode.PREFIX);
+
+        TreeMap<Long, TreeMap<Long, LongSet>> expected = new TreeMap<>();
+
+        for (long i = 0; i <= 100; i++)
+        {
+            TreeMap<Long, LongSet> offsets = expected.get(i);
+            if (offsets == null)
+                expected.put(i, (offsets = new TreeMap<>()));
+
+            builderA.add(LongType.instance.decompose(i), keyAt(i), i);
+            putAll(offsets, keyBuilder(i));
+        }
+
+        for (long i = 50; i < 100; i++)
+        {
+            TreeMap<Long, LongSet> offsets = expected.get(i);
+            if (offsets == null)
+                expected.put(i, (offsets = new TreeMap<>()));
+
+            long position = 100L + i;
+            builderB.add(LongType.instance.decompose(i), keyAt(position), position);
+            putAll(offsets, keyBuilder(100L + i));
+        }
+
+        File indexA = File.createTempFile("on-disk-sa-partition-a", ".db");
+        indexA.deleteOnExit();
+
+        File indexB = File.createTempFile("on-disk-sa-partition-b", ".db");
+        indexB.deleteOnExit();
+
+        builderA.finish(indexA);
+        builderB.finish(indexB);
+
+        OnDiskIndex a = new OnDiskIndex(indexA, LongType.instance, new KeyConverter());
+        OnDiskIndex b = new OnDiskIndex(indexB, LongType.instance, new KeyConverter());
+
+        RangeIterator<OnDiskIndex.DataTerm, CombinedTerm> union = OnDiskIndexIterator.union(a, b);
+
+        TreeMap<Long, TreeMap<Long, LongSet>> actual = new TreeMap<>();
+        while (union.hasNext())
+        {
+            CombinedTerm term = union.next();
+
+            Long composedTerm = LongType.instance.compose(term.getTerm());
+
+            TreeMap<Long, LongSet> offsets = actual.get(composedTerm);
+            if (offsets == null)
+                actual.put(composedTerm, (offsets = new TreeMap<>()));
+
+            putAll(offsets, term.getTokenTreeBuilder());
+        }
+
+        Assert.assertEquals(actual, expected);
+
+        File indexC = File.createTempFile("on-disk-sa-partition-final", ".db");
+        indexC.deleteOnExit();
+
+        OnDiskIndexBuilder combined = new OnDiskIndexBuilder(UTF8Type.instance, LongType.instance, OnDiskIndexBuilder.Mode.PREFIX);
+        combined.finish(Pair.create(keyAt(0).getKey(), keyAt(100).getKey()), indexC, new CombinedTermIterator(a, b));
+
+        OnDiskIndex c = new OnDiskIndex(indexC, LongType.instance, new KeyConverter());
+        union = OnDiskIndexIterator.union(c);
+        actual.clear();
+
+        while (union.hasNext())
+        {
+            CombinedTerm term = union.next();
+
+            Long composedTerm = LongType.instance.compose(term.getTerm());
+
+            TreeMap<Long, LongSet> offsets = actual.get(composedTerm);
+            if (offsets == null)
+                actual.put(composedTerm, (offsets = new TreeMap<>()));
+
+            putAll(offsets, term.getTokenTreeBuilder());
+        }
+
+        Assert.assertEquals(actual, expected);
+
+        a.close();
+        b.close();
+    }
+
+    @Test
+    public void testPrefixSearchWithCONTAINSMode() throws Exception
+    {
+        Map<ByteBuffer, TokenTreeBuilder> data = new HashMap<ByteBuffer, TokenTreeBuilder>()
+        {{
+
+            put(UTF8Type.instance.decompose("lady gaga"), keyBuilder(1L));
+
+            // Partial term for 'lady of bells'
+            DataOutputBuffer ladyOfBellsBuffer = new DataOutputBuffer();
+            ladyOfBellsBuffer.writeShort(UTF8Type.instance.decompose("lady of bells").remaining() | (1 << OnDiskIndexBuilder.IS_PARTIAL_BIT));
+            ladyOfBellsBuffer.write(UTF8Type.instance.decompose("lady of bells"));
+            put(ladyOfBellsBuffer.asNewBuffer(), keyBuilder(2L));
+
+
+            put(UTF8Type.instance.decompose("lady pank"),  keyBuilder(3L));
+        }};
+
+        OnDiskIndexBuilder builder = new OnDiskIndexBuilder(UTF8Type.instance, UTF8Type.instance, OnDiskIndexBuilder.Mode.CONTAINS);
+        for (Map.Entry<ByteBuffer, TokenTreeBuilder> e : data.entrySet())
+            addAll(builder, e.getKey(), e.getValue());
+
+        File index = File.createTempFile("on-disk-sa-prefix-contains-search", "db");
+        index.deleteOnExit();
+
+        builder.finish(index);
+
+        OnDiskIndex onDisk = new OnDiskIndex(index, UTF8Type.instance, new KeyConverter());
+
+        // check that lady% returns lady gaga (1) and lady pank (3) but not lady of bells (2)
+        Assert.assertEquals(convert(1, 3), convert(onDisk.search(expressionFor("lady", Operator.LIKE_PREFIX))));
+
+        onDisk.close();
+    }
+
+    private void testSearchRangeWithSuperBlocks(OnDiskIndex onDiskIndex, long start, long end)
+    {
+        RangeIterator<Long, Token> tokens = onDiskIndex.search(expressionFor(start, true, end, false));
+
+        // no results should be produced only if the range is empty
+        if (tokens == null)
+        {
+            Assert.assertEquals(0, end - start);
+            return;
+        }
+
+        int keyCount = 0;
+        Long lastToken = null;
+        while (tokens.hasNext())
+        {
+            Token token = tokens.next();
+            Iterator<DecoratedKey> keys = token.iterator();
+
+            // each of the values should have exactly a single key
+            Assert.assertTrue(keys.hasNext());
+            keys.next();
+            Assert.assertFalse(keys.hasNext());
+
+            // and the last token should always be smaller than the current one
+            if (lastToken != null)
+                Assert.assertTrue("last should be less than current", lastToken.compareTo(token.get()) < 0);
+
+            lastToken = token.get();
+            keyCount++;
+        }
+
+        Assert.assertEquals(end - start, keyCount);
+    }
+
+    private static DecoratedKey keyAt(long rawKey)
+    {
+        ByteBuffer key = ByteBuffer.wrap(("key" + rawKey).getBytes());
+        return new BufferDecoratedKey(new Murmur3Partitioner.LongToken(MurmurHash.hash2_64(key, key.position(), key.remaining(), 0)), key);
+    }
+
+    private static TokenTreeBuilder keyBuilder(Long... keys)
+    {
+        TokenTreeBuilder builder = new DynamicTokenTreeBuilder();
+
+        for (final Long key : keys)
+        {
+            DecoratedKey dk = keyAt(key);
+            builder.add((Long) dk.getToken().getTokenValue(), key);
+        }
+
+        return builder.finish();
+    }
+
+    private static Set<DecoratedKey> convert(TokenTreeBuilder offsets)
+    {
+        Set<DecoratedKey> result = new HashSet<>();
+
+        Iterator<Pair<Long, LongSet>> offsetIter = offsets.iterator();
+        while (offsetIter.hasNext())
+        {
+            LongSet v = offsetIter.next().right;
+
+            for (LongCursor offset : v)
+                result.add(keyAt(offset.value));
+        }
+        return result;
+    }
+
+    private static Set<DecoratedKey> convert(long... keyOffsets)
+    {
+        Set<DecoratedKey> result = new HashSet<>();
+        for (long offset : keyOffsets)
+            result.add(keyAt(offset));
+
+        return result;
+    }
+
+    private static Set<DecoratedKey> convert(RangeIterator<Long, Token> results)
+    {
+        if (results == null)
+            return Collections.emptySet();
+
+        Set<DecoratedKey> keys = new TreeSet<>(DecoratedKey.comparator);
+
+        while (results.hasNext())
+        {
+            for (DecoratedKey key : results.next())
+                keys.add(key);
+        }
+
+        return keys;
+    }
+
+    private static Expression expressionFor(long lower, boolean lowerInclusive, long upper, boolean upperInclusive)
+    {
+        Expression expression = new Expression("", LongType.instance);
+        expression.add(lowerInclusive ? Operator.GTE : Operator.GT, LongType.instance.decompose(lower));
+        expression.add(upperInclusive ? Operator.LTE : Operator.LT, LongType.instance.decompose(upper));
+        return expression;
+    }
+
+    private static Expression expressionFor(AbstractType<?> validator, ByteBuffer term)
+    {
+        return expressionFor(Operator.LIKE_CONTAINS, validator, term);
+    }
+
+    private static Expression expressionFor(Operator op, AbstractType<?> validator, ByteBuffer term)
+    {
+        Expression expression = new Expression("", validator);
+        expression.add(op, term);
+        return expression;
+    }
+
+    private static Expression expressionForNot(AbstractType<?> validator, ByteBuffer lower, ByteBuffer upper, Iterable<ByteBuffer> terms)
+    {
+        Expression expression = new Expression("", validator);
+        expression.setOp(Expression.Op.RANGE);
+        expression.setLower(new Expression.Bound(lower, true));
+        expression.setUpper(new Expression.Bound(upper, true));
+        for (ByteBuffer term : terms)
+            expression.add(Operator.NEQ, term);
+        return expression;
+
+    }
+
+    private static Expression expressionForNot(Integer lower, Integer upper, Integer... terms)
+    {
+        return expressionForNot(Int32Type.instance,
+                Int32Type.instance.decompose(lower),
+                Int32Type.instance.decompose(upper),
+                Arrays.asList(terms).stream().map(Int32Type.instance::decompose).collect(Collectors.toList()));
+    }
+
+    private static Expression rangeWithExclusions(long lower, boolean lowerInclusive, long upper, boolean upperInclusive, Set<Long> exclusions)
+    {
+        Expression expression = expressionFor(lower, lowerInclusive, upper, upperInclusive);
+        for (long e : exclusions)
+            expression.add(Operator.NEQ, LongType.instance.decompose(e));
+
+        return expression;
+    }
+
+    private static Expression expressionForNot(String lower, String upper, String... terms)
+    {
+        return expressionForNot(UTF8Type.instance,
+                UTF8Type.instance.decompose(lower),
+                UTF8Type.instance.decompose(upper),
+                Arrays.asList(terms).stream().map(UTF8Type.instance::decompose).collect(Collectors.toList()));
+    }
+
+    private static Expression expressionFor(String term)
+    {
+        return expressionFor(term, Operator.LIKE_CONTAINS);
+    }
+
+    private static Expression expressionFor(String term, Operator op)
+    {
+        return expressionFor(op, UTF8Type.instance, UTF8Type.instance.decompose(term));
+    }
+
+    private static void addAll(OnDiskIndexBuilder builder, ByteBuffer term, TokenTreeBuilder tokens)
+    {
+        for (Pair<Long, LongSet> token : tokens)
+        {
+            for (long position : token.right.toArray())
+                builder.add(term, keyAt(position), position);
+        }
+    }
+
+    private static class KeyConverter implements Function<Long, DecoratedKey>
+    {
+        @Override
+        public DecoratedKey apply(Long offset)
+        {
+            return keyAt(offset);
+        }
+    }
+}
diff --git a/test/unit/org/apache/cassandra/index/sasi/disk/PerSSTableIndexWriterTest.java b/test/unit/org/apache/cassandra/index/sasi/disk/PerSSTableIndexWriterTest.java
new file mode 100644
index 0000000..f19d962
--- /dev/null
+++ b/test/unit/org/apache/cassandra/index/sasi/disk/PerSSTableIndexWriterTest.java
@@ -0,0 +1,251 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.disk;
+
+import java.io.File;
+import java.nio.ByteBuffer;
+import java.util.*;
+import java.util.concurrent.Callable;
+import java.util.concurrent.ThreadLocalRandom;
+
+import org.apache.cassandra.SchemaLoader;
+import org.apache.cassandra.config.CFMetaData;
+import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.db.Clustering;
+import org.apache.cassandra.db.ColumnFamilyStore;
+import org.apache.cassandra.db.DecoratedKey;
+import org.apache.cassandra.db.Keyspace;
+import org.apache.cassandra.db.compaction.OperationType;
+import org.apache.cassandra.db.marshal.LongType;
+import org.apache.cassandra.db.rows.BTreeRow;
+import org.apache.cassandra.db.rows.BufferCell;
+import org.apache.cassandra.db.rows.Row;
+import org.apache.cassandra.index.sasi.SASIIndex;
+import org.apache.cassandra.index.sasi.utils.RangeIterator;
+import org.apache.cassandra.db.marshal.Int32Type;
+import org.apache.cassandra.db.marshal.UTF8Type;
+import org.apache.cassandra.exceptions.ConfigurationException;
+import org.apache.cassandra.io.FSError;
+import org.apache.cassandra.io.sstable.Descriptor;
+import org.apache.cassandra.io.util.FileUtils;
+import org.apache.cassandra.schema.KeyspaceMetadata;
+import org.apache.cassandra.schema.KeyspaceParams;
+import org.apache.cassandra.schema.Tables;
+import org.apache.cassandra.service.MigrationManager;
+import org.apache.cassandra.utils.ByteBufferUtil;
+
+import com.google.common.util.concurrent.Futures;
+
+import org.junit.Assert;
+import org.junit.BeforeClass;
+import org.junit.Test;
+
+public class PerSSTableIndexWriterTest extends SchemaLoader
+{
+    private static final String KS_NAME = "sasi";
+    private static final String CF_NAME = "test_cf";
+
+    @BeforeClass
+    public static void loadSchema() throws ConfigurationException
+    {
+        System.setProperty("cassandra.config", "cassandra-murmur.yaml");
+        SchemaLoader.loadSchema();
+        MigrationManager.announceNewKeyspace(KeyspaceMetadata.create(KS_NAME,
+                                                                     KeyspaceParams.simpleTransient(1),
+                                                                     Tables.of(SchemaLoader.sasiCFMD(KS_NAME, CF_NAME))));
+    }
+
+    @Test
+    public void testPartialIndexWrites() throws Exception
+    {
+        final int maxKeys = 100000, numParts = 4, partSize = maxKeys / numParts;
+        final String keyFormat = "key%06d";
+        final long timestamp = System.currentTimeMillis();
+
+        ColumnFamilyStore cfs = Keyspace.open(KS_NAME).getColumnFamilyStore(CF_NAME);
+        ColumnDefinition column = cfs.metadata.getColumnDefinition(UTF8Type.instance.decompose("age"));
+
+        SASIIndex sasi = (SASIIndex) cfs.indexManager.getIndexByName("age");
+
+        File directory = cfs.getDirectories().getDirectoryForNewSSTables();
+        Descriptor descriptor = Descriptor.fromFilename(cfs.getSSTablePath(directory));
+        PerSSTableIndexWriter indexWriter = (PerSSTableIndexWriter) sasi.getFlushObserver(descriptor, OperationType.FLUSH);
+
+        SortedMap<DecoratedKey, Row> expectedKeys = new TreeMap<>(DecoratedKey.comparator);
+
+        for (int i = 0; i < maxKeys; i++)
+        {
+            ByteBuffer key = ByteBufferUtil.bytes(String.format(keyFormat, i));
+            expectedKeys.put(cfs.metadata.partitioner.decorateKey(key),
+                             BTreeRow.singleCellRow(Clustering.EMPTY,
+                                                    BufferCell.live(column, timestamp, Int32Type.instance.decompose(i))));
+        }
+
+        indexWriter.begin();
+
+        Iterator<Map.Entry<DecoratedKey, Row>> keyIterator = expectedKeys.entrySet().iterator();
+        long position = 0;
+
+        Set<String> segments = new HashSet<>();
+        outer:
+        for (;;)
+        {
+            for (int i = 0; i < partSize; i++)
+            {
+                if (!keyIterator.hasNext())
+                    break outer;
+
+                Map.Entry<DecoratedKey, Row> key = keyIterator.next();
+
+                indexWriter.startPartition(key.getKey(), position++);
+                indexWriter.nextUnfilteredCluster(key.getValue());
+            }
+
+            PerSSTableIndexWriter.Index index = indexWriter.getIndex(column);
+
+            OnDiskIndex segment = index.scheduleSegmentFlush(false).call();
+            index.segments.add(Futures.immediateFuture(segment));
+            segments.add(segment.getIndexPath());
+        }
+
+        for (String segment : segments)
+            Assert.assertTrue(new File(segment).exists());
+
+        String indexFile = indexWriter.indexes.get(column).filename(true);
+
+        // final flush
+        indexWriter.complete();
+
+        for (String segment : segments)
+            Assert.assertFalse(new File(segment).exists());
+
+        OnDiskIndex index = new OnDiskIndex(new File(indexFile), Int32Type.instance, keyPosition -> {
+            ByteBuffer key = ByteBufferUtil.bytes(String.format(keyFormat, keyPosition));
+            return cfs.metadata.partitioner.decorateKey(key);
+        });
+
+        Assert.assertEquals(0, UTF8Type.instance.compare(index.minKey(), ByteBufferUtil.bytes(String.format(keyFormat, 0))));
+        Assert.assertEquals(0, UTF8Type.instance.compare(index.maxKey(), ByteBufferUtil.bytes(String.format(keyFormat, maxKeys - 1))));
+
+        Set<DecoratedKey> actualKeys = new HashSet<>();
+        int count = 0;
+        for (OnDiskIndex.DataTerm term : index)
+        {
+            RangeIterator<Long, Token> tokens = term.getTokens();
+
+            while (tokens.hasNext())
+            {
+                for (DecoratedKey key : tokens.next())
+                    actualKeys.add(key);
+            }
+
+            Assert.assertEquals(count++, (int) Int32Type.instance.compose(term.getTerm()));
+        }
+
+        Assert.assertEquals(expectedKeys.size(), actualKeys.size());
+        for (DecoratedKey key : expectedKeys.keySet())
+            Assert.assertTrue(actualKeys.contains(key));
+
+        FileUtils.closeQuietly(index);
+    }
+
+    @Test
+    public void testSparse() throws Exception
+    {
+        final String columnName = "timestamp";
+
+        ColumnFamilyStore cfs = Keyspace.open(KS_NAME).getColumnFamilyStore(CF_NAME);
+        ColumnDefinition column = cfs.metadata.getColumnDefinition(UTF8Type.instance.decompose(columnName));
+
+        SASIIndex sasi = (SASIIndex) cfs.indexManager.getIndexByName(columnName);
+
+        File directory = cfs.getDirectories().getDirectoryForNewSSTables();
+        Descriptor descriptor = Descriptor.fromFilename(cfs.getSSTablePath(directory));
+        PerSSTableIndexWriter indexWriter = (PerSSTableIndexWriter) sasi.getFlushObserver(descriptor, OperationType.FLUSH);
+
+        final long now = System.currentTimeMillis();
+
+        indexWriter.begin();
+        indexWriter.indexes.put(column, indexWriter.newIndex(sasi.getIndex()));
+
+        populateSegment(cfs.metadata, indexWriter.getIndex(column), new HashMap<Long, Set<Integer>>()
+        {{
+            put(now,     new HashSet<>(Arrays.asList(0, 1)));
+            put(now + 1, new HashSet<>(Arrays.asList(2, 3)));
+            put(now + 2, new HashSet<>(Arrays.asList(4, 5, 6, 7, 8, 9)));
+        }});
+
+        Callable<OnDiskIndex> segmentBuilder = indexWriter.getIndex(column).scheduleSegmentFlush(false);
+
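+        // flushing this segment is expected to fail in SPARSE mode, so the scheduled flush yields null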
+        Assert.assertNull(segmentBuilder.call());
+
+        PerSSTableIndexWriter.Index index = indexWriter.getIndex(column);
+        Random random = ThreadLocalRandom.current();
+
+        Set<String> segments = new HashSet<>();
+        // now let's test multiple correct segments along with an incorrect final segment
+        for (int i = 0; i < 3; i++)
+        {
+            populateSegment(cfs.metadata, index, new HashMap<Long, Set<Integer>>()
+            {{
+                put(now,     new HashSet<>(Arrays.asList(random.nextInt(), random.nextInt(), random.nextInt())));
+                put(now + 1, new HashSet<>(Arrays.asList(random.nextInt(), random.nextInt(), random.nextInt())));
+                put(now + 2, new HashSet<>(Arrays.asList(random.nextInt(), random.nextInt(), random.nextInt())));
+            }});
+
+            try
+            {
+                // flush each of the new segments; they should all succeed
+                OnDiskIndex segment = index.scheduleSegmentFlush(false).call();
+                index.segments.add(Futures.immediateFuture(segment));
+                segments.add(segment.getIndexPath());
+            }
+            catch (Exception | FSError e)
+            {
+                e.printStackTrace();
+                Assert.fail();
+            }
+        }
+
+        // make sure that all of the segments are present on the filesystem
+        for (String segment : segments)
+            Assert.assertTrue(new File(segment).exists());
+
+        indexWriter.complete();
+
+        // make sure that individual segments have been cleaned up
+        for (String segment : segments)
+            Assert.assertFalse(new File(segment).exists());
+
+        // and combined index doesn't exist either
+        Assert.assertFalse(new File(index.outputFile).exists());
+    }
+
+    private static void populateSegment(CFMetaData metadata, PerSSTableIndexWriter.Index index, Map<Long, Set<Integer>> data)
+    {
+        for (Map.Entry<Long, Set<Integer>> value : data.entrySet())
+        {
+            ByteBuffer term = LongType.instance.decompose(value.getKey());
+            for (Integer keyPos : value.getValue())
+            {
+                ByteBuffer key = ByteBufferUtil.bytes(String.format("key%06d", keyPos));
+                index.add(term, metadata.partitioner.decorateKey(key), ThreadLocalRandom.current().nextInt(Integer.MAX_VALUE - 1));
+            }
+        }
+    }
+}
diff --git a/test/unit/org/apache/cassandra/index/sasi/disk/TokenTreeTest.java b/test/unit/org/apache/cassandra/index/sasi/disk/TokenTreeTest.java
new file mode 100644
index 0000000..b26bb44
--- /dev/null
+++ b/test/unit/org/apache/cassandra/index/sasi/disk/TokenTreeTest.java
@@ -0,0 +1,654 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.disk;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.*;
+
+import com.google.common.collect.Iterators;
+import com.google.common.collect.PeekingIterator;
+import org.apache.cassandra.db.BufferDecoratedKey;
+import org.apache.cassandra.db.DecoratedKey;
+import org.apache.cassandra.dht.Murmur3Partitioner;
+import org.apache.cassandra.index.sasi.disk.TokenTreeBuilder.EntryType;
+import org.apache.cassandra.index.sasi.utils.CombinedTerm;
+import org.apache.cassandra.index.sasi.utils.CombinedValue;
+import org.apache.cassandra.index.sasi.utils.MappedBuffer;
+import org.apache.cassandra.index.sasi.utils.RangeIterator;
+import org.apache.cassandra.db.marshal.LongType;
+import org.apache.cassandra.index.sasi.utils.RangeUnionIterator;
+import org.apache.cassandra.io.util.FileUtils;
+import org.apache.cassandra.io.util.SequentialWriterOption;
+import org.apache.cassandra.utils.MurmurHash;
+import org.apache.cassandra.io.util.RandomAccessReader;
+import org.apache.cassandra.io.util.SequentialWriter;
+
+import junit.framework.Assert;
+import org.junit.Test;
+import org.apache.commons.lang3.builder.HashCodeBuilder;
+import com.carrotsearch.hppc.LongOpenHashSet;
+import com.carrotsearch.hppc.LongSet;
+import com.carrotsearch.hppc.cursors.LongCursor;
+import com.google.common.base.Function;
+
+public class TokenTreeTest
+{
+    private static final Function<Long, DecoratedKey> KEY_CONVERTER = new KeyConverter();
+
+    static LongSet singleOffset = new LongOpenHashSet() {{ add(1); }};
+    static LongSet bigSingleOffset = new LongOpenHashSet() {{ add(2147521562L); }};
+    static LongSet shortPackableCollision = new LongOpenHashSet() {{ add(2L); add(3L); }}; // can pack two shorts
+    static LongSet intPackableCollision = new LongOpenHashSet() {{ add(6L); add(((long) Short.MAX_VALUE) + 1); }}; // can pack int & short
+    static LongSet multiCollision =  new LongOpenHashSet() {{ add(3L); add(4L); add(5L); }}; // can't pack
+    static LongSet unpackableCollision = new LongOpenHashSet() {{ add(((long) Short.MAX_VALUE) + 1); add(((long) Short.MAX_VALUE) + 2); }}; // can't pack
+
+    final static SortedMap<Long, LongSet> simpleTokenMap = new TreeMap<Long, LongSet>()
+    {{
+            put(1L, bigSingleOffset); put(3L, shortPackableCollision); put(4L, intPackableCollision); put(6L, singleOffset);
+            put(9L, multiCollision); put(10L, unpackableCollision); put(12L, singleOffset); put(13L, singleOffset);
+            put(15L, singleOffset); put(16L, singleOffset); put(20L, singleOffset); put(22L, singleOffset);
+            put(25L, singleOffset); put(26L, singleOffset); put(27L, singleOffset); put(28L, singleOffset);
+            put(40L, singleOffset); put(50L, singleOffset); put(100L, singleOffset); put(101L, singleOffset);
+            put(102L, singleOffset); put(103L, singleOffset); put(108L, singleOffset); put(110L, singleOffset);
+            put(112L, singleOffset); put(115L, singleOffset); put(116L, singleOffset); put(120L, singleOffset);
+            put(121L, singleOffset); put(122L, singleOffset); put(123L, singleOffset); put(125L, singleOffset);
+    }};
+
+    final static SortedMap<Long, LongSet> bigTokensMap = new TreeMap<Long, LongSet>()
+    {{
+            for (long i = 0; i < 1000000; i++)
+                put(i, singleOffset);
+    }};
+
+    final static SortedMap<Long, LongSet> collidingTokensMap = new TreeMap<Long, LongSet>()
+    {{
+            put(1L, singleOffset); put(7L, singleOffset); put(8L, singleOffset);
+    }};
+
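+    // default token fixture used by the serialization, iteration and skip tests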
+    final static SortedMap<Long, LongSet> tokens = bigTokensMap;
+
+    final static SequentialWriterOption DEFAULT_OPT = SequentialWriterOption.newBuilder().bufferSize(4096).build();
+
+    @Test
+    public void testSerializedSizeDynamic() throws Exception
+    {
+        testSerializedSize(new DynamicTokenTreeBuilder(tokens));
+    }
+
+    @Test
+    public void testSerializedSizeStatic() throws Exception
+    {
+        testSerializedSize(new StaticTokenTreeBuilder(new FakeCombinedTerm(tokens)));
+    }
+
+
+    public void testSerializedSize(final TokenTreeBuilder builder) throws Exception
+    {
+        builder.finish();
+        final File treeFile = File.createTempFile("token-tree-size-test", "tt");
+        treeFile.deleteOnExit();
+
+        try (SequentialWriter writer = new SequentialWriter(treeFile, DEFAULT_OPT))
+        {
+            builder.write(writer);
+            writer.sync();
+        }
+
+        final RandomAccessReader reader = RandomAccessReader.open(treeFile);
+        Assert.assertEquals((int) reader.bytesRemaining(), builder.serializedSize());
+        reader.close();
+    }
+
+    @Test
+    public void buildSerializeAndIterateDynamic() throws Exception
+    {
+        buildSerializeAndIterate(new DynamicTokenTreeBuilder(simpleTokenMap), simpleTokenMap);
+    }
+
+    @Test
+    public void buildSerializeAndIterateStatic() throws Exception
+    {
+        buildSerializeAndIterate(new StaticTokenTreeBuilder(new FakeCombinedTerm(tokens)), tokens);
+    }
+
+
+    public void buildSerializeAndIterate(TokenTreeBuilder builder, SortedMap<Long, LongSet> tokenMap) throws Exception
+    {
+
+        builder.finish();
+        final File treeFile = File.createTempFile("token-tree-iterate-test1", "tt");
+        treeFile.deleteOnExit();
+
+        try (SequentialWriter writer = new SequentialWriter(treeFile, DEFAULT_OPT))
+        {
+            builder.write(writer);
+            writer.sync();
+        }
+
+        final RandomAccessReader reader = RandomAccessReader.open(treeFile);
+        final TokenTree tokenTree = new TokenTree(new MappedBuffer(reader));
+
+        final Iterator<Token> tokenIterator = tokenTree.iterator(KEY_CONVERTER);
+        final Iterator<Map.Entry<Long, LongSet>> listIterator = tokenMap.entrySet().iterator();
+        while (tokenIterator.hasNext() && listIterator.hasNext())
+        {
+            Token treeNext = tokenIterator.next();
+            Map.Entry<Long, LongSet> listNext = listIterator.next();
+
+            Assert.assertEquals(listNext.getKey(), treeNext.get());
+            Assert.assertEquals(convert(listNext.getValue()), convert(treeNext));
+        }
+
+        Assert.assertFalse("token iterator not finished", tokenIterator.hasNext());
+        Assert.assertFalse("list iterator not finished", listIterator.hasNext());
+
+        reader.close();
+    }
+
+    @Test
+    public void buildSerializeAndGetDynamic() throws Exception
+    {
+        buildSerializeAndGet(false);
+    }
+
+    @Test
+    public void buildSerializeAndGetStatic() throws Exception
+    {
+        buildSerializeAndGet(true);
+    }
+
+    public void buildSerializeAndGet(boolean isStatic) throws Exception
+    {
+        final long tokMin = 0;
+        final long tokMax = 1000;
+
+        final TokenTree tokenTree = generateTree(tokMin, tokMax, isStatic);
+
+        for (long i = 0; i <= tokMax; i++)
+        {
+            TokenTree.OnDiskToken result = tokenTree.get(i, KEY_CONVERTER);
+            Assert.assertNotNull("failed to find object for token " + i, result);
+
+            LongSet found = result.getOffsets();
+            Assert.assertEquals(1, found.size());
+            Assert.assertEquals(i, found.toArray()[0]);
+        }
+
+        Assert.assertNull("found missing object", tokenTree.get(tokMax + 10, KEY_CONVERTER));
+    }
+
+    @Test
+    public void buildSerializeIterateAndSkipDynamic() throws Exception
+    {
+        buildSerializeIterateAndSkip(new DynamicTokenTreeBuilder(tokens), tokens);
+    }
+
+    @Test
+    public void buildSerializeIterateAndSkipStatic() throws Exception
+    {
+        buildSerializeIterateAndSkip(new StaticTokenTreeBuilder(new FakeCombinedTerm(tokens)), tokens);
+    }
+
+    public void buildSerializeIterateAndSkip(TokenTreeBuilder builder, SortedMap<Long, LongSet> tokens) throws Exception
+    {
+        builder.finish();
+        final File treeFile = File.createTempFile("token-tree-iterate-test2", "tt");
+        treeFile.deleteOnExit();
+
+        try (SequentialWriter writer = new SequentialWriter(treeFile, DEFAULT_OPT))
+        {
+            builder.write(writer);
+            writer.sync();
+        }
+
+        final RandomAccessReader reader = RandomAccessReader.open(treeFile);
+        final TokenTree tokenTree = new TokenTree(new MappedBuffer(reader));
+
+        final RangeIterator<Long, Token> treeIterator = tokenTree.iterator(KEY_CONVERTER);
+        final RangeIterator<Long, TokenWithOffsets> listIterator = new EntrySetSkippableIterator(tokens);
+
+        long lastToken = 0L;
+        while (treeIterator.hasNext() && lastToken < 12)
+        {
+            Token treeNext = treeIterator.next();
+            TokenWithOffsets listNext = listIterator.next();
+
+            Assert.assertEquals(listNext.token, (lastToken = treeNext.get()));
+            Assert.assertEquals(convert(listNext.offsets), convert(treeNext));
+        }
+
+        treeIterator.skipTo(100548L);
+        listIterator.skipTo(100548L);
+
+        while (treeIterator.hasNext() && listIterator.hasNext())
+        {
+            Token treeNext = treeIterator.next();
+            TokenWithOffsets listNext = listIterator.next();
+
+            Assert.assertEquals(listNext.token, (long) treeNext.get());
+            Assert.assertEquals(convert(listNext.offsets), convert(treeNext));
+
+        }
+
+        Assert.assertFalse("Tree iterator not completed", treeIterator.hasNext());
+        Assert.assertFalse("List iterator not completed", listIterator.hasNext());
+
+        reader.close();
+    }
+
+    @Test
+    public void skipPastEndDynamic() throws Exception
+    {
+        skipPastEnd(new DynamicTokenTreeBuilder(simpleTokenMap), simpleTokenMap);
+    }
+
+    @Test
+    public void skipPastEndStatic() throws Exception
+    {
+        skipPastEnd(new StaticTokenTreeBuilder(new FakeCombinedTerm(simpleTokenMap)), simpleTokenMap);
+    }
+
+    public void skipPastEnd(TokenTreeBuilder builder, SortedMap<Long, LongSet> tokens) throws Exception
+    {
+        builder.finish();
+        final File treeFile = File.createTempFile("token-tree-skip-past-test", "tt");
+        treeFile.deleteOnExit();
+
+        try (SequentialWriter writer = new SequentialWriter(treeFile, DEFAULT_OPT))
+        {
+            builder.write(writer);
+            writer.sync();
+        }
+
+        final RandomAccessReader reader = RandomAccessReader.open(treeFile);
+        final RangeIterator<Long, Token> tokenTree = new TokenTree(new MappedBuffer(reader)).iterator(KEY_CONVERTER);
+
+        tokenTree.skipTo(tokens.lastKey() + 10);
+    }
+
+    @Test
+    public void testTokenMergeDynamic() throws Exception
+    {
+        testTokenMerge(false);
+    }
+
+    @Test
+    public void testTokenMergeStatic() throws Exception
+    {
+        testTokenMerge(true);
+    }
+
+    public void testTokenMerge(boolean isStatic) throws Exception
+    {
+        final long min = 0, max = 1000;
+
+        // two different trees with the same offsets
+        TokenTree treeA = generateTree(min, max, isStatic);
+        TokenTree treeB = generateTree(min, max, isStatic);
+
+        RangeIterator<Long, Token> a = treeA.iterator(new KeyConverter());
+        RangeIterator<Long, Token> b = treeB.iterator(new KeyConverter());
+
+        long count = min;
+        while (a.hasNext() && b.hasNext())
+        {
+            final Token tokenA = a.next();
+            final Token tokenB = b.next();
+
+            // merging of two OnDiskToken
+            tokenA.merge(tokenB);
+            // merging with a RAM token that has a different offset
+            tokenA.merge(new TokenWithOffsets(tokenA.get(), convert(count + 1)));
+            // and with a RAM token that has the same offset
+            tokenA.merge(new TokenWithOffsets(tokenA.get(), convert(count)));
+
+            // should fail when trying to merge different tokens
+            try
+            {
+                tokenA.merge(new TokenWithOffsets(tokenA.get() + 1, convert(count)));
+                Assert.fail();
+            }
+            catch (IllegalArgumentException e)
+            {
+                // expected
+            }
+
+            final Set<Long> offsets = new TreeSet<>();
+            for (DecoratedKey key : tokenA)
+                 offsets.add(LongType.instance.compose(key.getKey()));
+
+            Set<Long> expected = new TreeSet<>();
+            {
+                expected.add(count);
+                expected.add(count + 1);
+            }
+
+            Assert.assertEquals(expected, offsets);
+            count++;
+        }
+
+        Assert.assertEquals(max, count - 1);
+    }
+
+    @Test
+    public void testEntryTypeOrdinalLookup()
+    {
+        Assert.assertEquals(EntryType.SIMPLE, EntryType.of(EntryType.SIMPLE.ordinal()));
+        Assert.assertEquals(EntryType.PACKED, EntryType.of(EntryType.PACKED.ordinal()));
+        Assert.assertEquals(EntryType.FACTORED, EntryType.of(EntryType.FACTORED.ordinal()));
+        Assert.assertEquals(EntryType.OVERFLOW, EntryType.of(EntryType.OVERFLOW.ordinal()));
+    }
+
+    @Test
+    public void testMergingOfEqualTokenTrees() throws Exception
+    {
+        testMergingOfEqualTokenTrees(simpleTokenMap);
+        testMergingOfEqualTokenTrees(bigTokensMap);
+    }
+
+    public void testMergingOfEqualTokenTrees(SortedMap<Long, LongSet> tokensMap) throws Exception
+    {
+        TokenTreeBuilder tokensA = new DynamicTokenTreeBuilder(tokensMap);
+        TokenTreeBuilder tokensB = new DynamicTokenTreeBuilder(tokensMap);
+
+        TokenTree a = buildTree(tokensA);
+        TokenTree b = buildTree(tokensB);
+
+        TokenTreeBuilder tokensC = new StaticTokenTreeBuilder(new CombinedTerm(null, null)
+        {
+            public RangeIterator<Long, Token> getTokenIterator()
+            {
+                RangeIterator.Builder<Long, Token> union = RangeUnionIterator.builder();
+                union.add(a.iterator(new KeyConverter()));
+                union.add(b.iterator(new KeyConverter()));
+
+                return union.build();
+            }
+        });
+
+        TokenTree c = buildTree(tokensC);
+        Assert.assertEquals(tokensMap.size(), c.getCount());
+
+        Iterator<Token> tokenIterator = c.iterator(KEY_CONVERTER);
+        Iterator<Map.Entry<Long, LongSet>> listIterator = tokensMap.entrySet().iterator();
+        while (tokenIterator.hasNext() && listIterator.hasNext())
+        {
+            Token treeNext = tokenIterator.next();
+            Map.Entry<Long, LongSet> listNext = listIterator.next();
+
+            Assert.assertEquals(listNext.getKey(), treeNext.get());
+            Assert.assertEquals(convert(listNext.getValue()), convert(treeNext));
+        }
+
+        for (Map.Entry<Long, LongSet> entry : tokensMap.entrySet())
+        {
+            TokenTree.OnDiskToken result = c.get(entry.getKey(), KEY_CONVERTER);
+            Assert.assertNotNull("failed to find object for token " + entry.getKey(), result);
+
+            LongSet found = result.getOffsets();
+            Assert.assertEquals(entry.getValue(), found);
+
+        }
+    }
+
+
+    private static TokenTree buildTree(TokenTreeBuilder builder) throws Exception
+    {
+        builder.finish();
+        final File treeFile = File.createTempFile("token-tree-", "db");
+        treeFile.deleteOnExit();
+
+        try (SequentialWriter writer = new SequentialWriter(treeFile, DEFAULT_OPT))
+        {
+            builder.write(writer);
+            writer.sync();
+        }
+
+        final RandomAccessReader reader = RandomAccessReader.open(treeFile);
+        return new TokenTree(new MappedBuffer(reader));
+    }
+
+    private static class EntrySetSkippableIterator extends RangeIterator<Long, TokenWithOffsets>
+    {
+        private final PeekingIterator<Map.Entry<Long, LongSet>> elements;
+
+        EntrySetSkippableIterator(SortedMap<Long, LongSet> elms)
+        {
+            super(elms.firstKey(), elms.lastKey(), elms.size());
+            elements = Iterators.peekingIterator(elms.entrySet().iterator());
+        }
+
+        @Override
+        public TokenWithOffsets computeNext()
+        {
+            if (!elements.hasNext())
+                return endOfData();
+
+            Map.Entry<Long, LongSet> next = elements.next();
+            return new TokenWithOffsets(next.getKey(), next.getValue());
+        }
+
+        @Override
+        protected void performSkipTo(Long nextToken)
+        {
+            while (elements.hasNext())
+            {
+                if (Long.compare(elements.peek().getKey(), nextToken) >= 0)
+                {
+                    break;
+                }
+
+                elements.next();
+            }
+        }
+
+        @Override
+        public void close() throws IOException
+        {
+            // nothing to do here
+        }
+    }
+
+    public static class FakeCombinedTerm extends CombinedTerm
+    {
+        private final SortedMap<Long, LongSet> tokens;
+
+        public FakeCombinedTerm(SortedMap<Long, LongSet> tokens)
+        {
+            super(null, null);
+            this.tokens = tokens;
+        }
+
+        public RangeIterator<Long, Token> getTokenIterator()
+        {
+            return new TokenMapIterator(tokens);
+        }
+    }
+
+    public static class TokenMapIterator extends RangeIterator<Long, Token>
+    {
+        public final Iterator<Map.Entry<Long, LongSet>> iterator;
+
+        public TokenMapIterator(SortedMap<Long, LongSet> tokens)
+        {
+            super(tokens.firstKey(), tokens.lastKey(), tokens.size());
+            iterator = tokens.entrySet().iterator();
+        }
+
+        public Token computeNext()
+        {
+            if (!iterator.hasNext())
+                return endOfData();
+
+            Map.Entry<Long, LongSet> entry = iterator.next();
+            return new TokenWithOffsets(entry.getKey(), entry.getValue());
+        }
+
+        public void close() throws IOException
+        {
+
+        }
+
+        public void performSkipTo(Long next)
+        {
+            throw new UnsupportedOperationException();
+        }
+    }
+
+    public static class TokenWithOffsets extends Token
+    {
+        private final LongSet offsets;
+
+        public TokenWithOffsets(long token, final LongSet offsets)
+        {
+            super(token);
+            this.offsets = offsets;
+        }
+
+        @Override
+        public LongSet getOffsets()
+        {
+            return offsets;
+        }
+
+        @Override
+        public void merge(CombinedValue<Long> other)
+        {}
+
+        @Override
+        public int compareTo(CombinedValue<Long> o)
+        {
+            return Long.compare(token, o.get());
+        }
+
+        @Override
+        public boolean equals(Object other)
+        {
+            if (!(other instanceof TokenWithOffsets))
+                return false;
+
+            TokenWithOffsets o = (TokenWithOffsets) other;
+            return token == o.token && offsets.equals(o.offsets);
+        }
+
+        @Override
+        public int hashCode()
+        {
+            return new HashCodeBuilder().append(token).build();
+        }
+
+        @Override
+        public String toString()
+        {
+            return String.format("TokenValue(token: %d, offsets: %s)", token, offsets);
+        }
+
+        @Override
+        public Iterator<DecoratedKey> iterator()
+        {
+            List<DecoratedKey> keys = new ArrayList<>(offsets.size());
+            for (LongCursor offset : offsets)
+                 keys.add(dk(offset.value));
+
+            return keys.iterator();
+        }
+    }
+
+    private static Set<DecoratedKey> convert(LongSet offsets)
+    {
+        Set<DecoratedKey> keys = new HashSet<>();
+        for (LongCursor offset : offsets)
+            keys.add(KEY_CONVERTER.apply(offset.value));
+
+        return keys;
+    }
+
+    private static Set<DecoratedKey> convert(Token results)
+    {
+        Set<DecoratedKey> keys = new HashSet<>();
+        for (DecoratedKey key : results)
+            keys.add(key);
+
+        return keys;
+    }
+
+    private static LongSet convert(long... values)
+    {
+        LongSet result = new LongOpenHashSet(values.length);
+        for (long v : values)
+            result.add(v);
+
+        return result;
+    }
+
+    private static class KeyConverter implements Function<Long, DecoratedKey>
+    {
+        @Override
+        public DecoratedKey apply(Long offset)
+        {
+            return dk(offset);
+        }
+    }
+
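+    // builds a DecoratedKey for the given offset by hashing it into a Murmur3 token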
+    private static DecoratedKey dk(Long token)
+    {
+        ByteBuffer buf = ByteBuffer.allocate(8);
+        buf.putLong(token);
+        buf.flip();
+        Long hashed = MurmurHash.hash2_64(buf, buf.position(), buf.remaining(), 0);
+        return new BufferDecoratedKey(new Murmur3Partitioner.LongToken(hashed), buf);
+    }
+
+    private static TokenTree generateTree(final long minToken, final long maxToken, boolean isStatic) throws IOException
+    {
+        final SortedMap<Long, LongSet> toks = new TreeMap<Long, LongSet>()
+        {{
+                for (long i = minToken; i <= maxToken; i++)
+                {
+                    LongSet offsetSet = new LongOpenHashSet();
+                    offsetSet.add(i);
+                    put(i, offsetSet);
+                }
+        }};
+
+        final TokenTreeBuilder builder = isStatic ? new StaticTokenTreeBuilder(new FakeCombinedTerm(toks)) : new DynamicTokenTreeBuilder(toks);
+        builder.finish();
+        final File treeFile = File.createTempFile("token-tree-get-test", "tt");
+        treeFile.deleteOnExit();
+
+        try (SequentialWriter writer = new SequentialWriter(treeFile, DEFAULT_OPT))
+        {
+            builder.write(writer);
+            writer.sync();
+        }
+
+        RandomAccessReader reader = null;
+
+        try
+        {
+            reader = RandomAccessReader.open(treeFile);
+            return new TokenTree(new MappedBuffer(reader));
+        }
+        finally
+        {
+            FileUtils.closeQuietly(reader);
+        }
+    }
+}
diff --git a/test/unit/org/apache/cassandra/index/sasi/plan/OperationTest.java b/test/unit/org/apache/cassandra/index/sasi/plan/OperationTest.java
new file mode 100644
index 0000000..e388cd4
--- /dev/null
+++ b/test/unit/org/apache/cassandra/index/sasi/plan/OperationTest.java
@@ -0,0 +1,706 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.plan;
+
+import java.nio.ByteBuffer;
+import java.util.*;
+import java.util.concurrent.TimeUnit;
+
+import com.google.common.collect.ListMultimap;
+import com.google.common.collect.Multimap;
+import com.google.common.collect.Sets;
+import org.apache.cassandra.SchemaLoader;
+import org.apache.cassandra.config.CFMetaData;
+import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.cql3.Operator;
+import org.apache.cassandra.db.*;
+import org.apache.cassandra.db.filter.RowFilter;
+import org.apache.cassandra.db.marshal.DoubleType;
+import org.apache.cassandra.db.rows.*;
+import org.apache.cassandra.index.sasi.plan.Operation.OperationType;
+import org.apache.cassandra.db.marshal.Int32Type;
+import org.apache.cassandra.db.marshal.LongType;
+import org.apache.cassandra.db.marshal.UTF8Type;
+import org.apache.cassandra.exceptions.ConfigurationException;
+import org.apache.cassandra.schema.KeyspaceMetadata;
+import org.apache.cassandra.schema.KeyspaceParams;
+import org.apache.cassandra.schema.Tables;
+import org.apache.cassandra.service.MigrationManager;
+import org.apache.cassandra.utils.FBUtilities;
+
+import org.junit.*;
+
+public class OperationTest extends SchemaLoader
+{
+    private static final String KS_NAME = "sasi";
+    private static final String CF_NAME = "test_cf";
+    private static final String CLUSTERING_CF_NAME = "clustering_test_cf";
+    private static final String STATIC_CF_NAME = "static_sasi_test_cf";
+
+    private static ColumnFamilyStore BACKEND;
+    private static ColumnFamilyStore CLUSTERING_BACKEND;
+    private static ColumnFamilyStore STATIC_BACKEND;
+
+    @BeforeClass
+    public static void loadSchema() throws ConfigurationException
+    {
+        System.setProperty("cassandra.config", "cassandra-murmur.yaml");
+        SchemaLoader.loadSchema();
+        MigrationManager.announceNewKeyspace(KeyspaceMetadata.create(KS_NAME,
+                                                                     KeyspaceParams.simpleTransient(1),
+                                                                     Tables.of(SchemaLoader.sasiCFMD(KS_NAME, CF_NAME),
+                                                                               SchemaLoader.clusteringSASICFMD(KS_NAME, CLUSTERING_CF_NAME),
+                                                                               SchemaLoader.staticSASICFMD(KS_NAME, STATIC_CF_NAME))));
+
+        BACKEND = Keyspace.open(KS_NAME).getColumnFamilyStore(CF_NAME);
+        CLUSTERING_BACKEND = Keyspace.open(KS_NAME).getColumnFamilyStore(CLUSTERING_CF_NAME);
+        STATIC_BACKEND = Keyspace.open(KS_NAME).getColumnFamilyStore(STATIC_CF_NAME);
+    }
+
+    private QueryController controller;
+
+    @Before
+    public void beforeTest()
+    {
+        controller = new QueryController(BACKEND,
+                                         PartitionRangeReadCommand.allDataRead(BACKEND.metadata, FBUtilities.nowInSeconds()),
+                                         TimeUnit.SECONDS.toMillis(10));
+    }
+
+    @After
+    public void afterTest()
+    {
+        controller.finish();
+    }
+
+    @Test
+    public void testAnalyze() throws Exception
+    {
+        final ColumnDefinition firstName = getColumn(UTF8Type.instance.decompose("first_name"));
+        final ColumnDefinition age = getColumn(UTF8Type.instance.decompose("age"));
+        final ColumnDefinition comment = getColumn(UTF8Type.instance.decompose("comment"));
+
+        // age != 5 AND age > 1 AND age != 6 AND age <= 10
+        Map<Expression.Op, Expression> expressions = convert(Operation.analyzeGroup(controller, OperationType.AND,
+                                                                                Arrays.asList(new SimpleExpression(age, Operator.NEQ, Int32Type.instance.decompose(5)),
+                                                                                              new SimpleExpression(age, Operator.GT, Int32Type.instance.decompose(1)),
+                                                                                              new SimpleExpression(age, Operator.NEQ, Int32Type.instance.decompose(6)),
+                                                                                              new SimpleExpression(age, Operator.LTE, Int32Type.instance.decompose(10)))));
+
+        Expression expected = new Expression("age", Int32Type.instance)
+        {{
+            operation = Op.RANGE;
+            lower = new Bound(Int32Type.instance.decompose(1), false);
+            upper = new Bound(Int32Type.instance.decompose(10), true);
+
+            exclusions.add(Int32Type.instance.decompose(5));
+            exclusions.add(Int32Type.instance.decompose(6));
+        }};
+
+        Assert.assertEquals(1, expressions.size());
+        Assert.assertEquals(expected, expressions.get(Expression.Op.RANGE));
+
+        // age != 5 OR age >= 7
+        expressions = convert(Operation.analyzeGroup(controller, OperationType.OR,
+                                                    Arrays.asList(new SimpleExpression(age, Operator.NEQ, Int32Type.instance.decompose(5)),
+                                                                  new SimpleExpression(age, Operator.GTE, Int32Type.instance.decompose(7)))));
+        Assert.assertEquals(2, expressions.size());
+
+        Assert.assertEquals(new Expression("age", Int32Type.instance)
+                            {{
+                                    operation = Op.NOT_EQ;
+                                    lower = new Bound(Int32Type.instance.decompose(5), true);
+                                    upper = lower;
+                            }}, expressions.get(Expression.Op.NOT_EQ));
+
+        Assert.assertEquals(new Expression("age", Int32Type.instance)
+                            {{
+                                    operation = Op.RANGE;
+                                    lower = new Bound(Int32Type.instance.decompose(7), true);
+                            }}, expressions.get(Expression.Op.RANGE));
+
+        // age != 5 OR age < 7
+        expressions = convert(Operation.analyzeGroup(controller, OperationType.OR,
+                                                    Arrays.asList(new SimpleExpression(age, Operator.NEQ, Int32Type.instance.decompose(5)),
+                                                                  new SimpleExpression(age, Operator.LT, Int32Type.instance.decompose(7)))));
+
+        Assert.assertEquals(2, expressions.size());
+        Assert.assertEquals(new Expression("age", Int32Type.instance)
+                            {{
+                                    operation = Op.RANGE;
+                                    upper = new Bound(Int32Type.instance.decompose(7), false);
+                            }}, expressions.get(Expression.Op.RANGE));
+        Assert.assertEquals(new Expression("age", Int32Type.instance)
+                            {{
+                                    operation = Op.NOT_EQ;
+                                    lower = new Bound(Int32Type.instance.decompose(5), true);
+                                    upper = lower;
+                            }}, expressions.get(Expression.Op.NOT_EQ));
+
+        // age > 1 AND age < 7
+        expressions = convert(Operation.analyzeGroup(controller, OperationType.AND,
+                                                    Arrays.asList(new SimpleExpression(age, Operator.GT, Int32Type.instance.decompose(1)),
+                                                                  new SimpleExpression(age, Operator.LT, Int32Type.instance.decompose(7)))));
+
+        Assert.assertEquals(1, expressions.size());
+        Assert.assertEquals(new Expression("age", Int32Type.instance)
+                            {{
+                                    operation = Op.RANGE;
+                                    lower = new Bound(Int32Type.instance.decompose(1), false);
+                                    upper = new Bound(Int32Type.instance.decompose(7), false);
+                            }}, expressions.get(Expression.Op.RANGE));
+
+        // first_name = 'a' OR first_name != 'b'
+        expressions = convert(Operation.analyzeGroup(controller, OperationType.OR,
+                                                    Arrays.asList(new SimpleExpression(firstName, Operator.EQ, UTF8Type.instance.decompose("a")),
+                                                                  new SimpleExpression(firstName, Operator.NEQ, UTF8Type.instance.decompose("b")))));
+
+        Assert.assertEquals(2, expressions.size());
+        Assert.assertEquals(new Expression("first_name", UTF8Type.instance)
+                            {{
+                                    operation = Op.NOT_EQ;
+                                    lower = new Bound(UTF8Type.instance.decompose("b"), true);
+                                    upper = lower;
+                            }}, expressions.get(Expression.Op.NOT_EQ));
+        Assert.assertEquals(new Expression("first_name", UTF8Type.instance)
+                            {{
+                                    operation = Op.EQ;
+                                    lower = upper = new Bound(UTF8Type.instance.decompose("a"), true);
+                            }}, expressions.get(Expression.Op.EQ));
+
+        // comment = 'soft eng' and comment != 'likes do'
+        ListMultimap<ColumnDefinition, Expression> e = Operation.analyzeGroup(controller, OperationType.OR,
+                                                    Arrays.asList(new SimpleExpression(comment, Operator.LIKE_MATCHES, UTF8Type.instance.decompose("soft eng")),
+                                                                  new SimpleExpression(comment, Operator.NEQ, UTF8Type.instance.decompose("likes do"))));
+
+        List<Expression> expectedExpressions = new ArrayList<Expression>(2)
+        {{
+                add(new Expression("comment", UTF8Type.instance)
+                {{
+                        operation = Op.MATCH;
+                        lower = new Bound(UTF8Type.instance.decompose("soft"), true);
+                        upper = lower;
+                }});
+
+                add(new Expression("comment", UTF8Type.instance)
+                {{
+                        operation = Op.MATCH;
+                        lower = new Bound(UTF8Type.instance.decompose("eng"), true);
+                        upper = lower;
+                }});
+
+                add(new Expression("comment", UTF8Type.instance)
+                {{
+                        operation = Op.NOT_EQ;
+                        lower = new Bound(UTF8Type.instance.decompose("likes"), true);
+                        upper = lower;
+                }});
+
+                add(new Expression("comment", UTF8Type.instance)
+                {{
+                        operation = Op.NOT_EQ;
+                        lower = new Bound(UTF8Type.instance.decompose("do"), true);
+                        upper = lower;
+                }});
+        }};
+
+        Assert.assertEquals(expectedExpressions, e.get(comment));
+
+        // first_name = 'j' and comment != 'likes do'
+        e = Operation.analyzeGroup(controller, OperationType.OR,
+                        Arrays.asList(new SimpleExpression(comment, Operator.NEQ, UTF8Type.instance.decompose("likes do")),
+                                      new SimpleExpression(firstName, Operator.EQ, UTF8Type.instance.decompose("j"))));
+
+        expectedExpressions = new ArrayList<Expression>(2)
+        {{
+                add(new Expression("comment", UTF8Type.instance)
+                {{
+                        operation = Op.NOT_EQ;
+                        lower = new Bound(UTF8Type.instance.decompose("likes"), true);
+                        upper = lower;
+                }});
+
+                add(new Expression("comment", UTF8Type.instance)
+                {{
+                        operation = Op.NOT_EQ;
+                        lower = new Bound(UTF8Type.instance.decompose("do"), true);
+                        upper = lower;
+                }});
+        }};
+
+        Assert.assertEquals(expectedExpressions, e.get(comment));
+
+        // age != 27 first_name = 'j' and age != 25
+        e = Operation.analyzeGroup(controller, OperationType.OR,
+                        Arrays.asList(new SimpleExpression(age, Operator.NEQ, Int32Type.instance.decompose(27)),
+                                      new SimpleExpression(firstName, Operator.EQ, UTF8Type.instance.decompose("j")),
+                                      new SimpleExpression(age, Operator.NEQ, Int32Type.instance.decompose(25))));
+
+        expectedExpressions = new ArrayList<Expression>(2)
+        {{
+                add(new Expression("age", Int32Type.instance)
+                {{
+                        operation = Op.NOT_EQ;
+                        lower = new Bound(Int32Type.instance.decompose(27), true);
+                        upper = lower;
+                }});
+
+                add(new Expression("age", Int32Type.instance)
+                {{
+                        operation = Op.NOT_EQ;
+                        lower = new Bound(Int32Type.instance.decompose(25), true);
+                        upper = lower;
+                }});
+        }};
+
+        Assert.assertEquals(expectedExpressions, e.get(age));
+    }
+
+    @Test
+    public void testSatisfiedBy() throws Exception
+    {
+        final ColumnDefinition timestamp = getColumn(UTF8Type.instance.decompose("timestamp"));
+        final ColumnDefinition age = getColumn(UTF8Type.instance.decompose("age"));
+
+        Operation.Builder builder = new Operation.Builder(OperationType.AND, controller, new SimpleExpression(age, Operator.NEQ, Int32Type.instance.decompose(5)));
+        Operation op = builder.complete();
+
+        Unfiltered row = buildRow(buildCell(age, Int32Type.instance.decompose(6), System.currentTimeMillis()));
+        Row staticRow = buildRow(Clustering.STATIC_CLUSTERING);
+
+        Assert.assertTrue(op.satisfiedBy(row, staticRow, false));
+
+        row = buildRow(buildCell(age, Int32Type.instance.decompose(5), System.currentTimeMillis()));
+
+        // and reject incorrect value
+        Assert.assertFalse(op.satisfiedBy(row, staticRow, false));
+
+        row = buildRow(buildCell(age, Int32Type.instance.decompose(6), System.currentTimeMillis()));
+
+        Assert.assertTrue(op.satisfiedBy(row, staticRow, false));
+
+        // range with exclusions - age != 5 AND age > 1 AND age != 6 AND age <= 10
+        builder = new Operation.Builder(OperationType.AND, controller,
+                                        new SimpleExpression(age, Operator.NEQ, Int32Type.instance.decompose(5)),
+                                        new SimpleExpression(age, Operator.GT, Int32Type.instance.decompose(1)),
+                                        new SimpleExpression(age, Operator.NEQ, Int32Type.instance.decompose(6)),
+                                        new SimpleExpression(age, Operator.LTE, Int32Type.instance.decompose(10)));
+        op = builder.complete();
+
+        Set<Integer> exclusions = Sets.newHashSet(0, 1, 5, 6, 11);
+        for (int i = 0; i <= 11; i++)
+        {
+            row = buildRow(buildCell(age, Int32Type.instance.decompose(i), System.currentTimeMillis()));
+
+            boolean result = op.satisfiedBy(row, staticRow, false);
+            Assert.assertTrue(exclusions.contains(i) != result);
+        }
+
+        // now let's do something more complex - age = 5 OR age = 6
+        builder = new Operation.Builder(OperationType.OR, controller,
+                                        new SimpleExpression(age, Operator.EQ, Int32Type.instance.decompose(5)),
+                                        new SimpleExpression(age, Operator.EQ, Int32Type.instance.decompose(6)));
+
+        op = builder.complete();
+
+        exclusions = Sets.newHashSet(0, 1, 2, 3, 4, 7, 8, 9, 10);
+        for (int i = 0; i <= 10; i++)
+        {
+            row = buildRow(buildCell(age, Int32Type.instance.decompose(i), System.currentTimeMillis()));
+
+            boolean result = op.satisfiedBy(row, staticRow, false);
+            Assert.assertTrue(exclusions.contains(i) != result);
+        }
+
+        // now let's test aggregated AND commands
+        builder = new Operation.Builder(OperationType.AND, controller);
+
+        // logical should be ignored by the analyzer, but we still want to make sure that it is
+        //IndexExpression logical = new IndexExpression(ByteBufferUtil.EMPTY_BYTE_BUFFER, IndexOperator.EQ, ByteBufferUtil.EMPTY_BYTE_BUFFER);
+        //logical.setLogicalOp(LogicalIndexOperator.AND);
+
+        //builder.add(logical);
+        builder.add(new SimpleExpression(age, Operator.GTE, Int32Type.instance.decompose(0)));
+        builder.add(new SimpleExpression(age, Operator.LT, Int32Type.instance.decompose(10)));
+        builder.add(new SimpleExpression(age, Operator.NEQ, Int32Type.instance.decompose(7)));
+
+        op = builder.complete();
+
+        exclusions = Sets.newHashSet(7);
+        for (int i = 0; i < 10; i++)
+        {
+            row = buildRow(buildCell(age, Int32Type.instance.decompose(i), System.currentTimeMillis()));
+
+            boolean result = op.satisfiedBy(row, staticRow, false);
+            Assert.assertTrue(exclusions.contains(i) != result);
+        }
+
+        // multiple analyzed expressions in the Operation: timestamp >= 10 AND age = 5
+        builder = new Operation.Builder(OperationType.AND, controller);
+        builder.add(new SimpleExpression(timestamp, Operator.GTE, LongType.instance.decompose(10L)));
+        builder.add(new SimpleExpression(age, Operator.EQ, Int32Type.instance.decompose(5)));
+
+        op = builder.complete();
+
+        row = buildRow(buildCell(age, Int32Type.instance.decompose(6), System.currentTimeMillis()),
+                                  buildCell(timestamp, LongType.instance.decompose(11L), System.currentTimeMillis()));
+
+        Assert.assertFalse(op.satisfiedBy(row, staticRow, false));
+
+        row = buildRow(buildCell(age, Int32Type.instance.decompose(5), System.currentTimeMillis()),
+                                  buildCell(timestamp, LongType.instance.decompose(22L), System.currentTimeMillis()));
+
+        Assert.assertTrue(op.satisfiedBy(row, staticRow, false));
+
+        row = buildRow(buildCell(age, Int32Type.instance.decompose(5), System.currentTimeMillis()),
+                                  buildCell(timestamp, LongType.instance.decompose(9L), System.currentTimeMillis()));
+
+        Assert.assertFalse(op.satisfiedBy(row, staticRow, false));
+
+        // operation with internal expressions and right child
+        builder = new Operation.Builder(OperationType.OR, controller,
+                                        new SimpleExpression(timestamp, Operator.GT, LongType.instance.decompose(10L)));
+        builder.setRight(new Operation.Builder(OperationType.AND, controller,
+                                               new SimpleExpression(age, Operator.GT, Int32Type.instance.decompose(0)),
+                                               new SimpleExpression(age, Operator.LT, Int32Type.instance.decompose(10))));
+        op = builder.complete();
+
+        row = buildRow(buildCell(age, Int32Type.instance.decompose(5), System.currentTimeMillis()),
+                                  buildCell(timestamp, LongType.instance.decompose(9L), System.currentTimeMillis()));
+
+        Assert.assertTrue(op.satisfiedBy(row, staticRow, false));
+
+        row = buildRow(buildCell(age, Int32Type.instance.decompose(20), System.currentTimeMillis()),
+                                  buildCell(timestamp, LongType.instance.decompose(11L), System.currentTimeMillis()));
+
+        Assert.assertTrue(op.satisfiedBy(row, staticRow, false));
+
+        row = buildRow(buildCell(age, Int32Type.instance.decompose(0), System.currentTimeMillis()),
+                                  buildCell(timestamp, LongType.instance.decompose(9L), System.currentTimeMillis()));
+
+        Assert.assertFalse(op.satisfiedBy(row, staticRow, false));
+
+        // and for dessert let's try out null and deleted rows etc.
+        builder = new Operation.Builder(OperationType.AND, controller);
+        builder.add(new SimpleExpression(age, Operator.EQ, Int32Type.instance.decompose(30)));
+        op = builder.complete();
+
+        Assert.assertFalse(op.satisfiedBy(null, staticRow, false));
+        Assert.assertFalse(op.satisfiedBy(row, null, false));
+        Assert.assertFalse(op.satisfiedBy(row, staticRow, false));
+
+        long now = System.currentTimeMillis();
+
+        row = OperationTest.buildRow(
+                Row.Deletion.regular(new DeletionTime(now - 10, (int) (now / 1000))),
+                          buildCell(age, Int32Type.instance.decompose(6), System.currentTimeMillis()));
+
+        Assert.assertFalse(op.satisfiedBy(row, staticRow, false));
+
+        row = buildRow(deletedCell(age, System.currentTimeMillis(), FBUtilities.nowInSeconds()));
+
+        Assert.assertFalse(op.satisfiedBy(row, staticRow, true));
+
+        try
+        {
+            Assert.assertFalse(op.satisfiedBy(buildRow(), staticRow, false));
+        }
+        catch (IllegalStateException e)
+        {
+            // expected
+        }
+
+        try
+        {
+            Assert.assertFalse(op.satisfiedBy(buildRow(), staticRow, true));
+        }
+        catch (IllegalStateException e)
+        {
+            Assert.fail("IllegalStateException should not be thrown when missing column and allowMissingColumns=true");
+        }
+    }
+
+    @Test
+    public void testAnalyzeNotIndexedButDefinedColumn() throws Exception
+    {
+        final ColumnDefinition firstName = getColumn(UTF8Type.instance.decompose("first_name"));
+        final ColumnDefinition height = getColumn(UTF8Type.instance.decompose("height"));
+
+        // first_name = 'a' AND height != 10
+        Map<Expression.Op, Expression> expressions;
+        expressions = convert(Operation.analyzeGroup(controller, OperationType.AND,
+                Arrays.asList(new SimpleExpression(firstName, Operator.EQ, UTF8Type.instance.decompose("a")),
+                              new SimpleExpression(height, Operator.NEQ, Int32Type.instance.decompose(5)))));
+
+        Assert.assertEquals(2, expressions.size());
+
+        Assert.assertEquals(new Expression("height", Int32Type.instance)
+        {{
+                operation = Op.NOT_EQ;
+                lower = new Bound(Int32Type.instance.decompose(5), true);
+                upper = lower;
+        }}, expressions.get(Expression.Op.NOT_EQ));
+
+        expressions = convert(Operation.analyzeGroup(controller, OperationType.AND,
+                Arrays.asList(new SimpleExpression(firstName, Operator.EQ, UTF8Type.instance.decompose("a")),
+                              new SimpleExpression(height, Operator.GT, Int32Type.instance.decompose(0)),
+                              new SimpleExpression(height, Operator.NEQ, Int32Type.instance.decompose(5)))));
+
+        Assert.assertEquals(2, expressions.size());
+
+        Assert.assertEquals(new Expression("height", Int32Type.instance)
+        {{
+            operation = Op.RANGE;
+            lower = new Bound(Int32Type.instance.decompose(0), false);
+            exclusions.add(Int32Type.instance.decompose(5));
+        }}, expressions.get(Expression.Op.RANGE));
+
+        expressions = convert(Operation.analyzeGroup(controller, OperationType.AND,
+                Arrays.asList(new SimpleExpression(firstName, Operator.EQ, UTF8Type.instance.decompose("a")),
+                              new SimpleExpression(height, Operator.NEQ, Int32Type.instance.decompose(5)),
+                              new SimpleExpression(height, Operator.GTE, Int32Type.instance.decompose(0)),
+                              new SimpleExpression(height, Operator.LT, Int32Type.instance.decompose(10)))));
+
+        Assert.assertEquals(2, expressions.size());
+
+        Assert.assertEquals(new Expression("height", Int32Type.instance)
+        {{
+                operation = Op.RANGE;
+                lower = new Bound(Int32Type.instance.decompose(0), true);
+                upper = new Bound(Int32Type.instance.decompose(10), false);
+                exclusions.add(Int32Type.instance.decompose(5));
+        }}, expressions.get(Expression.Op.RANGE));
+    }
+
+    @Test
+    public void testSatisfiedByWithMultipleTerms()
+    {
+        final ColumnDefinition comment = getColumn(UTF8Type.instance.decompose("comment"));
+
+        Unfiltered row = buildRow(buildCell(comment, UTF8Type.instance.decompose("software engineer is working on a project"), System.currentTimeMillis()));
+        Row staticRow = buildRow(Clustering.STATIC_CLUSTERING);
+
+        Operation.Builder builder = new Operation.Builder(OperationType.AND, controller,
+                                            new SimpleExpression(comment, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("eng is a work")));
+        Operation op = builder.complete();
+
+        Assert.assertTrue(op.satisfiedBy(row, staticRow, false));
+
+        builder = new Operation.Builder(OperationType.AND, controller,
+                                            new SimpleExpression(comment, Operator.LIKE_CONTAINS, UTF8Type.instance.decompose("soft works fine")));
+        op = builder.complete();
+
+        Assert.assertTrue(op.satisfiedBy(row, staticRow, false));
+    }
+
+    @Test
+    public void testSatisfiedByWithClustering()
+    {
+        ColumnDefinition location = getColumn(CLUSTERING_BACKEND, UTF8Type.instance.decompose("location"));
+        ColumnDefinition age = getColumn(CLUSTERING_BACKEND, UTF8Type.instance.decompose("age"));
+        ColumnDefinition height = getColumn(CLUSTERING_BACKEND, UTF8Type.instance.decompose("height"));
+        ColumnDefinition score = getColumn(CLUSTERING_BACKEND, UTF8Type.instance.decompose("score"));
+
+        Unfiltered row = buildRow(Clustering.make(UTF8Type.instance.fromString("US"), Int32Type.instance.decompose(27)),
+                                  buildCell(height, Int32Type.instance.decompose(182), System.currentTimeMillis()),
+                                  buildCell(score, DoubleType.instance.decompose(1.0d), System.currentTimeMillis()));
+        Row staticRow = buildRow(Clustering.STATIC_CLUSTERING);
+
+        Operation.Builder builder = new Operation.Builder(OperationType.AND, controller);
+        builder.add(new SimpleExpression(age, Operator.EQ, Int32Type.instance.decompose(27)));
+        builder.add(new SimpleExpression(height, Operator.EQ, Int32Type.instance.decompose(182)));
+
+        Assert.assertTrue(builder.complete().satisfiedBy(row, staticRow, false));
+
+        builder = new Operation.Builder(OperationType.AND, controller);
+
+        builder.add(new SimpleExpression(age, Operator.EQ, Int32Type.instance.decompose(28)));
+        builder.add(new SimpleExpression(height, Operator.EQ, Int32Type.instance.decompose(182)));
+
+        Assert.assertFalse(builder.complete().satisfiedBy(row, staticRow, false));
+
+        builder = new Operation.Builder(OperationType.AND, controller);
+        builder.add(new SimpleExpression(location, Operator.EQ, UTF8Type.instance.decompose("US")));
+        builder.add(new SimpleExpression(age, Operator.GTE, Int32Type.instance.decompose(27)));
+
+        Assert.assertTrue(builder.complete().satisfiedBy(row, staticRow, false));
+
+        builder = new Operation.Builder(OperationType.AND, controller);
+        builder.add(new SimpleExpression(location, Operator.EQ, UTF8Type.instance.decompose("BY")));
+        builder.add(new SimpleExpression(age, Operator.GTE, Int32Type.instance.decompose(28)));
+
+        Assert.assertFalse(builder.complete().satisfiedBy(row, staticRow, false));
+
+        builder = new Operation.Builder(OperationType.AND, controller);
+        builder.add(new SimpleExpression(location, Operator.EQ, UTF8Type.instance.decompose("US")));
+        builder.add(new SimpleExpression(age, Operator.LTE, Int32Type.instance.decompose(27)));
+        builder.add(new SimpleExpression(height, Operator.GTE, Int32Type.instance.decompose(182)));
+
+        Assert.assertTrue(builder.complete().satisfiedBy(row, staticRow, false));
+
+        builder = new Operation.Builder(OperationType.AND, controller);
+        builder.add(new SimpleExpression(location, Operator.EQ, UTF8Type.instance.decompose("US")));
+        builder.add(new SimpleExpression(height, Operator.GTE, Int32Type.instance.decompose(182)));
+        builder.add(new SimpleExpression(score, Operator.EQ, DoubleType.instance.decompose(1.0d)));
+
+        Assert.assertTrue(builder.complete().satisfiedBy(row, staticRow, false));
+
+        builder = new Operation.Builder(OperationType.AND, controller);
+        builder.add(new SimpleExpression(height, Operator.GTE, Int32Type.instance.decompose(182)));
+        builder.add(new SimpleExpression(score, Operator.EQ, DoubleType.instance.decompose(1.0d)));
+
+        Assert.assertTrue(builder.complete().satisfiedBy(row, staticRow, false));
+    }
+
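+    // collapses the per-column expression multimap into a map keyed by operator (each operator appears at most once in these tests)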
+    private Map<Expression.Op, Expression> convert(Multimap<ColumnDefinition, Expression> expressions)
+    {
+        Map<Expression.Op, Expression> converted = new HashMap<>();
+        for (Expression expression : expressions.values())
+        {
+            Expression column = converted.get(expression.getOp());
+            assert column == null; // sanity check
+            converted.put(expression.getOp(), expression);
+        }
+
+        return converted;
+    }
+
+    @Test
+    public void testSatisfiedByWithStatic()
+    {
+        final ColumnDefinition sensorType = getColumn(STATIC_BACKEND, UTF8Type.instance.decompose("sensor_type"));
+        final ColumnDefinition value = getColumn(STATIC_BACKEND, UTF8Type.instance.decompose("value"));
+
+        Unfiltered row = buildRow(Clustering.make(UTF8Type.instance.fromString("date"), LongType.instance.decompose(20160401L)),
+                          buildCell(value, DoubleType.instance.decompose(24.56), System.currentTimeMillis()));
+        Row staticRow = buildRow(Clustering.STATIC_CLUSTERING,
+                         buildCell(sensorType, UTF8Type.instance.decompose("TEMPERATURE"), System.currentTimeMillis()));
+
+        // sensor_type ='TEMPERATURE' AND value = 24.56
+        Operation op = new Operation.Builder(OperationType.AND, controller,
+                                        new SimpleExpression(sensorType, Operator.EQ, UTF8Type.instance.decompose("TEMPERATURE")),
+                                        new SimpleExpression(value, Operator.EQ, DoubleType.instance.decompose(24.56))).complete();
+
+        Assert.assertTrue(op.satisfiedBy(row, staticRow, false));
+
+        // sensor_type ='TEMPERATURE' AND value = 30
+        op = new Operation.Builder(OperationType.AND, controller,
+                                             new SimpleExpression(sensorType, Operator.EQ, UTF8Type.instance.decompose("TEMPERATURE")),
+                                             new SimpleExpression(value, Operator.EQ, DoubleType.instance.decompose(30.00))).complete();
+
+        Assert.assertFalse(op.satisfiedBy(row, staticRow, false));
+
+        // sensor_type = 'TEMPERATURE' OR value = 24.56
+        op = new Operation.Builder(OperationType.OR, controller,
+                                             new SimpleExpression(sensorType, Operator.EQ, UTF8Type.instance.decompose("TEMPERATURE")),
+                                             new SimpleExpression(value, Operator.EQ, DoubleType.instance.decompose(24.56))).complete();
+
+        Assert.assertTrue(op.satisfiedBy(row, staticRow, false));
+
+        // sensor_type = 'PRESSURE' AND value = 30
+        op = new Operation.Builder(OperationType.AND, controller,
+                                   new SimpleExpression(sensorType, Operator.EQ, UTF8Type.instance.decompose("PRESSURE")),
+                                   new SimpleExpression(value, Operator.EQ, DoubleType.instance.decompose(30.00))).complete();
+
+        Assert.assertFalse(op.satisfiedBy(row, staticRow, false));
+
+        // (sensor_type = 'TEMPERATURE' OR sensor_type = 'PRESSURE') AND value = 24.56
+        op = new Operation.Builder(OperationType.OR, controller,
+                                   new SimpleExpression(sensorType, Operator.EQ, UTF8Type.instance.decompose("TEMPERATURE")),
+                                   new SimpleExpression(sensorType, Operator.EQ, UTF8Type.instance.decompose("PRESSURE")))
+             .setRight(new Operation.Builder(OperationType.AND, controller,
+                                             new SimpleExpression(value, Operator.EQ, DoubleType.instance.decompose(24.56)))).complete();
+
+        Assert.assertTrue(op.satisfiedBy(row, staticRow, false));
+
+        // sensor_type LIKE 'TEMP%' AND value = 24.56
+        op = new Operation.Builder(OperationType.AND, controller,
+                                   new SimpleExpression(sensorType, Operator.LIKE_PREFIX, UTF8Type.instance.decompose("TEMP")),
+                                   new SimpleExpression(value, Operator.EQ, DoubleType.instance.decompose(24.56))).complete();
+
+        Assert.assertTrue(op.satisfiedBy(row, staticRow, false));
+    }
+
+    private static class SimpleExpression extends RowFilter.Expression
+    {
+        SimpleExpression(ColumnDefinition column, Operator operator, ByteBuffer value)
+        {
+            super(column, operator, value);
+        }
+
+        @Override
+        protected Kind kind()
+        {
+            return Kind.SIMPLE;
+        }
+
+        @Override
+        public boolean isSatisfiedBy(CFMetaData metadata, DecoratedKey partitionKey, Row row)
+        {
+            throw new UnsupportedOperationException();
+        }
+    }
+
+    private static Unfiltered buildRow(Cell... cells)
+    {
+        return buildRow(Clustering.EMPTY, null, cells);
+    }
+
+    private static Row buildRow(Row.Deletion deletion, Cell... cells)
+    {
+        return buildRow(Clustering.EMPTY, deletion, cells);
+    }
+
+    private static Row buildRow(Clustering clustering, Cell... cells)
+    {
+        return buildRow(clustering, null, cells);
+    }
+
+    private static Row buildRow(Clustering clustering, Row.Deletion deletion, Cell... cells)
+    {
+        Row.Builder rowBuilder = BTreeRow.sortedBuilder();
+        rowBuilder.newRow(clustering);
+        for (Cell c : cells)
+            rowBuilder.addCell(c);
+
+        if (deletion != null)
+            rowBuilder.addRowDeletion(deletion);
+
+        return rowBuilder.build();
+    }
+
+    private static Cell buildCell(ColumnDefinition column, ByteBuffer value, long timestamp)
+    {
+        return BufferCell.live(column, timestamp, value);
+    }
+
+    private static Cell deletedCell(ColumnDefinition column, long timestamp, int nowInSeconds)
+    {
+        return BufferCell.tombstone(column, timestamp, nowInSeconds);
+    }
+
+    private static ColumnDefinition getColumn(ByteBuffer name)
+    {
+        return getColumn(BACKEND, name);
+    }
+
+    private static ColumnDefinition getColumn(ColumnFamilyStore cfs, ByteBuffer name)
+    {
+        return cfs.metadata.getColumnDefinition(name);
+    }
+}
diff --git a/test/unit/org/apache/cassandra/index/sasi/utils/LongIterator.java b/test/unit/org/apache/cassandra/index/sasi/utils/LongIterator.java
new file mode 100644
index 0000000..205d28f
--- /dev/null
+++ b/test/unit/org/apache/cassandra/index/sasi/utils/LongIterator.java
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.utils;
+
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.Iterator;
+import java.util.List;
+
+import com.carrotsearch.hppc.LongOpenHashSet;
+import com.carrotsearch.hppc.LongSet;
+import org.apache.cassandra.db.DecoratedKey;
+import org.apache.cassandra.index.sasi.disk.Token;
+
+public class LongIterator extends RangeIterator<Long, Token>
+{
+    private final List<LongToken> tokens;
+    private int currentIdx = 0;
+
+    public LongIterator(long[] tokens)
+    {
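+        // RangeIterator requires min, max and count up front; an empty token array produces a null-bounded, zero-count iterator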
+        super(tokens.length == 0 ? null : tokens[0], tokens.length == 0 ? null : tokens[tokens.length - 1], tokens.length);
+        this.tokens = new ArrayList<>(tokens.length);
+        for (long token : tokens)
+            this.tokens.add(new LongToken(token));
+    }
+
+    @Override
+    protected Token computeNext()
+    {
+        if (currentIdx >= tokens.size())
+            return endOfData();
+
+        return tokens.get(currentIdx++);
+    }
+
+    @Override
+    protected void performSkipTo(Long nextToken)
+    {
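+        // linear scan, starting just before the current position, for the first token >= nextToken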
+        for (int i = currentIdx == 0 ? 0 : currentIdx - 1; i < tokens.size(); i++)
+        {
+            LongToken token = tokens.get(i);
+            if (token.get().compareTo(nextToken) >= 0)
+            {
+                currentIdx = i;
+                break;
+            }
+        }
+    }
+
+    @Override
+    public void close() throws IOException
+    {}
+
+    public static class LongToken extends Token
+    {
+        public LongToken(long token)
+        {
+            super(token);
+        }
+
+        @Override
+        public void merge(CombinedValue<Long> other)
+        {
+            // no-op
+        }
+
+        @Override
+        public LongSet getOffsets()
+        {
+            return new LongOpenHashSet(4);
+        }
+
+        @Override
+        public Iterator<DecoratedKey> iterator()
+        {
+            return Collections.emptyIterator();
+        }
+    }
+
+    public static List<Long> convert(RangeIterator<Long, Token> tokens)
+    {
+        List<Long> results = new ArrayList<>();
+        while (tokens.hasNext())
+            results.add(tokens.next().get());
+
+        return results;
+    }
+
+    public static List<Long> convert(final long... nums)
+    {
+        return new ArrayList<Long>(nums.length)
+        {{
+                for (long n : nums)
+                    add(n);
+        }};
+    }
+}
diff --git a/test/unit/org/apache/cassandra/index/sasi/utils/MappedBufferTest.java b/test/unit/org/apache/cassandra/index/sasi/utils/MappedBufferTest.java
new file mode 100644
index 0000000..7ffebf1
--- /dev/null
+++ b/test/unit/org/apache/cassandra/index/sasi/utils/MappedBufferTest.java
@@ -0,0 +1,540 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.utils;
+
+import java.io.*;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.concurrent.ThreadLocalRandom;
+
+import org.apache.cassandra.db.marshal.LongType;
+import org.apache.cassandra.io.util.ChannelProxy;
+import org.apache.cassandra.io.util.FileUtils;
+
+import org.junit.Assert;
+import org.junit.Test;
+
+public class MappedBufferTest
+{
+    @Test
+    public void testBasicWriteThenRead() throws Exception
+    {
+        long numLongs = 10000;
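+        // the test file contains the longs 0..numLongs-1 written back to back, so value i lives at offset i * 8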
+        final MappedBuffer buffer = createTestFile(numLongs);
+
+        Assert.assertEquals(0, buffer.position());
+        for (long i = 0; i < numLongs; i++)
+        {
+            Assert.assertEquals(i * 8, buffer.position());
+            Assert.assertEquals(i, buffer.getLong());
+        }
+
+        buffer.position(0);
+        for (long i = 0; i < numLongs; i++)
+        {
+            Assert.assertEquals(i, buffer.getLong(i * 8));
+            Assert.assertEquals(0, buffer.position());
+        }
+
+        // read all the numbers as ints (all numbers fit into four bytes)
+        for (long i = 0; i < Math.min(Integer.MAX_VALUE, numLongs); i++)
+            Assert.assertEquals(i, buffer.getInt((i * 8) + 4));
+
+        // read all the numbers as shorts (all numbers fit into two bytes)
+        for (long i = 0; i < Math.min(Short.MAX_VALUE, numLongs); i++) {
+            Assert.assertEquals(i, buffer.getShort((i * 8) + 6));
+        }
+
+        // read all the numbers that can be represented as a single byte
+        for (long i = 0; i < 128; i++)
+            Assert.assertEquals(i, buffer.get((i * 8) + 7));
+
+        buffer.close();
+    }
+
+    @Test
+    public void testDuplicate() throws Exception
+    {
+        long numLongs = 10;
+        final MappedBuffer buffer1 = createTestFile(numLongs);
+
+        Assert.assertEquals(0, buffer1.getLong());
+        Assert.assertEquals(1, buffer1.getLong());
+
+        final MappedBuffer buffer2 = buffer1.duplicate();
+
+        Assert.assertEquals(2, buffer1.getLong());
+        Assert.assertEquals(2, buffer2.getLong());
+
+        buffer2.position(0);
+        Assert.assertEquals(3, buffer1.getLong());
+        Assert.assertEquals(0, buffer2.getLong());
+    }
+
+    @Test
+    public void testLimit() throws Exception
+    {
+        long numLongs =  10;
+        final MappedBuffer buffer1 = createTestFile(numLongs);
+
+        MappedBuffer buffer2 = buffer1.duplicate().position(16).limit(32);
+        buffer1.position(0).limit(16);
+        List<Long> longs = new ArrayList<>(4);
+
+        while (buffer1.hasRemaining())
+            longs.add(buffer1.getLong());
+
+        while (buffer2.hasRemaining())
+            longs.add(buffer2.getLong());
+
+        Assert.assertArrayEquals(new Long[]{0L, 1L, 2L, 3L}, longs.toArray());
+    }
+
+    @Test(expected = IllegalArgumentException.class)
+    public void testPositionGreaterThanLimit() throws Exception
+    {
+        final MappedBuffer buffer = createTestFile(1);
+
+        buffer.limit(4);
+
+        try
+        {
+            buffer.position(buffer.limit() + 1);
+        }
+        finally
+        {
+            buffer.close();
+        }
+    }
+
+    @Test(expected = IllegalArgumentException.class)
+    public void testNegativePosition() throws Exception
+    {
+        try (MappedBuffer buffer = createTestFile(1))
+        {
+            buffer.position(-1);
+        }
+    }
+
+    @Test(expected = IllegalArgumentException.class)
+    public void testLimitGreaterThanCapacity() throws Exception
+    {
+        try (MappedBuffer buffer = createTestFile(1))
+        {
+            buffer.limit(buffer.capacity() + 1);
+        }
+    }
+
+    @Test(expected = IllegalArgumentException.class)
+    public void testLimitLessThanPosition() throws Exception
+    {
+        final MappedBuffer buffer = createTestFile(1);
+
+        buffer.position(1);
+
+        try
+        {
+            buffer.limit(0);
+        }
+        finally
+        {
+            buffer.close();
+        }
+    }
+
+    @Test(expected = IndexOutOfBoundsException.class)
+    public void testGetRelativeUnderflow() throws Exception
+    {
+        final MappedBuffer buffer = createTestFile(1);
+
+        buffer.position(buffer.limit());
+        try
+        {
+            buffer.get();
+        }
+        finally
+        {
+            buffer.close();
+        }
+
+    }
+
+    @Test(expected = IndexOutOfBoundsException.class)
+    public void testGetAbsoluteGreaterThanCapacity() throws Exception
+    {
+        try (MappedBuffer buffer = createTestFile(1))
+        {
+            buffer.get(buffer.limit());
+        }
+    }
+
+    @Test(expected = IndexOutOfBoundsException.class)
+    public void testGetAbsoluteNegativePosition() throws Exception
+    {
+        try (MappedBuffer buffer = createTestFile(1))
+        {
+            buffer.get(-1);
+        }
+    }
+
+
+    @Test(expected = IndexOutOfBoundsException.class)
+    public void testGetShortRelativeUnderflow() throws Exception
+    {
+        final MappedBuffer buffer = createTestFile(1);
+
+        buffer.position(buffer.capacity() - 1);
+        try
+        {
+            buffer.getShort();
+        }
+        finally
+        {
+            buffer.close();
+        }
+
+    }
+
+    @Test(expected = IndexOutOfBoundsException.class)
+    public void testGetShortAbsoluteGreaterThanCapacity() throws Exception
+    {
+        final MappedBuffer buffer = createTestFile(1);
+
+        Assert.assertEquals(8, buffer.capacity());
+        try
+        {
+            buffer.getShort(buffer.capacity() - 1);
+        }
+        finally
+        {
+            buffer.close();
+        }
+    }
+
+    @Test(expected = IndexOutOfBoundsException.class)
+    public void testGetShortAbsoluteNegativePosition() throws Exception
+    {
+        try (MappedBuffer buffer = createTestFile(1))
+        {
+            buffer.getShort(-1);
+        }
+    }
+
+    @Test(expected = IndexOutOfBoundsException.class)
+    public void testGetIntRelativeUnderflow() throws Exception
+    {
+        final MappedBuffer buffer = createTestFile(1);
+
+        buffer.position(buffer.capacity() - 3);
+        try
+        {
+            buffer.getInt();
+        }
+        finally
+        {
+            buffer.close();
+        }
+    }
+
+    @Test(expected = IndexOutOfBoundsException.class)
+    public void testGetIntAbsoluteGreaterThanCapacity() throws Exception
+    {
+        final MappedBuffer buffer = createTestFile(1);
+
+        Assert.assertEquals(8, buffer.capacity());
+        try
+        {
+            buffer.getInt(buffer.capacity() - 3);
+        }
+        finally
+        {
+            buffer.close();
+        }
+    }
+
+    @Test(expected = IndexOutOfBoundsException.class)
+    public void testGetIntAbsoluteNegativePosition() throws Exception
+    {
+        try (MappedBuffer buffer = createTestFile(1))
+        {
+            buffer.getInt(-1);
+        }
+    }
+
+    @Test(expected = IndexOutOfBoundsException.class)
+    public void testGetLongRelativeUnderflow() throws Exception
+    {
+        final MappedBuffer buffer = createTestFile(1);
+
+        buffer.position(buffer.capacity() - 7);
+        try
+        {
+            buffer.getLong();
+        }
+        finally
+        {
+            buffer.close();
+        }
+
+    }
+
+    @Test(expected = IndexOutOfBoundsException.class)
+    public void testGetLongAbsoluteGreaterThanCapacity() throws Exception
+    {
+        final MappedBuffer buffer = createTestFile(1);
+
+        Assert.assertEquals(8, buffer.capacity());
+        try
+        {
+            buffer.getLong(buffer.capacity() - 7);
+        }
+        finally
+        {
+            buffer.close();
+        }
+    }
+
+    @Test(expected = IndexOutOfBoundsException.class)
+    public void testGetLongAbsoluteNegativePosition() throws Exception
+    {
+        try (MappedBuffer buffer = createTestFile(1))
+        {
+            buffer.getLong(-1);
+        }
+    }
+
+    @Test
+    public void testGetPageRegion() throws Exception
+    {
+        ThreadLocalRandom random = ThreadLocalRandom.current();
+
+        int numLongs = 1000;
+        int byteSize = 8;
+        int capacity = numLongs * byteSize;
+        try (MappedBuffer buffer = createTestFile(numLongs))
+        {
+            for (int i = 0; i < 1000; i++)
+            {
+                // offset, length are always aligned on sizeof(long)
+                int offset = random.nextInt(0, 1000 * byteSize - byteSize) & ~(byteSize - 1);
+                int length = Math.min(capacity, random.nextInt(byteSize, capacity - offset) & ~(byteSize - 1));
+
+                ByteBuffer region = buffer.getPageRegion(offset, length);
+                for (int j = offset; j < (offset + length); j += 8)
+                    Assert.assertEquals(j / 8, region.getLong(j));
+            }
+        }
+    }
+
+    @Test (expected = IllegalArgumentException.class)
+    public void testMisalignedRegionAccess() throws Exception
+    {
+        try (MappedBuffer buffer = createTestFile(100, 8, 4, 0))
+        {
+            buffer.getPageRegion(13, 27);
+        }
+    }
+
+    @Test
+    public void testSequentialIterationWithPadding() throws Exception
+    {
+        long numValues = 1000;
+        int maxPageBits = 6; // 64 bytes page
+        int[] paddings = new int[] { 0, 3, 5, 7, 9, 11, 13 };
+
+        // test different page sizes, with different padding and types
+        for (int numPageBits = 3; numPageBits <= maxPageBits; numPageBits++)
+        {
+            for (int typeSize = 2; typeSize <= 8; typeSize *= 2)
+            {
+                for (int padding : paddings)
+                {
+                    try (MappedBuffer buffer = createTestFile(numValues, typeSize, numPageBits, padding))
+                    {
+                        long offset = 0;
+                        for (long j = 0; j < numValues; j++)
+                        {
+                            switch (typeSize)
+                            {
+                                case 2:
+                                    Assert.assertEquals(j, buffer.getShort(offset));
+                                    break;
+
+                                case 4:
+                                    Assert.assertEquals(j, buffer.getInt(offset));
+                                    break;
+
+                                case 8:
+                                    Assert.assertEquals(j, buffer.getLong(offset));
+                                    break;
+
+                                default:
+                                    throw new AssertionError();
+                            }
+
+                            offset += typeSize + padding;
+                        }
+                    }
+                }
+            }
+        }
+    }
+
+    @Test
+    public void testSequentialIteration() throws IOException
+    {
+        long numValues = 1000;
+        for (int typeSize = 2; typeSize <= 8; typeSize *= 2)
+        {
+            try (MappedBuffer buffer = createTestFile(numValues, typeSize, 16, 0))
+            {
+                for (int j = 0; j < numValues; j++)
+                {
+                    Assert.assertEquals(j * typeSize, buffer.position());
+
+                    switch (typeSize)
+                    {
+                        case 2:
+                            Assert.assertEquals(j, buffer.getShort());
+                            break;
+
+                        case 4:
+                            Assert.assertEquals(j, buffer.getInt());
+                            break;
+
+                        case 8:
+                            Assert.assertEquals(j, buffer.getLong());
+                            break;
+
+                        default:
+                            throw new AssertionError();
+                    }
+                }
+            }
+        }
+    }
+
+    @Test
+    public void testCompareToPage() throws IOException
+    {
+        long numValues = 100;
+        int typeSize = 8;
+
+        try (MappedBuffer buffer = createTestFile(numValues))
+        {
+            for (long i = 0; i < numValues * typeSize; i += typeSize)
+            {
+                long value = i / typeSize;
+                Assert.assertEquals(0, buffer.comparePageTo(i, typeSize, LongType.instance, LongType.instance.decompose(value)));
+            }
+        }
+    }
+
+    @Test
+    public void testOpenWithoutPageBits() throws IOException
+    {
+        File tmp = File.createTempFile("mapped-buffer", "tmp");
+        tmp.deleteOnExit();
+
+        RandomAccessFile file = new RandomAccessFile(tmp, "rw");
+
+        long numValues = 1000;
+        for (long i = 0; i < numValues; i++)
+            file.writeLong(i);
+
+        file.getFD().sync();
+
+        try (MappedBuffer buffer = new MappedBuffer(new ChannelProxy(tmp.getAbsolutePath(), file.getChannel())))
+        {
+            Assert.assertEquals(numValues * 8, buffer.limit());
+            Assert.assertEquals(numValues * 8, buffer.capacity());
+
+            for (long i = 0; i < numValues; i++)
+            {
+                Assert.assertEquals(i * 8, buffer.position());
+                Assert.assertEquals(i, buffer.getLong());
+            }
+        }
+        finally
+        {
+            FileUtils.closeQuietly(file);
+        }
+    }
+
+    @Test(expected = IllegalArgumentException.class)
+    public void testIncorrectPageSize() throws Exception
+    {
+        new MappedBuffer(null, 33);
+    }
+
+    private MappedBuffer createTestFile(long numCount) throws IOException
+    {
+        return createTestFile(numCount, 8, 16, 0);
+    }
+
+    private MappedBuffer createTestFile(long numCount, int typeSize, int numPageBits, int padding) throws IOException
+    {
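+        // writes numCount ascending values of typeSize bytes each, followed by 'padding' zero bytes, and maps the file using pages of 2^numPageBits bytes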
+        final File testFile = File.createTempFile("mapped-buffer-test", "db");
+        testFile.deleteOnExit();
+
+        RandomAccessFile file = new RandomAccessFile(testFile, "rw");
+
+        for (long i = 0; i < numCount; i++)
+        {
+
+            switch (typeSize)
+            {
+                case 1:
+                    file.write((byte) i);
+                    break;
+
+                case 2:
+                    file.writeShort((short) i);
+                    break;
+
+                case 4:
+                    file.writeInt((int) i);
+                    break;
+
+                case 8:
+                    // bunch of longs
+                    file.writeLong(i);
+                    break;
+
+                default:
+                    throw new IllegalArgumentException("unknown byte size: " + typeSize);
+            }
+
+            for (int j = 0; j < padding; j++)
+                file.write(0);
+        }
+
+        file.getFD().sync();
+
+        try
+        {
+            return new MappedBuffer(new ChannelProxy(testFile.getAbsolutePath(), file.getChannel()), numPageBits);
+        }
+        finally
+        {
+            FileUtils.closeQuietly(file);
+        }
+    }
+
+}
diff --git a/test/unit/org/apache/cassandra/index/sasi/utils/RangeIntersectionIteratorTest.java b/test/unit/org/apache/cassandra/index/sasi/utils/RangeIntersectionIteratorTest.java
new file mode 100644
index 0000000..18b9dd7
--- /dev/null
+++ b/test/unit/org/apache/cassandra/index/sasi/utils/RangeIntersectionIteratorTest.java
@@ -0,0 +1,387 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.utils;
+
+import java.io.IOException;
+import java.util.*;
+import java.util.concurrent.ThreadLocalRandom;
+
+import org.apache.cassandra.index.sasi.disk.Token;
+import org.apache.cassandra.index.sasi.utils.RangeIntersectionIterator.Strategy;
+import org.apache.cassandra.index.sasi.utils.RangeIntersectionIterator.LookupIntersectionIterator;
+import org.apache.cassandra.index.sasi.utils.RangeIntersectionIterator.BounceIntersectionIterator;
+import org.apache.cassandra.io.util.FileUtils;
+
+import com.carrotsearch.hppc.LongOpenHashSet;
+import com.carrotsearch.hppc.LongSet;
+
+import org.junit.Assert;
+import org.junit.Test;
+
+import static org.apache.cassandra.index.sasi.utils.LongIterator.convert;
+
+public class RangeIntersectionIteratorTest
+{
+    @Test
+    public void testNoOverlappingValues()
+    {
+        for (Strategy strategy : Strategy.values())
+            testNoOverlappingValues(strategy);
+    }
+
+    private void testNoOverlappingValues(Strategy strategy)
+    {
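+        // three ranges whose min/max overlap but which share no common token, so the intersection must be empty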
+        RangeIterator.Builder<Long, Token> builder = RangeIntersectionIterator.builder(strategy);
+
+        builder.add(new LongIterator(new long[] { 2L, 3L, 5L, 6L }));
+        builder.add(new LongIterator(new long[] { 1L, 7L }));
+        builder.add(new LongIterator(new long[] { 4L, 8L, 9L, 10L }));
+
+        Assert.assertEquals(convert(), convert(builder.build()));
+
+        builder = RangeIntersectionIterator.builder(strategy);
+        // both ranges overlap by min/max but not by value
+        builder.add(new LongIterator(new long[] { 1L, 5L, 7L, 9L }));
+        builder.add(new LongIterator(new long[] { 6L }));
+
+        RangeIterator<Long, Token> range = builder.build();
+
+        Assert.assertNotNull(range);
+        Assert.assertFalse(range.hasNext());
+
+        builder = RangeIntersectionIterator.builder(strategy);
+        // both ranges overlap by min/max but not by value
+        builder.add(new LongIterator(new long[] { 1L, 5L, 7L, 9L }));
+        builder.add(new LongIterator(new long[] { 0L, 10L, 12L }));
+
+        range = builder.build();
+
+        Assert.assertNotNull(range);
+        Assert.assertFalse(range.hasNext());
+    }
+
+    @Test
+    public void testOverlappingValues()
+    {
+        for (Strategy strategy : Strategy.values())
+            testOverlappingValues(strategy);
+    }
+
+    private void testOverlappingValues(Strategy strategy)
+    {
+        RangeIterator.Builder<Long, Token> builder = RangeIntersectionIterator.builder(strategy);
+
+        builder.add(new LongIterator(new long[] { 1L, 4L, 6L, 7L }));
+        builder.add(new LongIterator(new long[] { 2L, 4L, 5L, 6L }));
+        builder.add(new LongIterator(new long[] { 4L, 6L, 8L, 9L, 10L }));
+
+        Assert.assertEquals(convert(4L, 6L), convert(builder.build()));
+    }
+
+    @Test
+    public void testSingleIterator()
+    {
+        for (Strategy strategy : Strategy.values())
+            testSingleIterator(strategy);
+    }
+
+    private void testSingleIterator(Strategy strategy)
+    {
+        RangeIntersectionIterator.Builder<Long, Token> builder = RangeIntersectionIterator.builder(strategy);
+
+        builder.add(new LongIterator(new long[] { 1L, 2L, 4L, 9L }));
+
+        Assert.assertEquals(convert(1L, 2L, 4L, 9L), convert(builder.build()));
+    }
+
+    @Test
+    public void testSkipTo()
+    {
+        for (Strategy strategy : Strategy.values())
+            testSkipTo(strategy);
+    }
+
+    private void testSkipTo(Strategy strategy)
+    {
+        RangeIterator.Builder<Long, Token> builder = RangeIntersectionIterator.builder(strategy);
+
+        builder.add(new LongIterator(new long[] { 1L, 4L, 6L, 7L, 9L, 10L }));
+        builder.add(new LongIterator(new long[] { 2L, 4L, 5L, 6L, 7L, 10L, 12L }));
+        builder.add(new LongIterator(new long[] { 4L, 6L, 7L, 9L, 10L }));
+
+        RangeIterator<Long, Token> range = builder.build();
+        Assert.assertNotNull(range);
+
+        // first let's skipTo something before range
+        Assert.assertEquals(4L, (long) range.skipTo(3L).get());
+        Assert.assertEquals(4L, (long) range.getCurrent());
+
+        // now let's skip right to the second value
+        Assert.assertEquals(6L, (long) range.skipTo(5L).get());
+        Assert.assertEquals(6L, (long) range.getCurrent());
+
+        // now right to the element
+        Assert.assertEquals(7L, (long) range.skipTo(7L).get());
+        Assert.assertEquals(7L, (long) range.getCurrent());
+        Assert.assertEquals(7L, (long) range.next().get());
+
+        Assert.assertTrue(range.hasNext());
+        Assert.assertEquals(10L, (long) range.getCurrent());
+
+        // now right after the last element
+        Assert.assertNull(range.skipTo(11L));
+        Assert.assertFalse(range.hasNext());
+    }
+
+    @Test
+    public void testMinMaxAndCount()
+    {
+        for (Strategy strategy : Strategy.values())
+            testMinMaxAndCount(strategy);
+    }
+
+    private void testMinMaxAndCount(Strategy strategy)
+    {
+        RangeIterator.Builder<Long, Token> builder = RangeIntersectionIterator.builder(strategy);
+
+        builder.add(new LongIterator(new long[]{1L, 2L, 9L}));
+        builder.add(new LongIterator(new long[]{4L, 5L, 9L}));
+        builder.add(new LongIterator(new long[]{7L, 8L, 9L}));
+
+        Assert.assertEquals(9L, (long) builder.getMaximum());
+        Assert.assertEquals(9L, builder.getTokenCount());
+
+        RangeIterator<Long, Token> tokens = builder.build();
+
+        Assert.assertNotNull(tokens);
+        Assert.assertEquals(7L, (long) tokens.getMinimum());
+        Assert.assertEquals(9L, (long) tokens.getMaximum());
+        Assert.assertEquals(9L, tokens.getCount());
+
+        Assert.assertEquals(convert(9L), convert(builder.build()));
+    }
+
+    @Test
+    public void testBuilder()
+    {
+        for (Strategy strategy : Strategy.values())
+            testBuilder(strategy);
+    }
+
+    private void testBuilder(Strategy strategy)
+    {
+        RangeIterator.Builder<Long, Token> builder = RangeIntersectionIterator.builder(strategy);
+
+        Assert.assertNull(builder.getMinimum());
+        Assert.assertNull(builder.getMaximum());
+        Assert.assertEquals(0L, builder.getTokenCount());
+        Assert.assertEquals(0L, builder.rangeCount());
+
+        builder.add(new LongIterator(new long[] { 1L, 2L, 6L }));
+        builder.add(new LongIterator(new long[] { 4L, 5L, 6L }));
+        builder.add(new LongIterator(new long[] { 6L, 8L, 9L }));
+
+        Assert.assertEquals(6L, (long) builder.getMinimum());
+        Assert.assertEquals(6L, (long) builder.getMaximum());
+        Assert.assertEquals(9L, builder.getTokenCount());
+        Assert.assertEquals(3L, builder.rangeCount());
+        Assert.assertFalse(builder.statistics.isDisjoint());
+
+        Assert.assertEquals(1L, (long) builder.ranges.poll().getMinimum());
+        Assert.assertEquals(4L, (long) builder.ranges.poll().getMinimum());
+        Assert.assertEquals(6L, (long) builder.ranges.poll().getMinimum());
+
+        builder.add(new LongIterator(new long[] { 1L, 2L, 6L }));
+        builder.add(new LongIterator(new long[] { 4L, 5L, 6L }));
+        builder.add(new LongIterator(new long[] { 6L, 8L, 9L }));
+
+        Assert.assertEquals(convert(6L), convert(builder.build()));
+
+        builder = RangeIntersectionIterator.builder(strategy);
+        builder.add(new LongIterator(new long[]{ 1L, 5L, 6L }));
+        builder.add(new LongIterator(new long[]{ 3L, 5L, 6L }));
+
+        RangeIterator<Long, Token> tokens = builder.build();
+
+        Assert.assertEquals(convert(5L, 6L), convert(tokens));
+
+        FileUtils.closeQuietly(tokens);
+
+        RangeIterator emptyTokens = RangeIntersectionIterator.builder(strategy).build();
+        Assert.assertNull(emptyTokens);
+
+        builder = RangeIntersectionIterator.builder(strategy);
+        Assert.assertEquals(0L, builder.add((RangeIterator<Long, Token>) null).rangeCount());
+        Assert.assertEquals(0L, builder.add((List<RangeIterator<Long, Token>>) null).getTokenCount());
+        Assert.assertEquals(0L, builder.add(new LongIterator(new long[] {})).rangeCount());
+
+        RangeIterator<Long, Token> single = new LongIterator(new long[] { 1L, 2L, 3L });
+        RangeIterator<Long, Token> range = RangeIntersectionIterator.<Long, Token>builder().add(single).build();
+
+        // build() should return the single iterator itself when only one was added, instead of wrapping it in yet another iterator
+        Assert.assertEquals(range, single);
+
+        // disjoint case
+        builder = RangeIntersectionIterator.builder();
+        builder.add(new LongIterator(new long[] { 1L, 2L, 3L }));
+        builder.add(new LongIterator(new long[] { 4L, 5L, 6L }));
+
+        Assert.assertTrue(builder.statistics.isDisjoint());
+
+        RangeIterator<Long, Token> disjointIntersection = builder.build();
+        Assert.assertNotNull(disjointIntersection);
+        Assert.assertFalse(disjointIntersection.hasNext());
+
+    }
+
+    @Test
+    public void testClose() throws IOException
+    {
+        for (Strategy strategy : Strategy.values())
+            testClose(strategy);
+    }
+
+    private void testClose(Strategy strategy) throws IOException
+    {
+        RangeIterator<Long, Token> tokens = RangeIntersectionIterator.<Long, Token>builder(strategy)
+                                            .add(new LongIterator(new long[] { 1L, 2L, 3L }))
+                                            .build();
+
+        Assert.assertNotNull(tokens);
+        tokens.close();
+    }
+
+    @Test
+    public void testIsOverlapping()
+    {
+        RangeIterator<Long, Token> rangeA, rangeB;
+
+        rangeA = new LongIterator(new long[] { 1L, 5L });
+        rangeB = new LongIterator(new long[] { 5L, 9L });
+        Assert.assertTrue(RangeIterator.isOverlapping(rangeA, rangeB));
+
+        rangeA = new LongIterator(new long[] { 5L, 9L });
+        rangeB = new LongIterator(new long[] { 1L, 6L });
+        Assert.assertTrue(RangeIterator.isOverlapping(rangeA, rangeB));
+
+        rangeA = new LongIterator(new long[] { 5L, 9L });
+        rangeB = new LongIterator(new long[] { 5L, 9L });
+        Assert.assertTrue(RangeIterator.isOverlapping(rangeA, rangeB));
+
+        rangeA = new LongIterator(new long[] { 1L, 4L });
+        rangeB = new LongIterator(new long[] { 5L, 9L });
+        Assert.assertFalse(RangeIterator.isOverlapping(rangeA, rangeB));
+
+        rangeA = new LongIterator(new long[] { 6L, 9L });
+        rangeB = new LongIterator(new long[] { 1L, 4L });
+        Assert.assertFalse(RangeIterator.isOverlapping(rangeA, rangeB));
+    }
+
+    @Test
+    public void testIntersectionOfRandomRanges()
+    {
+        for (Strategy strategy : Strategy.values())
+            testIntersectionOfRandomRanges(strategy);
+    }
+
+    private void testIntersectionOfRandomRanges(Strategy strategy)
+    {
+        for (int attempt = 0; attempt < 16; attempt++)
+        {
+            final ThreadLocalRandom random = ThreadLocalRandom.current();
+            final int maxRanges = random.nextInt(2, 16);
+
+            // generate randomized ranges
+            long[][] ranges = new long[maxRanges][];
+            for (int i = 0; i < ranges.length; i++)
+            {
+                int rangeSize = random.nextInt(16, 512);
+                LongSet range = new LongOpenHashSet(rangeSize);
+
+                for (int j = 0; j < rangeSize; j++)
+                    range.add(random.nextLong(0, 100));
+
+                ranges[i] = range.toArray();
+                Arrays.sort(ranges[i]);
+            }
+
+            List<Long> expected = new ArrayList<>();
+            // determine unique tokens which intersect every range
+            for (long token : ranges[0])
+            {
+                boolean intersectsAll = true;
+                for (int i = 1; i < ranges.length; i++)
+                {
+                    if (Arrays.binarySearch(ranges[i], token) < 0)
+                    {
+                        intersectsAll = false;
+                        break;
+                    }
+                }
+
+                if (intersectsAll)
+                    expected.add(token);
+            }
+
+            RangeIterator.Builder<Long, Token> builder = RangeIntersectionIterator.builder(strategy);
+            for (long[] range : ranges)
+                builder.add(new LongIterator(range));
+
+            Assert.assertEquals(expected, convert(builder.build()));
+        }
+    }
+
+    @Test
+    public void testIteratorPeeking()
+    {
+        RangeIterator.Builder<Long, Token> builder = RangeIntersectionIterator.builder();
+
+        // iterator with only one element
+        builder.add(new LongIterator(new long[] { 10L }));
+
+        // iterator with 150 elements (lookup is going to be advantageous over bounce in this case)
+        long[] tokens = new long[150];
+        for (int i = 0; i < tokens.length; i++)
+            tokens[i] = i;
+
+        builder.add(new LongIterator(tokens));
+
+        RangeIterator<Long, Token> intersection = builder.build();
+
+        Assert.assertNotNull(intersection);
+        Assert.assertEquals(LookupIntersectionIterator.class, intersection.getClass());
+
+        Assert.assertTrue(intersection.hasNext());
+        Assert.assertEquals(convert(10L), convert(intersection));
+
+        builder = RangeIntersectionIterator.builder();
+
+        builder.add(new LongIterator(new long[] { 1L, 3L, 5L, 7L, 9L }));
+        builder.add(new LongIterator(new long[] { 1L, 2L, 5L, 6L }));
+
+        intersection = builder.build();
+
+        // in the situation when there is a similar number of elements inside ranges
+        // ping-pong (bounce) intersection is preferred as it covers gaps quicker than linear scan + lookup.
+        Assert.assertNotNull(intersection);
+        Assert.assertEquals(BounceIntersectionIterator.class, intersection.getClass());
+
+        Assert.assertTrue(intersection.hasNext());
+        Assert.assertEquals(convert(1L, 5L), convert(intersection));
+    }
+}
diff --git a/test/unit/org/apache/cassandra/index/sasi/utils/RangeUnionIteratorTest.java b/test/unit/org/apache/cassandra/index/sasi/utils/RangeUnionIteratorTest.java
new file mode 100644
index 0000000..f69086b
--- /dev/null
+++ b/test/unit/org/apache/cassandra/index/sasi/utils/RangeUnionIteratorTest.java
@@ -0,0 +1,306 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.index.sasi.utils;
+
+import java.util.*;
+import java.util.concurrent.ThreadLocalRandom;
+
+import org.apache.cassandra.index.sasi.disk.Token;
+import org.apache.cassandra.io.util.FileUtils;
+
+import org.junit.Assert;
+import org.junit.Test;
+
+import static org.apache.cassandra.index.sasi.utils.LongIterator.convert;
+
+public class RangeUnionIteratorTest
+{
+    @Test
+    public void testNoOverlappingValues()
+    {
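+        // the union of disjoint ranges is simply their sorted concatenation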
+        RangeUnionIterator.Builder<Long, Token> builder = RangeUnionIterator.builder();
+
+        builder.add(new LongIterator(new long[] { 2L, 3L, 5L, 6L }));
+        builder.add(new LongIterator(new long[] { 1L, 7L }));
+        builder.add(new LongIterator(new long[] { 4L, 8L, 9L, 10L }));
+
+        Assert.assertEquals(convert(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), convert(builder.build()));
+    }
+
+    @Test
+    public void testSingleIterator()
+    {
+        RangeUnionIterator.Builder<Long, Token> builder = RangeUnionIterator.builder();
+
+        builder.add(new LongIterator(new long[] { 1L, 2L, 4L, 9L }));
+
+        Assert.assertEquals(convert(1L, 2L, 4L, 9L), convert(builder.build()));
+    }
+
+    @Test
+    public void testOverlappingValues()
+    {
+        RangeUnionIterator.Builder<Long, Token> builder = RangeUnionIterator.builder();
+
+        builder.add(new LongIterator(new long[] { 1L, 4L, 6L, 7L }));
+        builder.add(new LongIterator(new long[] { 2L, 3L, 5L, 6L }));
+        builder.add(new LongIterator(new long[] { 4L, 6L, 8L, 9L, 10L }));
+
+        List<Long> values = convert(builder.build());
+
+        Assert.assertEquals(values.toString(), convert(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L), values);
+    }
+
+    @Test
+    public void testNoOverlappingRanges()
+    {
+        RangeUnionIterator.Builder<Long, Token> builder = RangeUnionIterator.builder();
+
+        builder.add(new LongIterator(new long[] { 1L, 2L, 3L }));
+        builder.add(new LongIterator(new long[] { 4L, 5L, 6L }));
+        builder.add(new LongIterator(new long[] { 7L, 8L, 9L }));
+
+        Assert.assertEquals(convert(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L), convert(builder.build()));
+    }
+
+    @Test
+    public void testTwoIteratorsWithSingleValues()
+    {
+        RangeUnionIterator.Builder<Long, Token> builder = RangeUnionIterator.builder();
+
+        builder.add(new LongIterator(new long[] { 1L }));
+        builder.add(new LongIterator(new long[] { 1L }));
+
+        Assert.assertEquals(convert(1L), convert(builder.build()));
+    }
+
+    @Test
+    public void testDifferentSizeIterators()
+    {
+        RangeUnionIterator.Builder<Long, Token> builder = RangeUnionIterator.builder();
+
+        builder.add(new LongIterator(new long[] { 2L, 3L, 5L, 6L, 12L, 13L }));
+        builder.add(new LongIterator(new long[] { 1L, 7L, 14L, 15 }));
+        builder.add(new LongIterator(new long[] { 4L, 5L, 8L, 9L, 10L }));
+
+        Assert.assertEquals(convert(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 12L, 13L, 14L, 15L), convert(builder.build()));
+    }
+
+    @Test
+    public void testRandomSequences()
+    {
+        ThreadLocalRandom random = ThreadLocalRandom.current();
+
+        long[][] values = new long[random.nextInt(1, 20)][];
+        int numTests = random.nextInt(10, 20);
+
+        for (int tests = 0; tests < numTests; tests++)
+        {
+            RangeUnionIterator.Builder<Long, Token> builder = RangeUnionIterator.builder();
+            int totalCount = 0;
+
+            for (int i = 0; i < values.length; i++)
+            {
+                long[] part = new long[random.nextInt(1, 500)];
+                for (int j = 0; j < part.length; j++)
+                    part[j] = random.nextLong();
+
+                // all of the parts have to be sorted to mimic SSTable
+                Arrays.sort(part);
+
+                values[i] = part;
+                builder.add(new LongIterator(part));
+                totalCount += part.length;
+            }
+
+            long[] totalOrdering = new long[totalCount];
+            int index = 0;
+
+            for (long[] part : values)
+            {
+                for (long value : part)
+                    totalOrdering[index++] = value;
+            }
+
+            Arrays.sort(totalOrdering);
+
+            int count = 0;
+            RangeIterator<Long, Token> tokens = builder.build();
+
+            Assert.assertNotNull(tokens);
+            while (tokens.hasNext())
+                Assert.assertEquals(totalOrdering[count++], (long) tokens.next().get());
+
+            Assert.assertEquals(totalCount, count);
+        }
+    }
+
+    @Test
+    public void testMinMaxAndCount()
+    {
+        RangeUnionIterator.Builder<Long, Token> builder = RangeUnionIterator.builder();
+
+        builder.add(new LongIterator(new long[] { 1L, 2L, 3L }));
+        builder.add(new LongIterator(new long[] { 4L, 5L, 6L }));
+        builder.add(new LongIterator(new long[] { 7L, 8L, 9L }));
+
+        Assert.assertEquals(9L, (long) builder.getMaximum());
+        Assert.assertEquals(9L, builder.getTokenCount());
+
+        RangeIterator<Long, Token> tokens = builder.build();
+
+        Assert.assertNotNull(tokens);
+        Assert.assertEquals(1L, (long) tokens.getMinimum());
+        Assert.assertEquals(9L, (long) tokens.getMaximum());
+        Assert.assertEquals(9L, tokens.getCount());
+
+        for (long i = 1; i < 10; i++)
+        {
+            Assert.assertTrue(tokens.hasNext());
+            Assert.assertEquals(i, (long) tokens.next().get());
+        }
+
+        Assert.assertFalse(tokens.hasNext());
+        Assert.assertEquals(1L, (long) tokens.getMinimum());
+    }
+
+    @Test
+    public void testBuilder()
+    {
+        RangeUnionIterator.Builder<Long, Token> builder = RangeUnionIterator.builder();
+
+        Assert.assertNull(builder.getMinimum());
+        Assert.assertNull(builder.getMaximum());
+        Assert.assertEquals(0L, builder.getTokenCount());
+        Assert.assertEquals(0L, builder.rangeCount());
+
+        builder.add(new LongIterator(new long[] { 1L, 2L, 3L }));
+        builder.add(new LongIterator(new long[] { 4L, 5L, 6L }));
+        builder.add(new LongIterator(new long[] { 7L, 8L, 9L }));
+
+        Assert.assertEquals(1L, (long) builder.getMinimum());
+        Assert.assertEquals(9L, (long) builder.getMaximum());
+        Assert.assertEquals(9L, builder.getTokenCount());
+        Assert.assertEquals(3L, builder.rangeCount());
+        Assert.assertFalse(builder.statistics.isDisjoint());
+
+        Assert.assertEquals(1L, (long) builder.ranges.poll().getMinimum());
+        Assert.assertEquals(4L, (long) builder.ranges.poll().getMinimum());
+        Assert.assertEquals(7L, (long) builder.ranges.poll().getMinimum());
+
+        RangeIterator<Long, Token> tokens = RangeUnionIterator.build(new ArrayList<RangeIterator<Long, Token>>()
+        {{
+            add(new LongIterator(new long[]{1L, 2L, 4L}));
+            add(new LongIterator(new long[]{3L, 5L, 6L}));
+        }});
+
+        Assert.assertEquals(convert(1L, 2L, 3L, 4L, 5L, 6L), convert(tokens));
+
+        FileUtils.closeQuietly(tokens);
+
+        RangeIterator emptyTokens = RangeUnionIterator.builder().build();
+        Assert.assertNull(emptyTokens);
+
+        builder = RangeUnionIterator.builder();
+        Assert.assertEquals(0L, builder.add((RangeIterator<Long, Token>) null).rangeCount());
+        Assert.assertEquals(0L, builder.add((List<RangeIterator<Long, Token>>) null).getTokenCount());
+        Assert.assertEquals(0L, builder.add(new LongIterator(new long[] {})).rangeCount());
+
+        RangeIterator<Long, Token> single = new LongIterator(new long[] { 1L, 2L, 3L });
+        RangeIterator<Long, Token> range = RangeIntersectionIterator.<Long, Token>builder().add(single).build();
+
+        // build() should return the single iterator itself when only one was added, instead of wrapping it in yet another iterator
+        Assert.assertEquals(range, single);
+    }
+
+    @Test
+    public void testSkipTo()
+    {
+        RangeUnionIterator.Builder<Long, Token> builder = RangeUnionIterator.builder();
+
+        builder.add(new LongIterator(new long[]{1L, 2L, 3L}));
+        builder.add(new LongIterator(new long[]{4L, 5L, 6L}));
+        builder.add(new LongIterator(new long[]{7L, 8L, 9L}));
+
+        RangeIterator<Long, Token> tokens = builder.build();
+        Assert.assertNotNull(tokens);
+
+        tokens.skipTo(5L);
+        Assert.assertTrue(tokens.hasNext());
+        Assert.assertEquals(5L, (long) tokens.next().get());
+
+        tokens.skipTo(7L);
+        Assert.assertTrue(tokens.hasNext());
+        Assert.assertEquals(7L, (long) tokens.next().get());
+
+        tokens.skipTo(10L);
+        Assert.assertFalse(tokens.hasNext());
+        Assert.assertEquals(1L, (long) tokens.getMinimum());
+        Assert.assertEquals(9L, (long) tokens.getMaximum());
+    }
+
+    @Test
+    public void testMergingMultipleIterators()
+    {
+        RangeUnionIterator.Builder<Long, Token> builderA = RangeUnionIterator.builder();
+
+        builderA.add(new LongIterator(new long[] { 1L, 3L, 5L }));
+        builderA.add(new LongIterator(new long[] { 8L, 10L, 12L }));
+
+        RangeUnionIterator.Builder<Long, Token> builderB = RangeUnionIterator.builder();
+
+        builderB.add(new LongIterator(new long[] { 7L, 9L, 11L }));
+        builderB.add(new LongIterator(new long[] { 2L, 4L, 6L }));
+
+        RangeIterator<Long, Token> union = RangeUnionIterator.build(Arrays.asList(builderA.build(), builderB.build()));
+        Assert.assertEquals(convert(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L), convert(union));
+    }
+
+    @Test
+    public void testRangeIterator()
+    {
+        LongIterator tokens = new LongIterator(new long[] { 0L, 1L, 2L, 3L });
+
+        Assert.assertEquals(0L, (long) tokens.getMinimum());
+        Assert.assertEquals(3L, (long) tokens.getMaximum());
+
+        for (int i = 0; i <= 3; i++)
+        {
+            Assert.assertTrue(tokens.hasNext());
+            Assert.assertEquals(i, (long) tokens.getCurrent());
+            Assert.assertEquals(i, (long) tokens.next().get());
+        }
+
+        tokens = new LongIterator(new long[] { 0L, 1L, 3L, 5L });
+
+        Assert.assertEquals(3L, (long) tokens.skipTo(2L).get());
+        Assert.assertTrue(tokens.hasNext());
+        Assert.assertEquals(3L, (long) tokens.getCurrent());
+        Assert.assertEquals(3L, (long) tokens.next().get());
+
+        Assert.assertEquals(5L, (long) tokens.skipTo(5L).get());
+        Assert.assertTrue(tokens.hasNext());
+        Assert.assertEquals(5L, (long) tokens.getCurrent());
+        Assert.assertEquals(5L, (long) tokens.next().get());
+
+        LongIterator empty = new LongIterator(new long[0]);
+
+        Assert.assertNull(empty.skipTo(3L));
+        Assert.assertFalse(empty.hasNext());
+    }
+}
\ No newline at end of file
diff --git a/test/unit/org/apache/cassandra/io/compress/CQLCompressionTest.java b/test/unit/org/apache/cassandra/io/compress/CQLCompressionTest.java
new file mode 100644
index 0000000..a2aff2f
--- /dev/null
+++ b/test/unit/org/apache/cassandra/io/compress/CQLCompressionTest.java
@@ -0,0 +1,56 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.io.compress;
+
+import org.junit.Test;
+
+import org.apache.cassandra.cql3.CQLTester;
+import org.apache.cassandra.exceptions.ConfigurationException;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
+
+public class CQLCompressionTest extends CQLTester
+{
+    @Test
+    public void lz4ParamsTest()
+    {
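+        // without an explicit lz4_compressor_type the default fast compressor is kept, even if a high-compressor level is supplied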
+        createTable("create table %s (id int primary key, uh text) with compression = {'class':'LZ4Compressor', 'lz4_high_compressor_level':3}");
+        assertTrue(((LZ4Compressor)getCurrentColumnFamilyStore().metadata.params.compression.getSstableCompressor()).compressorType.equals(LZ4Compressor.LZ4_FAST_COMPRESSOR));
+        createTable("create table %s (id int primary key, uh text) with compression = {'class':'LZ4Compressor', 'lz4_compressor_type':'high', 'lz4_high_compressor_level':13}");
+        assertEquals(((LZ4Compressor)getCurrentColumnFamilyStore().metadata.params.compression.getSstableCompressor()).compressorType, LZ4Compressor.LZ4_HIGH_COMPRESSOR);
+        assertEquals(((LZ4Compressor)getCurrentColumnFamilyStore().metadata.params.compression.getSstableCompressor()).compressionLevel, (Integer)13);
+        createTable("create table %s (id int primary key, uh text) with compression = {'class':'LZ4Compressor'}");
+        assertEquals(((LZ4Compressor)getCurrentColumnFamilyStore().metadata.params.compression.getSstableCompressor()).compressorType, LZ4Compressor.LZ4_FAST_COMPRESSOR);
+        assertEquals(((LZ4Compressor)getCurrentColumnFamilyStore().metadata.params.compression.getSstableCompressor()).compressionLevel, (Integer)9);
+    }
+
+    @Test(expected = ConfigurationException.class)
+    public void lz4BadParamsTest() throws Throwable
+    {
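+        // an out-of-range lz4_high_compressor_level must be rejected; createTable surfaces the ConfigurationException wrapped in a RuntimeException, so unwrap it for the expected-exception check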
+        try
+        {
+            createTable("create table %s (id int primary key, uh text) with compression = {'class':'LZ4Compressor', 'lz4_compressor_type':'high', 'lz4_high_compressor_level':113}");
+        }
+        catch (RuntimeException e)
+        {
+            throw e.getCause();
+        }
+    }
+}
diff --git a/test/unit/org/apache/cassandra/io/compress/CompressedRandomAccessReaderTest.java b/test/unit/org/apache/cassandra/io/compress/CompressedRandomAccessReaderTest.java
index 802d9c8..309083b 100644
--- a/test/unit/org/apache/cassandra/io/compress/CompressedRandomAccessReaderTest.java
+++ b/test/unit/org/apache/cassandra/io/compress/CompressedRandomAccessReaderTest.java
@@ -29,11 +29,7 @@
 import org.apache.cassandra.exceptions.ConfigurationException;
 import org.apache.cassandra.io.sstable.CorruptSSTableException;
 import org.apache.cassandra.io.sstable.metadata.MetadataCollector;
-import org.apache.cassandra.io.util.ChannelProxy;
-import org.apache.cassandra.io.util.DataPosition;
-import org.apache.cassandra.io.util.MmappedRegions;
-import org.apache.cassandra.io.util.RandomAccessReader;
-import org.apache.cassandra.io.util.SequentialWriter;
+import org.apache.cassandra.io.util.*;
 import org.apache.cassandra.schema.CompressionParams;
 import org.apache.cassandra.utils.ChecksumType;
 import org.apache.cassandra.utils.SyncUtil;
@@ -78,7 +74,10 @@
         {
 
             MetadataCollector sstableMetadataCollector = new MetadataCollector(new ClusteringComparator(BytesType.instance));
-            try(CompressedSequentialWriter writer = new CompressedSequentialWriter(f, filename + ".metadata", CompressionParams.snappy(32), sstableMetadataCollector))
+            try(CompressedSequentialWriter writer = new CompressedSequentialWriter(f, filename + ".metadata",
+                                                                                   null, SequentialWriterOption.DEFAULT,
+                                                                                   CompressionParams.snappy(32),
+                                                                                   sstableMetadataCollector))
             {
 
                 for (int i = 0; i < 20; i++)
@@ -96,8 +95,8 @@
                 writer.finish();
             }
 
-            try(RandomAccessReader reader = new CompressedRandomAccessReader.Builder(channel,
-                                                                                     new CompressionMetadata(filename + ".metadata", f.length(), ChecksumType.CRC32))
+            try(RandomAccessReader reader = RandomAccessReader.builder(channel)
+                                            .compression(new CompressionMetadata(filename + ".metadata", f.length(), ChecksumType.CRC32))
                                             .build())
             {
                 String res = reader.readLine();
@@ -122,8 +121,10 @@
         {
             MetadataCollector sstableMetadataCollector = new MetadataCollector(new ClusteringComparator(BytesType.instance));
             try(SequentialWriter writer = compressed
-                ? new CompressedSequentialWriter(f, filename + ".metadata", CompressionParams.snappy(), sstableMetadataCollector)
-                : SequentialWriter.open(f))
+                ? new CompressedSequentialWriter(f, filename + ".metadata",
+                                                 null, SequentialWriterOption.DEFAULT,
+                                                 CompressionParams.snappy(), sstableMetadataCollector)
+                : new SequentialWriter(f))
             {
                 writer.write("The quick ".getBytes());
                 DataPosition mark = writer.mark();
@@ -142,16 +143,17 @@
             assert f.exists();
 
             CompressionMetadata compressionMetadata = compressed ? new CompressionMetadata(filename + ".metadata", f.length(), ChecksumType.CRC32) : null;
-            RandomAccessReader.Builder builder = compressed
-                                                 ? new CompressedRandomAccessReader.Builder(channel, compressionMetadata)
-                                                 : new RandomAccessReader.Builder(channel);
+            RandomAccessReader.Builder builder = RandomAccessReader.builder(channel);
+            if (compressed)
+                builder.compression(compressionMetadata);
 
+            MmappedRegions regions = null;
             if (usemmap)
             {
-                if (compressed)
-                    builder.regions(MmappedRegions.map(channel, compressionMetadata));
-                else
-                    builder.regions(MmappedRegions.map(channel, f.length()));
+                regions = compressed
+                        ? MmappedRegions.map(channel, compressionMetadata)
+                        : MmappedRegions.map(channel, f.length());
+                builder.regions(regions);
             }
 
             try(RandomAccessReader reader = builder.build())
@@ -163,8 +165,8 @@
                 assert new String(b).equals(expected) : "Expecting '" + expected + "', got '" + new String(b) + '\'';
             }
 
-            if (usemmap)
-                builder.regions.close();
+            if (regions != null)
+                regions.close();
         }
         finally
         {
@@ -191,7 +193,9 @@
         assertTrue(metadata.createNewFile());
 
         MetadataCollector sstableMetadataCollector = new MetadataCollector(new ClusteringComparator(BytesType.instance));
-        try (SequentialWriter writer = new CompressedSequentialWriter(file, metadata.getPath(), CompressionParams.snappy(), sstableMetadataCollector))
+        try (SequentialWriter writer = new CompressedSequentialWriter(file, metadata.getPath(),
+                                                                      null, SequentialWriterOption.DEFAULT,
+                                                                      CompressionParams.snappy(), sstableMetadataCollector))
         {
             writer.write(CONTENT.getBytes());
             writer.finish();
@@ -203,7 +207,7 @@
             CompressionMetadata meta = new CompressionMetadata(metadata.getPath(), file.length(), ChecksumType.CRC32);
             CompressionMetadata.Chunk chunk = meta.chunkFor(0);
 
-            try(RandomAccessReader reader = new CompressedRandomAccessReader.Builder(channel, meta).build())
+            try(RandomAccessReader reader = RandomAccessReader.builder(channel).compression(meta).build())
             {// read and verify compressed data
                 assertEquals(CONTENT, reader.readLine());
 
@@ -228,7 +232,7 @@
                         checksumModifier.write(random.nextInt());
                         SyncUtil.sync(checksumModifier); // making sure that change was synced with disk
 
-                        try (final RandomAccessReader r = new CompressedRandomAccessReader.Builder(channel, meta).build())
+                        try (final RandomAccessReader r = RandomAccessReader.builder(channel).compression(meta).build())
                         {
                             Throwable exception = null;
                             try
@@ -248,7 +252,7 @@
                     // lets write original checksum and check if we can read data
                     updateChecksum(checksumModifier, chunk.length, checksum);
 
-                    try (RandomAccessReader cr = new CompressedRandomAccessReader.Builder(channel, meta).build())
+                    try (RandomAccessReader cr = RandomAccessReader.builder(channel).compression(meta).build())
                     {
                         // read and verify compressed data
                         assertEquals(CONTENT, cr.readLine());
diff --git a/test/unit/org/apache/cassandra/io/compress/CompressedSequentialWriterTest.java b/test/unit/org/apache/cassandra/io/compress/CompressedSequentialWriterTest.java
index e045aad..9959c7b 100644
--- a/test/unit/org/apache/cassandra/io/compress/CompressedSequentialWriterTest.java
+++ b/test/unit/org/apache/cassandra/io/compress/CompressedSequentialWriterTest.java
@@ -27,6 +27,7 @@
 import static org.apache.commons.io.FileUtils.readFileToByteArray;
 import static org.junit.Assert.assertEquals;
 
+import com.google.common.io.Files;
 import org.junit.After;
 import org.junit.Test;
 
@@ -37,10 +38,7 @@
 import org.apache.cassandra.db.marshal.BytesType;
 import org.apache.cassandra.db.marshal.UTF8Type;
 import org.apache.cassandra.io.sstable.metadata.MetadataCollector;
-import org.apache.cassandra.io.util.ChannelProxy;
-import org.apache.cassandra.io.util.DataPosition;
-import org.apache.cassandra.io.util.RandomAccessReader;
-import org.apache.cassandra.io.util.SequentialWriterTest;
+import org.apache.cassandra.io.util.*;
 import org.apache.cassandra.schema.CompressionParams;
 import org.apache.cassandra.utils.ChecksumType;
 
@@ -92,7 +90,10 @@
 
             byte[] dataPre = new byte[bytesToTest];
             byte[] rawPost = new byte[bytesToTest];
-            try (CompressedSequentialWriter writer = new CompressedSequentialWriter(f, filename + ".metadata", compressionParameters, sstableMetadataCollector);)
+            try (CompressedSequentialWriter writer = new CompressedSequentialWriter(f, filename + ".metadata",
+                                                                                    null, SequentialWriterOption.DEFAULT,
+                                                                                    compressionParameters,
+                                                                                    sstableMetadataCollector))
             {
                 Random r = new Random(42);
 
@@ -117,7 +118,7 @@
             }
 
             assert f.exists();
-            RandomAccessReader reader = new CompressedRandomAccessReader.Builder(channel, new CompressionMetadata(filename + ".metadata", f.length(), ChecksumType.CRC32)).build();
+            RandomAccessReader reader = RandomAccessReader.builder(channel).compression(new CompressionMetadata(filename + ".metadata", f.length(), ChecksumType.CRC32)).build();
             assertEquals(dataPre.length + rawPost.length, reader.length());
             byte[] result = new byte[(int)reader.length()];
 
@@ -159,6 +160,49 @@
         writers.clear();
     }
 
+    @Test
+    @Override
+    public void resetAndTruncateTest()
+    {
+        File tempFile = new File(Files.createTempDir(), "reset.txt");
+        File offsetsFile = FileUtils.createTempFile("compressedsequentialwriter.offset", "test");
+        final int bufferSize = 48;
+        final int writeSize = 64;
+        byte[] toWrite = new byte[writeSize];
+        try (SequentialWriter writer = new CompressedSequentialWriter(tempFile, offsetsFile.getPath(),
+                                                                      null, SequentialWriterOption.DEFAULT,
+                                                                      CompressionParams.lz4(bufferSize),
+                                                                      new MetadataCollector(new ClusteringComparator(UTF8Type.instance))))
+        {
+            // write more bytes than the buffer size
+            writer.write(toWrite);
+            long flushedOffset = writer.getLastFlushOffset();
+            assertEquals(writeSize, writer.position());
+            // mark this position
+            DataPosition pos = writer.mark();
+            // write another
+            writer.write(toWrite);
+            // another buffer should be flushed
+            assertEquals(flushedOffset * 2, writer.getLastFlushOffset());
+            assertEquals(writeSize * 2, writer.position());
+            // reset writer
+            writer.resetAndTruncate(pos);
+            // current position and flushed offset should be reverted to the mark
+            assertEquals(writeSize, writer.position());
+            assertEquals(flushedOffset, writer.getLastFlushOffset());
+            // write one more byte, still less than the buffer size
+            writer.write(new byte[]{0});
+            assertEquals(writeSize + 1, writer.position());
+            // flush offset should not increase
+            assertEquals(flushedOffset, writer.getLastFlushOffset());
+            writer.finish();
+        }
+        catch (IOException e)
+        {
+            Assert.fail();
+        }
+    }
+
     protected TestableTransaction newTest() throws IOException
     {
         TestableCSW sw = new TestableCSW();
@@ -178,10 +222,11 @@
 
         private TestableCSW(File file, File offsetsFile) throws IOException
         {
-            this(file, offsetsFile, new CompressedSequentialWriter(file,
-                                                                   offsetsFile.getPath(),
+            this(file, offsetsFile, new CompressedSequentialWriter(file, offsetsFile.getPath(),
+                                                                   null, SequentialWriterOption.DEFAULT,
                                                                    CompressionParams.lz4(BUFFER_SIZE),
                                                                    new MetadataCollector(new ClusteringComparator(UTF8Type.instance))));
+
         }
 
         private TestableCSW(File file, File offsetsFile, CompressedSequentialWriter sw) throws IOException
@@ -196,7 +241,7 @@
             Assert.assertFalse(offsetsFile.exists());
             byte[] compressed = readFileToByteArray(file);
             byte[] uncompressed = new byte[partialContents.length];
-            LZ4Compressor.instance.uncompress(compressed, 0, compressed.length - 4, uncompressed, 0);
+            LZ4Compressor.create(Collections.<String, String>emptyMap()).uncompress(compressed, 0, compressed.length - 4, uncompressed, 0);
             Assert.assertTrue(Arrays.equals(partialContents, uncompressed));
         }
 
@@ -214,8 +259,8 @@
             int offset = (int) offsets.readLong();
             byte[] compressed = readFileToByteArray(file);
             byte[] uncompressed = new byte[fullContents.length];
-            LZ4Compressor.instance.uncompress(compressed, 0, offset - 4, uncompressed, 0);
-            LZ4Compressor.instance.uncompress(compressed, offset, compressed.length - (4 + offset), uncompressed, partialContents.length);
+            LZ4Compressor.create(Collections.<String, String>emptyMap()).uncompress(compressed, 0, offset - 4, uncompressed, 0);
+            LZ4Compressor.create(Collections.<String, String>emptyMap()).uncompress(compressed, offset, compressed.length - (4 + offset), uncompressed, partialContents.length);
             Assert.assertTrue(Arrays.equals(fullContents, uncompressed));
         }
 
diff --git a/test/unit/org/apache/cassandra/io/sstable/CQLSSTableWriterClientTest.java b/test/unit/org/apache/cassandra/io/sstable/CQLSSTableWriterClientTest.java
index d38276f..6df2d65 100644
--- a/test/unit/org/apache/cassandra/io/sstable/CQLSSTableWriterClientTest.java
+++ b/test/unit/org/apache/cassandra/io/sstable/CQLSSTableWriterClientTest.java
@@ -28,14 +28,10 @@
 import org.junit.Test;
 
 import org.apache.cassandra.config.Config;
-import org.apache.cassandra.config.DatabaseDescriptor;
-import org.apache.cassandra.db.Directories;
-import org.apache.cassandra.db.Keyspace;
 import org.apache.cassandra.exceptions.InvalidRequestException;
 import org.apache.cassandra.io.util.FileUtils;
 
 import static org.junit.Assert.assertEquals;
-import static org.junit.Assert.assertTrue;
 
 public class CQLSSTableWriterClientTest
 {
diff --git a/test/unit/org/apache/cassandra/io/sstable/CQLSSTableWriterTest.java b/test/unit/org/apache/cassandra/io/sstable/CQLSSTableWriterTest.java
index 557beba..caa92f6 100644
--- a/test/unit/org/apache/cassandra/io/sstable/CQLSSTableWriterTest.java
+++ b/test/unit/org/apache/cassandra/io/sstable/CQLSSTableWriterTest.java
@@ -20,10 +20,10 @@
 import java.io.File;
 import java.io.FilenameFilter;
 import java.nio.ByteBuffer;
-import java.util.Arrays;
-import java.util.Iterator;
-import java.util.UUID;
+import java.util.*;
+import java.util.concurrent.ExecutionException;
 
+import com.google.common.collect.ImmutableList;
 import com.google.common.collect.ImmutableMap;
 import com.google.common.io.Files;
 
@@ -33,18 +33,22 @@
 
 import org.apache.cassandra.SchemaLoader;
 import org.apache.cassandra.Util;
-import org.apache.cassandra.config.CFMetaData;
-import org.apache.cassandra.config.Config;
-import org.apache.cassandra.config.Schema;
-import org.apache.cassandra.cql3.QueryProcessor;
-import org.apache.cassandra.cql3.UntypedResultSet;
+import org.apache.cassandra.config.*;
+import org.apache.cassandra.cql3.*;
+import org.apache.cassandra.cql3.functions.UDHelper;
 import org.apache.cassandra.db.Keyspace;
 import org.apache.cassandra.dht.*;
+import org.apache.cassandra.exceptions.*;
 import org.apache.cassandra.service.StorageService;
-import org.apache.cassandra.utils.FBUtilities;
-import org.apache.cassandra.utils.OutputHandler;
+import org.apache.cassandra.utils.*;
+import com.datastax.driver.core.DataType;
+import com.datastax.driver.core.ProtocolVersion;
+import com.datastax.driver.core.TypeCodec;
+import com.datastax.driver.core.UDTValue;
+import com.datastax.driver.core.UserType;
 
 import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.fail;
 
 public class CQLSSTableWriterTest
 {
@@ -92,24 +96,7 @@
 
             writer.close();
 
-            SSTableLoader loader = new SSTableLoader(dataDir, new SSTableLoader.Client()
-            {
-                private String keyspace;
-
-                public void init(String keyspace)
-                {
-                    this.keyspace = keyspace;
-                    for (Range<Token> range : StorageService.instance.getLocalRanges("cql_keyspace"))
-                        addRangeForEndpoint(range, FBUtilities.getBroadcastAddress());
-                }
-
-                public CFMetaData getTableMetadata(String cfName)
-                {
-                    return Schema.instance.getCFMetaData(keyspace, cfName);
-                }
-            }, new OutputHandler.SystemOutput(false, false));
-
-            loader.stream().get();
+            loadSSTables(dataDir, KS);
 
             UntypedResultSet rs = QueryProcessor.executeInternal("SELECT * FROM cql_keyspace.table1;");
             assertEquals(4, rs.size());
@@ -300,6 +287,254 @@
             }
         }
 
+        loadSSTables(dataDir, KS);
+
+        UntypedResultSet rs = QueryProcessor.executeInternal("SELECT * FROM cql_keyspace2.table2;");
+        assertEquals(threads.length * NUMBER_WRITES_IN_RUNNABLE, rs.size());
+    }
+
+    @Test
+    @SuppressWarnings("unchecked")
+    public void testWritesWithUdts() throws Exception
+    {
+        final String KS = "cql_keyspace3";
+        final String TABLE = "table3";
+
+        final String schema = "CREATE TABLE " + KS + "." + TABLE + " ("
+                              + "  k int,"
+                              + "  v1 list<frozen<tuple2>>,"
+                              + "  v2 frozen<tuple3>,"
+                              + "  PRIMARY KEY (k)"
+                              + ")";
+
+        File tempdir = Files.createTempDir();
+        File dataDir = new File(tempdir.getAbsolutePath() + File.separator + KS + File.separator + TABLE);
+        assert dataDir.mkdirs();
+
+        CQLSSTableWriter writer = CQLSSTableWriter.builder()
+                                                  .inDirectory(dataDir)
+                                                  .withType("CREATE TYPE " + KS + ".tuple2 (a int, b int)")
+                                                  .withType("CREATE TYPE " + KS + ".tuple3 (a int, b int, c int)")
+                                                  .forTable(schema)
+                                                  .using("INSERT INTO " + KS + "." + TABLE + " (k, v1, v2) " +
+                                                         "VALUES (?, ?, ?)").build();
+
+        UserType tuple2Type = writer.getUDType("tuple2");
+        UserType tuple3Type = writer.getUDType("tuple3");
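+        // each row carries a frozen list of two tuple2 values in v1 and a single tuple3 value in v2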
+        for (int i = 0; i < 100; i++)
+        {
+            writer.addRow(i,
+                          ImmutableList.builder()
+                                       .add(tuple2Type.newValue()
+                                                      .setInt("a", i * 10)
+                                                      .setInt("b", i * 20))
+                                       .add(tuple2Type.newValue()
+                                                      .setInt("a", i * 30)
+                                                      .setInt("b", i * 40))
+                                       .build(),
+                          tuple3Type.newValue()
+                                    .setInt("a", i * 100)
+                                    .setInt("b", i * 200)
+                                    .setInt("c", i * 300));
+        }
+
+        writer.close();
+        loadSSTables(dataDir, KS);
+
+        UntypedResultSet resultSet = QueryProcessor.executeInternal("SELECT * FROM " + KS + "." + TABLE);
+        TypeCodec collectionCodec = UDHelper.codecFor(DataType.CollectionType.frozenList(tuple2Type));
+        TypeCodec tuple3Codec = UDHelper.codecFor(tuple3Type);
+
+        assertEquals(resultSet.size(), 100);
+        int cnt = 0;
+        for (UntypedResultSet.Row row: resultSet) {
+            assertEquals(cnt,
+                         row.getInt("k"));
+            List<UDTValue> values = (List<UDTValue>) collectionCodec.deserialize(row.getBytes("v1"),
+                                                                                 ProtocolVersion.NEWEST_SUPPORTED);
+            assertEquals(values.get(0).getInt("a"), cnt * 10);
+            assertEquals(values.get(0).getInt("b"), cnt * 20);
+            assertEquals(values.get(1).getInt("a"), cnt * 30);
+            assertEquals(values.get(1).getInt("b"), cnt * 40);
+
+            UDTValue v2 = (UDTValue) tuple3Codec.deserialize(row.getBytes("v2"), ProtocolVersion.NEWEST_SUPPORTED);
+
+            assertEquals(v2.getInt("a"), cnt * 100);
+            assertEquals(v2.getInt("b"), cnt * 200);
+            assertEquals(v2.getInt("c"), cnt * 300);
+            cnt++;
+        }
+    }
+
+    @Test
+    @SuppressWarnings("unchecked")
+    public void testWritesWithDependentUdts() throws Exception
+    {
+        final String KS = "cql_keyspace4";
+        final String TABLE = "table4";
+
+        final String schema = "CREATE TABLE " + KS + "." + TABLE + " ("
+                              + "  k int,"
+                              + "  v1 frozen<nested_tuple>,"
+                              + "  PRIMARY KEY (k)"
+                              + ")";
+
+        File tempdir = Files.createTempDir();
+        File dataDir = new File(tempdir.getAbsolutePath() + File.separator + KS + File.separator + TABLE);
+        assert dataDir.mkdirs();
+
+        CQLSSTableWriter writer = CQLSSTableWriter.builder()
+                                                  .inDirectory(dataDir)
+                                                  .withType("CREATE TYPE " + KS + ".nested_tuple (c int, tpl frozen<tuple2>)")
+                                                  .withType("CREATE TYPE " + KS + ".tuple2 (a int, b int)")
+                                                  .forTable(schema)
+                                                  .using("INSERT INTO " + KS + "." + TABLE + " (k, v1) " +
+                                                         "VALUES (?, ?)")
+                                                  .build();
+
+        UserType tuple2Type = writer.getUDType("tuple2");
+        UserType nestedTuple = writer.getUDType("nested_tuple");
+        TypeCodec tuple2Codec = UDHelper.codecFor(tuple2Type);
+        TypeCodec nestedTupleCodec = UDHelper.codecFor(nestedTuple);
+
+        for (int i = 0; i < 100; i++)
+        {
+            writer.addRow(i,
+                          nestedTuple.newValue()
+                                     .setInt("c", i * 100)
+                                     .set("tpl",
+                                          tuple2Type.newValue()
+                                                    .setInt("a", i * 200)
+                                                    .setInt("b", i * 300),
+                                          tuple2Codec));
+        }
+
+        writer.close();
+        loadSSTables(dataDir, KS);
+
+        UntypedResultSet resultSet = QueryProcessor.executeInternal("SELECT * FROM " + KS + "." + TABLE);
+
+        assertEquals(resultSet.size(), 100);
+        int cnt = 0;
+        for (UntypedResultSet.Row row: resultSet) {
+            assertEquals(cnt,
+                         row.getInt("k"));
+            UDTValue nestedTpl = (UDTValue) nestedTupleCodec.deserialize(row.getBytes("v1"),
+                                                                         ProtocolVersion.NEWEST_SUPPORTED);
+            assertEquals(nestedTpl.getInt("c"), cnt * 100);
+            UDTValue tpl = nestedTpl.getUDTValue("tpl");
+            assertEquals(tpl.getInt("a"), cnt * 200);
+            assertEquals(tpl.getInt("b"), cnt * 300);
+
+            cnt++;
+        }
+    }
+
+    @Test
+    public void testUnsetValues() throws Exception
+    {
+        final String KS = "cql_keyspace5";
+        final String TABLE = "table5";
+
+        final String schema = "CREATE TABLE " + KS + "." + TABLE + " ("
+                              + "  k int,"
+                              + "  c1 int,"
+                              + "  c2 int,"
+                              + "  v text,"
+                              + "  PRIMARY KEY (k, c1, c2)"
+                              + ")";
+
+        File tempdir = Files.createTempDir();
+        File dataDir = new File(tempdir.getAbsolutePath() + File.separator + KS + File.separator + TABLE);
+        assert dataDir.mkdirs();
+
+        CQLSSTableWriter writer = CQLSSTableWriter.builder()
+                                                  .inDirectory(dataDir)
+                                                  .forTable(schema)
+                                                  .using("INSERT INTO " + KS + "." + TABLE + " (k, c1, c2, v) " +
+                                                         "VALUES (?, ?, ?, ?)")
+                                                  .build();
+
+        try
+        {
+            writer.addRow(1, 1, 1);
+            fail("Passing less arguments then expected in prepared statement should not work.");
+        }
+        catch (InvalidRequestException e)
+        {
+            assertEquals("Invalid number of arguments, expecting 4 values but got 3",
+                         e.getMessage());
+        }
+
+        try
+        {
+            writer.addRow(1, 1, CQLSSTableWriter.UNSET_VALUE, "1");
+            fail("Unset values should not work with clustering columns.");
+        }
+        catch (InvalidRequestException e)
+        {
+            assertEquals("Invalid unset value for column c2",
+                         e.getMessage());
+        }
+
+        try
+        {
+            writer.addRow(ImmutableMap.<String, Object>builder().put("k", 1).put("c1", 1).put("v", CQLSSTableWriter.UNSET_VALUE).build());
+            fail("Unset or null clustering columns should not be allowed.");
+        }
+        catch (InvalidRequestException e)
+        {
+            assertEquals("Invalid null value in condition for column c2",
+                         e.getMessage());
+        }
+
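+        // rows whose v column is unset (or null) should be written without a value for v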
+        writer.addRow(1, 1, 1, CQLSSTableWriter.UNSET_VALUE);
+        writer.addRow(2, 2, 2, null);
+        writer.addRow(Arrays.asList(3, 3, 3, CQLSSTableWriter.UNSET_VALUE));
+        writer.addRow(ImmutableMap.<String, Object>builder()
+                                  .put("k", 4)
+                                  .put("c1", 4)
+                                  .put("c2", 4)
+                                  .put("v", CQLSSTableWriter.UNSET_VALUE)
+                                  .build());
+        writer.addRow(Arrays.asList(3, 3, 3, CQLSSTableWriter.UNSET_VALUE));
+        writer.addRow(5, 5, 5, "5");
+
+        writer.close();
+        loadSSTables(dataDir, KS);
+
+        UntypedResultSet resultSet = QueryProcessor.executeInternal("SELECT * FROM " + KS + "." + TABLE);
+        Iterator<UntypedResultSet.Row> iter = resultSet.iterator();
+        UntypedResultSet.Row r1 = iter.next();
+        assertEquals(1, r1.getInt("k"));
+        assertEquals(1, r1.getInt("c1"));
+        assertEquals(1, r1.getInt("c2"));
+        assertEquals(false, r1.has("v"));
+        UntypedResultSet.Row r2 = iter.next();
+        assertEquals(2, r2.getInt("k"));
+        assertEquals(2, r2.getInt("c1"));
+        assertEquals(2, r2.getInt("c2"));
+        assertEquals(false, r2.has("v"));
+        UntypedResultSet.Row r3 = iter.next();
+        assertEquals(3, r3.getInt("k"));
+        assertEquals(3, r3.getInt("c1"));
+        assertEquals(3, r3.getInt("c2"));
+        assertEquals(false, r3.has("v"));
+        UntypedResultSet.Row r4 = iter.next();
+        assertEquals(4, r4.getInt("k"));
+        assertEquals(4, r4.getInt("c1"));
+        assertEquals(4, r4.getInt("c2"));
+        assertEquals(false, r4.has("v"));
+        UntypedResultSet.Row r5 = iter.next();
+        assertEquals(5, r5.getInt("k"));
+        assertEquals(5, r5.getInt("c1"));
+        assertEquals(5, r5.getInt("c2"));
+        assertEquals(true, r5.has("v"));
+        assertEquals("5", r5.getString("v"));
+    }
+
+    private static void loadSSTables(File dataDir, String ks) throws ExecutionException, InterruptedException
+    {
         SSTableLoader loader = new SSTableLoader(dataDir, new SSTableLoader.Client()
         {
             private String keyspace;
@@ -307,7 +542,7 @@
             public void init(String keyspace)
             {
                 this.keyspace = keyspace;
-                for (Range<Token> range : StorageService.instance.getLocalRanges(KS))
+                for (Range<Token> range : StorageService.instance.getLocalRanges(ks))
                     addRangeForEndpoint(range, FBUtilities.getBroadcastAddress());
             }
 
@@ -318,8 +553,5 @@
         }, new OutputHandler.SystemOutput(false, false));
 
         loader.stream().get();
-
-        UntypedResultSet rs = QueryProcessor.executeInternal("SELECT * FROM cql_keyspace2.table2;");
-        assertEquals(threads.length * NUMBER_WRITES_IN_RUNNABLE, rs.size());
     }
 }
diff --git a/test/unit/org/apache/cassandra/io/sstable/DescriptorTest.java b/test/unit/org/apache/cassandra/io/sstable/DescriptorTest.java
index 184d637..f769293 100644
--- a/test/unit/org/apache/cassandra/io/sstable/DescriptorTest.java
+++ b/test/unit/org/apache/cassandra/io/sstable/DescriptorTest.java
@@ -27,6 +27,7 @@
 
 import org.apache.cassandra.db.Directories;
 import org.apache.cassandra.io.sstable.format.SSTableFormat;
+import org.apache.cassandra.io.sstable.format.big.BigFormat;
 import org.apache.cassandra.utils.ByteBufferUtil;
 import org.apache.cassandra.utils.Pair;
 
@@ -119,8 +120,8 @@
     {
         // Descriptor should be equal when parent directory points to the same directory
         File dir = new File(".");
-        Descriptor desc1 = new Descriptor(dir, "ks", "cf", 1);
-        Descriptor desc2 = new Descriptor(dir.getAbsoluteFile(), "ks", "cf", 1);
+        Descriptor desc1 = new Descriptor(dir, "ks", "cf", 1, SSTableFormat.Type.BIG);
+        Descriptor desc2 = new Descriptor(dir.getAbsoluteFile(), "ks", "cf", 1, SSTableFormat.Type.BIG);
         assertEquals(desc1, desc2);
         assertEquals(desc1.hashCode(), desc2.hashCode());
     }
diff --git a/test/unit/org/apache/cassandra/io/sstable/IndexHelperTest.java b/test/unit/org/apache/cassandra/io/sstable/IndexHelperTest.java
deleted file mode 100644
index e6328de..0000000
--- a/test/unit/org/apache/cassandra/io/sstable/IndexHelperTest.java
+++ /dev/null
@@ -1,78 +0,0 @@
-/*
-* Licensed to the Apache Software Foundation (ASF) under one
-* or more contributor license agreements.  See the NOTICE file
-* distributed with this work for additional information
-* regarding copyright ownership.  The ASF licenses this file
-* to you under the Apache License, Version 2.0 (the
-* "License"); you may not use this file except in compliance
-* with the License.  You may obtain a copy of the License at
-*
-*    http://www.apache.org/licenses/LICENSE-2.0
-*
-* Unless required by applicable law or agreed to in writing,
-* software distributed under the License is distributed on an
-* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-* KIND, either express or implied.  See the License for the
-* specific language governing permissions and limitations
-* under the License.
-*/
-package org.apache.cassandra.io.sstable;
-
-import java.util.ArrayList;
-import java.util.Collections;
-import java.util.List;
-
-import org.junit.Test;
-
-import org.apache.cassandra.Util;
-import org.apache.cassandra.db.ClusteringComparator;
-import org.apache.cassandra.db.ClusteringPrefix;
-import org.apache.cassandra.db.DeletionTime;
-import org.apache.cassandra.db.marshal.AbstractType;
-import org.apache.cassandra.db.marshal.LongType;
-import org.apache.cassandra.utils.FBUtilities;
-
-import static org.apache.cassandra.io.sstable.IndexHelper.IndexInfo;
-import static org.junit.Assert.assertEquals;
-
-public class IndexHelperTest
-{
-
-    private static ClusteringComparator comp = new ClusteringComparator(Collections.<AbstractType<?>>singletonList(LongType.instance));
-    private static ClusteringPrefix cn(long l)
-    {
-        return Util.clustering(comp, l);
-    }
-
-    @Test
-    public void testIndexHelper()
-    {
-        DeletionTime deletionInfo = new DeletionTime(FBUtilities.timestampMicros(), FBUtilities.nowInSeconds());
-
-        List<IndexInfo> indexes = new ArrayList<>();
-        indexes.add(new IndexInfo(cn(0L), cn(5L), 0, 0, deletionInfo));
-        indexes.add(new IndexInfo(cn(10L), cn(15L), 0, 0, deletionInfo));
-        indexes.add(new IndexInfo(cn(20L), cn(25L), 0, 0, deletionInfo));
-
-        assertEquals(0, IndexHelper.indexFor(cn(-1L), indexes, comp, false, -1));
-        assertEquals(0, IndexHelper.indexFor(cn(5L), indexes, comp, false, -1));
-        assertEquals(1, IndexHelper.indexFor(cn(12L), indexes, comp, false, -1));
-        assertEquals(2, IndexHelper.indexFor(cn(17L), indexes, comp, false, -1));
-        assertEquals(3, IndexHelper.indexFor(cn(100L), indexes, comp, false, -1));
-        assertEquals(3, IndexHelper.indexFor(cn(100L), indexes, comp, false, 0));
-        assertEquals(3, IndexHelper.indexFor(cn(100L), indexes, comp, false, 1));
-        assertEquals(3, IndexHelper.indexFor(cn(100L), indexes, comp, false, 2));
-        assertEquals(3, IndexHelper.indexFor(cn(100L), indexes, comp, false, 3));
-
-        assertEquals(-1, IndexHelper.indexFor(cn(-1L), indexes, comp, true, -1));
-        assertEquals(0, IndexHelper.indexFor(cn(5L), indexes, comp, true, 3));
-        assertEquals(0, IndexHelper.indexFor(cn(5L), indexes, comp, true, 2));
-        assertEquals(1, IndexHelper.indexFor(cn(17L), indexes, comp, true, 3));
-        assertEquals(2, IndexHelper.indexFor(cn(100L), indexes, comp, true, 3));
-        assertEquals(2, IndexHelper.indexFor(cn(100L), indexes, comp, true, 4));
-        assertEquals(1, IndexHelper.indexFor(cn(12L), indexes, comp, true, 3));
-        assertEquals(1, IndexHelper.indexFor(cn(12L), indexes, comp, true, 2));
-        assertEquals(1, IndexHelper.indexFor(cn(100L), indexes, comp, true, 1));
-        assertEquals(2, IndexHelper.indexFor(cn(100L), indexes, comp, true, 2));
-    }
-}
diff --git a/test/unit/org/apache/cassandra/io/sstable/LargePartitionsTest.java b/test/unit/org/apache/cassandra/io/sstable/LargePartitionsTest.java
new file mode 100644
index 0000000..f97356a
--- /dev/null
+++ b/test/unit/org/apache/cassandra/io/sstable/LargePartitionsTest.java
@@ -0,0 +1,219 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.io.sstable;
+
+import java.util.concurrent.ThreadLocalRandom;
+
+import org.junit.Ignore;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+
+import org.apache.cassandra.OrderedJUnit4ClassRunner;
+import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.cql3.CQLTester;
+import org.apache.cassandra.metrics.CacheMetrics;
+import org.apache.cassandra.service.CacheService;
+
+/**
+ * Test intended to manually measure GC pressure when writing and reading partitions of different sizes
+ * for CASSANDRA-11206.
+ */
+@RunWith(OrderedJUnit4ClassRunner.class)
+@Ignore // all these tests take a very long time, so only run them manually
+public class LargePartitionsTest extends CQLTester
+{
+
+    @FunctionalInterface
+    interface Measured
+    {
+        void measure() throws Throwable;
+    }
+
+    private static void measured(String name, Measured measured) throws Throwable
+    {
+        long t0 = System.currentTimeMillis();
+        measured.measure();
+        long t = System.currentTimeMillis() - t0;
+        System.out.println(name + " took " + t + " ms");
+    }
+
+    private static String randomText(int bytes)
+    {
+        char[] ch = new char[bytes];
+        ThreadLocalRandom r = ThreadLocalRandom.current();
+        for (int i = 0; i < bytes; i++)
+            ch[i] = (char) (32 + r.nextInt(95));
+        return new String(ch);
+    }
+
+    private static final int rowKBytes = 8;
+
+    private void withPartitionSize(long partitionKBytes, long totalMBytes) throws Throwable
+    {
+        long totalKBytes = totalMBytes * 1024L;
+
+        createTable("CREATE TABLE %s (pk text, ck text, val text, PRIMARY KEY (pk, ck))");
+
+        String name = "part=" + partitionKBytes + "k total=" + totalMBytes + 'M';
+
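+        // fill each partition with fixed-size rows until the overall target data size is reached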
+        measured("INSERTs for " + name, () -> {
+            for (long writtenKBytes = 0L; writtenKBytes < totalKBytes; writtenKBytes += partitionKBytes)
+            {
+                String pk = Long.toBinaryString(writtenKBytes);
+                for (long kbytes = 0L; kbytes < partitionKBytes; kbytes += rowKBytes)
+                {
+                    String ck = Long.toBinaryString(kbytes);
+                    execute("INSERT INTO %s (pk, ck, val) VALUES (?,?,?)", pk, ck, randomText(rowKBytes * 1024));
+                }
+            }
+        });
+
+        measured("flush for " + name, () -> flush(true));
+
+        CacheService.instance.keyCache.clear();
+
+        measured("compact for " + name, () -> {
+            keyCacheMetrics("before compaction");
+            compact();
+            keyCacheMetrics("after compaction");
+        });
+
+        measured("SELECTs 1 for " + name, () -> selects(partitionKBytes, totalKBytes));
+
+        measured("SELECTs 2 for " + name, () -> selects(partitionKBytes, totalKBytes));
+    }
+
+    private void selects(long partitionKBytes, long totalKBytes) throws Throwable
+    {
+        for (int i = 0; i < 50000; i++)
+        {
+            long pk = ThreadLocalRandom.current().nextLong(totalKBytes / partitionKBytes) * partitionKBytes;
+            long ck = ThreadLocalRandom.current().nextLong(partitionKBytes / rowKBytes) * rowKBytes;
+            execute("SELECT val FROM %s WHERE pk=? AND ck=?",
+                    Long.toBinaryString(pk),
+                    Long.toBinaryString(ck)).one();
+            if (i % 1000 == 0)
+                keyCacheMetrics("after " + i + " selects");
+        }
+        keyCacheMetrics("after all selects");
+    }
+
+    private static void keyCacheMetrics(String title)
+    {
+        CacheMetrics metrics = CacheService.instance.keyCache.getMetrics();
+        System.out.println("Key cache metrics " + title + ": capacity:" + metrics.capacity.getValue() +
+                           " size:"+metrics.size.getValue()+
+                           " entries:" + metrics.entries.getValue() +
+                           " hit-rate:"+metrics.hitRate.getValue() +
+                           " one-min-rate:"+metrics.oneMinuteHitRate.getValue());
+    }
+
+    @Test
+    public void prepare() throws Throwable
+    {
+        for (int i = 0; i < 4; i++)
+        {
+            withPartitionSize(8L, 32L);
+        }
+    }
+
+    @Test
+    public void test_01_16k() throws Throwable
+    {
+        withPartitionSize(16L, 1024L);
+    }
+
+    @Test
+    public void test_02_512k() throws Throwable
+    {
+        withPartitionSize(512L, 1024L);
+    }
+
+    @Test
+    public void test_03_1M() throws Throwable
+    {
+        withPartitionSize(1024L, 1024L);
+    }
+
+    @Test
+    public void test_04_4M() throws Throwable
+    {
+        withPartitionSize(4L * 1024L, 1024L);
+    }
+
+    @Test
+    public void test_05_8M() throws Throwable
+    {
+        withPartitionSize(8L * 1024L, 1024L);
+    }
+
+    @Test
+    public void test_06_16M() throws Throwable
+    {
+        withPartitionSize(16L * 1024L, 1024L);
+    }
+
+    @Test
+    public void test_07_32M() throws Throwable
+    {
+        withPartitionSize(32L * 1024L, 1024L);
+    }
+
+    @Test
+    public void test_08_64M() throws Throwable
+    {
+        withPartitionSize(64L * 1024L, 1024L);
+    }
+
+    @Test
+    public void test_09_256M() throws Throwable
+    {
+        withPartitionSize(256L * 1024L, 4 * 1024L);
+    }
+
+    @Test
+    public void test_10_512M() throws Throwable
+    {
+        withPartitionSize(512L * 1024L, 4 * 1024L);
+    }
+
+    @Test
+    public void test_11_1G() throws Throwable
+    {
+        withPartitionSize(1024L * 1024L, 8 * 1024L);
+    }
+
+    @Test
+    public void test_12_2G() throws Throwable
+    {
+        withPartitionSize(2L * 1024L * 1024L, 8 * 1024L);
+    }
+
+    @Test
+    public void test_13_4G() throws Throwable
+    {
+        withPartitionSize(4L * 1024L * 1024L, 16 * 1024L);
+    }
+
+    @Test
+    public void test_14_8G() throws Throwable
+    {
+        withPartitionSize(8L * 1024L * 1024L, 32 * 1024L);
+    }
+}
diff --git a/test/unit/org/apache/cassandra/io/sstable/LegacySSTableTest.java b/test/unit/org/apache/cassandra/io/sstable/LegacySSTableTest.java
index 62228e3..803e275 100644
--- a/test/unit/org/apache/cassandra/io/sstable/LegacySSTableTest.java
+++ b/test/unit/org/apache/cassandra/io/sstable/LegacySSTableTest.java
@@ -36,6 +36,7 @@
 import org.slf4j.LoggerFactory;
 
 import org.apache.cassandra.SchemaLoader;
+import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.cql3.QueryProcessor;
 import org.apache.cassandra.cql3.UntypedResultSet;
 import org.apache.cassandra.db.ColumnFamilyStore;
@@ -126,14 +127,31 @@
     @Test
     public void testLoadLegacyCqlTables() throws Exception
     {
+        DatabaseDescriptor.setColumnIndexCacheSize(99999);
+        CacheService.instance.invalidateKeyCache();
+        doTestLegacyCqlTables();
+    }
+
+    @Test
+    public void testLoadLegacyCqlTablesShallow() throws Exception
+    {
+        DatabaseDescriptor.setColumnIndexCacheSize(0);
+        CacheService.instance.invalidateKeyCache();
+        doTestLegacyCqlTables();
+    }
+
+    private void doTestLegacyCqlTables() throws Exception
+    {
         for (String legacyVersion : legacyVersions)
         {
             logger.info("Loading legacy version: {}", legacyVersion);
+            truncateLegacyTables(legacyVersion);
             loadLegacyTables(legacyVersion);
             CacheService.instance.invalidateKeyCache();
             long startCount = CacheService.instance.keyCache.size();
             verifyReads(legacyVersion);
             verifyCache(legacyVersion, startCount);
+            compactLegacyTables(legacyVersion);
         }
     }
 
@@ -175,6 +193,30 @@
                                              .execute().get();
     }
 
+    private static void truncateLegacyTables(String legacyVersion) throws Exception
+    {
+        for (int compact = 0; compact <= 1; compact++)
+        {
+            logger.info("Truncating legacy version {}{}", legacyVersion, getCompactNameSuffix(compact));
+            Keyspace.open("legacy_tables").getColumnFamilyStore(String.format("legacy_%s_simple%s", legacyVersion, getCompactNameSuffix(compact))).truncateBlocking();
+            Keyspace.open("legacy_tables").getColumnFamilyStore(String.format("legacy_%s_simple_counter%s", legacyVersion, getCompactNameSuffix(compact))).truncateBlocking();
+            Keyspace.open("legacy_tables").getColumnFamilyStore(String.format("legacy_%s_clust%s", legacyVersion, getCompactNameSuffix(compact))).truncateBlocking();
+            Keyspace.open("legacy_tables").getColumnFamilyStore(String.format("legacy_%s_clust_counter%s", legacyVersion, getCompactNameSuffix(compact))).truncateBlocking();
+        }
+    }
+
+    private static void compactLegacyTables(String legacyVersion) throws Exception
+    {
+        for (int compact = 0; compact <= 1; compact++)
+        {
+            logger.info("Compacting legacy version {}{}", legacyVersion, getCompactNameSuffix(compact));
+            Keyspace.open("legacy_tables").getColumnFamilyStore(String.format("legacy_%s_simple%s", legacyVersion, getCompactNameSuffix(compact))).forceMajorCompaction();
+            Keyspace.open("legacy_tables").getColumnFamilyStore(String.format("legacy_%s_simple_counter%s", legacyVersion, getCompactNameSuffix(compact))).forceMajorCompaction();
+            Keyspace.open("legacy_tables").getColumnFamilyStore(String.format("legacy_%s_clust%s", legacyVersion, getCompactNameSuffix(compact))).forceMajorCompaction();
+            Keyspace.open("legacy_tables").getColumnFamilyStore(String.format("legacy_%s_clust_counter%s", legacyVersion, getCompactNameSuffix(compact))).forceMajorCompaction();
+        }
+    }
+
     private static void loadLegacyTables(String legacyVersion) throws Exception
     {
         for (int compact = 0; compact <= 1; compact++)
diff --git a/test/unit/org/apache/cassandra/io/sstable/SSTableCorruptionDetectionTest.java b/test/unit/org/apache/cassandra/io/sstable/SSTableCorruptionDetectionTest.java
index 4da8519..88ed52e 100644
--- a/test/unit/org/apache/cassandra/io/sstable/SSTableCorruptionDetectionTest.java
+++ b/test/unit/org/apache/cassandra/io/sstable/SSTableCorruptionDetectionTest.java
@@ -167,6 +167,9 @@
             }
             finally
             {
+                if (ChunkCache.instance != null)
+                    ChunkCache.instance.invalidateFile(ssTableReader.getFilename());
+
                 restore(raf, corruptionPosition, backup);
             }
         }
@@ -207,7 +210,7 @@
             for (int i = 0; i < numberOfPks; i++)
             {
                 DecoratedKey dk = Util.dk(String.format("pkvalue_%07d", i));
-                try (UnfilteredRowIterator rowIter = sstable.iterator(dk, ColumnFilter.all(cfs.metadata), false, false))
+                try (UnfilteredRowIterator rowIter = sstable.iterator(dk, Slices.ALL, ColumnFilter.all(cfs.metadata), false, false))
                 {
                     while (rowIter.hasNext())
                     {
diff --git a/test/unit/org/apache/cassandra/io/sstable/SSTableLoaderTest.java b/test/unit/org/apache/cassandra/io/sstable/SSTableLoaderTest.java
index ad7523d..72c7467 100644
--- a/test/unit/org/apache/cassandra/io/sstable/SSTableLoaderTest.java
+++ b/test/unit/org/apache/cassandra/io/sstable/SSTableLoaderTest.java
@@ -142,7 +142,7 @@
 
         assertEquals(1, partitions.size());
         assertEquals("key1", AsciiType.instance.getString(partitions.get(0).partitionKey().getKey()));
-        assertEquals(ByteBufferUtil.bytes("100"), partitions.get(0).getRow(new Clustering(ByteBufferUtil.bytes("col1")))
+        assertEquals(ByteBufferUtil.bytes("100"), partitions.get(0).getRow(Clustering.make(ByteBufferUtil.bytes("col1")))
                                                                    .getCell(cfmeta.getColumnDefinition(ByteBufferUtil.bytes("val")))
                                                                    .value());
 
diff --git a/test/unit/org/apache/cassandra/io/sstable/SSTableReaderTest.java b/test/unit/org/apache/cassandra/io/sstable/SSTableReaderTest.java
index 640b68b..151a995 100644
--- a/test/unit/org/apache/cassandra/io/sstable/SSTableReaderTest.java
+++ b/test/unit/org/apache/cassandra/io/sstable/SSTableReaderTest.java
@@ -47,7 +47,6 @@
 import org.apache.cassandra.io.sstable.format.SSTableReader;
 import org.apache.cassandra.io.util.FileDataInput;
 import org.apache.cassandra.io.util.MmappedRegions;
-import org.apache.cassandra.io.util.MmappedSegmentedFile;
 import org.apache.cassandra.io.util.SegmentedFile;
 import org.apache.cassandra.schema.CachingParams;
 import org.apache.cassandra.schema.KeyspaceParams;
@@ -606,9 +605,9 @@
                                              .build();
         Index.Searcher searcher = indexedCFS.indexManager.getBestIndexFor(rc).searcherFor(rc);
         assertNotNull(searcher);
-        try (ReadOrderGroup orderGroup = ReadOrderGroup.forCommand(rc))
+        try (ReadExecutionController executionController = rc.executionController())
         {
-            assertEquals(1, Util.size(UnfilteredPartitionIterators.filter(searcher.search(orderGroup), rc.nowInSec())));
+            assertEquals(1, Util.size(UnfilteredPartitionIterators.filter(searcher.search(executionController), rc.nowInSec())));
         }
     }
 
diff --git a/test/unit/org/apache/cassandra/io/sstable/SSTableRewriterTest.java b/test/unit/org/apache/cassandra/io/sstable/SSTableRewriterTest.java
index c842b7f..9c6da77 100644
--- a/test/unit/org/apache/cassandra/io/sstable/SSTableRewriterTest.java
+++ b/test/unit/org/apache/cassandra/io/sstable/SSTableRewriterTest.java
@@ -87,7 +87,7 @@
         int nowInSec = FBUtilities.nowInSeconds();
         try (AbstractCompactionStrategy.ScannerList scanners = cfs.getCompactionStrategyManager().getScanners(sstables);
              LifecycleTransaction txn = cfs.getTracker().tryModify(sstables, OperationType.UNKNOWN);
-             SSTableRewriter writer = new SSTableRewriter(txn, 1000, false);
+             SSTableRewriter writer = SSTableRewriter.constructKeepingOriginals(txn, false, 1000);
              CompactionController controller = new CompactionController(cfs, sstables, cfs.gcBefore(nowInSec));
              CompactionIterator ci = new CompactionIterator(OperationType.COMPACTION, scanners.scanners, controller, nowInSec, UUIDGen.getTimeUUID()))
         {
@@ -119,7 +119,7 @@
         int nowInSec = FBUtilities.nowInSeconds();
         try (AbstractCompactionStrategy.ScannerList scanners = cfs.getCompactionStrategyManager().getScanners(sstables);
              LifecycleTransaction txn = cfs.getTracker().tryModify(sstables, OperationType.UNKNOWN);
-             SSTableRewriter writer = new SSTableRewriter(txn, 1000, false, 10000000, false);
+             SSTableRewriter writer = new SSTableRewriter(txn, 1000, 10000000, false);
              CompactionController controller = new CompactionController(cfs, sstables, cfs.gcBefore(nowInSec));
              CompactionIterator ci = new CompactionIterator(OperationType.COMPACTION, scanners.scanners, controller, nowInSec, UUIDGen.getTimeUUID()))
         {
@@ -152,7 +152,7 @@
         boolean checked = false;
         try (AbstractCompactionStrategy.ScannerList scanners = cfs.getCompactionStrategyManager().getScanners(sstables);
              LifecycleTransaction txn = cfs.getTracker().tryModify(sstables, OperationType.UNKNOWN);
-             SSTableRewriter writer = new SSTableRewriter(txn, 1000, false, 10000000, false);
+             SSTableRewriter writer = new SSTableRewriter(txn, 1000, 10000000, false);
              CompactionController controller = new CompactionController(cfs, sstables, cfs.gcBefore(nowInSec));
              CompactionIterator ci = new CompactionIterator(OperationType.COMPACTION, scanners.scanners, controller, nowInSec, UUIDGen.getTimeUUID()))
         {
@@ -210,7 +210,7 @@
         try (ISSTableScanner scanner = s.getScanner();
              CompactionController controller = new CompactionController(cfs, compacting, 0);
              LifecycleTransaction txn = cfs.getTracker().tryModify(compacting, OperationType.UNKNOWN);
-             SSTableRewriter rewriter = new SSTableRewriter(txn, 1000, false, 10000000, false);
+             SSTableRewriter rewriter = new SSTableRewriter(txn, 1000, 10000000, false);
              CompactionIterator ci = new CompactionIterator(OperationType.COMPACTION, Collections.singletonList(scanner), controller, FBUtilities.nowInSeconds(), UUIDGen.getTimeUUID()))
         {
             rewriter.switchWriter(getWriter(cfs, s.descriptor.directory, txn));
@@ -265,7 +265,7 @@
         try (ISSTableScanner scanner = s.getScanner();
              CompactionController controller = new CompactionController(cfs, compacting, 0);
              LifecycleTransaction txn = cfs.getTracker().tryModify(compacting, OperationType.UNKNOWN);
-             SSTableRewriter rewriter = new SSTableRewriter(txn, 1000, false, 10000000, false);
+             SSTableRewriter rewriter = new SSTableRewriter(txn, 1000, 10000000, false);
              CompactionIterator ci = new CompactionIterator(OperationType.COMPACTION, Collections.singletonList(scanner), controller, FBUtilities.nowInSeconds(), UUIDGen.getTimeUUID()))
         {
             rewriter.switchWriter(getWriter(cfs, s.descriptor.directory, txn));
@@ -416,7 +416,7 @@
         try (ISSTableScanner scanner = s.getScanner();
              CompactionController controller = new CompactionController(cfs, compacting, 0);
              LifecycleTransaction txn = cfs.getTracker().tryModify(compacting, OperationType.UNKNOWN);
-             SSTableRewriter rewriter = new SSTableRewriter(txn, 1000, false, 10000000, false))
+             SSTableRewriter rewriter = new SSTableRewriter(txn, 1000, 10000000, false))
         {
             rewriter.switchWriter(getWriter(cfs, s.descriptor.directory, txn));
             test.run(scanner, controller, s, cfs, rewriter, txn);
@@ -448,7 +448,7 @@
         try (ISSTableScanner scanner = s.getScanner();
              CompactionController controller = new CompactionController(cfs, compacting, 0);
              LifecycleTransaction txn = cfs.getTracker().tryModify(compacting, OperationType.UNKNOWN);
-             SSTableRewriter rewriter = new SSTableRewriter(txn, 1000, false, 10000000, false);
+             SSTableRewriter rewriter = new SSTableRewriter(txn, 1000, 10000000, false);
              CompactionIterator ci = new CompactionIterator(OperationType.COMPACTION, Collections.singletonList(scanner), controller, FBUtilities.nowInSeconds(), UUIDGen.getTimeUUID()))
         {
             rewriter.switchWriter(getWriter(cfs, s.descriptor.directory, txn));
@@ -494,7 +494,7 @@
         try (ISSTableScanner scanner = s.getScanner();
              CompactionController controller = new CompactionController(cfs, compacting, 0);
              LifecycleTransaction txn = cfs.getTracker().tryModify(compacting, OperationType.UNKNOWN);
-             SSTableRewriter rewriter = new SSTableRewriter(txn, 1000, false, 10000000, false);
+             SSTableRewriter rewriter = new SSTableRewriter(txn, 1000, 10000000, false);
              CompactionIterator ci = new CompactionIterator(OperationType.COMPACTION, Collections.singletonList(scanner), controller, FBUtilities.nowInSeconds(), UUIDGen.getTimeUUID()))
         {
             rewriter.switchWriter(getWriter(cfs, s.descriptor.directory, txn));
@@ -534,7 +534,7 @@
         try (ISSTableScanner scanner = s.getScanner();
              CompactionController controller = new CompactionController(cfs, compacting, 0);
              LifecycleTransaction txn = cfs.getTracker().tryModify(compacting, OperationType.UNKNOWN);
-             SSTableRewriter rewriter = new SSTableRewriter(txn, 1000, false, 1000000, false);
+             SSTableRewriter rewriter = new SSTableRewriter(txn, 1000, 1000000, false);
              CompactionIterator ci = new CompactionIterator(OperationType.COMPACTION, Collections.singletonList(scanner), controller, FBUtilities.nowInSeconds(), UUIDGen.getTimeUUID()))
         {
             rewriter.switchWriter(getWriter(cfs, s.descriptor.directory, txn));
@@ -620,7 +620,7 @@
              CompactionController controller = new CompactionController(cfs, compacting, 0);
              LifecycleTransaction txn = offline ? LifecycleTransaction.offline(OperationType.UNKNOWN, compacting)
                                        : cfs.getTracker().tryModify(compacting, OperationType.UNKNOWN);
-             SSTableRewriter rewriter = new SSTableRewriter(txn, 1000, offline, 10000000, false);
+             SSTableRewriter rewriter = new SSTableRewriter(txn, 100, 10000000, false);
              CompactionIterator ci = new CompactionIterator(OperationType.COMPACTION, Collections.singletonList(scanner), controller, FBUtilities.nowInSeconds(), UUIDGen.getTimeUUID())
         )
         {
@@ -710,7 +710,7 @@
         try (ISSTableScanner scanner = compacting.iterator().next().getScanner();
              CompactionController controller = new CompactionController(cfs, compacting, 0);
              LifecycleTransaction txn = cfs.getTracker().tryModify(compacting, OperationType.UNKNOWN);
-             SSTableRewriter rewriter = new SSTableRewriter(txn, 1000, false, 1, false);
+             SSTableRewriter rewriter = new SSTableRewriter(txn, 1000, 1, false);
              CompactionIterator ci = new CompactionIterator(OperationType.COMPACTION, Collections.singletonList(scanner), controller, FBUtilities.nowInSeconds(), UUIDGen.getTimeUUID())
         )
         {
@@ -748,7 +748,7 @@
         try (ISSTableScanner scanner = sstables.iterator().next().getScanner();
              CompactionController controller = new CompactionController(cfs, sstables, 0);
              LifecycleTransaction txn = cfs.getTracker().tryModify(sstables, OperationType.UNKNOWN);
-             SSTableRewriter writer = new SSTableRewriter(txn, 1000, false, 10000000, false);
+             SSTableRewriter writer = new SSTableRewriter(txn, 1000, 10000000, false);
              CompactionIterator ci = new CompactionIterator(OperationType.COMPACTION, Collections.singletonList(scanner), controller, FBUtilities.nowInSeconds(), UUIDGen.getTimeUUID())
         )
         {
@@ -850,8 +850,8 @@
         int nowInSec = FBUtilities.nowInSeconds();
         try (AbstractCompactionStrategy.ScannerList scanners = cfs.getCompactionStrategyManager().getScanners(sstables);
              LifecycleTransaction txn = cfs.getTracker().tryModify(sstables, OperationType.UNKNOWN);
-             SSTableRewriter writer = new SSTableRewriter(txn, 1000, false, false);
-             SSTableRewriter writer2 = new SSTableRewriter(txn, 1000, false, false);
+             SSTableRewriter writer = SSTableRewriter.constructWithoutEarlyOpening(txn, false, 1000);
+             SSTableRewriter writer2 = SSTableRewriter.constructWithoutEarlyOpening(txn, false, 1000);
              CompactionController controller = new CompactionController(cfs, sstables, cfs.gcBefore(nowInSec));
              CompactionIterator ci = new CompactionIterator(OperationType.COMPACTION, scanners.scanners, controller, nowInSec, UUIDGen.getTimeUUID())
              )
diff --git a/test/unit/org/apache/cassandra/io/sstable/SSTableWriterTest.java b/test/unit/org/apache/cassandra/io/sstable/SSTableWriterTest.java
index 6f18461..91843d9 100644
--- a/test/unit/org/apache/cassandra/io/sstable/SSTableWriterTest.java
+++ b/test/unit/org/apache/cassandra/io/sstable/SSTableWriterTest.java
@@ -223,7 +223,7 @@
             try
             {
                 DecoratedKey dk = Util.dk("large_value");
-                UnfilteredRowIterator rowIter = sstable.iterator(dk, ColumnFilter.all(cfs.metadata), false, false);
+                UnfilteredRowIterator rowIter = sstable.iterator(dk, Slices.ALL, ColumnFilter.all(cfs.metadata), false, false);
                 while (rowIter.hasNext())
                 {
                     rowIter.next();
diff --git a/test/unit/org/apache/cassandra/io/sstable/SSTableWriterTestBase.java b/test/unit/org/apache/cassandra/io/sstable/SSTableWriterTestBase.java
index 5c7c7c0..9c3bb19 100644
--- a/test/unit/org/apache/cassandra/io/sstable/SSTableWriterTestBase.java
+++ b/test/unit/org/apache/cassandra/io/sstable/SSTableWriterTestBase.java
@@ -143,7 +143,7 @@
     public static SSTableWriter getWriter(ColumnFamilyStore cfs, File directory, LifecycleTransaction txn)
     {
         String filename = cfs.getSSTablePath(directory);
-        return SSTableWriter.create(filename, 0, 0, new SerializationHeader(true, cfs.metadata, cfs.metadata.partitionColumns(), EncodingStats.NO_STATS), txn);
+        return SSTableWriter.create(filename, 0, 0, new SerializationHeader(true, cfs.metadata, cfs.metadata.partitionColumns(), EncodingStats.NO_STATS), cfs.indexManager.listIndexes(), txn);
     }
 
     public static ByteBuffer random(int i, int size)
diff --git a/test/unit/org/apache/cassandra/io/sstable/format/ClientModeSSTableTest.java b/test/unit/org/apache/cassandra/io/sstable/format/ClientModeSSTableTest.java
index 661fcd5..6b0427b 100644
--- a/test/unit/org/apache/cassandra/io/sstable/format/ClientModeSSTableTest.java
+++ b/test/unit/org/apache/cassandra/io/sstable/format/ClientModeSSTableTest.java
@@ -29,15 +29,13 @@
 import org.apache.cassandra.concurrent.ScheduledExecutors;
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.config.Config;
+import org.apache.cassandra.db.Slices;
 import org.apache.cassandra.db.filter.ColumnFilter;
 import org.apache.cassandra.db.marshal.BytesType;
-import org.apache.cassandra.db.rows.SliceableUnfilteredRowIterator;
+import org.apache.cassandra.db.rows.UnfilteredRowIterator;
 import org.apache.cassandra.dht.ByteOrderedPartitioner;
 import org.apache.cassandra.exceptions.ConfigurationException;
 import org.apache.cassandra.io.sstable.Descriptor;
-import org.apache.cassandra.io.sstable.format.SSTableFormat;
-import org.apache.cassandra.io.sstable.format.SSTableReader;
-import org.apache.cassandra.io.sstable.format.Version;
 
 /**
  * Tests backwards compatibility for SSTables
@@ -107,7 +105,7 @@
 
             ByteBuffer key = bytes(Integer.toString(100));
 
-            try (SliceableUnfilteredRowIterator iter = reader.iterator(metadata.decorateKey(key), ColumnFilter.selection(metadata.partitionColumns()), false, false))
+            try (UnfilteredRowIterator iter = reader.iterator(metadata.decorateKey(key), Slices.ALL, ColumnFilter.selection(metadata.partitionColumns()), false, false))
             {
                 assert iter.next().clustering().get(0).equals(key);
             }
diff --git a/test/unit/org/apache/cassandra/io/sstable/format/SSTableFlushObserverTest.java b/test/unit/org/apache/cassandra/io/sstable/format/SSTableFlushObserverTest.java
new file mode 100644
index 0000000..dafad37
--- /dev/null
+++ b/test/unit/org/apache/cassandra/io/sstable/format/SSTableFlushObserverTest.java
@@ -0,0 +1,218 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.io.sstable.format;
+
+import java.io.File;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Collections;
+import java.util.Iterator;
+
+import org.apache.cassandra.config.CFMetaData;
+import org.apache.cassandra.config.ColumnDefinition;
+import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.db.Clustering;
+import org.apache.cassandra.db.DecoratedKey;
+import org.apache.cassandra.db.DeletionTime;
+import org.apache.cassandra.db.SerializationHeader;
+import org.apache.cassandra.db.compaction.OperationType;
+import org.apache.cassandra.db.lifecycle.LifecycleTransaction;
+import org.apache.cassandra.db.marshal.Int32Type;
+import org.apache.cassandra.db.marshal.LongType;
+import org.apache.cassandra.db.marshal.UTF8Type;
+import org.apache.cassandra.db.rows.*;
+import org.apache.cassandra.io.FSReadError;
+import org.apache.cassandra.io.FSWriteError;
+import org.apache.cassandra.io.sstable.Descriptor;
+import org.apache.cassandra.io.sstable.format.big.BigTableWriter;
+import org.apache.cassandra.io.sstable.metadata.MetadataCollector;
+import org.apache.cassandra.io.util.FileDataInput;
+import org.apache.cassandra.io.util.FileUtils;
+import org.apache.cassandra.utils.ByteBufferUtil;
+import org.apache.cassandra.utils.Pair;
+
+import com.google.common.collect.ArrayListMultimap;
+import com.google.common.collect.Multimap;
+
+import junit.framework.Assert;
+import org.junit.Test;
+
+public class SSTableFlushObserverTest
+{
+    private static final String KS_NAME = "test";
+    private static final String CF_NAME = "flush_observer";
+
+    @Test
+    public void testFlushObserver()
+    {
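+        // write a few partitions through a BigTableWriter with a test observer registered, then verify the observer saw every cell and index position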
+        CFMetaData cfm = CFMetaData.Builder.create(KS_NAME, CF_NAME)
+                                           .addPartitionKey("id", UTF8Type.instance)
+                                           .addRegularColumn("first_name", UTF8Type.instance)
+                                           .addRegularColumn("age", Int32Type.instance)
+                                           .addRegularColumn("height", LongType.instance)
+                                           .build();
+
+        LifecycleTransaction transaction = LifecycleTransaction.offline(OperationType.COMPACTION);
+        FlushObserver observer = new FlushObserver();
+
+        String sstableDirectory = DatabaseDescriptor.getAllDataFileLocations()[0];
+        File directory = new File(sstableDirectory + File.separator + KS_NAME + File.separator + CF_NAME);
+        directory.deleteOnExit();
+
+        if (!directory.exists() && !directory.mkdirs())
+            throw new FSWriteError(new IOException("failed to create tmp directory"), directory.getAbsolutePath());
+
+        SSTableFormat.Type sstableFormat = DatabaseDescriptor.getSSTableFormat();
+
+        BigTableWriter writer = new BigTableWriter(new Descriptor(sstableFormat.info.getLatestVersion().version,
+                                                                  directory,
+                                                                  KS_NAME, CF_NAME,
+                                                                  0,
+                                                                  sstableFormat),
+                                                   10L, 0L, cfm,
+                                                   new MetadataCollector(cfm.comparator).sstableLevel(0),
+                                                   new SerializationHeader(true, cfm, cfm.partitionColumns(), EncodingStats.NO_STATS),
+                                                   Collections.singletonList(observer),
+                                                   transaction);
+
+        SSTableReader reader = null;
+        Multimap<ByteBuffer, Cell> expected = ArrayListMultimap.create();
+
+        try
+        {
+            final long now = System.currentTimeMillis();
+
+            ByteBuffer key = UTF8Type.instance.fromString("key1");
+            expected.putAll(key, Arrays.asList(BufferCell.live(getColumn(cfm, "age"), now, Int32Type.instance.decompose(27)),
+                                               BufferCell.live(getColumn(cfm, "first_name"), now,UTF8Type.instance.fromString("jack")),
+                                               BufferCell.live(getColumn(cfm, "height"), now, LongType.instance.decompose(183L))));
+
+            writer.append(new RowIterator(cfm, key.duplicate(), Collections.singletonList(buildRow(expected.get(key)))));
+
+            key = UTF8Type.instance.fromString("key2");
+            expected.putAll(key, Arrays.asList(BufferCell.live(getColumn(cfm, "age"), now, Int32Type.instance.decompose(30)),
+                                               BufferCell.live(getColumn(cfm, "first_name"), now,UTF8Type.instance.fromString("jim")),
+                                               BufferCell.live(getColumn(cfm, "height"), now, LongType.instance.decompose(180L))));
+
+            writer.append(new RowIterator(cfm, key, Collections.singletonList(buildRow(expected.get(key)))));
+
+            key = UTF8Type.instance.fromString("key3");
+            expected.putAll(key, Arrays.asList(BufferCell.live(getColumn(cfm, "age"), now, Int32Type.instance.decompose(30)),
+                                               BufferCell.live(getColumn(cfm, "first_name"), now,UTF8Type.instance.fromString("ken")),
+                                               BufferCell.live(getColumn(cfm, "height"), now, LongType.instance.decompose(178L))));
+
+            writer.append(new RowIterator(cfm, key, Collections.singletonList(buildRow(expected.get(key)))));
+
+            reader = writer.finish(true);
+        }
+        finally
+        {
+            FileUtils.closeQuietly(writer);
+        }
+
+        Assert.assertTrue(observer.isComplete);
+        Assert.assertEquals(expected.size(), observer.rows.size());
+
+        for (Pair<ByteBuffer, Long> e : observer.rows.keySet())
+        {
+            ByteBuffer key = e.left;
+            Long indexPosition = e.right;
+
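+            // the index position recorded by the observer should point at this partition's key in the primary index file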
+            try (FileDataInput index = reader.ifile.createReader(indexPosition))
+            {
+                ByteBuffer indexKey = ByteBufferUtil.readWithShortLength(index);
+                Assert.assertEquals(0, UTF8Type.instance.compare(key, indexKey));
+            }
+            catch (IOException ex)
+            {
+                throw new FSReadError(ex, reader.getIndexFilename());
+            }
+
+            Assert.assertEquals(expected.get(key), observer.rows.get(e));
+        }
+    }
+
+    private static class RowIterator extends AbstractUnfilteredRowIterator
+    {
+        private final Iterator<Unfiltered> rows;
+
+        public RowIterator(CFMetaData cfm, ByteBuffer key, Collection<Unfiltered> content)
+        {
+            super(cfm,
+                  DatabaseDescriptor.getPartitioner().decorateKey(key),
+                  DeletionTime.LIVE,
+                  cfm.partitionColumns(),
+                  BTreeRow.emptyRow(Clustering.STATIC_CLUSTERING),
+                  false,
+                  EncodingStats.NO_STATS);
+
+            rows = content.iterator();
+        }
+
+        @Override
+        protected Unfiltered computeNext()
+        {
+            return rows.hasNext() ? rows.next() : endOfData();
+        }
+    }
+
+    private static class FlushObserver implements SSTableFlushObserver
+    {
+        private final Multimap<Pair<ByteBuffer, Long>, Cell> rows = ArrayListMultimap.create();
+        private Pair<ByteBuffer, Long> currentKey;
+        private boolean isComplete;
+
+        @Override
+        public void begin()
+        {}
+
+        @Override
+        public void startPartition(DecoratedKey key, long indexPosition)
+        {
+            currentKey = Pair.create(key.getKey(), indexPosition);
+        }
+
+        @Override
+        public void nextUnfilteredCluster(Unfiltered row)
+        {
+            if (row.isRow())
+                ((Row) row).forEach((c) -> rows.put(currentKey, (Cell) c));
+        }
+
+        @Override
+        public void complete()
+        {
+            isComplete = true;
+        }
+    }
+
+    private static Row buildRow(Collection<Cell> cells)
+    {
+        Row.Builder rowBuilder = BTreeRow.sortedBuilder();
+        rowBuilder.newRow(Clustering.EMPTY);
+        cells.forEach(rowBuilder::addCell);
+        return rowBuilder.build();
+    }
+
+    private static ColumnDefinition getColumn(CFMetaData cfm, String name)
+    {
+        return cfm.getColumnDefinition(UTF8Type.instance.fromString(name));
+    }
+}
diff --git a/test/unit/org/apache/cassandra/io/sstable/metadata/MetadataSerializerTest.java b/test/unit/org/apache/cassandra/io/sstable/metadata/MetadataSerializerTest.java
index 93365ef..a3382c4 100644
--- a/test/unit/org/apache/cassandra/io/sstable/metadata/MetadataSerializerTest.java
+++ b/test/unit/org/apache/cassandra/io/sstable/metadata/MetadataSerializerTest.java
@@ -32,7 +32,7 @@
 import org.apache.cassandra.config.CFMetaData;
 import org.apache.cassandra.db.SerializationHeader;
 import org.apache.cassandra.config.DatabaseDescriptor;
-import org.apache.cassandra.db.commitlog.ReplayPosition;
+import org.apache.cassandra.db.commitlog.CommitLogPosition;
 import org.apache.cassandra.dht.RandomPartitioner;
 import org.apache.cassandra.io.sstable.Component;
 import org.apache.cassandra.io.sstable.Descriptor;
@@ -80,8 +80,8 @@
 
     public Map<MetadataType, MetadataComponent> constructMetadata()
     {
-        ReplayPosition club = new ReplayPosition(11L, 12);
-        ReplayPosition cllb = new ReplayPosition(9L, 12);
+        CommitLogPosition club = new CommitLogPosition(11L, 12);
+        CommitLogPosition cllb = new CommitLogPosition(9L, 12);
 
         CFMetaData cfm = SchemaLoader.standardCFMD("ks1", "cf1");
         MetadataCollector collector = new MetadataCollector(cfm.comparator)
diff --git a/test/unit/org/apache/cassandra/io/util/BufferedRandomAccessFileTest.java b/test/unit/org/apache/cassandra/io/util/BufferedRandomAccessFileTest.java
index 360d262..8cdd4ea 100644
--- a/test/unit/org/apache/cassandra/io/util/BufferedRandomAccessFileTest.java
+++ b/test/unit/org/apache/cassandra/io/util/BufferedRandomAccessFileTest.java
@@ -18,6 +18,7 @@
  *
  */
 package org.apache.cassandra.io.util;
+import org.apache.cassandra.io.FSReadError;
 import org.apache.cassandra.utils.ByteBufferUtil;
 import org.apache.cassandra.utils.SyncUtil;
 
@@ -130,7 +131,7 @@
     public void testReadAndWriteOnCapacity() throws IOException
     {
         File tmpFile = File.createTempFile("readtest", "bin");
-        SequentialWriter w = SequentialWriter.open(tmpFile);
+        SequentialWriter w = new SequentialWriter(tmpFile);
 
         // Fully write the file and sync..
         byte[] in = generateByteArray(RandomAccessReader.DEFAULT_BUFFER_SIZE);
@@ -156,7 +157,7 @@
     public void testLength() throws IOException
     {
         File tmpFile = File.createTempFile("lengthtest", "bin");
-        SequentialWriter w = SequentialWriter.open(tmpFile);
+        SequentialWriter w = new SequentialWriter(tmpFile);
         assertEquals(0, w.length());
 
         // write a chunk smaller then our buffer, so will not be flushed
@@ -330,7 +331,7 @@
             {
                 File file1 = writeTemporaryFile(new byte[16]);
                 try (final ChannelProxy channel = new ChannelProxy(file1);
-                     final RandomAccessReader file = new RandomAccessReader.Builder(channel)
+                     final RandomAccessReader file = RandomAccessReader.builder(channel)
                                                      .bufferSize(bufferSize)
                                                      .build())
                 {
@@ -343,7 +344,7 @@
             {
                 File file1 = writeTemporaryFile(new byte[16]);
                 try (final ChannelProxy channel = new ChannelProxy(file1);
-                     final RandomAccessReader file = new RandomAccessReader.Builder(channel).bufferSize(bufferSize).build())
+                     final RandomAccessReader file = RandomAccessReader.builder(channel).bufferSize(bufferSize).build())
                 {
                     expectEOF(() -> {
                         while (true)
@@ -561,7 +562,7 @@
     public void testSetNegativeLength() throws IOException, IllegalArgumentException
     {
         File tmpFile = File.createTempFile("set_negative_length", "bin");
-        try (SequentialWriter file = SequentialWriter.open(tmpFile))
+        try (SequentialWriter file = new SequentialWriter(tmpFile))
         {
             file.truncate(-8L);
         }
@@ -572,7 +573,7 @@
         File tempFile = File.createTempFile(name, null);
         tempFile.deleteOnExit();
 
-        return SequentialWriter.open(tempFile);
+        return new SequentialWriter(tempFile);
     }
 
     private File writeTemporaryFile(byte[] data) throws IOException
diff --git a/test/unit/org/apache/cassandra/io/util/ChecksummedRandomAccessReaderTest.java b/test/unit/org/apache/cassandra/io/util/ChecksummedRandomAccessReaderTest.java
index 57428af..0657f7f 100644
--- a/test/unit/org/apache/cassandra/io/util/ChecksummedRandomAccessReaderTest.java
+++ b/test/unit/org/apache/cassandra/io/util/ChecksummedRandomAccessReaderTest.java
@@ -27,10 +27,6 @@
 import org.junit.Test;
 
 import static org.junit.Assert.*;
-import org.apache.cassandra.io.util.ChecksummedRandomAccessReader;
-import org.apache.cassandra.io.util.ChecksummedSequentialWriter;
-import org.apache.cassandra.io.util.RandomAccessReader;
-import org.apache.cassandra.io.util.SequentialWriter;
 
 public class ChecksummedRandomAccessReaderTest
 {
@@ -43,9 +39,11 @@
         final byte[] expected = new byte[70 * 1024];   // bit more than crc chunk size, so we can test rebuffering.
         ThreadLocalRandom.current().nextBytes(expected);
 
-        SequentialWriter writer = ChecksummedSequentialWriter.open(data, crc);
-        writer.write(expected);
-        writer.finish();
+        try (SequentialWriter writer = new ChecksummedSequentialWriter(data, crc, null, SequentialWriterOption.DEFAULT))
+        {
+            writer.write(expected);
+            writer.finish();
+        }
 
         assert data.exists();
 
@@ -69,9 +67,11 @@
         final byte[] dataBytes = new byte[70 * 1024];   // bit more than crc chunk size
         ThreadLocalRandom.current().nextBytes(dataBytes);
 
-        SequentialWriter writer = ChecksummedSequentialWriter.open(data, crc);
-        writer.write(dataBytes);
-        writer.finish();
+        try (SequentialWriter writer = new ChecksummedSequentialWriter(data, crc, null, SequentialWriterOption.DEFAULT))
+        {
+            writer.write(dataBytes);
+            writer.finish();
+        }
 
         assert data.exists();
 
@@ -101,9 +101,11 @@
         final byte[] expected = new byte[5 * 1024];
         Arrays.fill(expected, (byte) 0);
 
-        SequentialWriter writer = ChecksummedSequentialWriter.open(data, crc);
-        writer.write(expected);
-        writer.finish();
+        try (SequentialWriter writer = new ChecksummedSequentialWriter(data, crc, null, SequentialWriterOption.DEFAULT))
+        {
+            writer.write(expected);
+            writer.finish();
+        }
 
         assert data.exists();
 
diff --git a/test/unit/org/apache/cassandra/io/util/ChecksummedSequentialWriterTest.java b/test/unit/org/apache/cassandra/io/util/ChecksummedSequentialWriterTest.java
index bea3aac..65ffdba 100644
--- a/test/unit/org/apache/cassandra/io/util/ChecksummedSequentialWriterTest.java
+++ b/test/unit/org/apache/cassandra/io/util/ChecksummedSequentialWriterTest.java
@@ -59,7 +59,9 @@
 
         private TestableCSW(File file, File crcFile) throws IOException
         {
-            this(file, crcFile, new ChecksummedSequentialWriter(file, BUFFER_SIZE, crcFile));
+            this(file, crcFile, new ChecksummedSequentialWriter(file, crcFile, null, SequentialWriterOption.newBuilder()
+                                                                                                           .bufferSize(BUFFER_SIZE)
+                                                                                                           .build()));
         }
 
         private TestableCSW(File file, File crcFile, SequentialWriter sw) throws IOException
diff --git a/test/unit/org/apache/cassandra/io/util/DataOutputTest.java b/test/unit/org/apache/cassandra/io/util/DataOutputTest.java
index 1fb5597..e082b19 100644
--- a/test/unit/org/apache/cassandra/io/util/DataOutputTest.java
+++ b/test/unit/org/apache/cassandra/io/util/DataOutputTest.java
@@ -40,7 +40,6 @@
 import org.junit.Assert;
 import org.junit.Test;
 
-import org.apache.cassandra.io.compress.BufferType;
 import org.apache.cassandra.utils.ByteBufferUtil;
 
 public class DataOutputTest
@@ -380,8 +379,9 @@
     public void testSequentialWriter() throws IOException
     {
         File file = FileUtils.createTempFile("dataoutput", "test");
-        final SequentialWriter writer = new SequentialWriter(file, 32, BufferType.ON_HEAP);
-        DataOutputStreamPlus write = new WrappedDataOutputStreamPlus(writer.finishOnClose());
+        SequentialWriterOption option = SequentialWriterOption.newBuilder().bufferSize(32).finishOnClose(true).build();
+        final SequentialWriter writer = new SequentialWriter(file, option);
+        DataOutputStreamPlus write = new WrappedDataOutputStreamPlus(writer);
         DataInput canon = testWrite(write);
         write.flush();
         write.close();
diff --git a/test/unit/org/apache/cassandra/io/util/FileUtilsTest.java b/test/unit/org/apache/cassandra/io/util/FileUtilsTest.java
index 7110504..ee33107 100644
--- a/test/unit/org/apache/cassandra/io/util/FileUtilsTest.java
+++ b/test/unit/org/apache/cassandra/io/util/FileUtilsTest.java
@@ -20,11 +20,20 @@
 
 import java.io.File;
 import java.io.IOException;
+import java.io.RandomAccessFile;
 import java.nio.charset.Charset;
 import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Arrays;
 
 import org.junit.Test;
 
+import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.cql3.CQLTester;
+import org.apache.cassandra.schema.SchemaKeyspace;
+import org.apache.cassandra.utils.FBUtilities;
+
 import static org.junit.Assert.assertEquals;
 import static org.junit.Assert.assertTrue;
 
@@ -52,4 +61,47 @@
         assertEquals(0, b.length);
     }
 
+    @Test
+    public void testFolderSize() throws Exception
+    {
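+        // folderSize() should recurse into child directories and sum the lengths of all files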
+        File folder = createFolder(Paths.get(DatabaseDescriptor.getAllDataFileLocations()[0], "testFolderSize"));
+        folder.deleteOnExit();
+
+        File childFolder = createFolder(Paths.get(folder.getPath(), "child"));
+
+        File[] files = {
+                       createFile(new File(folder, "001"), 10000),
+                       createFile(new File(folder, "002"), 1000),
+                       createFile(new File(folder, "003"), 100),
+                       createFile(new File(childFolder, "001"), 1000),
+                       createFile(new File(childFolder, "002"), 2000),
+        };
+
+        assertEquals(0, FileUtils.folderSize(new File(folder, "i_dont_exist")));
+        assertEquals(files[0].length(), FileUtils.folderSize(files[0]));
+
+        long size = FileUtils.folderSize(folder);
+        assertEquals(Arrays.stream(files).mapToLong(f -> f.length()).sum(), size);
+    }
+
+    private File createFolder(Path path)
+    {
+        File folder = path.toFile();
+        FileUtils.createDirectory(folder);
+        return folder;
+    }
+
+    private File createFile(File file, long size)
+    {
+        // use try-with-resources so the file handle is closed once the length is set
+        try (RandomAccessFile f = new RandomAccessFile(file, "rw"))
+        {
+            f.setLength(size);
+        }
+        catch (Exception e)
+        {
+            System.err.println(e);
+        }
+        return file;
+    }
 }
diff --git a/test/unit/org/apache/cassandra/io/util/MmappedRegionsTest.java b/test/unit/org/apache/cassandra/io/util/MmappedRegionsTest.java
index 7cf7bd3..f34c00f 100644
--- a/test/unit/org/apache/cassandra/io/util/MmappedRegionsTest.java
+++ b/test/unit/org/apache/cassandra/io/util/MmappedRegionsTest.java
@@ -63,7 +63,7 @@
         File ret = File.createTempFile(fileName, "1");
         ret.deleteOnExit();
 
-        try (SequentialWriter writer = SequentialWriter.open(ret))
+        try (SequentialWriter writer = new SequentialWriter(ret))
         {
             writer.write(buffer);
             writer.finish();
@@ -99,8 +99,8 @@
             {
                 MmappedRegions.Region region = regions.floor(i);
                 assertNotNull(region);
-                assertEquals(0, region.bottom());
-                assertEquals(1024, region.top());
+                assertEquals(0, region.offset());
+                assertEquals(1024, region.end());
             }
 
             regions.extend(2048);
@@ -110,13 +110,13 @@
                 assertNotNull(region);
                 if (i < 1024)
                 {
-                    assertEquals(0, region.bottom());
-                    assertEquals(1024, region.top());
+                    assertEquals(0, region.offset());
+                    assertEquals(1024, region.end());
                 }
                 else
                 {
-                    assertEquals(1024, region.bottom());
-                    assertEquals(2048, region.top());
+                    assertEquals(1024, region.offset());
+                    assertEquals(2048, region.end());
                 }
             }
         }
@@ -141,8 +141,8 @@
             {
                 MmappedRegions.Region region = regions.floor(i);
                 assertNotNull(region);
-                assertEquals(SIZE * (i / SIZE), region.bottom());
-                assertEquals(SIZE + (SIZE * (i / SIZE)), region.top());
+                assertEquals(SIZE * (i / SIZE), region.offset());
+                assertEquals(SIZE + (SIZE * (i / SIZE)), region.end());
             }
         }
         finally
@@ -169,8 +169,8 @@
             {
                 MmappedRegions.Region region = regions.floor(i);
                 assertNotNull(region);
-                assertEquals(SIZE * (i / SIZE), region.bottom());
-                assertEquals(SIZE + (SIZE * (i / SIZE)), region.top());
+                assertEquals(SIZE * (i / SIZE), region.offset());
+                assertEquals(SIZE + (SIZE * (i / SIZE)), region.end());
             }
         }
         finally
@@ -209,8 +209,8 @@
         {
             MmappedRegions.Region region = snapshot.floor(i);
             assertNotNull(region);
-            assertEquals(SIZE * (i / SIZE), region.bottom());
-            assertEquals(SIZE + (SIZE * (i / SIZE)), region.top());
+            assertEquals(SIZE * (i / SIZE), region.offset());
+            assertEquals(SIZE + (SIZE * (i / SIZE)), region.end());
 
             // check we can access the buffer
             assertNotNull(region.buffer.duplicate().getInt());
@@ -267,8 +267,8 @@
             {
                 MmappedRegions.Region region = regions.floor(i);
                 assertNotNull(region);
-                assertEquals(0, region.bottom());
-                assertEquals(4096, region.top());
+                assertEquals(0, region.offset());
+                assertEquals(4096, region.end());
             }
         }
     }
@@ -298,10 +298,9 @@
         cf.deleteOnExit();
 
         MetadataCollector sstableMetadataCollector = new MetadataCollector(new ClusteringComparator(BytesType.instance));
-        try(SequentialWriter writer = new CompressedSequentialWriter(f,
-                                                                     cf.getAbsolutePath(),
-                                                                     CompressionParams.snappy(),
-                                                                     sstableMetadataCollector))
+        try(SequentialWriter writer = new CompressedSequentialWriter(f, cf.getAbsolutePath(),
+                                                                     null, SequentialWriterOption.DEFAULT,
+                                                                     CompressionParams.snappy(), sstableMetadataCollector))
         {
             writer.write(buffer);
             writer.finish();
@@ -325,8 +324,8 @@
                 assertNotNull(compressedChunk);
                 assertEquals(chunk.length + 4, compressedChunk.capacity());
 
-                assertEquals(chunk.offset, region.bottom());
-                assertEquals(chunk.offset + chunk.length + 4, region.top());
+                assertEquals(chunk.offset, region.offset());
+                assertEquals(chunk.offset + chunk.length + 4, region.end());
 
                 i += metadata.chunkLength();
             }
diff --git a/test/unit/org/apache/cassandra/io/util/RandomAccessReaderTest.java b/test/unit/org/apache/cassandra/io/util/RandomAccessReaderTest.java
index aad5117..32ce554 100644
--- a/test/unit/org/apache/cassandra/io/util/RandomAccessReaderTest.java
+++ b/test/unit/org/apache/cassandra/io/util/RandomAccessReaderTest.java
@@ -87,12 +87,6 @@
             this.maxSegmentSize = maxSegmentSize;
             return this;
         }
-
-        public Parameters expected(byte[] expected)
-        {
-            this.expected = expected;
-            return this;
-        }
     }
 
     @Test
@@ -128,6 +122,7 @@
     @Test
     public void testMultipleSegments() throws IOException
     {
+        // FIXME: This is the same as above.
         testReadFully(new Parameters(8192, 4096).mmappedRegions(true).maxSegmentSize(1024));
     }
 
@@ -139,7 +134,7 @@
 
         try(ChannelProxy channel = new ChannelProxy("abc", new FakeFileChannel(SIZE)))
         {
-            RandomAccessReader.Builder builder = new RandomAccessReader.Builder(channel)
+            RandomAccessReader.Builder builder = RandomAccessReader.builder(channel)
                                                  .bufferType(params.bufferType)
                                                  .bufferSize(params.bufferSize);
 
@@ -268,7 +263,7 @@
         final File f = File.createTempFile("testReadFully", "1");
         f.deleteOnExit();
 
-        try(SequentialWriter writer = SequentialWriter.open(f))
+        try(SequentialWriter writer = new SequentialWriter(f))
         {
             long numWritten = 0;
             while (numWritten < params.fileLength)
@@ -290,11 +285,15 @@
         final File f = writeFile(params);
         try(ChannelProxy channel = new ChannelProxy(f))
         {
-            RandomAccessReader.Builder builder = new RandomAccessReader.Builder(channel)
+            RandomAccessReader.Builder builder = RandomAccessReader.builder(channel)
                                                  .bufferType(params.bufferType)
                                                  .bufferSize(params.bufferSize);
+            MmappedRegions regions = null;
             if (params.mmappedRegions)
-                builder.regions(MmappedRegions.map(channel, f.length()));
+            {
+                regions = MmappedRegions.map(channel, f.length());
+                builder.regions(regions);
+            }
 
             try(RandomAccessReader reader = builder.build())
             {
@@ -316,8 +315,8 @@
                 assertEquals(0, reader.bytesRemaining());
             }
 
-            if (builder.regions != null)
-                assertNull(builder.regions.close(null));
+            if (regions != null)
+                assertNull(regions.close(null));
         }
     }
 
@@ -327,7 +326,7 @@
         File f = File.createTempFile("testReadBytes", "1");
         final String expected = "The quick brown fox jumps over the lazy dog";
 
-        try(SequentialWriter writer = SequentialWriter.open(f))
+        try(SequentialWriter writer = new SequentialWriter(f))
         {
             writer.write(expected.getBytes());
             writer.finish();
@@ -336,7 +335,7 @@
         assert f.exists();
 
         try(ChannelProxy channel = new ChannelProxy(f);
-            RandomAccessReader reader = new RandomAccessReader.Builder(channel).build())
+            RandomAccessReader reader = RandomAccessReader.builder(channel).build())
         {
             assertEquals(f.getAbsolutePath(), reader.getPath());
             assertEquals(expected.length(), reader.length());
@@ -356,7 +355,7 @@
         final String expected = "The quick brown fox jumps over the lazy dog";
         final int numIterations = 10;
 
-        try(SequentialWriter writer = SequentialWriter.open(f))
+        try(SequentialWriter writer = new SequentialWriter(f))
         {
             for (int i = 0; i < numIterations; i++)
                 writer.write(expected.getBytes());
@@ -366,7 +365,7 @@
         assert f.exists();
 
         try(ChannelProxy channel = new ChannelProxy(f);
-        RandomAccessReader reader = new RandomAccessReader.Builder(channel).build())
+        RandomAccessReader reader = RandomAccessReader.builder(channel).build())
         {
             assertEquals(expected.length() * numIterations, reader.length());
 
@@ -436,7 +435,7 @@
         Random r = new Random(seed);
         r.nextBytes(expected);
 
-        try(SequentialWriter writer = SequentialWriter.open(f))
+        try(SequentialWriter writer = new SequentialWriter(f))
         {
             writer.write(expected);
             writer.finish();
@@ -448,7 +447,7 @@
         {
             final Runnable worker = () ->
             {
-                try(RandomAccessReader reader = new RandomAccessReader.Builder(channel).build())
+                try(RandomAccessReader reader = RandomAccessReader.builder(channel).build())
                 {
                     assertEquals(expected.length, reader.length());
 
diff --git a/test/unit/org/apache/cassandra/io/util/SequentialWriterTest.java b/test/unit/org/apache/cassandra/io/util/SequentialWriterTest.java
index f5a366e..2797384 100644
--- a/test/unit/org/apache/cassandra/io/util/SequentialWriterTest.java
+++ b/test/unit/org/apache/cassandra/io/util/SequentialWriterTest.java
@@ -36,6 +36,7 @@
 import org.apache.cassandra.utils.concurrent.AbstractTransactionalTest;
 
 import static org.apache.commons.io.FileUtils.*;
+import static org.junit.Assert.assertEquals;
 
 public class SequentialWriterTest extends AbstractTransactionalTest
 {
@@ -71,7 +72,10 @@
 
         protected TestableSW(File file) throws IOException
         {
-            this(file, new SequentialWriter(file, 8 << 10, BufferType.OFF_HEAP));
+            this(file, new SequentialWriter(file, SequentialWriterOption.newBuilder()
+                                                                        .bufferSize(8 << 10)
+                                                                        .bufferType(BufferType.OFF_HEAP)
+                                                                        .build()));
         }
 
         protected TestableSW(File file, SequentialWriter sw) throws IOException
@@ -118,6 +122,47 @@
         }
     }
 
+    @Test
+    public void resetAndTruncateTest()
+    {
+        File tempFile = new File(Files.createTempDir(), "reset.txt");
+        final int bufferSize = 48;
+        final int writeSize = 64;
+        byte[] toWrite = new byte[writeSize];
+        SequentialWriterOption option = SequentialWriterOption.newBuilder().bufferSize(bufferSize).build();
+        try (SequentialWriter writer = new SequentialWriter(tempFile, option))
+        {
+            // write more bytes than the buffer can hold
+            writer.write(toWrite);
+            assertEquals(bufferSize, writer.getLastFlushOffset());
+            assertEquals(writeSize, writer.position());
+            // mark this position
+            DataPosition pos = writer.mark();
+            // write another
+            writer.write(toWrite);
+            // another buffer should be flushed
+            assertEquals(bufferSize * 2, writer.getLastFlushOffset());
+            assertEquals(writeSize * 2, writer.position());
+            // reset writer
+            writer.resetAndTruncate(pos);
+            // current position and flushed size should be changed
+            assertEquals(writeSize, writer.position());
+            assertEquals(writeSize, writer.getLastFlushOffset());
+            // write one more byte, less than the buffer size
+            writer.write(new byte[]{0});
+            assertEquals(writeSize + 1, writer.position());
+            // flush offset should not increase
+            assertEquals(writeSize, writer.getLastFlushOffset());
+            writer.finish();
+        }
+        catch (IOException e)
+        {
+            Assert.fail();
+        }
+        // final file size check
+        assertEquals(writeSize + 1, tempFile.length());
+    }
+
     /**
      * Tests that the output stream exposed by SequentialWriter behaves as expected
      */
@@ -127,7 +172,8 @@
         File tempFile = new File(Files.createTempDir(), "test.txt");
         Assert.assertFalse("temp file shouldn't exist yet", tempFile.exists());
 
-        try (DataOutputStream os = new DataOutputStream(SequentialWriter.open(tempFile).finishOnClose()))
+        SequentialWriterOption option = SequentialWriterOption.newBuilder().finishOnClose(true).build();
+        try (DataOutputStream os = new DataOutputStream(new SequentialWriter(tempFile, option)))
         {
             os.writeUTF("123");
         }
diff --git a/test/unit/org/apache/cassandra/locator/NetworkTopologyStrategyTest.java b/test/unit/org/apache/cassandra/locator/NetworkTopologyStrategyTest.java
index bbfdd3b..3cba328 100644
--- a/test/unit/org/apache/cassandra/locator/NetworkTopologyStrategyTest.java
+++ b/test/unit/org/apache/cassandra/locator/NetworkTopologyStrategyTest.java
@@ -21,24 +21,26 @@
 import java.io.IOException;
 import java.net.InetAddress;
 import java.net.UnknownHostException;
-import java.util.ArrayList;
-import java.util.HashMap;
-import java.util.HashSet;
-import java.util.List;
-import java.util.Map;
-import java.util.Set;
+import java.util.*;
+import java.util.stream.Collectors;
 
 import com.google.common.collect.HashMultimap;
+import com.google.common.collect.ImmutableMap;
 import com.google.common.collect.Multimap;
+
 import org.junit.Assert;
 import org.junit.Test;
+
 import org.slf4j.Logger;
 import org.slf4j.LoggerFactory;
 
 import org.apache.cassandra.config.DatabaseDescriptor;
+import org.apache.cassandra.dht.Murmur3Partitioner;
 import org.apache.cassandra.dht.OrderPreservingPartitioner.StringToken;
 import org.apache.cassandra.dht.Token;
 import org.apache.cassandra.exceptions.ConfigurationException;
+import org.apache.cassandra.locator.TokenMetadata.Topology;
+import org.apache.cassandra.service.StorageService;
 
 public class NetworkTopologyStrategyTest
 {
@@ -166,4 +168,203 @@
         InetAddress add1 = InetAddress.getByAddress(bytes);
         metadata.updateNormalToken(token1, add1);
     }
+
+    @Test
+    public void testCalculateEndpoints() throws UnknownHostException
+    {
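+        // build random multi-DC rings and verify NetworkTopologyStrategy's endpoint calculation matches the legacy algorithm copied below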
+        final int NODES = 100;
+        final int VNODES = 64;
+        final int RUNS = 10;
+        StorageService.instance.setPartitionerUnsafe(Murmur3Partitioner.instance);
+        Map<String, Integer> datacenters = ImmutableMap.of("rf1", 1, "rf3", 3, "rf5_1", 5, "rf5_2", 5, "rf5_3", 5);
+        List<InetAddress> nodes = new ArrayList<>(NODES);
+        for (byte i=0; i<NODES; ++i)
+            nodes.add(InetAddress.getByAddress(new byte[]{127, 0, 0, i}));
+        for (int run=0; run<RUNS; ++run)
+        {
+            Random rand = new Random();
+            IEndpointSnitch snitch = generateSnitch(datacenters, nodes, rand);
+            DatabaseDescriptor.setEndpointSnitch(snitch);
+
+            TokenMetadata meta = new TokenMetadata();
+            for (int i=0; i<NODES; ++i)  // Nodes
+                for (int j=0; j<VNODES; ++j) // tokens/vnodes per node
+                    meta.updateNormalToken(Murmur3Partitioner.instance.getRandomToken(rand), nodes.get(i));
+            testEquivalence(meta, snitch, datacenters, rand);
+        }
+    }
+
+    void testEquivalence(TokenMetadata tokenMetadata, IEndpointSnitch snitch, Map<String, Integer> datacenters, Random rand)
+    {
+        NetworkTopologyStrategy nts = new NetworkTopologyStrategy("ks", tokenMetadata, snitch,
+                                                                  datacenters.entrySet().stream().
+                                                                      collect(Collectors.toMap(x -> x.getKey(), x -> Integer.toString(x.getValue()))));
+        for (int i=0; i<1000; ++i)
+        {
+            Token token = Murmur3Partitioner.instance.getRandomToken(rand);
+            List<InetAddress> expected = calculateNaturalEndpoints(token, tokenMetadata, datacenters, snitch);
+            List<InetAddress> actual = nts.calculateNaturalEndpoints(token, tokenMetadata);
+            if (endpointsDiffer(expected, actual))
+            {
+                System.err.println("Endpoints mismatch for token " + token);
+                System.err.println(" expected: " + expected);
+                System.err.println(" actual  : " + actual);
+                Assert.assertEquals("Endpoints for token " + token + " mismatch.", expected, actual);
+            }
+        }
+    }
+
+    private boolean endpointsDiffer(List<InetAddress> ep1, List<InetAddress> ep2)
+    {
+        // Because the old algorithm does not put the nodes in the correct order when more replicas
+        // are required than there are racks in a DC, we accept a different order as long as the primary
+        // replica is the same.
+        if (ep1.equals(ep2))
+            return false;
+        if (!ep1.get(0).equals(ep2.get(0)))
+            return true;
+        Set<InetAddress> s1 = new HashSet<>(ep1);
+        Set<InetAddress> s2 = new HashSet<>(ep2);
+        return !s1.equals(s2);
+    }
+
+    IEndpointSnitch generateSnitch(Map<String, Integer> datacenters, Collection<InetAddress> nodes, Random rand)
+    {
+        final Map<InetAddress, String> nodeToRack = new HashMap<>();
+        final Map<InetAddress, String> nodeToDC = new HashMap<>();
+        Map<String, List<String>> racksPerDC = new HashMap<>();
+        datacenters.forEach((dc, rf) -> racksPerDC.put(dc, randomRacks(rf, rand)));
+        int rf = datacenters.values().stream().mapToInt(x -> x).sum();
+        String[] dcs = new String[rf];
+        int pos = 0;
+        for (Map.Entry<String, Integer> dce : datacenters.entrySet())
+        {
+            for (int i = 0; i < dce.getValue(); ++i)
+                dcs[pos++] = dce.getKey();
+        }
+
+        for (InetAddress node : nodes)
+        {
+            String dc = dcs[rand.nextInt(rf)];
+            List<String> racks = racksPerDC.get(dc);
+            String rack = racks.get(rand.nextInt(racks.size()));
+            nodeToRack.put(node, rack);
+            nodeToDC.put(node, dc);
+        }
+
+        return new AbstractNetworkTopologySnitch()
+        {
+            public String getRack(InetAddress endpoint)
+            {
+                return nodeToRack.get(endpoint);
+            }
+
+            public String getDatacenter(InetAddress endpoint)
+            {
+                return nodeToDC.get(endpoint);
+            }
+        };
+    }
+
+    private List<String> randomRacks(int rf, Random rand)
+    {
+        int rc = rand.nextInt(rf * 3 - 1) + 1;
+        List<String> racks = new ArrayList<>(rc);
+        for (int i=0; i<rc; ++i)
+            racks.add(Integer.toString(i));
+        return racks;
+    }
+
+    // Copy of older endpoints calculation algorithm for comparison
+    public static List<InetAddress> calculateNaturalEndpoints(Token searchToken, TokenMetadata tokenMetadata, Map<String, Integer> datacenters, IEndpointSnitch snitch)
+    {
+        // we want to preserve insertion order so that the first added endpoint becomes primary
+        Set<InetAddress> replicas = new LinkedHashSet<>();
+        // replicas we have found in each DC
+        Map<String, Set<InetAddress>> dcReplicas = new HashMap<>(datacenters.size());
+        for (Map.Entry<String, Integer> dc : datacenters.entrySet())
+            dcReplicas.put(dc.getKey(), new HashSet<InetAddress>(dc.getValue()));
+
+        Topology topology = tokenMetadata.getTopology();
+        // all endpoints in each DC, so we can check when we have exhausted all the members of a DC
+        Multimap<String, InetAddress> allEndpoints = topology.getDatacenterEndpoints();
+        // all racks in a DC so we can check when we have exhausted all racks in a DC
+        Map<String, Multimap<String, InetAddress>> racks = topology.getDatacenterRacks();
+        assert !allEndpoints.isEmpty() && !racks.isEmpty() : "not aware of any cluster members";
+
+        // tracks the racks we have already placed replicas in
+        Map<String, Set<String>> seenRacks = new HashMap<>(datacenters.size());
+        for (Map.Entry<String, Integer> dc : datacenters.entrySet())
+            seenRacks.put(dc.getKey(), new HashSet<String>());
+
+        // tracks the endpoints that we skipped over while looking for unique racks
+        // when we relax the rack uniqueness we can append this to the current result so we don't have to wind back the iterator
+        Map<String, Set<InetAddress>> skippedDcEndpoints = new HashMap<>(datacenters.size());
+        for (Map.Entry<String, Integer> dc : datacenters.entrySet())
+            skippedDcEndpoints.put(dc.getKey(), new LinkedHashSet<InetAddress>());
+
+        Iterator<Token> tokenIter = TokenMetadata.ringIterator(tokenMetadata.sortedTokens(), searchToken, false);
+        while (tokenIter.hasNext() && !hasSufficientReplicas(dcReplicas, allEndpoints, datacenters))
+        {
+            Token next = tokenIter.next();
+            InetAddress ep = tokenMetadata.getEndpoint(next);
+            String dc = snitch.getDatacenter(ep);
+            // have we already found all replicas for this dc?
+            if (!datacenters.containsKey(dc) || hasSufficientReplicas(dc, dcReplicas, allEndpoints, datacenters))
+                continue;
+            // can we skip checking the rack?
+            if (seenRacks.get(dc).size() == racks.get(dc).keySet().size())
+            {
+                dcReplicas.get(dc).add(ep);
+                replicas.add(ep);
+            }
+            else
+            {
+                String rack = snitch.getRack(ep);
+                // is this a new rack?
+                if (seenRacks.get(dc).contains(rack))
+                {
+                    skippedDcEndpoints.get(dc).add(ep);
+                }
+                else
+                {
+                    dcReplicas.get(dc).add(ep);
+                    replicas.add(ep);
+                    seenRacks.get(dc).add(rack);
+                    // if we've run out of distinct racks, add the hosts we skipped past already (up to RF)
+                    if (seenRacks.get(dc).size() == racks.get(dc).keySet().size())
+                    {
+                        Iterator<InetAddress> skippedIt = skippedDcEndpoints.get(dc).iterator();
+                        while (skippedIt.hasNext() && !hasSufficientReplicas(dc, dcReplicas, allEndpoints, datacenters))
+                        {
+                            InetAddress nextSkipped = skippedIt.next();
+                            dcReplicas.get(dc).add(nextSkipped);
+                            replicas.add(nextSkipped);
+                        }
+                    }
+                }
+            }
+        }
+
+        return new ArrayList<InetAddress>(replicas);
+    }
+
+    private static boolean hasSufficientReplicas(String dc, Map<String, Set<InetAddress>> dcReplicas, Multimap<String, InetAddress> allEndpoints, Map<String, Integer> datacenters)
+    {
+        return dcReplicas.get(dc).size() >= Math.min(allEndpoints.get(dc).size(), getReplicationFactor(dc, datacenters));
+    }
+
+    private static boolean hasSufficientReplicas(Map<String, Set<InetAddress>> dcReplicas, Multimap<String, InetAddress> allEndpoints, Map<String, Integer> datacenters)
+    {
+        for (String dc : datacenters.keySet())
+            if (!hasSufficientReplicas(dc, dcReplicas, allEndpoints, datacenters))
+                return false;
+        return true;
+    }
+
+    public static int getReplicationFactor(String dc, Map<String, Integer> datacenters)
+    {
+        Integer replicas = datacenters.get(dc);
+        return replicas == null ? 0 : replicas;
+    }
 }
diff --git a/test/unit/org/apache/cassandra/metrics/CassandraMetricsRegistryTest.java b/test/unit/org/apache/cassandra/metrics/CassandraMetricsRegistryTest.java
new file mode 100644
index 0000000..e18e005
--- /dev/null
+++ b/test/unit/org/apache/cassandra/metrics/CassandraMetricsRegistryTest.java
@@ -0,0 +1,54 @@
+/*
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+package org.apache.cassandra.metrics;
+
+import org.junit.Test;
+import org.apache.cassandra.metrics.CassandraMetricsRegistry.MetricName;
+import static org.junit.Assert.*;
+
+
+public class CassandraMetricsRegistryTest
+{
+    // A class with a name ending in '$'
+    private static class StrangeName$
+    {
+    }
+
+    @Test
+    public void testChooseType()
+    {
+        assertEquals("StrangeName", MetricName.chooseType(null, StrangeName$.class));
+        assertEquals("StrangeName", MetricName.chooseType("", StrangeName$.class));
+        assertEquals("String", MetricName.chooseType(null, String.class));
+        assertEquals("String", MetricName.chooseType("", String.class));
+        
+        assertEquals("a", MetricName.chooseType("a", StrangeName$.class));
+        assertEquals("b", MetricName.chooseType("b", String.class));
+    }
+    
+    @Test
+    public void testMetricName()
+    {
+         MetricName name = new MetricName(StrangeName$.class, "NaMe", "ScOpE");
+         assertEquals("StrangeName", name.getType());
+    }
+    
+}
diff --git a/test/unit/org/apache/cassandra/net/MessagingServiceTest.java b/test/unit/org/apache/cassandra/net/MessagingServiceTest.java
index 8631f03..ef51f30 100644
--- a/test/unit/org/apache/cassandra/net/MessagingServiceTest.java
+++ b/test/unit/org/apache/cassandra/net/MessagingServiceTest.java
@@ -20,38 +20,98 @@
  */
 package org.apache.cassandra.net;
 
+import java.io.ByteArrayInputStream;
+import java.io.ByteArrayOutputStream;
+import java.io.IOException;
+import java.net.InetAddress;
+import java.util.Arrays;
 import java.util.List;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.TimeUnit;
 
+import com.codahale.metrics.Timer;
+
+import org.apache.cassandra.io.util.DataInputPlus.DataInputStreamPlus;
+import org.apache.cassandra.io.util.DataOutputStreamPlus;
+import org.apache.cassandra.io.util.WrappedDataOutputStreamPlus;
+import org.caffinitas.ohc.histo.EstimatedHistogram;
 import org.junit.Test;
 
-import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.*;
 
 public class MessagingServiceTest
 {
     private final MessagingService messagingService = MessagingService.test();
+    private final static long[] bucketOffsets = new EstimatedHistogram(160).getBucketOffsets();
 
     @Test
     public void testDroppedMessages()
     {
         MessagingService.Verb verb = MessagingService.Verb.READ;
 
-        for (int i = 0; i < 5000; i++)
-            messagingService.incrementDroppedMessages(verb, i % 2 == 0);
+        for (int i = 1; i <= 5000; i++)
+            messagingService.incrementDroppedMessages(verb, i, i % 2 == 0);
 
         List<String> logs = messagingService.getDroppedMessagesLogs();
         assertEquals(1, logs.size());
-        assertEquals("READ messages were dropped in last 5000 ms: 2500 for internal timeout and 2500 for cross node timeout", logs.get(0));
+        assertEquals("READ messages were dropped in last 5000 ms: 2500 for internal timeout and 2500 for cross node timeout. Mean internal dropped latency: 2730 ms and Mean cross-node dropped latency: 2731 ms", logs.get(0));
         assertEquals(5000, (int)messagingService.getDroppedMessages().get(verb.toString()));
 
         logs = messagingService.getDroppedMessagesLogs();
         assertEquals(0, logs.size());
 
         for (int i = 0; i < 2500; i++)
-            messagingService.incrementDroppedMessages(verb, i % 2 == 0);
+            messagingService.incrementDroppedMessages(verb, i, i % 2 == 0);
 
         logs = messagingService.getDroppedMessagesLogs();
-        assertEquals("READ messages were dropped in last 5000 ms: 1250 for internal timeout and 1250 for cross node timeout", logs.get(0));
+        assertEquals("READ messages were dropped in last 5000 ms: 1250 for internal timeout and 1250 for cross node timeout. Mean internal dropped latency: 2277 ms and Mean cross-node dropped latency: 2278 ms", logs.get(0));
         assertEquals(7500, (int)messagingService.getDroppedMessages().get(verb.toString()));
     }
 
+    private static void addDCLatency(long sentAt, long now) throws IOException
+    {
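+        // Serialize a fake construction timestamp the way a remote peer would, then let MessageIn.readTimestamp process it;
+        // as a side effect this should record the elapsed time in the per-DC latency Timer.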
+        ByteArrayOutputStream baos = new ByteArrayOutputStream();
+        try (DataOutputStreamPlus out = new WrappedDataOutputStreamPlus(baos))
+        {
+            out.writeInt((int) sentAt);
+        }
+        DataInputStreamPlus in = new DataInputStreamPlus(new ByteArrayInputStream(baos.toByteArray()));
+        MessageIn.readTimestamp(InetAddress.getLocalHost(), in, now);
+    }
+
+    @Test
+    public void testDCLatency() throws Exception
+    {
+        int latency = 100;
+
+        ConcurrentHashMap<String, Timer> dcLatency = MessagingService.instance().metrics.dcLatency;
+        dcLatency.clear();
+
+        long now = System.currentTimeMillis();
+        long sentAt = now - latency;
+
+        assertNull(dcLatency.get("datacenter1"));
+        addDCLatency(sentAt, now);
+        assertNotNull(dcLatency.get("datacenter1"));
+        assertEquals(1, dcLatency.get("datacenter1").getCount());
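+        // Recorded values are rounded to histogram bucket boundaries, so the expected max is the boundary the 100 ms latency maps to
+        // (binarySearch returns a negative insertion point when the value is not an exact boundary).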
+        long expectedBucket = bucketOffsets[Math.abs(Arrays.binarySearch(bucketOffsets, TimeUnit.MILLISECONDS.toNanos(latency))) - 1];
+        assertEquals(expectedBucket, dcLatency.get("datacenter1").getSnapshot().getMax());
+    }
+
+    @Test
+    public void testNegativeDCLatency() throws Exception
+    {
+        // if clocks are off should just not track anything
+        int latency = -100;
+
+        ConcurrentHashMap<String, Timer> dcLatency = MessagingService.instance().metrics.dcLatency;
+        dcLatency.clear();
+
+        long now = System.currentTimeMillis();
+        long sentAt = now - latency;
+
+        assertNull(dcLatency.get("datacenter1"));
+        addDCLatency(sentAt, now);
+        assertNull(dcLatency.get("datacenter1"));
+    }
 }
diff --git a/test/unit/org/apache/cassandra/schema/IndexMetadataTest.java b/test/unit/org/apache/cassandra/schema/IndexMetadataTest.java
new file mode 100644
index 0000000..785ed73
--- /dev/null
+++ b/test/unit/org/apache/cassandra/schema/IndexMetadataTest.java
@@ -0,0 +1,56 @@
+/*
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+package org.apache.cassandra.schema;
+
+import org.junit.Assert;
+import org.junit.Test;
+
+import static org.junit.Assert.assertFalse;
+import static org.junit.Assert.assertTrue;
+
+public class IndexMetadataTest {
+    
+    @Test
+    public void testIsNameValidPositive()
+    {
+        assertTrue(IndexMetadata.isNameValid("abcdefghijklmnopqrstuvwxyz"));
+        assertTrue(IndexMetadata.isNameValid("ABCDEFGHIJKLMNOPQRSTUVWXYZ"));
+        assertTrue(IndexMetadata.isNameValid("_01234567890"));
+    }
+    
+    @Test
+    public void testIsNameValidNegative()
+    {
+        assertFalse(IndexMetadata.isNameValid(null));
+        assertFalse(IndexMetadata.isNameValid(""));
+        assertFalse(IndexMetadata.isNameValid(" "));
+        assertFalse(IndexMetadata.isNameValid("@"));
+        assertFalse(IndexMetadata.isNameValid("!"));
+    }
+    
+    @Test
+    public void testGetDefaultIndexName()
+    {
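+        // The default name strips everything but letters, digits and underscores from the table (and optional column) name,
+        // joins the parts with '_' and appends "_idx".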
+        Assert.assertEquals("aB4__idx", IndexMetadata.getDefaultIndexName("a B-4@!_+", null));
+        Assert.assertEquals("34_Ddd_F6_idx", IndexMetadata.getDefaultIndexName("34_()Ddd", "#F%6*"));
+        
+    }
+}
diff --git a/test/unit/org/apache/cassandra/schema/LegacySchemaMigratorTest.java b/test/unit/org/apache/cassandra/schema/LegacySchemaMigratorTest.java
index feb2778..2de671c 100644
--- a/test/unit/org/apache/cassandra/schema/LegacySchemaMigratorTest.java
+++ b/test/unit/org/apache/cassandra/schema/LegacySchemaMigratorTest.java
@@ -31,10 +31,11 @@
 import org.apache.cassandra.config.Schema;
 import org.apache.cassandra.cql3.CQLTester;
 import org.apache.cassandra.cql3.ColumnIdentifier;
+import org.apache.cassandra.cql3.FieldIdentifier;
 import org.apache.cassandra.cql3.functions.*;
 import org.apache.cassandra.db.*;
 import org.apache.cassandra.db.marshal.*;
-import org.apache.cassandra.index.internal.CassandraIndex;
+import org.apache.cassandra.index.TargetParser;
 import org.apache.cassandra.thrift.ThriftConversion;
 
 import static java.lang.String.format;
@@ -87,7 +88,7 @@
         }
 
         // make sure that we've read *exactly* the same set of keyspaces/tables/types/functions
-        assertEquals(expected, actual);
+        assertEquals(expected.diff(actual).toString(), expected, actual);
 
         // check that the build status of all indexes has been updated to use the new
         // format of index name: the index_name column of system.IndexInfo used to
@@ -95,6 +96,11 @@
         expected.forEach(LegacySchemaMigratorTest::verifyIndexBuildStatus);
     }
 
+    private static FieldIdentifier field(String field)
+    {
+        return FieldIdentifier.forQuoted(field);
+    }
+
     private static void loadLegacySchemaTables()
     {
         KeyspaceMetadata systemKeyspace = Schema.instance.getKSMetaData(SystemKeyspace.NAME);
@@ -310,18 +316,21 @@
 
         UserType udt1 = new UserType(keyspace,
                                      bytes("udt1"),
-                                     new ArrayList<ByteBuffer>() {{ add(bytes("col1")); add(bytes("col2")); }},
-                                     new ArrayList<AbstractType<?>>() {{ add(UTF8Type.instance); add(Int32Type.instance); }});
+                                     new ArrayList<FieldIdentifier>() {{ add(field("col1")); add(field("col2")); }},
+                                     new ArrayList<AbstractType<?>>() {{ add(UTF8Type.instance); add(Int32Type.instance); }},
+                                     true);
 
         UserType udt2 = new UserType(keyspace,
                                      bytes("udt2"),
-                                     new ArrayList<ByteBuffer>() {{ add(bytes("col3")); add(bytes("col4")); }},
-                                     new ArrayList<AbstractType<?>>() {{ add(BytesType.instance); add(BooleanType.instance); }});
+                                     new ArrayList<FieldIdentifier>() {{ add(field("col3")); add(field("col4")); }},
+                                     new ArrayList<AbstractType<?>>() {{ add(BytesType.instance); add(BooleanType.instance); }},
+                                     true);
 
         UserType udt3 = new UserType(keyspace,
                                      bytes("udt3"),
-                                     new ArrayList<ByteBuffer>() {{ add(bytes("col5")); }},
-                                     new ArrayList<AbstractType<?>>() {{ add(AsciiType.instance); }});
+                                     new ArrayList<FieldIdentifier>() {{ add(field("col5")); }},
+                                     new ArrayList<AbstractType<?>>() {{ add(AsciiType.instance); }},
+                                     true);
 
         return KeyspaceMetadata.create(keyspace,
                                        KeyspaceParams.simple(1),
@@ -430,13 +439,15 @@
 
         UserType udt1 = new UserType(keyspace,
                                      bytes("udt1"),
-                                     new ArrayList<ByteBuffer>() {{ add(bytes("col1")); add(bytes("col2")); }},
-                                     new ArrayList<AbstractType<?>>() {{ add(UTF8Type.instance); add(Int32Type.instance); }});
+                                     new ArrayList<FieldIdentifier>() {{ add(field("col1")); add(field("col2")); }},
+                                     new ArrayList<AbstractType<?>>() {{ add(UTF8Type.instance); add(Int32Type.instance); }},
+                                     true);
 
         UserType udt2 = new UserType(keyspace,
                                      bytes("udt2"),
-                                     new ArrayList<ByteBuffer>() {{ add(bytes("col1")); add(bytes("col2")); }},
-                                     new ArrayList<AbstractType<?>>() {{ add(ListType.getInstance(udt1, false)); add(Int32Type.instance); }});
+                                     new ArrayList<FieldIdentifier>() {{ add(field("col1")); add(field("col2")); }},
+                                     new ArrayList<AbstractType<?>>() {{ add(ListType.getInstance(udt1, false)); add(Int32Type.instance); }},
+                                     true);
 
         UDFunction udf1 = UDFunction.create(new FunctionName(keyspace, "udf"),
                                             ImmutableList.of(new ColumnIdentifier("col1", false), new ColumnIdentifier("col2", false)),
@@ -477,13 +488,15 @@
 
         UserType udt1 = new UserType(keyspace,
                                      bytes("udt1"),
-                                     new ArrayList<ByteBuffer>() {{ add(bytes("col1")); add(bytes("col2")); }},
-                                     new ArrayList<AbstractType<?>>() {{ add(UTF8Type.instance); add(Int32Type.instance); }});
+                                     new ArrayList<FieldIdentifier>() {{ add(field("col1")); add(field("col2")); }},
+                                     new ArrayList<AbstractType<?>>() {{ add(UTF8Type.instance); add(Int32Type.instance); }},
+                                     true);
 
         UserType udt2 = new UserType(keyspace,
                                      bytes("udt2"),
-                                     new ArrayList<ByteBuffer>() {{ add(bytes("col1")); add(bytes("col2")); }},
-                                     new ArrayList<AbstractType<?>>() {{ add(ListType.getInstance(udt1, false)); add(Int32Type.instance); }});
+                                     new ArrayList<FieldIdentifier>() {{ add(field("col1")); add(field("col2")); }},
+                                     new ArrayList<AbstractType<?>>() {{ add(ListType.getInstance(udt1, false)); add(Int32Type.instance); }},
+                                     true);
 
         UDFunction udf1 = UDFunction.create(new FunctionName(keyspace, "udf1"),
                                             ImmutableList.of(new ColumnIdentifier("col1", false), new ColumnIdentifier("col2", false)),
@@ -681,7 +694,7 @@
         // index targets can be parsed by CassandraIndex.parseTarget
         // which should be true for any pre-3.0 index
         for (IndexMetadata index : indexes)
-          if (CassandraIndex.parseTarget(table, index).left.equals(column))
+          if (TargetParser.parse(table, index).left.equals(column))
                 return Optional.of(index);
 
         return Optional.empty();
@@ -721,7 +734,7 @@
 
         for (int i = 0; i < type.size(); i++)
         {
-            adder.addListEntry("field_names", type.fieldName(i))
+            adder.addListEntry("field_names", type.fieldName(i).toString())
                  .addListEntry("field_types", type.fieldType(i).toString());
         }
 
diff --git a/test/unit/org/apache/cassandra/security/CipherFactoryTest.java b/test/unit/org/apache/cassandra/security/CipherFactoryTest.java
new file mode 100644
index 0000000..4ba265e
--- /dev/null
+++ b/test/unit/org/apache/cassandra/security/CipherFactoryTest.java
@@ -0,0 +1,119 @@
+/*
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+package org.apache.cassandra.security;
+
+import java.io.IOException;
+import java.security.SecureRandom;
+
+import javax.crypto.BadPaddingException;
+import javax.crypto.Cipher;
+import javax.crypto.IllegalBlockSizeException;
+
+import com.google.common.base.Charsets;
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import org.apache.cassandra.config.TransparentDataEncryptionOptions;
+
+public class CipherFactoryTest
+{
+    // http://www.gutenberg.org/files/4300/4300-h/4300-h.htm
+    static final String ULYSSEUS = "Stately, plump Buck Mulligan came from the stairhead, bearing a bowl of lather on which a mirror and a razor lay crossed. " +
+                                   "A yellow dressinggown, ungirdled, was sustained gently behind him on the mild morning air. He held the bowl aloft and intoned: " +
+                                   "-Introibo ad altare Dei.";
+    TransparentDataEncryptionOptions encryptionOptions;
+    CipherFactory cipherFactory;
+    SecureRandom secureRandom;
+
+    @Before
+    public void setup()
+    {
+        secureRandom = new SecureRandom(new byte[] {0,1,2,3,4,5,6,7,8,9} );
+        encryptionOptions = EncryptionContextGenerator.createEncryptionOptions();
+        cipherFactory = new CipherFactory(encryptionOptions);
+    }
+
+    @Test
+    public void roundTrip() throws IOException, BadPaddingException, IllegalBlockSizeException
+    {
+        Cipher encryptor = cipherFactory.getEncryptor(encryptionOptions.cipher, encryptionOptions.key_alias);
+        byte[] original = ULYSSEUS.getBytes(Charsets.UTF_8);
+        byte[] encrypted = encryptor.doFinal(original);
+
+        Cipher decryptor = cipherFactory.getDecryptor(encryptionOptions.cipher, encryptionOptions.key_alias, encryptor.getIV());
+        byte[] decrypted = decryptor.doFinal(encrypted);
+        Assert.assertEquals(ULYSSEUS, new String(decrypted, Charsets.UTF_8));
+    }
+
+    private byte[] nextIV()
+    {
+        byte[] b = new byte[16];
+        secureRandom.nextBytes(b);
+        return b;
+    }
+
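+    // The buildCipher_* tests assume CipherFactory caches Cipher instances: identical parameters should return the same object,
+    // while a different mode, IV or key alias should not.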
+    @Test
+    public void buildCipher_SameParams() throws Exception
+    {
+        byte[] iv = nextIV();
+        Cipher c1 = cipherFactory.buildCipher(encryptionOptions.cipher, encryptionOptions.key_alias, iv, Cipher.ENCRYPT_MODE);
+        Cipher c2 = cipherFactory.buildCipher(encryptionOptions.cipher, encryptionOptions.key_alias, iv, Cipher.ENCRYPT_MODE);
+        Assert.assertTrue(c1 == c2);
+    }
+
+    @Test
+    public void buildCipher_DifferentModes() throws Exception
+    {
+        byte[] iv = nextIV();
+        Cipher c1 = cipherFactory.buildCipher(encryptionOptions.cipher, encryptionOptions.key_alias, iv, Cipher.ENCRYPT_MODE);
+        Cipher c2 = cipherFactory.buildCipher(encryptionOptions.cipher, encryptionOptions.key_alias, iv, Cipher.DECRYPT_MODE);
+        Assert.assertFalse(c1 == c2);
+    }
+
+    @Test
+    public void buildCipher_DifferentIVs() throws Exception
+    {
+        Cipher c1 = cipherFactory.buildCipher(encryptionOptions.cipher, encryptionOptions.key_alias, nextIV(), Cipher.ENCRYPT_MODE);
+        Cipher c2 = cipherFactory.buildCipher(encryptionOptions.cipher, encryptionOptions.key_alias, nextIV(), Cipher.DECRYPT_MODE);
+        Assert.assertFalse(c1 == c2);
+    }
+
+    @Test
+    public void buildCipher_DifferentAliases() throws Exception
+    {
+        Cipher c1 = cipherFactory.buildCipher(encryptionOptions.cipher, encryptionOptions.key_alias, nextIV(), Cipher.ENCRYPT_MODE);
+        Cipher c2 = cipherFactory.buildCipher(encryptionOptions.cipher, EncryptionContextGenerator.KEY_ALIAS_2, nextIV(), Cipher.DECRYPT_MODE);
+        Assert.assertFalse(c1 == c2);
+    }
+
+    @Test(expected = AssertionError.class)
+    public void getDecryptor_NullIv() throws IOException
+    {
+        cipherFactory.getDecryptor(encryptionOptions.cipher, encryptionOptions.key_alias, null);
+    }
+
+    @Test(expected = AssertionError.class)
+    public void getDecryptor_EmptyIv() throws IOException
+    {
+        cipherFactory.getDecryptor(encryptionOptions.cipher, encryptionOptions.key_alias, new byte[0]);
+    }
+}
diff --git a/test/unit/org/apache/cassandra/security/EncryptionContextGenerator.java b/test/unit/org/apache/cassandra/security/EncryptionContextGenerator.java
new file mode 100644
index 0000000..4719356
--- /dev/null
+++ b/test/unit/org/apache/cassandra/security/EncryptionContextGenerator.java
@@ -0,0 +1,59 @@
+/*
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+package org.apache.cassandra.security;
+
+import java.util.HashMap;
+import java.util.Map;
+
+import org.apache.cassandra.config.ParameterizedClass;
+import org.apache.cassandra.config.TransparentDataEncryptionOptions;
+
+public class EncryptionContextGenerator
+{
+    public static final String KEY_ALIAS_1 = "testing:1";
+    public static final String KEY_ALIAS_2 = "testing:2";
+
+    public static EncryptionContext createContext(boolean init)
+    {
+        return createContext(null, init);
+    }
+
+    public static EncryptionContext createContext(byte[] iv, boolean init)
+    {
+        return new EncryptionContext(createEncryptionOptions(), iv, init);
+    }
+
+    public static TransparentDataEncryptionOptions createEncryptionOptions()
+    {
+        Map<String,String> params = new HashMap<>();
+        params.put("keystore", "test/conf/cassandra.keystore");
+        params.put("keystore_password", "cassandra");
+        params.put("store_type", "JCEKS");
+        ParameterizedClass keyProvider = new ParameterizedClass(JKSKeyProvider.class.getName(), params);
+
+        return new TransparentDataEncryptionOptions("AES/CBC/PKCS5Padding", KEY_ALIAS_1, keyProvider);
+    }
+
+    public static EncryptionContext createDisabledContext()
+    {
+        return new EncryptionContext();
+    }
+}
diff --git a/test/unit/org/apache/cassandra/security/EncryptionUtilsTest.java b/test/unit/org/apache/cassandra/security/EncryptionUtilsTest.java
new file mode 100644
index 0000000..be37f45
--- /dev/null
+++ b/test/unit/org/apache/cassandra/security/EncryptionUtilsTest.java
@@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.security;
+
+import java.io.File;
+import java.io.IOException;
+import java.io.RandomAccessFile;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.util.HashMap;
+import java.util.Random;
+import javax.crypto.BadPaddingException;
+import javax.crypto.Cipher;
+import javax.crypto.IllegalBlockSizeException;
+import javax.crypto.ShortBufferException;
+
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import org.apache.cassandra.config.TransparentDataEncryptionOptions;
+import org.apache.cassandra.io.compress.ICompressor;
+import org.apache.cassandra.io.compress.LZ4Compressor;
+import org.apache.cassandra.io.util.RandomAccessReader;
+
+public class EncryptionUtilsTest
+{
+    final Random random = new Random();
+    ICompressor compressor;
+    TransparentDataEncryptionOptions tdeOptions;
+
+    @Before
+    public void setup()
+    {
+        compressor = LZ4Compressor.create(new HashMap<>());
+        tdeOptions = EncryptionContextGenerator.createEncryptionOptions();
+    }
+
+    @Test
+    public void compress() throws IOException
+    {
+        byte[] buf = new byte[(1 << 13) - 13];
+        random.nextBytes(buf);
+        ByteBuffer compressedBuffer = EncryptionUtils.compress(ByteBuffer.wrap(buf), ByteBuffer.allocate(0), true, compressor);
+        ByteBuffer uncompressedBuffer = EncryptionUtils.uncompress(compressedBuffer, ByteBuffer.allocate(0), true, compressor);
+        Assert.assertArrayEquals(buf, uncompressedBuffer.array());
+    }
+
+    @Test
+    public void encrypt() throws BadPaddingException, ShortBufferException, IllegalBlockSizeException, IOException
+    {
+        byte[] buf = new byte[(1 << 12) - 7];
+        random.nextBytes(buf);
+
+        // encrypt
+        CipherFactory cipherFactory = new CipherFactory(tdeOptions);
+        Cipher encryptor = cipherFactory.getEncryptor(tdeOptions.cipher, tdeOptions.key_alias);
+
+        File f = File.createTempFile("commitlog-enc-utils-", ".tmp");
+        f.deleteOnExit();
+        FileChannel channel = new RandomAccessFile(f, "rw").getChannel();
+        EncryptionUtils.encryptAndWrite(ByteBuffer.wrap(buf), channel, true, encryptor);
+        channel.close();
+
+        // decrypt
+        Cipher decryptor = cipherFactory.getDecryptor(tdeOptions.cipher, tdeOptions.key_alias, encryptor.getIV());
+        ByteBuffer decryptedBuffer = EncryptionUtils.decrypt(RandomAccessReader.open(f), ByteBuffer.allocate(0), true, decryptor);
+
+        // normally, we'd just call BB.array(), but that gives you the *entire* backing array, not with any of the offsets (position,limit) applied.
+        // thus, just for this test, we copy the array and perform an array-level comparison with those offsets
+        decryptedBuffer.limit(buf.length);
+        byte[] b = new byte[buf.length];
+        System.arraycopy(decryptedBuffer.array(), 0, b, 0, buf.length);
+        Assert.assertArrayEquals(buf, b);
+    }
+
+    @Test
+    public void fullRoundTrip() throws IOException, BadPaddingException, ShortBufferException, IllegalBlockSizeException
+    {
+        // compress
+        byte[] buf = new byte[(1 << 12) - 7];
+        random.nextBytes(buf);
+        ByteBuffer compressedBuffer = EncryptionUtils.compress(ByteBuffer.wrap(buf), ByteBuffer.allocate(0), true, compressor);
+
+        // encrypt
+        CipherFactory cipherFactory = new CipherFactory(tdeOptions);
+        Cipher encryptor = cipherFactory.getEncryptor(tdeOptions.cipher, tdeOptions.key_alias);
+        File f = File.createTempFile("commitlog-enc-utils-", ".tmp");
+        f.deleteOnExit();
+        FileChannel channel = new RandomAccessFile(f, "rw").getChannel();
+        EncryptionUtils.encryptAndWrite(compressedBuffer, channel, true, encryptor);
+
+        // decrypt
+        Cipher decryptor = cipherFactory.getDecryptor(tdeOptions.cipher, tdeOptions.key_alias, encryptor.getIV());
+        ByteBuffer decryptedBuffer = EncryptionUtils.decrypt(RandomAccessReader.open(f), ByteBuffer.allocate(0), true, decryptor);
+
+        // uncompress
+        ByteBuffer uncompressedBuffer = EncryptionUtils.uncompress(decryptedBuffer, ByteBuffer.allocate(0), true, compressor);
+        Assert.assertArrayEquals(buf, uncompressedBuffer.array());
+    }
+}
diff --git a/test/unit/org/apache/cassandra/security/JKSKeyProviderTest.java b/test/unit/org/apache/cassandra/security/JKSKeyProviderTest.java
new file mode 100644
index 0000000..081f688
--- /dev/null
+++ b/test/unit/org/apache/cassandra/security/JKSKeyProviderTest.java
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.security;
+
+import java.io.IOException;
+
+import org.junit.Assert;
+import org.junit.Before;
+import org.junit.Test;
+
+import org.apache.cassandra.config.TransparentDataEncryptionOptions;
+
+public class JKSKeyProviderTest
+{
+    JKSKeyProvider jksKeyProvider;
+    TransparentDataEncryptionOptions tdeOptions;
+
+    @Before
+    public void setup()
+    {
+        tdeOptions = EncryptionContextGenerator.createEncryptionOptions();
+        jksKeyProvider = new JKSKeyProvider(tdeOptions);
+    }
+
+    @Test
+    public void getSecretKey_WithKeyPassword() throws IOException
+    {
+        Assert.assertNotNull(jksKeyProvider.getSecretKey(tdeOptions.key_alias));
+    }
+
+    @Test
+    public void getSecretKey_WithoutKeyPassword() throws IOException
+    {
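+        // Without an explicit key_password the provider should still load the key, presumably falling back to the keystore password.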
+        tdeOptions.remove("key_password");
+        Assert.assertNotNull(jksKeyProvider.getSecretKey(tdeOptions.key_alias));
+    }
+}
diff --git a/test/unit/org/apache/cassandra/service/ActiveRepairServiceTest.java b/test/unit/org/apache/cassandra/service/ActiveRepairServiceTest.java
index adcd684..2c1a8d2 100644
--- a/test/unit/org/apache/cassandra/service/ActiveRepairServiceTest.java
+++ b/test/unit/org/apache/cassandra/service/ActiveRepairServiceTest.java
@@ -270,6 +270,7 @@
         Set<SSTableReader> original = Sets.newHashSet(store.select(View.select(SSTableSet.CANONICAL, (s) -> !s.isRepaired())).sstables);
         UUID prsId = UUID.randomUUID();
         ActiveRepairService.instance.registerParentRepairSession(prsId, FBUtilities.getBroadcastAddress(), Collections.singletonList(store), null, true, System.currentTimeMillis(), true);
+
         ActiveRepairService.ParentRepairSession prs = ActiveRepairService.instance.getParentRepairSession(prsId);
         prs.markSSTablesRepairing(store.metadata.cfId, prsId);
         try (Refs<SSTableReader> refs = prs.getActiveRepairedSSTableRefsForAntiCompaction(store.metadata.cfId, prsId))
diff --git a/test/unit/org/apache/cassandra/service/ClientWarningsTest.java b/test/unit/org/apache/cassandra/service/ClientWarningsTest.java
index cf14d55..78b1c88 100644
--- a/test/unit/org/apache/cassandra/service/ClientWarningsTest.java
+++ b/test/unit/org/apache/cassandra/service/ClientWarningsTest.java
@@ -72,9 +72,13 @@
         {
             client.connect(false);
 
-            QueryMessage query = new QueryMessage(createBatchStatement(DatabaseDescriptor.getBatchSizeWarnThreshold()), QueryOptions.DEFAULT);
+            QueryMessage query = new QueryMessage(createBatchStatement2(DatabaseDescriptor.getBatchSizeWarnThreshold() / 2 + 1), QueryOptions.DEFAULT);
             Message.Response resp = client.execute(query);
             assertEquals(1, resp.getWarnings().size());
+
+            query = new QueryMessage(createBatchStatement(DatabaseDescriptor.getBatchSizeWarnThreshold()), QueryOptions.DEFAULT);
+            resp = client.execute(query);
+            assertNull(resp.getWarnings());
         }
     }
 
diff --git a/test/unit/org/apache/cassandra/service/DataResolverTest.java b/test/unit/org/apache/cassandra/service/DataResolverTest.java
index c9878d4..fe7e211 100644
--- a/test/unit/org/apache/cassandra/service/DataResolverTest.java
+++ b/test/unit/org/apache/cassandra/service/DataResolverTest.java
@@ -53,7 +53,7 @@
 import static org.junit.Assert.assertFalse;
 import static org.junit.Assert.assertNotNull;
 import static org.junit.Assert.assertTrue;
-import static org.apache.cassandra.db.RangeTombstone.Bound.Kind;
+import static org.apache.cassandra.db.ClusteringBound.Kind;
 
 public class DataResolverTest
 {
@@ -525,18 +525,26 @@
     // Forces the start to be exclusive if the condition holds
     private static RangeTombstone withExclusiveStartIf(RangeTombstone rt, boolean condition)
     {
+        if (!condition)
+            return rt;
+
         Slice slice = rt.deletedSlice();
+        ClusteringBound newStart = ClusteringBound.create(Kind.EXCL_START_BOUND, slice.start().getRawValues());
         return condition
-             ? new RangeTombstone(Slice.make(slice.start().withNewKind(Kind.EXCL_START_BOUND), slice.end()), rt.deletionTime())
+             ? new RangeTombstone(Slice.make(newStart, slice.end()), rt.deletionTime())
              : rt;
     }
 
     // Forces the end to be exclusive if the condition holds
     private static RangeTombstone withExclusiveEndIf(RangeTombstone rt, boolean condition)
     {
+        if (!condition)
+            return rt;
+
         Slice slice = rt.deletedSlice();
+        ClusteringBound newEnd = ClusteringBound.create(Kind.EXCL_END_BOUND, slice.end().getRawValues());
         return condition
-             ? new RangeTombstone(Slice.make(slice.start(), slice.end().withNewKind(Kind.EXCL_END_BOUND)), rt.deletionTime())
+             ? new RangeTombstone(Slice.make(slice.start(), newEnd), rt.deletionTime())
              : rt;
     }
 
@@ -547,7 +555,7 @@
 
     private Cell mapCell(int k, int v, long ts)
     {
-        return BufferCell.live(cfm2, m, ts, bb(v), CellPath.create(bb(k)));
+        return BufferCell.live(m, ts, bb(v), CellPath.create(bb(k)));
     }
 
     @Test
@@ -818,7 +826,8 @@
                                 ReadResponse.createRemoteDataResponse(partitionIterator, cmd),
                                 Collections.EMPTY_MAP,
                                 MessagingService.Verb.REQUEST_RESPONSE,
-                                MessagingService.current_version);
+                                MessagingService.current_version,
+                                MessageIn.createTimestamp());
     }
 
     private RangeTombstone tombstone(Object start, Object end, long markedForDeleteAt, int localDeletionTime)
@@ -828,15 +837,11 @@
 
     private RangeTombstone tombstone(Object start, boolean inclusiveStart, Object end, boolean inclusiveEnd, long markedForDeleteAt, int localDeletionTime)
     {
-        RangeTombstone.Bound.Kind startKind = inclusiveStart
-                                            ? Kind.INCL_START_BOUND
-                                            : Kind.EXCL_START_BOUND;
-        RangeTombstone.Bound.Kind endKind = inclusiveEnd
-                                          ? Kind.INCL_END_BOUND
-                                          : Kind.EXCL_END_BOUND;
+        Kind startKind = inclusiveStart ? Kind.INCL_START_BOUND : Kind.EXCL_START_BOUND;
+        Kind endKind = inclusiveEnd ? Kind.INCL_END_BOUND : Kind.EXCL_END_BOUND;
 
-        RangeTombstone.Bound startBound = new RangeTombstone.Bound(startKind, cfm.comparator.make(start).getRawValues());
-        RangeTombstone.Bound endBound = new RangeTombstone.Bound(endKind, cfm.comparator.make(end).getRawValues());
+        ClusteringBound startBound = ClusteringBound.create(startKind, cfm.comparator.make(start).getRawValues());
+        ClusteringBound endBound = ClusteringBound.create(endKind, cfm.comparator.make(end).getRawValues());
         return new RangeTombstone(Slice.make(startBound, endBound), new DeletionTime(markedForDeleteAt, localDeletionTime));
     }
 
diff --git a/test/unit/org/apache/cassandra/service/QueryPagerTest.java b/test/unit/org/apache/cassandra/service/QueryPagerTest.java
index bfc66e0..2f2a236 100644
--- a/test/unit/org/apache/cassandra/service/QueryPagerTest.java
+++ b/test/unit/org/apache/cassandra/service/QueryPagerTest.java
@@ -122,7 +122,8 @@
         StringBuilder sb = new StringBuilder();
         List<FilteredPartition> partitionList = new ArrayList<>();
         int rows = 0;
-        try (ReadOrderGroup orderGroup = pager.startOrderGroup(); PartitionIterator iterator = pager.fetchPageInternal(toQuery, orderGroup))
+        try (ReadExecutionController executionController = pager.executionController();
+             PartitionIterator iterator = pager.fetchPageInternal(toQuery, executionController))
         {
             while (iterator.hasNext())
             {
diff --git a/test/unit/org/apache/cassandra/service/RMIServerSocketFactoryImplTest.java b/test/unit/org/apache/cassandra/service/RMIServerSocketFactoryImplTest.java
index 393dfe1..fad3c78 100644
--- a/test/unit/org/apache/cassandra/service/RMIServerSocketFactoryImplTest.java
+++ b/test/unit/org/apache/cassandra/service/RMIServerSocketFactoryImplTest.java
@@ -36,7 +36,7 @@
     @Test
     public void testReusableAddrSocket() throws IOException
     {
-        RMIServerSocketFactory serverFactory = new RMIServerSocketFactoryImpl();
+        RMIServerSocketFactory serverFactory = new RMIServerSocketFactoryImpl(null);
         ServerSocket socket = serverFactory.createServerSocket(7199);
         assertTrue(socket.getReuseAddress());
     }
diff --git a/test/unit/org/apache/cassandra/service/RemoveTest.java b/test/unit/org/apache/cassandra/service/RemoveTest.java
index 9f1d6a8..4c26fc5 100644
--- a/test/unit/org/apache/cassandra/service/RemoveTest.java
+++ b/test/unit/org/apache/cassandra/service/RemoveTest.java
@@ -22,6 +22,7 @@
 import java.io.IOException;
 import java.net.InetAddress;
 import java.util.ArrayList;
+import java.util.Collection;
 import java.util.Collections;
 import java.util.List;
 import java.util.UUID;
@@ -31,11 +32,14 @@
 
 import org.apache.cassandra.SchemaLoader;
 import org.apache.cassandra.Util;
+import org.apache.cassandra.config.DatabaseDescriptor;
 import org.apache.cassandra.dht.IPartitioner;
 import org.apache.cassandra.dht.RandomPartitioner;
 import org.apache.cassandra.dht.Token;
 import org.apache.cassandra.exceptions.ConfigurationException;
+import org.apache.cassandra.gms.ApplicationState;
 import org.apache.cassandra.gms.Gossiper;
+import org.apache.cassandra.gms.VersionedValue.VersionedValueFactory;
 import org.apache.cassandra.locator.TokenMetadata;
 import org.apache.cassandra.net.MessageOut;
 import org.apache.cassandra.net.MessagingService;
@@ -108,6 +112,25 @@
         ss.removeNode(hostIds.get(0).toString());
     }
 
+    @Test(expected = UnsupportedOperationException.class)
+    public void testNonmemberId()
+    {
+        VersionedValueFactory valueFactory = new VersionedValueFactory(DatabaseDescriptor.getPartitioner());
+        Collection<Token> tokens = Collections.singleton(DatabaseDescriptor.getPartitioner().getRandomToken());
+
+        InetAddress joininghost = hosts.get(4);
+        UUID joiningId = hostIds.get(4);
+
+        hosts.remove(joininghost);
+        hostIds.remove(joiningId);
+
+        // Change a node to a bootstrapping node that is not yet a member of the ring
+        Gossiper.instance.injectApplicationState(joininghost, ApplicationState.TOKENS, valueFactory.tokens(tokens));
+        ss.onChange(joininghost, ApplicationState.STATUS, valueFactory.bootstrapping(tokens));
+
+        ss.removeNode(joiningId.toString());
+    }
+
     @Test
     public void testRemoveHostId() throws InterruptedException
     {
diff --git a/test/unit/org/apache/cassandra/service/StorageServiceServerTest.java b/test/unit/org/apache/cassandra/service/StorageServiceServerTest.java
index 392b6f4..e438a6b 100644
--- a/test/unit/org/apache/cassandra/service/StorageServiceServerTest.java
+++ b/test/unit/org/apache/cassandra/service/StorageServiceServerTest.java
@@ -94,10 +94,10 @@
     }
 
     @Test
-    public void testSnapshot() throws IOException
+    public void testSnapshotWithFlush() throws IOException
     {
         // no need to insert extra data, even an "empty" database will have a little information in the system keyspace
-        StorageService.instance.takeSnapshot("snapshot");
+        StorageService.instance.takeSnapshot(UUID.randomUUID().toString());
     }
 
     private void checkTempFilePresence(File f, boolean exist)
@@ -173,7 +173,14 @@
     public void testTableSnapshot() throws IOException
     {
         // no need to insert extra data, even an "empty" database will have a little information in the system keyspace
-        StorageService.instance.takeTableSnapshot(SchemaKeyspace.NAME, SchemaKeyspace.KEYSPACES, "cf_snapshot");
+        StorageService.instance.takeTableSnapshot(SchemaKeyspace.NAME, SchemaKeyspace.KEYSPACES, UUID.randomUUID().toString());
+    }
+
+    @Test
+    public void testSnapshot() throws IOException
+    {
+        // no need to insert extra data, even an "empty" database will have a little information in the system keyspace
+        StorageService.instance.takeSnapshot(UUID.randomUUID().toString(), SchemaKeyspace.NAME);
     }
 
     @Test
diff --git a/test/unit/org/apache/cassandra/service/pager/PagingStateTest.java b/test/unit/org/apache/cassandra/service/pager/PagingStateTest.java
index ba82e85..98fa959 100644
--- a/test/unit/org/apache/cassandra/service/pager/PagingStateTest.java
+++ b/test/unit/org/apache/cassandra/service/pager/PagingStateTest.java
@@ -49,8 +49,8 @@
         ByteBuffer pk = ByteBufferUtil.bytes("someKey");
 
         ColumnDefinition def = metadata.getColumnDefinition(new ColumnIdentifier("myCol", false));
-        Clustering c = new Clustering(ByteBufferUtil.bytes("c1"), ByteBufferUtil.bytes(42));
-        Row row = BTreeRow.singleCellRow(c, BufferCell.live(metadata, def, 0, ByteBufferUtil.EMPTY_BYTE_BUFFER));
+        Clustering c = Clustering.make(ByteBufferUtil.bytes("c1"), ByteBufferUtil.bytes(42));
+        Row row = BTreeRow.singleCellRow(c, BufferCell.live(def, 0, ByteBufferUtil.EMPTY_BYTE_BUFFER));
         PagingState.RowMark mark = PagingState.RowMark.create(metadata, row, protocolVersion);
         return new PagingState(pk, mark, 10, 0);
     }
diff --git a/test/unit/org/apache/cassandra/streaming/compression/CompressedInputStreamTest.java b/test/unit/org/apache/cassandra/streaming/compression/CompressedInputStreamTest.java
index a3300ac..562416e 100644
--- a/test/unit/org/apache/cassandra/streaming/compression/CompressedInputStreamTest.java
+++ b/test/unit/org/apache/cassandra/streaming/compression/CompressedInputStreamTest.java
@@ -19,14 +19,13 @@
 
 import java.io.*;
 import java.util.*;
-import java.util.concurrent.SynchronousQueue;
-import java.util.concurrent.TimeUnit;
 
 import org.junit.Test;
 import org.apache.cassandra.db.ClusteringComparator;
 import org.apache.cassandra.db.marshal.BytesType;
 import org.apache.cassandra.io.compress.CompressedSequentialWriter;
 import org.apache.cassandra.io.compress.CompressionMetadata;
+import org.apache.cassandra.io.util.SequentialWriterOption;
 import org.apache.cassandra.schema.CompressionParams;
 import org.apache.cassandra.io.sstable.Component;
 import org.apache.cassandra.io.sstable.Descriptor;
@@ -72,7 +71,11 @@
         MetadataCollector collector = new MetadataCollector(new ClusteringComparator(BytesType.instance));
         CompressionParams param = CompressionParams.snappy(32);
         Map<Long, Long> index = new HashMap<Long, Long>();
-        try (CompressedSequentialWriter writer = new CompressedSequentialWriter(tmp, desc.filenameFor(Component.COMPRESSION_INFO), param, collector))
+        try (CompressedSequentialWriter writer = new CompressedSequentialWriter(tmp,
+                                                                                desc.filenameFor(Component.COMPRESSION_INFO),
+                                                                                null,
+                                                                                SequentialWriterOption.DEFAULT,
+                                                                                param, collector))
         {
             for (long l = 0L; l < 1000; l++)
             {
diff --git a/test/unit/org/apache/cassandra/tools/nodetool/formatter/TableBuilderTest.java b/test/unit/org/apache/cassandra/tools/nodetool/formatter/TableBuilderTest.java
new file mode 100644
index 0000000..9782b5b
--- /dev/null
+++ b/test/unit/org/apache/cassandra/tools/nodetool/formatter/TableBuilderTest.java
@@ -0,0 +1,114 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.tools.nodetool.formatter;
+
+import java.io.ByteArrayOutputStream;
+import java.io.PrintStream;
+
+import org.junit.Test;
+
+import static org.junit.Assert.assertEquals;
+
+public class TableBuilderTest
+{
+    @Test
+    public void testEmptyRow()
+    {
+        TableBuilder table = new TableBuilder();
+
+        ByteArrayOutputStream baos = new ByteArrayOutputStream();
+        try (PrintStream out = new PrintStream(baos))
+        {
+            table.printTo(out);
+        }
+        assertEquals("", baos.toString());
+    }
+
+    @Test
+    public void testOneRow()
+    {
+        TableBuilder table = new TableBuilder();
+
+        table.add("a", "bb", "ccc");
+        ByteArrayOutputStream baos = new ByteArrayOutputStream();
+        try (PrintStream out = new PrintStream(baos))
+        {
+            table.printTo(out);
+        }
+        assertEquals(String.format("a bb ccc%n"), baos.toString());
+    }
+
+    @Test
+    public void testRows()
+    {
+        TableBuilder table = new TableBuilder();
+        table.add("a", "bb", "ccc");
+        table.add("aaa", "bb", "c");
+        ByteArrayOutputStream baos = new ByteArrayOutputStream();
+        try (PrintStream out = new PrintStream(baos))
+        {
+            table.printTo(out);
+        }
+        assertEquals(String.format("a   bb ccc%naaa bb c  %n"), baos.toString());
+    }
+
+    @Test
+    public void testNullColumn()
+    {
+        TableBuilder table = new TableBuilder();
+        table.add("a", "b", "c");
+        table.add("a", null, "c");
+        table.add("a", null, null);
+        table.add(null, "b", "c");
+        ByteArrayOutputStream baos = new ByteArrayOutputStream();
+        try (PrintStream out = new PrintStream(baos))
+        {
+            table.printTo(out);
+        }
+        assertEquals(String.format("a b c%na   c%na    %n  b c%n"), baos.toString());
+    }
+
+    @Test
+    public void testRowsOfDifferentSize()
+    {
+        TableBuilder table = new TableBuilder();
+        table.add("a", "b", "c");
+        table.add("a", "b", "c", "d", "e");
+        ByteArrayOutputStream baos = new ByteArrayOutputStream();
+        try (PrintStream out = new PrintStream(baos))
+        {
+            table.printTo(out);
+        }
+        assertEquals(baos.toString(), String.format("a b c    %na b c d e%n"), baos.toString());
+    }
+
+    @Test
+    public void testDelimiter()
+    {
+        TableBuilder table = new TableBuilder('\t');
+
+        table.add("a", "bb", "ccc");
+        ByteArrayOutputStream baos = new ByteArrayOutputStream();
+        try (PrintStream out = new PrintStream(baos))
+        {
+            table.printTo(out);
+        }
+        assertEquals(String.format("a\tbb\tccc%n"), baos.toString());
+    }
+}
\ No newline at end of file
diff --git a/test/unit/org/apache/cassandra/tracing/TracingTest.java b/test/unit/org/apache/cassandra/tracing/TracingTest.java
new file mode 100644
index 0000000..30521c0
--- /dev/null
+++ b/test/unit/org/apache/cassandra/tracing/TracingTest.java
@@ -0,0 +1,176 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.tracing;
+
+import java.net.InetAddress;
+import java.nio.ByteBuffer;
+import java.util.ArrayList;
+import java.util.Collections;
+import java.util.List;
+import java.util.Map;
+import java.util.UUID;
+
+import org.junit.Test;
+
+import org.apache.cassandra.utils.progress.ProgressEvent;
+import org.apache.cassandra.utils.progress.ProgressListener;
+
+public final class TracingTest
+{
+
+    @Test
+    public void test()
+    {
+        List<String> traces = new ArrayList<>();
+        Tracing tracing = new TracingImpl(traces);
+        tracing.newSession(Tracing.TraceType.NONE);
+        TraceState state = tracing.begin("test-request", Collections.<String,String>emptyMap());
+        state.trace("test-1");
+        state.trace("test-2");
+        state.trace("test-3");
+        tracing.stopSession();
+
+        assert null == tracing.get();
+        assert 4 == traces.size();
+        assert "test-request".equals(traces.get(0));
+        assert "test-1".equals(traces.get(1));
+        assert "test-2".equals(traces.get(2));
+        assert "test-3".equals(traces.get(3));
+    }
+
+    @Test
+    public void test_get()
+    {
+        List<String> traces = new ArrayList<>();
+        Tracing tracing = new TracingImpl(traces);
+        tracing.newSession(Tracing.TraceType.NONE);
+        tracing.begin("test-request", Collections.<String,String>emptyMap());
+        tracing.get().trace("test-1");
+        tracing.get().trace("test-2");
+        tracing.get().trace("test-3");
+        tracing.stopSession();
+
+        assert null == tracing.get();
+        assert 4 == traces.size();
+        assert "test-request".equals(traces.get(0));
+        assert "test-1".equals(traces.get(1));
+        assert "test-2".equals(traces.get(2));
+        assert "test-3".equals(traces.get(3));
+    }
+
+    @Test
+    public void test_get_uuid()
+    {
+        List<String> traces = new ArrayList<>();
+        Tracing tracing = new TracingImpl(traces);
+        UUID uuid = tracing.newSession(Tracing.TraceType.NONE);
+        tracing.begin("test-request", Collections.<String,String>emptyMap());
+        tracing.get(uuid).trace("test-1");
+        tracing.get(uuid).trace("test-2");
+        tracing.get(uuid).trace("test-3");
+        tracing.stopSession();
+
+        assert null == tracing.get();
+        assert 4 == traces.size();
+        assert "test-request".equals(traces.get(0));
+        assert "test-1".equals(traces.get(1));
+        assert "test-2".equals(traces.get(2));
+        assert "test-3".equals(traces.get(3));
+    }
+
+    @Test
+    public void test_states()
+    {
+        List<String> traces = new ArrayList<>();
+        Tracing tracing = new TracingImpl(traces);
+        tracing.newSession(Tracing.TraceType.REPAIR);
+        tracing.begin("test-request", Collections.<String,String>emptyMap());
+        tracing.get().enableActivityNotification("test-tag");
+        assert TraceState.Status.IDLE == tracing.get().waitActivity(1);
+        tracing.get().trace("test-1");
+        assert TraceState.Status.ACTIVE == tracing.get().waitActivity(1);
+        tracing.get().stop();
+        assert TraceState.Status.STOPPED == tracing.get().waitActivity(1);
+        tracing.stopSession();
+        assert null == tracing.get();
+    }
+
+    @Test
+    public void test_progress_listener()
+    {
+        List<String> traces = new ArrayList<>();
+        Tracing tracing = new TracingImpl(traces);
+        tracing.newSession(Tracing.TraceType.REPAIR);
+        tracing.begin("test-request", Collections.<String,String>emptyMap());
+        tracing.get().enableActivityNotification("test-tag");
+
+        tracing.get().addProgressListener(
+                new ProgressListener()
+                {
+                    public void progress(String tag, ProgressEvent pe)
+                    {
+                        assert "test-tag".equals(tag);
+                        assert "test-trace".equals(pe.getMessage());
+                    }
+                });
+
+        tracing.get().trace("test-trace");
+        tracing.stopSession();
+        assert null == tracing.get();
+    }
+
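+    // Minimal Tracing stub: begin() and traceImpl() simply append to the supplied list, so the tests can assert on
+    // trace activity without touching the trace tables.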
+    private class TracingImpl extends Tracing
+    {
+        private final List<String> traces;
+
+        public TracingImpl(List<String> traces)
+        {
+            this.traces = traces;
+        }
+
+        public void stopSessionImpl()
+        {}
+
+        public TraceState begin(String request, InetAddress ia, Map<String, String> map)
+        {
+            traces.add(request);
+            return get();
+        }
+
+        protected TraceState newTraceState(InetAddress ia, UUID uuid, Tracing.TraceType tt)
+        {
+            return new TraceState(ia, uuid, tt)
+            {
+                protected void traceImpl(String string)
+                {
+                    traces.add(string);
+                }
+
+                protected void waitForPendingEvents()
+                {
+                }
+            };
+        }
+
+        public void trace(ByteBuffer bb, String string, int i)
+        {
+            throw new UnsupportedOperationException("Not supported yet.");
+        }
+    }
+}
diff --git a/test/unit/org/apache/cassandra/transport/SerDeserTest.java b/test/unit/org/apache/cassandra/transport/SerDeserTest.java
index fdb346e..44d72a1 100644
--- a/test/unit/org/apache/cassandra/transport/SerDeserTest.java
+++ b/test/unit/org/apache/cassandra/transport/SerDeserTest.java
@@ -148,6 +148,11 @@
         return UTF8Type.instance.decompose(str);
     }
 
+    private static FieldIdentifier field(String field)
+    {
+        return FieldIdentifier.forQuoted(field);
+    }
+
     private static ColumnIdentifier ci(String name)
     {
         return new ColumnIdentifier(name, false);
@@ -175,6 +180,7 @@
         udtSerDeserTest(4);
     }
 
+
     public void udtSerDeserTest(int version) throws Exception
     {
         ListType<?> lt = ListType.getInstance(Int32Type.instance, true);
@@ -183,14 +189,15 @@
 
         UserType udt = new UserType("ks",
                                     bb("myType"),
-                                    Arrays.asList(bb("f1"), bb("f2"), bb("f3"), bb("f4")),
-                                    Arrays.asList(LongType.instance, lt, st, mt));
+                                    Arrays.asList(field("f1"), field("f2"), field("f3"), field("f4")),
+                                    Arrays.asList(LongType.instance, lt, st, mt),
+                                    true);
 
-        Map<ColumnIdentifier, Term.Raw> value = new HashMap<>();
-        value.put(ci("f1"), lit(42));
-        value.put(ci("f2"), new Lists.Literal(Arrays.<Term.Raw>asList(lit(3), lit(1))));
-        value.put(ci("f3"), new Sets.Literal(Arrays.<Term.Raw>asList(lit("foo"), lit("bar"))));
-        value.put(ci("f4"), new Maps.Literal(Arrays.<Pair<Term.Raw, Term.Raw>>asList(
+        Map<FieldIdentifier, Term.Raw> value = new HashMap<>();
+        value.put(field("f1"), lit(42));
+        value.put(field("f2"), new Lists.Literal(Arrays.<Term.Raw>asList(lit(3), lit(1))));
+        value.put(field("f3"), new Sets.Literal(Arrays.<Term.Raw>asList(lit("foo"), lit("bar"))));
+        value.put(field("f4"), new Maps.Literal(Arrays.<Pair<Term.Raw, Term.Raw>>asList(
                                    Pair.<Term.Raw, Term.Raw>create(lit("foo"), lit(24)),
                                    Pair.<Term.Raw, Term.Raw>create(lit("bar"), lit(12)))));
 
@@ -230,7 +237,7 @@
         for (int i = 0; i < 3; i++)
             columnNames.add(new ColumnSpecification("ks", "cf", new ColumnIdentifier("col" + i, false), Int32Type.instance));
 
-        ResultSet.PreparedMetadata meta = new ResultSet.PreparedMetadata(columnNames, new Short[]{2, 1});
+        ResultSet.PreparedMetadata meta = new ResultSet.PreparedMetadata(columnNames, new short[]{2, 1});
         ByteBuf buf = Unpooled.buffer(meta.codec.encodedSize(meta, Server.VERSION_4));
         meta.codec.encode(meta, buf, Server.VERSION_4);
         ResultSet.PreparedMetadata decodedMeta = meta.codec.decode(buf, Server.VERSION_4);
diff --git a/test/unit/org/apache/cassandra/triggers/TriggerExecutorTest.java b/test/unit/org/apache/cassandra/triggers/TriggerExecutorTest.java
index 44391c8..36efabd 100644
--- a/test/unit/org/apache/cassandra/triggers/TriggerExecutorTest.java
+++ b/test/unit/org/apache/cassandra/triggers/TriggerExecutorTest.java
@@ -46,12 +46,14 @@
         CFMetaData metadata = makeCfMetaData("ks1", "cf1", TriggerMetadata.create("test", SameKeySameCfTrigger.class.getName()));
         PartitionUpdate mutated = TriggerExecutor.instance.execute(makeCf(metadata, "k1", "v1", null));
 
-        RowIterator rowIterator = UnfilteredRowIterators.filter(mutated.unfilteredIterator(), FBUtilities.nowInSeconds());
+        try (RowIterator rowIterator = UnfilteredRowIterators.filter(mutated.unfilteredIterator(),
+                                                                     FBUtilities.nowInSeconds()))
+        {
+            Iterator<Cell> cells = rowIterator.next().cells().iterator();
+            assertEquals(bytes("trigger"), cells.next().value());
 
-        Iterator<Cell> cells = rowIterator.next().cells().iterator();
-        assertEquals(bytes("trigger"), cells.next().value());
-
-        assertTrue(!rowIterator.hasNext());
+            assertTrue(!rowIterator.hasNext());
+        }
     }
 
     @Test(expected = InvalidRequestException.class)
@@ -272,9 +274,9 @@
         builder.newRow(Clustering.EMPTY);
         long ts = FBUtilities.timestampMicros();
         if (columnValue1 != null)
-            builder.addCell(BufferCell.live(metadata, metadata.getColumnDefinition(bytes("c1")), ts, bytes(columnValue1)));
+            builder.addCell(BufferCell.live(metadata.getColumnDefinition(bytes("c1")), ts, bytes(columnValue1)));
         if (columnValue2 != null)
-            builder.addCell(BufferCell.live(metadata, metadata.getColumnDefinition(bytes("c2")), ts, bytes(columnValue2)));
+            builder.addCell(BufferCell.live(metadata.getColumnDefinition(bytes("c2")), ts, bytes(columnValue2)));
 
         return PartitionUpdate.singleRowUpdate(metadata, Util.dk(key), builder.build());
     }
diff --git a/test/unit/org/apache/cassandra/triggers/TriggersTest.java b/test/unit/org/apache/cassandra/triggers/TriggersTest.java
index 13ecbe9..e5a2dd6 100644
--- a/test/unit/org/apache/cassandra/triggers/TriggersTest.java
+++ b/test/unit/org/apache/cassandra/triggers/TriggersTest.java
@@ -18,7 +18,6 @@
 package org.apache.cassandra.triggers;
 
 import java.net.InetAddress;
-import java.nio.ByteBuffer;
 import java.util.Collection;
 import java.util.Collections;
 
@@ -35,7 +34,6 @@
 import org.apache.cassandra.db.ConsistencyLevel;
 import org.apache.cassandra.db.Mutation;
 import org.apache.cassandra.db.partitions.Partition;
-import org.apache.cassandra.db.partitions.PartitionUpdate;
 import org.apache.cassandra.exceptions.ConfigurationException;
 import org.apache.cassandra.exceptions.RequestExecutionException;
 import org.apache.cassandra.service.StorageService;
@@ -197,9 +195,7 @@
         assertUpdateIsAugmented(6);
     }
 
-    // Unfortunately, an IRE thrown from StorageProxy.cas
-    // results in a RuntimeException from QueryProcessor.process
-    @Test(expected=RuntimeException.class)
+    @Test(expected=org.apache.cassandra.exceptions.InvalidRequestException.class)
     public void onCqlUpdateWithConditionsRejectGeneratedUpdatesForDifferentPartition() throws Exception
     {
         String cf = "cf" + System.nanoTime();
@@ -215,9 +211,7 @@
         }
     }
 
-    // Unfortunately, an IRE thrown from StorageProxy.cas
-    // results in a RuntimeException from QueryProcessor.process
-    @Test(expected=RuntimeException.class)
+    @Test(expected=org.apache.cassandra.exceptions.InvalidRequestException.class)
     public void onCqlUpdateWithConditionsRejectGeneratedUpdatesForDifferentTable() throws Exception
     {
         String cf = "cf" + System.nanoTime();
@@ -283,6 +277,27 @@
         }
     }
 
+    @Test(expected=org.apache.cassandra.exceptions.InvalidRequestException.class)
+    public void ifTriggerThrowsErrorNoMutationsAreApplied() throws Exception
+    {
+        String cf = "cf" + System.nanoTime();
+        try
+        {
+            setupTableWithTrigger(cf, ErrorTrigger.class);
+            String cql = String.format("INSERT INTO %s.%s (k, v1) VALUES (11, 11)", ksName, cf);
+            QueryProcessor.process(cql, ConsistencyLevel.ONE);
+        }
+        catch (Exception e)
+        {
+            assertTrue(e.getMessage().equals(ErrorTrigger.MESSAGE));
+            throw e;
+        }
+        finally
+        {
+            assertUpdateNotExecuted(cf, 11);
+        }
+    }
+
     private void setupTableWithTrigger(String cf, Class<? extends ITrigger> triggerImpl)
     throws RequestExecutionException
     {
@@ -352,4 +367,13 @@
             return Collections.singletonList(update.build());
         }
     }
+
+    public static class ErrorTrigger implements ITrigger
+    {
+        public static final String MESSAGE = "Thrown by ErrorTrigger";
+        public Collection<Mutation> augment(Partition partition)
+        {
+            throw new org.apache.cassandra.exceptions.InvalidRequestException(MESSAGE);
+        }
+    }
 }
diff --git a/test/unit/org/apache/cassandra/utils/BTreeTest.java b/test/unit/org/apache/cassandra/utils/BTreeTest.java
index ffd7315..a01ad2e 100644
--- a/test/unit/org/apache/cassandra/utils/BTreeTest.java
+++ b/test/unit/org/apache/cassandra/utils/BTreeTest.java
@@ -214,14 +214,16 @@
                 builder.add(i);
             // for sorted input, check non-resolve path works before checking resolution path
             checkResolverOutput(count, builder.build(), BTree.Dir.ASC);
-            builder.reuse();
+            builder = BTree.builder(Comparator.naturalOrder());
+            builder.setQuickResolver(resolver);
             for (int i = 0 ; i < 10 ; i++)
             {
                 // now do a few runs of randomized inputs
                 for (Accumulator j : resolverInput(count, true))
                     builder.add(j);
                 checkResolverOutput(count, builder.build(), BTree.Dir.ASC);
-                builder.reuse();
+                builder = BTree.builder(Comparator.naturalOrder());
+                builder.setQuickResolver(resolver);
             }
             for (List<Accumulator> add : splitResolverInput(count))
             {
@@ -231,7 +233,6 @@
                     builder.addAll(new TreeSet<>(add));
             }
             checkResolverOutput(count, builder.build(), BTree.Dir.ASC);
-            builder.reuse();
         }
     }
 
@@ -278,7 +279,14 @@
                 builder.add(i);
             // for sorted input, check non-resolve path works before checking resolution path
             Assert.assertTrue(Iterables.elementsEqual(sorted, BTree.iterable(builder.build())));
+
+            builder = BTree.builder(Comparator.naturalOrder());
+            builder.auto(false);
+            for (Accumulator i : sorted)
+                builder.add(i);
+            // check resolution path
             checkResolverOutput(count, builder.resolve(resolver).build(), BTree.Dir.ASC);
+
             builder = BTree.builder(Comparator.naturalOrder());
             builder.auto(false);
             for (int i = 0 ; i < 10 ; i++)
@@ -287,11 +295,13 @@
                 for (Accumulator j : resolverInput(count, true))
                     builder.add(j);
                 checkResolverOutput(count, builder.sort().resolve(resolver).build(), BTree.Dir.ASC);
-                builder.reuse();
+                builder = BTree.builder(Comparator.naturalOrder());
+                builder.auto(false);
                 for (Accumulator j : resolverInput(count, true))
                     builder.add(j);
                 checkResolverOutput(count, builder.sort().reverse().resolve(resolver).build(), BTree.Dir.DESC);
-                builder.reuse();
+                builder = BTree.builder(Comparator.naturalOrder());
+                builder.auto(false);
             }
         }
     }
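
Editor's note: the hunks above drop Builder.reuse() in favour of constructing a fresh builder for every run. A minimal sketch of that pattern, using only calls exercised in the test (BTree.builder, add, build, size); the nested Builder type name is an assumption, not verified against the full API:

    import java.util.Comparator;
    import org.apache.cassandra.utils.btree.BTree;

    public class BuilderPerRunSketch
    {
        public static void main(String[] args)
        {
            BTree.Builder<Integer> builder = BTree.builder(Comparator.naturalOrder());
            for (int i = 0; i < 100; i++)
                builder.add(i);
            Object[] first = builder.build();

            // instead of builder.reuse(), start over with a new builder for the next tree
            builder = BTree.builder(Comparator.naturalOrder());
            for (int i = 100; i < 200; i++)
                builder.add(i);
            Object[] second = builder.build();

            assert BTree.size(first) == 100 && BTree.size(second) == 100;
        }
    }
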
diff --git a/test/unit/org/apache/cassandra/utils/ByteBufferUtilTest.java b/test/unit/org/apache/cassandra/utils/ByteBufferUtilTest.java
index 3f34102..f2746f6 100644
--- a/test/unit/org/apache/cassandra/utils/ByteBufferUtilTest.java
+++ b/test/unit/org/apache/cassandra/utils/ByteBufferUtilTest.java
@@ -24,7 +24,9 @@
 import java.nio.ByteBuffer;
 import java.nio.charset.CharacterCodingException;
 import java.util.Arrays;
+import java.util.concurrent.ThreadLocalRandom;
 
+import org.junit.Assert;
 import org.junit.Test;
 
 import org.apache.cassandra.io.util.DataOutputBuffer;
@@ -247,4 +249,55 @@
         assertEquals(bb, bb2);
         assertEquals("0102", s);
     }
+
+    @Test
+    public void testStartsAndEndsWith()
+    {
+        byte[] bytes = new byte[512];
+        ThreadLocalRandom random = ThreadLocalRandom.current();
+
+        random.nextBytes(bytes);
+
+        ByteBuffer a = ByteBuffer.wrap(bytes);
+        ByteBuffer b = a.duplicate();
+
+        // let's take random slices of a and match
+        for (int i = 0; i < 512; i++)
+        {
+            // prefix from the original offset
+            b.position(0).limit(a.remaining() - random.nextInt(0, a.remaining() - 1));
+            Assert.assertTrue(ByteBufferUtil.startsWith(a, b));
+            Assert.assertTrue(ByteBufferUtil.startsWith(a, b.slice()));
+
+            // prefix from random position inside of array
+            int pos = random.nextInt(1, a.remaining() - 5);
+            a.position(pos);
+            b.limit(bytes.length - 1).position(pos);
+
+            Assert.assertTrue(ByteBufferUtil.startsWith(a, b));
+
+            a.position(0);
+
+            // endsWith at random position
+            b.limit(a.remaining()).position(random.nextInt(0, a.remaining() - 1));
+            Assert.assertTrue(ByteBufferUtil.endsWith(a, b));
+            Assert.assertTrue(ByteBufferUtil.endsWith(a, b.slice()));
+
+        }
+
+        a.limit(bytes.length - 1).position(0);
+        b.limit(bytes.length - 1).position(1);
+
+        Assert.assertFalse(ByteBufferUtil.startsWith(a, b));
+        Assert.assertFalse(ByteBufferUtil.startsWith(a, b.slice()));
+
+        Assert.assertTrue(ByteBufferUtil.endsWith(a, b));
+        Assert.assertTrue(ByteBufferUtil.endsWith(a, b.slice()));
+
+
+        a.position(5);
+
+        Assert.assertFalse(ByteBufferUtil.startsWith(a, b));
+        Assert.assertFalse(ByteBufferUtil.endsWith(a, b));
+    }
 }
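
Editor's note: the semantics the new test exercises are that startsWith/endsWith compare the remaining bytes of the second buffer against the head or tail of the first, without consuming either buffer. A minimal sketch under that assumption:

    import java.nio.ByteBuffer;
    import org.apache.cassandra.utils.ByteBufferUtil;

    public class StartsEndsWithSketch
    {
        public static void main(String[] args)
        {
            ByteBuffer full   = ByteBuffer.wrap("cassandra".getBytes());
            ByteBuffer prefix = ByteBuffer.wrap("cass".getBytes());
            ByteBuffer suffix = ByteBuffer.wrap("andra".getBytes());

            assert ByteBufferUtil.startsWith(full, prefix);   // "cassandra" starts with "cass"
            assert ByteBufferUtil.endsWith(full, suffix);     // "cassandra" ends with "andra"
            assert !ByteBufferUtil.startsWith(full, suffix);  // but does not start with "andra"
        }
    }
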
diff --git a/test/unit/org/apache/cassandra/utils/CassandraVersionTest.java b/test/unit/org/apache/cassandra/utils/CassandraVersionTest.java
index 145b735..73562b7 100644
--- a/test/unit/org/apache/cassandra/utils/CassandraVersionTest.java
+++ b/test/unit/org/apache/cassandra/utils/CassandraVersionTest.java
@@ -17,9 +17,16 @@
  */
 package org.apache.cassandra.utils;
 
+import java.lang.reflect.InvocationTargetException;
+import java.lang.reflect.Method;
+import java.util.Arrays;
 import org.junit.Test;
 
+import static org.junit.Assert.assertArrayEquals;
+import static org.junit.Assert.assertThat;
 import static org.junit.Assert.assertTrue;
+import static org.junit.Assert.fail;
+import static org.junit.matchers.JUnitMatchers.containsString;
 
 public class CassandraVersionTest
 {
@@ -29,14 +36,18 @@
         CassandraVersion version;
 
         version = new CassandraVersion("1.2.3");
-        assert version.major == 1 && version.minor == 2 && version.patch == 3;
+        assertTrue(version.major == 1 && version.minor == 2 && version.patch == 3);
 
         version = new CassandraVersion("1.2.3-foo.2+Bar");
-        assert version.major == 1 && version.minor == 2 && version.patch == 3;
+        assertTrue(version.major == 1 && version.minor == 2 && version.patch == 3);
 
         // CassandraVersion can parse 4th '.' as build number
         version = new CassandraVersion("1.2.3.456");
-        assert version.major == 1 && version.minor == 2 && version.patch == 3;
+        assertTrue(version.major == 1 && version.minor == 2 && version.patch == 3);
+
+        // support for tick-tock release
+        version = new CassandraVersion("3.2");
+        assertTrue(version.major == 3 && version.minor == 2 && version.patch == 0);
     }
 
     @Test
@@ -46,32 +57,32 @@
 
         v1 = new CassandraVersion("1.2.3");
         v2 = new CassandraVersion("1.2.4");
-        assert v1.compareTo(v2) == -1;
+        assertTrue(v1.compareTo(v2) == -1);
 
         v1 = new CassandraVersion("1.2.3");
         v2 = new CassandraVersion("1.2.3");
-        assert v1.compareTo(v2) == 0;
+        assertTrue(v1.compareTo(v2) == 0);
 
         v1 = new CassandraVersion("1.2.3");
         v2 = new CassandraVersion("2.0.0");
-        assert v1.compareTo(v2) == -1;
-        assert v2.compareTo(v1) == 1;
+        assertTrue(v1.compareTo(v2) == -1);
+        assertTrue(v2.compareTo(v1) == 1);
 
         v1 = new CassandraVersion("1.2.3");
         v2 = new CassandraVersion("1.2.3-alpha");
-        assert v1.compareTo(v2) == 1;
+        assertTrue(v1.compareTo(v2) == 1);
 
         v1 = new CassandraVersion("1.2.3");
         v2 = new CassandraVersion("1.2.3+foo");
-        assert v1.compareTo(v2) == -1;
+        assertTrue(v1.compareTo(v2) == -1);
 
         v1 = new CassandraVersion("1.2.3");
         v2 = new CassandraVersion("1.2.3-alpha+foo");
-        assert v1.compareTo(v2) == 1;
+        assertTrue(v1.compareTo(v2) == 1);
 
         v1 = new CassandraVersion("1.2.3-alpha+1");
         v2 = new CassandraVersion("1.2.3-alpha+2");
-        assert v1.compareTo(v2) == -1;
+        assertTrue(v1.compareTo(v2) == -1);
     }
 
     @Test
@@ -80,33 +91,32 @@
         CassandraVersion v1, v2;
 
         v1 = new CassandraVersion("3.0.2");
-        assert v1.isSupportedBy(v1);
+        assertTrue(v1.isSupportedBy(v1));
 
         v1 = new CassandraVersion("1.2.3");
         v2 = new CassandraVersion("1.2.4");
-        assert v1.isSupportedBy(v2);
-        assert !v2.isSupportedBy(v1);
+        assertTrue(v1.isSupportedBy(v2));
+        assertTrue(!v2.isSupportedBy(v1));
 
         v1 = new CassandraVersion("1.2.3");
         v2 = new CassandraVersion("1.3.3");
-        assert v1.isSupportedBy(v2);
-        assert !v2.isSupportedBy(v1);
+        assertTrue(v1.isSupportedBy(v2));
+        assertTrue(!v2.isSupportedBy(v1));
 
         v1 = new CassandraVersion("2.2.3");
         v2 = new CassandraVersion("1.3.3");
-        assert !v1.isSupportedBy(v2);
-        assert !v2.isSupportedBy(v1);
+        assertTrue(!v1.isSupportedBy(v2));
+        assertTrue(!v2.isSupportedBy(v1));
 
         v1 = new CassandraVersion("3.1.0");
         v2 = new CassandraVersion("3.0.1");
-        assert !v1.isSupportedBy(v2);
-        assert v2.isSupportedBy(v1);
+        assertTrue(!v1.isSupportedBy(v2));
+        assertTrue(v2.isSupportedBy(v1));
     }
 
     @Test
     public void testInvalid()
     {
-        assertThrows("1.0");
         assertThrows("1.0.0a");
         assertThrows("1.a.4");
         assertThrows("1.0.0-foo&");
@@ -132,15 +142,74 @@
         prev = next;
         next = new CassandraVersion("2.2.0");
         assertTrue(prev.compareTo(next) < 0);
-    }
 
+        prev = next;
+        next = new CassandraVersion("3.1");
+        assertTrue(prev.compareTo(next) < 0);
+
+        prev = next;
+        next = new CassandraVersion("3.1.1");
+        assertTrue(prev.compareTo(next) < 0);
+
+        prev = next;
+        next = new CassandraVersion("3.2-rc1-SNAPSHOT");
+        assertTrue(prev.compareTo(next) < 0);
+
+        prev = next;
+        next = new CassandraVersion("3.2");
+        assertTrue(prev.compareTo(next) < 0);
+    }
+    
     private static void assertThrows(String str)
     {
         try
         {
             new CassandraVersion(str);
-            assert false;
+            fail();
         }
         catch (IllegalArgumentException e) {}
     }
+    
+    @Test
+    public void testParseIdentifiersPositive() throws Throwable
+    {
+        String[] result = parseIdentifiers("DUMMY", "+a.b.cde.f_g.");
+        String[] expected = {"a", "b", "cde", "f_g"};
+        assertArrayEquals(expected, result);
+    }
+    
+    @Test
+    public void testParseIdentifiersNegative() throws Throwable
+    {
+        String version = "DUMMY";
+        try
+        {
+            parseIdentifiers(version, "+a. .b");
+            fail();
+        }
+        catch (IllegalArgumentException e)
+        {
+            assertThat(e.getMessage(), containsString(version));
+        }
+    }
+    private static String[] parseIdentifiers(String version, String str) throws Throwable
+    {
+        String name = "parseIdentifiers";
+        Class[] args = {String.class, String.class};
+        for (Method m: CassandraVersion.class.getDeclaredMethods())
+        {
+            if (name.equals(m.getName()) && 
+                    Arrays.equals(args, m.getParameterTypes()))
+            {
+                m.setAccessible(true);
+                try
+                {
+                    return (String[]) m.invoke(null, version, str);
+                } catch (InvocationTargetException e) {
+                    throw e.getTargetException();
+                }
+            }
+        }
+        throw new NoSuchMethodException(CassandraVersion.class + "." + name + Arrays.toString(args));
+    }
 }
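
Editor's note: the new assertions add tick-tock support, where a two-component version string parses with patch defaulting to 0 and sorts as expected against three-component and pre-release versions. A minimal sketch mirroring those assertions:

    import org.apache.cassandra.utils.CassandraVersion;

    public class TickTockVersionSketch
    {
        public static void main(String[] args)
        {
            CassandraVersion tickTock = new CassandraVersion("3.2");
            assert tickTock.major == 3 && tickTock.minor == 2 && tickTock.patch == 0;

            assert new CassandraVersion("3.1.1").compareTo(tickTock) < 0;
            assert new CassandraVersion("3.2-rc1-SNAPSHOT").compareTo(tickTock) < 0;
        }
    }
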
diff --git a/test/unit/org/apache/cassandra/utils/FBUtilitiesTest.java b/test/unit/org/apache/cassandra/utils/FBUtilitiesTest.java
index 90c5f05..3c1ea74 100644
--- a/test/unit/org/apache/cassandra/utils/FBUtilitiesTest.java
+++ b/test/unit/org/apache/cassandra/utils/FBUtilitiesTest.java
@@ -1,4 +1,4 @@
-/**
+/*
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
@@ -22,12 +22,17 @@
 import java.nio.ByteBuffer;
 import java.nio.charset.CharacterCodingException;
 import java.nio.charset.StandardCharsets;
+import java.util.Map;
+import java.util.Optional;
+import java.util.TreeMap;
 
 import com.google.common.primitives.Ints;
+
+import org.junit.Assert;
 import org.junit.Test;
 
-import java.util.Map;
-import java.util.TreeMap;
+import org.apache.cassandra.db.marshal.*;
+import org.apache.cassandra.dht.*;
 
 import static org.junit.Assert.assertEquals;
 import static org.junit.Assert.fail;
@@ -95,4 +100,51 @@
         ByteBuffer bytes = ByteBuffer.wrap(new byte[]{(byte)0xff, (byte)0xfe});
         ByteBufferUtil.string(bytes, StandardCharsets.UTF_8);
     }
+
+    private static void assertPartitioner(String name, Class expected)
+    {
+        Assert.assertTrue(String.format("%s != %s", name, expected.toString()),
+                          expected.isInstance(FBUtilities.newPartitioner(name)));
+    }
+
+    /**
+     * Check that given a name, the correct partitioner instance is created.
+     *
+     * If the assertions in this test start failing, it likely means the sstabledump/sstablemetadata tools will
+     * also fail to read existing sstables.
+     */
+    @Test
+    public void testNewPartitionerNoArgConstructors()
+    {
+        assertPartitioner("ByteOrderedPartitioner", ByteOrderedPartitioner.class);
+        assertPartitioner("LengthPartitioner", LengthPartitioner.class);
+        assertPartitioner("Murmur3Partitioner", Murmur3Partitioner.class);
+        assertPartitioner("OrderPreservingPartitioner", OrderPreservingPartitioner.class);
+        assertPartitioner("RandomPartitioner", RandomPartitioner.class);
+        assertPartitioner("org.apache.cassandra.dht.ByteOrderedPartitioner", ByteOrderedPartitioner.class);
+        assertPartitioner("org.apache.cassandra.dht.LengthPartitioner", LengthPartitioner.class);
+        assertPartitioner("org.apache.cassandra.dht.Murmur3Partitioner", Murmur3Partitioner.class);
+        assertPartitioner("org.apache.cassandra.dht.OrderPreservingPartitioner", OrderPreservingPartitioner.class);
+        assertPartitioner("org.apache.cassandra.dht.RandomPartitioner", RandomPartitioner.class);
+    }
+
+    /**
+     * Check that we can instantiate local partitioner correctly and that we can pass the correct type
+     * to it as a constructor argument.
+     *
+     * If the assertions in this test start failing, it likely means the sstabledump/sstablemetadata tools will
+     * also fail to read existing sstables.
+     */
+    @Test
+    public void testNewPartitionerLocalPartitioner()
+    {
+        for (String name : new String[] {"LocalPartitioner", "org.apache.cassandra.dht.LocalPartitioner"})
+            for (AbstractType<?> type : new AbstractType<?>[] {UUIDType.instance, ListType.getInstance(Int32Type.instance, true)})
+            {
+                IPartitioner partitioner = FBUtilities.newPartitioner(name, Optional.of(type));
+                Assert.assertTrue(String.format("%s != LocalPartitioner", partitioner.toString()),
+                                  LocalPartitioner.class.isInstance(partitioner));
+                Assert.assertEquals(partitioner.partitionOrdering(), type);
+            }
+    }
 }
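
Editor's note: the two new tests pin down the lookup behaviour that sstabledump/sstablemetadata rely on: short and fully qualified partitioner names resolve to the same class, and LocalPartitioner takes its comparator type as an extra argument. A minimal sketch using only the calls shown above:

    import java.util.Optional;
    import org.apache.cassandra.db.marshal.UUIDType;
    import org.apache.cassandra.dht.IPartitioner;
    import org.apache.cassandra.dht.LocalPartitioner;
    import org.apache.cassandra.dht.Murmur3Partitioner;
    import org.apache.cassandra.utils.FBUtilities;

    public class NewPartitionerSketch
    {
        public static void main(String[] args)
        {
            // short and fully qualified names resolve to the same partitioner class
            IPartitioner p1 = FBUtilities.newPartitioner("Murmur3Partitioner");
            IPartitioner p2 = FBUtilities.newPartitioner("org.apache.cassandra.dht.Murmur3Partitioner");
            assert p1 instanceof Murmur3Partitioner && p2 instanceof Murmur3Partitioner;

            // LocalPartitioner additionally needs the type it orders by
            IPartitioner local = FBUtilities.newPartitioner("LocalPartitioner", Optional.of(UUIDType.instance));
            assert local instanceof LocalPartitioner;
            assert UUIDType.instance.equals(local.partitionOrdering());
        }
    }
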
diff --git a/test/unit/org/apache/cassandra/utils/KillerForTests.java b/test/unit/org/apache/cassandra/utils/KillerForTests.java
index abc7952..fe9aa45 100644
--- a/test/unit/org/apache/cassandra/utils/KillerForTests.java
+++ b/test/unit/org/apache/cassandra/utils/KillerForTests.java
@@ -18,6 +18,8 @@
 
 package org.apache.cassandra.utils;
 
+import org.junit.Assert;
+
 /**
  * Responsible for stubbing out the System.exit() logic during unit tests.
  */
@@ -25,10 +27,24 @@
 {
     private boolean killed = false;
     private boolean quiet = false;
+    private final boolean expected;
+
+    public KillerForTests()
+    {
+        expected = true;
+    }
+
+    public KillerForTests(boolean expectFailure)
+    {
+        expected = expectFailure;
+    }
 
     @Override
     protected void killCurrentJVM(Throwable t, boolean quiet)
     {
+        if (!expected)
+            Assert.fail("Saw JVM Kill but did not expect it.");
+
         this.killed = true;
         this.quiet = quiet;
     }
diff --git a/test/unit/org/apache/cassandra/utils/NanoTimeToCurrentTimeMillisTest.java b/test/unit/org/apache/cassandra/utils/NanoTimeToCurrentTimeMillisTest.java
index 1662e77..b3dfad3 100644
--- a/test/unit/org/apache/cassandra/utils/NanoTimeToCurrentTimeMillisTest.java
+++ b/test/unit/org/apache/cassandra/utils/NanoTimeToCurrentTimeMillisTest.java
@@ -34,10 +34,7 @@
             now = Math.max(now, System.currentTimeMillis());
             if (ii % 10000 == 0)
             {
-                synchronized (NanoTimeToCurrentTimeMillis.TIMESTAMP_UPDATE)
-                {
-                    NanoTimeToCurrentTimeMillis.TIMESTAMP_UPDATE.notify();
-                }
+                NanoTimeToCurrentTimeMillis.updateNow();
                 Thread.sleep(1);
             }
 
diff --git a/test/unit/org/apache/cassandra/utils/StreamingHistogramTest.java b/test/unit/org/apache/cassandra/utils/StreamingHistogramTest.java
index b6b1882..94aac9e 100644
--- a/test/unit/org/apache/cassandra/utils/StreamingHistogramTest.java
+++ b/test/unit/org/apache/cassandra/utils/StreamingHistogramTest.java
@@ -22,6 +22,7 @@
 import java.util.Map;
 
 import org.junit.Test;
+
 import org.apache.cassandra.io.util.DataInputBuffer;
 import org.apache.cassandra.io.util.DataOutputBuffer;
 
@@ -50,11 +51,11 @@
         expected1.put(36.0, 1L);
 
         Iterator<Map.Entry<Double, Long>> expectedItr = expected1.entrySet().iterator();
-        for (Map.Entry<Double, Long> actual : hist.getAsMap().entrySet())
+        for (Map.Entry<Number, long[]> actual : hist.getAsMap().entrySet())
         {
             Map.Entry<Double, Long> entry = expectedItr.next();
-            assertEquals(entry.getKey(), actual.getKey(), 0.01);
-            assertEquals(entry.getValue(), actual.getValue());
+            assertEquals(entry.getKey(), actual.getKey().doubleValue(), 0.01);
+            assertEquals(entry.getValue().longValue(), actual.getValue()[0]);
         }
 
         // merge test
@@ -72,11 +73,11 @@
         expected2.put(32.67, 3L);
         expected2.put(45.0, 1L);
         expectedItr = expected2.entrySet().iterator();
-        for (Map.Entry<Double, Long> actual : hist.getAsMap().entrySet())
+        for (Map.Entry<Number, long[]> actual : hist.getAsMap().entrySet())
         {
             Map.Entry<Double, Long> entry = expectedItr.next();
-            assertEquals(entry.getKey(), actual.getKey(), 0.01);
-            assertEquals(entry.getValue(), actual.getValue());
+            assertEquals(entry.getKey(), actual.getKey().doubleValue(), 0.01);
+            assertEquals(entry.getValue().longValue(), actual.getValue()[0]);
         }
 
         // sum test
@@ -112,11 +113,40 @@
         expected1.put(36.0, 1L);
 
         Iterator<Map.Entry<Double, Long>> expectedItr = expected1.entrySet().iterator();
-        for (Map.Entry<Double, Long> actual : deserialized.getAsMap().entrySet())
+        for (Map.Entry<Number, long[]> actual : deserialized.getAsMap().entrySet())
         {
             Map.Entry<Double, Long> entry = expectedItr.next();
-            assertEquals(entry.getKey(), actual.getKey(), 0.01);
-            assertEquals(entry.getValue(), actual.getValue());
+            assertEquals(entry.getKey(), actual.getKey().doubleValue(), 0.01);
+            assertEquals(entry.getValue().longValue(), actual.getValue()[0]);
         }
     }
+
+
+    @Test
+    public void testNumericTypes() throws Exception
+    {
+        StreamingHistogram hist = new StreamingHistogram(5);
+
+        hist.update(2);
+        hist.update(2.0);
+        hist.update(2L);
+
+        Map<Number, long[]> asMap = hist.getAsMap();
+
+        assertEquals(1, asMap.size());
+        assertEquals(3L, asMap.get(2)[0]);
+
+        //Make sure it's working with Serde
+        DataOutputBuffer out = new DataOutputBuffer();
+        StreamingHistogram.serializer.serialize(hist, out);
+        byte[] bytes = out.toByteArray();
+
+        StreamingHistogram deserialized = StreamingHistogram.serializer.deserialize(new DataInputBuffer(bytes));
+
+        deserialized.update(2L);
+
+        asMap = deserialized.getAsMap();
+        assertEquals(1, asMap.size());
+        assertEquals(4L, asMap.get(2)[0]);
+    }
 }
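
Editor's note: the histogram's map now keys bins by Number and stores counts in a long[] cell, so int, long and double updates of the same value coalesce into one bin. A minimal sketch of that behaviour, mirroring testNumericTypes above:

    import java.util.Map;
    import org.apache.cassandra.utils.StreamingHistogram;

    public class NumericBinSketch
    {
        public static void main(String[] args)
        {
            StreamingHistogram hist = new StreamingHistogram(5);
            hist.update(2);     // int
            hist.update(2.0);   // double
            hist.update(2L);    // long

            Map<Number, long[]> asMap = hist.getAsMap();
            assert asMap.size() == 1;          // all three updates fell into one bin
            assert asMap.get(2)[0] == 3L;      // whose count is 3
        }
    }
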
diff --git a/test/unit/org/apache/cassandra/utils/UUIDTests.java b/test/unit/org/apache/cassandra/utils/UUIDTests.java
index 83e421a..0d57c47 100644
--- a/test/unit/org/apache/cassandra/utils/UUIDTests.java
+++ b/test/unit/org/apache/cassandra/utils/UUIDTests.java
@@ -22,12 +22,20 @@
 
 
 import java.nio.ByteBuffer;
+import java.util.Set;
 import java.util.UUID;
+import java.util.concurrent.ExecutorService;
+import java.util.concurrent.Executors;
+import java.util.concurrent.TimeUnit;
+import java.util.concurrent.atomic.AtomicBoolean;
 
 import org.junit.Test;
 
+import com.google.common.collect.Sets;
+
 import org.apache.cassandra.db.marshal.TimeUUIDType;
 import org.apache.cassandra.utils.UUIDGen;
+import org.cliffc.high_scale_lib.NonBlockingHashMap;
 
 
 public class UUIDTests
@@ -48,7 +56,6 @@
         assert one.timestamp() < two.timestamp();
     }
 
-
     @Test
     public void testDecomposeAndRaw()
     {
@@ -59,6 +66,15 @@
     }
 
     @Test
+    public void testToFromByteBuffer()
+    {
+        UUID a = UUIDGen.getTimeUUID();
+        ByteBuffer bb = UUIDGen.toByteBuffer(a);
+        UUID b = UUIDGen.getUUID(bb);
+        assert a.equals(b);
+    }
+
+    @Test
     public void testTimeUUIDType()
     {
         TimeUUIDType comp = TimeUUIDType.instance;
@@ -80,4 +96,53 @@
         // I'll be damned if the uuid timestamp is more than 10ms after now
         assert now <= tstamp && now >= tstamp - 10 : "now = " + now + ", timestamp = " + tstamp;
     }
+
+    /*
+     * Don't ignore spurious failures of this test since it is testing concurrent access
+     * and might not fail reliably.
+     */
+    @Test
+    public void verifyConcurrentUUIDGeneration() throws Throwable
+    {
+        long iterations = 250000;
+        int threads = 4;
+        ExecutorService es = Executors.newFixedThreadPool(threads);
+        try
+        {
+            AtomicBoolean failedOrdering = new AtomicBoolean(false);
+            AtomicBoolean failedDuplicate = new AtomicBoolean(false);
+            Set<UUID> generated = Sets.newSetFromMap(new NonBlockingHashMap<>());
+            Runnable task = () -> {
+                long lastTimestamp = 0;
+                long newTimestamp = 0;
+
+                for (long i = 0; i < iterations; i++)
+                {
+                    UUID uuid = UUIDGen.getTimeUUID();
+                    newTimestamp = uuid.timestamp();
+
+                    if (lastTimestamp >= newTimestamp)
+                        failedOrdering.set(true);
+                    if (!generated.add(uuid))
+                        failedDuplicate.set(true);
+
+                    lastTimestamp = newTimestamp;
+                }
+            };
+
+            for (int i = 0; i < threads; i++)
+            {
+                es.execute(task);
+            }
+            es.shutdown();
+            es.awaitTermination(10, TimeUnit.MINUTES);
+
+            assert !failedOrdering.get();
+            assert !failedDuplicate.get();
+        }
+        finally
+        {
+            es.shutdown();
+        }
+    }
 }
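
Editor's note: the new round-trip test relies on UUIDGen.toByteBuffer and UUIDGen.getUUID being inverses; the concurrency test additionally asserts that time UUIDs generated across threads are unique and per-thread monotonic. A minimal round-trip sketch:

    import java.nio.ByteBuffer;
    import java.util.UUID;
    import org.apache.cassandra.utils.UUIDGen;

    public class UuidRoundTripSketch
    {
        public static void main(String[] args)
        {
            UUID original = UUIDGen.getTimeUUID();
            ByteBuffer serialized = UUIDGen.toByteBuffer(original);
            UUID roundTripped = UUIDGen.getUUID(serialized);
            assert original.equals(roundTripped);
        }
    }
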
diff --git a/test/unit/org/apache/cassandra/utils/btree/BTreeRemovalTest.java b/test/unit/org/apache/cassandra/utils/btree/BTreeRemovalTest.java
new file mode 100644
index 0000000..a9cf383
--- /dev/null
+++ b/test/unit/org/apache/cassandra/utils/btree/BTreeRemovalTest.java
@@ -0,0 +1,385 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.cassandra.utils.btree;
+
+import static org.apache.cassandra.utils.btree.BTreeRemoval.remove;
+import static org.junit.Assert.assertArrayEquals;
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertNotNull;
+import static org.junit.Assert.assertNull;
+import static org.junit.Assert.assertTrue;
+
+import java.util.Comparator;
+import java.util.Random;
+import java.util.SortedSet;
+import java.util.TreeSet;
+
+import org.junit.Test;
+
+import com.google.common.collect.Iterables;
+
+public class BTreeRemovalTest
+{
+    static
+    {
+        System.setProperty("cassandra.btree.fanfactor", "8");
+    }
+
+    private static final Comparator<Integer> CMP = new Comparator<Integer>()
+    {
+        public int compare(Integer o1, Integer o2)
+        {
+            return Integer.compare(o1, o2);
+        }
+    };
+
+    private static Object[] copy(final Object[] btree)
+    {
+        final Object[] result = new Object[btree.length];
+        System.arraycopy(btree, 0, result, 0, btree.length);
+        if (!BTree.isLeaf(btree))
+        {
+            for (int i = BTree.getChildStart(btree); i < BTree.getChildEnd(btree); ++i)
+                result[i] = copy((Object[]) btree[i]);
+            final int[] sizeMap = BTree.getSizeMap(btree);
+            final int[] resultSizeMap = new int[sizeMap.length];
+            System.arraycopy(sizeMap, 0, resultSizeMap, 0, sizeMap.length);
+            result[result.length - 1] = resultSizeMap;
+        }
+        return result;
+    }
+
+    private static Object[] assertRemove(final Object[] btree, final int key)
+    {
+        final Object[] btreeBeforeRemoval = copy(btree);
+        final Object[] result = remove(btree, CMP, key);
+        assertBTree(btreeBeforeRemoval, btree);
+        assertTrue(BTree.isWellFormed(result, CMP));
+        assertEquals(BTree.size(btree) - 1, BTree.size(result));
+        assertNull(BTree.find(result, CMP, key));
+
+        for (Integer k : BTree.<Integer>iterable(btree))
+            if (k != key)
+                assertNotNull(BTree.find(result, CMP, k));
+
+        return result;
+    }
+
+    private static void assertBTree(final Object[] expected, final Object[] result)
+    {
+        assertEquals(BTree.isEmpty(expected), BTree.isEmpty(result));
+        assertEquals(BTree.isLeaf(expected), BTree.isLeaf(result));
+        assertEquals(expected.length, result.length);
+        if (BTree.isLeaf(expected))
+        {
+            assertArrayEquals(expected, result);
+        }
+        else
+        {
+            for (int i = 0; i < BTree.getBranchKeyEnd(expected); ++i)
+                assertEquals(expected[i], result[i]);
+            for (int i = BTree.getChildStart(expected); i < BTree.getChildEnd(expected); ++i)
+                assertBTree((Object[]) expected[i], (Object[]) result[i]);
+            assertArrayEquals(BTree.getSizeMap(expected), BTree.getSizeMap(result));
+        }
+    }
+
+    private static Object[] generateLeaf(int from, int size)
+    {
+        final Object[] result = new Object[(size & 1) == 1 ? size : size + 1];
+        for (int i = 0; i < size; ++i)
+            result[i] = from + i;
+        return result;
+    }
+
+    private static Object[] generateBranch(int[] keys, Object[][] children)
+    {
+        assert keys.length > 0;
+        assert children.length > 1;
+        assert children.length == keys.length + 1;
+        final Object[] result = new Object[keys.length + children.length + 1];
+        for (int i = 0; i < keys.length; ++i)
+            result[i] = keys[i];
+        for (int i = 0; i < children.length; ++i)
+            result[keys.length + i] = children[i];
+        final int[] sizeMap = new int[children.length];
+        sizeMap[0] = BTree.size(children[0]);
+        for (int i = 1; i < children.length; ++i)
+            sizeMap[i] = sizeMap[i - 1] + BTree.size(children[i]) + 1;
+        result[result.length - 1] = sizeMap;
+        return result;
+    }
+
+    private static Object[] generateSampleTwoLevelsTree(final int[] leafSizes)
+    {
+        assert leafSizes.length > 1;
+        final Object[][] leaves = new Object[leafSizes.length][];
+        for (int i = 0; i < leaves.length; ++i)
+            leaves[i] = generateLeaf(10 * i + 1, leafSizes[i]);
+        final int[] keys = new int[leafSizes.length - 1];
+        for (int i = 0; i < keys.length; ++i)
+            keys[i] = 10 * (i + 1);
+        final Object[] btree = generateBranch(keys, leaves);
+        assertTrue(BTree.isWellFormed(btree, CMP));
+        return btree;
+    }
+
+    private static Object[] generateSampleThreeLevelsTree(final int[] middleNodeSizes)
+    {
+        assert middleNodeSizes.length > 1;
+        final Object[][] middleNodes = new Object[middleNodeSizes.length][];
+        for (int i = 0; i < middleNodes.length; ++i)
+        {
+            final Object[][] leaves = new Object[middleNodeSizes[i]][];
+            for (int j = 0; j < middleNodeSizes[i]; ++j)
+                leaves[j] = generateLeaf(100 * i + 10 * j + 1, 4);
+            final int[] keys = new int[middleNodeSizes[i] - 1];
+            for (int j = 0; j < keys.length; ++j)
+                keys[j] = 100 * i + 10 * (j + 1);
+            middleNodes[i] = generateBranch(keys, leaves);
+        }
+        final int[] keys = new int[middleNodeSizes.length - 1];
+        for (int i = 0; i < keys.length; ++i)
+            keys[i] = 100 * (i + 1);
+        final Object[] btree = generateBranch(keys, middleNodes);
+        assertTrue(BTree.isWellFormed(btree, CMP));
+        return btree;
+    }
+
+    @Test
+    public void testRemoveFromEmpty()
+    {
+        assertBTree(BTree.empty(), remove(BTree.empty(), CMP, 1));
+    }
+
+    @Test
+    public void testRemoveNonexistingElement()
+    {
+        final Object[] btree = new Object[] {1, 2, 3, 4, null};
+        assertBTree(btree, remove(btree, CMP, 5));
+    }
+
+    @Test
+    public void testRemoveLastElement()
+    {
+        final Object[] btree = new Object[] {1};
+        assertBTree(BTree.empty(), remove(btree, CMP, 1));
+    }
+
+    @Test
+    public void testRemoveFromRootWhichIsALeaf()
+    {
+        for (int size = 1; size < 9; ++size)
+        {
+            final Object[] btree = new Object[(size & 1) == 1 ? size : size + 1];
+            for (int i = 0; i < size; ++i)
+                btree[i] = i + 1;
+            for (int i = 0; i < size; ++i)
+            {
+                final Object[] result = remove(btree, CMP, i + 1);
+                assertTrue("size " + size, BTree.isWellFormed(result, CMP));
+                for (int j = 0; j < i; ++j)
+                    assertEquals("size " + size + " elem " + j, btree[j], result[j]);
+                for (int j = i; j < size - 1; ++j)
+                    assertEquals("size " + size + " elem " + j, btree[j + 1], result[j]);
+                for (int j = size - 1; j < result.length; ++j)
+                    assertNull("size " + size + " elem " + j, result[j]);
+            }
+
+            {
+                final Object[] result = remove(btree, CMP, 0);
+                assertTrue("size " + size, BTree.isWellFormed(result, CMP));
+                assertBTree(btree, result);
+            }
+
+            {
+                final Object[] result = remove(btree, CMP, size + 1);
+                assertTrue("size " + size, BTree.isWellFormed(result, CMP));
+                assertBTree(btree, result);
+            }
+        }
+    }
+
+    @Test
+    public void testRemoveFromNonMinimalLeaf()
+    {
+        for (int size = 5; size < 9; ++size)
+        {
+            final Object[] btree = generateSampleTwoLevelsTree(new int[] {size, 4, 4, 4, 4});
+
+            for (int i = 1; i < size + 1; ++i)
+                assertRemove(btree, i);
+        }
+    }
+
+    @Test
+    public void testRemoveFromMinimalLeafRotateLeft()
+    {
+        final Object[] btree = generateSampleTwoLevelsTree(new int[] {4, 5, 5, 5, 5});
+
+        for (int i = 11; i < 15; ++i)
+            assertRemove(btree, i);
+    }
+
+    @Test
+    public void testRemoveFromMinimalLeafRotateRight1()
+    {
+        final Object[] btree = generateSampleTwoLevelsTree(new int[] {4, 5, 5, 5, 5});
+
+        for (int i = 1; i < 5; ++i)
+            assertRemove(btree, i);
+    }
+
+    @Test
+    public void testRemoveFromMinimalLeafRotateRight2()
+    {
+        final Object[] btree = generateSampleTwoLevelsTree(new int[] {4, 4, 5, 5, 5});
+
+        for (int i = 11; i < 15; ++i)
+            assertRemove(btree, i);
+    }
+
+    @Test
+    public void testRemoveFromMinimalLeafMergeWithLeft1()
+    {
+        final Object[] btree = generateSampleTwoLevelsTree(new int[] {4, 4, 4, 4, 4});
+
+        for (int i = 11; i < 15; ++i)
+            assertRemove(btree, i);
+    }
+
+    @Test
+    public void testRemoveFromMinimalLeafMergeWithLeft2()
+    {
+        final Object[] btree = generateSampleTwoLevelsTree(new int[] {4, 4, 4, 4, 4});
+
+        for (int i = 41; i < 45; ++i)
+            assertRemove(btree, i);
+    }
+
+    @Test
+    public void testRemoveFromMinimalLeafMergeWithRight()
+    {
+        final Object[] btree = generateSampleTwoLevelsTree(new int[] {4, 4, 4, 4, 4});
+
+        for (int i = 1; i < 5; ++i)
+            assertRemove(btree, i);
+    }
+
+    @Test
+    public void testRemoveFromMinimalLeafWhenSingleKeyRootMergeWithLeft()
+    {
+        final Object[] btree = generateSampleTwoLevelsTree(new int[] {4, 4});
+
+        for (int i = 1; i < 5; ++i)
+            assertRemove(btree, i);
+    }
+
+    @Test
+    public void testRemoveFromMinimalLeafWhenSingleKeyRootMergeWithRight()
+    {
+        final Object[] btree = generateSampleTwoLevelsTree(new int[] {4, 4});
+
+        for (int i = 11; i < 15; ++i)
+            assertRemove(btree, i);
+    }
+
+    @Test
+    public void testRemoveFromMinimalLeafWithBranchLeftRotation()
+    {
+        final Object[] btree = generateSampleThreeLevelsTree(new int[] {6, 5, 5, 5, 5});
+        for (int i = 101; i < 105; ++i)
+            assertRemove(btree, i);
+    }
+
+    @Test
+    public void testRemoveFromMinimalLeafWithBranchRightRotation1()
+    {
+        final Object[] btree = generateSampleThreeLevelsTree(new int[] {5, 6, 5, 5, 5});
+        for (int i = 1; i < 5; ++i)
+            assertRemove(btree, i);
+    }
+
+    @Test
+    public void testRemoveFromMinimalLeafWithBranchRightRotation2()
+    {
+        final Object[] btree = generateSampleThreeLevelsTree(new int[] {5, 5, 6, 5, 5});
+        for (int i = 101; i < 105; ++i)
+            assertRemove(btree, i);
+    }
+
+    @Test
+    public void testRemoveFromMinimalLeafWithBranchMergeWithLeft1()
+    {
+        final Object[] btree = generateSampleThreeLevelsTree(new int[] {5, 5, 5, 5, 5});
+        for (int i = 101; i < 105; ++i)
+            assertRemove(btree, i);
+    }
+
+    @Test
+    public void testRemoveFromMinimalLeafWithBranchMergeWithLeft2()
+    {
+        final Object[] btree = generateSampleThreeLevelsTree(new int[] {5, 5, 5, 5, 5});
+        for (int i = 401; i < 405; ++i)
+            assertRemove(btree, i);
+    }
+
+    @Test
+    public void testRemoveFromMinimalLeafWithBranchMergeWithRight()
+    {
+        final Object[] btree = generateSampleThreeLevelsTree(new int[] {5, 5, 5, 5, 5});
+        for (int i = 1; i < 5; ++i)
+            assertRemove(btree, i);
+    }
+
+    @Test
+    public void testRemoveFromMiddleBranch()
+    {
+        final Object[] btree = generateSampleThreeLevelsTree(new int[] {5, 5, 5, 5, 5});
+        for (int i = 10; i < 50; i += 10)
+            assertRemove(btree, i);
+    }
+
+    @Test
+    public void testRemoveFromRootBranch()
+    {
+        final Object[] btree = generateSampleThreeLevelsTree(new int[] {5, 5, 5, 5, 5});
+        for (int i = 100; i < 500; i += 100)
+            assertRemove(btree, i);
+    }
+
+    @Test
+    public void randomizedTest()
+    {
+        Random rand = new Random(2);
+        SortedSet<Integer> data = new TreeSet<>();
+        for (int i = 0; i < 1000; ++i)
+            data.add(rand.nextInt());
+        Object[] btree = BTree.build(data, UpdateFunction.<Integer>noOp());
+
+        assertTrue(BTree.isWellFormed(btree, CMP));
+        assertTrue(Iterables.elementsEqual(data, BTree.iterable(btree)));
+        while (btree != BTree.empty())
+        {
+            int idx = rand.nextInt(BTree.size(btree));
+            Integer val = BTree.findByIndex(btree, idx);
+            assertTrue(data.remove(val));
+            btree = assertRemove(btree, val);
+        }
+    }
+}
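
Editor's note: BTreeRemovalTest covers the new BTreeRemoval.remove operation, which returns a new well-formed tree with the key absent while leaving the original tree untouched. A minimal sketch using only the helpers the test itself calls (signatures taken from the test, not verified against the full API):

    import java.util.Comparator;
    import java.util.TreeSet;
    import org.apache.cassandra.utils.btree.BTree;
    import org.apache.cassandra.utils.btree.BTreeRemoval;
    import org.apache.cassandra.utils.btree.UpdateFunction;

    public class BTreeRemoveSketch
    {
        public static void main(String[] args)
        {
            Comparator<Integer> cmp = Comparator.naturalOrder();
            TreeSet<Integer> data = new TreeSet<>();
            for (int i = 0; i < 20; i++)
                data.add(i);

            Object[] btree = BTree.build(data, UpdateFunction.<Integer>noOp());
            Object[] smaller = BTreeRemoval.remove(btree, cmp, 7);

            assert BTree.isWellFormed(smaller, cmp);
            assert BTree.size(smaller) == BTree.size(btree) - 1;
            assert BTree.find(smaller, cmp, 7) == null;
        }
    }
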
diff --git a/tools/stress/README.txt b/tools/stress/README.txt
index e560c08..aa89dab 100644
--- a/tools/stress/README.txt
+++ b/tools/stress/README.txt
@@ -72,6 +72,8 @@
         The port to connect to cassandra nodes on
     -sendto:
         Specify a stress server to send this command to
+    -graph:
+        Graph recorded metrics
     -tokenrange:
         Token range settings
 
diff --git a/tools/stress/src/org/apache/cassandra/stress/Operation.java b/tools/stress/src/org/apache/cassandra/stress/Operation.java
index 8054482..16f6f04 100644
--- a/tools/stress/src/org/apache/cassandra/stress/Operation.java
+++ b/tools/stress/src/org/apache/cassandra/stress/Operation.java
@@ -1,4 +1,4 @@
-/**
+/*
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
@@ -15,6 +15,7 @@
  * See the License for the specific language governing permissions and
  * limitations under the License.
  */
+
 package org.apache.cassandra.stress;
 
 import java.io.IOException;
@@ -32,7 +33,7 @@
 public abstract class Operation
 {
     public final StressSettings settings;
-    public final Timer timer;
+    private final Timer timer;
 
     public Operation(Timer timer, StressSettings settings)
     {
@@ -47,7 +48,7 @@
         public int rowCount();
     }
 
-    public abstract boolean ready(WorkManager permits, RateLimiter rateLimiter);
+    public abstract int ready(WorkManager permits);
 
     public boolean isWrite()
     {
@@ -61,15 +62,17 @@
      */
     public abstract void run(ThriftClient client) throws IOException;
 
-    public void run(SimpleClient client) throws IOException {
+    public void run(SimpleClient client) throws IOException
+    {
         throw new UnsupportedOperationException();
     }
 
-    public void run(JavaDriverClient client) throws IOException {
+    public void run(JavaDriverClient client) throws IOException
+    {
         throw new UnsupportedOperationException();
     }
 
-    public void timeWithRetry(RunOp run) throws IOException
+    public final void timeWithRetry(RunOp run) throws IOException
     {
         timer.start();
 
@@ -105,7 +108,7 @@
                 exceptionMessage = getExceptionMessage(e);
             }
         }
-
+        
         timer.stop(run.partitionCount(), run.rowCount(), !success);
 
         if (!success)
@@ -137,4 +140,13 @@
             System.err.println(message);
     }
 
+    public void close()
+    {
+        timer.close();
+    }
+
+    public void intendedStartNs(long intendedTime)
+    {
+        timer.intendedTimeNs(intendedTime);
+    }
 }
diff --git a/tools/stress/src/org/apache/cassandra/stress/Stress.java b/tools/stress/src/org/apache/cassandra/stress/Stress.java
index bc6d027..874f515 100644
--- a/tools/stress/src/org/apache/cassandra/stress/Stress.java
+++ b/tools/stress/src/org/apache/cassandra/stress/Stress.java
@@ -24,6 +24,7 @@
 import org.apache.cassandra.stress.settings.StressSettings;
 import org.apache.cassandra.utils.FBUtilities;
 import org.apache.cassandra.utils.WindowsTimer;
+import org.apache.cassandra.stress.util.MultiPrintStream;
 
 public final class Stress
 {
@@ -57,22 +58,37 @@
         if (FBUtilities.isWindows())
             WindowsTimer.startTimerPeriod(1);
 
+        int exitCode = run(arguments);
+
+        if (FBUtilities.isWindows())
+            WindowsTimer.endTimerPeriod(1);
+
+        System.exit(exitCode);
+    }
+
+
+    private static int run(String[] arguments)
+    {
         try
         {
-
             final StressSettings settings;
             try
             {
                 settings = StressSettings.parse(arguments);
+                if (settings == null)
+                    return 0; // special settings action
             }
             catch (IllegalArgumentException e)
             {
+                System.out.printf("%s\n", e.getMessage());
                 printHelpMessage();
-                e.printStackTrace();
-                return;
+                return 1;
             }
 
-            PrintStream logout = settings.log.getOutput();
+            MultiPrintStream logout = settings.log.getOutput();
+            if (settings.graph.inGraphMode()) {
+                logout.addStream(new PrintStream(settings.graph.temporaryLogFile));
+            }
 
             if (settings.sendToDaemon != null)
             {
@@ -115,20 +131,19 @@
             {
                 StressAction stressAction = new StressAction(settings, logout);
                 stressAction.run();
+                logout.flush();
+                if (settings.graph.inGraphMode())
+                    new StressGraph(settings, arguments).generateGraph();
             }
 
         }
         catch (Throwable t)
         {
             t.printStackTrace();
-        }
-        finally
-        {
-            if (FBUtilities.isWindows())
-                WindowsTimer.endTimerPeriod(1);
-            System.exit(0);
+            return 1;
         }
 
+        return 0;
     }
 
     /**
diff --git a/tools/stress/src/org/apache/cassandra/stress/StressAction.java b/tools/stress/src/org/apache/cassandra/stress/StressAction.java
index cda54a0..7c37ef8 100644
--- a/tools/stress/src/org/apache/cassandra/stress/StressAction.java
+++ b/tools/stress/src/org/apache/cassandra/stress/StressAction.java
@@ -24,25 +24,25 @@
 import java.util.List;
 import java.util.concurrent.CountDownLatch;
 import java.util.concurrent.TimeUnit;
-
-import com.google.common.util.concurrent.RateLimiter;
-import com.google.common.util.concurrent.Uninterruptibles;
+import java.util.concurrent.atomic.AtomicLong;
+import java.util.concurrent.locks.LockSupport;
 
 import org.apache.cassandra.stress.operations.OpDistribution;
 import org.apache.cassandra.stress.operations.OpDistributionFactory;
+import org.apache.cassandra.stress.settings.ConnectionAPI;
 import org.apache.cassandra.stress.settings.SettingsCommand;
 import org.apache.cassandra.stress.settings.StressSettings;
 import org.apache.cassandra.stress.util.JavaDriverClient;
 import org.apache.cassandra.stress.util.ThriftClient;
-import org.apache.cassandra.stress.util.TimingInterval;
 import org.apache.cassandra.transport.SimpleClient;
 
+import com.google.common.util.concurrent.Uninterruptibles;
+
 public class StressAction implements Runnable
 {
 
     private final StressSettings settings;
     private final PrintStream output;
-
     public StressAction(StressSettings settings, PrintStream out)
     {
         this.settings = settings;
@@ -59,13 +59,14 @@
 
         if (!settings.command.noWarmup)
             warmup(settings.command.getFactory(settings));
+
         if (settings.command.truncate == SettingsCommand.TruncateWhen.ONCE)
             settings.command.truncateTables(settings);
 
         // TODO : move this to a new queue wrapper that gates progress based on a poisson (or configurable) distribution
-        RateLimiter rateLimiter = null;
-        if (settings.rate.opRateTargetPerSecond > 0)
-            rateLimiter = RateLimiter.create(settings.rate.opRateTargetPerSecond);
+        UniformRateLimiter rateLimiter = null;
+        if (settings.rate.opsPerSecond > 0)
+            rateLimiter = new UniformRateLimiter(settings.rate.opsPerSecond);
 
         boolean success;
         if (settings.rate.minThreads > 0)
@@ -80,9 +81,13 @@
             output.println("FAILURE");
 
         settings.disconnect();
+
+        if (!success)
+            throw new RuntimeException("Failed to execute stress action");
     }
 
     // type provided separately to support recursive call for mixed command with each command type it is performing
+    @SuppressWarnings("resource") // warmupOutput doesn't need closing
     private void warmup(OpDistributionFactory operations)
     {
         PrintStream warmupOutput = new PrintStream(new OutputStream() { @Override public void write(int b) throws IOException { } } );
@@ -102,13 +107,16 @@
             // we need to warm up all the nodes in the cluster ideally, but we may not be the only stress instance;
             // so warm up all the nodes we're speaking to only.
             output.println(String.format("Warming up %s with %d iterations...", single.desc(), iterations));
-            run(single, threads, iterations, 0, null, null, warmupOutput, true);
+            boolean success = null != run(single, threads, iterations, 0, null, null, warmupOutput, true);
+            if (!success)
+                throw new RuntimeException("Failed to execute warmup");
         }
+
     }
 
     // TODO : permit varying more than just thread count
     // TODO : vary thread count based on percentage improvement of previous increment, not by fixed amounts
-    private boolean runMulti(boolean auto, RateLimiter rateLimiter)
+    private boolean runMulti(boolean auto, UniformRateLimiter rateLimiter)
     {
         if (settings.command.targetUncertainty >= 0)
             output.println("WARNING: uncertainty mode (err<) results in uneven workload between thread runs, so should be used for high level analysis only");
@@ -118,6 +126,7 @@
         List<String> runIds = new ArrayList<>();
         do
         {
+            output.println("");
             output.println(String.format("Running with %d threadCount", threadCount));
 
             if (settings.command.truncate == SettingsCommand.TruncateWhen.ALWAYS)
@@ -160,7 +169,7 @@
         } while (!auto || (hasAverageImprovement(results, 3, 0) && hasAverageImprovement(results, 5, settings.command.targetUncertainty)));
 
         // summarise all results
-        StressMetrics.summarise(runIds, results, output, settings.samples.historyCount);
+        StressMetrics.summarise(runIds, results, output);
         return true;
     }
 
@@ -185,7 +194,7 @@
                               int threadCount,
                               long opCount,
                               long duration,
-                              RateLimiter rateLimiter,
+                              UniformRateLimiter rateLimiter,
                               TimeUnit durationUnits,
                               PrintStream output,
                               boolean isWarmup)
@@ -204,20 +213,35 @@
 
         final StressMetrics metrics = new StressMetrics(output, settings.log.intervalMillis, settings);
 
+        final CountDownLatch releaseConsumers = new CountDownLatch(1);
         final CountDownLatch done = new CountDownLatch(threadCount);
+        final CountDownLatch start = new CountDownLatch(threadCount);
         final Consumer[] consumers = new Consumer[threadCount];
-        int sampleCount = settings.samples.liveCount / threadCount;
         for (int i = 0; i < threadCount; i++)
         {
-
-            consumers[i] = new Consumer(operations.get(metrics.getTiming(), sampleCount, isWarmup),
-                                        done, workManager, metrics, rateLimiter);
+            consumers[i] = new Consumer(operations.get(metrics.getTiming(), isWarmup),
+                                        done, start, releaseConsumers, workManager, metrics, rateLimiter);
         }
 
         // starting worker threadCount
         for (int i = 0; i < threadCount; i++)
             consumers[i].start();
 
+        // wait for the lot of them to get their pants on
+        try
+        {
+            start.await();
+        }
+        catch (InterruptedException e)
+        {
+            throw new RuntimeException("Unexpected interruption", e);
+        }
+        // start counting from NOW!
+        if(rateLimiter != null)
+            rateLimiter.start();
+        // release the hounds!!!
+        releaseConsumers.countDown();
+
         metrics.start();
 
         if (durationUnits != null)
@@ -258,40 +282,123 @@
         return metrics;
     }
 
+    /**
+     * Provides a 'next operation time' for rate limited operation streams. The rate limiter is thread safe and is to be
+     * shared by all consumer threads.
+     */
+    private static class UniformRateLimiter
+    {
+        long start = Long.MIN_VALUE;
+        final long intervalNs;
+        final AtomicLong opIndex = new AtomicLong();
+
+        UniformRateLimiter(int opsPerSec)
+        {
+            intervalNs = 1000000000 / opsPerSec;
+        }
+
+        void start()
+        {
+            start = System.nanoTime();
+        }
+
+        /**
+         * @param partitionCount number of partition operations to reserve rate slots for
+         * @return the expected (intended) start time in ns for the operation
+         */
+        long acquire(int partitionCount)
+        {
+            long currOpIndex = opIndex.getAndAdd(partitionCount);
+            return start + currOpIndex * intervalNs;
+        }
+    }
+
+    /**
+     * Provides a blocking stream of operations per consumer.
+     */
+    private static class StreamOfOperations
+    {
+        private final OpDistribution operations;
+        private final UniformRateLimiter rateLimiter;
+        private final WorkManager workManager;
+
+        public StreamOfOperations(OpDistribution operations, UniformRateLimiter rateLimiter, WorkManager workManager)
+        {
+            this.operations = operations;
+            this.rateLimiter = rateLimiter;
+            this.workManager = workManager;
+        }
+
+        /**
+         * This method will block until the next operation becomes available.
+         *
+         * @return next operation or null if no more ops are coming
+         */
+        Operation nextOp()
+        {
+            Operation op = operations.next();
+            final int partitionCount = op.ready(workManager);
+            if (partitionCount == 0)
+                return null;
+            if (rateLimiter != null)
+            {
+                long intendedTime = rateLimiter.acquire(partitionCount);
+                op.intendedStartNs(intendedTime);
+                long now;
+                while ((now = System.nanoTime()) < intendedTime)
+                {
+                    LockSupport.parkNanos(intendedTime - now);
+                }
+            }
+            return op;
+        }
+
+        void close()
+        {
+            operations.closeTimers();
+        }
+
+        void abort()
+        {
+            workManager.stop();
+        }
+    }
+
     private class Consumer extends Thread
     {
-
-        private final OpDistribution operations;
+        private final StreamOfOperations opStream;
         private final StressMetrics metrics;
-        private final RateLimiter rateLimiter;
         private volatile boolean success = true;
-        private final WorkManager workManager;
         private final CountDownLatch done;
+        private final CountDownLatch start;
+        private final CountDownLatch releaseConsumers;
 
         public Consumer(OpDistribution operations,
                         CountDownLatch done,
+                        CountDownLatch start,
+                        CountDownLatch releaseConsumers,
                         WorkManager workManager,
                         StressMetrics metrics,
-                        RateLimiter rateLimiter)
+                        UniformRateLimiter rateLimiter)
         {
             this.done = done;
-            this.rateLimiter = rateLimiter;
-            this.workManager = workManager;
+            this.start = start;
+            this.releaseConsumers = releaseConsumers;
             this.metrics = metrics;
-            this.operations = operations;
+            this.opStream = new StreamOfOperations(operations, rateLimiter, workManager);
         }
 
         public void run()
         {
-            operations.initTimers();
-
             try
             {
                 SimpleClient sclient = null;
                 ThriftClient tclient = null;
                 JavaDriverClient jclient = null;
 
-                switch (settings.mode.api)
+
+                final ConnectionAPI clientType = settings.mode.api;
+                switch (clientType)
                 {
                     case JAVA_DRIVER_NATIVE:
                         jclient = settings.getJavaDriverClient();
@@ -307,15 +414,21 @@
                         throw new IllegalStateException();
                 }
 
+                // synchronize the start of all the consumer threads
+                start.countDown();
+
+                releaseConsumers.await();
+
                 while (true)
                 {
-                    Operation op = operations.next();
-                    if (!op.ready(workManager, rateLimiter))
+                    // Assumption: All ops are thread local, operations are never shared across threads.
+                    Operation op = opStream.nextOp();
+                    if (op == null)
                         break;
 
                     try
                     {
-                        switch (settings.mode.api)
+                        switch (clientType)
                         {
                             case JAVA_DRIVER_NATIVE:
                                 op.run(jclient);
@@ -332,24 +445,26 @@
                     catch (Exception e)
                     {
                         if (output == null)
-                        {
                             System.err.println(e.getMessage());
-                            success = false;
-                            System.exit(-1);
-                        }
+                        else
+                            e.printStackTrace(output);
 
-                        e.printStackTrace(output);
                         success = false;
-                        workManager.stop();
+                        opStream.abort();
                         metrics.cancel();
                         return;
                     }
                 }
             }
+            catch (Exception e)
+            {
+                System.err.println(e.getMessage());
+                success = false;
+            }
             finally
             {
                 done.countDown();
-                operations.closeTimers();
+                opStream.close();
             }
         }
     }
diff --git a/tools/stress/src/org/apache/cassandra/stress/StressGraph.java b/tools/stress/src/org/apache/cassandra/stress/StressGraph.java
new file mode 100644
index 0000000..ebaa0ae
--- /dev/null
+++ b/tools/stress/src/org/apache/cassandra/stress/StressGraph.java
@@ -0,0 +1,261 @@
+/**
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ * <p/>
+ * http://www.apache.org/licenses/LICENSE-2.0
+ * <p/>
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.cassandra.stress;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.FileInputStream;
+import java.io.FileNotFoundException;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.InputStreamReader;
+import java.io.PrintWriter;
+import java.math.BigDecimal;
+import java.nio.charset.StandardCharsets;
+import java.nio.file.Files;
+import java.nio.file.Paths;
+import java.util.Arrays;
+import java.util.regex.Matcher;
+import java.util.regex.Pattern;
+
+import com.google.common.io.ByteStreams;
+import org.apache.commons.lang3.StringUtils;
+
+import org.apache.cassandra.stress.settings.StressSettings;
+import org.json.simple.JSONArray;
+import org.json.simple.JSONObject;
+import org.json.simple.JSONValue;
+
+
+public class StressGraph
+{
+    private StressSettings stressSettings;
+    private enum ReadingMode
+    {
+        START,
+        METRICS,
+        AGGREGATES,
+        NEXTITERATION
+    }
+    private String[] stressArguments;
+
+    public StressGraph(StressSettings stressSettings, String[] stressArguments)
+    {
+        this.stressSettings = stressSettings;
+        this.stressArguments = stressArguments;
+    }
+
+    public void generateGraph()
+    {
+        File htmlFile = new File(stressSettings.graph.file);
+        JSONObject stats;
+        if (htmlFile.isFile())
+        {
+            try
+            {
+                String html = new String(Files.readAllBytes(Paths.get(htmlFile.toURI())), StandardCharsets.UTF_8);
+                stats = parseExistingStats(html);
+            }
+            catch (IOException e)
+            {
+                throw new RuntimeException("Couldn't load existing stats html.");
+            }
+            stats = this.createJSONStats(stats);
+        }
+        else
+        {
+            stats = this.createJSONStats(null);
+        }
+
+        try
+        {
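+            // splice the stats JSON between the "/* stats start */" and "/* stats end */" markers of the HTML template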
+            PrintWriter out = new PrintWriter(htmlFile);
+            String statsBlock = "/* stats start */\nstats = " + stats.toJSONString() + ";\n/* stats end */\n";
+            String html = getGraphHTML().replaceFirst("/\\* stats start \\*/\n\n/\\* stats end \\*/\n", statsBlock);
+            out.write(html);
+            out.close();
+        }
+        catch (IOException e)
+        {
+            throw new RuntimeException("Couldn't write stats html.");
+        }
+    }
+
+    private JSONObject parseExistingStats(String html)
+    {
+        JSONObject stats;
+
+        Pattern pattern = Pattern.compile("(?s).*/\\* stats start \\*/\\nstats = (.*);\\n/\\* stats end \\*/.*");
+        Matcher matcher = pattern.matcher(html);
+        if (!matcher.matches())
+            throw new RuntimeException("Couldn't find an existing stats block in the stats html.");
+        stats = (JSONObject) JSONValue.parse(matcher.group(1));
+
+        return stats;
+    }
+
+    private String getGraphHTML()
+    {
+        InputStream graphHTMLRes = StressGraph.class.getClassLoader().getResourceAsStream("org/apache/cassandra/stress/graph/graph.html");
+        String graphHTML;
+        try
+        {
+            graphHTML = new String(ByteStreams.toByteArray(graphHTMLRes));
+        }
+        catch (IOException e)
+        {
+            throw new RuntimeException(e);
+        }
+        return graphHTML;
+    }
+
+    /** Parse log and append to stats array */
+    private JSONArray parseLogStats(InputStream log, JSONArray stats)
+    {
+        BufferedReader reader = new BufferedReader(new InputStreamReader(log));
+        JSONObject json = new JSONObject();
+        JSONArray intervals = new JSONArray();
+        boolean runningMultipleThreadCounts = false;
+        String currentThreadCount = null;
+        Pattern threadCountMessage = Pattern.compile("Running ([A-Z]+) with ([0-9]+) threads .*");
+        ReadingMode mode = ReadingMode.START;
+
+        try
+        {
+            String line;
+            while ((line = reader.readLine()) != null)
+            {
+                // Detect if we are running multiple thread counts:
+                if (line.startsWith("Thread count was not specified"))
+                    runningMultipleThreadCounts = true;
+
+                if (runningMultipleThreadCounts)
+                {
+                    // Detect thread count:
+                    Matcher tc = threadCountMessage.matcher(line);
+                    if (tc.matches())
+                    {
+                        currentThreadCount = tc.group(2);
+                    }
+                }
+                
+                // Detect mode changes
+                if (line.equals(StressMetrics.HEAD))
+                {
+                    mode = ReadingMode.METRICS;
+                    continue;
+                }
+                else if (line.equals("Results:"))
+                {
+                    mode = ReadingMode.AGGREGATES;
+                    continue;
+                }
+                else if (mode == ReadingMode.AGGREGATES && line.equals(""))
+                {
+                    mode = ReadingMode.NEXTITERATION;
+                }
+                else if (line.equals("END") || line.equals("FAILURE"))
+                {
+                    break;
+                }
+
+                // Process lines
+                if (mode == ReadingMode.METRICS)
+                {
+                    JSONArray metrics = new JSONArray();
+                    String[] parts = line.split(",");
+                    if (parts.length != StressMetrics.HEADMETRICS.length)
+                    {
+                        continue;
+                    }
+                    for (String m : parts)
+                    {
+                        try
+                        {
+                            metrics.add(new BigDecimal(m.trim()));
+                        }
+                        catch (NumberFormatException e)
+                        {
+                            metrics.add(null);
+                        }
+                    }
+                    intervals.add(metrics);
+                }
+                else if (mode == ReadingMode.AGGREGATES)
+                {
+                    String[] parts = line.split(":",2);
+                    if (parts.length != 2)
+                    {
+                        continue;
+                    }
+                    json.put(parts[0].trim(), parts[1].trim());
+                }
+                else if (mode == ReadingMode.NEXTITERATION)
+                {
+                    //Wrap up the results of this test and append to the array.
+                    json.put("metrics", Arrays.asList(StressMetrics.HEADMETRICS));
+                    json.put("test", stressSettings.graph.operation);
+                    if (currentThreadCount == null)
+                        json.put("revision", stressSettings.graph.revision);
+                    else
+                        json.put("revision", String.format("%s - %s threads", stressSettings.graph.revision, currentThreadCount));
+                    json.put("command", StringUtils.join(stressArguments, " "));
+                    json.put("intervals", intervals);
+                    stats.add(json);
+
+                    //Start fresh for next iteration:
+                    json = new JSONObject();
+                    intervals = new JSONArray();
+                    mode = ReadingMode.START;
+                }
+            }
+        }
+        catch (IOException e)
+        {
+            throw new RuntimeException("Couldn't read from temporary stress log file");
+        }
+        stats.add(json);
+        return stats;
+    }
+
+    private JSONObject createJSONStats(JSONObject json)
+    {
+        try (InputStream logStream = new FileInputStream(stressSettings.graph.temporaryLogFile))
+        {
+            JSONArray stats;
+            if (json == null)
+            {
+                json = new JSONObject();
+                stats = new JSONArray();
+            }
+            else
+            {
+                stats = (JSONArray) json.get("stats");
+            }
+
+            stats = parseLogStats(logStream, stats);
+
+            json.put("title", stressSettings.graph.title);
+            json.put("stats", stats);
+            return json;
+        }
+        catch (IOException e)
+        {
+            throw new RuntimeException(e);
+        }
+    }
+}
diff --git a/tools/stress/src/org/apache/cassandra/stress/StressMetrics.java b/tools/stress/src/org/apache/cassandra/stress/StressMetrics.java
index a4f280e..86e9a7a 100644
--- a/tools/stress/src/org/apache/cassandra/stress/StressMetrics.java
+++ b/tools/stress/src/org/apache/cassandra/stress/StressMetrics.java
@@ -1,6 +1,6 @@
 package org.apache.cassandra.stress;
 /*
- * 
+ *
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
@@ -8,52 +8,82 @@
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
- * 
+ *
  *   http://www.apache.org/licenses/LICENSE-2.0
- * 
+ *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
- * 
+ *
  */
 
 
+import static java.util.concurrent.TimeUnit.NANOSECONDS;
+
+import java.io.FileNotFoundException;
 import java.io.PrintStream;
 import java.util.Arrays;
 import java.util.List;
 import java.util.Map;
 import java.util.concurrent.Callable;
 import java.util.concurrent.CountDownLatch;
-import java.util.concurrent.ThreadFactory;
 
-import org.apache.cassandra.stress.util.*;
-import org.apache.commons.lang3.time.DurationFormatUtils;
-import org.apache.cassandra.concurrent.NamedThreadFactory;
+import org.HdrHistogram.Histogram;
+import org.HdrHistogram.HistogramLogWriter;
 import org.apache.cassandra.stress.settings.StressSettings;
+import org.apache.cassandra.stress.util.JmxCollector;
+import org.apache.cassandra.stress.util.Timing;
+import org.apache.cassandra.stress.util.TimingInterval;
+import org.apache.cassandra.stress.util.TimingIntervals;
+import org.apache.cassandra.stress.util.Uncertainty;
+import org.apache.cassandra.utils.FBUtilities;
+import org.apache.commons.lang3.time.DurationFormatUtils;
 
 public class StressMetrics
 {
 
-    private static final ThreadFactory tf = new NamedThreadFactory("StressMetrics");
-
     private final PrintStream output;
     private final Thread thread;
-    private volatile boolean stop = false;
-    private volatile boolean cancelled = false;
     private final Uncertainty rowRateUncertainty = new Uncertainty();
     private final CountDownLatch stopped = new CountDownLatch(1);
     private final Timing timing;
     private final Callable<JmxCollector.GcStats> gcStatsCollector;
+    private final HistogramLogWriter histogramWriter;
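+    // matching nanoTime/wall-clock epochs, used to map interval bounds onto wall-clock time for the HDR log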
+    private final long epochNs = System.nanoTime();
+    private final long epochMs = System.currentTimeMillis();
+
     private volatile JmxCollector.GcStats totalGcStats;
-    private final StressSettings settings;
+
+    private volatile boolean stop = false;
+    private volatile boolean cancelled = false;
 
     public StressMetrics(PrintStream output, final long logIntervalMillis, StressSettings settings)
     {
         this.output = output;
-        this.settings = settings;
+        if(settings.log.hdrFile != null)
+        {
+            try
+            {
+                histogramWriter = new HistogramLogWriter(settings.log.hdrFile);
+                histogramWriter.outputComment("Logging op latencies for Cassandra Stress");
+                histogramWriter.outputLogFormatVersion();
+                histogramWriter.outputBaseTime(epochMs);
+                histogramWriter.setBaseTime(epochMs);
+                histogramWriter.outputStartTime(epochMs);
+                histogramWriter.outputLegend();
+            }
+            catch (FileNotFoundException e)
+            {
+                throw new IllegalArgumentException(e);
+            }
+        }
+        else
+        {
+            histogramWriter = null;
+        }
         Callable<JmxCollector.GcStats> gcStatsCollector;
         totalGcStats = new JmxCollector.GcStats(0);
         try
@@ -77,10 +107,10 @@
             };
         }
         this.gcStatsCollector = gcStatsCollector;
-        this.timing = new Timing(settings.samples.historyCount, settings.samples.reportCount);
+        this.timing = new Timing(settings.rate.isFixed);
 
         printHeader("", output);
-        thread = tf.newThread(new Runnable()
+        thread = new Thread(new Runnable()
         {
             @Override
             public void run()
@@ -124,6 +154,7 @@
                 }
             }
         });
+        thread.setName("StressMetrics");
     }
 
     public void start()
@@ -155,15 +186,23 @@
     {
         Timing.TimingResult<JmxCollector.GcStats> result = timing.snap(gcStatsCollector);
         totalGcStats = JmxCollector.GcStats.aggregate(Arrays.asList(totalGcStats, result.extra));
-        TimingInterval current = result.intervals.combine(settings.samples.reportCount);
-        TimingInterval history = timing.getHistory().combine(settings.samples.historyCount);
+        TimingInterval current = result.intervals.combine();
+        TimingInterval history = timing.getHistory().combine();
         rowRateUncertainty.update(current.adjustedRowRate());
-        if (current.operationCount != 0)
+        if (current.operationCount() != 0)
         {
-            if (result.intervals.intervals().size() > 1)
+            // if there's a single operation we only print the total
+            final boolean logPerOpSummaryLine = result.intervals.intervals().size() > 1;
+
+            for (Map.Entry<String, TimingInterval> type : result.intervals.intervals().entrySet())
             {
-                for (Map.Entry<String, TimingInterval> type : result.intervals.intervals().entrySet())
-                    printRow("", type.getKey(), type.getValue(), timing.getHistory().get(type.getKey()), result.extra, rowRateUncertainty, output);
+                final String opName = type.getKey();
+                final TimingInterval opInterval = type.getValue();
+                if (logPerOpSummaryLine)
+                {
+                    printRow("", opName, opInterval, timing.getHistory().get(type.getKey()), result.extra, rowRateUncertainty, output);
+                }
+                logHistograms(opName, opInterval);
             }
 
             printRow("", "total", current, history, result.extra, rowRateUncertainty, output);
@@ -173,31 +212,57 @@
     }
 
 
+    private void logHistograms(String opName, TimingInterval opInterval)
+    {
+        if (histogramWriter == null)
+            return;
+        final long startNs = opInterval.startNanos();
+        final long endNs = opInterval.endNanos();
+
+        logHistogram(opName + "-st", startNs, endNs, opInterval.serviceTime());
+        logHistogram(opName + "-rt", startNs, endNs, opInterval.responseTime());
+        logHistogram(opName + "-wt", startNs, endNs, opInterval.waitTime());
+    }
+
+    private void logHistogram(String opName, final long startNs, final long endNs, final Histogram histogram)
+    {
+        if (histogram.getTotalCount() != 0)
+        {
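+            // tag the histogram and translate its nanoTime interval bounds into the wall-clock milliseconds used by the HDR log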
+            histogram.setTag(opName);
+            histogram.setStartTimeStamp(epochMs + NANOSECONDS.toMillis(startNs - epochNs));
+            histogram.setEndTimeStamp(epochMs + NANOSECONDS.toMillis(endNs - epochNs));
+            histogramWriter.outputIntervalHistogram(histogram);
+        }
+    }
+
+
     // PRINT FORMATTING
 
     public static final String HEADFORMAT = "%-10s%10s,%8s,%8s,%8s,%8s,%8s,%8s,%8s,%8s,%8s,%7s,%9s,%7s,%7s,%8s,%8s,%8s,%8s";
     public static final String ROWFORMAT =  "%-10s%10d,%8.0f,%8.0f,%8.0f,%8.1f,%8.1f,%8.1f,%8.1f,%8.1f,%8.1f,%7.1f,%9.5f,%7d,%7.0f,%8.0f,%8.0f,%8.0f,%8.0f";
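+    // these header labels are also consumed by StressGraph when parsing a stress log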
+    public static final String[] HEADMETRICS = new String[]{"type", "total ops","op/s","pk/s","row/s","mean","med",".95",".99",".999","max","time","stderr", "errors", "gc: #", "max ms", "sum ms", "sdv ms", "mb"};
+    public static final String HEAD = String.format(HEADFORMAT, (Object[]) HEADMETRICS);
 
     private static void printHeader(String prefix, PrintStream output)
     {
-        output.println(prefix + String.format(HEADFORMAT, "type,", "total ops","op/s","pk/s","row/s","mean","med",".95",".99",".999","max","time","stderr", "errors", "gc: #", "max ms", "sum ms", "sdv ms", "mb"));
+        output.println(prefix + HEAD);
     }
 
     private static void printRow(String prefix, String type, TimingInterval interval, TimingInterval total, JmxCollector.GcStats gcStats, Uncertainty opRateUncertainty, PrintStream output)
     {
         output.println(prefix + String.format(ROWFORMAT,
                 type + ",",
-                total.operationCount,
+                total.operationCount(),
                 interval.opRate(),
                 interval.partitionRate(),
                 interval.rowRate(),
-                interval.meanLatency(),
-                interval.medianLatency(),
-                interval.rankLatency(0.95f),
-                interval.rankLatency(0.99f),
-                interval.rankLatency(0.999f),
-                interval.maxLatency(),
-                total.runTime() / 1000f,
+                interval.meanLatencyMs(),
+                interval.medianLatencyMs(),
+                interval.latencyAtPercentileMs(95.0),
+                interval.latencyAtPercentileMs(99.0),
+                interval.latencyAtPercentileMs(99.9),
+                interval.maxLatencyMs(),
+                total.runTimeMs() / 1000f,
                 opRateUncertainty.getUncertainty(),
                 interval.errorCount,
                 gcStats.count,
@@ -214,28 +279,29 @@
         output.println("Results:");
 
         TimingIntervals opHistory = timing.getHistory();
-        TimingInterval history = opHistory.combine(settings.samples.historyCount);
-        output.println(String.format("op rate                   : %.0f %s", history.opRate(), opHistory.opRates()));
-        output.println(String.format("partition rate            : %.0f %s", history.partitionRate(), opHistory.partitionRates()));
-        output.println(String.format("row rate                  : %.0f %s", history.rowRate(), opHistory.rowRates()));
-        output.println(String.format("latency mean              : %.1f %s", history.meanLatency(), opHistory.meanLatencies()));
-        output.println(String.format("latency median            : %.1f %s", history.medianLatency(), opHistory.medianLatencies()));
-        output.println(String.format("latency 95th percentile   : %.1f %s", history.rankLatency(.95f), opHistory.rankLatencies(0.95f)));
-        output.println(String.format("latency 99th percentile   : %.1f %s", history.rankLatency(0.99f), opHistory.rankLatencies(0.99f)));
-        output.println(String.format("latency 99.9th percentile : %.1f %s", history.rankLatency(0.999f), opHistory.rankLatencies(0.999f)));
-        output.println(String.format("latency max               : %.1f %s", history.maxLatency(), opHistory.maxLatencies()));
-        output.println(String.format("Total partitions          : %d %s",   history.partitionCount, opHistory.partitionCounts()));
-        output.println(String.format("Total errors              : %d %s",   history.errorCount, opHistory.errorCounts()));
-        output.println(String.format("total gc count            : %.0f", totalGcStats.count));
-        output.println(String.format("total gc mb               : %.0f", totalGcStats.bytes / (1 << 20)));
-        output.println(String.format("total gc time (s)         : %.0f", totalGcStats.summs / 1000));
-        output.println(String.format("avg gc time(ms)           : %.0f", totalGcStats.summs / totalGcStats.count));
-        output.println(String.format("stdev gc time(ms)         : %.0f", totalGcStats.sdvms));
+        TimingInterval history = opHistory.combine();
+        output.println(String.format("Op rate                   : %,8.0f op/s  %s", history.opRate(), opHistory.opRates()));
+        output.println(String.format("Partition rate            : %,8.0f pk/s  %s", history.partitionRate(), opHistory.partitionRates()));
+        output.println(String.format("Row rate                  : %,8.0f row/s %s", history.rowRate(), opHistory.rowRates()));
+        output.println(String.format("Latency mean              : %6.1f ms %s", history.meanLatencyMs(), opHistory.meanLatencies()));
+        output.println(String.format("Latency median            : %6.1f ms %s", history.medianLatencyMs(), opHistory.medianLatencies()));
+        output.println(String.format("Latency 95th percentile   : %6.1f ms %s", history.latencyAtPercentileMs(95.0), opHistory.latenciesAtPercentile(95.0)));
+        output.println(String.format("Latency 99th percentile   : %6.1f ms %s", history.latencyAtPercentileMs(99.0), opHistory.latenciesAtPercentile(99.0)));
+        output.println(String.format("Latency 99.9th percentile : %6.1f ms %s", history.latencyAtPercentileMs(99.9), opHistory.latenciesAtPercentile(99.9)));
+        output.println(String.format("Latency max               : %6.1f ms %s", history.maxLatencyMs(), opHistory.maxLatencies()));
+        output.println(String.format("Total partitions          : %,10d %s",   history.partitionCount, opHistory.partitionCounts()));
+        output.println(String.format("Total errors              : %,10d %s",   history.errorCount, opHistory.errorCounts()));
+        output.println(String.format("Total GC count            : %,1.0f", totalGcStats.count));
+        output.println(String.format("Total GC memory           : %s", FBUtilities.prettyPrintMemory((long)totalGcStats.bytes, true)));
+        output.println(String.format("Total GC time             : %,6.1f seconds", totalGcStats.summs / 1000));
+        output.println(String.format("Avg GC time               : %,6.1f ms", totalGcStats.summs / totalGcStats.count));
+        output.println(String.format("StdDev GC time            : %,6.1f ms", totalGcStats.sdvms));
         output.println("Total operation time      : " + DurationFormatUtils.formatDuration(
-                history.runTime(), "HH:mm:ss", true));
+                history.runTimeMs(), "HH:mm:ss", true));
+        output.println(""); // Newline is important here to separate the aggregates section from the END or the next stress iteration
     }
 
-    public static void summarise(List<String> ids, List<StressMetrics> summarise, PrintStream out, int historySampleCount)
+    public static void summarise(List<String> ids, List<StressMetrics> summarise, PrintStream out)
     {
         int idLen = 0;
         for (String id : ids)
@@ -254,7 +320,7 @@
                          summarise.get(i).rowRateUncertainty,
                          out);
             }
-            TimingInterval hist = summarise.get(i).timing.getHistory().combine(historySampleCount);
+            TimingInterval hist = summarise.get(i).timing.getHistory().combine();
             printRow(String.format(formatstr, ids.get(i)),
                     "total",
                     hist,
diff --git a/tools/stress/src/org/apache/cassandra/stress/StressProfile.java b/tools/stress/src/org/apache/cassandra/stress/StressProfile.java
index 5243d96..8b59bda 100644
--- a/tools/stress/src/org/apache/cassandra/stress/StressProfile.java
+++ b/tools/stress/src/org/apache/cassandra/stress/StressProfile.java
@@ -28,6 +28,7 @@
 import java.net.URI;
 import java.util.*;
 import java.util.concurrent.TimeUnit;
+import java.util.regex.Pattern;
 
 import com.google.common.base.Function;
 import com.google.common.util.concurrent.Uninterruptibles;
@@ -87,6 +88,8 @@
     transient volatile Map<String, PreparedStatement> queryStatements;
     transient volatile Map<String, Integer> thriftQueryIds;
 
+    private static final Pattern lowercaseAlphanumeric = Pattern.compile("[a-z0-9_]+");
+
     private void init(StressYaml yaml) throws RequestValidationException
     {
         keyspaceName = yaml.keyspace;
@@ -243,7 +246,7 @@
                 TableMetadata metadata = client.getCluster()
                                                .getMetadata()
                                                .getKeyspace(keyspaceName)
-                                               .getTable(tableName);
+                                               .getTable(quoteIdentifier(tableName));
 
                 if (metadata == null)
                     throw new RuntimeException("Unable to find table " + keyspaceName + "." + tableName);
@@ -368,55 +371,78 @@
                     maybeLoadSchemaInfo(settings);
 
                     Set<ColumnMetadata> keyColumns = com.google.common.collect.Sets.newHashSet(tableMetaData.getPrimaryKey());
+                    Set<ColumnMetadata> allColumns = com.google.common.collect.Sets.newHashSet(tableMetaData.getColumns());
+                    boolean isKeyOnlyTable = (keyColumns.size() == allColumns.size());
+                    //With compact storage
+                    if (!isKeyOnlyTable && (keyColumns.size() == (allColumns.size() - 1)))
+                    {
+                        com.google.common.collect.Sets.SetView diff = com.google.common.collect.Sets.difference(allColumns, keyColumns);
+                        for (Object obj : diff)
+                        {
+                            ColumnMetadata col = (ColumnMetadata)obj;
+                            isKeyOnlyTable = col.getName().isEmpty();
+                            break;
+                        }
+                    }
 
                     //Non PK Columns
                     StringBuilder sb = new StringBuilder();
-
-                    sb.append("UPDATE \"").append(tableName).append("\" SET ");
-
-                    //PK Columns
-                    StringBuilder pred = new StringBuilder();
-                    pred.append(" WHERE ");
-
-                    boolean firstCol = true;
-                    boolean firstPred = true;
-                    for (ColumnMetadata c : tableMetaData.getColumns())
+                    if (!isKeyOnlyTable)
                     {
+                        sb.append("UPDATE ").append(quoteIdentifier(tableName)).append(" SET ");
+                        //PK Columns
+                        StringBuilder pred = new StringBuilder();
+                        pred.append(" WHERE ");
 
-                        if (keyColumns.contains(c))
-                        {
-                            if (firstPred)
-                                firstPred = false;
-                            else
-                                pred.append(" AND ");
+                        boolean firstCol = true;
+                        boolean firstPred = true;
+                        for (ColumnMetadata c : tableMetaData.getColumns())
+                        {
 
-                            pred.append(c.getName()).append(" = ?");
-                        }
-                        else
-                        {
-                            if (firstCol)
-                                firstCol = false;
-                            else
-                                sb.append(",");
+                            if (keyColumns.contains(c))
+                            {
+                                if (firstPred)
+                                    firstPred = false;
+                                else
+                                    pred.append(" AND ");
 
-                            sb.append(c.getName()).append(" = ");
+                                pred.append(quoteIdentifier(c.getName())).append(" = ?");
+                            }
+                            else
+                            {
+                                if (firstCol)
+                                    firstCol = false;
+                                else
+                                    sb.append(',');
 
-                            switch (c.getType().getName())
-                            {
+                                sb.append(quoteIdentifier(c.getName())).append(" = ");
+
+                                switch (c.getType().getName())
+                                {
                                 case SET:
                                 case LIST:
                                 case COUNTER:
-                                    sb.append(c.getName()).append(" + ?");
+                                    sb.append(quoteIdentifier(c.getName())).append(" + ?");
                                     break;
                                 default:
                                     sb.append("?");
                                     break;
+                                }
                             }
                         }
-                    }
 
-                    //Put PK predicates at the end
-                    sb.append(pred);
+                        //Put PK predicates at the end
+                        sb.append(pred);
+                    }
+                    else
+                    {
+                        sb.append("INSERT INTO ").append(quoteIdentifier(tableName)).append(" (");
+                        StringBuilder value = new StringBuilder();
+                        for (ColumnMetadata c : tableMetaData.getPrimaryKey())
+                        {
+                            sb.append(quoteIdentifier(c.getName())).append(", ");
+                            value.append("?, ");
+                        }
+                        sb.delete(sb.lastIndexOf(","), sb.length());
+                        value.delete(value.lastIndexOf(","), value.length());
+                        sb.append(") ").append("values(").append(value).append(')');
+                    }
 
                     if (insert == null)
                         insert = new HashMap<>();
@@ -669,4 +695,10 @@
         for (Map.Entry<String, V> e : reinsert)
             map.put(e.getKey().toLowerCase(), e.getValue());
     }
+
+    /* Quote an identifier unless it consists solely of lowercase alphanumerics and underscores */
+    private static String quoteIdentifier(String identifier)
+    {
+        return lowercaseAlphanumeric.matcher(identifier).matches() ? identifier : '"' + identifier + '"';
+    }
 }
diff --git a/tools/stress/src/org/apache/cassandra/stress/operations/FixedOpDistribution.java b/tools/stress/src/org/apache/cassandra/stress/operations/FixedOpDistribution.java
index f2616cf..9a3522c 100644
--- a/tools/stress/src/org/apache/cassandra/stress/operations/FixedOpDistribution.java
+++ b/tools/stress/src/org/apache/cassandra/stress/operations/FixedOpDistribution.java
@@ -1,6 +1,6 @@
 package org.apache.cassandra.stress.operations;
 /*
- * 
+ *
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
@@ -8,16 +8,16 @@
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
- * 
+ *
  *   http://www.apache.org/licenses/LICENSE-2.0
- * 
+ *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
- * 
+ *
  */
 
 
@@ -37,13 +37,8 @@
         return operation;
     }
 
-    public void initTimers()
-    {
-        operation.timer.init();
-    }
-
     public void closeTimers()
     {
-        operation.timer.close();
+        operation.close();
     }
 }
diff --git a/tools/stress/src/org/apache/cassandra/stress/operations/OpDistribution.java b/tools/stress/src/org/apache/cassandra/stress/operations/OpDistribution.java
index e09300a..33a0c93 100644
--- a/tools/stress/src/org/apache/cassandra/stress/operations/OpDistribution.java
+++ b/tools/stress/src/org/apache/cassandra/stress/operations/OpDistribution.java
@@ -1,6 +1,6 @@
 package org.apache.cassandra.stress.operations;
 /*
- * 
+ *
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
@@ -8,16 +8,16 @@
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
- * 
+ *
  *   http://www.apache.org/licenses/LICENSE-2.0
- * 
+ *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
- * 
+ *
  */
 
 
@@ -28,6 +28,5 @@
 
     Operation next();
 
-    public void initTimers();
     public void closeTimers();
 }
diff --git a/tools/stress/src/org/apache/cassandra/stress/operations/OpDistributionFactory.java b/tools/stress/src/org/apache/cassandra/stress/operations/OpDistributionFactory.java
index 5fbb0f9..14e6dfb 100644
--- a/tools/stress/src/org/apache/cassandra/stress/operations/OpDistributionFactory.java
+++ b/tools/stress/src/org/apache/cassandra/stress/operations/OpDistributionFactory.java
@@ -1,6 +1,6 @@
 package org.apache.cassandra.stress.operations;
 /*
- * 
+ *
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
@@ -8,16 +8,16 @@
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
- * 
+ *
  *   http://www.apache.org/licenses/LICENSE-2.0
- * 
+ *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
- * 
+ *
  */
 
 
@@ -25,7 +25,7 @@
 
 public interface OpDistributionFactory
 {
-    public OpDistribution get(Timing timing, int sampleCount, boolean isWarmup);
+    public OpDistribution get(Timing timing, boolean isWarmup);
     public String desc();
     Iterable<OpDistributionFactory> each();
 }
diff --git a/tools/stress/src/org/apache/cassandra/stress/operations/PartitionOperation.java b/tools/stress/src/org/apache/cassandra/stress/operations/PartitionOperation.java
index 45c36f2..784b2ac 100644
--- a/tools/stress/src/org/apache/cassandra/stress/operations/PartitionOperation.java
+++ b/tools/stress/src/org/apache/cassandra/stress/operations/PartitionOperation.java
@@ -77,14 +77,14 @@
         this.spec = spec;
     }
 
-    public boolean ready(WorkManager permits, RateLimiter rateLimiter)
+    public int ready(WorkManager permits)
     {
         int partitionCount = (int) spec.partitionCount.next();
         if (partitionCount <= 0)
-            return false;
+            return 0;
         partitionCount = permits.takePermits(partitionCount);
         if (partitionCount <= 0)
-            return false;
+            return 0;
 
         int i = 0;
         boolean success = true;
@@ -105,11 +105,8 @@
         }
         partitionCount = i;
 
-        if (rateLimiter != null)
-            rateLimiter.acquire(partitionCount);
-
         partitions = partitionCache.subList(0, partitionCount);
-        return !partitions.isEmpty();
+        return partitions.size();
     }
 
     protected boolean reset(Seed seed, PartitionIterator iterator)
diff --git a/tools/stress/src/org/apache/cassandra/stress/operations/SampledOpDistribution.java b/tools/stress/src/org/apache/cassandra/stress/operations/SampledOpDistribution.java
index 9698421..fc0229e 100644
--- a/tools/stress/src/org/apache/cassandra/stress/operations/SampledOpDistribution.java
+++ b/tools/stress/src/org/apache/cassandra/stress/operations/SampledOpDistribution.java
@@ -1,6 +1,6 @@
 package org.apache.cassandra.stress.operations;
 /*
- * 
+ *
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
@@ -8,16 +8,16 @@
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
- * 
+ *
  *   http://www.apache.org/licenses/LICENSE-2.0
- * 
+ *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
- * 
+ *
  */
 
 
@@ -52,19 +52,11 @@
         return cur;
     }
 
-    public void initTimers()
-    {
-        for (Pair<Operation, Double> op : operations.getPmf())
-        {
-            op.getFirst().timer.init();
-        }
-    }
-
     public void closeTimers()
     {
         for (Pair<Operation, Double> op : operations.getPmf())
         {
-            op.getFirst().timer.close();
+            op.getFirst().close();
         }
     }
 }
diff --git a/tools/stress/src/org/apache/cassandra/stress/operations/SampledOpDistributionFactory.java b/tools/stress/src/org/apache/cassandra/stress/operations/SampledOpDistributionFactory.java
index a10585d..0b206f9 100644
--- a/tools/stress/src/org/apache/cassandra/stress/operations/SampledOpDistributionFactory.java
+++ b/tools/stress/src/org/apache/cassandra/stress/operations/SampledOpDistributionFactory.java
@@ -1,6 +1,6 @@
 package org.apache.cassandra.stress.operations;
 /*
- * 
+ *
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
@@ -8,16 +8,16 @@
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
- * 
+ *
  *   http://www.apache.org/licenses/LICENSE-2.0
- * 
+ *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
- * 
+ *
  */
 
 
@@ -47,13 +47,13 @@
     protected abstract List<? extends Operation> get(Timer timer, PartitionGenerator generator, T key, boolean isWarmup);
     protected abstract PartitionGenerator newGenerator();
 
-    public OpDistribution get(Timing timing, int sampleCount, boolean isWarmup)
+    public OpDistribution get(Timing timing, boolean isWarmup)
     {
         PartitionGenerator generator = newGenerator();
         List<Pair<Operation, Double>> operations = new ArrayList<>();
         for (Map.Entry<T, Double> ratio : ratios.entrySet())
         {
-            List<? extends Operation> ops = get(timing.newTimer(ratio.getKey().toString(), sampleCount),
+            List<? extends Operation> ops = get(timing.newTimer(ratio.getKey().toString()),
                                                 generator, ratio.getKey(), isWarmup);
             for (Operation op : ops)
                 operations.add(new Pair<>(op, ratio.getValue() / ops.size()));
@@ -76,9 +76,9 @@
         {
             out.add(new OpDistributionFactory()
             {
-                public OpDistribution get(Timing timing, int sampleCount, boolean isWarmup)
+                public OpDistribution get(Timing timing, boolean isWarmup)
                 {
-                    List<? extends Operation> ops = SampledOpDistributionFactory.this.get(timing.newTimer(ratio.getKey().toString(), sampleCount),
+                    List<? extends Operation> ops = SampledOpDistributionFactory.this.get(timing.newTimer(ratio.getKey().toString()),
                                                                                           newGenerator(),
                                                                                           ratio.getKey(),
                                                                                           isWarmup);
diff --git a/tools/stress/src/org/apache/cassandra/stress/operations/predefined/CqlOperation.java b/tools/stress/src/org/apache/cassandra/stress/operations/predefined/CqlOperation.java
index afdc0b1..097c1a0 100644
--- a/tools/stress/src/org/apache/cassandra/stress/operations/predefined/CqlOperation.java
+++ b/tools/stress/src/org/apache/cassandra/stress/operations/predefined/CqlOperation.java
@@ -49,6 +49,9 @@
 public abstract class CqlOperation<V> extends PredefinedOperation
 {
 
+    public static final ByteBuffer[][] EMPTY_BYTE_BUFFERS = new ByteBuffer[0][];
+    public static final byte[][] EMPTY_BYTE_ARRAYS = new byte[0][];
+
     protected abstract List<Object> getQueryParameters(byte[] key);
     protected abstract String buildQuery();
     protected abstract CqlRunOp<V> buildRunOp(ClientWrapper client, String query, Object queryId, List<Object> params, ByteBuffer key);
@@ -455,7 +458,7 @@
                 public ByteBuffer[][] apply(ResultSet result)
                 {
                     if (result == null)
-                        return new ByteBuffer[0][];
+                        return EMPTY_BYTE_BUFFERS;
                     List<Row> rows = result.all();
 
                     ByteBuffer[][] r = new ByteBuffer[rows.size()][];
@@ -481,7 +484,7 @@
                 public ByteBuffer[][] apply(ResultMessage result)
                 {
                     if (!(result instanceof ResultMessage.Rows))
-                        return new ByteBuffer[0][];
+                        return EMPTY_BYTE_BUFFERS;
 
                     ResultMessage.Rows rows = ((ResultMessage.Rows) result);
                     ByteBuffer[][] r = new ByteBuffer[rows.result.size()][];
@@ -536,7 +539,7 @@
                 {
 
                     if (result == null)
-                        return new byte[0][];
+                        return EMPTY_BYTE_ARRAYS;
                     List<Row> rows = result.all();
                     byte[][] r = new byte[rows.size()][];
                     for (int i = 0 ; i < r.length ; i++)
diff --git a/tools/stress/src/org/apache/cassandra/stress/operations/predefined/PredefinedOperation.java b/tools/stress/src/org/apache/cassandra/stress/operations/predefined/PredefinedOperation.java
index 3767401..1f9a2c8 100644
--- a/tools/stress/src/org/apache/cassandra/stress/operations/predefined/PredefinedOperation.java
+++ b/tools/stress/src/org/apache/cassandra/stress/operations/predefined/PredefinedOperation.java
@@ -35,6 +35,7 @@
 
 public abstract class PredefinedOperation extends PartitionOperation
 {
+    public static final byte[] EMPTY_BYTE_ARRAY = {};
     public final Command type;
     private final Distribution columnCount;
     private Object cqlCache;
@@ -107,7 +108,7 @@
             {
                 predicate.setSlice_range(new SliceRange()
                                          .setStart(settings.columns.names.get(lb))
-                                         .setFinish(new byte[] {})
+                                         .setFinish(EMPTY_BYTE_ARRAY)
                                          .setReversed(false)
                                          .setCount(count())
                 );
diff --git a/tools/stress/src/org/apache/cassandra/stress/operations/userdefined/SchemaStatement.java b/tools/stress/src/org/apache/cassandra/stress/operations/userdefined/SchemaStatement.java
index c9ead12..166d689 100644
--- a/tools/stress/src/org/apache/cassandra/stress/operations/userdefined/SchemaStatement.java
+++ b/tools/stress/src/org/apache/cassandra/stress/operations/userdefined/SchemaStatement.java
@@ -20,8 +20,6 @@
  * 
  */
 
-
-import java.io.IOException;
 import java.nio.ByteBuffer;
 import java.util.ArrayList;
 import java.util.List;
@@ -38,7 +36,6 @@
 import org.apache.cassandra.stress.settings.StressSettings;
 import org.apache.cassandra.stress.util.JavaDriverClient;
 import org.apache.cassandra.stress.util.Timer;
-import org.apache.cassandra.transport.SimpleClient;
 
 public abstract class SchemaStatement extends PartitionOperation
 {
@@ -92,12 +89,6 @@
         return args;
     }
 
-    @Override
-    public void run(SimpleClient client) throws IOException
-    {
-        throw new UnsupportedOperationException();
-    }
-
     abstract class Runner implements RunOp
     {
         int partitionCount;
diff --git a/tools/stress/src/org/apache/cassandra/stress/operations/userdefined/TokenRangeQuery.java b/tools/stress/src/org/apache/cassandra/stress/operations/userdefined/TokenRangeQuery.java
index 60a6c48..198f1f5 100644
--- a/tools/stress/src/org/apache/cassandra/stress/operations/userdefined/TokenRangeQuery.java
+++ b/tools/stress/src/org/apache/cassandra/stress/operations/userdefined/TokenRangeQuery.java
@@ -248,18 +248,16 @@
         timeWithRetry(new ThriftRun(client));
     }
 
-    public boolean ready(WorkManager workManager, RateLimiter rateLimiter)
+    public int ready(WorkManager workManager)
     {
         tokenRangeIterator.update();
 
         if (tokenRangeIterator.exhausted() && currentState.get() == null)
-            return false;
+            return 0;
 
         int numLeft = workManager.takePermits(1);
-        if (rateLimiter != null && numLeft > 0 )
-            rateLimiter.acquire(numLeft);
 
-        return numLeft > 0;
+        return numLeft > 0 ? 1 : 0;
     }
 
     public String key()
diff --git a/tools/stress/src/org/apache/cassandra/stress/operations/userdefined/ValidatingSchemaQuery.java b/tools/stress/src/org/apache/cassandra/stress/operations/userdefined/ValidatingSchemaQuery.java
index 02a9ca8..33f6f80 100644
--- a/tools/stress/src/org/apache/cassandra/stress/operations/userdefined/ValidatingSchemaQuery.java
+++ b/tools/stress/src/org/apache/cassandra/stress/operations/userdefined/ValidatingSchemaQuery.java
@@ -26,7 +26,6 @@
 import java.util.ArrayList;
 import java.util.Iterator;
 import java.util.List;
-import java.util.Random;
 import java.util.concurrent.ThreadLocalRandom;
 
 import com.datastax.driver.core.*;
@@ -43,13 +42,11 @@
 import org.apache.cassandra.thrift.CqlResult;
 import org.apache.cassandra.thrift.CqlRow;
 import org.apache.cassandra.thrift.ThriftConversion;
-import org.apache.cassandra.transport.SimpleClient;
 import org.apache.cassandra.utils.Pair;
 import org.apache.thrift.TException;
 
 public class ValidatingSchemaQuery extends PartitionOperation
 {
-    final Random random = new Random();
     private Pair<Row, Row> bounds;
 
     final int clusteringComponents;
@@ -58,12 +55,6 @@
     final int[] argumentIndex;
     final Object[] bindBuffer;
 
-    @Override
-    public void run(SimpleClient client) throws IOException
-    {
-        throw new UnsupportedOperationException();
-    }
-
     private ValidatingSchemaQuery(Timer timer, StressSettings settings, PartitionGenerator generator, SeedManager seedManager, ValidatingStatement[] statements, ConsistencyLevel cl, int clusteringComponents)
     {
         super(timer, settings, new DataSpec(generator, seedManager, new DistributionFixed(1), settings.insert.rowPopulationRatio.get(), 1));
@@ -281,14 +272,14 @@
         {
             StringBuilder cc = new StringBuilder();
             StringBuilder arg = new StringBuilder();
-            cc.append("("); arg.append("(");
+            cc.append('('); arg.append('(');
             for (int d = 0 ; d <= depth ; d++)
             {
-                if (d > 0) { cc.append(","); arg.append(","); }
+                if (d > 0) { cc.append(','); arg.append(','); }
                 cc.append(metadata.getClusteringColumns().get(d).getName());
-                arg.append("?");
+                arg.append('?');
             }
-            cc.append(")"); arg.append(")");
+            cc.append(')'); arg.append(')');
 
             ValidatingStatement[] statements = new ValidatingStatement[depth < maxDepth ? 1 : 4];
             int i = 0;
diff --git a/tools/stress/src/org/apache/cassandra/stress/settings/CliOption.java b/tools/stress/src/org/apache/cassandra/stress/settings/CliOption.java
index eb286ee..36284ab 100644
--- a/tools/stress/src/org/apache/cassandra/stress/settings/CliOption.java
+++ b/tools/stress/src/org/apache/cassandra/stress/settings/CliOption.java
@@ -1,6 +1,6 @@
 package org.apache.cassandra.stress.settings;
 /*
- * 
+ *
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
@@ -8,16 +8,16 @@
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
- * 
+ *
  *   http://www.apache.org/licenses/LICENSE-2.0
- * 
+ *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
- * 
+ *
  */
 
 
@@ -32,13 +32,13 @@
     RATE("Thread count, rate limit or automatic mode (default is auto)", SettingsRate.helpPrinter()),
     MODE("Thrift or CQL with options", SettingsMode.helpPrinter()),
     ERRORS("How to handle errors when encountered during stress", SettingsErrors.helpPrinter()),
-    SAMPLE("Specify the number of samples to collect for measuring latency", SettingsSamples.helpPrinter()),
     SCHEMA("Replication settings, compression, compaction, etc.", SettingsSchema.helpPrinter()),
     NODE("Nodes to connect to", SettingsNode.helpPrinter()),
     LOG("Where to log progress to, and the interval at which to do it", SettingsLog.helpPrinter()),
     TRANSPORT("Custom transport factories", SettingsTransport.helpPrinter()),
     PORT("The port to connect to cassandra nodes on", SettingsPort.helpPrinter()),
     SENDTO("-send-to", "Specify a stress server to send this command to", SettingsMisc.sendToDaemonHelpPrinter()),
+    GRAPH("-graph", "Graph recorded metrics", SettingsGraph.helpPrinter()),
     TOKENRANGE("Token range settings", SettingsTokenRange.helpPrinter())
     ;
 
diff --git a/tools/stress/src/org/apache/cassandra/stress/settings/SettingsCommandPreDefined.java b/tools/stress/src/org/apache/cassandra/stress/settings/SettingsCommandPreDefined.java
index c2f2591..8e93ff6 100644
--- a/tools/stress/src/org/apache/cassandra/stress/settings/SettingsCommandPreDefined.java
+++ b/tools/stress/src/org/apache/cassandra/stress/settings/SettingsCommandPreDefined.java
@@ -1,6 +1,6 @@
 package org.apache.cassandra.stress.settings;
 /*
- * 
+ *
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
@@ -8,16 +8,16 @@
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
- * 
+ *
  *   http://www.apache.org/licenses/LICENSE-2.0
- * 
+ *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
- * 
+ *
  */
 
 
@@ -51,9 +51,9 @@
         final SeedManager seeds = new SeedManager(settings);
         return new OpDistributionFactory()
         {
-            public OpDistribution get(Timing timing, int sampleCount, boolean isWarmup)
+            public OpDistribution get(Timing timing, boolean isWarmup)
             {
-                return new FixedOpDistribution(PredefinedOperation.operation(type, timing.newTimer(type.toString(), sampleCount),
+                return new FixedOpDistribution(PredefinedOperation.operation(type, timing.newTimer(type.toString()),
                                                newGenerator(settings), seeds, settings, add));
             }
 
diff --git a/tools/stress/src/org/apache/cassandra/stress/settings/SettingsGraph.java b/tools/stress/src/org/apache/cassandra/stress/settings/SettingsGraph.java
new file mode 100644
index 0000000..22261c1
--- /dev/null
+++ b/tools/stress/src/org/apache/cassandra/stress/settings/SettingsGraph.java
@@ -0,0 +1,125 @@
+package org.apache.cassandra.stress.settings;
+/*
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+
+
+import java.io.File;
+import java.io.IOException;
+import java.io.Serializable;
+import java.text.SimpleDateFormat;
+import java.util.Arrays;
+import java.util.Date;
+import java.util.List;
+import java.util.Map;
+
+public class SettingsGraph implements Serializable
+{
+    public final String file;
+    public final String revision;
+    public final String title;
+    public final String operation;
+    public final File temporaryLogFile;
+
+    public SettingsGraph(GraphOptions options, SettingsCommand stressCommand)
+    {
+        file = options.file.value();
+        revision = options.revision.value();
+        title = options.title.value() == null
+            ? "cassandra-stress - " + new SimpleDateFormat("yyyy-MM-dd HH:mm:ss").format(new Date())
+            : options.title.value();
+
+        operation = options.operation.value() == null
+            ? stressCommand.type.name()
+            : options.operation.value();
+
+        if (inGraphMode())
+        {
+            try
+            {
+                temporaryLogFile = File.createTempFile("cassandra-stress", ".log");
+            }
+            catch (IOException e)
+            {
+                throw new RuntimeException("Cannot open temporary file", e);
+            }
+        }
+        else
+        {
+            temporaryLogFile = null;
+        }
+    }
+
+    public boolean inGraphMode()
+    {
+        return file != null;
+    }
+
+    // Option Declarations
+    private static final class GraphOptions extends GroupedOptions
+    {
+        final OptionSimple file = new OptionSimple("file=", ".*", null, "HTML file to create or append to", true);
+        final OptionSimple revision = new OptionSimple("revision=", ".*", "unknown", "Unique name to assign to the current configuration being stressed", false);
+        final OptionSimple title = new OptionSimple("title=", ".*", null, "Title for chart (current date by default)", false);
+        final OptionSimple operation = new OptionSimple("op=", ".*", null, "Alternative name for current operation (stress op name used by default)", false);
+
+        @Override
+        public List<? extends Option> options()
+        {
+            return Arrays.asList(file, revision, title, operation);
+        }
+    }
+
+    // CLI Utility Methods
+    public static SettingsGraph get(Map<String, String[]> clArgs, SettingsCommand stressCommand)
+    {
+        String[] params = clArgs.remove("-graph");
+        if (params == null)
+        {
+            return new SettingsGraph(new GraphOptions(), stressCommand);
+        }
+        GraphOptions options = GroupedOptions.select(params, new GraphOptions());
+        if (options == null)
+        {
+            printHelp();
+            System.out.println("Invalid -graph options provided, see output for valid options");
+            System.exit(1);
+        }
+        return new SettingsGraph(options, stressCommand);
+    }
+
+    public static void printHelp()
+    {
+        GroupedOptions.printOptions(System.out, "-graph", new GraphOptions());
+    }
+
+    public static Runnable helpPrinter()
+    {
+        return new Runnable()
+        {
+            @Override
+            public void run()
+            {
+                printHelp();
+            }
+        };
+    }
+}
+
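
A minimal sketch of how the new -graph mode is meant to be driven from the command line; the stress command, file name, revision and title below are illustrative, not part of this patch:

    cassandra-stress write n=1000000 -graph file=stress-report.html revision=trunk title=write-test

When the flag is absent, SettingsGraph.get() falls back to a configuration with file == null, so inGraphMode() returns false and no temporary log file is created.
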
diff --git a/tools/stress/src/org/apache/cassandra/stress/settings/SettingsLog.java b/tools/stress/src/org/apache/cassandra/stress/settings/SettingsLog.java
index 5657fb2..602435c 100644
--- a/tools/stress/src/org/apache/cassandra/stress/settings/SettingsLog.java
+++ b/tools/stress/src/org/apache/cassandra/stress/settings/SettingsLog.java
@@ -1,6 +1,6 @@
 package org.apache.cassandra.stress.settings;
 /*
- * 
+ *
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
@@ -8,23 +8,22 @@
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
- * 
+ *
  *   http://www.apache.org/licenses/LICENSE-2.0
- * 
+ *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
- * 
+ *
  */
 
 
-import java.io.File;
-import java.io.FileNotFoundException;
-import java.io.PrintStream;
-import java.io.Serializable;
+import org.apache.cassandra.stress.util.MultiPrintStream;
+
+import java.io.*;
 import java.util.Arrays;
 import java.util.List;
 import java.util.Map;
@@ -38,6 +37,7 @@
 
     public final boolean noSummary;
     public final File file;
+    public final File hdrFile;
     public final int intervalMillis;
     public final Level level;
 
@@ -49,7 +49,10 @@
             file = new File(options.outputFile.value());
         else
             file = null;
-
+        if (options.hdrOutputFile.setByUser())
+            hdrFile = new File(options.hdrOutputFile.value());
+        else
+            hdrFile = null;
         String interval = options.interval.value();
         if (interval.endsWith("ms"))
             intervalMillis = Integer.parseInt(interval.substring(0, interval.length() - 2));
@@ -62,9 +65,15 @@
         level = Level.valueOf(options.level.value().toUpperCase());
     }
 
-    public PrintStream getOutput() throws FileNotFoundException
+    public MultiPrintStream getOutput() throws FileNotFoundException
     {
-        return file == null ? new PrintStream(System.out) : new PrintStream(file);
+        // Always print to stdout regardless of whether we're graphing or not
+        MultiPrintStream stream = new MultiPrintStream(new PrintStream(System.out));
+
+        if (file != null)
+            stream.addStream(new PrintStream(file));
+
+        return stream;
     }
 
     // Option Declarations
@@ -73,13 +82,14 @@
     {
         final OptionSimple noSummmary = new OptionSimple("no-summary", "", null, "Disable printing of aggregate statistics at the end of a test", false);
         final OptionSimple outputFile = new OptionSimple("file=", ".*", null, "Log to a file", false);
+        final OptionSimple hdrOutputFile = new OptionSimple("hdrfile=", ".*", null, "Log latency histograms to a file (HdrHistogram format)", false);
         final OptionSimple interval = new OptionSimple("interval=", "[0-9]+(ms|s|)", "1s", "Log progress every <value> seconds or milliseconds", false);
         final OptionSimple level = new OptionSimple("level=", "(minimal|normal|verbose)", "normal", "Logging level (minimal, normal or verbose)", false);
 
         @Override
         public List<? extends Option> options()
         {
-            return Arrays.asList(level, noSummmary, outputFile, interval);
+            return Arrays.asList(level, noSummmary, outputFile, hdrOutputFile, interval);
         }
     }
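
A rough illustration of the extended logging options; the file names are hypothetical, not taken from this change:

    cassandra-stress read n=500000 -log file=stress.log hdrfile=latency.hdr interval=5s

getOutput() now always writes to stdout and additionally mirrors output to the -log file= target via the new MultiPrintStream added later in this patch; the hdrfile= target is kept separately as SettingsLog.hdrFile.
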
 
diff --git a/tools/stress/src/org/apache/cassandra/stress/settings/SettingsNode.java b/tools/stress/src/org/apache/cassandra/stress/settings/SettingsNode.java
index ba1fcb5..89b7871 100644
--- a/tools/stress/src/org/apache/cassandra/stress/settings/SettingsNode.java
+++ b/tools/stress/src/org/apache/cassandra/stress/settings/SettingsNode.java
@@ -33,6 +33,7 @@
 {
     public final List<String> nodes;
     public final boolean isWhiteList;
+    public final String datacenter;
 
     public SettingsNode(Options options)
     {
@@ -64,8 +65,12 @@
 
         }
         else
+        {
             nodes = Arrays.asList(options.list.value().split(","));
+        }
+
         isWhiteList = options.whitelist.setByUser();
+        datacenter = options.datacenter.value();
     }
 
     public Set<String> resolveAllPermitted(StressSettings settings)
@@ -135,6 +140,7 @@
 
     public static final class Options extends GroupedOptions
     {
+        final OptionSimple datacenter = new OptionSimple("datacenter=", ".*", null, "Datacenter used for DCAwareRoundRobinPolicy", false);
         final OptionSimple whitelist = new OptionSimple("whitelist", "", null, "Limit communications to the provided nodes", false);
         final OptionSimple file = new OptionSimple("file=", ".*", null, "Node file (one per line)", false);
         final OptionSimple list = new OptionSimple("", "[^=,]+(,[^=,]+)*", "localhost", "comma delimited list of nodes", false);
@@ -142,7 +148,7 @@
         @Override
         public List<? extends Option> options()
         {
-            return Arrays.asList(whitelist, file, list);
+            return Arrays.asList(datacenter, whitelist, file, list);
         }
     }
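
A hypothetical invocation of the new datacenter option; the node addresses and DC name are made up for illustration:

    cassandra-stress write -node datacenter=DC1 whitelist 10.0.1.1,10.0.1.2

The value is stored on SettingsNode.datacenter and, per the JavaDriverClient change later in this patch, is passed to DCAwareRoundRobinPolicy.Builder.withLocalDc() so the driver treats that datacenter as local.
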
 
diff --git a/tools/stress/src/org/apache/cassandra/stress/settings/SettingsRate.java b/tools/stress/src/org/apache/cassandra/stress/settings/SettingsRate.java
index 0486678..7a5995b 100644
--- a/tools/stress/src/org/apache/cassandra/stress/settings/SettingsRate.java
+++ b/tools/stress/src/org/apache/cassandra/stress/settings/SettingsRate.java
@@ -1,6 +1,6 @@
 package org.apache.cassandra.stress.settings;
 /*
- * 
+ *
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
@@ -8,16 +8,16 @@
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
- * 
+ *
  *   http://www.apache.org/licenses/LICENSE-2.0
- * 
+ *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
- * 
+ *
  */
 
 
@@ -33,14 +33,22 @@
     public final int minThreads;
     public final int maxThreads;
     public final int threadCount;
-    public final int opRateTargetPerSecond;
+    public final int opsPerSecond;
+    public final boolean isFixed;
 
     public SettingsRate(ThreadOptions options)
     {
         auto = false;
         threadCount = Integer.parseInt(options.threads.value());
-        String rateOpt = options.rate.value();
-        opRateTargetPerSecond = Integer.parseInt(rateOpt.substring(0, rateOpt.length() - 2));
+        String throttleOpt = options.throttle.value();
+        String fixedOpt = options.fixed.value();
+        int throttle = Integer.parseInt(throttleOpt.substring(0, throttleOpt.length() - 2));
+        int fixed = Integer.parseInt(fixedOpt.substring(0, fixedOpt.length() - 2));
+        if (throttle != 0 && fixed != 0)
+            throw new IllegalArgumentException("Cannot set both fixed and throttle; choose one.");
+        opsPerSecond = Math.max(fixed, throttle);
+        isFixed = (opsPerSecond == fixed);
+
         minThreads = -1;
         maxThreads = -1;
     }
@@ -51,7 +59,8 @@
         this.minThreads = Integer.parseInt(auto.minThreads.value());
         this.maxThreads = Integer.parseInt(auto.maxThreads.value());
         this.threadCount = -1;
-        this.opRateTargetPerSecond = 0;
+        this.opsPerSecond = 0;
+        isFixed = false;
     }
 
 
@@ -73,12 +82,13 @@
     private static final class ThreadOptions extends GroupedOptions
     {
         final OptionSimple threads = new OptionSimple("threads=", "[0-9]+", null, "run this many clients concurrently", true);
-        final OptionSimple rate = new OptionSimple("limit=", "[0-9]+/s", "0/s", "limit operations per second across all clients", false);
+        final OptionSimple throttle = new OptionSimple("throttle=", "[0-9]+/s", "0/s", "throttle operations per second across all clients to a maximum rate (or less) with no implied schedule", false);
+        final OptionSimple fixed = new OptionSimple("fixed=", "[0-9]+/s", "0/s", "expect a fixed rate of operations per second across all clients, with an implied schedule", false);
 
         @Override
         public List<? extends Option> options()
         {
-            return Arrays.asList(threads, rate);
+            return Arrays.asList(threads, throttle, fixed);
         }
     }
 
diff --git a/tools/stress/src/org/apache/cassandra/stress/settings/SettingsSamples.java b/tools/stress/src/org/apache/cassandra/stress/settings/SettingsSamples.java
deleted file mode 100644
index 7a9f484..0000000
--- a/tools/stress/src/org/apache/cassandra/stress/settings/SettingsSamples.java
+++ /dev/null
@@ -1,94 +0,0 @@
-package org.apache.cassandra.stress.settings;
-/*
- * 
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- * 
- *   http://www.apache.org/licenses/LICENSE-2.0
- * 
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied.  See the License for the
- * specific language governing permissions and limitations
- * under the License.
- * 
- */
-
-
-import java.io.Serializable;
-import java.util.Arrays;
-import java.util.List;
-import java.util.Map;
-
-public class SettingsSamples implements Serializable
-{
-
-    public final int liveCount;
-    public final int historyCount;
-    public final int reportCount;
-
-    public SettingsSamples(SampleOptions options)
-    {
-        liveCount = (int) OptionDistribution.parseLong(options.liveCount.value());
-        historyCount = (int) OptionDistribution.parseLong(options.historyCount.value());
-        reportCount = (int) OptionDistribution.parseLong(options.reportCount.value());
-    }
-
-    // Option Declarations
-
-    private static final class SampleOptions extends GroupedOptions
-    {
-        final OptionSimple historyCount = new OptionSimple("history=", "[0-9]+[bmk]?", "50K", "The number of samples to save across the whole run", false);
-        final OptionSimple liveCount = new OptionSimple("live=", "[0-9]+[bmk]?", "1M", "The number of samples to save between reports", false);
-        final OptionSimple reportCount = new OptionSimple("report=", "[0-9]+[bmk]?", "100K", "The maximum number of samples to use when building a report", false);
-
-        @Override
-        public List<? extends Option> options()
-        {
-            return Arrays.asList(historyCount, liveCount, reportCount);
-        }
-    }
-
-    // CLI Utility Methods
-
-    public static SettingsSamples get(Map<String, String[]> clArgs)
-    {
-        String[] params = clArgs.remove("-sample");
-        if (params == null)
-        {
-            return new SettingsSamples(new SampleOptions());
-        }
-        SampleOptions options = GroupedOptions.select(params, new SampleOptions());
-        if (options == null)
-        {
-            printHelp();
-            System.out.println("Invalid -sample options provided, see output for valid options");
-            System.exit(1);
-        }
-        return new SettingsSamples(options);
-    }
-
-    public static void printHelp()
-    {
-        GroupedOptions.printOptions(System.out, "-sample", new SampleOptions());
-    }
-
-    public static Runnable helpPrinter()
-    {
-        return new Runnable()
-        {
-            @Override
-            public void run()
-            {
-                printHelp();
-            }
-        };
-    }
-}
-
diff --git a/tools/stress/src/org/apache/cassandra/stress/settings/StressSettings.java b/tools/stress/src/org/apache/cassandra/stress/settings/StressSettings.java
index 5b1f861..de03737 100644
--- a/tools/stress/src/org/apache/cassandra/stress/settings/StressSettings.java
+++ b/tools/stress/src/org/apache/cassandra/stress/settings/StressSettings.java
@@ -1,6 +1,6 @@
 package org.apache.cassandra.stress.settings;
 /*
- * 
+ *
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
@@ -8,16 +8,16 @@
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
- * 
+ *
  *   http://www.apache.org/licenses/LICENSE-2.0
- * 
+ *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
- * 
+ *
  */
 
 
@@ -45,7 +45,6 @@
     public final SettingsPopulation generate;
     public final SettingsInsert insert;
     public final SettingsColumn columns;
-    public final SettingsSamples samples;
     public final SettingsErrors errors;
     public final SettingsLog log;
     public final SettingsMode mode;
@@ -54,16 +53,30 @@
     public final SettingsTransport transport;
     public final SettingsPort port;
     public final String sendToDaemon;
-
+    public final SettingsGraph graph;
     public final SettingsTokenRange tokenRange;
-    public StressSettings(SettingsCommand command, SettingsRate rate, SettingsPopulation generate, SettingsInsert insert, SettingsColumn columns, SettingsSamples samples, SettingsErrors errors, SettingsLog log, SettingsMode mode, SettingsNode node, SettingsSchema schema, SettingsTransport transport, SettingsPort port, String sendToDaemon, SettingsTokenRange tokenRange)
+
+    public StressSettings(SettingsCommand command,
+                          SettingsRate rate,
+                          SettingsPopulation generate,
+                          SettingsInsert insert,
+                          SettingsColumn columns,
+                          SettingsErrors errors,
+                          SettingsLog log,
+                          SettingsMode mode,
+                          SettingsNode node,
+                          SettingsSchema schema,
+                          SettingsTransport transport,
+                          SettingsPort port,
+                          String sendToDaemon,
+                          SettingsGraph graph,
+                          SettingsTokenRange tokenRange)
     {
         this.command = command;
         this.rate = rate;
         this.insert = insert;
         this.generate = generate;
         this.columns = columns;
-        this.samples = samples;
         this.errors = errors;
         this.log = log;
         this.mode = mode;
@@ -72,6 +85,7 @@
         this.transport = transport;
         this.port = port;
         this.sendToDaemon = sendToDaemon;
+        this.graph = graph;
         this.tokenRange = tokenRange;
     }
 
@@ -167,6 +181,8 @@
     }
 
     private static volatile JavaDriverClient client;
+    private static volatile int numFailures;
+    private static final int MAX_NUM_FAILURES = 10;
 
     public JavaDriverClient getJavaDriverClient()
     {
@@ -178,9 +194,12 @@
         if (client != null)
             return client;
 
-        try
+        synchronized (this)
         {
-            synchronized (this)
+            if (numFailures >= MAX_NUM_FAILURES)
+                throw new RuntimeException("Failed to create client too many times");
+
+            try
             {
                 String currentNode = node.randomNode();
                 if (client != null)
@@ -194,10 +213,11 @@
 
                 return client = c;
             }
-        }
-        catch (Exception e)
-        {
-            throw new RuntimeException(e);
+            catch (Exception e)
+            {
+                numFailures += 1;
+                throw new RuntimeException(e);
+            }
         }
     }
 
@@ -211,22 +231,14 @@
 
     public static StressSettings parse(String[] args)
     {
-        try
-        {
-            args = repairParams(args);
-            final Map<String, String[]> clArgs = parseMap(args);
-            if (clArgs.containsKey("legacy"))
-                return Legacy.build(Arrays.copyOfRange(args, 1, args.length));
-            if (SettingsMisc.maybeDoSpecial(clArgs))
-                System.exit(1);
-            return get(clArgs);
-        }
-        catch (IllegalArgumentException e)
-        {
-            System.out.println(e.getMessage());
-            System.exit(1);
-            throw new AssertionError();
-        }
+        args = repairParams(args);
+        final Map<String, String[]> clArgs = parseMap(args);
+        if (clArgs.containsKey("legacy"))
+            return Legacy.build(Arrays.copyOfRange(args, 1, args.length));
+        if (SettingsMisc.maybeDoSpecial(clArgs))
+            return null;
+        return get(clArgs);
+
     }
 
     private static String[] repairParams(String[] args)
@@ -258,13 +270,13 @@
         SettingsTokenRange tokenRange = SettingsTokenRange.get(clArgs);
         SettingsInsert insert = SettingsInsert.get(clArgs);
         SettingsColumn columns = SettingsColumn.get(clArgs);
-        SettingsSamples samples = SettingsSamples.get(clArgs);
         SettingsErrors errors = SettingsErrors.get(clArgs);
         SettingsLog log = SettingsLog.get(clArgs);
         SettingsMode mode = SettingsMode.get(clArgs);
         SettingsNode node = SettingsNode.get(clArgs);
         SettingsSchema schema = SettingsSchema.get(clArgs, command);
         SettingsTransport transport = SettingsTransport.get(clArgs);
+        SettingsGraph graph = SettingsGraph.get(clArgs, command);
         if (!clArgs.isEmpty())
         {
             printHelp();
@@ -281,7 +293,8 @@
             }
             System.exit(1);
         }
-        return new StressSettings(command, rate, generate, insert, columns, samples, errors, log, mode, node, schema, transport, port, sendToDaemon, tokenRange);
+
+        return new StressSettings(command, rate, generate, insert, columns, errors, log, mode, node, schema, transport, port, sendToDaemon, graph, tokenRange);
     }
 
     private static Map<String, String[]> parseMap(String[] args)
diff --git a/tools/stress/src/org/apache/cassandra/stress/util/JavaDriverClient.java b/tools/stress/src/org/apache/cassandra/stress/util/JavaDriverClient.java
index 4f173b4..53d8786 100644
--- a/tools/stress/src/org/apache/cassandra/stress/util/JavaDriverClient.java
+++ b/tools/stress/src/org/apache/cassandra/stress/util/JavaDriverClient.java
@@ -24,6 +24,7 @@
 
 import com.datastax.driver.core.*;
 import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;
+import com.datastax.driver.core.policies.LoadBalancingPolicy;
 import com.datastax.driver.core.policies.WhiteListPolicy;
 import io.netty.util.internal.logging.InternalLoggerFactory;
 import io.netty.util.internal.logging.Slf4JLoggerFactory;
@@ -51,7 +52,7 @@
     private final EncryptionOptions.ClientEncryptionOptions encryptionOptions;
     private Cluster cluster;
     private Session session;
-    private final WhiteListPolicy whitelist;
+    private final LoadBalancingPolicy loadBalancingPolicy;
 
     private static final ConcurrentMap<String, PreparedStatement> stmts = new ConcurrentHashMap<>();
 
@@ -69,10 +70,18 @@
         this.password = settings.mode.password;
         this.authProvider = settings.mode.authProvider;
         this.encryptionOptions = encryptionOptions;
+
+        DCAwareRoundRobinPolicy.Builder policyBuilder = DCAwareRoundRobinPolicy.builder();
+        if (settings.node.datacenter != null)
+            policyBuilder.withLocalDc(settings.node.datacenter);
+
         if (settings.node.isWhiteList)
-            whitelist = new WhiteListPolicy(DCAwareRoundRobinPolicy.builder().build(), settings.node.resolveAll(settings.port.nativePort));
+            loadBalancingPolicy = new WhiteListPolicy(policyBuilder.build(), settings.node.resolveAll(settings.port.nativePort));
+        else if (settings.node.datacenter != null)
+            loadBalancingPolicy = policyBuilder.build();
         else
-            whitelist = null;
+            loadBalancingPolicy = null;
+
         connectionsPerHost = settings.mode.connectionsPerHost == null ? 8 : settings.mode.connectionsPerHost;
 
         int maxThreadCount = 0;
@@ -119,8 +128,8 @@
                                                 .withoutJMXReporting()
                                                 .withProtocolVersion(protocolVersion)
                                                 .withoutMetrics(); // The driver uses metrics 3 with conflict with our version
-        if (whitelist != null)
-            clusterBuilder.withLoadBalancingPolicy(whitelist);
+        if (loadBalancingPolicy != null)
+            clusterBuilder.withLoadBalancingPolicy(loadBalancingPolicy);
         clusterBuilder.withCompression(compression);
         if (encryptionOptions.enabled)
         {
diff --git a/tools/stress/src/org/apache/cassandra/stress/util/MultiPrintStream.java b/tools/stress/src/org/apache/cassandra/stress/util/MultiPrintStream.java
new file mode 100644
index 0000000..d299013
--- /dev/null
+++ b/tools/stress/src/org/apache/cassandra/stress/util/MultiPrintStream.java
@@ -0,0 +1,288 @@
+/*
+ *
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ *
+ */
+package org.apache.cassandra.stress.util;
+
+import java.io.*;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Locale;
+
+/** PrintStream that multiplexes to multiple streams */
+public class MultiPrintStream extends PrintStream
+{
+    private List<PrintStream> newStreams;
+
+    public MultiPrintStream(PrintStream baseStream)
+    {
+        super(baseStream);
+        this.newStreams = new ArrayList<>();
+    }
+
+    public void addStream(PrintStream printStream)
+    {
+        newStreams.add(printStream);
+    }
+
+    @Override
+    public void flush()
+    {
+        super.flush();
+        for (PrintStream s : newStreams)
+            s.flush();
+    }
+
+    @Override
+    public void close()
+    {
+        super.close();
+        for (PrintStream s : newStreams)
+            s.close();
+    }
+
+    @Override
+    public boolean checkError()
+    {
+        boolean error = super.checkError();
+        for (PrintStream s : newStreams)
+        {
+            if (s.checkError())
+                error = true;
+        }
+        return error;
+    }
+
+    @Override
+    public void write(int b)
+    {
+        super.write(b);
+        for (PrintStream s: newStreams)
+            s.write(b);
+    }
+
+    @Override
+    public void write(byte[] buf, int off, int len)
+    {
+        super.write(buf, off, len);
+        for (PrintStream s: newStreams)
+            s.write(buf, off, len);
+    }
+
+    @Override
+    public void print(boolean b)
+    {
+        super.print(b);
+        for (PrintStream s: newStreams)
+            s.print(b);
+    }
+
+    @Override
+    public void print(char c)
+    {
+        super.print(c);
+        for (PrintStream s: newStreams)
+            s.print(c);
+    }
+
+    @Override
+    public void print(int i)
+    {
+        super.print(i);
+        for (PrintStream s: newStreams)
+            s.print(i);
+    }
+
+    @Override
+    public void print(long l)
+    {
+        super.print(l);
+        for (PrintStream s: newStreams)
+            s.print(l);
+    }
+
+    @Override
+    public void print(float f)
+    {
+        super.print(f);
+        for (PrintStream s: newStreams)
+            s.print(f);
+    }
+
+    @Override
+    public void print(double d)
+    {
+        super.print(d);
+        for (PrintStream s: newStreams)
+            s.print(d);
+    }
+
+    @Override
+    public void print(char[] s)
+    {
+        super.print(s);
+        for (PrintStream stream: newStreams)
+            stream.print(s);
+    }
+
+    @Override
+    public void print(String s)
+    {
+        super.print(s);
+        for (PrintStream stream: newStreams)
+            stream.print(s);
+    }
+
+    @Override
+    public void print(Object obj)
+    {
+        super.print(obj);
+        for (PrintStream s: newStreams)
+            s.print(obj);
+    }
+
+    @Override
+    public void println()
+    {
+        super.println();
+        for (PrintStream s: newStreams)
+            s.println();
+    }
+
+    @Override
+    public void println(boolean x)
+    {
+        super.println(x);
+        for (PrintStream s: newStreams)
+            s.println(x);
+    }
+
+    @Override
+    public void println(char x)
+    {
+        super.println(x);
+        for (PrintStream s: newStreams)
+            s.println(x);
+    }
+
+    @Override
+    public void println(int x)
+    {
+        super.println(x);
+        for (PrintStream s: newStreams)
+            s.println(x);
+    }
+
+    @Override
+    public void println(long x)
+    {
+        super.println(x);
+        for (PrintStream s: newStreams)
+            s.println(x);
+    }
+
+    @Override
+    public void println(float x)
+    {
+        super.println(x);
+        for (PrintStream s: newStreams)
+            s.println(x);
+    }
+
+    @Override
+    public void println(double x)
+    {
+        super.println(x);
+        for (PrintStream s: newStreams)
+            s.println(x);
+    }
+
+    @Override
+    public void println(char[] x)
+    {
+        super.println(x);
+        for (PrintStream s: newStreams)
+            s.println(x);
+    }
+
+    @Override
+    public void println(String x)
+    {
+        super.println(x);
+        for (PrintStream s: newStreams)
+            s.println(x);
+    }
+
+    @Override
+    public void println(Object x)
+    {
+        super.println(x);
+        for (PrintStream s: newStreams)
+            s.println(x);
+    }
+
+    @Override
+    public PrintStream printf(String format, Object... args)
+    {
+        for (PrintStream s: newStreams)
+            s.printf(format, args);
+        return super.printf(format, args);
+    }
+
+    @Override
+    public PrintStream printf(Locale l, String format, Object... args)
+    {
+        for (PrintStream s: newStreams)
+            s.printf(l, format, args);
+        return super.printf(l, format, args);
+    }
+
+    @Override
+    public PrintStream append(CharSequence csq)
+    {
+        for (PrintStream s: newStreams)
+            s.append(csq);
+        return super.append(csq);
+    }
+
+    @Override
+    public PrintStream append(CharSequence csq, int start, int end)
+    {
+        for (PrintStream s: newStreams)
+            s.append(csq, start, end);
+        return super.append(csq, start, end);
+    }
+
+    @Override
+    public PrintStream append(char c)
+    {
+        for (PrintStream s: newStreams)
+            s.append(c);
+        return super.append(c);
+    }
+
+    @Override
+    public void write(byte[] b) throws IOException
+    {
+        super.write(b);
+        for (PrintStream s: newStreams)
+            s.write(b);
+    }
+
+}
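
A minimal, self-contained sketch of how the new MultiPrintStream is used elsewhere in this patch; the class name and log file name are hypothetical:

    import java.io.File;
    import java.io.PrintStream;
    import org.apache.cassandra.stress.util.MultiPrintStream;

    public class MultiPrintStreamExample
    {
        public static void main(String[] args) throws Exception
        {
            // Console output is always the base stream; extra sinks are optional.
            MultiPrintStream out = new MultiPrintStream(new PrintStream(System.out));
            out.addStream(new PrintStream(new File("stress.log")));

            // Every print/println/printf call is forwarded to all registered streams.
            out.printf("ops/s: %,.0f%n", 12345.0);
            out.flush();
            out.close();
        }
    }
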
diff --git a/tools/stress/src/org/apache/cassandra/stress/util/SampleOfLongs.java b/tools/stress/src/org/apache/cassandra/stress/util/SampleOfLongs.java
deleted file mode 100644
index ed54ee0..0000000
--- a/tools/stress/src/org/apache/cassandra/stress/util/SampleOfLongs.java
+++ /dev/null
@@ -1,111 +0,0 @@
-package org.apache.cassandra.stress.util;
-/*
- * 
- * Licensed to the Apache Software Foundation (ASF) under one
- * or more contributor license agreements.  See the NOTICE file
- * distributed with this work for additional information
- * regarding copyright ownership.  The ASF licenses this file
- * to you under the Apache License, Version 2.0 (the
- * "License"); you may not use this file except in compliance
- * with the License.  You may obtain a copy of the License at
- * 
- *   http://www.apache.org/licenses/LICENSE-2.0
- * 
- * Unless required by applicable law or agreed to in writing,
- * software distributed under the License is distributed on an
- * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
- * KIND, either express or implied.  See the License for the
- * specific language governing permissions and limitations
- * under the License.
- * 
- */
-
-
-import java.util.Arrays;
-import java.util.List;
-import java.util.Random;
-
-// represents a sample of long (latencies) together with the probability of selection of each sample (i.e. the ratio of
-// samples to total number of events). This is used to ensure that, when merging, the result has samples from each
-// with equal probability
-public final class SampleOfLongs
-{
-
-    // nanos
-    final long[] sample;
-
-    // probability with which each sample was selected
-    final double p;
-
-    SampleOfLongs(long[] sample, int p)
-    {
-        this.sample = sample;
-        this.p = 1 / (float) p;
-    }
-
-    SampleOfLongs(long[] sample, double p)
-    {
-        this.sample = sample;
-        this.p = p;
-    }
-
-    static SampleOfLongs merge(Random rnd, List<SampleOfLongs> merge, int maxSamples)
-    {
-        // grab the lowest probability of selection, and normalise all samples to that
-        double targetp = 1;
-        for (SampleOfLongs sampleOfLongs : merge)
-            targetp = Math.min(targetp, sampleOfLongs.p);
-
-        // calculate how many samples we should encounter
-        int maxLength = 0;
-        for (SampleOfLongs sampleOfLongs : merge)
-            maxLength += sampleOfLongs.sample.length * (targetp / sampleOfLongs.p);
-
-        if (maxLength > maxSamples)
-        {
-            targetp *= maxSamples / (double) maxLength;
-            maxLength = maxSamples;
-        }
-
-        long[] sample = new long[maxLength];
-        int count = 0;
-        out: for (SampleOfLongs latencies : merge)
-        {
-            long[] in = latencies.sample;
-            double p = targetp / latencies.p;
-            for (int i = 0 ; i < in.length ; i++)
-            {
-                if (rnd.nextDouble() < p)
-                {
-                    sample[count++] = in[i];
-                    if (count == maxLength)
-                        break out;
-                }
-            }
-        }
-        if (count != maxLength)
-            sample = Arrays.copyOf(sample, count);
-        Arrays.sort(sample);
-        return new SampleOfLongs(sample, targetp);
-    }
-
-    public double medianLatency()
-    {
-        if (sample.length == 0)
-            return 0;
-        return sample[sample.length >> 1] * 0.000001d;
-    }
-
-    // 0 < rank < 1
-    public double rankLatency(float rank)
-    {
-        if (sample.length == 0)
-            return 0;
-        int index = (int)(rank * sample.length);
-        if (index >= sample.length)
-            index = sample.length - 1;
-        return sample[index] * 0.000001d;
-    }
-
-}
-
diff --git a/tools/stress/src/org/apache/cassandra/stress/util/Timer.java b/tools/stress/src/org/apache/cassandra/stress/util/Timer.java
index 88e8020..bb19bb6 100644
--- a/tools/stress/src/org/apache/cassandra/stress/util/Timer.java
+++ b/tools/stress/src/org/apache/cassandra/stress/util/Timer.java
@@ -1,6 +1,6 @@
 package org.apache.cassandra.stress.util;
 /*
- * 
+ *
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
@@ -8,41 +8,40 @@
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
- * 
+ *
  *   http://www.apache.org/licenses/LICENSE-2.0
- * 
+ *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
- * 
+ *
  */
 
 
-import java.util.Arrays;
-import java.util.List;
 import java.util.concurrent.CountDownLatch;
-import java.util.concurrent.ThreadLocalRandom;
+
+import org.HdrHistogram.Histogram;
 
 // a timer - this timer must be used by a single thread, and co-ordinates with other timers by
 public final class Timer
 {
-    private ThreadLocalRandom rnd;
+    private Histogram responseTime = new Histogram(3);
+    private Histogram serviceTime = new Histogram(3);
+    private Histogram waitTime = new Histogram(3);
 
-    // in progress snap start
-    private long sampleStartNanos;
+    // event timing info
+    private long intendedTimeNs;
+    private long startTimeNs;
+    private long endTimeNs;
 
-    // each entry is present with probability 1/p(opCount) or 1/(p(opCount)-1)
-    private final long[] sample;
-    private int opCount;
 
     // aggregate info
     private long errorCount;
     private long partitionCount;
     private long rowCount;
-    private long total;
     private long max;
     private long maxStart;
     private long upToDateAsOf;
@@ -52,26 +51,11 @@
     private volatile CountDownLatch reportRequest;
     volatile TimingInterval report;
     private volatile TimingInterval finalReport;
+    private final boolean isFixed;
 
-    public Timer(int sampleCount)
+    public Timer(boolean isFixed)
     {
-        int powerOf2 = 32 - Integer.numberOfLeadingZeros(sampleCount - 1);
-        this.sample = new long[1 << powerOf2];
-    }
-
-    public void init()
-    {
-        rnd = ThreadLocalRandom.current();
-    }
-
-    public void start(){
-        // decide if we're logging this event
-        sampleStartNanos = System.nanoTime();
-    }
-
-    private int p(int index)
-    {
-        return 1 + (index / sample.length);
+        this.isFixed = isFixed;
     }
 
     public boolean running()
@@ -81,46 +65,52 @@
 
     public void stop(long partitionCount, long rowCount, boolean error)
     {
+        endTimeNs = System.nanoTime();
         maybeReport();
         long now = System.nanoTime();
-        long time = now - sampleStartNanos;
-        if (rnd.nextInt(p(opCount)) == 0)
-            sample[index(opCount)] = time;
-        if (time > max)
+        if (intendedTimeNs != 0)
         {
-            maxStart = sampleStartNanos;
-            max = time;
+            long rTime = endTimeNs - intendedTimeNs;
+            responseTime.recordValue(rTime);
+            long wTime = startTimeNs - intendedTimeNs;
+            waitTime.recordValue(wTime);
         }
-        total += time;
-        opCount += 1;
+
+        long sTime = endTimeNs - startTimeNs;
+        serviceTime.recordValue(sTime);
+
+        if (sTime > max)
+        {
+            maxStart = startTimeNs;
+            max = sTime;
+        }
         this.partitionCount += partitionCount;
         this.rowCount += rowCount;
         if (error)
             this.errorCount++;
         upToDateAsOf = now;
+        resetTimes();
     }
 
-    private int index(int count)
+    private void resetTimes()
     {
-        return count & (sample.length - 1);
+        intendedTimeNs = startTimeNs = endTimeNs = 0;
     }
 
     private TimingInterval buildReport()
     {
-        final List<SampleOfLongs> sampleLatencies = Arrays.asList
-                (       new SampleOfLongs(Arrays.copyOf(sample, index(opCount)), p(opCount)),
-                        new SampleOfLongs(Arrays.copyOfRange(sample, index(opCount), Math.min(opCount, sample.length)), p(opCount) - 1)
-                );
-        final TimingInterval report = new TimingInterval(lastSnap, upToDateAsOf, max, maxStart, max, partitionCount,
-                rowCount, total, opCount, errorCount, SampleOfLongs.merge(rnd, sampleLatencies, Integer.MAX_VALUE));
+        final TimingInterval report = new TimingInterval(lastSnap, upToDateAsOf, maxStart, partitionCount,
+                rowCount, errorCount, responseTime, serviceTime, waitTime, isFixed);
         // reset counters
-        opCount = 0;
         partitionCount = 0;
         rowCount = 0;
-        total = 0;
         max = 0;
         errorCount = 0;
         lastSnap = upToDateAsOf;
+        responseTime = new Histogram(3);
+        serviceTime = new Histogram(3);
+        waitTime = new Histogram(3);
+
         return report;
     }
 
@@ -164,4 +154,14 @@
             reportRequest = null;
         }
     }
+
+    public void intendedTimeNs(long v)
+    {
+        intendedTimeNs = v;
+    }
+
+    public void start()
+    {
+        startTimeNs = System.nanoTime();
+    }
 }
\ No newline at end of file
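
The reworked Timer records three HdrHistogram series instead of a reservoir of raw samples. Below is a standalone sketch of the same bookkeeping, under the assumption that a scheduler supplies the intended start time for fixed-rate runs; the class and method names here are illustrative, not the stress tool's API:

    import org.HdrHistogram.Histogram;

    public class LatencySketch
    {
        // 3 significant digits, matching new Histogram(3) in the patch
        private final Histogram responseTime = new Histogram(3);
        private final Histogram serviceTime = new Histogram(3);
        private final Histogram waitTime = new Histogram(3);

        public void record(long intendedNs, long startNs, long endNs)
        {
            if (intendedNs != 0)
            {
                responseTime.recordValue(endNs - intendedNs); // includes scheduling delay
                waitTime.recordValue(startNs - intendedNs);   // time spent waiting to start
            }
            serviceTime.recordValue(endNs - startNs);         // time the operation itself took
        }

        public double p99ServiceMs()
        {
            return serviceTime.getValueAtPercentile(99.0) * 0.000001d;
        }
    }
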
diff --git a/tools/stress/src/org/apache/cassandra/stress/util/Timing.java b/tools/stress/src/org/apache/cassandra/stress/util/Timing.java
index 403bee0..fa95fdb 100644
--- a/tools/stress/src/org/apache/cassandra/stress/util/Timing.java
+++ b/tools/stress/src/org/apache/cassandra/stress/util/Timing.java
@@ -1,6 +1,6 @@
 package org.apache.cassandra.stress.util;
 /*
- * 
+ *
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
@@ -8,16 +8,16 @@
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
- * 
+ *
  *   http://www.apache.org/licenses/LICENSE-2.0
- * 
+ *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
- * 
+ *
  */
 
 
@@ -38,14 +38,12 @@
     // Probably the CopyOnWriteArrayList could be changed to an ordinary list as well.
     private final Map<String, List<Timer>> timers = new TreeMap<>();
     private volatile TimingIntervals history;
-    private final int historySampleCount;
-    private final int reportSampleCount;
     private boolean done;
+    private boolean isFixed;
 
-    public Timing(int historySampleCount, int reportSampleCount)
+    public Timing(boolean isFixed)
     {
-        this.historySampleCount = historySampleCount;
-        this.reportSampleCount = reportSampleCount;
+        this.isFixed = isFixed;
     }
 
     // TIMING
@@ -111,20 +109,20 @@
                 done &= !timer.running();
             }
 
-            intervals.put(entry.getKey(), TimingInterval.merge(operationIntervals, reportSampleCount,
+            intervals.put(entry.getKey(), TimingInterval.merge(operationIntervals,
                                                               history.get(entry.getKey()).endNanos()));
         }
 
         TimingIntervals result = new TimingIntervals(intervals);
         this.done = done;
-        history = history.merge(result, historySampleCount, history.startNanos());
+        history = history.merge(result, history.startNanos());
         return new TimingResult<>(extra, result);
     }
 
     // build a new timer and add it to the set of running timers.
-    public Timer newTimer(String opType, int sampleCount)
+    public Timer newTimer(String opType)
     {
-        final Timer timer = new Timer(sampleCount);
+        final Timer timer = new Timer(isFixed);
 
         if (!timers.containsKey(opType))
             timers.put(opType, new ArrayList<Timer>());
diff --git a/tools/stress/src/org/apache/cassandra/stress/util/TimingInterval.java b/tools/stress/src/org/apache/cassandra/stress/util/TimingInterval.java
index 6be71c8..bb9587f 100644
--- a/tools/stress/src/org/apache/cassandra/stress/util/TimingInterval.java
+++ b/tools/stress/src/org/apache/cassandra/stress/util/TimingInterval.java
@@ -1,6 +1,6 @@
 package org.apache.cassandra.stress.util;
 /*
- * 
+ *
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
@@ -8,166 +8,193 @@
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
- * 
+ *
  *   http://www.apache.org/licenses/LICENSE-2.0
- * 
+ *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
  * KIND, either express or implied.  See the License for the
  * specific language governing permissions and limitations
  * under the License.
- * 
+ *
  */
 
-import java.util.*;
-import java.util.concurrent.ThreadLocalRandom;
+import org.HdrHistogram.Histogram;
 
 // represents measurements taken over an interval of time
 // used for both single timer results and merged timer results
 public final class TimingInterval
 {
+    private final Histogram responseTime;
+    private final Histogram serviceTime;
+    private final Histogram waitTime;
+
+    public static final long[] EMPTY_SAMPLE = new long[0];
     // nanos
-    private final long start;
-    private final long end;
-    public final long maxLatency;
-    public final long pauseLength;
+    private final long startNs;
+    private final long endNs;
     public final long pauseStart;
-    public final long totalLatency;
 
     // discrete
     public final long partitionCount;
     public final long rowCount;
-    public final long operationCount;
     public final long errorCount;
+    public final boolean isFixed;
 
-    final SampleOfLongs sample;
 
     public String toString()
     {
-        return String.format("Start: %d end: %d maxLatency: %d pauseLength: %d pauseStart: %d totalLatency: %d" +
-                             " pCount: %d rcount: %d opCount: %d errors: %d", start, end, maxLatency, pauseLength,
-                             pauseStart, totalLatency, partitionCount, rowCount, operationCount, errorCount);
+        return String.format("Start: %d end: %d maxLatency: %d pauseStart: %d" +
+                             " pCount: %d rcount: %d opCount: %d errors: %d",
+                             startNs, endNs, getLatencyHistogram().getMaxValue(), pauseStart,
+                             partitionCount, rowCount, getLatencyHistogram().getTotalCount(), errorCount);
     }
 
     TimingInterval(long time)
     {
-        start = end = time;
-        maxLatency = totalLatency = 0;
-        partitionCount = rowCount = operationCount = errorCount = 0;
-        pauseStart = pauseLength = 0;
-        sample = new SampleOfLongs(new long[0], 1d);
+        startNs = endNs = time;
+        partitionCount = rowCount = errorCount = 0;
+        pauseStart = 0;
+        responseTime = new Histogram(3);
+        serviceTime = new Histogram(3);
+        waitTime = new Histogram(3);
+        isFixed = false;
     }
 
-    TimingInterval(long start, long end, long maxLatency, long pauseStart, long pauseLength, long partitionCount,
-                   long rowCount, long totalLatency, long operationCount, long errorCount, SampleOfLongs sample)
+    TimingInterval(long start, long end, long maxPauseStart, long partitionCount,
+                   long rowCount, long errorCount, Histogram r, Histogram s, Histogram w, boolean isFixed)
     {
-        this.start = start;
-        this.end = Math.max(end, start);
-        this.maxLatency = maxLatency;
+        this.startNs = start;
+        this.endNs = Math.max(end, start);
         this.partitionCount = partitionCount;
         this.rowCount = rowCount;
-        this.totalLatency = totalLatency;
         this.errorCount = errorCount;
-        this.operationCount = operationCount;
-        this.pauseStart = pauseStart;
-        this.pauseLength = pauseLength;
-        this.sample = sample;
+        this.pauseStart = maxPauseStart;
+        this.responseTime = r;
+        this.serviceTime = s;
+        this.waitTime = w;
+        this.isFixed = isFixed;
+
     }
 
     // merge multiple timer intervals together
-    static TimingInterval merge(Iterable<TimingInterval> intervals, int maxSamples, long start)
+    static TimingInterval merge(Iterable<TimingInterval> intervals, long start)
     {
-        ThreadLocalRandom rnd = ThreadLocalRandom.current();
-        long operationCount = 0, partitionCount = 0, rowCount = 0, errorCount = 0;
-        long maxLatency = 0, totalLatency = 0;
-        List<SampleOfLongs> latencies = new ArrayList<>();
+        long partitionCount = 0, rowCount = 0, errorCount = 0;
         long end = 0;
-        long pauseStart = 0, pauseEnd = Long.MAX_VALUE;
+        long pauseStart = 0;
+        Histogram responseTime = new Histogram(3);
+        Histogram serviceTime = new Histogram(3);
+        Histogram waitTime = new Histogram(3);
+        boolean isFixed = false;
         for (TimingInterval interval : intervals)
         {
             if (interval != null)
             {
-                end = Math.max(end, interval.end);
-                operationCount += interval.operationCount;
-                maxLatency = Math.max(interval.maxLatency, maxLatency);
-                totalLatency += interval.totalLatency;
+                end = Math.max(end, interval.endNs);
                 partitionCount += interval.partitionCount;
                 rowCount += interval.rowCount;
                 errorCount += interval.errorCount;
-                latencies.addAll(Arrays.asList(interval.sample));
-                if (interval.pauseLength > 0)
+
+                if (interval.getLatencyHistogram().getMaxValue() > serviceTime.getMaxValue())
                 {
-                    pauseStart = Math.max(pauseStart, interval.pauseStart);
-                    pauseEnd = Math.min(pauseEnd, interval.pauseStart + interval.pauseLength);
+                    pauseStart = interval.pauseStart;
                 }
+                responseTime.add(interval.responseTime);
+                serviceTime.add(interval.serviceTime);
+                waitTime.add(interval.waitTime);
+                isFixed |= interval.isFixed;
             }
         }
 
-        if (pauseEnd < pauseStart || pauseStart <= 0)
-        {
-            pauseEnd = pauseStart = 0;
-        }
 
-        return new TimingInterval(start, end, maxLatency, pauseStart, pauseEnd - pauseStart, partitionCount, rowCount,
-                                  totalLatency, operationCount, errorCount, SampleOfLongs.merge(rnd, latencies, maxSamples));
+        return new TimingInterval(start, end, pauseStart, partitionCount, rowCount,
+                                  errorCount, responseTime, serviceTime, waitTime, isFixed);
 
     }
 
     public double opRate()
     {
-        return operationCount / ((end - start) * 0.000000001d);
+        return getLatencyHistogram().getTotalCount() / ((endNs - startNs) * 0.000000001d);
     }
 
     public double adjustedRowRate()
     {
-        return rowCount / ((end - (start + pauseLength)) * 0.000000001d);
+        return rowCount / ((endNs - (startNs + getLatencyHistogram().getMaxValue())) * 0.000000001d);
     }
 
     public double partitionRate()
     {
-        return partitionCount / ((end - start) * 0.000000001d);
+        return partitionCount / ((endNs - startNs) * 0.000000001d);
     }
 
     public double rowRate()
     {
-        return rowCount / ((end - start) * 0.000000001d);
+        return rowCount / ((endNs - startNs) * 0.000000001d);
     }
 
-    public double meanLatency()
+    public double meanLatencyMs()
     {
-        return (totalLatency / (double) operationCount) * 0.000001d;
+        return getLatencyHistogram().getMean() * 0.000001d;
     }
 
-    public double maxLatency()
+    public double maxLatencyMs()
     {
-        return maxLatency * 0.000001d;
+        return getLatencyHistogram().getMaxValue() * 0.000001d;
     }
 
-    public double medianLatency()
+    public double medianLatencyMs()
     {
-        return sample.medianLatency();
+        return getLatencyHistogram().getValueAtPercentile(50.0) * 0.000001d;
     }
 
-    // 0 < rank < 1
-    public double rankLatency(float rank)
+
+    /**
+     * @param percentile between 0.0 and 100.0
+     * @return latency in milliseconds at percentile
+     */
+    public double latencyAtPercentileMs(double percentile)
     {
-        return sample.rankLatency(rank);
+        return getLatencyHistogram().getValueAtPercentile(percentile) * 0.000001d;
     }
 
-    public long runTime()
+    public long runTimeMs()
     {
-        return (end - start) / 1000000;
+        return (endNs - startNs) / 1000000;
     }
 
-    public final long endNanos()
+    public long endNanos()
     {
-        return end;
+        return endNs;
     }
 
     public long startNanos()
     {
-        return start;
+        return startNs;
+    }
+
+    public Histogram responseTime()
+    {
+        return responseTime;
+    }
+
+    public Histogram serviceTime()
+    {
+        return serviceTime;
+    }
+
+    public Histogram waitTime()
+    {
+        return waitTime;
+    }
+
+    private Histogram getLatencyHistogram()
+    {
+        if (!isFixed || responseTime.getTotalCount() == 0)
+            return serviceTime;
+        else
+            return responseTime;
     }
 
     public static enum TimingParameter
@@ -181,22 +208,27 @@
         return getStringValue(value, Float.NaN);
     }
 
-    String getStringValue(TimingParameter value, float rank)
+    String getStringValue(TimingParameter value, double rank)
     {
         switch (value)
         {
-            case OPRATE:         return String.format("%.0f", opRate());
-            case ROWRATE:        return String.format("%.0f", rowRate());
-            case ADJROWRATE:     return String.format("%.0f", adjustedRowRate());
-            case PARTITIONRATE:  return String.format("%.0f", partitionRate());
-            case MEANLATENCY:    return String.format("%.1f", meanLatency());
-            case MAXLATENCY:     return String.format("%.1f", maxLatency());
-            case MEDIANLATENCY:  return String.format("%.1f", medianLatency());
-            case RANKLATENCY:    return String.format("%.1f", rankLatency(rank));
-            case ERRORCOUNT:     return String.format("%d", errorCount);
-            case PARTITIONCOUNT: return String.format("%d", partitionCount);
+            case OPRATE:         return String.format("%,.0f", opRate());
+            case ROWRATE:        return String.format("%,.0f", rowRate());
+            case ADJROWRATE:     return String.format("%,.0f", adjustedRowRate());
+            case PARTITIONRATE:  return String.format("%,.0f", partitionRate());
+            case MEANLATENCY:    return String.format("%,.1f", meanLatencyMs());
+            case MAXLATENCY:     return String.format("%,.1f", maxLatencyMs());
+            case MEDIANLATENCY:  return String.format("%,.1f", medianLatencyMs());
+            case RANKLATENCY:    return String.format("%,.1f", latencyAtPercentileMs(rank));
+            case ERRORCOUNT:     return String.format("%,d", errorCount);
+            case PARTITIONCOUNT: return String.format("%,d", partitionCount);
             default:             throw new IllegalStateException();
         }
     }
+
+    public long operationCount()
+    {
+        return getLatencyHistogram().getTotalCount();
+    }
  }
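
The TimingInterval changes above replace the old sampled-latency fields with HdrHistogram-backed accessors, converting recorded nanosecond values to milliseconds with the 0.000001d factor. As a rough sketch of that conversion using the HdrHistogram library (the class name, histogram bounds, and sample values below are illustrative assumptions, not part of this patch):

    import java.util.concurrent.TimeUnit;
    import org.HdrHistogram.Histogram;

    public class LatencyHistogramSketch
    {
        public static void main(String[] args)
        {
            // Hypothetical bounds: track up to one hour with 3 significant digits.
            Histogram serviceTime = new Histogram(TimeUnit.HOURS.toNanos(1), 3);

            // Record a few service times in nanoseconds, as the stress workers would.
            serviceTime.recordValue(TimeUnit.MILLISECONDS.toNanos(2));
            serviceTime.recordValue(TimeUnit.MILLISECONDS.toNanos(5));
            serviceTime.recordValue(TimeUnit.MILLISECONDS.toNanos(40));

            // Same nanosecond-to-millisecond conversion as the accessors above.
            double meanMs = serviceTime.getMean() * 0.000001d;
            double p99Ms  = serviceTime.getValueAtPercentile(99.0) * 0.000001d;
            System.out.printf("count=%,d mean=%,.1f ms p99=%,.1f ms%n",
                              serviceTime.getTotalCount(), meanMs, p99Ms);
        }
    }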
 
diff --git a/tools/stress/src/org/apache/cassandra/stress/util/TimingIntervals.java b/tools/stress/src/org/apache/cassandra/stress/util/TimingIntervals.java
index f989173..0586006 100644
--- a/tools/stress/src/org/apache/cassandra/stress/util/TimingIntervals.java
+++ b/tools/stress/src/org/apache/cassandra/stress/util/TimingIntervals.java
@@ -20,7 +20,7 @@
         this.intervals = intervals;
     }
 
-    public TimingIntervals merge(TimingIntervals with, int maxSamples, long start)
+    public TimingIntervals merge(TimingIntervals with, long start)
     {
         assert intervals.size() == with.intervals.size();
         TreeMap<String, TimingInterval> ret = new TreeMap<>();
@@ -28,7 +28,7 @@
         for (String opType : intervals.keySet())
         {
             assert with.intervals.containsKey(opType);
-            ret.put(opType, TimingInterval.merge(Arrays.asList(intervals.get(opType), with.intervals.get(opType)), maxSamples, start));
+            ret.put(opType, TimingInterval.merge(Arrays.asList(intervals.get(opType), with.intervals.get(opType)), start));
         }
 
         return new TimingIntervals(ret);
@@ -39,29 +39,34 @@
         return intervals.get(opType);
     }
 
-    public TimingInterval combine(int maxSamples)
+    public TimingInterval combine()
     {
         long start = Long.MAX_VALUE;
         for (TimingInterval ti : intervals.values())
             start = Math.min(start, ti.startNanos());
 
-        return TimingInterval.merge(intervals.values(), maxSamples, start);
+        return TimingInterval.merge(intervals.values(), start);
     }
 
-    public String str(TimingInterval.TimingParameter value)
+    public String str(TimingInterval.TimingParameter value, String unit)
     {
-        return str(value, Float.NaN);
+        return str(value, Double.NaN, unit);
     }
 
-    public String str(TimingInterval.TimingParameter value, float rank)
+    public String str(TimingInterval.TimingParameter value, double rank, String unit)
     {
         StringBuilder sb = new StringBuilder("[");
 
         for (Map.Entry<String, TimingInterval> entry : intervals.entrySet())
         {
             sb.append(entry.getKey());
-            sb.append(":");
+            sb.append(": ");
             sb.append(entry.getValue().getStringValue(value, rank));
+            if (unit.length() > 0)
+            {
+                sb.append(" ");
+                sb.append(unit);
+            }
             sb.append(", ");
         }
 
@@ -73,39 +78,47 @@
 
     public String opRates()
     {
-        return str(TimingInterval.TimingParameter.OPRATE);
+        return str(TimingInterval.TimingParameter.OPRATE, "op/s");
     }
+
     public String partitionRates()
     {
-        return str(TimingInterval.TimingParameter.PARTITIONRATE);
+        return str(TimingInterval.TimingParameter.PARTITIONRATE, "pk/s");
     }
+
     public String rowRates()
     {
-        return str(TimingInterval.TimingParameter.ROWRATE);
+        return str(TimingInterval.TimingParameter.ROWRATE, "row/s");
     }
+
     public String meanLatencies()
     {
-        return str(TimingInterval.TimingParameter.MEANLATENCY);
+        return str(TimingInterval.TimingParameter.MEANLATENCY, "ms");
     }
+
     public String maxLatencies()
     {
-        return str(TimingInterval.TimingParameter.MAXLATENCY);
+        return str(TimingInterval.TimingParameter.MAXLATENCY, "ms");
     }
+
     public String medianLatencies()
     {
-        return str(TimingInterval.TimingParameter.MEDIANLATENCY);
+        return str(TimingInterval.TimingParameter.MEDIANLATENCY, "ms");
     }
-    public String rankLatencies(float rank)
+
+    public String latenciesAtPercentile(double rank)
     {
-        return str(TimingInterval.TimingParameter.RANKLATENCY, rank);
+        return str(TimingInterval.TimingParameter.RANKLATENCY, rank, "ms");
     }
+
     public String errorCounts()
     {
-        return str(TimingInterval.TimingParameter.ERRORCOUNT);
+        return str(TimingInterval.TimingParameter.ERRORCOUNT, "");
     }
+
     public String partitionCounts()
     {
-        return str(TimingInterval.TimingParameter.PARTITIONCOUNT);
+        return str(TimingInterval.TimingParameter.PARTITIONCOUNT, "");
     }
 
     public long opRate()
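
The TimingIntervals changes above add a unit suffix and grouped number formats ("%,.0f", "%,d") to the per-operation summary strings. A minimal sketch of the resulting formatting, using made-up operation names and rates rather than real cassandra-stress output:

    import java.util.Map;
    import java.util.TreeMap;

    public class IntervalStringSketch
    {
        // Mirrors the shape of the patched str(): "[op: value unit, op: value unit]".
        static String str(Map<String, Double> ratesByOp, String unit)
        {
            StringBuilder sb = new StringBuilder("[");
            for (Map.Entry<String, Double> entry : ratesByOp.entrySet())
            {
                sb.append(entry.getKey()).append(": ");
                sb.append(String.format("%,.0f", entry.getValue()));
                if (!unit.isEmpty())
                    sb.append(" ").append(unit);
                sb.append(", ");
            }
            if (!ratesByOp.isEmpty())
                sb.setLength(sb.length() - 2);
            return sb.append("]").toString();
        }

        public static void main(String[] args)
        {
            Map<String, Double> opRates = new TreeMap<>();
            opRates.put("read", 41235.7);
            opRates.put("write", 39881.2);
            // In an English locale prints: [read: 41,236 op/s, write: 39,881 op/s]
            System.out.println(str(opRates, "op/s"));
        }
    }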
diff --git a/tools/stress/src/resources/org/apache/cassandra/stress/graph/graph.html b/tools/stress/src/resources/org/apache/cassandra/stress/graph/graph.html
new file mode 100644
index 0000000..54cd469
--- /dev/null
+++ b/tools/stress/src/resources/org/apache/cassandra/stress/graph/graph.html
@@ -0,0 +1,659 @@
+<!DOCTYPE html>
+
+<!-- cstar_perf (https://github.com/datastax/cstar_perf) graphing
+utility adapted for use directly with command line cassandra-stress -->
+
+<head>
+  <meta charset="utf-8">
+  <script language="javascript" type="text/javascript">
+    <!--
+
+/* stats start */
+
+/* stats end */
+
+/*! jQuery v1.11.1 | (c) 2005, 2014 jQuery Foundation, Inc. | jquery.org/license */
+!function(a,b){"object"==typeof module&&"object"==typeof module.exports?module.exports=a.document?b(a,!0):function(a){if(!a.document)throw new Error("jQuery requires a window with a document");return b(a)}:b(a)}("undefined"!=typeof window?window:this,function(a,b){var c=[],d=c.slice,e=c.concat,f=c.push,g=c.indexOf,h={},i=h.toString,j=h.hasOwnProperty,k={},l="1.11.1",m=function(a,b){return new m.fn.init(a,b)},n=/^[\s\uFEFF\xA0]+|[\s\uFEFF\xA0]+$/g,o=/^-ms-/,p=/-([\da-z])/gi,q=function(a,b){return b.toUpperCase()};m.fn=m.prototype={jquery:l,constructor:m,selector:"",length:0,toArray:function(){return d.call(this)},get:function(a){return null!=a?0>a?this[a+this.length]:this[a]:d.call(this)},pushStack:function(a){var b=m.merge(this.constructor(),a);return b.prevObject=this,b.context=this.context,b},each:function(a,b){return m.each(this,a,b)},map:function(a){return this.pushStack(m.map(this,function(b,c){return a.call(b,c,b)}))},slice:function(){return this.pushStack(d.apply(this,arguments))},first:function(){return this.eq(0)},last:function(){return this.eq(-1)},eq:function(a){var b=this.length,c=+a+(0>a?b:0);return this.pushStack(c>=0&&b>c?[this[c]]:[])},end:function(){return this.prevObject||this.constructor(null)},push:f,sort:c.sort,splice:c.splice},m.extend=m.fn.extend=function(){var a,b,c,d,e,f,g=arguments[0]||{},h=1,i=arguments.length,j=!1;for("boolean"==typeof g&&(j=g,g=arguments[h]||{},h++),"object"==typeof g||m.isFunction(g)||(g={}),h===i&&(g=this,h--);i>h;h++)if(null!=(e=arguments[h]))for(d in e)a=g[d],c=e[d],g!==c&&(j&&c&&(m.isPlainObject(c)||(b=m.isArray(c)))?(b?(b=!1,f=a&&m.isArray(a)?a:[]):f=a&&m.isPlainObject(a)?a:{},g[d]=m.extend(j,f,c)):void 0!==c&&(g[d]=c));return g},m.extend({expando:"jQuery"+(l+Math.random()).replace(/\D/g,""),isReady:!0,error:function(a){throw new Error(a)},noop:function(){},isFunction:function(a){return"function"===m.type(a)},isArray:Array.isArray||function(a){return"array"===m.type(a)},isWindow:function(a){return null!=a&&a==a.window},isNumeric:function(a){return!m.isArray(a)&&a-parseFloat(a)>=0},isEmptyObject:function(a){var b;for(b in a)return!1;return!0},isPlainObject:function(a){var b;if(!a||"object"!==m.type(a)||a.nodeType||m.isWindow(a))return!1;try{if(a.constructor&&!j.call(a,"constructor")&&!j.call(a.constructor.prototype,"isPrototypeOf"))return!1}catch(c){return!1}if(k.ownLast)for(b in a)return j.call(a,b);for(b in a);return void 0===b||j.call(a,b)},type:function(a){return null==a?a+"":"object"==typeof a||"function"==typeof a?h[i.call(a)]||"object":typeof a},globalEval:function(b){b&&m.trim(b)&&(a.execScript||function(b){a.eval.call(a,b)})(b)},camelCase:function(a){return a.replace(o,"ms-").replace(p,q)},nodeName:function(a,b){return a.nodeName&&a.nodeName.toLowerCase()===b.toLowerCase()},each:function(a,b,c){var d,e=0,f=a.length,g=r(a);if(c){if(g){for(;f>e;e++)if(d=b.apply(a[e],c),d===!1)break}else for(e in a)if(d=b.apply(a[e],c),d===!1)break}else if(g){for(;f>e;e++)if(d=b.call(a[e],e,a[e]),d===!1)break}else for(e in a)if(d=b.call(a[e],e,a[e]),d===!1)break;return a},trim:function(a){return null==a?"":(a+"").replace(n,"")},makeArray:function(a,b){var c=b||[];return null!=a&&(r(Object(a))?m.merge(c,"string"==typeof a?[a]:a):f.call(c,a)),c},inArray:function(a,b,c){var d;if(b){if(g)return g.call(b,a,c);for(d=b.length,c=c?0>c?Math.max(0,d+c):c:0;d>c;c++)if(c in b&&b[c]===a)return c}return-1},merge:function(a,b){var c=+b.length,d=0,e=a.length;while(c>d)a[e++]=b[d++];if(c!==c)while(void 0!==b[d])a[e++]=b[d++];return 
a.length=e,a},grep:function(a,b,c){for(var d,e=[],f=0,g=a.length,h=!c;g>f;f++)d=!b(a[f],f),d!==h&&e.push(a[f]);return e},map:function(a,b,c){var d,f=0,g=a.length,h=r(a),i=[];if(h)for(;g>f;f++)d=b(a[f],f,c),null!=d&&i.push(d);else for(f in a)d=b(a[f],f,c),null!=d&&i.push(d);return e.apply([],i)},guid:1,proxy:function(a,b){var c,e,f;return"string"==typeof b&&(f=a[b],b=a,a=f),m.isFunction(a)?(c=d.call(arguments,2),e=function(){return a.apply(b||this,c.concat(d.call(arguments)))},e.guid=a.guid=a.guid||m.guid++,e):void 0},now:function(){return+new Date},support:k}),m.each("Boolean Number String Function Array Date RegExp Object Error".split(" "),function(a,b){h["[object "+b+"]"]=b.toLowerCase()});function r(a){var b=a.length,c=m.type(a);return"function"===c||m.isWindow(a)?!1:1===a.nodeType&&b?!0:"array"===c||0===b||"number"==typeof b&&b>0&&b-1 in a}var s=function(a){var b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u="sizzle"+-new Date,v=a.document,w=0,x=0,y=gb(),z=gb(),A=gb(),B=function(a,b){return a===b&&(l=!0),0},C="undefined",D=1<<31,E={}.hasOwnProperty,F=[],G=F.pop,H=F.push,I=F.push,J=F.slice,K=F.indexOf||function(a){for(var b=0,c=this.length;c>b;b++)if(this[b]===a)return b;return-1},L="checked|selected|async|autofocus|autoplay|controls|defer|disabled|hidden|ismap|loop|multiple|open|readonly|required|scoped",M="[\\x20\\t\\r\\n\\f]",N="(?:\\\\.|[\\w-]|[^\\x00-\\xa0])+",O=N.replace("w","w#"),P="\\["+M+"*("+N+")(?:"+M+"*([*^$|!~]?=)"+M+"*(?:'((?:\\\\.|[^\\\\'])*)'|\"((?:\\\\.|[^\\\\\"])*)\"|("+O+"))|)"+M+"*\\]",Q=":("+N+")(?:\\((('((?:\\\\.|[^\\\\'])*)'|\"((?:\\\\.|[^\\\\\"])*)\")|((?:\\\\.|[^\\\\()[\\]]|"+P+")*)|.*)\\)|)",R=new RegExp("^"+M+"+|((?:^|[^\\\\])(?:\\\\.)*)"+M+"+$","g"),S=new RegExp("^"+M+"*,"+M+"*"),T=new RegExp("^"+M+"*([>+~]|"+M+")"+M+"*"),U=new RegExp("="+M+"*([^\\]'\"]*?)"+M+"*\\]","g"),V=new RegExp(Q),W=new RegExp("^"+O+"$"),X={ID:new RegExp("^#("+N+")"),CLASS:new RegExp("^\\.("+N+")"),TAG:new RegExp("^("+N.replace("w","w*")+")"),ATTR:new RegExp("^"+P),PSEUDO:new RegExp("^"+Q),CHILD:new RegExp("^:(only|first|last|nth|nth-last)-(child|of-type)(?:\\("+M+"*(even|odd|(([+-]|)(\\d*)n|)"+M+"*(?:([+-]|)"+M+"*(\\d+)|))"+M+"*\\)|)","i"),bool:new RegExp("^(?:"+L+")$","i"),needsContext:new RegExp("^"+M+"*[>+~]|:(even|odd|eq|gt|lt|nth|first|last)(?:\\("+M+"*((?:-\\d)?\\d*)"+M+"*\\)|)(?=[^-]|$)","i")},Y=/^(?:input|select|textarea|button)$/i,Z=/^h\d$/i,$=/^[^{]+\{\s*\[native \w/,_=/^(?:#([\w-]+)|(\w+)|\.([\w-]+))$/,ab=/[+~]/,bb=/'|\\/g,cb=new RegExp("\\\\([\\da-f]{1,6}"+M+"?|("+M+")|.)","ig"),db=function(a,b,c){var d="0x"+b-65536;return d!==d||c?b:0>d?String.fromCharCode(d+65536):String.fromCharCode(d>>10|55296,1023&d|56320)};try{I.apply(F=J.call(v.childNodes),v.childNodes),F[v.childNodes.length].nodeType}catch(eb){I={apply:F.length?function(a,b){H.apply(a,J.call(b))}:function(a,b){var c=a.length,d=0;while(a[c++]=b[d++]);a.length=c-1}}}function fb(a,b,d,e){var f,h,j,k,l,o,r,s,w,x;if((b?b.ownerDocument||b:v)!==n&&m(b),b=b||n,d=d||[],!a||"string"!=typeof a)return d;if(1!==(k=b.nodeType)&&9!==k)return[];if(p&&!e){if(f=_.exec(a))if(j=f[1]){if(9===k){if(h=b.getElementById(j),!h||!h.parentNode)return d;if(h.id===j)return d.push(h),d}else if(b.ownerDocument&&(h=b.ownerDocument.getElementById(j))&&t(b,h)&&h.id===j)return d.push(h),d}else{if(f[2])return I.apply(d,b.getElementsByTagName(a)),d;if((j=f[3])&&c.getElementsByClassName&&b.getElementsByClassName)return 
I.apply(d,b.getElementsByClassName(j)),d}if(c.qsa&&(!q||!q.test(a))){if(s=r=u,w=b,x=9===k&&a,1===k&&"object"!==b.nodeName.toLowerCase()){o=g(a),(r=b.getAttribute("id"))?s=r.replace(bb,"\\$&"):b.setAttribute("id",s),s="[id='"+s+"'] ",l=o.length;while(l--)o[l]=s+qb(o[l]);w=ab.test(a)&&ob(b.parentNode)||b,x=o.join(",")}if(x)try{return I.apply(d,w.querySelectorAll(x)),d}catch(y){}finally{r||b.removeAttribute("id")}}}return i(a.replace(R,"$1"),b,d,e)}function gb(){var a=[];function b(c,e){return a.push(c+" ")>d.cacheLength&&delete b[a.shift()],b[c+" "]=e}return b}function hb(a){return a[u]=!0,a}function ib(a){var b=n.createElement("div");try{return!!a(b)}catch(c){return!1}finally{b.parentNode&&b.parentNode.removeChild(b),b=null}}function jb(a,b){var c=a.split("|"),e=a.length;while(e--)d.attrHandle[c[e]]=b}function kb(a,b){var c=b&&a,d=c&&1===a.nodeType&&1===b.nodeType&&(~b.sourceIndex||D)-(~a.sourceIndex||D);if(d)return d;if(c)while(c=c.nextSibling)if(c===b)return-1;return a?1:-1}function lb(a){return function(b){var c=b.nodeName.toLowerCase();return"input"===c&&b.type===a}}function mb(a){return function(b){var c=b.nodeName.toLowerCase();return("input"===c||"button"===c)&&b.type===a}}function nb(a){return hb(function(b){return b=+b,hb(function(c,d){var e,f=a([],c.length,b),g=f.length;while(g--)c[e=f[g]]&&(c[e]=!(d[e]=c[e]))})})}function ob(a){return a&&typeof a.getElementsByTagName!==C&&a}c=fb.support={},f=fb.isXML=function(a){var b=a&&(a.ownerDocument||a).documentElement;return b?"HTML"!==b.nodeName:!1},m=fb.setDocument=function(a){var b,e=a?a.ownerDocument||a:v,g=e.defaultView;return e!==n&&9===e.nodeType&&e.documentElement?(n=e,o=e.documentElement,p=!f(e),g&&g!==g.top&&(g.addEventListener?g.addEventListener("unload",function(){m()},!1):g.attachEvent&&g.attachEvent("onunload",function(){m()})),c.attributes=ib(function(a){return a.className="i",!a.getAttribute("className")}),c.getElementsByTagName=ib(function(a){return a.appendChild(e.createComment("")),!a.getElementsByTagName("*").length}),c.getElementsByClassName=$.test(e.getElementsByClassName)&&ib(function(a){return a.innerHTML="<div class='a'></div><div class='a i'></div>",a.firstChild.className="i",2===a.getElementsByClassName("i").length}),c.getById=ib(function(a){return o.appendChild(a).id=u,!e.getElementsByName||!e.getElementsByName(u).length}),c.getById?(d.find.ID=function(a,b){if(typeof b.getElementById!==C&&p){var c=b.getElementById(a);return c&&c.parentNode?[c]:[]}},d.filter.ID=function(a){var b=a.replace(cb,db);return function(a){return a.getAttribute("id")===b}}):(delete d.find.ID,d.filter.ID=function(a){var b=a.replace(cb,db);return function(a){var c=typeof a.getAttributeNode!==C&&a.getAttributeNode("id");return c&&c.value===b}}),d.find.TAG=c.getElementsByTagName?function(a,b){return typeof b.getElementsByTagName!==C?b.getElementsByTagName(a):void 0}:function(a,b){var c,d=[],e=0,f=b.getElementsByTagName(a);if("*"===a){while(c=f[e++])1===c.nodeType&&d.push(c);return d}return f},d.find.CLASS=c.getElementsByClassName&&function(a,b){return typeof b.getElementsByClassName!==C&&p?b.getElementsByClassName(a):void 0},r=[],q=[],(c.qsa=$.test(e.querySelectorAll))&&(ib(function(a){a.innerHTML="<select msallowclip=''><option selected=''></option></select>",a.querySelectorAll("[msallowclip^='']").length&&q.push("[*^$]="+M+"*(?:''|\"\")"),a.querySelectorAll("[selected]").length||q.push("\\["+M+"*(?:value|"+L+")"),a.querySelectorAll(":checked").length||q.push(":checked")}),ib(function(a){var 
b=e.createElement("input");b.setAttribute("type","hidden"),a.appendChild(b).setAttribute("name","D"),a.querySelectorAll("[name=d]").length&&q.push("name"+M+"*[*^$|!~]?="),a.querySelectorAll(":enabled").length||q.push(":enabled",":disabled"),a.querySelectorAll("*,:x"),q.push(",.*:")})),(c.matchesSelector=$.test(s=o.matches||o.webkitMatchesSelector||o.mozMatchesSelector||o.oMatchesSelector||o.msMatchesSelector))&&ib(function(a){c.disconnectedMatch=s.call(a,"div"),s.call(a,"[s!='']:x"),r.push("!=",Q)}),q=q.length&&new RegExp(q.join("|")),r=r.length&&new RegExp(r.join("|")),b=$.test(o.compareDocumentPosition),t=b||$.test(o.contains)?function(a,b){var c=9===a.nodeType?a.documentElement:a,d=b&&b.parentNode;return a===d||!(!d||1!==d.nodeType||!(c.contains?c.contains(d):a.compareDocumentPosition&&16&a.compareDocumentPosition(d)))}:function(a,b){if(b)while(b=b.parentNode)if(b===a)return!0;return!1},B=b?function(a,b){if(a===b)return l=!0,0;var d=!a.compareDocumentPosition-!b.compareDocumentPosition;return d?d:(d=(a.ownerDocument||a)===(b.ownerDocument||b)?a.compareDocumentPosition(b):1,1&d||!c.sortDetached&&b.compareDocumentPosition(a)===d?a===e||a.ownerDocument===v&&t(v,a)?-1:b===e||b.ownerDocument===v&&t(v,b)?1:k?K.call(k,a)-K.call(k,b):0:4&d?-1:1)}:function(a,b){if(a===b)return l=!0,0;var c,d=0,f=a.parentNode,g=b.parentNode,h=[a],i=[b];if(!f||!g)return a===e?-1:b===e?1:f?-1:g?1:k?K.call(k,a)-K.call(k,b):0;if(f===g)return kb(a,b);c=a;while(c=c.parentNode)h.unshift(c);c=b;while(c=c.parentNode)i.unshift(c);while(h[d]===i[d])d++;return d?kb(h[d],i[d]):h[d]===v?-1:i[d]===v?1:0},e):n},fb.matches=function(a,b){return fb(a,null,null,b)},fb.matchesSelector=function(a,b){if((a.ownerDocument||a)!==n&&m(a),b=b.replace(U,"='$1']"),!(!c.matchesSelector||!p||r&&r.test(b)||q&&q.test(b)))try{var d=s.call(a,b);if(d||c.disconnectedMatch||a.document&&11!==a.document.nodeType)return d}catch(e){}return fb(b,n,null,[a]).length>0},fb.contains=function(a,b){return(a.ownerDocument||a)!==n&&m(a),t(a,b)},fb.attr=function(a,b){(a.ownerDocument||a)!==n&&m(a);var e=d.attrHandle[b.toLowerCase()],f=e&&E.call(d.attrHandle,b.toLowerCase())?e(a,b,!p):void 0;return void 0!==f?f:c.attributes||!p?a.getAttribute(b):(f=a.getAttributeNode(b))&&f.specified?f.value:null},fb.error=function(a){throw new Error("Syntax error, unrecognized expression: "+a)},fb.uniqueSort=function(a){var b,d=[],e=0,f=0;if(l=!c.detectDuplicates,k=!c.sortStable&&a.slice(0),a.sort(B),l){while(b=a[f++])b===a[f]&&(e=d.push(f));while(e--)a.splice(d[e],1)}return k=null,a},e=fb.getText=function(a){var b,c="",d=0,f=a.nodeType;if(f){if(1===f||9===f||11===f){if("string"==typeof a.textContent)return a.textContent;for(a=a.firstChild;a;a=a.nextSibling)c+=e(a)}else if(3===f||4===f)return a.nodeValue}else while(b=a[d++])c+=e(b);return c},d=fb.selectors={cacheLength:50,createPseudo:hb,match:X,attrHandle:{},find:{},relative:{">":{dir:"parentNode",first:!0}," ":{dir:"parentNode"},"+":{dir:"previousSibling",first:!0},"~":{dir:"previousSibling"}},preFilter:{ATTR:function(a){return a[1]=a[1].replace(cb,db),a[3]=(a[3]||a[4]||a[5]||"").replace(cb,db),"~="===a[2]&&(a[3]=" "+a[3]+" "),a.slice(0,4)},CHILD:function(a){return a[1]=a[1].toLowerCase(),"nth"===a[1].slice(0,3)?(a[3]||fb.error(a[0]),a[4]=+(a[4]?a[5]+(a[6]||1):2*("even"===a[3]||"odd"===a[3])),a[5]=+(a[7]+a[8]||"odd"===a[3])):a[3]&&fb.error(a[0]),a},PSEUDO:function(a){var b,c=!a[6]&&a[2];return 
X.CHILD.test(a[0])?null:(a[3]?a[2]=a[4]||a[5]||"":c&&V.test(c)&&(b=g(c,!0))&&(b=c.indexOf(")",c.length-b)-c.length)&&(a[0]=a[0].slice(0,b),a[2]=c.slice(0,b)),a.slice(0,3))}},filter:{TAG:function(a){var b=a.replace(cb,db).toLowerCase();return"*"===a?function(){return!0}:function(a){return a.nodeName&&a.nodeName.toLowerCase()===b}},CLASS:function(a){var b=y[a+" "];return b||(b=new RegExp("(^|"+M+")"+a+"("+M+"|$)"))&&y(a,function(a){return b.test("string"==typeof a.className&&a.className||typeof a.getAttribute!==C&&a.getAttribute("class")||"")})},ATTR:function(a,b,c){return function(d){var e=fb.attr(d,a);return null==e?"!="===b:b?(e+="","="===b?e===c:"!="===b?e!==c:"^="===b?c&&0===e.indexOf(c):"*="===b?c&&e.indexOf(c)>-1:"$="===b?c&&e.slice(-c.length)===c:"~="===b?(" "+e+" ").indexOf(c)>-1:"|="===b?e===c||e.slice(0,c.length+1)===c+"-":!1):!0}},CHILD:function(a,b,c,d,e){var f="nth"!==a.slice(0,3),g="last"!==a.slice(-4),h="of-type"===b;return 1===d&&0===e?function(a){return!!a.parentNode}:function(b,c,i){var j,k,l,m,n,o,p=f!==g?"nextSibling":"previousSibling",q=b.parentNode,r=h&&b.nodeName.toLowerCase(),s=!i&&!h;if(q){if(f){while(p){l=b;while(l=l[p])if(h?l.nodeName.toLowerCase()===r:1===l.nodeType)return!1;o=p="only"===a&&!o&&"nextSibling"}return!0}if(o=[g?q.firstChild:q.lastChild],g&&s){k=q[u]||(q[u]={}),j=k[a]||[],n=j[0]===w&&j[1],m=j[0]===w&&j[2],l=n&&q.childNodes[n];while(l=++n&&l&&l[p]||(m=n=0)||o.pop())if(1===l.nodeType&&++m&&l===b){k[a]=[w,n,m];break}}else if(s&&(j=(b[u]||(b[u]={}))[a])&&j[0]===w)m=j[1];else while(l=++n&&l&&l[p]||(m=n=0)||o.pop())if((h?l.nodeName.toLowerCase()===r:1===l.nodeType)&&++m&&(s&&((l[u]||(l[u]={}))[a]=[w,m]),l===b))break;return m-=e,m===d||m%d===0&&m/d>=0}}},PSEUDO:function(a,b){var c,e=d.pseudos[a]||d.setFilters[a.toLowerCase()]||fb.error("unsupported pseudo: "+a);return e[u]?e(b):e.length>1?(c=[a,a,"",b],d.setFilters.hasOwnProperty(a.toLowerCase())?hb(function(a,c){var d,f=e(a,b),g=f.length;while(g--)d=K.call(a,f[g]),a[d]=!(c[d]=f[g])}):function(a){return e(a,0,c)}):e}},pseudos:{not:hb(function(a){var b=[],c=[],d=h(a.replace(R,"$1"));return d[u]?hb(function(a,b,c,e){var f,g=d(a,null,e,[]),h=a.length;while(h--)(f=g[h])&&(a[h]=!(b[h]=f))}):function(a,e,f){return b[0]=a,d(b,null,f,c),!c.pop()}}),has:hb(function(a){return function(b){return fb(a,b).length>0}}),contains:hb(function(a){return function(b){return(b.textContent||b.innerText||e(b)).indexOf(a)>-1}}),lang:hb(function(a){return W.test(a||"")||fb.error("unsupported lang: "+a),a=a.replace(cb,db).toLowerCase(),function(b){var c;do if(c=p?b.lang:b.getAttribute("xml:lang")||b.getAttribute("lang"))return c=c.toLowerCase(),c===a||0===c.indexOf(a+"-");while((b=b.parentNode)&&1===b.nodeType);return!1}}),target:function(b){var c=a.location&&a.location.hash;return c&&c.slice(1)===b.id},root:function(a){return a===o},focus:function(a){return a===n.activeElement&&(!n.hasFocus||n.hasFocus())&&!!(a.type||a.href||~a.tabIndex)},enabled:function(a){return a.disabled===!1},disabled:function(a){return a.disabled===!0},checked:function(a){var b=a.nodeName.toLowerCase();return"input"===b&&!!a.checked||"option"===b&&!!a.selected},selected:function(a){return a.parentNode&&a.parentNode.selectedIndex,a.selected===!0},empty:function(a){for(a=a.firstChild;a;a=a.nextSibling)if(a.nodeType<6)return!1;return!0},parent:function(a){return!d.pseudos.empty(a)},header:function(a){return Z.test(a.nodeName)},input:function(a){return Y.test(a.nodeName)},button:function(a){var 
b=a.nodeName.toLowerCase();return"input"===b&&"button"===a.type||"button"===b},text:function(a){var b;return"input"===a.nodeName.toLowerCase()&&"text"===a.type&&(null==(b=a.getAttribute("type"))||"text"===b.toLowerCase())},first:nb(function(){return[0]}),last:nb(function(a,b){return[b-1]}),eq:nb(function(a,b,c){return[0>c?c+b:c]}),even:nb(function(a,b){for(var c=0;b>c;c+=2)a.push(c);return a}),odd:nb(function(a,b){for(var c=1;b>c;c+=2)a.push(c);return a}),lt:nb(function(a,b,c){for(var d=0>c?c+b:c;--d>=0;)a.push(d);return a}),gt:nb(function(a,b,c){for(var d=0>c?c+b:c;++d<b;)a.push(d);return a})}},d.pseudos.nth=d.pseudos.eq;for(b in{radio:!0,checkbox:!0,file:!0,password:!0,image:!0})d.pseudos[b]=lb(b);for(b in{submit:!0,reset:!0})d.pseudos[b]=mb(b);function pb(){}pb.prototype=d.filters=d.pseudos,d.setFilters=new pb,g=fb.tokenize=function(a,b){var c,e,f,g,h,i,j,k=z[a+" "];if(k)return b?0:k.slice(0);h=a,i=[],j=d.preFilter;while(h){(!c||(e=S.exec(h)))&&(e&&(h=h.slice(e[0].length)||h),i.push(f=[])),c=!1,(e=T.exec(h))&&(c=e.shift(),f.push({value:c,type:e[0].replace(R," ")}),h=h.slice(c.length));for(g in d.filter)!(e=X[g].exec(h))||j[g]&&!(e=j[g](e))||(c=e.shift(),f.push({value:c,type:g,matches:e}),h=h.slice(c.length));if(!c)break}return b?h.length:h?fb.error(a):z(a,i).slice(0)};function qb(a){for(var b=0,c=a.length,d="";c>b;b++)d+=a[b].value;return d}function rb(a,b,c){var d=b.dir,e=c&&"parentNode"===d,f=x++;return b.first?function(b,c,f){while(b=b[d])if(1===b.nodeType||e)return a(b,c,f)}:function(b,c,g){var h,i,j=[w,f];if(g){while(b=b[d])if((1===b.nodeType||e)&&a(b,c,g))return!0}else while(b=b[d])if(1===b.nodeType||e){if(i=b[u]||(b[u]={}),(h=i[d])&&h[0]===w&&h[1]===f)return j[2]=h[2];if(i[d]=j,j[2]=a(b,c,g))return!0}}}function sb(a){return a.length>1?function(b,c,d){var e=a.length;while(e--)if(!a[e](b,c,d))return!1;return!0}:a[0]}function tb(a,b,c){for(var d=0,e=b.length;e>d;d++)fb(a,b[d],c);return c}function ub(a,b,c,d,e){for(var f,g=[],h=0,i=a.length,j=null!=b;i>h;h++)(f=a[h])&&(!c||c(f,d,e))&&(g.push(f),j&&b.push(h));return g}function vb(a,b,c,d,e,f){return d&&!d[u]&&(d=vb(d)),e&&!e[u]&&(e=vb(e,f)),hb(function(f,g,h,i){var j,k,l,m=[],n=[],o=g.length,p=f||tb(b||"*",h.nodeType?[h]:h,[]),q=!a||!f&&b?p:ub(p,m,a,h,i),r=c?e||(f?a:o||d)?[]:g:q;if(c&&c(q,r,h,i),d){j=ub(r,n),d(j,[],h,i),k=j.length;while(k--)(l=j[k])&&(r[n[k]]=!(q[n[k]]=l))}if(f){if(e||a){if(e){j=[],k=r.length;while(k--)(l=r[k])&&j.push(q[k]=l);e(null,r=[],j,i)}k=r.length;while(k--)(l=r[k])&&(j=e?K.call(f,l):m[k])>-1&&(f[j]=!(g[j]=l))}}else r=ub(r===g?r.splice(o,r.length):r),e?e(null,g,r,i):I.apply(g,r)})}function wb(a){for(var b,c,e,f=a.length,g=d.relative[a[0].type],h=g||d.relative[" "],i=g?1:0,k=rb(function(a){return a===b},h,!0),l=rb(function(a){return K.call(b,a)>-1},h,!0),m=[function(a,c,d){return!g&&(d||c!==j)||((b=c).nodeType?k(a,c,d):l(a,c,d))}];f>i;i++)if(c=d.relative[a[i].type])m=[rb(sb(m),c)];else{if(c=d.filter[a[i].type].apply(null,a[i].matches),c[u]){for(e=++i;f>e;e++)if(d.relative[a[e].type])break;return vb(i>1&&sb(m),i>1&&qb(a.slice(0,i-1).concat({value:" "===a[i-2].type?"*":""})).replace(R,"$1"),c,e>i&&wb(a.slice(i,e)),f>e&&wb(a=a.slice(e)),f>e&&qb(a))}m.push(c)}return sb(m)}function xb(a,b){var c=b.length>0,e=a.length>0,f=function(f,g,h,i,k){var 
l,m,o,p=0,q="0",r=f&&[],s=[],t=j,u=f||e&&d.find.TAG("*",k),v=w+=null==t?1:Math.random()||.1,x=u.length;for(k&&(j=g!==n&&g);q!==x&&null!=(l=u[q]);q++){if(e&&l){m=0;while(o=a[m++])if(o(l,g,h)){i.push(l);break}k&&(w=v)}c&&((l=!o&&l)&&p--,f&&r.push(l))}if(p+=q,c&&q!==p){m=0;while(o=b[m++])o(r,s,g,h);if(f){if(p>0)while(q--)r[q]||s[q]||(s[q]=G.call(i));s=ub(s)}I.apply(i,s),k&&!f&&s.length>0&&p+b.length>1&&fb.uniqueSort(i)}return k&&(w=v,j=t),r};return c?hb(f):f}return h=fb.compile=function(a,b){var c,d=[],e=[],f=A[a+" "];if(!f){b||(b=g(a)),c=b.length;while(c--)f=wb(b[c]),f[u]?d.push(f):e.push(f);f=A(a,xb(e,d)),f.selector=a}return f},i=fb.select=function(a,b,e,f){var i,j,k,l,m,n="function"==typeof a&&a,o=!f&&g(a=n.selector||a);if(e=e||[],1===o.length){if(j=o[0]=o[0].slice(0),j.length>2&&"ID"===(k=j[0]).type&&c.getById&&9===b.nodeType&&p&&d.relative[j[1].type]){if(b=(d.find.ID(k.matches[0].replace(cb,db),b)||[])[0],!b)return e;n&&(b=b.parentNode),a=a.slice(j.shift().value.length)}i=X.needsContext.test(a)?0:j.length;while(i--){if(k=j[i],d.relative[l=k.type])break;if((m=d.find[l])&&(f=m(k.matches[0].replace(cb,db),ab.test(j[0].type)&&ob(b.parentNode)||b))){if(j.splice(i,1),a=f.length&&qb(j),!a)return I.apply(e,f),e;break}}}return(n||h(a,o))(f,b,!p,e,ab.test(a)&&ob(b.parentNode)||b),e},c.sortStable=u.split("").sort(B).join("")===u,c.detectDuplicates=!!l,m(),c.sortDetached=ib(function(a){return 1&a.compareDocumentPosition(n.createElement("div"))}),ib(function(a){return a.innerHTML="<a href='#'></a>","#"===a.firstChild.getAttribute("href")})||jb("type|href|height|width",function(a,b,c){return c?void 0:a.getAttribute(b,"type"===b.toLowerCase()?1:2)}),c.attributes&&ib(function(a){return a.innerHTML="<input/>",a.firstChild.setAttribute("value",""),""===a.firstChild.getAttribute("value")})||jb("value",function(a,b,c){return c||"input"!==a.nodeName.toLowerCase()?void 0:a.defaultValue}),ib(function(a){return null==a.getAttribute("disabled")})||jb(L,function(a,b,c){var d;return c?void 0:a[b]===!0?b.toLowerCase():(d=a.getAttributeNode(b))&&d.specified?d.value:null}),fb}(a);m.find=s,m.expr=s.selectors,m.expr[":"]=m.expr.pseudos,m.unique=s.uniqueSort,m.text=s.getText,m.isXMLDoc=s.isXML,m.contains=s.contains;var t=m.expr.match.needsContext,u=/^<(\w+)\s*\/?>(?:<\/\1>|)$/,v=/^.[^:#\[\.,]*$/;function w(a,b,c){if(m.isFunction(b))return m.grep(a,function(a,d){return!!b.call(a,d,a)!==c});if(b.nodeType)return m.grep(a,function(a){return a===b!==c});if("string"==typeof b){if(v.test(b))return m.filter(b,a,c);b=m.filter(b,a)}return m.grep(a,function(a){return m.inArray(a,b)>=0!==c})}m.filter=function(a,b,c){var d=b[0];return c&&(a=":not("+a+")"),1===b.length&&1===d.nodeType?m.find.matchesSelector(d,a)?[d]:[]:m.find.matches(a,m.grep(b,function(a){return 1===a.nodeType}))},m.fn.extend({find:function(a){var b,c=[],d=this,e=d.length;if("string"!=typeof a)return this.pushStack(m(a).filter(function(){for(b=0;e>b;b++)if(m.contains(d[b],this))return!0}));for(b=0;e>b;b++)m.find(a,d[b],c);return c=this.pushStack(e>1?m.unique(c):c),c.selector=this.selector?this.selector+" "+a:a,c},filter:function(a){return this.pushStack(w(this,a||[],!1))},not:function(a){return this.pushStack(w(this,a||[],!0))},is:function(a){return!!w(this,"string"==typeof a&&t.test(a)?m(a):a||[],!1).length}});var x,y=a.document,z=/^(?:\s*(<[\w\W]+>)[^>]*|#([\w-]*))$/,A=m.fn.init=function(a,b){var c,d;if(!a)return this;if("string"==typeof 
a){if(c="<"===a.charAt(0)&&">"===a.charAt(a.length-1)&&a.length>=3?[null,a,null]:z.exec(a),!c||!c[1]&&b)return!b||b.jquery?(b||x).find(a):this.constructor(b).find(a);if(c[1]){if(b=b instanceof m?b[0]:b,m.merge(this,m.parseHTML(c[1],b&&b.nodeType?b.ownerDocument||b:y,!0)),u.test(c[1])&&m.isPlainObject(b))for(c in b)m.isFunction(this[c])?this[c](b[c]):this.attr(c,b[c]);return this}if(d=y.getElementById(c[2]),d&&d.parentNode){if(d.id!==c[2])return x.find(a);this.length=1,this[0]=d}return this.context=y,this.selector=a,this}return a.nodeType?(this.context=this[0]=a,this.length=1,this):m.isFunction(a)?"undefined"!=typeof x.ready?x.ready(a):a(m):(void 0!==a.selector&&(this.selector=a.selector,this.context=a.context),m.makeArray(a,this))};A.prototype=m.fn,x=m(y);var B=/^(?:parents|prev(?:Until|All))/,C={children:!0,contents:!0,next:!0,prev:!0};m.extend({dir:function(a,b,c){var d=[],e=a[b];while(e&&9!==e.nodeType&&(void 0===c||1!==e.nodeType||!m(e).is(c)))1===e.nodeType&&d.push(e),e=e[b];return d},sibling:function(a,b){for(var c=[];a;a=a.nextSibling)1===a.nodeType&&a!==b&&c.push(a);return c}}),m.fn.extend({has:function(a){var b,c=m(a,this),d=c.length;return this.filter(function(){for(b=0;d>b;b++)if(m.contains(this,c[b]))return!0})},closest:function(a,b){for(var c,d=0,e=this.length,f=[],g=t.test(a)||"string"!=typeof a?m(a,b||this.context):0;e>d;d++)for(c=this[d];c&&c!==b;c=c.parentNode)if(c.nodeType<11&&(g?g.index(c)>-1:1===c.nodeType&&m.find.matchesSelector(c,a))){f.push(c);break}return this.pushStack(f.length>1?m.unique(f):f)},index:function(a){return a?"string"==typeof a?m.inArray(this[0],m(a)):m.inArray(a.jquery?a[0]:a,this):this[0]&&this[0].parentNode?this.first().prevAll().length:-1},add:function(a,b){return this.pushStack(m.unique(m.merge(this.get(),m(a,b))))},addBack:function(a){return this.add(null==a?this.prevObject:this.prevObject.filter(a))}});function D(a,b){do a=a[b];while(a&&1!==a.nodeType);return a}m.each({parent:function(a){var b=a.parentNode;return b&&11!==b.nodeType?b:null},parents:function(a){return m.dir(a,"parentNode")},parentsUntil:function(a,b,c){return m.dir(a,"parentNode",c)},next:function(a){return D(a,"nextSibling")},prev:function(a){return D(a,"previousSibling")},nextAll:function(a){return m.dir(a,"nextSibling")},prevAll:function(a){return m.dir(a,"previousSibling")},nextUntil:function(a,b,c){return m.dir(a,"nextSibling",c)},prevUntil:function(a,b,c){return m.dir(a,"previousSibling",c)},siblings:function(a){return m.sibling((a.parentNode||{}).firstChild,a)},children:function(a){return m.sibling(a.firstChild)},contents:function(a){return m.nodeName(a,"iframe")?a.contentDocument||a.contentWindow.document:m.merge([],a.childNodes)}},function(a,b){m.fn[a]=function(c,d){var e=m.map(this,b,c);return"Until"!==a.slice(-5)&&(d=c),d&&"string"==typeof d&&(e=m.filter(d,e)),this.length>1&&(C[a]||(e=m.unique(e)),B.test(a)&&(e=e.reverse())),this.pushStack(e)}});var E=/\S+/g,F={};function G(a){var b=F[a]={};return m.each(a.match(E)||[],function(a,c){b[c]=!0}),b}m.Callbacks=function(a){a="string"==typeof a?F[a]||G(a):m.extend({},a);var b,c,d,e,f,g,h=[],i=!a.once&&[],j=function(l){for(c=a.memory&&l,d=!0,f=g||0,g=0,e=h.length,b=!0;h&&e>f;f++)if(h[f].apply(l[0],l[1])===!1&&a.stopOnFalse){c=!1;break}b=!1,h&&(i?i.length&&j(i.shift()):c?h=[]:k.disable())},k={add:function(){if(h){var d=h.length;!function f(b){m.each(b,function(b,c){var d=m.type(c);"function"===d?a.unique&&k.has(c)||h.push(c):c&&c.length&&"string"!==d&&f(c)})}(arguments),b?e=h.length:c&&(g=d,j(c))}return 
this},remove:function(){return h&&m.each(arguments,function(a,c){var d;while((d=m.inArray(c,h,d))>-1)h.splice(d,1),b&&(e>=d&&e--,f>=d&&f--)}),this},has:function(a){return a?m.inArray(a,h)>-1:!(!h||!h.length)},empty:function(){return h=[],e=0,this},disable:function(){return h=i=c=void 0,this},disabled:function(){return!h},lock:function(){return i=void 0,c||k.disable(),this},locked:function(){return!i},fireWith:function(a,c){return!h||d&&!i||(c=c||[],c=[a,c.slice?c.slice():c],b?i.push(c):j(c)),this},fire:function(){return k.fireWith(this,arguments),this},fired:function(){return!!d}};return k},m.extend({Deferred:function(a){var b=[["resolve","done",m.Callbacks("once memory"),"resolved"],["reject","fail",m.Callbacks("once memory"),"rejected"],["notify","progress",m.Callbacks("memory")]],c="pending",d={state:function(){return c},always:function(){return e.done(arguments).fail(arguments),this},then:function(){var a=arguments;return m.Deferred(function(c){m.each(b,function(b,f){var g=m.isFunction(a[b])&&a[b];e[f[1]](function(){var a=g&&g.apply(this,arguments);a&&m.isFunction(a.promise)?a.promise().done(c.resolve).fail(c.reject).progress(c.notify):c[f[0]+"With"](this===d?c.promise():this,g?[a]:arguments)})}),a=null}).promise()},promise:function(a){return null!=a?m.extend(a,d):d}},e={};return d.pipe=d.then,m.each(b,function(a,f){var g=f[2],h=f[3];d[f[1]]=g.add,h&&g.add(function(){c=h},b[1^a][2].disable,b[2][2].lock),e[f[0]]=function(){return e[f[0]+"With"](this===e?d:this,arguments),this},e[f[0]+"With"]=g.fireWith}),d.promise(e),a&&a.call(e,e),e},when:function(a){var b=0,c=d.call(arguments),e=c.length,f=1!==e||a&&m.isFunction(a.promise)?e:0,g=1===f?a:m.Deferred(),h=function(a,b,c){return function(e){b[a]=this,c[a]=arguments.length>1?d.call(arguments):e,c===i?g.notifyWith(b,c):--f||g.resolveWith(b,c)}},i,j,k;if(e>1)for(i=new Array(e),j=new Array(e),k=new Array(e);e>b;b++)c[b]&&m.isFunction(c[b].promise)?c[b].promise().done(h(b,k,c)).fail(g.reject).progress(h(b,j,i)):--f;return f||g.resolveWith(k,c),g.promise()}});var H;m.fn.ready=function(a){return m.ready.promise().done(a),this},m.extend({isReady:!1,readyWait:1,holdReady:function(a){a?m.readyWait++:m.ready(!0)},ready:function(a){if(a===!0?!--m.readyWait:!m.isReady){if(!y.body)return setTimeout(m.ready);m.isReady=!0,a!==!0&&--m.readyWait>0||(H.resolveWith(y,[m]),m.fn.triggerHandler&&(m(y).triggerHandler("ready"),m(y).off("ready")))}}});function I(){y.addEventListener?(y.removeEventListener("DOMContentLoaded",J,!1),a.removeEventListener("load",J,!1)):(y.detachEvent("onreadystatechange",J),a.detachEvent("onload",J))}function J(){(y.addEventListener||"load"===event.type||"complete"===y.readyState)&&(I(),m.ready())}m.ready.promise=function(b){if(!H)if(H=m.Deferred(),"complete"===y.readyState)setTimeout(m.ready);else if(y.addEventListener)y.addEventListener("DOMContentLoaded",J,!1),a.addEventListener("load",J,!1);else{y.attachEvent("onreadystatechange",J),a.attachEvent("onload",J);var c=!1;try{c=null==a.frameElement&&y.documentElement}catch(d){}c&&c.doScroll&&!function e(){if(!m.isReady){try{c.doScroll("left")}catch(a){return setTimeout(e,50)}I(),m.ready()}}()}return H.promise(b)};var K="undefined",L;for(L in m(k))break;k.ownLast="0"!==L,k.inlineBlockNeedsLayout=!1,m(function(){var a,b,c,d;c=y.getElementsByTagName("body")[0],c&&c.style&&(b=y.createElement("div"),d=y.createElement("div"),d.style.cssText="position:absolute;border:0;width:0;height:0;top:0;left:-9999px",c.appendChild(d).appendChild(b),typeof 
b.style.zoom!==K&&(b.style.cssText="display:inline;margin:0;border:0;padding:1px;width:1px;zoom:1",k.inlineBlockNeedsLayout=a=3===b.offsetWidth,a&&(c.style.zoom=1)),c.removeChild(d))}),function(){var a=y.createElement("div");if(null==k.deleteExpando){k.deleteExpando=!0;try{delete a.test}catch(b){k.deleteExpando=!1}}a=null}(),m.acceptData=function(a){var b=m.noData[(a.nodeName+" ").toLowerCase()],c=+a.nodeType||1;return 1!==c&&9!==c?!1:!b||b!==!0&&a.getAttribute("classid")===b};var M=/^(?:\{[\w\W]*\}|\[[\w\W]*\])$/,N=/([A-Z])/g;function O(a,b,c){if(void 0===c&&1===a.nodeType){var d="data-"+b.replace(N,"-$1").toLowerCase();if(c=a.getAttribute(d),"string"==typeof c){try{c="true"===c?!0:"false"===c?!1:"null"===c?null:+c+""===c?+c:M.test(c)?m.parseJSON(c):c}catch(e){}m.data(a,b,c)}else c=void 0}return c}function P(a){var b;for(b in a)if(("data"!==b||!m.isEmptyObject(a[b]))&&"toJSON"!==b)return!1;return!0}function Q(a,b,d,e){if(m.acceptData(a)){var f,g,h=m.expando,i=a.nodeType,j=i?m.cache:a,k=i?a[h]:a[h]&&h;
+if(k&&j[k]&&(e||j[k].data)||void 0!==d||"string"!=typeof b)return k||(k=i?a[h]=c.pop()||m.guid++:h),j[k]||(j[k]=i?{}:{toJSON:m.noop}),("object"==typeof b||"function"==typeof b)&&(e?j[k]=m.extend(j[k],b):j[k].data=m.extend(j[k].data,b)),g=j[k],e||(g.data||(g.data={}),g=g.data),void 0!==d&&(g[m.camelCase(b)]=d),"string"==typeof b?(f=g[b],null==f&&(f=g[m.camelCase(b)])):f=g,f}}function R(a,b,c){if(m.acceptData(a)){var d,e,f=a.nodeType,g=f?m.cache:a,h=f?a[m.expando]:m.expando;if(g[h]){if(b&&(d=c?g[h]:g[h].data)){m.isArray(b)?b=b.concat(m.map(b,m.camelCase)):b in d?b=[b]:(b=m.camelCase(b),b=b in d?[b]:b.split(" ")),e=b.length;while(e--)delete d[b[e]];if(c?!P(d):!m.isEmptyObject(d))return}(c||(delete g[h].data,P(g[h])))&&(f?m.cleanData([a],!0):k.deleteExpando||g!=g.window?delete g[h]:g[h]=null)}}}m.extend({cache:{},noData:{"applet ":!0,"embed ":!0,"object ":"clsid:D27CDB6E-AE6D-11cf-96B8-444553540000"},hasData:function(a){return a=a.nodeType?m.cache[a[m.expando]]:a[m.expando],!!a&&!P(a)},data:function(a,b,c){return Q(a,b,c)},removeData:function(a,b){return R(a,b)},_data:function(a,b,c){return Q(a,b,c,!0)},_removeData:function(a,b){return R(a,b,!0)}}),m.fn.extend({data:function(a,b){var c,d,e,f=this[0],g=f&&f.attributes;if(void 0===a){if(this.length&&(e=m.data(f),1===f.nodeType&&!m._data(f,"parsedAttrs"))){c=g.length;while(c--)g[c]&&(d=g[c].name,0===d.indexOf("data-")&&(d=m.camelCase(d.slice(5)),O(f,d,e[d])));m._data(f,"parsedAttrs",!0)}return e}return"object"==typeof a?this.each(function(){m.data(this,a)}):arguments.length>1?this.each(function(){m.data(this,a,b)}):f?O(f,a,m.data(f,a)):void 0},removeData:function(a){return this.each(function(){m.removeData(this,a)})}}),m.extend({queue:function(a,b,c){var d;return a?(b=(b||"fx")+"queue",d=m._data(a,b),c&&(!d||m.isArray(c)?d=m._data(a,b,m.makeArray(c)):d.push(c)),d||[]):void 0},dequeue:function(a,b){b=b||"fx";var c=m.queue(a,b),d=c.length,e=c.shift(),f=m._queueHooks(a,b),g=function(){m.dequeue(a,b)};"inprogress"===e&&(e=c.shift(),d--),e&&("fx"===b&&c.unshift("inprogress"),delete f.stop,e.call(a,g,f)),!d&&f&&f.empty.fire()},_queueHooks:function(a,b){var c=b+"queueHooks";return m._data(a,c)||m._data(a,c,{empty:m.Callbacks("once memory").add(function(){m._removeData(a,b+"queue"),m._removeData(a,c)})})}}),m.fn.extend({queue:function(a,b){var c=2;return"string"!=typeof a&&(b=a,a="fx",c--),arguments.length<c?m.queue(this[0],a):void 0===b?this:this.each(function(){var c=m.queue(this,a,b);m._queueHooks(this,a),"fx"===a&&"inprogress"!==c[0]&&m.dequeue(this,a)})},dequeue:function(a){return this.each(function(){m.dequeue(this,a)})},clearQueue:function(a){return this.queue(a||"fx",[])},promise:function(a,b){var c,d=1,e=m.Deferred(),f=this,g=this.length,h=function(){--d||e.resolveWith(f,[f])};"string"!=typeof a&&(b=a,a=void 0),a=a||"fx";while(g--)c=m._data(f[g],a+"queueHooks"),c&&c.empty&&(d++,c.empty.add(h));return h(),e.promise(b)}});var S=/[+-]?(?:\d*\.|)\d+(?:[eE][+-]?\d+|)/.source,T=["Top","Right","Bottom","Left"],U=function(a,b){return a=b||a,"none"===m.css(a,"display")||!m.contains(a.ownerDocument,a)},V=m.access=function(a,b,c,d,e,f,g){var h=0,i=a.length,j=null==c;if("object"===m.type(c)){e=!0;for(h in c)m.access(a,b,h,c[h],!0,f,g)}else if(void 0!==d&&(e=!0,m.isFunction(d)||(g=!0),j&&(g?(b.call(a,d),b=null):(j=b,b=function(a,b,c){return j.call(m(a),c)})),b))for(;i>h;h++)b(a[h],c,g?d:d.call(a[h],h,b(a[h],c)));return e?a:j?b.call(a):i?b(a[0],c):f},W=/^(?:checkbox|radio)$/i;!function(){var 
a=y.createElement("input"),b=y.createElement("div"),c=y.createDocumentFragment();if(b.innerHTML="  <link/><table></table><a href='/a'>a</a><input type='checkbox'/>",k.leadingWhitespace=3===b.firstChild.nodeType,k.tbody=!b.getElementsByTagName("tbody").length,k.htmlSerialize=!!b.getElementsByTagName("link").length,k.html5Clone="<:nav></:nav>"!==y.createElement("nav").cloneNode(!0).outerHTML,a.type="checkbox",a.checked=!0,c.appendChild(a),k.appendChecked=a.checked,b.innerHTML="<textarea>x</textarea>",k.noCloneChecked=!!b.cloneNode(!0).lastChild.defaultValue,c.appendChild(b),b.innerHTML="<input type='radio' checked='checked' name='t'/>",k.checkClone=b.cloneNode(!0).cloneNode(!0).lastChild.checked,k.noCloneEvent=!0,b.attachEvent&&(b.attachEvent("onclick",function(){k.noCloneEvent=!1}),b.cloneNode(!0).click()),null==k.deleteExpando){k.deleteExpando=!0;try{delete b.test}catch(d){k.deleteExpando=!1}}}(),function(){var b,c,d=y.createElement("div");for(b in{submit:!0,change:!0,focusin:!0})c="on"+b,(k[b+"Bubbles"]=c in a)||(d.setAttribute(c,"t"),k[b+"Bubbles"]=d.attributes[c].expando===!1);d=null}();var X=/^(?:input|select|textarea)$/i,Y=/^key/,Z=/^(?:mouse|pointer|contextmenu)|click/,$=/^(?:focusinfocus|focusoutblur)$/,_=/^([^.]*)(?:\.(.+)|)$/;function ab(){return!0}function bb(){return!1}function cb(){try{return y.activeElement}catch(a){}}m.event={global:{},add:function(a,b,c,d,e){var f,g,h,i,j,k,l,n,o,p,q,r=m._data(a);if(r){c.handler&&(i=c,c=i.handler,e=i.selector),c.guid||(c.guid=m.guid++),(g=r.events)||(g=r.events={}),(k=r.handle)||(k=r.handle=function(a){return typeof m===K||a&&m.event.triggered===a.type?void 0:m.event.dispatch.apply(k.elem,arguments)},k.elem=a),b=(b||"").match(E)||[""],h=b.length;while(h--)f=_.exec(b[h])||[],o=q=f[1],p=(f[2]||"").split(".").sort(),o&&(j=m.event.special[o]||{},o=(e?j.delegateType:j.bindType)||o,j=m.event.special[o]||{},l=m.extend({type:o,origType:q,data:d,handler:c,guid:c.guid,selector:e,needsContext:e&&m.expr.match.needsContext.test(e),namespace:p.join(".")},i),(n=g[o])||(n=g[o]=[],n.delegateCount=0,j.setup&&j.setup.call(a,d,p,k)!==!1||(a.addEventListener?a.addEventListener(o,k,!1):a.attachEvent&&a.attachEvent("on"+o,k))),j.add&&(j.add.call(a,l),l.handler.guid||(l.handler.guid=c.guid)),e?n.splice(n.delegateCount++,0,l):n.push(l),m.event.global[o]=!0);a=null}},remove:function(a,b,c,d,e){var f,g,h,i,j,k,l,n,o,p,q,r=m.hasData(a)&&m._data(a);if(r&&(k=r.events)){b=(b||"").match(E)||[""],j=b.length;while(j--)if(h=_.exec(b[j])||[],o=q=h[1],p=(h[2]||"").split(".").sort(),o){l=m.event.special[o]||{},o=(d?l.delegateType:l.bindType)||o,n=k[o]||[],h=h[2]&&new RegExp("(^|\\.)"+p.join("\\.(?:.*\\.|)")+"(\\.|$)"),i=f=n.length;while(f--)g=n[f],!e&&q!==g.origType||c&&c.guid!==g.guid||h&&!h.test(g.namespace)||d&&d!==g.selector&&("**"!==d||!g.selector)||(n.splice(f,1),g.selector&&n.delegateCount--,l.remove&&l.remove.call(a,g));i&&!n.length&&(l.teardown&&l.teardown.call(a,p,r.handle)!==!1||m.removeEvent(a,o,r.handle),delete k[o])}else for(o in k)m.event.remove(a,o+b[j],c,d,!0);m.isEmptyObject(k)&&(delete r.handle,m._removeData(a,"events"))}},trigger:function(b,c,d,e){var f,g,h,i,k,l,n,o=[d||y],p=j.call(b,"type")?b.type:b,q=j.call(b,"namespace")?b.namespace.split("."):[];if(h=l=d=d||y,3!==d.nodeType&&8!==d.nodeType&&!$.test(p+m.event.triggered)&&(p.indexOf(".")>=0&&(q=p.split("."),p=q.shift(),q.sort()),g=p.indexOf(":")<0&&"on"+p,b=b[m.expando]?b:new m.Event(p,"object"==typeof b&&b),b.isTrigger=e?2:3,b.namespace=q.join("."),b.namespace_re=b.namespace?new 
RegExp("(^|\\.)"+q.join("\\.(?:.*\\.|)")+"(\\.|$)"):null,b.result=void 0,b.target||(b.target=d),c=null==c?[b]:m.makeArray(c,[b]),k=m.event.special[p]||{},e||!k.trigger||k.trigger.apply(d,c)!==!1)){if(!e&&!k.noBubble&&!m.isWindow(d)){for(i=k.delegateType||p,$.test(i+p)||(h=h.parentNode);h;h=h.parentNode)o.push(h),l=h;l===(d.ownerDocument||y)&&o.push(l.defaultView||l.parentWindow||a)}n=0;while((h=o[n++])&&!b.isPropagationStopped())b.type=n>1?i:k.bindType||p,f=(m._data(h,"events")||{})[b.type]&&m._data(h,"handle"),f&&f.apply(h,c),f=g&&h[g],f&&f.apply&&m.acceptData(h)&&(b.result=f.apply(h,c),b.result===!1&&b.preventDefault());if(b.type=p,!e&&!b.isDefaultPrevented()&&(!k._default||k._default.apply(o.pop(),c)===!1)&&m.acceptData(d)&&g&&d[p]&&!m.isWindow(d)){l=d[g],l&&(d[g]=null),m.event.triggered=p;try{d[p]()}catch(r){}m.event.triggered=void 0,l&&(d[g]=l)}return b.result}},dispatch:function(a){a=m.event.fix(a);var b,c,e,f,g,h=[],i=d.call(arguments),j=(m._data(this,"events")||{})[a.type]||[],k=m.event.special[a.type]||{};if(i[0]=a,a.delegateTarget=this,!k.preDispatch||k.preDispatch.call(this,a)!==!1){h=m.event.handlers.call(this,a,j),b=0;while((f=h[b++])&&!a.isPropagationStopped()){a.currentTarget=f.elem,g=0;while((e=f.handlers[g++])&&!a.isImmediatePropagationStopped())(!a.namespace_re||a.namespace_re.test(e.namespace))&&(a.handleObj=e,a.data=e.data,c=((m.event.special[e.origType]||{}).handle||e.handler).apply(f.elem,i),void 0!==c&&(a.result=c)===!1&&(a.preventDefault(),a.stopPropagation()))}return k.postDispatch&&k.postDispatch.call(this,a),a.result}},handlers:function(a,b){var c,d,e,f,g=[],h=b.delegateCount,i=a.target;if(h&&i.nodeType&&(!a.button||"click"!==a.type))for(;i!=this;i=i.parentNode||this)if(1===i.nodeType&&(i.disabled!==!0||"click"!==a.type)){for(e=[],f=0;h>f;f++)d=b[f],c=d.selector+" ",void 0===e[c]&&(e[c]=d.needsContext?m(c,this).index(i)>=0:m.find(c,this,null,[i]).length),e[c]&&e.push(d);e.length&&g.push({elem:i,handlers:e})}return h<b.length&&g.push({elem:this,handlers:b.slice(h)}),g},fix:function(a){if(a[m.expando])return a;var b,c,d,e=a.type,f=a,g=this.fixHooks[e];g||(this.fixHooks[e]=g=Z.test(e)?this.mouseHooks:Y.test(e)?this.keyHooks:{}),d=g.props?this.props.concat(g.props):this.props,a=new m.Event(f),b=d.length;while(b--)c=d[b],a[c]=f[c];return a.target||(a.target=f.srcElement||y),3===a.target.nodeType&&(a.target=a.target.parentNode),a.metaKey=!!a.metaKey,g.filter?g.filter(a,f):a},props:"altKey bubbles cancelable ctrlKey currentTarget eventPhase metaKey relatedTarget shiftKey target timeStamp view which".split(" "),fixHooks:{},keyHooks:{props:"char charCode key keyCode".split(" "),filter:function(a,b){return null==a.which&&(a.which=null!=b.charCode?b.charCode:b.keyCode),a}},mouseHooks:{props:"button buttons clientX clientY fromElement offsetX offsetY pageX pageY screenX screenY toElement".split(" "),filter:function(a,b){var c,d,e,f=b.button,g=b.fromElement;return null==a.pageX&&null!=b.clientX&&(d=a.target.ownerDocument||y,e=d.documentElement,c=d.body,a.pageX=b.clientX+(e&&e.scrollLeft||c&&c.scrollLeft||0)-(e&&e.clientLeft||c&&c.clientLeft||0),a.pageY=b.clientY+(e&&e.scrollTop||c&&c.scrollTop||0)-(e&&e.clientTop||c&&c.clientTop||0)),!a.relatedTarget&&g&&(a.relatedTarget=g===a.target?b.toElement:g),a.which||void 0===f||(a.which=1&f?1:2&f?3:4&f?2:0),a}},special:{load:{noBubble:!0},focus:{trigger:function(){if(this!==cb()&&this.focus)try{return this.focus(),!1}catch(a){}},delegateType:"focusin"},blur:{trigger:function(){return this===cb()&&this.blur?(this.blur(),!1):void 
0},delegateType:"focusout"},click:{trigger:function(){return m.nodeName(this,"input")&&"checkbox"===this.type&&this.click?(this.click(),!1):void 0},_default:function(a){return m.nodeName(a.target,"a")}},beforeunload:{postDispatch:function(a){void 0!==a.result&&a.originalEvent&&(a.originalEvent.returnValue=a.result)}}},simulate:function(a,b,c,d){var e=m.extend(new m.Event,c,{type:a,isSimulated:!0,originalEvent:{}});d?m.event.trigger(e,null,b):m.event.dispatch.call(b,e),e.isDefaultPrevented()&&c.preventDefault()}},m.removeEvent=y.removeEventListener?function(a,b,c){a.removeEventListener&&a.removeEventListener(b,c,!1)}:function(a,b,c){var d="on"+b;a.detachEvent&&(typeof a[d]===K&&(a[d]=null),a.detachEvent(d,c))},m.Event=function(a,b){return this instanceof m.Event?(a&&a.type?(this.originalEvent=a,this.type=a.type,this.isDefaultPrevented=a.defaultPrevented||void 0===a.defaultPrevented&&a.returnValue===!1?ab:bb):this.type=a,b&&m.extend(this,b),this.timeStamp=a&&a.timeStamp||m.now(),void(this[m.expando]=!0)):new m.Event(a,b)},m.Event.prototype={isDefaultPrevented:bb,isPropagationStopped:bb,isImmediatePropagationStopped:bb,preventDefault:function(){var a=this.originalEvent;this.isDefaultPrevented=ab,a&&(a.preventDefault?a.preventDefault():a.returnValue=!1)},stopPropagation:function(){var a=this.originalEvent;this.isPropagationStopped=ab,a&&(a.stopPropagation&&a.stopPropagation(),a.cancelBubble=!0)},stopImmediatePropagation:function(){var a=this.originalEvent;this.isImmediatePropagationStopped=ab,a&&a.stopImmediatePropagation&&a.stopImmediatePropagation(),this.stopPropagation()}},m.each({mouseenter:"mouseover",mouseleave:"mouseout",pointerenter:"pointerover",pointerleave:"pointerout"},function(a,b){m.event.special[a]={delegateType:b,bindType:b,handle:function(a){var c,d=this,e=a.relatedTarget,f=a.handleObj;return(!e||e!==d&&!m.contains(d,e))&&(a.type=f.origType,c=f.handler.apply(this,arguments),a.type=b),c}}}),k.submitBubbles||(m.event.special.submit={setup:function(){return m.nodeName(this,"form")?!1:void m.event.add(this,"click._submit keypress._submit",function(a){var b=a.target,c=m.nodeName(b,"input")||m.nodeName(b,"button")?b.form:void 0;c&&!m._data(c,"submitBubbles")&&(m.event.add(c,"submit._submit",function(a){a._submit_bubble=!0}),m._data(c,"submitBubbles",!0))})},postDispatch:function(a){a._submit_bubble&&(delete a._submit_bubble,this.parentNode&&!a.isTrigger&&m.event.simulate("submit",this.parentNode,a,!0))},teardown:function(){return m.nodeName(this,"form")?!1:void m.event.remove(this,"._submit")}}),k.changeBubbles||(m.event.special.change={setup:function(){return X.test(this.nodeName)?(("checkbox"===this.type||"radio"===this.type)&&(m.event.add(this,"propertychange._change",function(a){"checked"===a.originalEvent.propertyName&&(this._just_changed=!0)}),m.event.add(this,"click._change",function(a){this._just_changed&&!a.isTrigger&&(this._just_changed=!1),m.event.simulate("change",this,a,!0)})),!1):void m.event.add(this,"beforeactivate._change",function(a){var b=a.target;X.test(b.nodeName)&&!m._data(b,"changeBubbles")&&(m.event.add(b,"change._change",function(a){!this.parentNode||a.isSimulated||a.isTrigger||m.event.simulate("change",this.parentNode,a,!0)}),m._data(b,"changeBubbles",!0))})},handle:function(a){var b=a.target;return this!==b||a.isSimulated||a.isTrigger||"radio"!==b.type&&"checkbox"!==b.type?a.handleObj.handler.apply(this,arguments):void 0},teardown:function(){return 
m.event.remove(this,"._change"),!X.test(this.nodeName)}}),k.focusinBubbles||m.each({focus:"focusin",blur:"focusout"},function(a,b){var c=function(a){m.event.simulate(b,a.target,m.event.fix(a),!0)};m.event.special[b]={setup:function(){var d=this.ownerDocument||this,e=m._data(d,b);e||d.addEventListener(a,c,!0),m._data(d,b,(e||0)+1)},teardown:function(){var d=this.ownerDocument||this,e=m._data(d,b)-1;e?m._data(d,b,e):(d.removeEventListener(a,c,!0),m._removeData(d,b))}}}),m.fn.extend({on:function(a,b,c,d,e){var f,g;if("object"==typeof a){"string"!=typeof b&&(c=c||b,b=void 0);for(f in a)this.on(f,b,c,a[f],e);return this}if(null==c&&null==d?(d=b,c=b=void 0):null==d&&("string"==typeof b?(d=c,c=void 0):(d=c,c=b,b=void 0)),d===!1)d=bb;else if(!d)return this;return 1===e&&(g=d,d=function(a){return m().off(a),g.apply(this,arguments)},d.guid=g.guid||(g.guid=m.guid++)),this.each(function(){m.event.add(this,a,d,c,b)})},one:function(a,b,c,d){return this.on(a,b,c,d,1)},off:function(a,b,c){var d,e;if(a&&a.preventDefault&&a.handleObj)return d=a.handleObj,m(a.delegateTarget).off(d.namespace?d.origType+"."+d.namespace:d.origType,d.selector,d.handler),this;if("object"==typeof a){for(e in a)this.off(e,b,a[e]);return this}return(b===!1||"function"==typeof b)&&(c=b,b=void 0),c===!1&&(c=bb),this.each(function(){m.event.remove(this,a,c,b)})},trigger:function(a,b){return this.each(function(){m.event.trigger(a,b,this)})},triggerHandler:function(a,b){var c=this[0];return c?m.event.trigger(a,b,c,!0):void 0}});function db(a){var b=eb.split("|"),c=a.createDocumentFragment();if(c.createElement)while(b.length)c.createElement(b.pop());return c}var eb="abbr|article|aside|audio|bdi|canvas|data|datalist|details|figcaption|figure|footer|header|hgroup|mark|meter|nav|output|progress|section|summary|time|video",fb=/ jQuery\d+="(?:null|\d+)"/g,gb=new RegExp("<(?:"+eb+")[\\s/>]","i"),hb=/^\s+/,ib=/<(?!area|br|col|embed|hr|img|input|link|meta|param)(([\w:]+)[^>]*)\/>/gi,jb=/<([\w:]+)/,kb=/<tbody/i,lb=/<|&#?\w+;/,mb=/<(?:script|style|link)/i,nb=/checked\s*(?:[^=]|=\s*.checked.)/i,ob=/^$|\/(?:java|ecma)script/i,pb=/^true\/(.*)/,qb=/^\s*<!(?:\[CDATA\[|--)|(?:\]\]|--)>\s*$/g,rb={option:[1,"<select multiple='multiple'>","</select>"],legend:[1,"<fieldset>","</fieldset>"],area:[1,"<map>","</map>"],param:[1,"<object>","</object>"],thead:[1,"<table>","</table>"],tr:[2,"<table><tbody>","</tbody></table>"],col:[2,"<table><tbody></tbody><colgroup>","</colgroup></table>"],td:[3,"<table><tbody><tr>","</tr></tbody></table>"],_default:k.htmlSerialize?[0,"",""]:[1,"X<div>","</div>"]},sb=db(y),tb=sb.appendChild(y.createElement("div"));rb.optgroup=rb.option,rb.tbody=rb.tfoot=rb.colgroup=rb.caption=rb.thead,rb.th=rb.td;function ub(a,b){var c,d,e=0,f=typeof a.getElementsByTagName!==K?a.getElementsByTagName(b||"*"):typeof a.querySelectorAll!==K?a.querySelectorAll(b||"*"):void 0;if(!f)for(f=[],c=a.childNodes||a;null!=(d=c[e]);e++)!b||m.nodeName(d,b)?f.push(d):m.merge(f,ub(d,b));return void 0===b||b&&m.nodeName(a,b)?m.merge([a],f):f}function vb(a){W.test(a.type)&&(a.defaultChecked=a.checked)}function wb(a,b){return m.nodeName(a,"table")&&m.nodeName(11!==b.nodeType?b:b.firstChild,"tr")?a.getElementsByTagName("tbody")[0]||a.appendChild(a.ownerDocument.createElement("tbody")):a}function xb(a){return a.type=(null!==m.find.attr(a,"type"))+"/"+a.type,a}function yb(a){var b=pb.exec(a.type);return b?a.type=b[1]:a.removeAttribute("type"),a}function zb(a,b){for(var c,d=0;null!=(c=a[d]);d++)m._data(c,"globalEval",!b||m._data(b[d],"globalEval"))}function 
Ab(a,b){if(1===b.nodeType&&m.hasData(a)){var c,d,e,f=m._data(a),g=m._data(b,f),h=f.events;if(h){delete g.handle,g.events={};for(c in h)for(d=0,e=h[c].length;e>d;d++)m.event.add(b,c,h[c][d])}g.data&&(g.data=m.extend({},g.data))}}function Bb(a,b){var c,d,e;if(1===b.nodeType){if(c=b.nodeName.toLowerCase(),!k.noCloneEvent&&b[m.expando]){e=m._data(b);for(d in e.events)m.removeEvent(b,d,e.handle);b.removeAttribute(m.expando)}"script"===c&&b.text!==a.text?(xb(b).text=a.text,yb(b)):"object"===c?(b.parentNode&&(b.outerHTML=a.outerHTML),k.html5Clone&&a.innerHTML&&!m.trim(b.innerHTML)&&(b.innerHTML=a.innerHTML)):"input"===c&&W.test(a.type)?(b.defaultChecked=b.checked=a.checked,b.value!==a.value&&(b.value=a.value)):"option"===c?b.defaultSelected=b.selected=a.defaultSelected:("input"===c||"textarea"===c)&&(b.defaultValue=a.defaultValue)}}m.extend({clone:function(a,b,c){var d,e,f,g,h,i=m.contains(a.ownerDocument,a);if(k.html5Clone||m.isXMLDoc(a)||!gb.test("<"+a.nodeName+">")?f=a.cloneNode(!0):(tb.innerHTML=a.outerHTML,tb.removeChild(f=tb.firstChild)),!(k.noCloneEvent&&k.noCloneChecked||1!==a.nodeType&&11!==a.nodeType||m.isXMLDoc(a)))for(d=ub(f),h=ub(a),g=0;null!=(e=h[g]);++g)d[g]&&Bb(e,d[g]);if(b)if(c)for(h=h||ub(a),d=d||ub(f),g=0;null!=(e=h[g]);g++)Ab(e,d[g]);else Ab(a,f);return d=ub(f,"script"),d.length>0&&zb(d,!i&&ub(a,"script")),d=h=e=null,f},buildFragment:function(a,b,c,d){for(var e,f,g,h,i,j,l,n=a.length,o=db(b),p=[],q=0;n>q;q++)if(f=a[q],f||0===f)if("object"===m.type(f))m.merge(p,f.nodeType?[f]:f);else if(lb.test(f)){h=h||o.appendChild(b.createElement("div")),i=(jb.exec(f)||["",""])[1].toLowerCase(),l=rb[i]||rb._default,h.innerHTML=l[1]+f.replace(ib,"<$1></$2>")+l[2],e=l[0];while(e--)h=h.lastChild;if(!k.leadingWhitespace&&hb.test(f)&&p.push(b.createTextNode(hb.exec(f)[0])),!k.tbody){f="table"!==i||kb.test(f)?"<table>"!==l[1]||kb.test(f)?0:h:h.firstChild,e=f&&f.childNodes.length;while(e--)m.nodeName(j=f.childNodes[e],"tbody")&&!j.childNodes.length&&f.removeChild(j)}m.merge(p,h.childNodes),h.textContent="";while(h.firstChild)h.removeChild(h.firstChild);h=o.lastChild}else p.push(b.createTextNode(f));h&&o.removeChild(h),k.appendChecked||m.grep(ub(p,"input"),vb),q=0;while(f=p[q++])if((!d||-1===m.inArray(f,d))&&(g=m.contains(f.ownerDocument,f),h=ub(o.appendChild(f),"script"),g&&zb(h),c)){e=0;while(f=h[e++])ob.test(f.type||"")&&c.push(f)}return h=null,o},cleanData:function(a,b){for(var d,e,f,g,h=0,i=m.expando,j=m.cache,l=k.deleteExpando,n=m.event.special;null!=(d=a[h]);h++)if((b||m.acceptData(d))&&(f=d[i],g=f&&j[f])){if(g.events)for(e in g.events)n[e]?m.event.remove(d,e):m.removeEvent(d,e,g.handle);j[f]&&(delete j[f],l?delete d[i]:typeof d.removeAttribute!==K?d.removeAttribute(i):d[i]=null,c.push(f))}}}),m.fn.extend({text:function(a){return V(this,function(a){return void 0===a?m.text(this):this.empty().append((this[0]&&this[0].ownerDocument||y).createTextNode(a))},null,a,arguments.length)},append:function(){return this.domManip(arguments,function(a){if(1===this.nodeType||11===this.nodeType||9===this.nodeType){var b=wb(this,a);b.appendChild(a)}})},prepend:function(){return this.domManip(arguments,function(a){if(1===this.nodeType||11===this.nodeType||9===this.nodeType){var b=wb(this,a);b.insertBefore(a,b.firstChild)}})},before:function(){return this.domManip(arguments,function(a){this.parentNode&&this.parentNode.insertBefore(a,this)})},after:function(){return this.domManip(arguments,function(a){this.parentNode&&this.parentNode.insertBefore(a,this.nextSibling)})},remove:function(a,b){for(var 
c,d=a?m.filter(a,this):this,e=0;null!=(c=d[e]);e++)b||1!==c.nodeType||m.cleanData(ub(c)),c.parentNode&&(b&&m.contains(c.ownerDocument,c)&&zb(ub(c,"script")),c.parentNode.removeChild(c));return this},empty:function(){for(var a,b=0;null!=(a=this[b]);b++){1===a.nodeType&&m.cleanData(ub(a,!1));while(a.firstChild)a.removeChild(a.firstChild);a.options&&m.nodeName(a,"select")&&(a.options.length=0)}return this},clone:function(a,b){return a=null==a?!1:a,b=null==b?a:b,this.map(function(){return m.clone(this,a,b)})},html:function(a){return V(this,function(a){var b=this[0]||{},c=0,d=this.length;if(void 0===a)return 1===b.nodeType?b.innerHTML.replace(fb,""):void 0;if(!("string"!=typeof a||mb.test(a)||!k.htmlSerialize&&gb.test(a)||!k.leadingWhitespace&&hb.test(a)||rb[(jb.exec(a)||["",""])[1].toLowerCase()])){a=a.replace(ib,"<$1></$2>");try{for(;d>c;c++)b=this[c]||{},1===b.nodeType&&(m.cleanData(ub(b,!1)),b.innerHTML=a);b=0}catch(e){}}b&&this.empty().append(a)},null,a,arguments.length)},replaceWith:function(){var a=arguments[0];return this.domManip(arguments,function(b){a=this.parentNode,m.cleanData(ub(this)),a&&a.replaceChild(b,this)}),a&&(a.length||a.nodeType)?this:this.remove()},detach:function(a){return this.remove(a,!0)},domManip:function(a,b){a=e.apply([],a);var c,d,f,g,h,i,j=0,l=this.length,n=this,o=l-1,p=a[0],q=m.isFunction(p);if(q||l>1&&"string"==typeof p&&!k.checkClone&&nb.test(p))return this.each(function(c){var d=n.eq(c);q&&(a[0]=p.call(this,c,d.html())),d.domManip(a,b)});if(l&&(i=m.buildFragment(a,this[0].ownerDocument,!1,this),c=i.firstChild,1===i.childNodes.length&&(i=c),c)){for(g=m.map(ub(i,"script"),xb),f=g.length;l>j;j++)d=i,j!==o&&(d=m.clone(d,!0,!0),f&&m.merge(g,ub(d,"script"))),b.call(this[j],d,j);if(f)for(h=g[g.length-1].ownerDocument,m.map(g,yb),j=0;f>j;j++)d=g[j],ob.test(d.type||"")&&!m._data(d,"globalEval")&&m.contains(h,d)&&(d.src?m._evalUrl&&m._evalUrl(d.src):m.globalEval((d.text||d.textContent||d.innerHTML||"").replace(qb,"")));i=c=null}return this}}),m.each({appendTo:"append",prependTo:"prepend",insertBefore:"before",insertAfter:"after",replaceAll:"replaceWith"},function(a,b){m.fn[a]=function(a){for(var c,d=0,e=[],g=m(a),h=g.length-1;h>=d;d++)c=d===h?this:this.clone(!0),m(g[d])[b](c),f.apply(e,c.get());return this.pushStack(e)}});var Cb,Db={};function Eb(b,c){var d,e=m(c.createElement(b)).appendTo(c.body),f=a.getDefaultComputedStyle&&(d=a.getDefaultComputedStyle(e[0]))?d.display:m.css(e[0],"display");return e.detach(),f}function Fb(a){var b=y,c=Db[a];return c||(c=Eb(a,b),"none"!==c&&c||(Cb=(Cb||m("<iframe frameborder='0' width='0' height='0'/>")).appendTo(b.documentElement),b=(Cb[0].contentWindow||Cb[0].contentDocument).document,b.write(),b.close(),c=Eb(a,b),Cb.detach()),Db[a]=c),c}!function(){var a;k.shrinkWrapBlocks=function(){if(null!=a)return a;a=!1;var b,c,d;return c=y.getElementsByTagName("body")[0],c&&c.style?(b=y.createElement("div"),d=y.createElement("div"),d.style.cssText="position:absolute;border:0;width:0;height:0;top:0;left:-9999px",c.appendChild(d).appendChild(b),typeof b.style.zoom!==K&&(b.style.cssText="-webkit-box-sizing:content-box;-moz-box-sizing:content-box;box-sizing:content-box;display:block;margin:0;border:0;padding:1px;width:1px;zoom:1",b.appendChild(y.createElement("div")).style.width="5px",a=3!==b.offsetWidth),c.removeChild(d),a):void 0}}();var Gb=/^margin/,Hb=new RegExp("^("+S+")(?!px)[a-z%]+$","i"),Ib,Jb,Kb=/^(top|right|bottom|left)$/;a.getComputedStyle?(Ib=function(a){return 
a.ownerDocument.defaultView.getComputedStyle(a,null)},Jb=function(a,b,c){var d,e,f,g,h=a.style;return c=c||Ib(a),g=c?c.getPropertyValue(b)||c[b]:void 0,c&&(""!==g||m.contains(a.ownerDocument,a)||(g=m.style(a,b)),Hb.test(g)&&Gb.test(b)&&(d=h.width,e=h.minWidth,f=h.maxWidth,h.minWidth=h.maxWidth=h.width=g,g=c.width,h.width=d,h.minWidth=e,h.maxWidth=f)),void 0===g?g:g+""}):y.documentElement.currentStyle&&(Ib=function(a){return a.currentStyle},Jb=function(a,b,c){var d,e,f,g,h=a.style;return c=c||Ib(a),g=c?c[b]:void 0,null==g&&h&&h[b]&&(g=h[b]),Hb.test(g)&&!Kb.test(b)&&(d=h.left,e=a.runtimeStyle,f=e&&e.left,f&&(e.left=a.currentStyle.left),h.left="fontSize"===b?"1em":g,g=h.pixelLeft+"px",h.left=d,f&&(e.left=f)),void 0===g?g:g+""||"auto"});function Lb(a,b){return{get:function(){var c=a();if(null!=c)return c?void delete this.get:(this.get=b).apply(this,arguments)}}}!function(){var b,c,d,e,f,g,h;if(b=y.createElement("div"),b.innerHTML="  <link/><table></table><a href='/a'>a</a><input type='checkbox'/>",d=b.getElementsByTagName("a")[0],c=d&&d.style){c.cssText="float:left;opacity:.5",k.opacity="0.5"===c.opacity,k.cssFloat=!!c.cssFloat,b.style.backgroundClip="content-box",b.cloneNode(!0).style.backgroundClip="",k.clearCloneStyle="content-box"===b.style.backgroundClip,k.boxSizing=""===c.boxSizing||""===c.MozBoxSizing||""===c.WebkitBoxSizing,m.extend(k,{reliableHiddenOffsets:function(){return null==g&&i(),g},boxSizingReliable:function(){return null==f&&i(),f},pixelPosition:function(){return null==e&&i(),e},reliableMarginRight:function(){return null==h&&i(),h}});function i(){var b,c,d,i;c=y.getElementsByTagName("body")[0],c&&c.style&&(b=y.createElement("div"),d=y.createElement("div"),d.style.cssText="position:absolute;border:0;width:0;height:0;top:0;left:-9999px",c.appendChild(d).appendChild(b),b.style.cssText="-webkit-box-sizing:border-box;-moz-box-sizing:border-box;box-sizing:border-box;display:block;margin-top:1%;top:1%;border:1px;padding:1px;width:4px;position:absolute",e=f=!1,h=!0,a.getComputedStyle&&(e="1%"!==(a.getComputedStyle(b,null)||{}).top,f="4px"===(a.getComputedStyle(b,null)||{width:"4px"}).width,i=b.appendChild(y.createElement("div")),i.style.cssText=b.style.cssText="-webkit-box-sizing:content-box;-moz-box-sizing:content-box;box-sizing:content-box;display:block;margin:0;border:0;padding:0",i.style.marginRight=i.style.width="0",b.style.width="1px",h=!parseFloat((a.getComputedStyle(i,null)||{}).marginRight)),b.innerHTML="<table><tr><td></td><td>t</td></tr></table>",i=b.getElementsByTagName("td"),i[0].style.cssText="margin:0;border:0;padding:0;display:none",g=0===i[0].offsetHeight,g&&(i[0].style.display="",i[1].style.display="none",g=0===i[0].offsetHeight),c.removeChild(d))}}}(),m.swap=function(a,b,c,d){var e,f,g={};for(f in b)g[f]=a.style[f],a.style[f]=b[f];e=c.apply(a,d||[]);for(f in b)a.style[f]=g[f];return e};var Mb=/alpha\([^)]*\)/i,Nb=/opacity\s*=\s*([^)]*)/,Ob=/^(none|table(?!-c[ea]).+)/,Pb=new RegExp("^("+S+")(.*)$","i"),Qb=new RegExp("^([+-])=("+S+")","i"),Rb={position:"absolute",visibility:"hidden",display:"block"},Sb={letterSpacing:"0",fontWeight:"400"},Tb=["Webkit","O","Moz","ms"];function Ub(a,b){if(b in a)return b;var c=b.charAt(0).toUpperCase()+b.slice(1),d=b,e=Tb.length;while(e--)if(b=Tb[e]+c,b in a)return b;return d}function Vb(a,b){for(var 
c,d,e,f=[],g=0,h=a.length;h>g;g++)d=a[g],d.style&&(f[g]=m._data(d,"olddisplay"),c=d.style.display,b?(f[g]||"none"!==c||(d.style.display=""),""===d.style.display&&U(d)&&(f[g]=m._data(d,"olddisplay",Fb(d.nodeName)))):(e=U(d),(c&&"none"!==c||!e)&&m._data(d,"olddisplay",e?c:m.css(d,"display"))));for(g=0;h>g;g++)d=a[g],d.style&&(b&&"none"!==d.style.display&&""!==d.style.display||(d.style.display=b?f[g]||"":"none"));return a}function Wb(a,b,c){var d=Pb.exec(b);return d?Math.max(0,d[1]-(c||0))+(d[2]||"px"):b}function Xb(a,b,c,d,e){for(var f=c===(d?"border":"content")?4:"width"===b?1:0,g=0;4>f;f+=2)"margin"===c&&(g+=m.css(a,c+T[f],!0,e)),d?("content"===c&&(g-=m.css(a,"padding"+T[f],!0,e)),"margin"!==c&&(g-=m.css(a,"border"+T[f]+"Width",!0,e))):(g+=m.css(a,"padding"+T[f],!0,e),"padding"!==c&&(g+=m.css(a,"border"+T[f]+"Width",!0,e)));return g}function Yb(a,b,c){var d=!0,e="width"===b?a.offsetWidth:a.offsetHeight,f=Ib(a),g=k.boxSizing&&"border-box"===m.css(a,"boxSizing",!1,f);if(0>=e||null==e){if(e=Jb(a,b,f),(0>e||null==e)&&(e=a.style[b]),Hb.test(e))return e;d=g&&(k.boxSizingReliable()||e===a.style[b]),e=parseFloat(e)||0}return e+Xb(a,b,c||(g?"border":"content"),d,f)+"px"}m.extend({cssHooks:{opacity:{get:function(a,b){if(b){var c=Jb(a,"opacity");return""===c?"1":c}}}},cssNumber:{columnCount:!0,fillOpacity:!0,flexGrow:!0,flexShrink:!0,fontWeight:!0,lineHeight:!0,opacity:!0,order:!0,orphans:!0,widows:!0,zIndex:!0,zoom:!0},cssProps:{"float":k.cssFloat?"cssFloat":"styleFloat"},style:function(a,b,c,d){if(a&&3!==a.nodeType&&8!==a.nodeType&&a.style){var e,f,g,h=m.camelCase(b),i=a.style;if(b=m.cssProps[h]||(m.cssProps[h]=Ub(i,h)),g=m.cssHooks[b]||m.cssHooks[h],void 0===c)return g&&"get"in g&&void 0!==(e=g.get(a,!1,d))?e:i[b];if(f=typeof c,"string"===f&&(e=Qb.exec(c))&&(c=(e[1]+1)*e[2]+parseFloat(m.css(a,b)),f="number"),null!=c&&c===c&&("number"!==f||m.cssNumber[h]||(c+="px"),k.clearCloneStyle||""!==c||0!==b.indexOf("background")||(i[b]="inherit"),!(g&&"set"in g&&void 0===(c=g.set(a,c,d)))))try{i[b]=c}catch(j){}}},css:function(a,b,c,d){var e,f,g,h=m.camelCase(b);return b=m.cssProps[h]||(m.cssProps[h]=Ub(a.style,h)),g=m.cssHooks[b]||m.cssHooks[h],g&&"get"in g&&(f=g.get(a,!0,c)),void 0===f&&(f=Jb(a,b,d)),"normal"===f&&b in Sb&&(f=Sb[b]),""===c||c?(e=parseFloat(f),c===!0||m.isNumeric(e)?e||0:f):f}}),m.each(["height","width"],function(a,b){m.cssHooks[b]={get:function(a,c,d){return c?Ob.test(m.css(a,"display"))&&0===a.offsetWidth?m.swap(a,Rb,function(){return Yb(a,b,d)}):Yb(a,b,d):void 0},set:function(a,c,d){var e=d&&Ib(a);return Wb(a,c,d?Xb(a,b,d,k.boxSizing&&"border-box"===m.css(a,"boxSizing",!1,e),e):0)}}}),k.opacity||(m.cssHooks.opacity={get:function(a,b){return Nb.test((b&&a.currentStyle?a.currentStyle.filter:a.style.filter)||"")?.01*parseFloat(RegExp.$1)+"":b?"1":""},set:function(a,b){var c=a.style,d=a.currentStyle,e=m.isNumeric(b)?"alpha(opacity="+100*b+")":"",f=d&&d.filter||c.filter||"";c.zoom=1,(b>=1||""===b)&&""===m.trim(f.replace(Mb,""))&&c.removeAttribute&&(c.removeAttribute("filter"),""===b||d&&!d.filter)||(c.filter=Mb.test(f)?f.replace(Mb,e):f+" "+e)}}),m.cssHooks.marginRight=Lb(k.reliableMarginRight,function(a,b){return b?m.swap(a,{display:"inline-block"},Jb,[a,"marginRight"]):void 0}),m.each({margin:"",padding:"",border:"Width"},function(a,b){m.cssHooks[a+b]={expand:function(c){for(var d=0,e={},f="string"==typeof c?c.split(" "):[c];4>d;d++)e[a+T[d]+b]=f[d]||f[d-2]||f[0];return e}},Gb.test(a)||(m.cssHooks[a+b].set=Wb)}),m.fn.extend({css:function(a,b){return V(this,function(a,b,c){var 
d,e,f={},g=0;if(m.isArray(b)){for(d=Ib(a),e=b.length;e>g;g++)f[b[g]]=m.css(a,b[g],!1,d);return f}return void 0!==c?m.style(a,b,c):m.css(a,b)},a,b,arguments.length>1)},show:function(){return Vb(this,!0)},hide:function(){return Vb(this)},toggle:function(a){return"boolean"==typeof a?a?this.show():this.hide():this.each(function(){U(this)?m(this).show():m(this).hide()})}});function Zb(a,b,c,d,e){return new Zb.prototype.init(a,b,c,d,e)}m.Tween=Zb,Zb.prototype={constructor:Zb,init:function(a,b,c,d,e,f){this.elem=a,this.prop=c,this.easing=e||"swing",this.options=b,this.start=this.now=this.cur(),this.end=d,this.unit=f||(m.cssNumber[c]?"":"px")
+},cur:function(){var a=Zb.propHooks[this.prop];return a&&a.get?a.get(this):Zb.propHooks._default.get(this)},run:function(a){var b,c=Zb.propHooks[this.prop];return this.pos=b=this.options.duration?m.easing[this.easing](a,this.options.duration*a,0,1,this.options.duration):a,this.now=(this.end-this.start)*b+this.start,this.options.step&&this.options.step.call(this.elem,this.now,this),c&&c.set?c.set(this):Zb.propHooks._default.set(this),this}},Zb.prototype.init.prototype=Zb.prototype,Zb.propHooks={_default:{get:function(a){var b;return null==a.elem[a.prop]||a.elem.style&&null!=a.elem.style[a.prop]?(b=m.css(a.elem,a.prop,""),b&&"auto"!==b?b:0):a.elem[a.prop]},set:function(a){m.fx.step[a.prop]?m.fx.step[a.prop](a):a.elem.style&&(null!=a.elem.style[m.cssProps[a.prop]]||m.cssHooks[a.prop])?m.style(a.elem,a.prop,a.now+a.unit):a.elem[a.prop]=a.now}}},Zb.propHooks.scrollTop=Zb.propHooks.scrollLeft={set:function(a){a.elem.nodeType&&a.elem.parentNode&&(a.elem[a.prop]=a.now)}},m.easing={linear:function(a){return a},swing:function(a){return.5-Math.cos(a*Math.PI)/2}},m.fx=Zb.prototype.init,m.fx.step={};var $b,_b,ac=/^(?:toggle|show|hide)$/,bc=new RegExp("^(?:([+-])=|)("+S+")([a-z%]*)$","i"),cc=/queueHooks$/,dc=[ic],ec={"*":[function(a,b){var c=this.createTween(a,b),d=c.cur(),e=bc.exec(b),f=e&&e[3]||(m.cssNumber[a]?"":"px"),g=(m.cssNumber[a]||"px"!==f&&+d)&&bc.exec(m.css(c.elem,a)),h=1,i=20;if(g&&g[3]!==f){f=f||g[3],e=e||[],g=+d||1;do h=h||".5",g/=h,m.style(c.elem,a,g+f);while(h!==(h=c.cur()/d)&&1!==h&&--i)}return e&&(g=c.start=+g||+d||0,c.unit=f,c.end=e[1]?g+(e[1]+1)*e[2]:+e[2]),c}]};function fc(){return setTimeout(function(){$b=void 0}),$b=m.now()}function gc(a,b){var c,d={height:a},e=0;for(b=b?1:0;4>e;e+=2-b)c=T[e],d["margin"+c]=d["padding"+c]=a;return b&&(d.opacity=d.width=a),d}function hc(a,b,c){for(var d,e=(ec[b]||[]).concat(ec["*"]),f=0,g=e.length;g>f;f++)if(d=e[f].call(c,b,a))return d}function ic(a,b,c){var d,e,f,g,h,i,j,l,n=this,o={},p=a.style,q=a.nodeType&&U(a),r=m._data(a,"fxshow");c.queue||(h=m._queueHooks(a,"fx"),null==h.unqueued&&(h.unqueued=0,i=h.empty.fire,h.empty.fire=function(){h.unqueued||i()}),h.unqueued++,n.always(function(){n.always(function(){h.unqueued--,m.queue(a,"fx").length||h.empty.fire()})})),1===a.nodeType&&("height"in b||"width"in b)&&(c.overflow=[p.overflow,p.overflowX,p.overflowY],j=m.css(a,"display"),l="none"===j?m._data(a,"olddisplay")||Fb(a.nodeName):j,"inline"===l&&"none"===m.css(a,"float")&&(k.inlineBlockNeedsLayout&&"inline"!==Fb(a.nodeName)?p.zoom=1:p.display="inline-block")),c.overflow&&(p.overflow="hidden",k.shrinkWrapBlocks()||n.always(function(){p.overflow=c.overflow[0],p.overflowX=c.overflow[1],p.overflowY=c.overflow[2]}));for(d in b)if(e=b[d],ac.exec(e)){if(delete b[d],f=f||"toggle"===e,e===(q?"hide":"show")){if("show"!==e||!r||void 0===r[d])continue;q=!0}o[d]=r&&r[d]||m.style(a,d)}else j=void 0;if(m.isEmptyObject(o))"inline"===("none"===j?Fb(a.nodeName):j)&&(p.display=j);else{r?"hidden"in r&&(q=r.hidden):r=m._data(a,"fxshow",{}),f&&(r.hidden=!q),q?m(a).show():n.done(function(){m(a).hide()}),n.done(function(){var b;m._removeData(a,"fxshow");for(b in o)m.style(a,b,o[b])});for(d in o)g=hc(q?r[d]:0,d,n),d in r||(r[d]=g.start,q&&(g.end=g.start,g.start="width"===d||"height"===d?1:0))}}function jc(a,b){var c,d,e,f,g;for(c in a)if(d=m.camelCase(c),e=b[d],f=a[c],m.isArray(f)&&(e=f[1],f=a[c]=f[0]),c!==d&&(a[d]=f,delete a[c]),g=m.cssHooks[d],g&&"expand"in g){f=g.expand(f),delete a[d];for(c in f)c in a||(a[c]=f[c],b[c]=e)}else b[d]=e}function kc(a,b,c){var 
d,e,f=0,g=dc.length,h=m.Deferred().always(function(){delete i.elem}),i=function(){if(e)return!1;for(var b=$b||fc(),c=Math.max(0,j.startTime+j.duration-b),d=c/j.duration||0,f=1-d,g=0,i=j.tweens.length;i>g;g++)j.tweens[g].run(f);return h.notifyWith(a,[j,f,c]),1>f&&i?c:(h.resolveWith(a,[j]),!1)},j=h.promise({elem:a,props:m.extend({},b),opts:m.extend(!0,{specialEasing:{}},c),originalProperties:b,originalOptions:c,startTime:$b||fc(),duration:c.duration,tweens:[],createTween:function(b,c){var d=m.Tween(a,j.opts,b,c,j.opts.specialEasing[b]||j.opts.easing);return j.tweens.push(d),d},stop:function(b){var c=0,d=b?j.tweens.length:0;if(e)return this;for(e=!0;d>c;c++)j.tweens[c].run(1);return b?h.resolveWith(a,[j,b]):h.rejectWith(a,[j,b]),this}}),k=j.props;for(jc(k,j.opts.specialEasing);g>f;f++)if(d=dc[f].call(j,a,k,j.opts))return d;return m.map(k,hc,j),m.isFunction(j.opts.start)&&j.opts.start.call(a,j),m.fx.timer(m.extend(i,{elem:a,anim:j,queue:j.opts.queue})),j.progress(j.opts.progress).done(j.opts.done,j.opts.complete).fail(j.opts.fail).always(j.opts.always)}m.Animation=m.extend(kc,{tweener:function(a,b){m.isFunction(a)?(b=a,a=["*"]):a=a.split(" ");for(var c,d=0,e=a.length;e>d;d++)c=a[d],ec[c]=ec[c]||[],ec[c].unshift(b)},prefilter:function(a,b){b?dc.unshift(a):dc.push(a)}}),m.speed=function(a,b,c){var d=a&&"object"==typeof a?m.extend({},a):{complete:c||!c&&b||m.isFunction(a)&&a,duration:a,easing:c&&b||b&&!m.isFunction(b)&&b};return d.duration=m.fx.off?0:"number"==typeof d.duration?d.duration:d.duration in m.fx.speeds?m.fx.speeds[d.duration]:m.fx.speeds._default,(null==d.queue||d.queue===!0)&&(d.queue="fx"),d.old=d.complete,d.complete=function(){m.isFunction(d.old)&&d.old.call(this),d.queue&&m.dequeue(this,d.queue)},d},m.fn.extend({fadeTo:function(a,b,c,d){return this.filter(U).css("opacity",0).show().end().animate({opacity:b},a,c,d)},animate:function(a,b,c,d){var e=m.isEmptyObject(a),f=m.speed(b,c,d),g=function(){var b=kc(this,m.extend({},a),f);(e||m._data(this,"finish"))&&b.stop(!0)};return g.finish=g,e||f.queue===!1?this.each(g):this.queue(f.queue,g)},stop:function(a,b,c){var d=function(a){var b=a.stop;delete a.stop,b(c)};return"string"!=typeof a&&(c=b,b=a,a=void 0),b&&a!==!1&&this.queue(a||"fx",[]),this.each(function(){var b=!0,e=null!=a&&a+"queueHooks",f=m.timers,g=m._data(this);if(e)g[e]&&g[e].stop&&d(g[e]);else for(e in g)g[e]&&g[e].stop&&cc.test(e)&&d(g[e]);for(e=f.length;e--;)f[e].elem!==this||null!=a&&f[e].queue!==a||(f[e].anim.stop(c),b=!1,f.splice(e,1));(b||!c)&&m.dequeue(this,a)})},finish:function(a){return a!==!1&&(a=a||"fx"),this.each(function(){var b,c=m._data(this),d=c[a+"queue"],e=c[a+"queueHooks"],f=m.timers,g=d?d.length:0;for(c.finish=!0,m.queue(this,a,[]),e&&e.stop&&e.stop.call(this,!0),b=f.length;b--;)f[b].elem===this&&f[b].queue===a&&(f[b].anim.stop(!0),f.splice(b,1));for(b=0;g>b;b++)d[b]&&d[b].finish&&d[b].finish.call(this);delete c.finish})}}),m.each(["toggle","show","hide"],function(a,b){var c=m.fn[b];m.fn[b]=function(a,d,e){return null==a||"boolean"==typeof a?c.apply(this,arguments):this.animate(gc(b,!0),a,d,e)}}),m.each({slideDown:gc("show"),slideUp:gc("hide"),slideToggle:gc("toggle"),fadeIn:{opacity:"show"},fadeOut:{opacity:"hide"},fadeToggle:{opacity:"toggle"}},function(a,b){m.fn[a]=function(a,c,d){return this.animate(b,a,c,d)}}),m.timers=[],m.fx.tick=function(){var a,b=m.timers,c=0;for($b=m.now();c<b.length;c++)a=b[c],a()||b[c]!==a||b.splice(c--,1);b.length||m.fx.stop(),$b=void 
0},m.fx.timer=function(a){m.timers.push(a),a()?m.fx.start():m.timers.pop()},m.fx.interval=13,m.fx.start=function(){_b||(_b=setInterval(m.fx.tick,m.fx.interval))},m.fx.stop=function(){clearInterval(_b),_b=null},m.fx.speeds={slow:600,fast:200,_default:400},m.fn.delay=function(a,b){return a=m.fx?m.fx.speeds[a]||a:a,b=b||"fx",this.queue(b,function(b,c){var d=setTimeout(b,a);c.stop=function(){clearTimeout(d)}})},function(){var a,b,c,d,e;b=y.createElement("div"),b.setAttribute("className","t"),b.innerHTML="  <link/><table></table><a href='/a'>a</a><input type='checkbox'/>",d=b.getElementsByTagName("a")[0],c=y.createElement("select"),e=c.appendChild(y.createElement("option")),a=b.getElementsByTagName("input")[0],d.style.cssText="top:1px",k.getSetAttribute="t"!==b.className,k.style=/top/.test(d.getAttribute("style")),k.hrefNormalized="/a"===d.getAttribute("href"),k.checkOn=!!a.value,k.optSelected=e.selected,k.enctype=!!y.createElement("form").enctype,c.disabled=!0,k.optDisabled=!e.disabled,a=y.createElement("input"),a.setAttribute("value",""),k.input=""===a.getAttribute("value"),a.value="t",a.setAttribute("type","radio"),k.radioValue="t"===a.value}();var lc=/\r/g;m.fn.extend({val:function(a){var b,c,d,e=this[0];{if(arguments.length)return d=m.isFunction(a),this.each(function(c){var e;1===this.nodeType&&(e=d?a.call(this,c,m(this).val()):a,null==e?e="":"number"==typeof e?e+="":m.isArray(e)&&(e=m.map(e,function(a){return null==a?"":a+""})),b=m.valHooks[this.type]||m.valHooks[this.nodeName.toLowerCase()],b&&"set"in b&&void 0!==b.set(this,e,"value")||(this.value=e))});if(e)return b=m.valHooks[e.type]||m.valHooks[e.nodeName.toLowerCase()],b&&"get"in b&&void 0!==(c=b.get(e,"value"))?c:(c=e.value,"string"==typeof c?c.replace(lc,""):null==c?"":c)}}}),m.extend({valHooks:{option:{get:function(a){var b=m.find.attr(a,"value");return null!=b?b:m.trim(m.text(a))}},select:{get:function(a){for(var b,c,d=a.options,e=a.selectedIndex,f="select-one"===a.type||0>e,g=f?null:[],h=f?e+1:d.length,i=0>e?h:f?e:0;h>i;i++)if(c=d[i],!(!c.selected&&i!==e||(k.optDisabled?c.disabled:null!==c.getAttribute("disabled"))||c.parentNode.disabled&&m.nodeName(c.parentNode,"optgroup"))){if(b=m(c).val(),f)return b;g.push(b)}return g},set:function(a,b){var c,d,e=a.options,f=m.makeArray(b),g=e.length;while(g--)if(d=e[g],m.inArray(m.valHooks.option.get(d),f)>=0)try{d.selected=c=!0}catch(h){d.scrollHeight}else d.selected=!1;return c||(a.selectedIndex=-1),e}}}}),m.each(["radio","checkbox"],function(){m.valHooks[this]={set:function(a,b){return m.isArray(b)?a.checked=m.inArray(m(a).val(),b)>=0:void 0}},k.checkOn||(m.valHooks[this].get=function(a){return null===a.getAttribute("value")?"on":a.value})});var mc,nc,oc=m.expr.attrHandle,pc=/^(?:checked|selected)$/i,qc=k.getSetAttribute,rc=k.input;m.fn.extend({attr:function(a,b){return V(this,m.attr,a,b,arguments.length>1)},removeAttr:function(a){return this.each(function(){m.removeAttr(this,a)})}}),m.extend({attr:function(a,b,c){var d,e,f=a.nodeType;if(a&&3!==f&&8!==f&&2!==f)return typeof a.getAttribute===K?m.prop(a,b,c):(1===f&&m.isXMLDoc(a)||(b=b.toLowerCase(),d=m.attrHooks[b]||(m.expr.match.bool.test(b)?nc:mc)),void 0===c?d&&"get"in d&&null!==(e=d.get(a,b))?e:(e=m.find.attr(a,b),null==e?void 0:e):null!==c?d&&"set"in d&&void 0!==(e=d.set(a,c,b))?e:(a.setAttribute(b,c+""),c):void m.removeAttr(a,b))},removeAttr:function(a,b){var 
c,d,e=0,f=b&&b.match(E);if(f&&1===a.nodeType)while(c=f[e++])d=m.propFix[c]||c,m.expr.match.bool.test(c)?rc&&qc||!pc.test(c)?a[d]=!1:a[m.camelCase("default-"+c)]=a[d]=!1:m.attr(a,c,""),a.removeAttribute(qc?c:d)},attrHooks:{type:{set:function(a,b){if(!k.radioValue&&"radio"===b&&m.nodeName(a,"input")){var c=a.value;return a.setAttribute("type",b),c&&(a.value=c),b}}}}}),nc={set:function(a,b,c){return b===!1?m.removeAttr(a,c):rc&&qc||!pc.test(c)?a.setAttribute(!qc&&m.propFix[c]||c,c):a[m.camelCase("default-"+c)]=a[c]=!0,c}},m.each(m.expr.match.bool.source.match(/\w+/g),function(a,b){var c=oc[b]||m.find.attr;oc[b]=rc&&qc||!pc.test(b)?function(a,b,d){var e,f;return d||(f=oc[b],oc[b]=e,e=null!=c(a,b,d)?b.toLowerCase():null,oc[b]=f),e}:function(a,b,c){return c?void 0:a[m.camelCase("default-"+b)]?b.toLowerCase():null}}),rc&&qc||(m.attrHooks.value={set:function(a,b,c){return m.nodeName(a,"input")?void(a.defaultValue=b):mc&&mc.set(a,b,c)}}),qc||(mc={set:function(a,b,c){var d=a.getAttributeNode(c);return d||a.setAttributeNode(d=a.ownerDocument.createAttribute(c)),d.value=b+="","value"===c||b===a.getAttribute(c)?b:void 0}},oc.id=oc.name=oc.coords=function(a,b,c){var d;return c?void 0:(d=a.getAttributeNode(b))&&""!==d.value?d.value:null},m.valHooks.button={get:function(a,b){var c=a.getAttributeNode(b);return c&&c.specified?c.value:void 0},set:mc.set},m.attrHooks.contenteditable={set:function(a,b,c){mc.set(a,""===b?!1:b,c)}},m.each(["width","height"],function(a,b){m.attrHooks[b]={set:function(a,c){return""===c?(a.setAttribute(b,"auto"),c):void 0}}})),k.style||(m.attrHooks.style={get:function(a){return a.style.cssText||void 0},set:function(a,b){return a.style.cssText=b+""}});var sc=/^(?:input|select|textarea|button|object)$/i,tc=/^(?:a|area)$/i;m.fn.extend({prop:function(a,b){return V(this,m.prop,a,b,arguments.length>1)},removeProp:function(a){return a=m.propFix[a]||a,this.each(function(){try{this[a]=void 0,delete this[a]}catch(b){}})}}),m.extend({propFix:{"for":"htmlFor","class":"className"},prop:function(a,b,c){var d,e,f,g=a.nodeType;if(a&&3!==g&&8!==g&&2!==g)return f=1!==g||!m.isXMLDoc(a),f&&(b=m.propFix[b]||b,e=m.propHooks[b]),void 0!==c?e&&"set"in e&&void 0!==(d=e.set(a,c,b))?d:a[b]=c:e&&"get"in e&&null!==(d=e.get(a,b))?d:a[b]},propHooks:{tabIndex:{get:function(a){var b=m.find.attr(a,"tabindex");return b?parseInt(b,10):sc.test(a.nodeName)||tc.test(a.nodeName)&&a.href?0:-1}}}}),k.hrefNormalized||m.each(["href","src"],function(a,b){m.propHooks[b]={get:function(a){return a.getAttribute(b,4)}}}),k.optSelected||(m.propHooks.selected={get:function(a){var b=a.parentNode;return b&&(b.selectedIndex,b.parentNode&&b.parentNode.selectedIndex),null}}),m.each(["tabIndex","readOnly","maxLength","cellSpacing","cellPadding","rowSpan","colSpan","useMap","frameBorder","contentEditable"],function(){m.propFix[this.toLowerCase()]=this}),k.enctype||(m.propFix.enctype="encoding");var uc=/[\t\r\n\f]/g;m.fn.extend({addClass:function(a){var b,c,d,e,f,g,h=0,i=this.length,j="string"==typeof a&&a;if(m.isFunction(a))return this.each(function(b){m(this).addClass(a.call(this,b,this.className))});if(j)for(b=(a||"").match(E)||[];i>h;h++)if(c=this[h],d=1===c.nodeType&&(c.className?(" "+c.className+" ").replace(uc," "):" ")){f=0;while(e=b[f++])d.indexOf(" "+e+" ")<0&&(d+=e+" ");g=m.trim(d),c.className!==g&&(c.className=g)}return this},removeClass:function(a){var b,c,d,e,f,g,h=0,i=this.length,j=0===arguments.length||"string"==typeof a&&a;if(m.isFunction(a))return 
this.each(function(b){m(this).removeClass(a.call(this,b,this.className))});if(j)for(b=(a||"").match(E)||[];i>h;h++)if(c=this[h],d=1===c.nodeType&&(c.className?(" "+c.className+" ").replace(uc," "):"")){f=0;while(e=b[f++])while(d.indexOf(" "+e+" ")>=0)d=d.replace(" "+e+" "," ");g=a?m.trim(d):"",c.className!==g&&(c.className=g)}return this},toggleClass:function(a,b){var c=typeof a;return"boolean"==typeof b&&"string"===c?b?this.addClass(a):this.removeClass(a):this.each(m.isFunction(a)?function(c){m(this).toggleClass(a.call(this,c,this.className,b),b)}:function(){if("string"===c){var b,d=0,e=m(this),f=a.match(E)||[];while(b=f[d++])e.hasClass(b)?e.removeClass(b):e.addClass(b)}else(c===K||"boolean"===c)&&(this.className&&m._data(this,"__className__",this.className),this.className=this.className||a===!1?"":m._data(this,"__className__")||"")})},hasClass:function(a){for(var b=" "+a+" ",c=0,d=this.length;d>c;c++)if(1===this[c].nodeType&&(" "+this[c].className+" ").replace(uc," ").indexOf(b)>=0)return!0;return!1}}),m.each("blur focus focusin focusout load resize scroll unload click dblclick mousedown mouseup mousemove mouseover mouseout mouseenter mouseleave change select submit keydown keypress keyup error contextmenu".split(" "),function(a,b){m.fn[b]=function(a,c){return arguments.length>0?this.on(b,null,a,c):this.trigger(b)}}),m.fn.extend({hover:function(a,b){return this.mouseenter(a).mouseleave(b||a)},bind:function(a,b,c){return this.on(a,null,b,c)},unbind:function(a,b){return this.off(a,null,b)},delegate:function(a,b,c,d){return this.on(b,a,c,d)},undelegate:function(a,b,c){return 1===arguments.length?this.off(a,"**"):this.off(b,a||"**",c)}});var vc=m.now(),wc=/\?/,xc=/(,)|(\[|{)|(}|])|"(?:[^"\\\r\n]|\\["\\\/bfnrt]|\\u[\da-fA-F]{4})*"\s*:?|true|false|null|-?(?!0\d)\d+(?:\.\d+|)(?:[eE][+-]?\d+|)/g;m.parseJSON=function(b){if(a.JSON&&a.JSON.parse)return a.JSON.parse(b+"");var c,d=null,e=m.trim(b+"");return e&&!m.trim(e.replace(xc,function(a,b,e,f){return c&&b&&(d=0),0===d?a:(c=e||b,d+=!f-!e,"")}))?Function("return "+e)():m.error("Invalid JSON: "+b)},m.parseXML=function(b){var c,d;if(!b||"string"!=typeof b)return null;try{a.DOMParser?(d=new DOMParser,c=d.parseFromString(b,"text/xml")):(c=new ActiveXObject("Microsoft.XMLDOM"),c.async="false",c.loadXML(b))}catch(e){c=void 0}return c&&c.documentElement&&!c.getElementsByTagName("parsererror").length||m.error("Invalid XML: "+b),c};var yc,zc,Ac=/#.*$/,Bc=/([?&])_=[^&]*/,Cc=/^(.*?):[ \t]*([^\r\n]*)\r?$/gm,Dc=/^(?:about|app|app-storage|.+-extension|file|res|widget):$/,Ec=/^(?:GET|HEAD)$/,Fc=/^\/\//,Gc=/^([\w.+-]+:)(?:\/\/(?:[^\/?#]*@|)([^\/?#:]*)(?::(\d+)|)|)/,Hc={},Ic={},Jc="*/".concat("*");try{zc=location.href}catch(Kc){zc=y.createElement("a"),zc.href="",zc=zc.href}yc=Gc.exec(zc.toLowerCase())||[];function Lc(a){return function(b,c){"string"!=typeof b&&(c=b,b="*");var d,e=0,f=b.toLowerCase().match(E)||[];if(m.isFunction(c))while(d=f[e++])"+"===d.charAt(0)?(d=d.slice(1)||"*",(a[d]=a[d]||[]).unshift(c)):(a[d]=a[d]||[]).push(c)}}function Mc(a,b,c,d){var e={},f=a===Ic;function g(h){var i;return e[h]=!0,m.each(a[h]||[],function(a,h){var j=h(b,c,d);return"string"!=typeof j||f||e[j]?f?!(i=j):void 0:(b.dataTypes.unshift(j),g(j),!1)}),i}return g(b.dataTypes[0])||!e["*"]&&g("*")}function Nc(a,b){var c,d,e=m.ajaxSettings.flatOptions||{};for(d in b)void 0!==b[d]&&((e[d]?a:c||(c={}))[d]=b[d]);return c&&m.extend(!0,a,c),a}function Oc(a,b,c){var d,e,f,g,h=a.contents,i=a.dataTypes;while("*"===i[0])i.shift(),void 
0===e&&(e=a.mimeType||b.getResponseHeader("Content-Type"));if(e)for(g in h)if(h[g]&&h[g].test(e)){i.unshift(g);break}if(i[0]in c)f=i[0];else{for(g in c){if(!i[0]||a.converters[g+" "+i[0]]){f=g;break}d||(d=g)}f=f||d}return f?(f!==i[0]&&i.unshift(f),c[f]):void 0}function Pc(a,b,c,d){var e,f,g,h,i,j={},k=a.dataTypes.slice();if(k[1])for(g in a.converters)j[g.toLowerCase()]=a.converters[g];f=k.shift();while(f)if(a.responseFields[f]&&(c[a.responseFields[f]]=b),!i&&d&&a.dataFilter&&(b=a.dataFilter(b,a.dataType)),i=f,f=k.shift())if("*"===f)f=i;else if("*"!==i&&i!==f){if(g=j[i+" "+f]||j["* "+f],!g)for(e in j)if(h=e.split(" "),h[1]===f&&(g=j[i+" "+h[0]]||j["* "+h[0]])){g===!0?g=j[e]:j[e]!==!0&&(f=h[0],k.unshift(h[1]));break}if(g!==!0)if(g&&a["throws"])b=g(b);else try{b=g(b)}catch(l){return{state:"parsererror",error:g?l:"No conversion from "+i+" to "+f}}}return{state:"success",data:b}}m.extend({active:0,lastModified:{},etag:{},ajaxSettings:{url:zc,type:"GET",isLocal:Dc.test(yc[1]),global:!0,processData:!0,async:!0,contentType:"application/x-www-form-urlencoded; charset=UTF-8",accepts:{"*":Jc,text:"text/plain",html:"text/html",xml:"application/xml, text/xml",json:"application/json, text/javascript"},contents:{xml:/xml/,html:/html/,json:/json/},responseFields:{xml:"responseXML",text:"responseText",json:"responseJSON"},converters:{"* text":String,"text html":!0,"text json":m.parseJSON,"text xml":m.parseXML},flatOptions:{url:!0,context:!0}},ajaxSetup:function(a,b){return b?Nc(Nc(a,m.ajaxSettings),b):Nc(m.ajaxSettings,a)},ajaxPrefilter:Lc(Hc),ajaxTransport:Lc(Ic),ajax:function(a,b){"object"==typeof a&&(b=a,a=void 0),b=b||{};var c,d,e,f,g,h,i,j,k=m.ajaxSetup({},b),l=k.context||k,n=k.context&&(l.nodeType||l.jquery)?m(l):m.event,o=m.Deferred(),p=m.Callbacks("once memory"),q=k.statusCode||{},r={},s={},t=0,u="canceled",v={readyState:0,getResponseHeader:function(a){var b;if(2===t){if(!j){j={};while(b=Cc.exec(f))j[b[1].toLowerCase()]=b[2]}b=j[a.toLowerCase()]}return null==b?null:b},getAllResponseHeaders:function(){return 2===t?f:null},setRequestHeader:function(a,b){var c=a.toLowerCase();return t||(a=s[c]=s[c]||a,r[a]=b),this},overrideMimeType:function(a){return t||(k.mimeType=a),this},statusCode:function(a){var b;if(a)if(2>t)for(b in a)q[b]=[q[b],a[b]];else v.always(a[v.status]);return this},abort:function(a){var b=a||u;return i&&i.abort(b),x(0,b),this}};if(o.promise(v).complete=p.add,v.success=v.done,v.error=v.fail,k.url=((a||k.url||zc)+"").replace(Ac,"").replace(Fc,yc[1]+"//"),k.type=b.method||b.type||k.method||k.type,k.dataTypes=m.trim(k.dataType||"*").toLowerCase().match(E)||[""],null==k.crossDomain&&(c=Gc.exec(k.url.toLowerCase()),k.crossDomain=!(!c||c[1]===yc[1]&&c[2]===yc[2]&&(c[3]||("http:"===c[1]?"80":"443"))===(yc[3]||("http:"===yc[1]?"80":"443")))),k.data&&k.processData&&"string"!=typeof k.data&&(k.data=m.param(k.data,k.traditional)),Mc(Hc,k,b,v),2===t)return v;h=k.global,h&&0===m.active++&&m.event.trigger("ajaxStart"),k.type=k.type.toUpperCase(),k.hasContent=!Ec.test(k.type),e=k.url,k.hasContent||(k.data&&(e=k.url+=(wc.test(e)?"&":"?")+k.data,delete 
k.data),k.cache===!1&&(k.url=Bc.test(e)?e.replace(Bc,"$1_="+vc++):e+(wc.test(e)?"&":"?")+"_="+vc++)),k.ifModified&&(m.lastModified[e]&&v.setRequestHeader("If-Modified-Since",m.lastModified[e]),m.etag[e]&&v.setRequestHeader("If-None-Match",m.etag[e])),(k.data&&k.hasContent&&k.contentType!==!1||b.contentType)&&v.setRequestHeader("Content-Type",k.contentType),v.setRequestHeader("Accept",k.dataTypes[0]&&k.accepts[k.dataTypes[0]]?k.accepts[k.dataTypes[0]]+("*"!==k.dataTypes[0]?", "+Jc+"; q=0.01":""):k.accepts["*"]);for(d in k.headers)v.setRequestHeader(d,k.headers[d]);if(k.beforeSend&&(k.beforeSend.call(l,v,k)===!1||2===t))return v.abort();u="abort";for(d in{success:1,error:1,complete:1})v[d](k[d]);if(i=Mc(Ic,k,b,v)){v.readyState=1,h&&n.trigger("ajaxSend",[v,k]),k.async&&k.timeout>0&&(g=setTimeout(function(){v.abort("timeout")},k.timeout));try{t=1,i.send(r,x)}catch(w){if(!(2>t))throw w;x(-1,w)}}else x(-1,"No Transport");function x(a,b,c,d){var j,r,s,u,w,x=b;2!==t&&(t=2,g&&clearTimeout(g),i=void 0,f=d||"",v.readyState=a>0?4:0,j=a>=200&&300>a||304===a,c&&(u=Oc(k,v,c)),u=Pc(k,u,v,j),j?(k.ifModified&&(w=v.getResponseHeader("Last-Modified"),w&&(m.lastModified[e]=w),w=v.getResponseHeader("etag"),w&&(m.etag[e]=w)),204===a||"HEAD"===k.type?x="nocontent":304===a?x="notmodified":(x=u.state,r=u.data,s=u.error,j=!s)):(s=x,(a||!x)&&(x="error",0>a&&(a=0))),v.status=a,v.statusText=(b||x)+"",j?o.resolveWith(l,[r,x,v]):o.rejectWith(l,[v,x,s]),v.statusCode(q),q=void 0,h&&n.trigger(j?"ajaxSuccess":"ajaxError",[v,k,j?r:s]),p.fireWith(l,[v,x]),h&&(n.trigger("ajaxComplete",[v,k]),--m.active||m.event.trigger("ajaxStop")))}return v},getJSON:function(a,b,c){return m.get(a,b,c,"json")},getScript:function(a,b){return m.get(a,void 0,b,"script")}}),m.each(["get","post"],function(a,b){m[b]=function(a,c,d,e){return m.isFunction(c)&&(e=e||d,d=c,c=void 0),m.ajax({url:a,type:b,dataType:e,data:c,success:d})}}),m.each(["ajaxStart","ajaxStop","ajaxComplete","ajaxError","ajaxSuccess","ajaxSend"],function(a,b){m.fn[b]=function(a){return this.on(b,a)}}),m._evalUrl=function(a){return m.ajax({url:a,type:"GET",dataType:"script",async:!1,global:!1,"throws":!0})},m.fn.extend({wrapAll:function(a){if(m.isFunction(a))return this.each(function(b){m(this).wrapAll(a.call(this,b))});if(this[0]){var b=m(a,this[0].ownerDocument).eq(0).clone(!0);this[0].parentNode&&b.insertBefore(this[0]),b.map(function(){var a=this;while(a.firstChild&&1===a.firstChild.nodeType)a=a.firstChild;return a}).append(this)}return this},wrapInner:function(a){return this.each(m.isFunction(a)?function(b){m(this).wrapInner(a.call(this,b))}:function(){var b=m(this),c=b.contents();c.length?c.wrapAll(a):b.append(a)})},wrap:function(a){var b=m.isFunction(a);return this.each(function(c){m(this).wrapAll(b?a.call(this,c):a)})},unwrap:function(){return this.parent().each(function(){m.nodeName(this,"body")||m(this).replaceWith(this.childNodes)}).end()}}),m.expr.filters.hidden=function(a){return a.offsetWidth<=0&&a.offsetHeight<=0||!k.reliableHiddenOffsets()&&"none"===(a.style&&a.style.display||m.css(a,"display"))},m.expr.filters.visible=function(a){return!m.expr.filters.hidden(a)};var Qc=/%20/g,Rc=/\[\]$/,Sc=/\r?\n/g,Tc=/^(?:submit|button|image|reset|file)$/i,Uc=/^(?:input|select|textarea|keygen)/i;function Vc(a,b,c,d){var e;if(m.isArray(b))m.each(b,function(b,e){c||Rc.test(a)?d(a,e):Vc(a+"["+("object"==typeof e?b:"")+"]",e,c,d)});else if(c||"object"!==m.type(b))d(a,b);else for(e in b)Vc(a+"["+e+"]",b[e],c,d)}m.param=function(a,b){var 
c,d=[],e=function(a,b){b=m.isFunction(b)?b():null==b?"":b,d[d.length]=encodeURIComponent(a)+"="+encodeURIComponent(b)};if(void 0===b&&(b=m.ajaxSettings&&m.ajaxSettings.traditional),m.isArray(a)||a.jquery&&!m.isPlainObject(a))m.each(a,function(){e(this.name,this.value)});else for(c in a)Vc(c,a[c],b,e);return d.join("&").replace(Qc,"+")},m.fn.extend({serialize:function(){return m.param(this.serializeArray())},serializeArray:function(){return this.map(function(){var a=m.prop(this,"elements");return a?m.makeArray(a):this}).filter(function(){var a=this.type;return this.name&&!m(this).is(":disabled")&&Uc.test(this.nodeName)&&!Tc.test(a)&&(this.checked||!W.test(a))}).map(function(a,b){var c=m(this).val();return null==c?null:m.isArray(c)?m.map(c,function(a){return{name:b.name,value:a.replace(Sc,"\r\n")}}):{name:b.name,value:c.replace(Sc,"\r\n")}}).get()}}),m.ajaxSettings.xhr=void 0!==a.ActiveXObject?function(){return!this.isLocal&&/^(get|post|head|put|delete|options)$/i.test(this.type)&&Zc()||$c()}:Zc;var Wc=0,Xc={},Yc=m.ajaxSettings.xhr();a.ActiveXObject&&m(a).on("unload",function(){for(var a in Xc)Xc[a](void 0,!0)}),k.cors=!!Yc&&"withCredentials"in Yc,Yc=k.ajax=!!Yc,Yc&&m.ajaxTransport(function(a){if(!a.crossDomain||k.cors){var b;return{send:function(c,d){var e,f=a.xhr(),g=++Wc;if(f.open(a.type,a.url,a.async,a.username,a.password),a.xhrFields)for(e in a.xhrFields)f[e]=a.xhrFields[e];a.mimeType&&f.overrideMimeType&&f.overrideMimeType(a.mimeType),a.crossDomain||c["X-Requested-With"]||(c["X-Requested-With"]="XMLHttpRequest");for(e in c)void 0!==c[e]&&f.setRequestHeader(e,c[e]+"");f.send(a.hasContent&&a.data||null),b=function(c,e){var h,i,j;if(b&&(e||4===f.readyState))if(delete Xc[g],b=void 0,f.onreadystatechange=m.noop,e)4!==f.readyState&&f.abort();else{j={},h=f.status,"string"==typeof f.responseText&&(j.text=f.responseText);try{i=f.statusText}catch(k){i=""}h||!a.isLocal||a.crossDomain?1223===h&&(h=204):h=j.text?200:404}j&&d(h,i,j,f.getAllResponseHeaders())},a.async?4===f.readyState?setTimeout(b):f.onreadystatechange=Xc[g]=b:b()},abort:function(){b&&b(void 0,!0)}}}});function Zc(){try{return new a.XMLHttpRequest}catch(b){}}function $c(){try{return new a.ActiveXObject("Microsoft.XMLHTTP")}catch(b){}}m.ajaxSetup({accepts:{script:"text/javascript, application/javascript, application/ecmascript, application/x-ecmascript"},contents:{script:/(?:java|ecma)script/},converters:{"text script":function(a){return m.globalEval(a),a}}}),m.ajaxPrefilter("script",function(a){void 0===a.cache&&(a.cache=!1),a.crossDomain&&(a.type="GET",a.global=!1)}),m.ajaxTransport("script",function(a){if(a.crossDomain){var b,c=y.head||m("head")[0]||y.documentElement;return{send:function(d,e){b=y.createElement("script"),b.async=!0,a.scriptCharset&&(b.charset=a.scriptCharset),b.src=a.url,b.onload=b.onreadystatechange=function(a,c){(c||!b.readyState||/loaded|complete/.test(b.readyState))&&(b.onload=b.onreadystatechange=null,b.parentNode&&b.parentNode.removeChild(b),b=null,c||e(200,"success"))},c.insertBefore(b,c.firstChild)},abort:function(){b&&b.onload(void 0,!0)}}}});var _c=[],ad=/(=)\?(?=&|$)|\?\?/;m.ajaxSetup({jsonp:"callback",jsonpCallback:function(){var a=_c.pop()||m.expando+"_"+vc++;return this[a]=!0,a}}),m.ajaxPrefilter("json jsonp",function(b,c,d){var e,f,g,h=b.jsonp!==!1&&(ad.test(b.url)?"url":"string"==typeof b.data&&!(b.contentType||"").indexOf("application/x-www-form-urlencoded")&&ad.test(b.data)&&"data");return 
h||"jsonp"===b.dataTypes[0]?(e=b.jsonpCallback=m.isFunction(b.jsonpCallback)?b.jsonpCallback():b.jsonpCallback,h?b[h]=b[h].replace(ad,"$1"+e):b.jsonp!==!1&&(b.url+=(wc.test(b.url)?"&":"?")+b.jsonp+"="+e),b.converters["script json"]=function(){return g||m.error(e+" was not called"),g[0]},b.dataTypes[0]="json",f=a[e],a[e]=function(){g=arguments},d.always(function(){a[e]=f,b[e]&&(b.jsonpCallback=c.jsonpCallback,_c.push(e)),g&&m.isFunction(f)&&f(g[0]),g=f=void 0}),"script"):void 0}),m.parseHTML=function(a,b,c){if(!a||"string"!=typeof a)return null;"boolean"==typeof b&&(c=b,b=!1),b=b||y;var d=u.exec(a),e=!c&&[];return d?[b.createElement(d[1])]:(d=m.buildFragment([a],b,e),e&&e.length&&m(e).remove(),m.merge([],d.childNodes))};var bd=m.fn.load;m.fn.load=function(a,b,c){if("string"!=typeof a&&bd)return bd.apply(this,arguments);var d,e,f,g=this,h=a.indexOf(" ");return h>=0&&(d=m.trim(a.slice(h,a.length)),a=a.slice(0,h)),m.isFunction(b)?(c=b,b=void 0):b&&"object"==typeof b&&(f="POST"),g.length>0&&m.ajax({url:a,type:f,dataType:"html",data:b}).done(function(a){e=arguments,g.html(d?m("<div>").append(m.parseHTML(a)).find(d):a)}).complete(c&&function(a,b){g.each(c,e||[a.responseText,b,a])}),this},m.expr.filters.animated=function(a){return m.grep(m.timers,function(b){return a===b.elem}).length};var cd=a.document.documentElement;function dd(a){return m.isWindow(a)?a:9===a.nodeType?a.defaultView||a.parentWindow:!1}m.offset={setOffset:function(a,b,c){var d,e,f,g,h,i,j,k=m.css(a,"position"),l=m(a),n={};"static"===k&&(a.style.position="relative"),h=l.offset(),f=m.css(a,"top"),i=m.css(a,"left"),j=("absolute"===k||"fixed"===k)&&m.inArray("auto",[f,i])>-1,j?(d=l.position(),g=d.top,e=d.left):(g=parseFloat(f)||0,e=parseFloat(i)||0),m.isFunction(b)&&(b=b.call(a,c,h)),null!=b.top&&(n.top=b.top-h.top+g),null!=b.left&&(n.left=b.left-h.left+e),"using"in b?b.using.call(a,n):l.css(n)}},m.fn.extend({offset:function(a){if(arguments.length)return void 0===a?this:this.each(function(b){m.offset.setOffset(this,a,b)});var b,c,d={top:0,left:0},e=this[0],f=e&&e.ownerDocument;if(f)return b=f.documentElement,m.contains(b,e)?(typeof e.getBoundingClientRect!==K&&(d=e.getBoundingClientRect()),c=dd(f),{top:d.top+(c.pageYOffset||b.scrollTop)-(b.clientTop||0),left:d.left+(c.pageXOffset||b.scrollLeft)-(b.clientLeft||0)}):d},position:function(){if(this[0]){var a,b,c={top:0,left:0},d=this[0];return"fixed"===m.css(d,"position")?b=d.getBoundingClientRect():(a=this.offsetParent(),b=this.offset(),m.nodeName(a[0],"html")||(c=a.offset()),c.top+=m.css(a[0],"borderTopWidth",!0),c.left+=m.css(a[0],"borderLeftWidth",!0)),{top:b.top-c.top-m.css(d,"marginTop",!0),left:b.left-c.left-m.css(d,"marginLeft",!0)}}},offsetParent:function(){return this.map(function(){var a=this.offsetParent||cd;while(a&&!m.nodeName(a,"html")&&"static"===m.css(a,"position"))a=a.offsetParent;return a||cd})}}),m.each({scrollLeft:"pageXOffset",scrollTop:"pageYOffset"},function(a,b){var c=/Y/.test(b);m.fn[a]=function(d){return V(this,function(a,d,e){var f=dd(a);return void 0===e?f?b in f?f[b]:f.document.documentElement[d]:a[d]:void(f?f.scrollTo(c?m(f).scrollLeft():e,c?e:m(f).scrollTop()):a[d]=e)},a,d,arguments.length,null)}}),m.each(["top","left"],function(a,b){m.cssHooks[b]=Lb(k.pixelPosition,function(a,c){return c?(c=Jb(a,b),Hb.test(c)?m(a).position()[b]+"px":c):void 0})}),m.each({Height:"height",Width:"width"},function(a,b){m.each({padding:"inner"+a,content:b,"":"outer"+a},function(c,d){m.fn[d]=function(d,e){var f=arguments.length&&(c||"boolean"!=typeof 
d),g=c||(d===!0||e===!0?"margin":"border");return V(this,function(b,c,d){var e;return m.isWindow(b)?b.document.documentElement["client"+a]:9===b.nodeType?(e=b.documentElement,Math.max(b.body["scroll"+a],e["scroll"+a],b.body["offset"+a],e["offset"+a],e["client"+a])):void 0===d?m.css(b,c,g):m.style(b,c,d,g)},b,f?d:void 0,f,null)}})}),m.fn.size=function(){return this.length},m.fn.andSelf=m.fn.addBack,"function"==typeof define&&define.amd&&define("jquery",[],function(){return m});var ed=a.jQuery,fd=a.$;return m.noConflict=function(b){return a.$===m&&(a.$=fd),b&&a.jQuery===m&&(a.jQuery=ed),m},typeof b===K&&(a.jQuery=a.$=m),m});
+
+
+
+/** d3js **/
+!function(){function n(n,t){return t>n?-1:n>t?1:n>=t?0:0/0}function t(n){return null!=n&&!isNaN(n)}function e(n){return{left:function(t,e,r,u){for(arguments.length<3&&(r=0),arguments.length<4&&(u=t.length);u>r;){var i=r+u>>>1;n(t[i],e)<0?r=i+1:u=i}return r},right:function(t,e,r,u){for(arguments.length<3&&(r=0),arguments.length<4&&(u=t.length);u>r;){var i=r+u>>>1;n(t[i],e)>0?u=i:r=i+1}return r}}}function r(n){return n.length}function u(n){for(var t=1;n*t%1;)t*=10;return t}function i(n,t){try{for(var e in t)Object.defineProperty(n.prototype,e,{value:t[e],enumerable:!1})}catch(r){n.prototype=t}}function o(){}function a(n){return ia+n in this}function c(n){return n=ia+n,n in this&&delete this[n]}function s(){var n=[];return this.forEach(function(t){n.push(t)}),n}function l(){var n=0;for(var t in this)t.charCodeAt(0)===oa&&++n;return n}function f(){for(var n in this)if(n.charCodeAt(0)===oa)return!1;return!0}function h(){}function g(n,t,e){return function(){var r=e.apply(t,arguments);return r===t?n:r}}function p(n,t){if(t in n)return t;t=t.charAt(0).toUpperCase()+t.substring(1);for(var e=0,r=aa.length;r>e;++e){var u=aa[e]+t;if(u in n)return u}}function v(){}function d(){}function m(n){function t(){for(var t,r=e,u=-1,i=r.length;++u<i;)(t=r[u].on)&&t.apply(this,arguments);return n}var e=[],r=new o;return t.on=function(t,u){var i,o=r.get(t);return arguments.length<2?o&&o.on:(o&&(o.on=null,e=e.slice(0,i=e.indexOf(o)).concat(e.slice(i+1)),r.remove(t)),u&&e.push(r.set(t,{on:u})),n)},t}function y(){Zo.event.preventDefault()}function x(){for(var n,t=Zo.event;n=t.sourceEvent;)t=n;return t}function M(n){for(var t=new d,e=0,r=arguments.length;++e<r;)t[arguments[e]]=m(t);return t.of=function(e,r){return function(u){try{var i=u.sourceEvent=Zo.event;u.target=n,Zo.event=u,t[u.type].apply(e,r)}finally{Zo.event=i}}},t}function _(n){return sa(n,pa),n}function b(n){return"function"==typeof n?n:function(){return la(n,this)}}function w(n){return"function"==typeof n?n:function(){return fa(n,this)}}function S(n,t){function e(){this.removeAttribute(n)}function r(){this.removeAttributeNS(n.space,n.local)}function u(){this.setAttribute(n,t)}function i(){this.setAttributeNS(n.space,n.local,t)}function o(){var e=t.apply(this,arguments);null==e?this.removeAttribute(n):this.setAttribute(n,e)}function a(){var e=t.apply(this,arguments);null==e?this.removeAttributeNS(n.space,n.local):this.setAttributeNS(n.space,n.local,e)}return n=Zo.ns.qualify(n),null==t?n.local?r:e:"function"==typeof t?n.local?a:o:n.local?i:u}function k(n){return n.trim().replace(/\s+/g," ")}function E(n){return new RegExp("(?:^|\\s+)"+Zo.requote(n)+"(?:\\s+|$)","g")}function A(n){return(n+"").trim().split(/^|\s+/)}function C(n,t){function e(){for(var e=-1;++e<u;)n[e](this,t)}function r(){for(var e=-1,r=t.apply(this,arguments);++e<u;)n[e](this,r)}n=A(n).map(N);var u=n.length;return"function"==typeof t?r:e}function N(n){var t=E(n);return function(e,r){if(u=e.classList)return r?u.add(n):u.remove(n);var u=e.getAttribute("class")||"";r?(t.lastIndex=0,t.test(u)||e.setAttribute("class",k(u+" "+n))):e.setAttribute("class",k(u.replace(t," ")))}}function z(n,t,e){function r(){this.style.removeProperty(n)}function u(){this.style.setProperty(n,t,e)}function i(){var r=t.apply(this,arguments);null==r?this.style.removeProperty(n):this.style.setProperty(n,r,e)}return null==t?r:"function"==typeof t?i:u}function L(n,t){function e(){delete this[n]}function r(){this[n]=t}function u(){var e=t.apply(this,arguments);null==e?delete this[n]:this[n]=e}return 
null==t?e:"function"==typeof t?u:r}function T(n){return"function"==typeof n?n:(n=Zo.ns.qualify(n)).local?function(){return this.ownerDocument.createElementNS(n.space,n.local)}:function(){return this.ownerDocument.createElementNS(this.namespaceURI,n)}}function q(n){return{__data__:n}}function R(n){return function(){return ga(this,n)}}function D(t){return arguments.length||(t=n),function(n,e){return n&&e?t(n.__data__,e.__data__):!n-!e}}function P(n,t){for(var e=0,r=n.length;r>e;e++)for(var u,i=n[e],o=0,a=i.length;a>o;o++)(u=i[o])&&t(u,o,e);return n}function U(n){return sa(n,da),n}function j(n){var t,e;return function(r,u,i){var o,a=n[i].update,c=a.length;for(i!=e&&(e=i,t=0),u>=t&&(t=u+1);!(o=a[t])&&++t<c;);return o}}function H(){var n=this.__transition__;n&&++n.active}function F(n,t,e){function r(){var t=this[o];t&&(this.removeEventListener(n,t,t.$),delete this[o])}function u(){var u=c(t,Xo(arguments));r.call(this),this.addEventListener(n,this[o]=u,u.$=e),u._=t}function i(){var t,e=new RegExp("^__on([^.]+)"+Zo.requote(n)+"$");for(var r in this)if(t=r.match(e)){var u=this[r];this.removeEventListener(t[1],u,u.$),delete this[r]}}var o="__on"+n,a=n.indexOf("."),c=O;a>0&&(n=n.substring(0,a));var s=ya.get(n);return s&&(n=s,c=Y),a?t?u:r:t?v:i}function O(n,t){return function(e){var r=Zo.event;Zo.event=e,t[0]=this.__data__;try{n.apply(this,t)}finally{Zo.event=r}}}function Y(n,t){var e=O(n,t);return function(n){var t=this,r=n.relatedTarget;r&&(r===t||8&r.compareDocumentPosition(t))||e.call(t,n)}}function I(){var n=".dragsuppress-"+ ++Ma,t="click"+n,e=Zo.select(Wo).on("touchmove"+n,y).on("dragstart"+n,y).on("selectstart"+n,y);if(xa){var r=Bo.style,u=r[xa];r[xa]="none"}return function(i){function o(){e.on(t,null)}e.on(n,null),xa&&(r[xa]=u),i&&(e.on(t,function(){y(),o()},!0),setTimeout(o,0))}}function Z(n,t){t.changedTouches&&(t=t.changedTouches[0]);var e=n.ownerSVGElement||n;if(e.createSVGPoint){var r=e.createSVGPoint();if(0>_a&&(Wo.scrollX||Wo.scrollY)){e=Zo.select("body").append("svg").style({position:"absolute",top:0,left:0,margin:0,padding:0,border:"none"},"important");var u=e[0][0].getScreenCTM();_a=!(u.f||u.e),e.remove()}return _a?(r.x=t.pageX,r.y=t.pageY):(r.x=t.clientX,r.y=t.clientY),r=r.matrixTransform(n.getScreenCTM().inverse()),[r.x,r.y]}var i=n.getBoundingClientRect();return[t.clientX-i.left-n.clientLeft,t.clientY-i.top-n.clientTop]}function V(){return Zo.event.changedTouches[0].identifier}function X(){return Zo.event.target}function $(){return Wo}function B(n){return n>0?1:0>n?-1:0}function W(n,t,e){return(t[0]-n[0])*(e[1]-n[1])-(t[1]-n[1])*(e[0]-n[0])}function J(n){return n>1?0:-1>n?ba:Math.acos(n)}function G(n){return n>1?Sa:-1>n?-Sa:Math.asin(n)}function K(n){return((n=Math.exp(n))-1/n)/2}function Q(n){return((n=Math.exp(n))+1/n)/2}function nt(n){return((n=Math.exp(2*n))-1)/(n+1)}function tt(n){return(n=Math.sin(n/2))*n}function et(){}function rt(n,t,e){return this instanceof rt?(this.h=+n,this.s=+t,void(this.l=+e)):arguments.length<2?n instanceof rt?new rt(n.h,n.s,n.l):mt(""+n,yt,rt):new rt(n,t,e)}function ut(n,t,e){function r(n){return n>360?n-=360:0>n&&(n+=360),60>n?i+(o-i)*n/60:180>n?o:240>n?i+(o-i)*(240-n)/60:i}function u(n){return Math.round(255*r(n))}var i,o;return n=isNaN(n)?0:(n%=360)<0?n+360:n,t=isNaN(t)?0:0>t?0:t>1?1:t,e=0>e?0:e>1?1:e,o=.5>=e?e*(1+t):e+t-e*t,i=2*e-o,new gt(u(n+120),u(n),u(n-120))}function it(n,t,e){return this instanceof it?(this.h=+n,this.c=+t,void(this.l=+e)):arguments.length<2?n instanceof it?new it(n.h,n.c,n.l):n instanceof 
at?st(n.l,n.a,n.b):st((n=xt((n=Zo.rgb(n)).r,n.g,n.b)).l,n.a,n.b):new it(n,t,e)}function ot(n,t,e){return isNaN(n)&&(n=0),isNaN(t)&&(t=0),new at(e,Math.cos(n*=Aa)*t,Math.sin(n)*t)}function at(n,t,e){return this instanceof at?(this.l=+n,this.a=+t,void(this.b=+e)):arguments.length<2?n instanceof at?new at(n.l,n.a,n.b):n instanceof it?ot(n.l,n.c,n.h):xt((n=gt(n)).r,n.g,n.b):new at(n,t,e)}function ct(n,t,e){var r=(n+16)/116,u=r+t/500,i=r-e/200;return u=lt(u)*ja,r=lt(r)*Ha,i=lt(i)*Fa,new gt(ht(3.2404542*u-1.5371385*r-.4985314*i),ht(-.969266*u+1.8760108*r+.041556*i),ht(.0556434*u-.2040259*r+1.0572252*i))}function st(n,t,e){return n>0?new it(Math.atan2(e,t)*Ca,Math.sqrt(t*t+e*e),n):new it(0/0,0/0,n)}function lt(n){return n>.206893034?n*n*n:(n-4/29)/7.787037}function ft(n){return n>.008856?Math.pow(n,1/3):7.787037*n+4/29}function ht(n){return Math.round(255*(.00304>=n?12.92*n:1.055*Math.pow(n,1/2.4)-.055))}function gt(n,t,e){return this instanceof gt?(this.r=~~n,this.g=~~t,void(this.b=~~e)):arguments.length<2?n instanceof gt?new gt(n.r,n.g,n.b):mt(""+n,gt,ut):new gt(n,t,e)}function pt(n){return new gt(n>>16,255&n>>8,255&n)}function vt(n){return pt(n)+""}function dt(n){return 16>n?"0"+Math.max(0,n).toString(16):Math.min(255,n).toString(16)}function mt(n,t,e){var r,u,i,o=0,a=0,c=0;if(r=/([a-z]+)\((.*)\)/i.exec(n))switch(u=r[2].split(","),r[1]){case"hsl":return e(parseFloat(u[0]),parseFloat(u[1])/100,parseFloat(u[2])/100);case"rgb":return t(_t(u[0]),_t(u[1]),_t(u[2]))}return(i=Ia.get(n))?t(i.r,i.g,i.b):(null==n||"#"!==n.charAt(0)||isNaN(i=parseInt(n.substring(1),16))||(4===n.length?(o=(3840&i)>>4,o=o>>4|o,a=240&i,a=a>>4|a,c=15&i,c=c<<4|c):7===n.length&&(o=(16711680&i)>>16,a=(65280&i)>>8,c=255&i)),t(o,a,c))}function yt(n,t,e){var r,u,i=Math.min(n/=255,t/=255,e/=255),o=Math.max(n,t,e),a=o-i,c=(o+i)/2;return a?(u=.5>c?a/(o+i):a/(2-o-i),r=n==o?(t-e)/a+(e>t?6:0):t==o?(e-n)/a+2:(n-t)/a+4,r*=60):(r=0/0,u=c>0&&1>c?0:r),new rt(r,u,c)}function xt(n,t,e){n=Mt(n),t=Mt(t),e=Mt(e);var r=ft((.4124564*n+.3575761*t+.1804375*e)/ja),u=ft((.2126729*n+.7151522*t+.072175*e)/Ha),i=ft((.0193339*n+.119192*t+.9503041*e)/Fa);return at(116*u-16,500*(r-u),200*(u-i))}function Mt(n){return(n/=255)<=.04045?n/12.92:Math.pow((n+.055)/1.055,2.4)}function _t(n){var t=parseFloat(n);return"%"===n.charAt(n.length-1)?Math.round(2.55*t):t}function bt(n){return"function"==typeof n?n:function(){return n}}function wt(n){return n}function St(n){return function(t,e,r){return 2===arguments.length&&"function"==typeof e&&(r=e,e=null),kt(t,e,n,r)}}function kt(n,t,e,r){function u(){var n,t=c.status;if(!t&&c.responseText||t>=200&&300>t||304===t){try{n=e.call(i,c)}catch(r){return o.error.call(i,r),void 0}o.load.call(i,n)}else o.error.call(i,c)}var i={},o=Zo.dispatch("beforesend","progress","load","error"),a={},c=new XMLHttpRequest,s=null;return!Wo.XDomainRequest||"withCredentials"in c||!/^(http(s)?:)?\/\//.test(n)||(c=new XDomainRequest),"onload"in c?c.onload=c.onerror=u:c.onreadystatechange=function(){c.readyState>3&&u()},c.onprogress=function(n){var t=Zo.event;Zo.event=n;try{o.progress.call(i,c)}finally{Zo.event=t}},i.header=function(n,t){return n=(n+"").toLowerCase(),arguments.length<2?a[n]:(null==t?delete a[n]:a[n]=t+"",i)},i.mimeType=function(n){return arguments.length?(t=null==n?null:n+"",i):t},i.responseType=function(n){return arguments.length?(s=n,i):s},i.response=function(n){return e=n,i},["get","post"].forEach(function(n){i[n]=function(){return 
i.send.apply(i,[n].concat(Xo(arguments)))}}),i.send=function(e,r,u){if(2===arguments.length&&"function"==typeof r&&(u=r,r=null),c.open(e,n,!0),null==t||"accept"in a||(a.accept=t+",*/*"),c.setRequestHeader)for(var l in a)c.setRequestHeader(l,a[l]);return null!=t&&c.overrideMimeType&&c.overrideMimeType(t),null!=s&&(c.responseType=s),null!=u&&i.on("error",u).on("load",function(n){u(null,n)}),o.beforesend.call(i,c),c.send(null==r?null:r),i},i.abort=function(){return c.abort(),i},Zo.rebind(i,o,"on"),null==r?i:i.get(Et(r))}function Et(n){return 1===n.length?function(t,e){n(null==t?e:null)}:n}function At(){var n=Ct(),t=Nt()-n;t>24?(isFinite(t)&&(clearTimeout($a),$a=setTimeout(At,t)),Xa=0):(Xa=1,Wa(At))}function Ct(){var n=Date.now();for(Ba=Za;Ba;)n>=Ba.t&&(Ba.f=Ba.c(n-Ba.t)),Ba=Ba.n;return n}function Nt(){for(var n,t=Za,e=1/0;t;)t.f?t=n?n.n=t.n:Za=t.n:(t.t<e&&(e=t.t),t=(n=t).n);return Va=n,e}function zt(n,t){return t-(n?Math.ceil(Math.log(n)/Math.LN10):1)}function Lt(n,t){var e=Math.pow(10,3*ua(8-t));return{scale:t>8?function(n){return n/e}:function(n){return n*e},symbol:n}}function Tt(n){var t=n.decimal,e=n.thousands,r=n.grouping,u=n.currency,i=r?function(n){for(var t=n.length,u=[],i=0,o=r[0];t>0&&o>0;)u.push(n.substring(t-=o,t+o)),o=r[i=(i+1)%r.length];return u.reverse().join(e)}:wt;return function(n){var e=Ga.exec(n),r=e[1]||" ",o=e[2]||">",a=e[3]||"",c=e[4]||"",s=e[5],l=+e[6],f=e[7],h=e[8],g=e[9],p=1,v="",d="",m=!1;switch(h&&(h=+h.substring(1)),(s||"0"===r&&"="===o)&&(s=r="0",o="=",f&&(l-=Math.floor((l-1)/4))),g){case"n":f=!0,g="g";break;case"%":p=100,d="%",g="f";break;case"p":p=100,d="%",g="r";break;case"b":case"o":case"x":case"X":"#"===c&&(v="0"+g.toLowerCase());case"c":case"d":m=!0,h=0;break;case"s":p=-1,g="r"}"$"===c&&(v=u[0],d=u[1]),"r"!=g||h||(g="g"),null!=h&&("g"==g?h=Math.max(1,Math.min(21,h)):("e"==g||"f"==g)&&(h=Math.max(0,Math.min(20,h)))),g=Ka.get(g)||qt;var y=s&&f;return function(n){var e=d;if(m&&n%1)return"";var u=0>n||0===n&&0>1/n?(n=-n,"-"):a;if(0>p){var c=Zo.formatPrefix(n,h);n=c.scale(n),e=c.symbol+d}else n*=p;n=g(n,h);var x=n.lastIndexOf("."),M=0>x?n:n.substring(0,x),_=0>x?"":t+n.substring(x+1);!s&&f&&(M=i(M));var b=v.length+M.length+_.length+(y?0:u.length),w=l>b?new Array(b=l-b+1).join(r):"";return y&&(M=i(w+M)),u+=v,n=M+_,("<"===o?u+n+w:">"===o?w+u+n:"^"===o?w.substring(0,b>>=1)+u+n+w.substring(b):u+(y?n:w+n))+e}}}function qt(n){return n+""}function Rt(){this._=new Date(arguments.length>1?Date.UTC.apply(this,arguments):arguments[0])}function Dt(n,t,e){function r(t){var e=n(t),r=i(e,1);return r-t>t-e?e:r}function u(e){return t(e=n(new nc(e-1)),1),e}function i(n,e){return t(n=new nc(+n),e),n}function o(n,r,i){var o=u(n),a=[];if(i>1)for(;r>o;)e(o)%i||a.push(new Date(+o)),t(o,1);else for(;r>o;)a.push(new Date(+o)),t(o,1);return a}function a(n,t,e){try{nc=Rt;var r=new Rt;return r._=n,o(r,t,e)}finally{nc=Date}}n.floor=n,n.round=r,n.ceil=u,n.offset=i,n.range=o;var c=n.utc=Pt(n);return c.floor=c,c.round=Pt(r),c.ceil=Pt(u),c.offset=Pt(i),c.range=a,n}function Pt(n){return function(t,e){try{nc=Rt;var r=new Rt;return r._=t,n(r,e)._}finally{nc=Date}}}function Ut(n){function t(n){function t(t){for(var e,u,i,o=[],a=-1,c=0;++a<r;)37===n.charCodeAt(a)&&(o.push(n.substring(c,a)),null!=(u=ec[e=n.charAt(++a)])&&(e=n.charAt(++a)),(i=C[e])&&(e=i(t,null==u?"e"===e?" 
":"0":u)),o.push(e),c=a+1);return o.push(n.substring(c,a)),o.join("")}var r=n.length;return t.parse=function(t){var r={y:1900,m:0,d:1,H:0,M:0,S:0,L:0,Z:null},u=e(r,n,t,0);if(u!=t.length)return null;"p"in r&&(r.H=r.H%12+12*r.p);var i=null!=r.Z&&nc!==Rt,o=new(i?Rt:nc);return"j"in r?o.setFullYear(r.y,0,r.j):"w"in r&&("W"in r||"U"in r)?(o.setFullYear(r.y,0,1),o.setFullYear(r.y,0,"W"in r?(r.w+6)%7+7*r.W-(o.getDay()+5)%7:r.w+7*r.U-(o.getDay()+6)%7)):o.setFullYear(r.y,r.m,r.d),o.setHours(r.H+Math.floor(r.Z/100),r.M+r.Z%100,r.S,r.L),i?o._:o},t.toString=function(){return n},t}function e(n,t,e,r){for(var u,i,o,a=0,c=t.length,s=e.length;c>a;){if(r>=s)return-1;if(u=t.charCodeAt(a++),37===u){if(o=t.charAt(a++),i=N[o in ec?t.charAt(a++):o],!i||(r=i(n,e,r))<0)return-1}else if(u!=e.charCodeAt(r++))return-1}return r}function r(n,t,e){b.lastIndex=0;var r=b.exec(t.substring(e));return r?(n.w=w.get(r[0].toLowerCase()),e+r[0].length):-1}function u(n,t,e){M.lastIndex=0;var r=M.exec(t.substring(e));return r?(n.w=_.get(r[0].toLowerCase()),e+r[0].length):-1}function i(n,t,e){E.lastIndex=0;var r=E.exec(t.substring(e));return r?(n.m=A.get(r[0].toLowerCase()),e+r[0].length):-1}function o(n,t,e){S.lastIndex=0;var r=S.exec(t.substring(e));return r?(n.m=k.get(r[0].toLowerCase()),e+r[0].length):-1}function a(n,t,r){return e(n,C.c.toString(),t,r)}function c(n,t,r){return e(n,C.x.toString(),t,r)}function s(n,t,r){return e(n,C.X.toString(),t,r)}function l(n,t,e){var r=x.get(t.substring(e,e+=2).toLowerCase());return null==r?-1:(n.p=r,e)}var f=n.dateTime,h=n.date,g=n.time,p=n.periods,v=n.days,d=n.shortDays,m=n.months,y=n.shortMonths;t.utc=function(n){function e(n){try{nc=Rt;var t=new nc;return t._=n,r(t)}finally{nc=Date}}var r=t(n);return e.parse=function(n){try{nc=Rt;var t=r.parse(n);return t&&t._}finally{nc=Date}},e.toString=r.toString,e},t.multi=t.utc.multi=re;var x=Zo.map(),M=Ht(v),_=Ft(v),b=Ht(d),w=Ft(d),S=Ht(m),k=Ft(m),E=Ht(y),A=Ft(y);p.forEach(function(n,t){x.set(n.toLowerCase(),t)});var C={a:function(n){return d[n.getDay()]},A:function(n){return v[n.getDay()]},b:function(n){return y[n.getMonth()]},B:function(n){return m[n.getMonth()]},c:t(f),d:function(n,t){return jt(n.getDate(),t,2)},e:function(n,t){return jt(n.getDate(),t,2)},H:function(n,t){return jt(n.getHours(),t,2)},I:function(n,t){return jt(n.getHours()%12||12,t,2)},j:function(n,t){return jt(1+Qa.dayOfYear(n),t,3)},L:function(n,t){return jt(n.getMilliseconds(),t,3)},m:function(n,t){return jt(n.getMonth()+1,t,2)},M:function(n,t){return jt(n.getMinutes(),t,2)},p:function(n){return p[+(n.getHours()>=12)]},S:function(n,t){return jt(n.getSeconds(),t,2)},U:function(n,t){return jt(Qa.sundayOfYear(n),t,2)},w:function(n){return n.getDay()},W:function(n,t){return jt(Qa.mondayOfYear(n),t,2)},x:t(h),X:t(g),y:function(n,t){return jt(n.getFullYear()%100,t,2)},Y:function(n,t){return jt(n.getFullYear()%1e4,t,4)},Z:te,"%":function(){return"%"}},N={a:r,A:u,b:i,B:o,c:a,d:Wt,e:Wt,H:Gt,I:Gt,j:Jt,L:ne,m:Bt,M:Kt,p:l,S:Qt,U:Yt,w:Ot,W:It,x:c,X:s,y:Vt,Y:Zt,Z:Xt,"%":ee};return t}function jt(n,t,e){var r=0>n?"-":"",u=(r?-n:n)+"",i=u.length;return r+(e>i?new Array(e-i+1).join(t)+u:u)}function Ht(n){return new RegExp("^(?:"+n.map(Zo.requote).join("|")+")","i")}function Ft(n){for(var t=new o,e=-1,r=n.length;++e<r;)t.set(n[e].toLowerCase(),e);return t}function Ot(n,t,e){rc.lastIndex=0;var r=rc.exec(t.substring(e,e+1));return r?(n.w=+r[0],e+r[0].length):-1}function Yt(n,t,e){rc.lastIndex=0;var r=rc.exec(t.substring(e));return r?(n.U=+r[0],e+r[0].length):-1}function It(n,t,e){rc.lastIndex=0;var 
r=rc.exec(t.substring(e));return r?(n.W=+r[0],e+r[0].length):-1}function Zt(n,t,e){rc.lastIndex=0;var r=rc.exec(t.substring(e,e+4));return r?(n.y=+r[0],e+r[0].length):-1}function Vt(n,t,e){rc.lastIndex=0;var r=rc.exec(t.substring(e,e+2));return r?(n.y=$t(+r[0]),e+r[0].length):-1}function Xt(n,t,e){return/^[+-]\d{4}$/.test(t=t.substring(e,e+5))?(n.Z=-t,e+5):-1}function $t(n){return n+(n>68?1900:2e3)}function Bt(n,t,e){rc.lastIndex=0;var r=rc.exec(t.substring(e,e+2));return r?(n.m=r[0]-1,e+r[0].length):-1}function Wt(n,t,e){rc.lastIndex=0;var r=rc.exec(t.substring(e,e+2));return r?(n.d=+r[0],e+r[0].length):-1}function Jt(n,t,e){rc.lastIndex=0;var r=rc.exec(t.substring(e,e+3));return r?(n.j=+r[0],e+r[0].length):-1}function Gt(n,t,e){rc.lastIndex=0;var r=rc.exec(t.substring(e,e+2));return r?(n.H=+r[0],e+r[0].length):-1}function Kt(n,t,e){rc.lastIndex=0;var r=rc.exec(t.substring(e,e+2));return r?(n.M=+r[0],e+r[0].length):-1}function Qt(n,t,e){rc.lastIndex=0;var r=rc.exec(t.substring(e,e+2));return r?(n.S=+r[0],e+r[0].length):-1}function ne(n,t,e){rc.lastIndex=0;var r=rc.exec(t.substring(e,e+3));return r?(n.L=+r[0],e+r[0].length):-1}function te(n){var t=n.getTimezoneOffset(),e=t>0?"-":"+",r=~~(ua(t)/60),u=ua(t)%60;return e+jt(r,"0",2)+jt(u,"0",2)}function ee(n,t,e){uc.lastIndex=0;var r=uc.exec(t.substring(e,e+1));return r?e+r[0].length:-1}function re(n){for(var t=n.length,e=-1;++e<t;)n[e][0]=this(n[e][0]);return function(t){for(var e=0,r=n[e];!r[1](t);)r=n[++e];return r[0](t)}}function ue(){}function ie(n,t,e){var r=e.s=n+t,u=r-n,i=r-u;e.t=n-i+(t-u)}function oe(n,t){n&&cc.hasOwnProperty(n.type)&&cc[n.type](n,t)}function ae(n,t,e){var r,u=-1,i=n.length-e;for(t.lineStart();++u<i;)r=n[u],t.point(r[0],r[1],r[2]);t.lineEnd()}function ce(n,t){var e=-1,r=n.length;for(t.polygonStart();++e<r;)ae(n[e],t,1);t.polygonEnd()}function se(){function n(n,t){n*=Aa,t=t*Aa/2+ba/4;var e=n-r,o=e>=0?1:-1,a=o*e,c=Math.cos(t),s=Math.sin(t),l=i*s,f=u*c+l*Math.cos(a),h=l*o*Math.sin(a);lc.add(Math.atan2(h,f)),r=n,u=c,i=s}var t,e,r,u,i;fc.point=function(o,a){fc.point=n,r=(t=o)*Aa,u=Math.cos(a=(e=a)*Aa/2+ba/4),i=Math.sin(a)},fc.lineEnd=function(){n(t,e)}}function le(n){var t=n[0],e=n[1],r=Math.cos(e);return[r*Math.cos(t),r*Math.sin(t),Math.sin(e)]}function fe(n,t){return n[0]*t[0]+n[1]*t[1]+n[2]*t[2]}function he(n,t){return[n[1]*t[2]-n[2]*t[1],n[2]*t[0]-n[0]*t[2],n[0]*t[1]-n[1]*t[0]]}function ge(n,t){n[0]+=t[0],n[1]+=t[1],n[2]+=t[2]}function pe(n,t){return[n[0]*t,n[1]*t,n[2]*t]}function ve(n){var t=Math.sqrt(n[0]*n[0]+n[1]*n[1]+n[2]*n[2]);n[0]/=t,n[1]/=t,n[2]/=t}function de(n){return[Math.atan2(n[1],n[0]),G(n[2])]}function me(n,t){return ua(n[0]-t[0])<ka&&ua(n[1]-t[1])<ka}function ye(n,t){n*=Aa;var e=Math.cos(t*=Aa);xe(e*Math.cos(n),e*Math.sin(n),Math.sin(t))}function xe(n,t,e){++hc,pc+=(n-pc)/hc,vc+=(t-vc)/hc,dc+=(e-dc)/hc}function Me(){function n(n,u){n*=Aa;var i=Math.cos(u*=Aa),o=i*Math.cos(n),a=i*Math.sin(n),c=Math.sin(u),s=Math.atan2(Math.sqrt((s=e*c-r*a)*s+(s=r*o-t*c)*s+(s=t*a-e*o)*s),t*o+e*a+r*c);gc+=s,mc+=s*(t+(t=o)),yc+=s*(e+(e=a)),xc+=s*(r+(r=c)),xe(t,e,r)}var t,e,r;wc.point=function(u,i){u*=Aa;var o=Math.cos(i*=Aa);t=o*Math.cos(u),e=o*Math.sin(u),r=Math.sin(i),wc.point=n,xe(t,e,r)}}function _e(){wc.point=ye}function be(){function n(n,t){n*=Aa;var e=Math.cos(t*=Aa),o=e*Math.cos(n),a=e*Math.sin(n),c=Math.sin(t),s=u*c-i*a,l=i*o-r*c,f=r*a-u*o,h=Math.sqrt(s*s+l*l+f*f),g=r*o+u*a+i*c,p=h&&-J(g)/h,v=Math.atan2(h,g);Mc+=p*s,_c+=p*l,bc+=p*f,gc+=v,mc+=v*(r+(r=o)),yc+=v*(u+(u=a)),xc+=v*(i+(i=c)),xe(r,u,i)}var 
t,e,r,u,i;wc.point=function(o,a){t=o,e=a,wc.point=n,o*=Aa;var c=Math.cos(a*=Aa);r=c*Math.cos(o),u=c*Math.sin(o),i=Math.sin(a),xe(r,u,i)},wc.lineEnd=function(){n(t,e),wc.lineEnd=_e,wc.point=ye}}function we(){return!0}function Se(n,t,e,r,u){var i=[],o=[];if(n.forEach(function(n){if(!((t=n.length-1)<=0)){var t,e=n[0],r=n[t];if(me(e,r)){u.lineStart();for(var a=0;t>a;++a)u.point((e=n[a])[0],e[1]);return u.lineEnd(),void 0}var c=new Ee(e,n,null,!0),s=new Ee(e,null,c,!1);c.o=s,i.push(c),o.push(s),c=new Ee(r,n,null,!1),s=new Ee(r,null,c,!0),c.o=s,i.push(c),o.push(s)}}),o.sort(t),ke(i),ke(o),i.length){for(var a=0,c=e,s=o.length;s>a;++a)o[a].e=c=!c;for(var l,f,h=i[0];;){for(var g=h,p=!0;g.v;)if((g=g.n)===h)return;l=g.z,u.lineStart();do{if(g.v=g.o.v=!0,g.e){if(p)for(var a=0,s=l.length;s>a;++a)u.point((f=l[a])[0],f[1]);else r(g.x,g.n.x,1,u);g=g.n}else{if(p){l=g.p.z;for(var a=l.length-1;a>=0;--a)u.point((f=l[a])[0],f[1])}else r(g.x,g.p.x,-1,u);g=g.p}g=g.o,l=g.z,p=!p}while(!g.v);u.lineEnd()}}}function ke(n){if(t=n.length){for(var t,e,r=0,u=n[0];++r<t;)u.n=e=n[r],e.p=u,u=e;u.n=e=n[0],e.p=u}}function Ee(n,t,e,r){this.x=n,this.z=t,this.o=e,this.e=r,this.v=!1,this.n=this.p=null}function Ae(n,t,e,r){return function(u,i){function o(t,e){var r=u(t,e);n(t=r[0],e=r[1])&&i.point(t,e)}function a(n,t){var e=u(n,t);d.point(e[0],e[1])}function c(){y.point=a,d.lineStart()}function s(){y.point=o,d.lineEnd()}function l(n,t){v.push([n,t]);var e=u(n,t);M.point(e[0],e[1])}function f(){M.lineStart(),v=[]}function h(){l(v[0][0],v[0][1]),M.lineEnd();var n,t=M.clean(),e=x.buffer(),r=e.length;if(v.pop(),p.push(v),v=null,r)if(1&t){n=e[0];var u,r=n.length-1,o=-1;if(r>0){for(_||(i.polygonStart(),_=!0),i.lineStart();++o<r;)i.point((u=n[o])[0],u[1]);i.lineEnd()}}else r>1&&2&t&&e.push(e.pop().concat(e.shift())),g.push(e.filter(Ce))}var g,p,v,d=t(i),m=u.invert(r[0],r[1]),y={point:o,lineStart:c,lineEnd:s,polygonStart:function(){y.point=l,y.lineStart=f,y.lineEnd=h,g=[],p=[]},polygonEnd:function(){y.point=o,y.lineStart=c,y.lineEnd=s,g=Zo.merge(g);var n=Le(m,p);g.length?(_||(i.polygonStart(),_=!0),Se(g,ze,n,e,i)):n&&(_||(i.polygonStart(),_=!0),i.lineStart(),e(null,null,1,i),i.lineEnd()),_&&(i.polygonEnd(),_=!1),g=p=null},sphere:function(){i.polygonStart(),i.lineStart(),e(null,null,1,i),i.lineEnd(),i.polygonEnd()}},x=Ne(),M=t(x),_=!1;return y}}function Ce(n){return n.length>1}function Ne(){var n,t=[];return{lineStart:function(){t.push(n=[])},point:function(t,e){n.push([t,e])},lineEnd:v,buffer:function(){var e=t;return t=[],n=null,e},rejoin:function(){t.length>1&&t.push(t.pop().concat(t.shift()))}}}function ze(n,t){return((n=n.x)[0]<0?n[1]-Sa-ka:Sa-n[1])-((t=t.x)[0]<0?t[1]-Sa-ka:Sa-t[1])}function Le(n,t){var e=n[0],r=n[1],u=[Math.sin(e),-Math.cos(e),0],i=0,o=0;lc.reset();for(var a=0,c=t.length;c>a;++a){var s=t[a],l=s.length;if(l)for(var f=s[0],h=f[0],g=f[1]/2+ba/4,p=Math.sin(g),v=Math.cos(g),d=1;;){d===l&&(d=0),n=s[d];var m=n[0],y=n[1]/2+ba/4,x=Math.sin(y),M=Math.cos(y),_=m-h,b=_>=0?1:-1,w=b*_,S=w>ba,k=p*x;if(lc.add(Math.atan2(k*b*Math.sin(w),v*M+k*Math.cos(w))),i+=S?_+b*wa:_,S^h>=e^m>=e){var E=he(le(f),le(n));ve(E);var A=he(u,E);ve(A);var C=(S^_>=0?-1:1)*G(A[2]);(r>C||r===C&&(E[0]||E[1]))&&(o+=S^_>=0?1:-1)}if(!d++)break;h=m,p=x,v=M,f=n}}return(-ka>i||ka>i&&0>lc)^1&o}function Te(n){var t,e=0/0,r=0/0,u=0/0;return{lineStart:function(){n.lineStart(),t=1},point:function(i,o){var 
a=i>0?ba:-ba,c=ua(i-e);ua(c-ba)<ka?(n.point(e,r=(r+o)/2>0?Sa:-Sa),n.point(u,r),n.lineEnd(),n.lineStart(),n.point(a,r),n.point(i,r),t=0):u!==a&&c>=ba&&(ua(e-u)<ka&&(e-=u*ka),ua(i-a)<ka&&(i-=a*ka),r=qe(e,r,i,o),n.point(u,r),n.lineEnd(),n.lineStart(),n.point(a,r),t=0),n.point(e=i,r=o),u=a},lineEnd:function(){n.lineEnd(),e=r=0/0},clean:function(){return 2-t}}}function qe(n,t,e,r){var u,i,o=Math.sin(n-e);return ua(o)>ka?Math.atan((Math.sin(t)*(i=Math.cos(r))*Math.sin(e)-Math.sin(r)*(u=Math.cos(t))*Math.sin(n))/(u*i*o)):(t+r)/2}function Re(n,t,e,r){var u;if(null==n)u=e*Sa,r.point(-ba,u),r.point(0,u),r.point(ba,u),r.point(ba,0),r.point(ba,-u),r.point(0,-u),r.point(-ba,-u),r.point(-ba,0),r.point(-ba,u);else if(ua(n[0]-t[0])>ka){var i=n[0]<t[0]?ba:-ba;u=e*i/2,r.point(-i,u),r.point(0,u),r.point(i,u)}else r.point(t[0],t[1])}function De(n){function t(n,t){return Math.cos(n)*Math.cos(t)>i}function e(n){var e,i,c,s,l;return{lineStart:function(){s=c=!1,l=1},point:function(f,h){var g,p=[f,h],v=t(f,h),d=o?v?0:u(f,h):v?u(f+(0>f?ba:-ba),h):0;if(!e&&(s=c=v)&&n.lineStart(),v!==c&&(g=r(e,p),(me(e,g)||me(p,g))&&(p[0]+=ka,p[1]+=ka,v=t(p[0],p[1]))),v!==c)l=0,v?(n.lineStart(),g=r(p,e),n.point(g[0],g[1])):(g=r(e,p),n.point(g[0],g[1]),n.lineEnd()),e=g;else if(a&&e&&o^v){var m;d&i||!(m=r(p,e,!0))||(l=0,o?(n.lineStart(),n.point(m[0][0],m[0][1]),n.point(m[1][0],m[1][1]),n.lineEnd()):(n.point(m[1][0],m[1][1]),n.lineEnd(),n.lineStart(),n.point(m[0][0],m[0][1])))}!v||e&&me(e,p)||n.point(p[0],p[1]),e=p,c=v,i=d},lineEnd:function(){c&&n.lineEnd(),e=null},clean:function(){return l|(s&&c)<<1}}}function r(n,t,e){var r=le(n),u=le(t),o=[1,0,0],a=he(r,u),c=fe(a,a),s=a[0],l=c-s*s;if(!l)return!e&&n;var f=i*c/l,h=-i*s/l,g=he(o,a),p=pe(o,f),v=pe(a,h);ge(p,v);var d=g,m=fe(p,d),y=fe(d,d),x=m*m-y*(fe(p,p)-1);if(!(0>x)){var M=Math.sqrt(x),_=pe(d,(-m-M)/y);if(ge(_,p),_=de(_),!e)return _;var b,w=n[0],S=t[0],k=n[1],E=t[1];w>S&&(b=w,w=S,S=b);var A=S-w,C=ua(A-ba)<ka,N=C||ka>A;if(!C&&k>E&&(b=k,k=E,E=b),N?C?k+E>0^_[1]<(ua(_[0]-w)<ka?k:E):k<=_[1]&&_[1]<=E:A>ba^(w<=_[0]&&_[0]<=S)){var z=pe(d,(-m+M)/y);return ge(z,p),[_,de(z)]}}}function u(t,e){var r=o?n:ba-n,u=0;return-r>t?u|=1:t>r&&(u|=2),-r>e?u|=4:e>r&&(u|=8),u}var i=Math.cos(n),o=i>0,a=ua(i)>ka,c=sr(n,6*Aa);return Ae(t,e,c,o?[0,-n]:[-ba,n-ba])}function Pe(n,t,e,r){return function(u){var i,o=u.a,a=u.b,c=o.x,s=o.y,l=a.x,f=a.y,h=0,g=1,p=l-c,v=f-s;if(i=n-c,p||!(i>0)){if(i/=p,0>p){if(h>i)return;g>i&&(g=i)}else if(p>0){if(i>g)return;i>h&&(h=i)}if(i=e-c,p||!(0>i)){if(i/=p,0>p){if(i>g)return;i>h&&(h=i)}else if(p>0){if(h>i)return;g>i&&(g=i)}if(i=t-s,v||!(i>0)){if(i/=v,0>v){if(h>i)return;g>i&&(g=i)}else if(v>0){if(i>g)return;i>h&&(h=i)}if(i=r-s,v||!(0>i)){if(i/=v,0>v){if(i>g)return;i>h&&(h=i)}else if(v>0){if(h>i)return;g>i&&(g=i)}return h>0&&(u.a={x:c+h*p,y:s+h*v}),1>g&&(u.b={x:c+g*p,y:s+g*v}),u}}}}}}function Ue(n,t,e,r){function u(r,u){return ua(r[0]-n)<ka?u>0?0:3:ua(r[0]-e)<ka?u>0?2:1:ua(r[1]-t)<ka?u>0?1:0:u>0?3:2}function i(n,t){return o(n.x,t.x)}function o(n,t){var e=u(n,1),r=u(t,1);return e!==r?e-r:0===e?t[1]-n[1]:1===e?n[0]-t[0]:2===e?n[1]-t[1]:t[0]-n[0]}return function(a){function c(n){for(var t=0,e=d.length,r=n[1],u=0;e>u;++u)for(var i,o=1,a=d[u],c=a.length,s=a[0];c>o;++o)i=a[o],s[1]<=r?i[1]>r&&W(s,i,n)>0&&++t:i[1]<=r&&W(s,i,n)<0&&--t,s=i;return 0!==t}function s(i,a,c,s){var l=0,f=0;if(null==i||(l=u(i,c))!==(f=u(a,c))||o(i,a)<0^c>0){do s.point(0===l||3===l?n:e,l>1?r:t);while((l=(l+c+4)%4)!==f)}else s.point(a[0],a[1])}function l(u,i){return u>=n&&e>=u&&i>=t&&r>=i}function 
f(n,t){l(n,t)&&a.point(n,t)}function h(){N.point=p,d&&d.push(m=[]),S=!0,w=!1,_=b=0/0}function g(){v&&(p(y,x),M&&w&&A.rejoin(),v.push(A.buffer())),N.point=f,w&&a.lineEnd()}function p(n,t){n=Math.max(-kc,Math.min(kc,n)),t=Math.max(-kc,Math.min(kc,t));var e=l(n,t);if(d&&m.push([n,t]),S)y=n,x=t,M=e,S=!1,e&&(a.lineStart(),a.point(n,t));else if(e&&w)a.point(n,t);else{var r={a:{x:_,y:b},b:{x:n,y:t}};C(r)?(w||(a.lineStart(),a.point(r.a.x,r.a.y)),a.point(r.b.x,r.b.y),e||a.lineEnd(),k=!1):e&&(a.lineStart(),a.point(n,t),k=!1)}_=n,b=t,w=e}var v,d,m,y,x,M,_,b,w,S,k,E=a,A=Ne(),C=Pe(n,t,e,r),N={point:f,lineStart:h,lineEnd:g,polygonStart:function(){a=A,v=[],d=[],k=!0},polygonEnd:function(){a=E,v=Zo.merge(v);var t=c([n,r]),e=k&&t,u=v.length;(e||u)&&(a.polygonStart(),e&&(a.lineStart(),s(null,null,1,a),a.lineEnd()),u&&Se(v,i,t,s,a),a.polygonEnd()),v=d=m=null}};return N}}function je(n,t){function e(e,r){return e=n(e,r),t(e[0],e[1])}return n.invert&&t.invert&&(e.invert=function(e,r){return e=t.invert(e,r),e&&n.invert(e[0],e[1])}),e}function He(n){var t=0,e=ba/3,r=tr(n),u=r(t,e);return u.parallels=function(n){return arguments.length?r(t=n[0]*ba/180,e=n[1]*ba/180):[180*(t/ba),180*(e/ba)]},u}function Fe(n,t){function e(n,t){var e=Math.sqrt(i-2*u*Math.sin(t))/u;return[e*Math.sin(n*=u),o-e*Math.cos(n)]}var r=Math.sin(n),u=(r+Math.sin(t))/2,i=1+r*(2*u-r),o=Math.sqrt(i)/u;return e.invert=function(n,t){var e=o-t;return[Math.atan2(n,e)/u,G((i-(n*n+e*e)*u*u)/(2*u))]},e}function Oe(){function n(n,t){Ac+=u*n-r*t,r=n,u=t}var t,e,r,u;Tc.point=function(i,o){Tc.point=n,t=r=i,e=u=o},Tc.lineEnd=function(){n(t,e)}}function Ye(n,t){Cc>n&&(Cc=n),n>zc&&(zc=n),Nc>t&&(Nc=t),t>Lc&&(Lc=t)}function Ie(){function n(n,t){o.push("M",n,",",t,i)}function t(n,t){o.push("M",n,",",t),a.point=e}function e(n,t){o.push("L",n,",",t)}function r(){a.point=n}function u(){o.push("Z")}var i=Ze(4.5),o=[],a={point:n,lineStart:function(){a.point=t},lineEnd:r,polygonStart:function(){a.lineEnd=u},polygonEnd:function(){a.lineEnd=r,a.point=n},pointRadius:function(n){return i=Ze(n),a},result:function(){if(o.length){var n=o.join("");return o=[],n}}};return a}function Ze(n){return"m0,"+n+"a"+n+","+n+" 0 1,1 0,"+-2*n+"a"+n+","+n+" 0 1,1 0,"+2*n+"z"}function Ve(n,t){pc+=n,vc+=t,++dc}function Xe(){function n(n,r){var u=n-t,i=r-e,o=Math.sqrt(u*u+i*i);mc+=o*(t+n)/2,yc+=o*(e+r)/2,xc+=o,Ve(t=n,e=r)}var t,e;Rc.point=function(r,u){Rc.point=n,Ve(t=r,e=u)}}function $e(){Rc.point=Ve}function Be(){function n(n,t){var e=n-r,i=t-u,o=Math.sqrt(e*e+i*i);mc+=o*(r+n)/2,yc+=o*(u+t)/2,xc+=o,o=u*n-r*t,Mc+=o*(r+n),_c+=o*(u+t),bc+=3*o,Ve(r=n,u=t)}var t,e,r,u;Rc.point=function(i,o){Rc.point=n,Ve(t=r=i,e=u=o)},Rc.lineEnd=function(){n(t,e)}}function We(n){function t(t,e){n.moveTo(t,e),n.arc(t,e,o,0,wa)}function e(t,e){n.moveTo(t,e),a.point=r}function r(t,e){n.lineTo(t,e)}function u(){a.point=t}function i(){n.closePath()}var o=4.5,a={point:t,lineStart:function(){a.point=e},lineEnd:u,polygonStart:function(){a.lineEnd=i},polygonEnd:function(){a.lineEnd=u,a.point=t},pointRadius:function(n){return o=n,a},result:v};return a}function Je(n){function t(n){return(a?r:e)(n)}function e(t){return Qe(t,function(e,r){e=n(e,r),t.point(e[0],e[1])})}function r(t){function e(e,r){e=n(e,r),t.point(e[0],e[1])}function r(){x=0/0,S.point=i,t.lineStart()}function i(e,r){var i=le([e,r]),o=n(e,r);u(x,M,y,_,b,w,x=o[0],M=o[1],y=e,_=i[0],b=i[1],w=i[2],a,t),t.point(x,M)}function o(){S.point=e,t.lineEnd()}function c(){r(),S.point=s,S.lineEnd=l}function s(n,t){i(f=n,h=t),g=x,p=M,v=_,d=b,m=w,S.point=i}function 
l(){u(x,M,y,_,b,w,g,p,f,v,d,m,a,t),S.lineEnd=o,o()}var f,h,g,p,v,d,m,y,x,M,_,b,w,S={point:e,lineStart:r,lineEnd:o,polygonStart:function(){t.polygonStart(),S.lineStart=c},polygonEnd:function(){t.polygonEnd(),S.lineStart=r}};return S}function u(t,e,r,a,c,s,l,f,h,g,p,v,d,m){var y=l-t,x=f-e,M=y*y+x*x;if(M>4*i&&d--){var _=a+g,b=c+p,w=s+v,S=Math.sqrt(_*_+b*b+w*w),k=Math.asin(w/=S),E=ua(ua(w)-1)<ka||ua(r-h)<ka?(r+h)/2:Math.atan2(b,_),A=n(E,k),C=A[0],N=A[1],z=C-t,L=N-e,T=x*z-y*L;(T*T/M>i||ua((y*z+x*L)/M-.5)>.3||o>a*g+c*p+s*v)&&(u(t,e,r,a,c,s,C,N,E,_/=S,b/=S,w,d,m),m.point(C,N),u(C,N,E,_,b,w,l,f,h,g,p,v,d,m))}}var i=.5,o=Math.cos(30*Aa),a=16;
+return t.precision=function(n){return arguments.length?(a=(i=n*n)>0&&16,t):Math.sqrt(i)},t}function Ge(n){var t=Je(function(t,e){return n([t*Ca,e*Ca])});return function(n){return er(t(n))}}function Ke(n){this.stream=n}function Qe(n,t){return{point:t,sphere:function(){n.sphere()},lineStart:function(){n.lineStart()},lineEnd:function(){n.lineEnd()},polygonStart:function(){n.polygonStart()},polygonEnd:function(){n.polygonEnd()}}}function nr(n){return tr(function(){return n})()}function tr(n){function t(n){return n=a(n[0]*Aa,n[1]*Aa),[n[0]*h+c,s-n[1]*h]}function e(n){return n=a.invert((n[0]-c)/h,(s-n[1])/h),n&&[n[0]*Ca,n[1]*Ca]}function r(){a=je(o=ir(m,y,x),i);var n=i(v,d);return c=g-n[0]*h,s=p+n[1]*h,u()}function u(){return l&&(l.valid=!1,l=null),t}var i,o,a,c,s,l,f=Je(function(n,t){return n=i(n,t),[n[0]*h+c,s-n[1]*h]}),h=150,g=480,p=250,v=0,d=0,m=0,y=0,x=0,M=Sc,_=wt,b=null,w=null;return t.stream=function(n){return l&&(l.valid=!1),l=er(M(o,f(_(n)))),l.valid=!0,l},t.clipAngle=function(n){return arguments.length?(M=null==n?(b=n,Sc):De((b=+n)*Aa),u()):b},t.clipExtent=function(n){return arguments.length?(w=n,_=n?Ue(n[0][0],n[0][1],n[1][0],n[1][1]):wt,u()):w},t.scale=function(n){return arguments.length?(h=+n,r()):h},t.translate=function(n){return arguments.length?(g=+n[0],p=+n[1],r()):[g,p]},t.center=function(n){return arguments.length?(v=n[0]%360*Aa,d=n[1]%360*Aa,r()):[v*Ca,d*Ca]},t.rotate=function(n){return arguments.length?(m=n[0]%360*Aa,y=n[1]%360*Aa,x=n.length>2?n[2]%360*Aa:0,r()):[m*Ca,y*Ca,x*Ca]},Zo.rebind(t,f,"precision"),function(){return i=n.apply(this,arguments),t.invert=i.invert&&e,r()}}function er(n){return Qe(n,function(t,e){n.point(t*Aa,e*Aa)})}function rr(n,t){return[n,t]}function ur(n,t){return[n>ba?n-wa:-ba>n?n+wa:n,t]}function ir(n,t,e){return n?t||e?je(ar(n),cr(t,e)):ar(n):t||e?cr(t,e):ur}function or(n){return function(t,e){return t+=n,[t>ba?t-wa:-ba>t?t+wa:t,e]}}function ar(n){var t=or(n);return t.invert=or(-n),t}function cr(n,t){function e(n,t){var e=Math.cos(t),a=Math.cos(n)*e,c=Math.sin(n)*e,s=Math.sin(t),l=s*r+a*u;return[Math.atan2(c*i-l*o,a*r-s*u),G(l*i+c*o)]}var r=Math.cos(n),u=Math.sin(n),i=Math.cos(t),o=Math.sin(t);return e.invert=function(n,t){var e=Math.cos(t),a=Math.cos(n)*e,c=Math.sin(n)*e,s=Math.sin(t),l=s*i-c*o;return[Math.atan2(c*i+s*o,a*r+l*u),G(l*r-a*u)]},e}function sr(n,t){var e=Math.cos(n),r=Math.sin(n);return function(u,i,o,a){var c=o*t;null!=u?(u=lr(e,u),i=lr(e,i),(o>0?i>u:u>i)&&(u+=o*wa)):(u=n+o*wa,i=n-.5*c);for(var s,l=u;o>0?l>i:i>l;l-=c)a.point((s=de([e,-r*Math.cos(l),-r*Math.sin(l)]))[0],s[1])}}function lr(n,t){var e=le(t);e[0]-=n,ve(e);var r=J(-e[1]);return((-e[2]<0?-r:r)+2*Math.PI-ka)%(2*Math.PI)}function fr(n,t,e){var r=Zo.range(n,t-ka,e).concat(t);return function(n){return r.map(function(t){return[n,t]})}}function hr(n,t,e){var r=Zo.range(n,t-ka,e).concat(t);return function(n){return r.map(function(t){return[t,n]})}}function gr(n){return n.source}function pr(n){return n.target}function vr(n,t,e,r){var u=Math.cos(t),i=Math.sin(t),o=Math.cos(r),a=Math.sin(r),c=u*Math.cos(n),s=u*Math.sin(n),l=o*Math.cos(e),f=o*Math.sin(e),h=2*Math.asin(Math.sqrt(tt(r-t)+u*o*tt(e-n))),g=1/Math.sin(h),p=h?function(n){var t=Math.sin(n*=h)*g,e=Math.sin(h-n)*g,r=e*c+t*l,u=e*s+t*f,o=e*i+t*a;return[Math.atan2(u,r)*Ca,Math.atan2(o,Math.sqrt(r*r+u*u))*Ca]}:function(){return[n*Ca,t*Ca]};return p.distance=h,p}function dr(){function n(n,u){var 
i=Math.sin(u*=Aa),o=Math.cos(u),a=ua((n*=Aa)-t),c=Math.cos(a);Dc+=Math.atan2(Math.sqrt((a=o*Math.sin(a))*a+(a=r*i-e*o*c)*a),e*i+r*o*c),t=n,e=i,r=o}var t,e,r;Pc.point=function(u,i){t=u*Aa,e=Math.sin(i*=Aa),r=Math.cos(i),Pc.point=n},Pc.lineEnd=function(){Pc.point=Pc.lineEnd=v}}function mr(n,t){function e(t,e){var r=Math.cos(t),u=Math.cos(e),i=n(r*u);return[i*u*Math.sin(t),i*Math.sin(e)]}return e.invert=function(n,e){var r=Math.sqrt(n*n+e*e),u=t(r),i=Math.sin(u),o=Math.cos(u);return[Math.atan2(n*i,r*o),Math.asin(r&&e*i/r)]},e}function yr(n,t){function e(n,t){o>0?-Sa+ka>t&&(t=-Sa+ka):t>Sa-ka&&(t=Sa-ka);var e=o/Math.pow(u(t),i);return[e*Math.sin(i*n),o-e*Math.cos(i*n)]}var r=Math.cos(n),u=function(n){return Math.tan(ba/4+n/2)},i=n===t?Math.sin(n):Math.log(r/Math.cos(t))/Math.log(u(t)/u(n)),o=r*Math.pow(u(n),i)/i;return i?(e.invert=function(n,t){var e=o-t,r=B(i)*Math.sqrt(n*n+e*e);return[Math.atan2(n,e)/i,2*Math.atan(Math.pow(o/r,1/i))-Sa]},e):Mr}function xr(n,t){function e(n,t){var e=i-t;return[e*Math.sin(u*n),i-e*Math.cos(u*n)]}var r=Math.cos(n),u=n===t?Math.sin(n):(r-Math.cos(t))/(t-n),i=r/u+n;return ua(u)<ka?rr:(e.invert=function(n,t){var e=i-t;return[Math.atan2(n,e)/u,i-B(u)*Math.sqrt(n*n+e*e)]},e)}function Mr(n,t){return[n,Math.log(Math.tan(ba/4+t/2))]}function _r(n){var t,e=nr(n),r=e.scale,u=e.translate,i=e.clipExtent;return e.scale=function(){var n=r.apply(e,arguments);return n===e?t?e.clipExtent(null):e:n},e.translate=function(){var n=u.apply(e,arguments);return n===e?t?e.clipExtent(null):e:n},e.clipExtent=function(n){var o=i.apply(e,arguments);if(o===e){if(t=null==n){var a=ba*r(),c=u();i([[c[0]-a,c[1]-a],[c[0]+a,c[1]+a]])}}else t&&(o=null);return o},e.clipExtent(null)}function br(n,t){return[Math.log(Math.tan(ba/4+t/2)),-n]}function wr(n){return n[0]}function Sr(n){return n[1]}function kr(n){for(var t=n.length,e=[0,1],r=2,u=2;t>u;u++){for(;r>1&&W(n[e[r-2]],n[e[r-1]],n[u])<=0;)--r;e[r++]=u}return e.slice(0,r)}function Er(n,t){return n[0]-t[0]||n[1]-t[1]}function Ar(n,t,e){return(e[0]-t[0])*(n[1]-t[1])<(e[1]-t[1])*(n[0]-t[0])}function Cr(n,t,e,r){var u=n[0],i=e[0],o=t[0]-u,a=r[0]-i,c=n[1],s=e[1],l=t[1]-c,f=r[1]-s,h=(a*(c-s)-f*(u-i))/(f*o-a*l);return[u+h*o,c+h*l]}function Nr(n){var t=n[0],e=n[n.length-1];return!(t[0]-e[0]||t[1]-e[1])}function zr(){Gr(this),this.edge=this.site=this.circle=null}function Lr(n){var t=Bc.pop()||new zr;return t.site=n,t}function Tr(n){Yr(n),Vc.remove(n),Bc.push(n),Gr(n)}function qr(n){var t=n.circle,e=t.x,r=t.cy,u={x:e,y:r},i=n.P,o=n.N,a=[n];Tr(n);for(var c=i;c.circle&&ua(e-c.circle.x)<ka&&ua(r-c.circle.cy)<ka;)i=c.P,a.unshift(c),Tr(c),c=i;a.unshift(c),Yr(c);for(var s=o;s.circle&&ua(e-s.circle.x)<ka&&ua(r-s.circle.cy)<ka;)o=s.N,a.push(s),Tr(s),s=o;a.push(s),Yr(s);var l,f=a.length;for(l=1;f>l;++l)s=a[l],c=a[l-1],Br(s.edge,c.site,s.site,u);c=a[0],s=a[f-1],s.edge=Xr(c.site,s.site,null,u),Or(c),Or(s)}function Rr(n){for(var t,e,r,u,i=n.x,o=n.y,a=Vc._;a;)if(r=Dr(a,o)-i,r>ka)a=a.L;else{if(u=i-Pr(a,o),!(u>ka)){r>-ka?(t=a.P,e=a):u>-ka?(t=a,e=a.N):t=e=a;break}if(!a.R){t=a;break}a=a.R}var c=Lr(n);if(Vc.insert(t,c),t||e){if(t===e)return Yr(t),e=Lr(t.site),Vc.insert(c,e),c.edge=e.edge=Xr(t.site,c.site),Or(t),Or(e),void 0;if(!e)return c.edge=Xr(t.site,c.site),void 0;Yr(t),Yr(e);var s=t.site,l=s.x,f=s.y,h=n.x-l,g=n.y-f,p=e.site,v=p.x-l,d=p.y-f,m=2*(h*d-g*v),y=h*h+g*g,x=v*v+d*d,M={x:(d*y-g*x)/m+l,y:(h*x-v*y)/m+f};Br(e.edge,s,p,M),c.edge=Xr(s,n,null,M),e.edge=Xr(n,p,null,M),Or(t),Or(e)}}function Dr(n,t){var e=n.site,r=e.x,u=e.y,i=u-t;if(!i)return r;var 
o=n.P;if(!o)return-1/0;e=o.site;var a=e.x,c=e.y,s=c-t;if(!s)return a;var l=a-r,f=1/i-1/s,h=l/s;return f?(-h+Math.sqrt(h*h-2*f*(l*l/(-2*s)-c+s/2+u-i/2)))/f+r:(r+a)/2}function Pr(n,t){var e=n.N;if(e)return Dr(e,t);var r=n.site;return r.y===t?r.x:1/0}function Ur(n){this.site=n,this.edges=[]}function jr(n){for(var t,e,r,u,i,o,a,c,s,l,f=n[0][0],h=n[1][0],g=n[0][1],p=n[1][1],v=Zc,d=v.length;d--;)if(i=v[d],i&&i.prepare())for(a=i.edges,c=a.length,o=0;c>o;)l=a[o].end(),r=l.x,u=l.y,s=a[++o%c].start(),t=s.x,e=s.y,(ua(r-t)>ka||ua(u-e)>ka)&&(a.splice(o,0,new Wr($r(i.site,l,ua(r-f)<ka&&p-u>ka?{x:f,y:ua(t-f)<ka?e:p}:ua(u-p)<ka&&h-r>ka?{x:ua(e-p)<ka?t:h,y:p}:ua(r-h)<ka&&u-g>ka?{x:h,y:ua(t-h)<ka?e:g}:ua(u-g)<ka&&r-f>ka?{x:ua(e-g)<ka?t:f,y:g}:null),i.site,null)),++c)}function Hr(n,t){return t.angle-n.angle}function Fr(){Gr(this),this.x=this.y=this.arc=this.site=this.cy=null}function Or(n){var t=n.P,e=n.N;if(t&&e){var r=t.site,u=n.site,i=e.site;if(r!==i){var o=u.x,a=u.y,c=r.x-o,s=r.y-a,l=i.x-o,f=i.y-a,h=2*(c*f-s*l);if(!(h>=-Ea)){var g=c*c+s*s,p=l*l+f*f,v=(f*g-s*p)/h,d=(c*p-l*g)/h,f=d+a,m=Wc.pop()||new Fr;m.arc=n,m.site=u,m.x=v+o,m.y=f+Math.sqrt(v*v+d*d),m.cy=f,n.circle=m;for(var y=null,x=$c._;x;)if(m.y<x.y||m.y===x.y&&m.x<=x.x){if(!x.L){y=x.P;break}x=x.L}else{if(!x.R){y=x;break}x=x.R}$c.insert(y,m),y||(Xc=m)}}}}function Yr(n){var t=n.circle;t&&(t.P||(Xc=t.N),$c.remove(t),Wc.push(t),Gr(t),n.circle=null)}function Ir(n){for(var t,e=Ic,r=Pe(n[0][0],n[0][1],n[1][0],n[1][1]),u=e.length;u--;)t=e[u],(!Zr(t,n)||!r(t)||ua(t.a.x-t.b.x)<ka&&ua(t.a.y-t.b.y)<ka)&&(t.a=t.b=null,e.splice(u,1))}function Zr(n,t){var e=n.b;if(e)return!0;var r,u,i=n.a,o=t[0][0],a=t[1][0],c=t[0][1],s=t[1][1],l=n.l,f=n.r,h=l.x,g=l.y,p=f.x,v=f.y,d=(h+p)/2,m=(g+v)/2;if(v===g){if(o>d||d>=a)return;if(h>p){if(i){if(i.y>=s)return}else i={x:d,y:c};e={x:d,y:s}}else{if(i){if(i.y<c)return}else i={x:d,y:s};e={x:d,y:c}}}else if(r=(h-p)/(v-g),u=m-r*d,-1>r||r>1)if(h>p){if(i){if(i.y>=s)return}else i={x:(c-u)/r,y:c};e={x:(s-u)/r,y:s}}else{if(i){if(i.y<c)return}else i={x:(s-u)/r,y:s};e={x:(c-u)/r,y:c}}else if(v>g){if(i){if(i.x>=a)return}else i={x:o,y:r*o+u};e={x:a,y:r*a+u}}else{if(i){if(i.x<o)return}else i={x:a,y:r*a+u};e={x:o,y:r*o+u}}return n.a=i,n.b=e,!0}function Vr(n,t){this.l=n,this.r=t,this.a=this.b=null}function Xr(n,t,e,r){var u=new Vr(n,t);return Ic.push(u),e&&Br(u,n,t,e),r&&Br(u,t,n,r),Zc[n.i].edges.push(new Wr(u,n,t)),Zc[t.i].edges.push(new Wr(u,t,n)),u}function $r(n,t,e){var r=new Vr(n,null);return r.a=t,r.b=e,Ic.push(r),r}function Br(n,t,e,r){n.a||n.b?n.l===e?n.b=r:n.a=r:(n.a=r,n.l=t,n.r=e)}function Wr(n,t,e){var r=n.a,u=n.b;this.edge=n,this.site=t,this.angle=e?Math.atan2(e.y-t.y,e.x-t.x):n.l===t?Math.atan2(u.x-r.x,r.y-u.y):Math.atan2(r.x-u.x,u.y-r.y)}function Jr(){this._=null}function Gr(n){n.U=n.C=n.L=n.R=n.P=n.N=null}function Kr(n,t){var e=t,r=t.R,u=e.U;u?u.L===e?u.L=r:u.R=r:n._=r,r.U=u,e.U=r,e.R=r.L,e.R&&(e.R.U=e),r.L=e}function Qr(n,t){var e=t,r=t.L,u=e.U;u?u.L===e?u.L=r:u.R=r:n._=r,r.U=u,e.U=r,e.L=r.R,e.L&&(e.L.U=e),r.R=e}function nu(n){for(;n.L;)n=n.L;return n}function tu(n,t){var e,r,u,i=n.sort(eu).pop();for(Ic=[],Zc=new Array(n.length),Vc=new Jr,$c=new Jr;;)if(u=Xc,i&&(!u||i.y<u.y||i.y===u.y&&i.x<u.x))(i.x!==e||i.y!==r)&&(Zc[i.i]=new Ur(i),Rr(i),e=i.x,r=i.y),i=n.pop();else{if(!u)break;qr(u.arc)}t&&(Ir(t),jr(t));var o={cells:Zc,edges:Ic};return Vc=$c=Ic=Zc=null,o}function eu(n,t){return t.y-n.y||t.x-n.x}function ru(n,t,e){return(n.x-e.x)*(t.y-n.y)-(n.x-t.x)*(e.y-n.y)}function uu(n){return n.x}function iu(n){return n.y}function 
ou(){return{leaf:!0,nodes:[],point:null,x:null,y:null}}function au(n,t,e,r,u,i){if(!n(t,e,r,u,i)){var o=.5*(e+u),a=.5*(r+i),c=t.nodes;c[0]&&au(n,c[0],e,r,o,a),c[1]&&au(n,c[1],o,r,u,a),c[2]&&au(n,c[2],e,a,o,i),c[3]&&au(n,c[3],o,a,u,i)}}function cu(n,t){n=Zo.rgb(n),t=Zo.rgb(t);var e=n.r,r=n.g,u=n.b,i=t.r-e,o=t.g-r,a=t.b-u;return function(n){return"#"+dt(Math.round(e+i*n))+dt(Math.round(r+o*n))+dt(Math.round(u+a*n))}}function su(n,t){var e,r={},u={};for(e in n)e in t?r[e]=hu(n[e],t[e]):u[e]=n[e];for(e in t)e in n||(u[e]=t[e]);return function(n){for(e in r)u[e]=r[e](n);return u}}function lu(n,t){return t-=n=+n,function(e){return n+t*e}}function fu(n,t){var e,r,u,i=Gc.lastIndex=Kc.lastIndex=0,o=-1,a=[],c=[];for(n+="",t+="";(e=Gc.exec(n))&&(r=Kc.exec(t));)(u=r.index)>i&&(u=t.substring(i,u),a[o]?a[o]+=u:a[++o]=u),(e=e[0])===(r=r[0])?a[o]?a[o]+=r:a[++o]=r:(a[++o]=null,c.push({i:o,x:lu(e,r)})),i=Kc.lastIndex;return i<t.length&&(u=t.substring(i),a[o]?a[o]+=u:a[++o]=u),a.length<2?c[0]?(t=c[0].x,function(n){return t(n)+""}):function(){return t}:(t=c.length,function(n){for(var e,r=0;t>r;++r)a[(e=c[r]).i]=e.x(n);return a.join("")})}function hu(n,t){for(var e,r=Zo.interpolators.length;--r>=0&&!(e=Zo.interpolators[r](n,t)););return e}function gu(n,t){var e,r=[],u=[],i=n.length,o=t.length,a=Math.min(n.length,t.length);for(e=0;a>e;++e)r.push(hu(n[e],t[e]));for(;i>e;++e)u[e]=n[e];for(;o>e;++e)u[e]=t[e];return function(n){for(e=0;a>e;++e)u[e]=r[e](n);return u}}function pu(n){return function(t){return 0>=t?0:t>=1?1:n(t)}}function vu(n){return function(t){return 1-n(1-t)}}function du(n){return function(t){return.5*(.5>t?n(2*t):2-n(2-2*t))}}function mu(n){return n*n}function yu(n){return n*n*n}function xu(n){if(0>=n)return 0;if(n>=1)return 1;var t=n*n,e=t*n;return 4*(.5>n?e:3*(n-t)+e-.75)}function Mu(n){return function(t){return Math.pow(t,n)}}function _u(n){return 1-Math.cos(n*Sa)}function bu(n){return Math.pow(2,10*(n-1))}function wu(n){return 1-Math.sqrt(1-n*n)}function Su(n,t){var e;return arguments.length<2&&(t=.45),arguments.length?e=t/wa*Math.asin(1/n):(n=1,e=t/4),function(r){return 1+n*Math.pow(2,-10*r)*Math.sin((r-e)*wa/t)}}function ku(n){return n||(n=1.70158),function(t){return t*t*((n+1)*t-n)}}function Eu(n){return 1/2.75>n?7.5625*n*n:2/2.75>n?7.5625*(n-=1.5/2.75)*n+.75:2.5/2.75>n?7.5625*(n-=2.25/2.75)*n+.9375:7.5625*(n-=2.625/2.75)*n+.984375}function Au(n,t){n=Zo.hcl(n),t=Zo.hcl(t);var e=n.h,r=n.c,u=n.l,i=t.h-e,o=t.c-r,a=t.l-u;return isNaN(o)&&(o=0,r=isNaN(r)?t.c:r),isNaN(i)?(i=0,e=isNaN(e)?t.h:e):i>180?i-=360:-180>i&&(i+=360),function(n){return ot(e+i*n,r+o*n,u+a*n)+""}}function Cu(n,t){n=Zo.hsl(n),t=Zo.hsl(t);var e=n.h,r=n.s,u=n.l,i=t.h-e,o=t.s-r,a=t.l-u;return isNaN(o)&&(o=0,r=isNaN(r)?t.s:r),isNaN(i)?(i=0,e=isNaN(e)?t.h:e):i>180?i-=360:-180>i&&(i+=360),function(n){return ut(e+i*n,r+o*n,u+a*n)+""}}function Nu(n,t){n=Zo.lab(n),t=Zo.lab(t);var e=n.l,r=n.a,u=n.b,i=t.l-e,o=t.a-r,a=t.b-u;return function(n){return ct(e+i*n,r+o*n,u+a*n)+""}}function zu(n,t){return t-=n,function(e){return Math.round(n+t*e)}}function Lu(n){var t=[n.a,n.b],e=[n.c,n.d],r=qu(t),u=Tu(t,e),i=qu(Ru(e,t,-u))||0;t[0]*e[1]<e[0]*t[1]&&(t[0]*=-1,t[1]*=-1,r*=-1,u*=-1),this.rotate=(r?Math.atan2(t[1],t[0]):Math.atan2(-e[0],e[1]))*Ca,this.translate=[n.e,n.f],this.scale=[r,i],this.skew=i?Math.atan2(u,i)*Ca:0}function Tu(n,t){return n[0]*t[0]+n[1]*t[1]}function qu(n){var t=Math.sqrt(Tu(n,n));return t&&(n[0]/=t,n[1]/=t),t}function Ru(n,t,e){return n[0]+=e*t[0],n[1]+=e*t[1],n}function Du(n,t){var 
e,r=[],u=[],i=Zo.transform(n),o=Zo.transform(t),a=i.translate,c=o.translate,s=i.rotate,l=o.rotate,f=i.skew,h=o.skew,g=i.scale,p=o.scale;return a[0]!=c[0]||a[1]!=c[1]?(r.push("translate(",null,",",null,")"),u.push({i:1,x:lu(a[0],c[0])},{i:3,x:lu(a[1],c[1])})):c[0]||c[1]?r.push("translate("+c+")"):r.push(""),s!=l?(s-l>180?l+=360:l-s>180&&(s+=360),u.push({i:r.push(r.pop()+"rotate(",null,")")-2,x:lu(s,l)})):l&&r.push(r.pop()+"rotate("+l+")"),f!=h?u.push({i:r.push(r.pop()+"skewX(",null,")")-2,x:lu(f,h)}):h&&r.push(r.pop()+"skewX("+h+")"),g[0]!=p[0]||g[1]!=p[1]?(e=r.push(r.pop()+"scale(",null,",",null,")"),u.push({i:e-4,x:lu(g[0],p[0])},{i:e-2,x:lu(g[1],p[1])})):(1!=p[0]||1!=p[1])&&r.push(r.pop()+"scale("+p+")"),e=u.length,function(n){for(var t,i=-1;++i<e;)r[(t=u[i]).i]=t.x(n);return r.join("")}}function Pu(n,t){return t=t-(n=+n)?1/(t-n):0,function(e){return(e-n)*t}}function Uu(n,t){return t=t-(n=+n)?1/(t-n):0,function(e){return Math.max(0,Math.min(1,(e-n)*t))}}function ju(n){for(var t=n.source,e=n.target,r=Fu(t,e),u=[t];t!==r;)t=t.parent,u.push(t);for(var i=u.length;e!==r;)u.splice(i,0,e),e=e.parent;return u}function Hu(n){for(var t=[],e=n.parent;null!=e;)t.push(n),n=e,e=e.parent;return t.push(n),t}function Fu(n,t){if(n===t)return n;for(var e=Hu(n),r=Hu(t),u=e.pop(),i=r.pop(),o=null;u===i;)o=u,u=e.pop(),i=r.pop();return o}function Ou(n){n.fixed|=2}function Yu(n){n.fixed&=-7}function Iu(n){n.fixed|=4,n.px=n.x,n.py=n.y}function Zu(n){n.fixed&=-5}function Vu(n,t,e){var r=0,u=0;if(n.charge=0,!n.leaf)for(var i,o=n.nodes,a=o.length,c=-1;++c<a;)i=o[c],null!=i&&(Vu(i,t,e),n.charge+=i.charge,r+=i.charge*i.cx,u+=i.charge*i.cy);if(n.point){n.leaf||(n.point.x+=Math.random()-.5,n.point.y+=Math.random()-.5);var s=t*e[n.point.index];n.charge+=n.pointCharge=s,r+=s*n.point.x,u+=s*n.point.y}n.cx=r/n.charge,n.cy=u/n.charge}function Xu(n,t){return Zo.rebind(n,t,"sort","children","value"),n.nodes=n,n.links=Ku,n}function $u(n,t){for(var e=[n];null!=(n=e.pop());)if(t(n),(u=n.children)&&(r=u.length))for(var r,u;--r>=0;)e.push(u[r])}function Bu(n,t){for(var e=[n],r=[];null!=(n=e.pop());)if(r.push(n),(i=n.children)&&(u=i.length))for(var u,i,o=-1;++o<u;)e.push(i[o]);for(;null!=(n=r.pop());)t(n)}function Wu(n){return n.children}function Ju(n){return n.value}function Gu(n,t){return t.value-n.value}function Ku(n){return Zo.merge(n.map(function(n){return(n.children||[]).map(function(t){return{source:n,target:t}})}))}function Qu(n){return n.x}function ni(n){return n.y}function ti(n,t,e){n.y0=t,n.y=e}function ei(n){return Zo.range(n.length)}function ri(n){for(var t=-1,e=n[0].length,r=[];++t<e;)r[t]=0;return r}function ui(n){for(var t,e=1,r=0,u=n[0][1],i=n.length;i>e;++e)(t=n[e][1])>u&&(r=e,u=t);return r}function ii(n){return n.reduce(oi,0)}function oi(n,t){return n+t[1]}function ai(n,t){return ci(n,Math.ceil(Math.log(t.length)/Math.LN2+1))}function ci(n,t){for(var e=-1,r=+n[0],u=(n[1]-r)/t,i=[];++e<=t;)i[e]=u*e+r;return i}function si(n){return[Zo.min(n),Zo.max(n)]}function li(n,t){return n.value-t.value}function fi(n,t){var e=n._pack_next;n._pack_next=t,t._pack_prev=n,t._pack_next=e,e._pack_prev=t}function hi(n,t){n._pack_next=t,t._pack_prev=n}function gi(n,t){var e=t.x-n.x,r=t.y-n.y,u=n.r+t.r;return.999*u*u>e*e+r*r}function pi(n){function t(n){l=Math.min(n.x-n.r,l),f=Math.max(n.x+n.r,f),h=Math.min(n.y-n.r,h),g=Math.max(n.y+n.r,g)}if((e=n.children)&&(s=e.length)){var 
e,r,u,i,o,a,c,s,l=1/0,f=-1/0,h=1/0,g=-1/0;if(e.forEach(vi),r=e[0],r.x=-r.r,r.y=0,t(r),s>1&&(u=e[1],u.x=u.r,u.y=0,t(u),s>2))for(i=e[2],yi(r,u,i),t(i),fi(r,i),r._pack_prev=i,fi(i,u),u=r._pack_next,o=3;s>o;o++){yi(r,u,i=e[o]);var p=0,v=1,d=1;for(a=u._pack_next;a!==u;a=a._pack_next,v++)if(gi(a,i)){p=1;break}if(1==p)for(c=r._pack_prev;c!==a._pack_prev&&!gi(c,i);c=c._pack_prev,d++);p?(d>v||v==d&&u.r<r.r?hi(r,u=a):hi(r=c,u),o--):(fi(r,i),u=i,t(i))}var m=(l+f)/2,y=(h+g)/2,x=0;for(o=0;s>o;o++)i=e[o],i.x-=m,i.y-=y,x=Math.max(x,i.r+Math.sqrt(i.x*i.x+i.y*i.y));n.r=x,e.forEach(di)}}function vi(n){n._pack_next=n._pack_prev=n}function di(n){delete n._pack_next,delete n._pack_prev}function mi(n,t,e,r){var u=n.children;if(n.x=t+=r*n.x,n.y=e+=r*n.y,n.r*=r,u)for(var i=-1,o=u.length;++i<o;)mi(u[i],t,e,r)}function yi(n,t,e){var r=n.r+e.r,u=t.x-n.x,i=t.y-n.y;if(r&&(u||i)){var o=t.r+e.r,a=u*u+i*i;o*=o,r*=r;var c=.5+(r-o)/(2*a),s=Math.sqrt(Math.max(0,2*o*(r+a)-(r-=a)*r-o*o))/(2*a);e.x=n.x+c*u+s*i,e.y=n.y+c*i-s*u}else e.x=n.x+r,e.y=n.y}function xi(n,t){return n.parent==t.parent?1:2}function Mi(n){var t=n.children;return t.length?t[0]:n.t}function _i(n){var t,e=n.children;return(t=e.length)?e[t-1]:n.t}function bi(n,t,e){var r=e/(t.i-n.i);t.c-=r,t.s+=e,n.c+=r,t.z+=e,t.m+=e}function wi(n){for(var t,e=0,r=0,u=n.children,i=u.length;--i>=0;)t=u[i],t.z+=e,t.m+=e,e+=t.s+(r+=t.c)}function Si(n,t,e){return n.a.parent===t.parent?n.a:e}function ki(n){return 1+Zo.max(n,function(n){return n.y})}function Ei(n){return n.reduce(function(n,t){return n+t.x},0)/n.length}function Ai(n){var t=n.children;return t&&t.length?Ai(t[0]):n}function Ci(n){var t,e=n.children;return e&&(t=e.length)?Ci(e[t-1]):n}function Ni(n){return{x:n.x,y:n.y,dx:n.dx,dy:n.dy}}function zi(n,t){var e=n.x+t[3],r=n.y+t[0],u=n.dx-t[1]-t[3],i=n.dy-t[0]-t[2];return 0>u&&(e+=u/2,u=0),0>i&&(r+=i/2,i=0),{x:e,y:r,dx:u,dy:i}}function Li(n){var t=n[0],e=n[n.length-1];return e>t?[t,e]:[e,t]}function Ti(n){return n.rangeExtent?n.rangeExtent():Li(n.range())}function qi(n,t,e,r){var u=e(n[0],n[1]),i=r(t[0],t[1]);return function(n){return i(u(n))}}function Ri(n,t){var e,r=0,u=n.length-1,i=n[r],o=n[u];return i>o&&(e=r,r=u,u=e,e=i,i=o,o=e),n[r]=t.floor(i),n[u]=t.ceil(o),n}function Di(n){return n?{floor:function(t){return Math.floor(t/n)*n},ceil:function(t){return Math.ceil(t/n)*n}}:ss}function Pi(n,t,e,r){var u=[],i=[],o=0,a=Math.min(n.length,t.length)-1;for(n[a]<n[0]&&(n=n.slice().reverse(),t=t.slice().reverse());++o<=a;)u.push(e(n[o-1],n[o])),i.push(r(t[o-1],t[o]));return function(t){var e=Zo.bisect(n,t,1,a)-1;return i[e](u[e](t))}}function Ui(n,t,e,r){function u(){var u=Math.min(n.length,t.length)>2?Pi:qi,c=r?Uu:Pu;return o=u(n,t,c,e),a=u(t,n,c,hu),i}function i(n){return o(n)}var o,a;return i.invert=function(n){return a(n)},i.domain=function(t){return arguments.length?(n=t.map(Number),u()):n},i.range=function(n){return arguments.length?(t=n,u()):t},i.rangeRound=function(n){return i.range(n).interpolate(zu)},i.clamp=function(n){return arguments.length?(r=n,u()):r},i.interpolate=function(n){return arguments.length?(e=n,u()):e},i.ticks=function(t){return Oi(n,t)},i.tickFormat=function(t,e){return Yi(n,t,e)},i.nice=function(t){return Hi(n,t),u()},i.copy=function(){return Ui(n,t,e,r)},u()}function ji(n,t){return Zo.rebind(n,t,"range","rangeRound","interpolate","clamp")}function Hi(n,t){return Ri(n,Di(Fi(n,t)[2]))}function Fi(n,t){null==t&&(t=10);var 
e=Li(n),r=e[1]-e[0],u=Math.pow(10,Math.floor(Math.log(r/t)/Math.LN10)),i=t/r*u;return.15>=i?u*=10:.35>=i?u*=5:.75>=i&&(u*=2),e[0]=Math.ceil(e[0]/u)*u,e[1]=Math.floor(e[1]/u)*u+.5*u,e[2]=u,e}function Oi(n,t){return Zo.range.apply(Zo,Fi(n,t))}function Yi(n,t,e){var r=Fi(n,t);if(e){var u=Ga.exec(e);if(u.shift(),"s"===u[8]){var i=Zo.formatPrefix(Math.max(ua(r[0]),ua(r[1])));return u[7]||(u[7]="."+Ii(i.scale(r[2]))),u[8]="f",e=Zo.format(u.join("")),function(n){return e(i.scale(n))+i.symbol}}u[7]||(u[7]="."+Zi(u[8],r)),e=u.join("")}else e=",."+Ii(r[2])+"f";return Zo.format(e)}function Ii(n){return-Math.floor(Math.log(n)/Math.LN10+.01)}function Zi(n,t){var e=Ii(t[2]);return n in ls?Math.abs(e-Ii(Math.max(ua(t[0]),ua(t[1]))))+ +("e"!==n):e-2*("%"===n)}function Vi(n,t,e,r){function u(n){return(e?Math.log(0>n?0:n):-Math.log(n>0?0:-n))/Math.log(t)}function i(n){return e?Math.pow(t,n):-Math.pow(t,-n)}function o(t){return n(u(t))}return o.invert=function(t){return i(n.invert(t))},o.domain=function(t){return arguments.length?(e=t[0]>=0,n.domain((r=t.map(Number)).map(u)),o):r},o.base=function(e){return arguments.length?(t=+e,n.domain(r.map(u)),o):t},o.nice=function(){var t=Ri(r.map(u),e?Math:hs);return n.domain(t),r=t.map(i),o},o.ticks=function(){var n=Li(r),o=[],a=n[0],c=n[1],s=Math.floor(u(a)),l=Math.ceil(u(c)),f=t%1?2:t;if(isFinite(l-s)){if(e){for(;l>s;s++)for(var h=1;f>h;h++)o.push(i(s)*h);o.push(i(s))}else for(o.push(i(s));s++<l;)for(var h=f-1;h>0;h--)o.push(i(s)*h);for(s=0;o[s]<a;s++);for(l=o.length;o[l-1]>c;l--);o=o.slice(s,l)}return o},o.tickFormat=function(n,t){if(!arguments.length)return fs;arguments.length<2?t=fs:"function"!=typeof t&&(t=Zo.format(t));var r,a=Math.max(.1,n/o.ticks().length),c=e?(r=1e-12,Math.ceil):(r=-1e-12,Math.floor);return function(n){return n/i(c(u(n)+r))<=a?t(n):""}},o.copy=function(){return Vi(n.copy(),t,e,r)},ji(o,n)}function Xi(n,t,e){function r(t){return n(u(t))}var u=$i(t),i=$i(1/t);return r.invert=function(t){return i(n.invert(t))},r.domain=function(t){return arguments.length?(n.domain((e=t.map(Number)).map(u)),r):e},r.ticks=function(n){return Oi(e,n)},r.tickFormat=function(n,t){return Yi(e,n,t)},r.nice=function(n){return r.domain(Hi(e,n))},r.exponent=function(o){return arguments.length?(u=$i(t=o),i=$i(1/t),n.domain(e.map(u)),r):t},r.copy=function(){return Xi(n.copy(),t,e)},ji(r,n)}function $i(n){return function(t){return 0>t?-Math.pow(-t,n):Math.pow(t,n)}}function Bi(n,t){function e(e){return i[((u.get(e)||("range"===t.t?u.set(e,n.push(e)):0/0))-1)%i.length]}function r(t,e){return Zo.range(n.length).map(function(n){return t+e*n})}var u,i,a;return e.domain=function(r){if(!arguments.length)return n;n=[],u=new o;for(var i,a=-1,c=r.length;++a<c;)u.has(i=r[a])||u.set(i,n.push(i));return e[t.t].apply(e,t.a)},e.range=function(n){return arguments.length?(i=n,a=0,t={t:"range",a:arguments},e):i},e.rangePoints=function(u,o){arguments.length<2&&(o=0);var c=u[0],s=u[1],l=(s-c)/(Math.max(1,n.length-1)+o);return i=r(n.length<2?(c+s)/2:c+l*o/2,l),a=0,t={t:"rangePoints",a:arguments},e},e.rangeBands=function(u,o,c){arguments.length<2&&(o=0),arguments.length<3&&(c=o);var s=u[1]<u[0],l=u[s-0],f=u[1-s],h=(f-l)/(n.length-o+2*c);return i=r(l+h*c,h),s&&i.reverse(),a=h*(1-o),t={t:"rangeBands",a:arguments},e},e.rangeRoundBands=function(u,o,c){arguments.length<2&&(o=0),arguments.length<3&&(c=o);var s=u[1]<u[0],l=u[s-0],f=u[1-s],h=Math.floor((f-l)/(n.length-o+2*c)),g=f-l-(n.length-o)*h;return 
i=r(l+Math.round(g/2),h),s&&i.reverse(),a=Math.round(h*(1-o)),t={t:"rangeRoundBands",a:arguments},e},e.rangeBand=function(){return a},e.rangeExtent=function(){return Li(t.a[0])},e.copy=function(){return Bi(n,t)},e.domain(n)}function Wi(e,r){function u(){var n=0,t=r.length;for(o=[];++n<t;)o[n-1]=Zo.quantile(e,n/t);return i}function i(n){return isNaN(n=+n)?void 0:r[Zo.bisect(o,n)]}var o;return i.domain=function(r){return arguments.length?(e=r.filter(t).sort(n),u()):e},i.range=function(n){return arguments.length?(r=n,u()):r},i.quantiles=function(){return o},i.invertExtent=function(n){return n=r.indexOf(n),0>n?[0/0,0/0]:[n>0?o[n-1]:e[0],n<o.length?o[n]:e[e.length-1]]},i.copy=function(){return Wi(e,r)},u()}function Ji(n,t,e){function r(t){return e[Math.max(0,Math.min(o,Math.floor(i*(t-n))))]}function u(){return i=e.length/(t-n),o=e.length-1,r}var i,o;return r.domain=function(e){return arguments.length?(n=+e[0],t=+e[e.length-1],u()):[n,t]},r.range=function(n){return arguments.length?(e=n,u()):e},r.invertExtent=function(t){return t=e.indexOf(t),t=0>t?0/0:t/i+n,[t,t+1/i]},r.copy=function(){return Ji(n,t,e)},u()}function Gi(n,t){function e(e){return e>=e?t[Zo.bisect(n,e)]:void 0}return e.domain=function(t){return arguments.length?(n=t,e):n},e.range=function(n){return arguments.length?(t=n,e):t},e.invertExtent=function(e){return e=t.indexOf(e),[n[e-1],n[e]]},e.copy=function(){return Gi(n,t)},e}function Ki(n){function t(n){return+n}return t.invert=t,t.domain=t.range=function(e){return arguments.length?(n=e.map(t),t):n},t.ticks=function(t){return Oi(n,t)},t.tickFormat=function(t,e){return Yi(n,t,e)},t.copy=function(){return Ki(n)},t}function Qi(n){return n.innerRadius}function no(n){return n.outerRadius}function to(n){return n.startAngle}function eo(n){return n.endAngle}function ro(n){function t(t){function o(){s.push("M",i(n(l),a))}for(var c,s=[],l=[],f=-1,h=t.length,g=bt(e),p=bt(r);++f<h;)u.call(this,c=t[f],f)?l.push([+g.call(this,c,f),+p.call(this,c,f)]):l.length&&(o(),l=[]);return l.length&&o(),s.length?s.join(""):null}var e=wr,r=Sr,u=we,i=uo,o=i.key,a=.7;return t.x=function(n){return arguments.length?(e=n,t):e},t.y=function(n){return arguments.length?(r=n,t):r},t.defined=function(n){return arguments.length?(u=n,t):u},t.interpolate=function(n){return arguments.length?(o="function"==typeof n?i=n:(i=xs.get(n)||uo).key,t):o},t.tension=function(n){return arguments.length?(a=n,t):a},t}function uo(n){return n.join("L")}function io(n){return uo(n)+"Z"}function oo(n){for(var t=0,e=n.length,r=n[0],u=[r[0],",",r[1]];++t<e;)u.push("H",(r[0]+(r=n[t])[0])/2,"V",r[1]);return e>1&&u.push("H",r[0]),u.join("")}function ao(n){for(var t=0,e=n.length,r=n[0],u=[r[0],",",r[1]];++t<e;)u.push("V",(r=n[t])[1],"H",r[0]);return u.join("")}function co(n){for(var t=0,e=n.length,r=n[0],u=[r[0],",",r[1]];++t<e;)u.push("H",(r=n[t])[0],"V",r[1]);return u.join("")}function so(n,t){return n.length<4?uo(n):n[1]+ho(n.slice(1,n.length-1),go(n,t))}function lo(n,t){return n.length<3?uo(n):n[0]+ho((n.push(n[0]),n),go([n[n.length-2]].concat(n,[n[1]]),t))}function fo(n,t){return n.length<3?uo(n):n[0]+ho(n,go(n,t))}function ho(n,t){if(t.length<1||n.length!=t.length&&n.length!=t.length+2)return uo(n);var e=n.length!=t.length,r="",u=n[0],i=n[1],o=t[0],a=o,c=1;if(e&&(r+="Q"+(i[0]-2*o[0]/3)+","+(i[1]-2*o[1]/3)+","+i[0]+","+i[1],u=n[1],c=2),t.length>1){a=t[1],i=n[c],c++,r+="C"+(u[0]+o[0])+","+(u[1]+o[1])+","+(i[0]-a[0])+","+(i[1]-a[1])+","+i[0]+","+i[1];for(var 
s=2;s<t.length;s++,c++)i=n[c],a=t[s],r+="S"+(i[0]-a[0])+","+(i[1]-a[1])+","+i[0]+","+i[1]}if(e){var l=n[c];r+="Q"+(i[0]+2*a[0]/3)+","+(i[1]+2*a[1]/3)+","+l[0]+","+l[1]}return r}function go(n,t){for(var e,r=[],u=(1-t)/2,i=n[0],o=n[1],a=1,c=n.length;++a<c;)e=i,i=o,o=n[a],r.push([u*(o[0]-e[0]),u*(o[1]-e[1])]);return r}function po(n){if(n.length<3)return uo(n);var t=1,e=n.length,r=n[0],u=r[0],i=r[1],o=[u,u,u,(r=n[1])[0]],a=[i,i,i,r[1]],c=[u,",",i,"L",xo(bs,o),",",xo(bs,a)];for(n.push(n[e-1]);++t<=e;)r=n[t],o.shift(),o.push(r[0]),a.shift(),a.push(r[1]),Mo(c,o,a);return n.pop(),c.push("L",r),c.join("")}function vo(n){if(n.length<4)return uo(n);for(var t,e=[],r=-1,u=n.length,i=[0],o=[0];++r<3;)t=n[r],i.push(t[0]),o.push(t[1]);for(e.push(xo(bs,i)+","+xo(bs,o)),--r;++r<u;)t=n[r],i.shift(),i.push(t[0]),o.shift(),o.push(t[1]),Mo(e,i,o);return e.join("")}function mo(n){for(var t,e,r=-1,u=n.length,i=u+4,o=[],a=[];++r<4;)e=n[r%u],o.push(e[0]),a.push(e[1]);for(t=[xo(bs,o),",",xo(bs,a)],--r;++r<i;)e=n[r%u],o.shift(),o.push(e[0]),a.shift(),a.push(e[1]),Mo(t,o,a);return t.join("")}function yo(n,t){var e=n.length-1;if(e)for(var r,u,i=n[0][0],o=n[0][1],a=n[e][0]-i,c=n[e][1]-o,s=-1;++s<=e;)r=n[s],u=s/e,r[0]=t*r[0]+(1-t)*(i+u*a),r[1]=t*r[1]+(1-t)*(o+u*c);return po(n)}function xo(n,t){return n[0]*t[0]+n[1]*t[1]+n[2]*t[2]+n[3]*t[3]}function Mo(n,t,e){n.push("C",xo(Ms,t),",",xo(Ms,e),",",xo(_s,t),",",xo(_s,e),",",xo(bs,t),",",xo(bs,e))}function _o(n,t){return(t[1]-n[1])/(t[0]-n[0])}function bo(n){for(var t=0,e=n.length-1,r=[],u=n[0],i=n[1],o=r[0]=_o(u,i);++t<e;)r[t]=(o+(o=_o(u=i,i=n[t+1])))/2;return r[t]=o,r}function wo(n){for(var t,e,r,u,i=[],o=bo(n),a=-1,c=n.length-1;++a<c;)t=_o(n[a],n[a+1]),ua(t)<ka?o[a]=o[a+1]=0:(e=o[a]/t,r=o[a+1]/t,u=e*e+r*r,u>9&&(u=3*t/Math.sqrt(u),o[a]=u*e,o[a+1]=u*r));for(a=-1;++a<=c;)u=(n[Math.min(c,a+1)][0]-n[Math.max(0,a-1)][0])/(6*(1+o[a]*o[a])),i.push([u||0,o[a]*u||0]);return i}function So(n){return n.length<3?uo(n):n[0]+ho(n,wo(n))}function ko(n){for(var t,e,r,u=-1,i=n.length;++u<i;)t=n[u],e=t[0],r=t[1]+ms,t[0]=e*Math.cos(r),t[1]=e*Math.sin(r);return n}function Eo(n){function t(t){function c(){v.push("M",a(n(m),f),l,s(n(d.reverse()),f),"Z")}for(var h,g,p,v=[],d=[],m=[],y=-1,x=t.length,M=bt(e),_=bt(u),b=e===r?function(){return g}:bt(r),w=u===i?function(){return p}:bt(i);++y<x;)o.call(this,h=t[y],y)?(d.push([g=+M.call(this,h,y),p=+_.call(this,h,y)]),m.push([+b.call(this,h,y),+w.call(this,h,y)])):d.length&&(c(),d=[],m=[]);return d.length&&c(),v.length?v.join(""):null}var e=wr,r=wr,u=0,i=Sr,o=we,a=uo,c=a.key,s=a,l="L",f=.7;return t.x=function(n){return arguments.length?(e=r=n,t):r},t.x0=function(n){return arguments.length?(e=n,t):e},t.x1=function(n){return arguments.length?(r=n,t):r},t.y=function(n){return arguments.length?(u=i=n,t):i},t.y0=function(n){return arguments.length?(u=n,t):u},t.y1=function(n){return arguments.length?(i=n,t):i},t.defined=function(n){return arguments.length?(o=n,t):o},t.interpolate=function(n){return arguments.length?(c="function"==typeof n?a=n:(a=xs.get(n)||uo).key,s=a.reverse||a,l=a.closed?"M":"L",t):c},t.tension=function(n){return arguments.length?(f=n,t):f},t}function Ao(n){return n.radius}function Co(n){return[n.x,n.y]}function No(n){return function(){var t=n.apply(this,arguments),e=t[0],r=t[1]+ms;return[e*Math.cos(r),e*Math.sin(r)]}}function zo(){return 64}function Lo(){return"circle"}function To(n){var t=Math.sqrt(n/ba);return"M0,"+t+"A"+t+","+t+" 0 1,1 0,"+-t+"A"+t+","+t+" 0 1,1 0,"+t+"Z"}function qo(n,t){return sa(n,Cs),n.id=t,n}function Ro(n,t,e,r){var 
u=n.id;return P(n,"function"==typeof e?function(n,i,o){n.__transition__[u].tween.set(t,r(e.call(n,n.__data__,i,o)))}:(e=r(e),function(n){n.__transition__[u].tween.set(t,e)}))}function Do(n){return null==n&&(n=""),function(){this.textContent=n}}function Po(n,t,e,r){var u=n.__transition__||(n.__transition__={active:0,count:0}),i=u[e];if(!i){var a=r.time;i=u[e]={tween:new o,time:a,ease:r.ease,delay:r.delay,duration:r.duration},++u.count,Zo.timer(function(r){function o(r){return u.active>e?s():(u.active=e,i.event&&i.event.start.call(n,l,t),i.tween.forEach(function(e,r){(r=r.call(n,l,t))&&v.push(r)}),Zo.timer(function(){return p.c=c(r||1)?we:c,1},0,a),void 0)}function c(r){if(u.active!==e)return s();for(var o=r/g,a=f(o),c=v.length;c>0;)v[--c].call(n,a);
+return o>=1?(i.event&&i.event.end.call(n,l,t),s()):void 0}function s(){return--u.count?delete u[e]:delete n.__transition__,1}var l=n.__data__,f=i.ease,h=i.delay,g=i.duration,p=Ba,v=[];return p.t=h+a,r>=h?o(r-h):(p.c=o,void 0)},0,a)}}function Uo(n,t){n.attr("transform",function(n){return"translate("+t(n)+",0)"})}function jo(n,t){n.attr("transform",function(n){return"translate(0,"+t(n)+")"})}function Ho(n){return n.toISOString()}function Fo(n,t,e){function r(t){return n(t)}function u(n,e){var r=n[1]-n[0],u=r/e,i=Zo.bisect(Us,u);return i==Us.length?[t.year,Fi(n.map(function(n){return n/31536e6}),e)[2]]:i?t[u/Us[i-1]<Us[i]/u?i-1:i]:[Fs,Fi(n,e)[2]]}return r.invert=function(t){return Oo(n.invert(t))},r.domain=function(t){return arguments.length?(n.domain(t),r):n.domain().map(Oo)},r.nice=function(n,t){function e(e){return!isNaN(e)&&!n.range(e,Oo(+e+1),t).length}var i=r.domain(),o=Li(i),a=null==n?u(o,10):"number"==typeof n&&u(o,n);return a&&(n=a[0],t=a[1]),r.domain(Ri(i,t>1?{floor:function(t){for(;e(t=n.floor(t));)t=Oo(t-1);return t},ceil:function(t){for(;e(t=n.ceil(t));)t=Oo(+t+1);return t}}:n))},r.ticks=function(n,t){var e=Li(r.domain()),i=null==n?u(e,10):"number"==typeof n?u(e,n):!n.range&&[{range:n},t];return i&&(n=i[0],t=i[1]),n.range(e[0],Oo(+e[1]+1),1>t?1:t)},r.tickFormat=function(){return e},r.copy=function(){return Fo(n.copy(),t,e)},ji(r,n)}function Oo(n){return new Date(n)}function Yo(n){return JSON.parse(n.responseText)}function Io(n){var t=$o.createRange();return t.selectNode($o.body),t.createContextualFragment(n.responseText)}var Zo={version:"3.4.11"};Date.now||(Date.now=function(){return+new Date});var Vo=[].slice,Xo=function(n){return Vo.call(n)},$o=document,Bo=$o.documentElement,Wo=window;try{Xo(Bo.childNodes)[0].nodeType}catch(Jo){Xo=function(n){for(var t=n.length,e=new Array(t);t--;)e[t]=n[t];return e}}try{$o.createElement("div").style.setProperty("opacity",0,"")}catch(Go){var Ko=Wo.Element.prototype,Qo=Ko.setAttribute,na=Ko.setAttributeNS,ta=Wo.CSSStyleDeclaration.prototype,ea=ta.setProperty;Ko.setAttribute=function(n,t){Qo.call(this,n,t+"")},Ko.setAttributeNS=function(n,t,e){na.call(this,n,t,e+"")},ta.setProperty=function(n,t,e){ea.call(this,n,t+"",e)}}Zo.ascending=n,Zo.descending=function(n,t){return n>t?-1:t>n?1:t>=n?0:0/0},Zo.min=function(n,t){var e,r,u=-1,i=n.length;if(1===arguments.length){for(;++u<i&&!(null!=(e=n[u])&&e>=e);)e=void 0;for(;++u<i;)null!=(r=n[u])&&e>r&&(e=r)}else{for(;++u<i&&!(null!=(e=t.call(n,n[u],u))&&e>=e);)e=void 0;for(;++u<i;)null!=(r=t.call(n,n[u],u))&&e>r&&(e=r)}return e},Zo.max=function(n,t){var e,r,u=-1,i=n.length;if(1===arguments.length){for(;++u<i&&!(null!=(e=n[u])&&e>=e);)e=void 0;for(;++u<i;)null!=(r=n[u])&&r>e&&(e=r)}else{for(;++u<i&&!(null!=(e=t.call(n,n[u],u))&&e>=e);)e=void 0;for(;++u<i;)null!=(r=t.call(n,n[u],u))&&r>e&&(e=r)}return e},Zo.extent=function(n,t){var e,r,u,i=-1,o=n.length;if(1===arguments.length){for(;++i<o&&!(null!=(e=u=n[i])&&e>=e);)e=u=void 0;for(;++i<o;)null!=(r=n[i])&&(e>r&&(e=r),r>u&&(u=r))}else{for(;++i<o&&!(null!=(e=u=t.call(n,n[i],i))&&e>=e);)e=void 0;for(;++i<o;)null!=(r=t.call(n,n[i],i))&&(e>r&&(e=r),r>u&&(u=r))}return[e,u]},Zo.sum=function(n,t){var e,r=0,u=n.length,i=-1;if(1===arguments.length)for(;++i<u;)isNaN(e=+n[i])||(r+=e);else for(;++i<u;)isNaN(e=+t.call(n,n[i],i))||(r+=e);return r},Zo.mean=function(n,e){var r,u=0,i=n.length,o=-1,a=i;if(1===arguments.length)for(;++o<i;)t(r=n[o])?u+=r:--a;else for(;++o<i;)t(r=e.call(n,n[o],o))?u+=r:--a;return a?u/a:void 0},Zo.quantile=function(n,t){var 
e=(n.length-1)*t+1,r=Math.floor(e),u=+n[r-1],i=e-r;return i?u+i*(n[r]-u):u},Zo.median=function(e,r){return arguments.length>1&&(e=e.map(r)),e=e.filter(t),e.length?Zo.quantile(e.sort(n),.5):void 0};var ra=e(n);Zo.bisectLeft=ra.left,Zo.bisect=Zo.bisectRight=ra.right,Zo.bisector=function(t){return e(1===t.length?function(e,r){return n(t(e),r)}:t)},Zo.shuffle=function(n){for(var t,e,r=n.length;r;)e=0|Math.random()*r--,t=n[r],n[r]=n[e],n[e]=t;return n},Zo.permute=function(n,t){for(var e=t.length,r=new Array(e);e--;)r[e]=n[t[e]];return r},Zo.pairs=function(n){for(var t,e=0,r=n.length-1,u=n[0],i=new Array(0>r?0:r);r>e;)i[e]=[t=u,u=n[++e]];return i},Zo.zip=function(){if(!(u=arguments.length))return[];for(var n=-1,t=Zo.min(arguments,r),e=new Array(t);++n<t;)for(var u,i=-1,o=e[n]=new Array(u);++i<u;)o[i]=arguments[i][n];return e},Zo.transpose=function(n){return Zo.zip.apply(Zo,n)},Zo.keys=function(n){var t=[];for(var e in n)t.push(e);return t},Zo.values=function(n){var t=[];for(var e in n)t.push(n[e]);return t},Zo.entries=function(n){var t=[];for(var e in n)t.push({key:e,value:n[e]});return t},Zo.merge=function(n){for(var t,e,r,u=n.length,i=-1,o=0;++i<u;)o+=n[i].length;for(e=new Array(o);--u>=0;)for(r=n[u],t=r.length;--t>=0;)e[--o]=r[t];return e};var ua=Math.abs;Zo.range=function(n,t,e){if(arguments.length<3&&(e=1,arguments.length<2&&(t=n,n=0)),1/0===(t-n)/e)throw new Error("infinite range");var r,i=[],o=u(ua(e)),a=-1;if(n*=o,t*=o,e*=o,0>e)for(;(r=n+e*++a)>t;)i.push(r/o);else for(;(r=n+e*++a)<t;)i.push(r/o);return i},Zo.map=function(n){var t=new o;if(n instanceof o)n.forEach(function(n,e){t.set(n,e)});else for(var e in n)t.set(e,n[e]);return t},i(o,{has:a,get:function(n){return this[ia+n]},set:function(n,t){return this[ia+n]=t},remove:c,keys:s,values:function(){var n=[];return this.forEach(function(t,e){n.push(e)}),n},entries:function(){var n=[];return this.forEach(function(t,e){n.push({key:t,value:e})}),n},size:l,empty:f,forEach:function(n){for(var t in this)t.charCodeAt(0)===oa&&n.call(this,t.substring(1),this[t])}});var ia="\x00",oa=ia.charCodeAt(0);Zo.nest=function(){function n(t,a,c){if(c>=i.length)return r?r.call(u,a):e?a.sort(e):a;for(var s,l,f,h,g=-1,p=a.length,v=i[c++],d=new o;++g<p;)(h=d.get(s=v(l=a[g])))?h.push(l):d.set(s,[l]);return t?(l=t(),f=function(e,r){l.set(e,n(t,r,c))}):(l={},f=function(e,r){l[e]=n(t,r,c)}),d.forEach(f),l}function t(n,e){if(e>=i.length)return n;var r=[],u=a[e++];return n.forEach(function(n,u){r.push({key:n,values:t(u,e)})}),u?r.sort(function(n,t){return u(n.key,t.key)}):r}var e,r,u={},i=[],a=[];return u.map=function(t,e){return n(e,t,0)},u.entries=function(e){return t(n(Zo.map,e,0),0)},u.key=function(n){return i.push(n),u},u.sortKeys=function(n){return a[i.length-1]=n,u},u.sortValues=function(n){return e=n,u},u.rollup=function(n){return r=n,u},u},Zo.set=function(n){var t=new h;if(n)for(var e=0,r=n.length;r>e;++e)t.add(n[e]);return t},i(h,{has:a,add:function(n){return this[ia+n]=!0,n},remove:function(n){return n=ia+n,n in this&&delete this[n]},values:s,size:l,empty:f,forEach:function(n){for(var t in this)t.charCodeAt(0)===oa&&n.call(this,t.substring(1))}}),Zo.behavior={},Zo.rebind=function(n,t){for(var e,r=1,u=arguments.length;++r<u;)n[e=arguments[r]]=g(n,t,t[e]);return n};var aa=["webkit","ms","moz","Moz","o","O"];Zo.dispatch=function(){for(var n=new d,t=-1,e=arguments.length;++t<e;)n[arguments[t]]=m(n);return n},d.prototype.on=function(n,t){var e=n.indexOf("."),r="";if(e>=0&&(r=n.substring(e+1),n=n.substring(0,e)),n)return 
arguments.length<2?this[n].on(r):this[n].on(r,t);if(2===arguments.length){if(null==t)for(n in this)this.hasOwnProperty(n)&&this[n].on(r,null);return this}},Zo.event=null,Zo.requote=function(n){return n.replace(ca,"\\$&")};var ca=/[\\\^\$\*\+\?\|\[\]\(\)\.\{\}]/g,sa={}.__proto__?function(n,t){n.__proto__=t}:function(n,t){for(var e in t)n[e]=t[e]},la=function(n,t){return t.querySelector(n)},fa=function(n,t){return t.querySelectorAll(n)},ha=Bo.matches||Bo[p(Bo,"matchesSelector")],ga=function(n,t){return ha.call(n,t)};"function"==typeof Sizzle&&(la=function(n,t){return Sizzle(n,t)[0]||null},fa=Sizzle,ga=Sizzle.matchesSelector),Zo.selection=function(){return ma};var pa=Zo.selection.prototype=[];pa.select=function(n){var t,e,r,u,i=[];n=b(n);for(var o=-1,a=this.length;++o<a;){i.push(t=[]),t.parentNode=(r=this[o]).parentNode;for(var c=-1,s=r.length;++c<s;)(u=r[c])?(t.push(e=n.call(u,u.__data__,c,o)),e&&"__data__"in u&&(e.__data__=u.__data__)):t.push(null)}return _(i)},pa.selectAll=function(n){var t,e,r=[];n=w(n);for(var u=-1,i=this.length;++u<i;)for(var o=this[u],a=-1,c=o.length;++a<c;)(e=o[a])&&(r.push(t=Xo(n.call(e,e.__data__,a,u))),t.parentNode=e);return _(r)};var va={svg:"http://www.w3.org/2000/svg",xhtml:"http://www.w3.org/1999/xhtml",xlink:"http://www.w3.org/1999/xlink",xml:"http://www.w3.org/XML/1998/namespace",xmlns:"http://www.w3.org/2000/xmlns/"};Zo.ns={prefix:va,qualify:function(n){var t=n.indexOf(":"),e=n;return t>=0&&(e=n.substring(0,t),n=n.substring(t+1)),va.hasOwnProperty(e)?{space:va[e],local:n}:n}},pa.attr=function(n,t){if(arguments.length<2){if("string"==typeof n){var e=this.node();return n=Zo.ns.qualify(n),n.local?e.getAttributeNS(n.space,n.local):e.getAttribute(n)}for(t in n)this.each(S(t,n[t]));return this}return this.each(S(n,t))},pa.classed=function(n,t){if(arguments.length<2){if("string"==typeof n){var e=this.node(),r=(n=A(n)).length,u=-1;if(t=e.classList){for(;++u<r;)if(!t.contains(n[u]))return!1}else for(t=e.getAttribute("class");++u<r;)if(!E(n[u]).test(t))return!1;return!0}for(t in n)this.each(C(t,n[t]));return this}return this.each(C(n,t))},pa.style=function(n,t,e){var r=arguments.length;if(3>r){if("string"!=typeof n){2>r&&(t="");for(e in n)this.each(z(e,n[e],t));return this}if(2>r)return Wo.getComputedStyle(this.node(),null).getPropertyValue(n);e=""}return this.each(z(n,t,e))},pa.property=function(n,t){if(arguments.length<2){if("string"==typeof n)return this.node()[n];for(t in n)this.each(L(t,n[t]));return this}return this.each(L(n,t))},pa.text=function(n){return arguments.length?this.each("function"==typeof n?function(){var t=n.apply(this,arguments);this.textContent=null==t?"":t}:null==n?function(){this.textContent=""}:function(){this.textContent=n}):this.node().textContent},pa.html=function(n){return arguments.length?this.each("function"==typeof n?function(){var t=n.apply(this,arguments);this.innerHTML=null==t?"":t}:null==n?function(){this.innerHTML=""}:function(){this.innerHTML=n}):this.node().innerHTML},pa.append=function(n){return n=T(n),this.select(function(){return this.appendChild(n.apply(this,arguments))})},pa.insert=function(n,t){return n=T(n),t=b(t),this.select(function(){return this.insertBefore(n.apply(this,arguments),t.apply(this,arguments)||null)})},pa.remove=function(){return this.each(function(){var n=this.parentNode;n&&n.removeChild(this)})},pa.data=function(n,t){function e(n,e){var r,u,i,a=n.length,f=e.length,h=Math.min(a,f),g=new Array(f),p=new Array(f),v=new Array(a);if(t){var d,m=new o,y=new 
o,x=[];for(r=-1;++r<a;)d=t.call(u=n[r],u.__data__,r),m.has(d)?v[r]=u:m.set(d,u),x.push(d);for(r=-1;++r<f;)d=t.call(e,i=e[r],r),(u=m.get(d))?(g[r]=u,u.__data__=i):y.has(d)||(p[r]=q(i)),y.set(d,i),m.remove(d);for(r=-1;++r<a;)m.has(x[r])&&(v[r]=n[r])}else{for(r=-1;++r<h;)u=n[r],i=e[r],u?(u.__data__=i,g[r]=u):p[r]=q(i);for(;f>r;++r)p[r]=q(e[r]);for(;a>r;++r)v[r]=n[r]}p.update=g,p.parentNode=g.parentNode=v.parentNode=n.parentNode,c.push(p),s.push(g),l.push(v)}var r,u,i=-1,a=this.length;if(!arguments.length){for(n=new Array(a=(r=this[0]).length);++i<a;)(u=r[i])&&(n[i]=u.__data__);return n}var c=U([]),s=_([]),l=_([]);if("function"==typeof n)for(;++i<a;)e(r=this[i],n.call(r,r.parentNode.__data__,i));else for(;++i<a;)e(r=this[i],n);return s.enter=function(){return c},s.exit=function(){return l},s},pa.datum=function(n){return arguments.length?this.property("__data__",n):this.property("__data__")},pa.filter=function(n){var t,e,r,u=[];"function"!=typeof n&&(n=R(n));for(var i=0,o=this.length;o>i;i++){u.push(t=[]),t.parentNode=(e=this[i]).parentNode;for(var a=0,c=e.length;c>a;a++)(r=e[a])&&n.call(r,r.__data__,a,i)&&t.push(r)}return _(u)},pa.order=function(){for(var n=-1,t=this.length;++n<t;)for(var e,r=this[n],u=r.length-1,i=r[u];--u>=0;)(e=r[u])&&(i&&i!==e.nextSibling&&i.parentNode.insertBefore(e,i),i=e);return this},pa.sort=function(n){n=D.apply(this,arguments);for(var t=-1,e=this.length;++t<e;)this[t].sort(n);return this.order()},pa.each=function(n){return P(this,function(t,e,r){n.call(t,t.__data__,e,r)})},pa.call=function(n){var t=Xo(arguments);return n.apply(t[0]=this,t),this},pa.empty=function(){return!this.node()},pa.node=function(){for(var n=0,t=this.length;t>n;n++)for(var e=this[n],r=0,u=e.length;u>r;r++){var i=e[r];if(i)return i}return null},pa.size=function(){var n=0;return this.each(function(){++n}),n};var da=[];Zo.selection.enter=U,Zo.selection.enter.prototype=da,da.append=pa.append,da.empty=pa.empty,da.node=pa.node,da.call=pa.call,da.size=pa.size,da.select=function(n){for(var t,e,r,u,i,o=[],a=-1,c=this.length;++a<c;){r=(u=this[a]).update,o.push(t=[]),t.parentNode=u.parentNode;for(var s=-1,l=u.length;++s<l;)(i=u[s])?(t.push(r[s]=e=n.call(u.parentNode,i.__data__,s,a)),e.__data__=i.__data__):t.push(null)}return _(o)},da.insert=function(n,t){return arguments.length<2&&(t=j(this)),pa.insert.call(this,n,t)},pa.transition=function(){for(var n,t,e=Ss||++Ns,r=[],u=ks||{time:Date.now(),ease:xu,delay:0,duration:250},i=-1,o=this.length;++i<o;){r.push(n=[]);for(var a=this[i],c=-1,s=a.length;++c<s;)(t=a[c])&&Po(t,c,e,u),n.push(t)}return qo(r,e)},pa.interrupt=function(){return this.each(H)},Zo.select=function(n){var t=["string"==typeof n?la(n,$o):n];return t.parentNode=Bo,_([t])},Zo.selectAll=function(n){var t=Xo("string"==typeof n?fa(n,$o):n);return t.parentNode=Bo,_([t])};var ma=Zo.select(Bo);pa.on=function(n,t,e){var r=arguments.length;if(3>r){if("string"!=typeof n){2>r&&(t=!1);for(e in n)this.each(F(e,n[e],t));return this}if(2>r)return(r=this.node()["__on"+n])&&r._;e=!1}return this.each(F(n,t,e))};var ya=Zo.map({mouseenter:"mouseover",mouseleave:"mouseout"});ya.forEach(function(n){"on"+n in $o&&ya.remove(n)});var xa="onselectstart"in $o?null:p(Bo.style,"userSelect"),Ma=0;Zo.mouse=function(n){return Z(n,x())};var _a=/WebKit/.test(Wo.navigator.userAgent)?-1:0;Zo.touches=function(n,t){return arguments.length<2&&(t=x().touches),t?Xo(t).map(function(t){var e=Z(n,t);return e.identifier=t.identifier,e}):[]},Zo.behavior.drag=function(){function n(){this.on("mousedown.drag",u).on("touchstart.drag",i)}function 
t(n,t,u,i,o){return function(){function a(){var n,e,r=t(h,v);r&&(n=r[0]-x[0],e=r[1]-x[1],p|=n|e,x=r,g({type:"drag",x:r[0]+s[0],y:r[1]+s[1],dx:n,dy:e}))}function c(){t(h,v)&&(m.on(i+d,null).on(o+d,null),y(p&&Zo.event.target===f),g({type:"dragend"}))}var s,l=this,f=Zo.event.target,h=l.parentNode,g=e.of(l,arguments),p=0,v=n(),d=".drag"+(null==v?"":"-"+v),m=Zo.select(u()).on(i+d,a).on(o+d,c),y=I(),x=t(h,v);r?(s=r.apply(l,arguments),s=[s.x-x[0],s.y-x[1]]):s=[0,0],g({type:"dragstart"})}}var e=M(n,"drag","dragstart","dragend"),r=null,u=t(v,Zo.mouse,$,"mousemove","mouseup"),i=t(V,Zo.touch,X,"touchmove","touchend");return n.origin=function(t){return arguments.length?(r=t,n):r},Zo.rebind(n,e,"on")};var ba=Math.PI,wa=2*ba,Sa=ba/2,ka=1e-6,Ea=ka*ka,Aa=ba/180,Ca=180/ba,Na=Math.SQRT2,za=2,La=4;Zo.interpolateZoom=function(n,t){function e(n){var t=n*y;if(m){var e=Q(v),o=i/(za*h)*(e*nt(Na*t+v)-K(v));return[r+o*s,u+o*l,i*e/Q(Na*t+v)]}return[r+n*s,u+n*l,i*Math.exp(Na*t)]}var r=n[0],u=n[1],i=n[2],o=t[0],a=t[1],c=t[2],s=o-r,l=a-u,f=s*s+l*l,h=Math.sqrt(f),g=(c*c-i*i+La*f)/(2*i*za*h),p=(c*c-i*i-La*f)/(2*c*za*h),v=Math.log(Math.sqrt(g*g+1)-g),d=Math.log(Math.sqrt(p*p+1)-p),m=d-v,y=(m||Math.log(c/i))/Na;return e.duration=1e3*y,e},Zo.behavior.zoom=function(){function n(n){n.on(A,s).on(Ra+".zoom",f).on("dblclick.zoom",h).on(z,l)}function t(n){return[(n[0]-S.x)/S.k,(n[1]-S.y)/S.k]}function e(n){return[n[0]*S.k+S.x,n[1]*S.k+S.y]}function r(n){S.k=Math.max(E[0],Math.min(E[1],n))}function u(n,t){t=e(t),S.x+=n[0]-t[0],S.y+=n[1]-t[1]}function i(){_&&_.domain(x.range().map(function(n){return(n-S.x)/S.k}).map(x.invert)),w&&w.domain(b.range().map(function(n){return(n-S.y)/S.k}).map(b.invert))}function o(n){n({type:"zoomstart"})}function a(n){i(),n({type:"zoom",scale:S.k,translate:[S.x,S.y]})}function c(n){n({type:"zoomend"})}function s(){function n(){l=1,u(Zo.mouse(r),h),a(s)}function e(){f.on(C,null).on(N,null),g(l&&Zo.event.target===i),c(s)}var r=this,i=Zo.event.target,s=L.of(r,arguments),l=0,f=Zo.select(Wo).on(C,n).on(N,e),h=t(Zo.mouse(r)),g=I();H.call(r),o(s)}function l(){function n(){var n=Zo.touches(g);return h=S.k,n.forEach(function(n){n.identifier in v&&(v[n.identifier]=t(n))}),n}function e(){var t=Zo.event.target;Zo.select(t).on(M,i).on(_,f),b.push(t);for(var e=Zo.event.changedTouches,o=0,c=e.length;c>o;++o)v[e[o].identifier]=null;var s=n(),l=Date.now();if(1===s.length){if(500>l-m){var h=s[0],g=v[h.identifier];r(2*S.k),u(h,g),y(),a(p)}m=l}else if(s.length>1){var h=s[0],x=s[1],w=h[0]-x[0],k=h[1]-x[1];d=w*w+k*k}}function i(){for(var n,t,e,i,o=Zo.touches(g),c=0,s=o.length;s>c;++c,i=null)if(e=o[c],i=v[e.identifier]){if(t)break;n=e,t=i}if(i){var l=(l=e[0]-n[0])*l+(l=e[1]-n[1])*l,f=d&&Math.sqrt(l/d);n=[(n[0]+e[0])/2,(n[1]+e[1])/2],t=[(t[0]+i[0])/2,(t[1]+i[1])/2],r(f*h)}m=null,u(n,t),a(p)}function f(){if(Zo.event.touches.length){for(var t=Zo.event.changedTouches,e=0,r=t.length;r>e;++e)delete v[t[e].identifier];for(var u in v)return void n()}Zo.selectAll(b).on(x,null),w.on(A,s).on(z,l),k(),c(p)}var h,g=this,p=L.of(g,arguments),v={},d=0,x=".zoom-"+Zo.event.changedTouches[0].identifier,M="touchmove"+x,_="touchend"+x,b=[],w=Zo.select(g).on(A,null).on(z,e),k=I();H.call(g),e(),o(p)}function f(){var n=L.of(this,arguments);d?clearTimeout(d):(g=t(p=v||Zo.mouse(this)),H.call(this),o(n)),d=setTimeout(function(){d=null,c(n)},50),y(),r(Math.pow(2,.002*Ta())*S.k),u(p,g),a(n)}function h(){var 
n=L.of(this,arguments),e=Zo.mouse(this),i=t(e),s=Math.log(S.k)/Math.LN2;o(n),r(Math.pow(2,Zo.event.shiftKey?Math.ceil(s)-1:Math.floor(s)+1)),u(e,i),a(n),c(n)}var g,p,v,d,m,x,_,b,w,S={x:0,y:0,k:1},k=[960,500],E=qa,A="mousedown.zoom",C="mousemove.zoom",N="mouseup.zoom",z="touchstart.zoom",L=M(n,"zoomstart","zoom","zoomend");return n.event=function(n){n.each(function(){var n=L.of(this,arguments),t=S;Ss?Zo.select(this).transition().each("start.zoom",function(){S=this.__chart__||{x:0,y:0,k:1},o(n)}).tween("zoom:zoom",function(){var e=k[0],r=k[1],u=e/2,i=r/2,o=Zo.interpolateZoom([(u-S.x)/S.k,(i-S.y)/S.k,e/S.k],[(u-t.x)/t.k,(i-t.y)/t.k,e/t.k]);return function(t){var r=o(t),c=e/r[2];this.__chart__=S={x:u-r[0]*c,y:i-r[1]*c,k:c},a(n)}}).each("end.zoom",function(){c(n)}):(this.__chart__=S,o(n),a(n),c(n))})},n.translate=function(t){return arguments.length?(S={x:+t[0],y:+t[1],k:S.k},i(),n):[S.x,S.y]},n.scale=function(t){return arguments.length?(S={x:S.x,y:S.y,k:+t},i(),n):S.k},n.scaleExtent=function(t){return arguments.length?(E=null==t?qa:[+t[0],+t[1]],n):E},n.center=function(t){return arguments.length?(v=t&&[+t[0],+t[1]],n):v},n.size=function(t){return arguments.length?(k=t&&[+t[0],+t[1]],n):k},n.x=function(t){return arguments.length?(_=t,x=t.copy(),S={x:0,y:0,k:1},n):_},n.y=function(t){return arguments.length?(w=t,b=t.copy(),S={x:0,y:0,k:1},n):w},Zo.rebind(n,L,"on")};var Ta,qa=[0,1/0],Ra="onwheel"in $o?(Ta=function(){return-Zo.event.deltaY*(Zo.event.deltaMode?120:1)},"wheel"):"onmousewheel"in $o?(Ta=function(){return Zo.event.wheelDelta},"mousewheel"):(Ta=function(){return-Zo.event.detail},"MozMousePixelScroll");Zo.color=et,et.prototype.toString=function(){return this.rgb()+""},Zo.hsl=rt;var Da=rt.prototype=new et;Da.brighter=function(n){return n=Math.pow(.7,arguments.length?n:1),new rt(this.h,this.s,this.l/n)},Da.darker=function(n){return n=Math.pow(.7,arguments.length?n:1),new rt(this.h,this.s,n*this.l)},Da.rgb=function(){return ut(this.h,this.s,this.l)},Zo.hcl=it;var Pa=it.prototype=new et;Pa.brighter=function(n){return new it(this.h,this.c,Math.min(100,this.l+Ua*(arguments.length?n:1)))},Pa.darker=function(n){return new it(this.h,this.c,Math.max(0,this.l-Ua*(arguments.length?n:1)))},Pa.rgb=function(){return ot(this.h,this.c,this.l).rgb()},Zo.lab=at;var Ua=18,ja=.95047,Ha=1,Fa=1.08883,Oa=at.prototype=new et;Oa.brighter=function(n){return new at(Math.min(100,this.l+Ua*(arguments.length?n:1)),this.a,this.b)},Oa.darker=function(n){return new at(Math.max(0,this.l-Ua*(arguments.length?n:1)),this.a,this.b)},Oa.rgb=function(){return ct(this.l,this.a,this.b)},Zo.rgb=gt;var Ya=gt.prototype=new et;Ya.brighter=function(n){n=Math.pow(.7,arguments.length?n:1);var t=this.r,e=this.g,r=this.b,u=30;return t||e||r?(t&&u>t&&(t=u),e&&u>e&&(e=u),r&&u>r&&(r=u),new gt(Math.min(255,t/n),Math.min(255,e/n),Math.min(255,r/n))):new gt(u,u,u)},Ya.darker=function(n){return n=Math.pow(.7,arguments.length?n:1),new gt(n*this.r,n*this.g,n*this.b)},Ya.hsl=function(){return yt(this.r,this.g,this.b)},Ya.toString=function(){return"#"+dt(this.r)+dt(this.g)+dt(this.b)};var 
Ia=Zo.map({aliceblue:15792383,antiquewhite:16444375,aqua:65535,aquamarine:8388564,azure:15794175,beige:16119260,bisque:16770244,black:0,blanchedalmond:16772045,blue:255,blueviolet:9055202,brown:10824234,burlywood:14596231,cadetblue:6266528,chartreuse:8388352,chocolate:13789470,coral:16744272,cornflowerblue:6591981,cornsilk:16775388,crimson:14423100,cyan:65535,darkblue:139,darkcyan:35723,darkgoldenrod:12092939,darkgray:11119017,darkgreen:25600,darkgrey:11119017,darkkhaki:12433259,darkmagenta:9109643,darkolivegreen:5597999,darkorange:16747520,darkorchid:10040012,darkred:9109504,darksalmon:15308410,darkseagreen:9419919,darkslateblue:4734347,darkslategray:3100495,darkslategrey:3100495,darkturquoise:52945,darkviolet:9699539,deeppink:16716947,deepskyblue:49151,dimgray:6908265,dimgrey:6908265,dodgerblue:2003199,firebrick:11674146,floralwhite:16775920,forestgreen:2263842,fuchsia:16711935,gainsboro:14474460,ghostwhite:16316671,gold:16766720,goldenrod:14329120,gray:8421504,green:32768,greenyellow:11403055,grey:8421504,honeydew:15794160,hotpink:16738740,indianred:13458524,indigo:4915330,ivory:16777200,khaki:15787660,lavender:15132410,lavenderblush:16773365,lawngreen:8190976,lemonchiffon:16775885,lightblue:11393254,lightcoral:15761536,lightcyan:14745599,lightgoldenrodyellow:16448210,lightgray:13882323,lightgreen:9498256,lightgrey:13882323,lightpink:16758465,lightsalmon:16752762,lightseagreen:2142890,lightskyblue:8900346,lightslategray:7833753,lightslategrey:7833753,lightsteelblue:11584734,lightyellow:16777184,lime:65280,limegreen:3329330,linen:16445670,magenta:16711935,maroon:8388608,mediumaquamarine:6737322,mediumblue:205,mediumorchid:12211667,mediumpurple:9662683,mediumseagreen:3978097,mediumslateblue:8087790,mediumspringgreen:64154,mediumturquoise:4772300,mediumvioletred:13047173,midnightblue:1644912,mintcream:16121850,mistyrose:16770273,moccasin:16770229,navajowhite:16768685,navy:128,oldlace:16643558,olive:8421376,olivedrab:7048739,orange:16753920,orangered:16729344,orchid:14315734,palegoldenrod:15657130,palegreen:10025880,paleturquoise:11529966,palevioletred:14381203,papayawhip:16773077,peachpuff:16767673,peru:13468991,pink:16761035,plum:14524637,powderblue:11591910,purple:8388736,red:16711680,rosybrown:12357519,royalblue:4286945,saddlebrown:9127187,salmon:16416882,sandybrown:16032864,seagreen:3050327,seashell:16774638,sienna:10506797,silver:12632256,skyblue:8900331,slateblue:6970061,slategray:7372944,slategrey:7372944,snow:16775930,springgreen:65407,steelblue:4620980,tan:13808780,teal:32896,thistle:14204888,tomato:16737095,turquoise:4251856,violet:15631086,wheat:16113331,white:16777215,whitesmoke:16119285,yellow:16776960,yellowgreen:10145074});Ia.forEach(function(n,t){Ia.set(n,pt(t))}),Zo.functor=bt,Zo.xhr=St(wt),Zo.dsv=function(n,t){function e(n,e,i){arguments.length<3&&(i=e,e=null);var o=kt(n,t,null==e?r:u(e),i);return o.row=function(n){return arguments.length?o.response(null==(e=n)?r:u(n)):e},o}function r(n){return e.parse(n.responseText)}function u(n){return function(t){return e.parse(t.responseText,n)}}function i(t){return t.map(o).join(n)}function o(n){return a.test(n)?'"'+n.replace(/\"/g,'""')+'"':n}var a=new RegExp('["'+n+"\n]"),c=n.charCodeAt(0);return e.parse=function(n,t){var r;return e.parseRows(n,function(n,e){if(r)return r(n,e-1);var u=new Function("d","return {"+n.map(function(n,t){return JSON.stringify(n)+": d["+t+"]"}).join(",")+"}");r=t?function(n,e){return t(u(n),e)}:u})},e.parseRows=function(n,t){function e(){if(l>=s)return o;if(u)return u=!1,i;var 
t=l;if(34===n.charCodeAt(t)){for(var e=t;e++<s;)if(34===n.charCodeAt(e)){if(34!==n.charCodeAt(e+1))break;++e}l=e+2;var r=n.charCodeAt(e+1);return 13===r?(u=!0,10===n.charCodeAt(e+2)&&++l):10===r&&(u=!0),n.substring(t+1,e).replace(/""/g,'"')}for(;s>l;){var r=n.charCodeAt(l++),a=1;if(10===r)u=!0;else if(13===r)u=!0,10===n.charCodeAt(l)&&(++l,++a);else if(r!==c)continue;return n.substring(t,l-a)}return n.substring(t)}for(var r,u,i={},o={},a=[],s=n.length,l=0,f=0;(r=e())!==o;){for(var h=[];r!==i&&r!==o;)h.push(r),r=e();(!t||(h=t(h,f++)))&&a.push(h)}return a},e.format=function(t){if(Array.isArray(t[0]))return e.formatRows(t);var r=new h,u=[];return t.forEach(function(n){for(var t in n)r.has(t)||u.push(r.add(t))}),[u.map(o).join(n)].concat(t.map(function(t){return u.map(function(n){return o(t[n])}).join(n)})).join("\n")},e.formatRows=function(n){return n.map(i).join("\n")},e},Zo.csv=Zo.dsv(",","text/csv"),Zo.tsv=Zo.dsv("	","text/tab-separated-values"),Zo.touch=function(n,t,e){if(arguments.length<3&&(e=t,t=x().changedTouches),t)for(var r,u=0,i=t.length;i>u;++u)if((r=t[u]).identifier===e)return Z(n,r)};var Za,Va,Xa,$a,Ba,Wa=Wo[p(Wo,"requestAnimationFrame")]||function(n){setTimeout(n,17)};Zo.timer=function(n,t,e){var r=arguments.length;2>r&&(t=0),3>r&&(e=Date.now());var u=e+t,i={c:n,t:u,f:!1,n:null};Va?Va.n=i:Za=i,Va=i,Xa||($a=clearTimeout($a),Xa=1,Wa(At))},Zo.timer.flush=function(){Ct(),Nt()},Zo.round=function(n,t){return t?Math.round(n*(t=Math.pow(10,t)))/t:Math.round(n)};var Ja=["y","z","a","f","p","n","\xb5","m","","k","M","G","T","P","E","Z","Y"].map(Lt);Zo.formatPrefix=function(n,t){var e=0;return n&&(0>n&&(n*=-1),t&&(n=Zo.round(n,zt(n,t))),e=1+Math.floor(1e-12+Math.log(n)/Math.LN10),e=Math.max(-24,Math.min(24,3*Math.floor((e-1)/3)))),Ja[8+e/3]};var Ga=/(?:([^{])?([<>=^]))?([+\- ])?([$#])?(0)?(\d+)?(,)?(\.-?\d+)?([a-z%])?/i,Ka=Zo.map({b:function(n){return n.toString(2)},c:function(n){return String.fromCharCode(n)},o:function(n){return n.toString(8)},x:function(n){return n.toString(16)},X:function(n){return n.toString(16).toUpperCase()},g:function(n,t){return n.toPrecision(t)},e:function(n,t){return n.toExponential(t)},f:function(n,t){return n.toFixed(t)},r:function(n,t){return(n=Zo.round(n,zt(n,t))).toFixed(Math.max(0,Math.min(20,zt(n*(1+1e-15),t))))}}),Qa=Zo.time={},nc=Date;Rt.prototype={getDate:function(){return this._.getUTCDate()},getDay:function(){return this._.getUTCDay()},getFullYear:function(){return this._.getUTCFullYear()},getHours:function(){return this._.getUTCHours()},getMilliseconds:function(){return this._.getUTCMilliseconds()},getMinutes:function(){return this._.getUTCMinutes()},getMonth:function(){return this._.getUTCMonth()},getSeconds:function(){return this._.getUTCSeconds()},getTime:function(){return this._.getTime()},getTimezoneOffset:function(){return 0},valueOf:function(){return this._.valueOf()},setDate:function(){tc.setUTCDate.apply(this._,arguments)},setDay:function(){tc.setUTCDay.apply(this._,arguments)},setFullYear:function(){tc.setUTCFullYear.apply(this._,arguments)},setHours:function(){tc.setUTCHours.apply(this._,arguments)},setMilliseconds:function(){tc.setUTCMilliseconds.apply(this._,arguments)},setMinutes:function(){tc.setUTCMinutes.apply(this._,arguments)},setMonth:function(){tc.setUTCMonth.apply(this._,arguments)},setSeconds:function(){tc.setUTCSeconds.apply(this._,arguments)},setTime:function(){tc.setTime.apply(this._,arguments)}};var tc=Date.prototype;Qa.year=Dt(function(n){return 
n=Qa.day(n),n.setMonth(0,1),n},function(n,t){n.setFullYear(n.getFullYear()+t)},function(n){return n.getFullYear()}),Qa.years=Qa.year.range,Qa.years.utc=Qa.year.utc.range,Qa.day=Dt(function(n){var t=new nc(2e3,0);return t.setFullYear(n.getFullYear(),n.getMonth(),n.getDate()),t},function(n,t){n.setDate(n.getDate()+t)},function(n){return n.getDate()-1}),Qa.days=Qa.day.range,Qa.days.utc=Qa.day.utc.range,Qa.dayOfYear=function(n){var t=Qa.year(n);return Math.floor((n-t-6e4*(n.getTimezoneOffset()-t.getTimezoneOffset()))/864e5)},["sunday","monday","tuesday","wednesday","thursday","friday","saturday"].forEach(function(n,t){t=7-t;var e=Qa[n]=Dt(function(n){return(n=Qa.day(n)).setDate(n.getDate()-(n.getDay()+t)%7),n},function(n,t){n.setDate(n.getDate()+7*Math.floor(t))},function(n){var e=Qa.year(n).getDay();return Math.floor((Qa.dayOfYear(n)+(e+t)%7)/7)-(e!==t)});Qa[n+"s"]=e.range,Qa[n+"s"].utc=e.utc.range,Qa[n+"OfYear"]=function(n){var e=Qa.year(n).getDay();return Math.floor((Qa.dayOfYear(n)+(e+t)%7)/7)}}),Qa.week=Qa.sunday,Qa.weeks=Qa.sunday.range,Qa.weeks.utc=Qa.sunday.utc.range,Qa.weekOfYear=Qa.sundayOfYear;var ec={"-":"",_:" ",0:"0"},rc=/^\s*\d+/,uc=/^%/;Zo.locale=function(n){return{numberFormat:Tt(n),timeFormat:Ut(n)}};var ic=Zo.locale({decimal:".",thousands:",",grouping:[3],currency:["$",""],dateTime:"%a %b %e %X %Y",date:"%m/%d/%Y",time:"%H:%M:%S",periods:["AM","PM"],days:["Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday"],shortDays:["Sun","Mon","Tue","Wed","Thu","Fri","Sat"],months:["January","February","March","April","May","June","July","August","September","October","November","December"],shortMonths:["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"]});Zo.format=ic.numberFormat,Zo.geo={},ue.prototype={s:0,t:0,add:function(n){ie(n,this.t,oc),ie(oc.s,this.s,this),this.s?this.t+=oc.t:this.s=oc.t},reset:function(){this.s=this.t=0},valueOf:function(){return this.s}};var oc=new ue;Zo.geo.stream=function(n,t){n&&ac.hasOwnProperty(n.type)?ac[n.type](n,t):oe(n,t)};var ac={Feature:function(n,t){oe(n.geometry,t)},FeatureCollection:function(n,t){for(var e=n.features,r=-1,u=e.length;++r<u;)oe(e[r].geometry,t)}},cc={Sphere:function(n,t){t.sphere()},Point:function(n,t){n=n.coordinates,t.point(n[0],n[1],n[2])},MultiPoint:function(n,t){for(var e=n.coordinates,r=-1,u=e.length;++r<u;)n=e[r],t.point(n[0],n[1],n[2])},LineString:function(n,t){ae(n.coordinates,t,0)},MultiLineString:function(n,t){for(var e=n.coordinates,r=-1,u=e.length;++r<u;)ae(e[r],t,0)},Polygon:function(n,t){ce(n.coordinates,t)},MultiPolygon:function(n,t){for(var e=n.coordinates,r=-1,u=e.length;++r<u;)ce(e[r],t)},GeometryCollection:function(n,t){for(var e=n.geometries,r=-1,u=e.length;++r<u;)oe(e[r],t)}};Zo.geo.area=function(n){return sc=0,Zo.geo.stream(n,fc),sc};var sc,lc=new ue,fc={sphere:function(){sc+=4*ba},point:v,lineStart:v,lineEnd:v,polygonStart:function(){lc.reset(),fc.lineStart=se},polygonEnd:function(){var n=2*lc;sc+=0>n?4*ba+n:n,fc.lineStart=fc.lineEnd=fc.point=v}};Zo.geo.bounds=function(){function n(n,t){x.push(M=[l=n,h=n]),f>t&&(f=t),t>g&&(g=t)}function t(t,e){var r=le([t*Aa,e*Aa]);if(m){var u=he(m,r),i=[u[1],-u[0],0],o=he(i,u);ve(o),o=de(o);var c=t-p,s=c>0?1:-1,v=o[0]*Ca*s,d=ua(c)>180;if(d^(v>s*p&&s*t>v)){var y=o[1]*Ca;y>g&&(g=y)}else if(v=(v+360)%360-180,d^(v>s*p&&s*t>v)){var y=-o[1]*Ca;f>y&&(f=y)}else f>e&&(f=e),e>g&&(g=e);d?p>t?a(l,t)>a(l,h)&&(h=t):a(t,h)>a(l,h)&&(l=t):h>=l?(l>t&&(l=t),t>h&&(h=t)):t>p?a(l,t)>a(l,h)&&(h=t):a(t,h)>a(l,h)&&(l=t)}else n(t,e);m=r,p=t}function 
e(){_.point=t}function r(){M[0]=l,M[1]=h,_.point=n,m=null}function u(n,e){if(m){var r=n-p;y+=ua(r)>180?r+(r>0?360:-360):r}else v=n,d=e;fc.point(n,e),t(n,e)}function i(){fc.lineStart()}function o(){u(v,d),fc.lineEnd(),ua(y)>ka&&(l=-(h=180)),M[0]=l,M[1]=h,m=null}function a(n,t){return(t-=n)<0?t+360:t}function c(n,t){return n[0]-t[0]}function s(n,t){return t[0]<=t[1]?t[0]<=n&&n<=t[1]:n<t[0]||t[1]<n}var l,f,h,g,p,v,d,m,y,x,M,_={point:n,lineStart:e,lineEnd:r,polygonStart:function(){_.point=u,_.lineStart=i,_.lineEnd=o,y=0,fc.polygonStart()},polygonEnd:function(){fc.polygonEnd(),_.point=n,_.lineStart=e,_.lineEnd=r,0>lc?(l=-(h=180),f=-(g=90)):y>ka?g=90:-ka>y&&(f=-90),M[0]=l,M[1]=h}};return function(n){g=h=-(l=f=1/0),x=[],Zo.geo.stream(n,_);var t=x.length;if(t){x.sort(c);for(var e,r=1,u=x[0],i=[u];t>r;++r)e=x[r],s(e[0],u)||s(e[1],u)?(a(u[0],e[1])>a(u[0],u[1])&&(u[1]=e[1]),a(e[0],u[1])>a(u[0],u[1])&&(u[0]=e[0])):i.push(u=e);
+for(var o,e,p=-1/0,t=i.length-1,r=0,u=i[t];t>=r;u=e,++r)e=i[r],(o=a(u[1],e[0]))>p&&(p=o,l=e[0],h=u[1])}return x=M=null,1/0===l||1/0===f?[[0/0,0/0],[0/0,0/0]]:[[l,f],[h,g]]}}(),Zo.geo.centroid=function(n){hc=gc=pc=vc=dc=mc=yc=xc=Mc=_c=bc=0,Zo.geo.stream(n,wc);var t=Mc,e=_c,r=bc,u=t*t+e*e+r*r;return Ea>u&&(t=mc,e=yc,r=xc,ka>gc&&(t=pc,e=vc,r=dc),u=t*t+e*e+r*r,Ea>u)?[0/0,0/0]:[Math.atan2(e,t)*Ca,G(r/Math.sqrt(u))*Ca]};var hc,gc,pc,vc,dc,mc,yc,xc,Mc,_c,bc,wc={sphere:v,point:ye,lineStart:Me,lineEnd:_e,polygonStart:function(){wc.lineStart=be},polygonEnd:function(){wc.lineStart=Me}},Sc=Ae(we,Te,Re,[-ba,-ba/2]),kc=1e9;Zo.geo.clipExtent=function(){var n,t,e,r,u,i,o={stream:function(n){return u&&(u.valid=!1),u=i(n),u.valid=!0,u},extent:function(a){return arguments.length?(i=Ue(n=+a[0][0],t=+a[0][1],e=+a[1][0],r=+a[1][1]),u&&(u.valid=!1,u=null),o):[[n,t],[e,r]]}};return o.extent([[0,0],[960,500]])},(Zo.geo.conicEqualArea=function(){return He(Fe)}).raw=Fe,Zo.geo.albers=function(){return Zo.geo.conicEqualArea().rotate([96,0]).center([-.6,38.7]).parallels([29.5,45.5]).scale(1070)},Zo.geo.albersUsa=function(){function n(n){var i=n[0],o=n[1];return t=null,e(i,o),t||(r(i,o),t)||u(i,o),t}var t,e,r,u,i=Zo.geo.albers(),o=Zo.geo.conicEqualArea().rotate([154,0]).center([-2,58.5]).parallels([55,65]),a=Zo.geo.conicEqualArea().rotate([157,0]).center([-3,19.9]).parallels([8,18]),c={point:function(n,e){t=[n,e]}};return n.invert=function(n){var t=i.scale(),e=i.translate(),r=(n[0]-e[0])/t,u=(n[1]-e[1])/t;return(u>=.12&&.234>u&&r>=-.425&&-.214>r?o:u>=.166&&.234>u&&r>=-.214&&-.115>r?a:i).invert(n)},n.stream=function(n){var t=i.stream(n),e=o.stream(n),r=a.stream(n);return{point:function(n,u){t.point(n,u),e.point(n,u),r.point(n,u)},sphere:function(){t.sphere(),e.sphere(),r.sphere()},lineStart:function(){t.lineStart(),e.lineStart(),r.lineStart()},lineEnd:function(){t.lineEnd(),e.lineEnd(),r.lineEnd()},polygonStart:function(){t.polygonStart(),e.polygonStart(),r.polygonStart()},polygonEnd:function(){t.polygonEnd(),e.polygonEnd(),r.polygonEnd()}}},n.precision=function(t){return arguments.length?(i.precision(t),o.precision(t),a.precision(t),n):i.precision()},n.scale=function(t){return arguments.length?(i.scale(t),o.scale(.35*t),a.scale(t),n.translate(i.translate())):i.scale()},n.translate=function(t){if(!arguments.length)return i.translate();var s=i.scale(),l=+t[0],f=+t[1];return e=i.translate(t).clipExtent([[l-.455*s,f-.238*s],[l+.455*s,f+.238*s]]).stream(c).point,r=o.translate([l-.307*s,f+.201*s]).clipExtent([[l-.425*s+ka,f+.12*s+ka],[l-.214*s-ka,f+.234*s-ka]]).stream(c).point,u=a.translate([l-.205*s,f+.212*s]).clipExtent([[l-.214*s+ka,f+.166*s+ka],[l-.115*s-ka,f+.234*s-ka]]).stream(c).point,n},n.scale(1070)};var Ec,Ac,Cc,Nc,zc,Lc,Tc={point:v,lineStart:v,lineEnd:v,polygonStart:function(){Ac=0,Tc.lineStart=Oe},polygonEnd:function(){Tc.lineStart=Tc.lineEnd=Tc.point=v,Ec+=ua(Ac/2)}},qc={point:Ye,lineStart:v,lineEnd:v,polygonStart:v,polygonEnd:v},Rc={point:Ve,lineStart:Xe,lineEnd:$e,polygonStart:function(){Rc.lineStart=Be},polygonEnd:function(){Rc.point=Ve,Rc.lineStart=Xe,Rc.lineEnd=$e}};Zo.geo.path=function(){function n(n){return n&&("function"==typeof a&&i.pointRadius(+a.apply(this,arguments)),o&&o.valid||(o=u(i)),Zo.geo.stream(n,o)),i.result()}function t(){return o=null,n}var e,r,u,i,o,a=4.5;return n.area=function(n){return Ec=0,Zo.geo.stream(n,u(Tc)),Ec},n.centroid=function(n){return pc=vc=dc=mc=yc=xc=Mc=_c=bc=0,Zo.geo.stream(n,u(Rc)),bc?[Mc/bc,_c/bc]:xc?[mc/xc,yc/xc]:dc?[pc/dc,vc/dc]:[0/0,0/0]},n.bounds=function(n){return 
zc=Lc=-(Cc=Nc=1/0),Zo.geo.stream(n,u(qc)),[[Cc,Nc],[zc,Lc]]},n.projection=function(n){return arguments.length?(u=(e=n)?n.stream||Ge(n):wt,t()):e},n.context=function(n){return arguments.length?(i=null==(r=n)?new Ie:new We(n),"function"!=typeof a&&i.pointRadius(a),t()):r},n.pointRadius=function(t){return arguments.length?(a="function"==typeof t?t:(i.pointRadius(+t),+t),n):a},n.projection(Zo.geo.albersUsa()).context(null)},Zo.geo.transform=function(n){return{stream:function(t){var e=new Ke(t);for(var r in n)e[r]=n[r];return e}}},Ke.prototype={point:function(n,t){this.stream.point(n,t)},sphere:function(){this.stream.sphere()},lineStart:function(){this.stream.lineStart()},lineEnd:function(){this.stream.lineEnd()},polygonStart:function(){this.stream.polygonStart()},polygonEnd:function(){this.stream.polygonEnd()}},Zo.geo.projection=nr,Zo.geo.projectionMutator=tr,(Zo.geo.equirectangular=function(){return nr(rr)}).raw=rr.invert=rr,Zo.geo.rotation=function(n){function t(t){return t=n(t[0]*Aa,t[1]*Aa),t[0]*=Ca,t[1]*=Ca,t}return n=ir(n[0]%360*Aa,n[1]*Aa,n.length>2?n[2]*Aa:0),t.invert=function(t){return t=n.invert(t[0]*Aa,t[1]*Aa),t[0]*=Ca,t[1]*=Ca,t},t},ur.invert=rr,Zo.geo.circle=function(){function n(){var n="function"==typeof r?r.apply(this,arguments):r,t=ir(-n[0]*Aa,-n[1]*Aa,0).invert,u=[];return e(null,null,1,{point:function(n,e){u.push(n=t(n,e)),n[0]*=Ca,n[1]*=Ca}}),{type:"Polygon",coordinates:[u]}}var t,e,r=[0,0],u=6;return n.origin=function(t){return arguments.length?(r=t,n):r},n.angle=function(r){return arguments.length?(e=sr((t=+r)*Aa,u*Aa),n):t},n.precision=function(r){return arguments.length?(e=sr(t*Aa,(u=+r)*Aa),n):u},n.angle(90)},Zo.geo.distance=function(n,t){var e,r=(t[0]-n[0])*Aa,u=n[1]*Aa,i=t[1]*Aa,o=Math.sin(r),a=Math.cos(r),c=Math.sin(u),s=Math.cos(u),l=Math.sin(i),f=Math.cos(i);return Math.atan2(Math.sqrt((e=f*o)*e+(e=s*l-c*f*a)*e),c*l+s*f*a)},Zo.geo.graticule=function(){function n(){return{type:"MultiLineString",coordinates:t()}}function t(){return Zo.range(Math.ceil(i/d)*d,u,d).map(h).concat(Zo.range(Math.ceil(s/m)*m,c,m).map(g)).concat(Zo.range(Math.ceil(r/p)*p,e,p).filter(function(n){return ua(n%d)>ka}).map(l)).concat(Zo.range(Math.ceil(a/v)*v,o,v).filter(function(n){return ua(n%m)>ka}).map(f))}var e,r,u,i,o,a,c,s,l,f,h,g,p=10,v=p,d=90,m=360,y=2.5;return n.lines=function(){return t().map(function(n){return{type:"LineString",coordinates:n}})},n.outline=function(){return{type:"Polygon",coordinates:[h(i).concat(g(c).slice(1),h(u).reverse().slice(1),g(s).reverse().slice(1))]}},n.extent=function(t){return arguments.length?n.majorExtent(t).minorExtent(t):n.minorExtent()},n.majorExtent=function(t){return arguments.length?(i=+t[0][0],u=+t[1][0],s=+t[0][1],c=+t[1][1],i>u&&(t=i,i=u,u=t),s>c&&(t=s,s=c,c=t),n.precision(y)):[[i,s],[u,c]]},n.minorExtent=function(t){return arguments.length?(r=+t[0][0],e=+t[1][0],a=+t[0][1],o=+t[1][1],r>e&&(t=r,r=e,e=t),a>o&&(t=a,a=o,o=t),n.precision(y)):[[r,a],[e,o]]},n.step=function(t){return arguments.length?n.majorStep(t).minorStep(t):n.minorStep()},n.majorStep=function(t){return arguments.length?(d=+t[0],m=+t[1],n):[d,m]},n.minorStep=function(t){return arguments.length?(p=+t[0],v=+t[1],n):[p,v]},n.precision=function(t){return arguments.length?(y=+t,l=fr(a,o,90),f=hr(r,e,y),h=fr(s,c,90),g=hr(i,u,y),n):y},n.majorExtent([[-180,-90+ka],[180,90-ka]]).minorExtent([[-180,-80-ka],[180,80+ka]])},Zo.geo.greatArc=function(){function n(){return{type:"LineString",coordinates:[t||r.apply(this,arguments),e||u.apply(this,arguments)]}}var t,e,r=gr,u=pr;return 
n.distance=function(){return Zo.geo.distance(t||r.apply(this,arguments),e||u.apply(this,arguments))},n.source=function(e){return arguments.length?(r=e,t="function"==typeof e?null:e,n):r},n.target=function(t){return arguments.length?(u=t,e="function"==typeof t?null:t,n):u},n.precision=function(){return arguments.length?n:0},n},Zo.geo.interpolate=function(n,t){return vr(n[0]*Aa,n[1]*Aa,t[0]*Aa,t[1]*Aa)},Zo.geo.length=function(n){return Dc=0,Zo.geo.stream(n,Pc),Dc};var Dc,Pc={sphere:v,point:v,lineStart:dr,lineEnd:v,polygonStart:v,polygonEnd:v},Uc=mr(function(n){return Math.sqrt(2/(1+n))},function(n){return 2*Math.asin(n/2)});(Zo.geo.azimuthalEqualArea=function(){return nr(Uc)}).raw=Uc;var jc=mr(function(n){var t=Math.acos(n);return t&&t/Math.sin(t)},wt);(Zo.geo.azimuthalEquidistant=function(){return nr(jc)}).raw=jc,(Zo.geo.conicConformal=function(){return He(yr)}).raw=yr,(Zo.geo.conicEquidistant=function(){return He(xr)}).raw=xr;var Hc=mr(function(n){return 1/n},Math.atan);(Zo.geo.gnomonic=function(){return nr(Hc)}).raw=Hc,Mr.invert=function(n,t){return[n,2*Math.atan(Math.exp(t))-Sa]},(Zo.geo.mercator=function(){return _r(Mr)}).raw=Mr;var Fc=mr(function(){return 1},Math.asin);(Zo.geo.orthographic=function(){return nr(Fc)}).raw=Fc;var Oc=mr(function(n){return 1/(1+n)},function(n){return 2*Math.atan(n)});(Zo.geo.stereographic=function(){return nr(Oc)}).raw=Oc,br.invert=function(n,t){return[-t,2*Math.atan(Math.exp(n))-Sa]},(Zo.geo.transverseMercator=function(){var n=_r(br),t=n.center,e=n.rotate;return n.center=function(n){return n?t([-n[1],n[0]]):(n=t(),[n[1],-n[0]])},n.rotate=function(n){return n?e([n[0],n[1],n.length>2?n[2]+90:90]):(n=e(),[n[0],n[1],n[2]-90])},e([0,0,90])}).raw=br,Zo.geom={},Zo.geom.hull=function(n){function t(n){if(n.length<3)return[];var t,u=bt(e),i=bt(r),o=n.length,a=[],c=[];for(t=0;o>t;t++)a.push([+u.call(this,n[t],t),+i.call(this,n[t],t),t]);for(a.sort(Er),t=0;o>t;t++)c.push([a[t][0],-a[t][1]]);var s=kr(a),l=kr(c),f=l[0]===s[0],h=l[l.length-1]===s[s.length-1],g=[];for(t=s.length-1;t>=0;--t)g.push(n[a[s[t]][2]]);for(t=+f;t<l.length-h;++t)g.push(n[a[l[t]][2]]);return g}var e=wr,r=Sr;return arguments.length?t(n):(t.x=function(n){return arguments.length?(e=n,t):e},t.y=function(n){return arguments.length?(r=n,t):r},t)},Zo.geom.polygon=function(n){return sa(n,Yc),n};var Yc=Zo.geom.polygon.prototype=[];Yc.area=function(){for(var n,t=-1,e=this.length,r=this[e-1],u=0;++t<e;)n=r,r=this[t],u+=n[1]*r[0]-n[0]*r[1];return.5*u},Yc.centroid=function(n){var t,e,r=-1,u=this.length,i=0,o=0,a=this[u-1];for(arguments.length||(n=-1/(6*this.area()));++r<u;)t=a,a=this[r],e=t[0]*a[1]-a[0]*t[1],i+=(t[0]+a[0])*e,o+=(t[1]+a[1])*e;return[i*n,o*n]},Yc.clip=function(n){for(var t,e,r,u,i,o,a=Nr(n),c=-1,s=this.length-Nr(this),l=this[s-1];++c<s;){for(t=n.slice(),n.length=0,u=this[c],i=t[(r=t.length-a)-1],e=-1;++e<r;)o=t[e],Ar(o,l,u)?(Ar(i,l,u)||n.push(Cr(i,o,l,u)),n.push(o)):Ar(i,l,u)&&n.push(Cr(i,o,l,u)),i=o;a&&n.push(n[0]),l=u}return n};var Ic,Zc,Vc,Xc,$c,Bc=[],Wc=[];Ur.prototype.prepare=function(){for(var n,t=this.edges,e=t.length;e--;)n=t[e].edge,n.b&&n.a||t.splice(e,1);return t.sort(Hr),t.length},Wr.prototype={start:function(){return this.edge.l===this.site?this.edge.a:this.edge.b},end:function(){return this.edge.l===this.site?this.edge.b:this.edge.a}},Jr.prototype={insert:function(n,t){var e,r,u;if(n){if(t.P=n,t.N=n.N,n.N&&(n.N.P=t),n.N=t,n.R){for(n=n.R;n.L;)n=n.L;n.L=t}else n.R=t;e=n}else 
this._?(n=nu(this._),t.P=null,t.N=n,n.P=n.L=t,e=n):(t.P=t.N=null,this._=t,e=null);for(t.L=t.R=null,t.U=e,t.C=!0,n=t;e&&e.C;)r=e.U,e===r.L?(u=r.R,u&&u.C?(e.C=u.C=!1,r.C=!0,n=r):(n===e.R&&(Kr(this,e),n=e,e=n.U),e.C=!1,r.C=!0,Qr(this,r))):(u=r.L,u&&u.C?(e.C=u.C=!1,r.C=!0,n=r):(n===e.L&&(Qr(this,e),n=e,e=n.U),e.C=!1,r.C=!0,Kr(this,r))),e=n.U;this._.C=!1},remove:function(n){n.N&&(n.N.P=n.P),n.P&&(n.P.N=n.N),n.N=n.P=null;var t,e,r,u=n.U,i=n.L,o=n.R;if(e=i?o?nu(o):i:o,u?u.L===n?u.L=e:u.R=e:this._=e,i&&o?(r=e.C,e.C=n.C,e.L=i,i.U=e,e!==o?(u=e.U,e.U=n.U,n=e.R,u.L=n,e.R=o,o.U=e):(e.U=u,u=e,n=e.R)):(r=n.C,n=e),n&&(n.U=u),!r){if(n&&n.C)return n.C=!1,void 0;do{if(n===this._)break;if(n===u.L){if(t=u.R,t.C&&(t.C=!1,u.C=!0,Kr(this,u),t=u.R),t.L&&t.L.C||t.R&&t.R.C){t.R&&t.R.C||(t.L.C=!1,t.C=!0,Qr(this,t),t=u.R),t.C=u.C,u.C=t.R.C=!1,Kr(this,u),n=this._;break}}else if(t=u.L,t.C&&(t.C=!1,u.C=!0,Qr(this,u),t=u.L),t.L&&t.L.C||t.R&&t.R.C){t.L&&t.L.C||(t.R.C=!1,t.C=!0,Kr(this,t),t=u.L),t.C=u.C,u.C=t.L.C=!1,Qr(this,u),n=this._;break}t.C=!0,n=u,u=u.U}while(!n.C);n&&(n.C=!1)}}},Zo.geom.voronoi=function(n){function t(n){var t=new Array(n.length),r=a[0][0],u=a[0][1],i=a[1][0],o=a[1][1];return tu(e(n),a).cells.forEach(function(e,a){var c=e.edges,s=e.site,l=t[a]=c.length?c.map(function(n){var t=n.start();return[t.x,t.y]}):s.x>=r&&s.x<=i&&s.y>=u&&s.y<=o?[[r,o],[i,o],[i,u],[r,u]]:[];l.point=n[a]}),t}function e(n){return n.map(function(n,t){return{x:Math.round(i(n,t)/ka)*ka,y:Math.round(o(n,t)/ka)*ka,i:t}})}var r=wr,u=Sr,i=r,o=u,a=Jc;return n?t(n):(t.links=function(n){return tu(e(n)).edges.filter(function(n){return n.l&&n.r}).map(function(t){return{source:n[t.l.i],target:n[t.r.i]}})},t.triangles=function(n){var t=[];return tu(e(n)).cells.forEach(function(e,r){for(var u,i,o=e.site,a=e.edges.sort(Hr),c=-1,s=a.length,l=a[s-1].edge,f=l.l===o?l.r:l.l;++c<s;)u=l,i=f,l=a[c].edge,f=l.l===o?l.r:l.l,r<i.i&&r<f.i&&ru(o,i,f)<0&&t.push([n[r],n[i.i],n[f.i]])}),t},t.x=function(n){return arguments.length?(i=bt(r=n),t):r},t.y=function(n){return arguments.length?(o=bt(u=n),t):u},t.clipExtent=function(n){return arguments.length?(a=null==n?Jc:n,t):a===Jc?null:a},t.size=function(n){return arguments.length?t.clipExtent(n&&[[0,0],n]):a===Jc?null:a&&a[1]},t)};var Jc=[[-1e6,-1e6],[1e6,1e6]];Zo.geom.delaunay=function(n){return Zo.geom.voronoi().triangles(n)},Zo.geom.quadtree=function(n,t,e,r,u){function i(n){function i(n,t,e,r,u,i,o,a){if(!isNaN(e)&&!isNaN(r))if(n.leaf){var c=n.x,l=n.y;if(null!=c)if(ua(c-e)+ua(l-r)<.01)s(n,t,e,r,u,i,o,a);else{var f=n.point;n.x=n.y=n.point=null,s(n,f,c,l,u,i,o,a),s(n,t,e,r,u,i,o,a)}else n.x=e,n.y=r,n.point=t}else s(n,t,e,r,u,i,o,a)}function s(n,t,e,r,u,o,a,c){var s=.5*(u+a),l=.5*(o+c),f=e>=s,h=r>=l,g=(h<<1)+f;n.leaf=!1,n=n.nodes[g]||(n.nodes[g]=ou()),f?u=s:a=s,h?o=l:c=l,i(n,t,e,r,u,o,a,c)}var l,f,h,g,p,v,d,m,y,x=bt(a),M=bt(c);if(null!=t)v=t,d=e,m=r,y=u;else if(m=y=-(v=d=1/0),f=[],h=[],p=n.length,o)for(g=0;p>g;++g)l=n[g],l.x<v&&(v=l.x),l.y<d&&(d=l.y),l.x>m&&(m=l.x),l.y>y&&(y=l.y),f.push(l.x),h.push(l.y);else for(g=0;p>g;++g){var _=+x(l=n[g],g),b=+M(l,g);v>_&&(v=_),d>b&&(d=b),_>m&&(m=_),b>y&&(y=b),f.push(_),h.push(b)}var w=m-v,S=y-d;w>S?y=d+w:m=v+S;var k=ou();if(k.add=function(n){i(k,n,+x(n,++g),+M(n,g),v,d,m,y)},k.visit=function(n){au(n,k,v,d,m,y)},g=-1,null==t){for(;++g<p;)i(k,n[g],f[g],h[g],v,d,m,y);--g}else n.forEach(k.add);return f=h=n=l=null,k}var o,a=wr,c=Sr;return(o=arguments.length)?(a=uu,c=iu,3===o&&(u=e,r=t,e=t=0),i(n)):(i.x=function(n){return arguments.length?(a=n,i):a},i.y=function(n){return 
arguments.length?(c=n,i):c},i.extent=function(n){return arguments.length?(null==n?t=e=r=u=null:(t=+n[0][0],e=+n[0][1],r=+n[1][0],u=+n[1][1]),i):null==t?null:[[t,e],[r,u]]},i.size=function(n){return arguments.length?(null==n?t=e=r=u=null:(t=e=0,r=+n[0],u=+n[1]),i):null==t?null:[r-t,u-e]},i)},Zo.interpolateRgb=cu,Zo.interpolateObject=su,Zo.interpolateNumber=lu,Zo.interpolateString=fu;var Gc=/[-+]?(?:\d+\.?\d*|\.?\d+)(?:[eE][-+]?\d+)?/g,Kc=new RegExp(Gc.source,"g");Zo.interpolate=hu,Zo.interpolators=[function(n,t){var e=typeof t;return("string"===e?Ia.has(t)||/^(#|rgb\(|hsl\()/.test(t)?cu:fu:t instanceof et?cu:Array.isArray(t)?gu:"object"===e&&isNaN(t)?su:lu)(n,t)}],Zo.interpolateArray=gu;var Qc=function(){return wt},ns=Zo.map({linear:Qc,poly:Mu,quad:function(){return mu},cubic:function(){return yu},sin:function(){return _u},exp:function(){return bu},circle:function(){return wu},elastic:Su,back:ku,bounce:function(){return Eu}}),ts=Zo.map({"in":wt,out:vu,"in-out":du,"out-in":function(n){return du(vu(n))}});Zo.ease=function(n){var t=n.indexOf("-"),e=t>=0?n.substring(0,t):n,r=t>=0?n.substring(t+1):"in";return e=ns.get(e)||Qc,r=ts.get(r)||wt,pu(r(e.apply(null,Vo.call(arguments,1))))},Zo.interpolateHcl=Au,Zo.interpolateHsl=Cu,Zo.interpolateLab=Nu,Zo.interpolateRound=zu,Zo.transform=function(n){var t=$o.createElementNS(Zo.ns.prefix.svg,"g");return(Zo.transform=function(n){if(null!=n){t.setAttribute("transform",n);var e=t.transform.baseVal.consolidate()}return new Lu(e?e.matrix:es)})(n)},Lu.prototype.toString=function(){return"translate("+this.translate+")rotate("+this.rotate+")skewX("+this.skew+")scale("+this.scale+")"};var es={a:1,b:0,c:0,d:1,e:0,f:0};Zo.interpolateTransform=Du,Zo.layout={},Zo.layout.bundle=function(){return function(n){for(var t=[],e=-1,r=n.length;++e<r;)t.push(ju(n[e]));return t}},Zo.layout.chord=function(){function n(){var n,s,f,h,g,p={},v=[],d=Zo.range(i),m=[];for(e=[],r=[],n=0,h=-1;++h<i;){for(s=0,g=-1;++g<i;)s+=u[h][g];v.push(s),m.push(Zo.range(i)),n+=s}for(o&&d.sort(function(n,t){return o(v[n],v[t])}),a&&m.forEach(function(n,t){n.sort(function(n,e){return a(u[t][n],u[t][e])})}),n=(wa-l*i)/n,s=0,h=-1;++h<i;){for(f=s,g=-1;++g<i;){var y=d[h],x=m[y][g],M=u[y][x],_=s,b=s+=M*n;p[y+"-"+x]={index:y,subindex:x,startAngle:_,endAngle:b,value:M}}r[y]={index:y,startAngle:f,endAngle:s,value:(s-f)/n},s+=l}for(h=-1;++h<i;)for(g=h-1;++g<i;){var w=p[h+"-"+g],S=p[g+"-"+h];(w.value||S.value)&&e.push(w.value<S.value?{source:S,target:w}:{source:w,target:S})}c&&t()}function t(){e.sort(function(n,t){return c((n.source.value+n.target.value)/2,(t.source.value+t.target.value)/2)})}var e,r,u,i,o,a,c,s={},l=0;return s.matrix=function(n){return arguments.length?(i=(u=n)&&u.length,e=r=null,s):u},s.padding=function(n){return arguments.length?(l=n,e=r=null,s):l},s.sortGroups=function(n){return arguments.length?(o=n,e=r=null,s):o},s.sortSubgroups=function(n){return arguments.length?(a=n,e=null,s):a},s.sortChords=function(n){return arguments.length?(c=n,e&&t(),s):c},s.chords=function(){return e||n(),e},s.groups=function(){return r||n(),r},s},Zo.layout.force=function(){function n(n){return function(t,e,r,u){if(t.point!==n){var i=t.cx-n.x,o=t.cy-n.y,a=u-e,c=i*i+o*o;if(c>a*a/d){if(p>c){var s=t.charge/c;n.px-=i*s,n.py-=o*s}return!0}if(t.point&&c&&p>c){var s=t.pointCharge/c;n.px-=i*s,n.py-=o*s}}return!t.charge}}function t(n){n.px=Zo.event.x,n.py=Zo.event.y,a.resume()}var e,r,u,i,o,a={},c=Zo.dispatch("start","tick","end"),s=[1,1],l=.9,f=rs,h=us,g=-30,p=is,v=.1,d=.64,m=[],y=[];return 
a.tick=function(){if((r*=.99)<.005)return c.end({type:"end",alpha:r=0}),!0;var t,e,a,f,h,p,d,x,M,_=m.length,b=y.length;for(e=0;b>e;++e)a=y[e],f=a.source,h=a.target,x=h.x-f.x,M=h.y-f.y,(p=x*x+M*M)&&(p=r*i[e]*((p=Math.sqrt(p))-u[e])/p,x*=p,M*=p,h.x-=x*(d=f.weight/(h.weight+f.weight)),h.y-=M*d,f.x+=x*(d=1-d),f.y+=M*d);if((d=r*v)&&(x=s[0]/2,M=s[1]/2,e=-1,d))for(;++e<_;)a=m[e],a.x+=(x-a.x)*d,a.y+=(M-a.y)*d;if(g)for(Vu(t=Zo.geom.quadtree(m),r,o),e=-1;++e<_;)(a=m[e]).fixed||t.visit(n(a));for(e=-1;++e<_;)a=m[e],a.fixed?(a.x=a.px,a.y=a.py):(a.x-=(a.px-(a.px=a.x))*l,a.y-=(a.py-(a.py=a.y))*l);c.tick({type:"tick",alpha:r})},a.nodes=function(n){return arguments.length?(m=n,a):m},a.links=function(n){return arguments.length?(y=n,a):y},a.size=function(n){return arguments.length?(s=n,a):s},a.linkDistance=function(n){return arguments.length?(f="function"==typeof n?n:+n,a):f},a.distance=a.linkDistance,a.linkStrength=function(n){return arguments.length?(h="function"==typeof n?n:+n,a):h},a.friction=function(n){return arguments.length?(l=+n,a):l},a.charge=function(n){return arguments.length?(g="function"==typeof n?n:+n,a):g},a.chargeDistance=function(n){return arguments.length?(p=n*n,a):Math.sqrt(p)},a.gravity=function(n){return arguments.length?(v=+n,a):v},a.theta=function(n){return arguments.length?(d=n*n,a):Math.sqrt(d)},a.alpha=function(n){return arguments.length?(n=+n,r?r=n>0?n:0:n>0&&(c.start({type:"start",alpha:r=n}),Zo.timer(a.tick)),a):r},a.start=function(){function n(n,r){if(!e){for(e=new Array(c),a=0;c>a;++a)e[a]=[];for(a=0;s>a;++a){var u=y[a];e[u.source.index].push(u.target),e[u.target.index].push(u.source)}}for(var i,o=e[t],a=-1,s=o.length;++a<s;)if(!isNaN(i=o[a][n]))return i;return Math.random()*r}var t,e,r,c=m.length,l=y.length,p=s[0],v=s[1];for(t=0;c>t;++t)(r=m[t]).index=t,r.weight=0;for(t=0;l>t;++t)r=y[t],"number"==typeof r.source&&(r.source=m[r.source]),"number"==typeof r.target&&(r.target=m[r.target]),++r.source.weight,++r.target.weight;for(t=0;c>t;++t)r=m[t],isNaN(r.x)&&(r.x=n("x",p)),isNaN(r.y)&&(r.y=n("y",v)),isNaN(r.px)&&(r.px=r.x),isNaN(r.py)&&(r.py=r.y);if(u=[],"function"==typeof f)for(t=0;l>t;++t)u[t]=+f.call(this,y[t],t);else for(t=0;l>t;++t)u[t]=f;if(i=[],"function"==typeof h)for(t=0;l>t;++t)i[t]=+h.call(this,y[t],t);else for(t=0;l>t;++t)i[t]=h;if(o=[],"function"==typeof g)for(t=0;c>t;++t)o[t]=+g.call(this,m[t],t);else for(t=0;c>t;++t)o[t]=g;return a.resume()},a.resume=function(){return a.alpha(.1)},a.stop=function(){return a.alpha(0)},a.drag=function(){return e||(e=Zo.behavior.drag().origin(wt).on("dragstart.force",Ou).on("drag.force",t).on("dragend.force",Yu)),arguments.length?(this.on("mouseover.force",Iu).on("mouseout.force",Zu).call(e),void 0):e},Zo.rebind(a,c,"on")};var rs=20,us=1,is=1/0;Zo.layout.hierarchy=function(){function n(u){var i,o=[u],a=[];for(u.depth=0;null!=(i=o.pop());)if(a.push(i),(s=e.call(n,i,i.depth))&&(c=s.length)){for(var c,s,l;--c>=0;)o.push(l=s[c]),l.parent=i,l.depth=i.depth+1;r&&(i.value=0),i.children=s}else r&&(i.value=+r.call(n,i,i.depth)||0),delete i.children;return Bu(u,function(n){var e,u;t&&(e=n.children)&&e.sort(t),r&&(u=n.parent)&&(u.value+=n.value)}),a}var t=Gu,e=Wu,r=Ju;return n.sort=function(e){return arguments.length?(t=e,n):t},n.children=function(t){return arguments.length?(e=t,n):e},n.value=function(t){return arguments.length?(r=t,n):r},n.revalue=function(t){return r&&($u(t,function(n){n.children&&(n.value=0)}),Bu(t,function(t){var 
e;t.children||(t.value=+r.call(n,t,t.depth)||0),(e=t.parent)&&(e.value+=t.value)})),t},n},Zo.layout.partition=function(){function n(t,e,r,u){var i=t.children;if(t.x=e,t.y=t.depth*u,t.dx=r,t.dy=u,i&&(o=i.length)){var o,a,c,s=-1;for(r=t.value?r/t.value:0;++s<o;)n(a=i[s],e,c=a.value*r,u),e+=c}}function t(n){var e=n.children,r=0;if(e&&(u=e.length))for(var u,i=-1;++i<u;)r=Math.max(r,t(e[i]));return 1+r}function e(e,i){var o=r.call(this,e,i);return n(o[0],0,u[0],u[1]/t(o[0])),o}var r=Zo.layout.hierarchy(),u=[1,1];return e.size=function(n){return arguments.length?(u=n,e):u},Xu(e,r)},Zo.layout.pie=function(){function n(i){var o=i.map(function(e,r){return+t.call(n,e,r)}),a=+("function"==typeof r?r.apply(this,arguments):r),c=(("function"==typeof u?u.apply(this,arguments):u)-a)/Zo.sum(o),s=Zo.range(i.length);null!=e&&s.sort(e===os?function(n,t){return o[t]-o[n]}:function(n,t){return e(i[n],i[t])});var l=[];return s.forEach(function(n){var t;l[n]={data:i[n],value:t=o[n],startAngle:a,endAngle:a+=t*c}}),l}var t=Number,e=os,r=0,u=wa;return n.value=function(e){return arguments.length?(t=e,n):t},n.sort=function(t){return arguments.length?(e=t,n):e},n.startAngle=function(t){return arguments.length?(r=t,n):r},n.endAngle=function(t){return arguments.length?(u=t,n):u},n};var os={};Zo.layout.stack=function(){function n(a,c){var s=a.map(function(e,r){return t.call(n,e,r)}),l=s.map(function(t){return t.map(function(t,e){return[i.call(n,t,e),o.call(n,t,e)]})}),f=e.call(n,l,c);s=Zo.permute(s,f),l=Zo.permute(l,f);var h,g,p,v=r.call(n,l,c),d=s.length,m=s[0].length;for(g=0;m>g;++g)for(u.call(n,s[0][g],p=v[g],l[0][g][1]),h=1;d>h;++h)u.call(n,s[h][g],p+=l[h-1][g][1],l[h][g][1]);return a}var t=wt,e=ei,r=ri,u=ti,i=Qu,o=ni;return n.values=function(e){return arguments.length?(t=e,n):t},n.order=function(t){return arguments.length?(e="function"==typeof t?t:as.get(t)||ei,n):e},n.offset=function(t){return arguments.length?(r="function"==typeof t?t:cs.get(t)||ri,n):r},n.x=function(t){return arguments.length?(i=t,n):i},n.y=function(t){return arguments.length?(o=t,n):o},n.out=function(t){return arguments.length?(u=t,n):u},n};var as=Zo.map({"inside-out":function(n){var t,e,r=n.length,u=n.map(ui),i=n.map(ii),o=Zo.range(r).sort(function(n,t){return u[n]-u[t]}),a=0,c=0,s=[],l=[];for(t=0;r>t;++t)e=o[t],c>a?(a+=i[e],s.push(e)):(c+=i[e],l.push(e));return l.reverse().concat(s)},reverse:function(n){return Zo.range(n.length).reverse()},"default":ei}),cs=Zo.map({silhouette:function(n){var t,e,r,u=n.length,i=n[0].length,o=[],a=0,c=[];for(e=0;i>e;++e){for(t=0,r=0;u>t;t++)r+=n[t][e][1];r>a&&(a=r),o.push(r)}for(e=0;i>e;++e)c[e]=(a-o[e])/2;return c},wiggle:function(n){var t,e,r,u,i,o,a,c,s,l=n.length,f=n[0],h=f.length,g=[];for(g[0]=c=s=0,e=1;h>e;++e){for(t=0,u=0;l>t;++t)u+=n[t][e][1];for(t=0,i=0,a=f[e][0]-f[e-1][0];l>t;++t){for(r=0,o=(n[t][e][1]-n[t][e-1][1])/(2*a);t>r;++r)o+=(n[r][e][1]-n[r][e-1][1])/a;i+=o*n[t][e][1]}g[e]=c-=u?i/u*a:0,s>c&&(s=c)}for(e=0;h>e;++e)g[e]-=s;return g},expand:function(n){var t,e,r,u=n.length,i=n[0].length,o=1/u,a=[];for(e=0;i>e;++e){for(t=0,r=0;u>t;t++)r+=n[t][e][1];if(r)for(t=0;u>t;t++)n[t][e][1]/=r;else for(t=0;u>t;t++)n[t][e][1]=o}for(e=0;i>e;++e)a[e]=0;return a},zero:ri});Zo.layout.histogram=function(){function n(n,i){for(var o,a,c=[],s=n.map(e,this),l=r.call(this,s,i),f=u.call(this,l,s,i),i=-1,h=s.length,g=f.length-1,p=t?1:1/h;++i<g;)o=c[i]=[],o.dx=f[i+1]-(o.x=f[i]),o.y=0;if(g>0)for(i=-1;++i<h;)a=s[i],a>=l[0]&&a<=l[1]&&(o=c[Zo.bisect(f,a,1,g)-1],o.y+=p,o.push(n[i]));return c}var t=!0,e=Number,r=si,u=ai;return 
n.value=function(t){return arguments.length?(e=t,n):e},n.range=function(t){return arguments.length?(r=bt(t),n):r},n.bins=function(t){return arguments.length?(u="number"==typeof t?function(n){return ci(n,t)}:bt(t),n):u},n.frequency=function(e){return arguments.length?(t=!!e,n):t},n},Zo.layout.pack=function(){function n(n,i){var o=e.call(this,n,i),a=o[0],c=u[0],s=u[1],l=null==t?Math.sqrt:"function"==typeof t?t:function(){return t};if(a.x=a.y=0,Bu(a,function(n){n.r=+l(n.value)}),Bu(a,pi),r){var f=r*(t?1:Math.max(2*a.r/c,2*a.r/s))/2;Bu(a,function(n){n.r+=f}),Bu(a,pi),Bu(a,function(n){n.r-=f})}return mi(a,c/2,s/2,t?1:1/Math.max(2*a.r/c,2*a.r/s)),o}var t,e=Zo.layout.hierarchy().sort(li),r=0,u=[1,1];return n.size=function(t){return arguments.length?(u=t,n):u},n.radius=function(e){return arguments.length?(t=null==e||"function"==typeof e?e:+e,n):t},n.padding=function(t){return arguments.length?(r=+t,n):r},Xu(n,e)},Zo.layout.tree=function(){function n(n,u){var l=o.call(this,n,u),f=l[0],h=t(f);if(Bu(h,e),h.parent.m=-h.z,$u(h,r),s)$u(f,i);else{var g=f,p=f,v=f;$u(f,function(n){n.x<g.x&&(g=n),n.x>p.x&&(p=n),n.depth>v.depth&&(v=n)});var d=a(g,p)/2-g.x,m=c[0]/(p.x+a(p,g)/2+d),y=c[1]/(v.depth||1);$u(f,function(n){n.x=(n.x+d)*m,n.y=n.depth*y})}return l}function t(n){for(var t,e={A:null,children:[n]},r=[e];null!=(t=r.pop());)for(var u,i=t.children,o=0,a=i.length;a>o;++o)r.push((i[o]=u={_:i[o],parent:t,children:(u=i[o].children)&&u.slice()||[],A:null,a:null,z:0,m:0,c:0,s:0,t:null,i:o}).a=u);return e.children[0]}function e(n){var t=n.children,e=n.parent.children,r=n.i?e[n.i-1]:null;if(t.length){wi(n);var i=(t[0].z+t[t.length-1].z)/2;r?(n.z=r.z+a(n._,r._),n.m=n.z-i):n.z=i}else r&&(n.z=r.z+a(n._,r._));n.parent.A=u(n,r,n.parent.A||e[0])}function r(n){n._.x=n.z+n.parent.m,n.m+=n.parent.m}function u(n,t,e){if(t){for(var r,u=n,i=n,o=t,c=u.parent.children[0],s=u.m,l=i.m,f=o.m,h=c.m;o=_i(o),u=Mi(u),o&&u;)c=Mi(c),i=_i(i),i.a=n,r=o.z+f-u.z-s+a(o._,u._),r>0&&(bi(Si(o,n,e),n,r),s+=r,l+=r),f+=o.m,s+=u.m,h+=c.m,l+=i.m;o&&!_i(i)&&(i.t=o,i.m+=f-l),u&&!Mi(c)&&(c.t=u,c.m+=s-h,e=n)}return e}function i(n){n.x*=c[0],n.y=n.depth*c[1]}var o=Zo.layout.hierarchy().sort(null).value(null),a=xi,c=[1,1],s=null;return n.separation=function(t){return arguments.length?(a=t,n):a},n.size=function(t){return arguments.length?(s=null==(c=t)?i:null,n):s?null:c},n.nodeSize=function(t){return arguments.length?(s=null==(c=t)?null:i,n):s?c:null},Xu(n,o)},Zo.layout.cluster=function(){function n(n,i){var o,a=t.call(this,n,i),c=a[0],s=0;Bu(c,function(n){var t=n.children;t&&t.length?(n.x=Ei(t),n.y=ki(t)):(n.x=o?s+=e(n,o):0,n.y=0,o=n)});var l=Ai(c),f=Ci(c),h=l.x-e(l,f)/2,g=f.x+e(f,l)/2;return Bu(c,u?function(n){n.x=(n.x-c.x)*r[0],n.y=(c.y-n.y)*r[1]}:function(n){n.x=(n.x-h)/(g-h)*r[0],n.y=(1-(c.y?n.y/c.y:1))*r[1]}),a}var t=Zo.layout.hierarchy().sort(null).value(null),e=xi,r=[1,1],u=!1;return n.separation=function(t){return arguments.length?(e=t,n):e},n.size=function(t){return arguments.length?(u=null==(r=t),n):u?null:r},n.nodeSize=function(t){return arguments.length?(u=null!=(r=t),n):u?r:null},Xu(n,t)},Zo.layout.treemap=function(){function n(n,t){for(var e,r,u=-1,i=n.length;++u<i;)r=(e=n[u]).value*(0>t?0:t),e.area=isNaN(r)||0>=r?0:r}function t(e){var i=e.children;if(i&&i.length){var 
o,a,c,s=f(e),l=[],h=i.slice(),p=1/0,v="slice"===g?s.dx:"dice"===g?s.dy:"slice-dice"===g?1&e.depth?s.dy:s.dx:Math.min(s.dx,s.dy);for(n(h,s.dx*s.dy/e.value),l.area=0;(c=h.length)>0;)l.push(o=h[c-1]),l.area+=o.area,"squarify"!==g||(a=r(l,v))<=p?(h.pop(),p=a):(l.area-=l.pop().area,u(l,v,s,!1),v=Math.min(s.dx,s.dy),l.length=l.area=0,p=1/0);l.length&&(u(l,v,s,!0),l.length=l.area=0),i.forEach(t)}}function e(t){var r=t.children;if(r&&r.length){var i,o=f(t),a=r.slice(),c=[];for(n(a,o.dx*o.dy/t.value),c.area=0;i=a.pop();)c.push(i),c.area+=i.area,null!=i.z&&(u(c,i.z?o.dx:o.dy,o,!a.length),c.length=c.area=0);r.forEach(e)}}function r(n,t){for(var e,r=n.area,u=0,i=1/0,o=-1,a=n.length;++o<a;)(e=n[o].area)&&(i>e&&(i=e),e>u&&(u=e));return r*=r,t*=t,r?Math.max(t*u*p/r,r/(t*i*p)):1/0}function u(n,t,e,r){var u,i=-1,o=n.length,a=e.x,s=e.y,l=t?c(n.area/t):0;if(t==e.dx){for((r||l>e.dy)&&(l=e.dy);++i<o;)u=n[i],u.x=a,u.y=s,u.dy=l,a+=u.dx=Math.min(e.x+e.dx-a,l?c(u.area/l):0);u.z=!0,u.dx+=e.x+e.dx-a,e.y+=l,e.dy-=l}else{for((r||l>e.dx)&&(l=e.dx);++i<o;)u=n[i],u.x=a,u.y=s,u.dx=l,s+=u.dy=Math.min(e.y+e.dy-s,l?c(u.area/l):0);u.z=!1,u.dy+=e.y+e.dy-s,e.x+=l,e.dx-=l}}function i(r){var u=o||a(r),i=u[0];return i.x=0,i.y=0,i.dx=s[0],i.dy=s[1],o&&a.revalue(i),n([i],i.dx*i.dy/i.value),(o?e:t)(i),h&&(o=u),u}var o,a=Zo.layout.hierarchy(),c=Math.round,s=[1,1],l=null,f=Ni,h=!1,g="squarify",p=.5*(1+Math.sqrt(5));return i.size=function(n){return arguments.length?(s=n,i):s},i.padding=function(n){function t(t){var e=n.call(i,t,t.depth);return null==e?Ni(t):zi(t,"number"==typeof e?[e,e,e,e]:e)}function e(t){return zi(t,n)}if(!arguments.length)return l;var r;return f=null==(l=n)?Ni:"function"==(r=typeof n)?t:"number"===r?(n=[n,n,n,n],e):e,i},i.round=function(n){return arguments.length?(c=n?Math.round:Number,i):c!=Number},i.sticky=function(n){return arguments.length?(h=n,o=null,i):h},i.ratio=function(n){return arguments.length?(p=n,i):p},i.mode=function(n){return arguments.length?(g=n+"",i):g},Xu(i,a)},Zo.random={normal:function(n,t){var e=arguments.length;return 2>e&&(t=1),1>e&&(n=0),function(){var e,r,u;do e=2*Math.random()-1,r=2*Math.random()-1,u=e*e+r*r;while(!u||u>1);return n+t*e*Math.sqrt(-2*Math.log(u)/u)}},logNormal:function(){var n=Zo.random.normal.apply(Zo,arguments);return function(){return Math.exp(n())}},bates:function(n){var t=Zo.random.irwinHall(n);return function(){return t()/n}},irwinHall:function(n){return function(){for(var t=0,e=0;n>e;e++)t+=Math.random();return t}}},Zo.scale={};var ss={floor:wt,ceil:wt};Zo.scale.linear=function(){return Ui([0,1],[0,1],hu,!1)};var ls={s:1,g:1,p:1,r:1,e:1};Zo.scale.log=function(){return Vi(Zo.scale.linear().domain([0,1]),10,!0,[1,10])};var fs=Zo.format(".0e"),hs={floor:function(n){return-Math.ceil(-n)},ceil:function(n){return-Math.floor(-n)}};Zo.scale.pow=function(){return Xi(Zo.scale.linear(),1,[0,1])},Zo.scale.sqrt=function(){return Zo.scale.pow().exponent(.5)},Zo.scale.ordinal=function(){return Bi([],{t:"range",a:[[]]})},Zo.scale.category10=function(){return Zo.scale.ordinal().range(gs)},Zo.scale.category20=function(){return Zo.scale.ordinal().range(ps)},Zo.scale.category20b=function(){return Zo.scale.ordinal().range(vs)},Zo.scale.category20c=function(){return Zo.scale.ordinal().range(ds)};var 
gs=[2062260,16744206,2924588,14034728,9725885,9197131,14907330,8355711,12369186,1556175].map(vt),ps=[2062260,11454440,16744206,16759672,2924588,10018698,14034728,16750742,9725885,12955861,9197131,12885140,14907330,16234194,8355711,13092807,12369186,14408589,1556175,10410725].map(vt),vs=[3750777,5395619,7040719,10264286,6519097,9216594,11915115,13556636,9202993,12426809,15186514,15190932,8666169,11356490,14049643,15177372,8077683,10834324,13528509,14589654].map(vt),ds=[3244733,7057110,10406625,13032431,15095053,16616764,16625259,16634018,3253076,7652470,10607003,13101504,7695281,10394312,12369372,14342891,6513507,9868950,12434877,14277081].map(vt);Zo.scale.quantile=function(){return Wi([],[])},Zo.scale.quantize=function(){return Ji(0,1,[0,1])},Zo.scale.threshold=function(){return Gi([.5],[0,1])},Zo.scale.identity=function(){return Ki([0,1])},Zo.svg={},Zo.svg.arc=function(){function n(){var n=t.apply(this,arguments),i=e.apply(this,arguments),o=r.apply(this,arguments)+ms,a=u.apply(this,arguments)+ms,c=(o>a&&(c=o,o=a,a=c),a-o),s=ba>c?"0":"1",l=Math.cos(o),f=Math.sin(o),h=Math.cos(a),g=Math.sin(a);
+return c>=ys?n?"M0,"+i+"A"+i+","+i+" 0 1,1 0,"+-i+"A"+i+","+i+" 0 1,1 0,"+i+"M0,"+n+"A"+n+","+n+" 0 1,0 0,"+-n+"A"+n+","+n+" 0 1,0 0,"+n+"Z":"M0,"+i+"A"+i+","+i+" 0 1,1 0,"+-i+"A"+i+","+i+" 0 1,1 0,"+i+"Z":n?"M"+i*l+","+i*f+"A"+i+","+i+" 0 "+s+",1 "+i*h+","+i*g+"L"+n*h+","+n*g+"A"+n+","+n+" 0 "+s+",0 "+n*l+","+n*f+"Z":"M"+i*l+","+i*f+"A"+i+","+i+" 0 "+s+",1 "+i*h+","+i*g+"L0,0"+"Z"}var t=Qi,e=no,r=to,u=eo;return n.innerRadius=function(e){return arguments.length?(t=bt(e),n):t},n.outerRadius=function(t){return arguments.length?(e=bt(t),n):e},n.startAngle=function(t){return arguments.length?(r=bt(t),n):r},n.endAngle=function(t){return arguments.length?(u=bt(t),n):u},n.centroid=function(){var n=(t.apply(this,arguments)+e.apply(this,arguments))/2,i=(r.apply(this,arguments)+u.apply(this,arguments))/2+ms;return[Math.cos(i)*n,Math.sin(i)*n]},n};var ms=-Sa,ys=wa-ka;Zo.svg.line=function(){return ro(wt)};var xs=Zo.map({linear:uo,"linear-closed":io,step:oo,"step-before":ao,"step-after":co,basis:po,"basis-open":vo,"basis-closed":mo,bundle:yo,cardinal:fo,"cardinal-open":so,"cardinal-closed":lo,monotone:So});xs.forEach(function(n,t){t.key=n,t.closed=/-closed$/.test(n)});var Ms=[0,2/3,1/3,0],_s=[0,1/3,2/3,0],bs=[0,1/6,2/3,1/6];Zo.svg.line.radial=function(){var n=ro(ko);return n.radius=n.x,delete n.x,n.angle=n.y,delete n.y,n},ao.reverse=co,co.reverse=ao,Zo.svg.area=function(){return Eo(wt)},Zo.svg.area.radial=function(){var n=Eo(ko);return n.radius=n.x,delete n.x,n.innerRadius=n.x0,delete n.x0,n.outerRadius=n.x1,delete n.x1,n.angle=n.y,delete n.y,n.startAngle=n.y0,delete n.y0,n.endAngle=n.y1,delete n.y1,n},Zo.svg.chord=function(){function n(n,a){var c=t(this,i,n,a),s=t(this,o,n,a);return"M"+c.p0+r(c.r,c.p1,c.a1-c.a0)+(e(c,s)?u(c.r,c.p1,c.r,c.p0):u(c.r,c.p1,s.r,s.p0)+r(s.r,s.p1,s.a1-s.a0)+u(s.r,s.p1,c.r,c.p0))+"Z"}function t(n,t,e,r){var u=t.call(n,e,r),i=a.call(n,u,r),o=c.call(n,u,r)+ms,l=s.call(n,u,r)+ms;return{r:i,a0:o,a1:l,p0:[i*Math.cos(o),i*Math.sin(o)],p1:[i*Math.cos(l),i*Math.sin(l)]}}function e(n,t){return n.a0==t.a0&&n.a1==t.a1}function r(n,t,e){return"A"+n+","+n+" 0 "+ +(e>ba)+",1 "+t}function u(n,t,e,r){return"Q 0,0 "+r}var i=gr,o=pr,a=Ao,c=to,s=eo;return n.radius=function(t){return arguments.length?(a=bt(t),n):a},n.source=function(t){return arguments.length?(i=bt(t),n):i},n.target=function(t){return arguments.length?(o=bt(t),n):o},n.startAngle=function(t){return arguments.length?(c=bt(t),n):c},n.endAngle=function(t){return arguments.length?(s=bt(t),n):s},n},Zo.svg.diagonal=function(){function n(n,u){var i=t.call(this,n,u),o=e.call(this,n,u),a=(i.y+o.y)/2,c=[i,{x:i.x,y:a},{x:o.x,y:a},o];return c=c.map(r),"M"+c[0]+"C"+c[1]+" "+c[2]+" "+c[3]}var t=gr,e=pr,r=Co;return n.source=function(e){return arguments.length?(t=bt(e),n):t},n.target=function(t){return arguments.length?(e=bt(t),n):e},n.projection=function(t){return arguments.length?(r=t,n):r},n},Zo.svg.diagonal.radial=function(){var n=Zo.svg.diagonal(),t=Co,e=n.projection;return n.projection=function(n){return arguments.length?e(No(t=n)):t},n},Zo.svg.symbol=function(){function n(n,r){return(ws.get(t.call(this,n,r))||To)(e.call(this,n,r))}var t=Lo,e=zo;return n.type=function(e){return arguments.length?(t=bt(e),n):t},n.size=function(t){return arguments.length?(e=bt(t),n):e},n};var ws=Zo.map({circle:To,cross:function(n){var t=Math.sqrt(n/5)/2;return"M"+-3*t+","+-t+"H"+-t+"V"+-3*t+"H"+t+"V"+-t+"H"+3*t+"V"+t+"H"+t+"V"+3*t+"H"+-t+"V"+t+"H"+-3*t+"Z"},diamond:function(n){var t=Math.sqrt(n/(2*As)),e=t*As;return"M0,"+-t+"L"+e+",0"+" 0,"+t+" 
"+-e+",0"+"Z"},square:function(n){var t=Math.sqrt(n)/2;return"M"+-t+","+-t+"L"+t+","+-t+" "+t+","+t+" "+-t+","+t+"Z"},"triangle-down":function(n){var t=Math.sqrt(n/Es),e=t*Es/2;return"M0,"+e+"L"+t+","+-e+" "+-t+","+-e+"Z"},"triangle-up":function(n){var t=Math.sqrt(n/Es),e=t*Es/2;return"M0,"+-e+"L"+t+","+e+" "+-t+","+e+"Z"}});Zo.svg.symbolTypes=ws.keys();var Ss,ks,Es=Math.sqrt(3),As=Math.tan(30*Aa),Cs=[],Ns=0;Cs.call=pa.call,Cs.empty=pa.empty,Cs.node=pa.node,Cs.size=pa.size,Zo.transition=function(n){return arguments.length?Ss?n.transition():n:ma.transition()},Zo.transition.prototype=Cs,Cs.select=function(n){var t,e,r,u=this.id,i=[];n=b(n);for(var o=-1,a=this.length;++o<a;){i.push(t=[]);for(var c=this[o],s=-1,l=c.length;++s<l;)(r=c[s])&&(e=n.call(r,r.__data__,s,o))?("__data__"in r&&(e.__data__=r.__data__),Po(e,s,u,r.__transition__[u]),t.push(e)):t.push(null)}return qo(i,u)},Cs.selectAll=function(n){var t,e,r,u,i,o=this.id,a=[];n=w(n);for(var c=-1,s=this.length;++c<s;)for(var l=this[c],f=-1,h=l.length;++f<h;)if(r=l[f]){i=r.__transition__[o],e=n.call(r,r.__data__,f,c),a.push(t=[]);for(var g=-1,p=e.length;++g<p;)(u=e[g])&&Po(u,g,o,i),t.push(u)}return qo(a,o)},Cs.filter=function(n){var t,e,r,u=[];"function"!=typeof n&&(n=R(n));for(var i=0,o=this.length;o>i;i++){u.push(t=[]);for(var e=this[i],a=0,c=e.length;c>a;a++)(r=e[a])&&n.call(r,r.__data__,a,i)&&t.push(r)}return qo(u,this.id)},Cs.tween=function(n,t){var e=this.id;return arguments.length<2?this.node().__transition__[e].tween.get(n):P(this,null==t?function(t){t.__transition__[e].tween.remove(n)}:function(r){r.__transition__[e].tween.set(n,t)})},Cs.attr=function(n,t){function e(){this.removeAttribute(a)}function r(){this.removeAttributeNS(a.space,a.local)}function u(n){return null==n?e:(n+="",function(){var t,e=this.getAttribute(a);return e!==n&&(t=o(e,n),function(n){this.setAttribute(a,t(n))})})}function i(n){return null==n?r:(n+="",function(){var t,e=this.getAttributeNS(a.space,a.local);return e!==n&&(t=o(e,n),function(n){this.setAttributeNS(a.space,a.local,t(n))})})}if(arguments.length<2){for(t in n)this.attr(t,n[t]);return this}var o="transform"==n?Du:hu,a=Zo.ns.qualify(n);return Ro(this,"attr."+n,t,a.local?i:u)},Cs.attrTween=function(n,t){function e(n,e){var r=t.call(this,n,e,this.getAttribute(u));return r&&function(n){this.setAttribute(u,r(n))}}function r(n,e){var r=t.call(this,n,e,this.getAttributeNS(u.space,u.local));return r&&function(n){this.setAttributeNS(u.space,u.local,r(n))}}var u=Zo.ns.qualify(n);return this.tween("attr."+n,u.local?r:e)},Cs.style=function(n,t,e){function r(){this.style.removeProperty(n)}function u(t){return null==t?r:(t+="",function(){var r,u=Wo.getComputedStyle(this,null).getPropertyValue(n);return u!==t&&(r=hu(u,t),function(t){this.style.setProperty(n,r(t),e)})})}var i=arguments.length;if(3>i){if("string"!=typeof n){2>i&&(t="");for(e in n)this.style(e,n[e],t);return this}e=""}return Ro(this,"style."+n,t,u)},Cs.styleTween=function(n,t,e){function r(r,u){var i=t.call(this,r,u,Wo.getComputedStyle(this,null).getPropertyValue(n));return i&&function(t){this.style.setProperty(n,i(t),e)}}return arguments.length<3&&(e=""),this.tween("style."+n,r)},Cs.text=function(n){return Ro(this,"text",n,Do)},Cs.remove=function(){return this.each("end.transition",function(){var n;this.__transition__.count<2&&(n=this.parentNode)&&n.removeChild(this)})},Cs.ease=function(n){var t=this.id;return arguments.length<1?this.node().__transition__[t].ease:("function"!=typeof 
n&&(n=Zo.ease.apply(Zo,arguments)),P(this,function(e){e.__transition__[t].ease=n}))},Cs.delay=function(n){var t=this.id;return arguments.length<1?this.node().__transition__[t].delay:P(this,"function"==typeof n?function(e,r,u){e.__transition__[t].delay=+n.call(e,e.__data__,r,u)}:(n=+n,function(e){e.__transition__[t].delay=n}))},Cs.duration=function(n){var t=this.id;return arguments.length<1?this.node().__transition__[t].duration:P(this,"function"==typeof n?function(e,r,u){e.__transition__[t].duration=Math.max(1,n.call(e,e.__data__,r,u))}:(n=Math.max(1,n),function(e){e.__transition__[t].duration=n}))},Cs.each=function(n,t){var e=this.id;if(arguments.length<2){var r=ks,u=Ss;Ss=e,P(this,function(t,r,u){ks=t.__transition__[e],n.call(t,t.__data__,r,u)}),ks=r,Ss=u}else P(this,function(r){var u=r.__transition__[e];(u.event||(u.event=Zo.dispatch("start","end"))).on(n,t)});return this},Cs.transition=function(){for(var n,t,e,r,u=this.id,i=++Ns,o=[],a=0,c=this.length;c>a;a++){o.push(n=[]);for(var t=this[a],s=0,l=t.length;l>s;s++)(e=t[s])&&(r=Object.create(e.__transition__[u]),r.delay+=r.duration,Po(e,s,i,r)),n.push(e)}return qo(o,i)},Zo.svg.axis=function(){function n(n){n.each(function(){var n,s=Zo.select(this),l=this.__chart__||e,f=this.__chart__=e.copy(),h=null==c?f.ticks?f.ticks.apply(f,a):f.domain():c,g=null==t?f.tickFormat?f.tickFormat.apply(f,a):wt:t,p=s.selectAll(".tick").data(h,f),v=p.enter().insert("g",".domain").attr("class","tick").style("opacity",ka),d=Zo.transition(p.exit()).style("opacity",ka).remove(),m=Zo.transition(p.order()).style("opacity",1),y=Ti(f),x=s.selectAll(".domain").data([0]),M=(x.enter().append("path").attr("class","domain"),Zo.transition(x));v.append("line"),v.append("text");var _=v.select("line"),b=m.select("line"),w=p.select("text").text(g),S=v.select("text"),k=m.select("text");switch(r){case"bottom":n=Uo,_.attr("y2",u),S.attr("y",Math.max(u,0)+o),b.attr("x2",0).attr("y2",u),k.attr("x",0).attr("y",Math.max(u,0)+o),w.attr("dy",".71em").style("text-anchor","middle"),M.attr("d","M"+y[0]+","+i+"V0H"+y[1]+"V"+i);break;case"top":n=Uo,_.attr("y2",-u),S.attr("y",-(Math.max(u,0)+o)),b.attr("x2",0).attr("y2",-u),k.attr("x",0).attr("y",-(Math.max(u,0)+o)),w.attr("dy","0em").style("text-anchor","middle"),M.attr("d","M"+y[0]+","+-i+"V0H"+y[1]+"V"+-i);break;case"left":n=jo,_.attr("x2",-u),S.attr("x",-(Math.max(u,0)+o)),b.attr("x2",-u).attr("y2",0),k.attr("x",-(Math.max(u,0)+o)).attr("y",0),w.attr("dy",".32em").style("text-anchor","end"),M.attr("d","M"+-i+","+y[0]+"H0V"+y[1]+"H"+-i);break;case"right":n=jo,_.attr("x2",u),S.attr("x",Math.max(u,0)+o),b.attr("x2",u).attr("y2",0),k.attr("x",Math.max(u,0)+o).attr("y",0),w.attr("dy",".32em").style("text-anchor","start"),M.attr("d","M"+i+","+y[0]+"H0V"+y[1]+"H"+i)}if(f.rangeBand){var E=f,A=E.rangeBand()/2;l=f=function(n){return E(n)+A}}else l.rangeBand?l=f:d.call(n,f);v.call(n,l),m.call(n,f)})}var t,e=Zo.scale.linear(),r=zs,u=6,i=6,o=3,a=[10],c=null;return n.scale=function(t){return arguments.length?(e=t,n):e},n.orient=function(t){return arguments.length?(r=t in Ls?t+"":zs,n):r},n.ticks=function(){return arguments.length?(a=arguments,n):a},n.tickValues=function(t){return arguments.length?(c=t,n):c},n.tickFormat=function(e){return arguments.length?(t=e,n):t},n.tickSize=function(t){var e=arguments.length;return e?(u=+t,i=+arguments[e-1],n):u},n.innerTickSize=function(t){return arguments.length?(u=+t,n):u},n.outerTickSize=function(t){return arguments.length?(i=+t,n):i},n.tickPadding=function(t){return 
arguments.length?(o=+t,n):o},n.tickSubdivide=function(){return arguments.length&&n},n};var zs="bottom",Ls={top:1,right:1,bottom:1,left:1};Zo.svg.brush=function(){function n(i){i.each(function(){var i=Zo.select(this).style("pointer-events","all").style("-webkit-tap-highlight-color","rgba(0,0,0,0)").on("mousedown.brush",u).on("touchstart.brush",u),o=i.selectAll(".background").data([0]);o.enter().append("rect").attr("class","background").style("visibility","hidden").style("cursor","crosshair"),i.selectAll(".extent").data([0]).enter().append("rect").attr("class","extent").style("cursor","move");var a=i.selectAll(".resize").data(p,wt);a.exit().remove(),a.enter().append("g").attr("class",function(n){return"resize "+n}).style("cursor",function(n){return Ts[n]}).append("rect").attr("x",function(n){return/[ew]$/.test(n)?-3:null}).attr("y",function(n){return/^[ns]/.test(n)?-3:null}).attr("width",6).attr("height",6).style("visibility","hidden"),a.style("display",n.empty()?"none":null);var l,f=Zo.transition(i),h=Zo.transition(o);c&&(l=Ti(c),h.attr("x",l[0]).attr("width",l[1]-l[0]),e(f)),s&&(l=Ti(s),h.attr("y",l[0]).attr("height",l[1]-l[0]),r(f)),t(f)})}function t(n){n.selectAll(".resize").attr("transform",function(n){return"translate("+l[+/e$/.test(n)]+","+f[+/^s/.test(n)]+")"})}function e(n){n.select(".extent").attr("x",l[0]),n.selectAll(".extent,.n>rect,.s>rect").attr("width",l[1]-l[0])}function r(n){n.select(".extent").attr("y",f[0]),n.selectAll(".extent,.e>rect,.w>rect").attr("height",f[1]-f[0])}function u(){function u(){32==Zo.event.keyCode&&(C||(x=null,z[0]-=l[1],z[1]-=f[1],C=2),y())}function p(){32==Zo.event.keyCode&&2==C&&(z[0]+=l[1],z[1]+=f[1],C=0,y())}function v(){var n=Zo.mouse(_),u=!1;M&&(n[0]+=M[0],n[1]+=M[1]),C||(Zo.event.altKey?(x||(x=[(l[0]+l[1])/2,(f[0]+f[1])/2]),z[0]=l[+(n[0]<x[0])],z[1]=f[+(n[1]<x[1])]):x=null),E&&d(n,c,0)&&(e(S),u=!0),A&&d(n,s,1)&&(r(S),u=!0),u&&(t(S),w({type:"brush",mode:C?"move":"resize"}))}function d(n,t,e){var r,u,a=Ti(t),c=a[0],s=a[1],p=z[e],v=e?f:l,d=v[1]-v[0];return C&&(c-=p,s-=d+p),r=(e?g:h)?Math.max(c,Math.min(s,n[e])):n[e],C?u=(r+=p)+d:(x&&(p=Math.max(c,Math.min(s,2*x[e]-r))),r>p?(u=r,r=p):u=p),v[0]!=r||v[1]!=u?(e?o=null:i=null,v[0]=r,v[1]=u,!0):void 0}function m(){v(),S.style("pointer-events","all").selectAll(".resize").style("display",n.empty()?"none":null),Zo.select("body").style("cursor",null),L.on("mousemove.brush",null).on("mouseup.brush",null).on("touchmove.brush",null).on("touchend.brush",null).on("keydown.brush",null).on("keyup.brush",null),N(),w({type:"brushend"})}var x,M,_=this,b=Zo.select(Zo.event.target),w=a.of(_,arguments),S=Zo.select(_),k=b.datum(),E=!/^(n|s)$/.test(k)&&c,A=!/^(e|w)$/.test(k)&&s,C=b.classed("extent"),N=I(),z=Zo.mouse(_),L=Zo.select(Wo).on("keydown.brush",u).on("keyup.brush",p);if(Zo.event.changedTouches?L.on("touchmove.brush",v).on("touchend.brush",m):L.on("mousemove.brush",v).on("mouseup.brush",m),S.interrupt().selectAll("*").interrupt(),C)z[0]=l[0]-z[0],z[1]=f[0]-z[1];else if(k){var T=+/w$/.test(k),q=+/^n/.test(k);M=[l[1-T]-z[0],f[1-q]-z[1]],z[0]=l[T],z[1]=f[q]}else Zo.event.altKey&&(x=z.slice());S.style("pointer-events","none").selectAll(".resize").style("display",null),Zo.select("body").style("cursor",b.style("cursor")),w({type:"brushstart"}),v()}var i,o,a=M(n,"brushstart","brush","brushend"),c=null,s=null,l=[0,0],f=[0,0],h=!0,g=!0,p=qs[0];return n.event=function(n){n.each(function(){var 
n=a.of(this,arguments),t={x:l,y:f,i:i,j:o},e=this.__chart__||t;this.__chart__=t,Ss?Zo.select(this).transition().each("start.brush",function(){i=e.i,o=e.j,l=e.x,f=e.y,n({type:"brushstart"})}).tween("brush:brush",function(){var e=gu(l,t.x),r=gu(f,t.y);return i=o=null,function(u){l=t.x=e(u),f=t.y=r(u),n({type:"brush",mode:"resize"})}}).each("end.brush",function(){i=t.i,o=t.j,n({type:"brush",mode:"resize"}),n({type:"brushend"})}):(n({type:"brushstart"}),n({type:"brush",mode:"resize"}),n({type:"brushend"}))})},n.x=function(t){return arguments.length?(c=t,p=qs[!c<<1|!s],n):c},n.y=function(t){return arguments.length?(s=t,p=qs[!c<<1|!s],n):s},n.clamp=function(t){return arguments.length?(c&&s?(h=!!t[0],g=!!t[1]):c?h=!!t:s&&(g=!!t),n):c&&s?[h,g]:c?h:s?g:null},n.extent=function(t){var e,r,u,a,h;return arguments.length?(c&&(e=t[0],r=t[1],s&&(e=e[0],r=r[0]),i=[e,r],c.invert&&(e=c(e),r=c(r)),e>r&&(h=e,e=r,r=h),(e!=l[0]||r!=l[1])&&(l=[e,r])),s&&(u=t[0],a=t[1],c&&(u=u[1],a=a[1]),o=[u,a],s.invert&&(u=s(u),a=s(a)),u>a&&(h=u,u=a,a=h),(u!=f[0]||a!=f[1])&&(f=[u,a])),n):(c&&(i?(e=i[0],r=i[1]):(e=l[0],r=l[1],c.invert&&(e=c.invert(e),r=c.invert(r)),e>r&&(h=e,e=r,r=h))),s&&(o?(u=o[0],a=o[1]):(u=f[0],a=f[1],s.invert&&(u=s.invert(u),a=s.invert(a)),u>a&&(h=u,u=a,a=h))),c&&s?[[e,u],[r,a]]:c?[e,r]:s&&[u,a])},n.clear=function(){return n.empty()||(l=[0,0],f=[0,0],i=o=null),n},n.empty=function(){return!!c&&l[0]==l[1]||!!s&&f[0]==f[1]},Zo.rebind(n,a,"on")};var Ts={n:"ns-resize",e:"ew-resize",s:"ns-resize",w:"ew-resize",nw:"nwse-resize",ne:"nesw-resize",se:"nwse-resize",sw:"nesw-resize"},qs=[["n","e","s","w","nw","ne","se","sw"],["e","w"],["n","s"],[]],Rs=Qa.format=ic.timeFormat,Ds=Rs.utc,Ps=Ds("%Y-%m-%dT%H:%M:%S.%LZ");Rs.iso=Date.prototype.toISOString&&+new Date("2000-01-01T00:00:00.000Z")?Ho:Ps,Ho.parse=function(n){var t=new Date(n);return isNaN(t)?null:t},Ho.toString=Ps.toString,Qa.second=Dt(function(n){return new nc(1e3*Math.floor(n/1e3))},function(n,t){n.setTime(n.getTime()+1e3*Math.floor(t))},function(n){return n.getSeconds()}),Qa.seconds=Qa.second.range,Qa.seconds.utc=Qa.second.utc.range,Qa.minute=Dt(function(n){return new nc(6e4*Math.floor(n/6e4))},function(n,t){n.setTime(n.getTime()+6e4*Math.floor(t))},function(n){return n.getMinutes()}),Qa.minutes=Qa.minute.range,Qa.minutes.utc=Qa.minute.utc.range,Qa.hour=Dt(function(n){var t=n.getTimezoneOffset()/60;return new nc(36e5*(Math.floor(n/36e5-t)+t))},function(n,t){n.setTime(n.getTime()+36e5*Math.floor(t))},function(n){return n.getHours()}),Qa.hours=Qa.hour.range,Qa.hours.utc=Qa.hour.utc.range,Qa.month=Dt(function(n){return n=Qa.day(n),n.setDate(1),n},function(n,t){n.setMonth(n.getMonth()+t)},function(n){return n.getMonth()}),Qa.months=Qa.month.range,Qa.months.utc=Qa.month.utc.range;var Us=[1e3,5e3,15e3,3e4,6e4,3e5,9e5,18e5,36e5,108e5,216e5,432e5,864e5,1728e5,6048e5,2592e6,7776e6,31536e6],js=[[Qa.second,1],[Qa.second,5],[Qa.second,15],[Qa.second,30],[Qa.minute,1],[Qa.minute,5],[Qa.minute,15],[Qa.minute,30],[Qa.hour,1],[Qa.hour,3],[Qa.hour,6],[Qa.hour,12],[Qa.day,1],[Qa.day,2],[Qa.week,1],[Qa.month,1],[Qa.month,3],[Qa.year,1]],Hs=Rs.multi([[".%L",function(n){return n.getMilliseconds()}],[":%S",function(n){return n.getSeconds()}],["%I:%M",function(n){return n.getMinutes()}],["%I %p",function(n){return n.getHours()}],["%a %d",function(n){return n.getDay()&&1!=n.getDate()}],["%b %d",function(n){return 1!=n.getDate()}],["%B",function(n){return n.getMonth()}],["%Y",we]]),Fs={range:function(n,t,e){return 
Zo.range(Math.ceil(n/e)*e,+t,e).map(Oo)},floor:wt,ceil:wt};js.year=Qa.year,Qa.scale=function(){return Fo(Zo.scale.linear(),js,Hs)};var Os=js.map(function(n){return[n[0].utc,n[1]]}),Ys=Ds.multi([[".%L",function(n){return n.getUTCMilliseconds()}],[":%S",function(n){return n.getUTCSeconds()}],["%I:%M",function(n){return n.getUTCMinutes()}],["%I %p",function(n){return n.getUTCHours()}],["%a %d",function(n){return n.getUTCDay()&&1!=n.getUTCDate()}],["%b %d",function(n){return 1!=n.getUTCDate()}],["%B",function(n){return n.getUTCMonth()}],["%Y",we]]);Os.year=Qa.year.utc,Qa.scale.utc=function(){return Fo(Zo.scale.linear(),Os,Ys)},Zo.text=St(function(n){return n.responseText}),Zo.json=function(n,t){return kt(n,"application/json",Yo,t)},Zo.html=function(n,t){return kt(n,"text/html",Io,t)},Zo.xml=St(function(n){return n.responseXML}),"function"==typeof define&&define.amd?define(Zo):"object"==typeof module&&module.exports&&(module.exports=Zo),this.d3=Zo}();
+
+// parseUri 1.2.2
+// (c) Steven Levithan <stevenlevithan.com>
+// MIT License
+
+function parseUri (str) {
+	var	o   = parseUri.options,
+		m   = o.parser[o.strictMode ? "strict" : "loose"].exec(str),
+		uri = {},
+		i   = 14;
+
+	while (i--) uri[o.key[i]] = m[i] || "";
+
+	uri[o.q.name] = {};
+	uri[o.key[12]].replace(o.q.parser, function ($0, $1, $2) {
+		if ($1) uri[o.q.name][$1] = $2;
+	});
+
+	return uri;
+};
+
+parseUri.options = {
+	strictMode: false,
+	key: ["source","protocol","authority","userInfo","user","password","host","port","relative","path","directory","file","query","anchor"],
+	q:   {
+		name:   "queryKey",
+		parser: /(?:^|&)([^&=]*)=?([^&]*)/g
+	},
+	parser: {
+		strict: /^(?:([^:\/?#]+):)?(?:\/\/((?:(([^:@]*)(?::([^:@]*))?)?@)?([^:\/?#]*)(?::(\d*))?))?((((?:[^?#\/]*\/)*)([^?#]*))(?:\?([^#]*))?(?:#(.*))?)/,
+		loose:  /^(?:(?![^:@]+:[^:@\/]*@)([^:\/?#.]+):)?(?:\/\/)?((?:(([^:@]*)(?::([^:@]*))?)?@)?([^:\/?#]*)(?::(\d*))?)(((\/(?:[^?#](?![^?#\/]*\.[^?#\/.]+(?:[?#]|$)))*\/?)?([^?#\/]*))(?:\?([^#]*))?(?:#(.*))?)/
+	}
+};
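+// Example (illustrative values only): parseUri("graph.html?metric=op_s&smoothing=2").queryKey
+// yields {metric: "op_s", smoothing: "2"}; drawGraph() below reads its options from that map.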
+
+
+// graph.js
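+// Renders an interactive d3 line chart of per-interval cassandra-stress metrics.
+// It assumes a global `stats` object (the stress JSON, presumably embedded earlier in
+// this page) and takes metric/operation/smoothing/zoom options from the URL query string.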
+var drawGraph = function() {
+
+    $("svg").remove();
+
+    //Dataset and metric to draw are passed via query options:
+    query = parseUri(location).queryKey;
+    query.stats = unescape(query.stats);
+    stats_db = '/tests/artifacts/' + query.stats + '/stats';
+    var metric = query.metric;
+    var operation = query.operation;
+    var smoothing = query.smoothing;
+    var show_aggregates = query.show_aggregates;
+
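+    //Optional zoom bounds, also taken from the URL query string: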
+    xmin = query.xmin;
+    xmax = query.xmax;
+    ymin = query.ymin;
+    ymax = query.ymax;
+
+    //Pull metrics from the stats json:
+    stress_metrics = $.extend([], stats['stats'][0]['metrics']);
+    $.each(stress_metrics, function(i,v) {
+        stress_metrics[i] = v.replace(/\W/g,"_");
+    });
+    stress_metric_names = {};
+    $.each(stress_metrics, function(i,v) {
+        stress_metric_names[v] = stats['stats'][0]['metrics'][i];
+    });
+    //Replace shorthand metric names with longer, human-readable ones:
+    $.extend(stress_metric_names, {
+       "mean": "latency mean",
+       "med" : "latency median",
+       "_95" : "latency 95th pct",
+       "_99" : "latency 99th pct",
+       "_999": "latency 99.9th pct",
+       "max" : "latency max",
+       "max_ms" : "gc max (ms)",
+       "sum_ms" : "gc sum (ms)",
+       "sdv_ms" : "gc sdv (ms)",
+       "mb"     : "gc MB"
+    });
+
+    var updateURLBar = function() {
+        //Update the URL bar with the current parameters:
+        window.history.replaceState(null,null,parseUri(location).path + "?" + $.param(query));
+    };
+    
+    //Check query parameters:
+    if (metric == undefined) {
+        metric = query.metric = 'op_s';
+    }
+    if (operation == undefined) {
+        operation = query.operation = stats['stats'][0]['test'];
+    }
+    if (smoothing == undefined) {
+        smoothing = query.smoothing = 1;
+    }
+    if (show_aggregates == undefined || query.show_aggregates == 'true') {
+        show_aggregates = query.show_aggregates = true;
+    } else {
+        show_aggregates = query.show_aggregates = false;
+    }
+    updateURLBar();
+
+    var metric_index = stress_metrics.indexOf(metric);
+    var time_index = stress_metrics.indexOf('time');
+
+    /// Add dropdown controls to select chart criteria / options:
+    var chart_controls = $('<div id="chart_controls"/>');
+    var chart_controls_tbl = $('<table/>');
+    chart_controls.append(chart_controls_tbl);
+    $('body').append(chart_controls);
+    var metric_selector = $('<select id="metric_selector"/>');
+    $.each(stress_metric_names, function(k,v) {
+        if (k == 'time') {
+            return; //Elapsed time makes no sense to graph, skip it.
+        }
+        var option = $('<option/>').attr('value', k).text(v);
+        if (metric == k) {
+            option.attr('selected','selected');
+        }
+        metric_selector.append(option);
+
+    });
+    chart_controls_tbl.append('<tr><td><label for="metric_selector">Choose metric:</label></td><td id="metric_selector_td"></td></tr>');
+    $('#metric_selector_td').append(metric_selector);
+
+    var operation_selector = $('<select id="operation_selector"/>');
+    chart_controls_tbl.append('<tr><td><label for="operation_selector">Choose operation:</label></td><td id="operation_selector_td"></td></tr>');
+    $('#operation_selector_td').append(operation_selector);
+
+
+    var smoothing_selector = $('<select id="smoothing_selector"/>');
+    $.each([1,2,3,4,5,6,7,8], function(i, v) {
+        var option = $('<option/>').attr('value', v).text(v);
+        if (smoothing == v) {
+            option.attr('selected','selected');
+        }
+        smoothing_selector.append(option);
+    });
+    chart_controls_tbl.append('<tr><td style="width:150px"><label for="smoothing_selector">Data smoothing:</label></td><td id="smoothing_selector_td"></td></tr>');
+    $("#smoothing_selector_td").append(smoothing_selector);
+
+    var show_aggregates_checkbox = $('<input type="checkbox" id="show_aggregates_checkbox"/>');
+    chart_controls_tbl.append('<tr><td style="padding-top:10px"><label for="show_aggregates_checkbox">Show aggregates</label></td><td id="show_aggregates_td"></td></tr>');
+    $("#show_aggregates_td").append(show_aggregates_checkbox);
+    show_aggregates_checkbox.attr("checked", show_aggregates);
+
+    chart_controls_tbl.append('<tr><td colspan="100%">Zoom: <a href="#" id="reset_zoom">reset</a><table id="zoom"><tr><td><label for="xmin">x min</label></td><td><input id="xmin"/></td><td><label for="xmax">x max</label></td><td><input id="xmax"/></td></tr><tr><td><label for="ymin">y min</label></td><td><input id="ymin"/></td><td><label for="ymax">y max</label></td><td><input id="ymax"/></td></tr></table></td></tr>');
+
+    chart_controls_tbl.append('<tr><td style="padding-top:10px" colspan="100%">To hide/show a dataset click on the associated colored box</td></tr>');
+
+    var raw_data;
+
+    //Callback to draw graph once we have json data.
+    var graph_callback = function() {
+        var data = [];
+        var trials = {};
+        var data_by_title = {};
+        //Keep track of what operations are available from the test:
+        var operations = {};
+
+        raw_data.stats.forEach(function(d) {
+            // Make a copy of d so we never modify raw_data
+            d = $.extend({}, d);
+            operations[d.test] = true;
+            if (d.test!=operation) {
+                return;
+            }
+            d.title = d['label'] != undefined ? d['label'] : d['revision'];
+            data_by_title[d.title] = d;
+            data.push(d);
+            trials[d.title] = d;
+            //Clean up the intervals:
+            //Keep only every Nth interval (N = smoothing) so the drawn line is smoother:
+            var new_intervals = [];
+            d.intervals.forEach(function(i, x) {
+                if (x % smoothing == 0) {
+                    new_intervals.push(i);
+                }
+            });
+            d.intervals = new_intervals;
+        });
+
+        //Fill operations available from test:
+        operation_selector.children().remove();
+        $.each(operations, function(k) {
+            var option = $('<option/>').attr('value', k).text(k);
+            if (operation == k) {
+                option.attr('selected','selected');
+            }
+            operation_selector.append(option);
+        });
+
+        var getMetricValue = function(d) {
+            if (metric_index >= 0) {
+                //This is one of the metrics directly reported by stress:
+                return d[metric_index];
+            } else {
+                //This metric is not reported by stress, so compute it ourselves:
+                if (metric == 'num_timeouts') {
+                    return d[stress_metrics.indexOf('interval_op_rate')] - d[stress_metrics.indexOf('interval_key_rate')];
+                }
+            }        
+        };
+
+        //Parse the dates:
+        data.forEach(function(d) {
+            d.date = new Date(Date.parse(d.date));
+        });
+
+
+        $("svg").remove();
+        //Setup initial zoom level:
+        defaultZoom = function(initialize) {
+            if (!initialize) {
+                //Reset zoom query params:
+                query.xmin = xmin = undefined;
+                query.xmax = xmax = undefined;
+                query.ymin = ymin = undefined;
+                query.ymax = ymax = undefined;
+            }
+            query.xmin = xmin = query.xmin ? query.xmin : 0;
+            query.xmax = xmax = query.xmax ? query.xmax : Math.round(d3.max(data, function(d) {
+                if (d.intervals.length > 0) {
+                    return d.intervals[d.intervals.length-1][time_index];
+                }
+            }) * 1.1 * 100) / 100;
+            query.ymin = ymin = query.ymin ? query.ymin : 0;
+            query.ymax = ymax = query.ymax ? query.ymax : Math.round(d3.max(data, function(d) {
+                return d3.max(d.intervals, function(i) {
+                    return getMetricValue(i);
+                });
+            }) * 1.1 * 100) / 100;
+            $("#xmin").val(xmin);
+            $("#xmax").val(xmax);
+            $("#ymin").val(ymin);
+            $("#ymax").val(ymax);
+            var updateX = function() {
+                query.xmin = xmin = $("#xmin").val();
+                query.xmax = xmax = $("#xmax").val();
+                x.domain([xmin,xmax]);
+                updateURLBar();
+            };
+            var updateY = function() {
+                query.ymin = ymin = $("#ymin").val();
+                query.ymax = ymax = $("#ymax").val();
+                y.domain([ymin, ymax]);
+                updateURLBar();
+            };
+            $("#xmin,#xmax").unbind().change(function(e) {
+                updateX();
+                redrawLines();
+            });
+            $("#ymin,#ymax").unbind().change(function(e) {
+                updateY();
+                redrawLines();
+            });
+            // The first time defaultZoom is called, we pass
+            // initialize=true, and we do not call the change() method
+            // yet. On subsequent calls, without initialize, we do.
+            if (!initialize) {
+                updateX();
+                updateY();
+                redrawLines();
+            }
+        }
+        defaultZoom(true);
+
+        $("#reset_zoom").click(function(e) {
+            defaultZoom();
+            e.preventDefault();
+        });
+
+        //Setup chart:
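+        // Note: the oversized right/bottom margins leave room in the SVG for the
+        // legend and per-trial aggregate text drawn outside the plot area.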
+        var margin = {top: 20, right: 1180, bottom: 4240, left: 60};
+        var width = 2060 - margin.left - margin.right;
+        var height = 4700 - margin.top - margin.bottom;
+
+        var x = d3.scale.linear()
+            .domain([xmin, xmax])
+            .range([0, width]);
+
+        var y = d3.scale.linear()
+            .domain([ymin, ymax])
+            .range([height, 0]);
+
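+        // One categorical color per trial title, shared by the data lines and the legend swatches.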
+        var color = d3.scale.category10();
+        color.domain(data.map(function(d){return d.title}));
+
+        var xAxis = d3.svg.axis()
+            .scale(x)
+            .orient("bottom");
+
+        var yAxis = d3.svg.axis()
+            .scale(y)
+            .orient("left");
+
+        var line = d3.svg.line()
+            .interpolate("basis")
+            .x(function(d) { 
+                return x(d[time_index]); //time in seconds
+            })
+            .y(function(d) { 
+                return y(getMetricValue(d));
+            });
+        
+        $("body").append("<div id='svg_container'>");
+
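+        // Redraw the axes and data lines after the x/y domains change (zoom or
+        // manual min/max edits), then sync the zoom input boxes to the new domains.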
+        var redrawLines = function() {
+            svg.select(".x.axis").call(xAxis);
+            svg.select(".y.axis").call(yAxis);
+            svg.selectAll(".line")
+                .attr("class","line")
+                .attr("d", function(d) {
+                    return line(d.intervals);
+                })
+            $("#xmin").val(x.domain()[0]);
+            $("#xmax").val(x.domain()[1]);
+            $("#ymin").val(y.domain()[0]);
+            $("#ymax").val(y.domain()[1]);
+        }
+
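+        // d3 zoom behavior bound to the x/y scales; any zoom event re-runs redrawLines.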
+        var zoom = d3.behavior.zoom()
+            .x(x)
+            .y(y)
+            .on("zoom", redrawLines);
+
+        var svg = d3.select("div#svg_container").append("svg")
+            .attr("width", width + margin.left + margin.right + 250)
+            .attr("height", height + margin.top + margin.bottom)
+            .append("g")
+            .attr("transform", "translate(" + margin.left + "," + margin.top + ")")
+
+        // Clip Path
+        svg.append("svg:clipPath")
+            .attr("id", "chart_clip")
+            .append("svg:rect")
+            .attr("width", width)
+            .attr("height", height);
+
+        // Chart title
+        svg.append("text")
+            .attr("x", width / 2 )
+            .attr("y", 0 )
+            .style('font-size', '2em')
+            .style("text-anchor", "middle")
+            .text(raw_data.title + ' - ' + operation);
+
+        // Chart subtitle
+        svg.append("text")
+            .attr("x", width / 2 )
+            .attr("y", 15 )
+            .style('font-size', '1.2em')
+            .style("text-anchor", "middle")
+            .text((raw_data.subtitle ? raw_data.subtitle : ''));
+
+        // x-axis - time
+        svg.append("g")
+            .attr("class", "x axis")
+            .attr("transform", "translate(0," + height + ")")
+            .call(xAxis);
+
+        // x-axis label   
+        svg.append("text")
+            .attr("x", width / 2 )
+            .attr("y", height + 30 )
+            .style("text-anchor", "middle")
+            .style("font-size", "1.2em")
+            .text(stress_metric_names['time']);
+
+        // y-axis
+        svg.append("g")
+            .attr("class", "y axis")
+            .call(yAxis)
+            .append("text")
+            .attr("transform", "rotate(-90)")
+            .attr("y", -60)
+            .attr("dy", ".91em")
+            .style("font-size", "1.2em")
+            .style("text-anchor", "end")
+            .text(stress_metric_names[metric]);
+
+        var trial = svg.selectAll(".trial")
+            .data(data)
+            .enter().append("g")
+            .attr("class", "trial")
+            .attr("title", function(d) {
+                return d.title;
+            });
+
+        // Draw benchmarked data:
+        trial.append("path")
+            .attr("class", "line")
+            .attr("clip-path", "url(#chart_clip)")
+            .attr("d", function(d) {
+                return line(d.intervals);
+            })
+            .style("stroke", function(d) { return color(d.title); });
+
+        var legend = svg.selectAll(".legend")
+            .data(color.domain())
+            .enter().append("g")
+            .attr("class", "legend")
+            .attr("transform", function(d, i) {
+                if (show_aggregates == true) {
+                    var y_offset = 425 + (i*240) + 70;
+                } else {
+                    var y_offset = 425 + (i*25) + 70;
+                }
+                var x_offset = -550;
+                return "translate(" + x_offset + "," + y_offset + ")"; 
+            });
+
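+        // Each legend entry is rendered as numbered lines of monospace text;
+        // `linenum` picks the vertical slot within that entry's block.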
+        var renderLegendText = function(linenum, getTextCallback) {
+            legend.append("text")
+                .attr("x", width - 24 - 250)
+                .attr("y", 12*linenum)
+                .attr("dy", ".35em")
+                .style("font-family", "monospace")
+                .style("font-size", "1.2em")
+                .style("text-anchor", "start")
+                .text(function(d) { 
+                    return getTextCallback(d);
+                });
+        };
+
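+        // Pad with non-breaking spaces so the label/value columns line up in the
+        // monospace legend text.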
+        var padTextEnd = function(text, length) {
+            for(var x=text.length; x<length; x++) {
+                text = text + '\u00A0';
+            }
+            return text;
+        };
+        var padTextStart = function(text, length) {
+            for(var x=text.length; x<length; x++) {
+                text = '\u00A0' + text;
+            }
+            return text;
+        };
+
+        renderLegendText(1, function(title) {
+            return padTextStart(title, title.length + 5);
+        });
+
+        if (show_aggregates === true) {
+            renderLegendText(2, function(title) {
+                return '---------------------------------------';
+            });
+
+            renderLegendText(3, function(title) {
+                return padTextEnd('op rate', 26) + " : " + data_by_title[title]['op rate'];
+            });
+
+            renderLegendText(4, function(title) {
+                return padTextEnd('partition rate', 26) + " : " + data_by_title[title]['partition rate'];
+            });
+
+            renderLegendText(5, function(title) {
+                return padTextEnd('row rate', 26) + ' : ' + data_by_title[title]['row rate'];
+            });
+
+            renderLegendText(6, function(title) {
+                return padTextEnd('latency mean', 26) + ' : ' + data_by_title[title]['latency mean'];
+            });
+
+            renderLegendText(7, function(title) {
+                return padTextEnd('latency median', 26) + ' : ' + data_by_title[title]['latency median'];
+            });
+
+            renderLegendText(8, function(title) {
+                return padTextEnd('latency 95th percentile', 26) + ' : ' + data_by_title[title]['latency 95th percentile'];
+            });
+
+            renderLegendText(9, function(title) {
+                return padTextEnd('latency 99th percentile', 26) + ' : ' + data_by_title[title]['latency 99th percentile'];
+            });
+
+            renderLegendText(10, function(title) {
+                return padTextEnd('latency 99.9th percentile', 26) + ' : ' + data_by_title[title]['latency 99.9th percentile'];
+            });
+
+            renderLegendText(11, function(title) {
+                return padTextEnd('latency max', 26) + ' : ' + data_by_title[title]['latency max'];
+            });
+
+            renderLegendText(12, function(title) {
+                return padTextEnd('total gc count', 26) + ' : ' + data_by_title[title]['total gc count'];
+            });
+
+            renderLegendText(13, function(title) {
+                return padTextEnd('total gc mb', 26) + ' : ' + data_by_title[title]['total gc mb'];
+            });
+
+            renderLegendText(14, function(title) {
+                return padTextEnd('total gc time (s)', 26) + ' : ' + data_by_title[title]['total gc time (s)'];
+            });
+
+            renderLegendText(15, function(title) {
+                return padTextEnd('avg gc time(ms)', 26) + ' : ' + data_by_title[title]['avg gc time(ms)'];
+            });
+
+            renderLegendText(16, function(title) {
+                return padTextEnd('stdev gc time(ms)', 26) + ' : ' + data_by_title[title]['stdev gc time(ms)'];
+            });
+
+            renderLegendText(17, function(title) {
+                return padTextEnd('Total operation time', 26) + ' : ' + data_by_title[title]['Total operation time'];
+            });
+
+            renderLegendText(18, function(title) {
+                var cmd = data_by_title[title]['command'];
+                return 'cmd: ' + cmd;
+            });
+        }
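+        // Colored swatch for each legend entry; its title attribute ties it to the
+        // matching trial group for the show/hide click handler below.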
+        legend.append("rect")
+            .attr("x", width - 270)
+            .attr("width", 18)
+            .attr("height", 18)
+            .attr("class", "legend-rect")
+            .attr("title", function(title) {
+                return title;
+            })
+            .style("fill", color);
+
+        //Make trials hideable by clicking on the associated colored legend box
+        $("rect.legend-rect").click(function() {
+            $("g.trial[title='" + $(this).attr('title') + "']").toggle();
+        });
+
+
+        // Chart control callbacks:
+        metric_selector.unbind().change(function(e) {
+            // change the metric in the url to reload the page:
+            metric = query.metric = this.value;
+            metric_index = stress_metrics.indexOf(metric);
+            graph_callback();
+            defaultZoom();
+        });
+        operation_selector.unbind().change(function(e) {
+            // change the metric in the url to reload the page:
+            operation = query.operation = this.value;
+            graph_callback();
+            defaultZoom();
+        });
+        smoothing_selector.unbind().change(function(e) {
+            // change the metric in the url to reload the page:
+            smoothing = query.smoothing = this.value;
+            graph_callback();
+            defaultZoom();
+        });
+        show_aggregates_checkbox.unbind().change(function(e) {
+            show_aggregates = query.show_aggregates = this.checked;
+            graph_callback();
+        });
+
+        updateURLBar();
+
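+        // Point the "download test data" link (assumed to exist elsewhere in the page)
+        // at the raw stats JSON.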
+        $("#dl-test-data").attr("href",stats_db);
+
+        // Chart zoom/drag surface
+        // This should always be last, so it's on top of everything else
+        svg.append("svg:rect")
+            .attr("id", "zoom_drag_surface")
+            .attr("width", width)
+            .attr("height", height);
+    }
+
+    raw_data = stats;
+    graph_callback();
+
+}
+
+$(document).ready(function(){
+    
+    drawGraph();
+    
+});
+      -->
+  </script>
+  <style type="text/css">
+div#chart_controls {
+    margin-left: 900px;
+    position: absolute;
+}
+
+#chart_controls > table {
+    width: 640px;
+}
+
+#chart_controls td {
+    padding: 2px;
+}
+
+#chart_controls #zoom input {
+    width: 50px;
+}
+
+#chart_controls table#zoom {
+    padding-left: 20px;
+}
+
+#chart_controls table#zoom td {
+    padding-left: 20px;
+}
+
+#zoom_drag_surface {
+    fill: rgba(250, 250, 255, 0.0);
+    z-index: 100;
+}
+
+svg {
+  font: 10px sans-serif;
+}
+
+.axis path,
+.axis line {
+  fill: none;
+  stroke: #000;
+  shape-rendering: crispEdges;
+}
+
+.x.axis path {
+  display: none;
+}
+
+.line {
+  fill: none;
+  stroke: steelblue;
+  stroke-width: 1.5px;
+}
+
+  </style>
+</head>
+<body>
+</body>
+