CONTRIBUTING.md - cassandra-java-driver - Git at Google

 # Contributing guidelines

 ## Code formatting

 ### Java

 We follow the [Google Java Style Guide](https://google.github.io/styleguide/javaguide.html). See
 https://github.com/google/google-java-format for IDE plugins. The rules are not configurable.

 The build will fail if the code is not formatted. To format all files from the command line, run:

 ```
 mvn fmt:format
 ```

 Some aspects are not covered by the formatter: braces must be used with `if`, `else`, `for`, `do`
 and `while` statements, even when the body is empty or contains only a single statement.

 ### XML

 The build will fail if XML files are not formatted correctly. Run the following command before you
 commit:

 ```java
 mvn xml-format:xml-format
 ```

 The formatter does not enforce a maximum line length, but please try to keep it below 100 characters
 to keep files readable across all mediums (IDE, terminal, Github...).

 ### Other text files (markdown, etc)

 Similarly, enforce a right margin of 100 characters in those files. Editors and IDEs generally have
 a way to configure this (for IDEA, install the "Wrap to column" plugin).

 ## Coding style -- production code

 Do not use static imports. They make things harder to understand when you look at the code
 someplace where you don't have IDE support, like Github's code view.

 Avoid abbreviations in class and variable names. A good rule of thumb is that you should only use
 them if you would also do so verbally, for example "id" and "config" are probably reasonable.
 Single-letter variables are permissible if the variable scope is only a few lines, or for commonly
 understood cases (like `i` for a loop index).

 Keep source files short. Short files are easy to understand and test. The average should probably
 be around 200-300 lines.

 ### Javadoc

 All types in "API" packages must be documented. For "internal" packages, documentation is optional,
 but in no way discouraged: it's generally a good idea to have a class-level comment that explains
 where the component fits in the architecture, and anything else that you feel is important.

 You don't need to document every parameter or return type, or even every method. Don't document
 something if it is completely obvious, we don't want to end up with this:

 ```java
 /**
  * Returns the name.
  *
  * @return the name
  */
 String getName();
 ```

 On the other hand, there is often something useful to say about a method, so most should have at
 least a one-line comment. Use common sense.

 Driver users coding in their IDE should find the right documentation at the right time. Try to
 think of how they will come into contact with the class. For example, if a type is constructed with
 a builder, each builder method should probably explain what the default is when you don't call it.

 Avoid using too many links, they can make comments harder to read, especially in the IDE. Link to a
 type the first time it's mentioned, then use a text description ("this registry"...) or an `@code`
 block. Don't link to a class in its own documentation. Don't link to types that appear right below
 in the documented item's signature.

 ```java
 /**
 * @return this {@link Builder} <-- completely unnecessary
 */
 Builder withLimit(int limit) {
 ```

 ### Logs

 We use SLF4J; loggers are declared like this:

 ```java
 private static final Logger LOG = LoggerFactory.getLogger(TheEnclosingClass.class);
 ```

 Logs are intended for two personae:

 * Ops who manage the application in production.
 * Developers (maybe you) who debug a particular issue.

 The first 3 log levels are for ops:

 * `ERROR`: something that renders the driver -- or a part of it -- completely unusable. An action is
   required to fix it: bouncing the client, applying a patch, etc.
 * `WARN`: something that the driver can recover from automatically, but indicates a configuration or
   programming error that should be addressed. For example: the driver connected successfully, but
   one of the contact points in the configuration was malformed; the same prepared statement is being
   prepared multiple time by the application code.
 * `INFO`: something that is part of the normal operation of the driver, but might be useful to know
   for an operator. For example: the driver has initialized successfully and is ready to process
   queries; an optional dependency was detected in the classpath and activated an enhanced feature.

 Do not log errors that are rethrown to the client (such as the error that you're going to complete a
 request with). This is annoying for ops because they see a lot of stack traces that require no
 actual action on their part, because they're already handled by application code.

 Similarly, do not log stack traces for non-critical errors. If you still want the option to get the
 trace for debugging, see the `Loggers.warnWithException` utility.

 The last 2 levels are for developers, to help follow what the driver is doing from a "black box"
 perspective (think about debugging an issue remotely, and all you have are the logs).

 * `TRACE`: anything that happens **for every user request**. Not only request handling, but all
   related components (e.g. timestamp generators, policies, etc).
 * `DEBUG`: everything else. For example, node state changes, control connection activity, etc.

 Note that `DEBUG` and `TRACE` can coexist within the same component, for example the LBP
 initializing is a one-time event, but returning a query plan is a per-request event.

 Logs statements start with a prefix that identifies its origin, for example:

 * for components that are unique to the cluster instance, just the cluster name: `[c0]`.
 * for sessions, the cluster name + a generated unique identifier: `[c0|s0]`.
 * for channel pools, the session identifier + the address of the node: `[c0|s0|/127.0.0.2:9042]`.
 * for channels, the identifier of the owner (session or control connection) + the Netty identifier,
   which indicates the local and remote ports:
   `[c0|s0|id: 0xf9ef0b15, L:/127.0.0.1:51482 - R:/127.0.0.1:9042]`.
 * for request handlers, the session identifier, a unique identifier, and the index of the
   speculative execution: `[c0|s0|1077199500|0]`.

 Tests run with the configuration defined in `src/test/resources/logback-test.xml`. The default level
 for driver classes is `WARN`, but you can override it with a system property: `-DdriverLevel=DEBUG`.
 A nice setup is to use `DEBUG` when you run from your IDE, and keep the default for the command
 line.

 When you add or review new code, take a moment to run the tests in `DEBUG` mode and check if the
 output looks good.

 ### Don't abuse the stream API

 The `java.util.stream` API is often used (abused?) as a "functional API for collections":

 ```java
 List<Integer> sizes = words.stream().map(String::length).collect(Collectors.toList());
 ```

 The perceived advantages of this approach over traditional for-loops are debatable:

 * readability: this is highly subjective. But consider the following:
   * everyone can read for-loops, whether they are familiar with the Stream API or not. The opposite
     is not true.
   * the stream API does not spell out all the details: what kind of list does `Collectors.toList()`
     return? Is it pre-sized? Mutable? Thread-safe?
   * the stream API looks pretty on simple examples, but things can get ugly fast. Try rewriting
     `NetworkTopologyReplicationStrategy` with streams.
 * concision: this is irrelevant. When we look at code we care about maintainability, not how many
   keystrokes the author saved. The for-loop version of the above example is just 5 lines long, and
   your brain doesn't take longer to parse it.

 The bottom line: don't try to "be functional" at all cost. Plain old for-loops are often just as
 simple.

 ### Never assume a specific format for `toString()`

 Only use `toString()` for debug logs or exception messages, and always assume that its format is
 unspecified and can change at any time.

 If you need a specific string representation for a class, make it a dedicated method with a
 documented format, for example `toCqlLiteral`. Otherwise it's too easy to lose track of the intended
 usage and break things: for example, someone modifies your `toString()` method to make their logs
 prettier, but unintentionally breaks the script export feature that expected it to produce CQL
 literals.

 `toString()` can delegate to `toCqlLiteral()` if that is appropriate for logs.


 ### Concurrency annotations

 We use the [JCIP annotations](http://jcip.net/annotations/doc/index.html) to document thread-safety
 policies.

 Add them for all new code, with the exception of:

 * enums and interfaces;
 * utility classes (only static methods);
 * test code.

 Make sure you import the types from `net.jcip`, there are homonyms in the classpath.


 ### Nullability annotations

 We use the [Spotbugs annotations](https://spotbugs.github.io) to document nullability of parameters,
 method return types and class members.

 Please annotate any new class or interface with the appropriate annotations: `@NonNull`, `@Nullable`. Make sure you import
 the types from `edu.umd.cs.findbugs.annotations`, there are homonyms in the classpath.


 ## Coding style -- test code

 Static imports are permitted in a couple of places:
 * All AssertJ methods, e.g.:
   ```java
   assertThat(node.getDatacenter()).isNotNull();
   fail("Expecting IllegalStateException to be thrown");
   ```
 * All Mockito methods, e.g.:
   ```java
   when(codecRegistry.codecFor(DataTypes.INT)).thenReturn(codec);
   verify(codec).decodePrimitive(any(ByteBuffer.class), eq(ProtocolVersion.DEFAULT));
   ```
 * All Awaitility methods, e.g.:
   ```java
   await().until(() -> somethingBecomesTrue());
   ```

 Test methods names use lower snake case, generally start with `should`, and clearly indicate the
 purpose of the test, for example: `should_fail_if_key_already_exists`. If you have trouble coming
 up with a simple name, it might be a sign that your test does too much, and should be split.

 We use AssertJ (`assertThat`) for assertions. Don't use JUnit assertions (`assertEquals`,
 `assertNull`, etc).

 Don't try to generify at all cost: a bit of duplication is acceptable, if that helps keep the tests
 simple to understand (a newcomer should be able to understand how to fix a failing test without
 having to read too much code).

 Test classes can be a bit longer, since they often enumerate similar test cases. You can also
 factor some common code in a parent abstract class named with "XxxTestBase", and then split
 different families of tests into separate child classes. For example, `CqlRequestHandlerTestBase`,
 `CqlRequestHandlerRetryTest`, `CqlRequestHandlerSpeculativeExecutionTest`...

 ### Unit tests

 They live in the same module as the code they are testing. They should be fast and not start any
 external process. They usually target one specific component and mock the rest of the driver
 context.

 ### Integration tests

 They live in the `integration-tests` module, and exercise the whole driver stack against an external
 process, which can be either one of:
 * [Simulacron](https://github.com/datastax/simulacron): simulates Cassandra nodes on loopback
   addresses; your test must "prime" data, i.e. tell the nodes what results to return for
   pre-determined queries.

     For an example of a Simulacron-based test, see `NodeTargetingIT`.
 * [CCM](https://github.com/pcmanus/ccm): launches actual Cassandra nodes locally. The `ccm`
   executable must be in the path.

     You can pass a `-Dccm.version` system property to the build to target a particular Cassandra
     version (it defaults to 3.11.0). `-Dccm.directory` allows you to point to a local installation
     -- this can be a checkout of the Cassandra codebase, as long as it's built. See `CcmBridge` in
     the driver codebase for more details.

     For an example of a CCM-based test, see `PlainTextAuthProviderIT`.

 #### Categories

 Integration tests are divided into three categories:

 ##### Parallelizable tests

 These tests can be run in parallel, to speed up the build. They either use:
 * dedicated Simulacron instances. These are lightweight, and Simulacron will manage the ports to
   make sure that there are no collisions.
 * a shared, one-node CCM cluster. Each test works in its own keyspace.

 The build runs them with a configurable degree of parallelism (currently 8). The shared CCM cluster
 is initialized the first time it's used, and stopped before moving on to serial tests. Note that we
 run with `parallel=classes`, which means methods within the same class never run concurrent to each
 other.

 To make an integration test parallelizable, annotate it with `@Category(ParallelizableTests.class)`.
 If you use CCM, it **must** be with `CcmRule`.

 For an example of a Simulacron-based parallelizable test, see `NodeTargetingIT`. For a CCM-based
 test, see `DirectCompressionIT`.

 ##### Serial tests

 These tests cannot run in parallel, in general because they require CCM clusters of different sizes,
 or with a specific configuration (we never run more than one CCM cluster simultaneously: it would be
 too resource-intensive, and too complicated to manage all the ports).

 The build runs them one by one, after the parallelizable tests.

 To make an integration test serial, do not annotate it with `@Category`. The CCM rule **must** be
 `CustomCcmRule`.

 For an example, see `DefaultLoadBalancingPolicyIT`.

 Note: if multiple serial tests have a common "base" class, do not pull up `CustomCcmRule`, each
 child class must have its own instance. Otherwise they share the same CCM instance, and the first
 one destroys it on teardown. See `TokenITBase` for how to organize code in those cases.

 ##### Isolated tests

 Not only can those tests not run in parallel, they also require specific environment tweaks,
 typically system properties that need to be set before initialization.

 The build runs them one by one, *each in its own JVM fork*, after the serial tests.

 To isolate an integration test, annotate it with `@Category(IsolatedTests.class)`. The CCM rule
 **must** be `CustomCcmRule`.

 For an example, see `HeapCompressionIT`.

 #### About test rules

 Do not mix `CcmRule` and `SimulacronRule` in the same test. It makes things harder to follow, and
 can be inefficient (if the `SimulacronRule` is method-level, it will create a Simulacron cluster for
 every test method, even those that only need CCM).

 ##### Class-level rules

 Rules annotated with `@ClassRule` wrap the whole test class, and are reused across methods. Try to
 use this as much as possible, as it's more efficient. The fields need to be static; also make them
 final and use constant naming conventions, like `CCM_RULE`.

 When you use a server rule (`CcmRule` or `SimulacronRule`) and a `SessionRule` at the same level,
 wrap them into a rule chain to ensure proper initialization order:

 ```java
 private static final CcmRule CCM_RULE = CcmRule.getInstance();
 private static final SessionRule<CqlSession> SESSION_RULE = SessionRule.builder(CCM_RULE).build();

 @ClassRule
 public static final TestRule CHAIN = RuleChain.outerRule(CCM_RULE).around(SESSION_RULE);
 ```

 ##### Method-level rules

 Rules annotated with `@Rule` wrap each test method. Use lower-camel case for field names:

 ```java
 private CcmRule ccmRule = CcmRule.getInstance();
 private SessionRule<CqlSession> sessionRule = SessionRule.builder(ccmRule).build();

 @ClassRule
 public TestRule chain = RuleChain.outerRule(ccmRule).around(sessionRule);
 ```

 Only use this for:

 * CCM tests that use `@CassandraRequirement` or `@DseRequirement` restrictions at the method level
   (ex: `BatchStatementIT`).
 * tests where you *really* need to restart from a clean state for every method.

 ##### Mixed

 It's also possible to use a `@ClassRule` for CCM / Simulacron, and a `@Rule` for the session rule.
 In that case, you don't need to use a rule chain.

 ## Running the tests

 ### Unit tests

     mvn clean test

 This currently takes about 30 seconds. The goal is to keep it within a couple of minutes (it runs
 for each commit if you enable the pre-commit hook -- see below).

 ### Integration tests

     mvn clean verify

 This currently takes about 9 minutes. We don't have a hard limit, but ideally it should stay within
 30 minutes to 1 hour.

 You can skip test categories individually with `-DskipParallelizableITs`, `-DskipSerialITs` and
 `-DskipIsolatedITs` (`-DskipITs` still works to skip them all at once).

 ### Configuring MacOS for Simulacron

 Simulacron (used in integration tests) relies on loopback aliases to simulate multiple nodes. On
 Linux or Windows, you shouldn't have anything to do. On MacOS, run this script:

 ```
 #!/bin/bash
 for sub in {0..4}; do
     echo "Opening for 127.0.$sub"
     for i in {0..255}; do sudo ifconfig lo0 alias 127.0.$sub.$i up; done
 done
 ```

 Note that this is known to cause temporary increased CPU usage in OS X initially while mDNSResponder
 acclimates itself to the presence of added IP addresses. This lasts several minutes. Also, this does
 not survive reboots.


 ## License headers

 The build will fail if some license headers are missing. To update all files from the command line,
 run:

 ```
 mvn license:format
 ```

 ## Pre-commit hook (highly recommended)

 Ensure `pre-commit.sh` is executable, then run:

 ```
 ln -s ../../pre-commit.sh .git/hooks/pre-commit
 ```

 This will only allow commits if the tests pass. It is also a good reminder to keep the test suite
 short.

 Note: the tests run on the current state of the working directory. I tried to add a `git stash` in
 the script to only test what's actually being committed, but I couldn't get it to run reliably
 (it's still in there but commented). Keep this in mind when you commit, and don't forget to re-add
 the changes if the first attempt failed and you fixed the tests.

 ## Speeding up the build for local tests

 If you need to install something in your local repository quickly, you can use the `fast` profile to
 skip all "non-essential" checks (licenses, formatting, tests, etc):

 ```
 mvn clean install -Pfast
 ```

 You can speed things up even more by targeting specific modules with the `-pl` option:

 ```
 mvn clean install -Pfast -pl core,query-builder,mapper-runtime,mapper-processor,bom
 ```

 Please run the normal build at least once before you push your changes.

 ## Commits

 Keep your changes **focused**. Each commit should have a single, clear purpose expressed in its
 message.

 Resist the urge to "fix" cosmetic issues (add/remove blank lines, move methods, etc.) in existing
 code. This adds cognitive load for reviewers, who have to figure out which changes are relevant to
 the actual issue. If you see legitimate issues, like typos, address them in a separate commit (it's
 fine to group multiple typo fixes in a single commit).

 Isolate trivial refactorings into separate commits. For example, a method rename that affects dozens
 of call sites can be reviewed in a few seconds, but if it's part of a larger diff it gets mixed up
 with more complex changes (that might affect the same lines), and reviewers have to check every
 line.

 Commit message subjects start with a capital letter, use the imperative form and do **not** end
 with a period:

 * correct: "Add test for CQL request handler"
 * incorrect: "~~Added test for CQL request handler~~"
 * incorrect: "~~New test for CQL request handler~~"

 Avoid catch-all messages like "Minor cleanup", "Various fixes", etc. They don't provide any useful
 information to reviewers, and might be a sign that your commit contains unrelated changes.

 We don't enforce a particular subject line length limit, but try to keep it short.

 You can add more details after the subject line, separated by a blank line. The following pattern
 (inspired by [Netty](http://netty.io/wiki/writing-a-commit-message.html)) is not mandatory, but
 welcome for complex changes:

 ```
 One line description of your change

 Motivation:

 Explain here the context, and why you're making that change.
 What is the problem you're trying to solve.

 Modifications:

 Describe the modifications you've done.

 Result:

 After your change, what will change.
 ```

 ## Pull requests

 Like commits, pull requests should be focused on a single, clearly stated goal.

 Don't base a pull request onto another one, it's too complicated to follow two branches that evolve
 at the same time. If a ticket depends on another, wait for the first one to be merged.

 If you have to address feedback, avoid rewriting the history (e.g. squashing or amending commits):
 this makes the reviewers' job harder, because they have to re-read the full diff and figure out
 where your new changes are. Instead, push a new commit on top of the existing history; it will be
 squashed later when the PR gets merged. If the history is complex, it's a good idea to indicate in
 the message where the changes should be squashed:

 ```
 * 20c88f4 - Address feedback (to squash with "Add metadata parsing logic") (36 minutes ago)
 * 7044739 - Fix various typos in Javadocs (2 days ago)
 * 574dd08 - Add metadata parsing logic (2 days ago)
 ```

 (Note that the message refers to the other commit's subject line, not the SHA-1. This way it's still
 relevant if there are intermediary rebases.)

 If you need new stuff from the base branch, it's fine to rebase and force-push, as long as you don't
 rewrite the history. Just give a heads up to the reviewers beforehand. Don't push a merge commit to
 a pull request.