docs/contributing.adoc - kudu - Git at Google

 // Licensed to the Apache Software Foundation (ASF) under one
 // or more contributor license agreements.  See the NOTICE file
 // distributed with this work for additional information
 // regarding copyright ownership.  The ASF licenses this file
 // to you under the Apache License, Version 2.0 (the
 // "License"); you may not use this file except in compliance
 // with the License.  You may obtain a copy of the License at
 //
 //   http://www.apache.org/licenses/LICENSE-2.0
 //
 // Unless required by applicable law or agreed to in writing,
 // software distributed under the License is distributed on an
 // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 // KIND, either express or implied.  See the License for the
 // specific language governing permissions and limitations
 // under the License.

 [[contributing]]
 = Contributing to Apache Kudu
 :author: Kudu Team
 :imagesdir: ./images
 :icons: font
 :toc:
 :toclevels: 3
 :doctype: book
 :backend: html5
 :sectlinks:
 :experimental:

 == Contributing Patches Using Gerrit

 The Kudu team uses Gerrit for code review, rather than Github pull requests. Typically,
 you pull from Github but push to Gerrit, and Gerrit is used to review code and merge
 it into Github.

 See the link:https://www.mediawiki.org/wiki/Gerrit/Tutorial[Gerrit Tutorial]
 for an overview of using Gerrit for code review.

 === Initial Setup for Gerrit

 . Sign in to link:https://gerrit.cloudera.org/[Gerrit] using your Github username.

 . Go to link:https://gerrit.cloudera.org/settings/[Settings]. Update your name
 and email address on the *Contact Information* page, and upload a SSH public
 key under *SSH Public Keys* if you would like to use SSH to connect to Gerrit.
 Generate an HTTP password under *HTTP Password* if you would like to use HTTP
 or HTTPS to connect to Gerrit. (Most Kudu developers use the SSH option.)
 +
 NOTE: If you do not update your name, it will appear as "Anonymous Coward" in
 Gerrit reviews.

 . If you have not done so, clone the main Kudu repository. By default, the main remote
 is called `origin`. When you fetch or pull, you will do so from `origin`.
 +
 [source,bash]
 ----
 git clone https://github.com/apache/kudu
 ----

 . Change to the new `kudu` directory.

 . Add a `gerrit` remote.
 +
 If using SSH to connect to Gerrit, use the following command to add the Gerrit
 remote (substitute <username> with your Github username):
 +
 [source,bash]
 ----
 git remote add gerrit ssh://<username>@gerrit.cloudera.org:29418/kudu
 ----
 +
 If using HTTP or HTTPS to connect to Gerrit, use the following command to add
 the Gerrit remote (http:// also works):
 +
 [source,bash]
 ----
 git remote add gerrit https://gerrit.cloudera.org/a/kudu
 ----
 +
 [TIP]
 ====
 If you are using Gerrit's HTTP or HTTPS endpoints and prefer not to type a
 username and password each time you submit a patch, you can put your login and
 password into a `.netrc` file located at `$HOME/.netrc` and Git will use it.
 The password is stored as plaintext and the file format is as follows:

 [source]
 ----
 machine <hostname>
   login <username>
   password <password>
 ----
 ====

 . Run the following command to install the Gerrit `commit-msg` hook:
 +
 ----
 cd kudu
 gitdir=$(git rev-parse --git-dir)
 curl -LSsf https://gerrit.cloudera.org/tools/hooks/commit-msg -o ${gitdir}/hooks/commit-msg
 chmod +x ${gitdir}/hooks/commit-msg
 ----

 . Be sure you have set the Kudu repository to use `pull --rebase` by default. You
 can use the following two commands, assuming you have only ever checked out `master`
 so far:
 +
 ----
 git config branch.autosetuprebase always
 git config branch.master.rebase true
 ----
 +
 If for some reason you had already checked out branches other than `master`, substitute
 `master` for the other branch names in the second command above.

 === Submitting Patches

 To submit a patch, first commit your change (using a descriptive multi-line
 commit message if possible), then push the request to the `gerrit` remote. For instance, to push a change
 to the `master` branch:
 ----
 git push gerrit HEAD:refs/for/master --no-thin
 ----

 or to push a change to the `gh-pages` branch (to update the website):
 ----
 git push gerrit HEAD:refs/for/gh-pages --no-thin
 ----

 TIP: While preparing a patch for review, it's a good idea to follow
 link:https://git-scm.com/book/en/v2/Distributed-Git-Contributing-to-a-Project#_commit_guidelines[generic git commit guidelines and good practices].

 NOTE: The `--no-thin` argument is a workaround to prevent an error in Gerrit. See
 https://code.google.com/p/gerrit/issues/detail?id=1582.

 TIP: Consider creating Git aliases for the above commands. Gerrit also includes
 a command-line tool called
 link:https://www.mediawiki.org/wiki/Gerrit/Tutorial#Installing_git-review[git-review],
 which you may find helpful.

 TIP: You can add reviewers automatically for a patch by adding their GitHub
 username or associated email address to the remote branch name following with
 the "r" flag:
 ----
 git push gerrit HEAD:refs/for/master%r=githubuser,r=example@apache.org
 ----

 TIP: To find possible reviewer candidates for your commit, use git blame or git
 log to find out who are involved with the area you're touching. It's also a
 good idea to add as reviewer whoever is involved with the JIRA you're working
 on.

 Gerrit will add a change ID to your commit message and will create a Gerrit review,
 whose URL will be emitted as part of the push reply. If desired, you can send a message
 to the `kudu-dev` mailing list, explaining your patch and requesting review.

 After getting feedback, you can update or amend your commit, (for instance, using
 a command like `git commit --amend`) while leaving the Change
 ID intact. Push your change to Gerrit again, and this will create a new patch set
 in Gerrit and notify all reviewers about the change.

 When your code has been reviewed and is ready to be merged into the Kudu code base,
 a Kudu committer will merge it using Gerrit. You can discard your local branch.

 === Abandoning a Review

 If your patch is not accepted or you decide to pull it from consideration, you can
 use the Gerrit UI to *Abandon* the patch. It will still show in Gerrit's history,
 but will not be listed as a pending review.

 === Reviewing Patches In Gerrit

 You can view a unified or side-by-side diff of changes in Gerrit using the web UI.
 To leave a comment, click the relevant line number or highlight the relevant part
 of the line, and type 'c' to bring up a comment box. To submit your comments and/or
 your review status, go up to the top level of the review and click *Reply*. You can
 add additional top-level comments here, and submit them.

 To check out code from a Gerrit review, click *Download* and paste the relevant Git
 commands into your Git client. You can then update the commit and push to Gerrit to
 submit a patch to the review, even if you were not the original reviewer.

 Gerrit allows you to vote on a review. A vote of `+2` from at least one committer
 (besides the submitter) is required before the patch can be merged.

 == Code Style

 === {cpp} Code Style

 Get familiar with these guidelines so that your contributions can be reviewed and
 integrated quickly and easily.

 In general, Kudu follows the
 link:https://google.github.io/styleguide/cppguide.html[Google {cpp} Style Guide].

 A `clang-format` file is provided in `src/kudu/.clang-format` which allows
 automatic formatting of source code whitespacing, indentation, etc. Not all
 existing code conforms to this automatic formatting, so prefer using
 `clang-format-diff` to format only the lines changed by your patch. For example,
 after making a commit, run the following from the root of your checked out
 source:
 +
 [source,bash]
 ----
 git show -U0 | build-support/clang_format_diff.sh -i -p1
 git commit -a --amend
 ----

 === Exceptions from Google Style Guide
 Kudu's code base makes the following notable exceptions from the Google Style Guide
 referenced above:

 ==== Notes on {cpp} 11

 Kudu uses {cpp} 11. Check out this handy guide to {cpp} 11 move semantics and rvalue
 references: https://www.chromium.org/rvalue-references

 We aim to follow most of the same guidelines, such as, where possible, migrating
 away from `foo.Pass()` in favor of `std::move(foo)`.

 ==== Limitations on `boost` Use

 `boost` classes from header-only libraries can be used in cases where a suitable
 replacement does not exist in the Kudu code base. However:

 * Do not introduce dependencies on `boost` classes where equivalent functionality
   exists in the standard {cpp} library or in `src/kudu/gutil/`. For example, prefer
   `strings::Split()` from `gutil` rather than `boost::split`.
 * Prefer using functionality from  `boost` rather than re-implementing the same
   functionality, _unless_ using the `boost` functionality requires excessive use of
   {cpp} features which are disallowed by our style guidelines. For example,
   `boost::spirit` is heavily based on template metaprogramming and should not be used.
 * Do not use `boost` in any public headers for the Kudu {cpp} client, because
   `boost` commonly breaks backward compatibility, and passing data between two
   `boost` versions (one by the user, one by Kudu) causes serious issues.

 When in doubt about introducing a new dependency on any `boost` functionality,
 it is best to email `dev@kudu.apache.org` to start a discussion.

 ==== Line length

 The Kudu team allows line lengths of 100 characters per line, rather than Google's standard of 80. Try to
 keep under 80 where possible, but you can spill over to 100 or so if necessary.

 ==== Pointers

 .Smart Pointers and Singly-Owned Pointers

 Generally, most objects should have clear "single-owner" semantics.
 Most of the time, singly-owned objects can be wrapped in a `unique_ptr<>`
 which ensures deletion on scope exit and prevents accidental copying.

 If an object is singly owned, but referenced from multiple places, such as when
 the pointed-to object is known to be valid at least as long as the pointer itself,
 associate a comment with the constructor which takes and stores the raw pointer,
 as in the following example.

 [source,c++]
 ----
   // 'blah' must remain valid for the lifetime of this class
   MyClass(const Blah* blah) :
     blah_(blah) {
   }
 ----

 WARNING: Using `std::auto_ptr` is strictly disallowed because of its difficult and
 bug-prone semantics. Besides, `std::auto_ptr` is declared deprecated
 since {cpp}11.

 .Smart Pointers for Multiply-Owned Pointers:

 Although single ownership is ideal, sometimes it is not possible, particularly
 when multiple threads are in play and the lifetimes of the pointers are not
 clearly defined. In these cases, you can use either `std::shared_ptr` or
 Kudu's own `scoped_refptr` from _gutil/ref_counted.hpp_. Each of these mechanisms
 relies on reference counting to automatically delete the referent once no more
 pointers remain. The key difference between these two types of pointers is that
 `scoped_refptr` requires that the object extend a `RefCounted` base class, and
 stores its reference count inside the object storage itself, while `shared_ptr`
 maintains a separate reference count on the heap.

 The pros and cons are:

 .`shared_ptr`
 [none]
 * icon:plus-circle[role="green",alt="pro"] can be used with any type of object, without the
 object deriving from a special base class
 * icon:plus-circle[role="green",alt="pro"] part of the standard library and familiar to most
 {cpp} developers
 * icon:plus-circle[role="green",alt="pro"] supports the `weak_ptr` use cases:
  ** a temporary ownership when an object needs to be accessed only if it exists
  ** break circular references of `shared_ptr`, if any exists due to aggregation
 * icon:plus-circle[role="green",alt="pro"] you can convert from the
 `shared_ptr` into the `weak_ptr` and back
 * icon:plus-circle[role="green",alt="pro"] if creating an instance with
 `std::make_shared<>()` only one allocation is made (since {cpp}11;
 a non-binding requirement in the Standard, though)
 * icon:minus-circle[role="red",alt="con"] if creating a new object with
 `shared_ptr<T> p(new T)` requires two allocations (one to create the ref count,
 and one to create the object)
 * icon:minus-circle[role="red",alt="con"] the ref count may not be near the object on the heap,
 so extra cache misses may be incurred on access
 * icon:minus-circle[role="red",alt="con"] the `shared_ptr` instance itself requires 16 bytes
 (pointer to the ref count and pointer to the object)


 .`scoped_refptr`
 [none]
 * icon:plus-circle[pro, role="green"] only requires a single allocation, and ref count
 is on the same cache line as the object
 * icon:plus-circle[pro, role="green"] the pointer only requires 8 bytes (since
 the ref count is within the object)
 * icon:plus-circle[pro, role="green"] you can manually increase or decrease
 reference counts when more control is required
 * icon:plus-circle[pro, role="green"] you can convert from a raw pointer back
 to a `scoped_refptr` safely without worrying about double freeing
 * icon:plus-circle[pro, role="green"] since we control the implementation, we
 can implement features, such as debug builds that capture the stack trace of every
 referent to help debug leaks.
 * icon:minus-circle[con, role="red"] the referred-to object must inherit
 from `RefCounted`
 * icon:minus-circle[con, role="red"] does not support the `weak_ptr` use cases

 Since `scoped_refptr` is generally faster and smaller, try to use it
 rather than `shared_ptr` in new code. Existing code uses `shared_ptr`
 in many places. When interfacing with that code, you can continue to use `shared_ptr`.

 ==== Function Binding and Callbacks

 All code should use {cpp}11 lambdas to capture and manage functors. Functions that
 take a lambda as an argument should use `std::function` as the argument's
 type. Do not use `boost::bind` or `std::bind` to create functors. Lambdas offer
 the compiler greater opportunity to inline, and `std::bind` in particular is
 link:https://abseil.io/tips/108[error-prone] and has a proclivity towards heap
 allocation for storing bound parameters.

 Until Kudu is upgraded to {cpp}14, lambda support will be
 link:https://stackoverflow.com/a/17545183[somewhat incomplete]. For example, it
 is not possible in {cpp}11 to capture an argument by move. Nor is it possible
 to define new variables in the context of a lambda capture. Workarounds for
 these deficiencies exist, and they must be used in the interim.

 ==== GFlags

 Kudu uses gflags for both command-line and file-based configuration. Use these guidelines
 to add a new gflag. All new gflags must conform to these
 guidelines. Existing non-conformant ones will be made conformant in time.

 .Name

 The gflag's name conveys a lot of information, so choose a good name. The name
 will propagate into other systems, such as the
 link:configuration_reference.html[Configuration Reference].

 * The different parts of a multi-word name should be separated by underscores.
   For example, `fs_data_dirs`.
 * The name should be prefixed with the context that it affects. For example,
   `webserver_num_worker_threads` and `cfile_default_block_size`. Context can be
   difficult to define, so bear in mind that this prefix will be
   used to group similar gflags together. If the gflag affects the entire
   process, it should not be prefixed.
 * If the gflag is for a quantity, the name should be suffixed with the units.
   For example, `tablet_copy_idle_timeout_ms`.
 * Where possible, use short names. This will save time for those entering
   command line options by hand.
 * The name is part of Kudu's compatibility contract, and should not change
   without very good reason.

 .Default value

 Choosing a default value is generally simple, but like the name, it propagates
 into other systems.

 * The default value is part of Kudu's compatibility contract, and should not
   change without very good reason.

 .Description

 The gflag's description should supplement the name and provide additional
 context and information. Like the name, the description propagates into other
 systems.

 * The description may include multiple sentences. Each should begin with a
   capital letter, end with a period, and begin one space after the previous.
 * The description should NOT include the gflag's type or default value; they are
   provided out-of-band.
 * The description should be in the third person. Do not use words like `you`.
 * A gflag description can be changed freely; it is not expected to remain the
   same across Kudu releases.

 .Tags

 Kudu's gflag tagging mechanism adds machine-readable context to each gflag, for
 use in consuming systems such as documentation or management tools. See the large block
 comment in _flag_tags.h_ for guidelines.

 .Miscellaneous

 * Avoid creating multiple gflags for the same logical parameter. For
   example, many Kudu binaries need to configure a WAL directory. Rather than
   creating `foo_wal_dir` and `bar_wal_dir` gflags, better to have a single
   `kudu_wal_dir` gflag for use universally.

 === Java Code Style

 ==== Preconditions vs assert in the Kudu Java client

 Use `assert` for verification of the static (i.e. non-runtime) internal
 invariants. Internal means the pre- and post-conditions which are
 completely under control of the code of a class or a function itself and cannot
 be influenced by input parameters and other runtime/dynamic conditions.

 Use `Preconditions` for verification of the input parameters and the other
 conditions which are outside of the control of the local code, or conditions
 which are dependent on the state of other objects/components in runtime.

 [source,java]
 ----
 Object pop() {
   // Use Preconditions here because the external user of the class should not
   // call pop() on an empty stack, but the stack itself is internally consistent
   Preconditions.checkState(curSize > 0, "queue must not be empty");
   Object toReturn = data[--curSize];
   // Use an assert here because if we ended up with a negative size counter,
   // that's an indication of a broken implementation of the stack; i.e. it's
   // an invariant, not a state check.
   assert curSize >= 0;
   return toReturn;
 }
 ----

 However, keep in mind that `assert` checks are enabled only when the JVM is
 run with `-ea` option. So, if some dynamic condition is crucial for the
 overall consistency (e.g. a data loss can occur if some dynamic condition is not
 satisfied and the code continues its execution), consider throwing an
 `AssertionError`:

 [source,java]
 ----
 if (!isCriticalConditionSatisfied) {
   throw new AssertionError("cannot continue: data loss is possible otherwise");
 }
 ----

 ===== Checking code style with Gradle checkStyle

 Before posting a Java patch to Gerrit for review, make sure to check Java code
 style with Gradle `checkstyle` plugin. See
 link:https://docs.gradle.org/current/userguide/checkstyle_plugin.html[Gradle Checkstyle Plugin documentation]
 for more information.
 [source,bash]
 ----
 ./gradlew checkstyle
 ----

 ===== References
 * link:https://docs.oracle.com/javase/8/docs/technotes/guides/language/assert.html[Programming With Assertions]
 * link:https://github.com/google/guava/wiki/PreconditionsExplained[Guava Preconditions Explained]

 === `CMake` Style Guide

 `CMake` allows commands in lower, upper, or mixed case. To keep
 the CMake files consistent, please use the following guidelines:

 * *built-in commands* in lowercase
 ----
 add_subdirectory(some/path)
 ----
 * *built-in arguments* in uppercase
 ----
 message(STATUS "message goes here")
 ----
 * *custom commands or macros* in uppercase
 ----
 ADD_KUDU_TEST(some-test)
 ----

 == Third-party dependencies

 Like many complex applications, Kudu depends on a number of third-party
 dependencies. Some (such as OpenSSL) are expected to be found on the build
 system itself. However, the vast majority are "vendored" in the `thirdparty/`
 tree. These dependencies are all versioned and pinned. They are also
 source-based; the dependencies are built before the rest of Kudu is built using
 the `build-if-necessary.sh` script.

 Third-party dependencies and their versions are defined in `vars.sh`. The source
 code for each dependency is located in a tarball, typically named
 `<dependency>-<version>.tar.gz`. The tarballs are stored in an Amazon S3 bucket
 operated by Cloudera. The bucket is cached in Amazon CloudFront to maximize
 download performance and reliability.

 If as part of your contribution you need to add a new third-party dependency,
 here's what you need to do:

 . Begin by preparing a source tarball for the new dependency. Ideally it should
 be a vanilla tarball obtained directly from an upstream project, but sometimes
 either its name or the contents need to be massaged to meet Kudu's expectations.

 . Add the new dependency to the third-party build. You'll need to modify
 `vars.sh`, `download-thirdparty.sh`, `build-definitions.sh`, and
 `build-thirdparty.sh`.

 . On your local machine, extract the source tarball into `thirdparty/src`.

 . Test the dependency's build by running `build-thirdparty.sh <dependency>`.
 This should build and install the dependency into `thirdparty/installed`, making
 it available for the Kudu build.

 . Test the Kudu build using the new dependency. You will need to pass
 `NO_REBUILD_THIRDPARTY=1` in the environment to prevent the Kudu build from
 rebuilding the `thirdparty/` tree (whereupon it'll fail to download the new
 dependency).

 . When everything checks out, contact a Kudu committer who is also a Cloudera
 employee and ask them to upload your source tarball to S3.

 . After the tarball has been uploaded, test the entire third-party build
 end-to-end by running `build-if-necessary.sh`.

 . Publish your patch to gerrit. With the tarball uploaded, the precommit builds
 should download and build the new dependency successfully.

 == Testing

 All new code should have tests.::
   Add new tests either in existing files, or create new test files as necessary.

 All bug fixes should have tests.::
   It's OK to fix a bug without adding a
   new test if it's triggered by an existing test case. For example, if a
   race shows up when running a multi-threaded system test after 20
   minutes or so, it's worth trying to make a more targeted test case to
   trigger the bug. But if that's hard to do, the existing system test
   should be enough.

 Tests should run quickly (< 1s).::
   If you want to write a time-intensive
   test, make the runtime dependent on `KuduTest#AllowSlowTests`, which is
   enabled via the `KUDU_ALLOW_SLOW_TESTS` environment variable and is
   used by Jenkins test execution.

 Tests which run a number of iterations of some task should use a `gflags` command-line argument for the number of iterations.::
  This is handy for writing quick stress tests or performance tests.

 Commits which may affect performance should include before/after `perf-stat(1)` output.::
   This will show performance improvement or non-regression.
   Performance-sensitive code should include some test case which can be used as a
   targeted benchmark.


 == Documentation

 See the
 link:https://github.com/apache/kudu/blob/master/docs/design-docs/doc-style-guide.adoc[Documentation Style Guide]
 for guidelines about contributing to the official Kudu documentation.

 == Blog posts

 === Writing a post on the Kudu blog

 If you are using or integrating with Kudu, consider doing a write-up about your
 use case and your integration with Kudu and submitting it to be posted as an
 article on the Kudu blog. People in the community love to read about how Kudu
 is being used around the world.

 Consider checking with the project developers on the Kudu Slack instance or on
 mailto:dev@kudu.apache.org[dev@kudu.apache.org] if you have any questions about
 the content or the topic of a potential Kudu blog post.

 === Submitting a blog post in Google Doc format

 If you don't have the time to learn Markdown or to submit a Gerrit change
 request, but you would still like to submit a post for the Kudu blog, feel free
 to write your post in Google Docs format and share the draft with us publicly
 on mailto:dev@kudu.apache.org[dev@kudu.apache.org] -- we'll be happy to review
 it and post it to the blog for you once it's ready to go.

 If you would like to submit the post directly to Gerrit for review in Markdown
 format (the developers will appreciate it if you do), please read below.

 === How to format a Kudu blog post

 Blog posts live in the `gh-pages` branch under the `_posts` directory in
 Markdown format. They're automatically rendered by Jekyll so for those familiar
 with Markdown or Jekyll, submitting a blog post should be fairly
 straightforward.

 Each post is a separate file named in the following format:

 ----
 YYYY-MM-DD-title-of-the-post.md
 ----

 The `YYYY-MM-DD` part is the date which will be included in the link as
 `/YYYY-MM-DD`, then `title-of-the-post` is used verbatim. The words should be
 separated by dashes and should contain only lowercase letters of the English
 alphabet and numbers. Finally, the `.md` extension will be replaced with
 `.html`.

 The header contains the layout information (which is always "post"), the
 title and the author's name.

 ----
 ---
 layout: post
 title: Example Post
 author: John Doe
 ---
 ----

 The actual text of the blog post goes below this header, beginning with the
 "lead" which is a short excerpt that shows up in the index. This is separated
 by the `<!--more-\->` string from the rest of the post.

 === How to check the rendering of a blog post

 Once you've finished the post, there is a command you can run to make sure it
 looks good called `site_tool` in the root of the `gh-pages` branch that can
 start up Jekyll and serve the rendered site locally. To run this, you need Ruby
 and Python to be installed on your machine, and you can start it with the below
 command.

 ----
 $ ./site_tool jekyll serve
 ----

 When starting, it will print the URL where you can reach the site, but it should
 be http://localhost:4000, or to reach the blog directly,
 http://localhost:4000/blog

 You should be able to see the title and lead of your post along with your name
 at the top of this page, and after clicking on the title or the "Read full
 post...", the whole post.

 === How to submit a blog post

 To submit the post, you'll need to commit your change and push it to
 <<_contributing_patches_using_gerrit,Gerrit>> for review. If the post is deemed
 useful for the community and all comments are addressed, a committer can merge
 and publish your post.

 [TIP]
 ====
 If you have a GitHub account, you can fork Kudu from
 https://github.com/apache/kudu and push the change to your fork too. GitHub will
 automatically render it on https://<yourname>.github.io/blog and you can link it
 directly on Gerrit.

 This way the reviewers can see that the post renders well without having to
 download it, which can speed up the review process.

 ====