blob: 4a227fd205406bffe4342ba661393df431a06669 [file] [log] [blame]
// Copyright 2015 Cloudera, Inc.
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// You may obtain a copy of the License at
//
// http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing, software
// distributed under the License is distributed on an "AS IS" BASIS,
// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.
[[contributing]]
= Contributing to Kudu
:author: Kudu Team
:imagesdir: ./images
:icons: font
:toc:
:toclevels: 3
:doctype: book
:backend: html5
:sectlinks:
:experimental:
== Code Style
In general, Kudu follows the
link:http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml[Google {cpp} Style Guide],
with the following exceptions:
=== Limitations on `boost` Library Use
`boost` libraries can be used in cases where we don't have a suitable
replacement in our own code-base. However, try to avoid introducing
dependencies on new `boost` libraries, and use our own code in preference
to `boost` where available. For example, do not use boost`'s scoped pointer
implementations.
.Approved `boost` Libraries
- `BOOST_FOREACH`
- `boost::assign` (container literals)
- `boost::mutex` and `boost::shared_mutex` (but prefer our own
spin lock implementation for short critical sections)
Check that any features from `boost` you use are present in *`boost` 1.46*
or earlier, for compatibility with RHEL 6.
.`boost` Libraries and the Kudu {cpp} Client
Do not use `boost` in any public headers for the Kudu {cpp} client, because
`boost` commonly breaks backward compatibility, and passing data between two `boost`
versions (one by the user, one by us) causes serious issues.
In addition, do not create dependencies from the Kudu {cpp} client to any `boost`
libraries. `libboost_system` is particularly troublesome, as any `boost` code
that throws exceptions will grow a dependency on it. Among other things, you
cannot use `boost::{lock_guard,unique_lock,shared_lock}` in any code consumed
by the {cpp} client (such as _common/_ and _util/_).
=== Line length
We allow 100 charcters per line rather than Google's standard of 80. Try to
keep under 80 where possible, but you can spill over to ~100 or so if necessary.
=== Pointers
.Smart Pointers and Singly-Owned Pointers
Generally, most objects should have clear "single-owner" semantics.
Most of the time, singly-owned objects can be wrapped in a `gscoped_ptr<>`
which ensures deletion on scope exit and prevents accidental copying.
`gscoped_ptr` is similar to {cpp}11's `unique_ptr` in that it has a `release`
method and also provides emulated `move` semantics (see _gscoped_ptr.h_ for
example usage).
If an object is singly owned, but referenced from multiple places, such as when the pointed-to object is known to be valid at least as long
as the pointer itself, associate a comment with the constructor which takes and stores
the raw pointer, as in the following example.
[source,c++]
----
// 'blah' must remain valid for the lifetime of this class
MyClass(const Blah* blah) :
blah_(blah) {
}
----
If you use raw pointers within STL collections or inside of vectors and other containers,
associate a comment with the container, which explains the ownership
semantics (owned or un-owned). Use utility code from _gutil/stl_util.h_, such as
`STLDeleteElements` or `ElementDeleter` to easily handle deletion of the
contained elements.
WARNING: Using `std::auto_ptr` is strictly disallowed because of its difficult and
bug-prone semantics.
.Smart Pointers for Multiply-Owned Pointers:
Although single ownership is ideal, sometimes it is not possible, particularly
when multiple threads are in play and the lifetimes of the pointers are not
clearly defined. In these cases, you can use either `std::tr1::shared_ptr` or
our own `scoped_refptr` from _gutil/ref_counted.hpp_. Each of these mechanisms
relies on reference counting to automatically delete the referrent once no more
pointers remain. The key difference between these two types of pointers is that
`scoped_refptr` requires that the object extend a `RefCounted` base class, and
stores its reference count inside the object storage itself, while`shared_ptr`
maintains a separate reference count on the heap.
The pros and cons are:
.`shared_ptr`
[none]
* icon:plus-circle[role="green",alt="pro"] can be used with any type of object, without the
object deriving from a special base class
* icon:plus-circle[role="green",alt="pro"] part of the standard library and familiar to most
{cpp} developers
* icon:minus-circle[role="red",alt="con"] creating a new object requires two allocations instead
of one (one to create the ref count, and one to create the object)
* icon:minus-circle[role="red",alt="con"] the ref count may not be near the object on the heap,
so extra cache misses may be incurred on access
* icon:minus-circle[role="red",alt="con"] the `shared_ptr` instance itself requires 16 bytes
(pointer to the ref count and pointer to the object)
* icon:minus-circle[role="red",alt="con"] if you convert from the `shared_ptr` to a raw pointer,
you can't get back the `shared_ptr`
.`scoped_refptr`
[none]
* icon:plus-circle[pro, role="green"] only requires a single allocation, and ref count
is on the same cache line as the object
* icon:plus-circle[pro, role="green"] the pointer only requires 8 bytes (since
the ref count is within the object)
* icon:plus-circle[pro, role="green"] you can manually increase or decrease
reference counts when more control is required
* icon:plus-circle[pro, role="green"] you can convert from a raw pointer back
to a `scoped_refptr` safely without worrying about double freeing
* icon:plus-circle[pro, role="green"] since we control the implementation, we
can implement features, such as debug builds that capture the stack trace of every
referrent to help debug leaks.
* icon:minus-circle[con, role="red"] the referred-to object must inherit
from `RefCounted`
* icon:minus-circle[con, role="red"] does not support `weak_ptr<>` use cases
Since `scoped_refptr` is generally faster and smaller, try to use it
rather than `shared_ptr` in new code. Existing code (as of 12/20/2013) uses `shared_ptr`
in many places. When interfacing with that code, continued use of `shared_ptr` is fine.
=== Function Binding and Callbacks
Existing code (as of 01/07/2014) uses `boost::bind` and `boost::function` for
function binding and callbacks. For new code, use the `Callback` and `Bind`
classes in `gutil` instead. While less full-featured (`Bind` doesn't support argument
place holders, wrapped function pointers, or function objects), they provide
more options by the way of argument lifecycle management. For example, a
bound argument whose class extends `RefCounted` will be incremented during `Bind`
and decremented when the `Callback` goes out of scope.
See the large file comment in _gutil/callback.h_ for more details, and
_util/callback_bind-test.cc_ for examples.
=== `CMake` Style Guide
`CMake` allows commands in lower, upper, or mixed case. Therefore to keep
the CMake files consistent, please use the following guidelines:
- *built-in commands* in lowercase
----
add_subdirectory(some/path)
----
- *built-in arguments* in uppercase
----
message(STATUS "message goes here")
----
- *custom commands or macros* in uppercase
----
ADD_KUDU_TEST(some-test)
----
=== GFlags
Kudu uses gflags for both command-line and file-based configuration. Below are
some guidelines on how to add a new gflag. All new gflags must conform to these
guidelines; existing non-conformant ones will be made conformant in time.
.Name
The gflag's name conveys a lot of information, so care must be taken to pick a
good name. The name will propagate into other systems, such as the official Kudu
documentation.
- The different parts of a multi-word name should be separated by underscores.
For example, `fs_data_dirs`.
- The name should be prefixed with the context that it affects. For example,
`webserver_num_worker_threads` and `cfile_default_block_size`. Context can be
difficult to define, so it's best to bear in mind that this prefix will be
used to group similar gflags together. If the gflag affects the entire
process, it should not be prefixed.
- If the gflag is for a quantity, the name should be suffixed with the units.
For example, `remote_bootstrap_idle_timeout_ms`.
- Where possible, use short names. This will save time for those entering
command line options by hand.
- The name is part of Kudu's compatibility contract, and should not change
without very good reason.
.Default value
Choosing a default value is generally simple, but like the name, it propagates
into other systems.
- The default value is part of Kudu's compatibility contract, and should not
change without very good reason.
.Description
The gflag's description should supplement the name and provide additional
context and information. Like the name, the description propagates into other
systems.
- The description may include multiple sentences. Each should begin with a
capital letter, end with a period, and begin one space after the previous.
- The description should NOT include the gflag's type or default value; they are
provided out-of-band.
- The description should be in the third person. Do not use words like `you`.
- A gflag description can be changed freely; it is not expected to remain the
same across Kudu releases.
.Tags
Kudu's gflag tagging mechanism adds machine-readable context to each gflag, for
use in consuming systems such as documentation or CM. See the large block
comment in flag_tags.h for guidelines.
.Misc
- Avoid creating multiple gflags for the same logical parameter. For
example, many Kudu binaries need to configure a WAL directory. Rather than
creating `foo_wal_dir` and `bar_wal_dir` gflags, better to have a single
`kudu_wal_dir` gflag for use universally.
== Testing
All new code should have tests.::
Add new tests either in existing files, or create new test files as necessary.
All bug fixes should have tests.::
It's OK to fix a bug without adding a
new test if it's triggered by an existing test case. For example, if a
race shows up when running a multi-threaded system test after 20
minutes or so, it's worth trying to make a more targeted test case to
trigger the bug. But if that's hard to do, the existing system test
should be enough.
Tests should run quickly (< 1s).::
If you want to write a time-intensive
test, make the runtime dependent on `KuduTest#AllowSlowTests`, which is
enabled via the `KUDU_ALLOW_SLOW_TESTS` environment variable and is
used by Jenkins test execution.
Tests which run a number of iterations of some task should use a `gflags` command-line argument for the number of iterations.::
This is handy for writing quick stress tests or performance tests.
Commits which may affect performance should include before/after `perf-stat(1)` output.::
This will show performance improvement or non-regression.
Performance-sensitive code should include some test case which can be used as a
targeted benchmark.
== Documentation
See link:documentation.html[Documentation Style Guide].