docs/contributing.adoc - kudu - Git at Google

 // Copyright 2015 Cloudera, Inc.
 //
 // Licensed under the Apache License, Version 2.0 (the "License");
 // you may not use this file except in compliance with the License.
 // You may obtain a copy of the License at
 //
 //     http://www.apache.org/licenses/LICENSE-2.0
 //
 // Unless required by applicable law or agreed to in writing, software
 // distributed under the License is distributed on an "AS IS" BASIS,
 // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 // See the License for the specific language governing permissions and
 // limitations under the License.

 [[contributing]]
 = Contributing to Kudu
 :author: Kudu Team
 :imagesdir: ./images
 :icons: font
 :toc:
 :toclevels: 3
 :doctype: book
 :backend: html5
 :sectlinks:
 :experimental:

 == Code Style

 In general, Kudu follows the
 link:http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml[Google {cpp} Style Guide],
 with the following exceptions:

 === Limitations on `boost` Library Use

 `boost` libraries can be used in cases where we don't have a suitable
 replacement in our own code-base. However, try to avoid introducing
 dependencies on new `boost` libraries, and use our own code in preference
 to `boost` where available. For example, do not use boost`'s scoped pointer
 implementations.

 .Approved `boost` Libraries

 - `BOOST_FOREACH`
 - `boost::assign` (container literals)
 - `boost::mutex` and `boost::shared_mutex` (but prefer our own
   spin lock implementation for short critical sections)

 Check that any features from `boost` you use are present in *`boost` 1.46*
 or earlier, for compatibility with RHEL 6.

 .`boost` Libraries and the Kudu {cpp} Client
 Do not use `boost` in any public headers for the Kudu {cpp} client, because
 `boost` commonly breaks backward compatibility, and passing data between two `boost`
 versions (one by the user, one by us) causes serious issues.

 In addition, do not create dependencies from the Kudu {cpp} client to any `boost`
 libraries. `libboost_system` is particularly troublesome, as any `boost` code
 that throws exceptions will grow a dependency on it. Among other things, you
 cannot use `boost::{lock_guard,unique_lock,shared_lock}` in any code consumed
 by the {cpp} client (such as _common/_ and _util/_).

 === Line length

 We allow 100 charcters per line rather than Google's standard of 80. Try to
 keep under 80 where possible, but you can spill over to ~100 or so if necessary.

 === Pointers

 .Smart Pointers and Singly-Owned Pointers

 Generally, most objects should have clear "single-owner" semantics.
 Most of the time, singly-owned objects can be wrapped in a `gscoped_ptr<>`
 which ensures deletion on scope exit and prevents accidental copying.
 `gscoped_ptr` is similar to {cpp}11's `unique_ptr` in that it has a `release`
 method and also provides emulated `move` semantics (see _gscoped_ptr.h_ for
 example usage).

 If an object is singly owned, but referenced from multiple places, such as when the pointed-to object is known to be valid at least as long
 as the pointer itself, associate a comment with the constructor which takes and stores
 the raw pointer, as in the following example.

 [source,c++]
 ----
   // 'blah' must remain valid for the lifetime of this class
   MyClass(const Blah* blah) :
     blah_(blah) {
   }
 ----

 If you use raw pointers within STL collections or inside of vectors and other containers,
 associate a comment with the container, which explains the ownership
 semantics (owned or un-owned). Use utility code from _gutil/stl_util.h_, such as
 `STLDeleteElements` or `ElementDeleter` to easily handle deletion of the
 contained elements.

 WARNING: Using `std::auto_ptr` is strictly disallowed because of its difficult and
 bug-prone semantics.

 .Smart Pointers for Multiply-Owned Pointers:

 Although single ownership is ideal, sometimes it is not possible, particularly
 when multiple threads are in play and the lifetimes of the pointers are not
 clearly defined. In these cases, you can use either `std::tr1::shared_ptr` or
 our own `scoped_refptr` from _gutil/ref_counted.hpp_. Each of these mechanisms
 relies on reference counting to automatically delete the referrent once no more
 pointers remain. The key difference between these two types of pointers is that
 `scoped_refptr` requires that the object extend a `RefCounted` base class, and
 stores its reference count inside the object storage itself, while`shared_ptr`
 maintains a separate reference count on the heap.

 The pros and cons are:

 .`shared_ptr`
 [none]
 * icon:plus-circle[role="green",alt="pro"] can be used with any type of object, without the
 object deriving from a special base class
 * icon:plus-circle[role="green",alt="pro"] part of the standard library and familiar to most
 {cpp} developers
 * icon:minus-circle[role="red",alt="con"] creating a new object requires two allocations instead
 of one (one to create the ref count, and one to create the object)
 * icon:minus-circle[role="red",alt="con"] the ref count may not be near the object on the heap,
 so extra cache misses may be incurred on access
 * icon:minus-circle[role="red",alt="con"] the `shared_ptr` instance itself requires 16 bytes
 (pointer to the ref count and pointer to the object)
 * icon:minus-circle[role="red",alt="con"] if you convert from the `shared_ptr` to a raw pointer,
 you can't get back the `shared_ptr`


 .`scoped_refptr`
 [none]
 * icon:plus-circle[pro, role="green"] only requires a single allocation, and ref count
 is on the same cache line as the object
 * icon:plus-circle[pro, role="green"] the pointer only requires 8 bytes (since
 the ref count is within the object)
 * icon:plus-circle[pro, role="green"] you can manually increase or decrease
 reference counts when more control is required
 * icon:plus-circle[pro, role="green"] you can convert from a raw pointer back
 to a `scoped_refptr` safely without worrying about double freeing
 * icon:plus-circle[pro, role="green"] since we control the implementation, we
 can implement features, such as debug builds that capture the stack trace of every
 referrent to help debug leaks.
 * icon:minus-circle[con, role="red"] the referred-to object must inherit
 from `RefCounted`
 * icon:minus-circle[con, role="red"] does not support `weak_ptr<>` use cases

 Since `scoped_refptr` is generally faster and smaller, try to use it
 rather than `shared_ptr` in new code. Existing code (as of 12/20/2013) uses `shared_ptr`
 in many places. When interfacing with that code, continued use of `shared_ptr` is fine.

 === Function Binding and Callbacks

 Existing code (as of 01/07/2014) uses `boost::bind` and `boost::function` for
 function binding and callbacks. For new code, use the `Callback` and `Bind`
 classes in `gutil` instead. While less full-featured (`Bind` doesn't support argument
 place holders, wrapped function pointers, or function objects), they provide
 more options by the way of argument lifecycle management. For example, a
 bound argument whose class extends `RefCounted` will be incremented during `Bind`
 and decremented when the `Callback` goes out of scope.

 See the large file comment in _gutil/callback.h_ for more details, and
 _util/callback_bind-test.cc_ for examples.

 === `CMake` Style Guide

 `CMake` allows commands in lower, upper, or mixed case. Therefore to keep
 the CMake files consistent, please use the following guidelines:

 - *built-in commands* in lowercase
 ----
 add_subdirectory(some/path)
 ----
 - *built-in arguments* in uppercase
 ----
 message(STATUS "message goes here")
 ----
 - *custom commands or macros* in uppercase
 ----
 ADD_KUDU_TEST(some-test)
 ----

 === GFlags

 Kudu uses gflags for both command-line and file-based configuration. Below are
 some guidelines on how to add a new gflag. All new gflags must conform to these
 guidelines; existing non-conformant ones will be made conformant in time.

 .Name

 The gflag's name conveys a lot of information, so care must be taken to pick a
 good name. The name will propagate into other systems, such as the official Kudu
 documentation.
 - The different parts of a multi-word name should be separated by underscores.
   For example, `fs_data_dirs`.
 - The name should be prefixed with the context that it affects. For example,
   `webserver_num_worker_threads` and `cfile_default_block_size`. Context can be
   difficult to define, so it's best to bear in mind that this prefix will be
   used to group similar gflags together. If the gflag affects the entire
   process, it should not be prefixed.
 - If the gflag is for a quantity, the name should be suffixed with the units.
   For example, `remote_bootstrap_idle_timeout_ms`.
 - Where possible, use short names. This will save time for those entering
   command line options by hand.
 - The name is part of Kudu's compatibility contract, and should not change
   without very good reason.

 .Default value

 Choosing a default value is generally simple, but like the name, it propagates
 into other systems.
 - The default value is part of Kudu's compatibility contract, and should not
   change without very good reason.

 .Description

 The gflag's description should supplement the name and provide additional
 context and information. Like the name, the description propagates into other
 systems.
 - The description may include multiple sentences. Each should begin with a
   capital letter, end with a period, and begin one space after the previous.
 - The description should NOT include the gflag's type or default value; they are
   provided out-of-band.
 - The description should be in the third person. Do not use words like `you`.
 - A gflag description can be changed freely; it is not expected to remain the
   same across Kudu releases.

 .Tags

 Kudu's gflag tagging mechanism adds machine-readable context to each gflag, for
 use in consuming systems such as documentation or CM. See the large block
 comment in flag_tags.h for guidelines.

 .Misc

 - Avoid creating multiple gflags for the same logical parameter. For
   example, many Kudu binaries need to configure a WAL directory. Rather than
   creating `foo_wal_dir` and `bar_wal_dir` gflags, better to have a single
   `kudu_wal_dir` gflag for use universally.

 == Testing

 All new code should have tests.::
   Add new tests either in existing files, or create new test files as necessary.

 All bug fixes should have tests.::
   It's OK to fix a bug without adding a
   new test if it's triggered by an existing test case. For example, if a
   race shows up when running a multi-threaded system test after 20
   minutes or so, it's worth trying to make a more targeted test case to
   trigger the bug. But if that's hard to do, the existing system test
   should be enough.

 Tests should run quickly (< 1s).::
   If you want to write a time-intensive
   test, make the runtime dependent on `KuduTest#AllowSlowTests`, which is
   enabled via the `KUDU_ALLOW_SLOW_TESTS` environment variable and is
   used by Jenkins test execution.

 Tests which run a number of iterations of some task should use a `gflags` command-line argument for the number of iterations.::
  This is handy for writing quick stress tests or performance tests.

 Commits which may affect performance should include before/after `perf-stat(1)` output.::
   This will show performance improvement or non-regression.
   Performance-sensitive code should include some test case which can be used as a
   targeted benchmark.


 == Documentation
 See link:documentation.html[Documentation Style Guide].
	// Copyright 2015 Cloudera, Inc.
	//
	// Licensed under the Apache License, Version 2.0 (the "License");
	// you may not use this file except in compliance with the License.
	// You may obtain a copy of the License at
	//
	// http://www.apache.org/licenses/LICENSE-2.0
	//
	// Unless required by applicable law or agreed to in writing, software
	// distributed under the License is distributed on an "AS IS" BASIS,
	// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
	// See the License for the specific language governing permissions and
	// limitations under the License.

	[[contributing]]
	= Contributing to Kudu
	:author: Kudu Team
	:imagesdir: ./images
	:icons: font
	:toc:
	:toclevels: 3
	:doctype: book
	:backend: html5
	:sectlinks:
	:experimental:

	== Code Style

	In general, Kudu follows the
	link:http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml[Google {cpp} Style Guide],
	with the following exceptions:

	=== Limitations on `boost` Library Use

	`boost` libraries can be used in cases where we don't have a suitable
	replacement in our own code-base. However, try to avoid introducing
	dependencies on new `boost` libraries, and use our own code in preference
	to `boost` where available. For example, do not use boost`'s scoped pointer
	implementations.

	.Approved `boost` Libraries

	- `BOOST_FOREACH`
	- `boost::assign` (container literals)
	- `boost::mutex` and `boost::shared_mutex` (but prefer our own
	spin lock implementation for short critical sections)

	Check that any features from `boost` you use are present in `boost` 1.46
	or earlier, for compatibility with RHEL 6.

	.`boost` Libraries and the Kudu {cpp} Client
	Do not use `boost` in any public headers for the Kudu {cpp} client, because
	`boost` commonly breaks backward compatibility, and passing data between two `boost`
	versions (one by the user, one by us) causes serious issues.

	In addition, do not create dependencies from the Kudu {cpp} client to any `boost`
	libraries. `libboost_system` is particularly troublesome, as any `boost` code
	that throws exceptions will grow a dependency on it. Among other things, you
	cannot use `boost::{lock_guard,unique_lock,shared_lock}` in any code consumed
	by the {cpp} client (such as _common/_ and _util/_).

	=== Line length

	We allow 100 charcters per line rather than Google's standard of 80. Try to
	keep under 80 where possible, but you can spill over to ~100 or so if necessary.

	=== Pointers

	.Smart Pointers and Singly-Owned Pointers

	Generally, most objects should have clear "single-owner" semantics.
	Most of the time, singly-owned objects can be wrapped in a `gscoped_ptr<>`
	which ensures deletion on scope exit and prevents accidental copying.
	`gscoped_ptr` is similar to {cpp}11's `unique_ptr` in that it has a `release`
	method and also provides emulated `move` semantics (see _gscoped_ptr.h_ for
	example usage).

	If an object is singly owned, but referenced from multiple places, such as when the pointed-to object is known to be valid at least as long
	as the pointer itself, associate a comment with the constructor which takes and stores
	the raw pointer, as in the following example.

	[source,c++]
	----
	// 'blah' must remain valid for the lifetime of this class
	MyClass(const Blah* blah) :
	blah_(blah) {
	}
	----

	If you use raw pointers within STL collections or inside of vectors and other containers,
	associate a comment with the container, which explains the ownership
	semantics (owned or un-owned). Use utility code from _gutil/stl_util.h_, such as
	`STLDeleteElements` or `ElementDeleter` to easily handle deletion of the
	contained elements.

	WARNING: Using `std::auto_ptr` is strictly disallowed because of its difficult and
	bug-prone semantics.

	.Smart Pointers for Multiply-Owned Pointers:

	Although single ownership is ideal, sometimes it is not possible, particularly
	when multiple threads are in play and the lifetimes of the pointers are not
	clearly defined. In these cases, you can use either `std::tr1::shared_ptr` or
	our own `scoped_refptr` from _gutil/ref_counted.hpp_. Each of these mechanisms
	relies on reference counting to automatically delete the referrent once no more
	pointers remain. The key difference between these two types of pointers is that
	`scoped_refptr` requires that the object extend a `RefCounted` base class, and
	stores its reference count inside the object storage itself, while`shared_ptr`
	maintains a separate reference count on the heap.

	The pros and cons are:

	.`shared_ptr`
	[none]
	* icon:plus-circle[role="green",alt="pro"] can be used with any type of object, without the
	object deriving from a special base class
	* icon:plus-circle[role="green",alt="pro"] part of the standard library and familiar to most
	{cpp} developers
	* icon:minus-circle[role="red",alt="con"] creating a new object requires two allocations instead
	of one (one to create the ref count, and one to create the object)
	* icon:minus-circle[role="red",alt="con"] the ref count may not be near the object on the heap,
	so extra cache misses may be incurred on access
	* icon:minus-circle[role="red",alt="con"] the `shared_ptr` instance itself requires 16 bytes
	(pointer to the ref count and pointer to the object)
	* icon:minus-circle[role="red",alt="con"] if you convert from the `shared_ptr` to a raw pointer,
	you can't get back the `shared_ptr`


	.`scoped_refptr`
	[none]
	* icon:plus-circle[pro, role="green"] only requires a single allocation, and ref count
	is on the same cache line as the object
	* icon:plus-circle[pro, role="green"] the pointer only requires 8 bytes (since
	the ref count is within the object)
	* icon:plus-circle[pro, role="green"] you can manually increase or decrease
	reference counts when more control is required
	* icon:plus-circle[pro, role="green"] you can convert from a raw pointer back
	to a `scoped_refptr` safely without worrying about double freeing
	* icon:plus-circle[pro, role="green"] since we control the implementation, we
	can implement features, such as debug builds that capture the stack trace of every
	referrent to help debug leaks.
	* icon:minus-circle[con, role="red"] the referred-to object must inherit
	from `RefCounted`
	* icon:minus-circle[con, role="red"] does not support `weak_ptr<>` use cases

	Since `scoped_refptr` is generally faster and smaller, try to use it
	rather than `shared_ptr` in new code. Existing code (as of 12/20/2013) uses `shared_ptr`
	in many places. When interfacing with that code, continued use of `shared_ptr` is fine.

	=== Function Binding and Callbacks

	Existing code (as of 01/07/2014) uses `boost::bind` and `boost::function` for
	function binding and callbacks. For new code, use the `Callback` and `Bind`
	classes in `gutil` instead. While less full-featured (`Bind` doesn't support argument
	place holders, wrapped function pointers, or function objects), they provide
	more options by the way of argument lifecycle management. For example, a
	bound argument whose class extends `RefCounted` will be incremented during `Bind`
	and decremented when the `Callback` goes out of scope.

	See the large file comment in _gutil/callback.h_ for more details, and
	_util/callback_bind-test.cc_ for examples.

	=== `CMake` Style Guide

	`CMake` allows commands in lower, upper, or mixed case. Therefore to keep
	the CMake files consistent, please use the following guidelines:

	- built-in commands in lowercase
	----
	add_subdirectory(some/path)
	----
	- built-in arguments in uppercase
	----
	message(STATUS "message goes here")
	----
	- custom commands or macros in uppercase
	----
	ADD_KUDU_TEST(some-test)
	----

	=== GFlags

	Kudu uses gflags for both command-line and file-based configuration. Below are
	some guidelines on how to add a new gflag. All new gflags must conform to these
	guidelines; existing non-conformant ones will be made conformant in time.

	.Name

	The gflag's name conveys a lot of information, so care must be taken to pick a
	good name. The name will propagate into other systems, such as the official Kudu
	documentation.
	- The different parts of a multi-word name should be separated by underscores.
	For example, `fs_data_dirs`.
	- The name should be prefixed with the context that it affects. For example,
	`webserver_num_worker_threads` and `cfile_default_block_size`. Context can be
	difficult to define, so it's best to bear in mind that this prefix will be
	used to group similar gflags together. If the gflag affects the entire
	process, it should not be prefixed.
	- If the gflag is for a quantity, the name should be suffixed with the units.
	For example, `remote_bootstrap_idle_timeout_ms`.
	- Where possible, use short names. This will save time for those entering
	command line options by hand.
	- The name is part of Kudu's compatibility contract, and should not change
	without very good reason.

	.Default value

	Choosing a default value is generally simple, but like the name, it propagates
	into other systems.
	- The default value is part of Kudu's compatibility contract, and should not
	change without very good reason.

	.Description

	The gflag's description should supplement the name and provide additional
	context and information. Like the name, the description propagates into other
	systems.
	- The description may include multiple sentences. Each should begin with a
	capital letter, end with a period, and begin one space after the previous.
	- The description should NOT include the gflag's type or default value; they are
	provided out-of-band.
	- The description should be in the third person. Do not use words like `you`.
	- A gflag description can be changed freely; it is not expected to remain the
	same across Kudu releases.

	.Tags

	Kudu's gflag tagging mechanism adds machine-readable context to each gflag, for
	use in consuming systems such as documentation or CM. See the large block
	comment in flag_tags.h for guidelines.

	.Misc

	- Avoid creating multiple gflags for the same logical parameter. For
	example, many Kudu binaries need to configure a WAL directory. Rather than
	creating `foo_wal_dir` and `bar_wal_dir` gflags, better to have a single
	`kudu_wal_dir` gflag for use universally.

	== Testing

	All new code should have tests.::
	Add new tests either in existing files, or create new test files as necessary.

	All bug fixes should have tests.::
	It's OK to fix a bug without adding a
	new test if it's triggered by an existing test case. For example, if a
	race shows up when running a multi-threaded system test after 20
	minutes or so, it's worth trying to make a more targeted test case to
	trigger the bug. But if that's hard to do, the existing system test
	should be enough.

	Tests should run quickly (< 1s).::
	If you want to write a time-intensive
	test, make the runtime dependent on `KuduTest#AllowSlowTests`, which is
	enabled via the `KUDU_ALLOW_SLOW_TESTS` environment variable and is
	used by Jenkins test execution.

	Tests which run a number of iterations of some task should use a `gflags` command-line argument for the number of iterations.::
	This is handy for writing quick stress tests or performance tests.

	Commits which may affect performance should include before/after `perf-stat(1)` output.::
	This will show performance improvement or non-regression.
	Performance-sensitive code should include some test case which can be used as a
	targeted benchmark.


	== Documentation
	See link:documentation.html[Documentation Style Guide].