blob: 6abcd0fbc64a65c6d86d4cc9c77bb9a954cf9f8b [file] [log] [blame]
////
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
////
= For Committers
image::business-gremlin.png[width=400]
The guidelines that follow generally apply to those with commit access to the main repository, but those seeking to
contribute will also find helpful information here on the development style and process for the project.
[[initial-setup]]
== Initial Setup
Once the Apache TinkerPop PMC has sent an invitation to a contributor to become a new committer and that contributor
has accepted that invitation and provided their iCLA to Apache, then there are some administrative steps that the
new committer can expect to go through to complete the process so that they have access to Apache resources like the
Git repository. While the information for completing the process can be found in a multitude of places the following
listing provides a summary of what to do next:
* Look for a welcome email from root_at_apache.org. This will contain your Apache user name
** e.g. "Welcome to the Apache Software Foundation (ASF)!"
* Visit https://id.apache.org/reset/enter and enter your user name
* Look for a password reset email from root_at_apache.org. This will have been sent after you confirmed your user name, above.
** e.g. "Password reset request for [username] from Apache ID"
* Visit the link provided in the password reset email, and choose a new password. ASF asks you to choose a strong one. You will see a "Password change successful" page when this is done.
* You will now have SVN access
** link:https://svn.apache.org/[svn.apache.org]
* Try sending yourself an email at your new [username]@apache.org email address. It should forward to your primary address.
* Check your account details at https://id.apache.org/details/[username]
* Link your accounts using gitbox (https://gitbox.apache.org/setup/)
** Link your Apache account
** Link your GitHub account
*** You will be asked to "Authorize Apache M.A.T.T."
** Obtain MFA status
*** Visit link:https://id.apache.org[id.apache.org] and set your GitHub ID to be invited to the org
*** Wait for an email to arrive from "support_at_github.com", and click "Join @apache". Accept the invitation on github.com. Your new MFA status will not immediately be reflected on github, but you will get a confirmation email from noreply_at_github.com. Later, the status will read "MFA ENABLED"
**** e.g. "[GitHub] @asf-gitbox has invited you to join the @apache organization"
*** You can find yourself by searching in the link:https://github.com/orgs/apache/teams/apache-committers[Apache Committers listing]
* Read the link:https://www.apache.org/dev/new-committers-guide.html[Apache Committer Guide] and link:http://www.apache.org/dev/committers.html[Apache Committer FAQ]
* Read through the other sections of this document - the Developer Documentation - for more details on project procedures and other administrative items.
** In particular, see <<rtc,Review then Commit>>
* If you have trouble committing, email dev@tinkerpop.apache.org
== Communication
TinkerPop has a link:http://groups.google.com/group/gremlin-users[user mailing list] and a
pass:[<a href="https://lists.apache.org/list.html?dev@tinkerpop.apache.org">dev mailing list</a>]. As a committer,
it is a good idea to join both.
TinkerPop also has a link:https://the-asf.slack.com/archives/CUBJ577EW[Slack channel] for more real-time communication
about the project among contributors, though it must be kept in mind that project discussion and decisions must occur
on the dev mailing list mentioned above.
Occasionally, online meetings via video conference are held. These meetings are schedule via the dev mailing list
about a week before they are to occur to find a day and time that is available for those interested in attending.
On the day of the meeting, the meeting organizer will create a Google Hangout (or similar video conferencing link).
At that point, all who are interested can attend. Meeting minutes should be
taken and added to the <<meetings,Meetings>> section of this document using the pattern already established.
== Release Notes
There is a two-pronged approach to maintaining the change log and preparing the release notes.
1. For work that is documented in JIRA, run the release notes report to include all of
the tickets targeted for a specific release. This report can be included in the
release announcement.
2. The manual change log (`CHANGELOG.asciidoc`) can be used to highlight large
changes, describe themes (e.g. "We focused on performance improvements") or to
give voice to undocumented changes.
Given the dependence on the JIRA report for generating additions to the `CHANGELOG.asciidoc`,
which uses the title of the issue as the line presented in the release note report, titles should
be edited prior to release to be useful in that context. In other words, an issue title should
be understandable as a change in the fewest words possible while still conveying the gist of the
change.
Changes that break the public APIs should be marked with a "breaking" label and should be
distinguished from other changes in the release notes.
[[branches]]
== Branches
TinkerPop has several release branches:
* `3.0-dev` - 3.0.x (no longer maintained)
* `3.1-dev` - 3.1.x (no longer maintained)
* `3.2-dev` - 3.2.x (no longer maintained)
* `3.3-dev` - 3.3.x (no longer maintained)
* `3.4-dev` - 3.4.x (non-breaking bug fixes and enhancements)
* `3.5-dev` - 3.5.x (non-breaking bug fixes and enhancements)
* `master` - 3.6.x (current development)
* `4.0-dev` - 4.0.x (future development)
The branch description above that reads "non-breaking bug fixes and enhancements" simply means that within that release
line (i.e. patch version) changes should not alter existing behavior, introduce new APIs, change serialization formats,
modify protocols, etc. In this way, users and providers have an easy way to know that within a minor release line, they
can be assured that their upgrades will not introduce potential problems. A good rule of thumb is to consider whether a
client of one version within a release line can interact properly with a server version within that same line. If so,
it is likely an acceptable change within that branch.
Changes to earlier branches should merge forward toward `master` (e.g. `3.5-dev` should merge to `master`). Please read
more about this process in the <<pull-requests, Pull Requests>> section. Note that `4.0-dev` is rebased on `master`
and currently behaves as a fresh repository as all 3.x content was removed.
Other branches may be created for collaborating on features or for RFC's that other developers may want to inspect.
It is suggested that the JIRA issue ID be used as the prefix, since that triggers certain automation, and it provides a
way to account for the branch lifecycle, i.e. "Who's branch is this, and can I delete it?"
For branches that are NOT associated with JIRA issues, developers should utilize their Apache ID as
a branch name prefix. This provides a unique namespace, and also a way to account for the branch lifecycle.
Developers should remove their own branches when they are no longer needed.
== Tags
Tags are used for milestones, release candidates, and approved releases. Please refrain from creating arbitrary
tags, as they produce permanent clutter.
== Issue Tracker Conventions
TinkerPop uses Apache JIRA as its link:https://issues.apache.org/jira/browse/TINKERPOP[issue tracker]. JIRA is a
very robust piece of software with many options and configurations. To simplify usage and ensure consistency across
issues, the following conventions should be adhered to:
* An issue's "status" should generally be in one of two states: `open` or `closed` (`reopened` is equivalent to `open`
for our purposes).
** An `open` issue is newly created, under consideration or otherwise in progress.
** A `closed` issue is completed for purposes of release (i.e. code, testing, and documentation complete).
** Issues in a `resolved` state should immediately be evaluated for movement to `closed` - issue become `resolved`
by those who don't have the permissions to `close`.
* An issue's "type" should be one of two options: `bug` or `improvement`.
** A `bug` has a very specific meaning, referring to an error that prevents usage of TinkerPop AND does not have a
reasonable workaround. Given that definition, a `bug` should generally have very high priority for a fix.
** Everything else is an `improvement` in the sense that any other work is an enhancement to the current codebase.
* The "component" should be representative of the primary area of code that it applies to and all issues should have
this property set.
* Issues are not assigned "labels" with two exceptions:
** The "breaking" label which marks an issue as one that is representative of a change in the API that might
affect users or providers. This label is important when organizing release notes.
** The "deprecation" label which is assigned to an issue that includes changes to deprecate a portion of the API.
* The "affects/fix version(s)" fields should be appropriately set, where the "fix version" implies the version on
which that particular issue will completed. This is a field usually only set by committers.
* The "priority" field can be arbitrarily applied with one exception. The "trivial" option should be reserved for
tasks that are "easy" for a potential new contributor to jump into and do not have significant impact to urgently
required improvements.
* The "resolution" field which is set on the close of the issue should specify the status most closely related to why
the issue was closed. In most cases, this will mean "Fixed" for a "Bug" or "Done" for an "Improvement". Only one
resolution has special meaning and care should be taken with this particular option: "Later". "Later" means that the
item is a good idea but likely will not be implemented in any foreseeable future. By closing uncompleted issues with
this resolution, it should be easy to come back to them later when needed.
== Code Style
Contributors should examine the current code base to determine what the code style patterns are and should match their
style to what is already present. Of specific note however, TinkerPop does not use "import wildcards" - IDEs should
be adjusted accordingly to not auto-wildcard the imports.
== Build Server
TinkerPop uses link:https://travis-ci.com/[Travis] for link:https://en.wikipedia.org/wiki/Continuous_integration[CI]
services. The build status can be found link:https://travis-ci.org/apache/tinkerpop[here]. Note that the CI process
does not run all possible tests (e.g. Neo4j-related tests) as a full execution would likely exceed the allowable times
for builds on these servers. Instead Travis runs a basic cross-section of tests selected to provide a reasonably high
degree of confidence in the branch built.
== Deprecation
When possible, committers should avoid direct "breaking" change (e.g. removing a method from a class) and favor
deprecation. Deprecation should come with sufficient documentation and notice especially when the change involves
public APIs that might be utilized by users or implemented by providers:
* Mark the code with the `@Deprecated` annotation.
* Use javadoc to further document the change with the following content:
** `@deprecated As of release x.y.z, replaced by {@link SomeOtherClass#someNewMethod()}` - if the method is not
replaced then the comment can simply read "not replaced". Additional comments that provide more context are
encouraged.
** `@see <a href="https://issues.apache.org/jira/browse/TINKERPOP-XXX">TINKERPOP-XXX</a>` - supply a link to the
JIRA issue for reference - the issue should include the "deprecation" label.
* Be sure that deprecated methods are still under test - consider using javadoc/comments in the tests themselves to
call out this fact.
* Create a new JIRA issue to track removal of the deprecation for future evaluation.
* Update the "upgrade documentation" to reflect the API change and how the reader should resolve it.
The JIRA issues that track removal of deprecated methods should be periodically evaluated to determine if it is
prudent to schedule them into a release.
== Developing Tests
TinkerPop has a wide variety of test types that help validate its internal code as well as external provider code.
There are "unit tests" and "integration tests". Unit tests execute on standard runs of `mvn clean install`. These
tests tend to run quickly and provide a reasonable level of coverage and confidence in the code base. Integration
tests are disabled by default and must be explicitly turned on with a special build property by adding
`-DskipIntegrationTests=false` to the `mvn` execution. Integration tests run slower and may require external
components to be running when they are executed. They are "marked" as separate from unit tests by inclusion of the
suffix "IntegrateTest".
Here are some other points to consider when developing tests:
* Avoid use of `println` in tests and prefer use of a SLF4j `Logger` instance so that outputs can be controlled in a
standard way.
* If it is necessary to create files on the filesystem, do not hardcode directories - instead, use the `TestHelper` to
create directory structures. `TestHelper` will properly create file system structure in the appropriate build
directory thus allowing proper clean-up between test runs.
* If writing tests in one of the test suites, like `gremlin-test`, it is important to remember that if a new `Graph`
instance is constructed within the test manually, that it be closed on exit of that test. Failing to do this cleanup
can cause problems for some graph providers.
* Tests that are designed to use a `GraphProvider` implementation in conjunction with `AbstractGremlinTest` _and_ are
in the `/test` directory should not be named with `Test` as the suffix, as this will cause them to execute in some
environments without a `GraphProvider` being initialized by a suite. These types of tests should be suffixed with
`Check` instead. Please see link:https://github.com/apache/tinkerpop/blob/e32a4187e4f25e290aabe14007f9087c48a06521/neo4j-gremlin/src/test/java/org/apache/tinkerpop/gremlin/neo4j/structure/NativeNeo4jStructureCheck.java[NativeNeo4jStructureCheck]
for an example.
=== Gremlin Language Test Cases
Test cases for the Gremlin Language currently requires that the newly developed test be added in three places:
1. As a test written in Java in the `gremlin-test` module within the subpackages of
`org.apache.tinkerpop.gremlin.process.traversal.step`
2. As a test written in Gherkin in the `gremlin-test` module in the `/features` subdirectory
When writing a Java test case for a Gremlin step, be sure to use the following conventions.
* The name of the traversal generator should start with `get`, use `X` for brackets, `_` for space, and the Gremlin-Groovy sugar syntax.
** `get_g_V_hasLabelXpersonX_groupXaX_byXageX_byXsumX_name()`
* When creating a test for a step that has both a barrier and sideEffect form (e.g. `group()`, `groupCount()`, etc.), test both representations.
** `get_g_V_groupCount_byXnameX()`
** `get_g_V_groupCountXaX_byXnameX_capXaX()`
* The name of the actual test case should be the name of the traversal generator minus the `get_` prefix.
* The Gremlin-Groovy version of the test should use the sugar syntax in order to test sugar (as Gremlin-Java tests test standard syntax).
** `g.V.age.sum`
* Avoid using lambdas in the test case unless that is explicitly what is being tested as OLAP systems will typically not be able to execute those tests.
* `AbstractGremlinProcessTest` has various static methods to make writing a test case easy.
** `checkResults(Arrays.asList("marko","josh"), traversal)`
** `checkMap(new HashMap<String,Long>() {{ put("marko",1l); }}, traversal.next())`
Gherkin tests follow some important conventions and have a sub-language that must be adhered to for the tests to
function properly. Note that Gherkin tests are designed to support the testing of GLVs and at some point will likely
replace the Java tests. If a new Java test is added and an associated Gherkin tests is not, the overall build will
fail the `FeatureCoverageTest` of `gremlin-test` which validates that all tests written in Java are also implemented
in Gherkin.
The basic syntax of a Gherkin test is as follows:
[source,gherkin]
----
Scenario: g_VX1X_unionXrepeatXoutX_timesX2X__outX_name
Given the modern graph
And using the parameter vId1 defined as "v[marko].id"
And the traversal of
"""
g.V(vId1).union(__.repeat(__.out()).times(2), __.out()).values("name")
"""
When iterated to list
Then the result should be unordered
| result |
| ripple |
| lop |
| lop |
| vadas |
| josh |
----
==== Scenario Name
The name of the scenario needs to match the name of the Java test. If it does not then the `FeatureCoverageTest` will
fail.
==== Given
"Given" sets the context of the test. Specifically, it establishes the graph that will be used for the test. It
conforms to the pattern of "Given the _xxx_ graph" where the "xxx" may be one of the following:
* empty
* modern
* classic
* crew
* sink
* grateful
Never modify the data of any of the graphs except for the "empty" graph. The "empty" graph is the only graph that is
guaranteed to be refreshed between tests. The "empty" graph maybe be modified by the traversal under test or by an
additional "Given" option:
[source,gherkin]
----
Given the empty graph
And the graph initializer of
"""
g.addV("person").property(T.id, 1).property("name", "marko").property("age", 29).as("marko").
addV("person").property(T.id, 2).property("name", "vadas").property("age", 27).as("vadas").
addV("software").property(T.id, 3).property("name", "lop").property("lang", "java").as("lop").
addV("person").property(T.id, 4).property("name","josh").property("age", 32).as("josh").
addV("software").property(T.id, 5).property("name", "ripple").property("lang", "java").as("ripple").
addV("person").property(T.id, 6).property("name", "peter").property("age", 35).as('peter').
addE("knows").from("marko").to("vadas").property(T.id, 7).property("weight", 0.5).
addE("knows").from("marko").to("josh").property(T.id, 8).property("weight", 1.0).
addE("created").from("marko").to("lop").property(T.id, 9).property("weight", 0.4).
addE("created").from("josh").to("ripple").property(T.id, 10).property("weight", 1.0).
addE("created").from("josh").to("lop").property(T.id, 11).property("weight", 0.4).
addE("created").from("peter").to("lop").property(T.id, 12).property("weight", 0.2)
"""
----
The above configuration will use the "empty" graph and initialize it with the specified traversal. In this case, that
traversal loads the "empty" graph with the "modern" graph.
Once the graph for the test is defined, the context can be expanded to include parameters that will be applied to the
traversal under test. Any variable value being used in the traversal under test, especially ones that require a
specific type, should be defined as parameters. The structure for parameter definition looks like this:
[source,gherkin]
----
Given the modern graph
And using the parameter vId1 defined as "v[marko].id"
----
In the above example, "vId1" is the name of the parameter that will be used in the traversal. The end of that line in
quotes is the value of that parameter and should use the type system notation that has been developed for the TinkerPop
Gherkin tests. The type system notation ensures that different language variants have the ability to construct the
appropriate types expected by the tests.
The syntax of the type notation involves a prefix character to help denote the type, a value between two square
brackets, optionally suffixed with some additional notation depending on the primary type.
* Edge - *e[_xxx_]* - The "xxx" should be replaced with a representation of an edge in the form of the
`vertex_name-edgelabel->vertex_name`. This syntax may also include the `.id` suffix which would indicate getting the
edge identifier or the `.sid` suffix which gets a string representation of the edge identifier.
* Lambda - *c[_xxx_]* - The "xxx" should contain a lambda written in Groovy.
* List - *l[_xxx_,_yyy_,_zzz_,...]* - A comma separated collection of values that make up the list should be added to
between the square brackets. These values respect the type system thus allowing for creation of lists of vertices,
edges, maps, and any other available type.
* Map - *m[_xxx_]* - The "xxx" should be replaced with a JSON string. Note that keys and values will be parsed using
the type notation system so that it is possible to have maps containing arbitrary keys and values.
* Numeric - *d[_xxx_]._y_* - The "xxx" should be replaced with a number. The suffix denoted by "y" should always be
included to further qualify the type of numeric. The following options are available:
** *d* - 32-bit Double
** *f* - 32-bit Float
** *i* - 32-bit Integer
** *l* - 64-bit Long
** *m* - Arbitrary-precision signed decimal numbers (i.e. BigDecimal in Java)
* Path - *p[_xxx_,_yyy_,_zzz_,...]* - A comma separated collection of values that make up the `Path` should be added to
between the square brackets. These values respect the type system thus allowing for creation of `Path` of vertices,
edges, maps, and any other available type.
* Set - *s[_xxx_,_yyy_,_zzz_,...]* - A comma separated collection of values that make up the set should be added to
between the square brackets. These values respect the type system thus allowing for creation of sets of vertices,
edges, maps, and any other available type.
* String - Any value not using the system notation will be interpreted as a string.
* T - *t[_xxx_]* - The "xxx" should be replaced with a value of the `T` enum, such as `id` or `label`.
* Vertex - *v[_xxx_]* - The "xxx" should be replaced with the "name" property of a vertex in the graph. This syntax may
include the `.id` suffix which would indicate getting the vertex identifier or the `.sid` suffix which gets a string
representation of the edge identifier.
In addition, parameter names should adhere to a common form as they hold some meaning to certain language variant
implementations:
* General variables of no particular type should use `xx1`, `xx2` and `xx3`.
* A `Vertex` variable should be prefixed with "v" and be followed by the `id`, therefore, `v1` would signify a `Vertex`
with the `id` of "1".
* An `Edge` variable follows the pattern of vertices but with a "e" prefix.
* The "id" of a `Vertex` or `Edge` is prefixed with "vid"`" or "eid" respectively followed by the `id`, thus, `vid1`
would be "1" and refer to the `Vertex` with that `id`.
* `Function` variables should use `l1` and `l2`.
* `Predicate` variables should use `pred1`.
* `Comparator` variables should use `c1` and `c2`.
Finally, specify the traversal under test with the "Given" option "and the traversal":
[source,gherkin]
----
And the traversal of
"""
g.V(vId1).union(__.repeat(__.out()).times(2), __.out()).values("name")
"""
----
It will be the results of this traversal that end up being asserted by Gherkin. When writing these test traversals,
be sure to always use the method and enum prefixes. For example, use `__.out()` for an anonymous traversal rather
than just `out()` and prefer `Scope.local` rather than just `local`.
If a particular test cannot be written in Gherkin for some reason or cannot be otherwise supported by a GLV, first,
consider whether or not this test can be re-written in Java so that it will work for GLVs and then, second, if it
cannot, then use the following syntax for unsupported tests:
[source,gherkin]
----
Scenario: g_V_outXcreatedX_groupCountXxX_capXxX
Given an unsupported test
Then nothing should happen because
"""
The result returned is not supported under GraphSON 2.x and therefore cannot be properly asserted. More
specifically it has vertex keys which basically get toString()'d under GraphSON 2.x. This test can be supported
with GraphSON 3.x.
"""
----
==== When
The "When" options get the result from the traversal in preparation for assertion. There are two options to iterate:
* "When iterated to list" - iterates the entire traversal into a list result that is asserted
* "When iterated next" - gets the first value from the traversal as the result to be asserted
There should be only one "When" defined in a scenario.
==== Then
The "Then" options handle the assertion of the result. There are several options to consider:
* "the result should have a count of _xxx_" - assumes a list value in the result and counts the number of values
in it
* "the result should be empty" - no results
* "the result should be ordered" - the exact results and should appear in the order presented
* "the result should be unordered" - the exact results but can appear any order
* "the result should be of" - results can be any of the specified values and in any order (use when guarantees
regarding the exact results cannot be pre-determined easily - see the `range()`-step tests for examples)
These final three types of assertions mentioned above should be followed by a Gherkin table that has one column, where
each row value in that column represents a value to assert in the result. These values are type notation respected as
shown in the following example:
[source,gherkin]
----
Then the result should be unordered
| result |
| ripple |
| lop |
| lop |
| vadas |
| josh |
----
Another method of assertion is to test mutations in the original graph. Again, mutations should only occur on the
"empty" graph, but they can be validated as follows:
[source,gherkin]
----
Scenario: g_V_outE_drop
Given the empty graph
And the graph initializer of
"""
g.addV().as("a").addV().as("b").addE("knows").to("a")
"""
And the traversal of
"""
g.V().outE().drop()
"""
When iterated to list
Then the result should be empty
And the graph should return 2 for count of "g.V()"
And the graph should return 0 for count of "g.E()"
----
== Developing Benchmarks
Benchmarks are a useful tool to track performance between TinkerPop versions and also as tools to aid development
decision making. TinkerPop uses link:http://openjdk.java.net/projects/code-tools/jmh/[OpenJDK JMH] for benchmark development.
The JMH framework provides tools for writing robust benchmarking code that avoid many of the pitfalls inherent in benchmarking
JIT compiled code on the JVM. Example JMH benchmarks can be found
link:http://hg.openjdk.java.net/code-tools/jmh/file/tip/jmh-samples/src/main/java/org/openjdk/jmh/samples/[here].
TinkerPop benchmarks live in the `gremlin-benchmark` module and can either be run from within your IDE or as a standalone
uber-jar. The uber-jar is the JMH recommended approach and also makes it easy to distribute artifacts to various environments
to gather benchmarking numbers. Having said that, in most cases it should be sufficient to run it from within the IDE.
Benchmarks will not run by default because they are time consuming. To enable benchmarks during the test phase do
`-DskipBenchmarks=false`. To change the number of warmup iterations, measurement iterations, and forks you can do
`mvn clean test -DskipBenchmarks=false -DdefaultForks=5 -DmeasureIterations=20 -DwarmupIterations=20`. Benchmark results
will be output by default to the `benchmarks` directory in JSON format.
Benchmarks may also be run from the command line using the JMH runner. Build the uber-jar and simply run
`java -jar gremlin-benchmark-TP-VERSION.jar`. To see a list of JMH runner options, add the `-h` flag.
The JUnit/JMH integration was inspired by the Netty projects microbenchmarking suite. Please refer to the Netty
link:http://netty.io/wiki/microbenchmarks.html[docs] for more details. Presently there are 3 abstract benchmark classes
that may be used as building blocks for your benchmarks; `AbstractBenchmarkBase`, `AbstractGraphBenchmark`, and
`AbstractGraphMutateBenchmark`.
* `AbstractBenchmarkBase` - extend when your benchmark does not require a graph instance
* `AbstractGraphBenchmark` - extend when you are benchmarking read operations against a graph
* `AbstractGraphMutateBenchmark` - extend when you are benchmarking graph mutation operations eg. `g.addV()`, `graph.addVertex()`
[[rtc]]
== Review then Commit
Code modifications must go through a link:http://www.apache.org/foundation/glossary.html#ReviewThenCommit[review-then-commit] (RTC)
process before being merged into a release branch. All committers should follow the pattern below, where "you" refers
to the committer wanting to put code into a release branch.
* Make a JIRA ticket for the software problem you want to solve (i.e. a fix).
* Fork the release branch that the fix will be put into.
** The branch name should be the JIRA issue identifier (e.g. `TINKERPOP-XXX`).
* Develop your fix in your branch.
* When your fix is complete and ready to merge, issue a <<pull-requests,pull request>>.
** Be certain that the test suite is passing.
** If you updated documentation, be sure that the `process-docs.sh` is building the documentation correctly.
* Before you can merge your branch into the release branch, you must have at least 3 +1 link:http://www.apache.org/foundation/glossary.html#ConsensusApproval[consensus votes]
from other committers OR a single +1 from a committer and a seven day review period for objections (i.e. a "cool down
period") at which point we will assume a lazy consensus.
** Please see the Apache Software Foundations regulations regarding link:http://www.apache.org/foundation/voting.html#votes-on-code-modification[Voting on Code Modifications].
** With the "cool down" process and lazy consensus the single +1 may (should) come from the committer who submitted
the pull request (in other words, the change submitter and the reviewer are the same person).
** Committers are trusted with their changes, but are expected to request reviews for complex changes as necessary and
not rely strictly on lazy consensus.
* Votes are issued by TinkerPop committers as comments to the pull request.
* Once either consensus position is reached, you are responsible for merging to the release branch and handling any merge conflicts.
** If there is a higher version release branch that requires your fix (e.g. `3.y-1.z` fix going to a `3.y.z` release), multiple pull requests may be necessary (i.e. one for each branch).
* Be conscious of deleting your branch if it is no longer going to be used so stale branches don't pollute the repository.
NOTE: These steps also generally apply to external pull requests from those who are not official Apache committers. In
this case, the person responsible for the merge after voting is typically the first person available
who is knowledgeable in the area that the pull request affects. Any additional coordination on merging can be handled
via the pull request comment system.
For those performing reviews as part of this process it is worth noting that the notion of "review" is fairly wide for
our purposes. TinkerPop has grown into a large and complex code base and very few people (if anyone) is knowledgeable
on all of its modules. Detailed code reviews might often be difficult or impossible as a result.
To be clear, a "review" need not be specifically about the exact nature of the code. It is perfectly reasonable to
review (and VOTE) in the following fashion:
* VOTE +1 - ran docker integration tests and everything passes
* VOTE +1 - reviewed the code in detail - solid pull request
* VOTE +1 - agree with the principle of this pull request but don't fully understand the code
* VOTE +1 - read through the updated documentation and understand why this is important, nice
Non-committers are welcome to review and VOTE as well and while their VOTEs are not binding, they will be taken as
seriously as non-binding VOTEs on releases. Reviewing and VOTEing on pull requests as a non-committer is a great way
to contribute to the TinkerPop community and get a good pulse on the changes that are upcoming to the framework.
The following exceptions to the RTC (review-then-commit) model presented above are itemized below. It is up to the
committer to self-regulate as the itemization below is not complete and only hints at the types of commits that do not
require a review.
* You are responsible for a release and need to manipulate files accordingly for the release.
** `Gremlin.version()`, CHANGELOG dates, `pom.xml` version bumps, etc.
* You are doing an minor change and it is obvious that an RTC is not required (would be a pointless burden to the community).
** The fix is under the link:http://www.apache.org/foundation/glossary.html#CommitThenReview[commit-then-review] (CTR) policy and lazy consensus is sufficient, where a single -1 vote requires you to revert your changes.
** Adding a test case, fixing spelling/grammar mistakes in the documentation, fixing LICENSE/NOTICE/etc. files, fixing a minor issue in an already merged branch.
When the committer chooses CTR, it is considered good form to include something in the commit message that explains
that CTR was invoked and the reason for doing so. For example, "Invoking CTR as this change encompasses minor
adjustments to text formatting." CTR based commits will still require manual merging through all release branches.
Merges should occur in reverse order, starting with the latest release version first (e.g. if the fix is going to
3.3.x then the change should be merged in the following order `master`, `3.4-dev`, `3.3-dev`).
[[pull-requests]]
=== Pull Requests
When submitting a pull request to one of the <<branches, release branches>>, be sure it uses the following style:
* The title of the pull request is the JIRA ticket number + "colon" + the title of the JIRA ticket.
* The first line of the pull request message should contain a link to the JIRA ticket.
* Discuss what you did to solve the problem articulated in the JIRA ticket.
* Discuss any "extra" work done that go beyond the assumed requirements of the JIRA ticket.
* Be sure to explain what you did to prove that the issue is resolved.
** Test cases written.
** Integration tests run (if required for the work accomplished).
** Documentation building (if required for the work accomplished).
** Any manual testing (though this should be embodied in a test case).
* Notes about what you will do when you merge to the respective release branch (e.g. update CHANGELOG).
** These types of "on merge tweaks" are typically done to extremely dynamic files to combat and merge conflicts.
* If you are a TinkerPop committer, you can VOTE on your own pull request, so please do so.
A pull request will typically be made to a target <<branches, branch>>. Assuming that branch is upstream of other
release branches (e.g. a pull request made to for the branch containing 3.3.x must merge to the branch that releases
3.4.x), it is important to be sure that those changes are merged to the downstream release branches. If the merge from
one release branch to another is not terribly conflicted, it is likely safe to offer a single pull request and then
merge through the release branches after review. If there is conflict or the likelihood of test failures in downstream
branches then this process is best handled by multiple pull requests: one to each release branch. Release branches with
merged changes should be pushed in reverse order, starting with the latest release version first (e.g. if the fix is
going to 3.3.x then the change should be merged in the following order: `master`, 3.4-dev`, `3.3-dev`).
As an example, consider a situation where there is a feature branch named "TINKERPOP-1234" that contains a fix for
the `3.4-dev` branch:
[source,bash]
----
`git checkout -b TINKERPOP-1234 3.4-dev`
// do a bunch of stuff to implement TINKERPOP-1234 and commit/push
git checkout -b <TINKERPOP-1234-master> master
git merge TINKERPOP-1234
----
At this point, there are two branches, with the same set of commits going to `3.4-dev` and `master`. Voting will occur
on both pull requests. After a successful vote, it is time to merge. If there are no conflicts, then simply `git merge`
both pull requests to their respective branches. If there are conflicts, then there is some added work to do - time to
rebase:
[source,bash]
----
git checkout TINKERPOP-1234
git rebase origin/3.4-dev
----
Depending on the conflict, it might be a good idea to re-test before going any further, otherwise:
[source,bash]
----
git push origin TINKERPOP-1234 --force
----
Now, `git rebase` has re-written the commit history, which makes a mess of the other pull request to master. This
problem is rectified by essentially re-issuing the PR:
[source,bash]
----
git checkout TINKERPOP-1234-master
git reset --hard origin/master
git merge TINKERPOP-1234
----
Again, depending on the changes, it may make sense to re-test at this point, otherwise:
[source,bash]
----
git push origin TINKERPOP-1234-master --force
----
It should now be safe to merge both pull requests to their release branches.
IMPORTANT: Always take a moment to review the commits in a particular pull request. Be sure that they are *all* related
to the work that was done and that no extraneous commits are present that cannot be explained. Ensuring a pull request
only contains the expected commits is the responsibility of the committer as well as the reviewer.
[[dependencies]]
== Dependencies
There are many dependencies on other open source libraries in TinkerPop modules. When adding dependencies or
altering the version of a dependency, developers must consider the implications that may apply to the TinkerPop
LICENSE and NOTICE files. There are two implications to consider:
. Does the dependency fit an Apache _approved_ license?
. Given the addition or modification to a dependency, does it mean any change for TinkerPop LICENSE and NOTICE files?
Understanding these implications is important for insuring that TinkerPop stays compliant with the Apache 2 license
that it releases under.
Regarding the first item, refer to the Apache Legal for a list of link:http://www.apache.org/legal/resolved.html[approved licenses]
that are compatible with the Apache 2 license.
The second item requires a bit more effort to follow. The Apache website offers a
link:http://www.apache.org/dev/licensing-howto.html[how-to guide] on the approach to maintaining appropriate LICENSE
and NOTICE files, but this guide is designed to offer some more specific guidance as it pertains to TinkerPop
and its distribution.
To get started, TinkerPop has both "source" and "binary" LICENSE/NOTICE files:
* Source LICENSE/NOTICE relate to files packaged with the released source code distribution:
link:https://github.com/apache/tinkerpop/blob/master/LICENSE[LICENSE] / link:https://github.com/apache/tinkerpop/blob/master/NOTICE[NOTICE]
* Binary LICENSE/NOTICE relate to files packaged with the released binary distributions:
** Gremlin Console link:https://github.com/apache/tinkerpop/blob/master/gremlin-console/src/main/static/LICENSE[LICENSE]
/ link:https://github.com/apache/tinkerpop/blob/master/gremlin-console/src/main/static/NOTICE[NOTICE]
** Gremlin Server link:https://github.com/apache/tinkerpop/blob/master/gremlin-server/src/main/static/LICENSE[LICENSE]
/ link:https://github.com/apache/tinkerpop/blob/master/gremlin-server/src/main/static/NOTICE[NOTICE]
=== Source LICENSE and NOTICE
As dependencies are not typically added to the source distribution (i.e. the source zip distribution), there is
typically no need to edit source LICENSE/NOTICE when editing a TinkerPop `pom.xml`. These files only need to be edited
if the distribution has a file added to it. Such a situation may arise from several scenarios, but it would most
likely come from the addition of a source file from another library.
* If the file being bundled is Apache licensed, then add an entry to NOTICE.
* If the file being bundled is under a different approved license, then add an entry to LICENSE and include a copy of
that LICENSE in the root `/licenses` directory of the code repository.
=== Binary LICENSE and NOTICE
The binary LICENSE/NOTICE is perhaps most impacted by changes to the various `pom.xml` files. After altering the
`pom.xml` file of any module, build `gremlin-driver`, Gremlin Console and Gremlin Server and examine the contents of
the binary distributions:
* target/gremlin-driver-x.y.z-uber.jar
* target/gremlin-console-x.y.z-uber.jar
* target/apache-tinkerpop-gremlin-console-x.y.z-distribution.zip
* target/apache-tinkerpop-gremlin-server-x.y.z-distribution.zip
Apache licensed software does not need to be included in LICENSE, but if the new dependency is an Apache-approved
license (e.g. BSD, MIT) then it should be added in the pattern already defined. A copy of the LICENSE should be
added to the `<project>/src/main/static/licenses` directory of the code repository and the `maven-shade-plugin` section
of the `gremlin-console` and `gremlin-driver` `pom.xml` files should be updated to reference this new license file so
that it is included in the uber jar.
To determine if changes are required to the NOTICE, first check if the bundled jar has a NOTICE file in it (typically
found in `/META-INF` directory of the jar).
* If the bundled file does not have a NOTICE, then no changes to TinkerPop's NOTICE are required.
* If the NOTICE of the file being bundled is NOT Apache licensed, then there is no change to TinkerPop's NOTICE.
* If the NOTICE of the file being bundled is Apache licensed, then include the copyright notification in TinkerPop's
NOTICE.
* If the NOTICE of the file being bundled is Apache licensed AND is an Apache Software Foundation project, then
ONLY include the portion of that NOTICE in TinkerPop's NOTICE that is unrelated to the Apache boilerplate NOTICE.
If there is no such portion that is different than the boilerplate then this NOTICE can be excluded (i.e. don't
alter TinkerPop's NOTICE at all).
Please refer to the link:http://www.apache.org/dev/licensing-howto.html#mod-notice[Modifications to Notice] section
of the Apache "Licensing How-to" for more information.
[[documentation]]
== Documentation
The documentation for TinkerPop is stored in the git repository in `docs/src/` and are then split into several
subdirectories, each representing a "book" (or its own publishable body of work). If a new AsciiDoc file is added to
a book, then it should also be included in the `index.asciidoc` file for that book, otherwise the preprocessor will
ignore it. Likewise, if a whole new book (subdirectory) is added, it must include an `index.asciidoc` file to be
recognized by the AsciiDoc preprocessor.
Adding a book also requires a change to the root `pom.xml` file. Find the "asciidoc" Maven profile and add a new
`<execution>` to the `asciidoctor-maven-plugin` configuration. For each book in `docs/src/`, there should be a
related `<execution>` that generates the HTML from the AsciiDoc. Follows the patterns already established by
the existing `<execution>` entries, paying special attention to the pathing of the '<sourceDirectory>',
`<outputDirectory>` and `<imagesdir>`. Note that the `<outputDirectory>` represents where the book will exist when
uploaded to the server and should preserve the directory structure in git as referenced in `<sourceDirectory>`.
Adding Gremlin code examples to any of the link:https://github.com/apache/tinkerpop/tree/master/docs/src/recipes[docs/src/recipes]
or to link:https://github.com/apache/tinkerpop/tree/master/docs/src/reference/the-traversal.asciidoc[docs/src/reference/the-traversal.asciidoc]
also has the effect of improving testing of the Gremlin language. All Gremlin found in code sections that are marked
as `[gremlin-groovy]` are tested in two ways:
1. When `mvn clean install` is executed all such Gremlin are passed through the grammar parser to ensure validity.
As the grammar parser is not a Groovy parser, the test framework attempts to filter away or ignore things it can't
possibly parse. Ideally, examples should be written in such a way as to be parsed by the grammar, but in cases where it
cannot be as such, the test suite simply needs to be modified to suitably ignore the example.
2. When the documentation is built, the code snippets are actually executed and errors will result in a failure to
build the documentation.
Please see the <<building-testing,Building and Testing>> section for more information on how to generate the
documentation.
=== Asciidoc Formatting Tips
*Use Asciidoctor*
Asciidoc may render differently with different tools. What may look proper and correct with an IDE may be different
than what is ultimately generated during the official build of the documentation which uses Asciidoctor. As a result
it's best to not rely on any other view of changes besides one generated by Asciidoctor.
*Anonymous Traversal Formatting*
The double underscore (i.e. `+__+`) does not typically render right and requires such code to be wrapped with
`pass:[`pass:[__]`]` or `pass:[`+__+`]`.
Cause: link:https://github.com/asciidoctor/asciidoctor/issues/1717[#1717],
link:https://github.com/asciidoctor/asciidoctor/issues/1066[#1066]
*Non-whitespace After Backtick*
Use double backtick if there is non-whitespace immediately following the trailing backtick. So rather than:
pass:[`ScriptInputFormat`'s], prefer pass:[``ScriptInputFormat``'s].
Original: [...] globally available for `ScriptInputFormat`'s `parse()` method
Fixed: [...] globally available for ``ScriptInputFormat``'s `parse()` method
Cause: link:https://github.com/asciidoctor/asciidoctor/issues/1514[#1514]
[[site]]
== Site
The content for the TinkerPop home page and related pages that make up the web site at link://tinkerpop.apache.org[tinkerpop.apache.org]
is stored in the git repository under `/docs/site`. In this way, it becomes easier for the community to provide content
presented there, because the content can be accepted via the standard workflow of a pull request. To generate the site
for local viewing, run `bin/generate-home.sh`, which will build the site in `target/site/`. Note that Node.js and npm
have to be installed in order for the script to work. See the <<nodejs-environment,JavaScript Environment>> section for
more info about what parts of TinkerPop depend on Node.js and npm. While most of the generated website files can be
viewed locally by opening them in a browser, some of them rely on imported resources that will be blocked by the
browser's same-origin policy if not served from a single origin using a web server. The generated website can be served
locally by running `npx serve target/site/home`. PMC members can officially publish the site with
`bin/publish-home.sh <username>`.
"Publishing" does not publish documentation (e.g. reference docs, javadocs, etc) and only publishes what is generated
from the content in `/docs/site`. Publishing the site can be performed out of band with the release cycle and is no
way tied to a version. The `master` branch should always be considered the "current" web site and publishing should
only happen from that branch.
[[logging]]
== Logging
TinkerPop uses SLF4j for logging and typically leans back on Log4j as the implementation. Configuring log outputs
for debugging purposes within tests can be altered by editing the `log4j-test.properties` file in each module's test
resources. That file gets copied to the `target/test-classes` on build and surefire and failsafe plugins in maven
are then configured to point at that area of the file system for those configuration files. The properties files
can be edited to fine tune control of the log output, but generally speaking the current configuration is likely
best for everyone's general purposes, so if changes are made please revert them prior to commit.
[[io]]
== IO Documentation and Testing
The link:https://tinkerpop.apache.org/docs/x.y.z/dev/io[IO Documentation] provides more details into GraphML, GraphSON
and Gryo with a special focus on the needs of developers who are working directly with these formats. GraphSON gets
the greatest focus here as it is used as the primary IO format for link:https://tinkerpop.apache.org/docs/x.y.z/reference/#gremlin-drivers-variants[GLVs].
This documentation is largely generated from the `gremlin-io-test` module found under `gremlin-tools`. The
`gremlin-io-test` module also includes a testing framework which validates that formats don't break between TinkerPop
versions. Unfortunately, this module requires some maintenance to ensure that the documentation and tests both stay
updated.
The `gremlin-io-test` module contains a set of files in the test resources that are statically bound to the version in
which they were generated. Older versions should never be modified. The only time changes to these resources should be
accepted should be for the current `SNAPSHOT` version. The test resources are generated from the `Model` class which
contains the objects that will undergo serialization for purpose of testing. Note that these same objects in the
`Model` are also used to generate documentation.
To generate these test resources and documentation snippets based on the `Model`, use this Maven command:
[source,text]
----
mvn clean install -pl :gremlin-io-test -Dio
----
This command will generate two directories in the `/target` output directory of `gremlin-io-test`: `test-case-data`
and `dev-docs`. The contents of `test-case-data` represents the serialized `Model` objects that can be copied to the
test resources and the contents of the `dev-docs` contains asciidoc snippets that can be copied to the IO documentation.
Generating data files in the fashion mentioned above with Maven is only good for versions of TinkerPop on the 3.3.x
line because the `gremlin-io-test` module did not exist in 3.2.x. Of course, compatibility is still tested back to
those older versions. To generate test data from 3.2.x, there are Groovy scripts in the comments of the
`gryo.asciidoc` and `graphson.asciidoc` files that can copy/pasted to the Gremlin Console. They will generate the
batch of test files needed for `gremlin-io-test`.
When does this command need to be executed?
1. If a new object is added to the `Model` - in this case, the newly created data files should be copied to the
appropriate test resource directory for the current `SNAPSHOT` version and the appropriate asciidoc snippet added to
the IO asciidocs.
2. After the release of a new TinkerPop version - in this case, a new test resource directory should be created for
the `SNAPSHOT` version and the generated `test-case-data` copied in appropriately.
The second case, does require some additional discussion. When a new version is added the following classes will need
to be updated in the following ways:
*GraphBinaryCompatibility* - Include new GraphBinary 1.0 enums for the current `SNAPSHOT`.
[source,java]
----
V1_3_4_3("3.4.3", "1.0", "v1"),
V1_3_4_4("3.4.4", "1.0", "v1");
----
*GryoCompatibility* - Include new Gryo 1.0 and 3.0 enums for the current `SNAPSHOT`.
[source,java]
----
V1D0_3_3_x("3.3.x", "1.0", "v1d0"),
V3D0_3_3_x("3.3.x", "3.0", "v3d0")
----
*GraphSONCompatibility* - Include new GraphSON enums for each of the various GraphSON configurations and versions.
[source,java]
----
V1D0_3_3_x("3.3.x", "1.0", "v1d0"),
V2D0_PARTIAL_3_3_x("3.3.x", "2.0", "v2d0-partial"),
V2D0_NO_TYPE_3_3_x("3.3.x", "2.0", "v2d0-no-types"),
V3D0_PARTIAL_3_3_x("3.3.x", "3.0", "v3d0");
----
*GraphBinaryCompatibilityTest* - Add the newly included `GraphBinaryCompatibility` enums to the test parameters being
careful to match the appropriate "mapper" to the right version.
*GryoCompatibilityTest* - Add the newly included `GryoCompatibility` enums to the test parameters being careful to
match the appropriate "mapper" to the right version.
*GraphSONUntypedCompatibilityTest* - Add the newly included GraphSON 1.0 and 2.0 "untyped" enums to the test parameters
being careful to match the appropriate "mapper" to the right version.
*GraphSONTypedCompatibilityTest* - Add the newly included GraphSON 3.0 and 2.0 "typed" enums to the test parameters
being careful to match the appropriate "mapper" to the right version.
At this point, all of the IO tests are rigged up properly and assuming the test resources are available a standard
`mvn clean install` should execute the compatibility tests and validate that everything is working as expected and
that there are no breaks in serialization processes.