History of Apache Hadoop Ozone project

Ozone development started on the feature branch HDFS-7240 as part of the Apache Hadoop HDFS project. According to Jira, the first Ozone commit was HDFS-8456 ("Ozone: Introduce STORAGE_CONTAINER_SERVICE as a new NodeType") in May 2015.

Ozone is an Object Store for Hadoop that is built on a lower-level storage replication layer. This layer was originally called HDSL (Hadoop Distributed Storage Layer) and was later renamed to HDDS (Hadoop Distributed Data Storage).

Implementation of the generic storage layer began under HDFS-11118, together with an iSCSI/jSCSI-based block storage layer (“CBlock”) introduced by HDFS-11361.

In summary:

  • HDDS (earlier HDSL): replicates huge binary containers between datanodes
  • Ozone: provides Object Store semantics with the help of HDDS
  • CBlock: provides mountable volumes with the help of the HDDS layer (based on the iSCSI protocol)

At the beginning of 2017 a new podling project was started inside the Apache Incubator: Apache Ratis. Ratis is an embeddable Raft protocol implementation, which became the cornerstone of consensus inside both the Ozone and HDDS projects. (Ozone started to use it in March 2017.)

In October 2017 a discussion was started on the hdfs-dev mailing list about merging the existing functionality into the Apache Hadoop trunk. After a long debate, Owen O'Malley proposed a compromise: merge it into the trunk but use a separate release cycle:

  • HDSL becomes a subproject of Hadoop.
  • HDSL will release separately from Hadoop. Hadoop releases will not contain HDSL and vice versa.
  • HDSL will get its own jira instance so that the release tags stay separate.
  • On trunk (as opposed to release branches) HDSL will be a separate module in Hadoop's source tree. This will enable the HDSL to work on their trunk and the Hadoop trunk without making releases for every change.
  • Hadoop's trunk will only build HDSL if a non-default profile is enabled. When Hadoop creates a release branch, the RM will delete the HDSL module from the branch.
  • HDSL will have their own Yetus checks and won't cause failures in the Hadoop patch check.

This proposal passed, and after the code was reorganized (see HDFS-13258), Ozone was voted to be merged into the Hadoop trunk in March 2018.

Because the CBlock feature was not stable enough, it was not merged. Instead, it was archived on a separate feature branch, which was not kept in sync with the newer Ozone/HDDS features. (Somewhat similar functionality was later provided with an S3 FUSE file system and an S3-compatible REST gateway.)

After the merge, a new Jira project (HDDS) was created, and the work was tracked under that project instead of as child issues under HDFS-7240.

Over the next year, multiple Ozone releases were published as separate release packages. Ozone was developed on the Hadoop trunk, but the Ozone sources were removed from the main Hadoop releases.

Originally, Ozone depended on the in-tree (SNAPSHOT) Hadoop artifacts: the core hadoop-hdfs/hadoop-common artifacts had to be compiled before the Ozone subprojects. During development this dependency was gradually reduced. With the 0.4.1 release it was removed entirely, and it became possible to compile Ozone against the released Hadoop artifacts, which in turn made it possible to separate Ozone development from the main Hadoop trunk branch.

In October 2019, the Ozone sources were moved out to the apache/hadoop-ozone git repository. During this move the git history was transformed to remove old YARN/HDFS/MAPREDUCE tasks.

  • The first git commit of the new repository is the commit which created the new maven subprojects for Ozone (before the trunk merge).
  • Some of the oldest Ozone commits are available only from the Hadoop repository.
  • Some newer HDDS commits have different commit hashes in the hadoop and hadoop-ozone repositories.

In March 2020, Ozone 0.5.0 was released. It was the first release tagged as beta (earlier releases were alpha).