| <!DOCTYPE html> |
| <html> |
| <head> |
| <meta charset="utf-8"> |
| <meta http-equiv="X-UA-Compatible" content="IE=edge"> |
| <meta name="viewport" content="width=device-width, initial-scale=1"> |
| |
| <link rel="shortcut icon" href="/favicon.ico" type="image/x-icon"> |
| <link rel="icon" href="/favicon.ico" type="image/x-icon"> |
| |
| <title>Storm Distributed Cache API</title> |
| |
| <!-- Bootstrap core CSS --> |
| <link href="/assets/css/bootstrap.min.css" rel="stylesheet"> |
| <!-- Bootstrap theme --> |
| <link href="/assets/css/bootstrap-theme.min.css" rel="stylesheet"> |
| |
| <!-- Custom styles for this template --> |
| <link rel="stylesheet" href="http://fortawesome.github.io/Font-Awesome/assets/font-awesome/css/font-awesome.css"> |
| <link href="/css/style.css" rel="stylesheet"> |
| <link href="/assets/css/owl.theme.css" rel="stylesheet"> |
| <link href="/assets/css/owl.carousel.css" rel="stylesheet"> |
| <script type="text/javascript" src="/assets/js/jquery.min.js"></script> |
| <script type="text/javascript" src="/assets/js/bootstrap.min.js"></script> |
| <script type="text/javascript" src="/assets/js/owl.carousel.min.js"></script> |
| <script type="text/javascript" src="/assets/js/storm.js"></script> |
| <!-- Just for debugging purposes. Don't actually copy these 2 lines! --> |
| <!--[if lt IE 9]><script src="../../assets/js/ie8-responsive-file-warning.js"></script><![endif]--> |
| |
| <!-- HTML5 shim and Respond.js for IE8 support of HTML5 elements and media queries --> |
| <!--[if lt IE 9]> |
| <script src="https://oss.maxcdn.com/html5shiv/3.7.2/html5shiv.min.js"></script> |
| <script src="https://oss.maxcdn.com/respond/1.4.2/respond.min.js"></script> |
| <![endif]--> |
| </head> |
| |
| |
| <body> |
| <header> |
| <div class="container-fluid"> |
| <div class="row"> |
| <div class="col-md-5"> |
| <a href="/index.html"><img src="/images/logo.png" class="logo" /></a> |
| </div> |
| <div class="col-md-5"> |
| |
| <h1>Version: 2.3.0</h1> |
| |
| </div> |
| <div class="col-md-2"> |
| <a href="/downloads.html" class="btn-std btn-block btn-download">Download</a> |
| </div> |
| </div> |
| </div> |
| </header> |
| <!--Header End--> |
| <!--Navigation Begin--> |
| <div class="navbar" role="banner"> |
| <div class="container-fluid"> |
| <div class="navbar-header"> |
| <button class="navbar-toggle" type="button" data-toggle="collapse" data-target=".bs-navbar-collapse"> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| <span class="icon-bar"></span> |
| </button> |
| </div> |
| <nav class="collapse navbar-collapse bs-navbar-collapse" role="navigation"> |
| <ul class="nav navbar-nav"> |
| <li><a href="/index.html" id="home">Home</a></li> |
| <li><a href="/getting-help.html" id="getting-help">Getting Help</a></li> |
| <li><a href="/about/integrates.html" id="project-info">Project Information</a></li> |
| <li class="dropdown"> |
| <a href="#" class="dropdown-toggle" data-toggle="dropdown" id="documentation">Documentation <b class="caret"></b></a> |
| <ul class="dropdown-menu"> |
| |
| |
| <li><a href="/releases/2.3.0/index.html">2.3.0</a></li> |
| |
| |
| |
| <li><a href="/releases/2.2.0/index.html">2.2.0</a></li> |
| |
| |
| |
| <li><a href="/releases/2.1.0/index.html">2.1.0</a></li> |
| |
| |
| |
| <li><a href="/releases/2.0.0/index.html">2.0.0</a></li> |
| |
| |
| |
| <li><a href="/releases/1.2.3/index.html">1.2.3</a></li> |
| |
| |
| </ul> |
| </li> |
| <li><a href="/talksAndVideos.html">Talks and Slideshows</a></li> |
| <li class="dropdown"> |
| <a href="#" class="dropdown-toggle" data-toggle="dropdown" id="contribute">Community <b class="caret"></b></a> |
| <ul class="dropdown-menu"> |
| <li><a href="/contribute/Contributing-to-Storm.html">Contributing</a></li> |
| <li><a href="/contribute/People.html">People</a></li> |
| <li><a href="/contribute/BYLAWS.html">ByLaws</a></li> |
| </ul> |
| </li> |
| <li><a href="/2020/06/30/storm220-released.html" id="news">News</a></li> |
| </ul> |
| </nav> |
| </div> |
| </div> |
| |
| |
| |
| <div class="container-fluid"> |
| <h1 class="page-title">Storm Distributed Cache API</h1> |
| <div class="row"> |
| <div class="col-md-12"> |
| <!-- Documentation --> |
| |
| <p class="post-meta"></p> |
| |
| <div class="documentation-content"><h1 id="storm-distributed-cache-api">Storm Distributed Cache API</h1> |
| |
<p>The distributed cache feature in Storm is used to efficiently distribute files
(or blobs, the equivalent terminology for a file in the distributed cache; the two
terms are used interchangeably in this document) that are large and can change
during the lifetime of a topology, such as geo-location data, dictionaries, etc.
Typical use cases include phrase recognition, entity extraction, document
classification, URL re-writing, location/address detection, and so forth. Such
files may range from several KB to several GB in size. For small datasets that
don't need dynamic updates, including them in the topology jar works fine. But for
large files, topology startup times can become very long. In these cases, the
distributed cache provides fast topology startup, especially if the files were
previously downloaded for the same submitter and are still in the cache. This is
useful for frequent deployments, sometimes a few times a day with updated jars,
because large cached blobs that do not change frequently remain available in the
distributed cache across deployments.</p>
| |
<p>At topology submission time, the user specifies the set of files the topology
needs. Once a topology is running, the user can at any time request that any file
in the distributed cache be updated with a newer version. Blob updates follow an
eventual consistency model. If the topology needs to know which version of a file
it has access to, it is the user's responsibility to find that information out.
The files are stored in a cache with a Least-Recently-Used (LRU) eviction policy:
the supervisor decides which cached files are no longer needed and deletes them to
free disk space. Blobs can be compressed, and the user can request that a blob be
uncompressed before accessing it.</p>
| |
| <h2 id="motivation-for-distributed-cache">Motivation for Distributed Cache</h2> |
| |
| <ul> |
| <li>Allows sharing blobs among topologies.</li> |
| <li>Allows updating the blobs from the command line.</li> |
| </ul> |
| |
| <h2 id="distributed-cache-implementations">Distributed Cache Implementations</h2> |
| |
<p>The current BlobStore interface has the following two implementations:</p>

<ul>
<li>LocalFsBlobStore</li>
<li>HdfsBlobStore</li>
</ul>
| |
| <p>Appendix A contains the interface for blobstore implementation.</p> |
| |
| <h2 id="localfsblobstore">LocalFsBlobStore</h2> |
| |
| <p><img src="images/local_blobstore.png" alt="LocalFsBlobStore"></p> |
| |
<p>The local file system implementation of the blobstore is depicted in the timeline diagram above.</p>
| |
<p>There are several stages from blob creation to blob download and the corresponding execution of a topology.
The main stages are as follows.</p>
| |
| <h3 id="blob-creation-command">Blob Creation Command</h3> |
| |
| <p>Blobs in the blobstore can be created through command line using the following command.</p> |
| <div class="highlight"><pre><code class="language-" data-lang="">storm blobstore create --file README.txt --acl o::rwa --replication-factor 4 key1 |
| </code></pre></div> |
<p>The above command creates a blob with the key name “key1” from the file README.txt.
It grants read, write, and admin access to all users (“o::rwa”) and sets a replication factor of 4.</p>
| |
| <h3 id="topology-submission-and-blob-mapping">Topology Submission and Blob Mapping</h3> |
| |
<p>Users can submit their topology with the following command. The command includes the
topology.blobstore.map configuration, which here holds two keys, “key1” and “key2”:
“key1” is mapped to the local file name “blob_file” and will not be uncompressed, while “key2” uses the defaults.</p>
| <div class="highlight"><pre><code class="language-" data-lang="">storm jar /home/y/lib/storm-starter/current/storm-starter-jar-with-dependencies.jar |
| org.apache.storm.starter.clj.word_count test_topo -c topology.blobstore.map='{"key1":{"localname":"blob_file", "uncompress":false},"key2":{}}' |
| </code></pre></div> |
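The topology.blobstore.map value passed with -c above is plain JSON. As a sketch, the same string can be assembled programmatically; the helper below is hypothetical and only renders the JSON by hand (in a real topology you would put a Map into the topology configuration rather than format a string).

```java
// Sketch: building the topology.blobstore.map JSON value in plain Java.
// Helper name is hypothetical; it only renders the JSON string by hand.
public class BlobstoreMapBuilder {
    static String entry(String key, String localname, Boolean uncompress) {
        if (localname == null) {
            return "\"" + key + "\":{}"; // all parameters omitted: defaults apply
        }
        return "\"" + key + "\":{\"localname\":\"" + localname
                + "\", \"uncompress\":" + uncompress + "}";
    }

    public static void main(String[] args) {
        String json = "{" + entry("key1", "blob_file", false) + ","
                          + entry("key2", null, null) + "}";
        // Matches the -c topology.blobstore.map value in the command above:
        System.out.println(json);
        // {"key1":{"localname":"blob_file", "uncompress":false},"key2":{}}
    }
}
```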
| <h3 id="blob-creation-process">Blob Creation Process</h3> |
| |
<p>Blobs are created through the “ClientBlobStore” interface; Appendix B contains the “ClientBlobStore” interface.
The concrete implementation of this interface is the “NimbusBlobStore”. In the case of the local file system, the client makes a
call to nimbus to create the blobs, and nimbus uses the local file system implementation to create them.
When a user submits a topology, the jar, configuration, and code files are uploaded as blobs via the blobstore.
All other blobs specified by the topology are mapped to it through the topology.blobstore.map configuration.</p>
| |
| <h3 id="blob-download-by-the-supervisor">Blob Download by the Supervisor</h3> |
| |
<p>Finally, the blobs corresponding to a topology are downloaded by the supervisor once it receives its assignments from nimbus, through
the same “NimbusBlobStore” thrift client that uploaded the blobs. The supervisor downloads the code, jar, and conf blobs by calling the
“NimbusBlobStore” client directly, while the blobs specified in topology.blobstore.map are downloaded and mapped locally by the
Localizer. The Localizer talks to the “NimbusBlobStore” thrift client to download the blobs and adds the blob uncompression and local
blob name mapping logic. Once all the blobs have been downloaded, the workers are launched to run
the topologies.</p>
| |
| <h2 id="hdfsblobstore">HdfsBlobStore</h2> |
| |
| <p><img src="images/hdfs_blobstore.png" alt="HdfsBlobStore"></p> |
| |
<p>The HdfsBlobStore has a similar implementation and blob creation/download procedure; the two blobstores differ mainly in how replication
is handled. Replication in the HDFS blobstore is straightforward, since HDFS is built to handle replication, and it requires no state to be
stored in ZooKeeper. The local file system blobstore, on the other hand, requires state to be
stored in ZooKeeper in order to work with nimbus HA. Nimbus HA allows the local file system blobstore to implement replication
by storing state about the running topologies in ZooKeeper and syncing the blobs across the various nimbuses. On the supervisor’s
end, the supervisor and the localizer talk to the HdfsBlobStore through the “HdfsClientBlobStore” implementation.</p>
| |
| <h2 id="additional-features-and-documentation">Additional Features and Documentation</h2> |
| <div class="highlight"><pre><code class="language-" data-lang="">storm jar /home/y/lib/storm-starter/current/storm-starter-jar-with-dependencies.jar org.apache.storm.starter.clj.word_count test_topo |
| -c topology.blobstore.map='{"key1":{"localname":"blob_file", "uncompress":false},"key2":{}}' |
| </code></pre></div> |
| <h3 id="compression">Compression</h3> |
| |
<p>The blobstore allows the user to set the “uncompress” configuration to true or false. This configuration can be specified
in the topology.blobstore.map shown in the above command, and it allows the user to upload a compressed file such as a tarball or zip.
In the local file system blobstore, the compressed blobs are stored on the nimbus node. The localizer is responsible for
uncompressing the blob and storing it on the supervisor node. Symbolic links to the blobs on the supervisor node are created within the worker
before execution starts.</p>
| |
| <h3 id="local-file-name-mapping">Local File Name Mapping</h3> |
| |
<p>Apart from compression, the blobstore lets the user give a blob a name that workers can use. The localizer is
responsible for mapping the blob to a local name on the supervisor node.</p>
| |
| <h2 id="additional-blobstore-implementation-details">Additional Blobstore Implementation Details</h2> |
| |
<p>The blobstore uses a hashing function on the key to place the blobs. The blobs are stored inside the directory specified by
the blobstore.dir configuration. By default this is “storm.local.dir/blobs” for the local file system, and a similar path on the
HDFS file system.</p>
| |
<p>Once a file is submitted, the blobstore reads the configs and creates metadata for the blob with all the access control details. This metadata
is used for authorization when the blobs are accessed. The blob key and version contribute to the hash code and thereby to the directory
under “storm.local.dir/blobs/data” where the data is placed. The blobs are placed in directories named by a positive number (e.g. 193, 822).</p>
| |
<p>Once the topology is launched and the relevant blobs have been created, the supervisor first downloads the blobs for storm.conf, storm.ser,
and storm.code, and then separately downloads all the blobs uploaded via the command line, using the localizer to uncompress them and map them
to the local names specified in the topology.blobstore.map configuration. The supervisor periodically checks for version changes and updates
blobs accordingly, which allows blobs to be updated on the fly.</p>
| |
<p>For the local file system, the distributed cache on the supervisor node has a soft limit of 10240 MB, and the cleanup code attempts
to evict anything over the soft limit every 600 seconds using an LRU policy.</p>
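The soft-limit check behind that cleanup pass amounts to simple arithmetic. The sketch below (hypothetical method names, defaults taken from this document) computes how many bytes must be evicted to get back under the limit.

```java
// Sketch of the soft-limit check the periodic cleanup pass performs.
// The constant mirrors the default quoted above; names are hypothetical.
public class CacheCleanup {
    static final long TARGET_SIZE_MB = 10240; // supervisor.localizer.cache.target.size.mb

    // Bytes that must be evicted (LRU-first) to get back under the soft limit.
    static long bytesOverLimit(long currentCacheBytes) {
        long limitBytes = TARGET_SIZE_MB * 1024L * 1024L;
        return Math.max(0, currentCacheBytes - limitBytes);
    }

    public static void main(String[] args) {
        long limit = TARGET_SIZE_MB * 1024L * 1024L;
        System.out.println(bytesOverLimit(limit - 1));   // 0: under the limit, nothing to evict
        System.out.println(bytesOverLimit(limit + 512)); // 512: evict at least 512 bytes
    }
}
```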
| |
<p>The HDFS blobstore implementation handles load better by relieving nimbus of the burden of storing the blobs, which keeps nimbus from
becoming a bottleneck, and it provides seamless replication of blobs. The local file system blobstore, by contrast, is not very efficient at
replicating blobs and is limited by the number of nimbuses. Moreover, with HDFS the supervisor talks to the blobstore directly, without
involving nimbus, which further reduces the load and dependency on nimbus.</p>
| |
| <h2 id="highly-available-nimbus">Highly Available Nimbus</h2> |
| |
| <h3 id="problem-statement">Problem Statement:</h3> |
| |
<p>Currently the Storm master, a.k.a. nimbus, is a process that runs on a single machine under supervision. In most cases a
nimbus failure is transient, and the supervising process restarts it. However, when disks fail or network
partitions occur, nimbus goes down. Under these circumstances, running topologies continue normally, but no new topologies can be
submitted, no existing topologies can be killed/deactivated/activated, and if a supervisor node fails,
reassignments are not performed, resulting in performance degradation or topology failures. With this project we intend
to resolve this problem by running nimbus in a primary-backup mode, guaranteeing that even if a nimbus server fails, one
of the backups will take over. </p>
| |
| <h3 id="requirements-for-highly-available-nimbus">Requirements for Highly Available Nimbus:</h3> |
| |
| <ul> |
| <li>Increase overall availability of nimbus.</li> |
<li>Allow nimbus hosts to leave and join the cluster at any time. A newly joined host should automatically catch up and join
the list of potential leaders. </li>
<li>No topology resubmissions required in case of nimbus failovers.</li>
| <li>No active topology should ever be lost. </li> |
| </ul> |
| |
| <h4 id="leader-election">Leader Election:</h4> |
| |
| <p>The nimbus server will use the following interface:</p> |
| <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">public</span> <span class="kd">interface</span> <span class="nc">ILeaderElector</span> <span class="o">{</span> |
| <span class="cm">/** |
| * queue up for leadership lock. The call returns immediately and the caller |
| * must check isLeader() to perform any leadership action. |
| */</span> |
| <span class="kt">void</span> <span class="nf">addToLeaderLockQueue</span><span class="o">();</span> |
| |
| <span class="cm">/** |
| * Removes the caller from the leader lock queue. If the caller is leader |
| * also releases the lock. |
| */</span> |
| <span class="kt">void</span> <span class="nf">removeFromLeaderLockQueue</span><span class="o">();</span> |
| |
| <span class="cm">/** |
| * |
| * @return true if the caller currently has the leader lock. |
| */</span> |
| <span class="kt">boolean</span> <span class="nf">isLeader</span><span class="o">();</span> |
| |
| <span class="cm">/** |
| * |
* @return the current leader's address; throws an exception if no one holds the lock.
| */</span> |
| <span class="n">InetSocketAddress</span> <span class="nf">getLeaderAddress</span><span class="o">();</span> |
| |
| <span class="cm">/** |
| * |
| * @return list of current nimbus addresses, includes leader. |
| */</span> |
| <span class="n">List</span><span class="o"><</span><span class="n">InetSocketAddress</span><span class="o">></span> <span class="nf">getAllNimbusAddresses</span><span class="o">();</span> |
| <span class="o">}</span> |
| </code></pre></div> |
<p>Once a nimbus comes up, it calls the addToLeaderLockQueue() function. The leader election code selects a leader from the queue.
If the topology code, jar, or config blobs are missing, the new leader downloads them from any other nimbus that is up and running.</p>
| |
<p>The first implementation will be ZooKeeper based. If the ZooKeeper connection is lost or reset, resulting in the loss of the lock
or of the spot in the queue, the implementation will update its state so that isLeader() reflects the
current status. Leader-like actions must finish in less than minimumOf(connectionTimeout, sessionTimeout) to ensure
the lock was held by nimbus for the entire duration of the action. (It is an open question whether to simply state this expectation
and require that ZooKeeper timeouts be set high enough, which results in higher failover time, or to
build a rollback mechanism for all actions, which requires a lot of code.) If a nimbus that is not the
leader receives a request that only a leader can perform, it throws a RuntimeException.</p>
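The queue semantics behind ILeaderElector can be illustrated with a toy in-memory version. A real implementation is ZooKeeper-backed; the class below is only a sketch of the add/remove/isLeader contract, with an explicit nimbus id parameter added for demonstration (the real interface acts on behalf of the calling process).

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy in-memory sketch of the leader lock queue: nimbuses queue up for the
// leadership lock and the head of the queue holds it. Not Storm code.
public class LeaderLockQueue {
    private final Deque<String> queue = new ArrayDeque<>();

    public synchronized void addToLeaderLockQueue(String nimbusId) {
        if (!queue.contains(nimbusId)) {
            queue.addLast(nimbusId); // join the back of the leadership queue
        }
    }

    public synchronized void removeFromLeaderLockQueue(String nimbusId) {
        queue.remove(nimbusId); // if it was the leader, the next in queue takes over
    }

    public synchronized boolean isLeader(String nimbusId) {
        return nimbusId.equals(queue.peekFirst());
    }

    public static void main(String[] args) {
        LeaderLockQueue elector = new LeaderLockQueue();
        elector.addToLeaderLockQueue("nimbus-1");
        elector.addToLeaderLockQueue("nimbus-2");
        System.out.println(elector.isLeader("nimbus-1")); // true: head of queue
        elector.removeFromLeaderLockQueue("nimbus-1");    // simulate failover
        System.out.println(elector.isLeader("nimbus-2")); // true: next in queue
    }
}
```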
| |
| <h3 id="nimbus-state-store">Nimbus state store:</h3> |
| |
<p>To achieve failover from primary to backup servers, nimbus state/data needs to be replicated across all nimbus hosts or
stored in distributed storage. Replicating the data correctly involves state management and consistency checks,
and it is hard to test for correctness. However, many Storm users do not want to take an extra dependency on another replicated
storage system such as HDFS yet still need high availability. The blobstore implementation, along with the state storage, helps
handle failover when a leader nimbus goes down.</p>
| |
<p>To support replication we will allow the user to define a code replication factor, which reflects the number of nimbus
hosts to which the code must be replicated before starting the topology. With replication comes the issue of consistency.
The topology is launched once the code, jar, and conf blob files are replicated per the "topology.min.replication.count" config.
Maintaining state for failover scenarios is important for the local file system blobstore. The current implementation makes sure that one of the
available nimbuses is elected leader in the case of a failure. If topology-specific blobs are missing, the leader nimbus
downloads them as and when they are needed. With this architecture, a nimbus does not have to download all the blobs
required for a topology before accepting leadership, which avoids inadvertent delays in electing a leader
when the blobs are very large.</p>
| |
<p>Per-blob state is relevant only for the local blobstore implementation; for the HDFS blobstore, replication
is taken care of by HDFS. To handle failover scenarios for a local blobstore, we need to store the state of the leader and
non-leader nimbuses in ZooKeeper.</p>
| |
<p>The state is stored under /storm/blobstore/key/nimbusHostPort:SequenceNumber so that the blobstore can make nimbus highly available.
This state is used by the local file system blobstore to support replication. The HDFS blobstore does not have to store state in
ZooKeeper.</p>
| |
| <ul> |
| <li><p>NimbusHostPort: This piece of information generally contains the parsed string holding the hostname and port of the nimbus. |
| It uses the same class “NimbusHostPortInfo” used earlier by the code-distributor interface to store the state and parse the data.</p></li> |
<li><p>SequenceNumber: This is the blob sequence number information, implemented by the KeySequenceNumber class.
Sequence numbers are generated per key. For every update, sequence numbers are assigned based on a global sequence number
stored under /storm/blobstoremaxsequencenumber/key. For more details about how the numbers are generated, see the javadocs for KeySequenceNumber.</p></li>
| </ul> |
| |
| <p><img src="images/nimbus_ha_blobstore.png" alt="Nimbus High Availability - BlobStore"></p> |
| |
<p>The sequence diagram above shows how the blobstore works and how the state storage in ZooKeeper makes nimbus highly available.
Currently, the thread that syncs blobs on a non-leader lives within nimbus. In the future, it would be nice to move this thread
into the blobstore so that the blobstore coordinates the state change and blob download as per the sequence diagram.</p>
| |
<h2 id="thrift-and-rest-api">Thrift and REST API</h2>
| |
<p>In order to avoid having workers/supervisors/ui talk to ZooKeeper to obtain the master nimbus address, we are going to modify the
<code>getClusterInfo</code> API so it can also return nimbus information. getClusterInfo currently returns a <code>ClusterSummary</code> instance,
which has a list of <code>supervisorSummary</code> and a list of <code>topologySummary</code> instances. We will add a list of <code>NimbusSummary</code>
to the <code>ClusterSummary</code>. See the structures below:</p>
| <div class="highlight"><pre><code class="language-" data-lang="">struct ClusterSummary { |
| 1: required list<SupervisorSummary> supervisors; |
| 3: required list<TopologySummary> topologies; |
| 4: required list<NimbusSummary> nimbuses; |
| } |
| |
| struct NimbusSummary { |
| 1: required string host; |
| 2: required i32 port; |
| 3: required i32 uptime_secs; |
| 4: required bool isLeader; |
| 5: required string version; |
| } |
| </code></pre></div> |
<p>This will be used by StormSubmitter, nimbus clients, supervisors, and the ui to discover the current leader and participating
nimbus hosts. Any nimbus host will be able to respond to these requests. The nimbus hosts can read this information once
from ZooKeeper, cache it, and update the cache when the watchers fire to indicate changes, which should
be rare in the general case.</p>
| |
<p>Note: All nimbus hosts have watchers on ZooKeeper so they are notified as soon as a new blob is available for download; the callback may or may not download
the code, so a background thread is triggered to download the respective blobs to run the topologies. Replication is achieved once the blobs have been downloaded
onto the non-leader nimbuses. You should therefore expect your topology submission time to be somewhere between 0 and (2 * nimbus.code.sync.freq.secs) for any
topology.min.replication.count > 1.</p>
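The worst-case bound quoted in the note above can be encoded as a one-line calculation. The sketch below uses hypothetical names and the 2-minute default for nimbus.code.sync.freq.secs mentioned in the configuration section.

```java
// Sketch of the worst-case extra submission latency described above: with a
// min replication count above 1, submission can wait on the background sync
// thread, bounded by 2 * nimbus.code.sync.freq.secs. Names are hypothetical.
public class ReplicationWaitBound {
    static int worstCaseWaitSecs(int codeSyncFreqSecs, int minReplicationCount) {
        return minReplicationCount > 1 ? 2 * codeSyncFreqSecs : 0;
    }

    public static void main(String[] args) {
        System.out.println(worstCaseWaitSecs(120, 3)); // 240 seconds with the 2-minute default
        System.out.println(worstCaseWaitSecs(120, 1)); // 0: no replication wait needed
    }
}
```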
| |
| <h2 id="configuration">Configuration</h2> |
| <div class="highlight"><pre><code class="language-" data-lang="">blobstore.dir: The directory where all blobs are stored. For local file system it represents the directory on the nimbus |
| node and for HDFS file system it represents the hdfs file system path. |
| |
| supervisor.blobstore.class: This configuration is meant to set the client for the supervisor in order to talk to the blobstore. |
| For a local file system blobstore it is set to “org.apache.storm.blobstore.NimbusBlobStore” and for the HDFS blobstore it is set |
| to “org.apache.storm.blobstore.HdfsClientBlobStore”. |
| |
supervisor.blobstore.download.thread.count: The number of threads the supervisor spawns to download
blobs concurrently. The default is 5.
| |
supervisor.blobstore.download.max_retries: The maximum number of times the supervisor retries a blob download.
By default it is set to 3.
| |
supervisor.localizer.cache.target.size.mb: The distributed cache target size in MB. This is a soft limit on the size
of the distributed cache contents. By default it is set to 10240 MB.
| |
supervisor.localizer.cleanup.interval.ms: The distributed cache cleanup interval. Controls how often the supervisor scans to
attempt to clean up anything over the cache target size. By default it is set to 300000 milliseconds.
| |
| supervisor.localizer.update.blob.interval.secs: The distributed cache interval for checking for blobs to update. By |
| default it is set to 30 seconds. |
| |
nimbus.blobstore.class: Sets the blobstore implementation nimbus uses. By default it is set to "org.apache.storm.blobstore.LocalFsBlobStore".
| |
| nimbus.blobstore.expiration.secs: During operations with the blobstore, via master, how long a connection is idle before nimbus |
| considers it dead and drops the session and any associated connections. The default is set to 600. |
| |
| storm.blobstore.inputstream.buffer.size.bytes: The buffer size it uses for blobstore upload. It is set to 65536 bytes. |
| |
| client.blobstore.class: The blobstore implementation the storm client uses. The current implementation uses the default |
| config "org.apache.storm.blobstore.NimbusBlobStore". |
| |
blobstore.replication.factor: Sets the replication for each blob within the blobstore. The “topology.min.replication.count”
ensures the minimum replication of topology-specific blobs before the topology is launched. You might want to set
“topology.min.replication.count <= blobstore.replication.factor”. The default is set to 3.
| |
| topology.min.replication.count : Minimum number of nimbus hosts where the code must be replicated before leader nimbus |
| can mark the topology as active and create assignments. Default is 1. |
| |
topology.max.replication.wait.time.sec: Maximum wait time for nimbus host replication to achieve topology.min.replication.count.
Once this time has elapsed, nimbus will go ahead and perform topology activation tasks even if the required topology.min.replication.count has not been achieved.
The default is 60 seconds; a value of -1 indicates to wait forever.

nimbus.code.sync.freq.secs: Frequency at which the background thread on nimbus syncs code for locally missing blobs. Default is 2 minutes.
| </code></pre></div> |
<p>Additionally, if you want to access a secure HDFS blobstore, you also need to set the following configs.<br>
| <code> |
| storm.hdfs.login.keytab or blobstore.hdfs.keytab (deprecated) |
| storm.hdfs.login.principal or blobstore.hdfs.principal (deprecated) |
| </code></p> |
| |
| <p>For example, |
| <code> |
| storm.hdfs.login.keytab: /etc/keytab |
| storm.hdfs.login.principal: primary/instance@REALM |
| </code></p> |
| |
| <h2 id="using-the-distributed-cache-api-command-line-interface-cli">Using the Distributed Cache API, Command Line Interface (CLI)</h2> |
| |
| <h3 id="creating-blobs">Creating blobs</h3> |
| |
| <p>To use the distributed cache feature, the user first has to "introduce" files |
| that need to be cached and bind them to key strings. To achieve this, the user |
| uses the "blobstore create" command of the storm executable, as follows:</p> |
| <div class="highlight"><pre><code class="language-" data-lang="">storm blobstore create [-f|--file FILE] [-a|--acl ACL1,ACL2,...] [--replication-factor NUMBER] [keyname] |
| </code></pre></div> |
<p>The contents come from a FILE if provided via the -f or --file option, otherwise
from STDIN.<br>
The ACL, which can also be a comma-separated list of ACLs, has the
following format:</p>
<div class="highlight"><pre><code class="language-" data-lang="">[u|o]:[username]:[r-|w-|a-|_]
| </code></pre></div> |
| <p>where: </p> |
| |
| <ul> |
| <li>u = user<br></li> |
| <li>o = other<br></li> |
| <li>username = user for this particular ACL<br></li> |
| <li>r = read access<br></li> |
| <li>w = write access<br></li> |
| <li>a = admin access<br></li> |
| <li>_ = ignored<br></li> |
| </ul> |
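As a sketch, a parser for one ACL token of this format might look like the following. The record and method names are hypothetical; this is only an illustration of the [u|o]:[username]:[perms] layout described above, not Storm's parsing code.

```java
// Sketch: parsing a single ACL token like "u:alice:rwa" or "o::r".
// Names are hypothetical; illustration of the format only.
public class AclParser {
    record Acl(String type, String username, boolean read, boolean write, boolean admin) {}

    static Acl parse(String token) {
        String[] parts = token.split(":", -1); // keep the empty username in "o::rwa"
        if (parts.length != 3) {
            throw new IllegalArgumentException("bad ACL: " + token);
        }
        String perms = parts[2];
        return new Acl(parts[0], parts[1],
                perms.indexOf('r') >= 0,   // read access
                perms.indexOf('w') >= 0,   // write access
                perms.indexOf('a') >= 0);  // admin access
    }

    public static void main(String[] args) {
        Acl world = parse("o::rwa");
        System.out.println(world.type() + " " + world.read() + " " + world.admin()); // o true true
        Acl bob = parse("u:bob:rw");
        System.out.println(bob.username() + " " + bob.admin()); // bob false
    }
}
```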
| |
| <p>The replication factor can be set to a value greater than 1 using --replication-factor.</p> |
| |
| <p>Note: The replication right now is configurable for a hdfs blobstore but for a |
| local blobstore the replication always stays at 1. For a hdfs blobstore |
| the default replication is set to 3.</p> |
| |
| <h6 id="example">Example:</h6> |
| <div class="highlight"><pre><code class="language-" data-lang="">storm blobstore create --file README.txt --acl o::rwa --replication-factor 4 key1 |
| </code></pre></div> |
<p>In the above example, the <em>README.txt</em> file is added to the distributed cache.
It can be accessed using the key string "<em>key1</em>" by any topology that needs
it. The file is given read/write/admin access for others (i.e., world access),
and the replication is set to 4.</p>
| |
| <h6 id="example">Example:</h6> |
| <div class="highlight"><pre><code class="language-" data-lang="">storm blobstore create mytopo:data.tgz -f data.tgz -a u:alice:rwa,u:bob:rw,o::r |
| </code></pre></div> |
<p>The above example creates a mytopo:data.tgz key using the data stored in
data.tgz. User alice gets full access, bob gets read/write access,
and everyone else gets read access.</p>
| |
| <h3 id="making-dist-cache-files-accessible-to-topologies">Making dist. cache files accessible to topologies</h3> |
| |
<p>Once a blob is created, we can use it in topologies. This is generally achieved
by including the key string in the topology's configuration, with the
following format. A shortcut is to add the configuration item on the command
line when starting a topology by using the <strong>-c</strong> option:</p>
| <div class="highlight"><pre><code class="language-" data-lang="">-c topology.blobstore.map='{"[KEY]":{"localname":"[VALUE]", "uncompress":[true|false]}}' |
| </code></pre></div> |
| <p>Note: Please take care of the quotes.</p> |
| |
<p>The cached file will then be accessible to the topology as a local file with the
name [VALUE].<br>
The localname parameter is optional; if omitted, the local cached file will have
the same name as [KEY].<br>
The uncompress parameter is optional; if omitted, the local cached file will not
be uncompressed. Note that the key string needs to have an appropriate
file-name-like format and extension, so it can be uncompressed correctly.</p>
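The defaulting rules just described (localname falls back to the key, uncompress defaults to false) can be sketched as small helper functions. The helper names are hypothetical and for illustration only.

```java
// Sketch: resolving the effective local name and uncompress flag for a
// topology.blobstore.map entry, per the defaults described above.
// Helper names are hypothetical.
public class BlobMapDefaults {
    static String effectiveLocalName(String key, String localname) {
        // localname omitted -> the local file takes the same name as the key
        return (localname == null || localname.isEmpty()) ? key : localname;
    }

    static boolean effectiveUncompress(Boolean uncompress) {
        // uncompress omitted -> the blob is left compressed
        return uncompress != null && uncompress;
    }

    public static void main(String[] args) {
        // "key1":{"localname":"blob_file", "uncompress":false}
        System.out.println(effectiveLocalName("key1", "blob_file")); // blob_file
        // "key2":{} -- both optional parameters omitted
        System.out.println(effectiveLocalName("key2", null)); // key2
        System.out.println(effectiveUncompress(null));        // false
    }
}
```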
| |
| <h6 id="example">Example:</h6> |
| <div class="highlight"><pre><code class="language-" data-lang="">storm jar /home/y/lib/storm-starter/current/storm-starter-jar-with-dependencies.jar org.apache.storm.starter.clj.word_count test_topo -c topology.blobstore.map='{"key1":{"localname":"blob_file", "uncompress":false},"key2":{}}' |
| </code></pre></div> |
<p>Note: take care to preserve the quotes exactly as shown.</p>
| |
| <p>In the above example, we start the <em>word_count</em> topology (stored in the |
| <em>storm-starter-jar-with-dependencies.jar</em> file), and ask it to have access |
| to the cached file stored with key string = <em>key1</em>. This file would then be |
| accessible to the topology as a local file called <em>blob_file</em>, and the |
| supervisor will not try to uncompress the file. Note that in our example, the |
| file's content originally came from <em>README.txt</em>. We also ask for the file |
| stored with the key string = <em>key2</em> to be accessible to the topology. Since |
both optional parameters are omitted, this file will get the local name
<em>key2</em> and will not be uncompressed.</p>
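<p>The same mapping can be built programmatically. The sketch below uses plain Java
maps and the literal <em>topology.blobstore.map</em> key; in a real topology this
map would go into the Config object passed to the topology submitter (an assumed
workflow, not shown in this document):</p>

```java
import java.util.HashMap;
import java.util.Map;

// Equivalent of:
// -c topology.blobstore.map='{"key1":{"localname":"blob_file","uncompress":false},"key2":{}}'
Map<String, Object> key1Entry = new HashMap<>();
key1Entry.put("localname", "blob_file"); // optional: local file name
key1Entry.put("uncompress", false);      // optional: defaults to false

Map<String, Map<String, Object>> blobstoreMap = new HashMap<>();
blobstoreMap.put("key1", key1Entry);
blobstoreMap.put("key2", new HashMap<>()); // all defaults: localname = "key2", no uncompress

Map<String, Object> conf = new HashMap<>();
conf.put("topology.blobstore.map", blobstoreMap);
```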
| |
| <h3 id="updating-a-cached-file">Updating a cached file</h3> |
| |
<p>It is possible for cached files to be updated while topologies are running.
The update follows an eventual-consistency model, in which the supervisors poll
Nimbus every 30 seconds and update their local copies. In the current version,
it is the user's responsibility to check whether a new file is available.</p>
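<p>One way to do that check is to track the blob's version through the Java client
API described later in this document. The sketch below assumes the
thrift-generated <em>get_version()</em> accessor on the ReadableBlobMeta returned
by <em>getBlobMeta</em> (see Appendix C for the version field and its guarantee):</p>

```java
// Sketch: detect that a blob changed by comparing versions. Assumes a
// clientBlobStore obtained via Utils.getClientBlobStore(...) and the
// thrift-generated get_version() accessor on ReadableBlobMeta.
String blobKey = "some_key";
long lastSeenVersion = clientBlobStore.getBlobMeta(blobKey).get_version();

// ... later, for example from a periodic timer ...
long currentVersion = clientBlobStore.getBlobMeta(blobKey).get_version();
if (currentVersion != lastSeenVersion) {
    // the blob's content changed; re-read it and refresh any derived state
    lastSeenVersion = currentVersion;
}
```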
| |
| <p>To update a cached file, use the following command. Contents come from a FILE or |
| STDIN. Write access is required to be able to update a cached file.</p> |
| <div class="highlight"><pre><code class="language-" data-lang="">storm blobstore update [-f|--file NEW_FILE] [KEYSTRING] |
| </code></pre></div> |
| <h6 id="example">Example:</h6> |
| <div class="highlight"><pre><code class="language-" data-lang="">storm blobstore update -f updates.txt key1 |
| </code></pre></div> |
<p>In the above example, the topologies will be presented with the contents of the
file <em>updates.txt</em> instead of <em>README.txt</em> (from the previous example), even
though the topology still accesses it through a local file called
<em>blob_file</em>.</p>
| |
| <h3 id="removing-a-cached-file">Removing a cached file</h3> |
| |
| <p>To remove a file from the distributed cache, use the following command. Removing |
| a file requires write access.</p> |
| <div class="highlight"><pre><code class="language-" data-lang="">storm blobstore delete [KEYSTRING] |
| </code></pre></div> |
| <h3 id="listing-blobs-currently-in-the-distributed-cache-blobstore">Listing Blobs currently in the distributed cache blobstore</h3> |
| <div class="highlight"><pre><code class="language-" data-lang="">storm blobstore list [KEY...] |
| </code></pre></div> |
<p>Lists blobs currently in the blobstore. If one or more KEYs are given, only those blobs are listed.</p>
| |
| <h3 id="reading-the-contents-of-a-blob">Reading the contents of a blob</h3> |
| <div class="highlight"><pre><code class="language-" data-lang="">storm blobstore cat [-f|--file FILE] KEY |
| </code></pre></div> |
<p>Reads a blob and writes it either to a file or to STDOUT. Reading a blob
requires read access.</p>
| |
| <h3 id="setting-the-access-control-for-a-blob">Setting the access control for a blob</h3> |
<div class="highlight"><pre><code class="language-" data-lang="">storm blobstore set-acl [-s ACL] KEY
| </code></pre></div> |
<p>The ACL takes the form [uo]:[username]:[r-][w-][a-] and may be given as a
comma-separated list of entries. Setting the ACL requires admin access.</p>
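<p>For example, to give alice full access and everyone else read access on the key
from the earlier example:</p>

```shell
storm blobstore set-acl -s u:alice:rwa,o::r mytopo:data.tgz
```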
| |
| <h3 id="update-the-replication-factor-for-a-blob">Update the replication factor for a blob</h3> |
| <div class="highlight"><pre><code class="language-" data-lang="">storm blobstore replication --update --replication-factor 5 key1 |
| </code></pre></div> |
| <h3 id="read-the-replication-factor-of-a-blob">Read the replication factor of a blob</h3> |
| <div class="highlight"><pre><code class="language-" data-lang="">storm blobstore replication --read key1 |
| </code></pre></div> |
| <h3 id="command-line-help">Command line help</h3> |
| <div class="highlight"><pre><code class="language-" data-lang="">storm help blobstore |
| </code></pre></div> |
| <h2 id="using-the-distributed-cache-api-from-java">Using the Distributed Cache API from Java</h2> |
| |
| <p>We start by getting a ClientBlobStore object by calling this function:</p> |
| <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">Config</span> <span class="n">theconf</span> <span class="o">=</span> <span class="k">new</span> <span class="n">Config</span><span class="o">();</span> |
| <span class="n">theconf</span><span class="o">.</span><span class="na">putAll</span><span class="o">(</span><span class="n">Utils</span><span class="o">.</span><span class="na">readStormConfig</span><span class="o">());</span> |
| <span class="n">ClientBlobStore</span> <span class="n">clientBlobStore</span> <span class="o">=</span> <span class="n">Utils</span><span class="o">.</span><span class="na">getClientBlobStore</span><span class="o">(</span><span class="n">theconf</span><span class="o">);</span> |
| </code></pre></div> |
<p>The required Utils class can be imported with:</p>
| <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kn">import</span> <span class="nn">org.apache.storm.utils.Utils</span><span class="o">;</span> |
| </code></pre></div> |
| <p>ClientBlobStore and other blob-related classes can be imported by:</p> |
| <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kn">import</span> <span class="nn">org.apache.storm.blobstore.ClientBlobStore</span><span class="o">;</span> |
| <span class="kn">import</span> <span class="nn">org.apache.storm.blobstore.AtomicOutputStream</span><span class="o">;</span> |
| <span class="kn">import</span> <span class="nn">org.apache.storm.blobstore.InputStreamWithMeta</span><span class="o">;</span> |
| <span class="kn">import</span> <span class="nn">org.apache.storm.blobstore.BlobStoreAclHandler</span><span class="o">;</span> |
| <span class="kn">import</span> <span class="nn">org.apache.storm.generated.*</span><span class="o">;</span> |
| </code></pre></div> |
| <h3 id="creating-acls-to-be-used-for-blobs">Creating ACLs to be used for blobs</h3> |
| <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">String</span> <span class="n">stringBlobACL</span> <span class="o">=</span> <span class="s">"u:username:rwa"</span><span class="o">;</span> |
| <span class="n">AccessControl</span> <span class="n">blobACL</span> <span class="o">=</span> <span class="n">BlobStoreAclHandler</span><span class="o">.</span><span class="na">parseAccessControl</span><span class="o">(</span><span class="n">stringBlobACL</span><span class="o">);</span> |
| <span class="n">List</span><span class="o"><</span><span class="n">AccessControl</span><span class="o">></span> <span class="n">acls</span> <span class="o">=</span> <span class="k">new</span> <span class="n">LinkedList</span><span class="o"><</span><span class="n">AccessControl</span><span class="o">>();</span> |
| <span class="n">acls</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="n">blobACL</span><span class="o">);</span> <span class="c1">// more ACLs can be added here</span> |
| <span class="n">SettableBlobMeta</span> <span class="n">settableBlobMeta</span> <span class="o">=</span> <span class="k">new</span> <span class="n">SettableBlobMeta</span><span class="o">(</span><span class="n">acls</span><span class="o">);</span> |
| <span class="n">settableBlobMeta</span><span class="o">.</span><span class="na">set_replication_factor</span><span class="o">(</span><span class="mi">4</span><span class="o">);</span> <span class="c1">// Here we can set the replication factor</span> |
| </code></pre></div> |
| <p>The settableBlobMeta object is what we need to create a blob in the next step. </p> |
| |
| <h3 id="creating-a-blob">Creating a blob</h3> |
| <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">AtomicOutputStream</span> <span class="n">blobStream</span> <span class="o">=</span> <span class="n">clientBlobStore</span><span class="o">.</span><span class="na">createBlob</span><span class="o">(</span><span class="s">"some_key"</span><span class="o">,</span> <span class="n">settableBlobMeta</span><span class="o">);</span> |
| <span class="n">blobStream</span><span class="o">.</span><span class="na">write</span><span class="o">(</span><span class="s">"Some String or input data"</span><span class="o">.</span><span class="na">getBytes</span><span class="o">());</span> |
| <span class="n">blobStream</span><span class="o">.</span><span class="na">close</span><span class="o">();</span> |
| </code></pre></div> |
<p>Note that the settableBlobMeta object here comes from the previous step, creating ACLs.
For very large files, it is recommended to write the bytes in smaller chunks (for example 64 KB, and up to 1 MB, per chunk).</p>
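<p>The chunked write keeps memory use bounded. The copy loop below is
self-contained: a ByteArrayOutputStream stands in for the AtomicOutputStream
returned by <em>createBlob</em>, and in real use the AtomicOutputStream would be
closed to finish the write, as in the creation example above:</p>

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;

// Copy the source into the blob stream in 64 KB chunks instead of
// materializing the whole file in memory at once. A ByteArrayOutputStream
// stands in for the AtomicOutputStream so the sketch runs on its own.
byte[] source = new byte[300_000]; // pretend this is a large file
for (int i = 0; i < source.length; i++) {
    source[i] = (byte) i;
}
ByteArrayInputStream in = new ByteArrayInputStream(source);
ByteArrayOutputStream out = new ByteArrayOutputStream();

byte[] buffer = new byte[64 * 1024]; // 64 KB per chunk; up to 1 MB also works
int read;
while ((read = in.read(buffer, 0, buffer.length)) != -1) {
    out.write(buffer, 0, read);
}
// with the real AtomicOutputStream, closing the stream completes the blob
```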
| |
| <h3 id="updating-a-blob">Updating a blob</h3> |
| |
| <p>Similar to creating a blob, but we get the AtomicOutputStream in a different way:</p> |
| <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">String</span> <span class="n">blobKey</span> <span class="o">=</span> <span class="s">"some_key"</span><span class="o">;</span> |
| <span class="n">AtomicOutputStream</span> <span class="n">blobStream</span> <span class="o">=</span> <span class="n">clientBlobStore</span><span class="o">.</span><span class="na">updateBlob</span><span class="o">(</span><span class="n">blobKey</span><span class="o">);</span> |
| </code></pre></div> |
| <p>Pass a byte stream to the returned AtomicOutputStream as before. </p> |
| |
| <h3 id="updating-the-acls-of-a-blob">Updating the ACLs of a blob</h3> |
| <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">String</span> <span class="n">blobKey</span> <span class="o">=</span> <span class="s">"some_key"</span><span class="o">;</span> |
| <span class="n">AccessControl</span> <span class="n">updateAcl</span> <span class="o">=</span> <span class="n">BlobStoreAclHandler</span><span class="o">.</span><span class="na">parseAccessControl</span><span class="o">(</span><span class="s">"u:USER:--a"</span><span class="o">);</span> |
| <span class="n">List</span><span class="o"><</span><span class="n">AccessControl</span><span class="o">></span> <span class="n">updateAcls</span> <span class="o">=</span> <span class="k">new</span> <span class="n">LinkedList</span><span class="o"><</span><span class="n">AccessControl</span><span class="o">>();</span> |
| <span class="n">updateAcls</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="n">updateAcl</span><span class="o">);</span> |
| <span class="n">SettableBlobMeta</span> <span class="n">modifiedSettableBlobMeta</span> <span class="o">=</span> <span class="k">new</span> <span class="n">SettableBlobMeta</span><span class="o">(</span><span class="n">updateAcls</span><span class="o">);</span> |
| <span class="n">clientBlobStore</span><span class="o">.</span><span class="na">setBlobMeta</span><span class="o">(</span><span class="n">blobKey</span><span class="o">,</span> <span class="n">modifiedSettableBlobMeta</span><span class="o">);</span> |
| |
| <span class="c1">//Now set write only</span> |
| <span class="n">updateAcl</span> <span class="o">=</span> <span class="n">BlobStoreAclHandler</span><span class="o">.</span><span class="na">parseAccessControl</span><span class="o">(</span><span class="s">"u:USER:-w-"</span><span class="o">);</span> |
| <span class="n">updateAcls</span> <span class="o">=</span> <span class="k">new</span> <span class="n">LinkedList</span><span class="o"><</span><span class="n">AccessControl</span><span class="o">>();</span> |
| <span class="n">updateAcls</span><span class="o">.</span><span class="na">add</span><span class="o">(</span><span class="n">updateAcl</span><span class="o">);</span> |
| <span class="n">modifiedSettableBlobMeta</span> <span class="o">=</span> <span class="k">new</span> <span class="n">SettableBlobMeta</span><span class="o">(</span><span class="n">updateAcls</span><span class="o">);</span> |
| <span class="n">clientBlobStore</span><span class="o">.</span><span class="na">setBlobMeta</span><span class="o">(</span><span class="n">blobKey</span><span class="o">,</span> <span class="n">modifiedSettableBlobMeta</span><span class="o">);</span> |
| </code></pre></div> |
| <h3 id="updating-and-reading-the-replication-of-a-blob">Updating and Reading the replication of a blob</h3> |
| <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">String</span> <span class="n">blobKey</span> <span class="o">=</span> <span class="s">"some_key"</span><span class="o">;</span> |
| <span class="n">BlobReplication</span> <span class="n">replication</span> <span class="o">=</span> <span class="n">clientBlobStore</span><span class="o">.</span><span class="na">updateBlobReplication</span><span class="o">(</span><span class="n">blobKey</span><span class="o">,</span> <span class="mi">5</span><span class="o">);</span> |
| <span class="kt">int</span> <span class="n">replication_factor</span> <span class="o">=</span> <span class="n">replication</span><span class="o">.</span><span class="na">get_replication</span><span class="o">();</span> |
| </code></pre></div> |
<p>Note: the replication factor is updated and reflected only for the HDFS blobstore.</p>
| |
| <h3 id="reading-a-blob">Reading a blob</h3> |
| <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">String</span> <span class="n">blobKey</span> <span class="o">=</span> <span class="s">"some_key"</span><span class="o">;</span> |
| <span class="n">InputStreamWithMeta</span> <span class="n">blobInputStream</span> <span class="o">=</span> <span class="n">clientBlobStore</span><span class="o">.</span><span class="na">getBlob</span><span class="o">(</span><span class="n">blobKey</span><span class="o">);</span> |
| <span class="n">BufferedReader</span> <span class="n">r</span> <span class="o">=</span> <span class="k">new</span> <span class="n">BufferedReader</span><span class="o">(</span><span class="k">new</span> <span class="n">InputStreamReader</span><span class="o">(</span><span class="n">blobInputStream</span><span class="o">));</span> |
| <span class="n">String</span> <span class="n">blobContents</span> <span class="o">=</span> <span class="n">r</span><span class="o">.</span><span class="na">readLine</span><span class="o">();</span> |
| </code></pre></div> |
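<p>The snippet above reads only the first line of the blob. To read the entire
contents, drain the stream: InputStreamWithMeta is a standard InputStream, so
any copy loop works. The sketch below is self-contained, with a
ByteArrayInputStream standing in for the stream returned by <em>getBlob</em>:</p>

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Drain the stream into a byte buffer, then decode. In real use,
// blobInputStream would be the InputStreamWithMeta from getBlob(blobKey).
ByteArrayInputStream blobInputStream =
    new ByteArrayInputStream("line one\nline two\n".getBytes(StandardCharsets.UTF_8));
ByteArrayOutputStream collected = new ByteArrayOutputStream();
byte[] buffer = new byte[8192];
int n;
while ((n = blobInputStream.read(buffer, 0, buffer.length)) != -1) {
    collected.write(buffer, 0, n);
}
String blobContents = new String(collected.toByteArray(), StandardCharsets.UTF_8);
```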
| <h3 id="deleting-a-blob">Deleting a blob</h3> |
| <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">String</span> <span class="n">blobKey</span> <span class="o">=</span> <span class="s">"some_key"</span><span class="o">;</span> |
| <span class="n">clientBlobStore</span><span class="o">.</span><span class="na">deleteBlob</span><span class="o">(</span><span class="n">blobKey</span><span class="o">);</span> |
| </code></pre></div> |
| <h3 id="getting-a-list-of-blob-keys-already-in-the-blobstore">Getting a list of blob keys already in the blobstore</h3> |
| <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="n">Iterator</span> <span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="n">stringIterator</span> <span class="o">=</span> <span class="n">clientBlobStore</span><span class="o">.</span><span class="na">listKeys</span><span class="o">();</span> |
| </code></pre></div> |
| <h2 id="appendix-a">Appendix A</h2> |
| <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">public</span> <span class="kd">abstract</span> <span class="kt">void</span> <span class="nf">prepare</span><span class="o">(</span><span class="n">Map</span> <span class="n">conf</span><span class="o">,</span> <span class="n">String</span> <span class="n">baseDir</span><span class="o">);</span> |
| |
| <span class="kd">public</span> <span class="kd">abstract</span> <span class="n">AtomicOutputStream</span> <span class="nf">createBlob</span><span class="o">(</span><span class="n">String</span> <span class="n">key</span><span class="o">,</span> <span class="n">SettableBlobMeta</span> <span class="n">meta</span><span class="o">,</span> <span class="n">Subject</span> <span class="n">who</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">AuthorizationException</span><span class="o">,</span> <span class="n">KeyAlreadyExistsException</span><span class="o">;</span> |
| |
| <span class="kd">public</span> <span class="kd">abstract</span> <span class="n">AtomicOutputStream</span> <span class="nf">updateBlob</span><span class="o">(</span><span class="n">String</span> <span class="n">key</span><span class="o">,</span> <span class="n">Subject</span> <span class="n">who</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">AuthorizationException</span><span class="o">,</span> <span class="n">KeyNotFoundException</span><span class="o">;</span> |
| |
| <span class="kd">public</span> <span class="kd">abstract</span> <span class="n">ReadableBlobMeta</span> <span class="nf">getBlobMeta</span><span class="o">(</span><span class="n">String</span> <span class="n">key</span><span class="o">,</span> <span class="n">Subject</span> <span class="n">who</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">AuthorizationException</span><span class="o">,</span> <span class="n">KeyNotFoundException</span><span class="o">;</span> |
| |
| <span class="kd">public</span> <span class="kd">abstract</span> <span class="kt">void</span> <span class="nf">setBlobMeta</span><span class="o">(</span><span class="n">String</span> <span class="n">key</span><span class="o">,</span> <span class="n">SettableBlobMeta</span> <span class="n">meta</span><span class="o">,</span> <span class="n">Subject</span> <span class="n">who</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">AuthorizationException</span><span class="o">,</span> <span class="n">KeyNotFoundException</span><span class="o">;</span> |
| |
| <span class="kd">public</span> <span class="kd">abstract</span> <span class="kt">void</span> <span class="nf">deleteBlob</span><span class="o">(</span><span class="n">String</span> <span class="n">key</span><span class="o">,</span> <span class="n">Subject</span> <span class="n">who</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">AuthorizationException</span><span class="o">,</span> <span class="n">KeyNotFoundException</span><span class="o">;</span> |
| |
| <span class="kd">public</span> <span class="kd">abstract</span> <span class="n">InputStreamWithMeta</span> <span class="nf">getBlob</span><span class="o">(</span><span class="n">String</span> <span class="n">key</span><span class="o">,</span> <span class="n">Subject</span> <span class="n">who</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">AuthorizationException</span><span class="o">,</span> <span class="n">KeyNotFoundException</span><span class="o">;</span> |
| |
| <span class="kd">public</span> <span class="kd">abstract</span> <span class="n">Iterator</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="nf">listKeys</span><span class="o">(</span><span class="n">Subject</span> <span class="n">who</span><span class="o">);</span> |
| |
| <span class="kd">public</span> <span class="kd">abstract</span> <span class="n">BlobReplication</span> <span class="nf">getBlobReplication</span><span class="o">(</span><span class="n">String</span> <span class="n">key</span><span class="o">,</span> <span class="n">Subject</span> <span class="n">who</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">Exception</span><span class="o">;</span> |
| |
<span class="kd">public</span> <span class="kd">abstract</span> <span class="n">BlobReplication</span> <span class="nf">updateBlobReplication</span><span class="o">(</span><span class="n">String</span> <span class="n">key</span><span class="o">,</span> <span class="kt">int</span> <span class="n">replication</span><span class="o">,</span> <span class="n">Subject</span> <span class="n">who</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">AuthorizationException</span><span class="o">,</span> <span class="n">KeyNotFoundException</span><span class="o">,</span> <span class="n">IOException</span><span class="o">;</span>
| </code></pre></div> |
| <h2 id="appendix-b">Appendix B</h2> |
| <div class="highlight"><pre><code class="language-java" data-lang="java"><span class="kd">public</span> <span class="kd">abstract</span> <span class="kt">void</span> <span class="nf">prepare</span><span class="o">(</span><span class="n">Map</span> <span class="n">conf</span><span class="o">);</span> |
| |
| <span class="kd">protected</span> <span class="kd">abstract</span> <span class="n">AtomicOutputStream</span> <span class="nf">createBlobToExtend</span><span class="o">(</span><span class="n">String</span> <span class="n">key</span><span class="o">,</span> <span class="n">SettableBlobMeta</span> <span class="n">meta</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">AuthorizationException</span><span class="o">,</span> <span class="n">KeyAlreadyExistsException</span><span class="o">;</span> |
| |
| <span class="kd">public</span> <span class="kd">abstract</span> <span class="n">AtomicOutputStream</span> <span class="nf">updateBlob</span><span class="o">(</span><span class="n">String</span> <span class="n">key</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">AuthorizationException</span><span class="o">,</span> <span class="n">KeyNotFoundException</span><span class="o">;</span> |
| |
| <span class="kd">public</span> <span class="kd">abstract</span> <span class="n">ReadableBlobMeta</span> <span class="nf">getBlobMeta</span><span class="o">(</span><span class="n">String</span> <span class="n">key</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">AuthorizationException</span><span class="o">,</span> <span class="n">KeyNotFoundException</span><span class="o">;</span> |
| |
| <span class="kd">protected</span> <span class="kd">abstract</span> <span class="kt">void</span> <span class="nf">setBlobMetaToExtend</span><span class="o">(</span><span class="n">String</span> <span class="n">key</span><span class="o">,</span> <span class="n">SettableBlobMeta</span> <span class="n">meta</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">AuthorizationException</span><span class="o">,</span> <span class="n">KeyNotFoundException</span><span class="o">;</span> |
| |
| <span class="kd">public</span> <span class="kd">abstract</span> <span class="kt">void</span> <span class="nf">deleteBlob</span><span class="o">(</span><span class="n">String</span> <span class="n">key</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">AuthorizationException</span><span class="o">,</span> <span class="n">KeyNotFoundException</span><span class="o">;</span> |
| |
| <span class="kd">public</span> <span class="kd">abstract</span> <span class="n">InputStreamWithMeta</span> <span class="nf">getBlob</span><span class="o">(</span><span class="n">String</span> <span class="n">key</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">AuthorizationException</span><span class="o">,</span> <span class="n">KeyNotFoundException</span><span class="o">;</span> |
| |
| <span class="kd">public</span> <span class="kd">abstract</span> <span class="n">Iterator</span><span class="o"><</span><span class="n">String</span><span class="o">></span> <span class="nf">listKeys</span><span class="o">();</span> |
| |
| <span class="kd">public</span> <span class="kd">abstract</span> <span class="kt">void</span> <span class="nf">watchBlob</span><span class="o">(</span><span class="n">String</span> <span class="n">key</span><span class="o">,</span> <span class="n">IBlobWatcher</span> <span class="n">watcher</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">AuthorizationException</span><span class="o">;</span> |
| |
| <span class="kd">public</span> <span class="kd">abstract</span> <span class="kt">void</span> <span class="nf">stopWatchingBlob</span><span class="o">(</span><span class="n">String</span> <span class="n">key</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">AuthorizationException</span><span class="o">;</span> |
| |
| <span class="kd">public</span> <span class="kd">abstract</span> <span class="n">BlobReplication</span> <span class="nf">getBlobReplication</span><span class="o">(</span><span class="n">String</span> <span class="n">Key</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">AuthorizationException</span><span class="o">,</span> <span class="n">KeyNotFoundException</span><span class="o">;</span> |
| |
<span class="kd">public</span> <span class="kd">abstract</span> <span class="n">BlobReplication</span> <span class="nf">updateBlobReplication</span><span class="o">(</span><span class="n">String</span> <span class="n">Key</span><span class="o">,</span> <span class="kt">int</span> <span class="n">replication</span><span class="o">)</span> <span class="kd">throws</span> <span class="n">AuthorizationException</span><span class="o">,</span> <span class="n">KeyNotFoundException</span><span class="o">;</span>
| </code></pre></div> |
| <h2 id="appendix-c">Appendix C</h2> |
| <div class="highlight"><pre><code class="language-" data-lang="">service Nimbus { |
| ... |
| string beginCreateBlob(1: string key, 2: SettableBlobMeta meta) throws (1: AuthorizationException aze, 2: KeyAlreadyExistsException kae); |
| |
| string beginUpdateBlob(1: string key) throws (1: AuthorizationException aze, 2: KeyNotFoundException knf); |
| |
| void uploadBlobChunk(1: string session, 2: binary chunk) throws (1: AuthorizationException aze); |
| |
| void finishBlobUpload(1: string session) throws (1: AuthorizationException aze); |
| |
| void cancelBlobUpload(1: string session) throws (1: AuthorizationException aze); |
| |
| ReadableBlobMeta getBlobMeta(1: string key) throws (1: AuthorizationException aze, 2: KeyNotFoundException knf); |
| |
| void setBlobMeta(1: string key, 2: SettableBlobMeta meta) throws (1: AuthorizationException aze, 2: KeyNotFoundException knf); |
| |
| BeginDownloadResult beginBlobDownload(1: string key) throws (1: AuthorizationException aze, 2: KeyNotFoundException knf); |
| |
| binary downloadBlobChunk(1: string session) throws (1: AuthorizationException aze); |
| |
| void deleteBlob(1: string key) throws (1: AuthorizationException aze, 2: KeyNotFoundException knf); |
| |
| ListBlobsResult listBlobs(1: string session); |
| |
| BlobReplication getBlobReplication(1: string key) throws (1: AuthorizationException aze, 2: KeyNotFoundException knf); |
| |
| BlobReplication updateBlobReplication(1: string key, 2: i32 replication) throws (1: AuthorizationException aze, 2: KeyNotFoundException knf); |
| ... |
| } |
| |
| struct BlobReplication { |
| 1: required i32 replication; |
| } |
| |
| exception AuthorizationException { |
| 1: required string msg; |
| } |
| |
| exception KeyNotFoundException { |
| 1: required string msg; |
| } |
| |
| exception KeyAlreadyExistsException { |
| 1: required string msg; |
| } |
| |
| enum AccessControlType { |
| OTHER = 1, |
| USER = 2 |
| //eventually ,GROUP=3 |
| } |
| |
| struct AccessControl { |
| 1: required AccessControlType type; |
| 2: optional string name; //Name of user or group in ACL |
| 3: required i32 access; //bitmasks READ=0x1, WRITE=0x2, ADMIN=0x4 |
| } |
| |
| struct SettableBlobMeta { |
| 1: required list<AccessControl> acl; |
| 2: optional i32 replication_factor |
| } |
| |
| struct ReadableBlobMeta { |
| 1: required SettableBlobMeta settable; |
| //This is some indication of a version of a BLOB. The only guarantee is |
| // if the data changed in the blob the version will be different. |
| 2: required i64 version; |
| } |
| |
| struct ListBlobsResult { |
| 1: required list<string> keys; |
| 2: required string session; |
| } |
| |
| struct BeginDownloadResult { |
| //Same version as in ReadableBlobMeta |
| 1: required i64 version; |
| 2: required string session; |
| 3: optional i64 data_size; |
| } |
| </code></pre></div></div> |
| |
| |
| </div> |
| </div> |
| </div> |
| <footer> |
| <div class="container-fluid"> |
| <div class="row"> |
| <div class="col-md-3"> |
| <div class="footer-widget"> |
| <h5>Meetups</h5> |
| <ul class="latest-news"> |
| |
| <li><a href="http://www.meetup.com/Apache-Storm-Apache-Kafka/">Apache Storm & Apache Kafka</a> <span class="small">(Sunnyvale, CA)</span></li> |
| |
| <li><a href="http://www.meetup.com/Apache-Storm-Kafka-Users/">Apache Storm & Kafka Users</a> <span class="small">(Seattle, WA)</span></li> |
| |
| <li><a href="http://www.meetup.com/New-York-City-Storm-User-Group/">NYC Storm User Group</a> <span class="small">(New York, NY)</span></li> |
| |
| <li><a href="http://www.meetup.com/Bay-Area-Stream-Processing">Bay Area Stream Processing</a> <span class="small">(Emeryville, CA)</span></li> |
| |
| <li><a href="http://www.meetup.com/Boston-Storm-Users/">Boston Realtime Data</a> <span class="small">(Boston, MA)</span></li> |
| |
| <li><a href="http://www.meetup.com/storm-london">London Storm User Group</a> <span class="small">(London, UK)</span></li> |
| |
| </ul> |
| </div> |
| </div> |
| <div class="col-md-3"> |
| <div class="footer-widget"> |
| <h5>About Apache Storm</h5> |
| <p>Apache Storm integrates with any queueing system and any database system. Apache Storm's spout abstraction makes it easy to integrate a new queuing system. Likewise, integrating Apache Storm with database systems is easy.</p> |
| </div> |
| </div> |
| <div class="col-md-3"> |
| <div class="footer-widget"> |
| <h5>First Look</h5> |
| <ul class="footer-list"> |
| <li><a href="/releases/current/Rationale.html">Rationale</a></li> |
| <li><a href="/releases/current/Tutorial.html">Tutorial</a></li> |
| <li><a href="/releases/current/Setting-up-development-environment.html">Setting up development environment</a></li> |
| <li><a href="/releases/current/Creating-a-new-Storm-project.html">Creating a new Apache Storm project</a></li> |
| </ul> |
| </div> |
| </div> |
| <div class="col-md-3"> |
| <div class="footer-widget"> |
| <h5>Documentation</h5> |
| <ul class="footer-list"> |
| <li><a href="/releases/current/index.html">Index</a></li> |
| <li><a href="/releases/current/javadocs/index.html">Javadoc</a></li> |
| <li><a href="/releases/current/FAQ.html">FAQ</a></li> |
| </ul> |
| </div> |
| </div> |
| </div> |
| <hr/> |
| <div class="row"> |
| <div class="col-md-12"> |
| <p align="center">Copyright © 2019 <a href="http://www.apache.org">Apache Software Foundation</a>. All Rights Reserved. |
| <br>Apache Storm, Apache, the Apache feather logo, and the Apache Storm project logos are trademarks of The Apache Software Foundation. |
| <br>All other marks mentioned may be trademarks or registered trademarks of their respective owners.</p> |
| </div> |
| </div> |
| </div> |
| </footer> |
| <!--Footer End--> |
| <!-- Scroll to top --> |
| <span class="totop"><a href="#"><i class="fa fa-angle-up"></i></a></span> |
| |
| </body> |
| |
| </html> |
| |