| /* |
| * Licensed to the Apache Software Foundation (ASF) under one |
| * or more contributor license agreements. See the NOTICE file |
| * distributed with this work for additional information |
| * regarding copyright ownership. The ASF licenses this file |
| * to you under the Apache License, Version 2.0 (the |
| * "License"); you may not use this file except in compliance |
| * with the License. You may obtain a copy of the License at |
| * |
| * http://www.apache.org/licenses/LICENSE-2.0 |
| * |
| * Unless required by applicable law or agreed to in writing, software |
| * distributed under the License is distributed on an "AS IS" BASIS, |
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| * See the License for the specific language governing permissions and |
| * limitations under the License. |
| */ |
| package org.apache.beam.sdk.io; |
| |
| /** |
| * Standard shard naming templates. |
| * |
| * <p>Shard naming templates are strings that may contain placeholders for the shard number and |
| * shard count. When constructing a filename for a particular shard number, the upper-case letters |
| * 'S' and 'N' are replaced with the 0-padded shard number and shard count respectively. |
| * |
| * <p>Left-padding of the numbers enables lexicographical sorting of the resulting filenames. If the |
| * shard number or count is too large for the space provided in the template, then the result may |
| * no longer sort lexicographically. For example, a shard template of "S-of-N", for 200 shards, will |
| * result in outputs named "0-of-200", ... "10-of-200", "100-of-200", etc. |
| * |
| * <p>Shard numbers start with 0, so the last shard number is the shard count minus one. For |
| * example, the template "-SSSSS-of-NNNNN" will be instantiated as "-00000-of-01000" for the first |
| * shard (shard 0) of a 1000-way sharded output. |
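| * |
| * <p>As an illustrative sketch of the rules above (assuming each run of 'S' or 'N' is zero-padded |
| * to the length of that run), shard 7 of a 42-way sharded output would expand as follows: |
| * |
| * <pre>{@code |
| * "-SSS-of-NNN"  ->  "-007-of-042" |
| * "S-of-N"       ->  "7-of-42" |
| * "/part-SSSSS"  ->  "/part-00007" |
| * }</pre> |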
| * |
| * <p>A shard name template is typically provided along with a name prefix and suffix, which allows |
| * constructing complex paths that have embedded shard information. For example, outputs in the form |
| * "gs://bucket/path-01-of-99.txt" could be constructed by providing the individual components: |
| * |
| * <pre>{@code |
| * pipeline.apply( |
| * TextIO.write().to("gs://bucket/path") |
| * .withShardNameTemplate("-SS-of-NN") |
| * .withSuffix(".txt")) |
| * }</pre> |
| * |
| * <p>Splitting the name into components, as in the example above, lets a pipeline expose only some |
| * parts of the output name (for example, the prefix) as user-configurable, without requiring users |
| * to specify every component. |
| * |
| * <p>If a shard name template does not contain any 'S' placeholder, then the output shard count |
| * must be 1, as otherwise the same filename would be generated for multiple shards. |
| */ |
| public class ShardNameTemplate { |
| /** |
| * Shard name containing the shard index and the shard count (max). |
| * |
| * <p>E.g.: [prefix]-00000-of-00100[suffix] and [prefix]-00001-of-00100[suffix] |
| */ |
| public static final String INDEX_OF_MAX = "-SSSSS-of-NNNNN"; |
| |
| /** |
| * Shard is a file within a directory. |
| * |
| * <p>E.g.: [prefix]/part-00000[suffix] and [prefix]/part-00001[suffix] |
| */ |
| public static final String DIRECTORY_CONTAINER = "/part-SSSSS"; |
| } |