| /* |
| * Licensed to the Apache Software Foundation (ASF) under one |
| * or more contributor license agreements. See the NOTICE file |
| * distributed with this work for additional information |
| * regarding copyright ownership. The ASF licenses this file |
| * to you under the Apache License, Version 2.0 (the |
| * "License"); you may not use this file except in compliance |
| * with the License. You may obtain a copy of the License at |
| * |
| * http://www.apache.org/licenses/LICENSE-2.0 |
| * |
| * Unless required by applicable law or agreed to in writing, software |
| * distributed under the License is distributed on an "AS IS" BASIS, |
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| * See the License for the specific language governing permissions and |
| * limitations under the License. |
| */ |
| package org.apache.drill.exec.store; |
| |
| import java.util.List; |
| |
| /** |
| * Exposes partition information to UDFs to allow queries to limit reading |
| * partitions dynamically. |
| * |
| * In a Drill query, a specific partition can be read by filtering on a |
| * directory column. For example, if data is partitioned by year and month |
| * using directory names, a particular year/month can be read with the |
| * following query: |
| * |
| * <pre> |
| * select * from dfs.my_workspace.data_directory where dir0 = '2014_01'; |
| * </pre> |
| * |
| * This assumes that data_directory contains sub-directories named by year |
| * and month number (for example, 2014_01), with the data stored below them. |
| * |
| * This works in cases where the partition column is known, but the current |
| * implementation does not allow the partition information itself to be queried. |
| * An example of such behavior would be a query that should always return the |
| * latest month of data, without having to be updated periodically. |
| * While it is possible to write a query like the one below, it is very |
| * expensive, because it is currently materialized as a full table scan, |
| * followed by an aggregation on the dir0 partition column and finally a filter. |
| * |
| * <pre> |
| * select * from dfs.my_workspace.data_directory where dir0 in |
| * (select MAX(dir0) from dfs.my_workspace.data_directory); |
| * </pre> |
| * |
| * This interface allows the definition of a UDF to perform the sub-query |
| * on the list of partitions. This UDF can be used at planning time to |
| * prune out all of the unnecessary reads of the previous example. |
| * |
| * <pre> |
| * select * from dfs.my_workspace.data_directory |
| * where dir0 = maxdir('dfs.my_workspace', 'data_directory'); |
| * </pre> |
| * |
| * Look at {@link org.apache.drill.exec.expr.fn.impl.DirectoryExplorers} |
| * for examples of UDFs that use this interface to query against |
| * partition information. |
| */ |
| public interface PartitionExplorer { |
| /** |
| * Returns the sub-partitions of a table in the given schema, below the |
| * partition identified by the given partition columns and values. |
| * Individual storage plugins assign specific meaning to the parameters |
| * and return values. |
| * |
| * An empty list should be returned if the partition has no sub-partitions. |
| * |
| * Note that this makes empty partitions and leaf partitions |
| * indistinguishable. The interface should be modified if the distinction |
| * is meaningful. |
| * |
| * Example: for a filesystem plugin, the partition information can simply |
| * be a path from the root of the given workspace to the desired directory. The |
| * return value should be a list of full paths (again from the root of the |
| * workspace), which can be passed back into this interface to explore |
| * partitions further down. An empty list is returned if the partition |
| * provided is a file or an empty directory. |
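| * |
| * The exploration described above might look like the following sketch. It |
| * assumes a caller-supplied {@code explorer} instance, the workspace and |
| * table names from the class-level examples, and a plugin that treats empty |
| * column and value lists as the root of the table. All of these are |
| * illustrative assumptions, since each storage plugin defines the meaning |
| * of the parameters. |
| * |
| * <pre>{@code |
| * // Assumption: empty lists address the root of the table. |
| * List<String> noColumns = new ArrayList<>(); |
| * List<String> noValues = new ArrayList<>(); |
| * Iterable<String> partitions = explorer.getSubPartitions( |
| * "dfs.my_workspace", "data_directory", noColumns, noValues); |
| * for (String partition : partitions) { |
| * // For a filesystem plugin, each entry is a path from the workspace root. |
| * System.out.println(partition); |
| * } |
| * }</pre> |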
| * |
| * Note to future developers: keep this documentation in sync with |
| * {@link SchemaPartitionExplorer}. |
| * |
| * @param schema schema path, can be complete or relative to the default schema |
| * @param table name of the table whose partitions are explored |
| * @param partitionColumns list of partition columns to match |
| * @param partitionValues list of values for each partition column |
| * (corresponding to the partition column list) |
| * @return list of sub-partitions; empty if there is no further level of |
| * sub-partitioning below, i.e. the given partition is a leaf partition |
| * @throws PartitionNotFoundException when the partition does not exist in |
| * the given workspace |
| */ |
| Iterable<String> getSubPartitions(String schema, |
| String table, |
| List<String> partitionColumns, |
| List<String> partitionValues) |
| throws PartitionNotFoundException; |
| } |