| /* |
| * Licensed to the Apache Software Foundation (ASF) under one |
| * or more contributor license agreements. See the NOTICE file |
| * distributed with this work for additional information |
| * regarding copyright ownership. The ASF licenses this file |
| * to you under the Apache License, Version 2.0 (the |
| * "License"); you may not use this file except in compliance |
| * with the License. You may obtain a copy of the License at |
| * |
| * http://www.apache.org/licenses/LICENSE-2.0 |
| * |
| * Unless required by applicable law or agreed to in writing, software |
| * distributed under the License is distributed on an "AS IS" BASIS, |
| * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
| * See the License for the specific language governing permissions and |
| * limitations under the License. |
| */ |
| package org.apache.drill.exec.vector.accessor.writer; |
| |
| import org.apache.drill.exec.vector.BaseDataValueVector; |
| import org.apache.drill.exec.vector.UInt4Vector; |
| import org.apache.drill.exec.vector.accessor.InvalidConversionError; |
| import org.apache.drill.exec.vector.accessor.ValueType; |
| import org.apache.drill.exec.vector.accessor.impl.HierarchicalFormatter; |
| |
| /** |
| * Specialized column writer for the (hidden) offset vector used |
| * with variable-length or repeated vectors. See comments in the |
| * <tt>ColumnAccessors.java</tt> template file for more details. |
| * <p> |
| * Note that the <tt>lastWriteIndex</tt> tracked here corresponds |
| * to the data values; it is one less than the actual offset vector |
| * last write index due to the nature of offset vector layouts. Choosing |
| * this basis for the last write index makes roll-over processing easier, |
| * as only this writer needs to know about the +1 translation required |
| * for writing. |
| * <p> |
| * The states illustrated in the base class apply here as well, |
| * remembering that the end offset for a row (or array position) |
| * is written one ahead of the vector index. |
| * <p> |
| * The vector index does create an interesting dynamic for the child |
| * writers. From the child writer's perspective, the states described in |
| * the super class are the only states of interest. Here we want to |
| * take the perspective of the parent. |
| * <p> |
| * The offset vector is an implementation of a repeat level. A repeat |
| * level can occur for a single array, or for a collection of columns |
| * within a repeated map. (A repeat level also occurs for variable-width |
| * fields, but this is a bit harder to see, so let's ignore that for |
| * now.) |
| * <p> |
| * The key point to realize is that each repeat level introduces an |
| * isolation level in terms of indexing. That is, empty values in the |
| * outer level have no effect on indexing in the inner level. In fact, |
| * the nature of a repeated outer level means that there are no empties |
| * in the inner level. |
| * <p> |
| * To illustrate:<pre><code> |
| *    Offset Vector          Data Vector   Indexes |
| *  lw, v > | 10 |  - - - - - >  | X |          10 |
| *          | 12 |  - - +        | X | < lw'    11 |
| *          |    |      + - - >  |   | < v'     12 |
| * </code></pre> |
| * In the above, the client has just written an array of two elements |
| * at the current write position. The data starts at offset 10 in |
| * the data vector, and the next write will be at 12. The end offset |
| * is written one ahead of the vector index. |
| * <p> |
| * From the data vector's perspective, its last-write (lw') reflects |
| * the last element written. If this is an array of scalars, then the |
| * write index is automatically incremented, as illustrated by v'. |
| * (For map arrays, the index must be incremented by calling |
| * <tt>save()</tt> on the map array writer.) |
| * <p> |
| * Suppose the client now skips some arrays:<pre><code> |
| *    Offset Vector          Data Vector |
| *     lw > | 10 |  - - - - - >  | X |          10 |
| *          | 12 |  - - +        | X | < lw'    11 |
| *          |    |      + - - >  |   | < v'     12 |
| *          |    |               |   |          13 |
| *      v > |    |               |   |          14 |
| * </code></pre> |
| * The last write position does not move and there are gaps in the |
| * offset vector. The vector index points to the current row. Note |
| * that the data vector last write and vector indexes do not change; |
| * this reflects the fact that the data vector's vector index |
| * (v') matches the tail offset. |
| * <p> |
| * The client now writes a three-element array:<pre><code> |
| *    Offset Vector          Data Vector |
| *          | 10 |  - - - - - >  | X |          10 |
| *          | 12 |  - - +        | X |          11 |
| *          | 12 |  - - + - - >  | Y |          12 |
| *          | 12 |  - - +        | Y |          13 |
| *  lw, v > | 12 |  - - +        | Y | < lw'    14 |
| *          | 15 |  - - - - - >  |   | < v'     15 |
| * </code></pre> |
| * Quite a bit just happened. The empty offset slots were back-filled |
| * with the last write offset in the data vector. The client wrote |
| * three values, which advanced the last write and vector indexes |
| * in the data vector. And, the last write index in the offset |
| * vector also moved to reflect the update of the offset vector. |
| * Note that as a result, multiple positions in the offset vector |
| * point to the same location in the data vector. This is fine; we |
| * compute the number of entries as the difference between two successive |
| * offset vector positions, so the empty positions have become 0-length |
| * arrays. |
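| * <p> |
| * As a minimal, standalone sketch of that computation (the class and |
| * method names below are hypothetical and not part of Drill), the array |
| * lengths implied by the six offsets in the example above could be |
| * recovered like this: |

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of the Drill code base.
class OffsetLengthsSketch {

  // Entry i spans [offsets[i], offsets[i + 1]), so its length is the
  // difference between two successive offset vector positions.
  static int[] lengths(int[] offsets) {
    int[] result = new int[offsets.length - 1];
    for (int i = 0; i < result.length; i++) {
      result[i] = offsets[i + 1] - offsets[i];
    }
    return result;
  }

  public static void main(String[] args) {
    // The slice of the offset vector shown in the third diagram.
    int[] offsets = {10, 12, 12, 12, 12, 15};
    System.out.println(Arrays.toString(lengths(offsets)));
  }
}
```

| * The back-filled positions appear as three zero-length arrays between |
| * the two-element array and the three-element array. |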
| * <p> |
| * Note that, for an array of scalars, when overflow occurs, we need |
| * only worry about two states in the data vector. Either data has been |
| * written for the row (as in the third example above), and so must be |
| * moved to the roll-over vector, or no data has been written and no move |
| * is needed. We never have to worry about missing values because they |
| * cannot occur in the data vector. |
| * <p> |
| * See {@link ObjectArrayWriter} for information about arrays of |
| * maps (arrays of multiple columns.) |
| * |
| * <h4>Empty Slots</h4> |
| * |
| * The offset vector writer handles empty slots in two distinct ways. |
| * First, the writer handles its own empties. Suppose that this is the offset |
| * vector for a VarChar column. Suppose we write "Foo" in the first slot. Now |
| * we have an offset vector with the values <tt>[ 0 3 ]</tt>. Suppose the client |
| * skips several rows and next writes at slot 5. We must copy the latest |
| * offset (3) into all the skipped slots: <tt>[ 0 3 3 3 3 3 ]</tt>. The result |
| * is a set of four empty VarChars in positions 1, 2, 3 and 4. (Here, remember |
| * that the offset vector always has one more value than the number of rows.) |
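| * <p> |
| * The back-fill just described can be modeled with a simplified, |
| * standalone sketch. It assumes a plain <tt>int[]</tt> in place of the |
| * real buffer and a row index passed by the caller in place of the vector |
| * index; the class and its members are hypothetical and only mirror the |
| * idea, not the actual implementation: |

```java
import java.util.Arrays;

// Hypothetical, simplified model of offset back-fill; not the Drill API.
class OffsetFillSketch {
  private final int[] offsets;      // offsets[0] is always 0
  private int lastWriteIndex = -1;  // last row whose end offset was written
  private int nextOffset = 0;       // end offset of the last written value

  OffsetFillSketch(int rowCapacity) {
    // One more offset than rows; Java zero-fills, so offsets[0] == 0.
    offsets = new int[rowCapacity + 1];
  }

  // Record the end offset for the value at rowIndex, first copying the
  // latest offset into every skipped row so those rows read as empty.
  void setNextOffset(int rowIndex, int newOffset) {
    for (int row = lastWriteIndex + 1; row < rowIndex; row++) {
      offsets[row + 1] = nextOffset;
    }
    offsets[rowIndex + 1] = newOffset;  // written one ahead of the row index
    lastWriteIndex = rowIndex;
    nextOffset = newOffset;
  }

  // Offsets written so far: one per row written or filled, plus the leading 0.
  int[] written() {
    return Arrays.copyOf(offsets, lastWriteIndex + 2);
  }

  public static void main(String[] args) {
    OffsetFillSketch writer = new OffsetFillSketch(8);
    writer.setNextOffset(0, 3);  // "Foo" in slot 0
    writer.setNextOffset(5, 7);  // skip slots 1-4, write 4 bytes in slot 5
    System.out.println(Arrays.toString(writer.written()));
  }
}
```

| * In this sketch the offsets after the second write are |
| * <tt>[ 0 3 3 3 3 3 7 ]</tt>: the four skipped slots repeat the offset 3, |
| * and the new value's end offset follows them. |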
| * <p> |
| * The second way to fill empties is in the data vector. The data vector may choose |
| * to fill the four "empty" slots with a value, say "X". In this case, it is up to |
| * the data vector to fill in the values, calling into this vector to set each |
| * offset. Note that when doing this, the calls are a bit different than for writing |
| * a regular value because we want to write at the "last write position", not the |
| * current row position. See {@link BaseVarWidthWriter} for an example. |
| */ |
| |
| public class OffsetVectorWriterImpl extends AbstractFixedWidthWriter implements OffsetVectorWriter { |
| |
| private static final int VALUE_WIDTH = UInt4Vector.VALUE_WIDTH; |
| |
| private final UInt4Vector vector; |
| |
| /** |
| * Offset of the first value for the current row. Used during |
| * overflow or if the row is restarted. |
| */ |
| |
| private int rowStartOffset; |
| |
| /** |
| * Cached value of the end offset for the current value. Used |
| * primarily for variable-width columns to allow the column to be |
| * rewritten multiple times within the same row. The start offset |
| * value is updated with the end offset only when the value is |
| * committed in <tt>endValue()</tt>. |
| */ |
| |
| protected int nextOffset; |
| |
| public OffsetVectorWriterImpl(UInt4Vector vector) { |
| this.vector = vector; |
| } |
| |
| @Override public BaseDataValueVector vector() { return vector; } |
| @Override public int width() { return VALUE_WIDTH; } |
| |
| @Override |
| protected void realloc(int size) { |
| vector.reallocRaw(size); |
| setBuffer(); |
| } |
| |
| @Override |
| public ValueType valueType() { return ValueType.INTEGER; } |
| |
| @Override |
| public void startWrite() { |
| super.startWrite(); |
| nextOffset = 0; |
| rowStartOffset = 0; |
| |
| // Special handling for first value. Alloc vector if needed. |
| // Offset vectors require a 0 at position 0. The (end) offset |
| // for row 0 starts at position 1, which is handled in |
| // writeOffset() below. |
| |
| if (capacity * VALUE_WIDTH < MIN_BUFFER_SIZE) { |
| realloc(MIN_BUFFER_SIZE); |
| } |
| drillBuf.setInt(0, 0); |
| } |
| |
| @Override |
| public int nextOffset() { return nextOffset; } |
| |
| @Override |
| public int rowStartOffset() { return rowStartOffset; } |
| |
| @Override |
| public void startRow() { rowStartOffset = nextOffset; } |
| |
| /** |
| * Prepare to write the end offset for the current value: back-fill any |
| * skipped slots, then return the write position, which is one greater |
| * than the index reported by the vector index. |
| * |
| * @return the offset vector position at which to write the end offset |
| * of the current data value |
| */ |
| |
| protected final int prepareWrite() { |
| |
| // This is performance critical code; every operation counts. |
| // Please be thoughtful when changing the code. |
| |
| int valueIndex = prepareFill(); |
| int fillCount = valueIndex - lastWriteIndex - 1; |
| if (fillCount > 0) { |
| fillEmpties(fillCount); |
| } |
| |
| // Track the last write location for zero-fill use next time around. |
| |
| lastWriteIndex = valueIndex; |
| return valueIndex + 1; |
| } |
| |
| public final int prepareFill() { |
| int valueIndex = vectorIndex.vectorIndex(); |
| if (valueIndex + 1 < capacity) { |
| return valueIndex; |
| } |
| resize(valueIndex + 1); |
| |
| // Call to resize may cause rollover, so get new write index afterwards. |
| |
| return vectorIndex.vectorIndex(); |
| } |
| |
| @Override |
| protected final void fillEmpties(int fillCount) { |
| for (int i = 0; i < fillCount; i++) { |
| fillOffset(nextOffset); |
| } |
| } |
| |
| @Override |
| public final void setNextOffset(int newOffset) { |
| int writeIndex = prepareWrite(); |
| drillBuf.setInt(writeIndex * VALUE_WIDTH, newOffset); |
| nextOffset = newOffset; |
| } |
| |
| public final void reviseOffset(int newOffset) { |
| int writeIndex = vectorIndex.vectorIndex() + 1; |
| drillBuf.setInt(writeIndex * VALUE_WIDTH, newOffset); |
| nextOffset = newOffset; |
| } |
| |
| public final void fillOffset(int newOffset) { |
| drillBuf.setInt((++lastWriteIndex + 1) * VALUE_WIDTH, newOffset); |
| nextOffset = newOffset; |
| } |
| |
| @Override |
| public final void setValue(Object value) { |
| throw new InvalidConversionError( |
| "setValue() not supported for the offset vector writer: " + value); |
| } |
| |
| @Override |
| public void skipNulls() { |
| |
| // Nothing to do. Fill empties logic will fill in missing offsets. |
| } |
| |
| @Override |
| public void restartRow() { |
| nextOffset = rowStartOffset; |
| super.restartRow(); |
| } |
| |
| @Override |
| public void preRollover() { |
| |
| // Rollover is occurring. This means the current row is not complete. |
| // We want to keep rows 0..(row index - 1), which gives us (row index) |
| // rows. Because this is an offset vector, it holds one more entry than |
| // there are rows, to account for the extra 0 value at the start. |
| // setValueCount() adds that extra entry, so we pass it the row count, |
| // which is the current row start index. |
| // (Offset vector indexes are confusing.) |
| |
| setValueCount(vectorIndex.rowStartIndex()); |
| } |
| |
| @Override |
| public void postRollover() { |
| int newNext = nextOffset - rowStartOffset; |
| super.postRollover(); |
| nextOffset = newNext; |
| } |
| |
| @Override |
| public void setValueCount(int valueCount) { |
| |
| // Value count is in row positions, not index |
| // positions. (There is one more index position |
| // than there are row positions.) |
| |
| int offsetCount = valueCount + 1; |
| mandatoryResize(offsetCount); |
| fillEmpties(valueCount - lastWriteIndex - 1); |
| vector().getBuffer().writerIndex(offsetCount * VALUE_WIDTH); |
| } |
| |
| @Override |
| public void dump(HierarchicalFormatter format) { |
| format.extend(); |
| super.dump(format); |
| format |
| .attribute("lastWriteIndex", lastWriteIndex) |
| .attribute("nextOffset", nextOffset) |
| .endObject(); |
| } |
| |
| @Override |
| public void setDefaultValue(Object value) { |
| throw new UnsupportedOperationException("Encoding not supported for offset vectors"); |
| } |
| } |