blob: 7aabdc6e1d462bf661e30257cfeae09a580e2a63 [file] [log] [blame]
.. Licensed to the Apache Software Foundation (ASF) under one
or more contributor license agreements. See the NOTICE file
distributed with this work for additional information
regarding copyright ownership. The ASF licenses this file
to you under the Apache License, Version 2.0 (the
"License"); you may not use this file except in compliance
with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing,
software distributed under the License is distributed on an
"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
KIND, either express or implied. See the License for the
specific language governing permissions and limitations
under the License.
.. _arrays.indexing:
Indexing
========
.. sectionauthor:: adapted from "Guide to NumPy" by Travis E. Oliphant
.. currentmodule:: mxnet.np
.. index:: indexing, slicing
:class:`ndarrays <ndarray>` can be indexed using the standard Python
``x[obj]`` syntax, where *x* is the array and *obj* the selection.
There are three kinds of indexing available: basic
slicing, advanced indexing, and boolean mask indexing. Which one occurs depends on *obj*.
.. note::
In Python, ``x[(exp1, exp2, ..., expN)]`` is equivalent to
``x[exp1, exp2, ..., expN]``; the latter is just syntactic sugar
for the former.
Basic Slicing and Indexing
--------------------------
Basic slicing extends Python's basic concept of slicing to N
dimensions. Basic slicing occurs when *obj* is a :class:`slice` object
(constructed by ``start:stop:step`` notation inside of brackets), an
integer, or a tuple of slice objects and integers. :const:`Ellipsis`
and :const:`newaxis` objects can be interspersed with these as
well.
The simplest case of indexing with *N* integers returns an array
scalar representing the corresponding item. As in
Python, all indices are zero-based: for the *i*-th index :math:`n_i`,
the valid range is :math:`0 \le n_i < d_i` where :math:`d_i` is the
*i*-th element of the shape of the array. Negative indices are
interpreted as counting from the end of the array (*i.e.*, if
:math:`n_i < 0`, it means :math:`n_i + d_i`).
All arrays generated by basic slicing are always views
of the original array if the fetched elements are contiguous in memory.
The standard rules of sequence slicing apply to basic slicing on a
per-dimension basis (including using a step index). Some useful
concepts to remember include:
- The basic slice syntax is ``i:j:k`` where *i* is the starting index,
*j* is the stopping index, and *k* is the step (:math:`k\neq0`).
This selects the *m* elements (in the corresponding dimension) with
index values *i*, *i + k*, ..., *i + (m - 1) k* where
:math:`m = q + (r\neq0)` and *q* and *r* are the quotient and remainder
obtained by dividing *j - i* by *k*: *j - i = q k + r*, so that
*i + (m - 1) k < j*.
.. admonition:: Example
>>> x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> x[1:7:2]
array([1, 3, 5])
- Negative *i* and *j* are interpreted as *n + i* and *n + j* where
*n* is the number of elements in the corresponding dimension.
Negative *k* makes stepping go towards smaller indices.
.. admonition:: Example
>>> x[-2:10]
array([8, 9])
>>> x[-3:3:-1]
array([7, 6, 5, 4])
- Assume *n* is the number of elements in the dimension being
sliced. Then, if *i* is not given it defaults to 0 for *k > 0* and
*n - 1* for *k < 0* . If *j* is not given it defaults to *n* for *k > 0*
and *-n-1* for *k < 0* . If *k* is not given it defaults to 1. Note that
``::`` is the same as ``:`` and means select all indices along this
axis.
.. admonition:: Example
>>> x[5:]
array([5, 6, 7, 8, 9])
- If the number of objects in the selection tuple is less than
*N* , then ``:`` is assumed for any subsequent dimensions.
.. admonition:: Example
>>> x = np.array([[[1],[2],[3]], [[4],[5],[6]]])
>>> x.shape
(2, 3, 1)
>>> x[1:2]
array([[[4],
[5],
[6]]])
- :const:`Ellipsis` expands to the number of ``:`` objects needed for the
selection tuple to index all dimensions. In most cases, this means that
length of the expanded selection tuple is ``x.ndim``. There may only be a
single ellipsis present.
.. admonition:: Example
>>> x[...,0]
array([[1, 2, 3],
[4, 5, 6]])
- Each :const:`newaxis` object in the selection tuple serves to expand
the dimensions of the resulting selection by one unit-length
dimension. The added dimension is the position of the :const:`newaxis`
object in the selection tuple.
.. admonition:: Example
>>> x[:,np.newaxis,:,:].shape
(2, 1, 3, 1)
- An integer, *i*, returns the same values as ``i:i+1``
**except** the dimensionality of the returned object is reduced by
1. In particular, a selection tuple with the *p*-th
element an integer (and all other entries ``:``) returns the
corresponding sub-array with dimension *N - 1*. If *N = 1*
then the returned object is an scalar `ndarray` whose `ndim=0`.
- If the selection tuple has all entries ``:`` except the
*p*-th entry which is a slice object ``i:j:k``,
then the returned array has dimension *N* formed by
concatenating the sub-arrays returned by integer indexing of
elements *i*, *i+k*, ..., *i + (m - 1) k < j*,
- Basic slicing with more than one non-``:`` entry in the slicing
tuple, acts like repeated application of slicing using a single
non-``:`` entry, where the non-``:`` entries are successively taken
(with all other non-``:`` entries replaced by ``:``). Thus,
``x[ind1,...,ind2,:]`` acts like ``x[ind1][...,ind2,:]`` under basic
slicing.
.. warning:: The above is **not** true for advanced indexing.
- You may use slicing to set values in the array, but (unlike lists) you
can never grow the array. The size of the value to be set in
``x[obj] = value`` must be (broadcastable) to the same shape as
``x[obj]``.
.. note::
Remember that a slicing tuple can always be constructed as *obj*
and used in the ``x[obj]`` notation. Slice objects can be used in
the construction in place of the ``[start:stop:step]``
notation. For example, ``x[1:10:5,::-1]`` can also be implemented
as ``obj = (slice(1,10,5), slice(None,None,-1)); x[obj]`` . This
can be useful for constructing generic code that works on arrays
of arbitrary dimension.
.. data:: newaxis
:noindex:
The :const:`newaxis` object can be used in all slicing operations to
create an axis of length one. :const:`newaxis` is an alias for
'None', and 'None' can be used in place of this with the same result.
Advanced Indexing
-----------------
Advanced indexing is triggered when the selection object, *obj*, is a
non-tuple sequence object, an :class:`ndarray` (of data type integer or bool),
or a tuple with at least one sequence object or ndarray (of data type
integer or bool). There are two types of advanced indexing: integer
and Boolean.
Advanced indexing always returns a *copy* of the data (contrast with
some cases in basic slicing that returns a view).
.. warning::
The definition of advanced indexing means that ``x[(1,2,3),]`` is
fundamentally different than ``x[(1,2,3)]``. The latter is
equivalent to ``x[1,2,3]`` which will trigger basic selection while
the former will trigger advanced indexing. Be sure to understand
why this occurs.
Also recognize that ``x[[1,2,3]]`` will trigger advanced indexing,
whereas due to the deprecated Numeric compatibility mentioned above,
``x[[1,2,slice(None)]]`` will trigger basic slicing in the official NumPy
which is not currently supported in MXNet `numpy` module.
Integer array indexing
^^^^^^^^^^^^^^^^^^^^^^
Integer array indexing allows selection of arbitrary items in the array
based on their *N*-dimensional index. Each integer array represents a number
of indexes into that dimension.
Purely integer array indexing
"""""""""""""""""""""""""""""
When the index consists of as many integer arrays as the array being indexed
has dimensions, the indexing is straight forward, but different from slicing.
Advanced indexes always are broadcasting and
iterated as *one*::
result[i_1, ..., i_M] == x[ind_1[i_1, ..., i_M], ind_2[i_1, ..., i_M],
..., ind_N[i_1, ..., i_M]]
Note that the result shape is identical to the (broadcast) indexing array
shapes ``ind_1, ..., ind_N``.
.. admonition:: Example
From each row, a specific element should be selected. The row index is just
``[0, 1, 2]`` and the column index specifies the element to choose for the
corresponding row, here ``[0, 1, 0]``. Using both together the task
can be solved using advanced indexing:
>>> x = np.array([[1, 2], [3, 4], [5, 6]])
>>> x[[0, 1, 2], [0, 1, 0]]
array([1, 4, 5])
Combining advanced and basic indexing
"""""""""""""""""""""""""""""""""""""
When there is at least one slice (``:``), ellipsis (``...``) or :const:`newaxis`
in the index (or the array has more dimensions than there are advanced indexes),
then the behaviour can be more complicated. It is like concatenating the
indexing result for each advanced index element
In the simplest case, there is only a *single* advanced index. A single
advanced index can for example replace a slice and the result array will be
the same, however, it is a copy and may have a different memory layout.
A slice is preferable when it is possible.
.. admonition:: Example
>>> x[1:2, 1:3]
array([[4, 5]])
>>> x[1:2, [1, 2]]
array([[4, 5]])
The easiest way to understand the situation may be to think in
terms of the result shape. There are two parts to the indexing operation,
the subspace defined by the basic indexing (excluding integers) and the
subspace from the advanced indexing part. Two cases of index combination
need to be distinguished:
* The advanced indexes are separated by a slice, :const:`Ellipsis` or :const:`newaxis`.
For example ``x[arr1, :, arr2]``.
* The advanced indexes are all next to each other.
For example ``x[..., arr1, arr2, :]`` but *not* ``x[arr1, :, 1]``
since ``1`` is an advanced index in this regard.
In the first case, the dimensions resulting from the advanced indexing
operation come first in the result array, and the subspace dimensions after
that.
In the second case, the dimensions from the advanced indexing operations
are inserted into the result array at the same spot as they were in the
initial array (the latter logic is what makes simple advanced indexing
behave just like slicing).
.. admonition:: Example
Suppose ``x.shape`` is (10,20,30) and ``ind`` is a (2,3,4)-shaped
indexing :class:`intp` array, then ``result = x[...,ind,:]`` has
shape (10,2,3,4,30) because the (20,)-shaped subspace has been
replaced with a (2,3,4)-shaped broadcasted indexing subspace. If
we let *i, j, k* loop over the (2,3,4)-shaped subspace then
``result[...,i,j,k,:] = x[...,ind[i,j,k],:]``. This example
produces the same result as :meth:`x.take(ind, axis=-2) <ndarray.take>`.
.. admonition:: Example
Let ``x.shape`` be (10,20,30,40,50) and suppose ``ind_1``
and ``ind_2`` can be broadcast to the shape (2,3,4). Then
``x[:,ind_1,ind_2]`` has shape (10,2,3,4,40,50) because the
(20,30)-shaped subspace from X has been replaced with the
(2,3,4) subspace from the indices. However,
``x[:,ind_1,:,ind_2]`` has shape (2,3,4,10,30,50) because there
is no unambiguous place to drop in the indexing subspace, thus
it is tacked-on to the beginning. It is always possible to use
:meth:`.transpose() <ndarray.transpose>` to move the subspace
anywhere desired. Note that this example cannot be replicated
using :func:`take`.
Boolean array indexing
^^^^^^^^^^^^^^^^^^^^^^
This advanced indexing occurs when obj is an array object of Boolean
type, such as may be returned from comparison operators. A single
boolean index array is practically identical to ``x[obj.nonzero()]`` where,
as described above, :meth:`obj.nonzero() <ndarray.nonzero>` returns a
tuple (of length :attr:`obj.ndim <ndarray.ndim>`) of integer index
arrays showing the :const:`True` elements of *obj*. However, it is
faster when ``obj.shape == x.shape``.
If ``obj.ndim == x.ndim``, ``x[obj]`` returns a 1-dimensional array
filled with the elements of *x* corresponding to the :const:`True`
values of *obj*. The search order will be row-major,
C-style. If *obj* has :const:`True` values at entries that are outside
of the bounds of *x*, then an index error will be raised. If *obj* is
smaller than *x* it is identical to filling it with :const:`False`.
.. note::
Boolean indexing currently only supports a single boolean ndarray as a index.
An composite index including a boolean array is not supported for now.
If there is only one Boolean array and no integer indexing array present,
this is straight forward. Care must only be taken to make sure that the
boolean index has *exactly* as many dimensions as it is supposed to work
with.
.. admonition:: Example
From an array, select all rows which sum up to less or equal two:
>>> x = np.array([[0, 1], [1, 1], [2, 2]], dtype=np.int32)
>>> rowsum = x.sum(-1)
>>> x[rowsum <= 2]
array([[0, 1],
[1, 1]], dtype=int32)
But if ``rowsum`` would have two dimensions as well:
>>> rowsum = x.sum(-1, keepdims=True)
>>> rowsum.shape
(3, 1)
>>> x[rowsum <= 2] # fail
IndexError: boolean index did not match indexed array along dimension 1
Detailed notes
--------------
These are some detailed notes, which are not of importance for day to day
indexing (in no particular order):
* For advanced assignments, there is in general no guarantee for the
iteration order. This means that if an element is set more than once,
it is not possible to predict the final result.
* An empty (tuple) index is a full scalar index into a zero dimensional array.
``x[()]`` returns a *scalar* `ndarray` if ``x`` has zero dimensions.
On the other hand ``x[...]`` always returns a view.
* If a zero dimensional array is present in the index *and* it is *not considered as* a full
integer index as in NumPy. Advanced indexing is not triggered.
* the ``nonzero`` equivalence for Boolean arrays does not hold for zero
dimensional boolean arrays.
* When the result of an advanced indexing operation has no elements but an
individual index is out of bounds, currently no ``IndexError`` is
raised as in NumPy.
.. index::
single: indexing
single: ndarray