AITC Wiki

0207 Fancy Indexing

Fancy Indexing

In the previous sections, we saw how to access and modify portions of arrays using simple indices (e.g., arr[0]), slices (e.g., arr[:5]), and Boolean masks (e.g., arr[arr > 0]). In this section, we’ll look at another style of array indexing, known as fancy indexing. Fancy indexing is like the simple indexing we’ve already seen, but we pass arrays of indices in place of single scalars. This allows us to very quickly access and modify complicated subsets of an array’s values.

Exploring Fancy Indexing

Fancy indexing is conceptually simple: it means passing an array of indices to access multiple array elements at once. For example, consider the following array:

import numpy as np
rand = np.random.RandomState(42)
 
x = rand.randint(100, size=10)
print(x)

Suppose we want to access three different elements. We could do it like this:

[x[3], x[7], x[2]]

Alternatively, we can pass a single list or array of indices to obtain the same result:

ind = [3, 7, 2]
x[ind]
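As a quick check that a Python list of indices, an equivalent integer array, and element-by-element scalar indexing all select the same values, here is a minimal sketch (the seed 42 matches the example above; the names by_hand, from_list and from_array are just illustrative):

```python
import numpy as np

rand = np.random.RandomState(42)
x = rand.randint(100, size=10)

ind = [3, 7, 2]

# Picking the elements one at a time with scalar indices
by_hand = np.array([x[3], x[7], x[2]])

# Fancy indexing with a Python list and with a NumPy integer array
from_list = x[ind]
from_array = x[np.array(ind)]

print(by_hand, from_list, from_array)
```

All three arrays hold the same three values in the same order.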

When using fancy indexing, the shape of the result reflects the shape of the index arrays rather than the shape of the array being indexed:

ind = np.array([[3, 7],
                [4, 5]])
x[ind]
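To make the shape rule concrete, the sketch below uses a predictable array (x = np.arange(10) * 10, an assumption chosen so the values are easy to read) and shows that indexing with a (2, 2) index array yields a (2, 2) result. It also shows one way to think about it: index with the flattened indices, then reshape to the index array's shape.

```python
import numpy as np

x = np.arange(10) * 10          # [0, 10, 20, ..., 90]
ind = np.array([[3, 7],
                [4, 5]])

result = x[ind]
# The result takes the (2, 2) shape of the index array, not the (10,) shape of x
print(result.shape)  # (2, 2)
print(result)
# [[30 70]
#  [40 50]]

# Equivalent view: select with the flattened indices, then restore the index shape
same = x[ind.ravel()].reshape(ind.shape)
```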

Fancy indexing also works in multiple dimensions. Consider the following array:

X = np.arange(12).reshape((3, 4))
X

As with standard indexing, the first index refers to the row, and the second to the column:

row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
X[row, col]

Notice that the first value in the result is X[0, 2], the second is X[1, 1], and the third is X[2, 3]. The pairing of indices in fancy indexing follows all the broadcasting rules that were mentioned in Computation on Arrays: Broadcasting. So, for example, if we combine a column vector and a row vector within the indices, we get a two-dimensional result:

X[row[:, np.newaxis], col]
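The sketch below spells out what the pairing means, reusing the same X, row and col as above. It compares X[row, col] with an explicit loop over index pairs, and shows that the column-vector/row-vector form gives the same (3, 3) result as NumPy's np.ix_ helper, which builds this kind of open mesh of index arrays. np.broadcast_arrays is used only to display the broadcast index shapes.

```python
import numpy as np

X = np.arange(12).reshape((3, 4))
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])

# Paired indices: element k of the result is X[row[k], col[k]]
paired = X[row, col]
explicit = np.array([X[r, c] for r, c in zip(row, col)])

# A (3, 1) column vector with a (3,) row vector broadcasts to a (3, 3) grid
grid = X[row[:, np.newaxis], col]

# np.ix_ reshapes row to (3, 1) and col to (1, 3), giving the same grid
grid_ix = X[np.ix_(row, col)]

# Broadcasting the index arrays explicitly shows every (row, col) pair used
rows_b, cols_b = np.broadcast_arrays(row[:, np.newaxis], col)
print(rows_b.shape, cols_b.shape)  # (3, 3) (3, 3)
```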

Combined Indexing

For even more powerful operations, fancy indexing can be combined with the other indexing schemes we’ve seen:

print(X)

We can combine fancy and simple indices:

X[2, [2, 0, 1]]

We can also combine fancy indexing with slicing:

X[1:, [2, 0, 1]]
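One detail worth seeing side by side is how the shape differs between these two combinations: a scalar index removes its axis, while a slice keeps it. A minimal sketch using the same 3×4 X (the names a, b and cols are illustrative):

```python
import numpy as np

X = np.arange(12).reshape((3, 4))
cols = [2, 0, 1]

# A scalar row index removes the row axis, so the result is 1-D
a = X[2, cols]
# A slice keeps the row axis, so the result is 2-D (rows 1 and 2, columns reordered)
b = X[1:, cols]
print(a.shape, b.shape)  # (3,) (2, 3)

# The same values, obtained in two separate indexing steps
a_two_step = X[2][cols]
b_two_step = X[1:][:, cols]
```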

And we can combine fancy indexing with masking:

mask = np.array([1, 0, 1, 0], dtype=bool)
X[row[:, np.newaxis], mask]
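A Boolean mask used inside a combined index behaves like the integer positions where it is True, which np.nonzero returns. The sketch below checks that replacing the mask with those positions gives the same result (same X, row and mask as above):

```python
import numpy as np

X = np.arange(12).reshape((3, 4))
row = np.array([0, 1, 2])
mask = np.array([1, 0, 1, 0], dtype=bool)

# The True positions of the mask: array([0, 2])
mask_cols = np.nonzero(mask)[0]

via_mask = X[row[:, np.newaxis], mask]
via_ints = X[row[:, np.newaxis], mask_cols]

print(via_mask)
# [[ 0  2]
#  [ 4  6]
#  [ 8 10]]
```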

All of these indexing options combined lead to a very flexible set of operations for accessing and modifying array values.

Example: Selecting Random Points

One common use of fancy indexing is the selection of subsets of rows from a matrix. For example, we might have an N by D matrix representing N points in D dimensions, such as the following points drawn from a two-dimensional normal distribution:

mean = [0, 0]
cov = [[1, 2],
 [2, 5]]
X = rand.multivariate_normal(mean, cov, 100)
X.shape

Using the plotting tools we will discuss in Introduction to Matplotlib, we can visualize these points as a scatter plot:

%matplotlib inline
import matplotlib.pyplot as plt
import seaborn; seaborn.set() # for plot styling
 
plt.scatter(X[:, 0], X[:, 1]);

Let’s use fancy indexing to select 20 random points. We’ll do this by first choosing 20 random indices with no repeats, and then using these indices to select a portion of the original array:

indices = np.random.choice(X.shape[0], 20, replace=False)
print(indices)
selection = X[indices] # fancy indexing here
selection.shape

Now, to see which points were selected, let’s over-plot large circles at the locations of the selected points:

plt.scatter(X[:, 0], X[:, 1], alpha=0.3)
# An explicit edge color keeps the hollow circles visible
plt.scatter(selection[:, 0], selection[:, 1],
            facecolor='none', edgecolor='black', s=200);

This sort of strategy is often used to quickly partition datasets, as is often needed in train/test splitting for validation of statistical models (see Hyperparameters and Model Validation), and in sampling approaches to answering statistical questions.
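As an illustration of the train/test idea, here is a minimal sketch of a random split built only from a permutation and fancy indexing. The helper split_rows is hypothetical (it is not a NumPy or scikit-learn function), and the toy data, the 30% test fraction and the seed are assumptions made for the example.

```python
import numpy as np

def split_rows(data, test_fraction, rng):
    """Hypothetical helper: split the rows of `data` into train and test
    parts using a random permutation and fancy indexing."""
    n = data.shape[0]
    perm = rng.permutation(n)              # each row index exactly once, shuffled
    n_test = int(round(n * test_fraction))
    test_idx = perm[:n_test]
    train_idx = perm[n_test:]
    # Fancy indexing pulls out the chosen rows (as copies) in permuted order
    return data[train_idx], data[test_idx], train_idx, test_idx

rng = np.random.RandomState(0)
data = np.arange(20).reshape(10, 2)        # 10 rows, 2 columns
train, test, train_idx, test_idx = split_rows(data, 0.3, rng)
print(train.shape, test.shape)             # (7, 2) (3, 2)
```

Because the indices come from a permutation, the two parts never share a row and together cover every row.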

Modifying Values with Fancy Indexing

Just as fancy indexing can be used to access parts of an array, it can also be used to modify parts of an array. For example, imagine we have an array of indices and we’d like to set the corresponding items in an array to some value:

x = np.arange(10)
i = np.array([2, 1, 8, 4])
x[i] = 99
print(x)

We can use any assignment-type operator for this. For example:

x[i] -= 10
print(x)
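The assigned value does not have to be a single scalar: an array with one value per index also works. The sketch below assigns a different value to each position and also shows that reading x[i] produces a new array, so only an assignment of the form x[i] = ... writes back into x (the values 100 to 400 are arbitrary):

```python
import numpy as np

x = np.arange(10)
i = np.array([2, 1, 8, 4])

# One value per index: x[2]=100, x[1]=200, x[8]=300, x[4]=400
x[i] = [100, 200, 300, 400]

# Reading x[i] returns a copy, so modifying `picked` leaves x unchanged
picked = x[i]
picked[:] = -1
print(x)  # [  0 200 100   3 400   5   6   7 300   9]
```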

Notice, though, that repeated indices with these operations can cause some potentially unexpected results. Consider the following:

x = np.zeros(10)
x[[0, 0]] = [4, 6]
print(x)

Where did the 4 go? This operation first assigns x[0] = 4, followed by x[0] = 6. The result, of course, is that x[0] contains the value 6.

Fair enough, but consider this operation:

i = [2, 3, 3, 4, 4, 4]
x[i] += 1
x

You might expect that x[3] would contain the value 2 and x[4] would contain the value 3, as this is how many times each index is repeated. Why is this not the case? Conceptually, this is because x[i] += 1 is meant as shorthand for x[i] = x[i] + 1. The expression x[i] + 1 is evaluated first, and then the result is assigned to the indices in x. With this in mind, it is not the augmentation that happens multiple times, but the assignment, which leads to the rather nonintuitive results.
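Writing the two steps out separately makes the explanation visible. This sketch starts from a fresh array of zeros (rather than the x left over above) and checks that the explicit gather-then-assign version matches the augmented form:

```python
import numpy as np

x = np.zeros(10)
i = [2, 3, 3, 4, 4, 4]

# x[i] += 1 behaves like these two steps:
temp = x[i] + 1     # computed once from the old values: six 1.0 entries
x[i] = temp         # repeated indices are simply written again with 1.0
print(x)            # [0. 0. 1. 1. 1. 0. 0. 0. 0. 0.]

y = np.zeros(10)
y[i] += 1           # the augmented form gives the same result
```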

So what if you want the other behavior, where the operation is repeated? For this, you can use the at() method of ufuncs (available since NumPy 1.8) and do the following:

x = np.zeros(10)
np.add.at(x, i, 1)
print(x)
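For the special case of adding 1 at each index, np.add.at produces the same counts as np.bincount, which can be a handy cross-check. The sketch also shows that np.add.at accepts an array of values, one per index, which are accumulated at repeated positions (the weights below are arbitrary):

```python
import numpy as np

i = [2, 3, 3, 4, 4, 4]

x = np.zeros(10)
np.add.at(x, i, 1)                      # each occurrence of an index adds 1

counts = np.bincount(i, minlength=10)   # occurrences of each integer 0..9
print(counts)  # [0 0 1 2 3 0 0 0 0 0]

# One value per index; contributions at repeated indices are summed
w = np.zeros(10)
np.add.at(w, i, [10, 1, 2, 100, 200, 300])
print(w)
```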

The at() method does an in-place application of the given operator at the specified indices (here, i) with the specified value (here, 1). Another method that is similar in spirit is the reduceat() method of ufuncs, which you can read about in the NumPy documentation.
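As a brief taste of reduceat(), the sketch below sums and takes the maximum over contiguous segments of an array; each entry of starts marks where a segment begins, and the last segment runs to the end. The array and boundaries are made up for illustration:

```python
import numpy as np

a = np.arange(8)          # [0 1 2 3 4 5 6 7]
starts = [0, 3, 5]        # segments: a[0:3], a[3:5], a[5:]

sums = np.add.reduceat(a, starts)
maxes = np.maximum.reduceat(a, starts)
print(sums)               # [ 3  7 18]
print(maxes)              # [2 4 7]
```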