The Basics of NumPy Arrays
Data manipulation in Python is nearly synonymous with NumPy array manipulation: even newer tools like Pandas (Chapter 3) are built around the NumPy array. This section will present several examples of using NumPy array manipulation to access data and subarrays, and to split, reshape, and join the arrays. While the types of operations shown here may seem a bit dry and pedantic, they comprise the building blocks of many other examples used throughout the book. Get to know them well!
We’ll cover a few categories of basic array manipulations here:
- Reshaping of arrays: Changing the shape of a given array
- Joining and splitting of arrays: Combining multiple arrays into one, and splitting one array into many
import numpy as np
x = np.arange(10)
x
x[:5]  # first five elements
x[5:]  # elements after index 5
x[4:7]  # middle sub-array
x[::2]  # every other element
x[1::2]  # every other element, starting at index 1
A potentially confusing case is when the step value is negative.
In this case, the defaults for start and stop are swapped.
This becomes a convenient way to reverse an array:
x[::-1]  # all elements, reversed
x[5::-2]  # reversed every other from index 5
Multi-dimensional subarrays
Multi-dimensional slices work in the same way, with multiple slices separated by commas. For example:
x2
x2[:2, :3]  # two rows, three columns
x2[:3, ::2]  # all rows, every other column
Finally, subarray dimensions can even be reversed together:
x2[::-1, ::-1]
Accessing array rows and columns
One commonly needed routine is accessing single rows or columns of an array.
This can be done by combining indexing and slicing, using an empty slice marked by a single colon (:):
print(x2[:, 0])  # first column of x2
print(x2[0, :])  # first row of x2
In the case of row access, the empty slice can be omitted for a more compact syntax:
print(x2[0])  # equivalent to x2[0, :]
Subarrays as no-copy views
One important, and extremely useful, thing to know about array slices is that they return views rather than copies of the array data. This is one area in which NumPy array slicing differs from Python list slicing: in lists, slices will be copies. Consider our two-dimensional array from before:
print(x2)
Let’s extract a subarray from this:
x2_sub = x2[:2, :2]
print(x2_sub)
Now if we modify this subarray, we’ll see that the original array is changed! Observe:
x2_sub[0, 0] = 99
print(x2_sub)
print(x2)
This default behavior is actually quite useful: it means that when we work with large datasets, we can access and process pieces of these datasets without the need to copy the underlying data buffer.
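One way to confirm that a slice is a view is to ask NumPy whether two arrays share memory. The following is a minimal sketch rather than part of the original example: the array `data` and the other variable names are made up for illustration, and the check relies on `np.shares_memory`.

```python
import numpy as np

# Hypothetical 3x4 array used only for this illustration
data = np.arange(12).reshape((3, 4))

# Slicing a NumPy array gives a view onto the same data buffer
sub = data[:2, :2]
shares = bool(np.shares_memory(data, sub))

# Writing through the view changes the original array
sub[0, 0] = 99
original_changed = bool(data[0, 0] == 99)

# Slicing a Python list gives an independent copy
items = [0, 1, 2, 3]
items_sub = items[:2]
items_sub[0] = 99
list_unchanged = items[0] == 0

print(shares, original_changed, list_unchanged)
```

All three values should print as True: the array slice shares its buffer with `data`, while the list slice does not affect `items`.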
Creating copies of arrays
Despite the nice features of array views, it is sometimes useful to instead explicitly copy the data within an array or a subarray. This can be most easily done with the copy() method:
x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)
If we now modify this subarray, the original array is not touched:
x2_sub_copy[0, 0] = 42
print(x2_sub_copy)
print(x2)
Reshaping of Arrays
Another useful type of operation is reshaping of arrays.
The most flexible way of doing this is with the reshape method.
For example, if you want to put the numbers 1 through 9 in a grid, you can do the following:
grid = np.arange(1, 10).reshape((3, 3))
print(grid)
Note that for this to work, the size of the initial array must match the size of the reshaped array.
Where possible, the reshape method will use a no-copy view of the initial array, but with non-contiguous memory buffers this is not always the case.
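Both points can be checked directly. The sketch below is illustrative only: the variable names are assumptions, and `np.shares_memory` is used as one possible way to tell a view from a copy.

```python
import numpy as np

values = np.arange(1, 10)  # 9 elements

# 9 elements cannot fill a 2x4 grid, so reshape raises ValueError
try:
    values.reshape((2, 4))
    mismatch_raised = False
except ValueError:
    mismatch_raised = True

# Reshaping a contiguous array returns a view of the same buffer
grid = values.reshape((3, 3))
contiguous_is_view = bool(np.shares_memory(values, grid))

# The transpose is non-contiguous, so flattening it requires a copy
flat = grid.T.reshape(9)
transposed_is_view = bool(np.shares_memory(grid, flat))

print(mismatch_raised, contiguous_is_view, transposed_is_view)
```

Here the size mismatch is rejected, the contiguous reshape shares memory with `values`, and the reshape of the transposed array does not.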
Another common reshaping pattern is the conversion of a one-dimensional array into a two-dimensional row or column matrix.
This can be done with the reshape method, or more easily by using np.newaxis within a slice operation:
x = np.array([1, 2, 3])
# row vector via reshape
x.reshape((1, 3))
# row vector via newaxis
x[np.newaxis, :]
# column vector via reshape
x.reshape((3, 1))
# column vector via newaxis
x[:, np.newaxis]
We will see this type of transformation often throughout the remainder of the book.
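To make the effect on dimensions explicit, it can help to print the shapes. This is a small sketch with illustrative variable names (`row`, `col`), and it also shows that `np.newaxis` is simply an alias for `None`.

```python
import numpy as np

x = np.array([1, 2, 3])

row = x[np.newaxis, :]  # shape (1, 3)
col = x[:, np.newaxis]  # shape (3, 1)

print(x.shape, row.shape, col.shape)

# np.newaxis is an alias for None, so x[None, :] builds the same row vector
same_as_none = np.newaxis is None and np.array_equal(row, x[None, :])
print(same_as_none)
```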
Array Concatenation and Splitting
All of the preceding routines worked on single arrays. It’s also possible to combine multiple arrays into one, and to conversely split a single array into multiple arrays. We’ll take a look at those operations here.
Concatenation of arrays
Concatenation, or joining of two arrays in NumPy, is primarily accomplished using the routines np.concatenate, np.vstack, and np.hstack.
np.concatenate takes a tuple or list of arrays as its first argument, as we can see here:
x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])
You can also concatenate more than two arrays at once:
z = [99, 99, 99]
print(np.concatenate([x, y, z]))
It can also be used for two-dimensional arrays:
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
# concatenate along the first axis
np.concatenate([grid, grid])
# concatenate along the second axis (zero-indexed)
np.concatenate([grid, grid], axis=1)
For working with arrays of mixed dimensions, it can be clearer to use the np.vstack (vertical stack) and np.hstack (horizontal stack) functions:
x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])
# vertically stack the arrays
np.vstack([x, grid])
# horizontally stack the arrays
y = np.array([[99],
              [99]])
np.hstack([grid, y])
Similarly, np.dstack will stack arrays along the third axis.
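As a brief illustration of np.dstack, the sketch below stacks two small two-dimensional arrays depth-wise. The arrays `a` and `b` are hypothetical and chosen only to make the result easy to read.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([[7, 8, 9],
              [10, 11, 12]])

# Two (2, 3) arrays stacked depth-wise become one (2, 3, 2) array
stacked = np.dstack([a, b])
print(stacked.shape)

# Each position along the third axis pairs the matching elements,
# here a[0, 0] and b[0, 0]
print(stacked[0, 0])
```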
Splitting of arrays
The opposite of concatenation is splitting, which is implemented by the functions np.split, np.hsplit, and np.vsplit. For each of these, we can pass a list of indices giving the split points:
x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)
Notice that N split points lead to N + 1 subarrays.
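The N + 1 relationship, and the fact that splitting undoes concatenation, can be checked with a short sketch. The names `values`, `split_points`, and `pieces` are illustrative assumptions.

```python
import numpy as np

values = np.array([1, 2, 3, 99, 99, 3, 2, 1])
split_points = [3, 5]

pieces = np.split(values, split_points)

# Two split points give three pieces
print(len(pieces) == len(split_points) + 1)

# Concatenating the pieces restores the original array
restored = np.concatenate(pieces)
print(np.array_equal(restored, values))
```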
The related functions np.hsplit and np.vsplit are similar:
grid = np.arange(16).reshape((4, 4))
grid
upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)
left, right = np.hsplit(grid, [2])
print(left)
print(right)
Similarly, np.dsplit will split arrays along the third axis.
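A minimal sketch of np.dsplit follows, using a hypothetical three-dimensional array named `cube`. It also checks that np.dstack reassembles the pieces.

```python
import numpy as np

cube = np.arange(16).reshape((2, 2, 4))

# Split along the third axis at index 2
front, back = np.dsplit(cube, [2])
print(front.shape, back.shape)

# Stacking the pieces depth-wise recovers the original array
print(np.array_equal(np.dstack([front, back]), cube))
```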