AITC Wiki

020202 The Basics Of NumPy Arrays

The Basics of NumPy Arrays

Data manipulation in Python is nearly synonymous with NumPy array manipulation: even newer tools like Pandas (Chapter 3) are built around the NumPy array. This section will present several examples of using NumPy array manipulation to access data and subarrays, and to split, reshape, and join the arrays. While the types of operations shown here may seem a bit dry and pedantic, they comprise the building blocks of many other examples used throughout the book. Get to know them well!

We’ll cover a few categories of basic array manipulations here:

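  • Reshaping of arrays: Changing the shape of a given array
  • Joining and splitting of arrays: Combining multiple arrays into one, and splitting one array into many

Subarrays are accessed with the slice notation x[start:stop:step]; any omitted value takes its default (start=0, stop=size of the dimension, step=1). For a one-dimensional array:

import numpy as np

x = np.arange(10)
x
x[:5] # first five elements
x[5:] # elements from index 5 onward
x[4:7] # middle sub-array
x[::2] # every other element
x[1::2] # every other element, starting at index 1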

A potentially confusing case is when the step value is negative. In this case, the defaults for start and stop are swapped. This becomes a convenient way to reverse an array:

x[::-1] # all elements, reversed
x[5::-2] # reversed every other from index 5
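
To make the swapped defaults concrete, the following sketch compares negative-step slices with an explicitly bounded equivalent. It reuses the np.arange(10) array from above; the variable names are illustrative only.

```python
import numpy as np

x = np.arange(10)

# With a negative step, an omitted start defaults to the last index
# and an omitted stop runs past the first element.
reversed_all = x[::-1]
reversed_from_5 = x[5::-2]

print(reversed_all)     # [9 8 7 6 5 4 3 2 1 0]
print(reversed_from_5)  # [5 3 1]

# Writing the start out explicitly gives the same reversed array
explicit_start = x[len(x) - 1::-1]
print(np.array_equal(explicit_start, reversed_all))  # True
```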

Multi-dimensional subarrays

Multi-dimensional slices work in the same way, with multiple slices separated by commas. For example:

x2
x2[:2, :3] # two rows, three columns
x2[:3, ::2] # all rows, every other column

Finally, subarray dimensions can even be reversed together:

x2[::-1, ::-1]
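
Since the definition of x2 does not appear in this section, the sketch below uses a stand-in 3x4 array (the name demo and its values are assumptions made purely for illustration) to show what each of the slices above selects.

```python
import numpy as np

# Stand-in for x2: a hypothetical 3x4 array
demo = np.arange(12).reshape((3, 4))
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

top_left = demo[:2, :3]      # first two rows, first three columns
every_other = demo[:3, ::2]  # all three rows, columns 0 and 2
flipped = demo[::-1, ::-1]   # rows and columns both reversed

print(top_left.shape)     # (2, 3)
print(every_other.shape)  # (3, 2)
print(flipped[0])         # [11 10  9  8]
```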

Accessing array rows and columns

One commonly needed routine is accessing single rows or columns of an array. This can be done by combining indexing and slicing, using an empty slice marked by a single colon (:):

print(x2[:, 0]) # first column of x2
print(x2[0, :]) # first row of x2

In the case of row access, the empty slice can be omitted for a more compact syntax:

print(x2[0]) # equivalent to x2[0, :]
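
One detail worth noting is that indexing a row or column with an integer drops that dimension, so the result is one-dimensional. A small sketch, again with a stand-in array (demo is a hypothetical name, not from the original text):

```python
import numpy as np

demo = np.arange(12).reshape((3, 4))

first_col = demo[:, 0]  # values 0, 4, 8
first_row = demo[0, :]  # values 0, 1, 2, 3

print(first_col.shape)  # (3,)
print(first_row.shape)  # (4,)

# Omitting the trailing empty slice selects the same row
print(np.array_equal(demo[0], demo[0, :]))  # True
```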

Subarrays as no-copy views

One important, and extremely useful, thing to know about array slices is that they return views rather than copies of the array data. This is one area in which NumPy array slicing differs from Python list slicing: in lists, slices will be copies. Consider our two-dimensional array from before:

print(x2)

Let’s extract a subarray from this:

x2_sub = x2[:2, :2]
print(x2_sub)

Now if we modify this subarray, we’ll see that the original array is changed! Observe:

x2_sub[0, 0] = 99
print(x2_sub)
print(x2)

This default behavior is actually quite useful: it means that when we work with large datasets, we can access and process pieces of these datasets without the need to copy the underlying data buffer.
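
The contrast with list slicing can be checked directly. The sketch below is illustrative (the variable names are not from the original text); it uses np.shares_memory and the base attribute to confirm that the NumPy slice refers to the original buffer.

```python
import numpy as np

arr = np.arange(6)
arr_slice = arr[:3]
arr_slice[0] = 99   # writes through to arr

lst = list(range(6))
lst_slice = lst[:3]
lst_slice[0] = 99   # lst is unaffected, because the slice is a copy

print(arr[0])  # 99
print(lst[0])  # 0

# Both checks indicate that the slice is a view onto arr
print(np.shares_memory(arr, arr_slice))  # True
print(arr_slice.base is arr)             # True
```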

Creating copies of arrays

Despite the nice features of array views, it is sometimes useful to instead explicitly copy the data within an array or a subarray. This can be most easily done with the copy() method:

x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)

If we now modify this subarray, the original array is not touched:

x2_sub_copy[0, 0] = 42
print(x2_sub_copy)
print(x2)
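
As a quick check of the difference between a view and a copy, the following sketch (with an assumed stand-in array named grid) uses np.shares_memory to show that only the view overlaps with the original data.

```python
import numpy as np

grid = np.arange(12).reshape((3, 4))
sub_view = grid[:2, :2]
sub_copy = grid[:2, :2].copy()

print(np.shares_memory(grid, sub_view))  # True
print(np.shares_memory(grid, sub_copy))  # False

# Modifying the copy leaves the original untouched
sub_copy[0, 0] = 42
print(grid[0, 0])  # 0
```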

Reshaping of Arrays

Another useful type of operation is reshaping of arrays. The most flexible way of doing this is with the reshape method. For example, if you want to put the numbers 1 through 9 in a 3×3 grid, you can do the following:

grid = np.arange(1, 10).reshape((3, 3))
print(grid)

Note that for this to work, the size of the initial array must match the size of the reshaped array. Where possible, the reshape method will use a no-copy view of the initial array, but with non-contiguous memory buffers this is not always the case.
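
Both conditions can be observed in a short sketch. The arrays here are illustrative assumptions: a size mismatch raises a ValueError, a contiguous array reshapes to a view, and flattening a transposed (non-contiguous) array produces a copy.

```python
import numpy as np

a = np.arange(6)

# Sizes must match: 6 elements cannot fill a 4x2 grid
try:
    a.reshape((4, 2))
except ValueError as err:
    print("reshape failed:", err)

# Contiguous input: the reshaped array is a view of the same buffer
b = a.reshape((2, 3))
print(np.shares_memory(a, b))  # True

# The transpose is non-contiguous, so flattening it requires a copy
c = b.T.reshape(6)
print(c)                       # [0 3 1 4 2 5]
print(np.shares_memory(a, c))  # False
```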

Another common reshaping pattern is the conversion of a one-dimensional array into a two-dimensional row or column matrix. This can be done with the reshape method, or more easily by using np.newaxis within a slice operation:

x = np.array([1, 2, 3])
 
# row vector via reshape
x.reshape((1, 3))
# row vector via newaxis
x[np.newaxis, :]
# column vector via reshape
x.reshape((3, 1))
# column vector via newaxis
x[:, np.newaxis]

We will see this type of transformation often throughout the remainder of the book.
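
The resulting shapes are easy to verify. The sketch below checks that the newaxis and reshape forms agree, and notes that np.newaxis is simply an alias for None.

```python
import numpy as np

x = np.array([1, 2, 3])

row = x[np.newaxis, :]
col = x[:, np.newaxis]

print(row.shape)  # (1, 3)
print(col.shape)  # (3, 1)

# Both forms produce the same arrays as reshape
print(np.array_equal(row, x.reshape((1, 3))))  # True
print(np.array_equal(col, x.reshape((3, 1))))  # True

print(np.newaxis is None)  # True
```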

Array Concatenation and Splitting

All of the preceding routines worked on single arrays. It’s also possible to combine multiple arrays into one, and to conversely split a single array into multiple arrays. We’ll take a look at those operations here.

Concatenation of arrays

Concatenation, or joining of two arrays in NumPy, is primarily accomplished using the routines np.concatenate, np.vstack, and np.hstack. np.concatenate takes a tuple or list of arrays as its first argument, as we can see here:

x = np.array([1, 2, 3])
y = np.array([3, 2, 1])
np.concatenate([x, y])

You can also concatenate more than two arrays at once:

z = [99, 99, 99]
print(np.concatenate([x, y, z]))

It can also be used for two-dimensional arrays:

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
# concatenate along the first axis
np.concatenate([grid, grid])
# concatenate along the second axis (zero-indexed)
np.concatenate([grid, grid], axis=1)
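
The effect of the axis argument is easiest to see in the resulting shapes. A minimal sketch using the same 2x3 grid (the result names are illustrative):

```python
import numpy as np

grid = np.array([[1, 2, 3],
                 [4, 5, 6]])

rows_joined = np.concatenate([grid, grid])          # axis=0 is the default
cols_joined = np.concatenate([grid, grid], axis=1)

print(rows_joined.shape)  # (4, 3)
print(cols_joined.shape)  # (2, 6)
print(cols_joined[0])     # [1 2 3 1 2 3]
```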

For working with arrays of mixed dimensions, it can be clearer to use the np.vstack (vertical stack) and np.hstack (horizontal stack) functions:

x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# vertically stack the arrays
np.vstack([x, grid])
# horizontally stack the arrays
y = np.array([[99],
              [99]])
np.hstack([grid, y])
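
To illustrate why the stacking functions help with mixed dimensions, the sketch below shows that np.concatenate rejects a one-dimensional array paired with a two-dimensional one, while np.vstack treats the one-dimensional array as a single row. The explicit newaxis version is included only for comparison.

```python
import numpy as np

x = np.array([1, 2, 3])
grid = np.array([[9, 8, 7],
                 [6, 5, 4]])

# Mixed dimensions are rejected by np.concatenate
try:
    np.concatenate([x, grid])
except ValueError as err:
    print("concatenate failed:", err)

# np.vstack promotes x to a row before stacking
stacked = np.vstack([x, grid])
print(stacked.shape)  # (3, 3)

# The same result, with the promotion written out by hand
manual = np.concatenate([x[np.newaxis, :], grid])
print(np.array_equal(stacked, manual))  # True
```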

Similarly, np.dstack will stack arrays along the third axis.
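
A brief sketch of depth-wise stacking, using two small arrays chosen here for illustration:

```python
import numpy as np

a = np.array([[1, 2],
              [3, 4]])
b = np.array([[5, 6],
              [7, 8]])

# Each 2x2 input becomes one layer along a new third axis
depth = np.dstack([a, b])

print(depth.shape)  # (2, 2, 2)
print(depth[0, 0])  # [1 5]
```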

Splitting of arrays

The opposite of concatenation is splitting, which is implemented by the functions np.split, np.hsplit, and np.vsplit. For each of these, we can pass a list of indices giving the split points:

x = [1, 2, 3, 99, 99, 3, 2, 1]
x1, x2, x3 = np.split(x, [3, 5])
print(x1, x2, x3)
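
The split points behave like slice boundaries. The following sketch spells out that correspondence for the same data (the name parts is an illustrative choice):

```python
import numpy as np

x = np.array([1, 2, 3, 99, 99, 3, 2, 1])
parts = np.split(x, [3, 5])

# The pieces correspond to x[:3], x[3:5], and x[5:]
print(len(parts))                   # 3
print([p.tolist() for p in parts])  # [[1, 2, 3], [99, 99], [3, 2, 1]]
```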

Notice that N split points lead to N + 1 subarrays. The related functions np.hsplit and np.vsplit are similar:

grid = np.arange(16).reshape((4, 4))
grid
upper, lower = np.vsplit(grid, [2])
print(upper)
print(lower)
left, right = np.hsplit(grid, [2])
print(left)
print(right)
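
For reference, np.vsplit and np.hsplit can be thought of as np.split applied along axis 0 and axis 1 respectively. The sketch below checks that equivalence on the same 4x4 grid:

```python
import numpy as np

grid = np.arange(16).reshape((4, 4))

upper, lower = np.vsplit(grid, [2])
left, right = np.hsplit(grid, [2])

print(upper.shape)  # (2, 4)
print(left.shape)   # (4, 2)

# The same pieces, obtained with np.split and an explicit axis
upper_alt, lower_alt = np.split(grid, [2], axis=0)
left_alt, right_alt = np.split(grid, [2], axis=1)

print(np.array_equal(upper, upper_alt))  # True
print(np.array_equal(right, right_alt))  # True
```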

Similarly, np.dsplit will split arrays along the third axis.
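
A minimal sketch of splitting along the third axis, using an assumed three-dimensional array named cube:

```python
import numpy as np

cube = np.arange(16).reshape((2, 2, 4))

# Split the third axis at index 2
front, back = np.dsplit(cube, [2])

print(front.shape)  # (2, 2, 2)
print(back.shape)   # (2, 2, 2)
print(front[0, 0])  # [0 1]
print(back[0, 0])   # [2 3]
```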