AITC Wiki

020201 The Basics Of NumPy Arrays

The Basics of NumPy Arrays

Data manipulation in Python is nearly synonymous with NumPy array manipulation: even newer tools like Pandas (Chapter 3) are built around the NumPy array. This section will present several examples of using NumPy array manipulation to access data and subarrays, and to split, reshape, and join the arrays. While the types of operations shown here may seem a bit dry and pedantic, they comprise the building blocks of many other examples used throughout the book. Get to know them well!

We’ll cover a few categories of basic array manipulations here:

  • Attributes of arrays: Determining the size, shape, memory consumption, and data types of arrays
  • Indexing of arrays: Getting and setting the value of individual array elements
  • Slicing of arrays: Getting and setting smaller subarrays within a larger array

NumPy Array Attributes

First let’s discuss some useful array attributes. We’ll start by defining three random arrays: a one-dimensional, a two-dimensional, and a three-dimensional array. We’ll use NumPy’s random number generator, which we will seed with a set value in order to ensure that the same random arrays are generated each time this code is run:

import numpy as np
np.random.seed(0) # seed for reproducibility
 
x1 = np.random.randint(10, size=6) # One-dimensional array
x2 = np.random.randint(10, size=(3, 4)) # Two-dimensional array
x3 = np.random.randint(10, size=(3, 4, 5)) # Three-dimensional array

Each array has attributes ndim (the number of dimensions), shape (the size of each dimension), and size (the total size of the array):

print("x3 ndim: ", x3.ndim)
print("x3 shape:", x3.shape)
print("x3 size: ", x3.size)

Another useful attribute is the dtype, the data type of the array (which we discussed previously in Understanding Data Types in Python):

print("dtype:", x3.dtype)

Other attributes include itemsize, which lists the size (in bytes) of each array element, and nbytes, which lists the total size (in bytes) of the array:

print("itemsize:", x3.itemsize, "bytes")
print("nbytes:", x3.nbytes, "bytes")

In general, we expect that nbytes is equal to itemsize times size.
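
As a quick check of that relationship, the sketch below builds a small array and compares the attributes directly. The explicit dtype (int64) and the names a and total_bytes are choices made for this illustration, so that the byte counts come out the same on every platform:

```python
import numpy as np

# Explicit dtype so that each element is 8 bytes regardless of platform
a = np.zeros((3, 4, 5), dtype=np.int64)

print("ndim:    ", a.ndim)      # 3
print("shape:   ", a.shape)     # (3, 4, 5)
print("size:    ", a.size)      # 60
print("itemsize:", a.itemsize)  # 8
print("nbytes:  ", a.nbytes)    # 480

# nbytes should equal itemsize times size
total_bytes = a.itemsize * a.size
print(total_bytes == a.nbytes)  # True
```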

Array Indexing: Accessing Single Elements

If you are familiar with Python’s standard list indexing, indexing in NumPy will feel quite familiar. In a one-dimensional array, the i-th value (counting from zero) can be accessed by specifying the desired index in square brackets, just as with Python lists:

x1
x1[0]

To index from the end of the array, you can use negative indices:

x1[-1]

In a multi-dimensional array, items can be accessed using a comma-separated tuple of indices:

x2
x2[0, 0]
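
Because x2 is filled with random values, it can help to see the same indexing on an array whose contents are known in advance. This is a minimal sketch; the array grid and the result names are introduced here for illustration only:

```python
import numpy as np

# A 3x4 array holding 0..11, so each result is easy to verify by eye:
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
grid = np.arange(12).reshape(3, 4)

top_left = grid[0, 0]        # row 0, column 0 -> 0
third_row_first = grid[2, 0] # row 2, column 0 -> 8
bottom_right = grid[-1, -1]  # last row, last column -> 11

print(top_left, third_row_first, bottom_right)
```

Negative indices work along each axis independently, so grid[-1, 0] and grid[2, 0] refer to the same element here.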

Values can also be modified using any of the above index notation:

x2[0, 0] = 12
x2

Keep in mind that, unlike Python lists, NumPy arrays have a fixed type. This means, for example, that if you attempt to insert a floating-point value into an integer array, the value will be silently truncated. Don’t be caught unaware by this behavior!

x1[0] = 3.14159 # this will be truncated!
x1
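
One way to avoid the truncation, sketched below, is to convert the array to a floating-point type before assigning. The names ints and floats are illustrative; astype() returns a new array and leaves the original untouched:

```python
import numpy as np

ints = np.array([1, 2, 3])   # integer dtype inferred from the values
ints[0] = 3.14159            # fractional part is dropped: stored as 3
print(ints)                  # [3 2 3]

# Convert to a float array first if the fractional part matters
floats = ints.astype(float)
floats[0] = 3.14159
print(floats.dtype)          # float64
print(floats[0])             # 3.14159
```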

Array Slicing: Accessing Subarrays

Just as we can use square brackets to access individual array elements, we can also use them to access subarrays with the slice notation, marked by the colon (:) character. The NumPy slicing syntax follows that of the standard Python list; to access a slice of an array x, use this:

x[start:stop:step]

If any of these are unspecified, they default to the values start=0, stop=size of dimension, step=1. We’ll take a look at accessing subarrays in one dimension and in multiple dimensions.
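
The defaults can be made visible with Python's built-in slice object, whose indices() method resolves a slice against a given length. This is only an illustrative sketch (the variable names are made up here), relying on the fact that a one-dimensional NumPy slice resolves its defaults the same way:

```python
import numpy as np

x = np.arange(10)

# Resolve an "empty" slice against a dimension of length 10
start, stop, step = slice(None, None, None).indices(len(x))
print(start, stop, step)  # 0 10 1

# Writing the defaults out explicitly selects the same elements
same_all = np.array_equal(x[:], x[0:10:1])
same_tail = np.array_equal(x[2:], x[2:10:1])
print(same_all, same_tail)  # True True
```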

One-dimensional subarrays

x = np.arange(10)
x
x[:5] # first five elements
x[4:7] # middle sub-array
x[::2] # every other element
x[1::2] # every other element, starting at index 1

A potentially confusing case is when the step value is negative. In this case, the defaults for start and stop are swapped. This becomes a convenient way to reverse an array:

x[::-1] # all elements, reversed
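
To make the swapped defaults concrete, the following sketch (with illustrative variable names) resolves a negative-step slice using Python's built-in slice.indices() and then combines a negative step with an explicit start:

```python
import numpy as np

x = np.arange(10)

# With step=-1 the traversal starts at the last index (9) and
# stops just before index -1, i.e. after visiting index 0
resolved = slice(None, None, -1).indices(len(x))
print(resolved)            # (9, -1, -1)

reversed_x = x[::-1]       # all elements, reversed
from_five = x[5::-2]       # every other element, going backward from index 5

print(reversed_x)          # [9 8 7 6 5 4 3 2 1 0]
print(from_five)           # [5 3 1]
```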

Multi-dimensional subarrays

Multi-dimensional slices work in the same way, with multiple slices separated by commas. For example:

x2
x2[:2, :3] # two rows, three columns
x2[:3, ::2] # all rows, every other column

Finally, subarray dimensions can even be reversed together:

x2[::-1, ::-1]
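
Since the contents of x2 depend on the random seed, here is the same set of slices applied to an array with known values. The array grid is a stand-in introduced for this sketch:

```python
import numpy as np

# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
grid = np.arange(12).reshape(3, 4)

top_left_block = grid[:2, :3]   # two rows, three columns
every_other_col = grid[:, ::2]  # all rows, every other column
flipped = grid[::-1, ::-1]      # rows and columns both reversed

print(top_left_block)   # [[0 1 2] [4 5 6]]
print(every_other_col)  # [[0 2] [4 6] [8 10]]
print(flipped)          # [[11 10 9 8] [7 6 5 4] [3 2 1 0]]
```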

Accessing array rows and columns

One commonly needed routine is accessing single rows or columns of an array. This can be done by combining indexing and slicing, using an empty slice marked by a single colon (:):

print(x2[:, 0]) # first column of x2

In the case of row access, the empty slice can be omitted for a more compact syntax:

print(x2[0]) # equivalent to x2[0, :]
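
One detail worth noticing is the shape of the result: indexing with an integer removes that dimension, while slicing keeps it. The sketch below uses an illustrative array named grid to show the difference:

```python
import numpy as np

grid = np.arange(12).reshape(3, 4)

first_col = grid[:, 0]        # integer index: result is one-dimensional
first_row = grid[0]           # same as grid[0, :]
first_col_2d = grid[:, 0:1]   # slice instead of index: stays two-dimensional

print(first_col, first_col.shape)   # [0 4 8] (3,)
print(first_row, first_row.shape)   # [0 1 2 3] (4,)
print(first_col_2d.shape)           # (3, 1)
```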

Subarrays as no-copy views

One important, and extremely useful, thing to know about array slices is that they return views rather than copies of the array data. This is one area in which NumPy array slicing differs from Python list slicing: in lists, slices will be copies. Consider our two-dimensional array from before:

print(x2)

Let’s extract a subarray from this:

x2_sub = x2[:2, :2]
print(x2_sub)

Now if we modify this subarray, we’ll see that the original array is changed! Observe:

x2_sub[0, 0] = 99
print(x2_sub)
print(x2)

This default behavior is actually quite useful: it means that when we work with large datasets, we can access and process pieces of these datasets without the need to copy the underlying data buffer.
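
The contrast with Python lists can be checked side by side. In this sketch the names py_list, arr, and their slices are illustrative; np.shares_memory and the base attribute are used to confirm that the array slice refers to the original buffer:

```python
import numpy as np

# Python list: slicing makes a copy, so the original is unaffected
py_list = [1, 2, 3, 4]
list_slice = py_list[:2]
list_slice[0] = 99
print(py_list)                  # [1, 2, 3, 4]

# NumPy array: slicing makes a view, so the original changes too
arr = np.array([1, 2, 3, 4])
arr_slice = arr[:2]
arr_slice[0] = 99
print(arr)                      # [99  2  3  4]

shares = np.shares_memory(arr, arr_slice)
is_view_of_arr = arr_slice.base is arr
print(shares, is_view_of_arr)   # True True
```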

Creating copies of arrays

Despite the nice features of array views, it is sometimes useful to instead explicitly copy the data within an array or a subarray. This can be most easily done with the copy() method:

x2_sub_copy = x2[:2, :2].copy()
print(x2_sub_copy)

If we now modify this subarray, the original array is not touched:

x2_sub_copy[0, 0] = 42
print(x2_sub_copy)
print(x2)
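
If it is ever unclear whether a given subarray is a view or an independent copy, one possible check is sketched below. The array names are illustrative; np.shares_memory reports whether two arrays overlap in memory, and base is None for an array that owns its own data:

```python
import numpy as np

original = np.arange(12).reshape(3, 4)

view = original[:2, :2]            # view onto the original data
copied = original[:2, :2].copy()   # independent copy

copied[0, 0] = 42                  # does not affect the original
print(original[0, 0])              # 0

view_shares = np.shares_memory(original, view)
copy_shares = np.shares_memory(original, copied)
print(view_shares, copy_shares)    # True False
print(copied.base is None)         # True
```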