0201 Understanding Data Types

Chinese version: Understanding Data Types

Understanding Data Types in Python

Effective data-driven science and computation requires understanding how data is stored and manipulated. This section outlines and contrasts how arrays of data are handled in the Python language itself, and how NumPy improves on this. Understanding this difference is fundamental to understanding much of the material throughout the rest of the book.

Users of Python are often drawn in by its ease of use, one piece of which is dynamic typing. While a statically typed language like C or Java requires each variable to be explicitly declared, a dynamically typed language like Python skips this specification. For example, in C you might specify a particular operation as follows:

/* C code */
int result = 0;
for (int i = 0; i < 100; i++) {
    result += i;
}

While in Python the equivalent operation could be written this way:

# Python code
result = 0
for i in range(100):
    result += i

Notice the main difference: in C, the data type of each variable is explicitly declared, while in Python the types are dynamically inferred. This means, for example, that we can assign any kind of data to any variable:

# Python code
x = 4
x = "four"

Here we’ve switched the contents of x from an integer to a string. The same thing in C would lead (depending on compiler settings) to a compilation error or other unintended consequences:

/* C code */
int x = 4;
x = "four"; // FAILS

This sort of flexibility is one piece that makes Python and other dynamically typed languages convenient and easy to use. Understanding how this works is an important piece of learning to analyze data efficiently and effectively with Python. But what this type flexibility also points to is the fact that Python variables are more than just their value; they also contain extra information about the type of the value. We’ll explore this more in the sections that follow.
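The idea that a Python variable carries more than its value can be checked directly. The following is a minimal sketch, assuming the CPython interpreter; the byte count reported by sys.getsizeof is an implementation detail that varies across versions and platforms, so no particular number should be relied on.

```python
import sys

x = 4
type_before = type(x)   # the name x currently refers to an int object

x = "four"
type_after = type(x)    # the same name now refers to a str object

# In CPython, an int object stores a reference count and a type pointer
# alongside its value, so it is larger than a bare C integer.
int_size = sys.getsizeof(4)

print(type_before, type_after, int_size)
```

The type travels with the object rather than with the name, which is why reassigning x to a string is legal.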

A Python List Is More Than Just a List

Let’s consider now what happens when we use a Python data structure that holds many Python objects. The standard mutable multi-element container in Python is the list. We can create a list of integers as follows:

L = list(range(10))
L

type(L[0])

Or, similarly, a list of strings:

L2 = [str(c) for c in L]
L2

type(L2[0])

Because of Python’s dynamic typing, we can even create heterogeneous lists:

L3 = [True, "2", 3.0, 4]
[type(item) for item in L3]

But this flexibility comes at a cost: to allow these flexible types, each item in the list must contain its own type info, reference count, and other information; that is, each item is a complete Python object. In the special case that all variables are of the same type, much of this information is redundant: it can be much more efficient to store data in a fixed-type array. The difference between a dynamic-type list and a fixed-type (NumPy-style) array is illustrated in the following figure:

Fixed-Type Arrays in Python

Python offers several different options for storing data in efficient, fixed-type data buffers. The built-in array module can be used to create dense arrays of a uniform type:

import array
L = list(range(10))
A = array.array('i', L)
A

Here 'i' is a type code indicating the contents are integers.
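As a rough sketch of what the fixed type buys us, the example below compares storage for the array against the integer objects held by the equivalent list, and shows that the type code is enforced. It assumes CPython; the sizes are implementation details and are computed rather than quoted.

```python
import array
import sys

L = list(range(10))
A = array.array('i', L)

# Every array element occupies the same fixed number of bytes
bytes_per_element = A.itemsize
array_payload = A.itemsize * len(A)

# The list holds references; each referenced int is a separate Python object
list_payload = sum(sys.getsizeof(item) for item in L)

# The type code is enforced: a float cannot be stored in an 'i' array
try:
    A.append(3.14)
    rejected = False
except TypeError:
    rejected = True

print(bytes_per_element, array_payload, list_payload, rejected)
```

The list figure here counts only the integer objects themselves, not the list's own table of references, so it understates the real difference.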

Much more useful, however, is the ndarray object of the NumPy package. While Python’s array object provides efficient storage of array-based data, NumPy adds to this efficient operations on that data. We will explore these operations in later sections; here we’ll demonstrate several ways of creating a NumPy array.

We’ll start with the standard NumPy import, under the alias np:

import numpy as np

Creating Arrays from Python Lists

First, we can use np.array to create arrays from Python lists:

# integer array:
np.array([1, 4, 2, 5, 3])

Remember that unlike Python lists, NumPy arrays are constrained to contain elements of a single type. If the types do not match, NumPy will upcast if possible (here, integers are upcast to floating point):

np.array([3.14, 4, 2, 3])

If we want to explicitly set the data type of the resulting array, we can use the dtype keyword:

np.array([1, 2, 3, 4], dtype='float32')
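The resulting type of each array can be confirmed through its dtype attribute. The sketch below repeats the three arrays above and adds one case of our own (the truncated array, which is not part of the original examples) to show that an explicit dtype can also discard information.

```python
import numpy as np

ints = np.array([1, 4, 2, 5, 3])
mixed = np.array([3.14, 4, 2, 3])
as_float32 = np.array([1, 2, 3, 4], dtype='float32')

# dtype.kind is a one-character code: 'i' for signed integer, 'f' for float
print(ints.dtype.kind)     # i
print(mixed.dtype)         # float64
print(as_float32.dtype)    # float32

# Requesting an integer dtype for floating-point input drops the fractional part
truncated = np.array([1.7, 2.2, 3.9], dtype=int)
print(truncated.tolist())  # [1, 2, 3]
```

The exact integer width chosen for ints depends on the platform and NumPy version, which is why the example checks the kind rather than a specific type name.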

Creating Arrays from Scratch

Especially for larger arrays, it is more efficient to create arrays from scratch using routines built into NumPy. Here are several examples:

# Create a length-10 integer array filled with zeros
np.zeros(10, dtype=int)

# Create a 3x5 floating-point array filled with ones
np.ones((3, 5), dtype=float)

# Create a 3x5 array filled with 3.14
np.full((3, 5), 3.14)

# Create an array filled with a linear sequence
# Starting at 0, ending at 20, stepping by 2
# (this is similar to the built-in range() function)
np.arange(0, 20, 2)

# Create an array of five values evenly spaced between 0 and 1
np.linspace(0, 1, 5)

# Create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random((3, 3))

# Create a 3x3 array of normally distributed random values
# with mean 0 and standard deviation 1
np.random.normal(0, 1, (3, 3))

# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))

# Create a 3x3 identity matrix
np.eye(3)

# Create an uninitialized array of three values (float64 by default)
# The values will be whatever happens to already exist at that memory location
np.empty(3)
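Two details of these routines are easy to miss: np.arange excludes its stop value while np.linspace includes both endpoints, and most of the routines default to float64 when no dtype is given. A short sketch that checks both:

```python
import numpy as np

stepped = np.arange(0, 20, 2)    # the stop value 20 is excluded
spaced = np.linspace(0, 1, 5)    # both endpoints are included

print(stepped[-1], len(stepped))   # 18 10
print(spaced.tolist())             # [0.0, 0.25, 0.5, 0.75, 1.0]

# Without an explicit dtype, these routines produce float64 arrays
print(np.full((3, 5), 3.14).shape)   # (3, 5)
print(np.eye(3).dtype)               # float64
print(np.empty(3).dtype)             # float64
```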

NumPy Standard Data Types

NumPy arrays contain values of a single type, so it is important to have detailed knowledge of those types and their limitations. Because NumPy is built in C, the types will be familiar to users of C, Fortran, and other related languages.

The standard NumPy data types are listed in the following table. Note that when constructing an array, they can be specified using a string:

np.zeros(10, dtype='int16')

Or using the associated NumPy object:

np.zeros(10, dtype=np.int16)
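The two spellings describe the same type, and the limitations of a fixed-width type can be inspected with np.iinfo. The following sketch illustrates this for int16; the overflow case at the end is our own illustration of one such limitation, not an example from the text above.

```python
import numpy as np

# The string form and the NumPy object describe the same dtype
same = np.dtype('int16') == np.dtype(np.int16)
print(same)                       # True

a = np.zeros(10, dtype=np.int16)
print(a.itemsize, a.nbytes)       # 2 20

# iinfo reports the representable range of an integer type
limits = np.iinfo(np.int16)
print(limits.min, limits.max)     # -32768 32767

# Integer array arithmetic wraps around on overflow instead of raising
top = np.array([limits.max], dtype=np.int16)
wrapped = top + np.int16(1)
print(wrapped.tolist())           # [-32768]
```

For floating-point types, np.finfo plays the same role as np.iinfo.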

More advanced type specification is possible, such as specifying big or little endian numbers; for more information, refer to the NumPy documentation. NumPy also supports compound data types, which will be covered in Structured Data: NumPy’s Structured Arrays.
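As a small illustration of the byte-order specification mentioned above, a dtype string can carry a prefix character: '>' for big endian and '<' for little endian. This is a minimal sketch using a 4-byte signed integer; the variable names are ours.

```python
import numpy as np

# '>' requests big-endian and '<' little-endian; 'i4' is a 4-byte signed integer
big = np.array([1], dtype='>i4')
little = np.array([1], dtype='<i4')

# Same value, different byte layout in memory
print(big.tobytes())         # b'\x00\x00\x00\x01'
print(little.tobytes())      # b'\x01\x00\x00\x00'
print(big[0] == little[0])   # True
```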
