Understanding Data Types in Python
Effective data-driven science and computation requires understanding how data is stored and manipulated. This section outlines and contrasts how arrays of data are handled in the Python language itself, and how NumPy improves on this. Understanding this difference is fundamental to understanding much of the material throughout the rest of the book.
Users of Python are often drawn in by its ease of use, one piece of which is dynamic typing. While a statically-typed language like C or Java requires each variable to be explicitly declared, a dynamically-typed language like Python skips this specification. For example, in C you might specify a particular operation as follows:
/* C code */
int result = 0;
for(int i=0; i<100; i++){
    result += i;
}

While in Python the equivalent operation could be written this way:
# Python code
result = 0
for i in range(100):
    result += i

Notice the main difference: in C, the data types of each variable are explicitly declared, while in Python the types are dynamically inferred. This means, for example, that we can assign any kind of data to any variable:
# Python code
x = 4
x = "four"

Here we’ve switched the contents of x from an integer to a string. The same thing in C would lead (depending on compiler settings) to a compilation error or other unintended consequences:
/* C code */
int x = 4;
x = "four";  // FAILS

This sort of flexibility is one piece that makes Python and other dynamically-typed languages convenient and easy to use. Understanding how this works is an important piece of learning to analyze data efficiently and effectively with Python. But what this type flexibility also points to is the fact that Python variables are more than just their value; they also contain extra information about the type of the value. We’ll explore this more in the sections that follow.
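As a minimal sketch of the point that a Python value carries its own type information, the snippet below rebinds one name to two different objects and inspects the type each time. The variable names `first_type` and `second_type` are illustrative only.

```python
# A name is only a reference; the object it points to knows its own type.
x = 4
first_type = type(x)       # type of the integer object

x = "four"                 # rebind the same name to a string object
second_type = type(x)      # type of the string object

print(first_type.__name__)   # int
print(second_type.__name__)  # str
```

The type travels with the object, not with the name, which is why the reassignment needs no declaration.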
A Python List Is More Than Just a List
Let’s consider now what happens when we use a Python data structure that holds many Python objects. The standard mutable multi-element container in Python is the list. We can create a list of integers as follows:
L = list(range(10))
L
type(L[0])

Or, similarly, a list of strings:
L2 = [str(c) for c in L]
L2
type(L2[0])

Because of Python’s dynamic typing, we can even create heterogeneous lists:
L3 = [True, "2", 3.0, 4]
[type(item) for item in L3]

But this flexibility comes at a cost: to allow these flexible types, each item in the list must contain its own type info, reference count, and other information; that is, each item is a complete Python object. In the special case that all variables are of the same type, much of this information is redundant: it can be much more efficient to store data in a fixed-type array. The difference between a dynamic-type list and a fixed-type (NumPy-style) array is illustrated in the following figure:
Fixed-Type Arrays in Python
Python offers several different options for storing data in efficient, fixed-type data buffers.
The built-in array module (part of the Python standard library) can be used to create dense arrays of a uniform type:
import array
L = list(range(10))
A = array.array('i', L)
A

Here 'i' is a type code indicating the contents are integers.
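To make the fixed-type behavior concrete, here is a small sketch using only the standard library. It assumes CPython (where `sys.getsizeof` reports the size of the full integer object); the names `typed`, `bytes_per_item`, `bytes_per_int_object`, and `rejected` are illustrative.

```python
import array
import sys

values = list(range(10))
typed = array.array('i', values)   # 'i' is the type code for a signed C int

# Every element of the array occupies the same fixed number of bytes.
bytes_per_item = typed.itemsize

# A list element is a complete Python object (type info, reference count, value).
bytes_per_int_object = sys.getsizeof(values[1])

# The type code is enforced: a float cannot be stored in an integer array.
try:
    typed.append(1.5)
    rejected = False
except TypeError:
    rejected = True

print(bytes_per_item, bytes_per_int_object, rejected)
```

On a typical CPython build the per-object size is several times the per-item size, which is the redundancy a fixed-type buffer avoids.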
Much more useful, however, is the ndarray object of the NumPy package.
While Python’s array object provides efficient storage of array-based data, NumPy adds to this efficient operations on that data.
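One way to see the difference is to apply the same operator to both containers. This is only a sketch with illustrative names (`A`, `N`, `repeated`, `doubled`): multiplying an `array.array` by an integer repeats the sequence, while multiplying an ndarray operates on each element.

```python
import array
import numpy as np

A = array.array('i', [1, 2, 3])
N = np.array([1, 2, 3])

repeated = A * 2   # sequence repetition, as with a list
doubled = N * 2    # elementwise arithmetic

print(list(repeated))     # [1, 2, 3, 1, 2, 3]
print(doubled.tolist())   # [2, 4, 6]
```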
We will explore these operations in later sections; here we’ll demonstrate several ways of creating a NumPy array.
We’ll start with the standard NumPy import, under the alias np:
import numpy as np

Creating Arrays from Python Lists
First, we can use np.array to create arrays from Python lists:
# integer array:
np.array([1, 4, 2, 5, 3])

Remember that unlike Python lists, NumPy arrays are constrained to hold elements of a single type. If types do not match, NumPy will upcast if possible (here, integers are upcast to floating point):
np.array([3.14, 4, 2, 3])

If we want to explicitly set the data type of the resulting array, we can use the dtype keyword:
np.array([1, 2, 3, 4], dtype='float32')

Creating Arrays from Scratch
Especially for larger arrays, it is more efficient to create arrays from scratch using routines built into NumPy. Here are several examples:
# Create a length-10 integer array filled with zeros
np.zeros(10, dtype=int)

# Create a 3x5 floating-point array filled with ones
np.ones((3, 5), dtype=float)

# Create a 3x5 array filled with 3.14
np.full((3, 5), 3.14)

# Create an array filled with a linear sequence
# Starting at 0, stepping by 2, and stopping before 20
# (this is similar to the built-in range() function)
np.arange(0, 20, 2)

# Create an array of five values evenly spaced between 0 and 1
np.linspace(0, 1, 5)

# Create a 3x3 array of uniformly distributed
# random values between 0 and 1
np.random.random((3, 3))

# Create a 3x3 array of normally distributed random values
# with mean 0 and standard deviation 1
np.random.normal(0, 1, (3, 3))

# Create a 3x3 array of random integers in the interval [0, 10)
np.random.randint(0, 10, (3, 3))

# Create a 3x3 identity matrix
np.eye(3)

# Create an uninitialized array of three values (floating point by default)
# The values will be whatever happens to already exist at that memory location
np.empty(3)

NumPy Standard Data Types
NumPy arrays contain values of a single type, so it is important to have detailed knowledge of those types and their limitations. Because NumPy is built in C, the types will be familiar to users of C, Fortran, and other related languages.
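The following sketch illustrates what "a single type" means in practice and one kind of limitation it implies. The array names are illustrative, and the default integer dtype depends on the platform, so the code checks only that it is an integer type.

```python
import numpy as np

# Mixed input is upcast to one common type.
mixed = np.array([3.14, 4, 2, 3])
print(mixed.dtype)            # float64

# An integer array stays integer: assigning a float silently truncates it.
ints = np.array([1, 2, 3])
ints[0] = 3.99
print(ints[0])                # 3

# Fixed-width integers wrap around when a result exceeds their range.
small = np.array([250], dtype=np.uint8)
wrapped = small + np.array([10], dtype=np.uint8)
print(wrapped[0])             # 4, because 260 does not fit in 8 bits
```

Neither the truncation nor the wraparound raises an error, which is why it helps to know the dtype of an array before operating on it.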
The standard NumPy data types are listed in the following table. Note that when constructing an array, they can be specified using a string:
np.zeros(10, dtype='int16')

Or using the associated NumPy object:
np.zeros(10, dtype=np.int16)

| Data type | Description |
|-----------|-------------|
| bool_ | Boolean (True or False) stored as a byte |
| int_ | Default integer type (same as C long; normally either int64 or int32) |
| intc | Identical to C int (normally int32 or int64) |
| intp | Integer used for indexing (same as C ssize_t; normally either int32 or int64) |
| int8 | Byte (-128 to 127) |
| int16 | Integer (-32768 to 32767) |
| int32 | Integer (-2147483648 to 2147483647) |
| int64 | Integer (-9223372036854775808 to 9223372036854775807) |
| uint8 | Unsigned integer (0 to 255) |
| uint16 | Unsigned integer (0 to 65535) |
| uint32 | Unsigned integer (0 to 4294967295) |
| uint64 | Unsigned integer (0 to 18446744073709551615) |
| float_ | Shorthand for float64 |
| float16 | Half precision float: sign bit, 5 bits exponent, 10 bits mantissa |
| float32 | Single precision float: sign bit, 8 bits exponent, 23 bits mantissa |
| float64 | Double precision float: sign bit, 11 bits exponent, 52 bits mantissa |
| complex_ | Shorthand for complex128 |
| complex64 | Complex number, represented by two 32-bit floats |
| complex128 | Complex number, represented by two 64-bit floats |
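
The ranges and bit layouts in the table can be checked from within NumPy. The sketch below uses `np.iinfo` for integer types and `np.finfo` for floating-point types; the variable names are illustrative.

```python
import numpy as np

int8_info = np.iinfo(np.int8)
uint16_info = np.iinfo(np.uint16)
int64_info = np.iinfo(np.int64)
float32_info = np.finfo(np.float32)

print(int8_info.min, int8_info.max)            # -128 127
print(uint16_info.min, uint16_info.max)        # 0 65535
print(int64_info.max)                          # 9223372036854775807
print(float32_info.bits, float32_info.nmant)   # 32 23 (total bits, mantissa bits)
```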
More advanced type specification is possible, such as specifying big or little endian numbers; for more information, refer to the NumPy documentation. NumPy also supports compound data types, which will be covered in Structured Data: NumPy’s Structured Arrays.
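As a brief illustration of byte-order specification, the sketch below builds the same 32-bit integer with an explicit big-endian (`'>i4'`) and little-endian (`'<i4'`) dtype string and compares the raw bytes. The variable names are illustrative.

```python
import numpy as np

big = np.array([1], dtype='>i4')      # 4-byte integer, most significant byte first
little = np.array([1], dtype='<i4')   # 4-byte integer, least significant byte first

print(big.tobytes())      # b'\x00\x00\x00\x01'
print(little.tobytes())   # b'\x01\x00\x00\x00'

# The stored bytes differ, but the values compare equal.
print(bool(big[0] == little[0]))   # True
```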