AITC Wiki

0200 Introduction to NumPy

Introduction to NumPy

This chapter, along with Chapter 3, outlines techniques for effectively loading, storing, and manipulating in-memory data in Python. The topic is very broad: datasets can come from a wide range of sources and in a wide range of formats, including collections of documents, collections of images, collections of sound clips, collections of numerical measurements, or nearly anything else. Despite this apparent heterogeneity, it will help us to think of all data fundamentally as arrays of numbers.

For example, images, particularly digital images, can be thought of as simply two-dimensional arrays of numbers representing pixel brightness across the area. Sound clips can be thought of as one-dimensional arrays of intensity versus time. Text can be converted in various ways into numerical representations, perhaps binary digits representing the frequency of certain words or pairs of words. No matter what the data are, the first step in making them analyzable will be to transform them into arrays of numbers. (We will discuss some specific examples of this process later in Feature Engineering.)
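As a minimal sketch of this idea, the following uses NumPy (introduced below) to represent a tiny grayscale image, a short sound clip, and word counts for a sentence as arrays. The pixel values, the 440 Hz tone, the sample rate, and the vocabulary are all made-up illustrations, not data from this book.

```python
import numpy as np

# A hypothetical 3x4 grayscale "image": each entry is a pixel brightness (0-255).
image = np.array([[  0,  64, 128, 255],
                  [ 32,  96, 160, 224],
                  [ 16,  80, 144, 208]], dtype=np.uint8)

# A hypothetical sound clip: 0.01 s of a 440 Hz tone sampled at 8000 Hz,
# stored as a one-dimensional array of intensity versus time.
sample_rate = 8000
t = np.arange(80) / sample_rate
sound = np.sin(2 * np.pi * 440 * t)

# Text as numbers: count how often each word of a small, assumed vocabulary
# appears in a sentence.
vocabulary = ["data", "array", "number"]
words = "data becomes an array and the array holds data".split()
counts = np.array([words.count(w) for w in vocabulary])

print(image.shape)   # (3, 4)
print(sound.shape)   # (80,)
print(counts)        # [2 2 0]
```

In each case the original object ends up as an array whose shape reflects its structure: two dimensions for the image, one for the sound clip and the word counts.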

For this reason, efficient storage and manipulation of numerical arrays is absolutely fundamental to the process of doing data science. We’ll now take a look at the specialized tools that Python has for handling such numerical arrays: the NumPy package, and the Pandas package (discussed in Chapter 3).

This chapter will cover NumPy in detail. NumPy (short for Numerical Python) provides an efficient interface to store and operate on dense data buffers. In some ways, NumPy arrays are like Python’s built-in list type, but NumPy arrays provide much more efficient storage and data operations as the arrays grow larger in size. NumPy arrays form the core of nearly the entire ecosystem of data science tools in Python, so time spent learning to use NumPy effectively will be valuable no matter what aspect of data science interests you.
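To make the comparison with lists concrete, here is a small sketch (the array size is an arbitrary choice) contrasting element-wise arithmetic on a list with the same operation on a NumPy array, and showing that the array keeps its values in one dense, fixed-type buffer.

```python
import numpy as np

n = 1000  # arbitrary size chosen for illustration

# A Python list holds references to separate integer objects;
# arithmetic requires an explicit loop or comprehension.
py_list = list(range(n))
doubled_list = [2 * x for x in py_list]

# A NumPy array stores all values in one contiguous, fixed-type buffer;
# arithmetic applies to every element at once.
arr = np.arange(n, dtype=np.int64)
doubled_arr = 2 * arr

print(arr.dtype)        # int64
print(arr.itemsize)     # 8 bytes per element
print(arr.nbytes)       # 8000 bytes for the whole data buffer
print(doubled_arr[:5])  # [0 2 4 6 8]
```

Both approaches give the same values; the difference is that the array version stores and processes the data as a single typed block rather than as many individual Python objects.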

If you followed the advice outlined in the Preface and installed the Anaconda stack, you already have NumPy installed and ready to go. If you’re more the do-it-yourself type, you can go to http://www.numpy.org/ and follow the installation instructions found there. Once you do, you can import NumPy and double-check the version:

import numpy
numpy.__version__

For the pieces of the package discussed here, I’d recommend NumPy version 1.8 or later. By convention, you’ll find that most people in the SciPy/PyData world will import NumPy using np as an alias:

import numpy as np

Throughout this chapter, and indeed the rest of the book, you’ll find that this is the way we will import and use NumPy.
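The version check can also be done programmatically. The sketch below compares the installed version against the 1.8 recommendation; the helper name `version_at_least` is hypothetical, and it assumes the version string begins with numeric major and minor components separated by dots.

```python
import numpy as np

def version_at_least(version, minimum=(1, 8)):
    """Return True if a version string such as '1.26.4' is >= minimum.

    Hypothetical helper: only the leading major and minor numbers are compared.
    """
    major, minor = version.split(".")[:2]
    return (int(major), int(minor)) >= minimum

print(np.__version__)
print(version_at_least(np.__version__))
```

Comparing tuples of integers rather than raw strings avoids the pitfall that "1.10" sorts before "1.8" when compared as text.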

Reminder about Built-In Documentation

As you read through this chapter, don’t forget that IPython gives you the ability to quickly explore the contents of a package (by using the tab-completion feature), as well as the documentation of various functions (using the ? character; refer back to Help and Documentation in IPython).

For example, to display all the contents of the numpy namespace, you can type this:

In [3]: np.<TAB>

And to display NumPy’s built-in documentation, you can use this:

In [4]: np?

More detailed documentation, along with tutorials and other resources, can be found at http://www.numpy.org.
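Outside IPython, similar information is available through standard Python introspection. This is a sketch of rough equivalents rather than a replacement for the IPython features; `np.sum` is just an arbitrary function picked for illustration.

```python
import numpy as np

# Rough equivalent of np.<TAB>: list the public names in the numpy namespace.
public_names = [name for name in dir(np) if not name.startswith("_")]
print(len(public_names))
print(public_names[:10])

# Rough equivalent of np.sum?: the docstring is available as __doc__,
# and help(np.sum) prints the full text.
first_line = np.sum.__doc__.strip().splitlines()[0]
print(first_line)
```

The exact number of names and the wording of the docstring depend on the installed NumPy version.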