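NumPy is an open-source foundational library for scientific computing in Python.

A powerful N-dimensional array object, ndarray
Broadcasting functions
Tools for integrating C/C++/Fortran code
Linear algebra, Fourier transforms, random number generation, and more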

Installation

OSX: pip install numpy

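The ndarray array

An ndarray consists of two parts:

The actual data
Metadata describing that data (dimensions, data type, and so on)

Attributes

Attribute	Description
ndim	Number of dimensions (axes)
shape	Size of the array along each dimension, as a tuple
size	Total number of elements
dtype	Data type of the elements
itemsize	Size of each element in bytes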

>>> import numpy as np
>>> a = np.array([[0,1,2,3],[1,2,3,4]])
>>> print(a)
[[0 1 2 3]
 [1 2 3 4]]
>>> a.ndim
2
>>> a.shape
(2, 4)
>>> a.size
8
>>> a.dtype
dtype('int64')
>>> a.itemsize
8

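An ndarray can be built from non-homogeneous objects, in which case its elements have the object type. Such arrays cannot take advantage of NumPy's strengths and should be avoided where possible. Recent NumPy versions require dtype=object to be passed explicitly for ragged input like this.

>>> b = np.array([[1,2,3],[1,2]], dtype=object)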
>>> print(b)
[list([1, 2, 3]) list([1, 2])]
>>> b.dtype
dtype('O')  # 'O' is the object type
>>> b.shape
(2,)

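ndarray element types

Data type	Description
bool	Boolean, True or False
intc	Same as int in C
intp	Integer used for indexing, same as ssize_t in C
int8	8-bit integer [-128, 127]
int16	16-bit integer [-32768, 32767]
int32	32-bit integer [-2^31, 2^31-1]
int64	64-bit integer [-2^63, 2^63-1]
uint8	8-bit unsigned integer
uint16	16-bit unsigned integer
uint32	32-bit unsigned integer
uint64	64-bit unsigned integer
float16	16-bit half-precision floating-point number
float32	32-bit single-precision floating-point number
float64	64-bit double-precision floating-point number
complex64	Complex number whose real and imaginary parts are both 32-bit floats
complex128	Complex number whose real and imaginary parts are both 64-bit floats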
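
The ranges in the table can be checked from within NumPy. The following is a minimal sketch using np.iinfo and np.finfo; the variable names are only illustrative.

```python
import numpy as np

# Integer limits reported by NumPy for a few of the types in the table.
int8_info = np.iinfo(np.int8)
int32_info = np.iinfo(np.int32)
uint16_info = np.iinfo(np.uint16)
print(int8_info.min, int8_info.max)      # -128 127
print(int32_info.min, int32_info.max)    # -2147483648 2147483647
print(uint16_info.min, uint16_info.max)  # 0 65535

# Number of bits occupied by each floating-point type.
float_bits = {t.__name__: np.finfo(t).bits
              for t in (np.float16, np.float32, np.float64)}
print(float_bits)
```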

Creating ndarrays

From Python lists, tuples, and similar types

np.array(list/tuple) or np.array(list/tuple, dtype=np.float32). When np.array is called without dtype, NumPy infers a dtype from the data.
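
A short sketch of how the inferred dtype depends on the input, and how an explicit dtype overrides it. The variable names are illustrative, and the width of the inferred integer type depends on the platform.

```python
import numpy as np

from_list = np.array([1, 2, 3])                      # all ints: an integer dtype
from_mixed = np.array([1, 2.5, 3])                   # int and float mixed: float64
from_tuple = np.array((1, 2, 3), dtype=np.float32)   # explicit dtype wins

print(from_list.dtype.kind)   # i
print(from_mixed.dtype)       # float64
print(from_tuple.dtype)       # float32
```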

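Using NumPy functions

Function	Description
np.arange(n)	Returns an integer array from 0 to n-1
np.ones(shape)	Creates an array of the given shape filled with 1
np.zeros(shape)	Creates an array of the given shape filled with 0
np.full(shape,val)	Creates an array of the given shape filled with val
np.eye(n)	Returns an n*n array with 1 on the diagonal and 0 elsewhere
np.ones_like(a)	Creates an array of 1s with the same shape as a
np.zeros_like(a)	Creates an array of 0s with the same shape as a
np.full_like(a,val)	Creates an array filled with val with the same shape as a
np.linspace()	Fills an array with evenly spaced values between a start and a stop value
np.concatenate()	Joins two or more arrays into a new array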

arange

>>> d = np.arange(10)
>>> print(d)
[0 1 2 3 4 5 6 7 8 9]

ones zeros full

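ones and zeros produce floating-point elements (float64) by default, whereas full takes its dtype from val, which is an integer here. To get integer arrays from ones and zeros, add the argument dtype=np.int64.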

>>> print(np.zeros((3,4)))
[[ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]
 [ 0.  0.  0.  0.]]
>>> print(np.ones((3,4)))
[[ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]
 [ 1.  1.  1.  1.]]
>>> print(np.full((3,4), 8))
[[8 8 8 8]
 [8 8 8 8]
 [8 8 8 8]]
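
The *_like functions from the table and the dtype argument are not shown above. A minimal sketch follows; template and the other names are illustrative only.

```python
import numpy as np

template = np.arange(6).reshape((2, 3))        # integer template array

int_zeros = np.zeros((2, 3), dtype=np.int64)   # integer zeros instead of float64
like_ones = np.ones_like(template)             # same shape and dtype as template
like_full = np.full_like(template, 7)          # filled with 7, dtype of template
float_full = np.full((2, 3), 8.0)              # a float fill value gives float64

print(int_zeros.dtype, float_full.dtype)       # int64 float64
print(like_ones.dtype == template.dtype)       # True
print(like_full)
```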

eye

>>> c = np.eye(5)
>>> print(c)
[[ 1.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.]
 [ 0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  1.]]

linspace

>>> print(np.linspace(1,10,4))
[  1.   4.   7.  10.]
>>> print(np.linspace(1,10,4, endpoint=False))
[ 1.    3.25  5.5   7.75]

endpoint controls whether the stop value (10 here) is included as the last element.
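
The effect of endpoint on the spacing can be made explicit with retstep=True, which also returns the step that was used. A small sketch with illustrative names:

```python
import numpy as np

closed, closed_step = np.linspace(1, 10, 4, retstep=True)
half_open, open_step = np.linspace(1, 10, 4, endpoint=False, retstep=True)

# endpoint=True:  step = (10 - 1) / (4 - 1) = 3.0
# endpoint=False: step = (10 - 1) / 4       = 2.25
print(closed_step, open_step)
print(closed[-1], half_open[-1])   # 10.0 7.75
```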

concatenate

>>> print(np.concatenate((c,c)))
[[ 1.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.]
 [ 0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  1.]
 [ 1.  0.  0.  0.  0.]
 [ 0.  1.  0.  0.  0.]
 [ 0.  0.  1.  0.  0.]
 [ 0.  0.  0.  1.  0.]
 [ 0.  0.  0.  0.  1.]]
>>> print(np.concatenate((c,c)).shape)
(10, 5)
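
concatenate joins along the first axis by default. The sketch below shows the axis argument selecting a different axis; the names are illustrative.

```python
import numpy as np

c = np.eye(5)
stacked_rows = np.concatenate((c, c))           # axis=0 by default
stacked_cols = np.concatenate((c, c), axis=1)   # join along the second axis

print(stacked_rows.shape)   # (10, 5)
print(stacked_cols.shape)   # (5, 10)
```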

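Changing ndarray dimensions

Function	Description
.reshape(shape)	Returns an array of the given shape with the same elements; the original array is unchanged
.resize(shape)	Same as reshape, but modifies the original array in place
.swapaxes(ax1,ax2)	Swaps two of the array's n dimensions
.flatten()	Collapses the array to one dimension and returns it as a new array; the original array is unchanged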

reshape

>>> f = np.ones((2,3,4))
>>> print(f.reshape((3,8)))
[[ 1.  1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.]]
>>> print(f)
[[[ 1.  1.  1.  1.]
  [ 1.  1.  1.  1.]
  [ 1.  1.  1.  1.]]

 [[ 1.  1.  1.  1.]
  [ 1.  1.  1.  1.]
  [ 1.  1.  1.  1.]]]

resize

>>> f.resize((3,8))
>>> print(f)
[[ 1.  1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.]
 [ 1.  1.  1.  1.  1.  1.  1.  1.]]

swapaxes

>>> print(np.arange(12).reshape((3,4)))
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]]
>>> print(np.arange(12).reshape((3,4)).swapaxes(0,1))
[[ 0  4  8]
 [ 1  5  9]
 [ 2  6 10]
 [ 3  7 11]]

flatten

>>> print(np.arange(12).reshape((3,4)).flatten())
[ 0  1  2  3  4  5  6  7  8  9 10 11]
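
Two related details, sketched under the assumption of an ordinary contiguous array: reshape accepts -1 for one dimension and infers its size, and flatten always returns a copy whereas ravel (not covered in the table) returns a view when it can.

```python
import numpy as np

m = np.arange(12).reshape((3, -1))   # -1 lets NumPy infer the size: (3, 4)

flat_copy = m.flatten()   # always a copy
flat_copy[0] = 99         # does not touch m

flat_view = m.ravel()     # a view here, because m is contiguous
flat_view[1] = 77         # writes through to m

print(m.shape)            # (3, 4)
print(m[0, 0], m[0, 1])   # 0 77
```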

Converting ndarray types

>>> g = np.arange(12)
>>> g.dtype
dtype('int64')
>>> print(g)
[ 0  1  2  3  4  5  6  7  8  9 10 11]
>>> g.astype(np.float64)
array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,
        11.])
>>> print(g)
[ 0  1  2  3  4  5  6  7  8  9 10 11]

.astype returns a new array with the requested data type; the data type of the original array does not change.
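
A small sketch of both directions of conversion. Note that converting floats to integers discards the fractional part rather than rounding; the names are illustrative.

```python
import numpy as np

g = np.arange(12)
as_float = g.astype(np.float64)      # new array; g keeps its integer dtype

values = np.array([1.9, -1.9, 2.5])
as_int = values.astype(np.int64)     # truncates toward zero

print(g.dtype.kind, as_float.dtype)  # i float64
print(as_int.tolist())               # [1, -1, 2]
```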

Converting a NumPy array to a Python list

>>> type(g)
<type 'numpy.ndarray'>
>>> type(g.tolist())
<type 'list'>
>>> g.tolist()
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

Accessing ndarray elements

Indexing

>>> h = np.arange(24).reshape((2,3,4))
>>> print(h)
[[[ 0  1  2  3]
  [ 4  5  6  7]
  [ 8  9 10 11]]

 [[12 13 14 15]
  [16 17 18 19]
  [20 21 22 23]]]
>>> h[1,2,3]
23
>>> h[-1,-2,-3]
17

Slicing

>>> h[:,1,-3]
array([ 5, 17])
>>> h[:,1:3,:]
array([[[ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[16, 17, 18, 19],
        [20, 21, 22, 23]]])
>>> h[:,:,::2]
array([[[ 0,  2],
        [ 4,  6],
        [ 8, 10]],

       [[12, 14],
        [16, 18],
        [20, 22]]])
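
Slices like the ones above are views onto the original data, not copies. A minimal sketch of what that means in practice (window and safe are illustrative names):

```python
import numpy as np

h = np.arange(24).reshape((2, 3, 4))

window = h[:, 1:3, :]        # basic slice: a view onto h
window[0, 0, 0] = -1         # also changes h[0, 1, 0]

safe = h[:, :, ::2].copy()   # explicit, independent copy
safe[0, 0, 0] = 100          # h is unaffected

print(h[0, 1, 0])   # -1
print(h[0, 0, 0])   # 0
```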

ndarray operations

Operations with scalars

>>> h / h.mean()
array([[[ 0.        ,  0.08695652,  0.17391304,  0.26086957],
        [ 0.34782609,  0.43478261,  0.52173913,  0.60869565],
        [ 0.69565217,  0.7826087 ,  0.86956522,  0.95652174]],

       [[ 1.04347826,  1.13043478,  1.2173913 ,  1.30434783],
        [ 1.39130435,  1.47826087,  1.56521739,  1.65217391],
        [ 1.73913043,  1.82608696,  1.91304348,  2.        ]]])
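
Dividing by h.mean() applies one scalar to every element. The same broadcasting mechanism also stretches a smaller array across a larger one, as in this sketch (row is an illustrative name):

```python
import numpy as np

h = np.arange(24).reshape((2, 3, 4))

scaled = h * 2                      # scalar applied to every element
row = np.array([10, 20, 30, 40])    # shape (4,)
shifted = h + row                   # row is broadcast over the first two axes

print(scaled[1, 2, 3])        # 46
print(shifted[0, 0].tolist()) # [10, 21, 32, 43]
print(shifted.shape)          # (2, 3, 4)
```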

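Unary functions

Function	Description
np.abs(x) np.fabs(x)	Absolute value of each element
np.sqrt(x)	Square root of each element
np.square(x)	Square of each element
np.log(x) np.log10(x) np.log2(x)	Natural, base-10, and base-2 logarithm of each element
np.ceil(x) np.floor(x)	Ceiling and floor of each element
np.rint(x)	Each element rounded to the nearest integer
np.modf(x)	Fractional and integer parts of each element, returned as two separate arrays
np.cos/cosh/sin/sinh/tan/tanh(x)	Trigonometric and hyperbolic functions
np.exp(x)	Exponential of each element
np.sign(x)	Sign of each element: 1 (+), 0 (0), -1 (-)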
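
A few of the less obvious functions from the table, sketched on a small illustrative array. np.rint rounds halves to the nearest even integer, so 2.5 becomes 2 and 1.5 becomes 2.

```python
import numpy as np

x = np.array([-2.5, -0.4, 0.0, 1.5, 2.5])

magnitudes = np.fabs(x)     # absolute values
rounded = np.rint(x)        # nearest integer, halves go to the even neighbour
frac, whole = np.modf(x)    # fractional and integer parts, same sign as x
signs = np.sign(x)          # -1, 0 or 1 per element

print(rounded)
print(frac, whole)
print(signs)
```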

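Binary functions

Function	Description
+ - * / **	Addition, subtraction, multiplication, division, power
np.maximum(x,y) np.fmax()	Element-wise maximum
np.minimum(x,y) np.fmin()	Element-wise minimum
np.mod(x,y)	Element-wise modulo
np.copysign(x,y)	Copies the sign of each element of y onto the corresponding element of x
> < >= <= == !=	Element-wise comparison producing a boolean array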

>>> h > (h/h.mean())
array([[[False,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True]],

       [[ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True]]], dtype=bool)
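
The sketch below illustrates several entries from the table on small made-up arrays, including how maximum and fmax differ when NaN is present and how a boolean array can be used to select elements.

```python
import numpy as np

x = np.array([1.0, -4.0, np.nan])
y = np.array([3.0, -5.0, 2.0])

elementwise_max = np.maximum(x, y)   # NaN propagates into the result
nan_ignoring_max = np.fmax(x, y)     # NaN is ignored when the other value is a number

signed = np.copysign([1.0, 2.0, 3.0], [-1.0, 1.0, -1.0])   # magnitudes of x, signs of y
remainder = np.mod([7, -7], 3)       # result takes the sign of the divisor

mask = y > 0                         # boolean array
positives = y[mask]                  # boolean array used as an index

print(elementwise_max, nan_ignoring_max)
print(signed, remainder, positives)
```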

NumPy I/O

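CSV files

CSV can only store one- and two-dimensional arrays effectively.

Writing a CSV file

np.savetxt(fname, array, fmt='%.18e', delimiter=' ')

fname
A file name or file object; a file name ending in .gz is written gzip-compressed.
array
The array to write to the file.
fmt
Format used for each value, e.g. %d, %.2f, %.18e.
delimiter
String separating the columns; the default is a single space.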

>>> np.savetxt('h.csv', h.reshape((3,8)), fmt="%d", delimiter=',')

$ cat h.csv
0,1,2,3,4,5,6,7
8,9,10,11,12,13,14,15
16,17,18,19,20,21,22,23

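Reading a CSV file

np.loadtxt(fname, dtype=float, delimiter=None, unpack=False)

fname
A file, file name, or generator; .gz and .bz2 files are decompressed automatically.
dtype
Data type of the result; optional.
delimiter
String separating the columns; by default any whitespace.
unpack
If True, the result is transposed so that each column can be assigned to a separate variable.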

>>> np.loadtxt('h.csv', delimiter=',')
array([[  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.],
       [  8.,   9.,  10.,  11.,  12.,  13.,  14.,  15.],
       [ 16.,  17.,  18.,  19.,  20.,  21.,  22.,  23.]])
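
A round trip through CSV, sketched with a temporary directory so that it does not overwrite anything. The path and variable names are assumptions made for the example.

```python
import os
import tempfile
import numpy as np

h = np.arange(24).reshape((2, 3, 4))
csv_path = os.path.join(tempfile.mkdtemp(), "h.csv")   # scratch location for the demo

# A 3-D array has to be reshaped to 2-D before it can be written as CSV.
np.savetxt(csv_path, h.reshape((3, 8)), fmt="%d", delimiter=",")

as_float = np.loadtxt(csv_path, delimiter=",")                  # default dtype is float
as_int = np.loadtxt(csv_path, dtype=np.int64, delimiter=",")    # ask for integers
columns = np.loadtxt(csv_path, delimiter=",", unpack=True)      # transposed result

print(as_float.dtype, as_int.dtype)   # float64 int64
print(columns.shape)                  # (8, 3)
```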

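Multidimensional data I/O

Writing a file

ndarray.tofile(fid, sep='', format='%s')

fid
An open file object or a file name.
sep
Separator between items; if it is the empty string, the file is written in binary.
format
Format string for each item when writing text.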

>>> h.tofile('h.dat', sep=',', format='%d')

$ cat h.dat
0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23

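Reading a file

np.fromfile(file, dtype=float, count=-1, sep='')

file
An open file object or a file name.
dtype
Data type of the values being read.
count
Number of items to read; -1 reads the whole file.
sep
Separator between items; if it is the empty string, the file is read as binary.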

>>> h.tofile('h.dat', sep=',', format='%d')
>>> np.fromfile('h.dat', dtype=np.int64, sep=',')
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
       17, 18, 19, 20, 21, 22, 23])
>>> np.fromfile('h.dat', dtype=np.int64, sep=',').reshape((3,8))
array([[ 0,  1,  2,  3,  4,  5,  6,  7],
       [ 8,  9, 10, 11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20, 21, 22, 23]])

Reading a file with fromfile requires knowing the array's shape and data type in advance, because the file stores neither.
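
The same round trip in binary mode (sep left empty), as a sketch. The temporary path is an assumption for the example; the point is that dtype and shape both have to be supplied by the reader.

```python
import os
import tempfile
import numpy as np

h = np.arange(24, dtype=np.int64).reshape((2, 3, 4))
dat_path = os.path.join(tempfile.mkdtemp(), "h.dat")   # scratch location for the demo

h.tofile(dat_path)                              # sep='' means raw binary, no header
flat = np.fromfile(dat_path, dtype=np.int64)    # dtype must match what was written
restored = flat.reshape(h.shape)                # shape is not stored in the file

print(flat.shape)                    # (24,)
print(np.array_equal(restored, h))   # True
```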

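Convenient file reading and writing

Writing a file

np.save(file, array) or np.savez(file, array)

file
File name, with the .npy extension for np.save and the .npz extension for the archive written by np.savez (np.savez_compressed writes a compressed archive).
array
The array variable to store.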

>>> np.save('h.npy',h)

$ cat h.npy
�NUMPYF{'descr': '<i8', 'fortran_order': False, 'shape': (2, 3, 4), }

Reading a file

np.load(file)

file
File name, with the .npy or .npz extension.

>>> np.load('h.npy')
array([[[ 0,  1,  2,  3],
        [ 4,  5,  6,  7],
        [ 8,  9, 10, 11]],

       [[12, 13, 14, 15],
        [16, 17, 18, 19],
        [20, 21, 22, 23]]])
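
np.savez can hold several arrays in one .npz archive. A minimal sketch follows; the keyword names h and weights and the temporary path are choices made for this example.

```python
import os
import tempfile
import numpy as np

h = np.arange(24).reshape((2, 3, 4))
weights = np.array([10, 5, 1])
npz_path = os.path.join(tempfile.mkdtemp(), "arrays.npz")   # scratch location

np.savez(npz_path, h=h, weights=weights)   # keyword names become keys in the archive

with np.load(npz_path) as archive:
    names = sorted(archive.files)          # names of the stored arrays
    h_loaded = archive["h"]
    weights_loaded = archive["weights"]

print(names)            # ['h', 'weights']
print(h_loaded.shape)   # (2, 3, 4)
```

Unlike the tofile/fromfile pair, shape and dtype are stored in the file and come back without being specified.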

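NumPy random functions

The functions below live in the np.random submodule: np.random.rand(), np.random.randn(), np.random.randint() …

Generating random numbers

Function	Description
rand(d0,d1…,dn)	Array of shape d0-dn of floats uniformly distributed over [0,1)
randn(d0,d1…,dn)	Array of shape d0-dn drawn from the standard normal distribution
randint(low[,high,size])	Random integer, or integer array of shape size, in the range [low,high)
seed(s)	Seeds the random number generator with the given value s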

>>> np.random.rand(2,3,4)
array([[[ 0.90829822,  0.22946923,  0.6652444 ,  0.99272168],
        [ 0.06080758,  0.44248097,  0.93218605,  0.52258628],
        [ 0.72315102,  0.15270485,  0.78927709,  0.69494654]],

       [[ 0.35967592,  0.94780636,  0.70360292,  0.78655759],
        [ 0.99834954,  0.5114802 ,  0.41374511,  0.52093554],
        [ 0.76139948,  0.71690824,  0.05146873,  0.24752045]]])
>>> np.random.randn(2,3,4)
array([[[ 0.27215006,  0.88541886, -1.23316095, -0.33090933],
        [-0.02253028,  0.09407495,  1.76527947, -0.29149431],
        [-0.3821042 ,  0.29664845, -0.73678807, -1.0182318 ]],

       [[-0.46816445, -1.62242958, -0.98800274,  0.5167897 ],
        [-0.14876304,  0.60325633, -1.22824121,  1.40063683],
        [ 0.7508062 ,  0.42850387,  0.67736631,  0.68300665]]])
>>> np.random.randint(50,80,(2,3,4))
array([[[67, 79, 65, 56],
        [76, 60, 58, 70],
        [74, 61, 53, 71]],

       [[72, 67, 61, 53],
        [58, 66, 52, 69],
        [60, 70, 73, 64]]])
>>> np.random.seed(1111)
>>> np.random.randint(50,80,(2,3,4))
array([[[78, 73, 55, 67],
        [62, 52, 70, 74],
        [72, 70, 61, 64]],

       [[58, 56, 62, 64],
        [68, 72, 58, 59],
        [79, 69, 60, 62]]])
>>> np.random.seed(1111)
>>> np.random.randint(50,80,(2,3,4))
array([[[78, 73, 55, 67],
        [62, 52, 70, 74],
        [72, 70, 61, 64]],

       [[58, 56, 62, 64],
        [68, 72, 58, 59],
        [79, 69, 60, 62]]])
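
The session above uses the global np.random.seed. Newer NumPy releases (1.17 and later) also offer a Generator object created with np.random.default_rng, which this article does not cover. The sketch below assumes such a version and shows the same reproducibility idea without global state.

```python
import numpy as np

rng_a = np.random.default_rng(1111)   # two generators with the same seed
rng_b = np.random.default_rng(1111)

sample_a = rng_a.integers(50, 80, size=(2, 3, 4))   # integers in [50, 80)
sample_b = rng_b.integers(50, 80, size=(2, 3, 4))

print(np.array_equal(sample_a, sample_b))   # True: same seed, same sequence
```

The values differ from those produced by np.random.seed(1111) above, since the two interfaces use different underlying generators.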

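Shuffling and sampling

Function	Description
shuffle(a)	Randomly reorders array a along its outermost dimension; a is modified
permutation(a)	Randomly reorders array a along its outermost dimension and returns a new array; a is unchanged
choice(a[,size,replace,p])	Draws elements from a one-dimensional array with probabilities p into a new array of shape size; replace says whether elements may be drawn more than once (default True)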

>>> i = h.reshape((3,8))
>>> print(i)
[[ 0  1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14 15]
 [16 17 18 19 20 21 22 23]]
>>> print(np.random.permutation(i))
[[ 8  9 10 11 12 13 14 15]
 [16 17 18 19 20 21 22 23]
 [ 0  1  2  3  4  5  6  7]]
>>> print(i)
[[ 0  1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14 15]
 [16 17 18 19 20 21 22 23]]
>>> np.random.shuffle(i)
>>> print(i)
[[16 17 18 19 20 21 22 23]
 [ 0  1  2  3  4  5  6  7]
 [ 8  9 10 11 12 13 14 15]]
>>> np.random.choice(i.flatten(), size=(3,4))
array([[ 1,  1, 15, 10],
       [16, 20,  0, 21],
       [13,  9,  5, 10]])
>>> np.random.choice(i.flatten(), size=(3,4), replace=False)
array([[ 6,  2, 12, 19],
       [ 7,  9, 13, 22],
       [17, 21, 20,  0]])
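
One side effect of the session above is easy to miss: i was created with reshape, which returns a view of h, so shuffling i in place also reorders the data of h. A minimal sketch of that behaviour (j is an illustrative name):

```python
import numpy as np

h = np.arange(24).reshape((2, 3, 4))
i = h.reshape((3, 8))                 # a view: shares memory with h
print(np.shares_memory(i, h))         # True

np.random.seed(0)
np.random.shuffle(i)                  # reorders the rows of i in place, so h changes too
print(np.array_equal(i.reshape((2, 3, 4)), h))   # True

j = h.reshape((3, 8)).copy()          # independent copy; shuffling j would leave h alone
print(np.shares_memory(j, h))         # False
```

This is why h prints in a different order in the statistics examples that follow.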

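Random distributions

Function	Description
uniform(low,high,size)	Uniformly distributed array; low is the start value, high the end value, size the shape
normal(loc,scale,size)	Normally distributed array; loc is the mean, scale the standard deviation, size the shape
poisson(lam,size)	Poisson-distributed array; lam is the rate at which random events occur, size the shape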

>>> np.random.uniform(0,10,(3,4))
array([[ 7.60854954,  6.30411065,  0.79332327,  1.30857967],
       [ 3.00316754,  3.22281521,  1.61121489,  3.15193241],
       [ 5.25528483,  6.45849382,  7.50645475,  0.56027292]])
>>> np.random.normal(10,5,(3,4))
array([[  8.29746473,  14.35578466,   9.40820399,   9.47806228],
       [ 20.81393417,   6.16603104,   2.10005772,  11.39870886],
       [ 11.81287925,   5.19702161,  11.13850168,  13.95413254]])
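
poisson is listed in the table but not shown above. A short sketch follows; lam=4.0 and the sample size are arbitrary values picked for illustration.

```python
import numpy as np

np.random.seed(0)
counts = np.random.poisson(4.0, size=(1000, 50))   # lam=4.0 events per interval

print(counts.shape)        # (1000, 50)
print(counts.dtype.kind)   # i: the samples are integer counts
print(counts.mean())       # close to lam for a sample this large
```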

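NumPy statistical functions

NumPy provides statistical functions directly at the top level of the library: np.sum(), np.std(), np.var() …

Function	Description
sum(a, axis=None)	Sum of array a, over all elements by default or along the dimension given by axis
mean(a, axis=None)	Mean of array a, over all elements by default or along the dimension given by axis
average(a, axis=None, weights=None)	Weighted average of array a, over all elements by default or along the dimension given by axis
std(a, axis=None)	Standard deviation of array a, over all elements by default or along the dimension given by axis
var(a, axis=None)	Variance of array a, over all elements by default or along the dimension given by axis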

>>> print(h)
[[[16 17 18 19]
  [20 21 22 23]
  [ 0  1  2  3]]

 [[ 4  5  6  7]
  [ 8  9 10 11]
  [12 13 14 15]]]
>>> np.sum(h)
276
>>> np.sum(h,axis=0)
array([[20, 22, 24, 26],
       [28, 30, 32, 34],
       [12, 14, 16, 18]])
>>> np.sum(h,axis=1)
array([[36, 39, 42, 45],
       [24, 27, 30, 33]])
>>> np.sum(h,axis=2)
array([[70, 86,  6],
       [22, 38, 54]])
>>> np.average(h,axis=1,weights=[10,5,1])
array([[ 16.25,  17.25,  18.25,  19.25],
       [  5.75,   6.75,   7.75,   8.75]])
# (16*10 + 20*5 + 0*1)/(10+5+1)=16.25
# (17*10 + 21*5 + 1*1)/(10+5+1)=17.25
# (18*10 + 22*5 + 2*1)/(10+5+1)=18.25
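
mean, std and var take the same axis argument as sum. A minimal sketch on a freshly created, unshuffled array (names are illustrative):

```python
import numpy as np

h = np.arange(24).reshape((2, 3, 4))

col_mean = np.mean(h, axis=0)   # shape (3, 4): average over the first axis
row_std = np.std(h, axis=2)     # shape (2, 3): population standard deviation
row_var = np.var(h, axis=2)     # shape (2, 3): population variance

print(col_mean[0].tolist())                 # [6.0, 7.0, 8.0, 9.0]
print(row_var[0, 0])                        # 1.25, the variance of 0, 1, 2, 3
print(np.allclose(row_std ** 2, row_var))   # True
```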

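Function	Description
min(a) max(a)	Minimum and maximum value in array a
argmin(a) argmax(a)	Index of the minimum and maximum value of a in the flattened (one-dimensional) array
unravel_index(index,shape)	Converts the one-dimensional index into a multidimensional index for the given shape
ptp(a)	Difference between the maximum and minimum value of a
median(a)	Median of the elements of a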

>>> print(h)
[[[16 17 18 19]
  [20 21 22 23]
  [ 0  1  2  3]]

 [[ 4  5  6  7]
  [ 8  9 10 11]
  [12 13 14 15]]]
>>> np.max(h)
23
>>> np.argmax(h)
7
>>> np.unravel_index(np.argmax(h), (2,3,4))
(0, 1, 3)
>>> np.ptp(h)
23
>>> np.median(h)
11.5
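
argmin works the same way as argmax. The sketch below plants a known minimum in a fresh array and recovers its position; the planted value and names are made up for the example.

```python
import numpy as np

h = np.arange(24).reshape((2, 3, 4))
h[1, 0, 2] = -5                               # plant a unique minimum

flat_index = np.argmin(h)                     # position in the flattened array
multi_index = np.unravel_index(flat_index, h.shape)

print(flat_index)       # 14, i.e. 1*12 + 0*4 + 2
print(h[multi_index])   # -5
```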

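NumPy gradient function

Gradient

The rate of change between consecutive values, i.e. the slope. Given three values a, b, c, where b has a neighbor on both sides, the gradient at b is (c-a)/2. Given only two values a and b, where each has a neighbor on one side only, the gradient at a or b is (b-a)/1.

Computing the gradient

np.gradient(a) computes the gradient of array a; if a is multidimensional, it returns one gradient array per dimension.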
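
The two rules are easiest to follow on a one-dimensional array. A small worked sketch with made-up values:

```python
import numpy as np

values = np.array([1, 2, 4, 7, 11])
slopes = np.gradient(values)

# Edges have one neighbour only: 2 - 1 = 1 and 11 - 7 = 4.
# Interior points use both neighbours: (4-1)/2, (7-2)/2, (11-4)/2.
print(slopes.tolist())   # [1.0, 1.5, 2.5, 3.5, 4.0]
```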

>>> np.gradient(h)
[array([[[-12., -12., -12., -12.],
         [-12., -12., -12., -12.],
         [ 12.,  12.,  12.,  12.]],

        [[-12., -12., -12., -12.],
         [-12., -12., -12., -12.],
         [ 12.,  12.,  12.,  12.]]]), 
 array([[[  4.,   4.,   4.,   4.],
         [ -8.,  -8.,  -8.,  -8.],
         [-20., -20., -20., -20.]],

        [[  4.,   4.,   4.,   4.],
         [  4.,   4.,   4.,   4.],
         [  4.,   4.,   4.,   4.]]]),
 array([[[ 1.,  1.,  1.,  1.],
         [ 1.,  1.,  1.,  1.],
         [ 1.,  1.,  1.,  1.]],

        [[ 1.,  1.,  1.,  1.],
         [ 1.,  1.,  1.,  1.],
         [ 1.,  1.,  1.,  1.]]])]
>>> print(h)
[[[16 17 18 19]
  [20 21 22 23]
  [ 0  1  2  3]]

 [[ 4  5  6  7]
  [ 8  9 10 11]
  [12 13 14 15]]]

Gradients are used to find edges in audio or image data: where the gradient changes sharply, there is an edge.
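
As a toy illustration of the idea, the sketch below locates a step in a made-up one-dimensional signal by looking for the largest absolute gradient. Real audio or image data would need more care, for example smoothing and a threshold.

```python
import numpy as np

signal = np.array([0, 0, 0, 10, 10, 10])      # a step between index 2 and 3
slopes = np.gradient(signal)                  # nonzero only around the step
edge_at = int(np.argmax(np.abs(slopes)))      # first position of the largest change

print(slopes.tolist())   # [0.0, 0.0, 5.0, 5.0, 0.0, 0.0]
print(edge_at)           # 2
```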

References & acknowledgements

Python数据分析与展示 (Python Data Analysis and Presentation)