Python Data Science Study Notes: Universal Functions

2019-07-11


Data Science

This post is my own study notes. The reference book is the O'Reilly Python Data Science Handbook by Jake VanderPlas, translated by 何敏煌: https://www.books.com.tw/products/0010774364

Basic operators

Operator	Equivalent ufunc
+	np.add
-	np.subtract
*	np.multiply
/	np.divide
//	np.floor_divide
**	np.power
%	np.mod
abs	np.absolute
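
As a quick check of the table above, here is a minimal sketch showing that an operator applied to a NumPy array gives the same result as the listed ufunc. The sample values and variable names are only illustration data, not taken from the book.

```python
import numpy as np

x = np.arange(-2, 3)  # arbitrary sample values: [-2 -1  0  1  2]

added = np.add(x, 2)             # same as x + 2
floored = np.floor_divide(x, 2)  # same as x // 2
powered = np.power(x, 2)         # same as x ** 2
absolute = np.absolute(x)        # same as abs(x)

print(added)     # [0 1 2 3 4]
print(floored)   # [-1 -1  0  0  1]
print(powered)   # [4 1 0 1 4]
print(absolute)  # [2 1 0 1 2]
```

Note that floor division rounds toward negative infinity, which is why -1 // 2 gives -1 rather than 0.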

Trigonometric, exponential, and logarithmic functions

Function	Equivalent ufunc
$π$	np.pi (a constant, not a ufunc)
$sin$	np.sin
$cos$	np.cos
$tan$	np.tan
$arcsin$	np.arcsin
$arccos$	np.arccos
$arctan$	np.arctan
$log_e$	np.log
$log_2$	np.log2
$log_{10}$	np.log10
$e^x$	np.exp
$log_e(1+x)$, more accurate for very small x	np.log1p
$e^x - 1$, more accurate for very small x	np.expm1
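
To see why the last two rows of the table exist, the sketch below compares the naive formulas with np.log1p and np.expm1 for a tiny input. The input 1e-10 is just an assumed example value; for such small x the true answers are both extremely close to x itself, so the distance from x is used here as a rough error measure.

```python
import numpy as np

x = 1e-10  # assumed example input, chosen to be very small

naive_log = np.log(1 + x)    # 1 + x is rounded to a float before the log
precise_log = np.log1p(x)    # computes log(1 + x) without forming 1 + x

naive_exp = np.exp(x) - 1    # subtracting 1 cancels most significant digits
precise_exp = np.expm1(x)    # computes e^x - 1 directly

# For tiny x, both true values are approximately x.
print("log error:", abs(naive_log - x), "vs", abs(precise_log - x))
print("exp error:", abs(naive_exp - x), "vs", abs(precise_exp - x))
```

The second number on each line should be many orders of magnitude smaller than the first.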

Specifying where a ufunc writes its output with out

import numpy as np

x = np.arange(5)
y = np.empty(5)
np.multiply(x, 10, out=y)  # write the result directly into y
# y is now [0, 10, 20, 30, 40]

y = np.zeros(10)
np.power(2, x, out=y[::2])  # x is still np.arange(5); write into every other element of y
# y is now [1, 0, 2, 0, 4, 0, 8, 0, 16, 0]
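
As a small extra illustration of out, the sketch below overwrites an array in place and confirms that no new array is created. The variable names are my own; the float dtype is chosen so the result fits the output array.

```python
import numpy as np

x = np.arange(5, dtype=float)

# Passing x itself as out overwrites it in place; the ufunc returns
# a reference to that same array rather than allocating a new one.
result = np.multiply(x, 10, out=x)

print(result is x)  # True
print(x)            # [ 0. 10. 20. 30. 40.]
```

For small arrays this makes little difference, but for very large arrays it can avoid allocating a temporary copy.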

Aggregation methods

reduce: apply the operation across the array from start to end and return only the final result

x = np.arange(1, 6)
np.add.reduce(x)

accumulate: apply the operation from start to end and keep every intermediate result

x = np.arange(1, 6)
np.add.accumulate(x)
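
The two calls above return no visible output in these notes, so here is a minimal sketch with the results written out, plus the same methods on np.multiply. It also shows the dedicated NumPy functions that give the same answers.

```python
import numpy as np

x = np.arange(1, 6)  # [1 2 3 4 5]

total = np.add.reduce(x)                     # 1+2+3+4+5
product = np.multiply.reduce(x)              # 1*2*3*4*5
running_sum = np.add.accumulate(x)
running_product = np.multiply.accumulate(x)

print(total)            # 15
print(product)          # 120
print(running_sum)      # [ 1  3  6 10 15]
print(running_product)  # [  1   2   6  24 120]

# np.sum, np.prod, np.cumsum and np.cumprod compute the same values.
print(np.sum(x) == total, np.prod(x) == product)  # True True
```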

Outer products: outer applies the operation to every pair of elements from the two inputs

x = np.arange(1, 6)
np.add.outer(x, x)

Result:

[ 2,  3,  4,  5,  6],
[ 3,  4,  5,  6,  7],
[ 4,  5,  6,  7,  8],
[ 5,  6,  7,  8,  9],
[ 6,  7,  8,  9, 10]

x = np.arange(1, 6)
np.multiply.outer(x, x)

Result:

[ 1,  2,  3,  4,  5],
[ 2,  4,  6,  8, 10],
[ 3,  6,  9, 12, 15],
[ 4,  8, 12, 16, 20],
[ 5, 10, 15, 20, 25]
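
One way to read the outer method, sketched below, is as a column combined with a row. This relies on broadcasting, which is covered later in these notes; the variable names are my own.

```python
import numpy as np

x = np.arange(1, 6)

sum_table = np.add.outer(x, x)
product_table = np.multiply.outer(x, x)

# The same tables, built by pairing a (5, 1) column with a (5,) row.
same_sum = x[:, np.newaxis] + x
same_product = x[:, np.newaxis] * x

print(sum_table.shape)                              # (5, 5)
print(np.array_equal(sum_table, same_sum))          # True
print(np.array_equal(product_table, same_product))  # True
```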

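Sum, maximum, minimum

When working with NumPy arrays, prefer NumPy's own functions; they run in compiled code and are much faster than the built-in Python versions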

big_array = np.random.rand(1000000)
array_2d = np.random.random((3,4))

Sum

sum(big_array)     # built-in Python version
np.sum(big_array)  # NumPy version
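
A rough way to check the speed difference yourself is sketched below with timeit. The array size, the seed, and the repeat count are arbitrary choices of mine, and the timings depend on the machine, so no expected numbers are shown.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)   # seeded only so the sketch is repeatable
big_array = rng.random(100_000)

py_total = sum(big_array)     # loops over the elements in the interpreter
np_total = np.sum(big_array)  # runs in compiled code

py_time = timeit.timeit(lambda: sum(big_array), number=5)
np_time = timeit.timeit(lambda: np.sum(big_array), number=5)

print(f"built-in sum: {py_time:.4f} s, np.sum: {np_time:.4f} s")
print(np.isclose(py_total, np_total))  # True
```

The two totals can differ in the last few digits because the additions happen in a different order.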

Maximum

# built-in Python version
max(big_array)
# NumPy function version
np.max(big_array)
# NumPy array method version
big_array.max()
# maximum of each column
np.max(array_2d, axis=0)
# maximum of each row
np.max(array_2d, axis=1)

Minimum

min(big_array)            # built-in Python version
np.min(big_array)         # NumPy function version
big_array.min()           # NumPy array method version
np.min(array_2d, axis=0)  # minimum of each column
np.min(array_2d, axis=1)  # minimum of each row

axis specifies the dimension of the array that gets collapsed, not the dimension that is returned
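
The point about axis is easiest to see from the shapes. In this minimal sketch a small array with known values stands in for the random array_2d used above.

```python
import numpy as np

array_2d = np.arange(12).reshape(3, 4)
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]

col_max = array_2d.max(axis=0)  # collapses the 3 rows, leaving 4 values
row_max = array_2d.max(axis=1)  # collapses the 4 columns, leaving 3 values

print(col_max.shape, col_max)  # (4,) [ 8  9 10 11]
print(row_max.shape, row_max)  # (3,) [ 3  7 11]
```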

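Function	NaN-safe version	Description
np.sum	np.nansum	Sum of all elements
np.prod	np.nanprod	Product of all elements
np.mean	np.nanmean	Mean of all elements
np.std	np.nanstd	Standard deviation
np.var	np.nanvar	Variance
np.min	np.nanmin	Minimum value
np.max	np.nanmax	Maximum value
np.argmin	np.nanargmin	Index of the minimum value
np.argmax	np.nanargmax	Index of the maximum value
np.median	np.nanmedian	Median
np.percentile	np.nanpercentile	Rank-based statistics of the elements (percentiles)
np.any	N/A	Returns True if any element is True or nonzero
np.all	N/A	Returns True if all elements are True or nonzero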
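
The difference between the two columns of the table shows up as soon as the data contains a missing value. A minimal sketch with made-up data:

```python
import numpy as np

data = np.array([1.0, np.nan, 3.0])  # made-up data with one missing value

plain_sum = np.sum(data)        # NaN propagates through the plain version
safe_sum = np.nansum(data)      # NaN entries are ignored
safe_mean = np.nanmean(data)    # mean of the two remaining values
safe_argmax = np.nanargmax(data)

print(plain_sum)    # nan
print(safe_sum)     # 4.0
print(safe_mean)    # 2.0
print(safe_argmax)  # 2
```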

Broadcasting concepts

The simplest case is

a = np.arange(1,4)
a + 5

Result: [6, 7, 8]

The scalar 5 behaves as if it were stretched into the array [5, 5, 5] before the addition

Going one step further

M = np.ones((3, 3))
M + a

Result:

[[2., 3., 4.],
 [2., 3., 4.],
 [2., 3., 4.]]

A slightly more complex case

a = np.arange(3)
b = np.arange(3)[:, np.newaxis]
a + b

Result :

[[0, 1, 2],
 [1, 2, 3],
 [2, 3, 4]]

Image source: https://jakevdp.github.io/PythonDataScienceHandbook/02.05-computation-on-arrays-broadcasting.html
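
The examples above all broadcast successfully. The sketch below adds a case that fails and one way to fix it, which helps make the rule concrete: shapes are compared from the trailing dimension, and each pair of sizes must either match or contain a 1. The array M with shape (3, 2) is my own example.

```python
import numpy as np

a = np.arange(3)                 # shape (3,)
b = np.arange(3)[:, np.newaxis]  # shape (3, 1)
print((a + b).shape)             # (3, 3): both arrays are stretched

M = np.ones((3, 2))
try:
    M + a  # trailing sizes are 2 and 3: neither matches nor is 1
except ValueError as err:
    print("broadcast failed:", err)

# Turning a into a column makes the shapes (3, 2) and (3, 1) compatible.
fixed = M + a[:, np.newaxis]
print(fixed.shape)  # (3, 2)
```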

Comparison operators

Operator	ufunc
==	np.equal
!=	np.not_equal
<	np.less
<=	np.less_equal
>	np.greater
>=	np.greater_equal

x = np.arange(1, 6)
x < 3

Result:array([ True, True, False, False, False])

Counting elements

x = np.arange(10)

Count the values less than 6

np.count_nonzero(x < 6)

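The same count using np.sum (True counts as 1 and False as 0, so this counts the matching values rather than adding them up)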

np.sum(x < 6)

Whether any value is greater than 8

np.any(x > 8)

Whether all values are greater than 10

np.all(x > 10)
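
These counting functions also accept axis, so the same questions can be asked per row or per column. A minimal sketch on a small 2D array of assumed values:

```python
import numpy as np

x = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])

total_small = np.count_nonzero(x < 6)  # count over the whole array
per_row = np.sum(x < 6, axis=1)        # count within each row
any_per_col = np.any(x > 8, axis=0)    # is any value in each column > 8?

print(total_small)  # 8
print(per_row)      # [4 2 2]
print(any_per_col)  # [False  True False False]
```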

Logical operators

Operator	ufunc
&	np.bitwise_and
|	np.bitwise_or
^	np.bitwise_xor
~	np.bitwise_not
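
These operators are mostly used to combine comparison results element by element. A minimal sketch; note that each comparison needs its own parentheses, because & and | bind more tightly than < and >.

```python
import numpy as np

x = np.arange(10)

between = (x > 2) & (x < 7)    # True for 3, 4, 5, 6
outside = (x <= 2) | (x >= 7)  # the opposite condition

print(np.sum(between))                    # 4
print(x[~between])                        # [0 1 2 7 8 9]
print(np.array_equal(outside, ~between))  # True
```

The Python keywords and and or do not work here, since they try to evaluate the whole array as a single truth value.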

Boolean masks

x = np.array([[5,0,3,3],
              [7,9,3,5],
              [2,4,7,6]])
x_mask = x < 5
x[x_mask]
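
The sketch below writes out what the mask selects and shows that a mask can also be used on the left side of an assignment. The copy named clipped is my own addition so the original array stays unchanged.

```python
import numpy as np

x = np.array([[5, 0, 3, 3],
              [7, 9, 3, 5],
              [2, 4, 7, 6]])

selected = x[x < 5]       # always a 1D array of the matching values
print(selected)           # [0 3 3 3 2 4]

clipped = x.copy()
clipped[clipped < 5] = 0  # overwrite only the masked positions
print(clipped)
# [[5 0 0 0]
#  [7 9 0 5]
#  [0 0 7 6]]
```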

Fancy indexing: a more powerful indexing mechanism

Pass an array of indices to pick out values

x = np.random.randint(10, size=100)
idx = [3,7,5]
x[idx]

The result takes the shape of the index array

x = np.random.randint(10, size=100)
idx = np.array([[3,7],
       [5,9]])
x[idx]

Indexing a multidimensional array

X = np.arange(12).reshape((3,4))
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
X[row, col]

Combined with slicing

X = np.arange(12).reshape((3,4))
X[1:, [2,0,1]]

Combined with a mask

mask = np.array([1,0,1,0], dtype=bool)  # np.bool was removed in NumPy 1.24; use the built-in bool
X[row[:, np.newaxis], mask]
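
Since the fancy indexing examples above show no output, here is a minimal sketch that runs them on the same X with the results written out. The last case, pairing a column of row indices with a row of column indices, is an extra of mine that shows the index arrays are broadcast against each other.

```python
import numpy as np

X = np.arange(12).reshape((3, 4))
# [[ 0  1  2  3]
#  [ 4  5  6  7]
#  [ 8  9 10 11]]
row = np.array([0, 1, 2])
col = np.array([2, 1, 3])
mask = np.array([1, 0, 1, 0], dtype=bool)

paired = X[row, col]          # elements (0,2), (1,1), (2,3)
sliced = X[1:, [2, 0, 1]]     # rows 1 and 2, columns in the order 2, 0, 1
masked = X[row[:, np.newaxis], mask]  # every row, columns 0 and 2
grid = X[row[:, np.newaxis], col]     # index arrays broadcast to (3, 3)

print(paired)  # [ 2  5 11]
print(sliced)  # [[ 6  4  5] [10  8  9]] shown on two lines
print(masked)  # [[ 0  2] [ 4  6] [ 8 10]] shown on three lines
print(grid.shape)  # (3, 3)
```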
