Difference between map, applymap and apply methods in Pandas

In [1]:
import pandas as pd
import numpy as np
In [2]:
d = {"id":[15594815,15805254,15656148],"score":[62.118782,13.003589,997.3572]}
df_map = pd.DataFrame(data=d)
df_map.head()
Out[2]:
         id       score
0  15594815   62.118782
1  15805254   13.003589
2  15656148  997.357200
In [3]:
def score2label(x):
    # Label a score as 1 if it is above 500, otherwise 0.
    if x > 500:
        return 1
    else:
        return 0

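map() usage

  • map() is a Series method
  • map() accepts a lambda expression or a named function
  • map() does not take extra positional arguments for the function; wrap the call in a lambda or use apply() when you need them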
In [4]:
#df_map['score1'] = df_map['score'].map(lambda x: 1 if x>500 else 0)
df_map['score1'] = df_map['score'].map(score2label)
df_map.head()
Out[4]:
         id       score  score1
0  15594815   62.118782       0
1  15805254   13.003589       0
2  15656148  997.357200       1
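
Since map() only calls the function with one value at a time, an extra parameter has to be bound beforehand. The sketch below shows two common ways of doing that, plus a dict lookup, which map() also accepts. The helper label_with_offset and the sample Series are made up for illustration and are not part of the notebook above.

```python
from functools import partial

import pandas as pd

scores = pd.Series([62.118782, 13.003589, 997.3572])


def label_with_offset(x, offset):
    # Hypothetical helper: same 500 threshold as score2label, plus an offset.
    return (1 if x > 500 else 0) + offset


# Bind the extra argument first, so map() sees a one-argument callable.
via_lambda = scores.map(lambda x: label_with_offset(x, 3))
via_partial = scores.map(partial(label_with_offset, offset=3))

# map() also accepts a dict and uses it as a value lookup table.
labels = pd.Series([0, 0, 1]).map({0: "low", 1: "high"})

print(via_lambda.tolist(), via_partial.tolist(), labels.tolist())
```

Both bound versions give the same labels; which one to use is mostly a matter of taste.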

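apply() usage

  • apply() works on both a DataFrame and a Series
  • apply() handles more complex cases, such as functions that take extra arguments or that aggregate along an axis

For example, passing an extra argument (Series):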

In [5]:
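def score2label1(x, y):
    # Same threshold as score2label, with y added to the label.
    if x > 500:
        return 1 + y
    else:
        return 0 + y
# apply() and applymap() are pandas methods: DataFrame.apply() works on one column (or row) at a time, usually for aggregation; applymap() works on every element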
df_map['score2'] = df_map['score'].apply(score2label1, y=3)
df_map.head()
Out[5]:
         id       score  score1  score2
0  15594815   62.118782       0       3
1  15805254   13.003589       0       3
2  15656148  997.357200       1       4
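
Extra arguments can be handed to Series.apply() either by keyword, as above, or positionally through its args tuple. A minimal sketch on a standalone Series (the variable names are illustrative only):

```python
import pandas as pd

scores = pd.Series([62.118782, 13.003589, 997.3572])


def score2label1(x, y):
    # 1 + y above the 500 threshold, otherwise 0 + y.
    return 1 + y if x > 500 else 0 + y


# Keyword form: extra keyword arguments are forwarded to the function.
by_keyword = scores.apply(score2label1, y=3)

# Positional form: args must be a tuple, hence the trailing comma.
by_position = scores.apply(score2label1, args=(3,))

print(by_keyword.tolist(), by_position.tolist())
```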

Row and column sums (DataFrame):

In [6]:
df_map.apply(np.sum, axis=0)  # axis=0: one sum per column
Out[6]:
id        4.705622e+07
score     1.072480e+03
score1    1.000000e+00
score2    1.000000e+01
dtype: float64
In [7]:
df_map.apply(np.sum, axis=1)  # axis=1: one sum per row
Out[7]:
0    1.559488e+07
1    1.580527e+07
2    1.565715e+07
dtype: float64
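
The axis argument decides what the function receives: with axis=0 it gets each column as a Series, with axis=1 each row. The sketch below uses a small made-up frame (columns a and b are placeholders) to show both directions, and a row-wise function that combines two columns.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10.0, 20.0, 30.0]})

# axis=0: the function is called once per column.
col_sums = df.apply(np.sum, axis=0)

# axis=1: the function is called once per row.
row_sums = df.apply(np.sum, axis=1)

# A row-wise function can read several columns of the same row.
spread = df.apply(lambda row: row["b"] - row["a"], axis=1)

print(col_sums.to_dict(), row_sums.tolist(), spread.tolist())
```

For plain sums, the built-in df.sum(axis=0) and df.sum(axis=1) give the same numbers and are usually faster; apply() is most useful when no built-in method covers the operation.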

applymap() usage

  • applymap() applies a function to every element of the whole DataFrame
In [8]:
# Format every cell as a string with two decimals (df_map no longer holds numbers after this)
df_map = df_map.applymap(lambda x: '%.2f' % x)
df_map.head()
Out[8]:
            id   score score1 score2
0  15594815.00   62.12   0.00   3.00
1  15805254.00   13.00   0.00   3.00
2  15656148.00  997.36   1.00   4.00
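
Note that pandas 2.1 deprecated DataFrame.applymap() in favor of DataFrame.map(), which does the same element-wise job. The sketch below picks whichever method is available, so it should run on older and newer installs; the small frame and the name elementwise are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({"id": [15594815, 15805254], "score": [62.118782, 997.3572]})

# DataFrame.map exists from pandas 2.1 on; fall back to applymap before that.
elementwise = df.map if hasattr(df, "map") else df.applymap

# Every cell is passed through the formatter and comes back as a string.
formatted = elementwise(lambda x: "%.2f" % x)

# If the values should stay numeric, round() is an alternative to formatting.
rounded = df.round(2)

print(formatted)
print(rounded)
```

Because the formatted cells are strings, arithmetic on them no longer works; keep the numeric frame around if further calculations are needed.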