You can use pivot with fillna, then cast the resulting floats to int with astype:
df = df.pivot(index='ser_id', columns='action', values='count').fillna(0).astype(int)
print (df)
action  delete  read  write
ser_id
1            7    15      5
2            0     0      2
3            2     9      1
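To see why the fillna and astype steps are needed, here is a minimal sketch. The input frame `raw` is an assumption: the original input is not shown, so the values were chosen only to reproduce the output above.

```python
import pandas as pd

# Hypothetical long-format input (not from the original post);
# ser_id 2 has no 'delete' or 'read' rows.
raw = pd.DataFrame({
    'ser_id': [1, 1, 1, 2, 3, 3, 3],
    'action': ['read', 'write', 'delete', 'write', 'read', 'write', 'delete'],
    'count':  [15, 5, 7, 2, 9, 1, 2],
})

# pivot leaves NaN for missing (ser_id, action) pairs,
# and NaN forces the affected columns to float.
wide = raw.pivot(index='ser_id', columns='action', values='count')
print(wide.dtypes)

# Filling the gaps with 0 and casting restores integer counts.
wide_int = wide.fillna(0).astype(int)
print(wide_int)
```

Without the final astype(int) the counts would print as 7.0, 15.0 and so on.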
Another solution uses set_index and unstack:
df = df.set_index(['ser_id','action'])['count'].unstack(fill_value=0)
print (df)
action  delete  read  write
ser_id
1            7    15      5
2            0     0      2
3            2     9      1
If the ser_id and action columns contain duplicate pairs, pivot and unstack cannot be used. The solution is to aggregate with groupby (mean or sum) and then reshape with unstack:
df = df.groupby(['ser_id','action'])['count'].mean().unstack(fill_value=0)
print (df)
action  delete  read  write
ser_id
1            7    15      5
2            0     0      2
3            2     9      1
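The duplicate case can be sketched as follows. The frame `dup` is made up for illustration, with ser_id 1 appearing twice for 'read'. It shows pivot raising an error, and groupby with sum collapsing the duplicates before reshaping.

```python
import pandas as pd

# Hypothetical input with a repeated (ser_id, action) pair.
dup = pd.DataFrame({
    'ser_id': [1, 1, 1, 2],
    'action': ['read', 'read', 'write', 'write'],
    'count':  [10, 5, 5, 2],
})

pivot_failed = False
try:
    dup.pivot(index='ser_id', columns='action', values='count')
except ValueError as err:
    # pandas refuses to reshape when index/column pairs are not unique.
    pivot_failed = True
    print('pivot failed:', err)

# Aggregating first leaves one row per pair, so unstack is safe.
totals = dup.groupby(['ser_id', 'action'])['count'].sum().unstack(fill_value=0)
print(totals)
```

Whether sum or mean is the right aggregation depends on what a repeated row means in your data.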
Timings:
import numpy as np
import pandas as pd

#random dataframe
np.random.seed(100)
N = 10000
df = pd.DataFrame(np.random.randint(100, size=(N,3)), columns=['user_id','action', 'count'])
#[10000 rows x 3 columns]
print (df)
In [124]: %timeit (df.groupby(['user_id','action'])['count'].mean().unstack(fill_value=0))
100 loops, best of 3: 5.5 ms per loop
In [125]: %timeit (df.pivot_table('count', 'user_id', 'action', fill_value=0))
10 loops, best of 3: 35.9 ms per loop
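The %timeit lines above need IPython. A rough way to repeat the comparison in plain Python is sketched below with the standard timeit module. The helper names `via_groupby` and `via_pivot_table` are introduced here for illustration, and absolute numbers will differ by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

np.random.seed(100)
N = 10000
df = pd.DataFrame(np.random.randint(100, size=(N, 3)),
                  columns=['user_id', 'action', 'count'])

def via_groupby():
    # groupby + mean, then reshape with unstack
    return df.groupby(['user_id', 'action'])['count'].mean().unstack(fill_value=0)

def via_pivot_table():
    # pivot_table aggregates with mean by default
    return df.pivot_table(values='count', index='user_id',
                          columns='action', fill_value=0)

# Sanity check that both approaches build the same table of means.
same = np.allclose(via_groupby().to_numpy(), via_pivot_table().to_numpy())
print('results match:', same)

for fn in (via_groupby, via_pivot_table):
    # Best of 3 repeats, 10 calls each, reported per call.
    seconds = min(timeit.repeat(fn, number=10, repeat=3)) / 10
    print(f'{fn.__name__}: {seconds * 1e3:.2f} ms per call')
```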