python - Replace specific row-wise duplicate cells in selected columns without dropping rows

Question

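How can I replace specific row-wise duplicate cells in selected columns without dropping rows (preferably without iterating over the rows)?

Basically, I want to keep the first value in each row and replace the remaining duplicates with NaN.

For example: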

import pandas as pd

df_example = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': ['a', 'f', 'c'], 'C': [1, 2, 3]})
df_example.head()

Original:

    A   B   C
0   a   a   1
1   b   f   2
2   c   c   3

Expected output:

    A   B   C
0   a   nan 1
1   b   f   2
2   c   nan 3

A slightly more complex example:

Original:

    A   B   C   D
0   a   1   a   1
1   b   2   f   5
2   c   3   c   3

Expected output:

    A   B   C   D
0   a   1   nan nan
1   b   2   f   5
2   c   3   nan nan

score 1 · Accepted Answer

Use DataFrame.mask together with Series.duplicated, applied to each row via DataFrame.apply with axis=1:

df_example = df_example.mask(df_example.apply(lambda x: x.duplicated(), axis=1))
print (df_example)
   A    B  C
0  a  NaN  1
1  b    f  2
2  c  NaN  3
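
To make the one-liner above easier to follow, here is a minimal sketch that splits it into two steps and exposes the intermediate boolean mask. The names df, dup_mask and result are illustrative only and do not appear in the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': ['a', 'f', 'c'], 'C': [1, 2, 3]})

# Step 1: for each row, flag cells whose value already appeared
# earlier (further left) in that same row.
dup_mask = df.apply(lambda row: row.duplicated(), axis=1)

# Step 2: mask() replaces the flagged cells with NaN and keeps the rest.
result = df.mask(dup_mask)

print(dup_mask)
print(result)
```

Series.duplicated defaults to keep='first', which is why the leftmost occurrence in each row survives. Note that apply with axis=1 still calls the lambda once per row, so this avoids an explicit loop but is not fully vectorized.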

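With the new data:

df_example = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': [1, 2, 3], 'C': ['a', 'f', 'c'], 'D': [1, 5, 3]})
df_example = df_example.mask(df_example.apply(lambda x: x.duplicated(), axis=1))
print(df_example)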
   A  B    C    D
0  a  1  NaN  NaN
1  b  2    f  5.0
2  c  3  NaN  NaN
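
The question title mentions selected columns, while the answer above checks every column. One possible way to limit the check to a subset is sketched below. The list cols and the choice of columns 'A' and 'C' are assumptions made for illustration, not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'A': ['a', 'b', 'c'], 'B': [1, 2, 3],
                   'C': ['a', 'f', 'c'], 'D': [1, 5, 3]})

# Hypothetical selection: only these columns are compared with each other.
cols = ['A', 'C']

sub = df[cols]
# Flag row-wise repeats within the selected columns only.
dup_mask = sub.apply(lambda row: row.duplicated(), axis=1)

out = df.copy()
# Write the masked subset back; columns outside cols are left untouched.
out[cols] = sub.mask(dup_mask)

print(out)
```

With this selection, column D keeps its values in rows 0 and 2 even though they repeat column B, because neither B nor D is included in cols.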
