
I am trying to select some values from a PySpark DataFrame based on a few rules, but the statement below raises an exception.

from pyspark.sql import functions as F

df.select(df.card_key,F.when((df.tran_sponsor = 'GAMES') &  (df.location_code = '9145'),'ENTERTAINMENT').when((df.tran_sponsor = 'XYZ') &  (df.location_code = '123'),'eBOOKS').when((df.tran_sponsor = 'XYZ') &  (df.l_code.isin(['123', '234', '345', '456', '567', '678', '789', '7878', '67', '456']) ),'FINANCE').otherwise(df.tran_sponsor)).show()

I get the following exception. Any suggestions?

  File "", line 1
    df.select(df.card_key,F.when((df.tran_sponsor = 'GAMES') & (df.location_code = '9145'),'ENTERTAINMENT').when((df.tran_sponsor = 'XYZ') & (df.location_code = '123'),'eBOOKS').when((df.tran_sponsor = 'XYZ') & (df.l_code.isin(['6001', '6002', '6003 ', '6004', '6005', '6006', '6007', '6008', '6009', '6010', '6011', '6012', '6013', '6014']),' FINANCE').otherwise(df.tran_sponsor)).show()
    ^
SyntaxError: invalid syntax


1 Answer


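OK, I just figured it out. There is nothing wrong with isin; the problem was that I used the assignment operator (=) where the comparison operator (==) was needed :(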

df.select(
    df.card_key,
    F.when((df.tran_sponsor == 'GAMES') & (df.location_code == '9145'), 'ENTERTAINMENT')
     .when((df.tran_sponsor == 'XYZ') & (df.location_code == '123'), 'eBOOKS')
     .when((df.tran_sponsor == 'XYZ') & (df.l_code.isin(['123', '234', '345', '456', '567', '678', '789', '7878', '67', '456'])), 'FINANCE')
     .otherwise(df.tran_sponsor)
).show()

It works fine now. If anyone was looking into this, thanks for your effort.
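
For anyone who wants to see why the first version fails before Spark is even involved, here is a minimal sketch that uses only the standard library's ast module. The two strings are shortened stand-ins for the conditions in the question, and the helper name parses is made up for this illustration. Because the source is only parsed and never executed, df and F do not need to exist.

```python
import ast

# Shortened stand-ins for the two versions of the condition. The names
# `df` and `F` are never resolved because the code is only parsed, not run.
broken = "F.when((df.tran_sponsor = 'GAMES') & (df.location_code = '9145'), 'ENTERTAINMENT')"
fixed = "F.when((df.tran_sponsor == 'GAMES') & (df.location_code == '9145'), 'ENTERTAINMENT')"


def parses(source: str) -> bool:
    """Return True if `source` is syntactically valid Python (hypothetical helper)."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


print(parses(broken))  # False: `=` is a statement and cannot appear inside an expression
print(parses(fixed))   # True: `==` is an ordinary operator, which Column objects overload
```

Since the error is raised at parse time, none of the line runs at all, which is why the exception does not point at isin or any other PySpark call.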

Answered 2016-11-02T17:37:27.943