I have a dataset called "banks". If I group by the column "job" to check the count of each category, I get the following:
index | job | count |
---|---|---|
0 | admin. | 478 |
1 | blue-collar | 946 |
2 | entrepreneur | 168 |
3 | housemaid | 112 |
4 | management | 969 |
5 | retired | 230 |
6 | self-employed | 183 |
7 | services | 417 |
8 | student | 84 |
9 | technician | 768 |
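For reference, a table like the one above can be produced with a plain pandas group-by. This is only a minimal sketch: `banks_demo` is a small made-up frame standing in for the real `banks` data, so the counts differ from the table.

```python
import pandas as pd

# Hypothetical stand-in for the real `banks` data; only the `job` column matters here.
banks_demo = pd.DataFrame(
    {"job": ["admin.", "blue-collar", "admin.", "student", "blue-collar", "admin."]}
)

# Group by `job` and count the rows in each category.
job_counts = (
    banks_demo.groupby("job")
    .size()
    .reset_index(name="count")
)
print(job_counts)
```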
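I have also added the first 3 rows of the dataset I am using:
age,job,marital,education,default,balance,housing,loan,contact,day,month,duration,campaign,pdays,previous,poutcome,y
30,unemployed,married,primary,no,1787,no,no,cellular,19,oct,79,1,-1,0,unknown,no
33,services,married,secondary,no,4789,yes,yes,cellular,11,may,220,1,339,4,failure,no
35,management,single,tertiary,no,1350,yes,no,cellular,16,apr,185,1,330,1,failure,no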
My goal is to create a small function that I can reuse for other columns, so I tried writing one with the "dfply" package.
import pandas as pd
import dfply
from dfply import *

# creating the function
@dfpipe
def woe_iv(df, variable):
    step1 = df >> group_by(X.variable) >> summarize(COUNT=X.variable.count())
    return step1

# invoking the function
banks >> woe_iv(X.job)
However, this code gives me the following error:
@dfpipe
def woe_iv(df,variable):
    step1=df>>group_by(X.variable)>>summarize(COUNT=X.variable.count())
    return step1
banks>>woe_iv(X.job)
Traceback (most recent call last):
  File "<ipython-input-46-d851aeac1927>", line 7, in <module>
    banks>>woe_iv(X.job)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 142, in __rrshift__
    result = self.function(other_copy)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 149, in <lambda>
    return pipe(lambda x: self.function(x, *args, **kwargs))
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 329, in __call__
    return self.function(*args, **kwargs)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 282, in __call__
    return self.function(df, *args, **kwargs)
  File "<ipython-input-46-d851aeac1927>", line 5, in woe_iv
    step1=df>>group_by(X.variable)>>summarize(COUNT=X.variable.count())
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 142, in __rrshift__
    result = self.function(other_copy)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 149, in <lambda>
    return pipe(lambda x: self.function(x, *args, **kwargs))
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 279, in __call__
    args = self._recursive_arg_eval(df, args[1:])
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 241, in _recursive_arg_eval
    return [
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 242, in <listcomp>
    self._symbolic_to_label(df, a) if i in eval_as_label
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 231, in _symbolic_to_label
    return self._evaluator_loop(df, arg, self._evaluate_label)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 225, in _evaluator_loop
    return eval_func(df, arg)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 181, in _evaluate_label
    arg = self._evaluate(df, arg)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 175, in _evaluate
    arg = arg.evaluate(df)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 71, in evaluate
    return self.function(context)
  File "/opt/anaconda3/lib/python3.8/site-packages/dfply/base.py", line 74, in <lambda>
    return Intention(lambda x: getattr(self.function(x), attribute),
  File "/opt/anaconda3/lib/python3.8/site-packages/pandas/core/generic.py", line 5139, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: 'DataFrame' object has no attribute 'variable'
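My reading of the last frames is that `X.variable` is resolved with `getattr` against the literal name "variable" rather than the column I passed in, but I am not sure. The sketch below reproduces the same `AttributeError` in plain pandas and shows the behaviour I am after when the column name is passed as a string. It is not a dfply solution; `count_by` and `banks_demo` are names made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for `banks`.
banks_demo = pd.DataFrame(
    {"job": ["admin.", "student", "admin."], "age": [30, 33, 35]}
)


def count_by(df, column):
    """Count rows per category of `column`, given the column name as a string."""
    return df.groupby(column).size().reset_index(name="COUNT")


variable = "job"

# Attribute access looks up the literal name "variable", which is not a column,
# so it raises the same kind of AttributeError as the traceback above.
try:
    banks_demo.variable
    attribute_lookup_failed = False
except AttributeError:
    attribute_lookup_failed = True

# Indexing with the string stored in `variable` resolves to the `job` column.
result = count_by(banks_demo, variable)
print(attribute_lookup_failed)
print(result)
```

Passing the name as a string keeps the function reusable for other columns, for example `count_by(banks_demo, "age")`.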
Please let me know if I am missing something.