
My dataset is Churn_Modeling:

I want to create a column named c_rating with the following ranges: <500 = "very poor", 500-600 = "poor", 601-660 = "fair", 661-780 = "good", and >= 780 = "excellent".
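Read literally, these ranges overlap at 780, which is both "good" and "excellent". Below is a minimal sketch of the rule as a plain Python function. It assumes integer scores and treats 780 as "excellent", so the `>= 780` branch wins. The name `credit_rating` is hypothetical. Like R's case_when, the if/elif chain returns the first branch that matches.

```python
def credit_rating(score: int) -> str:
    """Map an integer credit score to a rating label.

    Assumption: 780 is treated as 'excellent' (the >= 780 rule wins
    over the 661-780 'good' range).
    """
    if score < 500:
        return "very poor"
    elif score <= 600:
        return "poor"
    elif score <= 660:
        return "fair"
    elif score < 780:
        return "good"
    else:
        return "excellent"


# Check the function at each range boundary
for s in (499, 500, 600, 601, 660, 661, 779, 780):
    print(s, credit_rating(s))
```

On the real data, a_churn['CreditScore'].map(credit_rating) would apply it row by row. That is easy to read, but slower than a vectorized approach on large frames.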

Some sample data (columns in order):

RowNumber     CustomerId     Surname    CreditScore     Geography   Gender  Age Tenure    Balance   NumOfProducts   HasCrCard   IsActiveMember  EstimatedSalary Exited
        1       15634602    Hargrave            619       France    Female  42       2          0               1           1                1        101348.88     1
        2       15647311        Hill            608       Spain     Female  41       1   83807.86               1           0                1        112542.58     0
        3       15619304        Onio            502       France    Female  42       8   159660.8               3           1                0        113931.57     1
        4       15701354        Boni            699       France    Female  39       1          0               2           0                0         93826.63     0
        5       15737888    Mitchell            850       Spain     Female  43       2  125510.82               1           1                1          79084.1     0
        6       15574012         Chu            645       Spain     Male    44       8  113755.78               2           1                0        149756.71     1

I'm working on other code as well, so these are my imports:

from plotnine import *
from dfply import *
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
a_churn = pd.read_csv("Churn_Modeling.csv")

How can I do something like R's case_when in Python to create this column?


1 Answer

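Use pd.cut with explicit bin edges and labels. Passing right=False makes each bin include its left edge, so a score of 500 is "poor" and 780 is "excellent", as in the question's ranges:

df['c_rating'] = pd.cut(df['CreditScore'], bins=[0, 500, 601, 661, 780, np.inf], labels=['very poor', 'poor', 'fair', 'good', 'excellent'], right=False)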
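The right parameter of pd.cut decides which edge each bin includes. With the default right=True, every bin is right-closed, (a, b]. The check below uses made-up boundary scores, not rows from the dataset. It compares right-closed edges [0, 500, 600, 660, 780, 1000] with left-closed edges [0, 500, 601, 661, 780, inf). The two binnings disagree only at 500 and 780.

```python
import numpy as np
import pandas as pd

labels = ["very poor", "poor", "fair", "good", "excellent"]
# Hypothetical scores chosen to sit on or next to the range boundaries
scores = pd.Series([499, 500, 600, 601, 660, 661, 779, 780, 850])

# Default right=True: (0,500], (500,600], (600,660], (660,780], (780,1000]
right_closed = pd.cut(scores, bins=[0, 500, 600, 660, 780, 1000], labels=labels)

# right=False: [0,500), [500,601), [601,661), [661,780), [780,inf)
left_closed = pd.cut(scores, bins=[0, 500, 601, 661, 780, np.inf],
                     labels=labels, right=False)

print(pd.DataFrame({"score": scores,
                    "right=True": right_closed,
                    "right=False": left_closed}))
```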

Check the output:

df[['CreditScore','c_rating']]

    CreditScore c_rating
0   619         fair
1   608         fair
2   502         poor
3   699         good
4   850         excellent
5   645         fair
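The closest NumPy counterpart to R's case_when is np.select. It evaluates a list of conditions in order, and where several are true, the first one wins, as in dplyr::case_when. Below is a minimal sketch that uses the six CreditScore values from the question's sample data and treats 780 as "excellent". pandas 2.2 and later also provides Series.case_when for this kind of rule.

```python
import numpy as np
import pandas as pd

# CreditScore values from the six sample rows in the question
df = pd.DataFrame({"CreditScore": [619, 608, 502, 699, 850, 645]})

score = df["CreditScore"]
# Conditions are checked in order; the first True one decides the label
conditions = [
    score < 500,
    score <= 600,
    score <= 660,
    score < 780,
]
choices = ["very poor", "poor", "fair", "good"]

# Scores matching none of the conditions (>= 780) get the default label
df["c_rating"] = np.select(conditions, choices, default="excellent")
print(df)
```

Unlike pd.cut, np.select returns plain strings rather than an ordered Categorical. If the order of the labels matters, you can wrap the result in pd.Categorical(..., categories=[...], ordered=True).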
answered 2020-12-12T20:29:34.630