I have a pandas DataFrame with a column that holds JSON responses.

I read the data with the following code:

data = pd.read_csv(f"""bureau_response_1.csv""",sep=";")

I then parsed the column with eval, using this code:

data['account_Segments']=data['account_Segments'].apply(lambda x:eval(x))

It throws an error when I call json_normalize:

code1 : data = pd.json_normalize(data['account_Segments'])

Running code 1 above raises this error:

[screenshot of the error, not included]

After eval, the data looks like this:

[screenshot of the data, not included]

I need the JSON data in this column flattened.

After reading the CSV with data1 = pd.read_csv("bureau_response.csv",sep=","), note that it has two columns, APPLICATION_ID and account_Segments. I want the APPLICATION_ID column to be the index after account_Segments is flattened.

So after running data1.head(1).to_dict(), I get this output.

I removed the double quotes and cleaned the data; the first two rows are given below. Note that the index column holds the APPLICATION_ID.

When I run the check from your function, s = (data2.applymap(type) == list).all(), the condition is False, so the data does not get flattened.

{'LAI-100518437': "[{'cashLimit': '3,000', 'accountType': 'Credit Card', 'creditLimit': '30,000', 'amountOverdue': '1,331', 'currentBalance': '4,336', 'paymentHistory1': '093    063    033    003    003    000    003    003    003    003    003    000    000    003    000    003    003    003    ', 'paymentHistory2': '003    000    000    003    000    000    000    000    000    000    000    000    000    000    000    000    000    000    ', 'paymentFrequency': 'Monthly', 'dateofLastPayment': '07/07/2017', 'ownershipIndicator': 'Individual', 'dateOpened_Disbursed': '27/08/2013', 'paymentHistoryEndDate': '01/12/2014', 'paymentHistoryStartDate': '01/11/2017', 'dateReportedandCertified': '03/11/2017', 'reportingMemberShortName': 'NOT DISCLOSED', 'highCredit_SanctionedAmount': '34,051'}, {'emiAmount': '11,288', 'accountType': 'Personal Loan', 'amountOverdue': '31,728', 'currentBalance': '3,92,459', 'rateOfInterest': '12.00', 'paymentHistory1': '089    029    STD    STD    STD    STD    000    000    000    000    000    000    000    000    000    000    000    000    ', 'paymentHistory2': '000    000    000    ', 'repaymentTenure': '60', 'paymentFrequency': 'Monthly', 'dateofLastPayment': '12/04/2017', 'ownershipIndicator': 'Individual', 'dateOpened_Disbursed': '12/01/2016', 'paymentHistoryEndDate': '01/02/2016', 'paymentHistoryStartDate': '01/10/2017', 'dateReportedandCertified': '31/10/2017', 'reportingMemberShortName': 'NOT DISCLOSED', 'highCredit_SanctionedAmount': '5,00,000'}, {'dateClosed': '11/07/2017', 'accountType': 'Business Loan – General', 'accountNumber': 'LK0000010410', 'currentBalance': '0', 'paymentHistory1': '000    000    000    000    ', 'paymentFrequency': 'Monthly', 'ownershipIndicator': 'Individual', 'dateOpened_Disbursed': '10/04/2017', 'paymentHistoryEndDate': '01/04/2017', 'paymentHistoryStartDate': '01/07/2017', 'dateReportedandCertified': '31/07/2017', 'reportingMemberShortName': 'AADRILTD', 'highCredit_SanctionedAmount': '1,00,000'}, 
{'accountType': 'Auto Loan (Personal)', 'currentBalance': '10,65,245', 'paymentHistory1': '000    000    000    000    000    000    000    000    000    000    000    000    000    000    ', 'dateofLastPayment': '12/09/2017', 'ownershipIndicator': 'Guarantor', 'dateOpened_Disbursed': '25/08/2016', 'paymentHistoryEndDate': '01/08/2016', 'paymentHistoryStartDate': '01/09/2017', 'dateReportedandCertified': '30/09/2017', 'reportingMemberShortName': 'NOT DISCLOSED', 'highCredit_SanctionedAmount': '14,00,000'}, {'accountType': 'Auto Loan (Personal)', 'currentBalance': '3,74,330', 'paymentHistory1': '000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    ', 'paymentHistory2': '000    ', 'dateofLastPayment': '12/09/2017', 'ownershipIndicator': 'Joint', 'dateOpened_Disbursed': '21/03/2016', 'paymentHistoryEndDate': '01/03/2016', 'paymentHistoryStartDate': '01/09/2017', 'dateReportedandCertified': '30/09/2017', 'reportingMemberShortName': 'NOT DISCLOSED', 'highCredit_SanctionedAmount': '7,00,000'}, {'accountType': 'Credit Card', 'creditLimit': '1,25,000', 'currentBalance': '71,670', 'paymentHistory1': '000    005    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    ', 'paymentHistory2': 'XXX    000    XXX    000    000    000    000    ', 'paymentFrequency': 'Monthly', 'dateofLastPayment': '02/10/2017', 'ownershipIndicator': 'Individual', 'actualPaymentAmount': '6,884', 'dateOpened_Disbursed': '30/10/2015', 'paymentHistoryEndDate': '01/10/2015', 'paymentHistoryStartDate': '01/10/2017', 'dateReportedandCertified': '31/10/2017', 'reportingMemberShortName': 'NOT DISCLOSED', 'highCredit_SanctionedAmount': '1,14,344'}, {'accountType': 'Credit Card', 'currentBalance': '11,036', 'paymentHistory1': '000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    ', 'paymentHistory2': '000   
 000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    ', 'dateofLastPayment': '02/10/2017', 'ownershipIndicator': 'Individual', 'dateOpened_Disbursed': '26/10/2014', 'paymentHistoryEndDate': '01/11/2014', 'paymentHistoryStartDate': '01/10/2017', 'dateReportedandCertified': '13/10/2017', 'reportingMemberShortName': 'NOT DISCLOSED', 'highCredit_SanctionedAmount': '26,102'}, {'dateClosed': '03/11/2016', 'accountType': 'Auto Loan (Personal)', 'currentBalance': '0', 'paymentHistory1': '000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    ', 'paymentHistory2': '000    000    000    000    000    XXX    000    000    000    000    000    000    ', 'dateofLastPayment': '28/10/2016', 'ownershipIndicator': 'Guarantor', 'dateOpened_Disbursed': '25/06/2014', 'paymentHistoryEndDate': '01/06/2014', 'paymentHistoryStartDate': '01/11/2016', 'dateReportedandCertified': '30/11/2016', 'reportingMemberShortName': 'NOT DISCLOSED', 'highCredit_SanctionedAmount': '10,27,000'}]",
 'LAI-100826051': "[{'dateClosed': '02/01/2018', 'accountType': 'Business Loan – General', 'accountNumber': 'LK0000013293', 'currentBalance': '0', 'paymentHistory1': '000    STD    STD    STD    000    000    000    ', 'paymentFrequency': 'Monthly', 'ownershipIndicator': 'Individual', 'dateOpened_Disbursed': '01/07/2017', 'paymentHistoryEndDate': '01/07/2017', 'paymentHistoryStartDate': '01/01/2018', 'dateReportedandCertified': '31/01/2018', 'reportingMemberShortName': 'AADRILTD', 'highCredit_SanctionedAmount': '50,00,000'}, {'dateClosed': '04/10/2017', 'accountType': 'Business Loan – General', 'accountNumber': 'LK0000013294', 'currentBalance': '0', 'paymentHistory1': 'STD    000    000    000    ', 'paymentFrequency': 'Monthly', 'ownershipIndicator': 'Individual', 'dateOpened_Disbursed': '01/07/2017', 'paymentHistoryEndDate': '01/07/2017', 'paymentHistoryStartDate': '01/10/2017', 'dateReportedandCertified': '31/10/2017', 'reportingMemberShortName': 'AADRILTD', 'highCredit_SanctionedAmount': '50,00,000'}, {'dateClosed': '27/09/2017', 'accountType': 'Business Loan – General', 'accountNumber': 'LK0000009268', 'currentBalance': '0', 'paymentHistory1': '000    XXX    XXX    000    000    000    000    ', 'paymentFrequency': 'Monthly', 'dateofLastPayment': '27/09/2017', 'ownershipIndicator': 'Individual', 'dateOpened_Disbursed': '08/03/2017', 'paymentHistoryEndDate': '01/03/2017', 'paymentHistoryStartDate': '01/09/2017', 'dateReportedandCertified': '30/09/2017', 'reportingMemberShortName': 'AADRILTD', 'highCredit_SanctionedAmount': '30,00,000'}, {'accountType': 'Credit Card', 'currentBalance': '-170429', 'paymentHistory1': '000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    ', 'paymentHistory2': '000    000    000    000    ', 'dateofLastPayment': '29/06/2017', 'ownershipIndicator': 'Individual', 'dateOpened_Disbursed': '17/05/2016', 'paymentHistoryEndDate': '01/05/2016', 
'paymentHistoryStartDate': '01/02/2018', 'dateReportedandCertified': '28/02/2018', 'reportingMemberShortName': 'NOT DISCLOSED', 'highCredit_SanctionedAmount': '28,750'}, {'dateClosed': '27/04/2016', 'accountType': 'Credit Card', 'currentBalance': '-14', 'paymentHistory1': '000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    ', 'paymentHistory2': '000    000    000    000    000    000    000    000    000    000    ', 'dateofLastPayment': '27/04/2016', 'ownershipIndicator': 'Authorised User', 'dateOpened_Disbursed': '23/01/2014', 'paymentHistoryEndDate': '01/01/2014', 'paymentHistoryStartDate': '01/04/2016', 'dateReportedandCertified': '30/04/2016', 'reportingMemberShortName': 'NOT DISCLOSED', 'highCredit_SanctionedAmount': '1,16,999'}, {'dateClosed': '27/04/2016', 'accountType': 'Credit Card', 'currentBalance': '-14', 'paymentHistory1': '000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    ', 'paymentHistory2': '000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    ', 'dateofLastPayment': '27/04/2016', 'ownershipIndicator': 'Individual', 'dateOpened_Disbursed': '07/09/2012', 'paymentHistoryEndDate': '01/05/2013', 'paymentHistoryStartDate': '01/04/2016', 'dateReportedandCertified': '30/04/2016', 'reportingMemberShortName': 'NOT DISCLOSED', 'highCredit_SanctionedAmount': '1,16,999'}]"} 

I used the code below to clean the data, then created a new DataFrame with your code, and I get output in the following format:

import ast
dict1 = data1['account_Segments'].to_dict()
dict_str = str(dict1).replace('"','')
new_dict = ast.literal_eval(dict_str)

df1 = pd.DataFrame.from_dict(new_dict, orient='index').reset_index()

df2 = flatten_nested_json_df(df1)
df2 = df2.drop(['level_0'], axis=1)
df2

Output of df2

I want all of the JSON flattened, with application_id as the row index.

1 Answer

You can try this function:

import pandas as pd

def flatten_nested_json_df(df):
    df = df.reset_index()

    # Find the columns in which every cell is a list, and those in which
    # every cell is a dict. (applymap is deprecated since pandas 2.1 in
    # favour of DataFrame.map.)
    s = (df.applymap(type) == list).all()
    list_columns = s[s].index.tolist()

    s = (df.applymap(type) == dict).all()
    dict_columns = s[s].index.tolist()

    # Repeat until no column consists entirely of lists or of dicts.
    while len(list_columns) > 0 or len(dict_columns) > 0:
        new_columns = []

        for col in dict_columns:
            # Expand the dict keys into new columns prefixed with the column name.
            horiz_exploded = pd.json_normalize(df[col]).add_prefix(f'{col}.')
            horiz_exploded.index = df.index
            df = pd.concat([df, horiz_exploded], axis=1).drop(columns=[col])
            new_columns.extend(horiz_exploded.columns)

        for col in list_columns:
            # Give each list element its own row.
            print(f"exploding: {col}")
            df = df.drop(columns=[col]).join(df[col].explode().to_frame())
            new_columns.append(col)

        # Only the newly created columns can still hold nested values.
        s = (df[new_columns].applymap(type) == list).all()
        list_columns = s[s].index.tolist()

        s = (df[new_columns].applymap(type) == dict).all()
        dict_columns = s[s].index.tolist()
    return df

So, in your case:

flatten_nested_json_df(data)

However, you need to do something about how you read the data. It should look like this:

data = {'LAI-100518437': [{'cashLimit': '3,000', 'accountType': 'Credit Card', 'creditLimit': '30,000', 'amountOverdue': '1,331', 'currentBalance': '4,336', 'paymentHistory1': '093    063    033    003    003    000    003    003    003    003    003    000    000    003    000    003    003    003    ', 'paymentHistory2': '003    000    000    003    000    000    000    000    000    000    000    000    000    000    000    000    000    000    ', 'paymentFrequency': 'Monthly', 'dateofLastPayment': '07/07/2017', 'ownershipIndicator': 'Individual', 'dateOpened_Disbursed': '27/08/2013', 'paymentHistoryEndDate': '01/12/2014', 'paymentHistoryStartDate': '01/11/2017', 'dateReportedandCertified': '03/11/2017', 'reportingMemberShortName': 'NOT DISCLOSED', 'highCredit_SanctionedAmount': '34,051'}, {'emiAmount': '11,288', 'accountType': 'Personal Loan', 'amountOverdue': '31,728', 'currentBalance': '3,92,459', 'rateOfInterest': '12.00', 'paymentHistory1': '089    029    STD    STD    STD    STD    000    000    000    000    000    000    000    000    000    000    000    000    ', 'paymentHistory2': '000    000    000    ', 'repaymentTenure': '60', 'paymentFrequency': 'Monthly', 'dateofLastPayment': '12/04/2017', 'ownershipIndicator': 'Individual', 'dateOpened_Disbursed': '12/01/2016', 'paymentHistoryEndDate': '01/02/2016', 'paymentHistoryStartDate': '01/10/2017', 'dateReportedandCertified': '31/10/2017', 'reportingMemberShortName': 'NOT DISCLOSED', 'highCredit_SanctionedAmount': '5,00,000'}, {'dateClosed': '11/07/2017', 'accountType': 'Business Loan – General', 'accountNumber': 'LK0000010410', 'currentBalance': '0', 'paymentHistory1': '000    000    000    000    ', 'paymentFrequency': 'Monthly', 'ownershipIndicator': 'Individual', 'dateOpened_Disbursed': '10/04/2017', 'paymentHistoryEndDate': '01/04/2017', 'paymentHistoryStartDate': '01/07/2017', 'dateReportedandCertified': '31/07/2017', 'reportingMemberShortName': 'AADRILTD', 'highCredit_SanctionedAmount': 
'1,00,000'}, {'accountType': 'Auto Loan (Personal)', 'currentBalance': '10,65,245', 'paymentHistory1': '000    000    000    000    000    000    000    000    000    000    000    000    000    000    ', 'dateofLastPayment': '12/09/2017', 'ownershipIndicator': 'Guarantor', 'dateOpened_Disbursed': '25/08/2016', 'paymentHistoryEndDate': '01/08/2016', 'paymentHistoryStartDate': '01/09/2017', 'dateReportedandCertified': '30/09/2017', 'reportingMemberShortName': 'NOT DISCLOSED', 'highCredit_SanctionedAmount': '14,00,000'}, {'accountType': 'Auto Loan (Personal)', 'currentBalance': '3,74,330', 'paymentHistory1': '000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    ', 'paymentHistory2': '000    ', 'dateofLastPayment': '12/09/2017', 'ownershipIndicator': 'Joint', 'dateOpened_Disbursed': '21/03/2016', 'paymentHistoryEndDate': '01/03/2016', 'paymentHistoryStartDate': '01/09/2017', 'dateReportedandCertified': '30/09/2017', 'reportingMemberShortName': 'NOT DISCLOSED', 'highCredit_SanctionedAmount': '7,00,000'}, {'accountType': 'Credit Card', 'creditLimit': '1,25,000', 'currentBalance': '71,670', 'paymentHistory1': '000    005    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    ', 'paymentHistory2': 'XXX    000    XXX    000    000    000    000    ', 'paymentFrequency': 'Monthly', 'dateofLastPayment': '02/10/2017', 'ownershipIndicator': 'Individual', 'actualPaymentAmount': '6,884', 'dateOpened_Disbursed': '30/10/2015', 'paymentHistoryEndDate': '01/10/2015', 'paymentHistoryStartDate': '01/10/2017', 'dateReportedandCertified': '31/10/2017', 'reportingMemberShortName': 'NOT DISCLOSED', 'highCredit_SanctionedAmount': '1,14,344'}, {'accountType': 'Credit Card', 'currentBalance': '11,036', 'paymentHistory1': '000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    ', 
'paymentHistory2': '000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    ', 'dateofLastPayment': '02/10/2017', 'ownershipIndicator': 'Individual', 'dateOpened_Disbursed': '26/10/2014', 'paymentHistoryEndDate': '01/11/2014', 'paymentHistoryStartDate': '01/10/2017', 'dateReportedandCertified': '13/10/2017', 'reportingMemberShortName': 'NOT DISCLOSED', 'highCredit_SanctionedAmount': '26,102'}, {'dateClosed': '03/11/2016', 'accountType': 'Auto Loan (Personal)', 'currentBalance': '0', 'paymentHistory1': '000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    ', 'paymentHistory2': '000    000    000    000    000    XXX    000    000    000    000    000    000    ', 'dateofLastPayment': '28/10/2016', 'ownershipIndicator': 'Guarantor', 'dateOpened_Disbursed': '25/06/2014', 'paymentHistoryEndDate': '01/06/2014', 'paymentHistoryStartDate': '01/11/2016', 'dateReportedandCertified': '30/11/2016', 'reportingMemberShortName': 'NOT DISCLOSED', 'highCredit_SanctionedAmount': '10,27,000'}],
 'LAI-100826051': [{'dateClosed': '02/01/2018', 'accountType': 'Business Loan – General', 'accountNumber': 'LK0000013293', 'currentBalance': '0', 'paymentHistory1': '000    STD    STD    STD    000    000    000    ', 'paymentFrequency': 'Monthly', 'ownershipIndicator': 'Individual', 'dateOpened_Disbursed': '01/07/2017', 'paymentHistoryEndDate': '01/07/2017', 'paymentHistoryStartDate': '01/01/2018', 'dateReportedandCertified': '31/01/2018', 'reportingMemberShortName': 'AADRILTD', 'highCredit_SanctionedAmount': '50,00,000'}, {'dateClosed': '04/10/2017', 'accountType': 'Business Loan – General', 'accountNumber': 'LK0000013294', 'currentBalance': '0', 'paymentHistory1': 'STD    000    000    000    ', 'paymentFrequency': 'Monthly', 'ownershipIndicator': 'Individual', 'dateOpened_Disbursed': '01/07/2017', 'paymentHistoryEndDate': '01/07/2017', 'paymentHistoryStartDate': '01/10/2017', 'dateReportedandCertified': '31/10/2017', 'reportingMemberShortName': 'AADRILTD', 'highCredit_SanctionedAmount': '50,00,000'}, {'dateClosed': '27/09/2017', 'accountType': 'Business Loan – General', 'accountNumber': 'LK0000009268', 'currentBalance': '0', 'paymentHistory1': '000    XXX    XXX    000    000    000    000    ', 'paymentFrequency': 'Monthly', 'dateofLastPayment': '27/09/2017', 'ownershipIndicator': 'Individual', 'dateOpened_Disbursed': '08/03/2017', 'paymentHistoryEndDate': '01/03/2017', 'paymentHistoryStartDate': '01/09/2017', 'dateReportedandCertified': '30/09/2017', 'reportingMemberShortName': 'AADRILTD', 'highCredit_SanctionedAmount': '30,00,000'}, {'accountType': 'Credit Card', 'currentBalance': '-170429', 'paymentHistory1': '000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    ', 'paymentHistory2': '000    000    000    000    ', 'dateofLastPayment': '29/06/2017', 'ownershipIndicator': 'Individual', 'dateOpened_Disbursed': '17/05/2016', 'paymentHistoryEndDate': '01/05/2016', 
'paymentHistoryStartDate': '01/02/2018', 'dateReportedandCertified': '28/02/2018', 'reportingMemberShortName': 'NOT DISCLOSED', 'highCredit_SanctionedAmount': '28,750'}, {'dateClosed': '27/04/2016', 'accountType': 'Credit Card', 'currentBalance': '-14', 'paymentHistory1': '000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    ', 'paymentHistory2': '000    000    000    000    000    000    000    000    000    000    ', 'dateofLastPayment': '27/04/2016', 'ownershipIndicator': 'Authorised User', 'dateOpened_Disbursed': '23/01/2014', 'paymentHistoryEndDate': '01/01/2014', 'paymentHistoryStartDate': '01/04/2016', 'dateReportedandCertified': '30/04/2016', 'reportingMemberShortName': 'NOT DISCLOSED', 'highCredit_SanctionedAmount': '1,16,999'}, {'dateClosed': '27/04/2016', 'accountType': 'Credit Card', 'currentBalance': '-14', 'paymentHistory1': '000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    ', 'paymentHistory2': '000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    000    ', 'dateofLastPayment': '27/04/2016', 'ownershipIndicator': 'Individual', 'dateOpened_Disbursed': '07/09/2012', 'paymentHistoryEndDate': '01/05/2013', 'paymentHistoryStartDate': '01/04/2016', 'dateReportedandCertified': '30/04/2016', 'reportingMemberShortName': 'NOT DISCLOSED', 'highCredit_SanctionedAmount': '1,16,999'}]} 

That is, the JSON lists should not be wrapped in double quotes ("").
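As a rough sketch of one way to get from the quoted form to the unquoted one, assuming each cell is a string containing a valid Python literal: ast.literal_eval can parse each cell on its own, so the quotes do not have to be stripped with a global string replace. The tiny frame below and the names raw, parsed, is_list_before and is_list_after are made up for illustration.

```python
import ast
import pandas as pd

# Hypothetical two-row frame: each cell is a *string* that looks like a list of dicts.
raw = pd.DataFrame(
    {
        "APPLICATION_ID": ["A-1", "A-2"],
        "account_Segments": [
            "[{'accountType': 'Credit Card', 'currentBalance': '4,336'}]",
            "[{'accountType': 'Auto Loan', 'currentBalance': '0'}, {'accountType': 'Credit Card'}]",
        ],
    }
)

# Every cell is a str here, so a "type == list" check like the one in the function is False.
is_list_before = bool((raw["account_Segments"].map(type) == list).all())

# ast.literal_eval only accepts Python literals (unlike eval) and is applied per cell.
parsed = raw["account_Segments"].map(ast.literal_eval)
is_list_after = bool((parsed.map(type) == list).all())

# Build the {ID: [dict, dict, ...]} mapping in the shape shown above.
data = dict(zip(raw["APPLICATION_ID"], parsed))
```

If a cell is not a valid literal (for example, because a long value was broken across lines in the CSV), literal_eval raises an error for that cell, which is a useful signal that the file itself needs attention.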

So, do the following:

df1 = pd.DataFrame.from_dict(data, orient='index').reset_index()
df2 = flatten_nested_json_df(df1)
df2 = df2.drop(['level_0'], axis=1)
df2 

This returns:

  index                                                  6  \
0  LAI-100518437  {'accountType': 'Credit Card', 'currentBalance...   
1  LAI-100826051                                               None   

                                                   7 0.cashLimit  \
0  {'dateClosed': '03/11/2016', 'accountType': 'A...       3,000   
1                                               None         NaN   

             0.accountType 0.creditLimit 0.amountOverdue 0.currentBalance  \
0              Credit Card        30,000           1,331            4,336   
1  Business Loan – General           NaN             NaN                0   

                                   0.paymentHistory1  \
0  093    063    033    003    003    000    003 ...   
1  000    STD    STD    STD    000    000    000       

                                   0.paymentHistory2  ... 5.dateofLastPayment  \
0  003    000    000    003    000    000    000 ...  ...          02/10/2017   
1                                                NaN  ...          27/04/2016   

  5.ownershipIndicator 5.actualPaymentAmount 5.dateOpened_Disbursed  \
0           Individual                 6,884             30/10/2015   
1           Individual                   NaN             07/09/2012   

  5.paymentHistoryEndDate 5.paymentHistoryStartDate  \
0              01/10/2015                01/10/2017   
1              01/05/2013                01/04/2016   

  5.dateReportedandCertified 5.reportingMemberShortName  \
0                 31/10/2017              NOT DISCLOSED   
1                 30/04/2016              NOT DISCLOSED   

  5.highCredit_SanctionedAmount 5.dateClosed  
0                      1,14,344          NaN  
1                      1,16,999   27/04/2016  

[2 rows x 95 columns]

Renaming the columns is up to you (for example, you can drop prefixes such as 0.).
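Dropping the prefix outright would leave duplicate column names, since every position has an accountType, a currentBalance and so on. One possible alternative, sketched here on a made-up frame, is to split each name into a two-level column index and then select one position at a time. This assumes every column has the "<position>.<field>" form; columns without a dot (such as 6 and 7 in the output above) would have to be dropped or handled first. The names wide and first_account are hypothetical.

```python
import pandas as pd

# Hypothetical wide frame shaped like the output above: "<position>.<field>" columns.
wide = pd.DataFrame(
    {
        "index": ["A-1", "A-2"],
        "0.accountType": ["Credit Card", "Business Loan"],
        "0.currentBalance": ["4,336", "0"],
        "1.accountType": ["Personal Loan", "Credit Card"],
    }
).set_index("index")

# Split each column name on the first dot into (position, field).
wide.columns = pd.MultiIndex.from_tuples(
    [tuple(name.split(".", 1)) for name in wide.columns],
    names=["position", "field"],
)

# Selecting one position returns that account's fields without the prefix.
first_account = wide["0"]
```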

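Note that my solution does not flatten positions that are not present in both lists (columns 6 and 7 above, which are None for the second ID, so the all-dict check fails for them). I think you will need to handle those separately. But first, please check the quality of the data you are reading.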
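For positions that exist in only some of the lists, a different approach from the function above may be easier: explode the lists into one row per account and expand each dict with pd.json_normalize, keeping the ID as the index. This is only a sketch on a made-up miniature of the data; the index name APPLICATION_ID and the variable names accounts and flat are assumptions.

```python
import pandas as pd

# Hypothetical miniature of the structure above: ID -> list of account dicts,
# with lists of different lengths and dicts with different keys.
data = {
    "A-1": [
        {"accountType": "Credit Card", "currentBalance": "4,336"},
        {"accountType": "Personal Loan", "emiAmount": "11,288"},
        {"accountType": "Auto Loan", "currentBalance": "0"},
    ],
    "A-2": [
        {"accountType": "Credit Card", "currentBalance": "-14"},
    ],
}

# One Series cell per ID, then one row per (ID, account) after explode.
accounts = pd.Series(data, name="account").rename_axis("APPLICATION_ID").explode()

# Expand each dict into columns; keys missing from a dict become NaN.
flat = pd.json_normalize(accounts.tolist())
flat.index = accounts.index

print(flat)
```

The result is long rather than wide: an ID with eight accounts gets eight rows, so lists of different lengths need no special handling, and the column names carry no position prefix.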

Answered 2021-08-04T09:06:57.280