0

鉴于以下字典创建自df['statistics'].head().to_dict()

{0: {'executions': {'total': '1',
   'passed': '1',
   'failed': '0',
   'skipped': '0'},
  'defects': {'product_bug': {'total': 0, 'PB001': 0},
   'automation_bug': {'AB001': 0, 'total': 0},
   'system_issue': {'total': 0, 'SI001': 0},
   'to_investigate': {'total': 0, 'TI001': 0},
   'no_defect': {'ND001': 0, 'total': 0}}},
 1: {'executions': {'total': '1',
   'passed': '1',
   'failed': '0',
   'skipped': '0'},
  'defects': {'product_bug': {'total': 0, 'PB001': 0},
   'automation_bug': {'AB001': 0, 'total': 0},
   'system_issue': {'total': 0, 'SI001': 0},
   'to_investigate': {'total': 0, 'TI001': 0},
   'no_defect': {'ND001': 0, 'total': 0}}},
 2: {'executions': {'total': '1',
   'passed': '1',
   'failed': '0',
   'skipped': '0'},
  'defects': {'product_bug': {'total': 0, 'PB001': 0},
   'automation_bug': {'AB001': 0, 'total': 0},
   'system_issue': {'total': 0, 'SI001': 0},
   'to_investigate': {'total': 0, 'TI001': 0},
   'no_defect': {'ND001': 0, 'total': 0}}},
 3: {'executions': {'total': '1',
   'passed': '1',
   'failed': '0',
   'skipped': '0'},
  'defects': {'product_bug': {'total': 0, 'PB001': 0},
   'automation_bug': {'AB001': 0, 'total': 0},
   'system_issue': {'total': 0, 'SI001': 0},
   'to_investigate': {'total': 0, 'TI001': 0},
   'no_defect': {'ND001': 0, 'total': 0}}},
 4: {'executions': {'total': '1',
   'passed': '1',
   'failed': '0',
   'skipped': '0'},
  'defects': {'product_bug': {'total': 0, 'PB001': 0},
   'automation_bug': {'AB001': 0, 'total': 0},
   'system_issue': {'total': 0, 'SI001': 0},
   'to_investigate': {'total': 0, 'TI001': 0},
   'no_defect': {'ND001': 0, 'total': 0}}}}

有没有办法将字典键/值对扩展为它们自己的列,并在这些列前面加上原始列的名称,即 statisistics.executions.total 会变成 statistics_executions_total 甚至是 executions_total?

我已经证明我可以使用以下内容创建列:

pd.concat([df.drop(['statistics'], axis=1), df['statistics'].apply(pd.Series)], axis=1) 但是,您会注意到这些新创建的列中的每一个都有一个重复的名称“total”。

我; 但是,还没有找到一种方法来为新创建的列添加原始列名前缀,即 executions_total。

为了获得更多洞察力,统计数据将扩展到执行和缺陷,执行将扩展到通过 | 失败 | 跳过 | 总数和缺陷将扩展为automation_bug | 系统问题 | 调查 | 产品错误 | 无缺陷。后者将扩展为总 | **001 列,其中总计重复了多次。

非常感谢任何想法。-谢谢!

4

1 回答 1

1
import pandas as pd

# this is for setting up the test dataframe from the data in the question, where data is the name of the dict
df = pd.DataFrame({'statistics': [v for v in data.values()]})

# display(df)
                                                                                                                                                                                                                                                                                                    statistics
0  {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
1  {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
2  {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
3  {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}
4  {'executions': {'total': '1', 'passed': '1', 'failed': '0', 'skipped': '0'}, 'defects': {'product_bug': {'total': 0, 'PB001': 0}, 'automation_bug': {'AB001': 0, 'total': 0}, 'system_issue': {'total': 0, 'SI001': 0}, 'to_investigate': {'total': 0, 'TI001': 0}, 'no_defect': {'ND001': 0, 'total': 0}}}

# normalize the statistics column
dfs = pd.json_normalize(df.statistics)

# display(dfs)
  total passed failed skipped  product_bug.total  product_bug.PB001  automation_bug.AB001  automation_bug.total  system_issue.total  system_issue.SI001  to_investigate.total  to_investigate.TI001  no_defect.ND001  no_defect.total
0     1      1      0       0                  0                  0                     0                     0                   0                   0                     0                     0                0                0
1     1      1      0       0                  0                  0                     0                     0                   0                   0                     0                     0                0                0
2     1      1      0       0                  0                  0                     0                     0                   0                   0                     0                     0                0                0
3     1      1      0       0                  0                  0                     0                     0                   0                   0                     0                     0                0                0
4     1      1      0       0                  0                  0                     0                     0                   0                   0                     0                     0                0                0
于 2020-09-19T17:48:38.967 回答