
I get a nested JSON back in a request's response, for example:

{
 'code': 0,
 'daily_stats': [{'consume_data': {'fans_go_detail_count': 0,
                                   'fans_impression_count': 215,
                                   'fans_play_count': 7,
                                   'go_detail_count': 0,
                                   'impression_count': 226,
                                   'play_count': 8},
                  'date': '2020-06-22'}],
 'jump_rate': [],
 'message': 'success',
 'total_stat': {'consume_data': {'fans_go_detail_count': 0,
                                 'fans_impression_count': 215,
                                 'fans_play_count': 7,
                                 'go_detail_count': 0,
                                 'impression_count': 226,
                                 'play_count': 8},
                'consume_detail': {'click_rate': 0.035398230088495575,
                                   'read_complete_rate': 0,
                                   'read_duration': 111},
                'fans_change_count': 0,
                'fans_data': {},
                'interaction_data': {},
                'ranking_data': {}}}

I want a flat df with columns such as:

date, daily_stats.consume_data.fans_go_detail_count, consume_detail.click_rate, etc.

Feeding it to pandas.json_normalize, I get:


df = pd.json_normalize(r.json())
list(df)

['code',
 'daily_stats',
 'jump_rate',
 'message',
 'total_stat.consume_data.fans_go_detail_count',
 'total_stat.consume_data.fans_impression_count',
 'total_stat.consume_data.fans_play_count',
 'total_stat.consume_data.go_detail_count',
 'total_stat.consume_data.impression_count',
 'total_stat.consume_data.play_count',
 'total_stat.consume_detail.click_rate',
 'total_stat.consume_detail.read_complete_rate',
 'total_stat.consume_detail.read_duration',
 'total_stat.fans_change_count']

Problems:

  1. 'daily_stats' and 'jump_rate' are still packed in lists, e.g.:
df['daily_stats']

0    [{'consume_data': {'fans_go_detail_count': 0, ...
Name: daily_stats, dtype: object
  2. Empty fields such as 'fans_data': {}, 'interaction_data': {}, and 'ranking_data': {} are missing.

I tried adding record_path=r.json['daily_stats'], and then I got:

unhashable type: 'dict'

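Of course I could unpack each one manually in a loop into separate dfs, join them, and flatten the result, but I have a feeling there is a way to do this without the fuss.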


1 Answer

  • With r as a dict:
import pandas as pd

# load r (the parsed JSON dict, i.e. r.json() in the question) into a dataframe
df = pd.json_normalize(r)

# explode the columns with lists
df = df.apply(lambda x: x.explode()).reset_index(drop=True)

# expand the dicts in daily_stats and join them to df
df = df.join(pd.json_normalize(df.daily_stats)).drop(columns=['daily_stats'])

# display(df)
   code jump_rate  message  total_stat.consume_data.fans_go_detail_count  total_stat.consume_data.fans_impression_count  total_stat.consume_data.fans_play_count  total_stat.consume_data.go_detail_count  total_stat.consume_data.impression_count  total_stat.consume_data.play_count  total_stat.consume_detail.click_rate  total_stat.consume_detail.read_complete_rate  total_stat.consume_detail.read_duration  total_stat.fans_change_count        date  consume_data.fans_go_detail_count  consume_data.fans_impression_count  consume_data.fans_play_count  consume_data.go_detail_count  consume_data.impression_count  consume_data.play_count
0     0       NaN  success                                             0                                            215                                        7                                        0                                       226                                   8                              0.035398                                             0                                      111                             0  2020-06-22                                  0                                 215                             7                             0                            226                        8
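
As an alternative sketch, record_path can do the unpacking, provided it is given the name of the key rather than the data stored under it (record_path expects a str or a list of str, so passing the list of dicts itself is what leads to unhashable type: 'dict'). The example below uses a trimmed copy of the payload from the question in place of r.json(). The record_prefix value and the choice of meta fields are illustrative assumptions, not part of the answer above.

```python
import pandas as pd

# Trimmed copy of the payload from the question; stands in for r.json().
data = {
    "code": 0,
    "daily_stats": [
        {
            "consume_data": {"impression_count": 226, "play_count": 8},
            "date": "2020-06-22",
        }
    ],
    "jump_rate": [],
    "message": "success",
    "total_stat": {
        "consume_detail": {"click_rate": 0.035398230088495575},
        "fans_change_count": 0,
    },
}

# record_path names the key that holds the list of records.
# meta lists the parent-level fields to repeat on every record row;
# a nested field is addressed by a list of keys.
daily = pd.json_normalize(
    data,
    record_path="daily_stats",
    meta=["code", "message", ["total_stat", "consume_detail", "click_rate"]],
    record_prefix="daily_stats.",  # assumed prefix, to match the desired column names
)

print(sorted(daily.columns))
```

This yields one row per entry in daily_stats, so it should also behave sensibly when the API returns several days at once. Every total_stat field that is wanted has to be listed in meta explicitly.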


answered 2020-09-24T02:45:23.793
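
The second problem in the question (empty fields such as 'fans_data': {} disappearing) is not covered by the steps above. One possible workaround, sketched here under the assumption that a missing-value placeholder is acceptable for those fields, is to replace empty dicts before normalizing. The helper fill_empty_dicts is a hypothetical name introduced for this example, not a pandas function.

```python
import pandas as pd


def fill_empty_dicts(obj, placeholder=None):
    """Return a copy of obj with every empty dict replaced by placeholder."""
    if isinstance(obj, dict):
        if not obj:
            return placeholder
        return {key: fill_empty_dicts(value, placeholder) for key, value in obj.items()}
    if isinstance(obj, list):
        return [fill_empty_dicts(value, placeholder) for value in obj]
    return obj


# Part of the total_stat payload from the question.
total_stat = {
    "consume_detail": {
        "click_rate": 0.035398230088495575,
        "read_complete_rate": 0,
        "read_duration": 111,
    },
    "fans_change_count": 0,
    "fans_data": {},
    "interaction_data": {},
    "ranking_data": {},
}

# Empty dicts become None, so json_normalize keeps a column for each of them.
flat = pd.json_normalize({"total_stat": fill_empty_dicts(total_stat)})

print(list(flat.columns))
```

The same helper can be applied to the whole response dict before either normalization approach. Empty lists such as 'jump_rate': [] are left untouched by it.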