
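I have JSON being returned from an API, and each response is appended to a list. After that call finishes, I need to flatten the data using pandas. I'm not sure how to do this.

Code: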

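import json
import requests

api_results = []
error_results = []

# target_url, doc and login_details are set up elsewhere (not shown here)
response = requests.post(target_url, data=doc, headers=login_details)
response_data = json.loads(response.text)
# a dict payload with an 'error' key is kept apart from the good results
if isinstance(response_data, dict) and 'error' in response_data:
    error_results.append(response_data)
else:
    api_results.append(response_data)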

When I inspect api_results, my data looks like this:

[{"requesturl":"http:\/\/www.odg-twc.com\/index.html?calculator.htm?icd=840.9~719.41&age=-1&state=IL&jobclass=1&cf=4","clientid":"123456789","adjustedsummaryguidelines":{"midrangeallabsence":46,"midrangeclaims":36,"atriskallabsence":374,"atriskclaims":98},"riskassessment":{"score":87.95,"status":"Red (Extreme)","magnitude":"86.65","volatility":"89.25"},"adjustedduration":{"bp":{"days":2},"cp95":{"alert":"yellow","days":185},"cp100":{"alert":"yellow","days":365}},"icdcodes":[{"code":"719.41","name":"Pain in joint, shoulder region","meandurationdays":{"bp":18,"cp95":72,"cp100":93}},{"code":"840.9","name":"Sprains and strains of unspecified site of shoulder and upper arm","meandurationdays":{"bp":10,"cp95":27,"cp100":35}}],"cfactors":{"legalrep":{"applied":"1","alert":"red"}},"alertdesc":{"red":"Recommend early intervention and priority medical case management.","yellow":"Consider early intervention and priority medical case management."}}
,{"clientid":"987654321","adjustedsummaryguidelines":{"midrangeallabsence":25,"midrangeclaims":42,"atriskallabsence":0,"atriskclaims":194},"riskassessment":{"score":76.85,"status":"Orange (High)","magnitude":"74.44","volatility":"79.25"},"adjustedduration":{"bp":{"days":2},"cp95":{"days":95},"cp100":{"alert":"yellow","days":193}},"icdcodes":[{"code":"724.2","name":"Lumbago","meandurationdays":{"bp":10,"cp95":38,"cp100":50}},{"code":"847.2","name":"Sprain of lumbar","meandurationdays":{"bp":10,"cp95":22,"cp100":29}}],"cfactors":{"legalrep":{"applied":"1","alert":"red"}},"alertdesc":{"red":"Recommend early intervention and priority medical case management.","yellow":"Consider early intervention and priority medical case management."}}]

I have been using json_normalize, but I know I'm not using the library correctly.

How do I flatten this data?

What I need is this:

+---------+----+------+----+------+----+----------------+------------+------------------+--------------+--------------------+-----+-------+---------+-----+-------------+----------+------+---+-----+----+
| clientid|days| alert|days| alert|days|atriskallabsence|atriskclaims|midrangeallabsence|midrangeclaims|           alertdesc|alert|applied|magnitude|score|       status|volatility|  code| bp|cp100|cp95|
+---------+----+------+----+------+----+----------------+------------+------------------+--------------+--------------------+-----+-------+---------+-----+-------------+----------+------+---+-----+----+
|123456789|   2|yellow| 365|yellow| 185|             374|          98|                46|            36|[Recommend early ...|  red|      1|    86.65|87.95|Red (Extreme)|     89.25|719.41| 18|   93|  72|
|123456789|   2|yellow| 365|yellow| 185|             374|          98|                46|            36|[Recommend early ...|  red|      1|    86.65|87.95|Red (Extreme)|     89.25| 840.9| 10|   35|  27|
|987654321|   2|yellow| 193|  null|  95|               0|         194|                25|            42|[Recommend early ...|  red|      1|    74.44|76.85|Orange (High)|     79.25| 724.2| 10|   50|  38|
|987654321|   2|yellow| 193|  null|  95|               0|         194|                25|            42|[Recommend early ...|  red|      1|    74.44|76.85|Orange (High)|     79.25| 847.2| 10|   29|  22|
+---------+----+------+----+------+----+----------------+------------+------------------+--------------+--------------------+-----+-------+---------+-----+-------------+----------+------+---+-----+----+

1 Answer

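  • Because the desired result has a separate row for each dict under the 'icdcodes' key, the best option is to use pandas.json_normalize.
  • First create the main dataframe and call pandas.DataFrame.explode('icdcodes'), which expands the dataframe so that each 'clientid' gets one row per dict in its 'icdcodes' list.
  • Use .json_normalize() on the 'icdcodes' column. Each original cell is a list of dicts (one dict per row after exploding), and some of the values may themselves be dicts.
  • .join the two dataframes and drop the 'icdcodes' column.
  • Use pandas.DataFrame.rename() to rename columns and pandas.DataFrame.drop() to remove any columns you don't need.
  • Also see this answer on SO: Splitting dictionary/list inside a Pandas Column into separate Columns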
import pandas as pd

# create the initial dataframe from api_results
df = pd.json_normalize(api_results).explode('icdcodes').reset_index(drop=True)

# create a dataframe for only icdcodes, which will expand all the lists of dicts
icdcodes = pd.json_normalize(df.icdcodes)

# join df to icdcodes and drop the icdcodes column
df = df.join(icdcodes).drop(['icdcodes'], axis=1)

# display(df)
                                                                                             requesturl   clientid  adjustedsummaryguidelines.midrangeallabsence  adjustedsummaryguidelines.midrangeclaims  adjustedsummaryguidelines.atriskallabsence  adjustedsummaryguidelines.atriskclaims  riskassessment.score riskassessment.status riskassessment.magnitude riskassessment.volatility  adjustedduration.bp.days adjustedduration.cp95.alert  adjustedduration.cp95.days adjustedduration.cp100.alert  adjustedduration.cp100.days cfactors.legalrep.applied cfactors.legalrep.alert                                                       alertdesc.red                                                   alertdesc.yellow    code                                                               name  meandurationdays.bp  meandurationdays.cp95  meandurationdays.cp100
0  http:\/\/www.odg-twc.com\/index.html?calculator.htm?icd=840.9~719.41&age=-1&state=IL&jobclass=1&cf=4  123456789                                            46                                        36                                         374                                      98                 87.95         Red (Extreme)                    86.65                     89.25                         2                      yellow                         185                       yellow                          365                         1                     red  Recommend early intervention and priority medical case management.  Consider early intervention and priority medical case management.  719.41                                     Pain in joint, shoulder region                   18                     72                      93
1  http:\/\/www.odg-twc.com\/index.html?calculator.htm?icd=840.9~719.41&age=-1&state=IL&jobclass=1&cf=4  123456789                                            46                                        36                                         374                                      98                 87.95         Red (Extreme)                    86.65                     89.25                         2                      yellow                         185                       yellow                          365                         1                     red  Recommend early intervention and priority medical case management.  Consider early intervention and priority medical case management.   840.9  Sprains and strains of unspecified site of shoulder and upper arm                   10                     27                      35
2                                                                                                   NaN  987654321                                            25                                        42                                           0                                     194                 76.85         Orange (High)                    74.44                     79.25                         2                         NaN                          95                       yellow                          193                         1                     red  Recommend early intervention and priority medical case management.  Consider early intervention and priority medical case management.   724.2                                                            Lumbago                   10                     38                      50
3                                                                                                   NaN  987654321                                            25                                        42                                           0                                     194                 76.85         Orange (High)                    74.44                     79.25                         2                         NaN                          95                       yellow                          193                         1                     red  Recommend early intervention and priority medical case management.  Consider early intervention and priority medical case management.   847.2                                                   Sprain of lumbar                   10                     22                      29
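
The code above stops before the rename/drop step from the list. A minimal sketch of that last step follows, assuming a trimmed-down stand-in for api_results (only a few of the original keys, with a placeholder URL) and short column names that are purely illustrative; pick whichever names suit your data.

```python
import pandas as pd

# Trimmed-down stand-in for api_results: a subset of the keys from the
# question, with a placeholder URL. The second record deliberately lacks
# 'requesturl' and the cp95 'alert', as in the original data.
api_results = [
    {
        "requesturl": "http://example.invalid/calculator",  # placeholder
        "clientid": "123456789",
        "riskassessment": {"score": 87.95, "status": "Red (Extreme)"},
        "adjustedduration": {"bp": {"days": 2},
                             "cp95": {"alert": "yellow", "days": 185}},
        "icdcodes": [
            {"code": "719.41", "name": "Pain in joint, shoulder region",
             "meandurationdays": {"bp": 18, "cp95": 72, "cp100": 93}},
            {"code": "840.9", "name": "Sprains and strains",
             "meandurationdays": {"bp": 10, "cp95": 27, "cp100": 35}},
        ],
    },
    {
        "clientid": "987654321",
        "riskassessment": {"score": 76.85, "status": "Orange (High)"},
        "adjustedduration": {"bp": {"days": 2}, "cp95": {"days": 95}},
        "icdcodes": [
            {"code": "724.2", "name": "Lumbago",
             "meandurationdays": {"bp": 10, "cp95": 38, "cp100": 50}},
            {"code": "847.2", "name": "Sprain of lumbar",
             "meandurationdays": {"bp": 10, "cp95": 22, "cp100": 29}},
        ],
    },
]

# Same steps as above: explode, normalize the dicts, join, drop the source column
df = pd.json_normalize(api_results).explode("icdcodes").reset_index(drop=True)
icdcodes = pd.json_normalize(df["icdcodes"].tolist())
df = df.join(icdcodes).drop(columns=["icdcodes"])

# Drop columns that are not wanted in the final table
df = df.drop(columns=["requesturl", "name"], errors="ignore")

# Rename the dotted column names to shorter ones (illustrative names only)
df = df.rename(columns={
    "adjustedduration.bp.days": "bp_days",
    "adjustedduration.cp95.days": "cp95_days",
    "adjustedduration.cp95.alert": "cp95_alert",
    "riskassessment.score": "score",
    "riskassessment.status": "status",
    "meandurationdays.bp": "mean_bp",
    "meandurationdays.cp95": "mean_cp95",
    "meandurationdays.cp100": "mean_cp100",
})

print(df)
```

Unlike the table in the question, which repeats the names days and alert, this sketch keeps every column name unique, so each column can still be selected by name afterwards.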
Answered 2021-01-05T18:20:51.343