python - How to normalize nested JSON with json_normalize

Question

I am trying to build a pandas DataFrame from nested JSON. For some reason, I can't seem to get at the third level.

My JSON looks like this:

  "numberOfResults": 376,
  "results": [
    {
      "name": "single",
      "docs": [
        {
          "id": "RAKDI342342",
          "type": "Culture",
          "category": "Culture",
          "media": "unknown",
          "label": "exampellabel",
          "title": "testtitle and titletest",
          "subtitle": "Archive" 

            ]
        },
        {
          "id": "GUI6N5QHBPTO6GJ66VP5OXB7GKX6J7ER",
          "type": "Culture",
          "category": "Culture",
          "media": "image",
          "label": "more label als example",
          "title": "test the second title",
          "subtitle": "picture"

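and so on.

The "docs" section holds all the actual results, each starting with "id". Once all the information for one item is there, the next block starting with "id" follows.

Now I am trying to create a table with the keys id, label and title (for a start) for each individual block (in this case, each actual item).

After defining search_url (from which I get the JSON), my code currently looks like this: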

result = requests.get(search_url)
data = result.json()
data.keys()

With this, I am told the dict_keys are the following:

dict_keys(['numberOfResults', 'results', 'facets', 'entities', 'fulltexts', 'correctedQuery', 'highlightedTerms', 'randomSeed', 'nextCursorMark'])

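Given the JSON above, I know I want to look into "results" and then further into "docs". According to the documentation I found, I should be able to do this by addressing the results section directly and then addressing the nested bits by separating the fields with ".". I have now tried the following code: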

fields = ["docs.id", "docs.label", "docs.title"]
df = pd.json_normalize(data["results"])
df[fields]

This works up until df[fields]; at that stage the program tells me:

KeyError: "['docs.id'] not in index"

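It does work for the level above, though, so if I try the same with "name" and "docs" I get a lovely DataFrame. What exactly am I doing wrong? I am still a beginner with Python and pandas, so any help is much appreciated!

Edit:

The desired DataFrame output looks roughly like this: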

    id              label            title  
0   RAKDI342342     exampellabel     testtitle and titletest

score 2 · Accepted Answer

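Use pandas.json_normalize()
The following code uses pandas v1.2.4
If you don't want the other columns, remove the list of keys assigned to meta
Use pandas.DataFrame.drop to remove any other unwanted columns from df.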

import pandas as pd

df = pd.json_normalize(data, record_path=['results', 'docs'], meta=[['results', 'name'], 'numberOfResults'])

display(df)
                                 id     type category    media                   label                    title subtitle results.name numberOfResults
0                       RAKDI342342  Culture  Culture  unknown            exampellabel  testtitle and titletest  Archive       single             376
1  GUI6N5QHBPTO6GJ66VP5OXB7GKX6J7ER  Culture  Culture    image  more label als example    test the second title  picture       single             376
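
To get only the id, label and title columns asked for in the question, the two suggestions above (leave out meta, then drop what is not needed) could be sketched as follows. This is a minimal sketch, assuming the corrected data from this answer; the variable names docs, wanted and dropped are made up for illustration.

```python
import pandas as pd

# Corrected sample data from this answer (assumed shape of the real response).
data = {
    "numberOfResults": 376,
    "results": [
        {
            "name": "single",
            "docs": [
                {"id": "RAKDI342342", "type": "Culture", "category": "Culture",
                 "media": "unknown", "label": "exampellabel",
                 "title": "testtitle and titletest", "subtitle": "Archive"},
                {"id": "GUI6N5QHBPTO6GJ66VP5OXB7GKX6J7ER", "type": "Culture",
                 "category": "Culture", "media": "image",
                 "label": "more label als example",
                 "title": "test the second title", "subtitle": "picture"},
            ],
        }
    ],
}

# Without meta, only the fields of each doc become columns.
docs = pd.json_normalize(data, record_path=["results", "docs"])

# Option 1: select just the wanted columns.
wanted = docs[["id", "label", "title"]]

# Option 2: drop the unwanted columns instead.
dropped = docs.drop(columns=["type", "category", "media", "subtitle"])

print(wanted)
```

Selecting by name is usually the shorter route when only a few columns are needed; drop is handier when most columns should stay.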

Data

The posted JSON / dict is not correctly formed
Assume the following corrected form

data = \
{'numberOfResults': 376,
 'results': [{'docs': [{'category': 'Culture',
                        'id': 'RAKDI342342',
                        'label': 'exampellabel',
                        'media': 'unknown',
                        'subtitle': 'Archive',
                        'title': 'testtitle and titletest',
                        'type': 'Culture'},
                       {'category': 'Culture',
                        'id': 'GUI6N5QHBPTO6GJ66VP5OXB7GKX6J7ER',
                        'label': 'more label als example',
                        'media': 'image',
                        'subtitle': 'picture',
                        'title': 'test the second title',
                        'type': 'Culture'}],
              'name': 'single'}]}
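
As a rough illustration of why the attempt in the question raises a KeyError: json_normalize expands nested dicts into dotted column names, but a list of records such as docs stays as a single column of lists unless record_path points at it. The sketch below assumes the corrected data, trimmed here to the three fields of interest; the variable name flat is made up for illustration.

```python
import pandas as pd

# Trimmed copy of the corrected data (only id, label and title kept per doc).
data = {
    "numberOfResults": 376,
    "results": [
        {
            "name": "single",
            "docs": [
                {"id": "RAKDI342342", "label": "exampellabel",
                 "title": "testtitle and titletest"},
                {"id": "GUI6N5QHBPTO6GJ66VP5OXB7GKX6J7ER",
                 "label": "more label als example",
                 "title": "test the second title"},
            ],
        }
    ],
}

# The call from the question: one row per entry in "results".
flat = pd.json_normalize(data["results"])

# Only "name" and "docs" exist as columns; there is no "docs.id".
print(sorted(flat.columns))

# Each cell in "docs" is still a plain Python list of dicts.
print(type(flat.loc[0, "docs"]))
```

Because "docs.id" never becomes a column, df[fields] has nothing to select, which matches the KeyError shown in the question.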
