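I have a script that collects JSON data from Twitter's API. The script runs every minute and parses the data with jq. The results are appended to a single file, which ends up looking like this: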
[
{"text": "Tweet 01",
"id": "001"
},
{"text": "Tweet 02",
"id": "002"
},
{"text": "Tweet 03",
"id": "003"
}
]
[
{"text": "Tweet 04",
"id": "004"
},
{"text": "Tweet 05",
"id": "005"
},
{"text": "Tweet 06",
"id": "006"
},
{"text": "Tweet 07",
"id": "007"
},
{"text": "Tweet 08",
"id": "008"
}
]
[
{"text": "Tweet 09",
"id": "009"
},
{"text": "Tweet 10",
"id": "010"
}
]
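Previously, each file held a single JSON list, and Pandas can easily handle a file containing one list. But how can I efficiently iterate over multiple lists that are not comma-separated and do not necessarily have the same length?
My end goal is to aggregate all the JSON data in this file and convert it to a CSV file in which each column is a key from the JSON data. It should end up looking like this: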
text, id
Tweet 01, 001
Tweet 02, 002
Tweet 03, 003
Tweet 04, 004
Tweet 05, 005
Tweet 06, 006
Tweet 07, 007
Tweet 08, 008
Tweet 09, 009
Tweet 10, 010
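To make the goal concrete, here is a minimal sketch of the kind of processing I have in mind, using only the standard library. It assumes the whole file fits in memory and that every top-level value is a list of flat objects. The helper names (`iter_concatenated_json`, `flatten_records`, `records_to_csv`) and the shortened `sample` string are my own illustrations, not part of my actual script.

```python
import csv
import io
import json


def iter_concatenated_json(text):
    """Yield each top-level JSON value from a string of back-to-back JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so skip it manually.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # raw_decode returns the parsed value and the index where it stopped.
        value, pos = decoder.raw_decode(text, pos)
        yield value


def flatten_records(text):
    """Merge every top-level list into one flat list of records."""
    records = []
    for chunk in iter_concatenated_json(text):
        records.extend(chunk)
    return records


def records_to_csv(records, fieldnames):
    """Render the records as CSV text with one column per key."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Hypothetical, shortened stand-in for the contents of the real file.
sample = '''[
{"text": "Tweet 01", "id": "001"},
{"text": "Tweet 02", "id": "002"}
]
[
{"text": "Tweet 03", "id": "003"}
]
'''

records = flatten_records(sample)
csv_text = records_to_csv(records, ["text", "id"])
print(csv_text)
```

The flat `records` list could presumably also be passed to `pd.DataFrame(records)` instead of the `csv` module, but I do not know whether this is the most efficient way to do it.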
If I try to read the file anyway, this is what happens:
>>> import pandas as pd
>>> df = pd.read_json("sample.json")
>>> df.head()
Traceback (most recent call last):
  File "lists.py", line 3, in <module>
    df = pd.read_json("sample.json")
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/util/_decorators.py", line 214, in wrapper
    return func(*args, **kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/json/_json.py", line 608, in read_json
    result = json_reader.read()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/json/_json.py", line 731, in read
    obj = self._get_object_parser(self.data)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/json/_json.py", line 753, in _get_object_parser
    obj = FrameParser(json, **kwargs).parse()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/json/_json.py", line 857, in parse
    self._parse_no_numpy()
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/io/json/_json.py", line 1089, in _parse_no_numpy
    loads(json, precise_float=self.precise_float), dtype=None
ValueError: Trailing data
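As far as I can tell, the standard-library `json` module rejects the same input for the same reason: once the first list has been parsed, anything left in the string counts as extra data. A small reproduction, assuming that pandas' "Trailing data" corresponds to this same condition (the `two_lists` string is a made-up stand-in for my file):

```python
import json

# Two complete JSON lists back to back, as in the file above.
two_lists = '[1]\n[2]'

error = None
try:
    json.loads(two_lists)
except json.JSONDecodeError as exc:
    # The first list parses fine; the parser complains about what follows it.
    error = exc
    print(f"{exc.msg} at character {exc.pos}")
```

So the file as a whole is not one valid JSON document, even though each list on its own is.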