
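I need to read a CSV file with Pandas, and one of the columns in the CSV contains JSON data. However, as soon as I load the file, the JSON gets corrupted (?) and I can't use json_normalize() on it.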

I can't attach the file, but here is some sample code that demonstrates the problem:

import pandas as pd

df = pd.DataFrame({'location_id':[1,2,3], 'visits':[{"ABCD":9,"DEFG":8,"ASDF":6},
                                                    {"XYZR":4,"ABCD":4},
                                                    {"ASDF":4}]})
pd.json_normalize(df.visits)
# OUTPUTS THE NORMALIZED JSON JUST FINE

df.to_csv('test_visits.csv')
df2 = pd.read_csv('test_visits.csv')
pd.json_normalize(df2.visits)

# RESULTS IN ERROR:
# AttributeError: 'str' object has no attribute 'values'

Is there something I'm missing in read_csv() that would keep the JSON usable?

Thanks in advance.


2 Answers

In [77]: df = pd.DataFrame({'location_id':[1,2,3], 'visits':[{"ABCD":9,"DEFG":8,"ASDF":6},
    ...:                                                     {"XYZR":4,"ABCD":4},
    ...:                                                     {"ASDF":4}]})

In [78]: df
Out[78]:
   location_id                             visits
0            1  {'ABCD': 9, 'DEFG': 8, 'ASDF': 6}
1            2             {'XYZR': 4, 'ABCD': 4}
2            3                        {'ASDF': 4}

In [79]: pd.json_normalize(df["visits"])
Out[79]:
   ABCD  DEFG  ASDF  XYZR
0   9.0   8.0   6.0   NaN
1   4.0   NaN   NaN   4.0
2   NaN   NaN   4.0   NaN

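This happens because once you write the DataFrame to CSV and read it back, pandas reads the column as strings: each dict is written out as its Python repr (e.g. "{'ABCD': 9, 'DEFG': 8, 'ASDF': 6}"), and read_csv does not turn it back into a dict. So when you try to normalize it, json_normalize raises an error saying the str object has no attribute values, because the cells are no longer dicts (JSON objects).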
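To see this concretely, here is a small sketch (my own, not part of the answer) that round-trips the question's DataFrame through an in-memory CSV. Using io.StringIO instead of test_visits.csv, and passing index=False, are assumptions made only to keep it short. It shows the cell comes back as the dict's Python repr string, which json.loads also rejects because the keys use single quotes.

```python
import io
import json

import pandas as pd

# Rebuild the question's DataFrame: the 'visits' cells are Python dicts
df = pd.DataFrame({
    "location_id": [1, 2, 3],
    "visits": [{"ABCD": 9, "DEFG": 8, "ASDF": 6}, {"XYZR": 4, "ABCD": 4}, {"ASDF": 4}],
})

# Round-trip through an in-memory CSV (assumption: StringIO instead of a file, no index column)
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
df2 = pd.read_csv(buf)

first = df2["visits"][0]
print(type(first))  # the dict came back as a plain str
print(first)        # it is the dict's Python repr, with single-quoted keys

# Single-quoted keys are not valid JSON, so json.loads rejects the string
try:
    json.loads(first)
    parsed_ok = True
except json.JSONDecodeError as exc:
    parsed_ok = False
    print("not valid JSON:", exc)
```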

answered 2020-07-16T23:12:11.277
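  • The problem is that the 'visits' column is of type str (e.g. "{'ABCD': 9, 'DEFG': 8, 'ASDF': 6}").
  • When loading the CSV with .read_csv, use the converters parameter to apply ast.literal_eval to the 'visits' column, which safely evaluates each Python-literal string and converts the str to a dict.
    • converters: Dict of functions for converting values in certain columns. Keys can either be integers or column labels.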
from ast import literal_eval
import pandas as pd

# load the csv using the converters parameter with literal_eval
df2 = pd.read_csv('test_visits.csv', converters={'visits': literal_eval})

# normalize the visits, join it to location_id and drop the visits column
df2 = df2.join(pd.json_normalize(df2.visits)).drop(columns=['visits'])

# display(df2)
   location_id  ABCD  DEFG  ASDF  XYZR
0            1   9.0   8.0   6.0   NaN
1            2   4.0   NaN   NaN   4.0
2            3   NaN   NaN   4.0   NaN
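If you control how the CSV is written, a variant not taken from this answer is to store real JSON in the column: serialize each dict with json.dumps before to_csv, then pass json.loads as the converter. The sketch below assumes that approach, and again uses an in-memory buffer with index=False instead of test_visits.csv.

```python
import io
import json

import pandas as pd

df = pd.DataFrame({
    "location_id": [1, 2, 3],
    "visits": [{"ABCD": 9, "DEFG": 8, "ASDF": 6}, {"XYZR": 4, "ABCD": 4}, {"ASDF": 4}],
})

# Write: json.dumps turns each dict into a double-quoted JSON string before it hits the CSV
buf = io.StringIO()
df.assign(visits=df["visits"].map(json.dumps)).to_csv(buf, index=False)
buf.seek(0)

# Read: json.loads turns each cell back into a dict while the CSV is parsed
df2 = pd.read_csv(buf, converters={"visits": json.loads})

# Same join/drop step as in the answer above
result = df2.join(pd.json_normalize(df2["visits"].tolist())).drop(columns=["visits"])
print(result)
```

json.loads only accepts double-quoted JSON, so this variant requires changing the write side; for a file already written with the default dict repr, literal_eval is the converter that works.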
answered 2020-07-17T15:53:28.323