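Here is a sample of a JSON file that I am trying to flatten into a DataFrame. For anyone familiar with it, the structure follows the OCDS (Open Contracting Data Standard) data model.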
data =\
{'publishedDate': '2021-07-12T02:00:36Z',
'releases': [{'awards': [{'id': '391950',
'suppliers': [{'id': 'FO-296275',
'name': 'Medic Supplies A'}],
'value': {'amount': 189175.0}}],
'bids': [{'id': 'FO-296275', 'value': 37835.0},
{'id': 'FO-296276', 'value': 36131.0}],
'buyer': {'id': 'OP-17211', 'name': 'Hospital ABC'},
'contracts': [{'awardID': '391950',
'dateSigned': '2013-03-01T00:00:00Z',
'period': {'endDate': '2018-02-28T00:00:00Z'},
'value': {'amount': 192870.88}}],
'id': '201210995433',
'initiationType': 'tender',
'parties': [{'address': {'countryName': 'CAN',
'streetAddress': '1 Medic Street'},
'id': 'OP-17211',
'name': 'Hospital ABC',
'roles': ['buyer']},
{'address': {'countryName': 'CAN',
'streetAddress': '123 Supplier avenue'},
'id': 'FO-296275',
'name': 'Medic Supplies A',
'roles': ['supplier']},
{'address': {'countryName': 'CAN',
'streetAddress': '2 Medic Street'},
'id': 'FO-296276',
'name': 'Johnson & Johnson',
'roles': ['tenderer']}],
'tag': ['contractTermination'],
'tender': {'additionalProcurementCategories': ['Approvisionnement '
'(biens)'],
'documents': [{'id': '201210995433',
'url': 'www.medical.com'}],
'id': '3109',
'items': [{'additionalClassifications': [{'description': 'G21 '
'- '
'Medical '
'equipment',
'scheme': 'CATEGORY'}],
'classification': {'description': 'Electrodes',
'scheme': 'UNSPSC'},
'id': 39121436}],
'mainProcurementCategory': 'goods',
'numberOfTenderers': 2,
'procuringEntity': {'id': 'OP-17211',
'name': 'Hospital ABC'},
'tenderPeriod': {'endDate': '2012-10-31T14:00:00Z',
'startDate': '2012-10-09T09:54:33Z'},
'tenderers': [{'id': 'FO-296275',
'name': 'Medic Supplies A'},
{'id': 'FO-296276',
'name': 'Johnson & Johnson'}]}},
{'awards': [{'id': '749668',
'suppliers': [{'id': 'FO-531761',
'name': 'Cleaning Company A'}],
'value': {'amount': 13555047.0,
'totalamount': 21688073.0}}],
'buyer': {'id': 'OP-1321', 'name': 'University A'},
'contracts': [{'awardID': '749668',
'dateSigned': '2015-02-05T00:00:00Z',
'implementation': {'transactions': [{'id': '35304',
'value': {'amount': 2658405.49,
'currency': 'CAD'}}]},
'period': {'endDate': '2021-04-30T00:00:00Z'},
'value': {'amount': 22192837.2,
'currency': 'CAD'}}],
'id': '2014924145419',
'initiationType': 'tender',
'parties': [{'address': {'countryName': 'CAN',
'streetAddress': 'Education Street'},
'details': {'Municipal': '0'},
'id': 'OP-1321',
'name': 'University A'},
{'address': {'countryName': 'CAN',
'streetAddress': '1 Cleaning street'},
'details': {'NEQ': '1142147900'},
'id': 'FO-531761',
'name': 'Cleaning Company A',
'roles': ['supplier']},
{'address': {'countryName': 'CAN',
'streetAddress': '2 Cleaning street'},
'details': {'NEQ': '1144841450'},
'id': 'FO-531762',
'name': 'Cleaning Company B',
'roles': ['tenderer']}],
'tender': {'additionalProcurementCategories': ['Technical '
'cleaning '
'services'],
'documents': [{'id': '2014924145419',
'url': 'www.cleaning.com'}],
'id': '7000310',
'items': [{'additionalClassifications': [{'description': 'S9 '
'- '
'Cleaning '
'services',
'id': 'S9'}],
'classification': {'description': 'Cleaning '
'services',
'id': '76110000'},
'id': 76110000}],
'numberOfTenderers': 2,
'procuringEntity': {'id': 'OP-1321',
'name': 'University A'},
'tenderPeriod': {'endDate': '2014-11-21T15:00:00Z',
'startDate': '2014-09-24T14:54:19Z'},
'tenderers': [{'id': 'FO-531761',
'name': 'Cleaning Company A'},
{'id': 'FO-531762',
'name': 'Cleaning Company B'}]}}],
'version': '1.1'}
I am using json_normalize. This is the code I have so far:
import json
import pandas as pd

with open('jsonFile', encoding="utf8") as f_in:
    data = json.load(f_in)

# One row per party; release-level fields are repeated on each row via `meta`.
releases = pd.json_normalize(
    data['releases'],
    record_path='parties',
    meta=[
        'ocid',
        'id',
        'date',
        'language',
        ['buyer', 'name'],
        ['buyer', 'id'],
        'tag',
        'initiationType',
        ['tender', 'id'],
        ['tender', 'title'],
        ['tender', 'deliveryarea'],
        ['tender', 'status'],
        ['tender', 'procuringEntity', 'name'],
        ['tender', 'procuringEntity', 'id'],
        ['tender', 'tenderPeriod', 'startDate'],
        ['tender', 'tenderPeriod', 'endDate'],
        ['tender', 'tenderPeriod', 'durationInDays'],
        ['tender', 'items'],
        'bids',
        'awards',
        'contracts',
    ],
    record_prefix='parties_',
    errors='ignore',  # fill NaN when a release lacks one of the meta keys
)
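What I am struggling with is getting at the data inside tender -> items, bids, awards and contracts. I have tried many combinations, but I always end up with one of the following errors: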
list indices must be integers or slices, not str
sequence item 1: expected str instance, list found
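As far as I can tell, the first of these errors shows up whenever a meta path steps through a list such as tender -> items: json_normalize follows each meta path key by key, so looking up a string key on the items list fails. A minimal sketch of that, using a made-up one-release sample rather than the real file:

```python
import pandas as pd

# Hypothetical one-release sample; only the shape matters here.
sample = [{
    "id": "r1",
    "parties": [{"name": "Hospital ABC"}],
    "tender": {"items": [{"id": 39121436}]},
}]

caught = None
try:
    # 'items' is a list, so the trailing 'id' key cannot be looked up on it.
    pd.json_normalize(
        sample,
        record_path="parties",
        meta=[["tender", "items", "id"]],
    )
except TypeError as exc:
    caught = exc

print(type(caught).__name__, caught)
```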
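With the current code, everything is flattened except ['tender','items'], 'bids', 'awards' and 'contracts', which still come out as lists of dicts (the four rightmost columns of the result).
I don't know how to get at the data inside these four lists of dicts. Any help is appreciated.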
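One possible way forward, sketched under the assumption that each of the four lists should become its own table keyed by the release id: walk the releases manually, so that a missing key (for example bids in the second release) yields no rows instead of an error, then let json_normalize flatten the remaining nested dicts. The helper name flatten_child, the release_id column and the trimmed sample below are illustrative and not part of the original code.

```python
import pandas as pd

# Trimmed stand-in for data['releases']; values are taken from the sample above.
releases = [
    {
        "id": "201210995433",
        "awards": [{"id": "391950", "value": {"amount": 189175.0}}],
        "bids": [
            {"id": "FO-296275", "value": 37835.0},
            {"id": "FO-296276", "value": 36131.0},
        ],
        "contracts": [{"awardID": "391950", "value": {"amount": 192870.88}}],
        "tender": {
            "id": "3109",
            "items": [{"id": 39121436,
                       "classification": {"description": "Electrodes"}}],
        },
    },
    {
        "id": "2014924145419",
        "awards": [{"id": "749668", "value": {"amount": 13555047.0}}],
        # No 'bids' key here, as in the second release of the sample.
        "contracts": [{"awardID": "749668", "value": {"amount": 22192837.2}}],
        "tender": {
            "id": "7000310",
            "items": [{"id": 76110000,
                       "classification": {"description": "Cleaning services"}}],
        },
    },
]


def flatten_child(releases, path, prefix):
    """Return one row per element of the list found at `path` in each release."""
    rows = []
    for release in releases:
        node = release
        for key in path:
            # A missing key yields an empty list instead of raising KeyError.
            node = node.get(key, []) if isinstance(node, dict) else []
        for element in node:
            row = {"release_id": release["id"]}
            row.update({f"{prefix}{k}": v for k, v in element.items()})
            rows.append(row)
    # Nested dicts such as value -> amount become columns like awards_value_amount.
    return pd.json_normalize(rows, sep="_")


items = flatten_child(releases, ["tender", "items"], "items_")
bids = flatten_child(releases, ["bids"], "bids_")
awards = flatten_child(releases, ["awards"], "awards_")
contracts = flatten_child(releases, ["contracts"], "contracts_")
```

Each resulting frame could then be merged back onto the parties table on the release id if a single wide table is needed. Lists nested one level deeper (such as additionalClassifications inside items) would remain as lists and need the same treatment.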