You can try this code:
```python
import re

import requests

city_list = ['Jerusalem', 'Tel-Aviv', 'New York', 'London', 'Madrid', 'Alliance',
             'Mocoa', 'March', 'San Miguel', 'Neiva', 'Naranjito', 'San Fernando',
             'Alliance', 'Progreso', 'NewYork', 'Toronto']
city_country_dict = {}
country_city_dict = {}
for city in city_list:
    # Search GeoNames for the city; params= URL-encodes names such as "New York"
    response = requests.get("https://www.geonames.org/search.html",
                            params={"q": city, "country": ""}, timeout=30)
    # The first link of the form /countries/XX/<country-name>.html gives the country.
    # A capture group is used because str.strip(".html") removes any of the
    # characters . h t m l from both ends and would turn "israel" into "israe".
    match = re.search(r"/countries/\w+/([\w-]+)\.html", response.text)
    if match is None:
        continue  # no country link on the results page
    country = match.group(1)
    country_city_dict.setdefault(country, []).append(city)
    city_country_dict[city] = country
```

This code queries GeoNames with each city name and takes the first link on the results page that points to a country page. You could change that part and use BeautifulSoup to make it more elegant. If you run it on a large list, be aware that it takes a while, because it waits for a response from GeoNames for every city!

Example output:

```python
city_country_dict = {'Jerusalem': 'israel', 'Tel-Aviv': 'israel', 'New York': 'united-states', 'London': 'united-kingdom', 'Madrid': 'spain', 'Alliance': 'united-states', 'Mocoa': 'colombia', 'March': 'switzerland', 'San Miguel': 'el-salvador', 'Neiva': 'colombia', 'Naranjito': 'puerto-rico', 'San Fernando': 'trinidad-and-tobago', 'Progreso': 'honduras', 'NewYork': 'united-kingdom', 'Toronto': 'canada'}
country_city_dict = {'israel': ['Jerusalem', 'Tel-Aviv'], 'united-states': ['New York', 'Alliance', 'Alliance'], 'united-kingdom': ['London', 'NewYork'], 'spain': ['Madrid'], 'colombia': ['Mocoa', 'Neiva'], 'switzerland': ['March'], 'el-salvador': ['San Miguel'], 'puerto-rico': ['Naranjito'], 'trinidad-and-tobago': ['San Fernando'], 'honduras': ['Progreso'], 'canada': ['Toronto']}
```
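If you want to replace the regex with a real HTML parser and avoid re-querying the same city (note that 'Alliance' appears twice in the list), here is a rough sketch. It is only an illustration under some assumptions: it uses the standard library's `html.parser` instead of BeautifulSoup so that it has no extra dependencies, it assumes the country links on the GeoNames results page look like `/countries/XX/<country-name>.html` (the same shape the regex above matches), and the names `CountryLinkParser`, `country_from_html`, `lookup_country` and `group_by_country` are hypothetical helpers, not part of any library.

```python
import functools
import urllib.parse
import urllib.request
from html.parser import HTMLParser


class CountryLinkParser(HTMLParser):
    """Remembers the first <a> whose href path looks like /countries/XX/<name>.html."""

    def __init__(self):
        super().__init__()
        self.country = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.country is not None:
            return
        href = dict(attrs).get("href") or ""
        # Works for both relative and absolute links
        parts = urllib.parse.urlsplit(href).path.strip("/").split("/")
        # Assumed link shape: countries/XX/<name>.html
        if len(parts) == 3 and parts[0] == "countries" and parts[2].endswith(".html"):
            self.country = parts[2][: -len(".html")]


def country_from_html(html):
    """Return the country slug from the first country link in the page, or None."""
    parser = CountryLinkParser()
    parser.feed(html)
    return parser.country


@functools.lru_cache(maxsize=None)
def lookup_country(city):
    # One request per distinct city name; repeated names are answered from the cache
    query = urllib.parse.urlencode({"q": city, "country": ""})
    url = "https://www.geonames.org/search.html?" + query
    with urllib.request.urlopen(url, timeout=30) as resp:
        return country_from_html(resp.read().decode("utf-8", errors="replace"))


def group_by_country(cities, lookup=lookup_country):
    """Build the city->country and country->[cities] dicts, skipping cities with no match."""
    city_country, country_cities = {}, {}
    for city in cities:
        country = lookup(city)
        if country is None:
            continue
        city_country[city] = country
        country_cities.setdefault(country, []).append(city)
    return city_country, country_cities
```

With `city_list` from above, `city_country_dict, country_city_dict = group_by_country(city_list)` should produce dicts of the same shape as the example output. The `lookup` parameter lets you pass a stub function for testing without hitting GeoNames.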