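Good afternoon everyone,
I want to filter out of a DataFrame the columns I am not interested in. Because the columns can change depending on user input (which I won't show here), I use the following code in my offshore_filter function: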
# Note: 'df' is my DataFrame, with different country codes as rows and years as columns' headers
import datetime as d
import pandas as pd
COUNTRIES = [
    'EU28', 'AL', 'AT', 'BE', 'BG', 'CY', 'CZ', 'DE', 'DK', 'EE', 'EL',
    'ES', 'FI', 'FR', 'GE', 'HR', 'HU', 'IE', 'IS', 'IT', 'LT', 'LU', 'LV',
    'MD', 'ME', 'MK', 'MT', 'NL', 'NO', 'PL', 'PT', 'RO', 'SE', 'SI', 'SK',
    'TR', 'UA', 'UK', 'XK'
]
YEARS = list(range(2005, int(d.datetime.now().year)))
def offshore_filter(df, countries=COUNTRIES, years=YEARS):
    # This function is specific for filtering out the countries
    # and the years not needed in the analysis
    # Filter out all of the countries not of interest
    df.drop(df[~df['country'].isin(countries)].index, inplace=True)
    # Filter out all of the years not of interest
    columns_to_keep = ['country', 'country_name'] + [i for i in years]
    temp = df.reindex(columns=columns_to_keep)
    df = temp # This step to avoid the copy vs view complication
    return df
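When I pass years as a list of integers, the code works well and filters the DataFrame down to only the columns in the years list.
However, if the DataFrame's column headers are strings (e.g. '2018' instead of 2018), changing [i for i in years] to [str(i) for i in years] doesn't work, and I end up with columns full of NaN (as the reindex documentation describes).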
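For reference, here is a minimal sketch of the reindex behaviour in question, using a made-up two-row frame (the values and the toy `df` are assumptions for illustration, not my real data). It shows that reindex matches labels exactly, type included: a requested label with no exact match becomes a new all-NaN column.

```python
import pandas as pd

# Toy frame whose year headers are strings (illustrative values only).
df = pd.DataFrame({
    'country': ['AT', 'BE'],
    'country_name': ['Austria', 'Belgium'],
    '2017': [1.0, 2.0],
    '2018': [3.0, 4.0],
})

years = [2017, 2018]

# Integer labels do not match the string headers, so reindex creates
# new all-NaN columns instead of selecting the existing ones.
by_int = df.reindex(columns=['country', 'country_name'] + [i for i in years])

# String labels match the headers exactly, so the existing data is kept.
by_str = df.reindex(columns=['country', 'country_name'] + [str(i) for i in years])

print(by_int[2018].isna().all())
print(by_str['2018'].tolist())
```

In this toy case the str(i) variant does keep the data, so whatever goes wrong with my real DataFrame presumably comes from headers that are not exactly the plain strings I expect.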
Can you help me figure out why?
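One way to narrow this down, sketched under the assumption that the real headers might be a mix of integers and strings or might carry stray whitespace (I have not confirmed either): print each column label with its type, then select the year columns by comparing their stripped string form. The helper name `select_years` and the toy frame are hypothetical.

```python
import pandas as pd

def select_years(df, years, id_columns=('country', 'country_name')):
    """Hypothetical helper: keep the id columns plus any column whose
    label, converted to a stripped string, is one of the wanted years."""
    wanted = {str(y) for y in years}
    keep = [c for c in df.columns
            if c in id_columns or str(c).strip() in wanted]
    # Returns a new frame; the caller's DataFrame is left untouched.
    return df.loc[:, keep]

# Toy frame with deliberately inconsistent headers (assumed, not real data).
df = pd.DataFrame({
    'country': ['AT'],
    'country_name': ['Austria'],
    2017: [1.0],        # integer label
    '2018 ': [2.0],     # string label with a trailing space
    '2019': [3.0],      # clean string label
})

# Inspect what the labels really are before filtering.
print([(repr(c), type(c).__name__) for c in df.columns])

result = select_years(df, [2017, 2018])
print(list(result.columns))
```

Unlike reindex, this approach only ever selects columns that already exist, so a label mismatch shows up as a missing column rather than as a silent column of NaN.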