I have a DataFrame with a Reviews column made up of multi-dimensional arrays (nested lists), and I want to extract the first element, as shown below.

Suppose df['Reviews'] contains the following rows:

[Screenshot: review data]

I want the output in a separate column, as shown below:

[Screenshot: output]

Here is sample data for the column (3 values):

df['Reviews'] = [[['Just like home', 'A Warm Welcome to Wintry Amsterdam'], ['01/03/2018', '01/01/2018']], [['Great food and staff', 'just perfect'], ['01/06/2018', '01/04/2018']], [['Satisfaction', 'Delicious old school restaurant'], ['01/04/2018', '01/04/2018']]]

Please help.

4 Answers


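If you need the first list, use indexing with str[0]. If the cells are strings (for example, read from a CSV), convert them to lists with ast.literal_eval first: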

import ast

df['Reviews'] = df['Reviews'].apply(ast.literal_eval).str[0]
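A minimal, self-contained sketch of the str[0] approach, assuming the Reviews cells are strings (as they would be after pd.read_csv). The two sample rows are taken from the question; the variable name first is just for illustration.

```python
import ast

import pandas as pd

# Sample rows from the question, stored as strings (as they would be after pd.read_csv)
df = pd.DataFrame({'Reviews': [
    "[['Just like home', 'A Warm Welcome to Wintry Amsterdam'], ['01/03/2018', '01/01/2018']]",
    "[['Great food and staff', 'just perfect'], ['01/06/2018', '01/04/2018']]",
]})

# ast.literal_eval parses each string into a real nested list;
# .str[0] then picks the first inner list (the review texts) from each row
first = df['Reviews'].apply(ast.literal_eval).str[0]
print(first.tolist())
# [['Just like home', 'A Warm Welcome to Wintry Amsterdam'], ['Great food and staff', 'just perfect']]
```

Assigning the result back to df['Reviews'], as the answer does, overwrites the original column; use a new column name if you still need the dates.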

If you need the list joined into a single string separated by ',', add Series.str.join:

import ast

df['Reviews'] = df['Reviews'].apply(ast.literal_eval).str[0].str.join(',')
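A short sketch of the join step on two of the question's rows, again assuming the cells arrive as strings. The separator is the one from the answer; pass ', ' instead if you want a space after each comma.

```python
import ast

import pandas as pd

# Sample rows from the question, stored as strings
df = pd.DataFrame({'Reviews': [
    "[['Great food and staff', 'just perfect'], ['01/06/2018', '01/04/2018']]",
    "[['Satisfaction', 'Delicious old school restaurant'], ['01/04/2018', '01/04/2018']]",
]})

# Series.str.join concatenates the strings inside each list using the given separator
joined = df['Reviews'].apply(ast.literal_eval).str[0].str.join(',')
print(joined.tolist())
# ['Great food and staff,just perfect', 'Satisfaction,Delicious old school restaurant']
```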
answered 2020-06-01T08:20:28.910

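Add one of the following to your DataFrame code. It creates a new column named output holding the first element of each Reviews entry (this assumes the cells are real Python lists, not strings).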

Using apply:

df['output'] = df.Reviews.apply(lambda x: x[0])

Using map:

df.loc[:, 'output'] = df.Reviews.map(lambda x: x[0])
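A self-contained sketch, using two rows of the question's sample data, that checks apply and map give the same result on a column of real Python lists. The last lines hypothetically turn the cells into strings (as after reading a CSV) to show why x[0] alone is not enough in that case; the names via_map, as_text and first_chars are illustrative only.

```python
import pandas as pd

# Sample rows from the question, stored as real nested lists
df = pd.DataFrame({'Reviews': [
    [['Just like home', 'A Warm Welcome to Wintry Amsterdam'], ['01/03/2018', '01/01/2018']],
    [['Great food and staff', 'just perfect'], ['01/06/2018', '01/04/2018']],
]})

# Both Series.apply and Series.map call the lambda once per cell
df['output'] = df.Reviews.apply(lambda x: x[0])
via_map = df.Reviews.map(lambda x: x[0])
print(df['output'].tolist() == via_map.tolist())  # True

# Pitfall: if the cells are strings (e.g. loaded from CSV), x[0] is only the first character
as_text = df.Reviews.map(str)
first_chars = as_text.apply(lambda x: x[0])
print(first_chars.tolist())  # ['[', '[']
```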
answered 2020-06-01T07:43:54.037

If you get an error, you probably have some empty values in Reviews. If those rows are of no use to you, you can drop them: df.dropna(subset=['Reviews'], inplace=True)
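A minimal sketch of the dropna route. The missing second row is hypothetical (pandas treats None as missing); without the dropna call, x[0] on that cell would raise a TypeError because None is not subscriptable.

```python
import pandas as pd

# One row from the question plus a hypothetical row with no reviews
df = pd.DataFrame({'Reviews': [
    [['Just like home', 'A Warm Welcome to Wintry Amsterdam'], ['01/03/2018', '01/01/2018']],
    None,
]})

# subset takes a list of column labels; only rows with a missing Reviews value are dropped
df.dropna(subset=['Reviews'], inplace=True)

# Every remaining cell is a list, so indexing is safe
df['Review'] = df['Reviews'].apply(lambda x: x[0])
print(len(df), df['Review'].iloc[0])
```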

Or add a check on the data type:

import pandas as pd

a = [[['Just like home', 'A Warm Welcome to Wintry Amsterdam'], ['01/03/2018', '01/01/2018']], [['Great food and staff', 'just perfect'], ['01/06/2018', '01/04/2018']], [['Satisfaction', 'Delicious old school restaurant'], ['01/04/2018', '01/04/2018']]]

df = pd.DataFrame(columns=['Reviews', 'Review'])
df['Reviews'] = a
df
Reviews Review
0   [[Just like home, A Warm Welcome to Wintry Ams...   NaN
1   [[Great food and staff, just perfect], [01/06/...   NaN
2   [[Satisfaction, Delicious old school restauran...   NaN

def get_review(reviews):
    # Only real lists are indexed; NaN or other values yield None
    if isinstance(reviews, list):
        return reviews[0]
    else:
        return None

df['Review'] = df['Reviews'].apply(get_review)
df
    Reviews Review
0   [[Just like home, A Warm Welcome to Wintry Ams...   [Just like home, A Warm Welcome to Wintry Amst...
1   [[Great food and staff, just perfect], [01/06/...   [Great food and staff, just perfect]
2   [[Satisfaction, Delicious old school restauran...   [Satisfaction, Delicious old school restaurant]
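The type check only matters when some cells are not lists. A minimal sketch, assuming a missing cell shows up as NaN (which is a float), with isinstance used in place of type(...) == list; the Series s and its NaN row are hypothetical.

```python
import numpy as np
import pandas as pd

def get_review(reviews):
    # NaN is a float, so it fails the list check instead of raising on reviews[0]
    if isinstance(reviews, list):
        return reviews[0]
    return None

# One row from the question plus a hypothetical missing value
s = pd.Series([
    [['Great food and staff', 'just perfect'], ['01/06/2018', '01/04/2018']],
    np.nan,
])
review = s.apply(get_review)
print(review.iloc[0])  # ['Great food and staff', 'just perfect']
```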

If you don't want the Review column to hold lists, just convert each list to a string with a separator:

def get_review(reviews):
    # Join the review texts of real lists; anything else becomes an empty string
    if isinstance(reviews, list):
        return ', '.join(reviews[0])
    else:
        return ''

df['Review'] = df['Reviews'].apply(get_review)
df
    Reviews Review
0   [[Just like home, A Warm Welcome to Wintry Ams...   Just like home, A Warm Welcome to Wintry Amste...
1   [[Great food and staff, just perfect], [01/06/...   Great food and staff, just perfect
2   [[Satisfaction, Delicious old school restauran...   Satisfaction, Delicious old school restaurant
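A vectorized alternative to the get_review function, sketched under the assumption that missing cells are NaN: the .str accessor skips NaN instead of raising, and fillna('') reproduces the empty string that get_review returns. The Series s and its NaN row are hypothetical.

```python
import numpy as np
import pandas as pd

# One row from the question plus a hypothetical missing value
s = pd.Series([
    [['Just like home', 'A Warm Welcome to Wintry Amsterdam'], ['01/03/2018', '01/01/2018']],
    np.nan,
])

# .str[0] and .str.join propagate NaN; fillna('') then turns it into an empty string
review = s.str[0].str.join(', ').fillna('')
print(review.tolist())
# ['Just like home, A Warm Welcome to Wintry Amsterdam', '']
```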

If your input data is not of list type (for example, you read it from a CSV), you need to convert it to a list first:

import ast

import pandas as pd

def get_review(reviews):
    if pd.notna(reviews) and reviews != '': 
        r_list = ast.literal_eval(reviews)[0]
        if len(r_list) > 0:
            return ', '.join(r_list)
        else:
            return ''
    else:
        return ''

# df2 is the DataFrame read from CSV, so each Reviews cell is a string
df2['Review'] = df2['Reviews'].apply(get_review)
df2

Reviews Review
0   [['Just like home', 'A Warm Welcome to Wintry ...   Just like home, A Warm Welcome to Wintry Amste...
1   [['Great food and staff', 'just perfect'], ['0...   Great food and staff, just perfect
2   [['Satisfaction', 'Delicious old school restau...   Satisfaction, Delicious old school restaurant
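To see why the CSV case needs ast.literal_eval, here is a sketch that round-trips one of the question's rows through an in-memory CSV file (io.StringIO stands in for a real file, an assumption for illustration). After reading it back, the cell is the str() form of the list, so it has to be parsed before indexing.

```python
import ast
import io

import pandas as pd

# One row from the question, stored as a real nested list
df = pd.DataFrame({'Reviews': [
    [['Just like home', 'A Warm Welcome to Wintry Amsterdam'], ['01/03/2018', '01/01/2018']],
]})

# Write to and read back from an in-memory CSV: the list is saved as its text representation
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
df2 = pd.read_csv(buf)
print(type(df2.loc[0, 'Reviews']))  # <class 'str'>

# Parse the text back into a list, take the first inner list and join its strings
df2['Review'] = df2['Reviews'].apply(lambda s: ', '.join(ast.literal_eval(s)[0]))
print(df2.loc[0, 'Review'])  # Just like home, A Warm Welcome to Wintry Amsterdam
```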
answered 2020-06-01T08:30:35.397

I think this should help. It worked for me.

df['Reviews']=df['Reviews'].apply(lambda c: str(c[0]).strip('[]'))

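It works fine if you run it once. If you run it again on the same column, it cuts the text down further, because c is now a string and c[0] is just its first character. So I suggest commenting it out after use, or writing the result to a new column instead.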
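A sketch contrasting the two options, using one row of the question's data. first_review_text is a hypothetical helper wrapping the answer's lambda. Writing to a new column can be rerun safely, while applying the function a second time to its own output returns only a quote character rather than the review text.

```python
import pandas as pd

# One row from the question, stored as a real nested list
df = pd.DataFrame({'Reviews': [
    [['Great food and staff', 'just perfect'], ['01/06/2018', '01/04/2018']],
]})

def first_review_text(c):
    # Hypothetical helper: str() of the first element, with brackets stripped from both ends
    return str(c[0]).strip('[]')

# Writing to a new column leaves Reviews intact, so rerunning gives the same result
for _ in range(2):
    df['Review'] = df['Reviews'].apply(first_review_text)
print(df.loc[0, 'Review'])  # 'Great food and staff', 'just perfect'

# Overwriting and running twice: the second pass sees a string, so c[0] is a single quote
once = df['Reviews'].apply(first_review_text)
twice = once.apply(first_review_text)
print(twice.tolist())  # ["'"]
```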

PS: You should include code rather than screenshots, so that others can test it first.

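Edit: [screenshot] Looks fine to me. Try again, and remember that if you run it twice (when you are not writing to a separate column), it will no longer return the review text.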

answered 2020-06-01T08:04:30.793