0

我有一个数据框

   Unnamed: 0                       game score home_odds draw_odds away_odds country                 league             datetime
0           0  Sport Recife - Imperatriz   2:2      1.36      4.31      7.66  Brazil  Copa do Nordeste 2020  2020-02-07 00:00:00
1           1           ABC - America RN   2:1      2.62      3.30      2.48  Brazil  Copa do Nordeste 2020  2020-02-02 22:00:00
2           2  Frei Paulistano - Nautico   0:2      5.19      3.58      1.62  Brazil  Copa do Nordeste 2020  2020-02-02 00:00:00
3           3    Botafogo PB - Confianca   1:1      2.06      3.16       3.5  Brazil  Copa do Nordeste 2020  2020-02-02 22:00:00
4           4          Fortaleza - Ceara   1:1      2.19      2.98      3.38  Brazil  Copa do Nordeste 2020  2020-02-02 22:00:00

我正在执行以下功能

df['game'] = df['game'].astype(str).str.replace('(\(\w+\))', '', regex=True)
df['league'] = df['league'].astype(str).str.replace('(\s\d+\S\d+)$', '', regex=True)
df['game'] = df['game'].astype(str).str.replace('(\s\d+\S\d+)$', '', regex=True)
df[['home_team', 'away_team']] = df['game'].str.split(' - ', expand=True, n=1)
df[['home_score', 'away_score']] = df['score'].str.split(':', expand=True)
df['away_score'] = df['away_score'].astype(str).str.replace('[a-zA-Z\s\D]', '', regex=True)
df['home_score'] = df['home_score'].astype(str).str.replace('[a-zA-Z\s\D]', '', regex=True)
df = df[df.home_score != "."]
df = df[df.home_score != ".."]
df = df[df.home_score != "."]
df = df[df.home_odds != "-"]
df = df[df.draw_odds != "-"]
df = df[df.away_odds != "-"]
m = df[['home_odds', 'draw_odds', 'away_odds']].astype(str).agg(lambda x: x.str.count('/'), 1).ne(0).all(1)
n = df[['home_score']].agg(lambda x: x.str.count('-'), 1).ne(0).all(1)
o = df[['away_score']].agg(lambda x: x.str.count('-'), 1).ne(0).all(1)
df = df[~m]
df = df[~n]
df = df[~o]
df = df[df.home_score != '']
df = df[df.away_score != '']
df = df.dropna()

但是,当我这样做时,我会收到警告:

UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  df = df[~n]
UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  df = df[~o]

我该如何解决这个问题?

Lorem ipsum dolor sit amet,consectetur adipiscing elit,sed do eiusmod tempor incididunt ut labore et dolore magna aliqua。Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat。

4

1 回答 1

1

我认为您可以尝试更改:

n = df[['home_score']].agg(lambda x: x.str.count('-'), 1).ne(0).all(1)
o = df[['away_score']].agg(lambda x: x.str.count('-'), 1).ne(0).all(1)

至:

n = df['home_score'].str.count('-').ne(0)
o = df['away_score'].str.count('-').ne(0)

它应该是一样的:

n = ~df['home_score'].str.contains('-')
o = ~df['away_score'].str.contains('-')

也应该改变:

df = df[df.home_score != "."]
df = df[df.home_score != ".."]
df = df[df.home_score != "."]
df = df[df.home_odds != "-"]
df = df[df.draw_odds != "-"]
df = df[df.away_odds != "-"]

至:

df = df[~df.home_score.isin([".",".."]) | 
         df[['home_odds','draw_odds','away_odds']].ne("-").any(axis=1)]
于 2021-12-20T05:18:02.930 回答