
For example:

df.select('category').show()

+---------------------------+
|                   category|
+---------------------------+
|            money,insurance|
|            life, housework|
|           game,FPS,network|
|            game,fight,jump|
|                      hotel|
|                 trip,hotel|
|                       null|

I want to use RLIKE with a regular expression that fuzzy-matches any one of a list of substrings, ['money', 'life'].

-- This is an exact match
SELECT * 
FROM tb_name
WHERE col_name RLIKE '(money|life)'

-- This is a fuzzy match
SELECT * 
FROM tb_name
WHERE col_name RLIKE '*.(money|life)'

But the fuzzy-match snippet fails with an AST tree error.

06-11 16:59:17-fatal filter ast tree

(TOK_QUERY (TOK_FROM (TOK_TABREF (TOK_TAB tb_name))) (TOK_INSERT (TOK_DESTINATION (TOK_DIR "hdfs://XXXX/XX")) (TOK_SELECT (TOK_SELEXPR TOK_ALLCOLREF)) (TOK_WHERE (RLIKE (TOK_TABLE_OR_COL col_name) '*.(money|life)')) (TOK_LIMIT 2000)))

06-11 16:59:17-fatal filter function: .TOK_TAB \S tdw_inter_db.*|.TOK_(CUBE|ROLLUP) .

I can't see what is wrong with the fuzzy-match snippet.
Can anyone help?
Thanks in advance.


1 Answer


The regexp '(?i)money|life' will match any string that contains money or life. The (?i) prefix makes the match case-insensitive.
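To see why no wildcard prefix is needed, here is a minimal sketch that approximates RLIKE with Python's re.search. It assumes RLIKE succeeds when the pattern is found anywhere in the value. Python's re module is not the Java regex engine that Hive and Spark use, so treat this as an illustration only. The rlike helper is hypothetical, and the sample values are copied from the category column in the question. The sketch also shows that a pattern starting with `*` is rejected, because the quantifier has nothing to repeat. That is a likely but unconfirmed cause of the failure in the question; `.*` is the valid spelling, though it is redundant here.

```python
import re

# Sample values copied from the `category` column shown in the question.
categories = [
    "money,insurance",
    "life, housework",
    "game,FPS,network",
    "game,fight,jump",
    "hotel",
    "trip,hotel",
    None,
]


def rlike(value, pattern):
    """Hypothetical stand-in for SQL RLIKE (an assumption, not Hive/Spark code).

    Returns True when the pattern is found anywhere in the value.
    A NULL value is treated as "no match", as a WHERE clause would.
    """
    if value is None:
        return False
    return re.search(pattern, value) is not None


# The pattern from the answer: case-insensitive, matches either substring.
pattern = r"(?i)money|life"
matched = [c for c in categories if rlike(c, pattern)]
print(matched)

# A leading `*` has nothing to repeat, so the regex engine rejects it.
try:
    re.compile(r"*.(money|life)")
    leading_star_ok = True
except re.error:
    leading_star_ok = False
print(leading_star_ok)

# `.*` is valid, but it selects the same rows as the bare alternation.
with_prefix = [c for c in categories if rlike(c, r".*(money|life)")]
print(with_prefix == matched)
```

In SQL the same idea would be WHERE col_name RLIKE '(?i)money|life', with no leading `*.` or `.*`.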

answered 2020-06-11T12:32:43.220