php - Extract any Unicode words occurring in a string using preg_match

Question

I have this kind of string:

sample İletişim form:: aşağıdaki formu

My goal is to use PHP's preg_match or preg_match_all to extract the words that contain Unicode/non-ASCII characters.

So I only expect 2 results: the words İletişim and aşağıdaki.

Array
(
    [0] => İletişim 
    [1] => aşağıdaki
)

I just can't come up with the regex because I'm not good at it. Any help is welcome.

Thanks a lot.

score 1 · Accepted Answer

You can use Unicode properties:

$string = 'sample İletişim form:: aşağıdaki formu';
preg_match_all("/(\pL+)/u", $string, $matches); 
print_r($matches);

Output:

Array
(
    [0] => Array
        (
            [0] => sample
            [1] => İletişim
            [2] => form
            [3] => aşağıdaki
            [4] => formu
        )

    [1] => Array
        (
            [0] => sample
            [1] => İletişim
            [2] => form
            [3] => aşağıdaki
            [4] => formu
        )

)
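The `\pL+` pattern matches every run of letters, so ASCII-only words such as `sample` and `form` are returned too. A minimal sketch that keeps the Unicode-property approach but only returns letter runs containing at least one non-ASCII letter, assuming the input is valid UTF-8:

```php
<?php
$string = 'sample İletişim form:: aşağıdaki formu';

// (?![\x00-\x7F])\pL matches one letter whose code point is above U+007F;
// the surrounding \pL* extend the match to the whole run of letters around it.
// The /u flag makes PCRE match UTF-8 code points instead of single bytes.
preg_match_all('/\pL*(?![\x00-\x7F])\pL\pL*/u', $string, $matches);

print_r($matches[0]);
```

There is no capture group here, so the words are only in `$matches[0]`. Note that `\pL` stops at digits, hyphens and apostrophes, so a token like `İletişim-2` would yield just `İletişim`.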

score 1 · Accepted Answer

I think the starting point for the solution you want is here: "How to detect non-ASCII characters in a string?"

Using preg_match_all(), you can do it like this:

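$string = 'sample İletişim form:: aşağıdaki formu';
// A run of non-whitespace containing at least one byte outside 0x20-0x7F
preg_match_all('/[^\s]*[^\x20-\x7f]+[^\s]*/', $string, $matches);
print_r($matches);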
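The pattern above has no `u` flag, so it works on bytes. That happens to work for UTF-8, because every byte of a multibyte character is 0x80 or higher, but `[^\x20-\x7f]` also matches the ASCII control characters 0x00-0x1F, including tab and newline. A sketch of a code-point-based variant, assuming UTF-8 input (the tab-separated sample string is made up for illustration):

```php
<?php
$string = "sample İletişim form:: aşağıdaki formu\tplain\ttext";

// \S* [^\x00-\x7F] \S* : a whitespace-delimited token with at least one non-ASCII code point.
// Excluding \x00-\x1F as well means tabs and newlines never count as "non-ASCII".
// With /u, the subject is validated as UTF-8 and matched per code point.
preg_match_all('/\S*[^\x00-\x7F]\S*/u', $string, $matches);
print_r($matches[0]);

// For comparison, the byte-oriented pattern also picks up "formu\tplain" and "\ttext",
// because the tab byte 0x09 lies outside the \x20-\x7f range.
preg_match_all('/[^\s]*[^\x20-\x7f]+[^\s]*/', $string, $bytewise);
print_r($bytewise[0]);
```

With `/u`, `preg_match_all()` returns `false` if the subject is not valid UTF-8, rather than matching partial byte sequences.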

Alternatively, without preg_match, you can use the mb_detect_encoding() function to test a string's encoding. In your case, you could use it like this:

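$matches = array_values(array_filter(explode(' ', $string), function ($item) {
    // In strict mode (TRUE) this returns 'ASCII' for pure-ASCII words and false otherwise
    return !mb_detect_encoding($item, 'ASCII', TRUE);
}));
// array_values() reindexes the result so the keys are 0 and 1, as in the expected output
print_r($matches);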
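`mb_detect_encoding()` is designed to pick an encoding from a list of candidates; `mb_check_encoding()`, from the same mbstring extension, directly checks whether a string is valid in one given encoding, which is the yes/no question asked here. A sketch of that swap, assuming PHP 7.4+ for the arrow function:

```php
<?php
$string = 'sample İletişim form:: aşağıdaki formu';

// mb_check_encoding($item, 'ASCII') is true only when every byte is 0x00-0x7F,
// so negating it keeps the words that contain at least one non-ASCII byte.
// Requires the mbstring extension, as does mb_detect_encoding().
$matches = array_values(array_filter(
    explode(' ', $string),
    fn (string $item): bool => !mb_check_encoding($item, 'ASCII')
));

print_r($matches);
```

Like the original, this splits only on single spaces; empty strings produced by repeated spaces are valid ASCII and are filtered out anyway.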

The last one is a bit hacky, though ^^
