4

Consider the following snippet, which uses strtok to split the string "Madddy".

char* str = (char*) malloc(sizeof("Madddy"));
strcpy(str,"Madddy");

char* tmp = strtok(str,"d");
std::cout<<tmp;

do
{
    std::cout<<tmp;
    tmp=strtok(NULL, "dddy");
}while(tmp!=NULL);

It works fine, and the output is Ma. But after changing the strtok call to the following,

tmp=strtok(NULL, "ay");

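the output becomes Madd. So how exactly does strtok work? I ask because I expected strtok to treat every character in the delimiter string as a delimiter. In some cases it does that, but in a few cases it gives unexpected results. Can anyone help me understand this?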

4

6 Answers

10

"Trying to understand strtok": good luck!

Anyway, it's 2011. Tokenize properly:

std::string str("abc:def");
char split_char = ':';
std::istringstream split(str);
std::vector<std::string> token;

for (std::string each; std::getline(split, each, split_char); token.push_back(each));

:D
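
For anyone who wants to drop that into a program, here is a minimal sketch of the same idea wrapped in a function. The name `split_on_char` is made up for illustration. Note that this splits on one character only and keeps empty fields, so it is not a drop-in replacement for strtok, which takes a set of delimiters and collapses runs of them.

```cpp
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: splits `input` on a single delimiter character.
// Unlike strtok, adjacent delimiters produce empty tokens.
std::vector<std::string> split_on_char(const std::string& input, char delim) {
    std::vector<std::string> tokens;
    std::istringstream stream(input);
    std::string piece;
    while (std::getline(stream, piece, delim)) {
        tokens.push_back(piece);
    }
    return tokens;
}
```

Calling `split_on_char("abc:def", ':')` should give the two tokens from the snippet above.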

answered 2011-01-14T02:34:05.940
3

Fred Flintstone probably used strtok(). It predates multithreaded environments and beats up (modifies) the source string.

When called with NULL for the first parameter, it continues parsing the last string. This feature was convenient, but a bit unusual even in its day.
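
A rough sketch of that calling pattern, assuming a helper I'm calling `collect_tokens` (not a library function): the buffer is passed once, and every later call passes a null pointer so strtok resumes from its remembered position.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: `text` is taken by value so strtok modifies a copy,
// not the caller's string.
std::vector<std::string> collect_tokens(std::string text, const char* delims) {
    std::vector<std::string> tokens;
    // First call hands over the buffer; later calls pass nullptr to continue.
    for (char* tok = std::strtok(&text[0], delims); tok != nullptr;
         tok = std::strtok(nullptr, delims)) {
        tokens.emplace_back(tok);
    }
    return tokens;
}
```

Because the resume position lives inside strtok rather than in your code, two interleaved tokenizing loops would step on each other.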

answered 2011-01-14T02:37:08.000
2

Actually your code is wrong, no wonder you get unexpected results:

char* str = (char*) malloc(sizeof("Madddy"));

should be

char* str = (char*) malloc(strlen("Madddy") + 1);
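
A small sketch to compare the two expressions. For a string literal, sizeof already counts the terminating '\0', so both should come out as 7 here. The strlen form is the one that stays correct once the argument is a `char*` instead of a literal, because sizeof on a pointer gives the pointer's size. The names `literal_size` and `bytes_needed` are only for illustration.

```cpp
#include <cstddef>
#include <cstring>

// sizeof on a string literal includes the terminating '\0': 6 characters + 1.
constexpr std::size_t literal_size = sizeof("Madddy");

// Hypothetical helper: bytes required to hold a copy of `s`, terminator included.
// This works for any C string, not only for literals.
std::size_t bytes_needed(const char* s) {
    return std::strlen(s) + 1;
}
```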
answered 2011-01-14T02:40:27.433
1

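It seems you forgot that your first call to strtok (outside the loop) used the delimiter "d".

strtok is working correctly. You should have a look at a reference here.

For the second example (strtok with "ay"):

First, strtok(str, "d") is called. It finds the first "d" and splits your string there by overwriting that "d" with a terminator. Specifically, it returns tmp = "Ma", and the part still to be scanned is "ddy".

Then strtok(NULL, "ay") is called. It scans the remainder for any character from "ay". The remainder is now just "ddy", so there is no "a" to match, but the "y" does match. So tmp = "dd" and nothing is left to scan.

As you can see, it prints "Madd".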
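
The walk-through above can be replayed with a short sketch. `replay_question` is a made-up name, and the only assumption is that the input is the "Madddy" string from the question.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper replaying the question's calls: the first call splits
// on "d", every later call uses `later_delims`.
std::vector<std::string> replay_question(const char* later_delims) {
    char buffer[] = "Madddy";
    std::vector<std::string> tokens;
    char* tok = std::strtok(buffer, "d");          // "Ma"; "ddy" is left to scan
    while (tok != nullptr) {
        tokens.emplace_back(tok);
        // The delimiter set may differ on each call; each character in it
        // counts as a delimiter, and leading delimiters are skipped.
        tok = std::strtok(nullptr, later_delims);
    }
    return tokens;
}
```

With "dddy" the remainder "ddy" consists only of delimiters, so the second call returns NULL and just "Ma" comes back. With "ay" the second call returns "dd", which gives "Madd" when printed on one line.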

answered 2011-01-14T03:02:47.017
0

I asked a question, inspired by another question, about functions in the C standard library that cause security problems or are considered bad practice.

To quote the answer given to me from there:

A common pitfall with the strtok() function is to assume that the parsed string is left unchanged, while it actually replaces the separator character with '\0'.
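
To make that concrete, here is a small sketch that tokenizes a local buffer and then returns its raw bytes so the overwritten separators can be inspected. The function name `buffer_after_strtok` and the "a,b,c" input are just for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper: runs strtok over "a,b,c" and returns the buffer's
// bytes afterwards, embedded '\0' characters included.
std::string buffer_after_strtok() {
    char buffer[] = "a,b,c";
    const std::size_t length = sizeof(buffer) - 1;   // 5, without the final '\0'
    for (char* tok = std::strtok(buffer, ","); tok != nullptr;
         tok = std::strtok(nullptr, ",")) {
        // Tokens are ignored; only the side effect on `buffer` matters here.
    }
    return std::string(buffer, length);
}
```

The returned string still has 5 bytes, but both commas have become '\0', so anything that treats the buffer as a C string now sees only "a".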

Also, strtok() is used by making subsequent calls to it until the entire string is tokenized. Some library implementations store strtok()'s internal status in a global variable, which may cause some nasty surprises if strtok() is called from multiple threads at the same time.

As you've tagged your question C++, use something else! If you want to use C, I'd suggest implementing your own tokenizer that works in a safe fashion.
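
As one possible shape for such a tokenizer in C++, here is a minimal sketch that follows strtok's splitting rule (any character in the delimiter set separates tokens, and runs of delimiters collapse) but leaves the input untouched and keeps no hidden state. The name `tokenize` is hypothetical.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits `input` on any character found in `delims`.
// The input is const, and all state lives in local variables.
std::vector<std::string> tokenize(const std::string& input,
                                  const std::string& delims) {
    std::vector<std::string> tokens;
    // Skip leading delimiters to find the start of the first token.
    std::string::size_type start = input.find_first_not_of(delims);
    while (start != std::string::npos) {
        // The token ends at the next delimiter, or at the end of the input.
        std::string::size_type end = input.find_first_of(delims, start);
        tokens.push_back(input.substr(start, end - start));
        // Skip the run of delimiters that follows the token.
        start = input.find_first_not_of(delims, end);
    }
    return tokens;
}
```

Since nothing is shared between calls, two threads (or two nested loops) can tokenize different strings at the same time.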

answered 2011-01-14T02:48:08.950
0

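Since you changed the tag to C rather than C++, I rewrote your function to use printf so that you can see what is happening. Hoang is right. You are seeing the correct output, but I think you are printing everything on the same line, so the output confused you. Take a look at Hoang's answer, since he correctly explains what is going on. Also, as others have pointed out, strtok clobbers the input string, so you have to be careful about that, and it is not thread-safe. But if you need a quick and dirty tokenizer, it works. I also changed the code to use strlen rather than sizeof, as Anders pointed out.

Here is your code modified to be more C-like: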

char* str = (char*) malloc(strlen("Madddy") + 1);
strcpy(str,"Madddy");

char* tmp = strtok(str,"d");
printf ("first token: %s\n", tmp);

do
{
    tmp=strtok(NULL, "ay");
    if (tmp != NULL ) {
       printf ("next token: %s\n", tmp);
    }
} while(tmp != NULL);
answered 2011-01-14T19:52:19.570