
I have a header file like this:

/*
 * APP 180-2 ALG-254/258/772 implementation
 * Last update: 03/01/2006
 * Issue date:  08/22/2004
 *
 * Copyright (C) 2006 Somebody's Name here
 * All rights reserved.
 *
 * Redistribution and use in source and binary forms, with or without
 * modification, are permitted provided that the following conditions
 * are met:
 * 1. Redistributions of source code must retain the above copyright
 *    notice, this list of conditions and the following disclaimer.
 * 2. Redistributions in binary form must reproduce the above copyright
 *    notice, this list of conditions and the following disclaimer in the
 *    documentation and/or other materials provided with the distribution.
 * 3. Neither the name of the project nor the names of its contributors
 *    may be used to endorse or promote products derived from this software
 *    without specific prior written permission.
 *
 * THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND
 * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 * ARE DISCLAIMED.  IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE
 * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 * SUCH DAMAGE.
 */

#ifndef HEADER_H
#define HEADER_H

/* More comments and C++ code here. */

#endif /* End of file. */

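I want to extract only the contents of the first C-style comment and remove the leading "*" from each line, to get a file containing the following: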

 APP 180-2 ALG-254/258/772 implementation
 Last update: 03/01/2006
 Issue date:  08/22/2004

 Copyright (C) 2006 Somebody's Name here
 All rights reserved.

 Redistribution and use in source and binary forms, with or without
 modification, are permitted provided that the following conditions
 are met:
 1. Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.
 2. Redistributions in binary form must reproduce the above copyright
    notice, this list of conditions and the following disclaimer in the
    documentation and/or other materials provided with the distribution.
 3. Neither the name of the project nor the names of its contributors
    may be used to endorse or promote products derived from this software
    without specific prior written permission.

 THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND
 ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
 IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
 ARE DISCLAIMED.  IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE
 FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
 DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS
 OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION)
 HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT
 LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY
 OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF
 SUCH DAMAGE.

Please suggest a simple way to do this using Python, Perl, sed, or some other Unix tool. A one-liner would be best.


3 Answers


This should work for you:

sed -n '/\*\//q; /^\/\*/d; s/^ \* \?//p' <file.h >comment.txt

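Here's an explanation: sed (as you probably know) is a command that runs through a file and applies a list of rules to each line. Each rule consists of a "selector" and a command; the command is applied to a line only if the selector matches that line.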

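The first rule has the selector /\*\//. This is a regular-expression selector; it matches any line containing the characters */. Both characters need a backslash escape: the * because it is a regex metacharacter, and the / because it would otherwise end the address. (I'm assuming that in your case this will only match the last line of the comment, and that the whole line should be discarded.) The command q means "quit": sed simply stops. Normally it would print the line first, but I've supplied the -n option, which means "don't print anything unless explicitly told to."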

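The second rule's selector, /^\/\*/, is again a regex selector; it matches the characters /* at the start of a line. Again, I'm assuming this line contains no part of the comment text you want. The d command tells sed to delete the line and move on to the next one.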

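The final rule has no selector, so it applies to every line (unless an earlier command prevented processing from reaching it). Its command is the substitution command, s/PATTERN/REPLACEMENT/, which looks for text in the line matching a pattern and replaces it with the replacement text. Here the pattern is ^ \* \?, which matches a space, an asterisk, and zero or one spaces, but only at the start of the line. (\? is a GNU sed extension; strictly POSIX sed would need \{0,1\} instead.) The replacement is empty, so sed simply removes the leading space-asterisk-(optional space) sequence. The p is a flag to the substitution command that tells sed to print the line if a substitution was made; it's needed because of the -n option.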
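To make the three rules concrete outside of sed, here is a rough line-by-line transliteration into Python. It is a sketch for illustration, not part of the answer's method; the function name extract_first_comment is hypothetical.

```python
import re

# Rule 3's pattern: a space, an asterisk, then an optional space, at line start.
LEADING_STAR = re.compile(r"^ \* ?")


def extract_first_comment(lines):
    """Line-by-line transliteration of the three sed rules (illustrative)."""
    out = []
    for line in lines:
        line = line.rstrip("\n")
        # Rule 1 (/\*\//q): stop at the first line containing "*/".
        if "*/" in line:
            break
        # Rule 2 (/^\/\*/d): discard a line that starts with "/*".
        if line.startswith("/*"):
            continue
        # Rule 3 (s/^ \* \?//p): strip the prefix; because of -n, the line
        # is kept only when the substitution actually happened.
        stripped, count = LEADING_STAR.subn("", line)
        if count:
            out.append(stripped)
    return out
```

For example, `with open("file.h") as f: print("\n".join(extract_first_comment(f)))` should print the same lines as the sed command for the header in the question. The rule order matters in both versions: the "*/" check runs first, so the closing line ends processing before the substitution can touch it.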

answered 2010-05-22T21:15:50.263

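Pyparsing includes built-in patterns for matching comment formats from various languages. Using cStyleComment and scanString to find the first comment in the source file leaves the rest as simple string operations: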

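from pyparsing import cStyleComment

c_source_file = "header.h"  # hypothetical path; use your own header
with open(c_source_file) as f:
    c_src = f.read()

# scanString yields (tokens, start, end) for each match; take only the first
cmt = next(cStyleComment.scanString(c_src))[0][0]
# drop the leading " * " (3 characters) from every comment line
lines = [l[3:] for l in cmt.splitlines()]
print('\n'.join(lines))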

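scanString is a generator that yields matches one at a time, so pulling just its first result means only the first comment is processed. With your sample header, this prints: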

APP 180-2 ALG-254/258/772 implementation 
Last update: 03/01/2006 
Issue date:  08/22/2004 

Copyright (C) 2006 Somebody's Name here 
All rights reserved. 

Redistribution and use in source and binary forms, with or without 
modification, are permitted provided that the following conditions 
are met: 
1. Redistributions of source code must retain the above copyright 
   notice, this list of conditions and the following disclaimer. 
2. Redistributions in binary form must reproduce the above copyright 
   notice, this list of conditions and the following disclaimer in the 
   documentation and/or other materials provided with the distribution. 
3. Neither the name of the project nor the names of its contributors 
   may be used to endorse or promote products derived from this software 
   without specific prior written permission. 

THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND 
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE 
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE 
ARE DISCLAIMED.  IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE 
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL 
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS 
OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) 
HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT 
LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY 
OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF 
SUCH DAMAGE. 
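If pyparsing isn't installed, roughly the same idea can be sketched with the standard re module. This is a swapped-in technique, not pyparsing's cStyleComment: the non-greedy pattern below finds the first /* ... */ span, but unlike a real parser it does not know about string literals or // comments that happen to contain /*. The helper name first_c_comment is hypothetical.

```python
import re

# Non-greedy match of the first /* ... */ block; DOTALL lets "." cross newlines.
C_COMMENT = re.compile(r"/\*.*?\*/", re.DOTALL)


def first_c_comment(src):
    """Return the first C-style comment with 3 leading chars cut per line."""
    m = C_COMMENT.search(src)  # search() stops at the first match
    if m is None:
        return None
    # Same trimming as the pyparsing snippet: drop the first 3 characters.
    return "\n".join(l[3:] for l in m.group(0).splitlines())
```

Calling first_c_comment on the whole header returns the text of the license comment only; later comments such as "/* More comments ... */" are never reached because search() returns the first match.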
answered 2010-05-23T01:58:56.297
sed -i -r "s/[\/\ ]{1}\*[\/\ ]?//g" YOURFILENAME

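This strips the comment markers (/*, */ and the leading *) from your file while keeping the comment text. Note, however, that it modifies YOURFILENAME in place; if you don't want that, remove -i from the command.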
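To see what that substitution does to the sample header, here is approximately the same regular expression run through Python's re.sub. This is an approximation, not the sed command itself: [/ ] stands in for sed's [\/\ ], the redundant {1} is dropped, and the sample text is a trimmed version of the question's header.

```python
import re

# Approximation of: sed -r "s/[\/\ ]{1}\*[\/\ ]?//g"
# One '/' or space, then '*', then an optional '/' or space, anywhere in a line.
PATTERN = re.compile(r"[/ ]\*[/ ]?")


def strip_markers(text):
    # Substituting over the whole text is equivalent to sed's per-line
    # processing here, because the pattern can never match a newline.
    return PATTERN.sub("", text)


sample = (
    "/*\n"
    " * APP 180-2 ALG-254/258/772 implementation\n"
    " */\n"
    "\n"
    "#ifndef HEADER_H\n"
    "/* More comments and C++ code here. */\n"
    "#endif /* End of file. */\n"
)
print(strip_markers(sample))
```

Because the command runs on every line, the output keeps "#ifndef HEADER_H", turns the later comment into "More comments and C++ code here." and the last line into "#endif End of file.", rather than extracting only the first comment.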

answered 2010-05-22T21:14:14.043