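I'm new to PLY and only a little past beginner level in Python. I'm trying to learn PLY using PLY-3.4 and Python 2.7. Please see the code below. I'm trying to create a token QTAG: a string made of zero or more spaces, followed by 'Q' or 'q', then '.', then a positive integer, then one or more whitespace characters. For example, VALID QTAGs are: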
"Q.11 "
" Q.12 "
"q.13 "
'''
Q.14
'''
INVALID ones are:
"asdf Q.15 "
"Q. 15 "
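To pin down the rule independently of PLY, here is a small sketch that uses only Python's `re` module and runs under both Python 2.7 and 3. It assumes (my reading of the examples, not something stated above) that a QTAG must begin a line; `QTAG_LINE` and `find_qtags` are hypothetical names for illustration only.

```python
import re

# Hypothetical stand-alone version of the QTAG rule described above:
# at the start of a line, optional spaces/tabs, then Q or q, a literal dot,
# one or more digits, and at least one whitespace character.
# re.MULTILINE lets ^ match at the string start and right after each newline.
QTAG_LINE = re.compile(r'^[ \t]*[Qq]\.([0-9]+)\s+', re.MULTILINE)


def find_qtags(text):
    """Return the integer part of every QTAG that begins a line in text."""
    return [int(m.group(1)) for m in QTAG_LINE.finditer(text)]


valid = ["Q.11 ", " Q.12 ", "q.13 ", "\nQ.14\n"]
invalid = ["asdf Q.15 ", "Q. 15 "]

# Each valid string yields one integer; each invalid one yields [].
for s in valid + invalid:
    print("%r -> %r" % (s, find_qtags(s)))
```

This only checks the pattern on whole strings; it says nothing yet about how PLY applies the same regex inside its lexer.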
Here is my code:
import ply.lex as lex

class LqbLexer:
    # List of token names. This is always required
    tokens = [
        'QTAG',
        'INT'
    ]

    # Regular expression rules for simple tokens
    def t_QTAG(self,t):
        r'^[ \t]*[Qq]\.[0-9]+\s+'
        t.value = int(t.value.strip()[2:])
        return t

    # A regular expression rule with some action code
    # Note addition of self parameter since we're in a class
    def t_INT(self,t):
        r'\d+'
        t.value = int(t.value)
        return t

    # Define a rule so we can track line numbers
    def t_newline(self,t):
        r'\n+'
        print "Newline found"
        t.lexer.lineno += len(t.value)

    # A string containing ignored characters (spaces and tabs)
    t_ignore = ' \t'

    # Error handling rule
    def t_error(self,t):
        print "Illegal character '%s'" % t.value[0]
        t.lexer.skip(1)

    # Build the lexer
    def build(self,**kwargs):
        self.lexer = lex.lex(debug=1,module=self, **kwargs)

    # Test its output
    def test(self,data):
        self.lexer.input(data)
        while True:
            tok = self.lexer.token()
            if not tok: break
            print tok

# test it
q = LqbLexer()
q.build()
#VALID inputs
q.test("Q.11 ")
q.test(" Q.12 ")
q.test("q.13 ")
q.test('''
Q.14
''')
# INVALID ones are
q.test("asdf Q.15 ")
q.test("Q. 15 ")
The output I get is as follows:
LexToken(QTAG,11,1,0)
Illegal character 'Q'
Illegal character '.'
LexToken(INT,12,1,4)
LexToken(QTAG,13,1,0)
Newline found
Illegal character 'Q'
Illegal character '.'
LexToken(INT,14,2,6)
Newline found
Illegal character 'a'
Illegal character 's'
Illegal character 'd'
Illegal character 'f'
Illegal character 'Q'
Illegal character '.'
LexToken(INT,15,3,7)
Illegal character 'Q'
Illegal character '.'
LexToken(INT,15,3,4)
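To reason about this output, here is a heavily simplified, hypothetical model of the lexing loop written with plain `re`, so it runs without PLY. It assumes, without my having checked `ply/lex.py` line by line, that the lexer first skips `t_ignore` characters one at a time and then tries the rule regexes in order at the current position with `match(data, pos)`. `RULES`, `IGNORE` and `model_tokens` are made-up names, and `NEWLINE` stands in for `t_newline`, which in the real code only prints and returns no token.

```python
import re

# Rules in the same order as the functions in LqbLexer. Trying them one by
# one at the same position picks the same winner as a single alternation.
RULES = [
    ('QTAG', re.compile(r'^[ \t]*[Qq]\.[0-9]+\s+')),
    ('INT', re.compile(r'\d+')),
    ('NEWLINE', re.compile(r'\n+')),  # placeholder for t_newline
]
IGNORE = ' \t'  # same characters as t_ignore


def model_tokens(data):
    """Return (type, text) pairs; ('ERROR', ch) marks a skipped character."""
    out = []
    pos = 0
    while pos < len(data):
        # Assumed behavior: ignored characters are dropped before any rule runs.
        if data[pos] in IGNORE:
            pos += 1
            continue
        for name, rx in RULES:
            # match() with a pos argument: ^ still refers to the real string start.
            m = rx.match(data, pos)
            if m:
                out.append((name, m.group()))
                pos = m.end()
                break
        else:
            # No rule matched: behave like t_error + skip(1).
            out.append(('ERROR', data[pos]))
            pos += 1
    return out
```

Under that assumption, `model_tokens(" Q.12 ")` gives `ERROR 'Q'`, `ERROR '.'`, `INT '12'`, the same pattern as the second block of output: the leading space is skipped before QTAG is tried, so matching starts at index 1, where `^` cannot match.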
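Note that only the first and third valid inputs are tokenized correctly. I can't figure out why my other valid inputs are not. In the docstring of t_QTAG:
- Replacing '^' with '\A' did not work.
- I tried removing '^'. Then all the valid inputs get tokenized, but the first invalid input ("asdf Q.15 ") also gets tokenized as a QTAG.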
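The two experiments in the list above can be checked with plain `re`, again assuming the lexer calls `match(data, pos)` at the current position. Index 1 of " Q.12 " and index 5 of "asdf Q.15 " are where I believe matching starts once the ignored or illegal characters before `Q` have been skipped; the pattern names and `matches_at` are hypothetical.

```python
import re

# The body of the t_QTAG regex, combined with different anchors.
BODY = r'[ \t]*[Qq]\.[0-9]+\s+'
WITH_CARET = re.compile('^' + BODY)             # original docstring
WITH_A = re.compile(r'\A' + BODY)               # first experiment
NO_ANCHOR = re.compile(BODY)                    # second experiment
CARET_MULTILINE = re.compile('^' + BODY, re.MULTILINE)


def matches_at(pattern, text, pos):
    """True if pattern matches text starting exactly at index pos."""
    return pattern.match(text, pos) is not None


print(matches_at(WITH_CARET, " Q.12 ", 1))       # anchored to real start
print(matches_at(WITH_A, " Q.12 ", 1))           # same as ^ here
print(matches_at(NO_ANCHOR, " Q.12 ", 1))        # valid input accepted
print(matches_at(NO_ANCHOR, "asdf Q.15 ", 5))    # but so is this one
print(matches_at(CARET_MULTILINE, "\nQ.14\n", 1))
```

If these results are right, `^` and `\A` behave the same in this situation: the Python `re` documentation says `^` matches at the real beginning of the string (and after newlines in MULTILINE mode), not at the `pos` where matching starts, and as far as I can tell `\A` is anchored the same way. Removing the anchor lets QTAG match after any skipped prefix, and MULTILINE only helps when the character before `Q` is a newline.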
Thanks in advance for any help!
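PS: I joined the Google group ply-hack and tried to post there, but I couldn't post, either directly in the forum or by email. I'm not sure whether the group is still active. Professor Beazley hasn't responded either. Any ideas?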