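I am parsing a large text file that lists 56,900 book titles together with their authors and etext numbers, and I am trying to extract the author from each entry. The file looks like this: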
TITLE and AUTHOR ETEXT NO.
Aspects of plant life; with special reference to the British flora, 56900
by Robert Lloyd Praeger
The Vicar of Morwenstow, by Sabine Baring-Gould 56899
[Subtitle: Being a Life of Robert Stephen Hawker, M.A.]
Raamatun tutkisteluja IV, mennessä Charles T. Russell 56898
[Subtitle: Harmagedonin taistelu]
[Language: Finnish]
Raamatun tutkisteluja III, mennessä Charles T. Russell 56897
[Subtitle: Tulkoon valtakuntasi]
[Language: Finnish]
Tom Thatcher's Fortune, by Horatio Alger, Jr. 56896
A Yankee Flier in the Far East, by Al Avery 56895
and George Rutherford Montgomery
[Illustrator: Paul Laune]
Nancy Brandon's Mystery, by Lillian Garis 56894
Nervous Ills, by Boris Sidis 56893
[Subtitle: Their Cause and Cure]
Pensées sans langage, par Francis Picabia 56892
[Language: French]
Helon's Pilgrimage to Jerusalem, Volume 2 of 2, by Frederick Strauss 56891
[Subtitle: A picture of Judaism, in the century
which preceded the advent of our Savior]
Fra Tommaso Campanella, Vol. 1, di Luigi Amabile 56890
[Subtitle: la sua congiura, i suoi processi e la sua pazzia]
[Language: Italian]
The Blue Star, by Fletcher Pratt 56889
Importanza e risultati degli incrociamenti in avicoltura, 56888
di Teodoro Pascal
[Language: Italian]
The Junior Classics, Volume 3: Tales from Greece and Rome, by Various 56887
~ ~ ~ ~ Posting Dates for the below eBooks: 1 Mar 2018 to 31 Mar 2018 ~ ~ ~ ~
TITLE and AUTHOR ETEXT NO.
The American Missionary, Volume 41, No. 1, January, 1887, by Various 56886
Morganin miljoonat, mennessä Sven Elvestad 56885
[Author a.k.a. Stein Riverton]
[Subtitle: Salapoliisiromaani]
[Language: Finnish]
"Trip to the Sunny South" in March, 1885, by L. S. D 56884
Balaam and His Master, by Joel Chandler Harris 56883
[Subtitle: and Other Sketches and Stories]
Susien saaliina, mennessä Jack London 56882
[Language: Finnish]
Forged Egyptian Antiquities, by T. G. Wakeling 56881
The Secret Doctrine, Vol. 3 of 4, by Helena Petrovna Blavatsky 56880
[Subtitle: Third Edition]
No Posting 56879
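The author's name usually follows "by". When a line contains no "by", the author's name follows a comma ",". However, if the line does contain "by", a "," may be part of the title.
So I look for "by" first and only then fall back to the comma.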
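As a rough sketch of that rule, assuming each entry has already been reduced to one complete string with the etext number removed: `split_title_author` is a hypothetical helper name, and using the last comma as the fallback separator is an assumption that will misfire on names such as "Horatio Alger, Jr." when no "by" is present.

```python
def split_title_author(entry):
    """Split one complete entry (etext number already removed).

    Returns (title, author); author is None when nothing matches.
    """
    # Prefer the last ", by " so that commas inside the title are ignored.
    title, sep, author = entry.rpartition(", by ")
    if sep:
        return title, author
    # Fallback (assumption): treat the last ", " as the separator.
    title, sep, author = entry.rpartition(", ")
    if sep:
        return title, author
    return entry, None


print(split_title_author("Tom Thatcher's Fortune, by Horatio Alger, Jr."))
print(split_title_author("No Posting"))
```

Using `str.rpartition` avoids the manual `find` plus slice arithmetic, which is easy to get off by one.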
This is what I tried:
def search_by_author():
    fhand = open('GUTINDEX.ALL')
    print("Search by Author:")
    for line in fhand:
        if not line.startswith(" [") and not line.startswith("TITLE"):
            if not line.startswith("~"):
                words = line.rstrip()
                words = line.lstrip()
                words = words[:-6]
                if ", by" in words:
                    words = words[words.find(', by'):]
                    words = words[5:]
                    print(words)
                else:
                    words = words[words.find(', '):]
                    words = words[5:]
                    if "," in words:
                        words = words[words.find(', '):]
                        if words.startswith(','):
                            words = words[words.find(','):]
                            print(words)
                        else:
                            print(words)
                    else:
                        print(words)
                    if " by" in words:
                        words = words[words.find('by')]
                        print(words)

search_by_author()
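However, it does not seem to find the author's name for entries like this one, where the author is on the following line: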
Aspects of plant life; with special reference to the British flora, 56900
by Robert Lloyd Praeger
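One possible way to handle such wrapped entries is to join continuation lines back onto the line that carries the etext number before looking for "by". The sketch below is only an illustration under stated assumptions: `iter_entries` and `ENTRY_START` are hypothetical names, a line is taken to start a new entry only if it ends with a number, and bracketed annotations such as `[Subtitle: ...]` are skipped even when they span several lines. A continuation line that happens to end in a number would be misread as a new entry.

```python
import re

# Assumption: an entry's first line ends with whitespace plus the etext number.
ENTRY_START = re.compile(r"^(?P<text>.*?)\s+(?P<num>\d+)\s*$")


def iter_entries(lines):
    """Yield (text, etext_no) pairs, joining wrapped title/author lines."""
    text, number = None, None
    in_bracket = False
    for raw in lines:
        line = raw.strip()
        # Skip blank lines, column headers and "~ ~ ~" date banners.
        if not line or line.startswith("TITLE") or line.startswith("~"):
            continue
        # Skip [Subtitle: ...] style annotations, which may wrap until "]".
        if in_bracket or line.startswith("["):
            in_bracket = not line.endswith("]")
            continue
        match = ENTRY_START.match(line)
        if match:
            if text is not None:
                yield text, number
            text, number = match.group("text"), int(match.group("num"))
        elif text is not None:
            # Continuation line: append it to the current entry.
            text += " " + line
    if text is not None:
        yield text, number


sample = """\
TITLE and AUTHOR ETEXT NO.
Aspects of plant life; with special reference to the British flora, 56900
 by Robert Lloyd Praeger
The Vicar of Morwenstow, by Sabine Baring-Gould 56899
 [Subtitle: Being a Life of Robert Stephen Hawker, M.A.]
Helon's Pilgrimage to Jerusalem, Volume 2 of 2, by Frederick Strauss 56891
 [Subtitle: A picture of Judaism, in the century
 which preceded the advent of our Savior]
A Yankee Flier in the Far East, by Al Avery 56895
 and George Rutherford Montgomery
 [Illustrator: Paul Laune]
""".splitlines()

for text, number in iter_entries(sample):
    # Everything after the last ", by " is treated as the author.
    title, sep, author = text.rpartition(", by ")
    print(number, author if sep else None)
```

Because the joined text for entry 56900 becomes "..., by Robert Lloyd Praeger", the same ", by " search that works for single-line entries then applies to wrapped ones as well.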