python - How to read a file and separate its contents in Python 3.6

Question

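I'm trying to read a file in Python 3.6 and store its contents in two separate variables: the first should hold the exemplars from the "#List of exemplars" section, the other the samples from the "#List of samples" section. However, I only get one line from the first list, followed by the whole second list.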

The file I'm reading:

This is what I get:

First list

ff44578jhT marsBug 2 7 3 5 2 1 71 235 312

Second list

k345fv78 littleMonster 2 4 3 0 2 1 89 2345 0

k434fv78 bigMonster 1 3 3 0 2 1 89 2345 0

k623fv78 hugeMonster 2 4 3 0 2 1 89 2345 0

k13ued31 edu 3 2 1 8 0 1 20 4 0

k123vv31 notbigMonster 4 8 9 3 4 2 200 4000 0

This is what I should get:

First list

ff44578jhT marsBug 2 7 3 5 2 1 71 235 312

ff11443asT; momu; 4; 2; 1; 4; 6; 3; 1; 11; 23

ff1123dasT; nomu; 1; 3; 1; 2; 3; 2; 1; 1; 3

ff44578jhT; jupiterBug; 2; 7; 3; 5; 2; 1; 71; 235; 312

ff44578jhT; uranusBug; 2; 7; 3; 5; 2; 1; 71; 235; 312

k123vv31; bibug; 4; 8; 9; 3; 4; 2; 200; 4000; 0

Second list

k345fv78 littleMonster 2 4 3 0 2 1 89 2345 0

k434fv78 bigMonster 1 3 3 0 2 1 89 2345 0

k623fv78 hugeMonster 2 4 3 0 2 1 89 2345 0

k13ued31 edu 3 2 1 8 0 1 20 4 0

k123vv31 notbigMonster 4 8 9 3 4 2 200 4000 0

def readFromFile(file_name):
    examplars=[]
    samples=[]
    in_file = open(file_name, 'r')

    if "#List of exemplars:\n" in in_file:
        for line in in_file:
            info1, info2, info3, info4, info5, info6, info7, info8, info9, info10, info11 = line.split("; ")
            print(info1, info2, info3, info4, info5, info6, info7, info8, info9, info10, info11) #using print to see what is happening but the objective would be to append all the infos in a tuple
            if "#List of samples:\n" in in_file:
                    for line in in_file:
                        info1, info2, info3, info4, info5, info6, info7, info8, info9, info10, info11 = line.split("; ")
                        print(info1, info2, info3, info4, info5, info6, info7, info8, info9, info10, info11) #using print to see what is happening but the objective would be to append all the infos in a tuple

score 1 · Accepted Answer

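It would be better to use pandas with ';' as the separator:

import pandas as pd
# read_csv takes the separator through `sep`; there is no `separator` keyword.
df = pd.read_csv('file_name.txt', sep=';', header=None)

Just read in the two files, then manipulate the DataFrames to get what you need.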

score 1 · Accepted Answer

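Given how your data is formatted and what you want to get out of it, I'd suggest using the csv module. If you have large lists in this semicolon-separated format, don't worry: Python's csv module lets you change the delimiter.

Here is some code you may be able to use:

import csv
# In Python 3, open the file in text mode with newline='' rather than 'rb'.
with open('example.csv', 'r', newline='') as csvfile:
    # The delimiter must be a single character; skipinitialspace drops the space after each ';'.
    reader = csv.reader(csvfile, delimiter=';', skipinitialspace=True)
    rows = list(reader)  # read every row while the file is still open

The reader object itself cannot be indexed, but the list built from it can:

print(rows[row][column])

This prints the value at the given row and column. You may have to adjust the header lines in the file for this to work; see the Python documentation for more information.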
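A fuller sketch of the csv approach follows. It assumes the file layout implied by the question's code: a "#List of exemplars:" line, then "; "-separated records, then a "#List of samples:" line and more records. The function name read_sections and the in-memory demo data are illustrative, not part of the original post.

```python
import csv
import io

def read_sections(lines):
    """Group ';'-separated records under the '#...' header that precedes them."""
    sections = {}
    current = None
    # skipinitialspace drops the blank that follows each ';'
    for row in csv.reader(lines, delimiter=';', skipinitialspace=True):
        if not row:                      # empty line
            continue
        if row[0].startswith('#'):       # header such as '#List of samples:'
            current = row[0].lstrip('#').rstrip(':').strip()
            sections[current] = []
        elif current is not None:
            sections[current].append(tuple(row))
    return sections

# Stand-in for the real file (hypothetical excerpt with three records).
demo = io.StringIO(
    "#List of exemplars:\n"
    "ff44578jhT; marsBug; 2; 7; 3; 5; 2; 1; 71; 235; 312\n"
    "ff11443asT; momu; 4; 2; 1; 4; 6; 3; 1; 11; 23\n"
    "#List of samples:\n"
    "k345fv78; littleMonster; 2; 4; 3; 0; 2; 1; 89; 2345; 0\n"
)
sections = read_sections(demo)
print(sections["List of exemplars"][0][1])
```

With a real file, the same function can be called as read_sections(open(file_name, newline='')), ideally inside a with block. Each record comes back as a tuple of strings, so numeric fields still need converting with int() if arithmetic is required.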

score 1 · Accepted Answer

As @Preston Hager mentioned, your file format suggests that you should use the csv module. However, another way to read the .txt file without csv is:

with open("examples.txt", "r") as inFile:

    #Read all data from file.
    data = inFile.read()

    #Split each set to examples and samples.
    examples = data.split("#")[1].split(":\n")[1].split("\n")
    samples = data.split("#")[2].split(":\n")[1].split("\n")

    #Create sublists of every example or sample record and dispose the last record which is empty.
    examples = [example.split(";") for example in examples][:-1]
    samples = [sample.split(";") for sample in samples][:-1]

    #Print results.
    print("Examples: ")
    for example in examples:
        print(example)

    print("Samples: ")
    for sample in samples:
        print(sample)

Output:

Examples: 
['ff44578jhT', ' marsBug', ' 2', ' 7', ' 3', ' 5', ' 2', ' 1', ' 71', ' 235', ' 312']
['ff11443asT', ' momu', ' 4', ' 2', ' 1', ' 4', ' 6', ' 3', ' 1', ' 11', ' 23']
['ff1123dasT', ' nomu', ' 1', ' 3', ' 1', ' 2', ' 3', ' 2', ' 1', ' 1', ' 3']
['ff44578jhT', ' jupiterBug', ' 2', ' 7', ' 3', ' 5', ' 2', ' 1', ' 71', ' 235', ' 312']
['ff44578jhT', ' uranusBug', ' 2', ' 7', ' 3', ' 5', ' 2', ' 1', ' 71', ' 235', ' 312']
['k123vv31', ' bibug', ' 4', ' 8', ' 9', ' 3', ' 4', ' 2', ' 200', ' 4000', ' 0']
Samples: 
['k345fv78', ' littleMonster', ' 2', ' 4', ' 3', ' 0', ' 2', ' 1', ' 89', ' 2345', ' 0']
['k434fv78', ' bigMonster', ' 1', ' 3', ' 3', ' 0', ' 2', ' 1', ' 89', ' 2345', ' 0']
['k623fv78', ' hugeMonster', ' 2', ' 4', ' 3', ' 0', ' 2', ' 1', ' 89', ' 2345', ' 0']
['k13ued31', ' edu', ' 3', ' 2', ' 1', ' 8', ' 0', ' 1', ' 20', ' 4', ' 0']
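Two details of the listing above are worth noting: splitting on ";" alone leaves a leading space in every field, and the [:-1] slice discards the last element unconditionally, which loses a real record when the file does not end with a newline (this may be why only four samples are printed). A minimal variant that avoids both, assuming the file contains exactly two "#" header lines, could look like this; split_sections and the sample text are hypothetical names and data used only for illustration.

```python
def split_sections(text):
    """Return (exemplars, samples) as lists of tuples with whitespace stripped."""
    groups = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue                      # ignore blank lines, wherever they occur
        if line.startswith("#"):
            groups.append([])             # each header line opens a new group
        elif groups:
            groups[-1].append(tuple(field.strip() for field in line.split(";")))
    exemplars, samples = groups           # assumes exactly two header lines
    return exemplars, samples

# Hypothetical excerpt; note that the last record has no trailing newline.
text = (
    "#List of exemplars:\n"
    "ff44578jhT; marsBug; 2; 7; 3; 5; 2; 1; 71; 235; 312\n"
    "#List of samples:\n"
    "k13ued31; edu; 3; 2; 1; 8; 0; 1; 20; 4; 0\n"
    "k123vv31; notbigMonster; 4; 8; 9; 3; 4; 2; 200; 4000; 0"
)
exemplars, samples = split_sections(text)
print(len(samples))
```

Skipping blank lines explicitly, instead of slicing off the last element, keeps the final record whether or not the file ends with a newline.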
