python - Adding custom IDs using Python and regular expressions

Question

I have a document in markdown, and I want to add a custom ID to each city entry. The basic layout of the document is as follows:

#Country

## StateA

### CityA
#### Population
#### Government
#### History

### CityB
#### Population
#### Government
#### History

## StateB

### CityA
#### Population
#### Government
#### History

### CityB
#### Population
#### Government
#### History

For each city, I want to add a custom ID with a counter. For example, the IDs would look like:

#USA

## FL

### US_FL_00001
### US_FL_00002
### US_FL_00003

## GA

### US_GA_00001
### US_GA_00002
### US_GA_00003

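I know it is relatively simple to select the cities with a regular expression, using re.findall() and re.sub() on the '###' headings, but how can I pull in the state and a running counter for the ID?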

score 1 · Accepted Answer

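It looks like there may be a discrepancy between your sample input and your sample output, but my answer is based on your sample output, and you can adapt it to suit your needs.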

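The idea is to read in the input file and test each line to see whether it represents a country, a state, or a city. The country and state names are stored as they are found, lines starting with '####' are skipped, and each city heading is written to a new file as an ID built from the country, the state, and a counter.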
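The line test can be tried in isolation before running the full script. The following is a minimal sketch, assuming the same three patterns the script uses; `classify` and `PATTERNS` are hypothetical names introduced here for illustration only.

```python
import re

# Patterns mirror the ones used in the script. Each one fails on deeper
# headings because the character right after the last '#' must be a letter
# (country) or whitespace (state, city), never another '#'.
PATTERNS = {
    'country': re.compile(r'^#([A-Za-z]+)'),
    'state': re.compile(r'^##\s([A-Za-z]+)'),
    'city': re.compile(r'^###\s([A-Za-z]+)'),
}


def classify(line):
    """Return (kind, name) for a heading line, or (None, None) otherwise."""
    for kind, pattern in PATTERNS.items():
        m = pattern.match(line)
        if m:
            return kind, m.group(1)
    return None, None
```

Note that the country pattern expects no space after the `#`, as in `#USA`; a heading written as `# USA` would not be recognised without loosening the pattern.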

import re

with open('input.md', 'r') as f:
    # read in the original file
    text = f.readlines()

# open the output file and loop through the original data
with open('output.md', 'w') as o:
    country_counter = counter = 0
    for line in text:
        # get the country
        m = re.match(r'^#([A-Za-z]+)', line)
        if m:
            country = m.group(1)
            # this checks to see if it is the first country
            # in the file. If so then we don't want the leading
            # newline characters
            if country_counter == 0:
                o.write(f'#{country}')
            else:
                o.write(f'\n\n#{country}')
            country_counter += 1

        # get the state
        m = re.match(r'^##\s([A-Za-z]+)', line)
        if m:
            state = m.group(1)
            # reset the counter
            counter = 0
            o.write(f'\n\n## {state}\n')

        # get the city
        m = re.match(r'^###\s([A-Za-z]+)', line)
        if m:
            # increase the counter and output the results
            # the counter is padded to 5 digits; the ID uses the country
            # and state names captured above, not the city name
            counter += 1
            o.write(f'\n### {country}_{state}_{counter:05}')
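If the '####' sub-headings should be kept in the output, the same approach can be applied to a string instead of a pair of files. This is only a sketch under that assumption, not a replacement for the script above; `add_city_ids` is a hypothetical name, and the sample text is made up from the headings in the question.

```python
import re


def add_city_ids(text):
    """Replace each '### City' heading with '### <country>_<state>_<NNNNN>'.

    All other lines, including '####' sub-headings, are kept unchanged.
    """
    country = state = ''
    counter = 0
    out = []
    for line in text.splitlines():
        country_match = re.match(r'^#([A-Za-z]+)', line)
        state_match = re.match(r'^##\s([A-Za-z]+)', line)
        city_match = re.match(r'^###\s([A-Za-z]+)', line)
        if country_match:
            country = country_match.group(1)
        elif state_match:
            state = state_match.group(1)
            counter = 0  # numbering restarts in every state
        elif city_match:
            counter += 1
            line = f'### {country}_{state}_{counter:05}'
        out.append(line)
    return '\n'.join(out)


sample = (
    '#USA\n\n## FL\n\n### CityA\n#### Population\n\n### CityB\n'
    '\n## GA\n\n### CityA'
)
print(add_city_ids(sample))
```

As in the script, the prefix is whatever follows the top-level `#`, so `#USA` yields IDs such as `USA_FL_00001`. Getting the `US_` prefix from the question would need an extra mapping from country name to code.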
