12

I want to find two types of files with two different extensions: .jl and .jsonlines. I use

from pathlib import Path
p1 = Path("/path/to/dir").joinpath().glob("*.jl")
p2 = Path("/path/to/dir").joinpath().glob("*.jsonlines")

but I want p1 and p2 as one variable, not two. Should I merge p1 and p2 in the first place? Are there other ways to concatenate glob patterns?

4

6 Answers

20
from pathlib import Path

exts = [".jl", ".jsonlines"]
mainpath = "/path/to/dir"

# Same directory

files = [p for p in Path(mainpath).iterdir() if p.suffix in exts]

# Recursive

files = [p for p in Path(mainpath).rglob('*') if p.suffix in exts]

# 'files' is already a list of Path objects; to convert them to strings:

[str(p) for p in files]
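
One caveat with the suffix comparison above: `Path.suffix` is case-sensitive and returns only the last extension. A minimal sketch of a case-insensitive check, assuming a hypothetical helper `has_ext` and made-up file names (neither is from the original answer):

```python
from pathlib import PurePath

EXTS = {".jl", ".jsonlines"}


def has_ext(path, exts=EXTS):
    """Return True if the path's final suffix, lowercased, is in exts.

    Hypothetical helper for illustration only.
    """
    return PurePath(path).suffix.lower() in exts


# Made-up names; no filesystem access is needed for this check
names = ["a.jl", "B.JL", "c.jsonlines", "d.tar.gz", "e.txt"]
matched = [n for n in names if has_ext(n)]
```

Whether upper-case extensions should count depends on your data; drop the `.lower()` call to keep the strict behaviour of the snippets above.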
answered 2019-09-11T16:06:15.307
4

If you're OK with installing a package, check out wcmatch. Its wcmatch.pathlib module provides a Path class that extends Python's pathlib so that you can match multiple patterns in one go:

from wcmatch.pathlib import Path
paths = Path('path/to/dir').glob(['*.jl', '*.jsonlines'])
answered 2019-12-03T15:13:35.707
1

Inspired by @aditi's answer, I came up with this:

from pathlib import Path
from itertools import chain

exts = ["*.jl", "*.jsonlines"]
mainpath = "/path/to/dir"

# Chain the per-pattern generators into one lazy iterator
paths = chain.from_iterable(Path(mainpath).glob(ext) for ext in exts)
print(list(paths))
answered 2018-01-10T06:59:25.513
1

Depending on your application, the glob-based solutions can be inefficient, as they loop over all files in the directory multiple times (once for each extension/pattern).

In your example you are only matching the extension in one folder, so a simple solution could be:

from pathlib import Path

folder = Path("/path/to/dir")
extensions = {".jl", ".jsonlines"}
files = [file for file in folder.iterdir() if file.suffix in extensions]

This can be turned into a function if you use it a lot.
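
A minimal sketch of such a function, assuming a hypothetical name `files_with_suffix` and an optional `recursive` flag that are not part of the original answer:

```python
from pathlib import Path


def files_with_suffix(folder, extensions, recursive=False):
    """Return sorted file paths under folder whose suffix is in extensions.

    Hypothetical helper: the name and the 'recursive' flag are assumptions.
    """
    folder = Path(folder)
    # One pass over the directory (or the whole tree when recursive=True)
    candidates = folder.rglob("*") if recursive else folder.iterdir()
    wanted = set(extensions)
    return sorted(p for p in candidates if p.is_file() and p.suffix in wanted)
```

It could then be called as `files_with_suffix("/path/to/dir", {".jl", ".jsonlines"})`. The `is_file()` check skips directories whose names happen to end in one of the extensions.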

However, if you want to be able to match glob patterns rather than extensions, you should use the match() method:

from pathlib import Path

folder = Path("/path/to/dir")
patterns = ("*.jl", "*.jsonlines")

files = [f for f in folder.iterdir() if any(f.match(p) for p in patterns)]

This last one is both convenient and efficient. You can improve efficiency further by placing the most common patterns at the beginning of the patterns tuple, since any() short-circuits on the first match.
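
To illustrate what match() adds over a plain suffix check, here is a small sketch on pure paths (no filesystem access). The helper `matches_any`, the `raw/*.json` pattern and the file names are made up for the example; relative patterns are matched from the right-hand end of the path:

```python
from pathlib import PurePosixPath


def matches_any(path, patterns):
    """Return True as soon as one pattern matches (any() short-circuits)."""
    return any(PurePosixPath(path).match(p) for p in patterns)


# Hypothetical patterns and names, most common pattern first
patterns = ("*.jl", "*.jsonlines", "raw/*.json")
names = ["a.jl", "logs/b.jsonlines", "raw/c.json", "other/d.json", "e.txt"]
selected = [n for n in names if matches_any(n, patterns)]
```

Here `other/d.json` is rejected because its parent directory does not match `raw`, which a comparison on `suffix` alone could not express.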

answered 2021-04-01T13:34:21.960
0

Try this:

from os.path import join
from glob import glob

files = []
for ext in ('*.jl', '*.jsonlines'):
    files.extend(glob(join("path/to/dir", ext)))

print(files)
answered 2018-01-10T06:11:55.817
0
from pathlib import Path

keep = [".jl", ".jsonlines"]
files = [p for p in Path().rglob("*") if p.suffix in keep]
answered 2022-02-05T21:51:50.103