
I have an S3 server with millions of files under each bucket. I want to download files from a bucket, but only the files that satisfy a specific condition. Is there a better way than listing the whole bucket and checking each file against the condition as I iterate over it? You can see what I do here:

import os
# Import the SDK
import boto
from boto.s3.connection import OrdinaryCallingFormat

LOCAL_PATH = 'W:/RD/Fancy/s3_opportunities/'

bucket_name = '/recording'#/sampledResponseLogger'

# connect to the bucket
print 'Connecting...'
conn = boto.connect_s3(calling_format=OrdinaryCallingFormat()) #conn = boto.connect_s3()

print 'Getting bucket...'
bucket = conn.get_bucket(bucket_name)

print 'Going through the list of files...' 
bucket_list = bucket.list()

for l in bucket_list:

    keyString = str(l.key)

    # SOME CONDITION
    if '2015-08' in keyString:

        # check if file exists locally, if not: download it
        filename = LOCAL_PATH + keyString[56:]
        if not os.path.exists(filename):

            print 'Downloading file: ' + keyString + '...'

            # Download the object that the key represents
            l.get_contents_to_filename(filename)

1 Answer


The only mechanism available for server-side filtering in the ListBucket operation is a prefix. So, if your objects in S3 have some sort of implied directory structure (e.g. foo/bar/fie/baz/object1), then you can use a prefix to list only the objects whose names begin with, say, foo/bar/fie. If your object names do not follow this kind of hierarchical naming, there is really nothing you can do except list all of the objects and filter them with your own mechanism.
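For example, if the '2015-08' part sits at the start of the key name, or directly after a fixed path that you already know, you can pass it as a prefix and let S3 do the filtering before anything comes back. A minimal sketch using the same boto API as the question; the bucket name and the 'some/known/path/2015-08' prefix are placeholders that you would replace with your actual bucket and key layout:

import boto
from boto.s3.connection import OrdinaryCallingFormat

# Placeholder values -- substitute your real bucket name and the known
# leading portion of the keys you want (prefix matching is left-anchored).
BUCKET_NAME = 'recording'
PREFIX = 'some/known/path/2015-08'

conn = boto.connect_s3(calling_format=OrdinaryCallingFormat())
bucket = conn.get_bucket(BUCKET_NAME)

# The prefix is applied server-side by the ListBucket call, so only the
# matching keys are paged back instead of every object in the bucket.
for key in bucket.list(prefix=PREFIX):
    print 'Matched key: ' + key.key
    # key.get_contents_to_filename(...) would then download it, as in the question

Note that a prefix only helps when the part you filter on is at the beginning of the key (after a fixed, known path); it cannot match '2015-08' appearing somewhere in the middle of an otherwise unpredictable key name.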

Answered 2015-09-15T16:38:09.963