python - s3fs and Python os.walk

Question

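I am trying to figure out a way to read images from an S3 bucket. Right now, my setup is to mount the bucket with s3fs, then use a Python script that calls os.walk to visit each individual image and do some operations on it with numpy.

However, the output of

os.walk("mnt/")

is nothing! The command does not see any files in the mounted drive, but if I open an image manually with

plt.imread("mnt/path/to/file")

I get the image. I'm at my wits' end trying to figure this out. Any ideas?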

score 1 · Accepted Answer

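A bucket mounted from S3 does not behave like ordinary files and directories in a file system, so calls such as os.walk will not work the way you expect. Your best bet is to use a library to search and interact with the S3 bucket from within Python itself.

I'd suggest looking into boto, which has a lot of tools for interacting with AWS. Also take a look at the AWS SDK for Python.

Boto: https://github.com/boto/boto
AWS SDK for Python: https://aws.amazon.com/sdk-for-python/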
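As a rough sketch of what this could look like with boto3 (the current AWS SDK for Python), the helper below lists every object key under a prefix through the `list_objects_v2` paginator. The function name `iter_keys` and the bucket and prefix names in the usage comment are made up for illustration. The client is passed in as an argument, so the helper does not care how credentials are configured.

```python
def iter_keys(client, bucket, prefix=""):
    """Yield every object key in `bucket` that starts with `prefix`.

    `client` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    """
    # list_objects_v2 returns a limited number of keys per call,
    # so let a paginator fetch the follow-up pages.
    paginator = client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        # "Contents" is absent from a page when nothing matches.
        for obj in page.get("Contents", []):
            yield obj["Key"]


# Usage (hypothetical bucket and prefix; needs AWS credentials):
#   import boto3
#   for key in iter_keys(boto3.client("s3"), "my-bucket", "images/"):
#       print(key)
```

This gives a flat listing of keys rather than a directory tree, which is often enough when the goal is just to visit every image.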

score 0 · Accepted Answer

As an alternative, I have implemented something similar to os.walk() using just boto3.

See my answer in the related question.
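The linked answer is not reproduced here, so the following is only a sketch of one way an os.walk-style generator could be built on boto3, not necessarily the approach used there. It relies on `list_objects_v2` with `Delimiter="/"`, which groups deeper keys under `CommonPrefixes`. The name `s3_walk` and the bucket name in the usage comment are hypothetical.

```python
def s3_walk(client, bucket, prefix=""):
    """Yield (prefix, subdirs, filenames) top-down, similar to os.walk.

    `client` is expected to be a boto3 S3 client; `prefix` is either ""
    or a "directory" prefix ending in "/".
    """
    paginator = client.get_paginator("list_objects_v2")
    subdirs, filenames = [], []
    # Delimiter="/" groups deeper keys into CommonPrefixes ("subdirectories").
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        for common in page.get("CommonPrefixes", []):
            subdirs.append(common["Prefix"][len(prefix):].rstrip("/"))
        for obj in page.get("Contents", []):
            name = obj["Key"][len(prefix):]
            if name:  # skip a placeholder object named exactly like the prefix
                filenames.append(name)
    yield prefix, subdirs, filenames
    # Recurse into each "subdirectory" after reporting the current level.
    for sub in subdirs:
        yield from s3_walk(client, bucket, f"{prefix}{sub}/")


# Usage (hypothetical bucket name; needs AWS credentials):
#   import boto3
#   for prefix, subdirs, filenames in s3_walk(boto3.client("s3"), "my-bucket"):
#       print(prefix, subdirs, filenames)
```

Unlike os.walk, the first element is a key prefix ending in "/" (or the empty string at the root), so the full key of a file is simply `prefix + filename`.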

score 0 · Accepted Answer

You can do:

import s3fs

s3 = s3fs.S3FileSystem()
# s3.walk yields (dirpath, dirnames, filenames) tuples, like os.walk
for dirpath, dirnames, filenames in s3.walk('<your bucket name>'):
    # keep in mind how many directory levels your bucket has
    for filename in filenames:
        file_path = f'{dirpath}/{filename}'
        with s3.open(file_path, 'rb') as f:
            # do your numpy stuff with the "f" object
            pass

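The code above loops over the entire bucket. It assumes the files you want sit at the root of the bucket; if they are nested under directories, add an if statement on the depth of dirpath, for example: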

if len(dirpath.split('/')) == <depth of the directory with the files>:
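To make the depth check concrete, here is a small sketch that runs without S3 access. It assumes the paths produced by the walk look like 'bucket/dir/subdir', and it uses a hand-written list with made-up names in place of real `s3.walk` output. The helpers `depth` and `at_depth` are hypothetical names, not part of s3fs.

```python
def depth(dirpath):
    """Number of '/'-separated components in a path like 'bucket/a/b'."""
    return len(dirpath.strip("/").split("/"))


def at_depth(walk_results, wanted):
    """Keep only the (dirpath, dirnames, filenames) triples at `wanted` depth."""
    return [entry for entry in walk_results if depth(entry[0]) == wanted]


# Hand-written stand-in for what a walk over a bucket might yield.
sample = [
    ("my-bucket", ["images"], []),
    ("my-bucket/images", ["cats"], ["readme.txt"]),
    ("my-bucket/images/cats", [], ["1.png", "2.png"]),
]

# Only the entry three levels deep (bucket, images, cats) is kept.
print(at_depth(sample, 3))
```

Note that the bucket name itself counts as one level here, so files directly under the bucket root are at depth 1.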

score 0 · Accepted Answer

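The snippet in the answer above had a couple of mistakes as posted: I think {dirpath}{filepath} should be {dirpath}/{filename}, and the loop variable filename on the s3.walk line should be filenames. Otherwise, very helpful!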
