I'm trying to run a Python script inside a Docker container. The Docker versions are:
Client:
 Cloud integration: 1.0.4
 Version:           20.10.2
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        2291f61
 Built:             Mon Dec 28 16:12:42 2020
 OS/Arch:           darwin/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.2
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       8891c58
  Built:            Mon Dec 28 16:15:23 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0
I'm trying to read roughly 99k files from a directory with this code:
import os

# Prepare a list of file names
corpus_path = 'data/cnn/'
corpus_filenames = []
i = 0
limit = 10000  # set to None to collect every file
for entry in os.scandir(corpus_path):
    # Skip hidden files and anything that is not a regular file
    if not entry.name.startswith('.') and entry.is_file():
        # Stop once `limit` files have been collected
        if limit is not None and i >= limit:
            break
        corpus_filenames.append(os.path.join(corpus_path, entry.name))
        i += 1

# What did we find?
N_files = len(corpus_filenames)
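For comparison, the same filtering can be written as a small function. This is only a sketch: the name list_corpus_files is hypothetical, and the behaviour (skip dot-files and non-files, stop after limit entries) is assumed to match the loop above. Using os.scandir as a context manager closes the directory handle as soon as the listing is done, and itertools.islice handles the optional limit.

```python
import os
from itertools import islice


def list_corpus_files(corpus_path, limit=None):
    """Return up to `limit` paths of regular, non-hidden files in `corpus_path`.

    Hypothetical helper mirroring the loop above; `limit=None` means no cap.
    """
    # The `with` block closes the scandir iterator deterministically.
    with os.scandir(corpus_path) as it:
        files = (
            os.path.join(corpus_path, entry.name)
            for entry in it
            if not entry.name.startswith(".") and entry.is_file()
        )
        # Materialize inside the `with` block, before the iterator is closed.
        # islice(files, None) yields everything; islice(files, n) stops after n.
        return list(islice(files, limit))
```

Usage: corpus_filenames = list_corpus_files('data/cnn/', limit=10000).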
Running it fails with this error:
    for entry in os.scandir(corpus_path):
OSError: [Errno 5] Input/output error: 'data/cnn/'
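To narrow down where the listing breaks, a probe like the following could be run both inside and outside the container. It is a hypothetical diagnostic (the name probe_directory is made up for illustration). It reads the whole directory, not just the first batch of entries, and reports the symbolic errno name, so EIO (the "[Errno 5]" above) can be told apart from more ordinary failures such as ENOENT or EACCES.

```python
import errno
import os


def probe_directory(path):
    """Try to list every entry in `path`.

    Returns None on success, otherwise the symbolic errno name (e.g. "EIO").
    Hypothetical diagnostic helper, not part of the original script.
    """
    try:
        with os.scandir(path) as it:
            for _ in it:  # force every directory read, not just the first
                pass
    except OSError as exc:
        # Map the numeric errno to its name; fall back to the number itself
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None
```

Usage: print(probe_directory('data/cnn/')) prints None when the directory can be listed completely.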
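This error only occurs inside the Docker container. If I run the same script outside the container, there is no error and the files are read normally. This is my Dockerfile: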
FROM ubuntu:18.04
ENTRYPOINT [ "/bin/bash", "-l", "-i", "-c" ]
# Set the mirror for `apt-get` to talk to. This seems to have helped with a situation where some packages below
# would sometimes install and sometimes fail with an IP Not Found error. It's still not perfect.
RUN sed --in-place --regexp-extended "s/(\/\/)(archive\.ubuntu)/\1us.\2/" /etc/apt/sources.list && \
apt-get update && apt-get upgrade --yes
# delete all the apt list files since they're big and get stale quickly
RUN rm -rf /var/lib/apt/lists/*
# this forces "apt-get update" in dependent images, which is also good
# (see also https://bugs.launchpad.net/cloud-images/+bug/1699913)
# enable the universe
RUN sed -i 's/^#\s*\(deb.*universe\)$/\1/g' /etc/apt/sources.list
# make systemd-detect-virt return "docker"
# See: https://github.com/systemd/systemd/blob/aa0c34279ee40bce2f9681b496922dedbadfca19/src/basic/virt.c#L434
RUN mkdir -p /run/systemd && echo 'docker' > /run/systemd/container
# Clean cache and basic repository setup
RUN apt-get clean
RUN apt-get update && apt-get update --fix-missing
RUN apt-get install -y software-properties-common
RUN apt-get install -y apt-utils
RUN apt-get install -y vim
RUN apt-get update && export PATH
RUN apt-get install -y bc
# `libpython3.6-dev` is required for `python3-pip`
RUN apt-get install -y libpython3.6-dev
RUN apt-get install -y python3-pip
# AWS Python SDK and CLI installations
RUN apt-get install -y unzip
RUN apt-get install -y curl
RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
RUN unzip awscliv2.zip
RUN ./aws/install
# Python dependencies
COPY requirements.txt .
RUN pip3 install -r requirements.txt
RUN python3.6 -c "import nltk; nltk.download('stopwords'); \
    nltk.download('punkt'); \
    nltk.download('averaged_perceptron_tagger'); \
    nltk.download('maxent_ne_chunker')"
RUN cp -r /root/nltk_data /usr/share/nltk_data
# Set python 3.6 as the default for the container
RUN ln -s /usr/bin/python3.6 /usr/bin/python
# Set root password
RUN echo "root:##abc%%" | chpasswd
# Install sudo
RUN apt-get update && apt-get -y install sudo
# overwrite this with 'CMD []' in a dependent Dockerfile
CMD ["/bin/bash"]
# Create and boot into a development user instead of working as root
RUN groupadd -r sophia -g 901
RUN useradd -u 901 -r -g sophia sophia
RUN echo "sophia:##abc%%" | chpasswd
RUN adduser sophia sudo
RUN mkdir /home/sophia
RUN mkdir /home/sophia/project
RUN mkdir /home/sophia/logs
RUN chown -R sophia /home/sophia
USER sophia
WORKDIR /home/sophia/project
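The first sed command in the Dockerfile is meant to rewrite the Ubuntu mirror. Below is a Python equivalent of that substitution, as a sketch under the assumption that the intended replacement is \1us.\2 (keep the captured "//", then prefix "us."). The function name use_us_mirror is hypothetical. Lines for other hosts, such as security.ubuntu.com, are left untouched.

```python
import re

# Same pattern as the sed expression: capture "//" and "archive.ubuntu".
# Assumption: the intended replacement is "\1us.\2".
MIRROR_PATTERN = re.compile(r"(//)(archive\.ubuntu)")


def use_us_mirror(sources_list):
    """Point archive.ubuntu.com entries in a sources.list at us.archive.ubuntu.com."""
    return MIRROR_PATTERN.sub(r"\1us.\2", sources_list)
```

Because the pattern requires "//" immediately before "archive", running the rewrite a second time changes nothing.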
Edit: It seems that Docker is not mounting the local directory correctly. My docker run script looks like this:
docker run -i -t \
    --entrypoint /bin/bash \
    --net="host" \
    -v "$PWD":/home/sophia/project \
    -v "$PWD"/../logs:/home/sophia/logs \
    -v ~/.ssh/id_rsa:/root/.ssh/id_rsa \
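One way to check, from inside the container, whether the project directory really is a bind mount (and which filesystem type backs it) is to look it up in /proc/self/mounts. The sketch below is Linux-only and assumes the usual "device mount_point fs_type options dump pass" line format. The helper name find_mount is hypothetical, and it decodes only the \040 (space) escape, not other octal escapes.

```python
import os


def find_mount(path, mounts_text):
    """Return (mount_point, fs_type, options) of the mount containing `path`.

    `mounts_text` is the content of /proc/self/mounts. Hypothetical helper.
    """
    path = os.path.normpath(path)
    best = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        # /proc/mounts escapes spaces in paths as \040
        mount_point = fields[1].replace("\\040", " ")
        fs_type, options = fields[2], fields[3]
        inside = path == mount_point or path.startswith(mount_point.rstrip("/") + "/")
        # The longest matching mount point serves `path`; ">=" lets a later
        # mount stacked on the same point win over an earlier one.
        if inside and (best is None or len(mount_point) >= len(best[0])):
            best = (mount_point, fs_type, options)
    return best


if __name__ == "__main__" and os.path.exists("/proc/self/mounts"):
    with open("/proc/self/mounts") as f:
        print(find_mount("/home/sophia/project/data/cnn", f.read()))
```

If the result's mount point is "/" rather than /home/sophia/project, the -v bind mount did not take effect.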