
I am trying to run a Python script inside a Docker container. The Docker version is:

 Cloud integration: 1.0.4
 Version:           20.10.2
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        2291f61
 Built:             Mon Dec 28 16:12:42 2020
 OS/Arch:           darwin/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.2
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       8891c58
  Built:            Mon Dec 28 16:15:23 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          v1.4.3
  GitCommit:        269548fa27e0089a8b8278fc4fc781d7f65a939b
 runc:
  Version:          1.0.0-rc92
  GitCommit:        ff819c7e9184c13b7c2607fe6c30ae19403a7aff
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

I am trying to read about 99k files from a directory with this code:

import os
  
# Prepare a list of file names
corpus_path = 'data/cnn/'
corpus_filenames = []
i = 0
limit = 10000
for entry in os.scandir(corpus_path):
    if not entry.name.startswith('.') and entry.is_file():
        print(entry)
        if limit is not None:
            if i >= limit:
                break
        corpus_filenames.append(os.path.join(corpus_path, entry.name))
        i += 1
# What did we find?
N_files = len(corpus_filenames)
print(N_files)

I get this error:

    for entry in os.scandir(corpus_path):
OSError: [Errno 5] Input/output error: 'data/cnn/'

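This error only occurs inside the Docker container. If I run the same script outside the container, it shows no error and reads the files from the directory as expected.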
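To narrow down where the listing fails, a small probe like the one below may help. It is only a sketch: `probe_directory` is a hypothetical helper name, not something from the original script, and it simply reports the errno that `os.scandir()` raises instead of letting the traceback end the run.

```python
import errno
import os


def probe_directory(path, sample=5):
    """Try to list `path` and report the first few names or the OS error."""
    try:
        with os.scandir(path) as entries:
            names = []
            for entry in entries:
                names.append(entry.name)
                if len(names) >= sample:
                    break
        return {"ok": True, "sample": names}
    except OSError as exc:
        # errno 5 is EIO, the "Input/output error" shown in the traceback.
        return {
            "ok": False,
            "errno": exc.errno,
            "errno_name": errno.errorcode.get(exc.errno, "UNKNOWN"),
            "message": exc.strerror,
        }


if __name__ == "__main__":
    # Run this both on the host and inside the container and compare.
    print(probe_directory("data/cnn/"))
```

If the probe succeeds on the host but reports `EIO` in the container, the problem most likely sits in how the directory reaches the container rather than in the Python code.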

Here is the Dockerfile as well:

FROM ubuntu:18.04
ENTRYPOINT [ "/bin/bash", "-l", "-i", "-c" ]

# Set the mirror for `apt-get` to talk to.  This seems to have helped in a situation where some packages below
# will sometimes work and sometimes give an IP Not Found error.  It's still not perfect.
RUN sed --in-place --regexp-extended "s/(\/\/)(archive\.ubuntu)/\us.\2/" /etc/apt/sources.list && \
    apt-get update && apt-get upgrade --yes

# delete all the apt list files since they're big and get stale quickly
RUN rm -rf /var/lib/apt/lists/*
# this forces "apt-get update" in dependent images, which is also good
# (see also https://bugs.launchpad.net/cloud-images/+bug/1699913)

# enable the universe
RUN sed -i 's/^#\s*\(deb.*universe\)$/\1/g' /etc/apt/sources.list

# make systemd-detect-virt return "docker"
# See: https://github.com/systemd/systemd/blob/aa0c34279ee40bce2f9681b496922dedbadfca19/src/basic/virt.c#L434
RUN mkdir -p /run/systemd && echo 'docker' > /run/systemd/container

# Clean cache and basic repository setup
RUN apt-get clean
RUN apt-get update && apt-get update --fix-missing
RUN apt-get install -y software-properties-common
RUN printf 'Y' | apt-get install apt-utils
RUN printf 'Y' | apt-get install vim
RUN apt-get update && export PATH
RUN apt-get install bc

# `libpython3.6-dev` is required for `python3-pip`
RUN printf 'Y' | apt-get install libpython3.6-dev
RUN printf 'Y' | apt-get install python3-pip

# AWS Python SDK and CLI installations
RUN apt-get install -y unzip
RUN apt-get install -y curl
RUN curl "https://awscli.amazonaws.com/awscli-exe-linux-x86_64.zip" -o "awscliv2.zip"
RUN unzip awscliv2.zip
RUN ./aws/install

# Python dependencies
COPY requirements.txt .
RUN pip3 install -r requirements.txt

# NLTK
RUN python3.6 -c "import nltk; nltk.download('stopwords'); \ 
    nltk.download('punkt'); \
    nltk.download('averaged_perceptron_tagger'); \
    nltk.download('maxent_ne_chunker'); \
    nltk.download('words');"
RUN cp -r /root/nltk_data /usr/share/nltk_data

# Set python 3.6 as the default for the container
RUN ln -s /usr/bin/python3.6 /usr/bin/python

# Set root password
RUN echo "root:##abc%%" | chpasswd

# Install sudo
RUN apt-get update && apt-get -y install sudo

# overwrite this with 'CMD []' in a dependent Dockerfile
CMD ["/bin/bash"]

# Create and boot into a development user instead of working as root
RUN groupadd -r sophia -g 901
RUN useradd -u 901 -r -g sophia sophia
RUN echo "sophia:##abc%%" | chpasswd
RUN adduser sophia sudo
RUN mkdir /home/sophia
RUN mkdir /home/sophia/project
RUN mkdir /home/sophia/logs
RUN chown -R sophia /home/sophia
USER sophia
WORKDIR /home/sophia/project

Please help! I have been struggling with this problem for a while.

Edit: It seems that Docker is not mounting my local directory correctly. My docker run script looks like this:

docker run -i -t \
            --entrypoint /bin/bash \
            --net="host" \
            --name=$CONTAINER_NAME \
            -v $PWD:/home/sophia/project \
            -v $PWD/../logs:/home/sophia/logs \
            -v ~/.ssh/id_rsa:/root/.ssh/id_rsa \
            -e GEMFURY_TOKEN=$GEMFURY_TOKEN \
            $USER_NAME/$IMAGE_NAME:$VERSION
        ;;
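Since the suspicion is that the bind mount is not working, it may be worth checking from inside the container what Python sees at the mounted path. The following is a minimal sketch; `describe_mount` is a hypothetical helper, and the paths in the usage section are taken from the `-v` options above.

```python
import os


def describe_mount(path):
    """Collect basic facts about a path that is expected to be a bind mount."""
    return {
        "exists": os.path.exists(path),
        "is_dir": os.path.isdir(path),
        # Typically True for the mounted directory itself, not for its subdirectories.
        "is_mount": os.path.ismount(path),
        # Listing a directory needs both read and execute (search) permission.
        "readable": os.access(path, os.R_OK | os.X_OK),
    }


if __name__ == "__main__":
    for candidate in ("/home/sophia/project", "/home/sophia/project/data/cnn"):
        print(candidate, describe_mount(candidate))
```

If the directory exists and is readable but listing it still fails with an input/output error, that would point at the file-sharing layer between the host and the container rather than at permissions.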

1 Answer


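os.scandir() is supposed to return an iterator of os.DirEntry objects, but that does not seem to work when the script runs inside Docker. I had to run it outside the container, generate the result there, and then mount the result into the container.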
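One possible reading of this workaround, sketched below, is to write the file list to a manifest on the host and have the container read the manifest instead of listing the directory. This is an assumption about what the answer describes; `write_manifest`, `read_manifest`, and the manifest file name are hypothetical and not part of the original script.

```python
import os


def write_manifest(corpus_path, manifest_path, limit=None):
    """Run on the host: save the names of regular, non-hidden files, one per line."""
    count = 0
    with os.scandir(corpus_path) as entries, \
            open(manifest_path, "w", encoding="utf-8") as out:
        for entry in entries:
            if limit is not None and count >= limit:
                break
            if not entry.name.startswith(".") and entry.is_file():
                out.write(entry.name + "\n")
                count += 1
    return count


def read_manifest(corpus_path, manifest_path):
    """Run in the container: rebuild the file paths without listing the directory."""
    with open(manifest_path, encoding="utf-8") as handle:
        return [
            os.path.join(corpus_path, line.rstrip("\n"))
            for line in handle
            if line.strip()
        ]


if __name__ == "__main__":
    # On the host:          write_manifest("data/cnn/", "cnn_manifest.txt", limit=10000)
    # Inside the container: corpus_filenames = read_manifest("data/cnn/", "cnn_manifest.txt")
    pass
```

The sketch assumes file names contain no newline characters. It only avoids the directory listing; opening the individual files inside the container still depends on the mount working.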

Answered 2022-01-04T16:10:54.683