python - Saving a scraped-data CSV file from inside a Docker container to the local host

Question

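I run Python web scrapers that collect articles from various websites and save them as CSV files. I have been running them manually, but recently I have been trying to run them in Google Cloud Shell. I ran into some dependency problems, so I decided to build a Docker image to run my Python scraper.

So far, I have managed to create a Dockerfile that builds a container with all the necessary dependencies.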

FROM python:3
# Set the working directory to /app
WORKDIR /app

# Copy the current directory contents into the container at /app
ADD . /app

# Install any needed packages specified in requirements.txt
RUN pip install --trusted-host pypi.python.org -r requirements.txt
RUN pip install lxml
COPY Fin24 ./Fin24
COPY scraped_list.csv ./scraped_list.csv

# Run fin24.py when the container launches
CMD ["python3", "fin24.py"]

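fin24.py contains my scraper. Fin24 is a txt file containing all the base URLs from which my scraper collects article links before visiting each article and extracting its content. scraped_list.csv contains everything I have scraped before; my Python script checks it to make sure I don't scrape the same article again.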
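For illustration, the duplicate check described above might look roughly like the following. This is only a minimal sketch, not the actual code in fin24.py: it assumes scraped_list.csv holds one URL per row in its first column (the real layout is not shown), and the function names `load_scraped_urls` and `filter_new_urls` are hypothetical.

```python
import csv
from pathlib import Path


def load_scraped_urls(csv_path):
    """Return the set of URLs already recorded in the CSV.

    Assumption: the URL is the first column of each row.
    A missing file is treated as "nothing scraped yet".
    """
    path = Path(csv_path)
    if not path.exists():
        return set()
    with path.open(newline="", encoding="utf-8") as f:
        return {row[0] for row in csv.reader(f) if row}


def filter_new_urls(candidate_urls, scraped_urls):
    """Keep only URLs not scraped before, preserving order and dropping repeats."""
    seen = set(scraped_urls)
    fresh = []
    for url in candidate_urls:
        if url not in seen:
            seen.add(url)
            fresh.append(url)
    return fresh
```

Because the check depends on scraped_list.csv, that file has to persist between runs as well as the output, otherwise every fresh container starts from the copy baked into the image.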

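After running the above, I can see that it works. The Python script stops once all the sites it found have been scraped. However, I assume it is saving the CSV file (the output) inside the Docker container. How can I save it to the directory I am running Docker from?

Ultimately, I would like to simply upload the Dockerfile to my Google Cloud Shell, run it as a cron job, and have all the output saved in the shell. Any help would be much appreciated.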

score 0 · Accepted Answer

You will need to mount that path in your Docker deployment. To do so, you need to do two things:

1. Add a volume in the Dockerfile

WORKDIR /path/in/container
VOLUME ["/path/in/container"]

2. Run your container with the -v option

docker run -i -t -v /path/on/host:/path/in/container:rw "image name"
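On the Python side, the script then only has to write its CSV somewhere under the mounted path. The following is a minimal sketch under stated assumptions, not the asker's actual code: `OUTPUT_DIR` is a hypothetical environment variable, `articles.csv` and the `url`/`content` columns are made-up names, and `/path/in/container` is the placeholder from the commands above.

```python
import csv
import os
from pathlib import Path


def output_dir(default="/path/in/container"):
    """Directory where results are written.

    OUTPUT_DIR is a hypothetical environment variable; the default is the
    placeholder container path used in the docker run command.
    """
    return Path(os.environ.get("OUTPUT_DIR", default))


def save_articles(rows, filename="articles.csv", directory=None):
    """Write rows (dicts with 'url' and 'content' keys) to a CSV file.

    When the target directory is the bind-mounted path, the file also
    appears in /path/on/host on the host machine.
    """
    target_dir = Path(directory) if directory is not None else output_dir()
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / filename
    with target.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["url", "content"])
        writer.writeheader()
        writer.writerows(rows)
    return target
```

One design note: a bind mount hides whatever the image already contains at that container path, so it is usually safer to mount a dedicated output directory than the directory the script itself was copied into.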
