
I'm looking for a simplification/encapsulation so that my existing programs that use (sic) open("my_file.txt") can be ported to Colaboratory with minimal change to the existing logic flow. I'm happy to add some cut-and-paste logic before my existing logic.

The mental model I understand from Google (here) is that I have to complete these prerequisites to get my file loaded:

  1. upload to Google Drive
  2. download to the Python VM (probably into /tmp)

And then I can execute my existing code without change.

Therefore, I suspect/propose that what would work for me (but not just me!) is an interface/function as follows:

  • inputs (from local computer)
    • source_file_dir
    • source_file_name
    • (of course authentication inputs are implicitly required)
  • output
    • python_vm_file_dir (dir I can use in my program; /tmp is fine)
    • (implicitly I expect the same dest_file_name)
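
The interface described above could be sketched roughly as follows. This is a minimal sketch, not an established helper: the names `save_uploaded` and `upload_to_vm` are hypothetical, and it assumes the Colab runtime's `google.colab.files.upload()` call, which prompts for local files in the browser and returns a mapping of file name to bytes. Because the browser dialog picks the files, the source_file_dir and source_file_name inputs are chosen interactively rather than passed as arguments.

```python
import os


def save_uploaded(uploaded, dest_dir="/tmp"):
    """Write a {file_name: bytes} mapping into dest_dir and return the paths."""
    os.makedirs(dest_dir, exist_ok=True)
    paths = []
    for name, data in uploaded.items():
        # Keep only the base name so the file lands directly in dest_dir
        # under the same file name it had on the local computer.
        path = os.path.join(dest_dir, os.path.basename(name))
        with open(path, "wb") as f:
            f.write(data)
        paths.append(path)
    return paths


def upload_to_vm(dest_dir="/tmp"):
    """Prompt for local files in Colab and store them under dest_dir."""
    # Imported lazily: google.colab only exists inside a Colab runtime.
    from google.colab import files

    return save_uploaded(files.upload(), dest_dir)
```

After running `upload_to_vm("/tmp")` in a cell, existing code would only need its path adjusted, e.g. open("/tmp/my_file.txt").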

With this code snippet, I could easily move code into Colaboratory.

Has anyone already created this?

Thank you.


2 Answers


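I've been dealing with a similar problem. In terms of simplicity, I've found it easiest to keep the data files in Google Cloud Storage. This is explained well in the tutorial - https://colab.research.google.com/notebook#fileId=/v2/external/notebooks/io.ipynb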

The simplest approach I've found is to insert a cell that copies the data to the VM running the notebook:

!gsutil cp gs://{bucket_name}/to_upload.txt /tmp/gsutil_download.txt

That way, I can usually keep the "active" code blocks identical to the ones I run locally.
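
For anyone who prefers a function over a shell-escape cell, the same copy can be wrapped in Python. This is a minimal sketch that assumes `gsutil` is on the PATH and already authenticated in the runtime; `gsutil_cp_command` and `fetch_from_gcs` are hypothetical helper names, not part of any library.

```python
import subprocess


def gsutil_cp_command(bucket_name, object_name, dest_path):
    """Build the argument list for `gsutil cp gs://bucket/object dest`."""
    return ["gsutil", "cp", f"gs://{bucket_name}/{object_name}", dest_path]


def fetch_from_gcs(bucket_name, object_name, dest_dir="/tmp"):
    """Copy one object from a GCS bucket into dest_dir and return its path."""
    # Keep the object's base name as the local file name.
    dest_path = f"{dest_dir}/{object_name.split('/')[-1]}"
    # check=True raises CalledProcessError if gsutil exits with an error.
    subprocess.run(
        gsutil_cp_command(bucket_name, object_name, dest_path), check=True
    )
    return dest_path
```

The returned path can then be passed straight to the existing open(...) call.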

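I use a Chromebook when I'm away from home, so I like to keep as much as possible in the cloud. It's very easy to set up a "mapped network drive" (in Windows) to a GCS bucket for moving files around. It's easy on Linux too. On Windows, I found this utility really handy: https://www.cloudberrylab.com/drive/google-cloud.aspx - not an advertisement, I'm just a fan.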

Answered 2017-12-24T16:30:01.540

Upload to Google Drive. Here is a code snippet for accessing it directly.

!apt-get install -y -qq software-properties-common python-software-properties module-init-tools
!add-apt-repository -y ppa:alessandro-strada/ppa 2>&1 > /dev/null
!apt-get update -qq 2>&1 > /dev/null
!apt-get -y install -qq google-drive-ocamlfuse fuse
from google.colab import auth
auth.authenticate_user()
from oauth2client.client import GoogleCredentials
creds = GoogleCredentials.get_application_default()
import getpass
!google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret} < /dev/null 2>&1 | grep URL
vcode = getpass.getpass()
!echo {vcode} | google-drive-ocamlfuse -headless -id={creds.client_id} -secret={creds.client_secret}

Now create a drive directory:

!mkdir -p drive
!google-drive-ocamlfuse drive

You can then access any file in your Google Drive simply as drive/filename.

For example:

df = pandas.read_hdf("drive/Colab Notebooks/S2C5_complete_cleaned_by_me_10percent.h5")
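
If the goal is to leave existing open("my_file.txt") calls completely unchanged, one option is to switch the working directory to the mounted folder temporarily, so relative paths resolve there. This is a minimal sketch; `working_directory` is a hypothetical helper, and the folder name in the usage comment is only an example taken from the path above.

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily change the current working directory to path."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Always restore the original directory, even if the body raises.
        os.chdir(previous)


# Example use, assuming Drive is mounted at ./drive:
# with working_directory("drive/Colab Notebooks"):
#     with open("my_file.txt") as f:
#         data = f.read()
```

The trade-off is that every relative path inside the block is resolved against the Drive folder, including any output files the existing code writes.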

Also, you only need to do this once, in one notebook. After that, you can access the data from other notebooks as well.

Answered 2018-06-28T06:31:06.587