arrays - How to save the results of an np.array for future use when using Google Colab

Question

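I am working on an information retrieval project, and I am using Google Colab for it. I am at the stage where I have computed some features ("input_features") and obtained the labels ("labels") by running a for loop, which took about 4 hours to complete.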

At the end I put the results into arrays:

input_features = np.array(input_features)
labels = np.array(labels)

So my question is: when using Google Colab, is it possible to save these results so that I can use them in the future?

I found 2 options that might apply, but I don't know where the files are created.

1) Save them as csv files. My code is:

from numpy import savetxt
# save to csv file
savetxt('input_features.csv', input_features, delimiter=',')
savetxt('labels.csv', labels, delimiter=',')

To load them:

from numpy import loadtxt
# load array
input_features = loadtxt('input_features.csv', delimiter=',')
labels = loadtxt('labels.csv', delimiter=',')
# print the array
print(input_features)
print(labels)

But when I print, I still don't get anything.

2) Save the arrays using pickle. I followed the instructions from here: https://colab.research.google.com/drive/1EAFQxQ68FfsThpVcNU7m8vqt4UZL0Le1#scrollTo=gZ7OTLo3pw8M

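from google.colab import files
import pickle
def features_pickeled(input_features, results):
  # Pickle `results` to '<input_features>.txt', then download it through the browser
  input_features = input_features + '.txt'
  with open(input_features, 'wb') as f:
    pickle.dump(results, f)
  files.download(input_features)
def labels_pickeled(labels, results):
  # Pickle `results` to '<labels>.txt', then download it through the browser
  labels = labels + '.txt'
  with open(labels, 'wb') as f:
    pickle.dump(results, f)
  files.download(labels)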

And to load them back:

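from io import BytesIO
# The two loaders get distinct names so the second does not overwrite the first
def load_features_from_local():
  loaded_features = {}
  uploaded = files.upload()  # browser upload dialog; returns {filename: bytes}
  for input_features in uploaded.keys():
      unpickeled_features = uploaded[input_features]
      loaded_features[input_features] = pickle.load(BytesIO(unpickeled_features))
  return loaded_features
def load_labels_from_local():
  loaded_labels = {}
  uploaded = files.upload()  # browser upload dialog; returns {filename: bytes}
  for labels in uploaded.keys():
      unpickeled_labels = uploaded[labels]
      loaded_labels[labels] = pickle.load(BytesIO(unpickeled_labels))
  return loaded_labels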

#How do I print the pickled files to see if I have them ready for use???

When using plain Python, I would do something like this with pickle:

#Create pickle file
with open("name.pickle", "wb") as pickle_file:
     pickle.dump(name, pickle_file)
#Load the pickle file
with open("name.pickle", "rb") as name_pickled:
     name_b = pickle.load(name_pickled)

But the problem is that I don't see any files being created in my Google Drive.

Is my code correct, or am I missing some part of it?

This is a long description, but I hope it explains in detail what I want to do and what I have tried so far.

Thanks in advance for your help.

score 2 · Accepted Answer

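Google Colaboratory notebook instances are never guaranteed to have access to the same resources when you disconnect and reconnect, because they run on virtual machines. Therefore, you cannot rely on Colab itself to "save" your data. Here are some solutions: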

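Colab saves your code. If the for-loop operation you mention does not take long to run, just keep the code and run it each time you connect to the notebook.
Take a look at np.save. This function lets you save an array to a binary file. You can then re-upload the binary file when you reconnect to the notebook. Better still, you can store the binary file on Google Drive, mount your Drive in the notebook, and reference it that way.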
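
The np.save suggestion can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not a definitive recipe: the folder name colab_arrays, the helper names save_arrays and load_arrays, and the small stand-in arrays are hypothetical, and the path /content/gdrive/MyDrive assumes Drive was mounted at /content/gdrive. Outside Colab the sketch falls back to a temporary directory so that it can still be run.

```python
import os
import tempfile

import numpy as np

try:
    # Only available inside Colab; mounting prompts for authorization.
    from google.colab import drive
    drive.mount('/content/gdrive')
    # Assumption: Drive contents appear under MyDrive at this mount point.
    save_dir = '/content/gdrive/MyDrive/colab_arrays'
except ImportError:
    # Fallback so the sketch also runs outside Colab (not persistent).
    save_dir = os.path.join(tempfile.gettempdir(), 'colab_arrays')

os.makedirs(save_dir, exist_ok=True)


def save_arrays(input_features, labels, directory=save_dir):
    """Write both arrays as .npy binary files and return their paths."""
    features_path = os.path.join(directory, 'input_features.npy')
    labels_path = os.path.join(directory, 'labels.npy')
    np.save(features_path, input_features)
    np.save(labels_path, labels)
    return features_path, labels_path


def load_arrays(directory=save_dir):
    """Read the two .npy files back into NumPy arrays."""
    input_features = np.load(os.path.join(directory, 'input_features.npy'))
    labels = np.load(os.path.join(directory, 'labels.npy'))
    return input_features, labels


# Small stand-in arrays; in the question these come from the 4-hour loop.
input_features = np.array([[0.1, 0.2], [0.3, 0.4]])
labels = np.array([0, 1])

paths = save_arrays(input_features, labels)
restored_features, restored_labels = load_arrays()
print(paths)
print(restored_features, restored_labels)
```

Unlike the csv route, .npy files keep the array's dtype and shape, so integer labels come back as integers. In a later session, mounting Drive again and calling load_arrays() would be enough, without rerunning the loop.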

score 1 · Accepted Answer

# Mount Google Drive (you will be asked to authorize access)
from google.colab import drive
drive.mount('/content/gdrive')

#---

# Import necessary libraries
import numpy as np
from numpy import savetxt
import pandas as pd

#---

# Create array
arr = np.array([1, 2, 3, 4, 5])

# save to csv file
savetxt('arr.csv', arr, delimiter=',')  # Click the Files icon (left panel) to see the resulting file

You can then load it again with:

# You can copy the path once you locate your file via the Files icon
arr = pd.read_csv('/content/arr.csv', sep=',', header=None) # You can also save your result as a txt file
arr
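
To check where the file actually landed and that it round-trips, a quick sketch along these lines may help. It assumes the file is written to the current working directory (in Colab that is normally /content), and it uses np.loadtxt instead of pandas so that the result is a NumPy array again rather than a DataFrame.

```python
import os

import numpy as np

arr = np.array([1, 2, 3, 4, 5])
np.savetxt('arr.csv', arr, delimiter=',')

# Show the absolute path of the file that was just written.
csv_path = os.path.abspath('arr.csv')
print(csv_path, os.path.exists(csv_path))

# loadtxt returns a NumPy array; values come back as floats by default.
loaded = np.loadtxt(csv_path, delimiter=',')
print(loaded)
```

Note that a file written this way lives on the Colab virtual machine, not on Drive; to keep it across sessions, the path would need to point inside the mounted Drive folder.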
