dataframe - SFrame column type: dict

Question

When I run:

my_sframe['col_1'] = ''

I get a blank column, which is what I want.

However, when I run:

my_sframe['col_1'] = {}

I get an error saying the data type is unexpected.

The SFrame API documentation does not address this case:

https://turi.com/products/create/docs/generated/graphlab.SFrame.html

At this point, my understanding was that an SFrame column cannot hold dicts.

However, out of curiosity, I tried this:

my_sframe['col_1'] = graphlab.text_analytics.count_words(my_sframe['my_text'])

type(my_sframe['col_1'][1])

out: dict

This result contradicts my earlier understanding.

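What I want is a dict column in which every row has its own dict, like the one .count_words produces, but filled from a word_count dict I built from scratch (using import string).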
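For reference, here is a minimal plain-Python sketch of building such a per-row word-count dict from scratch with the string module. The helper name word_count_from_scratch and the choice to lowercase and strip ASCII punctuation are assumptions for illustration; the list comprehension at the end only stands in for applying the helper to each row of a text column.

```python
import string

# Translation table that deletes every ASCII punctuation character
_STRIP_PUNCT = str.maketrans('', '', string.punctuation)


def word_count_from_scratch(text):
    """Hypothetical helper: map each lowercased word in `text` to its count."""
    counts = {}
    for word in text.lower().translate(_STRIP_PUNCT).split():
        counts[word] = counts.get(word, 0) + 1
    return counts


# Plain-Python stand-in for a text column: one independent dict per row
rows = ["The cat sat.", "The cat, the hat!"]
word_count_column = [word_count_from_scratch(r) for r in rows]
print(word_count_column)
# [{'the': 1, 'cat': 1, 'sat': 1}, {'the': 2, 'cat': 1, 'hat': 1}]
```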

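Does this work both ways, or is .count_words a special case, meaning I shouldn't expect to be able to reproduce this kind of data transformation myself?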

Please advise,

Thanks

Update

This looks like some relevant information on GitHub:

https://github.com/turi-code/how-to/blob/master/sframe_pack.py

I'm not sure whether this technique can produce what I'm after; I'm still experimenting. Let me know if anyone has ideas.
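Judging by its name, the linked script is about packing SFrame columns. As a plain-Python illustration of the idea only (not the GraphLab implementation), packing selected columns into a dict column turns each row's values into one dict keyed by column name. The table layout (a dict of equal-length lists) and the name pack_rows_to_dicts are hypothetical; in GraphLab the corresponding call is assumed to be SFrame.pack_columns with dtype=dict.

```python
def pack_rows_to_dicts(table, columns):
    """Hypothetical helper: for each row, build {column_name: value} over `columns`."""
    n_rows = len(table[columns[0]])
    # A fresh dict per row, so rows never share the same dict object
    return [{col: table[col][i] for col in columns} for i in range(n_rows)]


# Column-oriented stand-in for an SFrame
table = {
    "word": ["cat", "hat"],
    "count": [2, 1],
    "doc_id": [0, 1],
}
packed = pack_rows_to_dicts(table, ["word", "count"])
print(packed)  # [{'word': 'cat', 'count': 2}, {'word': 'hat', 'count': 1}]
```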

score 0 · Accepted Answer

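I'm still open to more efficient answers, but in the meantime, in case anyone else runs into this, here is one way to create an SFrame dict column. I just figured it out: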

def count_words(text):
    # Split on whitespace and tally each word into a plain dict
    words = text.split()
    wordfreq = {}
    for x in words:
        if x not in wordfreq:
            wordfreq[x] = 0
        wordfreq[x] += 1
    return wordfreq

sframe['word_count'] = sframe['text'].apply(count_words)
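The loop in count_words can be written more compactly with collections.Counter from the standard library. This is an alternative sketch, not the answer's method; wrapping the result in dict() is a cautious choice so that each row holds a plain dict, exactly as count_words returns. In GraphLab it would be used the same way, e.g. sframe['text'].apply(count_words_counter), optionally with dtype=dict to state the column type explicitly (the dtype argument is assumed from SArray.apply's documented signature).

```python
from collections import Counter


def count_words_counter(text):
    """Same result as count_words: a word -> count dict, splitting on whitespace."""
    # Counter tallies each word; dict() converts it to a plain dict
    return dict(Counter(text.split()))


def count_words(text):
    # Loop version from the answer, kept here for comparison
    words = text.split()
    wordfreq = {}
    for x in words:
        if x not in wordfreq:
            wordfreq[x] = 0
        wordfreq[x] += 1
    return wordfreq


sample = "a b a c b a"
print(count_words_counter(sample))  # {'a': 3, 'b': 2, 'c': 1}
```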

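You'll notice that the dtype is dict. It seems a bit roundabout, though. I'm still curious why we can't simply cast the new column to dict instead of getting "error: unexpected data type".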
