python - How can I make Python use the same memory for all identical strings?

Question

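Possible duplicate:
What does python intern do, and when should it be used?

I'm working on a program in Python that has to deal with an array of millions of string objects. I've discovered that if they all come from the same quoted string, each additional "string" is just a reference to the first, master string. However, if the strings are read from a file and are all equal, each one still requires a new memory allocation.

That is, this takes about 14 MB of storage: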

a = ["foo" for a in range(0,1000000)]

While this takes more than 65 MB of storage:

a = ["foo".replace("o","1") for a in range(0,1000000)]
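The difference can be checked without a memory profiler by counting distinct object identities. This is a minimal sketch that assumes CPython: the sharing of literals and the fact that replace() returns a fresh object are implementation details, not language guarantees. The names `literals` and `built` are illustrative, and the list is much smaller than the one in the question.

```python
N = 1000  # far fewer than the 1,000,000 in the question, to keep the check quick

# Every element refers to the single "foo" constant compiled into the code.
literals = ["foo" for _ in range(N)]

# Each call to replace() builds a new string object at run time.
built = ["foo".replace("o", "1") for _ in range(N)]

# id() is unique among objects that are alive at the same time,
# so the size of each set is the number of separate string objects.
distinct_literals = len({id(s) for s in literals})
distinct_built = len({id(s) for s in built})

print(distinct_literals, distinct_built)
```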

Now, I can make the memory footprint much smaller with this:

s = {"f11":"f11"}
a = [s["foo".replace("o","1")] for a in range(0,1000000)]

But this seems silly. Is there an easier way to do this?

score 14 · Accepted Answer

Just call intern(), which tells Python to store the string in, and fetch it from, its table of interned strings:

a = [intern("foo".replace("o","1")) for a in range(0,1000000)]

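This also results in around 18 MB, the same as in the first example.

If you use Python 3, also note the comment below: there, intern() is no longer a built-in and is available as sys.intern() instead. Thanks @Abe Karplus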
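For Python 3, the same idea might look like the sketch below. It assumes CPython and uses a smaller list than the question; the variable names are illustrative.

```python
import sys

N = 1000  # smaller than in the question, enough to show the effect

# sys.intern() returns the canonical copy of the string from the interpreter's
# intern table, adding the string to the table first if it is not there yet.
a = [sys.intern("foo".replace("o", "1")) for _ in range(N)]

# All elements should now be one and the same object.
distinct_objects = len({id(s) for s in a})
print(distinct_objects)
```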

score 0 · Accepted Answer

You can try something like this:

strs=["this is string1","this is string2","this is string1","this is string2",
      "this is string3","this is string4","this is string5","this is string1",
      "this is string5"]
new_strs=[]
for x in strs:
    if x in new_strs:
        # Find the equal string already in the list and append a reference
        # to that object instead of appending the new string itself.
        new_strs.append(new_strs[new_strs.index(x)])
    else:
        new_strs.append(x)

print([id(y) for y in new_strs])

Identical strings will now have the same id().

Output:

[18632400, 18632160, 18632400, 18632160, 18651400, 18651440, 18651360, 18632400, 18651360]

score -1 · Accepted Answer

Keeping a dictionary of the strings seen so far should work:

new_strs = []
str_record = {}
for x in strs:
    if x not in str_record:
        str_record[x] = x
    new_strs.append(str_record[x])

(Untested.)
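The dictionary approach can be written a little more compactly with dict.setdefault, which returns the stored value when the key is already present and otherwise stores and returns the new one. The following is a hedged sketch rather than the answerer's code: the function name `dedupe_strings` and the sample data are made up for illustration, and the object counts assume CPython.

```python
def dedupe_strings(strings):
    """Return a list in which equal strings are represented by one shared object."""
    seen = {}  # maps each distinct value to the first string object seen with it
    return [seen.setdefault(s, s) for s in strings]


# Build equal strings at run time so that they start out as separate objects.
raw = ["".join(["line", str(i % 3)]) for i in range(9)]
shared = dedupe_strings(raw)

# Number of separate string objects before and after deduplication.
print(len({id(s) for s in raw}), len({id(s) for s in shared}))
```

Unlike interning, the `seen` dictionary is an ordinary local object, so it is released as soon as the function returns and only the shared strings stay alive.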
