2

这是我之前关于 Python 中的字符串实习的问题的后续,尽管我认为它的相关性不足以作为一个单独的问题。简而言之,在使用 sys.intern 时,我是否需要在大多数/每次使用时将有问题的字符串传递给函数,或者我只需要将字符串实习一次并跟踪它的引用?为了澄清一个伪代码用例,我做了我认为正确的事情:(见评论)

# stores all words in sequence, 
# we want duplicate words too,
# but those should refer to the same string
# (the reason we want interning)
word_sequence = []
# simple word count dictionary
word_dictionary = {}
for line in text:
    for word in line: # using magic unspecified parsing/tokenizing logic
        # returns a canonical "reference"
        word_i = sys.intern(word)
        word_sequence.append(word_i)
        try:
            # do not need to intern again for
            # specific use as dictonary key,
            # or is something undesirable done
            # by the dictionary that would require 
            # another call here?
            word_dictionary[word_i] += 1 
        except KeyError:
            word_dictionary[word_i] = 1

# ...somewhere else in a function far away...
# Let's say that we want to use the word sequence list to
# access the dictionary (even the duplicates):
for word in word_sequence:
    # Do NOT need to re-sys.intern() word
    # because it is the same string object
    # interned previously?
    count = word_dictionary[word]
    print(count)

如果我想访问不同字典中的单词怎么办?插入 key:value 时是否需要再次使用 sys.intern(),即使该键已被实习?我可以澄清一下吗?先感谢您。

4

1 回答 1

1

sys.intern() 每次有新的字符串对象时都必须使用,否则不能保证所表示的值具有相同的对象。

但是,您的word_seq列表包含对内部字符串对象的引用。您不必sys.intern()再次使用这些。任何时候都不会在此处创建字符串的副本(这将是不必要和浪费的)。

所做sys.intern()的只是将字符串映射到具有该值的特定对象。只要您保留对返回值的引用,就可以保证您仍然可以访问该特定对象。

于 2017-01-01T20:20:33.290 回答