0

I want to implement a Python module with one function that first loads a big file and then filters its argument against the file's contents, like this:

def filter(word_list):
    filtered_words = []
    # The file is re-read on every call, which is the problem described below.
    with open('special_words.txt', 'r') as handle:
        special_words = [line.strip() for line in handle]
    for w in word_list:
        if w not in special_words:
            filtered_words.append(w)
    return filtered_words

The problem is that I want to load this file only once for the whole execution, not every time I call this function. In Java I could just use a static initializer block for this purpose, but what options do I have in Python?

3 Answers

4

You can load the file into a list at the module's global scope; this code runs only once, the first time the module is imported.
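
A minimal sketch of that idea, assuming a one-word-per-line file. The names `WORDS_PATH`, `SPECIAL_WORDS` and `filter_words` are made up for illustration, and the temporary file exists only so the snippet runs on its own; in a real module you would point at your actual `special_words.txt`.

```python
import os
import tempfile

# Hypothetical setup so the sketch is runnable: create a small word file.
WORDS_PATH = os.path.join(tempfile.mkdtemp(), "special_words.txt")
with open(WORDS_PATH, "w") as handle:
    handle.write("foo\nbar\n")

# Module-level code: executed once, when the module is first imported.
with open(WORDS_PATH, "r") as handle:
    SPECIAL_WORDS = [line.strip() for line in handle]


def filter_words(word_list):
    """Return the words that do not appear in SPECIAL_WORDS."""
    # No file access here; the list was built at import time.
    return [w for w in word_list if w not in SPECIAL_WORDS]
```

Calling `filter_words` repeatedly reuses the same `SPECIAL_WORDS` list. Later imports of the module in the same process do not re-run the module body, because Python caches imported modules in `sys.modules`.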

answered 2013-06-18T15:47:21.827
2

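To me, it sounds like you want to memoize the function, so that when you call it with arguments it has already seen, it returns the known result instead of recomputing it. This particular implementation is from http://wiki.python.org/moin/PythonDecoratorLibrary#Memoize

It may be overkill for this problem, but memoization is a very useful pattern.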

import functools

class memoized(object):
   '''Decorator. Caches a function's return value each time it is called.
   If called later with the same arguments, the cached value is returned
   (not reevaluated).
   '''
   def __init__(self, func):
      self.func = func
      self.cache = {}
   def __call__(self, *args):
      try:
         # Hashing args raises TypeError if any argument is unhashable.
         return self.cache[args]
      except TypeError:
         # uncacheable. a list, for instance.
         # better to not cache than blow up.
         return self.func(*args)
      except KeyError:
         value = self.func(*args)
         self.cache[args] = value
         return value
   def __repr__(self):
      '''Return the function's docstring.'''
      return self.func.__doc__
   def __get__(self, obj, objtype):
      '''Support instance methods.'''
      return functools.partial(self.__call__, obj)

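@memoized
def get_words(fname):
   # The file is read only on the first call for a given fname.
   with open(fname, 'r') as handle:
      return list(handle)

@memoized
def filter(word_list):
    # Only hashable arguments (e.g. a tuple of words) can be cached.
    filtered_words = []
    special_words = [line.strip() for line in get_words("special_words.txt")]
    for w in word_list:
        if w not in special_words:
            filtered_words.append(w)
    return filtered_words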
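
On Python 3.2 and later, the standard library's `functools.lru_cache` gives the same effect without a hand-written decorator. The following is only a sketch of that substitution, not the wiki recipe: `filter_words`, its `fname` parameter and `demo_path` are hypothetical names, and the temporary file is created just so the example is self-contained.

```python
import functools
import os
import tempfile


@functools.lru_cache(maxsize=None)
def get_words(fname):
    """Read fname on the first call; later calls return the cached set."""
    with open(fname, "r") as handle:
        return frozenset(line.strip() for line in handle)


def filter_words(word_list, fname):
    """Return the words from word_list that are not listed in fname."""
    special_words = get_words(fname)
    return [w for w in word_list if w not in special_words]


# Hypothetical demo file so the sketch runs on its own.
demo_path = os.path.join(tempfile.mkdtemp(), "special_words.txt")
with open(demo_path, "w") as handle:
    handle.write("foo\nbar\n")

first = filter_words(["foo", "baz"], demo_path)
second = filter_words(["bar", "qux"], demo_path)
cache_info = get_words.cache_info()  # counts cache hits and misses
```

Here only `get_words` is cached, since its argument (a path string) is hashable; the list passed to `filter_words` never needs to be.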

As a side note, a neat trick is

 print(set(word_list).difference(special_words))

which should be faster (assuming you don't care about losing duplicates or the original order).
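
To make that trade-off concrete, here is a small sketch with made-up sample data (`word_list` and `special_words` below are illustrative values, not taken from the question's file):

```python
word_list = ["apple", "foo", "apple", "pear", "bar"]
special_words = {"foo", "bar"}

# Set difference: duplicates and the original order are lost.
unique_remaining = set(word_list).difference(special_words)

# List comprehension against a set: keeps duplicates and order.
ordered_remaining = [w for w in word_list if w not in special_words]
```

The first form yields a set containing "apple" and "pear" once each; the second keeps both occurrences of "apple" in their original positions.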

answered 2013-06-18T15:47:41.197
1

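You want to build the set of words up front, so that the file is not read every time the function is called. You can also use a list comprehension to simplify the filter function: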

with open('special_words.txt', 'r') as handle:
    special_words = {line.strip() for line in handle}

def filter(word_list):
    return [word for word in word_list if word not in special_words]

answered 2013-06-18T15:52:09.103