
I have a loop that generates data and writes it to a database:

myDatabase = Database('myDatabase')
for i in range(10):
    # some code here that generates dictionaries that can be saved as activities
    myDatabase.write({('myDatabase', 'valid code'): activityDict})

A single activity created this way is saved to the database. But when several are created, the length of the database is always 1, and only the last activity ends up in the database.

Because I have many very large datasets, storing them all in one dictionary and writing them to the database in one go is not practical.

Is there a way to add activities incrementally to an existing database?


1 Answer


Normal activity writing

Database.write() will replace the whole database. The best approach is to build the database in Python, and then write the whole thing at once:

data = {}
for i in range(10):
    # some code here that generates dictionaries that can be saved as activities;
    # each activity needs its own unique (database name, code) key
    data[('myDatabase', 'code {}'.format(i))] = activityDict
Database('myDatabase').write(data)
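Note that the loop in the question also reuses the same key, ('myDatabase', 'valid code'), for every activity; dictionary keys must be unique (database name, code) tuples, so even a single combined write would keep only the last activity unless each one gets its own code.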

Dynamically generating datasets

However, if you are dynamically creating aggregated datasets from an existing database, you can create the individual datasets in a custom generator. The generator needs to support the following:

  • __iter__: returns the database keys. This is used to check that each dataset belongs to the database being written, so we only need to return the first element.
  • __len__: the number of datasets to be written.
  • keys: used to add the keys to mapping.
  • values: used to add the activity locations to geomapping. Since the locations are the same in our source database and in the aggregated system database, we can supply the original datasets here.
  • items: the new keys and datasets.

Here is the code:

import copy

from brightway2 import Database, LCA


class IterativeSystemGenerator(object):
    """Generate aggregated system processes one at a time, so the whole
    new database never has to be held in memory."""
    def __init__(self, from_db_name, to_db_name):
        self.source = Database(from_db_name)
        self.new_name = to_db_name
        # Factorize once so that repeated LCI calculations are fast
        self.lca = LCA({self.source.random(): 1})
        self.lca.lci(factorize=True)

    def __len__(self):
        # Number of datasets to be written
        return len(self.source)

    def __iter__(self):
        # Only used to check that keys belong to the database being
        # written, so yielding one key is enough
        yield ((self.new_name,))

    def get_exchanges(self):
        # Sum the inventory over all columns to get one aggregated
        # biosphere vector, and drop numerically zero flows
        vector = self.lca.inventory.sum(axis=1)
        assert vector.shape == (len(self.lca.biosphere_dict), 1)
        return [{
                    'input': flow,
                    'amount': float(vector[index]),
                    'type': 'biosphere',
                } for flow, index in self.lca.biosphere_dict.items()
                if abs(float(vector[index])) > 1e-17]

    def keys(self):
        # Added to `mapping`
        for act in self.source:
            yield (self.new_name, act['code'])

    def values(self):
        # Added to `geomapping`; locations are unchanged, so the
        # original datasets can be supplied here
        for act in self.source:
            yield act

    def items(self):
        # New keys and datasets: recalculate the LCI for each source
        # activity and attach the aggregated biosphere exchanges
        for act in self.source:
            self.lca.redo_lci({act: 1})
            obj = copy.deepcopy(act._data)
            obj['database'] = self.new_name
            obj['exchanges'] = self.get_exchanges()
            yield ((self.new_name, obj['code']), obj)

And the usage:

new_name = "ecoinvent 3.2 cutoff aggregated"
new_data = IterativeSystemGenerator("ecoinvent 3.2 cutoff", new_name)
Database(new_name).write(new_data)
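As a quick sanity check (a minimal sketch, assuming the write above succeeded), the new database should contain one aggregated dataset per source activity:

agg = Database(new_name)
# One aggregated system process per activity in the source database
assert len(agg) == len(Database("ecoinvent 3.2 cutoff"))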

Limitations of this approach

If you are writing so many datasets or exchanges within datasets that you are running into memory problems, then you are probably also using the wrong tool. The current system of database tables and matrix builders uses sparse matrices, and in this case dense matrices would make much more sense. For example, the IO table backend skips the database entirely and just writes processed arrays. Loading and creating the biosphere matrix will take a long time if it has 13,000 * 1,500 ≈ 20,000,000 entries. In this specific case, my first instinct is to try one of the following:

  • Don't write the biosphere flows into the database, but save them separately per aggregated process, and then add them after the inventory calculation (a minimal sketch follows this list).
  • Create a separate database for each aggregated system process.
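For the first option, here is a minimal sketch of what saving the aggregated biosphere vectors separately could look like. The helper name save_inventory_vector, the .npy on-disk layout, and the directory name are illustrative assumptions, not part of Brightway:

import os

import numpy as np

def save_inventory_vector(lca, act, directory="aggregated_flows"):
    # Hypothetical helper: persist the summed biosphere vector of one
    # aggregated process as a dense array on disk, instead of writing
    # thousands of biosphere exchanges into the database.
    os.makedirs(directory, exist_ok=True)
    vector = np.asarray(lca.inventory.sum(axis=1)).ravel()
    np.save(os.path.join(directory, act['code'] + ".npy"), vector)

# Usage inside IterativeSystemGenerator.items(), after redo_lci():
#     save_inventory_vector(self.lca, act)

The saved vectors can then be loaded with np.load and added back after the inventory calculation.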
Answered 2016-07-14T21:24:53.620