104

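I have a record that I want to exist in the database if it is not there yet, and if it is already there (the primary key exists) I want its fields updated to the current state. This is often called an upsert.

The following incomplete code snippet demonstrates what will work, but it seems excessively clunky (especially if there were a lot more columns). What is the better/best way?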

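import sqlalchemy.orm.exc
from sqlalchemy import Column, Integer, String
# SQLAlchemy 1.4+; older versions import this from sqlalchemy.ext.declarative
from sqlalchemy.orm import declarative_base

Base = declarative_base()
class Template(Base):
    __tablename__ = 'templates'
    id = Column(Integer, primary_key = True)
    name = Column(String(80), unique = True, index = True)
    template = Column(String(80), unique = True)
    description = Column(String(200))
    def __init__(self, Name, Template, Desc):
        self.name = Name
        self.template = Template
        self.description = Desc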

def UpsertDefaultTemplate():
    sess = Session()
    desired_default = Template("default", "AABBCC", "This is the default template")
    try:
        q = sess.query(Template).filter_by(name = desired_default.name)
        existing_default = q.one()
    except sqlalchemy.orm.exc.NoResultFound:
        #default does not exist yet, so add it...
        sess.add(desired_default)
    else:
        #default already exists.  Make sure the values are what we want...
        assert isinstance(existing_default, Template)
        existing_default.name = desired_default.name
        existing_default.template = desired_default.template
        existing_default.description = desired_default.description
    sess.flush()

Is there a better or less verbose way of doing this? Something like this would be great:

sess.upsert_this(desired_default, unique_key = "name")

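Although the unique_key kwarg is obviously unnecessary (the ORM should be able to figure this out easily), I added it because SQLAlchemy tends to work only with the primary key. For example, I have been looking at whether Session.merge would be applicable, but it works only on the primary key, which in this case is an autoincrementing id and not terribly useful for this purpose.

A sample use case is starting up a server application that may have upgraded its default expected data. That is, there are no concurrency concerns for this upsert.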

4

8 Answers

64

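SQLAlchemy does have a "save-or-update" behavior, which in recent versions has been built into session.add, but previously was the separate session.saveorupdate call. This is not an "upsert", but it may be good enough for your needs.

It is good that you are asking about a class with multiple unique keys; I believe this is precisely the reason there is no single correct way to do this. The primary key is also a unique key. If there were no unique constraints, only the primary key, it would be a simple enough problem: if nothing exists with the given ID, or if the ID is None, create a new record; otherwise update all other fields in the existing record with that primary key.

However, when there are additional unique constraints, that simple approach runs into logical issues. If you want to "upsert" an object, and the primary key of your object matches an existing record, but another unique column matches a different record, what do you do? Similarly, if the primary key matches no existing record, but another unique column does match an existing record, then what? There may be a correct answer for your particular situation, but in general I would argue there is no single correct answer.

That is the reason there is no built-in "upsert" operation. The application must define what this means in each particular case.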
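
For the situation in the question (one chosen unique column decides identity, no concurrency), "the application defines it" could look roughly like the sketch below. Note that upsert_by is a hypothetical helper, not a SQLAlchemy API, and the in-memory SQLite engine is only there so the sketch runs on its own. It does nothing about the ambiguous cases described above, and it is not safe against concurrent writers.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class Template(Base):
    __tablename__ = "templates"
    id = Column(Integer, primary_key=True)
    name = Column(String(80), unique=True, index=True)
    template = Column(String(80), unique=True)
    description = Column(String(200))


def upsert_by(session, model, key, **values):
    """Hypothetical helper: insert a row, or update the row whose `key` column matches."""
    existing = session.query(model).filter_by(**{key: values[key]}).one_or_none()
    if existing is None:
        existing = model(**values)
        session.add(existing)
    else:
        # Row already there: copy every supplied value onto it.
        for column, value in values.items():
            setattr(existing, column, value)
    session.flush()
    return existing


engine = create_engine("sqlite://")  # in-memory database, for illustration only
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

first = upsert_by(session, Template, "name", name="default",
                  template="AABBCC", description="This is the default template")
second = upsert_by(session, Template, "name", name="default",
                   template="DDEEFF", description="Updated default template")
session.commit()
```

The second call finds the row by name and updates it in place, so the table still holds a single row with the autoincremented id it got on the first call.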

answered 2011-08-23T19:37:00.687
54

SQLAlchemy's PostgreSQL dialect supports ON CONFLICT through two methods, on_conflict_do_update() and on_conflict_do_nothing().

Copied from the documentation:

from sqlalchemy.dialects.postgresql import insert

stmt = insert(my_table).values(user_email='a@b.com', data='inserted data')
stmt = stmt.on_conflict_do_update(
    index_elements=[my_table.c.user_email],
    index_where=my_table.c.user_email.like('%@gmail.com'),
    set_=dict(data=stmt.excluded.data)
)
conn.execute(stmt)
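
To try the same pattern without a PostgreSQL server, here is a sketch using the SQLite dialect's equivalent construct. It assumes SQLAlchemy 1.4 or later (where sqlalchemy.dialects.sqlite.insert is available) and an SQLite library of version 3.24 or later. The table definition and the upsert_user helper are made up for the illustration.

```python
from sqlalchemy import Column, MetaData, String, Table, create_engine, select
from sqlalchemy.dialects.sqlite import insert

metadata = MetaData()
my_table = Table(
    "my_table", metadata,
    Column("user_email", String, primary_key=True),
    Column("data", String),
)

engine = create_engine("sqlite://")  # in-memory database
metadata.create_all(engine)


def upsert_user(conn, email, data):
    """Insert the row, or overwrite `data` if user_email already exists."""
    stmt = insert(my_table).values(user_email=email, data=data)
    stmt = stmt.on_conflict_do_update(
        index_elements=[my_table.c.user_email],
        # `excluded` refers to the row that was proposed for insertion
        set_={"data": stmt.excluded.data},
    )
    conn.execute(stmt)


with engine.begin() as conn:
    upsert_user(conn, "a@b.com", "inserted data")
    upsert_user(conn, "a@b.com", "updated data")
    rows = conn.execute(select(my_table.c.user_email, my_table.c.data)).fetchall()
```

After both calls the table contains one row for a@b.com, carrying the data from the second call.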
answered 2017-06-06T17:12:12.870
24

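Nowadays SQLAlchemy provides two helpful functions, on_conflict_do_nothing and on_conflict_do_update. These functions are useful, but they require you to switch from the ORM interface to the lower-level one, SQLAlchemy Core.

Although these two functions make upserting with SQLAlchemy's syntax not that difficult, they are far from a complete out-of-the-box solution to upserting.

My common use case is to upsert a large chunk of rows in a single SQL query/session execution. I usually run into two problems with upserting:

For example, the higher-level ORM functionality we have gotten used to is missing. You cannot use ORM objects; instead you have to provide the ForeignKeys at the time of insertion.

I am using the following function, which I wrote, to handle both of these issues: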

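from sqlalchemy import UniqueConstraint, inspect
from sqlalchemy.dialects import postgresql

# rows: list of dicts; a related ORM object is passed under the name of the
# table it lives in and is replaced by its key value before the INSERT
def upsert(session, model, rows):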
    table = model.__table__
    stmt = postgresql.insert(table)
    primary_keys = [key.name for key in inspect(table).primary_key]
    update_dict = {c.name: c for c in stmt.excluded if not c.primary_key}

    if not update_dict:
        raise ValueError("insert_or_update resulted in an empty update_dict")

    stmt = stmt.on_conflict_do_update(index_elements=primary_keys,
                                      set_=update_dict)

    seen = set()
    foreign_keys = {col.name: list(col.foreign_keys)[0].column for col in table.columns if col.foreign_keys}
    unique_constraints = [c for c in table.constraints if isinstance(c, UniqueConstraint)]
    def handle_foreignkeys_constraints(row):
        for c_name, c_value in foreign_keys.items():
            foreign_obj = row.pop(c_value.table.name, None)
            row[c_name] = getattr(foreign_obj, c_value.name) if foreign_obj else None

        for const in unique_constraints:
            unique = tuple([const,] + [row[col.name] for col in const.columns])
            if unique in seen:
                return None
            seen.add(unique)

        return row

    rows = list(filter(None, (handle_foreignkeys_constraints(row) for row in rows)))
    session.execute(stmt, rows)
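
The seen set in the function above drops rows that repeat a unique key within one batch. That matters because PostgreSQL rejects a single INSERT ... ON CONFLICT DO UPDATE that would touch the same row twice ("ON CONFLICT DO UPDATE command cannot affect row a second time"). The deduplication step on its own, as a plain-Python sketch with a hypothetical dedupe_rows helper and made-up column names:

```python
def dedupe_rows(rows, key_columns):
    """Keep the first row for each distinct combination of key_columns."""
    seen = set()
    unique_rows = []
    for row in rows:
        key = tuple(row[col] for col in key_columns)
        if key in seen:
            continue  # a later duplicate would hit the same target row
        seen.add(key)
        unique_rows.append(row)
    return unique_rows


rows = [
    {"name": "default", "template": "AABBCC"},
    {"name": "other", "template": "DDEEFF"},
    {"name": "default", "template": "GGHHII"},  # same key as the first row
]
deduped = dedupe_rows(rows, ["name"])
```

Like the function above, this keeps the first occurrence. If the last occurrence should win instead, iterate over reversed(rows) and reverse the result.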
answered 2018-07-28T02:54:49.483
13

I use a "look before you leap" approach:

# first get the object from the database if it exists
# we're guaranteed to only get one or zero results
# because we're filtering by primary key
switch_command = session.query(Switch_Command).\
    filter(Switch_Command.switch_id == switch.id).\
    filter(Switch_Command.command_id == command.id).first()

# If we didn't get anything, make one
if not switch_command:
    switch_command = Switch_Command(switch_id=switch.id, command_id=command.id)

# update the stuff we care about
switch_command.output = 'Hooray!'
switch_command.lastseen = datetime.datetime.utcnow()

session.add(switch_command)
# This will generate either an INSERT or UPDATE
# depending on whether we have a new object or not
session.commit()

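The advantage is that this is db-neutral and I think it is clear to read. The disadvantage is that there is a potential race condition in a scenario like the following:

  • we query the db for a switch_command and don't find one
  • we create a switch_command
  • another process or thread creates a switch_command with the same primary key as ours
  • we try to commit our switch_command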
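
One way to cope with that race is to flip the order: attempt the INSERT first and fall back to an UPDATE when the database reports a conflict. A minimal sketch follows, assuming SQLAlchemy 1.4+ (for Session.get) and an in-memory SQLite database. The SwitchCommand model and save_output helper are illustrative names, and the "other process" is simulated by a row written through Core.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.exc import IntegrityError
from sqlalchemy.orm import declarative_base, sessionmaker

Base = declarative_base()


class SwitchCommand(Base):
    __tablename__ = "switch_command"
    switch_id = Column(Integer, primary_key=True)
    command_id = Column(Integer, primary_key=True)
    output = Column(String(200))


def save_output(session, switch_id, command_id, output):
    """Try the INSERT first; if another writer got there first, UPDATE instead."""
    try:
        session.add(SwitchCommand(switch_id=switch_id, command_id=command_id, output=output))
        session.commit()
        return "inserted"
    except IntegrityError:
        session.rollback()  # discards the failed INSERT and the pending object
        row = session.get(SwitchCommand, (switch_id, command_id))
        row.output = output
        session.commit()
        return "updated"


engine = create_engine("sqlite://")
Base.metadata.create_all(engine)
session = sessionmaker(bind=engine)()

# Simulate the other process: a row written behind the ORM's back.
session.execute(
    SwitchCommand.__table__.insert().values(switch_id=1, command_id=2, output="old")
)
session.commit()

result_existing = save_output(session, 1, 2, "Hooray!")
result_new = save_output(session, 1, 3, "Hooray!")
```

The rollback throws away everything pending in the session, not just the failed row, so this only suits cases where the upsert is the whole unit of work. Wrapping the attempt in a savepoint (session.begin_nested()) is one way to limit that, on backends where savepoints behave well.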
answered 2017-10-19T20:00:34.240
2

This allows access to the underlying models based on string names

def get_class_by_tablename(tablename):
  """Return class reference mapped to table.
  https://stackoverflow.com/questions/11668355/sqlalchemy-get-model-from-table-name-this-may-imply-appending-some-function-to
  :param tablename: String with name of table.
  :return: Class reference or None.
  """
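  # _decl_class_registry is a private attribute and is gone as of SQLAlchemy 1.4,
  # where the equivalent is Base.registry._class_registry
  for c in Base._decl_class_registry.values():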
    if hasattr(c, '__tablename__') and c.__tablename__ == tablename:
      return c


sqla_tbl = get_class_by_tablename(table_name)

def handle_upsert(session, record_dict, table):
    """
    handles updates when there are primary key conflicts

    session is the active Session and table is the mapped class
    (e.g. the result of get_class_by_tablename)
    """
    try:
        session.add(table(**record_dict))
        session.flush()  # add() alone emits no SQL; flush so a conflict is raised here
    except Exception:
        # Here we'll assume the error is caused by an integrity error
        # We do this because the error classes are passed from the
        # underlying package (pyodbc / sqlite); SQLAlchemy doesn't mask
        # them with its own code - this should be updated to have
        # explicit error handling for each new db engine
        session.rollback()
        # Query for the conflicting record, then change its values based on the dict
        c_tbl_primary_keys = [i.name for i in table.__table__.primary_key]  # list of primary key col names
        c_tbl_cols = dict(table.__table__.columns)  # string:col object crosswalk

        c_query_dict = {k: record_dict[k] for k in c_tbl_primary_keys if k in record_dict}  # primary key:value sub-dict
        c_oo_query_dict = {c_tbl_cols[k]: v for (k, v) in c_query_dict.items()}  # col object:query value

        c_target_record = session.query(table).filter(*[k == v for (k, v) in c_oo_query_dict.items()]).first()

        # apply new data values to the existing record
        for k, v in record_dict.items():
            setattr(c_target_record, k, v)
answered 2019-04-05T17:25:38.013
2

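The following works for me with a Redshift database, and it also works with a composite primary key constraint.

Source: this

Only a small modification is needed in the function def start_engine(), which creates the SQLAlchemy engine.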

import os

from sqlalchemy import Column, Integer, Date, MetaData
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
from sqlalchemy.dialects import postgresql

Base = declarative_base()

def start_engine():
    engine = create_engine(os.getenv('SQLALCHEMY_URI',
                                     'postgresql://localhost:5432/upsert'))
    connect = engine.connect()
    meta = MetaData(bind=engine)
    meta.reflect(bind=engine)
    # returns the Engine itself; Engine.execute() only exists in SQLAlchemy < 2.0
    return engine


class DigitalSpend(Base):
    __tablename__ = 'digital_spend'
    report_date = Column(Date, nullable=False)
    day = Column(Date, nullable=False, primary_key=True)
    impressions = Column(Integer)
    conversions = Column(Integer)

    def __repr__(self):
        return str([getattr(self, c.name, None) for c in self.__table__.c])


def compile_query(query):
    compiler = query.compile if not hasattr(query, 'statement') else \
        query.statement.compile
    return compiler(dialect=postgresql.dialect())


def upsert(session, model, rows, as_of_date_col='report_date', no_update_cols=[]):
    table = model.__table__

    stmt = insert(table).values(rows)

    update_cols = [c.name for c in table.c
                   if c not in list(table.primary_key.columns)
                   and c.name not in no_update_cols]

    on_conflict_stmt = stmt.on_conflict_do_update(
        index_elements=table.primary_key.columns,
        set_={k: getattr(stmt.excluded, k) for k in update_cols},
        # where= puts the condition on DO UPDATE itself; index_where= describes a partial index
        where=(getattr(model, as_of_date_col) < getattr(stmt.excluded, as_of_date_col))
        )

    print(compile_query(on_conflict_stmt))
    session.execute(on_conflict_stmt)


session = start_engine()
upsert(session, DigitalSpend, initial_rows, no_update_cols=['conversions'])
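
on_conflict_do_update() accepts two different predicates that are easy to mix up: index_where, which names the partial unique index to infer, and where, which restricts which conflicting rows actually get updated. The sketch below compiles one statement of each kind against the PostgreSQL dialect (no database connection needed) to show where each predicate lands in the SQL. The trimmed-down table and the impressions > 0 predicate are made up for the illustration.

```python
from sqlalchemy import Column, Date, Integer, MetaData, Table
from sqlalchemy.dialects import postgresql
from sqlalchemy.dialects.postgresql import insert

metadata = MetaData()
digital_spend = Table(
    "digital_spend", metadata,
    Column("day", Date, primary_key=True),
    Column("report_date", Date, nullable=False),
    Column("impressions", Integer),
)

stmt = insert(digital_spend)
update_cols = {"report_date": stmt.excluded.report_date,
               "impressions": stmt.excluded.impressions}

# index_where: predicate of a partial unique index, rendered before DO UPDATE.
with_index_where = stmt.on_conflict_do_update(
    index_elements=[digital_spend.c.day],
    index_where=digital_spend.c.impressions > 0,
    set_=update_cols,
)

# where: condition on the update itself, rendered after the SET clause.
with_where = stmt.on_conflict_do_update(
    index_elements=[digital_spend.c.day],
    where=digital_spend.c.report_date < stmt.excluded.report_date,
    set_=update_cols,
)

sql_index_where = str(with_index_where.compile(dialect=postgresql.dialect()))
sql_where = str(with_where.compile(dialect=postgresql.dialect()))
print(sql_index_where)
print(sql_where)
```

A condition such as "only overwrite when the stored report_date is older than the incoming one" compares the existing row with excluded, so it belongs in where.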
answered 2019-03-26T12:41:26.637
1

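This works for me with sqlite3 and postgres. It may fail with a composite primary key constraint, though, and will most likely fail with additional unique constraints.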

    try:
        t = self._meta.tables[data['table']]
    except KeyError:
        self._log.error('table "%s" unknown', data['table'])
        return

    # assumes: from sqlalchemy import and_, insert, update
    #          from sqlalchemy.exc import IntegrityError
    try:
        q = insert(t).values(data['values'])
        self._log.debug(q)
        self._db.execute(q)
    except IntegrityError:
        self._log.warning('integrity error')
        # match the existing row on its primary key and update everything else
        where_clause = [c == data['values'][c.name] for c in t.c if c.primary_key]
        update_dict = {c.name: data['values'][c.name] for c in t.c if not c.primary_key}
        q = update(t).values(update_dict).where(and_(*where_clause))
        self._log.debug(q)
        self._db.execute(q)
    except Exception as e:
        self._log.error('%s: %s', t.name, e)
answered 2018-11-08T09:01:56.197
0

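There are multiple answers, and here comes yet another answer (YAA). The other answers are not that readable because of the metaprogramming involved. Here is an example that

  • Uses the SQLAlchemy ORM

  • Shows how to create a row if there are zero rows, using on_conflict_do_nothing

  • Shows how to update the existing row (if any) without creating a new row, using on_conflict_do_update

  • Uses the table primary key as the constraint

A longer example is in the original question that this code relates to.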


import datetime

import sqlalchemy as sa
import sqlalchemy.orm as orm
from sqlalchemy import text
from sqlalchemy.dialects.postgresql import insert
from sqlalchemy.orm import Session

class PairState(Base):

    __tablename__ = "pair_state"

    # This table has 1-to-1 relationship with Pair
    pair_id = sa.Column(sa.ForeignKey("pair.id"), nullable=False, primary_key=True, unique=True)
    pair = orm.relationship(Pair,
                        backref=orm.backref("pair_state",
                                        lazy="dynamic",
                                        cascade="all, delete-orphan",
                                        single_parent=True, ), )


    # First raw event in data stream
    first_event_at = sa.Column(sa.TIMESTAMP(timezone=True), nullable=False, server_default=text("TO_TIMESTAMP(0)"))

    # Last raw event in data stream
    last_event_at = sa.Column(sa.TIMESTAMP(timezone=True), nullable=False, server_default=text("TO_TIMESTAMP(0)"))

    # The last hypertable entry added
    last_interval_at = sa.Column(sa.TIMESTAMP(timezone=True), nullable=False, server_default=text("TO_TIMESTAMP(0)"))

    @staticmethod
    def create_first_event_if_not_exist(dbsession: Session, pair_id: int, ts: datetime.datetime):
        """Sets the first event value if not exist yet."""
        dbsession.execute(
            insert(PairState).
            values(pair_id=pair_id, first_event_at=ts).
            on_conflict_do_nothing()
        )

    @staticmethod
    def update_last_event(dbsession: Session, pair_id: int, ts: datetime.datetime):
        """Replaces the column last_event_at for a named pair."""
        # Based on the original example of https://stackoverflow.com/a/49917004/315168
        dbsession.execute(
            insert(PairState).
            values(pair_id=pair_id, last_event_at=ts).
            on_conflict_do_update(constraint=PairState.__table__.primary_key, set_={"last_event_at": ts})
        )

    @staticmethod
    def update_last_interval(dbsession: Session, pair_id: int, ts: datetime.datetime):
        """Replaces the column last_interval_at for a named pair."""
        dbsession.execute(
            insert(PairState).
            values(pair_id=pair_id, last_interval_at=ts).
            on_conflict_do_update(constraint=PairState.__table__.primary_key, set_={"last_interval_at": ts})
        )
answered 2021-06-14T11:24:16.140