scala - Performance of Slick one-to-many queries

Question

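I'm using Play 2.5 and Slick 3.1.1, and I'm trying to build an optimal query for a model with several one-to-many and one-to-one relationships. My database model looks like this: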

case class Accommodation(id: Option[Long], landlordId: Long, name: String)
case class LandLord(id: Option[Long], name: String)
case class Address(id: Option[Long], accommodationId: Long, street: String)
case class ExtraCharge(id: Option[Long], accommodationId: Long, title: String)

For the output data:

case class AccommodationFull(accommodation: Accommodation, landLord: LandLord, extraCharges:Seq[ExtraCharge], addresses:Seq[Address])

I wrote two versions of the query that fetch an accommodation by id:

/** Retrieve an accommodation by id using three separate queries. */
def findByIdFullMultipleQueries(id: Long): Future[Option[AccommodationFull]] = {
  val q = for {
    (a, l) <- accommodations join landLords on (_.landlordId === _.id)
    if a.id === id
  } yield (a, l)

  for {
    (data) <- db.run(q.result.headOption)
    (ex) <- db.run(extraCharges.filter(_.accommodationId === id).result)
    (add) <- db.run(addresses.filter(_.accommodationId === id).result)
  } yield data.map { accLord => AccommodationFull(accLord._1, accLord._2, ex, add) }
}

/** Retrieve an accommodation by id using a single join query. */
def findByIdFull(id: Long): Future[Option[AccommodationFull]] = {
  val qr = accommodations.filter(_.id === id)
    .join(landLords).on(_.landlordId === _.id)
    .joinLeft(extraCharges).on(_._1.id === _.accommodationId)
    .joinLeft(addresses).on(_._1._1.id === _.accommodationId)
    .result.map { res =>
      // Each row is (((accommodation, landLord), Option[extraCharge]), Option[address]).
      res.groupBy(_._1._1._1.id).headOption.map {
        case (_, v) =>
          val addresses = v.flatMap(_._2).distinct
          val extraCharges = v.flatMap(_._1._2).distinct
          val landLord = v.map(_._1._1._2).head
          val accommodation = v.map(_._1._1._1).head
          AccommodationFull(accommodation, landLord, extraCharges, addresses)
      }
    }

  db.run(qr)
}

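In my tests, the multiple-query version is 5 times faster than the join. How can I write a better-optimized join query?

=== Update ===

I am now testing on PostgreSQL 9.3 with this data: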

private[bootstrap] object InitialData {

  def landLords = (1L to 10000L).map { id =>
    LandLord(Some(id), s"Good LandLord $id")
  }

  def accommodations = (1L to 10000L).map { id =>
    Accommodation(Some(id), s"Nice house $id", 100 * id, 3, 5, 500, 1l, None)
  }

  def extraCharge = (1L to 10000L).flatMap { id =>
    (1 to 100).map { nr =>
      ExtraCharge(None, id, s"Extra $nr", 100.0)
    }
  }

  def addresses = (1L to 1000L).flatMap { id =>
    (1 to 100).map {  nr =>
      Address(None, id, s"Słoneczna 4 - $nr", "17-200", "", "PL")
    }
  }
}

Here are the results of several runs (in milliseconds):

JOIN: 367
MULTI: 146
JOIN: 306
MULTI: 110
JOIN: 300
MULTI: 103

== Update 2 ==

After adding indexes the join got better, but the multi-query version is still faster:

def accommodationLandLordIdIndex = index("ACCOMMODATION__LANDLORD_ID__INDEX", landlordId, unique = false)
def addressAccommodationIdIndex = index("ADDRESS__ACCOMMODATION_ID__INDEX", accommodationId, unique = false)
def extraChargeAccommodationIdIndex = index("EXTRA_CHARGE__ACCOMMODATION_ID__INDEX", accommodationId, unique = false)

I ran this test:

val multiResult = (1 to 1000).map { i =>
  val start = System.currentTimeMillis()
  Await.result(accommodationDao.findByIdFullMultipleQueries(i), Duration.Inf)
  System.currentTimeMillis() - start
}
println(s"MULTI AVG Result: ${multiResult.sum.toDouble / multiResult.length}")

val joinResult = (1 to 1000).map { i =>
  val start = System.currentTimeMillis()
  Await.result(accommodationDao.findByIdFull(i), Duration.Inf)
  System.currentTimeMillis() - start
}
println(s"JOIN AVG Result: ${joinResult.sum.toDouble / joinResult.length}")

Here are the results of 2 runs (average milliseconds per call):

MULTI AVG Result: 3.287
JOIN AVG Result: 96.797
MULTI AVG Result: 3.206
JOIN AVG Result: 100.221

score 2 · Accepted Answer

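I think you got caught by Postgres not adding indexes to foreign key columns automatically. The multi-query version uses indexes on all three tables (primary keys), whereas the single join query has to scan the joined tables to find the needed IDs.

Try adding indexes on your accommodationId columns.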

Update

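While the indexes would help if these were 1:1 relationships, it looks like they are all 1:many relationships. In that case, using joins followed by a later distinct filter returns far more data from the database than you need, because the two left joins pair every extra charge with every address of the same accommodation.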
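To make the fan-out concrete, here is a minimal sketch in plain Scala (no Slick involved) that simulates the rows a double left join returns for one accommodation. The `JoinFanOut` object, the simplified case classes, and the `joinedRows` helper are hypothetical names used only for illustration; the counts of 100 extra charges and 100 addresses mirror the test data in the question.

```scala
object JoinFanOut {
  // Hypothetical, simplified row types standing in for the joined tuples Slick returns.
  final case class ExtraCharge(id: Long, accommodationId: Long, title: String)
  final case class Address(id: Long, accommodationId: Long, street: String)

  /** Simulates the rows of two left joins for a single accommodation:
    * every extra charge is paired with every address. */
  def joinedRows(
      charges: Seq[ExtraCharge],
      addrs: Seq[Address]
  ): Seq[(Option[ExtraCharge], Option[Address])] = {
    // A left join keeps the parent row even when a child table has no match.
    val cs: Seq[Option[ExtraCharge]] = if (charges.isEmpty) Seq(None) else charges.map(Option(_))
    val as: Seq[Option[Address]] = if (addrs.isEmpty) Seq(None) else addrs.map(Option(_))
    for (c <- cs; a <- as) yield (c, a)
  }

  def main(args: Array[String]): Unit = {
    val charges = (1L to 100L).map(n => ExtraCharge(n, 1L, s"Extra $n"))
    val addrs = (1L to 100L).map(n => Address(n, 1L, s"Street $n"))
    val rows = joinedRows(charges, addrs)

    println(s"rows from the join: ${rows.size}") // 100 * 100 = 10000
    println(s"rows from three queries: ${1 + charges.size + addrs.size}") // 1 + 100 + 100 = 201
    println(s"distinct charges after dedup: ${rows.flatMap(_._1).distinct.size}") // 100
  }
}
```

Under these assumptions the join transfers 10000 rows that are then reduced to 100 charges and 100 addresses in memory, while the three separate queries transfer 201 rows in total.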

For your data model, running multiple queries looks like the right way to handle the data.
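If you stay with multiple queries, one possible refinement is to start them before composing the results. In the question's version each `db.run` call sits inside the for comprehension, so the second query only starts after the first has finished. The sketch below uses plain Scala futures; `findAccommodation`, `findExtraCharges`, and `findAddresses` are hypothetical stand-ins for the real `db.run(...)` calls, and the case classes are simplified. Whether this helps in practice depends on your connection pool and database, so treat it as an idea to measure rather than a guaranteed gain.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelLookups {
  // Simplified, hypothetical versions of the question's model.
  final case class Accommodation(id: Long, name: String)
  final case class ExtraCharge(accommodationId: Long, title: String)
  final case class Address(accommodationId: Long, street: String)
  final case class AccommodationFull(
      accommodation: Accommodation,
      extraCharges: Seq[ExtraCharge],
      addresses: Seq[Address]
  )

  // Hypothetical stand-ins for db.run(...); each returns a Future, as Slick does.
  def findAccommodation(id: Long): Future[Option[Accommodation]] =
    Future(if (id > 0) Some(Accommodation(id, s"Nice house $id")) else None)

  def findExtraCharges(id: Long): Future[Seq[ExtraCharge]] =
    Future(Seq(ExtraCharge(id, "Extra 1"), ExtraCharge(id, "Extra 2")))

  def findAddresses(id: Long): Future[Seq[Address]] =
    Future(Seq(Address(id, "Street 1")))

  def findByIdFull(id: Long): Future[Option[AccommodationFull]] = {
    // Create all three futures first so the lookups can run concurrently.
    val accF = findAccommodation(id)
    val exF = findExtraCharges(id)
    val addF = findAddresses(id)

    // The for comprehension now only combines results that are already in flight.
    for {
      acc <- accF
      ex <- exF
      add <- addF
    } yield acc.map(a => AccommodationFull(a, ex, add))
  }

  def main(args: Array[String]): Unit =
    println(Await.result(findByIdFull(1L), 5.seconds))
}
```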

score 1 · Accepted Answer

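I think it depends on your database engine. The query Slick generates may not be optimal (see the documentation), but you need to analyze the query at the database level to understand what is happening and optimize it.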
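One way to start that analysis is to print the SQL that Slick generates and run it through `EXPLAIN ANALYZE` in psql. The sketch below assumes Slick 3.1.x on the classpath (the import is `slick.driver.PostgresDriver.api._` there; later versions moved it to `slick.jdbc.PostgresProfile.api._`). The table and column names and the `SlickSqlPreview` object are hypothetical and need to be adapted to your schema. No database connection is needed to render the statement.

```scala
import slick.driver.PostgresDriver.api._

// Simplified model from the question.
case class ExtraCharge(id: Option[Long], accommodationId: Long, title: String)

// Hypothetical table mapping; the real table and column names may differ.
class ExtraCharges(tag: Tag) extends Table[ExtraCharge](tag, "EXTRA_CHARGE") {
  def id = column[Long]("ID", O.PrimaryKey, O.AutoInc)
  def accommodationId = column[Long]("ACCOMMODATION_ID")
  def title = column[String]("TITLE")
  def * = (id.?, accommodationId, title) <> (ExtraCharge.tupled, ExtraCharge.unapply)
}

object SlickSqlPreview {
  val extraCharges = TableQuery[ExtraCharges]

  // `statements` exposes the SQL Slick would send for this action.
  def selectSql: String =
    extraCharges.filter(_.accommodationId === 1L).result.statements.mkString("\n")

  def main(args: Array[String]): Unit =
    // Paste the printed line into psql to see the plan PostgreSQL chooses.
    println(s"EXPLAIN ANALYZE $selectSql")
}
```

The same call works on the join query from the question, which makes it possible to compare the plans of the join and of the three separate queries side by side.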
