scala - 值切片不是 play.api.libs.iteratee.Enumerator 的成员

Question

我正在编写基于https://github.com/websudos/phantom#partial-select-queries中描述的“大型记录集的异步迭代器”的代码

import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.Future

import org.joda.time.DateTime
import org.joda.time.format.DateTimeFormat
import org.joda.time.format.DateTimeFormatter

import com.anomaly42.aml.dao.CassandraConnector
import com.websudos.phantom.CassandraTable
import com.websudos.phantom.Implicits._

object People extends People {
  def getPersonByUpdatedAt(from:String, to:String, start: Int, limit: Int) = {
    val dtf:DateTimeFormatter = DateTimeFormat.forPattern("yyyy-MM-dd'T'HH:mm:ssZ");
    val fromDateTime = dtf.parseDateTime(from)
    val toDateTime = dtf.parseDateTime(to)

    People.select(_.updated_at, _.firstName).allowFiltering.where(_.updated_at gte fromDateTime).and(_.updated_at lte toDateTime).fetchEnumerator().slice(start, limit).collect
  }
}

我正在使用以下库依赖项：

scalaVersion  := "2.11.6"
libraryDependencies ++= Seq(
  "com.websudos"        %%  "phantom-dsl"     % "1.5.4",
  many more...
)

但是编译时出现以下错误：

value slice is not a member of play.api.libs.iteratee.Enumerator[(org.joda.time.DateTime, Option[String])]

我想要做的是编写一个查询，每次调用 getPersonByUpdatedAt() 方法时，从“开始”开始返回下一个“限制”结果数。

score 3 · Accepted Answer

这里有很多实现细节需要解决。首先，如果您在分页之后，可能有一种更简单的方法可以通过简单的范围查询而不是过滤数据来实现这一目标。

看看 using CLUSTERING ORDER，那个调用不ALLOW FILTERING应该在那里。此外，如果没有CLUSTERING ORDER默认的 Murmur3 分区器，实际上并没有排序，因此您无法保证以与写入数据相同的顺序检索数据。

这可能意味着您的分页根本不起作用。最后但同样重要的是，直接使用枚举器可能不是您所追求的。

它们是异步的，因此您必须在未来进行映射才能获得切片，但除此之外，当像 Spark 这样的东西一次加载整个表时，枚举器很有用，例如很多很多结果。

总结一下，在 people 表中：

object id extends UUIDColumn(this) with PartitionKey[UUID]// doesn't have to be UUID
object start extends DateTimeColumn(this) with ClusteringOrder[DateTime] with Ascending
object end extends DateTimeColumn(this) with ClusteringOrder[DateTime] with Ascending

只需使用Scala 集合库中的fetch()然后Seq.slice。以上假设您要按升序进行分页，例如首先检索最旧的。

您还需要弄清楚实际的分区键可能是什么。如果同时更新 2 个用户，最坏的情况是您丢失数据并以 FIFO 队列结束，例如在给定时间的最后一次更新“获胜”。我在id上面使用过，但这显然不是你需要的。

而且您可能需要有几个表来存储人员，这样您就可以覆盖所有需要的查询。

score 1 · Accepted Answer

您应该使用 Play 框架中的 Iteratee 和 Enumerator。在您的情况下，您需要：

import com.websudos.phantom.iteratee.Iteratee

val enumerator =  People.select(_.updated_at, _.firstName).allowFiltering.where(_.updated_at gte fromDateTime).and(_.updated_at lte toDateTime).fetchEnumerator

val iteratee = Iteratee.slice[PeopleCaseClass](start, limit)

enumerator.run( iteratee ).map( _.foldLeft( List.empty[PeopleCaseClass] )( (l,e) => { e :: l } ))

希望这会有所帮助

scala - 值切片不是 play.api.libs.iteratee.Enumerator 的成员

2 回答 2

Related

Reference