
I have a list of tuple lists whose elements overlap.

val tupLis:Seq[(List[(Integer,Char)],Int)] = null//data

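I am trying to merge the overlapping elements in this list. Each tuple list contains 4 tuples, and the tuple lists frequently overlap because they were generated from a larger list with the sliding function. The merge does not work correctly: it misses some elements. Below is the code I am working on, which uses foldLeft to merge the overlapping tuple lists.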

val alLis:Seq[(List[(Integer,Char)],Int)] = snGrMap.map(_._2).flatten.toList.sortBy(_._1.head._1)
val res = alLis.foldLeft(mutable.HashMap.empty[Int,(List[Integer],List[(Integer,Char)],Int)]) { (map, value) =>
  if(map.size<=0){
    map.put(0,(value._1.map(_._1),value._1,value._2))
  }else{
    val cads = map.filter(p=>value._1.intersect(p._2._2).size>=3)
    if(cads.size>=1) {
      cads.foreach { i =>
        val cmnPos = i._2._1.intersect(value._1.map(_._1))
        val cmnBase = i._2._2.filter(p=>cmnPos.contains(p._1)).intersect(value._1.filter(p=>cmnPos.contains(p._1)))
        println(cmnBase.size,cmnPos.size,value._1, i._2._2)
        if(cmnBase.size == cmnPos.size)
          map.put(i._1,((i._2._1++value._1.map(_._1)).distinct,(i._2._2++value._1).distinct,i._2._3+value._2))
        else
          map.put(map.size,(value._1.map(_._1),value._1,value._2))
      }
    }else{
      map.put(map.size,(value._1.map(_._1),value._1,value._2))
    }
  }
  map
}

This is the sample data I am using:

(List((306,c), (328,g), (336,a), (346,g)),282)
(List((306,g), (328,c), (336,g), (346,a)),22)
(List((306,c), (328,c), (336,g), (346,a)),4)
(List((328,g), (336,a), (346,g), (348,t)),164)
(List((328,g), (336,a), (346,g), (348,c)),161)
(List((328,c), (336,g), (346,a), (348,c)),28)
(List((336,a), (346,g), (348,t), (358,a)),168)
(List((336,a), (346,g), (348,c), (358,a)),154)
(List((336,g), (346,a), (348,c), (358,g)),30)
(List((346,g), (348,t), (358,a), (361,c)),178)
(List((346,g), (348,c), (358,a), (361,c)),166)
(List((346,a), (348,c), (358,g), (361,g)),34)
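
To illustrate why neighbouring rows share three tuples, here is a minimal sketch of what sliding(4) does to a longer list. The input list is hypothetical: it is assembled from positions that appear in the sample rows and is not the actual source data, and the object name SlidingDemo is made up for the example.

```scala
object SlidingDemo {
  // Hypothetical longer list, built from positions seen in the sample rows.
  val full: List[(Int, Char)] =
    List((306, 'c'), (328, 'g'), (336, 'a'), (346, 'g'), (348, 't'), (358, 'a'), (361, 'c'))

  // sliding(4) yields every run of 4 consecutive elements, advancing by one.
  val windows: List[List[(Int, Char)]] = full.sliding(4).toList

  // Number of tuples shared by each pair of consecutive windows.
  val shared: List[Int] =
    windows.zip(windows.tail).map { case (a, b) => a.intersect(b).size }
}
```

With this input, SlidingDemo.windows holds 4 windows and SlidingDemo.shared is List(3, 3, 3), so every window shares exactly three tuples with the next one.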

The merged lists should look like this:

(List((306,c), (328,g), (336,a), (346,g), (348,t), (358,a), (361,c)),792)
(List((306,c), (328,g), (336,a), (346,g), (348,c), (358,a), (361,c)),763)
(List((306,g), (328,c), (336,g), (346,a), (348,c), (358,g), (361,g)),96)

Update 1:

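Overlap: two tuple lists overlap if 3 or more identical tuples appear in both lists. In addition, the two lists must not contradict each other when merged: if any tuple has the same integer in both lists but a different character, the lists should not be merged. Merge: combine two or more tuple lists when they overlap.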
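
The rule in Update 1 could be written as a predicate along the following lines. This is only a sketch of one reading of the rule; the names OverlapRule, conflicts and overlaps are hypothetical, and the three sample rows are copied from the data above.

```scala
object OverlapRule {
  type Window = List[(Int, Char)]

  // True when some position occurs in both windows with different characters.
  def conflicts(a: Window, b: Window): Boolean = {
    val byPos = a.toMap
    b.exists { case (pos, ch) => byPos.get(pos).exists(_ != ch) }
  }

  // Overlap as read from Update 1: at least 3 identical tuples and no conflict.
  def overlaps(a: Window, b: Window): Boolean =
    a.intersect(b).size >= 3 && !conflicts(a, b)

  // Three rows from the sample data (counts omitted).
  val r306c: Window = List((306, 'c'), (328, 'g'), (336, 'a'), (346, 'g'))
  val r348t: Window = List((328, 'g'), (336, 'a'), (346, 'g'), (348, 't'))
  val r348c: Window = List((328, 'g'), (336, 'a'), (346, 'g'), (348, 'c'))
}
```

Here overlaps(r306c, r348t) is true, while overlaps(r348t, r348c) is false: the two rows share three tuples but disagree at position 348.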

Update 2: I came up with a small solution, but I am not sure how efficient it is.

import scala.collection.mutable.ListBuffer

val alLisWithIndex = alLis.zipWithIndex
val interGrps = new ListBuffer[(Int,Int)]()
alLisWithIndex.foreach{i=>
  // pair i with every list whose first 3 tuples equal the last 3 tuples of i
  val cads = alLisWithIndex.filter(p=>p._1._1.take(3).intersect(i._1._1.takeRight(3)).size>=3)
  cads.foreach(p=>interGrps.append((i._2,p._2)))
}
println(interGrps.sortBy(_._1))

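When I run the code above, I get the tuple lists grouped as follows. I print only the indices of the tuple lists that should be merged with each other.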

Result: ListBuffer((0,2), (0,3), (1,4), (2,5), (3,6), (4,7), (5,8), (6,9), (7,10))

This is the list of tuple lists with their indices:

List(((List((306,c), (328,g), (336,a), (346,g)),282),0),
((List((306,g), (328,c), (336,g), (346,a)),22),1),
((List((328,g), (336,a), (346,g), (348,t)),164),2),
((List((328,g), (336,a), (346,g), (348,c)),161),3),
((List((328,c), (336,g), (346,a), (348,c)),28),4),
((List((336,a), (346,g), (348,t), (358,a)),168),5),
((List((336,a), (346,g), (348,c), (358,a)),154),6),
((List((336,g), (346,a), (348,c), (358,g)),30),7),
((List((346,g), (348,t), (358,a), (361,c)),178),8),
((List((346,g), (348,c), (358,a), (361,c)),166),9),
((List((346,a), (348,c), (358,g), (361,g)),34),10))

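So all that remains is to use interGrps to chain the groups through the second value of each pair, and finally replace the indices with the tuple lists.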
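
The remaining step might look roughly like the sketch below. It is an assumption about how the chaining could be done, not a tested solution to the whole problem: the names ChainLinks, chains and mergeChain are hypothetical, the links are copied from the printed result, and the windows are the indexed list shown above.

```scala
object ChainLinks {
  type Window = (List[(Int, Char)], Int)

  // Index pairs as printed above: (i, j) means window j continues window i.
  val links: List[(Int, Int)] =
    List((0, 2), (0, 3), (1, 4), (2, 5), (3, 6), (4, 7), (5, 8), (6, 9), (7, 10))

  val windows: Vector[Window] = Vector(
    (List((306, 'c'), (328, 'g'), (336, 'a'), (346, 'g')), 282),
    (List((306, 'g'), (328, 'c'), (336, 'g'), (346, 'a')), 22),
    (List((328, 'g'), (336, 'a'), (346, 'g'), (348, 't')), 164),
    (List((328, 'g'), (336, 'a'), (346, 'g'), (348, 'c')), 161),
    (List((328, 'c'), (336, 'g'), (346, 'a'), (348, 'c')), 28),
    (List((336, 'a'), (346, 'g'), (348, 't'), (358, 'a')), 168),
    (List((336, 'a'), (346, 'g'), (348, 'c'), (358, 'a')), 154),
    (List((336, 'g'), (346, 'a'), (348, 'c'), (358, 'g')), 30),
    (List((346, 'g'), (348, 't'), (358, 'a'), (361, 'c')), 178),
    (List((346, 'g'), (348, 'c'), (358, 'a'), (361, 'c')), 166),
    (List((346, 'a'), (348, 'c'), (358, 'g'), (361, 'g')), 34)
  )

  // Follow the links from every index that is never a target, branching
  // where one window has several continuations. Assumes the links are acyclic.
  def chains(links: List[(Int, Int)]): List[List[Int]] = {
    val next: Map[Int, List[Int]] =
      links.groupBy(_._1).map { case (k, v) => k -> v.map(_._2) }
    val targets = links.map(_._2).toSet
    val starts = links.map(_._1).distinct.filterNot(targets)
    def walk(i: Int): List[List[Int]] = next.get(i) match {
      case None     => List(List(i))
      case Some(js) => js.flatMap(j => walk(j).map(i :: _))
    }
    starts.flatMap(walk)
  }

  // Replace indices by windows, keep each distinct tuple once, sum the counts.
  def mergeChain(chain: List[Int]): Window = {
    val ws = chain.map(windows)
    (ws.flatMap(_._1).distinct, ws.map(_._2).sum)
  }
}
```

For the links above, chains returns List(List(0, 2, 5, 8), List(0, 3, 6, 9), List(1, 4, 7, 10)). Merging the first two chains gives counts of 792 and 763. Windows that appear in no link at all are not returned by this sketch and would need separate handling.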


1 Answer


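I think the following code follows the description of your algorithm. However, it does not produce the same output, so there is still something to work out about what exactly you want.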

First, the test data:

var xs = List(
(List((306,"c"), (328,"g"), (336,"a"), (346,"g")),282),
(List((306,"g"), (328,"c"), (336,"g"), (346,"a")),22),
(List((306,"c"), (328,"c"), (336,"g"), (346,"a")),4),
(List((328,"g"), (336,"a"), (346,"g"), (348,"t")),164),
(List((328,"g"), (336,"a"), (346,"g"), (348,"c")),161),
(List((328,"c"), (336,"g"), (346,"a"), (348,"c")),28),
(List((336,"a"), (346,"g"), (348,"t"), (358,"a")),168),
(List((336,"a"), (346,"g"), (348,"c"), (358,"a")),154),
(List((336,"g"), (346,"a"), (348,"c"), (358,"g")),30),
(List((346,"g"), (348,"t"), (358,"a"), (361,"c")),178),
(List((346,"g"), (348,"c"), (358,"a"), (361,"c")),166),
(List((346,"a"), (348,"c"), (358,"g"), (361,"g")),34))

Now a method that implements "two tuple lists overlap if 3 or more identical tuples appear in both lists":

def isOverlap[A](a:(List[A],Int),b:(List[A],Int)) = (a._1 intersect b._1).size >= 3

Then, using a helper I wrote previously that groups elements which "match" according to a predicate:

def groupWith[A](xs: List[A], f: (A, A) => Boolean) = {
  // helper function to add "e" to the first list whose members all match it
  // under the predicate; otherwise add it to a list of its own
  def addtoGroup(gs: List[List[A]], e: A): List[List[A]] = {
    // skip every group that has at least one member not matching e
    val (before, after) = gs.span(_.exists(!f(_, e)))
    if (after.isEmpty)
      List(e) :: gs
    else
      before ::: (e :: after.head) :: after.tail
  }
  // now a simple foldLeft adding each element to the appropriate list
  xs.foldLeft(Nil: List[List[A]])(addtoGroup)
}
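
A small toy run may help show how groupWith behaves. As written, it adds e to the first group in which every existing member matches e, so the result depends on input order and an element that matches only some members of a group starts a new one. The sketch below repeats the function with the span predicate spelled out; the integer predicate (values at most 1 apart) and the name GroupWithDemo are made up for illustration.

```scala
object GroupWithDemo {
  // Same logic as groupWith above, with the span predicate written out.
  def groupWith[A](xs: List[A], f: (A, A) => Boolean): List[List[A]] = {
    def addToGroup(gs: List[List[A]], e: A): List[List[A]] = {
      // Skip groups containing at least one member that does not match e.
      val (before, after) = gs.span(g => g.exists(m => !f(m, e)))
      if (after.isEmpty) List(e) :: gs
      else before ::: (e :: after.head) :: after.tail
    }
    xs.foldLeft(List.empty[List[A]])(addToGroup)
  }

  // Toy predicate, an assumption for illustration: values at most 1 apart match.
  val groups: List[List[Int]] =
    groupWith[Int](List(1, 2, 3, 10, 11), (a, b) => math.abs(a - b) <= 1)
}
```

GroupWithDemo.groups evaluates to List(List(11, 10), List(3), List(2, 1)). The value 3 matches 2 but not 1, so it does not join the group List(2, 1).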

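Applying groupWith to xs with isOverlap (the result is called ms below) gives a list of lists of overlapping elements: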

List(List((List((346,g), (348,c), (358,a), (361,c)),166), 
          (List((346,g), (348,t), (358,a), (361,c)),178)),
     List((List((346,a), (348,c), (358,g), (361,g)),34),
          (List((336,g), (346,a), (348,c), (358,g)),30)), 
     List((List((336,a), (346,g), (348,c), (358,a)),154),
          (List((336,a), (346,g), (348,t), (358,a)),168)), 
     List((List((328,c), (336,g), (346,a), (348,c)),28),
          (List((306,c), (328,c), (336,g), (346,a)),4),
          (List((306,g), (328,c), (336,g), (346,a)),22)),
     List((List((328,g), (336,a), (346,g), (348,c)),161),
          (List((328,g), (336,a), (346,g), (348,t)),164),
          (List((306,c), (328,g), (336,a), (346,g)),282)))

Then we write a function to merge a list of overlapping tuple lists:

def merge(ys: List[(List[(Int, String)], Int)]) = 
   ys.foldLeft((Nil:List[(Int, String)], 0))
  {(acc, e) => ((acc._1 ++ (e._1 diff acc._1)).sorted, acc._2 + e._2)}

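(This merges the tuples by adding any tuple that is not already in the accumulated result, and sums the integers. The .sorted is only there to make the result easier to inspect visually.)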

Then merge the overlapping entries:

ms.map(merge)

This gives the following, which is not your expected output:

List((List((346,g), (348,c), (348,t), (358,a), (361,c)),344),
     (List((336,g), (346,a), (348,c), (358,g), (361,g)),64),
     (List((336,a), (346,g), (348,c), (348,t), (358,a)),322),
     (List((306,c), (306,g), (328,c), (336,g), (346,a), (348,c)),54),
     (List((306,c), (328,g), (336,a), (346,g), (348,c), (348,t)),607))
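
For anyone who wants to reproduce this output, the pieces can be assembled roughly as follows. The definition of ms is not shown in the answer; this sketch assumes it is the result of groupWith applied to xs with the first version of isOverlap. The wrapper object MergePipeline and the alias Row are hypothetical names added for the example.

```scala
object MergePipeline {
  type Row = (List[(Int, String)], Int)

  val xs: List[Row] = List(
    (List((306, "c"), (328, "g"), (336, "a"), (346, "g")), 282),
    (List((306, "g"), (328, "c"), (336, "g"), (346, "a")), 22),
    (List((306, "c"), (328, "c"), (336, "g"), (346, "a")), 4),
    (List((328, "g"), (336, "a"), (346, "g"), (348, "t")), 164),
    (List((328, "g"), (336, "a"), (346, "g"), (348, "c")), 161),
    (List((328, "c"), (336, "g"), (346, "a"), (348, "c")), 28),
    (List((336, "a"), (346, "g"), (348, "t"), (358, "a")), 168),
    (List((336, "a"), (346, "g"), (348, "c"), (358, "a")), 154),
    (List((336, "g"), (346, "a"), (348, "c"), (358, "g")), 30),
    (List((346, "g"), (348, "t"), (358, "a"), (361, "c")), 178),
    (List((346, "g"), (348, "c"), (358, "a"), (361, "c")), 166),
    (List((346, "a"), (348, "c"), (358, "g"), (361, "g")), 34)
  )

  // First version of the predicate: at least 3 identical tuples in both rows.
  def isOverlap(a: Row, b: Row): Boolean = (a._1 intersect b._1).size >= 3

  def groupWith[A](xs: List[A], f: (A, A) => Boolean): List[List[A]] = {
    def addToGroup(gs: List[List[A]], e: A): List[List[A]] = {
      val (before, after) = gs.span(g => g.exists(m => !f(m, e)))
      if (after.isEmpty) List(e) :: gs
      else before ::: (e :: after.head) :: after.tail
    }
    xs.foldLeft(List.empty[List[A]])(addToGroup)
  }

  // Add tuples not yet accumulated, sort for readability, sum the counts.
  def merge(ys: List[Row]): Row =
    ys.foldLeft((List.empty[(Int, String)], 0)) { (acc, e) =>
      ((acc._1 ++ (e._1 diff acc._1)).sorted, acc._2 + e._2)
    }

  // Assumption: ms is the grouping of xs under isOverlap.
  val ms: List[List[Row]] = groupWith[Row](xs, isOverlap)
  val merged: List[Row] = ms.map(merge)
}
```

MergePipeline.merged.map(_._2) evaluates to List(344, 64, 322, 54, 607), the same counts as in the output above.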

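Edit: following the comments, here is an updated isOverlap. However, it produces fewer overlaps than the original, so the final merged output has more elements and is still not right: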

def isOverlap(a:(List[(Int, String)],Int),b:(List[(Int, String)],Int)) =
  // combine the distinct tuples by Int, and check that we don't get two entries
  // for any Int (i.e. if we do, they have different Strings so it's not an overlap)
  !((a._1++b._1).distinct.groupBy(_._1).exists(_._2.length > 1)) &&
  // check there are at least 3 matching tuples
  (a._1 intersect b._1).size >= 3
answered 2016-06-08T07:14:18.997