sql - SQLite: previous select value in the where clause

Question

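Preamble (you can skip this, it's just my justification)

I've created an application that uses sqlite as its database backend, and the schema works (and performs) very well during general use of the application.

Now I'm trying to build a reporting system for it, and I've built an excel xll that creates query tables from an unnamed DSN. Because of this, I have to do all of my reporting in sql alone (i.e. I can't do anything programmatically). This works very well for everything except one query...

/// skip to here....

My database contains a list of features, each with an id, a distance, and an indicator of whether the feature is a marker. The ids are not necessarily in the same order as the distances (the feature with id 10 may have distance 100, while the feature with id 11 has distance 90).

So the item basically looks like: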

Feature { int id, int distance, bool is_marker }

What I want to do is find the next and previous feature that is also a marker.

/// Edit

My first attempt used:

select 
*          /* I want all the data from this feature */
(select MAX(f2.distance) - f1.distance 
    from feature as f2
    where f2.is_marker && f2.distance < f1.distance) /* and the distance to the previous marker */
from feature as f2

Second attempt (this one works, but takes WAAAY too long for 100,000 features, roughly 9 days...):

select
*,          /* I want all the data from this feature */
(select f1.distance - MAX(f2.distance)
    from feature as f2
    where f2.is_marker AND f2.distance < f1.distance) /* and the distance to the previous marker */
from feature as f1

This query does return what I want, and it performs fine on small databases, but I also have to support larger ones.

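(Some databases have fewer than 1000 features, but the one I'm working on now has >90,000 features. Querying 1000 features takes <1s, but querying 90,000 features takes 20 hours. The cost does not grow linearly, so the time per 1000 features is roughly 800 times worse: 20*60*60/(90,000/1000) = 800)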
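The scaling arithmetic can be checked directly. This is a small sketch that uses only the timings quoted in the question; treating "<1s" as exactly 1 second is an assumption, so the ratios it computes are lower bounds.

```python
# Timings quoted in the question. Treating "<1s" as exactly 1 second is an
# assumption, so the computed ratios are lower bounds.
small_rows, small_seconds = 1_000, 1.0
large_rows, large_seconds = 90_000, 20 * 60 * 60  # 20 hours in seconds

size_ratio = large_rows / small_rows            # how many times more rows
time_ratio = large_seconds / small_seconds      # how many times more time
slowdown_vs_linear = time_ratio / size_ratio    # 1.0 would mean linear scaling
quadratic_prediction = size_ratio ** 2          # time ratio if cost grew as n^2

print(size_ratio, time_ratio, slowdown_vs_linear, quadratic_prediction)
```

With 90 times the rows, a subquery that rescans the table once per row would be expected to cost roughly 8,100 times as much. The reported 72,000-fold increase is larger than that, which suggests something beyond the quadratic scan is also hurting.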

The backend database is sqlite, and I connect to it from excel with the sqliteodbc connector.

If I were doing this in code, I would do something like:

// Assumes GetAll() returns the features ordered by ascending distance.
var features = featureRepository.GetAll();
var featuresWithMarkerDistance = new List<FeatureWithMarkerDistance>();
Feature previousMarker = null;
foreach (var feature in features) {
    // Until a marker has been seen there is no previous marker, so the distance stays null.
    int? markerDistance = previousMarker == null
        ? (int?)null
        : feature.distance - previousMarker.distance;
    featuresWithMarkerDistance.Add(
        new FeatureWithMarkerDistance(feature, markerDistance));
    if (feature.is_marker) {
        previousMarker = feature;
    }
}

// FeatureWithMarkerDistance { int id, int distance, bool is_marker, int? marker_distance }

// Edit:

Here is a concrete example:

(The underlying table)
feature_id is_marker distance
1          false     100
2          false     90
3          false     101
4          true      50
5          false     5
6          true      85
7          false     150
8          false     75

(There is an index on distance.)

The result I want:

feature_id is_marker distance distance_to_closest_previous_marker
1          false     100      15
2          false     90       5
3          false     101      16
4          true      50       null
5          false     5        null
6          true      85       35
7          false     150      65
8          false     75       25

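So, to get the previous marker for feature_id 1: feature_id 1 has distance 100, and the closest previous marker is feature_id 6 at distance 85. To get the distance to the closest previous marker I take (100 - 85) = 15. I need this value for every feature that will be included in the report. (It has to be done in a single sql query, because I'm using an odbc connector with excel.) The query above does get what I want, but it performs very badly, because the where clause has to search the whole database for every feature.

What I would like to do is: (unless there is a more efficient way)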
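The worked example can be reproduced with Python's sqlite3 module. This is a minimal sketch, not the asker's actual setup: it assumes the column names from the example table (feature_id, is_marker, distance), stores is_marker as 0/1, and runs a correlated subquery of the same shape as the second attempt.

```python
import sqlite3

# Sample rows from the question: (feature_id, is_marker, distance).
# Storing is_marker as 0/1 is an assumption about the real schema.
ROWS = [(1, 0, 100), (2, 0, 90), (3, 0, 101), (4, 1, 50),
        (5, 0, 5), (6, 1, 85), (7, 0, 150), (8, 0, 75)]

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE feature (feature_id INTEGER PRIMARY KEY,"
                 " is_marker INTEGER, distance INTEGER)")
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?)", ROWS)
    return conn

# For each feature, subtract the largest marker distance strictly below it.
# When no such marker exists, MAX() is NULL and so is the difference.
QUERY = """
SELECT f1.feature_id,
       (SELECT f1.distance - MAX(f2.distance)
          FROM feature AS f2
         WHERE f2.is_marker AND f2.distance < f1.distance)
  FROM feature AS f1
 ORDER BY f1.feature_id
"""

def previous_marker_distances(conn):
    # Maps feature_id to distance_to_closest_previous_marker (None for NULL).
    return dict(conn.execute(QUERY).fetchall())

if __name__ == "__main__":
    for feature_id, gap in previous_marker_distances(make_db()).items():
        print(feature_id, gap)
```

On the eight sample rows this yields the values listed in the wanted result, with None standing in for null.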

   select *          
    /* I want all the data from this feature */ 
    /* previous  = */ (select MAX(f2.distance) - f1.distance 
        from feature as f2
        where f2.is_marker && f2.distance >= previous && f2.distance < f1.distance) 
    /* and the distance to the previous marker */
    from feature as f2

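So the basic theory is that I would store the previous marker value, and only look at that value and later ones when I search for the next marker.

Sorry about the initial confusion (I forgot to put the MAX() in originally)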
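The idea of carrying the previous marker forward is close to what a window function does. The sketch below is not taken from the question or its answers: it assumes the SQLite library in use is version 3.25.0 or newer (where window functions were added), that is_marker is stored as 0/1, and that distances are distinct. With tied distances the ROWS frame would order the tied rows arbitrarily.

```python
import sqlite3

# Sample rows from the question: (feature_id, is_marker, distance).
ROWS = [(1, 0, 100), (2, 0, 90), (3, 0, 101), (4, 1, 50),
        (5, 0, 5), (6, 1, 85), (7, 0, 150), (8, 0, 75)]

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE feature (feature_id INTEGER PRIMARY KEY,"
                 " is_marker INTEGER, distance INTEGER)")
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?)", ROWS)
    return conn

# Walk the rows in distance order and keep a running maximum of the marker
# distances seen strictly before the current row. Requires SQLite >= 3.25.0.
WINDOW_QUERY = """
SELECT feature_id,
       distance - MAX(CASE WHEN is_marker THEN distance END) OVER (
           ORDER BY distance
           ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING
       ) AS distance_to_closest_previous_marker
  FROM feature
 ORDER BY feature_id
"""

def previous_marker_distances(conn):
    return dict(conn.execute(WINDOW_QUERY).fetchall())

if __name__ == "__main__":
    print(sqlite3.sqlite_version)
    print(previous_marker_distances(make_db()))
```

Because the rows are sorted once and scanned once, this form avoids rescanning the table for every feature. Whether it is available through a given sqliteodbc build depends on the SQLite version that build bundles.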

score 0 · Accepted Answer

I don't know SQLite, but how about something like this (I checked the syntax and found LEFT JOIN and EXISTS, but not NOT EXISTS)?

select f2.*, f2.distance - f1.distance
from feature f2
left join feature f1 on f1.is_marker
                    and f2.distance > f1.distance
                    and not exists(select 1 from feature f1b
                                   where f1b.is_marker
                                     and f2.distance > f1b.distance
                                     and f1.distance < f1b.distance)
where f2.is_marker

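I can't say anything about the performance, but I would hope that an index on (is_marker, distance) helps (you will have to test whether including is_marker in the index is useful; apart from depending on SQLite, it may also depend on the percentage of rows that have is_marker = true).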
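One way to test the index suggestion is to create it and ask SQLite for its query plan. The following is only a sketch: the index name feature_marker_distance is made up for illustration, is_marker is assumed to be stored as 0/1 so that it can be compared with `= 1`, and the query is a correlated-subquery form rather than the join above. The plan text varies between SQLite versions, so it is printed rather than relied on.

```python
import sqlite3

# Sample rows from the question: (feature_id, is_marker, distance).
ROWS = [(1, 0, 100), (2, 0, 90), (3, 0, 101), (4, 1, 50),
        (5, 0, 5), (6, 1, 85), (7, 0, 150), (8, 0, 75)]

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE feature (feature_id INTEGER PRIMARY KEY,"
                 " is_marker INTEGER, distance INTEGER)")
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?)", ROWS)
    # Hypothetical index name; the column order follows the answer's suggestion.
    conn.execute("CREATE INDEX feature_marker_distance"
                 " ON feature (is_marker, distance)")
    return conn

# An equality test on is_marker gives the planner a chance to seek in the index.
QUERY = """
SELECT f1.feature_id,
       f1.distance - (SELECT MAX(f2.distance)
                        FROM feature AS f2
                       WHERE f2.is_marker = 1
                         AND f2.distance < f1.distance)
  FROM feature AS f1
 ORDER BY f1.feature_id
"""

def query_plan(conn):
    # The last column of each EXPLAIN QUERY PLAN row is the readable detail.
    return [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]

def previous_marker_distances(conn):
    return dict(conn.execute(QUERY).fetchall())

if __name__ == "__main__":
    conn = make_db()
    for line in query_plan(conn):
        print(line)
    print(previous_marker_distances(conn))
```

If the plan for the subquery mentions the index, each lookup becomes a seek instead of a table scan. If it does not, the index is probably not worth keeping for this query.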

score 0 · Accepted Answer

The examples really help. Well done.

SELECT F2.feature_id, F2.is_marker, F2.distance, 
       F2.distance - (SELECT F1.distance FROM features F1
                      WHERE F1.is_marker<>0 
                        AND F1.distance<F2.distance
                      ORDER BY F1.distance DESC
                      LIMIT 1) AS "distance_to_closest_previous_marker"
FROM features F2

score 0 · Accepted Answer

I used the SQLite3 shell and tried your query

SELECT *, 
       (SELECT MIN(feature.distance-distance) FROM feature AS f
               WHERE is_marker AND distance<feature.distance) 
       FROM feature;

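It performs quite well with 5000 records. Maybe your weak point is sqliteodbc? If it really is still slow, and assuming that only a few rows have is_marker set to true, you could create a table that holds just the distances of the features whose is_marker is true: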

CREATE TEMP TABLE markers_distance (distance);
CREATE UNIQUE INDEX markers_idx ON markers_distance (distance);
INSERT OR IGNORE INTO markers_distance 
       SELECT distance FROM feature WHERE is_marker;

Now your query against markers_distance should be faster:

SELECT *, 
       (SELECT MIN(feature.distance-distance) FROM markers_distance
               WHERE distance<feature.distance) 
       FROM feature;
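The temp-table approach can be tried end to end from Python's sqlite3 module. This is a sketch under assumptions: it uses the sample rows and the feature_id column name from the question's example table, and stores is_marker as 0/1. A TEMP table exists only for the connection that created it, so whether it is visible to queries issued through an ODBC/Excel connection would need checking.

```python
import sqlite3

# Sample rows from the question: (feature_id, is_marker, distance).
ROWS = [(1, 0, 100), (2, 0, 90), (3, 0, 101), (4, 1, 50),
        (5, 0, 5), (6, 1, 85), (7, 0, 150), (8, 0, 75)]

# The three setup statements from the answer, unchanged.
SETUP = """
CREATE TEMP TABLE markers_distance (distance);
CREATE UNIQUE INDEX markers_idx ON markers_distance (distance);
INSERT OR IGNORE INTO markers_distance
       SELECT distance FROM feature WHERE is_marker;
"""

# Same shape as the answer's query, restricted to two columns for readability.
QUERY = """
SELECT feature_id,
       (SELECT MIN(feature.distance - distance) FROM markers_distance
         WHERE distance < feature.distance)
  FROM feature
 ORDER BY feature_id
"""

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE feature (feature_id INTEGER PRIMARY KEY,"
                 " is_marker INTEGER, distance INTEGER)")
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?)", ROWS)
    conn.executescript(SETUP)  # build and fill the marker-only table
    return conn

def previous_marker_distances(conn):
    return dict(conn.execute(QUERY).fetchall())

if __name__ == "__main__":
    conn = make_db()
    print(conn.execute("SELECT distance FROM markers_distance"
                       " ORDER BY distance").fetchall())
    print(previous_marker_distances(conn))
```

The subquery now searches only the marker distances, so its cost depends on the number of markers rather than on the size of the whole feature table.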
