4

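After endless attempts to process large (3-35 GB) CSV files in R, I have moved to SQL to handle these datasets. I run this code from within R (using the RSQLite package, which is based on SQLite), but that should not detract from my SQL question!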

My question: how do I select rows from one table based on matching values given in another table?

Let me illustrate with an example. I have tables in the following format:

"Data" table:

Symbol| Value| EX
A  | 1  | N       
A  | 1  | N     
A  | 2  | T  
A  | 3  | N  
A  | 4  | N  
A  | 5  | N  
B  | 1  | P       
B  | 2  | P  
B  | 2  | N  
B  | 2  | N  
B  | 3  | P  
B  | 5  | P  
B  | 6  | T  
... 

I want to select all entries whose symbol and exchange values match the pairs given in the following table.

"Symbolexchange" table:

Ticker| Exchange
A  | N       
B  | P  
... 

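(Note that Symbol and Ticker are the same attribute, and EX and Exchange are also the same attribute.)

So my desired output keeps only the entries of A where the exchange is N, and so on: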

Symbol| Value| EX
A  | 1  | N       
A  | 1  | N     
A  | 3  | N  
A  | 4  | N  
A  | 5  | N  
B  | 1  | P       
B  | 2  | P  
B  | 3  | P  
B  | 5  | P  
... 

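I can do this in two ways, although I am not very happy with either of them.

This method adds the columns of the reference table next to the columns of the original table, which is redundant.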

SELECT *
FROM Data
INNER JOIN Symbolexchange 
ON Data.EX=Symbolexchange.EXCHANGE
AND Data.SYMBOL=Symbolexchange.TICKER

This method also gets the job done directly, but it is slower than the one above.

SELECT *
FROM Data
WHERE EX=(SELECT exchange FROM Symbolexchange WHERE ticker = SYMBOL)

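Is there a better and faster way to program this? Speed is very important because of the size of my datasets. Any other comments on my code are welcome!

Thanks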

2 Answers

3

Two things that you can do to improve performance:

First (and most importantly) add a key or index to your tables. I don't know SQLite, but usually there's a command something like this:

CREATE INDEX DataIX1 ON Data(Symbol,EX)

You'll want one on the other table too:

CREATE INDEX SymbolExchangeIX1 ON Symbolexchange(Ticker,Exchange)

You may need to throw in ".." or '..' on the names...
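As a rough check that SQLite accepts these statements as written (no quoting needed for these names), here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The column types are assumptions based on the question's sample data, and the join is the one from the question restricted to the `Data` columns. The wording of the `EXPLAIN QUERY PLAN` output differs between SQLite versions, so the sketch only prints it.

```python
import sqlite3

# In-memory database; column types are assumed from the question's sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Data (Symbol TEXT, Value INTEGER, EX TEXT);
    CREATE TABLE Symbolexchange (Ticker TEXT, Exchange TEXT);
    CREATE INDEX DataIX1 ON Data(Symbol, EX);
    CREATE INDEX SymbolExchangeIX1 ON Symbolexchange(Ticker, Exchange);
""")

# Confirm that both indexes were registered in the schema.
index_names = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'index'")
)
print(index_names)

# Ask SQLite how it intends to run the join; look for an index name in the output.
plan = conn.execute("""
    EXPLAIN QUERY PLAN
    SELECT Data.*
    FROM Data
    INNER JOIN Symbolexchange
        ON Data.EX = Symbolexchange.Exchange
       AND Data.Symbol = Symbolexchange.Ticker
""").fetchall()
for step in plan:
    print(step)
```

If the plan shows a full scan of both tables on the real data, the indexes are probably not being used and are worth revisiting.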

The second thing is that although your first query is probably your best approach, you should only return the columns that you actually need/want:

SELECT Data.*
FROM Data
INNER JOIN Symbolexchange 
ON Data.EX=Symbolexchange.EXCHANGE
AND Data.SYMBOL=Symbolexchange.TICKER
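
To see the effect on the question's sample data, here is a small sketch using Python's built-in `sqlite3` module. The rows are copied from the question, the column types are assumptions, and the `ORDER BY` is added only to make the output deterministic. It also compares the column count of `SELECT *` with that of `SELECT Data.*`.

```python
import sqlite3

# Sample rows copied from the question; types are assumed.
data_rows = [
    ("A", 1, "N"), ("A", 1, "N"), ("A", 2, "T"), ("A", 3, "N"),
    ("A", 4, "N"), ("A", 5, "N"),
    ("B", 1, "P"), ("B", 2, "P"), ("B", 2, "N"), ("B", 2, "N"),
    ("B", 3, "P"), ("B", 5, "P"), ("B", 6, "T"),
]
exchange_rows = [("A", "N"), ("B", "P")]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Data (Symbol TEXT, Value INTEGER, EX TEXT)")
conn.execute("CREATE TABLE Symbolexchange (Ticker TEXT, Exchange TEXT)")
conn.executemany("INSERT INTO Data VALUES (?, ?, ?)", data_rows)
conn.executemany("INSERT INTO Symbolexchange VALUES (?, ?)", exchange_rows)

join_clause = """
    FROM Data
    INNER JOIN Symbolexchange
        ON Data.EX = Symbolexchange.Exchange
       AND Data.Symbol = Symbolexchange.Ticker
"""

# SELECT * drags the lookup table's columns along with the Data columns.
star_cursor = conn.execute("SELECT * " + join_clause)
star_column_count = len(star_cursor.description)

# SELECT Data.* keeps only the columns of the Data table.
result = conn.execute(
    "SELECT Data.* " + join_clause + " ORDER BY Data.Symbol, Data.Value"
).fetchall()

print(star_column_count)
for row in result:
    print(row)
```

One caveat with the join: if Symbolexchange ever contains the same (Ticker, Exchange) pair more than once, each matching Data row is returned once per duplicate.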
answered 2013-10-30T16:42:36.710
-1

I am not sure whether you are using MySQL or MS SQL. For MS SQL, you can speed up the query by adding NOLOCK to it.

1) WITH (NOLOCK)

Select * from user with (NOLOCK)

or

2) SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
select * from user a, class b where a.userid=b.userid

You can refer to the earlier discussion of this topic: WITH (NOLOCK) vs SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED

answered 2013-10-30T16:28:02.937