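I would be very surprised if there were anything to gain from doing this calculation in C# or C++. The time needed to transfer the data from the SQL server to the C# or C++ program will far exceed any difference in speed. Bear in mind that the SQL server still uses the same C or C++ library (or at least a very similar one) that your C++ or C# code would use, so the actual exp and log calculations themselves will run at very similar speeds. The overhead comes from parsing the SQL elements, and I don't think that will make much of a difference at all.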
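If you really think this is a problem (I don't, but I'm not responsible for the work you are doing...), I suggest you build a test case with some tables holding realistic values at a realistic size (perhaps a bit larger), then compare the speed of computing the values against simply fetching the values (in pure SQL code: I assume there is an SQL command-line tool you can use, or some web interface or something else that lets you run the calculation). Perhaps also return only the sum of the values.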
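Edit: I wrote some PHP (since I pretty much already had a PHP + MySQL environment installed on my machine). [No, those are not my username/password combination; I wouldn't post that on a public server like this!]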
<?php
// Note: the mysql_* functions used here were removed in PHP 7; on current
// PHP versions the mysqli_* equivalents would be needed instead.
$dbconnect = mysql_connect("localhost", "username", "password");
if (!$dbconnect)
{
    die('Could not connect: ' . mysql_error());
}
mysql_select_db("test", $dbconnect)
    or die ("Couldn't connect to database: " . mysql_error() );

// The first command-line argument selects which test to run.
echo "Argv[1]=" . $argv[1] . "\n";

// Create: fill test1 with 100,000 rows of random values in [0, 1].
if ($argv[1] == "Create")
{
    $rm = getrandmax();
    for($i = 0; $i < 100000; $i++)
    {
        $a = rand() / $rm;
        $b = rand() / $rm;
        $c = rand() / $rm;
        $d = rand() / $rm;
        $e = rand() / $rm;
        $f = rand() / $rm;
        $sql = "INSERT INTO test1 (id, a, b, c, d, e, f) VALUES ("
            . $i .
            ", " . $a . ", " . $b . ", " . $c . ", " . $d . ", " . $e
            . ", " . $f . ");";
        if (mysql_query($sql, $dbconnect) === false)
        {
            die("Could not add element " . mysql_error());
        }
    }
}

// ExpSumLog: exp(sum(log(x))) per group. Grouping by e, f, id returns
// 100,000 result rows (see count in the output), all sent to the client.
if ($argv[1] == "ExpSumLog")
{
    $sql = "SELECT exp(sum(log(a))) AS a1,
                   exp(sum(log(b))) AS b1,
                   exp(sum(log(c))) AS c1,
                   exp(sum(log(d))) AS d1,
                   exp(sum(log(e))) AS e1
            FROM test1
            GROUP BY e,f,id";
    $result = mysql_query($sql, $dbconnect) or die("Failed " . mysql_error());
    $count = 0;
    $sum = 0;
    while($row = mysql_fetch_assoc($result))
    {
        $count++;
        $sum += $row['a1'];
    }
    echo "Sum=" . $sum . ", count=" . $count . "\n";
}

// Sum: same grouping and result size, but with a plain sum() per column.
if ($argv[1] == "Sum")
{
    $sum = 0;
    $sql = "SELECT sum(a) AS a1,
                   sum(b) AS b1,
                   sum(c) AS c1,
                   sum(d) AS d1,
                   sum(e) AS e1
            FROM test1
            GROUP BY e,f,id";
    $result = mysql_query($sql, $dbconnect) or die("Failed " . mysql_error());
    $count = 0;
    while($row = mysql_fetch_assoc($result))
    {
        $count++;
        $sum += $row['a1'];
    }
    echo "Sum=" . $sum . ", count=" . $count . "\n";
}

// List: fetch every column of every row and sum column a on the client.
if ($argv[1] == "List")
{
    $sum = 0;
    $sql = "SELECT * FROM test1;";
    $result = mysql_query($sql, $dbconnect) or die("Failed " . mysql_error());
    $count = 0;
    while($row = mysql_fetch_assoc($result))
    {
        $count++;
        $sum += $row['a'];
    }
    echo "Sum=" . $sum . ", count=" . $count . "\n";
}

// SumA: the server sums column a and returns a single value.
if ($argv[1] == "SumA")
{
    $sum = 0;
    $sql = "SELECT sum(a) FROM test1;";
    $result = mysql_query($sql, $dbconnect) or die("Failed " . mysql_error());
    $count = 0;
    while($row = mysql_fetch_assoc($result))
    {
        $count++;
        $sum += $row['sum(a)'];
    }
    echo "Sum=" . $sum . ", count=" . $count . "\n";
}

// ExpSumLogA: as SumA, but with exp(log(a)) computed for every row first.
if ($argv[1] == "ExpSumLogA")
{
    $sum = 0;
    $sql = "SELECT sum(exp(log(a))) AS a1 FROM test1;";
    $result = mysql_query($sql, $dbconnect) or die("Failed " . mysql_error());
    $count = 0;
    while($row = mysql_fetch_assoc($result))
    {
        $count++;
        $sum += $row['a1'];
    }
    echo "Sum=" . $sum . ", count=" . $count . "\n";
}
?>
The Create step takes about 55 minutes... fortunately, the other steps are much faster.
Argv[1]=ExpSumLog
Sum=50017.011061374, count=100000
real 0m1.102s
user 0m0.289s
sys 0m0.066s
Argv[1]=Sum
Sum=50017.011061374, count=100000
real 0m1.004s
user 0m0.278s
sys 0m0.055s
Argv[1]=List
Sum=50017.011061374, count=100000
real 0m0.993s
user 0m0.322s
sys 0m0.060s
Argv[1]=SumA
Sum=50017.011061374, count=1
real 0m0.068s
user 0m0.019s
sys 0m0.012s
Argv[1]=ExpSumLogA
Sum=50017.011061374, count=1
real 0m0.095s
user 0m0.024s
sys 0m0.017s
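As you can see, the time taken to do the actual calculation is far less than the time taken to copy all the data. The difference between computing the data as sum(exp(log(a))) and as sum(a) is small (but consistent: ExpSumLogA is about 20-30 ms slower than SumA on every run).
To show that the data transfer is what matters, I added the following four variants: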
// SortedA: return column a only, sorted by the server.
if ($argv[1] == "SortedA")
{
    $sum = 0;
    $sql = "SELECT a AS a1 FROM test1 ORDER BY a;";
    $result = mysql_query($sql, $dbconnect) or die("Failed " . mysql_error());
    $count = 0;
    while($row = mysql_fetch_assoc($result))
    {
        $count++;
        $sum += $row['a1'];
    }
    echo "Sum=" . $sum . ", count=" . $count . "\n";
}

// SortedExpLogA: as SortedA, but returning exp(log(a)) for each row.
if ($argv[1] == "SortedExpLogA")
{
    $sum = 0;
    $sql = "SELECT exp(log(a)) AS a1 FROM test1 ORDER BY a;";
    $result = mysql_query($sql, $dbconnect) or die("Failed " . mysql_error());
    $count = 0;
    while($row = mysql_fetch_assoc($result))
    {
        $count++;
        $sum += $row['a1'];
    }
    echo "Sum=" . $sum . ", count=" . $count . "\n";
}

// UnsortedA: return column a only, in whatever order the server gives.
if ($argv[1] == "UnsortedA")
{
    $sum = 0;
    $sql = "SELECT a AS a1 FROM test1;";
    $result = mysql_query($sql, $dbconnect) or die("Failed " . mysql_error());
    $count = 0;
    while($row = mysql_fetch_assoc($result))
    {
        $count++;
        $sum += $row['a1'];
    }
    echo "Sum=" . $sum . ", count=" . $count . "\n";
}

// UnsortedExpLogA: as UnsortedA, but returning exp(log(a)) for each row.
if ($argv[1] == "UnsortedExpLogA")
{
    $sum = 0;
    $sql = "SELECT exp(log(a)) AS a1 FROM test1;";
    $result = mysql_query($sql, $dbconnect) or die("Failed " . mysql_error());
    $count = 0;
    while($row = mysql_fetch_assoc($result))
    {
        $count++;
        $sum += $row['a1'];
    }
    echo "Sum=" . $sum . ", count=" . $count . "\n";
}
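I would expect these variants to run faster than exporting all the data, but slower than "return just one value", and that is indeed the case.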
Argv[1]=SortedA
Sum=50017.011061375, count=100000
real 0m0.375s
user 0m0.194s
sys 0m0.027s
Argv[1]=SortedExpLogA
Sum=50017.011061375, count=100000
real 0m0.394s
user 0m0.202s
sys 0m0.023s
Argv[1]=UnsortedA
Sum=50017.011061374, count=100000
real 0m0.353s
user 0m0.206s
sys 0m0.018s
Argv[1]=UnsortedExpLogA
Sum=50017.011061374, count=100000
real 0m0.383s
user 0m0.223s
sys 0m0.025s
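As you can see, Sorted takes slightly longer than Unsorted (as expected: sorting 100K items adds some time), and the ExpLog variants are slightly slower than the "just return A" variants. This is fairly consistent.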
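For anyone who wants to try the "one value versus all rows" comparison without a MySQL server, here is a minimal sketch. It is not the script used for the timings above. It assumes the pdo_sqlite extension is enabled and uses an in-memory SQLite database as a stand-in, with a smaller table (20,000 rows, a single column a), so the absolute numbers will not match the ones in this answer. Variable names such as $sumInDb and $sumInClient are made up for the illustration.

```php
<?php
// Sketch only: SQLite in memory stands in for the MySQL server used above,
// so absolute timings are not comparable with the figures in this answer.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE test1 (id INTEGER PRIMARY KEY, a REAL)');

// Fill the table with random values in (0, 1], inside a single transaction.
$rows = 20000;
$pdo->beginTransaction();
$insert = $pdo->prepare('INSERT INTO test1 (id, a) VALUES (?, ?)');
for ($i = 0; $i < $rows; $i++) {
    $insert->execute([$i, mt_rand(1, mt_getrandmax()) / mt_getrandmax()]);
}
$pdo->commit();

// Variant 1: the database sums the column and returns a single value.
$start = microtime(true);
$sumInDb = (float) $pdo->query('SELECT sum(a) FROM test1')->fetchColumn();
$timeInDb = microtime(true) - $start;

// Variant 2: every row is handed to PHP and summed on the client side.
$start = microtime(true);
$sumInClient = 0.0;
$count = 0;
foreach ($pdo->query('SELECT a FROM test1') as $row) {
    $sumInClient += (float) $row['a'];
    $count++;
}
$timeInClient = microtime(true) - $start;

printf("in database: sum=%.6f, %.4fs\n", $sumInDb, $timeInDb);
printf("in client:   sum=%.6f, count=%d, %.4fs\n", $sumInClient, $count, $timeInClient);
```

Both variants should print the same sum; only the elapsed times differ. With an in-process database there is no network or socket hop, so the gap may well be smaller than with a real MySQL server.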