java - Fastest way to determine if an integer's square root is an integer

Question

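I'm looking for the fastest way to determine if a long value is a perfect square (i.e. its square root is another integer):

I've done it the easy way, by using the built-in Math.sqrt() function, but I'm wondering if there is a way to do it faster by restricting yourself to the integer-only domain.
Maintaining a lookup table is impractical (since there are about 2^31.5 integers whose square is less than 2^63).

Here is the very simple and straightforward way I'm doing it now: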

public final static boolean isPerfectSquare(long n)
{
  if (n < 0)
    return false;

  long tst = (long)(Math.sqrt(n) + 0.5);
  return tst*tst == n;
}

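Note: I'm using this function in many Project Euler problems, so no one else will ever have to maintain this code. And this kind of micro-optimization could actually make a difference, since part of the challenge is to run every algorithm in less than a minute, and this function needs to be called millions of times in some problems.

I've tried the different solutions to the problem:

After exhaustive testing, I found that adding 0.5 to the result of Math.sqrt() is not necessary, at least not on my machine.
The fast inverse square root was faster, but it gave incorrect results for n >= 410881. However, as suggested by BobbyShaftoe, we can use the FISR hack for n < 410881.
Newton's method was a good bit slower than Math.sqrt(). This is probably because Math.sqrt() uses something similar to Newton's method, but implemented in hardware, so it's much faster than in Java. Also, Newton's method still required the use of doubles.
A modified Newton's method, which used a few tricks so that only integer math was involved, required some hacks to avoid overflow (I want this function to work with all positive 64-bit signed integers), and it was still slower than Math.sqrt().
Binary chop was even slower. This makes sense because the binary chop will on average require 16 passes to find the square root of a 64-bit number.
According to John's tests, using or statements is faster in C++ than using a switch, but in Java and C# there appears to be no difference between or and switch.
I also tried making a lookup table (as a private static array of 64 boolean values). Then, instead of either a switch or an or statement, I would just say if(lookup[(int)(n&0x3F)]) { test } else return false;. To my surprise, this was (just slightly) slower. This is because array bounds are checked in Java.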

score 794 · Accepted Answer

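I figured out a method that works about 35% faster than your 6bits+Carmack+sqrt code, at least with my CPU (x86) and programming language (C/C++). Your results may vary, especially because I don't know how the Java factor will play out.

My approach is threefold:

First, filter out obvious answers. This includes negative numbers and looking at the last 4 bits. (I found looking at the last six didn't help.) I also answer yes for 0. (In reading the code below, note that my input is int64 x.)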
```
if( x < 0 || (x&2) || ((x & 7) == 5) || ((x & 11) == 8) )
    return false;
if( x == 0 )
    return true;
```
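Next, check if it's a square modulo 255 = 3 * 5 * 17. Because that's a product of three distinct primes, only about 1/8 of the residues mod 255 are squares by the usual rule of thumb (the exact count is 54 of 255, roughly 21%). However, in my experience, calling the modulo operator (%) costs more than the benefit one gets, so I use bit tricks involving 255 = 2^8-1 to compute the residue. (For better or worse, I am not using the trick of reading individual bytes out of a word, only bitwise-and and shifts.)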
```
int64 y = x;
y = (y & 4294967295LL) + (y >> 32); 
y = (y & 65535) + (y >> 16);
y = (y & 255) + ((y >> 8) & 255) + (y >> 16);
// At this point, y is between 0 and 511.  More code can reduce it farther.
```
To actually check whether the residue is a square, I look up the answer in a precomputed table.
```
if( bad255[y] )
    return false;
// However, I just use a table of size 512
```
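The folding step relies on 2^8, 2^16 and 2^32 all being congruent to 1 modulo 255, so adding the pieces of the word preserves the residue. Here is a minimal Java sketch of that idea, under the assumption that the input is non-negative; the names Mod255Filter, fold255 and mayBeSquare are made up for illustration, and the table is generated at class load rather than copied from the answer.

```java
final class Mod255Filter {
    // BAD_255[y] is true when y (mod 255) is NOT a square residue.
    // Size 512 so the folded value can index it without a final % 255.
    static final boolean[] BAD_255 = new boolean[512];

    static {
        boolean[] square = new boolean[255];
        for (int i = 0; i < 255; i++) {
            square[(i * i) % 255] = true;
        }
        for (int i = 0; i < 512; i++) {
            BAD_255[i] = !square[i % 255];
        }
    }

    /** Folds a non-negative x into [0, 511], keeping its residue modulo 255. */
    static int fold255(long x) {
        long y = x;
        y = (y & 0xFFFFFFFFL) + (y >> 32);               // 2^32 = 1 (mod 255)
        y = (y & 0xFFFF) + (y >> 16);                    // 2^16 = 1 (mod 255)
        y = (y & 0xFF) + ((y >> 8) & 0xFF) + (y >> 16);  // 2^8  = 1 (mod 255)
        return (int) y;
    }

    /** False means x is certainly not a perfect square; true means "maybe". */
    static boolean mayBeSquare(long x) {
        return x >= 0 && !BAD_255[fold255(x)];
    }

    public static void main(String[] args) {
        System.out.println(fold255(1000L) % 255 == 1000L % 255);
        System.out.println(mayBeSquare(49L));
    }
}
```

This is only a filter: a true result still has to be confirmed by an exact test.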
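Finally, try to compute the square root using a method similar to Hensel's lemma. (I don't think it's applicable directly, but it works with some modifications.) Before doing that, I divide out all powers of 2 with a binary search: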
```
if((x & 4294967295LL) == 0)
    x >>= 32;
if((x & 65535) == 0)
    x >>= 16;
if((x & 255) == 0)
    x >>= 8;
if((x & 15) == 0)
    x >>= 4;
if((x & 3) == 0)
    x >>= 2;
```
At this point, for our number to be a square, it must be 1 mod 8.
```
if((x & 7) != 1)
    return false;
```
The basic structure of Hensel's lemma is the following. (Note: untested code; if it doesn't work, try t=2 or 8.)
```
int64 t = 4, r = 1;
t <<= 1; r += ((x - r * r) & t) >> 1;
t <<= 1; r += ((x - r * r) & t) >> 1;
t <<= 1; r += ((x - r * r) & t) >> 1;
// Repeat until t is 2^33 or so.  Use a loop if you want.
```
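The idea is that at each iteration, you add one bit onto r, the "current" square root of x; each square root is accurate modulo a larger and larger power of 2, namely t/2. At the end, r and t/2-r will be square roots of x modulo t/2. (Note that if r is a square root of x, then so is -r. This is true even modulo numbers, but beware: modulo some numbers, things can have even more than 2 square roots; notably, this includes powers of 2.) Because our actual square root is less than 2^32, at that point we can actually just check if r or t/2-r are real square roots. In my actual code, I use the following modified loop: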
```
int64 r, t, z;
r = start[(x >> 3) & 1023];
do {
    z = x - r * r;
    if( z == 0 )
        return true;
    if( z < 0 )
        return false;
    t = z & (-z);
    r += (z & t) >> 1;
    if( r > (t >> 1) )
        r = t - r;
} while( t <= (1LL << 33) );
```
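The speedup here is obtained in three ways: a precomputed start value (equivalent to about 10 iterations of the loop), an earlier exit from the loop, and skipping some t values. For the last part, I look at z = x - r * r and set t to be the largest power of 2 dividing z with a bit trick. This allows me to skip t values that wouldn't have affected the value of r anyway. The precomputed start value in my case picks out the "smallest positive" square root modulo 8192.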
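The plain one-bit-per-step lifting described above can be sketched in Java as follows. This is not the optimized loop from the answer (no start table, no early exit, no skipped t values), only the basic invariant: if r*r agrees with x modulo 2^k, then adding 2^(k-1) to r when bit k of x - r*r is set makes them agree modulo 2^(k+1). The names HenselSqrtSketch and sqrtMod2k are hypothetical, and the sketch assumes x is congruent to 1 modulo 8.

```java
final class HenselSqrtSketch {
    /**
     * Returns some r with r*r = x (mod 2^bits).
     * Assumes x = 1 (mod 8) and 3 <= bits <= 62; long overflow is harmless
     * here because only the low bits of the products are inspected.
     */
    static long sqrtMod2k(long x, int bits) {
        long r = 1; // 1*1 = x (mod 8) by assumption
        for (int k = 3; k < bits; k++) {
            long t = 1L << k;
            // Invariant: r*r = x (mod 2^k). If bit k of the error is set,
            // adding 2^(k-1) to r changes r*r by 2^k modulo 2^(k+1).
            r += ((x - r * r) & t) >> 1;
        }
        return r;
    }

    public static void main(String[] args) {
        long x = 49;
        long r = sqrtMod2k(x, 10);
        System.out.println(((r * r - x) & 1023) == 0);
    }
}
```

The returned r is only a root modulo 2^bits; as the answer notes, a genuine integer root still has to be confirmed by checking r (or its negation modulo the power of two) against x directly.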

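Even if this code doesn't run faster for you, I hope you enjoy some of the ideas it contains. Complete, tested code follows, including the precomputed tables.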

typedef signed long long int int64;

int start[1024] =
{1,3,1769,5,1937,1741,7,1451,479,157,9,91,945,659,1817,11,
1983,707,1321,1211,1071,13,1479,405,415,1501,1609,741,15,339,1703,203,
129,1411,873,1669,17,1715,1145,1835,351,1251,887,1573,975,19,1127,395,
1855,1981,425,453,1105,653,327,21,287,93,713,1691,1935,301,551,587,
257,1277,23,763,1903,1075,1799,1877,223,1437,1783,859,1201,621,25,779,
1727,573,471,1979,815,1293,825,363,159,1315,183,27,241,941,601,971,
385,131,919,901,273,435,647,1493,95,29,1417,805,719,1261,1177,1163,
1599,835,1367,315,1361,1933,1977,747,31,1373,1079,1637,1679,1581,1753,1355,
513,1539,1815,1531,1647,205,505,1109,33,1379,521,1627,1457,1901,1767,1547,
1471,1853,1833,1349,559,1523,967,1131,97,35,1975,795,497,1875,1191,1739,
641,1149,1385,133,529,845,1657,725,161,1309,375,37,463,1555,615,1931,
1343,445,937,1083,1617,883,185,1515,225,1443,1225,869,1423,1235,39,1973,
769,259,489,1797,1391,1485,1287,341,289,99,1271,1701,1713,915,537,1781,
1215,963,41,581,303,243,1337,1899,353,1245,329,1563,753,595,1113,1589,
897,1667,407,635,785,1971,135,43,417,1507,1929,731,207,275,1689,1397,
1087,1725,855,1851,1873,397,1607,1813,481,163,567,101,1167,45,1831,1205,
1025,1021,1303,1029,1135,1331,1017,427,545,1181,1033,933,1969,365,1255,1013,
959,317,1751,187,47,1037,455,1429,609,1571,1463,1765,1009,685,679,821,
1153,387,1897,1403,1041,691,1927,811,673,227,137,1499,49,1005,103,629,
831,1091,1449,1477,1967,1677,697,1045,737,1117,1737,667,911,1325,473,437,
1281,1795,1001,261,879,51,775,1195,801,1635,759,165,1871,1645,1049,245,
703,1597,553,955,209,1779,1849,661,865,291,841,997,1265,1965,1625,53,
1409,893,105,1925,1297,589,377,1579,929,1053,1655,1829,305,1811,1895,139,
575,189,343,709,1711,1139,1095,277,993,1699,55,1435,655,1491,1319,331,
1537,515,791,507,623,1229,1529,1963,1057,355,1545,603,1615,1171,743,523,
447,1219,1239,1723,465,499,57,107,1121,989,951,229,1521,851,167,715,
1665,1923,1687,1157,1553,1869,1415,1749,1185,1763,649,1061,561,531,409,907,
319,1469,1961,59,1455,141,1209,491,1249,419,1847,1893,399,211,985,1099,
1793,765,1513,1275,367,1587,263,1365,1313,925,247,1371,1359,109,1561,1291,
191,61,1065,1605,721,781,1735,875,1377,1827,1353,539,1777,429,1959,1483,
1921,643,617,389,1809,947,889,981,1441,483,1143,293,817,749,1383,1675,
63,1347,169,827,1199,1421,583,1259,1505,861,457,1125,143,1069,807,1867,
2047,2045,279,2043,111,307,2041,597,1569,1891,2039,1957,1103,1389,231,2037,
65,1341,727,837,977,2035,569,1643,1633,547,439,1307,2033,1709,345,1845,
1919,637,1175,379,2031,333,903,213,1697,797,1161,475,1073,2029,921,1653,
193,67,1623,1595,943,1395,1721,2027,1761,1955,1335,357,113,1747,1497,1461,
1791,771,2025,1285,145,973,249,171,1825,611,265,1189,847,1427,2023,1269,
321,1475,1577,69,1233,755,1223,1685,1889,733,1865,2021,1807,1107,1447,1077,
1663,1917,1129,1147,1775,1613,1401,555,1953,2019,631,1243,1329,787,871,885,
449,1213,681,1733,687,115,71,1301,2017,675,969,411,369,467,295,693,
1535,509,233,517,401,1843,1543,939,2015,669,1527,421,591,147,281,501,
577,195,215,699,1489,525,1081,917,1951,2013,73,1253,1551,173,857,309,
1407,899,663,1915,1519,1203,391,1323,1887,739,1673,2011,1585,493,1433,117,
705,1603,1111,965,431,1165,1863,533,1823,605,823,1179,625,813,2009,75,
1279,1789,1559,251,657,563,761,1707,1759,1949,777,347,335,1133,1511,267,
833,1085,2007,1467,1745,1805,711,149,1695,803,1719,485,1295,1453,935,459,
1151,381,1641,1413,1263,77,1913,2005,1631,541,119,1317,1841,1773,359,651,
961,323,1193,197,175,1651,441,235,1567,1885,1481,1947,881,2003,217,843,
1023,1027,745,1019,913,717,1031,1621,1503,867,1015,1115,79,1683,793,1035,
1089,1731,297,1861,2001,1011,1593,619,1439,477,585,283,1039,1363,1369,1227,
895,1661,151,645,1007,1357,121,1237,1375,1821,1911,549,1999,1043,1945,1419,
1217,957,599,571,81,371,1351,1003,1311,931,311,1381,1137,723,1575,1611,
767,253,1047,1787,1169,1997,1273,853,1247,413,1289,1883,177,403,999,1803,
1345,451,1495,1093,1839,269,199,1387,1183,1757,1207,1051,783,83,423,1995,
639,1155,1943,123,751,1459,1671,469,1119,995,393,219,1743,237,153,1909,
1473,1859,1705,1339,337,909,953,1771,1055,349,1993,613,1393,557,729,1717,
511,1533,1257,1541,1425,819,519,85,991,1693,503,1445,433,877,1305,1525,
1601,829,809,325,1583,1549,1991,1941,927,1059,1097,1819,527,1197,1881,1333,
383,125,361,891,495,179,633,299,863,285,1399,987,1487,1517,1639,1141,
1729,579,87,1989,593,1907,839,1557,799,1629,201,155,1649,1837,1063,949,
255,1283,535,773,1681,461,1785,683,735,1123,1801,677,689,1939,487,757,
1857,1987,983,443,1327,1267,313,1173,671,221,695,1509,271,1619,89,565,
127,1405,1431,1659,239,1101,1159,1067,607,1565,905,1755,1231,1299,665,373,
1985,701,1879,1221,849,627,1465,789,543,1187,1591,923,1905,979,1241,181};

bool bad255[512] =
{0,0,1,1,0,1,1,1,1,0,1,1,1,1,1,0,0,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,
 1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,1,1,
 0,1,0,1,1,0,0,1,1,1,1,1,0,1,1,1,1,0,1,1,0,0,1,1,1,1,1,1,1,1,0,1,
 1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,0,1,1,1,0,1,1,1,1,0,0,1,1,1,1,1,1,
 1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,0,1,1,0,1,1,1,1,1,
 1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,1,
 1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,
 1,0,1,1,1,0,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1,
 0,0,1,1,0,1,1,1,1,0,1,1,1,1,1,0,0,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,
 1,1,0,1,0,1,1,1,1,1,1,1,1,1,1,1,1,0,1,0,1,1,1,0,1,1,1,1,0,1,1,1,
 0,1,0,1,1,0,0,1,1,1,1,1,0,1,1,1,1,0,1,1,0,0,1,1,1,1,1,1,1,1,0,1,
 1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,0,1,1,1,0,1,1,1,1,0,0,1,1,1,1,1,1,
 1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,0,1,1,0,1,1,1,1,1,
 1,1,1,1,1,1,0,1,1,0,1,0,1,1,0,1,1,1,1,1,1,1,1,1,1,1,0,1,1,0,1,1,
 1,1,1,0,0,1,1,1,1,1,1,1,0,0,1,1,1,1,1,1,1,1,1,1,1,1,1,0,0,1,1,1,
 1,0,1,1,1,0,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,0,1,1,1,1,1,1,1,1,
 0,0};

inline bool square( int64 x ) {
    // Quickfail
    if( x < 0 || (x&2) || ((x & 7) == 5) || ((x & 11) == 8) )
        return false;
    if( x == 0 )
        return true;

    // Check mod 255 = 3 * 5 * 17, for fun
    int64 y = x;
    y = (y & 4294967295LL) + (y >> 32);
    y = (y & 65535) + (y >> 16);
    y = (y & 255) + ((y >> 8) & 255) + (y >> 16);
    if( bad255[y] )
        return false;

    // Divide out powers of 4 using binary search
    if((x & 4294967295LL) == 0)
        x >>= 32;
    if((x & 65535) == 0)
        x >>= 16;
    if((x & 255) == 0)
        x >>= 8;
    if((x & 15) == 0)
        x >>= 4;
    if((x & 3) == 0)
        x >>= 2;

    if((x & 7) != 1)
        return false;

    // Compute sqrt using something like Hensel's lemma
    int64 r, t, z;
    r = start[(x >> 3) & 1023];
    do {
        z = x - r * r;
        if( z == 0 )
            return true;
        if( z < 0 )
            return false;
        t = z & (-z);
        r += (z & t) >> 1;
        if( r > (t  >> 1) )
            r = t - r;
    } while( t <= (1LL << 33) );

    return false;
}

score 440 · Accepted Answer

I'm pretty late to the party, but I hope to provide a better answer; shorter and (assuming my benchmark is correct) also much faster.

long goodMask; // 0xC840C04048404040 computed below
{
    for (int i=0; i<64; ++i) goodMask |= Long.MIN_VALUE >>> (i*i);
}

public boolean isSquare(long x) {
    // This tests if the 6 least significant bits are right.
    // Moving the to be tested bit to the highest position saves us masking.
    if (goodMask << x >= 0) return false;
    final int numberOfTrailingZeros = Long.numberOfTrailingZeros(x);
    // Each square ends with an even number of zeros.
    if ((numberOfTrailingZeros & 1) != 0) return false;
    x >>= numberOfTrailingZeros;
    // Now x is either 0 or odd.
    // In binary each odd square ends with 001.
    // Postpone the sign test until now; handle zero in the branch.
    if ((x&7) != 1 | x <= 0) return x == 0;
    // Do it in the classical way.
    // The correctness is not trivial as the conversion from long to double is lossy!
    final long tst = (long) Math.sqrt(x);
    return tst * tst == x;
}

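The first test catches most non-squares quickly. It uses a 64-item table packed in a long, so there's no array access cost (indirection and bounds checks). For a uniformly random long, there's an 81.25% probability of ending here.

The second test catches all numbers having an odd number of twos in their factorization. The method Long.numberOfTrailingZeros is very fast as it gets JIT-ed into a single i86 instruction.

After dropping the trailing zeros, the third test handles numbers ending with 011, 101, or 111 in binary, which are not perfect squares. It also takes care of negative numbers and handles 0.

The final test falls back to double arithmetic. As double has only a 53-bit mantissa, the conversion from long to double includes rounding for big values. Nonetheless, the test is correct (unless the proof is wrong).

Trying to incorporate the mod255 idea wasn't successful.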
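The 64-entry table used by the first test can be reproduced and checked in isolation. The sketch below is an illustration with made-up names (Low6BitsFilter, buildMask, mayBeSquare); it relies on Java taking the shift distance of a long modulo 64, so bit 63 - r of the mask is set exactly when r is a square residue modulo 64.

```java
final class Low6BitsFilter {
    static final long GOOD_MASK = buildMask();

    static long buildMask() {
        long mask = 0;
        // The shift distance is reduced mod 64, so this marks every
        // residue r = i*i mod 64 by setting bit (63 - r).
        for (int i = 0; i < 64; i++) {
            mask |= Long.MIN_VALUE >>> (i * i);
        }
        return mask;
    }

    /** False means x is certainly not a square; true means "maybe". */
    static boolean mayBeSquare(long x) {
        // Shifting left by (x mod 64) moves the entry for x into the sign bit.
        return (GOOD_MASK << x) < 0;
    }

    public static void main(String[] args) {
        System.out.println(Long.toHexString(GOOD_MASK));
        System.out.println(mayBeSquare(25L));
    }
}
```

Only 12 of the 64 residues survive, which is where the 81.25% rejection rate (52/64) comes from.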

score 136 · Accepted Answer

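You'll have to do some benchmarking. The best algorithm will depend on the distribution of your inputs.

Your algorithm may be nearly optimal, but you might want to do a quick check to rule out some possibilities before calling your square root routine. For example, look at the last digit of your number in hex by doing a bit-wise "and". Perfect squares can only end in 0, 1, 4, or 9 in base 16, so for 75% of your inputs (assuming they are uniformly distributed) you can avoid a call to the square root in exchange for some very fast bit twiddling.

Kip benchmarked the following code implementing the hex trick. When testing numbers 1 through 100,000,000, this code ran twice as fast as the original.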

public final static boolean isPerfectSquare(long n)
{
    if (n < 0)
        return false;

    switch((int)(n & 0xF))
    {
    case 0: case 1: case 4: case 9:
        long tst = (long)Math.sqrt(n);
        return tst*tst == n;

    default:
        return false;
    }
}

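When I tested the analogous code in C++, it actually ran slower than the original. However, when I eliminated the switch statement, the hex trick once again made the code twice as fast.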

int isPerfectSquare(int n)
{
    int h = n & 0xF;  // h is the last hex "digit"
    if (h > 9)
        return 0;
    // Use lazy evaluation to jump out of the if statement as soon as possible
    if (h != 2 && h != 3 && h != 5 && h != 6 && h != 7 && h != 8)
    {
        int t = (int) floor( sqrt((double) n) + 0.5 );
        return t*t == n;
    }
    return 0;
}

Eliminating the switch statement had little effect on the C# code.

score 56 · Accepted Answer

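I was thinking about the horrible times I spent in my Numerical Analysis course.

And then I remembered there was this function circling around the 'net from the Quake source code: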

float Q_rsqrt( float number )
{
  long i;
  float x2, y;
  const float threehalfs = 1.5F;

  x2 = number * 0.5F;
  y  = number;
  i  = * ( long * ) &y;  // evil floating point bit level hacking
  i  = 0x5f3759df - ( i >> 1 ); // wtf?
  y  = * ( float * ) &i;
  y  = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
  // y  = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed

  #ifndef Q3_VM
  #ifdef __linux__
    assert( !isnan(y) ); // bk010122 - FPE?
  #endif
  #endif
  return y;
}

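It basically calculates a square root (strictly speaking, the reciprocal of the square root), using Newton's approximation function (can't remember the exact name).

It should be usable and might even be faster; it's from one of the phenomenal id Software games!

It's written in C++, but it should not be too hard to reuse the same technique in Java once you get the idea:

I originally found it at: http://www.codemaestro.com/reviews/9

Newton's method is explained on Wikipedia: http://en.wikipedia.org/wiki/Newton%27s_method

You can follow the link for more explanation of how it works, but if you don't care much, then this is roughly what I remember from reading the blog and from taking the Numerical Analysis course:

The * (long*) &y is basically a fast convert-to-long function (a reinterpretation of the bits, not a numeric conversion), so integer operations can be applied to the raw bytes.
The 0x5f3759df - (i >> 1); line is a pre-calculated seed value for the approximation function.
The * (float*) &i converts the value back to floating point.
The y = y * ( threehalfs - ( x2 * y * y ) ) line basically iterates the value over the function again.

The approximation function gives more precise values the more you iterate the function over the result. In Quake's case, one iteration is "good enough", but if it isn't for you, then you can add as many iterations as you need.

This should be faster because it reduces the number of division operations done in naive square rooting down to a simple divide by 2 (actually a * 0.5F multiply operation) and replaces them with a small fixed number of multiplication operations instead.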

score 39 · Accepted Answer

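I'm not sure if it would be faster, or even accurate, but you could use John Carmack's Magical Square Root algorithm to solve the square root faster. You could probably easily test this for all possible 32-bit integers and validate that you actually got correct results, as it's only an approximation. However, now that I think about it, using doubles is approximating also, so I'm not sure how that would come into play.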

score 36 · Accepted Answer

If you do a binary chop to try to find the "right" square root, you can fairly easily detect whether the value you've got is close enough to tell:

(n+1)^2 = n^2 + 2n + 1
(n-1)^2 = n^2 - 2n + 1

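So having calculated n^2, the options are:

n^2 = target: done, return true
n^2 + 2n + 1 > target > n^2: you're close, but it's not perfect: return false
n^2 - 2n + 1 < target < n^2: ditto
target < n^2 - 2n + 1: binary chop on a lower n
target > n^2 + 2n + 1: binary chop on a higher n

(Sorry, this uses n as your current guess, and target for the parameter. Apologies for the confusion!)

I don't know whether this will be faster or not, but it's worth a try.

EDIT: The binary chop doesn't have to take in the whole range of integers either: (2^x)^2 = 2^(2x), so once you've found the top set bit in your target (which can be done with a bit-twiddling trick; I forget exactly how) you can quickly get a range of potential answers. Mind you, a naive binary chop is still only going to take up to 31 or 32 iterations.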
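The case analysis above can be turned into a short Java sketch. It is an illustration under my own assumptions rather than benchmarked code: the class name BinaryChopSquare is made up, the search range is fixed to [0, floor(sqrt(Long.MAX_VALUE))] instead of being narrowed from the top set bit, and the early exits use the two identities for (n+1)^2 and (n-1)^2.

```java
final class BinaryChopSquare {
    // floor(sqrt(Long.MAX_VALUE)); the square of any larger value overflows a long.
    static final long MAX_ROOT = 3037000499L;

    static boolean isPerfectSquare(long target) {
        if (target < 0) {
            return false;
        }
        long lo = 0;
        long hi = MAX_ROOT;
        while (lo <= hi) {
            long n = (lo + hi) >>> 1;
            long sq = n * n; // cannot overflow because n <= MAX_ROOT
            if (sq == target) {
                return true;
            }
            if (sq < target) {
                // n^2 < target < (n+1)^2: strictly between two squares.
                if (target - sq < 2 * n + 1) {
                    return false;
                }
                lo = n + 1;
            } else {
                // (n-1)^2 < target < n^2: strictly between two squares.
                if (sq - target < 2 * n - 1) {
                    return false;
                }
                hi = n - 1;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isPerfectSquare(49L));
        System.out.println(isPerfectSquare(50L));
    }
}
```

The early exits only save work when a guess happens to land next to the target, so the worst case is still about 32 iterations, as noted in the edit above.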

score 25 · Accepted Answer

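I ran my own analysis of several of the algorithms in this thread and came up with some new results. You can see the old results in the edit history of this answer, but they're not accurate, as I made a mistake and wasted time analyzing several algorithms which aren't close. However, pulling lessons from several different answers, I now have two algorithms that crush the "winner" of this thread. Here's the core thing I do differently than everyone else: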

// This is faster because a number is divisible by 2^4 or more only 6% of the time
// and more than that a vanishingly small percentage.
while((x & 0x3) == 0) x >>= 2;
// This is effectively the same as the switch-case statement used in the original
// answer. 
if((x & 0x7) != 1) return false;

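This simple line, which most of the time adds one or two very fast instructions, greatly simplifies the switch-case statement into one if statement. However, it can add to the runtime if many of the tested numbers have significant power-of-two factors.

The algorithms below are as follows:

Internet - Kip's posted answer
Durron - My modified answer using the one-pass answer as a base
DurronTwo - My modified answer using the two-pass answer (by @JohnnyHeggheim), with some other slight modifications.

Here is a sample runtime if the numbers are generated using Math.abs(java.util.Random.nextLong())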

 0% Scenario{vm=java, trial=0, benchmark=Internet} 39673.40 ns; σ=378.78 ns @ 3 trials
33% Scenario{vm=java, trial=0, benchmark=Durron} 37785.75 ns; σ=478.86 ns @ 10 trials
67% Scenario{vm=java, trial=0, benchmark=DurronTwo} 35978.10 ns; σ=734.10 ns @ 10 trials

benchmark   us linear runtime
 Internet 39.7 ==============================
   Durron 37.8 ============================
DurronTwo 36.0 ===========================

vm: java
trial: 0

And here is a sample runtime if it's run on the first million longs only:

 0% Scenario{vm=java, trial=0, benchmark=Internet} 2933380.84 ns; σ=56939.84 ns @ 10 trials
33% Scenario{vm=java, trial=0, benchmark=Durron} 2243266.81 ns; σ=50537.62 ns @ 10 trials
67% Scenario{vm=java, trial=0, benchmark=DurronTwo} 3159227.68 ns; σ=10766.22 ns @ 3 trials

benchmark   ms linear runtime
 Internet 2.93 ===========================
   Durron 2.24 =====================
DurronTwo 3.16 ==============================

vm: java
trial: 0

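As you can see, DurronTwo does better for large inputs, because it gets to use the magic trick very often, but it gets clobbered compared to the first algorithm and Math.sqrt when the numbers are so much smaller. Meanwhile, the simpler Durron is a huge winner because it never has to divide by 4 many times in the first million numbers.

Here's Durron: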

public final static boolean isPerfectSquareDurron(long n) {
    if(n < 0) return false;
    if(n == 0) return true;

    long x = n;
    // This is faster because a number is divisible by 16 only 6% of the time
    // and more than that a vanishingly small percentage.
    while((x & 0x3) == 0) x >>= 2;
    // This is effectively the same as the switch-case statement used in the original
    // answer. 
    if((x & 0x7) == 1) {

        long sqrt;
        if(x < 410881L)
        {
            int i;
            float x2, y;

            x2 = x * 0.5F;
            y  = x;
            i  = Float.floatToRawIntBits(y);
            i  = 0x5f3759df - ( i >> 1 );
            y  = Float.intBitsToFloat(i);
            y  = y * ( 1.5F - ( x2 * y * y ) );

            sqrt = (long)(1.0F/y);
        } else {
            sqrt = (long) Math.sqrt(x);
        }
        return sqrt*sqrt == x;
    }
    return false;
}

And DurronTwo:

public final static boolean isPerfectSquareDurronTwo(long n) {
    if(n < 0) return false;
    // Needed to prevent infinite loop
    if(n == 0) return true;

    long x = n;
    while((x & 0x3) == 0) x >>= 2;
    if((x & 0x7) == 1) {
        long sqrt;
        if (x < 41529141369L) {
            int i;
            float x2, y;

            x2 = x * 0.5F;
            y = x;
            i = Float.floatToRawIntBits(y);
            //using the magic number from 
            //http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf
            //since it more accurate
            i = 0x5f375a86 - (i >> 1);
            y = Float.intBitsToFloat(i);
            y = y * (1.5F - (x2 * y * y));
            y = y * (1.5F - (x2 * y * y)); //Newton iteration, more accurate
            sqrt = (long) ((1.0F/y) + 0.2);
        } else {
            //Carmack hack gives incorrect answer for n >= 41529141369.
            sqrt = (long) Math.sqrt(x);
        }
        return sqrt*sqrt == x;
    }
    return false;
}

And my benchmark harness (requires Google Caliper 0.1-rc5):

public class SquareRootBenchmark {
    public static class Benchmark1 extends SimpleBenchmark {
        private static final int ARRAY_SIZE = 10000;
        long[] trials = new long[ARRAY_SIZE];

        @Override
        protected void setUp() throws Exception {
            Random r = new Random();
            for (int i = 0; i < ARRAY_SIZE; i++) {
                trials[i] = Math.abs(r.nextLong());
            }
        }


        public int timeInternet(int reps) {
            int trues = 0;
            for(int i = 0; i < reps; i++) {
                for(int j = 0; j < ARRAY_SIZE; j++) {
                    if(SquareRootAlgs.isPerfectSquareInternet(trials[j])) trues++;
                }
            }

            return trues;   
        }

        public int timeDurron(int reps) {
            int trues = 0;
            for(int i = 0; i < reps; i++) {
                for(int j = 0; j < ARRAY_SIZE; j++) {
                    if(SquareRootAlgs.isPerfectSquareDurron(trials[j])) trues++;
                }
            }

            return trues;   
        }

        public int timeDurronTwo(int reps) {
            int trues = 0;
            for(int i = 0; i < reps; i++) {
                for(int j = 0; j < ARRAY_SIZE; j++) {
                    if(SquareRootAlgs.isPerfectSquareDurronTwo(trials[j])) trues++;
                }
            }

            return trues;   
        }
    }

    public static void main(String... args) {
        Runner.main(Benchmark1.class, args);
    }
}

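Update: I've made a new algorithm that is faster in some scenarios and slower in others; I've gotten different benchmarks based on different inputs. If we calculate modulo 0xFFFFFF = 3 x 3 x 5 x 7 x 13 x 17 x 241, we can eliminate 97.82% of numbers that cannot be squares. This can be (sort of) done in one line, with 5 bitwise operations: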

if (!goodLookupSquares[(int) ((n & 0xFFFFFFl) + ((n >> 24) & 0xFFFFFFl) + (n >> 48))]) return false;

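The resulting index is either 1) the residue, 2) the residue + 0xFFFFFF, or 3) the residue + 0x1FFFFFE. Of course, we need to have a lookup table for residues modulo 0xFFFFFF, which is about a 3 MB file (in this case stored as ASCII text decimal numbers, which is not optimal but clearly improvable with a ByteBuffer and so forth). But since that is precalculation, it doesn't matter so much. You can find the file here (or generate it yourself):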

public final static boolean isPerfectSquareDurronThree(long n) {
    if(n < 0) return false;
    if(n == 0) return true;

    long x = n;
    while((x & 0x3) == 0) x >>= 2;
    if((x & 0x7) == 1) {
        if (!goodLookupSquares[(int) ((n & 0xFFFFFFl) + ((n >> 24) & 0xFFFFFFl) + (n >> 48))]) return false;
        long sqrt;
        if(x < 410881L)
        {
            int i;
            float x2, y;

            x2 = x * 0.5F;
            y  = x;
            i  = Float.floatToRawIntBits(y);
            i  = 0x5f3759df - ( i >> 1 );
            y  = Float.intBitsToFloat(i);
            y  = y * ( 1.5F - ( x2 * y * y ) );

            sqrt = (long)(1.0F/y);
        } else {
            sqrt = (long) Math.sqrt(x);
        }
        return sqrt*sqrt == x;
    }
    return false;
}

I load it into a boolean array like this:

private static boolean[] goodLookupSquares = null;

public static void initGoodLookupSquares() throws Exception {
    Scanner s = new Scanner(new File("24residues_squares.txt"));

    // Sized so that residue + 0x1FFFFFE (written below) stays in bounds.
    goodLookupSquares = new boolean[3 * 0xFFFFFF];

    while(s.hasNextLine()) {
        int residue = Integer.valueOf(s.nextLine());
        goodLookupSquares[residue] = true;
        goodLookupSquares[residue + 0xFFFFFF] = true;
        goodLookupSquares[residue + 0x1FFFFFE] = true;
    }

    s.close();
}

A sample runtime. It beat Durron (version one) in every trial I ran.

 0% Scenario{vm=java, trial=0, benchmark=Internet} 40665.77 ns; σ=566.71 ns @ 10 trials
33% Scenario{vm=java, trial=0, benchmark=Durron} 38397.60 ns; σ=784.30 ns @ 10 trials
67% Scenario{vm=java, trial=0, benchmark=DurronThree} 36171.46 ns; σ=693.02 ns @ 10 trials

  benchmark   us linear runtime
   Internet 40.7 ==============================
     Durron 38.4 ============================
DurronThree 36.2 ==========================

vm: java
trial: 0

score 18 · Accepted Answer

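It should be much faster to use Newton's method to calculate the Integer Square Root, then square this number and check, as you do in your current solution. Newton's method is the basis for the Carmack solution mentioned in some other answers. You should be able to get a faster answer since you're only interested in the integer part of the root, allowing you to stop the approximation algorithm sooner.

Another optimization that you can try: if the Digital Root of a number is not 1, 4, 7, or 9, the number is not a perfect square. This can be used as a quick way to eliminate over half of your inputs (5 of the 9 possible digital roots) before applying the slower square root algorithm.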
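The digital-root check amounts to a test modulo 9, since the digital root of a positive n is 1 + (n - 1) % 9. A small Java sketch, with hypothetical names (DigitalRootFilter, digitalRoot, mayBeSquare) and treating 0 as its own digital root:

```java
final class DigitalRootFilter {
    /** Digital root of a non-negative n: repeated digit sum, 0 only for n == 0. */
    static int digitalRoot(long n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        return n == 0 ? 0 : (int) (1 + (n - 1) % 9);
    }

    /** False means n is certainly not a perfect square; true means "maybe". */
    static boolean mayBeSquare(long n) {
        if (n < 0) {
            return false;
        }
        int root = digitalRoot(n);
        return root == 0 || root == 1 || root == 4 || root == 7 || root == 9;
    }

    public static void main(String[] args) {
        System.out.println(digitalRoot(16L));
        System.out.println(mayBeSquare(15L));
    }
}
```

Note that this costs a division by 9, so whether it pays off against a hardware square root is something only a benchmark can tell.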

score 15 · Accepted Answer

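I want this function to work with all positive 64-bit signed integers

Math.sqrt() works with doubles as input parameters, so you won't get accurate results for integers bigger than 2^53.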
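To make the concern concrete: a long above 2^53 is rounded when it is converted to double, so the value passed to Math.sqrt() may not be the value you started with. One hedged way to stay exact is to treat the floating-point result only as an estimate and correct it in integer arithmetic. The sketch below is an illustration with made-up names (ExactSqrtCheck, floorSqrt, MAX_ROOT), not code from any answer here.

```java
final class ExactSqrtCheck {
    // floor(sqrt(Long.MAX_VALUE)); squaring anything larger overflows a long.
    static final long MAX_ROOT = 3037000499L;

    /** Exact floor of the square root of a non-negative long. */
    static long floorSqrt(long n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        // Floating-point estimate, clamped so that r * r cannot overflow.
        long r = Math.min((long) Math.sqrt((double) n), MAX_ROOT);
        while (r * r > n) {
            r--; // estimate was too high
        }
        while (r < MAX_ROOT && (r + 1) * (r + 1) <= n) {
            r++; // estimate was too low
        }
        return r;
    }

    static boolean isPerfectSquare(long n) {
        if (n < 0) {
            return false;
        }
        long r = floorSqrt(n);
        return r * r == n;
    }

    public static void main(String[] args) {
        long big = (1L << 53) + 1;
        System.out.println((long) (double) big == big); // shows the lossy conversion
        System.out.println(floorSqrt(Long.MAX_VALUE));
    }
}
```

The correction loops normally run zero or one time, so the cost over a bare cast is small.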

score 13 · Accepted Answer

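Just for the record, another approach is to use the prime decomposition. If every exponent in the decomposition is even, then the number is a perfect square. So what you want is to see if a number can be decomposed as a product of squares of prime numbers. Of course, you don't need to obtain such a decomposition, just to see if it exists.

First build a table of squares of prime numbers which are lower than 2^32. This is far smaller than a table of all integers up to this limit.

A solution would then be like this: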

boolean isPerfectSquare(long number)
{
    if (number < 0) return false;
    if (number < 2) return true;

    for (int i = 0; ; i++)
    {
        long square = squareTable[i];
        if (square > number) return false;
        while (number % square == 0)
        {
            number /= square;
        }
        if (number == 1) return true;
    }
}

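I guess it's a bit cryptic. What it does is check at every step whether the square of a prime number divides the input number. If it does, then it divides the number by the square as long as possible, to remove this square from the prime decomposition. If by this process we come to 1, then the input number was a product of squares of prime numbers. If the square becomes larger than the number itself, then there is no way this square, or any larger square, can divide it, so the number cannot be a product of squares of prime numbers.

Given that sqrt is nowadays done in hardware, and given the need to compute prime numbers here, I guess this solution is way slower. But it should give better results than the solution with sqrt, which won't work above 2^54, as mrzl says in his answer.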

score 13 · Accepted Answer

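An integer problem deserves an integer solution. Thus

Do binary search on the (non-negative) integers to find the greatest integer t such that t**2 <= n. Then test whether t**2 = n exactly. This takes time O(log n).

If you don't know how to binary search the positive integers because the set is unbounded, it's easy. You start by computing your increasing function f (above, f(t) = t**2 - n) on powers of two. When you see it turn positive, you've found an upper bound. Then you can do standard binary search.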

score 11 · Accepted Answer

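It's been pointed out that the last d digits of a perfect square can only take on certain values. The last d digits (in base b) of a number n are the same as the remainder when n is divided by b^d, i.e. in C notation n % pow(b, d).

This can be generalized to any modulus m, i.e. n % m can be used to rule out some percentage of numbers from being perfect squares. The modulus you are currently using is 64, which allows 12, i.e. 19% of remainders, as possible squares. With a little coding I found the modulus 110880, which allows only 2016, i.e. 1.8% of remainders, as possible squares. So depending on the cost of a modulus operation (i.e. division) and a table lookup versus a square root on your machine, using this modulus might be faster.

By the way, if Java has a way to store a packed array of bits for the lookup table, don't use it. 110880 32-bit words is not much RAM these days, and fetching a machine word is going to be faster than fetching a single bit.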
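The residue counts quoted above are easy to reproduce. Here is a small Java sketch (the names ResidueCounter, squareTable and countSquares are my own) that builds the lookup table for a modulus m and counts how many remainders can belong to a square:

```java
final class ResidueCounter {
    /** table[r] is true when r is the remainder of some square modulo m. */
    static boolean[] squareTable(int m) {
        boolean[] table = new boolean[m];
        for (long r = 0; r < m; r++) {
            // long arithmetic: r * r exceeds the int range for m = 110880
            table[(int) ((r * r) % m)] = true;
        }
        return table;
    }

    static int countSquares(int m) {
        int count = 0;
        for (boolean isSquare : squareTable(m)) {
            if (isSquare) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countSquares(64));
        System.out.println(countSquares(110880));
    }
}
```

A filter would then reject n whenever squareTable(m)[(int) (n % m)] is false, at the cost of one division and one array access per call.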

score 10 · Accepted Answer

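The following simplification of maaartinus's solution appears to shave a few percentage points off the runtime, but I'm not good enough at benchmarking to produce a benchmark I can trust: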

long goodMask; // 0xC840C04048404040 computed below
{
    for (int i=0; i<64; ++i) goodMask |= Long.MIN_VALUE >>> (i*i);
}

public boolean isSquare(long x) {
    // This tests if the 6 least significant bits are right.
    // Moving the to be tested bit to the highest position saves us masking.
    if (goodMask << x >= 0) return false;
    // Remove an even number of trailing zeros, leaving at most one.
    x >>= (Long.numberOfTrailingZeros(x) & (-2));
    // Repeat the test on the 6 least significant remaining bits.
    if (goodMask << x >= 0 | x <= 0) return x == 0;
    // Do it in the classical way.
    // The correctness is not trivial as the conversion from long to double is lossy!
    final long tst = (long) Math.sqrt(x);
    return tst * tst == x;
}

It would be worth checking how leaving out the first test,

if (goodMask << x >= 0) return false;

would affect performance.

score 9 · Accepted Answer

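For performance, you very often have to make some compromises. Others have expressed various methods; however, you noted that Carmack's hack was faster up to certain values of N. So you should check n, and if it is less than that number N, use Carmack's hack, else use some other method described in the answers here.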

score 8 · Accepted Answer

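This is the fastest Java implementation I could come up with, using a combination of techniques suggested by others in this thread.

Mod-256 test
Inexact mod-3465 test (avoids integer division at the cost of some false positives)
Floating-point square root, round and compare with input value

I also experimented with these modifications, but they did not help performance:

Additional mod-255 test
Dividing the input value by powers of 4
Fast Inverse Square Root (to work for high values of N it needs 3 iterations, enough to make it slower than the hardware square root function.)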

public class SquareTester {

    public static boolean isPerfectSquare(long n) {
        if (n < 0) {
            return false;
        } else {
            switch ((byte) n) {
            case -128: case -127: case -124: case -119: case -112:
            case -111: case -103: case  -95: case  -92: case  -87:
            case  -79: case  -71: case  -64: case  -63: case  -60:
            case  -55: case  -47: case  -39: case  -31: case  -28:
            case  -23: case  -15: case   -7: case    0: case    1:
            case    4: case    9: case   16: case   17: case   25:
            case   33: case   36: case   41: case   49: case   57:
            case   64: case   65: case   68: case   73: case   81:
            case   89: case   97: case  100: case  105: case  113:
            case  121:
                long i = (n * INV3465) >>> 52;
                if (! good3465[(int) i]) {
                    return false;
                } else {
                    long r = round(Math.sqrt(n));
                    return r*r == n; 
                }
            default:
                return false;
            }
        }
    }

    private static long round(double x) {
        // Keep the low 52 mantissa bits; an int cast would wrap for roots >= 2^31.
        return Double.doubleToRawLongBits(x + (double) (1L << 52)) & ((1L << 52) - 1);
    }

    /** 3465<sup>-1</sup> modulo 2<sup>64</sup> */
    private static final long INV3465 = 0x8ffed161732e78b9L;

    private static final boolean[] good3465 =
        new boolean[0x1000];

    static {
        for (int r = 0; r < 3465; ++ r) {
            int i = (int) ((r * r * INV3465) >>> 52);
            good3465[i] = good3465[i+1] = true;
        }
    }

}

score 7 · Accepted Answer

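You should get rid of the 2-power part of N right from the start.

2nd Edit: The magical expression for m below should be

m = N - (N & (N-1));

and not as written.

End of 2nd edit

m = N & (N-1); // the lowest bit of N
N /= m;
byte = N & 0x0F;
if ((m % 2) || (byte !=1 && byte !=9))
  return false;

1st Edit:

Minor improvement:

m = N & (N-1); // the lowest bit of N
N /= m;
if ((m % 2) || ((N & 0x07) != 1))
  return false;

End of 1st edit

Now continue as usual. This way, by the time you get to the floating point part, you have already got rid of all the numbers whose 2-power part is odd (about half), and then you only consider 1/8 of what's left. I.e. you run the floating point part on 6% of the numbers.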
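For readers who want to try the idea, here is how the prefilter might look in Java, using the corrected lowest-bit expression N - (N & (N-1)) from the second edit and testing the parity of the bit position rather than m % 2. This is my own hedged reading of the intent; PowerOfTwoPrefilter, lowestSetBit and mayBeSquare are hypothetical names.

```java
final class PowerOfTwoPrefilter {
    /** Lowest set bit of n, e.g. 12 -> 4; agrees with Long.lowestOneBit(n). */
    static long lowestSetBit(long n) {
        return n - (n & (n - 1));
    }

    /** False means n is certainly not a perfect square; true means "maybe". */
    static boolean mayBeSquare(long n) {
        if (n <= 0) {
            return n == 0;
        }
        int shift = Long.numberOfTrailingZeros(lowestSetBit(n));
        if ((shift & 1) != 0) {
            return false; // an odd power of two divides n exactly
        }
        // The remaining odd part of a square must be 1 modulo 8.
        return ((n >>> shift) & 7) == 1;
    }

    public static void main(String[] args) {
        System.out.println(lowestSetBit(12L));
        System.out.println(mayBeSquare(36L));
    }
}
```

Values that pass (17, for example) still need the exact floating-point or integer test afterwards.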

score 7 · Accepted Answer

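Project Euler is mentioned in the tags, and many of the problems in it require checking numbers >> 2^64. Most of the optimizations mentioned above don't work easily when you are working with an 80-byte buffer.

I used Java BigInteger and a slightly modified version of Newton's method, one that works better with integers. The problem was that exact squares n^2 converged to (n-1) instead of n because n^2-1 = (n-1)(n+1), and the final error was just one step below the final divisor, so the algorithm terminated. It was easy to fix by adding one to the original argument before computing the error. (Add two for cube roots, etc.)

One nice attribute of this algorithm is that you can immediately tell if the number is a perfect square: the final error (not the correction) in Newton's method will be zero. A simple modification also lets you quickly calculate floor(sqrt(x)) instead of the closest integer. This is handy for several Euler problems.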
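The answer does not show its code, so the following is only a sketch of a common integer Newton iteration for BigInteger, not the author's exact variant (it does not use the "add one to the argument" fix; instead it starts above the root and stops as soon as the iterate stops decreasing, which yields floor(sqrt(n))). BigSquareSketch, floorSqrt and isPerfectSquare are made-up names.

```java
import java.math.BigInteger;

final class BigSquareSketch {
    /** floor(sqrt(n)) for n >= 0, by a monotonically decreasing Newton iteration. */
    static BigInteger floorSqrt(BigInteger n) {
        if (n.signum() < 0) {
            throw new ArithmeticException("negative input");
        }
        if (n.signum() == 0) {
            return BigInteger.ZERO;
        }
        // 2^ceil(bitLength / 2) is guaranteed to be >= sqrt(n).
        BigInteger x = BigInteger.ONE.shiftLeft((n.bitLength() + 1) / 2);
        while (true) {
            BigInteger y = x.add(n.divide(x)).shiftRight(1); // (x + n/x) / 2
            if (y.compareTo(x) >= 0) {
                return x; // no further decrease: x is the floor of the root
            }
            x = y;
        }
    }

    static boolean isPerfectSquare(BigInteger n) {
        if (n.signum() < 0) {
            return false;
        }
        BigInteger r = floorSqrt(n);
        return r.multiply(r).equals(n);
    }

    public static void main(String[] args) {
        BigInteger big = BigInteger.TEN.pow(40);
        System.out.println(isPerfectSquare(big));
        System.out.println(isPerfectSquare(big.add(BigInteger.ONE)));
    }
}
```

On Java 9 and later, BigInteger.sqrt() provides the floor square root directly, so a hand-written loop like this is mainly useful on older runtimes or for experimenting with the iteration.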

score 6 · Accepted Answer

This is a rework from decimal to binary of the old Marchant calculator algorithm (sorry, I don't have a reference), in Ruby, adapted specifically for this question:

def isexactsqrt(v)
    value = v.abs
    residue = value
    root = 0
    onebit = 1
    onebit <<= 8 while (onebit < residue)
    onebit >>= 2 while (onebit > residue)
    while (onebit > 0)
        x = root + onebit
        if (residue >= x) then
            residue -= x
            root = x + onebit
        end
        root >>= 1
        onebit >>= 2
    end
    return (residue == 0)
end

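Here's a workup of something similar (please don't vote me down for coding style/smells or clunky O/O; it's the algorithm that counts, and C++ is not my home language). In this case, we're looking for residue == 0: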

#include <iostream>  

using namespace std;  
typedef unsigned long long int llint;

class ISqrt {           // Integer Square Root
    llint value;        // Integer whose square root is required
    llint root;         // Result: floor(sqrt(value))
    llint residue;      // Result: value-root*root
    llint onebit, x;    // Working bit, working value

public:

    ISqrt(llint v = 2) {    // Constructor
        Root(v);            // Take the root 
    };

    llint Root(llint r) {   // Resets and calculates new square root
        value = r;          // Store input
        residue = value;    // Initialise for subtracting down
        root = 0;           // Clear root accumulator

        onebit = 1;                 // Calculate start value of counter
        onebit <<= (8*sizeof(llint)-2);         // Set up counter bit as greatest odd power of 2 
        while (onebit > residue) {onebit >>= 2; };  // Shift down until just < value

        while (onebit > 0) {
            x = root ^ onebit;          // Will check root+1bit (root bit corresponding to onebit is always zero)
            if (residue >= x) {         // Room to subtract?
                residue -= x;           // Yes - deduct from residue
                root = x + onebit;      // and step root
            };
            root >>= 1;
            onebit >>= 2;
        };
        return root;                    
    };
    llint Residue() {           // Returns residue from last calculation
        return residue;                 
    };
};

int main() {
    llint big, i, q, r, v, delta;
    big = 0; big = (big-1);         // Kludge for "big number"
    ISqrt b;                            // Make q sqrt generator
    for ( i = big; i > 0 ; i /= 7 ) {   // for several numbers
        q = b.Root(i);                  // Get the square root
        r = b.Residue();                // Get the residue
        v = q*q+r;                      // Recalc original value
        delta = v-i;                    // And diff, hopefully 0
        cout << i << ": " << q << " ++ " << r << " V: " << v << " Delta: " << delta << "\n";
    };
    return 0;
};

score 6 · Accepted Answer

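The sqrt call is not perfectly accurate, as has been mentioned, but it's interesting and instructive that it doesn't blow away the other answers in terms of speed. After all, the sequence of assembly language instructions for a sqrt is tiny. Intel has a hardware instruction, which I believe Java doesn't use because it doesn't conform to IEEE.

So why is it slow? Because Java is actually calling a C routine through JNI, and it's actually slower to do so than to call a Java subroutine, which itself is slower than doing it inline. This is very annoying, and Java should have come up with a better solution, i.e. building in floating-point library calls if necessary. Oh well.

In C++, I suspect that all the complex alternatives would lose on speed, but I haven't checked them all. What I did, and what Java people will find useful, is a simple hack, an extension of the special-case testing suggested by A. Rex. Use a single long value as a bit array, which isn't bounds checked. That way, you have a 64-bit boolean lookup.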

typedef unsigned long long UVLONG;
UVLONG pp1,pp2;

void init2() {
  for (int i = 0; i < 64; i++) {
    for (int j = 0; j < 64; j++)
      if (isPerfectSquare(i * 64 + j)) {
    pp1 |= (1ULL << j);   // 64-bit shift; a plain int 1 overflows for large j
    pp2 |= (1ULL << i);
    break;
      }
   }
   cout << "pp1=" << pp1 << "," << pp2 << "\n";  
}


inline bool isPerfectSquare5(UVLONG x) {
  return pp1 & (1ULL << (x & 0x3F)) ? isPerfectSquare(x) : false;
}

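The routine isPerfectSquare5 runs in about 1/3 the time on my Core 2 Duo machine. I suspect that further tweaks along the same lines could reduce the time further on average, but every time you check, you are trading off more testing for more eliminating, so you can't go too much farther down that road.

Certainly, rather than having a separate test for negative numbers, you could check the high 6 bits the same way.

Note that all I'm doing is eliminating possible squares; when I have a potential case, I still have to call the original, inlined isPerfectSquare.

The init2 routine is called once to initialize the static values of pp1 and pp2. Note that in my implementation in C++ I'm using unsigned long long, so since you're signed, you'd have to use the >>> operator.

There is no intrinsic need to bounds check the array, but Java's optimizer has to figure this stuff out pretty quickly, so I don't blame them for that.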

score 6 · Accepted Answer

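I like the idea of using an almost correct method on some of the input. Here is a version with a higher "offset". The code seems to work and passes my simple test case.

Just replace your:

if(n < 410881L){...}

with this code: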

if (n < 11043908100L) {
    //John Carmack hack, converted to Java.
    // See: http://www.codemaestro.com/reviews/9
    int i;
    float x2, y;

    x2 = n * 0.5F;
    y = n;
    i = Float.floatToRawIntBits(y);
    //using the magic number from 
    //http://www.lomont.org/Math/Papers/2003/InvSqrt.pdf
    //since it more accurate
    i = 0x5f375a86 - (i >> 1);
    y = Float.intBitsToFloat(i);
    y = y * (1.5F - (x2 * y * y));
    y = y * (1.5F - (x2 * y * y)); //Newton iteration, more accurate

    sqrt = Math.round(1.0F / y);
} else {
    //Carmack hack gives incorrect answer for n >= 11043908100.
    sqrt = (long) Math.sqrt(n);
}

score 6 · Accepted Answer

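Considering a general bit length (though I have used a specific type here), I tried to design a simplistic algorithm, shown below. A simple and obvious check for 0, 1, 2 or <0 is required initially. The following is simple in the sense that it doesn't try to use any existing maths functions. Most of the operators can be replaced with bit-wise operators. I haven't tested it with any benchmark data, though. I'm an expert neither in maths nor in computer algorithm design in particular, so I would love to see you point out problems. I know there is lots of room for improvement.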

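#include <stdio.h>

int main(void)
{
    unsigned int c1 = 0, c2 = 0;  /* c1: upper bound on the root, c2: lower bound */
    unsigned int x = 0;
    unsigned int p = 0;
    int k1 = 0;                   /* set to 1 once an exact root is found */
    if (scanf("%u", &p) != 1)
        return 1;
    /* Start from p/2, rounded up. */
    if (p % 2 == 0) {
        x = p / 2;
    }
    else {
        x = (p / 2) + 1;
    }
    c1 = x + 1;  /* keeps the upper bound valid when the first guess is already small enough, e.g. p = 4 */
    /* Halve the guess until its square no longer exceeds p. */
    while (x)
    {
        if ((unsigned long long)x * x > p) {  /* widen so the product cannot overflow */
            c1 = x;
            x = x / 2;
        } else {
            c2 = x;
            break;
        }
    }
    if ((p % 2) != 0)
        c2++;

    /* Linear scan between the two bounds. */
    while (c2 < c1)
    {
        if ((unsigned long long)c2 * c2 == p) {
            k1 = 1;
            break;
        }
        c2++;
    }
    if (k1)
        printf("\n Perfect square for %u", c2);
    else
        printf("\n Not perfect but nearest to :%u :", c2);
    return 0;
}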

score 5 · Accepted Answer

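I checked all of the possible results when the last n bits of a square are observed. By successively examining more bits, up to 5/6 of inputs can be eliminated. I actually designed this to implement Fermat's factorization algorithm, and it is very fast there.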

public static boolean isSquare(final long val) {
   if ((val & 2) == 2 || (val & 7) == 5) {
     return false;
   }
   if ((val & 11) == 8 || (val & 31) == 20) {
     return false;
   }

   if ((val & 47) == 32 || (val & 127) == 80) {
     return false;
   }

   if ((val & 191) == 128 || (val & 511) == 320) {
     return false;
   }

   // if ((val & a) == b || (val & c) == d) {
   //   return false;
   // }

   if (!modSq[(int) (val % modSq.length)]) {
        return false;
   }

   final long root = (long) Math.sqrt(val);
   return root * root == val;
}

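The last bit of pseudocode can be used to extend the tests to eliminate more values. The tests above are for k = 0, 1, 2, 3.

a is of the form (3 << 2k) - 1

b is of the form (2 << 2k)

c is of the form (2 << 2k + 2) - 1

d is of the form 5 << 2k (that is, 5, 20, 80, 320 for k = 0, 1, 2, 3)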
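
As a sanity check on the pattern, the masks can be generated rather than typed by hand. The sketch below is illustrative only: the class LowBitPatterns and its method names are hypothetical, and d is read as 5 << 2k because that is what reproduces the constants 5, 20, 80 and 320 used in isSquare.

```java
final class LowBitPatterns {
    /** Returns {a, b, c, d} for step k. */
    static long[] pattern(int k) {
        long a = (3L << (2 * k)) - 1;      // 2, 11, 47, 191 for k = 0..3
        long b = 2L << (2 * k);            // 2, 8, 32, 128
        long c = (2L << (2 * k + 2)) - 1;  // 7, 31, 127, 511
        long d = 5L << (2 * k);            // 5, 20, 80, 320
        return new long[] {a, b, c, d};
    }

    /** True when the low bits of val match a "never a square" pattern for some k < steps. */
    static boolean ruledOut(long val, int steps) {
        for (int k = 0; k < steps; k++) {
            long[] p = pattern(k);
            if ((val & p[0]) == p[1] || (val & p[2]) == p[3]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // No genuine square should ever be ruled out.
        for (long r = 0; r <= 100_000; r++) {
            if (ruledOut(r * r, 4)) {
                throw new AssertionError("square rejected: " + (r * r));
            }
        }
        System.out.println(java.util.Arrays.toString(pattern(3))); // [191, 128, 511, 320]
    }
}
```

Each step k says: if val is 4^k times m, then m must not be 2 or 3 modulo 4 (the a/b test) and must not be 5 modulo 8 (the c/d test).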

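It first tests whether the value has a square residue modulo powers of two, then it tests against a final modulus, and then it uses Math.sqrt for the final test. I took the idea from the top post and attempted to extend it. I appreciate any comments or suggestions.

Update: using the test by a modulus (modSq) with a modulus base of 44352, my test runs in 96% of the time of the one in the OP's update for numbers up to 1,000,000,000.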
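
The answer does not show how modSq is filled. One plausible construction, sketched here as an assumption (the class ModSquareTable and the helper buildTable are hypothetical names), marks every residue that some k*k actually reaches modulo 44352:

```java
final class ModSquareTable {
    static final int MODULUS = 44352; // 2^6 * 3^2 * 7 * 11, the base quoted above

    /** modSq[r] is true when r is a square residue modulo MODULUS. */
    static final boolean[] modSq = buildTable(MODULUS);

    static boolean[] buildTable(int modulus) {
        boolean[] table = new boolean[modulus];
        // k*k mod m only depends on k mod m, so k = 0..m-1 reaches every square residue.
        for (long k = 0; k < modulus; k++) {
            table[(int) ((k * k) % modulus)] = true;
        }
        return table;
    }

    static boolean isSquare(long val) {
        if (val < 0) {
            return false;
        }
        if (!modSq[(int) (val % modSq.length)]) {
            return false; // not a square residue, so not a square
        }
        long root = (long) Math.sqrt((double) val);
        return root * root == val;
    }

    public static void main(String[] args) {
        System.out.println(isSquare(1524155677489L)); // true, this is 1234567 squared
        System.out.println(isSquare(1524155677490L)); // false
    }
}
```

The table costs about 44 KB as a boolean[]; packing it into a BitSet or long[] would trade a little bit arithmetic for a smaller cache footprint.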

score 2 · Accepted Answer

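Newton's Method with integer arithmetic

If you wish to avoid non-integer operations you could use the method below. It basically uses Newton's Method modified for integer arithmetic.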

/**
 * Test if the given number is a perfect square.
 * @param n Must be greater than 0 and less
 *    than Long.MAX_VALUE.
 * @return <code>true</code> if n is a perfect
 *    square, or <code>false</code> otherwise.
 */
public static boolean isSquare(long n)
{
    long x1 = n;
    long x2 = 1L;

    while (x1 > x2)
    {
        x1 = (x1 + x2) / 2L;
        x2 = n / x1;
    }

    return x1 == x2 && n % x1 == 0L;
}

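This implementation cannot compete with solutions that use Math.sqrt. However, its performance can be improved by using the filtering mechanisms described in some of the other posts.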

score 2 · Accepted Answer

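Here is a divide-and-conquer solution.

If the square root of a natural number (number) is a natural number (solution), you can easily determine a range for solution based on the number of digits of number:

number has 1 digit: solution in range = 1 - 4
number has 2 digits: solution in range = 3 - 10
number has 3 digits: solution in range = 10 - 40
number has 4 digits: solution in range = 30 - 100
number has 5 digits: solution in range = 100 - 400

Notice the repetition?

You can use this range in a binary search to see whether there is a solution for which:

number == solution * solution

Here is the code.

Here is my class SquareRootChecker: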

public class SquareRootChecker {

    private long number;
    private long initialLow;
    private long initialHigh;

    public SquareRootChecker(long number) {
        this.number = number;

        initialLow = 1;
        initialHigh = 4;
        if (Long.toString(number).length() % 2 == 0) {
            initialLow = 3;
            initialHigh = 10;
        }
        for (long i = 0; i < Long.toString(number).length() / 2; i++) {
            initialLow *= 10;
            initialHigh *= 10;
        }
        if (Long.toString(number).length() % 2 == 0) {
            initialLow /= 10;
            initialHigh /=10;
        }
    }

    public boolean checkSquareRoot() {
        return findSquareRoot(initialLow, initialHigh, number);
    }

    private boolean findSquareRoot(long low, long high, long number) {
        long check = low + (high - low) / 2;
        if (high >= low) {
            if (number == check * check) {
                return true;
            }
            else if (number < check * check) {
                high = check - 1;
                return findSquareRoot(low, high, number);
            }
            else  {
                low = check + 1;
                return findSquareRoot(low, high, number);
            }
        }
        return false;
    }

}

And here is an example of how to use it.

long number =  1234567;
long square = number * number;
SquareRootChecker squareRootChecker = new SquareRootChecker(square);
System.out.println(square + ": " + squareRootChecker.checkSquareRoot()); //Prints "1524155677489: true"

long notSquare = square + 1;
squareRootChecker = new SquareRootChecker(notSquare);
System.out.println(notSquare + ": " + squareRootChecker.checkSquareRoot()); //Prints "1524155677490: false"

score 2 · Accepted Answer

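Square root of a number, given that the number is a perfect square.

The complexity is O(√n), since the loop below runs once for each unit of the root.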

/**
 * Calculate square root if the given number is a perfect square.
 * 
 * Approach: the sum of the first n odd numbers equals n*n, so counting how
 * many odd numbers are needed to reach the input gives its square root.
 *
 * @param number
 * @return squareRoot
 */

public static int calculateSquareRoot(int number) {

    int sum=1;
    int count =1;
    int squareRoot=1;
    while(sum<number) {
        count+=2;
        sum+=count;
        squareRoot++;
    }
    return squareRoot;
}
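
The routine above assumes its input is already a perfect square. A small variation, sketched here under hypothetical names (OddSumSquareTest is not part of the original answer), uses the same odd-number sum to test for squareness instead. It is only practical for small inputs, because the number of iterations equals the root.

```java
final class OddSumSquareTest {
    static boolean isPerfectSquare(long number) {
        if (number < 0) {
            return false;
        }
        long sum = 0; // 1 + 3 + 5 + ... taken so far; always a perfect square
        long odd = 1; // next odd number to add
        // Comparing against the remaining distance avoids overflowing sum.
        while (number - sum >= odd) {
            sum += odd;
            odd += 2;
        }
        // sum is now the largest square not exceeding number.
        return sum == number;
    }

    public static void main(String[] args) {
        System.out.println(isPerfectSquare(144)); // true
        System.out.println(isPerfectSquare(145)); // false
    }
}
```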

score 1 · Accepted Answer

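If speed is a concern, why not partition off the most commonly used set of inputs and their values into a lookup table, and then use whatever optimized magic algorithm you have come up with for the exceptional cases?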
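
A minimal sketch of that split, assuming the "common" inputs are simply the small ones: the cut-off TABLE_LIMIT and the class name TableThenFallback are made up for illustration, and the fallback is the plain Math.sqrt test from the question rather than any particular optimized algorithm.

```java
import java.util.BitSet;

final class TableThenFallback {
    static final int TABLE_LIMIT = 1 << 20; // hypothetical cut-off for "common" inputs

    // Bit n is set when n is a perfect square below TABLE_LIMIT.
    private static final BitSet SMALL_SQUARES = new BitSet(TABLE_LIMIT);

    static {
        for (int k = 0; (long) k * k < TABLE_LIMIT; k++) {
            SMALL_SQUARES.set(k * k);
        }
    }

    static boolean isPerfectSquare(long n) {
        if (n < 0) {
            return false;
        }
        if (n < TABLE_LIMIT) {
            return SMALL_SQUARES.get((int) n); // table hit, no arithmetic needed
        }
        long root = (long) Math.sqrt((double) n); // fallback for everything else
        return root * root == n;
    }

    public static void main(String[] args) {
        System.out.println(isPerfectSquare(1046529L)); // true, 1023 squared (table)
        System.out.println(isPerfectSquare(1048576L)); // true, 1024 squared (fallback)
    }
}
```

Whether this beats a bare Math.sqrt depends on how often inputs fall below the cut-off and on whether the table stays in cache, so it would need measuring.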

score 1 · Accepted Answer

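It ought to be possible to pack the "cannot be a perfect square if the last X digits are N" test much more efficiently than that! I'll use Java 32-bit ints and produce enough data to check the last 16 bits of the number: that's 2048 hexadecimal int values.

...

OK. Either I have run into some number theory that is a little beyond me, or there is a bug in my code. In any case, here is the code: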

import java.util.BitSet;

public static void main(String[] args) {
    final int BITS = 16;

    BitSet foo = new BitSet();

    for(int i = 0; i< (1<<BITS); i++) {
        int sq = (i*i);
        sq = sq & ((1<<BITS)-1);
        foo.set(sq);
    }

    System.out.println("int[] mayBeASquare = {");

    for(int i = 0; i< 1<<(BITS-5); i++) {
        int kk = 0;
        for(int j = 0; j<32; j++) {
            if(foo.get((i << 5) | j)) {
                kk |= 1<<j;
            }
        }
        System.out.print("0x" + Integer.toHexString(kk) + ", ");
        if(i%8 == 7) System.out.println();
    }
    System.out.println("};");
}

And here are the results:

(ed: elided for poor performance in prettify.js; view revision history to see.)
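
Since the generated table itself is elided, here is a hedged sketch of how such a packed table could be built in memory and consulted. The class Low16BitsTable and the method mayBeASquare are hypothetical names; the packing (32 residues per int, 2048 ints) follows the generator above.

```java
final class Low16BitsTable {
    static final int BITS = 16;

    // One bit per 16-bit residue, 32 residues per int: 2048 ints in total.
    static final int[] MAY_BE_A_SQUARE = build();

    static int[] build() {
        int[] table = new int[1 << (BITS - 5)];
        for (int i = 0; i < (1 << BITS); i++) {
            // int overflow in i * i is harmless: only the low 16 bits are kept.
            int sq = (i * i) & ((1 << BITS) - 1);
            table[sq >>> 5] |= 1 << (sq & 31);
        }
        return table;
    }

    /** False means n is certainly not a perfect square; true means it still might be. */
    static boolean mayBeASquare(long n) {
        int low = (int) (n & ((1 << BITS) - 1));
        return (MAY_BE_A_SQUARE[low >>> 5] & (1 << (low & 31))) != 0;
    }

    public static void main(String[] args) {
        System.out.println(MAY_BE_A_SQUARE.length); // 2048
        System.out.println(mayBeASquare(3L));       // false
    }
}
```

A true result still needs a real square-root check afterwards; the table only filters.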

score 1 · Accepted Answer

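"I'm looking for the fastest way to determine if a long value is a perfect square (i.e. its square root is another integer)."

The answers are impressive, but I failed to see a simple check:

Check whether the rightmost digit of the long is a member of the set (0,1,4,5,6,9). If it is not, then it cannot possibly be a "perfect square".

E.g.

4567 - cannot be a perfect square.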
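
In Java the check could be sketched like this (LastDigitFilter and its method names are illustrative, and the final Math.sqrt test is borrowed from the question):

```java
final class LastDigitFilter {
    /** True when the last decimal digit is one that a perfect square can end in. */
    static boolean lastDigitAllowsSquare(long n) {
        switch ((int) (n % 10)) {
            case 0: case 1: case 4: case 5: case 6: case 9:
                return true;
            default:
                return false; // 2, 3, 7, 8, or a negative remainder
        }
    }

    static boolean isPerfectSquare(long n) {
        if (n < 0 || !lastDigitAllowsSquare(n)) {
            return false;
        }
        long root = (long) Math.sqrt((double) n);
        return root * root == n;
    }

    public static void main(String[] args) {
        System.out.println(isPerfectSquare(4567L)); // false, rejected by its last digit
        System.out.println(isPerfectSquare(4489L)); // true, 67 squared
    }
}
```

Note that n % 10 is a real division, so this filter removes only 40% of inputs and may cost more than the bitwise checks discussed in other answers.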

score 1 · Accepted Answer

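This is the simplest and most concise way, although I do not know how it compares in terms of CPU cycles. It works well if you only want to know whether the root is a whole number. If you also need the root itself, you can get that from the same computation. Here is a simple (and pure) function: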

import java.math.BigDecimal;
import java.math.MathContext;
import java.util.function.Function;

private static final MathContext precision = new MathContext(20);

private static final Function<Long, Boolean> isRootWhole = (n) -> {
    long digit = n % 10;
    if (digit == 2 || digit == 3 || digit == 7 || digit == 8) {
        return false;
    }
    return new BigDecimal(n).sqrt(precision).scale() == 0;
};

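If you do not need micro-optimization, this answer is better in terms of simplicity and maintainability. If you will be working with negative numbers, you will need to handle that accordingly and send the absolute value into the function. I have included a minor optimization, because no perfect square has a last digit of 2, 3, 7 or 8, due to quadratic residues mod 10.

On my CPU, a run of this algorithm over 0 - 10,000,000 took an average of 1000 - 1100 nanoseconds per calculation.

If you perform a smaller number of calculations, the earlier calculations take a bit longer.

I had a negative comment saying that my previous edit did not work for large numbers. The OP mentioned Longs, and the largest perfect square that fits in a Long is 9223372030926249001, so this method works for all Longs.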

score 1 · Accepted Answer

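Calculating square roots by Newton's method is horrendously fast ... provided that the starting value is reasonable. Without a reasonable starting value, however, in practice we end up with bisection and log(2^64) behaviour.
To be really fast we need a fast way to get a reasonable starting value, and that means we need to descend into machine language. If the processor provides an instruction that counts leading zeroes (on x86 that is LZCNT or BSR; POPCNT counts set bits instead), we can use it to get a starting value with half the significant bits. With care we can find a fixed number of Newton steps that will always suffice. (That removes the need to loop and gives very fast execution.)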
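
In Java there is no need for machine language to try this, since Long.numberOfLeadingZeros exposes the bit length directly. The sketch below is an illustration under that assumption: the class NewtonWithBitLengthSeed is a hypothetical name, and it keeps a loop with a termination test rather than the fixed step count suggested above.

```java
final class NewtonWithBitLengthSeed {
    /** Returns floor(sqrt(n)) for n >= 0 using integer Newton iteration. */
    static long isqrt(long n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        if (n < 2) {
            return n;
        }
        int bits = 64 - Long.numberOfLeadingZeros(n); // bit length of n
        // Seed with half the significant bits: a power of two that is >= sqrt(n).
        long x = 1L << ((bits + 1) / 2);
        while (true) {
            long y = (x + n / x) >>> 1; // one Newton step, rounded down
            if (y >= x) {
                return x; // the sequence stopped decreasing: x is the floor root
            }
            x = y;
        }
    }

    static boolean isPerfectSquare(long n) {
        if (n < 0) {
            return false;
        }
        long r = isqrt(n);
        return r * r == n;
    }

    public static void main(String[] args) {
        System.out.println(isqrt(16L));           // 4
        System.out.println(isPerfectSquare(15L)); // false
    }
}
```

Because the seed is never below the true root, the iterates decrease monotonically and the first non-decreasing step marks the answer.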

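A second solution goes via the floating-point facility, which may have a fast sqrt calculation (like the i87 coprocessor). Even an excursion via exp() and log() may be faster than Newton degenerated into a binary search. There is a tricky aspect to this: a processor-dependent analysis of whether refinement is needed afterwards, and if so which.

A third solution solves a slightly different problem, but is well worth mentioning because the situation is described in the question. If you want to calculate a great many square roots for numbers that differ only slightly, you can use Newton iteration without ever reinitialising the starting value: just leave it where the previous calculation left off. I've used this with success in at least one Euler problem.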
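
A sketch of that warm-start idea, with hypothetical names (WarmStartSqrt, guess): the object remembers the last root and starts the next Newton iteration from it. One extra step is taken first because the remembered value may lie below the new root.

```java
final class WarmStartSqrt {
    private long guess = 1; // last root found, reused as the next starting value

    /** Returns floor(sqrt(n)) for n >= 0, starting from the previous result. */
    long isqrt(long n) {
        if (n < 0) {
            throw new IllegalArgumentException("n must be non-negative");
        }
        if (n == 0) {
            return 0; // keep the stored guess positive
        }
        // One Newton step from any positive guess lands at or above floor(sqrt(n)).
        long x = average(guess, n / guess);
        // From there the iterates decrease until they reach floor(sqrt(n)).
        while (true) {
            long y = average(x, n / x);
            if (y >= x) {
                break;
            }
            x = y;
        }
        guess = x;
        return x;
    }

    /** floor((a + b) / 2) for non-negative a and b, without overflow. */
    private static long average(long a, long b) {
        return (a >>> 1) + (b >>> 1) + (a & b & 1);
    }

    public static void main(String[] args) {
        WarmStartSqrt s = new WarmStartSqrt();
        System.out.println(s.isqrt(1_000_000L)); // 1000
        System.out.println(s.isqrt(1_000_001L)); // 1000, reached in very few steps
    }
}
```

When consecutive inputs are close together, the loop typically exits after one or two iterations.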

score 1 · Accepted Answer

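Maybe the best algorithm for the problem is a fast integer square root algorithm: https://stackoverflow.com/a/51585204/5191852

There @Kde claims that three iterations of Newton's method are sufficient for an accuracy of ±1 for 32-bit integers. Certainly, more iterations are required for 64-bit integers, maybe 6 or 7.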

score 0 · Accepted Answer

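If you want speed, given that your integers are of finite size, I suspect that the quickest way would involve (a) partitioning the parameters by size (e.g. into categories by largest bit set), then checking the value against an array of perfect squares within that range.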

score 0 · Accepted Answer

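Regarding the Carmack method, it seems like it would be quite easy just to iterate once more, which should double the number of digits of accuracy. It is, after all, an extremely truncated iterative method: Newton's, with a very good first guess.

Regarding your current best, I see two micro-optimizations:

move the check against 0 after the check using mod255
rearrange the dividing out of powers of four to skip all the checks for the usual (75%) case.

I.e.: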

// Divide out powers of 4 using binary search

if((n & 0x3L) == 0) {
  n >>=2;

  if((n & 0xffffffffL) == 0)
    n >>= 32;
  if((n & 0xffffL) == 0)
    n >>= 16;
  if((n & 0xffL) == 0)
    n >>= 8;
  if((n & 0xfL) == 0)
    n >>= 4;
  if((n & 0x3L) == 0)
    n >>= 2;
}

Better might be a simple

while ((n & 0x03L) == 0) n >>= 2;

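Obviously, it would be interesting to know how many numbers get culled at each checkpoint. I rather doubt the checks are truly independent, which makes things tricky.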

score -1 · Accepted Answer

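Not sure if this is the fastest way, but this is something I stumbled upon (a long time ago in high school) when I was bored and playing with my calculator during math class. At the time, I was really amazed that this worked...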

public static boolean isIntRoot(int number) {
    return isIntRootHelper(number, 1);
}

private static boolean isIntRootHelper(int number, int index) {
    if (number == index) {
        return true;
    }
    if (number < index) {
        return false;
    }
    else {
        return isIntRootHelper(number - 2 * index, index + 1);
    }
}
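
Why the recursion works can be written out in a few lines. The derivation below is an added explanation rather than part of the original answer; N denotes the original argument, and n_k and i_k are the values of number and index at the k-th call (the align* environment assumes amsmath).

```latex
\begin{align*}
i_k &= k, \\
n_k &= N - 2\,\bigl(1 + 2 + \dots + (k-1)\bigr) = N - k(k-1), \\
n_k = i_k &\iff N - k(k-1) = k \iff N = k^2, \\
n_k < i_k &\iff N < k^2 .
\end{align*}
```

So the helper returns true at the first k with N = k^2, and false as soon as k^2 exceeds N without a match, after roughly sqrt(N) calls.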

score -1 · Accepted Answer

static boolean isPerfectSquare (int input) {
  return Math.sqrt(input) == (int) Math.sqrt(input);
}

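This returns whether the integer value of the square root of input equals its double value. If it does, the root is a whole number and the method returns true. Otherwise, it returns false.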

score -3 · Accepted Answer

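Don't know about fastest, but the simplest is to take the square root in the normal fashion, multiply the result by itself, and see if it matches your original value.

Since we're talking integers here, the fastest would probably involve a collection in which you can just do a lookup.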
