
I've trimmed the tables to show only the columns relevant to this query. It involves two tables, and the queries are already slow, even though we haven't yet scaled up to 4+ million queries, a log table of 30+ million records, or a users table of 1+ million records. It has me rethinking this approach, so I'd appreciate some guidance and suggestions:

Here's the table:

-- an abbreviated users table
CREATE TABLE IF NOT EXISTS `users` (
  `userid` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `type` tinyint(1) NOT NULL COMMENT '1=biz, 2=apt, 3=condo, 4=home',
  `distance` decimal(12,7) NOT NULL DEFAULT '1.0000000' COMMENT 'distance away to recv stuff',
  `lat` decimal(12,7) NOT NULL,
  `lon` decimal(12,7) NOT NULL,
  `location` point NOT NULL COMMENT 'GeomFromText',
  UNIQUE KEY `userid` (`userid`),
  KEY `distance` (`distance`),
  KEY `lat` (`lat`),
  KEY `lon` (`lon`),
  SPATIAL KEY `location` (`location`),
  KEY `idx_user_type` (`type`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci AUTO_INCREMENT=501 ;

Here's the log table.

-- pretty much the full log table
CREATE TABLE IF NOT EXISTS `some_log` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'record num',
  `userid` int(11) unsigned NOT NULL COMMENT 'user id receiving alert',
  `trackid` bigint(20) unsigned NOT NULL COMMENT 'id of msg from message table',
  `sent` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'when msg created',
  PRIMARY KEY (`id`),
  KEY `idx_msg_log_userid` (`userid`),
  KEY `idx_msg_log_trackid` (`trackid`)
) ENGINE=MyISAM  DEFAULT CHARSET=ascii COMMENT='log of all of some stuff' AUTO_INCREMENT=62232;

Some sample data for the log table:

INSERT INTO `some_log` (`id`, `userid`, `trackid`, `sent`) VALUES
(1, 1, 4, '2011-07-14 18:14:25'),
(2, 2, 4, '2011-07-14 18:14:25'),
(3, 13, 6, '2011-07-25 23:05:54'),
(4, 44, 7, '2011-08-09 16:20:02'),
(5, 12, 17, '2011-08-16 07:35:01'),
(6, 43, 17, '2011-08-16 07:35:01'),
(7, 45, 17, '2011-08-16 07:35:01'),
(8, 12, 18, '2011-08-16 08:05:01'),
(9, 43, 18, '2011-08-16 08:05:01'),
(10, 45, 18, '2011-08-16 08:05:01');

Here's the query.

-- $distance can range from 1/10 mile to 5 miles
SELECT *,(((acos(sin(($lat *pi()/180)) * sin((`lat`*pi()/180))+cos(($lat *pi()/180)) * cos((`lat`*pi()/180))* cos((($lon - `lon`)*pi()/180))))*180/pi())*60*1.1515) AS dist_x 
   FROM `users`
   WHERE userid NOT IN (
      SELECT userid
      FROM some_log AS L
      WHERE L.trackid='$trackid')
   HAVING dist_x<='$distance' AND dist_x<=`distance` 
   ORDER BY dist_x ASC;
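To make the `dist_x` expression easier to check, here is a minimal Python sketch of the same computation (the function name just borrows the alias). It is the spherical law of cosines: the central angle in degrees, times 60 nautical miles per degree, times 1.1515 statute miles per nautical mile, assuming a spherical earth. One thing the sketch makes visible: for two identical points, floating-point rounding can push the ACOS argument slightly above 1. MySQL's ACOS() returns NULL for arguments outside [-1, 1], so a user located exactly at `$lat,$lon` may get a NULL `dist_x`; the sketch clamps instead.

```python
import math


def dist_x(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in statute miles (spherical-earth approximation).

    Mirrors the SQL expression: acos(...) in degrees * 60 nm/degree * 1.1515 mi/nm.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon1 - lon2)
    cos_angle = (math.sin(phi1) * math.sin(phi2)
                 + math.cos(phi1) * math.cos(phi2) * math.cos(dlon))
    # Rounding can push cos_angle just outside [-1, 1] for (nearly) identical
    # points; clamp so math.acos() stays in its domain.
    cos_angle = max(-1.0, min(1.0, cos_angle))
    return math.degrees(math.acos(cos_angle)) * 60 * 1.1515


if __name__ == "__main__":
    # One degree of latitude along a meridian: 60 * 1.1515 = 69.09 miles.
    print(round(dist_x(40.0, -75.0, 41.0, -75.0), 2))  # prints 69.09
```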

Here's another query. This one is slow.

-- the query above is reasonably fast on the test data
-- this one is painfully slow...
-- the difference is the added filter on type; 4 is the most common user type
SELECT *,(((acos(sin(($lat *pi()/180)) * sin((`lat`*pi()/180))+cos(($lat *pi()/180)) * cos((`lat`*pi()/180))* cos((($lon - `lon`)*pi()/180))))*180/pi())*60*1.1515) AS dist_x 
   FROM `users`
   WHERE type='4' AND userid NOT IN (
      SELECT userid
      FROM some_log AS L
      WHERE L.trackid='$trackid')
   HAVING dist_x<='$distance' AND dist_x<=`distance` 
   ORDER BY dist_x ASC;

First question: is there a radius (circle) search that uses the location POINT column (populated with GeomFromText) instead of the lat/lon columns?

Second question: is there a better way to check whether some_log already has an entry for a given $userid and $trackid?
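One commonly used alternative to `userid NOT IN (SELECT ...)` is a correlated `NOT EXISTS` probe backed by a composite index on `(trackid, userid)`, so each check becomes a single index lookup. Older MySQL versions often executed `NOT IN (subquery)` as a dependent subquery, so it is worth comparing both forms with EXPLAIN on your own version. The sketch below is an assumption-laden illustration, not a tested MySQL recipe: it uses an in-memory SQLite database only so it can run, trims the tables to the needed columns, and the index name `idx_log_trackid_userid` and helper `users_without_track` are hypothetical.

```python
import sqlite3

# In-memory SQLite stand-in for the MySQL tables (hypothetical, trimmed).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (userid INTEGER PRIMARY KEY, type INTEGER NOT NULL);
    CREATE TABLE some_log (
        id INTEGER PRIMARY KEY,
        userid INTEGER NOT NULL,
        trackid INTEGER NOT NULL
    );
    -- Assumed composite index: answers "does (trackid, userid) exist?"
    -- from the index alone, without touching the table rows.
    CREATE INDEX idx_log_trackid_userid ON some_log (trackid, userid);
""")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [(1, 4), (12, 4), (13, 4), (43, 4), (44, 1), (45, 4)])
# (userid, trackid) pairs taken from the sample log data above.
conn.executemany("INSERT INTO some_log (userid, trackid) VALUES (?, ?)",
                 [(1, 4), (2, 4), (13, 6), (44, 7), (12, 17), (43, 17),
                  (45, 17), (12, 18), (43, 18), (45, 18)])


def users_without_track(trackid: int, user_type: int) -> list[int]:
    """Users of the given type that have no some_log row for this trackid."""
    rows = conn.execute(
        """
        SELECT u.userid
        FROM users AS u
        WHERE u.type = ?
          AND NOT EXISTS (
              SELECT 1 FROM some_log AS l
              WHERE l.trackid = ? AND l.userid = u.userid)
        ORDER BY u.userid
        """,
        (user_type, trackid),
    ).fetchall()
    return [userid for (userid,) in rows]


print(users_without_track(17, 4))  # [1, 13]
```

In the real query the distance expression and HAVING clause would stay as they are; only the `NOT IN` subquery is swapped for the `NOT EXISTS` probe.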


1 Answer


Forget about the spatial index and spatial column in your table. They won't help you with these lat/lon distance calculations.

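You can use your lat index to rule out whole sets of points before the expensive great-circle distance calculation runs. Exploit the fact that a degree of latitude is about 69 statute miles, 60 nautical miles, or 111.045 km. (That's not exact, but it's very close.)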

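So you can add a couple of conditions to your query. These become range scans on your lat index, which are much faster than evaluating your HAVING conditions on every row.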

WHERE ....
  -- keep `lat` alone on one side of each comparison so MySQL can use the index
  AND lat >= $lat - ($distance/69.0)
  AND lat <= $lat + ($distance/69.0)
  ...

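This excludes every point that is too far north or too far south to be within range before the great-circle distance calculation runs. That will save a lot of time.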
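As a rough illustration of the arithmetic, the bounds can be computed once in application code and passed to the query as constants. This is a sketch; `lat_band` is a hypothetical helper. Dividing by 69.0 (slightly less than the 60 × 1.1515 = 69.09 miles per degree used in the query) makes the band a little wider than strictly needed, so the prefilter errs on the side of keeping points rather than dropping ones that are in range.

```python
# Statute miles per degree of latitude, the approximation used in the answer.
MILES_PER_DEG_LAT = 69.0


def lat_band(center_lat: float, distance_miles: float) -> tuple[float, float]:
    """Return (min_lat, max_lat) for the range-scan prefilter on `lat`."""
    half_width = distance_miles / MILES_PER_DEG_LAT
    return center_lat - half_width, center_lat + half_width


# The largest radius in the question is 5 miles: about 0.0725 degrees each way.
lo, hi = lat_band(40.0, 5.0)
print(round(lo, 4), round(hi, 4))  # 39.9275 40.0725
```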

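You can do the same for lon, but the relationship between longitude and distance varies with latitude: the closer you get to the poles, the closer together the lines of longitude are. So the formula is trickier.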
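A common approximation for the longitude side, sketched here under a spherical-earth assumption, is that a degree of longitude spans about 69 × cos(latitude) miles, so the half-width of the band is `distance / (69 * cos(lat))`. The helper name `lon_band` is hypothetical. The sketch does not handle the poles or searches that cross the ±180° meridian; those would need special cases. For radii of a few miles the error of this approximation is far smaller than the margin the 69.0 constant already adds.

```python
import math

MILES_PER_DEG_LAT = 69.0  # same approximation as for latitude


def lon_band(center_lat: float, center_lon: float,
             distance_miles: float) -> tuple[float, float]:
    """Return (min_lon, max_lon) for a prefilter on `lon`.

    A degree of longitude spans about 69 * cos(latitude) miles, so the band
    widens toward the poles. Not handled: the poles themselves, and bands
    that cross the +/-180 meridian.
    """
    cos_lat = math.cos(math.radians(center_lat))
    if cos_lat < 1e-9:
        raise ValueError("longitude band is undefined at the poles")
    half_width = distance_miles / (MILES_PER_DEG_LAT * cos_lat)
    return center_lon - half_width, center_lon + half_width


# At 60 degrees north a degree of longitude is about half as long as at the
# equator, so the band comes out about twice as wide.
eq = lon_band(0.0, 10.0, 5.0)
north = lon_band(60.0, 10.0, 5.0)
print(round((north[1] - north[0]) / (eq[1] - eq[0]), 6))  # 2.0
```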

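Finally, FLOAT is a perfectly good data type for lat and lon. This application doesn't need high-precision DECIMAL data, unless you're a civil engineer who cares that the true shape of the earth is a geoid, not a sphere. If you care about that, you'd be better off using a more precise distance formula than the spherical one in your query. But we're talking about differences of centimeters here: that might matter when mapping a big puddle in a parking lot, but it's fine for someone looking for a store.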
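To put a number on "FLOAT is good enough", here is a small sketch that round-trips coordinates through 4-byte IEEE single precision (what MySQL FLOAT stores) and converts the rounding error to meters with the 111.045 km-per-degree figure above. The sample coordinates are made up for illustration. For any |value| below 256 the float32 spacing is at most 2**-16 degrees, so the rounding error is at most 2**-17 degrees, roughly 0.85 m: negligible next to search radii of 0.1 to 5 miles.

```python
import struct

METERS_PER_DEG = 111_045.0  # per degree of latitude, as quoted in the answer


def as_float32(x: float) -> float:
    """Round-trip a value through 4-byte IEEE single precision."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def float32_error_meters(coord_deg: float) -> float:
    """Approximate distance error (meters) from storing a coordinate as float32.

    Uses the latitude scale for both axes, which is the worst case for
    longitude (reached at the equator).
    """
    return abs(as_float32(coord_deg) - coord_deg) * METERS_PER_DEG


# Hypothetical sample coordinates.
for value in (40.7127837, -74.0059413, 179.9999999):
    print(value, f"{float32_error_meters(value):.3f} m")
```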

answered 2013-06-26T21:10:10.700