I've shortened the tables to show only the columns relevant to this query. The query uses two tables and already takes a long time, and we haven't even reached production volumes yet: 4+ million queries, a log table with 30+ million records, and a users table with 1+ million records. It has me rethinking the whole approach. I need some guidance and suggestions:
Here's the users table:
-- an abbreviated users table
`userid` int(10) unsigned NOT NULL AUTO_INCREMENT,
`type` tinyint(1) NOT NULL COMMENT '1=biz, 2=apt, 3=condo, 4=home',
`distance` decimal(12,7) NOT NULL DEFAULT '1.0000000' COMMENT 'distance away to recv stuff',
`lat` decimal(12,7) NOT NULL,
`lon` decimal(12,7) NOT NULL,
`location` point NOT NULL COMMENT 'GeomFromText',
UNIQUE KEY `userid` (`userid`),
KEY `distance` (`distance`),
KEY `lat` (`lat`),
KEY `lon` (`lon`),
SPATIAL KEY `location` (`location`),
KEY `idx_user_type` (`type`)
Here's the log table.
-- pretty much the full log table
CREATE TABLE `some_log` (
`id` int(11) unsigned NOT NULL AUTO_INCREMENT COMMENT 'record num',
`userid` int(11) unsigned NOT NULL COMMENT 'user id receiving alert',
`trackid` bigint(20) unsigned NOT NULL COMMENT 'id of msg from message table',
`sent` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'when msg created',
KEY `idx_msg_log_userid` (`userid`),
KEY `idx_msg_log_trackid` (`trackid`)
) ENGINE=MyISAM DEFAULT CHARSET=ascii COMMENT='log of all of some stuff' AUTO_INCREMENT=62232;
Some sample data for the log table:
INSERT INTO `some_log` (`id`, `userid`, `trackid`, `sent`) VALUES
(1, 1, 4, '2011-07-14 18:14:25'),
(2, 2, 4, '2011-07-14 18:14:25'),
(3, 13, 6, '2011-07-25 23:05:54'),
(4, 44, 7, '2011-08-09 16:20:02'),
(5, 12, 17, '2011-08-16 07:35:01'),
(6, 43, 17, '2011-08-16 07:35:01'),
(7, 45, 17, '2011-08-16 07:35:01'),
(8, 12, 18, '2011-08-16 08:05:01'),
(9, 43, 18, '2011-08-16 08:05:01'),
(10, 45, 18, '2011-08-16 08:05:01');
Here's the query.
-- $distance can range from 1/10 of a mile to 5 miles
SELECT *,(((acos(sin(($lat *pi()/180)) * sin((`lat`*pi()/180))+cos(($lat *pi()/180)) * cos((`lat`*pi()/180))* cos((($lon - `lon`)*pi()/180))))*180/pi())*60*1.1515) AS dist_x
FROM `users`
WHERE userid NOT IN (
SELECT userid
FROM some_log AS L
WHERE L.trackid='$trackid')
HAVING dist_x<='$distance' AND dist_x<=`distance`
ORDER BY dist_x ASC;
Here's another query. This one is slow.
-- the query above is fairly quick on the test data
-- this one is painfully slow
-- it adds a filter on type; 4 is the most common type of user
SELECT *,(((acos(sin(($lat *pi()/180)) * sin((`lat`*pi()/180))+cos(($lat *pi()/180)) * cos((`lat`*pi()/180))* cos((($lon - `lon`)*pi()/180))))*180/pi())*60*1.1515) AS dist_x
FROM `users`
WHERE type='4' AND userid NOT IN (
SELECT userid
FROM some_log AS L
WHERE L.trackid='$trackid')
HAVING dist_x<='$distance' AND dist_x<=`distance`
ORDER BY dist_x ASC;
One question would be: Is there a radius/circle search that uses the GeomFromText/POINT column (and its spatial index) instead of the lat/lon calculation?
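To make that first question concrete, here is a rough sketch of what I have in mind: use the SPATIAL index for a cheap bounding-box prefilter, then apply the exact distance formula only to the rows inside the box. This is untested at scale and rests on assumptions: `location` was stored as POINT(lon lat) in that order, the variables @lat, @lon and @dist stand in for $lat, $lon and $distance with made-up sample values, and the server still accepts `GeomFromText` (newer MySQL versions spell it `ST_GeomFromText`). The 69.09 factor is the same 60*1.1515 miles per degree used in the query above.

```sql
-- Hypothetical sample inputs standing in for $lat, $lon, $distance
SET @lat = 40.0, @lon = -75.0, @dist = 5.0;

-- Half-width of the search box in degrees (69.09 = 60 * 1.1515 miles per degree)
SET @dlat = @dist / 69.09;
SET @dlon = @dist / (69.09 * COS(RADIANS(@lat)));

-- Closed rectangle around the search point; assumes POINT(lon lat) ordering
SET @box = GeomFromText(CONCAT('POLYGON((',
    @lon - @dlon, ' ', @lat - @dlat, ',',
    @lon + @dlon, ' ', @lat - @dlat, ',',
    @lon + @dlon, ' ', @lat + @dlat, ',',
    @lon - @dlon, ' ', @lat + @dlat, ',',
    @lon - @dlon, ' ', @lat - @dlat, '))'));

-- The spatial index narrows the candidates; the exact distance is then
-- computed only for rows that fall inside the box
SELECT *,
    (ACOS(SIN(RADIANS(@lat)) * SIN(RADIANS(`lat`))
        + COS(RADIANS(@lat)) * COS(RADIANS(`lat`)) * COS(RADIANS(@lon - `lon`)))
        * 180 / PI()) * 60 * 1.1515 AS dist_x
FROM `users`
WHERE MBRContains(@box, `location`)
HAVING dist_x <= @dist AND dist_x <= `distance`
ORDER BY dist_x ASC;
```

The box is a rectangle, not a circle, so the HAVING clause is still needed to drop the corners; the hope is only that far fewer rows reach it.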
Another question: Is there a better way to check the some_log table for whether a given userid already has an entry for a given $trackid?
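For the second question, one alternative I'm considering is an anti-join in place of NOT IN, backed by a composite index so the lookup can be answered from the index alone. Again, this is only a sketch: the index name `idx_msg_log_track_user` is something I made up, @trackid stands in for $trackid with a sample value, and I haven't measured whether it actually beats the NOT IN version on this data.

```sql
-- Hypothetical composite index: lets "has this user already received this
-- trackid?" be resolved from a single index lookup
ALTER TABLE `some_log`
    ADD INDEX `idx_msg_log_track_user` (`trackid`, `userid`);

-- Hypothetical sample input standing in for $trackid
SET @trackid = 17;

-- Anti-join: keep only users with no matching log row for this trackid
SELECT u.*
FROM `users` AS u
LEFT JOIN `some_log` AS l
    ON l.userid = u.userid
   AND l.trackid = @trackid
WHERE u.type = 4
  AND l.userid IS NULL;
```

The distance expression and the HAVING/ORDER BY clauses from the queries above would be added back onto this; I left them out to keep the exclusion logic visible.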