
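I have a MySQL table with about 4 million rows. Suppose the table looks like this:

Columns in the Person table

  • Id
  • Name
  • Age
  • Marital Status
  • Education Level
  • Location Country
  • Description

When I run a query based on Age, I also want summary counts of the people of that age, broken down by marital status, by education level, and by location country.

When I run a query based on Age and Education Level, I also want summary counts of the people with that age and education level, broken down by marital status and by location country.

For example, if the query issued is SELECT * FROM Person WHERE Age = 27;, I also want the results produced by SELECT `Education Level`, COUNT(*) FROM Person WHERE Age = 27 GROUP BY `Education Level`; and SELECT `Location Country`, COUNT(*) FROM Person WHERE Age = 27 GROUP BY `Location Country`;

It becomes more challenging when I have to search by keywords in the Description and still want summary counts for each of the other columns. The application I am developing is a kind of search engine. This behavior can be seen on sites like eBay.

I can run these queries individually. However, with 4 million rows, the GROUP BY queries will take a lot of time. This is an internet application, and the queries should complete within a few seconds.

Any help would be greatly appreciated.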


2 Answers


From what you are describing, I would keep a separate aggregate table that holds the "roll-up" stats you want and query it directly. How frequently is the "Person" table added to or changed? Also, if you store only a person's "Age", what is the age based on when there is no date? If you add the same person again in the future, they would have multiple records, such that

At age X, so many people were married (or not) and had this level of education. At age Y, so many people... etc..

I would create a summary table, something like

create table AgeStat ( 
   age int primary key,  -- one summary row per age
   married int, 
   single int, 
   divorced int, 
   HighSchool int, 
   Associates int,
   Bachelors int,
   Masters int,
   Doctorate int );

Then add a trigger to the Person table so that on insert (and on update/delete as needed), each new record simply adds 1 to every count it applies to.
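A minimal sketch of such an insert trigger in MySQL might look like the following. It assumes that AgeStat.age carries a primary or unique key (needed for ON DUPLICATE KEY UPDATE), that the Person columns are literally named `Marital Status` and `Education Level`, and that they hold strings such as 'Married' or 'Masters'. The trigger name and those string values are hypothetical, and only two of the counters are written out.

```sql
-- Sketch only: assumes a PRIMARY KEY or UNIQUE key on AgeStat.age.
-- The trigger name and the 'Married' / 'Masters' values are hypothetical.
CREATE TRIGGER person_agestat_insert
AFTER INSERT ON Person
FOR EACH ROW
INSERT INTO AgeStat (age, married, Masters)
VALUES (NEW.Age,
        IF(NEW.`Marital Status` = 'Married', 1, 0),
        IF(NEW.`Education Level` = 'Masters', 1, 0))
ON DUPLICATE KEY UPDATE
    -- IF() yields 0 for non-matching or NULL values, so the counters never become NULL
    married = married + IF(NEW.`Marital Status` = 'Married', 1, 0),
    Masters = Masters + IF(NEW.`Education Level` = 'Masters', 1, 0);
```

The remaining counters would follow the same pattern and must appear in the VALUES list as well, since the table definition gives them no default. Matching AFTER UPDATE and AFTER DELETE triggers would decrement the old values if rows can change or be removed.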

Then, for your web app, it would be instantaneous to grab one record from this summary table where age = 27 and you have ALL your classification stats.

However, if you specifically wanted to know how many people are Married with a Masters degree, you would have to fall back to the master Person table.
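As a rough illustration, that fallback is a plain count against Person. The column names follow the question's table, while the 'Married' and 'Masters' values and the index name are assumptions made for this sketch.

```sql
-- Combination count that the one-row-per-age summary cannot answer.
-- The 'Married' / 'Masters' values are assumed; adjust to the stored data.
SELECT COUNT(*) AS married_with_masters
FROM Person
WHERE Age = 27
  AND `Marital Status` = 'Married'
  AND `Education Level` = 'Masters';

-- Optional, hypothetical index name: a composite index on the filtered
-- columns lets MySQL answer the count from the index alone.
CREATE INDEX idx_person_age_marital_edu
    ON Person (Age, `Marital Status`, `Education Level`);
```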

Alternatively, you could do a similar pre-aggregation at a finer level of granularity, something like

create table AgeStat ( 
   age int, 
   maritalstat int,    -- but I would actually use an enumerated value for marital status
   educationlevel int, -- and education level vs a hard description of each.
   peoplecount int,
   primary key (age, maritalstat, educationlevel) );  -- one row per combination

and likewise have a trigger that updates the count for each combination of the two elements per age. Then, if you wanted the total "Married", you can sum(peoplecount) for age = 27 and maritalstat = (enumerator for the "married" value).
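One way this could be wired up is sketched below. It assumes a unique key on (age, maritalstat, educationlevel) in this second AgeStat table, and that Person stores the enumerated integer codes directly in its `Marital Status` and `Education Level` columns. The trigger name is hypothetical.

```sql
-- Sketch only: assumes a PRIMARY KEY or UNIQUE key on
-- (age, maritalstat, educationlevel) and integer codes stored in Person.
CREATE TRIGGER person_agestat_combo_insert
AFTER INSERT ON Person
FOR EACH ROW
INSERT INTO AgeStat (age, maritalstat, educationlevel, peoplecount)
VALUES (NEW.Age, NEW.`Marital Status`, NEW.`Education Level`, 1)
ON DUPLICATE KEY UPDATE peoplecount = peoplecount + 1;

-- Roll-up by marital status for one age, read from the summary table only.
SELECT maritalstat, SUM(peoplecount) AS people
FROM AgeStat
WHERE age = 27
GROUP BY maritalstat;

-- The same summary rows also answer the education-level breakdown.
SELECT educationlevel, SUM(peoplecount) AS people
FROM AgeStat
WHERE age = 27
GROUP BY educationlevel;
```

Because the summary holds at most one row per (age, marital status, education level) combination, these GROUP BY queries scan a handful of rows rather than the 4 million in Person.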

Good luck, and I hope this gives you an alternative solution.

answered 2012-11-17T08:52:50.137

You can do both in a single query

SELECT p.*, count(p2.id)  
FROM Person p, Person p2 
WHERE p2.Age = p.age and p2.marital != p.marital and p.education != p2.education 
GROUP BY p.id;

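In this case I suggest keeping the data in a memcache cache. You can expire the cache when new data is inserted into the table or after some expiration time, which avoids executing the long-running query repeatedly. Another improvement is to use LIMIT to reduce the number of rows the DB returns, like this: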

SELECT p.*, count(p2.id)  
FROM Person p, Person p2 
WHERE p2.Age = p.age and p2.marital != p.marital and p.education != p2.education 
GROUP BY p.id
LIMIT 10;
answered 2012-11-17T03:38:43.593