
I'm hoping to get some help choosing a database and layout well suited to a web application I have to write (outlined below). I'm a bit stumped given the large number of records and the fact that they need to be queryable in any manner.

The web app will basically allow querying of a large number of records using any combination of the criteria that make up the records; date is the only mandatory item. A record consists of only eight items (listed below), but there will be about three million new records a day, with very few duplicates. Data will be inserted into the database constantly, in real time, for the current day.

I know the biggest interest will be in the last 6 months to 1 year's worth of data, but the rest will still need to be available for the same type of queries.
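For scale, the row counts at the stated rate are simple arithmetic. This sketch assumes a flat 3 million records per day and ignores the few duplicates; `rows_for_days` is just an illustrative helper.

```python
# Back-of-envelope row counts, assuming a constant 3 million new records/day.
RECORDS_PER_DAY = 3_000_000


def rows_for_days(days: int, per_day: int = RECORDS_PER_DAY) -> int:
    """Approximate number of rows accumulated over `days` days."""
    return days * per_day


if __name__ == "__main__":
    for label, days in [("1 month (30 days)", 30),
                        ("6 months (183 days)", 183),
                        ("1 year (365 days)", 365)]:
        print(f"{label:>20}: {rows_for_days(days):,} rows")
```

This prints 90,000,000, 549,000,000 and 1,095,000,000 rows respectively, so the "hot" window alone is on the order of half a billion to a billion rows.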

I'm not sure which database is best suited for this, nor how to structure it. The database will be on a reasonably powerful server. I basically want to start with a good DB design and see how the queries perform. I can then judge whether I'd rather do optimizations or throw more powerful hardware at it. I just don't want to have to redo the base DB design, and it's fine if we initially end up doing a lot of optimizations: we have time but not $$$.

We need to use something open source, not something like Oracle. Right now I'm leaning towards Postgres.

A record consists of:

1. Date
2. Unsigned integer
3. Unsigned integer
4. Unsigned integer
5. Unsigned integer
6. Unsigned integer
7. Text, 16 chars
8. Text, 255 chars

I'm planning on creating yearly schemas, monthly tables, and indexing the record tables on date for sure.
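A minimal sketch of that layout, assuming PostgreSQL 11 or later with declarative range partitioning (in 2013-era releases the same idea was built with table inheritance, CHECK constraints and `constraint_exclusion`). The table name `records`, the column names and the `records_YYYY` schema names are hypothetical. PostgreSQL has no unsigned integer types, so fields 2-6 are stored as `bigint` here so that values above 2147483647 still fit. The Python only generates the DDL text, so it can be reviewed before being run through psql.

```python
from datetime import date

# Hypothetical parent table; partitions attach to it by date range.
PARENT_DDL = """\
CREATE TABLE records (
    rec_date date NOT NULL,  -- field 1, the only mandatory query criterion
    f2 bigint,               -- fields 2-6: no unsigned ints in PostgreSQL,
    f3 bigint,               -- so bigint covers the full 0..4294967295 range
    f4 bigint,
    f5 bigint,
    f6 bigint,
    f7 varchar(16),          -- field 7
    f8_id integer            -- field 8, an id into a lookup table
) PARTITION BY RANGE (rec_date);
CREATE INDEX ON records (rec_date);"""


def next_month(d: date) -> date:
    """First day of the month after d."""
    return date(d.year + 1, 1, 1) if d.month == 12 else date(d.year, d.month + 1, 1)


def monthly_partitions_ddl(year: int) -> list[str]:
    """One schema per year, one partition table per month in that schema."""
    stmts = [f"CREATE SCHEMA IF NOT EXISTS records_{year};"]
    for month in range(1, 13):
        start = date(year, month, 1)
        end = next_month(start)  # range upper bound is exclusive
        stmts.append(
            f"CREATE TABLE records_{year}.m{month:02d} PARTITION OF records "
            f"FOR VALUES FROM ('{start.isoformat()}') TO ('{end.isoformat()}');"
        )
    return stmts


if __name__ == "__main__":
    print(PARENT_DDL)
    print("\n".join(monthly_partitions_ddl(2013)))
```

Creating the next month's partition ahead of time (for example from a monthly cron job) avoids inserts failing at midnight on the 1st for lack of a matching partition.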

I'll probably be able to add another index or two after I analyze usage patterns to see what the most popular queries are. I can do lots of tricks on the app side, such as caching popular queries and whatnot; it's really the DB side I need assistance with. Field 8 will have some duplicate values, so I'm planning on making that column an id into a lookup table to join on. Beyond that, I guess the remaining fields will all go in one monthly table...
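The lookup-table plan for field 8 amounts to interning: each distinct string is stored once and records carry a small integer id instead. Below is a minimal in-memory sketch of what a loader might do before inserting; `LookupTable` is a hypothetical helper, and in the database the persistent side would be a two-column table (id, value) with a unique constraint on the value.

```python
class LookupTable:
    """In-memory stand-in for a hypothetical (id, value) lookup table:
    each distinct field-8 string is assigned an integer id exactly once."""

    def __init__(self) -> None:
        self._ids: dict[str, int] = {}
        self._values: list[str] = []

    def id_for(self, value: str) -> int:
        """Return the existing id for value, or assign the next one."""
        existing = self._ids.get(value)
        if existing is not None:
            return existing
        new_id = len(self._values) + 1  # ids start at 1, like a serial column
        self._ids[value] = new_id
        self._values.append(value)
        return new_id

    def value_for(self, id_: int) -> str:
        """Reverse lookup: what a join back to the lookup table returns."""
        return self._values[id_ - 1]


if __name__ == "__main__":
    table = LookupTable()
    ids = [table.id_for(v) for v in ["alpha", "beta", "alpha"]]
    print(ids)  # repeated strings reuse their id
```

Caching the mapping in the loader means most of the three million daily inserts never need to touch the lookup table itself; only genuinely new strings do.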

I suppose I could break it into weekly tables as well and use a view for queries, so the app doesn't have to deal with assembling a complex query...
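A sketch of the view idea, assuming hypothetical weekly tables named `records_YYYY.wNN`. The function only builds a `UNION ALL` view definition. For the planner to skip weeks outside a query's date range, each weekly table would also need a CHECK constraint on its date column (with `constraint_exclusion` at its default `partition` setting). On PostgreSQL 10 and later, a declaratively partitioned parent table gives the same single-name access without a hand-maintained view.

```python
def union_all_view(view: str, tables: list[str]) -> str:
    """Build a CREATE OR REPLACE VIEW that stitches per-period tables
    together, so the app queries one name instead of assembling the UNION."""
    if not tables:
        raise ValueError("need at least one table")
    selects = "\nUNION ALL\n".join(f"SELECT * FROM {t}" for t in tables)
    return f"CREATE OR REPLACE VIEW {view} AS\n{selects};"


def weekly_table_names(year: int, weeks: range) -> list[str]:
    """Hypothetical naming scheme: records_YYYY.wNN, NN = week number."""
    return [f"records_{year}.w{w:02d}" for w in weeks]


if __name__ == "__main__":
    print(union_all_view("records_all", weekly_table_names(2013, range(1, 4))))
```

The view has to be regenerated each time a new weekly table is added, which is the main maintenance cost of this approach.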

Anyway, thanks very much for any feedback or assistance!


2 Answers


A few quick suggestions...

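  1. Three million records a day is a lot! (At least I think so; others might not even blink.) I would try writing a tool to insert dummy records and see how something like Postgres handles a month's worth of data.

  2. It would be worth looking into NoSQL solutions, which give you open source plus scalability. Start with Couchbase and Mongo. If you're keeping a month of data online for real-time queries, I'm not sure how Postgres will handle 90 million records. Maybe great, maybe not.

  3. Consider using an "offline" database in whatever system you decide on. You keep the live data on your best machine, where it's ready to go, but you move older data to another, cheaper (read: slower) server. That way you can always answer queries, but some queries will be faster than others.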
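The dummy-record tool from suggestion 1 can be a short script. This is a minimal sketch that writes random rows shaped like the question's eight fields as CSV; the field contents are assumptions (uniform random 32-bit values, random alphanumeric text), so real data with skewed distributions may index and compress differently.

```python
import csv
import io
import random
import string
import sys
from datetime import date

UINT32_MAX = 2**32 - 1  # largest value of an unsigned 32-bit integer
ALPHABET = string.ascii_letters + string.digits


def dummy_records(day: date, n: int, seed: int = 0):
    """Yield n random rows shaped like the question's eight fields.
    Field 8 is raw random text here, not a lookup-table id."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    for _ in range(n):
        yield (
            day.isoformat(),                                        # 1: date
            *(rng.randint(0, UINT32_MAX) for _ in range(5)),        # 2-6: unsigned ints
            "".join(rng.choices(ALPHABET, k=16)),                   # 7: 16 chars
            "".join(rng.choices(ALPHABET, k=rng.randint(1, 255))),  # 8: up to 255 chars
        )


def write_csv(rows, out) -> int:
    """Write rows as CSV to a file-like object; return the row count."""
    writer = csv.writer(out)
    count = 0
    for row in rows:
        writer.writerow(row)
        count += 1
    return count


def to_csv_text(rows) -> str:
    """Convenience wrapper for small batches: the CSV as a string."""
    buf = io.StringIO()
    write_csv(rows, buf)
    return buf.getvalue()


if __name__ == "__main__":
    # One day at the question's rate; loop over days for a month-long test.
    write_csv(dummy_records(date(2013, 1, 1), 3_000_000), sys.stdout)
```

The output could then be bulk-loaded with psql's `\copy <table> FROM 'day.csv' WITH (FORMAT csv)` into a test table (name up to you) whose column order matches the eight fields. Timing a month of such loads, and then the typical queries against the loaded data, answers the "how does Postgres cope with 90 million rows" question empirically.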

answered 2013-01-31T22:29:48.280

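In my experience, mostly with Oracle at a similar record insertion rate (billion-row tables), you can achieve good web application query performance by carefully partitioning the data (probably by date, in your case) and indexing the tables. Exactly how you approach your database schema will depend on a lot of factors, but there are plenty of good resources on the web to help you get this right.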
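To show why date partitioning helps the common "last 6 months" queries, here is a sketch that lists which monthly partitions a date-range query actually has to read. This is roughly what a planner's partition pruning does automatically; the `records_YYYY.mMM` naming is hypothetical.

```python
from datetime import date


def months_touched(start: date, end: date) -> list[str]:
    """Monthly partitions a query on start <= date <= end must read;
    every other partition can be skipped entirely."""
    if end < start:
        return []
    names = []
    y, m = start.year, start.month
    while (y, m) <= (end.year, end.month):
        names.append(f"records_{y}.m{m:02d}")
        y, m = (y + 1, 1) if m == 12 else (y, m + 1)  # advance one month
    return names


if __name__ == "__main__":
    print(months_touched(date(2013, 11, 15), date(2014, 2, 1)))
```

A six-month query therefore reads six or seven monthly partitions (about 550 million rows at the stated rate) no matter how many years of history sit alongside them, and the per-partition date index narrows it further.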

It sounds like your database is fairly flat, so perhaps another database solution would be better, but Oracle has always served me well.

answered 2013-01-31T22:50:21.907