
I'm currently working on a project where the client has handed me a database that includes a table with over 200 columns and 3 million rows of data. This is definitely poorly designed, and I'm currently exploring some options. I developed the app on my 2012 MacBook Pro with 16 GB of RAM and a 512 GB SSD. I had to develop the app using MVC 4, so I set up the development and test environment using Parallels 8 on OS X.

As part of the design, I developed an interface that lets the client create custom queries against this large table with hundreds of columns. I send a query string to the controller, where it is parsed using Dynamic LINQ, and the results are sent to the view as JSON to populate a Kendo UI grid. On my MacBook Pro, queries run through this interface take at most 10 seconds to return results to the Kendo UI grid, which I already find too long. When I run the same queries directly in SQL Server, they never take very long.

However, when I deployed this to the client for testing, the same queries took more than 3 minutes. So, long story short, the client will be upgrading the server hardware, but in the meantime they still need to test the app.

My question: although the table has 200 columns, each row is unique. More specifically, the design is:

PK (GUID) | OrganizationID (FK) | 200 columns (tax fields)

If I redesign this to:

PK (GUID) | OrganizationID (FK) | FieldID (FK) | Input

Field table: FieldID | FieldName

This would turn the 3-million-row table into a 600-million-row table (3 million rows × 200 fields) with only four columns. Will I see performance improvements?
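The proposed redesign amounts to "unpivoting" each wide row into one narrow row per tax field. Below is a minimal sketch of that transformation. It assumes SQLite (an in-memory database) as a stand-in for SQL Server, hypothetical table names (`tax_wide`, `tax_value`, `field`), and a scaled-down table with 5 fields and 4 rows instead of 200 fields and 3 million rows.

```python
import sqlite3
import uuid

# Hypothetical, scaled-down sizes; the real table has ~200 tax columns and ~3M rows.
NUM_FIELDS = 5
NUM_ROWS = 4

conn = sqlite3.connect(":memory:")  # SQLite stands in for SQL Server here
field_cols = [f"field_{i}" for i in range(1, NUM_FIELDS + 1)]

# Current (wide) design: PK, OrganizationID, then one column per tax field.
conn.execute(
    "CREATE TABLE tax_wide (pk TEXT PRIMARY KEY, organization_id TEXT, "
    + ", ".join(f"{c} REAL" for c in field_cols)
    + ")"
)
# Proposed (narrow) design: a Field lookup table plus one row per stored value.
conn.execute("CREATE TABLE field (field_id INTEGER PRIMARY KEY, field_name TEXT)")
conn.execute(
    "CREATE TABLE tax_value (pk TEXT PRIMARY KEY, organization_id TEXT, "
    "field_id INTEGER REFERENCES field (field_id), input REAL)"
)
conn.executemany("INSERT INTO field VALUES (?, ?)", list(enumerate(field_cols, start=1)))

# Fill the wide table with dummy data (GUIDs stored as text).
placeholders = ", ".join(["?"] * (NUM_FIELDS + 2))
for r in range(NUM_ROWS):
    values = [float(r * 10 + i) for i in range(NUM_FIELDS)]
    conn.execute(
        f"INSERT INTO tax_wide VALUES ({placeholders})",
        [str(uuid.uuid4()), str(uuid.uuid4()), *values],
    )

# Unpivot: each wide row becomes NUM_FIELDS narrow rows.
for field_id, col in enumerate(field_cols, start=1):
    rows = conn.execute(f"SELECT organization_id, {col} FROM tax_wide").fetchall()
    conn.executemany(
        "INSERT INTO tax_value VALUES (?, ?, ?, ?)",
        [(str(uuid.uuid4()), org_id, field_id, value) for org_id, value in rows],
    )

wide_count = conn.execute("SELECT COUNT(*) FROM tax_wide").fetchone()[0]
narrow_count = conn.execute("SELECT COUNT(*) FROM tax_value").fetchone()[0]

# Reading one organization's record back now means gathering NUM_FIELDS rows via a join.
some_org = conn.execute("SELECT organization_id FROM tax_wide LIMIT 1").fetchone()[0]
record = conn.execute(
    "SELECT f.field_name, v.input FROM tax_value v "
    "JOIN field f ON f.field_id = v.field_id "
    "WHERE v.organization_id = ? ORDER BY f.field_id",
    (some_org,),
).fetchall()
print(wide_count, narrow_count, record)
```

The row count grows by a factor of `NUM_FIELDS`, which is where the 3 million × 200 = 600 million figure comes from. Reconstructing a single record also becomes a multi-row lookup instead of a single-row read.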

Any insight would be appreciated. I understand normalization, but most of my experience is in programming.

Thanks in advance!


2 Answers


Without knowing the queries you are running against the table, it is hard to make any judgment.

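Here are some things to consider:

  1. If a query returns only a few rows, make sure it is using an index.
  2. Check whether you have enough memory to hold the table in memory.
  3. When timing queries, always ignore the first run, because it is just loading the page cache.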
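Points 1 and 3 can be checked mechanically. The sketch below is a hedged illustration that uses SQLite's `EXPLAIN QUERY PLAN` as a stand-in for inspecting an execution plan in SQL Server. The table `tax_wide`, the index `ix_tax_org`, and the helpers `plan` and `time_query` are hypothetical. Because the database is in memory, the "cold" first run will barely differ from the others here. The point is the pattern of discarding it, which matters on a disk-backed server.

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for SQL Server
conn.execute(
    "CREATE TABLE tax_wide (pk INTEGER PRIMARY KEY, organization_id INTEGER, field_1 REAL)"
)
conn.executemany(
    "INSERT INTO tax_wide (organization_id, field_1) VALUES (?, ?)",
    [(i % 1000, float(i)) for i in range(100_000)],
)

QUERY = "SELECT * FROM tax_wide WHERE organization_id = ?"


def plan(sql, params):
    """Return the 'detail' text (4th column) of each EXPLAIN QUERY PLAN row."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql, params)]


def time_query(sql, params, runs=5):
    """Run the query several times and drop the first timing."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()
        timings.append(time.perf_counter() - start)
    # On a disk-backed database the first run mostly pays for loading pages into cache.
    warm = timings[1:]
    return min(warm), sum(warm) / len(warm)


before = plan(QUERY, (42,))  # no index yet: expect a full SCAN of tax_wide
conn.execute("CREATE INDEX ix_tax_org ON tax_wide (organization_id)")
after = plan(QUERY, (42,))   # expect SEARCH ... USING INDEX ix_tax_org
best, mean = time_query(QUERY, (42,))
print(before, after, f"best {best * 1000:.2f} ms, mean {mean * 1000:.2f} ms")
```

If the plan still shows a full scan after an index exists, the query (or the Dynamic LINQ expression that generated it) is probably not using that index.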

For testing purposes, just reduce the size of the table. That should speed things up.

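As for your question about normalization: your denormalized structure takes up much less disk space than the normalized one, because you don't need to repeat the keys for every value. If you are looking up the values in a single row, normalization won't help. You still need to scan an index to find the row and then load the row, and that row will sit on one page whether it is normalized or denormalized. In fact, normalization may well be worse, because the index will be much larger.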
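To put rough numbers on the disk-space point, here is a back-of-the-envelope sketch. The sizes are assumptions, not figures from the question: 16-byte GUIDs for both PK and OrganizationID, a 4-byte integer FieldID, and an 8-byte value per tax field. It ignores per-row overhead and indexes. Both of those would make the narrow design look worse, since it has 200 times as many rows.

```python
# All sizes are assumptions for illustration (not taken from the actual schema).
GUID_BYTES = 16      # PK and OrganizationID assumed to be GUIDs
FIELD_ID_BYTES = 4   # FieldID assumed to be a 4-byte integer
VALUE_BYTES = 8      # each tax value assumed to take 8 bytes

NUM_ROWS = 3_000_000
NUM_FIELDS = 200


def wide_bytes(rows, fields):
    # Keys stored once per row, then every field value stored once.
    return rows * (GUID_BYTES + GUID_BYTES + fields * VALUE_BYTES)


def narrow_bytes(rows, fields):
    # Every value gets its own row that repeats PK, OrganizationID and FieldID.
    return rows * fields * (GUID_BYTES + GUID_BYTES + FIELD_ID_BYTES + VALUE_BYTES)


wide = wide_bytes(NUM_ROWS, NUM_FIELDS)
narrow = narrow_bytes(NUM_ROWS, NUM_FIELDS)
print(f"wide: {wide / 1e9:.1f} GB, narrow: {narrow / 1e9:.1f} GB, ratio {narrow / wide:.1f}x")
```

Under these assumptions the wide table holds about 4.9 GB of raw data and the narrow one about 26.4 GB, roughly 5.4 times as much, purely from repeating the keys.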

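There are some kinds of queries where normalizing the data would help. In general, though, if you are fetching data by row, you already have the more efficient data structure.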

answered 2013-05-17T02:24:05.490

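You could take a paging approach with two queries. The initial query returns all rows, but only the column with the unique ID. This array can be split into pages of, say, 100 IDs each. When the user selects a particular page, you pass those 100 IDs to the second query, which returns all 200 columns but only for the 100 requested rows. This way you don't have to return every column of every row at once, which should improve performance noticeably.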
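Here is a minimal sketch of this two-query paging scheme. It assumes SQLite as a stand-in for SQL Server and a hypothetical `tax_wide` table with only two value columns. The helpers `fetch_ids`, `split_pages` and `fetch_page` are illustrative names, not part of Kendo UI or any other library.

```python
import sqlite3

PAGE_SIZE = 100  # "say 100 IDs per page", as in the answer

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for SQL Server
conn.execute(
    "CREATE TABLE tax_wide (pk INTEGER PRIMARY KEY, organization_id INTEGER, "
    "field_1 REAL, field_2 REAL)"
)
conn.executemany(
    "INSERT INTO tax_wide (organization_id, field_1, field_2) VALUES (?, ?, ?)",
    [(i % 7, i * 1.0, i * 2.0) for i in range(1, 251)],
)


def fetch_ids():
    """Query 1: return only the unique ID of every matching row."""
    return [row[0] for row in conn.execute("SELECT pk FROM tax_wide ORDER BY pk")]


def split_pages(ids, size=PAGE_SIZE):
    """Split the ID list into fixed-size pages (the last page may be shorter)."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def fetch_page(page_ids):
    """Query 2: return every column, but only for the IDs on the requested page."""
    if not page_ids:
        return []
    placeholders = ", ".join(["?"] * len(page_ids))
    sql = f"SELECT * FROM tax_wide WHERE pk IN ({placeholders}) ORDER BY pk"
    return conn.execute(sql, page_ids).fetchall()


ids = fetch_ids()
pages = split_pages(ids)
rows = fetch_page(pages[1])  # the user picks page 2
print(len(ids), [len(p) for p in pages], rows[0])
```

The first query stays cheap because it touches only the key column. The wide rows are read only for the page the user is actually looking at.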

answered 2013-05-17T02:15:44.920