
We're developing an application whose main function is managing payments to people. Each payment is written as a row in a table with the following fields:

PersonId (INT)
TransactionDate (DATETIME)
Amount (MONEY)
PaymentTypeId (INT)
...
...
...

It looks like we deal with around 8,000 people we send payments to, and one new transaction per person is added daily (around 8,000 inserts per day). This means that after 7 years (the time we need to store the data for), we will have over 20,000,000 rows.

We get around 10% more people per year, so this number rises a bit.

The most common query would be to get SUM(Amount) per person, where TransactionDate is between a start date and an end date:

SELECT PersonId, SUM(Amount)
FROM Table
WHERE PaymentTypeId = x
AND TransactionDate BETWEEN StartDate AND EndDate
GROUP BY PersonId

My question is, is this going to be a performance problem for SQL Server 2012? Or is 20,000,000 rows not too bad?

I'd have assumed a clustered index on PersonId (to group them), but would this cause very slow inserts/updates?

An index on the TransactionDate?


1 Answer


If your query filters on PaymentTypeId and TransactionDate and also needs PersonId and Amount at the same time, I would recommend a nonclustered index with the equality column (PaymentTypeId) first, then the range column (TransactionDate), and the other two columns included:

CREATE NONCLUSTERED INDEX IX_Table_PaymentTypeId_TransactionDate
ON dbo.Table (PaymentTypeId, TransactionDate)
INCLUDE (PersonId, Amount)

That way, your query can be satisfied from this index alone (a "covering" index) - no need to go back to the actual complete data pages.

Also: if you have years that can be "finalized" (no more changes), you could pre-compute and store some of those sums, e.g. per day or per month. With this approach, certain queries can just pull pre-computed sums from a small summary table, rather than re-aggregating thousands of detail rows each time.
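As a minimal sketch of that idea, a monthly summary table might look like the following (all table and column names here are illustrative assumptions, not part of the original schema):

```sql
-- Illustrative only: names and granularity (monthly) are assumptions.
CREATE TABLE dbo.MonthlyPaymentSummary (
    PersonId      INT   NOT NULL,
    PaymentTypeId INT   NOT NULL,
    MonthStart    DATE  NOT NULL,  -- first day of the summarized month
    TotalAmount   MONEY NOT NULL,
    PRIMARY KEY (PersonId, PaymentTypeId, MonthStart)
);

-- Populate once per finalized month:
INSERT INTO dbo.MonthlyPaymentSummary (PersonId, PaymentTypeId, MonthStart, TotalAmount)
SELECT PersonId,
       PaymentTypeId,
       DATEADD(MONTH, DATEDIFF(MONTH, 0, TransactionDate), 0),  -- truncate to month
       SUM(Amount)
FROM dbo.[Table]
GROUP BY PersonId, PaymentTypeId,
         DATEADD(MONTH, DATEDIFF(MONTH, 0, TransactionDate), 0);
```

A date-range query that spans whole months could then read this summary table directly, and only the partial months at the edges of the range would need to touch the detail table.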

answered 2013-06-05T05:25:28.343