I have normalized a Country/region/city database into multiple tables. City has a foreign key to region which has a foreign key to country.
The CITY
table includes 2 additional columns for finding the associated numerical IPAddress
. As you can imagine the city table has over 4 million records (representing the cities in the world which maps back to a region and then a country).
CITY
, REGION
, COUNTRY
are entities that I have mapped with Entity Framework power tools, that all have a name column (that represents a cityname
, regionname
, countryname
, respectively), and a primary key IDENTITY column that is indexed.
Let's say I have a table / entity called VisitorHit
that has the following columns:
id as int (primary key, identity)
dateVisited as datetime
FK_City as int (which has a many to one relationship to the CITY entity)
In code I use the VisitorHit
entity like:
var specialVisitors = VisitorRepository.GetAllSpecialVisitors();
var distinctCountries = specialVisitors.Select(i => i.City.CityName).Distinct().ToArray();
now the GetAllSpecialVisitors
returns a subset of the actual visitors (and it works pretty fast). The typical subset contains approximately 10,000 rows. The Select Distinct
statement takes minutes to return. Ultimately I need to further delimit the distinctCountries
by a date range (using the visitorhit.datevisited
field) and return the count for each distinctCountry
.
Any ideas on how I could speed up this operation?