I have normalized a country/region/city database into multiple tables: CITY has a foreign key to REGION, which has a foreign key to COUNTRY.
The CITY table includes two additional columns used to find the associated numeric IP address. As you can imagine, the CITY table has over 4 million records (the cities of the world, each mapping back to a region and, through it, to a country).
CITY, REGION and COUNTRY are entities I have mapped with Entity Framework Power Tools. Each has a name column (representing the city name, region name and country name, respectively) and an indexed IDENTITY primary key column.
Let's say I also have a table/entity called VisitorHit with the following columns:
- id as int (primary key, identity)
- dateVisited as datetime
- FK_City as int (many-to-one relationship to the CITY entity)
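To make the relationship chain concrete, here is a minimal sketch of how the mapped entities might look. The property names (CityName, RegionName, CountryName, the Region/Country/City navigation properties and the FK_Region/FK_Country columns) are assumptions modeled on FK_City; the classes generated by the Power Tools may name them differently, and the two IP address columns are left out.

```csharp
using System;

// Hypothetical POCOs approximating the generated entities.
// Navigation properties are declared virtual, as EF requires
// for lazy-loading proxies.
public class Country
{
    public int Id { get; set; }
    public string CountryName { get; set; }
}

public class Region
{
    public int Id { get; set; }
    public string RegionName { get; set; }
    public int FK_Country { get; set; }           // assumed FK column name
    public virtual Country Country { get; set; }
}

public class City
{
    public int Id { get; set; }
    public string CityName { get; set; }
    public int FK_Region { get; set; }            // assumed FK column name
    public virtual Region Region { get; set; }
}

public class VisitorHit
{
    public int Id { get; set; }
    public DateTime DateVisited { get; set; }
    public int FK_City { get; set; }              // many-to-one to City
    public virtual City City { get; set; }
}
```

With this shape, `hit.City.Region.Country.CountryName` walks from a visit to its country. If lazy loading is enabled and those entities are not already loaded, each hop can trigger a separate database query, which adds up when repeated for thousands of rows.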
In code I use the VisitorHit entity like this:
```csharp
var specialVisitors = VisitorRepository.GetAllSpecialVisitors();
var distinctCountries = specialVisitors.Select(i => i.City.CityName).Distinct().ToArray();
```
GetAllSpecialVisitors returns a subset of all visitors, and it runs quickly; a typical subset contains approximately 10,000 rows. The Select/Distinct statement, however, takes minutes to return. Ultimately I need to further restrict distinctCountries to a date range (using the VisitorHit.dateVisited field) and return the count for each distinct country.
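One approach worth trying, sketched here under assumptions: express the date filter, the country projection and the count as a single LINQ query with GroupBy. If GetAllSpecialVisitors returns an `IQueryable<VisitorHit>` (an assumption; if it returns a materialized list, each `i.City` access may issue its own lazy-loading query), Entity Framework can translate the whole expression into one SQL statement with joins and a GROUP BY, returning one row per country instead of 10,000 entities. The sketch runs the same query over an in-memory list so it is self-contained; `VisitorQueries`, `CountryCounts` and the property names are hypothetical, and it assumes the goal is country names reached via `City.Region.Country`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Minimal stand-ins for the mapped entities (property names are assumptions).
public record Country(int Id, string CountryName);
public record Region(int Id, string RegionName, Country Country);
public record City(int Id, string CityName, Region Region);
public record VisitorHit(int Id, DateTime DateVisited, City City);

public static class VisitorQueries
{
    // Hypothetical helper: counts hits per country with DateVisited in [from, to).
    // It takes IQueryable<T> so that, when the source is an EF query, the filter,
    // joins and grouping can be translated to SQL; ToDictionary then executes
    // the query and builds the result from the grouped rows.
    public static Dictionary<string, int> CountryCounts(
        IQueryable<VisitorHit> hits, DateTime from, DateTime to)
    {
        return hits
            .Where(h => h.DateVisited >= from && h.DateVisited < to)
            .GroupBy(h => h.City.Region.Country.CountryName)
            .Select(g => new { Country = g.Key, Count = g.Count() })
            .ToDictionary(x => x.Country, x => x.Count);
    }
}
```

In the question's code this would be called as `VisitorQueries.CountryCounts(VisitorRepository.GetAllSpecialVisitors(), from, to)`, which only has the intended effect if the repository method returns a query rather than an already-loaded collection.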
Any ideas on how I could speed up this operation?