I am writing a multi-threaded program that scrapes a certain site and collects IDs. It stores these IDs in a shared static List<string> object. Before any item is added to the List<string>, it is first checked against a HashSet<string> that contains a blacklist of already collected IDs.
I do this as follows:
    private static HashSet<string> Blacklist = new HashSet<string>();
    private static List<string> IDList = new List<string>();

    public static void AddIDToIDList(string ID)
    {
        lock (IDList)
        {
            if (IsIDBlacklisted(ID))
                return;
            IDList.Add(ID);
        }
    }

    public static bool IsIDBlacklisted(string ID)
    {
        lock (Blacklist)
        {
            if (Blacklist.Contains(ID))
                return true;
        }
        return false;
    }
The blacklist is saved to a file when the program finishes and is loaded again every time the program starts, so it will grow fairly large over time (up to 50k records). Is there a more efficient way, not only to store this blacklist, but also to check each ID against it?
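For context, the save/load step could be sketched roughly as below. This is only a minimal illustration, assuming a plain text file with one ID per line; the `BlacklistStore` class, its `Load`/`Save` method names, and the `path` parameter are hypothetical and not taken from the actual program.

```csharp
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: persists the blacklist as one ID per line.
public static class BlacklistStore
{
    // Reads every line of the file into a HashSet<string>.
    // Returns an empty set if the file does not exist yet (first run).
    public static HashSet<string> Load(string path)
    {
        return File.Exists(path)
            ? new HashSet<string>(File.ReadLines(path))
            : new HashSet<string>();
    }

    // Overwrites the file with the current contents of the set.
    public static void Save(string path, HashSet<string> blacklist)
    {
        File.WriteAllLines(path, blacklist);
    }
}
```

With this layout the file is read once at startup and written once at shutdown, and every lookup in between goes against the in-memory HashSet<string>.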
Thanks!