As you've noticed, this isn't a problem for which Bloom Filters are appropriate. With a Bloom Filter, when you get a hit you must then check the authoritative source to differentiate between a false positive and a true positive - they're useful in situations where most queries against the filter will be expected to give a negative result, which is the opposite to your case.
What you could do is have each side build a partial Prefix Tree in memory of all the filenames known to that side. It wouldn't be a full prefix tree - once you number of filenames below a node drops below a certain level you'd just include the full list of those filenames in that node. You then synchronise those prefix trees using a recursive algorithm starting at the root of the trees:
- Each side creates a hash of all the sorted, concatenated filenames below the current node.
- If the hashes are equal then this node and all descendents are synchronised - return.
- If there are no child nodes, send the (short) list of filenames at this terminal node from one side to the other to synchronise and return.
- Otherwise, recursively synchronise the child nodes and return.
The hash should be at least 128 bits, and make sure that when you concatenate the filenames for the hash you do so in a reversible manner (ie. seperate them with a character that can't appear in filenames like \0
, or prefix each one with its length).