If you use single-linkage, you can use SLINK which only needs O(n) memory.
It will still need O(n^2) time.
Hierarchical clustering is not very scalable.
CLINK can do complete linkage with O(n) memory, but the result quality is usually not very good (usually worse than Anderberg with complete linkage).
All the other algorithms we have are O(n^2) in memory, unfortunately.
Thus at around 65535 instances you will hit a wall with Java.
I have one algorithm on my todo list that should be able to run in O(n log n) if I'm not mistaken with index support. But I did not get around to study it in detail yet - it's not clear which linkage strategies it can support besides single-linkage.
If you have a lot of duplicates, make sure to merge them first!