I have a bunch of String
s I'd like a fast lookup for. Each String is 22 chars long and is looked up by the first 12 only (the "key" so to say), the full set of Strings is recreated periodically. They are loaded from a file and refreshed when the file changes. I have to deal with too little available memory, other server processes on my VPS need it too and need it more.
How do I best store the Strings and search for them?
My current idea is to store them all one after another inside a char[]
(to save RAM), and sort them for faster lookups (I figure the lookup is fastest if I have them presorted so I can use binary or interpolation search). But I'm not exactly sure how I should code it - if anyone is in the mood for a challenging puzzle: here it is...
Btw: It's probably ok to exceed the memory constraints for a while during the recreation / sorting, but it shouldn't be by much or for long.
Thanks!
Update
For the "I want to know specifics" crowd (correct me if I'm wrong in the Java details): The source files contain about 320 000 entries (all ANSI text), I really want to stay (WAY!) below 64 MB RAM usage and the data is only part of my program. Here's some information on sizes of Java types in memory.
My VPS is a 32bit OS, so...
- one
byte[]
, all concatenated = 12 + length bytes - one
char[]
, all concatenated = 12 + length * 2 bytes String
= 32 + length * 2 bytes (is Object, haschar[]
+ 3int
)
So I have to keep in memory:
- ~7 MB if all are stored in a
byte[]
- ~14 MB if all are stored in a
char[]
- ~25 MB if all are stored in a
String[]
- > 40 MB if they are stored in a HashTable / Map (for which I'd probably have to finetune the initial capacity)
A HashTable is not magical - it helps on insertion, but in principle it's just a very long array of String where the hashCode modulus capacity is used as an index, the data is stored in the next free position after the index and searched lineary if it's not found there on lookup. But for a Hashtable, I'd need the String itself and a substring of the first 12 chars for lookup. I don't want that (or do I miss something here?), sorry folks...