I'm creating a DBMS (basically software that handles SQL queries) strictly for fun and as a learning experience, and I need to know the best way to separate values and rows in the stored data.
For the table configuration I use XML, as it's a good way to store structured information. However, I cannot do the same for the inserted rows, because all the XML tags would take up a LOT of space. I also thought about serializing the objects that represent a database (I use Java) to store the data, but my guess is that this would take up a lot of space too.
So the only thing I could think of was using a value separator and a row separator, to keep the storage overhead to a minimum. The problem with single-character separators (if I use multi-character ones I might as well use XML) is that things break if the separator appears in one of the values. So I wondered whether I could use a character code that has no printable symbol attached to it. Does that exist? And if so, is it a good approach? One problem is if I start allowing BLOBs in the future: those contain binary data and might contain my value separator. What is the best solution to this?
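To make the BLOB concern concrete: ASCII does reserve non-printable control characters for this purpose (0x1F is the unit separator and 0x1E the record separator), but binary data can still contain those bytes. One alternative I can think of is to drop separators and prefix every field with its length instead. Below is a minimal sketch of that idea, not a claim about how any real engine does it. The class `RowCodec` and its methods are hypothetical names made up for illustration, and it assumes every field has already been turned into a `byte[]`.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (assumption, not an existing API): a row is stored as a
// 4-byte field count followed by (4-byte length, raw bytes) pairs, so no byte
// value has to be reserved as a separator.
class RowCodec {

    // Encodes the fields of one row into a single byte array.
    static byte[] encodeRow(List<byte[]> fields) {
        int size = Integer.BYTES;                 // room for the field count
        for (byte[] field : fields) {
            size += Integer.BYTES + field.length; // length prefix + payload
        }
        ByteBuffer buffer = ByteBuffer.allocate(size);
        buffer.putInt(fields.size());
        for (byte[] field : fields) {
            buffer.putInt(field.length);
            buffer.put(field);
        }
        return buffer.array();
    }

    // Decodes a row produced by encodeRow back into its fields.
    static List<byte[]> decodeRow(byte[] row) {
        ByteBuffer buffer = ByteBuffer.wrap(row);
        int count = buffer.getInt();
        List<byte[]> fields = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            byte[] field = new byte[buffer.getInt()];
            buffer.get(field);                    // reads exactly field.length bytes
            fields.add(field);
        }
        return fields;
    }
}
```

The overhead here is a fixed 4 bytes per field plus 4 bytes per row, and a value may contain any byte at all, including 0x1F or 0x1E, because the reader never scans the payload for a delimiter.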
Tell me what you think! I'm open to discussion. Also, if anyone knows how MySQL (or some other widely used SQL engine) stores its data, that would be interesting to hear.
A new idea I had
What if I read the entire table into a TreeSet built with a comparator that matches whatever column I am searching on or ordering by? Then a search would be equally fast whichever column it uses. The downside, of course, is that the whole file has to be loaded into objects that are placed in the TreeSet, which could take a lot of RAM. What do you think?
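Here is a rough sketch of what I mean, under some assumptions: `Row` is a made-up class with three example columns, and `TableIndexes.indexBy` is a hypothetical helper. Since a TreeSet takes a single comparator at construction time, each sort order needs its own set. A TreeSet also treats two elements that compare as equal as duplicates, so the sketch breaks ties on `id` to avoid silently dropping rows that share a value.

```java
import java.util.Comparator;
import java.util.TreeSet;

// Hypothetical row type with three example columns (assumption for illustration).
class Row {
    final int id;
    final String name;
    final int age;

    Row(int id, String name, int age) {
        this.id = id;
        this.name = name;
        this.age = age;
    }
}

class TableIndexes {

    // Builds one sorted view of the rows for the given ordering.
    static TreeSet<Row> indexBy(Iterable<Row> rows, Comparator<Row> order) {
        // Tie-break on id: without it, rows with equal keys would be
        // considered duplicates and only the first one would be kept.
        TreeSet<Row> index = new TreeSet<>(order.thenComparingInt(r -> r.id));
        for (Row row : rows) {
            index.add(row);
        }
        return index;
    }
}
```

Usage would look like `TableIndexes.indexBy(rows, Comparator.comparing((Row r) -> r.name))` for a name ordering and `TableIndexes.indexBy(rows, Comparator.comparingInt((Row r) -> r.age))` for an age ordering. Every such set holds references to all rows, which is where the RAM concern comes from.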