Yes, there are standard techniques for caching and memory rebalancing. The simplest approach would follow what you're thinking of doing: create a cache 'factory' or 'manager'. It would allocate cache objects on demand, each object having a size limit (think of a CPU cache line, which typically has a fixed size of 64 bytes). Knowing only the number of cache objects allocated, the manager could roughly estimate the amount of memory in use and compare it against a total_max_limit, which it would know from the machine it runs on, the type of OS, and so on. When the total_max_limit is hit and some cache objects need to be freed, the most commonly used eviction policy is LRU: choose the least recently used cache object to destroy. To implement this, you would store pointers to the cache objects inside the manager in a deque; whenever a cache object gets accessed, it tells the manager (through a back-pointer in the cache object's structure) to 'mark-as-accessed', i.e. to move that object's pointer to the front of the deque. The last pointer in the deque (the tail) then always references the least recently used cache object, so factory.rebalance() just pops the back and frees the object it held.
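Here's a minimal sketch of that manager in C++. All the names (`CacheManager`, `markAccessed`, `rebalance`) are illustrative, not a real library API, and I've swapped the deque for a `std::list` so the 'mark-as-accessed' move-to-front is an O(1) splice rather than a linear search; a side map gives O(1) lookup of an object's node.

    #include <cstddef>
    #include <list>
    #include <unordered_map>

    struct CacheObject {
        std::size_t size_bytes;   // fixed per-object size limit, e.g. 64
        // ... payload would go here ...
    };

    class CacheManager {
    public:
        explicit CacheManager(std::size_t total_max_limit)
            : limit_(total_max_limit) {}

        // Allocate a cache object on demand and register it as most
        // recently used (front of the list).
        CacheObject* allocate(std::size_t size_bytes) {
            lru_.push_front(CacheObject{size_bytes});
            CacheObject* obj = &lru_.front();   // list nodes have stable addresses
            index_[obj] = lru_.begin();
            used_ += size_bytes;
            rebalance();                        // evict if we went over the limit
            return obj;
        }

        // 'mark-as-accessed': splice the object's node to the front in O(1).
        void markAccessed(CacheObject* obj) {
            auto it = index_.find(obj);
            if (it != index_.end())
                lru_.splice(lru_.begin(), lru_, it->second);
        }

        // Pop the tail (least recently used) until we're back under the limit.
        // Keeps at least one object so allocate() never returns a dangling pointer.
        void rebalance() {
            while (used_ > limit_ && lru_.size() > 1) {
                CacheObject& victim = lru_.back();
                used_ -= victim.size_bytes;
                index_.erase(&victim);
                lru_.pop_back();                // destroys the LRU object
            }
        }

    private:
        std::size_t limit_;
        std::size_t used_ = 0;
        std::list<CacheObject> lru_;            // front = MRU, back = LRU
        std::unordered_map<CacheObject*,
                           std::list<CacheObject>::iterator> index_;
    };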
There are other algorithms, but LRU is the most commonly used one, and priority caching can be built on top of it. Create several such cache managers and distribute their total_max_limits so that the highest-priority one gets the most memory and lower-priority ones get progressively less. As a result, low-priority items will be evicted sooner, while more of the high-priority items stay resident in memory/cache. This is likely to perform better than evaluating some weight-based formula on every access to decide how far toward the head of the deque each particular cache object should be moved.
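To make that concrete, here's how the tiered setup could look, reusing the `CacheManager` sketch above. The budget split (1/2, 1/3, 1/6 of a hypothetical 1 GiB total) is just an example; the point is that each tier runs its own independent LRU against its own limit, so low-priority objects hit their smaller ceiling and get evicted first.

    // Builds on the CacheManager sketch above; tier sizes are made up.
    int main() {
        const std::size_t GiB = 1024ULL * 1024 * 1024;
        CacheManager high(GiB / 2);   // highest priority: biggest budget
        CacheManager mid(GiB / 3);
        CacheManager low(GiB / 6);    // lowest priority: evicted fastest

        // Callers allocate from the manager matching an item's priority;
        // eviction in one tier never touches the others.
        CacheObject* hot = high.allocate(64);
        high.markAccessed(hot);       // keeps 'hot' near the front of its tier
    }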