In both hardware and software design, there's an overriding principle which says that the execution speed of something that's done a million times is far more important than the execution speed of something that's done once. A corollary of this is that if something is done a million times, the time required to do it the first time matters far less than the time required for the other 999,999. One of the biggest reasons computers are so much faster today than they were 25 years ago is that designers have focused on making repeated operations faster, even when doing so might slow down one-off operations.
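To see how little the first execution matters, here is a minimal sketch of the arithmetic, using made-up illustrative costs rather than measurements: if the first run of an operation costs 1000 nanoseconds and every repeat costs one, the average over a million runs is barely above one nanosecond.

    /* Illustrative only: the costs below are assumptions, not measurements. */
    #include <stdio.h>

    int main(void) {
        const double first_cost_ns  = 1000.0;    /* assumed one-time cost   */
        const double repeat_cost_ns = 1.0;       /* assumed per-repeat cost */
        const double runs           = 1000000.0;

        double total_ns   = first_cost_ns + (runs - 1.0) * repeat_cost_ns;
        double average_ns = total_ns / runs;

        /* Prints roughly 1.001 ns: the expensive first run is lost in the
           noise of the other 999,999 executions. */
        printf("average cost per run: %.3f ns\n", average_ns);
        return 0;
    }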
As a simple example from a hardware perspective, consider two approaches to memory design. In the first, there is a single memory store, and every access takes sixty nanoseconds. In the second, there are several levels of cache: fetching a word held in the first level takes one nanosecond; a word that isn't there but is held in the second level takes five; a word that isn't in either but is in the third level takes ten; and a word that isn't in any cache takes seventy, since the processor spends ten nanoseconds searching the caches before spending sixty fetching the word from main memory. If all memory accesses were totally random, the first design would not only be simpler than the second, it would also perform better: nearly every access would waste ten nanoseconds searching the caches before going out to main memory anyway. On the other hand, if 80% of accesses are satisfied by the first cache level, 16% by the second, and 3% by the third, so that only one in a hundred has to go all the way to main memory, the average access time works out to about 2.6 nanoseconds. That's roughly twenty-three times as fast, on average, as the simpler memory system.
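That average is just a weighted sum of the hit rates and latencies. Here is a small sketch of the calculation, using the same assumed numbers (80% at 1 ns, 16% at 5 ns, 3% at 10 ns, and 1% paying 70 ns for a full miss):

    #include <stdio.h>

    int main(void) {
        /* Assumed hit rates and latencies for L1, L2, L3, and a full miss
           that goes to main memory after searching all three caches. */
        const double hit_rate[]   = { 0.80, 0.16, 0.03, 0.01 };
        const double latency_ns[] = { 1.0,  5.0,  10.0, 70.0 };

        double average_ns = 0.0;
        for (int i = 0; i < 4; i++)
            average_ns += hit_rate[i] * latency_ns[i];

        /* Prints 2.60 ns -- roughly 23x faster than the flat 60 ns design. */
        printf("average access time: %.2f ns (speedup over 60 ns: %.0fx)\n",
               average_ns, 60.0 / average_ns);
        return 0;
    }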
Even if an entire program is pre-loaded from disk, the first time a routine like "printf" is run, neither its code nor any data it requires is likely to be in any level of cache, so that first execution will involve slow memory accesses. Once the code and much of its required data have been cached, however, subsequent executions will be much faster; if a piece of code runs again while it is still in the fastest cache, the difference can easily be an order of magnitude. Optimizing for the fast case will often make the one-time execution of code considerably slower than it otherwise would be (to an even greater extent than the example above suggests), but since processors spend much of their time running little pieces of code millions or billions of times, the speedups obtained in those situations far outweigh any slowdown in routines that only run once.
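One way to see this effect directly is to time the first call to "printf" against later ones. The sketch below is an illustrative measurement, not a benchmark; it assumes a POSIX system with clock_gettime. The first call pays for things like lazy symbol binding, stdio buffer setup, and cold caches, while the repeats run from warm caches, and the exact ratio varies widely from machine to machine (it is easiest to see with output redirected to a file).

    #include <stdio.h>
    #include <time.h>

    /* Current monotonic time in nanoseconds. */
    static double now_ns(void) {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1e9 + ts.tv_nsec;
    }

    int main(void) {
        double t0 = now_ns();
        printf("first call\n");              /* cold: code and data not yet cached */
        double t1 = now_ns();

        double warm_total = 0.0;
        for (int i = 0; i < 1000; i++) {     /* warm: printf's code and buffers cached */
            double start = now_ns();
            printf("repeat %d\n", i);
            warm_total += now_ns() - start;
        }

        fprintf(stderr, "first printf: %.0f ns, average repeat: %.0f ns\n",
                t1 - t0, warm_total / 1000.0);
        return 0;
    }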