What happens is this:
On the caller's side, a return slot is provided that can hold the result. That is, the caller provides the memory for the object of type std::vector<int>. It expects the called function to construct the value there, and it is itself responsible for calling the destructor once the result is no longer used and for freeing the memory (if necessary; it probably just lives on the stack).
The called function (which may live in a different translation unit!) would, without the NRVO, do this:
- Provide a memory slot for ret.
- Construct a local variable ret in this memory slot.
- Do stuff...
- Copy-construct the return value in the caller-provided return slot by copying ret.
- Call ret's destructor.
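To make these steps concrete, here is a hand-written sketch of what the compiler arranges implicitly. Everything named here (Traced, the counters, f_without_nrvo, demo_without_nrvo) is hypothetical and only for illustration; a real compiler typically passes the return slot as a hidden argument rather than as an explicit void*, and the details depend on the ABI.

```cpp
#include <cassert>
#include <new>
#include <vector>

// Hypothetical counters, only here to make the steps observable.
static int g_copies = 0;
static int g_destructions = 0;

// Hypothetical stand-in for std::vector<int> that records copies and destructions.
struct Traced {
    std::vector<int> data;
    Traced() {}
    Traced(const Traced& other) : data(other.data) { ++g_copies; }
    ~Traced() { ++g_destructions; }
};

// Hand-written model of the callee without NRVO.
// The caller passes the return slot explicitly.
Traced* f_without_nrvo(void* return_slot) {
    Traced ret;                  // ret lives in the callee's own memory
    ret.data.push_back(1);       // do stuff...
    ret.data.push_back(2);
    // Copy-construct the return value in the caller's slot by copying ret.
    Traced* result = ::new (return_slot) Traced(ret);
    return result;
}                                // ret's destructor runs here

// Caller side: provides the memory and destroys the result.
int demo_without_nrvo() {
    g_copies = 0;
    g_destructions = 0;
    alignas(Traced) unsigned char slot[sizeof(Traced)];  // the return slot
    Traced* result = f_without_nrvo(slot);
    // The result lives in the caller's memory.
    assert(static_cast<void*>(result) == static_cast<void*>(slot));
    int size = static_cast<int>(result->data.size());
    result->~Traced();           // the caller calls the destructor
    return size;
}
```

In this model, one call costs one copy construction and two destructor calls (one for ret, one for the result).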
Now, with the NRVO, the decision to optimize this can be made entirely within the called function's translation unit. It transforms the above into:
- Construct ret in the memory of the function's return slot.
- Do stuff...
There is no need to do anything else: the caller owns the memory and calls the destructor, so the optimization is completely transparent to the caller :)
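The transformed version can be modelled by hand in the same way. This is again only a sketch under assumptions: Traced, the counters, f_with_nrvo, and demo_with_nrvo are hypothetical names, and the explicit void* stands in for the hidden return-slot argument a compiler would typically use.

```cpp
#include <cassert>
#include <new>
#include <vector>

// Hypothetical counters, only here to make the steps observable.
static int g_copies = 0;
static int g_destructions = 0;

// Hypothetical stand-in for std::vector<int> that records copies and destructions.
struct Traced {
    std::vector<int> data;
    Traced() {}
    Traced(const Traced& other) : data(other.data) { ++g_copies; }
    ~Traced() { ++g_destructions; }
};

// Hand-written model of the callee with NRVO applied:
// ret is constructed directly in the caller's return slot.
Traced* f_with_nrvo(void* return_slot) {
    Traced* ret = ::new (return_slot) Traced();  // construct ret in the slot
    ret->data.push_back(1);                      // do stuff...
    ret->data.push_back(2);
    return ret;                                  // no copy, no destructor call
}

// The caller side is exactly the same as it would be without NRVO.
int demo_with_nrvo() {
    g_copies = 0;
    g_destructions = 0;
    alignas(Traced) unsigned char slot[sizeof(Traced)];  // the return slot
    Traced* result = f_with_nrvo(slot);
    // The result lives in the caller's memory.
    assert(static_cast<void*>(result) == static_cast<void*>(slot));
    int size = static_cast<int>(result->data.size());
    result->~Traced();           // the caller calls the destructor
    return size;
}
```

Here the copy is gone and only the caller's destructor call remains, while the caller's code did not have to change at all.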
The NRVO, of course, can't eliminate the assignment into v in your example, because v already exists and cannot serve as the return slot. If you instead initialize a new variable with the result, e.g.
std::vector<int> w = f(v);
the NRVO will construct ret directly in w's memory (as w's memory will be passed in as the return slot to f).
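The difference between assigning and initializing can be observed with a small sketch. Traced, g_assignments, and this particular f are hypothetical stand-ins, not the code from the question. Only assignments are counted, because the number of copy or move constructions depends on whether the compiler actually applies the NRVO.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical counter for calls to Traced's assignment operators.
static int g_assignments = 0;

// Hypothetical stand-in for std::vector<int> that records assignments.
struct Traced {
    std::vector<int> data;
    Traced() {}
    Traced(const Traced& other) : data(other.data) {}
    Traced(Traced&& other) noexcept : data(std::move(other.data)) {}
    Traced& operator=(const Traced& other) {
        data = other.data;
        ++g_assignments;
        return *this;
    }
    Traced& operator=(Traced&& other) noexcept {
        data = std::move(other.data);
        ++g_assignments;
        return *this;
    }
};

// Hypothetical f: builds its result in a named local, so it is an NRVO candidate.
Traced f(const Traced& in) {
    Traced ret;
    ret.data = in.data;       // plain vector assignment, not counted
    ret.data.push_back(42);
    return ret;
}

int assignments_when_initializing() {
    g_assignments = 0;
    Traced v;
    Traced w = f(v);          // initialization: w is constructed, never assigned to
    (void)w;
    return g_assignments;
}

int assignments_when_assigning() {
    g_assignments = 0;
    Traced v;
    v = f(v);                 // the result is a temporary that is then assigned into v
    return g_assignments;
}
```

The first function reports zero assignments and the second reports one, regardless of whether the NRVO was applied inside f.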