Short answer: no, it is incorrect to nest kernel_fpu_begin()
calls, and it will lead to the userspace FPU state getting corrupted.
Medium answer: This won't work because kernel_fpu_begin() uses the current thread's struct task_struct to save off the FPU state (task_struct has an architecture-dependent member thread, and on x86, thread.fpu holds the thread's FPU state). Doing a second kernel_fpu_begin() will overwrite that single saved copy, so the matching kernel_fpu_end() will end up restoring the wrong FPU state.
Long answer: As you saw looking at the actual implementation in <asm/i387.h>, the details are a bit tricky. In older kernels (like the 3.2 source you looked at), FPU handling is always "lazy": the kernel wants to avoid the overhead of reloading the FPU until it is really needed, because the thread might run and be scheduled out again without ever actually using the FPU or needing its FPU state. So kernel_fpu_end() just sets the TS flag, which causes the next access of the FPU to trap and reload the FPU state then. The bet is that the FPU is used rarely enough that deferring the reload is cheaper overall.
However, if you look at newer kernels (3.7 or newer, I believe), you'll see that there is actually a second code path for all of this: "eager" FPU. This exists because newer CPUs have the optimized XSAVEOPT instruction, and newer userspace uses the FPU more often (for SSE in memcpy, etc.). The cost of XSAVEOPT / XRSTOR is lower, and the chance of the lazy optimization actually avoiding an FPU reload is lower too, so with a new kernel on a new CPU, kernel_fpu_end() just goes ahead and restores the FPU state immediately.
However, in both the "lazy" and "eager" FPU modes there is still only one slot in the task_struct to save the FPU state, so nesting kernel_fpu_begin() will end up corrupting userspace's FPU state either way.