Actually the above answer is wrong. It gets 0xFFFFFFFE for p because it initialized q with 0x7FFFFFFFE which does not fit and is (except for undefined behavior) in fact truncated to 0xFFFFFFFE.
So the code in that answer in fact starts with p = 0xFFFFFFFF and q = 0xFFFFFFFE and successfully swaps them.
(edit: it WAS wrong, it has been corrected now)
The routine in fact works perfectly (except for eventual undefined behavior which I'm not sure is there or not, I'm a bit fuzzy on the details of the standard).
Ignoring eventual undefined behavior, if the underlying arithmetic is the common 2's complement modulo arithmetic, the routine works and have no problems with overflows/underflows (it's a modulo arithmetic).
The only real problem is with aliasing. If the two pointers are identical, the pointed to integer will become 0.
The precondition of the routine is therefore that the two pointers are valid and not aliased. Checks for null pointers and for identical pointers may be added, if that's what you expect your program to need.
The question of why to use this mechanism instead of the xor trick instead of the temp variable swap, is still open. But has NOTHING to do with the OP question: the routine works and overflow, as it's handled by 2's complement modulo arithmetic, which is the most common on processors, is not an issue (edit: it becomes only because of overflow being undefined behavior in the standard, but it works on any processor that uses 2's complement modulo arithmetic). As for efficiency, it all depends on the optimizer AND the processor.