From the Intel 64 and IA-32 Architectures Software Developer's Manual, Volume 3, "System Programming Guide",
Section 8.2.5, "Strengthening or Weakening the Memory-Ordering Model":
Synchronization mechanisms in multiple-processor systems may depend
upon a strong memory-ordering model. Here, a program can use a locking
instruction such as the XCHG instruction or the LOCK prefix to ensure
that a read-modify-write operation on memory is carried out
atomically. Locking operations typically operate like I/O operations
in that they wait for all previous instructions to complete and for
all buffered writes to drain to memory (see Section 8.1.2, “Bus
Locking”).
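To make the quoted paragraph concrete, here is a minimal sketch of a test-and-set spinlock built on an atomic exchange. The names SpinLock and count_with_spinlock are hypothetical and do not come from the manual. On x86-64 a compiler will typically lower the swap call to an XCHG instruction with a memory operand, which is implicitly locked, but that is an assumption about code generation and not something the language guarantees.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical test-and-set spinlock, for illustration only.
pub struct SpinLock {
    locked: AtomicBool,
}

impl SpinLock {
    pub const fn new() -> Self {
        SpinLock { locked: AtomicBool::new(false) }
    }

    pub fn lock(&self) {
        // swap is an atomic read-modify-write: it stores `true` and returns
        // the previous value in one indivisible step. If the previous value
        // was already `true`, another thread holds the lock, so keep spinning.
        while self.locked.swap(true, Ordering::Acquire) {
            std::hint::spin_loop();
        }
    }

    pub fn unlock(&self) {
        // Release ordering makes writes done inside the critical section
        // visible to the next thread that acquires the lock.
        self.locked.store(false, Ordering::Release);
    }
}

// Each thread increments the counter with a separate load and store.
// Without the lock, concurrent increments could be lost; with it, the
// load/store pair behaves as one atomic update.
pub fn count_with_spinlock(threads: usize, iters: u64) -> u64 {
    let lock = Arc::new(SpinLock::new());
    let counter = Arc::new(AtomicU64::new(0));

    let handles: Vec<_> = (0..threads)
        .map(|_| {
            let (lock, counter) = (Arc::clone(&lock), Arc::clone(&counter));
            thread::spawn(move || {
                for _ in 0..iters {
                    lock.lock();
                    let v = counter.load(Ordering::Relaxed);
                    counter.store(v + 1, Ordering::Relaxed);
                    lock.unlock();
                }
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    counter.load(Ordering::Relaxed)
}
```

Calling count_with_spinlock(4, 10_000) should return 40000, because every increment happens while the lock is held.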
And from Section 8.1.2, "Bus Locking":
Locked operations are atomic with respect to all other memory
operations and all externally visible events. Only instruction fetch
and page table accesses can pass locked instructions. Locked
instructions can be used to synchronize data written by one processor
and read by another processor.
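The last sentence above (data written by one processor and read by another) can be sketched as a simple publish/consume handoff. This is an illustrative example under stated assumptions, not code from the manual: the function name publish_and_read is hypothetical, and the expectation that the swap on the flag becomes a locked XCHG on x86-64 is an assumption about typical compiler output.

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};
use std::sync::Arc;
use std::thread;

// Hypothetical handoff: a writer thread publishes `value`, then raises a
// flag with an atomic exchange; the calling thread waits for the flag and
// then reads the payload.
pub fn publish_and_read(value: u64) -> u64 {
    let payload = Arc::new(AtomicU64::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let (p, r) = (Arc::clone(&payload), Arc::clone(&ready));
    let writer = thread::spawn(move || {
        // Plain (relaxed) write of the data itself.
        p.store(value, Ordering::Relaxed);
        // Atomic read-modify-write on the flag. The payload store above is
        // ordered before it, so a reader that sees the flag also sees the data.
        r.swap(true, Ordering::SeqCst);
    });

    // Spin until the flag is observed; Acquire pairs with the writer's swap.
    while !ready.load(Ordering::Acquire) {
        std::hint::spin_loop();
    }
    let seen = payload.load(Ordering::Relaxed);

    writer.join().unwrap();
    seen
}
```

The reader can never observe the flag set while still seeing the old payload, which is the synchronization property the quoted passage attributes to locked instructions.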
For the P6 family processors, locked operations serialize all
outstanding load and store operations (that is, wait for them to
complete). This rule is also true for the Pentium 4 and Intel Xeon
processors, with one exception. Load operations that reference weakly
ordered memory types (such as the WC memory type) may not be
serialized.