Don't count on x += 1 being thread-safe. Here is an example where it does not work (see Josiah Carlson's comment):
import threading
x = 0
def foo():
    global x
    for i in xrange(1000000):  # Python 2; use range() on Python 3
        x += 1  # not atomic: load, add, store
threads = [threading.Thread(target=foo), threading.Thread(target=foo)]
for t in threads:
    t.daemon = True
    t.start()
for t in threads:
    t.join()
print(x)
If you disassemble foo:
In [80]: import dis
In [81]: dis.dis(foo)
  4           0 SETUP_LOOP              30 (to 33)
              3 LOAD_GLOBAL              0 (xrange)
              6 LOAD_CONST               1 (1000000)
              9 CALL_FUNCTION            1
             12 GET_ITER
        >>   13 FOR_ITER                16 (to 32)
             16 STORE_FAST               0 (i)
  5          19 LOAD_GLOBAL              1 (x)
             22 LOAD_CONST               2 (1)
             25 INPLACE_ADD
             26 STORE_GLOBAL             1 (x)
             29 JUMP_ABSOLUTE           13
        >>   32 POP_BLOCK
        >>   33 LOAD_CONST               0 (None)
             36 RETURN_VALUE
You see that there is a LOAD_GLOBAL to retrieve the value of x, there is an INPLACE_ADD, and then a STORE_GLOBAL.
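The exact opcodes differ between CPython versions (the listing above is from Python 2). Here is a minimal sketch for checking your own interpreter, assuming Python 3.4+ where dis.get_instructions is available. The names bump and opnames are made up for illustration:

```python
import dis

x = 0

def bump():
    # Same statement as in foo, without the loop.
    global x
    x += 1

def opnames(func):
    # Return the opcode names of func in execution order.
    return [ins.opname for ins in dis.get_instructions(func)]

names = opnames(bump)
print(names)

# The load and the store should show up as two separate instructions,
# with the add somewhere in between.
print(names.index("LOAD_GLOBAL") < names.index("STORE_GLOBAL"))
```

As long as the load and the store are separate instructions, a thread switch between them is at least possible in principle.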
If both threads execute LOAD_GLOBAL in succession, they might both load the same value of x. They then both increment it to the same number and store that same number, so the work of one thread overwrites the work of the other. This is not thread-safe.
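To make that interleaving concrete, here is a sketch that replays it by hand in a single thread, so the result is deterministic. The function name replay_lost_update and the a_/b_ variables are hypothetical; they stand in for the two threads' private stack values:

```python
def replay_lost_update(start=0):
    x = start

    # Both "threads" perform their LOAD_GLOBAL before either one stores.
    a_loaded = x
    b_loaded = x

    # Each performs its INPLACE_ADD on the stale value it loaded.
    a_result = a_loaded + 1
    b_result = b_loaded + 1

    # Each performs its STORE_GLOBAL; the second store overwrites the first.
    x = a_result
    x = b_result
    return x

print(replay_lost_update())  # prints 1, although two increments ran
```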
As you can see, the final value of x would be 2000000 if the program were thread-safe, but instead you almost always get a number less than 2000000.
If you add a lock, you get the "expected" answer:
import threading
lock = threading.Lock()
x = 0
def foo():
    global x
    for i in xrange(1000000):  # Python 2; use range() on Python 3
        with lock:  # only one thread at a time runs the increment
            x += 1
threads = [threading.Thread(target=foo), threading.Thread(target=foo)]
for t in threads:
    t.daemon = True
    t.start()
for t in threads:
    t.join()
print(x)
yields
2000000
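If you would rather not have a module-level lock and global, the same idea can be wrapped in a small class. This is only a sketch of one possible arrangement, not something from the original example; LockedCounter and run are hypothetical names, and the iteration counts are kept small so it finishes quickly:

```python
import threading

class LockedCounter(object):
    """Counter whose increments are serialized by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def increment(self):
        # The load, add, and store all happen while the lock is held.
        with self._lock:
            self.value += 1

def run(num_threads=4, per_thread=10000):
    counter = LockedCounter()

    def work():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value

print(run())  # prints 40000
```

The lock still costs something on every increment, so if the hot loop matters, it may be cheaper to count locally in each thread and add the totals under the lock once at the end.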
I think the reason why the code you posted does not exhibit a problem:
for i in range(1000):
    t = threading.Thread(target = worker)
    threads.append(t)
    t.start()
is because your workers complete so darn quickly compared to the time it takes to spawn a new thread that in practice there is no competition between threads. In Josiah Carlson's example above, each thread spends a significant amount of time in foo, which increases the chance of thread collision.
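A rough way to get a feel for this on your own machine is to compare the cost of calling a trivial function directly with the cost of spawning a thread for each call. This sketch assumes Python 3.3+ (for time.perf_counter). The worker here is an empty hypothetical stand-in, not the worker from your question, and the timings will vary from run to run:

```python
import threading
import time

def worker():
    pass  # stand-in for a worker that finishes almost immediately

def time_direct_calls(n):
    # Total time to call worker n times in the current thread.
    start = time.perf_counter()
    for _ in range(n):
        worker()
    return time.perf_counter() - start

def time_thread_spawns(n):
    # Total time to run worker n times, each in a freshly spawned thread.
    start = time.perf_counter()
    for _ in range(n):
        t = threading.Thread(target=worker)
        t.start()
        t.join()
    return time.perf_counter() - start

direct = time_direct_calls(1000)
spawned = time_thread_spawns(1000)
print("direct calls:  %.6f s" % direct)
print("thread spawns: %.6f s" % spawned)
```

If spawning takes far longer than the work itself, each thread has probably finished before the next one starts, so they rarely get the chance to interleave.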