I have a user-level program that opens a file with the flags O_WRONLY|O_SYNC. The program creates 256 threads, each of which writes 256 or more bytes of data to the file per request. I want a total of 1280000 requests, which comes to about 300 MB of data. The program ends once 1280000 requests have completed.
I use pthread_spin_trylock() to increment a variable that keeps track of the number of requests completed so far. To ensure that each thread writes to a unique offset, I use pwrite() and calculate the offset as a function of the number of requests already written. Hence, I don't use any mutex when actually writing to the file. (Does this approach ensure data integrity?)
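The scheme described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual program: the names writer_ctx, claim_request, writer_thread, run_writers and RECORD_SIZE are hypothetical, every record is assumed to be exactly 256 bytes, O_CREAT|O_TRUNC are added so the sketch can create its own file, and a blocking pthread_spin_lock() stands in for the pthread_spin_trylock() mentioned above to keep the claim step short.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <pthread.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define RECORD_SIZE 256 /* assumption: fixed-size records */
#define MAX_THREADS 256

struct writer_ctx {
    int fd;                  /* shared file descriptor */
    long total;              /* total number of requests to issue */
    long next;               /* next unclaimed request index */
    pthread_spinlock_t lock; /* protects 'next' only, not the file */
};

/* Hand out each request index exactly once; -1 means nothing is left. */
static long claim_request(struct writer_ctx *ctx)
{
    long idx = -1;
    pthread_spin_lock(&ctx->lock);
    if (ctx->next < ctx->total)
        idx = ctx->next++;
    pthread_spin_unlock(&ctx->lock);
    return idx;
}

static void *writer_thread(void *arg)
{
    struct writer_ctx *ctx = arg;
    char buf[RECORD_SIZE];
    long idx;

    while ((idx = claim_request(ctx)) >= 0) {
        /* The offset depends only on the claimed index. */
        off_t off = (off_t)idx * RECORD_SIZE;
        size_t done = 0;

        memset(buf, 'A' + (int)(idx % 26), sizeof buf);
        while (done < sizeof buf) { /* pwrite() may write fewer bytes */
            ssize_t n = pwrite(ctx->fd, buf + done, sizeof buf - done,
                               off + (off_t)done);
            if (n < 0)
                return (void *)1; /* report failure to the joiner */
            done += (size_t)n;
        }
    }
    return NULL;
}

/* Returns 0 if every thread started and every write succeeded. */
static int run_writers(const char *path, int nthreads, long total)
{
    struct writer_ctx ctx = { .total = total, .next = 0 };
    pthread_t tids[MAX_THREADS];
    int started = 0, failed = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);
    ctx.fd = open(path, O_WRONLY | O_CREAT | O_TRUNC | O_SYNC, 0644);
    if (ctx.fd < 0)
        return -1;
    pthread_spin_init(&ctx.lock, PTHREAD_PROCESS_PRIVATE);

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[i], NULL, writer_thread, &ctx) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++) {
        void *ret = NULL;
        pthread_join(tids[i], &ret);
        if (ret != NULL)
            failed = 1;
    }

    pthread_spin_destroy(&ctx.lock);
    close(ctx.fd);
    return (failed || started != nthreads) ? -1 : 0;
}
```

With the numbers from this question, the call would be something like run_writers("out.bin", 256, 1280000). The spinlock is held only while an index is claimed; because each index is handed out once, no two pwrite() calls in this sketch target the same byte range.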
When I compare the average time for which the pwrite() call blocks with the corresponding numbers from blktrace (i.e., the average Q2C times, which measure the complete life cycle of a BIO), I find a significant difference. In fact, the average completion time for a BIO is much greater than the average latency of a pwrite() call. What is the reason behind this discrepancy? Shouldn't these numbers be similar, since O_SYNC ensures that the data is actually written to the physical medium before returning?
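For reference, the user-space side of this comparison (the time a single pwrite() call blocks) can be measured along the following lines. This is a minimal sketch and not the exact instrumentation used here: lat_stats, elapsed_ns, timed_pwrite and mean_latency_us are hypothetical names, CLOCK_MONOTONIC is an assumed choice of clock, and each thread is assumed to keep its own lat_stats so that no extra locking is needed.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical per-thread accumulator for pwrite() wall-clock latency. */
struct lat_stats {
    int64_t total_ns; /* sum of time spent blocked in pwrite() */
    int64_t calls;    /* number of timed calls */
};

static int64_t elapsed_ns(struct timespec start, struct timespec end)
{
    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000000LL
         + (int64_t)(end.tv_nsec - start.tv_nsec);
}

/* Wraps pwrite() and records how long the call took to return. */
static ssize_t timed_pwrite(int fd, const void *buf, size_t len, off_t off,
                            struct lat_stats *st)
{
    struct timespec t0, t1;
    ssize_t n;

    assert(st != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    n = pwrite(fd, buf, len, off);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    st->total_ns += elapsed_ns(t0, t1);
    st->calls++;
    return n;
}

/* Mean latency in microseconds; 0.0 if nothing has been timed yet. */
static double mean_latency_us(const struct lat_stats *st)
{
    if (st->calls == 0)
        return 0.0;
    return (double)st->total_ns / (double)st->calls / 1000.0;
}
```

A number obtained this way covers one system call as seen from user space, whereas Q2C is measured per BIO in the block layer, so the two averages are taken over different units of work.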