The solution I used turned out to be a common workaround: I found something similar in the source of the drbd block driver. The bi_private field can be used only by whoever allocated the bio, so instead of touching the original bio I cloned it with bio_clone and attached my data to the clone:
bio_copy = bio_clone(bio_source, GFP_NOIO);
// GFP_NOIO here too: a GFP_KERNEL allocation in the I/O path can recurse into I/O
struct something *instance = kmalloc(sizeof(*instance), GFP_NOIO);
// bail out here if bio_copy or instance is NULL
instance->bio_original = bio_source;
// update timestamps for latency inside this struct instance
bio_copy->bi_private = instance;
bio_copy->bi_end_io = my_end_io_function;
bio_copy->bi_bdev = bio_source->bi_bdev;
...
...
make_request_fn(queue, bio_copy);
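You'll have to write the bi_end_io function yourself (my_end_io_function above). Remember to call bio_endio on the original bio inside it, otherwise the original request never completes. Depending on your kernel version, you might need to copy the clone's bi_error into bio_source's bi_error before calling bio_endio(bio_source). This is also the place to kfree the struct instance and release the clone with bio_put.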
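To make the ordering inside the end_io function concrete, here is a rough userspace sketch of the whole flow. It is not kernel code: struct fake_bio, fake_bio_endio, clone_and_track, fake_clock_ns and last_latency_ns are made-up stand-ins I am assuming purely for illustration, and malloc/free take the place of bio_clone/kmalloc and bio_put/kfree. Only the shape of the callback is meant to carry over.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical userspace stand-in for struct bio: only the fields used here. */
struct fake_bio {
    void *bi_private;
    void (*bi_end_io)(struct fake_bio *bio);
    int bi_error;
    int done;                       /* demo-only: set once the bio completes */
};

/* Per-request bookkeeping, playing the role of "struct something". */
struct something {
    struct fake_bio *bio_original;
    unsigned long long start_ns;
};

/* Demo-only clock and result slot; a driver would read a real kernel clock. */
static unsigned long long fake_clock_ns;
static unsigned long long last_latency_ns;

/* Stand-in for bio_endio(): mark the bio finished and run its callback. */
static void fake_bio_endio(struct fake_bio *bio)
{
    bio->done = 1;
    if (bio->bi_end_io)
        bio->bi_end_io(bio);
}

/* Completion handler installed on the clone. */
static void my_end_io_function(struct fake_bio *bio_copy)
{
    struct something *instance = bio_copy->bi_private;
    struct fake_bio *bio_source;

    assert(instance != NULL);       /* we set bi_private ourselves */
    bio_source = instance->bio_original;

    last_latency_ns = fake_clock_ns - instance->start_ns;
    bio_source->bi_error = bio_copy->bi_error;  /* propagate the I/O status */

    free(instance);                 /* kfree() in a driver */
    free(bio_copy);                 /* bio_put() in a driver */
    fake_bio_endio(bio_source);     /* complete the original last */
}

/* Clone bio_source and hook the clone; returns NULL if allocation fails. */
static struct fake_bio *clone_and_track(struct fake_bio *bio_source)
{
    struct fake_bio *bio_copy = malloc(sizeof(*bio_copy));
    struct something *instance = malloc(sizeof(*instance));

    if (!bio_copy || !instance) {
        free(bio_copy);
        free(instance);
        return NULL;
    }
    *bio_copy = *bio_source;            /* plays the role of bio_clone() */
    instance->bio_original = bio_source;
    instance->start_ns = fake_clock_ns; /* timestamp taken at submission */
    bio_copy->bi_private = instance;    /* safe: we own the clone */
    bio_copy->bi_end_io = my_end_io_function;
    return bio_copy;
}
```

The caller hands the bio returned by clone_and_track to the lower layer; when that layer finishes it via fake_bio_endio, the handler records the latency, copies the status to the original, frees its own allocations, and only then completes the original. Completing the original last matters because its owner may free it as soon as it is finished.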
Hope this helps someone.