There is no simple formula for this.
The instant at which the frame is sampled, before encoding, is called the PTS (presentation timestamp). It is outside the scope of the encoder, so you must carry it through your data flow yourself from the moment you capture the frames.
From there, you have 2 possibilities:
- If the H264 encoder does not generate B-frames, the RTP timestamp should be the PTS + a random offset (the same offset for the whole streaming session).
- If the encoder generates B-frames (or B-slices), the decoding order differs from the presentation order: a B-frame needs a later frame to be decoded first, so that later frame must be sent before it.
In the latter case, RFC 6184 gives you multiple ways to stream the encoded NAL units.
Most streaming software uses the mode called "non-interleaved", in which you must set the RTP timestamp to the PTS + offset but send the NAL units in decoding order, so the timestamps will not increase monotonically.
This also means the client has to decode in the order received, and must not reorder the frames into PTS order before decoding.
I'm deliberately not using the term DTS here: you don't need the decoding timestamp for this to work, only the decoding order.
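As a rough illustration of the non-interleaved case, here is a minimal Python sketch. It assumes the PTS is kept in seconds and uses the 90 kHz clock that RFC 6184 specifies for H264. The frame list, the decoding order and the function names are made up for the example and do not come from any particular library.

```python
import random

RTP_CLOCK_HZ = 90_000  # RTP clock rate specified by RFC 6184 for H264


def rtp_timestamp(pts_seconds: float, offset: int) -> int:
    """Map a capture PTS (in seconds) to a 32-bit RTP timestamp."""
    return (round(pts_seconds * RTP_CLOCK_HZ) + offset) & 0xFFFFFFFF


def timestamps_in_send_order(frames, order, offset):
    """Return (frame type, RTP timestamp) pairs in the order they are sent."""
    return [(frames[i][0], rtp_timestamp(frames[i][1], offset)) for i in order]


# Hypothetical 25 fps capture: (frame type, PTS in seconds), in capture order.
captured = [("I", 0.00), ("B", 0.04), ("P", 0.08), ("B", 0.12), ("P", 0.16)]

# Assumed decoding order: each B-frame is sent after the later frame it needs.
decode_order = [0, 2, 1, 4, 3]

# Random offset, chosen once and kept for the whole session.
offset = random.getrandbits(32)

for frame_type, ts in timestamps_in_send_order(captured, decode_order, offset):
    print(frame_type, ts)
```

With an offset of 0 the timestamps go out as 0, 7200, 3600, 14400, 10800: each one is still PTS + offset, but the sequence is not monotonic because the frames are sent in decoding order.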
The last mode described in RFC 6184 is the so-called interleaved mode, in which the NAL units may be sent in an order other than the decoding order. In that case, you have to implement some application logic to put the units back in order; refer to RFC 6184 for details.