Initially, Adobe did not expect XMP data to exceed the size limit of a single JPEG segment (about 64 KB), and the XMP specification required the data to fit into one segment. When it later turned out that a single JPEG APP1 segment is not always large enough to hold the XMP data, Adobe changed the specification to allow the XMP data to span multiple APP1 segments. The data is split into two parts: the standard XMP and the ExtendedXMP. The standard XMP part is a "normal" XMP structure with a packet wrapper, while the ExtendedXMP part has no packet wrapper. The ExtendedXMP data can be further divided to fit into multiple APP1 segments.
The following quote from Part 3 of the Adobe XMP specification describes how ExtendedXMP chunks are stored as JPEG APP1 segments:
Each chunk is written into the JPEG file within a separate APP1 marker segment. Each ExtendedXMP marker segment contains:
- A null-terminated signature string of "http://ns.adobe.com/xmp/extension/".
- A 128-bit GUID stored as a 32-byte ASCII hex string, capital A-F, no null termination. The GUID is a 128-bit MD5 digest of the full ExtendedXMP serialization.
- The full length of the ExtendedXMP serialization as a 32-bit unsigned integer
- The offset of this portion as a 32-bit unsigned integer.
- The portion of the ExtendedXMP
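The layout quoted above can be read field by field. The following is a minimal sketch, not code taken from icafe: the class name ExtendedXmpChunk and its fields are hypothetical, the input is assumed to be the APP1 payload (the bytes after the marker and its 2-byte length field), and the two integers are assumed to be big-endian like the other multi-byte values in a JPEG file.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical holder for one parsed ExtendedXMP APP1 payload.
final class ExtendedXmpChunk {
    // Signature string including its terminating null byte (35 bytes in total).
    static final byte[] SIGNATURE =
            "http://ns.adobe.com/xmp/extension/\0".getBytes(StandardCharsets.US_ASCII);

    final String guid;      // 32 ASCII hex characters
    final long fullLength;  // length of the whole ExtendedXMP serialization
    final long offset;      // position of this portion within the serialization
    final byte[] data;      // the portion itself

    private ExtendedXmpChunk(String guid, long fullLength, long offset, byte[] data) {
        this.guid = guid;
        this.fullLength = fullLength;
        this.offset = offset;
        this.data = data;
    }

    // Returns null if the payload is not an ExtendedXMP segment.
    static ExtendedXmpChunk parse(byte[] payload) {
        int headerSize = SIGNATURE.length + 32 + 4 + 4;
        if (payload.length < headerSize
                || !Arrays.equals(Arrays.copyOf(payload, SIGNATURE.length), SIGNATURE)) {
            return null;
        }
        // Assumption: big-endian byte order, as elsewhere in JPEG.
        ByteBuffer buf = ByteBuffer.wrap(payload).order(ByteOrder.BIG_ENDIAN);
        buf.position(SIGNATURE.length);
        byte[] guidBytes = new byte[32];
        buf.get(guidBytes);
        String guid = new String(guidBytes, StandardCharsets.US_ASCII);
        // Mask to treat the 32-bit values as unsigned.
        long fullLength = buf.getInt() & 0xFFFFFFFFL;
        long offset = buf.getInt() & 0xFFFFFFFFL;
        byte[] data = new byte[buf.remaining()];
        buf.get(data);
        return new ExtendedXmpChunk(guid, fullLength, offset, data);
    }
}
```

Returning null for a non-matching payload lets a caller run every APP1 segment through parse() and simply skip the ones that hold Exif or the standard XMP.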
We can see that, besides the null-terminated signature string that identifies the ExtendedXMP data, there is also a GUID, which should match the one found in the standard XMP part. The offset is used to join the different portions of the ExtendedXMP, so the ExtendedXMP APP1 segments may not even appear in order. The actual data follows these header fields in every segment, which is why @Matt's answer needs some way to fix the string. The remaining value, the full length of the ExtendedXMP serialization, serves two purposes: it allows checking the integrity of the data, and it provides the buffer size for joining the portions.
When we find an ExtendedXMP segment, we need to join its data with that of the other ExtendedXMP segments to obtain the whole ExtendedXMP data. We then merge the two XML trees (removing the GUID from the standard XMP part as well) to retrieve the entire XMP data.
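The joining step could be sketched roughly as follows. This is an illustration under stated assumptions, not the icafe implementation: ExtendedXmpAssembler is a hypothetical name, the GUID is assumed to have been read from the standard XMP part beforehand, and portions are assumed not to overlap. Each portion is copied to its declared offset in a buffer sized by the full length, so the order of arrival does not matter, and the MD5 digest is compared with the GUID at the end.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper that rebuilds the ExtendedXMP serialization from its portions.
final class ExtendedXmpAssembler {
    private final String guid;    // expected GUID, 32 uppercase hex characters
    private final byte[] buffer;  // sized by the declared full length
    private int received;         // bytes copied so far (assumes no overlapping portions)

    ExtendedXmpAssembler(String guid, int fullLength) {
        this.guid = guid;
        this.buffer = new byte[fullLength];
    }

    // Copies one portion to its offset; portions may arrive in any order.
    void add(int offset, byte[] portion) {
        if (offset < 0 || offset > buffer.length - portion.length) {
            throw new IllegalArgumentException("portion lies outside the declared length");
        }
        System.arraycopy(portion, 0, buffer, offset, portion.length);
        received += portion.length;
    }

    // Checks the length and the MD5-based GUID, then returns the joined data.
    byte[] finish() {
        if (received != buffer.length) {
            throw new IllegalStateException("missing or duplicated portions");
        }
        byte[] digest;
        try {
            digest = MessageDigest.getInstance("MD5").digest(buffer);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
        StringBuilder hex = new StringBuilder(32);
        for (byte b : digest) {
            hex.append(String.format("%02X", b & 0xFF));
        }
        if (!hex.toString().equals(guid)) {
            throw new IllegalStateException("GUID does not match the MD5 of the joined data");
        }
        return buffer;
    }
}
```

Counting received bytes is a deliberately simple integrity check; a stricter version would track which byte ranges have been filled.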
I have written a Java library, icafe, which can extract and insert both XMP and ExtendedXMP. One use case for ExtendedXMP is Google's depth map data, which is in fact a grayscale image hidden inside the actual image as metadata and, in the case of JPEG, as XMP data. The depth map image can be used, for example, to blur the original image. Depth map data is usually large and has to be split into standard and extended XMP parts. The whole data is Base64 encoded and could be in PNG format.
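Once the Base64 text of the depth map has been pulled out of the merged XMP, decoding it is straightforward. The sketch below is only an illustration and does not use icafe: DepthMapPayload is a hypothetical name, and the input string is assumed to be the Base64 value already extracted from the XMP. It decodes the value and checks for the 8-byte PNG signature, since the data could be in PNG format but does not have to be.

```java
import java.util.Base64;

// Hypothetical helper for a Base64-encoded depth map value taken from XMP.
final class DepthMapPayload {
    // The fixed 8-byte signature that starts every PNG file.
    private static final byte[] PNG_MAGIC =
            {(byte) 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A};

    // The MIME decoder ignores line breaks that may appear inside the value.
    static byte[] decode(String base64) {
        return Base64.getMimeDecoder().decode(base64);
    }

    // Returns true if the decoded bytes start with the PNG signature.
    static boolean isPng(byte[] bytes) {
        if (bytes.length < PNG_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < PNG_MAGIC.length; i++) {
            if (bytes[i] != PNG_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }
}
```

If isPng() returns true, the decoded bytes can be written to a .png file as they are.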
The following is an example image and the extracted depth map:

The original image comes from here.
Note: I recently found another website that discusses the Google Cardboard Camera app, which can take advantage of both the image and the audio embedded in the JPEG XMP data. ICAFE now supports extracting both image and audio from such images. Example usage can be found here, with the following call: JPEGTweaker.extractDepthMap()
Here is the image extracted by ICAFE from the original image on the website that discusses the Google Cardboard Camera app:

Unfortunately, I can't find a way to insert the MP4 audio here.