I need to extract every <a href="...">something.jpg</a> tag from a large string that can contain multiple instances of the tag. I need to do this with regular expressions in Oracle 11g.
Here is an example of what I am working with.
Example string:
<p>text about something important</p><p><a href="@X@EmbeddedFile.requestUrlStub@X@bbcswebdav/xid-1234_1" target="_blank">file.pdf</a> </p><p><a href="@X@EmbeddedFile.requestUrlStub@X@bbcswebdav/xid-1235_1" target="_blank">anotherfile.pptx</a> </p><p><a href="@X@EmbeddedFile.requestUrlStub@X@bbcswebdav/xid-1236_1" target="_blank">yetanotherfile.pdf</a> </p>
Notes about the input:
The string will always contain at least one <a> tag, and there is no maximum to how many it can contain.
The href will always contain xid- followed by digits (xid-[[:digit:]]).
The attributes in the tag can vary.
With that string, I want to extract the three <a ...>...</a> blocks using
REGEXP_SUBSTR(<string>, '<pattern>', <start>, <occurrence>), adjusting the occurrence value to get each of the three instances.
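As a point of reference for how the occurrence argument behaves, here is a small self-contained query on a made-up string ('a1 b2 c3', not taken from the data above). Oracle searches for each later occurrence starting at the first character after the previous match, and asking for an occurrence beyond the number of matches returns NULL rather than raising an error.

```sql
-- Toy input for illustration only: three letter+digit pairs.
SELECT REGEXP_SUBSTR('a1 b2 c3', '[[:alpha:]][[:digit:]]', 1, 1) AS occ1,  -- 'a1'
       REGEXP_SUBSTR('a1 b2 c3', '[[:alpha:]][[:digit:]]', 1, 2) AS occ2,  -- 'b2'
       REGEXP_SUBSTR('a1 b2 c3', '[[:alpha:]][[:digit:]]', 1, 4) AS occ4   -- NULL: only 3 matches exist
FROM dual;
```

So if occurrence 2 comes back NULL, the pattern matched only once in the whole string.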
What I have so far is:
SELECT REGEXP_SUBSTR(main_data, '<a[[:print:]]+href="[[:print:]]+xid-1234_1"[[:print:]]+>[[:print:]]+</a>', 1, 1)
FROM my_table -- my_table stands in for the real table name
The result I get from that is:
<a href="@X@EmbeddedFile.requestUrlStub@X@bbcswebdav/xid-1234_1" target="_blank">file.pdf</a> </p><p><a href="@X@EmbeddedFile.requestUrlStub@X@bbcswebdav/xid-1235_1" target="_blank">anotherfile.pptx</a> </p><p><a href="@X@EmbeddedFile.requestUrlStub@X@bbcswebdav/xid-1236_1" target="_blank">yetanotherfile.pdf</a>
So it starts at the first <a and grabs everything up to the last </a>, when I need it to stop at the first </a>. Then, when I increment the occurrence to 2, it should return the second <a>...</a> block; however, with the occurrence set to 2, nothing is returned.
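A sketch of one way to get that behaviour, assuming href values are always double-quoted and the link text never contains a nested tag: replace the [[:print:]]+ runs, which are greedy and happily cross > and </a>, with negated bracket expressions that cannot leave a single tag, then step the occurrence with CONNECT BY LEVEL up to REGEXP_COUNT (available from 11g). The literal string stands in for main_data, and the names src and pat are illustrative. Oracle 10gR2 and later also accept non-greedy quantifiers such as +?, but negated classes keep each match bounded without relying on them.

```sql
-- Sketch only: the literal is the example string from this question, standing in
-- for main_data; src and pat are illustrative names.
-- Assumes double-quoted href values and no "<" inside the link text.
WITH src AS (
  SELECT '<p>text about something important</p><p><a href="@X@EmbeddedFile.requestUrlStub@X@bbcswebdav/xid-1234_1" target="_blank">file.pdf</a> </p><p><a href="@X@EmbeddedFile.requestUrlStub@X@bbcswebdav/xid-1235_1" target="_blank">anotherfile.pptx</a> </p><p><a href="@X@EmbeddedFile.requestUrlStub@X@bbcswebdav/xid-1236_1" target="_blank">yetanotherfile.pdf</a> </p>' AS main_data,
         -- [^>]* stays inside the opening tag, [^"]* stays inside the href value,
         -- and [^<]* stops at the first "<", which is the closing </a>.
         '<a[[:space:]][^>]*href="[^"]*xid-[[:digit:]][^"]*"[^>]*>[^<]*</a>' AS pat
  FROM dual
)
SELECT LEVEL AS occurrence,
       REGEXP_SUBSTR(main_data, pat, 1, LEVEL) AS link_tag
FROM src
-- one row per match: LEVEL runs from 1 up to the number of matches
CONNECT BY LEVEL <= REGEXP_COUNT(main_data, pat);
```

On the example string this should return three rows, one per <a>...</a> block. If main_data comes from a table with several rows, CONNECT BY LEVEL on its own multiplies rows across them; a common workaround is adding AND PRIOR <primary key> = <primary key> AND PRIOR SYS_GUID() IS NOT NULL to the CONNECT BY clause.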
Any help would be appreciated. Thank you.