Before you tell me not to use regex to parse HTML: I'm aware of this, but my company uses Iconico Data Extractor to extract data from its website. It allows you to create custom scripts, but they have to be regular expressions in JavaScript, so I am stuck with using regex to achieve my goal.
What I need is to take the following example HTML and extract each line:
<b>Item 1</b> Text <br>
<b>Item 2</b> Text <br>
<b>Item 3</b> Text <br>
<p><font color="#000000" face="Arial, Helvetica, sans-serif"><b>Item 4:</b></font></p>
<p><font color="#000000" face="Arial, Helvetica, sans-serif">Detailed Description</font></p>
I need an expression for each item that retrieves the whole line, complete with tags, exactly as it appears in the HTML. I have tried /<b>*details(.|\s)*?\/a>/gi
which gets me Item 4. But I cannot work out how to get items 1 - 3, where all I require is the line from <b> to <br>. This attempt:
/<b>*Item 1(.|\s)*?\br>/gi
simply does not work, and after hours of playing around with it I'm no further forward. I also need to get rid of the font tags, if that's possible. I think it's complicated by the fact that there is a closing </b> in the middle.
Can anyone offer some advice on how to set up the expression? I already know that the general consensus is no to regex, so no need to go down that route again :)
This is all quite new to me, so I hope I've explained what I'm trying to do.
Thanks in advance
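
To make the goal concrete, here is a minimal sketch in plain JavaScript of the kind of result described above. It is only an illustration under assumptions: the sample lines are joined with newlines, the helper name lineFor is hypothetical, and whether Iconico Data Extractor accepts a pattern in this form is not confirmed. Two details of the failing attempt are worth noting: in a JavaScript regex \b is a word boundary rather than a literal "b", and <b>* means "<b" followed by zero or more ">" characters.

```javascript
// Sample HTML from the question, joined with newlines (an assumption about the real page).
const html = [
  '<b>Item 1</b> Text <br>',
  '<b>Item 2</b> Text <br>',
  '<b>Item 3</b> Text <br>',
  '<p><font color="#000000" face="Arial, Helvetica, sans-serif"><b>Item 4:</b></font></p>',
  '<p><font color="#000000" face="Arial, Helvetica, sans-serif">Detailed Description</font></p>'
].join('\n');

// Hypothetical helper: builds a pattern that starts at <b> plus the label
// and stops at the first <br> that follows.
// [\s\S]*? lazily matches any character, including newlines.
function lineFor(label) {
  // Escape regex metacharacters so the label is matched literally.
  const escaped = label.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return new RegExp('<b>' + escaped + '[\\s\\S]*?<br>', 'i');
}

// The whole line for Item 1, tags included.
const item1 = html.match(lineFor('Item 1'))[0];

// Remove opening and closing font tags, keeping everything between them.
const withoutFont = html.replace(/<\/?font[^>]*>/gi, '');

console.log(item1);
console.log(withoutFont);
```

The literal equivalent for a single item would be /<b>Item 1[\s\S]*?<br>/i; the closing </b> in the middle is simply consumed by the lazy [\s\S]*? part.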