I am trying to get the total number of videos on a dynamically generated page. To do this I parse the page's HTML and search for all <object>, <iframe> and <embed> tags. The page won't contain any iframe content other than video embed codes, so I can be sure that any iframe tag is a video. The problem is that some embed codes, Hulu's for example, have the <embed> tag inside the <object> tag. So with my current regex:
'/(<iframe|<object|<embed)/i'
this Hulu embed code is counted as two videos instead of one:
<object id="videoplayer1" width="728" height="407">
<param name="movie" value='http://www.hulu.com/embed/7qXAa2z1zXKPMw4mBakrRw'></param>
<param name="allowFullScreen" value="true"></param>
<param name="allowScriptAccess" value="never"></param>
<embed src='http://www.hulu.com/embed/7qXAa2z1zXKPMw4mBakrRw' type="application/x-shockwave-flash" allowfullscreen="true" width="728" height="407" allowscriptaccess='never'></embed>
</object>
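To reproduce the double count, here is a minimal sketch of how I'm counting. The function name countVideoTags and the variable $hulu are made up for this example; the pattern is my current one, and the markup is the Hulu embed code above.

```php
<?php
// Hypothetical helper: counts how many times my current pattern matches in a chunk of HTML.
function countVideoTags(string $html): int
{
    // preg_match_all() returns the number of full pattern matches, or false on error.
    $count = preg_match_all('/(<iframe|<object|<embed)/i', $html);
    return $count === false ? 0 : $count;
}

// The Hulu embed code from above, stored verbatim in a nowdoc string.
$hulu = <<<'HTML'
<object id="videoplayer1" width="728" height="407">
<param name="movie" value='http://www.hulu.com/embed/7qXAa2z1zXKPMw4mBakrRw'></param>
<param name="allowFullScreen" value="true"></param>
<param name="allowScriptAccess" value="never"></param>
<embed src='http://www.hulu.com/embed/7qXAa2z1zXKPMw4mBakrRw' type="application/x-shockwave-flash" allowfullscreen="true" width="728" height="407" allowscriptaccess='never'></embed>
</object>
HTML;

// Prints 2: the opening <object> and the nested <embed> each match once.
echo countVideoTags($hulu), "\n";
```

The closing </object> and </embed> tags don't match because they start with "</", so the extra count comes only from the nested <embed>.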
Rather than searching for all embed tags, I only want to match the ones that aren't enclosed in <object> tags. So the Hulu one above will be skipped, but one like this will be counted:
<embed src="http://www.ebaumsworld.com/player.swf" allowScriptAccess="always" flashvars="id1=81748652" wmode="opaque" width="567" height="345" allowfullscreen="true" />
What would the regex pattern look like for this? I'm using PHP.