When you are already using an HTML parser (even a poor one like SimpleHtmlDom), why are you trying to mess up things with Regex then? That's like using a scalpel to open up the patient and then falling back to a sharpened spoon for the actual surgery.
Since I strongly believe no one should use SimpleHtmlDom because it has a poor codebase and is much slower than libxml based parsers, here is how to do it with PHP's native DOM extension and XPath. XPath is effectively the Regex or SQL for X(HT)ML documents. Learn it, so you will never ever have to touch Regex for HTML again.
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTMLFile('https://laperuanavegana.wordpress.com/2011/06/11/ensalada-tibia-de-quinua-mango-y-tomate/');
libxml_clear_errors();
$recipe = array();
$xpath = new DOMXPath($dom);
$contentDiv = $dom->getElementById('content');
$recipe['title'] = $xpath->evaluate('string(div/h2/a)', $contentDiv);
foreach ($xpath->query('div/div/ul/li', $contentDiv) as $listNode) {
$recipe['ingredients'][] = $listNode->nodeValue;
}
print_r($recipe);
This will output:
Array
(
[title] => Ensalada tibia de quinua, mango y tomate
[ingredients] => Array
(
[0] => 250gr de quinua cocida tibia
[1] => 1 mango grande
[2] => 2 tomates
[3] => Unas hojas de perejil
[4] => Sal
[5] => Aceite de oliva
[6] => Vinagre balsámico
)
)
Note that we are not parsing http://laperuanavegana.wordpress.com/
but the actual blog post. The main URL will change content whenever the blog owner adds a new post.
To get all the Recipes from the main page, you can use
$dom = new DOMDocument;
libxml_use_internal_errors(true);
$dom->loadHTMLFile('https://laperuanavegana.wordpress.com');
libxml_clear_errors();
$contentDiv = $dom->getElementById('content');
$xp = new DOMXPath($dom);
$recipes = array();
foreach ($xp->query('div/h2/a|div/div/ul/li', $contentDiv) as $node) {
echo
($node->nodeName === 'a') ? "\n# " : '- ',
$node->nodeValue,
PHP_EOL;
}
This will output
# Ensalada tibia de quinua, mango y tomate
- 250gr de quinua cocida tibia
- 1 mango grande
- 2 tomates
- Unas hojas de perejil
- Sal
- Aceite de oliva
- Vinagre balsámico
# Flan de lúcuma
- 1 lúcuma grandota o 3 pequeñas
- 1/2 litro de leche de soja evaporada
…
and so on
Also see