This was my first attempt, but it didn't work:
$this->crawler = $client->request('GET', $this->url);
$document = new \DOMDocument('1.0', 'UTF-8');
$root = $document->appendChild($document->createElement('_root'));
$this->crawler->rewind();
$root->appendChild($document->importNode($this->crawler->current(), true));
$selectorsToRemove = ['script', 'p'];
foreach ($selectorsToRemove as $selector) {
    $crawlerInverse = $this->crawler->filter($selector);
    foreach ($crawlerInverse as $elementToRemove) {
        $parent = $elementToRemove->parentNode;
        $parent->removeChild($elementToRemove);
    }
}
$this->crawler->clear();
$this->crawler->add($document);
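A likely reason the attempt above has no visible effect is that importNode() makes a deep copy. The elements later found through $this->crawler->filter() therefore do not belong to $document, so removing them leaves the copy that gets added back untouched. Depending on the Symfony version, filter() may itself work on another copy. Also, because 'p' is in $selectorsToRemove, the paragraphs you want would be deleted too. Below is a minimal sketch of the same "remove scripts, then read text" idea using plain PHP DOM instead of DomCrawler. It assumes the page HTML is already in a string, and paragraphTextsWithoutScripts is a hypothetical helper name.

```php
<?php
// Hypothetical helper (sketch). Assumes the page HTML is already a string.
// Strip every <script>, then read <p> text from the SAME DOMDocument,
// so the removals are visible when the text is read.
function paragraphTextsWithoutScripts(string $html): array
{
    $dom = new DOMDocument('1.0', 'UTF-8');
    libxml_use_internal_errors(true); // real-world HTML triggers parser warnings
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);

    // DOMXPath::query() returns a static node list, not a live one,
    // so removing nodes while looping over it is safe.
    foreach ($xpath->query('//script') as $script) {
        $script->parentNode->removeChild($script);
    }

    // Every <p> in the cleaned tree; textContent now excludes script code.
    $texts = [];
    foreach ($xpath->query('//p') as $p) {
        $texts[] = trim($p->textContent);
    }
    return $texts;
}
```

Usage: print_r(paragraphTextsWithoutScripts($html)); Note that DOMDocument::loadHTML() treats input without a charset declaration as ISO-8859-1, so for UTF-8 input you may need to make sure a charset <meta> tag is present.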
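I want to get the "p" tags from this page: http://www.amazon.com/dp/B00IOY8XWQ/ref=fs_kv. The paragraph seems to contain some JS, so when I call $node->text(); I get both the text in the "p" and the JS inside the "script". The structure looks like this: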
<p>
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat.
<script>
"JS CODE"
</script>
</p>
So I only want the Lorem ipsum text.
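If you would rather leave the tree untouched, another option is to select only the text nodes under each <p> that have no <script> ancestor. This sketch uses plain PHP DOM rather than DomCrawler, and paragraphTextsSkippingScripts is a hypothetical helper name. Unlike keeping only the direct text() children of <p>, this approach also picks up text inside inline tags such as <b> or <a>. Iterating a Crawler yields DOMElement objects, so the same relative XPath could presumably be run on them via new DOMXPath($node->ownerDocument). That is an assumption, not something confirmed here.

```php
<?php
// Hypothetical helper (sketch): read each <p>'s text while skipping anything
// inside a <script>, without modifying the document.
function paragraphTextsSkippingScripts(string $html): array
{
    $dom = new DOMDocument('1.0', 'UTF-8');
    libxml_use_internal_errors(true); // ignore warnings from messy real-world HTML
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $texts = [];
    foreach ($xpath->query('//p') as $p) {
        $parts = [];
        // All text nodes below this <p> (context node), except those under a <script>.
        foreach ($xpath->query('.//text()[not(ancestor::script)]', $p) as $textNode) {
            $parts[] = $textNode->nodeValue;
        }
        $texts[] = trim(implode('', $parts));
    }
    return $texts;
}
```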