
I'm using Goutte 1.0.6 (the latest to use Guzzle 3) to build a web scraper. For testing I want to load an HTTP response and serve that instead of a real cURL response, and this mostly works fine. Presently the use case is unit testing, but I expect I'll want to use this for production caching too.

Interestingly, I've noticed that when I am disconnected from the network, my unit tests sometimes slow down. I did a bit of digging with Wireshark and found that my calls to http://example.com/something are generating DNS requests, even though they are unnecessary.

Here is the relevant snippet of the Guzzle plugin that supplies the fake response. The important parts are the captured `request.before_send` event and the fact that, if a cached item is found in a Propel table, the request has its response filled in at the end:

```php
use Guzzle\Common\Event;
use Guzzle\Http\Message\Response;

class SavedPageLoaderPlugin extends HttpPluginBase
{
    public static function getSubscribedEvents()
    {
        return array(
            'request.before_send' => 'onRequestBeforeSend',
        );
    }

    /**
     * Handles a Guzzle event before an HTTP operation is attempted
     *
     * @param \Guzzle\Common\Event $event
     * @throws \WebScraper\PauseException
     */
    public function onRequestBeforeSend(Event $event)
    {
        /** @var $request \Guzzle\Http\Message\Request */
        $request = $event['request'];

        // Decide if we are loading saved pages for this run
        if (!$this->isLoadEnabled())
        {
            return;
        }

        // Decide if we have a saved page for this URL
        $url = $request->getUrl();
        $httpPage = HttpPageQuery::create()->
            filterByUrl($url)->
            findOne();
        if (!$httpPage)
        {
            return;
        }

        // Set a notification message for all subscribers, then set response
        $this->setContainerMessage(self::MESSAGE_USES_SAVED_PAGE);
        $response = new Response(
            200,
            $this->convertHeadersIntoKeyedArray($httpPage->getHeaders()),
            $httpPage->getBody()
        );

        $request->setResponse($response);
    }

    // ... other methods (isLoadEnabled(), convertHeadersIntoKeyedArray(), etc.) omitted
}
```
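The body of `convertHeadersIntoKeyedArray()` isn't shown. Assuming the Propel column stores the headers as a raw, line-separated block (an assumption; the storage format isn't stated), a helper along these lines could build the keyed array passed as the `Response` constructor's headers argument. The name `parseRawHeaderBlock()` is hypothetical, and the way repeated headers are collected into a list should be checked against what your Guzzle version expects:

```php
<?php

/**
 * Hypothetical helper: turn a raw header block such as
 * "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n" into a keyed array.
 */
function parseRawHeaderBlock($raw)
{
    $headers = array();
    foreach (preg_split('/\r\n|\n/', $raw) as $line) {
        // Skip blank lines and the status line, which has no "Name: value" colon
        if ($line === '' || strpos($line, ':') === false) {
            continue;
        }
        list($name, $value) = explode(':', $line, 2);
        $name = trim($name);
        $value = trim($value);
        if (isset($headers[$name])) {
            // Repeated header (e.g. Set-Cookie): collect the values into a list
            $headers[$name] = array_merge((array) $headers[$name], array($value));
        } else {
            $headers[$name] = $value;
        }
    }
    return $headers;
}
```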

To recap, the code itself works. Wireshark shows no actual attempts to fetch data over port 80, and there is no failure exception (e.g. a 404) relating to example.com's inability to supply the documents I ask for. So my fake responses seem to be fine.

Is there a way to prevent Guzzle from making these pointless DNS calls? I did think about using the MockPlugin, but I wasn't sure how to do that at the time, and I'm still not sure whether it would fix this one remaining issue.

(I rather like doing the faking/mocking inside a plugin, so whilst I've no problem with using MockPlugin, I would want to do the interception inside it rather than outside, as the docs do. Perhaps I could extend it?)

It may be that I need to move to a later version of Guzzle, and if that is the only way, so be it. I'm on an old project where the latest Goutte at the time used Guzzle 3. I intend to upgrade, but would rather do that later if possible, as my current versions do everything I want.


Postscript: it occurs to me that the DNS call could conceivably come from Goutte rather than Guzzle. I'm not sure how to go about debugging that, partly because Goutte is fetched by Composer as a .phar file. Could a debugger such as Xdebug help here, to show what is making the network call and where?
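Short of stepping through with Xdebug, one low-tech way to find "what is making the DNS call" is to scan the source for PHP's own DNS functions. The sketch below uses the tokenizer extension's `token_get_all()` so that comments, strings and same-named methods are not reported; `findDnsLookupCalls()` and `significantNeighbour()` are hypothetical names. It only catches direct calls to these built-ins, not lookups that cURL performs internally when it opens a connection.

```php
<?php

/**
 * Return the nearest token before ($step = -1) or after ($step = 1) index $i,
 * skipping whitespace and comments; null if there is none.
 */
function significantNeighbour(array $tokens, $i, $step)
{
    $count = count($tokens);
    for ($j = $i + $step; $j >= 0 && $j < $count; $j += $step) {
        $t = $tokens[$j];
        if (is_array($t) && in_array($t[0], array(T_WHITESPACE, T_COMMENT, T_DOC_COMMENT), true)) {
            continue;
        }
        return $t;
    }
    return null;
}

/**
 * Find direct calls to PHP's built-in DNS functions in a PHP source string.
 * Returns a list of array('function' => name, 'line' => line number).
 */
function findDnsLookupCalls($source)
{
    $dnsFunctions = array(
        'gethostbyname', 'gethostbynamel', 'gethostbyaddr',
        'dns_get_record', 'checkdnsrr', 'dns_check_record', 'getmxrr', 'dns_get_mx',
    );
    // Preceding tokens meaning the name is a method or a declaration, not a global call
    $notACall = array(T_OBJECT_OPERATOR, T_DOUBLE_COLON, T_FUNCTION);
    if (defined('T_NULLSAFE_OBJECT_OPERATOR')) {
        $notACall[] = T_NULLSAFE_OBJECT_OPERATOR;
    }

    $tokens = token_get_all($source);
    $found = array();
    foreach ($tokens as $i => $token) {
        if (!is_array($token)) {
            continue;
        }
        // PHP 8 tokenises "\gethostbyname" as a single T_NAME_FULLY_QUALIFIED token
        $isName = $token[0] === T_STRING
            || (defined('T_NAME_FULLY_QUALIFIED') && $token[0] === T_NAME_FULLY_QUALIFIED);
        if (!$isName) {
            continue;
        }
        $name = strtolower(ltrim($token[1], '\\'));
        if (!in_array($name, $dnsFunctions, true)) {
            continue;
        }
        // Must be followed by "(" and not preceded by "->", "::" or "function"
        $next = significantNeighbour($tokens, $i, 1);
        $prev = significantNeighbour($tokens, $i, -1);
        if ($next !== '(' || (is_array($prev) && in_array($prev[0], $notACall, true))) {
            continue;
        }
        $found[] = array('function' => $name, 'line' => $token[2]);
    }
    return $found;
}
```

Since PHP's `phar://` stream wrapper can read files inside an archive, the same function could in principle be fed source files from the Goutte `.phar` as well as from your own project.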


1 Answer


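Aha: it's neither Guzzle nor Goutte. Elsewhere in my code, I intercept the `request.success` event for HTTP logging purposes, and in that handler I call `gethostbyname()`, whose whole purpose is to perform a DNS lookup.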

Now that this is disabled, the "mysterious" DNS calls have gone away.
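If the log still needs host information, one option is to record the hostname parsed from the URL and resolve it only on demand. The sketch below is a hypothetical helper (`buildRequestLogEntry()` is not part of Guzzle); in a Guzzle 3 `request.success` listener, the URL and status code would presumably come from the event's request and response objects.

```php
<?php

/**
 * Hypothetical helper for a request.success logging listener: records the host
 * parsed from the URL and only performs a DNS lookup when $resolveHost is true.
 */
function buildRequestLogEntry($url, $statusCode, $resolveHost = false)
{
    // parse_url() only splits the string; it does no network I/O
    $host = parse_url($url, PHP_URL_HOST);
    $entry = array(
        'url'    => $url,
        'host'   => $host,
        'status' => $statusCode,
    );
    if ($resolveHost && is_string($host) && $host !== '') {
        // gethostbyname() issues a real DNS query and returns the hostname
        // unchanged on failure, so keep the result only if it is a valid IP
        $ip = gethostbyname($host);
        $entry['ip'] = (filter_var($ip, FILTER_VALIDATE_IP) !== false) ? $ip : null;
    }
    return $entry;
}
```

Keeping resolution behind an explicit flag means offline test runs never touch DNS unless a caller asks for it.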

Answered 2015-01-25T14:24:49.607