Questions tagged [goutte]

298 questions

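0 votes

2 answers

8492 views

php - Problem scraping a list with Goutte and PHP to get the href

I am trying to scrape the following. I basically want the text and the link, and I am using Goutte and PHP. I can get the text fine with the following code, but I cannot get the href value. Any help would be great.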
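The asker's markup and code are not shown here, so the following is only a minimal sketch under assumed markup (a hypothetical list of links). It uses PHP's built-in DOM extension rather than Goutte to read both the link text and the href attribute. With Symfony's DomCrawler, which Goutte returns, calling attr('href') on a node plays the same role as getAttribute('href') does here.

```php
<?php
// Hypothetical markup standing in for the page being scraped.
$html = <<<HTML
<ul>
  <li><a href="/first">First item</a></li>
  <li><a href="/second">Second item</a></li>
</ul>
HTML;

$doc = new DOMDocument();
// Collect libxml warnings internally instead of printing them for imperfect HTML.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$links = [];
foreach ($xpath->query('//li/a') as $a) {
    // textContent holds the visible text; getAttribute() reads the href.
    $links[] = [
        'text' => trim($a->textContent),
        'href' => $a->getAttribute('href'),
    ];
}

print_r($links);
```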

2015-03-16T09:57:42.417

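0 votes

2 answers

1196 views

php - Using Goutte with Symfony2 in a controller

I am trying to scrape a page and I am not very familiar with PHP frameworks, so I have been trying to learn Symfony2. I have it up and running, and now I am trying to use Goutte. It is installed in the vendor folder, and I have a bundle for my scraping project.

The question is: is it good practice to scrape from within a Controller? If so, how? I have been searching and cannot figure out how to use Goutte from within a bundle, since it is buried deep in the file structure.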

php symfony web-scraping goutte

2015-03-16T21:01:46.997

0 votes

1 answer

4764 views

symfony - Goutte scrape: logging in to an https secured website

I am trying to log in to an https website with Goutte, but I get the following error:

cURL error 60: SSL certificate problem: unable to get local issuer certificate 500 Internal Server Error - RequestException 1 linked Exception: RingException

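This is the code the creator of Goutte says to use:

Or here is the code Symfony recommends:

The problem is that neither of them works; I get the error posted above. I can, however, log in using the code written in a past question I asked: cURL Scrape then Parse/Find Specific Content

I just want to log in with Symfony/Goutte so that scraping the data I need will be easier. Any help or suggestions, please? Thanks!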

symfony ssl curl web-scraping goutte

2015-03-17T05:53:55.410

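0 votes

1 answer

4136 views

php - PHP Goutte: selecting a button with no "value" field

This is my target site: http://www.rapid7.com/db/ and I want to run a search there, say for the string "Symphony", to check its vulnerabilities.

Inspecting the element of the input form, I see that its name is "q"; so far so good. But the button has no value, and the button I need to submit the query is: <span id="run_search" class="vbsearchBtn"></span>, which has no value field.

My code:

Does anyone know how to do this?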

php html goutte

2015-03-17T17:11:11.453

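0 votes

1 answer

2505 views

php - Is it possible to scrape JavaScript-based websites with Goutte/PHP?

I want to scrape several websites that are apparently rendered with JavaScript. Specifically, I want to target this site: http://cve.mitre.org/find/index.html

This is my code:

If I view the source, I cannot see the HTML, because the request is made by JavaScript. So, does anyone know how to scrape these sites?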

php html web-crawler goutte

2015-03-18T08:49:20.870

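0 votes

1 answer

985 views

symfony - Goutte scraper: parsing through the page object

This is a learning experience for me, but I am using Symfony and Goutte. I have been able to log in to a secure website and get a page back.

What I want to do now is parse the $crawler object. What confuses me is that Goutte does not seem to explain how to do this. I gather that many people have used Guzzle together with Goutte, but I cannot have both use Guzzle\Client; and use Goutte\Client;.

All I want to do is parse the $crawler object to find certain things in the HTML source. (Note: this particular page does not use ids or classes, so I cannot do filter('#stuff') or filter('.stuff').)

Can someone help explain how to parse the object I get back using Goutte?

(Edit: I want to specify that I was thinking of maybe just searching for a string or something. Can I convert the $crawler object into plain-text source code and then just do a preg_match or something?)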
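One way to do what the edit describes, sketched with PHP's built-in DOM extension instead of Goutte: serialize the parsed document back to a string and run preg_match on it. The markup, the "Account number" label, and the pattern are all hypothetical. With DomCrawler, the html() method returns the HTML content of the first matched node as a string, which could be searched the same way.

```php
<?php
// Hypothetical page source; the real page is not shown in the question.
$html = '<html><body><table><tr>'
      . '<td>Account number</td><td>12345</td>'
      . '</tr></table></body></html>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Serialize the parsed document back to a plain string.
$source = $doc->saveHTML();

// Search the string. \s* tolerates line breaks the serializer may add between tags.
$m = [];
if (preg_match('/Account number<\/td>\s*<td>(\d+)/', $source, $m)) {
    echo $m[1], PHP_EOL;
}
```

Regular expressions over HTML are brittle, so this is probably best kept for quick string checks; an XPath query is usually the more robust option when the page has some structure.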

symfony web-scraping web-crawler guzzle goutte

2015-03-18T20:13:31.603

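0 votes

1 answer

677 views

behat - What is the cause of this exception?

I am trying to run Behat\Mink with this command: "bin\behat --format html --out report.html --profile firefox". But I get this error.

composer.json looks like this

behat.yml

It would be very helpful if you could tell me where I went wrong.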

behat goutte

2015-03-23T12:22:53.670

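0 votes

1 answer

1684 views

php - DOMCrawler: finding a tag by its inner HTML text

I am trying to scrape a web page with Goutte, but I cannot find a way for DOMCrawler to search by the actual text. Suppose there is a td, but it has no class or ID. So I need to search for, let's say, "Title", and then get the td that is its next sibling.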
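A minimal sketch of the lookup described above, assuming a hypothetical two-column table: an XPath expression matches the td whose text is "Title" and then steps to its next td sibling. It runs on PHP's built-in DOM extension; the same expression could presumably be passed to DomCrawler's filterXPath() method.

```php
<?php
// Hypothetical table; the cells carry no class or id, as in the question.
$html = '<table>'
      . '<tr><td>Title</td><td>Example value</td></tr>'
      . '<tr><td>Other</td><td>Ignored</td></tr>'
      . '</table>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Match the cell by its text, then take the first following sibling cell.
$nodes = $xpath->query("//td[normalize-space(.)='Title']/following-sibling::td[1]");
$value = $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;

var_dump($value);
```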

php symfony web-scraping goutte domcrawler

2015-03-23T23:01:55.057

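0 votes

2 answers

1667 views

php - DOM: getting the value attribute from given text in an option tag

I am trying to get a value from a given text via a CSS selector or an XPath expression, but I do not know whether this is possible. Here is my HTML:

Suppose I want to get the value 3511 by giving the text.

The reason I want this is that I want to do web crawling like this:

And I do not want to pass the number 3511 as a parameter, but the text instead.

I hope I have made myself clear. Thank you in advance.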
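The asker's HTML is not shown, so this sketch assumes a hypothetical select element in which the option with value 3511 carries the label "Second label". An XPath string() expression selects the option by its text and returns its value attribute. It uses PHP's built-in DOMXPath; the option labels and the element name are assumptions.

```php
<?php
// Hypothetical markup; only the value 3511 comes from the question.
$html = '<select name="item">'
      . '<option value="3510">First label</option>'
      . '<option value="3511">Second label</option>'
      . '</select>';

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// Text to look up. This simple interpolation assumes it contains no double quotes.
$label = 'Second label';
$expr  = sprintf('string(//option[normalize-space(.)="%s"]/@value)', $label);

// evaluate() returns a PHP string for a string() expression ('' when nothing matches).
$value = $xpath->evaluate($expr);

echo $value, PHP_EOL;
```

The resulting value could then be passed on as the form parameter in place of a hard-coded number.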

php dom xpath css-selectors goutte

2015-03-24T10:37:53.920

0 votes

2 answers

474 views

php - DOMCrawler not dumping data properly for parsing

I'm using Symfony, Goutte, and DOMCrawler to scrape a page. Unfortunately, this page has many old-fashioned tables of data and no IDs, classes, or other identifying attributes. So I'm trying to find a table by parsing through the source code I get back from the request, but I can't seem to access any information.

I think when I try to filter it, it only filters the first node, and that's not where my desired data is, so it returns nothing.

So I have a $crawler object, and I've tried to loop through the following to get what I want:

I'm not sure what Crawler $node is; I just got it from the example on the web page. Perhaps if I can get this working, it will loop through each node in the $crawler object and find what I'm actually looking for.

Here's an example of the page:

And this is just one table, there are many tables and a huge sloppy mess outside of this one. Any ideas?

(Note: earlier I was able to apply a filter to the $crawler object for some information I needed, then I called serialize() on the result and finally had a string, which made sense. But I cannot get a string at all anymore, and I don't know why.)
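The asker's loop and page are not shown, so this is only a sketch of one way to walk every table on a page that has no IDs or classes, using PHP's built-in DOM extension and hypothetical markup. It collects each table as an array of rows, each row being an array of trimmed cell strings. In the DomCrawler documentation example, Crawler $node is presumably the type-hinted parameter of the closure passed to each(), which is called once per matched node.

```php
<?php
// Hypothetical page with two anonymous tables.
$html = <<<HTML
<table><tr><td>Name</td><td>Alice</td></tr></table>
<table>
  <tr><td>Name</td><td>Bob</td></tr>
  <tr><td>Age</td><td>30</td></tr>
</table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$xpath  = new DOMXPath($doc);
$tables = [];

foreach ($xpath->query('//table') as $table) {
    $rows = [];
    // The second argument makes the query relative to the current table.
    foreach ($xpath->query('.//tr', $table) as $tr) {
        $cells = [];
        foreach ($xpath->query('./td', $tr) as $td) {
            $cells[] = trim($td->textContent);
        }
        $rows[] = $cells;
    }
    $tables[] = $rows;
}

print_r($tables);
```

Once every table is a plain array, the wanted one can be picked out by inspecting cell text, for example the label in its first cell.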

php symfony web-scraping goutte domcrawler

2015-03-25T22:08:38.633
