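I want to scrape a web page and extract the URL of the video embedded in it. I first used the browser's Inspect tool, where the embedded link is easy to see on the target <video> tag: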
<video class="jw-video jw-reset" tabindex="-1" disableremoteplayback="" webkit-playsinline="" playsinline="" preload="metadata" src="https://lh3.googleusercontent.com/YYxKbKt3Apa8A2LkHKBJ7Fx6GU_iCIjEeGyyPJm_Ll-9hO4K8fDZV1pAbYprwpRhS5yFanf7=m18?title=[CayPhim.Net]-Bay-Vien-Ngoc-Rong-Sieu-Cap-tap-6.[360p]"></video>
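Then I opened View Page Source and searched for the link, but it is not there. Instead, I found some JavaScript that appears to fetch the link dynamically and add it to the page when the page loads: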
<div id="switchserver" style="height:100%;">
<div id="phim_html5" style="height:100%;">
<div class="loading"></div>
</div>
<script>$(document).ready(function () {
$.ajax({
url: "http://player.cayphim.net/jwplayer7/index_googima.php",
type: "GET",
cache: false,
data: {
"url": "8ce46ffa35805780571877c8ae5808f6a5e8898ebf9d294326735716694ccb4279505da51df9678cc8601a390a422d5e639449ec90332ee518e06f1dd579606d106f292d49bb38d9b2e80d0ee965a5c0e2911922e48ac972c521c4236512d356681404472b2cb39d9fff915bb4da21c8315d3fd6fc6cb0d2ed27183598661d40",
"name": "QmF5IFZpZW4gTmdvYyBSb25nIFNpZXUgQ2FwIHRhcCA2",
"sub": ""
},
success: function (msg) {
$("#phim_html5").html(msg);
}, error: function () {
$("#phim_html5").html("<div class='player-error'>Server quá tải. Vui lòng chọn server khác bên dưới...</div>");
},
});
});
</script>
<img style="display: none" src="http://image.cayphim.net/1553256337-lSC0nSX6Wj9dlOXfK29gK2iwoKF9D0p4YwnxYgmyCwmyBfJ29eW1wKpPetD3BkBKF9D0p4MsZPPjEReTD0UVY0K4YvoGKF9D0p42wODHaQKFo5pGMVmG9XmN05lP80DksxCQaHXKF9D0p4MMJwwhnyohFxsEYKF9D0p4wRJH5xYk1eXEr9mETjpng" />
</div>
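For reference, the endpoint and parameters in the inline script above can be pulled out of the page source programmatically. The following is only a minimal sketch using the Python standard library. The function name extract_ajax_call is hypothetical, the long "url" token is shortened for readability, and the regular expressions assume the script keeps exactly this layout (quoted keys, no nested braces in the data object). As a side note, the "name" value is plain Base64 for the episode title.

```python
import base64
import json
import re

# Abbreviated copy of the inline script from the page source.
# The long "url" token is shortened here purely for readability.
PAGE_SOURCE = '''
<script>$(document).ready(function () {
$.ajax({
url: "http://player.cayphim.net/jwplayer7/index_googima.php",
type: "GET",
cache: false,
data: {
"url": "8ce46ffa35805780",
"name": "QmF5IFZpZW4gTmdvYyBSb25nIFNpZXUgQ2FwIHRhcCA2",
"sub": ""
},
success: function (msg) { $("#phim_html5").html(msg); }
});
});
</script>
'''


def extract_ajax_call(page_source):
    """Return (endpoint, params) for the first $.ajax call in the page source."""
    endpoint = re.search(
        r'\$\.ajax\(\{\s*url:\s*"([^"]+)"', page_source
    ).group(1)
    # The data object uses quoted keys and contains no nested braces,
    # so the literal can be parsed directly as JSON.
    data_literal = re.search(
        r'data:\s*(\{.*?\})', page_source, re.DOTALL
    ).group(1)
    return endpoint, json.loads(data_literal)


endpoint, params = extract_ajax_call(PAGE_SOURCE)
title = base64.b64decode(params["name"]).decode("utf-8")
print(endpoint)
print(title)
```

This only recovers what the browser would send; it does not by itself reveal the final video URL.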
Next, I used Advanced REST Client to send a GET request to http://player.cayphim.net/jwplayer7/index_googima.php with the parameters specified in the code, but the response I got contains something like this:
<div id="playerjw7">Trình duyệt của bạn không hỗ trợ xem phim bằng Player HTML5. Vui lòng cài đặt Chrome hoặc Firefox</div>
Yes, it is in Vietnamese; it means "Your browser does not support the HTML5 video player. Please install Chrome or Firefox."
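One difference between a REST client and the page's own AJAX call is the request context, so the sketch below builds a GET request that looks more like the browser's and scans returned markup for <video> or <source> tags. It is a sketch under stated assumptions, not a confirmed fix: whether the server inspects the User-Agent or Referer headers is unknown, the helper names (build_player_request, find_video_sources) are hypothetical, and the referer and user-agent values must be supplied by the caller. The only jQuery behaviour relied on is that cache: false appends a "_" timestamp parameter to GET requests.

```python
import time
import urllib.parse
import urllib.request
from html.parser import HTMLParser

PLAYER_ENDPOINT = "http://player.cayphim.net/jwplayer7/index_googima.php"


def build_player_request(params, referer, user_agent):
    """Build (but do not send) a GET request resembling the page's AJAX call."""
    query = dict(params)
    # jQuery's `cache: false` appends a `_` timestamp parameter to GET requests.
    query["_"] = str(int(time.time() * 1000))
    url = PLAYER_ENDPOINT + "?" + urllib.parse.urlencode(query)
    # Assumption: the server may look at these headers; this is unverified.
    headers = {"User-Agent": user_agent, "Referer": referer}
    return urllib.request.Request(url, headers=headers)


class VideoSrcParser(HTMLParser):
    """Collect the src attributes of <video> and <source> tags."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag in ("video", "source"):
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_video_sources(html):
    """Return every video URL found in static <video>/<source> markup."""
    parser = VideoSrcParser()
    parser.feed(html)
    return parser.sources


# Sending the request is left commented out so the sketch runs offline:
# with urllib.request.urlopen(build_player_request(params, referer, ua)) as resp:
#     print(find_video_sources(resp.read().decode("utf-8", errors="replace")))
```

If the response still contains only the fallback <div> and script code, the <video> tag is presumably created by that script at run time, and static parsing like this will not find it.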
How can I scrape the page and extract the embedded video URL programmatically?