While learning more about web scraping, I am trying to directly access the video file on this page: http://video.disney.com/watch/disneychannel-no-service-4df889ed3d82b43c9c01a272

What is the direct link, and how did you derive it?

Update: after following the answer below, the URL turned out to be http://cdn.videos.dolimg.com/channel_shortform/unknown/i29083/869089-tpr_hi29083_gdj-h264m_aac_848x480_904x96.mp4

The browser inspector gave me the following "Query String Parameters":

app:w88_dolwa_prod02
trckTp:trackvideo
vendorLst:c,n,o
lSwid:AA50B128-8C31-4B59-A487-019721763B4A
pgVwId:cto-1372188161916-8691154154948
fullPgNm:dcom|dch|watch:disneychannel-no-service-4df889ed3d82b43c9c01a272|disneychannel-no-service-4df889ed3d82b43c9c01a272
arPgNm:na
plgId:7173e513b73b5ca23f3b93fbb4664f0be83c0df8
ua:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36
res:1440x900
cod:32
eventLst:e1,e73,e68
categoryCd:dcom
siteCd:dch
brdcrums:watch:disneychannel-no-service-4df889ed3d82b43c9c01a272
buId:4b90600fa5ba550f422a8ba2
propId:4bdfd35102a87496bbe03d00
buCd:dch
mstCd:mic
templateTp:watch
ASSETID:0_y1nplyhi
KSESSIONID:4ca7bebd-9389-893f-ed7a-de46a7d4928c
KSESSIONSEQ:1
KDPEVNT:percentReached
KDPDAT_VALUE:0
KDPDAT_PLAYHEAD:0
ASSETNAME:vid|dch|dmms|shr|0_y1nplyhi|No Service
AUTO:true
KDPPROTO:Flash
assetNm:vid|dch|dmms|shr|0_y1nplyhi|no service
adPgNm:/7046/dch/mickey-mouse/video
adSzLst:300x60,970x90,970x66,728x90
url:http://video.disney.com/watch/disneychannel-no-service-4df889ed3d82b43c9c01a272
urlDom:disney.com
urlFDom:video.disney.com
urlFDom1:video.disney.com/watch
refUrl:na
sessionData:no_dolWASessionData_cookie
visitorData:no_dolWAVisitorData_cookie
logStatus:lo|nr
prevPgNm:dcom|dch|watch:disneychannel-no-service-4df889ed3d82b43c9c01a272|disneychannel-no-service-4df889ed3d82b43c9c01a272
VIDLEN:210
GENTIME:1372188203486
GENTITLE:No Service | Mickey Mouse and Friends | Disney Video
GENURL:http://video.disney.com/watch/disneychannel-no-service-4df889ed3d82b43c9c01a272
DEVID:-1
USRAGNT:Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/27.0.1453.116 Safari/537.36
WIGID:1959
BITRATE:1000
KDPID:1959
mediaSumm:vid|dch|dmms|shr|0_y1nplyhi|No Service--**--210--**--dch--**--0--**--1372188203486--**--S0L0
accnt:disneyvideo2
brndSeg:

1 Answer

The direct link is in the page's last <script> tag, stored as JSON.

I don't work with Ruby, but here's how I would do it:

  • Get the HTML of your page.
  • Get the contents of the last <script> tag.
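  • The JSON sits between two pieces of JavaScript (the script is a single conditional expression with the JSON as its middle operand), so strip the code on either side: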
    • Remove this.Grill?Grill.burger= from the beginning.
    • Remove :(function(){var a=document.getElementsByTagName("html")[0];a.setAttribute("class",a.getAttribute("class")+" grill-error")})() from the end.
  • Parse the JSON and you've got all of the URLs.
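
The steps above can be sketched in Python using only the standard library. This is a rough illustration, not a tested scraper for the live site: it assumes the page still wraps the JSON in exactly the prefix and suffix quoted above, and the names ScriptCollector and extract_burger_json are made up for this example. The structure of the JSON itself is not described here, so the sketch simply returns the parsed object.

```python
import json
from html.parser import HTMLParser

# Wrapper text quoted in the answer; assumed to match the page exactly.
PREFIX = "this.Grill?Grill.burger="
SUFFIX_START = ":(function(){"


class ScriptCollector(HTMLParser):
    """Collects the text content of every <script> element, in page order."""

    def __init__(self):
        super().__init__()
        self.scripts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script text may arrive in several chunks, so accumulate it.
        if self._in_script:
            self.scripts[-1] += data


def extract_burger_json(html):
    """Return the parsed JSON embedded in the last <script> tag of html."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    if not collector.scripts:
        raise ValueError("no <script> tag found")

    script = collector.scripts[-1].strip()
    if not script.startswith(PREFIX):
        raise ValueError("last <script> does not start with the expected prefix")

    body = script[len(PREFIX):]
    # The trailing JavaScript starts at the last occurrence of the suffix.
    end = body.rfind(SUFFIX_START)
    if end == -1:
        raise ValueError("expected trailing JavaScript not found")

    return json.loads(body[:end])
```

To try it, fetch the page HTML first (for example with urllib.request.urlopen(url).read().decode("utf-8"), assuming the page is UTF-8) and pass the resulting string to extract_burger_json. The same three steps carry over directly to Ruby or any other language with an HTML parser and a JSON library.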
answered 2013-06-25T20:41:40.107