
I have this HTML file, downloaded from Projekktor:

<!DOCTYPE HTML>
<html>
<head>
<title>Projekktor Version 8 Test</title>
<link rel="stylesheet" href="theme/style.css" type="text/css" media="screen" />

<script type="text/javascript" src="projekktor/jquery.min.js"></script> <!-- Load jquery -->
<script type="text/javascript" src="projekktor/projekktor.min.js"></script> <!-- load projekktor -->
</head>
<body>

<video id="player_a" class="projekktor" poster="intro.png" title="this is projekktor" width="640" height="360" controls>

         <source src="" />

</video>

<script type="text/javascript">
$(document).ready(function() {
    projekktor('#player_a', {
    volume: 0.8,
    playerFlashMP4: 'http://www.localhost:8000/StrobeMediaPlayback.swf',
    playerFlashMP3: 'http://www.localhost:8000/StrobeMediaPlayback.swf'
    });
});
</script> 


</body>
</html>

I then get the YouTube video's URL through an API call (I have credentials) and use the following code to replace the empty src='' with the result:

import lxml.html as LH

link = youtube_call(id)

def parse_html(link):

    filename = 'projekktor.html'
    f = LH.parse(filename)

    for el in f.iter('video'):
        el.attrib['src'] = link
        # have also tried
        # el.attrib['src'] = link.replace('amp;', '')

    new_html = LH.tostring(f, pretty_print=True)
    print (new_html)

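But when I print the result, an unwanted amp; has been added after every & in src=, and access to the link is denied. (For readability, I have broken the link across several lines here.)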

https://r3---sn-oxunxg8pjvn-bpbs.googlevideo.com/videoplayback?expire=1485418386&
amp;mv=m&
amp;mt=1485396620&
amp;ms=au&
amp;clen=13475559&
amp;mn=sn-oxunxg8pjvn-bpbs&
amp;mm=31&
amp;ipbits=0&
amp;requiressl=yes&
amp;itag=18&amp;id=o-AG-dux-Jvtia_DsWZcyRfNpbMlzulsNn6I3SXyi0SI1B&
amp;lmt=1458188966300704&
amp;signature=BDC946187F74386CE00C5452CD703F9B13E4E30F.766549AB6A7C1811899CCC04742353B5BD0153D7&amp;dur=266.448&amp;key=yt6&
amp;ip=177.142.138.140&
amp;sparams=clen%2Cdur%2Cei%2Cgir%2Cid%2Cinitcwndbps%2Cip%2Cipbits%2Citag%2Clmt%2Cmime%2Cmm%2Cmn%2Cms%2Cmv%2Cpl%2Cratebypass%2Crequiressl%2Csource%2Cupn%2Cexpire&
amp;ei=MluJWO_aEIr_-AXHx6GwDA&
amp;mime=video%2Fmp4&
amp;upn=aFGwEwwIS1o&amp;pl=20&amp;source=youtube&
amp;ratebypass=yes&amp;initcwndbps=1178750&
amp;gir=yes

With every amp; removed, the link is valid, but link.replace('amp;', '') did not work.

Is there a workaround?
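
For reference, here is a minimal sketch of why the replace() call probably has no effect. It assumes the amp; is introduced at serialization time, when the serializer escapes each & in an attribute value as &amp;, and not stored in the link string itself. The sketch uses the standard library's xml.etree.ElementTree as a stand-in for lxml, and the shortened example.com link is hypothetical.

```python
import html
import xml.etree.ElementTree as ET

# Hypothetical, shortened link; the real one carries many more parameters.
link = "https://example.com/videoplayback?expire=1485418386&mv=m&itag=18"

# Build a <source> element and assign the raw link to its src attribute.
source = ET.Element("source")
source.set("src", link)

# Serializing escapes every '&' in the attribute value as '&amp;'.
serialized = ET.tostring(source, encoding="unicode")
print(serialized)

# The attribute held in memory is still the raw link, so calling
# link.replace('amp;', '') before assignment finds nothing to replace.
print(source.get("src") == link)

# Parsing the serialized markup decodes '&amp;' back to '&'.
roundtrip = ET.fromstring(serialized).get("src")
print(roundtrip == link)

# If the URL is copied out of the printed markup as plain text,
# html.unescape() recovers the original link.
escaped_src = html.escape(link)
print(html.unescape(escaped_src) == link)
```

If this assumption holds, the &amp; in the printed markup is the normal escaped form of & inside an attribute, and an HTML parser decodes it again when the page is loaded. The denied access would then come from pasting the escaped text directly as a URL, or from some other cause that this sketch does not cover.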
