I'm new to Python, and I've been trying to use BeautifulSoup to extract a specific line of data from a variable defined inside a script element.

Code:

import requests
from bs4 import BeautifulSoup
import esprima

#----------------some comment'

URL = 'https://downdetector.com/status/facebook/'

browser = {'user-agent': 'my agent'}


#--------------some comment:
page = requests.get(URL, headers=browser)
soup = BeautifulSoup(page.content, 'html.parser')


#---------------some comment:

chart = soup.find("div",{"class":"popover-container justify-content-center p-relative"}).script.get_text()
print(chart)

Output:

var data = {
status: 'success',  
baseline: 29,       
communicate: null,  
company: 'Facebook',
max: 66,
series: [

                      { x: '2020-05-30T13:22:28.168484-04:00', y: 25  },

                      { x: '2020-05-30T13:37:28.168484-04:00', y: 27  },

                      .....

                      { x: '2020-05-31T13:07:28.168484-04:00', y: 30  },

                  ]
                }

                $(function () {
                  chartThis(data, 'holder', 'line')
                });

                if (data.communicate && $('#dd-communicate').length) {
                  $('#dd-communicate').html('<div class="border text-left d-inline-block p-2"><i class="fa" aria-hidden="true" style="color: red; width:16px; height:12px; background:url(https://cdn2.downdetector.com/d328eb8cbe4e164/images/v2/message.svg) no-repeat"></i>'
                    +'<span class="d-inline-block px-1">'+ data.company+' &bull;  ' + moment.utc(data.communicate.created_at).fromNow()
                    + '</span><p class="font-weight-bold my-0">'+ data.communicate.message + '</p></div>')
                }

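Do you know a simple way to extract the "max" value from the var output above?

I tried using esprima, but still had no luck, because I get this error:

Traceback (most recent call last):
  File "c:/test.py", line 31, in <module>
    if token["type"] == "Identifier" and token["value"] == "max":
TypeError: 'BufferEntry' object is not subscriptable

My esprima code looks like this: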

import requests
from bs4 import BeautifulSoup
import esprima

#----------------some comment'

URL = 'https://downdetector.com/status/facebook/'

browser = {'user-agent': 'my agent'}


#--------------some comment:
page = requests.get(URL, headers=browser)
soup = BeautifulSoup(page.content, 'html.parser')


#---------------some comment:

chart = soup.find("div",{"class":"popover-container justify-content-center p-relative"}).script.get_text()

tokens = esprima.tokenize(chart)

token_iterator = iter(tokens)

for token in token_iterator:
    if token["type"] == "Identifier" and token["value"] == "max":
        value_token = next(next(token_iterator))
        result = value_token["value"]

Any help would be greatly appreciated!


1 Answer


A quick way to extract the max value is to call split on chart:

import requests
from bs4 import BeautifulSoup

URL = 'https://downdetector.com/status/facebook/'
browser = {'user-agent': 'my agent'}

page = requests.get(URL, headers=browser)
soup = BeautifulSoup(page.content, 'html.parser')


chart = soup.find("div",{"class":"popover-container justify-content-center p-relative"}).script.get_text()
# Take the text after "max: " and cut it at the next comma; the result is a string.
max_val = chart.split("max: ")[1].split(",")[0]

print(max_val)

OUT: 64
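
The split approach above depends on the exact text "max: " (one space after the colon) and raises IndexError if that text is missing. A slightly more tolerant variant, sketched here with the standard re module, allows any spacing around the colon and returns None when the key is absent. SAMPLE_SCRIPT and extract_max are hypothetical names for illustration; the sample is a shortened copy of the output shown in the question, so nothing here touches the live page.

```python
import re

# Shortened sample shaped like the script text printed in the question.
SAMPLE_SCRIPT = """
var data = {
status: 'success',
baseline: 29,
communicate: null,
company: 'Facebook',
max: 66,
series: [
    { x: '2020-05-30T13:22:28.168484-04:00', y: 25  },
]
}
"""

# The key "max" as a whole word, optional spaces around the colon, then digits.
MAX_PATTERN = re.compile(r"\bmax\s*:\s*(\d+)")


def extract_max(script_text):
    """Return the integer that follows "max:" in script_text, or None if absent."""
    match = MAX_PATTERN.search(script_text)
    return int(match.group(1)) if match else None


print(extract_max(SAMPLE_SCRIPT))  # 66
```

In the real script you would pass chart instead of SAMPLE_SCRIPT. The value changes with the live data, which is why the page can report a different number from one request to the next.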
于 2020-05-31T17:36:59.687 回答
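
On the esprima attempt in the question: the TypeError says the token objects cannot be indexed with token["type"], which suggests reading them as attributes instead (token.type, token.value). Separately, next(next(token_iterator)) calls next on a token rather than on the iterator, so it would fail even with attribute access. The sketch below assumes esprima tokens expose .type and .value attributes; value_after_identifier and max_from_script are hypothetical helper names. The demo uses stand-in objects so that it runs without esprima installed.

```python
from types import SimpleNamespace


def value_after_identifier(tokens, name):
    """Return the value of the token that follows `name` and a ':' punctuator."""
    tokens = list(tokens)
    for i, token in enumerate(tokens[:-2]):
        is_key = token.type == "Identifier" and token.value == name
        if is_key and tokens[i + 1].value == ":":
            return tokens[i + 2].value
    return None


def max_from_script(js_source):
    """Tokenize JavaScript with esprima (third-party) and look up `max`."""
    import esprima  # imported lazily so the helper above works without it
    return value_after_identifier(esprima.tokenize(js_source), "max")


# Stand-in tokens that mimic the assumed .type/.value attributes.
demo = [
    SimpleNamespace(type="Identifier", value="max"),
    SimpleNamespace(type="Punctuator", value=":"),
    SimpleNamespace(type="Numeric", value="66"),
    SimpleNamespace(type="Punctuator", value=","),
]
print(value_after_identifier(demo, "max"))  # 66
```

With esprima installed, max_from_script(chart) would be the call to try. The returned value is most likely the raw source text of the number, so wrap it in int() if you need to compare or do arithmetic.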