For a machine learning project, I take an HTML page as input and need to extract the style properties of all its DOM elements. Here is my initial code:
```python
import time

from selenium import webdriver
from selenium.webdriver.common.by import By

start = time.time()
driver = webdriver.PhantomJS()
driver.get('example page')
# Select only leaf nodes (elements with no element children)
elements = driver.find_elements(By.XPATH, "//*[not(child::*)]")
l = {}
css_properties = ("line-height", "text-align", "font-size", "font-style")
for i in elements:
    if i.text:
        # print(time.time() - start)
        if i.text not in l:
            l[i.text] = {}
            for el in css_properties:
                l[i.text][el] = str(i.value_of_css_property(el))
            l[i.text]["text_length"] = len(i.text)
```
The problem is that this code takes too long (~8 s) to extract my features. Can anyone suggest a faster approach?
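One possible direction, offered as a sketch rather than a measured fix: every `i.text` and `i.value_of_css_property(...)` call in the loop above is a separate WebDriver round trip, so each leaf element costs several requests. Moving the loop into the page with a single `driver.execute_script` call, which evaluates the same XPath, reads `window.getComputedStyle` for each leaf, and returns everything at once, reduces this to one round trip. The names `LEAF_STYLES_JS`, `build_features` and `collect_leaf_styles` are hypothetical. Using `innerText` only approximates Selenium's `WebElement.text`, so a few keys may differ. No timing improvement is claimed here.

```python
# Hypothetical helper: collect computed styles for all leaf nodes in ONE WebDriver call.
# The JS uses ES5 syntax (var, no arrow functions) so older engines such as PhantomJS can run it.
LEAF_STYLES_JS = """
var props = arguments[0];
var out = [];
// Same selection as the XPath used with find_elements: elements with no element children
var snap = document.evaluate("//*[not(child::*)]", document, null,
                             XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
for (var k = 0; k < snap.snapshotLength; k++) {
  var el = snap.snapshotItem(k);
  // innerText approximates Selenium's visible text (assumption, not an exact match)
  var text = (el.innerText || '').trim();
  if (!text) continue;
  var cs = window.getComputedStyle(el);
  var styles = {};
  for (var j = 0; j < props.length; j++) {
    styles[props[j]] = cs.getPropertyValue(props[j]);
  }
  out.push([text, styles]);
}
return out;
"""

CSS_PROPERTIES = ("line-height", "text-align", "font-size", "font-style")


def build_features(rows):
    """Turn [[text, {prop: value}], ...] into the same dict shape as the original loop."""
    features = {}
    for text, styles in rows:
        if text in features:  # keep the first occurrence, like `if i.text not in l`
            continue
        entry = {prop: str(value) for prop, value in styles.items()}
        entry["text_length"] = len(text)
        features[text] = entry
    return features


def collect_leaf_styles(driver, properties=CSS_PROPERTIES):
    """Run the whole extraction inside the browser; Selenium converts the JS
    array/object result into Python lists/dicts."""
    rows = driver.execute_script(LEAF_STYLES_JS, list(properties))
    return build_features(rows)
```

With the driver from the question, after `driver.get(...)`, the loop could be replaced by `l = collect_leaf_styles(driver)`. Whether this brings the ~8 s down enough depends on the page size and the browser, so it is worth timing both versions on your own input.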