这主要有两个问题。1. 性能 您的 UDF 需要在每个存储单元格中获取 HTTP 资源。2. HTML 不幸的是,OpenOffice 或 LibreOffice 中没有 HTML 解析器。只有一个 XML 解析器。这就是为什么我们不能直接用 UDF 解析 HTML。
这会起作用,但速度很慢而且不是很普遍:
Public Function FETCHHOUSE(sHouse as String) as String
sURL = "http://awoiaf.westeros.org/index.php/House_" & sHouse
oSimpleFileAccess = createUNOService ("com.sun.star.ucb.SimpleFileAccess")
oInpDataStream = createUNOService ("com.sun.star.io.TextInputStream")
on error goto falseHouseName
oInpDataStream.setInputStream(oSimpleFileAccess.openFileRead(sUrl))
on error goto 0
dim delimiters() as long
sContent = oInpDataStream.readString(delimiters(), false)
lStartPos = instr(1, sContent, "<table class=" & chr(34) & "infobox infobox-body" )
if lStartPos = 0 then
FETCHHOUSE = "no infobox on page"
exit function
end if
lEndPos = instr(lStartPos, sContent, "</table>")
sTable = mid(sContent, lStartPos, lEndPos-lStartPos + 8)
lStartPos = instr(1, sTable, "Words" )
if lStartPos = 0 then
FETCHHOUSE = "no Words on page"
exit function
end if
lEndPos = instr(lStartPos, sTable, "</tr>")
sRow = mid(sTable, lStartPos, lEndPos-lStartPos + 5)
oTextSearch = CreateUnoService("com.sun.star.util.TextSearch")
oOptions = CreateUnoStruct("com.sun.star.util.SearchOptions")
oOptions.algorithmType = com.sun.star.util.SearchAlgorithms.REGEXP
oOptions.searchString = "<td[^<]*>"
oTextSearch.setOptions(oOptions)
oFound = oTextSearch.searchForward(sRow, 0, Len(sRow))
If oFound.subRegExpressions = 0 then
FETCHHOUSE = "Words header but no Words content on page"
exit function
end if
lStartPos = oFound.endOffset(0) + 1
lEndPos = instr(lStartPos, sRow, "</td>")
sWords = mid(sRow, lStartPos, lEndPos-lStartPos)
FETCHHOUSE = sWords
exit function
falseHouseName:
FETCHHOUSE = "House name does not exist"
End Function
更好的方法是,如果您可以从 Wiki 提供的 Web API 中获取所需的信息。你知道维基背后的人吗?如果是这样,那么您可以将其放在那里作为建议。
问候
阿克塞尔