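I'm forcing myself to learn how to script purely in AppleScript, but I've run into a problem trying to remove a specific tag by its class. I've looked for reliable documentation and examples, but there seems to be very little available.
Here is the HTML I have: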
```html
<p>Bacon ipsum dolor amet pork chop landjaeger short ribs boudin short loin jowl <span class="foo">shoulder</span> biltong shankle capicola drumstick pork loin rump spare ribs ham hock. <span class="bar">Pig brisket</span> jowl ham pastrami <span class="foo">jerky</span> strip steak bacon doner. Short loin leberkas jowl, filet mignon turducken chicken ribeye shank tail swine strip steak pork loin sausage. Frankfurter ground round porchetta, pork short ribs jowl alcatra flank sausage.</p>
```
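What I want to do is remove one specific class, i.e. strip the `<span class="foo">` tags (and their closing `</span>`) while keeping the text inside them, resulting in: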
```html
<p>Bacon ipsum dolor amet pork chop landjaeger short ribs boudin short loin jowl shoulder biltong shankle capicola drumstick pork loin rump spare ribs ham hock. <span class="bar">Pig brisket</span> jowl ham pastrami jerky strip steak bacon doner. Short loin leberkas jowl, filet mignon turducken chicken ribeye shank tail swine strip steak pork loin sausage. Frankfurter ground round porchetta, pork short ribs jowl alcatra flank sausage.</p>
```
I know how to do this from the Terminal via `do shell script`, but I'd like to understand what's available through AppleScript's own dictionary.
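One built-in AppleScript feature that fits this task is `text item delimiters`. The sketch below splits the HTML at every `<span class="foo">` and then drops only the *first* `</span>` that follows each split point, so the match cannot run on to a later closing tag. It assumes the target spans contain no nested `<span>` elements and that the opening tag is written exactly as `<span class="foo">` (double quotes, no other attributes). The handler name `unwrapSpanWithClass` is hypothetical.

```applescript
-- Hypothetical handler: strips each <span class="theClass"> opening tag and the first
-- </span> after it, keeping the wrapped text. Assumes target spans contain no nested <span>.
on unwrapSpanWithClass(theHTML, theClass)
	if theHTML is "" then return ""
	set openTag to "<span class=\"" & theClass & "\">"
	set closeTag to "</span>"
	set savedDelimiters to AppleScript's text item delimiters
	-- Split at every target opening tag; item 1 is the text before the first one
	set AppleScript's text item delimiters to openTag
	set theChunks to text items of theHTML
	set theResult to item 1 of theChunks
	-- Every later chunk starts inside a target span
	set AppleScript's text item delimiters to closeTag
	repeat with i from 2 to count of theChunks
		set closeParts to text items of (item i of theChunks)
		if (count of closeParts) > 1 then
			-- Drop only the first </span>; "as text" rejoins the rest with </span>,
			-- because closeTag is the current delimiter
			set theResult to theResult & (item 1 of closeParts) & ((items 2 thru -1 of closeParts) as text)
		else
			-- No closing tag in this chunk: keep it unchanged
			set theResult to theResult & (item i of theChunks)
		end if
	end repeat
	set AppleScript's text item delimiters to savedDelimiters
	return theResult
end unwrapSpanWithClass
```

Saving and restoring the delimiters matters because `text item delimiters` is global state shared by the rest of the script.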
While researching, I found a way to strip out all HTML tags:
```applescript
-- Removes every HTML tag from theText by skipping all characters between "<" and ">"
on removeMarkupFromText(theText)
	set tagDetected to false
	set theCleanText to ""
	repeat with a from 1 to length of theText
		set theCurrentCharacter to character a of theText
		if theCurrentCharacter is "<" then
			set tagDetected to true -- entering a tag: stop copying
		else if theCurrentCharacter is ">" then
			set tagDetected to false -- leaving a tag: resume copying
		else if tagDetected is false then
			-- outside any tag: keep the character
			set theCleanText to theCleanText & theCurrentCharacter
		end if
	end repeat
	return theCleanText
end removeMarkupFromText
```
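But this removes every HTML tag, which isn't what I want. Searching SO, I found how to extract the text between tags when parsing HTML source with AppleScript, but I don't want to parse a file.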
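The character-scanning handler above can be extended into a non-greedy, nesting-aware version. Instead of discarding every tag, it buffers each tag and keeps a stack with one entry per open `<span>`, recording whether that span is a target. Only a target opening tag, and the `</span>` that pops it off the stack, are dropped, so each closing tag is paired with the nearest unclosed `<span>` rather than the last one in the text. This is a sketch assuming well-formed HTML with no `<` or `>` inside attribute values, comments, or scripts, and an opening tag written exactly as `<span class="foo">`; the handler name `unwrapSpansByClass` is hypothetical.

```applescript
-- Hypothetical handler: removes <span class="theClass"> tags and their matching </span>,
-- keeping the inner text and every other tag. A stack pairs each </span> with the
-- nearest unclosed <span>. Assumes well-formed HTML with no "<" or ">" in attributes.
on unwrapSpansByClass(theText, theClass)
	set targetTag to "<span class=\"" & theClass & "\">"
	set theCleanText to ""
	set tagBuffer to ""
	set insideTag to false
	set spanStack to {} -- one boolean per open <span>: true if it is a target span
	repeat with i from 1 to length of theText
		set theCurrentCharacter to character i of theText
		if theCurrentCharacter is "<" then
			-- start buffering a new tag
			set insideTag to true
			set tagBuffer to "<"
		else if insideTag then
			set tagBuffer to tagBuffer & theCurrentCharacter
			if theCurrentCharacter is ">" then
				-- the tag is complete: decide whether to keep it
				set insideTag to false
				set keepTag to true
				if tagBuffer starts with "<span" then
					set isTarget to (tagBuffer is targetTag)
					set end of spanStack to isTarget
					set keepTag to not isTarget
				else if tagBuffer is "</span>" and spanStack is not {} then
					-- this </span> closes the most recently opened span
					set keepTag to not (item -1 of spanStack)
					if (count of spanStack) is 1 then
						set spanStack to {}
					else
						set spanStack to items 1 thru -2 of spanStack
					end if
				end if
				if keepTag then set theCleanText to theCleanText & tagBuffer
			end if
		else
			-- ordinary text outside any tag
			set theCleanText to theCleanText & theCurrentCharacter
		end if
	end repeat
	return theCleanText
end unwrapSpansByClass
```

Like the original handler, this builds the result one character at a time, which is simple but can be slow on large documents.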
I'm familiar with BBEdit's `Balance Tags` command in the `Balance` drop-down, but when I run:
```applescript
tell application "BBEdit"
	activate
	find "<span class=\"foo\">" searching in text 1 of text document "test.html" options {search mode:grep, wrap around:true} with selecting match
	balance tags
end tell
```
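it gets greedy and grabs the whole line, from the first opening tag to the second-to-last closing tag with all the text in between, instead of isolating just the text of the first tag.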
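Digging further into the dictionary, under `tag` I came across `find tag`, which lets me do `set spanTarget to (find tag "span" start_offset counter)`, then target the tag by its class with `|class| of attributes of tag of spanTarget`, and then use `balance tags`, but I still run into the same problem as before.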
So, in pure AppleScript, how can I remove the tags associated with a class without the match becoming greedy?