I'm working on an AppleScript right now and I'm stuck. Take this snippet as an example of the HTML I'm dealing with:

<body><div>Apple don't behave accordingly <a href = "http://apple.com">apple</a></div></body>

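What I need is to get the words back without the HTML tags, either by deleting the angle brackets together with everything inside them, or by some other way of converting the HTML to plain text.

The result should be: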

Apple don't behave accordingly apple

3 Answers

Because of a problem I ran into, I thought I'd add another answer. If you don't want UTF-8 characters to get lost, you need:

set plain_text to do shell script "echo " & quoted form of ("<!DOCTYPE HTML PUBLIC><meta charset=\"UTF-8\">" & html_string) & space & "| textutil  -convert txt  -stdin -stdout"

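Basically, you need to prepend the <meta charset="UTF-8"> tag (with its quotes escaped as \" inside the AppleScript string) so that textutil treats the input as a UTF-8 document.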

Answered 2015-11-18T04:39:53.910

How about using textutil?

on run -- example (don't forget to escape quotes)
    removeMarkup from "<body><div>Apple don't behave accordingly <a href = \"http://apple.com\">apple</a></div></body>"
end run

to removeMarkup from someText -- strip HTML using textutil
    set someText to quoted form of ("<!DOCTYPE HTML PUBLIC>" & someText) -- fake a HTML document header
    return (do shell script "echo " & someText & " | /usr/bin/textutil -stdin -convert txt -stdout") -- strip HTML
end removeMarkup
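
If only the tags need to go, the "delete the brackets with everything inside" idea from the question can also be sketched with sed instead of textutil. This is a rough sketch, not a real HTML parser: strip_tags is a hypothetical helper name, and it assumes every tag sits on a single line and that no attribute value contains a ">". It does not decode entities such as &amp;.

```shell
#!/bin/sh
# strip_tags: hypothetical helper (not from the answers above).
# Reads HTML on stdin and deletes every "<...>" span on each line,
# leaving only the text between the tags.
strip_tags() {
    sed -e 's/<[^>]*>//g'
}

# The snippet from the question; '\'' embeds a single quote in the sh string.
html='<body><div>Apple don'\''t behave accordingly <a href = "http://apple.com">apple</a></div></body>'

printf '%s\n' "$html" | strip_tags
# prints: Apple don't behave accordingly apple
```

From AppleScript, the same sed expression could presumably be dropped into the do shell script pipeline above in place of the textutil call. Unlike textutil, it leaves entities and whitespace untouched.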
Answered 2011-08-25T03:08:26.360
on findStrings(these_strings, search_string)
    set the foundList to {}
    repeat with this_string in these_strings
        considering case
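            -- this_string is a reference into these_strings; store its contents, not the reference
            if the search_string contains this_string then set the end of the foundList to (contents of this_string)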
        end considering
    end repeat
    return the foundList
end findStrings

findStrings({"List","Of","Strings","To","find..."}, "...in String to search")
Answered 2011-08-25T01:47:09.753