
AppleScript: Removing duplicates from a text file

I have a .txt file that I created, and I am trying to remove some duplicated lines of text from it.

The file is named today.txt, it sits on my Desktop, and it contains a list of URLs from the NYTimes Today's Paper. However, parsing the HTML file gave me several duplicate URLs, as shown below:

http://www.nytimes.com/2012/07/06/education/no-child-left-behind-whittled-down-under-obama.html
http://www.nytimes.com/2012/07/06/business/global/markets-look-to-europes-central-bank-for-action.html
http://www.nytimes.com/2012/07/06/business/global/markets-look-to-europes-central-bank-for-action.html
http://www.nytimes.com/2012/07/06/nyregion/3-children-killed-in-long-island-boating-accident.html
http://www.nytimes.com/2012/07/06/nyregion/3-children-killed-in-long-island-boating-accident.html
http://www.nytimes.com/2012/07/06/world/americas/earthquake-relief-where-haiti-wasnt-broken.html
http://www.nytimes.com/2012/07/06/world/americas/earthquake-relief-where-haiti-wasnt-broken.html
http://www.nytimes.com/2012/07/06/us/politics/journal-critique-of-romney-shows-murdoch-doubt-on-candidacy.html
http://www.nytimes.com/2012/07/06/us/politics/journal-critique-of-romney-shows-murdoch-doubt-on-candidacy.html
http://www.nytimes.com/2012/07/06/technology/at-hacker-hostels-living-on-the-cheap-and-dreaming-of-digital-glory.html
http://www.nytimes.com/2012/07/06/technology/at-hacker-hostels-living-on-the-cheap-and-dreaming-of-digital-glory.html

I have been trying to remove the duplicates with do shell script in AppleScript, but I cannot get it to work. Here is my code:

set delDups to do shell script "sort /Users/paolob/Desktop/today.txt | uniq -u"
return delDups

So my question is: how do I remove the duplicates in today.txt and then save the result back to the same today.txt file?

Any help would be greatly appreciated. Thank you in advance.

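EDIT
It would actually be more economical and faster if the shell script, or whatever duplicate remover you suggest, read the text directly in AppleScript Editor and then set the new text to a *new_text* variable.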


3 Answers


Try this...

-- Read today.txt from the Desktop and split it into lines
set filePath to (path to desktop as text) & "today.txt"
set theText to read file filePath
set textList to paragraphs of theText

-- Keep only the first occurrence of each line, preserving the original order
set uniqueList to {}
repeat with i from 1 to count of textList
    set thisParagraph to item i of textList
    if thisParagraph is not in uniqueList then set end of uniqueList to thisParagraph
end repeat

-- Join the unique lines with linefeed so the file keeps Unix line endings
set {tids, text item delimiters} to {text item delimiters, linefeed}
set uniqueText to uniqueList as text
set text item delimiters to tids

-- Truncate the file, then write the de-duplicated text back to it
set openFile to open for access file filePath with write permission
set eof of openFile to 0
write uniqueText to openFile starting at eof as text
close access openFile
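
The loop above keeps the first occurrence of each line in its original order. If that ordering matters and a shell route is preferred, one possible sketch (not part of the original answer) uses awk. The scratch file and the names `listfile` and `ordered` are hypothetical stand-ins for today.txt and its contents:

```shell
# Scratch file standing in for today.txt (hypothetical sample data).
listfile="$(mktemp)"
printf '%s\n' "z.html" "a.html" "a.html" "m.html" "z.html" > "$listfile"

# awk prints a line only the first time it is seen, so the original order survives.
# The result goes to a temporary file first, then replaces the input.
awk '!seen[$0]++' "$listfile" > "$listfile.tmp" && mv "$listfile.tmp" "$listfile"

ordered="$(cat "$listfile")"
printf '%s\n' "$ordered"
rm -f "$listfile"
```

The same awk command could be passed to do shell script, with the path to today.txt substituted for the scratch file.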
answered 2012-07-06T14:25:20.350

This can be done with a shell script alone. You don't really need any AppleScript, unless it is part of a larger program.

The following sorts the file, enforces uniqueness, and saves the result back to the same file.

sort -u -o /Users/paolob/Desktop/today.txt /Users/paolob/Desktop/today.txt

The command can be wrapped in AppleScript like this:

do shell script "sort -u -o /Users/paolob/Desktop/today.txt /Users/paolob/Desktop/today.txt"

If you want to do further processing afterwards, you can use the following instead:

set myText to do shell script "sort -u /Users/paolob/Desktop/today.txt"
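
To see why `sort -u` behaves differently from the `sort | uniq -u` pipeline in the question, here is a small sketch that runs both on a scratch file. `uniq -u` prints only the lines that are never repeated, so every duplicated URL disappears completely, whereas `sort -u` keeps one copy of each line. The sample data and the names `tmpfile`, `only_singletons` and `deduped` are assumptions made for illustration:

```shell
# Work on a scratch copy so the real today.txt is untouched.
tmpfile="$(mktemp)"
printf '%s\n' "b.html" "a.html" "a.html" "c.html" "c.html" > "$tmpfile"

# uniq -u keeps only lines that occur exactly once, so duplicated lines vanish entirely.
only_singletons="$(sort "$tmpfile" | uniq -u)"

# sort -u keeps one copy of every line; -o may name the input file as the output.
sort -u -o "$tmpfile" "$tmpfile"
deduped="$(cat "$tmpfile")"

printf 'uniq -u:\n%s\n\nsort -u:\n%s\n' "$only_singletons" "$deduped"
rm -f "$tmpfile"
```

Note that `sort -u` reorders the lines alphabetically, which may or may not matter for a list of URLs.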
answered 2012-07-06T14:32:00.227

This can be done in Ruby. Open "irb" from Terminal in the folder where your file is located, and run the following in the interactive shell:

file = File.new("test.txt", 'r') # opens "test.txt" (substitute your own file name)

array = [] # creates a new array

file.each_line { |k| array << k.to_s } # puts the lines of the file into the array

array.uniq! # makes the lines unique

File.open("outfile.txt", "w") { |file| s = String.new(); array.each { |k| s << k }; file.puts(s) } # creates a file (outfile.txt) and writes the unique lines to it

exit # closes irb

answered 2012-07-06T14:40:14.150