emacs - How to delete duplicate lines in Emacs

Question

I have a text with many lines. How do I delete duplicate lines in Emacs? I'm looking for a command built into Emacs, or an elisp package, without external tools.

For example:

this is line a
this is line b
this is line a

Line 3 (identical to line 1) should be deleted, giving:

this is line a
this is line b

score 46 · Accepted Answer

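If you have Emacs 24.4 or newer, the cleanest way is to use the new delete-duplicate-lines command. Note that:

it works on a region, not on the whole buffer, so select the desired text first;
it keeps the relative order of the originals and deletes the duplicates.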

For example, if your input is

test
dup
dup
one
two
one
three
one
test
five

M-x delete-duplicate-lines turns it into

test
dup
one
two
three
five

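You can optionally make it search backward, keeping the last occurrence of each line instead of the first, by giving it a universal argument (C-u). The result will then be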

dup
two
three
one
test
five

Credit to emacsredux.com.
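Because delete-duplicate-lines acts on a region, a small wrapper can apply it to the whole buffer without selecting anything first. The sketch below is a hypothetical helper (the name my-delete-duplicate-lines-in-buffer is made up for illustration) and assumes Emacs 24.4 or newer. It passes only the BEG, END and REVERSE arguments, since the later optional arguments of delete-duplicate-lines have changed across Emacs versions.

```elisp
;; Hypothetical helper; assumes Emacs 24.4+ (where `delete-duplicate-lines' exists).
(defun my-delete-duplicate-lines-in-buffer (&optional reverse)
  "Delete duplicate lines in the accessible part of the current buffer.
Keep the first occurrence of each line; with a prefix argument REVERSE,
search backward so that the last occurrence is kept instead."
  (interactive "P")
  ;; Pass only BEG, END and REVERSE: the later optional arguments of
  ;; `delete-duplicate-lines' are not the same in every Emacs version.
  (delete-duplicate-lines (point-min) (point-max) reverse))
```

If the buffer is narrowed, only the visible part is processed, because point-min and point-max respect narrowing.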

Other roundabout options, which do not give exactly the same result, are available through Eshell:

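sort -u: does not keep the relative order of the originals;
uniq: even worse, it requires its input to be sorted, because it only removes adjacent duplicates.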
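The sort -u / sort | uniq behaviour can also be approximated without leaving Emacs, using two built-in commands: sort-lines, followed by delete-duplicate-lines with its ADJACENT argument set, so that (like uniq) each line is only compared with the one before it. This is a sketch assuming Emacs 24.4+; the command name my-sort-uniq-region is hypothetical, and like the shell tools it does not preserve the original order.

```elisp
;; Hypothetical command; assumes Emacs 24.4+ for `delete-duplicate-lines'.
(defun my-sort-uniq-region (beg end)
  "Sort the lines between BEG and END, then delete adjacent duplicates.
This mimics `sort | uniq' inside Emacs; the original order is lost."
  (interactive "r")
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      ;; Sort every line of the narrowed region in ascending order.
      (sort-lines nil (point-min) (point-max))
      ;; Duplicates are now adjacent, so the 4th argument ADJACENT = t
      ;; (compare each line only with the previous one) is enough.
      (delete-duplicate-lines (point-min) (point-max) nil t))))
```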

score 19 · Accepted Answer

Put this code in your .emacs:

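(defun uniq-lines (beg end)
  "Delete duplicate lines in region, keeping the first occurrence.
Called from a program, there are two arguments:
BEG and END (region to process)."
  (interactive "r")
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      (while (not (eobp))
        ;; Copy the current line, including its newline, without
        ;; touching the kill ring, and remember where the next line starts.
        (let* ((start (point))
               (line (progn (forward-line 1)
                            (buffer-substring-no-properties start (point))))
               (next (point)))
          ;; Delete every later line that is identical to this one.
          (while (re-search-forward
                  (format "^%s" (regexp-quote line)) nil t)
            (replace-match "" nil nil))
          (goto-char next))))))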

Usage:

M-x uniq-lines

score 13 · Accepted Answer

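On Linux, select the region, then type

M-| uniq <RETURN>

The result without duplicates appears in a new buffer. Note that uniq only removes adjacent duplicate lines, so the region usually has to be sorted first.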

score 2 · Accepted Answer

(require 'cl-lib)

(defun unique-lines (start end)
  "This will remove all duplicate lines in the region.
Note empty lines count as duplicates of the empty line! All empty lines are
removed except the first one, which may be confusing!"
  (interactive "r")
  (let ((hash (make-hash-table :test #'equal)) (i -1))
    (dolist (s (split-string (buffer-substring-no-properties start end) "$" t)
               ;; Result form: rebuild the lines in first-seen order.
               (let ((lines (make-vector (1+ i) nil)))
                 (maphash
                  (lambda (key value) (setf (aref lines value) key))
                  hash)
                 (kill-region start end)
                 (goto-char start)
                 (insert (mapconcat #'identity lines "\n"))))
      (setq s                           ; because Emacs can't properly
                                        ; split lines :/
            ;; Drop leading newline characters; an empty line becomes "".
            (substring
             s (or (cl-position-if
                    (lambda (x)
                      (not (or (char-equal ?\n x) (char-equal ?\r x)))) s)
                   (length s))))
      ;; Record each line the first time it is seen, with its index.
      (unless (gethash s hash)
        (setf (gethash s hash) (cl-incf i))))))

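This alternative:

does not use the undo history to store matches;
is usually faster (though if you are after ultimate speed, build a prefix tree);
has the effect of replacing all former newline characters, whatever they were, with \n (UNIX style), which may be a benefit or a drawback depending on your situation;
could be made a little better (faster) if you reimplemented split-string so that it accepts characters instead of a regular expression.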

A slightly longer, but possibly more efficient, variant:

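(require 'cl-lib)

(defun split-string-chars (string chars &optional omit-nulls)
  "Split STRING at any character in the list CHARS.
Like `split-string', but takes separator characters instead of a regexp.
If OMIT-NULLS is non-nil, empty substrings are dropped."
  (let ((separators (make-hash-table))
        (len (length string))
        (last 0)
        result)
    (dolist (c chars) (puthash c t separators))
    (dotimes (i len)
      (when (gethash (aref string i) separators)
        ;; Keep the field between the previous separator and this one,
        ;; unless it is empty and OMIT-NULLS is set.
        (when (or (not omit-nulls) (/= last i))
          (push (substring string last i) result))
        (setq last (1+ i))))
    ;; The text after the last separator.
    (when (or (not omit-nulls) (< last len))
      (push (substring string last len) result))
    (nreverse result)))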

(defun unique-lines (start end)
  "This will remove all duplicate lines in the region.
Note empty lines count as duplicates of the empty line! All empty lines are
removed except the first one, which may be confusing!"
  (interactive "r")
  (let ((hash (make-hash-table :test #'equal)) (i -1))
    ;; Empty fields are kept, so the first empty line survives.
    (dolist (s (split-string-chars
                (buffer-substring-no-properties start end) '(?\n))
               ;; Result form: rebuild the lines in first-seen order.
               (let ((lines (make-vector (1+ i) nil)))
                 (maphash
                  (lambda (key value) (setf (aref lines value) key))
                  hash)
                 (kill-region start end)
                 (goto-char start)
                 (insert (mapconcat #'identity lines "\n"))))
      ;; Record each line the first time it is seen, with its index.
      (unless (gethash s hash)
        (setf (gethash s hash) (cl-incf i))))))

score 2 · Accepted Answer

Another way:

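Select the region of text.
Type C-u (prefix argument), then M-| (shell-command-on-region), then sort -u (the command to run on the selection; because of the prefix argument, its output replaces the selection).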
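The same C-u M-| sort -u sequence can be written in Lisp with shell-command-on-region: a non-nil fifth argument REPLACE makes the command's output replace the region, which is what the prefix argument does interactively. This is a sketch; the helper name my-shell-sort-u-region is hypothetical, and it still relies on an external sort program being installed.

```elisp
;; Hypothetical helper; needs an external `sort' program on the system.
(defun my-shell-sort-u-region (beg end)
  "Replace the text between BEG and END with the output of `sort -u'."
  (interactive "r")
  ;; Arguments: START END COMMAND OUTPUT-BUFFER REPLACE.
  ;; REPLACE = t puts the output in place of the region, like C-u M-|.
  (shell-command-on-region beg end "sort -u" nil t))
```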
