unix - How do I check the line endings of a text file to see whether it is in Unix or DOS format?

Question

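I need to convert a text file to DOS format (each line ending with 0x0d0x0a rather than 0x0a only) if the file is in Unix format (0x0a only at the end of each line).

I know how to convert it (sed 's/$/^M/'), but I don't know how to detect the end-of-line character of a file.

I am using ksh.

Any help would be appreciated.

[Update]: I have more or less figured it out. Here is my ksh script that does the check.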

[qiangxu@host:/my/folder]# cat eol_check.ksh
#!/usr/bin/ksh

if ! head -1 "$1" | grep '^M$' >/dev/null 2>&1; then
  echo UNIX
else
  echo DOS
fi

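In the script above, ^M is a literal carriage return, which should be inserted in vi with Ctrl-V followed by Ctrl-M.

I would like to know whether there is a better way.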

score 10 · Accepted Answer

Simply use the file command. If the file contains lines that end with CR LF, file says so in its description: 'ASCII text, with CRLF line terminators'.

For example:

if file  myFile | grep "CRLF"  > /dev/null 2>&1;
  then
  ....
fi
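
To make the check reusable, the test above could be wrapped in a small function. The sketch below is only an illustration: the names classify_file_output and eol_by_file are hypothetical, and the description strings it matches (in particular 'with CRLF, LF line terminators' for files with mixed endings) are an assumption about how common file implementations word their output, which may differ on your system.

```shell
# Hypothetical helper: map a description string printed by file(1) to a label.
# The wording matched here is an assumption and may vary between versions.
# Anything that does not mention CRLF (including empty or binary files)
# is reported as UNIX.
classify_file_output() {
  case "$1" in
    *"CRLF, LF line terminators"*) echo MIXED ;;
    *"CRLF line terminators"*)     echo DOS ;;
    *)                             echo UNIX ;;
  esac
}

# Hypothetical wrapper: run file on the given path and classify the result.
eol_by_file() {
  classify_file_output "$(file "$1")"
}
```

Calling eol_by_file myFile would then print DOS, MIXED or UNIX. Because the output of file starts with the file name, a name that itself contains one of the matched phrases would confuse this sketch.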

score 6 · Accepted Answer

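The latest (7.1) version of the dos2unix (and unix2dos) commands, which is installed with Cygwin and some recent Linux distributions, has a handy --info option that prints a count of the different types of line break in each file. This is dos2unix 7.1 (2014-10-06) http://waterlan.home.xs4all.nl/dos2unix.html

From the man page: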

--info[=FLAGS] FILE ...
       Display file information. No conversion is done.

The following information is printed, in this order: 
number of DOS line breaks, number of Unix line breaks, number of Mac line breaks, byte order mark, text or binary, file name.

       Example output:
            6       0       0  no_bom    text    dos.txt
            0       6       0  no_bom    text    unix.txt
            0       0       6  no_bom    text    mac.txt
            6       6       6  no_bom    text    mixed.txt
           50       0       0  UTF-16LE  text    utf16le.txt
            0      50       0  no_bom    text    utf8unix.txt
           50       0       0  UTF-8     text    utf8dos.txt
            2     418     219  no_bom    binary  dos2unix.exe

Optionally extra flags can be set to change the output. One or more flags can be added.
       d   Print number of DOS line breaks.
       u   Print number of Unix line breaks.
       m   Print number of Mac line breaks.
       b   Print the byte order mark.
       t   Print if file is text or binary.
       c   Print only the files that would be converted.

With the "c" flag dos2unix will print only the files that contain DOS line breaks, unix2dos will print only file names that have Unix line breaks.

So:

if [[ -n $(dos2unix --info=c "${filename}") ]] ; then echo DOS; fi

And the reverse:

if [[ -n $(unix2dos --info=c "${filename}") ]] ; then echo UNIX; fi
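
On systems where dos2unix 7.1 is not installed, a rough approximation of the first two columns of the --info output can be put together with awk. This is a sketch only, not a replacement for dos2unix: the function name count_eols is hypothetical, and it does not detect Mac (CR-only) line breaks, byte order marks or binary files.

```shell
# Hypothetical helper: print "<dos> <unix>" line-break counts for one file.
# A line counts as DOS if it ends in a carriage return, otherwise as Unix.
# A final line without a trailing newline is counted as a Unix line.
count_eols() {
  awk '
    /\r$/  { dos++ }
    !/\r$/ { unix++ }
    END    { printf "%d %d\n", dos, unix }
  ' "$1"
}
```

For a file with two CRLF-terminated lines and one LF-terminated line, count_eols prints "2 1". A non-zero first number means the file contains DOS line breaks.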

score 2 · Accepted Answer


if awk  '/\r$/{exit 0;} 1{exit 1;}' myFile
then
  echo "is DOS"
fi

answered 2013-08-07T15:28:45.237
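
Note that the awk one-liner above decides on the first line alone, because its second rule exits as soon as the first line fails to match. If files with mixed endings are a concern, a variant that scans every line might look like the following sketch; the function name has_crlf is hypothetical.

```shell
# Hypothetical helper: succeed (exit status 0) if any line ends in CR.
# The first rule stops reading at the first match; the END rule then
# turns the flag into the exit status.
has_crlf() {
  awk '/\r$/ { found = 1; exit } END { exit !found }' "$1"
}
```

It can be used in the same way as the original test, for example: if has_crlf myFile; then echo "is DOS"; fi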

score 1 · Accepted Answer

I can't test this on AIX, but try:

if [[ "$(head -1 filename)" == *$'\r' ]]; then echo DOS; fi
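
The $'\r' quoting used above is understood by ksh93 and bash but not by every shell. If it is not available, the same first-line test can be sketched with printf and a case pattern, which rely only on POSIX shell features. The function name first_line_is_dos is hypothetical.

```shell
# Hypothetical helper: succeed if the first line of the file ends in CR.
# Command substitution strips trailing newlines but keeps the carriage
# return, so both $cr and the captured first line retain it.
first_line_is_dos() {
  cr=$(printf '\r')
  case "$(head -n 1 "$1")" in
    *"$cr") return 0 ;;
    *)      return 1 ;;
  esac
}
```

Usage would look like: if first_line_is_dos filename; then echo DOS; fi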

score 1 · Accepted Answer

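You can simply remove any existing carriage return from the end of every line and then add a carriage return to the end of every line. The format of the incoming file then does not matter: the outgoing file will always be in DOS format.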

sed 's/\r$//;s/$/\r/'
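
The \r escape in sed is a GNU extension that POSIX does not guarantee, so it may not be understood by older sed implementations. As a hedged alternative, the same strip-then-append idea can be sketched with awk; the function name to_dos is hypothetical.

```shell
# Hypothetical helper: normalise input to DOS line endings.
# Each line first loses a trailing CR (if any) and is then written
# back with CR LF, so Unix, DOS and mixed input all come out as DOS.
# Reads the named files, or standard input when no file is given.
to_dos() {
  awk '{ sub(/\r$/, ""); printf "%s\r\n", $0 }' "$@"
}
```

For example, to_dos unix.txt > dos.txt. Unlike the sed command, this sketch also appends CR LF to a final line that had no line terminator.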

score 0 · Accepted Answer

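I am probably late to this one, but I had the same issue and did not want to put the special ^M character in my script (I was worried that some editors might not display the special character properly, or that a later programmer might replace it with two ordinary characters: ^ and M...).

The solution I found feeds the special character to grep by letting the shell convert its hex value: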

if head -1 ${filename} | grep $'[\x0D]' >/dev/null
then
  echo "Win"
else
  echo "Unix"
fi

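Unfortunately, I could not get the $'[\x0D]' construct to work in ksh. In ksh, I found this instead:

if head -1 ${filename} | od -x | grep '0d0a$' >/dev/null
then
  echo "Win"
else
  echo "Unix"
fi

od -x displays the text as hex codes. '0d0a$' is the hex code for CR-LF (the DOS/Windows line terminator). The Unix line terminator is '0a00$'.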
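
One caveat: od -x prints two-byte words in the machine's native byte order, so the patterns above hold on big-endian hosts, while on little-endian machines the two bytes of each word appear swapped. A byte-order-independent variant can be sketched with od -tx1, which prints one byte at a time. The function name first_line_has_crlf is hypothetical.

```shell
# Hypothetical helper: succeed if the first line ends in CR LF.
# od -An -v -tx1 prints every byte as two hex digits with no offsets;
# tr joins them into one string, so '0d0a$' matches only when the
# last two bytes of the first line are CR and LF.
first_line_has_crlf() {
  head -n 1 "$1" | od -An -v -tx1 | tr -d ' \t\n' | grep '0d0a$' >/dev/null
}
```

Usage would look like: if first_line_has_crlf "${filename}"; then echo "Win"; else echo "Unix"; fi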
