1

I am using a Perl script to read in a file, but I'm not sure what encoding the file is in. Basically, my file is a list of book titles, but each book has other info associated with it (author, publication date, etc.), so each book title sits within a discrete chunk of data for that book. I iterate through the file line by line until a line matches the regular expression '/Book Title: (.*)/' and take what's in the parentheses. Then I create a separate .txt file named after the book. However, on my Unix server, when I look at the name of the file, it's not, for example, 'LordOfTheFlies.txt' but rather 'LordOfTheFlies^M.txt'.

What is this '^M'? Is it a weird end-of-line encoding I'm not taking into account? I tried chomp but it doesn't seem to work. What is the best file encoding for working with Perl?


3 Answers

5

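It is the extra carriage return that Windows systems insert before the newline (M is the 13th letter of the alphabet, so ASCII 13 is displayed as ^M).

It has nothing to do with file encoding; it's just the line-ending convention biting you. Perl is usually good at handling line-ending characters correctly, but if they show up anywhere other than where it expects the end of the line, you have to deal with them yourself. Here chomp() removes only the trailing \n, and the (.*) capture still contains the \r. You can use s/\r// instead of chomp() to take it out.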
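As a minimal sketch of that suggestion, assuming the title is captured with the regular expression from the question: the helper name title_from_line is hypothetical, and the sample line simply mimics what a CRLF file looks like when read on Unix.

```perl
use strict;
use warnings;

# Hypothetical helper: pull the title out of one input line and
# drop any carriage returns left behind by Windows line endings.
sub title_from_line {
    my ($line) = @_;
    # "." never matches "\n", but it does match "\r",
    # so the capture can end with a stray carriage return.
    return unless $line =~ /Book Title: (.*)/;
    my $title = $1;
    $title =~ s/\r//g;    # remove every CR, wherever it appears
    return $title;
}

# A line as it would arrive from a CRLF file read on Unix.
my $line  = "Book Title: LordOfTheFlies\r\n";
my $title = title_from_line($line);
print "$title.txt\n";
```

Cleaning the captured title, rather than the whole line, keeps the fix next to the place where the file name is built.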

answered 2010-03-01T07:44:49.277
0

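Try chop rather than "chomp". Chomp only removes the "newline", so on a line ending in \r\n it leaves the \r behind, and a chop after the chomp removes that leftover character. s/\r// is also fine. As for your general question, you may want to use an appropriate module for the type of file you have to work with; it will make your life with Perl easier.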

answered 2010-03-01T19:48:13.373
0

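Before processing a file, you need to know its encoding, which is decided by whoever produced the file.
"^M" is control-M, which is a carriage return; Unix text files do not use it.
It looks like the file was created on Windows and transferred to Unix. The carriage returns also survive when a text file is transferred over ftp in binary mode rather than text (ASCII) mode.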
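If the script may receive files from either platform, one option is to strip the line ending with a pattern that accepts both forms. The following is only a sketch: strip_eol is a hypothetical helper, and the in-memory string (including the made-up second title) stands in for a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: strip one trailing line ending, LF or CRLF.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/\r?\n\z//;    # optional CR, then LF, at the very end
    return $line;
}

# In-memory data standing in for a file that mixes both conventions.
my $data = "Book Title: LordOfTheFlies\r\nBook Title: AnotherExampleBook\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
while (my $line = <$fh>) {
    my $clean = strip_eol($line);
    # Report whether the raw line carried a carriage return.
    printf "%s (had CR: %s)\n", $clean, ($line =~ /\r/ ? 'yes' : 'no');
}
close $fh;
```

Unlike chomp, which removes only the value of $/ (a bare "\n" by default), this leaves no "\r" behind on Windows-style lines and does nothing harmful to Unix-style ones.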

answered 2010-03-01T07:46:59.537