
The use-case example is saving the contents of http://example.com as a filename on your computer, but with the unsafe characters (i.e. : and /) escaped.

The classic way is to use a regex to strip all non-alphanumeric-dash-underscore characters out, but then that makes it impossible to reverse the filename into a URL. Is there a way, possibly a combination of CGI.escape and another filter, to sanitize the filename for both Windows and *nix? Even if the tradeoff is a much longer filename?

edit:

Example with CGI.escape

 CGI.escape 'http://www.example.com/Hey/whatsup/1 2 3.html#hash'
 #=> "http%3A%2F%2Fwww.example.com%2FHey%2Fwhatsup%2F1+2+3.html%23hash"

A couple of things: are % signs completely safe as filename characters? Unfortunately, CGI.escape doesn't convert the spaces in a malformed URL to %20 on the first pass (it emits + instead, as in the output above), so I suppose any translation method would require changing all spaces to + with a gsub and then applying CGI.escape.
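For reference, here is a minimal round-trip sketch using only the standard library's CGI.escape and CGI.unescape. It suggests the extra gsub may not be needed: a space becomes +, while a literal + becomes %2B, so the two stay distinguishable when decoding. As far as I know, % is not among the characters Windows reserves in filenames (< > : " / \ | ? *), and on *nix only / and NUL are disallowed, but treat that as something to verify on your target filesystem.

```ruby
require "cgi"

url = "http://www.example.com/Hey/whatsup/1 2 3.html#hash"

# Escape once; the result contains only letters, digits, ".", "+" and "%XX" triplets.
escaped = CGI.escape(url)

# CGI.unescape reverses both the "+" (space) and the "%XX" encodings.
restored = CGI.unescape(escaped)

puts restored == url
```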


2 Answers


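One approach is to "hash" the filename. For example, the URL of this question is: https://stackoverflow.com/questions/18234870/how-to-reversibly-escape-a-url-in-ruby-so-that-it-can-be-saved-to-the-file-syste. You can use the digest/md5 library from Ruby's standard library to hash the name. Simple and elegant.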

require "digest/md5"

foldername = "https://stackoverflow.com/questions/18234870/how-to-reversibly-escape-a-url-in-ruby-so-that-it-can-be-saved-to-the-file-syste"
hashed_name = Digest::MD5.hexdigest(foldername) # => "5045cccd83a8d4d5c4fc01f7b4d8c502"

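A corollary of this scheme: MD5 hashes are used to verify the authenticity/integrity of downloads because, for all practical purposes, the MD5 digest of a given string always returns the same hex string.

However, I wouldn't call this "reversible". You would need a custom way to look up the URL for each hash you generate, perhaps a .yml file containing that data.

Update: As @the Tin Man suggested, when there are a lot of files to store, a simple SQLite database would be much better than a .yml file.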
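The lookup idea could be sketched roughly as follows, assuming a small YAML index that maps each hash back to its URL. The helper names remember_url and lookup_url and the index file name are hypothetical, chosen only for illustration; for many files the same interface could sit on top of SQLite instead.

```ruby
require "digest/md5"
require "yaml"

# Hypothetical helper: hash the URL and record hash => URL in a YAML index file.
def remember_url(index_path, url)
  # Load the existing index, or start fresh if the file is missing or empty.
  index = (File.exist?(index_path) && YAML.load_file(index_path)) || {}
  hashed = Digest::MD5.hexdigest(url)
  index[hashed] = url
  File.write(index_path, index.to_yaml)
  hashed
end

# Hypothetical helper: return the URL recorded for a hash, or nil if unknown.
def lookup_url(index_path, hashed)
  return nil unless File.exist?(index_path)
  (YAML.load_file(index_path) || {})[hashed]
end
```

Usage would look like `name = remember_url("url_index.yml", url)` when saving, and `lookup_url("url_index.yml", name)` when the original URL is needed again.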

answered 2013-08-14T15:27:33.670

Here is how I would do it (adjust the regex as needed):

url = "http://stackoverflow.com/questions/18234870/how-to-reversibly-escape-a-url-in-ruby-so-that-it-can-be-saved-to-the-file-syste"
filename = url.each_char.map {|x|
  x.match(/[a-zA-Z0-9-]/) ? x : "_#{x.unpack('H*')[0]}"
}.join
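To go back from filename to URL, a decoder along these lines should work. This is a sketch, not part of the approach as posted: encode_filename and decode_filename are hypothetical names, and the encoder is a byte-wise variant (one `_xx` pair per byte) so that multibyte characters decode unambiguously. For ASCII-only URLs it produces the same output as the snippet above.

```ruby
SAFE_CHAR = /[a-zA-Z0-9-]/

# Keep safe characters; write every other character as "_xx" per UTF-8 byte.
def encode_filename(url)
  url.each_char.map { |ch|
    ch.match?(SAFE_CHAR) ? ch : ch.bytes.map { |b| format("_%02x", b) }.join
  }.join
end

# Reverse the encoding: "_xx" becomes the byte 0xXX, anything else is kept as-is.
def decode_filename(name)
  bytes = name.scan(/_([0-9a-f]{2})|(.)/m).map { |hex, ch| hex ? hex.hex : ch.ord }
  bytes.pack("C*").force_encoding(Encoding::UTF_8)
end

url = "http://www.example.com/Hey/whatsup/1 2 3.html#hash"
puts decode_filename(encode_filename(url)) == url
```

Because the underscore itself is not in the safe set, it is always written as `_5f`, so every `_` in an encoded name starts an escape.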

Edit:

If the length of the resulting filename is a concern, I would store the files in subdirectories named after the URL's path segments.
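One way the subdirectory layout might look, as a rough sketch: url_to_relative_path is a hypothetical helper, it assumes a well-formed absolute http(s) URL, and it uses CGI.escape per segment purely for illustration. Note that it drops the scheme, query string, and fragment, so those would need to be stored separately to recover the full URL.

```ruby
require "uri"
require "cgi"

# Hypothetical helper: map a URL to host/segment/segment/... as a relative path.
def url_to_relative_path(url)
  uri = URI.parse(url)
  # Split the path on "/" and drop empty pieces (leading or doubled slashes).
  segments = uri.path.split("/").reject(&:empty?)
  # Escape each segment on its own so no segment can contain a path separator.
  File.join(uri.host, *segments.map { |segment| CGI.escape(segment) })
end

puts url_to_relative_path("http://stackoverflow.com/questions/18234870/how-to-reversibly-escape-a-url-in-ruby-so-that-it-can-be-saved-to-the-file-syste")
```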

answered 2013-08-14T18:43:00.230