
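I'm adding a feature to my project in which we generate links to content inside our site. We want these links to be as short as possible, so we'll be making our own "URL shortener".

I'm wondering what the best encoding/alphabet is for the generated short URLs. This is largely a subjective question, so I'd like to hear your opinions on the best approach and the trade-offs.

A few options come to mind:
- Digits, uppercase + lowercase (base 62)
- Digits, lowercase only (base 36)
- Base 32 (http://www.crockford.com/wrmg/base32.html)
- linkpot.net (uses common short English words)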

Of course, the last two are better suited to uses other than clicking, and the first two are better for Twitter.
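
To illustrate why the Base 32 option suits URLs that get read aloud or retyped, here is a minimal Python sketch of Crockford-style encoding. The alphabet and the look-alike mappings (I and L read as 1, O read as 0, hyphens ignored) follow Crockford's description; the function names are hypothetical and the sketch leaves out his optional check symbol.

```python
# Crockford's Base32 alphabet: digits plus letters, omitting I, L, O and U.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"

# Look-alike characters are mapped back to the intended digit when decoding.
_ALIASES = str.maketrans({"I": "1", "L": "1", "O": "0"})


def crockford_encode(n: int) -> str:
    """Encode a non-negative integer using Crockford's Base32 alphabet."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return CROCKFORD[0]
    out = []
    while n:
        n, rem = divmod(n, 32)
        out.append(CROCKFORD[rem])
    return "".join(reversed(out))


def crockford_decode(code: str) -> int:
    """Decode a code, tolerating lowercase, hyphens and I/L/O look-alikes."""
    cleaned = code.upper().replace("-", "").translate(_ALIASES)
    n = 0
    for ch in cleaned:
        n = n * 32 + CROCKFORD.index(ch)  # raises ValueError on invalid input
    return n
```

With this sketch, `crockford_decode("lo")` and `crockford_decode("10")` give the same value, which is the property that makes the alphabet forgiving of transcription mistakes.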

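Also, if I were to go with "clickable only" URLs, I'd want to make the alphabet as large as possible and add other symbols.

  • Which symbols can I use in a URL that won't get URL-encoded?
  • Which symbols should I use? Will some of them turn out to be problematic? For example, I'm thinking of slashes and dots.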

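What do you think?

Note: The main target for these URLs is Twitter. With that in mind, we should probably use the largest alphabet possible, since most people will just click. However, I'm interested in your experience of people using short URLs in other ways (over the phone, on printed paper, etc.). How likely is that to happen?

Note 2: I'm not building "yet another URL shortener", so please don't punish me with downvotes. We're generating short URLs for internal content on our own site, and nobody will be allowed to shorten arbitrary URLs. Think of the short URL Google Maps gives you when you generate a link to specific coordinates.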


3 Answers


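I'd go with Base-62, since it gives the shortest URLs of the options you listed. Shortened URLs aren't meant to be typed in by hand, so there's no need to worry about case sensitivity.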
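
A minimal Python sketch of Base-62 encoding, assuming each piece of content already has a non-negative integer ID (for example a database key). The function names and the ordering of the alphabet are illustrative choices, not anything the question specifies.

```python
import string

# Digits, then uppercase, then lowercase: 62 symbols in total.
# The ordering is an arbitrary choice for this sketch.
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase
BASE = len(ALPHABET)


def base62_encode(n: int) -> str:
    """Turn a non-negative integer ID into a short Base-62 code."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, BASE)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))


def base62_decode(code: str) -> int:
    """Turn a Base-62 code back into the integer ID."""
    n = 0
    for ch in code:
        n = n * BASE + ALPHABET.index(ch)  # raises ValueError on invalid input
    return n
```

For example, `base62_encode(125)` gives `"21"` (2 × 62 + 1), and `base62_decode("21")` returns 125.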

answered 2009-09-11T20:36:15.967

If these are "clickable only" URLs, I'd probably go with a base-64 encoding. MIME's base-64 uses a couple of characters you shouldn't use in a URL ("+" and "/"), but there are enough unreserved safe characters in URLs that you can just swap them out. (Also, you don't need the "=" padding that MIME's base-64 uses, since you know where your URL ends.)
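
As a rough sketch of that idea in Python: the standard library's `base64.urlsafe_b64encode` already swaps `+` and `/` for `-` and `_`, and the padding can be stripped on the way out and restored on the way in. The function names are hypothetical, and the sketch assumes the thing being shortened is a non-negative integer ID.

```python
import base64


def id_to_token(n: int) -> str:
    """Encode a non-negative integer ID as unpadded URL-safe Base64."""
    # Use the fewest whole bytes that can hold n (at least one byte).
    raw = n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def token_to_id(token: str) -> int:
    """Restore the stripped padding, then decode back to the integer ID."""
    padded = token + "=" * (-len(token) % 4)
    return int.from_bytes(base64.urlsafe_b64decode(padded), "big")
```

Because this works on whole bytes, the tokens can be slightly longer than a digit-by-digit base-64 conversion of the same number would be. The trade-off is that the standard library does the encoding and decoding.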

Here's a page that discusses one way to do this.

You can look at RFC 2396 (since superseded by RFC 3986) to figure out exactly which characters are safe in URIs if you want to double-check.

answered 2009-09-11T17:41:57.717

I'd be curious to know a little more about the implementation. How will these URLs be "unshortened", or will the internal pages being accessed be saved as shortened URLs? In either case, even if you went with the encoding set of [A-Z] you'd be able to reference 26 * 26 * 26 = 17,576 pages with only 3 characters; how many internal web pages are you talking about?
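
The arithmetic above generalizes to any alphabet: a code of length k over b symbols can address b^k pages. Here is a small Python sketch comparing the alphabet sizes mentioned in this thread. The `capacity` helper is a hypothetical name used only for illustration.

```python
def capacity(alphabet_size: int, length: int) -> int:
    """Number of distinct codes of exactly `length` characters."""
    return alphabet_size ** length


# Alphabet sizes taken from the options discussed in the question and answers.
for name, size in [("A-Z", 26), ("base 32", 32), ("base 36", 36),
                   ("base 62", 62), ("base 64", 64)]:
    print(f"{name:8s}", [capacity(size, k) for k in (3, 4, 5)])
```

Three characters of A-Z cover 17,576 pages, while three characters of base 62 cover 238,328.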

In general I would lean on what your use case requirements are for picking the right encoding set. Are you planning on having these links available for "uses other than clicking"? What would those uses be, and how do you suspect they'll alter the encoding? (For example, using parts of the URL as a file name on a case-insensitive file system reduces the available character set.)

Here's an informative page on the character set you have available to you when writing a URL.

answered 2009-09-11T17:48:41.637