
There are a lot of questions about URL shorteners here on Stack Overflow and elsewhere on the internet, e.g.

How to code a URL shortener?

How do URL shortener calculate the URL key? How do they work?

http://www.codinghorror.com/blog/2007/08/url-shortening-hashes-in-practice.html

However, there is one thing I don't understand. For instance, http://goo.gl uses four characters at the moment, yet they claim their short URLs don't expire. As mentioned in the article on Coding Horror, if they can't recycle URLs, the only possible solution is to add an additional character at some point.

Ok, so far so good. With 4 characters that means about 15 million unique addresses. For something like Google Maps, I don't think that is very much and if you can't recycle, my guess is they run out of available addresses fairly quickly.

Now for the part I don't get. While handing out addresses, they start to run out of unused ones. They have to check that a newly generated address has not been issued yet, and the chance that it is already in use keeps increasing. The straightforward solution, of course, is to generate a new URL over and over again until they find a free one or until they have generated all 15 million alternatives. However, this surely can't be how they actually do it, because it would be far too time-consuming. So how do they manage this?
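To put a rough number on that worry, here is a minimal sketch. It assumes keys are drawn uniformly at random from the 62^4 keyspace and retried on collision, which is only the naive scheme described above and not necessarily what any real service does. The name `expected_attempts` is made up for illustration.

```python
KEYSPACE = 62 ** 4  # 14,776,336 possible 4-character keys

def expected_attempts(used: int, keyspace: int = KEYSPACE) -> float:
    """Expected number of uniform random draws until an unused key is hit.

    Each draw succeeds with probability (free keys / all keys), so the
    number of draws is geometrically distributed with mean 1 / p.
    """
    if not 0 <= used < keyspace:
        raise ValueError("used must be in [0, keyspace)")
    free_fraction = (keyspace - used) / keyspace
    return 1.0 / free_fraction

if __name__ == "__main__":
    for fill in (0.5, 0.9, 0.99):
        used = int(KEYSPACE * fill)
        print(f"{fill:.0%} full: about {expected_attempts(used):.1f} attempts per new key")
```

Under this assumption the cost stays small until the keyspace is almost exhausted, but it grows without bound as the last free keys are used up.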

Also, there are probably several visitors at once asking for a short URL, so they must have some synchronization going on as well. But how should the situation be managed when the fifth character needs to be added?

Finally, while researching how the URLs from http://goo.gl work, I of course requested a short URL for a map on Google Maps several times. None of them will ever be used. However, if Google strictly enforces the policy that URLs never expire once issued, there must be lots and lots of dormant URLs in the system. Again, I assume Google (and the other services) have come up with a solution to this problem too. I could imagine a clean-up service that recycles URLs which have not been visited in the first 48 hours after creation, or fewer than 10 times in the first week. I hope someone can shed some light on this issue as well.

In short, I get the general principle of URL shorteners, but I see several problems when these URLs cannot expire. Does anyone know how the problems mentioned above might be resolved and are there any other problems?


EDIT

Ok, so this blog post sheds some light on things. These services don't randomly generate anything. They rely on the underlying database's auto-increment functionality and apply a simple conversion on the resulting id. That eliminates the need to check if an id already exists (it doesn't) and the database handles synchronization. That still leaves one of my three questions unanswered. How do these services "know" if a link is actually used once created?
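A minimal sketch of that "simple conversion", assuming the id is a non-negative integer from an auto-increment column and the key alphabet is the 62 alphanumeric characters. The alphabet order and the names `encode` and `decode` are illustrative choices, not taken from any particular service.

```python
import string

# Assumed alphabet: digits, then lowercase, then uppercase (62 symbols).
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase
BASE = len(ALPHABET)

def encode(n: int) -> str:
    """Convert a non-negative database id into a base-62 key."""
    if n < 0:
        raise ValueError("id must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, BASE)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

def decode(key: str) -> int:
    """Convert a base-62 key back into the database id."""
    n = 0
    for ch in key:
        n = n * BASE + ALPHABET.index(ch)  # raises ValueError on unknown characters
    return n

if __name__ == "__main__":
    print(encode(62 ** 4 - 1))  # last 4-character key
    print(encode(62 ** 4))      # first 5-character key
```

With this kind of scheme the fifth character needs no special handling: the key simply becomes one character longer once the id passes 62^4 - 1.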


1 Answer


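Why URL shorteners don't delete entries

I did once write to TinyURL (a decade ago) offering to give back entries I did not need. Their answer made me realize how ridiculous I was: they told me to "just create all the URLs you need". The figures speak for themselves:

A - With 26 lowercase letters + 26 uppercase letters + 10 digits (the choice of reasonable sites), ONE character gives you 62 positions (i.e. 62 shortened URLs), and each additional character multiplies the number of positions by 62: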

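  • 0 characters = 1 URL
  • 1 character = 62 URLs
  • 2 characters = 3,844 (1 URL for each person in a village)
  • 3 characters = 238,328 (same, in a city)
  • 4 characters = 14,776,336 (same, in the Los Angeles area)
  • 5 characters = 916,132,832 (same, in the Americas: North + Central + South)
  • 6 characters ~ 56,800,235,580 (8 URLs for each person in the world)
  • 7 characters ~ 3,521,614,606,000 (503 for each person, 4 for each web page in the world)
  • 8 characters ~ 218,340,105,600,000 (31,191 URLs for each person)
  • 9 characters ~ 13,537,086,550,000,000 (about 2 million URLs for each person)
  • 10 characters ~ 839,299,365,900,000,000 (about 120 million URLs for each person)
  • 11 characters ~ 52,036,560,680,000,000,000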
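The figures in this list can be recomputed with a few lines of Python. This is only a sketch: the world population of 7 billion is an assumption chosen to match the per-person numbers, and `capacity` is an illustrative name.

```python
ALPHABET_SIZE = 62                 # 26 lowercase + 26 uppercase + 10 digits
WORLD_POPULATION = 7_000_000_000   # assumption, roughly the 2013 figure

def capacity(length: int) -> int:
    """Number of distinct keys of exactly `length` characters."""
    return ALPHABET_SIZE ** length

if __name__ == "__main__":
    for length in range(12):
        keys = capacity(length)
        per_person = keys / WORLD_POPULATION
        print(f"{length:2d} characters: {keys:>26,} keys, {per_person:,.0f} per person")
```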

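B - Actual needs and usage are lower than one might expect. Few people create short URLs, and each person creates only a few. In most cases the original URL is good enough. The result is that the most popular shorteners, after years of operation, still cover today's needs with just 4 or 5 characters, and adding another character when needed costs next to nothing. Apparently goo.gl and goo.gl/maps each use 5 characters, and YouTube uses 11 (using the 62 characters above, plus the dash and a few others).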

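C - The cost of hosting (storage + operation) is, say, $1,000 per year for 1 terabyte, and each TB can hold 5 billion URLs, so hosting 1 URL costs 0.2 micro-dollars per year. But a shortener's revenue is also very thin, so it is not a very strong business. For the user, the benefit of one URL is hard to evaluate, but a lost link would cost far more than the hosting.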
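As a quick check of that arithmetic, here is a minimal sketch using only the two figures quoted above. Taking 1 TB as 10^12 bytes is an assumption, and the constant names are illustrative.

```python
COST_PER_TB_PER_YEAR_USD = 1000    # figure from the text
URLS_PER_TB = 5_000_000_000        # figure from the text
BYTES_PER_TB = 10 ** 12            # assumption: decimal terabyte

# Hosting cost of a single URL, in dollars per year.
cost_per_url = COST_PER_TB_PER_YEAR_USD / URLS_PER_TB

# Storage budget implied per URL record.
bytes_per_url = BYTES_PER_TB / URLS_PER_TB

if __name__ == "__main__":
    print(f"{cost_per_url * 1e6:.1f} micro-dollars per URL per year")
    print(f"{bytes_per_url:.0f} bytes of storage per URL")
```

The 5 billion URLs per TB figure works out to about 200 bytes per entry under that assumption.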

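D - Creating a short URL is pointless if there is a risk of it being discontinued in the coming years, so persistence is a major appeal of shorteners, and a serious one will probably never stop serving its URLs unless it is forced out of business. Yet this has already happened, and in any case short URLs have their share of drawbacks as well as benefits, as explained in the Wikipedia article "URL shortening" (risks of all sorts of hacking against the user, the target sites, or the shortener; e.g. one could attack a shortener by having a bot request billions of URLs, a threat that most shorteners surely guard against).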

Versailles, Tue 12 Mar 2013 20:48:00 +0100, edited 21:01:25

answered 2013-03-12T19:48:00.410