algorithm - Will this obfuscation algorithm for a URL shortener work?

Question

DISCLAIMER: I am not asking how to make a URL shortener (I have already implemented the "bijective function" answer found HERE that uses a base-62 encoded string). Instead, I want to expand this implementation to obfuscate the generated string so that it is both:

A) not an easily guessable sequence, and

B) still bijective.

You can easily randomize your base-62 character set, but the problem is that it still increments like any other number in any other base. For example, one possible incremental progression might be {aX9fgE, aX9fg3, aX9fgf, aX9fgR, …}.
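A minimal Python sketch of the problem with only shuffling the alphabet. The seed and the resulting shuffled order are arbitrary assumptions. Consecutive IDs keep the same prefix, and only the last character changes until a carry occurs.

```python
import random
import string

# The 62 usual characters, shuffled into a fixed "secret" order (seed is arbitrary).
BASE62 = string.digits + string.ascii_lowercase + string.ascii_uppercase
_rng = random.Random(2024)
SHUFFLED = "".join(_rng.sample(BASE62, len(BASE62)))


def encode(n: int, alphabet: str = SHUFFLED) -> str:
    """Ordinary positional base-62 encoding using the given alphabet."""
    base = len(alphabet)
    if n == 0:
        return alphabet[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, base)
        digits.append(alphabet[rem])
    return "".join(reversed(digits))


def decode(s: str, alphabet: str = SHUFFLED) -> int:
    """Inverse of encode()."""
    base = len(alphabet)
    n = 0
    for ch in s:
        n = n * base + alphabet.index(ch)
    return n


if __name__ == "__main__":
    start = 10**10
    for i in range(start, start + 4):
        # Same prefix each time; only the last character moves.
        print(encode(i))
```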

I have come up with an obfuscation technique that I am pleased with in terms of requirement A), but I'm only partially sure that it satisfies B). The idea is this:

The only thing that is guaranteed to change in the incremental approach is the "1's place" (I'll use decimal terminology for practical reasons). In the sample progression I gave earlier, that would be {E, 3, f, R, …}. So if each character in the base-62 set had its own unique offset number (say, its distance from the "zero character"), then you could apply the offset of the "1's place" character to the rest of the string.

For instance, let's assume a base-6 set with characters {A, f, 9, p, Z, 3} (in ascending order from 0 to 5). Each one would then have a unique offset of 0 to 5, respectively. Counting would look like {A, f, 9, p, Z, 3, fA, ff, f9, fp, …} and so on. So the algorithm, when given a value of fZ3p, would look at the p and, since it has an offset of +3, would permute the string into Zf9p (treating the base-6 set as a circular array). The next incremental number would be fZ3Z, and with Z's offset being +4, the algorithm returns 39pZ. These permuted results would be handed to the user as his or her "unique URL"; the user would never see the actual base-62 encoded string.
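The offset scheme can be sketched in a few lines of Python. The example alphabet {A, f, 9, p, Z, 3} has six characters, so the arithmetic is mod 6; the function names `obfuscate` and `deobfuscate` are hypothetical. The sketch reproduces the two worked examples from the paragraph above.

```python
# Example alphabet in ascending order: A=0, f=1, 9=2, p=3, Z=4, 3=5.
ALPHABET = "Af9pZ3"
BASE = len(ALPHABET)  # 6


def obfuscate(s: str) -> str:
    """Shift every character except the last by the last character's offset (mod BASE)."""
    offset = ALPHABET.index(s[-1])
    shifted = (ALPHABET[(ALPHABET.index(c) + offset) % BASE] for c in s[:-1])
    return "".join(shifted) + s[-1]


def deobfuscate(s: str) -> str:
    """Undo obfuscate(): the last character is unchanged, so apply the negative offset."""
    offset = ALPHABET.index(s[-1])
    shifted = (ALPHABET[(ALPHABET.index(c) - offset) % BASE] for c in s[:-1])
    return "".join(shifted) + s[-1]


print(obfuscate("fZ3p"))  # Zf9p
print(obfuscate("fZ3Z"))  # 39pZ
```

Because the last character is never shifted, the decoder always knows which offset to undo; Python's `%` returns a non-negative result, so the negative shift wraps around the circular alphabet correctly.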

This approach certainly seems reversible: just look at the last character and perform the same permutation with the negative offset. For that reason, I think it must still be bijective, but I'm not sure that is necessarily true. Are there any edge or corner cases I'm not considering?
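One way to probe B) is brute force on the small example alphabet. The Python sketch below (the helpers `shift` and `to_base` are hypothetical) obfuscates every ID below 6⁴ and checks that no two IDs collide and that the inverse recovers each one. It also surfaces two edge cases worth knowing about. Single-character codes pass through unchanged. An obfuscated code can begin with the zero character (`3f` becomes `Af`), so codes must be compared as strings and never parsed as numbers, because `Af` read as a number would equal `f`.

```python
ALPHABET = "Af9pZ3"  # A=0, f=1, 9=2, p=3, Z=4, 3=5, as in the example
BASE = len(ALPHABET)


def shift(s: str, sign: int) -> str:
    """Shift all but the last character by sign * (offset of the last character)."""
    k = sign * ALPHABET.index(s[-1])
    return "".join(ALPHABET[(ALPHABET.index(c) + k) % BASE] for c in s[:-1]) + s[-1]


def to_base(n: int) -> str:
    """Canonical encoding: no leading zero character except for n == 0 itself."""
    out = ""
    while True:
        n, r = divmod(n, BASE)
        out = ALPHABET[r] + out
        if n == 0:
            return out


# Obfuscate every ID below BASE**4 (1296 IDs).
codes = {n: shift(to_base(n), +1) for n in range(BASE**4)}
assert len(set(codes.values())) == len(codes)                     # no collisions
assert all(shift(c, -1) == to_base(n) for n, c in codes.items())  # exact inverse

print(shift("3f", +1))  # Af  (leading zero character in the output)
```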

EDIT: My priorities are weighted more heavily toward the length of the shortened URL than toward the security of the pattern. I realize there are plenty of solutions involving cryptographic functions, block ciphers, etc. But I would like to emphasize that I am not asking for the best way to achieve A), but rather, "does my offset approach satisfy B)?"

Any holes you can find would be appreciated.

score 2 · Accepted Answer

If you really want them to be hard to guess, keep it simple.

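Start with a normal encryption algorithm running in counter mode. When you get a URL to shorten, increment your counter, encrypt it, convert the result to something that uses printable characters (e.g., base 64), and put the original URL and the shortened version into your table so you can look up the original from the shortened version when you need it.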
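A minimal Python sketch of that flow: increment a counter, encrypt the value, base64-encode it, and store the pair in a table. The Python standard library has no block cipher, so an affine map on 32-bit integers stands in for "encrypt" here. It is a bijection, but it is NOT secure; a real deployment would put one of the ciphers discussed below in its place. The constants and function names are hypothetical.

```python
import base64
import itertools

# PLACEHOLDER for a real cipher: x -> (A*x + B) mod 2**32.
# A is odd, so the map is invertible mod 2**32, but it offers no real security.
MOD = 2**32
A, B = 0x9E3779B1, 0x7F4A7C15  # hypothetical "key" values
A_INV = pow(A, -1, MOD)         # modular inverse (Python 3.8+)


def encrypt(n: int) -> bytes:
    return ((A * n + B) % MOD).to_bytes(4, "big")


def decrypt(block: bytes) -> int:
    return ((int.from_bytes(block, "big") - B) * A_INV) % MOD


counter = itertools.count(1)  # incremented once per new URL
table = {}                    # short code -> original URL


def shorten(url: str) -> str:
    block = encrypt(next(counter))
    # 4 bytes become 6 URL-safe base64 characters once the '=' padding is dropped.
    code = base64.urlsafe_b64encode(block).rstrip(b"=").decode("ascii")
    table[code] = url
    return code


def expand(code: str) -> str:
    return table[code]
```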

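At that point, the only real question is which encryption algorithm to use. That, in turn, depends on your threat model. I don't see exactly what you gain by making the shortened URLs hard to guess, so I'm a bit unsure about the threat model.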

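If you want to make them slightly hard to guess, you could use something like the 40-bit version of RC4. It's easy to break, but enough to keep most people from bothering.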

If you want more security, you can step up to DES. It has been broken, but even at this late date, breaking it is quite a bit of work.

If you want still more security, you can use AES.

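Note that as you increase security, the shortened URLs get longer. RC4-40 starts at 5 bytes, DES at 7 bytes, and AES at 32 bytes. Depending on how you convert to printable text, that will expand at least a little.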
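To put numbers on that expansion, here is a small Python sketch for the byte counts quoted above, assuming URL-safe base64 with the `=` padding stripped. Every 3 bytes become 4 characters, so n bytes need ⌈4n/3⌉ characters.

```python
import base64
import math


def printable_length(num_bytes: int) -> int:
    """Length of unpadded URL-safe base64 for a num_bytes-long ciphertext."""
    return len(base64.urlsafe_b64encode(bytes(num_bytes)).rstrip(b"="))


for name, n in [("RC4-40", 5), ("DES", 7), ("AES", 32)]:
    assert printable_length(n) == math.ceil(4 * n / 3)
    print(name, n, "bytes ->", printable_length(n), "characters")
# RC4-40 5 bytes -> 7 characters
# DES 7 bytes -> 10 characters
# AES 32 bytes -> 43 characters
```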

score 1 · Accepted Answer

I tried to solve the same problem (in PHP) and ended up with these functions:

So for A): it's not easy to guess (for me, anyway), because you can't increment a string to get the next record without knowing the algorithm.

For B): as far as I can tell, it's 100% bijective.

Thanks to @Nemo for mentioning Feistel networks by name, which led me to the first function I linked to.

score 1 · Accepted Answer

Another option is to use the Luby-Rackoff construction (see also here), which is a way to build a pseudorandom permutation from a pseudorandom function.

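You just need to choose a "round function" F. F must take as input a key K and a block of bits half the size of whatever you are encoding, and it must produce as output a block of bits that is also half the size of what you are encoding.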

Then you just run the Luby-Rackoff construction (a.k.a. a "Feistel network") for four rounds, using a different K for each round.

The construction guarantees that the result is a bijective mapping, and that it will be hard to invert if F is hard to invert.
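A minimal Python sketch of a four-round Feistel network over 32-bit blocks (two 16-bit halves). HMAC-SHA256 truncated to 16 bits is assumed as the round function F. The per-round keys and the names `permute`/`unpermute` are hypothetical. Each round maps (L, R) to (R, L XOR F(K, R)), and that step can be undone for any F, which is why the result is always a bijection.

```python
import hashlib
import hmac

HALF_BITS = 16                   # each half is 16 bits, so blocks are 32 bits
MASK = (1 << HALF_BITS) - 1
ROUND_KEYS = [b"k1", b"k2", b"k3", b"k4"]  # hypothetical: a different K per round


def F(key: bytes, half: int) -> int:
    """Round function: HMAC-SHA256 of the half-block, truncated to HALF_BITS bits."""
    digest = hmac.new(key, half.to_bytes(4, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest[:4], "big") & MASK


def permute(x: int) -> int:
    """Map a 32-bit ID to a scrambled 32-bit value."""
    left, right = x >> HALF_BITS, x & MASK
    for k in ROUND_KEYS:
        left, right = right, left ^ F(k, right)
    return (left << HALF_BITS) | right


def unpermute(y: int) -> int:
    """Run the rounds backwards with the keys in reverse order."""
    left, right = y >> HALF_BITS, y & MASK
    for k in reversed(ROUND_KEYS):
        left, right = right ^ F(k, left), left
    return (left << HALF_BITS) | right
```

In a shortener, `permute(n)` of the sequential ID would then be base-62 encoded as usual. This sketch assumes IDs stay below 2³².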

score 0 · Accepted Answer

If you're trying to keep people from scraping URLs, I think Nick Johnson has the right idea: you need to make sure your URL space isn't dense.

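Here's a simple idea: take your URL and prepend some random characters. Then run it through a compression algorithm; I'd try range coding (if you find a good library, you can specify the base). The result should decompress to the original form, and it should both break up locality and make the encoded space sparser.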
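A rough Python sketch of that idea. The standard library has no range coder, so `zlib` is swapped in as the compressor, and URL-safe base64 stands in for "specifying the base". The 4-character salt length and the function names are assumptions. For short URLs, the compressed output is usually no shorter than the input.

```python
import base64
import secrets
import string
import zlib

SALT_LEN = 4  # hypothetical number of random characters to prepend
SALT_CHARS = string.ascii_letters + string.digits


def encode_url(url: str) -> str:
    """Prepend a random salt, compress, and encode as unpadded URL-safe base64."""
    salt = "".join(secrets.choice(SALT_CHARS) for _ in range(SALT_LEN))
    packed = zlib.compress((salt + url).encode("utf-8"), 9)
    return base64.urlsafe_b64encode(packed).rstrip(b"=").decode("ascii")


def decode_url(code: str) -> str:
    """Restore padding, decompress, and drop the random prefix."""
    padded = code + "=" * (-len(code) % 4)
    raw = zlib.decompress(base64.urlsafe_b64decode(padded)).decode("utf-8")
    return raw[SALT_LEN:]
```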

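That said, I imagine nearly all URL shorteners keep a stateful hash table on the server side. How else would you losslessly compress a hundred-character URL into 5 or 6 characters?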
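The pigeonhole arithmetic behind that question can be checked in a few lines of Python. This assumes, generously, that URLs use only the same 62 characters; real URLs draw on more. There are vastly more 100-character strings than 6-character codes, so no lossless scheme can give every one of them its own 6-character code.

```python
codes_6 = 62 ** 6      # distinct 6-character base-62 codes
urls_100 = 62 ** 100   # distinct 100-character strings over the same 62 characters

print(codes_6)                       # 56800235584
print(urls_100 // codes_6 == 62 ** 94)  # True: 62**94 strings per available code
```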
