java - Creating a UUID from a string with no dashes

Question

How can I create a java.util.UUID from a string that has no dashes?

"5231b533ba17478798a3f2df37de2aD7" => #uuid "5231b533-ba17-4787-98a3-f2df37de2aD7"

score 56 · Accepted Answer

tl;dr

java.util.UUID.fromString(
    "5231b533ba17478798a3f2df37de2aD7"
    .replaceFirst( 
        "(\\p{XDigit}{8})(\\p{XDigit}{4})(\\p{XDigit}{4})(\\p{XDigit}{4})(\\p{XDigit}+)", "$1-$2-$3-$4-$5" 
    )
).toString()

5231b533-ba17-4787-98a3-f2df37de2ad7

Or parse each half of the hex string as a long integer, and pass both to the UUID constructor.

UUID uuid = new UUID ( long1 , long2 ) ;
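A minimal Java sketch of that two-longs approach, assuming Java 8 or later (for Long.parseUnsignedLong) and input of exactly 32 hex digits; the class and method names are hypothetical. The unsigned parse matters because a 16-digit half such as 98a3f2df37de2aD7 is larger than Long.MAX_VALUE, so Long.parseLong would throw on it.

```java
import java.util.UUID;

public class UuidFromHex {
    // Hypothetical helper: builds a UUID from 32 hex digits with no hyphens.
    static UUID parse(String hex) {
        if (hex.length() != 32) {
            throw new IllegalArgumentException("Expected 32 hex digits, got " + hex.length());
        }
        // parseUnsignedLong accepts values up to 2^64 - 1, so a half whose first
        // digit is 8-f does not overflow the way it would with parseLong.
        long mostSigBits = Long.parseUnsignedLong(hex.substring(0, 16), 16);
        long leastSigBits = Long.parseUnsignedLong(hex.substring(16), 16);
        return new UUID(mostSigBits, leastSigBits);
    }

    public static void main(String[] args) {
        UUID uuid = parse("5231b533ba17478798a3f2df37de2aD7");
        System.out.println(uuid); // 5231b533-ba17-4787-98a3-f2df37de2ad7
    }
}
```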

Bits, not text

A UUID is a 128-bit value. A UUID is not actually made of letters and digits; it is made of bits. You can think of it as describing a very, very large number.

We could display those bits as 128 characters of 0 and 1.

0111 0100 1101 0010 0101 0001 0101 0110 0110 0000 1110 0110 0100 0100 0100 1100 1010 0001 0111 0111 1010 1001 0110 1110 0110 0111 1110 1100 1111 1100 0101 1111
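To see those bits for yourself, a small sketch (the class name UuidBits is hypothetical) can pull the two 64-bit halves out of a java.util.UUID with getMostSignificantBits() and getLeastSignificantBits(), then print each half as 64 zero-padded binary digits.

```java
import java.util.UUID;

public class UuidBits {
    // Renders a long as exactly 64 binary digits, keeping leading zeros.
    static String toBinary64(long value) {
        String bits = Long.toBinaryString(value); // negative values already give 64 digits
        StringBuilder padded = new StringBuilder(64);
        for (int i = bits.length(); i < 64; i++) {
            padded.append('0');
        }
        return padded.append(bits).toString();
    }

    // Concatenates the high and low halves into the full 128-bit string.
    static String toBits(UUID uuid) {
        return toBinary64(uuid.getMostSignificantBits())
                + toBinary64(uuid.getLeastSignificantBits());
    }

    public static void main(String[] args) {
        UUID uuid = UUID.fromString("74d25156-60e6-444c-a177-a96e67ecfc5f");
        System.out.println(toBits(uuid));
    }
}
```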

Bits are not easy for humans to read, so for convenience we usually represent the 128-bit value as a hexadecimal string made of letters and digits.

74d25156-60e6-444c-a177-a96e67ecfc5f

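Such a hex string is not the UUID itself, only a human-friendly representation. Hyphens are added per the UUID spec as the canonical format, but they are optional.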

74d2515660e6444ca177a96e67ecfc5f

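By the way, the UUID spec clearly states that lowercase letters must be used when generating the hex string, while uppercase should be tolerated as input. Unfortunately, many implementations violate that lowercase-generation rule, including those from Apple, Microsoft, and others. See my blog post.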
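For what it's worth, java.util.UUID in the JDK versions I know of follows that rule in both directions: fromString accepts uppercase hex digits, and toString() generates lowercase (compare the uppercase D in the question's input with the lowercase d in the tl;dr output). A sketch that leans on this to normalize case, with a hypothetical normalize helper:

```java
import java.util.UUID;

public class UuidCase {
    // Parses (tolerating uppercase) and re-renders in the canonical lowercase form.
    static String normalize(String uuidText) {
        return UUID.fromString(uuidText).toString();
    }

    public static void main(String[] args) {
        System.out.println(normalize("74D25156-60E6-444C-A177-A96E67ECFC5F"));
        // prints 74d25156-60e6-444c-a177-a96e67ecfc5f
    }
}
```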

The following refers to Java, not Clojure.

In Java 7 (and earlier), you can use the java.util.UUID class to instantiate a UUID from a hex string with hyphens as input. Example:

java.util.UUID uuidFromHyphens = java.util.UUID.fromString("6f34f25e-0b0d-4426-8ece-a8b3f27f4b63");
System.out.println( "UUID from string with hyphens: " + uuidFromHyphens );

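However, that UUID class fails when given a hex string without hyphens. This failure is unfortunate, because the UUID spec does not require the hyphens in a hex string representation. This fails: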

java.util.UUID uuidFromNoHyphens = java.util.UUID.fromString("6f34f25e0b0d44268ecea8b3f27f4b63");
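In the JDK versions I have checked, that failure surfaces as an IllegalArgumentException, but the message text differs between versions, so this sketch (hypothetical class and method names) only checks whether parsing succeeds rather than relying on the message.

```java
import java.util.UUID;

public class UuidNoHyphens {
    // Returns true if UUID.fromString accepts the text, false if it throws.
    static boolean accepts(String text) {
        try {
            UUID.fromString(text);
            return true;
        } catch (IllegalArgumentException e) {
            // fromString expects the hyphen-separated 8-4-4-4-12 layout.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(accepts("6f34f25e-0b0d-4426-8ece-a8b3f27f4b63")); // true
        System.out.println(accepts("6f34f25e0b0d44268ecea8b3f27f4b63"));     // false
    }
}
```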

Regex

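One workaround is to format the hex string by adding the canonical hyphens. Here is my attempt at using regex to format the hex string. Beware: this code works, but I am no regex expert. You should make this code more robust, for example by checking that the string is 32 characters long before formatting and 36 characters long afterwards.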

// -----|  With Hyphens  |----------------------
java.util.UUID uuidFromHyphens = java.util.UUID.fromString( "6f34f25e-0b0d-4426-8ece-a8b3f27f4b63" );
System.out.println( "UUID from string with hyphens: " + uuidFromHyphens );
System.out.println();

// -----|  Without Hyphens  |----------------------
String hexStringWithoutHyphens = "6f34f25e0b0d44268ecea8b3f27f4b63";
// Use regex to format the hex string by inserting hyphens in the canonical format: 8-4-4-4-12
String hexStringWithInsertedHyphens =  hexStringWithoutHyphens.replaceFirst( "([0-9a-fA-F]{8})([0-9a-fA-F]{4})([0-9a-fA-F]{4})([0-9a-fA-F]{4})([0-9a-fA-F]+)", "$1-$2-$3-$4-$5" );
System.out.println( "hexStringWithInsertedHyphens: " + hexStringWithInsertedHyphens );
java.util.UUID myUuid = java.util.UUID.fromString( hexStringWithInsertedHyphens );
System.out.println( "myUuid: " + myUuid );
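A sketch of the more robust variant suggested above, assuming you want bad input rejected with an IllegalArgumentException (the class and method names are hypothetical). It validates 32 hex digits before formatting, uses {12} rather than + for the last group, and checks for 36 characters afterwards.

```java
import java.util.UUID;

public class HexToUuid {
    // Hypothetical helper: insert canonical hyphens into 32 hex digits, then parse.
    static UUID fromHexWithoutHyphens(String hex) {
        // Check the length (and content) before formatting.
        if (hex == null || !hex.matches("\\p{XDigit}{32}")) {
            throw new IllegalArgumentException("Expected 32 hex digits: " + hex);
        }
        String withHyphens = hex.replaceFirst(
                "(\\p{XDigit}{8})(\\p{XDigit}{4})(\\p{XDigit}{4})(\\p{XDigit}{4})(\\p{XDigit}{12})",
                "$1-$2-$3-$4-$5");
        // Redundant after the check above, but a cheap sanity check on the result.
        if (withHyphens.length() != 36) {
            throw new IllegalStateException("Formatting failed: " + withHyphens);
        }
        return UUID.fromString(withHyphens);
    }

    public static void main(String[] args) {
        System.out.println(fromHexWithoutHyphens("6f34f25e0b0d44268ecea8b3f27f4b63"));
        // prints 6f34f25e-0b0d-4426-8ece-a8b3f27f4b63
    }
}
```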

POSIX notation

You may find this alternative syntax more readable, using the POSIX notation \\p{XDigit} in the regex in place of [0-9a-fA-F] (see the Pattern doc):

String hexStringWithInsertedHyphens =  hexStringWithoutHyphens.replaceFirst( "(\\p{XDigit}{8})(\\p{XDigit}{4})(\\p{XDigit}{4})(\\p{XDigit}{4})(\\p{XDigit}+)", "$1-$2-$3-$4-$5" );

Complete example.

java.util.UUID uuid =
        java.util.UUID.fromString (
                "5231b533ba17478798a3f2df37de2aD7"
                        .replaceFirst (
                                "(\\p{XDigit}{8})(\\p{XDigit}{4})(\\p{XDigit}{4})(\\p{XDigit}{4})(\\p{XDigit}+)",
                                "$1-$2-$3-$4-$5"
                        )
        );

System.out.println ( "uuid.toString(): " + uuid );

uuid.toString(): 5231b533-ba17-4787-98a3-f2df37de2ad7

score 19 · Accepted Answer

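Clojure's #uuid tagged literal is a pass-through to java.util.UUID/fromString. And fromString splits the string on "-" and converts it into two Long values. (The UUID format is standardized as 8-4-4-4-12 hex digits, but the "-" characters are really only there for validation and visual identification.)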

The straightforward solution is to reinsert the "-" and call java.util.UUID/fromString.

(defn uuid-from-string [data]
  (java.util.UUID/fromString
   (clojure.string/replace data
                           #"(\w{8})(\w{4})(\w{4})(\w{4})(\w{12})"
                           "$1-$2-$3-$4-$5")))

If you want something without regex, you can use a ByteBuffer and DatatypeConverter.

(defn uuid-from-string [data]
  (let [buffer (java.nio.ByteBuffer/wrap 
                 (javax.xml.bind.DatatypeConverter/parseHexBinary data))]
    (java.util.UUID. (.getLong buffer) (.getLong buffer))))
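A rough Java counterpart of that ByteBuffer approach. Note that javax.xml.bind (home of DatatypeConverter) was removed from the JDK in Java 11, so this sketch assumes Java 17 or later and swaps in java.util.HexFormat for the hex decoding; the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.HexFormat;
import java.util.UUID;

public class UuidFromHexBytes {
    // Hypothetical helper: decode 32 hex digits to 16 bytes, then read two longs.
    static UUID parse(String hex) {
        byte[] bytes = HexFormat.of().parseHex(hex); // accepts upper- and lowercase digits
        if (bytes.length != 16) {
            throw new IllegalArgumentException("Expected 16 bytes, got " + bytes.length);
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes); // big-endian by default
        long mostSigBits = buffer.getLong();
        long leastSigBits = buffer.getLong();
        return new UUID(mostSigBits, leastSigBits);
    }

    public static void main(String[] args) {
        System.out.println(parse("5231b533ba17478798a3f2df37de2aD7"));
        // prints 5231b533-ba17-4787-98a3-f2df37de2ad7
    }
}
```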

score 12 · Accepted Answer

A regex solution is probably faster, but you can also take a look at this :)

import java.math.BigInteger;
import java.util.UUID;

String withoutDashes = "44e128a5-ac7a-4c9a-be4c-224b6bf81b20".replaceAll("-", "");
BigInteger bi1 = new BigInteger(withoutDashes.substring(0, 16), 16);
BigInteger bi2 = new BigInteger(withoutDashes.substring(16, 32), 16);
UUID uuid = new UUID(bi1.longValue(), bi2.longValue());
String withDashes = uuid.toString();

By the way, converting from 16 binary bytes to a UUID:

InputStream is = ...; // binary input
byte[] bytes = IOUtils.toByteArray(is); // IOUtils from Apache Commons IO
ByteBuffer bb = ByteBuffer.wrap(bytes);
UUID uuidWithDashesObj = new UUID(bb.getLong(), bb.getLong());
String uuidWithDashes = uuidWithDashesObj.toString();
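A self-contained sketch of that byte conversion in both directions, assuming the 16 bytes are in big-endian order (ByteBuffer's default); it takes a byte array directly instead of an InputStream, and the class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

public class UuidBytes {
    // Reads 16 big-endian bytes as the most- and least-significant halves.
    static UUID fromBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("Expected 16 bytes, got " + bytes.length);
        }
        ByteBuffer bb = ByteBuffer.wrap(bytes);
        // Java evaluates constructor arguments left to right, so the high half is read first.
        return new UUID(bb.getLong(), bb.getLong());
    }

    // Inverse: writes the two halves back out as 16 bytes.
    static byte[] toBytes(UUID uuid) {
        return ByteBuffer.allocate(16)
                .putLong(uuid.getMostSignificantBits())
                .putLong(uuid.getLeastSignificantBits())
                .array();
    }

    public static void main(String[] args) {
        UUID original = UUID.fromString("44e128a5-ac7a-4c9a-be4c-224b6bf81b20");
        UUID copy = fromBytes(toBytes(original));
        System.out.println(copy.equals(original)); // true
    }
}
```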

score 11 · Accepted Answer

You could do a dumb regex replacement:

String digits = "5231b533ba17478798a3f2df37de2aD7";                         
String uuid = digits.replaceAll(                                            
    "(\\w{8})(\\w{4})(\\w{4})(\\w{4})(\\w{12})",                            
    "$1-$2-$3-$4-$5");                                                      
System.out.println(uuid); // => 5231b533-ba17-4787-98a3-f2df37de2aD7

score 8 · Accepted Answer

A much faster solution (~900%) than using regex and string manipulation is to parse the hex string into 2 longs and create the UUID instance from those:

(defn uuid-from-string
  "Converts a 32digit hex string into java.util.UUID"
  [hex]
  (java.util.UUID.
    (Long/parseUnsignedLong (subs hex 0 16) 16)
    (Long/parseUnsignedLong (subs hex 16) 16)))

score 7 · Accepted Answer

public static String addUUIDDashes(String idNoDashes) {
    StringBuffer idBuff = new StringBuffer(idNoDashes);
    idBuff.insert(20, '-');
    idBuff.insert(16, '-');
    idBuff.insert(12, '-');
    idBuff.insert(8, '-');
    return idBuff.toString();
}

Perhaps someone else can comment on the computational efficiency of this approach. (It was not a concern for my application.)

score 5 · Accepted Answer

An optimized version of @maerics's answer:

    String[] digitsList= {
            "daa70a7ffa904841bf9a81a67bdfdb45",
            "529737c950e6428f80c0bac104668b54",
            "5673c26e2e8f4c129906c74ec634b807",
            "dd5a5ee3a3c44e4fb53d2e947eceeda5",
            "faacc25d264d4e9498ade7a994dc612e",
            "9a1d322dc70349c996dc1d5b76b44a0a",
            "5fcfa683af5148a99c1bd900f57ea69c",
            "fd9eae8272394dfd8fd42d2bc2933579",
            "4b14d571dd4a4c9690796da318fc0c3a",
            "d0c88286f24147f4a5d38e6198ee2d18"
    };

    //Use compiled pattern to improve performance of bulk operations
    Pattern pattern = Pattern.compile("(\\w{8})(\\w{4})(\\w{4})(\\w{4})(\\w{12})");

    for (int i = 0; i < digitsList.length; i++)
    {
        String uuid = pattern.matcher(digitsList[i]).replaceAll("$1-$2-$3-$4-$5");
        System.out.println(uuid);
    }
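Building on the compiled-pattern idea, a sketch that also rejects bad input: \\w matches any letter, digit, or underscore (so g-z and _ would slip through), whereas \\p{XDigit} only matches hex digits, and matches() requires the whole string to fit. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.UUID;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class BulkUuidParser {
    // Compiled once and reused across all inputs.
    private static final Pattern HEX_32 = Pattern.compile(
            "(\\p{XDigit}{8})(\\p{XDigit}{4})(\\p{XDigit}{4})(\\p{XDigit}{4})(\\p{XDigit}{12})");

    static List<UUID> parseAll(String[] inputs) {
        List<UUID> result = new ArrayList<>(inputs.length);
        for (String input : inputs) {
            Matcher m = HEX_32.matcher(input);
            if (!m.matches()) { // whole string must be exactly 32 hex digits
                throw new IllegalArgumentException("Not 32 hex digits: " + input);
            }
            result.add(UUID.fromString(m.group(1) + "-" + m.group(2) + "-"
                    + m.group(3) + "-" + m.group(4) + "-" + m.group(5)));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(parseAll(new String[] {"daa70a7ffa904841bf9a81a67bdfdb45"}));
        // prints [daa70a7f-fa90-4841-bf9a-81a67bdfdb45]
    }
}
```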

score 3 · Accepted Answer

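Another solution, similar to Pawel's, but without creating new strings, and solving only the problem asked in the question. If performance is a concern, avoid regex/split/replaceAll and UUID.fromString like the plague.

String hyphenlessUuid = in.nextString(); // hyphen-less input read from some source `in`
BigInteger bigInteger = new BigInteger(hyphenlessUuid, 16);
UUID uuid = new UUID(bigInteger.shiftRight(64).longValue(), bigInteger.longValue());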

score 2 · Accepted Answer

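I believe the following is the fastest in terms of performance. It is even slightly faster than the Long.parseUnsignedLong version. It is slightly modified code from java-uuid-generator.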

public static UUID from32(String id) {
    if (id == null) {
        throw new NullPointerException();
    }
    if (id.length() != 32) {
        throw new NumberFormatException("UUID has to be 32 char with no hyphens");
    }

    long lo, hi;
    lo = hi = 0;

    for (int i = 0, j = 0; i < 32; ++j) {
        int curr;
        char c = id.charAt(i);

        if (c >= '0' && c <= '9') {
            curr = (c - '0');
        }
        else if (c >= 'a' && c <= 'f') {
            curr = (c - 'a' + 10);
        }
        else if (c >= 'A' && c <= 'F') {
            curr = (c - 'A' + 10);
        }
        else {
            throw new NumberFormatException(
                    "Non-hex character at #" + i + ": '" + c + "' (value 0x" + Integer.toHexString(c) + ")");
        }
        curr = (curr << 4);

        c = id.charAt(++i);

        if (c >= '0' && c <= '9') {
            curr |= (c - '0');
        }
        else if (c >= 'a' && c <= 'f') {
            curr |= (c - 'a' + 10);
        }
        else if (c >= 'A' && c <= 'F') {
            curr |= (c - 'A' + 10);
        }
        else {
            throw new NumberFormatException(
                    "Non-hex character at #" + i + ": '" + c + "' (value 0x" + Integer.toHexString(c) + ")");
        }
        if (j < 8) {
            hi = (hi << 8) | curr;
        }
        else {
            lo = (lo << 8) | curr;
        }
        ++i;
    }
    return new UUID(hi, lo);
}

score 2 · Accepted Answer

This example is faster because it does not use regex.

import java.util.UUID;

public class Example1 {
    /**
     * Get a UUID from a 32 char hexadecimal.
     * 
     * @param string a hexadecimal string
     * @return a UUID
     */
    public static UUID toUuid(String string) {

        if (string == null || string.length() != 32) {
            throw new IllegalArgumentException("invalid input string!");
        }

        char[] input = string.toCharArray();
        char[] output = new char[36];

        System.arraycopy(input, 0, output, 0, 8);
        System.arraycopy(input, 8, output, 9, 4);
        System.arraycopy(input, 12, output, 14, 4);
        System.arraycopy(input, 16, output, 19, 4);
        System.arraycopy(input, 20, output, 24, 12);

        output[8] = '-';
        output[13] = '-';
        output[18] = '-';
        output[23] = '-';

        return UUID.fromString(new String(output));
    }

    public static void main(String[] args) {
        UUID uuid = toUuid("daa70a7ffa904841bf9a81a67bdfdb45");
        System.out.println(uuid);
    }
}

uuid-creator has a codec that does this more efficiently: Base16Codec. Example:

// Parses base16 strings with 32 chars (case insensitive)
UuidCodec<String> codec = new Base16Codec();
UUID uuid = codec.decode("0123456789AB4DEFA123456789ABCDEF");
