0

I wanted to compare the LCS of two files from their binary, therefore i used the usual LCS source code, and using the GenStr command to change the bytes of the file to String first. The problem is, I received memory out of bound error because comparing String has limit, therefore i am planning to use array that stores the bytes then compare it. Is it possible to use LCS algorithm to compare two arrays of bytes?

EDIT:

public static byte[] Compare(byte[] x, byte[] y) {

    int i, j;
    final int x_length = x.length;
    final int y_length = y.length;
    int n = 2048;
    int m = 2048;


    // D[i][j] = direction, L[i][j] = Length of LCS 
    int[][] D = new int[n + 1][m + 1];
    byte[][] L = new byte[n + 1][m + 1]; // { 1, 2, 3 }

    // D[i][0] = 0 for 0<=i<=n 
    // D[0][j] = 0 for  0<=j<=m 
    for (i = 1; i <= n; i++) {
        for (j = 1; j <= m; j++) {
            if (x[i - 1] == y[j - 1]) {
                D[i][j] = D[i - 1][j - 1] + 1;
                L[i][j] = 1;
            } else if (D[i - 1][j] >= D[i][j - 1]) {
                D[i][j] = D[i - 1][j];
                L[i][j] = 2;
            } else {
                D[i][j] = D[i][j - 1];
                L[i][j] = 3;
            }
        }
    }

    // Backtrack 
    ByteArrayOutputStream lcs = new ByteArrayOutputStream();
    i = n;  
    j = m;
    while (i != 0 && j != 0) {
        switch (L[i][j]) {
            case 1:   // diagonal 
                lcs.write(x[i - 1]); // Unreversed LCS
                --i;
                --j;
                break;
            case 2:  // up 
                --i;
                break;
            case 3:  // backward 
                --j;
                break;
        }
    }
    byte[] result = lcs.toByteArray();

    // Reverse:
    for (i = 0, j = result.length - 1; i < j; ++i, --j) {
        byte b = result[i];
        result[i] = result[j];
        result[j] = b;
    }
    return result;

    //While not end of file
    while(n < x_length && m < y_length){
        if(n+2048 < x.length){
            n = n+2048;
        } else {
            n = x.length;
        }

        if(m+2048 < y.length){
            m = m+2048;
        } else {
            m = y.length;
        }

    // D[i][j] = direction, L[i][j] = Length of LCS 
    int[][] D_new = new int[n + 1][m + 1];
    byte[][] L_new = new byte[n + 1][m + 1]; // { 1, 2, 3 }

    // D[i][0] = 0 for 0<=i<=n 
    // D[0][j] = 0 for  0<=j<=m 
    for (i = i+2048; i <= n; i++) {
        for (j = j+2048; j <= m; j++) {
            if (x[i - 1] == y[j - 1]) {
                D_new[i][j] = D_new[i - 1][j - 1] + 1;
                L_new[i][j] = 1;
            } else if (D_new[i - 1][j] >= D_new[i][j - 1]) {
                D_new[i][j] = D_new[i - 1][j];
                L_new[i][j] = 2;
            } else {
                D_new[i][j] = D_new[i][j - 1];
                L_new[i][j] = 3;
            }
        }
    }

    // Backtrack 
    ByteArrayOutputStream lcs_next = new ByteArrayOutputStream();
    i = n;  
    j = m;
    while (i != 0 && j != 0) {
        switch (L[i][j]) {
            case 1:   // diagonal 
                lcs_next.write(x[i - 1]); // Unreversed LCS
                --i;
                --j;
                break;
            case 2:  // up 
                --i;
                break;
            case 3:  // backward 
                --j;
                break;
        }
    }
    byte[] result_new = lcs_next.toByteArray();

    // Reverse:
    for (i = 0, j = result_new.length - 1; i < j; ++i, --j) {
        byte b = result_new[i];
        result_new[i] = result_new[j];
        result_new[j] = b;
    }
    return result_new;
    Arrays.fill(D_new, null);
    Arrays.fill(L_new, null);
    Arrays.fill(result_new, null);
    lcs_next.reset();
}
}

I tried, but haven't been able to check if this can be used or not, because of some errors.

Questions:

  1. how do you append the lcs in line (return result) and line (return result_new)?
  2. how do you clear the array so i can use it over and over again with different input? (Array.fill(D_new, null) and Array.fill(L_new, null) doesn't work)?

Thank you in advance

4

2 回答 2

1

没有什么可以阻止您使用byte数组。这将使用int数组的一半内存,但它的最大长度将是相同的:Integer.MAX_VALUE. 如果您的 RAM 用完了,但没有达到长度限制,那么这可能会节省您的时间。

如果这些来自文件,那么这就是你应该做的。您真的不应该将它们作为整个字符串阅读。逐字节读取它们。

但是,如果文件很大(超过 2GB),正确的做法是随时处理文件,而不是事先读取它们,并使用文件来存储您正在创建的 LCS 数据。该算法的好处是所有访问都是本地化的:您按顺序扫描输入文件(因此您不会从预先读取它们中获得任何收益);并且您在计算新值时仅考虑前一行和当前行来编写非常接近顺序的数组(因此将它们放在 RAM 中也不会获得太多收益)。

这样做可以让您任意缩放文件。CPU时间将成为决定因素。磁盘缓存将为您提供接近与首先读取文件并从 RAM 执行此操作所获得的性能相同的性能。

于 2014-10-14T10:59:53.747 回答
0

不考虑算法的转换。

在 javanew中初始化为 0 / 0.0 / false / null。

另一方面,添加到 lcs 不能开箱即用。但是反转数组很简单。

public static byte[] compare(byte[] x, byte[] y) {
    int i, j;
    final int n = x.length;
    final int m = y.length;
    /* D[i][j] = direction, L[i][j] = Length of LCS */
    int[][] D = new int[n + 1][m + 1];
    byte[][] L = new byte[n + 1][m + 1]; // { 1, 2, 3 }

    /* D[i][0] = 0 for 0<=i<=n */
    /* D[0][j] = 0 for  0<=j<=m */
    for (i = 1; i <= n; i++) {
        for (j = 1; j <= m; j++) {
            if (x[i - 1] == y[ - 1]) {
                D[i][j] = D[i - 1][j - 1] + 1;
                L[i][j] = 1;
            } else if (D[i - 1][j] >= D[i][j - 1]) {
                D[i][j] = D[i - 1][j];
                L[i][j] = 2;
            } else {
                D[i][j] = D[i][j - 1];
                L[i][j] = 3;
            }
        }
    }

    /* Backtrack */
    ByteArrayOutputStream lcs = new ByteArrayOutputStream();
    i = n;
    j = m;
    while (i != 0 && j != 0) {
        switch (L[i][j]) {
            case 1:   /* diagonal */
                lcs.write(x[i - 1]); // We want lcs reversed though.
                --i;
                --j;
                break;
            case 2:  /* up */
                --i;
                break;
            case 3:  /* backward */
                --j;
                break;
        }
    }
    byte[] result = lcs.toByteArray();
    // Reverse:
    for (i = 0, j = result.length - 1; i < j; ++i, --j) {
        byte b = result[i];
        result[i] = result[j];
        result[j] = b;
    }
    return result;
}
于 2014-10-14T11:07:09.387 回答