4

Before reading anything else, please take a moment to read the original thread.

Overview: an .xfdl file is a gzipped .xml file that has then been encoded in base64. I want to decode the .xfdl into XML, modify it, and then re-encode it back into an .xfdl file.

xfdl > xml.gz > xml > xml.gz > xfdl

I have been able to take an .xfdl file and decode it from base64 using uudeview:

uudeview -i yourform.xfdl

Then decompress it using gunzip:

gunzip -S "" < UNKNOWN.001 > yourform-unpacked.xml

The XML produced is 100% readable and looks fine. Without modifying the XML at all, I should be able to re-compress it using gzip:

gzip yourform-unpacked.xml

Then re-encode it in base64:

base64 -e yourform-unpacked.xml.gz yourform_reencoded.xfdl

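If my thinking is correct, the original file and the re-encoded file should be equal. However, when I put yourform.xfdl and yourform_reencoded.xfdl into Beyond Compare, they do not match. Also, the original file can be viewed in an .xfdl viewer (http://www.grants.gov/help/download_software.jsp#pureedge). The viewer says that the re-encoded xfdl is unreadable.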

I have also tried uuenview to re-encode in base64, and it produces the same result. Any help would be appreciated.

4

8 Answers

2

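As far as I know, you cannot find out the compression level of an already compressed file. When you compress a file you can specify the compression level with -#, where # is from 1 to 9 (1 is the fastest compression, 9 gives the smallest file). In practice you should never compare a compressed file with one that has been extracted and recompressed; slight variations can easily crop up. In your case I would compare the base64-encoded versions instead of the gzipped versions.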
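The point about compression levels can be checked with a short Python sketch. It uses `zlib` (the same DEFLATE algorithm gzip uses) rather than the `gzip` command itself, and the sample payload and the function name `compressed_variants` are made up for illustration:

```python
import zlib


def compressed_variants(payload: bytes) -> dict:
    """Compress the same payload at a fast, a default-like, and the best level."""
    return {level: zlib.compress(payload, level) for level in (1, 6, 9)}


# Hypothetical XML-like sample; any repetitive payload would do.
sample = b"<field sid='NAME'><value>example</value></field>\n" * 200
variants = compressed_variants(sample)

for level, blob in variants.items():
    # Every variant decompresses back to exactly the same bytes...
    assert zlib.decompress(blob) == sample
    # ...even though the compressed bytes themselves are not identical.
    print(level, len(blob))
```

So a byte-for-byte comparison of two compressed files says little; comparing the decompressed XML is the meaningful check.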

answered 2008-08-09T16:56:56.737
1

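I did this in Java with the help of the Base64 class from http://iharder.net/base64.

I've been working on an application that does form manipulation in Java. I decode the file, create a DOM document from the XML, then write it back to a file.

My Java code to read the file looks like this: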

public XFDLDocument(String inputFile) 
        throws IOException, 
            ParserConfigurationException,
            SAXException

{
    fileLocation = inputFile;

    try{

        //create file object
        File f = new File(inputFile);
        if(!f.exists()) {
            throw new IOException("Specified File could not be found!");
        }

        //open file stream from file
        FileInputStream fis = new FileInputStream(inputFile);

        //Skip past the MIME header
        fis.skip(FILE_HEADER_BLOCK.length());   

        //Decode from base 64
        Base64.InputStream bis = new Base64.InputStream(fis, 
                Base64.DECODE);

        //UnZIP the resulting stream
        GZIPInputStream gis = new GZIPInputStream(bis);

        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        doc = db.parse(gis);

        gis.close();
        bis.close();
        fis.close();

    }
    catch (ParserConfigurationException pce) {
        throw new ParserConfigurationException("Error parsing XFDL from file.");
    }
    catch (SAXException saxe) {
        throw new SAXException("Error parsing XFDL into XML Document.");
    }
}

My Java code to write the file to disk looks like this:

    /**
     * Saves the current document to the specified location
     * @param destination Desired destination for the file.
     * @param asXML True if the output should be un-encoded XML rather than Base64/GZIP
     * @throws IOException File cannot be created at specified location
     * @throws TransformerConfigurationException
     * @throws TransformerException 
     */
    public void saveFile(String destination, boolean asXML) 
        throws IOException, 
            TransformerConfigurationException, 
            TransformerException  
        {

        BufferedWriter bf = new BufferedWriter(new FileWriter(destination));
        bf.write(FILE_HEADER_BLOCK);
        bf.newLine();
        bf.flush();
        bf.close();

        OutputStream outStream;
        if(!asXML) {
            outStream = new GZIPOutputStream(
                new Base64.OutputStream(
                        new FileOutputStream(destination, true)));
        } else {
            outStream = new FileOutputStream(destination, true);
        }

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.transform(new DOMSource(doc), new StreamResult(outStream));

        outStream.flush();
        outStream.close();      
    }

Hope that helps.

answered 2011-03-28T22:46:01.223
1

I've been working on something similar, and this should work for PHP. You must have a writable tmp folder, and the PHP file must be named example.php!

    <?php
    function gzdecode($data) {
        $len = strlen($data);
        if ($len < 18 || strcmp(substr($data,0,2),"\x1f\x8b")) {
            echo "FILE NOT GZIP FORMAT";
            return null;  // Not GZIP format (See RFC 1952)
        }
        $method = ord(substr($data,2,1));  // Compression method
        $flags  = ord(substr($data,3,1));  // Flags
        if (($flags & 31) != $flags) {
            // Reserved bits are set -- NOT ALLOWED by RFC 1952
            echo "RESERVED BITS ARE SET. VERY BAD";
            return null;
        }
        // NOTE: $mtime may be negative (PHP integer limitations)
        $mtime = unpack("V", substr($data,4,4));
        $mtime = $mtime[1];
        $xfl   = substr($data,8,1);
        $os    = substr($data,9,1);
        $headerlen = 10;
        $extralen  = 0;
        $extra     = "";
        if ($flags & 4) {
            // 2-byte length prefixed EXTRA data in header
            if ($len - $headerlen - 2 < 8) {
                return false;    // Invalid format
                echo "INVALID FORMAT";
            }
            // The 2-byte length field follows the fixed 10-byte header
            $extralen = unpack("v",substr($data,$headerlen,2));
            $extralen = $extralen[1];
            if ($len - $headerlen - 2 - $extralen < 8) {
                echo "INVALID FORMAT";
                return false;    // Invalid format
            }
            $extra = substr($data,$headerlen+2,$extralen);
            $headerlen += 2 + $extralen;
        }

        $filenamelen = 0;
        $filename = "";
        if ($flags & 8) {
            // C-style string file NAME data in header
            if ($len - $headerlen - 1 < 8) {
                return false;    // Invalid format
                echo "INVALID FORMAT";
            }
            $filenamelen = strpos(substr($data,$headerlen),chr(0));
            if ($filenamelen === false || $len - $headerlen - $filenamelen - 1 < 8) {
                return false;    // Invalid format
                echo "INVALID FORMAT";
            }
            $filename = substr($data,$headerlen,$filenamelen);
            $headerlen += $filenamelen + 1;
        }

        $commentlen = 0;
        $comment = "";
        if ($flags & 16) {
            // C-style string COMMENT data in header
            if ($len - $headerlen - 1 < 8) {
                return false;    // Invalid format
                echo "INVALID FORMAT";
            }
            $commentlen = strpos(substr($data,$headerlen),chr(0));
            if ($commentlen === false || $len - $headerlen - $commentlen - 1 < 8) {
                return false;    // Invalid header format
                echo "INVALID FORMAT";
            }
            $comment = substr($data,$headerlen,$commentlen);
            $headerlen += $commentlen + 1;
        }

        $headercrc = "";
        if ($flags & 1) {
            // 2-bytes (lowest order) of CRC32 on header present
            if ($len - $headerlen - 2 < 8) {
                return false;    // Invalid format
                echo "INVALID FORMAT";
            }
            $calccrc = crc32(substr($data,0,$headerlen)) & 0xffff;
            $headercrc = unpack("v", substr($data,$headerlen,2));
            $headercrc = $headercrc[1];
            if ($headercrc != $calccrc) {
                echo "BAD CRC";
                return false;    // Bad header CRC
            }
            $headerlen += 2;
        }

        // GZIP FOOTER - These may be negative due to PHP's integer limitations
        $datacrc = unpack("V",substr($data,-8,4));
        $datacrc = $datacrc[1];
        $isize = unpack("V",substr($data,-4));
        $isize = $isize[1];

        // Perform the decompression:
        $bodylen = $len-$headerlen-8;
        if ($bodylen < 1) {
            // This should never happen - IMPLEMENTATION BUG!
            echo "BIG OOPS";
            return null;
        }
        $body = substr($data,$headerlen,$bodylen);
        $data = "";
        if ($bodylen > 0) {
            switch ($method) {
                case 8:
                    // Currently the only supported compression method:
                    $data = gzinflate($body);
                    break;
                default:
                    // Unknown compression method
                    echo "UNKNOWN COMPRESSION METHOD";
                    return false;
            }
        } else {
            // I'm not sure if zero-byte body content is allowed.
            // Allow it for now...  Do nothing...
            echo "ITS EMPTY";
        }

        // Verify decompressed size and CRC32:
        // NOTE: This may fail with large data sizes depending on how
        //       PHP's integer limitations affect strlen() since $isize
        //       may be negative for large sizes.
        if ($isize != strlen($data) || crc32($data) != $datacrc) {
            // Bad format!  Length or CRC doesn't match!
            echo "LENGTH OR CRC DO NOT MATCH";
            return false;
        }
        return $data;
    }
    echo "<html><head></head><body>";
    if (empty($_REQUEST['upload'])) {
        echo <<<_END
    <form enctype="multipart/form-data" action="example.php" method="POST">
    <input type="hidden" name="MAX_FILE_SIZE" value="100000" />
    <table>
    <th>
    <input name="uploadedfile" type="file" />
    </th>
    <tr>
    <td><input type="submit" name="upload" value="Convert File" /></td>
    </tr>
    </table>
    </form>
    _END;

    }
    if (!empty($_REQUEST['upload'])) {
        $file           = "tmp/" . $_FILES['uploadedfile']['name'];
        $orgfile        = $_FILES['uploadedfile']['name'];
        $name           = str_replace(".xfdl", "", $orgfile);
        $convertedfile  = "tmp/" . $name . ".xml";
        $compressedfile = "tmp/" . $name . ".gz";
        $finalfile      = "tmp/" . $name . "new.xfdl";
        $target_path    = "tmp/";
        $target_path    = $target_path . basename($_FILES['uploadedfile']['name']);
        if (move_uploaded_file($_FILES['uploadedfile']['tmp_name'], $target_path)) {
        } else {
            echo "There was an error uploading the file, please try again!";
        }
        $firstline      = "application/vnd.xfdl; content-encoding=\"base64-gzip\"\n";
        $data           = file($file);
        $data           = array_slice($data, 1);
        $raw            = implode($data);
        $decoded        = base64_decode($raw);
        $decompressed   = gzdecode($decoded);
        $compressed     = gzencode($decompressed);
        $encoded        = base64_encode($compressed);
        $decoded2       = base64_decode($encoded);
        $decompressed2  = gzdecode($decoded2);
        $header         = bin2hex(substr($decoded, 0, 10));
        $tail           = bin2hex(substr($decoded, -8));
        $header2        = bin2hex(substr($compressed, 0, 10));
        $tail2          = bin2hex(substr($compressed, -8));
        $header3        = bin2hex(substr($decoded2, 0, 10));
        $tail3          = bin2hex(substr($decoded2, -8));
        $filehandle     = fopen($compressedfile, 'w');
        fwrite($filehandle, $decoded);
        fclose($filehandle);
        $filehandle     = fopen($convertedfile, 'w');
        fwrite($filehandle, $decompressed);
        fclose($filehandle);
        $filehandle     = fopen($finalfile, 'w');
        fwrite($filehandle, $firstline);
        fwrite($filehandle, $encoded);
        fclose($filehandle);
        echo "<center>";
        echo "<table style='text-align:center' >";
        echo "<tr><th>Stage 1</th>";
        echo "<th>Stage 2</th>";
        echo "<th>Stage 3</th></tr>";
        echo "<tr><td>RAW DATA -></td><td>DECODED DATA -></td><td>UNCOMPRESSED DATA -></td></tr>";
        echo "<tr><td>LENGTH: ".strlen($raw)."</td>";
        echo "<td>LENGTH: ".strlen($decoded)."</td>";
        echo "<td>LENGTH: ".strlen($decompressed)."</td></tr>";
        echo "<tr><td><a href='tmp/".$orgfile."'/>ORIGINAL</a></td><td>GZIP HEADER:".$header."</td><td><a href='".$convertedfile."'/>XML CONVERTED</a></td></tr>";
        echo "<tr><td></td><td>GZIP TAIL:".$tail."</td><td></td></tr>";
        echo "<tr><td><textarea cols='30' rows='20'>" . $raw . "</textarea></td>";
        echo "<td><textarea cols='30' rows='20'>" . $decoded . "</textarea></td>";
        echo "<td><textarea cols='30' rows='20'>" . $decompressed . "</textarea></td></tr>";
        echo "<tr><th>Stage 6</th>";
        echo "<th>Stage 5</th>";
        echo "<th>Stage 4</th></tr>";
        echo "<tr><td>ENCODED DATA <-</td><td>COMPRESSED DATA <-</td><td>UNCOMPRESSED DATA <-</td></tr>";
        echo "<tr><td>LENGTH: ".strlen($encoded)."</td>";
        echo "<td>LENGTH: ".strlen($compressed)."</td>";
        echo "<td>LENGTH: ".strlen($decompressed)."</td></tr>";
        echo "<tr><td></td><td>GZIP HEADER:".$header2."</td><td></td></tr>";
        echo "<tr><td></td><td>GZIP TAIL:".$tail2."</td><td></td></tr>";
        echo "<tr><td><a href='".$finalfile."'/>FINAL FILE</a></td><td><a href='".$compressedfile."'/>RE-COMPRESSED FILE</a></td><td></td></tr>";
        echo "<tr><td><textarea cols='30' rows='20'>" . $encoded . "</textarea></td>";
        echo "<td><textarea cols='30' rows='20'>" . $compressed . "</textarea></td>";
        echo "<td><textarea cols='30' rows='20'>" . $decompressed  . "</textarea></td></tr>";
        echo "</table>";
        echo "</center>";
    }
    echo "</body></html>";
    ?>
answered 2012-01-17T22:57:27.210
1

You need to put the following line at the beginning of the XFDL file:

application/vnd.xfdl; content-encoding="base64-gzip"

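After you have generated the base64-encoded file, open it in a text editor and paste the line above on the first line. Make sure the base64 block starts at the beginning of the second line.

Save it and try it in the viewer! If it still does not work, the changes made to the XML may have made it non-compliant in some way. In that case, after the XML has been modified but before it has been gzipped and base64-encoded, save it with an .xfdl file extension and try opening it with the viewer. The viewer should be able to parse and display the uncompressed, unencoded file if it is in valid XFDL format.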
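Instead of pasting the line by hand, the same step could be scripted. This is only a sketch in Python; the function name `ensure_header` is made up here, and it assumes the header text quoted above is exactly what the viewer expects:

```python
XFDL_HEADER = 'application/vnd.xfdl; content-encoding="base64-gzip"'


def ensure_header(text: str) -> str:
    """Return the text with the XFDL MIME line as its first line."""
    first_line = text.partition("\n")[0]
    if first_line.strip() == XFDL_HEADER:
        return text  # header already present, leave the file alone
    # Put the header on line 1 so the base64 block starts on line 2.
    return XFDL_HEADER + "\n" + text.lstrip("\n")
```

Calling it twice on the same text is harmless, since the header is only added when it is missing.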

answered 2009-09-12T06:27:45.070
1

Check these out:

http://www.ourada.org/blog/archives/375

http://www.ourada.org/blog/archives/390

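They're in Python, not Ruby, but that should get you fairly close.

The algorithm there is actually for files with the header 'application/x-xfdl;content-encoding="asc-gzip"' rather than 'application/vnd.xfdl; content-encoding="base64-gzip"', but the good news is that PureEdge (aka IBM Lotus Forms) will open that format with no problem.

Then to top it off, here's a base64-gzip decode (in Python) so you can make the full round trip: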

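import base64
import zlib

# filename: path to the .xfdl file to decode
with open(filename, 'r') as f:
  header = f.readline()
  if header == 'application/vnd.xfdl; content-encoding="base64-gzip"\n':
    decoded = b''
    for line in f:
      # each base64 line is decoded on its own, then appended
      decoded += base64.b64decode(line.encode("ISO-8859-1"))
    # MAX_WBITS + 16 tells zlib to expect a gzip header and trailer
    xml = zlib.decompress(decoded, zlib.MAX_WBITS + 16)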
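For the way back, the encoding direction might look like the sketch below. It is an assumption-laden illustration, not something taken from the linked posts: the function name `encode_xfdl` is made up, `gzip.compress(..., mtime=0)` needs Python 3.8 or later, and wrapping the base64 at 76 characters per line (what `base64.encodebytes` does) is assumed to be acceptable to the viewer.

```python
import base64
import gzip

XFDL_HEADER = 'application/vnd.xfdl; content-encoding="base64-gzip"'


def encode_xfdl(xml_bytes: bytes) -> str:
    """Gzip the XML, base64-encode it, and prepend the XFDL header line."""
    # gzip.compress works on bytes in memory, so no filename ends up in
    # the gzip header; mtime=0 also keeps the timestamp field zeroed.
    packed = gzip.compress(xml_bytes, mtime=0)
    # encodebytes inserts a newline after every 76 output characters.
    body = base64.encodebytes(packed).decode("ascii")
    return XFDL_HEADER + "\n" + body
```

Writing the returned string to a file with an .xfdl extension completes the round trip; whether a given viewer accepts it still has to be tested.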
answered 2011-02-16T20:43:17.513
0

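Different implementations of the gzip algorithm will always produce slightly different but still correct files, and the compression level of the original file may also differ from the one you are running it at.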

answered 2008-08-09T15:57:40.617
0

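Interesting, I'll give it a shot. The variations aren't slight, however. The newly encoded file is longer, and when comparing the binary before and after, the data hardly matches at all.

Before (first three lines):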

H4sIAAAAAAAAC+19eZOiyNb3/34K3r4RT/WEU40ssvTtrhuIuKK44Bo3YoJdFAFZ3D79C6hVVhUq
dsnUVN/qmIkSOLlwlt/JPCfJ/PGf9dwAlorj6pb58wv0LfcFUEzJknVT+/ml2uXuCSJP3kNf/vOQ
+TEsFVkgoDfdn18mnmd/B8HVavWt5TsKI2vKN8magyENiH3Lf9kRfpd817PmF+jpiOhQRFZcXTMV

After (first three lines):

H4sICJ/YnEgAAzEyNDQ2LTExNjk2NzUueGZkbC54bWwA7D1pU+JK19/9FV2+H5wpByEhJMRH
uRUgCMom4DBYt2oqkAZyDQlmQZ1f/3YSNqGzKT3oDH6RdE4vOXuf08vFP88TFcygYSq6dnlM
naWOAdQGuqxoo8vjSruRyGYzfII6/id3dPGjVKwCBK+Zl8djy5qeJ5NPT09nTduAojyCZwN9

As you can see, the leading H4sI matches up, and after that it's pandemonium.
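The first few bytes are the gzip header, so decoding just the start of each sample shows where the two files begin to diverge. A rough Python sketch follows; the function name `gzip_header_fields` is made up, the field layout is the one from RFC 1952, and the two constants are copied from the first lines quoted above:

```python
import base64
import struct

ORIGINAL_PREFIX = "H4sIAAAAAAAAC+19"
REENCODED_PREFIX = "H4sICJ/YnEgAAzEyNDQ2LTExNjk2NzUueGZkbC54bWwA"


def gzip_header_fields(b64_prefix: str) -> dict:
    """Decode a base64 prefix and pull out the fixed gzip header fields."""
    raw = base64.b64decode(b64_prefix)
    magic, method, flags, mtime, xfl, os_code = struct.unpack("<HBBIBB", raw[:10])
    if magic != 0x8B1F:
        raise ValueError("not a gzip stream")
    offset = 10
    if flags & 0x04:  # FEXTRA: skip the length-prefixed extra field
        (xlen,) = struct.unpack("<H", raw[offset:offset + 2])
        offset += 2 + xlen
    name = None
    if flags & 0x08:  # FNAME: zero-terminated original file name
        end = raw.index(b"\x00", offset)
        name = raw[offset:end].decode("latin-1")
    return {"flags": flags, "mtime": mtime, "os": os_code, "name": name}


print(gzip_header_fields(ORIGINAL_PREFIX))
print(gzip_header_fields(REENCODED_PREFIX))
```

For the original sample this reports flags 0, a zero timestamp, no name, and OS code 11. For the re-encoded sample it reports the FNAME flag (8), a non-zero timestamp, OS code 3 (Unix), and the embedded name 12446-1169675.xfdl.xml. Those extra header bytes shift everything that follows, which is consistent with the base64 text looking completely different after the first four characters.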

answered 2008-08-09T17:10:48.380
0

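gzip puts the file name in the header of the file, so the length of a gzipped file varies depending on the name of the uncompressed file.

If gzip acts on a stream, the file name is omitted and the file is a bit shorter, so the following should work: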

gzip < yourform-unpacked.xml > yourform-unpacked.xml.gz

Then re-encode with base64:

base64 -e yourform-unpacked.xml.gz yourform_reencoded.xfdl

Maybe this will produce a file of the same length.
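The effect of the stored file name can be reproduced with Python's `gzip` module. This is just an illustrative sketch; the helper name `gzip_bytes` and the sample payload are made up, and `mtime=0` is passed only so that the timestamp does not add a second difference:

```python
import gzip
import io


def gzip_bytes(payload: bytes, filename: str) -> bytes:
    """Gzip payload in memory, storing filename in the header if non-empty."""
    buf = io.BytesIO()
    with gzip.GzipFile(filename=filename, mode="wb", fileobj=buf, mtime=0) as gz:
        gz.write(payload)
    return buf.getvalue()


payload = b"<XFDL>example</XFDL>\n" * 100
named = gzip_bytes(payload, "yourform-unpacked.xml")
anonymous = gzip_bytes(payload, "")  # behaves like compressing a stream

# The named variant is longer by the name plus its terminating zero byte.
print(len(named) - len(anonymous))
```

Both variants decompress to the same payload; only the header differs.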

answered 2009-10-08T19:28:39.257