java - Reading text from an SWF using StuartMacKay's transform-swf library

Question

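I need to extract all the text from some SWF files. I am using Java because I already have many modules written in that language, so I searched the web for free Java libraries dedicated to handling SWF files. In the end I found the library developed by StuartMacKay. It is called transform-swf and is available on GitHub.

The question is: once I have extracted the GlyphIndexes from a TextSpan, how can I convert the glyphs into characters?

Please provide a complete, working, and tested example. Theoretical answers will not be accepted, nor will answers such as "it cannot be done" or "it is impossible".

What I know and what I have done: I know that GlyphIndexes are built using a TextTable, which is constructed from an integer representing the font size and the font description provided by a DefineFont2 object. However, when I decode the DefineFont2 objects, all of them have a zero-length advances list.

Here is what I have done.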

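//Creating a Movie object from an swf file.
Movie movie = new Movie();
movie.decodeFromFile(new File(out));
//The decoded tags; both loops below iterate over this list.
List<MovieTag> list = movie.getObjects();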

//Saving all the decoded DefineFont2 objects.
Map<Integer,DefineFont2> fonts = new HashMap<>();
for (MovieTag object : list) {
  if (object instanceof DefineFont2) {
    DefineFont2 df2 = (DefineFont2) object;
    fonts.put(df2.getIdentifier(), df2);
  }
} 
//Now I retrieve all the texts       
for (MovieTag object : list) {
    if (object instanceof DefineText2) {
        DefineText2 dt2 = (DefineText2) object;
        for (TextSpan ts : dt2.getSpans()) {
            Integer fontIdentifier = ts.getIdentifier();
            if (fontIdentifier != null) {
                int fontSize = ts.getHeight();
                // Here I try to create an object that should
                // reverse the process done by a TextTable
                ReverseTextTable rtt = 
                  new ReverseTextTable(fonts.get(fontIdentifier), fontSize);
                System.out.println(rtt.charactersForText(ts.getCharacters()));
            }
        }
    }
}

The ReverseTextTable class is as follows:

public final class ReverseTextTable {


    private final transient Map<Character, GlyphIndex> characters;
    private final transient Map<GlyphIndex, Character> glyphs;

    public ReverseTextTable(final DefineFont2 font, final int fontSize) {    
        characters = new LinkedHashMap<>();
        glyphs = new LinkedHashMap<>();

        final List<Integer> codes = font.getCodes();
        final List<Integer> advances = font.getAdvances();
        final float scale = fontSize / EMSQUARE;
        final int count = codes.size();

        for (int i = 0; i < count; i++) {
            characters.put((char) codes.get(i).intValue(), new GlyphIndex(i,
                    (int) (advances.get(i) * scale)));
            glyphs.put(new GlyphIndex(i,
                    (int) (advances.get(i) * scale)), (char) codes.get(i).intValue());
        }
    }    

    //This method should reverse from a list of GlyphIndexes to a String
    public String charactersForText(final List<GlyphIndex> list) {
        String text="";
        for(GlyphIndex gi: list){
            text+=glyphs.get(gi);
        }
        return text;
    }        
}

Unfortunately, the advances list from DefineFont2 is empty, so the ReverseTextTable constructor throws an ArrayIndexOutOfBoundsException.

score 1 · Accepted Answer

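To be honest, I don't know how to do this in Java. I'm not saying it is impossible, and I do believe there is a way. However, you said there are many libraries that can do this, and you also suggested one of them, swftools. So I suggest reusing that library to extract the text from the Flash files. To do so, you can simply use Runtime.exec() to run it from the command line.

Personally, I prefer Apache Commons Exec to the standard library that ships with the JDK. Let me show you how to do it. The executable you should use is "swfstrings.exe". Suppose it is located in "C:\", and that the same folder contains a Flash file, for example page.swf. I then tried the following code (it works fine):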

    Path pathToSwfFile = Paths.get("C:\\", "page.swf");
    CommandLine commandLine = CommandLine.parse("C:\\swfstrings.exe");
    commandLine.addArgument("\"" + pathToSwfFile.toString() + "\"");
    DefaultExecutor executor = new DefaultExecutor();
    executor.setExitValues(new int[]{0, 1}); //Notice that swfstrings.exe returns 1 for success,
                                            //0 for file not found, -1 for error

    ByteArrayOutputStream stdout = new ByteArrayOutputStream();
    PumpStreamHandler psh = new PumpStreamHandler(stdout);
    executor.setStreamHandler(psh);
    int exitValue;
    try{
        exitValue = executor.execute(commandLine);
    }catch(org.apache.commons.exec.ExecuteException ex){
        exitValue = ex.getExitValue(); //keep the failing exit code so isFailure() can check it
        psh.stop();
    }
    if(!executor.isFailure(exitValue)){
       String out = stdout.toString("UTF-8"); // here you have the extracted text
    }

I know this is not exactly the answer you asked for, but it works fine.
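For readers who would rather not add a dependency, the same idea can be sketched with the JDK's own ProcessBuilder. This is only a rough sketch under assumptions: the class name SwfStringsRunner and its helper methods are made up for illustration, the two paths are the placeholders from the answer above, and the tool is assumed to print the extracted text to standard output.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class SwfStringsRunner {

    /** Reads a stream to the end and decodes the bytes as UTF-8. */
    public static String readAll(InputStream in) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Runs an external tool on one file and returns everything it printed. */
    public static String run(String executable, String file)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(executable, file);
        pb.redirectErrorStream(true); // merge stderr into stdout
        Process process = pb.start();
        String output;
        try (InputStream in = process.getInputStream()) {
            output = readAll(in);
        }
        process.waitFor(); // exit code is ignored in this sketch
        return output;
    }

    public static void main(String[] args) throws Exception {
        // Placeholder paths (assumption): adjust to your own setup.
        System.out.println(run("C:\\swfstrings.exe", "C:\\page.swf"));
    }
}
```

Passing the arguments as separate strings avoids manual quoting of paths that contain spaces.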

score 1 · Accepted Answer

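I happen to be decompiling SWF files in Java right now, and I came across this question while working out how to reverse engineer the original text.

After looking at the source code, I realised that it is very simple. Each font has a specified sequence of characters, which can be retrieved by calling DefineFont2.getCodes(), and the glyphIndex is the index of the matching character in DefineFont2.getCodes().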
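The lookup described above can be sketched without the library. In this minimal example the code table stands in for the list returned by DefineFont2.getCodes() and the index list for the values returned by GlyphIndex.getGlyphIndex(); the class name GlyphDecoder and the sample data are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class GlyphDecoder {

    /**
     * Maps each glyph index to the character code stored at that
     * position in the font's code table.
     */
    public static String decode(List<Integer> codes, List<Integer> glyphIndexes) {
        StringBuilder sb = new StringBuilder();
        for (int glyphIndex : glyphIndexes) {
            if (glyphIndex >= 0 && glyphIndex < codes.size()) {
                sb.appendCodePoint(codes.get(glyphIndex));
            } else {
                sb.append('\uFFFD'); // index not covered by this font
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Hypothetical code table: glyph 0 = 'H', 1 = 'e', 2 = 'l', 3 = 'o'
        List<Integer> codes = Arrays.asList(72, 101, 108, 111);
        List<Integer> glyphs = Arrays.asList(0, 1, 2, 2, 3);
        System.out.println(decode(codes, glyphs)); // prints "Hello"
    }
}
```

Note that the advance values play no part in the lookup itself; only the glyph index is needed once the right font is known.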

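However, when a single SWF file uses several fonts, it is hard to match each DefineText with its corresponding DefineFont2, because there is no identifier for the DefineFont2 in each DefineText.

To solve this, I came up with a self-learning algorithm that tries to guess the correct DefineFont2 for each DefineText and so derive the original text correctly.

To reverse engineer the original text, I created a class called FontLearner: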

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

public class FontLearner {

    private final ArrayList<DefineFont2> fonts = new ArrayList<DefineFont2>();
    private final HashMap<Integer, HashMap<Character, Integer>> advancesMap = new HashMap<Integer, HashMap<Character, Integer>>();

    /**
     * The same characters from the same font will have similar advance values.
     * This constant defines the allowed difference between two advance values
     * before they are treated as the same character
     */
    private static final int ADVANCE_THRESHOLD = 10;

    /**
     * Some characters have outlier advance values despite being compared
     * to the same character
     * This constant defines the minimum accuracy level for each String
     * before it is associated with the given font
     */
    private static final double ACCURACY_THRESHOLD = 0.9;

    /**
     * This method adds a DefineFont2 to the learner, and a DefineText
     * associated with the font to teach the learner about the given font.
     * 
     * @param font The font to add to the learner
     * @param text The text associated with the font
     */
    public void addFont(DefineFont2 font, DefineText text) {
        fonts.add(font);
        HashMap<Character, Integer> advances = new HashMap<Character, Integer>();
        advancesMap.put(font.getIdentifier(), advances);

        List<Integer> codes = font.getCodes();

        List<TextSpan> spans = text.getSpans();
        for (TextSpan span : spans) {
            List<GlyphIndex> characters = span.getCharacters();
            for (GlyphIndex character : characters) {
                int glyphIndex = character.getGlyphIndex();
                char c = (char) (int) codes.get(glyphIndex);

                int advance = character.getAdvance();
                advances.put(c, advance);
            }
        }
    }

    /**
     * 
     * @param text The DefineText to retrieve the original String from
     * @return The String retrieved from the given DefineText
     */
    public String getString(DefineText text) {
        StringBuilder sb = new StringBuilder();

        List<TextSpan> spans = text.getSpans();

        DefineFont2 font = null;
        for (DefineFont2 getFont : fonts) {
            List<Integer> codes = getFont.getCodes();
            HashMap<Character, Integer> advances = advancesMap.get(getFont.getIdentifier());
            if (advances == null) {
                advances = new HashMap<Character, Integer>();
                advancesMap.put(getFont.getIdentifier(), advances);
            }

            boolean notFound = true;
            int totalMisses = 0;
            int totalCount = 0;

            for (TextSpan span : spans) {
                List<GlyphIndex> characters = span.getCharacters();
                totalCount += characters.size();

                int misses = 0;
                for (GlyphIndex character : characters) {
                    int glyphIndex = character.getGlyphIndex();
                    if (codes.size() > glyphIndex) {
                        char c = (char) (int) codes.get(glyphIndex);

                        Integer getAdvance = advances.get(c);
                        if (getAdvance != null) {
                            notFound = false;

                            if (Math.abs(character.getAdvance() - getAdvance) > ADVANCE_THRESHOLD) {
                                misses += 1;
                            }
                        }
                    } else {
                        notFound = false;
                        misses = characters.size();

                        break;
                    }
                }

                totalMisses += misses;
            }

            double accuracy = (totalCount - totalMisses) * 1.0 / totalCount;

            if (accuracy > ACCURACY_THRESHOLD && !notFound) {
                font = getFont;

                // teach this DefineText to the FontLearner if there are
                // any new characters
                for (TextSpan span : spans) {
                    List<GlyphIndex> characters = span.getCharacters();
                    for (GlyphIndex character : characters) {
                        int glyphIndex = character.getGlyphIndex();
                        char c = (char) (int) codes.get(glyphIndex);

                        int advance = character.getAdvance();
                        if (advances.get(c) == null) {
                            advances.put(c, advance);
                        }
                    }
                }
                break;
            }
        }

        if (font != null) {
            List<Integer> codes = font.getCodes();

            for (TextSpan span : spans) {
                List<GlyphIndex> characters = span.getCharacters();
                for (GlyphIndex character : characters) {
                    int glyphIndex = character.getGlyphIndex();
                    char c = (char) (int) codes.get(glyphIndex);
                    sb.append(c);
                }
                sb = new StringBuilder(sb.toString().trim());
                sb.append(" ");
            }
        }

        return sb.toString().trim();
    }
}

Usage:

Movie movie = new Movie();
movie.decodeFromStream(response.getEntity().getContent());

FontLearner learner = new FontLearner();
DefineFont2 font = null;

List<MovieTag> objects = movie.getObjects();
for (MovieTag object : objects) {
    if (object instanceof DefineFont2) {
        font = (DefineFont2) object;
    } else if (object instanceof DefineText) {
        DefineText text = (DefineText) object;
        if (font != null) {
            learner.addFont(font, text);
            font = null;
        }
        String line = learner.getString(text); // reverse engineers the line
    }
}

I am happy to say that this approach gave me 100% accuracy when reverse engineering the original strings with StuartMacKay's transform-swf library.
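The core of the matching heuristic in FontLearner is a simple score: how many glyphs have an advance close to the one already learned for the same character. The standalone sketch below isolates that calculation; the class name AdvanceMatcher, the array-based signature, and the sample numbers are hypothetical and not part of the answer's code.

```java
import java.util.HashMap;
import java.util.Map;

public class AdvanceMatcher {

    /**
     * Returns the fraction of observed (character, advance) pairs whose
     * advance lies within `threshold` of the value already learned for
     * that character. Characters not seen before are not counted as misses.
     * Unlike FontLearner, an empty input yields 0.0 rather than NaN.
     */
    public static double accuracy(Map<Character, Integer> learned,
                                  char[] chars, int[] advances, int threshold) {
        if (chars.length == 0) {
            return 0.0;
        }
        int misses = 0;
        for (int i = 0; i < chars.length; i++) {
            Integer known = learned.get(chars[i]);
            if (known != null && Math.abs(advances[i] - known) > threshold) {
                misses++;
            }
        }
        return (chars.length - misses) / (double) chars.length;
    }

    public static void main(String[] args) {
        Map<Character, Integer> learned = new HashMap<>();
        learned.put('a', 200);
        learned.put('b', 220);
        // 'a' is within 10 of its learned advance, 'b' is not: one miss out of two
        System.out.println(accuracy(learned, new char[]{'a', 'b'}, new int[]{205, 300}, 10)); // prints 0.5
    }
}
```

A font would then be accepted for a DefineText only when this score exceeds ACCURACY_THRESHOLD and at least one character was already known, as in getString above.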

score 0 · Accepted Answer

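What you are trying to achieve seems difficult. You are trying to decompile a compiled file, and I am sorry to say that this is not possible. What I suggest is converting it into bitmaps (if possible), or by some other means, and then using OCR to read the characters.

There is software that can do this, and you could also look at some forums on the subject, because working with an SWF once it has been compiled is very difficult (as far as I know, impossible). If you like, you can check this decompiler, or try another language, such as the project here.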

score 0 · Accepted Answer

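I ran into a similar problem with long strings when using the transform-swf library.

I got the source code and debugged it.
I believe there is a small bug in the class com.flagstone.transform.coder.SWFDecoder.

On line 540 (for version 3.0.2), change

dest += length;

to

dest += count;

That should do it for you (it concerns string extraction). I have also notified Stuart. The problem only occurs when your strings are very large.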
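The answer does not show the surrounding code, so the following is only an illustration of the class of bug it describes, not the SWFDecoder source. When a large buffer is filled in several chunks, the write offset has to advance by the number of bytes actually copied in each step; advancing by the total length goes wrong as soon as more than one chunk is needed. The class name ChunkedReader and its parameters are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ChunkedReader {

    /**
     * Fills a buffer of `length` bytes, reading at most `chunkSize` bytes
     * per call. `chunkSize` must be greater than zero.
     */
    public static byte[] readFully(InputStream in, int length, int chunkSize) {
        byte[] target = new byte[length];
        int dest = 0;
        try {
            while (dest < length) {
                int wanted = Math.min(chunkSize, length - dest);
                int count = in.read(target, dest, wanted);
                if (count == -1) {
                    throw new EOFException("stream ended after " + dest + " bytes");
                }
                // Advance by the bytes actually read in this step,
                // not by the total length requested.
                dest += count;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return target;
    }

    public static void main(String[] args) {
        byte[] data = "hello world".getBytes(StandardCharsets.US_ASCII);
        byte[] copy = readFully(new ByteArrayInputStream(data), data.length, 4);
        System.out.println(new String(copy, StandardCharsets.US_ASCII)); // prints "hello world"
    }
}
```

With short strings a single chunk covers everything, which would explain why such a bug only shows up on very large strings.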

score 0 · Accepted Answer

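I know this is not what you asked, but I recently needed to extract text from an SWF using Java and found the ffdec library to be much better than transform-swf.

If anyone needs sample code, please comment.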
