
Is there any way in Java to dynamically identify the format of a phrase and find other words or word groups that match that format?

For example:

workExperience:
some text

educationalQualification:
some text

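Suppose workExperience is the first line of the text file. The Java program should scan this line and extract its format as xY, where x is an all-lowercase word and Y is a word whose first letter is uppercase. Using this format, it should then identify educationalQualification as a match. This has to be dynamic, because the format may vary from file to file. For example, the first line of another file might be Work Experience.

What we do now is maintain a set of possible format templates and try to match lines against them. Is there any other way to do this?

I am not looking for text matching here. I want to determine the format of the first line, workExperience, and match every other word group in the file that has the same format; in this example it should find educationalQualification.

The possible formats could be: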

Work Experience
workExperience
WORK EXPERIENCE
work Experience etc

2 Answers


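If you only care about combinations of lowercase/uppercase letters and whitespace, you can build a regular expression dynamically from the first line of the input. Guava's CharMatcher is a good fit for this.

Something like this: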

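import com.google.common.base.CharMatcher;
import java.util.LinkedHashMap;
import java.util.Map;

// Builds a regex describing the "shape" of fromString: each run of lowercase
// letters, uppercase letters, or whitespace becomes one character class,
// followed by '+' when the run is longer than one character.
// Characters that fall into none of the three classes (e.g. ':') are skipped.
String getPattern(String fromString) {
    // LinkedHashMap keeps the iteration order of the matchers predictable.
    Map<CharMatcher, String> charToRegex = new LinkedHashMap<>();
    charToRegex.put(CharMatcher.javaLowerCase(), "[a-z]");
    charToRegex.put(CharMatcher.javaUpperCase(), "[A-Z]");
    charToRegex.put(CharMatcher.whitespace(), "\\s");

    StringBuilder pattern = new StringBuilder();
    String lastRegexPart = "";

    for (int i = 0; i < fromString.length(); i++) {
        for (CharMatcher matcher : charToRegex.keySet()) {
            if (matcher.matches(fromString.charAt(i))) {
                String regexPart = charToRegex.get(matcher);
                if (lastRegexPart.equals(regexPart)) {
                    // Same class as the previous character: add a single '+'.
                    if (pattern.lastIndexOf("+") != pattern.length() - 1) {
                        pattern.append("+");
                    }
                } else {
                    // A new class starts here.
                    pattern.append(regexPart);
                    lastRegexPart = regexPart;
                }
            }
        }
    }
    return pattern.toString();
}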

It seems to work quite well:

getPattern("workExperience"); // returns [a-z]+[A-Z][a-z]+
getPattern("Work Experience"); // returns [A-Z][a-z]+\s[A-Z][a-z]+

Even if your requirements are somewhat more complex, I think you can fine-tune this algorithm to fit your needs.

answered 2013-10-01T13:10:41.530
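To show how the generated pattern could then be applied to the rest of the file, here is a minimal sketch of the same idea using only the standard library. The class name FormatMatcher and the helpers regexClassOf and matchingLines are hypothetical names chosen for illustration. It also makes two assumptions that go beyond the code above: only ASCII letters, spaces and tabs are classified, and any other character (such as the trailing ':' in the sample file) is matched literally rather than skipped.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class FormatMatcher {

    // Returns the regex class for one character, or null if the character
    // is not an ASCII letter, a space, or a tab.
    public static String regexClassOf(char c) {
        if (c >= 'a' && c <= 'z') return "[a-z]";
        if (c >= 'A' && c <= 'Z') return "[A-Z]";
        if (c == ' ' || c == '\t') return "\\s";
        return null;
    }

    // Collapses each run of same-class characters into "class" or "class+".
    public static String getPattern(String fromString) {
        StringBuilder pattern = new StringBuilder();
        String lastPart = null;
        boolean repeated = false;
        for (char c : fromString.toCharArray()) {
            String part = regexClassOf(c);
            if (part == null) {
                // Assumption: unclassified characters must match literally.
                pattern.append(Pattern.quote(String.valueOf(c)));
                lastPart = null;
            } else if (part.equals(lastPart)) {
                if (!repeated) {
                    pattern.append('+');
                    repeated = true;
                }
            } else {
                pattern.append(part);
                lastPart = part;
                repeated = false;
            }
        }
        return pattern.toString();
    }

    // Uses the first line as the format template and returns every later
    // line that matches that format in full.
    public static List<String> matchingLines(List<String> lines) {
        List<String> matches = new ArrayList<>();
        if (lines.isEmpty()) {
            return matches;
        }
        Pattern format = Pattern.compile(getPattern(lines.get(0)));
        for (String line : lines.subList(1, lines.size())) {
            if (format.matcher(line).matches()) {
                matches.add(line);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "workExperience:", "some text", "",
                "educationalQualification:", "some text");
        System.out.println(matchingLines(lines)); // prints [educationalQualification:]
    }
}
```

Because the whole line has to match, ordinary text such as "some text" is rejected. A heading with a different number of words would be rejected as well, so the pattern may need loosening for real files.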

Assuming you want to match any variation of work experience:, you could try lowercasing the line and removing all spaces, i.e.

"work experience:".toLowerCase().replaceAll(" ","").equals("workexperience:");
"Work Experience:".toLowerCase().replaceAll(" ","").equals("workexperience:");   
"workExperience:".toLowerCase().replaceAll(" ","").equals("workexperience:");   
"workexperience:".toLowerCase().replaceAll(" ","").equals("workexperience:");   
" work   experience   :".toLowerCase().replaceAll(" ","").equals("workexperience:");   

These all return true.

Alternatively, use equalsIgnoreCase():

"work experience:".replaceAll(" ","").equalsIgnoreCase( "workexperience:");

Edit: switching the arguments makes it more readable:

"workexperience:".equalsIgnoreCase( "work experience:".replaceAll(" ",""));
"workexperience:".equalsIgnoreCase( "workExperience:".replaceAll(" ",""));
"workexperience:".equalsIgnoreCase( "Work Experience:".replaceAll(" ",""));
"workexperience:".equalsIgnoreCase( "WorkExperience:".replaceAll(" ",""));
"workexperience:".equalsIgnoreCase( "   work experience    :".replaceAll(" ",""));
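
The comparisons above could be wrapped in a small helper so the normalization lives in one place. This is only a sketch: HeadingNormalizer, normalize and sameHeading are hypothetical names, and it removes all whitespace with "\\s+" (an assumption, so that tabs are handled too) instead of only the space character.

```java
import java.util.Locale;

public class HeadingNormalizer {

    // Removes every whitespace character and lowercases the result.
    public static String normalize(String heading) {
        return heading.replaceAll("\\s+", "").toLowerCase(Locale.ROOT);
    }

    // True when two headings differ only in letter case and whitespace.
    public static boolean sameHeading(String a, String b) {
        return normalize(a).equals(normalize(b));
    }

    public static void main(String[] args) {
        System.out.println(sameHeading("Work Experience:", "workExperience:"));          // prints true
        System.out.println(sameHeading(" work\texperience :", "WORK EXPERIENCE:"));      // prints true
        System.out.println(sameHeading("workExperience:", "educationalQualification:")); // prints false
    }
}
```

Note that this compares the text of a known heading. It does not detect other headings that merely share the same format, such as educationalQualification.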
answered 2013-10-01T12:51:38.993