java - Representing a text file as a single unit in Java and matching a string in the text

Question

How can I represent a text file (or XML file) as one whole string and search for (or match) a particular string in it?

I have created a BufferedReader object:

BufferedReader input =  new BufferedReader(new FileReader(aFile));

Then I tried using the Scanner class and its option for specifying different delimiters, like this:

//Scanner scantext = new Scanner(input);
//Scanner scantext = new Scanner(input).useDelimiter("");
Scanner scantext = new Scanner(input).useDelimiter("\n");
while (scantext.hasNext()) {  ... }

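Using the Scanner class like this, I can read the text line by line or word by word, but that does not help me, because the text I want to process sometimes contains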

</review><review>

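What I want to say is: if you find "<review>" anywhere in the text, do something with the following lines (or piece of text) until you find "</review>". The problem is that <review> and </review> appear at different places in the text and are sometimes stuck to other text (so using whitespace as a delimiter does not help me).

I have thought that I might use the regular expression API in Java (the Pattern and Matcher classes), but they seem to match a particular string or line, and I want to have the text as one continuous string (at least this is my impression from what I have read about them). Can you tell me which structure/method/class I should use in this case? Thank you.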

score 3 · Accepted Answer

Don't try to parse XML with regular expressions; it only leads to pain. There are already lots of ~~very nice~~ existing XML APIs in Java; why reinvent them?

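Anyway, to search for a string in a text file, you should:

1. Load the file as a String
2. Create a Pattern to search with
3. Use a Matcher to iterate through any matches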
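
The three steps above can be sketched roughly as follows. This is only an illustration, not code from the answer: the class name ReviewSearch, the helper names, the sample text and the UTF-8 encoding are all assumptions. Pattern.DOTALL is what lets a match run across line breaks, which addresses the worry that Pattern and Matcher only work on a single line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
final class ReviewSearch {
    // Step 2: DOTALL lets '.' match line breaks, so a block may span lines.
    // The reluctant '.*?' stops at the first closing tag it meets.
    private static final Pattern REVIEW =
            Pattern.compile("<review>(.*?)</review>", Pattern.DOTALL);

    /** Step 1: load the whole file into one String (UTF-8 is an assumption). */
    static String load(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    /** Step 3: collect the text between each <review> and </review> pair. */
    static List<String> findReviews(String text) {
        List<String> bodies = new ArrayList<>();
        Matcher m = REVIEW.matcher(text);
        while (m.find()) {
            bodies.add(m.group(1));
        }
        return bodies;
    }

    public static void main(String[] args) throws IOException {
        // With no file argument, fall back to a small in-memory sample.
        String text = args.length > 0
                ? load(Paths.get(args[0]))
                : "intro<review>first\nline</review><review>second</review>tail";
        for (String body : findReviews(text)) {
            System.out.println("[" + body + "]");
        }
    }
}
```

Note that this treats the file as plain text. It does not understand nested tags, attributes, comments or CDATA sections, which is why the XML APIs are the safer choice for real XML.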

score 1 · Accepted Answer

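It looks to me as though you are trying to work with a structured XML file, so I would suggest looking into javax.xml.parsers.DocumentBuilder or one of the other built-in APIs for parsing the document.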
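
As a rough sketch of what the DocumentBuilder route might look like: the class name DomReviews, the method name and the sample document (including its reviews root element) are made up for illustration, and the input is assumed to be well-formed XML with a single root element.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named here only for illustration.
final class DomReviews {
    /** Parses the XML into a DOM tree and returns the text of every review element. */
    static List<String> reviewTexts(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // parse(...) also accepts a java.io.File if the XML lives on disk.
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("review");
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (Exception e) {
            // Wrapped so callers do not have to deal with checked exceptions.
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<reviews><review>first</review><review>second</review></reviews>";
        for (String text : reviewTexts(xml)) {
            System.out.println(text);
        }
    }
}
```

DOM loads the whole document into memory, which is convenient for small files but may be heavy for very large ones.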

score 1 · Accepted Answer


Use an XML parser.

Or use XPath, as in this example.
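
The example this answer linked to is not preserved here, so the following is a stand-in sketch using the JDK's javax.xml.xpath package rather than a reproduction of it. The class name XPathReviews, the //review expression and the sample document are assumptions.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper class, named here only for illustration.
final class XPathReviews {
    /** Evaluates //review against the XML and returns the text of each match. */
    static List<String> reviewTexts(String xml) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            // "//review" selects every review element at any depth.
            NodeList nodes = (NodeList) xpath.evaluate(
                    "//review",
                    new InputSource(new StringReader(xml)),
                    XPathConstants.NODESET);
            List<String> texts = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                texts.add(nodes.item(i).getTextContent());
            }
            return texts;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<reviews><review>first</review><review>second</review></reviews>";
        for (String text : reviewTexts(xml)) {
            System.out.println(text);
        }
    }
}
```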

answered 2009-05-04T19:40:40.793

score 1 · Accepted Answer

I have thought that I might use the regular expression API in Java (the Pattern and Matcher classes), but they seem to match a particular string or line, and I want to have the text as one continuous string

Um, does something prevent you from reading the XML file into a String, and then operating on that, using the regular expression API?

You can easily read a file into a String using e.g. FileUtils from Apache Commons IO: see readFileToString(File file, String encoding).

score 1 · Accepted Answer

I would also recommend using an XML parsing API. But since you only want to do something when you hit a "review" tag, SAX may suit you better than DOM.
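
A minimal sketch of the SAX approach, assuming well-formed input and review elements that are not nested inside each other. The class name SaxReviews and the sample document are invented for illustration. The handler only buffers text while it is inside a review element, so nothing else in the file has to be kept in memory.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, named here only for illustration.
final class SaxReviews {
    /** Streams through the XML and returns the text inside each review element. */
    static List<String> reviewTexts(String xml) {
        final List<String> texts = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            // Non-null only while the parser is inside a review element.
            private StringBuilder current;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes attrs) {
                if ("review".equals(qName)) {
                    current = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // May be called several times for one run of text.
                if (current != null) {
                    current.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("review".equals(qName) && current != null) {
                    texts.add(current.toString());
                    current = null;
                }
            }
        };
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
        return texts;
    }

    public static void main(String[] args) {
        String xml = "<reviews>skip<review>first</review><review>second</review></reviews>";
        for (String text : reviewTexts(xml)) {
            System.out.println(text);
        }
    }
}
```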

score 0 · Accepted Answer

I think we can copy each line of the text file into a string and then try to match a substring (the search string) against that string (the line).

But an error occurs when the search string contains metacharacters such as / or #, etc.
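
One possible way around the metacharacter problem is to quote the search string so the regex engine treats every character literally. The sketch below is an assumption about what would help here, not something the answer proposed; the class and method names are hypothetical. For what it is worth, / and # are not special in java.util.regex by default, but characters such as ( [ . * + ? are, and an unbalanced ( makes Pattern.compile throw a PatternSyntaxException.

```java
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
final class LiteralSearch {
    /** Returns true if the line contains the search string, taken literally. */
    static boolean lineContains(String line, String search) {
        // Pattern.quote wraps the text so no character in it acts as a metacharacter.
        Pattern literal = Pattern.compile(Pattern.quote(search));
        return literal.matcher(line).find();
    }

    public static void main(String[] args) {
        String line = "price (USD) is $5.00 at path/to#x";
        System.out.println(lineContains(line, "(USD"));
        System.out.println(lineContains(line, "/to#"));
        System.out.println(lineContains(line, "5x00"));
    }
}
```

If all that is needed is a yes/no answer for a literal substring, String.contains does the same job without any regex.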
