I'm trying to match a regular expression to textbook definitions that I get from a website. The definition always has the word with a new line followed by the definition. For example:
Zither
Definition: An instrument of music used in Austria and Germany It has from thirty to forty wires strung across a shallow sounding board which lies horizontally on a table before the performer who uses both hands in playing on it Not to be confounded with the old lute shaped cittern or cithern
In my attempts to get just the word (in this case "Zither") I keep getting the newline character.
I tried both ^(\w+)\s
and ^(\S+)\s
without much luck. I thought that maybe ^(\S+)$
would work, but that doesn't seem to successfully match the word at all. I've been testing with rubular, http://rubular.com/r/LPEHCnS0ri; which seems to successfully match all my attempts the way I want, despite the fact that Java doesn't.
Here's my snippet
String str = ...; //Here the string is assigned a word and definition taken from the internet like given in the example above.
Pattern rgx = Pattern.compile("^(\\S+)$");
Matcher mtch = rgx.matcher(str);
if (mtch.find()) {
String result = mtch.group();
terms.add(new SearchTerm(result, System.nanoTime()));
}
This is easily solved by triming the resulting string, but that seems like it should be unnecessary if I'm already using a regular expression.
All help is greatly appreciated. Thanks in advance!