There are two issues here.
- In regex, the . character is special: it means "match any single character" (by default, anything except a newline). You are using it to match a literal period. It does match a period, but it also matches every other character. To match only a period, use the escaped pattern \. instead, and to accept either a period or a hyphen, use the character class [-.] in its place.
- You are using \b at the end of your pattern to match a word boundary, but \b is defined as the boundary between a word character and a non-word character. Periods and spaces are both non-word characters, so there is no boundary between the period and the space that follows it, and Python won't find a match. Instead, you could use a lookahead assertion, which checks that the next character is whatever you require without consuming it.
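Both points can be checked with a short sketch. The sample strings and variable names here are illustrative assumptions, not taken from the question:

```python
import re

sample = "in. in- ind"  # illustrative string, not from the question

any_char = re.findall(r"in.", sample)      # unescaped dot: any character
literal = re.findall(r"in\.", sample)      # escaped dot: a period only
either = re.findall(r"in[-.]", sample)     # class: a period or a hyphen
print(any_char, literal, either)

# \b needs a word character on exactly one side of the position.
no_boundary = re.search(r"in\.\b", "rain in. Spain")  # "." then " ": no boundary
boundary = re.search(r"in\.\b", "rain in.Spain")      # "." then "S": boundary
print(no_boundary, boundary)
```

The second search only succeeds because a letter directly follows the period, which is rarely what you want in ordinary text.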
Now, to match a whole word - any word - you can use \w+, which matches one or more word characters.
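As a quick illustration (the input strings are made up for this example), note that "word character" covers letters, digits, and the underscore, so punctuation and spaces split the text into separate matches:

```python
import re

# Periods, hyphens, and spaces are not word characters, so they end a match.
words = re.findall(r"\w+", "The rain in. Spain-")
print(words)

# Digits and underscores do count as word characters.
mixed = re.findall(r"\w+", "snake_case 42 x-y")
print(mixed)
```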
Also, it is quite possible that there won't be a match at all, in which case re.search returns None, so you should check whether a match occurred using an if statement or a try statement. Putting it all together:
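import re

txt = "The indian in. Spain."
pattern = r"\w+[-.]"
# \b anchors the start of the word; (?=\W) requires a non-word character next
x = re.search(r"\b" + pattern + r"(?=\W)", txt)
if x:  # re.search returns None when nothing matches
    print(x.start(), x.end())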
Edit
There is one problem with the lookahead assertion above: it won't match at the end of the string. This means that if your text is "The rain in Spain." then it won't match "Spain." because no non-word character follows the final period.
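A minimal sketch of that failure, using two made-up inputs: the positive lookahead needs an actual non-word character to look at, and the end of the string does not provide one.

```python
import re

pattern = r"\b\w+[-.](?=\W)"

# A space follows the period, so the lookahead succeeds.
mid = re.search(pattern, "in. Spain")
print(mid.group() if mid else None)

# Nothing follows the final period, so the lookahead fails.
end = re.search(pattern, "The rain in Spain.")
print(end)
```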
To fix this, you can use a negative lookahead assertion, which succeeds when the text that follows does not match the given pattern, and likewise does not consume any of the string.
x = re.search(r"\b" + pattern + r"(?!\w)", txt)
This will match when whatever follows the period or hyphen is not a word character, which includes the end of the string.
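To see the difference, here is a small sketch that collects every match rather than only the first. Using re.finditer is my own choice for the illustration, and the sample text is an assumption:

```python
import re

pattern = r"\b\w+[-.](?!\w)"
text = "The rain in. Spain."

# Collect each matched word with its start and end offsets.
spans = [(m.group(), m.start(), m.end()) for m in re.finditer(pattern, text)]
print(spans)

# A period followed directly by a word character is still rejected.
inner = re.search(pattern, "x.y")
print(inner)
```

The final word is now found even though the string ends right after its period.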