You should modify your SQL query to preserve any words that don't match a term in your lexicon (perhaps by using an outer join; if you show us that query we could give you more specific advice). Then, assuming your output looks something like this (with just a / following each term that didn't match your lexicon):
The/article big/adjective red/ ball/noun fell/.
You could clean it up with sed like this (assuming the string has been saved in a variable called $variablename):
sed 's_\/\([ .]\)_\1_g' <(echo "$variablename")
Explanation:
I used _ instead of / to delimit my s command for legibility. The syntax s/search/replace/g is synonymous with s_search_replace_g.
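As a quick check of the delimiter point above, here is a minimal sketch that runs the same substitution with both delimiters. The sample string and variable names are made up for illustration, and the pattern is a simplified one (slash followed by a space only), not the full command from this answer.

```shell
# Hypothetical sample text; any string containing slashes would do.
sample='red/ ball/noun'

# Slash as delimiter: the slash in the pattern has to be escaped as \/
with_slash=$(printf '%s\n' "$sample" | sed 's/\/ / /g')

# Underscore as delimiter: the slash in the pattern needs no escaping
with_underscore=$(printf '%s\n' "$sample" | sed 's_/ _ _g')

printf '%s\n%s\n' "$with_slash" "$with_underscore"
```

Both lines printed should be identical, since only the delimiter differs between the two commands.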
\/\([ .]\) tells sed to match a literal / (escaped as \/, although the escape is not strictly needed once _ is the delimiter) followed by either a space or a period ([ .]). Only the space or period is stored for later reference, because the \( and \) surround just that part of the pattern.
\1 in the replacement is a backreference to the portion captured by \( and \). It works like a variable holding whatever the parenthesized part of the search pattern matched. In effect, I've told sed to strip out any forward slash that is followed by a space or a period, without stripping away the space or period itself.
Output:
The/article big/adjective red ball/noun fell.
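To see what the backreference contributes, the sketch below runs the substitution on the sample string from above twice: once with an empty replacement and once with \1. The variable names are illustrative only, and printf with a pipe is used in place of the process substitution so that it also runs in shells other than bash.

```shell
# Sample input from the answer: a bare / follows each unmatched term.
tagged='The/article big/adjective red/ ball/noun fell/.'

# Empty replacement: the space or period is deleted along with the slash.
without_ref=$(printf '%s\n' "$tagged" | sed 's_/[ .]__g')

# \1 writes the captured space or period back, so only the slash is removed.
with_ref=$(printf '%s\n' "$tagged" | sed 's_/\([ .]\)_\1_g')

printf '%s\n%s\n' "$without_ref" "$with_ref"
```

The first result runs "red" and "ball" together and loses the final period, which is exactly the damage the capture group avoids.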