I wrote this code http://paste.ubuntu.com/5730390/ and I'm trying to extract titles that contain 3 or more a's (uppercase or lowercase), counting α's (the Greek letter) as well, from some websites. I have already stored the websites' content on a local hard disk in txt format (there is a large number of websites).
My input in dfs is like: site_1.txt, site_2.txt, site_3.txt etc.
Suppose that the titles below belong to site_1.txt, site_2.txt and site_3.txt respectively.
Academia.edu - Share research
Google
News12.gr | Αθλητική Ενημέρωση από τα Δωδεκάνησα
Now I want the output to contain titles 1 and 3 (title 3 because it is from a Greek website and contains the letter "α"), in a form like:
Academia.edu - Share research, site_1.txt
News12.gr | Αθλητική Ενημέρωση από τα Δωδεκάνησα, site_3.txt
I tried a regex pattern like "?:[αa{3,}]).(?:[αa{3}]).", but I get no results. Could anyone help with that?
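To make the expected behaviour concrete, here is a minimal Python sketch of the matching I am after, independent of the linked code. The names `A_LIKE` and `has_enough_a` are made up for illustration, the three titles are hard-coded instead of being read from the files, and counting the accented ά/Ά as an "a" is an assumption on my part.

```python
import re

# One "a"-like letter: Latin a/A or Greek α/Α.
# Including the accented ά/Ά is an assumption, drop them if they should not count.
A_LIKE = re.compile(r"[aAαΑάΆ]")

def has_enough_a(title, minimum=3):
    """Return True if the title contains at least `minimum` a-like letters anywhere."""
    return len(A_LIKE.findall(title)) >= minimum

# Hard-coded sample data; in practice each title would come from its site_N.txt file.
titles = {
    "site_1.txt": "Academia.edu - Share research",
    "site_2.txt": "Google",
    "site_3.txt": "News12.gr | Αθλητική Ενημέρωση από τα Δωδεκάνησα",
}

# Keep only the matching titles, formatted as "title, filename".
matches = [f"{title}, {name}" for name, title in titles.items() if has_enough_a(title)]

for line in matches:
    print(line)
```

With the three sample titles, this keeps the titles of site_1.txt and site_3.txt and skips "Google". The letters do not have to be next to each other, which is probably where my pattern goes wrong: inside a character class, `{3,}` is treated as literal characters and not as a repeat count.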
Thanks in advance!