$ more xtract_emails.sh
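#!/bin/ksh
# Usage: xtract_emails.sh <file>
# 1. Put every comma- or space-separated token on its own line
#    (\n in the replacement relies on GNU sed).
# 2. Keep tokens containing '@', strip <>(); and "mailto:", de-duplicate.
sed -e 's/,/\n/g' -e 's/ /\n/g' "$1" | \
grep '@' | \
sed -e 's/[<>();]//g' -e 's/mailto://g' \
| sort -u > "${1}.extracted.txt"
# Report how many unique addresses were written.
wc -l "${1}.extracted.txt"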
Example:
$ ./xtract_emails.sh emails_unformatted.txt
131 emails_unformatted.txt.extracted.txt
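To see what each stage does, the same pipeline can be tried on a one-line sample. This is only a sketch: the sample text and the function name extract_emails are made up for illustration, and tr stands in for the first sed so the split does not depend on GNU sed's handling of \n.

```shell
#!/bin/sh
# Hypothetical helper: the same stages as the script above, reading stdin.
extract_emails() {
    tr ', ' '\n\n' |                            # one token per line
    grep '@' |                                  # keep tokens that contain @
    sed -e 's/[<>();]//g' -e 's/mailto://g' |   # strip wrappers and mailto:
    sort -u                                     # sort and de-duplicate
}

# Made-up sample input with a duplicate address and typical clutter.
printf '%s\n' 'Bob <bob@example.com>, mailto:amy@example.org; bob@example.com' |
    extract_emails
```

For this sample it prints amy@example.org and bob@example.com, one per line; the duplicate and the surrounding punctuation are gone.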
I hope this is useful to anyone who needs to extract email addresses in a single shot. Otherwise, it takes a lot of time to make several passes, examining the post-processed output each time. Even with the heuristic rules above, the output may not consist entirely of valid email addresses, so some proofreading is still needed.
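The proofreading can be narrowed down by listing only the entries that do not look like a plain address. A rough sketch, assuming a deliberately loose local@domain.tld pattern (not a full RFC 5322 check) and a hypothetical function name, flag_suspicious:

```shell
#!/bin/sh
# Hypothetical helper: print the lines of an extracted list that do NOT
# match a loose local@domain.tld shape, so only those need manual review.
flag_suspicious() {
    grep -Ev '^[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+$' "$1"
}

# Example call on the file produced by the script:
#   flag_suspicious emails_unformatted.txt.extracted.txt
```

The pattern is intentionally permissive; it may still pass some malformed entries and flag a few unusual but valid ones, so it reduces the manual check rather than replacing it.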
If this is useful to you, please leave a comment here.