
Sunday, March 29, 2009

yahoo2mbox.pl error: Unexpected title page format

The problem


c:\emails\yahoo2mbox-0.24>perl yahoo2mbox.pl --user=pooja13pandey --pass=<password>
1 --delay=5 desimasala
Logging in as pooja13pandey... ok.
Getting number of messages in group desimasala...
Unexpected title page format (DesiMasala).

The solution


This simply means that the user with which we are trying to archive the Yahoo group's messages locally is not yet a member of the group. So I made the pooja13pandey user a member of the desimasala Yahoo group and tried again:
c:\emails\yahoo2mbox-0.24>perl yahoo2mbox.pl --user=pooja13pandey --pass=<password>
1 --delay=5 desimasala
Logging in as pooja13pandey... ok.
Getting number of messages in group desimasala...
Retrieving messages 1..12389:
Endless redirect loop detected while retrieving message 1.
This error is often due to using incorrect case in the group name.
Saved 0 message(s) in desimasala.

Hmm, so there is still a problem. Although there is a Yahoo group called Desimasala, I'm not able to log in to it through the yahoo2mbox.pl script. On closer look, and as the error message hints, the group name is case-sensitive: it needs to be DesiMasala, not Desimasala, desiMasala or desimasala.

So, let's make that correction and try again:
c:\emails\yahoo2mbox-0.24>perl yahoo2mbox.pl --user=pooja13pandey --pass=<password>
1 --delay=5 DesiMasala
Will resume at message 1
Logging in as pooja13pandey... ok.
Getting number of messages in group DesiMasala...
Retrieving messages 1..12389:

And, bingo! It worked.

Saturday, March 28, 2009

quick script to extract email ids from a detailed mail header

Since my creative juices are flowing so much today, I thought I'd post a simple yet effective awk script to extract senders' email ids from a detailed mail message (even if the email id is encrypted).

A word of caution: the file being processed must contain a line with just the value "EOF" (without the quotes), and ONLY at the end of the file; otherwise, the awk script will hang.

BEGIN { print FILENAME | "wc -l |cut -f1 -d' '" ; }
/^X-Sender:/ || /^From / { #print NR, $0;
#
# get the real userid from where the email came
#
m=split($0,b,"@");
from=b[1];
#print from;
x=split(from,c," ");
realfrom=c[x];
gsub("<","",realfrom);
#print realfrom;
#
# get the domain name of the smtp server now
#
while ($0 !~ /HELO/ && $0 != "EOF") getline;
if ($0 == "EOF") exit;
domain=$5; gsub(")","",domain); n=split(domain,a,".")
#print n, domain;
realdomain=a[n-1]"."a[n];
if (n>1) print realfrom"@"realdomain;
next;
}

How to uninstall a CPAN module?

The CPAN module does not currently have an option to uninstall a Perl module. This can be especially frustrating for newbies (as it was for me). After struggling quite a bit on Google, I found this solution. For this, you need the CPANPLUS module, which first needs to be installed through CPAN.
C:\Users\Pooja Verma>perl -MCPAN -e shell

cpan> install CPANPLUS

cpan> exit

Unlike CPAN, you cannot invoke CPANPLUS by typing cpanplus at the command prompt, even if it's installed; you have to start its shell through perl -MCPANPLUS -e shell instead:
C:\Users\Pooja Verma>cpanplus
'cpanplus' is not recognized as an internal or external command,
operable program or batch file.

C:\Users\Pooja Verma>perl -MCPANPLUS -e shell
CPANPLUS::Shell::Default -- CPAN exploration and module installation (v0.84)
*** Please report bugs to <bug-cpanplus@rt.cpan.org>.
*** Using CPANPLUS::Backend v0.84.  ReadLine support disabled.

*** Type 'p' now to show start up log

Did you know...
    The documentation in CPANPLUS::Module and CPANPLUS::Backend is very useful
CPAN Terminal> help

[General]
    h | ?                  # display help
    q                      # exit
    v                      # version information
[Search]
    a AUTHOR ...           # search by author(s)
    m MODULE ...           # search by module(s)
    f MODULE ...           # list all releases of a module
    o [ MODULE ... ]       # list installed module(s) that aren't up to date
    w                      # display the result of your last search again
[Operations]
    i MODULE | NUMBER ...  # install module(s), by name or by search number
    i URI | ...            # install module(s), by URI (ie http://foo.com/X.tgz)
    t MODULE | NUMBER ...  # test module(s), by name or by search number
    u MODULE | NUMBER ...  # uninstall module(s), by name or by search number
    d MODULE | NUMBER ...  # download module(s)
    l MODULE | NUMBER ...  # display detailed information about module(s)
    r MODULE | NUMBER ...  # display README files of module(s)
    c MODULE | NUMBER ...  # check for module report(s) from cpan-testers
    z MODULE | NUMBER ...  # extract module(s) and open command prompt in it
[Local Administration]
    b                      # write a bundle file for your configuration
    s program [OPT VALUE]  # set program locations for this session
    s conf    [OPT VALUE]  # set config options for this session
    s mirrors              # show currently selected mirrors

Now you can uninstall the module you want. If needed, you can add the --force and --verbose options:
CPAN Terminal> u MIME::Head --force --verbose

How to scrape emails for online marketing? -- for free

Prologue - Is harmless marketing always Spam?



Ok, so I had started a website and wanted to promote it. At some point, you might be in the same boat too.

Unfortunately, I did not have a whole lot of contacts to send emails to. Well, the website was not really spammy (it was actually a community website for open source audio resources), and I was not going to solicit money, so I did not have many qualms about telling people about it. If they wanted to listen to some audio, they could stick around; otherwise, they were free to do whatever they pleased.

I knew some owners of Yahoo groups that had large audiences. I thought it might be worth a shot to explain the website and ask whether they would be willing to share emails for targeted promotion. The results were quite disappointing, and somewhat expected.
After talking to a few of them, my general observation was that most group owners think that as long as they don't give out member emails to others, those emails will always be protected from spam. They tend to think of other groups/email distributions as spam! (As if those people WON'T get any other SPAM in the future at all, huh?)

Whatever, I thought. As long as my intentions were pure, I did not need to have second thoughts.

So what are my options?



Hmph! What a bummer. How long could I depend on Google to make the website more popular? It was basically a deadlock.

So one fine day, I started thinking about how ELSE I could approach this problem. For example, could I not use a Unix utility like wget, or a Perl script, to simulate web clicks, or use some robot utility to do the clicking, web navigation and downloading?

Then another thought was: could I not get email ids from specific Yahoo groups or Google groups that I was subscribed to, and whose target audience might be interested in knowing about the website? After further digging on Google, I found that there IS a Perl module called WWW::Yahoo::Groups that can programmatically fetch group messages to a local drive.

WWW::Yahoo::Groups -- Easier said than done..


I tried installing WWW::Yahoo::Groups on a Unix box I had access to, but it was easier said than done. It had a ton of dependent modules that wouldn't install properly (even with the force option); I specifically had a lot of trouble with the Crypt::SSLeay module.

So what next?


This failure forced me to look for other options. And boy, there are quite a few of them available. fetchyahoo is one, then you have yahoo2mbox.
Now, an interesting point about yahoo2mbox is that there are a couple of places where you can find it. Apparently, TT Solutions has its own version at http://www.tt-solutions.com/en/Products/yahoo2mbox, while there is a Debian version available at http://packages.debian.org/lenny/yahoo2mbox. I had problems with the Debian version on Ubuntu and tried the one from TT Solutions, which worked better.

So while fetchyahoo can fetch the contents of a single Yahoo user's mailbox, yahoo2mbox can do the same for a Yahoo group.

Fine, I thought, let's go for the jugular and get yahoo2mbox to work. The results were inconsistent: only if you are a moderator of the group, or if the group has very loose security settings that do not mask email ids, can you see the email ids in the downloaded messages. Also, you get weird messages/errors if the group home page has photos and text (I got some tag-related errors with a Yahoo group called FHRS_USA).

Email masking on Yahoogroups and Google groups..


Anyway, I realized that both Yahoo groups and Google groups had elaborate email masking turned on. True, without it, the online world would NOT be a safe place to be in. Email harvesters would retrieve emails left, right and center and make big bucks.

This is an example from yahoo groups content fetched through yahoo2mbox:

...

...
From devendra999@8_ZAVFsOGj8FjcpG8_oA372HgMSEAl6ol8NKeUSx7VKOtZW8QZCRKJGqiqzqMU-GwjIpYZBzGXe2JmZrjCeniA.yahoo.invalid Sat Aug 05 14:41:35 2000
Return-Path: <devendra999@8_ZAVFsOGj8FjcpG8_oA372HgMSEAl6ol8NKeUSx7VKOtZW8QZCRKJGqiqzqMU-GwjIpYZBzGXe2JmZrjCeniA.yahoo.invalid>
Received: (qmail 6062 invoked from network); 5 Aug 2000 21:41:35 -0000

...

...

This is an example of a detailed header fetched using a similar utility called gmail2mbox.pl:
Received: by 10.90.81.11 with SMTP id e11mr6036070agb.27.1237961013457;
Tue, 24 Mar 2009 23:03:33 -0700 (PDT)
Return-Path: <raviraj.pe...@gmail.com>
Received: from mail-gx0-f164.google.com (mail-gx0-f164.google.com [209.85.217.164])
by gmr-mx.google.com with ESMTP id 15si1362316gxk.4.2009.03.24.23.03.32;
Tue, 24 Mar 2009 23:03:32 -0700 (PDT)

Again, what are my options?


So, a couple of days into the quest and no clear solution yet. Water, water everywhere, and not a drop to drink. What an irony.

With these thoughts in mind, I was looking at individual emails (from the Yahoo group I was interested in) in my Yahoo! inbox. Quite absent-mindedly, I opened the source view of one of the messages and realized that it did not have email masking turned on. In fact, I realized, it couldn't be on; otherwise, what would the reply-to address be? In other words, the From email id HAD to be available in the mail header of a Yahoo group email in my inbox.

Aha!



So then, the solution was plain and simple. I would download the individual email messages for a Yahoo group and then extract the email ids from them. Theoretically, it should work. In practice, it did too. Here is an example:
From v_ramesh00001@yahoo.com Thu Feb 10 16:59:16 2005
Return-Path: <v_ramesh00001@yahoo.com>
X-Sender: v_ramesh00001@yahoo.com
X-Apparently-To: sewausa_atl@yahoogroups.com
Received: (qmail 45652 invoked from network); 11 Feb 2005 00:59:15 -0000

So this meant that one would just need to be on the email distribution of the particular Yahoo group one is interested in. It is easier to extract emails if all Yahoo group related emails are filtered into a specific folder (then you can use the --folder option of fetchyahoo).

The quirks of making 'fetchyahoo' work..


For running Perl on Windows, there are a couple of options, but the best one, in my humble opinion, seems to be ActivePerl, available at http://www.activestate.com/activeperl/ . The "other" options on Windows are Cygwin, or VirtualBox to run a Unix environment inside Windows.

If you want to use VirtualBox, the easiest option is to install Ubuntu in it. You can find a lot of VirtualBox how-to articles here. Another option is Wubi (http://wubi-installer.org); see http://www.technobuzz.net/how-to-install-ubuntu-in-windows-with-wubi/ for a walkthrough.

While Cygwin mostly works for simple Unix stuff, I found that many Perl package dependencies do not install properly and you end up getting frustrated (the Crypt::SSLeay module, for one, had problems).

Installing ActivePerl and required modules for fetchyahoo


Once the ActivePerl .msi is installed, the perl executable is automatically added to the Windows PATH:
c:\emails\fetchyahoo-2.13.3>perl -version

This is perl, v5.10.0 built for MSWin32-x86-multi-thread
(with 5 registered patches, see perl -V for more detail)
..
..

Also, we will assume that the latest version of fetchyahoo has been downloaded and extracted to the c:\emails\fetchyahoo-2.13.3 folder:
c:\emails\fetchyahoo-2.13.3>dir
Volume in drive C has no label.
Volume Serial Number is DA07-A231

Directory of c:\emails\fetchyahoo-2.13.3

03/27/2009 09:48 PM <DIR> .
03/27/2009 09:48 PM <DIR> ..
03/09/2009 12:09 PM 15,182 ChangeLog
03/09/2009 12:09 PM 17,992 COPYING
03/09/2009 12:09 PM 2,747 Credits
03/09/2009 12:09 PM 107,289 fetchyahoo
03/09/2009 12:09 PM 5,359 fetchyahoo.1
03/09/2009 12:09 PM 2,287 fetchyahoo.spec
03/09/2009 12:09 PM 4,907 fetchyahoorc
03/09/2009 12:09 PM 6,314 index.html
03/09/2009 12:09 PM 19,380 INSTALL
03/09/2009 12:09 PM 966 TODO
10 File(s) 182,423 bytes
2 Dir(s) 147,889,139,712 bytes free

Now try to run it. Initially, it gave me this error:
c:\emails\fetchyahoo-2.13.3>perl fetchyahoo
Can't locate MIME/Head.pm in @INC (@INC contains: C:/Perl/site/lib C:/Perl/lib .) at fetchyahoo line 59.
BEGIN failed--compilation aborted at fetchyahoo line 59.

Basically, it needs the MIME::Head module installed. For this, we will use the CPAN module. You can also use the CPANPLUS module, which is more advanced and can also uninstall Perl modules; the CPAN module cannot.

This is how you invoke CPAN (you can also just type cpan at the command prompt). Note that the prompt about overwriting the lockfile may appear if a previous session did not terminate properly:
c:\emails\fetchyahoo-2.13.3>perl -MCPAN -e shell

There seems to be running another CPAN process (pid 5220). Contacting...
Other job not responding. Shall I overwrite the lockfile 'C:\Perl\cpan\.lock'? (Y/n) [y]

cpan shell -- CPAN exploration and modules installation (v1.9205)
ReadLine support enabled

cpan> install MIME::Head

Going to read C:\Perl\cpan\Metadata
Database was generated on Thu, 26 Mar 2009 10:26:54 GMT
Running install for module 'MIME::Head'
Running make for D/DO/DONEILL/MIME-tools-5.427.tar.gz
Fetching with LWP:
http://ppm.activestate.com/CPAN/authors/id/D/DO/DONEILL/MIME-tools-5.427.tar.gz
Fetching with LWP:
http://ppm.activestate.com/CPAN/authors/id/D/DO/DONEILL/CHECKSUMS
Checksum for C:\Perl\cpan\sources\authors\id\D\DO\DONEILL\MIME-tools-5.427.tar.gz ok
Scanning cache C:\Perl/cpan/build for sizes
DONE
MIME-tools-5.427/
MIME-tools-5.427/testin/
MIME-tools-5.427/testin/multi-simple.msg
MIME-tools-5.427/testin/andreas-1296.uu
MIME-tools-5.427/testin/ak-0696.msg
MIME-tools-5.427/testin/short.txt
MIME-tools-5.427/testin/words.txt
..
..
..

CPAN.pm: Going to build D/DO/DONEILL/MIME-tools-5.427.tar.gz

*** Module::AutoInstall version 1.03
*** Checking for Perl dependencies...
[Core Features]
- Test::More ...loaded. (0.72)
- Mail::Header ...missing. (would need 1.01)
- Mail::Internet ...missing. (would need 1.0203)
- Mail::Field ...missing. (would need 1.05)
- MIME::Base64 ...loaded. (3.07_01 >= 2.2)
- IO::File ...loaded. (1.14 >= 1.13)
- IO::Handle ...loaded. (1.27)
- IO::Stringy ...missing. (would need 2.11)
- File::Spec ...loaded. (3.2501 >= 0.6)
- File::Path ...loaded. (2.04 >= 1)
- File::Temp ...loaded. (0.18 >= 0.18)
==> Auto-install the 4 mandatory module(s) from CPAN? [y]
...
...
Appending installation info to C:\Perl\lib/perllocal.pod
GBARR/TimeDate-1.16.tar.gz
nmake install -- OK
Running install for module 'Test::Pod'
Running make for P/PE/PETDANCE/Test-Pod-1.26.tar.gz
Fetching with LWP:
http://ppm.activestate.com/CPAN/authors/id/P/PE/PETDANCE/Test-Pod-1.26.tar.gz
Fetching with LWP:
http://ppm.activestate.com/CPAN/authors/id/P/PE/PETDANCE/CHECKSUMS
Checksum for C:\Perl\cpan\sources\authors\id\P\PE\PETDANCE\Test-Pod-1.26.tar.gz ok
..
..
t/require.....ok
t/send........ok
All tests successful.
Files=8, Tests=127, 2 wallclock secs ( 0.00 cusr + 0.00 csys = 0.00 CPU)
MARKOV/MailTools-2.04.tar.gz
nmake test -- OK
Running make install
Prepending C:\Perl\cpan\build\MailTools-2.04-fCc9N7/blib/arch C:\Perl\cpan\build\MailTools-2.04-fCc9N7/blib/lib to PERL5LIB for 'install'
...
...
Module 'MIME::Head' installed successfully

No errors installing all modules

Interestingly, you can also check this using the Perl Package Manager GUI utility (invoked by typing ppm at the Windows command prompt). It just takes a lot of time to load the GUI's data.

Now, let us check if it works:
c:\emails\fetchyahoo-2.13.3>perl fetchyahoo --nodownload
No username specified.
Please enter your Yahoo! username: pooja33pandey
Please enter your Yahoo! password:
No mailbox or mailspool specified.
Please enter the path to and name of your mail spool or mailbox (eg /var/spool/mail/username): pooja33pandey.mbox
Logging in securely via SSL as poojagverma on Fri Mar 27 22:29:50 2009
Failed: Invalid ID or password entered (username: pooja33pandey )

If you are running Vista, you might see the infamous pop-up window blocking Perl, which you will need to unblock.

All right! So the script itself runs (the login failure above only means the username or password was wrong). Now, let's try a more comprehensive example:
c:\emails\fetchyahoo-2.13.3>perl fetchyahoo --onlylistmessages --username=pooja13pandey --password=<pwd> --spoolfile=pooja.mbox --logout
Use of uninitialized value $ENV{"HOME"} in concatenation (.) or string at fetchyahoo line 1992.
Logging in securely via SSL as pooja13pandey on Fri Mar 27 22:45:49 2009
Country Code 'in' not found. We will try the translation for 'us'.
Country code : in FetchYahoo! Version: 2.13.3
Successfully logged in as pooja13pandey.
Marking messages read on the server

Fetching mail from folder: Inbox
Getting Message ID(s) for message(s) 1 - 25.
1. new "Public Records " - Locate anyone. Search public records. 7:46 AM 6KB
2. new "Pooja Pandey <p" - online skype number (678) 534-2725 2:47 AM 3KB
3. old "Nimesh Bhuva <b" - Re: [GHPCSB_MCA_2k] Re: Happy Holi 27/3/09 35KB
4. old "Nimesh Bhuva <b" - Re: [GHPCSB_MCA_2k] Happy Holi 27/3/09 32KB
5. old "Birthday Remind" - First Reminder for Vibha Deshmukh's Birthday 26/3/09 4KB
6. old "Sharma, Ashish " - RE: [LIKELY JUNK]RE: [LIKELY JUNK]Re: So it 25/3/09 60KB
....
....
Got 90 Message IDs
Not downloading messages
Messages have not been deleted.
Logged out.

Note that fetchyahoo limits the number of messages fetched to 90 by default, because Yahoo sets a download limit of 65 MB per hour per user per IP address. You can use the --safedownload option to put a gap of 5-10 seconds between message fetches. This way, you can run a single command for a long time without hitting the Yahoo-imposed download limit (per user, per IP).

Note that once you download the messages locally, they will be marked as read. If you want to stop the download midway, you can do so and resume it later with the --newonly and --msgidarchivefile options. By default, the messages are appended to the archive/spool file:
D:\emails\fetchyahoo-2.13.3>perl fetchyahoo --folder=<foldername> ^
--username=<username> --password=<password> ^
--safedownload --spoolfile=<foldername>.mbox ^
--msgidarchivefile=<foldername>_msgids --newonly

Conclusion: The strategy in a nutshell


So there you have it. A simple mechanism to get targeted email ids for making your online marketing campaign successful:

1) Identify the Yahoo Group that you are interested in. This is a strategic decision. You want to limit your focus to people who would be interested in your idea. The demographics are important for a high return on investment.

2) Become a member of the group and subscribe to individual emails.

3) Set up a filter to direct all Group emails to a specific folder. A free Yahoo account currently allows up to 100 such filters. Make sure the traffic is flowing in.

4) Sit on your ass for 6 months to 1 year to allow a significant volume of emails to accumulate. If it is a high-activity/high-volume group, your wait will be shorter.

5) Fetch the Yahoo folder contents to your local PC. Now you are sitting on the goldmine.

6) Filter out the email ids using the simple script provided here. Feel free to extend it to your needs. Always manually check the email ids retrieved.

7) Last, but not the least, input the contacts gathered to your mail broadcast software and reap the benefits by inviting them to your newsletter/broadcast.

Remember, the golden rule of thumb to retain the interest of your audience -- Do not send too many similar mails in too short a period of time. Start very moderately and hope that most of them would join your newsletter.

Conclusion - With great power comes great responsibility..


Well, I hope that this article was helpful to you, if your intentions are true and pure. I DO NOT support mis-use of this method for spamming people's inboxes and for immoral or lucrative purposes (that is NOT the intention with which this article has been written).

If you find this article useful or would like to discuss it further, please leave a comment here. Have a great day and Good luck.

Saturday, March 21, 2009

Quick script for extracting emails from unformatted text

Often, we need to extract emails from some unformatted text, such as raw text or HTML tags. For this, the following script can come in handy for extracting emails into a simple text file, which can be uploaded to Mailman or other mailing software contact lists:
$ more xtract_emails.sh
#!/bin/ksh
sed -e 's/\,/\n/g' -e 's/ /\n/g' $1 | \
grep '@' | \
sed -e "s/[<>();]//g" -e 's/mailto://g' \
| sort -u > ${1}.extracted.txt

wc -l ${1}.extracted.txt

Example:

$ ./xtract_emails.sh emails_unformatted.txt

131 emails_formatted.extracted.txt

Hope it is useful for someone else for extracting emails in a single shot; otherwise, it takes a lot of time doing several passes over the post-processed output. Even with the above heuristic rules, the output may not contain 100% proper email addresses, so some proofreading will be needed.

If this is useful to you, please leave a comment here.

Thursday, March 19, 2009

Failed opening required 'wp-blog-header.php' while using cformsII wordpress plugin

Problem


All right, so I was trying to insert a cforms II contact form in a WordPress 2.7.0 installation and was getting this error in a pop-up dialog box:
Warning: require_once(wp-blog-header.php) [function.require-once]: failed to open stream: No such file or directory in /home/wisdom/public_html/prashna/wp-content/plugins/cforms/js/insertdialog25.php on line 9

Fatal error: require_once() [function.require]: Failed opening required 'wp-blog-header.php' (include_path='.:/usr/lib/php:/usr/local/lib/php') in /home/wisdom/public_html/prashna/wp-content/plugins/cforms/js/insertdialog25.php on line 9

I ignored it for a little while, but there came a point when I really had to get it working. Like everyone else, I googled it. The first 3-5 pages of results were sheer nonsense: links to websites facing the same error, but without a solution. Finally, I hit upon a cforms II support forum, which suggested many things.

Solution


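This is what worked for me. Apparently, abspath.php was MISSING from the cforms plugin directory. Of the cforms installs on the account, only the mantra one had it, so I copied it into the prashna install: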
wisdom@wisdomspeak.org [~]# find . -name abspath.php
./public_html/mantra/wp-content/plugins/cforms/abspath.php

wisdom@wisdomspeak.org [~]# find . -name cforms
./public_html/mantra/wp-content/plugins/cforms
./public_html/gita/wp-content/plugins/cforms
./public_html/prashna/wp-content/plugins/cforms
./public_html/vedvaani/wp-content/plugins/cforms

# cp ./public_html/mantra/wp-content/plugins/cforms/abspath.php \
./public_html/prashna/wp-content/plugins/cforms
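If several WordPress sites on the same account use cforms, as in the find output above, a small helper can list every cforms plugin directory that lacks abspath.php instead of checking each one by eye. This is a sketch of my own, not part of cforms; `missing_abspath` is a hypothetical name, and it assumes the standard wp-content/plugins/cforms layout seen above.

```sh
#!/bin/sh
# Hypothetical helper: missing_abspath [ROOT]
# Print every cforms plugin directory below ROOT (default: current
# directory) that does not contain abspath.php.
missing_abspath() {
  # -path matches the whole path, and '*' also matches '/', so this
  # finds .../wp-content/plugins/cforms at any depth.
  find "${1:-.}" -type d -path '*/wp-content/plugins/cforms' |
  while IFS= read -r dir; do
    # Report the directory only if abspath.php is absent.
    [ -f "$dir/abspath.php" ] || printf '%s\n' "$dir"
  done
}
```

Run from the home directory as `missing_abspath ./public_html`, it should list the gita, prashna and vedvaani plugin directories for the account above (before the copy), since only the mantra install had the file.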

And everything was golden..


Thanks to that, I was able to add a cforms II contact form to the WordPress page.