How I Created stats.cgi Using
Web Statistics with LAMP
 
Here are the three web pages involved in the "spying":
  • www.alwanza.com - The Alwanza home page
  • www.alwanza.net/alwanza/cgi-bin/stats.cgi - The statistics display page
  • www.alwanza.net/alwanza/cgi-bin/spy.cgi - The cookie-setting, count-updating, and header-gathering page (this was a "hidden" page with no clickable links to it until now)

The Visitor Count on the Alwanza home page www.alwanza.com is where the "spying" happens.  The Alwanza home page itself is a static page, but it includes an iframe that loads a dynamic Perl-CGI page, spy.cgi.  Visiting spy.cgi directly also triggers the spying, but since (until now) it was a hidden page, I was the only one who went there (usually to debug my script), and I've set my script to ignore my own "hits" so they don't fill up the stats page, stats.cgi.

Restated:  when a visitor comes to the Alwanza home page, he or she is really looking at not one, but two pages, the Alwanza home page and the spy.cgi page.  The spy.cgi page is being viewed through an iframe (like a little window) on the Alwanza page.  It just happens to display the visitor count because I've aligned top and left through HTML so that the visitor count on spy.cgi lines up with the iframe on the Alwanza home page.  I've done my best to ensure that it works in all browsers (that render iframes) on all operating systems, at all resolutions. 

Commercial websites that collect visitor statistics to target customers and guide them to potential sales use similar techniques.  Some of them serve up their entire Web site using dynamic pages that do a lot of behind-the-scenes click-tracking.  Others leave the click-tracking to a subcontractor and just include an image (visible or invisible) on every page that relays header information to a third party (or several third parties).  The visible images are often advertisements.  Whether visible or invisible (and in or out of an "iframe"), these links that permit headers to be collected from visitors are called "Web bugs" (because they "bug" - listen in on - the traffic that comes to the Web site).

But enough about everyone else:  Alwanza does not do click-tracking.  That ISN'T to say I CAN'T, or that I won't do it in the future, but for now, the only page I collect statistics from is my home page.  Click-tracking involves collecting visitor headers from every page, or most pages, so that the collector of the statistics can tell from time stamps, IP addresses, and cookies (if they are accepted) how long a visitor stayed at any page, what page he/she last visited, and what page he/she visited next (the headers don't tell that; the sequence of time stamps does).
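
To make the time-stamp reasoning concrete, here is a minimal sketch in Python (not the Perl used on this site, and not code Alwanza runs, since Alwanza does not do click-tracking).  It assumes each hit has already been reduced to a visitor key (an IP address or cookie value), a time stamp, and a page; the function name `trace_visits`, the addresses, and the page names are all invented for illustration.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical hit records: (visitor key, time stamp, page requested).
# Every value here is made up for the example.
HITS = [
    ("203.0.113.7", "2006-02-20 10:00:00", "/index.html"),
    ("203.0.113.7", "2006-02-20 10:02:30", "/perl.html"),
    ("198.51.100.4", "2006-02-20 10:01:00", "/index.html"),
    ("203.0.113.7", "2006-02-20 10:03:00", "/linux.html"),
]


def trace_visits(hits):
    """Group hits by visitor and derive (page, next page, seconds on page)."""
    by_visitor = defaultdict(list)
    for visitor, stamp, page in hits:
        when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
        by_visitor[visitor].append((when, page))

    trails = {}
    for visitor, visits in by_visitor.items():
        visits.sort()  # oldest hit first
        trail = []
        # Pair each hit with the one that follows it; the gap between the
        # two time stamps is the time spent on the earlier page.
        for (when, page), (next_when, next_page) in zip(visits, visits[1:]):
            seconds = int((next_when - when).total_seconds())
            trail.append((page, next_page, seconds))
        # Nothing follows the final hit, so its stay cannot be measured.
        trail.append((visits[-1][1], None, None))
        trails[visitor] = trail
    return trails


TRAILS = trace_visits(HITS)
```

Note that the last page of every visit has no measurable duration: without a later hit from the same visitor, there is no second time stamp to subtract from.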

Notice that the "domain" part of the URL (Uniform Resource Locator - web address) is different from the Alwanza home page to spy.cgi and stats.cgi.  That is because, unlike the Alwanza home page, which is served from a commercial Web hosting server, the spy.cgi and stats.cgi pages are served off my own server at home.  Yes, I COULD put my entire Web site on my home server, but there are some advantages to having a Web site housed on a big fast commercial server with back-up power supplies (especially when one lives on power-outage-riddled Finn Hill).  The reason I serve the stats.cgi and spy.cgi pages from my home server is so I can have more control over the environment.

Once the spy page, spy.cgi, collects the statistics, the information gets slightly reformatted (for human viewing) and then my spy.cgi script enters the "transaction data" into my MySQL database.
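
The collect, reformat, and store sequence can be sketched roughly as follows.  This is an illustration in Python, not the Perl behind spy.cgi, and it rests on several assumptions: the standard library's sqlite3 stands in for MySQL so that the example runs without a database server, the `hits` table and its columns are invented, and "ignore my own hits" is modeled as a simple IP-address check (the real script may recognize its owner some other way).  The environment variable names (`REMOTE_ADDR`, `HTTP_USER_AGENT`, and so on) are the standard CGI ones.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical address of the site owner, whose hits are not recorded.
MY_OWN_IPS = {"192.0.2.10"}


def open_db(path=":memory:"):
    """Open the stand-in database and create the (invented) hits table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS hits ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, hit_time TEXT, ip TEXT, "
        "user_agent TEXT, referer TEXT, language TEXT)"
    )
    return conn


def build_record(environ, now=None):
    """Turn CGI environment variables into one row, or None for ignored hits."""
    ip = environ.get("REMOTE_ADDR", "")
    if ip in MY_OWN_IPS:
        return None
    now = now or datetime.now(timezone.utc)
    return {
        # Reformatted for human viewing
        "hit_time": now.strftime("%Y-%m-%d %H:%M:%S"),
        "ip": ip,
        # Headers are visitor-supplied, so cap the length before storing
        "user_agent": environ.get("HTTP_USER_AGENT", "")[:255],
        "referer": environ.get("HTTP_REFERER", "")[:255],
        "language": environ.get("HTTP_ACCEPT_LANGUAGE", "")[:255],
    }


def save_record(conn, record):
    """Insert one row using placeholders, never string concatenation."""
    conn.execute(
        "INSERT INTO hits (hit_time, ip, user_agent, referer, language) "
        "VALUES (:hit_time, :ip, :user_agent, :referer, :language)",
        record,
    )
    conn.commit()
```

In a real CGI script the `environ` argument would be `os.environ`.  The placeholders in the INSERT matter because every header value comes from the visitor's browser and cannot be trusted.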

The stats page, stats.cgi, displays the last 150 records from the MySQL database.  The stats page only displays the result of the MySQL query; it does not take in any new information.
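
A read-only display of this kind might look like the sketch below.  It is written in Python with sqlite3 standing in for MySQL, and the `hits` table, its columns, and the function names are assumptions made for the example, not the actual schema or code behind stats.cgi.

```python
import sqlite3
from html import escape


def last_records(conn, limit=150):
    """Fetch the newest rows first; this only reads, it never writes."""
    return conn.execute(
        "SELECT hit_time, ip, user_agent FROM hits ORDER BY id DESC LIMIT ?",
        (limit,),
    ).fetchall()


def render_table(rows):
    """Build an HTML table, escaping every value before it is displayed."""
    lines = ["<table>", "<tr><th>Time</th><th>IP</th><th>Browser</th></tr>"]
    for row in rows:
        cells = "".join(f"<td>{escape(str(value))}</td>" for value in row)
        lines.append(f"<tr>{cells}</tr>")
    lines.append("</table>")
    return "\n".join(lines)
```

The escaping step is worth keeping in any version of this page: header values such as the browser name are supplied by the visitor, so displaying them unescaped would let a visitor inject markup into the stats page.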

Some people have asked me "Why are there so few cookies displayed in your cookie column?"  There are several answers to that.  When I coded the stats.cgi page, I made a decision that I would only display RETURNING cookies.  That is why the count numbers displayed on the cookies do NOT contain the current count.  In order to do this, I retrieve the cookie that already exists in the browser cookie cache, and display that, then set a new cookie with the new count added to the existing value.  I've also coded the cookies to "expire" in 2 years.  So the following categories of visitors will not display cookies on the stats.cgi page:
  • new visitors who have arrived at alwanza.com for the first time
  • new or returning visitors who have their cookie acceptance turned off
  • returning visitors who have emptied their "cookie cache" since their last visit
  • returning visitors who are using a different computer or who have reformatted their old one
  • returning visitors who last visited more than 2 years ago
As it turns out, a lot of visitors to Alwanza.com have their cookies turned off or empty their "cookie cache" on a regular basis.  I know this because I see lots of returning IP addresses with no cookies.  If I were a commercial business I could coerce the use of cookies (by making pages or features viewable only to those who allowed cookies), or at least test whether the user had cookies turned off (an easy addition to the script), but since that is not the purpose of this script, I haven't done that.  Detecting whether a visitor allows cookies might be a project for the future:  if I find out that the cookies are turned off, I could display "cookies turned off" in the cookie column.  All that involves is attempting to retrieve, on a follow-up request, the cookie that was just set.  If I can't retrieve it, cookies have been turned off.
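
The "display the returning cookie, then set a new one" order of operations can be sketched as below.  This is Python rather than the Perl used on this site, and it assumes the cookie holds a single visit tally under the invented name `visits`; the real cookie may store something different.  The two-year expiry is approximated as 730 days.

```python
from datetime import datetime, timedelta, timezone
from http.cookies import SimpleCookie

COOKIE_NAME = "visits"            # hypothetical cookie name
TWO_YEARS = timedelta(days=730)   # approximation of "2 years"


def handle_cookie(cookie_header, now=None):
    """Return (count to display, Set-Cookie header line for the response).

    The displayed count is whatever the browser sent back, so it never
    includes the current visit; first-time or cookie-less visitors get None.
    """
    now = now or datetime.now(timezone.utc)
    jar = SimpleCookie()
    jar.load(cookie_header or "")

    returning = None
    if COOKIE_NAME in jar:
        try:
            returning = int(jar[COOKIE_NAME].value)
        except ValueError:
            returning = None  # a tampered or garbled value counts as absent

    # Set a fresh cookie carrying the updated count, expiring in two years.
    fresh = SimpleCookie()
    fresh[COOKIE_NAME] = str((returning or 0) + 1)
    expires = (now + TWO_YEARS).strftime("%a, %d %b %Y %H:%M:%S GMT")
    fresh[COOKIE_NAME]["expires"] = expires
    fresh[COOKIE_NAME]["path"] = "/"
    return returning, fresh.output()
```

In a CGI script the `cookie_header` argument would come from the `HTTP_COOKIE` environment variable, and the returned line would be printed with the other response headers.  A visitor whose browser refuses cookies looks the same as a first-time visitor on every request, which is why so many rows on a stats page like this show no cookie.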

You can't see the script code that captures the header information if you "view source" on these pages, although you can see the iframe on the Alwanza home page.  The scripts are written in Perl-CGI, which runs "server-side", so the browser only ever receives their output (as opposed to JavaScript, as in spyingJS.html, which runs client-side but is invoked only at the Web page, which is too late to pick up headers).

Data gathered from headers alone, viewable from the stats.cgi page, doesn't usually contain enough information for a malicious or greedy host to launch either a spam campaign or a phishing expedition against you using your personal computer.  That doesn't mean that all sites that collect headers are safe.  Some may also be exploiting security holes in your browser or operating system to gain information you wouldn't freely give to strangers.  At Alwanza.com, though (and probably at many sites that collect web statistics), the statistics are being used for curiosity only.

The commercial web host that houses the Alwanza.com domain also collects web statistics, bundles them monthly, and makes them available to me in long text pages.  Every so often I look at those statistics.  They give me an indication of which of my Web pages are the favorites among people who come to my Web site, and which search engines sent them here.

Similar information can be found in the access log on Apache web servers.  Access logs can be configured to collect the URL of the Web page that brought the visitor to a Web page, as well as header information.
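
As an illustration, the sketch below (in Python) pulls the referring URL and the browser name out of one line of Apache's "combined" log format, the common configuration that records the Referer and User-Agent headers alongside the request.  The sample line is invented, and the pattern is deliberately simple: it assumes none of the quoted fields contain escaped quotation marks.

```python
import re

# One line of Apache's "combined" format:
#   host ident user [time] "request" status size "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it doesn't match."""
    match = COMBINED.match(line)
    return match.groupdict() if match else None


# Invented sample line, for illustration only.
SAMPLE = (
    '203.0.113.7 - - [20/Feb/2006:10:00:00 -0800] '
    '"GET /index.html HTTP/1.1" 200 5120 '
    '"http://www.example.com/search?q=alwanza" "Mozilla/5.0"'
)

FIELDS = parse_line(SAMPLE)
```

Here `FIELDS["referer"]` holds the URL of the page that brought the visitor, which is how a log reveals the search engine (and often the search terms) behind a visit.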


If headers don't provide enough information for identity theft, how can that happen?

It is important to always be aware that information put into a Web browser has at least 3 vulnerable points: 
  • The point of origin (the keyboard or hard drive of the computer you are using may be accessed by someone else or by a worm to gather the information you have supplied or stored - like stored passwords),
  • the destination (do you know if all the people who will have access to your information at the destination are trustworthy?), and
  • in-between (information may be captured as it passes from one point to another).
Of course this is also true of cell phones and other kinds of transactions, but other kinds of transactions usually require human proximity or knowledge of a physical location.  The human element is more distant in a computer transaction, and that makes accountability harder to enforce.  For example, many people who shop online don't even bother to find out the physical location or phone number of their suppliers.  Many e-businesses don't provide that information on their Web sites.


Nuts and Bolts:

LAMP is Linux, Apache, MySQL, and Perl.  Well, it is on MY web site, anyhow.  It is possible to substitute PHP for Perl.  Some people use the database PostgreSQL instead of MySQL.  The Linux operating system especially lends itself to being a Web server because of the ease and granularity of permissions (not yet achieved by Microsoft).  Apache can run on Windows, too; and, for that matter, so can MySQL, although it isn't quite as secure.  For those who want it spelled out:  Linux is the operating system, Apache is the Web server, MySQL is the database, and Perl is the programming language.  When they all work together they make "Web bugs" and "click-tracking" possible.

One last thing before I get into pieces of code:  There are two additional techniques that often go along with LAMP:  the use of secure web pages (https) and plain old HTML forms.  Since I haven't provided associated examples, I'm not writing about them, except to say that when they are used in combination with LAMP, the information the visitor supplies when completing the form, combined with the information gained from the HTTP headers, is much more complete and useful to whoever has access to it.


My Code:

spy.cgi

stats.cgi

SPYCOUNT.pm

SPYDB.pm


Please use contact.cgi to email me if you have any comments or questions about this page.

Created: 02/20/06
Updated: 12/30/09