Tuesday, January 22, 2013

Visitors Apache log analyzer: TXT and HTML output

Visitors is a very fast web log analyzer for Linux, other Unix-like operating
systems, and Windows. It takes a web server log file as input and outputs
statistics in the form of several reports. Its design principles differ from those of other software of the same type:

* No installation required. It can process up to 150,000 log lines per second
  on fast computers (about 20MB/s at the average line length of my log files).
* Designed to be run from the command line, with HTML and text reports as output.
  The text report can be piped to less to check web stats over ssh.
* Support for real-time statistics with the Visitors Stream Mode, introduced in
  version 0.3.
* There is no need to specify the log format. It works out of the box with Apache
  and most other web servers that use a standard log format (see the documentation
  for more information on the format).
* It is a portable C program that compiles on many different systems. Binaries for
  Windows systems are in the Download section of the Visitors home page.
* The HTML report contains no images or external CSS. It is self-contained, so
  you can send it to users by email.
* Visitors is free software (and, of course, freeware) under the terms of the GPL
  license, so you don't need to pay to use it. Visitors is supported: if you want a
  custom version made directly by the original author for a modest price, contact
  me at antirez (at) invece (dot) org. ISPs may take advantage of the high
  processing speed.



Graph generation combined with Graphviz
In graphviz mode, Visitors processes the web log files and outputs a graph
ready to be rendered with Graphviz. The generated graph is the visual equivalent
of web trails, but is much more interesting for complex sites: the focus of
this feature is not to create a generic graph of the whole site, but a graph of
the usage patterns that shows how users move through it. The original page
shows the graph for www.hping.org, and the online documentation explains how to
generate one.
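
To make the idea concrete, here is a minimal Python sketch that counts referer-to-target moves inside a site and writes them out as a Graphviz DOT digraph. It does not reproduce the output of Visitors' graphviz mode, whose exact format is not shown here. The function name, the sample moves, and the edge labels are assumptions made for illustration only.

```python
from collections import Counter


def trails_to_dot(moves, prefix):
    """Build a Graphviz DOT digraph from (referer, target) pairs.

    Only referers that start with `prefix` (the site's own URL) are kept,
    so every edge is an internal move from one page to another.
    """
    counts = Counter()
    for referer, target in moves:
        if referer.startswith(prefix):
            # Strip the site prefix so both ends of the edge are site paths.
            source = referer[len(prefix):] or "/"
            counts[(source, target)] += 1

    lines = ["digraph trails {"]
    for (source, target), hits in sorted(counts.items()):
        lines.append(f'    "{source}" -> "{target}" [label="{hits}"];')
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical moves, not taken from a real log.
    sample = [
        ("http://www.hping.org/", "/download.html"),
        ("http://www.hping.org/", "/download.html"),
        ("http://www.hping.org/download.html", "/manpage.html"),
        ("http://www.example.com/", "/"),  # external referer, ignored
    ]
    print(trails_to_dot(sample, "http://www.hping.org"))
```

The printed text can be saved to a file and rendered with Graphviz, for example with dot -Tpng.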


Examples
The simplest usage, for checking a web log interactively (for example over ssh
on your web server), is to type:

visitors access.log | less

This produces human-readable, text-only output. To generate HTML web stats
with much more information, use this instead:

visitors -A -m 30 access.log -o html > report.html

If you want information on the usage patterns of your site, you must provide the URL
prefix of your web site and specify the --trails option:

visitors -A -m 30 access.log -o html --trails --prefix http://www.hping.org > report.html

Note that it's OK to specify multiple file names, or to provide the input on
standard input, as in the following two examples:

visitors /var/log/apache/access.log.*
zcat access.log.*.gz | visitors -

Check the documentation for more information on how to use it.
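
If reports are generated from a script rather than typed by hand, the invocations above can be wrapped in a small helper. The following Python sketch is only an illustration: it assumes a visitors binary is on the PATH, uses only the options that appear in the examples above, and the function names are hypothetical.

```python
import subprocess


def visitors_command(log_files, html=False, trails_prefix=None):
    """Return the argument list for a visitors run.

    Only options shown in the examples above are used:
    -A -m 30 and -o html for the HTML report, --trails and --prefix for
    usage patterns.
    """
    cmd = ["visitors"]
    if html:
        cmd += ["-A", "-m", "30"]
    cmd += list(log_files)
    if html:
        cmd += ["-o", "html"]
    if trails_prefix:
        cmd += ["--trails", "--prefix", trails_prefix]
    return cmd


def write_report(log_files, out_path, **options):
    """Run visitors and save its standard output to out_path."""
    with open(out_path, "wb") as out:
        subprocess.run(
            visitors_command(log_files, **options), stdout=out, check=True
        )
```

For instance, write_report(["access.log"], "report.html", html=True) corresponds to the second command above.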

A sample text report follows:

Statistics generated with VISITORS version 0.7
http://www.hping.org/visitors for more information

=== General information ===
--- Information about analyzed log files
--- Generated: Fri Mar 31 12:53:09 2006

* Number of entries processed: 39472
* Number of invalid entries: 0
* Processing time in seconds: 3

=== Generated reports ===
--- Click on the report name you want to see
* Number of reports generated: 27
-> Unique visitors in each day
-> Unique visitors in each month
-> Unique visitors from Google in each day
-> Unique visitors from Google in each month
-> Pageviews per visit
-> Weekday-Hour combined map
-> Month-Day combined map
-> Requested pages
-> Requested images and CSS
-> Referers
-> Referers by first time
-> Robots and web spiders
-> User agents
-> Operating Systems
-> Browsers
-> 404 Errors
-> Domains
-> Googled pages
-> Adsensed pages
-> Google Keyphrases
-> Google Keyphrases by first time
-> Google Human Language
-> Screen resolution
-> Screen color depth
-> Web trails
-> Weekday distribution
-> Hours distribution

=== Unique visitors in each day ===
--- Multiple hits with the same IP, user agent and access day, are considered a single visit
* Number of unique visitors: 7583
* Different days in logfile: 6
26/Mar/2006 : 794 |################ 10.5%
27/Mar/2006 : 1271 |########################### 16.8%
28/Mar/2006 : 1441 |############################## 19.0%
29/Mar/2006 : 2062 |############################################ 27.2%
30/Mar/2006 : 1417 |############################## 18.7%
31/Mar/2006 : 598 |############ 7.9%

=== Unique visitors in each month ===
--- Multiple hits with the same IP, user agent and access day, are considered a single visit
* Number of unique visitors: 7583
* Different months in logfile: 1
Mar/2006 : 7583 |############################################ 100.0%
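
The counting rule stated in the two reports above (same IP, user agent and access day means one visit) can be sketched in a few lines of Python. This is not Visitors' own code. The regular expression assumes the common Apache combined log format, and the bar scaling (44 characters for the largest value, rounded down) is inferred from the numbers in this report rather than taken from the program's source.

```python
import re
from collections import Counter

# Assumed Apache "combined" log format:
# ip ident user [day:time zone] "request" status bytes "referer" "agent"
LOG_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def unique_visitors_per_day(lines):
    """Count each (IP, user agent, day) combination once per day."""
    seen = set()
    per_day = Counter()
    for line in lines:
        match = LOG_RE.match(line)
        if match is None:
            continue  # skip entries that do not parse
        key = (match["ip"], match["agent"], match["day"])
        if key not in seen:
            seen.add(key)
            per_day[match["day"]] += 1
    return per_day


def bar_line(label, count, largest, total, width=44):
    """Format one report row: bar scaled to the largest value, share of total."""
    bar = "#" * (count * width // largest)
    return f"{label} : {count} |{bar} {100 * count / total:.1f}%"


if __name__ == "__main__":
    # Values copied from the daily report above.
    print(bar_line("26/Mar/2006", 794, 2062, 7583))
```

With the values from the daily report, the last call gives a 16-character bar and 10.5%, which matches the first row of that report.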

=== Unique visitors from Google in each day ===
--- The red part of the bar expresses the percentage of visits originated from Google
* Number of unique visitors: 7583
* Number of unique visitors from google: 1428
* Different days in logfile: 6
26/Mar/2006: 147 |########.................................... 18.5%
27/Mar/2006: 249 |########.................................... 19.6%
28/Mar/2006: 333 |##########.................................. 23.1%
29/Mar/2006: 316 |######...................................... 15.3%
30/Mar/2006: 272 |########.................................... 19.2%
31/Mar/2006: 111 |########.................................... 18.6%

=== Unique visitors from Google in each month ===
--- The red part of the bar expresses the percentage of visits originated from Google
* Number of unique visitors: 7583
* Number of unique visitors from google: 1428
* Different months in logfile: 1
Mar/2006: 1428 |########.................................... 18.8%

=== Weekday-Hour combined map ===
--- Brighter means higher level of hits
* Hour with max traffic starting at We 12:00 with hits: 253
* Hour with min traffic starting at Fr 13:00 with hits: 0

Mo: . .... ..
Tu: . . ......... ...
We: .. . #+-.-.......
Th: . . ....... ....
Fr: ..
Sa:
Su: .

000000000011111111112222
012345678901234567890123


=== Month-Day combined map ===
--- Brighter means higher level of hits
* Day with max traffic is Mar 29 with hits: 2156
* Day with min traffic is Mar 31 with hits: 674

Jan:
Feb:
Mar:                          .--#-.
Apr:
May:
Jun:
Jul:
Aug:
Sep:
Oct:
Nov:
Dec:

     0000000001111111111222222222233
     1234567890123456789012345678901


=== Pageviews per visit ===
--- Number of pages requested per visit
* Only documents are counted (not images). Reported ranges:: 13
1 : 3090 |############################################ 62.4%
2 : 1045 |############## 21.1%
3 : 391 |##### 7.9%
4 : 164 |## 3.3%
5 : 92 |# 1.9%
6 : 55 | 1.1%
11-20 : 43 | 0.9%
7 : 22 | 0.4%
8 : 17 | 0.3%
21-30 : 9 | 0.2%
9 : 8 | 0.2%
10 : 7 | 0.1%
> 30 : 5 | 0.1%

=== Requested pages ===
--- Page requests ordered by hits
* Different pages requested: 197
1) /: 4280
2) /robots.txt: 1272

=== Requested images and CSS ===
--- Images and CSS requests ordered by hits
* Different images and CSS requested: 55
1) /favicon.ico: 4850

=== Referers ===
--- Referers ordered by visits (google excluded)
* Different referers: 752

=== Referers by first time ===
--- Referers ordered by first time date, newer on top (referers from google excluded)
* Different referers: 750

=== Robots and web spiders ===
--- Agents requesting robots.txt. MSIECrawler excluded.
* Total number of different robots: 67
1) Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp): 26
2) msnbot/1.0 (+http://search.msn.com/msnbot.htm): 14
3) Mozilla/5.0 (compatible; Yahoo! Slurp China; http://misc.yahoo.com.cn/help.html): 13
4) Mozilla/2.0 (compatible; Ask Jeeves/Teoma; +http://sp.ask.com/docs/about/tech_crawling.html): 9
5) Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1: 8
6) NutchCVS/0.7.1 (Nutch; http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org): 7
7) psbot/0.1 (+http://www.picsearch.com/bot.html): 5
8) msnbot/0.9 (+http://search.msn.com/msnbot.htm): 5
9) Mozilla/5.0 (compatible; BecomeBot/2.3; MSIE 6.0 compatible; +http://www.become.com/site_owners.html): 5
10) larbin_2.6.3 (larbin2.6.3@unspecified.mail): 5
11) NutchCVS/0.8-dev (Nutch running at UW; http://www.nutch.org/docs/en/bot.html; sycrawl@cs.washington.edu): 4
12) Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.7.10) Gecko/20050724 Firefox/1.0.6: 4
13) -: 4
14) NaverBot-1.0 (NHN Corp. / +82-31-784-1989 / nhnbot@naver.com): 4
15) Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html): 3
16) MJ12bot/v1.0.7 (http://majestic12.co.uk/bot.php?+): 3
17) Mozilla/4.0 (compatible; MSIE 5.0; Windows 95) VoilaBot BETA 1.2 (http://www.voila.com/): 2
18) Mozilla/4.0: 2
19) CSCrawler -> http://www.kde.cs.uni-kassel.de/lehre/ss2005/googlespam/crawler.html RPT-HTTPClient/0.3-3: 2
20) Mozilla/5.0 (Windows; U; Windows NT 5.1; ja; rv:1.8.0.1) Gecko/20060111 Firefox/1.5.0.1: 2

=== User agents ===
--- The entire user agent string ordered by visits
* Different agents: 1507

=== Operating Systems ===
--- Operating Systems by visits
* Different operating systems listed: 8
Windows : 5436 |############################################ 67.1%
Unknown : 1348 |########## 16.6%
Linux : 1001 |######## 12.4%
Macintosh : 244 |# 3.0%
FreeBSD : 37 | 0.5%
NetBSD : 12 | 0.1%
SunOS : 11 | 0.1%
OpenBSD : 8 | 0.1%

=== Browsers ===
--- Browsers used by visits
* Different browsers listed: 22
Firefox : 3178 |############################################ 39.2%
Explorer 6.x: 2664 |#################################### 32.9%
Unknown : 699 |######### 8.6%
Opera : 249 |### 3.1%
GoogleBot : 215 |## 2.7%
Other Mozilla based: 193 |## 2.4%
MSNbot : 181 |## 2.2%
Wget : 138 |# 1.7%
Safari : 133 |# 1.6%
Explorer 5.x: 122 |# 1.5%
Yahoo Slurp : 110 |# 1.4%
Konqueror : 74 |# 0.9%
Explorer unknown version: 60 | 0.7%
Lynx : 30 | 0.4%
ZyBorg : 9 | 0.1%
Ask Jeeves : 9 | 0.1%
Galeon : 9 | 0.1%
Links : 8 | 0.1%
NATSU-MICAN : 5 | 0.1%
W3M : 5 | 0.1%
Explorer 4.x: 3 | 0.0%
MultiZilla : 3 | 0.0%

=== 404 Errors ===
--- Requests for missing documents
* Different missing documents requested: 122

=== Domains ===
--- Top Level Domains sorted by visits
* Total number of Top Level Domains: 1
numeric IP : 8097 |############################################ 100.0%

=== Googled pages ===
--- Pages accessed by the Google crawler, last access reported
* Number of pages googled: 53

=== Adsensed pages ===
--- Pages accessed by the Adsense crawler, last access reported
* Number of pages adsensed: 22

=== Google Keyphrases ===
--- Keyphrases used in google searches ordered by visits
* Total number of keyphrases: 589

=== Google Keyphrases by first time ===
--- Keyphrases ordered by first time date, newer on top
* Different referers: 589

=== Google Human Language ===
--- The 'hl' field in the query string of google searches
* Different human languages: 34
1) en: 570
2) es: 98
3) de: 81
4) fr: 75
5) it: 65
6) ja: 37
7) pl: 34
8) zh: 27
9) pt: 24
10) nl: 23
11) ca: 14
12) tr: 10
13) sv: 10
14) vi: 8
15) ru: 7
16) no: 7
17) hu: 6
18) da: 6
19) ro: 6
20) fi: 5
21) cs: 5
22) th: 5
23) ko: 4
24) is: 3
25) id: 3
26) el: 3
27) bg: 2
28) sl: 2
29) sk: 2
30) bn: 1
31) tl: 1
32) ar: 1
33) lv: 1
34) lt: 1
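
The 'hl' field mentioned in this report is an ordinary query-string parameter of the referer URL, so it can be read with Python's standard URL parser. The sketch below is an assumption about how such a count could be produced, not Visitors' implementation. The host check is deliberately simplified, and the function names and sample referers are hypothetical.

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit


def google_hl(referer):
    """Return the 'hl' value of a Google search referer, or None."""
    parts = urlsplit(referer)
    host = parts.hostname or ""
    # Simplified check (assumption): any host containing "google." counts.
    if "google." not in host:
        return None
    values = parse_qs(parts.query).get("hl")
    return values[0] if values else None


def count_languages(referers):
    """Count 'hl' values over all referers that carry one."""
    counts = Counter()
    for referer in referers:
        language = google_hl(referer)
        if language is not None:
            counts[language] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical referers, not taken from a real log.
    sample = [
        "http://www.google.com/search?hl=en&q=hping",
        "http://www.google.it/search?hl=it&q=hping",
        "http://www.google.com/search?q=visitors&hl=en",
        "http://www.example.com/page?hl=fr",
        "-",
    ]
    for language, hits in count_languages(sample).most_common():
        print(f"{language}: {hits}")
```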

=== Screen resolution ===
--- user screen width x height resolution
* Different resolutions: 25
1) 1024x768: 477
2) 1280x1024: 430
3) 1400x1050: 80
4) 1152x864: 56
5) 1600x1200: 55
6) 1280x800: 53
7) 1680x1050: 51
8) 800x600: 36
9) 1440x900: 33
10) 1280x768: 21
11) 1920x1200: 21
12) 1280x960: 17
13) 1280x854: 16
14) 2560x1024: 6
15) 3200x1200: 5
16) 1152x768: 3
17) 1024x1280: 3
18) 2800x1050: 2
19) 1240x1024: 2
20) 2680x1050: 2
21) 1440x960: 1
22) 1152x870: 1
23) 1600x1024: 1
24) 3840x1024: 1
25) 1280x973: 1

=== Screen color depth ===
--- user screen color depth in bits per pixel
* Different color depths: 4
1) 32: 999
2) 24: 216
3) 16: 154
4) 8: 5

=== Web trails ===
--- Referer -> Target common moves
* Total number of trails: 461

=== Weekdays distribution ===
--- Percentage of hits in every day of the week
Mo : 1377 |############################ 17.0%
Tu : 1511 |############################## 18.7%
We : 2156 |############################################ 26.6%
Th : 1484 |############################## 18.3%
Fr : 674 |############# 8.3%
Sa : 0 | 0.0%
Su : 895 |################## 11.1%

=== Hours distribution ===
--- Percentage of hits in every hour of the day
00 : 381 |########################### 4.7%
01 : 298 |##################### 3.7%
02 : 216 |############### 2.7%
03 : 259 |################## 3.2%
04 : 252 |################## 3.1%
05 : 204 |############## 2.5%
06 : 220 |############### 2.7%
07 : 216 |############### 2.7%
08 : 265 |################### 3.3%
09 : 318 |###################### 3.9%
10 : 289 |#################### 3.6%
11 : 340 |######################## 4.2%
12 : 611 |############################################ 7.5%
13 : 458 |################################ 5.7%
14 : 393 |############################ 4.9%
15 : 391 |############################ 4.8%
16 : 451 |################################ 5.6%
17 : 383 |########################### 4.7%
18 : 414 |############################# 5.1%
19 : 340 |######################## 4.2%
20 : 340 |######################## 4.2%
21 : 325 |####################### 4.0%
22 : 416 |############################# 5.1%
23 : 317 |###################### 3.9%
