Webalizer

The Webalizer is web log analysis software that generates web pages of analysis from access and usage logs. It is one of the most commonly used web server administration tools. It was initiated by Bradford L. Barrett in 1997. Statistics commonly reported by The Webalizer include hits, visits, referrers, the visitors' countries, and the amount of data downloaded. These statistics can be viewed graphically and presented by different time frames, such as by day, hour, or month.


Overview

Website traffic analysis is produced by grouping and aggregating various data items captured by the web server in the form of log files while the visitor is browsing the website. The Webalizer analyzes web server log files, extracting items such as client IP addresses, URL paths, processing times, user agents and referrers, and groups them to produce HTML reports. Web servers log HTTP traffic using different file formats. Common file formats are the Common Log Format (CLF), the Apache Custom Log Format, and the W3C Extended Log File Format. An example of a CLF log line is shown below.
192.168.1.20 - - [6/Dec/2006:03:09:16 -0500] "GET  HTTP/1.1" 200 1774
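The fields of a CLF line (client host, identity, authenticated user, bracketed timestamp, quoted request line, status code and response size) can be pulled apart with a single regular expression. The following is a minimal Python sketch of such a parser, not The Webalizer's own implementation; the names CLF_PATTERN, ClfRecord and parse_clf are hypothetical, and the sample line in the usage note uses an assumed request path.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# One Common Log Format line:
# host ident authuser [timestamp] "request line" status bytes
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


class ClfRecord(NamedTuple):
    host: str
    timestamp: datetime
    method: str
    path: str
    status: int
    size: int  # a "-" size (no response body) is stored as 0


def parse_clf(line: str) -> Optional[ClfRecord]:
    """Parse one CLF line; return None if the line does not match."""
    m = CLF_PATTERN.match(line)
    if m is None:
        return None
    # The request line is normally "METHOD PATH PROTOCOL".
    parts = m.group("request").split()
    method = parts[0] if parts else ""
    path = parts[1] if len(parts) > 1 else ""
    return ClfRecord(
        host=m.group("host"),
        # e.g. "06/Dec/2006:03:09:16 -0500", including the UTC offset
        timestamp=datetime.strptime(m.group("time"), "%d/%b/%Y:%H:%M:%S %z"),
        method=method,
        path=path,
        status=int(m.group("status")),
        size=0 if m.group("size") == "-" else int(m.group("size")),
    )
```

For example, parse_clf('192.168.1.20 - - [06/Dec/2006:03:09:16 -0500] "GET /index.html HTTP/1.1" 200 1774') yields a record with path "/index.html", status 200 and size 1774. Because re.match only anchors at the start of the line, lines that append referrer and user agent fields after the size still parse, with the extra fields ignored.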
The Apache Custom Log Format can be customized to log most HTTP parameters, including the request processing time and the size of the request itself. The format of a custom log is controlled by the LogFormat directive. A typical Apache log format configuration is shown below.
LogFormat "%a %l \"%u\" %t %m \"%U\" \"%q\" %p %>s %b %D \"%i\" \"%i\"" my_custom_log
CustomLog logs/access_log my_custom_log
Microsoft's Internet Information Services (IIS) web server logs HTTP traffic in the W3C Extended Log File Format. Like Apache custom logs, IIS logs may be configured to capture extended parameters such as the request processing time. W3C extended logs may be recognized by the presence of one or more format lines, such as the one shown below.
#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs(User-Agent) cs(Referer) sc-status sc-bytes cs-bytes time-taken
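Because a W3C extended log declares its own column order in the #Fields directive, a reader has to remember the most recent #Fields line and map each following data line onto it. The sketch below shows that idea in Python; parse_w3c is a hypothetical name, and it assumes space-separated values as written by IIS, with other directives such as #Software or #Date simply skipped.

```python
from typing import Dict, Iterable, Iterator, List


def parse_w3c(lines: Iterable[str]) -> Iterator[Dict[str, str]]:
    """Yield one dict per data line, keyed by the most recent #Fields list."""
    fields: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        if line.startswith("#"):
            # Directives start with "#"; only #Fields changes how
            # the data lines that follow it are interpreted.
            if line.startswith("#Fields:"):
                fields = line[len("#Fields:"):].split()
            continue
        if not fields:
            continue  # data before any #Fields directive cannot be interpreted
        values = line.split()  # assumed: values contain no embedded spaces
        yield dict(zip(fields, values))
```

A log may contain several #Fields lines (for example after a configuration change), which is why the field list is replaced whenever a new directive appears rather than read once at the top of the file.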
The Webalizer can process CLF, Apache and W3C Extended log files, as well as HTTP proxy log files produced by Squid servers. Other log file formats are usually converted to CLF in order to be analyzed. In addition, logs compressed with either gzip (.gz) or bzip2 (.bz2) can be processed directly, without being decompressed first.
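Reading compressed logs directly amounts to choosing a decompressing reader from the file name. A minimal Python sketch, assuming the compression type is inferred from the .gz or .bz2 extension; open_log is a hypothetical helper, not part of The Webalizer.

```python
import bz2
import gzip
from typing import TextIO


def open_log(path: str) -> TextIO:
    """Open a log file as text, decompressing .gz or .bz2 files on the fly."""
    # latin-1 maps every byte to a character, so malformed bytes in a
    # log line never raise a decoding error.
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="latin-1")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding="latin-1")
    return open(path, "r", encoding="latin-1")
```

Callers can then iterate over `with open_log("logfiles/access_log.gz") as f:` exactly as they would over an uncompressed file, so the parsing code does not need to know how the log was stored.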


Command line

The Webalizer is a command line application and is launched from the operating system shell prompt. A typical command is shown below.
webalizer -p -F clf -n en.wikipedia.org -o reports logfiles/access_log
This command instructs The Webalizer to analyze the log file logfiles/access_log in incremental mode (-p), to interpret the log as a CLF log file (-F), to use the host name en.wikipedia.org for report links (-n), and to write the output to the reports subdirectory of the current directory (-o). Use the -h option to see the complete list of command line options.
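Incremental mode lets logs be analyzed in pieces across several runs without counting the same records twice. The following Python sketch illustrates only the core idea, under the assumption that the saved state is a single "last processed timestamp" kept in a JSON file; the real program keeps much more state in its own file format, and incremental_filter and the JSON layout are hypothetical.

```python
import json
import os
from datetime import datetime
from typing import Iterable, List


def incremental_filter(timestamps: Iterable[datetime], state_path: str) -> List[datetime]:
    """Keep only records newer than the last timestamp saved by a previous run."""
    last = None
    if os.path.exists(state_path):
        with open(state_path, encoding="utf-8") as f:
            last = datetime.fromisoformat(json.load(f)["last_timestamp"])
    # Records at or before the saved timestamp were already counted.
    kept = [t for t in timestamps if last is None or t > last]
    if kept:
        # Remember the newest processed record for the next run.
        with open(state_path, "w", encoding="utf-8") as f:
            json.dump({"last_timestamp": max(kept).isoformat()}, f)
    return kept
```

Using a strict "newer than" comparison means re-running on an overlapping log never double-counts, at the cost of dropping any new records that share exactly the last saved timestamp.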


Configuration

Besides the command line options, The Webalizer may be configured through parameters in a configuration file. By default, The Webalizer reads the file webalizer.conf and interprets each line as a processing instruction. Alternatively, a user-specified file may be provided using the -c option. For example, to ignore all requests made from a particular group of hosts, the webmaster can use the IgnoreSite parameter to discard all log records whose IP address matches the specified pattern:
IgnoreSite        192.168.0.*
There are over one hundred available configuration parameters, which make The Webalizer a highly configurable web traffic analysis application. For a complete list of configuration parameters, refer to the README file included in the source and binary distributions.
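A configuration file of "Keyword value" lines, and a filter such as IgnoreSite, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not The Webalizer's code: parse_config and is_ignored are hypothetical names, lines starting with "#" are assumed to be comments, and the wildcard is matched with Python's shell-style fnmatchcase, which is not necessarily how The Webalizer itself matches patterns.

```python
from fnmatch import fnmatchcase
from typing import Dict, Iterable, List


def parse_config(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Collect 'Keyword value' lines; repeated keywords accumulate values."""
    config: Dict[str, List[str]] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)  # keyword, then the rest of the line
        if len(parts) != 2:
            continue  # a keyword without a value is ignored in this sketch
        keyword, value = parts
        config.setdefault(keyword, []).append(value.strip())
    return config


def is_ignored(host: str, patterns: List[str]) -> bool:
    """True if the host matches any IgnoreSite-style wildcard pattern."""
    return any(fnmatchcase(host, pattern) for pattern in patterns)
```

Keeping every value of a repeated keyword in a list matters for filters like IgnoreSite, which are typically given several times to exclude several groups of hosts.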


Reports

By default, The Webalizer produces two kinds of reports: a yearly summary report and a detailed monthly report for each analyzed month. The yearly summary report lists, for each month, the number of hits, file and page requests, hosts and visits, together with daily averages of these counters, and is accompanied by a yearly summary graph. Each monthly report is generated as a single HTML page containing:
*a monthly summary report, listing the overall number of hits, file and page requests, visits, hosts, etc.;
*a daily report, grouping these counters by day of the month;
*an aggregated hourly report, grouping the counters for the same hour of each day together;
*a URL report, grouping the collected information by URL;
*a host report, grouping by IP address;
*website entry and exit URL reports, showing the most common first and last URLs of visits;
*a referrer report, grouping the third-party URLs that lead to the analyzed website;
*a search string report, grouping requests by the search terms used in search engines such as Google;
*a user agent report, grouping by browser type;
*a country report, grouping by the host's country of origin.
Each of these standard HTML reports lists only the top entries for each item (e.g. the top 20 URLs); the number of lines in each report is set in the configuration. The Webalizer may also be configured to produce a separate report for each item that lists every entry, such as all website visitors or all requested URLs. In addition to HTML reports, The Webalizer may be configured to produce tab-delimited dump files, which list all of the report data in plain text. Dump files may be imported into spreadsheet applications or databases for further analysis.
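The monthly counters and top-N tables described above come from a single pass over the month's records. The Python sketch below shows one way to compute hits, pages, visits, hosts, daily and hourly groupings, and a top-N URL table. Two rules in it are assumptions rather than The Webalizer's documented behavior: a "page" is any path ending in one of the hypothetical PAGE_SUFFIXES, and a new visit starts when a host returns after more than 30 minutes of inactivity. Hit and monthly_report are hypothetical names.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Dict, Iterable, NamedTuple


class Hit(NamedTuple):
    host: str
    timestamp: datetime
    path: str


VISIT_TIMEOUT = timedelta(minutes=30)  # assumed idle gap that starts a new visit
PAGE_SUFFIXES = ("/", ".htm", ".html", ".php")  # hypothetical "page" rule


def monthly_report(hits: Iterable[Hit], top_n: int = 20) -> Dict[str, object]:
    """Aggregate one month of hits into summary, daily, hourly and URL tables."""
    daily: Counter = Counter()
    hourly: Counter = Counter()
    urls: Counter = Counter()
    last_seen: Dict[str, datetime] = {}
    total = pages = visits = 0
    for hit in sorted(hits, key=lambda h: h.timestamp):
        total += 1
        daily[hit.timestamp.day] += 1
        hourly[hit.timestamp.hour] += 1  # same hour of every day grouped together
        urls[hit.path] += 1
        if hit.path.endswith(PAGE_SUFFIXES):
            pages += 1
        prev = last_seen.get(hit.host)
        if prev is None or hit.timestamp - prev > VISIT_TIMEOUT:
            visits += 1
        last_seen[hit.host] = hit.timestamp
    return {
        "hits": total,
        "pages": pages,
        "visits": visits,
        "hosts": len(last_seen),
        "daily": dict(daily),
        "hourly": dict(hourly),
        "top_urls": urls.most_common(top_n),  # only the top entries are listed
    }
```

Counter.most_common(top_n) mirrors the "top entries only" behavior of the HTML reports; passing a large top_n approximates the separate full-listing reports.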


Internationalization

HTML reports may be produced in over 30 languages, including Catalan, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hungarian, Icelandic, Indonesian, Italian, Japanese, Korean, Latvian, Malay, Norwegian, Polish, Portuguese, Portuguese (Brazil), Romanian, Russian, Serbian, Simplified Chinese, Slovak, Slovene, Spanish, Swedish, Turkish and Ukrainian. Generating reports in another language requires a separate Webalizer binary compiled specifically for that language.


Criticism

*Generated statistics do not differentiate between human visitors and robots, so all reported metrics are higher than those due to people alone. Many webmasters claim that The Webalizer produces highly unrealistic visit counts, sometimes 200 to 900% higher than the figures produced by JavaScript-based web statistics services such as Google Analytics or StatCounter.
*Reported hits are too high for download managers that use segmented downloads, because each 206 "Partial Content" response is reported as a separate hit.
*There is no query string analysis, so pages of dynamically generated websites (e.g. PHP pages called with different arguments) cannot be listed separately.
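Each of the three shortcomings above can be addressed with a preprocessing step before aggregation. The Python sketch below is a hypothetical illustration, not a feature of The Webalizer: it drops requests whose user agent contains one of a few assumed robot keywords (real robot detection needs a maintained list), counts a segmented 206 download only once per host and URL, and keeps the query string as part of the URL so that dynamic pages stay distinct. Request, ROBOT_KEYWORDS, is_robot and human_hits are all hypothetical names.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Request(NamedTuple):
    host: str
    url: str  # path plus optional "?query", kept together on purpose
    status: int
    agent: str


# Hypothetical keyword list; matching is a simple case-insensitive substring test.
ROBOT_KEYWORDS = ("bot", "crawler", "spider", "slurp")


def is_robot(agent: str) -> bool:
    lowered = agent.lower()
    return any(keyword in lowered for keyword in ROBOT_KEYWORDS)


def human_hits(requests: Iterable[Request]) -> List[Request]:
    """Drop robot traffic and count segmented (206) downloads once per host and URL."""
    seen_partial: Set[Tuple[str, str]] = set()
    kept: List[Request] = []
    for r in requests:
        if is_robot(r.agent):
            continue
        if r.status == 206:
            key = (r.host, r.url)
            if key in seen_partial:
                continue  # another segment of a download already counted
            seen_partial.add(key)
        kept.append(r)
    return kept
```

Substring matching on user agents is crude and easily fooled, which is one reason JavaScript-based services, which only count browsers that execute scripts, report lower figures.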


See also

*List of web analytics software


External links

*Official website: http://www.webalizer.net/ (free web analytics software)