ApaLogFilter (V0.99.010)

Abstract

ApaLogFilter is a tool, released under the GNU General Public License, that filters and manipulates web server logfiles in NCSA Combined Log Format. Its purpose is

  • to reduce the logfile to the lines that are relevant for the statistics,
  • to manipulate the data in the entries so that they can be interpreted more easily.
The analysis itself can be done with programs such as Analog or Webalizer. ApaLogFilter complements them by providing a preprocessing step that raises the quality of their reports.
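The filtering rules described below operate on individual fields of a Combined Log Format line: remote host, status, request URL and referrer. As an illustration only, and not code from ApaLogFilter itself (which is a Perl script), the following Python sketch splits such a line into its fields and writes it back. The names `parse_line`, `request_url` and `format_entry` are hypothetical, and the regular expression assumes that quoted fields contain no escaped quotes.

```python
import re
from typing import NamedTuple, Optional

# host ident authuser [time] "request" status size "referer" "user-agent"
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


class LogEntry(NamedTuple):
    host: str
    ident: str
    user: str
    time: str
    request: str
    status: str
    size: str
    referer: str
    agent: str


def parse_line(line: str) -> Optional[LogEntry]:
    """Split one Combined Log Format line into fields; None if it does not match."""
    m = COMBINED_RE.match(line.rstrip("\n"))
    return LogEntry(**m.groupdict()) if m else None


def request_url(e: LogEntry) -> str:
    """Return the URL part of a request line such as 'GET /x.htm HTTP/1.1'."""
    parts = e.request.split()
    return parts[1] if len(parts) >= 2 else e.request


def format_entry(e: LogEntry) -> str:
    """Write an entry back out in Combined Log Format."""
    return (f'{e.host} {e.ident} {e.user} [{e.time}] "{e.request}" '
            f'{e.status} {e.size} "{e.referer}" "{e.agent}"')
```

For example, `request_url(parse_line(line))` yields the path from the request line, such as `/html/s00c000m.htm`, and `format_entry` reproduces the original line when no field has been changed.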

Figure 1: Files of a "normal" page

To illustrate the use of the program, consider the case shown in Figure 1: a page and the files it is made of. When a client requests this page, the web server adds a log entry for every file.

The person who has to write a report is only interested in the fact that someone accessed the page, so one entry instead of ten is enough. In our case, they want to know that someone accessed the whole frameset, and for better interpretation they might want to replace the file name s00c000m.htm with a clearer name such as Home.htm.

This, in a nutshell, is what ApaLogFilter does.

Download

The script, config file and documentation are packed into one tarball.

ApaLogFilter-0.99.010.tar.gz

Arguments

The program needs the following arguments:
Call ........: >>--- ApaLogFilter.pl --- -in FILE --- -out FILE --->

                >--- -exf FILE ---><

Sample ......: ApaLogFilter.pl -in access.log -out access.log.mod -exf access.log.err

Arguments ...: -in FILE    Fully qualified name of the logfile to process.
               -out FILE   Fully qualified name of the ResultFile.
               -exf FILE   Fully qualified name of the ExceptionFile.
   
Figure 2: Syntax diagram for ApaLogFilter
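Anyone who wants to re-implement the filter in Python could mirror the documented command line with the standard argparse module. This is a hypothetical front end, not part of the distributed Perl script; the destination names `infile`, `outfile` and `exceptfile` are assumptions (`in` is a reserved word in Python, so it cannot serve as an attribute name).

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Hypothetical parser for: ApaLogFilter -in FILE -out FILE -exf FILE."""
    parser = argparse.ArgumentParser(
        prog="ApaLogFilter",
        description="Filter and rewrite a Combined Log Format file.")
    # argparse accepts single-dash multi-letter options such as "-in".
    parser.add_argument("-in", dest="infile", required=True, metavar="FILE",
                        help="fully qualified name of the logfile to process")
    parser.add_argument("-out", dest="outfile", required=True, metavar="FILE",
                        help="fully qualified name of the ResultFile")
    parser.add_argument("-exf", dest="exceptfile", required=True, metavar="FILE",
                        help="fully qualified name of the ExceptionFile")
    return parser.parse_args(argv)
```

Called with the sample arguments from Figure 2, `parse_args(["-in", "access.log", "-out", "access.log.mod", "-exf", "access.log.err"])` returns a namespace whose `infile`, `outfile` and `exceptfile` hold the three file names; a missing argument makes argparse print a usage message and exit.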

Usage

Modify ApaLogFilter.cfg to fit your environment:

  • Status.<number>
    Instructions for handling the "status" field. The records are numbered consecutively, starting at 1, and consist of the following subkeys:
    • .Value = The status code for which the action is defined.
    • .Action = The action to perform when the status code matches:
      • FORCE_RECORD_IN_EXCEPTION_LIST = The record is written to the ExceptionFile without checking or processing any of the keys that follow in this list. This ensures that invalid requests are not lost, while moving them into a separate file keeps them from distorting the statistics.
      • REPLACE_VALUE = The status code is replaced by another one. This is useful for converting status 302 to 200, so that requests to your redirect CGI show up in a request report instead of a redirection report.
    • .NewValue = If the action is REPLACE_VALUE, this field contains the new status code.

  • RemoteHost.<number>
    Instructions for handling the "RemoteHost" field. The records are numbered consecutively, starting at 1, and consist of the following subkeys:
    • .Value = A regular expression matching the name of the client accessing your server.
    • .Action = The action to perform when the expression matches:
      • EXCLUDE_RECORD = The record is excluded from the output. This is useful, for example, to drop records caused by your own requests.

  • RequestUrl.<number>
    Instructions for handling the "RequestURL" field. The records are numbered consecutively, starting at 1, and consist of the following subkeys:
    • .Title = The string written to the output if one of the values matches.
      With the token $ALL, the original URL is used for the output.
      The token $FILE is replaced by everything after the last "/" of the original URL.
    • .Value.<number> = One string or regular expression per number (starting at 1).
    • .Type.<number> = Indicates whether a string or a regular expression was given:
      • STRING = The value is an ordinary string.
      • REGEXP = The value is a regular expression.

    => If there are no RequestUrl.<number>... entries, the program writes all rows of the AccessLogFile to the ResultFile (except those going to the ExceptionFile). If there are RequestUrl.<number>... entries, the program writes to the ResultFile only those rows of the AccessLogFile that match one of the RequestUrl.<number>.Value.<number> entries (again except those going to the ExceptionFile).

  • ReferredUrl.<number>
    Instructions for handling the "ReferredURL" field. The records are numbered consecutively, starting at 1, and consist of the following subkeys:
    • .Value = The prefix with which the referred URL begins.
    • .Action = The action to perform when the prefix matches:
      • REPLACE_WITH_BLANK = The referred URL is set to "-". This is mostly useful for reducing the amount of information in the output.

Figure 3: Keys in ApaLogFilter.cfg
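All keys follow the pattern Section.<number>.Subkey, with a further .<number> for the Value and Type entries of RequestUrl. A minimal Python sketch of reading such a file into numbered records might look as follows. It is an assumption about the format inferred from Figures 3 and 4, not the script's actual parser; `parse_config` is a hypothetical name. It splits each line at the first "=" only, because values such as the p9.pl pattern in Figure 4 contain "=" themselves, and it treats every line starting with "#" as a comment.

```python
from collections import defaultdict
from typing import Dict


def parse_config(text: str) -> Dict[str, Dict[int, dict]]:
    """Group 'Section.N.Sub[.M] = value' lines into numbered records (sketch)."""
    config: Dict[str, Dict[int, dict]] = defaultdict(lambda: defaultdict(dict))
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank lines, comments and separator lines
        # Split on the FIRST '=' only: values may contain '=' themselves.
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key/value line
        parts = key.strip().split(".")
        section, number, sub = parts[0], int(parts[1]), parts[2]
        if len(parts) == 4:  # e.g. RequestUrl.4.Value.2
            config[section][number].setdefault(sub, {})[int(parts[3])] = value.strip()
        else:                # e.g. Status.1.Action
            config[section][number][sub] = value.strip()
    # Convert to plain dicts with the records sorted by number.
    return {s: dict(sorted(recs.items())) for s, recs in config.items()}
```

Applied to the Status block of Figure 4, `parse_config(...)["Status"][1]` would be `{"Value": "302", "Action": "REPLACE_VALUE", "NewValue": "200"}`.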

###############################################################################
##                                                                           ##
##   A P A L O G F I L T E R . C F G                                         ##
##                                                                           ##
###############################################################################


#+----------------------------------------------------------------------------+
#| Instructions for handling the "status" field.                              |
#+----------------------------------------------------------------------------+
Status.1.Value    = 302
Status.1.Action   = REPLACE_VALUE
Status.1.NewValue = 200
#- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - --
Status.2.Value    = 403
Status.2.Action   = FORCE_RECORD_IN_EXCEPTION_LIST
#- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - --
Status.3.Value    = 404
Status.3.Action   = FORCE_RECORD_IN_EXCEPTION_LIST


#+----------------------------------------------------------------------------+
#| Instructions for handling the "remotehost" field.                          |
#+----------------------------------------------------------------------------+
RemoteHost.1.Value  = proxy[0-9]+\.sap-ag\.de
RemoteHost.1.Action = EXCLUDE_RECORD


#+----------------------------------------------------------------------------+
#| Instructions for handling the "request-URL" field.                         |
#+----------------------------------------------------------------------------+
RequestUrl.1.Title   = /Index.htm
RequestUrl.1.Value.1 = /html/s00c000m\.htm
RequestUrl.1.Type.1  = STRING
##### --
##### Means that "/html/s00c000m.htm" is matched and
##### replaced by "/Index.htm" (for the output)
#- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - --
RequestUrl.2.Title   = /Gallery/Index.htm
RequestUrl.2.Value.1 = /html/s01c000m\.htm
RequestUrl.2.Type.1  = STRING
#- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - --
RequestUrl.3.Title   = /Gallery/Schottland.htm
RequestUrl.3.Value.1 = /html/s01c800b\.htm
RequestUrl.3.Type.1  = STRING
#- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - --
RequestUrl.4.Title   = /Gallery/BigPic/$FILE
RequestUrl.4.Value.1 = /cgi-bin/p9\.pl\?STYLE=1&FILE=/pics/f[0-9][0-9][0-9]n[0-9][0-9]b\.jpg
RequestUrl.4.Type.1  = REGEXP
RequestUrl.4.Value.2 = /cgi-bin/p9\.pl\?STYLE=1&FILE=/pics/fp[0-9][0-9][0-9][0-9]b\.jpg
RequestUrl.4.Type.2  = REGEXP
##### --
##### Means that e.g. "/cgi-bin/p9.pl?STYLE=1&FILE=/pics/f042n42b.jpg" is matched and
##### replaced by     "/Gallery/BigPic/f042n42b.jpg" (for the output)
#- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - --
RequestUrl.5.Title   = $ALL
RequestUrl.5.Value.1 = /favicon.ico
RequestUrl.5.Type.1  = STRING
##### --
##### Means that "/favicon.ico" is matched and
#####            "/favicon.ico" is also used for the output.
#- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - --
#...


#+----------------------------------------------------------------------------+
#| Instructions for handling the "referred-URL" field.                        |
#+----------------------------------------------------------------------------+
ReferredUrl.1.Value  = http://www.scheibli.com
ReferredUrl.1.Action = REPLACE_WITH_BLANK
#- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - --
ReferredUrl.2.Value  = http://sites.inka.de/scheibli
ReferredUrl.2.Action = REPLACE_WITH_BLANK
   
Figure 4: Sample ApaLogFilter.cfg
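Taken together, the keys describe a per-record pipeline. The Python sketch below applies a renumbered subset of the Figure 4 configuration to one already-parsed log entry, in the order in which Figure 3 lists the keys. Several details are assumptions, not documented behavior: the processing order, that the first matching Status rule wins, that STRING values are compared for exact equality (so the backslashes from the sample are omitted), that REGEXP values and RemoteHost patterns are searched unanchored as with Perl's =~, and the hypothetical entry field names host, status, url and referer.

```python
import re
from typing import Optional, Tuple

# Renumbered subset of Figure 4, keyed like Section.N.Subkey[.M].
# Assumption: STRING values are compared literally, so no backslashes.
CONFIG = {
    "Status": {
        1: {"Value": "302", "Action": "REPLACE_VALUE", "NewValue": "200"},
        2: {"Value": "403", "Action": "FORCE_RECORD_IN_EXCEPTION_LIST"},
        3: {"Value": "404", "Action": "FORCE_RECORD_IN_EXCEPTION_LIST"},
    },
    "RemoteHost": {
        1: {"Value": r"proxy[0-9]+\.sap-ag\.de", "Action": "EXCLUDE_RECORD"},
    },
    "RequestUrl": {
        1: {"Title": "/Index.htm",
            "Value": {1: "/html/s00c000m.htm"}, "Type": {1: "STRING"}},
        2: {"Title": "/Gallery/BigPic/$FILE",
            "Value": {1: r"/cgi-bin/p9\.pl\?STYLE=1&FILE=/pics/f[0-9][0-9][0-9]n[0-9][0-9]b\.jpg"},
            "Type": {1: "REGEXP"}},
        3: {"Title": "$ALL", "Value": {1: "/favicon.ico"}, "Type": {1: "STRING"}},
    },
    "ReferredUrl": {
        1: {"Value": "http://www.scheibli.com", "Action": "REPLACE_WITH_BLANK"},
    },
}


def url_matches(rule: dict, url: str) -> bool:
    """Assumption: STRING = exact comparison, REGEXP = unanchored re.search."""
    for n, value in sorted(rule["Value"].items()):
        kind = rule["Type"][n]
        if kind == "STRING" and url == value:
            return True
        if kind == "REGEXP" and re.search(value, url):
            return True
    return False


def apply_rules(entry: dict, cfg: dict = CONFIG) -> Tuple[str, Optional[dict]]:
    """Classify one entry as ('exception', e), ('excluded', None) or ('result', e)."""
    e = dict(entry)  # work on a copy; keys: host, status, url, referer
    # 1. Status: divert to the ExceptionFile, or rewrite the code (first match wins).
    for _, rule in sorted(cfg.get("Status", {}).items()):
        if e["status"] == rule["Value"]:
            if rule["Action"] == "FORCE_RECORD_IN_EXCEPTION_LIST":
                return "exception", e  # remaining keys are not checked
            if rule["Action"] == "REPLACE_VALUE":
                e["status"] = rule["NewValue"]
            break
    # 2. RemoteHost: drop records from matching clients.
    for _, rule in sorted(cfg.get("RemoteHost", {}).items()):
        if rule["Action"] == "EXCLUDE_RECORD" and re.search(rule["Value"], e["host"]):
            return "excluded", None
    # 3. RequestUrl: if any rules exist, keep only matching URLs and retitle them.
    rules = [r for _, r in sorted(cfg.get("RequestUrl", {}).items())]
    if rules:
        title = next((r["Title"] for r in rules if url_matches(r, e["url"])), None)
        if title is None:
            return "excluded", None
        url = e["url"]
        e["url"] = title.replace("$ALL", url).replace("$FILE", url.rsplit("/", 1)[-1])
    # 4. ReferredUrl: blank out referrers that begin with a configured prefix.
    for _, rule in sorted(cfg.get("ReferredUrl", {}).items()):
        if rule["Action"] == "REPLACE_WITH_BLANK" and e["referer"].startswith(rule["Value"]):
            e["referer"] = "-"
            break
    return "result", e
```

For example, a 302 request for /cgi-bin/p9.pl?STYLE=1&FILE=/pics/f042n42b.jpg with a referrer under http://www.scheibli.com comes out as a 200 request for /Gallery/BigPic/f042n42b.jpg with the referrer "-", while a request for an unlisted URL is dropped.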

Feedback

If you have comments, have added functionality or have found a bug, please feel free to contact me.