WebLog

                             DOCUMENTATION

         WebLog 1.05 by Darryl C. Burgdorf (burgdorf@awsd.com)

                    http://awsd.com/scripts/weblog/

WebLog is a comprehensive access log analysis tool.  It allows you
to keep track of activity on your site by month, week, day and hour,
monitor total hits, bytes transferred and unique domains visiting, and
keep track of your most popular pages.  It can even print out a
secondary report which tracks "user sessions," showing the path taken
through your site by each visitor and giving you a rough idea of how
long they spent looking at your pages.

              ===========================================

I.  THE REPORTS

The primary WebLog access report provides the following information:

    A.  Long-Term Statistics

        1.  Monthly Statistics:  An overview of site activity (number
            of hits, number of bytes transferred, and number of unique
            user sessions) per month for each month since you started
            running WebLog.
        2.  Daily Statistics (Past Five Weeks):  An overview of site
            activity per day for the past five weeks.
        3.  Day of Week Statistics:  An overview of site activity by
            weekday, maintained as a running total since you started
            running WebLog.
        4.  Hourly Statistics:  An overview of site activity by hour of
            the day, maintained as a running total.

        Each of the "Long-Term Statistics" reports includes a simple
        "bar graph" representation of the number of bytes transferred.

    B.  Statistics for the Current Month

        1.  Current Month Summary:  A quick synopsis of site activity
            since the beginning of the current calendar month.
        2.  Top N Files by Number of Hits (optional):  A list of the
            pages most frequently requested.
        3.  Top N Files by Volume (optional):  A list of the pages which
            resulted in the greatest number of bytes transferred.
        4.  Complete File Statistics:  A list of all pages accessed in
            the current calendar month, with the date of last access,
            number of times requested, and total number of bytes
            transferred.
        5.  Top N Most Frequently Requested 404 Files (optional):  A
            list of the pages people are requesting most often which
            don't actually exist on your site.
        6.  Complete 404 File Not Found Statistics (optional):  A
            complete list of the nonexistent files.
        7.  "Top Level" Domains:  A breakdown of how many people are
            visiting from each type of domain (.com, .net, .edu, etc.).
        8.  Top N Domains by Number of Hits (optional):  A list of the
            IP addresses (domains) from which people have visited your
            site most often.
        9.  Top N Domains by Volume (optional):  A list of the IP
            addresses from which people have requested the greatest
            amount of information.
        10. Complete Domain Statistics (optional):  A complete list of
            the IP addresses from which people have visited your site
            since the beginning of the current calendar month.

The optional access details report keeps track of "user sessions."  It
will show you detailed "tracks" of the paths taken through your site by
visitors for however many days you specify, and will give you overview
information regarding how many unique visitors you've had each day and
how long they seem to be staying around.  It will also provide you with
a running tally of the total number of user sessions and the apparent
average visit length since you started running WebLog.

    (CAVEAT:

    (Like any log analysis software, WebLog is based squarely upon
    several unfortunately questionable assumptions.  Chief among these
    is the assumption that any accesses from a specific IP address
    within a reasonably short period of time belong to a single user,
    and the assumption that analysis of access logs can actually tell
    you anything useful about site visitors, anyway.

    (It is possible for different users to access your site with the
    same IP address, so a single "user session" might actually reflect
    visits from multiple users.  As well, thanks to the number of
    systems which now employ local caching, it is quite likely that some
    of the pages which seem to be accessed only once are in actuality
    viewed many times by many different users.

    (For more information on these problems, you might want to read "On
    Interpreting Access Statistics" <http://www.cranfield.ac.uk/stats/>
    by Cranfield University's Jeff Goldberg.

    (WebLog also assumes that the time between the loading of one page
    and the loading of the next, so long as it is less than 30 minutes,
    is actually spent looking at the first page.  This is clearly not
    necessarily the case.  The user could have gotten up to fix himself
    lunch or use the bathroom.  He could have reloaded another page
    already in his browser's cache, or could even have gone to look at
    pages on other sites before returning to yours.  There is no way of
    knowing.

    (Finally, WebLog assumes that the average length of time spent
    viewing the last -- or only -- page visited in a user session is 30
    seconds.  Again, there is obviously no way to check the validity of
    this assumption.)
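
The timing assumptions described in the caveat above can be illustrated
with a short Perl sketch.  It is not taken from weblog.pl; the
subroutine name session_lengths and the constant names are hypothetical,
and the input is assumed to be a sorted list of hit times (in seconds)
from a single IP address.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Illustrative constants based on the assumptions described above.
my $SESSION_GAP = 30 * 60;   # a gap of 30 minutes or more starts a new session
my $LAST_PAGE   = 30;        # seconds credited to the last page of a session

# Given sorted hit times (in seconds) from one IP address, return the
# estimated length in seconds of each user session.
sub session_lengths {
    my @times = @_;
    return () unless @times;
    my @lengths;
    my $current = 0;
    for my $i (1 .. $#times) {
        my $gap = $times[$i] - $times[$i - 1];
        if ($gap < $SESSION_GAP) {
            $current += $gap;    # time assumed spent on the previous page
        } else {
            push @lengths, $current + $LAST_PAGE;   # close the session
            $current = 0;
        }
    }
    push @lengths, $current + $LAST_PAGE;   # close the final session
    return @lengths;
}

# Hits at 0, 60 and 300 seconds, then one more about two hours later.
my @lengths = session_lengths(0, 60, 300, 7500);
print join(", ", @lengths), "\n";   # prints "330, 30"
```

The first three hits form one session of 300 seconds plus the assumed
30 seconds for its last page; the late hit becomes a second session
consisting of a single page.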

              ===========================================

II.  SETTING UP AND RUNNING WEBLOG

The files that you need are as follows:

weblog.pl:  This is the main program file.  You don't actually need to
  do anything to it; in fact, you don't even have to execute it.

config.pl:  This is the configuration file.  Everything you need to
  change or modify is contained here.  This is also the file that you
  will execute.  (Things are set up this way so that you can effectively
  maintain multiple versions of the script, for example if you want to
  run separate log analyses for different sites, just by keeping
  separate config files for each.)

countrys.txt:  An index list of "translations" for domain extensions.

bar1.gif, bar2.gif, bar3.gif, bar4.gif, bar5.gif and bar6.gif:  These
  six small graphics files are used to create the bar graphs in the main
  access report.

The script is intended to be run daily (via cron, preferably) on a
standard NCSA-format access log detailing the previous day's activity.
If it is not run in that manner, some of the report's information will
not be accurate!
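
For orientation, the following Perl sketch shows roughly what one line
of an NCSA-format access log contains.  It is an illustration only, not
the parsing code used by weblog.pl; the subroutine name parse_log_line
and the sample log line are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Parse one line in NCSA common log format.  Lines in "combined" format
# carry extra quoted fields at the end, which this pattern ignores.
# Returns a hash reference, or nothing if the line does not match.
sub parse_log_line {
    my ($line) = @_;
    return unless $line =~
        m/^(\S+) \S+ \S+ \[([^\]]+)\] "([^"]*)" (\d{3}) (\S+)/;
    my ($host, $when, $request, $status, $bytes) = ($1, $2, $3, $4, $5);
    my (undef, $path) = split ' ', $request;   # e.g. "GET /page.html HTTP/1.0"
    return {
        host   => $host,
        when   => $when,
        path   => $path,
        status => $status,
        bytes  => ($bytes eq '-' ? 0 : $bytes),   # "-" means no bytes sent
    };
}

# A made-up sample line.
my $sample = 'user1.foo.com - - [10/Oct/1998:13:55:36 -0700] '
           . '"GET /scripts/index.html HTTP/1.0" 200 2326';
my $hit = parse_log_line($sample);
print "$hit->{host} $hit->{path} $hit->{status} $hit->{bytes}\n";
```

The fields extracted here (host, date, requested path, status code and
byte count) are the kind of data the reports in Section I are built
from.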

As noted above, the WebLog configuration file, and not the WebLog
program itself, should be executed.  The configuration file should, of
course, be set executable.  Make sure that the first line of the script
matches the location of your system's Perl interpreter.  As well, the
following variables need to be defined:

$CountriesFile:  The filename of the domain extensions index.  (So long
  as you have all the files in the same directory, and haven't changed
  the file names, $CountriesFile = "countrys.txt" should suffice.
  Otherwise, you should provide the *full* path to the file.)

$LogFile:  The filename of the daily NCSA-format access log from which
  the log report will be generated.  Provide the *full* (absolute) path
  to the file.  Note that this file is generated by your server; if
  you're not sure where to find it or what it's called, check with
  your system administrators.  It is possible, though not likely, that
  you don't actually have access to log data.  If that is the case, then
  you won't be able to run WebLog at all.

$logtype:  This should be set to "standard" (or left undefined) if your
  server provides logs which do not include browser and referrer info,
  and to "combined" if your log files do include that information.
  (WebLog doesn't parse that information, but the fact that it's there
  will make a difference in how the rest of the data is read.)

$ReportFile and $ReportURL:  The filename and corresponding URL for the
  main report.  You'll need to create this file before the first time
  you run WebLog.  (You can do so with the UNIX "touch" command.)
  The filename should be specified as a *full* path.

$DetailsFile and $DetailsURL:  The filename and corresponding URL for
  the details report.  (If these variables are left undefined, the
  details report will not be generated.)  Again, you'll need to create
  this file before the WebLog program can use it!  Also, again, you'll
  want to specify the *full* path to the file.  The details report is
  the optional report which tracks "user sessions."

$SystemName:  The name or description which you want to appear at the
  top of your reports (e.g., "WebScripts").

$OrgName and $OrgDomain:  The name and domain of the "host" organization
  (e.g., ISP and isp.com).  If these variables are defined, accesses
  from this organization/domain will be counted separately from other
  accesses in the details report.

$GraphURL:  The URL of the directory containing the bar graph images
  (e.g., "http://awsd.com/graphs").  Do NOT include a trailing slash!

$IncludeOnlyRefsTo and $ExcludeRefsTo:  Regexes specifying files or
  directories to include or ignore in the file lists.  For example, to
  include only files in a "scripts" subdirectory, $IncludeOnlyRefsTo =
  "^/scripts" would suffice.  Multiple entries should be "OR"ed
  (e.g., $IncludeOnlyRefsTo = "(^/dir1|^/dir2)").

$IncludeOnlyDomain and $ExcludeDomain:  Regexes specifying domains to
  include or ignore in the domain lists.  (This, of course, is
  irrelevant if you're not printing domain lists.)

$IncludeQuery:  If this variable is set to "0", any query information
  contained in a URL will be stripped as the log file is processed.  If
  it is set to "1", the information will be retained.

$Print404:  A flag specifying whether the "Code 404" file lists should
  be printed.  0 = no; 1 = yes.

$PrintDomains:  A flag specifying whether or not to print lists of
  visiting IP addresses.  0 = no; 1 = yes.  NEW IN 1.04:  This variable
  can now also be set to "2" to indicate that you want only second-level
  domains tracked.  (For example, one hit each from user1.foo.com and
  user2.foo.com will show up simply as two hits from foo.com, which can
  greatly reduce the size of your report file, especially if your site
  is busy!)

$PrintTopNFiles:  The number of files to include in the "Top N Files"
  lists.  Set to 0 if you don't want to print the lists.

$TopFileListFilter:  Regex defining files to exclude from the "Top N
  Files" lists.  The default value of "(\.gif|\.jpg|\.jpeg|Code 404)"
  will filter out most image files and any frequently requested but
  nonexistent files.

$PrintTopNDomains:  The number of domains to include in the "Top N
  Domains" lists.  (This, of course, is irrelevant if you're not
  printing domain lists.)

$DetailsFilter:  A regex defining files to exclude from the details
  report.  (This, of course, is irrelevant if you're not printing a
  details report.)  The default value of "(\.gif|\.jpg|\.jpeg)" will
  filter out most image files, making it easier to follow which actual
  pages were viewed, and allowing a (theoretically) more accurate
  tracking of the time spent on each page.

$DetailsDays:  The number of past days to include in the details report.
  (Again, this is only relevant if you're actually printing a details
  report.)  The number cannot be greater than 35.

$NoSessions:  If set to "1", this variable will instruct WebLog *not* to
  include unique session counts on the monthly, daily and day-of-week
  lists.
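
As a rough illustration, the variable settings described above might
look something like the following in config.pl.  Every path, URL and
number here is a hypothetical placeholder, not a recommended value, and
only the variable assignments are shown; the rest of the shipped
config.pl should be left as distributed.  Single quotes are used for the
two filter regexes so that the backslashes reach the regex engine
unchanged.

```perl
# Hypothetical example values; adjust every path and URL to your system.
$CountriesFile     = "countrys.txt";
$LogFile           = "/usr/local/apache/logs/access_log";
$logtype           = "combined";     # or "standard"

# Both report files must already exist before WebLog is first run.
$ReportFile        = "/home/user/public_html/stats/index.html";
$ReportURL         = "http://www.example.com/stats/index.html";
$DetailsFile       = "/home/user/public_html/stats/details.html";
$DetailsURL        = "http://www.example.com/stats/details.html";

$SystemName        = "WebScripts";
$OrgName           = "ISP";
$OrgDomain         = "isp.com";
$GraphURL          = "http://www.example.com/graphs";   # no trailing slash

$IncludeOnlyRefsTo = "^/scripts";    # only files under /scripts
$IncludeQuery      = 0;              # strip query strings from URLs
$Print404          = 1;              # print the "Code 404" lists
$PrintDomains      = 2;              # track second-level domains only
$PrintTopNFiles    = 10;
$TopFileListFilter = '(\.gif|\.jpg|\.jpeg|Code 404)';
$PrintTopNDomains  = 10;
$DetailsFilter     = '(\.gif|\.jpg|\.jpeg)';
$DetailsDays       = 7;              # cannot be greater than 35
$NoSessions        = 0;              # keep unique session counts
```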

              ===========================================

This documentation assumes that you have at least a general familiarity
with setting up Perl scripts.  If you need more specific assistance,
check with your system administrators, consult the WebScripts FAQs
(frequently-asked questions) file <http://awsd.com/scripts/faqs.shtml>,
or ask on the WebScripts Forum <http://awsd.com/scripts/forum/>.

-- Darryl C. Burgdorf