Logs

Web Ten can be configured to have either Apache or Squid do its logging. When using Apache, Web Ten for Mac OS supports two logging formats: the WebSTAR log format and the CLF (Common Log Format), which also includes the Extended Log Format. When Web Ten uses the Squid cache, Squid does all the logging in CLF. Using the Squid cache improves server performance (since Squid caches content in memory), but its log format is much less flexible than Apache's. If the Squid cache is on, the Squid cache accelerator is responsible for logging activity; if caching is off, Apache keeps the logs. The Logs section is divided into sections describing these two possibilities.

13.1 Apache Logging

The Apache-based logging is controlled via the Administration Server (http://servername/webten_admin). If you decide to use Apache logging, you must start by turning off the Squid cache (see section Cache Settings).

To set a custom log file for each virtual host, use the TransferLog setting of the Virtual Host Config page (see section TransferLog), or use the TransferLog setting in the Server Defaults (see section TransferLog) to set a log file for all site activity. If no file is set as the TransferLog, Apache will only log to the disk if the Display Access Log window is opened.

A LogFormat must also be set for the TransferLog file to be used (see section LogFormat). The standard WebSTAR log format or the default standard CLF (Common Log Format) can be automatically entered into this field. In the LogFormat value field, you can also set a unique format rather than using one of the standard formats. Note that if you use the toggle switch, the Admin Server will replace any custom format symbols you have entered with one of the standard configurations.

To create a custom format, enter a combination of format symbols into the LogFormat field. For example, "%h %l %u %t %r %b" would be a functional format setting (don't forget the quotation marks). The subsections that follow describe some possible Apache log formats and the log format element symbols.
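Once a format is in place, the resulting log can be summarized with standard Unix tools. The following is a minimal sketch, not part of Web Ten: it assumes the default CLF layout, in which the bracketed timestamp occupies two whitespace-separated fields and the quoted request occupies three, so that the status code is field 9. The function name clf_status_counts and the file name in the usage note are hypothetical.

```shell
#!/bin/sh
# Sketch only: count requests per HTTP status code in a CLF log.
# Assumes lines shaped like:
#   host logname user [date zone] "METHOD /path PROTOCOL" status bytes
# so that awk sees the status code as field 9.
clf_status_counts() {
    awk '{ count[$9]++ } END { for (s in count) print s, count[s] }' "$1" | sort
}
```

For example, "clf_status_counts access.log" would print one line per status code, followed by the number of requests that returned it. A custom LogFormat changes the field positions, so the field number would need adjusting.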

13.1.1 Log File Format Symbol Definitions

%W (Log records in WebSTAR format)

%h (The hostname of the client, or IP number if hostname is not available or if DNSLookup is off.)

%u (remote user from authorization, if any.)

%t (date and time in CLF format: (day/month/year:hour:minute:second zone.))

%r (first line of request exactly as it came from the client, i.e., the file name and protocol requested.)

%s (original http request status code returned to client before internal redirection. Indicates whether or not the file was successfully retrieved, and if not, what error message was returned.)

%>s (final http request status code.)

%b (number of bytes sent, not including headers.)

%U (url path requested)

%T (transfer time or time taken to serve a request in seconds)

%p (TCP port of the server servicing the request)

%P (process ID of the server servicing the request)

%l (the client's remote logname, if supplied)

%v (name of (virtual) server servicing request)

%w (WebSTAR result, i.e. "OK", "ERR", or "PRIV")

%d (date and time in WebSTAR format)

%{}n (contents of note from another module in brackets)

%{}i (Input header item in brackets)

%{}o (output header item)

%{Referer}i (The URL the client was on before requesting your URL.)

%{User-agent}i (The identity of the client software (browser.))

13.1.1.1 Some Configuration Examples:

Standard WebSTAR Format

DATE TIME RESULT HOSTNAME URL BYTES_SENT

"%W %d %w %h %>U %b"

Standard Common Log Format (CLF) (Default)

HOSTNAME LOGINNAME USER GMT_TIME "REQUEST" STATUS BYTES_SENT

"%h %l %u %t \"%r\" %>s %b"

WebSTAR Custom Format Example

HOSTNAME USER_AGENT USER GMT_TIME 'REQUEST' RESULT STATUS BYTES_SENT

"%W %h %{User-Agent}i %>u %t '%r' %w %>s %b"

13.2 Squid Logging

When the cache is enabled, Squid does all logging for the web server. When Squid is enabled, the directive "CacheTransferLog" in the httpd.conf file specifies the TransferLog file. When Squid is disabled, Apache takes responsibility for logging and uses the TransferLog directive with the transfer log file specified in the Administration Server. The default Squid log format is:

Client Ident - [Timestamp1] "Method URI" Type Sizes

This logging configuration cannot be changed, but you can add HTTP Header Fields. Near the end of the squid.conf file, you will notice the lines:

# TAG: log_http_hdrs

# Append individual HTTP request headers to CLF log entry

#log_http_hdrs Referer User-Agent

The last line is an example implementing the most popular HTTP Header Fields, Referer and User-Agent. Using the same format, you can add any Header Field you want. Or you can remove the # on the last line and log these example Header Fields.
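Removing the "#" can be done in any text editor. As an alternative, the following is a small sketch that uses sed to print a copy of the file with the example line enabled. The function name enable_log_http_hdrs and the output file name in the usage note are hypothetical, and the sketch assumes the commented line begins exactly with "#log_http_hdrs", as in the excerpt above.

```shell
#!/bin/sh
# Sketch only: print squid.conf with the example log_http_hdrs line enabled.
# Only a line starting with "#log_http_hdrs" is changed; the "# TAG:" comment
# line is left alone because it has a space after the "#".
enable_log_http_hdrs() {
    sed 's/^#log_http_hdrs/log_http_hdrs/' "$1"
}
```

For example, "enable_log_http_hdrs squid.conf > squid.conf.new" writes the modified copy to a new file, which can be reviewed before it replaces the original.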

13.2.1 Squid Log Rolling and Splitting

13.2.1.1 Log Splitting

A common task for web administrators is to split the logs into separate files, so that they can be processed by a program such as Funnel Web Pro, placed into the virtual host folders where virtual-host site administrators can view them, or both.

A log splitting script for Squid-generated logs is included on the Web Ten CD ROM. This script archives the main WebTen.log file and splits it into separate files for each virtual host. To install the script, follow the instructions in the readme file that accompanies it (these include editing the crontab file for automatic log rolling).

13.2.1.2 Log Rolling

This log rolling method uses Squid's internal log rolling features to roll the logs. This may be more convenient if you do not want your logs split up into your separate virtual hosts. The first step involves editing the squid.conf file that resides in the tenon:squid:etc folder in the WebTen folder.

Under the "LOGFILE PATHNAMES AND CACHE DIRECTORIES" section in the squid.conf file, change "cache_access_log" to the directory and the name you want your logs to be. Example: We have two servers, so for Web01 we do "cache_access_log /usr/local/apache/web01_apache.log".

Under the "MISCELLANEOUS" section in the squid.conf file, change "logfile_rotate" to the number of log files you want to rotate through. If you rotate daily and want to keep a week's worth of logs, then it would be "logfile_rotate 7".

You'll need to create a cron job (see chapter Clock Service (Cron)) to tell Squid to rotate its logs when you want it to. Cron executes programs, scripts, etc., at intervals you specify (minutes, hours, days, weeks, months, etc.).

Add the following line to your crontab file:

0 0 * * * /usr/local/squid/bin/squid -k rotate

cron will execute the "squid -k rotate" command at zero hour (midnight), running Squid with the command line argument "-k rotate". Squid will close the current log, rename it with a ".#" at the end, and then create a new log file. Each successive day the log files are rotated and given a higher number.

Example: If our log file is named "podunk_apache.log", then at midnight Squid would rotate this to "podunk_apache.log.0". The next day the ".0" log file would be renamed with a ".1" at the end, and the current ".log" file would be renamed ".log.0". Your most current rotated log file will always have ".0" at the end.
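The renaming scheme can be easier to follow when written out. The following sketch only imitates the behavior described above with ordinary mv commands; it is not how Squid implements rotation, and the function name rotate_logs is hypothetical. The second argument plays the role of the logfile_rotate value.

```shell
#!/bin/sh
# Illustration only: mimic the renaming scheme described above.
# Usage: rotate_logs LOGFILE COUNT
# Keeps at most COUNT rotated copies, named LOGFILE.0 .. LOGFILE.(COUNT-1).
rotate_logs() {
    log=$1
    count=$2
    i=$((count - 1))
    # Drop the oldest copy, then shift every remaining copy up by one.
    rm -f "$log.$i"
    while [ "$i" -gt 0 ]; do
        prev=$((i - 1))
        if [ -f "$log.$prev" ]; then
            mv "$log.$prev" "$log.$i"
        fi
        i=$prev
    done
    # The live log becomes ".0" and a fresh, empty log takes its place.
    if [ -f "$log" ]; then
        mv "$log" "$log.0"
    fi
    : > "$log"
}
```

Calling "rotate_logs podunk_apache.log 7" on two successive days leaves the first day's entries in podunk_apache.log.1 and the second day's in podunk_apache.log.0.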

Using CGIs

In general, when traversing a Web page, clicking on a link causes the client (browser) to send a message with a given URL to the server (the site maintaining the Web page the client wishes to view). The server gets the file indicated by the URL and sends the contents of the file back to the browser to be displayed to the user. The Common Gateway Interface (CGI) is a mechanism that causes the server to behave differently.

The CGI protocol defines communication between the server and an external program. When the URL points to a CGI script file, instead of simply sending the contents of the file to the browser, the server executes the script and then returns the program output to the browser. This allows Webmasters to create dynamic documents and interactive pages.

14.1 Shell CGIs

A shell CGI is a text file that contains commands for the Bourne Shell or C Shell command interpreter. It is the simplest kind of CGI to create and use. Any text editor can be used to create shell CGIs. The resultant file will typically have the file extension ".sh" (e.g., mycgi.sh).

To try this, create a CGI called mycgi.sh and store it in the Web Ten cgi-bin directory. The new CGI can be referenced from a browser with the following URL: /cgi-bin/<cgi-name>. For mycgi.sh, the URL would be: /cgi-bin/mycgi.sh.

Basic Steps

Create a text file (see Required Shell Script Content)
Place the file in the Web Ten cgi-bin directory
Reference the file from a Web browser
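The first two steps can be tried outside the server before the file is placed in the real cgi-bin folder. The following is a minimal sketch under stated assumptions: CGI_DIR is a hypothetical variable that defaults to a temporary scratch directory, and the chmod step is assumed to be needed because the server runs the file directly.

```shell
#!/bin/sh
# Sketch only: build mycgi.sh in a scratch directory and run it by hand.
# CGI_DIR is a hypothetical variable; it defaults to a temporary directory
# rather than the real Web Ten cgi-bin folder.
cgi_dir=${CGI_DIR:-$(mktemp -d)}

# Step 1: create the text file (the quoted 'EOF' keeps the contents literal).
cat > "$cgi_dir/mycgi.sh" <<'EOF'
#!/bin/sh
echo Content-type: text/plain
echo
echo Hello from mycgi.sh
EOF

# Step 2: mark it executable (assumed necessary for the server to run it).
chmod +x "$cgi_dir/mycgi.sh"

# Step 3: run it by hand; a browser request would receive this same output.
sh "$cgi_dir/mycgi.sh"
```

Running the script by hand shows the Content-type line, a blank line, and then the text, which is the order the server expects.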

14.1.1 Required Shell Script Content

In addition to creating the text file, there are a few important considerations with respect to the content of the file. First, the top line of the file must contain the following text:

#!/bin/sh

This tells the system that this is a Bourne Shell script and that the Bourne Shell should be used to interpret the rest of the script.

Second, you can use the echo command to generate text which will be returned to the browser that initiated the URL. The first two echo commands must generate the HTTP header, as follows. This puts Web Ten and the browser in the proper mode to accept everything else:

echo Content-type: text/plain

echo

The first echo indicates that text/plain will follow. The second echo outputs the blank line that ends the HTTP header; without it, the Content-type line is not accepted. After that, any text sent with an echo command is printed on the originating browser's screen as a response to the URL request.

Shell scripts are text files containing Bourne Shell commands that can generate a stream of characters in response to being executed. There are Bourne Shell commands for assigning integer and string values to shell variables, commands for prescribing conditional flow through the shell script, and commands for running other programs. Relatively sophisticated CGIs can be created by combining different Bourne Shell commands. There are a number of widely available books describing Bourne Shell programming.

Bourne Shell CGIs are used for low-performance, easy-to-develop CGIs. Each Bourne Shell script is text, and is interpreted by a Bourne Shell interpreter controlled by Web Ten. Since the interpreter interprets each command, shell scripts operate fairly slowly and use a large number of processing cycles. Therefore, Bourne Shell scripts should be used primarily for rapid CGI development or CGI prototyping. If a CGI will be used in high volume, you may want to consider constructing a more efficient C Language CGI or a Perl CGI.

14.1.2 Printenv.sh Example

A sample shell CGI is included in the printenv.sh file located in the Web Ten cgi-bin directory. The first few lines of the file establish the mandatory #!/bin/sh and echo Content-type: text/plain requirements for any shell script. The remaining shell script commands are used to output a few lines of constant text, followed by a dozen or more lines that output the values of a family of shell variables. The following is the content of the printenv.sh CGI:

#!/bin/sh

# disable filename globbing

set -f

echo Content-type: text/plain

echo

echo CGI/1.0 test script report:

echo

echo argc is $#. argv is "$*".

echo

echo SERVER_SOFTWARE = $SERVER_SOFTWARE

echo SERVER_NAME = $SERVER_NAME

echo GATEWAY_INTERFACE = $GATEWAY_INTERFACE

echo SERVER_PROTOCOL = $SERVER_PROTOCOL

echo SERVER_PORT = $SERVER_PORT

echo REQUEST_METHOD = $REQUEST_METHOD

echo HTTP_ACCEPT = "$HTTP_ACCEPT"

echo PATH_INFO = "$PATH_INFO"

echo PATH_TRANSLATED = "$PATH_TRANSLATED"

echo QUERY_STRING = $QUERY_STRING

echo SCRIPT_NAME = $SCRIPT_NAME

echo REMOTE_HOST = $REMOTE_HOST

echo REMOTE_ADDR = $REMOTE_ADDR

echo REMOTE_USER = $REMOTE_USER

echo AUTH_TYPE = $AUTH_TYPE

echo CONTENT_TYPE = $CONTENT_TYPE

echo CONTENT_LENGTH = $CONTENT_LENGTH

When the printenv.sh CGI is referenced by a URL, it produces the following output:

CGI/1.0 test script report:

argc is 0. argv is .

SERVER_SOFTWARE = Apache/1.2.6.36 WebTen/3.0

SERVER_NAME = www.tenon.com

GATEWAY_INTERFACE = CGI/1.1

SERVER_PROTOCOL = HTTP/1.0

SERVER_PORT = 80

REQUEST_METHOD = GET

HTTP_ACCEPT = image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*

PATH_INFO =

PATH_TRANSLATED =

QUERY_STRING =

SCRIPT_NAME = /cgi-bin/printenv.sh

REMOTE_HOST = 192.83.246.60

REMOTE_ADDR = 192.83.246.60

REMOTE_USER =

AUTH_TYPE =

CONTENT_TYPE =

CONTENT_LENGTH =

14.1.3 Shell Variables

Shell variables are pre-defined values set by Web Ten before the shell CGI is started. Shell variables are referenced by placing a "$" character in front of the name of the shell variable. If the shell interpreter finds a name that matches the string of characters following any "$" character, it substitutes the value of that variable in its processing. In the case of the echo command, the value of the $VAR shell variable is substituted as a parameter to the echo command and is output to the browser as a partial response to the URL request.
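Shell variables can also steer the flow of a script, not just appear in echo output. The following is a minimal sketch of a shell CGI that branches on $REQUEST_METHOD; the function name report_request is hypothetical, and the "${VAR:-default}" form is used so that an unset variable yields a fallback value instead of an empty string.

```shell
#!/bin/sh
# Sketch only: choose a response based on variables set by the server.
report_request() {
    case "${REQUEST_METHOD:-}" in
        GET)  echo "GET request from ${REMOTE_ADDR:-unknown}" ;;
        POST) echo "POST request with ${CONTENT_LENGTH:-0} bytes of input" ;;
        *)    echo "Unsupported or missing REQUEST_METHOD" ;;
    esac
}

# The header and blank line come first, then the body.
echo "Content-type: text/plain"
echo
report_request
```

When run by hand with no variables set, the script reports a missing REQUEST_METHOD; under the server, the GET or POST branch would be taken.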

14.2 Perl CGIs

A Perl CGI is a text file that contains commands for the Perl language interpreter. The file name extension is usually ".pl", and the file is placed in the cgi-bin folder. A Perl interpreter is included with Web Ten, so Web Ten is able to interpret Perl scripts.

To try this, create a new CGI called mycgi.pl and store it in the Web Ten cgi-bin directory. The new CGI can be referenced from a browser with the following URL: /cgi-bin/<cgi-name>.

14.2.1 Required Script Content

In addition to creating the text file, there are a few important considerations with respect to the content of the file. First, the top line of the file must contain the text:

#!/usr/bin/perl

This tells the Web Ten system that this is a Perl script and that Perl should be used to process the remainder of the file.

Second, you can use Perl print statements to generate text which will be returned to the browser that initiated the URL. The first print command must contain an HTTP header. This header indicates what format or kind of data will be output by the remainder of the print commands. The choices are usually plain text or text that is marked up using the HyperText Markup Language (HTML). This first print command puts Web Ten and the browser in the proper mode to accept everything else.

For Perl scripts that output plain text, use:

print "Content-type: text/plain\n\n";

For Perl scripts that output HTML statements, use:

print "Content-type: text/html\n\n";

The print indicates that text/plain or text/html will follow. After that, any text generated with a print command is sent to the originating browser as a response to the URL request.

Perl scripts are text files containing Perl language statements that generate a stream of text characters in response to being executed. There are Perl statements for assigning integer and string values to variables, statements for prescribing conditional flow through the script, and statements for running other programs. Very sophisticated CGIs can be created by combining different Perl statements. A number of books describing Perl programming are widely available, for example:

Programming Perl, Second Edition by Larry Wall, Tom Christiansen and Randal L. Schwartz, with Stephen Potter. 1996, O'Reilly & Associates

Perl is used for medium-performance, easy-to-develop CGIs. Each Perl program is text. The scripts are interpreted by a Perl interpreter controlled by Web Ten. Since the interpreter interprets each Perl statement, Perl scripts can consume a lot of memory and use a large number of processing cycles.

14.2.2 Printenv.pl Example

A sample Perl CGI is included in the printenv.pl file located in the Web Ten cgi-bin directory. The first few lines of the file establish the mandatory #!/usr/bin/perl and print Content-type: text/html requirements for a Perl script that outputs HTML. The remaining Perl statements, a while loop around a single print, output a dozen or more lines that contain the values of a family of environment variables. The following is the content of the printenv.pl CGI:

#!/usr/bin/perl

print "Content-type: text/html\n\n";

while( ($key,$val) = each %ENV ) { print "$key = $val<BR>\n"; }

When the printenv.pl CGI is referenced by a URL, it produces the following output:

SERVER_SOFTWARE = Apache/1.2.6.36 WebTen/3.0

GATEWAY_INTERFACE = CGI/1.1

DOCUMENT_ROOT = /usr/local/etc/httpd/WebSites/www.tenon.com

REMOTE_ADDR = 192.83.246.60

APACHE_PORT = 81

SERVER_PROTOCOL = HTTP/1.0

REQUEST_METHOD = GET

REMOTE_HOST = 192.83.246.60

QUERY_STRING =

HTTP_USER_AGENT = Mozilla/4.61 (Macintosh; I; PPC)

ADMIN_PORT = 84

PATH = /bin:/usr/bin:/usr/ucb:/usr/bsd:/usr/local/bin

HTTP_ACCEPT = image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*

REMOTE_PORT = 1138

HTTP_ACCEPT_LANGUAGE = en,pdf

HTTP_CACHE_CONTROL = Max-age=259200

SCRIPT_NAME = /cgi-bin/printenv.pl

SCRIPT_FILENAME = /usr/local/etc/httpd/cgi-bin/printenv.pl

HTTP_ACCEPT_ENCODING = gzip

SERVER_NAME = www.tenon.com

REQUEST_URI = /cgi-bin/printenv.pl

HTTP_ACCEPT_CHARSET = iso-8859-1,*,utf-8

HTTP_X_FORWARDED_FOR = 192.83.246.60

SERVER_PORT = 80

HTTP_HOST = www.tenon.com

SERVER_ADMIN = webmaster@tenon.com

HTTP_VIA = 1.0 www.tenon.com:80 (Squid/1.1.20.6)

14.2.3 Environment Variables

Environment variables are pre-defined values set by Web Ten before the Perl CGI is started. Environment variables are referenced by the Perl statement $ENV{<env var>}. The Perl statement:

$ENV{PATH} = "/bin:/usr/bin";

sets the PATH environment variable. The Perl statement:

print $ENV{PATH};

prints the current value of the PATH environment variable.

14.3 C Language CGIs

A C Language CGI is a computer program. To produce a C Language CGI, you must first write the C Language source code using any text editor; the source file has the extension ".c". Then a C Language translator, called a C compiler, is used to translate the C program into machine language. The resulting machine language file is stored in the cgi-bin folder, where it can be executed by Web Ten.

Create a new CGI called mycgi.c. Once the C Language source file is constructed, invoke the C Language compiler using the following format:

cc -O -o mycgi mycgi.c

This command produces a machine language file named mycgi using the C Language source found in the file mycgi.c. The resulting machine language file, or object file, is directly executable under Web Ten. You can use debugging techniques to ensure that the C Language CGI operates correctly. Once the CGI is complete, store the CGI in the Web Ten cgi-bin directory. Then reference the CGI with the following URL: /cgi-bin/mycgi

The CGI will be invoked by Web Ten and the output will be transported to your browser.

Basic Steps

Create a C Language source file
Compile and debug
Place the file in the Web Ten cgi-bin directory
Reference the file from a Web browser
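For the "compile and debug" step, one way to exercise a CGI before placing it in the cgi-bin directory is to run it from the command line with the environment variables a request would supply. The following is a minimal sketch: the function name run_cgi is hypothetical, and only three of the variables shown in the printenv examples are set.

```shell
#!/bin/sh
# Sketch only: run a CGI program from the command line with the
# environment variables a GET request would provide.
# Usage: run_cgi PROGRAM QUERY_STRING
run_cgi() {
    env REQUEST_METHOD=GET \
        QUERY_STRING="$2" \
        GATEWAY_INTERFACE=CGI/1.1 \
        "$1"
}
```

For example, "run_cgi ./mycgi 'city=Santa+Barbara'" would print the header and body that the program would otherwise return to a browser.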

C Language CGIs are used for high-performance CGIs since each C Language CGI is a compiled program.

14.3.1 Printenv.c Example

The C Language CGI example included with Web Ten is in a file named printenv.c, which is located in the Web Ten cgi-bin directory. The printenv source code is in tenon/examples/printenv.c.text. Note that this code will not compile and run as listed, because the helper routines it declares (getword, x2c, unescape_url, and plustospace) are not included. It is only listed as an example of how to write C language CGIs. Below is the content of the printenv.c CGI:

#include <stdio.h>

#include <stdlib.h>

#include <string.h>

typedef struct {

char name[128];

char val[128];

} entry;

void getword(char *word, char *line, char stop);

char x2c(char *what);

void unescape_url(char *url);

void plustospace(char *str);

entry entries[10000];

int main(int argc, char *argv[]) {

char *cl;

int x, m = -1; /* loop index and index of the last parsed entry */

printf("Content-type: text/html%c%c",10,10);

if(strcmp(getenv("REQUEST_METHOD"),"GET")) {

printf("This script should be referenced with a METHOD of GET.\n");

printf("If you don't understand this, see this ");

printf("<A HREF=\"http://www.ncsa.uiuc.edu/SDG/Software/Mosaic/Docs/fill-out-forms/overview.html\">forms overview</A>.%c",10);

exit(1);

}

cl = getenv("QUERY_STRING");

if(cl == NULL) {

printf("No query information to decode.\n");

exit(1);

}

for(x=0;cl[0] != '\0';x++) {

m=x;

getword(entries[x].val,cl,'&');

plustospace(entries[x].val);

unescape_url(entries[x].val);

getword(entries[x].name,entries[x].val,'=');

}

printf("<H1>Query Results</H1>");

printf("You submitted the following name/value pairs:<p>%c",10);

printf("<ul>%c",10);

for(x=0; x <= m; x++)

printf("<li> <code>%s = %s</code>%c", entries[x].name, entries[x].val,10);

printf("</ul>%c",10);

}

This CGI prints the name/value parameter pairs that are available to any CGI when the CGI is invoked. The general flow of the printenv CGI is that it uses the printf statement to output Content-type: text/html\n\n. This is needed in order for the CGI to inform Web Ten and the remote browser of the type of content to follow.

The program then verifies whether or not a GET type of HTTP request was used to initiate the CGI. If a GET request was not used, an error message is returned with several printf statements and the program exits. If a GET HTTP request is found, the environment variable QUERY_STRING is requested. If that string is unavailable, an error message is printed and the program exits. If QUERY_STRING is found, a for loop is entered. The for loop calls the getword subroutine to parse the string into name and value pairs. Once all of the parameters have been parsed, the printf subroutine is called several times to output the heading "Query Results", followed by the string "You submitted the following name/value pairs:", followed by a name and value pair on each line until all of the name/value parameters have been displayed. When the printenv CGI is referenced by the URL:

/cgi-bin/printenv?company=Tenon+Intersystems&addr=1123+Chapala+St.&city=Santa+Barbara

it produces the following output:

Query Results

You submitted the following name/value pairs:

company = Tenon Intersystems

addr = 1123 Chapala St.

city = Santa Barbara
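The same transformation can be approximated in a few lines of shell, which may help when checking what a CGI should print for a given query string. This is a rough sketch, not a translation of the C listing: the function name decode_pairs is hypothetical, and %XX escapes are deliberately left undecoded.

```shell
#!/bin/sh
# Sketch only: turn a query string into "name = value" lines.
# "&" separates pairs and "+" stands for a space; %XX escapes are
# NOT decoded here.
decode_pairs() {
    printf '%s\n' "$1" | tr '&+' '\n ' | sed 's/=/ = /'
}

decode_pairs 'company=Tenon+Intersystems&addr=1123+Chapala+St.&city=Santa+Barbara'
```

The tr command maps each "&" to a newline and each "+" to a space, and sed pads the first "=" on each line, which yields the three name/value lines listed above.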

14.4 Fast CGI

Web Ten includes built-in support for the execution of FastCGI scripts. FastCGI scripts are faster than normal CGI scripts because they are always running, whereas normal CGIs are re-loaded each time they are run. Any CGI can take advantage of FastCGI capabilities if the script's code is modified. Below is an example of the simple printenv.pl script in the form of a FastCGI. The "use CGI::Fast;" line makes the FastCGI capabilities available to the script. The "while" loop must contain the CGI's code. The "$query" variable will change every time the CGI is used by a client and therefore can be used to track which request is being processed.

#!/usr/bin/perl

use CGI::Fast;

while ($query = new CGI::Fast)

{

print "Content-type: text/html\n\n";

while (($key, $val) = each %ENV) {

print "$key = $val<BR>\n";

}

}

When a FastCGI such as this is run the first time, mod_fastcgi (an Apache module) spawns a process that keeps the script running while Apache is running. To have the FastCGI run automatically when Apache is first started, put the following lines in Web Ten's httpd.conf file:

<IfModule mod_fastcgi.c>

FastCGIServer /usr/local/apache/cgi-bin/printenv.fcgi -processes 1

</IfModule>

These lines will create one instance of the printenv.fcgi script whenever Apache is run. The number of processes can be increased if more instances are needed to accommodate the volume of requests. All FastCGI scripts are named with the ".fcgi" extension by convention. Be sure to set the correct path to the FastCGI script in the Apache directives (/usr/local/apache/ is the path to the Web Ten folder.)

ht://Dig

The version of ht://Dig included in Web Ten has been extended with a CGI interface that supports the administrative tasks of creating and maintaining searchable databases in a fully integrated, multiple virtual host Web Ten package.

ht://Dig is a very customizable utility. The Web Ten indexing CGI is designed as an easy to use front-end to htdig. It provides a quick way to get a basic set of htdig's search capabilities working for each virtual host in a Web Ten system. To further exploit the power of htdig, refer to the ht://Dig documentation (http://host.domain.com/htdig/doc/index.html). Note that the htdig configuration files created by the indexing CGI are stored in the /htdig/conf/<virtualhostname>.conf file for each virtual host.

You will probably want to customize the HTML search page and the results page from the defaults that are provided. Look in the ht://Dig documentation (http://host.domain.com/htdig/doc/index.html) for a description of the files that it uses for each page. Also look in the WebTen/tenon/apache/conf/httpd.conf file for the extra htdig configuration lines that were added by the Web Ten Search Engine Installer. You might want to change these directives if, for example, you wanted to change the URLs for users to access the search engine for a particular virtual host or for your entire Web Server.

Once a searchable database has been built, it may be necessary to periodically rebuild the database to include new or changed pages that have been added to a site. To facilitate periodic updates, the indexing CGI can also be run as a cron job.

The indexing process can create large database files. Almost every word that is retrieved from examining a document is stored into a sorted database file for later searching. This means that a lot of disk space may be required to successfully complete an indexing operation. A large site might require as much as 300 Mbytes of available disk space!

16.1 Build the Web Ten Search Engine Index File

The Web Ten Search Engine Index files are built and maintained using a special indexing CGI. This CGI is intended only for Web Ten Administrators and it is protected within the Web Ten Admin realm (username and password are required). Use the following URL to open the indexing CGI.

Substitute your Web Ten server's name into: http://hostname/index.cgi

The indexing CGI displays a form with fields for entering the URLs to be indexed, excluded, and limited, and an optional email address.

Figure 74: Default Indexing Options

The indexing form contains fields for specifying which URLs should be indexed. The Start URLs are the starting point for the indexing engine. The Exclude URLs are URLs that should not be indexed. The Limit URLs contains sets of patterns that the URLs must match.

The default Start URLs is a single URL matching the virtual host name used in the request. This default instructs the indexing process to visit all of the documents on this virtual host that are reachable (following any numbers of links) from the home page. The default Limit URLs specifies a set that exactly matches the set of Start URLs. In most cases, this is all that is needed to build a complete index of an entire virtual host. Additional URLs can be added to these lists.

The form also provides a field for an email address. If an email address is provided, the results of the indexing process will be emailed to that address.

Additional options may be displayed by clicking on the Options button. In this case, the form is displayed again with the default options shown (below). These defaults can then be modified. (The default options are used if the form is submitted without displaying the options.) The default settings are sufficient to create a search engine index (or database) file for the specified URLs.

Figure 75: All Indexing Options

To begin the indexing process, click on the Run! button. The CGI will start a batch indexing process (if the batch option is specified) that continues to run after the CGI has completed. A link to a file which will contain the detailed results of the indexing process is provided. Note that it may take some time for the batch indexing process to complete. (For example, a default Web Ten installation takes about 10 minutes.) If the results are referenced before the indexing process is complete, only the completed parts of the indexing process will be shown. Providing an email address is the best way to be notified when the entire indexing operation is complete.

To continually monitor the progress of the indexing process, uncheck the batch option before clicking on the Run! button. In this case, the output from the indexing process is continually displayed in the CGI's output and the CGI does not complete until the indexing process completes.

16.2 Test the Web Ten Search Engine Database

The best way to test the searchable database is to perform some actual searches. Use the following URL to search for a particular topic on the indexed site:

Substitute your Web Ten server's name into

http://host.domain.com/search.html

16.3 Multiple Virtual Hosts

The Web Ten Search Engine supports indexing and searching for multiple virtual hosts. By default, searchable databases are built on a per virtual host basis. For example, to build the index files for virtual hosts www.domain1.com and www.domain2.com, use the following URLs:

http://www.domain1.com/index.cgi

http://www.domain2.com/index.cgi

To search the databases for these virtual hosts, use the following corresponding URLs:

http://www.domain1.com/search.html

http://www.domain2.com/search.html
