Приглашаем посетить
Грин (grin.lit-info.ru)

Database Search Engine

                            BASIC INFORMATION

Name: Selena Sol's Database Search Engine
Version: 5.02
Last Modified: 01-23-96

                               DESCRIPTION

This database search script allows users to search a flatfile UNIX
database by keyword, number or date range.

                          COPYRIGHT INFORMATION

This application was written by Selena Sol (selena@eff.org,
http://www.eff.org/~erict) and Gunther Birznieks
(birzniek@hlsun.redcross.org)having been inspired by countless other Perl
authors.  Feel free to copy, cite, reference, sample, borrow, resell or
plagiarize the contents.  However, if you don't mind, please let me know
where it goes so that I can at least watch and take part in the
development of the memes. Information wants to be free, support public
domain freware.  Donations are appreciated and will be spent on further
upgrades and other public domain scripts.

Finally, PLEASE SEND WORKING URL's to selena@eff.org.  I maintain a list
of implementations around the world.

				 SUPPORT

This script comes with no gaurentees or warranties.  Don't expect the   
scripts to be perfect. They have evolved continually over the last two
years and will continue to do so.

Bug reports are greatly appreciated but installation support is
extremely discouraged. I have attempted to include as much information as 
I could think of in this README and in the Customization and Installation
FAQ available at http://www.eff.org/~erict/Scripts/.  And there is
always lively discussion in Selena Sol's Discussion Forum
(http://www.sidestreets.com/info/sssa/). Please try ALL available sources 
of information BEFORE you email me.  But if you must, make sure to include
the following bits of information (I may not respond to your email if you
do not answer ALL of the following questions):

1. What type of Web server are you running?
2. What type of Operating System is the Web server running on?

3. What is the "exact" error message from the Web?
4. What is the "exact" error message in your web server's error log?
5. What is the "exact" error message you receive when running the script
       from the command line.
                    
6. Are you running this script on an ISP?  If so, what is the email
       address of the Sysadmin there?
7. Are you using a virtual server setup?  If so, what is the root path set
       in your Web server's environment?

8. In which directory is the Perl interpreter located?
9. In which directory is sendmail located (if you are using a script which
       demands use of sendmail)

Again, I MAY NOT ANSWER YOUR QUESTION unless you have answered all nine of
these questions.

                BASIC INSTALLATION (DOWNLOADING THE SCRIPT)

It is recommended that you point your Web browser to "Selena Sol's Script
Archive" to get the latest version of this script.  The Script Archive is
located at the following URL:

                    http://www.eff.org/~erict/Scripts/

From the "Script Archive" frontpage follow the hyperlinks to the detailed
page dedicated to this script.  Then click on the hyperlink "Download the
scripts as a single tar file".

                BASIC INSTALLATION (UNARCHIVING THE APPLICATION)

Tar is a UNIX command that allows you to create a single archive file
containing many files.  Such archiving allows you to maintain directory
relationships and facilitates transferring complex programs with many 
separate but integrated parts which must have their relationships
preserved. Tar has a motley of options which allow you to do archiving and
unarchiving in many ways.  However, for the purpose of untarring this    
application, the commands will be fairly simple.

Once you have downloaded the TAR file transfer it to an executable
directory on your web server and "untar" it.  On UNIX systems, you may
type the following at the command line (in the same directory as the TAR
file itself):
       
                        tar xvfp web_store.tar
                
Tar will go through the archive file and separate out each individual
directory and file, expanding them into their appropriate places
underneath the current directory.  The "xvfp" letters in the tar command  
above are parameters that tell the program to extract the files and   
directories out of the ".tar" file. Specifically, "x" tells tar to extract
the files. "v" tells tar to output information about the status of its
extraction while it is performing the work, "f" informs tar to use the
".tar" filename as the source of the files to be extracted, and "p" notes
that the original permissions hould be maintained. The reason the
"f" parameter has to be used is that tar, by default, archives files and 
directories to a tape drive. Tar is actually short for "[T]ape [AR]chive".
                    
       (Note: If you are using a non-UNIX Operating System, you may
       download a TAR/UNTAR program by pointing your Web browser
       to http://www.shareware.com  I suggest using "untar" as your
       keyword when you search their inventory).

                 BASIC INSTALLATION (SETTING PERMISSIONS)

Untarring the files is only one part of the equation of installing the 
application and getting it to actually run.  Frequently, the Web
server needs to be given special permission to run your scripts and have
the scripts perform their job with the appropriate "rights".

The cardinal rule for setting up Web server software is that the server
should be given only minimal capabilities. This definitely rules out the
Web server running as the ROOT user (Super user on UNIX). More often than
not, it means the Web server is run as a user that has no rights to do
anything significant -- the user "nobody". By default, "nobody" usually 
does not have permission to read any files in directories that you create.
However, when you download scripts, you need to make it so that the
scripts can be read and executed by the Web server software.  In other
words, "nobody" has to be able to get to the files.

The magic command is

               chmod [permission digit] filename

Thus, the following command gives everyone rights to read and execute, but
noone rights to write to the file hello.cgi.

			chmod 555 hello.cgi

The actual value of the digit determines the permissions granted to that
area. Permissions consist of three numbers -- 4 for read, 2 for write, and
1 for execute access.  By adding these numbers together, you form the
permissions that make up one digit. For example, 4 + 2 + 1 = 7 which
grants read, write, and execute permissions. 4 + 1 = 5 which only grants
read and execute permissions. Thus, 755 grants 7 (read, write, execute) to
the owner of the file, and 5 (read and execute) to the group the file is 
in and the world. Below is a chart which will can be used as a quick
reference

        PERMISSION      COMMAND
         U   G   W
        rwx rwx rwx     chmod 777 filename
        rwx rwx r-x     chmod 775 filename
        rwx r-x r-x     chmod 755 filename
        rw- rw- r--     chmod 664 filename
        rw- r-- r--     chmod 644 filename

        U = User
        G = Group
        W = World

        r = Readable
        w = writable
        x = executable
        - = no permission


[WARNING] You may be tempted to simply use chmod 777 on all the files and
directories since that assures the Web server can do anything with the
files. However, it is strongly advised that you do not leave the files in
this state. It is considered a big security risk to leave your scripts
open to changes by the Web server instead of being read-only. Anyone on
the server could use another rogue CGI script to write over your scripts
and make them do something completely different. There is still a risk
involved in making the messages directory writable, but at least if
someone is going to be messing with your area, they will only destroy a
bit of data and not your main programs. It is "OK" to set the scripts to
777 if you are troubleshooting a problem and want to rule out permissions
entirely, but do not leave the scripts like this. On another security   
note, if you are really concerned with the security of your data such as,
please do not use a shared server where other people can write CGI scripts
using the same Web server configuration.  It is much better to use your
own server software or purchase space on a "virtual server" which may be
shared, but is set up in such a way that each user's scripts are shielded 
from each other.

        Note: Not setting your permissions correctly is the
        NUMBER 1 reason why installations fail.  Take time to
        get this right.

The actual permissions required for the subdirectories and files used by 
this application are listed in the next section.

         BASIC INSTALLATION (FILES, DIRECTORIES, AND PERMISSIONS)

The TAR file will then expand into a root directory called
Database_search. Database_search will contain several sub-directories
and several files. The diagram below depicts the directory structure as
well as the permissions which must be applied to the files and
subdirectories used by the application.

Database_search Root Directory (drwxr-xr-x)
   |____Data_files (drwxr-xr-x)
   |       |____address_book.data (-rw-r--r--)
   |____Library Subdirectory (drwxr-xr-x)
   |       |____auth-extra-html.pl (-rw-r--r--)
   |       |____auth-extra-lib.pl (-rw-r--r--)
   |       |____auth-lib-fail-html.pl (-rw-r--r--)
   |       |____auth-lib.pl (-rw-r--r--)
   |       |____auth-server-lib.pl (-rw-r--r--)
   |       |____auth_fail_html.pl (-rw-r--r--)
   |       |____cgi-lib.pl (-rw-r--r--)
   |       |____cgi-lib.sol (-rw-r--r--)
   |       |____db-lib.pl (-rw-r--r--)
   |       |____mail-lib.pl (-rw-r--r--)
   |____Setup_files (drwxr-xr-x)
   |       |____address_book.setup (-rw-r--r--)
   |____Users (drwxrwxrwx)
   |       |____default.users (-rw-rw-rw-)
   |____db_search.cgi (-rwxr-xr-x)

Database_search is the root directory of this application.  It must have
	its permissions set to be readable and executable by the web server.

Data_files is a subdirectory containing all of the data files
	which db_search is supposed to seacrh.  The
	directory itself must be readable and executable and all the files
	within it must be readble.

	By default, we have included one sample data file called
	address_book.data

        Every data file is simply a pipe (|) delimited database using a
        newline to represent a new database row.  For example, the first
        line might read:

        1234|Bic Ball Point Pen|1.45|1

        As you can see, every database field is separated by the pipe
        symbol (which means that the pipe symbol may not appear in your
        data!).  In this example, field one is the item number, field two
        is the item name, field three is the price, and field four is the
	unique database id number.

        Every database row MUST have a unique database id number, so if
        you want to add database rows manually, you must make sure to add
        this final field carefully.  

        Comment lines are acceptable within the data file, but they must
        be specified using the comment tag "COMMENT:" flush against the
        left margin.  The first line of video.data uses a comment line to
        describe the database fields as follows:

        COMMENT: Category|Item #|Name|Description|Size|Price|URL|Link...

Library is a subdirectory containing various supporting files used by the
	application.  The directory must be redable and executable by the
	web server and each library file within must be readable.  
	cgi-lib.pl is used to read and parse form input.
	cgi-lib.sol is used to lock the data file when it is being
		manipulated.  
	db-lib.pl is used to search through the datafile according to the
		criteria entered by the user.
	all libraries beginning with "auth" are used for authentication
	mail-lib.pl is used with authentication to mail new registrants.

Setup_files is a subdirectory containing the setup files which describe
	server specific variables and database specific variables.  The
	subdirectory must be readable and executable by the web server and
	the setup files within the directory must be readable by the web
	server. By default, we have included one sample setup file called
	address_book.setup.  Below is a short description of the variables
	and their meanings.

	$data_file_path is the location of the data file which will be
	searched

	$max_rows_returned is the maximum number of rows which will be
	returned at one time.  If the users search criteria turn
	up more than that they will not be displayed unless you
	define a "see next hits" button in your HTML code.

	$current_century is the current century.

	%db defines the the structure of your datafile.  It is an
	associative array which matches up the
	variable names which you will use to access data files
	with the actual numerical index of that field.  Remember
	that arrays start counting at zero.

	$index_for_email is the numberical index of email in case you want
	to make it a hyperlink.

	$index_of_field_to_be_sorted_by defines which field we should sort
	the database by default.  This index corresponds to %db and can be
	overridden by a form variable called sort_by.

	@db_display_fields are the headers of the fields that you wish to
	display to the user when they receive the search results.
	You do not have to display all the fields you defined in
	%db, but you must have one element in @db_display_fields
	for every element in @db_index_for_display

	@db_index_for_display is the index into %db of the fields that you
	want displayed when the user get a search results page.
	This corresponds to @db_display_fields

	@db_query_criteria defines how each database field will be
	searched.  The specifics of how to use @db_query_criteria
	are discussed in db-lib.pl

        The array contains pipe-delimited fields inside each
        list item. The fields are the

        1. form variable name
        2. index into the database that this criteria applies to
        3. operator for comparison

                Possible values: >,<,>=,<=,=,!= (not equal)

                The operator is compared the following way:

                form_variable OPERATOR database_field_value

                That is, (1) above is the left hand side of the
                operator and (2) above is the right hand side of
                the operator.

        4. data type of the field (This is cool because it
                determines how the operator in (3) gets
                applied to the data.

                The data type can be:
                        date
                        number
                        string

                If the data type is a date, then the operator
                for comparison is done after the form value
                and the fields being compared in the database
                are converted to DATES.  

                If the data type is a number, then the operator
                for comparison is done based off of numerical
                if operators (>,<,==, etc.)

                If the data type is a string, then the operator
                for comparison is done based off of string
                if operators (gt, lt, eq, ne, etc.) with ONE
                EXCEPTION.

                If the datatype is a STRING *AND* the operator
                is =, then the search that is done becomes a
                more flexible search.

                1. All the words in the form variable are split
                apart and searched as seperate keywords in the
                text of the fields.

                2. By default, the search on string = string
                is a pattern match search and is not case
                sensitive.

                3. If you want this special string,= combination
                searching to be case sensitive and to match on
                whole words only, you MUST set up two new form
                variables: case_sensitive and exact_match

                If exact_match is on (checkbox) then the 
                combination of string,= in the query criteria
                array will match on WHOLE WORDS only.

                If case_sensitive is on (checkbox) then the
                combination of string,= in the query criteria 
                array must have matching case values (upper/lower).

        There are three main cases that you generally want to set 
        up a query_criteria array for:

        Case 1: General keyword search through the database.

        (a) Set the first field equal to your keywords form variable
        (eg keywords).  You will need something like the follwoign in your
        HTML page.

        <INPUT TYPE = "text" NAME = "keywords" SIZE = "40" 
               MAXLENGTH = "40">

        (b) Set the 2nd field equal to field numbers of the 
        database file you want to search. Since you are going to
        do a keyword search through the whole database, you want
        to comma-seperate these (eg 1,2,3,4,6 where database filed #5 is
        not searched)

        (c) operator is set to =, data type is set to string so
        that the keyword search is done using pattern matching
        and case insensitive.

        @sc_db_query_criteria would be equal to
                ("keywords|1,2,3,4,6|=|string)

        Case 2: Just want to do a search on a field like lname
        and include that search term within URLs in a frontpage such
        as web_store.cgi?lname=Smith.

        (a) Set the form variable equal to the above (lname)
        
        (b) set the 2nd field equal to the field in the database
        corresponding to a lnamename (eg 1).

        (c) set operator =, data type string to do a keyword
        search that is case insensitive.
        
        @sc_db_query_criteria would be equal to
                ("lname|1|=|string")

        Case 3: You want to make a front page form where several
                fields in the database are being searched.

                You want to allow the user to search on an age
                range, plus a keyword search on lname field.

                Here is the setup:

        @sc_db_query_criteria would be equal to
                ("age_low|9|<=|number",
                "age_high|9|>=|number",
                "lname|1|=|string");

        Notice that we set up TWO form variables for allowing
        the age range searching. This is because we allow the
        user to enter the low range and the high range of the age
        they want to search for (database field #9).

        Note, also, that the "age_low|2|<=" means that
        the database row returns a match if and only if the 
        value of "age_low" form field is <=- the value of
        field #9 in the database row (remember counting starts at 0).

        Then, we set the lname form variable to be a keyword
        (=,string) search on database field #1.

        The form itself would have these fields as HTML:

        Lowest Age To Search For:
        <INPUT TYPE=TEXT NAME=age_low VALUE="">
        
        Highest Age To Search For:
        <INPUT TYPE=TEXT NAME=age_high VALUE="">

        Enter Last Name:
        <INPUT TYPE=TEXT NAME=lname VALUE="">

        SPECIAL NOTE: Only criteria that is entered on the form 
        is queried against. If the user leaves one or more fields blank
        (or you neglect to place them on the form), then those fields
        never get queried.

        Of the criteria that IS entered on the form, all the criteria
        must be satisfied for that row before the row will be considered
        safe to display to the user.

        The same goes for the string,= (special case for keywords). 
        When the string is split into keywords separated by whitespace,
        all the keywords must be found in the database field before
        the program will consider it a valid match.

        This logic is there because we want to provide the capability
        of letting the user narrow down the query as they enter more
        data into the form.

        $auth_lib = location of the authentication library files.  By
        default, they are all located in the Library subdirectory and are
        all prefixed with "auth" so that they are easily distinguishable
        as a package.
   
        $auth_server = are you using server based authentication with
        access.conf type stuff?  If so, set this to on in order to take
        advantage of the excellent power of server-based authentication.
        Typically though, you won't have server based authentication opn
        because most people will not have access to those services on
        their ISP.  The benefit of using server based authentication is
        that you can absolutely validate the user name against the server
        authentication database.   
   
        $auth_cgi = If you are not using server based authentication, then
        you must be using cgi based authentication with your own flatfile
        user database.  The distribution copy assumes this, so the
        variable is set to "on".

                
        $auth_user_file is the flatfile user database of validated admins.
        By default, this file should be empty.  When you are ready to add
        yourself as an admin, you will do so and the script will
        automatically add you as a validated admin to this file.  You
        should then expect to see something like the following line in
        that file:
                
        encrypted password|username|admin|fname|lname|you@yourdomain.com

        $auth_alt_user_file is an extra security option.  If alt_user_file
        is defined, when a user registers, their information will be   
        stored in the alternate user file until the system admin (you)  
        copies them over.  Normally, you will just let them register into
        the main file however.
        
        $auth_default_group - Since security for this type of script mainly
        focusses on one user, I'd just leave this as admin and not worry
        about it...I don't anticipate much need for groups and users.  If
        you did decide to incorporate different levels of security into
        the script, you would have to write your own logic into the main
        script anyway.
        
        $auth_add_register gives the script the ability to add new users
        "directly" to the database.  $auth_allow_register determines 
        whether or not the authentication frontscreen shows a registration
        button, or just a logon button.
        
        We recommend that you set both of these OFF unless you are adding
        validated users...If not, someone could theoretically, make a
        mirror form somewhere else and add themselves to the
        database...these options disable the add function entirely until
        the time when you actually want to do it.

                
        Thus, by default, we have set both these variables to "off".
        Therefore, the first time you try to run the administrative
        functions you will hit a deadend.  There will not be any validated
        admins in the user file AND there will be no way for you to  
        register yourself as an admin.
        
        What you'll have to do is change these two variables in the setup
        file to "on".  Then, you will call up the script from the web   
        with a URL something like the following:
        
        http://www.you.com/cgi/Mailing_list/mailing_list.cgi?action=admin
        
        At that point, you should now get the login screen, but this time you
        should see an extra button, "Register for an Account".  You
        should click this button.  Then, you will get the login screen.
        Fill out the information and submit the data to the script.  The   
        script will then encrypt your password, add you as an admin and 
        tell you that you may now log in as an administrator.
        
        At this point it is time for you to go back into the setuyp file
        and change both the variables back to "no" so that noone else can
        add themselves as an admin.
        
        $auth_email_register determines if this script should mail the admin a
        note when someone registers...I would leave this on so that you   
        have a double check that no one uninvited can use the admin
        interface.
        
        $auth_admin_from_address and $auth_admin_email_address help  
        mail-lib.pl send email notification on new registrations.  If you
        are having trouble with the mailing options, make sure you go into
        mail-lib.pl and set the path to sendmail correctly.  Also, if you
        are using aliases, you need to remove the -t option.
        
        $auth_session_length determines the number of days session files stay
        around for. Session files are used to keep track of the
        admin information while they are performing a mass mailing.  The
        script needs to be able to keep track of the admin from screen to
        screen to make sure that she is validated.  This info is kept in a
        session file in the Session_files directroy.  However, these files
        are only needed "while" the admin is performing administratove  
        functions...which might only be 5 minutes.  Thus, the scriopt
        automatically deltes these files when they are no longer needed at
        some time interval defined by you.
        
        $auth_session_dir is the location of the session file directory.
        
        $auth_register_message is the mesage that you will receive after
        you have registered.
        
        $auth_allow_search determines whether or not the search button will
        appear on the registration screen...since you will have direct
        access to the user file, you don't need this option at all.
        Typically, it is used so that new registrants can search the user
        file for usernames so that they will not pick one already in use.
        
        $auth_generate_password if set to "on", directs the script
        to generate a password for you. Leave it off if you want to
        generate your own.
        
        $auth_check_duplicates checks to make sure no one has already
        used your username that you submit when creating a new validated
        user. It is best to leave this "on", but there shouldn't be
        so many valid users that you'd forget.

        $auth_password_message is the message that is sent to the user
        with their password if $auth_generate_password is set to on.
        
        @auth_extra_fields and @auth_extra_desc are arrays used to keep track
        of information in the user file besides username and password.  
        Leave this the way it is unless you really need to collect data on
        admins.

	output_html_query_form is responsible for printing out the query
	form which users can submit their search criteria on.  It
	is not essential of course that you have the script produce the HTML
	frontend search form.  You can write your own HTML form provided
	you make sure to hardcode the essential information that the
	script will need to proces the request.  For example, you must
	make sure that the FORM tag points to db_search.cgi and that you
	include the setup file information as a hidden variable.  for
	example:

	 <INPUT TYPE = "hidden" NAME = "setup_file"
                VALUE = "address_book.setup">

	Also, you must make sure that you include a submit button with the NAME
	parameter set to "submit_search". For example:

	<INPUT TYPE = "submit" NAME = "submit_search"
               VALUE = "Submit These Search Parameters">

	The VALUE parameter can be anything of course.

	Further, you must include input fields so that the user can submit
	search criteria.  These fields must have NAME parameters which
	correspond to %db and @sc_db_query_criteria.

	Also, you may use the special form variable "sort_by" to give the
	user the ability to choose which field they want to sort the returned
	databse rows by.  For example, the following SELECT list gives the
	user several options.

  	<TH>Sort by which field</TH>
  	<TD><SELECT NAME = "sort_by">
  	<OPTION VALUE = "0">First Name
  	<OPTION SELECTED VALUE = "1">Last Name
  	<OPTION VALUE = "2">Email
  	<OPTION VALUE = "9">Age
  	</SELECT></TD>
  	</TR>

	Notice that the VALUE parameter corresponds to the index of that
	field in %db

	Finally, be aware that all of these subroutines take advantage of
	the qq method of printing such that the print delimiter is
	changed to a tilde (~) instead of a quoite (").  This is done so
	that it is easier to display HTML tags which use quotes
	themselves.  Otherwise we would have to backslash all quote marks
	the same way we backslash at signs (@).  Of course, all tildes
	must be escaped with backslashes aasd a result, but tildes are
	less common in HTML code.

	search_results_header displays the header HTML of the search
	results page.  Make sure that if you want users to be able
	to use a "see more hits" button, that you create a self
	referential form tag pointing back to db_search.cgi.

	If you are providing sort options, you must include a hidden form
	field for sort_by so that you pass that information on to every
	following screen created by this script.  

	Typically, you just need to include the following lines along
	with your template HTML:

  	<FORM METHOD = "post" ACTION = "db_search.cgi">
  	<INPUT TYPE = "hidden" NAME = "session_file"
               VALUE = "$session_file">
  	<INPUT TYPE = "hidden" NAME = "sort_by"
               VALUE = "$index_of_field_to_be_sorted_by">

	search_results_body defines the format of how matched databse rows
	should be displayed to the user on the search results page.  This
	is a little bit more complicated than simply writing HTML code
	within a print qq block.  This routine basically is responsible
	for printing out all the database rows so you need to carefully
	define the format of rows.  In the example setup file with this
	distribution, we have commented our own setup so you can get the
	idea of how you could achieve our own format.

	search_results_footer displays the HTML footer for the search
	results page.  Note that if you want to provide a "See more hits"
	button, you must include both the hits_seen and the setup_file
	variables as hidden form fields.

	no_hits_message displays the message the user receives if their
	search turns up no hits.

	Users is a subdirecotry containing user files with validated
	users.  The directory must be read, write and executable by the
	web server and all user files must be readable and writable.

db_search.cgi is the main script for the application.  It must be redable
	and executable by the web server.

			   RUNNING THE SCRIPT

To run the script, simply point your browser to the main script with a URL
like the following:

http://www.foobar.com/cgi-bin/Database_seacrh/db_search.cgi?setup_file=address_book.setup

Don't forget that you need to pass the setup_file variable whenever you
call the script