Ok folks. It's time we got busy and made this better. Here is a proposal
for a simple site definition file. Let's hash out some of the issues and
then do it. The sample file is on my server right now
(http://www.bsdi.com/site.idx).
What we need to accomplish
--------------------------
1) agree on the filename of the site.idx file
2) agree on the format of the file (either "foo: data" or something else).
3) agree on the initial content and semantics of the index file
4) set up an email address where people can send registration forms
(these don't have to be processed right away).
Constraints
-----------
1) The data format must be extensible (need I even say it)
2) It must be simple enough that we can get started soon
3) It must allow for meta-indexing other protocols in the future
4) The database must be distributed (so you can do the search on
a nearby site).
What we will need next
----------------------
1) software to accept and process registration forms (via email)
2) software for updating registration (a robot)
3) software for building the indexes (wais?)
4) software for searching the index and a site to host it
I believe that the above is all fairly easy.
Let the indexing begin!
-----------------------
This document is a proposal. Discussion to take place on:
[email protected]
Or send electronic mail to Tony Sanders _<[email protected]>_
The latest version of this document_ is available online at:
http://www.bsdi.com/HTTP:TNG/www-indexing.etx
To get the process of a WWW global index started I would like to propose
the following for a site registration file format. This data should
be accessible on your server as http://server/site.idx_
To jumpstart the registration process you will have to email one of
these to some address yet to be determined (thereafter, your file will
be occasionally updated by an automated retrieval process). Of course,
you can always email in a new one if something important changes.
We can extend the syntax later to include pointers to other resources.
WWW-wandering-robots would use this file to determine the server's
preferences for indexing. For example, we could add a field "wwwwr:
never" (or "0000 / 2400" for always). If you would like additional
information to be indexed, we could invent a tag that points to those
documents (or whatever we want to do).
I believe this covers the basics and sufficiently allows for future
extension.
First an example, then I will explain each field
(this file is http://www.bsdi.com/site.idx_):
Name: www.bsdi.com:80
Organization: Berkeley Software Design, Inc
Organization-Type: Commercial software developer
Contact: Tony Sanders
Postal-Address: 3110 Fairview Park Dr, Suite 580;
    Falls Church, VA 22042
Electronic-address: [email protected]
Telephone: +1 800 800 BSDI
Location: Fairfax County, VA, USA
Latitude-Longitude: 77 12 00 - / 38 51 37 +
Timezone: -0500 (Eastern Standard Time)
Written-By: [email protected] (Tony Sanders);
    Mon Oct 25 11:39:14 CDT 1993
Access-times: 0000 / 2400
Policy: None
Description: This site contains public sources and information
    related to BSDI's software products (eg: BSD/386).
    Currently all sources are for publicly contributed
    BSD/386 utilities.
Keywords: BSD, OS, source, berkeley, BSD/386, BSDI
Index: /info/ BSDI and BSD/386 Information
Index: /bsdi-man/ BSD/386 hypertext manual pages
Index: /official_patches/ BSDI 1.0 Official Patches Archive
Continuation lines begin with white space.
Case is only significant in data that requires it (e.g., inside URLs).
The following isn't a complete specification, but I think it's enough to
get us started. Most of this is stolen from other formats.
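
The format rules above (one "Field: data" pair per line, continuation
lines that begin with white space, field names that are not case
sensitive, repeated fields) could be read with something like the
following. This is only a sketch: the function name parse_site_idx is
made up for illustration, and joining continuation lines with a single
space and ignoring blank lines are assumptions, not part of the proposal.

```python
from collections import defaultdict


def parse_site_idx(text):
    """Parse site.idx text into {lowercased field name: [values]}.

    Assumptions (not fixed by the proposal): field names are compared
    case-insensitively, continuation lines are joined to the previous
    value with a single space, and blank lines are ignored.
    """
    fields = defaultdict(list)
    current = None  # field that a continuation line would extend
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if raw[0] in " \t":
            # Continuation line: append to the most recent value.
            if current is None:
                raise ValueError("continuation line before any field")
            joined = fields[current][-1] + " " + raw.strip()
            fields[current][-1] = joined.strip()
            continue
        name, sep, value = raw.partition(":")
        if not sep:
            raise ValueError(f"expected 'Field: data', got {raw!r}")
        current = name.strip().lower()
        fields[current].append(value.strip())
    return dict(fields)


sample = (
    "Name: www.bsdi.com:80\n"
    "Postal-Address: 3110 Fairview Park Dr, Suite 580;\n"
    "    Falls Church, VA 22042\n"
    "Index: /info/ BSDI and BSD/386 Information\n"
    "Index: /bsdi-man/ BSD/386 hypertext manual pages\n"
)
record = parse_site_idx(sample)
```

Each field maps to a list so that fields which allow multiple entries
(such as Index) need no special handling.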
Name
----
Server name (including an optional port number).
host[:port]
Host is a fully qualified domain name or a dotted-quad IP address.
Port should be numeric. For example:
Name: www.bsdi.com:80
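
As a rough illustration of the host[:port] form, the sketch below splits
a Name value and builds the address where the site file would be
fetched. The helper names parse_name and site_idx_url are hypothetical,
and falling back to port 80 when no port is given is an assumption (the
proposal only says the port is optional).

```python
def parse_name(value, default_port=80):
    """Split a Name value of the form host[:port] into (host, port).

    default_port=80 is an assumption; the proposal does not say what a
    missing port means.
    """
    value = value.strip()
    host, sep, port = value.rpartition(":")
    if not sep:
        host, port_number = value, default_port
    else:
        try:
            port_number = int(port)
        except ValueError:
            raise ValueError(f"port is not numeric: {port!r}") from None
    if not host or not 0 < port_number < 65536:
        raise ValueError(f"bad Name value: {value!r}")
    # Host names are not case sensitive, so normalize to lower case.
    return host.lower(), port_number


def site_idx_url(name):
    """Build the http://server/site.idx address for a Name value."""
    host, port = parse_name(name)
    netloc = host if port == 80 else f"{host}:{port}"
    return f"http://{netloc}/site.idx"
```

For instance, site_idx_url("www.bsdi.com:80") gives
"http://www.bsdi.com/site.idx", which matches the sample file location.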
Organization
------------
Organization name. For example:
Organization: Berkeley Software Design, Inc
Organization-Type
-----------------
A general classification of what you do. For example:
Organization-Type: Commercial software developer
Contact
-------
Name of a human to contact. For example:
Contact: Tony Sanders
Postal-Address
--------------
Postal address. For example:
Postal-Address: 3110 Fairview Park Dr, Suite 580;
    Falls Church, VA 22042
Electronic-address
------------------
Email address contact for the server. For example:
Electronic-address: [email protected]
Telephone
---------
Telephone number for contact. For example:
Telephone: +1 800 800 BSDI
Location
--------
General geographical location. For example:
Location: Fairfax County, VA, USA
Latitude-Longitude
------------------
Degrees, minutes, and seconds, for drawing cute maps. For example:
Latitude-Longitude: 77 12 00 - / 38 51 37 +
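
A map-drawing tool would probably want decimal degrees. The sketch below
converts each "degrees minutes seconds sign" group, assuming the
trailing "+" or "-" is the sign of that coordinate. The function names
are hypothetical, and the two values are returned in the order they
appear in the file, since the proposal does not spell out which comes
first.

```python
def dms_to_degrees(text):
    """Convert 'D M S sign' (e.g. '77 12 00 -') to signed decimal degrees.

    Treating the trailing '+' or '-' as the sign of the whole value is
    an assumption about the format.
    """
    parts = text.split()
    if len(parts) != 4 or parts[3] not in ("+", "-"):
        raise ValueError(f"expected 'D M S sign', got {text!r}")
    degrees, minutes, seconds = (float(p) for p in parts[:3])
    magnitude = degrees + minutes / 60 + seconds / 3600
    return -magnitude if parts[3] == "-" else magnitude


def parse_coordinates(value):
    """Return the two '/'-separated coordinates in file order."""
    first, sep, second = value.partition("/")
    if not sep:
        raise ValueError(f"expected two values separated by '/': {value!r}")
    return dms_to_degrees(first), dms_to_degrees(second)


first, second = parse_coordinates("77 12 00 - / 38 51 37 +")
```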
Timezone
--------
Offset from GMT and then a textual name. For example:
Timezone: -0500 (Eastern Standard Time)
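
One possible way to read this field, shown as a sketch only: a signed
four-digit hhmm offset followed by an optional name in parentheses. The
function name parse_timezone and the exact pattern are assumptions; the
proposal gives just the one example.

```python
import re
from datetime import timedelta, timezone

# Assumed shape: sign, two-digit hours, two-digit minutes, optional "(name)".
_TZ_PATTERN = re.compile(r"^\s*([+-])(\d{2})(\d{2})\s*(?:\((.*)\))?\s*$")


def parse_timezone(value):
    """Turn '-0500 (Eastern Standard Time)' into a datetime.timezone."""
    match = _TZ_PATTERN.match(value)
    if match is None:
        raise ValueError(f"unrecognized Timezone value: {value!r}")
    sign, hours, minutes, name = match.groups()
    offset = timedelta(hours=int(hours), minutes=int(minutes))
    if sign == "-":
        offset = -offset
    # timezone() rejects a None name, so only pass it when present.
    return timezone(offset, name) if name else timezone(offset)


tz = parse_timezone("-0500 (Eastern Standard Time)")
```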
Written-By
----------
Author of this text, including the last update time. For example:
Written-By: [email protected] (Tony Sanders);
    Mon Oct 25 11:39:14 CDT 1993
Access-times
------------
When the server is available (in local 24 hour time). For example:
Access-times: 0000 / 2400
Multiple entries are allowed.
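
A robot could use these windows to decide when to visit. The following
sketch converts each "hhmm / hhmm" entry to minutes past midnight and
checks a local time against a list of windows. The helper names are
hypothetical, and treating a window whose start is later than its end as
wrapping past midnight is an assumption the proposal does not address.

```python
def _to_minutes(hhmm):
    """Convert a 24 hour 'hhmm' string to minutes past midnight."""
    hhmm = hhmm.strip()
    if len(hhmm) != 4 or not hhmm.isdigit():
        raise ValueError(f"expected hhmm, got {hhmm!r}")
    hours, minutes = int(hhmm[:2]), int(hhmm[2:])
    if hours > 24 or minutes > 59 or (hours == 24 and minutes):
        raise ValueError(f"time out of range: {hhmm!r}")
    return hours * 60 + minutes


def parse_access_times(value):
    """Parse one Access-times value into (start, end) minutes."""
    start, sep, end = value.partition("/")
    if not sep:
        raise ValueError(f"expected 'hhmm / hhmm', got {value!r}")
    return _to_minutes(start), _to_minutes(end)


def is_available(windows, hhmm):
    """Return True if local time hhmm falls inside any window."""
    now = _to_minutes(hhmm)
    for start, end in windows:
        if start <= end:
            if start <= now < end:
                return True
        elif now >= start or now < end:
            # Assumed meaning of start > end: window wraps past midnight.
            return True
    return False


windows = [parse_access_times("0000 / 2400")]
```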
Policy
------
Any policy statement you wish to make (e.g., the GNN server might
wish to give registration information here). For example:
Policy: None
Description
-----------
A brief description of the server (used for building meta-indexes).
For example:
Description: This site contains public sources and information
    related to BSDI's software products (eg: BSD/386).
    Currently all sources are for publicly contributed
    BSD/386 utilities.
Keywords
--------
Keywords for constrained searches. The words are comma separated;
use "text, text" if you need to embed a comma. It's best to use
simple words rather than phrases. For example:
Keywords: BSD, OS, source, berkeley, BSD/386, BSDI
Multiple entries are allowed.
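
The comma and quoting rule is close to what Python's csv module already
handles, so a sketch of keyword splitting can lean on it. The function
name parse_keywords is hypothetical, and treating the quoting exactly
like CSV quoting is an assumption; the proposal only shows the
"text, text" form.

```python
import csv


def parse_keywords(values):
    """Split one or more Keywords values into a flat list of keywords.

    `values` is a list because multiple Keywords entries are allowed.
    """
    keywords = []
    for value in values:
        # skipinitialspace lets a quoted keyword follow ", " directly.
        for row in csv.reader([value], skipinitialspace=True):
            keywords.extend(word.strip() for word in row if word.strip())
    return keywords


keywords = parse_keywords(["BSD, OS, source, berkeley, BSD/386, BSDI"])
```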
Index
-----
These are pointers to information indexes that the server supplies.
The first word is a partial URL (relative to the top of the server)
and the rest of the text is used to build the meta-index. For example:
Index: /info/ BSDI and BSD/386 Information
Multiple entries are allowed.
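
To show how an Index entry might be consumed, the sketch below splits
the value into the partial URL and the descriptive text, then resolves
the partial URL against the server's Name. The function name parse_index
is hypothetical, and using the "http" scheme for the full URL is an
assumption based on the site file itself being served over HTTP.

```python
from urllib.parse import urljoin


def parse_index(value, server_name):
    """Split an Index value into (full URL, description).

    `server_name` is the Name field, e.g. 'www.bsdi.com:80'.
    """
    parts = value.split(None, 1)
    if not parts:
        raise ValueError("empty Index value")
    path = parts[0]
    description = parts[1].strip() if len(parts) > 1 else ""
    # The partial URL is relative to the top of the server.
    return urljoin(f"http://{server_name}/", path), description


entry = parse_index("/info/ BSDI and BSD/386 Information", "www.bsdi.com:80")
```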
Tony_Sanders_
.. _Tony_Sanders http://www.bsdi.com/hyplan/sanders.html
.. _document http://www.bsdi.com/HTTP:TNG/www-indexing.etx
.. _http://www.bsdi.com/site.idx http://www.bsdi.com/site.idx