Re: Getting searching to work

Tim Berners-Lee ([email protected])
Thu, 28 Jan 93 08:14:39 +0100


> Date: Wed, 27 Jan 93 16:33:38 GMT
> From: [email protected] (Peter Flynn)
>

> How is it best to add a simple search facility to a httpd server?
> Or, better put, if a user wends hir way into one of my html menus,
> is there a simple addition I can make that will add a search
> capability?

Not as simple as you'd like, I suspect. The httpd daemon doesn't
support a search directly. What you can do is (hack httpd or) run
another daemon which is completely written in sh or csh or perl (pick
your favorite). There are some examples on the web. Then you have
something like

<dt><a name=something
href="http://curia.ucc.ie:9000/keywordsearchsearch/joyce">Search</a>
<dd>the above texts for a name or keyword

When the user follows the link, the special server on port 9000 gets a

GET /keywordsearchsearch/joyce

request, and returns a search panel document:

<head>
<isindex>
</head>
<body>
Give keywords, or words from the title, to find books
by James Joyce which match all keywords given.
</body>

The <isindex> flag tells the www program that the document is a
search panel and enables the FIND command. (On smarter browsers it
enables the search text input field.)
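
As a rough illustration, a search panel like the one above could be
produced by a small function. This is a sketch in Python, chosen only
for brevity; the server described here would be sh, csh or perl. The
name search_panel and its argument are invented for the example.

```python
import html

def search_panel(description):
    """Return a search panel document; <isindex> in the head marks it searchable."""
    return (
        "<head>\n"
        "<isindex>\n"
        "</head>\n"
        "<body>\n"
        + html.escape(description) + "\n"
        "</body>\n"
    )

# Reproduce the panel text used in the example above.
print(search_panel(
    "Give keywords, or words from the title, to find books\n"
    "by James Joyce which match all keywords given."
), end="")
```

The description is escaped so that stray "<" or "&" characters in it
cannot be mistaken for markup.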

When the user types a keyword, that same special server gets a
different request:

GET /keywordsearch/joyce?portrait+young
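
The only new part of this request is what follows the "?": the
keywords, joined with "+". A minimal sketch of pulling such a request
line apart, in Python for illustration only (parse_request is a
made-up name, and no other encoding of the keywords is assumed):

```python
def parse_request(line):
    """Split 'GET /path?word+word' into (method, path, keywords)."""
    method, address = line.split()[:2]
    path, _, query = address.partition("?")
    keywords = [word for word in query.split("+") if word]
    return method, path, keywords

method, path, keywords = parse_request("GET /keywordsearch/joyce?portrait+young")
print(method, path, keywords)
```

A request with no "?" gives an empty keyword list, which is how the
script can tell a request for the search panel from a search.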

Your script reads that request line from stdin and must write the
result back to stdout, like this:

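#!/bin/csh -f
set noglob                       # keep "?" in the address from being globbed
set request = ( `echo "$<"` )    # split the request line into words

# Turn the address (word 2) into a command line with request.sed,
# run it, and format its output as hypertext with reply.sed.
`echo "$request[2]" | sed -f request.sed` | sed -f reply.sed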

where request.sed is something like

s|^/\([^/]*\)/\([^?]*\)?\(.*\)| pat -\1 -cat \2 \3|g
s|+| |g

(I know nothing of pat, so that is all made up. Notice I used parts of
the address of the search panel to specify options to pat.) The
output is formatted into a hypertext file, in the example by
reply.sed, which has to generate a hypertext document with valid
references to the found documents, using their addresses on your
original httpd server.
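
For comparison, here is a sketch of the same two steps in Python. It
follows what request.sed and reply.sed appear intended to do and is
just as made up: the pat options come from the example above, while
build_command, format_reply, the (filename, title) shape of the hits
and the base address http://curia.ucc.ie/ are all assumptions.

```python
import html

def build_command(address):
    """Turn '/option/category?w1+w2' into an argument list for pat."""
    path, _, query = address.partition("?")
    option, _, category = path.lstrip("/").partition("/")
    keywords = [word for word in query.split("+") if word]
    return ["pat", "-" + option, "-cat", category] + keywords

def format_reply(hits, base="http://curia.ucc.ie/"):
    """Format (filename, title) pairs as a hypertext list of links."""
    lines = ["<body>", "<dl>"]
    for filename, title in hits:
        href = base + html.escape(filename, quote=True)
        lines.append('<dt><a href="' + href + '">' + html.escape(title) + "</a>")
    lines += ["</dl>", "</body>"]
    return "\n".join(lines) + "\n"

print(build_command("/keywordsearch/joyce?portrait+young"))
print(format_reply([("ulysses.html", "Ulysses")]), end="")
```

Keeping the command as a list of arguments, rather than one string
handed to a shell, means odd characters in the keywords are passed to
the program as plain text.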

All of which is in fact simpler than it looks -- largely because the
thing is just a program running from stdin to stdout, which you
can test on a terminal. When you run it under inetd (just like
httpd is run, but on port 9000) it is inetd which takes care of
attaching stdin and stdout to the client.
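
To make the stdin-to-stdout point concrete, here is a sketch of a
handler written against two file-like objects, so it can be exercised
with no network at all. It is in Python for illustration only; serve
and the reply wording are invented, and the search itself is left as
a placeholder.

```python
import html
import io

def serve(infile, outfile):
    """Handle one request: read 'GET address' and write a hypertext reply."""
    words = infile.readline().split()
    address = words[1] if len(words) > 1 else "/"
    path, _, query = address.partition("?")
    if query:
        # A search: a real script would run the retrieval program here.
        keywords = html.escape(query.replace("+", " "))
        outfile.write("<body>\nSearched " + html.escape(path)
                      + " for: " + keywords + "\n</body>\n")
    else:
        # No keywords yet: return the search panel.
        outfile.write("<head>\n<isindex>\n</head>\n"
                      "<body>\nGive keywords to search.\n</body>\n")

# Try it much as one would at a terminal.
# Under inetd the call would instead be serve(sys.stdin, sys.stdout).
demo = io.StringIO()
serve(io.StringIO("GET /keywordsearch/joyce?portrait+young\n"), demo)
print(demo.getvalue(), end="")
```

The handler never opens a socket itself, which is the property that
lets inetd supply the connection.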

As pat sounds like a serious piece of retrieval machinery, it
would certainly be worth wrapping it up as a W3 server to make it
available on the web.

A couple of hints:

1. Put lots of parameters into the address of the search panels, so
   that you can put pointers to all kinds of different pat features
   if you need them.
2. In the search panel document which your port 9000 server script
   generates, put pointers to related search panels, help pages, etc.
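
Both hints can be sketched together: a helper that writes one link
per search mode, each pointing at a panel whose address carries the
mode as a parameter. This is Python for illustration only;
related_panel_links is a made-up name and "titlesearch" is a
hypothetical second mode, not something pat is known to offer.

```python
import html

def related_panel_links(server, category, modes):
    """Return one <a> per mode; each address encodes the mode and category."""
    lines = []
    for mode, label in modes.items():
        href = server + "/" + mode + "/" + category
        lines.append('<a href="' + html.escape(href, quote=True) + '">'
                     + html.escape(label) + "</a>")
    return "\n".join(lines) + "\n"

print(related_panel_links(
    "http://curia.ucc.ie:9000",
    "joyce",
    {"keywordsearch": "Keyword search", "titlesearch": "Title search"},
), end="")
```

The output can be dropped into the body of each search panel, so a
reader who lands on one kind of search can reach the others.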

> What I'd like is something like:
>

> <dl>
> <dt><a name=dub href="dubliners.html">Dubliners<dd>by James Joyce
> <dt><a name=ulysses href="ulysses.html">Ulysses<dd>by James Joyce
> [etc]
> <dt><a name=something href=somepointer>Search<dd>the above texts for
> a name or keyword
> </dl>

This is basically what I have described. When the guy follows the
link, he gets back a micro-document which tells him about the search
he can do. This is the WWW model. The Gopher model, which Dan
Connolly prefers, is that he should immediately get prompted for
keywords with a default search panel. I'll discuss that in a separate
message.

Tim