Paul Phillips <[email protected]> said:
> This isn't going to fly with the current Referer implementations. Too
> many browsers lie, especially all the Mozillas which constitute over half
> the web clients currently. Even if every version written from now on
were accurate, the sheer number of liars deployed will result in too
> many false positives. I get dozens of the MCOM home URL in my Referer
> logs on a daily basis.
Could you explain exactly what you mean by lying
browsers? Are they putting incorrect URLs into
the Referer field, or URLs of places the user hasn't
actually been to, or what?
How widespread is this? What other browsers lie?
We've been thinking about tracking Referer fields
to see where people are coming into GNN, and where
they come from. It sounds like that may not turn
out to be as useful as I thought.
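If we do go ahead with it, the tallying itself is simple
enough; here's a rough sketch in Python, assuming the server
writes one Referer URL per line to a log (the filename here
is just a placeholder):

  # Rough tally of Referer values, assuming one URL per log line.
  # 'referer_log' is a placeholder; adjust for your server's format.
  counts = {}
  for line in open('referer_log'):
      url = line.strip()
      if url:
          counts[url] = counts.get(url, 0) + 1

  # Print the most common referring URLs first.
  for url, n in sorted(counts.items(), key=lambda item: item[1],
                       reverse=True):
      print(n, url)

Of course, if the values themselves are lies, no amount of
counting will make them trustworthy.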
> There also needs to be a more reliable way of ascertaining the maintainer
> of a page. There are a few machine heuristics and a few more human ones
> that can work, but no reliable method. Even a ~user URL isn't
> guaranteed to be able to receive mail at the same machine.
For many months now, we've used the <LINK> tag in
all GNN documents. The main reason was that the
Lynx browser used to (maybe still does?) check for
broken links and report these errors by email to
the maintainer of the document containing the bad
links. The tricky part was that unless otherwise
specified, the errors would go to the maintainers
of Lynx (hi, Lou!), *not* GNN. Adding a tag like:
<LINK REV=MADE HREF="mailto:[email protected]">
caused the errors to be mailed to whoever was
specified in the HREF attribute.
This seems to have fixed the misdirected mail (at least
we're not hearing from the Lynx folks ;), but the approach
still has some problems:
- the errors are formatted to be readable by humans,
  which makes tracking and automation difficult
- at least in the case of Lynx, apparent network
  problems were sometimes classified as bad links;
  so we ignore the 'bad link' messages that refer
  to, say, http://info.cern.ch ;)
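On Paul's point about ascertaining the maintainer of a page,
though, one nice thing about the <LINK REV=MADE> convention is
that it's machine-readable. A rough sketch of pulling the
address out (it only handles the exact form shown above; a
real parser would need to be more forgiving about attribute
order and quoting):

  # Pull the maintainer address out of a <LINK REV=MADE ...> tag.
  import re

  def maintainer(html):
      m = re.search(r'<LINK\s+REV=MADE\s+HREF="mailto:([^"]+)"',
                    html, re.IGNORECASE)
      return m.group(1) if m else None

  print(maintainer('<LINK REV=MADE HREF="mailto:[email protected]">'))
  # -> [email protected]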
A related problem: I've noticed that our error
logs are sometimes filled with requests for URLs
that don't actually exist in GNN, but are close to
ones that do. For instance, /gnn/gnnhome.html is
*probably* meant to be GNNhome.html, our home page.
I'd guess that these are from people typing in URLs
from newspaper articles or other print media.
It seems silly to simply return a 404 error for these.
I've thought about hacking the server to instead bring
up a nicer page that says, in effect, 'Sorry, you've
dialed a wrong URL. Perhaps you mean one of these...'
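The matching could be something as simple as a case-insensitive
or near-match comparison against the list of documents we
actually serve. A rough sketch in Python, with a made-up
document list:

  # Suggest near-matches for a misspelled path, using a simple
  # similarity measure against the documents we actually serve.
  # The path list here is made up for illustration.
  from difflib import get_close_matches

  known_paths = ['/gnn/GNNhome.html', '/gnn/news/index.html']

  def suggestions(bad_path):
      # Compare case-insensitively so /gnn/gnnhome.html still matches.
      lowered = {p.lower(): p for p in known_paths}
      hits = get_close_matches(bad_path.lower(), lowered.keys(),
                               n=3, cutoff=0.6)
      return [lowered[h] for h in hits]

  print(suggestions('/gnn/gnnhome.html')[0])
  # -> /gnn/GNNhome.html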
Has anyone experimented with a more user-friendly
approach to handling errors like this? With the right
interface, perhaps there could be a button that says
'Report this URL as a bad link,' which could then invoke
a CGI script that catalogs the bad link.
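I haven't written anything yet, but a minimal version of such
a script might look like the sketch below; the form field
names and the log file location are just guesses at how the
button would be wired up:

  #!/usr/bin/env python
  # Minimal sketch of a 'report this bad link' CGI script.
  # The field names ('bad_url', 'referring_page') and the log
  # location are assumptions, not anything we actually run.
  import os
  from urllib.parse import parse_qs

  form = parse_qs(os.environ.get('QUERY_STRING', ''))
  bad_url = form.get('bad_url', [''])[0]
  referring_page = form.get('referring_page', [''])[0]

  # Append the report to a tab-separated log for later review.
  with open('/tmp/bad-links.log', 'a') as log:
      log.write('%s\t%s\n' % (bad_url, referring_page))

  print('Content-Type: text/html')
  print()
  print('<p>Thanks -- the bad link has been recorded.</p>')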
--
John Labovitz
Technical Services Manager, Global Network Navigator <http://gnn.com/>
O'Reilly & Associates, Sebastopol, California, USA (+1 707 829 0515)