Re: Network Abuse by Netscape? -- Was: Mosaic replacements, etc...

Brian Behlendorf ([email protected])
Sat, 22 Oct 1994 12:18:14 -0700 (PDT)


On Sat, 22 Oct 1994, Robert Raisch wrote:
> On Sat, 22 Oct 1994, Internet Presence Inc. wrote, on the inet-marketing
> mailing list, regarding the approach Netscape takes to retrieving all of
> the graphical elements of a web document all at once:
>
> > We've noticed that now, 4-5 "hits" will just pop up in the logs at
> > once when people use Netscape or WebExplorer. It's no big deal.
>
> Sorry, but it is a very big deal indeed. I have great concern over the
> technical implications of this approach and I am not alone.

I'm not as concerned, really. Other than the overhead in opening
separate TCP connections, it's still the same amount of data. If
you are totally optimizing for people on the slow end of a 14.4 modem,
then this is the right thing to do in terms of speed - even waiting for
the whole initial document to download before issuing an MGET on the
inlines would be unacceptable in most situations.

Server performance shouldn't be a problem - most server software I've
dealt with and most platforms I know people are running should easily
be able to handle 3-4 hits/second. Most server models are pretty good
about concurrent connections, too. Those that are started by inetd run
as their own processes anyway, so the number that can run is limited only
by what the system allows. Those that run as daemons which fork can handle
as many concurrent connections as there is memory for the forked
processes. And those that run with a run-time configurable number of
concurrent connections have solved that problem already.

Bursty network load is the only thing I'd worry about.

> What Netscape has done, in a sense, is to abrogate its responsibilities
> for efficient behavior at the expense of the network at large and those
> who choose to operate http servers.

Well actually, one thing I think they've done is demonstrate a fundamental
weakness in the HTTP/inline image model. One solution that is in line
with the talk here and on the HTTP working group list is that servers
could combine their documents and inline images into a MIME document,
multiplexing the elements into small fixed-size chunks of something like
2 Kbytes apiece. I don't know what the overhead would be in
that, but if it could be reduced to below that of the TCP connection
overhead, it's a win. So, if you had:

1. a 6K document (a)
2. a 10K inlined image (b)
3. a 2K inlined image (c)

The document would look something like this:

2k of a
2k of b
2k of c
2k of a
2k of b
2k of a
2k of b
2k of b
2k of b
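
To make the ordering above concrete, here is a minimal sketch of the
round-robin interleaving, plus the reassembly a client would have to do.
It is only an illustration: the names (split, multiplex, demultiplex) and
the (name, chunk) framing are made up for this example and are not any
proposed wire format.

```python
from collections import deque

CHUNK = 2 * 1024  # 2 Kbyte chunks, as in the example above


def split(data, size=CHUNK):
    """Cut a byte string into consecutive pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def multiplex(parts, size=CHUNK):
    """Yield (name, chunk) pairs, taking one chunk from each part in turn.

    `parts` is a list of (name, bytes) pairs. The tagging is a hypothetical
    stand-in for whatever framing a real MIME-based scheme would use.
    """
    queue = deque((name, deque(split(data, size)))
                  for name, data in parts if data)
    while queue:
        name, chunks = queue.popleft()
        yield name, chunks.popleft()
        if chunks:  # part not finished yet: back of the line
            queue.append((name, chunks))


def demultiplex(stream):
    """Client side: reassemble each part from the tagged chunks."""
    out = {}
    for name, chunk in stream:
        out[name] = out.get(name, b"") + chunk
    return out


parts = [("a", b"A" * 6144), ("b", b"B" * 10240), ("c", b"C" * 2048)]
order = [name for name, _ in multiplex(parts)]
print(" ".join(order))  # a b c a b a b b b
```

The printed order matches the nine chunks listed above, and running the
stream through demultiplex gives back the three original parts.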

An additional optimization would be to not multiplex an inline image in
until the reference to it appears in the document itself. I.e., if c were
referenced half-way through the document, its position would be #5 rather
than #3. That way a document's download could be aborted at any time
without leaving you with lots of extra bits that you don't have a home
for.
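
A rough sketch of that deferred variant, under some assumptions of mine:
each image comes with the byte offset of its reference in the document
(the ref_offset values below are invented), b is taken to be referenced at
the very top, and an image joins the rotation once the document chunk
containing its reference has gone out. The function name is hypothetical.

```python
from collections import deque

CHUNK = 2 * 1024  # 2 Kbyte chunks, as in the example above


def split(data, size=CHUNK):
    """Cut a byte string into consecutive pieces of at most `size` bytes."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def multiplex_deferred(doc_name, doc, images, size=CHUNK):
    """Yield (name, chunk) pairs; images wait until they are referenced.

    `images` is a list of (name, data, ref_offset), where ref_offset is the
    assumed byte position of the image reference inside `doc`. An image
    whose offset lies beyond the end of the document is never sent.
    """
    doc_chunks = deque(split(doc, size))
    pending = deque(sorted(images, key=lambda img: img[2]))
    active = deque()
    sent = 0  # document bytes sent so far
    while doc_chunks or active:
        if doc_chunks:
            chunk = doc_chunks.popleft()
            sent += len(chunk)
            yield doc_name, chunk
            # Activate every image referenced in what has been sent.
            while pending and pending[0][2] < sent:
                name, data, _ = pending.popleft()
                if data:
                    active.append((name, deque(split(data, size))))
        # One chunk from each active image per round.
        for _ in range(len(active)):
            name, chunks = active.popleft()
            yield name, chunks.popleft()
            if chunks:
                active.append((name, chunks))


images = [("b", b"B" * 10240, 0), ("c", b"C" * 2048, 3072)]
order = [name for name, _ in multiplex_deferred("a", b"A" * 6144, images)]
print(" ".join(order))  # a b a b c a b b b
```

With c referenced at the 3K mark it lands in position #5, and a client
that stops reading early never holds chunks of an image it has not yet
seen a reference for.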

Benefits:
- downloads are one continuous stream, a single server action
- document authors can start thinking of their document and images
  as one complete whole
- having 10 2K inlined images is now totally equivalent to having
  1 20K image, giving the document author much more flexibility

Drawbacks:
- server-side assembly is computationally expensive (well, the server
  can cache it, though, or not do it when instructed)
- requires changes to both servers and clients. HTTP/2.0, anyone?

> Under Sun/OS, the kernel is preconfigured to provide only 32 IO slots
> per process. When your server (either inetd if you run under that
> mechanism or the actual www server itself) receives the requests which
> comprise a document with 16 graphical elements on it, that single user
> has consumed half of the available IO slots for the length of time it
> takes to fulfill the request. (Yes, I realize you can up the IO slots
> and remake the kernel, that is not the point.)

Just to clarify - the number of simultaneous TCP connections is
configurable at run time in Mozilla, the default being 5 I believe.
Also, for which servers does the above model hold? Sounds like
it only holds for a multithreaded daemon like MDMA.

Brian