Yes... let's nip this sort of thing in the bud, shall we?
HTTP is not Internet Mail. HTTP is a protocol based on a reliable byte
stream, such as TCP. A reliable byte stream does not munge
whitespace. It doesn't lose characters because it translated to EBCDIC
and back.
HTTP is not for the human eye: it's for a piece of software that groks
TCP (or perhaps some other reliable transport eventually...).
It is not the case that there are 1000s of broken HTTP implementations
out there that we need to support. There are perhaps 10 or 20, with 2
or 3 representing 99% of the traffic.
Let us keep the HTTP protocol clear and free of such kludgery.
In the HTTP headers, a line is terminated by CRLF. That's octet 13,
octet 10. Anything else is broken. One should not expect to use
idioms such as:
    printf("HeaderName: stuff\n")
or
    echo "HeaderName: stuff"
successfully. Care must be taken to terminate lines with CRLF.
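To make the point concrete, here is a minimal sketch in C of writing a header line with the terminator spelled out. The helper name format_header_line and its signature are made up for illustration; they are not part of any HTTP library.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: formats "name: value" followed by CRLF into buf.
 * Returns the number of octets written (not counting the trailing NUL),
 * or -1 if the line does not fit in bufsize bytes. */
int format_header_line(char *buf, size_t bufsize,
                       const char *name, const char *value)
{
    /* "\r\n" is octet 13 followed by octet 10, written explicitly
     * rather than hoping that "\n" gets translated along the way. */
    int n = snprintf(buf, bufsize, "%s: %s\r\n", name, value);

    if (n < 0 || (size_t)n >= bufsize)
        return -1;
    return n;
}
```

Calling format_header_line(buf, sizeof buf, "HeaderName", "stuff") fills buf with the 17 octets of "HeaderName: stuff" followed by 13 and 10. If the result goes out through a stdio stream on a platform that translates "\n" in text mode, the stream would presumably have to be in binary mode so the terminator is not rewritten.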
Similarly for the blank line that ends the headers: I'm not sure if
RFC822 specifies that the line shall be empty or not, but I'd support
a clarification in HTTP that says it shall.
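A sketch of what a reader of the stream might do under that clarification, i.e. assuming the line that ends the headers is strictly empty (CRLF immediately followed by CRLF). The function name header_block_length is hypothetical, and the buffer is assumed to start at the first line of the message.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the offset of the first octet after the
 * empty line that ends the headers, or -1 if no such line has been
 * seen in the first len octets of data. */
long header_block_length(const char *data, size_t len)
{
    size_t i;

    /* The last header line ends in CRLF and the empty line is a bare
     * CRLF, so the boundary is the four octets 13, 10, 13, 10. */
    for (i = 0; i + 4 <= len; i++) {
        if (memcmp(data + i, "\r\n\r\n", 4) == 0)
            return (long)(i + 4);
    }
    return -1;
}
```

With this strict reading, a bare LF pair or a "blank" line containing a space does not end the headers; the function keeps returning -1 until a real CRLF CRLF shows up.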
The data stream is something different altogether. The possible
content-transfer-encodings are:
7bit -- 7bit text, lines terminated by CRLF (no reason to use this)
8bit -- 8bit text, lines terminated by CRLF
binary -- 8bit data, not necessarily any linebreaks anywhere.
I believe binary is the default Content-Transfer-Encoding in HTTP
(though I believe I saw 8bit documented as the default somewhere...).
This means, for example, that you shouldn't expect HTML lines to
be terminated in any particular way. Of course it doesn't matter how
they're terminated except inside PRE elements. There, I'd say that
a newline is (CR|LF|CRLF).
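The (CR|LF|CRLF) rule for PRE content can be sketched in C as follows. The function count_line_breaks is a hypothetical illustration of the rule, not code from any existing browser or parser.

```c
#include <stddef.h>

/* Hypothetical helper: counts the line breaks in len octets of text,
 * where a lone CR, a lone LF, and a CRLF pair each count as one. */
size_t count_line_breaks(const char *text, size_t len)
{
    size_t i = 0;
    size_t breaks = 0;

    while (i < len) {
        if (text[i] == '\r') {
            breaks++;
            /* CRLF is a single break: skip the LF that follows the CR. */
            if (i + 1 < len && text[i + 1] == '\n')
                i++;
        } else if (text[i] == '\n') {
            breaks++;
        }
        i++;
    }
    return breaks;
}
```

Note that the order matters: CR followed by LF is one newline, while LF followed by CR counts as two under this reading.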
Daniel W. Connolly "We believe in the interconnectedness of all things"
Software Engineer, Hal Software Systems, OLIAS project (512) 834-9962 x5010
<[email protected]> http://www.hal.com/%7Econnolly/index.html