D. J. Bernstein
Internet mail message header format
Headers
A header is a sequence of nonempty lines,
consisting of a concatenation of fields.
A line is a string of zero or more bytes.
A line is empty if it contains zero bytes.
Every line in a header contains one or more bytes.
A field is a sequence of one or more nonempty lines.
The first line does not begin with space or tab.
Subsequent lines, if there are any, each begin with space or tab.
Each field includes a name and a value.
For example, the following header contains five fields:
Received: (qmail-queue invoked by uid 666);
  30 Jul 1996 11:54:54 -0000
From: "D. J. Bernstein" <djb@silverton.berkeley.edu>
To: fred@silverton.berkeley.edu
Date: 30 Jul 1996 11:54:54 -0000
Subject: Go, Bears!
The first field contains two lines:
Received: (qmail-queue invoked by uid 666);
  30 Jul 1996 11:54:54 -0000
Note on terminology: some people refer to fields as "headers."
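The field rule above can be sketched in Python as follows. This is a minimal illustration, not code from any mailer: the names split_fields and field_name are hypothetical, header lines are assumed to be given as bytes without line terminators, and field_name assumes the name is whatever precedes the first colon on the field's first line, as in the example. The continuation line of the Received field is written with two leading spaces.

```python
def split_fields(header_lines):
    """Group nonempty header lines into fields.

    A field starts with a line that does not begin with space or tab;
    every following line that begins with space or tab continues it.
    """
    fields = []
    for line in header_lines:
        if not line:
            raise ValueError("header lines must be nonempty")
        if line[:1] in (b" ", b"\t"):
            if not fields:
                raise ValueError("continuation line before any field")
            fields[-1].append(line)
        else:
            fields.append([line])
    return fields


def field_name(field):
    """Hypothetical helper: take the name to be the text before the
    first colon on the field's first line."""
    name, sep, _ = field[0].partition(b":")
    if not sep:
        raise ValueError("no colon in first line of field")
    return name


header = [
    b"Received: (qmail-queue invoked by uid 666);",
    b"  30 Jul 1996 11:54:54 -0000",
    b'From: "D. J. Bernstein" <djb@silverton.berkeley.edu>',
    b"To: fred@silverton.berkeley.edu",
    b"Date: 30 Jul 1996 11:54:54 -0000",
    b"Subject: Go, Bears!",
]

fields = split_fields(header)
print(len(fields))      # 5
print(len(fields[0]))   # 2
print([field_name(f) for f in fields])
# [b'Received', b'From', b'To', b'Date', b'Subject']
```

Because a field is defined by where its lines start, the grouping needs only the first byte of each line; the colon is consulted only when extracting the name.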
Non-ASCII characters
Users often send bytes between 128 and 255,
relying on out-of-band agreement to specify the character set.
(One system administrator in France has reported that 20% of the
messages received by his users contain such bytes.)
However,
822 requires that each byte in a header line be between 0 and 127 inclusive.
Furthermore,
sendmail gets rather confused by bytes between 128 and 159;
it uses them for internal macro handling.
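A reader who wants to flag such bytes might use a check like the one below. It is a sketch under the assumption that a header line is available as bytes; the function name non_ascii_report is hypothetical. It reports the two conditions described above separately: any byte above 127 (outside what 822 allows) and any byte from 128 to 159 (the range sendmail uses internally).

```python
def non_ascii_report(line):
    """Classify the bytes of one header line.

    Returns (violates_822, sendmail_risk):
      violates_822: some byte is above 127 (822 allows only 0..127)
      sendmail_risk: some byte is in 128..159, which sendmail uses
                     for internal macro handling
    """
    violates_822 = any(b > 127 for b in line)
    sendmail_risk = any(128 <= b <= 159 for b in line)
    return violates_822, sendmail_risk


print(non_ascii_report(b"Subject: Go, Bears!"))  # (False, False)
print(non_ascii_report(b"Subject: caf\xe9"))     # (True, False)
print(non_ascii_report(b"Subject: \x85"))        # (True, True)
```

The second result corresponds to the common case: the byte violates 822 but works only through out-of-band agreement on the character set.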
Other byte restrictions
822 specifies a particular line encoding,
with each line terminated by \015\012;
so the byte sequence \015\012 cannot appear inside a line.
This restriction is also enforced by the
message encoding used in SMTP.
UNIX mail programs store headers as UNIX text files,
so the byte \012 cannot appear anywhere inside a line.
If a message with a bare \012 is transmitted through sendmail,
the \012 will be treated as a line ending;
if a message with a bare \012 is given to qmail,
the message will be rejected.
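The difference between the two behaviors can be modeled with a small splitter for 822-encoded data. This is an illustrative sketch only, not qmail or sendmail code: the function wire_to_lines and its bare_lf modes "reject" and "split" are hypothetical names, chosen to mimic, respectively, the rejection described for qmail and the line-ending interpretation described for sendmail.

```python
def wire_to_lines(data, bare_lf="reject"):
    """Split 822-encoded bytes into lines with terminators removed.

    Each line is expected to end with \\015\\012. A bare \\012 (one not
    preceded by \\015) is either rejected ("reject") or treated as a
    line ending ("split").
    """
    if bare_lf not in ("reject", "split"):
        raise ValueError("bare_lf must be 'reject' or 'split'")
    lines = []
    start = 0
    while start < len(data):
        lf = data.find(b"\n", start)
        if lf == -1:
            raise ValueError("final line is not terminated")
        if lf > start and data[lf - 1:lf] == b"\r":
            lines.append(data[start:lf - 1])   # proper \015\012 ending
        elif bare_lf == "reject":
            raise ValueError("bare \\012 at offset %d" % lf)
        else:
            lines.append(data[start:lf])       # bare \012 ends the line
        start = lf + 1
    return lines


print(wire_to_lines(b"To: fred\r\nSubject: hi\r\n"))
# [b'To: fred', b'Subject: hi']
print(wire_to_lines(b"Subject: a\nb\r\n", bare_lf="split"))
# [b'Subject: a', b'b']
```

With the default mode, the second input raises ValueError instead of silently turning one line into two.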
822bis prohibits \012.
sendmail corrupts any \0 in a message header.
Some mailers truncate lines at \0.
822bis prohibits \0.
822bis also prohibits \015.
822 also discourages tabs
and prohibits one use of backspaces.
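The 822bis byte restrictions listed above reduce to a set test on each line. In this sketch the line is assumed to be given as bytes without its terminator, and the names PROHIBITED and violations_822bis are hypothetical; whether to reject, strip, or encode offending bytes is left to the caller.

```python
# Bytes that, per the text above, 822bis prohibits in a header line.
PROHIBITED = {0x00: "\\0", 0x0A: "\\012", 0x0D: "\\015"}


def violations_822bis(line):
    """Return the names of prohibited bytes present in one header line
    (given without its terminator), in increasing byte order."""
    present = set(line) & set(PROHIBITED)
    return [PROHIBITED[b] for b in sorted(present)]


print(violations_822bis(b"Subject: Go, Bears!"))  # []
print(violations_822bis(b"X: a\rb\x00"))          # ['\\0', '\\015']
```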
Header termination
An Internet mail message is a sequence of lines, starting with a header.
The header is terminated by an empty line (or by the end of the message).
The header is not terminated by an invisible line,
i.e., a line consisting entirely of spaces and tabs,
unless the line is empty.
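The termination rule can be written as a short split. This sketch assumes the message is a list of byte-string lines without terminators; split_header is a hypothetical name. Only a truly empty line ends the header, so a line of spaces stays in the header, and a message with no empty line is all header.

```python
def split_header(message_lines):
    """Split a message into (header, body) at the first empty line.

    A nonempty line of spaces and tabs does not end the header.
    If no line is empty, the whole message is header.
    """
    for i, line in enumerate(message_lines):
        if line == b"":
            return message_lines[:i], message_lines[i + 1:]
    return message_lines, []


print(split_header([b"Subject: x", b"   ", b"", b"body"]))
# ([b'Subject: x', b'   '], [b'body'])
print(split_header([b"Subject: x"]))
# ([b'Subject: x'], [])
```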
Users occasionally create invisible lines,
usually for aesthetic reasons.
In the following field, the second line consists entirely of spaces:
Received:
    
  from [censored] by pinerolo.piw.it
  with smtp
  (Linux Smail3.1.28.1 #5)
  id m0uqMBq-000BzwC; Tue, 13 Aug 96 17:19 GMT+0100
There are three popular strategies for detecting the end of the header:
- Stop reading the header at the first empty line.
Unfortunately, there are a few broken gateways that corrupt messages,
inserting a space into every empty line.
- Stop reading the header at the first invisible line,
whether or not it is empty.
This handles the broken gateways mentioned above.
However, it violates 822;
for example, it completely misinterprets the valid Received line shown above.
Eudora reportedly does this,
so I recommend that writers avoid generating invisible lines.
822bis prohibits invisible lines.
- Stop reading the header at the first line that cannot possibly
be a header line:
(1) an empty line or
(2) a line that does not start with
a space, a tab, or a
field name
followed by a colon.
I recommend this strategy;
it correctly handles 822-compliant messages
as well as practically all of the broken messages.
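The three strategies can be compared side by side. This is a sketch, not any mailer's code: each hypothetical function returns the index of the first line that is not part of the header. Two assumptions are labeled in the code: a field name is taken to be one or more printable ASCII bytes other than colon (33 to 126, excluding 58), immediately followed by a colon; and the Received test message places a line of four spaces after its first line. The second test message imitates a broken gateway that turned the empty separator line into a single space.

```python
import re

# Assumption: a field name is one or more printable ASCII bytes other
# than colon (33..126 except 58), immediately followed by a colon.
FIELD_START = re.compile(rb"[\x21-\x39\x3b-\x7e]+:")


def is_invisible(line):
    """True if the line consists entirely of spaces and tabs (or is empty)."""
    return line.strip(b" \t") == b""


def end_at_empty(lines):
    """Strategy 1: stop at the first empty line."""
    for i, line in enumerate(lines):
        if line == b"":
            return i
    return len(lines)


def end_at_invisible(lines):
    """Strategy 2: stop at the first invisible line, empty or not."""
    for i, line in enumerate(lines):
        if is_invisible(line):
            return i
    return len(lines)


def end_at_impossible(lines):
    """Strategy 3: stop at the first line that cannot be a header line:
    an empty line, or a line that does not start with a space, a tab,
    or a field name followed by a colon."""
    for i, line in enumerate(lines):
        if line == b"":
            return i
        if line[:1] in (b" ", b"\t"):
            continue
        if FIELD_START.match(line) is None:
            return i
    return len(lines)


# Assumption: the invisible line sits right after "Received:".
received = [
    b"Received:",
    b"    ",
    b"  from [censored] by pinerolo.piw.it",
    b"  with smtp",
    b"  (Linux Smail3.1.28.1 #5)",
    b"  id m0uqMBq-000BzwC; Tue, 13 Aug 96 17:19 GMT+0100",
    b"",
    b"body",
]
# A broken gateway replaced the empty separator line with one space.
gatewayed = [b"To: fred@silverton.berkeley.edu", b" ", b"Hi Fred,"]

for msg in (received, gatewayed):
    print(end_at_empty(msg), end_at_invisible(msg), end_at_impossible(msg))
# 6 1 6
# 3 1 2
```

Strategy 1 reads the gatewayed message's body as header, and strategy 2 cuts the valid Received field after its first line; strategy 3 gets both right. Under strategy 3 the gateway's lone space is absorbed into the last field as a continuation line, and a body line that happens to look like "Name: value" would still be misread, which is why it handles practically all, rather than all, broken messages.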