D. J. Bernstein
Internet mail
Internet mail message header format
Tokenizable field values
A field value may be structured as a series of
tokens, comments, spaces, and tabs.
The semantics of a tokenizable field value depend only on its tokens.
A writer can freely insert comments, spaces, and tabs between tokens.
For example, some writers will insert spaces inside long addresses:
To: cryptographic-cookie-940d3af4f1357d203c7afd5162e9d06e
  @heaven.af.mil
Readers should identify tokens as discussed below,
and ignore spaces and comments during parsing.
Unfortunately,
many readers use ad-hoc parsers that do not extract tokens correctly.
It is a bad idea to put spaces, tabs, or comments at unusual locations.
822bis has a huge number of new rules prohibiting or discouraging
various spaces, tabs, and comments.
Tokens
There are three types of tokens:
- Punctuation: <, >, comma, semicolon, colon.
- Address symbols: at sign (@), dot.
- Encoded strings: atom, domain literal, quoted string.
An atom is a string of one or more characters
terminated by the end of the field value or by any of these characters:
space, tab, @, <, >, [,
left parenthesis, comma, semicolon, colon,
dot, double quote.
(Note that adjacent atom tokens must be separated by at least one
comment, space, or tab.)
An atom represents itself as a string.
For example,
heaven
is an atom representing the 6-byte string "heaven".
A string containing an unusual character, such as space or semicolon,
cannot be encoded as an atom.
(822 prohibits all control characters, byte 127 and bytes 0 through 31,
in atoms, as well as ], backslash, and right parenthesis.)
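The atom rule above can be sketched in a few lines of Python. This is only an illustration: the names ATOM_TERMINATORS, scan_atom, and can_be_atom are hypothetical, and the sketch follows the terminator list given here without enforcing the extra characters that 822 prohibits.

```python
# Characters that terminate an atom, taken from the list above:
# space, tab, @, <, >, [, left parenthesis, comma, semicolon, colon,
# dot, double quote.
ATOM_TERMINATORS = set(' \t@<>[(,;:."')


def scan_atom(value, start):
    """Return (atom, index just past the atom) for the atom at value[start]."""
    end = start
    while end < len(value) and value[end] not in ATOM_TERMINATORS:
        end += 1
    if end == start:
        raise ValueError("no atom at this position")
    return value[start:end], end


def can_be_atom(s):
    """True if s is nonempty and contains none of the terminating characters.

    Assumption: this checks only the terminator list; it does not reject
    the control characters, ], backslash, or right parenthesis that 822
    also prohibits.
    """
    return len(s) > 0 and not any(c in ATOM_TERMINATORS for c in s)
```

Here scan_atom("heaven.af.mil", 0) stops at the first dot, and can_be_atom rejects strings containing a space or a semicolon.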
A quoted string is
a double quote, zero or more quoted string chunks, and another double quote.
A quoted string chunk represents a single character;
it can be
- that character, if it is not \015, backslash, or double quote; or
- a backslash followed by that character.
A quoted string represents the concatenation of the
characters represented by the quoted string chunks.
For example,
"heaven"
and
"h\e\ave\n"
are two quoted strings,
each representing the 6-byte string "heaven";
and
"\\\\\\"
is a quoted string representing three backslashes.
Any string can be encoded as a quoted string.
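As a rough sketch of the chunk rules above, the following Python decodes a quoted string and encodes an arbitrary string as one. The function names are hypothetical, and the input is assumed to be exactly one quoted string token, including both double quotes.

```python
def decode_quoted_string(token):
    """Return the string represented by a quoted string token."""
    if len(token) < 2 or token[0] != '"' or token[-1] != '"':
        raise ValueError("not a quoted string")
    body = token[1:-1]
    out = []
    i = 0
    while i < len(body):
        c = body[i]
        if c == "\\":
            # A backslash followed by a character represents that character.
            if i + 1 >= len(body):
                raise ValueError("backslash with nothing after it")
            out.append(body[i + 1])
            i += 2
        elif c in '"\r':
            # \015 (carriage return) and double quote need a backslash.
            raise ValueError("character must be backslash-quoted")
        else:
            out.append(c)
            i += 1
    return "".join(out)


def encode_quoted_string(s):
    """Encode any string, quoting only the characters that require it."""
    return '"' + "".join("\\" + c if c in '\\"\r' else c for c in s) + '"'
```

The encoder puts a backslash only in front of \015, backslash, and double quote; a writer could quote every character instead, and the decoder would produce the same string.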
A domain literal is
a left bracket, zero or more domain literal chunks, and a right bracket.
A domain literal chunk represents a single character;
it can be
- that character, if it is not \015, backslash, [, or ]; or
- a backslash followed by that character.
A domain literal represents the concatenation of
(1) a left bracket,
(2) the characters represented by the domain literal chunks, and
(3) a right bracket.
For example,
[127.0.0.1]
and
[\1\2\7\.\0\.\0\.\1]
are two domain literals,
each representing the 11-byte string "[127.0.0.1]".
Any string starting with [ and ending with ]
can be encoded as a domain literal.
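Domain literals differ from quoted strings in that the brackets are part of the represented string. A minimal Python sketch of that rule, with hypothetical function names and assuming the input is exactly one domain literal token:

```python
def decode_domain_literal(token):
    """Return the string represented by a domain literal, brackets included."""
    if len(token) < 2 or token[0] != "[" or token[-1] != "]":
        raise ValueError("not a domain literal")
    body = token[1:-1]
    out = ["["]  # (1) a left bracket
    i = 0
    while i < len(body):
        c = body[i]
        if c == "\\":
            if i + 1 >= len(body):
                raise ValueError("backslash with nothing after it")
            out.append(body[i + 1])
            i += 2
        elif c in "[]\r":
            # \015 and brackets need a backslash inside the literal.
            raise ValueError("character must be backslash-quoted")
        else:
            out.append(c)
            i += 1
    out.append("]")  # (3) a right bracket
    return "".join(out)


def encode_domain_literal(s):
    """Encode a string that starts with [ and ends with ]."""
    if len(s) < 2 or s[0] != "[" or s[-1] != "]":
        raise ValueError("string must start with [ and end with ]")
    inner = s[1:-1]
    return "[" + "".join("\\" + c if c in "\\[]\r" else c for c in inner) + "]"
```

Both example literals above decode to the same 11-byte string, which is why a reader should compare decoded values rather than raw token text.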
Several programs
(reportedly AMS and various IMAP servers)
cannot handle domain literals containing colons:
[FF02::3492:A98F]
Comments
A comment is a left parenthesis, zero or more comment chunks,
and a right parenthesis.
A comment chunk can be
- a character other than \015, backslash, or parentheses;
- a backslash followed by any character; or
- a comment.
Some examples of comments:
(D. J. Bernstein)
(comment (nested (deeply)) (and (oh no!) again))
(\)\\)
(by way of Whatever <redir@my.org>) (generated by Eudora)
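Because comments nest, a reader has to count parentheses rather than search for the first right parenthesis. One way to sketch this in Python is shown below; skip_comment is a hypothetical name, and the sketch does not reject an unquoted \015 inside the comment.

```python
def skip_comment(value, start):
    """Return the index just past the comment that opens at value[start]."""
    if start >= len(value) or value[start] != "(":
        raise ValueError("no comment at this position")
    depth = 0
    i = start
    while i < len(value):
        c = value[i]
        if c == "\\":
            # A backslash and the character after it form one chunk,
            # so a quoted parenthesis does not change the nesting depth.
            i += 2
            continue
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unterminated comment")
```

Applied at index 0 of the last example line above, this stops just after the first comment, leaving the space and the second comment still to be read.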
Examples
The field value
":sysmail"@  group. org, Muhammed.(the greatest) Ali @(the)Vegas.WBA
contains
- token: quoted string representing the 8-byte string ":sysmail"
- token: at sign
- space
- space
- token: atom representing the 5-byte string "group"
- token: dot
- space
- token: atom representing the 3-byte string "org"
- token: comma
- space
- token: atom representing the 8-byte string "Muhammed"
- token: dot
- comment (the greatest)
- space
- token: atom representing the 3-byte string "Ali"
- space
- token: at sign
- comment (the)
- token: atom representing the 5-byte string "Vegas"
- token: dot
- token: atom representing the 3-byte string "WBA"
This is the first example in RFC 822,
but most mail-reading programs can't handle it.
Pine 3.91 can't even handle
God@heaven. af.mil
correctly; it truncates the address after the first dot.
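The breakdown of the first example can be reproduced with a small tokenizer. The Python below is a sketch, not a reference parser: every name in it is hypothetical, it follows the rules on this page rather than the full 822 grammar, and it treats any character that starts no other item as the beginning of an atom.

```python
# Single-character items: punctuation, address symbols, spaces, and tabs.
SINGLE = {"<": "<", ">": ">", ",": "comma", ";": "semicolon", ":": "colon",
          "@": "at sign", ".": "dot", " ": "space", "\t": "tab"}
ATOM_END = set(' \t@<>[(,;:."')
NON_TOKENS = {"space", "tab", "comment"}


def read_chunks(value, i, close, forbidden):
    """Decode chunks from value[i] up to `close`; return (text, next index)."""
    out = []
    while i < len(value):
        c = value[i]
        if c == "\\":
            if i + 1 >= len(value):
                break
            out.append(value[i + 1])
            i += 2
        elif c == close:
            return "".join(out), i + 1
        elif c in forbidden:
            raise ValueError("character must be backslash-quoted: %r" % c)
        else:
            out.append(c)
            i += 1
    raise ValueError("unterminated token")


def comment_end(value, start):
    """Return the index just past the (possibly nested) comment at start."""
    depth, i = 0, start
    while i < len(value):
        c = value[i]
        if c == "\\":
            i += 2
            continue
        if c == "(":
            depth += 1
        elif c == ")":
            depth -= 1
            if depth == 0:
                return i + 1
        i += 1
    raise ValueError("unterminated comment")


def tokenize(value):
    """Split a field value into (kind, text) items, non-tokens included."""
    items = []
    i = 0
    while i < len(value):
        c = value[i]
        if c in SINGLE:
            items.append((SINGLE[c], c))
            i += 1
        elif c == '"':
            text, i = read_chunks(value, i + 1, '"', "\r")
            items.append(("quoted string", text))
        elif c == "[":
            text, i = read_chunks(value, i + 1, "]", "[\r")
            items.append(("domain literal", "[" + text + "]"))
        elif c == "(":
            end = comment_end(value, i)
            items.append(("comment", value[i:end]))
            i = end
        else:
            end = i
            while end < len(value) and value[end] not in ATOM_END:
                end += 1
            items.append(("atom", value[i:end]))
            i = end
    return items


def tokens_only(items):
    """Drop spaces, tabs, and comments, which carry no meaning."""
    return [item for item in items if item[0] not in NON_TOKENS]


# Two spaces follow the at sign, matching the two "space" entries
# in the breakdown.
example = '":sysmail"@  group. org, Muhammed.(the greatest) Ali @(the)Vegas.WBA'
for kind, text in tokenize(example):
    print(kind, repr(text))
```

With this sketch, tokens_only(tokenize("God@heaven. af.mil")) and tokens_only(tokenize("God@heaven.af.mil")) give the same list, which is the behavior a reader needs in order to handle the address that Pine 3.91 truncates.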