D. J. Bernstein
Internet mail
Internet mail message header format

Threading: Message-ID, References, In-Reply-To

Often a message is a response to a previous message. The parent of a response is the message that it responds to.

The parent may have been a response to yet another message. There can also be other messages with the same parent, with the same grandparent, etc. A collection of messages with a common ancestor is usually called a thread.

Readers can use Message-ID and References to see the structure of a thread.

Message identifiers

In theory a message identifier is tokenizable; it contains a < token, an encoded address, and a > token. Two examples:
     <9609171955.AA24342@cmstex2.maths.umanitoba.ca >
     <"16913 Tue Apr  9 14:24:59 1996"@bnr.ca>
I recommend against using quoted strings, spaces, tabs, or comments inside message identifiers; there is no reason for identifiers to contain user-interface fluff.

Note that the Internet mail address encoded in a message identifier is usually not an address that can receive mail.

Message-ID

The value of a Message-ID field is a message identifier. For example:
     Message-ID: <19951223192543.3034.qmail@silverton.berkeley.edu>
The Internet mail address encoded in Message-ID is required to be a unique worldwide identifier for this message.

It is the writer's responsibility to obtain authorization from the owner of the domain used in Message-ID. It is up to the owner to decide how to allocate a different box part for each new message.

822bis says that the domain used in Message-ID ``SHOULD be the domain name of the host on which it was created.'' However, in reality, many dialup hosts don't have domain names, and many hosts behind firewalls don't have public domain names. Some organizations have set aside banks of Message-ID names that do not refer to hosts.

Any message that starts or continues a thread needs a Message-ID. Not all messages contain Message-ID; for example, bounce messages from qmail do not contain Message-ID, and the Bell Labs upas mailer never creates Message-ID.
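As an illustration of allocating a different box part for each new message, here is a minimal sketch in Python. The scheme (time, process ID, and a per-process counter) and the names make_message_id and example.org are assumptions made for this example, not something the text above prescribes. It does not by itself guarantee worldwide uniqueness; that still depends on the domain owner's allocation policy.

```python
import itertools
import os
import time

# Per-process counter so that two identifiers created in the same second differ.
_counter = itertools.count()

def make_message_id(domain, now=None, pid=None):
    """Return an identifier of the form <time.pid.seq@domain>.

    The box-part layout is an illustrative assumption. The caller is
    expected to be authorized to use `domain`.
    """
    if now is None:
        now = time.time()
    if pid is None:
        pid = os.getpid()
    seq = next(_counter)
    # Only digits and dots in the box part: no quoted strings, spaces,
    # tabs, or comments.
    return "<%d.%d.%d@%s>" % (int(now), pid, seq, domain)
```

A writer would then emit "Message-ID: " followed by make_message_id("example.org"), with example.org replaced by a domain it is authorized to use.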

Security and reliability issues

In practice, Message-IDs are not necessarily unique. For example, Internet Mail Service 5.0.1457.3 reportedly copies Message-ID into a bounce message from the message being bounced; and Microsoft Internet Mail reportedly uses the same Message-ID for every message. Furthermore, from a security perspective, an attacker can easily forge a message with a duplicate Message-ID.

This means that it is neither secure nor reliable to discard a message with the same Message-ID as a previous message. If you want to discard duplicates, you should compute a cryptographic hash of each message.
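A minimal sketch of hash-based duplicate detection, in Python. The choice of SHA-256 and the names message_digest and DuplicateFilter are assumptions made for this example. Which bytes to hash is also a design choice that the text above does not settle: this sketch hashes the entire message, so two copies that differ only in fields added in transit (for example Received) count as different messages.

```python
import hashlib

def message_digest(raw_message: bytes) -> str:
    """Return a hex SHA-256 digest of the complete message bytes."""
    return hashlib.sha256(raw_message).hexdigest()

class DuplicateFilter:
    """Remember digests of messages seen so far (in memory only)."""

    def __init__(self):
        self._seen = set()

    def is_duplicate(self, raw_message: bytes) -> bool:
        """Return True if identical bytes were seen before; else record them."""
        digest = message_digest(raw_message)
        if digest in self._seen:
            return True
        self._seen.add(digest)
        return False
```

Two messages that share a Message-ID but differ in content produce different digests, so neither is discarded.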

References

In theory the value of a References field is tokenizable, consisting of a series of words and message identifiers.

In practice, References has nothing other than the message identifiers, each preceded by exactly one space, all on one line:

     References: <19980506192030.26456.qmail@cr.yp.to> <19980507220459.5655.qmail@warren.demon.co.uk> <19980508103652.B21462@iconnect.co.ke> <19980509035615.40087@rucus.ru.ac.za>
This is the USENET References syntax. Writers should follow the same format.
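A reader that expects this format can pull the identifiers out of a References value with a simple scan. The following Python sketch is one tolerant way to do it; the function name identifiers and the regular expression are assumptions made for this example. It skips any words between identifiers, and it does not handle a quoted string that itself contains < or >.

```python
import re

# Anything between a "<" and the next ">" with no nested angle brackets.
_IDENTIFIER = re.compile(r"<[^<>]+>")

def identifiers(field_value):
    """Return the message identifiers in a field value, in order."""
    return _IDENTIFIER.findall(field_value)
```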

Writers use References to indicate that a message has a parent. The last identifier in References identifies the parent. The first identifier in References identifies the first article in the same thread. There may be more identifiers in References, with grandparents preceding parents, etc. (The basic idea is that a writer should copy References from the parent and append the parent's Message-ID. However, if there are more than about ten identifiers listed, the writer should eliminate the second one.)
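The copy-and-append rule can be sketched as follows, in Python. The names build_references and format_references are assumptions made for this example, as is the exact limit of 10 (the text says only "about ten"). Applying the limit to the list after the parent's Message-ID has been appended, and repeating the removal until the list fits, are also interpretations rather than something the text states.

```python
MAX_IDENTIFIERS = 10  # assumption: the text says "about ten"

def build_references(parent_references, parent_message_id,
                     limit=MAX_IDENTIFIERS):
    """Return the identifier list for a response.

    parent_references: identifiers from the parent's References field,
        oldest first (an empty list if the parent has no References).
    parent_message_id: the parent's Message-ID value.
    """
    ids = list(parent_references) + [parent_message_id]
    # Drop the second identifier while the list is too long. The first
    # identifier (start of the thread) and the last (the parent) stay.
    while len(ids) > limit and len(ids) > 2:
        del ids[1]
    return ids

def format_references(ids):
    """Format on one line, each identifier preceded by exactly one space."""
    return "References:" + "".join(" " + ident for ident in ids)
```

For example, format_references(["<a@x>", "<b@x>"]) returns "References: <a@x> <b@x>".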

In-Reply-To

The value of an In-Reply-To field is tokenizable, consisting of a series of words and message identifiers.

According to 822, In-Reply-To lists parents, and References lists ``other correspondence.'' Some MUAs do in fact put parents into In-Reply-To. However, very few readers are able to parse the complicated syntax of In-Reply-To specified by 822, let alone the syntactically incorrect fields that show up in practice:

     In-Reply-To: Your message of 10 Jan 1998 20:22:41 -0000
       (WRONG)

The USENET References syntax, as described above, has been much more successful than the 822 In-Reply-To syntax, thanks to its simplicity. I recommend that writers support References as described above. 822bis recommends generating In-Reply-To as well as References; however, I don't know whether any readers understand In-Reply-To without also understanding References. Some writers generate In-Reply-To without References; for backwards compatibility, I recommend that readers look for identifiers in In-Reply-To and append to References any that are not already included there.
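The backwards-compatibility recommendation for readers can be sketched as follows, in Python. The function name thread_identifiers and the regular expression are assumptions made for this example. The scan treats anything in angle brackets as an identifier, so an In-Reply-To value that includes a mail address in angle brackets would be misread; a careful reader may want stricter checks.

```python
import re

# Anything between a "<" and the next ">" with no nested angle brackets.
_IDENTIFIER = re.compile(r"<[^<>]+>")

def thread_identifiers(references_value, in_reply_to_value):
    """Return identifiers from References, then any extra ones from In-Reply-To.

    Either argument may be None if the field is absent.
    """
    ids = _IDENTIFIER.findall(references_value or "")
    for ident in _IDENTIFIER.findall(in_reply_to_value or ""):
        # Append only identifiers that References did not already list.
        if ident not in ids:
            ids.append(ident)
    return ids
```

A value such as "Your message of 10 Jan 1998 20:22:41 -0000" contains no identifiers, so it contributes nothing to the result.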