D. J. Bernstein
Internet publication
FTP: File Transfer Protocol
Files, usernames, and pathnames
An FTP server provides access to a collection of files.
Each file is identified by
a server-defined username
and a server-defined pathname.
Many servers provide public files under the standard username anonymous.
Most of these servers demand a password,
but allow any password that ends with @.
The Netscape FTP client uses the password mozilla@, for example.
FTP defines three types of files:
text files, binary files, and directories.
Text files and binary files are collectively known as regular files.
In theory, the client needs to use different requests
to retrieve different types of files.
In practice,
servers store all regular files internally as binary files;
when the client asks for a text file,
the server reads a series of lines from the binary file
in the server's favorite text format,
and sends the lines separated by \015\012.
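The text-file conversion described above can be sketched as follows. This is a minimal illustration, not any particular server's code: it assumes the server's local text format terminates lines with \012 (as on UNIX), and the function name to_network_text is hypothetical.

```python
def to_network_text(stored: bytes, local_newline: bytes = b"\n") -> bytes:
    """Convert a locally stored text file to the form sent over FTP.

    Assumption: the server's own text format ends each line with
    `local_newline`. The lines are re-joined with \\015\\012 (CR LF).
    """
    lines = stored.split(local_newline)
    return b"\r\n".join(lines)
```

A real server would convert the file in chunks rather than holding it all in memory; the sketch ignores that for brevity.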
Pathnames and encoded pathnames
A pathname is any string of bytes
beginning with a slash and not containing \000.
An encoded pathname is a string of bytes not containing \012.
It normally represents the pathname obtained by
replacing each \000 in the encoded pathname with \012.
However, if it does not start with a slash,
it represents the pathname obtained by concatenating
- the server's current name prefix;
- a slash, if the name prefix does not end with a slash; and
- the string obtained by replacing each \000 in the encoded pathname with \012.
For example, if the name prefix is /home/joe:
- the encoded pathname /public represents the pathname /public;
- the encoded pathname tex represents the pathname /home/joe/tex; and
- the encoded pathname ab\000c represents the pathname /home/joe/ab\012c.
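The decoding rule can be written out as a short function. This is a sketch of the rule as stated here; the names decode_pathname and name_prefix are chosen for illustration and do not come from any FTP implementation.

```python
def decode_pathname(encoded: bytes, name_prefix: bytes) -> bytes:
    """Return the pathname represented by an encoded pathname.

    `name_prefix` stands for the server's current name prefix.
    """
    if b"\n" in encoded:
        raise ValueError("an encoded pathname cannot contain \\012")
    # Each \000 in the encoded pathname stands for \012.
    decoded = encoded.replace(b"\x00", b"\n")
    if encoded.startswith(b"/"):
        return decoded
    # Relative form: prefix, then a slash unless the prefix ends with one.
    separator = b"" if name_prefix.endswith(b"/") else b"/"
    return name_prefix + separator + decoded
```

With the name prefix /home/joe, this maps /public to /public, tex to /home/joe/tex, and ab\000c to /home/joe/ab\012c.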
In practice,
several bytes cause problems when they appear in pathnames:
- \012:
Most clients simply send \012 to the server
instead of encoding it as \000,
producing a disastrous loss of synchronization.
- Space (\040):
Some servers have trouble with spaces in parameters.
- " (\042):
Most clients and servers do not correctly handle double quotes
in responses to PWD and MKD requests.
- \377:
Some clients and servers do not use the
TELNET string \377\377 correctly.
Compounding these problems,
a widespread document recommends encoding \015 as \015\000.
I strongly recommend against this;
it breaks current use of \015 without fixing anything.
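A client that avoids the first and last of these problems might prepare a pathname along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, and the doubling of \377 reflects the TELNET convention in which \377 is the IAC byte and a literal \377 is sent as \377\377. In line with the recommendation above, \015 is left untouched.

```python
def encode_pathname(pathname: bytes) -> bytes:
    """Encode a pathname for use as a request parameter.

    Each \\012 becomes \\000; \\015 is deliberately not changed.
    """
    if not pathname.startswith(b"/") or b"\x00" in pathname:
        raise ValueError("a pathname starts with a slash and has no \\000")
    return pathname.replace(b"\n", b"\x00")


def escape_for_wire(parameter: bytes) -> bytes:
    """Double each \\377 so that it is not read as a TELNET command byte.

    Assumption: the peer undoes this doubling, as TELNET specifies.
    """
    return parameter.replace(b"\xff", b"\xff\xff")
```

The two steps are kept separate because the first belongs to pathname encoding and the second to the underlying TELNET layer.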
Pathname display
By convention,
any FTP pathname that is a valid UTF-8 string
is displayed as a UTF-8 string.
In particular,
any 7-bit FTP pathname is displayed as an ASCII string.
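A client could apply this display convention roughly as follows. The fallback for pathnames that are not valid UTF-8 is an assumption made for this sketch (the convention itself does not say what to show); here the offending bytes are shown as backslash escapes.

```python
def display_pathname(pathname: bytes) -> str:
    """Return a displayable string for an FTP pathname."""
    try:
        # Valid UTF-8 (which includes all 7-bit pathnames) is shown as is.
        return pathname.decode("utf-8")
    except UnicodeDecodeError:
        # Assumed fallback: keep ASCII bytes, escape everything else.
        return pathname.decode("ascii", errors="backslashreplace")
```

A production client would probably also escape control characters before printing; that is omitted here.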
RFC 959 generally requires 7-bit requests and responses except in
TELNET strings.
This requirement is obsolete.
Some FTP servers provide access to local file collections
in which file names are, by convention, displayed as ISO-8859-1.
These servers can easily translate file names from ISO-8859-1 to UTF-8
before providing them as pathnames to FTP clients.
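The translation is a one-line operation in most languages. A minimal sketch, with a hypothetical function name:

```python
def local_name_to_ftp(local_name: bytes) -> bytes:
    """Translate a local ISO-8859-1 file name to a UTF-8 FTP name.

    Every byte value is a valid ISO-8859-1 character, so the decoding
    step cannot fail on malformed input.
    """
    return local_name.decode("iso-8859-1").encode("utf-8")
```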
When the FTP client asks for a file under a pathname p,
the server can behave as follows:
1. Go to step 4 if p is not a valid UTF-8 string,
or if some of the characters in p cannot be expressed in ISO-8859-1.
2. Translate p from UTF-8 to ISO-8859-1.
If the translation fails temporarily
(because of, e.g., insufficient memory), print an error and stop.
3. Attempt to find the file under the translated name.
If the attempt fails temporarily, print an error and stop.
If the attempt succeeds, operate on the file and stop.
If the attempt fails permanently, continue to step 4.
4. Attempt to find the file under the name p.
If the attempt fails temporarily, print an error and stop.
If the attempt fails permanently, print an error and stop.
If the attempt succeeds, operate on the file and stop.
This provides a reasonable level of backwards compatibility
with clients using ISO-8859-1 pathnames.
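The lookup procedure can be sketched as below. The lookup callable and the TemporaryFailure exception are hypothetical stand-ins for the server's file access layer: lookup is assumed to return the file on success, raise FileNotFoundError on a permanent failure, and raise TemporaryFailure on a temporary one. Errors are raised to the caller, which is assumed to print them and stop.

```python
class TemporaryFailure(Exception):
    """Hypothetical signal for a transient error (out of memory, I/O error)."""


def find_file(p: bytes, lookup):
    """Find the file for pathname p, trying the ISO-8859-1 name first."""
    try:
        # Step 2: translate p from UTF-8 to ISO-8859-1.
        translated = p.decode("utf-8").encode("iso-8859-1")
    except UnicodeError:
        # Step 1: not valid UTF-8, or not expressible in ISO-8859-1.
        translated = None
    except MemoryError as exc:
        raise TemporaryFailure("translation failed") from exc

    if translated is not None:
        try:
            # Step 3: a TemporaryFailure here propagates to the caller.
            return lookup(translated)
        except FileNotFoundError:
            pass  # permanent failure: fall through to step 4

    # Step 4: try the name exactly as the client sent it.
    return lookup(p)
```

Because step 4 always tries the untranslated bytes, a client that still sends raw ISO-8859-1 names reaches the same file as one that sends UTF-8.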