Hi Simon,
Servers that correctly implement the FTP RFC will use the port
immediately adjacent to (and lower than) the control port for the
default data port. Hence, for most servers port 21 is the control port
and port 20 is the data port. However, it seems that many servers
don't do this and just grab an ephemeral port for the data port.
The "port 20" and "ephemeral port" behaviors are not mutually exclusive:
each one applies to a different end of the data connection.
The terms "client" and "server" can get a little confusing in active
FTP, because the program we refer to as the "FTP client" actually acts
as a TCP server when establishing the data connection, and vice versa.
So, for the sake of clarity, I'm going to refer to the "FTP client" as
the one sending the PORT command and subsequently listening on that port
number, and the "FTP server" as the one receiving the PORT command and
attempting to connect to that address/port.
The FTP client will use the PORT command to tell the FTP server which IP
address & port to connect to. The server will then connect *from* port
20 to whatever that ephemeral port is.
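
To make the PORT step concrete, here's a rough Python sketch of how the
argument is put together. The address 192.168.1.10, the port 50000 and
the function name are made up for illustration; the h1,h2,h3,h4,p1,p2
layout is the one the FTP RFC defines.

```python
def format_port_argument(ip, port):
    """Encode an IPv4 address and port as PORT's h1,h2,h3,h4,p1,p2 argument."""
    octets = ip.split(".")
    if len(octets) != 4 or not 0 < port < 65536:
        raise ValueError("need a dotted IPv4 address and a 16-bit port")
    # The port travels as two decimal byte values: high byte, then low byte.
    return ",".join(octets + [str(port >> 8), str(port & 0xFF)])


# Hypothetical FTP client at 192.168.1.10, listening on ephemeral port 50000
command = "PORT " + format_port_argument("192.168.1.10", 50000)
print(command)  # PORT 192,168,1,10,195,80
```

The FTP server reverses the arithmetic (195 * 256 + 80 = 50000) to find
out which port to connect to.
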
Remember, there are always two ports in a TCP connection. There's a
source and a destination port in every TCP segment. We usually ignore
the port assigned to the program that calls connect() (the TCP client,
or FTP server in this example) because the connect() API traditionally
picks a random port to connect *from*, and only cares what it connects
to. However, FTP is an exception to that rule. In FTP, the server (TCP
client) must use bind() to force the source port to be 20, in order to
follow the spec. That way, a firewall can be configured to allow packets
matching port 20 -- which is a lot better than trying to explicitly
allow every possible ephemeral port.
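
Here's a minimal Python sketch of the bind()-before-connect() trick,
run entirely on localhost. Binding to port 20 needs root on most
systems, so this demo borrows a free unprivileged port instead; the
connect_from() helper is my own name, not part of any FTP library.

```python
import socket


def connect_from(source, target):
    """Open a TCP connection to target using source as the local (ip, port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # SO_REUSEADDR makes it easier to reuse the same source port later.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(source)      # pick the source port ourselves...
    sock.connect(target)   # ...instead of letting connect() choose one
    return sock


# This listener stands in for the FTP client's data socket.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))   # port 0: the OS picks an ephemeral port
listener.listen(1)

# Find a free port to play the role of port 20 (assumption: it stays free
# for the instant between closing the probe and binding again).
probe = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
probe.bind(("127.0.0.1", 0))
source_port = probe.getsockname()[1]
probe.close()

data = connect_from(("127.0.0.1", source_port), listener.getsockname())
accepted, peer = listener.accept()
local_port = data.getsockname()[1]
print(peer[1] == source_port)     # the listener sees the port we bound to

for s in (data, accepted, listener):
    s.close()
```

A real FTP server would pass port 20 as the source port and the
address/port from the PORT command as the target.
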
The problem with active FTP is usually due to NAT (or similar tools)
changing the address (and often the port as well). Since that
information is negotiated at run-time over the control connection, for
FTP to work properly through NAT, the router must read every packet and
look for PORT commands or PASV replies, and change them accordingly...
Too many NAT routers, especially in the early days, didn't do that, and
therefore FTP didn't work through NAT.
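
As a rough illustration of what the router has to do, here's a Python
sketch that rewrites a PORT line with a public address and port. The
addresses, the port numbers and the rewrite_port() name are all
hypothetical, and a real NAT has more work to do (for one thing, the
rewritten line can change length, which means adjusting TCP sequence
numbers too).

```python
import re

# One complete control-channel line carrying a PORT command
PORT_RE = re.compile(rb"^PORT \d+,\d+,\d+,\d+,\d+,\d+\r\n$")


def rewrite_port(line, public_ip, public_port):
    """Replace the address/port in a PORT line; pass other lines through."""
    if not PORT_RE.match(line):
        return line
    fields = public_ip.split(".") + [str(public_port >> 8),
                                     str(public_port & 0xFF)]
    return b"PORT " + ",".join(fields).encode("ascii") + b"\r\n"


# Private 192.168.1.10:50000 is mapped to public 203.0.113.7:40000
print(rewrite_port(b"PORT 192,168,1,10,195,80\r\n", "203.0.113.7", 40000))
# b'PORT 203,0,113,7,156,64\r\n'
```
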
A poorly configured firewall could certainly have the same effect. But
the problem is much easier to solve in a misconfigured firewall than it
is in NAT. Just allow all packets with a source/destination port of 20.
PASV just connects in the opposite direction, so the FTP client acts as
a TCP client and connects to the server's port instead of the other way
around. Assuming that only the client side will run behind NAT, this
will solve the NAT problem. But when the server is behind NAT, then the
same problem crops up! PASV doesn't help with firewalls, though...
unless your firewall allows ALL outgoing connections, that is.
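
For the passive case, here's a small Python sketch of how an FTP client
might pull the address and port out of the server's 227 reply before
connecting. The reply text and the parse_pasv_reply() name are
illustrative; the wording of the reply varies between servers, so the
code only looks for the six comma-separated numbers.

```python
import re


def parse_pasv_reply(reply):
    """Extract (ip, port) from a '227 ... (h1,h2,h3,h4,p1,p2)' reply."""
    match = re.search(r"(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)", reply)
    if not reply.startswith("227") or match is None:
        raise ValueError("not a 227 passive-mode reply")
    h1, h2, h3, h4, p1, p2 = match.groups()
    # Same encoding as PORT: the port is high byte * 256 + low byte.
    return ".".join((h1, h2, h3, h4)), int(p1) * 256 + int(p2)


# Hypothetical reply from a server at 203.0.113.7
address = parse_pasv_reply("227 Entering Passive Mode (203,0,113,7,195,80)")
print(address)  # ('203.0.113.7', 50000)
```

If the server sits behind NAT and the router doesn't fix up this reply,
the address in it is the server's private one, which is exactly the
problem described above.
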
In general, I suggest using HTTP instead of FTP. If for no other reason
than because I'm tired of explaining this stuff :)