Re: sockets connect() behaving strangely on local network -- RPG400-L


Scott,

Thanks for getting back to me.  I'll try to shed some more light on our
particular situation, but bear in mind, I'm not a networking guy.

>> If I attempt to make a connect() to a server on our LAN (199.xxx.xxx.xxx)
>> and disable the interface that the LAN traffic travels on, the program
>> sits at the connect() statement for about two minutes before returning -1
>> as the return code.  Anyone have any idea why this is happening?


> In order to really be able to answer your question, I'd really need
> to understand all of the details of how your LAN(s) and internet
> connections are set up.  I'd also need more details about the errors, and
> probably to try a few different things to prove or disprove my theories...

> However, I can take a guess:

> Most likely, what's happening is that the sockets API is creating an IP
> datagram ("packet") that is sent to the destination host as part of the
> setup for that connection.

> Normally, if the destination host doesn't want to receive a connection on
> the port, it sends back an ICMP datagram that indicates a "connection
> refused" message.   Your system receives that message, and knows something
> went wrong, so connect() returns -1.  Because it's receiving a message
> back, it knows the connection won't work, so connect() ends just as soon
> as the ICMP message is received.

> When the destination host is actually not connected to the internet
> because a link is down or something like that, one of the gateway/router
> boxes (the one just before the linkage that's down) will know that the host
> is unreachable, so it will send back an ICMP datagram that says "host
> unreachable" or "network unreachable" or something of that nature.

That sounds logical.  However, the destination host *is* connected to the
internet.  The local machine that the client is running on is not.  I am
disabling the IP interface on the client machine.  I'm trying to test for a
scenario where a NIC on the client machine is disabled, or becomes
physically disconnected from the network.  Once I'm happy with those
results, then I'll knock the server off the network.

> However, none of this happens when connecting to a host on the LAN.  There
> is no router or gateway involved in that situation.   There is nothing
> to detect the error and send back an ICMP error message.  Do you see what
> I'm saying?  The AS/400 would normally send DIRECTLY to the host since
> they're on the same LAN.  Since the host isn't receiving the setup
> datagram, and no other computers are involved in the connection process,
> nothing ever sends back a "failure message."  Therefore, connect() will
> simply sit there until it times out.

Also sounds logical, but as I said, I am disconnecting the client machine
from the LAN.

> Another (less likely) possibility would be that the delay is happening
> on the gethostbyname() instead of connect().  This could potentially
> happen because your DNS server is available when the internet is down, but
> not when the LAN is down.

This sounds logical as well, but I'm really doing most of my testing with
the SCLIENT4 and SSERVER4 programs in the redbook, and they take dotted IP
addresses, not hostnames.  I used CLIENTEX1 to see if I got the same
results, and I did.

Also, I ran both the clients in debug, and they sit and wait on the
connect().

> My question to you is:  Why does it matter?  What are you trying to
> accomplish?

We are attempting to write a sockets client and server to allow us to
communicate with our sister companies, which are all on a VPN into our LAN.
We're the most IT savvy of the group, so we're writing all of this.  This is
going to be used on the fly to get data to use in CGI programs, so two
minutes is too long for a timeout.  I'd like to know that the connection has
failed immediately.

> Maybe I've got a solution for you...

I have a feeling you do.  I've seen you write about non-blocking sockets
elsewhere on this list.  SCLIENT4 uses them, but it does not set the socket
to non-blocking until *after* the connect.  I moved the code that sets the
socket to non-blocking so that it runs before the connect, and the connect()
gives me the return code of -1 immediately when I have disabled the
interface.  Unfortunately, it also gives me a return code of -1 when the
interface is up.

Does your tutorial have an example of a non-blocking connect() that will
properly handle this situation?  I've only gotten to CLIENTEX1 so far.  But
I didn't want to go too much farther without finding out if what's happening
with the two different timeouts was a symptom of a problem with our network
that would keep *anything* from working properly.  Sounds like that's not
the case.
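
For reference, here is a rough sketch of the general pattern being asked
about.  On POSIX-style sockets a non-blocking connect() normally returns -1
right away with errno set to EINPROGRESS even when nothing is wrong, which
would explain the -1 with the interface up.  The usual approach is to wait
for the socket to become writable with select() and a timeout, then read
SO_ERROR to see how the attempt actually ended.  The sketch is in Python
rather than RPG, purely for illustration.  connect_with_timeout is a
made-up helper name, the timeout value is arbitrary, and none of this is
taken from the redbook or the tutorial; whether OS/400 behaves identically
is an assumption.

```python
import errno
import os
import select
import socket


def connect_with_timeout(host, port, timeout):
    """Hypothetical helper: return a connected TCP socket, or raise OSError
    if the connection fails or does not complete within `timeout` seconds."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setblocking(False)  # must be done BEFORE connect()
    try:
        # connect_ex() returns the errno value instead of raising.
        err = sock.connect_ex((host, port))
        if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
            # A real, immediate failure (e.g. network unreachable).
            raise OSError(err, os.strerror(err))
        if err != 0:
            # EINPROGRESS: the attempt is still under way, not failed.
            # Wait until the socket is writable (or flagged exceptional),
            # which means the attempt has finished one way or the other.
            _, writable, exceptional = select.select(
                [], [sock], [sock], timeout)
            if not writable and not exceptional:
                raise OSError(errno.ETIMEDOUT, "connect timed out")
            # SO_ERROR holds the final result: 0 means connected.
            err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            if err != 0:
                raise OSError(err, os.strerror(err))
        sock.setblocking(True)  # hand back an ordinary blocking socket
        return sock
    except OSError:
        sock.close()
        raise
```

A caller would do something like connect_with_timeout("199.xxx.xxx.xxx",
port, 2.0) and treat any OSError as "connection failed", so the wait is
bounded by the timeout passed in rather than by the system's default of
roughly two minutes.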

Thanks again for your help!