
List:       grid-engine-dev
Subject:    question about behaviour of utilbin/<arch>/gethostbyaddr
From:       Chris Dagdigian <dag () sonsorol ! org>
Date:       2004-06-21 20:57:49
Message-ID: 40D74BCD.9030107 () sonsorol ! org


The vast majority of SGE installation problems I see are related to 
hostname resolution issues. If everything does not resolve perfectly, 
forwards and backwards, a smooth install is unlikely.

I got bit by this today (again) with SGE 6.

The problem was with utilbin/<arch>/gethostbyaddr.

It does not cleanly handle reverse lookups on IPs whose PTR record 
returns only a hostname instead of a FQDN. It fails in a way that would 
confuse and frustrate someone attempting to install SGE for the first 
time (assuming they can even trace the install failure to a hostname 
resolution issue...)

Every other lookup-related tool, such as dig, host and nslookup, handles 
this situation just fine. It is only the gethostbyaddr in SGE's utilbin 
dir that fails, and this will often be the root cause of a failure to 
successfully install a qmaster host.

My basic home lab setup uses a private internal-only DNS zone called 
"private.sonsorol.net". I've got forward and reverse DNS enabled, but my 
zone maps for the reverse range just return a hostname pointer instead 
of a FQDN. This has never been a problem with any OS or any query tool, 
and I think it is how many zonefiles are set up by default.

An example would be:

> $ host bladebox
> bladebox.private.sonsorol.net has address 192.168.0.205

And the reverse...

> $ host 192.168.0.205
> 205.0.168.192.in-addr.arpa domain name pointer bladebox.
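
To make the two halves of that reverse lookup concrete, here is a 
minimal Python sketch (not SGE code, just an illustration). It builds 
the in-addr.arpa name that is queried for an IP and checks whether a 
PTR target is a bare hostname or carries a domain part. The function 
names are made up for this example.

```python
import ipaddress


def reverse_query_name(ip: str) -> str:
    """Return the name that a PTR lookup for this IP address queries."""
    return ipaddress.ip_address(ip).reverse_pointer


def is_single_label(ptr_target: str) -> bool:
    """True if a PTR target such as 'bladebox.' has no domain part."""
    name = ptr_target.rstrip(".")  # drop the trailing root dot, if any
    return bool(name) and "." not in name


if __name__ == "__main__":
    print(reverse_query_name("192.168.0.205"))
    print(is_single_label("bladebox."))
    print(is_single_label("bladebox.private.sonsorol.net."))
```

For 192.168.0.205 the first function yields the same 
205.0.168.192.in-addr.arpa name that host prints above, and "bladebox." 
is flagged as a single-label target.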

The problem with SGE utilbin/gethostbyaddr is that I see this behavior 
with N1GE 6 on OS X and Linux:

$ ./utilbin/lx24-x86/gethostbyaddr -all 192.168.0.205
  error resolving ip "192.168.0.205": can't resolve ip (Success)

This sort of resolving error is all that you need to have a failed SGE 
install. It is only by habit and experience that I know to start running 
the utilbin programs when things go awry :) This could frustrate some 
new users I suspect.

The problem goes away if I change my reverse DNS zonefile to use a FQDN:

$ ./utilbin/lx24-x86/gethostbyaddr -all 192.168.0.205
  Hostname: bladebox.private.sonsorol.net
  SGE name: bladebox.private.sonsorol.net
  Aliases:
  Host Address(es): 192.168.0.205


My basic question is this:

Is there a reason why gethostbyaddr fails to properly perform a reverse 
IP lookup when non-FQDNs are involved? Does it not honor search paths 
in resolv.conf or something?

If there is a valid reason :) can gethostbyaddr perhaps throw a more 
informative error message that points the user to the root cause?
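
As a rough idea of the kind of message I mean, here is a small Python 
sketch. It is not how SGE's gethostbyaddr is implemented, and the 
function name, the injectable resolver parameter and the message 
wording are all assumptions made for illustration. It wraps the 
standard socket.gethostbyaddr call and reports separately when the 
lookup fails and when it succeeds but returns an unqualified name.

```python
import socket


def diagnose_reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return a human-readable diagnosis of a reverse lookup for ip.

    resolver is injectable so the logic can be exercised without DNS;
    it must behave like socket.gethostbyaddr and return
    (hostname, aliases, addresses).
    """
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except OSError as exc:  # socket.herror and socket.gaierror are OSErrors
        return (f'error resolving ip "{ip}": {exc}. '
                "Check that a PTR record exists for this address.")

    if "." not in hostname.rstrip("."):
        # The lookup worked, but the name has no domain part.
        return (f'ip "{ip}" resolves to the unqualified name "{hostname}". '
                "The PTR record may return a short hostname instead of a FQDN.")

    return f'ip "{ip}" resolves to "{hostname}"'


if __name__ == "__main__":
    print(diagnose_reverse_lookup("192.168.0.205"))
```

What the real resolver returns for a short PTR target depends on the 
local resolver configuration, so the output on a given host will vary.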


Regards,
Chris


