List:       npaci-rocks-discussion
Subject:    [Rocks-Discuss]compute node problem
From:       Scooter Willis <willishf () ufl ! edu>
Date:       2003-01-29 23:24:28
Message-ID: 3E3862AC.7010106 () ufl ! edu

We are in the process of setting up a Rocks cluster and are having a
couple of problems.

For my first attempt at building a compute node, I had a keyboard, mouse,
and monitor plugged in and was presented with the standard Red Hat
install, waiting for me to answer questions. I figured out that I needed
to remove the keyboard and mouse to force an install from the server via
a config file or some default option on the CD.

I ran insert-ethers on the server and got an error about /home/install
not being found. I created the directory, reran it, and got other errors.
I then realized I needed to log in as root. After rerunning, I got a
couple more errors, but then the program responded and ran, waiting for a
compute node to connect. The MAC address and 10.255.255.254 registered. I
was unable to telnet to 10.255.255.254 to watch the install, so I assumed
everything installed correctly, but I couldn't find a way to validate it.

I ran the command below to try to kick off a job and got an ssh error
message.

1. Any clues on why I can't connect?

2. What can I do to validate that the install of the compute node was
successful?

/opt/mpich/ethernet/gcc/bin/mpirun -nolocal -np 2 -machinefile machines \
/opt/hpl-eth/bin/gcc/xhpl
ssh: connect to address 10.255.255.254 port 22: Connection refused

I can ping compute-0-0:

ping compute-0-0
PING compute-0-0 (10.255.255.254) from 10.1.1.1 : 56(84) bytes of data.
64 bytes from compute-0-0 (10.255.255.254): icmp_seq=1 ttl=255 time=0.136 ms
64 bytes from compute-0-0 (10.255.255.254): icmp_seq=2 ttl=255 time=0.128 ms
64 bytes from compute-0-0 (10.255.255.254): icmp_seq=3 ttl=255 time=0.106 ms
64 bytes from compute-0-0 (10.255.255.254): icmp_seq=4 ttl=255 time=0.109 ms
64 bytes from compute-0-0 (10.255.255.254): icmp_seq=5 ttl=255 time=0.110 ms
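
Since ping answers but ssh is refused, one way to narrow this down is to
check whether anything is listening on port 22 at all. Below is a minimal
Python sketch of such a check. The function name probe_tcp is only
illustrative and is not part of Rocks; it uses nothing beyond the
standard socket module.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Classify a TCP connection attempt.

    Returns one of:
      'open'    - something accepted the connection
      'refused' - the host answered, but nothing is listening on the port
      'timeout' - no answer within `timeout` seconds
      'error'   - any other failure (name lookup, unreachable network, ...)
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"
```

Calling probe_tcp("compute-0-0", 22) from the frontend should return
"refused" in the situation above. That result would suggest the node is
up on the network but no sshd is accepting connections yet, which is a
different failure from a timeout or a name lookup problem.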

Looking at the hosts file, I found a warning in the definition of
frontend-0. I'm not sure if I should change it to fix the problem.

cat /etc/hosts
#
# Do NOT Edit (generated by dbreport)
#
127.0.0.1       frontend-0      localhost
10.1.1.1        frontend-0 # warning: should be frontend-0-0
10.255.255.254  compute-0-0 c0-0

The following is /etc/dhcpd.conf:

cat /etc/dhcpd.conf
#
# Do NOT Edit (generated by dbreport)
#
ddns-update-style none;
option space PXE;
option PXE.mtftp-ip code 1 = ip-address;
subnet 10.0.0.0 netmask 255.0.0.0 {
        default-lease-time 6000;
        max-lease-time 6000;
        option broadcast-address 10.255.255.255;
        option nis-domain "rocks";
        option domain-name-servers 128.227.21.149;
        option subnet-mask 255.0.0.0;
        option routers 10.1.1.1;

        if substring (option  vendor-class-identifier, 0, 9)  = "PXEClient" {
                filename  "X86PC/UNDI/pxelinux/pxelinux.0";
                option vendor-class-identifier  "PXEClient";
                option PXE.mtftp-ip 0.0.0.0;
                vendor-option-space PXE;
        } else {
                filename "/install/kickstart.cgi";
        }

        host frontend-0 {
                option host-name "frontend-0";
                fixed-address 10.1.1.1;
        }
        host compute-0-0 {
                hardware ethernet 00:e0:81:22:8a:ce;
                option host-name "compute-0-0";
                fixed-address 10.255.255.254;
        }
}
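
Both files above are marked as generated by dbreport, so the host entries
should agree with each other. As a quick sanity check, here is a rough
Python sketch that compares the fixed-address of each host block in
dhcpd.conf with the addresses listed for the same name in the hosts file.
The function names are hypothetical helpers, not Rocks tools, and the
regular expressions assume host blocks without nested braces, as in the
file shown here.

```python
import re

# Matches 'host NAME { ... }' blocks; assumes no nested braces inside.
HOST_BLOCK = re.compile(r"\bhost\s+(\S+)\s*\{(.*?)\}", re.DOTALL)
FIXED_ADDR = re.compile(r"fixed-address\s+([0-9.]+)\s*;")


def dhcpd_fixed_addresses(text):
    """Map each dhcpd.conf host block name to its fixed-address (or None)."""
    result = {}
    for name, body in HOST_BLOCK.findall(text):
        match = FIXED_ADDR.search(body)
        result[name] = match.group(1) if match else None
    return result


def hosts_file_addresses(text):
    """Map each name or alias in hosts-file text to the IPs it is listed under."""
    table = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            table.setdefault(name, []).append(ip)
    return table


def find_mismatches(dhcpd_text, hosts_text):
    """Return sorted names whose dhcpd fixed-address is absent from the hosts file."""
    hosts = hosts_file_addresses(hosts_text)
    return sorted(
        name
        for name, ip in dhcpd_fixed_addresses(dhcpd_text).items()
        if ip not in hosts.get(name, [])
    )
```

Feeding it the contents of /etc/dhcpd.conf and /etc/hosts, for example
find_mismatches(open("/etc/dhcpd.conf").read(), open("/etc/hosts").read()),
should return an empty list when the two files agree.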

