List: npaci-rocks-discussion
Subject: [Rocks-Discuss]compute node problem
From: Scooter Willis <willishf () ufl ! edu>
Date: 2003-01-29 23:24:28
Message-ID: 3E3862AC.7010106 () ufl ! edu
We are in the process of setting up a Rocks cluster and are having a
couple of problems.
On my first attempt at building a compute node I had a keyboard, mouse,
and monitor plugged in and was presented with the standard Red Hat
install, waiting for me to answer questions. I figured out that I needed
to remove the keyboard and mouse to force an install from the server via
a config file or some default option on the CD.
I ran insert-ethers on the server and got an error about /home/install
not being found. I created the directory, reran it, and got other errors.
I then realized I needed to be logged in as root. After rerunning and
getting a couple more errors, the program finally ran and waited for a
compute node to connect. The MAC address and 10.255.255.254 registered.
I was unable to telnet to 10.255.255.254 to watch the install, so I
assume everything installed correctly, but I couldn't find a way to
validate that.
I then ran the command below to try to kick off a job and got an ssh
error message.
1. Any clues on why I can't connect?
2. What can I do to validate that the install of the compute node was
successful?
/opt/mpich/ethernet/gcc/bin/mpirun -nolocal -np 2 -machinefile machines \
    /opt/hpl-eth/bin/gcc/xhpl
ssh: connect to address 10.255.255.254 port 22: Connection refused
I can ping compute-0-0
ping compute-0-0
PING compute-0-0 (10.255.255.254) from 10.1.1.1 : 56(84) bytes of data.
64 bytes from compute-0-0 (10.255.255.254): icmp_seq=1 ttl=255 time=0.136 ms
64 bytes from compute-0-0 (10.255.255.254): icmp_seq=2 ttl=255 time=0.128 ms
64 bytes from compute-0-0 (10.255.255.254): icmp_seq=3 ttl=255 time=0.106 ms
64 bytes from compute-0-0 (10.255.255.254): icmp_seq=4 ttl=255 time=0.109 ms
64 bytes from compute-0-0 (10.255.255.254): icmp_seq=5 ttl=255 time=0.110 ms
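A ping reply together with "Connection refused" suggests the node is up
but nothing is accepting connections on port 22. A minimal sketch for
checking a single TCP port from the frontend is given here. It assumes
bash with /dev/tcp support and the coreutils timeout command; the
function name port_state is made up for illustration and is not part of
Rocks.

```shell
#!/usr/bin/env bash
# port_state HOST PORT (hypothetical helper, not a Rocks tool)
# Prints "open" if a TCP connection to HOST:PORT succeeds within 3 seconds.
# Prints "closed" for anything else: refused, timed out, or unresolvable.
port_state() {
    local host=$1 port=$2
    # bash treats /dev/tcp/HOST/PORT as a request to open a TCP socket.
    if timeout 3 bash -c 'exec 3<>"/dev/tcp/$1/$2"' _ "$host" "$port" 2>/dev/null; then
        echo open
    else
        echo closed
    fi
}
```

For the node in this message it could be called as
port_state compute-0-0 22 and repeated until the answer changes.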
Looking at the hosts file, I found a warning on the definition of
frontend-0. I am not sure whether I should change it to fix the problem.
cat /etc/hosts
#
# Do NOT Edit (generated by dbreport)
#
127.0.0.1 frontend-0 localhost
10.1.1.1 frontend-0 # warning: should be frontend-0-0
10.255.255.254 compute-0-0 c0-0
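In the file above, frontend-0 appears on both the 127.0.0.1 line and the
10.1.1.1 line. Whether that has anything to do with the ssh error is not
established here. As a rough sketch, names that map to more than one
address in a hosts-style file can be listed with awk. The function name
dup_hosts is hypothetical.

```shell
#!/usr/bin/env bash
# dup_hosts FILE (hypothetical helper)
# Prints "name: addr1 addr2 ..." for every name listed under more than
# one address in a hosts-style file. Comments are ignored.
dup_hosts() {
    awk '
        { sub(/#.*/, "") }                  # strip comments, fields are re-split
        NF >= 2 {
            for (i = 2; i <= NF; i++) {
                if (!(($i, $1) in seen)) {  # count each name/address pair once
                    seen[$i, $1] = 1
                    count[$i]++
                    addrs[$i] = addrs[$i] " " $1
                }
            }
        }
        END {
            for (n in count)
                if (count[n] > 1) print n ":" addrs[n]
        }
    ' "$1" | sort
}
```

It would be run as dup_hosts /etc/hosts. The file itself is generated by
dbreport, so this only reads it.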
The following is /etc/dhcpd.conf
cat /etc/dhcpd.conf
#
# Do NOT Edit (generated by dbreport)
#
ddns-update-style none;
option space PXE;
option PXE.mtftp-ip code 1 = ip-address;
subnet 10.0.0.0 netmask 255.0.0.0 {
    default-lease-time 6000;
    max-lease-time 6000;
    option broadcast-address 10.255.255.255;
    option nis-domain "rocks";
    option domain-name-servers 128.227.21.149;
    option subnet-mask 255.0.0.0;
    option routers 10.1.1.1;
    if substring (option vendor-class-identifier, 0, 9) =
        "PXEClient" {
        filename "X86PC/UNDI/pxelinux/pxelinux.0";
        option vendor-class-identifier "PXEClient";
        option PXE.mtftp-ip 0.0.0.0;
        vendor-option-space PXE;
    } else {
        filename "/install/kickstart.cgi";
    }
    host frontend-0 {
        option host-name "frontend-0";
        fixed-address 10.1.1.1;
    }
    host compute-0-0 {
        hardware ethernet 00:e0:81:22:8a:ce;
        option host-name "compute-0-0";
        fixed-address 10.255.255.254;
    }
}
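To confirm which nodes were registered, the host blocks in a dhcpd.conf
like the one above can be reduced to one line per host. This is a minimal
sketch, assuming each host block closes with a "}" on its own line as in
this file. The function name list_dhcp_hosts is hypothetical.

```shell
#!/usr/bin/env bash
# list_dhcp_hosts FILE (hypothetical helper)
# Prints "name mac ip" for each host block; "-" marks a missing value.
list_dhcp_hosts() {
    awk '
        $1 == "host"                         { name = $2; mac = "-"; ip = "-" }
        $1 == "hardware" && $2 == "ethernet" { mac = $3; sub(/;$/, "", mac) }
        $1 == "fixed-address"                { ip = $2; sub(/;$/, "", ip) }
        # A closing brace ends the current host block, if one is open.
        $1 == "}" && name != ""              { print name, mac, ip; name = "" }
    ' "$1"
}
```

Run as list_dhcp_hosts /etc/dhcpd.conf, it would print one line for
frontend-0 and one for compute-0-0, with the MAC address that
insert-ethers recorded for the compute node.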